This commit prevents tail-drop when a qdisc is present and the ptr_ring
becomes full. Once an entry is successfully produced and the ptr_ring
reaches capacity, the netdev queue is stopped instead of dropping
subsequent packets.
If producing an entry fails anyway, tun_net_xmit() returns
NETDEV_TX_BUSY, again avoiding a drop. Such failures are expected because
LLTX is enabled and the transmit path operates without the usual locking.
As a result, concurrent calls to tun_net_xmit() are not prevented.
The existing __{tun,tap}_ring_consume functions free space in the
ptr_ring and wake the netdev queue. Races between this wakeup and the
queue-stop logic could leave the queue stopped indefinitely. To prevent
this, a memory barrier is enforced (as discussed in a similar
implementation in [1]), followed by a recheck that wakes the queue if
space is already available.
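For reference, the consumer side is expected to pair with this roughly as
follows. This is only an illustrative sketch (the helper name, the extra
dev argument and the simplified locking are assumptions; it is not part
of the diff below):

static void *__tun_ring_consume(struct net_device *dev,
				struct tun_file *tfile)
{
	struct netdev_queue *queue;
	void *ptr;

	/* Caller is assumed to hold the ptr_ring consumer lock. */
	ptr = __ptr_ring_consume(&tfile->tx_ring);
	if (!ptr)
		return NULL;

	/* Pairs with smp_mb__after_atomic() in tun_net_xmit(): make the
	 * consume visible before checking whether the producer stopped
	 * the queue.
	 */
	smp_mb();

	queue = netdev_get_tx_queue(dev, tfile->queue_index);
	if (netif_tx_queue_stopped(queue) &&
	    !__ptr_ring_full(&tfile->tx_ring))
		netif_tx_wake_queue(queue);

	return ptr;
}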
If no qdisc is present, the previous tail-drop behavior is preserved.
+-------------------------+-----------+---------------+----------------+
| pktgen benchmarks to | Stock | Patched with | Patched with |
| Debian VM, i5 6300HQ, | | noqueue qdisc | fq_codel qdisc |
| 10M packets | | | |
+-----------+-------------+-----------+---------------+----------------+
| TAP | Transmitted | 196 Kpps | 195 Kpps | 185 Kpps |
| +-------------+-----------+---------------+----------------+
| | Lost | 1618 Kpps | 1556 Kpps | 0 |
+-----------+-------------+-----------+---------------+----------------+
| TAP | Transmitted | 577 Kpps | 582 Kpps | 578 Kpps |
| + +-------------+-----------+---------------+----------------+
| vhost-net | Lost | 1170 Kpps | 1109 Kpps | 0 |
+-----------+-------------+-----------+---------------+----------------+
[1] Link: https://lore.kernel.org/all/20250424085358.75d817ae@kernel.org/
Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
---
drivers/net/tun.c | 31 +++++++++++++++++++++++++++++--
1 file changed, 29 insertions(+), 2 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 71b6981d07d7..74d7fd09e9ba 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1008,6 +1008,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
struct netdev_queue *queue;
struct tun_file *tfile;
int len = skb->len;
+ bool qdisc_present;
+ int ret;
rcu_read_lock();
tfile = rcu_dereference(tun->tfiles[txq]);
@@ -1060,13 +1062,38 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
nf_reset_ct(skb);
- if (ptr_ring_produce(&tfile->tx_ring, skb)) {
+ queue = netdev_get_tx_queue(dev, txq);
+ qdisc_present = !qdisc_txq_has_no_queue(queue);
+
+ spin_lock(&tfile->tx_ring.producer_lock);
+ ret = __ptr_ring_produce(&tfile->tx_ring, skb);
+ if (__ptr_ring_produce_peek(&tfile->tx_ring) && qdisc_present) {
+ netif_tx_stop_queue(queue);
+ /* Avoid races with queue wake-up in
+ * __{tun,tap}_ring_consume by waking if space is
+ * available in a re-check.
+ * The barrier makes sure that the stop is visible before
+ * we re-check.
+ */
+ smp_mb__after_atomic();
+ if (!__ptr_ring_produce_peek(&tfile->tx_ring))
+ netif_tx_wake_queue(queue);
+ }
+ spin_unlock(&tfile->tx_ring.producer_lock);
+
+ if (ret) {
+ /* If a qdisc is attached to our virtual device,
+ * returning NETDEV_TX_BUSY is allowed.
+ */
+ if (qdisc_present) {
+ rcu_read_unlock();
+ return NETDEV_TX_BUSY;
+ }
drop_reason = SKB_DROP_REASON_FULL_RING;
goto drop;
}
/* dev->lltx requires to do our own update of trans_start */
- queue = netdev_get_tx_queue(dev, txq);
txq_trans_cond_update(queue);
/* Notify and wake up reader process */
--
2.43.0
On Thu, Jan 8, 2026 at 5:06 AM Simon Schippers
<simon.schippers@tu-dortmund.de> wrote:
>
> This commit prevents tail-drop when a qdisc is present and the ptr_ring
> becomes full. Once an entry is successfully produced and the ptr_ring
> reaches capacity, the netdev queue is stopped instead of dropping
> subsequent packets.
>
> If producing an entry fails anyways, the tun_net_xmit returns
> NETDEV_TX_BUSY, again avoiding a drop. Such failures are expected because
> LLTX is enabled and the transmit path operates without the usual locking.
> As a result, concurrent calls to tun_net_xmit() are not prevented.
>
> The existing __{tun,tap}_ring_consume functions free space in the
> ptr_ring and wake the netdev queue. Races between this wakeup and the
> queue-stop logic could leave the queue stopped indefinitely. To prevent
> this, a memory barrier is enforced (as discussed in a similar
> implementation in [1]), followed by a recheck that wakes the queue if
> space is already available.
>
> If no qdisc is present, the previous tail-drop behavior is preserved.
>
> +-------------------------+-----------+---------------+----------------+
> | pktgen benchmarks to | Stock | Patched with | Patched with |
> | Debian VM, i5 6300HQ, | | noqueue qdisc | fq_codel qdisc |
> | 10M packets | | | |
> +-----------+-------------+-----------+---------------+----------------+
> | TAP | Transmitted | 196 Kpps | 195 Kpps | 185 Kpps |
> | +-------------+-----------+---------------+----------------+
> | | Lost | 1618 Kpps | 1556 Kpps | 0 |
> +-----------+-------------+-----------+---------------+----------------+
> | TAP | Transmitted | 577 Kpps | 582 Kpps | 578 Kpps |
> | + +-------------+-----------+---------------+----------------+
> | vhost-net | Lost | 1170 Kpps | 1109 Kpps | 0 |
> +-----------+-------------+-----------+---------------+----------------+
>
> [1] Link: https://lore.kernel.org/all/20250424085358.75d817ae@kernel.org/
>
> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
> ---
> drivers/net/tun.c | 31 +++++++++++++++++++++++++++++--
> 1 file changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 71b6981d07d7..74d7fd09e9ba 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1008,6 +1008,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> struct netdev_queue *queue;
> struct tun_file *tfile;
> int len = skb->len;
> + bool qdisc_present;
> + int ret;
>
> rcu_read_lock();
> tfile = rcu_dereference(tun->tfiles[txq]);
> @@ -1060,13 +1062,38 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>
> nf_reset_ct(skb);
>
> - if (ptr_ring_produce(&tfile->tx_ring, skb)) {
> + queue = netdev_get_tx_queue(dev, txq);
> + qdisc_present = !qdisc_txq_has_no_queue(queue);
> +
> + spin_lock(&tfile->tx_ring.producer_lock);
> + ret = __ptr_ring_produce(&tfile->tx_ring, skb);
> + if (__ptr_ring_produce_peek(&tfile->tx_ring) && qdisc_present) {
> + netif_tx_stop_queue(queue);
> + /* Avoid races with queue wake-up in
> + * __{tun,tap}_ring_consume by waking if space is
> + * available in a re-check.
> + * The barrier makes sure that the stop is visible before
> + * we re-check.
> + */
> + smp_mb__after_atomic();
> + if (!__ptr_ring_produce_peek(&tfile->tx_ring))
> + netif_tx_wake_queue(queue);
I'm not sure I will get here, but I think those should be moved to the
following if(ret) check. If __ptr_ring_produce() succeeds, there's no
need to bother with that queue stop/wake logic?
> + }
> + spin_unlock(&tfile->tx_ring.producer_lock);
> +
> + if (ret) {
> + /* If a qdisc is attached to our virtual device,
> + * returning NETDEV_TX_BUSY is allowed.
> + */
> + if (qdisc_present) {
> + rcu_read_unlock();
> + return NETDEV_TX_BUSY;
> + }
> drop_reason = SKB_DROP_REASON_FULL_RING;
> goto drop;
> }
>
> /* dev->lltx requires to do our own update of trans_start */
> - queue = netdev_get_tx_queue(dev, txq);
> txq_trans_cond_update(queue);
>
> /* Notify and wake up reader process */
> --
> 2.43.0
>
Thanks
On 1/8/26 05:37, Jason Wang wrote:
> On Thu, Jan 8, 2026 at 5:06 AM Simon Schippers
> <simon.schippers@tu-dortmund.de> wrote:
>>
>> This commit prevents tail-drop when a qdisc is present and the ptr_ring
>> becomes full. Once an entry is successfully produced and the ptr_ring
>> reaches capacity, the netdev queue is stopped instead of dropping
>> subsequent packets.
>>
>> If producing an entry fails anyways, the tun_net_xmit returns
>> NETDEV_TX_BUSY, again avoiding a drop. Such failures are expected because
>> LLTX is enabled and the transmit path operates without the usual locking.
>> As a result, concurrent calls to tun_net_xmit() are not prevented.
>>
>> The existing __{tun,tap}_ring_consume functions free space in the
>> ptr_ring and wake the netdev queue. Races between this wakeup and the
>> queue-stop logic could leave the queue stopped indefinitely. To prevent
>> this, a memory barrier is enforced (as discussed in a similar
>> implementation in [1]), followed by a recheck that wakes the queue if
>> space is already available.
>>
>> If no qdisc is present, the previous tail-drop behavior is preserved.
>>
>> +-------------------------+-----------+---------------+----------------+
>> | pktgen benchmarks to | Stock | Patched with | Patched with |
>> | Debian VM, i5 6300HQ, | | noqueue qdisc | fq_codel qdisc |
>> | 10M packets | | | |
>> +-----------+-------------+-----------+---------------+----------------+
>> | TAP | Transmitted | 196 Kpps | 195 Kpps | 185 Kpps |
>> | +-------------+-----------+---------------+----------------+
>> | | Lost | 1618 Kpps | 1556 Kpps | 0 |
>> +-----------+-------------+-----------+---------------+----------------+
>> | TAP | Transmitted | 577 Kpps | 582 Kpps | 578 Kpps |
>> | + +-------------+-----------+---------------+----------------+
>> | vhost-net | Lost | 1170 Kpps | 1109 Kpps | 0 |
>> +-----------+-------------+-----------+---------------+----------------+
>>
>> [1] Link: https://lore.kernel.org/all/20250424085358.75d817ae@kernel.org/
>>
>> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
>> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
>> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
>> ---
>> drivers/net/tun.c | 31 +++++++++++++++++++++++++++++--
>> 1 file changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 71b6981d07d7..74d7fd09e9ba 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1008,6 +1008,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>> struct netdev_queue *queue;
>> struct tun_file *tfile;
>> int len = skb->len;
>> + bool qdisc_present;
>> + int ret;
>>
>> rcu_read_lock();
>> tfile = rcu_dereference(tun->tfiles[txq]);
>> @@ -1060,13 +1062,38 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>
>> nf_reset_ct(skb);
>>
>> - if (ptr_ring_produce(&tfile->tx_ring, skb)) {
>> + queue = netdev_get_tx_queue(dev, txq);
>> + qdisc_present = !qdisc_txq_has_no_queue(queue);
>> +
>> + spin_lock(&tfile->tx_ring.producer_lock);
>> + ret = __ptr_ring_produce(&tfile->tx_ring, skb);
>> + if (__ptr_ring_produce_peek(&tfile->tx_ring) && qdisc_present) {
>> + netif_tx_stop_queue(queue);
>> + /* Avoid races with queue wake-up in
>> + * __{tun,tap}_ring_consume by waking if space is
>> + * available in a re-check.
>> + * The barrier makes sure that the stop is visible before
>> + * we re-check.
>> + */
>> + smp_mb__after_atomic();
>> + if (!__ptr_ring_produce_peek(&tfile->tx_ring))
>> + netif_tx_wake_queue(queue);
>
> I'm not sure I will get here, but I think those should be moved to the
> following if(ret) check. If __ptr_ring_produce() succeed, there's no
> need to bother with those queue stop/wake logic?
There is a need for that. If __ptr_ring_produce_peek() returns -ENOSPC,
we stop the queue proactively.
I believe what you are aiming for is to always stop the queue if(ret),
which I can agree with. In that case, I would simply change the condition
to:
if (qdisc_present && (ret || __ptr_ring_produce_peek(&tfile->tx_ring)))
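
The produce path would then look roughly like this (untested sketch,
reusing __ptr_ring_produce_peek() as introduced earlier in this series):

	spin_lock(&tfile->tx_ring.producer_lock);
	ret = __ptr_ring_produce(&tfile->tx_ring, skb);
	if (qdisc_present &&
	    (ret || __ptr_ring_produce_peek(&tfile->tx_ring))) {
		netif_tx_stop_queue(queue);
		/* Make the stop visible before re-checking, so a
		 * concurrent consumer wake-up is not lost.
		 */
		smp_mb__after_atomic();
		if (!__ptr_ring_produce_peek(&tfile->tx_ring))
			netif_tx_wake_queue(queue);
	}
	spin_unlock(&tfile->tx_ring.producer_lock);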
>
>> + }
>> + spin_unlock(&tfile->tx_ring.producer_lock);
>> +
>> + if (ret) {
>> + /* If a qdisc is attached to our virtual device,
>> + * returning NETDEV_TX_BUSY is allowed.
>> + */
>> + if (qdisc_present) {
>> + rcu_read_unlock();
>> + return NETDEV_TX_BUSY;
>> + }
>> drop_reason = SKB_DROP_REASON_FULL_RING;
>> goto drop;
>> }
>>
>> /* dev->lltx requires to do our own update of trans_start */
>> - queue = netdev_get_tx_queue(dev, txq);
>> txq_trans_cond_update(queue);
>>
>> /* Notify and wake up reader process */
>> --
>> 2.43.0
>>
>
> Thanks
>
On Thu, Jan 8, 2026 at 4:02 PM Simon Schippers
<simon.schippers@tu-dortmund.de> wrote:
>
> On 1/8/26 05:37, Jason Wang wrote:
> > On Thu, Jan 8, 2026 at 5:06 AM Simon Schippers
> > <simon.schippers@tu-dortmund.de> wrote:
> >>
> >> This commit prevents tail-drop when a qdisc is present and the ptr_ring
> >> becomes full. Once an entry is successfully produced and the ptr_ring
> >> reaches capacity, the netdev queue is stopped instead of dropping
> >> subsequent packets.
> >>
> >> If producing an entry fails anyways, the tun_net_xmit returns
> >> NETDEV_TX_BUSY, again avoiding a drop. Such failures are expected because
> >> LLTX is enabled and the transmit path operates without the usual locking.
> >> As a result, concurrent calls to tun_net_xmit() are not prevented.
> >>
> >> The existing __{tun,tap}_ring_consume functions free space in the
> >> ptr_ring and wake the netdev queue. Races between this wakeup and the
> >> queue-stop logic could leave the queue stopped indefinitely. To prevent
> >> this, a memory barrier is enforced (as discussed in a similar
> >> implementation in [1]), followed by a recheck that wakes the queue if
> >> space is already available.
> >>
> >> If no qdisc is present, the previous tail-drop behavior is preserved.
> >>
> >> +-------------------------+-----------+---------------+----------------+
> >> | pktgen benchmarks to | Stock | Patched with | Patched with |
> >> | Debian VM, i5 6300HQ, | | noqueue qdisc | fq_codel qdisc |
> >> | 10M packets | | | |
> >> +-----------+-------------+-----------+---------------+----------------+
> >> | TAP | Transmitted | 196 Kpps | 195 Kpps | 185 Kpps |
> >> | +-------------+-----------+---------------+----------------+
> >> | | Lost | 1618 Kpps | 1556 Kpps | 0 |
> >> +-----------+-------------+-----------+---------------+----------------+
> >> | TAP | Transmitted | 577 Kpps | 582 Kpps | 578 Kpps |
> >> | + +-------------+-----------+---------------+----------------+
> >> | vhost-net | Lost | 1170 Kpps | 1109 Kpps | 0 |
> >> +-----------+-------------+-----------+---------------+----------------+
> >>
> >> [1] Link: https://lore.kernel.org/all/20250424085358.75d817ae@kernel.org/
> >>
> >> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> >> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> >> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
> >> ---
> >> drivers/net/tun.c | 31 +++++++++++++++++++++++++++++--
> >> 1 file changed, 29 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >> index 71b6981d07d7..74d7fd09e9ba 100644
> >> --- a/drivers/net/tun.c
> >> +++ b/drivers/net/tun.c
> >> @@ -1008,6 +1008,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >> struct netdev_queue *queue;
> >> struct tun_file *tfile;
> >> int len = skb->len;
> >> + bool qdisc_present;
> >> + int ret;
> >>
> >> rcu_read_lock();
> >> tfile = rcu_dereference(tun->tfiles[txq]);
> >> @@ -1060,13 +1062,38 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>
> >> nf_reset_ct(skb);
> >>
> >> - if (ptr_ring_produce(&tfile->tx_ring, skb)) {
> >> + queue = netdev_get_tx_queue(dev, txq);
> >> + qdisc_present = !qdisc_txq_has_no_queue(queue);
> >> +
> >> + spin_lock(&tfile->tx_ring.producer_lock);
> >> + ret = __ptr_ring_produce(&tfile->tx_ring, skb);
> >> + if (__ptr_ring_produce_peek(&tfile->tx_ring) && qdisc_present) {
> >> + netif_tx_stop_queue(queue);
> >> + /* Avoid races with queue wake-up in
> >> + * __{tun,tap}_ring_consume by waking if space is
> >> + * available in a re-check.
> >> + * The barrier makes sure that the stop is visible before
> >> + * we re-check.
> >> + */
> >> + smp_mb__after_atomic();
> >> + if (!__ptr_ring_produce_peek(&tfile->tx_ring))
> >> + netif_tx_wake_queue(queue);
> >
> > I'm not sure I will get here, but I think those should be moved to the
> > following if(ret) check. If __ptr_ring_produce() succeed, there's no
> > need to bother with those queue stop/wake logic?
>
> There is a need for that. If __ptr_ring_produce_peek() returns -ENOSPC,
> we stop the queue proactively.
This seems to conflict with the following NETDEV_TX_BUSY. Or is
NETDEV_TX_BUSY prepared for the xdp_xmit?
>
> I believe what you are aiming for is to always stop the queue if(ret),
> which I can agree with. In that case, I would simply change the condition
> to:
>
> if (qdisc_present && (ret || __ptr_ring_produce_peek(&tfile->tx_ring)))
>
> >
> >> + }
> >> + spin_unlock(&tfile->tx_ring.producer_lock);
> >> +
> >> + if (ret) {
> >> + /* If a qdisc is attached to our virtual device,
> >> + * returning NETDEV_TX_BUSY is allowed.
> >> + */
> >> + if (qdisc_present) {
> >> + rcu_read_unlock();
> >> + return NETDEV_TX_BUSY;
> >> + }
> >> drop_reason = SKB_DROP_REASON_FULL_RING;
> >> goto drop;
> >> }
> >>
> >> /* dev->lltx requires to do our own update of trans_start */
> >> - queue = netdev_get_tx_queue(dev, txq);
> >> txq_trans_cond_update(queue);
> >>
> >> /* Notify and wake up reader process */
> >> --
> >> 2.43.0
> >>
> >
> > Thanks
> >
>
Thanks
On 1/9/26 07:09, Jason Wang wrote:
> On Thu, Jan 8, 2026 at 4:02 PM Simon Schippers
> <simon.schippers@tu-dortmund.de> wrote:
>>
>> On 1/8/26 05:37, Jason Wang wrote:
>>> On Thu, Jan 8, 2026 at 5:06 AM Simon Schippers
>>> <simon.schippers@tu-dortmund.de> wrote:
>>>>
>>>> This commit prevents tail-drop when a qdisc is present and the ptr_ring
>>>> becomes full. Once an entry is successfully produced and the ptr_ring
>>>> reaches capacity, the netdev queue is stopped instead of dropping
>>>> subsequent packets.
>>>>
>>>> If producing an entry fails anyways, the tun_net_xmit returns
>>>> NETDEV_TX_BUSY, again avoiding a drop. Such failures are expected because
>>>> LLTX is enabled and the transmit path operates without the usual locking.
>>>> As a result, concurrent calls to tun_net_xmit() are not prevented.
>>>>
>>>> The existing __{tun,tap}_ring_consume functions free space in the
>>>> ptr_ring and wake the netdev queue. Races between this wakeup and the
>>>> queue-stop logic could leave the queue stopped indefinitely. To prevent
>>>> this, a memory barrier is enforced (as discussed in a similar
>>>> implementation in [1]), followed by a recheck that wakes the queue if
>>>> space is already available.
>>>>
>>>> If no qdisc is present, the previous tail-drop behavior is preserved.
>>>>
>>>> +-------------------------+-----------+---------------+----------------+
>>>> | pktgen benchmarks to | Stock | Patched with | Patched with |
>>>> | Debian VM, i5 6300HQ, | | noqueue qdisc | fq_codel qdisc |
>>>> | 10M packets | | | |
>>>> +-----------+-------------+-----------+---------------+----------------+
>>>> | TAP | Transmitted | 196 Kpps | 195 Kpps | 185 Kpps |
>>>> | +-------------+-----------+---------------+----------------+
>>>> | | Lost | 1618 Kpps | 1556 Kpps | 0 |
>>>> +-----------+-------------+-----------+---------------+----------------+
>>>> | TAP | Transmitted | 577 Kpps | 582 Kpps | 578 Kpps |
>>>> | + +-------------+-----------+---------------+----------------+
>>>> | vhost-net | Lost | 1170 Kpps | 1109 Kpps | 0 |
>>>> +-----------+-------------+-----------+---------------+----------------+
>>>>
>>>> [1] Link: https://lore.kernel.org/all/20250424085358.75d817ae@kernel.org/
>>>>
>>>> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
>>>> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
>>>> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
>>>> ---
>>>> drivers/net/tun.c | 31 +++++++++++++++++++++++++++++--
>>>> 1 file changed, 29 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>> index 71b6981d07d7..74d7fd09e9ba 100644
>>>> --- a/drivers/net/tun.c
>>>> +++ b/drivers/net/tun.c
>>>> @@ -1008,6 +1008,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>>> struct netdev_queue *queue;
>>>> struct tun_file *tfile;
>>>> int len = skb->len;
>>>> + bool qdisc_present;
>>>> + int ret;
>>>>
>>>> rcu_read_lock();
>>>> tfile = rcu_dereference(tun->tfiles[txq]);
>>>> @@ -1060,13 +1062,38 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>
>>>> nf_reset_ct(skb);
>>>>
>>>> - if (ptr_ring_produce(&tfile->tx_ring, skb)) {
>>>> + queue = netdev_get_tx_queue(dev, txq);
>>>> + qdisc_present = !qdisc_txq_has_no_queue(queue);
>>>> +
>>>> + spin_lock(&tfile->tx_ring.producer_lock);
>>>> + ret = __ptr_ring_produce(&tfile->tx_ring, skb);
>>>> + if (__ptr_ring_produce_peek(&tfile->tx_ring) && qdisc_present) {
>>>> + netif_tx_stop_queue(queue);
>>>> + /* Avoid races with queue wake-up in
>>>> + * __{tun,tap}_ring_consume by waking if space is
>>>> + * available in a re-check.
>>>> + * The barrier makes sure that the stop is visible before
>>>> + * we re-check.
>>>> + */
>>>> + smp_mb__after_atomic();
>>>> + if (!__ptr_ring_produce_peek(&tfile->tx_ring))
>>>> + netif_tx_wake_queue(queue);
>>>
>>> I'm not sure I will get here, but I think those should be moved to the
>>> following if(ret) check. If __ptr_ring_produce() succeed, there's no
>>> need to bother with those queue stop/wake logic?
>>
>> There is a need for that. If __ptr_ring_produce_peek() returns -ENOSPC,
>> we stop the queue proactively.
>
> This seems to conflict with the following NETDEV_TX_BUSY. Or is
> NETDEV_TX_BUSY prepared for the xdp_xmit?
Am I not allowed to stop the queue and then return NETDEV_TX_BUSY?
And I do not understand the connection with xdp_xmit.
>
>>
>> I believe what you are aiming for is to always stop the queue if(ret),
>> which I can agree with. In that case, I would simply change the condition
>> to:
>>
>> if (qdisc_present && (ret || __ptr_ring_produce_peek(&tfile->tx_ring)))
>>
>>>
>>>> + }
>>>> + spin_unlock(&tfile->tx_ring.producer_lock);
>>>> +
>>>> + if (ret) {
>>>> + /* If a qdisc is attached to our virtual device,
>>>> + * returning NETDEV_TX_BUSY is allowed.
>>>> + */
>>>> + if (qdisc_present) {
>>>> + rcu_read_unlock();
>>>> + return NETDEV_TX_BUSY;
>>>> + }
>>>> drop_reason = SKB_DROP_REASON_FULL_RING;
>>>> goto drop;
>>>> }
>>>>
>>>> /* dev->lltx requires to do our own update of trans_start */
>>>> - queue = netdev_get_tx_queue(dev, txq);
>>>> txq_trans_cond_update(queue);
>>>>
>>>> /* Notify and wake up reader process */
>>>> --
>>>> 2.43.0
>>>>
>>>
>>> Thanks
>>>
>>
>
> Thanks
>
On Fri, Jan 09, 2026 at 11:14:54AM +0100, Simon Schippers wrote:
> Am I not allowed to stop the queue and then return NETDEV_TX_BUSY?

We jump through a lot of hoops in virtio_net to avoid using
NETDEV_TX_BUSY because that bypasses all the net/ cleverness.
Given your patches aim to improve precisely ring full,
I would say stopping proactively before NETDEV_TX_BUSY
should be a priority.

--
MST
On 1/12/26 05:33, Michael S. Tsirkin wrote:
> On Fri, Jan 09, 2026 at 11:14:54AM +0100, Simon Schippers wrote:
>> Am I not allowed to stop the queue and then return NETDEV_TX_BUSY?
>
> We jump through a lot of hoops in virtio_net to avoid using
> NETDEV_TX_BUSY because that bypasses all the net/ cleverness.
> Given your patches aim to improve precisely ring full,
> I would say stopping proactively before NETDEV_TX_BUSY
> should be a priority.
>

I already proactively stop here with the approach you proposed in
the v6.
Or am I missing something (apart from the xdp path)?

And yes I also dislike returning NETDEV_TX_BUSY but I do not see how
this can be prevented with lltx enabled.
On Mon, Jan 12, 2026 at 12:17:12PM +0100, Simon Schippers wrote:
> On 1/12/26 05:33, Michael S. Tsirkin wrote:
> > On Fri, Jan 09, 2026 at 11:14:54AM +0100, Simon Schippers wrote:
> >> Am I not allowed to stop the queue and then return NETDEV_TX_BUSY?
> >
> > We jump through a lot of hoops in virtio_net to avoid using
> > NETDEV_TX_BUSY because that bypasses all the net/ cleverness.
> > Given your patches aim to improve precisely ring full,
> > I would say stopping proactively before NETDEV_TX_BUSY
> > should be a priority.
> >
>
> I already proactively stop here with the approach you proposed in
> the v6.
> Or am I missing something (apart from the xdp path)?

Yes, I am just answering the general question you posed.

>
> And yes I also dislike returning NETDEV_TX_BUSY but I do not see how
> this can be prevented with lltx enabled.

Preventing NETDEV_TX_BUSY 100% of the time is not required. It's there
to handle races.

--
MST
On 1/12/26 12:19, Michael S. Tsirkin wrote:
> On Mon, Jan 12, 2026 at 12:17:12PM +0100, Simon Schippers wrote:
>> On 1/12/26 05:33, Michael S. Tsirkin wrote:
>>> On Fri, Jan 09, 2026 at 11:14:54AM +0100, Simon Schippers wrote:
>>>> Am I not allowed to stop the queue and then return NETDEV_TX_BUSY?
>>>
>>> We jump through a lot of hoops in virtio_net to avoid using
>>> NETDEV_TX_BUSY because that bypasses all the net/ cleverness.
>>> Given your patches aim to improve precisely ring full,
>>> I would say stopping proactively before NETDEV_TX_BUSY
>>> should be a priority.
>>>
>>
>> I already proactively stop here with the approach you proposed in
>> the v6.
>> Or am I missing something (apart from the xdp path)?
>
> Yes, I am just answering the general question you posed.

Ah okay.

>
>>
>> And yes I also dislike returning NETDEV_TX_BUSY but I do not see how
>> this can be prevented with lltx enabled.
>
> Preventing NETDEV_TX_BUSY 100% of the time is not required. It's there
> to handle races.

Great to know.

Thanks
On Fri, Jan 9, 2026 at 6:15 PM Simon Schippers
<simon.schippers@tu-dortmund.de> wrote:
>
> On 1/9/26 07:09, Jason Wang wrote:
> > On Thu, Jan 8, 2026 at 4:02 PM Simon Schippers
> > <simon.schippers@tu-dortmund.de> wrote:
> >>
> >> On 1/8/26 05:37, Jason Wang wrote:
> >>> On Thu, Jan 8, 2026 at 5:06 AM Simon Schippers
> >>> <simon.schippers@tu-dortmund.de> wrote:
> >>>>
> >>>> This commit prevents tail-drop when a qdisc is present and the ptr_ring
> >>>> becomes full. Once an entry is successfully produced and the ptr_ring
> >>>> reaches capacity, the netdev queue is stopped instead of dropping
> >>>> subsequent packets.
> >>>>
> >>>> If producing an entry fails anyways, the tun_net_xmit returns
> >>>> NETDEV_TX_BUSY, again avoiding a drop. Such failures are expected because
> >>>> LLTX is enabled and the transmit path operates without the usual locking.
> >>>> As a result, concurrent calls to tun_net_xmit() are not prevented.
> >>>>
> >>>> The existing __{tun,tap}_ring_consume functions free space in the
> >>>> ptr_ring and wake the netdev queue. Races between this wakeup and the
> >>>> queue-stop logic could leave the queue stopped indefinitely. To prevent
> >>>> this, a memory barrier is enforced (as discussed in a similar
> >>>> implementation in [1]), followed by a recheck that wakes the queue if
> >>>> space is already available.
> >>>>
> >>>> If no qdisc is present, the previous tail-drop behavior is preserved.
> >>>>
> >>>> +-------------------------+-----------+---------------+----------------+
> >>>> | pktgen benchmarks to | Stock | Patched with | Patched with |
> >>>> | Debian VM, i5 6300HQ, | | noqueue qdisc | fq_codel qdisc |
> >>>> | 10M packets | | | |
> >>>> +-----------+-------------+-----------+---------------+----------------+
> >>>> | TAP | Transmitted | 196 Kpps | 195 Kpps | 185 Kpps |
> >>>> | +-------------+-----------+---------------+----------------+
> >>>> | | Lost | 1618 Kpps | 1556 Kpps | 0 |
> >>>> +-----------+-------------+-----------+---------------+----------------+
> >>>> | TAP | Transmitted | 577 Kpps | 582 Kpps | 578 Kpps |
> >>>> | + +-------------+-----------+---------------+----------------+
> >>>> | vhost-net | Lost | 1170 Kpps | 1109 Kpps | 0 |
> >>>> +-----------+-------------+-----------+---------------+----------------+
> >>>>
> >>>> [1] Link: https://lore.kernel.org/all/20250424085358.75d817ae@kernel.org/
> >>>>
> >>>> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> >>>> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> >>>> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
> >>>> ---
> >>>> drivers/net/tun.c | 31 +++++++++++++++++++++++++++++--
> >>>> 1 file changed, 29 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >>>> index 71b6981d07d7..74d7fd09e9ba 100644
> >>>> --- a/drivers/net/tun.c
> >>>> +++ b/drivers/net/tun.c
> >>>> @@ -1008,6 +1008,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>> struct netdev_queue *queue;
> >>>> struct tun_file *tfile;
> >>>> int len = skb->len;
> >>>> + bool qdisc_present;
> >>>> + int ret;
> >>>>
> >>>> rcu_read_lock();
> >>>> tfile = rcu_dereference(tun->tfiles[txq]);
> >>>> @@ -1060,13 +1062,38 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>>
> >>>> nf_reset_ct(skb);
> >>>>
> >>>> - if (ptr_ring_produce(&tfile->tx_ring, skb)) {
> >>>> + queue = netdev_get_tx_queue(dev, txq);
> >>>> + qdisc_present = !qdisc_txq_has_no_queue(queue);
> >>>> +
> >>>> + spin_lock(&tfile->tx_ring.producer_lock);
> >>>> + ret = __ptr_ring_produce(&tfile->tx_ring, skb);
> >>>> + if (__ptr_ring_produce_peek(&tfile->tx_ring) && qdisc_present) {
> >>>> + netif_tx_stop_queue(queue);
> >>>> + /* Avoid races with queue wake-up in
> >>>> + * __{tun,tap}_ring_consume by waking if space is
> >>>> + * available in a re-check.
> >>>> + * The barrier makes sure that the stop is visible before
> >>>> + * we re-check.
> >>>> + */
> >>>> + smp_mb__after_atomic();
> >>>> + if (!__ptr_ring_produce_peek(&tfile->tx_ring))
> >>>> + netif_tx_wake_queue(queue);
> >>>
> >>> I'm not sure I will get here, but I think those should be moved to the
> >>> following if(ret) check. If __ptr_ring_produce() succeed, there's no
> >>> need to bother with those queue stop/wake logic?
> >>
> >> There is a need for that. If __ptr_ring_produce_peek() returns -ENOSPC,
> >> we stop the queue proactively.
> >
> > This seems to conflict with the following NETDEV_TX_BUSY. Or is
> > NETDEV_TX_BUSY prepared for the xdp_xmit?
>
> Am I not allowed to stop the queue and then return NETDEV_TX_BUSY?
No, I mean I don't understand why we still need to peek since we've
already used NETDEV_TX_BUSY.
> And I do not understand the connection with xdp_xmit.
Since we don't modify the xdp_xmit path, even if we peek here, the next
ndo_start_xmit can still hit a full ring.
Thanks
>
> >
> >>
> >> I believe what you are aiming for is to always stop the queue if(ret),
> >> which I can agree with. In that case, I would simply change the condition
> >> to:
> >>
> >> if (qdisc_present && (ret || __ptr_ring_produce_peek(&tfile->tx_ring)))
> >>
> >>>
> >>>> + }
> >>>> + spin_unlock(&tfile->tx_ring.producer_lock);
> >>>> +
> >>>> + if (ret) {
> >>>> + /* If a qdisc is attached to our virtual device,
> >>>> + * returning NETDEV_TX_BUSY is allowed.
> >>>> + */
> >>>> + if (qdisc_present) {
> >>>> + rcu_read_unlock();
> >>>> + return NETDEV_TX_BUSY;
> >>>> + }
> >>>> drop_reason = SKB_DROP_REASON_FULL_RING;
> >>>> goto drop;
> >>>> }
> >>>>
> >>>> /* dev->lltx requires to do our own update of trans_start */
> >>>> - queue = netdev_get_tx_queue(dev, txq);
> >>>> txq_trans_cond_update(queue);
> >>>>
> >>>> /* Notify and wake up reader process */
> >>>> --
> >>>> 2.43.0
> >>>>
> >>>
> >>> Thanks
> >>>
> >>
> >
> > Thanks
> >
>
On 1/12/26 03:22, Jason Wang wrote:
> On Fri, Jan 9, 2026 at 6:15 PM Simon Schippers
> <simon.schippers@tu-dortmund.de> wrote:
>>
>> On 1/9/26 07:09, Jason Wang wrote:
>>> On Thu, Jan 8, 2026 at 4:02 PM Simon Schippers
>>> <simon.schippers@tu-dortmund.de> wrote:
>>>>
>>>> On 1/8/26 05:37, Jason Wang wrote:
>>>>> On Thu, Jan 8, 2026 at 5:06 AM Simon Schippers
>>>>> <simon.schippers@tu-dortmund.de> wrote:
>>>>>>
>>>>>> This commit prevents tail-drop when a qdisc is present and the ptr_ring
>>>>>> becomes full. Once an entry is successfully produced and the ptr_ring
>>>>>> reaches capacity, the netdev queue is stopped instead of dropping
>>>>>> subsequent packets.
>>>>>>
>>>>>> If producing an entry fails anyways, the tun_net_xmit returns
>>>>>> NETDEV_TX_BUSY, again avoiding a drop. Such failures are expected because
>>>>>> LLTX is enabled and the transmit path operates without the usual locking.
>>>>>> As a result, concurrent calls to tun_net_xmit() are not prevented.
>>>>>>
>>>>>> The existing __{tun,tap}_ring_consume functions free space in the
>>>>>> ptr_ring and wake the netdev queue. Races between this wakeup and the
>>>>>> queue-stop logic could leave the queue stopped indefinitely. To prevent
>>>>>> this, a memory barrier is enforced (as discussed in a similar
>>>>>> implementation in [1]), followed by a recheck that wakes the queue if
>>>>>> space is already available.
>>>>>>
>>>>>> If no qdisc is present, the previous tail-drop behavior is preserved.
>>>>>>
>>>>>> +-------------------------+-----------+---------------+----------------+
>>>>>> | pktgen benchmarks to | Stock | Patched with | Patched with |
>>>>>> | Debian VM, i5 6300HQ, | | noqueue qdisc | fq_codel qdisc |
>>>>>> | 10M packets | | | |
>>>>>> +-----------+-------------+-----------+---------------+----------------+
>>>>>> | TAP | Transmitted | 196 Kpps | 195 Kpps | 185 Kpps |
>>>>>> | +-------------+-----------+---------------+----------------+
>>>>>> | | Lost | 1618 Kpps | 1556 Kpps | 0 |
>>>>>> +-----------+-------------+-----------+---------------+----------------+
>>>>>> | TAP | Transmitted | 577 Kpps | 582 Kpps | 578 Kpps |
>>>>>> | + +-------------+-----------+---------------+----------------+
>>>>>> | vhost-net | Lost | 1170 Kpps | 1109 Kpps | 0 |
>>>>>> +-----------+-------------+-----------+---------------+----------------+
>>>>>>
>>>>>> [1] Link: https://lore.kernel.org/all/20250424085358.75d817ae@kernel.org/
>>>>>>
>>>>>> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
>>>>>> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
>>>>>> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
>>>>>> ---
>>>>>> drivers/net/tun.c | 31 +++++++++++++++++++++++++++++--
>>>>>> 1 file changed, 29 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>>> index 71b6981d07d7..74d7fd09e9ba 100644
>>>>>> --- a/drivers/net/tun.c
>>>>>> +++ b/drivers/net/tun.c
>>>>>> @@ -1008,6 +1008,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>>> struct netdev_queue *queue;
>>>>>> struct tun_file *tfile;
>>>>>> int len = skb->len;
>>>>>> + bool qdisc_present;
>>>>>> + int ret;
>>>>>>
>>>>>> rcu_read_lock();
>>>>>> tfile = rcu_dereference(tun->tfiles[txq]);
>>>>>> @@ -1060,13 +1062,38 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>>>
>>>>>> nf_reset_ct(skb);
>>>>>>
>>>>>> - if (ptr_ring_produce(&tfile->tx_ring, skb)) {
>>>>>> + queue = netdev_get_tx_queue(dev, txq);
>>>>>> + qdisc_present = !qdisc_txq_has_no_queue(queue);
>>>>>> +
>>>>>> + spin_lock(&tfile->tx_ring.producer_lock);
>>>>>> + ret = __ptr_ring_produce(&tfile->tx_ring, skb);
>>>>>> + if (__ptr_ring_produce_peek(&tfile->tx_ring) && qdisc_present) {
>>>>>> + netif_tx_stop_queue(queue);
>>>>>> + /* Avoid races with queue wake-up in
>>>>>> + * __{tun,tap}_ring_consume by waking if space is
>>>>>> + * available in a re-check.
>>>>>> + * The barrier makes sure that the stop is visible before
>>>>>> + * we re-check.
>>>>>> + */
>>>>>> + smp_mb__after_atomic();
>>>>>> + if (!__ptr_ring_produce_peek(&tfile->tx_ring))
>>>>>> + netif_tx_wake_queue(queue);
>>>>>
>>>>> I'm not sure I will get here, but I think those should be moved to the
>>>>> following if(ret) check. If __ptr_ring_produce() succeed, there's no
>>>>> need to bother with those queue stop/wake logic?
>>>>
>>>> There is a need for that. If __ptr_ring_produce_peek() returns -ENOSPC,
>>>> we stop the queue proactively.
>>>
>>> This seems to conflict with the following NETDEV_TX_BUSY. Or is
>>> NETDEV_TX_BUSY prepared for the xdp_xmit?
>>
>> Am I not allowed to stop the queue and then return NETDEV_TX_BUSY?
>
> No, I mean I don't understand why we still need to peek since we've
> already used NETDEV_TX_BUSY.
Yes, if __ptr_ring_produce() returns -ENOSPC, there is no need to check
__ptr_ring_produce_peek(). I agree with you on this point and will update
the code accordingly. In all other cases, checking
__ptr_ring_produce_peek() is still required in order to proactively stop
the queue.
>
>> And I do not understand the connection with xdp_xmit.
>
> Since there's we don't modify xdp_xmit path, so even if we peek next
> ndo_start_xmit can still hit ring full.
Ah okay. Would you apply the same stop-and-recheck logic in
tun_xdp_xmit when __ptr_ring_produce() fails to produce it, or is that
not permitted there?
Apart from that, as noted in the commit message, since we are using LLTX,
hitting a full ring is still possible anyway. I could see that especially
at multiqueue tests with pktgen by looking at the qdisc requeues.
Thanks
>
> Thanks
>
>>
>>>
>>>>
>>>> I believe what you are aiming for is to always stop the queue if(ret),
>>>> which I can agree with. In that case, I would simply change the condition
>>>> to:
>>>>
>>>> if (qdisc_present && (ret || __ptr_ring_produce_peek(&tfile->tx_ring)))
>>>>
>>>>>
>>>>>> + }
>>>>>> + spin_unlock(&tfile->tx_ring.producer_lock);
>>>>>> +
>>>>>> + if (ret) {
>>>>>> + /* If a qdisc is attached to our virtual device,
>>>>>> + * returning NETDEV_TX_BUSY is allowed.
>>>>>> + */
>>>>>> + if (qdisc_present) {
>>>>>> + rcu_read_unlock();
>>>>>> + return NETDEV_TX_BUSY;
>>>>>> + }
>>>>>> drop_reason = SKB_DROP_REASON_FULL_RING;
>>>>>> goto drop;
>>>>>> }
>>>>>>
>>>>>> /* dev->lltx requires to do our own update of trans_start */
>>>>>> - queue = netdev_get_tx_queue(dev, txq);
>>>>>> txq_trans_cond_update(queue);
>>>>>>
>>>>>> /* Notify and wake up reader process */
>>>>>> --
>>>>>> 2.43.0
>>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>
>>> Thanks
>>>
>>
>
On Mon, Jan 12, 2026 at 7:08 PM Simon Schippers
<simon.schippers@tu-dortmund.de> wrote:
>
> On 1/12/26 03:22, Jason Wang wrote:
> > On Fri, Jan 9, 2026 at 6:15 PM Simon Schippers
> > <simon.schippers@tu-dortmund.de> wrote:
> >>
> >> On 1/9/26 07:09, Jason Wang wrote:
> >>> On Thu, Jan 8, 2026 at 4:02 PM Simon Schippers
> >>> <simon.schippers@tu-dortmund.de> wrote:
> >>>>
> >>>> On 1/8/26 05:37, Jason Wang wrote:
> >>>>> On Thu, Jan 8, 2026 at 5:06 AM Simon Schippers
> >>>>> <simon.schippers@tu-dortmund.de> wrote:
> >>>>>>
> >>>>>> This commit prevents tail-drop when a qdisc is present and the ptr_ring
> >>>>>> becomes full. Once an entry is successfully produced and the ptr_ring
> >>>>>> reaches capacity, the netdev queue is stopped instead of dropping
> >>>>>> subsequent packets.
> >>>>>>
> >>>>>> If producing an entry fails anyways, the tun_net_xmit returns
> >>>>>> NETDEV_TX_BUSY, again avoiding a drop. Such failures are expected because
> >>>>>> LLTX is enabled and the transmit path operates without the usual locking.
> >>>>>> As a result, concurrent calls to tun_net_xmit() are not prevented.
> >>>>>>
> >>>>>> The existing __{tun,tap}_ring_consume functions free space in the
> >>>>>> ptr_ring and wake the netdev queue. Races between this wakeup and the
> >>>>>> queue-stop logic could leave the queue stopped indefinitely. To prevent
> >>>>>> this, a memory barrier is enforced (as discussed in a similar
> >>>>>> implementation in [1]), followed by a recheck that wakes the queue if
> >>>>>> space is already available.
> >>>>>>
> >>>>>> If no qdisc is present, the previous tail-drop behavior is preserved.
> >>>>>>
> >>>>>> +-------------------------+-----------+---------------+----------------+
> >>>>>> | pktgen benchmarks to | Stock | Patched with | Patched with |
> >>>>>> | Debian VM, i5 6300HQ, | | noqueue qdisc | fq_codel qdisc |
> >>>>>> | 10M packets | | | |
> >>>>>> +-----------+-------------+-----------+---------------+----------------+
> >>>>>> | TAP | Transmitted | 196 Kpps | 195 Kpps | 185 Kpps |
> >>>>>> | +-------------+-----------+---------------+----------------+
> >>>>>> | | Lost | 1618 Kpps | 1556 Kpps | 0 |
> >>>>>> +-----------+-------------+-----------+---------------+----------------+
> >>>>>> | TAP | Transmitted | 577 Kpps | 582 Kpps | 578 Kpps |
> >>>>>> | + +-------------+-----------+---------------+----------------+
> >>>>>> | vhost-net | Lost | 1170 Kpps | 1109 Kpps | 0 |
> >>>>>> +-----------+-------------+-----------+---------------+----------------+
> >>>>>>
> >>>>>> [1] Link: https://lore.kernel.org/all/20250424085358.75d817ae@kernel.org/
> >>>>>>
> >>>>>> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> >>>>>> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> >>>>>> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
> >>>>>> ---
> >>>>>> drivers/net/tun.c | 31 +++++++++++++++++++++++++++++--
> >>>>>> 1 file changed, 29 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >>>>>> index 71b6981d07d7..74d7fd09e9ba 100644
> >>>>>> --- a/drivers/net/tun.c
> >>>>>> +++ b/drivers/net/tun.c
> >>>>>> @@ -1008,6 +1008,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>>>> struct netdev_queue *queue;
> >>>>>> struct tun_file *tfile;
> >>>>>> int len = skb->len;
> >>>>>> + bool qdisc_present;
> >>>>>> + int ret;
> >>>>>>
> >>>>>> rcu_read_lock();
> >>>>>> tfile = rcu_dereference(tun->tfiles[txq]);
> >>>>>> @@ -1060,13 +1062,38 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>>>>
> >>>>>> nf_reset_ct(skb);
> >>>>>>
> >>>>>> - if (ptr_ring_produce(&tfile->tx_ring, skb)) {
> >>>>>> + queue = netdev_get_tx_queue(dev, txq);
> >>>>>> + qdisc_present = !qdisc_txq_has_no_queue(queue);
> >>>>>> +
> >>>>>> + spin_lock(&tfile->tx_ring.producer_lock);
> >>>>>> + ret = __ptr_ring_produce(&tfile->tx_ring, skb);
> >>>>>> + if (__ptr_ring_produce_peek(&tfile->tx_ring) && qdisc_present) {
> >>>>>> + netif_tx_stop_queue(queue);
> >>>>>> + /* Avoid races with queue wake-up in
> >>>>>> + * __{tun,tap}_ring_consume by waking if space is
> >>>>>> + * available in a re-check.
> >>>>>> + * The barrier makes sure that the stop is visible before
> >>>>>> + * we re-check.
> >>>>>> + */
> >>>>>> + smp_mb__after_atomic();
> >>>>>> + if (!__ptr_ring_produce_peek(&tfile->tx_ring))
> >>>>>> + netif_tx_wake_queue(queue);
> >>>>>
> >>>>> I'm not sure I will get here, but I think those should be moved to the
> >>>>> following if(ret) check. If __ptr_ring_produce() succeed, there's no
> >>>>> need to bother with those queue stop/wake logic?
> >>>>
> >>>> There is a need for that. If __ptr_ring_produce_peek() returns -ENOSPC,
> >>>> we stop the queue proactively.
> >>>
> >>> This seems to conflict with the following NETDEV_TX_BUSY. Or is
> >>> NETDEV_TX_BUSY prepared for the xdp_xmit?
> >>
> >> Am I not allowed to stop the queue and then return NETDEV_TX_BUSY?
> >
> > No, I mean I don't understand why we still need to peek since we've
> > already used NETDEV_TX_BUSY.
>
> Yes, if __ptr_ring_produce() returns -ENOSPC, there is no need to check
> __ptr_ring_produce_peek(). I agree with you on this point and will update
> the code accordingly. In all other cases, checking
> __ptr_ring_produce_peek() is still required in order to proactively stop
> the queue.
>
> >
> >> And I do not understand the connection with xdp_xmit.
> >
> > Since there's we don't modify xdp_xmit path, so even if we peek next
> > ndo_start_xmit can still hit ring full.
>
> Ah okay. Would you apply the same stop-and-recheck logic in
> tun_xdp_xmit when __ptr_ring_produce() fails to produce it, or is that
> not permitted there?
I think it won't work as there's no qdisc logic implemented in the XDP
xmit path. NETDEV_TX_BUSY for tun_net_xmit() should be sufficient.
>
> Apart from that, as noted in the commit message, since we are using LLTX,
> hitting a full ring is still possible anyway. I could see that especially
> at multiqueue tests with pktgen by looking at the qdisc requeues.
>
> Thanks
Thanks
>
> >
> > Thanks
> >
> >>
> >>>
> >>>>
> >>>> I believe what you are aiming for is to always stop the queue if(ret),
> >>>> which I can agree with. In that case, I would simply change the condition
> >>>> to:
> >>>>
> >>>> if (qdisc_present && (ret || __ptr_ring_produce_peek(&tfile->tx_ring)))
> >>>>
> >>>>>
> >>>>>> + }
> >>>>>> + spin_unlock(&tfile->tx_ring.producer_lock);
> >>>>>> +
> >>>>>> + if (ret) {
> >>>>>> + /* If a qdisc is attached to our virtual device,
> >>>>>> + * returning NETDEV_TX_BUSY is allowed.
> >>>>>> + */
> >>>>>> + if (qdisc_present) {
> >>>>>> + rcu_read_unlock();
> >>>>>> + return NETDEV_TX_BUSY;
> >>>>>> + }
> >>>>>> drop_reason = SKB_DROP_REASON_FULL_RING;
> >>>>>> goto drop;
> >>>>>> }
> >>>>>>
> >>>>>> /* dev->lltx requires to do our own update of trans_start */
> >>>>>> - queue = netdev_get_tx_queue(dev, txq);
> >>>>>> txq_trans_cond_update(queue);
> >>>>>>
> >>>>>> /* Notify and wake up reader process */
> >>>>>> --
> >>>>>> 2.43.0
> >>>>>>
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>
> >>>
> >>> Thanks
> >>>
> >>
> >
>
On Mon, Jan 12, 2026 at 12:08:14PM +0100, Simon Schippers wrote:
> On 1/12/26 03:22, Jason Wang wrote:
> > On Fri, Jan 9, 2026 at 6:15 PM Simon Schippers
> > <simon.schippers@tu-dortmund.de> wrote:
> >>
> >> On 1/9/26 07:09, Jason Wang wrote:
> >>> On Thu, Jan 8, 2026 at 4:02 PM Simon Schippers
> >>> <simon.schippers@tu-dortmund.de> wrote:
> >>>>
> >>>> On 1/8/26 05:37, Jason Wang wrote:
> >>>>> On Thu, Jan 8, 2026 at 5:06 AM Simon Schippers
> >>>>> <simon.schippers@tu-dortmund.de> wrote:
> >>>>>>
> >>>>>> This commit prevents tail-drop when a qdisc is present and the ptr_ring
> >>>>>> becomes full. Once an entry is successfully produced and the ptr_ring
> >>>>>> reaches capacity, the netdev queue is stopped instead of dropping
> >>>>>> subsequent packets.
> >>>>>>
> >>>>>> If producing an entry fails anyways, the tun_net_xmit returns
> >>>>>> NETDEV_TX_BUSY, again avoiding a drop. Such failures are expected because
> >>>>>> LLTX is enabled and the transmit path operates without the usual locking.
> >>>>>> As a result, concurrent calls to tun_net_xmit() are not prevented.
> >>>>>>
> >>>>>> The existing __{tun,tap}_ring_consume functions free space in the
> >>>>>> ptr_ring and wake the netdev queue. Races between this wakeup and the
> >>>>>> queue-stop logic could leave the queue stopped indefinitely. To prevent
> >>>>>> this, a memory barrier is enforced (as discussed in a similar
> >>>>>> implementation in [1]), followed by a recheck that wakes the queue if
> >>>>>> space is already available.
> >>>>>>
> >>>>>> If no qdisc is present, the previous tail-drop behavior is preserved.
> >>>>>>
> >>>>>> +-------------------------+-----------+---------------+----------------+
> >>>>>> | pktgen benchmarks to | Stock | Patched with | Patched with |
> >>>>>> | Debian VM, i5 6300HQ, | | noqueue qdisc | fq_codel qdisc |
> >>>>>> | 10M packets | | | |
> >>>>>> +-----------+-------------+-----------+---------------+----------------+
> >>>>>> | TAP | Transmitted | 196 Kpps | 195 Kpps | 185 Kpps |
> >>>>>> | +-------------+-----------+---------------+----------------+
> >>>>>> | | Lost | 1618 Kpps | 1556 Kpps | 0 |
> >>>>>> +-----------+-------------+-----------+---------------+----------------+
> >>>>>> | TAP | Transmitted | 577 Kpps | 582 Kpps | 578 Kpps |
> >>>>>> | + +-------------+-----------+---------------+----------------+
> >>>>>> | vhost-net | Lost | 1170 Kpps | 1109 Kpps | 0 |
> >>>>>> +-----------+-------------+-----------+---------------+----------------+
> >>>>>>
> >>>>>> [1] Link: https://lore.kernel.org/all/20250424085358.75d817ae@kernel.org/
> >>>>>>
> >>>>>> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> >>>>>> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> >>>>>> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
> >>>>>> ---
> >>>>>> drivers/net/tun.c | 31 +++++++++++++++++++++++++++++--
> >>>>>> 1 file changed, 29 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >>>>>> index 71b6981d07d7..74d7fd09e9ba 100644
> >>>>>> --- a/drivers/net/tun.c
> >>>>>> +++ b/drivers/net/tun.c
> >>>>>> @@ -1008,6 +1008,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>>>> struct netdev_queue *queue;
> >>>>>> struct tun_file *tfile;
> >>>>>> int len = skb->len;
> >>>>>> + bool qdisc_present;
> >>>>>> + int ret;
> >>>>>>
> >>>>>> rcu_read_lock();
> >>>>>> tfile = rcu_dereference(tun->tfiles[txq]);
> >>>>>> @@ -1060,13 +1062,38 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>>>>
> >>>>>> nf_reset_ct(skb);
> >>>>>>
> >>>>>> - if (ptr_ring_produce(&tfile->tx_ring, skb)) {
> >>>>>> + queue = netdev_get_tx_queue(dev, txq);
> >>>>>> + qdisc_present = !qdisc_txq_has_no_queue(queue);
> >>>>>> +
> >>>>>> + spin_lock(&tfile->tx_ring.producer_lock);
> >>>>>> + ret = __ptr_ring_produce(&tfile->tx_ring, skb);
> >>>>>> + if (__ptr_ring_produce_peek(&tfile->tx_ring) && qdisc_present) {
> >>>>>> + netif_tx_stop_queue(queue);
> >>>>>> + /* Avoid races with queue wake-up in
> >>>>>> + * __{tun,tap}_ring_consume by waking if space is
> >>>>>> + * available in a re-check.
> >>>>>> + * The barrier makes sure that the stop is visible before
> >>>>>> + * we re-check.
> >>>>>> + */
> >>>>>> + smp_mb__after_atomic();
> >>>>>> + if (!__ptr_ring_produce_peek(&tfile->tx_ring))
> >>>>>> + netif_tx_wake_queue(queue);
> >>>>>
> >>>>> I'm not sure I will get here, but I think those should be moved to the
> >>>>> following if(ret) check. If __ptr_ring_produce() succeed, there's no
> >>>>> need to bother with those queue stop/wake logic?
> >>>>
> >>>> There is a need for that. If __ptr_ring_produce_peek() returns -ENOSPC,
> >>>> we stop the queue proactively.
> >>>
> >>> This seems to conflict with the following NETDEV_TX_BUSY. Or is
> >>> NETDEV_TX_BUSY prepared for the xdp_xmit?
> >>
> >> Am I not allowed to stop the queue and then return NETDEV_TX_BUSY?
> >
> > No, I mean I don't understand why we still need to peek since we've
> > already used NETDEV_TX_BUSY.
>
> Yes, if __ptr_ring_produce() returns -ENOSPC, there is no need to check
> __ptr_ring_produce_peek(). I agree with you on this point and will update
> the code accordingly. In all other cases, checking
> __ptr_ring_produce_peek() is still required in order to proactively stop
> the queue.
>
> >
> >> And I do not understand the connection with xdp_xmit.
> >
> > Since there's we don't modify xdp_xmit path, so even if we peek next
> > ndo_start_xmit can still hit ring full.
>
> Ah okay. Would you apply the same stop-and-recheck logic in
> tun_xdp_xmit when __ptr_ring_produce() fails to produce it, or is that
> not permitted there?
>
> Apart from that, as noted in the commit message, since we are using LLTX,
> hitting a full ring is still possible anyway. I could see that especially
> at multiqueue tests with pktgen by looking at the qdisc requeues.
>
> Thanks
If it's an exceptionally rare condition (i.e. a race), it's not a big deal. That's why
NETDEV_TX_BUSY is there in the 1st place. If it's the main modus
operandi - not good.
> >
> > Thanks
> >
> >>
> >>>
> >>>>
> >>>> I believe what you are aiming for is to always stop the queue if(ret),
> >>>> which I can agree with. In that case, I would simply change the condition
> >>>> to:
> >>>>
> >>>> if (qdisc_present && (ret || __ptr_ring_produce_peek(&tfile->tx_ring)))
> >>>>
> >>>>>
> >>>>>> + }
> >>>>>> + spin_unlock(&tfile->tx_ring.producer_lock);
> >>>>>> +
> >>>>>> + if (ret) {
> >>>>>> + /* If a qdisc is attached to our virtual device,
> >>>>>> + * returning NETDEV_TX_BUSY is allowed.
> >>>>>> + */
> >>>>>> + if (qdisc_present) {
> >>>>>> + rcu_read_unlock();
> >>>>>> + return NETDEV_TX_BUSY;
> >>>>>> + }
> >>>>>> drop_reason = SKB_DROP_REASON_FULL_RING;
> >>>>>> goto drop;
> >>>>>> }
> >>>>>>
> >>>>>> /* dev->lltx requires to do our own update of trans_start */
> >>>>>> - queue = netdev_get_tx_queue(dev, txq);
> >>>>>> txq_trans_cond_update(queue);
> >>>>>>
> >>>>>> /* Notify and wake up reader process */
> >>>>>> --
> >>>>>> 2.43.0
> >>>>>>
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>
> >>>
> >>> Thanks
> >>>
> >>
> >