[PATCH net-next v8 0/4] tun/tap & vhost-net: apply qdisc backpressure on full ptr_ring to reduce TX drops

Posted by Simon Schippers 3 weeks, 5 days ago
This patch series deals with tun/tap & vhost-net which drop incoming
SKBs whenever their internal ptr_ring buffer is full. Instead, with this 
patch series, the associated netdev queue is stopped - but only when a
qdisc is attached. If no qdisc is present the existing behavior is
preserved. This patch series touches tun/tap and vhost-net, as they
share common logic and must be updated together. Modifying only one of
them would break the other.

By applying proper backpressure, this change allows the connected qdisc to 
operate correctly, as reported in [1], and significantly improves
performance in real-world scenarios, as demonstrated in our paper [2]. For 
example, we observed a 36% TCP throughput improvement for an OpenVPN 
connection between Germany and the USA.
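
To illustrate the difference between tail-drop and backpressure, here is a
minimal user-space sketch. It is not code from the patches: every name in it
(model_ring, model_offer, model_drain, MODEL_RING_SIZE) is hypothetical, the
ring is a plain array rather than a ptr_ring, and the model stops the queue
at the moment the ring becomes full, which may differ from the exact point
chosen in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RING_SIZE 4 /* hypothetical capacity, for illustration only */

/* User-space model; none of these names exist in the kernel. */
struct model_ring {
	void *slots[MODEL_RING_SIZE];
	size_t head;            /* next slot to fill */
	size_t tail;            /* next slot to drain */
	size_t count;           /* entries currently queued */
	bool has_qdisc;         /* is a qdisc attached to the device? */
	bool queue_stopped;     /* stands in for the netdev TX queue state */
	unsigned long drops;    /* packets lost to tail-drop */
	unsigned long held;     /* packets the qdisc kept back */
};

enum model_result { MODEL_QUEUED, MODEL_DROPPED, MODEL_HELD };

/* Offer one packet to the device, as the stack would. */
static enum model_result model_offer(struct model_ring *r, void *pkt)
{
	assert(pkt != NULL);

	/* A stopped queue means the qdisc holds the packet instead of sending. */
	if (r->queue_stopped) {
		r->held++;
		return MODEL_HELD;
	}

	/* Ring full with nobody holding packets back: tail-drop. */
	if (r->count == MODEL_RING_SIZE) {
		r->drops++;
		return MODEL_DROPPED;
	}

	r->slots[r->head] = pkt;
	r->head = (r->head + 1) % MODEL_RING_SIZE;
	r->count++;

	/* Backpressure: only stop the queue when a qdisc can absorb it. */
	if (r->has_qdisc && r->count == MODEL_RING_SIZE)
		r->queue_stopped = true;

	return MODEL_QUEUED;
}

/* Drain one packet and, in this simplified model, wake the queue at once. */
static void *model_drain(struct model_ring *r)
{
	void *pkt;

	if (r->count == 0)
		return NULL;

	pkt = r->slots[r->tail];
	r->tail = (r->tail + 1) % MODEL_RING_SIZE;
	r->count--;
	r->queue_stopped = false;
	return pkt;
}
```

Offering ten packets to this four-slot model without draining ends with six
drops when has_qdisc is false, and with zero drops and six held packets when
has_qdisc is true.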

Synthetic pktgen benchmarks indicate a slight regression.
Pktgen benchmarks are provided per commit, with the final commit showing
the overall performance.

Thanks!

[1] Link: https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
[2] Link: https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
[3] Link: https://lore.kernel.org/r/174549940981.608169.4363875844729313831.stgit@firesoul
[4] Link: https://lore.kernel.org/r/176295323282.307447.14790015927673763094.stgit@firesoul

---
Changelog:
V8:
- Drop code changes in drivers/net/tap.c; the code there deals with
  ipvtap/macvtap, which are unrelated to the goal of this patch series
  (I did not realize that before)
-> Greatly simplified logic, 4 instead of 9 commits
-> No more duplicated logic, and no distinction required in vhost
- Only wake after the queue stopped and half of the ring was consumed
  as suggested by MST
-> Performance improvements for TAP, but still slightly slower
- Better benchmarking with pinned threads, XDP drop program for
  tap+vhost-net and disabling CPU mitigations (and newer Ryzen 5 5600X
  processor) as suggested by Jason Wang
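
The half-ring wake rule mentioned above can be sketched in user space as
follows. This is only an illustration under assumptions: the names
(wake_model, wake_model_consume, WAKE_RING_SIZE) are hypothetical, and "half
of the ring was consumed" is interpreted here as the occupancy having fallen
to at most half the capacity after the queue was stopped on a full ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WAKE_RING_SIZE 8 /* hypothetical capacity, for illustration only */

/* User-space model; none of these names exist in the kernel. */
struct wake_model {
	size_t count;           /* entries currently in the ring */
	bool queue_stopped;     /* stands in for the netdev TX queue state */
	unsigned long wakeups;  /* how often the queue was woken */
};

/* Consume one entry; returns true if an entry was actually consumed. */
static bool wake_model_consume(struct wake_model *m)
{
	assert(m->count <= WAKE_RING_SIZE);

	if (m->count == 0)
		return false;

	m->count--;

	/*
	 * Wake only if the queue is stopped AND at least half of the ring
	 * is free again, so the producer gets a batch of space per wakeup
	 * instead of one slot at a time.
	 */
	if (m->queue_stopped && m->count <= WAKE_RING_SIZE / 2) {
		m->queue_stopped = false;
		m->wakeups++;
	}
	return true;
}
```

Starting from a full, stopped eight-slot model, the first three consumes
leave the queue stopped and the fourth one wakes it; draining the rest
causes no further wakeups.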

V7: https://lore.kernel.org/netdev/20260107210448.37851-1-simon.schippers@tu-dortmund.de/
- Switch to an approach similar to veth [3] (excluding the recently fixed 
variant [4]), as suggested by MST, with minor adjustments discussed in V6
- Rename the cover-letter title
- Add multithreaded pktgen and iperf3 benchmarks, as suggested by Jason 
Wang
- Rework __ptr_ring_consume_created_space() so it can also be used after 
batched consume

V6: https://lore.kernel.org/netdev/20251120152914.1127975-1-simon.schippers@tu-dortmund.de/
General:
- Major adjustments to the descriptions. Special thanks to Jon Kohler!
- Fix git bisect by moving most logic into dedicated functions and only
  starting to use them in patch 7.
- Moved the main logic of the coupled producer and consumer into a single
  patch to avoid a chicken-and-egg dependency between commits :-)
- Rebased to 6.18-rc5 and reran the benchmarks, which now also include lost
  packets (previously I missed a 0, so all benchmark results were too high
  by a factor of 10...).
- Also include the benchmark in patch 7.

Producer:
- Move logic into the new helper tun_ring_produce()
- Added a smp_rmb() paired with the consumer, ensuring freed space of the 
consumer is visible
- Assume that ptr_ring is not full when __ptr_ring_full_next() is called

Consumer:
- Use an unpaired smp_rmb() instead of barrier() to ensure that the 
netdev_tx_queue_stopped() call completes before discarding
- Also wake the netdev queue if it was stopped before discarding and then 
becomes empty
-> Fixes race with producer as identified by MST in V5
-> Waking the netdev queues upon resize is not required anymore
- Use __ptr_ring_consume_created_space() instead of messing with ptr_ring 
internals
-> Batched consume now just calls 
__tun_ring_consume()/__tap_ring_consume() in a loop
- Added an smp_wmb() before waking the netdev queue which is paired with 
the smp_rmb() discussed above

V5: https://lore.kernel.org/netdev/20250922221553.47802-1-simon.schippers@tu-dortmund.de/T/#u
- Stop the netdev queue prior to producing the final fitting ptr_ring entry
-> Ensures the consumer has the latest netdev queue state, making it safe 
to wake the queue
-> Resolves an issue in vhost-net where the netdev queue could remain 
stopped despite being empty
-> For TUN/TAP, the netdev queue no longer needs to be woken in the 
blocking loop
-> Introduces new helpers __ptr_ring_full_next and 
__ptr_ring_will_invalidate for this purpose
- vhost-net now uses wrappers of TUN/TAP for ptr_ring consumption rather 
than maintaining its own rx_ring pointer
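
For readers following the history, the V5 idea of stopping the queue before
producing the final fitting entry can be sketched in user space as below.
This is an assumption-laden illustration, not the V5 code: all names
(next_ring, next_ring_full_next, next_ring_produce, next_ring_consume) are
hypothetical, the only thing borrowed from ptr_ring is the convention that a
NULL slot is free, and the approach was later replaced in V7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NEXT_RING_SIZE 4 /* hypothetical capacity; must be larger than 1 */

/* User-space model; none of these names exist in the kernel. */
struct next_ring {
	void *queue[NEXT_RING_SIZE]; /* NULL marks a free slot */
	size_t producer;             /* next slot to fill */
	size_t consumer;             /* next slot to drain */
	bool queue_stopped;          /* stands in for the netdev TX queue */
};

/* True if the slot after the producer slot is taken, so that producing
 * one more entry would leave the ring full. Assumes the ring is not
 * already full. */
static bool next_ring_full_next(const struct next_ring *r)
{
	return r->queue[(r->producer + 1) % NEXT_RING_SIZE] != NULL;
}

/* Returns 0 on success, -1 if the ring is already full. */
static int next_ring_produce(struct next_ring *r, void *ptr)
{
	assert(ptr != NULL);

	if (r->queue[r->producer])
		return -1;

	/* Stop first, then publish: a consumer that sees the last entry
	 * is then guaranteed to also see the stopped queue. */
	if (next_ring_full_next(r))
		r->queue_stopped = true;

	r->queue[r->producer] = ptr;
	r->producer = (r->producer + 1) % NEXT_RING_SIZE;
	return 0;
}

/* Drain one entry and wake the queue if it was stopped. */
static void *next_ring_consume(struct next_ring *r)
{
	void *ptr = r->queue[r->consumer];

	if (!ptr)
		return NULL;

	r->queue[r->consumer] = NULL;
	r->consumer = (r->consumer + 1) % NEXT_RING_SIZE;
	r->queue_stopped = false;
	return ptr;
}
```

This single-threaded model only shows the ordering; the memory barriers
that a real producer/consumer pair needs are deliberately left out.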

V4: https://lore.kernel.org/netdev/20250902080957.47265-1-simon.schippers@tu-dortmund.de/T/#u
- Target net-next instead of net
- Changed to patch series instead of single patch
- Changed to new title from old title
"TUN/TAP: Improving throughput and latency by avoiding SKB drops"
- Wake netdev queue with new helpers wake_netdev_queue when there is any 
spare capacity in the ptr_ring instead of waiting for it to be empty
- Use tun_file instead of tun_struct in tun_ring_recv for more consistent
  logic
- Use an smp_wmb() and smp_rmb() barrier pair, which avoids the rare packet
  drops seen before
- Use safer logic for vhost-net using RCU read locks to access TUN/TAP data

V3: https://lore.kernel.org/netdev/20250825211832.84901-1-simon.schippers@tu-dortmund.de/T/#u
- Added support for TAP and TAP+vhost-net.

V2: https://lore.kernel.org/netdev/20250811220430.14063-1-simon.schippers@tu-dortmund.de/T/#u
- Removed NETDEV_TX_BUSY return case in tun_net_xmit and removed 
unnecessary netif_tx_wake_queue in tun_ring_recv.

V1: https://lore.kernel.org/netdev/20250808153721.261334-1-simon.schippers@tu-dortmund.de/T/#u
---

Simon Schippers (4):
  tun/tap: add ptr_ring consume helper with netdev queue wakeup
  vhost-net: wake queue of tun/tap after ptr_ring consume
  ptr_ring: move free-space check into separate helper
  tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present

 drivers/net/tun.c        | 91 +++++++++++++++++++++++++++++++++++++---
 drivers/vhost/net.c      | 15 +++++--
 include/linux/if_tun.h   |  3 ++
 include/linux/ptr_ring.h | 14 ++++++-
 4 files changed, 111 insertions(+), 12 deletions(-)

-- 
2.43.0
Re: [PATCH net-next v8 0/4] tun/tap & vhost-net: apply qdisc backpressure on full ptr_ring to reduce TX drops
Posted by Simon Schippers 2 weeks ago
On 3/12/26 14:06, Simon Schippers wrote:
> This patch series deals with tun/tap & vhost-net which drop incoming
> SKBs whenever their internal ptr_ring buffer is full. Instead, with this 
> patch series, the associated netdev queue is stopped - but only when a
> qdisc is attached. If no qdisc is present the existing behavior is
> preserved. This patch series touches tun/tap and vhost-net, as they
> share common logic and must be updated together. Modifying only one of
> them would break the other.

Hi,

@Jason what do you think about the patchset?

Thanks!
Re: [PATCH net-next v8 0/4] tun/tap & vhost-net: apply qdisc backpressure on full ptr_ring to reduce TX drops
Posted by Michael S. Tsirkin 3 weeks, 5 days ago
On Thu, Mar 12, 2026 at 02:06:35PM +0100, Simon Schippers wrote:
> This patch series deals with tun/tap & vhost-net which drop incoming
> SKBs whenever their internal ptr_ring buffer is full. Instead, with this 
> patch series, the associated netdev queue is stopped - but only when a
> qdisc is attached. If no qdisc is present the existing behavior is
> preserved. This patch series touches tun/tap and vhost-net, as they
> share common logic and must be updated together. Modifying only one of
> them would break the other.
> 
> By applying proper backpressure, this change allows the connected qdisc to 
> operate correctly, as reported in [1], and significantly improves
> performance in real-world scenarios, as demonstrated in our paper [2]. For 
> example, we observed a 36% TCP throughput improvement for an OpenVPN 
> connection between Germany and the USA.
> 
> Synthetic pktgen benchmarks indicate a slight regression.
> Pktgen benchmarks are provided per commit, with the final commit showing
> the overall performance.
> 
> Thanks!

I posted a minor nit on patch 2.

Otherwise LGTM:

Acked-by: Michael S. Tsirkin <mst@redhat.com>

thanks for the work!


Re: [PATCH net-next v8 0/4] tun/tap & vhost-net: apply qdisc backpressure on full ptr_ring to reduce TX drops
Posted by Simon Schippers 3 weeks, 4 days ago
On 3/12/26 14:55, Michael S. Tsirkin wrote:
> On Thu, Mar 12, 2026 at 02:06:35PM +0100, Simon Schippers wrote:
>> This patch series deals with tun/tap & vhost-net which drop incoming
>> SKBs whenever their internal ptr_ring buffer is full. Instead, with this 
>> patch series, the associated netdev queue is stopped - but only when a
>> qdisc is attached. If no qdisc is present the existing behavior is
>> preserved. This patch series touches tun/tap and vhost-net, as they
>> share common logic and must be updated together. Modifying only one of
>> them would break the other.
>>
>> By applying proper backpressure, this change allows the connected qdisc to 
>> operate correctly, as reported in [1], and significantly improves
>> performance in real-world scenarios, as demonstrated in our paper [2]. For 
>> example, we observed a 36% TCP throughput improvement for an OpenVPN 
>> connection between Germany and the USA.
>>
>> Synthetic pktgen benchmarks indicate a slight regression.
>> Pktgen benchmarks are provided per commit, with the final commit showing
>> the overall performance.
>>
>> Thanks!
> 
> I posted a minor nit on patch 2.
> 
> Otherwise LGTM:
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> thanks for the work!

Thanks!

Should I do a new version for the minor nit?

And how about the ptr_ring race:
I see there is a separate discussion for that now. [1]
Should I wait for that?

Before sending a potential new version I would of course wait for
Jason's take.

[1] Link: https://lore.kernel.org/netdev/CANn89iJLC9p+N=Rqtuj7ZuPRdpSGCATCtZdz1Vi9mbzf3ATekQ@mail.gmail.com/

Re: [PATCH net-next v8 0/4] tun/tap & vhost-net: apply qdisc backpressure on full ptr_ring to reduce TX drops
Posted by Michael S. Tsirkin 3 weeks, 4 days ago
On Fri, Mar 13, 2026 at 10:49:53AM +0100, Simon Schippers wrote:
> On 3/12/26 14:55, Michael S. Tsirkin wrote:
> > On Thu, Mar 12, 2026 at 02:06:35PM +0100, Simon Schippers wrote:
> >> This patch series deals with tun/tap & vhost-net which drop incoming
> >> SKBs whenever their internal ptr_ring buffer is full. Instead, with this 
> >> patch series, the associated netdev queue is stopped - but only when a
> >> qdisc is attached. If no qdisc is present the existing behavior is
> >> preserved. This patch series touches tun/tap and vhost-net, as they
> >> share common logic and must be updated together. Modifying only one of
> >> them would break the other.
> >>
> >> By applying proper backpressure, this change allows the connected qdisc to 
> >> operate correctly, as reported in [1], and significantly improves
> >> performance in real-world scenarios, as demonstrated in our paper [2]. For 
> >> example, we observed a 36% TCP throughput improvement for an OpenVPN 
> >> connection between Germany and the USA.
> >>
> >> Synthetic pktgen benchmarks indicate a slight regression.
> >> Pktgen benchmarks are provided per commit, with the final commit showing
> >> the overall performance.
> >>
> >> Thanks!
> > 
> > I posted a minor nit on patch 2.
> > 
> > Otherwise LGTM:
> > 
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > thanks for the work!
> 
> Thanks!
> 
> Should I do a new version for the minor nit?

It's easier if you do, yes.

> And how about the ptr_ring race:
> I see there is a separate discussion for that now. [1]
> Should I wait for that?

If there's a conflict it will be trivial to resolve, so I'd say no.

> Before sending a potential new version I would of course wait for
> Jason's take.
> 
> [1] Link: https://lore.kernel.org/netdev/CANn89iJLC9p+N=Rqtuj7ZuPRdpSGCATCtZdz1Vi9mbzf3ATekQ@mail.gmail.com/
> 