From nobody Tue Apr 7 16:17:48 2026 Received: from unimail.uni-dortmund.de (mx1.hrz.uni-dortmund.de [129.217.128.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0E3B13C3436; Thu, 12 Mar 2026 13:07:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=129.217.128.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773320824; cv=none; b=OHtnZydBR/+TfQpGi+wF7aGC5TWCxPbGIiii8HxpoY/cRWCMlmCp/79ZDE3J1AEW/BaahD2jcEoslF7jiVrKN8DG1wrgDUH66u+mxZ9seHFQRX26UgIm/T7nMjxY9b2rM1DClYJySEieXT5oWwUESjiQK2yOAOiNZ/QfzkQTgCc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773320824; c=relaxed/simple; bh=ZL6w4Nml5BRk4h5VWdpCH4AYdVCznstv0Nwlmc4aX4s=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aHcii32dvyRv+GQ5eDXbuFBK8Oeu8vEaYb+61nw1CWrvJqdD26HJwJr5l1r8ha/AaAU26LYQz7zPdSy7P4lCdNluyP/P5YJTLxhsNnkfMgjf4z5cp5f9e7z2kPoYh2H3QOvKuSbp2NyLhMj5q72gcjhZ0RAmkI5iZQcgqMrFPTs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=tu-dortmund.de; spf=pass smtp.mailfrom=tu-dortmund.de; dkim=pass (1024-bit key) header.d=tu-dortmund.de header.i=@tu-dortmund.de header.b=JaiPru6m; arc=none smtp.client-ip=129.217.128.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=tu-dortmund.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tu-dortmund.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=tu-dortmund.de header.i=@tu-dortmund.de header.b="JaiPru6m" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tu-dortmund.de; s=unimail; t=1773320811; bh=ZL6w4Nml5BRk4h5VWdpCH4AYdVCznstv0Nwlmc4aX4s=; h=From:To:Subject:Date:In-Reply-To:References; b=JaiPru6mKPbKf7+xG36DOlgz8I7/zw/maYpDmPFTbCDa8j2XQbCwqL/dGUd/49TNU 
	J3BYEMCJZfYzOnSFDRdru0iRfHxWzW2oBZMPDlRANUkGnl+fLHab+r00CBF86FRYWa xamnvh83BvnDIj5Ye2Ts2vRHSHcVdxiyn0t4FrCo=
Received: from simon-Latitude-5450.fritz.box ([129.217.186.105]) (authenticated bits=0) by unimail.uni-dortmund.de (8.18.2/8.18.2) with ESMTPSA id 62CD6mqh014094 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Thu, 12 Mar 2026 14:06:50 +0100 (CET)
From: Simon Schippers
To: willemdebruijn.kernel@gmail.com, jasowang@redhat.com, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, mst@redhat.com, eperezma@redhat.com, leiyang@redhat.com, stephen@networkplumber.org, jon@nutanix.com, tim.gebauer@tu-dortmund.de, simon.schippers@tu-dortmund.de, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, virtualization@lists.linux.dev
Subject: [PATCH net-next v8 1/4] tun/tap: add ptr_ring consume helper with netdev queue wakeup
Date: Thu, 12 Mar 2026 14:06:36 +0100
Message-ID: <20260312130639.138988-2-simon.schippers@tu-dortmund.de>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260312130639.138988-1-simon.schippers@tu-dortmund.de>
References: <20260312130639.138988-1-simon.schippers@tu-dortmund.de>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Introduce tun_ring_consume() that wraps ptr_ring_consume() and calls
__tun_wake_queue(). The latter wakes the stopped netdev subqueue once
half of the ring capacity has been consumed, tracked via the new
cons_cnt field in tun_file. When the ring is empty the queue is also
woken to handle potential races.

Without the corresponding queue stopping (introduced in a subsequent
commit), this patch alone causes no regression for a tap setup sending
to a qemu VM: 1.151 Mpps to 1.153 Mpps.
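
The wake rule described above can be sketched as a small, single-threaded
userspace model. This is only an illustration under stated assumptions:
struct model_queue, MODEL_RING_SIZE and the model_*() helpers are
hypothetical names, not kernel APIs, and the stopped flag stands in for
the netdev subqueue state that the patch queries via
__netif_subqueue_stopped().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RING_SIZE 8	/* hypothetical capacity, chosen for the example */

/* Hypothetical stand-in for tun_file: a ring plus the consume counter. */
struct model_queue {
	void *slot[MODEL_RING_SIZE];	/* NULL marks a free slot, as in ptr_ring */
	int head;			/* next slot to consume */
	int tail;			/* next slot to fill */
	int cons_cnt;			/* consumes counted while stopped */
	bool stopped;			/* models the stopped netdev subqueue */
	int wakeups;			/* how often the model woke the queue */
};

static bool model_empty(const struct model_queue *q)
{
	return q->slot[q->head] == NULL;
}

/* Returns false when the ring is full. */
static bool model_produce(struct model_queue *q, void *ptr)
{
	assert(ptr != NULL);
	if (q->slot[q->tail])
		return false;
	q->slot[q->tail] = ptr;
	q->tail = (q->tail + 1) % MODEL_RING_SIZE;
	return true;
}

/* Same decision order as the patch: wake when the ring is empty, otherwise
 * only when the queue is stopped and half of the ring has been consumed.
 */
static void model_wake_queue(struct model_queue *q)
{
	if (!model_empty(q)) {
		if (!q->stopped || ++q->cons_cnt < MODEL_RING_SIZE / 2)
			return;
	}

	if (q->stopped) {
		q->stopped = false;
		q->wakeups++;
	}
	q->cons_cnt = 0;
}

/* Consume one entry and run the wake check, like tun_ring_consume(). */
static void *model_consume(struct model_queue *q)
{
	void *ptr = q->slot[q->head];

	if (!ptr)
		return NULL;
	q->slot[q->head] = NULL;
	q->head = (q->head + 1) % MODEL_RING_SIZE;
	model_wake_queue(q);
	return ptr;
}
```

With the eight-slot model filled and marked stopped, the first three
consumes leave it stopped and the fourth one wakes it, i.e. at half of
the ring size. A stopped queue holding a single entry is woken by the
consume that empties the ring.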

Details: AMD Ryzen 5 5600X at 4.3 GHz, 3200 MHz RAM, isolated QEMU
threads, pktgen sender; Avg over 20 runs @ 100,000,000 packets;
SRSO and spectre v2 mitigations disabled.

Co-developed-by: Tim Gebauer
Signed-off-by: Tim Gebauer
Signed-off-by: Simon Schippers
Acked-by: Michael S. Tsirkin
---
 drivers/net/tun.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index c492fda6fc15..a82d665dab5f 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -145,6 +145,7 @@ struct tun_file {
 	struct list_head next;
 	struct tun_struct *detached;
 	struct ptr_ring tx_ring;
+	int cons_cnt;
 	struct xdp_rxq_info xdp_rxq;
 };
 
@@ -564,6 +565,7 @@ static void tun_queue_purge(struct tun_file *tfile)
 	while ((ptr = ptr_ring_consume(&tfile->tx_ring)) != NULL)
 		tun_ptr_free(ptr);
 
+	tfile->cons_cnt = 0;
 	skb_queue_purge(&tfile->sk.sk_write_queue);
 	skb_queue_purge(&tfile->sk.sk_error_queue);
 }
@@ -730,6 +732,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
 		goto out;
 	}
 
+	tfile->cons_cnt = 0;
 	tfile->queue_index = tun->numqueues;
 	tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
 
@@ -2113,13 +2116,39 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	return total;
 }
 
-static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
+static void __tun_wake_queue(struct tun_struct *tun, struct tun_file *tfile)
+{
+	if (ptr_ring_empty(&tfile->tx_ring))
+		goto wake;
+
+	if (!__netif_subqueue_stopped(tun->dev, tfile->queue_index) ||
+	    ++tfile->cons_cnt < tfile->tx_ring.size / 2)
+		return;
+
+wake:
+	netif_wake_subqueue(tun->dev, tfile->queue_index);
+	tfile->cons_cnt = 0;
+}
+
+static void *tun_ring_consume(struct tun_struct *tun, struct tun_file *tfile)
+{
+	void *ptr;
+
+	ptr = ptr_ring_consume(&tfile->tx_ring);
+	if (ptr)
+		__tun_wake_queue(tun, tfile);
+
+	return ptr;
+}
+
+static void *tun_ring_recv(struct tun_struct *tun, struct tun_file *tfile,
+			   int noblock, int *err)
 {
 	DECLARE_WAITQUEUE(wait, current);
 	void *ptr = NULL;
 	int error = 0;
 
-	ptr = ptr_ring_consume(&tfile->tx_ring);
+	ptr = tun_ring_consume(tun, tfile);
 	if (ptr)
 		goto out;
 	if (noblock) {
@@ -2131,7 +2160,7 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
 
 	while (1) {
 		set_current_state(TASK_INTERRUPTIBLE);
-		ptr = ptr_ring_consume(&tfile->tx_ring);
+		ptr = tun_ring_consume(tun, tfile);
 		if (ptr)
 			break;
 		if (signal_pending(current)) {
@@ -2168,7 +2197,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 
 	if (!ptr) {
 		/* Read frames from ring */
-		ptr = tun_ring_recv(tfile, noblock, &err);
+		ptr = tun_ring_recv(tun, tfile, noblock, &err);
 		if (!ptr)
 			return err;
 	}
@@ -3404,6 +3433,8 @@ static int tun_chr_open(struct inode *inode, struct file * file)
 		return -ENOMEM;
 	}
 
+	tfile->cons_cnt = 0;
+
 	mutex_init(&tfile->napi_mutex);
 	RCU_INIT_POINTER(tfile->tun, NULL);
 	tfile->flags = 0;
@@ -3612,6 +3643,7 @@ static int tun_queue_resize(struct tun_struct *tun)
 	for (i = 0; i < tun->numqueues; i++) {
 		tfile = rtnl_dereference(tun->tfiles[i]);
 		rings[i] = &tfile->tx_ring;
+		tfile->cons_cnt = 0;
 	}
 	list_for_each_entry(tfile, &tun->disabled, next)
 		rings[i++] = &tfile->tx_ring;
-- 
2.43.0

From nobody Tue Apr 7 16:17:48 2026
Received: from unimail.uni-dortmund.de (mx1.hrz.uni-dortmund.de [129.217.128.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0E61C3C4576; Thu, 12 Mar 2026 13:07:01 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=129.217.128.51
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773320825; cv=none;
b=cjC7itUZLY7S6xHE5INzNFdDtdHLZVDGiBqYTW1VkRCHqzvCoAO2kdijTSHYdZsgx9yDW8PaQzND5CsMMgy8iTWNBXhrpxC6N/6Whk4KkmzCM8MJgpnxABoSQUKO8f3AmrDNuYRTIyEakBZ+Olxt/MDfoEH57KsNKyB6bpM5gXw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773320825; c=relaxed/simple; bh=4HehsY8l5R7T57W9CPx8ozXdCfAAZac4RFP3cRM7vt8=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pzt5BeNWem3zxUv8lCgfxbt7TKy4lAbIr2RPXITODj1AoEsc5VP//yHp4LC75tbbe0sHXhW9RC808sHUw2UMr3AYSJvNaxAA477utm7o9xQmjdB4Fgu8b44FxOMyKEdskRmySs/RvnPvNz6yRQbf1lrQBm+VBjzIcbO0JBrw/Go= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=tu-dortmund.de; spf=pass smtp.mailfrom=tu-dortmund.de; dkim=pass (1024-bit key) header.d=tu-dortmund.de header.i=@tu-dortmund.de header.b=cZDixCV2; arc=none smtp.client-ip=129.217.128.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=tu-dortmund.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tu-dortmund.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=tu-dortmund.de header.i=@tu-dortmund.de header.b="cZDixCV2" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tu-dortmund.de; s=unimail; t=1773320811; bh=4HehsY8l5R7T57W9CPx8ozXdCfAAZac4RFP3cRM7vt8=; h=From:To:Subject:Date:In-Reply-To:References; b=cZDixCV2TYkAFMMcNpHa2jbnbm8MATQ67xu5FTTVlt7dfxmQBNgMq2RWyohitk9Pj 73HV+SNUU2UGUi9LwVuQ4IbPR/3U3uPwYNz3GgtOzzBu7gafucjj/IeI6+puzzOXcH 92/mnTHNZmrNy/f7km9RMkaZWCJIcdJA8CxdpKBw= Received: from simon-Latitude-5450.fritz.box ([129.217.186.105]) (authenticated bits=0) by unimail.uni-dortmund.de (8.18.2/8.18.2) with ESMTPSA id 62CD6mqj014094 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Thu, 12 Mar 2026 14:06:51 +0100 (CET) From: Simon Schippers To: willemdebruijn.kernel@gmail.com, jasowang@redhat.com, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, 
	kuba@kernel.org, pabeni@redhat.com, mst@redhat.com, eperezma@redhat.com, leiyang@redhat.com, stephen@networkplumber.org, jon@nutanix.com, tim.gebauer@tu-dortmund.de, simon.schippers@tu-dortmund.de, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, virtualization@lists.linux.dev
Subject: [PATCH net-next v8 2/4] vhost-net: wake queue of tun/tap after ptr_ring consume
Date: Thu, 12 Mar 2026 14:06:37 +0100
Message-ID: <20260312130639.138988-3-simon.schippers@tu-dortmund.de>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260312130639.138988-1-simon.schippers@tu-dortmund.de>
References: <20260312130639.138988-1-simon.schippers@tu-dortmund.de>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add tun_wake_queue() to tun.c and export it for use by vhost-net. The
function validates that the file belongs to a tun/tap device,
dereferences the tun_struct under RCU, and delegates to
__tun_wake_queue().

vhost_net_buf_produce() now calls tun_wake_queue() after a successful
batched consume of the ring to allow the netdev subqueue to be woken up.

Without the corresponding queue stopping (introduced in a subsequent
commit), this patch alone causes a slight throughput regression for a
tap+vhost-net setup sending to a qemu VM:
3.948 Mpps to 3.888 Mpps (-1.5%).

Details: AMD Ryzen 5 5600X at 4.3 GHz, 3200 MHz RAM, isolated QEMU
threads, XDP drop program active in VM, pktgen sender; Avg over
20 runs @ 100,000,000 packets. SRSO and spectre v2 mitigations disabled.

Co-developed-by: Tim Gebauer
Signed-off-by: Tim Gebauer
Signed-off-by: Simon Schippers
Acked-by: Michael S.
Tsirkin
---
 drivers/net/tun.c      | 21 +++++++++++++++++++++
 drivers/vhost/net.c    | 15 +++++++++++----
 include/linux/if_tun.h |  3 +++
 3 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a82d665dab5f..b86582cc6cb6 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -3760,6 +3760,27 @@ struct ptr_ring *tun_get_tx_ring(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tun_get_tx_ring);
 
+void tun_wake_queue(struct file *file)
+{
+	struct tun_file *tfile;
+	struct tun_struct *tun;
+
+	if (file->f_op != &tun_fops)
+		return;
+	tfile = file->private_data;
+	if (!tfile)
+		return;
+
+	rcu_read_lock();
+
+	tun = rcu_dereference(tfile->tun);
+	if (tun)
+		__tun_wake_queue(tun, tfile);
+
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(tun_wake_queue);
+
 module_init(tun_init);
 module_exit(tun_cleanup);
 MODULE_DESCRIPTION(DRV_DESCRIPTION);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 80965181920c..c8ef804ef28c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -176,13 +176,19 @@ static void *vhost_net_buf_consume(struct vhost_net_buf *rxq)
 	return ret;
 }
 
-static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq)
+static int vhost_net_buf_produce(struct sock *sk,
+				 struct vhost_net_virtqueue *nvq)
 {
+	struct file *file = sk->sk_socket->file;
 	struct vhost_net_buf *rxq = &nvq->rxq;
 
 	rxq->head = 0;
 	rxq->tail = ptr_ring_consume_batched(nvq->rx_ring, rxq->queue,
 					      VHOST_NET_BATCH);
+
+	if (rxq->tail)
+		tun_wake_queue(file);
+
 	return rxq->tail;
 }
 
@@ -209,14 +215,15 @@ static int vhost_net_buf_peek_len(void *ptr)
 	return __skb_array_len_with_tag(ptr);
 }
 
-static int vhost_net_buf_peek(struct vhost_net_virtqueue *nvq)
+static int vhost_net_buf_peek(struct sock *sk,
+			      struct vhost_net_virtqueue *nvq)
 {
 	struct vhost_net_buf *rxq = &nvq->rxq;
 
 	if (!vhost_net_buf_is_empty(rxq))
 		goto out;
 
-	if (!vhost_net_buf_produce(nvq))
+	if (!vhost_net_buf_produce(sk, nvq))
 		return 0;
 
 out:
@@ -995,7 +1002,7 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
 	unsigned long flags;
 
 	if (rvq->rx_ring)
-		return vhost_net_buf_peek(rvq);
+		return vhost_net_buf_peek(sk, rvq);
 
 	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
 	head = skb_peek(&sk->sk_receive_queue);
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 80166eb62f41..ab3b4ebca059 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -22,6 +22,7 @@ struct tun_msg_ctl {
 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 struct socket *tun_get_socket(struct file *);
 struct ptr_ring *tun_get_tx_ring(struct file *file);
+void tun_wake_queue(struct file *file);
 
 static inline bool tun_is_xdp_frame(void *ptr)
 {
@@ -55,6 +56,8 @@ static inline struct ptr_ring *tun_get_tx_ring(struct file *f)
 	return ERR_PTR(-EINVAL);
 }
 
+static inline void tun_wake_queue(struct file *f) {}
+
 static inline bool tun_is_xdp_frame(void *ptr)
 {
 	return false;
-- 
2.43.0

From nobody Tue Apr 7 16:17:48 2026
Received: from unimail.uni-dortmund.de (mx1.hrz.uni-dortmund.de [129.217.128.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0E1B12DA759; Thu, 12 Mar 2026 13:07:01 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=129.217.128.51
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773320824; cv=none; b=Po+Ng28YjlYI/SI/PDieKJdBsd9m5MxVOix+i0ALDGQPLDtB0zKMzTr5jeVrZaXZ1EOZO8Q93Bi9Ghwo3czx5LjZp+GaF6rTl2Dkhwo9CMYsarWK4XFMNMIZN580PUo/olO/n9z9xKudJkKOITK5EcJNECGIDrMJJSCOyaQcM2Q=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773320824; c=relaxed/simple; bh=i6EXM/dz88Cyoq8LtFGDSx5xkGpNbJe1oE+QY2KW7uQ=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version;
b=kIdteaW01wp483ckPHMF65skwTqUzJIvXNHUmUr0T1iHqAAQ97HfyI4i/TdGzTkWBZdKYdj+zOioGu4RsO6u7+j42EB30TnVXoASAfWNypRbFASyz9WPDcLRMf+ynKsF3whOIkKIsYpoJCkKNWrl2lXGsb0Y65g7eEMedbCVlLQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=tu-dortmund.de; spf=pass smtp.mailfrom=tu-dortmund.de; dkim=pass (1024-bit key) header.d=tu-dortmund.de header.i=@tu-dortmund.de header.b=L7Fn6Usr; arc=none smtp.client-ip=129.217.128.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=tu-dortmund.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tu-dortmund.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=tu-dortmund.de header.i=@tu-dortmund.de header.b="L7Fn6Usr" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tu-dortmund.de; s=unimail; t=1773320812; bh=i6EXM/dz88Cyoq8LtFGDSx5xkGpNbJe1oE+QY2KW7uQ=; h=From:To:Subject:Date:In-Reply-To:References; b=L7Fn6Usr6CqpQSJg+sTfOg+35BhQICdFNJsUxwLeYojj09WsbqE+QES+PyTLa3XFu FxS04Bhek6fbU8EeHr7JE5aXHsrDYgmS7DMPN3pylLsP+mmqheDPHhH+upjzAsxHlu Hu27Gl+oOCiNAmlpvvsN7rxBct8aXBLITJeT4+sg= Received: from simon-Latitude-5450.fritz.box ([129.217.186.105]) (authenticated bits=0) by unimail.uni-dortmund.de (8.18.2/8.18.2) with ESMTPSA id 62CD6mql014094 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Thu, 12 Mar 2026 14:06:51 +0100 (CET) From: Simon Schippers To: willemdebruijn.kernel@gmail.com, jasowang@redhat.com, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, mst@redhat.com, eperezma@redhat.com, leiyang@redhat.com, stephen@networkplumber.org, jon@nutanix.com, tim.gebauer@tu-dortmund.de, simon.schippers@tu-dortmund.de, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, virtualization@lists.linux.dev Subject: [PATCH net-next v8 3/4] ptr_ring: move free-space check into separate helper Date: Thu, 12 Mar 2026 
	14:06:38 +0100
Message-ID: <20260312130639.138988-4-simon.schippers@tu-dortmund.de>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260312130639.138988-1-simon.schippers@tu-dortmund.de>
References: <20260312130639.138988-1-simon.schippers@tu-dortmund.de>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This patch moves the check for available free space for a new entry into
a separate function. As a result, __ptr_ring_produce() remains logically
unchanged, while the new helper allows callers to determine in advance
whether a subsequent __ptr_ring_produce() call will succeed. This
information can, for example, be used to temporarily stop producing until
__ptr_ring_produce_peek() indicates that space is available again.

Co-developed-by: Tim Gebauer
Signed-off-by: Tim Gebauer
Signed-off-by: Simon Schippers
Acked-by: Michael S. Tsirkin
---
 include/linux/ptr_ring.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 534531807d95..a5a3fa4916d3 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -96,6 +96,14 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
 	return ret;
 }
 
+static inline int __ptr_ring_produce_peek(struct ptr_ring *r)
+{
+	if (unlikely(!r->size) || r->queue[r->producer])
+		return -ENOSPC;
+
+	return 0;
+}
+
 /* Note: callers invoking this in a loop must use a compiler barrier,
  * for example cpu_relax(). Callers must hold producer_lock.
  * Callers are responsible for making sure pointer that is being queued
@@ -103,8 +111,10 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
  */
 static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
 {
-	if (unlikely(!r->size) || r->queue[r->producer])
-		return -ENOSPC;
+	int p = __ptr_ring_produce_peek(r);
+
+	if (p)
+		return p;
 
 	/* Make sure the pointer we are storing points to a valid data. */
 	/* Pairs with the dependency ordering in __ptr_ring_consume. */
-- 
2.43.0

From nobody Tue Apr 7 16:17:48 2026
Received: from unimail.uni-dortmund.de (mx1.hrz.uni-dortmund.de [129.217.128.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0E5223C4566; Thu, 12 Mar 2026 13:07:01 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=129.217.128.51
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773320825; cv=none; b=QfcGdirBN473ZwRvs0aOz837eqksG84vqQbtZOHST5pSPELm14HrzQRY1eqSqPjhOSGnZc7QKhtV0VGLEUxTEtIOff+b8AY2/FZR3qNlL4emysryjb0wGDL2uL3sPnwwFiqoR9Rwr5JH7a6c8UTkTfnnT3FBJNLBAq0vGTwC2+k=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773320825; c=relaxed/simple; bh=14g8Dn3IYRYwobx+qBN31EM/eWBQu9pcStubPs0cKHc=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KrItxwnIkSpvkt7Y8NPg17GHGfbSI1GpLa7oXzRlZQ6u2JtTf931haTk2iFH8bDVlivjv++48iQqveJjBB9CAwDOjPDYKg3rx+7OgbDTxbUaUk6aZTzlUOXnXMVNFKTlYJ8yPxjXj+XyPzDgoP1s4yJwSg9eOrTsEPzcdAR20II=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=tu-dortmund.de; spf=pass smtp.mailfrom=tu-dortmund.de; dkim=pass (1024-bit key) header.d=tu-dortmund.de header.i=@tu-dortmund.de header.b=DknabfDH; arc=none smtp.client-ip=129.217.128.51
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=tu-dortmund.de
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=tu-dortmund.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=tu-dortmund.de header.i=@tu-dortmund.de header.b="DknabfDH" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=tu-dortmund.de; s=unimail; t=1773320812; bh=14g8Dn3IYRYwobx+qBN31EM/eWBQu9pcStubPs0cKHc=; h=From:To:Subject:Date:In-Reply-To:References; b=DknabfDHUcW1dg3aQivqf4TQmz8Lm6eGW3e0/Qt/nDgixLG060gF/35aeJttA+sKX h2KpKNBEvPv3tOvGJmQLFFOcrjWdQwgh7+xV+7gMpr4UAaM06OZ1YH2m8MzPiXnWeM 6cJe4xBcyTpGFmyoCOUjKMPr33vv2ea5dJwsUF+g= Received: from simon-Latitude-5450.fritz.box ([129.217.186.105]) (authenticated bits=0) by unimail.uni-dortmund.de (8.18.2/8.18.2) with ESMTPSA id 62CD6mqn014094 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Thu, 12 Mar 2026 14:06:52 +0100 (CET) From: Simon Schippers To: willemdebruijn.kernel@gmail.com, jasowang@redhat.com, andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, mst@redhat.com, eperezma@redhat.com, leiyang@redhat.com, stephen@networkplumber.org, jon@nutanix.com, tim.gebauer@tu-dortmund.de, simon.schippers@tu-dortmund.de, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, virtualization@lists.linux.dev Subject: [PATCH net-next v8 4/4] tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present Date: Thu, 12 Mar 2026 14:06:39 +0100 Message-ID: <20260312130639.138988-5-simon.schippers@tu-dortmund.de> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260312130639.138988-1-simon.schippers@tu-dortmund.de> References: <20260312130639.138988-1-simon.schippers@tu-dortmund.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This commit prevents tail-drop when a qdisc is present and the ptr_ring becomes full. 
Once an entry is successfully produced and the ptr_ring reaches capacity,
the netdev queue is stopped instead of dropping subsequent packets.

If producing an entry fails anyway due to a race, tun_net_xmit() returns
NETDEV_TX_BUSY, again avoiding a drop. Such races are expected because
LLTX is enabled and the transmit path operates without the usual locking.

The existing __tun_wake_queue() function wakes the netdev queue. Races
between this wakeup and the queue-stop logic could leave the queue
stopped indefinitely. To prevent this, a memory barrier is enforced (as
discussed in a similar implementation in [1]), followed by a recheck
that wakes the queue if space is already available.

If no qdisc is present, the previous tail-drop behavior is preserved.

Benchmarks:
The benchmarks show a slight regression in raw transmission performance,
but packets are no longer lost.

The previously introduced threshold, which wakes the queue only after it
was stopped and half of the ring has been consumed, proved to be a decent
choice: waking the queue whenever a consume makes space in the ring
strongly degrades performance for tap, while waking only when the ring is
empty is too late and also hurts throughput for tap and tap+vhost-net.
Other ratios (3/4, 7/8) gave similar results (not shown here), so 1/2
was chosen for simplicity for both tun/tap and tun/tap+vhost-net.

Test setup:
AMD Ryzen 5 5600X at 4.3 GHz, 3200 MHz RAM, isolated QEMU threads;
Average over 20 runs @ 100,000,000 packets. SRSO and spectre v2
mitigations disabled.

Note for tap+vhost-net:
XDP drop program active in VM -> ~2.5x faster, slower for tap due to
more syscalls (high utilization of entry_SYSRETQ_unsafe_stack in perf)

+--------------------------+--------------+----------------+----------+
| 1 thread                 | Stock        | Patched with   | diff     |
| sending                  |              | fq_codel qdisc |          |
+------------+-------------+--------------+----------------+----------+
| TAP        | Transmitted | 1.151 Mpps   | 1.139 Mpps     | -1.1%    |
|            +-------------+--------------+----------------+----------+
|            | Lost/s      | 3.606 Mpps   | 0 pps          |          |
+------------+-------------+--------------+----------------+----------+
| TAP        | Transmitted | 3.948 Mpps   | 3.738 Mpps     | -5.3%    |
|            +-------------+--------------+----------------+----------+
| +vhost-net | Lost/s      | 496.5 Kpps   | 0 pps          |          |
+------------+-------------+--------------+----------------+----------+

+--------------------------+--------------+----------------+----------+
| 2 threads                | Stock        | Patched with   | diff     |
| sending                  |              | fq_codel qdisc |          |
+------------+-------------+--------------+----------------+----------+
| TAP        | Transmitted | 1.133 Mpps   | 1.109 Mpps     | -2.1%    |
|            +-------------+--------------+----------------+----------+
|            | Lost/s      | 8.269 Mpps   | 0 pps          |          |
+------------+-------------+--------------+----------------+----------+
| TAP        | Transmitted | 3.820 Mpps   | 3.513 Mpps     | -8.0%    |
|            +-------------+--------------+----------------+----------+
| +vhost-net | Lost/s      | 4.961 Mpps   | 0 pps          |          |
+------------+-------------+--------------+----------------+----------+

[1] Link: https://lore.kernel.org/all/20250424085358.75d817ae@kernel.org/

Co-developed-by: Tim Gebauer
Signed-off-by: Tim Gebauer
Signed-off-by: Simon Schippers
Acked-by: Michael S.
Tsirkin
---
 drivers/net/tun.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index b86582cc6cb6..9b7daec69acd 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1011,6 +1011,8 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct netdev_queue *queue;
 	struct tun_file *tfile;
 	int len = skb->len;
+	bool qdisc_present;
+	int ret;
 
 	rcu_read_lock();
 	tfile = rcu_dereference(tun->tfiles[txq]);
@@ -1063,13 +1065,37 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	nf_reset_ct(skb);
 
-	if (ptr_ring_produce(&tfile->tx_ring, skb)) {
+	queue = netdev_get_tx_queue(dev, txq);
+	qdisc_present = !qdisc_txq_has_no_queue(queue);
+
+	spin_lock(&tfile->tx_ring.producer_lock);
+	ret = __ptr_ring_produce(&tfile->tx_ring, skb);
+	if (__ptr_ring_produce_peek(&tfile->tx_ring) && qdisc_present) {
+		netif_tx_stop_queue(queue);
+		/* Avoid races with queue wake-ups in __tun_wake_queue by
+		 * waking if space is available in a re-check.
+		 * The barrier makes sure that the stop is visible before
+		 * we re-check.
+		 */
+		smp_mb__after_atomic();
+		if (!__ptr_ring_produce_peek(&tfile->tx_ring))
+			netif_tx_wake_queue(queue);
+	}
+	spin_unlock(&tfile->tx_ring.producer_lock);
+
+	if (ret) {
+		/* If a qdisc is attached to our virtual device,
+		 * returning NETDEV_TX_BUSY is allowed.
+		 */
+		if (qdisc_present) {
+			rcu_read_unlock();
+			return NETDEV_TX_BUSY;
+		}
 		drop_reason = SKB_DROP_REASON_FULL_RING;
 		goto drop;
 	}
 
 	/* dev->lltx requires to do our own update of trans_start */
-	queue = netdev_get_tx_queue(dev, txq);
 	txq_trans_cond_update(queue);
 
 	/* Notify and wake up reader process */
-- 
2.43.0
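
The stop-and-recheck ordering from the commit message of patch 4/4 can be
modelled in userspace with C11 atomics. This is a minimal sketch under
stated assumptions, not the driver code: struct model_txq, RING_SIZE,
ring_full(), model_xmit() and enum xmit_result are hypothetical names,
atomic_thread_fence(memory_order_seq_cst) stands in for
smp_mb__after_atomic(), and the producer lock is left out because the
walkthrough is single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4	/* hypothetical capacity for the walkthrough */

/* Hypothetical result codes standing in for OK, NETDEV_TX_BUSY and a drop. */
enum xmit_result { XMIT_OK, XMIT_BUSY, XMIT_DROP };

struct model_txq {
	_Atomic(void *) slot[RING_SIZE];	/* NULL marks a free slot */
	int producer;				/* next slot to fill */
	atomic_bool stopped;			/* models the stopped netdev queue */
};

static void model_txq_init(struct model_txq *q)
{
	for (int i = 0; i < RING_SIZE; i++)
		atomic_init(&q->slot[i], NULL);
	q->producer = 0;
	atomic_init(&q->stopped, false);
}

/* Same idea as __ptr_ring_produce_peek(): is the next producer slot taken? */
static bool ring_full(struct model_txq *q)
{
	return atomic_load(&q->slot[q->producer]) != NULL;
}

static enum xmit_result model_xmit(struct model_txq *q, void *pkt,
				   bool qdisc_present)
{
	bool produced = false;

	assert(pkt != NULL);

	if (!ring_full(q)) {
		atomic_store(&q->slot[q->producer], pkt);
		q->producer = (q->producer + 1) % RING_SIZE;
		produced = true;
	}

	if (ring_full(q) && qdisc_present) {
		atomic_store(&q->stopped, true);
		/* Make the stop visible before looking at the ring again. */
		atomic_thread_fence(memory_order_seq_cst);
		/* A consumer may have freed a slot in the meantime. */
		if (!ring_full(q))
			atomic_store(&q->stopped, false);
	}

	if (produced)
		return XMIT_OK;
	/* With a qdisc the packet can be requeued; without one it is dropped. */
	return qdisc_present ? XMIT_BUSY : XMIT_DROP;
}
```

In this model the fourth packet fills the ring and stops the queue while
still returning XMIT_OK. A fifth packet yields XMIT_BUSY when a qdisc is
present and XMIT_DROP otherwise, which matches the tail-drop behavior kept
for the no-qdisc case.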