From: Jason Wang
To: mst@redhat.com, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Jason Wang, stable@vger.kernel.org
Subject: [PATCH net] vhost: rewind next_avail_head while discarding descriptors
Date: Thu, 13 Nov 2025 09:54:20 +0800
Message-ID: <20251113015420.3496-1-jasowang@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

When discarding descriptors with IN_ORDER, we should rewind
next_avail_head; otherwise it would run out of sync with
last_avail_idx. This would cause the driver to report
"id X is not a head".

Fix this by returning the number of descriptors that are used for each
buffer via vhost_get_vq_desc_n() so the caller can use the value while
discarding descriptors.

Fixes: 67a873df0c41 ("vhost: basic in order support")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Wang
---
 drivers/vhost/net.c   | 53 ++++++++++++++++++++++++++-----------------
 drivers/vhost/vhost.c | 43 ++++++++++++++++++++++++-----------
 drivers/vhost/vhost.h |  9 +++++++-
 3 files changed, 70 insertions(+), 35 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 35ded4330431..8f7f50acb6d6 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -592,14 +592,15 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 				    struct vhost_net_virtqueue *tnvq,
 				    unsigned int *out_num, unsigned int *in_num,
-				    struct msghdr *msghdr, bool *busyloop_intr)
+				    struct msghdr *msghdr, bool *busyloop_intr,
+				    unsigned int *ndesc)
 {
 	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
 	struct vhost_virtqueue *rvq = &rnvq->vq;
 	struct vhost_virtqueue *tvq = &tnvq->vq;
 
-	int r = vhost_get_vq_desc(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
-				  out_num, in_num, NULL, NULL);
+	int r = vhost_get_vq_desc_n(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
+				    out_num, in_num, NULL, NULL, ndesc);
 
 	if (r == tvq->num && tvq->busyloop_timeout) {
 		/* Flush batched packets first */
@@ -610,8 +611,8 @@ static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
 
 		vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, false);
 
-		r = vhost_get_vq_desc(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
-				      out_num, in_num, NULL, NULL);
+		r = vhost_get_vq_desc_n(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
+					out_num, in_num, NULL, NULL, ndesc);
 	}
 
 	return r;
@@ -642,12 +643,14 @@ static int get_tx_bufs(struct vhost_net *net,
 		       struct vhost_net_virtqueue *nvq,
 		       struct msghdr *msg,
 		       unsigned int *out, unsigned int *in,
-		       size_t *len, bool *busyloop_intr)
+		       size_t *len, bool *busyloop_intr,
+		       unsigned int *ndesc)
 {
 	struct vhost_virtqueue *vq = &nvq->vq;
 	int ret;
 
-	ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, msg, busyloop_intr);
+	ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, msg,
+				       busyloop_intr, ndesc);
 
 	if (ret < 0 || ret == vq->num)
 		return ret;
@@ -766,6 +769,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 	int sent_pkts = 0;
 	bool sock_can_batch = (sock->sk->sk_sndbuf == INT_MAX);
 	bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
+	unsigned int ndesc = 0;
 
 	do {
 		bool busyloop_intr = false;
@@ -774,7 +778,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 			vhost_tx_batch(net, nvq, sock, &msg);
 
 		head = get_tx_bufs(net, nvq, &msg, &out, &in, &len,
-				   &busyloop_intr);
+				   &busyloop_intr, &ndesc);
 		/* On error, stop handling until the next kick. */
 		if (unlikely(head < 0))
 			break;
@@ -806,7 +810,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 				goto done;
 			} else if (unlikely(err != -ENOSPC)) {
 				vhost_tx_batch(net, nvq, sock, &msg);
-				vhost_discard_vq_desc(vq, 1);
+				vhost_discard_vq_desc(vq, 1, ndesc);
 				vhost_net_enable_vq(net, vq);
 				break;
 			}
@@ -829,7 +833,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
 			if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) {
-				vhost_discard_vq_desc(vq, 1);
+				vhost_discard_vq_desc(vq, 1, ndesc);
 				vhost_net_enable_vq(net, vq);
 				break;
 			}
@@ -868,6 +872,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 	int err;
 	struct vhost_net_ubuf_ref *ubufs;
 	struct ubuf_info_msgzc *ubuf;
+	unsigned int ndesc = 0;
 	bool zcopy_used;
 	int sent_pkts = 0;
 
@@ -879,7 +884,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 
 		busyloop_intr = false;
 		head = get_tx_bufs(net, nvq, &msg, &out, &in, &len,
-				   &busyloop_intr);
+				   &busyloop_intr, &ndesc);
 		/* On error, stop handling until the next kick. */
 		if (unlikely(head < 0))
 			break;
@@ -941,7 +946,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 					vq->heads[ubuf->desc].len = VHOST_DMA_DONE_LEN;
 			}
 			if (retry) {
-				vhost_discard_vq_desc(vq, 1);
+				vhost_discard_vq_desc(vq, 1, ndesc);
 				vhost_net_enable_vq(net, vq);
 				break;
 			}
@@ -1045,11 +1050,12 @@ static int get_rx_bufs(struct vhost_net_virtqueue *nvq,
 		       unsigned *iovcount,
 		       struct vhost_log *log,
 		       unsigned *log_num,
-		       unsigned int quota)
+		       unsigned int quota,
+		       unsigned int *ndesc)
 {
 	struct vhost_virtqueue *vq = &nvq->vq;
 	bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
-	unsigned int out, in;
+	unsigned int out, in, desc_num, n = 0;
 	int seg = 0;
 	int headcount = 0;
 	unsigned d;
@@ -1064,9 +1070,9 @@ static int get_rx_bufs(struct vhost_net_virtqueue *nvq,
 			r = -ENOBUFS;
 			goto err;
 		}
-		r = vhost_get_vq_desc(vq, vq->iov + seg,
-				      ARRAY_SIZE(vq->iov) - seg, &out,
-				      &in, log, log_num);
+		r = vhost_get_vq_desc_n(vq, vq->iov + seg,
+					ARRAY_SIZE(vq->iov) - seg, &out,
+					&in, log, log_num, &desc_num);
 		if (unlikely(r < 0))
 			goto err;
 
@@ -1093,6 +1099,7 @@ static int get_rx_bufs(struct vhost_net_virtqueue *nvq,
 		++headcount;
 		datalen -= len;
 		seg += in;
+		n += desc_num;
 	}
 
 	*iovcount = seg;
@@ -1113,9 +1120,11 @@ static int get_rx_bufs(struct vhost_net_virtqueue *nvq,
 		nheads[0] = headcount;
 	}
 
+	*ndesc = n;
+
 	return headcount;
 err:
-	vhost_discard_vq_desc(vq, headcount);
+	vhost_discard_vq_desc(vq, headcount, n);
 	return r;
 }
 
@@ -1151,6 +1160,7 @@ static void handle_rx(struct vhost_net *net)
 	struct iov_iter fixup;
 	__virtio16 num_buffers;
 	int recv_pkts = 0;
+	unsigned int ndesc;
 
 	mutex_lock_nested(&vq->mutex, VHOST_NET_VQ_RX);
 	sock = vhost_vq_get_backend(vq);
@@ -1182,7 +1192,8 @@ static void handle_rx(struct vhost_net *net)
 		headcount = get_rx_bufs(nvq, vq->heads + count,
 					vq->nheads + count,
 					vhost_len, &in, vq_log, &log,
-					likely(mergeable) ? UIO_MAXIOV : 1);
+					likely(mergeable) ? UIO_MAXIOV : 1,
+					&ndesc);
 		/* On error, stop handling until the next kick. */
 		if (unlikely(headcount < 0))
 			goto out;
@@ -1228,7 +1239,7 @@ static void handle_rx(struct vhost_net *net)
 		if (unlikely(err != sock_len)) {
 			pr_debug("Discarded rx packet: "
 				 " len %d, expected %zd\n", err, sock_len);
-			vhost_discard_vq_desc(vq, headcount);
+			vhost_discard_vq_desc(vq, headcount, ndesc);
 			continue;
 		}
 		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
@@ -1252,7 +1263,7 @@ static void handle_rx(struct vhost_net *net)
 		    copy_to_iter(&num_buffers, sizeof num_buffers,
 				 &fixup) != sizeof num_buffers) {
 			vq_err(vq, "Failed num_buffers write");
-			vhost_discard_vq_desc(vq, headcount);
+			vhost_discard_vq_desc(vq, headcount, ndesc);
 			goto out;
 		}
 		nvq->done_idx += headcount;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8570fdf2e14a..b56568807588 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2792,18 +2792,11 @@ static int get_indirect(struct vhost_virtqueue *vq,
 	return 0;
 }
 
-/* This looks in the virtqueue and for the first available buffer, and converts
- * it to an iovec for convenient access. Since descriptors consist of some
- * number of output then some number of input descriptors, it's actually two
- * iovecs, but we pack them into one and note how many of each there were.
- *
- * This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found. A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_virtqueue *vq,
-		      struct iovec iov[], unsigned int iov_size,
-		      unsigned int *out_num, unsigned int *in_num,
-		      struct vhost_log *log, unsigned int *log_num)
+int vhost_get_vq_desc_n(struct vhost_virtqueue *vq,
+			struct iovec iov[], unsigned int iov_size,
+			unsigned int *out_num, unsigned int *in_num,
+			struct vhost_log *log, unsigned int *log_num,
+			unsigned int *ndesc)
 {
 	bool in_order = vhost_has_feature(vq, VIRTIO_F_IN_ORDER);
 	struct vring_desc desc;
@@ -2921,16 +2914,40 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 	vq->last_avail_idx++;
 	vq->next_avail_head += c;
 
+	if (ndesc)
+		*ndesc = c;
+
 	/* Assume notifications from guest are disabled at this point,
 	 * if they aren't we would need to update avail_event index. */
 	BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
 	return head;
 }
+EXPORT_SYMBOL_GPL(vhost_get_vq_desc_n);
+
+/* This looks in the virtqueue and for the first available buffer, and converts
+ * it to an iovec for convenient access. Since descriptors consist of some
+ * number of output then some number of input descriptors, it's actually two
+ * iovecs, but we pack them into one and note how many of each there were.
+ *
+ * This function returns the descriptor number found, or vq->num (which is
+ * never a valid descriptor number) if none was found. A negative code is
+ * returned on error.
+ */
+int vhost_get_vq_desc(struct vhost_virtqueue *vq,
+		      struct iovec iov[], unsigned int iov_size,
+		      unsigned int *out_num, unsigned int *in_num,
+		      struct vhost_log *log, unsigned int *log_num)
+{
+	return vhost_get_vq_desc_n(vq, iov, iov_size, out_num, in_num,
+				   log, log_num, NULL);
+}
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
 /* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
-void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
+void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n,
+			   unsigned int ndesc)
 {
+	vq->next_avail_head -= ndesc;
 	vq->last_avail_idx -= n;
 }
 EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 621a6d9a8791..69a39540df3d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -230,7 +230,14 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
 		      struct iovec iov[], unsigned int iov_size,
 		      unsigned int *out_num, unsigned int *in_num,
 		      struct vhost_log *log, unsigned int *log_num);
-void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
+
+int vhost_get_vq_desc_n(struct vhost_virtqueue *vq,
+			struct iovec iov[], unsigned int iov_size,
+			unsigned int *out_num, unsigned int *in_num,
+			struct vhost_log *log, unsigned int *log_num,
+			unsigned int *ndesc);
+
+void vhost_discard_vq_desc(struct vhost_virtqueue *, int n, unsigned int ndesc);
 
 bool vhost_vq_work_queue(struct vhost_virtqueue *vq, struct vhost_work *work);
 bool vhost_vq_has_work(struct vhost_virtqueue *vq);
-- 
2.31.1
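
Illustration (not part of the patch): the desynchronization described in the commit message can be reproduced with a small user-space model. The sketch below is only an approximation under stated assumptions: struct vq_model and the model_* helpers are hypothetical names, the two fields are assumed to be 16-bit counters, and the real functions do far more than update these counters. It shows that a discard which rewinds only last_avail_idx leaves next_avail_head ahead by the number of descriptors in the buffer, while rewinding both restores the previous state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two vhost_virtqueue counters the patch touches. */
struct vq_model {
	uint16_t last_avail_idx;  /* advances by one per buffer (head) fetched */
	uint16_t next_avail_head; /* advances by the buffer's descriptor count */
};

/* Models the tail of vhost_get_vq_desc_n() for a buffer of c descriptors. */
static unsigned int model_get_desc(struct vq_model *vq, unsigned int c)
{
	vq->last_avail_idx++;
	vq->next_avail_head += c;
	return c; /* the value the patch reports through *ndesc */
}

/* Discard as it behaved before the patch: only the buffer index is rewound. */
static void model_discard_old(struct vq_model *vq, int n)
{
	vq->last_avail_idx -= n;
}

/* Discard as it behaves after the patch: both counters are rewound. */
static void model_discard_new(struct vq_model *vq, int n, unsigned int ndesc)
{
	vq->next_avail_head -= ndesc;
	vq->last_avail_idx -= n;
}

static bool model_equal(const struct vq_model *a, const struct vq_model *b)
{
	return a->last_avail_idx == b->last_avail_idx &&
	       a->next_avail_head == b->next_avail_head;
}

/* Fetch one 3-descriptor buffer, then discard it with each variant. */
int model_demo(void)
{
	const struct vq_model start = { .last_avail_idx = 5, .next_avail_head = 12 };
	struct vq_model vq = start;
	unsigned int ndesc;

	ndesc = model_get_desc(&vq, 3);
	model_discard_old(&vq, 1);
	/* Old behaviour: next_avail_head stays 3 descriptors ahead. */
	assert(!model_equal(&vq, &start));

	vq = start;
	ndesc = model_get_desc(&vq, 3);
	model_discard_new(&vq, 1, ndesc);
	/* New behaviour: the fetch is fully undone. */
	assert(model_equal(&vq, &start));
	return 0;
}
```

Calling model_demo() from a test main() exercises both paths. In the RX path a single discard can cover several buffers, which is why get_rx_bufs() in the patch accumulates the per-buffer counts into n before passing the total to vhost_discard_vq_desc().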