From: Jason Wang <jasowang@redhat.com>
To: mst@redhat.com, jasowang@redhat.com, xuanzhuo@linux.alibaba.com,
 eperezma@redhat.com, virtualization@lists.linux.dev,
 linux-kernel@vger.kernel.org
Subject: [PATCH V11 19/19] virtio_ring: add in-order support
Date: Thu, 25 Dec 2025 12:26:08 +0800
Message-ID: <20251225042608.32350-20-jasowang@redhat.com>
In-Reply-To: <20251225042608.32350-1-jasowang@redhat.com>
References: <20251225042608.32350-1-jasowang@redhat.com>

This patch implements in-order support for both the split and the
packed virtqueue. With VIRTIO_F_IN_ORDER the device uses buffers in
the same order in which they were made available, so it can report a
whole batch of used buffers at once. Performance can be gained on
devices where memory accesses are expensive (e.g. vhost-net or a real
PCI device):

Benchmark with KVM guest:

Vhost-net on the host: (pktgen + XDP_DROP):

    in_order=off | in_order=on | +%
TX: 4.51Mpps     | 5.30Mpps    | +17%
RX: 3.47Mpps     | 3.61Mpps    | +4%

Vhost-user (testpmd) on the host: (pktgen/XDP_DROP):

For split virtqueue:

    in_order=off | in_order=on | +%
TX: 5.60Mpps     | 5.60Mpps    | +0.0%
RX: 9.16Mpps     | 9.61Mpps    | +4.9%

For packed virtqueue:

    in_order=off | in_order=on | +%
TX: 5.60Mpps     | 5.70Mpps    | +1.7%
RX: 10.6Mpps     | 10.8Mpps    | +1.8%

Benchmarks also show no performance impact for in_order=off with queue
sizes of 256 and 1024.

Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
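For reviewers, here is a simplified, self-contained sketch of the
batching idea used by the in-order get_buf paths below. It is an
illustration only, not the kernel code: it borrows the names
batch_last, total_in_len and last_used from this patch, and it assumes
one descriptor per buffer and a power-of-two ring size. With
VIRTIO_F_IN_ORDER the device may report a whole batch with a single
used entry whose id is the batch's last head; intermediate buffers
take the device-writable length the driver recorded at add time.

/* Sketch only; not the kernel implementation. */
#include <stdint.h>
#include <stddef.h>

#define VQ_NUM     256u        /* ring size, must be a power of two */
#define ID_INVALID UINT32_MAX  /* batch_last.id when no batch is cached */

struct used_entry { uint32_t id, len; };

struct desc_state {
	void *data;            /* driver's token for this buffer */
	uint32_t total_in_len; /* device-writable bytes, recorded at add */
};

struct inorder_vq {
	struct desc_state desc_state[VQ_NUM];
	struct used_entry batch_last; /* cached tail of the current batch */
	uint16_t last_used;           /* next head to return, in ring order */
};

/*
 * dev_used points at the next device-written used entry, or is NULL
 * when the device has published nothing new. Returns the token of
 * the next completed buffer (or NULL) and its length via *len.
 */
static void *get_buf_in_order(struct inorder_vq *vq,
			      const struct used_entry *dev_used,
			      uint32_t *len)
{
	uint16_t head = vq->last_used & (VQ_NUM - 1);
	void *ret;

	if (vq->batch_last.id == ID_INVALID) {
		if (!dev_used)
			return NULL;        /* no completions pending */
		vq->batch_last = *dev_used; /* cache the batch tail once */
	}

	if (vq->batch_last.id == head) {
		/* Tail of the batch: take the device-written length. */
		*len = vq->batch_last.len;
		vq->batch_last.id = ID_INVALID;
	} else {
		/* Intermediate buffer: take the length recorded at add. */
		*len = vq->desc_state[head].total_in_len;
	}

	ret = vq->desc_state[head].data;
	vq->desc_state[head].data = NULL;
	vq->last_used++; /* buffers complete strictly in ring order */
	return ret;
}

In the real code the head advances by the number of descriptors each
buffer consumed, and the packed variant reads the id/len from the
descriptor ring itself, but the batch-tail logic is the same. (The
in_order=on/off columns in the benchmark refer to whether the device
offers VIRTIO_F_IN_ORDER.)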
 drivers/virtio/virtio_ring.c | 455 +++++++++++++++++++++++++++++++++--
 1 file changed, 432 insertions(+), 23 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 61884e031b94..d1bcd1d8c66b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -70,6 +70,8 @@ enum vq_layout {
 	SPLIT = 0,
 	PACKED,
+	SPLIT_IN_ORDER,
+	PACKED_IN_ORDER,
 	VQ_TYPE_MAX,
 };
 
@@ -80,6 +82,7 @@ struct vring_desc_state_split {
 	 * allocated together. So we won't stress more to the memory allocator.
 	 */
 	struct vring_desc *indir_desc;
+	u32 total_in_len;
 };
 
 struct vring_desc_state_packed {
@@ -91,6 +94,7 @@ struct vring_desc_state_packed {
 	struct vring_packed_desc *indir_desc;
 	u16 num;	/* Descriptor list length. */
 	u16 last;	/* The last desc state in a list. */
+	u32 total_in_len;
 };
 
 struct vring_desc_extra {
@@ -205,8 +209,24 @@ struct vring_virtqueue {
 
 	enum vq_layout layout;
 
-	/* Head of free buffer list. */
+	/*
+	 * Without IN_ORDER it's the head of the free buffer list. With
+	 * IN_ORDER and SPLIT, it's the next available buffer
+	 * index. With IN_ORDER and PACKED, it's unused.
+	 */
 	unsigned int free_head;
+
+	/*
+	 * With IN_ORDER, once we see an in-order batch, this stores
+	 * its last used entry, until we return the last buffer.
+	 * After that, id is set to UINT_MAX to mark it invalid.
+	 * Unused without IN_ORDER.
+	 */
+	struct used_entry {
+		u32 id;
+		u32 len;
+	} batch_last;
+
 	/* Number we've added since last sync. */
 	unsigned int num_added;
 
@@ -218,6 +238,11 @@ struct vring_virtqueue {
 	 */
 	u16 last_used_idx;
 
+	/* With IN_ORDER and SPLIT, the last descriptor id we used to
+	 * detach a buffer.
+	 */
+	u16 last_used;
+
 	/* Hint for event idx: already triggered no need to disable. */
 	bool event_triggered;
 
@@ -259,7 +284,12 @@ static void vring_free(struct virtqueue *_vq);
 
 static inline bool virtqueue_is_packed(const struct vring_virtqueue *vq)
 {
-	return vq->layout == PACKED;
+	return vq->layout == PACKED || vq->layout == PACKED_IN_ORDER;
+}
+
+static inline bool virtqueue_is_in_order(const struct vring_virtqueue *vq)
+{
+	return vq->layout == SPLIT_IN_ORDER || vq->layout == PACKED_IN_ORDER;
 }
 
 static bool virtqueue_use_indirect(const struct vring_virtqueue *vq,
@@ -469,6 +499,8 @@ static void virtqueue_init(struct vring_virtqueue *vq, u32 num)
 	else
 		vq->last_used_idx = 0;
 
+	vq->last_used = 0;
+
 	vq->event_triggered = false;
 	vq->num_added = 0;
 
@@ -576,6 +608,8 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
 	struct scatterlist *sg;
 	struct vring_desc *desc;
 	unsigned int i, n, avail, descs_used, err_idx, sg_count = 0;
+	/* Total length of the device-writable buffers, for in-order */
+	unsigned int total_in_len = 0;
 	int head;
 	bool indirect;
 
@@ -667,6 +701,7 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
 			 */
 			i = virtqueue_add_desc_split(vq, desc, extra, i, addr,
 						     len, flags, premapped);
+			total_in_len += len;
 		}
 	}
 
@@ -689,7 +724,12 @@ static inline int virtqueue_add_split(struct vring_virtqueue *vq,
 	vq->vq.num_free -= descs_used;
 
 	/* Update free pointer */
-	if (indirect)
+	if (virtqueue_is_in_order(vq)) {
+		vq->free_head += descs_used;
+		if (vq->free_head >= vq->split.vring.num)
+			vq->free_head -= vq->split.vring.num;
+		vq->split.desc_state[head].total_in_len = total_in_len;
+	} else if (indirect)
 		vq->free_head = vq->split.desc_extra[head].next;
 	else
 		vq->free_head = i;
@@ -862,6 +902,14 @@ static bool more_used_split(const struct vring_virtqueue *vq)
 	return virtqueue_poll_split(vq, vq->last_used_idx);
 }
 
+static bool more_used_split_in_order(const struct vring_virtqueue *vq)
+{
+	if (vq->batch_last.id != UINT_MAX)
+		return true;
+
+	return virtqueue_poll_split(vq, vq->last_used_idx);
+}
+
 static void *virtqueue_get_buf_ctx_split(struct vring_virtqueue *vq,
 					 unsigned int *len,
 					 void **ctx)
@@ -919,6 +967,76 @@ static void *virtqueue_get_buf_ctx_split(struct vring_virtqueue *vq,
 	return ret;
 }
 
+static void *virtqueue_get_buf_ctx_split_in_order(struct vring_virtqueue *vq,
+						  unsigned int *len,
+						  void **ctx)
+{
+	void *ret;
+	unsigned int num = vq->split.vring.num;
+	unsigned int num_free = vq->vq.num_free;
+	u16 last_used, last_used_idx;
+
+	START_USE(vq);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return NULL;
+	}
+
+	last_used = vq->last_used & (num - 1);
+	last_used_idx = vq->last_used_idx & (num - 1);
+
+	if (vq->batch_last.id == UINT_MAX) {
+		if (!more_used_split_in_order(vq)) {
+			pr_debug("No more buffers in queue\n");
+			END_USE(vq);
+			return NULL;
+		}
+
+		/*
+		 * Only get used array entries after they have been
+		 * exposed by host.
+		 */
+		virtio_rmb(vq->weak_barriers);
+
+		vq->batch_last.id = virtio32_to_cpu(vq->vq.vdev,
+			vq->split.vring.used->ring[last_used_idx].id);
+		vq->batch_last.len = virtio32_to_cpu(vq->vq.vdev,
+			vq->split.vring.used->ring[last_used_idx].len);
+	}
+
+	if (vq->batch_last.id == last_used) {
+		vq->batch_last.id = UINT_MAX;
+		*len = vq->batch_last.len;
+	} else {
+		*len = vq->split.desc_state[last_used].total_in_len;
+	}
+
+	if (unlikely(!vq->split.desc_state[last_used].data)) {
+		BAD_RING(vq, "id %u is not a head!\n", last_used);
+		return NULL;
+	}
+
+	/* detach_buf_split clears data, so grab it now. */
+	ret = vq->split.desc_state[last_used].data;
+	detach_buf_split_in_order(vq, last_used, ctx);
+
+	vq->last_used_idx++;
+	vq->last_used += (vq->vq.num_free - num_free);
+	/* If we expect an interrupt for the next entry, tell host
+	 * by writing event index and flush out the write before
+	 * the read in the next get_buf call. */
+	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT))
+		virtio_store_mb(vq->weak_barriers,
+				&vring_used_event(&vq->split.vring),
+				cpu_to_virtio16(vq->vq.vdev, vq->last_used_idx));
+
+	LAST_ADD_TIME_INVALID(vq);
+
+	END_USE(vq);
+	return ret;
+}
+
 static void virtqueue_disable_cb_split(struct vring_virtqueue *vq)
 {
 	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
@@ -1012,7 +1130,10 @@ static void *virtqueue_detach_unused_buf_split(struct vring_virtqueue *vq)
 			continue;
 		/* detach_buf_split clears data, so grab it now. */
 		buf = vq->split.desc_state[i].data;
-		detach_buf_split(vq, i, NULL);
+		if (virtqueue_is_in_order(vq))
+			detach_buf_split_in_order(vq, i, NULL);
+		else
+			detach_buf_split(vq, i, NULL);
 		vq->split.avail_idx_shadow--;
 		vq->split.vring.avail->idx = cpu_to_virtio16(vq->vq.vdev,
 							     vq->split.avail_idx_shadow);
@@ -1075,6 +1196,7 @@ static void virtqueue_vring_attach_split(struct vring_virtqueue *vq,
 
 	/* Put everything in free lists. */
 	vq->free_head = 0;
+	vq->batch_last.id = UINT_MAX;
 }
 
 static int vring_alloc_state_extra_split(struct vring_virtqueue_split *vring_split)
@@ -1186,7 +1308,6 @@ static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
 	if (!vq)
 		return NULL;
 
-	vq->layout = SPLIT;
 	vq->vq.callback = callback;
 	vq->vq.vdev = vdev;
 	vq->vq.name = name;
@@ -1206,6 +1327,8 @@ static struct virtqueue *__vring_new_virtqueue_split(unsigned int index,
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
 	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+	vq->layout = virtio_has_feature(vdev, VIRTIO_F_IN_ORDER) ?
+		     SPLIT_IN_ORDER : SPLIT;
 
 	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
 		vq->weak_barriers = false;
@@ -1363,13 +1486,14 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 					 unsigned int in_sgs,
 					 void *data,
 					 bool premapped,
-					 gfp_t gfp)
+					 gfp_t gfp,
+					 u16 id)
 {
 	struct vring_desc_extra *extra;
 	struct vring_packed_desc *desc;
 	struct scatterlist *sg;
-	unsigned int i, n, err_idx, len;
-	u16 head, id;
+	unsigned int i, n, err_idx, len, total_in_len = 0;
+	u16 head;
 	dma_addr_t addr;
 
 	head = vq->packed.next_avail_idx;
@@ -1387,8 +1511,6 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	}
 
 	i = 0;
-	id = vq->free_head;
-	BUG_ON(id == vq->packed.vring.num);
 
 	for (n = 0; n < out_sgs + in_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
@@ -1408,6 +1530,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 				extra[i].flags = n < out_sgs ? 0 : VRING_DESC_F_WRITE;
 			}
 
+			if (n >= out_sgs)
+				total_in_len += len;
 			i++;
 		}
 	}
@@ -1454,13 +1578,15 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 			1 << VRING_PACKED_DESC_F_USED;
 	}
 	vq->packed.next_avail_idx = n;
-	vq->free_head = vq->packed.desc_extra[id].next;
+	if (!virtqueue_is_in_order(vq))
+		vq->free_head = vq->packed.desc_extra[id].next;
 
 	/* Store token and indirect buffer state. */
 	vq->packed.desc_state[id].num = 1;
 	vq->packed.desc_state[id].data = data;
 	vq->packed.desc_state[id].indir_desc = desc;
 	vq->packed.desc_state[id].last = id;
+	vq->packed.desc_state[id].total_in_len = total_in_len;
 
 	vq->num_added += 1;
 
@@ -1513,8 +1639,11 @@ static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
 	BUG_ON(total_sg == 0);
 
 	if (virtqueue_use_indirect(vq, total_sg)) {
+		id = vq->free_head;
+		BUG_ON(id == vq->packed.vring.num);
 		err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
-						    in_sgs, data, premapped, gfp);
+						    in_sgs, data, premapped,
+						    gfp, id);
 		if (err != -ENOMEM) {
 			END_USE(vq);
 			return err;
@@ -1635,6 +1764,157 @@ static inline int virtqueue_add_packed(struct vring_virtqueue *vq,
 	return -EIO;
 }
 
+static inline int virtqueue_add_packed_in_order(struct vring_virtqueue *vq,
+						struct scatterlist *sgs[],
+						unsigned int total_sg,
+						unsigned int out_sgs,
+						unsigned int in_sgs,
+						void *data,
+						void *ctx,
+						bool premapped,
+						gfp_t gfp)
+{
+	struct vring_packed_desc *desc;
+	struct scatterlist *sg;
+	unsigned int i, n, sg_count, err_idx, total_in_len = 0;
+	__le16 head_flags, flags;
+	u16 head, avail_used_flags;
+	int err;
+
+	START_USE(vq);
+
+	BUG_ON(data == NULL);
+	BUG_ON(ctx && vq->indirect);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return -EIO;
+	}
+
+	LAST_ADD_TIME_UPDATE(vq);
+
+	BUG_ON(total_sg == 0);
+
+	if (virtqueue_use_indirect(vq, total_sg)) {
+		err = virtqueue_add_indirect_packed(vq, sgs, total_sg, out_sgs,
+						    in_sgs, data, premapped, gfp,
+						    vq->packed.next_avail_idx);
+		if (err != -ENOMEM) {
+			END_USE(vq);
+			return err;
+		}
+
+		/* fall back on direct */
+	}
+
+	head = vq->packed.next_avail_idx;
+	avail_used_flags = vq->packed.avail_used_flags;
+
+	WARN_ON_ONCE(total_sg > vq->packed.vring.num && !vq->indirect);
+
+	desc = vq->packed.vring.desc;
+	i = head;
+
+	if (unlikely(vq->vq.num_free < total_sg)) {
+		pr_debug("Can't add buf len %i - avail = %i\n",
+			 total_sg, vq->vq.num_free);
+		END_USE(vq);
+		return -ENOSPC;
+	}
+
+	sg_count = 0;
+	for (n = 0; n < out_sgs + in_sgs; n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			dma_addr_t addr;
+			u32 len;
+
+			flags = 0;
+			if (++sg_count != total_sg)
+				flags |= cpu_to_le16(VRING_DESC_F_NEXT);
+			if (n >= out_sgs)
+				flags |= cpu_to_le16(VRING_DESC_F_WRITE);
+
+			if (vring_map_one_sg(vq, sg, n < out_sgs ?
+					     DMA_TO_DEVICE : DMA_FROM_DEVICE,
+					     &addr, &len, premapped))
+				goto unmap_release;
+
+			flags |= cpu_to_le16(vq->packed.avail_used_flags);
+
+			if (i == head)
+				head_flags = flags;
+			else
+				desc[i].flags = flags;
+
+			desc[i].addr = cpu_to_le64(addr);
+			desc[i].len = cpu_to_le32(len);
+			desc[i].id = cpu_to_le16(head);
+
+			if (unlikely(vq->use_map_api)) {
+				vq->packed.desc_extra[i].addr = premapped ?
+					DMA_MAPPING_ERROR : addr;
+				vq->packed.desc_extra[i].len = len;
+				vq->packed.desc_extra[i].flags =
+					le16_to_cpu(flags);
+			}
+
+			if ((unlikely(++i >= vq->packed.vring.num))) {
+				i = 0;
+				vq->packed.avail_used_flags ^=
+					1 << VRING_PACKED_DESC_F_AVAIL |
+					1 << VRING_PACKED_DESC_F_USED;
+				vq->packed.avail_wrap_counter ^= 1;
+			}
+
+			if (n >= out_sgs)
+				total_in_len += len;
+		}
+	}
+
+	/* We're using some buffers from the free list. */
+	vq->vq.num_free -= total_sg;
+
+	/* Update free pointer */
+	vq->packed.next_avail_idx = i;
+
+	/* Store token. */
+	vq->packed.desc_state[head].num = total_sg;
+	vq->packed.desc_state[head].data = data;
+	vq->packed.desc_state[head].indir_desc = ctx;
+	vq->packed.desc_state[head].total_in_len = total_in_len;
+
+	/*
+	 * A driver MUST NOT make the first descriptor in the list
+	 * available before all subsequent descriptors comprising
+	 * the list are made available.
+	 */
+	virtio_wmb(vq->weak_barriers);
+	vq->packed.vring.desc[head].flags = head_flags;
+	vq->num_added += total_sg;
+
+	pr_debug("Added buffer head %i to %p\n", head, vq);
+	END_USE(vq);
+
+	return 0;
+
+unmap_release:
+	err_idx = i;
+	i = head;
+	vq->packed.avail_used_flags = avail_used_flags;
+
+	for (n = 0; n < total_sg; n++) {
+		if (i == err_idx)
+			break;
+		vring_unmap_extra_packed(vq, &vq->packed.desc_extra[i]);
+		i++;
+		if (i >= vq->packed.vring.num)
+			i = 0;
+	}
+
+	END_USE(vq);
+	return -EIO;
+}
+
 static bool virtqueue_kick_prepare_packed(struct vring_virtqueue *vq)
 {
 	u16 new, old, off_wrap, flags, wrap_counter, event_idx;
@@ -1796,10 +2076,82 @@ static void update_last_used_idx_packed(struct vring_virtqueue *vq,
 			cpu_to_le16(vq->last_used_idx));
 }
 
+static bool more_used_packed_in_order(const struct vring_virtqueue *vq)
+{
+	if (vq->batch_last.id != UINT_MAX)
+		return true;
+
+	return virtqueue_poll_packed(vq, READ_ONCE(vq->last_used_idx));
+}
+
+static void *virtqueue_get_buf_ctx_packed_in_order(struct vring_virtqueue *vq,
+						   unsigned int *len,
+						   void **ctx)
+{
+	unsigned int num = vq->packed.vring.num;
+	u16 last_used, last_used_idx;
+	bool used_wrap_counter;
+	void *ret;
+
+	START_USE(vq);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return NULL;
+	}
+
+	last_used_idx = vq->last_used_idx;
+	used_wrap_counter = packed_used_wrap_counter(last_used_idx);
+	last_used = packed_last_used(last_used_idx);
+
+	if (vq->batch_last.id == UINT_MAX) {
+		if (!more_used_packed_in_order(vq)) {
+			pr_debug("No more buffers in queue\n");
+			END_USE(vq);
+			return NULL;
+		}
+		/* Only get used elements after they have been exposed by host. */
+		virtio_rmb(vq->weak_barriers);
+		vq->batch_last.id =
+			le16_to_cpu(vq->packed.vring.desc[last_used].id);
+		vq->batch_last.len =
+			le32_to_cpu(vq->packed.vring.desc[last_used].len);
+	}
+
+	if (vq->batch_last.id == last_used) {
+		vq->batch_last.id = UINT_MAX;
+		*len = vq->batch_last.len;
+	} else {
+		*len = vq->packed.desc_state[last_used].total_in_len;
+	}
+
+	if (unlikely(last_used >= num)) {
+		BAD_RING(vq, "id %u out of range\n", last_used);
+		return NULL;
+	}
+	if (unlikely(!vq->packed.desc_state[last_used].data)) {
+		BAD_RING(vq, "id %u is not a head!\n", last_used);
+		return NULL;
+	}
+
+	/* detach_buf_packed clears data, so grab it now. */
+	ret = vq->packed.desc_state[last_used].data;
+	detach_buf_packed_in_order(vq, last_used, ctx);
+
+	update_last_used_idx_packed(vq, last_used, last_used,
+				    used_wrap_counter);
+
+	LAST_ADD_TIME_INVALID(vq);
+
+	END_USE(vq);
+	return ret;
+}
+
 static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
 					  unsigned int *len,
 					  void **ctx)
 {
+	unsigned int num = vq->packed.vring.num;
 	u16 last_used, id, last_used_idx;
 	bool used_wrap_counter;
 	void *ret;
@@ -1826,7 +2178,7 @@ static void *virtqueue_get_buf_ctx_packed(struct vring_virtqueue *vq,
 	id = le16_to_cpu(vq->packed.vring.desc[last_used].id);
 	*len = le32_to_cpu(vq->packed.vring.desc[last_used].len);
 
-	if (unlikely(id >= vq->packed.vring.num)) {
+	if (unlikely(id >= num)) {
 		BAD_RING(vq, "id %u out of range\n", id);
 		return NULL;
 	}
@@ -1967,7 +2319,10 @@ static void *virtqueue_detach_unused_buf_packed(struct vring_virtqueue *vq)
 			continue;
 		/* detach_buf clears data, so grab it now. */
 		buf = vq->packed.desc_state[i].data;
-		detach_buf_packed(vq, i, NULL);
+		if (virtqueue_is_in_order(vq))
+			detach_buf_packed_in_order(vq, i, NULL);
+		else
+			detach_buf_packed(vq, i, NULL);
 		END_USE(vq);
 		return buf;
 	}
@@ -1993,6 +2348,8 @@ static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num)
 	for (i = 0; i < num - 1; i++)
 		desc_extra[i].next = i + 1;
 
+	desc_extra[num - 1].next = 0;
+
 	return desc_extra;
 }
 
@@ -2124,10 +2481,17 @@ static void virtqueue_vring_attach_packed(struct vring_virtqueue *vq,
 {
 	vq->packed = *vring_packed;
 
-	/* Put everything in free lists. */
-	vq->free_head = 0;
+	if (virtqueue_is_in_order(vq)) {
+		vq->batch_last.id = UINT_MAX;
+	} else {
+		/*
+		 * Put everything in free lists. Note that
+		 * next_avail_idx is sufficient with IN_ORDER so
+		 * free_head is unused.
+		 */
+		vq->free_head = 0;
+	}
 }
-
 static void virtqueue_reset_packed(struct vring_virtqueue *vq)
 {
 	memset(vq->packed.vring.device, 0, vq->packed.event_size_in_bytes);
@@ -2172,13 +2536,14 @@ static struct virtqueue *__vring_new_virtqueue_packed(unsigned int index,
 #else
 	vq->broken = false;
 #endif
-	vq->layout = PACKED;
 	vq->map = map;
 	vq->use_map_api = vring_use_map_api(vdev);
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
 	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+	vq->layout = virtio_has_feature(vdev, VIRTIO_F_IN_ORDER) ?
+		     PACKED_IN_ORDER : PACKED;
 
 	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
 		vq->weak_barriers = false;
@@ -2288,9 +2653,39 @@ static const struct virtqueue_ops packed_ops = {
 	.reset = virtqueue_reset_packed,
 };
 
+static const struct virtqueue_ops split_in_order_ops = {
+	.add = virtqueue_add_split,
+	.get = virtqueue_get_buf_ctx_split_in_order,
+	.kick_prepare = virtqueue_kick_prepare_split,
+	.disable_cb = virtqueue_disable_cb_split,
+	.enable_cb_delayed = virtqueue_enable_cb_delayed_split,
+	.enable_cb_prepare = virtqueue_enable_cb_prepare_split,
+	.poll = virtqueue_poll_split,
+	.detach_unused_buf = virtqueue_detach_unused_buf_split,
+	.more_used = more_used_split_in_order,
+	.resize = virtqueue_resize_split,
+	.reset = virtqueue_reset_split,
+};
+
+static const struct virtqueue_ops packed_in_order_ops = {
+	.add = virtqueue_add_packed_in_order,
+	.get = virtqueue_get_buf_ctx_packed_in_order,
+	.kick_prepare = virtqueue_kick_prepare_packed,
+	.disable_cb = virtqueue_disable_cb_packed,
+	.enable_cb_delayed = virtqueue_enable_cb_delayed_packed,
+	.enable_cb_prepare = virtqueue_enable_cb_prepare_packed,
+	.poll = virtqueue_poll_packed,
+	.detach_unused_buf = virtqueue_detach_unused_buf_packed,
+	.more_used = more_used_packed_in_order,
+	.resize = virtqueue_resize_packed,
+	.reset = virtqueue_reset_packed,
+};
+
 static const struct virtqueue_ops *const all_ops[VQ_TYPE_MAX] = {
 	[SPLIT] = &split_ops,
-	[PACKED] = &packed_ops
+	[PACKED] = &packed_ops,
+	[SPLIT_IN_ORDER] = &split_in_order_ops,
+	[PACKED_IN_ORDER] = &packed_in_order_ops,
 };
 
 static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
@@ -2346,6 +2741,12 @@ static int virtqueue_enable_after_reset(struct virtqueue *_vq)
 	case PACKED:						\
 		ret = all_ops[PACKED]->op(vq, ##__VA_ARGS__);	\
 		break;						\
+	case SPLIT_IN_ORDER:					\
+		ret = all_ops[SPLIT_IN_ORDER]->op(vq, ##__VA_ARGS__); \
+		break;						\
+	case PACKED_IN_ORDER:					\
+		ret = all_ops[PACKED_IN_ORDER]->op(vq, ##__VA_ARGS__); \
+		break;						\
 	default:						\
 		BUG();						\
 		break;						\
@@ -2362,10 +2763,16 @@ static int virtqueue_enable_after_reset(struct virtqueue *_vq)
 	case PACKED:						\
 		all_ops[PACKED]->op(vq, ##__VA_ARGS__);		\
 		break;						\
-	default:						\
-		BUG();						\
-		break;						\
-	}							\
+	case SPLIT_IN_ORDER:					\
+		all_ops[SPLIT_IN_ORDER]->op(vq, ##__VA_ARGS__);	\
+		break;						\
+	case PACKED_IN_ORDER:					\
+		all_ops[PACKED_IN_ORDER]->op(vq, ##__VA_ARGS__);	\
+		break;						\
+	default:						\
+		BUG();						\
+		break;						\
+	}							\
 })
 
 static inline int virtqueue_add(struct virtqueue *_vq,
@@ -3081,6 +3488,8 @@ void vring_transport_features(struct virtio_device *vdev)
 		break;
 	case VIRTIO_F_NOTIFICATION_DATA:
 		break;
+	case VIRTIO_F_IN_ORDER:
+		break;
 	default:
 		/* We don't understand this bit. */
 		__virtio_clear_bit(vdev, i);
-- 
2.31.1