vhost: Introduce packed vq and add buffer elements

[RFC] vhost: Introduce packed vq and add buffer elements

Posted by Sahil Siddiq 1 year, 7 months ago

This is the first patch in a series to add support for packed
virtqueues in vhost_shadow_virtqueue. This patch implements the
insertion of available buffers in the descriptor area. It takes
into account descriptor chains, but does not consider indirect
descriptors.

VhostShadowVirtqueue has also been modified so it acts as a layer
of abstraction for split and packed virtqueues.

Signed-off-by: Sahil Siddiq <sahilcdq@proton.me>
---
Hi,

I am currently working on adding support for packed virtqueues in
vhost_shadow_virtqueue [1]. This patch only implements the insertion of
available buffers in the descriptor area. It does not take into
account indirect descriptors, event_idx or notifications.

I don't think these changes are testable yet but I thought I would
still post this patch for feedback. The following email annotates these
changes with a few comments and questions that I have.

Thanks,
Sahil

[1] https://wiki.qemu.org/Internships/ProjectIdeas/PackedShadowVirtqueue

 hw/virtio/vhost-shadow-virtqueue.c | 124 ++++++++++++++++++++++++++++-
 hw/virtio/vhost-shadow-virtqueue.h |  66 ++++++++++-----
 2 files changed, 167 insertions(+), 23 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index fc5f408f77..e3b276a9e9 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -217,6 +217,122 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
     return true;
 }
 
+/**
+ * Write descriptors to SVQ packed vring
+ *
+ * @svq: The shadow virtqueue
+ * @sg: Cache for hwaddr
+ * @out_sg: The iovec from the guest that is read-only for device
+ * @out_num: iovec length
+ * @in_sg: The iovec from the guest that is write-only for device
+ * @in_num: iovec length
+ * @head_flags: flags for first descriptor in list
+ *
+ * Return true if success, false otherwise and print error.
+ */
+static bool vhost_svq_vring_write_descs_packed(VhostShadowVirtqueue *svq, hwaddr *sg,
+                                        const struct iovec *out_sg, size_t out_num,
+                                        const struct iovec *in_sg, size_t in_num,
+                                        uint16_t *head_flags)
+{
+    uint16_t id, curr, head, i;
+    unsigned n;
+    struct vring_packed_desc *descs = svq->vring_packed.vring.desc;
+    bool ok;
+
+    head = svq->vring_packed.next_avail_idx;
+    i = head;
+    id = svq->free_head;
+    curr = id;
+
+    size_t num = out_num + in_num;
+
+    if (num == 0) {
+        return true;
+    }
+
+    ok = vhost_svq_translate_addr(svq, sg, out_sg, out_num);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    ok = vhost_svq_translate_addr(svq, sg + out_num, in_sg, in_num);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    for (n = 0; n < num; n++) {
+        uint16_t flags = cpu_to_le16(svq->vring_packed.avail_used_flags |
+                (n < out_num ? 0 : VRING_DESC_F_WRITE) |
+                (n + 1 == num ? 0 : VRING_DESC_F_NEXT));
+        if (i == head) {
+            *head_flags = flags;
+        } else {
+            descs[i].flags = flags;
+        }
+
+        descs[i].addr = cpu_to_le64(sg[n]);
+        descs[i].id = id;
+        if (n < out_num) {
+            descs[i].len = cpu_to_le32(out_sg[n].iov_len);
+        } else {
+            descs[i].len = cpu_to_le32(in_sg[n - out_num].iov_len);
+        }
+
+        curr = cpu_to_le16(svq->desc_next[curr]);
+
+        if (++i >= svq->vring_packed.vring.num) {
+            i = 0;
+            svq->vring_packed.avail_used_flags ^=
+                    1 << VRING_PACKED_DESC_F_AVAIL |
+                    1 << VRING_PACKED_DESC_F_USED;
+        }
+    }
+
+    if (i <= head) {
+        svq->vring_packed.avail_wrap_counter ^= 1;
+    }
+
+    svq->vring_packed.next_avail_idx = i;
+    svq->free_head = curr;
+    return true;
+}
+
+static bool vhost_svq_add_packed(VhostShadowVirtqueue *svq,
+                                const struct iovec *out_sg, size_t out_num,
+                                const struct iovec *in_sg, size_t in_num,
+                                unsigned *head)
+{
+    bool ok;
+    uint16_t head_flags = 0;
+    g_autofree hwaddr *sgs = g_new(hwaddr, out_num + in_num);
+
+    *head = svq->vring_packed.next_avail_idx;
+
+    /* We need some descriptors here */
+    if (unlikely(!out_num && !in_num)) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "Guest provided element with no descriptors");
+        return false;
+    }
+
+    ok = vhost_svq_vring_write_descs_packed(svq, sgs, out_sg, out_num,
+                                            in_sg, in_num, &head_flags);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    /*
+     * A driver MUST NOT make the first descriptor in the list
+     * available before all subsequent descriptors comprising
+     * the list are made available.
+     */
+    smp_wmb();
+    svq->vring_packed.vring.desc[*head].flags = head_flags;
+
+    return true;
+}
+
 static void vhost_svq_kick(VhostShadowVirtqueue *svq)
 {
     bool needs_kick;
@@ -258,7 +374,13 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
         return -ENOSPC;
     }
 
-    ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head);
+    if (virtio_vdev_has_feature(svq->vdev, VIRTIO_F_RING_PACKED)) {
+        ok = vhost_svq_add_packed(svq, out_sg, out_num,
+                                  in_sg, in_num, &qemu_head);
+    } else {
+        ok = vhost_svq_add_split(svq, out_sg, out_num,
+                                 in_sg, in_num, &qemu_head);
+    }
     if (unlikely(!ok)) {
         return -EINVAL;
     }
diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 19c842a15b..ee1a87f523 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -46,10 +46,53 @@ typedef struct VhostShadowVirtqueueOps {
     VirtQueueAvailCallback avail_handler;
 } VhostShadowVirtqueueOps;
 
+struct vring_packed {
+    /* Actual memory layout for this queue. */
+    struct {
+        unsigned int num;
+        struct vring_packed_desc *desc;
+        struct vring_packed_desc_event *driver;
+        struct vring_packed_desc_event *device;
+    } vring;
+
+    /* Avail used flags. */
+    uint16_t avail_used_flags;
+
+    /* Index of the next avail descriptor. */
+    uint16_t next_avail_idx;
+
+    /* Driver ring wrap counter */
+    bool avail_wrap_counter;
+};
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
+    /* Virtio queue shadowing */
+    VirtQueue *vq;
+
+    /* Virtio device */
+    VirtIODevice *vdev;
+
+    /* SVQ vring descriptors state */
+    SVQDescState *desc_state;
+
+    /*
+     * Backup next field for each descriptor so we can recover securely, not
+     * needing to trust the device access.
+     */
+    uint16_t *desc_next;
+
+    /* Next free descriptor */
+    uint16_t free_head;
+
+    /* Size of SVQ vring free descriptors */
+    uint16_t num_free;
+
     /* Shadow vring */
-    struct vring vring;
+    union {
+        struct vring vring;
+        struct vring_packed vring_packed;
+    };
 
     /* Shadow kick notifier, sent to vhost */
     EventNotifier hdev_kick;
@@ -69,27 +112,12 @@ typedef struct VhostShadowVirtqueue {
     /* Guest's call notifier, where the SVQ calls guest. */
     EventNotifier svq_call;
 
-    /* Virtio queue shadowing */
-    VirtQueue *vq;
-
-    /* Virtio device */
-    VirtIODevice *vdev;
-
     /* IOVA mapping */
     VhostIOVATree *iova_tree;
 
-    /* SVQ vring descriptors state */
-    SVQDescState *desc_state;
-
     /* Next VirtQueue element that guest made available */
     VirtQueueElement *next_guest_avail_elem;
 
-    /*
-     * Backup next field for each descriptor so we can recover securely, not
-     * needing to trust the device access.
-     */
-    uint16_t *desc_next;
-
     /* Caller callbacks */
     const VhostShadowVirtqueueOps *ops;
 
@@ -99,17 +127,11 @@ typedef struct VhostShadowVirtqueue {
     /* Next head to expose to the device */
     uint16_t shadow_avail_idx;
 
-    /* Next free descriptor */
-    uint16_t free_head;
-
     /* Last seen used idx */
     uint16_t shadow_used_idx;
 
     /* Next head to consume from the device */
     uint16_t last_used_idx;
-
-    /* Size of SVQ vring free descriptors */
-    uint16_t num_free;
 } VhostShadowVirtqueue;
 
 bool vhost_svq_valid_features(uint64_t features, Error **errp);
-- 
2.45.2

Re: [RFC] vhost: Introduce packed vq and add buffer elements

Posted by Eugenio Perez Martin 1 year, 7 months ago

On Tue, Jun 18, 2024 at 8:19 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
>
> This is the first patch in a series to add support for packed
> virtqueues in vhost_shadow_virtqueue. This patch implements the
> insertion of available buffers in the descriptor area. It takes
> into account descriptor chains, but does not consider indirect
> descriptors.
>
> VhostShadowVirtqueue has also been modified so it acts as a layer
> of abstraction for split and packed virtqueues.
>
> Signed-off-by: Sahil Siddiq <sahilcdq@proton.me>
> ---
> Hi,
>
> I am currently working on adding support for packed virtqueues in
> vhost_shadow_virtqueue [1]. This patch only implements the insertion of
> available buffers in the descriptor area. It does not take into
> account indirect descriptors, event_idx or notifications.
>
> I don't think these changes are testable yet but I thought I would
> still post this patch for feedback. The following email annotates these
> changes with a few comments and questions that I have.
>
> Thanks,
> Sahil
>

Hi Sahil,

Just some nitpicks here and there,

> [1] https://wiki.qemu.org/Internships/ProjectIdeas/PackedShadowVirtqueue
>
>  hw/virtio/vhost-shadow-virtqueue.c | 124 ++++++++++++++++++++++++++++-
>  hw/virtio/vhost-shadow-virtqueue.h |  66 ++++++++++-----
>  2 files changed, 167 insertions(+), 23 deletions(-)
>
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index fc5f408f77..e3b276a9e9 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -217,6 +217,122 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
>      return true;
>  }
>
> +/**
> + * Write descriptors to SVQ packed vring
> + *
> + * @svq: The shadow virtqueue
> + * @sg: Cache for hwaddr
> + * @out_sg: The iovec from the guest that is read-only for device
> + * @out_num: iovec length
> + * @in_sg: The iovec from the guest that is write-only for device
> + * @in_num: iovec length
> + * @head_flags: flags for first descriptor in list
> + *
> + * Return true if success, false otherwise and print error.
> + */
> +static bool vhost_svq_vring_write_descs_packed(VhostShadowVirtqueue *svq, hwaddr *sg,
> +                                        const struct iovec *out_sg, size_t out_num,
> +                                        const struct iovec *in_sg, size_t in_num,
> +                                        uint16_t *head_flags)
> +{
> +    uint16_t id, curr, head, i;
> +    unsigned n;
> +    struct vring_packed_desc *descs = svq->vring_packed.vring.desc;
> +    bool ok;
> +
> +    head = svq->vring_packed.next_avail_idx;
> +    i = head;
> +    id = svq->free_head;
> +    curr = id;
> +
> +    size_t num = out_num + in_num;
> +
> +    if (num == 0) {
> +        return true;
> +    }

num == 0 is impossible now, the caller checks for that.

> +
> +    ok = vhost_svq_translate_addr(svq, sg, out_sg, out_num);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +
> +    ok = vhost_svq_translate_addr(svq, sg + out_num, in_sg, in_num);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +
> +    for (n = 0; n < num; n++) {
> +        uint16_t flags = cpu_to_le16(svq->vring_packed.avail_used_flags |
> +                (n < out_num ? 0 : VRING_DESC_F_WRITE) |
> +                (n + 1 == num ? 0 : VRING_DESC_F_NEXT));
> +        if (i == head) {
> +            *head_flags = flags;
> +        } else {
> +            descs[i].flags = flags;
> +        }
> +
> +        descs[i].addr = cpu_to_le64(sg[n]);
> +        descs[i].id = id;
> +        if (n < out_num) {
> +            descs[i].len = cpu_to_le32(out_sg[n].iov_len);
> +        } else {
> +            descs[i].len = cpu_to_le32(in_sg[n - out_num].iov_len);
> +        }
> +
> +        curr = cpu_to_le16(svq->desc_next[curr]);
> +
> +        if (++i >= svq->vring_packed.vring.num) {
> +            i = 0;
> +            svq->vring_packed.avail_used_flags ^=
> +                    1 << VRING_PACKED_DESC_F_AVAIL |
> +                    1 << VRING_PACKED_DESC_F_USED;
> +        }
> +    }
> +
> +    if (i <= head) {
> +        svq->vring_packed.avail_wrap_counter ^= 1;
> +    }
> +
> +    svq->vring_packed.next_avail_idx = i;
> +    svq->free_head = curr;
> +    return true;
> +}
> +
> +static bool vhost_svq_add_packed(VhostShadowVirtqueue *svq,
> +                                const struct iovec *out_sg, size_t out_num,
> +                                const struct iovec *in_sg, size_t in_num,
> +                                unsigned *head)
> +{
> +    bool ok;
> +    uint16_t head_flags = 0;
> +    g_autofree hwaddr *sgs = g_new(hwaddr, out_num + in_num);
> +
> +    *head = svq->vring_packed.next_avail_idx;
> +
> +    /* We need some descriptors here */
> +    if (unlikely(!out_num && !in_num)) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                      "Guest provided element with no descriptors");
> +        return false;
> +    }
> +
> +    ok = vhost_svq_vring_write_descs_packed(svq, sgs, out_sg, out_num,
> +                                            in_sg, in_num, &head_flags);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +

Ok now I see why you switched sgs length from MAX to sum. But if we're
here, why not just embed all vhost_svq_vring_write_descs_packed here?
vhost_svq_vring_write_descs makes sense to split as we repeat the
operation, but I think it adds nothing here. What do you think?

> +    /*
> +     * A driver MUST NOT make the first descriptor in the list
> +     * available before all subsequent descriptors comprising
> +     * the list are made available.
> +     */
> +    smp_wmb();
> +    svq->vring_packed.vring.desc[*head].flags = head_flags;
> +
> +    return true;
> +}
> +
>  static void vhost_svq_kick(VhostShadowVirtqueue *svq)
>  {
>      bool needs_kick;
> @@ -258,7 +374,13 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>          return -ENOSPC;
>      }
>
> -    ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head);
> +    if (virtio_vdev_has_feature(svq->vdev, VIRTIO_F_RING_PACKED)) {
> +        ok = vhost_svq_add_packed(svq, out_sg, out_num,
> +                                  in_sg, in_num, &qemu_head);
> +    } else {
> +        ok = vhost_svq_add_split(svq, out_sg, out_num,
> +                                 in_sg, in_num, &qemu_head);
> +    }
>      if (unlikely(!ok)) {
>          return -EINVAL;
>      }
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 19c842a15b..ee1a87f523 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -46,10 +46,53 @@ typedef struct VhostShadowVirtqueueOps {
>      VirtQueueAvailCallback avail_handler;
>  } VhostShadowVirtqueueOps;
>
> +struct vring_packed {
> +    /* Actual memory layout for this queue. */
> +    struct {
> +        unsigned int num;
> +        struct vring_packed_desc *desc;
> +        struct vring_packed_desc_event *driver;
> +        struct vring_packed_desc_event *device;
> +    } vring;
> +
> +    /* Avail used flags. */
> +    uint16_t avail_used_flags;
> +
> +    /* Index of the next avail descriptor. */
> +    uint16_t next_avail_idx;
> +
> +    /* Driver ring wrap counter */
> +    bool avail_wrap_counter;
> +};
> +
>  /* Shadow virtqueue to relay notifications */
>  typedef struct VhostShadowVirtqueue {
> +    /* Virtio queue shadowing */
> +    VirtQueue *vq;
> +
> +    /* Virtio device */
> +    VirtIODevice *vdev;
> +
> +    /* SVQ vring descriptors state */
> +    SVQDescState *desc_state;
> +
> +    /*
> +     * Backup next field for each descriptor so we can recover securely, not
> +     * needing to trust the device access.
> +     */
> +    uint16_t *desc_next;
> +
> +    /* Next free descriptor */
> +    uint16_t free_head;
> +
> +    /* Size of SVQ vring free descriptors */
> +    uint16_t num_free;
> +

Why the reorder of all of the previous members?

Apart from the nitpicks I don't see anything wrong with this part of
the project. Looking forward to the next!

Thanks!

>      /* Shadow vring */
> -    struct vring vring;
> +    union {
> +        struct vring vring;
> +        struct vring_packed vring_packed;
> +    };
>
>      /* Shadow kick notifier, sent to vhost */
>      EventNotifier hdev_kick;
> @@ -69,27 +112,12 @@ typedef struct VhostShadowVirtqueue {
>      /* Guest's call notifier, where the SVQ calls guest. */
>      EventNotifier svq_call;
>
> -    /* Virtio queue shadowing */
> -    VirtQueue *vq;
> -
> -    /* Virtio device */
> -    VirtIODevice *vdev;
> -
>      /* IOVA mapping */
>      VhostIOVATree *iova_tree;
>
> -    /* SVQ vring descriptors state */
> -    SVQDescState *desc_state;
> -
>      /* Next VirtQueue element that guest made available */
>      VirtQueueElement *next_guest_avail_elem;
>
> -    /*
> -     * Backup next field for each descriptor so we can recover securely, not
> -     * needing to trust the device access.
> -     */
> -    uint16_t *desc_next;
> -
>      /* Caller callbacks */
>      const VhostShadowVirtqueueOps *ops;
>
> @@ -99,17 +127,11 @@ typedef struct VhostShadowVirtqueue {
>      /* Next head to expose to the device */
>      uint16_t shadow_avail_idx;
>
> -    /* Next free descriptor */
> -    uint16_t free_head;
> -
>      /* Last seen used idx */
>      uint16_t shadow_used_idx;
>
>      /* Next head to consume from the device */
>      uint16_t last_used_idx;
> -
> -    /* Size of SVQ vring free descriptors */
> -    uint16_t num_free;
>  } VhostShadowVirtqueue;
>
>  bool vhost_svq_valid_features(uint64_t features, Error **errp);
> --
> 2.45.2
>

Re: [RFC] vhost: Introduce packed vq and add buffer elements

Posted by Sahil 1 year, 7 months ago

Hi,

On Wednesday, June 19, 2024 3:49:29 PM GMT+5:30 Eugenio Perez Martin wrote:
> [...]
> Hi Sahil,
> 
> Just some nitpicks here and there,
> 
> > [1] https://wiki.qemu.org/Internships/ProjectIdeas/PackedShadowVirtqueue
> > 
> >  hw/virtio/vhost-shadow-virtqueue.c | 124 ++++++++++++++++++++++++++++-
> >  hw/virtio/vhost-shadow-virtqueue.h |  66 ++++++++++-----
> >  2 files changed, 167 insertions(+), 23 deletions(-)
> > 
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c
> > b/hw/virtio/vhost-shadow-virtqueue.c index fc5f408f77..e3b276a9e9 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -217,6 +217,122 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> >      return true;
> >  }
> > 
> > +/**
> > + * Write descriptors to SVQ packed vring
> > + *
> > + * @svq: The shadow virtqueue
> > + * @sg: Cache for hwaddr
> > + * @out_sg: The iovec from the guest that is read-only for device
> > + * @out_num: iovec length
> > + * @in_sg: The iovec from the guest that is write-only for device
> > + * @in_num: iovec length
> > + * @head_flags: flags for first descriptor in list
> > + *
> > + * Return true if success, false otherwise and print error.
> > + */
> > +static bool vhost_svq_vring_write_descs_packed(VhostShadowVirtqueue *svq, hwaddr *sg,
> > +                                        const struct iovec *out_sg, size_t out_num,
> > +                                        const struct iovec *in_sg, size_t in_num,
> > +                                        uint16_t *head_flags)
> > +{
> > +    uint16_t id, curr, head, i;
> > +    unsigned n;
> > +    struct vring_packed_desc *descs = svq->vring_packed.vring.desc;
> > +    bool ok;
> > +
> > +    head = svq->vring_packed.next_avail_idx;
> > +    i = head;
> > +    id = svq->free_head;
> > +    curr = id;
> > +
> > +    size_t num = out_num + in_num;
> > +
> > +    if (num == 0) {
> > +        return true;
> > +    }
> 
> num == 0 is impossible now, the caller checks for that.

Oh yes, I missed that.

> 
> > +
> > +    ok = vhost_svq_translate_addr(svq, sg, out_sg, out_num);
> > +    if (unlikely(!ok)) {
> > +        return false;
> > +    }
> > +
> > +    ok = vhost_svq_translate_addr(svq, sg + out_num, in_sg, in_num);
> > +    if (unlikely(!ok)) {
> > +        return false;
> > +    }
> > +
> > +    for (n = 0; n < num; n++) {
> > +        uint16_t flags = cpu_to_le16(svq->vring_packed.avail_used_flags |
> > +                (n < out_num ? 0 : VRING_DESC_F_WRITE) |
> > +                (n + 1 == num ? 0 : VRING_DESC_F_NEXT));
> > +        if (i == head) {
> > +            *head_flags = flags;
> > +        } else {
> > +            descs[i].flags = flags;
> > +        }
> > +
> > +        descs[i].addr = cpu_to_le64(sg[n]);
> > +        descs[i].id = id;
> > +        if (n < out_num) {
> > +            descs[i].len = cpu_to_le32(out_sg[n].iov_len);
> > +        } else {
> > +            descs[i].len = cpu_to_le32(in_sg[n - out_num].iov_len);
> > +        }
> > +
> > +        curr = cpu_to_le16(svq->desc_next[curr]);
> > +
> > +        if (++i >= svq->vring_packed.vring.num) {
> > +            i = 0;
> > +            svq->vring_packed.avail_used_flags ^=
> > +                    1 << VRING_PACKED_DESC_F_AVAIL |
> > +                    1 << VRING_PACKED_DESC_F_USED;
> > +        }
> > +    }
> > +
> > +    if (i <= head) {
> > +        svq->vring_packed.avail_wrap_counter ^= 1;
> > +    }
> > +
> > +    svq->vring_packed.next_avail_idx = i;
> > +    svq->free_head = curr;
> > +    return true;
> > +}
> > +
> > +static bool vhost_svq_add_packed(VhostShadowVirtqueue *svq,
> > +                                const struct iovec *out_sg, size_t out_num,
> > +                                const struct iovec *in_sg, size_t in_num,
> > +                                unsigned *head)
> > +{
> > +    bool ok;
> > +    uint16_t head_flags = 0;
> > +    g_autofree hwaddr *sgs = g_new(hwaddr, out_num + in_num);
> > +
> > +    *head = svq->vring_packed.next_avail_idx;
> > +
> > +    /* We need some descriptors here */
> > +    if (unlikely(!out_num && !in_num)) {
> > +        qemu_log_mask(LOG_GUEST_ERROR,
> > +                      "Guest provided element with no descriptors");
> > +        return false;
> > +    }
> > +
> > +    ok = vhost_svq_vring_write_descs_packed(svq, sgs, out_sg, out_num,
> > +                                            in_sg, in_num, &head_flags);
> > +    if (unlikely(!ok)) {
> > +        return false;
> > +    }
> > +
> 
> Ok now I see why you switched sgs length from MAX to sum. But if we're
> here, why not just embed all vhost_svq_vring_write_descs_packed here?
> vhost_svq_vring_write_descs makes sense to split as we repeat the
> operation, but I think it adds nothing here. What do you think?

You're right. The function is called only once and there is nothing to reuse.
I'll move "vhost_svq_vring_write_descs_packed" to "vhost_svq_add_packed".

> > [...]
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h
> > b/hw/virtio/vhost-shadow-virtqueue.h index 19c842a15b..ee1a87f523 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -46,10 +46,53 @@ typedef struct VhostShadowVirtqueueOps {
> > 
> >      VirtQueueAvailCallback avail_handler;
> >  
> >  } VhostShadowVirtqueueOps;
> > 
> > +struct vring_packed {
> > +    /* Actual memory layout for this queue. */
> > +    struct {
> > +        unsigned int num;
> > +        struct vring_packed_desc *desc;
> > +        struct vring_packed_desc_event *driver;
> > +        struct vring_packed_desc_event *device;
> > +    } vring;
> > +
> > +    /* Avail used flags. */
> > +    uint16_t avail_used_flags;
> > +
> > +    /* Index of the next avail descriptor. */
> > +    uint16_t next_avail_idx;
> > +
> > +    /* Driver ring wrap counter */
> > +    bool avail_wrap_counter;
> > +};
> > +
> > 
> >  /* Shadow virtqueue to relay notifications */
> >  typedef struct VhostShadowVirtqueue {
> > 
> > +    /* Virtio queue shadowing */
> > +    VirtQueue *vq;
> > +
> > +    /* Virtio device */
> > +    VirtIODevice *vdev;
> > +
> > +    /* SVQ vring descriptors state */
> > +    SVQDescState *desc_state;
> > +
> > +    /*
> > +     * Backup next field for each descriptor so we can recover securely,
> > not +     * needing to trust the device access.
> > +     */
> > +    uint16_t *desc_next;
> > +
> > +    /* Next free descriptor */
> > +    uint16_t free_head;
> > +
> > +    /* Size of SVQ vring free descriptors */
> > +    uint16_t num_free;
> > +
> 
> Why the reorder of all of the previous members?

I was thinking of placing all the members that are common to packed and
split vring above the union while leaving the remaining members below the
union. I did this based on our discussion here [1]. I don't think this reordering
serves any purpose implementation-wise. Please let me know if I should revert
this change.

> Apart from the nitpicks I don't see anything wrong with this part of
> the project. Looking forward to the next!
> 

Thank you for the review.

Thanks,
Sahil

[1] https://lists.nongnu.org/archive/html/qemu-devel/2024-05/msg02591.html

Re: [RFC] vhost: Introduce packed vq and add buffer elements

Posted by Eugenio Perez Martin 1 year, 7 months ago

On Sat, Jun 22, 2024 at 6:34 AM Sahil <icegambit91@gmail.com> wrote:
>
> Hi,
>
> On Wednesday, June 19, 2024 3:49:29 PM GMT+5:30 Eugenio Perez Martin wrote:
> > [...]
> > Hi Sahil,
> >
> > Just some nitpicks here and there,
> >
> > > [1] https://wiki.qemu.org/Internships/ProjectIdeas/PackedShadowVirtqueue
> > >
> > >  hw/virtio/vhost-shadow-virtqueue.c | 124 ++++++++++++++++++++++++++++-
> > >  hw/virtio/vhost-shadow-virtqueue.h |  66 ++++++++++-----
> > >  2 files changed, 167 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.c
> > > b/hw/virtio/vhost-shadow-virtqueue.c index fc5f408f77..e3b276a9e9 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > > @@ -217,6 +217,122 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> > >      return true;
> > >  }
> > >
> > > +/**
> > > + * Write descriptors to SVQ packed vring
> > > + *
> > > + * @svq: The shadow virtqueue
> > > + * @sg: Cache for hwaddr
> > > + * @out_sg: The iovec from the guest that is read-only for device
> > > + * @out_num: iovec length
> > > + * @in_sg: The iovec from the guest that is write-only for device
> > > + * @in_num: iovec length
> > > + * @head_flags: flags for first descriptor in list
> > > + *
> > > + * Return true if success, false otherwise and print error.
> > > + */
> > > +static bool vhost_svq_vring_write_descs_packed(VhostShadowVirtqueue *svq, hwaddr *sg,
> > > +                                        const struct iovec *out_sg, size_t out_num,
> > > +                                        const struct iovec *in_sg, size_t in_num,
> > > +                                        uint16_t *head_flags)
> > > +{
> > > +    uint16_t id, curr, head, i;
> > > +    unsigned n;
> > > +    struct vring_packed_desc *descs = svq->vring_packed.vring.desc;
> > > +    bool ok;
> > > +
> > > +    head = svq->vring_packed.next_avail_idx;
> > > +    i = head;
> > > +    id = svq->free_head;
> > > +    curr = id;
> > > +
> > > +    size_t num = out_num + in_num;
> > > +
> > > +    if (num == 0) {
> > > +        return true;
> > > +    }
> >
> > num == 0 is impossible now, the caller checks for that.
>
> Oh yes, I missed that.
>
> >
> > > +
> > > +    ok = vhost_svq_translate_addr(svq, sg, out_sg, out_num);
> > > +    if (unlikely(!ok)) {
> > > +        return false;
> > > +    }
> > > +
> > > +    ok = vhost_svq_translate_addr(svq, sg + out_num, in_sg, in_num);
> > > +    if (unlikely(!ok)) {
> > > +        return false;
> > > +    }
> > > +
> > > +    for (n = 0; n < num; n++) {
> > > +        uint16_t flags = cpu_to_le16(svq->vring_packed.avail_used_flags |
> > > +                (n < out_num ? 0 : VRING_DESC_F_WRITE) |
> > > +                (n + 1 == num ? 0 : VRING_DESC_F_NEXT));
> > > +        if (i == head) {
> > > +            *head_flags = flags;
> > > +        } else {
> > > +            descs[i].flags = flags;
> > > +        }
> > > +
> > > +        descs[i].addr = cpu_to_le64(sg[n]);
> > > +        descs[i].id = id;
> > > +        if (n < out_num) {
> > > +            descs[i].len = cpu_to_le32(out_sg[n].iov_len);
> > > +        } else {
> > > +            descs[i].len = cpu_to_le32(in_sg[n - out_num].iov_len);
> > > +        }
> > > +
> > > +        curr = cpu_to_le16(svq->desc_next[curr]);
> > > +
> > > +        if (++i >= svq->vring_packed.vring.num) {
> > > +            i = 0;
> > > +            svq->vring_packed.avail_used_flags ^=
> > > +                    1 << VRING_PACKED_DESC_F_AVAIL |
> > > +                    1 << VRING_PACKED_DESC_F_USED;
> > > +        }
> > > +    }
> > > +
> > > +    if (i <= head) {
> > > +        svq->vring_packed.avail_wrap_counter ^= 1;
> > > +    }
> > > +
> > > +    svq->vring_packed.next_avail_idx = i;
> > > +    svq->free_head = curr;
> > > +    return true;
> > > +}
> > > +
> > > +static bool vhost_svq_add_packed(VhostShadowVirtqueue *svq,
> > > +                                const struct iovec *out_sg, size_t out_num,
> > > +                                const struct iovec *in_sg, size_t in_num,
> > > +                                unsigned *head)
> > > +{
> > > +    bool ok;
> > > +    uint16_t head_flags = 0;
> > > +    g_autofree hwaddr *sgs = g_new(hwaddr, out_num + in_num);
> > > +
> > > +    *head = svq->vring_packed.next_avail_idx;
> > > +
> > > +    /* We need some descriptors here */
> > > +    if (unlikely(!out_num && !in_num)) {
> > > +        qemu_log_mask(LOG_GUEST_ERROR,
> > > +                      "Guest provided element with no descriptors");
> > > +        return false;
> > > +    }
> > > +
> > > +    ok = vhost_svq_vring_write_descs_packed(svq, sgs, out_sg, out_num,
> > > +                                            in_sg, in_num, &head_flags);
> > > +    if (unlikely(!ok)) {
> > > +        return false;
> > > +    }
> > > +
> >
> > Ok now I see why you switched sgs length from MAX to sum. But if we're
> > here, why not just embed all vhost_svq_vring_write_descs_packed here?
> > vhost_svq_vring_write_descs makes sense to split as we repeat the
> > operation, but I think it adds nothing here. What do you think?
>
> You're right. The function is called only once and there is nothing to reuse.
> I'll move "vhost_svq_vring_write_descs_packed" to "vhost_svq_add_packed".
>
> > > [...]
> > > diff --git a/hw/virtio/vhost-shadow-virtqueue.h
> > > b/hw/virtio/vhost-shadow-virtqueue.h index 19c842a15b..ee1a87f523 100644
> > > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > > @@ -46,10 +46,53 @@ typedef struct VhostShadowVirtqueueOps {
> > >
> > >      VirtQueueAvailCallback avail_handler;
> > >
> > >  } VhostShadowVirtqueueOps;
> > >
> > > +struct vring_packed {
> > > +    /* Actual memory layout for this queue. */
> > > +    struct {
> > > +        unsigned int num;
> > > +        struct vring_packed_desc *desc;
> > > +        struct vring_packed_desc_event *driver;
> > > +        struct vring_packed_desc_event *device;
> > > +    } vring;
> > > +
> > > +    /* Avail used flags. */
> > > +    uint16_t avail_used_flags;
> > > +
> > > +    /* Index of the next avail descriptor. */
> > > +    uint16_t next_avail_idx;
> > > +
> > > +    /* Driver ring wrap counter */
> > > +    bool avail_wrap_counter;
> > > +};
> > > +
> > >
> > >  /* Shadow virtqueue to relay notifications */
> > >  typedef struct VhostShadowVirtqueue {
> > >
> > > +    /* Virtio queue shadowing */
> > > +    VirtQueue *vq;
> > > +
> > > +    /* Virtio device */
> > > +    VirtIODevice *vdev;
> > > +
> > > +    /* SVQ vring descriptors state */
> > > +    SVQDescState *desc_state;
> > > +
> > > +    /*
> > > +     * Backup next field for each descriptor so we can recover securely,
> > > not +     * needing to trust the device access.
> > > +     */
> > > +    uint16_t *desc_next;
> > > +
> > > +    /* Next free descriptor */
> > > +    uint16_t free_head;
> > > +
> > > +    /* Size of SVQ vring free descriptors */
> > > +    uint16_t num_free;
> > > +
> >
> > Why the reorder of all of the previous members?
>
> I was thinking of placing all the members that are common to packed and
> split vring above the union while leaving the remaining members below the
> union. I did this based on our discussion here [1]. I don't think this reordering
> serves any purpose implementation-wise. Please let me know if I should revert
> this change.
>

I'm not against the change, but removing superfluous changes helps the
reviewing. If you send the reordering, please use a separate patch so
it's easier to review.

Thanks!

> > Apart from the nitpicks I don't see anything wrong with this part of
> > the project. Looking forward to the next!
> >
>
> Thank you for the review.
>
> Thanks,
> Sahil
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-05/msg02591.html
>
>

Re: [RFC] vhost: Introduce packed vq and add buffer elements

Posted by Sahil 1 year, 7 months ago

Hi,

On Monday, June 24, 2024 5:06:42 PM GMT+5:30 Eugenio Perez Martin wrote:
> [...]
> > > >  /* Shadow virtqueue to relay notifications */
> > > >  typedef struct VhostShadowVirtqueue {
> > > > 
> > > > +    /* Virtio queue shadowing */
> > > > +    VirtQueue *vq;
> > > > +
> > > > +    /* Virtio device */
> > > > +    VirtIODevice *vdev;
> > > > +
> > > > +    /* SVQ vring descriptors state */
> > > > +    SVQDescState *desc_state;
> > > > +
> > > > +    /*
> > > > +     * Backup next field for each descriptor so we can recover
> > > > securely,
> > > > not +     * needing to trust the device access.
> > > > +     */
> > > > +    uint16_t *desc_next;
> > > > +
> > > > +    /* Next free descriptor */
> > > > +    uint16_t free_head;
> > > > +
> > > > +    /* Size of SVQ vring free descriptors */
> > > > +    uint16_t num_free;
> > > > +
> > > 
> > > Why the reorder of all of the previous members?
> > 
> > I was thinking of placing all the members that are common to packed and
> > split vring above the union while leaving the remaining members below the
> > union. I did this based on our discussion here [1]. I don't think this
> > reordering serves any purpose implementation-wise. Please let me know if
> > I should revert this change.
> 
> I'm not against the change, but removing superfluous changes helps the
> reviewing. If you send the reordering, please use a separate patch so
> it's easier to review.

Understood, I'll keep this in mind when sending patches.

Thanks,
Sahil

Re: [RFC] vhost: Introduce packed vq and add buffer elements

Posted by Sahil 1 year, 7 months ago

Hi,

On Tuesday, June 18, 2024 11:48:34 PM GMT+5:30 Sahil Siddiq wrote:
> [...]
>
>  hw/virtio/vhost-shadow-virtqueue.c | 124 ++++++++++++++++++++++++++++-
>  hw/virtio/vhost-shadow-virtqueue.h |  66 ++++++++++-----
>  2 files changed, 167 insertions(+), 23 deletions(-)
> 
> diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> index fc5f408f77..e3b276a9e9 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.c
> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> @@ -217,6 +217,122 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
>      return true;
>  }
>  
> +/**
> + * Write descriptors to SVQ packed vring
> + *
> + * @svq: The shadow virtqueue
> + * @sg: Cache for hwaddr
> + * @out_sg: The iovec from the guest that is read-only for device
> + * @out_num: iovec length
> + * @in_sg: The iovec from the guest that is write-only for device
> + * @in_num: iovec length
> + * @head_flags: flags for first descriptor in list
> + *
> + * Return true if success, false otherwise and print error.
> + */
> +static bool vhost_svq_vring_write_descs_packed(VhostShadowVirtqueue *svq, hwaddr *sg,
> +                                        const struct iovec *out_sg, size_t out_num,
> +                                        const struct iovec *in_sg, size_t in_num,
> +                                        uint16_t *head_flags)
> +{
> +    uint16_t id, curr, head, i;
> +    unsigned n;
> +    struct vring_packed_desc *descs = svq->vring_packed.vring.desc;
> +    bool ok;
> +
> +    head = svq->vring_packed.next_avail_idx;
> +    i = head;
> +    id = svq->free_head;
> +    curr = id;
> +
> +    size_t num = out_num + in_num;
> +
> +    if (num == 0) {
> +        return true;
> +    }
> +
> +    ok = vhost_svq_translate_addr(svq, sg, out_sg, out_num);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +
> +    ok = vhost_svq_translate_addr(svq, sg + out_num, in_sg, in_num);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +
> +    for (n = 0; n < num; n++) {
> +        uint16_t flags = cpu_to_le16(svq->vring_packed.avail_used_flags |
> +                (n < out_num ? 0 : VRING_DESC_F_WRITE) |
> +                (n + 1 == num ? 0 : VRING_DESC_F_NEXT));
> +        if (i == head) {
> +            *head_flags = flags;
> +        } else {
> +            descs[i].flags = flags;
> +        }
> +
> +        descs[i].addr = cpu_to_le64(sg[n]);
> +        descs[i].id = id;
> +        if (n < out_num) {
> +            descs[i].len = cpu_to_le32(out_sg[n].iov_len);
> +        } else {
> +            descs[i].len = cpu_to_le32(in_sg[n - out_num].iov_len);
> +        }
> +
> +        curr = cpu_to_le16(svq->desc_next[curr]);

"curr" is being updated here, but descs[i].id is always set to id which doesn't change in
the loop. So all the descriptors in the chain will have the same id. I can't find anything
in the virtio specification [1] that suggests that all descriptors in the chain have the same
id. Also, going by the figure captioned "Three chained descriptors available" in the blog
post on packed virtqueues [2], it looks like the descriptors in the chain have different
buffer ids.

The virtio implementation in Linux also reuses the same id value for all the descriptors in a
single chain. I am not sure if I am missing something here.

> +        if (++i >= svq->vring_packed.vring.num) {
> +            i = 0;
> +            svq->vring_packed.avail_used_flags ^=
> +                    1 << VRING_PACKED_DESC_F_AVAIL |
> +                    1 << VRING_PACKED_DESC_F_USED;
> +        }
> +    }
> +
> +    if (i <= head) {
> +        svq->vring_packed.avail_wrap_counter ^= 1;
> +    }
> +
> +    svq->vring_packed.next_avail_idx = i;
> +    svq->free_head = curr;

Even though the same id is used, curr will not be id+1 here.

> +    return true;
> +}
> +
> +static bool vhost_svq_add_packed(VhostShadowVirtqueue *svq,
> +                                const struct iovec *out_sg, size_t out_num,
> +                                const struct iovec *in_sg, size_t in_num,
> +                                unsigned *head)
> +{
> +    bool ok;
> +    uint16_t head_flags = 0;
> +    g_autofree hwaddr *sgs = g_new(hwaddr, out_num + in_num);

I chose to use out_num+in_num as the size instead of MAX(ount_num, in_num). I
found it easier to implement "vhost_svq_vring_write_descs_packed()" like this.
Please let me know if this isn't feasible or ideal.

> +    *head = svq->vring_packed.next_avail_idx;
> +
> +    /* We need some descriptors here */
> +    if (unlikely(!out_num && !in_num)) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                      "Guest provided element with no descriptors");
> +        return false;
> +    }
> +
> +    ok = vhost_svq_vring_write_descs_packed(svq, sgs, out_sg, out_num,
> +                                            in_sg, in_num, &head_flags);
> +    if (unlikely(!ok)) {
> +        return false;
> +    }
> +
> +    /*
> +     * A driver MUST NOT make the first descriptor in the list
> +     * available before all subsequent descriptors comprising
> +     * the list are made available.
> +     */
> +    smp_wmb();
> +    svq->vring_packed.vring.desc[*head].flags = head_flags;
> +
> +    return true;
> +}
> +
>  static void vhost_svq_kick(VhostShadowVirtqueue *svq)
>  {
>      bool needs_kick;
> @@ -258,7 +374,13 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
>          return -ENOSPC;
>      }
>  
> -    ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head);
> +    if (virtio_vdev_has_feature(svq->vdev, VIRTIO_F_RING_PACKED)) {
> +        ok = vhost_svq_add_packed(svq, out_sg, out_num,
> +                                  in_sg, in_num, &qemu_head);
> +    } else {
> +        ok = vhost_svq_add_split(svq, out_sg, out_num,
> +                                 in_sg, in_num, &qemu_head);
> +    }
>      if (unlikely(!ok)) {
>          return -EINVAL;
>      }
> diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> index 19c842a15b..ee1a87f523 100644
> --- a/hw/virtio/vhost-shadow-virtqueue.h
> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> @@ -46,10 +46,53 @@ typedef struct VhostShadowVirtqueueOps {
>      VirtQueueAvailCallback avail_handler;
>  } VhostShadowVirtqueueOps;
>  
> +struct vring_packed {
> +    /* Actual memory layout for this queue. */
> +    struct {
> +        unsigned int num;
> +        struct vring_packed_desc *desc;
> +        struct vring_packed_desc_event *driver;
> +        struct vring_packed_desc_event *device;
> +    } vring;
> +
> +    /* Avail used flags. */
> +    uint16_t avail_used_flags;
> +
> +    /* Index of the next avail descriptor. */
> +    uint16_t next_avail_idx;
> +
> +    /* Driver ring wrap counter */
> +    bool avail_wrap_counter;
> +};
> +
>  /* Shadow virtqueue to relay notifications */
>  typedef struct VhostShadowVirtqueue {
> +    /* Virtio queue shadowing */
> +    VirtQueue *vq;
> +
> +    /* Virtio device */
> +    VirtIODevice *vdev;
> +
> +    /* SVQ vring descriptors state */
> +    SVQDescState *desc_state;
> +
> +    /*
> +     * Backup next field for each descriptor so we can recover securely, not
> +     * needing to trust the device access.
> +     */
> +    uint16_t *desc_next;
> +
> +    /* Next free descriptor */
> +    uint16_t free_head;
> +
> +    /* Size of SVQ vring free descriptors */
> +    uint16_t num_free;
> +
>      /* Shadow vring */
> -    struct vring vring;
> +    union {
> +        struct vring vring;
> +        struct vring_packed vring_packed;
> +    };
>  
>      /* Shadow kick notifier, sent to vhost */
>      EventNotifier hdev_kick;
> @@ -69,27 +112,12 @@ typedef struct VhostShadowVirtqueue {
>      /* Guest's call notifier, where the SVQ calls guest. */
>      EventNotifier svq_call;
>  
> -    /* Virtio queue shadowing */
> -    VirtQueue *vq;
> -
> -    /* Virtio device */
> -    VirtIODevice *vdev;
> -
>      /* IOVA mapping */
>      VhostIOVATree *iova_tree;
>  
> -    /* SVQ vring descriptors state */
> -    SVQDescState *desc_state;
> -
>      /* Next VirtQueue element that guest made available */
>      VirtQueueElement *next_guest_avail_elem;
>  
> -    /*
> -     * Backup next field for each descriptor so we can recover securely, not
> -     * needing to trust the device access.
> -     */
> -    uint16_t *desc_next;
> -
>      /* Caller callbacks */
>      const VhostShadowVirtqueueOps *ops;
>  
> @@ -99,17 +127,11 @@ typedef struct VhostShadowVirtqueue {
>      /* Next head to expose to the device */
>      uint16_t shadow_avail_idx;
>  
> -    /* Next free descriptor */
> -    uint16_t free_head;
> -
>      /* Last seen used idx */
>      uint16_t shadow_used_idx;
>  
>      /* Next head to consume from the device */
>      uint16_t last_used_idx;
> -
> -    /* Size of SVQ vring free descriptors */
> -    uint16_t num_free;
>  } VhostShadowVirtqueue;
>  
>  bool vhost_svq_valid_features(uint64_t features, Error **errp);
> -- 
> 2.45.2
>

In "struct VhostShadowVirtqueue", I rearranged the order in which some members appear.
I tried to keep the members common to split and packed virtqueues above the union and
the rest below the union. I haven't entirely understood the role of some of the members
(for example, VhostShadowVirtqueueOps *ops). I'll change this ordering if need be as I
continue to understand them better.

For the next step, I think I should work on "vhost_svq_start()" which is where members of
the struct are actually initialized. At the moment, only the split ring part of the structure is
initialized.

I think I should also start working on enhancing "vhost_svq_kick()" to actually send the buffers
to the device. I think it'll be easier to test these changes once that's done (I am not sure about
this though). Would this involve implementing the notification mechanism and event_idx?

Thanks,
Sahil

[1] https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-720008
[2] https://www.redhat.com/en/blog/packed-virtqueue-how-reduce-overhead-virtio

Re: [RFC] vhost: Introduce packed vq and add buffer elements

Posted by Eugenio Perez Martin 1 year, 7 months ago

On Tue, Jun 18, 2024 at 8:58 PM Sahil <icegambit91@gmail.com> wrote:
>
> Hi,
>
> On Tuesday, June 18, 2024 11:48:34 PM GMT+5:30 Sahil Siddiq wrote:
> > [...]
> >
> >  hw/virtio/vhost-shadow-virtqueue.c | 124 ++++++++++++++++++++++++++++-
> >  hw/virtio/vhost-shadow-virtqueue.h |  66 ++++++++++-----
> >  2 files changed, 167 insertions(+), 23 deletions(-)
> >
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
> > index fc5f408f77..e3b276a9e9 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.c
> > +++ b/hw/virtio/vhost-shadow-virtqueue.c
> > @@ -217,6 +217,122 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
> >      return true;
> >  }
> >
> > +/**
> > + * Write descriptors to SVQ packed vring
> > + *
> > + * @svq: The shadow virtqueue
> > + * @sg: Cache for hwaddr
> > + * @out_sg: The iovec from the guest that is read-only for device
> > + * @out_num: iovec length
> > + * @in_sg: The iovec from the guest that is write-only for device
> > + * @in_num: iovec length
> > + * @head_flags: flags for first descriptor in list
> > + *
> > + * Return true if success, false otherwise and print error.
> > + */
> > +static bool vhost_svq_vring_write_descs_packed(VhostShadowVirtqueue *svq, hwaddr *sg,
> > +                                        const struct iovec *out_sg, size_t out_num,
> > +                                        const struct iovec *in_sg, size_t in_num,
> > +                                        uint16_t *head_flags)
> > +{
> > +    uint16_t id, curr, head, i;
> > +    unsigned n;
> > +    struct vring_packed_desc *descs = svq->vring_packed.vring.desc;
> > +    bool ok;
> > +
> > +    head = svq->vring_packed.next_avail_idx;
> > +    i = head;
> > +    id = svq->free_head;
> > +    curr = id;
> > +
> > +    size_t num = out_num + in_num;
> > +
> > +    if (num == 0) {
> > +        return true;
> > +    }
> > +
> > +    ok = vhost_svq_translate_addr(svq, sg, out_sg, out_num);
> > +    if (unlikely(!ok)) {
> > +        return false;
> > +    }
> > +
> > +    ok = vhost_svq_translate_addr(svq, sg + out_num, in_sg, in_num);
> > +    if (unlikely(!ok)) {
> > +        return false;
> > +    }
> > +
> > +    for (n = 0; n < num; n++) {
> > +        uint16_t flags = cpu_to_le16(svq->vring_packed.avail_used_flags |
> > +                (n < out_num ? 0 : VRING_DESC_F_WRITE) |
> > +                (n + 1 == num ? 0 : VRING_DESC_F_NEXT));
> > +        if (i == head) {
> > +            *head_flags = flags;
> > +        } else {
> > +            descs[i].flags = flags;
> > +        }
> > +
> > +        descs[i].addr = cpu_to_le64(sg[n]);
> > +        descs[i].id = id;
> > +        if (n < out_num) {
> > +            descs[i].len = cpu_to_le32(out_sg[n].iov_len);
> > +        } else {
> > +            descs[i].len = cpu_to_le32(in_sg[n - out_num].iov_len);
> > +        }
> > +
> > +        curr = cpu_to_le16(svq->desc_next[curr]);
>
> "curr" is being updated here, but descs[i].id is always set to id which doesn't change in
> the loop. So all the descriptors in the chain will have the same id. I can't find anything
> in the virtio specification [1] that suggests that all descriptors in the chain have the same
> id. Also, going by the figure captioned "Three chained descriptors available" in the blog
> post on packed virtqueues [2], it looks like the descriptors in the chain have different
> buffer ids.
>
> The virtio implementation in Linux also reuses the same id value for all the descriptors in a
> single chain. I am not sure if I am missing something here.
>

The code is right, the id that identifies the whole chain is just the
one on the last descriptor. The key is that all the tail descriptors
of the chains will have a different id, the rest ids are ignored so it
is easier this way. I got it wrong in a recent mail in the list, where
you can find more information. Let me know if you cannot find it :).

In the split vq is different as a chained descriptor can go back and
forth in the descriptor ring with the next id. So all of them must be
different. But in the packed vq, the device knows the next descriptor
is placed at the next entry in the descriptor ring, so the only
important id is the last one.

> > +        if (++i >= svq->vring_packed.vring.num) {
> > +            i = 0;
> > +            svq->vring_packed.avail_used_flags ^=
> > +                    1 << VRING_PACKED_DESC_F_AVAIL |
> > +                    1 << VRING_PACKED_DESC_F_USED;
> > +        }
> > +    }
> > +
> > +    if (i <= head) {
> > +        svq->vring_packed.avail_wrap_counter ^= 1;
> > +    }
> > +
> > +    svq->vring_packed.next_avail_idx = i;
> > +    svq->free_head = curr;
>
> Even though the same id is used, curr will not be id+1 here.
>

curr is not the descriptor index, but the id. They're used in a stack
format: One available chain pops an id and one used id pushes its id
in the stack.

Maybe I'm wrong, but I think the main reason is to reuse the same
memory region of the descriptor state etc so less memory is changed to
be used in all the operations.

> > +    return true;
> > +}
> > +
> > +static bool vhost_svq_add_packed(VhostShadowVirtqueue *svq,
> > +                                const struct iovec *out_sg, size_t out_num,
> > +                                const struct iovec *in_sg, size_t in_num,
> > +                                unsigned *head)
> > +{
> > +    bool ok;
> > +    uint16_t head_flags = 0;
> > +    g_autofree hwaddr *sgs = g_new(hwaddr, out_num + in_num);
>
> I chose to use out_num+in_num as the size instead of MAX(ount_num, in_num). I
> found it easier to implement "vhost_svq_vring_write_descs_packed()" like this.
> Please let me know if this isn't feasible or ideal.
>

Not a big deal, I picked the MAX just because it is all the
hwaddresses the function needs at the same time. Addition should work
too, and AFAIK chains are usually short. We should get rid of this
dynamic allocation in the future anyway.

> > +    *head = svq->vring_packed.next_avail_idx;
> > +
> > +    /* We need some descriptors here */
> > +    if (unlikely(!out_num && !in_num)) {
> > +        qemu_log_mask(LOG_GUEST_ERROR,
> > +                      "Guest provided element with no descriptors");
> > +        return false;
> > +    }
> > +
> > +    ok = vhost_svq_vring_write_descs_packed(svq, sgs, out_sg, out_num,
> > +                                            in_sg, in_num, &head_flags);
> > +    if (unlikely(!ok)) {
> > +        return false;
> > +    }
> > +
> > +    /*
> > +     * A driver MUST NOT make the first descriptor in the list
> > +     * available before all subsequent descriptors comprising
> > +     * the list are made available.
> > +     */
> > +    smp_wmb();
> > +    svq->vring_packed.vring.desc[*head].flags = head_flags;
> > +
> > +    return true;
> > +}
> > +
> >  static void vhost_svq_kick(VhostShadowVirtqueue *svq)
> >  {
> >      bool needs_kick;
> > @@ -258,7 +374,13 @@ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
> >          return -ENOSPC;
> >      }
> >
> > -    ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head);
> > +    if (virtio_vdev_has_feature(svq->vdev, VIRTIO_F_RING_PACKED)) {
> > +        ok = vhost_svq_add_packed(svq, out_sg, out_num,
> > +                                  in_sg, in_num, &qemu_head);
> > +    } else {
> > +        ok = vhost_svq_add_split(svq, out_sg, out_num,
> > +                                 in_sg, in_num, &qemu_head);
> > +    }
> >      if (unlikely(!ok)) {
> >          return -EINVAL;
> >      }
> > diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
> > index 19c842a15b..ee1a87f523 100644
> > --- a/hw/virtio/vhost-shadow-virtqueue.h
> > +++ b/hw/virtio/vhost-shadow-virtqueue.h
> > @@ -46,10 +46,53 @@ typedef struct VhostShadowVirtqueueOps {
> >      VirtQueueAvailCallback avail_handler;
> >  } VhostShadowVirtqueueOps;
> >
> > +struct vring_packed {
> > +    /* Actual memory layout for this queue. */
> > +    struct {
> > +        unsigned int num;
> > +        struct vring_packed_desc *desc;
> > +        struct vring_packed_desc_event *driver;
> > +        struct vring_packed_desc_event *device;
> > +    } vring;
> > +
> > +    /* Avail used flags. */
> > +    uint16_t avail_used_flags;
> > +
> > +    /* Index of the next avail descriptor. */
> > +    uint16_t next_avail_idx;
> > +
> > +    /* Driver ring wrap counter */
> > +    bool avail_wrap_counter;
> > +};
> > +
> >  /* Shadow virtqueue to relay notifications */
> >  typedef struct VhostShadowVirtqueue {
> > +    /* Virtio queue shadowing */
> > +    VirtQueue *vq;
> > +
> > +    /* Virtio device */
> > +    VirtIODevice *vdev;
> > +
> > +    /* SVQ vring descriptors state */
> > +    SVQDescState *desc_state;
> > +
> > +    /*
> > +     * Backup next field for each descriptor so we can recover securely, not
> > +     * needing to trust the device access.
> > +     */
> > +    uint16_t *desc_next;
> > +
> > +    /* Next free descriptor */
> > +    uint16_t free_head;
> > +
> > +    /* Size of SVQ vring free descriptors */
> > +    uint16_t num_free;
> > +
> >      /* Shadow vring */
> > -    struct vring vring;
> > +    union {
> > +        struct vring vring;
> > +        struct vring_packed vring_packed;
> > +    };
> >
> >      /* Shadow kick notifier, sent to vhost */
> >      EventNotifier hdev_kick;
> > @@ -69,27 +112,12 @@ typedef struct VhostShadowVirtqueue {
> >      /* Guest's call notifier, where the SVQ calls guest. */
> >      EventNotifier svq_call;
> >
> > -    /* Virtio queue shadowing */
> > -    VirtQueue *vq;
> > -
> > -    /* Virtio device */
> > -    VirtIODevice *vdev;
> > -
> >      /* IOVA mapping */
> >      VhostIOVATree *iova_tree;
> >
> > -    /* SVQ vring descriptors state */
> > -    SVQDescState *desc_state;
> > -
> >      /* Next VirtQueue element that guest made available */
> >      VirtQueueElement *next_guest_avail_elem;
> >
> > -    /*
> > -     * Backup next field for each descriptor so we can recover securely, not
> > -     * needing to trust the device access.
> > -     */
> > -    uint16_t *desc_next;
> > -
> >      /* Caller callbacks */
> >      const VhostShadowVirtqueueOps *ops;
> >
> > @@ -99,17 +127,11 @@ typedef struct VhostShadowVirtqueue {
> >      /* Next head to expose to the device */
> >      uint16_t shadow_avail_idx;
> >
> > -    /* Next free descriptor */
> > -    uint16_t free_head;
> > -
> >      /* Last seen used idx */
> >      uint16_t shadow_used_idx;
> >
> >      /* Next head to consume from the device */
> >      uint16_t last_used_idx;
> > -
> > -    /* Size of SVQ vring free descriptors */
> > -    uint16_t num_free;
> >  } VhostShadowVirtqueue;
> >
> >  bool vhost_svq_valid_features(uint64_t features, Error **errp);
> > --
> > 2.45.2
> >
>
> In "struct VhostShadowVirtqueue", I rearranged the order in which some members appear.
> I tried to keep the members common to split and packed virtqueues above the union and
> the rest below the union. I haven't entirely understood the role of some of the members
> (for example, VhostShadowVirtqueueOps *ops). I'll change this ordering if need be as I
> continue to understand them better.
>

That's fine, but do it in a separate patch for future series, so it is
easier to review.

ops is used when a kind of device wants specialized handling for the
descriptors forwarded. vdpa-net uses it when QEMU also needs to
inspect the descriptors. Feel free to ask more about it, but adding
packed format to SVQ should not affect the ops member.

> For the next step, I think I should work on "vhost_svq_start()" which is where members of
> the struct are actually initialized. At the moment, only the split ring part of the structure is
> initialized.
>

Sounds reasonable. My recommendation is to mimic the patches of the
kernel, doing a git log and following that order. You also need to
apply the fixes in the history from that moment.

> I think I should also start working on enhancing "vhost_svq_kick()" to actually send the buffers
> to the device. I think it'll be easier to test these changes once that's done (I am not sure about
> this though). Would this involve implementing the notification mechanism and event_idx?
>

You can start omitting event_idx if you disable it from the device or
from qemu's commandline with event_idx=off. If you use vdpa_sim, it's
easier to remove it using qemu's cmdline in my opinion. Also, there is
a needs_kick boolean that you can set to always true for testing
purposes, since it is valid to send extra notifications. I think you
don't need to modify anything else from vhost_svq_kick to test the
device receives the buffer, but let me know if you find problems.

Thanks!

> Thanks,
> Sahil
>
> [1] https://docs.oasis-open.org/virtio/virtio/v1.3/csd01/virtio-v1.3-csd01.html#x1-720008
> [2] https://www.redhat.com/en/blog/packed-virtqueue-how-reduce-overhead-virtio
>
>

Re: [RFC] vhost: Introduce packed vq and add buffer elements

Posted by Sahil 1 year, 7 months ago

Hi,

Thank you for your reply.

On Wednesday, June 19, 2024 1:07:54 PM GMT+5:30 Eugenio Perez Martin wrote:
> [...]
> > "curr" is being updated here, but descs[i].id is always set to id which
> > doesn't change in the loop. So all the descriptors in the chain will have
> > the same id. I can't find anything in the virtio specification [1] that
> > suggests that all descriptors in the chain have the same id. Also, going
> > by the figure captioned "Three chained descriptors available" in the blog
> > post on packed virtqueues [2], it looks like the descriptors in the chain
> > have different buffer ids.
> > 
> > The virtio implementation in Linux also reuses the same id value for all
> > the descriptors in a single chain. I am not sure if I am missing
> > something here.
> 
> The code is right, the id that identifies the whole chain is just the
> one on the last descriptor. The key is that all the tail descriptors
> of the chains will have a different id, the rest ids are ignored so it
> is easier this way. I got it wrong in a recent mail in the list, where
> you can find more information. Let me know if you cannot find it :).

I found the mail here [1] :)

> In the split vq is different as a chained descriptor can go back and
> forth in the descriptor ring with the next id. So all of them must be
> different. But in the packed vq, the device knows the next descriptor
> is placed at the next entry in the descriptor ring, so the only
> important id is the last one.

Ok, this makes sense now.

> > > +        if (++i >= svq->vring_packed.vring.num) {
> > > +            i = 0;
> > > +            svq->vring_packed.avail_used_flags ^=
> > > +                    1 << VRING_PACKED_DESC_F_AVAIL |
> > > +                    1 << VRING_PACKED_DESC_F_USED;
> > > +        }
> > > +    }
> > > +
> > > +    if (i <= head) {
> > > +        svq->vring_packed.avail_wrap_counter ^= 1;
> > > +    }
> > > +
> > > +    svq->vring_packed.next_avail_idx = i;
> > > +    svq->free_head = curr;
> > 
> > Even though the same id is used, curr will not be id+1 here.
> 
> curr is not the descriptor index, but the id. They're used in a stack
> format: One available chain pops an id and one used id pushes its id
> in the stack.
> 
> Maybe I'm wrong, but I think the main reason is to reuse the same
> memory region of the descriptor state etc so less memory is changed to
> be used in all the operations.

Right, curr is the id. I didn't really understand the popping and pushing
part.

In the implementation, the possible ids are stored in svq.desc_next. And
it's implemented in a way that the next id is (id + 1) % queue_size.

By the following line:
> Even though the same id is used, curr will not be id+1 here.

I meant that if, for example,  there is a chain of 3 descriptors and the
current id is 1, then all 3 descriptors will have 1 as their id. If the vring
size is 5, then the value of curr that will be filled in the 4th descriptor
will be 4 instead of 2.

> > > +    return true;
> > > +}
> > > +
> > > +static bool vhost_svq_add_packed(VhostShadowVirtqueue *svq,
> > > +                                const struct iovec *out_sg, size_t
> > > out_num, +                                const struct iovec *in_sg,
> > > size_t in_num, +                                unsigned *head)
> > > +{
> > > +    bool ok;
> > > +    uint16_t head_flags = 0;
> > > +    g_autofree hwaddr *sgs = g_new(hwaddr, out_num + in_num);
> > 
> > I chose to use out_num+in_num as the size instead of MAX(ount_num,
> > in_num). I found it easier to implement
> > "vhost_svq_vring_write_descs_packed()" like this. Please let me know if
> > this isn't feasible or ideal.
> 
> Not a big deal, I picked the MAX just because it is all the
> hwaddresses the function needs at the same time. Addition should work
> too, and AFAIK chains are usually short. We should get rid of this
> dynamic allocation in the future anyway.

Ok, understood.

> [...]
> > In "struct VhostShadowVirtqueue", I rearranged the order in which some
> > members appear. I tried to keep the members common to split and packed
> > virtqueues above the union and the rest below the union. I haven't
> > entirely understood the role of some of the members (for example,
> > VhostShadowVirtqueueOps *ops). I'll change this ordering if need be as I
> > continue to understand them better.
> 
> That's fine, but do it in a separate patch for future series, so it is
> easier to review.

Sure, I'll do that.

> ops is used when a kind of device wants specialized handling for the
> descriptors forwarded. vdpa-net uses it when QEMU also needs to
> inspect the descriptors. Feel free to ask more about it, but adding
> packed format to SVQ should not affect the ops member.

Got it. I don't have any other questions related to ops.

> > For the next step, I think I should work on "vhost_svq_start()" which is
> > where members of the struct are actually initialized. At the moment, only
> > the split ring part of the structure is initialized.
> 
> Sounds reasonable. My recommendation is to mimic the patches of the
> kernel, doing a git log and following that order. You also need to
> apply the fixes in the history from that moment.
> 
> > I think I should also start working on enhancing "vhost_svq_kick()" to
> > actually send the buffers to the device. I think it'll be easier to test
> > these changes once that's done (I am not sure about this though). Would
> > this involve implementing the notification mechanism and event_idx?
> You can start omitting event_idx if you disable it from the device or
> from qemu's commandline with event_idx=off. If you use vdpa_sim, it's
> easier to remove it using qemu's cmdline in my opinion. Also, there is
> a needs_kick boolean that you can set to always true for testing
> purposes, since it is valid to send extra notifications. I think you
> don't need to modify anything else from vhost_svq_kick to test the
> device receives the buffer, but let me know if you find problems.

I'll take a look at the kernel patches. I'll also try testing these changes with
vdpa_sim. I'll check if the device gets the buffers.

Thanks,
Sahil

[1] https://lists.nongnu.org/archive/html/qemu-devel/2024-05/msg01843.html