Series comparison

-[Qemu-devel] [PULL 00/14] Net patches
+[PULL V2 00/25] Net patches
-The following changes since commit 6632f6ff96f0537fc34cdc00c760656fc62e23c5:
+The following changes since commit d48125de38f48a61d6423ef6a01156d6dff9ee2c:
-  Merge remote-tracking branch 'remotes/famz/tags/block-and-testing-pull-request' into staging (2017-07-17 11:46:36 +0100)
+  Merge tag 'kraxel-20220719-pull-request' of https://gitlab.com/kraxel/qemu into staging (2022-07-19 17:40:36 +0100)
 are available in the git repository at:
   https://github.com/jasowang/qemu.git tags/net-pull-request
-for you to fetch changes up to 189ae6bb5ce1f5a322f8691d00fe942ba43dd601:
+for you to fetch changes up to 8bdab83b34efb0b598be4e5b98e4f466ca5f2f80:
-  virtio-net: fix offload ctrl endian (2017-07-17 20:13:56 +0800)
+  net/colo.c: fix segmentation fault when packet is not parsed correctly (2022-07-20 16:58:08 +0800)
 ----------------------------------------------------------------
-- fix virtio-net ctrl offload endian
+Changes since V1:
-- vnet header support for various COLO netfilters and compare thread
+- Fix build errors of vhost-vdpa when virtio-net is not set
 ----------------------------------------------------------------
-Jason Wang (1):
+Eugenio Pérez (21):
-      virtio-net: fix offload ctrl endian
+      vhost: move descriptor translation to vhost_svq_vring_write_descs
       virtio-net: Expose MAC_TABLE_ENTRIES
       virtio-net: Expose ctrl virtqueue logic
       vdpa: Avoid compiler to squash reads to used idx
       vhost: Reorder vhost_svq_kick
       vhost: Move vhost_svq_kick call to vhost_svq_add
       vhost: Check for queue full at vhost_svq_add
       vhost: Decouple vhost_svq_add from VirtQueueElement
       vhost: Add SVQDescState
       vhost: Track number of descs in SVQDescState
       vhost: add vhost_svq_push_elem
       vhost: Expose vhost_svq_add
       vhost: add vhost_svq_poll
       vhost: Add svq avail_handler callback
       vdpa: Export vhost_vdpa_dma_map and unmap calls
       vhost-net-vdpa: add stubs for when no virtio-net device is present
       vdpa: manual forward CVQ buffers
       vdpa: Buffer CVQ support on shadow virtqueue
       vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
       vdpa: Add device migration blocker
       vdpa: Add x-svq to NetdevVhostVDPAOptions
-Michal Privoznik (1):
+Zhang Chen (4):
-      virtion-net: Prefer is_power_of_2()
+      softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH
       net/colo: Fix a "double free" crash to clear the conn_list
       net/colo.c: No need to track conn_list for filter-rewriter
       net/colo.c: fix segmentation fault when packet is not parsed correctly
-Zhang Chen (12):
+ hw/net/virtio-net.c                |  85 +++++----
-      net: Add vnet_hdr_len arguments in NetClientState
+ hw/virtio/vhost-shadow-virtqueue.c | 210 +++++++++++++++-------
-      net/net.c: Add vnet_hdr support in SocketReadState
+ hw/virtio/vhost-shadow-virtqueue.h |  52 +++++-
-      net/filter-mirror.c: Introduce parameter for filter_send()
+ hw/virtio/vhost-vdpa.c             |  26 ++-
-      net/filter-mirror.c: Make filter mirror support vnet support.
+ include/hw/virtio/vhost-vdpa.h     |   8 +
-      net/filter-mirror.c: Add new option to enable vnet support for filter-redirector
+ include/hw/virtio/virtio-net.h     |   7 +
-      net/colo.c: Make vnet_hdr_len as packet property
+ net/colo-compare.c                 |   2 +-
-      net/colo-compare.c: Introduce parameter for compare_chr_send()
+ net/colo.c                         |  11 +-
-      net/colo-compare.c: Make colo-compare support vnet_hdr_len
+ net/filter-rewriter.c              |   2 +-
-      net/colo.c: Add vnet packet parse feature in colo-proxy
+ net/meson.build                    |   3 +-
-      net/colo-compare.c: Add vnet packet's tcp/udp/icmp compare
+ net/trace-events                   |   1 +
-      net/filter-rewriter.c: Make filter-rewriter support vnet_hdr_len
+ net/vhost-vdpa-stub.c              |  21 +++
-      docs/colo-proxy.txt: Update colo-proxy usage of net driver with vnet_header
+ net/vhost-vdpa.c                   | 357 +++++++++++++++++++++++++++++++++++--
+ qapi/net.json                      |   9 +-
- docs/colo-proxy.txt   | 26 ++++++++++++++++
+ softmmu/runstate.c                 |   1 +
- hw/net/virtio-net.c   |  4 ++-
+files changed, 671 insertions(+), 124 deletions(-)
- include/net/net.h     | 10 ++++--
+ create mode 100644 net/vhost-vdpa-stub.c
  net/colo-compare.c    | 84 ++++++++++++++++++++++++++++++++++++++++++---------
  net/colo.c            |  9 +++---
  net/colo.h            |  4 ++-
  net/filter-mirror.c   | 75 +++++++++++++++++++++++++++++++++++++++++----
  net/filter-rewriter.c | 37 ++++++++++++++++++++++-
  net/net.c             | 37 ++++++++++++++++++++---
  net/socket.c          |  8 ++---
  qemu-options.hx       | 19 ++++++------
 files changed, 265 insertions(+), 48 deletions(-)

-[Qemu-devel] [PULL 07/14] net/colo-compare.c: Introduce parameter for compare_chr_send()
+[PULL V2 01/25] vhost: move descriptor translation to vhost_svq_vring_write_descs
-From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+From: Eugenio Pérez <eperezma@redhat.com>
-This patch change the compare_chr_send() parameter from CharBackend to CompareState,
+It's done for both in and out descriptors so it's better placed here.
 we can get more information like vnet_hdr(We use it to support packet with vnet_header).
-Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+Acked-by: Jason Wang <jasowang@redhat.com>
 Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
 Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- net/colo-compare.c | 14 +++++++-------
+ hw/virtio/vhost-shadow-virtqueue.c | 38 +++++++++++++++++++++++++++-----------
-file changed, 7 insertions(+), 7 deletions(-)
+file changed, 27 insertions(+), 11 deletions(-)
-diff --git a/net/colo-compare.c b/net/colo-compare.c
+diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
 index XXXXXXX..XXXXXXX 100644
---- a/net/colo-compare.c
+--- a/hw/virtio/vhost-shadow-virtqueue.c
-+++ b/net/colo-compare.c
++++ b/hw/virtio/vhost-shadow-virtqueue.c
-@@ -XXX,XX +XXX,XX @@ enum {
+@@ -XXX,XX +XXX,XX @@ static bool vhost_svq_translate_addr(const VhostShadowVirtqueue *svq,
-     SECONDARY_IN,
+     return true;
- };
+ }
--static int compare_chr_send(CharBackend *out,
+-static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
-+static int compare_chr_send(CompareState *s,
+-                                    const struct iovec *iovec, size_t num,
-                             const uint8_t *buf,
+-                                    bool more_descs, bool write)
-                             uint32_t size);
++/**
++ * Write descriptors to SVQ vring
-@@ -XXX,XX +XXX,XX @@ static void colo_compare_connection(void *opaque, void *user_data)
++ *
-         }
++ * @svq: The shadow virtqueue
++ * @sg: Cache for hwaddr
-         if (result) {
++ * @iovec: The iovec from the guest
--            ret = compare_chr_send(&s->chr_out, pkt->data, pkt->size);
++ * @num: iovec length
-+            ret = compare_chr_send(s, pkt->data, pkt->size);
++ * @more_descs: True if more descriptors come in the chain
-             if (ret < 0) {
++ * @write: True if they are writeable descriptors
-                 error_report("colo_send_primary_packet failed");
++ *
-             }
++ * Return true if success, false otherwise and print error.
-@@ -XXX,XX +XXX,XX @@ static void colo_compare_connection(void *opaque, void *user_data)
++ */
 +static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
 +                                        const struct iovec *iovec, size_t num,
 +                                        bool more_descs, bool write)
  {
      uint16_t i = svq->free_head, last = svq->free_head;
      unsigned n;
      uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
      vring_desc_t *descs = svq->vring.desc;
 +    bool ok;
      if (num == 0) {
 -        return;
 +        return true;
 +    }
 +
 +    ok = vhost_svq_translate_addr(svq, sg, iovec, num);
 +    if (unlikely(!ok)) {
 +        return false;
      }
+     for (n = 0; n < num; n++) {
+@@ -XXX,XX +XXX,XX @@ static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
+     }
+     svq->free_head = le16_to_cpu(svq->desc_next[last]);
++    return true;
  }
--static int compare_chr_send(CharBackend *out,
+ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
-+static int compare_chr_send(CompareState *s,
+@@ -XXX,XX +XXX,XX @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
-                             const uint8_t *buf,
+         return false;
                              uint32_t size)
  {
@@ -XXX,XX +XXX,XX @@ static int compare_chr_send(CharBackend *out,
          return 0;
      }
--    ret = qemu_chr_fe_write_all(out, (uint8_t *)&len, sizeof(len));
+-    ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
-+    ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
++    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
-     if (ret != sizeof(len)) {
++                                     elem->in_num > 0, false);
-         goto err;
+     if (unlikely(!ok)) {
          return false;
      }
+-    vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
--    ret = qemu_chr_fe_write_all(out, (uint8_t *)buf, size);
+-                            elem->in_num > 0, false);
-+    ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
+-
-     if (ret != size) {
-         goto err;
+-    ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
 +    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
 +                                     true);
      if (unlikely(!ok)) {
          return false;
      }
-@@ -XXX,XX +XXX,XX @@ static void compare_pri_rs_finalize(SocketReadState *pri_rs)
+-    vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);
-     if (packet_enqueue(s, PRIMARY_IN)) {
+-
-         trace_colo_compare_main("primary: unsupported packet in");
+     /*
--        compare_chr_send(&s->chr_out, pri_rs->buf, pri_rs->packet_len);
+      * Put the entry in the available array (but don't update avail->idx until
-+        compare_chr_send(s, pri_rs->buf, pri_rs->packet_len);
+      * they do sync).
      } else {
          /* compare connection */
          g_queue_foreach(&s->conn_list, colo_compare_connection, s);
@@ -XXX,XX +XXX,XX @@ static void colo_flush_packets(void *opaque, void *user_data)
      while (!g_queue_is_empty(&conn->primary_list)) {
          pkt = g_queue_pop_head(&conn->primary_list);
 -        compare_chr_send(&s->chr_out, pkt->data, pkt->size);
 +        compare_chr_send(s, pkt->data, pkt->size);
          packet_destroy(pkt, NULL);
      }
      while (!g_queue_is_empty(&conn->secondary_list)) {
 --
 .7.4

-[Qemu-devel] [PULL 14/14] virtio-net: fix offload ctrl endian
+[PULL V2 02/25] virtio-net: Expose MAC_TABLE_ENTRIES
-Spec said offloads should be le64, so use virtio_ldq_p() to guarantee
+From: Eugenio Pérez <eperezma@redhat.com>
 valid endian.
-Fixes: 644c98587d4c ("virtio-net: dynamic network offloads configuration")
+vhost-vdpa control virtqueue needs to know the maximum entries supported
-Cc: qemu-stable@nongnu.org
+by the virtio-net device, so we know if it is possible to apply the
-Cc: Dmitry Fleytman <dfleytma@redhat.com>
+filter.
 Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
 Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- hw/net/virtio-net.c | 2 ++
+ hw/net/virtio-net.c            | 1 -
-file changed, 2 insertions(+)
+ include/hw/virtio/virtio-net.h | 3 +++
 files changed, 3 insertions(+), 1 deletion(-)
 diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/net/virtio-net.c
 +++ b/hw/net/virtio-net.c
-@@ -XXX,XX +XXX,XX @@ static int virtio_net_handle_offloads(VirtIONet *n, uint8_t cmd,
+@@ -XXX,XX +XXX,XX @@
-     if (cmd == VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET) {
-         uint64_t supported_offloads;
+ #define VIRTIO_NET_VM_VERSION    11
-+        offloads = virtio_ldq_p(vdev, &offloads);
+-#define MAC_TABLE_ENTRIES    64
  #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
  /* previously fixed value */
 diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/virtio/virtio-net.h
 +++ b/include/hw/virtio/virtio-net.h
@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET)
   * and latency. */
  #define TX_BURST 256
 +/* Maximum VIRTIO_NET_CTRL_MAC_TABLE_SET unicast + multicast entries. */
 +#define MAC_TABLE_ENTRIES    64
 +
-         if (!n->has_vnet_hdr) {
+ typedef struct virtio_net_conf
-             return VIRTIO_NET_ERR;
+ {
-         }
+     uint32_t txtimer;
 --
 .7.4

-[Qemu-devel] [PULL 13/14] virtion-net: Prefer is_power_of_2()
+[PULL V2 03/25] virtio-net: Expose ctrl virtqueue logic
-From: Michal Privoznik <mprivozn@redhat.com>
+From: Eugenio Pérez <eperezma@redhat.com>
-We have a function that checks if given number is power of two.
+This allows external vhost-net devices to modify the state of the
-We should prefer it instead of expanding the check on our own.
+VirtIO device model once the vhost-vdpa device has acknowledged the
 control commands.
-Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
 Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- hw/net/virtio-net.c | 2 +-
+ hw/net/virtio-net.c            | 84 ++++++++++++++++++++++++------------------
-file changed, 1 insertion(+), 1 deletion(-)
+ include/hw/virtio/virtio-net.h |  4 ++
 files changed, 53 insertions(+), 35 deletions(-)
 diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/net/virtio-net.c
 +++ b/hw/net/virtio-net.c
-@@ -XXX,XX +XXX,XX @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
-      */
+     return VIRTIO_NET_OK;
-     if (n->net_conf.rx_queue_size < VIRTIO_NET_RX_QUEUE_MIN_SIZE ||
+ }
-         n->net_conf.rx_queue_size > VIRTQUEUE_MAX_SIZE ||
--        (n->net_conf.rx_queue_size & (n->net_conf.rx_queue_size - 1))) {
+-static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
-+        !is_power_of_2(n->net_conf.rx_queue_size)) {
++size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
-         error_setg(errp, "Invalid rx_queue_size (= %" PRIu16 "), "
++                                  const struct iovec *in_sg, unsigned in_num,
-                    "must be a power of 2 between %d and %d.",
++                                  const struct iovec *out_sg,
-                    n->net_conf.rx_queue_size, VIRTIO_NET_RX_QUEUE_MIN_SIZE,
++                                  unsigned out_num)
  {
      VirtIONet *n = VIRTIO_NET(vdev);
      struct virtio_net_ctrl_hdr ctrl;
      virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
 -    VirtQueueElement *elem;
      size_t s;
      struct iovec *iov, *iov2;
 -    unsigned int iov_cnt;
 +
 +    if (iov_size(in_sg, in_num) < sizeof(status) ||
 +        iov_size(out_sg, out_num) < sizeof(ctrl)) {
 +        virtio_error(vdev, "virtio-net ctrl missing headers");
 +        return 0;
 +    }
 +
 +    iov2 = iov = g_memdup2(out_sg, sizeof(struct iovec) * out_num);
 +    s = iov_to_buf(iov, out_num, 0, &ctrl, sizeof(ctrl));
 +    iov_discard_front(&iov, &out_num, sizeof(ctrl));
 +    if (s != sizeof(ctrl)) {
 +        status = VIRTIO_NET_ERR;
 +    } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
 +        status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, out_num);
 +    } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
 +        status = virtio_net_handle_mac(n, ctrl.cmd, iov, out_num);
 +    } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
 +        status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, out_num);
 +    } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
 +        status = virtio_net_handle_announce(n, ctrl.cmd, iov, out_num);
 +    } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
 +        status = virtio_net_handle_mq(n, ctrl.cmd, iov, out_num);
 +    } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
 +        status = virtio_net_handle_offloads(n, ctrl.cmd, iov, out_num);
 +    }
 +
 +    s = iov_from_buf(in_sg, in_num, 0, &status, sizeof(status));
 +    assert(s == sizeof(status));
 +
 +    g_free(iov2);
 +    return sizeof(status);
 +}
 +
 +static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 +{
 +    VirtQueueElement *elem;
      for (;;) {
 +        size_t written;
          elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
          if (!elem) {
              break;
          }
 -        if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) ||
 -            iov_size(elem->out_sg, elem->out_num) < sizeof(ctrl)) {
 -            virtio_error(vdev, "virtio-net ctrl missing headers");
 +
 +        written = virtio_net_handle_ctrl_iov(vdev, elem->in_sg, elem->in_num,
 +                                             elem->out_sg, elem->out_num);
 +        if (written > 0) {
 +            virtqueue_push(vq, elem, written);
 +            virtio_notify(vdev, vq);
 +            g_free(elem);
 +        } else {
              virtqueue_detach_element(vq, elem, 0);
              g_free(elem);
              break;
          }
 -
 -        iov_cnt = elem->out_num;
 -        iov2 = iov = g_memdup2(elem->out_sg,
 -                               sizeof(struct iovec) * elem->out_num);
 -        s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
 -        iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
 -        if (s != sizeof(ctrl)) {
 -            status = VIRTIO_NET_ERR;
 -        } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
 -            status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, iov_cnt);
 -        } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
 -            status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
 -        } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
 -            status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
 -        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
 -            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
 -        } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
 -            status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
 -        } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
 -            status = virtio_net_handle_offloads(n, ctrl.cmd, iov, iov_cnt);
 -        }
 -
 -        s = iov_from_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
 -        assert(s == sizeof(status));
 -
 -        virtqueue_push(vq, elem, sizeof(status));
 -        virtio_notify(vdev, vq);
 -        g_free(iov2);
 -        g_free(elem);
      }
  }
 diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/virtio/virtio-net.h
 +++ b/include/hw/virtio/virtio-net.h
@@ -XXX,XX +XXX,XX @@ struct VirtIONet {
      struct EBPFRSSContext ebpf_rss;
  };
 +size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
 +                                  const struct iovec *in_sg, unsigned in_num,
 +                                  const struct iovec *out_sg,
 +                                  unsigned out_num);
  void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
                                     const char *type);
 --
 .7.4
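
For readers unfamiliar with the scatter-gather handling that
virtio_net_handle_ctrl_iov relies on, the following is a minimal sketch of
copying a fixed-size header out of an iovec array. It is not QEMU's
iov_to_buf; demo_iov_to_buf and its signature are hypothetical names used
only for illustration, and only the POSIX struct iovec is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not the QEMU API): copy up to len bytes from the
 * start of a scatter-gather list into a flat buffer. Returns the number
 * of bytes actually copied, which is less than len if the list is short.
 */
static size_t demo_iov_to_buf(const struct iovec *iov, unsigned cnt,
                              void *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    for (unsigned i = 0; i < cnt && done < len; i++) {
        size_t n = iov[i].iov_len;

        if (n > len - done) {
            n = len - done;   /* do not copy past the requested length */
        }
        memcpy((char *)buf + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}
```

A caller can compare the return value with the header size to detect a
truncated request, which is the same kind of check the patch performs with
"s != sizeof(ctrl)".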

-New patch
+[PULL V2 04/25] vdpa: Avoid compiler to squash reads to used idx
+From: Eugenio Pérez <eperezma@redhat.com>
+In the next patch we will allow busypolling of this value. The compiler
+has a running path where shadow_used_idx, last_used_idx, and vring used
+idx are not modified within the same thread that is busypolling, so it
+may squash the repeated reads.
+This was not an issue before since we always cleared the device event
+notifier before checking it, and that could act as a memory barrier.
+However, the busypoll needs something similar to the kernel's READ_ONCE.
+Let's add it here, separated from the polling.
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
+Signed-off-by: Jason Wang <jasowang@redhat.com>
+---
+ hw/virtio/vhost-shadow-virtqueue.c | 3 ++-
+file changed, 2 insertions(+), 1 deletion(-)
+diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.c
++++ b/hw/virtio/vhost-shadow-virtqueue.c
+@@ -XXX,XX +XXX,XX @@ static void vhost_handle_guest_kick_notifier(EventNotifier *n)
+ static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
+ {
++    uint16_t *used_idx = &svq->vring.used->idx;
+     if (svq->last_used_idx != svq->shadow_used_idx) {
+         return true;
+     }
+-    svq->shadow_used_idx = cpu_to_le16(svq->vring.used->idx);
++    svq->shadow_used_idx = cpu_to_le16(*(volatile uint16_t *)used_idx);
+     return svq->last_used_idx != svq->shadow_used_idx;
+ }
+--
+.7.4
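
As an illustration of the READ_ONCE-like read described above, here is a
minimal standalone sketch. The names demo_read_once_u16, demo_ring and
demo_more_used are hypothetical and are not part of QEMU. The volatile
cast only stops the compiler from caching or merging the load; it is not a
hardware memory barrier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read *p through a volatile-qualified pointer so the
 * compiler must emit a load on every call instead of reusing an old value.
 */
static inline uint16_t demo_read_once_u16(const uint16_t *p)
{
    assert(p != NULL);
    return *(const volatile uint16_t *)p;
}

/* Hypothetical stand-in for a ring whose index another agent updates. */
struct demo_ring {
    uint16_t used_idx;
};

/* Returns nonzero if the ring index has moved past what we last saw. */
static int demo_more_used(const struct demo_ring *r, uint16_t last_seen)
{
    return demo_read_once_u16(&r->used_idx) != last_seen;
}
```

A busypolling loop would call demo_more_used repeatedly; without the
volatile access the compiler would be free to hoist the load out of such a
loop.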

-[Qemu-devel] [PULL 05/14] net/filter-mirror.c: Add new option to enable vnet support for filter-redirector
+[PULL V2 05/25] vhost: Reorder vhost_svq_kick
-From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+From: Eugenio Pérez <eperezma@redhat.com>
-We add the vnet_hdr_support option for filter-redirector, default is disabled.
+Future code needs to call it from vhost_svq_add.
 If you use virtio-net-pci net driver or other driver needs vnet_hdr, please enable it.
 Because colo-compare or other modules needs the vnet_hdr_len to parse
 packet, we add this new option send the len to others.
 You can use it for example:
 -object filter-redirector,id=r0,netdev=hn0,queue=tx,outdev=red0,vnet_hdr_support
-Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+No functional change intended.
 Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
 Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- net/filter-mirror.c | 23 +++++++++++++++++++++++
+ hw/virtio/vhost-shadow-virtqueue.c | 28 ++++++++++++++--------------
- qemu-options.hx     |  6 +++---
+file changed, 14 insertions(+), 14 deletions(-)
 files changed, 26 insertions(+), 3 deletions(-)
-diff --git a/net/filter-mirror.c b/net/filter-mirror.c
+diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
 index XXXXXXX..XXXXXXX 100644
---- a/net/filter-mirror.c
+--- a/hw/virtio/vhost-shadow-virtqueue.c
-+++ b/net/filter-mirror.c
++++ b/hw/virtio/vhost-shadow-virtqueue.c
-@@ -XXX,XX +XXX,XX @@ static void filter_redirector_set_outdev(Object *obj,
+@@ -XXX,XX +XXX,XX @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
-     s->outdev = g_strdup(value);
+     return true;
  }
-+static bool filter_redirector_get_vnet_hdr(Object *obj, Error **errp)
++static void vhost_svq_kick(VhostShadowVirtqueue *svq)
 +{
-+    MirrorState *s = FILTER_REDIRECTOR(obj);
++    /*
 +     * We need to expose the available array entries before checking the used
 +     * flags
 +     */
 +    smp_mb();
 +    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
 +        return;
 +    }
 +
-+    return s->vnet_hdr;
++    event_notifier_set(&svq->hdev_kick);
 +}
 +
-+static void filter_redirector_set_vnet_hdr(Object *obj,
+ /**
-+                                           bool value,
+  * Add an element to a SVQ.
-+                                           Error **errp)
+  *
-+{
+@@ -XXX,XX +XXX,XX @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
-+    MirrorState *s = FILTER_REDIRECTOR(obj);
+     return true;
 +
 +    s->vnet_hdr = value;
 +}
 +
  static void filter_mirror_init(Object *obj)
  {
      MirrorState *s = FILTER_MIRROR(obj);
@@ -XXX,XX +XXX,XX @@ static void filter_mirror_init(Object *obj)
  static void filter_redirector_init(Object *obj)
  {
 +    MirrorState *s = FILTER_REDIRECTOR(obj);
 +
      object_property_add_str(obj, "indev", filter_redirector_get_indev,
                              filter_redirector_set_indev, NULL);
      object_property_add_str(obj, "outdev", filter_redirector_get_outdev,
                              filter_redirector_set_outdev, NULL);
 +
 +    s->vnet_hdr = false;
 +    object_property_add_bool(obj, "vnet_hdr_support",
 +                             filter_redirector_get_vnet_hdr,
 +                             filter_redirector_set_vnet_hdr, NULL);
  }
- static void filter_mirror_fini(Object *obj)
+-static void vhost_svq_kick(VhostShadowVirtqueue *svq)
-diff --git a/qemu-options.hx b/qemu-options.hx
+-{
-index XXXXXXX..XXXXXXX 100644
+-    /*
---- a/qemu-options.hx
+-     * We need to expose the available array entries before checking the used
-+++ b/qemu-options.hx
+-     * flags
-@@ -XXX,XX +XXX,XX @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
+-     */
+-    smp_mb();
- filter-mirror on netdev @var{netdevid},mirror net packet to chardev@var{chardevid}, if it has the vnet_hdr_support flag, filter-mirror will mirror packet with vnet_hdr_len.
+-    if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
+-        return;
--@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},
+-    }
--outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
+-
-+@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support]
+-    event_notifier_set(&svq->hdev_kick);
+-}
- filter-redirector on netdev @var{netdevid},redirect filter's net packet to chardev
+-
--@var{chardevid},and redirect indev's packet to filter.
+ /**
-+@var{chardevid},and redirect indev's packet to filter.if it has the vnet_hdr_support flag,
+  * Forward available buffers.
-+filter-redirector will redirect packet with vnet_hdr_len.
+  *
  Create a filter-redirector we need to differ outdev id from indev id, id can not
  be the same. we can just use indev or outdev, but at least one of indev or outdev
  need to be specified.
 --
 .7.4

-New patch
+[PULL V2 06/25] vhost: Move vhost_svq_kick call to vhost_svq_add
+From: Eugenio Pérez <eperezma@redhat.com>
+The series needs to expose vhost_svq_add with full functionality,
+including the kick.
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
+Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
+Signed-off-by: Jason Wang <jasowang@redhat.com>
+---
+ hw/virtio/vhost-shadow-virtqueue.c | 2 +-
+file changed, 1 insertion(+), 1 deletion(-)
+diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.c
++++ b/hw/virtio/vhost-shadow-virtqueue.c
+@@ -XXX,XX +XXX,XX @@ static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+     }
+     svq->ring_id_maps[qemu_head] = elem;
++    vhost_svq_kick(svq);
+     return true;
+ }
+@@ -XXX,XX +XXX,XX @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
+                 /* VQ is broken, just return and ignore any other kicks */
+                 return;
+             }
+-            vhost_svq_kick(svq);
+         }
+         virtio_queue_set_notification(svq->vq, true);
+--
+.7.4

-New patch
+[PULL V2 07/25] vhost: Check for queue full at vhost_svq_add
+From: Eugenio Pérez <eperezma@redhat.com>
+The series needs to expose vhost_svq_add with full functionality,
+including checking for full queue.
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
+Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
+Signed-off-by: Jason Wang <jasowang@redhat.com>
+---
+ hw/virtio/vhost-shadow-virtqueue.c | 59 +++++++++++++++++++++-----------------
+file changed, 33 insertions(+), 26 deletions(-)
+diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.c
++++ b/hw/virtio/vhost-shadow-virtqueue.c
+@@ -XXX,XX +XXX,XX @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
+  * Add an element to a SVQ.
+  *
+  * The caller must check that there is enough slots for the new element. It
+- * takes ownership of the element: In case of failure, it is free and the SVQ
+- * is considered broken.
++ * takes ownership of the element: In case of failure not ENOSPC, it is free.
++ *
++ * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full
+  */
+-static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
++static int vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+ {
+     unsigned qemu_head;
+-    bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
++    unsigned ndescs = elem->in_num + elem->out_num;
++    bool ok;
++
++    if (unlikely(ndescs > vhost_svq_available_slots(svq))) {
++        return -ENOSPC;
++    }
++
++    ok = vhost_svq_add_split(svq, elem, &qemu_head);
+     if (unlikely(!ok)) {
+         g_free(elem);
+-        return false;
++        return -EINVAL;
+     }
+     svq->ring_id_maps[qemu_head] = elem;
+     vhost_svq_kick(svq);
+-    return true;
++    return 0;
+ }
+ /**
+@@ -XXX,XX +XXX,XX @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
+         while (true) {
+             VirtQueueElement *elem;
+-            bool ok;
++            int r;
+             if (svq->next_guest_avail_elem) {
+                 elem = g_steal_pointer(&svq->next_guest_avail_elem);
+@@ -XXX,XX +XXX,XX @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
+                 break;
+             }
+-            if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
+-                /*
+-                 * This condition is possible since a contiguous buffer in GPA
+-                 * does not imply a contiguous buffer in qemu's VA
+-                 * scatter-gather segments. If that happens, the buffer exposed
+-                 * to the device needs to be a chain of descriptors at this
+-                 * moment.
+-                 *
+-                 * SVQ cannot hold more available buffers if we are here:
+-                 * queue the current guest descriptor and ignore further kicks
+-                 * until some elements are used.
+-                 */
+-                svq->next_guest_avail_elem = elem;
+-                return;
+-            }
+-
+-            ok = vhost_svq_add(svq, elem);
+-            if (unlikely(!ok)) {
+-                /* VQ is broken, just return and ignore any other kicks */
++            r = vhost_svq_add(svq, elem);
++            if (unlikely(r != 0)) {
++                if (r == -ENOSPC) {
++                    /*
++                     * This condition is possible since a contiguous buffer in
++                     * GPA does not imply a contiguous buffer in qemu's VA
++                     * scatter-gather segments. If that happens, the buffer
++                     * exposed to the device needs to be a chain of descriptors
++                     * at this moment.
++                     *
++                     * SVQ cannot hold more available buffers if we are here:
++                     * queue the current guest descriptor and ignore kicks
++                     * until some elements are used.
++                     */
++                    svq->next_guest_avail_elem = elem;
++                }
++
++                /* VQ is full or broken, just return and ignore kicks */
+                 return;
+             }
+         }
+--
+.7.4
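
The return-code convention introduced above (0 on success, -ENOSPC when
the queue is full, -EINVAL when the element is invalid) can be sketched in
isolation as follows. This is a simplified model under stated assumptions,
not the SVQ code: demo_queue, demo_add and demo_forward are hypothetical
names, and an element counts as invalid here simply when it is NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical queue model: a slot counter plus one parked element. */
struct demo_queue {
    unsigned free_slots;
    const void *pending;   /* element kept for retry after -ENOSPC */
};

/* Returns 0 on success, -ENOSPC if full, -EINVAL if elem is invalid. */
static int demo_add(struct demo_queue *q, const void *elem, unsigned ndescs)
{
    assert(q != NULL);
    if (ndescs > q->free_slots) {
        return -ENOSPC;
    }
    if (elem == NULL) {
        return -EINVAL;
    }
    q->free_slots -= ndescs;
    return 0;
}

/* Caller side: park the element only when the queue was merely full. */
static int demo_forward(struct demo_queue *q, const void *elem,
                        unsigned ndescs)
{
    int r = demo_add(q, elem, ndescs);

    if (r == -ENOSPC) {
        q->pending = elem;
    }
    return r;
}
```

Distinguishing the two failures lets the caller retry a parked element
once slots are freed, while treating an invalid element as a hard error.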

-[Qemu-devel] [PULL 04/14] net/filter-mirror.c: Make filter mirror support vnet support.
+[PULL V2 08/25] vhost: Decouple vhost_svq_add from VirtQueueElement
-From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+From: Eugenio Pérez <eperezma@redhat.com>
-We add the vnet_hdr_support option for filter-mirror, default is disabled.
+VirtQueueElement comes from the guest, but SVQ is moving towards being able
-If you use virtio-net-pci or other driver needs vnet_hdr, please enable it.
+to modify the element presented to the device without the guest's
-You can use it for example:
+knowledge.
 -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0,vnet_hdr_support
-If it has vnet_hdr_support flag, we will change the sending packet format from
+To do so, make SVQ accept sg buffers directly, instead of using
-struct {int size; const uint8_t buf[];} to {int size; int vnet_hdr_len; const uint8_t buf[];}.
+VirtQueueElement.
 make other module(like colo-compare) know how to parse net packet correctly.
-Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+Add vhost_svq_add_element to maintain element convenience.
 Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
 Acked-by: Jason Wang <jasowang@redhat.com>
 Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- net/filter-mirror.c | 42 +++++++++++++++++++++++++++++++++++++++++-
+ hw/virtio/vhost-shadow-virtqueue.c | 33 ++++++++++++++++++++++-----------
- qemu-options.hx     |  5 ++---
+file changed, 22 insertions(+), 11 deletions(-)
 files changed, 43 insertions(+), 4 deletions(-)
-diff --git a/net/filter-mirror.c b/net/filter-mirror.c
+diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
 index XXXXXXX..XXXXXXX 100644
---- a/net/filter-mirror.c
+--- a/hw/virtio/vhost-shadow-virtqueue.c
-+++ b/net/filter-mirror.c
++++ b/hw/virtio/vhost-shadow-virtqueue.c
-@@ -XXX,XX +XXX,XX @@ typedef struct MirrorState {
+@@ -XXX,XX +XXX,XX @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
-     CharBackend chr_in;
+ }
-     CharBackend chr_out;
-     SocketReadState rs;
+ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
-+    bool vnet_hdr;
+-                                VirtQueueElement *elem, unsigned *head)
- } MirrorState;
++                                const struct iovec *out_sg, size_t out_num,
++                                const struct iovec *in_sg, size_t in_num,
- static int filter_send(MirrorState *s,
++                                unsigned *head)
                         const struct iovec *iov,
                         int iovcnt)
  {
-+    NetFilterState *nf = NETFILTER(s);
+     unsigned avail_idx;
-     int ret = 0;
+     vring_avail_t *avail = svq->vring.avail;
-     ssize_t size = 0;
+     bool ok;
-     uint32_t len = 0;
+-    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
-@@ -XXX,XX +XXX,XX @@ static int filter_send(MirrorState *s,
++    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(out_num, in_num));
-         goto err;
      *head = svq->free_head;
      /* We need some descriptors here */
 -    if (unlikely(!elem->out_num && !elem->in_num)) {
 +    if (unlikely(!out_num && !in_num)) {
          qemu_log_mask(LOG_GUEST_ERROR,
                        "Guest provided element with no descriptors");
          return false;
      }
-+    if (s->vnet_hdr) {
+-    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
-+        /*
+-                                     elem->in_num > 0, false);
-+         * If vnet_hdr = on, we send vnet header len to make other
++    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0,
-+         * module(like colo-compare) know how to parse net
++                                     false);
-+         * packet correctly.
+     if (unlikely(!ok)) {
-+         */
+         return false;
 +        ssize_t vnet_hdr_len;
 +
 +        vnet_hdr_len = nf->netdev->vnet_hdr_len;
 +
 +        len = htonl(vnet_hdr_len);
 +        ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
 +        if (ret != sizeof(len)) {
 +            goto err;
 +        }
 +    }
 +
      buf = g_malloc(size);
      iov_to_buf(iov, iovcnt, 0, buf, size);
      ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
@@ -XXX,XX +XXX,XX @@ static void filter_redirector_setup(NetFilterState *nf, Error **errp)
          }
      }
--    net_socket_rs_init(&s->rs, redirector_rs_finalize, false);
+-    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
-+    net_socket_rs_init(&s->rs, redirector_rs_finalize, s->vnet_hdr);
+-                                     true);
++    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true);
-     if (s->indev) {
+     if (unlikely(!ok)) {
-         chr = qemu_chr_find(s->indev);
+         return false;
@@ -XXX,XX +XXX,XX @@ static void filter_mirror_set_outdev(Object *obj,
      }
+@@ -XXX,XX +XXX,XX @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
+  *
+  * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full
+  */
+-static int vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
++static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
++                          size_t out_num, const struct iovec *in_sg,
++                          size_t in_num, VirtQueueElement *elem)
+ {
+     unsigned qemu_head;
+-    unsigned ndescs = elem->in_num + elem->out_num;
++    unsigned ndescs = in_num + out_num;
+     bool ok;
+     if (unlikely(ndescs > vhost_svq_available_slots(svq))) {
+         return -ENOSPC;
+     }
+-    ok = vhost_svq_add_split(svq, elem, &qemu_head);
++    ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head);
+     if (unlikely(!ok)) {
+         g_free(elem);
+         return -EINVAL;
+@@ -XXX,XX +XXX,XX @@ static int vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+     return 0;
  }
-+static bool filter_mirror_get_vnet_hdr(Object *obj, Error **errp)
++/* Convenience wrapper to add a guest's element to SVQ */
 +static int vhost_svq_add_element(VhostShadowVirtqueue *svq,
 +                                 VirtQueueElement *elem)
 +{
-+    MirrorState *s = FILTER_MIRROR(obj);
++    return vhost_svq_add(svq, elem->out_sg, elem->out_num, elem->in_sg,
-+
++                         elem->in_num, elem);
 +    return s->vnet_hdr;
 +}
 +
-+static void filter_mirror_set_vnet_hdr(Object *obj, bool value, Error **errp)
+ /**
-+{
+  * Forward available buffers.
-+    MirrorState *s = FILTER_MIRROR(obj);
+  *
-+
+@@ -XXX,XX +XXX,XX @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
-+    s->vnet_hdr = value;
+                 break;
-+}
+             }
-+
- static char *filter_redirector_get_outdev(Object *obj, Error **errp)
+-            r = vhost_svq_add(svq, elem);
- {
++            r = vhost_svq_add_element(svq, elem);
-     MirrorState *s = FILTER_REDIRECTOR(obj);
+             if (unlikely(r != 0)) {
-@@ -XXX,XX +XXX,XX @@ static void filter_redirector_set_outdev(Object *obj,
+                 if (r == -ENOSPC) {
+                     /*
  static void filter_mirror_init(Object *obj)
  {
 +    MirrorState *s = FILTER_MIRROR(obj);
 +
      object_property_add_str(obj, "outdev", filter_mirror_get_outdev,
                              filter_mirror_set_outdev, NULL);
 +
 +    s->vnet_hdr = false;
 +    object_property_add_bool(obj, "vnet_hdr_support",
 +                             filter_mirror_get_vnet_hdr,
 +                             filter_mirror_set_vnet_hdr, NULL);
  }
  static void filter_redirector_init(Object *obj)
 diff --git a/qemu-options.hx b/qemu-options.hx
 index XXXXXXX..XXXXXXX 100644
 --- a/qemu-options.hx
 +++ b/qemu-options.hx
@@ -XXX,XX +XXX,XX @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
  @option{tx}: the filter is attached to the transmit queue of the netdev,
               where it will receive packets sent by the netdev.
 -@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
 +@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support]
 -filter-mirror on netdev @var{netdevid},mirror net packet to chardev
 -@var{chardevid}
 +filter-mirror on netdev @var{netdevid},mirror net packet to chardev@var{chardevid}, if it has the vnet_hdr_support flag, filter-mirror will mirror packet with vnet_hdr_len.
  @item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},
  outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
 --
 .7.4
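
As a rough illustration of the decoupling in patch 08/25, the sketch below separates a core "add" routine that takes scatter-gather arrays from a thin wrapper that unpacks an element. ToyQueue, ToyElement, toy_add and toy_add_element are hypothetical stand-ins rather than QEMU types or functions; only the shape of the call chain and the -ENOSPC/-EINVAL convention are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>   /* struct iovec (POSIX) */

/* Hypothetical stand-in for VirtQueueElement: only the fields used here. */
typedef struct ToyElement {
    struct iovec *out_sg;
    size_t out_num;
    struct iovec *in_sg;
    size_t in_num;
} ToyElement;

/* Hypothetical stand-in for the shadow virtqueue: it only counts slots. */
typedef struct ToyQueue {
    size_t free_slots;
    void *last_cookie;   /* what the caller asked us to remember */
} ToyQueue;

/*
 * Core routine: works on raw sg arrays, so callers that have no guest
 * element can still submit buffers. The cookie is stored, not inspected.
 */
int toy_add(ToyQueue *q, const struct iovec *out_sg, size_t out_num,
            const struct iovec *in_sg, size_t in_num, void *cookie)
{
    size_t ndescs = out_num + in_num;

    assert(q != NULL);
    (void)out_sg;   /* a real queue would write descriptors from these */
    (void)in_sg;

    if (ndescs > q->free_slots) {
        return -ENOSPC;   /* queue full: the caller may retry later */
    }
    if (ndescs == 0) {
        return -EINVAL;   /* an element with no descriptors is invalid */
    }

    q->free_slots -= ndescs;
    q->last_cookie = cookie;
    return 0;
}

/* Convenience wrapper: unpack the element and pass it along as the cookie. */
int toy_add_element(ToyQueue *q, ToyElement *elem)
{
    assert(elem != NULL);
    return toy_add(q, elem->out_sg, elem->out_num,
                   elem->in_sg, elem->in_num, elem);
}
```

With this split, a caller that owns its own buffers calls toy_add directly, while the guest path keeps using the one-argument wrapper.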

-New patch
+[PULL V2 09/25] vhost: Add SVQDescState
+From: Eugenio Pérez <eperezma@redhat.com>
+This will allow SVQ to add context to the different queue elements.
+This patch only stores the actual element, no functional change intended.
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
+Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
+Signed-off-by: Jason Wang <jasowang@redhat.com>
+---
+ hw/virtio/vhost-shadow-virtqueue.c | 16 ++++++++--------
+ hw/virtio/vhost-shadow-virtqueue.h |  8 ++++++--
+files changed, 14 insertions(+), 10 deletions(-)
+diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.c
++++ b/hw/virtio/vhost-shadow-virtqueue.c
+@@ -XXX,XX +XXX,XX @@ static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
+         return -EINVAL;
+     }
+-    svq->ring_id_maps[qemu_head] = elem;
++    svq->desc_state[qemu_head].elem = elem;
+     vhost_svq_kick(svq);
+     return 0;
+ }
+@@ -XXX,XX +XXX,XX @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
+         return NULL;
+     }
+-    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
++    if (unlikely(!svq->desc_state[used_elem.id].elem)) {
+         qemu_log_mask(LOG_GUEST_ERROR,
+             "Device %s says index %u is used, but it was not available",
+             svq->vdev->name, used_elem.id);
+         return NULL;
+     }
+-    num = svq->ring_id_maps[used_elem.id]->in_num +
+-          svq->ring_id_maps[used_elem.id]->out_num;
++    num = svq->desc_state[used_elem.id].elem->in_num +
++          svq->desc_state[used_elem.id].elem->out_num;
+     last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
+     svq->desc_next[last_used_chain] = svq->free_head;
+     svq->free_head = used_elem.id;
+     *len = used_elem.len;
+-    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
++    return g_steal_pointer(&svq->desc_state[used_elem.id].elem);
+ }
+ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
+@@ -XXX,XX +XXX,XX @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
+     memset(svq->vring.desc, 0, driver_size);
+     svq->vring.used = qemu_memalign(qemu_real_host_page_size(), device_size);
+     memset(svq->vring.used, 0, device_size);
+-    svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
++    svq->desc_state = g_new0(SVQDescState, svq->vring.num);
+     svq->desc_next = g_new0(uint16_t, svq->vring.num);
+     for (unsigned i = 0; i < svq->vring.num - 1; i++) {
+         svq->desc_next[i] = cpu_to_le16(i + 1);
+@@ -XXX,XX +XXX,XX @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
+     for (unsigned i = 0; i < svq->vring.num; ++i) {
+         g_autofree VirtQueueElement *elem = NULL;
+-        elem = g_steal_pointer(&svq->ring_id_maps[i]);
++        elem = g_steal_pointer(&svq->desc_state[i].elem);
+         if (elem) {
+             virtqueue_detach_element(svq->vq, elem, 0);
+         }
+@@ -XXX,XX +XXX,XX @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
+     }
+     svq->vq = NULL;
+     g_free(svq->desc_next);
+-    g_free(svq->ring_id_maps);
++    g_free(svq->desc_state);
+     qemu_vfree(svq->vring.desc);
+     qemu_vfree(svq->vring.used);
+ }
+diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.h
++++ b/hw/virtio/vhost-shadow-virtqueue.h
+@@ -XXX,XX +XXX,XX @@
+ #include "standard-headers/linux/vhost_types.h"
+ #include "hw/virtio/vhost-iova-tree.h"
++typedef struct SVQDescState {
++    VirtQueueElement *elem;
++} SVQDescState;
++
+ /* Shadow virtqueue to relay notifications */
+ typedef struct VhostShadowVirtqueue {
+     /* Shadow vring */
+@@ -XXX,XX +XXX,XX @@ typedef struct VhostShadowVirtqueue {
+     /* IOVA mapping */
+     VhostIOVATree *iova_tree;
+-    /* Map for use the guest's descriptors */
+-    VirtQueueElement **ring_id_maps;
++    /* SVQ vring descriptors state */
++    SVQDescState *desc_state;
+     /* Next VirtQueue element that guest made available */
+     VirtQueueElement *next_guest_avail_elem;
+--
+.7.4

-[Qemu-devel] [PULL 03/14] net/filter-mirror.c: Introduce parameter for filter_send()
+[PULL V2 10/25] vhost: Track number of descs in SVQDescState
-From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+From: Eugenio Pérez <eperezma@redhat.com>
-This patch change the filter_send() parameter from CharBackend to MirrorState,
+A guest's buffer that is contiguous in GPA may need multiple descriptors in
-we can get more information like vnet_hdr(We use it to support packet with vnet_header).
+qemu's VA, so SVQ should track its length separately.
-Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
 Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- net/filter-mirror.c | 10 +++++-----
+ hw/virtio/vhost-shadow-virtqueue.c | 4 ++--
-file changed, 5 insertions(+), 5 deletions(-)
+ hw/virtio/vhost-shadow-virtqueue.h | 6 ++++++
 files changed, 8 insertions(+), 2 deletions(-)
-diff --git a/net/filter-mirror.c b/net/filter-mirror.c
+diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
 index XXXXXXX..XXXXXXX 100644
---- a/net/filter-mirror.c
+--- a/hw/virtio/vhost-shadow-virtqueue.c
-+++ b/net/filter-mirror.c
++++ b/hw/virtio/vhost-shadow-virtqueue.c
-@@ -XXX,XX +XXX,XX @@ typedef struct MirrorState {
+@@ -XXX,XX +XXX,XX @@ static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
      SocketReadState rs;
  } MirrorState;
 -static int filter_send(CharBackend *chr_out,
 +static int filter_send(MirrorState *s,
                         const struct iovec *iov,
                         int iovcnt)
  {
@@ -XXX,XX +XXX,XX @@ static int filter_send(CharBackend *chr_out,
      }
-     len = htonl(size);
+     svq->desc_state[qemu_head].elem = elem;
--    ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)&len, sizeof(len));
++    svq->desc_state[qemu_head].ndescs = ndescs;
-+    ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
+     vhost_svq_kick(svq);
-     if (ret != sizeof(len)) {
+     return 0;
-         goto err;
+ }
@@ -XXX,XX +XXX,XX @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
          return NULL;
      }
-     buf = g_malloc(size);
+-    num = svq->desc_state[used_elem.id].elem->in_num +
-     iov_to_buf(iov, iovcnt, 0, buf, size);
+-          svq->desc_state[used_elem.id].elem->out_num;
--    ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)buf, size);
++    num = svq->desc_state[used_elem.id].ndescs;
-+    ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
+     last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
-     g_free(buf);
+     svq->desc_next[last_used_chain] = svq->free_head;
-     if (ret != size) {
+     svq->free_head = used_elem.id;
-         goto err;
+diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
-@@ -XXX,XX +XXX,XX @@ static ssize_t filter_mirror_receive_iov(NetFilterState *nf,
+index XXXXXXX..XXXXXXX 100644
-     MirrorState *s = FILTER_MIRROR(nf);
+--- a/hw/virtio/vhost-shadow-virtqueue.h
-     int ret;
++++ b/hw/virtio/vhost-shadow-virtqueue.h
+@@ -XXX,XX +XXX,XX @@
--    ret = filter_send(&s->chr_out, iov, iovcnt);
-+    ret = filter_send(s, iov, iovcnt);
+ typedef struct SVQDescState {
-     if (ret) {
+     VirtQueueElement *elem;
-         error_report("filter mirror send failed(%s)", strerror(-ret));
++
-     }
++    /*
-@@ -XXX,XX +XXX,XX @@ static ssize_t filter_redirector_receive_iov(NetFilterState *nf,
++     * Number of descriptors exposed to the device. May or may not match
-     int ret;
++     * guest's
++     */
-     if (qemu_chr_fe_backend_connected(&s->chr_out)) {
++    unsigned int ndescs;
--        ret = filter_send(&s->chr_out, iov, iovcnt);
+ } SVQDescState;
-+        ret = filter_send(s, iov, iovcnt);
-         if (ret) {
+ /* Shadow virtqueue to relay notifications */
              error_report("filter redirector send failed(%s)", strerror(-ret));
          }
 --
 .7.4
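
To make the reasoning behind patch 10/25 concrete, here is a minimal sketch of a descriptor free list that records, per chain head, how many descriptors were actually exposed. ToyRing and the toy_ring_* functions are hypothetical and much simpler than the real shadow virtqueue; the point is only that the count stored at add time, not the guest element's in_num + out_num, is what the free-list splice walks when the chain is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_RING_SIZE 8

/* Per-head state, modelled loosely on SVQDescState. */
typedef struct ToyDescState {
    void *elem;        /* opaque element owned by the caller */
    unsigned ndescs;   /* descriptors exposed to the device for this chain */
} ToyDescState;

typedef struct ToyRing {
    uint16_t desc_next[TOY_RING_SIZE];   /* singly linked descriptor list */
    ToyDescState desc_state[TOY_RING_SIZE];
    uint16_t free_head;
    unsigned free_count;
} ToyRing;

void toy_ring_init(ToyRing *r)
{
    memset(r, 0, sizeof(*r));
    for (unsigned i = 0; i < TOY_RING_SIZE - 1; i++) {
        r->desc_next[i] = (uint16_t)(i + 1);
    }
    r->free_count = TOY_RING_SIZE;
}

/* Take ndescs descriptors off the free list and remember the count. */
int toy_ring_add(ToyRing *r, void *elem, unsigned ndescs, uint16_t *head)
{
    if (ndescs > r->free_count) {
        return -ENOSPC;
    }
    if (ndescs == 0 || elem == NULL) {
        return -EINVAL;
    }

    uint16_t next = r->free_head;
    *head = next;
    for (unsigned n = 0; n < ndescs; n++) {
        next = r->desc_next[next];
    }
    r->free_head = next;
    r->free_count -= ndescs;
    r->desc_state[*head].elem = elem;
    r->desc_state[*head].ndescs = ndescs;
    return 0;
}

/* Return a used chain to the free list, walking the stored ndescs. */
void *toy_ring_get(ToyRing *r, uint16_t head)
{
    assert(head < TOY_RING_SIZE);
    ToyDescState *s = &r->desc_state[head];
    if (s->elem == NULL) {
        return NULL;   /* this head was never made available */
    }

    uint16_t last = head;
    for (unsigned n = 1; n < s->ndescs; n++) {
        last = r->desc_next[last];
    }
    r->desc_next[last] = r->free_head;
    r->free_head = head;
    r->free_count += s->ndescs;

    void *elem = s->elem;
    s->elem = NULL;
    return elem;
}
```

If the count were recomputed from the element on the way back, a chain that needed more descriptors than the guest's element describes would only be partially returned to the free list.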

-New patch
+[PULL V2 11/25] vhost: add vhost_svq_push_elem
+From: Eugenio Pérez <eperezma@redhat.com>
+This function allows external SVQ users to return guest's available
+buffers.
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
+Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
+Signed-off-by: Jason Wang <jasowang@redhat.com>
+---
+ hw/virtio/vhost-shadow-virtqueue.c | 16 ++++++++++++++++
+ hw/virtio/vhost-shadow-virtqueue.h |  3 +++
+files changed, 19 insertions(+)
+diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.c
++++ b/hw/virtio/vhost-shadow-virtqueue.c
+@@ -XXX,XX +XXX,XX @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
+     return g_steal_pointer(&svq->desc_state[used_elem.id].elem);
+ }
++/**
++ * Push an element to SVQ, returning it to the guest.
++ */
++void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
++                         const VirtQueueElement *elem, uint32_t len)
++{
++    virtqueue_push(svq->vq, elem, len);
++    if (svq->next_guest_avail_elem) {
++        /*
++         * Avail ring was full when vhost_svq_flush was called, so it's a
++         * good moment to make more descriptors available if possible.
++         */
++        vhost_handle_guest_kick(svq);
++    }
++}
++
+ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
+                             bool check_for_avail_queue)
+ {
+diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.h
++++ b/hw/virtio/vhost-shadow-virtqueue.h
+@@ -XXX,XX +XXX,XX @@ typedef struct VhostShadowVirtqueue {
+ bool vhost_svq_valid_features(uint64_t features, Error **errp);
++void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
++                         const VirtQueueElement *elem, uint32_t len);
++
+ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
+ void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
+ void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
+--
+.7.4

-New patch
+[PULL V2 12/25] vhost: Expose vhost_svq_add
+From: Eugenio Pérez <eperezma@redhat.com>
+This allows external parts of SVQ to forward custom buffers to the
+device.
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
+Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
+Signed-off-by: Jason Wang <jasowang@redhat.com>
+---
+ hw/virtio/vhost-shadow-virtqueue.c | 6 +++---
+ hw/virtio/vhost-shadow-virtqueue.h | 3 +++
+files changed, 6 insertions(+), 3 deletions(-)
+diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.c
++++ b/hw/virtio/vhost-shadow-virtqueue.c
+@@ -XXX,XX +XXX,XX @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
+  *
+  * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full
+  */
+-static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
+-                          size_t out_num, const struct iovec *in_sg,
+-                          size_t in_num, VirtQueueElement *elem)
++int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
++                  size_t out_num, const struct iovec *in_sg, size_t in_num,
++                  VirtQueueElement *elem)
+ {
+     unsigned qemu_head;
+     unsigned ndescs = in_num + out_num;
+diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.h
++++ b/hw/virtio/vhost-shadow-virtqueue.h
+@@ -XXX,XX +XXX,XX @@ bool vhost_svq_valid_features(uint64_t features, Error **errp);
+ void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
+                          const VirtQueueElement *elem, uint32_t len);
++int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
++                  size_t out_num, const struct iovec *in_sg, size_t in_num,
++                  VirtQueueElement *elem);
+ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
+ void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
+--
+.7.4

-New patch
+[PULL V2 13/25] vhost: add vhost_svq_poll
+From: Eugenio Pérez <eperezma@redhat.com>
+It allows the Shadow Control VirtQueue to wait for the device to use the
+available buffers.
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
+Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
+Signed-off-by: Jason Wang <jasowang@redhat.com>
+---
+ hw/virtio/vhost-shadow-virtqueue.c | 27 +++++++++++++++++++++++++++
+ hw/virtio/vhost-shadow-virtqueue.h |  1 +
+files changed, 28 insertions(+)
+diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.c
++++ b/hw/virtio/vhost-shadow-virtqueue.c
+@@ -XXX,XX +XXX,XX @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
+ }
+ /**
++ * Poll the SVQ for one device used buffer.
++ *
++ * This function races with the main event loop's SVQ polling, so extra
++ * synchronization is needed.
++ *
++ * Return the length written by the device.
++ */
++size_t vhost_svq_poll(VhostShadowVirtqueue *svq)
++{
++    int64_t start_us = g_get_monotonic_time();
++    do {
++        uint32_t len;
++        VirtQueueElement *elem = vhost_svq_get_buf(svq, &len);
++        if (elem) {
++            return len;
++        }
++
++        if (unlikely(g_get_monotonic_time() - start_us > 10e6)) {
++            return 0;
++        }
++
++        /* Make sure we read new used_idx */
++        smp_rmb();
++    } while (true);
++}
++
++/**
+  * Forward used buffers.
+  *
+  * @n: hdev call event notifier, the one that device set to notify svq.
+diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.h
++++ b/hw/virtio/vhost-shadow-virtqueue.h
+@@ -XXX,XX +XXX,XX @@ void vhost_svq_push_elem(VhostShadowVirtqueue *svq,
+ int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
+                   size_t out_num, const struct iovec *in_sg, size_t in_num,
+                   VirtQueueElement *elem);
++size_t vhost_svq_poll(VhostShadowVirtqueue *svq);
+ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
+ void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
+--
+.7.4
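
The bounded busy-wait used by vhost_svq_poll in patch 13/25 can be sketched in isolation as follows. Everything here is hypothetical (toy_poll, ToyTryGet, ToyDevice); the timeout is a parameter instead of the fixed ten seconds in the patch, the memory barrier is left out because the sketch is single-threaded, and a timeout is reported through the return value rather than as a length of 0.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical "is there a used buffer?" probe supplied by the caller. */
typedef bool (*ToyTryGet)(void *opaque, uint32_t *len);

static int64_t toy_monotonic_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/*
 * Spin until try_get reports a used buffer or timeout_us elapses.
 * Returns true and fills *len on success, false on timeout.
 */
bool toy_poll(ToyTryGet try_get, void *opaque, int64_t timeout_us,
              uint32_t *len)
{
    assert(try_get != NULL);
    int64_t start_us = toy_monotonic_us();

    for (;;) {
        if (try_get(opaque, len)) {
            return true;
        }
        if (toy_monotonic_us() - start_us > timeout_us) {
            return false;
        }
    }
}

/* Fake device: becomes ready after polls_left unsuccessful probes. */
typedef struct ToyDevice {
    unsigned polls_left;
    uint32_t written;
} ToyDevice;

bool toy_device_try_get(void *opaque, uint32_t *len)
{
    ToyDevice *d = opaque;
    if (d->polls_left > 0) {
        d->polls_left--;
        return false;
    }
    *len = d->written;
    return true;
}
```

Reporting the timeout out of band is a choice made for this sketch; the patch itself returns the written length and uses 0 for the timeout case.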

-New patch
+[PULL V2 14/25] vhost: Add svq avail_handler callback
+From: Eugenio Pérez <eperezma@redhat.com>
+This allows external handlers to be aware of new buffers that the guest
+places in the virtqueue.
+When this callback is defined the ownership of the guest's virtqueue
+element is transferred to the callback. This means that if the user
+wants to forward the descriptor it needs to manually inject it. The
+callback is also free to process the command by itself and use the
+element with svq_push.
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
+Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
+Signed-off-by: Jason Wang <jasowang@redhat.com>
+---
+ hw/virtio/vhost-shadow-virtqueue.c | 14 ++++++++++++--
+ hw/virtio/vhost-shadow-virtqueue.h | 31 ++++++++++++++++++++++++++++++-
+ hw/virtio/vhost-vdpa.c             |  3 ++-
+files changed, 44 insertions(+), 4 deletions(-)
+diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.c
++++ b/hw/virtio/vhost-shadow-virtqueue.c
+@@ -XXX,XX +XXX,XX @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
+                 break;
+             }
+-            r = vhost_svq_add_element(svq, elem);
++            if (svq->ops) {
++                r = svq->ops->avail_handler(svq, elem, svq->ops_opaque);
++            } else {
++                r = vhost_svq_add_element(svq, elem);
++            }
+             if (unlikely(r != 0)) {
+                 if (r == -ENOSPC) {
+                     /*
+@@ -XXX,XX +XXX,XX @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
+  * shadow methods and file descriptors.
+  *
+  * @iova_tree: Tree to perform descriptors translations
++ * @ops: SVQ owner callbacks
++ * @ops_opaque: ops opaque pointer
+  *
+  * Returns the new virtqueue or NULL.
+  *
+  * In case of error, reason is reported through error_report.
+  */
+-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
++VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
++                                    const VhostShadowVirtqueueOps *ops,
++                                    void *ops_opaque)
+ {
+     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
+     int r;
+@@ -XXX,XX +XXX,XX @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
+     event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
+     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
+     svq->iova_tree = iova_tree;
++    svq->ops = ops;
++    svq->ops_opaque = ops_opaque;
+     return g_steal_pointer(&svq);
+ err_init_hdev_call:
+diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-shadow-virtqueue.h
++++ b/hw/virtio/vhost-shadow-virtqueue.h
+@@ -XXX,XX +XXX,XX @@ typedef struct SVQDescState {
+     unsigned int ndescs;
+ } SVQDescState;
++typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
++
++/**
++ * Callback to handle an avail buffer.
++ *
++ * @svq:  Shadow virtqueue
++ * @elem:  Element placed in the queue by the guest
++ * @vq_callback_opaque:  Opaque
++ *
++ * Returns 0 if the vq is running as expected.
++ *
++ * Note that ownership of elem is transferred to the callback.
++ */
++typedef int (*VirtQueueAvailCallback)(VhostShadowVirtqueue *svq,
++                                      VirtQueueElement *elem,
++                                      void *vq_callback_opaque);
++
++typedef struct VhostShadowVirtqueueOps {
++    VirtQueueAvailCallback avail_handler;
++} VhostShadowVirtqueueOps;
++
+ /* Shadow virtqueue to relay notifications */
+ typedef struct VhostShadowVirtqueue {
+     /* Shadow vring */
+@@ -XXX,XX +XXX,XX @@ typedef struct VhostShadowVirtqueue {
+      */
+     uint16_t *desc_next;
++    /* Caller callbacks */
++    const VhostShadowVirtqueueOps *ops;
++
++    /* Caller callbacks opaque */
++    void *ops_opaque;
++
+     /* Next head to expose to the device */
+     uint16_t shadow_avail_idx;
+@@ -XXX,XX +XXX,XX @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
+                      VirtQueue *vq);
+ void vhost_svq_stop(VhostShadowVirtqueue *svq);
+-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);
++VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
++                                    const VhostShadowVirtqueueOps *ops,
++                                    void *ops_opaque);
+ void vhost_svq_free(gpointer vq);
+ G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
+diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-vdpa.c
++++ b/hw/virtio/vhost-vdpa.c
+@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
+     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
+     for (unsigned n = 0; n < hdev->nvqs; ++n) {
+-        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree);
++        g_autoptr(VhostShadowVirtqueue) svq;
++        svq = vhost_svq_new(v->iova_tree, NULL, NULL);
+         if (unlikely(!svq)) {
+             error_setg(errp, "Cannot create svq %u", n);
+             return -1;
+--
+.7.4
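
The dispatch added by patch 14/25 follows the usual "ops table plus opaque pointer" pattern, sketched below with hypothetical names (ToySvq, ToySvqOps, toy_handle_avail). Unlike the patch, which only tests svq->ops, this sketch also checks that the handler pointer itself is set before calling it; that extra check is an assumption of the example, not something the series does.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToySvq ToySvq;

/* Callback type: receives the queue, the element, and the owner's state. */
typedef int (*ToyAvailCallback)(ToySvq *svq, void *elem, void *opaque);

typedef struct ToySvqOps {
    ToyAvailCallback avail_handler;
} ToySvqOps;

struct ToySvq {
    const ToySvqOps *ops;   /* NULL means "use the default path" */
    void *ops_opaque;       /* passed back to the callback untouched */
    unsigned forwarded;     /* elements sent down the default path */
};

/* Default path: forward the element as-is. */
static int toy_default_add(ToySvq *svq, void *elem)
{
    (void)elem;
    svq->forwarded++;
    return 0;
}

/* Hand a newly available element to the owner callback, if any. */
int toy_handle_avail(ToySvq *svq, void *elem)
{
    assert(svq != NULL);
    if (svq->ops && svq->ops->avail_handler) {
        /* Ownership of elem moves to the callback from here on. */
        return svq->ops->avail_handler(svq, elem, svq->ops_opaque);
    }
    return toy_default_add(svq, elem);
}

/* Example owner callback: counts intercepted elements in its own state. */
int toy_count_handler(ToySvq *svq, void *elem, void *opaque)
{
    (void)svq;
    (void)elem;
    unsigned *intercepted = opaque;
    (*intercepted)++;
    return 0;
}
```

Passing NULL for the ops table, as the vhost_vdpa_init_svq hunk does, keeps the existing forwarding behaviour.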

-New patch
+[PULL V2 15/25] vdpa: Export vhost_vdpa_dma_map and unmap calls
+From: Eugenio Pérez <eperezma@redhat.com>
+Shadow CVQ will copy buffers on qemu VA, so we avoid TOCTOU attacks from
+the guest that could set a different state in qemu device model and vdpa
+device.
+To do so, it needs to be able to map these new buffers to the device.
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
+Acked-by: Jason Wang <jasowang@redhat.com>
+Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
+Signed-off-by: Jason Wang <jasowang@redhat.com>
+---
+ hw/virtio/vhost-vdpa.c         | 7 +++----
+ include/hw/virtio/vhost-vdpa.h | 4 ++++
+files changed, 7 insertions(+), 4 deletions(-)
+diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
+index XXXXXXX..XXXXXXX 100644
+--- a/hw/virtio/vhost-vdpa.c
++++ b/hw/virtio/vhost-vdpa.c
+@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
+     return false;
+ }
+-static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
+-                              void *vaddr, bool readonly)
++int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
++                       void *vaddr, bool readonly)
+ {
+     struct vhost_msg_v2 msg = {};
+     int fd = v->device_fd;
+@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
+     return ret;
+ }
+-static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
+-                                hwaddr size)
++int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size)
+ {
+     struct vhost_msg_v2 msg = {};
+     int fd = v->device_fd;
+diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
+index XXXXXXX..XXXXXXX 100644
+--- a/include/hw/virtio/vhost-vdpa.h
++++ b/include/hw/virtio/vhost-vdpa.h
+@@ -XXX,XX +XXX,XX @@ typedef struct vhost_vdpa {
+     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
+ } VhostVDPA;
++int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
++                       void *vaddr, bool readonly);
++int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
++
+ #endif
+--
+.7.4
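
The TOCTOU concern in patch 15/25 can be illustrated with a small copy-then-validate sketch. ToyCmd, toy_snapshot_cmd and toy_cmd_valid are hypothetical; the example assumes a fixed-size command and a made-up validity rule, and it stops before the step where the real code would map the private copy for the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_CMD_MAX 1

/* Hypothetical fixed-size control command. */
typedef struct ToyCmd {
    uint8_t cls;
    uint8_t cmd;
    uint8_t payload[6];
} ToyCmd;

/*
 * Copy the command out of guest-writable memory once. All later checks
 * and uses must read the private copy, never the shared buffer.
 */
bool toy_snapshot_cmd(const void *shared, size_t shared_len, ToyCmd *priv)
{
    assert(priv != NULL);
    if (shared == NULL || shared_len < sizeof(*priv)) {
        return false;
    }
    memcpy(priv, shared, sizeof(*priv));
    return true;
}

/* Made-up validity rule for the sketch: only commands 0..TOY_CMD_MAX. */
bool toy_cmd_valid(const ToyCmd *cmd)
{
    return cmd->cmd <= TOY_CMD_MAX;
}
```

Because validation and use both operate on the snapshot, a guest that rewrites the shared buffer after the check cannot make the device model and the device see different commands.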

-[Qemu-devel] [PULL 12/14] docs/colo-proxy.txt: Update colo-proxy usage of net driver with vnet_header
+[PULL V2 16/25] vhost-net-vdpa: add stubs for when no virtio-net device is present
-From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+From: Eugenio Pérez <eperezma@redhat.com>
-Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+net/vhost-vdpa.c will need functions that are declared in
 vhost-shadow-virtqueue.c, which in turn needs functions from virtio-net.c.
 Copy the vhost-vdpa-stub.c code so
 only the constructor net_init_vhost_vdpa needs to be defined.
 Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- docs/colo-proxy.txt | 26 ++++++++++++++++++++++++++
+ net/meson.build       |  3 ++-
-file changed, 26 insertions(+)
+ net/vhost-vdpa-stub.c | 21 +++++++++++++++++++++
 files changed, 23 insertions(+), 1 deletion(-)
  create mode 100644 net/vhost-vdpa-stub.c
-diff --git a/docs/colo-proxy.txt b/docs/colo-proxy.txt
+diff --git a/net/meson.build b/net/meson.build
 index XXXXXXX..XXXXXXX 100644
---- a/docs/colo-proxy.txt
+--- a/net/meson.build
-+++ b/docs/colo-proxy.txt
++++ b/net/meson.build
-@@ -XXX,XX +XXX,XX @@ Secondary(ip:3.3.3.8):
+@@ -XXX,XX +XXX,XX @@ endif
- -chardev socket,id=red1,host=3.3.3.3,port=9004
+ softmmu_ss.add(when: 'CONFIG_POSIX', if_true: files(tap_posix))
- -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
+ softmmu_ss.add(when: 'CONFIG_WIN32', if_true: files('tap-win32.c'))
- -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
+ if have_vhost_net_vdpa
-+-object filter-rewriter,id=f3,netdev=hn0,queue=all
+-  softmmu_ss.add(files('vhost-vdpa.c'))
 +  softmmu_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('vhost-vdpa.c'), if_false: files('vhost-vdpa-stub.c'))
 +  softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-vdpa-stub.c'))
  endif
  vmnet_files = files(
 diff --git a/net/vhost-vdpa-stub.c b/net/vhost-vdpa-stub.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/net/vhost-vdpa-stub.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * vhost-vdpa-stub.c
 + *
 + * Copyright (c) 2022 Red Hat, Inc.
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2 or later.
 + * See the COPYING file in the top-level directory.
 + *
 + */
 +
-+If you want to use virtio-net-pci or other driver with vnet_header:
++#include "qemu/osdep.h"
 +#include "clients.h"
 +#include "net/vhost-vdpa.h"
 +#include "qapi/error.h"
 +
-+Primary(ip:3.3.3.3):
++int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
-+-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
++                        NetClientState *peer, Error **errp)
-+-device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
++{
-+-chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
++    error_setg(errp, "vhost-vdpa requires frontend driver virtio-net-*");
-+-chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
++    return -1;
-+-chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
++}
 +-chardev socket,id=compare0-0,host=3.3.3.3,port=9001
 +-chardev socket,id=compare_out,host=3.3.3.3,port=9005,server,nowait
 +-chardev socket,id=compare_out0,host=3.3.3.3,port=9005
 +-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0,vnet_hdr_support
 +-object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out,vnet_hdr_support
 +-object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0,vnet_hdr_support
 +-object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,vnet_hdr_support
 +
 +Secondary(ip:3.3.3.8):
 +-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
 +-device e1000,netdev=hn0,mac=52:a4:00:12:78:66
 +-chardev socket,id=red0,host=3.3.3.3,port=9003
 +-chardev socket,id=red1,host=3.3.3.3,port=9004
 +-object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0,vnet_hdr_support
 +-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1,vnet_hdr_support
 +-object filter-rewriter,id=f3,netdev=hn0,queue=all,vnet_hdr_support
  Note:
    a.COLO-proxy must work with COLO-frame and Block-replication.
 --
 .7.4

-[Qemu-devel] [PULL 11/14] net/filter-rewriter.c: Make filter-rewriter support vnet_hdr_len
+[PULL V2 17/25] vdpa: manual forward CVQ buffers
-From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+From: Eugenio Pérez <eperezma@redhat.com>
-We add the vnet_hdr_support option for filter-rewriter, default is disabled.
+Do a simple forwarding of CVQ buffers, the same work SVQ could do but
-If you use virtio-net-pci or other driver needs vnet_hdr, please enable it.
+through callbacks. No functional change intended.
 You can use it for example:
 -object filter-rewriter,id=rew0,netdev=hn0,queue=all,vnet_hdr_support
-We get the vnet_hdr_len from NetClientState that make us
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
-parse net packet correctly.
+Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- net/filter-rewriter.c | 37 ++++++++++++++++++++++++++++++++++++-
+ hw/virtio/vhost-vdpa.c         |  3 ++-
- qemu-options.hx       |  4 ++--
+ include/hw/virtio/vhost-vdpa.h |  3 +++
-files changed, 38 insertions(+), 3 deletions(-)
+ net/vhost-vdpa.c               | 58 ++++++++++++++++++++++++++++++++++++++++++
 files changed, 63 insertions(+), 1 deletion(-)
-diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
+diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
 index XXXXXXX..XXXXXXX 100644
---- a/net/filter-rewriter.c
+--- a/hw/virtio/vhost-vdpa.c
-+++ b/net/filter-rewriter.c
++++ b/hw/virtio/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
      for (unsigned n = 0; n < hdev->nvqs; ++n) {
          g_autoptr(VhostShadowVirtqueue) svq;
 -        svq = vhost_svq_new(v->iova_tree, NULL, NULL);
 +        svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops,
 +                            v->shadow_vq_ops_opaque);
          if (unlikely(!svq)) {
              error_setg(errp, "Cannot create svq %u", n);
              return -1;
 diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/virtio/vhost-vdpa.h
 +++ b/include/hw/virtio/vhost-vdpa.h
 @@ -XXX,XX +XXX,XX @@
- #include "qemu-common.h"
+ #include <gmodule.h>
  #include "hw/virtio/vhost-iova-tree.h"
 +#include "hw/virtio/vhost-shadow-virtqueue.h"
  #include "hw/virtio/virtio.h"
  #include "standard-headers/linux/vhost_types.h"
@@ -XXX,XX +XXX,XX @@ typedef struct vhost_vdpa {
      /* IOVA mapping used by the Shadow Virtqueue */
      VhostIOVATree *iova_tree;
      GPtrArray *shadow_vqs;
 +    const VhostShadowVirtqueueOps *shadow_vq_ops;
 +    void *shadow_vq_ops_opaque;
      struct vhost_dev *dev;
      VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
  } VhostVDPA;
 diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
 index XXXXXXX..XXXXXXX 100644
 --- a/net/vhost-vdpa.c
 +++ b/net/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/osdep.h"
  #include "clients.h"
 +#include "hw/virtio/virtio-net.h"
  #include "net/vhost_net.h"
  #include "net/vhost-vdpa.h"
  #include "hw/virtio/vhost-vdpa.h"
  #include "qemu/config-file.h"
  #include "qemu/error-report.h"
 +#include "qemu/log.h"
 +#include "qemu/memalign.h"
  #include "qemu/option.h"
  #include "qapi/error.h"
- #include "qapi/qmp/qerror.h"
+ #include <linux/vhost.h>
-+#include "qemu/error-report.h"
+@@ -XXX,XX +XXX,XX @@ static NetClientInfo net_vhost_vdpa_info = {
- #include "qapi-visit.h"
+         .check_peer_type = vhost_vdpa_check_peer_type,
- #include "qom/object.h"
+ };
- #include "qemu/main-loop.h"
-@@ -XXX,XX +XXX,XX @@ typedef struct RewriterState {
++/**
-     NetQueue *incoming_queue;
++ * Forward buffer for the moment.
-     /* hashtable to save connection */
++ */
-     GHashTable *connection_track_table;
++static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
-+    bool vnet_hdr;
++                                            VirtQueueElement *elem,
- } RewriterState;
++                                            void *opaque)
++{
- static void filter_rewriter_flush(NetFilterState *nf)
++    unsigned int n = elem->out_num + elem->in_num;
-@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
++    g_autofree struct iovec *dev_buffers = g_new(struct iovec, n);
-     ConnectionKey key;
++    size_t in_len, dev_written;
-     Packet *pkt;
++    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
-     ssize_t size = iov_size(iov, iovcnt);
++    int r;
 +    ssize_t vnet_hdr_len = 0;
      char *buf = g_malloc0(size);
      iov_to_buf(iov, iovcnt, 0, buf, size);
 -    pkt = packet_new(buf, size, 0);
 +
-+    if (s->vnet_hdr) {
++    memcpy(dev_buffers, elem->out_sg, elem->out_num);
-+        vnet_hdr_len = nf->netdev->vnet_hdr_len;
++    memcpy(dev_buffers + elem->out_num, elem->in_sg, elem->in_num);
 +
 +    r = vhost_svq_add(svq, &dev_buffers[0], elem->out_num, &dev_buffers[1],
 +                      elem->in_num, elem);
 +    if (unlikely(r != 0)) {
 +        if (unlikely(r == -ENOSPC)) {
 +            qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
 +                          __func__);
 +        }
 +        goto out;
 +    }
 +
-+    pkt = packet_new(buf, size, vnet_hdr_len);
++    /*
-     g_free(buf);
++     * We can poll here since we've had BQL from the time we sent the
++     * descriptor. Also, we need to take the answer before SVQ pulls by itself,
-     /*
++     * when BQL is released
-@@ -XXX,XX +XXX,XX @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
++     */
-     s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
++    dev_written = vhost_svq_poll(svq);
- }
++    if (unlikely(dev_written < sizeof(status))) {
++        error_report("Insufficient written data (%zu)", dev_written);
-+static bool filter_rewriter_get_vnet_hdr(Object *obj, Error **errp)
++    }
 +{
 +    RewriterState *s = FILTER_COLO_REWRITER(obj);
 +
-+    return s->vnet_hdr;
++out:
 +    in_len = iov_from_buf(elem->in_sg, elem->in_num, 0, &status,
 +                          sizeof(status));
 +    if (unlikely(in_len < sizeof(status))) {
 +        error_report("Bad device CVQ written length");
 +    }
 +    vhost_svq_push_elem(svq, elem, MIN(in_len, sizeof(status)));
 +    g_free(elem);
 +    return r;
 +}
 +
-+static void filter_rewriter_set_vnet_hdr(Object *obj,
++static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
-+                                         bool value,
++    .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
-+                                         Error **errp)
++};
 +{
 +    RewriterState *s = FILTER_COLO_REWRITER(obj);
 +
-+    s->vnet_hdr = value;
+ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
-+}
+                                            const char *device,
-+
+                                            const char *name,
-+static void filter_rewriter_init(Object *obj)
+@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
-+{
-+    RewriterState *s = FILTER_COLO_REWRITER(obj);
+     s->vhost_vdpa.device_fd = vdpa_device_fd;
-+
+     s->vhost_vdpa.index = queue_pair_index;
-+    s->vnet_hdr = false;
++    if (!is_datapath) {
-+    object_property_add_bool(obj, "vnet_hdr_support",
++        s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
-+                             filter_rewriter_get_vnet_hdr,
++        s->vhost_vdpa.shadow_vq_ops_opaque = s;
-+                             filter_rewriter_set_vnet_hdr, NULL);
++    }
-+}
+     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
-+
+     if (ret) {
- static void colo_rewriter_class_init(ObjectClass *oc, void *data)
+         qemu_del_net_client(nc);
  {
      NetFilterClass *nfc = NETFILTER_CLASS(oc);
@@ -XXX,XX +XXX,XX @@ static const TypeInfo colo_rewriter_info = {
      .name = TYPE_FILTER_REWRITER,
      .parent = TYPE_NETFILTER,
      .class_init = colo_rewriter_class_init,
 +    .instance_init = filter_rewriter_init,
      .instance_size = sizeof(RewriterState),
  };
 diff --git a/qemu-options.hx b/qemu-options.hx
 index XXXXXXX..XXXXXXX 100644
 --- a/qemu-options.hx
 +++ b/qemu-options.hx
@@ -XXX,XX +XXX,XX @@ Create a filter-redirector we need to differ outdev id from indev id, id can not
  be the same. we can just use indev or outdev, but at least one of indev or outdev
  need to be specified.
 -@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid}[,queue=@var{all|rx|tx}]
 +@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},queue=@var{all|rx|tx},[vnet_hdr_support]
  Filter-rewriter is a part of COLO project.It will rewrite tcp packet to
  secondary from primary to keep secondary tcp connection,and rewrite
  tcp packet to primary from secondary make tcp packet can be handled by
 -client.
 +client. If it has the vnet_hdr_support flag, it can parse packets with a vnet header.
  usage:
  colo secondary:
 --
 .7.4
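
The callback wiring in this patch (an ops table plus an opaque pointer handed to vhost_svq_new) can be sketched in isolation as follows. This is only an illustration under stated assumptions: every Demo* type and demo_* function is hypothetical and stands in for VhostShadowVirtqueue, VirtQueueElement and VhostShadowVirtqueueOps, and the dispatch function shows one plausible way a queue could choose between its default forwarding path and a registered avail_handler. It is not the SVQ implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for VhostShadowVirtqueue and VirtQueueElement. */
typedef struct DemoQueue DemoQueue;
typedef struct DemoElem { int id; } DemoElem;

/* Callback invoked when the guest makes an element available. */
typedef int (*DemoAvailHandler)(DemoQueue *q, DemoElem *elem, void *opaque);

typedef struct DemoQueueOps {
    DemoAvailHandler avail_handler;
} DemoQueueOps;

struct DemoQueue {
    const DemoQueueOps *ops;  /* NULL means "use the default path" */
    void *ops_opaque;         /* passed back unchanged to the handler */
    int forwarded;            /* elements handled by the default path */
};

/* Dispatch one available element to the handler, or forward it by default. */
static int demo_queue_handle_avail(DemoQueue *q, DemoElem *elem)
{
    assert(q != NULL && elem != NULL);
    if (q->ops && q->ops->avail_handler) {
        return q->ops->avail_handler(q, elem, q->ops_opaque);
    }
    q->forwarded++;
    return 0;
}

/* Hypothetical owner state, playing the role of VhostVDPAState. */
typedef struct DemoNetState { int ctrl_cmds_seen; } DemoNetState;

static int demo_net_handle_ctrl_avail(DemoQueue *q, DemoElem *elem,
                                      void *opaque)
{
    DemoNetState *s = opaque;

    (void)q;
    (void)elem;
    s->ctrl_cmds_seen++;
    return 0;
}

static const DemoQueueOps demo_net_ops = {
    .avail_handler = demo_net_handle_ctrl_avail,
};
```

A queue created with `.ops = &demo_net_ops` and `.ops_opaque` pointing at its owner routes every element through the handler, while a queue with `.ops = NULL` keeps the default path. That mirrors how only the non-datapath (control) queue gets shadow_vq_ops assigned in net_vhost_vdpa_init.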

-[Qemu-devel] [PULL 01/14] net: Add vnet_hdr_len arguments in NetClientState
+[PULL V2 18/25] vdpa: Buffer CVQ support on shadow virtqueue
-From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+From: Eugenio Pérez <eperezma@redhat.com>
-Add vnet_hdr_len arguments in NetClientState
+Introduce the control virtqueue support for vDPA shadow virtqueue. This
-that make other module get real vnet_hdr_len easily.
+is needed for advanced networking features like rx filtering.
-Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+Virtio-net control VQ copies the descriptors to qemu's VA, so we avoid
 TOCTOU with the guest's or device's memory every time there is a device
 model change.  Otherwise, the guest could change the memory content in
 the time between qemu reading it and the device reading it.
 To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR is
 implemented.  If the virtio-net driver changes MAC the virtio-net device
 model will be updated with the new one, and a rx filtering change event
 will be raised.
 More cvq commands could be added here straightforwardly but they have
 not been tested.
 Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
 Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- include/net/net.h | 1 +
+ net/vhost-vdpa.c | 213 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
- net/net.c         | 1 +
+file changed, 205 insertions(+), 8 deletions(-)
-files changed, 2 insertions(+)
+diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
 diff --git a/include/net/net.h b/include/net/net.h
 index XXXXXXX..XXXXXXX 100644
---- a/include/net/net.h
+--- a/net/vhost-vdpa.c
-+++ b/include/net/net.h
++++ b/net/vhost-vdpa.c
-@@ -XXX,XX +XXX,XX @@ struct NetClientState {
+@@ -XXX,XX +XXX,XX @@ typedef struct VhostVDPAState {
-     unsigned int queue_index;
+     NetClientState nc;
-     unsigned rxfilter_notify_enabled:1;
+     struct vhost_vdpa vhost_vdpa;
-     int vring_enable;
+     VHostNetState *vhost_net;
-+    int vnet_hdr_len;
++
-     QTAILQ_HEAD(NetFilterHead, NetFilterState) filters;
++    /* Control commands shadow buffers */
 +    void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
      bool started;
  } VhostVDPAState;
@@ -XXX,XX +XXX,XX @@ static void vhost_vdpa_cleanup(NetClientState *nc)
  {
      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
 +    qemu_vfree(s->cvq_cmd_out_buffer);
 +    qemu_vfree(s->cvq_cmd_in_buffer);
      if (s->vhost_net) {
          vhost_net_cleanup(s->vhost_net);
          g_free(s->vhost_net);
@@ -XXX,XX +XXX,XX @@ static NetClientInfo net_vhost_vdpa_info = {
          .check_peer_type = vhost_vdpa_check_peer_type,
  };
-diff --git a/net/net.c b/net/net.c
++static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
-index XXXXXXX..XXXXXXX 100644
++{
---- a/net/net.c
++    VhostIOVATree *tree = v->iova_tree;
-+++ b/net/net.c
++    DMAMap needle = {
-@@ -XXX,XX +XXX,XX @@ void qemu_set_vnet_hdr_len(NetClientState *nc, int len)
++        /*
-         return;
++         * No need to specify size or to look for more translations since
 +         * this contiguous chunk was allocated by us.
 +         */
 +        .translated_addr = (hwaddr)(uintptr_t)addr,
 +    };
 +    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);
 +    int r;
 +
 +    if (unlikely(!map)) {
 +        error_report("Cannot locate expected map");
 +        return;
 +    }
 +
 +    r = vhost_vdpa_dma_unmap(v, map->iova, map->size + 1);
 +    if (unlikely(r != 0)) {
 +        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
 +    }
 +
 +    vhost_iova_tree_remove(tree, map);
 +}
 +
 +static size_t vhost_vdpa_net_cvq_cmd_len(void)
 +{
 +    /*
 +     * MAC_TABLE_SET is the ctrl command that produces the longest out buffer.
 +     * In buffer is always 1 byte, so it should fit here
 +     */
 +    return sizeof(struct virtio_net_ctrl_hdr) +
 +           2 * sizeof(struct virtio_net_ctrl_mac) +
 +           MAC_TABLE_ENTRIES * ETH_ALEN;
 +}
 +
 +static size_t vhost_vdpa_net_cvq_cmd_page_len(void)
 +{
 +    return ROUND_UP(vhost_vdpa_net_cvq_cmd_len(), qemu_real_host_page_size());
 +}
 +
 +/** Copy and map a guest buffer. */
 +static bool vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v,
 +                                   const struct iovec *out_data,
 +                                   size_t out_num, size_t data_len, void *buf,
 +                                   size_t *written, bool write)
 +{
 +    DMAMap map = {};
 +    int r;
 +
 +    if (unlikely(!data_len)) {
 +        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid length of %s buffer\n",
 +                      __func__, write ? "in" : "out");
 +        return false;
 +    }
 +
 +    *written = iov_to_buf(out_data, out_num, 0, buf, data_len);
 +    map.translated_addr = (hwaddr)(uintptr_t)buf;
 +    map.size = vhost_vdpa_net_cvq_cmd_page_len() - 1;
 +    map.perm = write ? IOMMU_RW : IOMMU_RO;
 +    r = vhost_iova_tree_map_alloc(v->iova_tree, &map);
 +    if (unlikely(r != IOVA_OK)) {
 +        error_report("Cannot map injected element");
 +        return false;
 +    }
 +
 +    r = vhost_vdpa_dma_map(v, map.iova, vhost_vdpa_net_cvq_cmd_page_len(), buf,
 +                           !write);
 +    if (unlikely(r < 0)) {
 +        goto dma_map_err;
 +    }
 +
 +    return true;
 +
 +dma_map_err:
 +    vhost_iova_tree_remove(v->iova_tree, &map);
 +    return false;
 +}
 +
  /**
 - * Forward buffer for the moment.
 + * Copy the guest element into a dedicated buffer suitable to be sent to NIC
 + *
 + * @iov: [0] is the out buffer, [1] is the in one
 + */
 +static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
 +                                        VirtQueueElement *elem,
 +                                        struct iovec *iov)
 +{
 +    size_t in_copied;
 +    bool ok;
 +
 +    iov[0].iov_base = s->cvq_cmd_out_buffer;
 +    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
 +                                vhost_vdpa_net_cvq_cmd_len(), iov[0].iov_base,
 +                                &iov[0].iov_len, false);
 +    if (unlikely(!ok)) {
 +        return false;
 +    }
 +
 +    iov[1].iov_base = s->cvq_cmd_in_buffer;
 +    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, NULL, 0,
 +                                sizeof(virtio_net_ctrl_ack), iov[1].iov_base,
 +                                &in_copied, true);
 +    if (unlikely(!ok)) {
 +        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
 +        return false;
 +    }
 +
 +    iov[1].iov_len = sizeof(virtio_net_ctrl_ack);
 +    return true;
 +}
 +
 +/**
 + * Do not forward commands not supported by SVQ. Otherwise, the device could
 + * accept it and qemu would not know how to update the device model.
 + */
 +static bool vhost_vdpa_net_cvq_validate_cmd(const struct iovec *out,
 +                                            size_t out_num)
 +{
 +    struct virtio_net_ctrl_hdr ctrl;
 +    size_t n;
 +
 +    n = iov_to_buf(out, out_num, 0, &ctrl, sizeof(ctrl));
 +    if (unlikely(n < sizeof(ctrl))) {
 +        qemu_log_mask(LOG_GUEST_ERROR,
 +                      "%s: invalid length of out buffer %zu\n", __func__, n);
 +        return false;
 +    }
 +
 +    switch (ctrl.class) {
 +    case VIRTIO_NET_CTRL_MAC:
 +        switch (ctrl.cmd) {
 +        case VIRTIO_NET_CTRL_MAC_ADDR_SET:
 +            return true;
 +        default:
 +            qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid mac cmd %u\n",
 +                          __func__, ctrl.cmd);
 +        };
 +        break;
 +    default:
 +        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid control class %u\n",
 +                      __func__, ctrl.class);
 +    };
 +
 +    return false;
 +}
 +
 +/**
 + * Validate and copy control virtqueue commands.
 + *
 + * Following QEMU guidelines, we offer a copy of the buffers to the device to
 + * prevent TOCTOU bugs.
   */
  static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
                                              VirtQueueElement *elem,
                                              void *opaque)
  {
 -    unsigned int n = elem->out_num + elem->in_num;
 -    g_autofree struct iovec *dev_buffers = g_new(struct iovec, n);
 +    VhostVDPAState *s = opaque;
      size_t in_len, dev_written;
      virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
 -    int r;
 +    /* out and in buffers sent to the device */
 +    struct iovec dev_buffers[2] = {
 +        { .iov_base = s->cvq_cmd_out_buffer },
 +        { .iov_base = s->cvq_cmd_in_buffer },
 +    };
 +    /* in buffer used for device model */
 +    const struct iovec in = {
 +        .iov_base = &status,
 +        .iov_len = sizeof(status),
 +    };
 +    int r = -EINVAL;
 +    bool ok;
 +
 +    ok = vhost_vdpa_net_cvq_map_elem(s, elem, dev_buffers);
 +    if (unlikely(!ok)) {
 +        goto out;
 +    }
 -    memcpy(dev_buffers, elem->out_sg, elem->out_num);
 -    memcpy(dev_buffers + elem->out_num, elem->in_sg, elem->in_num);
 +    ok = vhost_vdpa_net_cvq_validate_cmd(&dev_buffers[0], 1);
 +    if (unlikely(!ok)) {
 +        goto out;
 +    }
 -    r = vhost_svq_add(svq, &dev_buffers[0], elem->out_num, &dev_buffers[1],
 -                      elem->in_num, elem);
 +    r = vhost_svq_add(svq, &dev_buffers[0], 1, &dev_buffers[1], 1, elem);
      if (unlikely(r != 0)) {
          if (unlikely(r == -ENOSPC)) {
              qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
      dev_written = vhost_svq_poll(svq);
      if (unlikely(dev_written < sizeof(status))) {
          error_report("Insufficient written data (%zu)", dev_written);
 +        goto out;
 +    }
 +
 +    memcpy(&status, dev_buffers[1].iov_base, sizeof(status));
 +    if (status != VIRTIO_NET_OK) {
 +        goto out;
 +    }
 +
 +    status = VIRTIO_NET_ERR;
 +    virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, dev_buffers, 1);
 +    if (status != VIRTIO_NET_OK) {
 +        error_report("Bad CVQ processing in model");
      }
-+    nc->vnet_hdr_len = len;
+ out:
-     nc->info->set_vnet_hdr_len(nc, len);
+@@ -XXX,XX +XXX,XX @@ out:
      }
      vhost_svq_push_elem(svq, elem, MIN(in_len, sizeof(status)));
      g_free(elem);
 +    if (dev_buffers[0].iov_base) {
 +        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, dev_buffers[0].iov_base);
 +    }
 +    if (dev_buffers[1].iov_base) {
 +        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, dev_buffers[1].iov_base);
 +    }
      return r;
  }
+@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
+     s->vhost_vdpa.device_fd = vdpa_device_fd;
+     s->vhost_vdpa.index = queue_pair_index;
+     if (!is_datapath) {
++        s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
++                                            vhost_vdpa_net_cvq_cmd_page_len());
++        memset(s->cvq_cmd_out_buffer, 0, vhost_vdpa_net_cvq_cmd_page_len());
++        s->cvq_cmd_in_buffer = qemu_memalign(qemu_real_host_page_size(),
++                                            vhost_vdpa_net_cvq_cmd_page_len());
++        memset(s->cvq_cmd_in_buffer, 0, vhost_vdpa_net_cvq_cmd_page_len());
++
+         s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
+         s->vhost_vdpa.shadow_vq_ops_opaque = s;
+     }
 --
 .7.4
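
The TOCTOU argument in the commit message (copy the guest buffer into memory owned by qemu, then validate and forward only the copy) can be sketched as below. This is a simplified illustration, not the patch's code: the demo_* helpers, struct demo_ctrl_hdr and the DEMO_* constants are hypothetical stand-ins for iov_to_buf, struct virtio_net_ctrl_hdr and the VIRTIO_NET_CTRL_* values, and demo_round_up only mimics the page rounding done by vhost_vdpa_net_cvq_cmd_page_len.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical two-byte command header, loosely modelled on the CVQ header. */
struct demo_ctrl_hdr {
    uint8_t cls;
    uint8_t cmd;
};

#define DEMO_CLASS_MAC        1
#define DEMO_CMD_MAC_ADDR_SET 1

/* Round len up to a multiple of align; align must be a power of two. */
static size_t demo_round_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Gather at most dst_len bytes from a scatter-gather list into dst. */
static size_t demo_iov_to_buf(const struct iovec *iov, size_t iov_cnt,
                              void *dst, size_t dst_len)
{
    size_t done = 0;

    for (size_t i = 0; i < iov_cnt && done < dst_len; i++) {
        size_t n = iov[i].iov_len;

        if (n > dst_len - done) {
            n = dst_len - done;
        }
        memcpy((uint8_t *)dst + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}

/* Validate the private copy only; guest memory is never read again. */
static bool demo_validate_cmd(const void *copy, size_t len)
{
    struct demo_ctrl_hdr hdr;

    if (len < sizeof(hdr)) {
        return false;
    }
    memcpy(&hdr, copy, sizeof(hdr));
    return hdr.cls == DEMO_CLASS_MAC && hdr.cmd == DEMO_CMD_MAC_ADDR_SET;
}
```

Because validation and every later read use the private copy, a guest that rewrites its own buffer after the copy cannot change what was checked or what the device is given.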

-[Qemu-devel] [PULL 10/14] net/colo-compare.c: Add vnet packet's tcp/udp/icmp compare
+[PULL V2 19/25] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
-From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+From: Eugenio Pérez <eperezma@redhat.com>
-COLO-Proxy just focus on packet payload, so we skip vnet header.
+Knowing the device features is needed for CVQ SVQ, so SVQ knows whether it
 can handle all commands. Extract the feature query from
 vhost_vdpa_get_max_queue_pairs so we can reuse it.
-Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
 Acked-by: Jason Wang <jasowang@redhat.com>
 Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- net/colo-compare.c | 8 ++++++--
+ net/vhost-vdpa.c | 30 ++++++++++++++++++++----------
-file changed, 6 insertions(+), 2 deletions(-)
+file changed, 20 insertions(+), 10 deletions(-)
-diff --git a/net/colo-compare.c b/net/colo-compare.c
+diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
 index XXXXXXX..XXXXXXX 100644
---- a/net/colo-compare.c
+--- a/net/vhost-vdpa.c
-+++ b/net/colo-compare.c
++++ b/net/vhost-vdpa.c
-@@ -XXX,XX +XXX,XX @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset)
+@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
-                                    sec_ip_src, sec_ip_dst);
+     return nc;
  }
 -static int vhost_vdpa_get_max_queue_pairs(int fd, int *has_cvq, Error **errp)
 +static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
 +{
 +    int ret = ioctl(fd, VHOST_GET_FEATURES, features);
 +    if (unlikely(ret < 0)) {
 +        error_setg_errno(errp, errno,
 +                         "Fail to query features from vhost-vDPA device");
 +    }
 +    return ret;
 +}
 +
 +static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
 +                                          int *has_cvq, Error **errp)
  {
      unsigned long config_size = offsetof(struct vhost_vdpa_config, buf);
      g_autofree struct vhost_vdpa_config *config = NULL;
      __virtio16 *max_queue_pairs;
 -    uint64_t features;
      int ret;
 -    ret = ioctl(fd, VHOST_GET_FEATURES, &features);
 -    if (ret) {
 -        error_setg(errp, "Fail to query features from vhost-vDPA device");
 -        return ret;
 -    }
 -
      if (features & (1 << VIRTIO_NET_F_CTRL_VQ)) {
          *has_cvq = 1;
      } else {
@@ -XXX,XX +XXX,XX @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
                          NetClientState *peer, Error **errp)
  {
      const NetdevVhostVDPAOptions *opts;
 +    uint64_t features;
      int vdpa_device_fd;
      g_autofree NetClientState **ncs = NULL;
      NetClientState *nc;
 -    int queue_pairs, i, has_cvq = 0;
 +    int queue_pairs, r, i, has_cvq = 0;
      assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
      opts = &netdev->u.vhost_vdpa;
@@ -XXX,XX +XXX,XX @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
          return -errno;
      }
-+    offset = ppkt->vnet_hdr_len + offset;
+-    queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd,
 +    r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp);
 +    if (unlikely(r < 0)) {
 +        return r;
 +    }
 +
-     if (ppkt->size == spkt->size) {
++    queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, features,
--        return memcmp(ppkt->data + offset, spkt->data + offset,
+                                                  &has_cvq, errp);
-+        return memcmp(ppkt->data + offset,
+     if (queue_pairs < 0) {
-+                      spkt->data + offset,
+         qemu_close(vdpa_device_fd);
                        spkt->size - offset);
      } else {
          trace_colo_compare_main("Net packet size are not the same");
@@ -XXX,XX +XXX,XX @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
       */
      if (ptcp->th_off > 5) {
          ptrdiff_t tcp_offset;
 +
          tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data
 -                     + (ptcp->th_off * 4);
 +                     + (ptcp->th_off * 4) - ppkt->vnet_hdr_len;
          res = colo_packet_compare_common(ppkt, spkt, tcp_offset);
      } else if (ptcp->th_sum == stcp->th_sum) {
          res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);
 --
 .7.4
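
Once the feature word is fetched separately, callers can test individual bits of the 64-bit value. A minimal sketch of such a test follows, assuming a hypothetical helper demo_has_feature that is not part of the patch. The bit number 17 for VIRTIO_NET_F_CTRL_VQ comes from the virtio specification, and the shift is done on a 64-bit constant so that bits at position 32 and above are also handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit number of VIRTIO_NET_F_CTRL_VQ as defined by the virtio spec. */
#define DEMO_VIRTIO_NET_F_CTRL_VQ 17

/* Return true if the given feature bit is set in a 64-bit feature word. */
static bool demo_has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return (features & (UINT64_C(1) << bit)) != 0;
}
```

With this shape, a has_cvq style flag is simply `demo_has_feature(features, DEMO_VIRTIO_NET_F_CTRL_VQ)`.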

-[Qemu-devel] [PULL 08/14] net/colo-compare.c: Make colo-compare support vnet_hdr_len
+[PULL V2 20/25] vdpa: Add device migration blocker
-From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+From: Eugenio Pérez <eperezma@redhat.com>
-We add the vnet_hdr_support option for colo-compare, default is disabled.
+Since the vhost-vdpa device is exposing _F_LOG, add a migration blocker if
-If you use virtio-net-pci or other driver needs vnet_hdr, please enable it.
+it uses CVQ.
 You can use it for example:
 -object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,vnet_hdr_support
-COLO-compare can get vnet header length from filter,
+However, qemu is able to migrate simple devices with no CVQ as long as
-Add vnet_hdr_len to struct packet and output packet with
+they use SVQ. To allow it, add a placeholder error to vhost_vdpa, and
-the vnet_hdr_len.
+only add it to vhost_dev when used. The vhost_dev machinery places the migration
 blocker if needed.
-Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
 Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- net/colo-compare.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++-------
+ hw/virtio/vhost-vdpa.c         | 15 +++++++++++++++
- qemu-options.hx    |  4 ++--
+ include/hw/virtio/vhost-vdpa.h |  1 +
-files changed, 55 insertions(+), 9 deletions(-)
+files changed, 16 insertions(+)
-diff --git a/net/colo-compare.c b/net/colo-compare.c
+diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
 index XXXXXXX..XXXXXXX 100644
---- a/net/colo-compare.c
+--- a/hw/virtio/vhost-vdpa.c
-+++ b/net/colo-compare.c
++++ b/hw/virtio/vhost-vdpa.c
-@@ -XXX,XX +XXX,XX @@ typedef struct CompareState {
+@@ -XXX,XX +XXX,XX @@
-     CharBackend chr_out;
+ #include "hw/virtio/vhost-shadow-virtqueue.h"
-     SocketReadState pri_rs;
+ #include "hw/virtio/vhost-vdpa.h"
-     SocketReadState sec_rs;
+ #include "exec/address-spaces.h"
-+    bool vnet_hdr;
++#include "migration/blocker.h"
+ #include "qemu/cutils.h"
-     /* connection list: the connections belonged to this NIC could be found
+ #include "qemu/main-loop.h"
-      * in this list.
+ #include "cpu.h"
-@@ -XXX,XX +XXX,XX @@ enum {
+@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
+         return true;
  static int compare_chr_send(CompareState *s,
                              const uint8_t *buf,
 -                            uint32_t size);
 +                            uint32_t size,
 +                            uint32_t vnet_hdr_len);
  static gint seq_sorter(Packet *a, Packet *b, gpointer data)
  {
@@ -XXX,XX +XXX,XX @@ static void colo_compare_connection(void *opaque, void *user_data)
          }
          if (result) {
 -            ret = compare_chr_send(s, pkt->data, pkt->size);
 +            ret = compare_chr_send(s,
 +                                   pkt->data,
 +                                   pkt->size,
 +                                   pkt->vnet_hdr_len);
              if (ret < 0) {
                  error_report("colo_send_primary_packet failed");
              }
@@ -XXX,XX +XXX,XX @@ static void colo_compare_connection(void *opaque, void *user_data)
  static int compare_chr_send(CompareState *s,
                              const uint8_t *buf,
 -                            uint32_t size)
 +                            uint32_t size,
 +                            uint32_t vnet_hdr_len)
  {
      int ret = 0;
      uint32_t len = htonl(size);
@@ -XXX,XX +XXX,XX @@ static int compare_chr_send(CompareState *s,
          goto err;
      }
-+    if (s->vnet_hdr) {
++    if (v->migration_blocker) {
-+        /*
++        int r = migrate_add_blocker(v->migration_blocker, &err);
-+         * We send vnet header len make other module(like filter-redirector)
++        if (unlikely(r < 0)) {
-+         * know how to parse net packet correctly.
++            return false;
 +         */
 +        len = htonl(vnet_hdr_len);
 +        ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
 +        if (ret != sizeof(len)) {
 +            goto err;
 +        }
 +    }
 +
-     ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
+     for (i = 0; i < v->shadow_vqs->len; ++i) {
-     if (ret != size) {
+         VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
-         goto err;
+         VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
-@@ -XXX,XX +XXX,XX @@ static void compare_set_outdev(Object *obj, const char *value, Error **errp)
+@@ -XXX,XX +XXX,XX @@ err:
-     s->outdev = g_strdup(value);
+         vhost_svq_stop(svq);
      }
 +    if (v->migration_blocker) {
 +        migrate_del_blocker(v->migration_blocker);
 +    }
 +
      return false;
  }
-+static bool compare_get_vnet_hdr(Object *obj, Error **errp)
+@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
-+{
+         }
 +    CompareState *s = COLO_COMPARE(obj);
 +
 +    return s->vnet_hdr;
 +}
 +
 +static void compare_set_vnet_hdr(Object *obj,
 +                                 bool value,
 +                                 Error **errp)
 +{
 +    CompareState *s = COLO_COMPARE(obj);
 +
 +    s->vnet_hdr = value;
 +}
 +
  static void compare_pri_rs_finalize(SocketReadState *pri_rs)
  {
      CompareState *s = container_of(pri_rs, CompareState, pri_rs);
      if (packet_enqueue(s, PRIMARY_IN)) {
          trace_colo_compare_main("primary: unsupported packet in");
 -        compare_chr_send(s, pri_rs->buf, pri_rs->packet_len);
 +        compare_chr_send(s,
 +                         pri_rs->buf,
 +                         pri_rs->packet_len,
 +                         pri_rs->vnet_hdr_len);
      } else {
          /* compare connection */
          g_queue_foreach(&s->conn_list, colo_compare_connection, s);
@@ -XXX,XX +XXX,XX @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
          return;
      }
--    net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, false);
++    if (v->migration_blocker) {
--    net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, false);
++        migrate_del_blocker(v->migration_blocker);
-+    net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, s->vnet_hdr);
++    }
-+    net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, s->vnet_hdr);
+     return true;
      g_queue_init(&s->conn_list);
@@ -XXX,XX +XXX,XX @@ static void colo_flush_packets(void *opaque, void *user_data)
      while (!g_queue_is_empty(&conn->primary_list)) {
          pkt = g_queue_pop_head(&conn->primary_list);
 -        compare_chr_send(s, pkt->data, pkt->size);
 +        compare_chr_send(s,
 +                         pkt->data,
 +                         pkt->size,
 +                         pkt->vnet_hdr_len);
          packet_destroy(pkt, NULL);
      }
      while (!g_queue_is_empty(&conn->secondary_list)) {
@@ -XXX,XX +XXX,XX @@ static void colo_compare_class_init(ObjectClass *oc, void *data)
  static void colo_compare_init(Object *obj)
  {
 +    CompareState *s = COLO_COMPARE(obj);
 +
      object_property_add_str(obj, "primary_in",
                              compare_get_pri_indev, compare_set_pri_indev,
                              NULL);
@@ -XXX,XX +XXX,XX @@ static void colo_compare_init(Object *obj)
      object_property_add_str(obj, "outdev",
                              compare_get_outdev, compare_set_outdev,
                              NULL);
 +
 +    s->vnet_hdr = false;
 +    object_property_add_bool(obj, "vnet_hdr_support", compare_get_vnet_hdr,
 +                             compare_set_vnet_hdr, NULL);
  }
- static void colo_compare_finalize(Object *obj)
+diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
 diff --git a/qemu-options.hx b/qemu-options.hx
 index XXXXXXX..XXXXXXX 100644
---- a/qemu-options.hx
+--- a/include/hw/virtio/vhost-vdpa.h
-+++ b/qemu-options.hx
++++ b/include/hw/virtio/vhost-vdpa.h
-@@ -XXX,XX +XXX,XX @@ Dump the network traffic on netdev @var{dev} to the file specified by
+@@ -XXX,XX +XXX,XX @@ typedef struct vhost_vdpa {
- The file format is libpcap, so it can be analyzed with tools such as tcpdump
+     bool shadow_vqs_enabled;
- or Wireshark.
+     /* IOVA mapping used by the Shadow Virtqueue */
+     VhostIOVATree *iova_tree;
--@item -object colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid},
++    Error *migration_blocker;
--outdev=@var{chardevid}
+     GPtrArray *shadow_vqs;
-+@item -object colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid},outdev=@var{chardevid}[,vnet_hdr_support]
+     const VhostShadowVirtqueueOps *shadow_vq_ops;
+     void *shadow_vq_ops_opaque;
  Colo-compare gets packets from primary_in@var{chardevid} and secondary_in@var{chardevid}, then compares each primary packet with
  the secondary packet. If the packets are the same, the primary
  packet is output to outdev@var{chardevid}; otherwise colo-frame is notified
  to do a checkpoint and the primary packet is sent to outdev@var{chardevid}.
 +If the vnet_hdr_support flag is set, colo-compare sends and receives packets with vnet_hdr_len.
  It must be used together with filter-mirror and filter-redirector.
 --
 .7.4

-[Qemu-devel] [PULL 02/14] net/net.c: Add vnet_hdr support in SocketReadState
+[PULL V2 21/25] vdpa: Add x-svq to NetdevVhostVDPAOptions
-From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+From: Eugenio Pérez <eperezma@redhat.com>
-We add a flag to decide whether net_fill_rstate() need read
+Finally offering the possibility to enable SVQ from the command line.
 the vnet_hdr_len or not.
-Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
-Suggested-by: Jason Wang <jasowang@redhat.com>
+Acked-by: Markus Armbruster <armbru@redhat.com>
 Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- include/net/net.h   |  9 +++++++--
+ net/vhost-vdpa.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
- net/colo-compare.c  |  4 ++--
+ qapi/net.json    |  9 ++++++-
- net/filter-mirror.c |  2 +-
+files changed, 77 insertions(+), 4 deletions(-)
  net/net.c           | 36 ++++++++++++++++++++++++++++++++----
  net/socket.c        |  8 ++++----
 files changed, 46 insertions(+), 13 deletions(-)
-diff --git a/include/net/net.h b/include/net/net.h
+diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/net/net.h
+--- a/net/vhost-vdpa.c
-+++ b/include/net/net.h
++++ b/net/vhost-vdpa.c
-@@ -XXX,XX +XXX,XX @@ typedef struct NICState {
+@@ -XXX,XX +XXX,XX @@ const int vdpa_feature_bits[] = {
- } NICState;
+     VHOST_INVALID_FEATURE_BIT
  struct SocketReadState {
 -    int state; /* 0 = getting length, 1 = getting data */
 +    /* 0 = getting length, 1 = getting vnet header length, 2 = getting data */
 +    int state;
 +    /* This flag decides whether to read the vnet_hdr_len field */
 +    bool vnet_hdr;
      uint32_t index;
      uint32_t packet_len;
 +    uint32_t vnet_hdr_len;
      uint8_t buf[NET_BUFSIZE];
      SocketReadStateFinalize *finalize;
  };
-@@ -XXX,XX +XXX,XX @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
- void print_net_client(Monitor *mon, NetClientState *nc);
++/** Supported device specific feature bits with SVQ */
- void hmp_info_network(Monitor *mon, const QDict *qdict);
++static const uint64_t vdpa_svq_device_features =
- void net_socket_rs_init(SocketReadState *rs,
++    BIT_ULL(VIRTIO_NET_F_CSUM) |
--                        SocketReadStateFinalize *finalize);
++    BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) |
-+                        SocketReadStateFinalize *finalize,
++    BIT_ULL(VIRTIO_NET_F_MTU) |
-+                        bool vnet_hdr);
++    BIT_ULL(VIRTIO_NET_F_MAC) |
++    BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) |
- /* NIC info */
++    BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) |
++    BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |
-diff --git a/net/colo-compare.c b/net/colo-compare.c
++    BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
-index XXXXXXX..XXXXXXX 100644
++    BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |
---- a/net/colo-compare.c
++    BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |
-+++ b/net/colo-compare.c
++    BIT_ULL(VIRTIO_NET_F_HOST_ECN) |
-@@ -XXX,XX +XXX,XX @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
++    BIT_ULL(VIRTIO_NET_F_HOST_UFO) |
-         return;
++    BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |
 +    BIT_ULL(VIRTIO_NET_F_STATUS) |
 +    BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
 +    BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
 +    BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
 +    BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
 +    BIT_ULL(VIRTIO_NET_F_STANDBY);
 +
  VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
  {
      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -XXX,XX +XXX,XX @@ err_init:
  static void vhost_vdpa_cleanup(NetClientState *nc)
  {
      VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
 +    struct vhost_dev *dev = &s->vhost_net->dev;
      qemu_vfree(s->cvq_cmd_out_buffer);
      qemu_vfree(s->cvq_cmd_in_buffer);
 +    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
 +        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
 +    }
      if (s->vhost_net) {
          vhost_net_cleanup(s->vhost_net);
          g_free(s->vhost_net);
@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                             int vdpa_device_fd,
                                             int queue_pair_index,
                                             int nvqs,
 -                                           bool is_datapath)
 +                                           bool is_datapath,
 +                                           bool svq,
 +                                           VhostIOVATree *iova_tree)
  {
      NetClientState *nc = NULL;
      VhostVDPAState *s;
@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
      s->vhost_vdpa.device_fd = vdpa_device_fd;
      s->vhost_vdpa.index = queue_pair_index;
 +    s->vhost_vdpa.shadow_vqs_enabled = svq;
 +    s->vhost_vdpa.iova_tree = iova_tree;
      if (!is_datapath) {
          s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
                                              vhost_vdpa_net_cvq_cmd_page_len());
@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
          s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
          s->vhost_vdpa.shadow_vq_ops_opaque = s;
 +        error_setg(&s->vhost_vdpa.migration_blocker,
 +                   "Migration disabled: vhost-vdpa uses CVQ.");
      }
+     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
--    net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize);
+     if (ret) {
--    net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize);
+@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
-+    net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, false);
+     return nc;
-+    net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, false);
+ }
-     g_queue_init(&s->conn_list);
++static int vhost_vdpa_get_iova_range(int fd,
++                                     struct vhost_vdpa_iova_range *iova_range)
-diff --git a/net/filter-mirror.c b/net/filter-mirror.c
++{
-index XXXXXXX..XXXXXXX 100644
++    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
---- a/net/filter-mirror.c
++
-+++ b/net/filter-mirror.c
++    return ret < 0 ? -errno : 0;
-@@ -XXX,XX +XXX,XX @@ static void filter_redirector_setup(NetFilterState *nf, Error **errp)
++}
 +
  static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
  {
      int ret = ioctl(fd, VHOST_GET_FEATURES, features);
@@ -XXX,XX +XXX,XX @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
      uint64_t features;
      int vdpa_device_fd;
      g_autofree NetClientState **ncs = NULL;
 +    g_autoptr(VhostIOVATree) iova_tree = NULL;
      NetClientState *nc;
      int queue_pairs, r, i, has_cvq = 0;
@@ -XXX,XX +XXX,XX @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
          return queue_pairs;
      }
 +    if (opts->x_svq) {
 +        struct vhost_vdpa_iova_range iova_range;
 +
 +        uint64_t invalid_dev_features =
 +            features & ~vdpa_svq_device_features &
 +            /* Transport are all accepted at this point */
 +            ~MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
 +                             VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
 +
 +        if (invalid_dev_features) {
 +            error_setg(errp, "vdpa svq does not work with features 0x%" PRIx64,
 +                       invalid_dev_features);
 +            goto err_svq;
 +        }
 +
 +        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
 +        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
 +    }
 +
      ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
      for (i = 0; i < queue_pairs; i++) {
          ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
 -                                     vdpa_device_fd, i, 2, true);
 +                                     vdpa_device_fd, i, 2, true, opts->x_svq,
 +                                     iova_tree);
          if (!ncs[i])
              goto err;
      }
      if (has_cvq) {
          nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
 -                                 vdpa_device_fd, i, 1, false);
 +                                 vdpa_device_fd, i, 1, false,
 +                                 opts->x_svq, iova_tree);
          if (!nc)
              goto err;
      }
 +    /* iova_tree ownership belongs to last NetClientState */
 +    g_steal_pointer(&iova_tree);
      return 0;
  err:
@@ -XXX,XX +XXX,XX @@ err:
              qemu_del_net_client(ncs[i]);
          }
      }
++
--    net_socket_rs_init(&s->rs, redirector_rs_finalize);
++err_svq:
-+    net_socket_rs_init(&s->rs, redirector_rs_finalize, false);
+     qemu_close(vdpa_device_fd);
-     if (s->indev) {
+     return -1;
-         chr = qemu_chr_find(s->indev);
+diff --git a/qapi/net.json b/qapi/net.json
 diff --git a/net/net.c b/net/net.c
 index XXXXXXX..XXXXXXX 100644
---- a/net/net.c
+--- a/qapi/net.json
-+++ b/net/net.c
++++ b/qapi/net.json
-@@ -XXX,XX +XXX,XX @@ QemuOptsList qemu_net_opts = {
+@@ -XXX,XX +XXX,XX @@
- };
+ # @queues: number of queues to be created for multiqueue vhost-vdpa
+ #          (default: 1)
- void net_socket_rs_init(SocketReadState *rs,
+ #
--                        SocketReadStateFinalize *finalize)
++# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
-+                        SocketReadStateFinalize *finalize,
++#         (default: false)
-+                        bool vnet_hdr)
++#
- {
++# Features:
-     rs->state = 0;
++# @unstable: Member @x-svq is experimental.
-+    rs->vnet_hdr = vnet_hdr;
++#
-     rs->index = 0;
+ # Since: 5.1
-     rs->packet_len = 0;
+ ##
-+    rs->vnet_hdr_len = 0;
+ { 'struct': 'NetdevVhostVDPAOptions',
-     memset(rs->buf, 0, sizeof(rs->buf));
+   'data': {
-     rs->finalize = finalize;
+     '*vhostdev':     'str',
- }
+-    '*queues':       'int' } }
-@@ -XXX,XX +XXX,XX @@ int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size)
++    '*queues':       'int',
-     unsigned int l;
++    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
-     while (size > 0) {
+ ##
--        /* reassemble a packet from the network */
+ # @NetdevVmnetHostOptions:
 -        switch (rs->state) { /* 0 = getting length, 1 = getting data */
 +        /* Reassemble a packet from the network.
 +         * 0 = getting length.
 +         * 1 = getting vnet header length.
 +         * 2 = getting data.
 +         */
 +        switch (rs->state) {
          case 0:
              l = 4 - rs->index;
              if (l > size) {
@@ -XXX,XX +XXX,XX @@ int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size)
                  /* got length */
                  rs->packet_len = ntohl(*(uint32_t *)rs->buf);
                  rs->index = 0;
 -                rs->state = 1;
 +                if (rs->vnet_hdr) {
 +                    rs->state = 1;
 +                } else {
 +                    rs->state = 2;
 +                    rs->vnet_hdr_len = 0;
 +                }
              }
              break;
          case 1:
 +            l = 4 - rs->index;
 +            if (l > size) {
 +                l = size;
 +            }
 +            memcpy(rs->buf + rs->index, buf, l);
 +            buf += l;
 +            size -= l;
 +            rs->index += l;
 +            if (rs->index == 4) {
 +                /* got vnet header length */
 +                rs->vnet_hdr_len = ntohl(*(uint32_t *)rs->buf);
 +                rs->index = 0;
 +                rs->state = 2;
 +            }
 +            break;
 +        case 2:
              l = rs->packet_len - rs->index;
              if (l > size) {
                  l = size;
 diff --git a/net/socket.c b/net/socket.c
 index XXXXXXX..XXXXXXX 100644
 --- a/net/socket.c
 +++ b/net/socket.c
@@ -XXX,XX +XXX,XX @@ static void net_socket_send(void *opaque)
          closesocket(s->fd);
          s->fd = -1;
 -        net_socket_rs_init(&s->rs, net_socket_rs_finalize);
 +        net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
          s->nc.link_down = true;
          memset(s->nc.info_str, 0, sizeof(s->nc.info_str));
@@ -XXX,XX +XXX,XX @@ static NetSocketState *net_socket_fd_init_dgram(NetClientState *peer,
      s->fd = fd;
      s->listen_fd = -1;
      s->send_fn = net_socket_send_dgram;
 -    net_socket_rs_init(&s->rs, net_socket_rs_finalize);
 +    net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
      net_socket_read_poll(s, true);
      /* mcast: save bound address as dst */
@@ -XXX,XX +XXX,XX @@ static NetSocketState *net_socket_fd_init_stream(NetClientState *peer,
      s->fd = fd;
      s->listen_fd = -1;
 -    net_socket_rs_init(&s->rs, net_socket_rs_finalize);
 +    net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
      /* Disable Nagle algorithm on TCP sockets to reduce latency */
      socket_set_nodelay(fd);
@@ -XXX,XX +XXX,XX @@ static int net_socket_listen_init(NetClientState *peer,
      s->fd = -1;
      s->listen_fd = fd;
      s->nc.link_down = true;
 -    net_socket_rs_init(&s->rs, net_socket_rs_finalize);
 +    net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
      qemu_set_fd_handler(s->listen_fd, net_socket_accept, NULL, s);
      return 0;
 --
 .7.4

-New patch
+[PULL V2 22/25] softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH
+From: Zhang Chen <chen.zhang@intel.com>
+If the checkpoint occurs when the guest finishes restarting
+but has not started running, the runstate_set() may reject
+the transition from COLO to PRELAUNCH with the crash log:
+{"timestamp": {"seconds": 1593484591, "microseconds": 26605},\
+"event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
+qemu-system-x86_64: invalid runstate transition: 'colo' -> 'prelaunch'
+Long-term testing says that it's pretty safe.
+Signed-off-by: Like Xu <like.xu@linux.intel.com>
+Signed-off-by: Zhang Chen <chen.zhang@intel.com>
+Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
+Signed-off-by: Jason Wang <jasowang@redhat.com>
+---
+ softmmu/runstate.c | 1 +
+file changed, 1 insertion(+)
+diff --git a/softmmu/runstate.c b/softmmu/runstate.c
+index XXXXXXX..XXXXXXX 100644
+--- a/softmmu/runstate.c
++++ b/softmmu/runstate.c
+@@ -XXX,XX +XXX,XX @@ static const RunStateTransition runstate_transitions_def[] = {
+     { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
+     { RUN_STATE_COLO, RUN_STATE_RUNNING },
++    { RUN_STATE_COLO, RUN_STATE_PRELAUNCH },
+     { RUN_STATE_COLO, RUN_STATE_SHUTDOWN},
+     { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
+--
+.7.4

-[Qemu-devel] [PULL 06/14] net/colo.c: Make vnet_hdr_len as packet property
+[PULL V2 23/25] net/colo: Fix a "double free" crash to clear the conn_list
-From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+From: Zhang Chen <chen.zhang@intel.com>
-We can use this property flush and send packet with vnet_hdr_len.
+We noticed that QEMU may crash when the guest has too many
 incoming network connections with the following log:
-Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
 free(): invalid pointer
 [1]    15195 abort (core dumped)  qemu-system-x86_64 ....
 This is because we create the s->connection_track_table with
 g_hash_table_new_full() which is defined as:
 GHashTable * g_hash_table_new_full (GHashFunc hash_func,
                        GEqualFunc key_equal_func,
                        GDestroyNotify key_destroy_func,
                        GDestroyNotify value_destroy_func);
 The fourth parameter connection_destroy() will be called to free the
 memory allocated for all 'Connection' values in the hashtable when
 we call g_hash_table_remove_all() in the connection_hashtable_reset().
 However, both connection_track_table and conn_list reference the same
 conn instance, so clearing conn_list afterwards triggers a double free. This
 patch removes the value-destroy callback on the hash table side so that the
 conn is not freed twice.
 Signed-off-by: Like Xu <like.xu@linux.intel.com>
 Signed-off-by: Zhang Chen <chen.zhang@intel.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- net/colo-compare.c    | 8 ++++++--
+ net/colo-compare.c    | 2 +-
  net/colo.c            | 3 ++-
  net/colo.h            | 4 +++-
  net/filter-rewriter.c | 2 +-
-files changed, 12 insertions(+), 5 deletions(-)
+files changed, 2 insertions(+), 2 deletions(-)
 diff --git a/net/colo-compare.c b/net/colo-compare.c
 index XXXXXXX..XXXXXXX 100644
 --- a/net/colo-compare.c
 +++ b/net/colo-compare.c
-@@ -XXX,XX +XXX,XX @@ static int packet_enqueue(CompareState *s, int mode)
+@@ -XXX,XX +XXX,XX @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
-     Connection *conn;
+     s->connection_track_table = g_hash_table_new_full(connection_key_hash,
+                                                       connection_key_equal,
-     if (mode == PRIMARY_IN) {
+                                                       g_free,
--        pkt = packet_new(s->pri_rs.buf, s->pri_rs.packet_len);
+-                                                      connection_destroy);
-+        pkt = packet_new(s->pri_rs.buf,
++                                                      NULL);
-+                         s->pri_rs.packet_len,
-+                         s->pri_rs.vnet_hdr_len);
+     colo_compare_iothread(s);
-     } else {
 -        pkt = packet_new(s->sec_rs.buf, s->sec_rs.packet_len);
 +        pkt = packet_new(s->sec_rs.buf,
 +                         s->sec_rs.packet_len,
 +                         s->sec_rs.vnet_hdr_len);
      }
      if (parse_packet_early(pkt)) {
 diff --git a/net/colo.c b/net/colo.c
 index XXXXXXX..XXXXXXX 100644
 --- a/net/colo.c
 +++ b/net/colo.c
@@ -XXX,XX +XXX,XX @@ void connection_destroy(void *opaque)
      g_slice_free(Connection, conn);
  }
 -Packet *packet_new(const void *data, int size)
 +Packet *packet_new(const void *data, int size, int vnet_hdr_len)
  {
      Packet *pkt = g_slice_new(Packet);
      pkt->data = g_memdup(data, size);
      pkt->size = size;
      pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 +    pkt->vnet_hdr_len = vnet_hdr_len;
      return pkt;
  }
 diff --git a/net/colo.h b/net/colo.h
 index XXXXXXX..XXXXXXX 100644
 --- a/net/colo.h
 +++ b/net/colo.h
@@ -XXX,XX +XXX,XX @@ typedef struct Packet {
      int size;
      /* Time of packet creation, in wall clock ms */
      int64_t creation_ms;
 +    /* Get vnet_hdr_len from filter */
 +    uint32_t vnet_hdr_len;
  } Packet;
  typedef struct ConnectionKey {
@@ -XXX,XX +XXX,XX @@ Connection *connection_get(GHashTable *connection_track_table,
                             ConnectionKey *key,
                             GQueue *conn_list);
  void connection_hashtable_reset(GHashTable *connection_track_table);
 -Packet *packet_new(const void *data, int size);
 +Packet *packet_new(const void *data, int size, int vnet_hdr_len);
  void packet_destroy(void *opaque, void *user_data);
  #endif /* QEMU_COLO_PROXY_H */
 diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
 index XXXXXXX..XXXXXXX 100644
 --- a/net/filter-rewriter.c
 +++ b/net/filter-rewriter.c
-@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
+@@ -XXX,XX +XXX,XX @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
-     char *buf = g_malloc0(size);
+     s->connection_track_table = g_hash_table_new_full(connection_key_hash,
+                                                       connection_key_equal,
-     iov_to_buf(iov, iovcnt, 0, buf, size);
+                                                       g_free,
--    pkt = packet_new(buf, size);
+-                                                      connection_destroy);
-+    pkt = packet_new(buf, size, 0);
++                                                      NULL);
-     g_free(buf);
+     s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
+ }
-     /*
 --
 .7.4

-New patch
+[PULL V2 24/25] net/colo.c: No need to track conn_list for filter-rewriter
+From: Zhang Chen <chen.zhang@intel.com>
+Filter-rewriter does not need to track connections in conn_list.
+This patch fixes the glib g_queue_is_empty assertion that triggers when the COLO guest
+keeps a lot of network connections open.
+Signed-off-by: Zhang Chen <chen.zhang@intel.com>
+Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
+Signed-off-by: Jason Wang <jasowang@redhat.com>
+---
+ net/colo.c | 2 +-
+file changed, 1 insertion(+), 1 deletion(-)
+diff --git a/net/colo.c b/net/colo.c
+index XXXXXXX..XXXXXXX 100644
+--- a/net/colo.c
++++ b/net/colo.c
+@@ -XXX,XX +XXX,XX @@ Connection *connection_get(GHashTable *connection_track_table,
+             /*
+              * clear the conn_list
+              */
+-            while (!g_queue_is_empty(conn_list)) {
++            while (conn_list && !g_queue_is_empty(conn_list)) {
+                 connection_destroy(g_queue_pop_head(conn_list));
+             }
+         }
+--
+.7.4

-[Qemu-devel] [PULL 09/14] net/colo.c: Add vnet packet parse feature in colo-proxy
+[PULL V2 25/25] net/colo.c: fix segmentation fault when packet is not parsed correctly
-From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+From: Zhang Chen <chen.zhang@intel.com>
-Make colo-compare and filter-rewriter can parse vnet packet.
+When COLO uses only one vnet_hdr_support parameter between
 filter-redirector and filter-mirror (or colo-compare), COLO will crash
 with a segmentation fault. The backtrace is as follows:
-Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
+Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
     at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
 uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
 (gdb) bt
 0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
     at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
 0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
 net/colo.c:49
 0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
 net/filter-rewriter.c:63
 So a wrong vnet_hdr_len causes pkt->data to become NULL. Add a check to
 raise an error, and add a trace event to track vnet_hdr_len.
 Signed-off-by: Tao Xu <tao3.xu@intel.com>
 Signed-off-by: Zhang Chen <chen.zhang@intel.com>
 Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- net/colo.c | 6 +++---
+ net/colo.c       | 9 ++++++++-
-file changed, 3 insertions(+), 3 deletions(-)
+ net/trace-events | 1 +
 files changed, 9 insertions(+), 1 deletion(-)
 diff --git a/net/colo.c b/net/colo.c
 index XXXXXXX..XXXXXXX 100644
 --- a/net/colo.c
 +++ b/net/colo.c
 @@ -XXX,XX +XXX,XX @@ int parse_packet_early(Packet *pkt)
- {
-     int network_length;
      static const uint8_t vlan[] = {0x81, 0x00};
--    uint8_t *data = pkt->data;
+     uint8_t *data = pkt->data + pkt->vnet_hdr_len;
 +    uint8_t *data = pkt->data + pkt->vnet_hdr_len;
      uint16_t l3_proto;
-     ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
+-    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
++    ssize_t l2hdr_len;
--    if (pkt->size < ETH_HLEN) {
++
-+    if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
++    if (data == NULL) {
 +        trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
 +                                        "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
 +        return 1;
 +    }
 +    l2hdr_len = eth_get_l2_hdr_length(data);
      if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
          trace_colo_proxy_main("pkt->size < ETH_HLEN");
-         return 1;
+diff --git a/net/trace-events b/net/trace-events
-     }
+index XXXXXXX..XXXXXXX 100644
-@@ -XXX,XX +XXX,XX @@ int parse_packet_early(Packet *pkt)
+--- a/net/trace-events
-     }
++++ b/net/trace-events
+@@ -XXX,XX +XXX,XX @@ vhost_user_event(const char *chr, int event) "chr: %s got event: %d"
-     network_length = pkt->ip->ip_hl * 4;
--    if (pkt->size < l2hdr_len + network_length) {
+ # colo.c
-+    if (pkt->size < l2hdr_len + network_length + pkt->vnet_hdr_len) {
+ colo_proxy_main(const char *chr) ": %s"
-         trace_colo_proxy_main("pkt->size < network_header + network_length");
++colo_proxy_main_vnet_info(const char *sta, int size) ": %s = %d"
-         return 1;
-     }
+ # colo-compare.c
  colo_compare_main(const char *chr) ": %s"
 --
 .7.4

The following changes since commit 6632f6ff96f0537fc34cdc00c760656fc62e23c5:

Merge remote-tracking branch 'remotes/famz/tags/block-and-testing-pull-request' into staging (2017-07-17 11:46:36 +0100)

are available in the git repository at:

https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 189ae6bb5ce1f5a322f8691d00fe942ba43dd601:

virtio-net: fix offload ctrl endian (2017-07-17 20:13:56 +0800)

----------------------------------------------------------------

- fix virtio-net ctrl offload endian
- vnet header support for various COLO netfilters and compare thread

----------------------------------------------------------------
Jason Wang (1):
      virtio-net: fix offload ctrl endian

Michal Privoznik (1):
      virtion-net: Prefer is_power_of_2()

Zhang Chen (12):
      net: Add vnet_hdr_len arguments in NetClientState
      net/net.c: Add vnet_hdr support in SocketReadState
      net/filter-mirror.c: Introduce parameter for filter_send()
      net/filter-mirror.c: Make filter mirror support vnet support.
      net/filter-mirror.c: Add new option to enable vnet support for filter-redirector
      net/colo.c: Make vnet_hdr_len as packet property
      net/colo-compare.c: Introduce parameter for compare_chr_send()
      net/colo-compare.c: Make colo-compare support vnet_hdr_len
      net/colo.c: Add vnet packet parse feature in colo-proxy
      net/colo-compare.c: Add vnet packet's tcp/udp/icmp compare
      net/filter-rewriter.c: Make filter-rewriter support vnet_hdr_len
      docs/colo-proxy.txt: Update colo-proxy usage of net driver with vnet_header

From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

Add a flag that decides whether net_fill_rstate() needs to read
the vnet_hdr_len field or not.

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
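The flag described in the commit message above turns the two-state stream reader into a three-state one: length, optional vnet header length, then payload. The following is a minimal standalone sketch of that idea, not the code from this patch. Every `sketch_`/`Sketch` name, `SKETCH_BUFSIZE`, and the `packets_done` counter (which stands in for the finalize callback) are assumptions made for illustration, and rejecting zero-length packets is a simplification of the sketch.

```c
#include <arpa/inet.h>  /* ntohl */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BUFSIZE 4096  /* hypothetical; stands in for NET_BUFSIZE */

/* 0 = getting length, 1 = getting vnet header length, 2 = getting data */
enum { RS_LEN = 0, RS_VNET_LEN = 1, RS_DATA = 2 };

typedef struct SketchReadState {
    int state;
    bool vnet_hdr;          /* read the vnet_hdr_len field or skip it */
    uint32_t index;         /* bytes collected for the current field */
    uint32_t packet_len;
    uint32_t vnet_hdr_len;
    uint8_t buf[SKETCH_BUFSIZE];
    int packets_done;       /* hypothetical: replaces the finalize callback */
} SketchReadState;

static void sketch_rs_init(SketchReadState *rs, bool vnet_hdr)
{
    memset(rs, 0, sizeof(*rs));
    rs->vnet_hdr = vnet_hdr;
}

/* Feed a chunk of the byte stream; returns 0 on success, -1 on a bad length. */
static int sketch_fill(SketchReadState *rs, const uint8_t *buf, size_t size)
{
    while (size > 0) {
        /* Both length fields are 4 bytes; the payload is packet_len bytes. */
        uint32_t want = (rs->state == RS_DATA) ? rs->packet_len : 4;
        size_t l = want - rs->index;
        uint32_t v;

        if (l > size) {
            l = size;
        }
        memcpy(rs->buf + rs->index, buf, l);
        buf += l;
        size -= l;
        rs->index += (uint32_t)l;
        if (rs->index < want) {
            continue;  /* input exhausted mid-field; resume on next call */
        }

        rs->index = 0;
        switch (rs->state) {
        case RS_LEN:
            memcpy(&v, rs->buf, sizeof(v));
            rs->packet_len = ntohl(v);
            if (rs->packet_len == 0 || rs->packet_len > sizeof(rs->buf)) {
                sketch_rs_init(rs, rs->vnet_hdr);
                return -1;
            }
            rs->vnet_hdr_len = 0;
            rs->state = rs->vnet_hdr ? RS_VNET_LEN : RS_DATA;
            break;
        case RS_VNET_LEN:
            memcpy(&v, rs->buf, sizeof(v));
            rs->vnet_hdr_len = ntohl(v);
            rs->state = RS_DATA;
            break;
        case RS_DATA:
            rs->packets_done++;  /* a real reader would hand off buf here */
            rs->state = RS_LEN;
            break;
        }
    }
    return 0;
}
```

Because the reader keeps `index` and `state` between calls, a frame may arrive split at any byte boundary. With `vnet_hdr` set to false the reader goes straight from the length field to the payload, which matches the behaviour all callers keep in this patch by passing `false`.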
 include/net/net.h   |  9 +++++++--
 net/colo-compare.c  |  4 ++--
 net/filter-mirror.c |  2 +-
 net/net.c           | 36 ++++++++++++++++++++++++++++++++----
 net/socket.c        |  8 ++++----
 5 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index XXXXXXX..XXXXXXX 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -XXX,XX +XXX,XX @@ typedef struct NICState {
 } NICState;
 
 struct SocketReadState {
-    int state; /* 0 = getting length, 1 = getting data */
+    /* 0 = getting length, 1 = getting vnet header length, 2 = getting data */
+    int state;
+    /* This flag decides whether to read the vnet_hdr_len field */
+    bool vnet_hdr;
     uint32_t index;
     uint32_t packet_len;
+    uint32_t vnet_hdr_len;
     uint8_t buf[NET_BUFSIZE];
     SocketReadStateFinalize *finalize;
 };
@@ -XXX,XX +XXX,XX @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
 void print_net_client(Monitor *mon, NetClientState *nc);
 void hmp_info_network(Monitor *mon, const QDict *qdict);
 void net_socket_rs_init(SocketReadState *rs,
-                        SocketReadStateFinalize *finalize);
+                        SocketReadStateFinalize *finalize,
+                        bool vnet_hdr);
 
 /* NIC info */
 
diff --git a/net/colo-compare.c b/net/colo-compare.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -XXX,XX +XXX,XX @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
         return;
     }
 
-    net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize);
-    net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize);
+    net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, false);
+    net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, false);
 
     g_queue_init(&s->conn_list);
 
diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index XXXXXXX..XXXXXXX 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -XXX,XX +XXX,XX @@ static void filter_redirector_setup(NetFilterState *nf, Error **errp)
         }
     }
 
-    net_socket_rs_init(&s->rs, redirector_rs_finalize);
+    net_socket_rs_init(&s->rs, redirector_rs_finalize, false);
 
     if (s->indev) {
         chr = qemu_chr_find(s->indev);
diff --git a/net/net.c b/net/net.c
index XXXXXXX..XXXXXXX 100644
--- a/net/net.c
+++ b/net/net.c
@@ -XXX,XX +XXX,XX @@ QemuOptsList qemu_net_opts = {
 };
 
 void net_socket_rs_init(SocketReadState *rs,
-                        SocketReadStateFinalize *finalize)
+                        SocketReadStateFinalize *finalize,
+                        bool vnet_hdr)
 {
     rs->state = 0;
+    rs->vnet_hdr = vnet_hdr;
     rs->index = 0;
     rs->packet_len = 0;
+    rs->vnet_hdr_len = 0;
     memset(rs->buf, 0, sizeof(rs->buf));
     rs->finalize = finalize;
 }
@@ -XXX,XX +XXX,XX @@ int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size)
     unsigned int l;
 
     while (size > 0) {
-        /* reassemble a packet from the network */
-        switch (rs->state) { /* 0 = getting length, 1 = getting data */
+        /* Reassemble a packet from the network.
+         * 0 = getting length.
+         * 1 = getting vnet header length.
+         * 2 = getting data.
+         */
+        switch (rs->state) {
         case 0:
             l = 4 - rs->index;
             if (l > size) {
@@ -XXX,XX +XXX,XX @@ int net_fill_rstate(SocketReadState *rs, const uint8_t *buf, int size)
                 /* got length */
                 rs->packet_len = ntohl(*(uint32_t *)rs->buf);
                 rs->index = 0;
-                rs->state = 1;
+                if (rs->vnet_hdr) {
+                    rs->state = 1;
+                } else {
+                    rs->state = 2;
+                    rs->vnet_hdr_len = 0;
+                }
             }
             break;
         case 1:
+            l = 4 - rs->index;
+            if (l > size) {
+                l = size;
+            }
+            memcpy(rs->buf + rs->index, buf, l);
+            buf += l;
+            size -= l;
+            rs->index += l;
+            if (rs->index == 4) {
+                /* got vnet header length */
+                rs->vnet_hdr_len = ntohl(*(uint32_t *)rs->buf);
+                rs->index = 0;
+                rs->state = 2;
+            }
+            break;
+        case 2:
             l = rs->packet_len - rs->index;
             if (l > size) {
                 l = size;
diff --git a/net/socket.c b/net/socket.c
index XXXXXXX..XXXXXXX 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -XXX,XX +XXX,XX @@ static void net_socket_send(void *opaque)
         closesocket(s->fd);
 
         s->fd = -1;
-        net_socket_rs_init(&s->rs, net_socket_rs_finalize);
+        net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
         s->nc.link_down = true;
         memset(s->nc.info_str, 0, sizeof(s->nc.info_str));
 
@@ -XXX,XX +XXX,XX @@ static NetSocketState *net_socket_fd_init_dgram(NetClientState *peer,
     s->fd = fd;
     s->listen_fd = -1;
     s->send_fn = net_socket_send_dgram;
-    net_socket_rs_init(&s->rs, net_socket_rs_finalize);
+    net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
     net_socket_read_poll(s, true);
 
     /* mcast: save bound address as dst */
@@ -XXX,XX +XXX,XX @@ static NetSocketState *net_socket_fd_init_stream(NetClientState *peer,
 
     s->fd = fd;
     s->listen_fd = -1;
-    net_socket_rs_init(&s->rs, net_socket_rs_finalize);
+    net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
 
     /* Disable Nagle algorithm on TCP sockets to reduce latency */
     socket_set_nodelay(fd);
@@ -XXX,XX +XXX,XX @@ static int net_socket_listen_init(NetClientState *peer,
     s->fd = -1;
     s->listen_fd = fd;
     s->nc.link_down = true;
-    net_socket_rs_init(&s->rs, net_socket_rs_finalize);
+    net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
 
     qemu_set_fd_handler(s->listen_fd, net_socket_accept, NULL, s);
     return 0;
-- 
2.7.4

From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

This patch changes the filter_send() parameter from CharBackend to MirrorState,
so that we can get more information such as vnet_hdr (used to support packets
with a vnet header).

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/filter-mirror.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index XXXXXXX..XXXXXXX 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -XXX,XX +XXX,XX @@ typedef struct MirrorState {
     SocketReadState rs;
 } MirrorState;
 
-static int filter_send(CharBackend *chr_out,
+static int filter_send(MirrorState *s,
                        const struct iovec *iov,
                        int iovcnt)
 {
@@ -XXX,XX +XXX,XX @@ static int filter_send(CharBackend *chr_out,
     }
 
     len = htonl(size);
-    ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)&len, sizeof(len));
+    ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
     if (ret != sizeof(len)) {
         goto err;
     }
 
     buf = g_malloc(size);
     iov_to_buf(iov, iovcnt, 0, buf, size);
-    ret = qemu_chr_fe_write_all(chr_out, (uint8_t *)buf, size);
+    ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
     g_free(buf);
     if (ret != size) {
         goto err;
@@ -XXX,XX +XXX,XX @@ static ssize_t filter_mirror_receive_iov(NetFilterState *nf,
     MirrorState *s = FILTER_MIRROR(nf);
     int ret;
 
-    ret = filter_send(&s->chr_out, iov, iovcnt);
+    ret = filter_send(s, iov, iovcnt);
     if (ret) {
         error_report("filter mirror send failed(%s)", strerror(-ret));
     }
@@ -XXX,XX +XXX,XX @@ static ssize_t filter_redirector_receive_iov(NetFilterState *nf,
     int ret;
 
     if (qemu_chr_fe_backend_connected(&s->chr_out)) {
-        ret = filter_send(&s->chr_out, iov, iovcnt);
+        ret = filter_send(s, iov, iovcnt);
         if (ret) {
             error_report("filter redirector send failed(%s)", strerror(-ret));
         }
-- 
2.7.4

From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

Add the vnet_hdr_support option to filter-mirror; it is disabled by default.
If you use virtio-net-pci or another driver that needs vnet_hdr, please enable it.
For example:
-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0,vnet_hdr_support

If the vnet_hdr_support flag is set, the format of the sent packet changes from
struct {int size; const uint8_t buf[];} to {int size; int vnet_hdr_len; const uint8_t buf[];},
so that other modules (like colo-compare) know how to parse the net packet correctly.
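
To make the two wire layouts concrete, here is a small standalone sketch of the
framing. It is not the filter_send() implementation: demo_put_be32() and
demo_frame_packet() are hypothetical helpers that write into a caller-supplied
buffer instead of a chardev, and it assumes that size still describes the whole
packet buffer while vnet_hdr_len is sent as an extra field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in network byte order (what htonl() plus a write does). */
static void demo_put_be32(uint8_t *dst, uint32_t v)
{
    dst[0] = (uint8_t)(v >> 24);
    dst[1] = (uint8_t)(v >> 16);
    dst[2] = (uint8_t)(v >> 8);
    dst[3] = (uint8_t)v;
}

/*
 * Frame one packet as {size; buf[]} or, when vnet_hdr is true, as
 * {size; vnet_hdr_len; buf[]}. Returns the number of bytes written,
 * or 0 if the output buffer is too small.
 */
static size_t demo_frame_packet(uint8_t *out, size_t out_cap,
                                const uint8_t *pkt, uint32_t size,
                                bool vnet_hdr, uint32_t vnet_hdr_len)
{
    size_t hdr = vnet_hdr ? 8 : 4;
    size_t pos = 0;

    assert(out != NULL && (pkt != NULL || size == 0));
    if (out_cap < hdr || out_cap - hdr < size) {
        return 0;
    }

    demo_put_be32(out + pos, size);
    pos += 4;
    if (vnet_hdr) {
        /* Extra field that lets the receiver locate the Ethernet frame. */
        demo_put_be32(out + pos, vnet_hdr_len);
        pos += 4;
    }
    if (size > 0) {
        memcpy(out + pos, pkt, size);
    }
    return pos + size;
}
```

Both ends of a chardev have to agree on the layout, so the option should be set
consistently on the sender and on whatever consumes its output.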

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/filter-mirror.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 qemu-options.hx     |  5 ++---
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index XXXXXXX..XXXXXXX 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -XXX,XX +XXX,XX @@ typedef struct MirrorState {
     CharBackend chr_in;
     CharBackend chr_out;
     SocketReadState rs;
+    bool vnet_hdr;
 } MirrorState;
 
 static int filter_send(MirrorState *s,
                        const struct iovec *iov,
                        int iovcnt)
 {
+    NetFilterState *nf = NETFILTER(s);
     int ret = 0;
     ssize_t size = 0;
     uint32_t len = 0;
@@ -XXX,XX +XXX,XX @@ static int filter_send(MirrorState *s,
         goto err;
     }
 
+    if (s->vnet_hdr) {
+        /*
+         * If vnet_hdr = on, send the vnet header length so that other
+         * modules (like colo-compare) know how to parse the net
+         * packet correctly.
+         */
+        ssize_t vnet_hdr_len;
+
+        vnet_hdr_len = nf->netdev->vnet_hdr_len;
+
+        len = htonl(vnet_hdr_len);
+        ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
+        if (ret != sizeof(len)) {
+            goto err;
+        }
+    }
+
     buf = g_malloc(size);
     iov_to_buf(iov, iovcnt, 0, buf, size);
     ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
@@ -XXX,XX +XXX,XX @@ static void filter_redirector_setup(NetFilterState *nf, Error **errp)
         }
     }
 
-    net_socket_rs_init(&s->rs, redirector_rs_finalize, false);
+    net_socket_rs_init(&s->rs, redirector_rs_finalize, s->vnet_hdr);
 
     if (s->indev) {
         chr = qemu_chr_find(s->indev);
@@ -XXX,XX +XXX,XX @@ static void filter_mirror_set_outdev(Object *obj,
     }
 }
 
+static bool filter_mirror_get_vnet_hdr(Object *obj, Error **errp)
+{
+    MirrorState *s = FILTER_MIRROR(obj);
+
+    return s->vnet_hdr;
+}
+
+static void filter_mirror_set_vnet_hdr(Object *obj, bool value, Error **errp)
+{
+    MirrorState *s = FILTER_MIRROR(obj);
+
+    s->vnet_hdr = value;
+}
+
 static char *filter_redirector_get_outdev(Object *obj, Error **errp)
 {
     MirrorState *s = FILTER_REDIRECTOR(obj);
@@ -XXX,XX +XXX,XX @@ static void filter_redirector_set_outdev(Object *obj,
 
 static void filter_mirror_init(Object *obj)
 {
+    MirrorState *s = FILTER_MIRROR(obj);
+
     object_property_add_str(obj, "outdev", filter_mirror_get_outdev,
                             filter_mirror_set_outdev, NULL);
+
+    s->vnet_hdr = false;
+    object_property_add_bool(obj, "vnet_hdr_support",
+                             filter_mirror_get_vnet_hdr,
+                             filter_mirror_set_vnet_hdr, NULL);
 }
 
 static void filter_redirector_init(Object *obj)
diff --git a/qemu-options.hx b/qemu-options.hx
index XXXXXXX..XXXXXXX 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -XXX,XX +XXX,XX @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
 @option{tx}: the filter is attached to the transmit queue of the netdev,
              where it will receive packets sent by the netdev.
 
-@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
+@item -object filter-mirror,id=@var{id},netdev=@var{netdevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support]
 
-filter-mirror on netdev @var{netdevid},mirror net packet to chardev
-@var{chardevid}
+filter-mirror on netdev @var{netdevid},mirror net packet to chardev@var{chardevid}, if it has the vnet_hdr_support flag, filter-mirror will mirror packet with vnet_hdr_len.
 
 @item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},
 outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
-- 
2.7.4

From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

Add the vnet_hdr_support option to filter-redirector; it is disabled by default.
If you use the virtio-net-pci net driver or another driver that needs vnet_hdr, please enable it.
Because colo-compare and other modules need the vnet_hdr_len to parse
packets, this new option sends the length to them.
For example:
-object filter-redirector,id=r0,netdev=hn0,queue=tx,outdev=red0,vnet_hdr_support

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/filter-mirror.c | 23 +++++++++++++++++++++++
 qemu-options.hx     |  6 +++---
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/net/filter-mirror.c b/net/filter-mirror.c
index XXXXXXX..XXXXXXX 100644
--- a/net/filter-mirror.c
+++ b/net/filter-mirror.c
@@ -XXX,XX +XXX,XX @@ static void filter_redirector_set_outdev(Object *obj,
     s->outdev = g_strdup(value);
 }
 
+static bool filter_redirector_get_vnet_hdr(Object *obj, Error **errp)
+{
+    MirrorState *s = FILTER_REDIRECTOR(obj);
+
+    return s->vnet_hdr;
+}
+
+static void filter_redirector_set_vnet_hdr(Object *obj,
+                                           bool value,
+                                           Error **errp)
+{
+    MirrorState *s = FILTER_REDIRECTOR(obj);
+
+    s->vnet_hdr = value;
+}
+
 static void filter_mirror_init(Object *obj)
 {
     MirrorState *s = FILTER_MIRROR(obj);
@@ -XXX,XX +XXX,XX @@ static void filter_mirror_init(Object *obj)
 
 static void filter_redirector_init(Object *obj)
 {
+    MirrorState *s = FILTER_REDIRECTOR(obj);
+
     object_property_add_str(obj, "indev", filter_redirector_get_indev,
                             filter_redirector_set_indev, NULL);
     object_property_add_str(obj, "outdev", filter_redirector_get_outdev,
                             filter_redirector_set_outdev, NULL);
+
+    s->vnet_hdr = false;
+    object_property_add_bool(obj, "vnet_hdr_support",
+                             filter_redirector_get_vnet_hdr,
+                             filter_redirector_set_vnet_hdr, NULL);
 }
 
 static void filter_mirror_fini(Object *obj)
diff --git a/qemu-options.hx b/qemu-options.hx
index XXXXXXX..XXXXXXX 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -XXX,XX +XXX,XX @@ queue @var{all|rx|tx} is an option that can be applied to any netfilter.
 
 filter-mirror on netdev @var{netdevid},mirror net packet to chardev@var{chardevid}, if it has the vnet_hdr_support flag, filter-mirror will mirror packet with vnet_hdr_len.
 
-@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},
-outdev=@var{chardevid}[,queue=@var{all|rx|tx}]
+@item -object filter-redirector,id=@var{id},netdev=@var{netdevid},indev=@var{chardevid},outdev=@var{chardevid},queue=@var{all|rx|tx}[,vnet_hdr_support]
 
 filter-redirector on netdev @var{netdevid},redirect filter's net packet to chardev
-@var{chardevid},and redirect indev's packet to filter.
+@var{chardevid},and redirect indev's packet to filter.if it has the vnet_hdr_support flag,
+filter-redirector will redirect packet with vnet_hdr_len.
 Create a filter-redirector we need to differ outdev id from indev id, id can not
 be the same. we can just use indev or outdev, but at least one of indev or outdev
 need to be specified.
-- 
2.7.4

From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

We can use this property to flush and send packets with vnet_hdr_len.

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo-compare.c    | 8 ++++++--
 net/colo.c            | 3 ++-
 net/colo.h            | 4 +++-
 net/filter-rewriter.c | 2 +-
 4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -XXX,XX +XXX,XX @@ static int packet_enqueue(CompareState *s, int mode)
     Connection *conn;
 
     if (mode == PRIMARY_IN) {
-        pkt = packet_new(s->pri_rs.buf, s->pri_rs.packet_len);
+        pkt = packet_new(s->pri_rs.buf,
+                         s->pri_rs.packet_len,
+                         s->pri_rs.vnet_hdr_len);
     } else {
-        pkt = packet_new(s->sec_rs.buf, s->sec_rs.packet_len);
+        pkt = packet_new(s->sec_rs.buf,
+                         s->sec_rs.packet_len,
+                         s->sec_rs.vnet_hdr_len);
     }
 
     if (parse_packet_early(pkt)) {
diff --git a/net/colo.c b/net/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -XXX,XX +XXX,XX @@ void connection_destroy(void *opaque)
     g_slice_free(Connection, conn);
 }
 
-Packet *packet_new(const void *data, int size)
+Packet *packet_new(const void *data, int size, int vnet_hdr_len)
 {
     Packet *pkt = g_slice_new(Packet);
 
     pkt->data = g_memdup(data, size);
     pkt->size = size;
     pkt->creation_ms = qemu_clock_get_ms(QEMU_CLOCK_HOST);
+    pkt->vnet_hdr_len = vnet_hdr_len;
 
     return pkt;
 }
diff --git a/net/colo.h b/net/colo.h
index XXXXXXX..XXXXXXX 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -XXX,XX +XXX,XX @@ typedef struct Packet {
     int size;
     /* Time of packet creation, in wall clock ms */
     int64_t creation_ms;
+    /* Get vnet_hdr_len from filter */
+    uint32_t vnet_hdr_len;
 } Packet;
 
 typedef struct ConnectionKey {
@@ -XXX,XX +XXX,XX @@ Connection *connection_get(GHashTable *connection_track_table,
                            ConnectionKey *key,
                            GQueue *conn_list);
 void connection_hashtable_reset(GHashTable *connection_track_table);
-Packet *packet_new(const void *data, int size);
+Packet *packet_new(const void *data, int size, int vnet_hdr_len);
 void packet_destroy(void *opaque, void *user_data);
 
 #endif /* QEMU_COLO_PROXY_H */
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index XXXXXXX..XXXXXXX 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
     char *buf = g_malloc0(size);
 
     iov_to_buf(iov, iovcnt, 0, buf, size);
-    pkt = packet_new(buf, size);
+    pkt = packet_new(buf, size, 0);
     g_free(buf);
 
     /*
-- 
2.7.4

From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

This patch changes the compare_chr_send() parameter from CharBackend to CompareState,
so that we can get more information such as vnet_hdr (used to support packets
with a vnet header).

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo-compare.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -XXX,XX +XXX,XX @@ enum {
     SECONDARY_IN,
 };
 
-static int compare_chr_send(CharBackend *out,
+static int compare_chr_send(CompareState *s,
                             const uint8_t *buf,
                             uint32_t size);
 
@@ -XXX,XX +XXX,XX @@ static void colo_compare_connection(void *opaque, void *user_data)
         }
 
         if (result) {
-            ret = compare_chr_send(&s->chr_out, pkt->data, pkt->size);
+            ret = compare_chr_send(s, pkt->data, pkt->size);
             if (ret < 0) {
                 error_report("colo_send_primary_packet failed");
             }
@@ -XXX,XX +XXX,XX @@ static void colo_compare_connection(void *opaque, void *user_data)
     }
 }
 
-static int compare_chr_send(CharBackend *out,
+static int compare_chr_send(CompareState *s,
                             const uint8_t *buf,
                             uint32_t size)
 {
@@ -XXX,XX +XXX,XX @@ static int compare_chr_send(CharBackend *out,
         return 0;
     }
 
-    ret = qemu_chr_fe_write_all(out, (uint8_t *)&len, sizeof(len));
+    ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
     if (ret != sizeof(len)) {
         goto err;
     }
 
-    ret = qemu_chr_fe_write_all(out, (uint8_t *)buf, size);
+    ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
     if (ret != size) {
         goto err;
     }
@@ -XXX,XX +XXX,XX @@ static void compare_pri_rs_finalize(SocketReadState *pri_rs)
 
     if (packet_enqueue(s, PRIMARY_IN)) {
         trace_colo_compare_main("primary: unsupported packet in");
-        compare_chr_send(&s->chr_out, pri_rs->buf, pri_rs->packet_len);
+        compare_chr_send(s, pri_rs->buf, pri_rs->packet_len);
     } else {
         /* compare connection */
         g_queue_foreach(&s->conn_list, colo_compare_connection, s);
@@ -XXX,XX +XXX,XX @@ static void colo_flush_packets(void *opaque, void *user_data)
 
     while (!g_queue_is_empty(&conn->primary_list)) {
         pkt = g_queue_pop_head(&conn->primary_list);
-        compare_chr_send(&s->chr_out, pkt->data, pkt->size);
+        compare_chr_send(s, pkt->data, pkt->size);
         packet_destroy(pkt, NULL);
     }
     while (!g_queue_is_empty(&conn->secondary_list)) {
-- 
2.7.4

From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

Add the vnet_hdr_support option to colo-compare; it is disabled by default.
If you use virtio-net-pci or another driver that needs vnet_hdr, please enable it.
For example:
-object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,vnet_hdr_support

colo-compare can get the vnet header length from the filter.
Add vnet_hdr_len to struct Packet and output packets with
the vnet_hdr_len.

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo-compare.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++-------
 qemu-options.hx    |  4 ++--
 2 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -XXX,XX +XXX,XX @@ typedef struct CompareState {
     CharBackend chr_out;
     SocketReadState pri_rs;
     SocketReadState sec_rs;
+    bool vnet_hdr;
 
     /* connection list: the connections belonged to this NIC could be found
      * in this list.
@@ -XXX,XX +XXX,XX @@ enum {
 
 static int compare_chr_send(CompareState *s,
                             const uint8_t *buf,
-                            uint32_t size);
+                            uint32_t size,
+                            uint32_t vnet_hdr_len);
 
 static gint seq_sorter(Packet *a, Packet *b, gpointer data)
 {
@@ -XXX,XX +XXX,XX @@ static void colo_compare_connection(void *opaque, void *user_data)
         }
 
         if (result) {
-            ret = compare_chr_send(s, pkt->data, pkt->size);
+            ret = compare_chr_send(s,
+                                   pkt->data,
+                                   pkt->size,
+                                   pkt->vnet_hdr_len);
             if (ret < 0) {
                 error_report("colo_send_primary_packet failed");
             }
@@ -XXX,XX +XXX,XX @@ static void colo_compare_connection(void *opaque, void *user_data)
 
 static int compare_chr_send(CompareState *s,
                             const uint8_t *buf,
-                            uint32_t size)
+                            uint32_t size,
+                            uint32_t vnet_hdr_len)
 {
     int ret = 0;
     uint32_t len = htonl(size);
@@ -XXX,XX +XXX,XX @@ static int compare_chr_send(CompareState *s,
         goto err;
     }
 
+    if (s->vnet_hdr) {
+        /*
+         * Send the vnet header length so that other modules (like
+         * filter-redirector) know how to parse the net packet correctly.
+         */
+        len = htonl(vnet_hdr_len);
+        ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)&len, sizeof(len));
+        if (ret != sizeof(len)) {
+            goto err;
+        }
+    }
+
     ret = qemu_chr_fe_write_all(&s->chr_out, (uint8_t *)buf, size);
     if (ret != size) {
         goto err;
@@ -XXX,XX +XXX,XX @@ static void compare_set_outdev(Object *obj, const char *value, Error **errp)
     s->outdev = g_strdup(value);
 }
 
+static bool compare_get_vnet_hdr(Object *obj, Error **errp)
+{
+    CompareState *s = COLO_COMPARE(obj);
+
+    return s->vnet_hdr;
+}
+
+static void compare_set_vnet_hdr(Object *obj,
+                                 bool value,
+                                 Error **errp)
+{
+    CompareState *s = COLO_COMPARE(obj);
+
+    s->vnet_hdr = value;
+}
+
 static void compare_pri_rs_finalize(SocketReadState *pri_rs)
 {
     CompareState *s = container_of(pri_rs, CompareState, pri_rs);
 
     if (packet_enqueue(s, PRIMARY_IN)) {
         trace_colo_compare_main("primary: unsupported packet in");
-        compare_chr_send(s, pri_rs->buf, pri_rs->packet_len);
+        compare_chr_send(s,
+                         pri_rs->buf,
+                         pri_rs->packet_len,
+                         pri_rs->vnet_hdr_len);
     } else {
         /* compare connection */
         g_queue_foreach(&s->conn_list, colo_compare_connection, s);
@@ -XXX,XX +XXX,XX @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
         return;
     }
 
-    net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, false);
-    net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, false);
+    net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, s->vnet_hdr);
+    net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, s->vnet_hdr);
 
     g_queue_init(&s->conn_list);
 
@@ -XXX,XX +XXX,XX @@ static void colo_flush_packets(void *opaque, void *user_data)
 
     while (!g_queue_is_empty(&conn->primary_list)) {
         pkt = g_queue_pop_head(&conn->primary_list);
-        compare_chr_send(s, pkt->data, pkt->size);
+        compare_chr_send(s,
+                         pkt->data,
+                         pkt->size,
+                         pkt->vnet_hdr_len);
         packet_destroy(pkt, NULL);
     }
     while (!g_queue_is_empty(&conn->secondary_list)) {
@@ -XXX,XX +XXX,XX @@ static void colo_compare_class_init(ObjectClass *oc, void *data)
 
 static void colo_compare_init(Object *obj)
 {
+    CompareState *s = COLO_COMPARE(obj);
+
     object_property_add_str(obj, "primary_in",
                             compare_get_pri_indev, compare_set_pri_indev,
                             NULL);
@@ -XXX,XX +XXX,XX @@ static void colo_compare_init(Object *obj)
     object_property_add_str(obj, "outdev",
                             compare_get_outdev, compare_set_outdev,
                             NULL);
+
+    s->vnet_hdr = false;
+    object_property_add_bool(obj, "vnet_hdr_support", compare_get_vnet_hdr,
+                             compare_set_vnet_hdr, NULL);
 }
 
 static void colo_compare_finalize(Object *obj)
diff --git a/qemu-options.hx b/qemu-options.hx
index XXXXXXX..XXXXXXX 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -XXX,XX +XXX,XX @@ Dump the network traffic on netdev @var{dev} to the file specified by
 The file format is libpcap, so it can be analyzed with tools such as tcpdump
 or Wireshark.
 
-@item -object colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid},
-outdev=@var{chardevid}
+@item -object colo-compare,id=@var{id},primary_in=@var{chardevid},secondary_in=@var{chardevid},outdev=@var{chardevid}[,vnet_hdr_support]
 
 Colo-compare gets packet from primary_in@var{chardevid} and secondary_in@var{chardevid}, than compare primary packet with
 secondary packet. If the packets are same, we will output primary
 packet to outdev@var{chardevid}, else we will notify colo-frame
 do checkpoint and send primary packet to outdev@var{chardevid}.
+if it has the vnet_hdr_support flag, colo compare will send/recv packet with vnet_hdr_len.
 
 we must use it with the help of filter-mirror and filter-redirector.
 
-- 
2.7.4

From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

Make colo-compare and filter-rewriter able to parse vnet packets.
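
The idea is that the Ethernet frame starts vnet_hdr_len bytes into the packet
buffer, so every length check has to account for that prefix. A rough
standalone sketch, assuming a 14-byte Ethernet header (DEMO_ETH_HLEN and
demo_eth_start() are hypothetical names, not part of net/colo.c):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN 14 /* assumed size of an untagged Ethernet header */

/*
 * Return a pointer to the Ethernet header inside a buffer that starts with
 * vnet_hdr_len bytes of vnet header, or NULL when the buffer is too short
 * to hold both the vnet header and a full Ethernet header.
 */
static const uint8_t *demo_eth_start(const uint8_t *data, size_t size,
                                     uint32_t vnet_hdr_len)
{
    assert(data != NULL);
    if (vnet_hdr_len > size || size - vnet_hdr_len < DEMO_ETH_HLEN) {
        return NULL;
    }
    return data + vnet_hdr_len;
}
```

With vnet_hdr_len equal to 0 this degenerates to the plain
"size < ETH_HLEN" check that was there before.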

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -XXX,XX +XXX,XX @@ int parse_packet_early(Packet *pkt)
 {
     int network_length;
     static const uint8_t vlan[] = {0x81, 0x00};
-    uint8_t *data = pkt->data;
+    uint8_t *data = pkt->data + pkt->vnet_hdr_len;
     uint16_t l3_proto;
     ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
 
-    if (pkt->size < ETH_HLEN) {
+    if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
         trace_colo_proxy_main("pkt->size < ETH_HLEN");
         return 1;
     }
@@ -XXX,XX +XXX,XX @@ int parse_packet_early(Packet *pkt)
     }
 
     network_length = pkt->ip->ip_hl * 4;
-    if (pkt->size < l2hdr_len + network_length) {
+    if (pkt->size < l2hdr_len + network_length + pkt->vnet_hdr_len) {
         trace_colo_proxy_main("pkt->size < network_header + network_length");
         return 1;
     }
-- 
2.7.4

From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

COLO-Proxy only focuses on the packet payload, so we skip the vnet header.
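
Roughly, the comparison adds the vnet header length to the caller's offset and
only then compares the remaining bytes. The following is a simplified
standalone sketch of that idea; demo_compare_payload() is a hypothetical helper
and not the colo_packet_compare_common() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare two packet buffers while ignoring the first vnet_hdr_len + offset
 * bytes. Returns 0 when the remaining bytes match, -1 when the sizes differ
 * or the skipped prefix does not fit, and a non-zero memcmp() result otherwise.
 */
static int demo_compare_payload(const uint8_t *ppkt, size_t psize,
                                const uint8_t *spkt, size_t ssize,
                                uint32_t vnet_hdr_len, size_t offset)
{
    size_t skip = (size_t)vnet_hdr_len + offset;

    assert(ppkt != NULL && spkt != NULL);
    if (psize != ssize || skip > psize) {
        return -1;
    }
    return memcmp(ppkt + skip, spkt + skip, psize - skip);
}
```

Because the helper adds vnet_hdr_len itself, a caller that computes an offset
relative to the start of the buffer has to subtract vnet_hdr_len first, which
is what the tcp_offset change in this patch does.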

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo-compare.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -XXX,XX +XXX,XX @@ static int colo_packet_compare_common(Packet *ppkt, Packet *spkt, int offset)
                                    sec_ip_src, sec_ip_dst);
     }
 
+    offset = ppkt->vnet_hdr_len + offset;
+
     if (ppkt->size == spkt->size) {
-        return memcmp(ppkt->data + offset, spkt->data + offset,
+        return memcmp(ppkt->data + offset,
+                      spkt->data + offset,
                       spkt->size - offset);
     } else {
         trace_colo_compare_main("Net packet size are not the same");
@@ -XXX,XX +XXX,XX @@ static int colo_packet_compare_tcp(Packet *spkt, Packet *ppkt)
      */
     if (ptcp->th_off > 5) {
         ptrdiff_t tcp_offset;
+
         tcp_offset = ppkt->transport_header - (uint8_t *)ppkt->data
-                     + (ptcp->th_off * 4);
+                     + (ptcp->th_off * 4) - ppkt->vnet_hdr_len;
         res = colo_packet_compare_common(ppkt, spkt, tcp_offset);
     } else if (ptcp->th_sum == stcp->th_sum) {
         res = colo_packet_compare_common(ppkt, spkt, ETH_HLEN);
-- 
2.7.4

From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

Add the vnet_hdr_support option to filter-rewriter; it is disabled by default.
If you use virtio-net-pci or another driver that needs vnet_hdr, please enable it.
For example:
-object filter-rewriter,id=rew0,netdev=hn0,queue=all,vnet_hdr_support

We get the vnet_hdr_len from NetClientState so that we can
parse the net packet correctly.

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/filter-rewriter.c | 37 ++++++++++++++++++++++++++++++++++++-
 qemu-options.hx       |  4 ++--
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index XXXXXXX..XXXXXXX 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu-common.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qerror.h"
+#include "qemu/error-report.h"
 #include "qapi-visit.h"
 #include "qom/object.h"
 #include "qemu/main-loop.h"
@@ -XXX,XX +XXX,XX @@ typedef struct RewriterState {
     NetQueue *incoming_queue;
     /* hashtable to save connection */
     GHashTable *connection_track_table;
+    bool vnet_hdr;
 } RewriterState;
 
 static void filter_rewriter_flush(NetFilterState *nf)
@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
     ConnectionKey key;
     Packet *pkt;
     ssize_t size = iov_size(iov, iovcnt);
+    ssize_t vnet_hdr_len = 0;
     char *buf = g_malloc0(size);
 
     iov_to_buf(iov, iovcnt, 0, buf, size);
-    pkt = packet_new(buf, size, 0);
+
+    if (s->vnet_hdr) {
+        vnet_hdr_len = nf->netdev->vnet_hdr_len;
+    }
+
+    pkt = packet_new(buf, size, vnet_hdr_len);
     g_free(buf);
 
     /*
@@ -XXX,XX +XXX,XX @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
     s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
 }
 
+static bool filter_rewriter_get_vnet_hdr(Object *obj, Error **errp)
+{
+    RewriterState *s = FILTER_COLO_REWRITER(obj);
+
+    return s->vnet_hdr;
+}
+
+static void filter_rewriter_set_vnet_hdr(Object *obj,
+                                         bool value,
+                                         Error **errp)
+{
+    RewriterState *s = FILTER_COLO_REWRITER(obj);
+
+    s->vnet_hdr = value;
+}
+
+static void filter_rewriter_init(Object *obj)
+{
+    RewriterState *s = FILTER_COLO_REWRITER(obj);
+
+    s->vnet_hdr = false;
+    object_property_add_bool(obj, "vnet_hdr_support",
+                             filter_rewriter_get_vnet_hdr,
+                             filter_rewriter_set_vnet_hdr, NULL);
+}
+
 static void colo_rewriter_class_init(ObjectClass *oc, void *data)
 {
     NetFilterClass *nfc = NETFILTER_CLASS(oc);
@@ -XXX,XX +XXX,XX @@ static const TypeInfo colo_rewriter_info = {
     .name = TYPE_FILTER_REWRITER,
     .parent = TYPE_NETFILTER,
     .class_init = colo_rewriter_class_init,
+    .instance_init = filter_rewriter_init,
     .instance_size = sizeof(RewriterState),
 };
 
diff --git a/qemu-options.hx b/qemu-options.hx
index XXXXXXX..XXXXXXX 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -XXX,XX +XXX,XX @@ Create a filter-redirector we need to differ outdev id from indev id, id can not
 be the same. we can just use indev or outdev, but at least one of indev or outdev
 need to be specified.
 
-@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid}[,queue=@var{all|rx|tx}]
+@item -object filter-rewriter,id=@var{id},netdev=@var{netdevid},queue=@var{all|rx|tx},[vnet_hdr_support]
 
 Filter-rewriter is a part of COLO project.It will rewrite tcp packet to
 secondary from primary to keep secondary tcp connection,and rewrite
 tcp packet to primary from secondary make tcp packet can be handled by
-client.
+client. If it has the vnet_hdr_support flag, we can parse packets with a vnet header.
 
 usage:
 colo secondary:
-- 
2.7.4
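
The filter-rewriter patch above passes a vnet header length to the packet parser so that parsing starts past the virtio-net header. The following is only a minimal sketch of that idea, not QEMU code: `struct sketch_packet` and `sketch_eth_header` are hypothetical names invented for illustration, and the real `packet_new()` may do considerably more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a captured packet; not QEMU's Packet type. */
struct sketch_packet {
    const uint8_t *data;   /* raw bytes as delivered by the backend */
    size_t size;           /* total length of data */
    size_t vnet_hdr_len;   /* 0 when the backend adds no vnet header */
};

/* Return a pointer to the Ethernet header, or NULL if the buffer is too short. */
static const uint8_t *sketch_eth_header(const struct sketch_packet *pkt)
{
    assert(pkt != NULL);
    if (pkt->vnet_hdr_len >= pkt->size) {
        return NULL;
    }
    /* Skip the vnet header so that L2 parsing starts at the right byte. */
    return pkt->data + pkt->vnet_hdr_len;
}
```

With `vnet_hdr_len` left at 0 the function returns the start of the buffer, which corresponds to the option being disabled.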

From: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>

Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 docs/colo-proxy.txt | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/docs/colo-proxy.txt b/docs/colo-proxy.txt
index XXXXXXX..XXXXXXX 100644
--- a/docs/colo-proxy.txt
+++ b/docs/colo-proxy.txt
@@ -XXX,XX +XXX,XX @@ Secondary(ip:3.3.3.8):
 -chardev socket,id=red1,host=3.3.3.3,port=9004
 -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0
 -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1
+-object filter-rewriter,id=f3,netdev=hn0,queue=all
+
+If you want to use virtio-net-pci or other driver with vnet_header:
+
+Primary(ip:3.3.3.3):
+-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
+-device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
+-chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
+-chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
+-chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
+-chardev socket,id=compare0-0,host=3.3.3.3,port=9001
+-chardev socket,id=compare_out,host=3.3.3.3,port=9005,server,nowait
+-chardev socket,id=compare_out0,host=3.3.3.3,port=9005
+-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0,vnet_hdr_support
+-object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out,vnet_hdr_support
+-object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0,vnet_hdr_support
+-object colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0,vnet_hdr_support
+
+Secondary(ip:3.3.3.8):
+-netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
+-device e1000,netdev=hn0,mac=52:a4:00:12:78:66
+-chardev socket,id=red0,host=3.3.3.3,port=9003
+-chardev socket,id=red1,host=3.3.3.3,port=9004
+-object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0,vnet_hdr_support
+-object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1,vnet_hdr_support
+-object filter-rewriter,id=f3,netdev=hn0,queue=all,vnet_hdr_support
 
 Note:
   a.COLO-proxy must work with COLO-frame and Block-replication.
-- 
2.7.4

From: Michal Privoznik <mprivozn@redhat.com>

We have a function that checks whether a given number is a power of two.
We should use it instead of open-coding the check.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/virtio-net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -XXX,XX +XXX,XX @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
      */
     if (n->net_conf.rx_queue_size < VIRTIO_NET_RX_QUEUE_MIN_SIZE ||
         n->net_conf.rx_queue_size > VIRTQUEUE_MAX_SIZE ||
-        (n->net_conf.rx_queue_size & (n->net_conf.rx_queue_size - 1))) {
+        !is_power_of_2(n->net_conf.rx_queue_size)) {
         error_setg(errp, "Invalid rx_queue_size (= %" PRIu16 "), "
                    "must be a power of 2 between %d and %d.",
                    n->net_conf.rx_queue_size, VIRTIO_NET_RX_QUEUE_MIN_SIZE,
-- 
2.7.4
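
For comparison, the open-coded bit test and a helper-style check can be sketched as below. Both function names are hypothetical and written only for illustration; the helper's treatment of zero is an assumption about how such helpers are usually written, not a statement about the definition of `is_power_of_2()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Open-coded form: true when at most one bit is set. Note that 0 passes. */
static bool sketch_open_coded_ok(uint16_t v)
{
    return (v & (v - 1)) == 0;
}

/* Helper form, written here for illustration; rejects 0 explicitly. */
static bool sketch_is_power_of_2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}
```

The two forms differ only for zero. In the hunk above a range check on rx_queue_size precedes the power-of-two test, so that difference probably does not matter there.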

The following changes since commit d48125de38f48a61d6423ef6a01156d6dff9ee2c:

Merge tag 'kraxel-20220719-pull-request' of https://gitlab.com/kraxel/qemu into staging (2022-07-19 17:40:36 +0100)

are available in the git repository at:

https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 8bdab83b34efb0b598be4e5b98e4f466ca5f2f80:

net/colo.c: fix segmentation fault when packet is not parsed correctly (2022-07-20 16:58:08 +0800)

----------------------------------------------------------------

Changes since V1:
- Fix build errors of vhost-vdpa when virtio-net is not set

----------------------------------------------------------------
Eugenio Pérez (21):
      vhost: move descriptor translation to vhost_svq_vring_write_descs
      virtio-net: Expose MAC_TABLE_ENTRIES
      virtio-net: Expose ctrl virtqueue logic
      vdpa: Avoid compiler to squash reads to used idx
      vhost: Reorder vhost_svq_kick
      vhost: Move vhost_svq_kick call to vhost_svq_add
      vhost: Check for queue full at vhost_svq_add
      vhost: Decouple vhost_svq_add from VirtQueueElement
      vhost: Add SVQDescState
      vhost: Track number of descs in SVQDescState
      vhost: add vhost_svq_push_elem
      vhost: Expose vhost_svq_add
      vhost: add vhost_svq_poll
      vhost: Add svq avail_handler callback
      vdpa: Export vhost_vdpa_dma_map and unmap calls
      vhost-net-vdpa: add stubs for when no virtio-net device is present
      vdpa: manual forward CVQ buffers
      vdpa: Buffer CVQ support on shadow virtqueue
      vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
      vdpa: Add device migration blocker
      vdpa: Add x-svq to NetdevVhostVDPAOptions

Zhang Chen (4):
      softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH
      net/colo: Fix a "double free" crash to clear the conn_list
      net/colo.c: No need to track conn_list for filter-rewriter
      net/colo.c: fix segmentation fault when packet is not parsed correctly

From: Eugenio Pérez <eperezma@redhat.com>

It's done for both in and out descriptors so it's better placed here.

Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 38 +++++++++++++++++++++++++++-----------
 1 file changed, 27 insertions(+), 11 deletions(-)

From: Eugenio Pérez <eperezma@redhat.com>

vhost-vdpa control virtqueue needs to know the maximum entries supported
by the virtio-net device, so we know if it is possible to apply the
filter.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/virtio-net.c            | 1 -
 include/hw/virtio/virtio-net.h | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -XXX,XX +XXX,XX @@
 
 #define VIRTIO_NET_VM_VERSION    11
 
-#define MAC_TABLE_ENTRIES    64
 #define MAX_VLAN    (1 << 12)   /* Per 802.1Q definition */
 
 /* previously fixed value */
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -XXX,XX +XXX,XX @@ OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET)
  * and latency. */
 #define TX_BURST 256
 
+/* Maximum VIRTIO_NET_CTRL_MAC_TABLE_SET unicast + multicast entries. */
+#define MAC_TABLE_ENTRIES    64
+
 typedef struct virtio_net_conf
 {
     uint32_t txtimer;
-- 
2.7.4

From: Eugenio Pérez <eperezma@redhat.com>

This allows external vhost-net devices to modify the state of the
VirtIO device model once the vhost-vdpa device has acknowledged the
control commands.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/virtio-net.c            | 84 ++++++++++++++++++++++++------------------
 include/hw/virtio/virtio-net.h |  4 ++
 2 files changed, 53 insertions(+), 35 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -XXX,XX +XXX,XX @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t cmd,
     return VIRTIO_NET_OK;
 }
 
-static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
+                                  const struct iovec *in_sg, unsigned in_num,
+                                  const struct iovec *out_sg,
+                                  unsigned out_num)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
     struct virtio_net_ctrl_hdr ctrl;
     virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
-    VirtQueueElement *elem;
     size_t s;
     struct iovec *iov, *iov2;
-    unsigned int iov_cnt;
+
+    if (iov_size(in_sg, in_num) < sizeof(status) ||
+        iov_size(out_sg, out_num) < sizeof(ctrl)) {
+        virtio_error(vdev, "virtio-net ctrl missing headers");
+        return 0;
+    }
+
+    iov2 = iov = g_memdup2(out_sg, sizeof(struct iovec) * out_num);
+    s = iov_to_buf(iov, out_num, 0, &ctrl, sizeof(ctrl));
+    iov_discard_front(&iov, &out_num, sizeof(ctrl));
+    if (s != sizeof(ctrl)) {
+        status = VIRTIO_NET_ERR;
+    } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
+        status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
+        status = virtio_net_handle_mac(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
+        status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
+        status = virtio_net_handle_announce(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
+        status = virtio_net_handle_mq(n, ctrl.cmd, iov, out_num);
+    } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
+        status = virtio_net_handle_offloads(n, ctrl.cmd, iov, out_num);
+    }
+
+    s = iov_from_buf(in_sg, in_num, 0, &status, sizeof(status));
+    assert(s == sizeof(status));
+
+    g_free(iov2);
+    return sizeof(status);
+}
+
+static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtQueueElement *elem;
 
     for (;;) {
+        size_t written;
         elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
         if (!elem) {
             break;
         }
-        if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) ||
-            iov_size(elem->out_sg, elem->out_num) < sizeof(ctrl)) {
-            virtio_error(vdev, "virtio-net ctrl missing headers");
+
+        written = virtio_net_handle_ctrl_iov(vdev, elem->in_sg, elem->in_num,
+                                             elem->out_sg, elem->out_num);
+        if (written > 0) {
+            virtqueue_push(vq, elem, written);
+            virtio_notify(vdev, vq);
+            g_free(elem);
+        } else {
             virtqueue_detach_element(vq, elem, 0);
             g_free(elem);
             break;
         }
-
-        iov_cnt = elem->out_num;
-        iov2 = iov = g_memdup2(elem->out_sg,
-                               sizeof(struct iovec) * elem->out_num);
-        s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
-        iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
-        if (s != sizeof(ctrl)) {
-            status = VIRTIO_NET_ERR;
-        } else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
-            status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
-            status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
-            status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
-            status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
-            status = virtio_net_handle_mq(n, ctrl.cmd, iov, iov_cnt);
-        } else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
-            status = virtio_net_handle_offloads(n, ctrl.cmd, iov, iov_cnt);
-        }
-
-        s = iov_from_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
-        assert(s == sizeof(status));
-
-        virtqueue_push(vq, elem, sizeof(status));
-        virtio_notify(vdev, vq);
-        g_free(iov2);
-        g_free(elem);
     }
 }
 
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -XXX,XX +XXX,XX @@ struct VirtIONet {
     struct EBPFRSSContext ebpf_rss;
 };
 
+size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
+                                  const struct iovec *in_sg, unsigned in_num,
+                                  const struct iovec *out_sg,
+                                  unsigned out_num);
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
                                    const char *type);
 
-- 
2.7.4
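
The split above leaves a handler that sees only scatter-gather buffers and reports how many bytes it wrote, with 0 meaning that the buffers could not carry the headers. A much reduced sketch of that contract follows. Every name in it is hypothetical, the command header is a made-up two-byte layout, and for simplicity the status byte is assumed to fit in the first device-writable segment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

#define SKETCH_OK  0
#define SKETCH_ERR 1

/* Hypothetical two-byte command header, loosely modelled on a class/cmd pair. */
struct sketch_ctrl_hdr {
    uint8_t class_id;
    uint8_t cmd;
};

/* Copy up to len bytes out of a scatter-gather list; returns bytes copied. */
static size_t sketch_iov_to_buf(const struct iovec *iov, unsigned cnt,
                                void *buf, size_t len)
{
    size_t done = 0;

    for (unsigned i = 0; i < cnt && done < len; i++) {
        size_t n = iov[i].iov_len < len - done ? iov[i].iov_len : len - done;
        memcpy((uint8_t *)buf + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}

/*
 * Handle one command given only its buffers. Returns the number of bytes
 * written to in_sg, or 0 if the buffers are too small to carry the headers.
 */
static size_t sketch_handle_ctrl_iov(const struct iovec *in_sg, unsigned in_num,
                                     const struct iovec *out_sg, unsigned out_num)
{
    struct sketch_ctrl_hdr hdr;
    uint8_t status = SKETCH_ERR;

    if (sketch_iov_to_buf(out_sg, out_num, &hdr, sizeof(hdr)) != sizeof(hdr) ||
        in_num == 0 || in_sg[0].iov_len < sizeof(status)) {
        return 0;
    }
    if (hdr.class_id == 1) {        /* the only class this sketch accepts */
        status = SKETCH_OK;
    }
    memcpy(in_sg[0].iov_base, &status, sizeof(status));
    return sizeof(status);
}
```

A caller that owns a virtqueue element would push the element with the returned length when it is nonzero and detach it otherwise, which is the shape of the loop left in virtio_net_handle_ctrl.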

From: Eugenio Pérez <eperezma@redhat.com>

In the next patch we will allow busypolling of this value. The compiler
can see a code path where shadow_used_idx, last_used_idx, and the vring
used idx are not modified within the same thread that is busypolling, so
it may squash repeated reads of them.

This was not an issue before since we always cleared the device event
notifier before checking it, and that could act as a memory barrier.
However, the busypoll needs something similar to the kernel's READ_ONCE.

Let's add it here, separated from the polling.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
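
One common way to get READ_ONCE-like behaviour in plain C is a load through a volatile-qualified pointer. The sketch below shows that technique under stated assumptions: the names are hypothetical, the ring structure is a toy, and this is not claimed to be the exact change made by the patch. A volatile load only stops the compiler from merging or hoisting the read; it does not by itself order the access against other memory operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Force a fresh load on every call; a sketch of the READ_ONCE idea. */
static inline uint16_t sketch_read_once_u16(const uint16_t *p)
{
    return *(const volatile uint16_t *)p;
}

/* Hypothetical ring state: used_idx is advanced by another agent. */
struct sketch_ring {
    uint16_t used_idx;
    uint16_t last_used_idx;
};

/* True when entries have been used since we last looked. */
static bool sketch_more_used(const struct sketch_ring *r)
{
    return sketch_read_once_u16(&r->used_idx) != r->last_used_idx;
}
```

A busy-poll loop would call `sketch_more_used()` repeatedly; without the volatile load the compiler would be free to evaluate the comparison once and spin on the cached result.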

From: Eugenio Pérez <eperezma@redhat.com>

Future code needs to call it from vhost_svq_add.

No functional change intended.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

From: Eugenio Pérez <eperezma@redhat.com>

The series needs to expose vhost_svq_add with full functionality,
including kick

From: Eugenio Pérez <eperezma@redhat.com>

The series needs to expose vhost_svq_add with full functionality,
including checking for full queue.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 59 +++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -XXX,XX +XXX,XX @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
  * Add an element to a SVQ.
  *
  * The caller must check that there is enough slots for the new element. It
- * takes ownership of the element: In case of failure, it is free and the SVQ
- * is considered broken.
+ * takes ownership of the element: In case of failure not ENOSPC, it is free.
+ *
+ * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full
  */
-static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+static int vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
 {
     unsigned qemu_head;
-    bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
+    unsigned ndescs = elem->in_num + elem->out_num;
+    bool ok;
+
+    if (unlikely(ndescs > vhost_svq_available_slots(svq))) {
+        return -ENOSPC;
+    }
+
+    ok = vhost_svq_add_split(svq, elem, &qemu_head);
     if (unlikely(!ok)) {
         g_free(elem);
-        return false;
+        return -EINVAL;
     }
 
     svq->ring_id_maps[qemu_head] = elem;
     vhost_svq_kick(svq);
-    return true;
+    return 0;
 }
 
 /**
@@ -XXX,XX +XXX,XX @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
 
         while (true) {
             VirtQueueElement *elem;
-            bool ok;
+            int r;
 
             if (svq->next_guest_avail_elem) {
                 elem = g_steal_pointer(&svq->next_guest_avail_elem);
@@ -XXX,XX +XXX,XX @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                 break;
             }
 
-            if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) {
-                /*
-                 * This condition is possible since a contiguous buffer in GPA
-                 * does not imply a contiguous buffer in qemu's VA
-                 * scatter-gather segments. If that happens, the buffer exposed
-                 * to the device needs to be a chain of descriptors at this
-                 * moment.
-                 *
-                 * SVQ cannot hold more available buffers if we are here:
-                 * queue the current guest descriptor and ignore further kicks
-                 * until some elements are used.
-                 */
-                svq->next_guest_avail_elem = elem;
-                return;
-            }
-
-            ok = vhost_svq_add(svq, elem);
-            if (unlikely(!ok)) {
-                /* VQ is broken, just return and ignore any other kicks */
+            r = vhost_svq_add(svq, elem);
+            if (unlikely(r != 0)) {
+                if (r == -ENOSPC) {
+                    /*
+                     * This condition is possible since a contiguous buffer in
+                     * GPA does not imply a contiguous buffer in qemu's VA
+                     * scatter-gather segments. If that happens, the buffer
+                     * exposed to the device needs to be a chain of descriptors
+                     * at this moment.
+                     *
+                     * SVQ cannot hold more available buffers if we are here:
+                     * queue the current guest descriptor and ignore kicks
+                     * until some elements are used.
+                     */
+                    svq->next_guest_avail_elem = elem;
+                }
+
+                /* VQ is full or broken, just return and ignore kicks */
                 return;
             }
         }
-- 
2.7.4
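
The error convention introduced above (0 on success, -ENOSPC when the queue is full, -EINVAL for an invalid element) can be illustrated with a toy queue. This is a minimal sketch with hypothetical names; it keeps only a free-slot counter and does not model the vring.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy queue; only the free-slot count matters here. */
struct sketch_queue {
    unsigned free_slots;
    const void *pending;    /* element parked while the queue is full */
};

/* Returns 0 on success, -ENOSPC when full, -EINVAL for an empty element. */
static int sketch_add(struct sketch_queue *q, unsigned out_num, unsigned in_num)
{
    unsigned ndescs = out_num + in_num;

    assert(q != NULL);
    if (ndescs > q->free_slots) {
        return -ENOSPC;
    }
    if (ndescs == 0) {
        return -EINVAL;
    }
    q->free_slots -= ndescs;
    return 0;
}

/* Caller side: park the element on -ENOSPC, give up on anything else. */
static int sketch_forward(struct sketch_queue *q, const void *elem,
                          unsigned out_num, unsigned in_num)
{
    int r = sketch_add(q, out_num, in_num);

    if (r == -ENOSPC) {
        q->pending = elem;  /* retry later, once slots are freed */
    }
    return r;
}
```

Returning distinct codes lets the caller tell a recoverable condition, where the element is kept for later, from a broken one, where it is not.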

From: Eugenio Pérez <eperezma@redhat.com>

VirtQueueElement comes from the guest, but we're moving SVQ towards being
able to modify the element presented to the device without the guest's
knowledge.

To do so, make SVQ accept sg buffers directly, instead of using
VirtQueueElement.

Add vhost_svq_add_element to maintain element convenience.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -XXX,XX +XXX,XX @@ static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
 }
 
 static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
-                                VirtQueueElement *elem, unsigned *head)
+                                const struct iovec *out_sg, size_t out_num,
+                                const struct iovec *in_sg, size_t in_num,
+                                unsigned *head)
 {
     unsigned avail_idx;
     vring_avail_t *avail = svq->vring.avail;
     bool ok;
-    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(elem->out_num, elem->in_num));
+    g_autofree hwaddr *sgs = g_new(hwaddr, MAX(out_num, in_num));
 
     *head = svq->free_head;
 
     /* We need some descriptors here */
-    if (unlikely(!elem->out_num && !elem->in_num)) {
+    if (unlikely(!out_num && !in_num)) {
         qemu_log_mask(LOG_GUEST_ERROR,
                       "Guest provided element with no descriptors");
         return false;
     }
 
-    ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
-                                     elem->in_num > 0, false);
+    ok = vhost_svq_vring_write_descs(svq, sgs, out_sg, out_num, in_num > 0,
+                                     false);
     if (unlikely(!ok)) {
         return false;
     }
 
-    ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false,
-                                     true);
+    ok = vhost_svq_vring_write_descs(svq, sgs, in_sg, in_num, false, true);
     if (unlikely(!ok)) {
         return false;
     }
@@ -XXX,XX +XXX,XX @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
  *
  * Return -EINVAL if element is invalid, -ENOSPC if dev queue is full
  */
-static int vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
+                          size_t out_num, const struct iovec *in_sg,
+                          size_t in_num, VirtQueueElement *elem)
 {
     unsigned qemu_head;
-    unsigned ndescs = elem->in_num + elem->out_num;
+    unsigned ndescs = in_num + out_num;
     bool ok;
 
     if (unlikely(ndescs > vhost_svq_available_slots(svq))) {
         return -ENOSPC;
     }
 
-    ok = vhost_svq_add_split(svq, elem, &qemu_head);
+    ok = vhost_svq_add_split(svq, out_sg, out_num, in_sg, in_num, &qemu_head);
     if (unlikely(!ok)) {
         g_free(elem);
         return -EINVAL;
@@ -XXX,XX +XXX,XX @@ static int vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
     return 0;
 }
 
+/* Convenience wrapper to add a guest's element to SVQ */
+static int vhost_svq_add_element(VhostShadowVirtqueue *svq,
+                                 VirtQueueElement *elem)
+{
+    return vhost_svq_add(svq, elem->out_sg, elem->out_num, elem->in_sg,
+                         elem->in_num, elem);
+}
+
 /**
  * Forward available buffers.
  *
@@ -XXX,XX +XXX,XX @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                 break;
             }
 
-            r = vhost_svq_add(svq, elem);
+            r = vhost_svq_add_element(svq, elem);
             if (unlikely(r != 0)) {
                 if (r == -ENOSPC) {
                     /*
-- 
2.7.4
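
The refactoring above follows a familiar pattern: a core routine that takes raw scatter-gather lists, plus a thin wrapper for callers that already hold an element. A minimal sketch of the pattern, with hypothetical types and names that only mimic the shape of the real functions:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical element type bundling both scatter-gather lists. */
struct sketch_elem {
    const struct iovec *out_sg;
    size_t out_num;
    const struct iovec *in_sg;
    size_t in_num;
};

/* Core routine: works on raw buffers; elem is only an opaque cookie. */
static size_t sketch_add_sg(const struct iovec *out_sg, size_t out_num,
                            const struct iovec *in_sg, size_t in_num,
                            const struct sketch_elem *elem)
{
    (void)out_sg;
    (void)in_sg;
    (void)elem;
    return out_num + in_num;    /* descriptors this request would consume */
}

/* Convenience wrapper for callers that already hold an element. */
static size_t sketch_add_element(const struct sketch_elem *elem)
{
    return sketch_add_sg(elem->out_sg, elem->out_num,
                         elem->in_sg, elem->in_num, elem);
}
```

Because the core routine no longer reads the buffers out of the element, a caller can present buffers to the device that differ from the ones the guest supplied.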

From: Eugenio Pérez <eperezma@redhat.com>

This will allow SVQ to add context to the different queue elements.

This patch only stores the actual element, no functional change intended.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 16 ++++++++--------
 hw/virtio/vhost-shadow-virtqueue.h |  8 ++++++--
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -XXX,XX +XXX,XX @@ static int vhost_svq_add(VhostShadowVirtqueue *svq, const struct iovec *out_sg,
         return -EINVAL;
     }
 
-    svq->ring_id_maps[qemu_head] = elem;
+    svq->desc_state[qemu_head].elem = elem;
     vhost_svq_kick(svq);
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
         return NULL;
     }
 
-    if (unlikely(!svq->ring_id_maps[used_elem.id])) {
+    if (unlikely(!svq->desc_state[used_elem.id].elem)) {
         qemu_log_mask(LOG_GUEST_ERROR,
             "Device %s says index %u is used, but it was not available",
             svq->vdev->name, used_elem.id);
         return NULL;
     }
 
-    num = svq->ring_id_maps[used_elem.id]->in_num +
-          svq->ring_id_maps[used_elem.id]->out_num;
+    num = svq->desc_state[used_elem.id].elem->in_num +
+          svq->desc_state[used_elem.id].elem->out_num;
     last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
     svq->desc_next[last_used_chain] = svq->free_head;
     svq->free_head = used_elem.id;
 
     *len = used_elem.len;
-    return g_steal_pointer(&svq->ring_id_maps[used_elem.id]);
+    return g_steal_pointer(&svq->desc_state[used_elem.id].elem);
 }
 
 static void vhost_svq_flush(VhostShadowVirtqueue *svq,
@@ -XXX,XX +XXX,XX @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
     memset(svq->vring.desc, 0, driver_size);
     svq->vring.used = qemu_memalign(qemu_real_host_page_size(), device_size);
     memset(svq->vring.used, 0, device_size);
-    svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
+    svq->desc_state = g_new0(SVQDescState, svq->vring.num);
     svq->desc_next = g_new0(uint16_t, svq->vring.num);
     for (unsigned i = 0; i < svq->vring.num - 1; i++) {
         svq->desc_next[i] = cpu_to_le16(i + 1);
@@ -XXX,XX +XXX,XX @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
 
     for (unsigned i = 0; i < svq->vring.num; ++i) {
         g_autofree VirtQueueElement *elem = NULL;
-        elem = g_steal_pointer(&svq->ring_id_maps[i]);
+        elem = g_steal_pointer(&svq->desc_state[i].elem);
         if (elem) {
             virtqueue_detach_element(svq->vq, elem, 0);
         }
@@ -XXX,XX +XXX,XX @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
     }
     svq->vq = NULL;
     g_free(svq->desc_next);
-    g_free(svq->ring_id_maps);
+    g_free(svq->desc_state);
     qemu_vfree(svq->vring.desc);
     qemu_vfree(svq->vring.used);
 }
diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -XXX,XX +XXX,XX @@
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/vhost-iova-tree.h"
 
+typedef struct SVQDescState {
+    VirtQueueElement *elem;
+} SVQDescState;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
     /* Shadow vring */
@@ -XXX,XX +XXX,XX @@ typedef struct VhostShadowVirtqueue {
     /* IOVA mapping */
     VhostIOVATree *iova_tree;
 
-    /* Map for use the guest's descriptors */
-    VirtQueueElement **ring_id_maps;
+    /* SVQ vring descriptors state */
+    SVQDescState *desc_state;
 
     /* Next VirtQueue element that guest made available */
     VirtQueueElement *next_guest_avail_elem;
-- 
2.7.4

From: Eugenio Pérez <eperezma@redhat.com>

A guest buffer that is contiguous in GPA may need multiple descriptors in
qemu's VA, so SVQ should track its length separately.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 4 ++--
 hw/virtio/vhost-shadow-virtqueue.h | 6 ++++++
 2 files changed, 8 insertions(+), 2 deletions(-)
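
The bookkeeping described above can be sketched as a per-head state entry that stores the descriptor count next to the element, so that completion does not need to recompute it from the guest's view. All names below are hypothetical and the structure only mirrors the elem plus ndescs idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-head bookkeeping, mirroring the elem + ndescs idea. */
struct sketch_desc_state {
    void *elem;             /* guest element, NULL when the slot is free */
    unsigned int ndescs;    /* descriptors actually written to the ring */
};

/* Record a request at submit time. */
static void sketch_track(struct sketch_desc_state *st, unsigned head,
                         void *elem, unsigned int ndescs)
{
    st[head].elem = elem;
    st[head].ndescs = ndescs;
}

/* On completion, release the slot and report how many descriptors to free. */
static unsigned int sketch_release(struct sketch_desc_state *st, unsigned head)
{
    unsigned int n = st[head].ndescs;

    assert(st[head].elem != NULL);  /* completion for an unknown head is a bug */
    st[head].elem = NULL;
    st[head].ndescs = 0;
    return n;
}
```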

From: Eugenio Pérez <eperezma@redhat.com>

This function allows external SVQ users to return guest's available
buffers.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 16 ++++++++++++++++
 hw/virtio/vhost-shadow-virtqueue.h |  3 +++
 2 files changed, 19 insertions(+)

From: Eugenio Pérez <eperezma@redhat.com>

This allows external parts of SVQ to forward custom buffers to the
device.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 6 +++---
 hw/virtio/vhost-shadow-virtqueue.h | 3 +++
 2 files changed, 6 insertions(+), 3 deletions(-)

From: Eugenio Pérez <eperezma@redhat.com>

It allows the Shadow Control VirtQueue to wait for the device to use the
available buffers.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 27 +++++++++++++++++++++++++++
 hw/virtio/vhost-shadow-virtqueue.h |  1 +
 2 files changed, 28 insertions(+)

From: Eugenio Pérez <eperezma@redhat.com>

This allows external handlers to be aware of new buffers that the guest
places in the virtqueue.

When this callback is defined the ownership of the guest's virtqueue
element is transferred to the callback. This means that if the user
wants to forward the descriptor it needs to manually inject it. The
callback is also free to process the command by itself and use the
element with svq_push.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-shadow-virtqueue.c | 14 ++++++++++++--
 hw/virtio/vhost-shadow-virtqueue.h | 31 ++++++++++++++++++++++++++++++-
 hw/virtio/vhost-vdpa.c             |  3 ++-
 3 files changed, 44 insertions(+), 4 deletions(-)
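
The dispatch described in the commit message can be sketched as an optional ops table: when one is installed, the callback receives the element and owns it; otherwise the default forwarding path runs. The sketch uses hypothetical names throughout and a counter in place of real forwarding.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_svq;

/* Callback type: receives the element and takes ownership of it. */
typedef int (*sketch_avail_cb)(struct sketch_svq *svq, void *elem, void *opaque);

struct sketch_ops {
    sketch_avail_cb avail_handler;
};

struct sketch_svq {
    const struct sketch_ops *ops;   /* NULL selects the default path */
    void *ops_opaque;
    unsigned forwarded;             /* elements sent on the default path */
};

/* Default path: forward the element unchanged. */
static int sketch_add_element(struct sketch_svq *svq, void *elem)
{
    (void)elem;
    svq->forwarded++;
    return 0;
}

/* Dispatch one available element, as a guest-kick loop would. */
static int sketch_dispatch(struct sketch_svq *svq, void *elem)
{
    if (svq->ops) {
        return svq->ops->avail_handler(svq, elem, svq->ops_opaque);
    }
    return sketch_add_element(svq, elem);
}

/* Example owner callback: counts intercepted elements via the opaque pointer. */
static int sketch_count_cb(struct sketch_svq *svq, void *elem, void *opaque)
{
    (void)svq;
    (void)elem;
    ++*(unsigned *)opaque;
    return 0;
}
```

Passing the opaque pointer alongside the ops table lets one callback implementation serve several queues with different owner state.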

diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -XXX,XX +XXX,XX @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue *svq)
                 break;
             }
 
-            r = vhost_svq_add_element(svq, elem);
+            if (svq->ops) {
+                r = svq->ops->avail_handler(svq, elem, svq->ops_opaque);
+            } else {
+                r = vhost_svq_add_element(svq, elem);
+            }
             if (unlikely(r != 0)) {
                 if (r == -ENOSPC) {
                     /*
@@ -XXX,XX +XXX,XX @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  * shadow methods and file descriptors.
  *
  * @iova_tree: Tree to perform descriptors translations
+ * @ops: SVQ owner callbacks
+ * @ops_opaque: ops opaque pointer
  *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+                                    const VhostShadowVirtqueueOps *ops,
+                                    void *ops_opaque)
 {
     g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
     int r;
@@ -XXX,XX +XXX,XX @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
     event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
     event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
     svq->iova_tree = iova_tree;
+    svq->ops = ops;
+    svq->ops_opaque = ops_opaque;
     return g_steal_pointer(&svq);
 
 err_init_hdev_call:
diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -XXX,XX +XXX,XX @@ typedef struct SVQDescState {
     unsigned int ndescs;
 } SVQDescState;
 
+typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
+
+/**
+ * Callback to handle an avail buffer.
+ *
+ * @svq:  Shadow virtqueue
+ * @elem:  Element placed in the queue by the guest
+ * @vq_callback_opaque:  Opaque
+ *
+ * Returns 0 if the vq is running as expected.
+ *
+ * Note that ownership of elem is transferred to the callback.
+ */
+typedef int (*VirtQueueAvailCallback)(VhostShadowVirtqueue *svq,
+                                      VirtQueueElement *elem,
+                                      void *vq_callback_opaque);
+
+typedef struct VhostShadowVirtqueueOps {
+    VirtQueueAvailCallback avail_handler;
+} VhostShadowVirtqueueOps;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
     /* Shadow vring */
@@ -XXX,XX +XXX,XX @@ typedef struct VhostShadowVirtqueue {
      */
     uint16_t *desc_next;
 
+    /* Caller callbacks */
+    const VhostShadowVirtqueueOps *ops;
+
+    /* Caller callbacks opaque */
+    void *ops_opaque;
+
     /* Next head to expose to the device */
     uint16_t shadow_avail_idx;
 
@@ -XXX,XX +XXX,XX @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice *vdev,
                      VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+                                    const VhostShadowVirtqueueOps *ops,
+                                    void *ops_opaque);
 
 void vhost_svq_free(gpointer vq);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
 
     shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
-        g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree);
+        g_autoptr(VhostShadowVirtqueue) svq;
 
+        svq = vhost_svq_new(v->iova_tree, NULL, NULL);
         if (unlikely(!svq)) {
             error_setg(errp, "Cannot create svq %u", n);
             return -1;
-- 
2.7.4

From: Eugenio Pérez <eperezma@redhat.com>

Shadow CVQ will copy buffers to qemu's VA, so we avoid TOCTOU attacks in
which the guest sets a different state in the qemu device model than in
the vdpa device.

To do so, it needs to be able to map these new buffers to the device.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-vdpa.c         | 7 +++----
 include/hw/virtio/vhost-vdpa.h | 4 ++++
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
     return false;
 }
 
-static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
-                              void *vaddr, bool readonly)
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
+                       void *vaddr, bool readonly)
 {
     struct vhost_msg_v2 msg = {};
     int fd = v->device_fd;
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
     return ret;
 }
 
-static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova,
-                                hwaddr size)
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size)
 {
     struct vhost_msg_v2 msg = {};
     int fd = v->device_fd;
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -XXX,XX +XXX,XX @@ typedef struct vhost_vdpa {
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
 
+int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr iova, hwaddr size,
+                       void *vaddr, bool readonly);
+int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, hwaddr iova, hwaddr size);
+
 #endif
-- 
2.7.4

From: Eugenio Pérez <eperezma@redhat.com>

net/vhost-vdpa.c will need functions that are declared in
vhost-shadow-virtqueue.c, which in turn needs functions from virtio-net.c.

Copy the vhost-vdpa-stub.c code so that
only the constructor net_init_vhost_vdpa needs to be defined.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/meson.build       |  3 ++-
 net/vhost-vdpa-stub.c | 21 +++++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 net/vhost-vdpa-stub.c

diff --git a/net/meson.build b/net/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/net/meson.build
+++ b/net/meson.build
@@ -XXX,XX +XXX,XX @@ endif
 softmmu_ss.add(when: 'CONFIG_POSIX', if_true: files(tap_posix))
 softmmu_ss.add(when: 'CONFIG_WIN32', if_true: files('tap-win32.c'))
 if have_vhost_net_vdpa
-  softmmu_ss.add(files('vhost-vdpa.c'))
+  softmmu_ss.add(when: 'CONFIG_VIRTIO_NET', if_true: files('vhost-vdpa.c'), if_false: files('vhost-vdpa-stub.c'))
+  softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-vdpa-stub.c'))
 endif
 
 vmnet_files = files(
diff --git a/net/vhost-vdpa-stub.c b/net/vhost-vdpa-stub.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/net/vhost-vdpa-stub.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * vhost-vdpa-stub.c
+ *
+ * Copyright (c) 2022 Red Hat, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "clients.h"
+#include "net/vhost-vdpa.h"
+#include "qapi/error.h"
+
+int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
+                        NetClientState *peer, Error **errp)
+{
+    error_setg(errp, "vhost-vdpa requires frontend driver virtio-net-*");
+    return -1;
+}
-- 
2.7.4

From: Eugenio Pérez <eperezma@redhat.com>

Do a simple forwarding of CVQ buffers, the same work SVQ could do but
through callbacks. No functional change intended.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-vdpa.c         |  3 ++-
 include/hw/virtio/vhost-vdpa.h |  3 +++
 net/vhost-vdpa.c               | 58 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
     for (unsigned n = 0; n < hdev->nvqs; ++n) {
         g_autoptr(VhostShadowVirtqueue) svq;
 
-        svq = vhost_svq_new(v->iova_tree, NULL, NULL);
+        svq = vhost_svq_new(v->iova_tree, v->shadow_vq_ops,
+                            v->shadow_vq_ops_opaque);
         if (unlikely(!svq)) {
             error_setg(errp, "Cannot create svq %u", n);
             return -1;
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -XXX,XX +XXX,XX @@
 #include <gmodule.h>
 
 #include "hw/virtio/vhost-iova-tree.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
 
@@ -XXX,XX +XXX,XX @@ typedef struct vhost_vdpa {
     /* IOVA mapping used by the Shadow Virtqueue */
     VhostIOVATree *iova_tree;
     GPtrArray *shadow_vqs;
+    const VhostShadowVirtqueueOps *shadow_vq_ops;
+    void *shadow_vq_ops_opaque;
     struct vhost_dev *dev;
     VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@
 
 #include "qemu/osdep.h"
 #include "clients.h"
+#include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "net/vhost-vdpa.h"
 #include "hw/virtio/vhost-vdpa.h"
 #include "qemu/config-file.h"
 #include "qemu/error-report.h"
+#include "qemu/log.h"
+#include "qemu/memalign.h"
 #include "qemu/option.h"
 #include "qapi/error.h"
 #include <linux/vhost.h>
@@ -XXX,XX +XXX,XX @@ static NetClientInfo net_vhost_vdpa_info = {
         .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+/**
+ * Forward buffer for the moment.
+ */
+static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
+                                            VirtQueueElement *elem,
+                                            void *opaque)
+{
+    unsigned int n = elem->out_num + elem->in_num;
+    g_autofree struct iovec *dev_buffers = g_new(struct iovec, n);
+    size_t in_len, dev_written;
+    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
+    int r;
+
+    memcpy(dev_buffers, elem->out_sg, elem->out_num);
+    memcpy(dev_buffers + elem->out_num, elem->in_sg, elem->in_num);
+
+    r = vhost_svq_add(svq, &dev_buffers[0], elem->out_num, &dev_buffers[1],
+                      elem->in_num, elem);
+    if (unlikely(r != 0)) {
+        if (unlikely(r == -ENOSPC)) {
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
+                          __func__);
+        }
+        goto out;
+    }
+
+    /*
+     * We can poll here since we've had BQL from the time we sent the
+     * descriptor. Also, we need to take the answer before SVQ pulls by itself,
+     * when BQL is released
+     */
+    dev_written = vhost_svq_poll(svq);
+    if (unlikely(dev_written < sizeof(status))) {
+        error_report("Insufficient written data (%zu)", dev_written);
+    }
+
+out:
+    in_len = iov_from_buf(elem->in_sg, elem->in_num, 0, &status,
+                          sizeof(status));
+    if (unlikely(in_len < sizeof(status))) {
+        error_report("Bad device CVQ written length");
+    }
+    vhost_svq_push_elem(svq, elem, MIN(in_len, sizeof(status)));
+    g_free(elem);
+    return r;
+}
+
+static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
+    .avail_handler = vhost_vdpa_net_handle_ctrl_avail,
+};
+
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                            const char *device,
                                            const char *name,
@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
+    if (!is_datapath) {
+        s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
+        s->vhost_vdpa.shadow_vq_ops_opaque = s;
+    }
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
         qemu_del_net_client(nc);
-- 
2.7.4

From: Eugenio Pérez <eperezma@redhat.com>

Introduce the control virtqueue support for vDPA shadow virtqueue. This
is needed for advanced networking features like rx filtering.

Virtio-net control VQ copies the descriptors to qemu's VA, so we avoid
TOCTOU with the guest's or device's memory every time there is a device
model change.  Otherwise, the guest could change the memory content
between the time qemu reads it and the time the device reads it.

To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR is
implemented.  If the virtio-net driver changes the MAC, the virtio-net
device model will be updated with the new one, and an rx filtering change
event will be raised.

More cvq commands could be added here straightforwardly but they have
not been tested.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/vhost-vdpa.c | 213 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 205 insertions(+), 8 deletions(-)
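
The copy-then-validate idea behind the TOCTOU argument can be sketched outside QEMU as follows. This is an illustration under stated assumptions, not the patch's implementation: `CtrlHdr`, `CLASS_MAC`, `CMD_MAC_ADDR_SET`, `SHADOW_LEN`, `shadow_copy` and `shadow_cmd_is_supported` are hypothetical names and values chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical command header layout and values, for illustration only. */
typedef struct CtrlHdr {
    uint8_t class_id;
    uint8_t cmd;
} CtrlHdr;

enum { CLASS_MAC = 1, CMD_MAC_ADDR_SET = 1 };
enum { SHADOW_LEN = 64 };

/* Copy at most SHADOW_LEN bytes of guest data into a private buffer. */
static size_t shadow_copy(uint8_t *shadow, const uint8_t *guest,
                          size_t guest_len)
{
    size_t n = guest_len < SHADOW_LEN ? guest_len : SHADOW_LEN;

    memcpy(shadow, guest, n);
    return n;
}

/*
 * Validate the private copy only. Whatever the guest writes to its own
 * memory afterwards cannot change what was validated, and the same copy
 * is what gets handed to the device.
 */
static bool shadow_cmd_is_supported(const uint8_t *shadow, size_t len)
{
    CtrlHdr hdr;

    if (len < sizeof(hdr)) {
        return false;
    }
    memcpy(&hdr, shadow, sizeof(hdr));
    return hdr.class_id == CLASS_MAC && hdr.cmd == CMD_MAC_ADDR_SET;
}
```

The key design point is that both the check and the later use read the same private bytes, so there is no window in which the guest can swap the command.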

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@ typedef struct VhostVDPAState {
     NetClientState nc;
     struct vhost_vdpa vhost_vdpa;
     VHostNetState *vhost_net;
+
+    /* Control commands shadow buffers */
+    void *cvq_cmd_out_buffer, *cvq_cmd_in_buffer;
     bool started;
 } VhostVDPAState;
 
@@ -XXX,XX +XXX,XX @@ static void vhost_vdpa_cleanup(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
 
+    qemu_vfree(s->cvq_cmd_out_buffer);
+    qemu_vfree(s->cvq_cmd_in_buffer);
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
         g_free(s->vhost_net);
@@ -XXX,XX +XXX,XX @@ static NetClientInfo net_vhost_vdpa_info = {
         .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+static void vhost_vdpa_cvq_unmap_buf(struct vhost_vdpa *v, void *addr)
+{
+    VhostIOVATree *tree = v->iova_tree;
+    DMAMap needle = {
+        /*
+         * No need to specify size or to look for more translations since
+         * this contiguous chunk was allocated by us.
+         */
+        .translated_addr = (hwaddr)(uintptr_t)addr,
+    };
+    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);
+    int r;
+
+    if (unlikely(!map)) {
+        error_report("Cannot locate expected map");
+        return;
+    }
+
+    r = vhost_vdpa_dma_unmap(v, map->iova, map->size + 1);
+    if (unlikely(r != 0)) {
+        error_report("Device cannot unmap: %s(%d)", g_strerror(r), r);
+    }
+
+    vhost_iova_tree_remove(tree, map);
+}
+
+static size_t vhost_vdpa_net_cvq_cmd_len(void)
+{
+    /*
+     * MAC_TABLE_SET is the ctrl command that produces the longest out buffer.
+     * The in buffer is always 1 byte, so it fits here.
+     */
+    return sizeof(struct virtio_net_ctrl_hdr) +
+           2 * sizeof(struct virtio_net_ctrl_mac) +
+           MAC_TABLE_ENTRIES * ETH_ALEN;
+}
+
+static size_t vhost_vdpa_net_cvq_cmd_page_len(void)
+{
+    return ROUND_UP(vhost_vdpa_net_cvq_cmd_len(), qemu_real_host_page_size());
+}
+
+/** Copy and map a guest buffer. */
+static bool vhost_vdpa_cvq_map_buf(struct vhost_vdpa *v,
+                                   const struct iovec *out_data,
+                                   size_t out_num, size_t data_len, void *buf,
+                                   size_t *written, bool write)
+{
+    DMAMap map = {};
+    int r;
+
+    if (unlikely(!data_len)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid length of %s buffer\n",
+                      __func__, write ? "in" : "out");
+        return false;
+    }
+
+    *written = iov_to_buf(out_data, out_num, 0, buf, data_len);
+    map.translated_addr = (hwaddr)(uintptr_t)buf;
+    map.size = vhost_vdpa_net_cvq_cmd_page_len() - 1;
+    map.perm = write ? IOMMU_RW : IOMMU_RO;
+    r = vhost_iova_tree_map_alloc(v->iova_tree, &map);
+    if (unlikely(r != IOVA_OK)) {
+        error_report("Cannot map injected element");
+        return false;
+    }
+
+    r = vhost_vdpa_dma_map(v, map.iova, vhost_vdpa_net_cvq_cmd_page_len(), buf,
+                           !write);
+    if (unlikely(r < 0)) {
+        goto dma_map_err;
+    }
+
+    return true;
+
+dma_map_err:
+    vhost_iova_tree_remove(v->iova_tree, &map);
+    return false;
+}
+
 /**
- * Forward buffer for the moment.
+ * Copy the guest element into a dedicated buffer suitable to be sent to NIC
+ *
+ * @iov: [0] is the out buffer, [1] is the in one
+ */
+static bool vhost_vdpa_net_cvq_map_elem(VhostVDPAState *s,
+                                        VirtQueueElement *elem,
+                                        struct iovec *iov)
+{
+    size_t in_copied;
+    bool ok;
+
+    iov[0].iov_base = s->cvq_cmd_out_buffer;
+    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, elem->out_sg, elem->out_num,
+                                vhost_vdpa_net_cvq_cmd_len(), iov[0].iov_base,
+                                &iov[0].iov_len, false);
+    if (unlikely(!ok)) {
+        return false;
+    }
+
+    iov[1].iov_base = s->cvq_cmd_in_buffer;
+    ok = vhost_vdpa_cvq_map_buf(&s->vhost_vdpa, NULL, 0,
+                                sizeof(virtio_net_ctrl_ack), iov[1].iov_base,
+                                &in_copied, true);
+    if (unlikely(!ok)) {
+        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, s->cvq_cmd_out_buffer);
+        return false;
+    }
+
+    iov[1].iov_len = sizeof(virtio_net_ctrl_ack);
+    return true;
+}
+
+/**
+ * Do not forward commands not supported by SVQ. Otherwise, the device could
+ * accept them and qemu would not know how to update the device model.
+ */
+static bool vhost_vdpa_net_cvq_validate_cmd(const struct iovec *out,
+                                            size_t out_num)
+{
+    struct virtio_net_ctrl_hdr ctrl;
+    size_t n;
+
+    n = iov_to_buf(out, out_num, 0, &ctrl, sizeof(ctrl));
+    if (unlikely(n < sizeof(ctrl))) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s: invalid length of out buffer %zu\n", __func__, n);
+        return false;
+    }
+
+    switch (ctrl.class) {
+    case VIRTIO_NET_CTRL_MAC:
+        switch (ctrl.cmd) {
+        case VIRTIO_NET_CTRL_MAC_ADDR_SET:
+            return true;
+        default:
+            qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid mac cmd %u\n",
+                          __func__, ctrl.cmd);
+        };
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid control class %u\n",
+                      __func__, ctrl.class);
+    };
+
+    return false;
+}
+
+/**
+ * Validate and copy control virtqueue commands.
+ *
+ * Following QEMU guidelines, we offer a copy of the buffers to the device to
+ * prevent TOCTOU bugs.
  */
 static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
                                             VirtQueueElement *elem,
                                             void *opaque)
 {
-    unsigned int n = elem->out_num + elem->in_num;
-    g_autofree struct iovec *dev_buffers = g_new(struct iovec, n);
+    VhostVDPAState *s = opaque;
     size_t in_len, dev_written;
     virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
-    int r;
+    /* out and in buffers sent to the device */
+    struct iovec dev_buffers[2] = {
+        { .iov_base = s->cvq_cmd_out_buffer },
+        { .iov_base = s->cvq_cmd_in_buffer },
+    };
+    /* in buffer used for device model */
+    const struct iovec in = {
+        .iov_base = &status,
+        .iov_len = sizeof(status),
+    };
+    int r = -EINVAL;
+    bool ok;
+
+    ok = vhost_vdpa_net_cvq_map_elem(s, elem, dev_buffers);
+    if (unlikely(!ok)) {
+        goto out;
+    }
 
-    memcpy(dev_buffers, elem->out_sg, elem->out_num);
-    memcpy(dev_buffers + elem->out_num, elem->in_sg, elem->in_num);
+    ok = vhost_vdpa_net_cvq_validate_cmd(&dev_buffers[0], 1);
+    if (unlikely(!ok)) {
+        goto out;
+    }
 
-    r = vhost_svq_add(svq, &dev_buffers[0], elem->out_num, &dev_buffers[1],
-                      elem->in_num, elem);
+    r = vhost_svq_add(svq, &dev_buffers[0], 1, &dev_buffers[1], 1, elem);
     if (unlikely(r != 0)) {
         if (unlikely(r == -ENOSPC)) {
             qemu_log_mask(LOG_GUEST_ERROR, "%s: No space on device queue\n",
@@ -XXX,XX +XXX,XX @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
     dev_written = vhost_svq_poll(svq);
     if (unlikely(dev_written < sizeof(status))) {
         error_report("Insufficient written data (%zu)", dev_written);
+        goto out;
+    }
+
+    memcpy(&status, dev_buffers[1].iov_base, sizeof(status));
+    if (status != VIRTIO_NET_OK) {
+        goto out;
+    }
+
+    status = VIRTIO_NET_ERR;
+    virtio_net_handle_ctrl_iov(svq->vdev, &in, 1, dev_buffers, 1);
+    if (status != VIRTIO_NET_OK) {
+        error_report("Bad CVQ processing in model");
     }
 
 out:
@@ -XXX,XX +XXX,XX @@ out:
     }
     vhost_svq_push_elem(svq, elem, MIN(in_len, sizeof(status)));
     g_free(elem);
+    if (dev_buffers[0].iov_base) {
+        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, dev_buffers[0].iov_base);
+    }
+    if (dev_buffers[1].iov_base) {
+        vhost_vdpa_cvq_unmap_buf(&s->vhost_vdpa, dev_buffers[1].iov_base);
+    }
     return r;
 }
 
@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
     if (!is_datapath) {
+        s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
+                                            vhost_vdpa_net_cvq_cmd_page_len());
+        memset(s->cvq_cmd_out_buffer, 0, vhost_vdpa_net_cvq_cmd_page_len());
+        s->cvq_cmd_in_buffer = qemu_memalign(qemu_real_host_page_size(),
+                                            vhost_vdpa_net_cvq_cmd_page_len());
+        memset(s->cvq_cmd_in_buffer, 0, vhost_vdpa_net_cvq_cmd_page_len());
+
         s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
         s->vhost_vdpa.shadow_vq_ops_opaque = s;
     }
-- 
2.7.4

From: Eugenio Pérez <eperezma@redhat.com>

CVQ SVQ needs to know the device features, so that SVQ can tell whether it
can handle all commands or not. Extract the feature query from
vhost_vdpa_get_max_queue_pairs so we can reuse it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/vhost-vdpa.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     return nc;
 }
 
-static int vhost_vdpa_get_max_queue_pairs(int fd, int *has_cvq, Error **errp)
+static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
+{
+    int ret = ioctl(fd, VHOST_GET_FEATURES, features);
+    if (unlikely(ret < 0)) {
+        error_setg_errno(errp, errno,
+                         "Fail to query features from vhost-vDPA device");
+    }
+    return ret;
+}
+
+static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
+                                          int *has_cvq, Error **errp)
 {
     unsigned long config_size = offsetof(struct vhost_vdpa_config, buf);
     g_autofree struct vhost_vdpa_config *config = NULL;
     __virtio16 *max_queue_pairs;
-    uint64_t features;
     int ret;
 
-    ret = ioctl(fd, VHOST_GET_FEATURES, &features);
-    if (ret) {
-        error_setg(errp, "Fail to query features from vhost-vDPA device");
-        return ret;
-    }
-
     if (features & (1 << VIRTIO_NET_F_CTRL_VQ)) {
         *has_cvq = 1;
     } else {
@@ -XXX,XX +XXX,XX @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
                         NetClientState *peer, Error **errp)
 {
     const NetdevVhostVDPAOptions *opts;
+    uint64_t features;
     int vdpa_device_fd;
     g_autofree NetClientState **ncs = NULL;
     NetClientState *nc;
-    int queue_pairs, i, has_cvq = 0;
+    int queue_pairs, r, i, has_cvq = 0;
 
     assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
     opts = &netdev->u.vhost_vdpa;
@@ -XXX,XX +XXX,XX @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         return -errno;
     }
 
-    queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd,
+    r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp);
+    if (unlikely(r < 0)) {
+        return r;
+    }
+
+    queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, features,
                                                  &has_cvq, errp);
     if (queue_pairs < 0) {
         qemu_close(vdpa_device_fd);
-- 
2.7.4

From: Eugenio Pérez <eperezma@redhat.com>

Since the vhost-vdpa device is exposing _F_LOG, add a migration blocker if
it uses CVQ.

However, qemu is able to migrate simple devices with no CVQ as long as
they use SVQ. To allow it, add a placeholder error to vhost_vdpa, and
only add it to vhost_dev when used. The vhost_dev machinery places the
migration blocker if needed.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/vhost-vdpa.c         | 15 +++++++++++++++
 include/hw/virtio/vhost-vdpa.h |  1 +
 2 files changed, 16 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 #include "hw/virtio/vhost-vdpa.h"
 #include "exec/address-spaces.h"
+#include "migration/blocker.h"
 #include "qemu/cutils.h"
 #include "qemu/main-loop.h"
 #include "cpu.h"
@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_svqs_start(struct vhost_dev *dev)
         return true;
     }
 
+    if (v->migration_blocker) {
+        int r = migrate_add_blocker(v->migration_blocker, &err);
+        if (unlikely(r < 0)) {
+            return false;
+        }
+    }
+
     for (i = 0; i < v->shadow_vqs->len; ++i) {
         VirtQueue *vq = virtio_get_queue(dev->vdev, dev->vq_index + i);
         VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs, i);
@@ -XXX,XX +XXX,XX @@ err:
         vhost_svq_stop(svq);
     }
 
+    if (v->migration_blocker) {
+        migrate_del_blocker(v->migration_blocker);
+    }
+
     return false;
 }
 
@@ -XXX,XX +XXX,XX @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
         }
     }
 
+    if (v->migration_blocker) {
+        migrate_del_blocker(v->migration_blocker);
+    }
     return true;
 }
 
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -XXX,XX +XXX,XX @@ typedef struct vhost_vdpa {
     bool shadow_vqs_enabled;
     /* IOVA mapping used by the Shadow Virtqueue */
     VhostIOVATree *iova_tree;
+    Error *migration_blocker;
     GPtrArray *shadow_vqs;
     const VhostShadowVirtqueueOps *shadow_vq_ops;
     void *shadow_vq_ops_opaque;
-- 
2.7.4

From: Eugenio Pérez <eperezma@redhat.com>

Finally, offer the possibility of enabling SVQ from the command line.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/vhost-vdpa.c | 72 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
 qapi/net.json    |  9 ++++++-
 2 files changed, 77 insertions(+), 4 deletions(-)
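
The x-svq feature check in this patch rejects any device feature that is neither in the SVQ allow-list nor in the transport feature range. A minimal standalone sketch of that mask arithmetic follows. The bit positions in the enum are assumptions made for the example (check the virtio headers before relying on them), and `make_mask` and `invalid_features` are hypothetical local helpers, with `make_mask` standing in for QEMU's MAKE_64BIT_MASK.

```c
#include <assert.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Mask of `length` bits starting at bit `shift` (0 < length <= 64). */
static uint64_t make_mask(unsigned shift, unsigned length)
{
    return (~0ULL >> (64 - length)) << shift;
}

/* Assumed bit positions, for illustration only. */
enum {
    F_CSUM = 0,
    F_MAC = 5,
    F_CTRL_VQ = 17,
    F_MQ = 22,
    TRANSPORT_F_START = 28,
    TRANSPORT_F_END = 38,
};

/*
 * Return the device features that are neither explicitly supported nor
 * transport features. A non-zero result means the configuration must be
 * rejected.
 */
static uint64_t invalid_features(uint64_t dev, uint64_t supported)
{
    uint64_t transport = make_mask(TRANSPORT_F_START,
                                   TRANSPORT_F_END - TRANSPORT_F_START);

    return dev & ~supported & ~transport;
}
```

In the sketch, a device offering `F_MQ` against an allow-list without it yields a non-zero result, which corresponds to the error path taken before the IOVA tree is created.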

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@ const int vdpa_feature_bits[] = {
     VHOST_INVALID_FEATURE_BIT
 };
 
+/** Supported device specific feature bits with SVQ */
+static const uint64_t vdpa_svq_device_features =
+    BIT_ULL(VIRTIO_NET_F_CSUM) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) |
+    BIT_ULL(VIRTIO_NET_F_MTU) |
+    BIT_ULL(VIRTIO_NET_F_MAC) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |
+    BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
+    BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |
+    BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |
+    BIT_ULL(VIRTIO_NET_F_HOST_ECN) |
+    BIT_ULL(VIRTIO_NET_F_HOST_UFO) |
+    BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |
+    BIT_ULL(VIRTIO_NET_F_STATUS) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
+    BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
+    BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
+    BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
+    BIT_ULL(VIRTIO_NET_F_STANDBY);
+
 VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -XXX,XX +XXX,XX @@ err_init:
 static void vhost_vdpa_cleanup(NetClientState *nc)
 {
     VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
+    struct vhost_dev *dev = &s->vhost_net->dev;
 
     qemu_vfree(s->cvq_cmd_out_buffer);
     qemu_vfree(s->cvq_cmd_in_buffer);
+    if (dev->vq_index + dev->nvqs == dev->vq_index_end) {
+        g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_delete);
+    }
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
         g_free(s->vhost_net);
@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
                                            int vdpa_device_fd,
                                            int queue_pair_index,
                                            int nvqs,
-                                           bool is_datapath)
+                                           bool is_datapath,
+                                           bool svq,
+                                           VhostIOVATree *iova_tree)
 {
     NetClientState *nc = NULL;
     VhostVDPAState *s;
@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
     s->vhost_vdpa.device_fd = vdpa_device_fd;
     s->vhost_vdpa.index = queue_pair_index;
+    s->vhost_vdpa.shadow_vqs_enabled = svq;
+    s->vhost_vdpa.iova_tree = iova_tree;
     if (!is_datapath) {
         s->cvq_cmd_out_buffer = qemu_memalign(qemu_real_host_page_size(),
                                             vhost_vdpa_net_cvq_cmd_page_len());
@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
         s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
         s->vhost_vdpa.shadow_vq_ops_opaque = s;
+        error_setg(&s->vhost_vdpa.migration_blocker,
+                   "Migration disabled: vhost-vdpa uses CVQ.");
     }
     ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
     if (ret) {
@@ -XXX,XX +XXX,XX @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
     return nc;
 }
 
+static int vhost_vdpa_get_iova_range(int fd,
+                                     struct vhost_vdpa_iova_range *iova_range)
+{
+    int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
+
+    return ret < 0 ? -errno : 0;
+}
+
 static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
 {
     int ret = ioctl(fd, VHOST_GET_FEATURES, features);
@@ -XXX,XX +XXX,XX @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
     uint64_t features;
     int vdpa_device_fd;
     g_autofree NetClientState **ncs = NULL;
+    g_autoptr(VhostIOVATree) iova_tree = NULL;
     NetClientState *nc;
     int queue_pairs, r, i, has_cvq = 0;
 
@@ -XXX,XX +XXX,XX @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
         return queue_pairs;
     }
 
+    if (opts->x_svq) {
+        struct vhost_vdpa_iova_range iova_range;
+
+        uint64_t invalid_dev_features =
+            features & ~vdpa_svq_device_features &
+            /* Transport features are all accepted at this point */
+            ~MAKE_64BIT_MASK(VIRTIO_TRANSPORT_F_START,
+                             VIRTIO_TRANSPORT_F_END - VIRTIO_TRANSPORT_F_START);
+
+        if (invalid_dev_features) {
+            error_setg(errp, "vdpa svq does not work with features 0x%" PRIx64,
+                       invalid_dev_features);
+            goto err_svq;
+        }
+
+        vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
+        iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
+    }
+
     ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
 
     for (i = 0; i < queue_pairs; i++) {
         ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                     vdpa_device_fd, i, 2, true);
+                                     vdpa_device_fd, i, 2, true, opts->x_svq,
+                                     iova_tree);
         if (!ncs[i])
             goto err;
     }
 
     if (has_cvq) {
         nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
-                                 vdpa_device_fd, i, 1, false);
+                                 vdpa_device_fd, i, 1, false,
+                                 opts->x_svq, iova_tree);
         if (!nc)
             goto err;
     }
 
+    /* iova_tree ownership belongs to last NetClientState */
+    g_steal_pointer(&iova_tree);
     return 0;
 
 err:
@@ -XXX,XX +XXX,XX @@ err:
             qemu_del_net_client(ncs[i]);
         }
     }
+
+err_svq:
     qemu_close(vdpa_device_fd);
 
     return -1;
diff --git a/qapi/net.json b/qapi/net.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -XXX,XX +XXX,XX @@
 # @queues: number of queues to be created for multiqueue vhost-vdpa
 #          (default: 1)
 #
+# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
+#         (default: false)
+#
+# Features:
+# @unstable: Member @x-svq is experimental.
+#
 # Since: 5.1
 ##
 { 'struct': 'NetdevVhostVDPAOptions',
   'data': {
     '*vhostdev':     'str',
-    '*queues':       'int' } }
+    '*queues':       'int',
+    '*x-svq':        {'type': 'bool', 'features' : [ 'unstable'] } } }
 
 ##
 # @NetdevVmnetHostOptions:
-- 
2.7.4

From: Zhang Chen <chen.zhang@intel.com>

If a checkpoint occurs after the guest has finished restarting but
before it has started running, runstate_set() may reject the
transition from COLO to PRELAUNCH and abort with this log:

{"timestamp": {"seconds": 1593484591, "microseconds": 26605},\
"event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
qemu-system-x86_64: invalid runstate transition: 'colo' -> 'prelaunch'

Allow the COLO -> PRELAUNCH transition. Long-term testing indicates
that this is safe.
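
For illustration only, a minimal sketch of how a table of allowed
transitions can be checked. The Demo* names and the reduced state set
are hypothetical, and the linear scan is chosen for brevity; this patch
does not show how runstate_set() actually performs its lookup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of run states (not the real enum). */
typedef enum {
    DEMO_STATE_PRELAUNCH,
    DEMO_STATE_RUNNING,
    DEMO_STATE_SHUTDOWN,
    DEMO_STATE_COLO,
} DemoState;

typedef struct {
    DemoState from;
    DemoState to;
} DemoTransition;

/* Allowed edges out of COLO, including the COLO -> PRELAUNCH edge. */
static const DemoTransition demo_transitions[] = {
    { DEMO_STATE_COLO, DEMO_STATE_RUNNING },
    { DEMO_STATE_COLO, DEMO_STATE_PRELAUNCH },
    { DEMO_STATE_COLO, DEMO_STATE_SHUTDOWN },
};

/* Return true only if (from, to) is listed in the table. */
static bool demo_transition_allowed(DemoState from, DemoState to)
{
    size_t n = sizeof(demo_transitions) / sizeof(demo_transitions[0]);

    for (size_t i = 0; i < n; i++) {
        if (demo_transitions[i].from == from &&
            demo_transitions[i].to == to) {
            return true;
        }
    }
    return false;
}
```

Any pair missing from the table is rejected, which is why a transition
that can legitimately occur has to be listed explicitly.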

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 softmmu/runstate.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index XXXXXXX..XXXXXXX 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -XXX,XX +XXX,XX @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_RESTORE_VM, RUN_STATE_PRELAUNCH },
 
     { RUN_STATE_COLO, RUN_STATE_RUNNING },
+    { RUN_STATE_COLO, RUN_STATE_PRELAUNCH },
     { RUN_STATE_COLO, RUN_STATE_SHUTDOWN},
 
     { RUN_STATE_RUNNING, RUN_STATE_DEBUG },
-- 
2.7.4

From: Zhang Chen <chen.zhang@intel.com>

We noticed that QEMU may crash when the guest has too many
incoming network connections, with the following log:

15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
free(): invalid pointer
[1]    15195 abort (core dumped)  qemu-system-x86_64 ....

This is because we create the s->connection_track_table with
g_hash_table_new_full() which is defined as:

GHashTable * g_hash_table_new_full (GHashFunc hash_func,
                       GEqualFunc key_equal_func,
                       GDestroyNotify key_destroy_func,
                       GDestroyNotify value_destroy_func);

The fourth parameter, connection_destroy(), is called to free the
memory allocated for every 'Connection' value in the hash table when
connection_hashtable_reset() calls g_hash_table_remove_all().

But connection_track_table and conn_list both reference the same
conn instance, so clearing conn_list frees it a second time. Remove
the free action on the hash table side to avoid the double free of
the conn.
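
For illustration only, a minimal sketch of the ownership rule in plain
C, without GLib. All Demo* names are hypothetical stand-ins: the table
plays the role of connection_track_table, the list plays the role of
conn_list, and a NULL value_destroy mirrors passing NULL as the fourth
argument of g_hash_table_new_full().

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct DemoConn {
    int id;
} DemoConn;

static int demo_free_count;   /* counts how often a conn is freed */

static void demo_conn_destroy(DemoConn *c)
{
    demo_free_count++;
    free(c);
}

/* Table that only frees its values if value_destroy is set. */
typedef struct {
    DemoConn *slots[4];
    size_t n;
    void (*value_destroy)(DemoConn *);
} DemoTable;

/* List that always owns, and therefore frees, its items. */
typedef struct {
    DemoConn *items[4];
    size_t n;
} DemoList;

static void demo_table_remove_all(DemoTable *t)
{
    for (size_t i = 0; i < t->n; i++) {
        if (t->value_destroy) {
            t->value_destroy(t->slots[i]);
        }
        t->slots[i] = NULL;
    }
    t->n = 0;
}

static void demo_list_clear(DemoList *l)
{
    for (size_t i = 0; i < l->n; i++) {
        demo_conn_destroy(l->items[i]);
        l->items[i] = NULL;
    }
    l->n = 0;
}

/* Share one conn between both containers; return the number of frees. */
static int demo_reset_with_single_owner(void)
{
    DemoTable t = { .n = 0, .value_destroy = NULL }; /* table does not own */
    DemoList l = { .n = 0 };
    DemoConn *c = malloc(sizeof(*c));

    if (!c) {
        return -1;
    }
    c->id = 1;
    demo_free_count = 0;
    t.slots[t.n++] = c;
    l.items[l.n++] = c;

    demo_table_remove_all(&t);  /* drops the reference only */
    demo_list_clear(&l);        /* the single owner frees the conn */
    return demo_free_count;
}
```

With value_destroy set to demo_conn_destroy, the same reset sequence
would free the shared conn twice.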

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo-compare.c    | 2 +-
 net/filter-rewriter.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -XXX,XX +XXX,XX @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
     s->connection_track_table = g_hash_table_new_full(connection_key_hash,
                                                       connection_key_equal,
                                                       g_free,
-                                                      connection_destroy);
+                                                      NULL);
 
     colo_compare_iothread(s);
 
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index XXXXXXX..XXXXXXX 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -XXX,XX +XXX,XX @@ static void colo_rewriter_setup(NetFilterState *nf, Error **errp)
     s->connection_track_table = g_hash_table_new_full(connection_key_hash,
                                                       connection_key_equal,
                                                       g_free,
-                                                      connection_destroy);
+                                                      NULL);
     s->incoming_queue = qemu_new_net_queue(qemu_netfilter_pass_to_next, nf);
 }
 
-- 
2.7.4

From: Zhang Chen <chen.zhang@intel.com>

When COLO sets the vnet_hdr_support parameter on only one of
filter-redirector and filter-mirror (or colo-compare), COLO crashes
with a segmentation fault. The backtrace is as follows:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
    at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
(gdb) bt
0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
    at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
net/colo.c:49
2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
net/filter-rewriter.c:63

So a wrong vnet_hdr_len causes pkt->data to become NULL. Add a check
to raise an error and add a trace event to track vnet_hdr_len.
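
For illustration only, a minimal sketch of guarding a packet parser
before it reads the Ethernet header. The Demo* names are hypothetical.
Unlike the patch, this sketch tests pkt->data itself and performs the
size check before the first dereference; both choices are assumptions
made to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_ETH_HLEN 14   /* dst MAC (6) + src MAC (6) + ethertype (2) */

typedef struct {
    uint8_t *data;
    size_t size;
    uint32_t vnet_hdr_len;
} DemoPacket;

/*
 * Return 0 and store the ethertype in *proto on success.
 * Return 1 if the packet cannot be parsed safely.
 */
static int demo_parse_early(const DemoPacket *pkt, uint16_t *proto)
{
    const uint8_t *l2;

    if (pkt->data == NULL) {
        return 1;   /* nothing to parse: do not dereference */
    }
    if (pkt->size < DEMO_ETH_HLEN + pkt->vnet_hdr_len) {
        return 1;   /* too short to hold vnet header + Ethernet header */
    }

    l2 = pkt->data + pkt->vnet_hdr_len;
    /* The ethertype is stored big-endian at offset 12. */
    *proto = (uint16_t)((l2[12] << 8) | l2[13]);
    return 0;
}
```

Returning an error code lets the caller drop or pass the packet instead
of crashing inside the header accessor.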

Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo.c       | 9 ++++++++-
 net/trace-events | 1 +
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/colo.c b/net/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -XXX,XX +XXX,XX @@ int parse_packet_early(Packet *pkt)
     static const uint8_t vlan[] = {0x81, 0x00};
     uint8_t *data = pkt->data + pkt->vnet_hdr_len;
     uint16_t l3_proto;
-    ssize_t l2hdr_len = eth_get_l2_hdr_length(data);
+    ssize_t l2hdr_len;
+
+    if (data == NULL) {
+        trace_colo_proxy_main_vnet_info("This packet is not parsed correctly, "
+                                        "pkt->vnet_hdr_len", pkt->vnet_hdr_len);
+        return 1;
+    }
+    l2hdr_len = eth_get_l2_hdr_length(data);
 
     if (pkt->size < ETH_HLEN + pkt->vnet_hdr_len) {
         trace_colo_proxy_main("pkt->size < ETH_HLEN");
diff --git a/net/trace-events b/net/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -XXX,XX +XXX,XX @@ vhost_user_event(const char *chr, int event) "chr: %s got event: %d"
 
 # colo.c
 colo_proxy_main(const char *chr) ": %s"
+colo_proxy_main_vnet_info(const char *sta, int size) ": %s = %d"
 
 # colo-compare.c
 colo_compare_main(const char *chr) ": %s"
-- 
2.7.4