Series comparison

-[Qemu-devel] [PULL 00/25] Net patches
+[PULL 0/7] Net patches
-The following changes since commit c5e4e49258e9b89cb34c085a419dd9f862935c48:
+The following changes since commit 92f8c6fef13b31ba222c4d20ad8afd2b79c4c28e:
-  Merge remote-tracking branch 'remotes/xanclic/tags/pull-block-2018-09-25' into staging (2018-09-25 16:47:35 +0100)
+  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20210525' into staging (2021-05-25 16:17:06 +0100)
-are available in the Git repository at:
+are available in the git repository at:
   https://github.com/jasowang/qemu.git tags/net-pull-request
-for you to fetch changes up to f3df030edf90db184cd029697e976e24f1925e03:
+for you to fetch changes up to 90322e646e87c1440661cb3ddbc0cc94309d8a4f:
-  e1000: indicate dropped packets in HW counters (2018-09-26 11:06:10 +0800)
+  MAINTAINERS: Added eBPF maintainers information. (2021-06-04 15:25:46 +0800)
 ----------------------------------------------------------------
 ----------------------------------------------------------------
-Jason Wang (4):
+Andrew Melnychenko (7):
-      ne2000: fix possible out of bound access in ne2000_receive
+      net/tap: Added TUNSETSTEERINGEBPF code.
-      rtl8139: fix possible out of bound access
+      net: Added SetSteeringEBPF method for NetClientState.
-      pcnet: fix possible buffer overflow
+      ebpf: Added eBPF RSS program.
-      net: ignore packet size greater than INT_MAX
+      ebpf: Added eBPF RSS loader.
       virtio-net: Added eBPF RSS to virtio-net.
       docs: Added eBPF documentation.
       MAINTAINERS: Added eBPF maintainers information.
-Martin Wilck (1):
+ MAINTAINERS                    |   8 +
-      e1000: indicate dropped packets in HW counters
+ configure                      |   8 +-
+ docs/devel/ebpf_rss.rst        | 125 +++++++++
-Zhang Chen (15):
+ docs/devel/index.rst           |   1 +
-      filter-rewriter: Add TCP state machine and fix memory leak in connection_track_table
+ ebpf/ebpf_rss-stub.c           |  40 +++
-      colo-compare: implement the process of checkpoint
+ ebpf/ebpf_rss.c                | 165 ++++++++++++
-      colo-compare: use notifier to notify packets comparing result
+ ebpf/ebpf_rss.h                |  44 ++++
-      COLO: integrate colo compare with colo frame
+ ebpf/meson.build               |   1 +
-      COLO: Add block replication into colo process
+ ebpf/rss.bpf.skeleton.h        | 431 +++++++++++++++++++++++++++++++
-      COLO: Remove colo_state migration struct
+ ebpf/trace-events              |   4 +
-      COLO: Load dirty pages into SVM's RAM cache firstly
+ ebpf/trace.h                   |   1 +
-      ram/COLO: Record the dirty pages that SVM received
+ hw/net/vhost_net.c             |   3 +
-      COLO: Flush memory data from ram cache
+ hw/net/virtio-net.c            | 116 ++++++++-
-      qapi/migration.json: Rename COLO unknown mode to none mode.
+ include/hw/virtio/virtio-net.h |   4 +
-      qapi: Add new command to query colo status
+ include/net/net.h              |   2 +
-      savevm: split the process of different stages for loadvm/savevm
+ meson.build                    |  23 ++
-      filter: Add handle_event method for NetFilterClass
+ meson_options.txt              |   2 +
-      filter-rewriter: handle checkpoint and failover event
+ net/tap-bsd.c                  |   5 +
-      docs: Add COLO status diagram to COLO-FT.txt
+ net/tap-linux.c                |  13 +
+ net/tap-linux.h                |   1 +
-liujunjie (1):
+ net/tap-solaris.c              |   5 +
-      clean up callback when del virtqueue
+ net/tap-stub.c                 |   5 +
+ net/tap.c                      |   9 +
-zhanghailiang (4):
+ net/tap_int.h                  |   1 +
-      qmp event: Add COLO_EXIT event to notify users while exited COLO
+ net/vhost-vdpa.c               |   2 +
-      COLO: flush host dirty ram from cache
+ tools/ebpf/Makefile.ebpf       |  21 ++
-      COLO: notify net filters about checkpoint/failover event
+ tools/ebpf/rss.bpf.c           | 571 +++++++++++++++++++++++++++++++++++++++++
-      COLO: quick failover process by kick COLO thread
+files changed, 1607 insertions(+), 4 deletions(-)
+ create mode 100644 docs/devel/ebpf_rss.rst
- docs/COLO-FT.txt          |  34 ++++++++
+ create mode 100644 ebpf/ebpf_rss-stub.c
- hw/net/e1000.c            |  16 +++-
+ create mode 100644 ebpf/ebpf_rss.c
- hw/net/ne2000.c           |   4 +-
+ create mode 100644 ebpf/ebpf_rss.h
- hw/net/pcnet.c            |   4 +-
+ create mode 100644 ebpf/meson.build
- hw/net/rtl8139.c          |   8 +-
+ create mode 100644 ebpf/rss.bpf.skeleton.h
- hw/net/trace-events       |   3 +
+ create mode 100644 ebpf/trace-events
- hw/virtio/virtio.c        |   2 +
+ create mode 100644 ebpf/trace.h
- include/exec/ram_addr.h   |   1 +
+ create mode 100755 tools/ebpf/Makefile.ebpf
- include/migration/colo.h  |  11 ++-
+ create mode 100644 tools/ebpf/rss.bpf.c
  include/net/filter.h      |   5 ++
  migration/Makefile.objs   |   2 +-
  migration/colo-comm.c     |  76 -----------------
  migration/colo-failover.c |   2 +-
  migration/colo.c          | 212 +++++++++++++++++++++++++++++++++++++++++++---
  migration/migration.c     |  46 ++++++++--
  migration/ram.c           | 166 +++++++++++++++++++++++++++++++++++-
  migration/ram.h           |   4 +
  migration/savevm.c        |  53 ++++++++++--
  migration/savevm.h        |   5 ++
  migration/trace-events    |   3 +
  net/colo-compare.c        | 115 ++++++++++++++++++++++---
  net/colo-compare.h        |  24 ++++++
  net/colo.c                |  10 ++-
  net/colo.h                |  11 +--
  net/filter-rewriter.c     | 166 +++++++++++++++++++++++++++++++++---
  net/filter.c              |  17 ++++
  net/net.c                 |  26 +++++-
  qapi/migration.json       |  80 +++++++++++++++--
  vl.c                      |   2 -
 files changed, 957 insertions(+), 151 deletions(-)
  delete mode 100644 migration/colo-comm.c
  create mode 100644 net/colo-compare.h

-[Qemu-devel] [PULL 01/25] filter-rewriter: Add TCP state machine and fix memory leak in connection_track_table
+Deleted patch
-From: Zhang Chen <zhangckid@gmail.com>
-We add almost full TCP state machine in filter-rewriter, except
-TCPS_LISTEN and some simplify in VM active close FIN states.
-The reason for this simplify job is because guest kernel will track
-the TCP status and wait 2MSL time too, if client resend the FIN packet,
-guest will resend the last ACK, so we needn't wait 2MSL time in filter-rewriter.
-After a net connection is closed, we didn't clear its related resources
-in connection_track_table, which will lead to memory leak.
-Let's track the state of net connection, if it is closed, its related
-resources will be cleared up.
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- net/colo.c            |   2 +-
- net/colo.h            |   9 ++--
- net/filter-rewriter.c | 109 ++++++++++++++++++++++++++++++++++++++----
-files changed, 104 insertions(+), 16 deletions(-)
-diff --git a/net/colo.c b/net/colo.c
-index XXXXXXX..XXXXXXX 100644
---- a/net/colo.c
-+++ b/net/colo.c
-@@ -XXX,XX +XXX,XX @@ Connection *connection_new(ConnectionKey *key)
-     conn->ip_proto = key->ip_proto;
-     conn->processing = false;
-     conn->offset = 0;
--    conn->syn_flag = 0;
-+    conn->tcp_state = TCPS_CLOSED;
-     conn->pack = 0;
-     conn->sack = 0;
-     g_queue_init(&conn->primary_list);
-diff --git a/net/colo.h b/net/colo.h
-index XXXXXXX..XXXXXXX 100644
---- a/net/colo.h
-+++ b/net/colo.h
-@@ -XXX,XX +XXX,XX @@
- #include "slirp/slirp.h"
- #include "qemu/jhash.h"
- #include "qemu/timer.h"
-+#include "slirp/tcp.h"
- #define HASHTABLE_MAX_SIZE 16384
-@@ -XXX,XX +XXX,XX @@ typedef struct Connection {
-     uint32_t sack;
-     /* offset = secondary_seq - primary_seq */
-     tcp_seq  offset;
--    /*
--     * we use this flag update offset func
--     * run once in independent tcp connection
--     */
--    int syn_flag;
-+
-+    int tcp_state; /* TCP FSM state */
-+    tcp_seq fin_ack_seq; /* the seq of 'fin=1,ack=1' */
- } Connection;
- uint32_t connection_key_hash(const void *opaque);
-diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
-index XXXXXXX..XXXXXXX 100644
---- a/net/filter-rewriter.c
-+++ b/net/filter-rewriter.c
-@@ -XXX,XX +XXX,XX @@ static int is_tcp_packet(Packet *pkt)
- }
- /* handle tcp packet from primary guest */
--static int handle_primary_tcp_pkt(NetFilterState *nf,
-+static int handle_primary_tcp_pkt(RewriterState *rf,
-                                   Connection *conn,
--                                  Packet *pkt)
-+                                  Packet *pkt, ConnectionKey *key)
- {
-     struct tcphdr *tcp_pkt;
-@@ -XXX,XX +XXX,XX @@ static int handle_primary_tcp_pkt(NetFilterState *nf,
-         trace_colo_filter_rewriter_conn_offset(conn->offset);
-     }
-+    if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN)) &&
-+        conn->tcp_state == TCPS_SYN_SENT) {
-+        conn->tcp_state = TCPS_ESTABLISHED;
-+    }
-+
-     if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) {
-         /*
-          * we use this flag update offset func
-          * run once in independent tcp connection
-          */
--        conn->syn_flag = 1;
-+        conn->tcp_state = TCPS_SYN_RECEIVED;
-     }
-     if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK)) {
--        if (conn->syn_flag) {
-+        if (conn->tcp_state == TCPS_SYN_RECEIVED) {
-             /*
-              * offset = secondary_seq - primary seq
-              * ack packet sent by guest from primary node,
-              * so we use th_ack - 1 get primary_seq
-              */
-             conn->offset -= (ntohl(tcp_pkt->th_ack) - 1);
--            conn->syn_flag = 0;
-+            conn->tcp_state = TCPS_ESTABLISHED;
-         }
-         if (conn->offset) {
-             /* handle packets to the secondary from the primary */
-@@ -XXX,XX +XXX,XX @@ static int handle_primary_tcp_pkt(NetFilterState *nf,
-             net_checksum_calculate((uint8_t *)pkt->data + pkt->vnet_hdr_len,
-                                    pkt->size - pkt->vnet_hdr_len);
-         }
-+
-+        /*
-+         * Passive close step 3
-+         */
-+        if ((conn->tcp_state == TCPS_LAST_ACK) &&
-+            (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) {
-+            conn->tcp_state = TCPS_CLOSED;
-+            g_hash_table_remove(rf->connection_track_table, key);
-+        }
-+    }
-+
-+    if ((tcp_pkt->th_flags & TH_FIN) == TH_FIN) {
-+        /*
-+         * Passive close.
-+         * Step 1:
-+         * The *server* side of this connect is VM, *client* tries to close
-+         * the connection. We will into CLOSE_WAIT status.
-+         *
-+         * Step 2:
-+         * In this step we will into LAST_ACK status.
-+         *
-+         * We got 'fin=1, ack=1' packet from server side, we need to
-+         * record the seq of 'fin=1, ack=1' packet.
-+         *
-+         * Step 3:
-+         * We got 'ack=1' packets from client side, it acks 'fin=1, ack=1'
-+         * packet from server side. From this point, we can ensure that there
-+         * will be no packets in the connection, except that, some errors
-+         * happen between the path of 'filter object' and vNIC, if this rare
-+         * case really happen, we can still create a new connection,
-+         * So it is safe to remove the connection from connection_track_table.
-+         *
-+         */
-+        if (conn->tcp_state == TCPS_ESTABLISHED) {
-+            conn->tcp_state = TCPS_CLOSE_WAIT;
-+        }
-+
-+        /*
-+         * Active close step 2.
-+         */
-+        if (conn->tcp_state == TCPS_FIN_WAIT_1) {
-+            conn->tcp_state = TCPS_TIME_WAIT;
-+            /*
-+             * For simplify implementation, we needn't wait 2MSL time
-+             * in filter rewriter. Because guest kernel will track the
-+             * TCP status and wait 2MSL time, if client resend the FIN
-+             * packet, guest will apply the last ACK too.
-+             */
-+            conn->tcp_state = TCPS_CLOSED;
-+            g_hash_table_remove(rf->connection_track_table, key);
-+        }
-     }
-     return 0;
- }
- /* handle tcp packet from secondary guest */
--static int handle_secondary_tcp_pkt(NetFilterState *nf,
-+static int handle_secondary_tcp_pkt(RewriterState *rf,
-                                     Connection *conn,
--                                    Packet *pkt)
-+                                    Packet *pkt, ConnectionKey *key)
- {
-     struct tcphdr *tcp_pkt;
-@@ -XXX,XX +XXX,XX @@ static int handle_secondary_tcp_pkt(NetFilterState *nf,
-         trace_colo_filter_rewriter_conn_offset(conn->offset);
-     }
--    if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) {
-+    if (conn->tcp_state == TCPS_SYN_RECEIVED &&
-+        ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) {
-         /*
-          * save offset = secondary_seq and then
-          * in handle_primary_tcp_pkt make offset
-@@ -XXX,XX +XXX,XX @@ static int handle_secondary_tcp_pkt(NetFilterState *nf,
-         conn->offset = ntohl(tcp_pkt->th_seq);
-     }
-+    /* VM active connect */
-+    if (conn->tcp_state == TCPS_CLOSED &&
-+        ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) {
-+        conn->tcp_state = TCPS_SYN_SENT;
-+    }
-+
-     if ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK) {
-         /* Only need to adjust seq while offset is Non-zero */
-         if (conn->offset) {
-@@ -XXX,XX +XXX,XX @@ static int handle_secondary_tcp_pkt(NetFilterState *nf,
-         }
-     }
-+    /*
-+     * Passive close step 2:
-+     */
-+    if (conn->tcp_state == TCPS_CLOSE_WAIT &&
-+        (tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) {
-+        conn->fin_ack_seq = ntohl(tcp_pkt->th_seq);
-+        conn->tcp_state = TCPS_LAST_ACK;
-+    }
-+
-+    /*
-+     * Active close
-+     *
-+     * Step 1:
-+     * The *server* side of this connect is VM, *server* tries to close
-+     * the connection.
-+     *
-+     * Step 2:
-+     * We will into CLOSE_WAIT status.
-+     * We simplify the TCPS_FIN_WAIT_2, TCPS_TIME_WAIT and
-+     * CLOSING status.
-+     */
-+    if (conn->tcp_state == TCPS_ESTABLISHED &&
-+        (tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == TH_FIN) {
-+        conn->tcp_state = TCPS_FIN_WAIT_1;
-+    }
-+
-     return 0;
- }
-@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
-         if (sender == nf->netdev) {
-             /* NET_FILTER_DIRECTION_TX */
--            if (!handle_primary_tcp_pkt(nf, conn, pkt)) {
-+            if (!handle_primary_tcp_pkt(s, conn, pkt, &key)) {
-                 qemu_net_queue_send(s->incoming_queue, sender, 0,
-                 (const uint8_t *)pkt->data, pkt->size, NULL);
-                 packet_destroy(pkt, NULL);
-@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
-             }
-         } else {
-             /* NET_FILTER_DIRECTION_RX */
--            if (!handle_secondary_tcp_pkt(nf, conn, pkt)) {
-+            if (!handle_secondary_tcp_pkt(s, conn, pkt, &key)) {
-                 qemu_net_queue_send(s->incoming_queue, sender, 0,
-                 (const uint8_t *)pkt->data, pkt->size, NULL);
-                 packet_destroy(pkt, NULL);
---
-.17.1

-[Qemu-devel] [PULL 23/25] pcnet: fix possible buffer overflow
+[PULL 1/7] net/tap: Added TUNSETSTEERINGEBPF code.
-In pcnet_receive(), we try to assign size_ to size which converts from
+From: Andrew Melnychenko <andrew@daynix.com>
 size_t to integer. This will cause troubles when size_ is greater
 INT_MAX, this will lead a negative value in size and it can then pass
 the check of size < MIN_BUF_SIZE which may lead out of bound access
 for both buf and buf1.
-Fixing by converting the type of size to size_t.
+Additional code that will be used for eBPF setting steering routine.
-CC: qemu-stable@nongnu.org
+Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
 Reported-by: Daniel Shapira <daniel@twistlock.com>
 Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- hw/net/pcnet.c | 4 ++--
+ net/tap-linux.h | 1 +
-file changed, 2 insertions(+), 2 deletions(-)
+file changed, 1 insertion(+)
-diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
+diff --git a/net/tap-linux.h b/net/tap-linux.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/net/pcnet.c
+--- a/net/tap-linux.h
-+++ b/hw/net/pcnet.c
++++ b/net/tap-linux.h
-@@ -XXX,XX +XXX,XX @@ ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
+@@ -XXX,XX +XXX,XX @@
-     uint8_t buf1[60];
+ #define TUNSETQUEUE  _IOW('T', 217, int)
-     int remaining;
+ #define TUNSETVNETLE _IOW('T', 220, int)
-     int crc_err = 0;
+ #define TUNSETVNETBE _IOW('T', 222, int)
--    int size = size_;
++#define TUNSETSTEERINGEBPF _IOR('T', 224, int)
-+    size_t size = size_;
      if (CSR_DRX(s) || CSR_STOP(s) || CSR_SPND(s) || !size ||
          (CSR_LOOP(s) && !s->looptest)) {
          return -1;
      }
  #ifdef PCNET_DEBUG
 -    printf("pcnet_receive size=%d\n", size);
 +    printf("pcnet_receive size=%zu\n", size);
  #endif
-     /* if too small buffer, then expand it */
 --
-.17.1
+.7.4
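
Note on the define added in [PULL 1/7]: the request number is built with `_IOR('T', 224, int)`. The sketch below is a minimal illustration, assuming a Linux/glibc host where `<sys/ioctl.h>` exposes the `_IOC_*` decoding macros. It unpacks the same encoding into its type character, command number, and argument size. The names `SKETCH_TUNSETSTEERINGEBPF` and `sketch_describe_ioctl` are hypothetical and are not part of the patch.

```c
#include <stdio.h>
#include <sys/ioctl.h>  /* assumption: Linux/glibc, provides _IOR and _IOC_* */

/* Same encoding as the define in net/tap-linux.h, renamed for this sketch. */
#define SKETCH_TUNSETSTEERINGEBPF _IOR('T', 224, int)

/* Hypothetical helper: print the fields packed into an ioctl request number. */
static void sketch_describe_ioctl(unsigned long req)
{
    printf("type='%c' nr=%lu size=%lu dir=%s\n",
           (char)_IOC_TYPE(req),
           (unsigned long)_IOC_NR(req),
           (unsigned long)_IOC_SIZE(req),
           _IOC_DIR(req) == _IOC_READ ? "read" : "other");
}
```

Calling `sketch_describe_ioctl(SKETCH_TUNSETSTEERINGEBPF)` prints the decoded fields. The size field reflects `sizeof(int)`, which matches the `int prog_fd` that [PULL 2/7] passes by address to `ioctl()`.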

-[Qemu-devel] [PULL 04/25] COLO: integrate colo compare with colo frame
+[PULL 2/7] net: Added SetSteeringEBPF method for NetClientState.
-From: Zhang Chen <zhangckid@gmail.com>
+From: Andrew Melnychenko <andrew@daynix.com>
-For COLO FT, both the PVM and SVM run at the same time,
+For now, that method is supported only by Linux TAP.
-only sync the state while it needs.
+Linux TAP uses TUNSETSTEERINGEBPF ioctl.
-So here, let SVM runs while not doing checkpoint, change
+Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
 DEFAULT_MIGRATE_X_CHECKPOINT_DELAY to 200*100.
 Besides, we forgot to release colo_checkpoint_semd and
 colo_delay_timer, fix them here.
 Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
 Signed-off-by: Zhang Chen <zhangckid@gmail.com>
 Signed-off-by: Zhang Chen <chen.zhang@intel.com>
 Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- migration/colo.c      | 42 ++++++++++++++++++++++++++++++++++++++++--
+ include/net/net.h |  2 ++
- migration/migration.c |  6 ++----
+ net/tap-bsd.c     |  5 +++++
-files changed, 42 insertions(+), 6 deletions(-)
+ net/tap-linux.c   | 13 +++++++++++++
  net/tap-solaris.c |  5 +++++
  net/tap-stub.c    |  5 +++++
  net/tap.c         |  9 +++++++++
  net/tap_int.h     |  1 +
 files changed, 40 insertions(+)
-diff --git a/migration/colo.c b/migration/colo.c
+diff --git a/include/net/net.h b/include/net/net.h
 index XXXXXXX..XXXXXXX 100644
---- a/migration/colo.c
+--- a/include/net/net.h
-+++ b/migration/colo.c
++++ b/include/net/net.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ typedef int (SetVnetBE)(NetClientState *, bool);
- #include "qemu/error-report.h"
+ typedef struct SocketReadState SocketReadState;
- #include "migration/failover.h"
+ typedef void (SocketReadStateFinalize)(SocketReadState *rs);
- #include "replication.h"
+ typedef void (NetAnnounce)(NetClientState *);
-+#include "net/colo-compare.h"
++typedef bool (SetSteeringEBPF)(NetClientState *, int);
-+#include "net/colo.h"
+ typedef struct NetClientInfo {
- static bool vmstate_loading;
+     NetClientDriver type;
-+static Notifier packets_compare_notifier;
+@@ -XXX,XX +XXX,XX @@ typedef struct NetClientInfo {
+     SetVnetLE *set_vnet_le;
- #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
+     SetVnetBE *set_vnet_be;
+     NetAnnounce *announce;
-@@ -XXX,XX +XXX,XX @@ static int colo_do_checkpoint_transaction(MigrationState *s,
++    SetSteeringEBPF *set_steering_ebpf;
-         goto out;
+ } NetClientInfo;
-     }
+ struct NetClientState {
-+    colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT, &local_err);
+diff --git a/net/tap-bsd.c b/net/tap-bsd.c
-+    if (local_err) {
+index XXXXXXX..XXXXXXX 100644
-+        goto out;
+--- a/net/tap-bsd.c
 +++ b/net/tap-bsd.c
@@ -XXX,XX +XXX,XX @@ int tap_fd_get_ifname(int fd, char *ifname)
  {
      return -1;
  }
 +
 +int tap_fd_set_steering_ebpf(int fd, int prog_fd)
 +{
 +    return -1;
 +}
 diff --git a/net/tap-linux.c b/net/tap-linux.c
 index XXXXXXX..XXXXXXX 100644
 --- a/net/tap-linux.c
 +++ b/net/tap-linux.c
@@ -XXX,XX +XXX,XX @@ int tap_fd_get_ifname(int fd, char *ifname)
      pstrcpy(ifname, sizeof(ifr.ifr_name), ifr.ifr_name);
      return 0;
  }
 +
 +int tap_fd_set_steering_ebpf(int fd, int prog_fd)
 +{
 +    if (ioctl(fd, TUNSETSTEERINGEBPF, (void *) &prog_fd) != 0) {
 +        error_report("Issue while setting TUNSETSTEERINGEBPF:"
 +                    " %s with fd: %d, prog_fd: %d",
 +                    strerror(errno), fd, prog_fd);
 +
 +       return -1;
 +    }
 +
-     /* Disable block migration */
++    return 0;
-     migrate_set_block_enabled(false, &local_err);
++}
-     qemu_savevm_state_header(fb);
+diff --git a/net/tap-solaris.c b/net/tap-solaris.c
-@@ -XXX,XX +XXX,XX @@ out:
+index XXXXXXX..XXXXXXX 100644
-     return ret;
+--- a/net/tap-solaris.c
 +++ b/net/tap-solaris.c
@@ -XXX,XX +XXX,XX @@ int tap_fd_get_ifname(int fd, char *ifname)
  {
      return -1;
  }
++
-+static void colo_compare_notify_checkpoint(Notifier *notifier, void *data)
++int tap_fd_set_steering_ebpf(int fd, int prog_fd)
 +{
-+    colo_checkpoint_notify(data);
++    return -1;
 +}
 diff --git a/net/tap-stub.c b/net/tap-stub.c
 index XXXXXXX..XXXXXXX 100644
 --- a/net/tap-stub.c
 +++ b/net/tap-stub.c
@@ -XXX,XX +XXX,XX @@ int tap_fd_get_ifname(int fd, char *ifname)
  {
      return -1;
  }
 +
 +int tap_fd_set_steering_ebpf(int fd, int prog_fd)
 +{
 +    return -1;
 +}
 diff --git a/net/tap.c b/net/tap.c
 index XXXXXXX..XXXXXXX 100644
 --- a/net/tap.c
 +++ b/net/tap.c
@@ -XXX,XX +XXX,XX @@ static void tap_poll(NetClientState *nc, bool enable)
      tap_write_poll(s, enable);
  }
 +static bool tap_set_steering_ebpf(NetClientState *nc, int prog_fd)
 +{
 +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
 +    assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
 +
 +    return tap_fd_set_steering_ebpf(s->fd, prog_fd) == 0;
 +}
 +
- static void colo_process_checkpoint(MigrationState *s)
+ int tap_get_fd(NetClientState *nc)
  {
-     QIOChannelBuffer *bioc;
+     TAPState *s = DO_UPCAST(TAPState, nc, nc);
-@@ -XXX,XX +XXX,XX @@ static void colo_process_checkpoint(MigrationState *s)
+@@ -XXX,XX +XXX,XX @@ static NetClientInfo net_tap_info = {
-         goto out;
+     .set_vnet_hdr_len = tap_set_vnet_hdr_len,
-     }
+     .set_vnet_le = tap_set_vnet_le,
+     .set_vnet_be = tap_set_vnet_be,
-+    packets_compare_notifier.notify = colo_compare_notify_checkpoint;
++    .set_steering_ebpf = tap_set_steering_ebpf,
-+    colo_compare_register_notifier(&packets_compare_notifier);
+ };
-+
-     /*
+ static TAPState *net_tap_fd_init(NetClientState *peer,
-      * Wait for Secondary finish loading VM states and enter COLO
+diff --git a/net/tap_int.h b/net/tap_int.h
       * restore.
@@ -XXX,XX +XXX,XX @@ out:
          qemu_fclose(fb);
      }
 -    timer_del(s->colo_delay_timer);
 -
      /* Hope this not to be too long to wait here */
      qemu_sem_wait(&s->colo_exit_sem);
      qemu_sem_destroy(&s->colo_exit_sem);
 +
 +    /*
 +     * It is safe to unregister notifier after failover finished.
 +     * Besides, colo_delay_timer and colo_checkpoint_sem can't be
 +     * released befor unregister notifier, or there will be use-after-free
 +     * error.
 +     */
 +    colo_compare_unregister_notifier(&packets_compare_notifier);
 +    timer_del(s->colo_delay_timer);
 +    timer_free(s->colo_delay_timer);
 +    qemu_sem_destroy(&s->colo_checkpoint_sem);
 +
      /*
       * Must be called after failover BH is completed,
       * Or the failover BH may shutdown the wrong fd that
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
      fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
      object_unref(OBJECT(bioc));
 +    qemu_mutex_lock_iothread();
 +    vm_start();
 +    trace_colo_vm_state_change("stop", "run");
 +    qemu_mutex_unlock_iothread();
 +
      colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
                        &local_err);
      if (local_err) {
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
              goto out;
          }
 +        qemu_mutex_lock_iothread();
 +        vm_stop_force_state(RUN_STATE_COLO);
 +        trace_colo_vm_state_change("run", "stop");
 +        qemu_mutex_unlock_iothread();
 +
          /* FIXME: This is unnecessary for periodic checkpoint mode */
          colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
                       &local_err);
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
          }
          vmstate_loading = false;
 +        vm_start();
 +        trace_colo_vm_state_change("stop", "run");
          qemu_mutex_unlock_iothread();
          if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
 diff --git a/migration/migration.c b/migration/migration.c
 index XXXXXXX..XXXXXXX 100644
---- a/migration/migration.c
+--- a/net/tap_int.h
-+++ b/migration/migration.c
++++ b/net/tap_int.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ int tap_fd_set_vnet_be(int fd, int vnet_is_be);
- /* Migration XBZRLE default cache size */
+ int tap_fd_enable(int fd);
- #define DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE (64 * 1024 * 1024)
+ int tap_fd_disable(int fd);
+ int tap_fd_get_ifname(int fd, char *ifname);
--/* The delay time (in ms) between two COLO checkpoints
++int tap_fd_set_steering_ebpf(int fd, int prog_fd);
-- * Note: Please change this default value to 10000 when we support hybrid mode.
-- */
+ #endif /* NET_TAP_INT_H */
 -#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200
 +/* The delay time (in ms) between two COLO checkpoints */
 +#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100)
  #define DEFAULT_MIGRATE_MULTIFD_CHANNELS 2
  #define DEFAULT_MIGRATE_MULTIFD_PAGE_COUNT 16
 --
-.17.1
+.7.4
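
Note on the callback added in [PULL 2/7]: `set_steering_ebpf` is an optional entry in `NetClientInfo`, and in this patch only the TAP backend fills it in. The sketch below shows one way a caller could dispatch through such an optional callback. It is a self-contained stand-in, not QEMU code: every `Sketch*`/`sketch_*` name is hypothetical, the backend merely records the program fd instead of issuing the ioctl, and the NULL check in `sketch_attach` is an assumption about how a caller might guard backends that leave the callback unset (the real caller is not shown in this section).

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct SketchClient SketchClient;

/* Mirrors the shape of: typedef bool (SetSteeringEBPF)(NetClientState *, int); */
typedef bool (SketchSetSteeringEBPF)(SketchClient *, int);

typedef struct SketchClientInfo {
    SketchSetSteeringEBPF *set_steering_ebpf; /* NULL when the backend lacks support */
} SketchClientInfo;

struct SketchClient {
    const SketchClientInfo *info;
    int fd;               /* stand-in for the TAP file descriptor */
    int attached_prog_fd; /* what the stand-in backend recorded, -1 if none */
};

/* Stand-in backend: records prog_fd instead of calling ioctl(). */
static bool sketch_tap_set_steering_ebpf(SketchClient *c, int prog_fd)
{
    if (c->fd < 0) {
        return false;
    }
    c->attached_prog_fd = prog_fd;
    return true;
}

/* Hypothetical caller: fails cleanly when the callback is not provided. */
static bool sketch_attach(SketchClient *c, int prog_fd)
{
    if (c == NULL || c->info == NULL || c->info->set_steering_ebpf == NULL) {
        return false;
    }
    return c->info->set_steering_ebpf(c, prog_fd);
}
```

Returning `bool` from the callback while the `tap_fd_*` helpers return `0`/`-1` follows the patch itself, where `tap_set_steering_ebpf` converts with `== 0`.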

-[Qemu-devel] [PULL 15/25] filter: Add handle_event method for NetFilterClass
+[PULL 3/7] ebpf: Added eBPF RSS program.
-From: Zhang Chen <zhangckid@gmail.com>
+From: Andrew Melnychenko <andrew@daynix.com>
-Filter needs to process the event of checkpoint/failover or
+RSS program and Makefile to build it.
-other event passed by COLO frame.
+bpftool is used to generate the '.h' file.
 The data in that file may be loaded by libbpf.
 EBPF compilation is not required for building qemu.
 You can use Makefile if you need to regenerate rss.bpf.skeleton.h.
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
+Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
+Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
 Signed-off-by: Zhang Chen <chen.zhang@intel.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- include/net/filter.h |  5 +++++
+ tools/ebpf/Makefile.ebpf |  21 ++
- net/filter.c         | 17 +++++++++++++++++
+ tools/ebpf/rss.bpf.c     | 571 +++++++++++++++++++++++++++++++++++++++++++++++
- net/net.c            | 19 +++++++++++++++++++
+files changed, 592 insertions(+)
-files changed, 41 insertions(+)
+ create mode 100755 tools/ebpf/Makefile.ebpf
  create mode 100644 tools/ebpf/rss.bpf.c
-diff --git a/include/net/filter.h b/include/net/filter.h
+diff --git a/tools/ebpf/Makefile.ebpf b/tools/ebpf/Makefile.ebpf
-index XXXXXXX..XXXXXXX 100644
+new file mode 100755
---- a/include/net/filter.h
+index XXXXXXX..XXXXXXX
-+++ b/include/net/filter.h
+--- /dev/null
-@@ -XXX,XX +XXX,XX @@ typedef ssize_t (FilterReceiveIOV)(NetFilterState *nc,
++++ b/tools/ebpf/Makefile.ebpf
  typedef void (FilterStatusChanged) (NetFilterState *nf, Error **errp);
 +typedef void (FilterHandleEvent) (NetFilterState *nf, int event, Error **errp);
 +
  typedef struct NetFilterClass {
      ObjectClass parent_class;
@@ -XXX,XX +XXX,XX @@ typedef struct NetFilterClass {
      FilterSetup *setup;
      FilterCleanup *cleanup;
      FilterStatusChanged *status_changed;
 +    FilterHandleEvent *handle_event;
      /* mandatory */
      FilterReceiveIOV *receive_iov;
  } NetFilterClass;
@@ -XXX,XX +XXX,XX @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
                                      int iovcnt,
                                      void *opaque);
 +void colo_notify_filters_event(int event, Error **errp);
 +
  #endif /* QEMU_NET_FILTER_H */
 diff --git a/net/filter.c b/net/filter.c
 index XXXXXXX..XXXXXXX 100644
 --- a/net/filter.c
 +++ b/net/filter.c
 @@ -XXX,XX +XXX,XX @@
- #include "net/vhost_net.h"
++OBJS = rss.bpf.o
- #include "qom/object_interfaces.h"
++
- #include "qemu/iov.h"
++LLC ?= llc
-+#include "net/colo.h"
++CLANG ?= clang
-+#include "migration/colo.h"
++INC_FLAGS = `$(CLANG) -print-file-name=include`
++EXTRA_CFLAGS ?= -O2 -emit-llvm -fno-stack-protector
- static inline bool qemu_can_skip_netfilter(NetFilterState *nf)
++
- {
++all: $(OBJS)
-@@ -XXX,XX +XXX,XX @@ static void netfilter_finalize(Object *obj)
++
-     g_free(nf->netdev_id);
++.PHONY: clean
- }
++
++clean:
-+static void default_handle_event(NetFilterState *nf, int event, Error **errp)
++    rm -f $(OBJS)
 +
 +$(OBJS):  %.o:%.c
 +    $(CLANG) $(INC_FLAGS) \
 +                -D__KERNEL__ -D__ASM_SYSREG_H \
 +                -I../include $(LINUXINCLUDE) \
 +                $(EXTRA_CFLAGS) -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
 +    bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h
 +    cp rss.bpf.skeleton.h ../../ebpf/
 diff --git a/tools/ebpf/rss.bpf.c b/tools/ebpf/rss.bpf.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/tools/ebpf/rss.bpf.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * eBPF RSS program
 + *
 + * Developed by Daynix Computing LTD (http://www.daynix.com)
 + *
 + * Authors:
 + *  Andrew Melnychenko <andrew@daynix.com>
 + *  Yuri Benditovich <yuri.benditovich@daynix.com>
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.  See
 + * the COPYING file in the top-level directory.
 + *
 + * Prepare:
 + * Requires llvm, clang, bpftool, linux kernel tree
 + *
 + * Build rss.bpf.skeleton.h:
 + * make -f Makefile.ebpf clean all
 + */
 +
 +#include <stddef.h>
 +#include <stdbool.h>
 +#include <linux/bpf.h>
 +
 +#include <linux/in.h>
 +#include <linux/if_ether.h>
 +#include <linux/ip.h>
 +#include <linux/ipv6.h>
 +
 +#include <linux/udp.h>
 +#include <linux/tcp.h>
 +
 +#include <bpf/bpf_helpers.h>
 +#include <bpf/bpf_endian.h>
 +#include <linux/virtio_net.h>
 +
 +#define INDIRECTION_TABLE_SIZE 128
 +#define HASH_CALCULATION_BUFFER_SIZE 36
 +
 +struct rss_config_t {
 +    __u8 redirect;
 +    __u8 populate_hash;
 +    __u32 hash_types;
 +    __u16 indirections_len;
 +    __u16 default_queue;
 +} __attribute__((packed));
 +
 +struct toeplitz_key_data_t {
 +    __u32 leftmost_32_bits;
 +    __u8 next_byte[HASH_CALCULATION_BUFFER_SIZE];
 +};
 +
 +struct packet_hash_info_t {
 +    __u8 is_ipv4;
 +    __u8 is_ipv6;
 +    __u8 is_udp;
 +    __u8 is_tcp;
 +    __u8 is_ipv6_ext_src;
 +    __u8 is_ipv6_ext_dst;
 +    __u8 is_fragmented;
 +
 +    __u16 src_port;
 +    __u16 dst_port;
 +
 +    union {
 +        struct {
 +            __be32 in_src;
 +            __be32 in_dst;
 +        };
 +
 +        struct {
 +            struct in6_addr in6_src;
 +            struct in6_addr in6_dst;
 +            struct in6_addr in6_ext_src;
 +            struct in6_addr in6_ext_dst;
 +        };
 +    };
 +};
 +
 +struct bpf_map_def SEC("maps")
 +tap_rss_map_configurations = {
 +        .type        = BPF_MAP_TYPE_ARRAY,
 +        .key_size    = sizeof(__u32),
 +        .value_size  = sizeof(struct rss_config_t),
 +        .max_entries = 1,
 +};
 +
 +struct bpf_map_def SEC("maps")
 +tap_rss_map_toeplitz_key = {
 +        .type        = BPF_MAP_TYPE_ARRAY,
 +        .key_size    = sizeof(__u32),
 +        .value_size  = sizeof(struct toeplitz_key_data_t),
 +        .max_entries = 1,
 +};
 +
 +struct bpf_map_def SEC("maps")
 +tap_rss_map_indirection_table = {
 +        .type        = BPF_MAP_TYPE_ARRAY,
 +        .key_size    = sizeof(__u32),
 +        .value_size  = sizeof(__u16),
 +        .max_entries = INDIRECTION_TABLE_SIZE,
 +};
 +
 +static inline void net_rx_rss_add_chunk(__u8 *rss_input, size_t *bytes_written,
 +                                        const void *ptr, size_t size) {
 +    __builtin_memcpy(&rss_input[*bytes_written], ptr, size);
 +    *bytes_written += size;
 +}
 +
 +static inline
 +void net_toeplitz_add(__u32 *result,
 +                      __u8 *input,
 +                      __u32 len,
 +                      struct toeplitz_key_data_t *key) {
 +
 +    __u32 accumulator = *result;
 +    __u32 leftmost_32_bits = key->leftmost_32_bits;
 +    __u32 byte;
 +
 +    for (byte = 0; byte < HASH_CALCULATION_BUFFER_SIZE; byte++) {
 +        __u8 input_byte = input[byte];
 +        __u8 key_byte = key->next_byte[byte];
 +        __u8 bit;
 +
 +        for (bit = 0; bit < 8; bit++) {
 +            if (input_byte & (1 << 7)) {
 +                accumulator ^= leftmost_32_bits;
 +            }
 +
 +            leftmost_32_bits =
 +                    (leftmost_32_bits << 1) | ((key_byte & (1 << 7)) >> 7);
 +
 +            input_byte <<= 1;
 +            key_byte <<= 1;
 +        }
 +    }
 +
 +    *result = accumulator;
 +}
 +
 +
 +static inline int ip6_extension_header_type(__u8 hdr_type)
 +{
-+    switch (event) {
++    switch (hdr_type) {
-+    case COLO_EVENT_CHECKPOINT:
++    case IPPROTO_HOPOPTS:
-+        break;
++    case IPPROTO_ROUTING:
-+    case COLO_EVENT_FAILOVER:
++    case IPPROTO_FRAGMENT:
-+        object_property_set_str(OBJECT(nf), "off", "status", errp);
++    case IPPROTO_ICMPV6:
-+        break;
++    case IPPROTO_NONE:
 +    case IPPROTO_DSTOPTS:
 +    case IPPROTO_MH:
 +        return 1;
 +    default:
 +        return 0;
 +    }
 +}
 +/*
 + * According to
 + * https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml
 + * we expect no more than 11 extension headers in an IPv6 packet, and
 + * there are 27 TLV options for the Destination and Hop-by-hop extensions.
 + * We need to choose a reasonable maximum number of extensions/options to
 + * check when looking for the ext src/dst addresses.
 + */
 +#define IP6_EXTENSIONS_COUNT 11
 +#define IP6_OPTIONS_COUNT 30
 +
 +static inline int parse_ipv6_ext(struct __sk_buff *skb,
 +        struct packet_hash_info_t *info,
 +        __u8 *l4_protocol, size_t *l4_offset)
 +{
 +    int err = 0;
 +
 +    if (!ip6_extension_header_type(*l4_protocol)) {
 +        return 0;
 +    }
 +
 +    struct ipv6_opt_hdr ext_hdr = {};
 +
 +    for (unsigned int i = 0; i < IP6_EXTENSIONS_COUNT; ++i) {
 +
 +        err = bpf_skb_load_bytes_relative(skb, *l4_offset, &ext_hdr,
 +                                    sizeof(ext_hdr), BPF_HDR_START_NET);
 +        if (err) {
 +            goto error;
 +        }
 +
 +        if (*l4_protocol == IPPROTO_ROUTING) {
 +            struct ipv6_rt_hdr ext_rt = {};
 +
 +            err = bpf_skb_load_bytes_relative(skb, *l4_offset, &ext_rt,
 +                                        sizeof(ext_rt), BPF_HDR_START_NET);
 +            if (err) {
 +                goto error;
 +            }
 +
 +            if ((ext_rt.type == IPV6_SRCRT_TYPE_2) &&
 +                    (ext_rt.hdrlen == sizeof(struct in6_addr) / 8) &&
 +                    (ext_rt.segments_left == 1)) {
 +
 +                err = bpf_skb_load_bytes_relative(skb,
 +                    *l4_offset + offsetof(struct rt2_hdr, addr),
 +                    &info->in6_ext_dst, sizeof(info->in6_ext_dst),
 +                    BPF_HDR_START_NET);
 +                if (err) {
 +                    goto error;
 +                }
 +
 +                info->is_ipv6_ext_dst = 1;
 +            }
 +
 +        } else if (*l4_protocol == IPPROTO_DSTOPTS) {
 +            struct ipv6_opt_t {
 +                __u8 type;
 +                __u8 length;
 +            } __attribute__((packed)) opt = {};
 +
 +            size_t opt_offset = sizeof(ext_hdr);
 +
 +            for (unsigned int j = 0; j < IP6_OPTIONS_COUNT; ++j) {
 +                err = bpf_skb_load_bytes_relative(skb, *l4_offset + opt_offset,
 +                                        &opt, sizeof(opt), BPF_HDR_START_NET);
 +                if (err) {
 +                    goto error;
 +                }
 +
 +                if (opt.type == IPV6_TLV_HAO) {
 +                    err = bpf_skb_load_bytes_relative(skb,
 +                        *l4_offset + opt_offset
 +                        + offsetof(struct ipv6_destopt_hao, addr),
 +                        &info->in6_ext_src, sizeof(info->in6_ext_src),
 +                        BPF_HDR_START_NET);
 +                    if (err) {
 +                        goto error;
 +                    }
 +
 +                    info->is_ipv6_ext_src = 1;
 +                    break;
 +                }
 +
 +                opt_offset += (opt.type == IPV6_TLV_PAD1) ?
 +                              1 : opt.length + sizeof(opt);
 +
 +                if (opt_offset + 1 >= ext_hdr.hdrlen * 8) {
 +                    break;
 +                }
 +            }
 +        } else if (*l4_protocol == IPPROTO_FRAGMENT) {
 +            info->is_fragmented = true;
 +        }
 +
 +        *l4_protocol = ext_hdr.nexthdr;
 +        *l4_offset += (ext_hdr.hdrlen + 1) * 8;
 +
 +        if (!ip6_extension_header_type(ext_hdr.nexthdr)) {
 +            return 0;
 +        }
 +    }
 +
 +    return 0;
 +error:
 +    return err;
 +}
 +
 +static __be16 parse_eth_type(struct __sk_buff *skb)
 +{
 +    unsigned int offset = 12;
 +    __be16 ret = 0;
 +    int err = 0;
 +
 +    err = bpf_skb_load_bytes_relative(skb, offset, &ret, sizeof(ret),
 +                                BPF_HDR_START_MAC);
 +    if (err) {
 +        return 0;
 +    }
 +
 +    switch (bpf_ntohs(ret)) {
 +    case ETH_P_8021AD:
 +        offset += 4;
 +        /* fall through: skip the inner 802.1Q tag as well */
 +    case ETH_P_8021Q:
 +        offset += 4;
 +        err = bpf_skb_load_bytes_relative(skb, offset, &ret, sizeof(ret),
 +                                    BPF_HDR_START_MAC);
 +    default:
 +        break;
 +    }
-+}
++
-+
++    if (err) {
- static void netfilter_class_init(ObjectClass *oc, void *data)
++        return 0;
- {
++    }
-     UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
++
-+    NetFilterClass *nfc = NETFILTER_CLASS(oc);
++    return ret;
++}
-     ucc->complete = netfilter_complete;
++
-+    nfc->handle_event = default_handle_event;
++static inline int parse_packet(struct __sk_buff *skb,
- }
++        struct packet_hash_info_t *info)
  static const TypeInfo netfilter_info = {
 diff --git a/net/net.c b/net/net.c
 index XXXXXXX..XXXXXXX 100644
 --- a/net/net.c
 +++ b/net/net.c
@@ -XXX,XX +XXX,XX @@ void hmp_info_network(Monitor *mon, const QDict *qdict)
      }
  }
 +void colo_notify_filters_event(int event, Error **errp)
 +{
-+    NetClientState *nc;
++    int err = 0;
-+    NetFilterState *nf;
++
-+    NetFilterClass *nfc = NULL;
++    if (!info || !skb) {
-+    Error *local_err = NULL;
++        return -1;
-+
++    }
-+    QTAILQ_FOREACH(nc, &net_clients, next) {
++
-+        QTAILQ_FOREACH(nf, &nc->filters, next) {
++    size_t l4_offset = 0;
-+            nfc = NETFILTER_GET_CLASS(OBJECT(nf));
++    __u8 l4_protocol = 0;
-+            nfc->handle_event(nf, event, &local_err);
++    __u16 l3_protocol = bpf_ntohs(parse_eth_type(skb));
-+            if (local_err) {
++    if (l3_protocol == 0) {
-+                error_propagate(errp, local_err);
++        err = -1;
-+                return;
++        goto error;
-+            }
++    }
-+        }
++
-+    }
++    if (l3_protocol == ETH_P_IP) {
-+}
++        info->is_ipv4 = 1;
 +
- void qmp_set_link(const char *name, bool up, Error **errp)
++        struct iphdr ip = {};
- {
++        err = bpf_skb_load_bytes_relative(skb, 0, &ip, sizeof(ip),
-     NetClientState *ncs[MAX_QUEUE_NUM];
++                                    BPF_HDR_START_NET);
 +        if (err) {
 +            goto error;
 +        }
 +
 +        info->in_src = ip.saddr;
 +        info->in_dst = ip.daddr;
 +        info->is_fragmented = !!ip.frag_off;
 +
 +        l4_protocol = ip.protocol;
 +        l4_offset = ip.ihl * 4;
 +    } else if (l3_protocol == ETH_P_IPV6) {
 +        info->is_ipv6 = 1;
 +
 +        struct ipv6hdr ip6 = {};
 +        err = bpf_skb_load_bytes_relative(skb, 0, &ip6, sizeof(ip6),
 +                                    BPF_HDR_START_NET);
 +        if (err) {
 +            goto error;
 +        }
 +
 +        info->in6_src = ip6.saddr;
 +        info->in6_dst = ip6.daddr;
 +
 +        l4_protocol = ip6.nexthdr;
 +        l4_offset = sizeof(ip6);
 +
 +        err = parse_ipv6_ext(skb, info, &l4_protocol, &l4_offset);
 +        if (err) {
 +            goto error;
 +        }
 +    }
 +
 +    if (l4_protocol != 0 && !info->is_fragmented) {
 +        if (l4_protocol == IPPROTO_TCP) {
 +            info->is_tcp = 1;
 +
 +            struct tcphdr tcp = {};
 +            err = bpf_skb_load_bytes_relative(skb, l4_offset, &tcp, sizeof(tcp),
 +                                        BPF_HDR_START_NET);
 +            if (err) {
 +                goto error;
 +            }
 +
 +            info->src_port = tcp.source;
 +            info->dst_port = tcp.dest;
 +        } else if (l4_protocol == IPPROTO_UDP) { /* TODO: add udplite? */
 +            info->is_udp = 1;
 +
 +            struct udphdr udp = {};
 +            err = bpf_skb_load_bytes_relative(skb, l4_offset, &udp, sizeof(udp),
 +                                        BPF_HDR_START_NET);
 +            if (err) {
 +                goto error;
 +            }
 +
 +            info->src_port = udp.source;
 +            info->dst_port = udp.dest;
 +        }
 +    }
 +
 +    return 0;
 +
 +error:
 +    return err;
 +}
 +
 +static inline __u32 calculate_rss_hash(struct __sk_buff *skb,
 +        struct rss_config_t *config, struct toeplitz_key_data_t *toe)
 +{
 +    __u8 rss_input[HASH_CALCULATION_BUFFER_SIZE] = {};
 +    size_t bytes_written = 0;
 +    __u32 result = 0;
 +    int err = 0;
 +    struct packet_hash_info_t packet_info = {};
 +
 +    err = parse_packet(skb, &packet_info);
 +    if (err) {
 +        return 0;
 +    }
 +
 +    if (packet_info.is_ipv4) {
 +        if (packet_info.is_tcp &&
 +            config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCPv4) {
 +
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.in_src,
 +                                 sizeof(packet_info.in_src));
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.in_dst,
 +                                 sizeof(packet_info.in_dst));
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.src_port,
 +                                 sizeof(packet_info.src_port));
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.dst_port,
 +                                 sizeof(packet_info.dst_port));
 +        } else if (packet_info.is_udp &&
 +                   config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDPv4) {
 +
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.in_src,
 +                                 sizeof(packet_info.in_src));
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.in_dst,
 +                                 sizeof(packet_info.in_dst));
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.src_port,
 +                                 sizeof(packet_info.src_port));
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.dst_port,
 +                                 sizeof(packet_info.dst_port));
 +        } else if (config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IPv4) {
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.in_src,
 +                                 sizeof(packet_info.in_src));
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.in_dst,
 +                                 sizeof(packet_info.in_dst));
 +        }
 +    } else if (packet_info.is_ipv6) {
 +        if (packet_info.is_tcp &&
 +            config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCPv6) {
 +
 +            if (packet_info.is_ipv6_ext_src &&
 +                config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCP_EX) {
 +
 +                net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                     &packet_info.in6_ext_src,
 +                                     sizeof(packet_info.in6_ext_src));
 +            } else {
 +                net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                     &packet_info.in6_src,
 +                                     sizeof(packet_info.in6_src));
 +            }
 +            if (packet_info.is_ipv6_ext_dst &&
 +                config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCP_EX) {
 +
 +                net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                     &packet_info.in6_ext_dst,
 +                                     sizeof(packet_info.in6_ext_dst));
 +            } else {
 +                net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                     &packet_info.in6_dst,
 +                                     sizeof(packet_info.in6_dst));
 +            }
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.src_port,
 +                                 sizeof(packet_info.src_port));
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.dst_port,
 +                                 sizeof(packet_info.dst_port));
 +        } else if (packet_info.is_udp &&
 +                   config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDPv6) {
 +
 +            if (packet_info.is_ipv6_ext_src &&
 +               config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDP_EX) {
 +
 +                net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                     &packet_info.in6_ext_src,
 +                                     sizeof(packet_info.in6_ext_src));
 +            } else {
 +                net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                     &packet_info.in6_src,
 +                                     sizeof(packet_info.in6_src));
 +            }
 +            if (packet_info.is_ipv6_ext_dst &&
 +               config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDP_EX) {
 +
 +                net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                     &packet_info.in6_ext_dst,
 +                                     sizeof(packet_info.in6_ext_dst));
 +            } else {
 +                net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                     &packet_info.in6_dst,
 +                                     sizeof(packet_info.in6_dst));
 +            }
 +
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.src_port,
 +                                 sizeof(packet_info.src_port));
 +            net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                 &packet_info.dst_port,
 +                                 sizeof(packet_info.dst_port));
 +
 +        } else if (config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IPv6) {
 +            if (packet_info.is_ipv6_ext_src &&
 +               config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IP_EX) {
 +
 +                net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                     &packet_info.in6_ext_src,
 +                                     sizeof(packet_info.in6_ext_src));
 +            } else {
 +                net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                     &packet_info.in6_src,
 +                                     sizeof(packet_info.in6_src));
 +            }
 +            if (packet_info.is_ipv6_ext_dst &&
 +                config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IP_EX) {
 +
 +                net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                     &packet_info.in6_ext_dst,
 +                                     sizeof(packet_info.in6_ext_dst));
 +            } else {
 +                net_rx_rss_add_chunk(rss_input, &bytes_written,
 +                                     &packet_info.in6_dst,
 +                                     sizeof(packet_info.in6_dst));
 +            }
 +        }
 +    }
 +
 +    if (bytes_written) {
 +        net_toeplitz_add(&result, rss_input, bytes_written, toe);
 +    }
 +
 +    return result;
 +}
 +
 +SEC("tun_rss_steering")
 +int tun_rss_steering_prog(struct __sk_buff *skb)
 +{
 +
 +    struct rss_config_t *config;
 +    struct toeplitz_key_data_t *toe;
 +
 +    __u32 key = 0;
 +    __u32 hash = 0;
 +
 +    config = bpf_map_lookup_elem(&tap_rss_map_configurations, &key);
 +    toe = bpf_map_lookup_elem(&tap_rss_map_toeplitz_key, &key);
 +
 +    if (config && toe) {
 +        if (!config->redirect) {
 +            return config->default_queue;
 +        }
 +
 +        hash = calculate_rss_hash(skb, config, toe);
 +        if (hash) {
 +            __u32 table_idx = hash % config->indirections_len;
 +            __u16 *queue = 0;
 +
 +            queue = bpf_map_lookup_elem(&tap_rss_map_indirection_table,
 +                                        &table_idx);
 +
 +            if (queue) {
 +                return *queue;
 +            }
 +        }
 +
 +        return config->default_queue;
 +    }
 +
 +    return -1;
 +}
 +
 +char _license[] SEC("license") = "GPL v2";
 --
-.17.1
+.7.4
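
For readers who want to check the hashing in the eBPF RSS program above from
user space, the following is a minimal sketch of the same bit-serial Toeplitz
computation in plain C, followed by the indirection table lookup. It is not
part of the series. The names toeplitz_hash, select_queue and
TOEPLITZ_KEY_SIZE are hypothetical, and the sketch assumes the key is passed
as raw bytes (the loader instead pre-swaps the first 32 bits before writing
the key into the map). The guard against an empty indirection table is an
addition of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: 4 bytes for the initial window plus 36 key bytes. */
#define TOEPLITZ_KEY_SIZE 40

/*
 * Bit-serial Toeplitz hash over `len` input bytes.
 * For every set input bit (most significant first), XOR the current
 * 32-bit key window into the result, then slide the window left by one
 * bit, pulling in the next key bit.
 */
static uint32_t toeplitz_hash(const uint8_t *key, const uint8_t *input,
                              size_t len)
{
    /* The window starts as the first 32 key bits, big-endian. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    uint32_t result = 0;

    /* Never read past the end of the key. */
    if (len > TOEPLITZ_KEY_SIZE - 4) {
        len = TOEPLITZ_KEY_SIZE - 4;
    }

    for (size_t i = 0; i < len; i++) {
        uint8_t in = input[i];
        uint8_t next = key[i + 4];

        for (int bit = 0; bit < 8; bit++) {
            if (in & 0x80) {
                result ^= window;
            }
            window = (window << 1) | (uint32_t)(next >> 7);
            in <<= 1;
            next <<= 1;
        }
    }

    return result;
}

/*
 * Queue selection in the style of tun_rss_steering_prog: a zero hash
 * falls back to the default queue, otherwise the hash indexes the
 * indirection table modulo its length.
 */
static uint16_t select_queue(uint32_t hash, const uint16_t *table,
                             uint16_t table_len, uint16_t default_queue)
{
    if (hash == 0 || table_len == 0) {
        return default_queue;
    }
    return table[hash % table_len];
}
```

The input buffer is expected to be laid out as in calculate_rss_hash: source
address, destination address, then source and destination ports, all in
network byte order. Hashing only the first 8 bytes of an IPv4 tuple gives the
address-only hash, since the computation is incremental.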

-[Qemu-devel] [PULL 02/25] colo-compare: implement the process of checkpoint
+[PULL 4/7] ebpf: Added eBPF RSS loader.
-From: Zhang Chen <zhangckid@gmail.com>
+From: Andrew Melnychenko <andrew@daynix.com>
-While do checkpoint, we need to flush all the unhandled packets,
+Added function that loads RSS eBPF program.
-By using the filter notifier mechanism, we can easily to notify
+Added stub functions for RSS eBPF loader.
-every compare object to do this process, which runs inside
+Added meson and configuration options.
 of compare threads as a coroutine.
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
+By default, the eBPF feature is enabled if libbpf is present in the build system.
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
+libbpf is checked for in both the configure shell script and the meson script.
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
 Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
 Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- include/migration/colo.h |  6 ++++
+ configure               |   8 +-
- net/colo-compare.c       | 78 ++++++++++++++++++++++++++++++++++++++++
+ ebpf/ebpf_rss-stub.c    |  40 +++++
- net/colo-compare.h       | 22 ++++++++++++
+ ebpf/ebpf_rss.c         | 165 ++++++++++++++++++
-files changed, 106 insertions(+)
+ ebpf/ebpf_rss.h         |  44 +++++
- create mode 100644 net/colo-compare.h
+ ebpf/meson.build        |   1 +
  ebpf/rss.bpf.skeleton.h | 431 ++++++++++++++++++++++++++++++++++++++++++++++++
  ebpf/trace-events       |   4 +
  ebpf/trace.h            |   1 +
  meson.build             |  23 +++
  meson_options.txt       |   2 +
 files changed, 718 insertions(+), 1 deletion(-)
  create mode 100644 ebpf/ebpf_rss-stub.c
  create mode 100644 ebpf/ebpf_rss.c
  create mode 100644 ebpf/ebpf_rss.h
  create mode 100644 ebpf/meson.build
  create mode 100644 ebpf/rss.bpf.skeleton.h
  create mode 100644 ebpf/trace-events
  create mode 100644 ebpf/trace.h
-diff --git a/include/migration/colo.h b/include/migration/colo.h
+diff --git a/configure b/configure
-index XXXXXXX..XXXXXXX 100644
+index XXXXXXX..XXXXXXX 100755
---- a/include/migration/colo.h
+--- a/configure
-+++ b/include/migration/colo.h
++++ b/configure
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ vhost_vsock="$default_feature"
- #include "qemu-common.h"
+ vhost_user="no"
- #include "qapi/qapi-types-migration.h"
+ vhost_user_blk_server="auto"
+ vhost_user_fs="$default_feature"
-+enum colo_event {
++bpf="auto"
-+    COLO_EVENT_NONE,
+ kvm="auto"
-+    COLO_EVENT_CHECKPOINT,
+ hax="auto"
-+    COLO_EVENT_FAILOVER,
+ hvf="auto"
-+};
+@@ -XXX,XX +XXX,XX @@ for opt do
-+
+   ;;
- void colo_info_init(void);
+   --enable-membarrier) membarrier="yes"
+   ;;
- void migrate_start_colo_process(MigrationState *s);
++  --disable-bpf) bpf="disabled"
-diff --git a/net/colo-compare.c b/net/colo-compare.c
++  ;;
-index XXXXXXX..XXXXXXX 100644
++  --enable-bpf) bpf="enabled"
---- a/net/colo-compare.c
++  ;;
-+++ b/net/colo-compare.c
+   --disable-blobs) blobs="false"
-@@ -XXX,XX +XXX,XX @@
+   ;;
- #include "qemu/sockets.h"
+   --with-pkgversion=*) pkgversion="$optarg"
- #include "colo.h"
+@@ -XXX,XX +XXX,XX @@ disabled with --disable-FEATURE, default is enabled if available
- #include "sysemu/iothread.h"
+   vhost-user      vhost-user backend support
-+#include "net/colo-compare.h"
+   vhost-user-blk-server    vhost-user-blk server support
-+#include "migration/colo.h"
+   vhost-vdpa      vhost-vdpa kernel backend support
++  bpf             BPF kernel support
- #define TYPE_COLO_COMPARE "colo-compare"
+   spice           spice
- #define COLO_COMPARE(obj) \
+   spice-protocol  spice-protocol
-     OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
+   rbd             rados block device (rbd)
+@@ -XXX,XX +XXX,XX @@ if test "$skip_meson" = no; then
-+static QTAILQ_HEAD(, CompareState) net_compares =
+         -Dattr=$attr -Ddefault_devices=$default_devices \
-+       QTAILQ_HEAD_INITIALIZER(net_compares);
+         -Ddocs=$docs -Dsphinx_build=$sphinx_build -Dinstall_blobs=$blobs \
-+
+         -Dvhost_user_blk_server=$vhost_user_blk_server -Dmultiprocess=$multiprocess \
- #define COMPARE_READ_LEN_MAX NET_BUFSIZE
+-        -Dfuse=$fuse -Dfuse_lseek=$fuse_lseek -Dguest_agent_msi=$guest_agent_msi \
- #define MAX_QUEUE_SIZE 1024
++        -Dfuse=$fuse -Dfuse_lseek=$fuse_lseek -Dguest_agent_msi=$guest_agent_msi -Dbpf=$bpf\
+         $(if test "$default_features" = no; then echo "-Dauto_features=disabled"; fi) \
-@@ -XXX,XX +XXX,XX @@
+     -Dtcg_interpreter=$tcg_interpreter \
- /* TODO: Should be configurable */
+         $cross_arg \
- #define REGULAR_PACKET_CHECK_MS 3000
+diff --git a/ebpf/ebpf_rss-stub.c b/ebpf/ebpf_rss-stub.c
 +static QemuMutex event_mtx;
 +static QemuCond event_complete_cond;
 +static int event_unhandled_count;
 +
  /*
   *  + CompareState ++
   *  |               |
@@ -XXX,XX +XXX,XX @@ typedef struct CompareState {
      IOThread *iothread;
      GMainContext *worker_context;
      QEMUTimer *packet_check_timer;
 +
 +    QEMUBH *event_bh;
 +    enum colo_event event;
 +
 +    QTAILQ_ENTRY(CompareState) next;
  } CompareState;
  typedef struct CompareClass {
@@ -XXX,XX +XXX,XX @@ static void check_old_packet_regular(void *opaque)
                  REGULAR_PACKET_CHECK_MS);
  }
 +/* Public API, Used for COLO frame to notify compare event */
 +void colo_notify_compares_event(void *opaque, int event, Error **errp)
 +{
 +    CompareState *s;
 +
 +    qemu_mutex_lock(&event_mtx);
 +    QTAILQ_FOREACH(s, &net_compares, next) {
 +        s->event = event;
 +        qemu_bh_schedule(s->event_bh);
 +        event_unhandled_count++;
 +    }
 +    /* Wait for all compare threads to finish handling this event */
 +    while (event_unhandled_count > 0) {
 +        qemu_cond_wait(&event_complete_cond, &event_mtx);
 +    }
 +
 +    qemu_mutex_unlock(&event_mtx);
 +}
 +
  static void colo_compare_timer_init(CompareState *s)
  {
      AioContext *ctx = iothread_get_aio_context(s->iothread);
@@ -XXX,XX +XXX,XX @@ static void colo_compare_timer_del(CompareState *s)
      }
   }
 +static void colo_flush_packets(void *opaque, void *user_data);
 +
 +static void colo_compare_handle_event(void *opaque)
 +{
 +    CompareState *s = opaque;
 +
 +    switch (s->event) {
 +    case COLO_EVENT_CHECKPOINT:
 +        g_queue_foreach(&s->conn_list, colo_flush_packets, s);
 +        break;
 +    case COLO_EVENT_FAILOVER:
 +        break;
 +    default:
 +        break;
 +    }
 +
 +    assert(event_unhandled_count > 0);
 +
 +    qemu_mutex_lock(&event_mtx);
 +    event_unhandled_count--;
 +    qemu_cond_broadcast(&event_complete_cond);
 +    qemu_mutex_unlock(&event_mtx);
 +}
 +
  static void colo_compare_iothread(CompareState *s)
  {
      object_ref(OBJECT(s->iothread));
@@ -XXX,XX +XXX,XX @@ static void colo_compare_iothread(CompareState *s)
                               s, s->worker_context, true);
      colo_compare_timer_init(s);
 +    s->event_bh = qemu_bh_new(colo_compare_handle_event, s);
  }
  static char *compare_get_pri_indev(Object *obj, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
      net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, s->vnet_hdr);
      net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, s->vnet_hdr);
 +    QTAILQ_INSERT_TAIL(&net_compares, s, next);
 +
      g_queue_init(&s->conn_list);
 +    qemu_mutex_init(&event_mtx);
 +    qemu_cond_init(&event_complete_cond);
 +
      s->connection_track_table = g_hash_table_new_full(connection_key_hash,
                                                        connection_key_equal,
                                                        g_free,
@@ -XXX,XX +XXX,XX @@ static void colo_compare_init(Object *obj)
  static void colo_compare_finalize(Object *obj)
  {
      CompareState *s = COLO_COMPARE(obj);
 +    CompareState *tmp = NULL;
      qemu_chr_fe_deinit(&s->chr_pri_in, false);
      qemu_chr_fe_deinit(&s->chr_sec_in, false);
@@ -XXX,XX +XXX,XX @@ static void colo_compare_finalize(Object *obj)
      if (s->iothread) {
          colo_compare_timer_del(s);
      }
 +
 +    qemu_bh_delete(s->event_bh);
 +
 +    QTAILQ_FOREACH(tmp, &net_compares, next) {
 +        if (tmp == s) {
 +            QTAILQ_REMOVE(&net_compares, s, next);
 +            break;
 +        }
 +    }
 +
      /* Release all unhandled packets after compare thread exited */
      g_queue_foreach(&s->conn_list, colo_flush_packets, s);
@@ -XXX,XX +XXX,XX @@ static void colo_compare_finalize(Object *obj)
      if (s->iothread) {
          object_unref(OBJECT(s->iothread));
      }
 +
 +    qemu_mutex_destroy(&event_mtx);
 +    qemu_cond_destroy(&event_complete_cond);
 +
      g_free(s->pri_indev);
      g_free(s->sec_indev);
      g_free(s->outdev);
 diff --git a/net/colo-compare.h b/net/colo-compare.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/net/colo-compare.h
++++ b/ebpf/ebpf_rss-stub.c
 @@ -XXX,XX +XXX,XX @@
 +/*
-+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
++ * eBPF RSS stub file
 + * (a.k.a. Fault Tolerance or Continuous Replication)
 + *
-+ * Copyright (c) 2017 HUAWEI TECHNOLOGIES CO., LTD.
++ * Developed by Daynix Computing LTD (http://www.daynix.com)
 + * Copyright (c) 2017 FUJITSU LIMITED
 + * Copyright (c) 2017 Intel Corporation
 + *
 + * Authors:
-+ *    zhanghailiang <zhang.zhanghailiang@huawei.com>
++ *  Yuri Benditovich <yuri.benditovich@daynix.com>
 + *    Zhang Chen <zhangckid@gmail.com>
 + *
-+ * This work is licensed under the terms of the GNU GPL, version 2 or
++ * This work is licensed under the terms of the GNU GPL, version 2.  See
-+ * later.  See the COPYING file in the top-level directory.
++ * the COPYING file in the top-level directory.
 + */
 +
-+#ifndef QEMU_COLO_COMPARE_H
++#include "qemu/osdep.h"
-+#define QEMU_COLO_COMPARE_H
++#include "ebpf/ebpf_rss.h"
 +
-+void colo_notify_compares_event(void *opaque, int event, Error **errp);
++void ebpf_rss_init(struct EBPFRSSContext *ctx)
-+
++{
-+#endif /* QEMU_COLO_COMPARE_H */
++
 +}
 +
 +bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
 +{
 +    return false;
 +}
 +
 +bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 +{
 +    return false;
 +}
 +
 +bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
 +                      uint16_t *indirections_table, uint8_t *toeplitz_key)
 +{
 +    return false;
 +}
 +
 +void ebpf_rss_unload(struct EBPFRSSContext *ctx)
 +{
 +
 +}
 diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/ebpf/ebpf_rss.c
@@ -XXX,XX +XXX,XX @@
 +/*
 + * eBPF RSS loader
 + *
 + * Developed by Daynix Computing LTD (http://www.daynix.com)
 + *
 + * Authors:
 + *  Andrew Melnychenko <andrew@daynix.com>
 + *  Yuri Benditovich <yuri.benditovich@daynix.com>
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.  See
 + * the COPYING file in the top-level directory.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/error-report.h"
 +
 +#include <bpf/libbpf.h>
 +#include <bpf/bpf.h>
 +
 +#include "hw/virtio/virtio-net.h" /* VIRTIO_NET_RSS_MAX_TABLE_LEN */
 +
 +#include "ebpf/ebpf_rss.h"
 +#include "ebpf/rss.bpf.skeleton.h"
 +#include "trace.h"
 +
 +void ebpf_rss_init(struct EBPFRSSContext *ctx)
 +{
 +    if (ctx != NULL) {
 +        ctx->obj = NULL;
 +    }
 +}
 +
 +bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
 +{
 +    return ctx != NULL && ctx->obj != NULL;
 +}
 +
 +bool ebpf_rss_load(struct EBPFRSSContext *ctx)
 +{
 +    struct rss_bpf *rss_bpf_ctx;
 +
 +    if (ctx == NULL) {
 +        return false;
 +    }
 +
 +    rss_bpf_ctx = rss_bpf__open();
 +    if (rss_bpf_ctx == NULL) {
 +        trace_ebpf_error("eBPF RSS", "can not open eBPF RSS object");
 +        goto error;
 +    }
 +
 +    bpf_program__set_socket_filter(rss_bpf_ctx->progs.tun_rss_steering_prog);
 +
 +    if (rss_bpf__load(rss_bpf_ctx)) {
 +        trace_ebpf_error("eBPF RSS", "can not load RSS program");
 +        goto error;
 +    }
 +
 +    ctx->obj = rss_bpf_ctx;
 +    ctx->program_fd = bpf_program__fd(
 +            rss_bpf_ctx->progs.tun_rss_steering_prog);
 +    ctx->map_configuration = bpf_map__fd(
 +            rss_bpf_ctx->maps.tap_rss_map_configurations);
 +    ctx->map_indirections_table = bpf_map__fd(
 +            rss_bpf_ctx->maps.tap_rss_map_indirection_table);
 +    ctx->map_toeplitz_key = bpf_map__fd(
 +            rss_bpf_ctx->maps.tap_rss_map_toeplitz_key);
 +
 +    return true;
 +error:
 +    rss_bpf__destroy(rss_bpf_ctx);
 +    ctx->obj = NULL;
 +
 +    return false;
 +}
 +
 +static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
 +                                struct EBPFRSSConfig *config)
 +{
 +    uint32_t map_key = 0;
 +
 +    if (!ebpf_rss_is_loaded(ctx)) {
 +        return false;
 +    }
 +    if (bpf_map_update_elem(ctx->map_configuration,
 +                            &map_key, config, 0) < 0) {
 +        return false;
 +    }
 +    return true;
 +}
 +
 +static bool ebpf_rss_set_indirections_table(struct EBPFRSSContext *ctx,
 +                                            uint16_t *indirections_table,
 +                                            size_t len)
 +{
 +    uint32_t i = 0;
 +
 +    if (!ebpf_rss_is_loaded(ctx) || indirections_table == NULL ||
 +       len > VIRTIO_NET_RSS_MAX_TABLE_LEN) {
 +        return false;
 +    }
 +
 +    for (; i < len; ++i) {
 +        if (bpf_map_update_elem(ctx->map_indirections_table, &i,
 +                                indirections_table + i, 0) < 0) {
 +            return false;
 +        }
 +    }
 +    return true;
 +}
 +
 +static bool ebpf_rss_set_toepliz_key(struct EBPFRSSContext *ctx,
 +                                     uint8_t *toeplitz_key)
 +{
 +    uint32_t map_key = 0;
 +
 +    /* prepare toeplitz key */
 +    uint8_t toe[VIRTIO_NET_RSS_MAX_KEY_SIZE] = {};
 +
 +    if (!ebpf_rss_is_loaded(ctx) || toeplitz_key == NULL) {
 +        return false;
 +    }
 +    memcpy(toe, toeplitz_key, VIRTIO_NET_RSS_MAX_KEY_SIZE);
 +    *(uint32_t *)toe = ntohl(*(uint32_t *)toe);
 +
 +    if (bpf_map_update_elem(ctx->map_toeplitz_key, &map_key, toe,
 +                            0) < 0) {
 +        return false;
 +    }
 +    return true;
 +}
 +
 +bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
 +                      uint16_t *indirections_table, uint8_t *toeplitz_key)
 +{
 +    if (!ebpf_rss_is_loaded(ctx) || config == NULL ||
 +        indirections_table == NULL || toeplitz_key == NULL) {
 +        return false;
 +    }
 +
 +    if (!ebpf_rss_set_config(ctx, config)) {
 +        return false;
 +    }
 +
 +    if (!ebpf_rss_set_indirections_table(ctx, indirections_table,
 +                                         config->indirections_len)) {
 +        return false;
 +    }
 +
 +    if (!ebpf_rss_set_toepliz_key(ctx, toeplitz_key)) {
 +        return false;
 +    }
 +
 +    return true;
 +}
 +
 +void ebpf_rss_unload(struct EBPFRSSContext *ctx)
 +{
 +    if (!ebpf_rss_is_loaded(ctx)) {
 +        return;
 +    }
 +
 +    rss_bpf__destroy(ctx->obj);
 +    ctx->obj = NULL;
 +}
 diff --git a/ebpf/ebpf_rss.h b/ebpf/ebpf_rss.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/ebpf/ebpf_rss.h
@@ -XXX,XX +XXX,XX @@
 +/*
 + * eBPF RSS header
 + *
 + * Developed by Daynix Computing LTD (http://www.daynix.com)
 + *
 + * Authors:
 + *  Andrew Melnychenko <andrew@daynix.com>
 + *  Yuri Benditovich <yuri.benditovich@daynix.com>
 + *
 + * This work is licensed under the terms of the GNU GPL, version 2.  See
 + * the COPYING file in the top-level directory.
 + */
 +
 +#ifndef QEMU_EBPF_RSS_H
 +#define QEMU_EBPF_RSS_H
 +
 +struct EBPFRSSContext {
 +    void *obj;
 +    int program_fd;
 +    int map_configuration;
 +    int map_toeplitz_key;
 +    int map_indirections_table;
 +};
 +
 +struct EBPFRSSConfig {
 +    uint8_t redirect;
 +    uint8_t populate_hash;
 +    uint32_t hash_types;
 +    uint16_t indirections_len;
 +    uint16_t default_queue;
 +} __attribute__((packed));
 +
 +void ebpf_rss_init(struct EBPFRSSContext *ctx);
 +
 +bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx);
 +
 +bool ebpf_rss_load(struct EBPFRSSContext *ctx);
 +
 +bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
 +                      uint16_t *indirections_table, uint8_t *toeplitz_key);
 +
 +void ebpf_rss_unload(struct EBPFRSSContext *ctx);
 +
 +#endif /* QEMU_EBPF_RSS_H */
 diff --git a/ebpf/meson.build b/ebpf/meson.build
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/ebpf/meson.build
@@ -0,0 +1 @@
 +common_ss.add(when: libbpf, if_true: files('ebpf_rss.c'), if_false: files('ebpf_rss-stub.c'))
 diff --git a/ebpf/rss.bpf.skeleton.h b/ebpf/rss.bpf.skeleton.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/ebpf/rss.bpf.skeleton.h
@@ -XXX,XX +XXX,XX @@
 +/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
 +
 +/* THIS FILE IS AUTOGENERATED! */
 +#ifndef __RSS_BPF_SKEL_H__
 +#define __RSS_BPF_SKEL_H__
 +
 +#include <stdlib.h>
 +#include <bpf/libbpf.h>
 +
 +struct rss_bpf {
 +    struct bpf_object_skeleton *skeleton;
 +    struct bpf_object *obj;
 +    struct {
 +        struct bpf_map *tap_rss_map_configurations;
 +        struct bpf_map *tap_rss_map_indirection_table;
 +        struct bpf_map *tap_rss_map_toeplitz_key;
 +    } maps;
 +    struct {
 +        struct bpf_program *tun_rss_steering_prog;
 +    } progs;
 +    struct {
 +        struct bpf_link *tun_rss_steering_prog;
 +    } links;
 +};
 +
 +static void
 +rss_bpf__destroy(struct rss_bpf *obj)
 +{
 +    if (!obj)
 +        return;
 +    if (obj->skeleton)
 +        bpf_object__destroy_skeleton(obj->skeleton);
 +    free(obj);
 +}
 +
 +static inline int
 +rss_bpf__create_skeleton(struct rss_bpf *obj);
 +
 +static inline struct rss_bpf *
 +rss_bpf__open_opts(const struct bpf_object_open_opts *opts)
 +{
 +    struct rss_bpf *obj;
 +
 +    obj = (struct rss_bpf *)calloc(1, sizeof(*obj));
 +    if (!obj)
 +        return NULL;
 +    if (rss_bpf__create_skeleton(obj))
 +        goto err;
 +    if (bpf_object__open_skeleton(obj->skeleton, opts))
 +        goto err;
 +
 +    return obj;
 +err:
 +    rss_bpf__destroy(obj);
 +    return NULL;
 +}
 +
 +static inline struct rss_bpf *
 +rss_bpf__open(void)
 +{
 +    return rss_bpf__open_opts(NULL);
 +}
 +
 +static inline int
 +rss_bpf__load(struct rss_bpf *obj)
 +{
 +    return bpf_object__load_skeleton(obj->skeleton);
 +}
 +
 +static inline struct rss_bpf *
 +rss_bpf__open_and_load(void)
 +{
 +    struct rss_bpf *obj;
 +
 +    obj = rss_bpf__open();
 +    if (!obj)
 +        return NULL;
 +    if (rss_bpf__load(obj)) {
 +        rss_bpf__destroy(obj);
 +        return NULL;
 +    }
 +    return obj;
 +}
 +
 +static inline int
 +rss_bpf__attach(struct rss_bpf *obj)
 +{
 +    return bpf_object__attach_skeleton(obj->skeleton);
 +}
 +
 +static inline void
 +rss_bpf__detach(struct rss_bpf *obj)
 +{
 +    return bpf_object__detach_skeleton(obj->skeleton);
 +}
 +
 +static inline int
 +rss_bpf__create_skeleton(struct rss_bpf *obj)
 +{
 +    struct bpf_object_skeleton *s;
 +
 +    s = (struct bpf_object_skeleton *)calloc(1, sizeof(*s));
 +    if (!s)
 +        return -1;
 +    obj->skeleton = s;
 +
 +    s->sz = sizeof(*s);
 +    s->name = "rss_bpf";
 +    s->obj = &obj->obj;
 +
 +    /* maps */
 +    s->map_cnt = 3;
 +    s->map_skel_sz = sizeof(*s->maps);
 +    s->maps = (struct bpf_map_skeleton *)calloc(s->map_cnt, s->map_skel_sz);
 +    if (!s->maps)
 +        goto err;
 +
 +    s->maps[0].name = "tap_rss_map_configurations";
 +    s->maps[0].map = &obj->maps.tap_rss_map_configurations;
 +
 +    s->maps[1].name = "tap_rss_map_indirection_table";
 +    s->maps[1].map = &obj->maps.tap_rss_map_indirection_table;
 +
 +    s->maps[2].name = "tap_rss_map_toeplitz_key";
 +    s->maps[2].map = &obj->maps.tap_rss_map_toeplitz_key;
 +
 +    /* programs */
 +    s->prog_cnt = 1;
 +    s->prog_skel_sz = sizeof(*s->progs);
 +    s->progs = (struct bpf_prog_skeleton *)calloc(s->prog_cnt, s->prog_skel_sz);
 +    if (!s->progs)
 +        goto err;
 +
 +    s->progs[0].name = "tun_rss_steering_prog";
 +    s->progs[0].prog = &obj->progs.tun_rss_steering_prog;
 +    s->progs[0].link = &obj->links.tun_rss_steering_prog;
 +
 +    s->data_sz = 8088;
 +    s->data = (void *)"\
 +\x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
 +\0\0\0\0\0\0\0\0\0\0\0\x18\x1d\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0a\0\
 +\x01\0\xbf\x18\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x4c\xff\0\0\0\0\xbf\xa7\
 +\0\0\0\0\0\0\x07\x07\0\0\x4c\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 +\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x06\0\0\0\0\0\0\x18\x01\0\0\0\0\0\
 +\0\0\0\0\0\0\0\0\0\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x07\0\0\0\0\0\0\
 +\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x66\x02\0\0\0\0\xbf\x79\0\0\
 +\0\0\0\0\x15\x09\x64\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
 +\0\x5d\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc0\xff\0\0\0\0\x7b\x1a\xb8\xff\
 +\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\0\x7b\x1a\xa0\xff\0\0\0\
 +\0\x63\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\x1a\x88\xff\0\0\0\0\x7b\
 +\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\x70\xff\0\0\0\0\x7b\x1a\
 +\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\xff\0\0\0\0\x7b\x1a\x50\
 +\xff\0\0\0\0\x15\x08\x4c\x02\0\0\0\0\x6b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
 +\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x81\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
 +\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\
 +\x77\0\0\0\x20\0\0\0\x55\0\x11\0\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xd0\xff\
 +\0\0\0\0\xbf\x13\0\0\0\0\0\0\xdc\x03\0\0\x10\0\0\0\x15\x03\x02\0\0\x81\0\0\x55\
 +\x03\x0c\0\xa8\x88\0\0\xb7\x02\0\0\x14\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\
 +\xd0\xff\xff\xff\xbf\x81\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\0\
 +\x85\0\0\0\x44\0\0\0\x69\xa1\xd0\xff\0\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\
 +\0\0\0\x15\0\x01\0\0\0\0\0\x05\0\x2f\x02\0\0\0\0\x15\x01\x2e\x02\0\0\0\0\x7b\
 +\x9a\x30\xff\0\0\0\0\x15\x01\x57\0\x86\xdd\0\0\x55\x01\x3b\0\x08\0\0\0\x7b\x7a\
 +\x20\xff\0\0\0\0\xb7\x07\0\0\x01\0\0\0\x73\x7a\x50\xff\0\0\0\0\xb7\x01\0\0\0\0\
 +\0\0\x63\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\xff\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\
 +\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x81\0\0\0\0\0\0\xb7\x02\0\
 +\0\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\
 +\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x1a\x02\0\0\0\0\x69\xa1\xd6\xff\0\0\
 +\0\0\x55\x01\x01\0\0\0\0\0\xb7\x07\0\0\0\0\0\0\x61\xa1\xdc\xff\0\0\0\0\x63\x1a\
 +\x5c\xff\0\0\0\0\x61\xa1\xe0\xff\0\0\0\0\x63\x1a\x60\xff\0\0\0\0\x73\x7a\x56\
 +\xff\0\0\0\0\x71\xa9\xd9\xff\0\0\0\0\x71\xa1\xd0\xff\0\0\0\0\x67\x01\0\0\x02\0\
 +\0\0\x57\x01\0\0\x3c\0\0\0\x7b\x1a\x40\xff\0\0\0\0\x79\xa7\x20\xff\0\0\0\0\xbf\
 +\x91\0\0\0\0\0\0\x57\x01\0\0\xff\0\0\0\x15\x01\x19\0\0\0\0\0\x71\xa1\x56\xff\0\
 +\0\0\0\x55\x01\x17\0\0\0\0\0\x57\x09\0\0\xff\0\0\0\x15\x09\x7a\x01\x11\0\0\0\
 +\x55\x09\x14\0\x06\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\x1a\x53\xff\0\0\0\0\xb7\x01\
 +\0\0\0\0\0\0\x63\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\xff\0\0\0\0\x7b\x1a\xd0\xff\0\
 +\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x81\0\0\0\0\0\0\x79\
 +\xa2\x40\xff\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\
 +\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\xf4\x01\0\0\0\0\x69\xa1\
 +\xd0\xff\0\0\0\0\x6b\x1a\x58\xff\0\0\0\0\x69\xa1\xd2\xff\0\0\0\0\x6b\x1a\x5a\
 +\xff\0\0\0\0\x71\xa1\x50\xff\0\0\0\0\x15\x01\xd4\0\0\0\0\0\x71\x62\x03\0\0\0\0\
 +\0\x67\x02\0\0\x08\0\0\0\x71\x61\x02\0\0\0\0\0\x4f\x12\0\0\0\0\0\0\x71\x63\x04\
 +\0\0\0\0\0\x71\x61\x05\0\0\0\0\0\x67\x01\0\0\x08\0\0\0\x4f\x31\0\0\0\0\0\0\x67\
 +\x01\0\0\x10\0\0\0\x4f\x21\0\0\0\0\0\0\x71\xa2\x53\xff\0\0\0\0\x79\xa0\x30\xff\
 +\0\0\0\0\x15\x02\x06\x01\0\0\0\0\xbf\x12\0\0\0\0\0\0\x57\x02\0\0\x02\0\0\0\x15\
 +\x02\x03\x01\0\0\0\0\x61\xa1\x5c\xff\0\0\0\0\x63\x1a\xa0\xff\0\0\0\0\x61\xa1\
 +\x60\xff\0\0\0\0\x63\x1a\xa4\xff\0\0\0\0\x69\xa1\x58\xff\0\0\0\0\x6b\x1a\xa8\
 +\xff\0\0\0\0\x69\xa1\x5a\xff\0\0\0\0\x6b\x1a\xaa\xff\0\0\0\0\x05\0\x65\x01\0\0\
 +\0\0\xb7\x01\0\0\x01\0\0\0\x73\x1a\x51\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x7b\x1a\
 +\xf0\xff\0\0\0\0\x7b\x1a\xe8\xff\0\0\0\0\x7b\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\
 +\xff\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\
 +\xff\xff\xb7\x01\0\0\x28\0\0\0\x7b\x1a\x40\xff\0\0\0\0\xbf\x81\0\0\0\0\0\0\xb7\
 +\x02\0\0\0\0\0\0\xb7\x04\0\0\x28\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\
 +\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x10\x01\0\0\0\0\x79\xa1\xe0\
 +\xff\0\0\0\0\x63\x1a\x64\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x68\xff\0\0\
 +\0\0\x79\xa1\xd8\xff\0\0\0\0\x63\x1a\x5c\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\
 +\x1a\x60\xff\0\0\0\0\x79\xa1\xe8\xff\0\0\0\0\x63\x1a\x6c\xff\0\0\0\0\x77\x01\0\
 +\0\x20\0\0\0\x63\x1a\x70\xff\0\0\0\0\x79\xa1\xf0\xff\0\0\0\0\x63\x1a\x74\xff\0\
 +\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x78\xff\0\0\0\0\x71\xa9\xd6\xff\0\0\0\0\
 +\x25\x09\xff\0\x3c\0\0\0\xb7\x01\0\0\x01\0\0\0\x6f\x91\0\0\0\0\0\0\x18\x02\0\0\
 +\x01\0\0\0\0\0\0\0\0\x18\0\x1c\x5f\x21\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\0\
 +\xf8\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x6b\x1a\xfe\xff\0\0\0\0\xb7\x01\0\0\x28\0\0\
 +\0\x7b\x1a\x40\xff\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\x01\0\0\x8c\xff\xff\xff\x7b\
 +\x1a\x18\xff\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\x01\0\0\x7c\xff\xff\xff\x7b\x1a\
 +\x10\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x7b\x1a\x28\xff\0\0\0\0\x7b\x7a\x20\xff\0\
 +\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xfe\xff\xff\xff\xbf\x81\0\0\0\0\0\0\x79\
 +\xa2\x40\xff\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\
 +\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x15\0\x01\0\0\0\0\0\x05\0\x90\
 +\x01\0\0\0\0\xbf\x91\0\0\0\0\0\0\x15\x01\x23\0\x3c\0\0\0\x15\x01\x59\0\x2c\0\0\
 +\0\x55\x01\x5a\0\x2b\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xf8\xff\0\0\0\0\xbf\xa3\
 +\0\0\0\0\0\0\x07\x03\0\0\xf8\xff\xff\xff\xbf\x81\0\0\0\0\0\0\x79\xa2\x40\xff\0\
 +\0\0\0\xb7\x04\0\0\x04\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\xbf\x01\
 +\0\0\0\0\0\0\x67\x01\0\0\x20\0\0\0\x77\x01\0\0\x20\0\0\0\x55\x01\x03\x01\0\0\0\
 +\0\x71\xa1\xfa\xff\0\0\0\0\x55\x01\x4b\0\x02\0\0\0\x71\xa1\xf9\xff\0\0\0\0\x55\
 +\x01\x49\0\x02\0\0\0\x71\xa1\xfb\xff\0\0\0\0\x55\x01\x47\0\x01\0\0\0\x79\xa2\
 +\x40\xff\0\0\0\0\x07\x02\0\0\x08\0\0\0\xbf\x81\0\0\0\0\0\0\x79\xa3\x18\xff\0\0\
 +\0\0\xb7\x04\0\0\x10\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\xbf\x01\0\
 +\0\0\0\0\0\x67\x01\0\0\x20\0\0\0\x77\x01\0\0\x20\0\0\0\x55\x01\xf2\0\0\0\0\0\
 +\xb7\x01\0\0\x01\0\0\0\x73\x1a\x55\xff\0\0\0\0\x05\0\x39\0\0\0\0\0\xb7\x01\0\0\
 +\0\0\0\0\x6b\x1a\xf8\xff\0\0\0\0\xb7\x09\0\0\x02\0\0\0\xb7\x07\0\0\x1e\0\0\0\
 +\x05\0\x0e\0\0\0\0\0\x79\xa2\x38\xff\0\0\0\0\x0f\x29\0\0\0\0\0\0\xbf\x92\0\0\0\
 +\0\0\0\x07\x02\0\0\x01\0\0\0\x71\xa3\xff\xff\0\0\0\0\x67\x03\0\0\x03\0\0\0\x2d\
 +\x23\x02\0\0\0\0\0\x79\xa7\x20\xff\0\0\0\0\x05\0\x2b\0\0\0\0\0\x07\x07\0\0\xff\
 +\xff\xff\xff\xbf\x72\0\0\0\0\0\0\x67\x02\0\0\x20\0\0\0\x77\x02\0\0\x20\0\0\0\
 +\x15\x02\xf9\xff\0\0\0\0\x7b\x9a\x38\xff\0\0\0\0\x79\xa1\x40\xff\0\0\0\0\x0f\
 +\x19\0\0\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xf8\xff\xff\xff\xbf\x81\0\0\0\
 +\0\0\0\xbf\x92\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\
 +\0\x44\0\0\0\xbf\x01\0\0\0\0\0\0\x67\x01\0\0\x20\0\0\0\x77\x01\0\0\x20\0\0\0\
 +\x55\x01\x94\0\0\0\0\0\x71\xa2\xf8\xff\0\0\0\0\x55\x02\x0f\0\xc9\0\0\0\x07\x09\
 +\0\0\x02\0\0\0\xbf\x81\0\0\0\0\0\0\xbf\x92\0\0\0\0\0\0\x79\xa3\x10\xff\0\0\0\0\
 +\xb7\x04\0\0\x10\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\xbf\x01\0\0\0\
 +\0\0\0\x67\x01\0\0\x20\0\0\0\x77\x01\0\0\x20\0\0\0\x55\x01\x87\0\0\0\0\0\xb7\
 +\x01\0\0\x01\0\0\0\x73\x1a\x54\xff\0\0\0\0\x79\xa7\x20\xff\0\0\0\0\x05\0\x07\0\
 +\0\0\0\0\xb7\x09\0\0\x01\0\0\0\x15\x02\xd1\xff\0\0\0\0\x71\xa9\xf9\xff\0\0\0\0\
 +\x07\x09\0\0\x02\0\0\0\x05\0\xce\xff\0\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\x1a\x56\
 +\xff\0\0\0\0\x71\xa1\xff\xff\0\0\0\0\x67\x01\0\0\x03\0\0\0\x79\xa2\x40\xff\0\0\
 +\0\0\x0f\x12\0\0\0\0\0\0\x07\x02\0\0\x08\0\0\0\x7b\x2a\x40\xff\0\0\0\0\x71\xa9\
 +\xfe\xff\0\0\0\0\x25\x09\x0e\0\x3c\0\0\0\xb7\x01\0\0\x01\0\0\0\x6f\x91\0\0\0\0\
 +\0\0\x18\x02\0\0\x01\0\0\0\0\0\0\0\0\x18\0\x1c\x5f\x21\0\0\0\0\0\0\x55\x01\x01\
 +\0\0\0\0\0\x05\0\x07\0\0\0\0\0\x79\xa1\x28\xff\0\0\0\0\x07\x01\0\0\x01\0\0\0\
 +\x7b\x1a\x28\xff\0\0\0\0\x67\x01\0\0\x20\0\0\0\x77\x01\0\0\x20\0\0\0\x55\x01\
 +\x82\xff\x0b\0\0\0\x05\0\x10\xff\0\0\0\0\x15\x09\xf8\xff\x87\0\0\0\x05\0\xfd\
 +\xff\0\0\0\0\x71\xa1\x51\xff\0\0\0\0\x79\xa0\x30\xff\0\0\0\0\x15\x01\x17\x01\0\
 +\0\0\0\x71\x62\x03\0\0\0\0\0\x67\x02\0\0\x08\0\0\0\x71\x61\x02\0\0\0\0\0\x4f\
 +\x12\0\0\0\0\0\0\x71\x63\x04\0\0\0\0\0\x71\x61\x05\0\0\0\0\0\x67\x01\0\0\x08\0\
 +\0\0\x4f\x31\0\0\0\0\0\0\x67\x01\0\0\x10\0\0\0\x4f\x21\0\0\0\0\0\0\x71\xa2\x53\
 +\xff\0\0\0\0\x15\x02\x3d\0\0\0\0\0\xbf\x12\0\0\0\0\0\0\x57\x02\0\0\x10\0\0\0\
 +\x15\x02\x3a\0\0\0\0\0\xbf\xa2\0\0\0\0\0\0\x07\x02\0\0\x5c\xff\xff\xff\x71\xa4\
 +\x54\xff\0\0\0\0\xbf\x23\0\0\0\0\0\0\x15\x04\x02\0\0\0\0\0\xbf\xa3\0\0\0\0\0\0\
 +\x07\x03\0\0\x7c\xff\xff\xff\x67\x01\0\0\x38\0\0\0\xc7\x01\0\0\x38\0\0\0\x65\
 +\x01\x01\0\xff\xff\xff\xff\xbf\x32\0\0\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\
 +\x6c\xff\xff\xff\x71\xa5\x55\xff\0\0\0\0\xbf\x34\0\0\0\0\0\0\x15\x05\x02\0\0\0\
 +\0\0\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\x8c\xff\xff\xff\x65\x01\x01\0\xff\xff\xff\
 +\xff\xbf\x43\0\0\0\0\0\0\x61\x21\x04\0\0\0\0\0\x67\x01\0\0\x20\0\0\0\x61\x24\0\
 +\0\0\0\0\0\x4f\x41\0\0\0\0\0\0\x7b\x1a\xa0\xff\0\0\0\0\x61\x21\x08\0\0\0\0\0\
 +\x61\x22\x0c\0\0\0\0\0\x67\x02\0\0\x20\0\0\0\x4f\x12\0\0\0\0\0\0\x7b\x2a\xa8\
 +\xff\0\0\0\0\x61\x31\0\0\0\0\0\0\x61\x32\x04\0\0\0\0\0\x61\x34\x08\0\0\0\0\0\
 +\x61\x33\x0c\0\0\0\0\0\x69\xa5\x5a\xff\0\0\0\0\x6b\x5a\xc2\xff\0\0\0\0\x69\xa5\
 +\x58\xff\0\0\0\0\x6b\x5a\xc0\xff\0\0\0\0\x67\x03\0\0\x20\0\0\0\x4f\x43\0\0\0\0\
 +\0\0\x7b\x3a\xb8\xff\0\0\0\0\x67\x02\0\0\x20\0\0\0\x4f\x12\0\0\0\0\0\0\x7b\x2a\
 +\xb0\xff\0\0\0\0\x05\0\x6b\0\0\0\0\0\x71\xa2\x52\xff\0\0\0\0\x15\x02\x04\0\0\0\
 +\0\0\xbf\x12\0\0\0\0\0\0\x57\x02\0\0\x04\0\0\0\x15\x02\x01\0\0\0\0\0\x05\0\xf7\
 +\xfe\0\0\0\0\x57\x01\0\0\x01\0\0\0\x15\x01\xd3\0\0\0\0\0\x61\xa1\x5c\xff\0\0\0\
 +\0\x63\x1a\xa0\xff\0\0\0\0\x61\xa1\x60\xff\0\0\0\0\x63\x1a\xa4\xff\0\0\0\0\x05\
 +\0\x5e\0\0\0\0\0\x71\xa2\x52\xff\0\0\0\0\x15\x02\x1e\0\0\0\0\0\xbf\x12\0\0\0\0\
 +\0\0\x57\x02\0\0\x20\0\0\0\x15\x02\x1b\0\0\0\0\0\xbf\xa2\0\0\0\0\0\0\x07\x02\0\
 +\0\x5c\xff\xff\xff\x71\xa4\x54\xff\0\0\0\0\xbf\x23\0\0\0\0\0\0\x15\x04\x02\0\0\
 +\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\x7c\xff\xff\xff\x57\x01\0\0\0\x01\0\0\
 +\x15\x01\x01\0\0\0\0\0\xbf\x32\0\0\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\x6c\
 +\xff\xff\xff\x71\xa5\x55\xff\0\0\0\0\xbf\x34\0\0\0\0\0\0\x15\x05\x02\0\0\0\0\0\
 +\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\x8c\xff\xff\xff\x15\x01\xc3\xff\0\0\0\0\x05\0\
 +\xc1\xff\0\0\0\0\xb7\x09\0\0\x3c\0\0\0\x79\xa7\x20\xff\0\0\0\0\x67\0\0\0\x20\0\
 +\0\0\x77\0\0\0\x20\0\0\0\x15\0\xa5\xfe\0\0\0\0\x05\0\xb0\0\0\0\0\0\x15\x09\x07\
 +\xff\x87\0\0\0\x05\0\xa2\xfe\0\0\0\0\xbf\x12\0\0\0\0\0\0\x57\x02\0\0\x08\0\0\0\
 +\x15\x02\xab\0\0\0\0\0\xbf\xa2\0\0\0\0\0\0\x07\x02\0\0\x5c\xff\xff\xff\x71\xa4\
 +\x54\xff\0\0\0\0\xbf\x23\0\0\0\0\0\0\x15\x04\x02\0\0\0\0\0\xbf\xa3\0\0\0\0\0\0\
 +\x07\x03\0\0\x7c\xff\xff\xff\x57\x01\0\0\x40\0\0\0\x15\x01\x01\0\0\0\0\0\xbf\
 +\x32\0\0\0\0\0\0\x61\x23\x04\0\0\0\0\0\x67\x03\0\0\x20\0\0\0\x61\x24\0\0\0\0\0\
 +\0\x4f\x43\0\0\0\0\0\0\x7b\x3a\xa0\xff\0\0\0\0\x61\x23\x08\0\0\0\0\0\x61\x22\
 +\x0c\0\0\0\0\0\x67\x02\0\0\x20\0\0\0\x4f\x32\0\0\0\0\0\0\x7b\x2a\xa8\xff\0\0\0\
 +\0\x15\x01\x1c\0\0\0\0\0\x71\xa1\x55\xff\0\0\0\0\x15\x01\x1a\0\0\0\0\0\x61\xa1\
 +\x98\xff\0\0\0\0\x67\x01\0\0\x20\0\0\0\x61\xa2\x94\xff\0\0\0\0\x4f\x21\0\0\0\0\
 +\0\0\x7b\x1a\xb8\xff\0\0\0\0\x61\xa1\x90\xff\0\0\0\0\x67\x01\0\0\x20\0\0\0\x61\
 +\xa2\x8c\xff\0\0\0\0\x05\0\x19\0\0\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\x1a\x52\xff\
 +\0\0\0\0\xb7\x01\0\0\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\
 +\x03\0\0\xd0\xff\xff\xff\xbf\x81\0\0\0\0\0\0\x79\xa2\x40\xff\0\0\0\0\xb7\x04\0\
 +\0\x08\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\
 +\0\0\0\x20\0\0\0\x55\0\x7d\0\0\0\0\0\x05\0\x88\xfe\0\0\0\0\xb7\x09\0\0\x2b\0\0\
 +\0\x05\0\xc6\xff\0\0\0\0\x61\xa1\x78\xff\0\0\0\0\x67\x01\0\0\x20\0\0\0\x61\xa2\
 +\x74\xff\0\0\0\0\x4f\x21\0\0\0\0\0\0\x7b\x1a\xb8\xff\0\0\0\0\x61\xa1\x70\xff\0\
 +\0\0\0\x67\x01\0\0\x20\0\0\0\x61\xa2\x6c\xff\0\0\0\0\x4f\x21\0\0\0\0\0\0\x7b\
 +\x1a\xb0\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x07\x07\0\0\x04\0\0\0\x61\x03\0\0\0\0\
 +\0\0\xb7\x05\0\0\0\0\0\0\x05\0\x4e\0\0\0\0\0\xaf\x52\0\0\0\0\0\0\xbf\x75\0\0\0\
 +\0\0\0\x0f\x15\0\0\0\0\0\0\x71\x55\0\0\0\0\0\0\x67\x03\0\0\x01\0\0\0\xbf\x50\0\
 +\0\0\0\0\0\x77\0\0\0\x07\0\0\0\x4f\x03\0\0\0\0\0\0\xbf\x40\0\0\0\0\0\0\x67\0\0\
 +\0\x39\0\0\0\xc7\0\0\0\x3f\0\0\0\x5f\x30\0\0\0\0\0\0\xaf\x02\0\0\0\0\0\0\xbf\
 +\x50\0\0\0\0\0\0\x77\0\0\0\x06\0\0\0\x57\0\0\0\x01\0\0\0\x67\x03\0\0\x01\0\0\0\
 +\x4f\x03\0\0\0\0\0\0\xbf\x40\0\0\0\0\0\0\x67\0\0\0\x3a\0\0\0\xc7\0\0\0\x3f\0\0\
 +\0\x5f\x30\0\0\0\0\0\0\xaf\x02\0\0\0\0\0\0\x67\x03\0\0\x01\0\0\0\xbf\x50\0\0\0\
 +\0\0\0\x77\0\0\0\x05\0\0\0\x57\0\0\0\x01\0\0\0\x4f\x03\0\0\0\0\0\0\xbf\x40\0\0\
 +\0\0\0\0\x67\0\0\0\x3b\0\0\0\xc7\0\0\0\x3f\0\0\0\x5f\x30\0\0\0\0\0\0\xaf\x02\0\
 +\0\0\0\0\0\x67\x03\0\0\x01\0\0\0\xbf\x50\0\0\0\0\0\0\x77\0\0\0\x04\0\0\0\x57\0\
 +\0\0\x01\0\0\0\x4f\x03\0\0\0\0\0\0\xbf\x40\0\0\0\0\0\0\x67\0\0\0\x3c\0\0\0\xc7\
 +\0\0\0\x3f\0\0\0\x5f\x30\0\0\0\0\0\0\xaf\x02\0\0\0\0\0\0\xbf\x50\0\0\0\0\0\0\
 +\x77\0\0\0\x03\0\0\0\x57\0\0\0\x01\0\0\0\x67\x03\0\0\x01\0\0\0\x4f\x03\0\0\0\0\
 +\0\0\xbf\x40\0\0\0\0\0\0\x67\0\0\0\x3d\0\0\0\xc7\0\0\0\x3f\0\0\0\x5f\x30\0\0\0\
 +\0\0\0\xaf\x02\0\0\0\0\0\0\xbf\x50\0\0\0\0\0\0\x77\0\0\0\x02\0\0\0\x57\0\0\0\
 +\x01\0\0\0\x67\x03\0\0\x01\0\0\0\x4f\x03\0\0\0\0\0\0\xbf\x40\0\0\0\0\0\0\x67\0\
 +\0\0\x3e\0\0\0\xc7\0\0\0\x3f\0\0\0\x5f\x30\0\0\0\0\0\0\xaf\x02\0\0\0\0\0\0\xbf\
 +\x50\0\0\0\0\0\0\x77\0\0\0\x01\0\0\0\x57\0\0\0\x01\0\0\0\x67\x03\0\0\x01\0\0\0\
 +\x4f\x03\0\0\0\0\0\0\x57\x04\0\0\x01\0\0\0\x87\x04\0\0\0\0\0\0\x5f\x34\0\0\0\0\
 +\0\0\xaf\x42\0\0\0\0\0\0\x57\x05\0\0\x01\0\0\0\x67\x03\0\0\x01\0\0\0\x4f\x53\0\
 +\0\0\0\0\0\x07\x01\0\0\x01\0\0\0\xbf\x25\0\0\0\0\0\0\x15\x01\x0b\0\x24\0\0\0\
 +\xbf\xa2\0\0\0\0\0\0\x07\x02\0\0\xa0\xff\xff\xff\x0f\x12\0\0\0\0\0\0\x71\x24\0\
 +\0\0\0\0\0\xbf\x40\0\0\0\0\0\0\x67\0\0\0\x38\0\0\0\xc7\0\0\0\x38\0\0\0\xb7\x02\
 +\0\0\0\0\0\0\x65\0\xa9\xff\xff\xff\xff\xff\xbf\x32\0\0\0\0\0\0\x05\0\xa7\xff\0\
 +\0\0\0\xbf\x21\0\0\0\0\0\0\x67\x01\0\0\x20\0\0\0\x77\x01\0\0\x20\0\0\0\x15\x01\
 +\x0e\0\0\0\0\0\x71\x63\x06\0\0\0\0\0\x71\x64\x07\0\0\0\0\0\x67\x04\0\0\x08\0\0\
 +\0\x4f\x34\0\0\0\0\0\0\x3f\x41\0\0\0\0\0\0\x2f\x41\0\0\0\0\0\0\x1f\x12\0\0\0\0\
 +\0\0\x63\x2a\x50\xff\0\0\0\0\xbf\xa2\0\0\0\0\0\0\x07\x02\0\0\x50\xff\xff\xff\
 +\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\x55\0\x05\0\0\0\0\0\
 +\x71\x61\x08\0\0\0\0\0\x71\x60\x09\0\0\0\0\0\x67\0\0\0\x08\0\0\0\x4f\x10\0\0\0\
 +\0\0\0\x95\0\0\0\0\0\0\0\x69\0\0\0\0\0\0\0\x05\0\xfd\xff\0\0\0\0\x02\0\0\0\x04\
 +\0\0\0\x0a\0\0\0\x01\0\0\0\0\0\0\0\x02\0\0\0\x04\0\0\0\x28\0\0\0\x01\0\0\0\0\0\
 +\0\0\x02\0\0\0\x04\0\0\0\x02\0\0\0\x80\0\0\0\0\0\0\0\x47\x50\x4c\x20\x76\x32\0\
 +\0\0\0\0\0\x10\0\0\0\0\0\0\0\x01\x7a\x52\0\x08\x7c\x0b\x01\x0c\0\0\0\x18\0\0\0\
 +\x18\0\0\0\0\0\0\0\0\0\0\0\xd8\x13\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 +\0\0\0\0\0\0\0\0\0\0\0\0\xa0\0\0\0\x04\0\xf1\xff\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 +\0\x60\x02\0\0\0\0\x03\0\x20\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x3f\x02\0\0\0\0\
 +\x03\0\xd0\x0f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xed\x01\0\0\0\0\x03\0\x10\x10\0\0\0\
 +\0\0\0\0\0\0\0\0\0\0\0\xd4\x01\0\0\0\0\x03\0\x20\x10\0\0\0\0\0\0\0\0\0\0\0\0\0\
 +\0\xa3\x01\0\0\0\0\x03\0\xb8\x12\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x63\x01\0\0\0\0\
 +\x03\0\x48\x10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x2a\x01\0\0\0\0\x03\0\x10\x13\0\0\0\
 +\0\0\0\0\0\0\0\0\0\0\0\xe1\0\0\0\0\0\x03\0\xa0\x13\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 +\x2e\x02\0\0\0\0\x03\0\x28\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x68\x02\0\0\0\0\x03\
 +\0\xc0\x13\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x36\x02\0\0\0\0\x03\0\xc8\x13\0\0\0\0\0\
 +\0\0\0\0\0\0\0\0\0\x22\x01\0\0\0\0\x03\0\xe8\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 +\x02\x01\0\0\0\0\x03\0\x40\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xd9\0\0\0\0\0\x03\0\
 +\xf8\x04\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x26\x02\0\0\0\0\x03\0\x20\x0e\0\0\0\0\0\0\
 +\0\0\0\0\0\0\0\0\xcc\x01\0\0\0\0\x03\0\x60\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x9b\
 +\x01\0\0\0\0\x03\0\xc8\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x5b\x01\0\0\0\0\x03\0\
 +\x20\x07\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x7c\x01\0\0\0\0\x03\0\x48\x08\0\0\0\0\0\0\
 +\0\0\0\0\0\0\0\0\x53\x01\0\0\0\0\x03\0\xb8\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x1a\
 +\x01\0\0\0\0\x03\0\xe0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x84\x01\0\0\0\0\x03\0\
 +\xb8\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x1e\x02\0\0\0\0\x03\0\xd8\x09\0\0\0\0\0\0\0\
 +\0\0\0\0\0\0\0\xc4\x01\0\0\0\0\x03\0\x70\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x93\
 +\x01\0\0\0\0\x03\0\xa8\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x74\x01\0\0\0\0\x03\0\
 +\xf0\x0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x4b\x01\0\0\0\0\x03\0\0\x0a\0\0\0\0\0\0\0\
 +\0\0\0\0\0\0\0\x12\x01\0\0\0\0\x03\0\x10\x0a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xfa\0\
 +\0\0\0\0\x03\0\xc0\x0a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x58\x02\0\0\0\0\x03\0\x88\
 +\x0a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x16\x02\0\0\0\0\x03\0\xb8\x0a\0\0\0\0\0\0\0\0\
 +\0\0\0\0\0\0\xe5\x01\0\0\0\0\x03\0\xc0\x0f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xbc\x01\
 +\0\0\0\0\x03\0\0\x0e\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x8b\x01\0\0\0\0\x03\0\x18\x0e\
 +\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xd1\0\0\0\0\0\x03\0\0\x04\0\0\0\0\0\0\0\0\0\0\0\0\
 +\0\0\x50\x02\0\0\0\0\x03\0\x20\x04\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x0e\x02\0\0\0\0\
 +\x03\0\x48\x0f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x6c\x01\0\0\0\0\x03\0\xb0\x04\0\0\0\
 +\0\0\0\0\0\0\0\0\0\0\0\x43\x01\0\0\0\0\x03\0\xc8\x0c\0\0\0\0\0\0\0\0\0\0\0\0\0\
 +\0\xc9\0\0\0\0\0\x03\0\xf8\x0c\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x06\x02\0\0\0\0\x03\
 +\0\xd0\x0a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x3b\x01\0\0\0\0\x03\0\x98\x0b\0\0\0\0\0\
 +\0\0\0\0\0\0\0\0\0\xf2\0\0\0\0\0\x03\0\xb8\x0b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x48\
 +\x02\0\0\0\0\x03\0\xf0\x0b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xfe\x01\0\0\0\0\x03\0\
 +\xf8\x0b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xdd\x01\0\0\0\0\x03\0\0\x0c\0\0\0\0\0\0\0\
 +\0\0\0\0\0\0\0\xb4\x01\0\0\0\0\x03\0\x30\x0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x0a\
 +\x01\0\0\0\0\x03\0\x90\x0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xc1\0\0\0\0\0\x03\0\xa8\
 +\x0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xba\0\0\0\0\0\x03\0\xd0\x01\0\0\0\0\0\0\0\0\0\
 +\0\0\0\0\0\xf6\x01\0\0\0\0\x03\0\xe0\x0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xac\x01\0\
 +\0\0\0\x03\0\x30\x0e\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x33\x01\0\0\0\0\x03\0\x80\x0e\
 +\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xea\0\0\0\0\0\x03\0\x98\x0e\0\0\0\0\0\0\0\0\0\0\0\
 +\0\0\0\0\0\0\0\x03\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x6b\0\0\0\x11\0\x06\
 +\0\0\0\0\0\0\0\0\0\x07\0\0\0\0\0\0\0\x25\0\0\0\x11\0\x05\0\0\0\0\0\0\0\0\0\x14\
 +\0\0\0\0\0\0\0\x82\0\0\0\x11\0\x05\0\x28\0\0\0\0\0\0\0\x14\0\0\0\0\0\0\0\x01\0\
 +\0\0\x11\0\x05\0\x14\0\0\0\0\0\0\0\x14\0\0\0\0\0\0\0\x40\0\0\0\x12\0\x03\0\0\0\
 +\0\0\0\0\0\0\xd8\x13\0\0\0\0\0\0\x28\0\0\0\0\0\0\0\x01\0\0\0\x3a\0\0\0\x50\0\0\
 +\0\0\0\0\0\x01\0\0\0\x3c\0\0\0\x80\x13\0\0\0\0\0\0\x01\0\0\0\x3b\0\0\0\x1c\0\0\
 +\0\0\0\0\0\x01\0\0\0\x38\0\0\0\0\x74\x61\x70\x5f\x72\x73\x73\x5f\x6d\x61\x70\
 +\x5f\x74\x6f\x65\x70\x6c\x69\x74\x7a\x5f\x6b\x65\x79\0\x2e\x74\x65\x78\x74\0\
 +\x6d\x61\x70\x73\0\x74\x61\x70\x5f\x72\x73\x73\x5f\x6d\x61\x70\x5f\x63\x6f\x6e\
 +\x66\x69\x67\x75\x72\x61\x74\x69\x6f\x6e\x73\0\x74\x75\x6e\x5f\x72\x73\x73\x5f\
 +\x73\x74\x65\x65\x72\x69\x6e\x67\x5f\x70\x72\x6f\x67\0\x2e\x72\x65\x6c\x74\x75\
 +\x6e\x5f\x72\x73\x73\x5f\x73\x74\x65\x65\x72\x69\x6e\x67\0\x5f\x6c\x69\x63\x65\
 +\x6e\x73\x65\0\x2e\x72\x65\x6c\x2e\x65\x68\x5f\x66\x72\x61\x6d\x65\0\x74\x61\
 +\x70\x5f\x72\x73\x73\x5f\x6d\x61\x70\x5f\x69\x6e\x64\x69\x72\x65\x63\x74\x69\
 +\x6f\x6e\x5f\x74\x61\x62\x6c\x65\0\x72\x73\x73\x2e\x62\x70\x66\x2e\x63\0\x2e\
 +\x73\x74\x72\x74\x61\x62\0\x2e\x73\x79\x6d\x74\x61\x62\0\x4c\x42\x42\x30\x5f\
 +\x39\0\x4c\x42\x42\x30\x5f\x38\x39\0\x4c\x42\x42\x30\x5f\x36\x39\0\x4c\x42\x42\
 +\x30\x5f\x35\x39\0\x4c\x42\x42\x30\x5f\x31\x39\0\x4c\x42\x42\x30\x5f\x31\x30\
 +\x39\0\x4c\x42\x42\x30\x5f\x39\x38\0\x4c\x42\x42\x30\x5f\x37\x38\0\x4c\x42\x42\
 +\x30\x5f\x34\x38\0\x4c\x42\x42\x30\x5f\x31\x38\0\x4c\x42\x42\x30\x5f\x38\x37\0\
 +\x4c\x42\x42\x30\x5f\x34\x37\0\x4c\x42\x42\x30\x5f\x33\x37\0\x4c\x42\x42\x30\
 +\x5f\x31\x37\0\x4c\x42\x42\x30\x5f\x31\x30\x37\0\x4c\x42\x42\x30\x5f\x39\x36\0\
 +\x4c\x42\x42\x30\x5f\x37\x36\0\x4c\x42\x42\x30\x5f\x36\x36\0\x4c\x42\x42\x30\
 +\x5f\x34\x36\0\x4c\x42\x42\x30\x5f\x33\x36\0\x4c\x42\x42\x30\x5f\x32\x36\0\x4c\
 +\x42\x42\x30\x5f\x31\x30\x36\0\x4c\x42\x42\x30\x5f\x36\x35\0\x4c\x42\x42\x30\
 +\x5f\x34\x35\0\x4c\x42\x42\x30\x5f\x33\x35\0\x4c\x42\x42\x30\x5f\x34\0\x4c\x42\
 +\x42\x30\x5f\x35\x34\0\x4c\x42\x42\x30\x5f\x34\x34\0\x4c\x42\x42\x30\x5f\x32\
 +\x34\0\x4c\x42\x42\x30\x5f\x31\x30\x34\0\x4c\x42\x42\x30\x5f\x39\x33\0\x4c\x42\
 +\x42\x30\x5f\x38\x33\0\x4c\x42\x42\x30\x5f\x35\x33\0\x4c\x42\x42\x30\x5f\x34\
 +\x33\0\x4c\x42\x42\x30\x5f\x32\x33\0\x4c\x42\x42\x30\x5f\x31\x30\x33\0\x4c\x42\
 +\x42\x30\x5f\x38\x32\0\x4c\x42\x42\x30\x5f\x35\x32\0\x4c\x42\x42\x30\x5f\x31\
 +\x30\x32\0\x4c\x42\x42\x30\x5f\x39\x31\0\x4c\x42\x42\x30\x5f\x38\x31\0\x4c\x42\
 +\x42\x30\x5f\x37\x31\0\x4c\x42\x42\x30\x5f\x36\x31\0\x4c\x42\x42\x30\x5f\x35\
 +\x31\0\x4c\x42\x42\x30\x5f\x34\x31\0\x4c\x42\x42\x30\x5f\x32\x31\0\x4c\x42\x42\
 +\x30\x5f\x31\x31\0\x4c\x42\x42\x30\x5f\x31\x31\x31\0\x4c\x42\x42\x30\x5f\x31\
 +\x30\x31\0\x4c\x42\x42\x30\x5f\x38\x30\0\x4c\x42\x42\x30\x5f\x36\x30\0\x4c\x42\
 +\x42\x30\x5f\x35\x30\0\x4c\x42\x42\x30\x5f\x31\x30\0\x4c\x42\x42\x30\x5f\x31\
 +\x31\x30\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 +\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xaa\
 +\0\0\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xa0\x1a\0\0\0\0\0\0\x71\x02\0\
 +\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x1a\0\0\0\x01\0\0\
 +\0\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 +\0\0\0\0\x04\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x5a\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\
 +\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\0\0\xd8\x13\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x08\0\
 +\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x56\0\0\0\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
 +\0\x60\x1a\0\0\0\0\0\0\x30\0\0\0\0\0\0\0\x09\0\0\0\x03\0\0\0\x08\0\0\0\0\0\0\0\
 +\x10\0\0\0\0\0\0\0\x20\0\0\0\x01\0\0\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x18\
 +\x14\0\0\0\0\0\0\x3c\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x04\0\0\0\0\0\0\0\0\0\0\0\0\
 +\0\0\0\x6c\0\0\0\x01\0\0\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x54\x14\0\0\0\0\0\
 +\0\x07\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x78\0\0\
 +\0\x01\0\0\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x60\x14\0\0\0\0\0\0\x30\0\0\0\0\
 +\0\0\0\0\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x74\0\0\0\x09\0\0\0\0\
 +\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x90\x1a\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x09\0\0\0\
 +\x07\0\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\xb2\0\0\0\x02\0\0\0\0\0\0\0\0\0\
 +\0\0\0\0\0\0\0\0\0\0\x90\x14\0\0\0\0\0\0\xd0\x05\0\0\0\0\0\0\x01\0\0\0\x39\0\0\
 +\0\x08\0\0\0\0\0\0\0\x18\0\0\0\0\0\0\0";
 +
 +    return 0;
 +err:
 +    bpf_object__destroy_skeleton(s);
 +    return -1;
 +}
 +
 +#endif /* __RSS_BPF_SKEL_H__ */
 diff --git a/ebpf/trace-events b/ebpf/trace-events
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/ebpf/trace-events
@@ -XXX,XX +XXX,XX @@
 +# See docs/devel/tracing.rst for syntax documentation.
 +
 +# ebpf_rss.c
 +ebpf_error(const char *s1, const char *s2) "error in %s: %s"
 diff --git a/ebpf/trace.h b/ebpf/trace.h
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/ebpf/trace.h
@@ -0,0 +1 @@
 +#include "trace/trace-ebpf.h"
 diff --git a/meson.build b/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/meson.build
 +++ b/meson.build
@@ -XXX,XX +XXX,XX @@ if not get_option('fuse_lseek').disabled()
    endif
  endif
 +# libbpf
 +libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
 +if libbpf.found() and not cc.links('''
 +   #include <bpf/libbpf.h>
 +   int main(void)
 +   {
 +     bpf_object__destroy_skeleton(NULL);
 +     return 0;
 +   }''', dependencies: libbpf)
 +  libbpf = not_found
 +  if get_option('bpf').enabled()
 +    error('libbpf skeleton test failed')
 +  else
 +    warning('libbpf skeleton test failed, disabling')
 +  endif
 +endif
 +
  if get_option('cfi')
    cfi_flags=[]
    # Check for dependency on LTO
@@ -XXX,XX +XXX,XX @@ endif
  config_host_data.set('CONFIG_GTK', gtk.found())
  config_host_data.set('CONFIG_LIBATTR', have_old_libattr)
  config_host_data.set('CONFIG_LIBCAP_NG', libcap_ng.found())
 +config_host_data.set('CONFIG_EBPF', libbpf.found())
  config_host_data.set('CONFIG_LIBISCSI', libiscsi.found())
  config_host_data.set('CONFIG_LIBNFS', libnfs.found())
  config_host_data.set('CONFIG_RBD', rbd.found())
@@ -XXX,XX +XXX,XX @@ if have_system
      'backends',
      'backends/tpm',
      'chardev',
 +    'ebpf',
      'hw/9pfs',
      'hw/acpi',
      'hw/adc',
@@ -XXX,XX +XXX,XX @@ subdir('accel')
  subdir('plugins')
  subdir('bsd-user')
  subdir('linux-user')
 +subdir('ebpf')
 +
 +common_ss.add(libbpf)
  bsd_user_ss.add(files('gdbstub.c'))
  specific_ss.add_all(when: 'CONFIG_BSD_USER', if_true: bsd_user_ss)
@@ -XXX,XX +XXX,XX @@ summary_info += {'RDMA support':      config_host.has_key('CONFIG_RDMA')}
  summary_info += {'PVRDMA support':    config_host.has_key('CONFIG_PVRDMA')}
  summary_info += {'fdt support':       fdt_opt == 'disabled' ? false : fdt_opt}
  summary_info += {'libcap-ng support': libcap_ng.found()}
 +summary_info += {'bpf support': libbpf.found()}
  # TODO: add back protocol and server version
  summary_info += {'spice support':     config_host.has_key('CONFIG_SPICE')}
  summary_info += {'rbd support':       rbd.found()}
 diff --git a/meson_options.txt b/meson_options.txt
 index XXXXXXX..XXXXXXX 100644
 --- a/meson_options.txt
 +++ b/meson_options.txt
@@ -XXX,XX +XXX,XX @@ option('bzip2', type : 'feature', value : 'auto',
         description: 'bzip2 support for DMG images')
  option('cap_ng', type : 'feature', value : 'auto',
         description: 'cap_ng support')
 +option('bpf', type : 'feature', value : 'auto',
 +        description: 'eBPF support')
  option('cocoa', type : 'feature', value : 'auto',
         description: 'Cocoa user interface (macOS only)')
  option('curl', type : 'feature', value : 'auto',
 --
-.17.1
+.7.4

-[Qemu-devel] [PULL 03/25] colo-compare: use notifier to notify packets comparing result
+Deleted patch
-From: Zhang Chen <zhangckid@gmail.com>
-Use a notifier to inform the COLO framework when packet comparison
-finds an inconsistency.
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- net/colo-compare.c | 37 ++++++++++++++++++++++++++-----------
- net/colo-compare.h |  2 ++
-files changed, 28 insertions(+), 11 deletions(-)
-diff --git a/net/colo-compare.c b/net/colo-compare.c
-index XXXXXXX..XXXXXXX 100644
---- a/net/colo-compare.c
-+++ b/net/colo-compare.c
-@@ -XXX,XX +XXX,XX @@
- #include "sysemu/iothread.h"
- #include "net/colo-compare.h"
- #include "migration/colo.h"
-+#include "migration/migration.h"
- #define TYPE_COLO_COMPARE "colo-compare"
- #define COLO_COMPARE(obj) \
-@@ -XXX,XX +XXX,XX @@
- static QTAILQ_HEAD(, CompareState) net_compares =
-        QTAILQ_HEAD_INITIALIZER(net_compares);
-+static NotifierList colo_compare_notifiers =
-+    NOTIFIER_LIST_INITIALIZER(colo_compare_notifiers);
-+
- #define COMPARE_READ_LEN_MAX NET_BUFSIZE
- #define MAX_QUEUE_SIZE 1024
-@@ -XXX,XX +XXX,XX @@ static bool colo_mark_tcp_pkt(Packet *ppkt, Packet *spkt,
-     return false;
- }
-+static void colo_compare_inconsistency_notify(void)
-+{
-+    notifier_list_notify(&colo_compare_notifiers,
-+                migrate_get_current());
-+}
-+
- static void colo_compare_tcp(CompareState *s, Connection *conn)
- {
-     Packet *ppkt = NULL, *spkt = NULL;
-@@ -XXX,XX +XXX,XX @@ sec:
-         qemu_hexdump((char *)spkt->data, stderr,
-                      "colo-compare spkt", spkt->size);
--        /*
--         * colo_compare_inconsistent_notify();
--         * TODO: notice to checkpoint();
--         */
-+        colo_compare_inconsistency_notify();
-     }
- }
-@@ -XXX,XX +XXX,XX @@ static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
-     }
- }
-+void colo_compare_register_notifier(Notifier *notify)
-+{
-+    notifier_list_add(&colo_compare_notifiers, notify);
-+}
-+
-+void colo_compare_unregister_notifier(Notifier *notify)
-+{
-+    notifier_remove(notify);
-+}
-+
- static int colo_old_packet_check_one_conn(Connection *conn,
--                                          void *user_data)
-+                                           void *user_data)
- {
-     GList *result = NULL;
-     int64_t check_time = REGULAR_PACKET_CHECK_MS;
-@@ -XXX,XX +XXX,XX @@ static int colo_old_packet_check_one_conn(Connection *conn,
-     if (result) {
-         /* Do checkpoint will flush old packet */
--        /*
--         * TODO: Notify colo frame to do checkpoint.
--         * colo_compare_inconsistent_notify();
--         */
-+        colo_compare_inconsistency_notify();
-         return 0;
-     }
-@@ -XXX,XX +XXX,XX @@ static void colo_compare_packet(CompareState *s, Connection *conn,
-             /*
-              * If one packet arrive late, the secondary_list or
-              * primary_list will be empty, so we can't compare it
--             * until next comparison.
-+             * until next comparison. If the packets in the list are
-+             * timeout, it will trigger a checkpoint request.
-              */
-             trace_colo_compare_main("packet different");
-             g_queue_push_head(&conn->primary_list, pkt);
--            /* TODO: colo_notify_checkpoint();*/
-+            colo_compare_inconsistency_notify();
-             break;
-         }
-     }
-diff --git a/net/colo-compare.h b/net/colo-compare.h
-index XXXXXXX..XXXXXXX 100644
---- a/net/colo-compare.h
-+++ b/net/colo-compare.h
-@@ -XXX,XX +XXX,XX @@
- #define QEMU_COLO_COMPARE_H
- void colo_notify_compares_event(void *opaque, int event, Error **errp);
-+void colo_compare_register_notifier(Notifier *notify);
-+void colo_compare_unregister_notifier(Notifier *notify);
- #endif /* QEMU_COLO_COMPARE_H */
---
-.17.1
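
The register/notify pattern used by the colo-compare patch above can be sketched in isolation. This is a minimal illustration, not QEMU's Notifier implementation: every Demo* type and demo_* function is a hypothetical stand-in, and the list is a plain singly linked list rather than QEMU's QLIST.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a notifier and a notifier list (not QEMU's API). */
typedef struct DemoNotifier DemoNotifier;
struct DemoNotifier {
    void (*notify)(DemoNotifier *n, void *data);
    DemoNotifier *next;
};

typedef struct {
    DemoNotifier *head;
} DemoNotifierList;

/* Register a subscriber at the head of the list. */
static void demo_notifier_list_add(DemoNotifierList *list, DemoNotifier *n)
{
    n->next = list->head;
    list->head = n;
}

/* Unregister a subscriber; does nothing if it is not on the list. */
static void demo_notifier_list_remove(DemoNotifierList *list, DemoNotifier *n)
{
    DemoNotifier **pp = &list->head;

    while (*pp) {
        if (*pp == n) {
            *pp = n->next;
            n->next = NULL;
            return;
        }
        pp = &(*pp)->next;
    }
}

/* Call every registered subscriber with the same payload. */
static void demo_notifier_list_notify(DemoNotifierList *list, void *data)
{
    DemoNotifier *n = list->head;

    while (n) {
        DemoNotifier *next = n->next; /* saved so a callback may unregister itself */
        n->notify(n, data);
        n = next;
    }
}

/* Example subscriber: counts how often an inconsistency was reported. */
static int demo_checkpoint_requests;

static void demo_on_inconsistency(DemoNotifier *n, void *data)
{
    (void)n;
    (void)data;
    demo_checkpoint_requests++;
}
```

A caller would register demo_on_inconsistency once and let the comparison code call demo_notifier_list_notify() wherever the patch calls colo_compare_inconsistency_notify(). The benefit of this shape is that the comparison code does not need to know who reacts to the event.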

-[Qemu-devel] [PULL 05/25] COLO: Add block replication into colo process
+Deleted patch
-From: Zhang Chen <zhangckid@gmail.com>
-Make sure the master starts block replication only after the slave's
-block replication has started.
-Besides, we need to activate the VM's blocks before going into
-the COLO state.
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
-Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- migration/colo.c      | 43 +++++++++++++++++++++++++++++++++++++++++++
- migration/migration.c | 10 ++++++++++
-files changed, 53 insertions(+)
-diff --git a/migration/colo.c b/migration/colo.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/colo.c
-+++ b/migration/colo.c
-@@ -XXX,XX +XXX,XX @@
- #include "replication.h"
- #include "net/colo-compare.h"
- #include "net/colo.h"
-+#include "block/block.h"
- static bool vmstate_loading;
- static Notifier packets_compare_notifier;
-@@ -XXX,XX +XXX,XX @@ static void secondary_vm_do_failover(void)
- {
-     int old_state;
-     MigrationIncomingState *mis = migration_incoming_get_current();
-+    Error *local_err = NULL;
-     /* Can not do failover during the process of VM's loading VMstate, Or
-      * it will break the secondary VM.
-@@ -XXX,XX +XXX,XX @@ static void secondary_vm_do_failover(void)
-     migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
-                       MIGRATION_STATUS_COMPLETED);
-+    replication_stop_all(true, &local_err);
-+    if (local_err) {
-+        error_report_err(local_err);
-+    }
-+
-     if (!autostart) {
-         error_report("\"-S\" qemu option will be ignored in secondary side");
-         /* recover runstate to normal migration finish state */
-@@ -XXX,XX +XXX,XX @@ static void primary_vm_do_failover(void)
- {
-     MigrationState *s = migrate_get_current();
-     int old_state;
-+    Error *local_err = NULL;
-     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
-                       MIGRATION_STATUS_COMPLETED);
-@@ -XXX,XX +XXX,XX @@ static void primary_vm_do_failover(void)
-                      FailoverStatus_str(old_state));
-         return;
-     }
-+
-+    replication_stop_all(true, &local_err);
-+    if (local_err) {
-+        error_report_err(local_err);
-+        local_err = NULL;
-+    }
-+
-     /* Notify COLO thread that failover work is finished */
-     qemu_sem_post(&s->colo_exit_sem);
- }
-@@ -XXX,XX +XXX,XX @@ static int colo_do_checkpoint_transaction(MigrationState *s,
-     qemu_savevm_state_header(fb);
-     qemu_savevm_state_setup(fb);
-     qemu_mutex_lock_iothread();
-+    replication_do_checkpoint_all(&local_err);
-+    if (local_err) {
-+        qemu_mutex_unlock_iothread();
-+        goto out;
-+    }
-     qemu_savevm_state_complete_precopy(fb, false, false);
-     qemu_mutex_unlock_iothread();
-@@ -XXX,XX +XXX,XX @@ static void colo_process_checkpoint(MigrationState *s)
-     object_unref(OBJECT(bioc));
-     qemu_mutex_lock_iothread();
-+    replication_start_all(REPLICATION_MODE_PRIMARY, &local_err);
-+    if (local_err) {
-+        qemu_mutex_unlock_iothread();
-+        goto out;
-+    }
-+
-     vm_start();
-     qemu_mutex_unlock_iothread();
-     trace_colo_vm_state_change("stop", "run");
-@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
-     object_unref(OBJECT(bioc));
-     qemu_mutex_lock_iothread();
-+    replication_start_all(REPLICATION_MODE_SECONDARY, &local_err);
-+    if (local_err) {
-+        qemu_mutex_unlock_iothread();
-+        goto out;
-+    }
-     vm_start();
-     trace_colo_vm_state_change("stop", "run");
-     qemu_mutex_unlock_iothread();
-@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
-             goto out;
-         }
-+        replication_get_error_all(&local_err);
-+        if (local_err) {
-+            qemu_mutex_unlock_iothread();
-+            goto out;
-+        }
-+        /* discard colo disk buffer */
-+        replication_do_checkpoint_all(&local_err);
-+        if (local_err) {
-+            qemu_mutex_unlock_iothread();
-+            goto out;
-+        }
-+
-         vmstate_loading = false;
-         vm_start();
-         trace_colo_vm_state_change("stop", "run");
-diff --git a/migration/migration.c b/migration/migration.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/migration.c
-+++ b/migration/migration.c
-@@ -XXX,XX +XXX,XX @@ static void process_incoming_migration_co(void *opaque)
-     MigrationIncomingState *mis = migration_incoming_get_current();
-     PostcopyState ps;
-     int ret;
-+    Error *local_err = NULL;
-     assert(mis->from_src_file);
-     mis->migration_incoming_co = qemu_coroutine_self();
-@@ -XXX,XX +XXX,XX @@ static void process_incoming_migration_co(void *opaque)
-     /* we get COLO info, and know if we are in COLO mode */
-     if (!ret && migration_incoming_enable_colo()) {
-+        /* Make sure all file formats flush their mutable metadata */
-+        bdrv_invalidate_cache_all(&local_err);
-+        if (local_err) {
-+            migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-+                    MIGRATION_STATUS_FAILED);
-+            error_report_err(local_err);
-+            exit(EXIT_FAILURE);
-+        }
-+
-         qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
-              colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
-         mis->have_colo_incoming_thread = true;
---
-.17.1

-[Qemu-devel] [PULL 06/25] COLO: Remove colo_state migration struct
+Deleted patch
-From: Zhang Chen <zhangckid@gmail.com>
-The incoming side needs to know whether migration is going into the COLO
-state before normal migration starts.
-Instead of using a VMStateDescription to send colo_state
-from the source side to the destination side, we use MIG_CMD_ENABLE_COLO
-to indicate whether COLO is enabled.
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
-Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- include/migration/colo.h |  5 +--
- migration/Makefile.objs  |  2 +-
- migration/colo-comm.c    | 76 ----------------------------------------
- migration/colo.c         | 13 ++++++-
- migration/migration.c    | 23 +++++++++++-
- migration/savevm.c       | 17 +++++++++
- migration/savevm.h       |  1 +
- migration/trace-events   |  1 +
- vl.c                     |  2 --
-files changed, 57 insertions(+), 83 deletions(-)
- delete mode 100644 migration/colo-comm.c
-diff --git a/include/migration/colo.h b/include/migration/colo.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/migration/colo.h
-+++ b/include/migration/colo.h
-@@ -XXX,XX +XXX,XX @@ void migrate_start_colo_process(MigrationState *s);
- bool migration_in_colo_state(void);
- /* loadvm */
--bool migration_incoming_enable_colo(void);
--void migration_incoming_exit_colo(void);
-+void migration_incoming_enable_colo(void);
-+void migration_incoming_disable_colo(void);
-+bool migration_incoming_colo_enabled(void);
- void *colo_process_incoming_thread(void *opaque);
- bool migration_incoming_in_colo_state(void);
-diff --git a/migration/Makefile.objs b/migration/Makefile.objs
-index XXXXXXX..XXXXXXX 100644
---- a/migration/Makefile.objs
-+++ b/migration/Makefile.objs
-@@ -XXX,XX +XXX,XX @@
- common-obj-y += migration.o socket.o fd.o exec.o
- common-obj-y += tls.o channel.o savevm.o
--common-obj-y += colo-comm.o colo.o colo-failover.o
-+common-obj-y += colo.o colo-failover.o
- common-obj-y += vmstate.o vmstate-types.o page_cache.o
- common-obj-y += qemu-file.o global_state.o
- common-obj-y += qemu-file-channel.o
-diff --git a/migration/colo-comm.c b/migration/colo-comm.c
-deleted file mode 100644
-index XXXXXXX..XXXXXXX
---- a/migration/colo-comm.c
-+++ /dev/null
-@@ -XXX,XX +XXX,XX @@
--/*
-- * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
-- * (a.k.a. Fault Tolerance or Continuous Replication)
-- *
-- * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
-- * Copyright (c) 2016 FUJITSU LIMITED
-- * Copyright (c) 2016 Intel Corporation
-- *
-- * This work is licensed under the terms of the GNU GPL, version 2 or
-- * later. See the COPYING file in the top-level directory.
-- *
-- */
--
--#include "qemu/osdep.h"
--#include "migration.h"
--#include "migration/colo.h"
--#include "migration/vmstate.h"
--#include "trace.h"
--
--typedef struct {
--     bool colo_requested;
--} COLOInfo;
--
--static COLOInfo colo_info;
--
--COLOMode get_colo_mode(void)
--{
--    if (migration_in_colo_state()) {
--        return COLO_MODE_PRIMARY;
--    } else if (migration_incoming_in_colo_state()) {
--        return COLO_MODE_SECONDARY;
--    } else {
--        return COLO_MODE_UNKNOWN;
--    }
--}
--
--static int colo_info_pre_save(void *opaque)
--{
--    COLOInfo *s = opaque;
--
--    s->colo_requested = migrate_colo_enabled();
--
--    return 0;
--}
--
--static bool colo_info_need(void *opaque)
--{
--   return migrate_colo_enabled();
--}
--
--static const VMStateDescription colo_state = {
--    .name = "COLOState",
--    .version_id = 1,
--    .minimum_version_id = 1,
--    .pre_save = colo_info_pre_save,
--    .needed = colo_info_need,
--    .fields = (VMStateField[]) {
--        VMSTATE_BOOL(colo_requested, COLOInfo),
--        VMSTATE_END_OF_LIST()
--    },
--};
--
--void colo_info_init(void)
--{
--    vmstate_register(NULL, 0, &colo_state, &colo_info);
--}
--
--bool migration_incoming_enable_colo(void)
--{
--    return colo_info.colo_requested;
--}
--
--void migration_incoming_exit_colo(void)
--{
--    colo_info.colo_requested = false;
--}
-diff --git a/migration/colo.c b/migration/colo.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/colo.c
-+++ b/migration/colo.c
-@@ -XXX,XX +XXX,XX @@ static void primary_vm_do_failover(void)
-     qemu_sem_post(&s->colo_exit_sem);
- }
-+COLOMode get_colo_mode(void)
-+{
-+    if (migration_in_colo_state()) {
-+        return COLO_MODE_PRIMARY;
-+    } else if (migration_incoming_in_colo_state()) {
-+        return COLO_MODE_SECONDARY;
-+    } else {
-+        return COLO_MODE_UNKNOWN;
-+    }
-+}
-+
- void colo_do_failover(MigrationState *s)
- {
-     /* Make sure VM stopped while failover happened. */
-@@ -XXX,XX +XXX,XX @@ out:
-     if (mis->to_src_file) {
-         qemu_fclose(mis->to_src_file);
-     }
--    migration_incoming_exit_colo();
-+    migration_incoming_disable_colo();
-     rcu_unregister_thread();
-     return NULL;
-diff --git a/migration/migration.c b/migration/migration.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/migration.c
-+++ b/migration/migration.c
-@@ -XXX,XX +XXX,XX @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
-     return migrate_send_rp_message(mis, msg_type, msglen, bufc);
- }
-+static bool migration_colo_enabled;
-+bool migration_incoming_colo_enabled(void)
-+{
-+    return migration_colo_enabled;
-+}
-+
-+void migration_incoming_disable_colo(void)
-+{
-+    migration_colo_enabled = false;
-+}
-+
-+void migration_incoming_enable_colo(void)
-+{
-+    migration_colo_enabled = true;
-+}
-+
- void qemu_start_incoming_migration(const char *uri, Error **errp)
- {
-     const char *p;
-@@ -XXX,XX +XXX,XX @@ static void process_incoming_migration_co(void *opaque)
-     }
-     /* we get COLO info, and know if we are in COLO mode */
--    if (!ret && migration_incoming_enable_colo()) {
-+    if (!ret && migration_incoming_colo_enabled()) {
-         /* Make sure all file formats flush their mutable metadata */
-         bdrv_invalidate_cache_all(&local_err);
-         if (local_err) {
-@@ -XXX,XX +XXX,XX @@ static void *migration_thread(void *opaque)
-         qemu_savevm_send_postcopy_advise(s->to_dst_file);
-     }
-+    if (migrate_colo_enabled()) {
-+        /* Notify migration destination that we enable COLO */
-+        qemu_savevm_send_colo_enable(s->to_dst_file);
-+    }
-+
-     qemu_savevm_state_setup(s->to_dst_file);
-     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
-diff --git a/migration/savevm.c b/migration/savevm.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/savevm.c
-+++ b/migration/savevm.c
-@@ -XXX,XX +XXX,XX @@
- #include "io/channel-file.h"
- #include "sysemu/replay.h"
- #include "qjson.h"
-+#include "migration/colo.h"
- #ifndef ETH_P_RARP
- #define ETH_P_RARP 0x8035
-@@ -XXX,XX +XXX,XX @@ enum qemu_vm_cmd {
-                                       were previously sent during
-                                       precopy but are dirty. */
-     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
-+    MIG_CMD_ENABLE_COLO,       /* Enable COLO */
-     MIG_CMD_POSTCOPY_RESUME,   /* resume postcopy on dest */
-     MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
-     MIG_CMD_MAX
-@@ -XXX,XX +XXX,XX @@ static void qemu_savevm_command_send(QEMUFile *f,
-     qemu_fflush(f);
- }
-+void qemu_savevm_send_colo_enable(QEMUFile *f)
-+{
-+    trace_savevm_send_colo_enable();
-+    qemu_savevm_command_send(f, MIG_CMD_ENABLE_COLO, 0, NULL);
-+}
-+
- void qemu_savevm_send_ping(QEMUFile *f, uint32_t value)
- {
-     uint32_t buf;
-@@ -XXX,XX +XXX,XX @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
-     return 0;
- }
-+static int loadvm_process_enable_colo(MigrationIncomingState *mis)
-+{
-+    migration_incoming_enable_colo();
-+    return 0;
-+}
-+
- /*
-  * Process an incoming 'QEMU_VM_COMMAND'
-  * 0           just a normal return
-@@ -XXX,XX +XXX,XX @@ static int loadvm_process_command(QEMUFile *f)
-     case MIG_CMD_RECV_BITMAP:
-         return loadvm_handle_recv_bitmap(mis, len);
-+
-+    case MIG_CMD_ENABLE_COLO:
-+        return loadvm_process_enable_colo(mis);
-     }
-     return 0;
-diff --git a/migration/savevm.h b/migration/savevm.h
-index XXXXXXX..XXXXXXX 100644
---- a/migration/savevm.h
-+++ b/migration/savevm.h
-@@ -XXX,XX +XXX,XX @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
-                                            uint16_t len,
-                                            uint64_t *start_list,
-                                            uint64_t *length_list);
-+void qemu_savevm_send_colo_enable(QEMUFile *f);
- int qemu_loadvm_state(QEMUFile *f);
- void qemu_loadvm_state_cleanup(void);
-diff --git a/migration/trace-events b/migration/trace-events
-index XXXXXXX..XXXXXXX 100644
---- a/migration/trace-events
-+++ b/migration/trace-events
-@@ -XXX,XX +XXX,XX @@ savevm_send_ping(uint32_t val) "0x%x"
- savevm_send_postcopy_listen(void) ""
- savevm_send_postcopy_run(void) ""
- savevm_send_postcopy_resume(void) ""
-+savevm_send_colo_enable(void) ""
- savevm_send_recv_bitmap(char *name) "%s"
- savevm_state_setup(void) ""
- savevm_state_resume_prepare(void) ""
-diff --git a/vl.c b/vl.c
-index XXXXXXX..XXXXXXX 100644
---- a/vl.c
-+++ b/vl.c
-@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv, char **envp)
- #endif
-     }
--    colo_info_init();
--
-     if (net_init_clients(&err) < 0) {
-         error_report_err(err);
-         exit(1);
---
-.17.1
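
The idea of signalling COLO through a stream command instead of a VMState section can be sketched as follows. This is an illustration under stated assumptions: the demo_* names and the DEMO_CMD_* values are hypothetical, and the real command framing in migration/savevm.c (headers, lengths, QEMUFile I/O) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command IDs; only the idea mirrors MIG_CMD_ENABLE_COLO. */
enum demo_cmd {
    DEMO_CMD_PING,
    DEMO_CMD_ENABLE_COLO,
    DEMO_CMD_MAX
};

/* Destination-side flag, set when the source announces COLO. */
static bool demo_colo_enabled;

static bool demo_incoming_colo_enabled(void)
{
    return demo_colo_enabled;
}

static void demo_incoming_enable_colo(void)
{
    demo_colo_enabled = true;
}

static void demo_incoming_disable_colo(void)
{
    demo_colo_enabled = false;
}

/* Dispatch one received command. Returns 0 on success, -1 if unknown. */
static int demo_process_command(int cmd)
{
    switch (cmd) {
    case DEMO_CMD_PING:
        return 0;
    case DEMO_CMD_ENABLE_COLO:
        demo_incoming_enable_colo();
        return 0;
    default:
        return -1;
    }
}
```

Because the command arrives before the device state is loaded, the destination can query demo_incoming_colo_enabled() early, which is the property the commit message asks for.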

-[Qemu-devel] [PULL 07/25] COLO: Load dirty pages into SVM's RAM cache firstly
+Deleted patch
-From: Zhang Chen <zhangckid@gmail.com>
-We should not load the PVM's state directly into the SVM, because errors may
-occur while the SVM is receiving data, which would break the SVM.
-We need to ensure that all data has been received before loading the state into
-the SVM. We use extra memory to cache this data (the PVM's RAM). The RAM cache on
-the secondary side is initially the same as the SVM/PVM's memory. During a
-checkpoint, we first cache the PVM's dirty pages in this RAM cache, so the RAM
-cache always matches the PVM's memory at every checkpoint. We then flush the
-cached RAM to the SVM after we have received all of the PVM's state.
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
-Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
-Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- include/exec/ram_addr.h |  1 +
- migration/migration.c   |  7 ++++
- migration/ram.c         | 83 ++++++++++++++++++++++++++++++++++++++++-
- migration/ram.h         |  4 ++
- migration/savevm.c      |  2 +-
-files changed, 94 insertions(+), 3 deletions(-)
-diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/exec/ram_addr.h
-+++ b/include/exec/ram_addr.h
-@@ -XXX,XX +XXX,XX @@ struct RAMBlock {
-     struct rcu_head rcu;
-     struct MemoryRegion *mr;
-     uint8_t *host;
-+    uint8_t *colo_cache; /* For colo, VM's ram cache */
-     ram_addr_t offset;
-     ram_addr_t used_length;
-     ram_addr_t max_length;
-diff --git a/migration/migration.c b/migration/migration.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/migration.c
-+++ b/migration/migration.c
-@@ -XXX,XX +XXX,XX @@ static void process_incoming_migration_co(void *opaque)
-             exit(EXIT_FAILURE);
-         }
-+        if (colo_init_ram_cache() < 0) {
-+            error_report("Init ram cache failed");
-+            exit(EXIT_FAILURE);
-+        }
-+
-         qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
-              colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
-         mis->have_colo_incoming_thread = true;
-@@ -XXX,XX +XXX,XX @@ static void process_incoming_migration_co(void *opaque)
-         /* Wait checkpoint incoming thread exit before free resource */
-         qemu_thread_join(&mis->colo_incoming_thread);
-+        /* We hold the global iothread lock, so it is safe here */
-+        colo_release_ram_cache();
-     }
-     if (ret < 0) {
-diff --git a/migration/ram.c b/migration/ram.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/ram.c
-+++ b/migration/ram.c
-@@ -XXX,XX +XXX,XX @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
-     return block->host + offset;
- }
-+static inline void *colo_cache_from_block_offset(RAMBlock *block,
-+                                                 ram_addr_t offset)
-+{
-+    if (!offset_in_ramblock(block, offset)) {
-+        return NULL;
-+    }
-+    if (!block->colo_cache) {
-+        error_report("%s: colo_cache is NULL in block :%s",
-+                     __func__, block->idstr);
-+        return NULL;
-+    }
-+    return block->colo_cache + offset;
-+}
-+
- /**
-  * ram_handle_compressed: handle the zero page case
-  *
-@@ -XXX,XX +XXX,XX @@ static void decompress_data_with_multi_threads(QEMUFile *f,
-     qemu_mutex_unlock(&decomp_done_lock);
- }
-+/*
-+ * colo cache: this is for the secondary VM. We cache the whole
-+ * memory of the secondary VM. The global lock must be held
-+ * when calling this helper.
-+ */
-+int colo_init_ram_cache(void)
-+{
-+    RAMBlock *block;
-+
-+    rcu_read_lock();
-+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
-+        block->colo_cache = qemu_anon_ram_alloc(block->used_length,
-+                                                NULL,
-+                                                false);
-+        if (!block->colo_cache) {
-+            error_report("%s: Can't alloc memory for COLO cache of block %s,"
-+                         "size 0x" RAM_ADDR_FMT, __func__, block->idstr,
-+                         block->used_length);
-+            goto out_locked;
-+        }
-+        memcpy(block->colo_cache, block->host, block->used_length);
-+    }
-+    rcu_read_unlock();
-+    return 0;
-+
-+out_locked:
-+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
-+        if (block->colo_cache) {
-+            qemu_anon_ram_free(block->colo_cache, block->used_length);
-+            block->colo_cache = NULL;
-+        }
-+    }
-+
-+    rcu_read_unlock();
-+    return -errno;
-+}
-+
-+/* The global lock must be held when calling this helper */
-+void colo_release_ram_cache(void)
-+{
-+    RAMBlock *block;
-+
-+    rcu_read_lock();
-+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
-+        if (block->colo_cache) {
-+            qemu_anon_ram_free(block->colo_cache, block->used_length);
-+            block->colo_cache = NULL;
-+        }
-+    }
-+    rcu_read_unlock();
-+}
-+
- /**
-  * ram_load_setup: Setup RAM for migration incoming side
-  *
-@@ -XXX,XX +XXX,XX @@ static int ram_load_setup(QEMUFile *f, void *opaque)
-     xbzrle_load_setup();
-     ramblock_recv_map_init();
-+
-     return 0;
- }
-@@ -XXX,XX +XXX,XX @@ static int ram_load_cleanup(void *opaque)
-         g_free(rb->receivedmap);
-         rb->receivedmap = NULL;
-     }
-+
-     return 0;
- }
-@@ -XXX,XX +XXX,XX @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
-                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
-             RAMBlock *block = ram_block_from_stream(f, flags);
--            host = host_from_ram_block_offset(block, addr);
-+            /*
-+             * After going into COLO, we should load the Page into colo_cache.
-+             */
-+            if (migration_incoming_in_colo_state()) {
-+                host = colo_cache_from_block_offset(block, addr);
-+            } else {
-+                host = host_from_ram_block_offset(block, addr);
-+            }
-             if (!host) {
-                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
-                 ret = -EINVAL;
-                 break;
-             }
--            ramblock_recv_bitmap_set(block, host);
-+
-+            if (!migration_incoming_in_colo_state()) {
-+                ramblock_recv_bitmap_set(block, host);
-+            }
-+
-             trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host);
-         }
-diff --git a/migration/ram.h b/migration/ram.h
-index XXXXXXX..XXXXXXX 100644
---- a/migration/ram.h
-+++ b/migration/ram.h
-@@ -XXX,XX +XXX,XX @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
-                                   const char *block_name);
- int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
-+/* ram cache */
-+int colo_init_ram_cache(void);
-+void colo_release_ram_cache(void);
-+
- #endif
-diff --git a/migration/savevm.c b/migration/savevm.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/savevm.c
-+++ b/migration/savevm.c
-@@ -XXX,XX +XXX,XX @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
- static int loadvm_process_enable_colo(MigrationIncomingState *mis)
- {
-     migration_incoming_enable_colo();
--    return 0;
-+    return colo_init_ram_cache();
- }
- /*
---
-.17.1
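
The staging approach described in the commit message above (receive into a cache first, never directly into live memory) can be sketched like this. It is a simplified illustration, not the code from migration/ram.c: DemoBlock and the demo_* functions are hypothetical, plain malloc() replaces qemu_anon_ram_alloc(), there is no RCU or locking, and callers are assumed to pass blocks with a non-zero length and a cache pointer initialised to NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much reduced stand-in for a RAM block. */
typedef struct {
    unsigned char *host;   /* live guest memory */
    unsigned char *cache;  /* staging copy, NULL while not allocated */
    size_t len;            /* size of both buffers, assumed > 0 */
} DemoBlock;

/* Free every cache that exists; safe to call on partially set up arrays. */
static void demo_release_cache(DemoBlock *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        free(blocks[i].cache);
        blocks[i].cache = NULL;
    }
}

/*
 * Give every block a cache that starts as a copy of its live memory.
 * Returns 0 on success, or -1 after rolling back all allocations.
 */
static int demo_init_cache(DemoBlock *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        blocks[i].cache = malloc(blocks[i].len);
        if (!blocks[i].cache) {
            demo_release_cache(blocks, n);
            return -1;
        }
        memcpy(blocks[i].cache, blocks[i].host, blocks[i].len);
    }
    return 0;
}

/*
 * Store received data in the cache only; live memory is left untouched.
 * Returns 0 on success, -1 if the cache is missing or the range is invalid.
 */
static int demo_load_page(DemoBlock *b, size_t off,
                          const unsigned char *data, size_t n)
{
    if (!b->cache || off > b->len || n > b->len - off) {
        return -1;
    }
    memcpy(b->cache + off, data, n);
    return 0;
}
```

If receiving fails midway, the live memory in this sketch is still in its last consistent state and the cache can simply be discarded or refreshed at the next checkpoint.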

-[Qemu-devel] [PULL 08/25] ram/COLO: Record the dirty pages that SVM received
+Deleted patch
-From: Zhang Chen <zhangckid@gmail.com>
-We record the addresses of the dirty pages received, which helps when
-flushing the cached pages into the SVM.
-The trick here is that we record dirty pages by reusing the migration
-dirty bitmap. In a later patch, we will start the dirty log
-for the SVM, just like migration. In this way, we can record
-the dirty pages caused by both the PVM and the SVM, and flush only those dirty
-pages from the RAM cache when doing a checkpoint.
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
-Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- migration/ram.c | 43 ++++++++++++++++++++++++++++++++++++++++---
-file changed, 40 insertions(+), 3 deletions(-)
-diff --git a/migration/ram.c b/migration/ram.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/ram.c
-+++ b/migration/ram.c
-@@ -XXX,XX +XXX,XX @@ static inline void *colo_cache_from_block_offset(RAMBlock *block,
-                      __func__, block->idstr);
-         return NULL;
-     }
-+
-+    /*
-+    * During colo checkpoint, we need bitmap of these migrated pages.
-+    * It help us to decide which pages in ram cache should be flushed
-+    * into VM's RAM later.
-+    */
-+    if (!test_and_set_bit(offset >> TARGET_PAGE_BITS, block->bmap)) {
-+        ram_state->migration_dirty_pages++;
-+    }
-     return block->colo_cache + offset;
- }
-@@ -XXX,XX +XXX,XX @@ int colo_init_ram_cache(void)
-     RAMBlock *block;
-     rcu_read_lock();
--    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
-+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
-         block->colo_cache = qemu_anon_ram_alloc(block->used_length,
-                                                 NULL,
-                                                 false);
-@@ -XXX,XX +XXX,XX @@ int colo_init_ram_cache(void)
-         memcpy(block->colo_cache, block->host, block->used_length);
-     }
-     rcu_read_unlock();
-+    /*
-+    * Record the dirty pages sent by the PVM. We use this dirty bitmap
-+    * to decide which pages in the cache should be flushed into the SVM's RAM.
-+    * Here we use the same name 'ram_bitmap' as for migration.
-+    */
-+    if (ram_bytes_total()) {
-+        RAMBlock *block;
-+
-+        RAMBLOCK_FOREACH_MIGRATABLE(block) {
-+            unsigned long pages = block->max_length >> TARGET_PAGE_BITS;
-+
-+            block->bmap = bitmap_new(pages);
-+            bitmap_set(block->bmap, 0, pages);
-+        }
-+    }
-+    ram_state = g_new0(RAMState, 1);
-+    ram_state->migration_dirty_pages = 0;
-+
-     return 0;
- out_locked:
--    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
-+
-+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
-         if (block->colo_cache) {
-             qemu_anon_ram_free(block->colo_cache, block->used_length);
-             block->colo_cache = NULL;
-@@ -XXX,XX +XXX,XX @@ void colo_release_ram_cache(void)
- {
-     RAMBlock *block;
-+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
-+        g_free(block->bmap);
-+        block->bmap = NULL;
-+    }
-+
-     rcu_read_lock();
--    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
-+
-+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
-         if (block->colo_cache) {
-             qemu_anon_ram_free(block->colo_cache, block->used_length);
-             block->colo_cache = NULL;
-         }
-     }
-+
-     rcu_read_unlock();
-+    g_free(ram_state);
-+    ram_state = NULL;
- }
- /**
---
-.17.1
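
The bookkeeping described in the commit message above (mark a page dirty when it lands in the cache, then copy only marked pages at checkpoint time) can be sketched as follows. This is an illustration under assumptions, not the QEMU code: the demo_* helpers and the 4-byte DEMO_PAGE_SIZE are hypothetical, the bitmap here starts out cleared for simplicity, and there is no locking.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4u  /* tiny page size, chosen only for the example */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Set bit nr and report whether it was already set. */
static bool demo_test_and_set_bit(size_t nr, unsigned long *map)
{
    unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
    unsigned long *word = &map[nr / DEMO_BITS_PER_LONG];
    bool was_set = (*word & mask) != 0;

    *word |= mask;
    return was_set;
}

/*
 * Store one received page in the cache and mark it dirty.
 * Returns the updated dirty-page count (each page is counted once).
 */
static size_t demo_cache_page(unsigned char *cache, unsigned long *bmap,
                              size_t dirty_pages, size_t page,
                              const unsigned char *data)
{
    memcpy(cache + page * DEMO_PAGE_SIZE, data, DEMO_PAGE_SIZE);
    if (!demo_test_and_set_bit(page, bmap)) {
        dirty_pages++;
    }
    return dirty_pages;
}

/*
 * Copy only the dirty pages from the cache into live memory and clear
 * their bits. Returns the number of pages copied.
 */
static size_t demo_flush_cache(unsigned char *host, const unsigned char *cache,
                               unsigned long *bmap, size_t npages)
{
    size_t copied = 0;

    for (size_t page = 0; page < npages; page++) {
        unsigned long mask = 1UL << (page % DEMO_BITS_PER_LONG);
        unsigned long *word = &bmap[page / DEMO_BITS_PER_LONG];

        if (*word & mask) {
            *word &= ~mask;
            memcpy(host + page * DEMO_PAGE_SIZE,
                   cache + page * DEMO_PAGE_SIZE, DEMO_PAGE_SIZE);
            copied++;
        }
    }
    return copied;
}
```

The sketch scans every bit linearly. A find-next-set-bit search, as used by the migration bitmap helpers in the patches, avoids visiting clean pages one by one when the bitmap is sparse.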

-[Qemu-devel] [PULL 09/25] COLO: Flush memory data from ram cache
+Deleted patch
-From: Zhang Chen <zhangckid@gmail.com>
-While the VM is running, the PVM may dirty some pages. We transfer the
-PVM's dirty pages to the SVM and store them in the SVM's RAM cache at the next
-checkpoint. So, the content of the SVM's RAM cache will always be the same as
-the PVM's memory after a checkpoint.
-Instead of flushing the entire content of the SVM's RAM cache into the SVM's
-memory, we do this in a more efficient way:
-Only flush the pages that were dirtied by the PVM since the last checkpoint.
-In this way, we can ensure the SVM's memory is the same as the PVM's.
-Besides, we must ensure the RAM cache is flushed before loading the device state.
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
-Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
-Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- migration/ram.c        | 37 +++++++++++++++++++++++++++++++++++++
- migration/trace-events |  2 ++
-files changed, 39 insertions(+)
-diff --git a/migration/ram.c b/migration/ram.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/ram.c
-+++ b/migration/ram.c
-@@ -XXX,XX +XXX,XX @@ static bool postcopy_is_running(void)
-     return ps >= POSTCOPY_INCOMING_LISTENING && ps < POSTCOPY_INCOMING_END;
- }
-+/*
-+ * Flush content of RAM cache into SVM's memory.
-+ * Only flush the pages that were dirtied by PVM or SVM or both.
-+ */
-+static void colo_flush_ram_cache(void)
-+{
-+    RAMBlock *block = NULL;
-+    void *dst_host;
-+    void *src_host;
-+    unsigned long offset = 0;
-+
-+    trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages);
-+    rcu_read_lock();
-+    block = QLIST_FIRST_RCU(&ram_list.blocks);
-+
-+    while (block) {
-+        offset = migration_bitmap_find_dirty(ram_state, block, offset);
-+
-+        if (offset << TARGET_PAGE_BITS >= block->used_length) {
-+            offset = 0;
-+            block = QLIST_NEXT_RCU(block, next);
-+        } else {
-+            migration_bitmap_clear_dirty(ram_state, block, offset);
-+            dst_host = block->host + (offset << TARGET_PAGE_BITS);
-+            src_host = block->colo_cache + (offset << TARGET_PAGE_BITS);
-+            memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
-+        }
-+    }
-+
-+    rcu_read_unlock();
-+    trace_colo_flush_ram_cache_end();
-+}
-+
- static int ram_load(QEMUFile *f, void *opaque, int version_id)
- {
-     int flags = 0, ret = 0, invalid_flags = 0;
-@@ -XXX,XX +XXX,XX @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
-     ret |= wait_for_decompress_done();
-     rcu_read_unlock();
-     trace_ram_load_complete(ret, seq_iter);
-+
-+    if (!ret  && migration_incoming_in_colo_state()) {
-+        colo_flush_ram_cache();
-+    }
-     return ret;
- }
-diff --git a/migration/trace-events b/migration/trace-events
-index XXXXXXX..XXXXXXX 100644
---- a/migration/trace-events
-+++ b/migration/trace-events
-@@ -XXX,XX +XXX,XX @@ ram_dirty_bitmap_sync_start(void) ""
- ram_dirty_bitmap_sync_wait(void) ""
- ram_dirty_bitmap_sync_complete(void) ""
- ram_state_resume_prepare(uint64_t v) "%" PRId64
-+colo_flush_ram_cache_begin(uint64_t dirty_pages) "dirty_pages %" PRIu64
-+colo_flush_ram_cache_end(void) ""
- # migration/migration.c
- await_return_path_close_on_source_close(void) ""
---
-.17.1
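
Note on the patch above: the following is a minimal, self-contained sketch of the idea behind colo_flush_ram_cache(), namely copying only the pages whose dirty bit is set from the cache into guest memory and clearing each bit as it is consumed. It is not the QEMU implementation. The toy page size, struct sketch_block and every sketch_* name are assumptions made for illustration; they stand in for TARGET_PAGE_SIZE, RAMBlock and the migration bitmap helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not QEMU's types or constants. */
#define SKETCH_PAGE_SIZE 16
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct sketch_block {
    unsigned char *host;   /* secondary VM memory */
    unsigned char *cache;  /* staged copy of primary VM memory */
    unsigned long *dirty;  /* one bit per page */
    size_t pages;          /* number of pages in the block */
};

/* Return whether the bit was set, and clear it. */
static int sketch_test_and_clear(unsigned long *map, size_t bit)
{
    unsigned long mask = 1UL << (bit % SKETCH_BITS_PER_LONG);
    unsigned long *word = &map[bit / SKETCH_BITS_PER_LONG];
    int was_set = (*word & mask) != 0;

    *word &= ~mask;
    return was_set;
}

/* Copy only dirty pages from cache to host; return how many were flushed. */
static size_t sketch_flush_cache(struct sketch_block *b)
{
    size_t flushed = 0;

    assert(b && b->host && b->cache && b->dirty);
    for (size_t page = 0; page < b->pages; page++) {
        if (sketch_test_and_clear(b->dirty, page)) {
            size_t off = page * SKETCH_PAGE_SIZE;

            memcpy(b->host + off, b->cache + off, SKETCH_PAGE_SIZE);
            flushed++;
        }
    }
    return flushed;
}
```

The sketch walks every page for simplicity. The patch instead jumps from one dirty page to the next with migration_bitmap_find_dirty(), which avoids touching clean pages at all.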

-[Qemu-devel] [PULL 10/25] qmp event: Add COLO_EXIT event to notify users while exited COLO
+Deleted patch
-From: zhanghailiang <zhang.zhanghailiang@huawei.com>
-If some errors happen during the VM's COLO FT stage, it's important to
-notify the users of this event. Together with 'x-colo-lost-heartbeat',
-users can intervene in COLO's failover work immediately.
-If users don't want to get involved in COLO's failover verdict,
-it is still necessary to notify users that we exited COLO mode.
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
-Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- migration/colo.c    | 31 +++++++++++++++++++++++++++++++
- qapi/migration.json | 38 ++++++++++++++++++++++++++++++++++++++
-files changed, 69 insertions(+)
-diff --git a/migration/colo.c b/migration/colo.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/colo.c
-+++ b/migration/colo.c
-@@ -XXX,XX +XXX,XX @@
- #include "net/colo-compare.h"
- #include "net/colo.h"
- #include "block/block.h"
-+#include "qapi/qapi-events-migration.h"
- static bool vmstate_loading;
- static Notifier packets_compare_notifier;
-@@ -XXX,XX +XXX,XX @@ out:
-         qemu_fclose(fb);
-     }
-+    /*
-+     * There are only two reasons we can get here, some error happened
-+     * or the user triggered failover.
-+     */
-+    switch (failover_get_state()) {
-+    case FAILOVER_STATUS_NONE:
-+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
-+                                  COLO_EXIT_REASON_ERROR);
-+        break;
-+    case FAILOVER_STATUS_REQUIRE:
-+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
-+                                  COLO_EXIT_REASON_REQUEST);
-+        break;
-+    default:
-+        abort();
-+    }
-+
-     /* Hope this not to be too long to wait here */
-     qemu_sem_wait(&s->colo_exit_sem);
-     qemu_sem_destroy(&s->colo_exit_sem);
-@@ -XXX,XX +XXX,XX @@ out:
-         error_report_err(local_err);
-     }
-+    switch (failover_get_state()) {
-+    case FAILOVER_STATUS_NONE:
-+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
-+                                  COLO_EXIT_REASON_ERROR);
-+        break;
-+    case FAILOVER_STATUS_REQUIRE:
-+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
-+                                  COLO_EXIT_REASON_REQUEST);
-+        break;
-+    default:
-+        abort();
-+    }
-+
-     if (fb) {
-         qemu_fclose(fb);
-     }
-diff --git a/qapi/migration.json b/qapi/migration.json
-index XXXXXXX..XXXXXXX 100644
---- a/qapi/migration.json
-+++ b/qapi/migration.json
-@@ -XXX,XX +XXX,XX @@
- { 'enum': 'FailoverStatus',
-   'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
-+##
-+# @COLO_EXIT:
-+#
-+# Emitted when VM finishes COLO mode due to some errors happening or
-+# at the request of users.
-+#
-+# @mode: report COLO mode when COLO exited.
-+#
-+# @reason: describes the reason for the COLO exit.
-+#
-+# Since: 3.1
-+#
-+# Example:
-+#
-+# <- { "timestamp": {"seconds": 2032141960, "microseconds": 417172},
-+#      "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
-+#
-+##
-+{ 'event': 'COLO_EXIT',
-+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
-+
-+##
-+# @COLOExitReason:
-+#
-+# The reason for a COLO exit
-+#
-+# @none: no failover has ever happened. This can't occur in the
-+# COLO_EXIT event, only in the result of query-colo-status.
-+#
-+# @request: COLO exit is due to an external request
-+#
-+# @error: COLO exit is due to an internal error
-+#
-+# Since: 3.1
-+##
-+{ 'enum': 'COLOExitReason',
-+  'data': [ 'none', 'request', 'error' ] }
-+
- ##
- # @x-colo-lost-heartbeat:
- #
---
-.17.1

-[Qemu-devel] [PULL 11/25] qapi/migration.json: Rename COLO unknown mode to none mode.
+Deleted patch
-From: Zhang Chen <chen.zhang@intel.com>
-As suggested by Markus Armbruster, rename the COLO 'unknown' mode to 'none'.
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
-Reviewed-by: Eric Blake <eblake@redhat.com>
-Reviewed-by: Markus Armbruster <armbru@redhat.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- migration/colo-failover.c |  2 +-
- migration/colo.c          |  2 +-
- qapi/migration.json       | 10 +++++-----
-files changed, 7 insertions(+), 7 deletions(-)
-diff --git a/migration/colo-failover.c b/migration/colo-failover.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/colo-failover.c
-+++ b/migration/colo-failover.c
-@@ -XXX,XX +XXX,XX @@ FailoverStatus failover_get_state(void)
- void qmp_x_colo_lost_heartbeat(Error **errp)
- {
--    if (get_colo_mode() == COLO_MODE_UNKNOWN) {
-+    if (get_colo_mode() == COLO_MODE_NONE) {
-         error_setg(errp, QERR_FEATURE_DISABLED, "colo");
-         return;
-     }
-diff --git a/migration/colo.c b/migration/colo.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/colo.c
-+++ b/migration/colo.c
-@@ -XXX,XX +XXX,XX @@ COLOMode get_colo_mode(void)
-     } else if (migration_incoming_in_colo_state()) {
-         return COLO_MODE_SECONDARY;
-     } else {
--        return COLO_MODE_UNKNOWN;
-+        return COLO_MODE_NONE;
-     }
- }
-diff --git a/qapi/migration.json b/qapi/migration.json
-index XXXXXXX..XXXXXXX 100644
---- a/qapi/migration.json
-+++ b/qapi/migration.json
-@@ -XXX,XX +XXX,XX @@
- ##
- # @COLOMode:
- #
--# The colo mode
-+# The COLO current mode.
- #
--# @unknown: unknown mode
-+# @none: COLO is disabled.
- #
--# @primary: master side
-+# @primary: COLO node in primary side.
- #
--# @secondary: slave side
-+# @secondary: COLO node in secondary side.
- #
- # Since: 2.8
- ##
- { 'enum': 'COLOMode',
--  'data': [ 'unknown', 'primary', 'secondary'] }
-+  'data': [ 'none', 'primary', 'secondary'] }
- ##
- # @FailoverStatus:
---
-.17.1

-[Qemu-devel] [PULL 12/25] qapi: Add new command to query colo status
+Deleted patch
-From: Zhang Chen <zhangckid@gmail.com>
-Libvirt or other high-level software can use this command to query COLO status.
-You can test this command like this:
-{'execute':'query-colo-status'}
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- migration/colo.c    | 21 +++++++++++++++++++++
- qapi/migration.json | 32 ++++++++++++++++++++++++++++++++
-files changed, 53 insertions(+)
-diff --git a/migration/colo.c b/migration/colo.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/colo.c
-+++ b/migration/colo.c
-@@ -XXX,XX +XXX,XX @@
- #include "net/colo.h"
- #include "block/block.h"
- #include "qapi/qapi-events-migration.h"
-+#include "qapi/qmp/qerror.h"
- static bool vmstate_loading;
- static Notifier packets_compare_notifier;
-@@ -XXX,XX +XXX,XX @@ void qmp_xen_colo_do_checkpoint(Error **errp)
- #endif
- }
-+COLOStatus *qmp_query_colo_status(Error **errp)
-+{
-+    COLOStatus *s = g_new0(COLOStatus, 1);
-+
-+    s->mode = get_colo_mode();
-+
-+    switch (failover_get_state()) {
-+    case FAILOVER_STATUS_NONE:
-+        s->reason = COLO_EXIT_REASON_NONE;
-+        break;
-+    case FAILOVER_STATUS_REQUIRE:
-+        s->reason = COLO_EXIT_REASON_REQUEST;
-+        break;
-+    default:
-+        s->reason = COLO_EXIT_REASON_ERROR;
-+    }
-+
-+    return s;
-+}
-+
- static void colo_send_message(QEMUFile *f, COLOMessage msg,
-                               Error **errp)
- {
-diff --git a/qapi/migration.json b/qapi/migration.json
-index XXXXXXX..XXXXXXX 100644
---- a/qapi/migration.json
-+++ b/qapi/migration.json
-@@ -XXX,XX +XXX,XX @@
- ##
- { 'command': 'xen-colo-do-checkpoint' }
-+##
-+# @COLOStatus:
-+#
-+# The result format for 'query-colo-status'.
-+#
-+# @mode: COLO running mode. If COLO is running, this field will return
-+#        'primary' or 'secondary'.
-+#
-+# @reason: describes the reason for the COLO exit.
-+#
-+# Since: 3.0
-+##
-+{ 'struct': 'COLOStatus',
-+  'data': { 'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
-+
-+##
-+# @query-colo-status:
-+#
-+# Query COLO status while the vm is running.
-+#
-+# Returns: A @COLOStatus object showing the status.
-+#
-+# Example:
-+#
-+# -> { "execute": "query-colo-status" }
-+# <- { "return": { "mode": "primary", "reason": "request" } }
-+#
-+# Since: 3.0
-+##
-+{ 'command': 'query-colo-status',
-+  'returns': 'COLOStatus' }
-+
- ##
- # @migrate-recover:
- #
---
-.17.1
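
Note on the two patches above: the COLO_EXIT event and query-colo-status both derive a COLOExitReason from the failover state, but they treat the states differently. The sketch below restates the two switch statements side by side. All sketch_* and SK_* names are hypothetical stand-ins for the QAPI-generated enums, and the event path returns 0 where the patch calls abort(), so the difference can be checked without killing the process.

```c
#include <assert.h>

/* Hypothetical mirrors of FailoverStatus and COLOExitReason. */
enum sketch_failover {
    SK_FAILOVER_NONE,
    SK_FAILOVER_REQUIRE,
    SK_FAILOVER_ACTIVE,
    SK_FAILOVER_COMPLETED,
    SK_FAILOVER_RELAUNCH
};

enum sketch_reason {
    SK_REASON_NONE,
    SK_REASON_REQUEST,
    SK_REASON_ERROR
};

/* query-colo-status path: every state maps to some reason. */
static enum sketch_reason sketch_query_reason(enum sketch_failover st)
{
    switch (st) {
    case SK_FAILOVER_NONE:
        return SK_REASON_NONE;
    case SK_FAILOVER_REQUIRE:
        return SK_REASON_REQUEST;
    default:
        return SK_REASON_ERROR;
    }
}

/*
 * COLO_EXIT event path: returns 1 and stores the reason in *out,
 * or returns 0 for the states where the patch would abort().
 */
static int sketch_event_reason(enum sketch_failover st,
                               enum sketch_reason *out)
{
    assert(out);
    switch (st) {
    case SK_FAILOVER_NONE:
        /* COLO exited without a failover request: report an error. */
        *out = SK_REASON_ERROR;
        return 1;
    case SK_FAILOVER_REQUIRE:
        *out = SK_REASON_REQUEST;
        return 1;
    default:
        return 0;
    }
}
```

The same failover state, "none", therefore yields reason "error" in the event but reason "none" in the query result, which matches the COLOExitReason documentation stating that "none" cannot occur in the COLO_EXIT event.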

-[Qemu-devel] [PULL 13/25] savevm: split the process of different stages for loadvm/savevm
+[PULL 5/7] virtio-net: Added eBPF RSS to virtio-net.
-From: Zhang Chen <zhangckid@gmail.com>
+From: Andrew Melnychenko <andrew@daynix.com>
-There are several stages during loadvm/savevm process. In different stage,
+When RSS is enabled the device tries to load the eBPF program
-migration incoming processes different types of sections.
+to select RX virtqueue in the TUN. If eBPF can be loaded
-We want to control these stages more accurately, which will benefit COLO
+the RSS will function also with vhost (works with kernel 5.8 and later).
-performance, we don't have to save type of QEMU_VM_SECTION_START
+Software RSS is used as a fallback with vhost=off when eBPF can't be loaded
-sections every time we do a checkpoint; besides, we want to separate
+or when hash population is requested by the guest.
-the process of saving/loading memory and devices state.
+Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
-So we add two new helper functions: qemu_load_device_state() and
+Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
 qemu_savevm_live_state() to achieve different process during migration.
 Besides, we make qemu_loadvm_state_main() and qemu_save_device_state()
 public, and simplify the codes of qemu_save_device_state() by calling the
 wrapper qemu_savevm_state_header().
 Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
 Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
 Signed-off-by: Zhang Chen <zhangckid@gmail.com>
 Signed-off-by: Zhang Chen <chen.zhang@intel.com>
 Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- migration/colo.c   | 41 ++++++++++++++++++++++++++++++++---------
+ hw/net/vhost_net.c             |   3 ++
- migration/savevm.c | 36 +++++++++++++++++++++++++++++-------
+ hw/net/virtio-net.c            | 116 +++++++++++++++++++++++++++++++++++++++--
- migration/savevm.h |  4 ++++
+ include/hw/virtio/virtio-net.h |   4 ++
-files changed, 65 insertions(+), 16 deletions(-)
+ net/vhost-vdpa.c               |   2 +
+files changed, 122 insertions(+), 3 deletions(-)
-diff --git a/migration/colo.c b/migration/colo.c
-index XXXXXXX..XXXXXXX 100644
+diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
---- a/migration/colo.c
+index XXXXXXX..XXXXXXX 100644
-+++ b/migration/colo.c
+--- a/hw/net/vhost_net.c
 +++ b/hw/net/vhost_net.c
@@ -XXX,XX +XXX,XX @@ static const int kernel_feature_bits[] = {
      VIRTIO_NET_F_MTU,
      VIRTIO_F_IOMMU_PLATFORM,
      VIRTIO_F_RING_PACKED,
 +    VIRTIO_NET_F_HASH_REPORT,
      VHOST_INVALID_FEATURE_BIT
  };
@@ -XXX,XX +XXX,XX @@ static const int user_feature_bits[] = {
      VIRTIO_NET_F_MTU,
      VIRTIO_F_IOMMU_PLATFORM,
      VIRTIO_F_RING_PACKED,
 +    VIRTIO_NET_F_RSS,
 +    VIRTIO_NET_F_HASH_REPORT,
      /* This bit implies RARP isn't sent by QEMU out of band */
      VIRTIO_NET_F_GUEST_ANNOUNCE,
 diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/net/virtio-net.c
 +++ b/hw/net/virtio-net.c
@@ -XXX,XX +XXX,XX @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
          return features;
      }
 -    virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
 -    virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
 +    if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
 +        virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
 +    }
      features = vhost_net_get_features(get_vhost_net(nc->peer), features);
      vdev->backend_features = features;
@@ -XXX,XX +XXX,XX @@ static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
      }
  }
 +static void virtio_net_detach_epbf_rss(VirtIONet *n);
 +
  static void virtio_net_disable_rss(VirtIONet *n)
  {
      if (n->rss_data.enabled) {
          trace_virtio_net_rss_disable();
      }
      n->rss_data.enabled = false;
 +
 +    virtio_net_detach_epbf_rss(n);
 +}
 +
 +static bool virtio_net_attach_ebpf_to_backend(NICState *nic, int prog_fd)
 +{
 +    NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
 +    if (nc == NULL || nc->info->set_steering_ebpf == NULL) {
 +        return false;
 +    }
 +
 +    return nc->info->set_steering_ebpf(nc, prog_fd);
 +}
 +
 +static void rss_data_to_rss_config(struct VirtioNetRssData *data,
 +                                   struct EBPFRSSConfig *config)
 +{
 +    config->redirect = data->redirect;
 +    config->populate_hash = data->populate_hash;
 +    config->hash_types = data->hash_types;
 +    config->indirections_len = data->indirections_len;
 +    config->default_queue = data->default_queue;
 +}
 +
 +static bool virtio_net_attach_epbf_rss(VirtIONet *n)
 +{
 +    struct EBPFRSSConfig config = {};
 +
 +    if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
 +        return false;
 +    }
 +
 +    rss_data_to_rss_config(&n->rss_data, &config);
 +
 +    if (!ebpf_rss_set_all(&n->ebpf_rss, &config,
 +                          n->rss_data.indirections_table, n->rss_data.key)) {
 +        return false;
 +    }
 +
 +    if (!virtio_net_attach_ebpf_to_backend(n->nic, n->ebpf_rss.program_fd)) {
 +        return false;
 +    }
 +
 +    return true;
 +}
 +
 +static void virtio_net_detach_epbf_rss(VirtIONet *n)
 +{
 +    virtio_net_attach_ebpf_to_backend(n->nic, -1);
 +}
 +
 +static bool virtio_net_load_ebpf(VirtIONet *n)
 +{
 +    if (!virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
 +        /* backend doesn't support steering ebpf */
 +        return false;
 +    }
 +
 +    return ebpf_rss_load(&n->ebpf_rss);
 +}
 +
 +static void virtio_net_unload_ebpf(VirtIONet *n)
 +{
 +    virtio_net_attach_ebpf_to_backend(n->nic, -1);
 +    ebpf_rss_unload(&n->ebpf_rss);
  }
  static uint16_t virtio_net_handle_rss(VirtIONet *n,
@@ -XXX,XX +XXX,XX @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
          goto error;
      }
      n->rss_data.enabled = true;
 +
 +    if (!n->rss_data.populate_hash) {
 +        if (!virtio_net_attach_epbf_rss(n)) {
 +            /* EBPF must be loaded for vhost */
 +            if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
 +                warn_report("Can't load eBPF RSS for vhost");
 +                goto error;
 +            }
 +            /* fallback to software RSS */
 +            warn_report("Can't load eBPF RSS - fallback to software RSS");
 +            n->rss_data.enabled_software_rss = true;
 +        }
 +    } else {
 +        /* use software RSS for hash populating */
 +        /* and detach eBPF if was loaded before */
 +        virtio_net_detach_epbf_rss(n);
 +        n->rss_data.enabled_software_rss = true;
 +    }
 +
      trace_virtio_net_rss_enable(n->rss_data.hash_types,
                                  n->rss_data.indirections_len,
                                  temp.b);
@@ -XXX,XX +XXX,XX @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
          return -1;
      }
 -    if (!no_rss && n->rss_data.enabled) {
 +    if (!no_rss && n->rss_data.enabled && n->rss_data.enabled_software_rss) {
          int index = virtio_net_process_rss(nc, buf, size);
          if (index >= 0) {
              NetClientState *nc2 = qemu_get_subqueue(n->nic, index);
@@ -XXX,XX +XXX,XX @@ static int virtio_net_post_load_device(void *opaque, int version_id)
      }
      if (n->rss_data.enabled) {
 +        n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
 +        if (!n->rss_data.populate_hash) {
 +            if (!virtio_net_attach_epbf_rss(n)) {
 +                if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
 +                    warn_report("Can't post-load eBPF RSS for vhost");
 +                } else {
 +                    warn_report("Can't post-load eBPF RSS - "
 +                                "fallback to software RSS");
 +                    n->rss_data.enabled_software_rss = true;
 +                }
 +            }
 +        }
 +
          trace_virtio_net_rss_enable(n->rss_data.hash_types,
                                      n->rss_data.indirections_len,
                                      sizeof(n->rss_data.key));
@@ -XXX,XX +XXX,XX @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
      n->qdev = dev;
      net_rx_pkt_init(&n->rx_pkt, false);
 +
 +    if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
 +        virtio_net_load_ebpf(n);
 +    }
  }
  static void virtio_net_device_unrealize(DeviceState *dev)
@@ -XXX,XX +XXX,XX @@ static void virtio_net_device_unrealize(DeviceState *dev)
      VirtIONet *n = VIRTIO_NET(dev);
      int i, max_queues;
 +    if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
 +        virtio_net_unload_ebpf(n);
 +    }
 +
      /* This will stop vhost backend if appropriate. */
      virtio_net_set_status(vdev, 0);
@@ -XXX,XX +XXX,XX @@ static void virtio_net_instance_init(Object *obj)
      device_add_bootindex_property(obj, &n->nic_conf.bootindex,
                                    "bootindex", "/ethernet-phy@0",
                                    DEVICE(n));
 +
 +    ebpf_rss_init(&n->ebpf_rss);
  }
  static int virtio_net_pre_save(void *opaque)
 diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/hw/virtio/virtio-net.h
 +++ b/include/hw/virtio/virtio-net.h
 @@ -XXX,XX +XXX,XX @@
- #include "block/block.h"
+ #include "qemu/option_int.h"
- #include "qapi/qapi-events-migration.h"
+ #include "qom/object.h"
- #include "qapi/qmp/qerror.h"
-+#include "sysemu/cpus.h"
++#include "ebpf/ebpf_rss.h"
++
- static bool vmstate_loading;
+ #define TYPE_VIRTIO_NET "virtio-net-device"
- static Notifier packets_compare_notifier;
+ OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET)
-@@ -XXX,XX +XXX,XX @@ static int colo_do_checkpoint_transaction(MigrationState *s,
+@@ -XXX,XX +XXX,XX @@ typedef struct VirtioNetRscChain {
-     /* Disable block migration */
-     migrate_set_block_enabled(false, &local_err);
+ typedef struct VirtioNetRssData {
--    qemu_savevm_state_header(fb);
+     bool    enabled;
--    qemu_savevm_state_setup(fb);
++    bool    enabled_software_rss;
-     qemu_mutex_lock_iothread();
+     bool    redirect;
-     replication_do_checkpoint_all(&local_err);
+     bool    populate_hash;
-     if (local_err) {
+     uint32_t hash_types;
-         qemu_mutex_unlock_iothread();
+@@ -XXX,XX +XXX,XX @@ struct VirtIONet {
-         goto out;
+     Notifier migration_state;
-     }
+     VirtioNetRssData rss_data;
--    qemu_savevm_state_complete_precopy(fb, false, false);
+     struct NetRxPkt *rx_pkt;
--    qemu_mutex_unlock_iothread();
++    struct EBPFRSSContext ebpf_rss;
 -
 -    qemu_fflush(fb);
      colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
      if (local_err) {
 +        qemu_mutex_unlock_iothread();
 +        goto out;
 +    }
 +    /* Note: device state is saved into buffer */
 +    ret = qemu_save_device_state(fb);
 +
 +    qemu_mutex_unlock_iothread();
 +    if (ret < 0) {
          goto out;
      }
 +    /*
 +     * Only save VM's live state, which does not include device state.
 +     * TODO: We may need a timeout mechanism to prevent the COLO process
 +     * from being blocked here.
 +     */
 +    qemu_savevm_live_state(s->to_dst_file);
 +
 +    qemu_fflush(fb);
 +
      /*
       * We need the size of the VMstate data in Secondary side,
       * With which we can decide how much data should be read.
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
      uint64_t total_size;
      uint64_t value;
      Error *local_err = NULL;
 +    int ret;
      rcu_register_thread();
      qemu_sem_init(&mis->colo_incoming_sem, 0);
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
              goto out;
          }
 +        qemu_mutex_lock_iothread();
 +        cpu_synchronize_all_pre_loadvm();
 +        ret = qemu_loadvm_state_main(mis->from_src_file, mis);
 +        qemu_mutex_unlock_iothread();
 +
 +        if (ret < 0) {
 +            error_report("Load VM's live state (ram) error");
 +            goto out;
 +        }
 +
          value = colo_receive_message_value(mis->from_src_file,
                                   COLO_MESSAGE_VMSTATE_SIZE, &local_err);
          if (local_err) {
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
          }
          qemu_mutex_lock_iothread();
 -        qemu_system_reset(SHUTDOWN_CAUSE_NONE);
          vmstate_loading = true;
 -        if (qemu_loadvm_state(fb) < 0) {
 -            error_report("COLO: loadvm failed");
 +        ret = qemu_load_device_state(fb);
 +        if (ret < 0) {
 +            error_report("COLO: load device state failed");
              qemu_mutex_unlock_iothread();
              goto out;
          }
 diff --git a/migration/savevm.c b/migration/savevm.c
 index XXXXXXX..XXXXXXX 100644
 --- a/migration/savevm.c
 +++ b/migration/savevm.c
@@ -XXX,XX +XXX,XX @@ done:
      return ret;
  }
 -static int qemu_save_device_state(QEMUFile *f)
 +void qemu_savevm_live_state(QEMUFile *f)
  {
 -    SaveStateEntry *se;
 +    /* save QEMU_VM_SECTION_END section */
 +    qemu_savevm_state_complete_precopy(f, true, false);
 +    qemu_put_byte(f, QEMU_VM_EOF);
 +}
 -    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
 -    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
 +int qemu_save_device_state(QEMUFile *f)
 +{
 +    SaveStateEntry *se;
 +    if (!migration_in_colo_state()) {
 +        qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
 +        qemu_put_be32(f, QEMU_VM_FILE_VERSION);
 +    }
      cpu_synchronize_all_states();
      QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
@@ -XXX,XX +XXX,XX @@ enum LoadVMExitCodes {
      LOADVM_QUIT     =  1,
  };
--static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
+ void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
--
+diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
- /* ------ incoming postcopy messages ------ */
+index XXXXXXX..XXXXXXX 100644
- /* 'advise' arrives before any transfers just to tell us that a postcopy
+--- a/net/vhost-vdpa.c
-  * *might* happen - it might be skipped if precopy transferred everything
++++ b/net/vhost-vdpa.c
-@@ -XXX,XX +XXX,XX @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
+@@ -XXX,XX +XXX,XX @@ const int vdpa_feature_bits[] = {
-     return true;
+     VIRTIO_NET_F_MTU,
- }
+     VIRTIO_F_IOMMU_PLATFORM,
+     VIRTIO_F_RING_PACKED,
--static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
++    VIRTIO_NET_F_RSS,
-+int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
++    VIRTIO_NET_F_HASH_REPORT,
- {
+     VIRTIO_NET_F_GUEST_ANNOUNCE,
-     uint8_t section_type;
+     VIRTIO_NET_F_STATUS,
-     int ret = 0;
+     VHOST_INVALID_FEATURE_BIT
@@ -XXX,XX +XXX,XX @@ int qemu_loadvm_state(QEMUFile *f)
      return ret;
  }
 +int qemu_load_device_state(QEMUFile *f)
 +{
 +    MigrationIncomingState *mis = migration_incoming_get_current();
 +    int ret;
 +
 +    /* Load QEMU_VM_SECTION_FULL section */
 +    ret = qemu_loadvm_state_main(f, mis);
 +    if (ret < 0) {
 +        error_report("Failed to load device state: %d", ret);
 +        return ret;
 +    }
 +
 +    cpu_synchronize_all_post_init();
 +    return 0;
 +}
 +
  int save_snapshot(const char *name, Error **errp)
  {
      BlockDriverState *bs, *bs1;
 diff --git a/migration/savevm.h b/migration/savevm.h
 index XXXXXXX..XXXXXXX 100644
 --- a/migration/savevm.h
 +++ b/migration/savevm.h
@@ -XXX,XX +XXX,XX @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                             uint64_t *start_list,
                                             uint64_t *length_list);
  void qemu_savevm_send_colo_enable(QEMUFile *f);
 +void qemu_savevm_live_state(QEMUFile *f);
 +int qemu_save_device_state(QEMUFile *f);
  int qemu_loadvm_state(QEMUFile *f);
  void qemu_loadvm_state_cleanup(void);
 +int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 +int qemu_load_device_state(QEMUFile *f);
  #endif
 --
-.17.1
+.7.4
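
Note on the patch above: the steering decision made in virtio_net_handle_rss() can be summarised as a small decision table. The sketch below is an assumption-laden restatement of that hunk, not the QEMU code. sketch_pick_rss() and the SK_RSS_* values are hypothetical names, and the three booleans stand for n->rss_data.populate_hash, the result of virtio_net_attach_epbf_rss(), and whether the peer is a vhost backend.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of enabling RSS; not a QEMU type. */
enum sketch_rss_path {
    SK_RSS_EBPF,      /* steering done by the eBPF program in the TAP */
    SK_RSS_SOFTWARE,  /* steering done in QEMU (enabled_software_rss) */
    SK_RSS_ERROR      /* request rejected */
};

static enum sketch_rss_path sketch_pick_rss(bool populate_hash,
                                            bool ebpf_attached,
                                            bool vhost)
{
    if (populate_hash) {
        /* Hash population is only done by software RSS. */
        return SK_RSS_SOFTWARE;
    }
    if (ebpf_attached) {
        return SK_RSS_EBPF;
    }
    /* Without eBPF, vhost cannot steer; otherwise fall back to software. */
    return vhost ? SK_RSS_ERROR : SK_RSS_SOFTWARE;
}

/* Quick self-check of the decision table; call from a test or main(). */
static void sketch_rss_self_check(void)
{
    assert(sketch_pick_rss(true, true, true) == SK_RSS_SOFTWARE);
    assert(sketch_pick_rss(false, true, true) == SK_RSS_EBPF);
    assert(sketch_pick_rss(false, false, true) == SK_RSS_ERROR);
    assert(sketch_pick_rss(false, false, false) == SK_RSS_SOFTWARE);
}
```

The sketch models the control-queue path only. In virtio_net_post_load_device() the patch handles the vhost case differently: it emits a warning and continues instead of failing.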

-[Qemu-devel] [PULL 14/25] COLO: flush host dirty ram from cache
+Deleted patch
-From: zhanghailiang <zhang.zhanghailiang@huawei.com>
-There is no need to flush all of the VM's RAM from the cache; only
-flush the pages dirtied since the last checkpoint.
-Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
-Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- migration/ram.c | 9 +++++++++
-file changed, 9 insertions(+)
-diff --git a/migration/ram.c b/migration/ram.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/ram.c
-+++ b/migration/ram.c
-@@ -XXX,XX +XXX,XX @@ int colo_init_ram_cache(void)
-     }
-     ram_state = g_new0(RAMState, 1);
-     ram_state->migration_dirty_pages = 0;
-+    memory_global_dirty_log_start();
-     return 0;
-@@ -XXX,XX +XXX,XX @@ void colo_release_ram_cache(void)
- {
-     RAMBlock *block;
-+    memory_global_dirty_log_stop();
-     RAMBLOCK_FOREACH_MIGRATABLE(block) {
-         g_free(block->bmap);
-         block->bmap = NULL;
-@@ -XXX,XX +XXX,XX @@ static void colo_flush_ram_cache(void)
-     void *src_host;
-     unsigned long offset = 0;
-+    memory_global_dirty_log_sync();
-+    rcu_read_lock();
-+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
-+        migration_bitmap_sync_range(ram_state, block, 0, block->used_length);
-+    }
-+    rcu_read_unlock();
-+
-     trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages);
-     rcu_read_lock();
-     block = QLIST_FIRST_RCU(&ram_list.blocks);
---
-.17.1

-[Qemu-devel] [PULL 16/25] filter-rewriter: handle checkpoint and failover event
+Deleted patch
-From: Zhang Chen <zhangckid@gmail.com>
-After one round of checkpointing, the states of the PVM and SVM
-become consistent, so it is unnecessary to adjust the sequence
-numbers of net packets for old connections. Besides, when failover
-happens, filter-rewriter enters failover mode, in which it needn't
-handle new TCP connections.
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- net/colo-compare.c    | 12 ++++-----
- net/colo.c            |  8 ++++++
- net/colo.h            |  2 ++
- net/filter-rewriter.c | 57 +++++++++++++++++++++++++++++++++++++++++++
-files changed, 73 insertions(+), 6 deletions(-)
-diff --git a/net/colo-compare.c b/net/colo-compare.c
-index XXXXXXX..XXXXXXX 100644
---- a/net/colo-compare.c
-+++ b/net/colo-compare.c
-@@ -XXX,XX +XXX,XX @@ enum {
-     SECONDARY_IN,
- };
-+static void colo_compare_inconsistency_notify(void)
-+{
-+    notifier_list_notify(&colo_compare_notifiers,
-+                migrate_get_current());
-+}
-+
- static int compare_chr_send(CompareState *s,
-                             const uint8_t *buf,
-                             uint32_t size,
-@@ -XXX,XX +XXX,XX @@ static bool colo_mark_tcp_pkt(Packet *ppkt, Packet *spkt,
-     return false;
- }
--static void colo_compare_inconsistency_notify(void)
--{
--    notifier_list_notify(&colo_compare_notifiers,
--                migrate_get_current());
--}
--
- static void colo_compare_tcp(CompareState *s, Connection *conn)
- {
-     Packet *ppkt = NULL, *spkt = NULL;
-diff --git a/net/colo.c b/net/colo.c
-index XXXXXXX..XXXXXXX 100644
---- a/net/colo.c
-+++ b/net/colo.c
-@@ -XXX,XX +XXX,XX @@ Connection *connection_get(GHashTable *connection_track_table,
-     return conn;
- }
-+
-+bool connection_has_tracked(GHashTable *connection_track_table,
-+                            ConnectionKey *key)
-+{
-+    Connection *conn = g_hash_table_lookup(connection_track_table, key);
-+
-+    return conn ? true : false;
-+}
-diff --git a/net/colo.h b/net/colo.h
-index XXXXXXX..XXXXXXX 100644
---- a/net/colo.h
-+++ b/net/colo.h
-@@ -XXX,XX +XXX,XX @@ void connection_destroy(void *opaque);
- Connection *connection_get(GHashTable *connection_track_table,
-                            ConnectionKey *key,
-                            GQueue *conn_list);
-+bool connection_has_tracked(GHashTable *connection_track_table,
-+                            ConnectionKey *key);
- void connection_hashtable_reset(GHashTable *connection_track_table);
- Packet *packet_new(const void *data, int size, int vnet_hdr_len);
- void packet_destroy(void *opaque, void *user_data);
-diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
-index XXXXXXX..XXXXXXX 100644
---- a/net/filter-rewriter.c
-+++ b/net/filter-rewriter.c
-@@ -XXX,XX +XXX,XX @@
- #include "qemu/main-loop.h"
- #include "qemu/iov.h"
- #include "net/checksum.h"
-+#include "net/colo.h"
-+#include "migration/colo.h"
- #define FILTER_COLO_REWRITER(obj) \
-     OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER)
- #define TYPE_FILTER_REWRITER "filter-rewriter"
-+#define FAILOVER_MODE_ON  true
-+#define FAILOVER_MODE_OFF false
- typedef struct RewriterState {
-     NetFilterState parent_obj;
-@@ -XXX,XX +XXX,XX @@ typedef struct RewriterState {
-     /* hashtable to save connection */
-     GHashTable *connection_track_table;
-     bool vnet_hdr;
-+    bool failover_mode;
- } RewriterState;
-+static void filter_rewriter_failover_mode(RewriterState *s)
-+{
-+    s->failover_mode = FAILOVER_MODE_ON;
-+}
-+
- static void filter_rewriter_flush(NetFilterState *nf)
- {
-     RewriterState *s = FILTER_COLO_REWRITER(nf);
-@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
-              */
-             reverse_connection_key(&key);
-         }
-+
-+        /* After failover, new (untracked) TCP connections need no rewriting */
-+        if (s->failover_mode &&
-+            !connection_has_tracked(s->connection_track_table, &key)) {
-+            goto out;
-+        }
-+
-         conn = connection_get(s->connection_track_table,
-                               &key,
-                               NULL);
-@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
-         }
-     }
-+out:
-     packet_destroy(pkt, NULL);
-     pkt = NULL;
-     return 0;
- }
-+static void reset_seq_offset(gpointer key, gpointer value, gpointer user_data)
-+{
-+    Connection *conn = (Connection *)value;
-+
-+    conn->offset = 0;
-+}
-+
-+static gboolean offset_is_nonzero(gpointer key,
-+                                  gpointer value,
-+                                  gpointer user_data)
-+{
-+    Connection *conn = (Connection *)value;
-+
-+    return conn->offset ? true : false;
-+}
-+
-+static void colo_rewriter_handle_event(NetFilterState *nf, int event,
-+                                       Error **errp)
-+{
-+    RewriterState *rs = FILTER_COLO_REWRITER(nf);
-+
-+    switch (event) {
-+    case COLO_EVENT_CHECKPOINT:
-+        g_hash_table_foreach(rs->connection_track_table,
-+                            reset_seq_offset, NULL);
-+        break;
-+    case COLO_EVENT_FAILOVER:
-+        if (!g_hash_table_find(rs->connection_track_table,
-+                              offset_is_nonzero, NULL)) {
-+            filter_rewriter_failover_mode(rs);
-+        }
-+        break;
-+    default:
-+        break;
-+    }
-+}
-+
- static void colo_rewriter_cleanup(NetFilterState *nf)
- {
-     RewriterState *s = FILTER_COLO_REWRITER(nf);
-@@ -XXX,XX +XXX,XX @@ static void filter_rewriter_init(Object *obj)
-     RewriterState *s = FILTER_COLO_REWRITER(obj);
-     s->vnet_hdr = false;
-+    s->failover_mode = FAILOVER_MODE_OFF;
-     object_property_add_bool(obj, "vnet_hdr_support",
-                              filter_rewriter_get_vnet_hdr,
-                              filter_rewriter_set_vnet_hdr, NULL);
-@@ -XXX,XX +XXX,XX @@ static void colo_rewriter_class_init(ObjectClass *oc, void *data)
-     nfc->setup = colo_rewriter_setup;
-     nfc->cleanup = colo_rewriter_cleanup;
-     nfc->receive_iov = colo_rewriter_receive_iov;
-+    nfc->handle_event = colo_rewriter_handle_event;
- }
- static const TypeInfo colo_rewriter_info = {
---
-.17.1
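
Reviewer sketch for the handle_event logic in the patch above: a checkpoint resets every tracked sequence offset, and failover mode is only entered once no tracked connection still carries a non-zero offset. This is a minimal stand-alone model, assuming a plain array in place of the GHashTable; the Toy* names and TOY_EVENT_* values are hypothetical and are not QEMU identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Connection and RewriterState. */
typedef struct {
    uint32_t offset;            /* secondary_seq - primary_seq */
} ToyConn;

typedef struct {
    ToyConn *conns;             /* models connection_track_table */
    size_t n;
    bool failover_mode;
} ToyRewriter;

enum { TOY_EVENT_CHECKPOINT, TOY_EVENT_FAILOVER };

static void toy_handle_event(ToyRewriter *rs, int event)
{
    size_t i;

    assert(rs != NULL);
    switch (event) {
    case TOY_EVENT_CHECKPOINT:
        /* Both sides are in sync again, so every offset is reset. */
        for (i = 0; i < rs->n; i++) {
            rs->conns[i].offset = 0;
        }
        break;
    case TOY_EVENT_FAILOVER:
        /* Stay in rewriting mode while any connection needs fixing up. */
        for (i = 0; i < rs->n; i++) {
            if (rs->conns[i].offset != 0) {
                return;
            }
        }
        rs->failover_mode = true;
        break;
    default:
        break;
    }
}
```

With one connection at offset 7, a failover event leaves failover_mode off; after a checkpoint event clears the offset, the next failover event switches it on.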

-[Qemu-devel] [PULL 17/25] COLO: notify net filters about checkpoint/failover event
+Deleted patch
-From: zhanghailiang <zhang.zhanghailiang@huawei.com>
-Notify all net filters about the checkpoint and failover event.
-Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
-Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- migration/colo.c | 15 +++++++++++++++
-file changed, 15 insertions(+)
-diff --git a/migration/colo.c b/migration/colo.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/colo.c
-+++ b/migration/colo.c
-@@ -XXX,XX +XXX,XX @@
- #include "qapi/qapi-events-migration.h"
- #include "qapi/qmp/qerror.h"
- #include "sysemu/cpus.h"
-+#include "net/filter.h"
- static bool vmstate_loading;
- static Notifier packets_compare_notifier;
-@@ -XXX,XX +XXX,XX @@ static void secondary_vm_do_failover(void)
-         error_report_err(local_err);
-     }
-+    /* Notify all filters of all NICs about the failover */
-+    colo_notify_filters_event(COLO_EVENT_FAILOVER, &local_err);
-+    if (local_err) {
-+        error_report_err(local_err);
-+    }
-+
-     if (!autostart) {
-         error_report("\"-S\" qemu option will be ignored in secondary side");
-         /* recover runstate to normal migration finish state */
-@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
-             goto out;
-         }
-+        /* Notify all filters of all NIC to do checkpoint */
-+        colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err);
-+
-+        if (local_err) {
-+            qemu_mutex_unlock_iothread();
-+            goto out;
-+        }
-+
-         vmstate_loading = false;
-         vm_start();
-         trace_colo_vm_state_change("stop", "run");
---
-.17.1

-[Qemu-devel] [PULL 24/25] net: ignore packet size greater than INT_MAX
+[PULL 6/7] docs: Added eBPF documentation.
-There should not be a reason for passing a packet size greater than
+From: Andrew Melnychenko <andrew@daynix.com>
 INT_MAX. It's usually a hint of bug somewhere, so ignore packet size
 greater than INT_MAX in qemu_deliver_packet_iov()
-CC: qemu-stable@nongnu.org
+Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
-Reported-by: Daniel Shapira <daniel@twistlock.com>
+Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
 Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- net/net.c | 7 ++++++-
+ docs/devel/ebpf_rss.rst | 125 ++++++++++++++++++++++++++++++++++++++++++++++++
-file changed, 6 insertions(+), 1 deletion(-)
+ docs/devel/index.rst    |   1 +
 files changed, 126 insertions(+)
  create mode 100644 docs/devel/ebpf_rss.rst
-diff --git a/net/net.c b/net/net.c
+diff --git a/docs/devel/ebpf_rss.rst b/docs/devel/ebpf_rss.rst
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/docs/devel/ebpf_rss.rst
@@ -XXX,XX +XXX,XX @@
 +===========================
 +eBPF RSS virtio-net support
 +===========================
 +
 +RSS (Receive Side Scaling) distributes network packets across guest virtqueues
 +by calculating a packet hash. Usually each queue is then processed by a specific guest CPU core.
 +
 +For now there are 2 RSS implementations in qemu:
 +- 'in-qemu' RSS (functions if qemu receives network packets, i.e. vhost=off)
 +- eBPF RSS (can also function with vhost=on)
 +
 +eBPF support (CONFIG_EBPF) is enabled by 'configure' script.
 +To enable eBPF RSS support use './configure --enable-bpf'.
 +
 +If steering BPF is not set for the kernel's TUN module, TUN automatically selects the
 +rx virtqueue using a lookup table built from the calculated symmetric hash
 +of transmitted packets.
 +If steering BPF is set for TUN, the BPF code calculates the hash of the packet header and
 +returns the number of the virtqueue in which to place the packet.
 +
 +Simplified decision formula:
 +
 +.. code:: C
 +
 +    queue_index = indirection_table[hash(<packet data>)%<indirection_table size>]
 +
 +
 +The hash cannot, or should not, be calculated for every packet.
 +
 +Note: currently, eBPF RSS does not support hash reporting.
 +
 +eBPF RSS is turned on by different combinations of vhost-net, virtio-net and tap configurations:
 +
 +- eBPF is used:
 +
 +        tap,vhost=off & virtio-net-pci,rss=on,hash=off
 +
 +- eBPF is used:
 +
 +        tap,vhost=on & virtio-net-pci,rss=on,hash=off
 +
 +- 'in-qemu' RSS is used:
 +
 +        tap,vhost=off & virtio-net-pci,rss=on,hash=on
 +
 +- eBPF is used, hash population feature is not reported to the guest:
 +
 +        tap,vhost=on & virtio-net-pci,rss=on,hash=on
 +
 +If CONFIG_EBPF is not set then only 'in-qemu' RSS is supported.
 +'in-qemu' RSS is also used as a fallback if the eBPF program fails to load or cannot be set for TUN.
 +
 +RSS eBPF program
 +----------------
 +
 +The RSS program is located in ebpf/rss.bpf.skeleton.h, which is generated by bpftool,
 +so the program is part of the qemu binary.
 +The eBPF program was originally compiled by clang; its source code is located at tools/ebpf/rss.bpf.c.
 +Prerequisites to recompile the eBPF program (regenerate ebpf/rss.bpf.skeleton.h):
 +
 +        llvm, clang, kernel source tree, bpftool
 +        Adjust Makefile.ebpf to reflect the location of the kernel source tree
 +
 +        $ cd tools/ebpf
 +        $ make -f Makefile.ebpf
 +
 +The current eBPF RSS implementation uses 'bounded loops' with 'backward jump instructions', which are present in recent kernels.
 +Overall eBPF RSS works on kernels 5.8+.
 +
 +eBPF RSS implementation
 +-----------------------
 +
 +eBPF RSS loading functionality is located in ebpf/ebpf_rss.c and ebpf/ebpf_rss.h.
 +
 +The `struct EBPFRSSContext` structure holds the libbpf context pointer and 4 file descriptors:
 +
 +- ctx - pointer to the libbpf context.
 +- program_fd - file descriptor of the eBPF RSS program.
 +- map_configuration - file descriptor of the 'configuration' map. This map contains one element of 'struct EBPFRSSConfig'. This configuration determines eBPF program behavior.
 +- map_toeplitz_key - file descriptor of the 'Toeplitz key' map. One element of the 40-byte key prepared for the hashing algorithm.
 +- map_indirections_table - file descriptor of the 'indirections table' map. 128 elements of queue indexes.
 +
 +`struct EBPFRSSConfig` fields:
 +
 +- redirect - "boolean" value that tells whether the hash should be calculated. If false, `default_queue` is used as the final decision.
 +- populate_hash - for now, not used. eBPF RSS doesn't support hash reporting.
 +- hash_types - binary mask of different hash types. See `VIRTIO_NET_RSS_HASH_TYPE_*` defines. If the hash should not be calculated for a packet, `default_queue` is used.
 +- indirections_len - length of the indirections table, maximum 128.
 +- default_queue - the queue index used for packets that shouldn't be hashed. For some packets, the hash can't be calculated (e.g. ARP).
 +
 +Functions:
 +
 +- `ebpf_rss_init()` - sets ctx to NULL, which indicates that EBPFRSSContext is not loaded.
 +- `ebpf_rss_load()` - creates 3 maps and loads eBPF program from the rss.bpf.skeleton.h. Returns 'true' on success. After that, program_fd can be used to set steering for TAP.
 +- `ebpf_rss_set_all()` - sets values for eBPF maps. `indirections_table` length is in EBPFRSSConfig. `toeplitz_key` is VIRTIO_NET_RSS_MAX_KEY_SIZE aka 40 bytes array.
 +- `ebpf_rss_unload()` - closes all file descriptors and sets ctx to NULL.
 +
 +Simplified eBPF RSS workflow:
 +
 +.. code:: C
 +
 +    struct EBPFRSSConfig config;
 +    config.redirect = 1;
 +    config.hash_types = VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | VIRTIO_NET_RSS_HASH_TYPE_TCPv4;
 +    config.indirections_len = VIRTIO_NET_RSS_MAX_TABLE_LEN;
 +    config.default_queue = 0;
 +
 +    uint16_t table[VIRTIO_NET_RSS_MAX_TABLE_LEN] = {...};
 +    uint8_t key[VIRTIO_NET_RSS_MAX_KEY_SIZE] = {...};
 +
 +    struct EBPFRSSContext ctx;
 +    ebpf_rss_init(&ctx);
 +    ebpf_rss_load(&ctx);
 +    ebpf_rss_set_all(&ctx, &config, table, key);
 +    if (net_client->info->set_steering_ebpf != NULL) {
 +        net_client->info->set_steering_ebpf(net_client, ctx.program_fd);
 +    }
 +    ...
 +    ebpf_rss_unload(&ctx);
 +
 +
 +NetClientState SetSteeringEBPF()
 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 +
 +For now, the `set_steering_ebpf()` method is supported only by the Linux TAP NetClientState. The method requires an eBPF program file descriptor as an argument.
 diff --git a/docs/devel/index.rst b/docs/devel/index.rst
 index XXXXXXX..XXXXXXX 100644
---- a/net/net.c
+--- a/docs/devel/index.rst
-+++ b/net/net.c
++++ b/docs/devel/index.rst
-@@ -XXX,XX +XXX,XX @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
+@@ -XXX,XX +XXX,XX @@ Contents:
-                                 void *opaque)
+    qom
- {
+    block-coroutine-wrapper
-     NetClientState *nc = opaque;
+    multi-process
-+    size_t size = iov_size(iov, iovcnt);
++   ebpf_rss
      int ret;
 +    if (size > INT_MAX) {
 +        return size;
 +    }
 +
      if (nc->link_down) {
 -        return iov_size(iov, iovcnt);
 +        return size;
      }
      if (nc->receive_disabled) {
 --
-.17.1
+.7.4
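
Reviewer sketch for the "simplified decision formula" in docs/devel/ebpf_rss.rst above. It is a minimal user-space model, not the eBPF program itself: rss_pick_queue() and its hashable parameter are hypothetical names, and the hash is passed in rather than computed with Toeplitz. It only shows how the indirection table and default_queue interact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the rx queue for one packet.
 * hashable == 0 models packets for which no hash is calculated
 * (for example ARP), which fall back to default_queue.
 */
static uint16_t rss_pick_queue(const uint16_t *indirection_table,
                               size_t indirections_len,
                               uint32_t hash,
                               int hashable,
                               uint16_t default_queue)
{
    assert(indirection_table != NULL);
    assert(indirections_len > 0);   /* avoids a modulo by zero */

    if (!hashable) {
        return default_queue;
    }
    /* queue_index = indirection_table[hash % table size] */
    return indirection_table[hash % indirections_len];
}
```

For a 4-entry table {0, 1, 2, 3}, a hash of 6 selects entry 6 % 4 = 2, while an unhashable packet goes to default_queue regardless of the hash value.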

-[Qemu-devel] [PULL 18/25] COLO: quick failover process by kick COLO thread
+[PULL 7/7] MAINTAINERS: Added eBPF maintainers information.
-From: zhanghailiang <zhang.zhanghailiang@huawei.com>
+From: Andrew Melnychenko <andrew@daynix.com>
-COLO thread may sleep at qemu_sem_wait(&s->colo_checkpoint_sem),
+Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
-when failover begins; it is better to wake it up to speed up
+Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
 the process.
 Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
 Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
 Signed-off-by: Jason Wang <jasowang@redhat.com>
 ---
- migration/colo.c | 8 ++++++++
+ MAINTAINERS | 8 ++++++++
 file changed, 8 insertions(+)
-diff --git a/migration/colo.c b/migration/colo.c
+diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
---- a/migration/colo.c
+--- a/MAINTAINERS
-+++ b/migration/colo.c
++++ b/MAINTAINERS
-@@ -XXX,XX +XXX,XX @@ static void primary_vm_do_failover(void)
+@@ -XXX,XX +XXX,XX @@ F: include/hw/remote/proxy-memory-listener.h
+ F: hw/remote/iohub.c
-     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
+ F: include/hw/remote/iohub.h
-                       MIGRATION_STATUS_COMPLETED);
-+    /*
++EBPF:
-+     * kick COLO thread which might wait at
++M: Jason Wang <jasowang@redhat.com>
-+     * qemu_sem_wait(&s->colo_checkpoint_sem).
++R: Andrew Melnychenko <andrew@daynix.com>
-+     */
++R: Yuri Benditovich <yuri.benditovich@daynix.com>
-+    colo_checkpoint_notify(migrate_get_current());
++S: Maintained
++F: ebpf/*
-     /*
++F: tools/ebpf/*
-      * Wake up COLO thread which may blocked in recv() or send(),
++
-@@ -XXX,XX +XXX,XX @@ static void colo_process_checkpoint(MigrationState *s)
+ Build and test automation
+ -------------------------
-         qemu_sem_wait(&s->colo_checkpoint_sem);
+ Build and test automation, general continuous integration
 +        if (s->state != MIGRATION_STATUS_COLO) {
 +            goto out;
 +        }
          ret = colo_do_checkpoint_transaction(s, bioc, fb);
          if (ret < 0) {
              goto out;
 --
-.17.1
+.7.4

-[Qemu-devel] [PULL 19/25] docs: Add COLO status diagram to COLO-FT.txt
+Deleted patch
-From: Zhang Chen <chen.zhang@intel.com>
-This diagram helps users better understand COLO.
-Suggested by Markus Armbruster.
-Signed-off-by: Zhang Chen <zhangckid@gmail.com>
-Signed-off-by: Zhang Chen <chen.zhang@intel.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- docs/COLO-FT.txt | 34 ++++++++++++++++++++++++++++++++++
-file changed, 34 insertions(+)
-diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
-index XXXXXXX..XXXXXXX 100644
---- a/docs/COLO-FT.txt
-+++ b/docs/COLO-FT.txt
-@@ -XXX,XX +XXX,XX @@ Note:
- HeartBeat has not been implemented yet, so you need to trigger failover process
- by using 'x-colo-lost-heartbeat' command.
-+== COLO operation status ==
-+
-++-----------------+
-+|                 |
-+|    Start COLO   |
-+|                 |
-++--------+--------+
-+         |
-+         |  Main qmp command:
-+         |  migrate-set-capabilities with x-colo
-+         |  migrate
-+         |
-+         v
-++--------+--------+
-+|                 |
-+|  COLO running   |
-+|                 |
-++--------+--------+
-+         |
-+         |  Main qmp command:
-+         |  x-colo-lost-heartbeat
-+         |  or
-+         |  some error happened
-+         v
-++--------+--------+
-+|                 |  send qmp event:
-+|  COLO failover  |  COLO_EXIT
-+|                 |
-++-----------------+
-+
-+COLO uses QMP commands to switch and report its operation status.
-+The diagram shows only the main QMP commands; details are given
-+in the test procedure.
-+
- == Test procedure ==
-. Startup qemu
- Primary:
---
-.17.1

-[Qemu-devel] [PULL 20/25] clean up callback when del virtqueue
+Deleted patch
-From: liujunjie <liujunjie23@huawei.com>
-Previously, callbacks such as handle_output were not cleared when a
-virtqueue was deleted, which may result in a segmentation fault.
-The scenario is as follows:
-. Start a VM with multiqueue vhost-net,
-. then write VIRTIO_PCI_GUEST_FEATURES in PCI configuration to
-trigger multiqueue disable in this VM, which deletes the virtqueue.
-In this step, the tx_bh is deleted but the callback virtio_net_handle_tx_bh
-still exists.
-. Finally, write VIRTIO_PCI_QUEUE_NOTIFY in PCI configuration to
-notify the deleted virtqueue. In this way, virtio_net_handle_tx_bh
-will be called and QEMU will crash.
-Although the scenario described above is uncommon, we had better guard against it.
-CC: qemu-stable@nongnu.org
-Signed-off-by: liujunjie <liujunjie23@huawei.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- hw/virtio/virtio.c | 2 ++
-file changed, 2 insertions(+)
-diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/virtio/virtio.c
-+++ b/hw/virtio/virtio.c
-@@ -XXX,XX +XXX,XX @@ void virtio_del_queue(VirtIODevice *vdev, int n)
-     vdev->vq[n].vring.num = 0;
-     vdev->vq[n].vring.num_default = 0;
-+    vdev->vq[n].handle_output = NULL;
-+    vdev->vq[n].handle_aio_output = NULL;
- }
- static void virtio_set_isr(VirtIODevice *vdev, int value)
---
-.17.1
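
Reviewer sketch of why clearing the callback matters in the patch above. It is a toy model, assuming the notify path checks the handler pointer before calling it; ToyQueue, toy_del_queue() and toy_notify() are hypothetical names and do not mirror the real VirtQueue layout.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyQueue ToyQueue;
typedef void (*ToyHandler)(ToyQueue *q);

struct ToyQueue {
    unsigned num;               /* ring size, 0 once the queue is deleted */
    ToyHandler handle_output;   /* called when the guest kicks the queue */
    int kicks;                  /* how many times the handler actually ran */
};

static void toy_handle_tx(ToyQueue *q)
{
    q->kicks++;
}

/* Deleting the queue also drops the callback, as the patch does. */
static void toy_del_queue(ToyQueue *q)
{
    assert(q != NULL);
    q->num = 0;
    q->handle_output = NULL;
}

/* Returns 1 if the handler ran, 0 if the notification was ignored. */
static int toy_notify(ToyQueue *q)
{
    if (q->handle_output) {
        q->handle_output(q);
        return 1;
    }
    return 0;
}
```

A kick on a live queue runs the handler; a kick after toy_del_queue() is ignored instead of reaching a handler whose resources (tx_bh in the report) are already gone.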

-[Qemu-devel] [PULL 21/25] ne2000: fix possible out of bound access in ne2000_receive
+Deleted patch
-In ne2000_receive(), we assign size_ to size, which converts
-from size_t to int. This causes trouble when size_ is greater than
-INT_MAX: size becomes negative and can then pass
-the check of size < MIN_BUF_SIZE, which may lead to out-of-bounds access
-for both buf and buf1.
-Fix this by changing the type of size to size_t.
-CC: qemu-stable@nongnu.org
-Reported-by: Daniel Shapira <daniel@twistlock.com>
-Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- hw/net/ne2000.c | 4 ++--
-file changed, 2 insertions(+), 2 deletions(-)
-diff --git a/hw/net/ne2000.c b/hw/net/ne2000.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/net/ne2000.c
-+++ b/hw/net/ne2000.c
-@@ -XXX,XX +XXX,XX @@ static int ne2000_buffer_full(NE2000State *s)
- ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
- {
-     NE2000State *s = qemu_get_nic_opaque(nc);
--    int size = size_;
-+    size_t size = size_;
-     uint8_t *p;
-     unsigned int total_len, next, avail, len, index, mcast_idx;
-     uint8_t buf1[60];
-@@ -XXX,XX +XXX,XX @@ ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
-         { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
- #if defined(DEBUG_NE2000)
--    printf("NE2000: received len=%d\n", size);
-+    printf("NE2000: received len=%zu\n", size);
- #endif
-     if (s->cmd & E8390_STOP || ne2000_buffer_full(s))
---
-.17.1
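
Reviewer sketch of the size_t to int narrowing that this patch and the related "net: ignore packet size greater than INT_MAX" patch address. The helper names are hypothetical; MIN_BUF_SIZE is set to 60 here as an assumption, matching the 60-byte buf1 in the diff. Converting an out-of-range size_t to int is implementation-defined in C, and on common compilers the value wraps to a negative number, so the narrowed result for a huge packet is deliberately not asserted.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MIN_BUF_SIZE 60   /* assumption: minimum frame size used by the check */

/* Old pattern: the length is narrowed to int before the comparison. */
static bool looks_short_narrowed(size_t size_)
{
    /* Implementation-defined when size_ > INT_MAX; commonly negative. */
    int size = (int)size_;

    return size < MIN_BUF_SIZE;
}

/* Fixed pattern: the comparison is done on size_t, so no sign flip. */
static bool looks_short_size_t(size_t size_)
{
    size_t size = size_;

    return size < MIN_BUF_SIZE;
}

/* Guard in the spirit of the net-layer check: reject oversized packets. */
static bool size_is_deliverable(size_t size)
{
    return size <= (size_t)INT_MAX;
}
```

With a length of INT_MAX + 1, looks_short_size_t() reports false and size_is_deliverable() rejects the packet, whereas the narrowed variant may classify the same length as a short packet.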

-[Qemu-devel] [PULL 22/25] rtl8139: fix possible out of bound access
+Deleted patch
-In rtl8139_do_receive(), we assign size_ to size, which converts
-from size_t to int. This causes trouble when size_ is greater than
-INT_MAX: size becomes negative and can then pass
-the check of size < MIN_BUF_SIZE, which may lead to out-of-bounds access
-for both buf and buf1.
-Fix this by changing the type of size to size_t.
-CC: qemu-stable@nongnu.org
-Reported-by: Daniel Shapira <daniel@twistlock.com>
-Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
-Signed-off-by: Jason Wang <jasowang@redhat.com>
----
- hw/net/rtl8139.c | 8 ++++----
-file changed, 4 insertions(+), 4 deletions(-)
-diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/net/rtl8139.c
-+++ b/hw/net/rtl8139.c
-@@ -XXX,XX +XXX,XX @@ static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t
-     RTL8139State *s = qemu_get_nic_opaque(nc);
-     PCIDevice *d = PCI_DEVICE(s);
-     /* size is the length of the buffer passed to the driver */
--    int size = size_;
-+    size_t size = size_;
-     const uint8_t *dot1q_buf = NULL;
-     uint32_t packet_header = 0;
-@@ -XXX,XX +XXX,XX @@ static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t
-     static const uint8_t broadcast_macaddr[6] =
-         { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
--    DPRINTF(">>> received len=%d\n", size);
-+    DPRINTF(">>> received len=%zu\n", size);
-     /* test if board clock is stopped */
-     if (!s->clock_enabled)
-@@ -XXX,XX +XXX,XX @@ static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t
-         if (size+4 > rx_space)
-         {
--            DPRINTF("C+ Rx mode : descriptor %d size %d received %d + 4\n",
-+            DPRINTF("C+ Rx mode : descriptor %d size %d received %zu + 4\n",
-                 descriptor, rx_space, size);
-             s->IntrStatus |= RxOverflow;
-@@ -XXX,XX +XXX,XX @@ static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t
-         if (avail != 0 && RX_ALIGN(size + 8) >= avail)
-         {
-             DPRINTF("rx overflow: rx buffer length %d head 0x%04x "
--                "read 0x%04x === available 0x%04x need 0x%04x\n",
-+                "read 0x%04x === available 0x%04x need 0x%04zx\n",
-                 s->RxBufferSize, s->RxBufAddr, s->RxBufPtr, avail, size + 8);
-             s->IntrStatus |= RxOverflow;
---
-.17.1

-[Qemu-devel] [PULL 25/25] e1000: indicate dropped packets in HW counters
+Deleted patch
-From: Martin Wilck <mwilck@suse.com>
-The e1000 emulation silently discards RX packets if there's
-insufficient space in the ring buffer. This leads to errors
-on higher-level protocols in the guest, with no indication
-about the error cause.
-This patch increments the "Missed Packets Count" (MPC) and
-"Receive No Buffers Count" (RNBC) HW counters in this case.
-As the emulation has no FIFO for buffering packets that can't
-immediately be pushed to the guest, these two registers are
-practically equivalent (see 10.2.7.4, 10.2.7.33 in
-https://www.intel.com/content/www/us/en/embedded/products/networking/82574l-gbe-controller-datasheet.html).
-On a Linux guest, the register content  will be reflected in
-the "rx_missed_errors" and "rx_no_buffer_count" stats from
-"ethtool -S", and in the "missed" stat from "ip -s -s link show",
-giving at least some hint about the error cause inside the guest.
-If the cause is known, problems like this can often be avoided
-easily, by increasing the number of RX descriptors in the guest
-e1000 driver (e.g under Linux, "e1000.RxDescriptors=1024").
-The patch also adds a qemu trace message for this condition.
-Signed-off-by: Martin Wilck <mwilck@suse.com>
----
- hw/net/e1000.c      | 16 +++++++++++++---
- hw/net/trace-events |  3 +++
-files changed, 16 insertions(+), 3 deletions(-)
-diff --git a/hw/net/e1000.c b/hw/net/e1000.c
-index XXXXXXX..XXXXXXX 100644
---- a/hw/net/e1000.c
-+++ b/hw/net/e1000.c
-@@ -XXX,XX +XXX,XX @@
- #include "qemu/range.h"
- #include "e1000x_common.h"
-+#include "trace.h"
- static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
-@@ -XXX,XX +XXX,XX @@ static uint64_t rx_desc_base(E1000State *s)
-     return (bah << 32) + bal;
- }
-+static void
-+e1000_receiver_overrun(E1000State *s, size_t size)
-+{
-+    trace_e1000_receiver_overrun(size, s->mac_reg[RDH], s->mac_reg[RDT]);
-+    e1000x_inc_reg_if_not_full(s->mac_reg, RNBC);
-+    e1000x_inc_reg_if_not_full(s->mac_reg, MPC);
-+    set_ics(s, 0, E1000_ICS_RXO);
-+}
-+
- static ssize_t
- e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
- {
-@@ -XXX,XX +XXX,XX @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
-     desc_offset = 0;
-     total_size = size + e1000x_fcs_len(s->mac_reg);
-     if (!e1000_has_rxbufs(s, total_size)) {
--            set_ics(s, 0, E1000_ICS_RXO);
--            return -1;
-+        e1000_receiver_overrun(s, total_size);
-+        return -1;
-     }
-     do {
-         desc_size = total_size - desc_offset;
-@@ -XXX,XX +XXX,XX @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
-             rdh_start >= s->mac_reg[RDLEN] / sizeof(desc)) {
-             DBGOUT(RXERR, "RDH wraparound @%x, RDT %x, RDLEN %x\n",
-                    rdh_start, s->mac_reg[RDT], s->mac_reg[RDLEN]);
--            set_ics(s, 0, E1000_ICS_RXO);
-+            e1000_receiver_overrun(s, total_size);
-             return -1;
-         }
-     } while (desc_offset < total_size);
-diff --git a/hw/net/trace-events b/hw/net/trace-events
-index XXXXXXX..XXXXXXX 100644
---- a/hw/net/trace-events
-+++ b/hw/net/trace-events
-@@ -XXX,XX +XXX,XX @@ net_rx_pkt_rss_ip6_ex(void) "Calculating IPv6/EX RSS  hash"
- net_rx_pkt_rss_hash(size_t rss_length, uint32_t rss_hash) "RSS hash for %zu bytes: 0x%X"
- net_rx_pkt_rss_add_chunk(void* ptr, size_t size, size_t input_offset) "Add RSS chunk %p, %zu bytes, RSS input offset %zu bytes"
-+# hw/net/e1000.c
-+e1000_receiver_overrun(size_t s, uint32_t rdh, uint32_t rdt) "Receiver overrun: dropped packet of %zu bytes, RDH=%u, RDT=%u"
-+
- # hw/net/e1000x_common.c
- e1000x_rx_can_recv_disabled(bool link_up, bool rx_enabled, bool pci_master) "link_up: %d, rx_enabled %d, pci_master %d"
- e1000x_vlan_is_vlan_pkt(bool is_vlan_pkt, uint16_t eth_proto, uint16_t vet) "Is VLAN packet: %d, ETH proto: 0x%X, VET: 0x%X"
---
-.17.1
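
Reviewer sketch of the counter handling in e1000_receiver_overrun() above. It assumes, from the helper's name, that e1000x_inc_reg_if_not_full() is a saturating increment; the register indexes and function names below are hypothetical stand-ins, not the real e1000 definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register indexes for the two statistics counters. */
enum { REG_RNBC, REG_MPC, REG_COUNT };

/* Saturating increment: a full 32-bit counter stays at its maximum. */
static void inc_reg_if_not_full(uint32_t *regs, int index)
{
    assert(index >= 0 && index < REG_COUNT);
    if (regs[index] != UINT32_MAX) {
        regs[index]++;
    }
}

/* On a dropped packet, bump both counters, as the patch does. */
static void receiver_overrun(uint32_t *regs)
{
    inc_reg_if_not_full(regs, REG_RNBC);
    inc_reg_if_not_full(regs, REG_MPC);
}
```

Starting from RNBC = 0 and MPC = UINT32_MAX, one overrun leaves RNBC at 1 and MPC unchanged, so the guest never sees a counter wrap back to zero.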

The following changes since commit c5e4e49258e9b89cb34c085a419dd9f862935c48:

Merge remote-tracking branch 'remotes/xanclic/tags/pull-block-2018-09-25' into staging (2018-09-25 16:47:35 +0100)

are available in the Git repository at:

https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to f3df030edf90db184cd029697e976e24f1925e03:

e1000: indicate dropped packets in HW counters (2018-09-26 11:06:10 +0800)

----------------------------------------------------------------

----------------------------------------------------------------
Jason Wang (4):
      ne2000: fix possible out of bound access in ne2000_receive
      rtl8139: fix possible out of bound access
      pcnet: fix possible buffer overflow
      net: ignore packet size greater than INT_MAX

Martin Wilck (1):
      e1000: indicate dropped packets in HW counters

Zhang Chen (15):
      filter-rewriter: Add TCP state machine and fix memory leak in connection_track_table
      colo-compare: implement the process of checkpoint
      colo-compare: use notifier to notify packets comparing result
      COLO: integrate colo compare with colo frame
      COLO: Add block replication into colo process
      COLO: Remove colo_state migration struct
      COLO: Load dirty pages into SVM's RAM cache firstly
      ram/COLO: Record the dirty pages that SVM received
      COLO: Flush memory data from ram cache
      qapi/migration.json: Rename COLO unknown mode to none mode.
      qapi: Add new command to query colo status
      savevm: split the process of different stages for loadvm/savevm
      filter: Add handle_event method for NetFilterClass
      filter-rewriter: handle checkpoint and failover event
      docs: Add COLO status diagram to COLO-FT.txt

liujunjie (1):
      clean up callback when del virtqueue

zhanghailiang (4):
      qmp event: Add COLO_EXIT event to notify users while exited COLO
      COLO: flush host dirty ram from cache
      COLO: notify net filters about checkpoint/failover event
      COLO: quick failover process by kick COLO thread

From: Zhang Chen <zhangckid@gmail.com>

We add an almost complete TCP state machine to filter-rewriter, omitting
TCPS_LISTEN and simplifying the FIN states of a VM active close.
The simplification is possible because the guest kernel tracks
the TCP status and waits for 2MSL itself: if the client resends the FIN packet,
the guest resends the last ACK, so we needn't wait for 2MSL in filter-rewriter.

Previously, after a net connection was closed, we didn't clear its related
resources in connection_track_table, which leads to a memory leak.

Let's track the state of each net connection; once it is closed, its related
resources are cleaned up.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo.c            |   2 +-
 net/colo.h            |   9 ++--
 net/filter-rewriter.c | 109 ++++++++++++++++++++++++++++++++++++++----
 3 files changed, 104 insertions(+), 16 deletions(-)

diff --git a/net/colo.c b/net/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -XXX,XX +XXX,XX @@ Connection *connection_new(ConnectionKey *key)
     conn->ip_proto = key->ip_proto;
     conn->processing = false;
     conn->offset = 0;
-    conn->syn_flag = 0;
+    conn->tcp_state = TCPS_CLOSED;
     conn->pack = 0;
     conn->sack = 0;
     g_queue_init(&conn->primary_list);
diff --git a/net/colo.h b/net/colo.h
index XXXXXXX..XXXXXXX 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -XXX,XX +XXX,XX @@
 #include "slirp/slirp.h"
 #include "qemu/jhash.h"
 #include "qemu/timer.h"
+#include "slirp/tcp.h"
 
 #define HASHTABLE_MAX_SIZE 16384
 
@@ -XXX,XX +XXX,XX @@ typedef struct Connection {
     uint32_t sack;
     /* offset = secondary_seq - primary_seq */
     tcp_seq  offset;
-    /*
-     * we use this flag update offset func
-     * run once in independent tcp connection
-     */
-    int syn_flag;
+
+    int tcp_state; /* TCP FSM state */
+    tcp_seq fin_ack_seq; /* the seq of 'fin=1,ack=1' */
 } Connection;
 
 uint32_t connection_key_hash(const void *opaque);
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index XXXXXXX..XXXXXXX 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -XXX,XX +XXX,XX @@ static int is_tcp_packet(Packet *pkt)
 }
 
 /* handle tcp packet from primary guest */
-static int handle_primary_tcp_pkt(NetFilterState *nf,
+static int handle_primary_tcp_pkt(RewriterState *rf,
                                   Connection *conn,
-                                  Packet *pkt)
+                                  Packet *pkt, ConnectionKey *key)
 {
     struct tcphdr *tcp_pkt;
 
@@ -XXX,XX +XXX,XX @@ static int handle_primary_tcp_pkt(NetFilterState *nf,
         trace_colo_filter_rewriter_conn_offset(conn->offset);
     }
 
+    if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN)) &&
+        conn->tcp_state == TCPS_SYN_SENT) {
+        conn->tcp_state = TCPS_ESTABLISHED;
+    }
+
     if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) {
         /*
          * we use this flag update offset func
          * run once in independent tcp connection
          */
-        conn->syn_flag = 1;
+        conn->tcp_state = TCPS_SYN_RECEIVED;
     }
 
     if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK)) {
-        if (conn->syn_flag) {
+        if (conn->tcp_state == TCPS_SYN_RECEIVED) {
             /*
              * offset = secondary_seq - primary seq
              * ack packet sent by guest from primary node,
              * so we use th_ack - 1 get primary_seq
              */
             conn->offset -= (ntohl(tcp_pkt->th_ack) - 1);
-            conn->syn_flag = 0;
+            conn->tcp_state = TCPS_ESTABLISHED;
         }
         if (conn->offset) {
             /* handle packets to the secondary from the primary */
@@ -XXX,XX +XXX,XX @@ static int handle_primary_tcp_pkt(NetFilterState *nf,
             net_checksum_calculate((uint8_t *)pkt->data + pkt->vnet_hdr_len,
                                    pkt->size - pkt->vnet_hdr_len);
         }
+
+        /*
+         * Passive close step 3
+         */
+        if ((conn->tcp_state == TCPS_LAST_ACK) &&
+            (ntohl(tcp_pkt->th_ack) == (conn->fin_ack_seq + 1))) {
+            conn->tcp_state = TCPS_CLOSED;
+            g_hash_table_remove(rf->connection_track_table, key);
+        }
+    }
+
+    if ((tcp_pkt->th_flags & TH_FIN) == TH_FIN) {
+        /*
+         * Passive close.
+         * Step 1:
+         * The *server* side of this connection is the VM, and the *client*
+         * tries to close the connection. We go into CLOSE_WAIT status.
+         *
+         * Step 2:
+         * In this step we go into LAST_ACK status.
+         *
+         * We got 'fin=1, ack=1' packet from server side, we need to
+         * record the seq of 'fin=1, ack=1' packet.
+         *
+         * Step 3:
+         * We got an 'ack=1' packet from the client side; it acks the
+         * 'fin=1, ack=1' packet from the server side. From this point on,
+         * there will be no more packets in the connection unless an error
+         * happens on the path between the 'filter object' and the vNIC. If
+         * this rare case really happens, we can still create a new connection,
+         * so it is safe to remove the connection from connection_track_table.
+         *
+         */
+        if (conn->tcp_state == TCPS_ESTABLISHED) {
+            conn->tcp_state = TCPS_CLOSE_WAIT;
+        }
+
+        /*
+         * Active close step 2.
+         */
+        if (conn->tcp_state == TCPS_FIN_WAIT_1) {
+            conn->tcp_state = TCPS_TIME_WAIT;
+            /*
+             * To simplify the implementation, we needn't wait 2MSL time
+             * in the filter rewriter, because the guest kernel tracks the
+             * TCP status and waits 2MSL itself. If the client resends the
+             * FIN packet, the guest will reply with the last ACK too.
+             */
+            conn->tcp_state = TCPS_CLOSED;
+            g_hash_table_remove(rf->connection_track_table, key);
+        }
     }
 
     return 0;
 }
 
 /* handle tcp packet from secondary guest */
-static int handle_secondary_tcp_pkt(NetFilterState *nf,
+static int handle_secondary_tcp_pkt(RewriterState *rf,
                                     Connection *conn,
-                                    Packet *pkt)
+                                    Packet *pkt, ConnectionKey *key)
 {
     struct tcphdr *tcp_pkt;
 
@@ -XXX,XX +XXX,XX @@ static int handle_secondary_tcp_pkt(NetFilterState *nf,
         trace_colo_filter_rewriter_conn_offset(conn->offset);
     }
 
-    if (((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) {
+    if (conn->tcp_state == TCPS_SYN_RECEIVED &&
+        ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == (TH_ACK | TH_SYN))) {
         /*
          * save offset = secondary_seq and then
          * in handle_primary_tcp_pkt make offset
@@ -XXX,XX +XXX,XX @@ static int handle_secondary_tcp_pkt(NetFilterState *nf,
         conn->offset = ntohl(tcp_pkt->th_seq);
     }
 
+    /* VM active connect */
+    if (conn->tcp_state == TCPS_CLOSED &&
+        ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_SYN)) {
+        conn->tcp_state = TCPS_SYN_SENT;
+    }
+
     if ((tcp_pkt->th_flags & (TH_ACK | TH_SYN)) == TH_ACK) {
         /* Only need to adjust seq while offset is Non-zero */
         if (conn->offset) {
@@ -XXX,XX +XXX,XX @@ static int handle_secondary_tcp_pkt(NetFilterState *nf,
         }
     }
 
+    /*
+     * Passive close step 2:
+     */
+    if (conn->tcp_state == TCPS_CLOSE_WAIT &&
+        (tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == (TH_ACK | TH_FIN)) {
+        conn->fin_ack_seq = ntohl(tcp_pkt->th_seq);
+        conn->tcp_state = TCPS_LAST_ACK;
+    }
+
+    /*
+     * Active close
+     *
+     * Step 1:
+     * The *server* side of this connection is the VM, and the *server*
+     * tries to close the connection.
+     *
+     * Step 2:
+     * We go into CLOSE_WAIT status.
+     * We simplify away the TCPS_FIN_WAIT_2, TCPS_TIME_WAIT and
+     * CLOSING states.
+     */
+    if (conn->tcp_state == TCPS_ESTABLISHED &&
+        (tcp_pkt->th_flags & (TH_ACK | TH_FIN)) == TH_FIN) {
+        conn->tcp_state = TCPS_FIN_WAIT_1;
+    }
+
     return 0;
 }
 
@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
 
         if (sender == nf->netdev) {
             /* NET_FILTER_DIRECTION_TX */
-            if (!handle_primary_tcp_pkt(nf, conn, pkt)) {
+            if (!handle_primary_tcp_pkt(s, conn, pkt, &key)) {
                 qemu_net_queue_send(s->incoming_queue, sender, 0,
                 (const uint8_t *)pkt->data, pkt->size, NULL);
                 packet_destroy(pkt, NULL);
@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
             }
         } else {
             /* NET_FILTER_DIRECTION_RX */
-            if (!handle_secondary_tcp_pkt(nf, conn, pkt)) {
+            if (!handle_secondary_tcp_pkt(s, conn, pkt, &key)) {
                 qemu_net_queue_send(s->incoming_queue, sender, 0,
                 (const uint8_t *)pkt->data, pkt->size, NULL);
                 packet_destroy(pkt, NULL);
-- 
2.17.1

From: Zhang Chen <zhangckid@gmail.com>

When doing a checkpoint, we need to flush all the unhandled packets.
By using the filter notifier mechanism, we can easily notify
every compare object to do this, which runs inside
the compare threads as a coroutine.
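
The notify-and-wait handshake described above can be sketched roughly as
follows. This is a minimal illustration using plain pthreads, not the QEMU
implementation: every sketch_* name is hypothetical, the worker threads stand
in for the compare threads, and incrementing sketch_flushed stands in for
flushing a compare object's packet queues.

```c
#include <assert.h>
#include <pthread.h>

#define SKETCH_MAX_WORKERS 16

/* Hypothetical stand-ins for the mutex, condition variable and counter. */
static pthread_mutex_t sketch_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sketch_done = PTHREAD_COND_INITIALIZER;
static int sketch_unhandled;
static int sketch_flushed;

/* Worker side: handle the event, then report completion under the lock. */
static void *sketch_handle_event(void *opaque)
{
    (void)opaque;
    pthread_mutex_lock(&sketch_mtx);
    sketch_flushed++;               /* stands in for flushing packets */
    assert(sketch_unhandled > 0);
    sketch_unhandled--;
    pthread_cond_broadcast(&sketch_done);
    pthread_mutex_unlock(&sketch_mtx);
    return NULL;
}

/*
 * Notifier side: dispatch the event to n workers and block until every
 * one of them has reported back. Returns 0 on success, -1 on error.
 */
static int sketch_notify_and_wait(int n)
{
    pthread_t tid[SKETCH_MAX_WORKERS];
    int started = 0;
    int ret = 0;

    if (n < 0 || n > SKETCH_MAX_WORKERS) {
        return -1;
    }

    pthread_mutex_lock(&sketch_mtx);
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tid[i], NULL, sketch_handle_event, NULL) != 0) {
            ret = -1;
            break;
        }
        started++;
        sketch_unhandled++;
    }
    /* pthread_cond_wait drops the lock, so the workers can make progress. */
    while (sketch_unhandled > 0) {
        pthread_cond_wait(&sketch_done, &sketch_mtx);
    }
    pthread_mutex_unlock(&sketch_mtx);

    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
    }
    return ret;
}
```

Calling sketch_notify_and_wait(4) returns only after all four workers have
decremented the counter. Both the decrement and the check in the sketch are
done with the mutex held.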

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/migration/colo.h |  6 ++++
 net/colo-compare.c       | 78 ++++++++++++++++++++++++++++++++++++++++
 net/colo-compare.h       | 22 ++++++++++++
 3 files changed, 106 insertions(+)
 create mode 100644 net/colo-compare.h

diff --git a/include/migration/colo.h b/include/migration/colo.h
index XXXXXXX..XXXXXXX 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -XXX,XX +XXX,XX @@
 #include "qemu-common.h"
 #include "qapi/qapi-types-migration.h"
 
+enum colo_event {
+    COLO_EVENT_NONE,
+    COLO_EVENT_CHECKPOINT,
+    COLO_EVENT_FAILOVER,
+};
+
 void colo_info_init(void);
 
 void migrate_start_colo_process(MigrationState *s);
diff --git a/net/colo-compare.c b/net/colo-compare.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/sockets.h"
 #include "colo.h"
 #include "sysemu/iothread.h"
+#include "net/colo-compare.h"
+#include "migration/colo.h"
 
 #define TYPE_COLO_COMPARE "colo-compare"
 #define COLO_COMPARE(obj) \
     OBJECT_CHECK(CompareState, (obj), TYPE_COLO_COMPARE)
 
+static QTAILQ_HEAD(, CompareState) net_compares =
+       QTAILQ_HEAD_INITIALIZER(net_compares);
+
 #define COMPARE_READ_LEN_MAX NET_BUFSIZE
 #define MAX_QUEUE_SIZE 1024
 
@@ -XXX,XX +XXX,XX @@
 /* TODO: Should be configurable */
 #define REGULAR_PACKET_CHECK_MS 3000
 
+static QemuMutex event_mtx;
+static QemuCond event_complete_cond;
+static int event_unhandled_count;
+
 /*
  *  + CompareState ++
  *  |               |
@@ -XXX,XX +XXX,XX @@ typedef struct CompareState {
     IOThread *iothread;
     GMainContext *worker_context;
     QEMUTimer *packet_check_timer;
+
+    QEMUBH *event_bh;
+    enum colo_event event;
+
+    QTAILQ_ENTRY(CompareState) next;
 } CompareState;
 
 typedef struct CompareClass {
@@ -XXX,XX +XXX,XX @@ static void check_old_packet_regular(void *opaque)
                 REGULAR_PACKET_CHECK_MS);
 }
 
+/* Public API, used by the COLO frame to notify compare objects of an event */
+void colo_notify_compares_event(void *opaque, int event, Error **errp)
+{
+    CompareState *s;
+
+    qemu_mutex_lock(&event_mtx);
+    QTAILQ_FOREACH(s, &net_compares, next) {
+        s->event = event;
+        qemu_bh_schedule(s->event_bh);
+        event_unhandled_count++;
+    }
+    /* Wait for all compare threads to finish handling this event */
+    while (event_unhandled_count > 0) {
+        qemu_cond_wait(&event_complete_cond, &event_mtx);
+    }
+
+    qemu_mutex_unlock(&event_mtx);
+}
+
 static void colo_compare_timer_init(CompareState *s)
 {
     AioContext *ctx = iothread_get_aio_context(s->iothread);
@@ -XXX,XX +XXX,XX @@ static void colo_compare_timer_del(CompareState *s)
     }
  }
 
+static void colo_flush_packets(void *opaque, void *user_data);
+
+static void colo_compare_handle_event(void *opaque)
+{
+    CompareState *s = opaque;
+
+    switch (s->event) {
+    case COLO_EVENT_CHECKPOINT:
+        g_queue_foreach(&s->conn_list, colo_flush_packets, s);
+        break;
+    case COLO_EVENT_FAILOVER:
+        break;
+    default:
+        break;
+    }
+
+    assert(event_unhandled_count > 0);
+
+    qemu_mutex_lock(&event_mtx);
+    event_unhandled_count--;
+    qemu_cond_broadcast(&event_complete_cond);
+    qemu_mutex_unlock(&event_mtx);
+}
+
 static void colo_compare_iothread(CompareState *s)
 {
     object_ref(OBJECT(s->iothread));
@@ -XXX,XX +XXX,XX @@ static void colo_compare_iothread(CompareState *s)
                              s, s->worker_context, true);
 
     colo_compare_timer_init(s);
+    s->event_bh = qemu_bh_new(colo_compare_handle_event, s);
 }
 
 static char *compare_get_pri_indev(Object *obj, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
     net_socket_rs_init(&s->pri_rs, compare_pri_rs_finalize, s->vnet_hdr);
     net_socket_rs_init(&s->sec_rs, compare_sec_rs_finalize, s->vnet_hdr);
 
+    QTAILQ_INSERT_TAIL(&net_compares, s, next);
+
     g_queue_init(&s->conn_list);
 
+    qemu_mutex_init(&event_mtx);
+    qemu_cond_init(&event_complete_cond);
+
     s->connection_track_table = g_hash_table_new_full(connection_key_hash,
                                                       connection_key_equal,
                                                       g_free,
@@ -XXX,XX +XXX,XX @@ static void colo_compare_init(Object *obj)
 static void colo_compare_finalize(Object *obj)
 {
     CompareState *s = COLO_COMPARE(obj);
+    CompareState *tmp = NULL;
 
     qemu_chr_fe_deinit(&s->chr_pri_in, false);
     qemu_chr_fe_deinit(&s->chr_sec_in, false);
@@ -XXX,XX +XXX,XX @@ static void colo_compare_finalize(Object *obj)
     if (s->iothread) {
         colo_compare_timer_del(s);
     }
+
+    qemu_bh_delete(s->event_bh);
+
+    QTAILQ_FOREACH(tmp, &net_compares, next) {
+        if (tmp == s) {
+            QTAILQ_REMOVE(&net_compares, s, next);
+            break;
+        }
+    }
+
     /* Release all unhandled packets after compare thead exited */
     g_queue_foreach(&s->conn_list, colo_flush_packets, s);
 
@@ -XXX,XX +XXX,XX @@ static void colo_compare_finalize(Object *obj)
     if (s->iothread) {
         object_unref(OBJECT(s->iothread));
     }
+
+    qemu_mutex_destroy(&event_mtx);
+    qemu_cond_destroy(&event_complete_cond);
+
     g_free(s->pri_indev);
     g_free(s->sec_indev);
     g_free(s->outdev);
diff --git a/net/colo-compare.h b/net/colo-compare.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/net/colo-compare.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
+ * (a.k.a. Fault Tolerance or Continuous Replication)
+ *
+ * Copyright (c) 2017 HUAWEI TECHNOLOGIES CO., LTD.
+ * Copyright (c) 2017 FUJITSU LIMITED
+ * Copyright (c) 2017 Intel Corporation
+ *
+ * Authors:
+ *    zhanghailiang <zhang.zhanghailiang@huawei.com>
+ *    Zhang Chen <zhangckid@gmail.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * later.  See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_COLO_COMPARE_H
+#define QEMU_COLO_COMPARE_H
+
+void colo_notify_compares_event(void *opaque, int event, Error **errp);
+
+#endif /* QEMU_COLO_COMPARE_H */
-- 
2.17.1

From: Zhang Chen <zhangckid@gmail.com>

Use a notifier to tell the COLO frame when packet comparison
finds inconsistent packets.
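
As a rough illustration of the notifier pattern, here is a self-contained
sketch of a singly linked observer list. It is not QEMU's Notifier API: the
Sketch* types and sketch_* functions are hypothetical names chosen for this
example, and the callback only counts requests instead of triggering a real
checkpoint.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical observer node: a callback plus a list link. */
typedef struct SketchNotifier SketchNotifier;
struct SketchNotifier {
    void (*notify)(SketchNotifier *n, void *data);
    SketchNotifier *next;
};

typedef struct {
    SketchNotifier *head;
} SketchNotifierList;

/* Register an observer at the head of the list. */
static void sketch_list_add(SketchNotifierList *l, SketchNotifier *n)
{
    assert(n->notify != NULL);
    n->next = l->head;
    l->head = n;
}

/* Unregister an observer; does nothing if it is not on the list. */
static void sketch_list_remove(SketchNotifierList *l, SketchNotifier *n)
{
    for (SketchNotifier **p = &l->head; *p; p = &(*p)->next) {
        if (*p == n) {
            *p = n->next;
            n->next = NULL;
            return;
        }
    }
}

/* Call every registered observer with the same data pointer. */
static void sketch_list_notify(SketchNotifierList *l, void *data)
{
    SketchNotifier *next;

    for (SketchNotifier *n = l->head; n; n = next) {
        next = n->next;     /* read first, the callback may unregister n */
        n->notify(n, data);
    }
}

/* Example observer: count how often a checkpoint was requested. */
static int sketch_checkpoint_requests;

static void sketch_request_checkpoint(SketchNotifier *n, void *data)
{
    (void)n;
    (void)data;
    sketch_checkpoint_requests++;
}
```

The compare side only needs to call sketch_list_notify() when it sees a
mismatch, so it does not have to know who is listening. An observer that has
been removed is no longer called.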

Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo-compare.c | 37 ++++++++++++++++++++++++++-----------
 net/colo-compare.h |  2 ++
 2 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -XXX,XX +XXX,XX @@
 #include "sysemu/iothread.h"
 #include "net/colo-compare.h"
 #include "migration/colo.h"
+#include "migration/migration.h"
 
 #define TYPE_COLO_COMPARE "colo-compare"
 #define COLO_COMPARE(obj) \
@@ -XXX,XX +XXX,XX @@
 static QTAILQ_HEAD(, CompareState) net_compares =
        QTAILQ_HEAD_INITIALIZER(net_compares);
 
+static NotifierList colo_compare_notifiers =
+    NOTIFIER_LIST_INITIALIZER(colo_compare_notifiers);
+
 #define COMPARE_READ_LEN_MAX NET_BUFSIZE
 #define MAX_QUEUE_SIZE 1024
 
@@ -XXX,XX +XXX,XX @@ static bool colo_mark_tcp_pkt(Packet *ppkt, Packet *spkt,
     return false;
 }
 
+static void colo_compare_inconsistency_notify(void)
+{
+    notifier_list_notify(&colo_compare_notifiers,
+                migrate_get_current());
+}
+
 static void colo_compare_tcp(CompareState *s, Connection *conn)
 {
     Packet *ppkt = NULL, *spkt = NULL;
@@ -XXX,XX +XXX,XX @@ sec:
         qemu_hexdump((char *)spkt->data, stderr,
                      "colo-compare spkt", spkt->size);
 
-        /*
-         * colo_compare_inconsistent_notify();
-         * TODO: notice to checkpoint();
-         */
+        colo_compare_inconsistency_notify();
     }
 }
 
@@ -XXX,XX +XXX,XX @@ static int colo_old_packet_check_one(Packet *pkt, int64_t *check_time)
     }
 }
 
+void colo_compare_register_notifier(Notifier *notify)
+{
+    notifier_list_add(&colo_compare_notifiers, notify);
+}
+
+void colo_compare_unregister_notifier(Notifier *notify)
+{
+    notifier_remove(notify);
+}
+
 static int colo_old_packet_check_one_conn(Connection *conn,
-                                          void *user_data)
+                                           void *user_data)
 {
     GList *result = NULL;
     int64_t check_time = REGULAR_PACKET_CHECK_MS;
@@ -XXX,XX +XXX,XX @@ static int colo_old_packet_check_one_conn(Connection *conn,
 
     if (result) {
         /* Do checkpoint will flush old packet */
-        /*
-         * TODO: Notify colo frame to do checkpoint.
-         * colo_compare_inconsistent_notify();
-         */
+        colo_compare_inconsistency_notify();
         return 0;
     }
 
@@ -XXX,XX +XXX,XX @@ static void colo_compare_packet(CompareState *s, Connection *conn,
             /*
              * If one packet arrive late, the secondary_list or
              * primary_list will be empty, so we can't compare it
-             * until next comparison.
+             * until next comparison. If the packets in the list have
+             * timed out, it will trigger a checkpoint request.
              */
             trace_colo_compare_main("packet different");
             g_queue_push_head(&conn->primary_list, pkt);
-            /* TODO: colo_notify_checkpoint();*/
+            colo_compare_inconsistency_notify();
             break;
         }
     }
diff --git a/net/colo-compare.h b/net/colo-compare.h
index XXXXXXX..XXXXXXX 100644
--- a/net/colo-compare.h
+++ b/net/colo-compare.h
@@ -XXX,XX +XXX,XX @@
 #define QEMU_COLO_COMPARE_H
 
 void colo_notify_compares_event(void *opaque, int event, Error **errp);
+void colo_compare_register_notifier(Notifier *notify);
+void colo_compare_unregister_notifier(Notifier *notify);
 
 #endif /* QEMU_COLO_COMPARE_H */
-- 
2.17.1

From: Zhang Chen <zhangckid@gmail.com>

For COLO FT, both the PVM and SVM run at the same time and
only sync their state when needed.

So here, let the SVM run while no checkpoint is in progress, and change
DEFAULT_MIGRATE_X_CHECKPOINT_DELAY to 200*100.

Besides, we forgot to release colo_checkpoint_sem and
colo_delay_timer; fix that here.

diff --git a/migration/colo.c b/migration/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/error-report.h"
 #include "migration/failover.h"
 #include "replication.h"
+#include "net/colo-compare.h"
+#include "net/colo.h"
 
 static bool vmstate_loading;
+static Notifier packets_compare_notifier;
 
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
@@ -XXX,XX +XXX,XX @@ static int colo_do_checkpoint_transaction(MigrationState *s,
         goto out;
     }
 
+    colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
     /* Disable block migration */
     migrate_set_block_enabled(false, &local_err);
     qemu_savevm_state_header(fb);
@@ -XXX,XX +XXX,XX @@ out:
     return ret;
 }
 
+static void colo_compare_notify_checkpoint(Notifier *notifier, void *data)
+{
+    colo_checkpoint_notify(data);
+}
+
 static void colo_process_checkpoint(MigrationState *s)
 {
     QIOChannelBuffer *bioc;
@@ -XXX,XX +XXX,XX @@ static void colo_process_checkpoint(MigrationState *s)
         goto out;
     }
 
+    packets_compare_notifier.notify = colo_compare_notify_checkpoint;
+    colo_compare_register_notifier(&packets_compare_notifier);
+
     /*
      * Wait for Secondary finish loading VM states and enter COLO
      * restore.
@@ -XXX,XX +XXX,XX @@ out:
         qemu_fclose(fb);
     }
 
-    timer_del(s->colo_delay_timer);
-
     /* Hope this not to be too long to wait here */
     qemu_sem_wait(&s->colo_exit_sem);
     qemu_sem_destroy(&s->colo_exit_sem);
+
+    /*
+     * It is safe to unregister the notifier after failover has finished.
+     * Besides, colo_delay_timer and colo_checkpoint_sem can't be
+     * released before unregistering the notifier, or there will be a
+     * use-after-free error.
+     */
+    colo_compare_unregister_notifier(&packets_compare_notifier);
+    timer_del(s->colo_delay_timer);
+    timer_free(s->colo_delay_timer);
+    qemu_sem_destroy(&s->colo_checkpoint_sem);
+
     /*
      * Must be called after failover BH is completed,
      * Or the failover BH may shutdown the wrong fd that
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
     fb = qemu_fopen_channel_input(QIO_CHANNEL(bioc));
     object_unref(OBJECT(bioc));
 
+    qemu_mutex_lock_iothread();
+    vm_start();
+    trace_colo_vm_state_change("stop", "run");
+    qemu_mutex_unlock_iothread();
+
     colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_READY,
                       &local_err);
     if (local_err) {
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
+        qemu_mutex_lock_iothread();
+        vm_stop_force_state(RUN_STATE_COLO);
+        trace_colo_vm_state_change("run", "stop");
+        qemu_mutex_unlock_iothread();
+
         /* FIXME: This is unnecessary for periodic checkpoint mode */
         colo_send_message(mis->to_src_file, COLO_MESSAGE_CHECKPOINT_REPLY,
                      &local_err);
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
         }
 
         vmstate_loading = false;
+        vm_start();
+        trace_colo_vm_state_change("stop", "run");
         qemu_mutex_unlock_iothread();
 
         if (failover_get_state() == FAILOVER_STATUS_RELAUNCH) {
diff --git a/migration/migration.c b/migration/migration.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -XXX,XX +XXX,XX @@
 /* Migration XBZRLE default cache size */
 #define DEFAULT_MIGRATE_XBZRLE_CACHE_SIZE (64 * 1024 * 1024)
 
-/* The delay time (in ms) between two COLO checkpoints
- * Note: Please change this default value to 10000 when we support hybrid mode.
- */
-#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY 200
+/* The delay time (in ms) between two COLO checkpoints */
+#define DEFAULT_MIGRATE_X_CHECKPOINT_DELAY (200 * 100)
 #define DEFAULT_MIGRATE_MULTIFD_CHANNELS 2
 #define DEFAULT_MIGRATE_MULTIFD_PAGE_COUNT 16
 
-- 
2.17.1

From: Zhang Chen <zhangckid@gmail.com>

Make sure the master starts block replication after the slave's block
replication has started.

Besides, we need to activate the VM's block devices before going into
COLO state.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 migration/colo.c      | 43 +++++++++++++++++++++++++++++++++++++++++++
 migration/migration.c | 10 ++++++++++
 2 files changed, 53 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -XXX,XX +XXX,XX @@
 #include "replication.h"
 #include "net/colo-compare.h"
 #include "net/colo.h"
+#include "block/block.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -XXX,XX +XXX,XX @@ static void secondary_vm_do_failover(void)
 {
     int old_state;
     MigrationIncomingState *mis = migration_incoming_get_current();
+    Error *local_err = NULL;
 
     /* Can not do failover during the process of VM's loading VMstate, Or
      * it will break the secondary VM.
@@ -XXX,XX +XXX,XX @@ static void secondary_vm_do_failover(void)
     migrate_set_state(&mis->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
 
+    replication_stop_all(true, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+
     if (!autostart) {
         error_report("\"-S\" qemu option will be ignored in secondary side");
         /* recover runstate to normal migration finish state */
@@ -XXX,XX +XXX,XX @@ static void primary_vm_do_failover(void)
 {
     MigrationState *s = migrate_get_current();
     int old_state;
+    Error *local_err = NULL;
 
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
@@ -XXX,XX +XXX,XX @@ static void primary_vm_do_failover(void)
                      FailoverStatus_str(old_state));
         return;
     }
+
+    replication_stop_all(true, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        local_err = NULL;
+    }
+
     /* Notify COLO thread that failover work is finished */
     qemu_sem_post(&s->colo_exit_sem);
 }
@@ -XXX,XX +XXX,XX @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     qemu_savevm_state_header(fb);
     qemu_savevm_state_setup(fb);
     qemu_mutex_lock_iothread();
+    replication_do_checkpoint_all(&local_err);
+    if (local_err) {
+        qemu_mutex_unlock_iothread();
+        goto out;
+    }
     qemu_savevm_state_complete_precopy(fb, false, false);
     qemu_mutex_unlock_iothread();
 
@@ -XXX,XX +XXX,XX @@ static void colo_process_checkpoint(MigrationState *s)
     object_unref(OBJECT(bioc));
 
     qemu_mutex_lock_iothread();
+    replication_start_all(REPLICATION_MODE_PRIMARY, &local_err);
+    if (local_err) {
+        qemu_mutex_unlock_iothread();
+        goto out;
+    }
+
     vm_start();
     qemu_mutex_unlock_iothread();
     trace_colo_vm_state_change("stop", "run");
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
     object_unref(OBJECT(bioc));
 
     qemu_mutex_lock_iothread();
+    replication_start_all(REPLICATION_MODE_SECONDARY, &local_err);
+    if (local_err) {
+        qemu_mutex_unlock_iothread();
+        goto out;
+    }
     vm_start();
     trace_colo_vm_state_change("stop", "run");
     qemu_mutex_unlock_iothread();
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
+        replication_get_error_all(&local_err);
+        if (local_err) {
+            qemu_mutex_unlock_iothread();
+            goto out;
+        }
+        /* discard colo disk buffer */
+        replication_do_checkpoint_all(&local_err);
+        if (local_err) {
+            qemu_mutex_unlock_iothread();
+            goto out;
+        }
+
         vmstate_loading = false;
         vm_start();
         trace_colo_vm_state_change("stop", "run");
diff --git a/migration/migration.c b/migration/migration.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -XXX,XX +XXX,XX @@ static void process_incoming_migration_co(void *opaque)
     MigrationIncomingState *mis = migration_incoming_get_current();
     PostcopyState ps;
     int ret;
+    Error *local_err = NULL;
 
     assert(mis->from_src_file);
     mis->migration_incoming_co = qemu_coroutine_self();
@@ -XXX,XX +XXX,XX @@ static void process_incoming_migration_co(void *opaque)
 
     /* we get COLO info, and know if we are in COLO mode */
     if (!ret && migration_incoming_enable_colo()) {
+        /* Make sure all file formats flush their mutable metadata */
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                    MIGRATION_STATUS_FAILED);
+            error_report_err(local_err);
+            exit(EXIT_FAILURE);
+        }
+
         qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
              colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
         mis->have_colo_incoming_thread = true;
-- 
2.17.1

From: Zhang Chen <zhangckid@gmail.com>

We need to know whether migration is going into COLO state on the
incoming side before starting normal migration.

Instead of using the VMStateDescription to send colo_state
from the source side to the destination side, we use MIG_CMD_ENABLE_COLO
to indicate whether COLO is enabled or not.
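
The idea of signalling COLO with an explicit stream command can be sketched
as below. This is only an illustration under stated assumptions: the
one-byte id plus one-byte length framing and all sketch_* / SKETCH_* names
are invented for the example and do not reflect the real migration stream
format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command ids; the real enum lives in migration/savevm.c. */
enum sketch_cmd {
    SKETCH_CMD_PING = 1,
    SKETCH_CMD_ENABLE_COLO = 2,
};

/* Destination-side flag, set only when the source sends the command. */
static bool sketch_colo_enabled;

/*
 * Walk a buffer of commands, each encoded as: id byte, payload length
 * byte, payload. Returns 0 on success and -1 on a malformed stream.
 */
static int sketch_process_commands(const uint8_t *buf, size_t len)
{
    size_t pos = 0;

    assert(buf != NULL || len == 0);
    while (pos < len) {
        uint8_t cmd, plen;

        if (len - pos < 2) {
            return -1;              /* truncated header */
        }
        cmd = buf[pos];
        plen = buf[pos + 1];
        pos += 2;
        if (len - pos < plen) {
            return -1;              /* truncated payload */
        }

        switch (cmd) {
        case SKETCH_CMD_ENABLE_COLO:
            if (plen != 0) {
                return -1;          /* this command carries no payload */
            }
            sketch_colo_enabled = true;
            break;
        default:
            break;                  /* other commands are skipped here */
        }
        pos += plen;
    }
    return 0;
}
```

With this shape the destination learns about COLO while it is still reading
the command stream, without waiting for a device-state section to be loaded.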

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/migration/colo.h |  5 +--
 migration/Makefile.objs  |  2 +-
 migration/colo-comm.c    | 76 ----------------------------------------
 migration/colo.c         | 13 ++++++-
 migration/migration.c    | 23 +++++++++++-
 migration/savevm.c       | 17 +++++++++
 migration/savevm.h       |  1 +
 migration/trace-events   |  1 +
 vl.c                     |  2 --
 9 files changed, 57 insertions(+), 83 deletions(-)
 delete mode 100644 migration/colo-comm.c

diff --git a/include/migration/colo.h b/include/migration/colo.h
index XXXXXXX..XXXXXXX 100644
--- a/include/migration/colo.h
+++ b/include/migration/colo.h
@@ -XXX,XX +XXX,XX @@ void migrate_start_colo_process(MigrationState *s);
 bool migration_in_colo_state(void);
 
 /* loadvm */
-bool migration_incoming_enable_colo(void);
-void migration_incoming_exit_colo(void);
+void migration_incoming_enable_colo(void);
+void migration_incoming_disable_colo(void);
+bool migration_incoming_colo_enabled(void);
 void *colo_process_incoming_thread(void *opaque);
 bool migration_incoming_in_colo_state(void);
 
diff --git a/migration/Makefile.objs b/migration/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/migration/Makefile.objs
+++ b/migration/Makefile.objs
@@ -XXX,XX +XXX,XX @@
 common-obj-y += migration.o socket.o fd.o exec.o
 common-obj-y += tls.o channel.o savevm.o
-common-obj-y += colo-comm.o colo.o colo-failover.o
+common-obj-y += colo.o colo-failover.o
 common-obj-y += vmstate.o vmstate-types.o page_cache.o
 common-obj-y += qemu-file.o global_state.o
 common-obj-y += qemu-file-channel.o
diff --git a/migration/colo-comm.c b/migration/colo-comm.c
deleted file mode 100644
index XXXXXXX..XXXXXXX
--- a/migration/colo-comm.c
+++ /dev/null
@@ -XXX,XX +XXX,XX @@
-/*
- * COarse-grain LOck-stepping Virtual Machines for Non-stop Service (COLO)
- * (a.k.a. Fault Tolerance or Continuous Replication)
- *
- * Copyright (c) 2016 HUAWEI TECHNOLOGIES CO., LTD.
- * Copyright (c) 2016 FUJITSU LIMITED
- * Copyright (c) 2016 Intel Corporation
- *
- * This work is licensed under the terms of the GNU GPL, version 2 or
- * later. See the COPYING file in the top-level directory.
- *
- */
-
-#include "qemu/osdep.h"
-#include "migration.h"
-#include "migration/colo.h"
-#include "migration/vmstate.h"
-#include "trace.h"
-
-typedef struct {
-     bool colo_requested;
-} COLOInfo;
-
-static COLOInfo colo_info;
-
-COLOMode get_colo_mode(void)
-{
-    if (migration_in_colo_state()) {
-        return COLO_MODE_PRIMARY;
-    } else if (migration_incoming_in_colo_state()) {
-        return COLO_MODE_SECONDARY;
-    } else {
-        return COLO_MODE_UNKNOWN;
-    }
-}
-
-static int colo_info_pre_save(void *opaque)
-{
-    COLOInfo *s = opaque;
-
-    s->colo_requested = migrate_colo_enabled();
-
-    return 0;
-}
-
-static bool colo_info_need(void *opaque)
-{
-   return migrate_colo_enabled();
-}
-
-static const VMStateDescription colo_state = {
-    .name = "COLOState",
-    .version_id = 1,
-    .minimum_version_id = 1,
-    .pre_save = colo_info_pre_save,
-    .needed = colo_info_need,
-    .fields = (VMStateField[]) {
-        VMSTATE_BOOL(colo_requested, COLOInfo),
-        VMSTATE_END_OF_LIST()
-    },
-};
-
-void colo_info_init(void)
-{
-    vmstate_register(NULL, 0, &colo_state, &colo_info);
-}
-
-bool migration_incoming_enable_colo(void)
-{
-    return colo_info.colo_requested;
-}
-
-void migration_incoming_exit_colo(void)
-{
-    colo_info.colo_requested = false;
-}
diff --git a/migration/colo.c b/migration/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -XXX,XX +XXX,XX @@ static void primary_vm_do_failover(void)
     qemu_sem_post(&s->colo_exit_sem);
 }
 
+COLOMode get_colo_mode(void)
+{
+    if (migration_in_colo_state()) {
+        return COLO_MODE_PRIMARY;
+    } else if (migration_incoming_in_colo_state()) {
+        return COLO_MODE_SECONDARY;
+    } else {
+        return COLO_MODE_UNKNOWN;
+    }
+}
+
 void colo_do_failover(MigrationState *s)
 {
     /* Make sure VM stopped while failover happened. */
@@ -XXX,XX +XXX,XX @@ out:
     if (mis->to_src_file) {
         qemu_fclose(mis->to_src_file);
     }
-    migration_incoming_exit_colo();
+    migration_incoming_disable_colo();
 
     rcu_unregister_thread();
     return NULL;
diff --git a/migration/migration.c b/migration/migration.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -XXX,XX +XXX,XX @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis, const char *rbname,
     return migrate_send_rp_message(mis, msg_type, msglen, bufc);
 }
 
+static bool migration_colo_enabled;
+bool migration_incoming_colo_enabled(void)
+{
+    return migration_colo_enabled;
+}
+
+void migration_incoming_disable_colo(void)
+{
+    migration_colo_enabled = false;
+}
+
+void migration_incoming_enable_colo(void)
+{
+    migration_colo_enabled = true;
+}
+
 void qemu_start_incoming_migration(const char *uri, Error **errp)
 {
     const char *p;
@@ -XXX,XX +XXX,XX @@ static void process_incoming_migration_co(void *opaque)
     }
 
     /* we get COLO info, and know if we are in COLO mode */
-    if (!ret && migration_incoming_enable_colo()) {
+    if (!ret && migration_incoming_colo_enabled()) {
         /* Make sure all file formats flush their mutable metadata */
         bdrv_invalidate_cache_all(&local_err);
         if (local_err) {
@@ -XXX,XX +XXX,XX @@ static void *migration_thread(void *opaque)
         qemu_savevm_send_postcopy_advise(s->to_dst_file);
     }
 
+    if (migrate_colo_enabled()) {
+        /* Notify migration destination that we enable COLO */
+        qemu_savevm_send_colo_enable(s->to_dst_file);
+    }
+
     qemu_savevm_state_setup(s->to_dst_file);
 
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
diff --git a/migration/savevm.c b/migration/savevm.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -XXX,XX +XXX,XX @@
 #include "io/channel-file.h"
 #include "sysemu/replay.h"
 #include "qjson.h"
+#include "migration/colo.h"
 
 #ifndef ETH_P_RARP
 #define ETH_P_RARP 0x8035
@@ -XXX,XX +XXX,XX @@ enum qemu_vm_cmd {
                                       were previously sent during
                                       precopy but are dirty. */
     MIG_CMD_PACKAGED,          /* Send a wrapped stream within this stream */
+    MIG_CMD_ENABLE_COLO,       /* Enable COLO */
     MIG_CMD_POSTCOPY_RESUME,   /* resume postcopy on dest */
     MIG_CMD_RECV_BITMAP,       /* Request for recved bitmap on dst */
     MIG_CMD_MAX
@@ -XXX,XX +XXX,XX @@ static void qemu_savevm_command_send(QEMUFile *f,
     qemu_fflush(f);
 }
 
+void qemu_savevm_send_colo_enable(QEMUFile *f)
+{
+    trace_savevm_send_colo_enable();
+    qemu_savevm_command_send(f, MIG_CMD_ENABLE_COLO, 0, NULL);
+}
+
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value)
 {
     uint32_t buf;
@@ -XXX,XX +XXX,XX @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
     return 0;
 }
 
+static int loadvm_process_enable_colo(MigrationIncomingState *mis)
+{
+    migration_incoming_enable_colo();
+    return 0;
+}
+
 /*
  * Process an incoming 'QEMU_VM_COMMAND'
  * 0           just a normal return
@@ -XXX,XX +XXX,XX @@ static int loadvm_process_command(QEMUFile *f)
 
     case MIG_CMD_RECV_BITMAP:
         return loadvm_handle_recv_bitmap(mis, len);
+
+    case MIG_CMD_ENABLE_COLO:
+        return loadvm_process_enable_colo(mis);
     }
 
     return 0;
diff --git a/migration/savevm.h b/migration/savevm.h
index XXXXXXX..XXXXXXX 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -XXX,XX +XXX,XX @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint16_t len,
                                            uint64_t *start_list,
                                            uint64_t *length_list);
+void qemu_savevm_send_colo_enable(QEMUFile *f);
 
 int qemu_loadvm_state(QEMUFile *f);
 void qemu_loadvm_state_cleanup(void);
diff --git a/migration/trace-events b/migration/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -XXX,XX +XXX,XX @@ savevm_send_ping(uint32_t val) "0x%x"
 savevm_send_postcopy_listen(void) ""
 savevm_send_postcopy_run(void) ""
 savevm_send_postcopy_resume(void) ""
+savevm_send_colo_enable(void) ""
 savevm_send_recv_bitmap(char *name) "%s"
 savevm_state_setup(void) ""
 savevm_state_resume_prepare(void) ""
diff --git a/vl.c b/vl.c
index XXXXXXX..XXXXXXX 100644
--- a/vl.c
+++ b/vl.c
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv, char **envp)
 #endif
     }
 
-    colo_info_init();
-
     if (net_init_clients(&err) < 0) {
         error_report_err(err);
         exit(1);
-- 
2.17.1

From: Zhang Chen <zhangckid@gmail.com>

We should not load the PVM's state directly into the SVM, because errors
may occur while the SVM is receiving data, and a partial load would break
the SVM.

We need to ensure that all data has been received before loading the state
into the SVM. We use extra memory to cache this data (the PVM's RAM). The
RAM cache on the secondary side is initially the same as the SVM/PVM's
memory. During a checkpoint, we first cache the PVM's dirty pages in this
RAM cache, so the cache always matches the PVM's memory at every
checkpoint. We then flush the cached RAM to the SVM after receiving all of
the PVM's state.
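
The staging idea described above can be sketched in isolation. This is a
minimal illustration, not QEMU code: ToyBlock and the toy_* helpers are
hypothetical names, and a fixed 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* Hypothetical stand-in for a RAM block: live memory plus a staging cache. */
typedef struct {
    unsigned char *host;   /* memory the secondary VM actually runs on */
    unsigned char *cache;  /* staging copy that receives the primary's pages */
    size_t pages;
} ToyBlock;

/* Allocate both buffers and start the cache as a copy of live memory. */
int toy_cache_init(ToyBlock *b, size_t pages)
{
    b->pages = pages;
    b->host = calloc(pages, TOY_PAGE_SIZE);
    b->cache = malloc(pages * TOY_PAGE_SIZE);
    if (!b->host || !b->cache) {
        free(b->host);
        free(b->cache);
        b->host = b->cache = NULL;
        return -1;
    }
    memcpy(b->cache, b->host, pages * TOY_PAGE_SIZE);
    return 0;
}

/* An incoming page lands in the cache only; live memory is untouched. */
int toy_receive_page(ToyBlock *b, size_t page, const unsigned char *data)
{
    if (page >= b->pages) {
        return -1;
    }
    memcpy(b->cache + page * TOY_PAGE_SIZE, data, TOY_PAGE_SIZE);
    return 0;
}

/* Called only after the whole checkpoint has arrived intact. */
void toy_commit(ToyBlock *b)
{
    memcpy(b->host, b->cache, b->pages * TOY_PAGE_SIZE);
}

void toy_cache_release(ToyBlock *b)
{
    free(b->host);
    free(b->cache);
    b->host = b->cache = NULL;
}
```

If a receive fails halfway through a checkpoint, the caller simply never
calls toy_commit(), so the live memory keeps its last consistent content.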

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/exec/ram_addr.h |  1 +
 migration/migration.c   |  7 ++++
 migration/ram.c         | 83 ++++++++++++++++++++++++++++++++++++++++-
 migration/ram.h         |  4 ++
 migration/savevm.c      |  2 +-
 5 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index XXXXXXX..XXXXXXX 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -XXX,XX +XXX,XX @@ struct RAMBlock {
     struct rcu_head rcu;
     struct MemoryRegion *mr;
     uint8_t *host;
+    uint8_t *colo_cache; /* For colo, VM's ram cache */
     ram_addr_t offset;
     ram_addr_t used_length;
     ram_addr_t max_length;
diff --git a/migration/migration.c b/migration/migration.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -XXX,XX +XXX,XX @@ static void process_incoming_migration_co(void *opaque)
             exit(EXIT_FAILURE);
         }
 
+        if (colo_init_ram_cache() < 0) {
+            error_report("Init ram cache failed");
+            exit(EXIT_FAILURE);
+        }
+
         qemu_thread_create(&mis->colo_incoming_thread, "COLO incoming",
              colo_process_incoming_thread, mis, QEMU_THREAD_JOINABLE);
         mis->have_colo_incoming_thread = true;
@@ -XXX,XX +XXX,XX @@ static void process_incoming_migration_co(void *opaque)
 
         /* Wait checkpoint incoming thread exit before free resource */
         qemu_thread_join(&mis->colo_incoming_thread);
+        /* We hold the global iothread lock, so it is safe here */
+        colo_release_ram_cache();
     }
 
     if (ret < 0) {
diff --git a/migration/ram.c b/migration/ram.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -XXX,XX +XXX,XX @@ static inline void *host_from_ram_block_offset(RAMBlock *block,
     return block->host + offset;
 }
 
+static inline void *colo_cache_from_block_offset(RAMBlock *block,
+                                                 ram_addr_t offset)
+{
+    if (!offset_in_ramblock(block, offset)) {
+        return NULL;
+    }
+    if (!block->colo_cache) {
+        error_report("%s: colo_cache is NULL in block :%s",
+                     __func__, block->idstr);
+        return NULL;
+    }
+    return block->colo_cache + offset;
+}
+
 /**
  * ram_handle_compressed: handle the zero page case
  *
@@ -XXX,XX +XXX,XX @@ static void decompress_data_with_multi_threads(QEMUFile *f,
     qemu_mutex_unlock(&decomp_done_lock);
 }
 
+/*
+ * colo cache: this is for the secondary VM. We cache the whole
+ * memory of the secondary VM. The caller must hold the global lock
+ * when calling this helper.
+ */
+int colo_init_ram_cache(void)
+{
+    RAMBlock *block;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        block->colo_cache = qemu_anon_ram_alloc(block->used_length,
+                                                NULL,
+                                                false);
+        if (!block->colo_cache) {
+            error_report("%s: Can't alloc memory for COLO cache of block %s,"
+                         "size 0x" RAM_ADDR_FMT, __func__, block->idstr,
+                         block->used_length);
+            goto out_locked;
+        }
+        memcpy(block->colo_cache, block->host, block->used_length);
+    }
+    rcu_read_unlock();
+    return 0;
+
+out_locked:
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (block->colo_cache) {
+            qemu_anon_ram_free(block->colo_cache, block->used_length);
+            block->colo_cache = NULL;
+        }
+    }
+
+    rcu_read_unlock();
+    return -errno;
+}
+
+/* The caller must hold the global lock when calling this helper */
+void colo_release_ram_cache(void)
+{
+    RAMBlock *block;
+
+    rcu_read_lock();
+    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+        if (block->colo_cache) {
+            qemu_anon_ram_free(block->colo_cache, block->used_length);
+            block->colo_cache = NULL;
+        }
+    }
+    rcu_read_unlock();
+}
+
 /**
  * ram_load_setup: Setup RAM for migration incoming side
  *
@@ -XXX,XX +XXX,XX @@ static int ram_load_setup(QEMUFile *f, void *opaque)
 
     xbzrle_load_setup();
     ramblock_recv_map_init();
+
     return 0;
 }
 
@@ -XXX,XX +XXX,XX @@ static int ram_load_cleanup(void *opaque)
         g_free(rb->receivedmap);
         rb->receivedmap = NULL;
     }
+
     return 0;
 }
 
@@ -XXX,XX +XXX,XX @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
                      RAM_SAVE_FLAG_COMPRESS_PAGE | RAM_SAVE_FLAG_XBZRLE)) {
             RAMBlock *block = ram_block_from_stream(f, flags);
 
-            host = host_from_ram_block_offset(block, addr);
+            /*
+             * After going into COLO, we should load the Page into colo_cache.
+             */
+            if (migration_incoming_in_colo_state()) {
+                host = colo_cache_from_block_offset(block, addr);
+            } else {
+                host = host_from_ram_block_offset(block, addr);
+            }
             if (!host) {
                 error_report("Illegal RAM offset " RAM_ADDR_FMT, addr);
                 ret = -EINVAL;
                 break;
             }
-            ramblock_recv_bitmap_set(block, host);
+
+            if (!migration_incoming_in_colo_state()) {
+                ramblock_recv_bitmap_set(block, host);
+            }
+
             trace_ram_load_loop(block->idstr, (uint64_t)addr, flags, host);
         }
 
diff --git a/migration/ram.h b/migration/ram.h
index XXXXXXX..XXXXXXX 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -XXX,XX +XXX,XX @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
                                   const char *block_name);
 int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb);
 
+/* ram cache */
+int colo_init_ram_cache(void);
+void colo_release_ram_cache(void);
+
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -XXX,XX +XXX,XX @@ static int loadvm_handle_recv_bitmap(MigrationIncomingState *mis,
 static int loadvm_process_enable_colo(MigrationIncomingState *mis)
 {
     migration_incoming_enable_colo();
-    return 0;
+    return colo_init_ram_cache();
 }
 
 /*
-- 
2.17.1

From: Zhang Chen <zhangckid@gmail.com>

We record the addresses of the dirty pages that are received; this helps
when flushing the cached pages into the SVM.

The trick here is that we record dirty pages by reusing the migration
dirty bitmap. In a later patch, we will start dirty logging for the SVM,
just as migration does. That way we can record the dirty pages caused by
both the PVM and the SVM, and flush only those dirty pages from the RAM
cache when doing a checkpoint.
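
A minimal sketch of the bookkeeping described above, assuming a 4 KiB page
(TOY_PAGE_BITS of 12) and a plain, non-atomic bitmap. ToyDirtyLog and the
toy_* functions are hypothetical and only mirror the idea of a
test-and-set on the page's bit plus a running dirty-page count.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_BITS 12  /* assumed: 4 KiB pages */
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-block dirty log: one bit per page plus a counter. */
typedef struct {
    unsigned long *bmap;
    size_t dirty_pages;
} ToyDirtyLog;

/* Set bit nr and report whether it was already set (not atomic). */
bool toy_test_and_set_bit(size_t nr, unsigned long *map)
{
    unsigned long mask = 1UL << (nr % TOY_BITS_PER_LONG);
    unsigned long *word = &map[nr / TOY_BITS_PER_LONG];
    bool was_set = (*word & mask) != 0;

    *word |= mask;
    return was_set;
}

/* Mark the page containing offset; count it only the first time. */
void toy_record_received(ToyDirtyLog *log, size_t offset)
{
    if (!toy_test_and_set_bit(offset >> TOY_PAGE_BITS, log->bmap)) {
        log->dirty_pages++;
    }
}
```

Counting only on the 0-to-1 transition keeps the counter equal to the
number of set bits even when the same page is received more than once.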

diff --git a/migration/ram.c b/migration/ram.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -XXX,XX +XXX,XX @@ static inline void *colo_cache_from_block_offset(RAMBlock *block,
                      __func__, block->idstr);
         return NULL;
     }
+
+    /*
+     * During a COLO checkpoint, we need a bitmap of these migrated pages.
+     * It helps us decide which pages in the RAM cache should be flushed
+     * into the VM's RAM later.
+     */
+    if (!test_and_set_bit(offset >> TARGET_PAGE_BITS, block->bmap)) {
+        ram_state->migration_dirty_pages++;
+    }
     return block->colo_cache + offset;
 }
 
@@ -XXX,XX +XXX,XX @@ int colo_init_ram_cache(void)
     RAMBlock *block;
 
     rcu_read_lock();
-    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
         block->colo_cache = qemu_anon_ram_alloc(block->used_length,
                                                 NULL,
                                                 false);
@@ -XXX,XX +XXX,XX @@ int colo_init_ram_cache(void)
         memcpy(block->colo_cache, block->host, block->used_length);
     }
     rcu_read_unlock();
+    /*
+     * Record the dirty pages sent by the PVM. We use this dirty bitmap
+     * to decide which pages in the cache should be flushed into the SVM's
+     * RAM. Here we use the same name 'ram_bitmap' as for migration.
+     */
+    if (ram_bytes_total()) {
+        RAMBlock *block;
+
+        RAMBLOCK_FOREACH_MIGRATABLE(block) {
+            unsigned long pages = block->max_length >> TARGET_PAGE_BITS;
+
+            block->bmap = bitmap_new(pages);
+            bitmap_set(block->bmap, 0, pages);
+        }
+    }
+    ram_state = g_new0(RAMState, 1);
+    ram_state->migration_dirty_pages = 0;
+
     return 0;
 
 out_locked:
-    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+
+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
         if (block->colo_cache) {
             qemu_anon_ram_free(block->colo_cache, block->used_length);
             block->colo_cache = NULL;
@@ -XXX,XX +XXX,XX @@ void colo_release_ram_cache(void)
 {
     RAMBlock *block;
 
+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
+        g_free(block->bmap);
+        block->bmap = NULL;
+    }
+
     rcu_read_lock();
-    QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+
+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
         if (block->colo_cache) {
             qemu_anon_ram_free(block->colo_cache, block->used_length);
             block->colo_cache = NULL;
         }
     }
+
     rcu_read_unlock();
+    g_free(ram_state);
+    ram_state = NULL;
 }
 
 /**
-- 
2.17.1

From: Zhang Chen <zhangckid@gmail.com>

While the VM is running, the PVM may dirty some pages. We transfer the
PVM's dirty pages to the SVM and store them in the SVM's RAM cache at the
next checkpoint. So the content of the SVM's RAM cache will always be the
same as the PVM's memory after a checkpoint.

Instead of flushing the entire content of the RAM cache into the SVM's
memory, we do this in a more efficient way: only flush the pages dirtied
by the PVM since the last checkpoint. In this way, we can ensure the SVM's
memory is the same as the PVM's.

Besides, we must ensure the RAM cache is flushed before loading the
device state.
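
The selective flush can be illustrated with a self-contained sketch. It
assumes a 4 KiB page and a simple linear scan of the bitmap;
toy_flush_dirty() is a hypothetical helper, not the function added by this
patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size, for illustration only */
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Copy every page whose bit is set from cache to host and clear the bit.
 * Returns the number of pages copied.
 */
size_t toy_flush_dirty(unsigned char *host, const unsigned char *cache,
                       unsigned long *bmap, size_t pages)
{
    size_t flushed = 0;

    for (size_t page = 0; page < pages; page++) {
        unsigned long mask = 1UL << (page % TOY_BITS_PER_LONG);
        unsigned long *word = &bmap[page / TOY_BITS_PER_LONG];

        if (!(*word & mask)) {
            continue;  /* clean page: host already matches the cache */
        }
        *word &= ~mask;
        memcpy(host + page * TOY_PAGE_SIZE,
               cache + page * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
        flushed++;
    }
    return flushed;
}
```

Clearing each bit as its page is copied leaves the bitmap empty for the
next checkpoint, so a second call without new dirty pages copies nothing.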

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 migration/ram.c        | 37 +++++++++++++++++++++++++++++++++++++
 migration/trace-events |  2 ++
 2 files changed, 39 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -XXX,XX +XXX,XX @@ static bool postcopy_is_running(void)
     return ps >= POSTCOPY_INCOMING_LISTENING && ps < POSTCOPY_INCOMING_END;
 }
 
+/*
+ * Flush content of RAM cache into SVM's memory.
+ * Only flush the pages that have been dirtied by the PVM, the SVM, or both.
+ */
+static void colo_flush_ram_cache(void)
+{
+    RAMBlock *block = NULL;
+    void *dst_host;
+    void *src_host;
+    unsigned long offset = 0;
+
+    trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages);
+    rcu_read_lock();
+    block = QLIST_FIRST_RCU(&ram_list.blocks);
+
+    while (block) {
+        offset = migration_bitmap_find_dirty(ram_state, block, offset);
+
+        if (offset << TARGET_PAGE_BITS >= block->used_length) {
+            offset = 0;
+            block = QLIST_NEXT_RCU(block, next);
+        } else {
+            migration_bitmap_clear_dirty(ram_state, block, offset);
+            dst_host = block->host + (offset << TARGET_PAGE_BITS);
+            src_host = block->colo_cache + (offset << TARGET_PAGE_BITS);
+            memcpy(dst_host, src_host, TARGET_PAGE_SIZE);
+        }
+    }
+
+    rcu_read_unlock();
+    trace_colo_flush_ram_cache_end();
+}
+
 static int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
     int flags = 0, ret = 0, invalid_flags = 0;
@@ -XXX,XX +XXX,XX @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)
     ret |= wait_for_decompress_done();
     rcu_read_unlock();
     trace_ram_load_complete(ret, seq_iter);
+
+    if (!ret  && migration_incoming_in_colo_state()) {
+        colo_flush_ram_cache();
+    }
     return ret;
 }
 
diff --git a/migration/trace-events b/migration/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -XXX,XX +XXX,XX @@ ram_dirty_bitmap_sync_start(void) ""
 ram_dirty_bitmap_sync_wait(void) ""
 ram_dirty_bitmap_sync_complete(void) ""
 ram_state_resume_prepare(uint64_t v) "%" PRId64
+colo_flush_ram_cache_begin(uint64_t dirty_pages) "dirty_pages %" PRIu64
+colo_flush_ram_cache_end(void) ""
 
 # migration/migration.c
 await_return_path_close_on_source_close(void) ""
-- 
2.17.1

From: zhanghailiang <zhang.zhanghailiang@huawei.com>

If an error happens during the VM's COLO FT stage, it is important to
notify users of the event. Together with 'x-colo-lost-heartbeat',
users can then intervene in COLO's failover work immediately.
Even if users don't want to get involved in COLO's failover verdict,
it is still necessary to notify them that we exited COLO mode.
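
As a rough sketch of how the reported reason can be derived: the exit path
is reached either because something failed or because failover was
requested, so the failover state alone tells the two apart. The enum and
function below are hypothetical stand-ins, not the QAPI-generated types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the failover states relevant on the exit path. */
typedef enum {
    TOY_FAILOVER_NONE,     /* nobody asked for failover */
    TOY_FAILOVER_REQUIRE,  /* the user requested failover */
    TOY_FAILOVER_ACTIVE    /* not expected when the exit event is sent */
} ToyFailover;

/*
 * Map the failover state seen at exit time to the reason string carried
 * by the event. Returns NULL for states the exit path does not expect.
 */
const char *toy_exit_reason(ToyFailover state)
{
    switch (state) {
    case TOY_FAILOVER_NONE:
        return "error";    /* we left COLO without a request */
    case TOY_FAILOVER_REQUIRE:
        return "request";
    default:
        return NULL;
    }
}
```

A management layer that receives the event could then decide whether to
trigger failover itself ("error") or just record the exit ("request").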

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 migration/colo.c    | 31 +++++++++++++++++++++++++++++++
 qapi/migration.json | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -XXX,XX +XXX,XX @@
 #include "net/colo-compare.h"
 #include "net/colo.h"
 #include "block/block.h"
+#include "qapi/qapi-events-migration.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -XXX,XX +XXX,XX @@ out:
         qemu_fclose(fb);
     }
 
+    /*
+     * There are only two reasons we can get here, some error happened
+     * or the user triggered failover.
+     */
+    switch (failover_get_state()) {
+    case FAILOVER_STATUS_NONE:
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+                                  COLO_EXIT_REASON_ERROR);
+        break;
+    case FAILOVER_STATUS_REQUIRE:
+        qapi_event_send_colo_exit(COLO_MODE_PRIMARY,
+                                  COLO_EXIT_REASON_REQUEST);
+        break;
+    default:
+        abort();
+    }
+
     /* Hope this not to be too long to wait here */
     qemu_sem_wait(&s->colo_exit_sem);
     qemu_sem_destroy(&s->colo_exit_sem);
@@ -XXX,XX +XXX,XX @@ out:
         error_report_err(local_err);
     }
 
+    switch (failover_get_state()) {
+    case FAILOVER_STATUS_NONE:
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+                                  COLO_EXIT_REASON_ERROR);
+        break;
+    case FAILOVER_STATUS_REQUIRE:
+        qapi_event_send_colo_exit(COLO_MODE_SECONDARY,
+                                  COLO_EXIT_REASON_REQUEST);
+        break;
+    default:
+        abort();
+    }
+
     if (fb) {
         qemu_fclose(fb);
     }
diff --git a/qapi/migration.json b/qapi/migration.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -XXX,XX +XXX,XX @@
 { 'enum': 'FailoverStatus',
   'data': [ 'none', 'require', 'active', 'completed', 'relaunch' ] }
 
+##
+# @COLO_EXIT:
+#
+# Emitted when VM finishes COLO mode due to some errors happening or
+# at the request of users.
+#
+# @mode: report COLO mode when COLO exited.
+#
+# @reason: describes the reason for the COLO exit.
+#
+# Since: 3.1
+#
+# Example:
+#
+# <- { "timestamp": {"seconds": 2032141960, "microseconds": 417172},
+#      "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
+#
+##
+{ 'event': 'COLO_EXIT',
+  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
+
+##
+# @COLOExitReason:
+#
+# The reason for a COLO exit
+#
+# @none: no failover has ever happened. This can't occur in the
+# COLO_EXIT event, only in the result of query-colo-status.
+#
+# @request: COLO exit is due to an external request
+#
+# @error: COLO exit is due to an internal error
+#
+# Since: 3.1
+##
+{ 'enum': 'COLOExitReason',
+  'data': [ 'none', 'request', 'error' ] }
+
 ##
 # @x-colo-lost-heartbeat:
 #
-- 
2.17.1

From: Zhang Chen <chen.zhang@intel.com>

As suggested by Markus Armbruster, rename the COLO 'unknown' mode to 'none'.

Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 migration/colo-failover.c |  2 +-
 migration/colo.c          |  2 +-
 qapi/migration.json       | 10 +++++-----
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/migration/colo-failover.c b/migration/colo-failover.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/colo-failover.c
+++ b/migration/colo-failover.c
@@ -XXX,XX +XXX,XX @@ FailoverStatus failover_get_state(void)
 
 void qmp_x_colo_lost_heartbeat(Error **errp)
 {
-    if (get_colo_mode() == COLO_MODE_UNKNOWN) {
+    if (get_colo_mode() == COLO_MODE_NONE) {
         error_setg(errp, QERR_FEATURE_DISABLED, "colo");
         return;
     }
diff --git a/migration/colo.c b/migration/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -XXX,XX +XXX,XX @@ COLOMode get_colo_mode(void)
     } else if (migration_incoming_in_colo_state()) {
         return COLO_MODE_SECONDARY;
     } else {
-        return COLO_MODE_UNKNOWN;
+        return COLO_MODE_NONE;
     }
 }
 
diff --git a/qapi/migration.json b/qapi/migration.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -XXX,XX +XXX,XX @@
 ##
 # @COLOMode:
 #
-# The colo mode
+# The COLO current mode.
 #
-# @unknown: unknown mode
+# @none: COLO is disabled.
 #
-# @primary: master side
+# @primary: COLO node in primary side.
 #
-# @secondary: slave side
+# @secondary: COLO node in slave side.
 #
 # Since: 2.8
 ##
 { 'enum': 'COLOMode',
-  'data': [ 'unknown', 'primary', 'secondary'] }
+  'data': [ 'none', 'primary', 'secondary'] }
 
 ##
 # @FailoverStatus:
-- 
2.17.1

From: Zhang Chen <zhangckid@gmail.com>

Libvirt or other high-level software can use this command to query the
COLO status. You can test the command like this:
{'execute':'query-colo-status'}

Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 migration/colo.c    | 21 +++++++++++++++++++++
 qapi/migration.json | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -XXX,XX +XXX,XX @@
 #include "net/colo.h"
 #include "block/block.h"
 #include "qapi/qapi-events-migration.h"
+#include "qapi/qmp/qerror.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -XXX,XX +XXX,XX @@ void qmp_xen_colo_do_checkpoint(Error **errp)
 #endif
 }
 
+COLOStatus *qmp_query_colo_status(Error **errp)
+{
+    COLOStatus *s = g_new0(COLOStatus, 1);
+
+    s->mode = get_colo_mode();
+
+    switch (failover_get_state()) {
+    case FAILOVER_STATUS_NONE:
+        s->reason = COLO_EXIT_REASON_NONE;
+        break;
+    case FAILOVER_STATUS_REQUIRE:
+        s->reason = COLO_EXIT_REASON_REQUEST;
+        break;
+    default:
+        s->reason = COLO_EXIT_REASON_ERROR;
+    }
+
+    return s;
+}
+
 static void colo_send_message(QEMUFile *f, COLOMessage msg,
                               Error **errp)
 {
diff --git a/qapi/migration.json b/qapi/migration.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -XXX,XX +XXX,XX @@
 ##
 { 'command': 'xen-colo-do-checkpoint' }
 
+##
+# @COLOStatus:
+#
+# The result format for 'query-colo-status'.
+#
+# @mode: COLO running mode. If COLO is running, this field will return
+#        'primary' or 'secondary'.
+#
+# @reason: describes the reason for the COLO exit.
+#
+# Since: 3.1
+##
+{ 'struct': 'COLOStatus',
+  'data': { 'mode': 'COLOMode', 'reason': 'COLOExitReason' } }
+
+##
+# @query-colo-status:
+#
+# Query COLO status while the vm is running.
+#
+# Returns: A @COLOStatus object showing the status.
+#
+# Example:
+#
+# -> { "execute": "query-colo-status" }
+# <- { "return": { "mode": "primary", "reason": "request" } }
+#
+# Since: 3.1
+##
+{ 'command': 'query-colo-status',
+  'returns': 'COLOStatus' }
+
 ##
 # @migrate-recover:
 #
-- 
2.17.1

From: Zhang Chen <zhangckid@gmail.com>

There are several stages in the loadvm/savevm process. In different
stages, the incoming migration processes different types of sections.
We want to control these stages more accurately, which benefits COLO
performance: we don't have to save QEMU_VM_SECTION_START sections
every time we do a checkpoint. Besides, we want to separate the
process of saving/loading memory from that of device state.

So we add two new helper functions, qemu_load_device_state() and
qemu_savevm_live_state(), to handle the different stages of migration.

Besides, we make qemu_loadvm_state_main() and qemu_save_device_state()
public, and simplify the code of qemu_save_device_state() by calling the
wrapper qemu_savevm_state_header().
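
Because the device state is written to a buffer first, the secondary has
to learn the buffer's size before it can read the payload. A minimal
sketch of such length-prefixed buffering follows; the 8-byte big-endian
prefix and the toy_frame_* names are assumptions made for illustration and
do not describe the actual COLO wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write [8-byte big-endian length][payload] into out.
 * Returns the number of bytes written, or 0 if out is too small.
 */
size_t toy_frame_put(unsigned char *out, size_t cap,
                     const unsigned char *payload, uint64_t len)
{
    if (cap < 8 || len > cap - 8) {
        return 0;
    }
    for (int i = 0; i < 8; i++) {
        out[i] = (unsigned char)(len >> (56 - 8 * i));
    }
    if (len) {
        memcpy(out + 8, payload, (size_t)len);
    }
    return 8 + (size_t)len;
}

/*
 * Read the length prefix from in. Returns a pointer to the payload and
 * stores its length in *len, or returns NULL if the frame is truncated.
 */
const unsigned char *toy_frame_get(const unsigned char *in, size_t avail,
                                   uint64_t *len)
{
    uint64_t n = 0;

    if (avail < 8) {
        return NULL;
    }
    for (int i = 0; i < 8; i++) {
        n = (n << 8) | in[i];
    }
    if (n > avail - 8) {
        return NULL;
    }
    *len = n;
    return in + 8;
}
```

The receiver can refuse to load anything until the full payload announced
by the prefix is available, which keeps a truncated transfer from being
applied.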

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 migration/colo.c   | 41 ++++++++++++++++++++++++++++++++---------
 migration/savevm.c | 36 +++++++++++++++++++++++++++++-------
 migration/savevm.h |  4 ++++
 3 files changed, 65 insertions(+), 16 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -XXX,XX +XXX,XX @@
 #include "block/block.h"
 #include "qapi/qapi-events-migration.h"
 #include "qapi/qmp/qerror.h"
+#include "sysemu/cpus.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -XXX,XX +XXX,XX @@ static int colo_do_checkpoint_transaction(MigrationState *s,
 
     /* Disable block migration */
     migrate_set_block_enabled(false, &local_err);
-    qemu_savevm_state_header(fb);
-    qemu_savevm_state_setup(fb);
     qemu_mutex_lock_iothread();
     replication_do_checkpoint_all(&local_err);
     if (local_err) {
         qemu_mutex_unlock_iothread();
         goto out;
     }
-    qemu_savevm_state_complete_precopy(fb, false, false);
-    qemu_mutex_unlock_iothread();
-
-    qemu_fflush(fb);
 
     colo_send_message(s->to_dst_file, COLO_MESSAGE_VMSTATE_SEND, &local_err);
     if (local_err) {
+        qemu_mutex_unlock_iothread();
+        goto out;
+    }
+    /* Note: device state is saved into buffer */
+    ret = qemu_save_device_state(fb);
+
+    qemu_mutex_unlock_iothread();
+    if (ret < 0) {
         goto out;
     }
+    /*
+     * Only save the VM's live state, which does not include device state.
+     * TODO: We may need a timeout mechanism to prevent the COLO process
+     * from being blocked here.
+     */
+    qemu_savevm_live_state(s->to_dst_file);
+
+    qemu_fflush(fb);
+
     /*
      * We need the size of the VMstate data in Secondary side,
      * With which we can decide how much data should be read.
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
     uint64_t total_size;
     uint64_t value;
     Error *local_err = NULL;
+    int ret;
 
     rcu_register_thread();
     qemu_sem_init(&mis->colo_incoming_sem, 0);
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
+        qemu_mutex_lock_iothread();
+        cpu_synchronize_all_pre_loadvm();
+        ret = qemu_loadvm_state_main(mis->from_src_file, mis);
+        qemu_mutex_unlock_iothread();
+
+        if (ret < 0) {
+            error_report("Load VM's live state (ram) error");
+            goto out;
+        }
+
         value = colo_receive_message_value(mis->from_src_file,
                                  COLO_MESSAGE_VMSTATE_SIZE, &local_err);
         if (local_err) {
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
         }
 
         qemu_mutex_lock_iothread();
-        qemu_system_reset(SHUTDOWN_CAUSE_NONE);
         vmstate_loading = true;
-        if (qemu_loadvm_state(fb) < 0) {
-            error_report("COLO: loadvm failed");
+        ret = qemu_load_device_state(fb);
+        if (ret < 0) {
+            error_report("COLO: load device state failed");
             qemu_mutex_unlock_iothread();
             goto out;
         }
diff --git a/migration/savevm.c b/migration/savevm.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -XXX,XX +XXX,XX @@ done:
     return ret;
 }
 
-static int qemu_save_device_state(QEMUFile *f)
+void qemu_savevm_live_state(QEMUFile *f)
 {
-    SaveStateEntry *se;
+    /* save QEMU_VM_SECTION_END section */
+    qemu_savevm_state_complete_precopy(f, true, false);
+    qemu_put_byte(f, QEMU_VM_EOF);
+}
 
-    qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
-    qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+int qemu_save_device_state(QEMUFile *f)
+{
+    SaveStateEntry *se;
 
+    if (!migration_in_colo_state()) {
+        qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
+        qemu_put_be32(f, QEMU_VM_FILE_VERSION);
+    }
     cpu_synchronize_all_states();
 
     QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
@@ -XXX,XX +XXX,XX @@ enum LoadVMExitCodes {
     LOADVM_QUIT     =  1,
 };
 
-static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
-
 /* ------ incoming postcopy messages ------ */
 /* 'advise' arrives before any transfers just to tell us that a postcopy
  * *might* happen - it might be skipped if precopy transferred everything
@@ -XXX,XX +XXX,XX @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     return true;
 }
 
-static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
+int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
 {
     uint8_t section_type;
     int ret = 0;
@@ -XXX,XX +XXX,XX @@ int qemu_loadvm_state(QEMUFile *f)
     return ret;
 }
 
+int qemu_load_device_state(QEMUFile *f)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    int ret;
+
+    /* Load QEMU_VM_SECTION_FULL section */
+    ret = qemu_loadvm_state_main(f, mis);
+    if (ret < 0) {
+        error_report("Failed to load device state: %d", ret);
+        return ret;
+    }
+
+    cpu_synchronize_all_post_init();
+    return 0;
+}
+
 int save_snapshot(const char *name, Error **errp)
 {
     BlockDriverState *bs, *bs1;
diff --git a/migration/savevm.h b/migration/savevm.h
index XXXXXXX..XXXXXXX 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -XXX,XX +XXX,XX @@ void qemu_savevm_send_postcopy_ram_discard(QEMUFile *f, const char *name,
                                            uint64_t *start_list,
                                            uint64_t *length_list);
 void qemu_savevm_send_colo_enable(QEMUFile *f);
+void qemu_savevm_live_state(QEMUFile *f);
+int qemu_save_device_state(QEMUFile *f);
 
 int qemu_loadvm_state(QEMUFile *f);
 void qemu_loadvm_state_cleanup(void);
+int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
+int qemu_load_device_state(QEMUFile *f);
 
 #endif
-- 
2.17.1

From: zhanghailiang <zhang.zhanghailiang@huawei.com>

We don't need to flush all of the VM's RAM from the cache; only
flush the pages dirtied since the last checkpoint.
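
Conceptually, the pages to flush are the union of those received from the
PVM and those the SVM dirtied itself. The sketch below shows only that
union step on plain bitmaps; toy_bitmap_merge() is a hypothetical helper,
and the dirty-log sync helpers used in the diff do more than this.

```c
#include <assert.h>
#include <stddef.h>

/*
 * OR src into dst word by word.
 * Returns how many bits were newly set in dst.
 */
size_t toy_bitmap_merge(unsigned long *dst, const unsigned long *src,
                        size_t words)
{
    size_t newly_set = 0;

    for (size_t i = 0; i < words; i++) {
        unsigned long fresh = src[i] & ~dst[i];

        dst[i] |= src[i];
        while (fresh) {
            fresh &= fresh - 1;  /* clear the lowest set bit */
            newly_set++;
        }
    }
    return newly_set;
}
```

The return value can be added to a running dirty-page counter so that the
counter stays equal to the number of set bits after the merge.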

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 migration/ram.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -XXX,XX +XXX,XX @@ int colo_init_ram_cache(void)
     }
     ram_state = g_new0(RAMState, 1);
     ram_state->migration_dirty_pages = 0;
+    memory_global_dirty_log_start();
 
     return 0;
 
@@ -XXX,XX +XXX,XX @@ void colo_release_ram_cache(void)
 {
     RAMBlock *block;
 
+    memory_global_dirty_log_stop();
     RAMBLOCK_FOREACH_MIGRATABLE(block) {
         g_free(block->bmap);
         block->bmap = NULL;
@@ -XXX,XX +XXX,XX @@ static void colo_flush_ram_cache(void)
     void *src_host;
     unsigned long offset = 0;
 
+    memory_global_dirty_log_sync();
+    rcu_read_lock();
+    RAMBLOCK_FOREACH_MIGRATABLE(block) {
+        migration_bitmap_sync_range(ram_state, block, 0, block->used_length);
+    }
+    rcu_read_unlock();
+
     trace_colo_flush_ram_cache_begin(ram_state->migration_dirty_pages);
     rcu_read_lock();
     block = QLIST_FIRST_RCU(&ram_list.blocks);
-- 
2.17.1

From: Zhang Chen <zhangckid@gmail.com>

Filters need to process checkpoint/failover events and other
events passed by the COLO framework.
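
The dispatch pattern (a per-class event callback with a default
handler, and a notifier that stops at the first error) can be sketched
outside QEMU like this. All sketch_* names are hypothetical, and plain
int return codes stand in for the Error ** convention.

```c
#include <stddef.h>

enum sketch_event { SKETCH_EVENT_CHECKPOINT, SKETCH_EVENT_FAILOVER };

struct sketch_filter;
typedef int (*sketch_handle_event_fn)(struct sketch_filter *f,
                                      enum sketch_event ev);

struct sketch_filter {
    sketch_handle_event_fn handle_event; /* per-class callback */
    int enabled;                         /* stands in for the "status" property */
    int events_seen;
};

/* Default handler: ignore checkpoints, switch the filter off on failover. */
static int sketch_default_handle_event(struct sketch_filter *f,
                                       enum sketch_event ev)
{
    f->events_seen++;
    if (ev == SKETCH_EVENT_FAILOVER) {
        f->enabled = 0;
    }
    return 0;
}

/* Notify every filter in turn; stop and report the first failure. */
static int sketch_notify_filters(struct sketch_filter *filters, size_t n,
                                 enum sketch_event ev)
{
    for (size_t i = 0; i < n; i++) {
        int ret = filters[i].handle_event(&filters[i], ev);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

A filter type that needs different behaviour would install its own
callback in place of the default one.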

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/net/filter.h |  5 +++++
 net/filter.c         | 17 +++++++++++++++++
 net/net.c            | 19 +++++++++++++++++++
 3 files changed, 41 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index XXXXXXX..XXXXXXX 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -XXX,XX +XXX,XX @@ typedef ssize_t (FilterReceiveIOV)(NetFilterState *nc,
 
 typedef void (FilterStatusChanged) (NetFilterState *nf, Error **errp);
 
+typedef void (FilterHandleEvent) (NetFilterState *nf, int event, Error **errp);
+
 typedef struct NetFilterClass {
     ObjectClass parent_class;
 
@@ -XXX,XX +XXX,XX @@ typedef struct NetFilterClass {
     FilterSetup *setup;
     FilterCleanup *cleanup;
     FilterStatusChanged *status_changed;
+    FilterHandleEvent *handle_event;
     /* mandatory */
     FilterReceiveIOV *receive_iov;
 } NetFilterClass;
@@ -XXX,XX +XXX,XX @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
                                     int iovcnt,
                                     void *opaque);
 
+void colo_notify_filters_event(int event, Error **errp);
+
 #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter.c b/net/filter.c
index XXXXXXX..XXXXXXX 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -XXX,XX +XXX,XX @@
 #include "net/vhost_net.h"
 #include "qom/object_interfaces.h"
 #include "qemu/iov.h"
+#include "net/colo.h"
+#include "migration/colo.h"
 
 static inline bool qemu_can_skip_netfilter(NetFilterState *nf)
 {
@@ -XXX,XX +XXX,XX @@ static void netfilter_finalize(Object *obj)
     g_free(nf->netdev_id);
 }
 
+static void default_handle_event(NetFilterState *nf, int event, Error **errp)
+{
+    switch (event) {
+    case COLO_EVENT_CHECKPOINT:
+        break;
+    case COLO_EVENT_FAILOVER:
+        object_property_set_str(OBJECT(nf), "off", "status", errp);
+        break;
+    default:
+        break;
+    }
+}
+
 static void netfilter_class_init(ObjectClass *oc, void *data)
 {
     UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
+    NetFilterClass *nfc = NETFILTER_CLASS(oc);
 
     ucc->complete = netfilter_complete;
+    nfc->handle_event = default_handle_event;
 }
 
 static const TypeInfo netfilter_info = {
diff --git a/net/net.c b/net/net.c
index XXXXXXX..XXXXXXX 100644
--- a/net/net.c
+++ b/net/net.c
@@ -XXX,XX +XXX,XX @@ void hmp_info_network(Monitor *mon, const QDict *qdict)
     }
 }
 
+void colo_notify_filters_event(int event, Error **errp)
+{
+    NetClientState *nc;
+    NetFilterState *nf;
+    NetFilterClass *nfc = NULL;
+    Error *local_err = NULL;
+
+    QTAILQ_FOREACH(nc, &net_clients, next) {
+        QTAILQ_FOREACH(nf, &nc->filters, next) {
+            nfc = NETFILTER_GET_CLASS(OBJECT(nf));
+            nfc->handle_event(nf, event, &local_err);
+            if (local_err) {
+                error_propagate(errp, local_err);
+                return;
+            }
+        }
+    }
+}
+
 void qmp_set_link(const char *name, bool up, Error **errp)
 {
     NetClientState *ncs[MAX_QUEUE_NUM];
-- 
2.17.1

From: Zhang Chen <zhangckid@gmail.com>

After one round of checkpoint, the states of the PVM and SVM
become consistent, so it is unnecessary to adjust the sequence
numbers of net packets for old connections. Besides, when failover
happens, filter-rewriter enters failover mode, in which it does not
need to handle new TCP connections.
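
A rough, self-contained model of this behaviour is shown below. It is
only a sketch: a small array replaces the GHashTable, and the sketch_*
names are invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_conn {
    uint32_t key;    /* stands in for ConnectionKey */
    uint32_t offset; /* sequence offset between PVM and SVM */
};

struct sketch_rewriter {
    struct sketch_conn conns[8];
    size_t nconns;
    bool failover_mode;
};

static struct sketch_conn *sketch_find(struct sketch_rewriter *r, uint32_t key)
{
    for (size_t i = 0; i < r->nconns; i++) {
        if (r->conns[i].key == key) {
            return &r->conns[i];
        }
    }
    return NULL;
}

/* Checkpoint: both sides are consistent again, so drop every offset. */
static void sketch_on_checkpoint(struct sketch_rewriter *r)
{
    for (size_t i = 0; i < r->nconns; i++) {
        r->conns[i].offset = 0;
    }
}

/* Failover: enter failover mode only if no connection still needs fixing. */
static void sketch_on_failover(struct sketch_rewriter *r)
{
    for (size_t i = 0; i < r->nconns; i++) {
        if (r->conns[i].offset != 0) {
            return;
        }
    }
    r->failover_mode = true;
}

/* In failover mode, packets of untracked connections pass through untouched. */
static bool sketch_should_rewrite(struct sketch_rewriter *r, uint32_t key)
{
    return !(r->failover_mode && sketch_find(r, key) == NULL);
}
```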

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo-compare.c    | 12 ++++-----
 net/colo.c            |  8 ++++++
 net/colo.h            |  2 ++
 net/filter-rewriter.c | 57 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 73 insertions(+), 6 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -XXX,XX +XXX,XX @@ enum {
     SECONDARY_IN,
 };
 
+static void colo_compare_inconsistency_notify(void)
+{
+    notifier_list_notify(&colo_compare_notifiers,
+                migrate_get_current());
+}
+
 static int compare_chr_send(CompareState *s,
                             const uint8_t *buf,
                             uint32_t size,
@@ -XXX,XX +XXX,XX @@ static bool colo_mark_tcp_pkt(Packet *ppkt, Packet *spkt,
     return false;
 }
 
-static void colo_compare_inconsistency_notify(void)
-{
-    notifier_list_notify(&colo_compare_notifiers,
-                migrate_get_current());
-}
-
 static void colo_compare_tcp(CompareState *s, Connection *conn)
 {
     Packet *ppkt = NULL, *spkt = NULL;
diff --git a/net/colo.c b/net/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/net/colo.c
+++ b/net/colo.c
@@ -XXX,XX +XXX,XX @@ Connection *connection_get(GHashTable *connection_track_table,
 
     return conn;
 }
+
+bool connection_has_tracked(GHashTable *connection_track_table,
+                            ConnectionKey *key)
+{
+    Connection *conn = g_hash_table_lookup(connection_track_table, key);
+
+    return conn ? true : false;
+}
diff --git a/net/colo.h b/net/colo.h
index XXXXXXX..XXXXXXX 100644
--- a/net/colo.h
+++ b/net/colo.h
@@ -XXX,XX +XXX,XX @@ void connection_destroy(void *opaque);
 Connection *connection_get(GHashTable *connection_track_table,
                            ConnectionKey *key,
                            GQueue *conn_list);
+bool connection_has_tracked(GHashTable *connection_track_table,
+                            ConnectionKey *key);
 void connection_hashtable_reset(GHashTable *connection_track_table);
 Packet *packet_new(const void *data, int size, int vnet_hdr_len);
 void packet_destroy(void *opaque, void *user_data);
diff --git a/net/filter-rewriter.c b/net/filter-rewriter.c
index XXXXXXX..XXXXXXX 100644
--- a/net/filter-rewriter.c
+++ b/net/filter-rewriter.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/main-loop.h"
 #include "qemu/iov.h"
 #include "net/checksum.h"
+#include "net/colo.h"
+#include "migration/colo.h"
 
 #define FILTER_COLO_REWRITER(obj) \
     OBJECT_CHECK(RewriterState, (obj), TYPE_FILTER_REWRITER)
 
 #define TYPE_FILTER_REWRITER "filter-rewriter"
+#define FAILOVER_MODE_ON  true
+#define FAILOVER_MODE_OFF false
 
 typedef struct RewriterState {
     NetFilterState parent_obj;
@@ -XXX,XX +XXX,XX @@ typedef struct RewriterState {
     /* hashtable to save connection */
     GHashTable *connection_track_table;
     bool vnet_hdr;
+    bool failover_mode;
 } RewriterState;
 
+static void filter_rewriter_failover_mode(RewriterState *s)
+{
+    s->failover_mode = FAILOVER_MODE_ON;
+}
+
 static void filter_rewriter_flush(NetFilterState *nf)
 {
     RewriterState *s = FILTER_COLO_REWRITER(nf);
@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
              */
             reverse_connection_key(&key);
         }
+
+        /* After failover we needn't change new TCP packet */
+        if (s->failover_mode &&
+            !connection_has_tracked(s->connection_track_table, &key)) {
+            goto out;
+        }
+
         conn = connection_get(s->connection_track_table,
                               &key,
                               NULL);
@@ -XXX,XX +XXX,XX @@ static ssize_t colo_rewriter_receive_iov(NetFilterState *nf,
         }
     }
 
+out:
     packet_destroy(pkt, NULL);
     pkt = NULL;
     return 0;
 }
 
+static void reset_seq_offset(gpointer key, gpointer value, gpointer user_data)
+{
+    Connection *conn = (Connection *)value;
+
+    conn->offset = 0;
+}
+
+static gboolean offset_is_nonzero(gpointer key,
+                                  gpointer value,
+                                  gpointer user_data)
+{
+    Connection *conn = (Connection *)value;
+
+    return conn->offset ? true : false;
+}
+
+static void colo_rewriter_handle_event(NetFilterState *nf, int event,
+                                       Error **errp)
+{
+    RewriterState *rs = FILTER_COLO_REWRITER(nf);
+
+    switch (event) {
+    case COLO_EVENT_CHECKPOINT:
+        g_hash_table_foreach(rs->connection_track_table,
+                            reset_seq_offset, NULL);
+        break;
+    case COLO_EVENT_FAILOVER:
+        if (!g_hash_table_find(rs->connection_track_table,
+                              offset_is_nonzero, NULL)) {
+            filter_rewriter_failover_mode(rs);
+        }
+        break;
+    default:
+        break;
+    }
+}
+
 static void colo_rewriter_cleanup(NetFilterState *nf)
 {
     RewriterState *s = FILTER_COLO_REWRITER(nf);
@@ -XXX,XX +XXX,XX @@ static void filter_rewriter_init(Object *obj)
     RewriterState *s = FILTER_COLO_REWRITER(obj);
 
     s->vnet_hdr = false;
+    s->failover_mode = FAILOVER_MODE_OFF;
     object_property_add_bool(obj, "vnet_hdr_support",
                              filter_rewriter_get_vnet_hdr,
                              filter_rewriter_set_vnet_hdr, NULL);
@@ -XXX,XX +XXX,XX @@ static void colo_rewriter_class_init(ObjectClass *oc, void *data)
     nfc->setup = colo_rewriter_setup;
     nfc->cleanup = colo_rewriter_cleanup;
     nfc->receive_iov = colo_rewriter_receive_iov;
+    nfc->handle_event = colo_rewriter_handle_event;
 }
 
 static const TypeInfo colo_rewriter_info = {
-- 
2.17.1

From: zhanghailiang <zhang.zhanghailiang@huawei.com>

Notify all net filters about the checkpoint and failover event.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 migration/colo.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/migration/colo.c b/migration/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -XXX,XX +XXX,XX @@
 #include "qapi/qapi-events-migration.h"
 #include "qapi/qmp/qerror.h"
 #include "sysemu/cpus.h"
+#include "net/filter.h"
 
 static bool vmstate_loading;
 static Notifier packets_compare_notifier;
@@ -XXX,XX +XXX,XX @@ static void secondary_vm_do_failover(void)
         error_report_err(local_err);
     }
 
+    /* Notify all filters of all NIC to do failover */
+    colo_notify_filters_event(COLO_EVENT_FAILOVER, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+    }
+
     if (!autostart) {
         error_report("\"-S\" qemu option will be ignored in secondary side");
         /* recover runstate to normal migration finish state */
@@ -XXX,XX +XXX,XX @@ void *colo_process_incoming_thread(void *opaque)
             goto out;
         }
 
+        /* Notify all filters of all NIC to do checkpoint */
+        colo_notify_filters_event(COLO_EVENT_CHECKPOINT, &local_err);
+
+        if (local_err) {
+            qemu_mutex_unlock_iothread();
+            goto out;
+        }
+
         vmstate_loading = false;
         vm_start();
         trace_colo_vm_state_change("stop", "run");
-- 
2.17.1

From: zhanghailiang <zhang.zhanghailiang@huawei.com>

The COLO thread may be sleeping in qemu_sem_wait(&s->colo_checkpoint_sem)
when failover begins. It is better to wake it up to speed up
the process.

diff --git a/migration/colo.c b/migration/colo.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -XXX,XX +XXX,XX @@ static void primary_vm_do_failover(void)
 
     migrate_set_state(&s->state, MIGRATION_STATUS_COLO,
                       MIGRATION_STATUS_COMPLETED);
+    /*
+     * kick COLO thread which might wait at
+     * qemu_sem_wait(&s->colo_checkpoint_sem).
+     */
+    colo_checkpoint_notify(migrate_get_current());
 
     /*
      * Wake up COLO thread which may blocked in recv() or send(),
@@ -XXX,XX +XXX,XX @@ static void colo_process_checkpoint(MigrationState *s)
 
         qemu_sem_wait(&s->colo_checkpoint_sem);
 
+        if (s->state != MIGRATION_STATUS_COLO) {
+            goto out;
+        }
         ret = colo_do_checkpoint_transaction(s, bioc, fb);
         if (ret < 0) {
             goto out;
-- 
2.17.1

From: Zhang Chen <chen.zhang@intel.com>

This diagram helps users better understand COLO.
Suggested by Markus Armbruster.

Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 docs/COLO-FT.txt | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
index XXXXXXX..XXXXXXX 100644
--- a/docs/COLO-FT.txt
+++ b/docs/COLO-FT.txt
@@ -XXX,XX +XXX,XX @@ Note:
 HeartBeat has not been implemented yet, so you need to trigger failover process
 by using 'x-colo-lost-heartbeat' command.
 
+== COLO operation status ==
+
++-----------------+
+|                 |
+|    Start COLO   |
+|                 |
++--------+--------+
+         |
+         |  Main qmp command:
+         |  migrate-set-capabilities with x-colo
+         |  migrate
+         |
+         v
++--------+--------+
+|                 |
+|  COLO running   |
+|                 |
++--------+--------+
+         |
+         |  Main qmp command:
+         |  x-colo-lost-heartbeat
+         |  or
+         |  some error happened
+         v
++--------+--------+
+|                 |  send qmp event:
+|  COLO failover  |  COLO_EXIT
+|                 |
++-----------------+
+
+COLO uses QMP commands to switch and report operation status.
+The diagram shows only the main QMP commands; the details are given
+in the test procedure.
+
 == Test procedure ==
 1. Startup qemu
 Primary:
-- 
2.17.1

From: liujunjie <liujunjie23@huawei.com>

Previously, we did not clear callbacks such as handle_output when deleting
a virtqueue, which could result in a segmentation fault.
The scenario is as follows:
1. Start a VM with multiqueue vhost-net.
2. Write VIRTIO_PCI_GUEST_FEATURES in the PCI configuration to
trigger multiqueue disable in this VM, which deletes the virtqueue.
In this step, the tx_bh is deleted but the callback virtio_net_handle_tx_bh
still exists.
3. Finally, write VIRTIO_PCI_QUEUE_NOTIFY in the PCI configuration to
notify the deleted virtqueue. virtio_net_handle_tx_bh is then
called and QEMU crashes.

Although the scenario described above is uncommon, we had better harden against it.
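
The effect of clearing the callback can be illustrated with a small
stand-alone model. The sketch_* names are hypothetical, and the model
assumes that the notify path checks the callback for NULL before
calling it.

```c
#include <stddef.h>

struct sketch_vq;
typedef void (*sketch_output_fn)(struct sketch_vq *vq);

struct sketch_vq {
    unsigned int num;               /* ring size, 0 once deleted */
    sketch_output_fn handle_output; /* guest-notify callback */
    int calls;
};

/* Stand-in for a TX handler that would touch per-queue resources. */
static void sketch_handle_tx(struct sketch_vq *vq)
{
    vq->calls++;
}

/* Deleting the queue also drops the callback, as the patch does. */
static void sketch_del_queue(struct sketch_vq *vq)
{
    vq->num = 0;
    vq->handle_output = NULL;
}

/* Returns 1 if a handler ran, 0 if the notification was ignored. */
static int sketch_notify(struct sketch_vq *vq)
{
    if (vq->handle_output == NULL) {
        return 0; /* deleted or never set up: ignore the kick */
    }
    vq->handle_output(vq);
    return 1;
}
```

Without the reset in sketch_del_queue(), a later notification would
still reach the stale handler.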

CC: qemu-stable@nongnu.org
Signed-off-by: liujunjie <liujunjie23@huawei.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/virtio/virtio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -XXX,XX +XXX,XX @@ void virtio_del_queue(VirtIODevice *vdev, int n)
 
     vdev->vq[n].vring.num = 0;
     vdev->vq[n].vring.num_default = 0;
+    vdev->vq[n].handle_output = NULL;
+    vdev->vq[n].handle_aio_output = NULL;
 }
 
 static void virtio_set_isr(VirtIODevice *vdev, int value)
-- 
2.17.1

In ne2000_receive(), we assign size_ to size, which converts
from size_t to int. This causes trouble when size_ is greater than
INT_MAX: size becomes negative and can then pass
the check of size < MIN_BUF_SIZE, which may lead to out-of-bounds access
for both buf and buf1.

Fix this by changing the type of size to size_t.
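
The conversion problem can be reproduced in a few lines. This is a
sketch, not the device code: the sketch_* names are invented, and the
negative result relies on the usual (implementation-defined) wrapping
behaviour of out-of-range size_t to int conversion on common compilers.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_MIN_BUF_SIZE 60

/* Old behaviour: the length is narrowed to int before the check. */
static int sketch_small_packet_int(size_t size_)
{
    /* Implementation-defined above INT_MAX; typically wraps negative. */
    int size = (int)size_;

    return size < SKETCH_MIN_BUF_SIZE;
}

/* Fixed behaviour: the length keeps its unsigned type. */
static int sketch_small_packet_size_t(size_t size_)
{
    size_t size = size_;

    return size < SKETCH_MIN_BUF_SIZE;
}
```

For a length of (size_t)INT_MAX + 1, the int variant is expected to
treat the packet as "too small" and take the padding path, while the
size_t variant does not.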

CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/ne2000.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/net/ne2000.c b/hw/net/ne2000.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/ne2000.c
+++ b/hw/net/ne2000.c
@@ -XXX,XX +XXX,XX @@ static int ne2000_buffer_full(NE2000State *s)
 ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
 {
     NE2000State *s = qemu_get_nic_opaque(nc);
-    int size = size_;
+    size_t size = size_;
     uint8_t *p;
     unsigned int total_len, next, avail, len, index, mcast_idx;
     uint8_t buf1[60];
@@ -XXX,XX +XXX,XX @@ ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
         { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
 
 #if defined(DEBUG_NE2000)
-    printf("NE2000: received len=%d\n", size);
+    printf("NE2000: received len=%zu\n", size);
 #endif
 
     if (s->cmd & E8390_STOP || ne2000_buffer_full(s))
-- 
2.17.1

In rtl8139_do_receive(), we assign size_ to size, which converts
from size_t to int. This causes trouble when size_ is greater than
INT_MAX: size becomes negative and can then pass
the check of size < MIN_BUF_SIZE, which may lead to out-of-bounds access
for both buf and buf1.

Fix this by changing the type of size to size_t.

CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/rtl8139.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -XXX,XX +XXX,XX @@ static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t
     RTL8139State *s = qemu_get_nic_opaque(nc);
     PCIDevice *d = PCI_DEVICE(s);
     /* size is the length of the buffer passed to the driver */
-    int size = size_;
+    size_t size = size_;
     const uint8_t *dot1q_buf = NULL;
 
     uint32_t packet_header = 0;
@@ -XXX,XX +XXX,XX @@ static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t
     static const uint8_t broadcast_macaddr[6] =
         { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
 
-    DPRINTF(">>> received len=%d\n", size);
+    DPRINTF(">>> received len=%zu\n", size);
 
     /* test if board clock is stopped */
     if (!s->clock_enabled)
@@ -XXX,XX +XXX,XX @@ static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t
 
         if (size+4 > rx_space)
         {
-            DPRINTF("C+ Rx mode : descriptor %d size %d received %d + 4\n",
+            DPRINTF("C+ Rx mode : descriptor %d size %d received %zu + 4\n",
                 descriptor, rx_space, size);
 
             s->IntrStatus |= RxOverflow;
@@ -XXX,XX +XXX,XX @@ static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t
         if (avail != 0 && RX_ALIGN(size + 8) >= avail)
         {
             DPRINTF("rx overflow: rx buffer length %d head 0x%04x "
-                "read 0x%04x === available 0x%04x need 0x%04x\n",
+                "read 0x%04x === available 0x%04x need 0x%04zx\n",
                 s->RxBufferSize, s->RxBufAddr, s->RxBufPtr, avail, size + 8);
 
             s->IntrStatus |= RxOverflow;
-- 
2.17.1

In pcnet_receive(), we assign size_ to size, which converts from
size_t to int. This causes trouble when size_ is greater than
INT_MAX: size becomes negative and can then pass
the check of size < MIN_BUF_SIZE, which may lead to out-of-bounds access
for both buf and buf1.

Fix this by changing the type of size to size_t.

CC: qemu-stable@nongnu.org
Reported-by: Daniel Shapira <daniel@twistlock.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/pcnet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/pcnet.c
+++ b/hw/net/pcnet.c
@@ -XXX,XX +XXX,XX @@ ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
     uint8_t buf1[60];
     int remaining;
     int crc_err = 0;
-    int size = size_;
+    size_t size = size_;
 
     if (CSR_DRX(s) || CSR_STOP(s) || CSR_SPND(s) || !size ||
         (CSR_LOOP(s) && !s->looptest)) {
         return -1;
     }
 #ifdef PCNET_DEBUG
-    printf("pcnet_receive size=%d\n", size);
+    printf("pcnet_receive size=%zu\n", size);
 #endif
 
     /* if too small buffer, then expand it */
-- 
2.17.1

From: Martin Wilck <mwilck@suse.com>

The e1000 emulation silently discards RX packets if there's
insufficient space in the ring buffer. This leads to errors
on higher-level protocols in the guest, with no indication
about the error cause.

This patch increments the "Missed Packets Count" (MPC) and
"Receive No Buffers Count" (RNBC) HW counters in this case.
As the emulation has no FIFO for buffering packets that can't
immediately be pushed to the guest, these two registers are
practically equivalent (see 10.2.7.4, 10.2.7.33 in
https://www.intel.com/content/www/us/en/embedded/products/networking/82574l-gbe-controller-datasheet.html).

On a Linux guest, the register content will be reflected in
the "rx_missed_errors" and "rx_no_buffer_count" stats from
"ethtool -S", and in the "missed" stat from "ip -s -s link show",
giving at least some hint about the error cause inside the guest.

If the cause is known, problems like this can often be avoided
easily, by increasing the number of RX descriptors in the guest
e1000 driver (e.g. under Linux, "e1000.RxDescriptors=1024").

The patch also adds a qemu trace message for this condition.
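
The counter handling can be modelled as saturating increments on a
register array. The following is a minimal sketch with made-up
sketch_* names and register indices; it only mirrors the idea of
"increment unless already at the maximum".

```c
#include <stdint.h>

/* Illustrative register indices, not the real e1000 register offsets. */
enum { SKETCH_RNBC, SKETCH_MPC, SKETCH_NREGS };

/* Increment a 32-bit statistics register unless it is already saturated. */
static void sketch_inc_reg_if_not_full(uint32_t *regs, int idx)
{
    if (regs[idx] < UINT32_MAX) {
        regs[idx]++;
    }
}

/* On a dropped packet, account for it in both counters. */
static void sketch_receiver_overrun(uint32_t *regs)
{
    sketch_inc_reg_if_not_full(regs, SKETCH_RNBC);
    sketch_inc_reg_if_not_full(regs, SKETCH_MPC);
}
```

Saturating rather than wrapping keeps a long-running guest from
seeing the counters fall back to zero.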

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 hw/net/e1000.c      | 16 +++++++++++++---
 hw/net/trace-events |  3 +++
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/range.h"
 
 #include "e1000x_common.h"
+#include "trace.h"
 
 static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
 
@@ -XXX,XX +XXX,XX @@ static uint64_t rx_desc_base(E1000State *s)
     return (bah << 32) + bal;
 }
 
+static void
+e1000_receiver_overrun(E1000State *s, size_t size)
+{
+    trace_e1000_receiver_overrun(size, s->mac_reg[RDH], s->mac_reg[RDT]);
+    e1000x_inc_reg_if_not_full(s->mac_reg, RNBC);
+    e1000x_inc_reg_if_not_full(s->mac_reg, MPC);
+    set_ics(s, 0, E1000_ICS_RXO);
+}
+
 static ssize_t
 e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
 {
@@ -XXX,XX +XXX,XX @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
     desc_offset = 0;
     total_size = size + e1000x_fcs_len(s->mac_reg);
     if (!e1000_has_rxbufs(s, total_size)) {
-            set_ics(s, 0, E1000_ICS_RXO);
-            return -1;
+        e1000_receiver_overrun(s, total_size);
+        return -1;
     }
     do {
         desc_size = total_size - desc_offset;
@@ -XXX,XX +XXX,XX @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
             rdh_start >= s->mac_reg[RDLEN] / sizeof(desc)) {
             DBGOUT(RXERR, "RDH wraparound @%x, RDT %x, RDLEN %x\n",
                    rdh_start, s->mac_reg[RDT], s->mac_reg[RDLEN]);
-            set_ics(s, 0, E1000_ICS_RXO);
+            e1000_receiver_overrun(s, total_size);
             return -1;
         }
     } while (desc_offset < total_size);
diff --git a/hw/net/trace-events b/hw/net/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -XXX,XX +XXX,XX @@ net_rx_pkt_rss_ip6_ex(void) "Calculating IPv6/EX RSS  hash"
 net_rx_pkt_rss_hash(size_t rss_length, uint32_t rss_hash) "RSS hash for %zu bytes: 0x%X"
 net_rx_pkt_rss_add_chunk(void* ptr, size_t size, size_t input_offset) "Add RSS chunk %p, %zu bytes, RSS input offset %zu bytes"
 
+# hw/net/e1000.c
+e1000_receiver_overrun(size_t s, uint32_t rdh, uint32_t rdt) "Receiver overrun: dropped packet of %zu bytes, RDH=%u, RDT=%u"
+
 # hw/net/e1000x_common.c
 e1000x_rx_can_recv_disabled(bool link_up, bool rx_enabled, bool pci_master) "link_up: %d, rx_enabled %d, pci_master %d"
 e1000x_vlan_is_vlan_pkt(bool is_vlan_pkt, uint16_t eth_proto, uint16_t vet) "Is VLAN packet: %d, ETH proto: 0x%X, VET: 0x%X"
-- 
2.17.1

The following changes since commit 92f8c6fef13b31ba222c4d20ad8afd2b79c4c28e:

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20210525' into staging (2021-05-25 16:17:06 +0100)

are available in the git repository at:

https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to 90322e646e87c1440661cb3ddbc0cc94309d8a4f:

MAINTAINERS: Added eBPF maintainers information. (2021-06-04 15:25:46 +0800)

----------------------------------------------------------------

----------------------------------------------------------------
Andrew Melnychenko (7):
      net/tap: Added TUNSETSTEERINGEBPF code.
      net: Added SetSteeringEBPF method for NetClientState.
      ebpf: Added eBPF RSS program.
      ebpf: Added eBPF RSS loader.
      virtio-net: Added eBPF RSS to virtio-net.
      docs: Added eBPF documentation.
      MAINTAINERS: Added eBPF maintainers information.

MAINTAINERS                    |   8 +
 configure                      |   8 +-
 docs/devel/ebpf_rss.rst        | 125 +++++++++
 docs/devel/index.rst           |   1 +
 ebpf/ebpf_rss-stub.c           |  40 +++
 ebpf/ebpf_rss.c                | 165 ++++++++++++
 ebpf/ebpf_rss.h                |  44 ++++
 ebpf/meson.build               |   1 +
 ebpf/rss.bpf.skeleton.h        | 431 +++++++++++++++++++++++++++++++
 ebpf/trace-events              |   4 +
 ebpf/trace.h                   |   1 +
 hw/net/vhost_net.c             |   3 +
 hw/net/virtio-net.c            | 116 ++++++++-
 include/hw/virtio/virtio-net.h |   4 +
 include/net/net.h              |   2 +
 meson.build                    |  23 ++
 meson_options.txt              |   2 +
 net/tap-bsd.c                  |   5 +
 net/tap-linux.c                |  13 +
 net/tap-linux.h                |   1 +
 net/tap-solaris.c              |   5 +
 net/tap-stub.c                 |   5 +
 net/tap.c                      |   9 +
 net/tap_int.h                  |   1 +
 net/vhost-vdpa.c               |   2 +
 tools/ebpf/Makefile.ebpf       |  21 ++
 tools/ebpf/rss.bpf.c           | 571 +++++++++++++++++++++++++++++++++++++++++
 27 files changed, 1607 insertions(+), 4 deletions(-)
 create mode 100644 docs/devel/ebpf_rss.rst
 create mode 100644 ebpf/ebpf_rss-stub.c
 create mode 100644 ebpf/ebpf_rss.c
 create mode 100644 ebpf/ebpf_rss.h
 create mode 100644 ebpf/meson.build
 create mode 100644 ebpf/rss.bpf.skeleton.h
 create mode 100644 ebpf/trace-events
 create mode 100644 ebpf/trace.h
 create mode 100755 tools/ebpf/Makefile.ebpf
 create mode 100644 tools/ebpf/rss.bpf.c

From: Andrew Melnychenko <andrew@daynix.com>

For now, this method is supported only by Linux TAP,
which uses the TUNSETSTEERINGEBPF ioctl.
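
Because only some backends provide the method, a caller presumably has
to treat it as optional. The sketch below shows one way to do that;
the sketch_* types and the sketch_attach_steering() helper are
hypothetical and not part of this patch.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_client;
typedef bool (*sketch_set_steering_fn)(struct sketch_client *c, int prog_fd);

struct sketch_client_info {
    sketch_set_steering_fn set_steering_ebpf; /* NULL if unsupported */
};

struct sketch_client {
    const struct sketch_client_info *info;
    int attached_fd; /* -1 while no program is attached */
};

/* Stand-in for a backend that supports steering programs. */
static bool sketch_tap_set_steering(struct sketch_client *c, int prog_fd)
{
    c->attached_fd = prog_fd;
    return true;
}

/* Attach a program only if the backend implements the optional method. */
static bool sketch_attach_steering(struct sketch_client *c, int prog_fd)
{
    if (c == NULL || c->info == NULL || c->info->set_steering_ebpf == NULL) {
        return false;
    }
    return c->info->set_steering_ebpf(c, prog_fd);
}
```

A backend that leaves the pointer NULL is simply reported as not
supporting steering, without any backend-specific checks in the caller.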

diff --git a/include/net/net.h b/include/net/net.h
index XXXXXXX..XXXXXXX 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -XXX,XX +XXX,XX @@ typedef int (SetVnetBE)(NetClientState *, bool);
 typedef struct SocketReadState SocketReadState;
 typedef void (SocketReadStateFinalize)(SocketReadState *rs);
 typedef void (NetAnnounce)(NetClientState *);
+typedef bool (SetSteeringEBPF)(NetClientState *, int);
 
 typedef struct NetClientInfo {
     NetClientDriver type;
@@ -XXX,XX +XXX,XX @@ typedef struct NetClientInfo {
     SetVnetLE *set_vnet_le;
     SetVnetBE *set_vnet_be;
     NetAnnounce *announce;
+    SetSteeringEBPF *set_steering_ebpf;
 } NetClientInfo;
 
 struct NetClientState {
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index XXXXXXX..XXXXXXX 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -XXX,XX +XXX,XX @@ int tap_fd_get_ifname(int fd, char *ifname)
 {
     return -1;
 }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+    return -1;
+}
diff --git a/net/tap-linux.c b/net/tap-linux.c
index XXXXXXX..XXXXXXX 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -XXX,XX +XXX,XX @@ int tap_fd_get_ifname(int fd, char *ifname)
     pstrcpy(ifname, sizeof(ifr.ifr_name), ifr.ifr_name);
     return 0;
 }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+    if (ioctl(fd, TUNSETSTEERINGEBPF, (void *) &prog_fd) != 0) {
+        error_report("Issue while setting TUNSETSTEERINGEBPF:"
+                    " %s with fd: %d, prog_fd: %d",
+                    strerror(errno), fd, prog_fd);
+
+        return -1;
+    }
+
+    return 0;
+}
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index XXXXXXX..XXXXXXX 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -XXX,XX +XXX,XX @@ int tap_fd_get_ifname(int fd, char *ifname)
 {
     return -1;
 }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+    return -1;
+}
diff --git a/net/tap-stub.c b/net/tap-stub.c
index XXXXXXX..XXXXXXX 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -XXX,XX +XXX,XX @@ int tap_fd_get_ifname(int fd, char *ifname)
 {
     return -1;
 }
+
+int tap_fd_set_steering_ebpf(int fd, int prog_fd)
+{
+    return -1;
+}
diff --git a/net/tap.c b/net/tap.c
index XXXXXXX..XXXXXXX 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -XXX,XX +XXX,XX @@ static void tap_poll(NetClientState *nc, bool enable)
     tap_write_poll(s, enable);
 }
 
+static bool tap_set_steering_ebpf(NetClientState *nc, int prog_fd)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+    assert(nc->info->type == NET_CLIENT_DRIVER_TAP);
+
+    return tap_fd_set_steering_ebpf(s->fd, prog_fd) == 0;
+}
+
 int tap_get_fd(NetClientState *nc)
 {
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
@@ -XXX,XX +XXX,XX @@ static NetClientInfo net_tap_info = {
     .set_vnet_hdr_len = tap_set_vnet_hdr_len,
     .set_vnet_le = tap_set_vnet_le,
     .set_vnet_be = tap_set_vnet_be,
+    .set_steering_ebpf = tap_set_steering_ebpf,
 };
 
 static TAPState *net_tap_fd_init(NetClientState *peer,
diff --git a/net/tap_int.h b/net/tap_int.h
index XXXXXXX..XXXXXXX 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -XXX,XX +XXX,XX @@ int tap_fd_set_vnet_be(int fd, int vnet_is_be);
 int tap_fd_enable(int fd);
 int tap_fd_disable(int fd);
 int tap_fd_get_ifname(int fd, char *ifname);
+int tap_fd_set_steering_ebpf(int fd, int prog_fd);
 
 #endif /* NET_TAP_INT_H */
-- 
2.7.4

From: Andrew Melnychenko <andrew@daynix.com>

RSS program and Makefile to build it.
bpftool is used to generate the '.h' file.
The data in that file may be loaded by libbpf.
eBPF compilation is not required for building QEMU.
You can use the Makefile if you need to regenerate rss.bpf.skeleton.h.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 tools/ebpf/Makefile.ebpf |  21 ++
 tools/ebpf/rss.bpf.c     | 571 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 592 insertions(+)
 create mode 100755 tools/ebpf/Makefile.ebpf
 create mode 100644 tools/ebpf/rss.bpf.c

diff --git a/tools/ebpf/Makefile.ebpf b/tools/ebpf/Makefile.ebpf
new file mode 100755
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tools/ebpf/Makefile.ebpf
@@ -XXX,XX +XXX,XX @@
+OBJS = rss.bpf.o
+
+LLC ?= llc
+CLANG ?= clang
+INC_FLAGS = `$(CLANG) -print-file-name=include`
+EXTRA_CFLAGS ?= -O2 -emit-llvm -fno-stack-protector
+
+all: $(OBJS)
+
+.PHONY: clean
+
+clean:
+	rm -f $(OBJS)
+
+$(OBJS):  %.o:%.c
+	$(CLANG) $(INC_FLAGS) \
+                -D__KERNEL__ -D__ASM_SYSREG_H \
+                -I../include $(LINUXINCLUDE) \
+                $(EXTRA_CFLAGS) -c $< -o -| $(LLC) -march=bpf -filetype=obj -o $@
+	bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h
+	cp rss.bpf.skeleton.h ../../ebpf/
diff --git a/tools/ebpf/rss.bpf.c b/tools/ebpf/rss.bpf.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tools/ebpf/rss.bpf.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * eBPF RSS program
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko <andrew@daynix.com>
+ *  Yuri Benditovich <yuri.benditovich@daynix.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Prepare:
+ * Requires llvm, clang, bpftool, linux kernel tree
+ *
+ * Build rss.bpf.skeleton.h:
+ * make -f Makefile.ebpf clean all
+ */
+
+#include <stddef.h>
+#include <stdbool.h>
+#include <linux/bpf.h>
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+
+#include <linux/udp.h>
+#include <linux/tcp.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+#include <linux/virtio_net.h>
+
+#define INDIRECTION_TABLE_SIZE 128
+#define HASH_CALCULATION_BUFFER_SIZE 36
+
+struct rss_config_t {
+    __u8 redirect;
+    __u8 populate_hash;
+    __u32 hash_types;
+    __u16 indirections_len;
+    __u16 default_queue;
+} __attribute__((packed));
+
+struct toeplitz_key_data_t {
+    __u32 leftmost_32_bits;
+    __u8 next_byte[HASH_CALCULATION_BUFFER_SIZE];
+};
+
+struct packet_hash_info_t {
+    __u8 is_ipv4;
+    __u8 is_ipv6;
+    __u8 is_udp;
+    __u8 is_tcp;
+    __u8 is_ipv6_ext_src;
+    __u8 is_ipv6_ext_dst;
+    __u8 is_fragmented;
+
+    __u16 src_port;
+    __u16 dst_port;
+
+    union {
+        struct {
+            __be32 in_src;
+            __be32 in_dst;
+        };
+
+        struct {
+            struct in6_addr in6_src;
+            struct in6_addr in6_dst;
+            struct in6_addr in6_ext_src;
+            struct in6_addr in6_ext_dst;
+        };
+    };
+};
+
+struct bpf_map_def SEC("maps")
+tap_rss_map_configurations = {
+        .type        = BPF_MAP_TYPE_ARRAY,
+        .key_size    = sizeof(__u32),
+        .value_size  = sizeof(struct rss_config_t),
+        .max_entries = 1,
+};
+
+struct bpf_map_def SEC("maps")
+tap_rss_map_toeplitz_key = {
+        .type        = BPF_MAP_TYPE_ARRAY,
+        .key_size    = sizeof(__u32),
+        .value_size  = sizeof(struct toeplitz_key_data_t),
+        .max_entries = 1,
+};
+
+struct bpf_map_def SEC("maps")
+tap_rss_map_indirection_table = {
+        .type        = BPF_MAP_TYPE_ARRAY,
+        .key_size    = sizeof(__u32),
+        .value_size  = sizeof(__u16),
+        .max_entries = INDIRECTION_TABLE_SIZE,
+};
+
+static inline void net_rx_rss_add_chunk(__u8 *rss_input, size_t *bytes_written,
+                                        const void *ptr, size_t size)
+{
+    __builtin_memcpy(&rss_input[*bytes_written], ptr, size);
+    *bytes_written += size;
+}
+
+static inline void net_toeplitz_add(__u32 *result, __u8 *input, __u32 len,
+                                    struct toeplitz_key_data_t *key)
+{
+    __u32 accumulator = *result;
+    __u32 leftmost_32_bits = key->leftmost_32_bits;
+    __u32 byte;
+
+    for (byte = 0; byte < HASH_CALCULATION_BUFFER_SIZE; byte++) {
+        __u8 input_byte = input[byte];
+        __u8 key_byte = key->next_byte[byte];
+        __u8 bit;
+
+        for (bit = 0; bit < 8; bit++) {
+            if (input_byte & (1 << 7)) {
+                accumulator ^= leftmost_32_bits;
+            }
+
+            leftmost_32_bits =
+                    (leftmost_32_bits << 1) | ((key_byte & (1 << 7)) >> 7);
+
+            input_byte <<= 1;
+            key_byte <<= 1;
+        }
+    }
+
+    *result = accumulator;
+}
+
+
+static inline int ip6_extension_header_type(__u8 hdr_type)
+{
+    switch (hdr_type) {
+    case IPPROTO_HOPOPTS:
+    case IPPROTO_ROUTING:
+    case IPPROTO_FRAGMENT:
+    case IPPROTO_ICMPV6:
+    case IPPROTO_NONE:
+    case IPPROTO_DSTOPTS:
+    case IPPROTO_MH:
+        return 1;
+    default:
+        return 0;
+    }
+}
+/*
+ * According to
+ * https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml
+ * we expect no more than 11 extension headers in an IPv6 packet, and there
+ * are 27 TLV options for the Destination and Hop-by-hop extensions.
+ * We need to choose a reasonable maximum number of extensions/options to
+ * check when looking for the ext src/dst addresses.
+ */
+#define IP6_EXTENSIONS_COUNT 11
+#define IP6_OPTIONS_COUNT 30
+
+static inline int parse_ipv6_ext(struct __sk_buff *skb,
+        struct packet_hash_info_t *info,
+        __u8 *l4_protocol, size_t *l4_offset)
+{
+    int err = 0;
+
+    if (!ip6_extension_header_type(*l4_protocol)) {
+        return 0;
+    }
+
+    struct ipv6_opt_hdr ext_hdr = {};
+
+    for (unsigned int i = 0; i < IP6_EXTENSIONS_COUNT; ++i) {
+
+        err = bpf_skb_load_bytes_relative(skb, *l4_offset, &ext_hdr,
+                                    sizeof(ext_hdr), BPF_HDR_START_NET);
+        if (err) {
+            goto error;
+        }
+
+        if (*l4_protocol == IPPROTO_ROUTING) {
+            struct ipv6_rt_hdr ext_rt = {};
+
+            err = bpf_skb_load_bytes_relative(skb, *l4_offset, &ext_rt,
+                                        sizeof(ext_rt), BPF_HDR_START_NET);
+            if (err) {
+                goto error;
+            }
+
+            if ((ext_rt.type == IPV6_SRCRT_TYPE_2) &&
+                    (ext_rt.hdrlen == sizeof(struct in6_addr) / 8) &&
+                    (ext_rt.segments_left == 1)) {
+
+                err = bpf_skb_load_bytes_relative(skb,
+                    *l4_offset + offsetof(struct rt2_hdr, addr),
+                    &info->in6_ext_dst, sizeof(info->in6_ext_dst),
+                    BPF_HDR_START_NET);
+                if (err) {
+                    goto error;
+                }
+
+                info->is_ipv6_ext_dst = 1;
+            }
+
+        } else if (*l4_protocol == IPPROTO_DSTOPTS) {
+            struct ipv6_opt_t {
+                __u8 type;
+                __u8 length;
+            } __attribute__((packed)) opt = {};
+
+            size_t opt_offset = sizeof(ext_hdr);
+
+            for (unsigned int j = 0; j < IP6_OPTIONS_COUNT; ++j) {
+                err = bpf_skb_load_bytes_relative(skb, *l4_offset + opt_offset,
+                                        &opt, sizeof(opt), BPF_HDR_START_NET);
+                if (err) {
+                    goto error;
+                }
+
+                if (opt.type == IPV6_TLV_HAO) {
+                    err = bpf_skb_load_bytes_relative(skb,
+                        *l4_offset + opt_offset
+                        + offsetof(struct ipv6_destopt_hao, addr),
+                        &info->in6_ext_src, sizeof(info->in6_ext_src),
+                        BPF_HDR_START_NET);
+                    if (err) {
+                        goto error;
+                    }
+
+                    info->is_ipv6_ext_src = 1;
+                    break;
+                }
+
+                opt_offset += (opt.type == IPV6_TLV_PAD1) ?
+                              1 : opt.length + sizeof(opt);
+
+                if (opt_offset + 1 >= ext_hdr.hdrlen * 8) {
+                    break;
+                }
+            }
+        } else if (*l4_protocol == IPPROTO_FRAGMENT) {
+            info->is_fragmented = true;
+        }
+
+        *l4_protocol = ext_hdr.nexthdr;
+        *l4_offset += (ext_hdr.hdrlen + 1) * 8;
+
+        if (!ip6_extension_header_type(ext_hdr.nexthdr)) {
+            return 0;
+        }
+    }
+
+    return 0;
+error:
+    return err;
+}
+
+static __be16 parse_eth_type(struct __sk_buff *skb)
+{
+    unsigned int offset = 12;
+    __be16 ret = 0;
+    int err = 0;
+
+    err = bpf_skb_load_bytes_relative(skb, offset, &ret, sizeof(ret),
+                                BPF_HDR_START_MAC);
+    if (err) {
+        return 0;
+    }
+
+    switch (bpf_ntohs(ret)) {
+    case ETH_P_8021AD:
+        offset += 4;
+        /* fall through: skip the outer tag, then the inner 802.1Q tag */
+    case ETH_P_8021Q:
+        offset += 4;
+        err = bpf_skb_load_bytes_relative(skb, offset, &ret, sizeof(ret),
+                                    BPF_HDR_START_MAC);
+        break;
+    default:
+        break;
+    }
+
+    if (err) {
+        return 0;
+    }
+
+    return ret;
+}
+
+static inline int parse_packet(struct __sk_buff *skb,
+        struct packet_hash_info_t *info)
+{
+    int err = 0;
+
+    if (!info || !skb) {
+        return -1;
+    }
+
+    size_t l4_offset = 0;
+    __u8 l4_protocol = 0;
+    __u16 l3_protocol = bpf_ntohs(parse_eth_type(skb));
+    if (l3_protocol == 0) {
+        err = -1;
+        goto error;
+    }
+
+    if (l3_protocol == ETH_P_IP) {
+        info->is_ipv4 = 1;
+
+        struct iphdr ip = {};
+        err = bpf_skb_load_bytes_relative(skb, 0, &ip, sizeof(ip),
+                                    BPF_HDR_START_NET);
+        if (err) {
+            goto error;
+        }
+
+        info->in_src = ip.saddr;
+        info->in_dst = ip.daddr;
+        /* MF set or non-zero offset means fragmented; ignore DF. */
+        info->is_fragmented = !!(ip.frag_off & bpf_htons(0x3fff));
+
+        l4_protocol = ip.protocol;
+        l4_offset = ip.ihl * 4;
+    } else if (l3_protocol == ETH_P_IPV6) {
+        info->is_ipv6 = 1;
+
+        struct ipv6hdr ip6 = {};
+        err = bpf_skb_load_bytes_relative(skb, 0, &ip6, sizeof(ip6),
+                                    BPF_HDR_START_NET);
+        if (err) {
+            goto error;
+        }
+
+        info->in6_src = ip6.saddr;
+        info->in6_dst = ip6.daddr;
+
+        l4_protocol = ip6.nexthdr;
+        l4_offset = sizeof(ip6);
+
+        err = parse_ipv6_ext(skb, info, &l4_protocol, &l4_offset);
+        if (err) {
+            goto error;
+        }
+    }
+
+    if (l4_protocol != 0 && !info->is_fragmented) {
+        if (l4_protocol == IPPROTO_TCP) {
+            info->is_tcp = 1;
+
+            struct tcphdr tcp = {};
+            err = bpf_skb_load_bytes_relative(skb, l4_offset, &tcp, sizeof(tcp),
+                                        BPF_HDR_START_NET);
+            if (err) {
+                goto error;
+            }
+
+            info->src_port = tcp.source;
+            info->dst_port = tcp.dest;
+        } else if (l4_protocol == IPPROTO_UDP) { /* TODO: add udplite? */
+            info->is_udp = 1;
+
+            struct udphdr udp = {};
+            err = bpf_skb_load_bytes_relative(skb, l4_offset, &udp, sizeof(udp),
+                                        BPF_HDR_START_NET);
+            if (err) {
+                goto error;
+            }
+
+            info->src_port = udp.source;
+            info->dst_port = udp.dest;
+        }
+    }
+
+    return 0;
+
+error:
+    return err;
+}
+
+static inline __u32 calculate_rss_hash(struct __sk_buff *skb,
+        struct rss_config_t *config, struct toeplitz_key_data_t *toe)
+{
+    __u8 rss_input[HASH_CALCULATION_BUFFER_SIZE] = {};
+    size_t bytes_written = 0;
+    __u32 result = 0;
+    int err = 0;
+    struct packet_hash_info_t packet_info = {};
+
+    err = parse_packet(skb, &packet_info);
+    if (err) {
+        return 0;
+    }
+
+    if (packet_info.is_ipv4) {
+        if (packet_info.is_tcp &&
+            config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCPv4) {
+
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.in_src,
+                                 sizeof(packet_info.in_src));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.in_dst,
+                                 sizeof(packet_info.in_dst));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.src_port,
+                                 sizeof(packet_info.src_port));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.dst_port,
+                                 sizeof(packet_info.dst_port));
+        } else if (packet_info.is_udp &&
+                   config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDPv4) {
+
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.in_src,
+                                 sizeof(packet_info.in_src));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.in_dst,
+                                 sizeof(packet_info.in_dst));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.src_port,
+                                 sizeof(packet_info.src_port));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.dst_port,
+                                 sizeof(packet_info.dst_port));
+        } else if (config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IPv4) {
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.in_src,
+                                 sizeof(packet_info.in_src));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.in_dst,
+                                 sizeof(packet_info.in_dst));
+        }
+    } else if (packet_info.is_ipv6) {
+        if (packet_info.is_tcp &&
+            config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCPv6) {
+
+            if (packet_info.is_ipv6_ext_src &&
+                config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCP_EX) {
+
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_ext_src,
+                                     sizeof(packet_info.in6_ext_src));
+            } else {
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_src,
+                                     sizeof(packet_info.in6_src));
+            }
+            if (packet_info.is_ipv6_ext_dst &&
+                config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_TCP_EX) {
+
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_ext_dst,
+                                     sizeof(packet_info.in6_ext_dst));
+            } else {
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_dst,
+                                     sizeof(packet_info.in6_dst));
+            }
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.src_port,
+                                 sizeof(packet_info.src_port));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.dst_port,
+                                 sizeof(packet_info.dst_port));
+        } else if (packet_info.is_udp &&
+                   config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDPv6) {
+
+            if (packet_info.is_ipv6_ext_src &&
+               config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDP_EX) {
+
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_ext_src,
+                                     sizeof(packet_info.in6_ext_src));
+            } else {
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_src,
+                                     sizeof(packet_info.in6_src));
+            }
+            if (packet_info.is_ipv6_ext_dst &&
+               config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_UDP_EX) {
+
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_ext_dst,
+                                     sizeof(packet_info.in6_ext_dst));
+            } else {
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_dst,
+                                     sizeof(packet_info.in6_dst));
+            }
+
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.src_port,
+                                 sizeof(packet_info.src_port));
+            net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                 &packet_info.dst_port,
+                                 sizeof(packet_info.dst_port));
+
+        } else if (config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IPv6) {
+            if (packet_info.is_ipv6_ext_src &&
+               config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IP_EX) {
+
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_ext_src,
+                                     sizeof(packet_info.in6_ext_src));
+            } else {
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_src,
+                                     sizeof(packet_info.in6_src));
+            }
+            if (packet_info.is_ipv6_ext_dst &&
+                config->hash_types & VIRTIO_NET_RSS_HASH_TYPE_IP_EX) {
+
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_ext_dst,
+                                     sizeof(packet_info.in6_ext_dst));
+            } else {
+                net_rx_rss_add_chunk(rss_input, &bytes_written,
+                                     &packet_info.in6_dst,
+                                     sizeof(packet_info.in6_dst));
+            }
+        }
+    }
+
+    if (bytes_written) {
+        net_toeplitz_add(&result, rss_input, bytes_written, toe);
+    }
+
+    return result;
+}
+
+SEC("tun_rss_steering")
+int tun_rss_steering_prog(struct __sk_buff *skb)
+{
+
+    struct rss_config_t *config;
+    struct toeplitz_key_data_t *toe;
+
+    __u32 key = 0;
+    __u32 hash = 0;
+
+    config = bpf_map_lookup_elem(&tap_rss_map_configurations, &key);
+    toe = bpf_map_lookup_elem(&tap_rss_map_toeplitz_key, &key);
+
+    if (config && toe) {
+        if (!config->redirect) {
+            return config->default_queue;
+        }
+
+        hash = calculate_rss_hash(skb, config, toe);
+        if (hash) {
+            __u32 table_idx = hash % config->indirections_len;
+            __u16 *queue = 0;
+
+            queue = bpf_map_lookup_elem(&tap_rss_map_indirection_table,
+                                        &table_idx);
+
+            if (queue) {
+                return *queue;
+            }
+        }
+
+        return config->default_queue;
+    }
+
+    return -1;
+}
+
+char _license[] SEC("license") = "GPL v2";
-- 
2.7.4

From: Andrew Melnychenko <andrew@daynix.com>

Add a function that loads the RSS eBPF program.
Add stub functions for the RSS eBPF loader.
Add meson and configure options.

By default, the eBPF feature is enabled if libbpf is present on the
build system. libbpf is checked in the configure shell script and the
meson script.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 configure               |   8 +-
 ebpf/ebpf_rss-stub.c    |  40 +++++
 ebpf/ebpf_rss.c         | 165 ++++++++++++++++++
 ebpf/ebpf_rss.h         |  44 +++++
 ebpf/meson.build        |   1 +
 ebpf/rss.bpf.skeleton.h | 431 ++++++++++++++++++++++++++++++++++++++++++++++++
 ebpf/trace-events       |   4 +
 ebpf/trace.h            |   1 +
 meson.build             |  23 +++
 meson_options.txt       |   2 +
 10 files changed, 718 insertions(+), 1 deletion(-)
 create mode 100644 ebpf/ebpf_rss-stub.c
 create mode 100644 ebpf/ebpf_rss.c
 create mode 100644 ebpf/ebpf_rss.h
 create mode 100644 ebpf/meson.build
 create mode 100644 ebpf/rss.bpf.skeleton.h
 create mode 100644 ebpf/trace-events
 create mode 100644 ebpf/trace.h
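
Note for reviewers (not part of the commit): ebpf_rss_set_all() fills
three maps (configuration, indirection table, Toeplitz key) that the
steering program then reads for each packet. The block below is a minimal
user-space sketch of how the configuration and indirection table turn a
hash into a queue index. struct rss_config_sketch and select_queue() are
hypothetical names, and the zero-length guard is an assumption added for
the sketch rather than something the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the configuration map value. */
struct rss_config_sketch {
    uint8_t redirect;
    uint16_t indirections_len;
    uint16_t default_queue;
};

/*
 * Pick a queue for a packet hash. Falls back to the default queue when
 * redirection is off, when no hash could be computed (hash == 0), or
 * when the table is empty (guard added for this sketch only).
 */
static uint16_t select_queue(const struct rss_config_sketch *cfg,
                             const uint16_t *table, uint32_t hash)
{
    assert(cfg != NULL && table != NULL);

    if (!cfg->redirect || hash == 0 || cfg->indirections_len == 0) {
        return cfg->default_queue;
    }

    /* The hash indexes the indirection table modulo its length. */
    return table[hash % cfg->indirections_len];
}
```

With a four-entry table {0, 1, 2, 3} and default queue 7, a hash of 6
selects table[2], while a hash of 0 falls back to queue 7.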

diff --git a/configure b/configure
index XXXXXXX..XXXXXXX 100755
--- a/configure
+++ b/configure
@@ -XXX,XX +XXX,XX @@ vhost_vsock="$default_feature"
 vhost_user="no"
 vhost_user_blk_server="auto"
 vhost_user_fs="$default_feature"
+bpf="auto"
 kvm="auto"
 hax="auto"
 hvf="auto"
@@ -XXX,XX +XXX,XX @@ for opt do
   ;;
   --enable-membarrier) membarrier="yes"
   ;;
+  --disable-bpf) bpf="disabled"
+  ;;
+  --enable-bpf) bpf="enabled"
+  ;;
   --disable-blobs) blobs="false"
   ;;
   --with-pkgversion=*) pkgversion="$optarg"
@@ -XXX,XX +XXX,XX @@ disabled with --disable-FEATURE, default is enabled if available
   vhost-user      vhost-user backend support
   vhost-user-blk-server    vhost-user-blk server support
   vhost-vdpa      vhost-vdpa kernel backend support
+  bpf             BPF kernel support
   spice           spice
   spice-protocol  spice-protocol
   rbd             rados block device (rbd)
@@ -XXX,XX +XXX,XX @@ if test "$skip_meson" = no; then
         -Dattr=$attr -Ddefault_devices=$default_devices \
         -Ddocs=$docs -Dsphinx_build=$sphinx_build -Dinstall_blobs=$blobs \
         -Dvhost_user_blk_server=$vhost_user_blk_server -Dmultiprocess=$multiprocess \
-        -Dfuse=$fuse -Dfuse_lseek=$fuse_lseek -Dguest_agent_msi=$guest_agent_msi \
+        -Dfuse=$fuse -Dfuse_lseek=$fuse_lseek -Dguest_agent_msi=$guest_agent_msi -Dbpf=$bpf \
         $(if test "$default_features" = no; then echo "-Dauto_features=disabled"; fi) \
 	-Dtcg_interpreter=$tcg_interpreter \
         $cross_arg \
diff --git a/ebpf/ebpf_rss-stub.c b/ebpf/ebpf_rss-stub.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/ebpf/ebpf_rss-stub.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * eBPF RSS stub file
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Yuri Benditovich <yuri.benditovich@daynix.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "ebpf/ebpf_rss.h"
+
+void ebpf_rss_init(struct EBPFRSSContext *ctx)
+{
+
+}
+
+bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
+{
+    return false;
+}
+
+bool ebpf_rss_load(struct EBPFRSSContext *ctx)
+{
+    return false;
+}
+
+bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
+                      uint16_t *indirections_table, uint8_t *toeplitz_key)
+{
+    return false;
+}
+
+void ebpf_rss_unload(struct EBPFRSSContext *ctx)
+{
+
+}
diff --git a/ebpf/ebpf_rss.c b/ebpf/ebpf_rss.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/ebpf/ebpf_rss.c
@@ -XXX,XX +XXX,XX @@
+/*
+ * eBPF RSS loader
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko <andrew@daynix.com>
+ *  Yuri Benditovich <yuri.benditovich@daynix.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/error-report.h"
+
+#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
+
+#include "hw/virtio/virtio-net.h" /* VIRTIO_NET_RSS_MAX_TABLE_LEN */
+
+#include "ebpf/ebpf_rss.h"
+#include "ebpf/rss.bpf.skeleton.h"
+#include "trace.h"
+
+void ebpf_rss_init(struct EBPFRSSContext *ctx)
+{
+    if (ctx != NULL) {
+        ctx->obj = NULL;
+    }
+}
+
+bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx)
+{
+    return ctx != NULL && ctx->obj != NULL;
+}
+
+bool ebpf_rss_load(struct EBPFRSSContext *ctx)
+{
+    struct rss_bpf *rss_bpf_ctx;
+
+    if (ctx == NULL) {
+        return false;
+    }
+
+    rss_bpf_ctx = rss_bpf__open();
+    if (rss_bpf_ctx == NULL) {
+        trace_ebpf_error("eBPF RSS", "can not open eBPF RSS object");
+        goto error;
+    }
+
+    bpf_program__set_socket_filter(rss_bpf_ctx->progs.tun_rss_steering_prog);
+
+    if (rss_bpf__load(rss_bpf_ctx)) {
+        trace_ebpf_error("eBPF RSS", "can not load RSS program");
+        goto error;
+    }
+
+    ctx->obj = rss_bpf_ctx;
+    ctx->program_fd = bpf_program__fd(
+            rss_bpf_ctx->progs.tun_rss_steering_prog);
+    ctx->map_configuration = bpf_map__fd(
+            rss_bpf_ctx->maps.tap_rss_map_configurations);
+    ctx->map_indirections_table = bpf_map__fd(
+            rss_bpf_ctx->maps.tap_rss_map_indirection_table);
+    ctx->map_toeplitz_key = bpf_map__fd(
+            rss_bpf_ctx->maps.tap_rss_map_toeplitz_key);
+
+    return true;
+error:
+    rss_bpf__destroy(rss_bpf_ctx);
+    ctx->obj = NULL;
+
+    return false;
+}
+
+static bool ebpf_rss_set_config(struct EBPFRSSContext *ctx,
+                                struct EBPFRSSConfig *config)
+{
+    uint32_t map_key = 0;
+
+    if (!ebpf_rss_is_loaded(ctx)) {
+        return false;
+    }
+    if (bpf_map_update_elem(ctx->map_configuration,
+                            &map_key, config, 0) < 0) {
+        return false;
+    }
+    return true;
+}
+
+static bool ebpf_rss_set_indirections_table(struct EBPFRSSContext *ctx,
+                                            uint16_t *indirections_table,
+                                            size_t len)
+{
+    uint32_t i = 0;
+
+    if (!ebpf_rss_is_loaded(ctx) || indirections_table == NULL ||
+       len > VIRTIO_NET_RSS_MAX_TABLE_LEN) {
+        return false;
+    }
+
+    for (; i < len; ++i) {
+        if (bpf_map_update_elem(ctx->map_indirections_table, &i,
+                                indirections_table + i, 0) < 0) {
+            return false;
+        }
+    }
+    return true;
+}
+
+static bool ebpf_rss_set_toepliz_key(struct EBPFRSSContext *ctx,
+                                     uint8_t *toeplitz_key)
+{
+    uint32_t map_key = 0;
+
+    /* prepare toeplitz key */
+    uint8_t toe[VIRTIO_NET_RSS_MAX_KEY_SIZE] = {};
+
+    if (!ebpf_rss_is_loaded(ctx) || toeplitz_key == NULL) {
+        return false;
+    }
+    memcpy(toe, toeplitz_key, VIRTIO_NET_RSS_MAX_KEY_SIZE);
+    *(uint32_t *)toe = ntohl(*(uint32_t *)toe);
+
+    if (bpf_map_update_elem(ctx->map_toeplitz_key, &map_key, toe,
+                            0) < 0) {
+        return false;
+    }
+    return true;
+}
+
+bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
+                      uint16_t *indirections_table, uint8_t *toeplitz_key)
+{
+    if (!ebpf_rss_is_loaded(ctx) || config == NULL ||
+        indirections_table == NULL || toeplitz_key == NULL) {
+        return false;
+    }
+
+    if (!ebpf_rss_set_config(ctx, config)) {
+        return false;
+    }
+
+    if (!ebpf_rss_set_indirections_table(ctx, indirections_table,
+                                      config->indirections_len)) {
+        return false;
+    }
+
+    if (!ebpf_rss_set_toepliz_key(ctx, toeplitz_key)) {
+        return false;
+    }
+
+    return true;
+}
+
+void ebpf_rss_unload(struct EBPFRSSContext *ctx)
+{
+    if (!ebpf_rss_is_loaded(ctx)) {
+        return;
+    }
+
+    rss_bpf__destroy(ctx->obj);
+    ctx->obj = NULL;
+}
diff --git a/ebpf/ebpf_rss.h b/ebpf/ebpf_rss.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/ebpf/ebpf_rss.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * eBPF RSS header
+ *
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ *  Andrew Melnychenko <andrew@daynix.com>
+ *  Yuri Benditovich <yuri.benditovich@daynix.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_EBPF_RSS_H
+#define QEMU_EBPF_RSS_H
+
+struct EBPFRSSContext {
+    void *obj;
+    int program_fd;
+    int map_configuration;
+    int map_toeplitz_key;
+    int map_indirections_table;
+};
+
+struct EBPFRSSConfig {
+    uint8_t redirect;
+    uint8_t populate_hash;
+    uint32_t hash_types;
+    uint16_t indirections_len;
+    uint16_t default_queue;
+} __attribute__((packed));
+
+void ebpf_rss_init(struct EBPFRSSContext *ctx);
+
+bool ebpf_rss_is_loaded(struct EBPFRSSContext *ctx);
+
+bool ebpf_rss_load(struct EBPFRSSContext *ctx);
+
+bool ebpf_rss_set_all(struct EBPFRSSContext *ctx, struct EBPFRSSConfig *config,
+                      uint16_t *indirections_table, uint8_t *toeplitz_key);
+
+void ebpf_rss_unload(struct EBPFRSSContext *ctx);
+
+#endif /* QEMU_EBPF_RSS_H */
diff --git a/ebpf/meson.build b/ebpf/meson.build
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/ebpf/meson.build
@@ -0,0 +1 @@
+common_ss.add(when: libbpf, if_true: files('ebpf_rss.c'), if_false: files('ebpf_rss-stub.c'))
diff --git a/ebpf/rss.bpf.skeleton.h b/ebpf/rss.bpf.skeleton.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/ebpf/rss.bpf.skeleton.h
@@ -XXX,XX +XXX,XX @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+
+/* THIS FILE IS AUTOGENERATED! */
+#ifndef __RSS_BPF_SKEL_H__
+#define __RSS_BPF_SKEL_H__
+
+#include <stdlib.h>
+#include <bpf/libbpf.h>
+
+struct rss_bpf {
+	struct bpf_object_skeleton *skeleton;
+	struct bpf_object *obj;
+	struct {
+		struct bpf_map *tap_rss_map_configurations;
+		struct bpf_map *tap_rss_map_indirection_table;
+		struct bpf_map *tap_rss_map_toeplitz_key;
+	} maps;
+	struct {
+		struct bpf_program *tun_rss_steering_prog;
+	} progs;
+	struct {
+		struct bpf_link *tun_rss_steering_prog;
+	} links;
+};
+
+static void
+rss_bpf__destroy(struct rss_bpf *obj)
+{
+	if (!obj)
+		return;
+	if (obj->skeleton)
+		bpf_object__destroy_skeleton(obj->skeleton);
+	free(obj);
+}
+
+static inline int
+rss_bpf__create_skeleton(struct rss_bpf *obj);
+
+static inline struct rss_bpf *
+rss_bpf__open_opts(const struct bpf_object_open_opts *opts)
+{
+	struct rss_bpf *obj;
+
+	obj = (struct rss_bpf *)calloc(1, sizeof(*obj));
+	if (!obj)
+		return NULL;
+	if (rss_bpf__create_skeleton(obj))
+		goto err;
+	if (bpf_object__open_skeleton(obj->skeleton, opts))
+		goto err;
+
+	return obj;
+err:
+	rss_bpf__destroy(obj);
+	return NULL;
+}
+
+static inline struct rss_bpf *
+rss_bpf__open(void)
+{
+	return rss_bpf__open_opts(NULL);
+}
+
+static inline int
+rss_bpf__load(struct rss_bpf *obj)
+{
+	return bpf_object__load_skeleton(obj->skeleton);
+}
+
+static inline struct rss_bpf *
+rss_bpf__open_and_load(void)
+{
+	struct rss_bpf *obj;
+
+	obj = rss_bpf__open();
+	if (!obj)
+		return NULL;
+	if (rss_bpf__load(obj)) {
+		rss_bpf__destroy(obj);
+		return NULL;
+	}
+	return obj;
+}
+
+static inline int
+rss_bpf__attach(struct rss_bpf *obj)
+{
+	return bpf_object__attach_skeleton(obj->skeleton);
+}
+
+static inline void
+rss_bpf__detach(struct rss_bpf *obj)
+{
+	return bpf_object__detach_skeleton(obj->skeleton);
+}
+
+static inline int
+rss_bpf__create_skeleton(struct rss_bpf *obj)
+{
+	struct bpf_object_skeleton *s;
+
+	s = (struct bpf_object_skeleton *)calloc(1, sizeof(*s));
+	if (!s)
+		return -1;
+	obj->skeleton = s;
+
+	s->sz = sizeof(*s);
+	s->name = "rss_bpf";
+	s->obj = &obj->obj;
+
+	/* maps */
+	s->map_cnt = 3;
+	s->map_skel_sz = sizeof(*s->maps);
+	s->maps = (struct bpf_map_skeleton *)calloc(s->map_cnt, s->map_skel_sz);
+	if (!s->maps)
+		goto err;
+
+	s->maps[0].name = "tap_rss_map_configurations";
+	s->maps[0].map = &obj->maps.tap_rss_map_configurations;
+
+	s->maps[1].name = "tap_rss_map_indirection_table";
+	s->maps[1].map = &obj->maps.tap_rss_map_indirection_table;
+
+	s->maps[2].name = "tap_rss_map_toeplitz_key";
+	s->maps[2].map = &obj->maps.tap_rss_map_toeplitz_key;
+
+	/* programs */
+	s->prog_cnt = 1;
+	s->prog_skel_sz = sizeof(*s->progs);
+	s->progs = (struct bpf_prog_skeleton *)calloc(s->prog_cnt, s->prog_skel_sz);
+	if (!s->progs)
+		goto err;
+
+	s->progs[0].name = "tun_rss_steering_prog";
+	s->progs[0].prog = &obj->progs.tun_rss_steering_prog;
+	s->progs[0].link = &obj->links.tun_rss_steering_prog;
+
+	s->data_sz = 8088;
+	s->data = (void *)"\
+\x7f\x45\x4c\x46\x02\x01\x01\0\0\0\0\0\0\0\0\0\x01\0\xf7\0\x01\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\x18\x1d\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\x40\0\x0a\0\
+\x01\0\xbf\x18\0\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\x4c\xff\0\0\0\0\xbf\xa7\
+\0\0\0\0\0\0\x07\x07\0\0\x4c\xff\xff\xff\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x06\0\0\0\0\0\0\x18\x01\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\xbf\x72\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\xbf\x07\0\0\0\0\0\0\
+\x18\0\0\0\xff\xff\xff\xff\0\0\0\0\0\0\0\0\x15\x06\x66\x02\0\0\0\0\xbf\x79\0\0\
+\0\0\0\0\x15\x09\x64\x02\0\0\0\0\x71\x61\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\
+\0\x5d\x02\0\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xc0\xff\0\0\0\0\x7b\x1a\xb8\xff\
+\0\0\0\0\x7b\x1a\xb0\xff\0\0\0\0\x7b\x1a\xa8\xff\0\0\0\0\x7b\x1a\xa0\xff\0\0\0\
+\0\x63\x1a\x98\xff\0\0\0\0\x7b\x1a\x90\xff\0\0\0\0\x7b\x1a\x88\xff\0\0\0\0\x7b\
+\x1a\x80\xff\0\0\0\0\x7b\x1a\x78\xff\0\0\0\0\x7b\x1a\x70\xff\0\0\0\0\x7b\x1a\
+\x68\xff\0\0\0\0\x7b\x1a\x60\xff\0\0\0\0\x7b\x1a\x58\xff\0\0\0\0\x7b\x1a\x50\
+\xff\0\0\0\0\x15\x08\x4c\x02\0\0\0\0\x6b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\
+\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x81\0\0\0\0\0\0\xb7\x02\0\0\x0c\0\0\0\xb7\
+\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\
+\x77\0\0\0\x20\0\0\0\x55\0\x11\0\0\0\0\0\xb7\x02\0\0\x10\0\0\0\x69\xa1\xd0\xff\
+\0\0\0\0\xbf\x13\0\0\0\0\0\0\xdc\x03\0\0\x10\0\0\0\x15\x03\x02\0\0\x81\0\0\x55\
+\x03\x0c\0\xa8\x88\0\0\xb7\x02\0\0\x14\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\
+\xd0\xff\xff\xff\xbf\x81\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\0\0\0\0\
+\x85\0\0\0\x44\0\0\0\x69\xa1\xd0\xff\0\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\
+\0\0\0\x15\0\x01\0\0\0\0\0\x05\0\x2f\x02\0\0\0\0\x15\x01\x2e\x02\0\0\0\0\x7b\
+\x9a\x30\xff\0\0\0\0\x15\x01\x57\0\x86\xdd\0\0\x55\x01\x3b\0\x08\0\0\0\x7b\x7a\
+\x20\xff\0\0\0\0\xb7\x07\0\0\x01\0\0\0\x73\x7a\x50\xff\0\0\0\0\xb7\x01\0\0\0\0\
+\0\0\x63\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\xff\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\
+\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x81\0\0\0\0\0\0\xb7\x02\0\
+\0\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\
+\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x1a\x02\0\0\0\0\x69\xa1\xd6\xff\0\0\
+\0\0\x55\x01\x01\0\0\0\0\0\xb7\x07\0\0\0\0\0\0\x61\xa1\xdc\xff\0\0\0\0\x63\x1a\
+\x5c\xff\0\0\0\0\x61\xa1\xe0\xff\0\0\0\0\x63\x1a\x60\xff\0\0\0\0\x73\x7a\x56\
+\xff\0\0\0\0\x71\xa9\xd9\xff\0\0\0\0\x71\xa1\xd0\xff\0\0\0\0\x67\x01\0\0\x02\0\
+\0\0\x57\x01\0\0\x3c\0\0\0\x7b\x1a\x40\xff\0\0\0\0\x79\xa7\x20\xff\0\0\0\0\xbf\
+\x91\0\0\0\0\0\0\x57\x01\0\0\xff\0\0\0\x15\x01\x19\0\0\0\0\0\x71\xa1\x56\xff\0\
+\0\0\0\x55\x01\x17\0\0\0\0\0\x57\x09\0\0\xff\0\0\0\x15\x09\x7a\x01\x11\0\0\0\
+\x55\x09\x14\0\x06\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\x1a\x53\xff\0\0\0\0\xb7\x01\
+\0\0\0\0\0\0\x63\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\xff\0\0\0\0\x7b\x1a\xd0\xff\0\
+\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\xff\xff\xbf\x81\0\0\0\0\0\0\x79\
+\xa2\x40\xff\0\0\0\0\xb7\x04\0\0\x14\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\
+\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\xf4\x01\0\0\0\0\x69\xa1\
+\xd0\xff\0\0\0\0\x6b\x1a\x58\xff\0\0\0\0\x69\xa1\xd2\xff\0\0\0\0\x6b\x1a\x5a\
+\xff\0\0\0\0\x71\xa1\x50\xff\0\0\0\0\x15\x01\xd4\0\0\0\0\0\x71\x62\x03\0\0\0\0\
+\0\x67\x02\0\0\x08\0\0\0\x71\x61\x02\0\0\0\0\0\x4f\x12\0\0\0\0\0\0\x71\x63\x04\
+\0\0\0\0\0\x71\x61\x05\0\0\0\0\0\x67\x01\0\0\x08\0\0\0\x4f\x31\0\0\0\0\0\0\x67\
+\x01\0\0\x10\0\0\0\x4f\x21\0\0\0\0\0\0\x71\xa2\x53\xff\0\0\0\0\x79\xa0\x30\xff\
+\0\0\0\0\x15\x02\x06\x01\0\0\0\0\xbf\x12\0\0\0\0\0\0\x57\x02\0\0\x02\0\0\0\x15\
+\x02\x03\x01\0\0\0\0\x61\xa1\x5c\xff\0\0\0\0\x63\x1a\xa0\xff\0\0\0\0\x61\xa1\
+\x60\xff\0\0\0\0\x63\x1a\xa4\xff\0\0\0\0\x69\xa1\x58\xff\0\0\0\0\x6b\x1a\xa8\
+\xff\0\0\0\0\x69\xa1\x5a\xff\0\0\0\0\x6b\x1a\xaa\xff\0\0\0\0\x05\0\x65\x01\0\0\
+\0\0\xb7\x01\0\0\x01\0\0\0\x73\x1a\x51\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x7b\x1a\
+\xf0\xff\0\0\0\0\x7b\x1a\xe8\xff\0\0\0\0\x7b\x1a\xe0\xff\0\0\0\0\x7b\x1a\xd8\
+\xff\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xd0\xff\
+\xff\xff\xb7\x01\0\0\x28\0\0\0\x7b\x1a\x40\xff\0\0\0\0\xbf\x81\0\0\0\0\0\0\xb7\
+\x02\0\0\0\0\0\0\xb7\x04\0\0\x28\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\
+\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x55\0\x10\x01\0\0\0\0\x79\xa1\xe0\
+\xff\0\0\0\0\x63\x1a\x64\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x68\xff\0\0\
+\0\0\x79\xa1\xd8\xff\0\0\0\0\x63\x1a\x5c\xff\0\0\0\0\x77\x01\0\0\x20\0\0\0\x63\
+\x1a\x60\xff\0\0\0\0\x79\xa1\xe8\xff\0\0\0\0\x63\x1a\x6c\xff\0\0\0\0\x77\x01\0\
+\0\x20\0\0\0\x63\x1a\x70\xff\0\0\0\0\x79\xa1\xf0\xff\0\0\0\0\x63\x1a\x74\xff\0\
+\0\0\0\x77\x01\0\0\x20\0\0\0\x63\x1a\x78\xff\0\0\0\0\x71\xa9\xd6\xff\0\0\0\0\
+\x25\x09\xff\0\x3c\0\0\0\xb7\x01\0\0\x01\0\0\0\x6f\x91\0\0\0\0\0\0\x18\x02\0\0\
+\x01\0\0\0\0\0\0\0\0\x18\0\x1c\x5f\x21\0\0\0\0\0\0\x55\x01\x01\0\0\0\0\0\x05\0\
+\xf8\0\0\0\0\0\xb7\x01\0\0\0\0\0\0\x6b\x1a\xfe\xff\0\0\0\0\xb7\x01\0\0\x28\0\0\
+\0\x7b\x1a\x40\xff\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\x01\0\0\x8c\xff\xff\xff\x7b\
+\x1a\x18\xff\0\0\0\0\xbf\xa1\0\0\0\0\0\0\x07\x01\0\0\x7c\xff\xff\xff\x7b\x1a\
+\x10\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x7b\x1a\x28\xff\0\0\0\0\x7b\x7a\x20\xff\0\
+\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xfe\xff\xff\xff\xbf\x81\0\0\0\0\0\0\x79\
+\xa2\x40\xff\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\
+\0\0\0\x67\0\0\0\x20\0\0\0\x77\0\0\0\x20\0\0\0\x15\0\x01\0\0\0\0\0\x05\0\x90\
+\x01\0\0\0\0\xbf\x91\0\0\0\0\0\0\x15\x01\x23\0\x3c\0\0\0\x15\x01\x59\0\x2c\0\0\
+\0\x55\x01\x5a\0\x2b\0\0\0\xb7\x01\0\0\0\0\0\0\x63\x1a\xf8\xff\0\0\0\0\xbf\xa3\
+\0\0\0\0\0\0\x07\x03\0\0\xf8\xff\xff\xff\xbf\x81\0\0\0\0\0\0\x79\xa2\x40\xff\0\
+\0\0\0\xb7\x04\0\0\x04\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\xbf\x01\
+\0\0\0\0\0\0\x67\x01\0\0\x20\0\0\0\x77\x01\0\0\x20\0\0\0\x55\x01\x03\x01\0\0\0\
+\0\x71\xa1\xfa\xff\0\0\0\0\x55\x01\x4b\0\x02\0\0\0\x71\xa1\xf9\xff\0\0\0\0\x55\
+\x01\x49\0\x02\0\0\0\x71\xa1\xfb\xff\0\0\0\0\x55\x01\x47\0\x01\0\0\0\x79\xa2\
+\x40\xff\0\0\0\0\x07\x02\0\0\x08\0\0\0\xbf\x81\0\0\0\0\0\0\x79\xa3\x18\xff\0\0\
+\0\0\xb7\x04\0\0\x10\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\xbf\x01\0\
+\0\0\0\0\0\x67\x01\0\0\x20\0\0\0\x77\x01\0\0\x20\0\0\0\x55\x01\xf2\0\0\0\0\0\
+\xb7\x01\0\0\x01\0\0\0\x73\x1a\x55\xff\0\0\0\0\x05\0\x39\0\0\0\0\0\xb7\x01\0\0\
+\0\0\0\0\x6b\x1a\xf8\xff\0\0\0\0\xb7\x09\0\0\x02\0\0\0\xb7\x07\0\0\x1e\0\0\0\
+\x05\0\x0e\0\0\0\0\0\x79\xa2\x38\xff\0\0\0\0\x0f\x29\0\0\0\0\0\0\xbf\x92\0\0\0\
+\0\0\0\x07\x02\0\0\x01\0\0\0\x71\xa3\xff\xff\0\0\0\0\x67\x03\0\0\x03\0\0\0\x2d\
+\x23\x02\0\0\0\0\0\x79\xa7\x20\xff\0\0\0\0\x05\0\x2b\0\0\0\0\0\x07\x07\0\0\xff\
+\xff\xff\xff\xbf\x72\0\0\0\0\0\0\x67\x02\0\0\x20\0\0\0\x77\x02\0\0\x20\0\0\0\
+\x15\x02\xf9\xff\0\0\0\0\x7b\x9a\x38\xff\0\0\0\0\x79\xa1\x40\xff\0\0\0\0\x0f\
+\x19\0\0\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\xf8\xff\xff\xff\xbf\x81\0\0\0\
+\0\0\0\xbf\x92\0\0\0\0\0\0\xb7\x04\0\0\x02\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\
+\0\x44\0\0\0\xbf\x01\0\0\0\0\0\0\x67\x01\0\0\x20\0\0\0\x77\x01\0\0\x20\0\0\0\
+\x55\x01\x94\0\0\0\0\0\x71\xa2\xf8\xff\0\0\0\0\x55\x02\x0f\0\xc9\0\0\0\x07\x09\
+\0\0\x02\0\0\0\xbf\x81\0\0\0\0\0\0\xbf\x92\0\0\0\0\0\0\x79\xa3\x10\xff\0\0\0\0\
+\xb7\x04\0\0\x10\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\xbf\x01\0\0\0\
+\0\0\0\x67\x01\0\0\x20\0\0\0\x77\x01\0\0\x20\0\0\0\x55\x01\x87\0\0\0\0\0\xb7\
+\x01\0\0\x01\0\0\0\x73\x1a\x54\xff\0\0\0\0\x79\xa7\x20\xff\0\0\0\0\x05\0\x07\0\
+\0\0\0\0\xb7\x09\0\0\x01\0\0\0\x15\x02\xd1\xff\0\0\0\0\x71\xa9\xf9\xff\0\0\0\0\
+\x07\x09\0\0\x02\0\0\0\x05\0\xce\xff\0\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\x1a\x56\
+\xff\0\0\0\0\x71\xa1\xff\xff\0\0\0\0\x67\x01\0\0\x03\0\0\0\x79\xa2\x40\xff\0\0\
+\0\0\x0f\x12\0\0\0\0\0\0\x07\x02\0\0\x08\0\0\0\x7b\x2a\x40\xff\0\0\0\0\x71\xa9\
+\xfe\xff\0\0\0\0\x25\x09\x0e\0\x3c\0\0\0\xb7\x01\0\0\x01\0\0\0\x6f\x91\0\0\0\0\
+\0\0\x18\x02\0\0\x01\0\0\0\0\0\0\0\0\x18\0\x1c\x5f\x21\0\0\0\0\0\0\x55\x01\x01\
+\0\0\0\0\0\x05\0\x07\0\0\0\0\0\x79\xa1\x28\xff\0\0\0\0\x07\x01\0\0\x01\0\0\0\
+\x7b\x1a\x28\xff\0\0\0\0\x67\x01\0\0\x20\0\0\0\x77\x01\0\0\x20\0\0\0\x55\x01\
+\x82\xff\x0b\0\0\0\x05\0\x10\xff\0\0\0\0\x15\x09\xf8\xff\x87\0\0\0\x05\0\xfd\
+\xff\0\0\0\0\x71\xa1\x51\xff\0\0\0\0\x79\xa0\x30\xff\0\0\0\0\x15\x01\x17\x01\0\
+\0\0\0\x71\x62\x03\0\0\0\0\0\x67\x02\0\0\x08\0\0\0\x71\x61\x02\0\0\0\0\0\x4f\
+\x12\0\0\0\0\0\0\x71\x63\x04\0\0\0\0\0\x71\x61\x05\0\0\0\0\0\x67\x01\0\0\x08\0\
+\0\0\x4f\x31\0\0\0\0\0\0\x67\x01\0\0\x10\0\0\0\x4f\x21\0\0\0\0\0\0\x71\xa2\x53\
+\xff\0\0\0\0\x15\x02\x3d\0\0\0\0\0\xbf\x12\0\0\0\0\0\0\x57\x02\0\0\x10\0\0\0\
+\x15\x02\x3a\0\0\0\0\0\xbf\xa2\0\0\0\0\0\0\x07\x02\0\0\x5c\xff\xff\xff\x71\xa4\
+\x54\xff\0\0\0\0\xbf\x23\0\0\0\0\0\0\x15\x04\x02\0\0\0\0\0\xbf\xa3\0\0\0\0\0\0\
+\x07\x03\0\0\x7c\xff\xff\xff\x67\x01\0\0\x38\0\0\0\xc7\x01\0\0\x38\0\0\0\x65\
+\x01\x01\0\xff\xff\xff\xff\xbf\x32\0\0\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\
+\x6c\xff\xff\xff\x71\xa5\x55\xff\0\0\0\0\xbf\x34\0\0\0\0\0\0\x15\x05\x02\0\0\0\
+\0\0\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\x8c\xff\xff\xff\x65\x01\x01\0\xff\xff\xff\
+\xff\xbf\x43\0\0\0\0\0\0\x61\x21\x04\0\0\0\0\0\x67\x01\0\0\x20\0\0\0\x61\x24\0\
+\0\0\0\0\0\x4f\x41\0\0\0\0\0\0\x7b\x1a\xa0\xff\0\0\0\0\x61\x21\x08\0\0\0\0\0\
+\x61\x22\x0c\0\0\0\0\0\x67\x02\0\0\x20\0\0\0\x4f\x12\0\0\0\0\0\0\x7b\x2a\xa8\
+\xff\0\0\0\0\x61\x31\0\0\0\0\0\0\x61\x32\x04\0\0\0\0\0\x61\x34\x08\0\0\0\0\0\
+\x61\x33\x0c\0\0\0\0\0\x69\xa5\x5a\xff\0\0\0\0\x6b\x5a\xc2\xff\0\0\0\0\x69\xa5\
+\x58\xff\0\0\0\0\x6b\x5a\xc0\xff\0\0\0\0\x67\x03\0\0\x20\0\0\0\x4f\x43\0\0\0\0\
+\0\0\x7b\x3a\xb8\xff\0\0\0\0\x67\x02\0\0\x20\0\0\0\x4f\x12\0\0\0\0\0\0\x7b\x2a\
+\xb0\xff\0\0\0\0\x05\0\x6b\0\0\0\0\0\x71\xa2\x52\xff\0\0\0\0\x15\x02\x04\0\0\0\
+\0\0\xbf\x12\0\0\0\0\0\0\x57\x02\0\0\x04\0\0\0\x15\x02\x01\0\0\0\0\0\x05\0\xf7\
+\xfe\0\0\0\0\x57\x01\0\0\x01\0\0\0\x15\x01\xd3\0\0\0\0\0\x61\xa1\x5c\xff\0\0\0\
+\0\x63\x1a\xa0\xff\0\0\0\0\x61\xa1\x60\xff\0\0\0\0\x63\x1a\xa4\xff\0\0\0\0\x05\
+\0\x5e\0\0\0\0\0\x71\xa2\x52\xff\0\0\0\0\x15\x02\x1e\0\0\0\0\0\xbf\x12\0\0\0\0\
+\0\0\x57\x02\0\0\x20\0\0\0\x15\x02\x1b\0\0\0\0\0\xbf\xa2\0\0\0\0\0\0\x07\x02\0\
+\0\x5c\xff\xff\xff\x71\xa4\x54\xff\0\0\0\0\xbf\x23\0\0\0\0\0\0\x15\x04\x02\0\0\
+\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\x7c\xff\xff\xff\x57\x01\0\0\0\x01\0\0\
+\x15\x01\x01\0\0\0\0\0\xbf\x32\0\0\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\x03\0\0\x6c\
+\xff\xff\xff\x71\xa5\x55\xff\0\0\0\0\xbf\x34\0\0\0\0\0\0\x15\x05\x02\0\0\0\0\0\
+\xbf\xa4\0\0\0\0\0\0\x07\x04\0\0\x8c\xff\xff\xff\x15\x01\xc3\xff\0\0\0\0\x05\0\
+\xc1\xff\0\0\0\0\xb7\x09\0\0\x3c\0\0\0\x79\xa7\x20\xff\0\0\0\0\x67\0\0\0\x20\0\
+\0\0\x77\0\0\0\x20\0\0\0\x15\0\xa5\xfe\0\0\0\0\x05\0\xb0\0\0\0\0\0\x15\x09\x07\
+\xff\x87\0\0\0\x05\0\xa2\xfe\0\0\0\0\xbf\x12\0\0\0\0\0\0\x57\x02\0\0\x08\0\0\0\
+\x15\x02\xab\0\0\0\0\0\xbf\xa2\0\0\0\0\0\0\x07\x02\0\0\x5c\xff\xff\xff\x71\xa4\
+\x54\xff\0\0\0\0\xbf\x23\0\0\0\0\0\0\x15\x04\x02\0\0\0\0\0\xbf\xa3\0\0\0\0\0\0\
+\x07\x03\0\0\x7c\xff\xff\xff\x57\x01\0\0\x40\0\0\0\x15\x01\x01\0\0\0\0\0\xbf\
+\x32\0\0\0\0\0\0\x61\x23\x04\0\0\0\0\0\x67\x03\0\0\x20\0\0\0\x61\x24\0\0\0\0\0\
+\0\x4f\x43\0\0\0\0\0\0\x7b\x3a\xa0\xff\0\0\0\0\x61\x23\x08\0\0\0\0\0\x61\x22\
+\x0c\0\0\0\0\0\x67\x02\0\0\x20\0\0\0\x4f\x32\0\0\0\0\0\0\x7b\x2a\xa8\xff\0\0\0\
+\0\x15\x01\x1c\0\0\0\0\0\x71\xa1\x55\xff\0\0\0\0\x15\x01\x1a\0\0\0\0\0\x61\xa1\
+\x98\xff\0\0\0\0\x67\x01\0\0\x20\0\0\0\x61\xa2\x94\xff\0\0\0\0\x4f\x21\0\0\0\0\
+\0\0\x7b\x1a\xb8\xff\0\0\0\0\x61\xa1\x90\xff\0\0\0\0\x67\x01\0\0\x20\0\0\0\x61\
+\xa2\x8c\xff\0\0\0\0\x05\0\x19\0\0\0\0\0\xb7\x01\0\0\x01\0\0\0\x73\x1a\x52\xff\
+\0\0\0\0\xb7\x01\0\0\0\0\0\0\x7b\x1a\xd0\xff\0\0\0\0\xbf\xa3\0\0\0\0\0\0\x07\
+\x03\0\0\xd0\xff\xff\xff\xbf\x81\0\0\0\0\0\0\x79\xa2\x40\xff\0\0\0\0\xb7\x04\0\
+\0\x08\0\0\0\xb7\x05\0\0\x01\0\0\0\x85\0\0\0\x44\0\0\0\x67\0\0\0\x20\0\0\0\x77\
+\0\0\0\x20\0\0\0\x55\0\x7d\0\0\0\0\0\x05\0\x88\xfe\0\0\0\0\xb7\x09\0\0\x2b\0\0\
+\0\x05\0\xc6\xff\0\0\0\0\x61\xa1\x78\xff\0\0\0\0\x67\x01\0\0\x20\0\0\0\x61\xa2\
+\x74\xff\0\0\0\0\x4f\x21\0\0\0\0\0\0\x7b\x1a\xb8\xff\0\0\0\0\x61\xa1\x70\xff\0\
+\0\0\0\x67\x01\0\0\x20\0\0\0\x61\xa2\x6c\xff\0\0\0\0\x4f\x21\0\0\0\0\0\0\x7b\
+\x1a\xb0\xff\0\0\0\0\xb7\x01\0\0\0\0\0\0\x07\x07\0\0\x04\0\0\0\x61\x03\0\0\0\0\
+\0\0\xb7\x05\0\0\0\0\0\0\x05\0\x4e\0\0\0\0\0\xaf\x52\0\0\0\0\0\0\xbf\x75\0\0\0\
+\0\0\0\x0f\x15\0\0\0\0\0\0\x71\x55\0\0\0\0\0\0\x67\x03\0\0\x01\0\0\0\xbf\x50\0\
+\0\0\0\0\0\x77\0\0\0\x07\0\0\0\x4f\x03\0\0\0\0\0\0\xbf\x40\0\0\0\0\0\0\x67\0\0\
+\0\x39\0\0\0\xc7\0\0\0\x3f\0\0\0\x5f\x30\0\0\0\0\0\0\xaf\x02\0\0\0\0\0\0\xbf\
+\x50\0\0\0\0\0\0\x77\0\0\0\x06\0\0\0\x57\0\0\0\x01\0\0\0\x67\x03\0\0\x01\0\0\0\
+\x4f\x03\0\0\0\0\0\0\xbf\x40\0\0\0\0\0\0\x67\0\0\0\x3a\0\0\0\xc7\0\0\0\x3f\0\0\
+\0\x5f\x30\0\0\0\0\0\0\xaf\x02\0\0\0\0\0\0\x67\x03\0\0\x01\0\0\0\xbf\x50\0\0\0\
+\0\0\0\x77\0\0\0\x05\0\0\0\x57\0\0\0\x01\0\0\0\x4f\x03\0\0\0\0\0\0\xbf\x40\0\0\
+\0\0\0\0\x67\0\0\0\x3b\0\0\0\xc7\0\0\0\x3f\0\0\0\x5f\x30\0\0\0\0\0\0\xaf\x02\0\
+\0\0\0\0\0\x67\x03\0\0\x01\0\0\0\xbf\x50\0\0\0\0\0\0\x77\0\0\0\x04\0\0\0\x57\0\
+\0\0\x01\0\0\0\x4f\x03\0\0\0\0\0\0\xbf\x40\0\0\0\0\0\0\x67\0\0\0\x3c\0\0\0\xc7\
+\0\0\0\x3f\0\0\0\x5f\x30\0\0\0\0\0\0\xaf\x02\0\0\0\0\0\0\xbf\x50\0\0\0\0\0\0\
+\x77\0\0\0\x03\0\0\0\x57\0\0\0\x01\0\0\0\x67\x03\0\0\x01\0\0\0\x4f\x03\0\0\0\0\
+\0\0\xbf\x40\0\0\0\0\0\0\x67\0\0\0\x3d\0\0\0\xc7\0\0\0\x3f\0\0\0\x5f\x30\0\0\0\
+\0\0\0\xaf\x02\0\0\0\0\0\0\xbf\x50\0\0\0\0\0\0\x77\0\0\0\x02\0\0\0\x57\0\0\0\
+\x01\0\0\0\x67\x03\0\0\x01\0\0\0\x4f\x03\0\0\0\0\0\0\xbf\x40\0\0\0\0\0\0\x67\0\
+\0\0\x3e\0\0\0\xc7\0\0\0\x3f\0\0\0\x5f\x30\0\0\0\0\0\0\xaf\x02\0\0\0\0\0\0\xbf\
+\x50\0\0\0\0\0\0\x77\0\0\0\x01\0\0\0\x57\0\0\0\x01\0\0\0\x67\x03\0\0\x01\0\0\0\
+\x4f\x03\0\0\0\0\0\0\x57\x04\0\0\x01\0\0\0\x87\x04\0\0\0\0\0\0\x5f\x34\0\0\0\0\
+\0\0\xaf\x42\0\0\0\0\0\0\x57\x05\0\0\x01\0\0\0\x67\x03\0\0\x01\0\0\0\x4f\x53\0\
+\0\0\0\0\0\x07\x01\0\0\x01\0\0\0\xbf\x25\0\0\0\0\0\0\x15\x01\x0b\0\x24\0\0\0\
+\xbf\xa2\0\0\0\0\0\0\x07\x02\0\0\xa0\xff\xff\xff\x0f\x12\0\0\0\0\0\0\x71\x24\0\
+\0\0\0\0\0\xbf\x40\0\0\0\0\0\0\x67\0\0\0\x38\0\0\0\xc7\0\0\0\x38\0\0\0\xb7\x02\
+\0\0\0\0\0\0\x65\0\xa9\xff\xff\xff\xff\xff\xbf\x32\0\0\0\0\0\0\x05\0\xa7\xff\0\
+\0\0\0\xbf\x21\0\0\0\0\0\0\x67\x01\0\0\x20\0\0\0\x77\x01\0\0\x20\0\0\0\x15\x01\
+\x0e\0\0\0\0\0\x71\x63\x06\0\0\0\0\0\x71\x64\x07\0\0\0\0\0\x67\x04\0\0\x08\0\0\
+\0\x4f\x34\0\0\0\0\0\0\x3f\x41\0\0\0\0\0\0\x2f\x41\0\0\0\0\0\0\x1f\x12\0\0\0\0\
+\0\0\x63\x2a\x50\xff\0\0\0\0\xbf\xa2\0\0\0\0\0\0\x07\x02\0\0\x50\xff\xff\xff\
+\x18\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x85\0\0\0\x01\0\0\0\x55\0\x05\0\0\0\0\0\
+\x71\x61\x08\0\0\0\0\0\x71\x60\x09\0\0\0\0\0\x67\0\0\0\x08\0\0\0\x4f\x10\0\0\0\
+\0\0\0\x95\0\0\0\0\0\0\0\x69\0\0\0\0\0\0\0\x05\0\xfd\xff\0\0\0\0\x02\0\0\0\x04\
+\0\0\0\x0a\0\0\0\x01\0\0\0\0\0\0\0\x02\0\0\0\x04\0\0\0\x28\0\0\0\x01\0\0\0\0\0\
+\0\0\x02\0\0\0\x04\0\0\0\x02\0\0\0\x80\0\0\0\0\0\0\0\x47\x50\x4c\x20\x76\x32\0\
+\0\0\0\0\0\x10\0\0\0\0\0\0\0\x01\x7a\x52\0\x08\x7c\x0b\x01\x0c\0\0\0\x18\0\0\0\
+\x18\0\0\0\0\0\0\0\0\0\0\0\xd8\x13\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\0\xa0\0\0\0\x04\0\xf1\xff\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\x60\x02\0\0\0\0\x03\0\x20\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x3f\x02\0\0\0\0\
+\x03\0\xd0\x0f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xed\x01\0\0\0\0\x03\0\x10\x10\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\xd4\x01\0\0\0\0\x03\0\x20\x10\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\xa3\x01\0\0\0\0\x03\0\xb8\x12\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x63\x01\0\0\0\0\
+\x03\0\x48\x10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x2a\x01\0\0\0\0\x03\0\x10\x13\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\xe1\0\0\0\0\0\x03\0\xa0\x13\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\x2e\x02\0\0\0\0\x03\0\x28\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x68\x02\0\0\0\0\x03\
+\0\xc0\x13\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x36\x02\0\0\0\0\x03\0\xc8\x13\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\x22\x01\0\0\0\0\x03\0\xe8\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\x02\x01\0\0\0\0\x03\0\x40\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xd9\0\0\0\0\0\x03\0\
+\xf8\x04\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x26\x02\0\0\0\0\x03\0\x20\x0e\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\xcc\x01\0\0\0\0\x03\0\x60\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x9b\
+\x01\0\0\0\0\x03\0\xc8\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x5b\x01\0\0\0\0\x03\0\
+\x20\x07\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x7c\x01\0\0\0\0\x03\0\x48\x08\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\x53\x01\0\0\0\0\x03\0\xb8\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x1a\
+\x01\0\0\0\0\x03\0\xe0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x84\x01\0\0\0\0\x03\0\
+\xb8\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x1e\x02\0\0\0\0\x03\0\xd8\x09\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\xc4\x01\0\0\0\0\x03\0\x70\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x93\
+\x01\0\0\0\0\x03\0\xa8\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x74\x01\0\0\0\0\x03\0\
+\xf0\x0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x4b\x01\0\0\0\0\x03\0\0\x0a\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\x12\x01\0\0\0\0\x03\0\x10\x0a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xfa\0\
+\0\0\0\0\x03\0\xc0\x0a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x58\x02\0\0\0\0\x03\0\x88\
+\x0a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x16\x02\0\0\0\0\x03\0\xb8\x0a\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\xe5\x01\0\0\0\0\x03\0\xc0\x0f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xbc\x01\
+\0\0\0\0\x03\0\0\x0e\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x8b\x01\0\0\0\0\x03\0\x18\x0e\
+\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xd1\0\0\0\0\0\x03\0\0\x04\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\x50\x02\0\0\0\0\x03\0\x20\x04\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x0e\x02\0\0\0\0\
+\x03\0\x48\x0f\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x6c\x01\0\0\0\0\x03\0\xb0\x04\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\x43\x01\0\0\0\0\x03\0\xc8\x0c\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\xc9\0\0\0\0\0\x03\0\xf8\x0c\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x06\x02\0\0\0\0\x03\
+\0\xd0\x0a\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x3b\x01\0\0\0\0\x03\0\x98\x0b\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\xf2\0\0\0\0\0\x03\0\xb8\x0b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x48\
+\x02\0\0\0\0\x03\0\xf0\x0b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xfe\x01\0\0\0\0\x03\0\
+\xf8\x0b\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xdd\x01\0\0\0\0\x03\0\0\x0c\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\xb4\x01\0\0\0\0\x03\0\x30\x0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x0a\
+\x01\0\0\0\0\x03\0\x90\x0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xc1\0\0\0\0\0\x03\0\xa8\
+\x0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xba\0\0\0\0\0\x03\0\xd0\x01\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\xf6\x01\0\0\0\0\x03\0\xe0\x0d\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xac\x01\0\
+\0\0\0\x03\0\x30\x0e\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x33\x01\0\0\0\0\x03\0\x80\x0e\
+\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xea\0\0\0\0\0\x03\0\x98\x0e\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\x03\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x6b\0\0\0\x11\0\x06\
+\0\0\0\0\0\0\0\0\0\x07\0\0\0\0\0\0\0\x25\0\0\0\x11\0\x05\0\0\0\0\0\0\0\0\0\x14\
+\0\0\0\0\0\0\0\x82\0\0\0\x11\0\x05\0\x28\0\0\0\0\0\0\0\x14\0\0\0\0\0\0\0\x01\0\
+\0\0\x11\0\x05\0\x14\0\0\0\0\0\0\0\x14\0\0\0\0\0\0\0\x40\0\0\0\x12\0\x03\0\0\0\
+\0\0\0\0\0\0\xd8\x13\0\0\0\0\0\0\x28\0\0\0\0\0\0\0\x01\0\0\0\x3a\0\0\0\x50\0\0\
+\0\0\0\0\0\x01\0\0\0\x3c\0\0\0\x80\x13\0\0\0\0\0\0\x01\0\0\0\x3b\0\0\0\x1c\0\0\
+\0\0\0\0\0\x01\0\0\0\x38\0\0\0\0\x74\x61\x70\x5f\x72\x73\x73\x5f\x6d\x61\x70\
+\x5f\x74\x6f\x65\x70\x6c\x69\x74\x7a\x5f\x6b\x65\x79\0\x2e\x74\x65\x78\x74\0\
+\x6d\x61\x70\x73\0\x74\x61\x70\x5f\x72\x73\x73\x5f\x6d\x61\x70\x5f\x63\x6f\x6e\
+\x66\x69\x67\x75\x72\x61\x74\x69\x6f\x6e\x73\0\x74\x75\x6e\x5f\x72\x73\x73\x5f\
+\x73\x74\x65\x65\x72\x69\x6e\x67\x5f\x70\x72\x6f\x67\0\x2e\x72\x65\x6c\x74\x75\
+\x6e\x5f\x72\x73\x73\x5f\x73\x74\x65\x65\x72\x69\x6e\x67\0\x5f\x6c\x69\x63\x65\
+\x6e\x73\x65\0\x2e\x72\x65\x6c\x2e\x65\x68\x5f\x66\x72\x61\x6d\x65\0\x74\x61\
+\x70\x5f\x72\x73\x73\x5f\x6d\x61\x70\x5f\x69\x6e\x64\x69\x72\x65\x63\x74\x69\
+\x6f\x6e\x5f\x74\x61\x62\x6c\x65\0\x72\x73\x73\x2e\x62\x70\x66\x2e\x63\0\x2e\
+\x73\x74\x72\x74\x61\x62\0\x2e\x73\x79\x6d\x74\x61\x62\0\x4c\x42\x42\x30\x5f\
+\x39\0\x4c\x42\x42\x30\x5f\x38\x39\0\x4c\x42\x42\x30\x5f\x36\x39\0\x4c\x42\x42\
+\x30\x5f\x35\x39\0\x4c\x42\x42\x30\x5f\x31\x39\0\x4c\x42\x42\x30\x5f\x31\x30\
+\x39\0\x4c\x42\x42\x30\x5f\x39\x38\0\x4c\x42\x42\x30\x5f\x37\x38\0\x4c\x42\x42\
+\x30\x5f\x34\x38\0\x4c\x42\x42\x30\x5f\x31\x38\0\x4c\x42\x42\x30\x5f\x38\x37\0\
+\x4c\x42\x42\x30\x5f\x34\x37\0\x4c\x42\x42\x30\x5f\x33\x37\0\x4c\x42\x42\x30\
+\x5f\x31\x37\0\x4c\x42\x42\x30\x5f\x31\x30\x37\0\x4c\x42\x42\x30\x5f\x39\x36\0\
+\x4c\x42\x42\x30\x5f\x37\x36\0\x4c\x42\x42\x30\x5f\x36\x36\0\x4c\x42\x42\x30\
+\x5f\x34\x36\0\x4c\x42\x42\x30\x5f\x33\x36\0\x4c\x42\x42\x30\x5f\x32\x36\0\x4c\
+\x42\x42\x30\x5f\x31\x30\x36\0\x4c\x42\x42\x30\x5f\x36\x35\0\x4c\x42\x42\x30\
+\x5f\x34\x35\0\x4c\x42\x42\x30\x5f\x33\x35\0\x4c\x42\x42\x30\x5f\x34\0\x4c\x42\
+\x42\x30\x5f\x35\x34\0\x4c\x42\x42\x30\x5f\x34\x34\0\x4c\x42\x42\x30\x5f\x32\
+\x34\0\x4c\x42\x42\x30\x5f\x31\x30\x34\0\x4c\x42\x42\x30\x5f\x39\x33\0\x4c\x42\
+\x42\x30\x5f\x38\x33\0\x4c\x42\x42\x30\x5f\x35\x33\0\x4c\x42\x42\x30\x5f\x34\
+\x33\0\x4c\x42\x42\x30\x5f\x32\x33\0\x4c\x42\x42\x30\x5f\x31\x30\x33\0\x4c\x42\
+\x42\x30\x5f\x38\x32\0\x4c\x42\x42\x30\x5f\x35\x32\0\x4c\x42\x42\x30\x5f\x31\
+\x30\x32\0\x4c\x42\x42\x30\x5f\x39\x31\0\x4c\x42\x42\x30\x5f\x38\x31\0\x4c\x42\
+\x42\x30\x5f\x37\x31\0\x4c\x42\x42\x30\x5f\x36\x31\0\x4c\x42\x42\x30\x5f\x35\
+\x31\0\x4c\x42\x42\x30\x5f\x34\x31\0\x4c\x42\x42\x30\x5f\x32\x31\0\x4c\x42\x42\
+\x30\x5f\x31\x31\0\x4c\x42\x42\x30\x5f\x31\x31\x31\0\x4c\x42\x42\x30\x5f\x31\
+\x30\x31\0\x4c\x42\x42\x30\x5f\x38\x30\0\x4c\x42\x42\x30\x5f\x36\x30\0\x4c\x42\
+\x42\x30\x5f\x35\x30\0\x4c\x42\x42\x30\x5f\x31\x30\0\x4c\x42\x42\x30\x5f\x31\
+\x31\x30\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xaa\
+\0\0\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xa0\x1a\0\0\0\0\0\0\x71\x02\0\
+\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x1a\0\0\0\x01\0\0\
+\0\x06\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\x04\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x5a\0\0\0\x01\0\0\0\x06\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\x40\0\0\0\0\0\0\0\xd8\x13\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x08\0\
+\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x56\0\0\0\x09\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\x60\x1a\0\0\0\0\0\0\x30\0\0\0\0\0\0\0\x09\0\0\0\x03\0\0\0\x08\0\0\0\0\0\0\0\
+\x10\0\0\0\0\0\0\0\x20\0\0\0\x01\0\0\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x18\
+\x14\0\0\0\0\0\0\x3c\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x04\0\0\0\0\0\0\0\0\0\0\0\0\
+\0\0\0\x6c\0\0\0\x01\0\0\0\x03\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x54\x14\0\0\0\0\0\
+\0\x07\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x78\0\0\
+\0\x01\0\0\0\x02\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x60\x14\0\0\0\0\0\0\x30\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\x08\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x74\0\0\0\x09\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\x90\x1a\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\x09\0\0\0\
+\x07\0\0\0\x08\0\0\0\0\0\0\0\x10\0\0\0\0\0\0\0\xb2\0\0\0\x02\0\0\0\0\0\0\0\0\0\
+\0\0\0\0\0\0\0\0\0\0\x90\x14\0\0\0\0\0\0\xd0\x05\0\0\0\0\0\0\x01\0\0\0\x39\0\0\
+\0\x08\0\0\0\0\0\0\0\x18\0\0\0\0\0\0\0";
+
+	return 0;
+err:
+	bpf_object__destroy_skeleton(s);
+	return -1;
+}
+
+#endif /* __RSS_BPF_SKEL_H__ */
diff --git a/ebpf/trace-events b/ebpf/trace-events
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/ebpf/trace-events
@@ -XXX,XX +XXX,XX @@
+# See docs/devel/tracing.txt for syntax documentation.
+
+# ebpf-rss.c
+ebpf_error(const char *s1, const char *s2) "error in %s: %s"
diff --git a/ebpf/trace.h b/ebpf/trace.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/ebpf/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-ebpf.h"
diff --git a/meson.build b/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/meson.build
+++ b/meson.build
@@ -XXX,XX +XXX,XX @@ if not get_option('fuse_lseek').disabled()
   endif
 endif
 
+# libbpf
+libbpf = dependency('libbpf', required: get_option('bpf'), method: 'pkg-config')
+if libbpf.found() and not cc.links('''
+   #include <bpf/libbpf.h>
+   int main(void)
+   {
+     bpf_object__destroy_skeleton(NULL);
+     return 0;
+   }''', dependencies: libbpf)
+  libbpf = not_found
+  if get_option('bpf').enabled()
+    error('libbpf skeleton test failed')
+  else
+    warning('libbpf skeleton test failed, disabling')
+  endif
+endif
+
 if get_option('cfi')
   cfi_flags=[]
   # Check for dependency on LTO
@@ -XXX,XX +XXX,XX @@ endif
 config_host_data.set('CONFIG_GTK', gtk.found())
 config_host_data.set('CONFIG_LIBATTR', have_old_libattr)
 config_host_data.set('CONFIG_LIBCAP_NG', libcap_ng.found())
+config_host_data.set('CONFIG_EBPF', libbpf.found())
 config_host_data.set('CONFIG_LIBISCSI', libiscsi.found())
 config_host_data.set('CONFIG_LIBNFS', libnfs.found())
 config_host_data.set('CONFIG_RBD', rbd.found())
@@ -XXX,XX +XXX,XX @@ if have_system
     'backends',
     'backends/tpm',
     'chardev',
+    'ebpf',
     'hw/9pfs',
     'hw/acpi',
     'hw/adc',
@@ -XXX,XX +XXX,XX @@ subdir('accel')
 subdir('plugins')
 subdir('bsd-user')
 subdir('linux-user')
+subdir('ebpf')
+
+common_ss.add(libbpf)
 
 bsd_user_ss.add(files('gdbstub.c'))
 specific_ss.add_all(when: 'CONFIG_BSD_USER', if_true: bsd_user_ss)
@@ -XXX,XX +XXX,XX @@ summary_info += {'RDMA support':      config_host.has_key('CONFIG_RDMA')}
 summary_info += {'PVRDMA support':    config_host.has_key('CONFIG_PVRDMA')}
 summary_info += {'fdt support':       fdt_opt == 'disabled' ? false : fdt_opt}
 summary_info += {'libcap-ng support': libcap_ng.found()}
+summary_info += {'bpf support': libbpf.found()}
 # TODO: add back protocol and server version
 summary_info += {'spice support':     config_host.has_key('CONFIG_SPICE')}
 summary_info += {'rbd support':       rbd.found()}
diff --git a/meson_options.txt b/meson_options.txt
index XXXXXXX..XXXXXXX 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -XXX,XX +XXX,XX @@ option('bzip2', type : 'feature', value : 'auto',
        description: 'bzip2 support for DMG images')
 option('cap_ng', type : 'feature', value : 'auto',
        description: 'cap_ng support')
+option('bpf', type : 'feature', value : 'auto',
+        description: 'eBPF support')
 option('cocoa', type : 'feature', value : 'auto',
        description: 'Cocoa user interface (macOS only)')
 option('curl', type : 'feature', value : 'auto',
-- 
2.7.4

From: Andrew Melnychenko <andrew@daynix.com>

When RSS is enabled, the device tries to load an eBPF program that
selects the RX virtqueue in the TUN device. If the eBPF program can be
loaded, RSS also works with vhost (works with kernel 5.8 and later).
Software RSS is used as a fallback with vhost=off when the eBPF program
can't be loaded, or when the guest requests hash population.

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/vhost_net.c             |   3 ++
 hw/net/virtio-net.c            | 116 +++++++++++++++++++++++++++++++++++++++--
 include/hw/virtio/virtio-net.h |   4 ++
 net/vhost-vdpa.c               |   2 +
 4 files changed, 122 insertions(+), 3 deletions(-)
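
Note for reviewers: the selection between eBPF steering, software RSS
and failure described in the commit message can be summarized by the
sketch below. It is only an illustration of the decision made in
virtio_net_handle_rss(); the enum and the function choose_rss_mode()
are hypothetical names that do not exist in this patch, and the three
booleans stand in for rss_data.populate_hash, the result of attaching
the eBPF program, and the presence of a vhost backend.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not part of the patch. */
enum rss_mode {
    RSS_MODE_EBPF,      /* steering program attached to the backend */
    RSS_MODE_SOFTWARE,  /* QEMU selects the queue in its receive path */
    RSS_MODE_ERROR      /* the RSS request is rejected */
};

/*
 * populate_hash: the guest asked for hash population
 * ebpf_attached: the eBPF program was configured and attached
 * vhost:         the peer is a vhost backend, so QEMU does not see
 *                the packets and cannot steer them in software
 */
static enum rss_mode choose_rss_mode(bool populate_hash,
                                     bool ebpf_attached,
                                     bool vhost)
{
    if (populate_hash) {
        /* Hash population is handled by software RSS only. */
        return RSS_MODE_SOFTWARE;
    }
    if (ebpf_attached) {
        return RSS_MODE_EBPF;
    }
    if (vhost) {
        /* No eBPF and no software path available. */
        return RSS_MODE_ERROR;
    }
    return RSS_MODE_SOFTWARE;
}
```

The post-load path differs slightly from this sketch: when the eBPF
program cannot be attached for a vhost backend, it only emits a
warning instead of failing.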

diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -XXX,XX +XXX,XX @@ static const int kernel_feature_bits[] = {
     VIRTIO_NET_F_MTU,
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_PACKED,
+    VIRTIO_NET_F_HASH_REPORT,
     VHOST_INVALID_FEATURE_BIT
 };
 
@@ -XXX,XX +XXX,XX @@ static const int user_feature_bits[] = {
     VIRTIO_NET_F_MTU,
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_PACKED,
+    VIRTIO_NET_F_RSS,
+    VIRTIO_NET_F_HASH_REPORT,
 
     /* This bit implies RARP isn't sent by QEMU out of band */
     VIRTIO_NET_F_GUEST_ANNOUNCE,
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -XXX,XX +XXX,XX @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
         return features;
     }
 
-    virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
-    virtio_clear_feature(&features, VIRTIO_NET_F_HASH_REPORT);
+    if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
+        virtio_clear_feature(&features, VIRTIO_NET_F_RSS);
+    }
     features = vhost_net_get_features(get_vhost_net(nc->peer), features);
     vdev->backend_features = features;
 
@@ -XXX,XX +XXX,XX @@ static int virtio_net_handle_announce(VirtIONet *n, uint8_t cmd,
     }
 }
 
+static void virtio_net_detach_epbf_rss(VirtIONet *n);
+
 static void virtio_net_disable_rss(VirtIONet *n)
 {
     if (n->rss_data.enabled) {
         trace_virtio_net_rss_disable();
     }
     n->rss_data.enabled = false;
+
+    virtio_net_detach_epbf_rss(n);
+}
+
+static bool virtio_net_attach_ebpf_to_backend(NICState *nic, int prog_fd)
+{
+    NetClientState *nc = qemu_get_peer(qemu_get_queue(nic), 0);
+    if (nc == NULL || nc->info->set_steering_ebpf == NULL) {
+        return false;
+    }
+
+    return nc->info->set_steering_ebpf(nc, prog_fd);
+}
+
+static void rss_data_to_rss_config(struct VirtioNetRssData *data,
+                                   struct EBPFRSSConfig *config)
+{
+    config->redirect = data->redirect;
+    config->populate_hash = data->populate_hash;
+    config->hash_types = data->hash_types;
+    config->indirections_len = data->indirections_len;
+    config->default_queue = data->default_queue;
+}
+
+static bool virtio_net_attach_epbf_rss(VirtIONet *n)
+{
+    struct EBPFRSSConfig config = {};
+
+    if (!ebpf_rss_is_loaded(&n->ebpf_rss)) {
+        return false;
+    }
+
+    rss_data_to_rss_config(&n->rss_data, &config);
+
+    if (!ebpf_rss_set_all(&n->ebpf_rss, &config,
+                          n->rss_data.indirections_table, n->rss_data.key)) {
+        return false;
+    }
+
+    if (!virtio_net_attach_ebpf_to_backend(n->nic, n->ebpf_rss.program_fd)) {
+        return false;
+    }
+
+    return true;
+}
+
+static void virtio_net_detach_epbf_rss(VirtIONet *n)
+{
+    virtio_net_attach_ebpf_to_backend(n->nic, -1);
+}
+
+static bool virtio_net_load_ebpf(VirtIONet *n)
+{
+    if (!virtio_net_attach_ebpf_to_backend(n->nic, -1)) {
+        /* backend doesn't support steering eBPF */
+        return false;
+    }
+
+    return ebpf_rss_load(&n->ebpf_rss);
+}
+
+static void virtio_net_unload_ebpf(VirtIONet *n)
+{
+    virtio_net_attach_ebpf_to_backend(n->nic, -1);
+    ebpf_rss_unload(&n->ebpf_rss);
 }
 
 static uint16_t virtio_net_handle_rss(VirtIONet *n,
@@ -XXX,XX +XXX,XX @@ static uint16_t virtio_net_handle_rss(VirtIONet *n,
         goto error;
     }
     n->rss_data.enabled = true;
+
+    if (!n->rss_data.populate_hash) {
+        if (!virtio_net_attach_epbf_rss(n)) {
+            /* EBPF must be loaded for vhost */
+            if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
+                warn_report("Can't load eBPF RSS for vhost");
+                goto error;
+            }
+            /* fallback to software RSS */
+            warn_report("Can't load eBPF RSS - fallback to software RSS");
+            n->rss_data.enabled_software_rss = true;
+        }
+    } else {
+        /* use software RSS for hash populating */
+        /* and detach eBPF if it was loaded before */
+        virtio_net_detach_epbf_rss(n);
+        n->rss_data.enabled_software_rss = true;
+    }
+
     trace_virtio_net_rss_enable(n->rss_data.hash_types,
                                 n->rss_data.indirections_len,
                                 temp.b);
@@ -XXX,XX +XXX,XX @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
         return -1;
     }
 
-    if (!no_rss && n->rss_data.enabled) {
+    if (!no_rss && n->rss_data.enabled && n->rss_data.enabled_software_rss) {
         int index = virtio_net_process_rss(nc, buf, size);
         if (index >= 0) {
             NetClientState *nc2 = qemu_get_subqueue(n->nic, index);
@@ -XXX,XX +XXX,XX @@ static int virtio_net_post_load_device(void *opaque, int version_id)
     }
 
     if (n->rss_data.enabled) {
+        n->rss_data.enabled_software_rss = n->rss_data.populate_hash;
+        if (!n->rss_data.populate_hash) {
+            if (!virtio_net_attach_epbf_rss(n)) {
+                if (get_vhost_net(qemu_get_queue(n->nic)->peer)) {
+                    warn_report("Can't post-load eBPF RSS for vhost");
+                } else {
+                    warn_report("Can't post-load eBPF RSS - "
+                                "fallback to software RSS");
+                    n->rss_data.enabled_software_rss = true;
+                }
+            }
+        }
+
         trace_virtio_net_rss_enable(n->rss_data.hash_types,
                                     n->rss_data.indirections_len,
                                     sizeof(n->rss_data.key));
@@ -XXX,XX +XXX,XX @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
     n->qdev = dev;
 
     net_rx_pkt_init(&n->rx_pkt, false);
+
+    if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
+        virtio_net_load_ebpf(n);
+    }
 }
 
 static void virtio_net_device_unrealize(DeviceState *dev)
@@ -XXX,XX +XXX,XX @@ static void virtio_net_device_unrealize(DeviceState *dev)
     VirtIONet *n = VIRTIO_NET(dev);
     int i, max_queues;
 
+    if (virtio_has_feature(n->host_features, VIRTIO_NET_F_RSS)) {
+        virtio_net_unload_ebpf(n);
+    }
+
     /* This will stop vhost backend if appropriate. */
     virtio_net_set_status(vdev, 0);
 
@@ -XXX,XX +XXX,XX @@ static void virtio_net_instance_init(Object *obj)
     device_add_bootindex_property(obj, &n->nic_conf.bootindex,
                                   "bootindex", "/ethernet-phy@0",
                                   DEVICE(n));
+
+    ebpf_rss_init(&n->ebpf_rss);
 }
 
 static int virtio_net_pre_save(void *opaque)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index XXXXXXX..XXXXXXX 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -XXX,XX +XXX,XX @@
 #include "qemu/option_int.h"
 #include "qom/object.h"
 
+#include "ebpf/ebpf_rss.h"
+
 #define TYPE_VIRTIO_NET "virtio-net-device"
 OBJECT_DECLARE_SIMPLE_TYPE(VirtIONet, VIRTIO_NET)
 
@@ -XXX,XX +XXX,XX @@ typedef struct VirtioNetRscChain {
 
 typedef struct VirtioNetRssData {
     bool    enabled;
+    bool    enabled_software_rss;
     bool    redirect;
     bool    populate_hash;
     uint32_t hash_types;
@@ -XXX,XX +XXX,XX @@ struct VirtIONet {
     Notifier migration_state;
     VirtioNetRssData rss_data;
     struct NetRxPkt *rx_pkt;
+    struct EBPFRSSContext ebpf_rss;
 };
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index XXXXXXX..XXXXXXX 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -XXX,XX +XXX,XX @@ const int vdpa_feature_bits[] = {
     VIRTIO_NET_F_MTU,
     VIRTIO_F_IOMMU_PLATFORM,
     VIRTIO_F_RING_PACKED,
+    VIRTIO_NET_F_RSS,
+    VIRTIO_NET_F_HASH_REPORT,
     VIRTIO_NET_F_GUEST_ANNOUNCE,
     VIRTIO_NET_F_STATUS,
     VHOST_INVALID_FEATURE_BIT
-- 
2.7.4

From: Andrew Melnychenko <andrew@daynix.com>

Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 docs/devel/ebpf_rss.rst | 125 ++++++++++++++++++++++++++++++++++++++++++++++++
 docs/devel/index.rst    |   1 +
 2 files changed, 126 insertions(+)
 create mode 100644 docs/devel/ebpf_rss.rst

diff --git a/docs/devel/ebpf_rss.rst b/docs/devel/ebpf_rss.rst
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/docs/devel/ebpf_rss.rst
@@ -XXX,XX +XXX,XX @@
+===========================
+eBPF RSS virtio-net support
+===========================
+
+RSS (Receive Side Scaling) distributes network packets across guest virtqueues
+by calculating a packet hash. Each queue is then usually processed by a specific guest CPU core.
+
+For now there are 2 RSS implementations in qemu:
+- 'in-qemu' RSS (functions only if qemu receives the network packets, i.e. vhost=off)
+- eBPF RSS (can also function with vhost=on)
+
+eBPF support (CONFIG_EBPF) is enabled by the 'configure' script.
+To enable eBPF RSS support, use './configure --enable-bpf'.
+
+If a steering BPF program is not set for the kernel's TUN module, TUN automatically selects
+the rx virtqueue based on a lookup table built from the calculated symmetric hash
+of transmitted packets.
+If a steering BPF program is set for TUN, the BPF code calculates the hash of the packet header and
+returns the number of the virtqueue to place the packet in.
+
+Simplified decision formula:
+
+.. code:: C
+
+    queue_index = indirection_table[hash(<packet data>)%<indirection_table size>]
+
+
+The hash cannot or should not be calculated for every packet.
+
+Note: currently, eBPF RSS does not support hash reporting.
+
+eBPF RSS is turned on by different combinations of vhost-net, virtio-net and tap configurations:
+
+- eBPF is used:
+
+        tap,vhost=off & virtio-net-pci,rss=on,hash=off
+
+- eBPF is used:
+
+        tap,vhost=on & virtio-net-pci,rss=on,hash=off
+
+- 'in-qemu' RSS is used:
+
+        tap,vhost=off & virtio-net-pci,rss=on,hash=on
+
+- eBPF is used, hash population feature is not reported to the guest:
+
+        tap,vhost=on & virtio-net-pci,rss=on,hash=on
+
+If CONFIG_EBPF is not set then only 'in-qemu' RSS is supported.
+'in-qemu' RSS is also used as a fallback if the eBPF program fails to load or cannot be set for TUN.
+
+RSS eBPF program
+----------------
+
+The RSS program is located in ebpf/rss.bpf.skeleton.h, which is generated by bpftool,
+so the program is part of the qemu binary.
+The eBPF program was initially compiled by clang; its source code is located at tools/ebpf/rss.bpf.c.
+Prerequisites to recompile the eBPF program (regenerate ebpf/rss.bpf.skeleton.h):
+
+        llvm, clang, kernel source tree, bpftool
+        Adjust Makefile.ebpf to reflect the location of the kernel source tree
+
+        $ cd tools/ebpf
+        $ make -f Makefile.ebpf
+
+The current eBPF RSS implementation uses 'bounded loops' with 'backward jump instructions', which are present only in recent kernels.
+Overall, eBPF RSS works on kernels 5.8+.
+
+eBPF RSS implementation
+-----------------------
+
+The eBPF RSS loading functionality is located in ebpf/ebpf_rss.c and ebpf/ebpf_rss.h.
+
+The `struct EBPFRSSContext` structure holds the libbpf context pointer and 4 file descriptors:
+
+- ctx - pointer to the libbpf context.
+- program_fd - file descriptor of the eBPF RSS program.
+- map_configuration - file descriptor of the 'configuration' map. This map contains one element of 'struct EBPFRSSConfig'. This configuration determines eBPF program behavior.
+- map_toeplitz_key - file descriptor of the 'Toeplitz key' map. This map contains one element: the 40-byte key prepared for the hashing algorithm.
+- map_indirections_table - file descriptor of the 'indirections table' map, which holds 128 elements of queue indexes.
+
+`struct EBPFRSSConfig` fields:
+
+- redirect - "boolean" value that tells whether the hash should be calculated. If false, `default_queue` is used as the final decision.
+- populate_hash - for now, not used. eBPF RSS doesn't support hash reporting.
+- hash_types - binary mask of different hash types. See the `VIRTIO_NET_RSS_HASH_TYPE_*` defines. If the hash should not be calculated for a packet, `default_queue` is used.
+- indirections_len - length of the indirections table, maximum 128.
+- default_queue - the queue index that is used for packets that shouldn't be hashed. For some packets, the hash can't be calculated (e.g. ARP).
+
+Functions:
+
+- `ebpf_rss_init()` - sets ctx to NULL, which indicates that EBPFRSSContext is not loaded.
+- `ebpf_rss_load()` - creates 3 maps and loads the eBPF program from rss.bpf.skeleton.h. Returns 'true' on success. After that, program_fd can be used to set steering for TAP.
+- `ebpf_rss_set_all()` - sets values for the eBPF maps. The `indirections_table` length is taken from EBPFRSSConfig. `toeplitz_key` is a VIRTIO_NET_RSS_MAX_KEY_SIZE (40 bytes) array.
+- `ebpf_rss_unload()` - closes all file descriptors and sets ctx to NULL.
+
+Simplified eBPF RSS workflow:
+
+.. code:: C
+
+    struct EBPFRSSConfig config;
+    config.redirect = 1;
+    config.hash_types = VIRTIO_NET_RSS_HASH_TYPE_UDPv4 | VIRTIO_NET_RSS_HASH_TYPE_TCPv4;
+    config.indirections_len = VIRTIO_NET_RSS_MAX_TABLE_LEN;
+    config.default_queue = 0;
+
+    uint16_t table[VIRTIO_NET_RSS_MAX_TABLE_LEN] = {...};
+    uint8_t key[VIRTIO_NET_RSS_MAX_KEY_SIZE] = {...};
+
+    struct EBPFRSSContext ctx;
+    ebpf_rss_init(&ctx);
+    ebpf_rss_load(&ctx);
+    ebpf_rss_set_all(&ctx, &config, table, key);
+    if (net_client->info->set_steering_ebpf != NULL) {
+        /* ctx is a struct, not a pointer, so use '.' to reach program_fd */
+        net_client->info->set_steering_ebpf(net_client, ctx.program_fd);
+    }
+    ...
+    ebpf_rss_unload(&ctx);
+
+
+NetClientState SetSteeringEBPF()
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For now, the `set_steering_ebpf()` method is supported by the Linux TAP NetClientState. The method requires an eBPF program file descriptor as an argument.
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -XXX,XX +XXX,XX @@ Contents:
    qom
    block-coroutine-wrapper
    multi-process
+   ebpf_rss
-- 
2.7.4