The following changes since commit 474f3938d79ab36b9231c9ad3b5a9314c2aeacde:

  Merge remote-tracking branch 'remotes/amarkovic/tags/mips-queue-jun-21-2019' into staging (2019-06-21 15:40:50 +0100)

are available in the Git repository at:

  https://github.com/XanClic/qemu.git tags/pull-block-2019-06-24

for you to fetch changes up to ab5d4a30f7f3803ca5106b370969c1b7b54136f8:

  iotests: Fix 205 for concurrent runs (2019-06-24 16:01:40 +0200)

----------------------------------------------------------------
Block patches:
- The SSH block driver now uses libssh instead of libssh2
- The VMDK block driver gets read-only support for the seSparse
  subformat
- Various fixes

---

v2:
- Squashed Pino's fix for pre-0.8 libssh into the libssh patch

----------------------------------------------------------------
Anton Nefedov (1):
  iotest 134: test cluster-misaligned encrypted write

Klaus Birkelund Jensen (1):
  nvme: do not advertise support for unsupported arbitration mechanism

Max Reitz (1):
  iotests: Fix 205 for concurrent runs

Pino Toscano (1):
  ssh: switch from libssh2 to libssh

Sam Eiderman (3):
  vmdk: Fix comment regarding max l1_size coverage
  vmdk: Reduce the max bound for L1 table size
  vmdk: Add read-only support for seSparse snapshots

Vladimir Sementsov-Ogievskiy (1):
  blockdev: enable non-root nodes for transaction drive-backup source

 configure                                  |  65 +-
 block/Makefile.objs                        |   6 +-
 block/ssh.c                                | 652 ++++++++++--------
 block/vmdk.c                               | 372 +++++++++-
 blockdev.c                                 |   2 +-
 hw/block/nvme.c                            |   1 -
 .travis.yml                                |   4 +-
 block/trace-events                         |  14 +-
 docs/qemu-block-drivers.texi               |   2 +-
 .../dockerfiles/debian-win32-cross.docker  |   1 -
 .../dockerfiles/debian-win64-cross.docker  |   1 -
 tests/docker/dockerfiles/fedora.docker     |   4 +-
 tests/docker/dockerfiles/ubuntu.docker     |   2 +-
 tests/docker/dockerfiles/ubuntu1804.docker |   2 +-
 tests/qemu-iotests/059.out                 |   2 +-
 tests/qemu-iotests/134                     |   9 +
 tests/qemu-iotests/134.out                 |  10 +
 tests/qemu-iotests/205                     |   2 +-
 tests/qemu-iotests/207                     |  54 +-
 tests/qemu-iotests/207.out                 |   2 +-
 20 files changed, 823 insertions(+), 384 deletions(-)

--
2.21.0

From: Klaus Birkelund Jensen <klaus@birkelund.eu>

The device mistakenly reports that the Weighted Round Robin with Urgent
Priority Class arbitration mechanism is supported.

It is not.

Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
Message-id: 20190606092530.14206-1-klaus@birkelund.eu
Acked-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 hw/block/nvme.c | 1 -
 1 file changed, 1 deletion(-)

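As a side note on where this bit lives: in the NVMe Capabilities (CAP)
register, AMS is the two-bit field at bits 18:17 (assuming the NVMe 1.x
register layout), and setting it to 1 sets bit 17, i.e. the "WRR with
Urgent Priority Class supported" claim that this patch withdraws. A small
stand-alone sketch of that decoding (illustrative only; the macro names
here are made up, not QEMU's):

    /*
     * Sketch only - decodes CAP.AMS bit 17 (WRR with Urgent Priority
     * Class supported, per the NVMe spec); not code from this patch.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define CAP_AMS_SHIFT 17
    #define CAP_AMS_MASK  0x3

    static bool cap_advertises_wrru(uint64_t cap)
    {
        return ((cap >> CAP_AMS_SHIFT) & CAP_AMS_MASK) & 0x1;
    }

    int main(void)
    {
        uint64_t cap = 0;

        cap |= (uint64_t)1 << CAP_AMS_SHIFT; /* what the old code did */
        printf("WRRU advertised: %d\n", cap_advertises_wrru(cap)); /* 1 */
        cap &= ~((uint64_t)CAP_AMS_MASK << CAP_AMS_SHIFT); /* after patch */
        printf("WRRU advertised: %d\n", cap_advertises_wrru(cap)); /* 0 */
        return 0;
    }
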
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -XXX,XX +XXX,XX @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
     n->bar.cap = 0;
     NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
     NVME_CAP_SET_CQR(n->bar.cap, 1);
-    NVME_CAP_SET_AMS(n->bar.cap, 1);
     NVME_CAP_SET_TO(n->bar.cap, 0xf);
     NVME_CAP_SET_CSS(n->bar.cap, 1);
     NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
--
2.21.0

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

We forgot to enable this for the transaction's .prepare callback, while
it is already enabled in do_drive_backup since commit a2d665c1bc362
("blockdev: loosen restrictions on drive-backup source node").

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190618140804.59214-1-vsementsov@virtuozzo.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 blockdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index XXXXXXX..XXXXXXX 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -XXX,XX +XXX,XX @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
     assert(common->action->type == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
     backup = common->action->u.drive_backup.data;
 
-    bs = qmp_get_root_bs(backup->device, errp);
+    bs = bdrv_lookup_bs(backup->device, backup->device, errp);
     if (!bs) {
         return;
     }
--
2.21.0

From: Anton Nefedov <anton.nefedov@virtuozzo.com>

COW (even empty/zero) areas require encryption too

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190516143028.81155-1-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/134     |  9 +++++++++
 tests/qemu-iotests/134.out | 10 ++++++++++
 2 files changed, 19 insertions(+)

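For context on what "cluster-misaligned" means here: the test's 512-byte
write at offset 512 lands inside a single cluster, so the untouched head
and tail of that cluster are populated via copy-on-write - and on an
encrypted image those COW regions must be encrypted as well. A small
sketch of the arithmetic (assuming the default 64 KiB qcow2 cluster
size; this is an illustration, not code from the patch):

    /* Sketch only - computes the COW head/tail ranges for the write. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t cluster = 65536;           /* assumed cluster size */
        uint64_t offset = 512, bytes = 512; /* the write in iotest 134 */
        uint64_t start = offset / cluster * cluster;

        printf("COW head: [%" PRIu64 ", %" PRIu64 ")\n", start, offset);
        printf("COW tail: [%" PRIu64 ", %" PRIu64 ")\n",
               offset + bytes, start + cluster);
        return 0;
    }
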
diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
index XXXXXXX..XXXXXXX 100755
--- a/tests/qemu-iotests/134
+++ b/tests/qemu-iotests/134
@@ -XXX,XX +XXX,XX @@ echo
 echo "== reading whole image =="
 $QEMU_IO --object $SECRET -c "read 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 
+echo
+echo "== rewriting cluster part =="
+$QEMU_IO --object $SECRET -c "write -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
+echo "== verify pattern =="
+$QEMU_IO --object $SECRET -c "read -P 0 0 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+$QEMU_IO --object $SECRET -c "read -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
 echo
 echo "== rewriting whole image =="
 $QEMU_IO --object $SECRET -c "write -P 0xa 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/134.out
+++ b/tests/qemu-iotests/134.out
@@ -XXX,XX +XXX,XX @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 encryption=on encrypt.
 read 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+== rewriting cluster part ==
+wrote 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify pattern ==
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 == rewriting whole image ==
 wrote 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
--
2.21.0

From: Sam Eiderman <shmuel.eiderman@oracle.com>

Commit b0651b8c246d ("vmdk: Move l1_size check into vmdk_add_extent")
extended the l1_size check from VMDK4 to VMDK3 but did not update the
default coverage in the moved comment.

The previous vmdk4 calculation:

    (512 * 1024 * 1024) * 512(l2 entries) * 65536(grain) = 16PB

The added vmdk3 calculation:

    (512 * 1024 * 1024) * 4096(l2 entries) * 512(grain) = 1PB

Adding the calculation of vmdk3 to the comment.

In any case, VMware does not offer virtual disks larger than 2TB for
vmdk4/vmdk3 or 64TB for the new undocumented seSparse format, which is
not yet implemented in QEMU.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com
Reviewed-by: yuchenlin <yuchenlin@synology.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/vmdk.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

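The coverage numbers in the new comment follow from
coverage = L1 entries x L2 entries per table x grain size; a quick
stand-alone check of both figures (an illustration, not part of the
patch):

    /* Sketch only - verifies the 16PB and 1PB figures above. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t l1_entries = 512ULL * 1024 * 1024;

        /* VMDK4: 512-entry L2 tables, 64 KiB grains -> 16 PiB */
        printf("VMDK4: %" PRIu64 " PiB\n",
               l1_entries * 512 * 65536 >> 50);
        /* VMDK3: 4096-entry L2 tables, 512 B grains -> 1 PiB */
        printf("VMDK3: %" PRIu64 " PiB\n",
               l1_entries * 4096 * 512 >> 50);
        return 0;
    }
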
diff --git a/block/vmdk.c b/block/vmdk.c
index XXXXXXX..XXXXXXX 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
         return -EFBIG;
     }
     if (l1_size > 512 * 1024 * 1024) {
-        /* Although with big capacity and small l1_entry_sectors, we can get a
+        /*
+         * Although with big capacity and small l1_entry_sectors, we can get a
          * big l1_size, we don't want unbounded value to allocate the table.
-         * Limit it to 512M, which is 16PB for default cluster and L2 table
-         * size */
+         * Limit it to 512M, which is:
+         *     16PB - for default "Hosted Sparse Extent" (VMDK4)
+         *            cluster size: 64KB, L2 table size: 512 entries
+         *     1PB  - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
+         *            cluster size: 512B, L2 table size: 4096 entries
+         */
         error_setg(errp, "L1 size too big");
         return -EFBIG;
     }
--
2.21.0

From: Sam Eiderman <shmuel.eiderman@oracle.com>

512M of L1 entries is a very loose bound: only 32M are required to store
the maximal supported VMDK file size of 2TB.

Fixed qemu-iotest #59 - the failure now occurs earlier, on an impossible
L1 table size.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/vmdk.c               | 13 +++++++------
 tests/qemu-iotests/059.out |  2 +-
 2 files changed, 8 insertions(+), 7 deletions(-)

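The same arithmetic justifies the tightened bound: with the minimal
512-byte cluster and minimal 512-entry L2 table, 32M L1 entries still
cover 8 TiB, comfortably above the 2 TB VMDK3/VMDK4 maximum. A quick
check (illustration only):

    /* Sketch only - verifies the 8 TiB coverage of the new 32M limit. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t l1_entries = 32ULL * 1024 * 1024;  /* new limit */
        uint64_t coverage = l1_entries * 512 * 512; /* l2 entries, grain */

        printf("coverage: %" PRIu64 " TiB\n", coverage >> 40); /* 8 */
        return 0;
    }
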
diff --git a/block/vmdk.c b/block/vmdk.c
index XXXXXXX..XXXXXXX 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
         error_setg(errp, "Invalid granularity, image may be corrupt");
         return -EFBIG;
     }
-    if (l1_size > 512 * 1024 * 1024) {
+    if (l1_size > 32 * 1024 * 1024) {
         /*
          * Although with big capacity and small l1_entry_sectors, we can get a
          * big l1_size, we don't want unbounded value to allocate the table.
-         * Limit it to 512M, which is:
-         *     16PB - for default "Hosted Sparse Extent" (VMDK4)
-         *            cluster size: 64KB, L2 table size: 512 entries
-         *     1PB  - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
-         *            cluster size: 512B, L2 table size: 4096 entries
+         * Limit it to 32M, which is enough to store:
+         *     8TB  - for both VMDK3 & VMDK4 with
+         *          minimal cluster size: 512B
+         *          minimal L2 table size: 512 entries
+         *          8 TB is still more than the maximal value supported for
+         *          VMDK3 & VMDK4 which is 2TB.
          */
         error_setg(errp, "L1 size too big");
         return -EFBIG;
diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/059.out
+++ b/tests/qemu-iotests/059.out
@@ -XXX,XX +XXX,XX @@ Offset          Length          Mapped to       File
 0x140000000     0x10000         0x50000         TEST_DIR/t-s003.vmdk
 
 === Testing afl image with a very large capacity ===
-qemu-img: Can't get image size 'TEST_DIR/afl9.IMGFMT': File too large
+qemu-img: Could not open 'TEST_DIR/afl9.IMGFMT': L1 size too big
 *** done
--
2.21.0

From: Sam Eiderman <shmuel.eiderman@oracle.com>

Until ESXi 6.5, VMware used the vmfsSparse format for snapshots (VMDK3
in QEMU).

This format was lacking in the following:

    * Grain directory (L1) and grain table (L2) entries were 32-bit,
      allowing access to only 2TB (slightly less) of data.
    * The grain size (default) was 512 bytes - leading to data
      fragmentation and many grain tables.
    * For space reclamation purposes, it was necessary to find all the
      grains which are not pointed to by any grain table - so a reverse
      mapping of "offset of grain in vmdk" to "grain table" must be
      constructed - which takes large amounts of CPU/RAM.

The format specification can be found in VMware's documentation:
https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf

In ESXi 6.5, to support snapshot files larger than 2TB, a new format was
introduced: SESparse (Space Efficient).

This format fixes the above issues:

    * All entries are now 64-bit.
    * The grain size (default) is 4KB.
    * Grain directory and grain tables are now located at the beginning
      of the file.
      + seSparse format reserves space for all grain tables.
      + Grain tables can be addressed using an index.
      + Grains are located at the end of the file and can also be
        addressed with an index.
      - seSparse vmdks of large disks (64TB) have huge preallocated
        headers - mainly due to L2 tables, even for empty snapshots.
    * The header contains a reverse mapping ("backmap") of "offset of
      grain in vmdk" to "grain table" and a bitmap ("free bitmap") which
      specifies for each grain - whether it is allocated or not.
      Using these data structures we can implement space reclamation
      efficiently.
    * Due to the fact that the header now maintains two mappings:
        * The regular one (grain directory & grain tables)
        * A reverse one (backmap and free bitmap)
      These data structures can lose consistency upon crash and result
      in a corrupted VMDK.
      Therefore, a journal is also added to the VMDK and is replayed
      when VMware reopens the file after a crash.

Since ESXi 6.7, SESparse is the only snapshot format available.

Unfortunately, VMware does not provide documentation regarding the new
seSparse format.

This commit is based on black-box research of the seSparse format.
Various in-guest block operations and their effect on the snapshot file
were tested.

The only VMware-provided source of information (regarding the underlying
implementation) was a log file on the ESXi:

    /var/log/hostd.log

Whenever an seSparse snapshot is created, the log is populated with
seSparse records.

Relevant log records are of the form:

    [...] Const Header:
    [...]  constMagic     = 0xcafebabe
    [...]  version        = 2.1
    [...]  capacity       = 204800
    [...]  grainSize      = 8
    [...]  grainTableSize = 64
    [...]  flags          = 0
    [...] Extents:
    [...]  Header         : <1 : 1>
    [...]  JournalHdr     : <2 : 2>
    [...]  Journal        : <2048 : 2048>
    [...]  GrainDirectory : <4096 : 2048>
    [...]  GrainTables    : <6144 : 2048>
    [...]  FreeBitmap     : <8192 : 2048>
    [...]  BackMap        : <10240 : 2048>
    [...]  Grain          : <12288 : 204800>
    [...] Volatile Header:
    [...] volatileMagic     = 0xcafecafe
    [...] FreeGTNumber      = 0
    [...] nextTxnSeqNumber  = 0
    [...] replayJournal     = 0

The sizes that are seen in the log file are in sectors.
Extents are of the following format: <offset : size>

This commit is a strict implementation which enforces:
    * magics
    * version number 2.1
    * grain size of 8 sectors (4KB)
    * grain table size of 64 sectors
    * zero flags
    * extent locations

Additionally, this commit provides only a subset of the functionality
offered by seSparse's format:
    * Read-only
    * No journal replay
    * No space reclamation
    * No unmap support

Hence, journal header, journal, free bitmap and backmap extents are
unused; only the "classic" (L1 -> L2 -> data) grain access is
implemented.

However, there are several differences in the grain access itself.
Grain directory (L1):
    * Grain directory entries are indexes (not offsets) to grain
      tables.
    * Valid grain directory entries have their highest nibble set to
      0x1.
    * Since grain tables are always located at the beginning of the
      file, the index can fit into 32 bits, so we can use its low
      part if it's valid.
Grain table (L2):
    * Grain table entries are indexes (not offsets) to grains.
    * If the highest nibble of the entry is:
        0x0:
            The grain is not allocated.
            The rest of the bytes are 0.
        0x1:
            The grain is unmapped - the guest sees a zero grain.
            The rest of the bits point to the previously mapped grain,
            see the 0x3 case.
        0x2:
            The grain is zero.
        0x3:
            The grain is allocated - to get the index calculate:
            ((entry & 0x0fff000000000000) >> 48) |
            ((entry & 0x0000ffffffffffff) << 12)
    * The difference between 0x1 and 0x2 is that 0x1 is an unallocated
      grain which results from the guest using sg_unmap to unmap the
      grain - but the grain itself still exists in the grain extent - a
      space reclamation procedure should delete it.
      Unmapping a zero grain has no effect (0x2 will not change to 0x1)
      but unmapping an unallocated grain will (0x0 to 0x1) - naturally.

In order to implement seSparse, some fields had to be changed to support
both 32-bit and 64-bit entry sizes.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/vmdk.c | 358 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 342 insertions(+), 16 deletions(-)

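To make the grain-table entry encoding described above concrete, here is
a small stand-alone decoder following the formula from the commit
message (the entry value is a made-up example; this is an illustration,
not code from the patch):

    /* Sketch only - decodes one 64-bit seSparse grain table entry. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t entry = 0x3001000000000000ULL; /* hypothetical entry */

        switch (entry & 0xf000000000000000ULL) {
        case 0x0000000000000000ULL:
            printf("unallocated grain\n");
            break;
        case 0x1000000000000000ULL:
            printf("unmapped grain (reads as zero, not yet reclaimed)\n");
            break;
        case 0x2000000000000000ULL:
            printf("zero grain\n");
            break;
        case 0x3000000000000000ULL:
            printf("allocated grain, index %" PRIu64 "\n",
                   ((entry & 0x0fff000000000000ULL) >> 48) |
                   ((entry & 0x0000ffffffffffffULL) << 12));
            break;
        }
        return 0;
    }
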
diff --git a/block/vmdk.c b/block/vmdk.c
index XXXXXXX..XXXXXXX 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -XXX,XX +XXX,XX @@ typedef struct {
     uint16_t compressAlgorithm;
 } QEMU_PACKED VMDK4Header;
 
+typedef struct VMDKSESparseConstHeader {
+    uint64_t magic;
+    uint64_t version;
+    uint64_t capacity;
+    uint64_t grain_size;
+    uint64_t grain_table_size;
+    uint64_t flags;
+    uint64_t reserved1;
+    uint64_t reserved2;
+    uint64_t reserved3;
+    uint64_t reserved4;
+    uint64_t volatile_header_offset;
+    uint64_t volatile_header_size;
+    uint64_t journal_header_offset;
+    uint64_t journal_header_size;
+    uint64_t journal_offset;
+    uint64_t journal_size;
+    uint64_t grain_dir_offset;
+    uint64_t grain_dir_size;
+    uint64_t grain_tables_offset;
+    uint64_t grain_tables_size;
+    uint64_t free_bitmap_offset;
+    uint64_t free_bitmap_size;
+    uint64_t backmap_offset;
+    uint64_t backmap_size;
+    uint64_t grains_offset;
+    uint64_t grains_size;
+    uint8_t pad[304];
+} QEMU_PACKED VMDKSESparseConstHeader;
+
+typedef struct VMDKSESparseVolatileHeader {
+    uint64_t magic;
+    uint64_t free_gt_number;
+    uint64_t next_txn_seq_number;
+    uint64_t replay_journal;
+    uint8_t pad[480];
+} QEMU_PACKED VMDKSESparseVolatileHeader;
+
 #define L2_CACHE_SIZE 16
 
 typedef struct VmdkExtent {
@@ -XXX,XX +XXX,XX @@ typedef struct VmdkExtent {
     bool compressed;
     bool has_marker;
     bool has_zero_grain;
+    bool sesparse;
+    uint64_t sesparse_l2_tables_offset;
+    uint64_t sesparse_clusters_offset;
+    int32_t entry_size;
     int version;
     int64_t sectors;
     int64_t end_sector;
     int64_t flat_start_offset;
     int64_t l1_table_offset;
     int64_t l1_backup_table_offset;
-    uint32_t *l1_table;
+    void *l1_table;
     uint32_t *l1_backup_table;
     unsigned int l1_size;
     uint32_t l1_entry_sectors;
 
     unsigned int l2_size;
-    uint32_t *l2_cache;
+    void *l2_cache;
     uint32_t l2_cache_offsets[L2_CACHE_SIZE];
     uint32_t l2_cache_counts[L2_CACHE_SIZE];
 
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
          *          minimal L2 table size: 512 entries
          *          8 TB is still more than the maximal value supported for
          *          VMDK3 & VMDK4 which is 2TB.
+         *     64TB - for "ESXi seSparse Extent"
+         *          minimal cluster size: 512B (default is 4KB)
+         *          L2 table size: 4096 entries (const).
+         *          64TB is more than the maximal value supported for
+         *          seSparse VMDKs (which is slightly less than 64TB)
          */
         error_setg(errp, "L1 size too big");
         return -EFBIG;
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
     extent->l2_size = l2_size;
     extent->cluster_sectors = flat ? sectors : cluster_sectors;
     extent->next_cluster_sector = ROUND_UP(nb_sectors, cluster_sectors);
+    extent->entry_size = sizeof(uint32_t);
 
     if (s->num_extents > 1) {
         extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
     int i;
 
     /* read the L1 table */
-    l1_size = extent->l1_size * sizeof(uint32_t);
+    l1_size = extent->l1_size * extent->entry_size;
     extent->l1_table = g_try_malloc(l1_size);
     if (l1_size && extent->l1_table == NULL) {
         return -ENOMEM;
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
         goto fail_l1;
     }
     for (i = 0; i < extent->l1_size; i++) {
-        le32_to_cpus(&extent->l1_table[i]);
+        if (extent->entry_size == sizeof(uint64_t)) {
+            le64_to_cpus((uint64_t *)extent->l1_table + i);
+        } else {
+            assert(extent->entry_size == sizeof(uint32_t));
+            le32_to_cpus((uint32_t *)extent->l1_table + i);
+        }
     }
 
     if (extent->l1_backup_table_offset) {
+        assert(!extent->sesparse);
         extent->l1_backup_table = g_try_malloc(l1_size);
         if (l1_size && extent->l1_backup_table == NULL) {
             ret = -ENOMEM;
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
     }
 
     extent->l2_cache =
-        g_new(uint32_t, extent->l2_size * L2_CACHE_SIZE);
+        g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE);
     return 0;
 fail_l1b:
     g_free(extent->l1_backup_table);
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
     return ret;
 }
 
+#define SESPARSE_CONST_HEADER_MAGIC UINT64_C(0x00000000cafebabe)
+#define SESPARSE_VOLATILE_HEADER_MAGIC UINT64_C(0x00000000cafecafe)
+
+/* Strict checks - format not officially documented */
+static int check_se_sparse_const_header(VMDKSESparseConstHeader *header,
+                                        Error **errp)
+{
+    header->magic = le64_to_cpu(header->magic);
+    header->version = le64_to_cpu(header->version);
+    header->grain_size = le64_to_cpu(header->grain_size);
+    header->grain_table_size = le64_to_cpu(header->grain_table_size);
+    header->flags = le64_to_cpu(header->flags);
+    header->reserved1 = le64_to_cpu(header->reserved1);
+    header->reserved2 = le64_to_cpu(header->reserved2);
+    header->reserved3 = le64_to_cpu(header->reserved3);
+    header->reserved4 = le64_to_cpu(header->reserved4);
+
+    header->volatile_header_offset =
+        le64_to_cpu(header->volatile_header_offset);
+    header->volatile_header_size = le64_to_cpu(header->volatile_header_size);
+
+    header->journal_header_offset = le64_to_cpu(header->journal_header_offset);
+    header->journal_header_size = le64_to_cpu(header->journal_header_size);
+
+    header->journal_offset = le64_to_cpu(header->journal_offset);
+    header->journal_size = le64_to_cpu(header->journal_size);
+
+    header->grain_dir_offset = le64_to_cpu(header->grain_dir_offset);
+    header->grain_dir_size = le64_to_cpu(header->grain_dir_size);
+
+    header->grain_tables_offset = le64_to_cpu(header->grain_tables_offset);
+    header->grain_tables_size = le64_to_cpu(header->grain_tables_size);
+
+    header->free_bitmap_offset = le64_to_cpu(header->free_bitmap_offset);
+    header->free_bitmap_size = le64_to_cpu(header->free_bitmap_size);
+
+    header->backmap_offset = le64_to_cpu(header->backmap_offset);
+    header->backmap_size = le64_to_cpu(header->backmap_size);
+
+    header->grains_offset = le64_to_cpu(header->grains_offset);
+    header->grains_size = le64_to_cpu(header->grains_size);
+
+    if (header->magic != SESPARSE_CONST_HEADER_MAGIC) {
+        error_setg(errp, "Bad const header magic: 0x%016" PRIx64,
+                   header->magic);
+        return -EINVAL;
+    }
+
+    if (header->version != 0x0000000200000001) {
+        error_setg(errp, "Unsupported version: 0x%016" PRIx64,
+                   header->version);
+        return -ENOTSUP;
+    }
+
+    if (header->grain_size != 8) {
+        error_setg(errp, "Unsupported grain size: %" PRIu64,
+                   header->grain_size);
+        return -ENOTSUP;
+    }
+
+    if (header->grain_table_size != 64) {
+        error_setg(errp, "Unsupported grain table size: %" PRIu64,
+                   header->grain_table_size);
+        return -ENOTSUP;
+    }
+
+    if (header->flags != 0) {
+        error_setg(errp, "Unsupported flags: 0x%016" PRIx64,
+                   header->flags);
+        return -ENOTSUP;
+    }
+
+    if (header->reserved1 != 0 || header->reserved2 != 0 ||
+        header->reserved3 != 0 || header->reserved4 != 0) {
+        error_setg(errp, "Unsupported reserved bits:"
+                   " 0x%016" PRIx64 " 0x%016" PRIx64
+                   " 0x%016" PRIx64 " 0x%016" PRIx64,
+                   header->reserved1, header->reserved2,
+                   header->reserved3, header->reserved4);
+        return -ENOTSUP;
+    }
+
+    /* check that padding is 0 */
+    if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
+        error_setg(errp, "Unsupported non-zero const header padding");
+        return -ENOTSUP;
+    }
+
+    return 0;
+}
+
+static int check_se_sparse_volatile_header(VMDKSESparseVolatileHeader *header,
+                                           Error **errp)
+{
+    header->magic = le64_to_cpu(header->magic);
+    header->free_gt_number = le64_to_cpu(header->free_gt_number);
+    header->next_txn_seq_number = le64_to_cpu(header->next_txn_seq_number);
+    header->replay_journal = le64_to_cpu(header->replay_journal);
+
+    if (header->magic != SESPARSE_VOLATILE_HEADER_MAGIC) {
+        error_setg(errp, "Bad volatile header magic: 0x%016" PRIx64,
+                   header->magic);
+        return -EINVAL;
+    }
+
+    if (header->replay_journal) {
+        error_setg(errp, "Image is dirty, Replaying journal not supported");
+        return -ENOTSUP;
+    }
+
+    /* check that padding is 0 */
+    if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
+        error_setg(errp, "Unsupported non-zero volatile header padding");
+        return -ENOTSUP;
+    }
+
+    return 0;
+}
+
+static int vmdk_open_se_sparse(BlockDriverState *bs,
+                               BdrvChild *file,
+                               int flags, Error **errp)
+{
+    int ret;
+    VMDKSESparseConstHeader const_header;
+    VMDKSESparseVolatileHeader volatile_header;
+    VmdkExtent *extent;
+
+    ret = bdrv_apply_auto_read_only(bs,
+            "No write support for seSparse images available", errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    assert(sizeof(const_header) == SECTOR_SIZE);
+
+    ret = bdrv_pread(file, 0, &const_header, sizeof(const_header));
+    if (ret < 0) {
+        bdrv_refresh_filename(file->bs);
+        error_setg_errno(errp, -ret,
+                         "Could not read const header from file '%s'",
+                         file->bs->filename);
+        return ret;
+    }
+
+    /* check const header */
+    ret = check_se_sparse_const_header(&const_header, errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    assert(sizeof(volatile_header) == SECTOR_SIZE);
+
+    ret = bdrv_pread(file,
+                     const_header.volatile_header_offset * SECTOR_SIZE,
+                     &volatile_header, sizeof(volatile_header));
+    if (ret < 0) {
+        bdrv_refresh_filename(file->bs);
+        error_setg_errno(errp, -ret,
+                         "Could not read volatile header from file '%s'",
+                         file->bs->filename);
+        return ret;
+    }
+
+    /* check volatile header */
+    ret = check_se_sparse_volatile_header(&volatile_header, errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = vmdk_add_extent(bs, file, false,
+                          const_header.capacity,
+                          const_header.grain_dir_offset * SECTOR_SIZE,
+                          0,
+                          const_header.grain_dir_size *
+                          SECTOR_SIZE / sizeof(uint64_t),
+                          const_header.grain_table_size *
+                          SECTOR_SIZE / sizeof(uint64_t),
+                          const_header.grain_size,
+                          &extent,
+                          errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    extent->sesparse = true;
+    extent->sesparse_l2_tables_offset = const_header.grain_tables_offset;
+    extent->sesparse_clusters_offset = const_header.grains_offset;
+    extent->entry_size = sizeof(uint64_t);
+
+    ret = vmdk_init_tables(bs, extent, errp);
+    if (ret) {
+        /* free extent allocated by vmdk_add_extent */
+        vmdk_free_last_extent(bs);
+    }
+
+    return ret;
+}
+
 static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
                                QDict *options, Error **errp);
 
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
      * RW [size in sectors] SPARSE "file-name.vmdk"
      * RW [size in sectors] VMFS "file-name.vmdk"
      * RW [size in sectors] VMFSSPARSE "file-name.vmdk"
+     * RW [size in sectors] SESPARSE "file-name.vmdk"
      */
     flat_offset = -1;
     matches = sscanf(p, "%10s %" SCNd64 " %10s \"%511[^\n\r\"]\" %" SCNd64,
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
 
         if (sectors <= 0 ||
             (strcmp(type, "FLAT") && strcmp(type, "SPARSE") &&
-             strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE")) ||
+             strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE") &&
+             strcmp(type, "SESPARSE")) ||
             (strcmp(access, "RW"))) {
             continue;
         }
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
                 return ret;
             }
             extent = &s->extents[s->num_extents - 1];
+        } else if (!strcmp(type, "SESPARSE")) {
+            ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
+            if (ret) {
+                bdrv_unref_child(bs, extent_file);
+                return ret;
+            }
+            extent = &s->extents[s->num_extents - 1];
         } else {
             error_setg(errp, "Unsupported extent type '%s'", type);
             bdrv_unref_child(bs, extent_file);
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
     if (strcmp(ct, "monolithicFlat") &&
         strcmp(ct, "vmfs") &&
         strcmp(ct, "vmfsSparse") &&
+        strcmp(ct, "seSparse") &&
         strcmp(ct, "twoGbMaxExtentSparse") &&
         strcmp(ct, "twoGbMaxExtentFlat")) {
         error_setg(errp, "Unsupported image type '%s'", ct);
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
 {
     unsigned int l1_index, l2_offset, l2_index;
     int min_index, i, j;
-    uint32_t min_count, *l2_table;
+    uint32_t min_count;
+    void *l2_table;
     bool zeroed = false;
     int64_t ret;
     int64_t cluster_sector;
+    unsigned int l2_size_bytes = extent->l2_size * extent->entry_size;
 
     if (m_data) {
         m_data->valid = 0;
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
     if (l1_index >= extent->l1_size) {
         return VMDK_ERROR;
     }
-    l2_offset = extent->l1_table[l1_index];
+    if (extent->sesparse) {
+        uint64_t l2_offset_u64;
+
+        assert(extent->entry_size == sizeof(uint64_t));
+
+        l2_offset_u64 = ((uint64_t *)extent->l1_table)[l1_index];
+        if (l2_offset_u64 == 0) {
+            l2_offset = 0;
+        } else if ((l2_offset_u64 & 0xffffffff00000000) != 0x1000000000000000) {
+            /*
+             * Top most nibble is 0x1 if grain table is allocated.
+             * strict check - top most 4 bytes must be 0x10000000 since max
+             * supported size is 64TB for disk - so no more than 64TB / 16MB
+             * grain directories which is smaller than uint32,
+             * where 16MB is the only supported default grain table coverage.
+             */
+            return VMDK_ERROR;
+        } else {
+            l2_offset_u64 = l2_offset_u64 & 0x00000000ffffffff;
+            l2_offset_u64 = extent->sesparse_l2_tables_offset +
+                l2_offset_u64 * l2_size_bytes / SECTOR_SIZE;
+            if (l2_offset_u64 > 0x00000000ffffffff) {
+                return VMDK_ERROR;
+            }
+            l2_offset = (unsigned int)(l2_offset_u64);
+        }
+    } else {
+        assert(extent->entry_size == sizeof(uint32_t));
+        l2_offset = ((uint32_t *)extent->l1_table)[l1_index];
+    }
     if (!l2_offset) {
         return VMDK_UNALLOC;
     }
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
                     extent->l2_cache_counts[j] >>= 1;
                 }
             }
-            l2_table = extent->l2_cache + (i * extent->l2_size);
+            l2_table = (char *)extent->l2_cache + (i * l2_size_bytes);
             goto found;
         }
     }
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
             min_index = i;
         }
     }
-    l2_table = extent->l2_cache + (min_index * extent->l2_size);
+    l2_table = (char *)extent->l2_cache + (min_index * l2_size_bytes);
     BLKDBG_EVENT(extent->file, BLKDBG_L2_LOAD);
     if (bdrv_pread(extent->file,
                 (int64_t)l2_offset * 512,
                 l2_table,
-                extent->l2_size * sizeof(uint32_t)
-            ) != extent->l2_size * sizeof(uint32_t)) {
+                l2_size_bytes
+            ) != l2_size_bytes) {
         return VMDK_ERROR;
     }
 
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
     extent->l2_cache_counts[min_index] = 1;
 found:
     l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
-    cluster_sector = le32_to_cpu(l2_table[l2_index]);
 
-    if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
-        zeroed = true;
+    if (extent->sesparse) {
+        cluster_sector = le64_to_cpu(((uint64_t *)l2_table)[l2_index]);
+        switch (cluster_sector & 0xf000000000000000) {
+        case 0x0000000000000000:
+            /* unallocated grain */
+            if (cluster_sector != 0) {
+                return VMDK_ERROR;
+            }
+            break;
+        case 0x1000000000000000:
+            /* scsi-unmapped grain - fallthrough */
+        case 0x2000000000000000:
+            /* zero grain */
+            zeroed = true;
+            break;
+        case 0x3000000000000000:
+            /* allocated grain */
+            cluster_sector = (((cluster_sector & 0x0fff000000000000) >> 48) |
+                              ((cluster_sector & 0x0000ffffffffffff) << 12));
+            cluster_sector = extent->sesparse_clusters_offset +
+                cluster_sector * extent->cluster_sectors;
+            break;
+        default:
+            return VMDK_ERROR;
+        }
+    } else {
+        cluster_sector = le32_to_cpu(((uint32_t *)l2_table)[l2_index]);
+
+        if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
+            zeroed = true;
+        }
     }
 
     if (!cluster_sector || zeroed) {
         if (!allocate) {
             return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
         }
+        assert(!extent->sesparse);
 
         if (extent->next_cluster_sector >= VMDK_EXTENT_MAX_SECTORS) {
             return VMDK_ERROR;
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
             m_data->l1_index = l1_index;
             m_data->l2_index = l2_index;
             m_data->l2_offset = l2_offset;
-            m_data->l2_cache_entry = &l2_table[l2_index];
+            m_data->l2_cache_entry = ((uint32_t *)l2_table) + l2_index;
         }
     }
     *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
@@ -XXX,XX +XXX,XX @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
     if (!extent) {
         return -EIO;
     }
+    if (extent->sesparse) {
+        return -ENOTSUP;
+    }
     offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
     n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
                         - offset_in_cluster);
--
2.21.0

1 | From: Kashyap Chamarthy <kchamart@redhat.com> | 1 | From: Pino Toscano <ptoscano@redhat.com> |
---|---|---|---|
2 | 2 | ||
3 | This patch documents (including their QMP invocations) all the four | 3 | Rewrite the implementation of the ssh block driver to use libssh instead |
4 | major kinds of live block operations: | 4 | of libssh2. The libssh library has various advantages over libssh2: |
5 | - easier API for authentication (for example for using ssh-agent) | ||
6 | - easier API for known_hosts handling | ||
7 | - supports newer types of keys in known_hosts | ||
5 | 8 | ||
6 | - `block-stream` | 9 | Use APIs/features available in libssh 0.8 conditionally, to support |
7 | - `block-commit` | 10 | older versions (which are not recommended though). |
8 | - `drive-mirror` (& `blockdev-mirror`) | ||
9 | - `drive-backup` (& `blockdev-backup`) | ||
10 | 11 | ||
11 | Things considered while writing this document: | 12 | Adjust the iotest 207 according to the different error message, and to |
13 | find the default key type for localhost (to properly compare the | ||
14 | fingerprint with). | ||
15 | Contributed-by: Max Reitz <mreitz@redhat.com> | ||
12 | 16 | ||
13 | - Use reStructuredText as markup language (with the goal of generating | 17 | Adjust the various Docker/Travis scripts to use libssh when available |
14 | the HTML output using the Sphinx Documentation Generator). It is | 18 | instead of libssh2. The mingw/mxe testing is dropped for now, as there |
15 | gentler on the eye, and can be trivially converted to different | 19 | are no packages for it. |
16 | formats. (Another reason: upstream QEMU is considering to switch to | ||
17 | Sphinx, which uses reStructuredText as its markup language.) | ||
18 | 20 | ||
19 | - Raw QMP JSON output vs. 'qmp-shell'. I debated with myself whether | 21 | Signed-off-by: Pino Toscano <ptoscano@redhat.com> |
20 | to only show raw QMP JSON output (as that is the canonical | 22 | Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> |
21 | representation), or use 'qmp-shell', which takes key-value pairs. I | 23 | Acked-by: Alex Bennée <alex.bennee@linaro.org> |
22 | settled on the approach of: for the first occurrence of a command, | 24 | Message-id: 20190620200840.17655-1-ptoscano@redhat.com |
23 | use raw JSON; for subsequent occurrences, use 'qmp-shell', with an | 25 | Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> |
24 | occasional exception. | 26 | Message-id: 5873173.t2JhDm7DL7@lindworm.usersys.redhat.com |
27 | Signed-off-by: Max Reitz <mreitz@redhat.com> | ||
28 | --- | ||
29 | configure | 65 +- | ||
30 | block/Makefile.objs | 6 +- | ||
31 | block/ssh.c | 652 ++++++++++-------- | ||
32 | .travis.yml | 4 +- | ||
33 | block/trace-events | 14 +- | ||
34 | docs/qemu-block-drivers.texi | 2 +- | ||
35 | .../dockerfiles/debian-win32-cross.docker | 1 - | ||
36 | .../dockerfiles/debian-win64-cross.docker | 1 - | ||
37 | tests/docker/dockerfiles/fedora.docker | 4 +- | ||
38 | tests/docker/dockerfiles/ubuntu.docker | 2 +- | ||
39 | tests/docker/dockerfiles/ubuntu1804.docker | 2 +- | ||
40 | tests/qemu-iotests/207 | 54 +- | ||
41 | tests/qemu-iotests/207.out | 2 +- | ||
42 | 13 files changed, 449 insertions(+), 360 deletions(-) | ||
25 | 43 | ||
26 | - Usage of `-blockdev` command-line. | 44 | diff --git a/configure b/configure |
27 | 45 | index XXXXXXX..XXXXXXX 100755 | |
28 | - Usage of 'node-name' vs. file path to refer to disks. While we have | 46 | --- a/configure |
29 | `blockdev-{mirror, backup}` as 'node-name'-alternatives for | 47 | +++ b/configure |
30 | `drive-{mirror, backup}`, the `block-commit` command still operates | 48 | @@ -XXX,XX +XXX,XX @@ auth_pam="" |
31 | on file names for parameters 'base' and 'top'. So I added a caveat | 49 | vte="" |
32 | at the beginning to that effect. | 50 | virglrenderer="" |
33 | 51 | tpm="" | |
34 | Refer this related thread that I started (where I learnt | 52 | -libssh2="" |
35 | `block-stream` was recently reworked to accept 'node-name' for 'top' | 53 | +libssh="" |
36 | and 'base' parameters): | 54 | live_block_migration="yes" |
37 | https://lists.nongnu.org/archive/html/qemu-devel/2017-05/msg06466.html | 55 | numa="" |
38 | "[RFC] Making 'block-stream', and 'block-commit' accept node-name" | 56 | tcmalloc="no" |
39 | 57 | @@ -XXX,XX +XXX,XX @@ for opt do | |
40 | All commands showed in this document were tested while documenting. | 58 | ;; |
41 | 59 | --enable-tpm) tpm="yes" | |
42 | Thanks: Eric Blake for the section: "A note on points-in-time vs file | 60 | ;; |
43 | names". This useful bit was originally articulated by Eric in his | 61 | - --disable-libssh2) libssh2="no" |
44 | KVMForum 2015 presentation, so I included that specific bit in this | 62 | + --disable-libssh) libssh="no" |
45 | document. | 63 | ;; |
46 | 64 | - --enable-libssh2) libssh2="yes" | |
47 | Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com> | 65 | + --enable-libssh) libssh="yes" |
48 | Reviewed-by: Eric Blake <eblake@redhat.com> | 66 | ;; |
49 | Message-id: 20170717105205.32639-3-kchamart@redhat.com | 67 | --disable-live-block-migration) live_block_migration="no" |
50 | Signed-off-by: Jeff Cody <jcody@redhat.com> | 68 | ;; |
51 | --- | 69 | @@ -XXX,XX +XXX,XX @@ disabled with --disable-FEATURE, default is enabled if available: |
52 | docs/interop/live-block-operations.rst | 1088 ++++++++++++++++++++++++++++++++ | 70 | coroutine-pool coroutine freelist (better performance) |
53 | docs/live-block-ops.txt | 72 --- | 71 | glusterfs GlusterFS backend |
54 | 2 files changed, 1088 insertions(+), 72 deletions(-) | 72 | tpm TPM support |
55 | create mode 100644 docs/interop/live-block-operations.rst | 73 | - libssh2 ssh block device support |
56 | delete mode 100644 docs/live-block-ops.txt | 74 | + libssh ssh block device support |
57 | 75 | numa libnuma support | |
58 | diff --git a/docs/interop/live-block-operations.rst b/docs/interop/live-block-operations.rst | 76 | libxml2 for Parallels image format |
59 | new file mode 100644 | 77 | tcmalloc tcmalloc support |
60 | index XXXXXXX..XXXXXXX | 78 | @@ -XXX,XX +XXX,XX @@ EOF |
61 | --- /dev/null | 79 | fi |
62 | +++ b/docs/interop/live-block-operations.rst | 80 | |
81 | ########################################## | ||
82 | -# libssh2 probe | ||
83 | -min_libssh2_version=1.2.8 | ||
84 | -if test "$libssh2" != "no" ; then | ||
85 | - if $pkg_config --atleast-version=$min_libssh2_version libssh2; then | ||
86 | - libssh2_cflags=$($pkg_config libssh2 --cflags) | ||
87 | - libssh2_libs=$($pkg_config libssh2 --libs) | ||
88 | - libssh2=yes | ||
89 | +# libssh probe | ||
90 | +if test "$libssh" != "no" ; then | ||
91 | + if $pkg_config --exists libssh; then | ||
92 | + libssh_cflags=$($pkg_config libssh --cflags) | ||
93 | + libssh_libs=$($pkg_config libssh --libs) | ||
94 | + libssh=yes | ||
95 | else | ||
96 | - if test "$libssh2" = "yes" ; then | ||
97 | - error_exit "libssh2 >= $min_libssh2_version required for --enable-libssh2" | ||
98 | + if test "$libssh" = "yes" ; then | ||
99 | + error_exit "libssh required for --enable-libssh" | ||
100 | fi | ||
101 | - libssh2=no | ||
102 | + libssh=no | ||
103 | fi | ||
104 | fi | ||
105 | |||
106 | ########################################## | ||
107 | -# libssh2_sftp_fsync probe | ||
108 | +# Check for libssh 0.8 | ||
109 | +# This is done like this instead of using the LIBSSH_VERSION_* and | ||
110 | +# SSH_VERSION_* macros because some distributions in the past shipped | ||
111 | +# snapshots of the future 0.8 from Git, and those snapshots did not | ||
112 | +# have updated version numbers (still referring to 0.7.0). | ||
113 | |||
114 | -if test "$libssh2" = "yes"; then | ||
115 | +if test "$libssh" = "yes"; then | ||
116 | cat > $TMPC <<EOF | ||
117 | -#include <stdio.h> | ||
118 | -#include <libssh2.h> | ||
119 | -#include <libssh2_sftp.h> | ||
120 | -int main(void) { | ||
121 | - LIBSSH2_SESSION *session; | ||
122 | - LIBSSH2_SFTP *sftp; | ||
123 | - LIBSSH2_SFTP_HANDLE *sftp_handle; | ||
124 | - session = libssh2_session_init (); | ||
125 | - sftp = libssh2_sftp_init (session); | ||
126 | - sftp_handle = libssh2_sftp_open (sftp, "/", 0, 0); | ||
127 | - libssh2_sftp_fsync (sftp_handle); | ||
128 | - return 0; | ||
129 | -} | ||
130 | +#include <libssh/libssh.h> | ||
131 | +int main(void) { return ssh_get_server_publickey(NULL, NULL); } | ||
132 | EOF | ||
133 | - # libssh2_cflags/libssh2_libs defined in previous test. | ||
134 | - if compile_prog "$libssh2_cflags" "$libssh2_libs" ; then | ||
135 | - QEMU_CFLAGS="-DHAS_LIBSSH2_SFTP_FSYNC $QEMU_CFLAGS" | ||
136 | + if compile_prog "$libssh_cflags" "$libssh_libs"; then | ||
137 | + libssh_cflags="-DHAVE_LIBSSH_0_8 $libssh_cflags" | ||
138 | fi | ||
139 | fi | ||
140 | |||
141 | @@ -XXX,XX +XXX,XX @@ echo "GlusterFS support $glusterfs" | ||
142 | echo "gcov $gcov_tool" | ||
143 | echo "gcov enabled $gcov" | ||
144 | echo "TPM support $tpm" | ||
145 | -echo "libssh2 support $libssh2" | ||
146 | +echo "libssh support $libssh" | ||
147 | echo "QOM debugging $qom_cast_debug" | ||
148 | echo "Live block migration $live_block_migration" | ||
149 | echo "lzo support $lzo" | ||
150 | @@ -XXX,XX +XXX,XX @@ if test "$glusterfs_iocb_has_stat" = "yes" ; then | ||
151 | echo "CONFIG_GLUSTERFS_IOCB_HAS_STAT=y" >> $config_host_mak | ||
152 | fi | ||
153 | |||
154 | -if test "$libssh2" = "yes" ; then | ||
155 | - echo "CONFIG_LIBSSH2=m" >> $config_host_mak | ||
156 | - echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak | ||
157 | - echo "LIBSSH2_LIBS=$libssh2_libs" >> $config_host_mak | ||
158 | +if test "$libssh" = "yes" ; then | ||
159 | + echo "CONFIG_LIBSSH=m" >> $config_host_mak | ||
160 | + echo "LIBSSH_CFLAGS=$libssh_cflags" >> $config_host_mak | ||
161 | + echo "LIBSSH_LIBS=$libssh_libs" >> $config_host_mak | ||
162 | fi | ||
163 | |||
164 | if test "$live_block_migration" = "yes" ; then | ||
165 | diff --git a/block/Makefile.objs b/block/Makefile.objs | ||
166 | index XXXXXXX..XXXXXXX 100644 | ||
167 | --- a/block/Makefile.objs | ||
168 | +++ b/block/Makefile.objs | ||
169 | @@ -XXX,XX +XXX,XX @@ block-obj-$(CONFIG_CURL) += curl.o | ||
170 | block-obj-$(CONFIG_RBD) += rbd.o | ||
171 | block-obj-$(CONFIG_GLUSTERFS) += gluster.o | ||
172 | block-obj-$(CONFIG_VXHS) += vxhs.o | ||
173 | -block-obj-$(CONFIG_LIBSSH2) += ssh.o | ||
174 | +block-obj-$(CONFIG_LIBSSH) += ssh.o | ||
175 | block-obj-y += accounting.o dirty-bitmap.o | ||
176 | block-obj-y += write-threshold.o | ||
177 | block-obj-y += backup.o | ||
178 | @@ -XXX,XX +XXX,XX @@ rbd.o-libs := $(RBD_LIBS) | ||
179 | gluster.o-cflags := $(GLUSTERFS_CFLAGS) | ||
180 | gluster.o-libs := $(GLUSTERFS_LIBS) | ||
181 | vxhs.o-libs := $(VXHS_LIBS) | ||
182 | -ssh.o-cflags := $(LIBSSH2_CFLAGS) | ||
183 | -ssh.o-libs := $(LIBSSH2_LIBS) | ||
184 | +ssh.o-cflags := $(LIBSSH_CFLAGS) | ||
185 | +ssh.o-libs := $(LIBSSH_LIBS) | ||
186 | block-obj-dmg-bz2-$(CONFIG_BZIP2) += dmg-bz2.o | ||
187 | block-obj-$(if $(CONFIG_DMG),m,n) += $(block-obj-dmg-bz2-y) | ||
188 | dmg-bz2.o-libs := $(BZIP2_LIBS) | ||
189 | diff --git a/block/ssh.c b/block/ssh.c | ||
190 | index XXXXXXX..XXXXXXX 100644 | ||
191 | --- a/block/ssh.c | ||
192 | +++ b/block/ssh.c | ||
63 | @@ -XXX,XX +XXX,XX @@ | 193 | @@ -XXX,XX +XXX,XX @@ |
64 | +.. | 194 | |
65 | + Copyright (C) 2017 Red Hat Inc. | 195 | #include "qemu/osdep.h" |
66 | + | 196 | |
67 | + This work is licensed under the terms of the GNU GPL, version 2 or | 197 | -#include <libssh2.h> |
68 | + later. See the COPYING file in the top-level directory. | 198 | -#include <libssh2_sftp.h> |
69 | + | 199 | +#include <libssh/libssh.h> |
70 | +============================ | 200 | +#include <libssh/sftp.h> |
71 | +Live Block Device Operations | 201 | |
72 | +============================ | 202 | #include "block/block_int.h" |
73 | + | 203 | #include "block/qdict.h" |
74 | +QEMU Block Layer currently (as of QEMU 2.9) supports four major kinds of | 204 | @@ -XXX,XX +XXX,XX @@ |
75 | +live block device jobs -- stream, commit, mirror, and backup. These can | 205 | #include "trace.h" |
76 | +be used to manipulate disk image chains to accomplish certain tasks, | 206 | |
77 | +namely: live copy data from backing files into overlays; shorten long | 207 | /* |
78 | +disk image chains by merging data from overlays into backing files; live | 208 | - * TRACE_LIBSSH2=<bitmask> enables tracing in libssh2 itself. Note |
79 | +synchronize data from a disk image chain (including current active disk) | 209 | - * that this requires that libssh2 was specially compiled with the |
80 | +to another target image; and point-in-time (and incremental) backups of | 210 | - * `./configure --enable-debug' option, so most likely you will have |
81 | +a block device. Below is a description of the said block (QMP) | 211 | - * to compile it yourself. The meaning of <bitmask> is described |
82 | +primitives, and a non-exhaustive set of examples to illustrate | 212 | - * here: http://www.libssh2.org/libssh2_trace.html |
83 | +their use. | 213 | + * TRACE_LIBSSH=<level> enables tracing in libssh itself. |
84 | + | 214 | + * The meaning of <level> is described here: |
85 | +.. note:: | 215 | + * http://api.libssh.org/master/group__libssh__log.html |
86 | + The file ``qapi/block-core.json`` in the QEMU source tree has the | 216 | */ |
87 | + canonical QEMU API (QAPI) schema documentation for the QMP | 217 | -#define TRACE_LIBSSH2 0 /* or try: LIBSSH2_TRACE_SFTP */ |
88 | + primitives discussed here. | 218 | +#define TRACE_LIBSSH 0 /* see: SSH_LOG_* */ |
89 | + | 219 | |
90 | +.. todo (kashyapc):: Remove the ".. contents::" directive when Sphinx is | 220 | typedef struct BDRVSSHState { |
91 | + integrated. | 221 | /* Coroutine. */ |
92 | + | 222 | @@ -XXX,XX +XXX,XX @@ typedef struct BDRVSSHState { |
93 | +.. contents:: | 223 | |
94 | + | 224 | /* SSH connection. */ |
95 | +Disk image backing chain notation | 225 | int sock; /* socket */ |
96 | +--------------------------------- | 226 | - LIBSSH2_SESSION *session; /* ssh session */ |
97 | + | 227 | - LIBSSH2_SFTP *sftp; /* sftp session */ |
98 | +A simple disk image chain. (This can be created live using QMP | 228 | - LIBSSH2_SFTP_HANDLE *sftp_handle; /* sftp remote file handle */ |
99 | +``blockdev-snapshot-sync``, or offline via ``qemu-img``):: | 229 | + ssh_session session; /* ssh session */ |
100 | + | 230 | + sftp_session sftp; /* sftp session */ |
101 | + (Live QEMU) | 231 | + sftp_file sftp_handle; /* sftp remote file handle */ |
102 | + | | 232 | |
103 | + . | 233 | - /* See ssh_seek() function below. */ |
104 | + V | 234 | - int64_t offset; |
105 | + | 235 | - bool offset_op_read; |
106 | + [A] <----- [B] | 236 | - |
107 | + | 237 | - /* File attributes at open. We try to keep the .filesize field |
108 | + (backing file) (overlay) | 238 | + /* |
109 | + | 239 | + * File attributes at open. We try to keep the .size field |
110 | +The arrow can be read as: Image [A] is the backing file of disk image | 240 | * updated if it changes (eg by writing at the end of the file). |
111 | +[B]. And live QEMU is currently writing to image [B], consequently, it | 241 | */ |
112 | +is also referred to as the "active layer". | 242 | - LIBSSH2_SFTP_ATTRIBUTES attrs; |
113 | + | 243 | + sftp_attributes attrs; |
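 | +As an aside: a chain like this can also be built offline with | ||
 | +``qemu-img``; one possible invocation (options vary with the QEMU | ||
 | +version in use) is:: | ||
 | + | ||
 | +    $ qemu-img create -f qcow2 -b a.qcow2 -F qcow2 b.qcow2 | ||
 | + | ||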
114 | +There are two kinds of terminology that are common when referring to | 244 | |
115 | +files in a disk image backing chain: | 245 | InetSocketAddress *inet; |
116 | + | 246 | |
117 | +(1) Directional: 'base' and 'top'. Given the simple disk image chain | 247 | @@ -XXX,XX +XXX,XX @@ static void ssh_state_init(BDRVSSHState *s) |
118 | + above, image [A] can be referred to as 'base', and image [B] as | 248 | { |
119 | +    'top'. (This terminology can be seen in the QAPI schema file, | 249 |     memset(s, 0, sizeof *s); |
120 | + block-core.json.) | 250 | s->sock = -1; |
121 | + | 251 | - s->offset = -1; |
122 | +(2) Relational: 'backing file' and 'overlay'. Again, taking the same | 252 | qemu_co_mutex_init(&s->lock); |
123 | + simple disk image chain from the above, disk image [A] is referred | 253 | } |
124 | + to as the backing file, and image [B] as overlay. | 254 | |
125 | + | 255 | @@ -XXX,XX +XXX,XX @@ static void ssh_state_free(BDRVSSHState *s) |
126 | + Throughout this document, we will use the relational terminology. | 256 | { |
127 | + | 257 | g_free(s->user); |
128 | +.. important:: | 258 | |
129 | + The overlay files can generally be any format that supports a | 259 | + if (s->attrs) { |
130 | + backing file, although QCOW2 is the preferred format and the one | 260 | + sftp_attributes_free(s->attrs); |
131 | + used in this document. | 261 | + } |
132 | + | 262 | if (s->sftp_handle) { |
133 | + | 263 | - libssh2_sftp_close(s->sftp_handle); |
134 | +Brief overview of live block QMP primitives | 264 | + sftp_close(s->sftp_handle); |
135 | +------------------------------------------- | 265 | } |
136 | + | 266 | if (s->sftp) { |
137 | +The following are the four different kinds of live block operations that | 267 | - libssh2_sftp_shutdown(s->sftp); |
138 | +QEMU block layer supports. | 268 | + sftp_free(s->sftp); |
139 | + | 269 | } |
140 | +(1) ``block-stream``: Live copy of data from backing files into overlay | 270 | if (s->session) { |
141 | + files. | 271 | - libssh2_session_disconnect(s->session, |
142 | + | 272 | - "from qemu ssh client: " |
143 | +    .. note:: Once the 'stream' operation has finished, there are three | 273 | -                                   "user closed the connection"); |
144 | +              things to note: | 274 | -        libssh2_session_free(s->session); |
145 | + | 275 | - } |
146 | + (a) QEMU rewrites the backing chain to remove | 276 | - if (s->sock >= 0) { |
147 | + reference to the now-streamed and redundant backing | 277 | - close(s->sock); |
148 | + file; | 278 | + ssh_disconnect(s->session); |
149 | + | 279 | + ssh_free(s->session); /* This frees s->sock */ |
150 | + (b) the streamed file *itself* won't be removed by QEMU, | 280 | } |
151 | + and must be explicitly discarded by the user; | 281 | } |
152 | + | 282 | |
153 | + (c) the streamed file remains valid -- i.e. further | 283 | @@ -XXX,XX +XXX,XX @@ session_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...) |
154 | +            overlays can be created based on it. Refer to the | 284 |     va_end(args); |
155 | + ``block-stream`` section further below for more | 285 | |
156 | + details. | 286 | if (s->session) { |
157 | + | 287 | - char *ssh_err; |
158 | +(2) ``block-commit``: Live merge of data from overlay files into backing | 288 | + const char *ssh_err; |
159 | + files (with the optional goal of removing the overlay file from the | 289 | int ssh_err_code; |
160 | + chain). Since QEMU 2.0, this includes "active ``block-commit``" | 290 | |
161 | + (i.e. merge the current active layer into the base image). | 291 | - /* This is not an errno. See <libssh2.h>. */ |
162 | + | 292 | - ssh_err_code = libssh2_session_last_error(s->session, |
163 | + .. note:: Once the 'commit' operation has finished, there are three | 293 | - &ssh_err, NULL, 0); |
164 | + things to note here as well: | 294 | - error_setg(errp, "%s: %s (libssh2 error code: %d)", |
165 | + | 295 | + /* This is not an errno. See <libssh/libssh.h>. */ |
166 | + (a) QEMU rewrites the backing chain to remove reference | 296 | + ssh_err = ssh_get_error(s->session); |
167 | + to now-redundant overlay images that have been | 297 | + ssh_err_code = ssh_get_error_code(s->session); |
168 | + committed into a backing file; | 298 | + error_setg(errp, "%s: %s (libssh error code: %d)", |
169 | + | 299 | msg, ssh_err, ssh_err_code); |
170 | + (b) the committed file *itself* won't be removed by QEMU | 300 | } else { |
171 | + -- it ought to be manually removed; | 301 | error_setg(errp, "%s", msg); |
172 | + | 302 | @@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...) |
173 | + (c) however, unlike in the case of ``block-stream``, the | 303 | va_end(args); |
174 | + intermediate images will be rendered invalid -- i.e. | 304 | |
175 | +            no further overlays can be created based on | 305 |     if (s->sftp) { |
176 | +            them. Refer to the ``block-commit`` section further | 306 | -        char *ssh_err; |
177 | + below for more details. | 307 | + const char *ssh_err; |
178 | + | 308 | int ssh_err_code; |
179 | +(3) ``drive-mirror`` (and ``blockdev-mirror``): Synchronize a running | 309 | - unsigned long sftp_err_code; |
180 | + disk to another image. | 310 | + int sftp_err_code; |
181 | + | 311 | |
182 | +(4) ``drive-backup`` (and ``blockdev-backup``): Point-in-time (live) copy | 312 | - /* This is not an errno. See <libssh2.h>. */ |
183 | + of a block device to a destination. | 313 | - ssh_err_code = libssh2_session_last_error(s->session, |
184 | + | 314 | - &ssh_err, NULL, 0); |
185 | + | 315 | - /* See <libssh2_sftp.h>. */ |
186 | +.. _`Interacting with a QEMU instance`: | 316 | - sftp_err_code = libssh2_sftp_last_error((s)->sftp); |
187 | + | 317 | + /* This is not an errno. See <libssh/libssh.h>. */ |
188 | +Interacting with a QEMU instance | 318 | + ssh_err = ssh_get_error(s->session); |
189 | +-------------------------------- | 319 | + ssh_err_code = ssh_get_error_code(s->session); |
190 | + | 320 | + /* See <libssh/sftp.h>. */ |
191 | +To show some example command-line invocations, we will use the | 321 | +        sftp_err_code = sftp_get_error(s->sftp); |
192 | +following invocation of QEMU, with a QMP server running over UNIX | 322 | |
193 | +socket:: | 323 | error_setg(errp, |
194 | + | 324 | - "%s: %s (libssh2 error code: %d, sftp error code: %lu)", |
195 | + $ ./x86_64-softmmu/qemu-system-x86_64 -display none -nodefconfig \ | 325 | + "%s: %s (libssh error code: %d, sftp error code: %d)", |
196 | + -M q35 -nodefaults -m 512 \ | 326 | msg, ssh_err, ssh_err_code, sftp_err_code); |
197 | + -blockdev node-name=node-A,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./a.qcow2 \ | 327 | } else { |
198 | + -device virtio-blk,drive=node-A,id=virtio0 \ | 328 | error_setg(errp, "%s", msg); |
199 | + -monitor stdio -qmp unix:/tmp/qmp-sock,server,nowait | 329 | @@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...) |
200 | + | 330 | |
201 | +The ``-blockdev`` command-line option, used above, is available from | 331 | static void sftp_error_trace(BDRVSSHState *s, const char *op) |
202 | +QEMU 2.9 onwards. In the above invocation, notice the ``node-name`` | 332 | { |
203 | +parameter that is used to refer to the disk image a.qcow2 ('node-A') -- | 333 | - char *ssh_err; |
204 | +this is a cleaner way to refer to a disk image (as opposed to referring | 334 | + const char *ssh_err; |
205 | +to it by spelling out file paths). So, we will continue to designate a | 335 | int ssh_err_code; |
206 | +``node-name`` to each further disk image created (either via | 336 | - unsigned long sftp_err_code; |
207 | +``blockdev-snapshot-sync``, or ``blockdev-add``) as part of the disk | 337 | + int sftp_err_code; |
208 | +image chain, and continue to refer to the disks using their | 338 | |
209 | +``node-name`` (where possible, because ``block-commit`` does not yet, as | 339 | - /* This is not an errno. See <libssh2.h>. */ |
210 | +of QEMU 2.9, accept the ``node-name`` parameter) when performing various | 340 | -    ssh_err_code = libssh2_session_last_error(s->session, |
211 | +block operations. | 341 | - &ssh_err, NULL, 0); |
212 | + | 342 | - /* See <libssh2_sftp.h>. */ |
213 | +To interact with the QEMU instance launched above, we will use the | 343 | - sftp_err_code = libssh2_sftp_last_error((s)->sftp); |
214 | +``qmp-shell`` utility (located at: ``qemu/scripts/qmp``, as part of the | 344 | + /* This is not an errno. See <libssh/libssh.h>. */ |
215 | +QEMU source directory), which takes key-value pairs for QMP commands. | 345 | + ssh_err = ssh_get_error(s->session); |
216 | +Invoke it as below (which will also print out the complete raw JSON | 346 | + ssh_err_code = ssh_get_error_code(s->session); |
217 | +syntax for reference -- examples in the following sections):: | 347 | + /* See <libssh/sftp.h>. */ |
218 | + | 348 | + sftp_err_code = sftp_get_error(s->sftp); |
219 | + $ ./qmp-shell -v -p /tmp/qmp-sock | 349 | |
220 | + (QEMU) | 350 | trace_sftp_error(op, ssh_err, ssh_err_code, sftp_err_code); |
221 | + | 351 | } |
222 | +.. note:: | 352 | @@ -XXX,XX +XXX,XX @@ static void ssh_parse_filename(const char *filename, QDict *options, |
223 | + In the event we have to repeat a certain QMP command, we will: for | 353 | parse_uri(filename, options, errp); |
224 | + the first occurrence of it, show the ``qmp-shell`` invocation, *and* | 354 | } |
225 | + the corresponding raw JSON QMP syntax; but for subsequent | 355 | |
226 | + invocations, present just the ``qmp-shell`` syntax, and omit the | 356 | -static int check_host_key_knownhosts(BDRVSSHState *s, |
227 | + equivalent JSON output. | 357 | - const char *host, int port, Error **errp) |
228 | + | 358 | +static int check_host_key_knownhosts(BDRVSSHState *s, Error **errp) |
229 | + | 359 | { |
230 | +Example disk image chain | 360 | - const char *home; |
231 | +------------------------ | 361 | - char *knh_file = NULL; |
232 | + | 362 | - LIBSSH2_KNOWNHOSTS *knh = NULL; |
233 | +We will use the below disk image chain (and occasionally spell it | 363 | -    struct libssh2_knownhost *found; |
234 | +out where appropriate) when discussing various primitives:: | 364 | - int ret, r; |
235 | + | 365 | - const char *hostkey; |
236 | + [A] <-- [B] <-- [C] <-- [D] | 366 | - size_t len; |
237 | + | 367 | - int type; |
238 | +Where [A] is the original base image; [B] and [C] are intermediate | 368 | - |
239 | +overlay images; image [D] is the active layer -- i.e. live QEMU is | 369 | - hostkey = libssh2_session_hostkey(s->session, &len, &type); |
240 | +writing to it. (The rule of thumb is: live QEMU will always be pointing | 370 | - if (!hostkey) { |
241 | +to the rightmost image in a disk image chain.) | 371 | + int ret; |
242 | + | 372 | +#ifdef HAVE_LIBSSH_0_8 |
243 | +The above image chain can be created by invoking | 373 | + enum ssh_known_hosts_e state; |
244 | +``blockdev-snapshot-sync`` commands as follows (which shows the | 374 | +    int r; |
245 | +creation of overlay image [B]) using the ``qmp-shell`` (our invocation | 375 | + ssh_key pubkey; |
246 | +also prints the raw JSON invocation of it):: | 376 | + enum ssh_keytypes_e pubkey_type; |
247 | + | 377 | + unsigned char *server_hash = NULL; |
248 | + (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2 | 378 | + size_t server_hash_len; |
249 | + { | 379 | + char *fingerprint = NULL; |
250 | + "execute": "blockdev-snapshot-sync", | 380 | + |
251 | + "arguments": { | 381 | + state = ssh_session_is_known_server(s->session); |
252 | + "node-name": "node-A", | 382 | + trace_ssh_server_status(state); |
253 | + "snapshot-file": "b.qcow2", | 383 | + |
254 | + "format": "qcow2", | 384 | + switch (state) { |
255 | + "snapshot-node-name": "node-B" | 385 | + case SSH_KNOWN_HOSTS_OK: |
386 | + /* OK */ | ||
387 | + trace_ssh_check_host_key_knownhosts(); | ||
388 | + break; | ||
389 | + case SSH_KNOWN_HOSTS_CHANGED: | ||
390 | ret = -EINVAL; | ||
391 | - session_error_setg(errp, s, "failed to read remote host key"); | ||
392 | + r = ssh_get_server_publickey(s->session, &pubkey); | ||
393 | + if (r == 0) { | ||
394 | + r = ssh_get_publickey_hash(pubkey, SSH_PUBLICKEY_HASH_SHA256, | ||
395 | + &server_hash, &server_hash_len); | ||
396 | + pubkey_type = ssh_key_type(pubkey); | ||
397 | + ssh_key_free(pubkey); | ||
398 | + } | ||
399 | + if (r == 0) { | ||
400 | + fingerprint = ssh_get_fingerprint_hash(SSH_PUBLICKEY_HASH_SHA256, | ||
401 | + server_hash, | ||
402 | + server_hash_len); | ||
403 | + ssh_clean_pubkey_hash(&server_hash); | ||
404 | + } | ||
405 | + if (fingerprint) { | ||
406 | + error_setg(errp, | ||
407 | + "host key (%s key with fingerprint %s) does not match " | ||
408 | + "the one in known_hosts; this may be a possible attack", | ||
409 | + ssh_key_type_to_char(pubkey_type), fingerprint); | ||
410 | + ssh_string_free_char(fingerprint); | ||
411 | + } else { | ||
412 | + error_setg(errp, | ||
413 | + "host key does not match the one in known_hosts; this " | ||
414 | + "may be a possible attack"); | ||
415 | + } | ||
416 | goto out; | ||
417 | - } | ||
418 | - | ||
419 | - knh = libssh2_knownhost_init(s->session); | ||
420 | - if (!knh) { | ||
421 | + case SSH_KNOWN_HOSTS_OTHER: | ||
422 | ret = -EINVAL; | ||
423 | - session_error_setg(errp, s, | ||
424 | - "failed to initialize known hosts support"); | ||
425 | + error_setg(errp, | ||
426 | + "host key for this server not found, another type exists"); | ||
427 | + goto out; | ||
428 | + case SSH_KNOWN_HOSTS_UNKNOWN: | ||
429 | + ret = -EINVAL; | ||
430 | + error_setg(errp, "no host key was found in known_hosts"); | ||
431 | + goto out; | ||
432 | + case SSH_KNOWN_HOSTS_NOT_FOUND: | ||
433 | + ret = -ENOENT; | ||
434 | + error_setg(errp, "known_hosts file not found"); | ||
435 | + goto out; | ||
436 | + case SSH_KNOWN_HOSTS_ERROR: | ||
437 | + ret = -EINVAL; | ||
438 | + error_setg(errp, "error while checking the host"); | ||
439 | + goto out; | ||
440 | + default: | ||
441 | + ret = -EINVAL; | ||
442 | + error_setg(errp, "error while checking for known server (%d)", state); | ||
443 | goto out; | ||
444 | } | ||
445 | +#else /* !HAVE_LIBSSH_0_8 */ | ||
446 | + int state; | ||
447 | |||
448 | - home = getenv("HOME"); | ||
449 | - if (home) { | ||
450 | - knh_file = g_strdup_printf("%s/.ssh/known_hosts", home); | ||
451 | - } else { | ||
452 | - knh_file = g_strdup_printf("/root/.ssh/known_hosts"); | ||
453 | - } | ||
454 | - | ||
455 | - /* Read all known hosts from OpenSSH-style known_hosts file. */ | ||
456 | - libssh2_knownhost_readfile(knh, knh_file, LIBSSH2_KNOWNHOST_FILE_OPENSSH); | ||
457 | + state = ssh_is_server_known(s->session); | ||
458 | + trace_ssh_server_status(state); | ||
459 | |||
460 | - r = libssh2_knownhost_checkp(knh, host, port, hostkey, len, | ||
461 | - LIBSSH2_KNOWNHOST_TYPE_PLAIN| | ||
462 | - LIBSSH2_KNOWNHOST_KEYENC_RAW, | ||
463 | - &found); | ||
464 | - switch (r) { | ||
465 | - case LIBSSH2_KNOWNHOST_CHECK_MATCH: | ||
466 | + switch (state) { | ||
467 | + case SSH_SERVER_KNOWN_OK: | ||
468 | /* OK */ | ||
469 | - trace_ssh_check_host_key_knownhosts(found->key); | ||
470 | + trace_ssh_check_host_key_knownhosts(); | ||
471 | break; | ||
472 | - case LIBSSH2_KNOWNHOST_CHECK_MISMATCH: | ||
473 | + case SSH_SERVER_KNOWN_CHANGED: | ||
474 | ret = -EINVAL; | ||
475 | - session_error_setg(errp, s, | ||
476 | - "host key does not match the one in known_hosts" | ||
477 | - " (found key %s)", found->key); | ||
478 | + error_setg(errp, | ||
479 | + "host key does not match the one in known_hosts; this " | ||
480 | + "may be a possible attack"); | ||
481 | goto out; | ||
482 | - case LIBSSH2_KNOWNHOST_CHECK_NOTFOUND: | ||
483 | + case SSH_SERVER_FOUND_OTHER: | ||
484 | ret = -EINVAL; | ||
485 | - session_error_setg(errp, s, "no host key was found in known_hosts"); | ||
486 | + error_setg(errp, | ||
487 | + "host key for this server not found, another type exists"); | ||
488 | + goto out; | ||
489 | + case SSH_SERVER_FILE_NOT_FOUND: | ||
490 | + ret = -ENOENT; | ||
491 | + error_setg(errp, "known_hosts file not found"); | ||
492 | goto out; | ||
493 | - case LIBSSH2_KNOWNHOST_CHECK_FAILURE: | ||
494 | + case SSH_SERVER_NOT_KNOWN: | ||
495 | ret = -EINVAL; | ||
496 | - session_error_setg(errp, s, | ||
497 | - "failure matching the host key with known_hosts"); | ||
498 | + error_setg(errp, "no host key was found in known_hosts"); | ||
499 | + goto out; | ||
500 | + case SSH_SERVER_ERROR: | ||
501 | + ret = -EINVAL; | ||
502 | + error_setg(errp, "server error"); | ||
503 | goto out; | ||
504 | default: | ||
505 | ret = -EINVAL; | ||
506 | - session_error_setg(errp, s, "unknown error matching the host key" | ||
507 | - " with known_hosts (%d)", r); | ||
508 | + error_setg(errp, "error while checking for known server (%d)", state); | ||
509 | goto out; | ||
510 | } | ||
511 | +#endif /* !HAVE_LIBSSH_0_8 */ | ||
512 | |||
513 | /* known_hosts checking successful. */ | ||
514 | ret = 0; | ||
515 | |||
516 | out: | ||
517 | - if (knh != NULL) { | ||
518 | - libssh2_knownhost_free(knh); | ||
519 | - } | ||
520 | - g_free(knh_file); | ||
521 | return ret; | ||
522 | } | ||
523 | |||
524 | @@ -XXX,XX +XXX,XX @@ static int compare_fingerprint(const unsigned char *fingerprint, size_t len, | ||
525 | |||
526 | static int | ||
527 | check_host_key_hash(BDRVSSHState *s, const char *hash, | ||
528 | - int hash_type, size_t fingerprint_len, Error **errp) | ||
529 | + enum ssh_publickey_hash_type type, Error **errp) | ||
530 | { | ||
531 | - const char *fingerprint; | ||
532 | - | ||
533 | - fingerprint = libssh2_hostkey_hash(s->session, hash_type); | ||
534 | - if (!fingerprint) { | ||
535 | + int r; | ||
536 | + ssh_key pubkey; | ||
537 | + unsigned char *server_hash; | ||
538 | + size_t server_hash_len; | ||
539 | + | ||
540 | +#ifdef HAVE_LIBSSH_0_8 | ||
541 | + r = ssh_get_server_publickey(s->session, &pubkey); | ||
542 | +#else | ||
543 | + r = ssh_get_publickey(s->session, &pubkey); | ||
544 | +#endif | ||
545 | + if (r != SSH_OK) { | ||
546 | session_error_setg(errp, s, "failed to read remote host key"); | ||
547 | return -EINVAL; | ||
548 | } | ||
549 | |||
550 | - if(compare_fingerprint((unsigned char *) fingerprint, fingerprint_len, | ||
551 | - hash) != 0) { | ||
552 | + r = ssh_get_publickey_hash(pubkey, type, &server_hash, &server_hash_len); | ||
553 | + ssh_key_free(pubkey); | ||
554 | + if (r != 0) { | ||
555 | + session_error_setg(errp, s, | ||
556 | + "failed reading the hash of the server SSH key"); | ||
557 | + return -EINVAL; | ||
558 | + } | ||
559 | + | ||
560 | + r = compare_fingerprint(server_hash, server_hash_len, hash); | ||
561 | + ssh_clean_pubkey_hash(&server_hash); | ||
562 | + if (r != 0) { | ||
563 | error_setg(errp, "remote host key does not match host_key_check '%s'", | ||
564 | hash); | ||
565 | return -EPERM; | ||
566 | @@ -XXX,XX +XXX,XX @@ check_host_key_hash(BDRVSSHState *s, const char *hash, | ||
567 | return 0; | ||
568 | } | ||
569 | |||
570 | -static int check_host_key(BDRVSSHState *s, const char *host, int port, | ||
571 | - SshHostKeyCheck *hkc, Error **errp) | ||
572 | +static int check_host_key(BDRVSSHState *s, SshHostKeyCheck *hkc, Error **errp) | ||
573 | { | ||
574 | SshHostKeyCheckMode mode; | ||
575 | |||
576 | @@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port, | ||
577 | case SSH_HOST_KEY_CHECK_MODE_HASH: | ||
578 | if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_MD5) { | ||
579 | return check_host_key_hash(s, hkc->u.hash.hash, | ||
580 | - LIBSSH2_HOSTKEY_HASH_MD5, 16, errp); | ||
581 | + SSH_PUBLICKEY_HASH_MD5, errp); | ||
582 | } else if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_SHA1) { | ||
583 | return check_host_key_hash(s, hkc->u.hash.hash, | ||
584 | - LIBSSH2_HOSTKEY_HASH_SHA1, 20, errp); | ||
585 | + SSH_PUBLICKEY_HASH_SHA1, errp); | ||
586 | } | ||
587 | g_assert_not_reached(); | ||
588 | break; | ||
589 | case SSH_HOST_KEY_CHECK_MODE_KNOWN_HOSTS: | ||
590 | - return check_host_key_knownhosts(s, host, port, errp); | ||
591 | + return check_host_key_knownhosts(s, errp); | ||
592 | default: | ||
593 | g_assert_not_reached(); | ||
594 | } | ||
595 | @@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port, | ||
596 | return -EINVAL; | ||
597 | } | ||
598 | |||
599 | -static int authenticate(BDRVSSHState *s, const char *user, Error **errp) | ||
600 | +static int authenticate(BDRVSSHState *s, Error **errp) | ||
601 | { | ||
602 | int r, ret; | ||
603 | - const char *userauthlist; | ||
604 | - LIBSSH2_AGENT *agent = NULL; | ||
605 | - struct libssh2_agent_publickey *identity; | ||
606 | - struct libssh2_agent_publickey *prev_identity = NULL; | ||
607 | + int method; | ||
608 | |||
609 | - userauthlist = libssh2_userauth_list(s->session, user, strlen(user)); | ||
610 | - if (strstr(userauthlist, "publickey") == NULL) { | ||
611 | + /* Try to authenticate with the "none" method. */ | ||
612 | + r = ssh_userauth_none(s->session, NULL); | ||
613 | + if (r == SSH_AUTH_ERROR) { | ||
614 | ret = -EPERM; | ||
615 | - error_setg(errp, | ||
616 | - "remote server does not support \"publickey\" authentication"); | ||
617 | + session_error_setg(errp, s, "failed to authenticate using none " | ||
618 | + "authentication"); | ||
619 | goto out; | ||
620 | - } | ||
621 | - | ||
622 | - /* Connect to ssh-agent and try each identity in turn. */ | ||
623 | - agent = libssh2_agent_init(s->session); | ||
624 | - if (!agent) { | ||
625 | - ret = -EINVAL; | ||
626 | - session_error_setg(errp, s, "failed to initialize ssh-agent support"); | ||
627 | - goto out; | ||
628 | - } | ||
629 | - if (libssh2_agent_connect(agent)) { | ||
630 | - ret = -ECONNREFUSED; | ||
631 | - session_error_setg(errp, s, "failed to connect to ssh-agent"); | ||
632 | - goto out; | ||
633 | - } | ||
634 | - if (libssh2_agent_list_identities(agent)) { | ||
635 | - ret = -EINVAL; | ||
636 | - session_error_setg(errp, s, | ||
637 | - "failed requesting identities from ssh-agent"); | ||
638 | + } else if (r == SSH_AUTH_SUCCESS) { | ||
639 | + /* Authenticated! */ | ||
640 | + ret = 0; | ||
641 | goto out; | ||
642 | } | ||
643 | |||
644 | - for(;;) { | ||
645 | - r = libssh2_agent_get_identity(agent, &identity, prev_identity); | ||
646 | - if (r == 1) { /* end of list */ | ||
647 | - break; | ||
648 | - } | ||
649 | - if (r < 0) { | ||
650 | + method = ssh_userauth_list(s->session, NULL); | ||
651 | + trace_ssh_auth_methods(method); | ||
652 | + | ||
653 | + /* | ||
654 | + * Try to authenticate with publickey, using the ssh-agent | ||
655 | + * if available. | ||
656 | + */ | ||
657 | + if (method & SSH_AUTH_METHOD_PUBLICKEY) { | ||
658 | + r = ssh_userauth_publickey_auto(s->session, NULL, NULL); | ||
659 | + if (r == SSH_AUTH_ERROR) { | ||
660 | ret = -EINVAL; | ||
661 | - session_error_setg(errp, s, | ||
662 | - "failed to obtain identity from ssh-agent"); | ||
663 | + session_error_setg(errp, s, "failed to authenticate using " | ||
664 | + "publickey authentication"); | ||
665 | goto out; | ||
666 | - } | ||
667 | - r = libssh2_agent_userauth(agent, user, identity); | ||
668 | - if (r == 0) { | ||
669 | + } else if (r == SSH_AUTH_SUCCESS) { | ||
670 | /* Authenticated! */ | ||
671 | ret = 0; | ||
672 | goto out; | ||
673 | } | ||
674 | - /* Failed to authenticate with this identity, try the next one. */ | ||
675 | - prev_identity = identity; | ||
676 | } | ||
677 | |||
678 | ret = -EPERM; | ||
679 | @@ -XXX,XX +XXX,XX @@ static int authenticate(BDRVSSHState *s, const char *user, Error **errp) | ||
680 | "and the identities held by your ssh-agent"); | ||
681 | |||
682 | out: | ||
683 | - if (agent != NULL) { | ||
684 | - /* Note: libssh2 implementation implicitly calls | ||
685 | - * libssh2_agent_disconnect if necessary. | ||
686 | - */ | ||
687 | - libssh2_agent_free(agent); | ||
688 | - } | ||
689 | - | ||
690 | return ret; | ||
691 | } | ||
692 | |||
693 | @@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts, | ||
694 | int ssh_flags, int creat_mode, Error **errp) | ||
695 | { | ||
696 | int r, ret; | ||
697 | - long port = 0; | ||
698 | + unsigned int port = 0; | ||
699 | + int new_sock = -1; | ||
700 | |||
701 | if (opts->has_user) { | ||
702 | s->user = g_strdup(opts->user); | ||
703 | @@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts, | ||
704 | s->inet = opts->server; | ||
705 | opts->server = NULL; | ||
706 | |||
707 | - if (qemu_strtol(s->inet->port, NULL, 10, &port) < 0) { | ||
708 | + if (qemu_strtoui(s->inet->port, NULL, 10, &port) < 0) { | ||
709 | error_setg(errp, "Use only numeric port value"); | ||
710 | ret = -EINVAL; | ||
711 | goto err; | ||
712 | } | ||
713 | |||
714 | /* Open the socket and connect. */ | ||
715 | - s->sock = inet_connect_saddr(s->inet, errp); | ||
716 | - if (s->sock < 0) { | ||
717 | + new_sock = inet_connect_saddr(s->inet, errp); | ||
718 | + if (new_sock < 0) { | ||
719 | ret = -EIO; | ||
720 | goto err; | ||
721 | } | ||
722 | |||
723 | + /* | ||
724 | + * Try to disable the Nagle algorithm on TCP sockets to reduce latency, | ||
725 | + * but do not fail if it cannot be disabled. | ||
726 | + */ | ||
727 | + r = socket_set_nodelay(new_sock); | ||
728 | + if (r < 0) { | ||
729 | + warn_report("can't set TCP_NODELAY for the ssh server %s: %s", | ||
730 | + s->inet->host, strerror(errno)); | ||
731 | + } | ||
732 | + | ||
733 | /* Create SSH session. */ | ||
734 | - s->session = libssh2_session_init(); | ||
735 | + s->session = ssh_new(); | ||
736 | if (!s->session) { | ||
737 | ret = -EINVAL; | ||
738 | - session_error_setg(errp, s, "failed to initialize libssh2 session"); | ||
739 | + session_error_setg(errp, s, "failed to initialize libssh session"); | ||
740 | goto err; | ||
741 | } | ||
742 | |||
743 | -#if TRACE_LIBSSH2 != 0 | ||
744 | - libssh2_trace(s->session, TRACE_LIBSSH2); | ||
745 | -#endif | ||
746 | + /* | ||
747 | + * Make sure we are in blocking mode during the connection and | ||
748 | + * authentication phases. | ||
749 | + */ | ||
750 | + ssh_set_blocking(s->session, 1); | ||
751 | |||
752 | - r = libssh2_session_handshake(s->session, s->sock); | ||
753 | - if (r != 0) { | ||
754 | + r = ssh_options_set(s->session, SSH_OPTIONS_USER, s->user); | ||
755 | + if (r < 0) { | ||
756 | + ret = -EINVAL; | ||
757 | + session_error_setg(errp, s, | ||
758 | + "failed to set the user in the libssh session"); | ||
759 | + goto err; | ||
760 | + } | ||
761 | + | ||
762 | + r = ssh_options_set(s->session, SSH_OPTIONS_HOST, s->inet->host); | ||
763 | + if (r < 0) { | ||
764 | + ret = -EINVAL; | ||
765 | + session_error_setg(errp, s, | ||
766 | + "failed to set the host in the libssh session"); | ||
767 | + goto err; | ||
768 | + } | ||
769 | + | ||
770 | + if (port > 0) { | ||
771 | + r = ssh_options_set(s->session, SSH_OPTIONS_PORT, &port); | ||
772 | + if (r < 0) { | ||
773 | + ret = -EINVAL; | ||
774 | + session_error_setg(errp, s, | ||
775 | + "failed to set the port in the libssh session"); | ||
776 | + goto err; | ||
256 | + } | 777 | + } |
257 | + } | 778 | + } |
258 | + | 779 | + |
259 | +Here, "node-A" is the name QEMU internally uses to refer to the base | 780 | + r = ssh_options_set(s->session, SSH_OPTIONS_COMPRESSION, "none"); |
260 | +image [A] -- it is the backing file, based on which the overlay image, | 781 | + if (r < 0) { |
261 | +[B], is created. | 782 | + ret = -EINVAL; |
262 | + | 783 | + session_error_setg(errp, s, |
263 | +To create the rest of the overlay images, [C], and [D] (omitting the raw | 784 | + "failed to disable the compression in the libssh " |
264 | +JSON output for brevity):: | 785 | + "session"); |
265 | + | 786 | + goto err; |
266 | + (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2 | 787 | + } |
267 | + (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2 | 788 | + |
268 | + | 789 | + /* Read ~/.ssh/config. */ |
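 | +If you want to double-check the chain that results, one way (among | ||
 | +others) is the ``query-named-block-nodes`` command, which reports each | ||
 | +node along with its backing image:: | ||
 | + | ||
 | +    (QEMU) query-named-block-nodes | ||
 | + | ||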
269 | + | 790 | + r = ssh_options_parse_config(s->session, NULL); |
270 | +A note on points-in-time vs file names | 791 | + if (r < 0) { |
271 | +-------------------------------------- | 792 | + ret = -EINVAL; |
272 | + | 793 | + session_error_setg(errp, s, "failed to parse ~/.ssh/config"); |
273 | +In our disk image chain:: | 794 | + goto err; |
274 | + | 795 | + } |
275 | + [A] <-- [B] <-- [C] <-- [D] | 796 | + |
276 | + | 797 | + r = ssh_options_set(s->session, SSH_OPTIONS_FD, &new_sock); |
277 | +We have *three* points in time and an active layer: | 798 | + if (r < 0) { |
278 | + | 799 | + ret = -EINVAL; |
279 | +- Point 1: Guest state when [B] was created is contained in file [A] | 800 | + session_error_setg(errp, s, |
280 | +- Point 2: Guest state when [C] was created is contained in [A] + [B] | 801 | + "failed to set the socket in the libssh session"); |
281 | +- Point 3: Guest state when [D] was created is contained in | 802 | + goto err; |
282 | + [A] + [B] + [C] | 803 | + } |
283 | +- Active layer: Current guest state is contained in [A] + [B] + [C] + | 804 | + /* libssh took ownership of the socket. */ |
284 | + [D] | 805 | + s->sock = new_sock; |
285 | + | 806 | + new_sock = -1; |
286 | +Therefore, be careful with naming choices: | 807 | + |
287 | + | 808 | + /* Connect. */ |
288 | +- Naming a file after the time it is created is misleading -- the | 809 | + r = ssh_connect(s->session); |
289 | + guest data for that point in time is *not* contained in that file | 810 | + if (r != SSH_OK) { |
290 | + (as explained earlier) | 811 | ret = -EINVAL; |
291 | +- Rather, think of files as a *delta* from the backing file | 812 | session_error_setg(errp, s, "failed to establish SSH session"); |
292 | + | 813 | goto err; |
293 | + | 814 | } |
294 | +Live block streaming --- ``block-stream`` | 815 | |
295 | +----------------------------------------- | 816 | /* Check the remote host's key against known_hosts. */ |
296 | + | 817 | - ret = check_host_key(s, s->inet->host, port, opts->host_key_check, errp); |
297 | +The ``block-stream`` command allows you to live copy data from backing | 818 | +    ret = check_host_key(s, opts->host_key_check, errp); |
298 | +files into overlay images. | 819 | if (ret < 0) { |
299 | + | 820 | goto err; |
300 | +Given our original example disk image chain from earlier:: | 821 | } |
301 | + | 822 | |
302 | + [A] <-- [B] <-- [C] <-- [D] | 823 | /* Authenticate. */ |
303 | + | 824 | - ret = authenticate(s, s->user, errp); |
304 | +The disk image chain can be shortened in one of the following | 825 | +    ret = authenticate(s, errp); |
305 | +ways (not an exhaustive list). | 826 | if (ret < 0) { |
306 | + | 827 | goto err; |
307 | +.. _`Case-1`: | 828 | } |
308 | + | 829 | |
309 | +(1) Merge everything into the active layer: I.e. copy all contents from | 830 | /* Start SFTP. */ |
310 | + the base image, [A], and overlay images, [B] and [C], into [D], | 831 | - s->sftp = libssh2_sftp_init(s->session); |
311 | + *while* the guest is running. The resulting chain will be a | 832 | + s->sftp = sftp_new(s->session); |
312 | + standalone image, [D] -- with contents from [A], [B] and [C] merged | 833 | if (!s->sftp) { |
313 | + into it (where live QEMU writes go to):: | 834 | - session_error_setg(errp, s, "failed to initialize sftp handle"); |
314 | + | 835 | + session_error_setg(errp, s, "failed to create sftp handle"); |
315 | + [D] | 836 | + ret = -EINVAL; |
316 | + | 837 | + goto err; |
317 | +.. _`Case-2`: | 838 | + } |
318 | + | 839 | + |
319 | +(2) Taking the same example disk image chain mentioned earlier, merge | 840 | + r = sftp_init(s->sftp); |
320 | +    only images [B] and [C] into [D], the active layer. The result is | 841 | +    r = sftp_init(s->sftp); |
321 | +    that the contents of images [B] and [C] are copied into [D], and the | 842 | +    if (r < 0) { |
322 | + backing file pointer of image [D] will be adjusted to point to image | 843 | ret = -EINVAL; |
323 | + [A]. The resulting chain will be:: | 844 | goto err; |
324 | + | 845 | } |
325 | + [A] <-- [D] | 846 | |
326 | + | 847 | /* Open the remote file. */ |
327 | +.. _`Case-3`: | 848 | trace_ssh_connect_to_ssh(opts->path, ssh_flags, creat_mode); |
328 | + | 849 | - s->sftp_handle = libssh2_sftp_open(s->sftp, opts->path, ssh_flags, |
329 | +(3) Intermediate streaming (available since QEMU 2.8): Starting afresh | 850 | - creat_mode); |
330 | + with the original example disk image chain, with a total of four | 851 | + s->sftp_handle = sftp_open(s->sftp, opts->path, ssh_flags, creat_mode); |
331 | + images, it is possible to copy contents from image [B] into image | 852 | if (!s->sftp_handle) { |
332 | + [C]. Once the copy is finished, image [B] can now be (optionally) | 853 | - session_error_setg(errp, s, "failed to open remote file '%s'", |
333 | + discarded; and the backing file pointer of image [C] will be | 854 | - opts->path); |
334 | + adjusted to point to [A]. I.e. after performing "intermediate | 855 | + sftp_error_setg(errp, s, "failed to open remote file '%s'", |
335 | + streaming" of [B] into [C], the resulting image chain will be (where | 856 | + opts->path); |
336 | + live QEMU is writing to [D]):: | 857 | ret = -EINVAL; |
337 | + | 858 | goto err; |
338 | + [A] <-- [C] <-- [D] | 859 | } |
339 | + | 860 | |
340 | + | 861 | - r = libssh2_sftp_fstat(s->sftp_handle, &s->attrs); |
341 | +QMP invocation for ``block-stream`` | 862 | - if (r < 0) { |
342 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 863 | + /* Make sure the SFTP file is handled in blocking mode. */ |
343 | + | 864 | + sftp_file_set_blocking(s->sftp_handle); |
344 | +For `Case-1`_, to merge contents of all the backing files into the | 865 | + |
345 | +active layer, where 'node-D' is the current active image (by default | 866 | + s->attrs = sftp_fstat(s->sftp_handle); |
346 | +``block-stream`` will flatten the entire chain); ``qmp-shell`` (and its | 867 | + if (!s->attrs) { |
347 | +corresponding JSON output):: | 868 | sftp_error_setg(errp, s, "failed to read file attributes"); |
348 | + | 869 | return -EINVAL; |
349 | + (QEMU) block-stream device=node-D job-id=job0 | 870 | } |
350 | + { | 871 | @@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts, |
351 | + "execute": "block-stream", | 872 | return 0; |
352 | + "arguments": { | 873 | |
353 | + "device": "node-D", | 874 | err: |
354 | + "job-id": "job0" | 875 | + if (s->attrs) { |
876 | + sftp_attributes_free(s->attrs); | ||
877 | + } | ||
878 | + s->attrs = NULL; | ||
879 | if (s->sftp_handle) { | ||
880 | - libssh2_sftp_close(s->sftp_handle); | ||
881 | + sftp_close(s->sftp_handle); | ||
882 | } | ||
883 | s->sftp_handle = NULL; | ||
884 | if (s->sftp) { | ||
885 | - libssh2_sftp_shutdown(s->sftp); | ||
886 | + sftp_free(s->sftp); | ||
887 | } | ||
888 | s->sftp = NULL; | ||
889 | if (s->session) { | ||
890 | - libssh2_session_disconnect(s->session, | ||
891 | - "from qemu ssh client: " | ||
892 | - "error opening connection"); | ||
893 | - libssh2_session_free(s->session); | ||
894 | + ssh_disconnect(s->session); | ||
895 | + ssh_free(s->session); | ||
896 | } | ||
897 | s->session = NULL; | ||
898 | + s->sock = -1; | ||
899 | + if (new_sock >= 0) { | ||
900 | + close(new_sock); | ||
901 | + } | ||
902 | |||
903 | return ret; | ||
904 | } | ||
905 | @@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags, | ||
906 | |||
907 | ssh_state_init(s); | ||
908 | |||
909 | - ssh_flags = LIBSSH2_FXF_READ; | ||
910 | + ssh_flags = 0; | ||
911 | if (bdrv_flags & BDRV_O_RDWR) { | ||
912 | - ssh_flags |= LIBSSH2_FXF_WRITE; | ||
913 | + ssh_flags |= O_RDWR; | ||
914 | + } else { | ||
915 | + ssh_flags |= O_RDONLY; | ||
916 | } | ||
917 | |||
918 | opts = ssh_parse_options(options, errp); | ||
919 | @@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags, | ||
920 | } | ||
921 | |||
922 | /* Go non-blocking. */ | ||
923 | - libssh2_session_set_blocking(s->session, 0); | ||
924 | + ssh_set_blocking(s->session, 0); | ||
925 | |||
926 | qapi_free_BlockdevOptionsSsh(opts); | ||
927 | |||
928 | return 0; | ||
929 | |||
930 | err: | ||
931 | - if (s->sock >= 0) { | ||
932 | - close(s->sock); | ||
933 | - } | ||
934 | - s->sock = -1; | ||
935 | - | ||
936 | qapi_free_BlockdevOptionsSsh(opts); | ||
937 | |||
938 | return ret; | ||
939 | @@ -XXX,XX +XXX,XX @@ static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp) | ||
940 | { | ||
941 | ssize_t ret; | ||
942 | char c[1] = { '\0' }; | ||
943 | - int was_blocking = libssh2_session_get_blocking(s->session); | ||
944 | + int was_blocking = ssh_is_blocking(s->session); | ||
945 | |||
946 | /* offset must be strictly greater than the current size so we do | ||
947 | * not overwrite anything */ | ||
948 | - assert(offset > 0 && offset > s->attrs.filesize); | ||
949 | + assert(offset > 0 && offset > s->attrs->size); | ||
950 | |||
951 | - libssh2_session_set_blocking(s->session, 1); | ||
952 | + ssh_set_blocking(s->session, 1); | ||
953 | |||
954 | - libssh2_sftp_seek64(s->sftp_handle, offset - 1); | ||
955 | - ret = libssh2_sftp_write(s->sftp_handle, c, 1); | ||
956 | + sftp_seek64(s->sftp_handle, offset - 1); | ||
957 | + ret = sftp_write(s->sftp_handle, c, 1); | ||
958 | |||
959 | - libssh2_session_set_blocking(s->session, was_blocking); | ||
960 | + ssh_set_blocking(s->session, was_blocking); | ||
961 | |||
962 | if (ret < 0) { | ||
963 | sftp_error_setg(errp, s, "Failed to grow file"); | ||
964 | return -EIO; | ||
965 | } | ||
966 | |||
967 | - s->attrs.filesize = offset; | ||
968 | + s->attrs->size = offset; | ||
969 | return 0; | ||
970 | } | ||
971 | |||
972 | @@ -XXX,XX +XXX,XX @@ static int ssh_co_create(BlockdevCreateOptions *options, Error **errp) | ||
973 | ssh_state_init(&s); | ||
974 | |||
975 | ret = connect_to_ssh(&s, opts->location, | ||
976 | - LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE| | ||
977 | - LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC, | ||
978 | + O_RDWR | O_CREAT | O_TRUNC, | ||
979 | 0644, errp); | ||
980 | if (ret < 0) { | ||
981 | goto fail; | ||
982 | @@ -XXX,XX +XXX,XX @@ static int ssh_has_zero_init(BlockDriverState *bs) | ||
983 | /* Assume false, unless we can positively prove it's true. */ | ||
984 | int has_zero_init = 0; | ||
985 | |||
986 | - if (s->attrs.flags & LIBSSH2_SFTP_ATTR_PERMISSIONS) { | ||
987 | - if (s->attrs.permissions & LIBSSH2_SFTP_S_IFREG) { | ||
988 | - has_zero_init = 1; | ||
989 | - } | ||
990 | + if (s->attrs->type == SSH_FILEXFER_TYPE_REGULAR) { | ||
991 | + has_zero_init = 1; | ||
992 | } | ||
993 | |||
994 | return has_zero_init; | ||
995 | @@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs) | ||
996 | .co = qemu_coroutine_self() | ||
997 | }; | ||
998 | |||
999 | - r = libssh2_session_block_directions(s->session); | ||
1000 | + r = ssh_get_poll_flags(s->session); | ||
1001 | |||
1002 | - if (r & LIBSSH2_SESSION_BLOCK_INBOUND) { | ||
1003 | + if (r & SSH_READ_PENDING) { | ||
1004 | rd_handler = restart_coroutine; | ||
1005 | } | ||
1006 | - if (r & LIBSSH2_SESSION_BLOCK_OUTBOUND) { | ||
1007 | + if (r & SSH_WRITE_PENDING) { | ||
1008 | wr_handler = restart_coroutine; | ||
1009 | } | ||
1010 | |||
1011 | @@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs) | ||
1012 | trace_ssh_co_yield_back(s->sock); | ||
1013 | } | ||
1014 | |||
1015 | -/* SFTP has a function `libssh2_sftp_seek64' which seeks to a position | ||
1016 | - * in the remote file. Notice that it just updates a field in the | ||
1017 | - * sftp_handle structure, so there is no network traffic and it cannot | ||
1018 | - * fail. | ||
1019 | - * | ||
1020 | - * However, `libssh2_sftp_seek64' does have a catastrophic effect on | ||
1021 | - * performance since it causes the handle to throw away all in-flight | ||
1022 | - * reads and buffered readahead data. Therefore this function tries | ||
1023 | - * to be intelligent about when to call the underlying libssh2 function. | ||
1024 | - */ | ||
1025 | -#define SSH_SEEK_WRITE 0 | ||
1026 | -#define SSH_SEEK_READ 1 | ||
1027 | -#define SSH_SEEK_FORCE 2 | ||
1028 | - | ||
1029 | -static void ssh_seek(BDRVSSHState *s, int64_t offset, int flags) | ||
1030 | -{ | ||
1031 | - bool op_read = (flags & SSH_SEEK_READ) != 0; | ||
1032 | - bool force = (flags & SSH_SEEK_FORCE) != 0; | ||
1033 | - | ||
1034 | - if (force || op_read != s->offset_op_read || offset != s->offset) { | ||
1035 | - trace_ssh_seek(offset); | ||
1036 | - libssh2_sftp_seek64(s->sftp_handle, offset); | ||
1037 | - s->offset = offset; | ||
1038 | - s->offset_op_read = op_read; | ||
1039 | - } | ||
1040 | -} | ||
1041 | - | ||
1042 | static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs, | ||
1043 | int64_t offset, size_t size, | ||
1044 | QEMUIOVector *qiov) | ||
1045 | @@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs, | ||
1046 | |||
1047 | trace_ssh_read(offset, size); | ||
1048 | |||
1049 | - ssh_seek(s, offset, SSH_SEEK_READ); | ||
1050 | + trace_ssh_seek(offset); | ||
1051 | + sftp_seek64(s->sftp_handle, offset); | ||
1052 | |||
1053 | /* This keeps track of the current iovec element ('i'), where we | ||
1054 | * will write to next ('buf'), and the end of the current iovec | ||
1055 | @@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs, | ||
1056 | buf = i->iov_base; | ||
1057 | end_of_vec = i->iov_base + i->iov_len; | ||
1058 | |||
1059 | - /* libssh2 has a hard-coded limit of 2000 bytes per request, | ||
1060 | - * although it will also do readahead behind our backs. Therefore | ||
1061 | - * we may have to do repeated reads here until we have read 'size' | ||
1062 | - * bytes. | ||
1063 | - */ | ||
1064 | for (got = 0; got < size; ) { | ||
1065 | + size_t request_read_size; | ||
1066 | again: | ||
1067 | - trace_ssh_read_buf(buf, end_of_vec - buf); | ||
1068 | - r = libssh2_sftp_read(s->sftp_handle, buf, end_of_vec - buf); | ||
1069 | - trace_ssh_read_return(r); | ||
1070 | + /* | ||
1071 | + * The size of SFTP packets is limited to 32K bytes, so limit | ||
1072 | + * the amount of data requested to 16K, as libssh currently | ||
1073 | + * does not handle multiple requests on its own. | ||
1074 | + */ | ||
1075 | + request_read_size = MIN(end_of_vec - buf, 16384); | ||
1076 | + trace_ssh_read_buf(buf, end_of_vec - buf, request_read_size); | ||
1077 | + r = sftp_read(s->sftp_handle, buf, request_read_size); | ||
1078 | + trace_ssh_read_return(r, sftp_get_error(s->sftp)); | ||
1079 | |||
1080 | - if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) { | ||
1081 | + if (r == SSH_AGAIN) { | ||
1082 | co_yield(s, bs); | ||
1083 | goto again; | ||
1084 | } | ||
1085 | - if (r < 0) { | ||
1086 | - sftp_error_trace(s, "read"); | ||
1087 | - s->offset = -1; | ||
1088 | - return -EIO; | ||
1089 | - } | ||
1090 | - if (r == 0) { | ||
1091 | + if (r == SSH_EOF || (r == 0 && sftp_get_error(s->sftp) == SSH_FX_EOF)) { | ||
1092 | /* EOF: Short read so pad the buffer with zeroes and return it. */ | ||
1093 | qemu_iovec_memset(qiov, got, 0, size - got); | ||
1094 | return 0; | ||
1095 | } | ||
1096 | + if (r <= 0) { | ||
1097 | + sftp_error_trace(s, "read"); | ||
1098 | + return -EIO; | ||
355 | + } | 1099 | + } |
1100 | |||
1101 | got += r; | ||
1102 | buf += r; | ||
1103 | - s->offset += r; | ||
1104 | if (buf >= end_of_vec && got < size) { | ||
1105 | i++; | ||
1106 | buf = i->iov_base; | ||
1107 | @@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs, | ||
1108 | |||
1109 | trace_ssh_write(offset, size); | ||
1110 | |||
1111 | - ssh_seek(s, offset, SSH_SEEK_WRITE); | ||
1112 | + trace_ssh_seek(offset); | ||
1113 | + sftp_seek64(s->sftp_handle, offset); | ||
1114 | |||
1115 | /* This keeps track of the current iovec element ('i'), where we | ||
1116 | * will read from next ('buf'), and the end of the current iovec | ||
1117 | @@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs, | ||
1118 | end_of_vec = i->iov_base + i->iov_len; | ||
1119 | |||
1120 | for (written = 0; written < size; ) { | ||
1121 | + size_t request_write_size; | ||
1122 | again: | ||
1123 | - trace_ssh_write_buf(buf, end_of_vec - buf); | ||
1124 | - r = libssh2_sftp_write(s->sftp_handle, buf, end_of_vec - buf); | ||
1125 | - trace_ssh_write_return(r); | ||
1126 | + /* | ||
1127 | + * Avoid too large data packets, as libssh currently does not | ||
1128 | + * handle multiple requests on its own. | ||
1129 | + */ | ||
1130 | + request_write_size = MIN(end_of_vec - buf, 131072); | ||
1131 | + trace_ssh_write_buf(buf, end_of_vec - buf, request_write_size); | ||
1132 | + r = sftp_write(s->sftp_handle, buf, request_write_size); | ||
1133 | + trace_ssh_write_return(r, sftp_get_error(s->sftp)); | ||
1134 | |||
1135 | - if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) { | ||
1136 | + if (r == SSH_AGAIN) { | ||
1137 | co_yield(s, bs); | ||
1138 | goto again; | ||
1139 | } | ||
1140 | if (r < 0) { | ||
1141 | sftp_error_trace(s, "write"); | ||
1142 | - s->offset = -1; | ||
1143 | return -EIO; | ||
1144 | } | ||
1145 | - /* The libssh2 API is very unclear about this. A comment in | ||
1146 | - * the code says "nothing was acked, and no EAGAIN was | ||
1147 | - * received!" which apparently means that no data got sent | ||
1148 | - * out, and the underlying channel didn't return any EAGAIN | ||
1149 | - * indication. I think this is a bug in either libssh2 or | ||
1150 | - * OpenSSH (server-side). In any case, forcing a seek (to | ||
1151 | - * discard libssh2 internal buffers), and then trying again | ||
1152 | - * works for me. | ||
1153 | - */ | ||
1154 | - if (r == 0) { | ||
1155 | - ssh_seek(s, offset + written, SSH_SEEK_WRITE|SSH_SEEK_FORCE); | ||
1156 | - co_yield(s, bs); | ||
1157 | - goto again; | ||
1158 | - } | ||
1159 | |||
1160 | written += r; | ||
1161 | buf += r; | ||
1162 | - s->offset += r; | ||
1163 | if (buf >= end_of_vec && written < size) { | ||
1164 | i++; | ||
1165 | buf = i->iov_base; | ||
1166 | end_of_vec = i->iov_base + i->iov_len; | ||
1167 | } | ||
1168 | |||
1169 | - if (offset + written > s->attrs.filesize) | ||
1170 | - s->attrs.filesize = offset + written; | ||
1171 | + if (offset + written > s->attrs->size) { | ||
1172 | + s->attrs->size = offset + written; | ||
1173 | + } | ||
1174 | } | ||
1175 | |||
1176 | return 0; | ||
1177 | @@ -XXX,XX +XXX,XX @@ static void unsafe_flush_warning(BDRVSSHState *s, const char *what) | ||
1178 | } | ||
1179 | } | ||
1180 | |||
1181 | -#ifdef HAS_LIBSSH2_SFTP_FSYNC | ||
1182 | +#ifdef HAVE_LIBSSH_0_8 | ||
1183 | |||
1184 | static coroutine_fn int ssh_flush(BDRVSSHState *s, BlockDriverState *bs) | ||
1185 | { | ||
1186 | int r; | ||
1187 | |||
1188 | trace_ssh_flush(); | ||
1189 | + | ||
1190 | + if (!sftp_extension_supported(s->sftp, "fsync@openssh.com", "1")) { | ||
1191 | + unsafe_flush_warning(s, "OpenSSH >= 6.3"); | ||
1192 | + return 0; | ||
356 | + } | 1193 | + } |
357 | + | 1194 | again: |
358 | +For `Case-2`_, merge contents of the images [B] and [C] into [D], where | 1195 | - r = libssh2_sftp_fsync(s->sftp_handle); |
359 | +image [D] ends up referring to image [A] as its backing file:: | 1196 | - if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) { |
360 | + | 1197 | + r = sftp_fsync(s->sftp_handle); |
361 | + (QEMU) block-stream device=node-D base-node=node-A job-id=job0 | 1198 | + if (r == SSH_AGAIN) { |
362 | + | 1199 | co_yield(s, bs); |
363 | +And for `Case-3`_, "intermediate streaming", merge contents of | 1200 |         goto again; |
364 | +images [B] into [C], where [C] ends up referring to [A] as its backing | 1201 | } |
365 | +image:: | 1202 | - if (r == LIBSSH2_ERROR_SFTP_PROTOCOL && |
366 | + | 1203 | - libssh2_sftp_last_error(s->sftp) == LIBSSH2_FX_OP_UNSUPPORTED) { |
367 | + (QEMU) block-stream device=node-C base-node=node-A job-id=job0 | 1204 | - unsafe_flush_warning(s, "OpenSSH >= 6.3"); |
368 | + | 1205 | - return 0; |
369 | +Progress of a ``block-stream`` operation can be monitored via the QMP | 1206 | - } |
370 | +command:: | 1207 | if (r < 0) { |
371 | + | 1208 | sftp_error_trace(s, "fsync"); |
372 | + (QEMU) query-block-jobs | 1209 | return -EIO; |
373 | + { | 1210 | @@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_co_flush(BlockDriverState *bs) |
374 | + "execute": "query-block-jobs", | 1211 | return ret; |
375 | + "arguments": {} | 1212 | } |
376 | + } | 1213 | |
377 | + | 1214 | -#else /* !HAS_LIBSSH2_SFTP_FSYNC */ |
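 | +Each entry in the reply describes one active job. An illustrative, | ||
 | +abridged reply for the above stream job -- the values here are made | ||
 | +up, and ``offset``/``len`` together indicate progress -- could look | ||
 | +like:: | ||
 | + | ||
 | +    { | ||
 | +        "return": [ | ||
 | +            { | ||
 | +                "type": "stream", | ||
 | +                "device": "job0", | ||
 | +                "offset": 536870912, | ||
 | +                "len": 1073741824, | ||
 | +                "speed": 0 | ||
 | +            } | ||
 | +        ] | ||
 | +    } | ||
 | + | ||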
378 | + | 1215 | +#else /* !HAVE_LIBSSH_0_8 */ |
379 | +Once the ``block-stream`` operation has completed, QEMU will emit an | 1216 | |
380 | +event, ``BLOCK_JOB_COMPLETED``. The intermediate overlays remain valid, | 1217 | static coroutine_fn int ssh_co_flush(BlockDriverState *bs) |
381 | +and can now be (optionally) discarded, or retained to create further | 1218 | { |
382 | +overlays based on them. Finally, the ``block-stream`` jobs can be | 1219 | BDRVSSHState *s = bs->opaque; |
383 | +restarted at anytime. | 1220 | |
384 | + | 1221 | - unsafe_flush_warning(s, "libssh2 >= 1.4.4"); |
385 | + | 1222 | + unsafe_flush_warning(s, "libssh >= 0.8.0"); |
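 | +For reference, the completion event for the stream job takes roughly | ||
 | +the following shape (values made up, timestamp elided):: | ||
 | + | ||
 | +    { | ||
 | +        "event": "BLOCK_JOB_COMPLETED", | ||
 | +        "data": { | ||
 | +            "type": "stream", | ||
 | +            "device": "job0", | ||
 | +            "offset": 1073741824, | ||
 | +            "len": 1073741824, | ||
 | +            "speed": 0 | ||
 | +        } | ||
 | +    } | ||
 | + | ||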
386 | +Live block commit --- ``block-commit`` | 1223 | return 0; |
387 | +-------------------------------------- | 1224 | } |
388 | + | 1225 | |
389 | +The ``block-commit`` command lets you merge live data from overlay | 1226 | -#endif /* !HAS_LIBSSH2_SFTP_FSYNC */ |
390 | +images into backing file(s). Since QEMU 2.0, this includes "live active | 1227 | +#endif /* !HAVE_LIBSSH_0_8 */ |
391 | +commit" (i.e. it is possible to merge the "active layer", the right-most | 1228 | |
392 | +image in a disk image chain that live QEMU is writing to, into the | 1229 | static int64_t ssh_getlength(BlockDriverState *bs) |
393 | +base image). This is analogous to ``block-stream``, but in the opposite | 1230 | { |
394 | +direction. | 1231 | BDRVSSHState *s = bs->opaque; |
395 | + | 1232 | int64_t length; |
396 | +Again, starting afresh with our example disk image chain, where live | 1233 | |
397 | +QEMU is writing to the right-most image in the chain, [D]:: | 1234 | - /* Note we cannot make a libssh2 call here. */ |
398 | + | 1235 | - length = (int64_t) s->attrs.filesize; |
399 | + [A] <-- [B] <-- [C] <-- [D] | 1236 | + /* Note we cannot make a libssh call here. */ |
400 | + | 1237 | + length = (int64_t) s->attrs->size; |
401 | +The disk image chain can be shortened in one of the following ways: | 1238 | trace_ssh_getlength(length); |
402 | + | 1239 | |
403 | +.. _`block-commit_Case-1`: | 1240 | return length; |
404 | + | 1241 | @@ -XXX,XX +XXX,XX @@ static int coroutine_fn ssh_co_truncate(BlockDriverState *bs, int64_t offset, |
405 | +(1) Commit content from only image [B] into image [A]. The resulting | 1242 | return -ENOTSUP; |
406 | + chain is the following, where image [C] is adjusted to point at [A] | 1243 | } |
407 | + as its new backing file:: | 1244 | |
408 | + | 1245 | - if (offset < s->attrs.filesize) { |
409 | + [A] <-- [C] <-- [D] | 1246 | + if (offset < s->attrs->size) { |
410 | + | 1247 | error_setg(errp, "ssh driver does not support shrinking files"); |
411 | +(2) Commit content from images [B] and [C] into image [A]. The | 1248 | return -ENOTSUP; |
412 | + resulting chain, where image [D] is adjusted to point to image [A] | 1249 | } |
413 | + as its new backing file:: | 1250 | |
414 | + | 1251 | - if (offset == s->attrs.filesize) { |
415 | + [A] <-- [D] | 1252 | + if (offset == s->attrs->size) { |
416 | + | 1253 | return 0; |
417 | +.. _`block-commit_Case-3`: | 1254 | } |
418 | + | 1255 | |
419 | +(3) Commit content from images [B], [C], and the active layer [D] into | 1256 | @@ -XXX,XX +XXX,XX @@ static void bdrv_ssh_init(void) |
420 | + image [A]. The resulting chain (in this case, a consolidated single | 1257 | { |
421 | + image):: | 1258 | int r; |
422 | + | 1259 | |
423 | + [A] | 1260 | - r = libssh2_init(0); |
424 | + | 1261 | + r = ssh_init(); |
425 | +(4) Commit content from only image [C] into image [B]. The | 1262 |     if (r != 0) { |
426 | + resulting chain:: | 1263 | - fprintf(stderr, "libssh2 initialization failed, %d\n", r); |
427 | + | 1264 | + fprintf(stderr, "libssh initialization failed, %d\n", r); |
428 | + [A] <-- [B] <-- [D] | 1265 | exit(EXIT_FAILURE); |
429 | + | 1266 | } |
430 | +(5) Commit content from image [C] and the active layer [D] into image | 1267 | |
431 | + [B]. The resulting chain:: | 1268 | +#if TRACE_LIBSSH != 0 |
432 | + | 1269 | + ssh_set_log_level(TRACE_LIBSSH); |
433 | + [A] <-- [B] | 1270 | +#endif |
434 | + | 1271 | + |
435 | + | 1272 | bdrv_register(&bdrv_ssh); |
436 | +QMP invocation for ``block-commit`` | 1273 | } |
437 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 1274 | |
438 | + | 1275 | diff --git a/.travis.yml b/.travis.yml |
439 | +For :ref:`Case-1 <block-commit_Case-1>`, to merge contents only from | 1276 | index XXXXXXX..XXXXXXX 100644 |
440 | +image [B] into image [A], the invocation is as follows:: | 1277 | --- a/.travis.yml |
441 | + | 1278 | +++ b/.travis.yml |
442 | + (QEMU) block-commit device=node-D base=a.qcow2 top=b.qcow2 job-id=job0 | 1279 | @@ -XXX,XX +XXX,XX @@ addons: |
443 | + { | 1280 | - libseccomp-dev |
444 | + "execute": "block-commit", | 1281 | - libspice-protocol-dev |
445 | + "arguments": { | 1282 | - libspice-server-dev |
446 | + "device": "node-D", | 1283 | - - libssh2-1-dev |
447 | + "job-id": "job0", | 1284 | + - libssh-dev |
448 | + "top": "b.qcow2", | 1285 | - liburcu-dev |
449 | + "base": "a.qcow2" | 1286 | - libusb-1.0-0-dev |
450 | + } | 1287 | - libvte-2.91-dev |
451 | + } | 1288 | @@ -XXX,XX +XXX,XX @@ matrix: |
452 | + | 1289 | - libseccomp-dev |
453 | +Once the above ``block-commit`` operation has completed, a | 1290 | - libspice-protocol-dev |
454 | +``BLOCK_JOB_COMPLETED`` event will be issued, and no further action is | 1291 | - libspice-server-dev |
455 | +required. As the end result, the backing file of image [C] is adjusted | 1292 | - - libssh2-1-dev |
456 | +to point to image [A], and the original 4-image chain will end up being | 1293 | + - libssh-dev |
457 | +transformed to:: | 1294 | - liburcu-dev |
458 | + | 1295 | - libusb-1.0-0-dev |
459 | + [A] <-- [C] <-- [D] | 1296 | - libvte-2.91-dev |
460 | + | 1297 | diff --git a/block/trace-events b/block/trace-events |
461 | +.. note:: | 1298 | index XXXXXXX..XXXXXXX 100644 |
462 | + The intermediate image [B] is invalid (as in: no more further | 1299 | --- a/block/trace-events |
463 | + overlays based on it can be created). | 1300 | +++ b/block/trace-events |
464 | + | 1301 | @@ -XXX,XX +XXX,XX @@ nbd_client_connect_success(const char *export_name) "export '%s'" |
465 | + Reasoning: An intermediate image after a 'stream' operation still | 1302 | # ssh.c |
466 | + represents that old point-in-time, and may be valid in that context. | 1303 | ssh_restart_coroutine(void *co) "co=%p" |
467 | + However, an intermediate image after a 'commit' operation no longer | 1304 | ssh_flush(void) "fsync" |
468 | + represents any point-in-time, and is invalid in any context. | 1305 | -ssh_check_host_key_knownhosts(const char *key) "host key OK: %s" |
469 | + | 1306 | +ssh_check_host_key_knownhosts(void) "host key OK" |
470 | + | 1307 | ssh_connect_to_ssh(char *path, int flags, int mode) "opening file %s flags=0x%x creat_mode=0%o" |
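 | +The above flow can also be driven programmatically. The following is a | ||
 | +minimal Python sketch (illustrative only, not part of QEMU or its QMP | ||
 | +tooling; it assumes the QMP Unix socket ``./qmp-sock`` used elsewhere | ||
 | +in this document, and the ``qmp()``/``wait_event()`` helper names are | ||
 | +this document's own invention) that performs the Case-1 commit and | ||
 | +waits for its completion event:: | ||
 | + | ||
 | +    import json, socket | ||
 | + | ||
 | +    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) | ||
 | +    sock.connect("./qmp-sock")   # QMP socket of the running QEMU | ||
 | +    f = sock.makefile("r") | ||
 | + | ||
 | +    def qmp(cmd, args=None): | ||
 | +        """Send one command; skip async events until the reply.""" | ||
 | +        msg = {"execute": cmd} | ||
 | +        if args is not None: | ||
 | +            msg["arguments"] = args | ||
 | +        sock.sendall((json.dumps(msg) + "\r\n").encode()) | ||
 | +        while True: | ||
 | +            resp = json.loads(f.readline()) | ||
 | +            if "return" in resp or "error" in resp: | ||
 | +                return resp | ||
 | + | ||
 | +    def wait_event(name): | ||
 | +        """Read messages until the given async event is emitted.""" | ||
 | +        while True: | ||
 | +            msg = json.loads(f.readline()) | ||
 | +            if msg.get("event") == name: | ||
 | +                return msg | ||
 | + | ||
 | +    json.loads(f.readline())     # consume the QMP greeting banner | ||
 | +    qmp("qmp_capabilities")      # leave capabilities negotiation mode | ||
 | + | ||
 | +    qmp("block-commit", {"device": "node-D", "job-id": "job0", | ||
 | +                         "top": "b.qcow2", "base": "a.qcow2"}) | ||
 | +    wait_event("BLOCK_JOB_COMPLETED")   # non-active commit: done | ||
 | + | ||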
471 | +However, :ref:`Case-3 <block-commit_Case-3>` (also called: "active | 1308 | ssh_co_yield(int sock, void *rd_handler, void *wr_handler) "s->sock=%d rd_handler=%p wr_handler=%p" |
472 | +``block-commit``") is a *two-phase* operation: In the first phase, the | 1309 | ssh_co_yield_back(int sock) "s->sock=%d - back" |
473 | +content from the active overlay, along with the intermediate overlays, | 1310 | ssh_getlength(int64_t length) "length=%" PRIi64 |
474 | +is copied into the backing file (also called the base image). In the | 1311 | ssh_co_create_opts(uint64_t size) "total_size=%" PRIu64 |
475 | +second phase, the said backing file is made the current active image | 1312 | ssh_read(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu" |
476 | +-- done by issuing the command ``block-job-complete``. Optionally, | 1313 | -ssh_read_buf(void *buf, size_t size) "sftp_read buf=%p size=%zu" |
477 | +the ``block-commit`` operation can be cancelled by issuing the command | 1314 | -ssh_read_return(ssize_t ret) "sftp_read returned %zd" |
478 | +``block-job-cancel``, but be careful when doing this. | 1315 | +ssh_read_buf(void *buf, size_t size, size_t actual_size) "sftp_read buf=%p size=%zu (actual size=%zu)" |
479 | + | 1316 | +ssh_read_return(ssize_t ret, int sftp_err) "sftp_read returned %zd (sftp error=%d)" |
480 | +Once the ``block-commit`` operation has copied all the data, the event | 1317 | ssh_write(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu" |
481 | +``BLOCK_JOB_READY`` will be emitted, signalling that the synchronization | 1318 | -ssh_write_buf(void *buf, size_t size) "sftp_write buf=%p size=%zu" |
482 | +has finished. Now the job can be gracefully completed by issuing the | 1319 | -ssh_write_return(ssize_t ret) "sftp_write returned %zd" |
483 | +command ``block-job-complete`` -- until such a command is issued, the | 1320 | +ssh_write_buf(void *buf, size_t size, size_t actual_size) "sftp_write buf=%p size=%zu (actual size=%zu)" |
484 | +'commit' operation remains active. | 1321 | +ssh_write_return(ssize_t ret, int sftp_err) "sftp_write returned %zd (sftp error=%d)" |
485 | + | 1322 | ssh_seek(int64_t offset) "seeking to offset=%" PRIi64 |
486 | +The following is the flow for :ref:`Case-3 <block-commit_Case-3>` to | 1323 | +ssh_auth_methods(int methods) "auth methods=0x%x" |
487 | +convert a disk image chain such as this:: | 1324 | +ssh_server_status(int status) "server status=%d" |
488 | + | 1325 | |
489 | + [A] <-- [B] <-- [C] <-- [D] | 1326 | # curl.c |
490 | + | 1327 | curl_timer_cb(long timeout_ms) "timer callback timeout_ms %ld" |
491 | +Into:: | 1328 | @@ -XXX,XX +XXX,XX @@ sheepdog_snapshot_create(const char *sn_name, const char *id) "%s %s" |
492 | + | 1329 | sheepdog_snapshot_create_inode(const char *name, uint32_t snap, uint32_t vdi) "s->inode: name %s snap_id 0x%" PRIx32 " vdi 0x%" PRIx32 |
493 | + [A] | 1330 | |
494 | + | 1331 | # ssh.c |
495 | +Where content from all the subsequent overlays, [B] and [C], including | 1332 | -sftp_error(const char *op, const char *ssh_err, int ssh_err_code, unsigned long sftp_err_code) "%s failed: %s (libssh2 error code: %d, sftp error code: %lu)" |
496 | +the active layer, [D], is committed back to [A] -- which is where live | 1333 | +sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)" |
497 | +QEMU is performing all its current writes. | 1334 | diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi |
498 | + | 1335 | index XXXXXXX..XXXXXXX 100644 |
499 | +Start the "active ``block-commit``" operation:: | 1336 | --- a/docs/qemu-block-drivers.texi |
500 | + | 1337 | +++ b/docs/qemu-block-drivers.texi |
501 | + (QEMU) block-commit device=node-D base=a.qcow2 top=d.qcow2 job-id=job0 | 1338 | @@ -XXX,XX +XXX,XX @@ print a warning when @code{fsync} is not supported: |
502 | + { | 1339 | |
503 | + "execute": "block-commit", | 1340 | warning: ssh server @code{ssh.example.com:22} does not support fsync |
504 | + "arguments": { | 1341 | |
505 | + "device": "node-D", | 1342 | -With sufficiently new versions of libssh2 and OpenSSH, @code{fsync} is |
506 | + "job-id": "job0", | 1343 | +With sufficiently new versions of libssh and OpenSSH, @code{fsync} is |
507 | + "top": "d.qcow2", | 1344 | supported. |
508 | + "base": "a.qcow2" | 1345 | |
509 | + } | 1346 | @node disk_images_nvme |
510 | + } | 1347 | diff --git a/tests/docker/dockerfiles/debian-win32-cross.docker b/tests/docker/dockerfiles/debian-win32-cross.docker |
511 | + | 1348 | index XXXXXXX..XXXXXXX 100644 |
512 | + | 1349 | --- a/tests/docker/dockerfiles/debian-win32-cross.docker |
513 | +Once the synchronization has completed, the event ``BLOCK_JOB_READY`` will | 1350 | +++ b/tests/docker/dockerfiles/debian-win32-cross.docker |
514 | +be emitted. | 1351 | @@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \ |
515 | + | 1352 | mxe-$TARGET-w64-mingw32.shared-curl \ |
516 | +Then, optionally query for the status of the active block operations. | 1353 | mxe-$TARGET-w64-mingw32.shared-glib \ |
517 | +We can see the 'commit' job is now ready to be completed, as indicated | 1354 | mxe-$TARGET-w64-mingw32.shared-libgcrypt \ |
518 | +by the line *"ready": true*:: | 1355 | - mxe-$TARGET-w64-mingw32.shared-libssh2 \ |
519 | + | 1356 | mxe-$TARGET-w64-mingw32.shared-libusb1 \ |
520 | + (QEMU) query-block-jobs | 1357 | mxe-$TARGET-w64-mingw32.shared-lzo \ |
521 | + { | 1358 | mxe-$TARGET-w64-mingw32.shared-nettle \ |
522 | + "execute": "query-block-jobs", | 1359 | diff --git a/tests/docker/dockerfiles/debian-win64-cross.docker b/tests/docker/dockerfiles/debian-win64-cross.docker |
523 | + "arguments": {} | 1360 | index XXXXXXX..XXXXXXX 100644 |
524 | + } | 1361 | --- a/tests/docker/dockerfiles/debian-win64-cross.docker |
525 | + { | 1362 | +++ b/tests/docker/dockerfiles/debian-win64-cross.docker |
526 | + "return": [ | 1363 | @@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \ |
527 | + { | 1364 | mxe-$TARGET-w64-mingw32.shared-curl \ |
528 | + "busy": false, | 1365 | mxe-$TARGET-w64-mingw32.shared-glib \ |
529 | + "type": "commit", | 1366 | mxe-$TARGET-w64-mingw32.shared-libgcrypt \ |
530 | + "len": 1376256, | 1367 | - mxe-$TARGET-w64-mingw32.shared-libssh2 \ |
531 | + "paused": false, | 1368 | mxe-$TARGET-w64-mingw32.shared-libusb1 \ |
532 | + "ready": true, | 1369 | mxe-$TARGET-w64-mingw32.shared-lzo \ |
533 | + "io-status": "ok", | 1370 | mxe-$TARGET-w64-mingw32.shared-nettle \ |
534 | + "offset": 1376256, | 1371 | diff --git a/tests/docker/dockerfiles/fedora.docker b/tests/docker/dockerfiles/fedora.docker |
535 | + "device": "job0", | 1372 | index XXXXXXX..XXXXXXX 100644 |
536 | + "speed": 0 | 1373 | --- a/tests/docker/dockerfiles/fedora.docker |
537 | + } | 1374 | +++ b/tests/docker/dockerfiles/fedora.docker |
538 | + ] | 1375 | @@ -XXX,XX +XXX,XX @@ ENV PACKAGES \ |
539 | + } | 1376 | libpng-devel \ |
540 | + | 1377 | librbd-devel \ |
541 | +Gracefully complete the 'commit' block device job:: | 1378 | libseccomp-devel \ |
542 | + | 1379 | - libssh2-devel \ |
543 | + (QEMU) block-job-complete device=job0 | 1380 | + libssh-devel \ |
544 | + { | 1381 | libubsan \ |
545 | + "execute": "block-job-complete", | 1382 | libusbx-devel \ |
546 | + "arguments": { | 1383 | libxml2-devel \ |
547 | + "device": "job0" | 1384 | @@ -XXX,XX +XXX,XX @@ ENV PACKAGES \ |
548 | + } | 1385 | mingw32-gtk3 \ |
549 | + } | 1386 | mingw32-libjpeg-turbo \ |
550 | + { | 1387 | mingw32-libpng \ |
551 | + "return": {} | 1388 | - mingw32-libssh2 \ |
552 | + } | 1389 | mingw32-libtasn1 \ |
553 | + | 1390 | mingw32-nettle \ |
554 | +Finally, once the above job is completed, an event | 1391 | mingw32-pixman \ |
555 | +``BLOCK_JOB_COMPLETED`` will be emitted. | 1392 | @@ -XXX,XX +XXX,XX @@ ENV PACKAGES \ |
556 | + | 1393 | mingw64-gtk3 \ |
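 | +The whole two-phase flow for Case-3 can likewise be scripted, reusing | ||
 | +the illustrative ``qmp()``/``wait_event()`` helpers from the earlier | ||
 | +sketch (an assumption of this document, not a QEMU API):: | ||
 | + | ||
 | +    qmp("block-commit", {"device": "node-D", "job-id": "job0", | ||
 | +                         "top": "d.qcow2", "base": "a.qcow2"}) | ||
 | +    wait_event("BLOCK_JOB_READY")      # phase 1: data synchronized | ||
 | +    qmp("block-job-complete", {"device": "job0"}) | ||
 | +    wait_event("BLOCK_JOB_COMPLETED")  # phase 2: [A] is now active | ||
 | + | ||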
557 | +.. note:: | 1394 | mingw64-libjpeg-turbo \ |
558 | + The invocation for the rest of the cases (2, 4, and 5), discussed | 1395 | mingw64-libpng \ |
559 | + in the previous section, is omitted for brevity. | 1396 | - mingw64-libssh2 \ |
560 | + | 1397 | mingw64-libtasn1 \ |
561 | + | 1398 | mingw64-nettle \ |
562 | +Live disk synchronization --- ``drive-mirror`` and ``blockdev-mirror`` | 1399 | mingw64-pixman \ |
563 | +---------------------------------------------------------------------- | 1400 | diff --git a/tests/docker/dockerfiles/ubuntu.docker b/tests/docker/dockerfiles/ubuntu.docker |
564 | + | 1401 | index XXXXXXX..XXXXXXX 100644 |
565 | +Synchronize a running disk image chain (all or part of it) to a target | 1402 | --- a/tests/docker/dockerfiles/ubuntu.docker |
566 | +image. | 1403 | +++ b/tests/docker/dockerfiles/ubuntu.docker |
567 | + | 1404 | @@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \ |
568 | +Again, given our familiar disk image chain:: | 1405 | libsnappy-dev \ |
569 | + | 1406 | libspice-protocol-dev \ |
570 | + [A] <-- [B] <-- [C] <-- [D] | 1407 | libspice-server-dev \ |
571 | + | 1408 | - libssh2-1-dev \ |
572 | +The ``drive-mirror`` command (and its newer equivalent | 1409 | + libssh-dev \ |
573 | +``blockdev-mirror``) allows you to copy data from the entire chain into | 1410 | libusb-1.0-0-dev \ |
574 | +a single target image (which can be located on a different host). | 1411 | libusbredirhost-dev \ |
575 | + | 1412 | libvdeplug-dev \ |
576 | +Once a 'mirror' job has started, there are two possible actions while | 1413 | diff --git a/tests/docker/dockerfiles/ubuntu1804.docker b/tests/docker/dockerfiles/ubuntu1804.docker |
577 | +it is active: | 1414 | index XXXXXXX..XXXXXXX 100644 |
578 | + | 1415 | --- a/tests/docker/dockerfiles/ubuntu1804.docker |
579 | +(1) Issuing the command ``block-job-cancel`` (upon which the event | 1416 | +++ b/tests/docker/dockerfiles/ubuntu1804.docker |
580 | + ``BLOCK_JOB_CANCELLED`` is emitted) will -- after completing | 1417 | @@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \ |
581 | + synchronization of the content from the disk image chain to the | 1418 | libsnappy-dev \ |
582 | + target image, [E] -- create a point-in-time copy (the point in | 1419 | libspice-protocol-dev \ |
583 | + time being when the cancel command was *triggered*), contained in | 1420 | libspice-server-dev \ |
584 | + image [E], of the entire disk image chain (or only the top-most | 1421 | - libssh2-1-dev \ |
585 | + image, depending on the ``sync`` mode). | 1422 | + libssh-dev \ |
586 | + | 1423 | libusb-1.0-0-dev \ |
587 | +(2) Issuing the command ``block-job-complete`` (once the event | 1424 | libusbredirhost-dev \ |
588 | + ``BLOCK_JOB_READY`` has been emitted) will, after completing | 1425 | libvdeplug-dev \ |
589 | + synchronization of the content, adjust the guest device (i.e. live | 1426 | diff --git a/tests/qemu-iotests/207 b/tests/qemu-iotests/207 |
590 | + QEMU) to point to the target image, causing all the new writes from | 1427 | index XXXXXXX..XXXXXXX 100755 |
591 | + this point on to happen there. One use case for this is live storage migration. | 1428 | --- a/tests/qemu-iotests/207 |
592 | + | 1429 | +++ b/tests/qemu-iotests/207 |
593 | +About synchronization modes: The synchronization mode determines | 1430 | @@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \ |
594 | +*which* part of the disk image chain will be copied to the target. | 1431 | |
595 | +Currently, there are four different kinds: | 1432 | iotests.img_info_log(remote_path) |
596 | + | 1433 | |
597 | +(1) ``full`` -- Synchronize the content of the entire disk image chain | 1434 | - md5_key = subprocess.check_output( |
598 | + to the target | 1435 | - 'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' + |
599 | + | 1436 | - 'cut -d" " -f3 | base64 -d | md5sum -b | cut -d" " -f1', |
600 | +(2) ``top`` -- Synchronize only the contents of the top-most disk image | 1437 | - shell=True).rstrip().decode('ascii') |
601 | + in the chain to the target | 1438 | + keys = subprocess.check_output( |
602 | + | 1439 | + 'ssh-keyscan 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' + |
603 | +(3) ``none`` -- Synchronize only the new writes from this point on. | 1440 | + 'cut -d" " -f3', |
604 | + | 1441 | + shell=True).rstrip().decode('ascii').split('\n') |
605 | + .. note:: In the case of ``drive-backup`` (or ``blockdev-backup``), | 1442 | + |
606 | + the behavior of the ``none`` synchronization mode is different. | 1443 | + # Mappings of base64 representations to digests |
607 | + Normally, a ``backup`` job consists of two parts: Anything | 1444 | + md5_keys = {} |
608 | + that is overwritten by the guest is first copied out to | 1445 | + sha1_keys = {} |
609 | + the backup, and in the background the whole image is | 1446 | + |
610 | + copied from start to end. With ``sync=none``, it's only | 1447 | + for key in keys: |
611 | + the first part. | 1448 | + md5_keys[key] = subprocess.check_output( |
612 | + | 1449 | + 'echo %s | base64 -d | md5sum -b | cut -d" " -f1' % key, |
613 | +(4) ``incremental`` -- Synchronize content that is described by the | 1450 | + shell=True).rstrip().decode('ascii') |
614 | + dirty bitmap | 1451 | + |
615 | + | 1452 | + sha1_keys[key] = subprocess.check_output( |
616 | +.. note:: | 1453 | + 'echo %s | base64 -d | sha1sum -b | cut -d" " -f1' % key, |
617 | + Refer to the :doc:`bitmaps` document in the QEMU source | 1454 | + shell=True).rstrip().decode('ascii') |
618 | + tree to learn about the detailed workings of the ``incremental`` | 1455 | |
619 | + synchronization mode. | 1456 | vm.launch() |
620 | + | 1457 | + |
621 | + | 1458 | + # Find correct key first |
622 | +QMP invocation for ``drive-mirror`` | 1459 | + matching_key = None |
623 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 1460 | + for key in keys: |
624 | + | 1461 | + result = vm.qmp('blockdev-add', |
625 | +To copy the contents of the entire disk image chain, from [A] all the | 1462 | + driver='ssh', node_name='node0', path=disk_path, |
626 | +way to [D], to a new target (``drive-mirror`` will create the destination | ||
627 | +file if it doesn't already exist), call it [E]:: | ||
628 | + | 1465 | + 'port': '22', |
629 | + (QEMU) drive-mirror device=node-D target=e.qcow2 sync=full job-id=job0 | 1466 | + }, host_key_check={ |
630 | + { | 1467 | + 'mode': 'hash', |
631 | + "execute": "drive-mirror", | 1468 | + 'type': 'md5', |
632 | + "arguments": { | 1469 | + 'hash': md5_keys[key], |
633 | + "device": "node-D", | 1470 | + }) |
634 | + "job-id": "job0", | 1471 | + |
635 | + "target": "e.qcow2", | 1472 | + if 'error' not in result: |
636 | + "sync": "full" | 1473 | + vm.qmp('blockdev-del', node_name='node0') |
637 | + } | 1474 | + matching_key = key |
638 | + } | 1475 | + break |
639 | + | 1476 | + |
640 | +The ``"sync": "full"``, from the above, means: copy the *entire* chain | 1477 | + if matching_key is None: |
641 | +to the destination. | 1478 | + vm.shutdown() |
642 | + | 1479 | + iotests.notrun('Did not find a key that fits 127.0.0.1') |
643 | +Following the above, querying for active block jobs will show that a | 1480 | + |
644 | +'mirror' job is "ready" to be completed (and QEMU will also emit an | 1481 | blockdev_create(vm, { 'driver': 'ssh', |
645 | +event, ``BLOCK_JOB_READY``):: | 1482 | 'location': { |
646 | + | 1483 | 'path': disk_path, |
647 | + (QEMU) query-block-jobs | 1484 | @@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \ |
648 | + { | 1485 | 'host-key-check': { |
649 | + "execute": "query-block-jobs", | 1486 | 'mode': 'hash', |
650 | + "arguments": {} | 1487 | 'type': 'md5', |
651 | + } | 1488 | - 'hash': md5_key, |
652 | + { | 1489 | + 'hash': md5_keys[matching_key], |
653 | + "return": [ | 1490 | } |
654 | + { | 1491 | }, |
655 | + "busy": false, | 1492 | 'size': 8388608 }) |
656 | + "type": "mirror", | 1493 | @@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \ |
657 | + "len": 21757952, | 1494 | |
658 | + "paused": false, | 1495 | iotests.img_info_log(remote_path) |
659 | + "ready": true, | 1496 | |
660 | + "io-status": "ok", | 1497 | - sha1_key = subprocess.check_output( |
661 | + "offset": 21757952, | 1498 | - 'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' + |
662 | + "device": "job0", | 1499 | - 'cut -d" " -f3 | base64 -d | sha1sum -b | cut -d" " -f1', |
663 | + "speed": 0 | 1500 | - shell=True).rstrip().decode('ascii') |
664 | + } | ||
665 | + ] | ||
666 | + } | ||
667 | + | ||
668 | +And, as noted in the previous section, there are two possible actions | ||
669 | +at this point: | ||
670 | + | ||
671 | +(a) Create a point-in-time snapshot by ending the synchronization. The | ||
672 | + point-in-time is at the time of *ending* the sync. (As a result, | ||
673 | + the target image, [E], will be populated with content from the | ||
674 | + entire chain, [A] to [D]):: | ||
675 | + | ||
676 | + (QEMU) block-job-cancel device=job0 | ||
677 | + { | ||
678 | + "execute": "block-job-cancel", | ||
679 | + "arguments": { | ||
680 | + "device": "job0" | ||
681 | + } | ||
682 | + } | ||
683 | + | ||
684 | +(b) Or, complete the operation and pivot the live QEMU to the target | ||
685 | + copy:: | ||
686 | + | ||
687 | + (QEMU) block-job-complete device=job0 | ||
688 | + | ||
689 | +In either of the above cases, if you once again run the | ||
690 | +``query-block-jobs`` command, there should not be any active block | ||
691 | +operation. | ||
692 | + | ||
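 | +Scripted with the same illustrative ``qmp()``/``wait_event()`` helpers | ||
 | +as in the first sketch, the choice between the two actions looks as | ||
 | +follows:: | ||
 | + | ||
 | +    qmp("drive-mirror", {"device": "node-D", "target": "e.qcow2", | ||
 | +                         "sync": "full", "job-id": "job0"}) | ||
 | +    wait_event("BLOCK_JOB_READY") | ||
 | + | ||
 | +    pivot = False                    # choose action (a) or (b) here | ||
 | +    if pivot: | ||
 | +        qmp("block-job-complete", {"device": "job0"})  # (b) pivot | ||
 | +        wait_event("BLOCK_JOB_COMPLETED") | ||
 | +    else: | ||
 | +        qmp("block-job-cancel", {"device": "job0"})    # (a) copy | ||
 | +        wait_event("BLOCK_JOB_CANCELLED") | ||
 | + | ||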
693 | +Comparing 'commit' and 'mirror': In both cases, the overlay images | ||
694 | +can be discarded. However, with 'commit', the *existing* base image | ||
695 | +will be modified (by updating it with contents from overlays); while in | ||
696 | +the case of 'mirror', a *new* target image is populated with the data | ||
697 | +from the disk image chain. | ||
698 | + | ||
699 | + | ||
700 | +QMP invocation for live storage migration with ``drive-mirror`` + NBD | ||
701 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
702 | + | ||
703 | +Live storage migration (without shared storage setup) is one of the most | ||
704 | +common use-cases that takes advantage of the ``drive-mirror`` primitive | ||
705 | +and QEMU's built-in Network Block Device (NBD) server. Here's a quick | ||
706 | +walk-through of this setup. | ||
707 | + | ||
708 | +Given the disk image chain:: | ||
709 | + | ||
710 | + [A] <-- [B] <-- [C] <-- [D] | ||
711 | + | ||
712 | +Instead of copying content from the entire chain, synchronize *only* the | ||
713 | +contents of the *top*-most disk image (i.e. the active layer), [D], to a | ||
714 | +target, say, [TargetDisk]. | ||
715 | + | ||
716 | +.. important:: | ||
717 | + The destination host must already have the contents of the backing | ||
718 | + chain, involving images [A], [B], and [C], visible via other means | ||
719 | + -- whether by ``cp``, ``rsync``, or by some storage array-specific | ||
720 | + command. | ||
721 | + | ||
722 | +Sometimes, this is also referred to as "shallow copy" -- because only | ||
723 | +the "active layer", and not the rest of the image chain, is copied to | ||
724 | +the destination. | ||
725 | + | ||
726 | +.. note:: | ||
727 | + In this example, for the sake of simplicity, we'll be using the same | ||
728 | + ``localhost`` as both source and destination. | ||
729 | + | ||
730 | +As noted earlier, on the destination host the contents of the backing | ||
731 | +chain -- from images [A] to [C] -- are already expected to exist in some | ||
732 | +form (e.g. in a file called ``Contents-of-A-B-C.qcow2``). Now, on the | ||
733 | +destination host, let's create a target overlay image (with the image | ||
734 | +``Contents-of-A-B-C.qcow2`` as its backing file), to which the contents | ||
735 | +of image [D] (from the source QEMU) will be mirrored:: | ||
736 | + | ||
737 | + $ qemu-img create -f qcow2 -b ./Contents-of-A-B-C.qcow2 \ | ||
738 | + -F qcow2 ./target-disk.qcow2 | ||
739 | + | ||
740 | +And start the destination QEMU instance (we already have the source | ||
741 | +QEMU running -- discussed in the section: | ||
742 | +`Interacting with a QEMU instance`_) with the following invocation. | ||
743 | +(As noted earlier, for simplicity's sake, the destination QEMU is | ||
744 | +started on the same host, but it could be located elsewhere):: | ||
745 | + | ||
746 | + $ ./x86_64-softmmu/qemu-system-x86_64 -display none -nodefconfig \ | ||
747 | + -M q35 -nodefaults -m 512 \ | ||
748 | + -blockdev node-name=node-TargetDisk,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./target-disk.qcow2 \ | ||
749 | + -device virtio-blk,drive=node-TargetDisk,id=virtio0 \ | ||
750 | + -S -monitor stdio -qmp unix:./qmp-sock2,server,nowait \ | ||
751 | + -incoming tcp:localhost:6666 | ||
752 | + | ||
753 | +Given the disk image chain on source QEMU:: | ||
754 | + | ||
755 | + [A] <-- [B] <-- [C] <-- [D] | ||
756 | + | ||
757 | +On the destination host, it is expected that the contents of the chain | ||
758 | +``[A] <-- [B] <-- [C]`` are *already* present, so we copy *only* | ||
759 | +the content of image [D]. | ||
760 | + | ||
761 | +(1) [On *destination* QEMU] As part of the first step, start the | ||
762 | + built-in NBD server on a given host (local host, represented by | ||
763 | + ``::``) and port:: | ||
764 | + | ||
765 | + (QEMU) nbd-server-start addr={"type":"inet","data":{"host":"::","port":"49153"}} | ||
766 | + { | ||
767 | + "execute": "nbd-server-start", | ||
768 | + "arguments": { | ||
769 | + "addr": { | ||
770 | + "data": { | ||
771 | + "host": "::", | ||
772 | + "port": "49153" | ||
773 | + }, | ||
774 | + "type": "inet" | ||
775 | + } | ||
776 | + } | ||
777 | + } | ||
778 | + | ||
779 | +(2) [On *destination* QEMU] And export the destination disk image using | ||
780 | + QEMU's built-in NBD server:: | ||
781 | + | ||
782 | + (QEMU) nbd-server-add device=node-TargetDisk writable=true | ||
783 | + { | ||
784 | + "execute": "nbd-server-add", | ||
785 | + "arguments": { | ||
786 | + "device": "node-TargetDisk" | ||
787 | + } | ||
788 | + } | ||
789 | + | ||
790 | +(3) [On *source* QEMU] Then, invoke ``drive-mirror`` with | ||
791 | + ``mode=existing`` (meaning: synchronize to a pre-created -- | ||
792 | + therefore 'existing' -- file on the target host) and with the | ||
793 | + synchronization mode set to 'top' | ||
794 | + (``"sync": "top"``):: | ||
795 | + | ||
796 | + (QEMU) drive-mirror device=node-D target=nbd:localhost:49153:exportname=node-TargetDisk sync=top mode=existing job-id=job0 | ||
797 | + { | ||
798 | + "execute": "drive-mirror", | ||
799 | + "arguments": { | ||
800 | + "device": "node-D", | ||
801 | + "mode": "existing", | ||
802 | + "job-id": "job0", | ||
803 | + "target": "nbd:localhost:49153:exportname=node-TargetDisk", | ||
804 | + "sync": "top" | ||
805 | + } | ||
806 | + } | ||
807 | + | ||
808 | +(4) [On *source* QEMU] Once ``drive-mirror`` has copied all the data | ||
809 | + and the event ``BLOCK_JOB_READY`` is emitted, issue | ||
810 | + ``block-job-cancel`` to gracefully end the synchronization:: | ||
811 | + | ||
812 | + (QEMU) block-job-cancel device=job0 | ||
813 | + { | ||
814 | + "execute": "block-job-cancel", | ||
815 | + "arguments": { | ||
816 | + "device": "job0" | ||
817 | + } | ||
818 | + } | ||
819 | + | ||
820 | +(5) [On *destination* QEMU] Then, stop the NBD server:: | ||
821 | + | ||
822 | + (QEMU) nbd-server-stop | ||
823 | + { | ||
824 | + "execute": "nbd-server-stop", | ||
825 | + "arguments": {} | ||
826 | + } | ||
827 | + | ||
828 | +(6) [On *destination* QEMU] Finally, resume the guest vCPUs by issuing the | ||
829 | + QMP command ``cont``:: | ||
830 | + | ||
831 | + (QEMU) cont | ||
832 | + { | ||
833 | + "execute": "cont", | ||
834 | + "arguments": {} | ||
835 | + } | ||
836 | + | ||
837 | +.. note:: | ||
838 | + Higher-level libraries (e.g. libvirt) automate the entire above | ||
839 | + process (although note that libvirt does not allow same-host | ||
840 | + migrations to localhost for other reasons). | ||
841 | + | ||
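 | +The above steps (1) to (5) can be condensed into a short script. The | ||
 | +following sketch assumes two QMP connections -- ``./qmp-sock`` on the | ||
 | +source and ``./qmp-sock2`` on the destination, as in the invocations | ||
 | +above -- and uses a hypothetical ``QMP`` wrapper class, a | ||
 | +per-connection variant of the helpers from the first sketch:: | ||
 | + | ||
 | +    import json, socket | ||
 | + | ||
 | +    class QMP: | ||
 | +        def __init__(self, path): | ||
 | +            self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) | ||
 | +            self.sock.connect(path) | ||
 | +            self.f = self.sock.makefile("r") | ||
 | +            json.loads(self.f.readline())    # greeting banner | ||
 | +            self.cmd("qmp_capabilities") | ||
 | + | ||
 | +        def cmd(self, name, args=None): | ||
 | +            msg = {"execute": name} | ||
 | +            if args is not None: | ||
 | +                msg["arguments"] = args | ||
 | +            self.sock.sendall((json.dumps(msg) + "\r\n").encode()) | ||
 | +            while True: | ||
 | +                resp = json.loads(self.f.readline()) | ||
 | +                if "return" in resp or "error" in resp: | ||
 | +                    return resp | ||
 | + | ||
 | +        def wait(self, event): | ||
 | +            while True: | ||
 | +                msg = json.loads(self.f.readline()) | ||
 | +                if msg.get("event") == event: | ||
 | +                    return msg | ||
 | + | ||
 | +    src, dst = QMP("./qmp-sock"), QMP("./qmp-sock2") | ||
 | + | ||
 | +    dst.cmd("nbd-server-start", | ||
 | +            {"addr": {"type": "inet", | ||
 | +                      "data": {"host": "::", "port": "49153"}}}) | ||
 | +    dst.cmd("nbd-server-add", | ||
 | +            {"device": "node-TargetDisk", "writable": True}) | ||
 | + | ||
 | +    src.cmd("drive-mirror", | ||
 | +            {"device": "node-D", "job-id": "job0", "sync": "top", | ||
 | +             "mode": "existing", | ||
 | +             "target": "nbd:localhost:49153:exportname=node-TargetDisk"}) | ||
 | +    src.wait("BLOCK_JOB_READY") | ||
 | +    src.cmd("block-job-cancel", {"device": "job0"}) | ||
 | +    src.wait("BLOCK_JOB_CANCELLED") | ||
 | + | ||
 | +    dst.cmd("nbd-server-stop") | ||
 | + | ||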
842 | + | ||
843 | +Notes on ``blockdev-mirror`` | ||
844 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
845 | + | ||
846 | +The ``blockdev-mirror`` command is equivalent in core functionality to | ||
847 | +``drive-mirror``, except that it operates at node-level in a BDS graph. | ||
848 | + | ||
849 | +Also: for ``blockdev-mirror``, the 'target' image needs to be | ||
850 | +explicitly created (using ``qemu-img``) and attached to live QEMU via | ||
851 | +``blockdev-add``, which assigns a name to the to-be-created target node. | ||
852 | + | ||
853 | +E.g. the sequence of actions to create a point-in-time backup of an | ||
854 | +entire disk image chain, to a target, using ``blockdev-mirror`` would be: | ||
855 | + | ||
856 | +(0) Create the QCOW2 overlays, to arrive at a backing chain of desired | ||
857 | + depth | ||
858 | + | ||
859 | +(1) Create the target image (using ``qemu-img``), say, ``e.qcow2`` | ||
860 | + | ||
861 | +(2) Attach the above created file (``e.qcow2``), run-time, using | ||
862 | + ``blockdev-add`` to QEMU | ||
863 | + | ||
864 | +(3) Perform ``blockdev-mirror`` (use ``"sync": "full"`` to copy the | ||
865 | + entire chain to the target). And notice the event | ||
866 | + ``BLOCK_JOB_READY`` | ||
867 | + | ||
868 | +(4) Optionally, query for active block jobs; there should be a 'mirror' | ||
869 | + job ready to be completed | ||
870 | + | ||
871 | +(5) Gracefully complete the 'mirror' block device job, and notice the | ||
872 | + event ``BLOCK_JOB_COMPLETED`` | ||
873 | + | ||
874 | +(6) Shut down the guest by issuing the QMP ``quit`` command so that | ||
875 | + caches are flushed | ||
876 | + | ||
877 | +(7) Then, finally, compare the contents of the disk image chain and | ||
878 | + the target copy with ``qemu-img compare``. You should notice: | ||
879 | + "Images are identical" | ||
880 | + | ||
881 | + | ||
882 | +QMP invocation for ``blockdev-mirror`` | ||
883 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
884 | + | ||
885 | +Given the disk image chain:: | ||
886 | + | ||
887 | + [A] <-- [B] <-- [C] <-- [D] | ||
888 | + | ||
889 | +To copy the contents of the entire disk image chain, from [A] all the | ||
890 | +way to [D], to a new target (call it [E]), the following is the flow. | ||
891 | + | ||
892 | +Create the overlay images, [B], [C], and [D]:: | ||
893 | + | ||
894 | + (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2 | ||
895 | + (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2 | ||
896 | + (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2 | ||
897 | + | ||
898 | +Create the target image, [E]:: | ||
899 | + | ||
900 | + $ qemu-img create -f qcow2 e.qcow2 39M | ||
901 | + | ||
902 | +Add the above created target image to QEMU, via ``blockdev-add``:: | ||
903 | + | ||
904 | + (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"} | ||
905 | + { | ||
906 | + "execute": "blockdev-add", | ||
907 | + "arguments": { | ||
908 | + "node-name": "node-E", | ||
909 | + "driver": "qcow2", | ||
910 | + "file": { | ||
911 | + "driver": "file", | ||
912 | + "filename": "e.qcow2" | ||
913 | + } | ||
914 | + } | ||
915 | + } | ||
916 | + | ||
917 | +Perform ``blockdev-mirror``, and notice the event ``BLOCK_JOB_READY``:: | ||
918 | + | ||
919 | + (QEMU) blockdev-mirror device=node-D target=node-E sync=full job-id=job0 | ||
920 | + { | ||
921 | + "execute": "blockdev-mirror", | ||
922 | + "arguments": { | ||
923 | + "device": "node-D", | ||
924 | + "job-id": "job0", | ||
925 | + "target": "node-E", | ||
926 | + "sync": "full" | ||
927 | + } | ||
928 | + } | ||
929 | + | ||
930 | +Query for active block jobs; there should be a 'mirror' job ready:: | ||
931 | + | ||
932 | + (QEMU) query-block-jobs | ||
933 | + { | ||
934 | + "execute": "query-block-jobs", | ||
935 | + "arguments": {} | ||
936 | + } | ||
937 | + { | ||
938 | + "return": [ | ||
939 | + { | ||
940 | + "busy": false, | ||
941 | + "type": "mirror", | ||
942 | + "len": 21561344, | ||
943 | + "paused": false, | ||
944 | + "ready": true, | ||
945 | + "io-status": "ok", | ||
946 | + "offset": 21561344, | ||
947 | + "device": "job0", | ||
948 | + "speed": 0 | ||
949 | + } | ||
950 | + ] | ||
951 | + } | ||
952 | + | ||
953 | +Gracefully complete the block device job operation, and notice the | ||
954 | +event ``BLOCK_JOB_COMPLETED``:: | ||
955 | + | ||
956 | + (QEMU) block-job-complete device=job0 | ||
957 | + { | ||
958 | + "execute": "block-job-complete", | ||
959 | + "arguments": { | ||
960 | + "device": "job0" | ||
961 | + } | ||
962 | + } | ||
963 | + { | ||
964 | + "return": {} | ||
965 | + } | ||
966 | + | ||
967 | +Shut down the guest by issuing the ``quit`` QMP command:: | ||
968 | + | ||
969 | + (QEMU) quit | ||
970 | + { | ||
971 | + "execute": "quit", | ||
972 | + "arguments": {} | ||
973 | + } | ||
974 | + | ||
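 | +Condensed with the illustrative ``qmp()``/``wait_event()`` helpers | ||
 | +from the first sketch, the above sequence becomes:: | ||
 | + | ||
 | +    qmp("blockdev-add", | ||
 | +        {"driver": "qcow2", "node-name": "node-E", | ||
 | +         "file": {"driver": "file", "filename": "e.qcow2"}}) | ||
 | +    qmp("blockdev-mirror", {"device": "node-D", "target": "node-E", | ||
 | +                            "sync": "full", "job-id": "job0"}) | ||
 | +    wait_event("BLOCK_JOB_READY") | ||
 | +    qmp("block-job-complete", {"device": "job0"}) | ||
 | +    wait_event("BLOCK_JOB_COMPLETED") | ||
 | +    qmp("quit")                      # flush caches and exit | ||
 | + | ||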
975 | + | ||
976 | +Live disk backup --- ``drive-backup`` and ``blockdev-backup`` | ||
977 | +------------------------------------------------------------- | ||
978 | + | ||
979 | +The ``drive-backup`` command (and its newer equivalent | ||
980 | +``blockdev-backup``) allows you to create a point-in-time backup copy. | ||
981 | + | ||
982 | +In this case, the point-in-time is when you *start* the ``drive-backup`` | ||
983 | +(or its newer equivalent ``blockdev-backup``) command. | ||
984 | + | ||
985 | + | ||
986 | +QMP invocation for ``drive-backup`` | ||
987 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
988 | + | ||
989 | +Yet again, starting afresh with our example disk image chain:: | ||
990 | + | ||
991 | + [A] <-- [B] <-- [C] <-- [D] | ||
992 | + | ||
993 | +To create a target image [E], with content populated from images [A] | ||
994 | +to [D] of the above chain, the following is the syntax. (If the target | ||
995 | +image does not exist, ``drive-backup`` will create it):: | ||
996 | + | ||
997 | + (QEMU) drive-backup device=node-D sync=full target=e.qcow2 job-id=job0 | ||
998 | + { | ||
999 | + "execute": "drive-backup", | ||
1000 | + "arguments": { | ||
1001 | + "device": "node-D", | ||
1002 | + "job-id": "job0", | ||
1003 | + "sync": "full", | ||
1004 | + "target": "e.qcow2" | ||
1005 | + } | ||
1006 | + } | ||
1007 | + | ||
1008 | +Once the above ``drive-backup`` has completed, a ``BLOCK_JOB_COMPLETED`` event | ||
1009 | +will be issued, indicating the live block device job operation has | ||
1010 | +completed, and no further action is required. | ||
1011 | + | ||
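 | +With the same illustrative helpers from the first sketch, a full | ||
 | +backup and the wait for its completion amount to:: | ||
 | + | ||
 | +    qmp("drive-backup", {"device": "node-D", "job-id": "job0", | ||
 | +                         "sync": "full", "target": "e.qcow2"}) | ||
 | +    wait_event("BLOCK_JOB_COMPLETED")   # point-in-time: job start | ||
 | + | ||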
1012 | + | ||
1013 | +Notes on ``blockdev-backup`` | ||
1014 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
1015 | + | ||
1016 | +The ``blockdev-backup`` command is equivalent in functionality to | ||
1017 | +``drive-backup``, except that it operates at node-level in a Block Driver | ||
1018 | +State (BDS) graph. | ||
1019 | + | ||
1020 | +E.g. the sequence of actions to create a point-in-time backup | ||
1021 | +of an entire disk image chain, to a target, using ``blockdev-backup`` | ||
1022 | +would be: | ||
1023 | + | ||
1024 | +(0) Create the QCOW2 overlays, to arrive at a backing chain of desired | ||
1025 | + depth | ||
1026 | + | ||
1027 | +(1) Create the target image (using ``qemu-img``), say, ``e.qcow2`` | ||
1028 | + | ||
1029 | +(2) Attach the above created file (``e.qcow2``), run-time, using | ||
1030 | + ``blockdev-add`` to QEMU | ||
1031 | + | ||
1032 | +(3) Perform ``blockdev-backup`` (use ``"sync": "full"`` to copy the | ||
1033 | + entire chain to the target). And notice the event | ||
1034 | + ``BLOCK_JOB_COMPLETED`` | ||
1035 | + | ||
1036 | +(4) Shut down the guest by issuing the QMP ``quit`` command, so that | ||
1037 | + caches are flushed | ||
1038 | + | ||
1039 | +(5) Then, finally, compare the contents of the disk image chain and | ||
1040 | + the target copy with ``qemu-img compare``. You should notice: | ||
1041 | + "Images are identical" | ||
1042 | + | ||
1043 | +The following section shows an example QMP invocation for | ||
1044 | +``blockdev-backup``. | ||
1045 | + | ||
1046 | +QMP invocation for ``blockdev-backup`` | ||
1047 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
1048 | + | ||
1049 | +Given a disk image chain of depth 1 where image [B] is the active | ||
1050 | +overlay (live QEMU is writing to it):: | ||
1051 | + | ||
1052 | + [A] <-- [B] | ||
1053 | + | ||
1054 | +The following is the procedure to copy the content from the entire chain | ||
1055 | +to a target image (say, [E]), which will then contain the full content | ||
1056 | +from [A] and [B]. | ||
1057 | + | ||
1058 | +Create the overlay [B]:: | ||
1059 | + | ||
1060 | + (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2 | ||
1061 | + { | ||
1062 | + "execute": "blockdev-snapshot-sync", | ||
1063 | + "arguments": { | ||
1064 | + "node-name": "node-A", | ||
1065 | + "snapshot-file": "b.qcow2", | ||
1066 | + "format": "qcow2", | ||
1067 | + "snapshot-node-name": "node-B" | ||
1068 | + } | ||
1069 | + } | ||
1070 | + | ||
1071 | + | ||
1072 | +Create a target image that will contain the copy:: | ||
1073 | + | ||
1074 | + $ qemu-img create -f qcow2 e.qcow2 39M | ||
1075 | + | ||
1076 | +Then add it to QEMU via ``blockdev-add``:: | ||
1077 | + | ||
1078 | + (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"} | ||
1079 | + { | ||
1080 | + "execute": "blockdev-add", | ||
1081 | + "arguments": { | ||
1082 | + "node-name": "node-E", | ||
1083 | + "driver": "qcow2", | ||
1084 | + "file": { | ||
1085 | + "driver": "file", | ||
1086 | + "filename": "e.qcow2" | ||
1087 | + } | ||
1088 | + } | ||
1089 | + } | ||
1090 | + | ||
1091 | +Then invoke ``blockdev-backup`` to copy the contents from the entire | ||
1092 | +image chain, consisting of images [A] and [B], to the target image | ||
1093 | +'e.qcow2':: | ||
1094 | + | ||
1095 | + (QEMU) blockdev-backup device=node-B target=node-E sync=full job-id=job0 | ||
1096 | + { | ||
1097 | + "execute": "blockdev-backup", | ||
1098 | + "arguments": { | ||
1099 | + "device": "node-B", | ||
1100 | + "job-id": "job0", | ||
1101 | + "target": "node-E", | ||
1102 | + "sync": "full" | ||
1103 | + } | ||
1104 | + } | ||
1105 | + | ||
1106 | +Once the above 'backup' operation has completed, the event | ||
1107 | +``BLOCK_JOB_COMPLETED`` will be emitted, signalling successful | ||
1108 | +completion. | ||
1109 | + | ||
1110 | +Next, query for any active block device jobs (there should be none):: | ||
1111 | + | ||
1112 | + (QEMU) query-block-jobs | ||
1113 | + { | ||
1114 | + "execute": "query-block-jobs", | ||
1115 | + "arguments": {} | ||
1116 | + } | ||
1117 | + | ||
1118 | +Shut down the guest:: | ||
1119 | + | ||
1120 | + (QEMU) quit | ||
1121 | + { | ||
1122 | + "execute": "quit", | ||
1123 | + "arguments": {} | ||
1124 | + } | ||
1125 | + "return": {} | ||
1126 | + } | ||
1127 | + | ||
1128 | +.. note:: | ||
1129 | + The above step is really important; if forgotten, an error, "Failed | ||
1130 | + to get shared "write" lock on e.qcow2", will be thrown when you do | ||
1131 | + ``qemu-img compare`` to verify the integrity of the disk image | ||
1132 | + with the backup content. | ||
1133 | + | ||
1134 | + | ||
1135 | +The end result will be the image 'e.qcow2' containing a | ||
1136 | +point-in-time backup of the disk image chain -- i.e. contents from | ||
1137 | +images [A] and [B] at the time the ``blockdev-backup`` command was | ||
1138 | +initiated. | ||
1139 | + | ||
1140 | +One way to confirm that the backup disk image contains the same | ||
1141 | +content as the disk image chain is to compare the backup with the | ||
1142 | +contents of the chain; you should see "Images are identical". (NB: | ||
1143 | +this assumes QEMU was launched with the ``-S`` option, which will not | ||
1144 | +guest boot up):: | ||
1145 | + | ||
1146 | + $ qemu-img compare b.qcow2 e.qcow2 | ||
1147 | + Warning: Image size mismatch! | ||
1148 | + Images are identical. | ||
1149 | + | ||
1150 | +NOTE: The "Warning: Image size mismatch!" is expected, as we created the | ||
1151 | +target image (e.qcow2) with a size of 39M. | ||
1152 | diff --git a/docs/live-block-ops.txt b/docs/live-block-ops.txt | ||
1153 | deleted file mode 100644 | ||
1154 | index XXXXXXX..XXXXXXX | ||
1155 | --- a/docs/live-block-ops.txt | ||
1156 | +++ /dev/null | ||
1157 | @@ -XXX,XX +XXX,XX @@ | ||
1158 | -LIVE BLOCK OPERATIONS | ||
1159 | -===================== | ||
1160 | - | 1501 | - |
1161 | -High level description of live block operations. Note these are not | 1502 | vm.launch() |
1162 | -supported for use with the raw format at the moment. | 1503 | blockdev_create(vm, { 'driver': 'ssh', |
1163 | - | 1504 | 'location': { |
1164 | -Note also that this document is incomplete and it currently only | 1505 | @@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \ |
1165 | -covers the 'stream' operation. Other operations supported by QEMU such | 1506 | 'host-key-check': { |
1166 | -as 'commit', 'mirror' and 'backup' are not described here yet. Please | 1507 | 'mode': 'hash', |
1167 | -refer to the qapi/block-core.json file for an overview of those. | 1508 | 'type': 'sha1', |
1168 | - | 1509 | - 'hash': sha1_key, |
1169 | -Snapshot live merge | 1510 | + 'hash': sha1_keys[matching_key], |
1170 | -=================== | 1511 | } |
1171 | - | 1512 | }, |
1172 | -Given a snapshot chain, described in this document in the following | 1513 | 'size': 4194304 }) |
1173 | -format: | 1514 | diff --git a/tests/qemu-iotests/207.out b/tests/qemu-iotests/207.out |
1174 | - | 1515 | index XXXXXXX..XXXXXXX 100644 |
1175 | -[A] <- [B] <- [C] <- [D] <- [E] | 1516 | --- a/tests/qemu-iotests/207.out |
1176 | - | 1517 | +++ b/tests/qemu-iotests/207.out |
1177 | -Where the rightmost object ([E] in the example) described is the current | 1518 | @@ -XXX,XX +XXX,XX @@ virtual size: 4 MiB (4194304 bytes) |
1178 | -image which the guest OS has write access to. To the left of it is its base | 1519 | |
1179 | -image, and so on accordingly until the leftmost image, which has no | 1520 | {"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": {"driver": "ssh", "location": {"host-key-check": {"mode": "none"}, "path": "/this/is/not/an/existing/path", "server": {"host": "127.0.0.1", "port": "22"}}, "size": 4194304}}} |
1180 | -base. | 1521 | {"return": {}} |
1181 | - | 1522 | -Job failed: failed to open remote file '/this/is/not/an/existing/path': Failed opening remote file (libssh2 error code: -31) |
1182 | -The snapshot live merge operation transforms such a chain into a | 1523 | +Job failed: failed to open remote file '/this/is/not/an/existing/path': SFTP server: No such file (libssh error code: 1, sftp error code: 2) |
1183 | -smaller one with fewer elements, such as this transformation relative | 1524 | {"execute": "job-dismiss", "arguments": {"id": "job0"}} |
1184 | -to the first example: | 1525 | {"return": {}} |
1185 | - | 1526 | |
1186 | -[A] <- [E] | ||
1187 | - | ||
1188 | -Data is copied in the right direction with destination being the | ||
1189 | -rightmost image, but any other intermediate image can be specified | ||
1190 | -instead. In this example data is copied from [C] into [D], so [D] can | ||
1191 | -be backed by [B]: | ||
1192 | - | ||
1193 | -[A] <- [B] <- [D] <- [E] | ||
1194 | - | ||
1195 | -The operation is implemented in QEMU through image streaming facilities. | ||
1196 | - | ||
1197 | -The basic idea is to execute 'block_stream virtio0' while the guest is | ||
1198 | -running. Progress can be monitored using 'info block-jobs'. When the | ||
1199 | -streaming operation completes it raises a QMP event. 'block_stream' | ||
1200 | -copies data from the backing file(s) into the active image. When finished, | ||
1201 | -it adjusts the backing file pointer. | ||
1202 | - | ||
1203 | -The 'base' parameter specifies an image which data need not be | ||
1204 | -streamed from. This image will be used as the backing file for the | ||
1205 | -destination image when the operation is finished. | ||
1206 | - | ||
1207 | -In the first example above, the command would be: | ||
1208 | - | ||
1209 | -(qemu) block_stream virtio0 file-A.img | ||
1210 | - | ||
1211 | -In order to specify a destination image different from the active | ||
1212 | -(rightmost) one we can use its node name instead. | ||
1213 | - | ||
1214 | -In the second example above, the command would be: | ||
1215 | - | ||
1216 | -(qemu) block_stream node-D file-B.img | ||
1217 | - | ||
1218 | -Live block copy | ||
1219 | -=============== | ||
1220 | - | ||
1221 | -To copy an in use image to another destination in the filesystem, one | ||
1222 | -should create a live snapshot in the desired destination, then stream | ||
1223 | -into that image. Example: | ||
1224 | - | ||
1225 | -(qemu) snapshot_blkdev ide0-hd0 /new-path/disk.img qcow2 | ||
1226 | - | ||
1227 | -(qemu) block_stream ide0-hd0 | ||
1228 | - | ||
1229 | - | ||
1230 | -- | 1527 | -- |
1231 | 2.9.4 | 1528 | 2.21.0 |
1232 | 1529 | ||
1233 | 1530 | diff view generated by jsdifflib |
1 | From: Kashyap Chamarthy <kchamart@redhat.com> | 1 | Tests should place their files into the test directory. This includes |
---|---|---|---|
2 | Unix sockets. 205 currently fails to do so, which prevents it from | ||
3 | being run concurrently. | ||
2 | 4 | ||
3 | This is part of the on-going effort to convert QEMU upstream | 5 | Signed-off-by: Max Reitz <mreitz@redhat.com> |
4 | documentation syntax to reStructuredText (rST). | 6 | Message-id: 20190618210238.9524-1-mreitz@redhat.com |
7 | Reviewed-by: Eric Blake <eblake@redhat.com> | ||
8 | Signed-off-by: Max Reitz <mreitz@redhat.com> | ||
9 | --- | ||
10 | tests/qemu-iotests/205 | 2 +- | ||
11 | 1 file changed, 1 insertion(+), 1 deletion(-) | ||
5 | 12 | ||
6 | The conversion to rST was done using: | 13 | diff --git a/tests/qemu-iotests/205 b/tests/qemu-iotests/205 |
7 | 14 | index XXXXXXX..XXXXXXX 100755 | |
8 | $ pandoc -f markdown -t rst bitmaps.md -o bitmaps.rst | 15 | --- a/tests/qemu-iotests/205 |
9 | 16 | +++ b/tests/qemu-iotests/205 | |
10 | Then, make a couple of small syntactical adjustments. While at it, | 17 | @@ -XXX,XX +XXX,XX @@ import iotests |
11 | reword a statement to avoid ambiguity. Addressing the feedback from | 18 | import time |
12 | this thread: | 19 | from iotests import qemu_img_create, qemu_io, filter_qemu_io, QemuIoInteractive |
13 | 20 | ||
14 | https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg05428.html | 21 | -nbd_sock = 'nbd_sock' |
15 | 22 | +nbd_sock = os.path.join(iotests.test_dir, 'nbd_sock') | |
16 | Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com> | 23 | nbd_uri = 'nbd+unix:///exp?socket=' + nbd_sock |
17 | Reviewed-by: John Snow <jsnow@redhat.com> | 24 | disk = os.path.join(iotests.test_dir, 'disk') |
18 | Reviewed-by: Eric Blake <eblake@redhat.com> | 25 | |
19 | Message-id: 20170717105205.32639-2-kchamart@redhat.com | ||
20 | Signed-off-by: Jeff Cody <jcody@redhat.com> | ||
21 | --- | ||
22 | docs/devel/bitmaps.md | 505 ------------------------------------------ | ||
23 | docs/interop/bitmaps.rst | 555 +++++++++++++++++++++++++++++++++++++++++++++++ | ||
24 | 2 files changed, 555 insertions(+), 505 deletions(-) | ||
25 | delete mode 100644 docs/devel/bitmaps.md | ||
26 | create mode 100644 docs/interop/bitmaps.rst | ||
27 | |||
28 | diff --git a/docs/devel/bitmaps.md b/docs/devel/bitmaps.md | ||
29 | deleted file mode 100644 | ||
30 | index XXXXXXX..XXXXXXX | ||
31 | --- a/docs/devel/bitmaps.md | ||
32 | +++ /dev/null | ||
33 | @@ -XXX,XX +XXX,XX @@ | ||
34 | -<!-- | ||
35 | -Copyright 2015 John Snow <jsnow@redhat.com> and Red Hat, Inc. | ||
36 | -All rights reserved. | ||
37 | - | ||
38 | -This file is licensed via The FreeBSD Documentation License, the full text of | ||
39 | -which is included at the end of this document. | ||
40 | ---> | ||
41 | - | ||
42 | -# Dirty Bitmaps and Incremental Backup | ||
43 | - | ||
44 | -* Dirty Bitmaps are objects that track which data needs to be backed up for the | ||
45 | - next incremental backup. | ||
46 | - | ||
47 | -* Dirty bitmaps can be created at any time and attached to any node | ||
48 | - (not just complete drives.) | ||
49 | - | ||
50 | -## Dirty Bitmap Names | ||
51 | - | ||
52 | -* A dirty bitmap's name is unique to the node, but bitmaps attached to different | ||
53 | - nodes can share the same name. | ||
54 | - | ||
55 | -* Dirty bitmaps created for internal use by QEMU may be anonymous and have no | ||
56 | - name, but any user-created bitmaps may not be. There can be any number of | ||
57 | - anonymous bitmaps per node. | ||
58 | - | ||
59 | -* The name of a user-created bitmap must not be empty (""). | ||
60 | - | ||
61 | -## Bitmap Modes | ||
62 | - | ||
63 | -* A Bitmap can be "frozen," which means that it is currently in-use by a backup | ||
64 | - operation and cannot be deleted, renamed, written to, reset, | ||
65 | - etc. | ||
66 | - | ||
67 | -* The normal operating mode for a bitmap is "active." | ||
68 | - | ||
69 | -## Basic QMP Usage | ||
70 | - | ||
71 | -### Supported Commands ### | ||
72 | - | ||
73 | -* block-dirty-bitmap-add | ||
74 | -* block-dirty-bitmap-remove | ||
75 | -* block-dirty-bitmap-clear | ||
76 | - | ||
77 | -### Creation | ||
78 | - | ||
79 | -* To create a new bitmap, enabled, on the drive with id=drive0: | ||
80 | - | ||
81 | -```json | ||
82 | -{ "execute": "block-dirty-bitmap-add", | ||
83 | - "arguments": { | ||
84 | - "node": "drive0", | ||
85 | - "name": "bitmap0" | ||
86 | - } | ||
87 | -} | ||
88 | -``` | ||
89 | - | ||
90 | -* This bitmap will have a default granularity that matches the cluster size of | ||
91 | - its associated drive, if available, clamped to between [4KiB, 64KiB]. | ||
92 | - The current default for qcow2 is 64KiB. | ||
93 | - | ||
94 | -* To create a new bitmap that tracks changes in 32KiB segments: | ||
95 | - | ||
96 | -```json | ||
97 | -{ "execute": "block-dirty-bitmap-add", | ||
98 | - "arguments": { | ||
99 | - "node": "drive0", | ||
100 | - "name": "bitmap0", | ||
101 | - "granularity": 32768 | ||
102 | - } | ||
103 | -} | ||
104 | -``` | ||
105 | - | ||
106 | -### Deletion | ||
107 | - | ||
108 | -* Bitmaps that are frozen cannot be deleted. | ||
109 | - | ||
110 | -* Deleting the bitmap does not impact any other bitmaps attached to the same | ||
111 | - node, nor does it affect any backups already created from this node. | ||
112 | - | ||
113 | -* Because bitmaps are only unique to the node to which they are attached, | ||
114 | - you must specify the node/drive name here, too. | ||
115 | - | ||
116 | -```json | ||
117 | -{ "execute": "block-dirty-bitmap-remove", | ||
118 | - "arguments": { | ||
119 | - "node": "drive0", | ||
120 | - "name": "bitmap0" | ||
121 | - } | ||
122 | -} | ||
123 | -``` | ||
124 | - | ||
125 | -### Resetting | ||
126 | - | ||
127 | -* Resetting a bitmap will clear all information it holds. | ||
128 | - | ||
129 | -* An incremental backup created from an empty bitmap will copy no data, | ||
130 | - as if nothing has changed. | ||
131 | - | ||
132 | -```json | ||
133 | -{ "execute": "block-dirty-bitmap-clear", | ||
134 | - "arguments": { | ||
135 | - "node": "drive0", | ||
136 | - "name": "bitmap0" | ||
137 | - } | ||
138 | -} | ||
139 | -``` | ||
140 | - | ||
141 | -## Transactions | ||
142 | - | ||
143 | -### Justification | ||
144 | - | ||
145 | -Bitmaps can be safely modified when the VM is paused or halted by using | ||
146 | -the basic QMP commands. For instance, you might perform the following actions: | ||
147 | - | ||
148 | -1. Boot the VM in a paused state. | ||
149 | -2. Create a full drive backup of drive0. | ||
150 | -3. Create a new bitmap attached to drive0. | ||
151 | -4. Resume execution of the VM. | ||
152 | -5. Incremental backups are ready to be created. | ||
153 | - | ||
154 | -At this point, the bitmap and drive backup would be correctly in sync, | ||
155 | -and incremental backups made from this point forward would be correctly aligned | ||
156 | -to the full drive backup. | ||
157 | - | ||
158 | -This is not particularly useful if we decide we want to start incremental | ||
159 | -backups after the VM has been running for a while, for which we will need to | ||
160 | -perform actions such as the following: | ||
161 | - | ||
162 | -1. Boot the VM and begin execution. | ||
163 | -2. Using a single transaction, perform the following operations: | ||
164 | - * Create bitmap0. | ||
165 | - * Create a full drive backup of drive0. | ||
166 | -3. Incremental backups are now ready to be created. | ||
167 | - | ||
168 | -### Supported Bitmap Transactions | ||
169 | - | ||
170 | -* block-dirty-bitmap-add | ||
171 | -* block-dirty-bitmap-clear | ||
172 | - | ||
173 | -The usages are identical to their respective QMP commands, but see below | ||
174 | -for examples. | ||
175 | - | ||
176 | -### Example: New Incremental Backup | ||
177 | - | ||
178 | -As outlined in the justification, perhaps we want to create a new incremental | ||
179 | -backup chain attached to a drive. | ||
180 | - | ||
181 | -```json | ||
182 | -{ "execute": "transaction", | ||
183 | - "arguments": { | ||
184 | - "actions": [ | ||
185 | - {"type": "block-dirty-bitmap-add", | ||
186 | - "data": {"node": "drive0", "name": "bitmap0"} }, | ||
187 | - {"type": "drive-backup", | ||
188 | - "data": {"device": "drive0", "target": "/path/to/full_backup.img", | ||
189 | - "sync": "full", "format": "qcow2"} } | ||
190 | - ] | ||
191 | - } | ||
192 | -} | ||
193 | -``` | ||
194 | - | ||
195 | -### Example: New Incremental Backup Anchor Point | ||
196 | - | ||
197 | -Maybe we just want to create a new full backup with an existing bitmap and | ||
198 | -want to reset the bitmap to track the new chain. | ||
199 | - | ||
200 | -```json | ||
201 | -{ "execute": "transaction", | ||
202 | - "arguments": { | ||
203 | - "actions": [ | ||
204 | - {"type": "block-dirty-bitmap-clear", | ||
205 | - "data": {"node": "drive0", "name": "bitmap0"} }, | ||
206 | - {"type": "drive-backup", | ||
207 | - "data": {"device": "drive0", "target": "/path/to/new_full_backup.img", | ||
208 | - "sync": "full", "format": "qcow2"} } | ||
209 | - ] | ||
210 | - } | ||
211 | -} | ||
212 | -``` | ||
213 | - | ||
214 | -## Incremental Backups | ||
215 | - | ||
216 | -The star of the show. | ||
217 | - | ||
218 | -**Nota Bene!** Only incremental backups of entire drives are supported for now. | ||
219 | -So despite the fact that you can attach a bitmap to any arbitrary node, they are | ||
220 | -only currently useful when attached to the root node. This is because | ||
221 | -drive-backup only supports drives/devices instead of arbitrary nodes. | ||
222 | - | ||
223 | -### Example: First Incremental Backup | ||
224 | - | ||
225 | -1. Create a full backup and sync it to the dirty bitmap, as in the transactional | ||
226 | -examples above; or with the VM offline, manually create a full copy and then | ||
227 | -create a new bitmap before the VM begins execution. | ||
228 | - | ||
229 | - * Let's assume the full backup is named 'full_backup.img'. | ||
230 | - * Let's assume the bitmap you created is 'bitmap0' attached to 'drive0'. | ||
231 | - | ||
232 | -2. Create a destination image for the incremental backup that utilizes the | ||
233 | -full backup as a backing image. | ||
234 | - | ||
235 | - * Let's assume it is named 'incremental.0.img'. | ||
236 | - | ||
237 | - ```sh | ||
238 | - # qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2 | ||
239 | - ``` | ||
240 | - | ||
241 | -3. Issue the incremental backup command: | ||
242 | - | ||
243 | - ```json | ||
244 | - { "execute": "drive-backup", | ||
245 | - "arguments": { | ||
246 | - "device": "drive0", | ||
247 | - "bitmap": "bitmap0", | ||
248 | - "target": "incremental.0.img", | ||
249 | - "format": "qcow2", | ||
250 | - "sync": "incremental", | ||
251 | - "mode": "existing" | ||
252 | - } | ||
253 | - } | ||
254 | - ``` | ||
255 | - | ||
256 | -### Example: Second Incremental Backup | ||
257 | - | ||
258 | -1. Create a new destination image for the incremental backup that points to the | ||
259 | - previous one, e.g.: 'incremental.1.img' | ||
260 | - | ||
261 | - ```sh | ||
262 | - # qemu-img create -f qcow2 incremental.1.img -b incremental.0.img -F qcow2 | ||
263 | - ``` | ||
264 | - | ||
265 | -2. Issue a new incremental backup command. The only difference here is that we | ||
266 | - have changed the target image below. | ||
267 | - | ||
268 | - ```json | ||
269 | - { "execute": "drive-backup", | ||
270 | - "arguments": { | ||
271 | - "device": "drive0", | ||
272 | - "bitmap": "bitmap0", | ||
273 | - "target": "incremental.1.img", | ||
274 | - "format": "qcow2", | ||
275 | - "sync": "incremental", | ||
276 | - "mode": "existing" | ||
277 | - } | ||
278 | - } | ||
279 | - ``` | ||
280 | - | ||
281 | -## Errors | ||
282 | - | ||
283 | -* In the event of an error that occurs after a backup job is successfully | ||
284 | - launched, either by a direct QMP command or a QMP transaction, the user | ||
285 | - will receive a BLOCK_JOB_COMPLETE event with a failure message, accompanied | ||
286 | - by a BLOCK_JOB_ERROR event. | ||
287 | - | ||
288 | -* In the case of an event being cancelled, the user will receive a | ||
289 | - BLOCK_JOB_CANCELLED event instead of a pair of COMPLETE and ERROR events. | ||
290 | - | ||
291 | -* In either case, the incremental backup data contained within the bitmap is | ||
292 | - safely rolled back, and the data within the bitmap is not lost. The image | ||
293 | - file created for the failed attempt can be safely deleted. | ||
294 | - | ||
295 | -* Once the underlying problem is fixed (e.g. more storage space is freed up), | ||
296 | - you can simply retry the incremental backup command with the same bitmap. | ||
297 | - | ||
298 | -### Example | ||
299 | - | ||
300 | -1. Create a target image: | ||
301 | - | ||
302 | - ```sh | ||
303 | - # qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2 | ||
304 | - ``` | ||
305 | - | ||
306 | -2. Attempt to create an incremental backup via QMP: | ||
307 | - | ||
308 | - ```json | ||
309 | - { "execute": "drive-backup", | ||
310 | - "arguments": { | ||
311 | - "device": "drive0", | ||
312 | - "bitmap": "bitmap0", | ||
313 | - "target": "incremental.0.img", | ||
314 | - "format": "qcow2", | ||
315 | - "sync": "incremental", | ||
316 | - "mode": "existing" | ||
317 | - } | ||
318 | - } | ||
319 | - ``` | ||
320 | - | ||
321 | -3. Receive an event notifying us of failure: | ||
322 | - | ||
323 | - ```json | ||
324 | - { "timestamp": { "seconds": 1424709442, "microseconds": 844524 }, | ||
325 | - "data": { "speed": 0, "offset": 0, "len": 67108864, | ||
326 | - "error": "No space left on device", | ||
327 | - "device": "drive1", "type": "backup" }, | ||
328 | - "event": "BLOCK_JOB_COMPLETED" } | ||
329 | - ``` | ||
330 | - | ||
331 | -4. Delete the failed incremental, and re-create the image. | ||
332 | - | ||
333 | - ```sh | ||
334 | - # rm incremental.0.img | ||
335 | - # qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2 | ||
336 | - ``` | ||
337 | - | ||
338 | -5. Retry the command after fixing the underlying problem, | ||
339 | - such as freeing up space on the backup volume: | ||
340 | - | ||
341 | - ```json | ||
342 | - { "execute": "drive-backup", | ||
343 | - "arguments": { | ||
344 | - "device": "drive0", | ||
345 | - "bitmap": "bitmap0", | ||
346 | - "target": "incremental.0.img", | ||
347 | - "format": "qcow2", | ||
348 | - "sync": "incremental", | ||
349 | - "mode": "existing" | ||
350 | - } | ||
351 | - } | ||
352 | - ``` | ||
353 | - | ||
354 | -6. Receive confirmation that the job completed successfully: | ||
355 | - | ||
356 | - ```json | ||
357 | - { "timestamp": { "seconds": 1424709668, "microseconds": 526525 }, | ||
358 | - "data": { "device": "drive1", "type": "backup", | ||
359 | - "speed": 0, "len": 67108864, "offset": 67108864}, | ||
360 | - "event": "BLOCK_JOB_COMPLETED" } | ||
361 | - ``` | ||
362 | - | ||
363 | -### Partial Transactional Failures | ||
364 | - | ||
365 | -* Sometimes, a transaction will succeed in launching and return success, | ||
366 | - but then later the backup jobs themselves may fail. It is possible that | ||
367 | - a management application may have to deal with a partial backup failure | ||
368 | - after a successful transaction. | ||
369 | - | ||
370 | -* If multiple backup jobs are specified in a single transaction, when one of | ||
371 | - them fails, it will not interact with the other backup jobs in any way. | ||
372 | - | ||
373 | -* The job(s) that succeeded will clear the dirty bitmap associated with the | ||
374 | - operation, but the job(s) that failed will not. It is not "safe" to delete | ||
375 | - any incremental backups that were created successfully in this scenario, | ||
376 | - even though others failed. | ||
377 | - | ||
378 | -#### Example | ||
379 | - | ||
380 | -* QMP example highlighting two backup jobs: | ||
381 | - | ||
382 | - ```json | ||
383 | - { "execute": "transaction", | ||
384 | - "arguments": { | ||
385 | - "actions": [ | ||
386 | - { "type": "drive-backup", | ||
387 | - "data": { "device": "drive0", "bitmap": "bitmap0", | ||
388 | - "format": "qcow2", "mode": "existing", | ||
389 | - "sync": "incremental", "target": "d0-incr-1.qcow2" } }, | ||
390 | - { "type": "drive-backup", | ||
391 | - "data": { "device": "drive1", "bitmap": "bitmap1", | ||
392 | - "format": "qcow2", "mode": "existing", | ||
393 | - "sync": "incremental", "target": "d1-incr-1.qcow2" } }, | ||
394 | - ] | ||
395 | - } | ||
396 | - } | ||
397 | - ``` | ||
398 | - | ||
399 | -* QMP example response, highlighting one success and one failure: | ||
400 | - * Acknowledgement that the Transaction was accepted and jobs were launched: | ||
401 | - ```json | ||
402 | - { "return": {} } | ||
403 | - ``` | ||
404 | - | ||
405 | - * Later, QEMU sends notice that the first job was completed: | ||
406 | - ```json | ||
407 | - { "timestamp": { "seconds": 1447192343, "microseconds": 615698 }, | ||
408 | - "data": { "device": "drive0", "type": "backup", | ||
409 | - "speed": 0, "len": 67108864, "offset": 67108864 }, | ||
410 | - "event": "BLOCK_JOB_COMPLETED" | ||
411 | - } | ||
412 | - ``` | ||
413 | - | ||
414 | - * Later yet, QEMU sends notice that the second job has failed: | ||
415 | - ```json | ||
416 | - { "timestamp": { "seconds": 1447192399, "microseconds": 683015 }, | ||
417 | - "data": { "device": "drive1", "action": "report", | ||
418 | - "operation": "read" }, | ||
419 | - "event": "BLOCK_JOB_ERROR" } | ||
420 | - ``` | ||
421 | - | ||
422 | - ```json | ||
423 | - { "timestamp": { "seconds": 1447192399, "microseconds": 685853 }, | ||
424 | - "data": { "speed": 0, "offset": 0, "len": 67108864, | ||
425 | - "error": "Input/output error", | ||
426 | - "device": "drive1", "type": "backup" }, | ||
427 | - "event": "BLOCK_JOB_COMPLETED" } | ||
428 | - | ||
429 | -* In the above example, "d0-incr-1.qcow2" is valid and must be kept, | ||
430 | - but "d1-incr-1.qcow2" is invalid and should be deleted. If a VM-wide | ||
431 | - incremental backup of all drives at a point-in-time is to be made, | ||
432 | - new backups for both drives will need to be made, taking into account | ||
433 | - that a new incremental backup for drive0 needs to be based on top of | ||
434 | - "d0-incr-1.qcow2." | ||
435 | - | ||
436 | -### Grouped Completion Mode | ||
437 | - | ||
438 | -* While jobs launched by transactions normally complete or fail on their own, | ||
439 | - it is possible to instruct them to complete or fail together as a group. | ||
440 | - | ||
441 | -* QMP transactions take an optional properties structure that can affect | ||
442 | - the semantics of the transaction. | ||
443 | - | ||
444 | -* The "completion-mode" transaction property can be either "individual" | ||
445 | - which is the default, legacy behavior described above, or "grouped," | ||
446 | - a new behavior detailed below. | ||
447 | - | ||
448 | -* Delayed Completion: In grouped completion mode, no jobs will report | ||
449 | - success until all jobs are ready to report success. | ||
450 | - | ||
451 | -* Grouped failure: If any job fails in grouped completion mode, all remaining | ||
452 | - jobs will be cancelled. Any incremental backups will restore their dirty | ||
453 | - bitmap objects as if no backup command was ever issued. | ||
454 | - | ||
455 | - * Regardless of whether QEMU reports a particular incremental backup job as | ||
456 | - CANCELLED or as an ERROR, the in-memory bitmap will be restored. | ||
457 | - | ||
458 | -#### Example | ||
459 | - | ||
460 | -* Here's the same example scenario from above with the new property: | ||
461 | - | ||
462 | - ```json | ||
463 | - { "execute": "transaction", | ||
464 | - "arguments": { | ||
465 | - "actions": [ | ||
466 | - { "type": "drive-backup", | ||
467 | - "data": { "device": "drive0", "bitmap": "bitmap0", | ||
468 | - "format": "qcow2", "mode": "existing", | ||
469 | - "sync": "incremental", "target": "d0-incr-1.qcow2" } }, | ||
470 | - { "type": "drive-backup", | ||
471 | - "data": { "device": "drive1", "bitmap": "bitmap1", | ||
472 | - "format": "qcow2", "mode": "existing", | ||
473 | - "sync": "incremental", "target": "d1-incr-1.qcow2" } }, | ||
474 | - ], | ||
475 | - "properties": { | ||
476 | - "completion-mode": "grouped" | ||
477 | - } | ||
478 | - } | ||
479 | - } | ||
480 | - ``` | ||
481 | - | ||
482 | -* QMP example response, highlighting a failure for drive1: | ||
483 | - * Acknowledgement that the Transaction was accepted and jobs were launched: | ||
484 | - ```json | ||
485 | - { "return": {} } | ||
486 | - ``` | ||
487 | - | ||
488 | - * Later, QEMU sends notice that the second job has errored out, | ||
489 | - but that the first job was also cancelled: | ||
490 | - ```json | ||
491 | - { "timestamp": { "seconds": 1447193702, "microseconds": 632377 }, | ||
492 | - "data": { "device": "drive1", "action": "report", | ||
493 | - "operation": "read" }, | ||
494 | - "event": "BLOCK_JOB_ERROR" } | ||
495 | - ``` | ||
496 | - | ||
497 | - ```json | ||
498 | - { "timestamp": { "seconds": 1447193702, "microseconds": 640074 }, | ||
499 | - "data": { "speed": 0, "offset": 0, "len": 67108864, | ||
500 | - "error": "Input/output error", | ||
501 | - "device": "drive1", "type": "backup" }, | ||
502 | - "event": "BLOCK_JOB_COMPLETED" } | ||
503 | - ``` | ||
504 | - | ||
505 | - ```json | ||
506 | - { "timestamp": { "seconds": 1447193702, "microseconds": 640163 }, | ||
507 | - "data": { "device": "drive0", "type": "backup", "speed": 0, | ||
508 | - "len": 67108864, "offset": 16777216 }, | ||
509 | - "event": "BLOCK_JOB_CANCELLED" } | ||
510 | - ``` | ||
511 | - | ||
512 | -<!-- | ||
513 | -The FreeBSD Documentation License | ||
514 | - | ||
515 | -Redistribution and use in source (Markdown) and 'compiled' forms (SGML, HTML, | ||
516 | -PDF, PostScript, RTF and so forth) with or without modification, are permitted | ||
517 | -provided that the following conditions are met: | ||
518 | - | ||
519 | -Redistributions of source code (Markdown) must retain the above copyright | ||
520 | -notice, this list of conditions and the following disclaimer of this file | ||
521 | -unmodified. | ||
522 | - | ||
523 | -Redistributions in compiled form (transformed to other DTDs, converted to PDF, | ||
524 | -PostScript, RTF and other formats) must reproduce the above copyright notice, | ||
525 | -this list of conditions and the following disclaimer in the documentation and/or | ||
526 | -other materials provided with the distribution. | ||
527 | - | ||
528 | -THIS DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
529 | -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
530 | -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
531 | -DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | ||
532 | -FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
533 | -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | ||
534 | -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | ||
535 | -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | ||
536 | -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF | ||
537 | -THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
538 | ---> | ||
539 | diff --git a/docs/interop/bitmaps.rst b/docs/interop/bitmaps.rst | ||
540 | new file mode 100644 | ||
541 | index XXXXXXX..XXXXXXX | ||
542 | --- /dev/null | ||
543 | +++ b/docs/interop/bitmaps.rst | ||
544 | @@ -XXX,XX +XXX,XX @@ | ||
545 | +.. | ||
546 | + Copyright 2015 John Snow <jsnow@redhat.com> and Red Hat, Inc. | ||
547 | + All rights reserved. | ||
548 | + | ||
549 | + This file is licensed via The FreeBSD Documentation License, the full | ||
550 | + text of which is included at the end of this document. | ||
551 | + | ||
552 | +==================================== | ||
553 | +Dirty Bitmaps and Incremental Backup | ||
554 | +==================================== | ||
555 | + | ||
556 | +- Dirty Bitmaps are objects that track which data needs to be backed up | ||
557 | + for the next incremental backup. | ||
558 | + | ||
559 | +- Dirty bitmaps can be created at any time and attached to any node | ||
560 | + (not just complete drives). | ||
561 | + | ||
562 | +.. contents:: | ||
563 | + | ||
564 | +Dirty Bitmap Names | ||
565 | +------------------ | ||
566 | + | ||
567 | +- A dirty bitmap's name is unique to the node, but bitmaps attached to | ||
568 | + different nodes can share the same name (see the sketch below this list). | ||
569 | + | ||
570 | +- Dirty bitmaps created for internal use by QEMU may be anonymous and | ||
571 | + have no name, but any user-created bitmaps must have a name. There | ||
572 | + can be any number of anonymous bitmaps per node. | ||
573 | + | ||
574 | +- The name of a user-created bitmap must not be empty (""). | ||
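 | + | ||
 | +For example, the same name may be used on two different nodes. A | ||
 | +minimal sketch, assuming a second node named ``drive1`` exists: | ||
 | + | ||
 | +.. code:: json | ||
 | + | ||
 | + { "execute": "block-dirty-bitmap-add", | ||
 | + "arguments": { "node": "drive0", "name": "bitmap0" } } | ||
 | + | ||
 | + { "execute": "block-dirty-bitmap-add", | ||
 | + "arguments": { "node": "drive1", "name": "bitmap0" } } | ||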
575 | + | ||
576 | +Bitmap Modes | ||
577 | +------------ | ||
578 | + | ||
579 | +- A bitmap can be "frozen," which means that it is currently in use by | ||
580 | + a backup operation and cannot be deleted, renamed, written to, reset, | ||
581 | + etc. | ||
582 | + | ||
583 | +- The normal operating mode for a bitmap is "active." A bitmap's | ||
 | + current mode can be inspected as sketched below. | ||
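 | + | ||
 | +Dirty bitmaps and their modes are visible in the output of | ||
 | +``query-block``, which lists the bitmaps attached to each device. An | ||
 | +abbreviated, illustrative response might look like this: | ||
 | + | ||
 | +.. code:: json | ||
 | + | ||
 | + { "execute": "query-block" } | ||
 | + | ||
 | + { "return": [ | ||
 | + { "device": "drive0", | ||
 | + "dirty-bitmaps": [ | ||
 | + { "name": "bitmap0", | ||
 | + "status": "active", | ||
 | + "count": 0, | ||
 | + "granularity": 65536 } ] } ] } | ||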
584 | + | ||
585 | +Basic QMP Usage | ||
586 | +--------------- | ||
587 | + | ||
588 | +Supported Commands | ||
589 | +~~~~~~~~~~~~~~~~~~ | ||
590 | + | ||
591 | +- ``block-dirty-bitmap-add`` | ||
592 | +- ``block-dirty-bitmap-remove`` | ||
593 | +- ``block-dirty-bitmap-clear`` | ||
594 | + | ||
595 | +Creation | ||
596 | +~~~~~~~~ | ||
597 | + | ||
598 | +- To create a new, enabled bitmap on the drive with id=drive0: | ||
599 | + | ||
600 | +.. code:: json | ||
601 | + | ||
602 | + { "execute": "block-dirty-bitmap-add", | ||
603 | + "arguments": { | ||
604 | + "node": "drive0", | ||
605 | + "name": "bitmap0" | ||
606 | + } | ||
607 | + } | ||
608 | + | ||
609 | +- This bitmap will have a default granularity that matches the cluster | ||
610 | + size of its associated drive, if available, clamped to the range | ||
611 | + [4KiB, 64KiB]. The current default for qcow2 is 64KiB. | ||
612 | + | ||
613 | +- To create a new bitmap that tracks changes in 32KiB segments: | ||
614 | + | ||
615 | +.. code:: json | ||
616 | + | ||
617 | + { "execute": "block-dirty-bitmap-add", | ||
618 | + "arguments": { | ||
619 | + "node": "drive0", | ||
620 | + "name": "bitmap0", | ||
621 | + "granularity": 32768 | ||
622 | + } | ||
623 | + } | ||
624 | + | ||
625 | +Deletion | ||
626 | +~~~~~~~~ | ||
627 | + | ||
628 | +- Bitmaps that are frozen cannot be deleted. | ||
629 | + | ||
630 | +- Deleting the bitmap does not impact any other bitmaps attached to the | ||
631 | + same node, nor does it affect any backups already created from this | ||
632 | + node. | ||
633 | + | ||
634 | +- Because bitmap names are only unique to the node to which they are | ||
635 | + attached, you must specify the node/drive name here, too. | ||
636 | + | ||
637 | +.. code:: json | ||
638 | + | ||
639 | + { "execute": "block-dirty-bitmap-remove", | ||
640 | + "arguments": { | ||
641 | + "node": "drive0", | ||
642 | + "name": "bitmap0" | ||
643 | + } | ||
644 | + } | ||
645 | + | ||
646 | +Resetting | ||
647 | +~~~~~~~~~ | ||
648 | + | ||
649 | +- Resetting a bitmap will clear all information it holds. | ||
650 | + | ||
651 | +- An incremental backup created from an empty bitmap will copy no data, | ||
652 | + as if nothing has changed. | ||
653 | + | ||
654 | +.. code:: json | ||
655 | + | ||
656 | + { "execute": "block-dirty-bitmap-clear", | ||
657 | + "arguments": { | ||
658 | + "node": "drive0", | ||
659 | + "name": "bitmap0" | ||
660 | + } | ||
661 | + } | ||
662 | + | ||
663 | +Transactions | ||
664 | +------------ | ||
665 | + | ||
666 | +Justification | ||
667 | +~~~~~~~~~~~~~ | ||
668 | + | ||
669 | +While the VM is paused or halted, bitmaps can be safely modified using | ||
670 | +the basic QMP commands. For instance, you might perform the following | ||
671 | +actions: | ||
672 | + | ||
673 | +1. Boot the VM in a paused state. | ||
674 | +2. Create a full drive backup of drive0. | ||
675 | +3. Create a new bitmap attached to drive0. | ||
676 | +4. Resume execution of the VM. | ||
677 | +5. Incremental backups are ready to be created. | ||
678 | + | ||
679 | +At this point, the bitmap and drive backup would be correctly in sync, | ||
680 | +and incremental backups made from this point forward would be correctly | ||
681 | +aligned to the full drive backup. | ||
682 | + | ||
683 | +This approach is not particularly useful if we decide we want to start | ||
684 | +incremental backups after the VM has already been running for a while; | ||
685 | +in that case, we will need to perform actions such as the following: | ||
686 | + | ||
687 | +1. Boot the VM and begin execution. | ||
688 | +2. Using a single transaction, perform the following operations: | ||
689 | + | ||
690 | + - Create ``bitmap0``. | ||
691 | + - Create a full drive backup of ``drive0``. | ||
692 | + | ||
693 | +3. Incremental backups are now ready to be created. | ||
694 | + | ||
695 | +Supported Bitmap Transactions | ||
696 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
697 | + | ||
698 | +- ``block-dirty-bitmap-add`` | ||
699 | +- ``block-dirty-bitmap-clear`` | ||
700 | + | ||
701 | +Their usage is identical to that of the respective QMP commands; see | ||
702 | +below for examples. | ||
703 | + | ||
704 | +Example: New Incremental Backup | ||
705 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
706 | + | ||
707 | +As outlined in the justification, perhaps we want to create a new | ||
708 | +incremental backup chain attached to a drive. | ||
709 | + | ||
710 | +.. code:: json | ||
711 | + | ||
712 | + { "execute": "transaction", | ||
713 | + "arguments": { | ||
714 | + "actions": [ | ||
715 | + {"type": "block-dirty-bitmap-add", | ||
716 | + "data": {"node": "drive0", "name": "bitmap0"} }, | ||
717 | + {"type": "drive-backup", | ||
718 | + "data": {"device": "drive0", "target": "/path/to/full_backup.img", | ||
719 | + "sync": "full", "format": "qcow2"} } | ||
720 | + ] | ||
721 | + } | ||
722 | + } | ||
723 | + | ||
724 | +Example: New Incremental Backup Anchor Point | ||
725 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
726 | + | ||
727 | +Maybe we just want to create a new full backup with an existing bitmap, | ||
728 | +resetting the bitmap to track the new chain. | ||
729 | + | ||
730 | +.. code:: json | ||
731 | + | ||
732 | + { "execute": "transaction", | ||
733 | + "arguments": { | ||
734 | + "actions": [ | ||
735 | + {"type": "block-dirty-bitmap-clear", | ||
736 | + "data": {"node": "drive0", "name": "bitmap0"} }, | ||
737 | + {"type": "drive-backup", | ||
738 | + "data": {"device": "drive0", "target": "/path/to/new_full_backup.img", | ||
739 | + "sync": "full", "format": "qcow2"} } | ||
740 | + ] | ||
741 | + } | ||
742 | + } | ||
743 | + | ||
744 | +Incremental Backups | ||
745 | +------------------- | ||
746 | + | ||
747 | +The star of the show. | ||
748 | + | ||
749 | +**Nota Bene!** Only incremental backups of entire drives are supported | ||
750 | +for now. So although you can attach a bitmap to any arbitrary node, | ||
751 | +bitmaps are currently only useful when attached to the root node. This | ||
752 | +is because ``drive-backup`` only supports drives/devices, not arbitrary | ||
753 | +nodes. | ||
754 | + | ||
755 | +Example: First Incremental Backup | ||
756 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
757 | + | ||
758 | +1. Create a full backup and sync it to the dirty bitmap, as in the | ||
759 | + transactional examples above; or with the VM offline, manually create | ||
760 | + a full copy and then create a new bitmap before the VM begins | ||
761 | + execution. | ||
762 | + | ||
763 | + - Let's assume the full backup is named ``full_backup.img``. | ||
764 | + - Let's assume the bitmap you created is ``bitmap0`` attached to | ||
765 | + ``drive0``. | ||
766 | + | ||
767 | +2. Create a destination image for the incremental backup that utilizes | ||
768 | + the full backup as a backing image. | ||
769 | + | ||
770 | + - Let's assume the new incremental image is named | ||
771 | + ``incremental.0.img``. | ||
772 | + | ||
773 | + .. code:: bash | ||
774 | + | ||
775 | + $ qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2 | ||
776 | + | ||
777 | +3. Issue the incremental backup command: | ||
778 | + | ||
779 | + .. code:: json | ||
780 | + | ||
781 | + { "execute": "drive-backup", | ||
782 | + "arguments": { | ||
783 | + "device": "drive0", | ||
784 | + "bitmap": "bitmap0", | ||
785 | + "target": "incremental.0.img", | ||
786 | + "format": "qcow2", | ||
787 | + "sync": "incremental", | ||
788 | + "mode": "existing" | ||
789 | + } | ||
790 | + } | ||
791 | + | ||
792 | +Example: Second Incremental Backup | ||
793 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
794 | + | ||
795 | +1. Create a new destination image for the incremental backup that points | ||
796 | + to the previous one, e.g.: ``incremental.1.img`` | ||
797 | + | ||
798 | + .. code:: bash | ||
799 | + | ||
800 | + $ qemu-img create -f qcow2 incremental.1.img -b incremental.0.img -F qcow2 | ||
801 | + | ||
802 | +2. Issue a new incremental backup command. The only difference here is | ||
803 | + that we have changed the target image below; a way to verify the | ||
 | + resulting backing chain is sketched after this step. | ||
804 | + | ||
805 | + .. code:: json | ||
806 | + | ||
807 | + { "execute": "drive-backup", | ||
808 | + "arguments": { | ||
809 | + "device": "drive0", | ||
810 | + "bitmap": "bitmap0", | ||
811 | + "target": "incremental.1.img", | ||
812 | + "format": "qcow2", | ||
813 | + "sync": "incremental", | ||
814 | + "mode": "existing" | ||
815 | + } | ||
816 | + } | ||
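 | + | ||
 | +To verify that the chain of backing files is what we expect, we can | ||
 | +inspect the new image with ``qemu-img``. A sketch, with the output | ||
 | +abbreviated and illustrative: | ||
 | + | ||
 | +.. code:: bash | ||
 | + | ||
 | + $ qemu-img info --backing-chain incremental.1.img | ||
 | + image: incremental.1.img | ||
 | + backing file: incremental.0.img | ||
 | + [...] | ||
 | + image: incremental.0.img | ||
 | + backing file: full_backup.img | ||
 | + [...] | ||
 | + image: full_backup.img | ||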
817 | + | ||
818 | +Errors | ||
819 | +------ | ||
820 | + | ||
821 | +- In the event of an error that occurs after a backup job is | ||
822 | + successfully launched, either by a direct QMP command or a QMP | ||
823 | + transaction, the user will receive a ``BLOCK_JOB_COMPLETED`` event with | ||
824 | + a failure message, accompanied by a ``BLOCK_JOB_ERROR`` event. | ||
825 | + | ||
826 | +- In the case of the job being cancelled, the user will receive a | ||
827 | + ``BLOCK_JOB_CANCELLED`` event instead of a pair of COMPLETED and ERROR | ||
828 | + events. | ||
829 | + | ||
830 | +- In either case, the incremental backup data contained within the | ||
831 | + bitmap is safely rolled back, and the data within the bitmap is not | ||
832 | + lost. The image file created for the failed attempt can be safely | ||
833 | + deleted. | ||
834 | + | ||
835 | +- Once the underlying problem is fixed (e.g. more storage space is | ||
836 | + freed up), you can simply retry the incremental backup command with | ||
837 | + the same bitmap. | ||
838 | + | ||
839 | +Example | ||
840 | +~~~~~~~ | ||
841 | + | ||
842 | +1. Create a target image: | ||
843 | + | ||
844 | + .. code:: bash | ||
845 | + | ||
846 | + $ qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2 | ||
847 | + | ||
848 | +2. Attempt to create an incremental backup via QMP: | ||
849 | + | ||
850 | + .. code:: json | ||
851 | + | ||
852 | + { "execute": "drive-backup", | ||
853 | + "arguments": { | ||
854 | + "device": "drive0", | ||
855 | + "bitmap": "bitmap0", | ||
856 | + "target": "incremental.0.img", | ||
857 | + "format": "qcow2", | ||
858 | + "sync": "incremental", | ||
859 | + "mode": "existing" | ||
860 | + } | ||
861 | + } | ||
862 | + | ||
863 | +3. Receive an event notifying us of failure: | ||
864 | + | ||
865 | + .. code:: json | ||
866 | + | ||
867 | + { "timestamp": { "seconds": 1424709442, "microseconds": 844524 }, | ||
868 | + "data": { "speed": 0, "offset": 0, "len": 67108864, | ||
869 | + "error": "No space left on device", | ||
870 | + "device": "drive1", "type": "backup" }, | ||
871 | + "event": "BLOCK_JOB_COMPLETED" } | ||
872 | + | ||
873 | +4. Delete the failed incremental, and re-create the image. | ||
874 | + | ||
875 | + .. code:: bash | ||
876 | + | ||
877 | + $ rm incremental.0.img | ||
878 | + $ qemu-img create -f qcow2 incremental.0.img -b full_backup.img -F qcow2 | ||
879 | + | ||
880 | +5. Retry the command after fixing the underlying problem, such as | ||
881 | + freeing up space on the backup volume: | ||
882 | + | ||
883 | + .. code:: json | ||
884 | + | ||
885 | + { "execute": "drive-backup", | ||
886 | + "arguments": { | ||
887 | + "device": "drive0", | ||
888 | + "bitmap": "bitmap0", | ||
889 | + "target": "incremental.0.img", | ||
890 | + "format": "qcow2", | ||
891 | + "sync": "incremental", | ||
892 | + "mode": "existing" | ||
893 | + } | ||
894 | + } | ||
895 | + | ||
896 | +6. Receive confirmation that the job completed successfully: | ||
897 | + | ||
898 | + .. code:: json | ||
899 | + | ||
900 | + { "timestamp": { "seconds": 1424709668, "microseconds": 526525 }, | ||
901 | + "data": { "device": "drive1", "type": "backup", | ||
902 | + "speed": 0, "len": 67108864, "offset": 67108864}, | ||
903 | + "event": "BLOCK_JOB_COMPLETED" } | ||
904 | + | ||
905 | +Partial Transactional Failures | ||
906 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
907 | + | ||
908 | +- Sometimes, a transaction will succeed in launching and return | ||
909 | + success, but the backup jobs themselves may still fail later. A | ||
910 | + management application may therefore have to deal with a partial | ||
911 | + backup failure after a successful transaction. | ||
912 | + | ||
913 | +- If multiple backup jobs are specified in a single transaction, when | ||
914 | + one of them fails, it will not interact with the other backup jobs in | ||
915 | + any way. | ||
916 | + | ||
917 | +- The job(s) that succeeded will clear the dirty bitmap associated with | ||
918 | + the operation, but the job(s) that failed will not. It is not "safe" | ||
919 | + to delete any incremental backups that were created successfully in | ||
920 | + this scenario, even though others failed. | ||
921 | + | ||
922 | +Example | ||
923 | +^^^^^^^ | ||
924 | + | ||
925 | +- QMP example highlighting two backup jobs: | ||
926 | + | ||
927 | + .. code:: json | ||
928 | + | ||
929 | + { "execute": "transaction", | ||
930 | + "arguments": { | ||
931 | + "actions": [ | ||
932 | + { "type": "drive-backup", | ||
933 | + "data": { "device": "drive0", "bitmap": "bitmap0", | ||
934 | + "format": "qcow2", "mode": "existing", | ||
935 | + "sync": "incremental", "target": "d0-incr-1.qcow2" } }, | ||
936 | + { "type": "drive-backup", | ||
937 | + "data": { "device": "drive1", "bitmap": "bitmap1", | ||
938 | + "format": "qcow2", "mode": "existing", | ||
939 | + "sync": "incremental", "target": "d1-incr-1.qcow2" } }, | ||
940 | + ] | ||
941 | + } | ||
942 | + } | ||
943 | + | ||
944 | +- QMP example response, highlighting one success and one failure: | ||
945 | + | ||
946 | + - Acknowledgement that the Transaction was accepted and jobs were | ||
947 | + launched: | ||
948 | + | ||
949 | + .. code:: json | ||
950 | + | ||
951 | + { "return": {} } | ||
952 | + | ||
953 | + - Later, QEMU sends notice that the first job was completed: | ||
954 | + | ||
955 | + .. code:: json | ||
956 | + | ||
957 | + { "timestamp": { "seconds": 1447192343, "microseconds": 615698 }, | ||
958 | + "data": { "device": "drive0", "type": "backup", | ||
959 | + "speed": 0, "len": 67108864, "offset": 67108864 }, | ||
960 | + "event": "BLOCK_JOB_COMPLETED" | ||
961 | + } | ||
962 | + | ||
963 | + - Later yet, QEMU sends notice that the second job has failed: | ||
964 | + | ||
965 | + .. code:: json | ||
966 | + | ||
967 | + { "timestamp": { "seconds": 1447192399, "microseconds": 683015 }, | ||
968 | + "data": { "device": "drive1", "action": "report", | ||
969 | + "operation": "read" }, | ||
970 | + "event": "BLOCK_JOB_ERROR" } | ||
971 | + | ||
972 | + .. code:: json | ||
973 | + | ||
974 | + { "timestamp": { "seconds": 1447192399, "microseconds": | ||
975 | + 685853 }, "data": { "speed": 0, "offset": 0, "len": 67108864, | ||
976 | + "error": "Input/output error", "device": "drive1", "type": | ||
977 | + "backup" }, "event": "BLOCK_JOB_COMPLETED" } | ||
978 | + | ||
979 | +- In the above example, ``d0-incr-1.qcow2`` is valid and must be kept, | ||
980 | + but ``d1-incr-1.qcow2`` is invalid and should be deleted. If a VM-wide | ||
981 | + incremental backup of all drives at a single point in time is to be | ||
982 | + made, new backups for both drives will need to be made, taking into | ||
983 | + account that a new incremental backup for drive0 needs to be based on | ||
984 | + top of ``d0-incr-1.qcow2``. A sketch of retrying just the failed | ||
 | + drive follows. | ||
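 | + | ||
 | +Because ``bitmap1`` was not cleared by the failed job, the backup for | ||
 | +``drive1`` alone can simply be retried once ``d1-incr-1.qcow2`` has | ||
 | +been deleted and re-created. Note that the result is a valid | ||
 | +incremental backup of drive1, but not one taken at the same point in | ||
 | +time as ``d0-incr-1.qcow2``. A sketch: | ||
 | + | ||
 | +.. code:: json | ||
 | + | ||
 | + { "execute": "drive-backup", | ||
 | + "arguments": { | ||
 | + "device": "drive1", | ||
 | + "bitmap": "bitmap1", | ||
 | + "target": "d1-incr-1.qcow2", | ||
 | + "format": "qcow2", | ||
 | + "sync": "incremental", | ||
 | + "mode": "existing" | ||
 | + } | ||
 | + } | ||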
985 | + | ||
986 | +Grouped Completion Mode | ||
987 | +~~~~~~~~~~~~~~~~~~~~~~~ | ||
988 | + | ||
989 | +- While jobs launched by transactions normally complete or fail on | ||
990 | + their own, it is possible to instruct them to complete or fail | ||
991 | + together as a group. | ||
992 | + | ||
993 | +- QMP transactions take an optional properties structure that can | ||
994 | + affect the semantics of the transaction. | ||
995 | + | ||
996 | +- The "completion-mode" transaction property can be either "individual" | ||
997 | + which is the default, legacy behavior described above, or "grouped," | ||
998 | + a new behavior detailed below. | ||
999 | + | ||
1000 | +- Delayed Completion: In grouped completion mode, no jobs will report | ||
1001 | + success until all jobs are ready to report success. | ||
1002 | + | ||
1003 | +- Grouped failure: If any job fails in grouped completion mode, all | ||
1004 | + remaining jobs will be cancelled. Any incremental backups will | ||
1005 | + restore their dirty bitmap objects as if no backup command was ever | ||
1006 | + issued. | ||
1007 | + | ||
1008 | + - Regardless of whether QEMU reports a particular incremental backup job | ||
1009 | + as CANCELLED or as an ERROR, the in-memory bitmap will be | ||
1010 | + restored. | ||
1011 | + | ||
1012 | +Example | ||
1013 | +^^^^^^^ | ||
1014 | + | ||
1015 | +- Here's the same example scenario from above with the new property: | ||
1016 | + | ||
1017 | + .. code:: json | ||
1018 | + | ||
1019 | + { "execute": "transaction", | ||
1020 | + "arguments": { | ||
1021 | + "actions": [ | ||
1022 | + { "type": "drive-backup", | ||
1023 | + "data": { "device": "drive0", "bitmap": "bitmap0", | ||
1024 | + "format": "qcow2", "mode": "existing", | ||
1025 | + "sync": "incremental", "target": "d0-incr-1.qcow2" } }, | ||
1026 | + { "type": "drive-backup", | ||
1027 | + "data": { "device": "drive1", "bitmap": "bitmap1", | ||
1028 | + "format": "qcow2", "mode": "existing", | ||
1029 | + "sync": "incremental", "target": "d1-incr-1.qcow2" } }, | ||
1030 | + ], | ||
1031 | + "properties": { | ||
1032 | + "completion-mode": "grouped" | ||
1033 | + } | ||
1034 | + } | ||
1035 | + } | ||
1036 | + | ||
1037 | +- QMP example response, highlighting a failure for ``drive1``: | ||
1038 | + | ||
1039 | + - Acknowledgement that the Transaction was accepted and jobs were | ||
1040 | + launched: | ||
1041 | + | ||
1042 | + .. code:: json | ||
1043 | + | ||
1044 | + { "return": {} } | ||
1045 | + | ||
1046 | + - Later, QEMU sends notice that the second job has errored out, and | ||
1047 | + that the first job was cancelled as a result: | ||
1048 | + | ||
1049 | + .. code:: json | ||
1050 | + | ||
1051 | + { "timestamp": { "seconds": 1447193702, "microseconds": 632377 }, | ||
1052 | + "data": { "device": "drive1", "action": "report", | ||
1053 | + "operation": "read" }, | ||
1054 | + "event": "BLOCK_JOB_ERROR" } | ||
1055 | + | ||
1056 | + .. code:: json | ||
1057 | + | ||
1058 | + { "timestamp": { "seconds": 1447193702, "microseconds": 640074 }, | ||
1059 | + "data": { "speed": 0, "offset": 0, "len": 67108864, | ||
1060 | + "error": "Input/output error", | ||
1061 | + "device": "drive1", "type": "backup" }, | ||
1062 | + "event": "BLOCK_JOB_COMPLETED" } | ||
1063 | + | ||
1064 | + .. code:: json | ||
1065 | + | ||
1066 | + { "timestamp": { "seconds": 1447193702, "microseconds": 640163 }, | ||
1067 | + "data": { "device": "drive0", "type": "backup", "speed": 0, | ||
1068 | + "len": 67108864, "offset": 16777216 }, | ||
1069 | + "event": "BLOCK_JOB_CANCELLED" } | ||
1070 | + | ||
1071 | +.. raw:: html | ||
1072 | + | ||
1073 | + <!-- | ||
1074 | + The FreeBSD Documentation License | ||
1075 | + | ||
1076 | + Redistribution and use in source (Markdown) and 'compiled' forms (SGML, HTML, | ||
1077 | + PDF, PostScript, RTF and so forth) with or without modification, are permitted | ||
1078 | + provided that the following conditions are met: | ||
1079 | + | ||
1080 | + Redistributions of source code (Markdown) must retain the above copyright | ||
1081 | + notice, this list of conditions and the following disclaimer of this file | ||
1082 | + unmodified. | ||
1083 | + | ||
1084 | + Redistributions in compiled form (transformed to other DTDs, converted to PDF, | ||
1085 | + PostScript, RTF and other formats) must reproduce the above copyright notice, | ||
1086 | + this list of conditions and the following disclaimer in the documentation and/or | ||
1087 | + other materials provided with the distribution. | ||
1088 | + | ||
1089 | + THIS DOCUMENTATION IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
1090 | + AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
1091 | + IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
1092 | + DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | ||
1093 | + FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
1094 | + DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | ||
1095 | + SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | ||
1096 | + CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | ||
1097 | + OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF | ||
1098 | + THIS DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
1099 | + --> | ||
1100 | -- | 26 | -- |
1101 | 2.9.4 | 27 | 2.21.0 |
1102 | 28 | ||