Series comparison

-[PULL 00/19] Block patches
+[Qemu-devel] [PULL v2 0/8] Block patches
-The following changes since commit ba29883206d92a29ad5a466e679ccfc2ee6132ef:
+The following changes since commit 474f3938d79ab36b9231c9ad3b5a9314c2aeacde:
-  Merge remote-tracking branch 'remotes/borntraeger/tags/s390x-20200310' into staging (2020-03-10 16:50:28 +0000)
+  Merge remote-tracking branch 'remotes/amarkovic/tags/mips-queue-jun-21-2019' into staging (2019-06-21 15:40:50 +0100)
 are available in the Git repository at:
-  https://github.com/XanClic/qemu.git tags/pull-block-2020-03-11
+  https://github.com/XanClic/qemu.git tags/pull-block-2019-06-24
-for you to fetch changes up to 397f4e9d83e9c0000905f0a988ba1aeda162571c:
+for you to fetch changes up to ab5d4a30f7f3803ca5106b370969c1b7b54136f8:
-  block/block-copy: hide structure definitions (2020-03-11 12:42:30 +0100)
+  iotests: Fix 205 for concurrent runs (2019-06-24 16:01:40 +0200)
 ----------------------------------------------------------------
-Block patches for the 5.0 softfreeze:
+Block patches:
-- qemu-img measure for LUKS
+- The SSH block driver now uses libssh instead of libssh2
-- Improve block-copy's performance by reducing inter-request
+- The VMDK block driver gets read-only support for the seSparse
-  dependencies
+  subformat
-- Make curl's detection of accept-ranges more robust
+- Various fixes
-- Memleak fixes
-- iotest fix
+---
 v2:
 - Squashed Pino's fix for pre-0.8 libssh into the libssh patch
 ----------------------------------------------------------------
-David Edmondson (2):
+Anton Nefedov (1):
-  block/curl: HTTP header fields allow whitespace around values
+  iotest 134: test cluster-misaligned encrypted write
   block/curl: HTTP header field names are case insensitive
-Eric Blake (1):
+Klaus Birkelund Jensen (1):
-  iotests: Fix nonportable use of od --endian
+  nvme: do not advertise support for unsupported arbitration mechanism
-Pan Nengyuan (2):
+Max Reitz (1):
-  block/qcow2: do free crypto_opts in qcow2_close()
+  iotests: Fix 205 for concurrent runs
   qemu-img: free memory before re-assign
-Stefan Hajnoczi (4):
+Pino Toscano (1):
-  luks: extract qcrypto_block_calculate_payload_offset()
+  ssh: switch from libssh2 to libssh
   luks: implement .bdrv_measure()
   qemu-img: allow qemu-img measure --object without a filename
   iotests: add 288 luks qemu-img measure test
-Vladimir Sementsov-Ogievskiy (10):
+Sam Eiderman (3):
-  block/qcow2-threads: fix qcow2_decompress
+  vmdk: Fix comment regarding max l1_size coverage
-  job: refactor progress to separate object
+  vmdk: Reduce the max bound for L1 table size
-  block/block-copy: fix progress calculation
+  vmdk: Add read-only support for seSparse snapshots
   block/block-copy: specialcase first copy_range request
   block/block-copy: use block_status
   block/block-copy: factor out find_conflicting_inflight_req
   block/block-copy: refactor interfaces to use bytes instead of end
   block/block-copy: rename start to offset in interfaces
   block/block-copy: reduce intersecting request lock
   block/block-copy: hide structure definitions
- block/backup-top.c               |   6 +-
+Vladimir Sementsov-Ogievskiy (1):
- block/backup.c                   |  38 ++-
+  blockdev: enable non-root nodes for transaction drive-backup source
- block/block-copy.c               | 405 ++++++++++++++++++++++++-------
- block/crypto.c                   |  62 +++++
+ configure                                     |  65 +-
- block/curl.c                     |  32 ++-
+ block/Makefile.objs                           |   6 +-
- block/qcow2-threads.c            |  12 +-
+ block/ssh.c                                   | 652 ++++++++++--------
- block/qcow2.c                    |  75 ++----
+ block/vmdk.c                                  | 372 +++++++++-
- block/trace-events               |   1 +
+ blockdev.c                                    |   2 +-
- blockjob.c                       |  16 +-
+ hw/block/nvme.c                               |   1 -
- crypto/block.c                   |  36 +++
+ .travis.yml                                   |   4 +-
- include/block/block-copy.h       |  65 +----
+ block/trace-events                            |  14 +-
- include/crypto/block.h           |  22 ++
+ docs/qemu-block-drivers.texi                  |   2 +-
- include/qemu/job.h               |  11 +-
+ .../dockerfiles/debian-win32-cross.docker     |   1 -
- include/qemu/progress_meter.h    |  58 +++++
+ .../dockerfiles/debian-win64-cross.docker     |   1 -
- job-qmp.c                        |   4 +-
+ tests/docker/dockerfiles/fedora.docker        |   4 +-
- job.c                            |   6 +-
+ tests/docker/dockerfiles/ubuntu.docker        |   2 +-
- qemu-img.c                       |  14 +-
+ tests/docker/dockerfiles/ubuntu1804.docker    |   2 +-
- tests/qemu-iotests/178           |   2 +-
+ tests/qemu-iotests/059.out                    |   2 +-
- tests/qemu-iotests/178.out.qcow2 |   8 +-
+ tests/qemu-iotests/134                        |   9 +
- tests/qemu-iotests/178.out.raw   |   8 +-
+ tests/qemu-iotests/134.out                    |  10 +
- tests/qemu-iotests/288           |  93 +++++++
+ tests/qemu-iotests/205                        |   2 +-
- tests/qemu-iotests/288.out       |  30 +++
+ tests/qemu-iotests/207                        |  54 +-
- tests/qemu-iotests/common.rc     |  22 +-
+ tests/qemu-iotests/207.out                    |   2 +-
- tests/qemu-iotests/group         |   1 +
+files changed, 823 insertions(+), 384 deletions(-)
 files changed, 749 insertions(+), 278 deletions(-)
  create mode 100644 include/qemu/progress_meter.h
  create mode 100755 tests/qemu-iotests/288
  create mode 100644 tests/qemu-iotests/288.out
 --
-.24.1
+.21.0

-[PULL 18/19] block/block-copy: reduce intersecting request lock
+[Qemu-devel] [PULL v2 1/8] nvme: do not advertise support for unsupported arbitration mechanism
-From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
+From: Klaus Birkelund Jensen <klaus@birkelund.eu>
-Currently, block_copy operation lock the whole requested region. But
+The device mistakenly reports that the Weighted Round Robin with Urgent
-there is no reason to lock clusters, which are already copied, it will
+Priority Class arbitration mechanism is supported.
 disturb other parallel block_copy requests for no reason.
-Let's instead do the following:
+It is not.
-Lock only sub-region, which we are going to operate on. Then, after
+Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
-copying all dirty sub-regions, we should wait for intersecting
+Message-id: 20190606092530.14206-1-klaus@birkelund.eu
-requests block-copy, if they failed, we should retry these new dirty
+Acked-by: Maxim Levitsky <mlevitsk@redhat.com>
 clusters.
 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
 Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
 Message-Id: <20200311103004.7649-9-vsementsov@virtuozzo.com>
 Signed-off-by: Max Reitz <mreitz@redhat.com>
 ---
- block/block-copy.c | 129 ++++++++++++++++++++++++++++++++++++---------
+ hw/block/nvme.c | 1 -
-file changed, 105 insertions(+), 24 deletions(-)
+file changed, 1 deletion(-)
-diff --git a/block/block-copy.c b/block/block-copy.c
+diff --git a/hw/block/nvme.c b/hw/block/nvme.c
 index XXXXXXX..XXXXXXX 100644
---- a/block/block-copy.c
+--- a/hw/block/nvme.c
-+++ b/block/block-copy.c
++++ b/hw/block/nvme.c
-@@ -XXX,XX +XXX,XX @@ static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
+@@ -XXX,XX +XXX,XX @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
-     return NULL;
+     n->bar.cap = 0;
- }
+     NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
+     NVME_CAP_SET_CQR(n->bar.cap, 1);
--static void coroutine_fn block_copy_wait_inflight_reqs(BlockCopyState *s,
+-    NVME_CAP_SET_AMS(n->bar.cap, 1);
--                                                       int64_t offset,
+     NVME_CAP_SET_TO(n->bar.cap, 0xf);
--                                                       int64_t bytes)
+     NVME_CAP_SET_CSS(n->bar.cap, 1);
-+/*
+     NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
 + * If there are no intersecting requests return false. Otherwise, wait for the
 + * first found intersecting request to finish and return true.
 + */
 +static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
 +                                             int64_t bytes)
  {
 -    BlockCopyInFlightReq *req;
 +    BlockCopyInFlightReq *req = find_conflicting_inflight_req(s, offset, bytes);
 -    while ((req = find_conflicting_inflight_req(s, offset, bytes))) {
 -        qemu_co_queue_wait(&req->wait_queue, NULL);
 +    if (!req) {
 +        return false;
      }
 +
 +    qemu_co_queue_wait(&req->wait_queue, NULL);
 +
 +    return true;
  }
 +/* Called only on full-dirty region */
  static void block_copy_inflight_req_begin(BlockCopyState *s,
                                            BlockCopyInFlightReq *req,
                                            int64_t offset, int64_t bytes)
  {
 +    assert(!find_conflicting_inflight_req(s, offset, bytes));
 +
 +    bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
 +    s->in_flight_bytes += bytes;
 +
      req->offset = offset;
      req->bytes = bytes;
      qemu_co_queue_init(&req->wait_queue);
      QLIST_INSERT_HEAD(&s->inflight_reqs, req, list);
  }
 -static void coroutine_fn block_copy_inflight_req_end(BlockCopyInFlightReq *req)
 +/*
 + * block_copy_inflight_req_shrink
 + *
 + * Drop the tail of the request to be handled later. Set dirty bits back and
 + * wake up all requests waiting for us (may be some of them are not intersecting
 + * with shrunk request)
 + */
 +static void coroutine_fn block_copy_inflight_req_shrink(BlockCopyState *s,
 +        BlockCopyInFlightReq *req, int64_t new_bytes)
  {
 +    if (new_bytes == req->bytes) {
 +        return;
 +    }
 +
 +    assert(new_bytes > 0 && new_bytes < req->bytes);
 +
 +    s->in_flight_bytes -= req->bytes - new_bytes;
 +    bdrv_set_dirty_bitmap(s->copy_bitmap,
 +                          req->offset + new_bytes, req->bytes - new_bytes);
 +
 +    req->bytes = new_bytes;
 +    qemu_co_queue_restart_all(&req->wait_queue);
 +}
 +
 +static void coroutine_fn block_copy_inflight_req_end(BlockCopyState *s,
 +                                                     BlockCopyInFlightReq *req,
 +                                                     int ret)
 +{
 +    s->in_flight_bytes -= req->bytes;
 +    if (ret < 0) {
 +        bdrv_set_dirty_bitmap(s->copy_bitmap, req->offset, req->bytes);
 +    }
      QLIST_REMOVE(req, list);
      qemu_co_queue_restart_all(&req->wait_queue);
  }
@@ -XXX,XX +XXX,XX @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
      return ret;
  }
 -int coroutine_fn block_copy(BlockCopyState *s,
 -                            int64_t offset, int64_t bytes,
 -                            bool *error_is_read)
 +/*
 + * block_copy_dirty_clusters
 + *
 + * Copy dirty clusters in @offset/@bytes range.
 + * Returns 1 if dirty clusters found and successfully copied, 0 if no dirty
 + * clusters found and -errno on failure.
 + */
 +static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
 +                                                  int64_t offset, int64_t bytes,
 +                                                  bool *error_is_read)
  {
      int ret = 0;
 -    BlockCopyInFlightReq req;
 +    bool found_dirty = false;
      /*
       * block_copy() user is responsible for keeping source and target in same
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
      assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
      assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
 -    block_copy_wait_inflight_reqs(s, offset, bytes);
 -    block_copy_inflight_req_begin(s, &req, offset, bytes);
 -
      while (bytes) {
 +        BlockCopyInFlightReq req;
          int64_t next_zero, cur_bytes, status_bytes;
          if (!bdrv_dirty_bitmap_get(s->copy_bitmap, offset)) {
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
              continue; /* already copied */
          }
 +        found_dirty = true;
 +
          cur_bytes = MIN(bytes, s->copy_size);
          next_zero = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, offset,
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
              assert(next_zero < offset + cur_bytes); /* no need to do MIN() */
              cur_bytes = next_zero - offset;
          }
 +        block_copy_inflight_req_begin(s, &req, offset, cur_bytes);
          ret = block_copy_block_status(s, offset, cur_bytes, &status_bytes);
 +        assert(ret >= 0); /* never fail */
 +        cur_bytes = MIN(cur_bytes, status_bytes);
 +        block_copy_inflight_req_shrink(s, &req, cur_bytes);
          if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
 -            bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, status_bytes);
 +            block_copy_inflight_req_end(s, &req, 0);
              progress_set_remaining(s->progress,
                                     bdrv_get_dirty_count(s->copy_bitmap) +
                                     s->in_flight_bytes);
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
              continue;
          }
 -        cur_bytes = MIN(cur_bytes, status_bytes);
 -
          trace_block_copy_process(s, offset);
 -        bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, cur_bytes);
 -        s->in_flight_bytes += cur_bytes;
 -
          co_get_from_shres(s->mem, cur_bytes);
          ret = block_copy_do_copy(s, offset, cur_bytes, ret & BDRV_BLOCK_ZERO,
                                   error_is_read);
          co_put_to_shres(s->mem, cur_bytes);
 -        s->in_flight_bytes -= cur_bytes;
 +        block_copy_inflight_req_end(s, &req, ret);
          if (ret < 0) {
 -            bdrv_set_dirty_bitmap(s->copy_bitmap, offset, cur_bytes);
 -            break;
 +            return ret;
          }
          progress_work_done(s->progress, cur_bytes);
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
          bytes -= cur_bytes;
      }
 -    block_copy_inflight_req_end(&req);
 +    return found_dirty;
 +}
 +
 +/*
 + * block_copy
 + *
 + * Copy requested region, accordingly to dirty bitmap.
 + * Collaborate with parallel block_copy requests: if they succeed it will help
 + * us. If they fail, we will retry not-copied regions. So, if we return error,
 + * it means that some I/O operation failed in context of _this_ block_copy call,
 + * not some parallel operation.
 + */
 +int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
 +                            bool *error_is_read)
 +{
 +    int ret;
 +
 +    do {
 +        ret = block_copy_dirty_clusters(s, offset, bytes, error_is_read);
 +
 +        if (ret == 0) {
 +            ret = block_copy_wait_one(s, offset, bytes);
 +        }
 +
 +        /*
 +         * We retry in two cases:
 +         * 1. Some progress done
 +         *    Something was copied, which means that there were yield points
 +         *    and some new dirty bits may have appeared (due to failed parallel
 +         *    block-copy requests).
 +         * 2. We have waited for some intersecting block-copy request
 +         *    It may have failed and produced new dirty bits.
 +         */
 +    } while (ret > 0);
      return ret;
  }
 --
-.24.1
+.21.0
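
The retry loop that the block/block-copy.c hunk above adds to block_copy() can be sketched in isolation. This is only a single-threaded model under stated assumptions, not QEMU code: the bool array stands in for copy_bitmap, pending_waits imitates intersecting in-flight requests that fail and set a dirty bit back, and the names copy_dirty_clusters(), wait_one() and copy_region() are hypothetical simplifications of block_copy_dirty_clusters(), block_copy_wait_one() and block_copy().

```c
#include <assert.h>
#include <stdbool.h>

#define NB_CLUSTERS 8

/* Hypothetical stand-in for s->copy_bitmap: one flag per cluster. */
static bool dirty[NB_CLUSTERS];

/* Hypothetical: number of intersecting in-flight requests still to finish. */
static int pending_waits;

/*
 * Copy every dirty cluster in [first, first + count).
 * Returns 1 if dirty clusters were found and copied, 0 if none were found.
 * (The real function can also return -errno; this model never fails.)
 */
static int copy_dirty_clusters(int first, int count)
{
    bool found_dirty = false;

    for (int i = first; i < first + count; i++) {
        if (dirty[i]) {
            dirty[i] = false;   /* "copy" the cluster */
            found_dirty = true;
        }
    }
    return found_dirty;
}

/*
 * If there is no intersecting request, return false. Otherwise "wait" for
 * one and return true. The simulated request fails and re-dirties the first
 * cluster, which is the case the retry loop exists for.
 */
static bool wait_one(int first)
{
    if (pending_waits == 0) {
        return false;
    }
    pending_waits--;
    dirty[first] = true;
    return true;
}

/* Retry while progress was made or an intersecting request was waited for. */
static int copy_region(int first, int count, int *passes)
{
    int ret;

    assert(first >= 0 && count >= 0 && first + count <= NB_CLUSTERS);

    do {
        ret = copy_dirty_clusters(first, count);
        if (ret == 0) {
            ret = wait_one(first);
        }
        (*passes)++;
    } while (ret > 0);

    return ret;
}
```

With two dirty clusters and one simulated failing parallel request, copy_region() needs four passes: one that copies, one that waits, one that copies the re-dirtied cluster, and a final pass that finds nothing to copy and nothing to wait for.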

-[PULL 19/19] block/block-copy: hide structure definitions
+[Qemu-devel] [PULL v2 2/8] blockdev: enable non-root nodes for transaction drive-backup source
 From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-Hide structure definitions and add explicit API instead, to keep an
+We forget to enable it for transaction .prepare, while it is already
-eye on the scope of the shared fields.
+enabled in do_drive_backup since commit a2d665c1bc362
     "blockdev: loosen restrictions on drive-backup source node"
 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
+Message-id: 20190618140804.59214-1-vsementsov@virtuozzo.com
-Reviewed-by: Max Reitz <mreitz@redhat.com>
+Reviewed-by: John Snow <jsnow@redhat.com>
 Message-Id: <20200311103004.7649-10-vsementsov@virtuozzo.com>
 Signed-off-by: Max Reitz <mreitz@redhat.com>
 ---
- block/backup-top.c         |  6 ++--
+ blockdev.c | 2 +-
- block/backup.c             | 25 ++++++++--------
+file changed, 1 insertion(+), 1 deletion(-)
  block/block-copy.c         | 59 ++++++++++++++++++++++++++++++++++++++
  include/block/block-copy.h | 52 +++------------------------------
 files changed, 80 insertions(+), 62 deletions(-)
-diff --git a/block/backup-top.c b/block/backup-top.c
+diff --git a/blockdev.c b/blockdev.c
 index XXXXXXX..XXXXXXX 100644
---- a/block/backup-top.c
+--- a/blockdev.c
-+++ b/block/backup-top.c
++++ b/blockdev.c
-@@ -XXX,XX +XXX,XX @@ typedef struct BDRVBackupTopState {
+@@ -XXX,XX +XXX,XX @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
-     BlockCopyState *bcs;
+     assert(common->action->type == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
-     BdrvChild *target;
+     backup = common->action->u.drive_backup.data;
-     bool active;
-+    int64_t cluster_size;
+-    bs = qmp_get_root_bs(backup->device, errp);
- } BDRVBackupTopState;
++    bs = bdrv_lookup_bs(backup->device, backup->device, errp);
+     if (!bs) {
  static coroutine_fn int backup_top_co_preadv(
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t offset,
          return 0;
      }
 -    off = QEMU_ALIGN_DOWN(offset, s->bcs->cluster_size);
 -    end = QEMU_ALIGN_UP(offset + bytes, s->bcs->cluster_size);
 +    off = QEMU_ALIGN_DOWN(offset, s->cluster_size);
 +    end = QEMU_ALIGN_UP(offset + bytes, s->cluster_size);
      return block_copy(s->bcs, off, end - off, NULL);
  }
@@ -XXX,XX +XXX,XX @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
          goto fail;
      }
 +    state->cluster_size = cluster_size;
      state->bcs = block_copy_state_new(top->backing, state->target,
                                        cluster_size, write_flags, &local_err);
      if (local_err) {
 diff --git a/block/backup.c b/block/backup.c
 index XXXXXXX..XXXXXXX 100644
 --- a/block/backup.c
 +++ b/block/backup.c
@@ -XXX,XX +XXX,XX @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
      if (ret < 0 && job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
          /* If we failed and synced, merge in the bits we didn't copy: */
 -        bdrv_dirty_bitmap_merge_internal(bm, job->bcs->copy_bitmap,
 +        bdrv_dirty_bitmap_merge_internal(bm, block_copy_dirty_bitmap(job->bcs),
                                           NULL, true);
      }
  }
@@ -XXX,XX +XXX,XX @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
          return;
      }
--    bdrv_set_dirty_bitmap(backup_job->bcs->copy_bitmap, 0, backup_job->len);
-+    bdrv_set_dirty_bitmap(block_copy_dirty_bitmap(backup_job->bcs), 0,
-+                          backup_job->len);
- }
- static BlockErrorAction backup_error_action(BackupBlockJob *job,
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_loop(BackupBlockJob *job)
-     BdrvDirtyBitmapIter *bdbi;
-     int ret = 0;
--    bdbi = bdrv_dirty_iter_new(job->bcs->copy_bitmap);
-+    bdbi = bdrv_dirty_iter_new(block_copy_dirty_bitmap(job->bcs));
-     while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
-         do {
-             if (yield_and_check(job)) {
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_loop(BackupBlockJob *job)
-     return ret;
- }
--static void backup_init_copy_bitmap(BackupBlockJob *job)
-+static void backup_init_bcs_bitmap(BackupBlockJob *job)
- {
-     bool ret;
-     uint64_t estimate;
-+    BdrvDirtyBitmap *bcs_bitmap = block_copy_dirty_bitmap(job->bcs);
-     if (job->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
--        ret = bdrv_dirty_bitmap_merge_internal(job->bcs->copy_bitmap,
--                                               job->sync_bitmap,
-+        ret = bdrv_dirty_bitmap_merge_internal(bcs_bitmap, job->sync_bitmap,
-                                                NULL, true);
-         assert(ret);
-     } else {
-@@ -XXX,XX +XXX,XX @@ static void backup_init_copy_bitmap(BackupBlockJob *job)
-              * We can't hog the coroutine to initialize this thoroughly.
-              * Set a flag and resume work when we are able to yield safely.
-              */
--            job->bcs->skip_unallocated = true;
-+            block_copy_set_skip_unallocated(job->bcs, true);
-         }
--        bdrv_set_dirty_bitmap(job->bcs->copy_bitmap, 0, job->len);
-+        bdrv_set_dirty_bitmap(bcs_bitmap, 0, job->len);
-     }
--    estimate = bdrv_get_dirty_count(job->bcs->copy_bitmap);
-+    estimate = bdrv_get_dirty_count(bcs_bitmap);
-     job_progress_set_remaining(&job->common.job, estimate);
- }
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_run(Job *job, Error **errp)
-     BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
-     int ret = 0;
--    backup_init_copy_bitmap(s);
-+    backup_init_bcs_bitmap(s);
-     if (s->sync_mode == MIRROR_SYNC_MODE_TOP) {
-         int64_t offset = 0;
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_run(Job *job, Error **errp)
-             offset += count;
-         }
--        s->bcs->skip_unallocated = false;
-+        block_copy_set_skip_unallocated(s->bcs, false);
-     }
-     if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
-         /*
--         * All bits are set in copy_bitmap to allow any cluster to be copied.
-+         * All bits are set in bcs bitmap to allow any cluster to be copied.
-          * This does not actually require them to be copied.
-          */
-         while (!job_is_cancelled(job)) {
-diff --git a/block/block-copy.c b/block/block-copy.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/block-copy.c
-+++ b/block/block-copy.c
-@@ -XXX,XX +XXX,XX @@
- #define BLOCK_COPY_MAX_BUFFER (1 * MiB)
- #define BLOCK_COPY_MAX_MEM (128 * MiB)
-+typedef struct BlockCopyInFlightReq {
-+    int64_t offset;
-+    int64_t bytes;
-+    QLIST_ENTRY(BlockCopyInFlightReq) list;
-+    CoQueue wait_queue; /* coroutines blocked on this request */
-+} BlockCopyInFlightReq;
-+
-+typedef struct BlockCopyState {
-+    /*
-+     * BdrvChild objects are not owned or managed by block-copy. They are
-+     * provided by block-copy user and user is responsible for appropriate
-+     * permissions on these children.
-+     */
-+    BdrvChild *source;
-+    BdrvChild *target;
-+    BdrvDirtyBitmap *copy_bitmap;
-+    int64_t in_flight_bytes;
-+    int64_t cluster_size;
-+    bool use_copy_range;
-+    int64_t copy_size;
-+    uint64_t len;
-+    QLIST_HEAD(, BlockCopyInFlightReq) inflight_reqs;
-+
-+    BdrvRequestFlags write_flags;
-+
-+    /*
-+     * skip_unallocated:
-+     *
-+     * Used by sync=top jobs, which first scan the source node for unallocated
-+     * areas and clear them in the copy_bitmap.  During this process, the bitmap
-+     * is thus not fully initialized: It may still have bits set for areas that
-+     * are unallocated and should actually not be copied.
-+     *
-+     * This is indicated by skip_unallocated.
-+     *
-+     * In this case, block_copy() will query the source’s allocation status,
-+     * skip unallocated regions, clear them in the copy_bitmap, and invoke
-+     * block_copy_reset_unallocated() every time it does.
-+     */
-+    bool skip_unallocated;
-+
-+    ProgressMeter *progress;
-+    /* progress_bytes_callback: called when some copying progress is done. */
-+    ProgressBytesCallbackFunc progress_bytes_callback;
-+    void *progress_opaque;
-+
-+    SharedResource *mem;
-+} BlockCopyState;
-+
- static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
-                                                            int64_t offset,
-                                                            int64_t bytes)
-@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
-     return ret;
- }
-+
-+BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s)
-+{
-+    return s->copy_bitmap;
-+}
-+
-+void block_copy_set_skip_unallocated(BlockCopyState *s, bool skip)
-+{
-+    s->skip_unallocated = skip;
-+}
-diff --git a/include/block/block-copy.h b/include/block/block-copy.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/block/block-copy.h
-+++ b/include/block/block-copy.h
-@@ -XXX,XX +XXX,XX @@
- #include "block/block.h"
- #include "qemu/co-shared-resource.h"
--typedef struct BlockCopyInFlightReq {
--    int64_t offset;
--    int64_t bytes;
--    QLIST_ENTRY(BlockCopyInFlightReq) list;
--    CoQueue wait_queue; /* coroutines blocked on this request */
--} BlockCopyInFlightReq;
--
- typedef void (*ProgressBytesCallbackFunc)(int64_t bytes, void *opaque);
--typedef struct BlockCopyState {
--    /*
--     * BdrvChild objects are not owned or managed by block-copy. They are
--     * provided by block-copy user and user is responsible for appropriate
--     * permissions on these children.
--     */
--    BdrvChild *source;
--    BdrvChild *target;
--    BdrvDirtyBitmap *copy_bitmap;
--    int64_t in_flight_bytes;
--    int64_t cluster_size;
--    bool use_copy_range;
--    int64_t copy_size;
--    uint64_t len;
--    QLIST_HEAD(, BlockCopyInFlightReq) inflight_reqs;
--
--    BdrvRequestFlags write_flags;
--
--    /*
--     * skip_unallocated:
--     *
--     * Used by sync=top jobs, which first scan the source node for unallocated
--     * areas and clear them in the copy_bitmap.  During this process, the bitmap
--     * is thus not fully initialized: It may still have bits set for areas that
--     * are unallocated and should actually not be copied.
--     *
--     * This is indicated by skip_unallocated.
--     *
--     * In this case, block_copy() will query the source’s allocation status,
--     * skip unallocated regions, clear them in the copy_bitmap, and invoke
--     * block_copy_reset_unallocated() every time it does.
--     */
--    bool skip_unallocated;
--
--    ProgressMeter *progress;
--    /* progress_bytes_callback: called when some copying progress is done. */
--    ProgressBytesCallbackFunc progress_bytes_callback;
--    void *progress_opaque;
--
--    SharedResource *mem;
--} BlockCopyState;
-+typedef struct BlockCopyState BlockCopyState;
- BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
-                                      int64_t cluster_size,
-@@ -XXX,XX +XXX,XX @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
- int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
-                             bool *error_is_read);
-+BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s);
-+void block_copy_set_skip_unallocated(BlockCopyState *s, bool skip);
-+
- #endif /* BLOCK_COPY_H */
 --
-.24.1
+.21.0

-[PULL 04/19] iotests: add 288 luks qemu-img measure test
+[Qemu-devel] [PULL v2 3/8] iotest 134: test cluster-misaligned encrypted write
-From: Stefan Hajnoczi <stefanha@redhat.com>
+From: Anton Nefedov <anton.nefedov@virtuozzo.com>
-This test exercises the block/crypto.c "luks" block driver
+COW (even empty/zero) areas require encryption too
 .bdrv_measure() code.
-Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
+Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
 Reviewed-by: Eric Blake <eblake@redhat.com>
 Reviewed-by: Max Reitz <mreitz@redhat.com>
-Message-Id: <20200221112522.1497712-5-stefanha@redhat.com>
+Reviewed-by: Alberto Garcia <berto@igalia.com>
-[mreitz: Renamed test from 282 to 288]
+Message-id: 20190516143028.81155-1-anton.nefedov@virtuozzo.com
 Signed-off-by: Max Reitz <mreitz@redhat.com>
 ---
- tests/qemu-iotests/288     | 93 ++++++++++++++++++++++++++++++++++++++
+ tests/qemu-iotests/134     |  9 +++++++++
- tests/qemu-iotests/288.out | 30 ++++++++++++
+ tests/qemu-iotests/134.out | 10 ++++++++++
- tests/qemu-iotests/group   |  1 +
+files changed, 19 insertions(+)
 files changed, 124 insertions(+)
  create mode 100755 tests/qemu-iotests/288
  create mode 100644 tests/qemu-iotests/288.out
-diff --git a/tests/qemu-iotests/288 b/tests/qemu-iotests/288
+diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
-new file mode 100755
+index XXXXXXX..XXXXXXX 100755
-index XXXXXXX..XXXXXXX
+--- a/tests/qemu-iotests/134
---- /dev/null
++++ b/tests/qemu-iotests/134
-+++ b/tests/qemu-iotests/288
+@@ -XXX,XX +XXX,XX @@ echo
-@@ -XXX,XX +XXX,XX @@
+ echo "== reading whole image =="
-+#!/usr/bin/env bash
+ $QEMU_IO --object $SECRET -c "read 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
-+#
 +# qemu-img measure tests for LUKS images
 +#
 +# Copyright (C) 2020 Red Hat, Inc.
 +#
 +# This program is free software; you can redistribute it and/or modify
 +# it under the terms of the GNU General Public License as published by
 +# the Free Software Foundation; either version 2 of the License, or
 +# (at your option) any later version.
 +#
 +# This program is distributed in the hope that it will be useful,
 +# but WITHOUT ANY WARRANTY; without even the implied warranty of
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 +# GNU General Public License for more details.
 +#
 +# You should have received a copy of the GNU General Public License
 +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
 +#
 +
 +# creator
 +owner=stefanha@redhat.com
 +
 +seq=`basename $0`
 +echo "QA output created by $seq"
 +
 +status=1    # failure is the default!
 +
 +_cleanup()
 +{
 +    _cleanup_test_img
 +    rm -f "$TEST_IMG.converted"
 +}
 +trap "_cleanup; exit \$status" 0 1 2 3 15
 +
 +# get standard environment, filters and checks
 +. ./common.rc
 +. ./common.filter
 +. ./common.pattern
 +
 +_supported_fmt luks
 +_supported_proto file
 +_supported_os Linux
 +
 +SECRET=secret,id=sec0,data=passphrase
 +
 +echo "== measure 1G image file =="
 +echo
-+
++echo "== rewriting cluster part =="
-+$QEMU_IMG measure --object "$SECRET" \
++$QEMU_IO --object $SECRET -c "write -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 +              -O "$IMGFMT" \
 +          -o key-secret=sec0,iter-time=10 \
 +          --size 1G
 +
 +echo
-+echo "== create 1G image file (size should be no greater than measured) =="
++echo "== verify pattern =="
-+echo
++$QEMU_IO --object $SECRET -c "read -P 0 0 512"  --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 +$QEMU_IO --object $SECRET -c "read -P 0xb 512 512"  --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 +
-+_make_test_img 1G
+ echo
-+stat -c "image file size in bytes: %s" "$TEST_IMG_FILE"
+ echo "== rewriting whole image =="
  $QEMU_IO --object $SECRET -c "write -P 0xa 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/qemu-iotests/134.out
 +++ b/tests/qemu-iotests/134.out
@@ -XXX,XX +XXX,XX @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 encryption=on encrypt.
  read 134217728/134217728 bytes at offset 0
 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 +== rewriting cluster part ==
 +wrote 512/512 bytes at offset 512
 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 +
-+echo
++== verify pattern ==
-+echo "== modified 1G image file (size should be no greater than measured) =="
++read 512/512 bytes at offset 0
-+echo
++512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 +read 512/512 bytes at offset 512
 +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 +
-+$QEMU_IO --object "$SECRET" --image-opts "$TEST_IMG" -c "write -P 0x51 0x10000 0x400" | _filter_qemu_io | _filter_testdir
+ == rewriting whole image ==
-+stat -c "image file size in bytes: %s" "$TEST_IMG_FILE"
+ wrote 134217728/134217728 bytes at offset 0
-+
+ 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 +echo
 +echo "== measure preallocation=falloc 1G image file =="
 +echo
 +
 +$QEMU_IMG measure --object "$SECRET" \
 +              -O "$IMGFMT" \
 +          -o key-secret=sec0,iter-time=10,preallocation=falloc \
 +          --size 1G
 +
 +echo
 +echo "== measure with input image file =="
 +echo
 +
 +IMGFMT=raw IMGKEYSECRET= IMGOPTS= _make_test_img 1G | _filter_imgfmt
 +QEMU_IO_OPTIONS= IMGOPTSSYNTAX= $QEMU_IO -f raw -c "write -P 0x51 0x10000 0x400" "$TEST_IMG_FILE" | _filter_qemu_io | _filter_testdir
 +$QEMU_IMG measure --object "$SECRET" \
 +              -O "$IMGFMT" \
 +          -o key-secret=sec0,iter-time=10 \
 +          -f raw \
 +          "$TEST_IMG_FILE"
 +
 +# success, all done
 +echo "*** done"
 +rm -f $seq.full
 +status=0
 diff --git a/tests/qemu-iotests/288.out b/tests/qemu-iotests/288.out
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/tests/qemu-iotests/288.out
@@ -XXX,XX +XXX,XX @@
 +QA output created by 288
 +== measure 1G image file ==
 +
 +required size: 1075810304
 +fully allocated size: 1075810304
 +
 +== create 1G image file (size should be no greater than measured) ==
 +
 +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 +image file size in bytes: 1075810304
 +
 +== modified 1G image file (size should be no greater than measured) ==
 +
 +wrote 1024/1024 bytes at offset 65536
 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 +image file size in bytes: 1075810304
 +
 +== measure preallocation=falloc 1G image file ==
 +
 +required size: 1075810304
 +fully allocated size: 1075810304
 +
 +== measure with input image file ==
 +
 +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 +wrote 1024/1024 bytes at offset 65536
 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 +required size: 1075810304
 +fully allocated size: 1075810304
 +*** done
 diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/qemu-iotests/group
 +++ b/tests/qemu-iotests/group
@@ -XXX,XX +XXX,XX @@
 auto quick
 rw
 rw quick
 +288 quick
 --
-2.24.1
+2.21.0

-[PULL 17/19] block/block-copy: rename start to offset in interfaces
+[Qemu-devel] [PULL v2 4/8] vmdk: Fix comment regarding max l1_size coverage
-From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
+From: Sam Eiderman <shmuel.eiderman@oracle.com>
-offset/bytes pair is more usual naming in block layer, let's use it.
+Commit b0651b8c246d ("vmdk: Move l1_size check into vmdk_add_extent")
 extended the l1_size check from VMDK4 to VMDK3 but did not update the
 default coverage in the moved comment.
-Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
+The previous vmdk4 calculation:
-Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
     (512 * 1024 * 1024) * 512(l2 entries) * 65536(grain) = 16PB
 The added vmdk3 calculation:
     (512 * 1024 * 1024) * 4096(l2 entries) * 512(grain) = 1PB
 Adding the calculation of vmdk3 to the comment.
 In any case, VMware does not offer virtual disks more than 2TB for
 vmdk4/vmdk3 or 64TB for the new undocumented seSparse format which is
 not implemented yet in qemu.
 Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
 Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
 Reviewed-by: Liran Alon <liran.alon@oracle.com>
 Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
 Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
 Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com
 Reviewed-by: yuchenlin <yuchenlin@synology.com>
 Reviewed-by: Max Reitz <mreitz@redhat.com>
-Message-Id: <20200311103004.7649-8-vsementsov@virtuozzo.com>
 Signed-off-by: Max Reitz <mreitz@redhat.com>
 ---
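 The coverage figures quoted in the commit message can be checked with a
 small helper. This is only a sketch: vmdk_l1_coverage() is a hypothetical
 name that does not exist in block/vmdk.c, and "PB" is read here as a
 binary unit (2^50 bytes).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes addressable through an L1 table with l1_entries entries, where
 * each L1 entry points to an L2 table of l2_entries grains and every
 * grain is grain_bytes long.
 * Hypothetical helper for illustration only.
 */
uint64_t vmdk_l1_coverage(uint64_t l1_entries, uint64_t l2_entries,
                          uint64_t grain_bytes)
{
    uint64_t per_l1 = l2_entries * grain_bytes;

    /* Refuse inputs whose product would not fit in 64 bits. */
    assert(per_l1 == 0 || l1_entries <= UINT64_MAX / per_l1);
    return l1_entries * per_l1;
}
```

 With the 512M limit, vmdk_l1_coverage(512ULL * 1024 * 1024, 512, 65536)
 gives 2^54 bytes (16PB) for the VMDK4 defaults, and
 vmdk_l1_coverage(512ULL * 1024 * 1024, 4096, 512) gives 2^50 bytes (1PB)
 for the VMDK3 defaults.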
- block/block-copy.c         | 82 +++++++++++++++++++-------------------
+ block/vmdk.c | 11 ++++++++---
- include/block/block-copy.h |  4 +-
+ 1 file changed, 8 insertions(+), 3 deletions(-)
  2 files changed, 43 insertions(+), 43 deletions(-)
-diff --git a/block/block-copy.c b/block/block-copy.c
+diff --git a/block/vmdk.c b/block/vmdk.c
 index XXXXXXX..XXXXXXX 100644
---- a/block/block-copy.c
+--- a/block/vmdk.c
-+++ b/block/block-copy.c
++++ b/block/vmdk.c
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
- #define BLOCK_COPY_MAX_MEM (128 * MiB)
+         return -EFBIG;
  static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
 -                                                           int64_t start,
 +                                                           int64_t offset,
                                                             int64_t bytes)
  {
      BlockCopyInFlightReq *req;
      QLIST_FOREACH(req, &s->inflight_reqs, list) {
 -        if (start + bytes > req->start && start < req->start + req->bytes) {
 +        if (offset + bytes > req->offset && offset < req->offset + req->bytes) {
              return req;
          }
      }
-@@ -XXX,XX +XXX,XX @@ static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
+     if (l1_size > 512 * 1024 * 1024) {
- }
+-        /* Although with big capacity and small l1_entry_sectors, we can get a
++        /*
- static void coroutine_fn block_copy_wait_inflight_reqs(BlockCopyState *s,
++         * Although with big capacity and small l1_entry_sectors, we can get a
--                                                       int64_t start,
+          * big l1_size, we don't want unbounded value to allocate the table.
-+                                                       int64_t offset,
+-         * Limit it to 512M, which is 16PB for default cluster and L2 table
-                                                        int64_t bytes)
+-         * size */
- {
++         * Limit it to 512M, which is:
-     BlockCopyInFlightReq *req;
++         *     16PB - for default "Hosted Sparse Extent" (VMDK4)
++         *            cluster size: 64KB, L2 table size: 512 entries
--    while ((req = find_conflicting_inflight_req(s, start, bytes))) {
++         *     1PB  - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
-+    while ((req = find_conflicting_inflight_req(s, offset, bytes))) {
++         *            cluster size: 512B, L2 table size: 4096 entries
-         qemu_co_queue_wait(&req->wait_queue, NULL);
++         */
          error_setg(errp, "L1 size too big");
          return -EFBIG;
      }
- }
- static void block_copy_inflight_req_begin(BlockCopyState *s,
-                                           BlockCopyInFlightReq *req,
--                                          int64_t start, int64_t bytes)
-+                                          int64_t offset, int64_t bytes)
- {
--    req->start = start;
-+    req->offset = offset;
-     req->bytes = bytes;
-     qemu_co_queue_init(&req->wait_queue);
-     QLIST_INSERT_HEAD(&s->inflight_reqs, req, list);
-@@ -XXX,XX +XXX,XX @@ void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm)
-  * Returns 0 on success.
-  */
- static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
--                                           int64_t start, int64_t bytes,
-+                                           int64_t offset, int64_t bytes,
-                                            bool zeroes, bool *error_is_read)
- {
-     int ret;
--    int64_t nbytes = MIN(start + bytes, s->len) - start;
-+    int64_t nbytes = MIN(offset + bytes, s->len) - offset;
-     void *bounce_buffer = NULL;
--    assert(start >= 0 && bytes > 0 && INT64_MAX - start >= bytes);
--    assert(QEMU_IS_ALIGNED(start, s->cluster_size));
-+    assert(offset >= 0 && bytes > 0 && INT64_MAX - offset >= bytes);
-+    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
-     assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
--    assert(start < s->len);
--    assert(start + bytes <= s->len ||
--           start + bytes == QEMU_ALIGN_UP(s->len, s->cluster_size));
-+    assert(offset < s->len);
-+    assert(offset + bytes <= s->len ||
-+           offset + bytes == QEMU_ALIGN_UP(s->len, s->cluster_size));
-     assert(nbytes < INT_MAX);
-     if (zeroes) {
--        ret = bdrv_co_pwrite_zeroes(s->target, start, nbytes, s->write_flags &
-+        ret = bdrv_co_pwrite_zeroes(s->target, offset, nbytes, s->write_flags &
-                                     ~BDRV_REQ_WRITE_COMPRESSED);
-         if (ret < 0) {
--            trace_block_copy_write_zeroes_fail(s, start, ret);
-+            trace_block_copy_write_zeroes_fail(s, offset, ret);
-             if (error_is_read) {
-                 *error_is_read = false;
-             }
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
-     }
-     if (s->use_copy_range) {
--        ret = bdrv_co_copy_range(s->source, start, s->target, start, nbytes,
-+        ret = bdrv_co_copy_range(s->source, offset, s->target, offset, nbytes,
-                                  0, s->write_flags);
-         if (ret < 0) {
--            trace_block_copy_copy_range_fail(s, start, ret);
-+            trace_block_copy_copy_range_fail(s, offset, ret);
-             s->use_copy_range = false;
-             s->copy_size = MAX(s->cluster_size, BLOCK_COPY_MAX_BUFFER);
-             /* Fallback to read+write with allocated buffer */
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
-     bounce_buffer = qemu_blockalign(s->source->bs, nbytes);
--    ret = bdrv_co_pread(s->source, start, nbytes, bounce_buffer, 0);
-+    ret = bdrv_co_pread(s->source, offset, nbytes, bounce_buffer, 0);
-     if (ret < 0) {
--        trace_block_copy_read_fail(s, start, ret);
-+        trace_block_copy_read_fail(s, offset, ret);
-         if (error_is_read) {
-             *error_is_read = true;
-         }
-         goto out;
-     }
--    ret = bdrv_co_pwrite(s->target, start, nbytes, bounce_buffer,
-+    ret = bdrv_co_pwrite(s->target, offset, nbytes, bounce_buffer,
-                          s->write_flags);
-     if (ret < 0) {
--        trace_block_copy_write_fail(s, start, ret);
-+        trace_block_copy_write_fail(s, offset, ret);
-         if (error_is_read) {
-             *error_is_read = false;
-         }
-@@ -XXX,XX +XXX,XX @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
- }
- int coroutine_fn block_copy(BlockCopyState *s,
--                            int64_t start, int64_t bytes,
-+                            int64_t offset, int64_t bytes,
-                             bool *error_is_read)
- {
-     int ret = 0;
-@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
-     assert(bdrv_get_aio_context(s->source->bs) ==
-            bdrv_get_aio_context(s->target->bs));
--    assert(QEMU_IS_ALIGNED(start, s->cluster_size));
-+    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
-     assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
--    block_copy_wait_inflight_reqs(s, start, bytes);
--    block_copy_inflight_req_begin(s, &req, start, bytes);
-+    block_copy_wait_inflight_reqs(s, offset, bytes);
-+    block_copy_inflight_req_begin(s, &req, offset, bytes);
-     while (bytes) {
-         int64_t next_zero, cur_bytes, status_bytes;
--        if (!bdrv_dirty_bitmap_get(s->copy_bitmap, start)) {
--            trace_block_copy_skip(s, start);
--            start += s->cluster_size;
-+        if (!bdrv_dirty_bitmap_get(s->copy_bitmap, offset)) {
-+            trace_block_copy_skip(s, offset);
-+            offset += s->cluster_size;
-             bytes -= s->cluster_size;
-             continue; /* already copied */
-         }
-         cur_bytes = MIN(bytes, s->copy_size);
--        next_zero = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, start,
-+        next_zero = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, offset,
-                                                 cur_bytes);
-         if (next_zero >= 0) {
--            assert(next_zero > start); /* start is dirty */
--            assert(next_zero < start + cur_bytes); /* no need to do MIN() */
--            cur_bytes = next_zero - start;
-+            assert(next_zero > offset); /* offset is dirty */
-+            assert(next_zero < offset + cur_bytes); /* no need to do MIN() */
-+            cur_bytes = next_zero - offset;
-         }
--        ret = block_copy_block_status(s, start, cur_bytes, &status_bytes);
-+        ret = block_copy_block_status(s, offset, cur_bytes, &status_bytes);
-         if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
--            bdrv_reset_dirty_bitmap(s->copy_bitmap, start, status_bytes);
-+            bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, status_bytes);
-             progress_set_remaining(s->progress,
-                                    bdrv_get_dirty_count(s->copy_bitmap) +
-                                    s->in_flight_bytes);
--            trace_block_copy_skip_range(s, start, status_bytes);
--            start += status_bytes;
-+            trace_block_copy_skip_range(s, offset, status_bytes);
-+            offset += status_bytes;
-             bytes -= status_bytes;
-             continue;
-         }
-         cur_bytes = MIN(cur_bytes, status_bytes);
--        trace_block_copy_process(s, start);
-+        trace_block_copy_process(s, offset);
--        bdrv_reset_dirty_bitmap(s->copy_bitmap, start, cur_bytes);
-+        bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, cur_bytes);
-         s->in_flight_bytes += cur_bytes;
-         co_get_from_shres(s->mem, cur_bytes);
--        ret = block_copy_do_copy(s, start, cur_bytes, ret & BDRV_BLOCK_ZERO,
-+        ret = block_copy_do_copy(s, offset, cur_bytes, ret & BDRV_BLOCK_ZERO,
-                                  error_is_read);
-         co_put_to_shres(s->mem, cur_bytes);
-         s->in_flight_bytes -= cur_bytes;
-         if (ret < 0) {
--            bdrv_set_dirty_bitmap(s->copy_bitmap, start, cur_bytes);
-+            bdrv_set_dirty_bitmap(s->copy_bitmap, offset, cur_bytes);
-             break;
-         }
-         progress_work_done(s->progress, cur_bytes);
-         s->progress_bytes_callback(cur_bytes, s->progress_opaque);
--        start += cur_bytes;
-+        offset += cur_bytes;
-         bytes -= cur_bytes;
-     }
-diff --git a/include/block/block-copy.h b/include/block/block-copy.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/block/block-copy.h
-+++ b/include/block/block-copy.h
-@@ -XXX,XX +XXX,XX @@
- #include "qemu/co-shared-resource.h"
- typedef struct BlockCopyInFlightReq {
--    int64_t start;
-+    int64_t offset;
-     int64_t bytes;
-     QLIST_ENTRY(BlockCopyInFlightReq) list;
-     CoQueue wait_queue; /* coroutines blocked on this request */
-@@ -XXX,XX +XXX,XX @@ void block_copy_state_free(BlockCopyState *s);
- int64_t block_copy_reset_unallocated(BlockCopyState *s,
-                                      int64_t offset, int64_t *count);
--int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes,
-+int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
-                             bool *error_is_read);
- #endif /* BLOCK_COPY_H */
 --
-2.24.1
+2.21.0

-[PULL 05/19] block/curl: HTTP header fields allow whitespace around values
+[Qemu-devel] [PULL v2 5/8] vmdk: Reduce the max bound for L1 table size
-From: David Edmondson <david.edmondson@oracle.com>
+From: Sam Eiderman <shmuel.eiderman@oracle.com>
-RFC 7230 section 3.2 indicates that whitespace is permitted between
+512M of L1 entries is a very loose bound, only 32M are required to store
-the field name and field value and after the field value.
+the maximal supported VMDK file size of 2TB.
-Signed-off-by: David Edmondson <david.edmondson@oracle.com>
+Fixed qemu-iotest 59# - now failure occurs earlier, on an impossible L1
-Message-Id: <20200224101310.101169-2-david.edmondson@oracle.com>
+table size.
 Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
 Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
 Reviewed-by: Liran Alon <liran.alon@oracle.com>
 Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
 Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
 Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com
 Reviewed-by: Max Reitz <mreitz@redhat.com>
 Signed-off-by: Max Reitz <mreitz@redhat.com>
 ---
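 The new bound can be sanity-checked from the other direction: how many L1
 entries does a given capacity need at the minimal geometry named in the
 code comment (512B clusters, 512-entry L2 tables)? The helpers below are
 a sketch with hypothetical names; only the 32M limit itself comes from
 the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on l1_size introduced by this patch. */
#define VMDK_L1_LIMIT (32ULL * 1024 * 1024)

/*
 * Number of L1 entries required to map capacity_bytes, rounding up.
 * Hypothetical helper for illustration only.
 */
uint64_t vmdk_l1_entries_needed(uint64_t capacity_bytes,
                                uint64_t l2_entries, uint64_t grain_bytes)
{
    uint64_t per_l1 = l2_entries * grain_bytes;

    assert(per_l1 > 0);
    return capacity_bytes / per_l1 + (capacity_bytes % per_l1 != 0);
}

/* Mirrors the "l1_size > 32 * 1024 * 1024" rejection in the hunk below. */
int vmdk_l1_size_ok(uint64_t l1_size)
{
    return l1_size <= VMDK_L1_LIMIT;
}
```

 Under these assumptions a 2TB image needs 8M entries and an 8TB image
 needs exactly 32M, so every VMDK3/VMDK4 image within the supported size
 passes the check, while one more entry is rejected.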
- block/curl.c | 31 +++++++++++++++++++++++++++----
+ block/vmdk.c               | 13 +++++++------
- 1 file changed, 27 insertions(+), 4 deletions(-)
+ tests/qemu-iotests/059.out |  2 +-
  2 files changed, 8 insertions(+), 7 deletions(-)
-diff --git a/block/curl.c b/block/curl.c
+diff --git a/block/vmdk.c b/block/vmdk.c
 index XXXXXXX..XXXXXXX 100644
---- a/block/curl.c
+--- a/block/vmdk.c
-+++ b/block/curl.c
++++ b/block/vmdk.c
-@@ -XXX,XX +XXX,XX @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
+@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
- {
+         error_setg(errp, "Invalid granularity, image may be corrupt");
-     BDRVCURLState *s = opaque;
+         return -EFBIG;
      size_t realsize = size * nmemb;
 -    const char *accept_line = "Accept-Ranges: bytes";
 +    const char *header = (char *)ptr;
 +    const char *end = header + realsize;
 +    const char *accept_ranges = "Accept-Ranges:";
 +    const char *bytes = "bytes";
 -    if (realsize >= strlen(accept_line)
 -        && strncmp((char *)ptr, accept_line, strlen(accept_line)) == 0) {
 -        s->accept_range = true;
 +    if (realsize >= strlen(accept_ranges)
 +        && strncmp(header, accept_ranges, strlen(accept_ranges)) == 0) {
 +
 +        char *p = strchr(header, ':') + 1;
 +
 +        /* Skip whitespace between the header name and value. */
 +        while (p < end && *p && g_ascii_isspace(*p)) {
 +            p++;
 +        }
 +
 +        if (end - p >= strlen(bytes)
 +            && strncmp(p, bytes, strlen(bytes)) == 0) {
 +
 +            /* Check that there is nothing but whitespace after the value. */
 +            p += strlen(bytes);
 +            while (p < end && *p && g_ascii_isspace(*p)) {
 +                p++;
 +            }
 +
 +            if (p == end || !*p) {
 +                s->accept_range = true;
 +            }
 +        }
      }
+-    if (l1_size > 512 * 1024 * 1024) {
-     return realsize;
++    if (l1_size > 32 * 1024 * 1024) {
          /*
           * Although with big capacity and small l1_entry_sectors, we can get a
           * big l1_size, we don't want unbounded value to allocate the table.
 -         * Limit it to 512M, which is:
 -         *     16PB - for default "Hosted Sparse Extent" (VMDK4)
 -         *            cluster size: 64KB, L2 table size: 512 entries
 -         *     1PB  - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
 -         *            cluster size: 512B, L2 table size: 4096 entries
 +         * Limit it to 32M, which is enough to store:
 +         *     8TB  - for both VMDK3 & VMDK4 with
 +         *            minimal cluster size: 512B
 +         *            minimal L2 table size: 512 entries
 +         *            8 TB is still more than the maximal value supported for
 +         *            VMDK3 & VMDK4 which is 2TB.
           */
          error_setg(errp, "L1 size too big");
          return -EFBIG;
 diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/qemu-iotests/059.out
 +++ b/tests/qemu-iotests/059.out
@@ -XXX,XX +XXX,XX @@ Offset          Length          Mapped to       File
  0x140000000     0x10000         0x50000         TEST_DIR/t-s003.vmdk
  === Testing afl image with a very large capacity ===
 -qemu-img: Can't get image size 'TEST_DIR/afl9.IMGFMT': File too large
 +qemu-img: Could not open 'TEST_DIR/afl9.IMGFMT': L1 size too big
  *** done
 --
-2.24.1
+2.21.0

-[PULL 01/19] luks: extract qcrypto_block_calculate_payload_offset()
+[Qemu-devel] [PULL v2 6/8] vmdk: Add read-only support for seSparse snapshots
-From: Stefan Hajnoczi <stefanha@redhat.com>
+From: Sam Eiderman <shmuel.eiderman@oracle.com>
-The qcow2 .bdrv_measure() code calculates the crypto payload offset.
+Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
-This logic really belongs in crypto/block.c where it can be reused by
+QEMU).
-other image formats.
+This format was lacking in the following:
-The "luks" block driver will need this same logic in order to implement
-.bdrv_measure(), so extract the qcrypto_block_calculate_payload_offset()
+    * Grain directory (L1) and grain table (L2) entries were 32-bit,
-function now.
+      allowing access to only 2TB (slightly less) of data.
+    * The grain size (default) was 512 bytes - leading to data
-Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
+      fragmentation and many grain tables.
-Reviewed-by: Max Reitz <mreitz@redhat.com>
+    * For space reclamation purposes, it was necessary to find all the
-Message-Id: <20200221112522.1497712-2-stefanha@redhat.com>
+      grains which are not pointed to by any grain table - so a reverse
       mapping of "offset of grain in vmdk" to "grain table" must be
       constructed - which takes large amounts of CPU/RAM.
 The format specification can be found in VMware's documentation:
 https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf
 In ESXi 6.5, to support snapshot files larger than 2TB, a new format was
 introduced: SESparse (Space Efficient).
 This format fixes the above issues:
     * All entries are now 64-bit.
     * The grain size (default) is 4KB.
     * Grain directory and grain tables are now located at the beginning
       of the file.
       + seSparse format reserves space for all grain tables.
       + Grain tables can be addressed using an index.
       + Grains are located at the end of the file and can also be
         addressed with an index.
       - seSparse vmdks of large disks (64TB) have huge preallocated
         headers - mainly due to L2 tables, even for empty snapshots.
     * The header contains a reverse mapping ("backmap") of "offset of
       grain in vmdk" to "grain table" and a bitmap ("free bitmap") which
       specifies for each grain - whether it is allocated or not.
       Using these data structures we can implement space reclamation
       efficiently.
     * Due to the fact that the header now maintains two mappings:
         * The regular one (grain directory & grain tables)
         * A reverse one (backmap and free bitmap)
       These data structures can lose consistency upon crash and result
       in a corrupted VMDK.
       Therefore, a journal is also added to the VMDK and is replayed
       when VMware reopens the file after a crash.
 Since ESXi 6.7 - SESparse is the only snapshot format available.
 Unfortunately, VMware does not provide documentation regarding the new
 seSparse format.
 This commit is based on black-box research of the seSparse format.
 Various in-guest block operations and their effect on the snapshot file
 were tested.
 The only VMware provided source of information (regarding the underlying
 implementation) was a log file on the ESXi:
     /var/log/hostd.log
 Whenever an seSparse snapshot is created - the log is being populated
 with seSparse records.
 Relevant log records are of the form:
 [...] Const Header:
 [...]  constMagic     = 0xcafebabe
 [...]  version        = 2.1
 [...]  capacity       = 204800
 [...]  grainSize      = 8
 [...]  grainTableSize = 64
 [...]  flags          = 0
 [...] Extents:
 [...]  Header         : <1 : 1>
 [...]  JournalHdr     : <2 : 2>
 [...]  Journal        : <2048 : 2048>
 [...]  GrainDirectory : <4096 : 2048>
 [...]  GrainTables    : <6144 : 2048>
 [...]  FreeBitmap     : <8192 : 2048>
 [...]  BackMap        : <10240 : 2048>
 [...]  Grain          : <12288 : 204800>
 [...] Volatile Header:
 [...] volatileMagic     = 0xcafecafe
 [...] FreeGTNumber      = 0
 [...] nextTxnSeqNumber  = 0
 [...] replayJournal     = 0
 The sizes that are seen in the log file are in sectors.
 Extents are of the following format: <offset : size>
 This commit is a strict implementation which enforces:
     * magics
     * version number 2.1
     * grain size of 8 sectors  (4KB)
     * grain table size of 64 sectors
     * zero flags
     * extent locations
 Additionally, this commit provides only a subset of the functionality
 offered by seSparse's format:
     * Read-only
     * No journal replay
     * No space reclamation
     * No unmap support
 Hence, journal header, journal, free bitmap and backmap extents are
 unused, only the "classic" (L1 -> L2 -> data) grain access is
 implemented.
 However there are several differences in the grain access itself.
 Grain directory (L1):
     * Grain directory entries are indexes (not offsets) to grain
       tables.
     * Valid grain directory entries have their highest nibble set to
       0x1.
     * Since grain tables are always located at the beginning of the
       file - the index can fit into 32 bits - so we can use its low
       part if it's valid.
 Grain table (L2):
     * Grain table entries are indexes (not offsets) to grains.
     * If the highest nibble of the entry is:
         0x0:
             The grain is not allocated.
             The rest of the bytes are 0.
         0x1:
             The grain is unmapped - guest sees a zero grain.
             The rest of the bits point to the previously mapped grain,
             see 0x3 case.
         0x2:
             The grain is zero.
         0x3:
             The grain is allocated - to get the index calculate:
             ((entry & 0x0fff000000000000) >> 48) |
             ((entry & 0x0000ffffffffffff) << 12)
     * The difference between 0x1 and 0x2 is that 0x1 is an unallocated
       grain which results from the guest using sg_unmap to unmap the
       grain - but the grain itself still exists in the grain extent - a
       space reclamation procedure should delete it.
       Unmapping a zero grain has no effect (0x2 will not change to 0x1)
       but unmapping an unallocated grain will (0x0 to 0x1) - naturally.
 In order to implement seSparse some fields had to be changed to support
 both 32-bit and 64-bit entry sizes.
 Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
 Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
 Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
 Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
 Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com
 Signed-off-by: Max Reitz <mreitz@redhat.com>
 ---
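 The entry encodings described in the commit message can be sketched as
 follows. All names here are hypothetical and are not the functions added
 to block/vmdk.c; the bit layout is taken only from the description above,
 which is itself based on black-box research.

```c
#include <assert.h>
#include <stdint.h>

/* States of a grain table (L2) entry, keyed by its highest nibble. */
enum sesparse_grain_state {
    SESPARSE_GRAIN_UNALLOCATED, /* 0x0 */
    SESPARSE_GRAIN_UNMAPPED,    /* 0x1: guest sees zeroes */
    SESPARSE_GRAIN_ZERO,        /* 0x2 */
    SESPARSE_GRAIN_ALLOCATED,   /* 0x3 */
    SESPARSE_GRAIN_INVALID      /* any other nibble */
};

enum sesparse_grain_state sesparse_gt_entry_state(uint64_t entry)
{
    switch (entry >> 60) {
    case 0x0: return SESPARSE_GRAIN_UNALLOCATED;
    case 0x1: return SESPARSE_GRAIN_UNMAPPED;
    case 0x2: return SESPARSE_GRAIN_ZERO;
    case 0x3: return SESPARSE_GRAIN_ALLOCATED;
    default:  return SESPARSE_GRAIN_INVALID;
    }
}

/* Grain index of an allocated (0x3) grain table entry. */
uint64_t sesparse_gt_entry_index(uint64_t entry)
{
    assert((entry >> 60) == 0x3);
    return ((entry & 0x0fff000000000000ULL) >> 48) |
           ((entry & 0x0000ffffffffffffULL) << 12);
}

/* A grain directory (L1) entry is valid when its highest nibble is 0x1. */
int sesparse_gd_entry_valid(uint64_t entry)
{
    return (entry >> 60) == 0x1;
}

/* Grain table index: the low 32 bits of a valid grain directory entry. */
uint32_t sesparse_gd_entry_index(uint64_t entry)
{
    assert(sesparse_gd_entry_valid(entry));
    return (uint32_t)entry;
}
```

 For example, the grain table entry 0x3abc000000000001 decodes to grain
 index 0x1abc: bits 48-59 supply the low 12 bits of the index and bits
 0-47 supply the rest.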
- block/qcow2.c          | 74 +++++++++++-------------------------------
+ block/vmdk.c | 358 ++++++++++++++++++++++++++++++++++++++++++++++++---
- crypto/block.c         | 36 ++++++++++++++++++++
+ 1 file changed, 342 insertions(+), 16 deletions(-)
- include/crypto/block.h | 22 +++++++++++++
- 3 files changed, 77 insertions(+), 55 deletions(-)
+diff --git a/block/vmdk.c b/block/vmdk.c
 diff --git a/block/qcow2.c b/block/qcow2.c
 index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2.c
+--- a/block/vmdk.c
-+++ b/block/qcow2.c
++++ b/block/vmdk.c
-@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
+@@ -XXX,XX +XXX,XX @@ typedef struct {
      uint16_t compressAlgorithm;
  } QEMU_PACKED VMDK4Header;
 +typedef struct VMDKSESparseConstHeader {
 +    uint64_t magic;
 +    uint64_t version;
 +    uint64_t capacity;
 +    uint64_t grain_size;
 +    uint64_t grain_table_size;
 +    uint64_t flags;
 +    uint64_t reserved1;
 +    uint64_t reserved2;
 +    uint64_t reserved3;
 +    uint64_t reserved4;
 +    uint64_t volatile_header_offset;
 +    uint64_t volatile_header_size;
 +    uint64_t journal_header_offset;
 +    uint64_t journal_header_size;
 +    uint64_t journal_offset;
 +    uint64_t journal_size;
 +    uint64_t grain_dir_offset;
 +    uint64_t grain_dir_size;
 +    uint64_t grain_tables_offset;
 +    uint64_t grain_tables_size;
 +    uint64_t free_bitmap_offset;
 +    uint64_t free_bitmap_size;
 +    uint64_t backmap_offset;
 +    uint64_t backmap_size;
 +    uint64_t grains_offset;
 +    uint64_t grains_size;
 +    uint8_t pad[304];
 +} QEMU_PACKED VMDKSESparseConstHeader;
 +
 +typedef struct VMDKSESparseVolatileHeader {
 +    uint64_t magic;
 +    uint64_t free_gt_number;
 +    uint64_t next_txn_seq_number;
 +    uint64_t replay_journal;
 +    uint8_t pad[480];
 +} QEMU_PACKED VMDKSESparseVolatileHeader;
 +
  #define L2_CACHE_SIZE 16
  typedef struct VmdkExtent {
@@ -XXX,XX +XXX,XX @@ typedef struct VmdkExtent {
      bool compressed;
      bool has_marker;
      bool has_zero_grain;
 +    bool sesparse;
 +    uint64_t sesparse_l2_tables_offset;
 +    uint64_t sesparse_clusters_offset;
 +    int32_t entry_size;
      int version;
      int64_t sectors;
      int64_t end_sector;
      int64_t flat_start_offset;
      int64_t l1_table_offset;
      int64_t l1_backup_table_offset;
 -    uint32_t *l1_table;
 +    void *l1_table;
      uint32_t *l1_backup_table;
      unsigned int l1_size;
      uint32_t l1_entry_sectors;
      unsigned int l2_size;
 -    uint32_t *l2_cache;
 +    void *l2_cache;
      uint32_t l2_cache_offsets[L2_CACHE_SIZE];
      uint32_t l2_cache_counts[L2_CACHE_SIZE];
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
           *            minimal L2 table size: 512 entries
           *            8 TB is still more than the maximal value supported for
           *            VMDK3 & VMDK4 which is 2TB.
 +         *     64TB - for "ESXi seSparse Extent"
 +         *            minimal cluster size: 512B (default is 4KB)
 +         *            L2 table size: 4096 entries (const).
 +         *            64TB is more than the maximal value supported for
 +         *            seSparse VMDKs (which is slightly less than 64TB)
           */
          error_setg(errp, "L1 size too big");
          return -EFBIG;
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
      extent->l2_size = l2_size;
      extent->cluster_sectors = flat ? sectors : cluster_sectors;
      extent->next_cluster_sector = ROUND_UP(nb_sectors, cluster_sectors);
 +    extent->entry_size = sizeof(uint32_t);
      if (s->num_extents > 1) {
          extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
      int i;
      /* read the L1 table */
 -    l1_size = extent->l1_size * sizeof(uint32_t);
 +    l1_size = extent->l1_size * extent->entry_size;
      extent->l1_table = g_try_malloc(l1_size);
      if (l1_size && extent->l1_table == NULL) {
          return -ENOMEM;
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
          goto fail_l1;
      }
      for (i = 0; i < extent->l1_size; i++) {
 -        le32_to_cpus(&extent->l1_table[i]);
 +        if (extent->entry_size == sizeof(uint64_t)) {
 +            le64_to_cpus((uint64_t *)extent->l1_table + i);
 +        } else {
 +            assert(extent->entry_size == sizeof(uint32_t));
 +            le32_to_cpus((uint32_t *)extent->l1_table + i);
 +        }
      }
      if (extent->l1_backup_table_offset) {
 +        assert(!extent->sesparse);
          extent->l1_backup_table = g_try_malloc(l1_size);
          if (l1_size && extent->l1_backup_table == NULL) {
              ret = -ENOMEM;
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
      }
      extent->l2_cache =
 -        g_new(uint32_t, extent->l2_size * L2_CACHE_SIZE);
 +        g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE);
      return 0;
   fail_l1b:
      g_free(extent->l1_backup_table);
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
      return ret;
  }
--static ssize_t qcow2_measure_crypto_hdr_init_func(QCryptoBlock *block,
++#define SESPARSE_CONST_HEADER_MAGIC UINT64_C(0x00000000cafebabe)
--        size_t headerlen, void *opaque, Error **errp)
++#define SESPARSE_VOLATILE_HEADER_MAGIC UINT64_C(0x00000000cafecafe)
--{
++
--    size_t *headerlenp = opaque;
++/* Strict checks - format not officially documented */
--
++static int check_se_sparse_const_header(VMDKSESparseConstHeader *header,
--    /* Stash away the payload size */
++                                        Error **errp)
 -    *headerlenp = headerlen;
 -    return 0;
 -}
 -
 -static ssize_t qcow2_measure_crypto_hdr_write_func(QCryptoBlock *block,
 -        size_t offset, const uint8_t *buf, size_t buflen,
 -        void *opaque, Error **errp)
 -{
 -    /* Discard the bytes, we're not actually writing to an image */
 -    return buflen;
 -}
 -
 -/* Determine the number of bytes for the LUKS payload */
 -static bool qcow2_measure_luks_headerlen(QemuOpts *opts, size_t *len,
 -                                         Error **errp)
 -{
 -    QDict *opts_qdict;
 -    QDict *cryptoopts_qdict;
 -    QCryptoBlockCreateOptions *cryptoopts;
 -    QCryptoBlock *crypto;
 -
 -    /* Extract "encrypt." options into a qdict */
 -    opts_qdict = qemu_opts_to_qdict(opts, NULL);
 -    qdict_extract_subqdict(opts_qdict, &cryptoopts_qdict, "encrypt.");
 -    qobject_unref(opts_qdict);
 -
 -    /* Build QCryptoBlockCreateOptions object from qdict */
 -    qdict_put_str(cryptoopts_qdict, "format", "luks");
 -    cryptoopts = block_crypto_create_opts_init(cryptoopts_qdict, errp);
 -    qobject_unref(cryptoopts_qdict);
 -    if (!cryptoopts) {
 -        return false;
 -    }
 -
 -    /* Fake LUKS creation in order to determine the payload size */
 -    crypto = qcrypto_block_create(cryptoopts, "encrypt.",
 -                                  qcow2_measure_crypto_hdr_init_func,
 -                                  qcow2_measure_crypto_hdr_write_func,
 -                                  len, errp);
 -    qapi_free_QCryptoBlockCreateOptions(cryptoopts);
 -    if (!crypto) {
 -        return false;
 -    }
 -
 -    qcrypto_block_free(crypto);
 -    return true;
 -}
 -
  static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
                                         Error **errp)
  {
@@ -XXX,XX +XXX,XX @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
      g_free(optstr);
      if (has_luks) {
 +        g_autoptr(QCryptoBlockCreateOptions) create_opts = NULL;
 +        QDict *opts_qdict;
 +        QDict *cryptoopts;
          size_t headerlen;
 -        if (!qcow2_measure_luks_headerlen(opts, &headerlen, &local_err)) {
 +        opts_qdict = qemu_opts_to_qdict(opts, NULL);
 +        qdict_extract_subqdict(opts_qdict, &cryptoopts, "encrypt.");
 +        qobject_unref(opts_qdict);
 +
 +        qdict_put_str(cryptoopts, "format", "luks");
 +
 +        create_opts = block_crypto_create_opts_init(cryptoopts, errp);
 +        qobject_unref(cryptoopts);
 +        if (!create_opts) {
 +            goto err;
 +        }
 +
 +        if (!qcrypto_block_calculate_payload_offset(create_opts,
 +                                                    "encrypt.",
 +                                                    &headerlen,
 +                                                    &local_err)) {
              goto err;
          }
 diff --git a/crypto/block.c b/crypto/block.c
 index XXXXXXX..XXXXXXX 100644
 --- a/crypto/block.c
 +++ b/crypto/block.c
@@ -XXX,XX +XXX,XX @@ QCryptoBlock *qcrypto_block_create(QCryptoBlockCreateOptions *options,
  }
 +static ssize_t qcrypto_block_headerlen_hdr_init_func(QCryptoBlock *block,
 +        size_t headerlen, void *opaque, Error **errp)
 +{
-+    size_t *headerlenp = opaque;
++    header->magic = le64_to_cpu(header->magic);
-+
++    header->version = le64_to_cpu(header->version);
-+    /* Stash away the payload size */
++    header->grain_size = le64_to_cpu(header->grain_size);
-+    *headerlenp = headerlen;
++    header->grain_table_size = le64_to_cpu(header->grain_table_size);
 +    header->flags = le64_to_cpu(header->flags);
 +    header->reserved1 = le64_to_cpu(header->reserved1);
 +    header->reserved2 = le64_to_cpu(header->reserved2);
 +    header->reserved3 = le64_to_cpu(header->reserved3);
 +    header->reserved4 = le64_to_cpu(header->reserved4);
 +
 +    header->volatile_header_offset =
 +        le64_to_cpu(header->volatile_header_offset);
 +    header->volatile_header_size = le64_to_cpu(header->volatile_header_size);
 +
 +    header->journal_header_offset = le64_to_cpu(header->journal_header_offset);
 +    header->journal_header_size = le64_to_cpu(header->journal_header_size);
 +
 +    header->journal_offset = le64_to_cpu(header->journal_offset);
 +    header->journal_size = le64_to_cpu(header->journal_size);
 +
 +    header->grain_dir_offset = le64_to_cpu(header->grain_dir_offset);
 +    header->grain_dir_size = le64_to_cpu(header->grain_dir_size);
 +
 +    header->grain_tables_offset = le64_to_cpu(header->grain_tables_offset);
 +    header->grain_tables_size = le64_to_cpu(header->grain_tables_size);
 +
 +    header->free_bitmap_offset = le64_to_cpu(header->free_bitmap_offset);
 +    header->free_bitmap_size = le64_to_cpu(header->free_bitmap_size);
 +
 +    header->backmap_offset = le64_to_cpu(header->backmap_offset);
 +    header->backmap_size = le64_to_cpu(header->backmap_size);
 +
 +    header->grains_offset = le64_to_cpu(header->grains_offset);
 +    header->grains_size = le64_to_cpu(header->grains_size);
 +
 +    if (header->magic != SESPARSE_CONST_HEADER_MAGIC) {
 +        error_setg(errp, "Bad const header magic: 0x%016" PRIx64,
 +                   header->magic);
 +        return -EINVAL;
 +    }
 +
 +    if (header->version != 0x0000000200000001) {
 +        error_setg(errp, "Unsupported version: 0x%016" PRIx64,
 +                   header->version);
 +        return -ENOTSUP;
 +    }
 +
 +    if (header->grain_size != 8) {
 +        error_setg(errp, "Unsupported grain size: %" PRIu64,
 +                   header->grain_size);
 +        return -ENOTSUP;
 +    }
 +
 +    if (header->grain_table_size != 64) {
 +        error_setg(errp, "Unsupported grain table size: %" PRIu64,
 +                   header->grain_table_size);
 +        return -ENOTSUP;
 +    }
 +
 +    if (header->flags != 0) {
 +        error_setg(errp, "Unsupported flags: 0x%016" PRIx64,
 +                   header->flags);
 +        return -ENOTSUP;
 +    }
 +
 +    if (header->reserved1 != 0 || header->reserved2 != 0 ||
 +        header->reserved3 != 0 || header->reserved4 != 0) {
 +        error_setg(errp, "Unsupported reserved bits:"
 +                   " 0x%016" PRIx64 " 0x%016" PRIx64
 +                   " 0x%016" PRIx64 " 0x%016" PRIx64,
 +                   header->reserved1, header->reserved2,
 +                   header->reserved3, header->reserved4);
 +        return -ENOTSUP;
 +    }
 +
 +    /* check that padding is 0 */
 +    if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
 +        error_setg(errp, "Unsupported non-zero const header padding");
 +        return -ENOTSUP;
 +    }
 +
 +    return 0;
 +}
 +
-+
++static int check_se_sparse_volatile_header(VMDKSESparseVolatileHeader *header,
-+static ssize_t qcrypto_block_headerlen_hdr_write_func(QCryptoBlock *block,
++                                           Error **errp)
 +        size_t offset, const uint8_t *buf, size_t buflen,
 +        void *opaque, Error **errp)
 +{
-+    /* Discard the bytes, we're not actually writing to an image */
++    header->magic = le64_to_cpu(header->magic);
-+    return buflen;
++    header->free_gt_number = le64_to_cpu(header->free_gt_number);
 +    header->next_txn_seq_number = le64_to_cpu(header->next_txn_seq_number);
 +    header->replay_journal = le64_to_cpu(header->replay_journal);
 +
 +    if (header->magic != SESPARSE_VOLATILE_HEADER_MAGIC) {
 +        error_setg(errp, "Bad volatile header magic: 0x%016" PRIx64,
 +                   header->magic);
 +        return -EINVAL;
 +    }
 +
 +    if (header->replay_journal) {
 +        error_setg(errp, "Image is dirty, Replaying journal not supported");
 +        return -ENOTSUP;
 +    }
 +
 +    /* check that padding is 0 */
 +    if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
 +        error_setg(errp, "Unsupported non-zero volatile header padding");
 +        return -ENOTSUP;
 +    }
 +
 +    return 0;
 +}
 +
-+
++static int vmdk_open_se_sparse(BlockDriverState *bs,
-+bool
++                               BdrvChild *file,
-+qcrypto_block_calculate_payload_offset(QCryptoBlockCreateOptions *create_opts,
++                               int flags, Error **errp)
 +                                       const char *optprefix,
 +                                       size_t *len,
 +                                       Error **errp)
 +{
-+    /* Fake LUKS creation in order to determine the payload size */
++    int ret;
-+    g_autoptr(QCryptoBlock) crypto =
++    VMDKSESparseConstHeader const_header;
-+        qcrypto_block_create(create_opts, optprefix,
++    VMDKSESparseVolatileHeader volatile_header;
-+                             qcrypto_block_headerlen_hdr_init_func,
++    VmdkExtent *extent;
-+                             qcrypto_block_headerlen_hdr_write_func,
++
-+                             len, errp);
++    ret = bdrv_apply_auto_read_only(bs,
-+    return crypto != NULL;
++            "No write support for seSparse images available", errp);
 +    if (ret < 0) {
 +        return ret;
 +    }
 +
 +    assert(sizeof(const_header) == SECTOR_SIZE);
 +
 +    ret = bdrv_pread(file, 0, &const_header, sizeof(const_header));
 +    if (ret < 0) {
 +        bdrv_refresh_filename(file->bs);
 +        error_setg_errno(errp, -ret,
 +                         "Could not read const header from file '%s'",
 +                         file->bs->filename);
 +        return ret;
 +    }
 +
 +    /* check const header */
 +    ret = check_se_sparse_const_header(&const_header, errp);
 +    if (ret < 0) {
 +        return ret;
 +    }
 +
 +    assert(sizeof(volatile_header) == SECTOR_SIZE);
 +
 +    ret = bdrv_pread(file,
 +                     const_header.volatile_header_offset * SECTOR_SIZE,
 +                     &volatile_header, sizeof(volatile_header));
 +    if (ret < 0) {
 +        bdrv_refresh_filename(file->bs);
 +        error_setg_errno(errp, -ret,
 +                         "Could not read volatile header from file '%s'",
 +                         file->bs->filename);
 +        return ret;
 +    }
 +
 +    /* check volatile header */
 +    ret = check_se_sparse_volatile_header(&volatile_header, errp);
 +    if (ret < 0) {
 +        return ret;
 +    }
 +
 +    ret = vmdk_add_extent(bs, file, false,
 +                          const_header.capacity,
 +                          const_header.grain_dir_offset * SECTOR_SIZE,
 +                          0,
 +                          const_header.grain_dir_size *
 +                          SECTOR_SIZE / sizeof(uint64_t),
 +                          const_header.grain_table_size *
 +                          SECTOR_SIZE / sizeof(uint64_t),
 +                          const_header.grain_size,
 +                          &extent,
 +                          errp);
 +    if (ret < 0) {
 +        return ret;
 +    }
 +
 +    extent->sesparse = true;
 +    extent->sesparse_l2_tables_offset = const_header.grain_tables_offset;
 +    extent->sesparse_clusters_offset = const_header.grains_offset;
 +    extent->entry_size = sizeof(uint64_t);
 +
 +    ret = vmdk_init_tables(bs, extent, errp);
 +    if (ret) {
 +        /* free extent allocated by vmdk_add_extent */
 +        vmdk_free_last_extent(bs);
 +    }
 +
 +    return ret;
 +}
 +
-+
+ static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
- QCryptoBlockInfo *qcrypto_block_get_info(QCryptoBlock *block,
+                                QDict *options, Error **errp);
-                                          Error **errp)
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
           * RW [size in sectors] SPARSE "file-name.vmdk"
           * RW [size in sectors] VMFS "file-name.vmdk"
           * RW [size in sectors] VMFSSPARSE "file-name.vmdk"
 +         * RW [size in sectors] SESPARSE "file-name.vmdk"
           */
          flat_offset = -1;
          matches = sscanf(p, "%10s %" SCNd64 " %10s \"%511[^\n\r\"]\" %" SCNd64,
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
          if (sectors <= 0 ||
              (strcmp(type, "FLAT") && strcmp(type, "SPARSE") &&
 -             strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE")) ||
 +             strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE") &&
 +             strcmp(type, "SESPARSE")) ||
              (strcmp(access, "RW"))) {
              continue;
          }
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
                  return ret;
              }
              extent = &s->extents[s->num_extents - 1];
 +        } else if (!strcmp(type, "SESPARSE")) {
 +            ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
 +            if (ret) {
 +                bdrv_unref_child(bs, extent_file);
 +                return ret;
 +            }
 +            extent = &s->extents[s->num_extents - 1];
          } else {
              error_setg(errp, "Unsupported extent type '%s'", type);
              bdrv_unref_child(bs, extent_file);
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
      if (strcmp(ct, "monolithicFlat") &&
          strcmp(ct, "vmfs") &&
          strcmp(ct, "vmfsSparse") &&
 +        strcmp(ct, "seSparse") &&
          strcmp(ct, "twoGbMaxExtentSparse") &&
          strcmp(ct, "twoGbMaxExtentFlat")) {
          error_setg(errp, "Unsupported image type '%s'", ct);
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
  {
-diff --git a/include/crypto/block.h b/include/crypto/block.h
+     unsigned int l1_index, l2_offset, l2_index;
-index XXXXXXX..XXXXXXX 100644
+     int min_index, i, j;
---- a/include/crypto/block.h
+-    uint32_t min_count, *l2_table;
-+++ b/include/crypto/block.h
++    uint32_t min_count;
-@@ -XXX,XX +XXX,XX @@ QCryptoBlock *qcrypto_block_create(QCryptoBlockCreateOptions *options,
++    void *l2_table;
-                                    Error **errp);
+     bool zeroed = false;
+     int64_t ret;
+     int64_t cluster_sector;
-+/**
++    unsigned int l2_size_bytes = extent->l2_size * extent->entry_size;
-+ * qcrypto_block_calculate_payload_offset:
-+ * @create_opts: the encryption options
+     if (m_data) {
-+ * @optprefix: name prefix for options
+         m_data->valid = 0;
-+ * @len: output for number of header bytes before payload
+@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
-+ * @errp: pointer to a NULL-initialized error object
+     if (l1_index >= extent->l1_size) {
-+ *
+         return VMDK_ERROR;
-+ * Calculate the number of header bytes before the payload in an encrypted
+     }
-+ * storage volume.  The header is an area before the payload that is reserved
+-    l2_offset = extent->l1_table[l1_index];
-+ * for encryption metadata.
++    if (extent->sesparse) {
-+ *
++        uint64_t l2_offset_u64;
-+ * Returns: true on success, false on error
++
-+ */
++        assert(extent->entry_size == sizeof(uint64_t));
-+bool
++
-+qcrypto_block_calculate_payload_offset(QCryptoBlockCreateOptions *create_opts,
++        l2_offset_u64 = ((uint64_t *)extent->l1_table)[l1_index];
-+                                       const char *optprefix,
++        if (l2_offset_u64 == 0) {
-+                                       size_t *len,
++            l2_offset = 0;
-+                                       Error **errp);
++        } else if ((l2_offset_u64 & 0xffffffff00000000) != 0x1000000000000000) {
-+
++            /*
-+
++             * Top most nibble is 0x1 if grain table is allocated.
- /**
++             * strict check - top most 4 bytes must be 0x10000000 since max
-  * qcrypto_block_get_info:
++             * supported size is 64TB for disk - so no more than 64TB / 16MB
-  * @block: the block encryption object
++             * grain directories which is smaller than uint32,
-@@ -XXX,XX +XXX,XX @@ uint64_t qcrypto_block_get_sector_size(QCryptoBlock *block);
++             * where 16MB is the only supported default grain table coverage.
- void qcrypto_block_free(QCryptoBlock *block);
++             */
++            return VMDK_ERROR;
- G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoBlock, qcrypto_block_free)
++        } else {
-+G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoBlockCreateOptions,
++            l2_offset_u64 = l2_offset_u64 & 0x00000000ffffffff;
-+                              qapi_free_QCryptoBlockCreateOptions)
++            l2_offset_u64 = extent->sesparse_l2_tables_offset +
++                l2_offset_u64 * l2_size_bytes / SECTOR_SIZE;
- #endif /* QCRYPTO_BLOCK_H */
++            if (l2_offset_u64 > 0x00000000ffffffff) {
 +                return VMDK_ERROR;
 +            }
 +            l2_offset = (unsigned int)(l2_offset_u64);
 +        }
 +    } else {
 +        assert(extent->entry_size == sizeof(uint32_t));
 +        l2_offset = ((uint32_t *)extent->l1_table)[l1_index];
 +    }
      if (!l2_offset) {
          return VMDK_UNALLOC;
      }
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
                      extent->l2_cache_counts[j] >>= 1;
                  }
              }
 -            l2_table = extent->l2_cache + (i * extent->l2_size);
 +            l2_table = (char *)extent->l2_cache + (i * l2_size_bytes);
              goto found;
          }
      }
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
              min_index = i;
          }
      }
 -    l2_table = extent->l2_cache + (min_index * extent->l2_size);
 +    l2_table = (char *)extent->l2_cache + (min_index * l2_size_bytes);
      BLKDBG_EVENT(extent->file, BLKDBG_L2_LOAD);
      if (bdrv_pread(extent->file,
                  (int64_t)l2_offset * 512,
                  l2_table,
 -                extent->l2_size * sizeof(uint32_t)
 -            ) != extent->l2_size * sizeof(uint32_t)) {
 +                l2_size_bytes
 +            ) != l2_size_bytes) {
          return VMDK_ERROR;
      }
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
      extent->l2_cache_counts[min_index] = 1;
   found:
      l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
 -    cluster_sector = le32_to_cpu(l2_table[l2_index]);
 -    if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
 -        zeroed = true;
 +    if (extent->sesparse) {
 +        cluster_sector = le64_to_cpu(((uint64_t *)l2_table)[l2_index]);
 +        switch (cluster_sector & 0xf000000000000000) {
 +        case 0x0000000000000000:
 +            /* unallocated grain */
 +            if (cluster_sector != 0) {
 +                return VMDK_ERROR;
 +            }
 +            break;
 +        case 0x1000000000000000:
 +            /* scsi-unmapped grain - fallthrough */
 +        case 0x2000000000000000:
 +            /* zero grain */
 +            zeroed = true;
 +            break;
 +        case 0x3000000000000000:
 +            /* allocated grain */
 +            cluster_sector = (((cluster_sector & 0x0fff000000000000) >> 48) |
 +                              ((cluster_sector & 0x0000ffffffffffff) << 12));
 +            cluster_sector = extent->sesparse_clusters_offset +
 +                cluster_sector * extent->cluster_sectors;
 +            break;
 +        default:
 +            return VMDK_ERROR;
 +        }
 +    } else {
 +        cluster_sector = le32_to_cpu(((uint32_t *)l2_table)[l2_index]);
 +
 +        if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
 +            zeroed = true;
 +        }
      }
      if (!cluster_sector || zeroed) {
          if (!allocate) {
              return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
          }
 +        assert(!extent->sesparse);
          if (extent->next_cluster_sector >= VMDK_EXTENT_MAX_SECTORS) {
              return VMDK_ERROR;
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
              m_data->l1_index = l1_index;
              m_data->l2_index = l2_index;
              m_data->l2_offset = l2_offset;
 -            m_data->l2_cache_entry = &l2_table[l2_index];
 +            m_data->l2_cache_entry = ((uint32_t *)l2_table) + l2_index;
          }
      }
      *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
@@ -XXX,XX +XXX,XX @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
          if (!extent) {
              return -EIO;
          }
 +        if (extent->sesparse) {
 +            return -ENOTSUP;
 +        }
          offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
          n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
                               - offset_in_cluster);
 --
-2.24.1
+2.21.0

-[PULL 02/19] luks: implement .bdrv_measure()
+[Qemu-devel] [PULL v2 7/8] ssh: switch from libssh2 to libssh
-From: Stefan Hajnoczi <stefanha@redhat.com>
+From: Pino Toscano <ptoscano@redhat.com>
-Add qemu-img measure support in the "luks" block driver.
+Rewrite the implementation of the ssh block driver to use libssh instead
 of libssh2.  The libssh library has various advantages over libssh2:
 - easier API for authentication (for example for using ssh-agent)
 - easier API for known_hosts handling
 - supports newer types of keys in known_hosts
-Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
+Use APIs/features available in libssh 0.8 conditionally, to support
-Reviewed-by: Max Reitz <mreitz@redhat.com>
+older versions (which are not recommended though).
-Message-Id: <20200221112522.1497712-3-stefanha@redhat.com>
 Adjust the iotest 207 according to the different error message, and to
 find the default key type for localhost (to properly compare the
 fingerprint with).
 Contributed-by: Max Reitz <mreitz@redhat.com>
 Adjust the various Docker/Travis scripts to use libssh when available
 instead of libssh2. The mingw/mxe testing is dropped for now, as there
 are no packages for it.
 Signed-off-by: Pino Toscano <ptoscano@redhat.com>
 Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Acked-by: Alex Bennée <alex.bennee@linaro.org>
 Message-id: 20190620200840.17655-1-ptoscano@redhat.com
 Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Message-id: 5873173.t2JhDm7DL7@lindworm.usersys.redhat.com
 Signed-off-by: Max Reitz <mreitz@redhat.com>
 ---
- block/crypto.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
+ configure                                     |  65 +-
- 1 file changed, 62 insertions(+)
+ block/Makefile.objs                           |   6 +-
  block/ssh.c                                   | 652 ++++++++++--------
  .travis.yml                                   |   4 +-
  block/trace-events                            |  14 +-
  docs/qemu-block-drivers.texi                  |   2 +-
  .../dockerfiles/debian-win32-cross.docker     |   1 -
  .../dockerfiles/debian-win64-cross.docker     |   1 -
  tests/docker/dockerfiles/fedora.docker        |   4 +-
  tests/docker/dockerfiles/ubuntu.docker        |   2 +-
  tests/docker/dockerfiles/ubuntu1804.docker    |   2 +-
  tests/qemu-iotests/207                        |  54 +-
  tests/qemu-iotests/207.out                    |   2 +-
  13 files changed, 449 insertions(+), 360 deletions(-)
-diff --git a/block/crypto.c b/block/crypto.c
+diff --git a/configure b/configure
 index XXXXXXX..XXXXXXX 100755
 --- a/configure
 +++ b/configure
@@ -XXX,XX +XXX,XX @@ auth_pam=""
  vte=""
  virglrenderer=""
  tpm=""
 -libssh2=""
 +libssh=""
  live_block_migration="yes"
  numa=""
  tcmalloc="no"
@@ -XXX,XX +XXX,XX @@ for opt do
    ;;
    --enable-tpm) tpm="yes"
    ;;
 -  --disable-libssh2) libssh2="no"
 +  --disable-libssh) libssh="no"
    ;;
 -  --enable-libssh2) libssh2="yes"
 +  --enable-libssh) libssh="yes"
    ;;
    --disable-live-block-migration) live_block_migration="no"
    ;;
@@ -XXX,XX +XXX,XX @@ disabled with --disable-FEATURE, default is enabled if available:
    coroutine-pool  coroutine freelist (better performance)
    glusterfs       GlusterFS backend
    tpm             TPM support
 -  libssh2         ssh block device support
 +  libssh          ssh block device support
    numa            libnuma support
    libxml2         for Parallels image format
    tcmalloc        tcmalloc support
@@ -XXX,XX +XXX,XX @@ EOF
  fi
  ##########################################
 -# libssh2 probe
 -min_libssh2_version=1.2.8
 -if test "$libssh2" != "no" ; then
 -  if $pkg_config --atleast-version=$min_libssh2_version libssh2; then
 -    libssh2_cflags=$($pkg_config libssh2 --cflags)
 -    libssh2_libs=$($pkg_config libssh2 --libs)
 -    libssh2=yes
 +# libssh probe
 +if test "$libssh" != "no" ; then
 +  if $pkg_config --exists libssh; then
 +    libssh_cflags=$($pkg_config libssh --cflags)
 +    libssh_libs=$($pkg_config libssh --libs)
 +    libssh=yes
    else
 -    if test "$libssh2" = "yes" ; then
 -      error_exit "libssh2 >= $min_libssh2_version required for --enable-libssh2"
 +    if test "$libssh" = "yes" ; then
 +      error_exit "libssh required for --enable-libssh"
      fi
 -    libssh2=no
 +    libssh=no
    fi
  fi
  ##########################################
 -# libssh2_sftp_fsync probe
 +# Check for libssh 0.8
 +# This is done like this instead of using the LIBSSH_VERSION_* and
 +# SSH_VERSION_* macros because some distributions in the past shipped
 +# snapshots of the future 0.8 from Git, and those snapshots did not
 +# have updated version numbers (still referring to 0.7.0).
 -if test "$libssh2" = "yes"; then
 +if test "$libssh" = "yes"; then
    cat > $TMPC <<EOF
 -#include <stdio.h>
 -#include <libssh2.h>
 -#include <libssh2_sftp.h>
 -int main(void) {
 -    LIBSSH2_SESSION *session;
 -    LIBSSH2_SFTP *sftp;
 -    LIBSSH2_SFTP_HANDLE *sftp_handle;
 -    session = libssh2_session_init ();
 -    sftp = libssh2_sftp_init (session);
 -    sftp_handle = libssh2_sftp_open (sftp, "/", 0, 0);
 -    libssh2_sftp_fsync (sftp_handle);
 -    return 0;
 -}
 +#include <libssh/libssh.h>
 +int main(void) { return ssh_get_server_publickey(NULL, NULL); }
  EOF
 -  # libssh2_cflags/libssh2_libs defined in previous test.
 -  if compile_prog "$libssh2_cflags" "$libssh2_libs" ; then
 -    QEMU_CFLAGS="-DHAS_LIBSSH2_SFTP_FSYNC $QEMU_CFLAGS"
 +  if compile_prog "$libssh_cflags" "$libssh_libs"; then
 +    libssh_cflags="-DHAVE_LIBSSH_0_8 $libssh_cflags"
    fi
  fi
@@ -XXX,XX +XXX,XX @@ echo "GlusterFS support $glusterfs"
  echo "gcov              $gcov_tool"
  echo "gcov enabled      $gcov"
  echo "TPM support       $tpm"
 -echo "libssh2 support   $libssh2"
 +echo "libssh support    $libssh"
  echo "QOM debugging     $qom_cast_debug"
  echo "Live block migration $live_block_migration"
  echo "lzo support       $lzo"
@@ -XXX,XX +XXX,XX @@ if test "$glusterfs_iocb_has_stat" = "yes" ; then
    echo "CONFIG_GLUSTERFS_IOCB_HAS_STAT=y" >> $config_host_mak
  fi
 -if test "$libssh2" = "yes" ; then
 -  echo "CONFIG_LIBSSH2=m" >> $config_host_mak
 -  echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak
 -  echo "LIBSSH2_LIBS=$libssh2_libs" >> $config_host_mak
 +if test "$libssh" = "yes" ; then
 +  echo "CONFIG_LIBSSH=m" >> $config_host_mak
 +  echo "LIBSSH_CFLAGS=$libssh_cflags" >> $config_host_mak
 +  echo "LIBSSH_LIBS=$libssh_libs" >> $config_host_mak
  fi
  if test "$live_block_migration" = "yes" ; then
 diff --git a/block/Makefile.objs b/block/Makefile.objs
 index XXXXXXX..XXXXXXX 100644
---- a/block/crypto.c
+--- a/block/Makefile.objs
-+++ b/block/crypto.c
++++ b/block/Makefile.objs
-@@ -XXX,XX +XXX,XX @@ static int64_t block_crypto_getlength(BlockDriverState *bs)
+@@ -XXX,XX +XXX,XX @@ block-obj-$(CONFIG_CURL) += curl.o
  block-obj-$(CONFIG_RBD) += rbd.o
  block-obj-$(CONFIG_GLUSTERFS) += gluster.o
  block-obj-$(CONFIG_VXHS) += vxhs.o
 -block-obj-$(CONFIG_LIBSSH2) += ssh.o
 +block-obj-$(CONFIG_LIBSSH) += ssh.o
  block-obj-y += accounting.o dirty-bitmap.o
  block-obj-y += write-threshold.o
  block-obj-y += backup.o
@@ -XXX,XX +XXX,XX @@ rbd.o-libs         := $(RBD_LIBS)
  gluster.o-cflags   := $(GLUSTERFS_CFLAGS)
  gluster.o-libs     := $(GLUSTERFS_LIBS)
  vxhs.o-libs        := $(VXHS_LIBS)
 -ssh.o-cflags       := $(LIBSSH2_CFLAGS)
 -ssh.o-libs         := $(LIBSSH2_LIBS)
 +ssh.o-cflags       := $(LIBSSH_CFLAGS)
 +ssh.o-libs         := $(LIBSSH_LIBS)
  block-obj-dmg-bz2-$(CONFIG_BZIP2) += dmg-bz2.o
  block-obj-$(if $(CONFIG_DMG),m,n) += $(block-obj-dmg-bz2-y)
  dmg-bz2.o-libs     := $(BZIP2_LIBS)
 diff --git a/block/ssh.c b/block/ssh.c
 index XXXXXXX..XXXXXXX 100644
 --- a/block/ssh.c
 +++ b/block/ssh.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/osdep.h"
 -#include <libssh2.h>
 -#include <libssh2_sftp.h>
 +#include <libssh/libssh.h>
 +#include <libssh/sftp.h>
  #include "block/block_int.h"
  #include "block/qdict.h"
@@ -XXX,XX +XXX,XX @@
  #include "trace.h"
  /*
 - * TRACE_LIBSSH2=<bitmask> enables tracing in libssh2 itself.  Note
 - * that this requires that libssh2 was specially compiled with the
 - * `./configure --enable-debug' option, so most likely you will have
 - * to compile it yourself.  The meaning of <bitmask> is described
 - * here: http://www.libssh2.org/libssh2_trace.html
 + * TRACE_LIBSSH=<level> enables tracing in libssh itself.
 + * The meaning of <level> is described here:
 + * http://api.libssh.org/master/group__libssh__log.html
   */
 -#define TRACE_LIBSSH2 0 /* or try: LIBSSH2_TRACE_SFTP */
 +#define TRACE_LIBSSH  0 /* see: SSH_LOG_* */
  typedef struct BDRVSSHState {
      /* Coroutine. */
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVSSHState {
      /* SSH connection. */
      int sock;                         /* socket */
 -    LIBSSH2_SESSION *session;         /* ssh session */
 -    LIBSSH2_SFTP *sftp;               /* sftp session */
 -    LIBSSH2_SFTP_HANDLE *sftp_handle; /* sftp remote file handle */
 +    ssh_session session;              /* ssh session */
 +    sftp_session sftp;                /* sftp session */
 +    sftp_file sftp_handle;            /* sftp remote file handle */
 -    /* See ssh_seek() function below. */
 -    int64_t offset;
 -    bool offset_op_read;
 -
 -    /* File attributes at open.  We try to keep the .filesize field
 +    /*
 +     * File attributes at open.  We try to keep the .size field
       * updated if it changes (eg by writing at the end of the file).
       */
 -    LIBSSH2_SFTP_ATTRIBUTES attrs;
 +    sftp_attributes attrs;
      InetSocketAddress *inet;
@@ -XXX,XX +XXX,XX @@ static void ssh_state_init(BDRVSSHState *s)
  {
      memset(s, 0, sizeof *s);
      s->sock = -1;
 -    s->offset = -1;
      qemu_co_mutex_init(&s->lock);
  }
+@@ -XXX,XX +XXX,XX @@ static void ssh_state_free(BDRVSSHState *s)
-+static BlockMeasureInfo *block_crypto_measure(QemuOpts *opts,
+ {
-+                                              BlockDriverState *in_bs,
+     g_free(s->user);
-+                                              Error **errp)
-+{
++    if (s->attrs) {
-+    g_autoptr(QCryptoBlockCreateOptions) create_opts = NULL;
++        sftp_attributes_free(s->attrs);
-+    Error *local_err = NULL;
++    }
-+    BlockMeasureInfo *info;
+     if (s->sftp_handle) {
-+    uint64_t size;
+-        libssh2_sftp_close(s->sftp_handle);
-+    size_t luks_payload_size;
++        sftp_close(s->sftp_handle);
-+    QDict *cryptoopts;
+     }
      if (s->sftp) {
 -        libssh2_sftp_shutdown(s->sftp);
 +        sftp_free(s->sftp);
      }
      if (s->session) {
 -        libssh2_session_disconnect(s->session,
 -                                   "from qemu ssh client: "
 -                                   "user closed the connection");
 -        libssh2_session_free(s->session);
 -    }
 -    if (s->sock >= 0) {
 -        close(s->sock);
 +        ssh_disconnect(s->session);
 +        ssh_free(s->session); /* This frees s->sock */
      }
  }
@@ -XXX,XX +XXX,XX @@ session_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
      va_end(args);
      if (s->session) {
 -        char *ssh_err;
 +        const char *ssh_err;
          int ssh_err_code;
 -        /* This is not an errno.  See <libssh2.h>. */
 -        ssh_err_code = libssh2_session_last_error(s->session,
 -                                                  &ssh_err, NULL, 0);
 -        error_setg(errp, "%s: %s (libssh2 error code: %d)",
 +        /* This is not an errno.  See <libssh/libssh.h>. */
 +        ssh_err = ssh_get_error(s->session);
 +        ssh_err_code = ssh_get_error_code(s->session);
 +        error_setg(errp, "%s: %s (libssh error code: %d)",
                     msg, ssh_err, ssh_err_code);
      } else {
          error_setg(errp, "%s", msg);
@@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
      va_end(args);
      if (s->sftp) {
 -        char *ssh_err;
 +        const char *ssh_err;
          int ssh_err_code;
 -        unsigned long sftp_err_code;
 +        int sftp_err_code;
 -        /* This is not an errno.  See <libssh2.h>. */
 -        ssh_err_code = libssh2_session_last_error(s->session,
 -                                                  &ssh_err, NULL, 0);
 -        /* See <libssh2_sftp.h>. */
 -        sftp_err_code = libssh2_sftp_last_error((s)->sftp);
 +        /* This is not an errno.  See <libssh/libssh.h>. */
 +        ssh_err = ssh_get_error(s->session);
 +        ssh_err_code = ssh_get_error_code(s->session);
 +        /* See <libssh/sftp.h>. */
 +        sftp_err_code = sftp_get_error(s->sftp);
          error_setg(errp,
 -                   "%s: %s (libssh2 error code: %d, sftp error code: %lu)",
 +                   "%s: %s (libssh error code: %d, sftp error code: %d)",
                     msg, ssh_err, ssh_err_code, sftp_err_code);
      } else {
          error_setg(errp, "%s", msg);
@@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
  static void sftp_error_trace(BDRVSSHState *s, const char *op)
  {
 -    char *ssh_err;
 +    const char *ssh_err;
      int ssh_err_code;
 -    unsigned long sftp_err_code;
 +    int sftp_err_code;
 -    /* This is not an errno.  See <libssh2.h>. */
 -    ssh_err_code = libssh2_session_last_error(s->session,
 -                                              &ssh_err, NULL, 0);
 -    /* See <libssh2_sftp.h>. */
 -    sftp_err_code = libssh2_sftp_last_error((s)->sftp);
 +    /* This is not an errno.  See <libssh/libssh.h>. */
 +    ssh_err = ssh_get_error(s->session);
 +    ssh_err_code = ssh_get_error_code(s->session);
 +    /* See <libssh/sftp.h>. */
 +    sftp_err_code = sftp_get_error(s->sftp);
      trace_sftp_error(op, ssh_err, ssh_err_code, sftp_err_code);
  }
@@ -XXX,XX +XXX,XX @@ static void ssh_parse_filename(const char *filename, QDict *options,
      parse_uri(filename, options, errp);
  }
 -static int check_host_key_knownhosts(BDRVSSHState *s,
 -                                     const char *host, int port, Error **errp)
 +static int check_host_key_knownhosts(BDRVSSHState *s, Error **errp)
  {
 -    const char *home;
 -    char *knh_file = NULL;
 -    LIBSSH2_KNOWNHOSTS *knh = NULL;
 -    struct libssh2_knownhost *found;
 -    int ret, r;
 -    const char *hostkey;
 -    size_t len;
 -    int type;
 -
 -    hostkey = libssh2_session_hostkey(s->session, &len, &type);
 -    if (!hostkey) {
 +    int ret;
 +#ifdef HAVE_LIBSSH_0_8
 +    enum ssh_known_hosts_e state;
 +    int r;
 +    ssh_key pubkey;
 +    enum ssh_keytypes_e pubkey_type;
 +    unsigned char *server_hash = NULL;
 +    size_t server_hash_len;
 +    char *fingerprint = NULL;
 +
 +    state = ssh_session_is_known_server(s->session);
 +    trace_ssh_server_status(state);
 +
 +    switch (state) {
 +    case SSH_KNOWN_HOSTS_OK:
 +        /* OK */
 +        trace_ssh_check_host_key_knownhosts();
 +        break;
 +    case SSH_KNOWN_HOSTS_CHANGED:
          ret = -EINVAL;
 -        session_error_setg(errp, s, "failed to read remote host key");
 +        r = ssh_get_server_publickey(s->session, &pubkey);
 +        if (r == 0) {
 +            r = ssh_get_publickey_hash(pubkey, SSH_PUBLICKEY_HASH_SHA256,
 +                                       &server_hash, &server_hash_len);
 +            pubkey_type = ssh_key_type(pubkey);
 +            ssh_key_free(pubkey);
 +        }
 +        if (r == 0) {
 +            fingerprint = ssh_get_fingerprint_hash(SSH_PUBLICKEY_HASH_SHA256,
 +                                                   server_hash,
 +                                                   server_hash_len);
 +            ssh_clean_pubkey_hash(&server_hash);
 +        }
 +        if (fingerprint) {
 +            error_setg(errp,
 +                       "host key (%s key with fingerprint %s) does not match "
 +                       "the one in known_hosts; this may be a possible attack",
 +                       ssh_key_type_to_char(pubkey_type), fingerprint);
 +            ssh_string_free_char(fingerprint);
 +        } else  {
 +            error_setg(errp,
 +                       "host key does not match the one in known_hosts; this "
 +                       "may be a possible attack");
 +        }
          goto out;
 -    }
 -
 -    knh = libssh2_knownhost_init(s->session);
 -    if (!knh) {
 +    case SSH_KNOWN_HOSTS_OTHER:
          ret = -EINVAL;
 -        session_error_setg(errp, s,
 -                           "failed to initialize known hosts support");
 +        error_setg(errp,
 +                   "host key for this server not found, another type exists");
 +        goto out;
 +    case SSH_KNOWN_HOSTS_UNKNOWN:
 +        ret = -EINVAL;
 +        error_setg(errp, "no host key was found in known_hosts");
 +        goto out;
 +    case SSH_KNOWN_HOSTS_NOT_FOUND:
 +        ret = -ENOENT;
 +        error_setg(errp, "known_hosts file not found");
 +        goto out;
 +    case SSH_KNOWN_HOSTS_ERROR:
 +        ret = -EINVAL;
 +        error_setg(errp, "error while checking the host");
 +        goto out;
 +    default:
 +        ret = -EINVAL;
 +        error_setg(errp, "error while checking for known server (%d)", state);
          goto out;
      }
 +#else /* !HAVE_LIBSSH_0_8 */
 +    int state;
 -    home = getenv("HOME");
 -    if (home) {
 -        knh_file = g_strdup_printf("%s/.ssh/known_hosts", home);
 -    } else {
 -        knh_file = g_strdup_printf("/root/.ssh/known_hosts");
 -    }
 -
 -    /* Read all known hosts from OpenSSH-style known_hosts file. */
 -    libssh2_knownhost_readfile(knh, knh_file, LIBSSH2_KNOWNHOST_FILE_OPENSSH);
 +    state = ssh_is_server_known(s->session);
 +    trace_ssh_server_status(state);
 -    r = libssh2_knownhost_checkp(knh, host, port, hostkey, len,
 -                                 LIBSSH2_KNOWNHOST_TYPE_PLAIN|
 -                                 LIBSSH2_KNOWNHOST_KEYENC_RAW,
 -                                 &found);
 -    switch (r) {
 -    case LIBSSH2_KNOWNHOST_CHECK_MATCH:
 +    switch (state) {
 +    case SSH_SERVER_KNOWN_OK:
          /* OK */
 -        trace_ssh_check_host_key_knownhosts(found->key);
 +        trace_ssh_check_host_key_knownhosts();
          break;
 -    case LIBSSH2_KNOWNHOST_CHECK_MISMATCH:
 +    case SSH_SERVER_KNOWN_CHANGED:
          ret = -EINVAL;
 -        session_error_setg(errp, s,
 -                      "host key does not match the one in known_hosts"
 -                      " (found key %s)", found->key);
 +        error_setg(errp,
 +                   "host key does not match the one in known_hosts; this "
 +                   "may be a possible attack");
          goto out;
 -    case LIBSSH2_KNOWNHOST_CHECK_NOTFOUND:
 +    case SSH_SERVER_FOUND_OTHER:
          ret = -EINVAL;
 -        session_error_setg(errp, s, "no host key was found in known_hosts");
 +        error_setg(errp,
 +                   "host key for this server not found, another type exists");
 +        goto out;
 +    case SSH_SERVER_FILE_NOT_FOUND:
 +        ret = -ENOENT;
 +        error_setg(errp, "known_hosts file not found");
          goto out;
 -    case LIBSSH2_KNOWNHOST_CHECK_FAILURE:
 +    case SSH_SERVER_NOT_KNOWN:
          ret = -EINVAL;
 -        session_error_setg(errp, s,
 -                      "failure matching the host key with known_hosts");
 +        error_setg(errp, "no host key was found in known_hosts");
 +        goto out;
 +    case SSH_SERVER_ERROR:
 +        ret = -EINVAL;
 +        error_setg(errp, "server error");
          goto out;
      default:
          ret = -EINVAL;
 -        session_error_setg(errp, s, "unknown error matching the host key"
 -                      " with known_hosts (%d)", r);
 +        error_setg(errp, "error while checking for known server (%d)", state);
          goto out;
      }
 +#endif /* !HAVE_LIBSSH_0_8 */
      /* known_hosts checking successful. */
      ret = 0;
   out:
 -    if (knh != NULL) {
 -        libssh2_knownhost_free(knh);
 -    }
 -    g_free(knh_file);
      return ret;
  }
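
Reviewer note: both branches of check_host_key_knownhosts() above reduce to one mapping from a known-hosts state to an errno-style return value plus a message. A minimal standalone sketch of that mapping follows, assuming a local enum in place of libssh's enum ssh_known_hosts_e; the names known_hosts_state, KH_* and known_hosts_state_to_errno are hypothetical and are not part of the patch or of libssh.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for libssh's enum ssh_known_hosts_e (libssh >= 0.8). */
enum known_hosts_state {
    KH_OK,
    KH_CHANGED,
    KH_OTHER,
    KH_UNKNOWN,
    KH_NOT_FOUND,
    KH_ERROR,
};

/*
 * Map a known-hosts state to the errno-style value returned by the patch.
 * If msg is non-NULL, *msg receives the matching message (NULL on success).
 */
static int known_hosts_state_to_errno(int state, const char **msg)
{
    const char *m;
    int ret = -EINVAL;          /* every failure except NOT_FOUND */

    switch (state) {
    case KH_OK:
        m = NULL;
        ret = 0;
        break;
    case KH_CHANGED:
        m = "host key does not match the one in known_hosts";
        break;
    case KH_OTHER:
        m = "host key for this server not found, another type exists";
        break;
    case KH_UNKNOWN:
        m = "no host key was found in known_hosts";
        break;
    case KH_NOT_FOUND:
        m = "known_hosts file not found";
        ret = -ENOENT;          /* the only case that is not -EINVAL */
        break;
    case KH_ERROR:
        m = "error while checking the host";
        break;
    default:
        m = "error while checking for known server";
        break;
    }
    if (msg) {
        *msg = m;
    }
    return ret;
}
```

Only the missing known_hosts file maps to -ENOENT; every other failure, including unrecognized states, maps to -EINVAL, which matches both the HAVE_LIBSSH_0_8 and the fallback branch.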
@@ -XXX,XX +XXX,XX @@ static int compare_fingerprint(const unsigned char *fingerprint, size_t len,
  static int
  check_host_key_hash(BDRVSSHState *s, const char *hash,
 -                    int hash_type, size_t fingerprint_len, Error **errp)
 +                    enum ssh_publickey_hash_type type, Error **errp)
  {
 -    const char *fingerprint;
 -
 -    fingerprint = libssh2_hostkey_hash(s->session, hash_type);
 -    if (!fingerprint) {
 +    int r;
 +    ssh_key pubkey;
 +    unsigned char *server_hash;
 +    size_t server_hash_len;
 +
 +#ifdef HAVE_LIBSSH_0_8
 +    r = ssh_get_server_publickey(s->session, &pubkey);
 +#else
 +    r = ssh_get_publickey(s->session, &pubkey);
 +#endif
 +    if (r != SSH_OK) {
          session_error_setg(errp, s, "failed to read remote host key");
          return -EINVAL;
      }
 -    if(compare_fingerprint((unsigned char *) fingerprint, fingerprint_len,
 -                           hash) != 0) {
 +    r = ssh_get_publickey_hash(pubkey, type, &server_hash, &server_hash_len);
 +    ssh_key_free(pubkey);
 +    if (r != 0) {
 +        session_error_setg(errp, s,
 +                           "failed reading the hash of the server SSH key");
 +        return -EINVAL;
 +    }
 +
 +    r = compare_fingerprint(server_hash, server_hash_len, hash);
 +    ssh_clean_pubkey_hash(&server_hash);
 +    if (r != 0) {
          error_setg(errp, "remote host key does not match host_key_check '%s'",
                     hash);
          return -EPERM;
@@ -XXX,XX +XXX,XX @@ check_host_key_hash(BDRVSSHState *s, const char *hash,
      return 0;
  }
 -static int check_host_key(BDRVSSHState *s, const char *host, int port,
 -                          SshHostKeyCheck *hkc, Error **errp)
 +static int check_host_key(BDRVSSHState *s, SshHostKeyCheck *hkc, Error **errp)
  {
      SshHostKeyCheckMode mode;
@@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port,
      case SSH_HOST_KEY_CHECK_MODE_HASH:
          if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_MD5) {
              return check_host_key_hash(s, hkc->u.hash.hash,
 -                                       LIBSSH2_HOSTKEY_HASH_MD5, 16, errp);
 +                                       SSH_PUBLICKEY_HASH_MD5, errp);
          } else if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_SHA1) {
              return check_host_key_hash(s, hkc->u.hash.hash,
 -                                       LIBSSH2_HOSTKEY_HASH_SHA1, 20, errp);
 +                                       SSH_PUBLICKEY_HASH_SHA1, errp);
          }
          g_assert_not_reached();
          break;
      case SSH_HOST_KEY_CHECK_MODE_KNOWN_HOSTS:
 -        return check_host_key_knownhosts(s, host, port, errp);
 +        return check_host_key_knownhosts(s, errp);
      default:
          g_assert_not_reached();
      }
@@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port,
      return -EINVAL;
  }
 -static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
 +static int authenticate(BDRVSSHState *s, Error **errp)
  {
      int r, ret;
 -    const char *userauthlist;
 -    LIBSSH2_AGENT *agent = NULL;
 -    struct libssh2_agent_publickey *identity;
 -    struct libssh2_agent_publickey *prev_identity = NULL;
 +    int method;
 -    userauthlist = libssh2_userauth_list(s->session, user, strlen(user));
 -    if (strstr(userauthlist, "publickey") == NULL) {
 +    /* Try to authenticate with the "none" method. */
 +    r = ssh_userauth_none(s->session, NULL);
 +    if (r == SSH_AUTH_ERROR) {
          ret = -EPERM;
 -        error_setg(errp,
 -                "remote server does not support \"publickey\" authentication");
 +        session_error_setg(errp, s, "failed to authenticate using none "
 +                                    "authentication");
          goto out;
 -    }
 -
 -    /* Connect to ssh-agent and try each identity in turn. */
 -    agent = libssh2_agent_init(s->session);
 -    if (!agent) {
 -        ret = -EINVAL;
 -        session_error_setg(errp, s, "failed to initialize ssh-agent support");
 -        goto out;
 -    }
 -    if (libssh2_agent_connect(agent)) {
 -        ret = -ECONNREFUSED;
 -        session_error_setg(errp, s, "failed to connect to ssh-agent");
 -        goto out;
 -    }
 -    if (libssh2_agent_list_identities(agent)) {
 -        ret = -EINVAL;
 -        session_error_setg(errp, s,
 -                           "failed requesting identities from ssh-agent");
 +    } else if (r == SSH_AUTH_SUCCESS) {
 +        /* Authenticated! */
 +        ret = 0;
          goto out;
      }
 -    for(;;) {
 -        r = libssh2_agent_get_identity(agent, &identity, prev_identity);
 -        if (r == 1) {           /* end of list */
 -            break;
 -        }
 -        if (r < 0) {
 +    method = ssh_userauth_list(s->session, NULL);
 +    trace_ssh_auth_methods(method);
 +
 +    /*
-+     * Preallocation mode doesn't affect size requirements but we must consume
++     * Try to authenticate with publickey, using the ssh-agent
-+     * the option.
++     * if available.
 +     */
-+    g_free(qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC));
++    if (method & SSH_AUTH_METHOD_PUBLICKEY) {
-+
++        r = ssh_userauth_publickey_auto(s->session, NULL, NULL);
-+    size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
++        if (r == SSH_AUTH_ERROR) {
-+
+             ret = -EINVAL;
-+    if (in_bs) {
+-            session_error_setg(errp, s,
-+        int64_t ssize = bdrv_getlength(in_bs);
+-                               "failed to obtain identity from ssh-agent");
-+
++            session_error_setg(errp, s, "failed to authenticate using "
-+        if (ssize < 0) {
++                                        "publickey authentication");
-+            error_setg_errno(&local_err, -ssize,
+             goto out;
-+                             "Unable to get image virtual_size");
+-        }
 -        r = libssh2_agent_userauth(agent, user, identity);
 -        if (r == 0) {
 +        } else if (r == SSH_AUTH_SUCCESS) {
              /* Authenticated! */
              ret = 0;
              goto out;
          }
 -        /* Failed to authenticate with this identity, try the next one. */
 -        prev_identity = identity;
      }
      ret = -EPERM;
@@ -XXX,XX +XXX,XX @@ static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
                 "and the identities held by your ssh-agent");
   out:
 -    if (agent != NULL) {
 -        /* Note: libssh2 implementation implicitly calls
 -         * libssh2_agent_disconnect if necessary.
 -         */
 -        libssh2_agent_free(agent);
 -    }
 -
      return ret;
  }
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
                            int ssh_flags, int creat_mode, Error **errp)
  {
      int r, ret;
 -    long port = 0;
 +    unsigned int port = 0;
 +    int new_sock = -1;
      if (opts->has_user) {
          s->user = g_strdup(opts->user);
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
      s->inet = opts->server;
      opts->server = NULL;
 -    if (qemu_strtol(s->inet->port, NULL, 10, &port) < 0) {
 +    if (qemu_strtoui(s->inet->port, NULL, 10, &port) < 0) {
          error_setg(errp, "Use only numeric port value");
          ret = -EINVAL;
          goto err;
      }
      /* Open the socket and connect. */
 -    s->sock = inet_connect_saddr(s->inet, errp);
 -    if (s->sock < 0) {
 +    new_sock = inet_connect_saddr(s->inet, errp);
 +    if (new_sock < 0) {
          ret = -EIO;
          goto err;
      }
 +    /*
 +     * Try to disable the Nagle algorithm on TCP sockets to reduce latency,
 +     * but do not fail if it cannot be disabled.
 +     */
 +    r = socket_set_nodelay(new_sock);
 +    if (r < 0) {
 +        warn_report("can't set TCP_NODELAY for the ssh server %s: %s",
 +                    s->inet->host, strerror(errno));
 +    }
 +
      /* Create SSH session. */
 -    s->session = libssh2_session_init();
 +    s->session = ssh_new();
      if (!s->session) {
          ret = -EINVAL;
 -        session_error_setg(errp, s, "failed to initialize libssh2 session");
 +        session_error_setg(errp, s, "failed to initialize libssh session");
          goto err;
      }
 -#if TRACE_LIBSSH2 != 0
 -    libssh2_trace(s->session, TRACE_LIBSSH2);
 -#endif
 +    /*
 +     * Make sure we are in blocking mode during the connection and
 +     * authentication phases.
 +     */
 +    ssh_set_blocking(s->session, 1);
 -    r = libssh2_session_handshake(s->session, s->sock);
 -    if (r != 0) {
 +    r = ssh_options_set(s->session, SSH_OPTIONS_USER, s->user);
 +    if (r < 0) {
 +        ret = -EINVAL;
 +        session_error_setg(errp, s,
 +                           "failed to set the user in the libssh session");
 +        goto err;
 +    }
 +
 +    r = ssh_options_set(s->session, SSH_OPTIONS_HOST, s->inet->host);
 +    if (r < 0) {
 +        ret = -EINVAL;
 +        session_error_setg(errp, s,
 +                           "failed to set the host in the libssh session");
 +        goto err;
 +    }
 +
 +    if (port > 0) {
 +        r = ssh_options_set(s->session, SSH_OPTIONS_PORT, &port);
 +        if (r < 0) {
 +            ret = -EINVAL;
 +            session_error_setg(errp, s,
 +                               "failed to set the port in the libssh session");
 +            goto err;
 +        }
-+
-+        size = ssize;
 +    }
 +
-+    cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
++    r = ssh_options_set(s->session, SSH_OPTIONS_COMPRESSION, "none");
-+            &block_crypto_create_opts_luks, true);
++    if (r < 0) {
-+    qdict_put_str(cryptoopts, "format", "luks");
++        ret = -EINVAL;
-+    create_opts = block_crypto_create_opts_init(cryptoopts, &local_err);
++        session_error_setg(errp, s,
-+    qobject_unref(cryptoopts);
++                           "failed to disable the compression in the libssh "
-+    if (!create_opts) {
++                           "session");
 +        goto err;
 +    }
 +
-+    if (!qcrypto_block_calculate_payload_offset(create_opts, NULL,
++    /* Read ~/.ssh/config. */
-+                                                &luks_payload_size,
++    r = ssh_options_parse_config(s->session, NULL);
-+                                                &local_err)) {
++    if (r < 0) {
 +        ret = -EINVAL;
 +        session_error_setg(errp, s, "failed to parse ~/.ssh/config");
 +        goto err;
 +    }
 +
-+    /*
++    r = ssh_options_set(s->session, SSH_OPTIONS_FD, &new_sock);
-+     * Unallocated blocks are still encrypted so allocation status makes no
++    if (r < 0) {
-+     * difference to the file size.
++        ret = -EINVAL;
-+     */
++        session_error_setg(errp, s,
-+    info = g_new(BlockMeasureInfo, 1);
++                           "failed to set the socket in the libssh session");
-+    info->fully_allocated = luks_payload_size + size;
++        goto err;
-+    info->required = luks_payload_size + size;
++    }
-+    return info;
++    /* libssh took ownership of the socket. */
-+
++    s->sock = new_sock;
-+err:
++    new_sock = -1;
-+    error_propagate(errp, local_err);
++
-+    return NULL;
++    /* Connect. */
-+}
++    r = ssh_connect(s->session);
-+
++    if (r != SSH_OK) {
-+
+         ret = -EINVAL;
- static int block_crypto_probe_luks(const uint8_t *buf,
+         session_error_setg(errp, s, "failed to establish SSH session");
-                                    int buf_size,
+         goto err;
-                                    const char *filename) {
+     }
-@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_crypto_luks = {
-     .bdrv_co_preadv     = block_crypto_co_preadv,
+     /* Check the remote host's key against known_hosts. */
-     .bdrv_co_pwritev    = block_crypto_co_pwritev,
+-    ret = check_host_key(s, s->inet->host, port, opts->host_key_check, errp);
-     .bdrv_getlength     = block_crypto_getlength,
++    ret = check_host_key(s, opts->host_key_check, errp);
-+    .bdrv_measure       = block_crypto_measure,
+     if (ret < 0) {
-     .bdrv_get_info      = block_crypto_get_info_luks,
+         goto err;
-     .bdrv_get_specific_info = block_crypto_get_specific_info_luks,
+     }
      /* Authenticate. */
 -    ret = authenticate(s, s->user, errp);
 +    ret = authenticate(s, errp);
      if (ret < 0) {
          goto err;
      }
      /* Start SFTP. */
 -    s->sftp = libssh2_sftp_init(s->session);
 +    s->sftp = sftp_new(s->session);
      if (!s->sftp) {
 -        session_error_setg(errp, s, "failed to initialize sftp handle");
 +        session_error_setg(errp, s, "failed to create sftp handle");
 +        ret = -EINVAL;
 +        goto err;
 +    }
 +
 +    r = sftp_init(s->sftp);
 +    if (r < 0) {
 +        sftp_error_setg(errp, s, "failed to initialize sftp handle");
          ret = -EINVAL;
          goto err;
      }
      /* Open the remote file. */
      trace_ssh_connect_to_ssh(opts->path, ssh_flags, creat_mode);
 -    s->sftp_handle = libssh2_sftp_open(s->sftp, opts->path, ssh_flags,
 -                                       creat_mode);
 +    s->sftp_handle = sftp_open(s->sftp, opts->path, ssh_flags, creat_mode);
      if (!s->sftp_handle) {
 -        session_error_setg(errp, s, "failed to open remote file '%s'",
 -                           opts->path);
 +        sftp_error_setg(errp, s, "failed to open remote file '%s'",
 +                        opts->path);
          ret = -EINVAL;
          goto err;
      }
 -    r = libssh2_sftp_fstat(s->sftp_handle, &s->attrs);
 -    if (r < 0) {
 +    /* Make sure the SFTP file is handled in blocking mode. */
 +    sftp_file_set_blocking(s->sftp_handle);
 +
 +    s->attrs = sftp_fstat(s->sftp_handle);
 +    if (!s->attrs) {
          sftp_error_setg(errp, s, "failed to read file attributes");
          return -EINVAL;
      }
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
      return 0;
   err:
 +    if (s->attrs) {
 +        sftp_attributes_free(s->attrs);
 +    }
 +    s->attrs = NULL;
      if (s->sftp_handle) {
 -        libssh2_sftp_close(s->sftp_handle);
 +        sftp_close(s->sftp_handle);
      }
      s->sftp_handle = NULL;
      if (s->sftp) {
 -        libssh2_sftp_shutdown(s->sftp);
 +        sftp_free(s->sftp);
      }
      s->sftp = NULL;
      if (s->session) {
 -        libssh2_session_disconnect(s->session,
 -                                   "from qemu ssh client: "
 -                                   "error opening connection");
 -        libssh2_session_free(s->session);
 +        ssh_disconnect(s->session);
 +        ssh_free(s->session);
      }
      s->session = NULL;
 +    s->sock = -1;
 +    if (new_sock >= 0) {
 +        close(new_sock);
 +    }
      return ret;
  }
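
Reviewer note: the new_sock handling in connect_to_ssh() is an ownership handoff. The descriptor belongs to the caller until it is handed to the session, and the local variable is then reset to -1 so that the error path cannot close it a second time. The sketch below illustrates the pattern with hypothetical stand-ins (fake_session, fake_close, open_connection and the two counters are invented for illustration and are not QEMU or libssh APIs).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical session object that may own one descriptor. */
struct fake_session {
    int fd;                     /* -1 while the session owns nothing */
};

static int close_calls;         /* explicit closes done by the caller */
static int session_releases;    /* descriptors released by the session */

static void fake_close(int fd)
{
    (void)fd;
    close_calls++;
}

/* Freeing the session also releases the descriptor it owns, if any. */
static void fake_session_free(struct fake_session *sess)
{
    if (sess->fd >= 0) {
        session_releases++;
        sess->fd = -1;
    }
}

/* The two flags inject a failure before or after the ownership handoff. */
static int open_connection(struct fake_session *sess, int fd,
                           bool fail_handoff, bool fail_connect)
{
    int new_sock = fd;

    if (fail_handoff) {
        goto err;               /* caller still owns new_sock */
    }
    sess->fd = new_sock;        /* the session took ownership... */
    new_sock = -1;              /* ...so forget the local copy */

    if (fail_connect) {
        goto err;               /* only the session may release it now */
    }
    return 0;

err:
    fake_session_free(sess);
    if (new_sock >= 0) {
        fake_close(new_sock);
    }
    return -1;
}
```

Whichever step fails, the descriptor is released exactly once: by the caller before the handoff, by the session after it.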
@@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
      ssh_state_init(s);
 -    ssh_flags = LIBSSH2_FXF_READ;
 +    ssh_flags = 0;
      if (bdrv_flags & BDRV_O_RDWR) {
 -        ssh_flags |= LIBSSH2_FXF_WRITE;
 +        ssh_flags |= O_RDWR;
 +    } else {
 +        ssh_flags |= O_RDONLY;
      }
      opts = ssh_parse_options(options, errp);
@@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
      }
      /* Go non-blocking. */
 -    libssh2_session_set_blocking(s->session, 0);
 +    ssh_set_blocking(s->session, 0);
      qapi_free_BlockdevOptionsSsh(opts);
      return 0;
   err:
 -    if (s->sock >= 0) {
 -        close(s->sock);
 -    }
 -    s->sock = -1;
 -
      qapi_free_BlockdevOptionsSsh(opts);
      return ret;
@@ -XXX,XX +XXX,XX @@ static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)
  {
      ssize_t ret;
      char c[1] = { '\0' };
 -    int was_blocking = libssh2_session_get_blocking(s->session);
 +    int was_blocking = ssh_is_blocking(s->session);
      /* offset must be strictly greater than the current size so we do
       * not overwrite anything */
 -    assert(offset > 0 && offset > s->attrs.filesize);
 +    assert(offset > 0 && offset > s->attrs->size);
 -    libssh2_session_set_blocking(s->session, 1);
 +    ssh_set_blocking(s->session, 1);
 -    libssh2_sftp_seek64(s->sftp_handle, offset - 1);
 -    ret = libssh2_sftp_write(s->sftp_handle, c, 1);
 +    sftp_seek64(s->sftp_handle, offset - 1);
 +    ret = sftp_write(s->sftp_handle, c, 1);
 -    libssh2_session_set_blocking(s->session, was_blocking);
 +    ssh_set_blocking(s->session, was_blocking);
      if (ret < 0) {
          sftp_error_setg(errp, s, "Failed to grow file");
          return -EIO;
      }
 -    s->attrs.filesize = offset;
 +    s->attrs->size = offset;
      return 0;
  }
@@ -XXX,XX +XXX,XX @@ static int ssh_co_create(BlockdevCreateOptions *options, Error **errp)
      ssh_state_init(&s);
      ret = connect_to_ssh(&s, opts->location,
 -                         LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE|
 -                         LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC,
 +                         O_RDWR | O_CREAT | O_TRUNC,
                           0644, errp);
      if (ret < 0) {
          goto fail;
@@ -XXX,XX +XXX,XX @@ static int ssh_has_zero_init(BlockDriverState *bs)
      /* Assume false, unless we can positively prove it's true. */
      int has_zero_init = 0;
 -    if (s->attrs.flags & LIBSSH2_SFTP_ATTR_PERMISSIONS) {
 -        if (s->attrs.permissions & LIBSSH2_SFTP_S_IFREG) {
 -            has_zero_init = 1;
 -        }
 +    if (s->attrs->type == SSH_FILEXFER_TYPE_REGULAR) {
 +        has_zero_init = 1;
      }
      return has_zero_init;
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
          .co = qemu_coroutine_self()
      };
 -    r = libssh2_session_block_directions(s->session);
 +    r = ssh_get_poll_flags(s->session);
 -    if (r & LIBSSH2_SESSION_BLOCK_INBOUND) {
 +    if (r & SSH_READ_PENDING) {
          rd_handler = restart_coroutine;
      }
 -    if (r & LIBSSH2_SESSION_BLOCK_OUTBOUND) {
 +    if (r & SSH_WRITE_PENDING) {
          wr_handler = restart_coroutine;
      }
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
      trace_ssh_co_yield_back(s->sock);
  }
 -/* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
 - * in the remote file.  Notice that it just updates a field in the
 - * sftp_handle structure, so there is no network traffic and it cannot
 - * fail.
 - *
 - * However, `libssh2_sftp_seek64' does have a catastrophic effect on
 - * performance since it causes the handle to throw away all in-flight
 - * reads and buffered readahead data.  Therefore this function tries
 - * to be intelligent about when to call the underlying libssh2 function.
 - */
 -#define SSH_SEEK_WRITE 0
 -#define SSH_SEEK_READ  1
 -#define SSH_SEEK_FORCE 2
 -
 -static void ssh_seek(BDRVSSHState *s, int64_t offset, int flags)
 -{
 -    bool op_read = (flags & SSH_SEEK_READ) != 0;
 -    bool force = (flags & SSH_SEEK_FORCE) != 0;
 -
 -    if (force || op_read != s->offset_op_read || offset != s->offset) {
 -        trace_ssh_seek(offset);
 -        libssh2_sftp_seek64(s->sftp_handle, offset);
 -        s->offset = offset;
 -        s->offset_op_read = op_read;
 -    }
 -}
 -
  static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
                                   int64_t offset, size_t size,
                                   QEMUIOVector *qiov)
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
      trace_ssh_read(offset, size);
 -    ssh_seek(s, offset, SSH_SEEK_READ);
 +    trace_ssh_seek(offset);
 +    sftp_seek64(s->sftp_handle, offset);
      /* This keeps track of the current iovec element ('i'), where we
       * will write to next ('buf'), and the end of the current iovec
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
      buf = i->iov_base;
      end_of_vec = i->iov_base + i->iov_len;
 -    /* libssh2 has a hard-coded limit of 2000 bytes per request,
 -     * although it will also do readahead behind our backs.  Therefore
 -     * we may have to do repeated reads here until we have read 'size'
 -     * bytes.
 -     */
      for (got = 0; got < size; ) {
 +        size_t request_read_size;
      again:
 -        trace_ssh_read_buf(buf, end_of_vec - buf);
 -        r = libssh2_sftp_read(s->sftp_handle, buf, end_of_vec - buf);
 -        trace_ssh_read_return(r);
 +        /*
 +         * The size of SFTP packets is limited to 32K bytes, so limit
 +         * the amount of data requested to 16K, as libssh currently
 +         * does not handle multiple requests on its own.
 +         */
 +        request_read_size = MIN(end_of_vec - buf, 16384);
 +        trace_ssh_read_buf(buf, end_of_vec - buf, request_read_size);
 +        r = sftp_read(s->sftp_handle, buf, request_read_size);
 +        trace_ssh_read_return(r, sftp_get_error(s->sftp));
 -        if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
 +        if (r == SSH_AGAIN) {
              co_yield(s, bs);
              goto again;
          }
 -        if (r < 0) {
 -            sftp_error_trace(s, "read");
 -            s->offset = -1;
 -            return -EIO;
 -        }
 -        if (r == 0) {
 +        if (r == SSH_EOF || (r == 0 && sftp_get_error(s->sftp) == SSH_FX_EOF)) {
              /* EOF: Short read so pad the buffer with zeroes and return it. */
              qemu_iovec_memset(qiov, got, 0, size - got);
              return 0;
          }
 +        if (r <= 0) {
 +            sftp_error_trace(s, "read");
 +            return -EIO;
 +        }
          got += r;
          buf += r;
 -        s->offset += r;
          if (buf >= end_of_vec && got < size) {
              i++;
              buf = i->iov_base;
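
Reviewer note: the read loop above caps each request at 16384 bytes and pads the buffer with zeroes once the remote file ends. The sketch below shows that control flow in isolation, assuming an in-memory buffer in place of the SFTP handle; fake_file, fake_read and read_chunked are hypothetical names, and the coroutine yield on SSH_AGAIN and the iovec handling are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_READ_REQUEST 16384  /* per-request cap used in the patch */

/* Hypothetical in-memory "remote file" standing in for an SFTP handle. */
struct fake_file {
    const unsigned char *data;
    size_t size;
    size_t pos;
    unsigned requests;          /* number of read requests issued */
};

/* Returns the number of bytes read, or 0 at end of file. */
static long fake_read(struct fake_file *f, void *buf, size_t count)
{
    size_t avail = f->size - f->pos;
    size_t n = count < avail ? count : avail;

    memcpy(buf, f->data + f->pos, n);
    f->pos += n;
    f->requests++;
    return (long)n;
}

/* Fill buf with 'size' bytes, zero-padding whatever lies past EOF. */
static int read_chunked(struct fake_file *f, unsigned char *buf, size_t size)
{
    size_t got = 0;

    while (got < size) {
        size_t want = size - got;
        long r;

        if (want > MAX_READ_REQUEST) {
            want = MAX_READ_REQUEST;
        }
        r = fake_read(f, buf + got, want);
        if (r < 0) {
            return -1;
        }
        if (r == 0) {
            /* EOF: short read, so pad the rest with zeroes. */
            memset(buf + got, 0, size - got);
            return 0;
        }
        got += (size_t)r;
    }
    return 0;
}
```

With this cap, a 40000-byte read is split into requests of 16384, 16384 and 7232 bytes.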
@@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
      trace_ssh_write(offset, size);
 -    ssh_seek(s, offset, SSH_SEEK_WRITE);
 +    trace_ssh_seek(offset);
 +    sftp_seek64(s->sftp_handle, offset);
      /* This keeps track of the current iovec element ('i'), where we
       * will read from next ('buf'), and the end of the current iovec
@@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
      end_of_vec = i->iov_base + i->iov_len;
      for (written = 0; written < size; ) {
 +        size_t request_write_size;
      again:
 -        trace_ssh_write_buf(buf, end_of_vec - buf);
 -        r = libssh2_sftp_write(s->sftp_handle, buf, end_of_vec - buf);
 -        trace_ssh_write_return(r);
 +        /*
 +         * Avoid too large data packets, as libssh currently does not
 +         * handle multiple requests on its own.
 +         */
 +        request_write_size = MIN(end_of_vec - buf, 131072);
 +        trace_ssh_write_buf(buf, end_of_vec - buf, request_write_size);
 +        r = sftp_write(s->sftp_handle, buf, request_write_size);
 +        trace_ssh_write_return(r, sftp_get_error(s->sftp));
 -        if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
 +        if (r == SSH_AGAIN) {
              co_yield(s, bs);
              goto again;
          }
          if (r < 0) {
              sftp_error_trace(s, "write");
 -            s->offset = -1;
              return -EIO;
          }
 -        /* The libssh2 API is very unclear about this.  A comment in
 -         * the code says "nothing was acked, and no EAGAIN was
 -         * received!" which apparently means that no data got sent
 -         * out, and the underlying channel didn't return any EAGAIN
 -         * indication.  I think this is a bug in either libssh2 or
 -         * OpenSSH (server-side).  In any case, forcing a seek (to
 -         * discard libssh2 internal buffers), and then trying again
 -         * works for me.
 -         */
 -        if (r == 0) {
 -            ssh_seek(s, offset + written, SSH_SEEK_WRITE|SSH_SEEK_FORCE);
 -            co_yield(s, bs);
 -            goto again;
 -        }
          written += r;
          buf += r;
 -        s->offset += r;
          if (buf >= end_of_vec && written < size) {
              i++;
              buf = i->iov_base;
              end_of_vec = i->iov_base + i->iov_len;
          }
 -        if (offset + written > s->attrs.filesize)
 -            s->attrs.filesize = offset + written;
 +        if (offset + written > s->attrs->size) {
 +            s->attrs->size = offset + written;
 +        }
      }
      return 0;
@@ -XXX,XX +XXX,XX @@ static void unsafe_flush_warning(BDRVSSHState *s, const char *what)
      }
  }
 -#ifdef HAS_LIBSSH2_SFTP_FSYNC
 +#ifdef HAVE_LIBSSH_0_8
  static coroutine_fn int ssh_flush(BDRVSSHState *s, BlockDriverState *bs)
  {
      int r;
      trace_ssh_flush();
 +
 +    if (!sftp_extension_supported(s->sftp, "fsync@openssh.com", "1")) {
 +        unsafe_flush_warning(s, "OpenSSH >= 6.3");
 +        return 0;
 +    }
   again:
 -    r = libssh2_sftp_fsync(s->sftp_handle);
 -    if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
 +    r = sftp_fsync(s->sftp_handle);
 +    if (r == SSH_AGAIN) {
          co_yield(s, bs);
          goto again;
      }
 -    if (r == LIBSSH2_ERROR_SFTP_PROTOCOL &&
 -        libssh2_sftp_last_error(s->sftp) == LIBSSH2_FX_OP_UNSUPPORTED) {
 -        unsafe_flush_warning(s, "OpenSSH >= 6.3");
 -        return 0;
 -    }
      if (r < 0) {
          sftp_error_trace(s, "fsync");
          return -EIO;
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
      return ret;
  }
 -#else /* !HAS_LIBSSH2_SFTP_FSYNC */
 +#else /* !HAVE_LIBSSH_0_8 */
  static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
  {
      BDRVSSHState *s = bs->opaque;
 -    unsafe_flush_warning(s, "libssh2 >= 1.4.4");
 +    unsafe_flush_warning(s, "libssh >= 0.8.0");
      return 0;
  }
 -#endif /* !HAS_LIBSSH2_SFTP_FSYNC */
 +#endif /* !HAVE_LIBSSH_0_8 */
  static int64_t ssh_getlength(BlockDriverState *bs)
  {
      BDRVSSHState *s = bs->opaque;
      int64_t length;
 -    /* Note we cannot make a libssh2 call here. */
 -    length = (int64_t) s->attrs.filesize;
 +    /* Note we cannot make a libssh call here. */
 +    length = (int64_t) s->attrs->size;
      trace_ssh_getlength(length);
      return length;
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn ssh_co_truncate(BlockDriverState *bs, int64_t offset,
          return -ENOTSUP;
      }
 -    if (offset < s->attrs.filesize) {
 +    if (offset < s->attrs->size) {
          error_setg(errp, "ssh driver does not support shrinking files");
          return -ENOTSUP;
      }
 -    if (offset == s->attrs.filesize) {
 +    if (offset == s->attrs->size) {
          return 0;
      }
@@ -XXX,XX +XXX,XX @@ static void bdrv_ssh_init(void)
  {
      int r;
 -    r = libssh2_init(0);
 +    r = ssh_init();
      if (r != 0) {
 -        fprintf(stderr, "libssh2 initialization failed, %d\n", r);
 +        fprintf(stderr, "libssh initialization failed, %d\n", r);
          exit(EXIT_FAILURE);
      }
 +#if TRACE_LIBSSH != 0
 +    ssh_set_log_level(TRACE_LIBSSH);
 +#endif
 +
      bdrv_register(&bdrv_ssh);
  }
 diff --git a/.travis.yml b/.travis.yml
 index XXXXXXX..XXXXXXX 100644
 --- a/.travis.yml
 +++ b/.travis.yml
@@ -XXX,XX +XXX,XX @@ addons:
        - libseccomp-dev
        - libspice-protocol-dev
        - libspice-server-dev
 -      - libssh2-1-dev
 +      - libssh-dev
        - liburcu-dev
        - libusb-1.0-0-dev
        - libvte-2.91-dev
@@ -XXX,XX +XXX,XX @@ matrix:
              - libseccomp-dev
              - libspice-protocol-dev
              - libspice-server-dev
 -            - libssh2-1-dev
 +            - libssh-dev
              - liburcu-dev
              - libusb-1.0-0-dev
              - libvte-2.91-dev
 diff --git a/block/trace-events b/block/trace-events
 index XXXXXXX..XXXXXXX 100644
 --- a/block/trace-events
 +++ b/block/trace-events
@@ -XXX,XX +XXX,XX @@ nbd_client_connect_success(const char *export_name) "export '%s'"
  # ssh.c
  ssh_restart_coroutine(void *co) "co=%p"
  ssh_flush(void) "fsync"
 -ssh_check_host_key_knownhosts(const char *key) "host key OK: %s"
 +ssh_check_host_key_knownhosts(void) "host key OK"
  ssh_connect_to_ssh(char *path, int flags, int mode) "opening file %s flags=0x%x creat_mode=0%o"
  ssh_co_yield(int sock, void *rd_handler, void *wr_handler) "s->sock=%d rd_handler=%p wr_handler=%p"
  ssh_co_yield_back(int sock) "s->sock=%d - back"
  ssh_getlength(int64_t length) "length=%" PRIi64
  ssh_co_create_opts(uint64_t size) "total_size=%" PRIu64
  ssh_read(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu"
 -ssh_read_buf(void *buf, size_t size) "sftp_read buf=%p size=%zu"
 -ssh_read_return(ssize_t ret) "sftp_read returned %zd"
 +ssh_read_buf(void *buf, size_t size, size_t actual_size) "sftp_read buf=%p size=%zu (actual size=%zu)"
 +ssh_read_return(ssize_t ret, int sftp_err) "sftp_read returned %zd (sftp error=%d)"
  ssh_write(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu"
 -ssh_write_buf(void *buf, size_t size) "sftp_write buf=%p size=%zu"
 -ssh_write_return(ssize_t ret) "sftp_write returned %zd"
 +ssh_write_buf(void *buf, size_t size, size_t actual_size) "sftp_write buf=%p size=%zu (actual size=%zu)"
 +ssh_write_return(ssize_t ret, int sftp_err) "sftp_write returned %zd (sftp error=%d)"
  ssh_seek(int64_t offset) "seeking to offset=%" PRIi64
 +ssh_auth_methods(int methods) "auth methods=0x%x"
 +ssh_server_status(int status) "server status=%d"
  # curl.c
  curl_timer_cb(long timeout_ms) "timer callback timeout_ms %ld"
@@ -XXX,XX +XXX,XX @@ sheepdog_snapshot_create(const char *sn_name, const char *id) "%s %s"
  sheepdog_snapshot_create_inode(const char *name, uint32_t snap, uint32_t vdi) "s->inode: name %s snap_id 0x%" PRIx32 " vdi 0x%" PRIx32
  # ssh.c
 -sftp_error(const char *op, const char *ssh_err, int ssh_err_code, unsigned long sftp_err_code) "%s failed: %s (libssh2 error code: %d, sftp error code: %lu)"
 +sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
 diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/qemu-block-drivers.texi
 +++ b/docs/qemu-block-drivers.texi
@@ -XXX,XX +XXX,XX @@ print a warning when @code{fsync} is not supported:
  warning: ssh server @code{ssh.example.com:22} does not support fsync
 -With sufficiently new versions of libssh2 and OpenSSH, @code{fsync} is
 +With sufficiently new versions of libssh and OpenSSH, @code{fsync} is
  supported.
  @node disk_images_nvme
 diff --git a/tests/docker/dockerfiles/debian-win32-cross.docker b/tests/docker/dockerfiles/debian-win32-cross.docker
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/docker/dockerfiles/debian-win32-cross.docker
 +++ b/tests/docker/dockerfiles/debian-win32-cross.docker
@@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \
          mxe-$TARGET-w64-mingw32.shared-curl \
          mxe-$TARGET-w64-mingw32.shared-glib \
          mxe-$TARGET-w64-mingw32.shared-libgcrypt \
 -        mxe-$TARGET-w64-mingw32.shared-libssh2 \
          mxe-$TARGET-w64-mingw32.shared-libusb1 \
          mxe-$TARGET-w64-mingw32.shared-lzo \
          mxe-$TARGET-w64-mingw32.shared-nettle \
 diff --git a/tests/docker/dockerfiles/debian-win64-cross.docker b/tests/docker/dockerfiles/debian-win64-cross.docker
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/docker/dockerfiles/debian-win64-cross.docker
 +++ b/tests/docker/dockerfiles/debian-win64-cross.docker
@@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \
          mxe-$TARGET-w64-mingw32.shared-curl \
          mxe-$TARGET-w64-mingw32.shared-glib \
          mxe-$TARGET-w64-mingw32.shared-libgcrypt \
 -        mxe-$TARGET-w64-mingw32.shared-libssh2 \
          mxe-$TARGET-w64-mingw32.shared-libusb1 \
          mxe-$TARGET-w64-mingw32.shared-lzo \
          mxe-$TARGET-w64-mingw32.shared-nettle \
 diff --git a/tests/docker/dockerfiles/fedora.docker b/tests/docker/dockerfiles/fedora.docker
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/docker/dockerfiles/fedora.docker
 +++ b/tests/docker/dockerfiles/fedora.docker
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
      libpng-devel \
      librbd-devel \
      libseccomp-devel \
 -    libssh2-devel \
 +    libssh-devel \
      libubsan \
      libusbx-devel \
      libxml2-devel \
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
      mingw32-gtk3 \
      mingw32-libjpeg-turbo \
      mingw32-libpng \
 -    mingw32-libssh2 \
      mingw32-libtasn1 \
      mingw32-nettle \
      mingw32-pixman \
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
      mingw64-gtk3 \
      mingw64-libjpeg-turbo \
      mingw64-libpng \
 -    mingw64-libssh2 \
      mingw64-libtasn1 \
      mingw64-nettle \
      mingw64-pixman \
 diff --git a/tests/docker/dockerfiles/ubuntu.docker b/tests/docker/dockerfiles/ubuntu.docker
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/docker/dockerfiles/ubuntu.docker
 +++ b/tests/docker/dockerfiles/ubuntu.docker
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \
      libsnappy-dev \
      libspice-protocol-dev \
      libspice-server-dev \
 -    libssh2-1-dev \
 +    libssh-dev \
      libusb-1.0-0-dev \
      libusbredirhost-dev \
      libvdeplug-dev \
 diff --git a/tests/docker/dockerfiles/ubuntu1804.docker b/tests/docker/dockerfiles/ubuntu1804.docker
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/docker/dockerfiles/ubuntu1804.docker
 +++ b/tests/docker/dockerfiles/ubuntu1804.docker
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \
      libsnappy-dev \
      libspice-protocol-dev \
      libspice-server-dev \
 -    libssh2-1-dev \
 +    libssh-dev \
      libusb-1.0-0-dev \
      libusbredirhost-dev \
      libvdeplug-dev \
 diff --git a/tests/qemu-iotests/207 b/tests/qemu-iotests/207
 index XXXXXXX..XXXXXXX 100755
 --- a/tests/qemu-iotests/207
 +++ b/tests/qemu-iotests/207
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
      iotests.img_info_log(remote_path)
 -    md5_key = subprocess.check_output(
 -        'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
 -        'cut -d" " -f3 | base64 -d | md5sum -b | cut -d" " -f1',
 -        shell=True).rstrip().decode('ascii')
 +    keys = subprocess.check_output(
 +        'ssh-keyscan 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
 +        'cut -d" " -f3',
 +        shell=True).rstrip().decode('ascii').split('\n')
 +
 +    # Mappings of base64 representations to digests
 +    md5_keys = {}
 +    sha1_keys = {}
 +
 +    for key in keys:
 +        md5_keys[key] = subprocess.check_output(
 +            'echo %s | base64 -d | md5sum -b | cut -d" " -f1' % key,
 +            shell=True).rstrip().decode('ascii')
 +
 +        sha1_keys[key] = subprocess.check_output(
 +            'echo %s | base64 -d | sha1sum -b | cut -d" " -f1' % key,
 +            shell=True).rstrip().decode('ascii')
      vm.launch()
 +
 +    # Find correct key first
 +    matching_key = None
 +    for key in keys:
 +        result = vm.qmp('blockdev-add',
 +                        driver='ssh', node_name='node0', path=disk_path,
 +                        server={
 +                             'host': '127.0.0.1',
 +                             'port': '22',
 +                        }, host_key_check={
 +                             'mode': 'hash',
 +                             'type': 'md5',
 +                             'hash': md5_keys[key],
 +                        })
 +
 +        if 'error' not in result:
 +            vm.qmp('blockdev-del', node_name='node0')
 +            matching_key = key
 +            break
 +
 +    if matching_key is None:
 +        vm.shutdown()
 +        iotests.notrun('Did not find a key that fits 127.0.0.1')
 +
      blockdev_create(vm, { 'driver': 'ssh',
                            'location': {
                                'path': disk_path,
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
                                'host-key-check': {
                                    'mode': 'hash',
                                    'type': 'md5',
 -                                  'hash': md5_key,
 +                                  'hash': md5_keys[matching_key],
                                }
                            },
                            'size': 8388608 })
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
      iotests.img_info_log(remote_path)
 -    sha1_key = subprocess.check_output(
 -        'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
 -        'cut -d" " -f3 | base64 -d | sha1sum -b | cut -d" " -f1',
 -        shell=True).rstrip().decode('ascii')
 -
      vm.launch()
      blockdev_create(vm, { 'driver': 'ssh',
                            'location': {
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
                                'host-key-check': {
                                    'mode': 'hash',
                                    'type': 'sha1',
 -                                  'hash': sha1_key,
 +                                  'hash': sha1_keys[matching_key],
                                }
                            },
                            'size': 4194304 })
 diff --git a/tests/qemu-iotests/207.out b/tests/qemu-iotests/207.out
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/qemu-iotests/207.out
 +++ b/tests/qemu-iotests/207.out
@@ -XXX,XX +XXX,XX @@ virtual size: 4 MiB (4194304 bytes)
  {"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": {"driver": "ssh", "location": {"host-key-check": {"mode": "none"}, "path": "/this/is/not/an/existing/path", "server": {"host": "127.0.0.1", "port": "22"}}, "size": 4194304}}}
  {"return": {}}
 -Job failed: failed to open remote file '/this/is/not/an/existing/path': Failed opening remote file (libssh2 error code: -31)
 +Job failed: failed to open remote file '/this/is/not/an/existing/path': SFTP server: No such file (libssh error code: 1, sftp error code: 2)
  {"execute": "job-dismiss", "arguments": {"id": "job0"}}
  {"return": {}}
 --
-.24.1
+.21.0
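Note on the test 207 change above: the digests it derives by piping each key through `base64 -d | md5sum` and `base64 -d | sha1sum` can also be computed in-process. The following is a minimal sketch under stated assumptions, not part of the patch: `host_key_digests` is a hypothetical helper name, and the sample blob is a made-up placeholder rather than a real host key.

```python
import base64
import hashlib


def host_key_digests(b64_keys):
    """Map each base64 key blob to its (md5, sha1) hex digests."""
    digests = {}
    for key in b64_keys:
        raw = base64.b64decode(key)  # same bytes that `base64 -d` would emit
        digests[key] = (hashlib.md5(raw).hexdigest(),
                        hashlib.sha1(raw).hexdigest())
    return digests


# Placeholder blob for illustration only; a real run would pass the third
# field of each `ssh-keyscan` output line instead.
sample = base64.b64encode(b"not a real host key").decode("ascii")
md5_hex, sha1_hex = host_key_digests([sample])[sample]
print(len(md5_hex), len(sha1_hex))  # prints: 32 40
```

Keeping both digests keyed by the same base64 string mirrors the `md5_keys` / `sha1_keys` dictionaries in the test, so the key that passes the MD5 check can be reused for the SHA-1 check.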

-[PULL 03/19] qemu-img: allow qemu-img measure --object without a filename
+Deleted patch
-From: Stefan Hajnoczi <stefanha@redhat.com>
-In most qemu-img sub-commands the --object option only makes sense when
-there is a filename.  qemu-img measure is an exception because objects
-may be referenced from the image creation options instead of an existing
-image file.  Allow --object without a filename.
-Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-Reviewed-by: Max Reitz <mreitz@redhat.com>
-Message-Id: <20200221112522.1497712-4-stefanha@redhat.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- qemu-img.c                       | 6 ++----
- tests/qemu-iotests/178           | 2 +-
- tests/qemu-iotests/178.out.qcow2 | 8 ++++----
- tests/qemu-iotests/178.out.raw   | 8 ++++----
-files changed, 11 insertions(+), 13 deletions(-)
-diff --git a/qemu-img.c b/qemu-img.c
-index XXXXXXX..XXXXXXX 100644
---- a/qemu-img.c
-+++ b/qemu-img.c
-@@ -XXX,XX +XXX,XX @@ static int img_measure(int argc, char **argv)
-         filename = argv[optind];
-     }
--    if (!filename &&
--        (object_opts || image_opts || fmt || snapshot_name || sn_opts)) {
--        error_report("--object, --image-opts, -f, and -l "
--                     "require a filename argument.");
-+    if (!filename && (image_opts || fmt || snapshot_name || sn_opts)) {
-+        error_report("--image-opts, -f, and -l require a filename argument.");
-         goto out;
-     }
-     if (filename && img_size != UINT64_MAX) {
-diff --git a/tests/qemu-iotests/178 b/tests/qemu-iotests/178
-index XXXXXXX..XXXXXXX 100755
---- a/tests/qemu-iotests/178
-+++ b/tests/qemu-iotests/178
-@@ -XXX,XX +XXX,XX @@ _make_test_img 1G
- $QEMU_IMG measure # missing arguments
- $QEMU_IMG measure --size 2G "$TEST_IMG" # only one allowed
- $QEMU_IMG measure "$TEST_IMG" a # only one filename allowed
--$QEMU_IMG measure --object secret,id=sec0,data=MTIzNDU2,format=base64 # missing filename
-+$QEMU_IMG measure --object secret,id=sec0,data=MTIzNDU2,format=base64 # size or filename needed
- $QEMU_IMG measure --image-opts # missing filename
- $QEMU_IMG measure -f qcow2 # missing filename
- $QEMU_IMG measure -l snap1 # missing filename
-diff --git a/tests/qemu-iotests/178.out.qcow2 b/tests/qemu-iotests/178.out.qcow2
-index XXXXXXX..XXXXXXX 100644
---- a/tests/qemu-iotests/178.out.qcow2
-+++ b/tests/qemu-iotests/178.out.qcow2
-@@ -XXX,XX +XXX,XX @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
- qemu-img: Either --size N or one filename must be specified.
- qemu-img: --size N cannot be used together with a filename.
- qemu-img: At most one filename argument is allowed.
--qemu-img: --object, --image-opts, -f, and -l require a filename argument.
--qemu-img: --object, --image-opts, -f, and -l require a filename argument.
--qemu-img: --object, --image-opts, -f, and -l require a filename argument.
--qemu-img: --object, --image-opts, -f, and -l require a filename argument.
-+qemu-img: Either --size N or one filename must be specified.
-+qemu-img: --image-opts, -f, and -l require a filename argument.
-+qemu-img: --image-opts, -f, and -l require a filename argument.
-+qemu-img: --image-opts, -f, and -l require a filename argument.
- qemu-img: Invalid option list: ,
- qemu-img: Invalid parameter 'snapshot.foo'
- qemu-img: Failed in parsing snapshot param 'snapshot.foo'
-diff --git a/tests/qemu-iotests/178.out.raw b/tests/qemu-iotests/178.out.raw
-index XXXXXXX..XXXXXXX 100644
---- a/tests/qemu-iotests/178.out.raw
-+++ b/tests/qemu-iotests/178.out.raw
-@@ -XXX,XX +XXX,XX @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
- qemu-img: Either --size N or one filename must be specified.
- qemu-img: --size N cannot be used together with a filename.
- qemu-img: At most one filename argument is allowed.
--qemu-img: --object, --image-opts, -f, and -l require a filename argument.
--qemu-img: --object, --image-opts, -f, and -l require a filename argument.
--qemu-img: --object, --image-opts, -f, and -l require a filename argument.
--qemu-img: --object, --image-opts, -f, and -l require a filename argument.
-+qemu-img: Either --size N or one filename must be specified.
-+qemu-img: --image-opts, -f, and -l require a filename argument.
-+qemu-img: --image-opts, -f, and -l require a filename argument.
-+qemu-img: --image-opts, -f, and -l require a filename argument.
- qemu-img: Invalid option list: ,
- qemu-img: Invalid parameter 'snapshot.foo'
- qemu-img: Failed in parsing snapshot param 'snapshot.foo'
---
-.24.1

-[PULL 06/19] block/curl: HTTP header field names are case insensitive
+[Qemu-devel] [PULL v2 8/8] iotests: Fix 205 for concurrent runs
-From: David Edmondson <david.edmondson@oracle.com>
+Tests should place their files into the test directory.  This includes
 Unix sockets.  205 currently fails to do so, which prevents it from
 being run concurrently.
-RFC 7230 section 3.2 indicates that HTTP header field names are case
+Signed-off-by: Max Reitz <mreitz@redhat.com>
-insensitive.
+Message-id: 20190618210238.9524-1-mreitz@redhat.com
+Reviewed-by: Eric Blake <eblake@redhat.com>
 Signed-off-by: David Edmondson <david.edmondson@oracle.com>
 Message-Id: <20200224101310.101169-3-david.edmondson@oracle.com>
 Reviewed-by: Max Reitz <mreitz@redhat.com>
 Signed-off-by: Max Reitz <mreitz@redhat.com>
 ---
- block/curl.c | 5 +++--
+ tests/qemu-iotests/205 | 2 +-
-file changed, 3 insertions(+), 2 deletions(-)
+file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/block/curl.c b/block/curl.c
+diff --git a/tests/qemu-iotests/205 b/tests/qemu-iotests/205
-index XXXXXXX..XXXXXXX 100644
+index XXXXXXX..XXXXXXX 100755
---- a/block/curl.c
+--- a/tests/qemu-iotests/205
-+++ b/block/curl.c
++++ b/tests/qemu-iotests/205
-@@ -XXX,XX +XXX,XX @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
+@@ -XXX,XX +XXX,XX @@ import iotests
-     size_t realsize = size * nmemb;
+ import time
-     const char *header = (char *)ptr;
+ from iotests import qemu_img_create, qemu_io, filter_qemu_io, QemuIoInteractive
-     const char *end = header + realsize;
--    const char *accept_ranges = "Accept-Ranges:";
+-nbd_sock = 'nbd_sock'
-+    const char *accept_ranges = "accept-ranges:";
++nbd_sock = os.path.join(iotests.test_dir, 'nbd_sock')
-     const char *bytes = "bytes";
+ nbd_uri = 'nbd+unix:///exp?socket=' + nbd_sock
+ disk = os.path.join(iotests.test_dir, 'disk')
      if (realsize >= strlen(accept_ranges)
 -        && strncmp(header, accept_ranges, strlen(accept_ranges)) == 0) {
 +        && g_ascii_strncasecmp(header, accept_ranges,
 +                               strlen(accept_ranges)) == 0) {
          char *p = strchr(header, ':') + 1;
 --
-.24.1
+.21.0

-[PULL 07/19] iotests: Fix nonportable use of od --endian
+Deleted patch
-From: Eric Blake <eblake@redhat.com>
-Tests 261 and 272 fail on RHEL 7 with coreutils 8.22, since od
---endian was not added until coreutils 8.23.  Fix this by manually
-constructing the final value one byte at a time.
-Fixes: fc8ba423
-Reported-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-Signed-off-by: Eric Blake <eblake@redhat.com>
-Reviewed-by: Max Reitz <mreitz@redhat.com>
-Message-Id: <20200226125424.481840-1-eblake@redhat.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- tests/qemu-iotests/common.rc | 22 +++++++++++++++++-----
-file changed, 17 insertions(+), 5 deletions(-)
-diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
-index XXXXXXX..XXXXXXX 100644
---- a/tests/qemu-iotests/common.rc
-+++ b/tests/qemu-iotests/common.rc
-@@ -XXX,XX +XXX,XX @@ poke_file()
- # peek_file_le 'test.img' 512 2 => 65534
- peek_file_le()
- {
--    # Wrap in echo $() to strip spaces
--    echo $(od -j"$2" -N"$3" --endian=little -An -vtu"$3" "$1")
-+    local val=0 shift=0 byte
-+
-+    # coreutils' od --endian is not portable, so manually assemble bytes.
-+    for byte in $(od -j"$2" -N"$3" -An -v -tu1 "$1"); do
-+        val=$(( val | (byte << shift) ))
-+        shift=$((shift + 8))
-+    done
-+    printf %llu $val
- }
- # peek_file_be 'test.img' 512 2 => 65279
- peek_file_be()
- {
--    # Wrap in echo $() to strip spaces
--    echo $(od -j"$2" -N"$3" --endian=big -An -vtu"$3" "$1")
-+    local val=0 byte
-+
-+    # coreutils' od --endian is not portable, so manually assemble bytes.
-+    for byte in $(od -j"$2" -N"$3" -An -v -tu1 "$1"); do
-+        val=$(( (val << 8) | byte ))
-+    done
-+    printf %llu $val
- }
--# peek_file_raw 'test.img' 512 2 => '\xff\xfe'
-+# peek_file_raw 'test.img' 512 2 => '\xff\xfe'. Do not use if the raw data
-+# is likely to contain \0 or trailing \n.
- peek_file_raw()
- {
-     dd if="$1" bs=1 skip="$2" count="$3" status=none
---
-.24.1
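The byte-at-a-time assembly used by the new `peek_file_le` and `peek_file_be` can be illustrated outside the shell. This is a hedged sketch only: `assemble_le` and `assemble_be` are hypothetical names, and the two input bytes are chosen to reproduce the values quoted in the `common.rc` comments.

```python
def assemble_le(data):
    """Combine bytes so the first byte is the least significant."""
    val = 0
    shift = 0
    for byte in data:
        val |= byte << shift
        shift += 8
    return val


def assemble_be(data):
    """Combine bytes so the first byte is the most significant."""
    val = 0
    for byte in data:
        val = (val << 8) | byte
    return val


data = bytes([0xFE, 0xFF])
print(assemble_le(data), assemble_be(data))  # prints: 65534 65279
```

The results match `int.from_bytes(data, "little")` and `int.from_bytes(data, "big")`, which is the behaviour `od --endian` provided on newer coreutils.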

-[PULL 08/19] block/qcow2: do free crypto_opts in qcow2_close()
+Deleted patch
-From: Pan Nengyuan <pannengyuan@huawei.com>
-'crypto_opts' is not freed in qcow2_close(); this patch fixes the leak reported in the stack below:
-Direct leak of 24 byte(s) in 1 object(s) allocated from:
-    #0 0x7f0edd81f970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
-    #1 0x7f0edc6d149d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
-    #2 0x55d7eaede63d in qobject_input_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qobject-input-visitor.c:295
-    #3 0x55d7eaed78b8 in visit_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qapi-visit-core.c:49
-    #4 0x55d7eaf5140b in visit_type_QCryptoBlockOpenOptions qapi/qapi-visit-crypto.c:290
-    #5 0x55d7eae43af3 in block_crypto_open_opts_init /mnt/sdb/qemu-new/qemu_test/qemu/block/crypto.c:163
-    #6 0x55d7eacd2924 in qcow2_update_options_prepare /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1148
-    #7 0x55d7eacd33f7 in qcow2_update_options /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1232
-    #8 0x55d7eacd9680 in qcow2_do_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1512
-    #9 0x55d7eacdc55e in qcow2_open_entry /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1792
-    #10 0x55d7eacdc8fe in qcow2_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1819
-    #11 0x55d7eac3742d in bdrv_open_driver /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1317
-    #12 0x55d7eac3e990 in bdrv_open_common /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1575
-    #13 0x55d7eac4442c in bdrv_open_inherit /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3126
-    #14 0x55d7eac45c3f in bdrv_open /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3219
-    #15 0x55d7ead8e8a4 in blk_new_open /mnt/sdb/qemu-new/qemu_test/qemu/block/block-backend.c:397
-    #16 0x55d7eacde74c in qcow2_co_create /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3534
-    #17 0x55d7eacdfa6d in qcow2_co_create_opts /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3668
-    #18 0x55d7eac1c678 in bdrv_create_co_entry /mnt/sdb/qemu-new/qemu_test/qemu/block.c:485
-    #19 0x55d7eb0024d2 in coroutine_trampoline /mnt/sdb/qemu-new/qemu_test/qemu/util/coroutine-ucontext.c:115
-Reported-by: Euler Robot <euler.robot@huawei.com>
-Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
-Reviewed-by: Max Reitz <mreitz@redhat.com>
-Message-Id: <20200227012950.12256-2-pannengyuan@huawei.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- block/qcow2.c | 1 +
-file changed, 1 insertion(+)
-diff --git a/block/qcow2.c b/block/qcow2.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2.c
-+++ b/block/qcow2.c
-@@ -XXX,XX +XXX,XX @@ static void qcow2_close(BlockDriverState *bs)
-     qcrypto_block_free(s->crypto);
-     s->crypto = NULL;
-+    qapi_free_QCryptoBlockOpenOptions(s->crypto_opts);
-     g_free(s->unknown_header_fields);
-     cleanup_unknown_header_ext(bs);
---
-.24.1

-[PULL 09/19] qemu-img: free memory before re-assign
+Deleted patch
-From: Pan Nengyuan <pannengyuan@huawei.com>
-collect_image_check() is called twice in img_check(), so filename/format are allocated a second time without freeing the original memory.
-It is not a big deal since the process will exit anyway, but freeing the memory is cleaner and removes the warning spotted by ASan.
-Reported-by: Euler Robot <euler.robot@huawei.com>
-Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
-Message-Id: <20200227012950.12256-3-pannengyuan@huawei.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- qemu-img.c | 2 ++
-file changed, 2 insertions(+)
-diff --git a/qemu-img.c b/qemu-img.c
-index XXXXXXX..XXXXXXX 100644
---- a/qemu-img.c
-+++ b/qemu-img.c
-@@ -XXX,XX +XXX,XX @@ static int img_check(int argc, char **argv)
-                     check->corruptions_fixed);
-         }
-+        qapi_free_ImageCheck(check);
-+        check = g_new0(ImageCheck, 1);
-         ret = collect_image_check(bs, check, filename, fmt, 0);
-         check->leaks_fixed          = leaks_fixed;
---
-.24.1

-[PULL 10/19] block/qcow2-threads: fix qcow2_decompress
+Deleted patch
-From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-On the success path we return what inflate() returns instead of 0. This
-most probably works for Z_STREAM_END as it is positive, but is
-definitely broken for Z_BUF_ERROR.
-While here, switch to an errno return code, to be closer to the
-qcow2_compress API (and usual expectations).
-Invert the condition in the if to be more positive. Drop the dead
-initialization of ret.
-Cc: qemu-stable@nongnu.org # v4.0
-Fixes: 341926ab83e2b
-Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-Message-Id: <20200302150930.16218-1-vsementsov@virtuozzo.com>
-Reviewed-by: Alberto Garcia <berto@igalia.com>
-Reviewed-by: Ján Tomko <jtomko@redhat.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- block/qcow2-threads.c | 12 +++++++-----
-file changed, 7 insertions(+), 5 deletions(-)
-diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2-threads.c
-+++ b/block/qcow2-threads.c
-@@ -XXX,XX +XXX,XX @@ static ssize_t qcow2_compress(void *dest, size_t dest_size,
-  * @src - source buffer, @src_size bytes
-  *
-  * Returns: 0 on success
-- *          -1 on fail
-+ *          -EIO on fail
-  */
- static ssize_t qcow2_decompress(void *dest, size_t dest_size,
-                                 const void *src, size_t src_size)
- {
--    int ret = 0;
-+    int ret;
-     z_stream strm;
-     memset(&strm, 0, sizeof(strm));
-@@ -XXX,XX +XXX,XX @@ static ssize_t qcow2_decompress(void *dest, size_t dest_size,
-     ret = inflateInit2(&strm, -12);
-     if (ret != Z_OK) {
--        return -1;
-+        return -EIO;
-     }
-     ret = inflate(&strm, Z_FINISH);
--    if ((ret != Z_STREAM_END && ret != Z_BUF_ERROR) || strm.avail_out != 0) {
-+    if ((ret == Z_STREAM_END || ret == Z_BUF_ERROR) && strm.avail_out == 0) {
-         /*
-          * We approve Z_BUF_ERROR because we need @dest buffer to be filled, but
-          * @src buffer may be processed partly (because in qcow2 we know size of
-          * compressed data with precision of one sector)
-          */
--        ret = -1;
-+        ret = 0;
-+    } else {
-+        ret = -EIO;
-     }
-     inflateEnd(&strm);
---
-.24.1
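The return convention this fix establishes (0 when the destination buffer is completely filled, -EIO otherwise) can be sketched with Python's zlib module. This is an illustration under assumptions, not the QEMU code: `decompress_exact` is a hypothetical name, and because Python does not expose Z_BUF_ERROR, the check is made on the output length instead of on the inflate() status.

```python
import errno
import zlib


def decompress_exact(src, dest_size):
    """Return (0, data) if exactly dest_size bytes came out, else (-EIO, b"").

    dest_size must be positive; zlib treats a max_length of 0 as unlimited.
    """
    inflater = zlib.decompressobj(wbits=-12)  # raw deflate, 4 KiB window
    try:
        out = inflater.decompress(src, dest_size)
    except zlib.error:
        return -errno.EIO, b""
    if len(out) != dest_size:
        return -errno.EIO, b""
    return 0, out


payload = b"qcow2" * 100
deflater = zlib.compressobj(wbits=-12)
compressed = deflater.compress(payload) + deflater.flush()

# Trailing bytes after the stream are tolerated, as with sector-rounded input.
ok, data = decompress_exact(compressed + b"\0" * 16, len(payload))
# Asking for more output than the stream holds is reported as an I/O error.
short, _ = decompress_exact(compressed, len(payload) + 1)
print(ok, short == -errno.EIO)  # prints: 0 True
```

Returning a negative errno value in every failure case lets callers treat any non-zero result the same way, which is the point of aligning with the qcow2_compress convention.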

-[PULL 11/19] job: refactor progress to separate object
+Deleted patch
-From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-We need it as a separate object so that it can be passed to the
-block-copy object in the next commit.
-Cc: qemu-stable@nongnu.org
-Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-Reviewed-by: Max Reitz <mreitz@redhat.com>
-Message-Id: <20200311103004.7649-2-vsementsov@virtuozzo.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- blockjob.c                    | 16 +++++-----
- include/qemu/job.h            | 11 ++-----
- include/qemu/progress_meter.h | 58 +++++++++++++++++++++++++++++++++++
- job-qmp.c                     |  4 +--
- job.c                         |  6 ++--
- qemu-img.c                    |  6 ++--
-files changed, 76 insertions(+), 25 deletions(-)
- create mode 100644 include/qemu/progress_meter.h
-diff --git a/blockjob.c b/blockjob.c
-index XXXXXXX..XXXXXXX 100644
---- a/blockjob.c
-+++ b/blockjob.c
-@@ -XXX,XX +XXX,XX @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
-     info->device    = g_strdup(job->job.id);
-     info->busy      = atomic_read(&job->job.busy);
-     info->paused    = job->job.pause_count > 0;
--    info->offset    = job->job.progress_current;
--    info->len       = job->job.progress_total;
-+    info->offset    = job->job.progress.current;
-+    info->len       = job->job.progress.total;
-     info->speed     = job->speed;
-     info->io_status = job->iostatus;
-     info->ready     = job_is_ready(&job->job),
-@@ -XXX,XX +XXX,XX @@ static void block_job_event_cancelled(Notifier *n, void *opaque)
-     qapi_event_send_block_job_cancelled(job_type(&job->job),
-                                         job->job.id,
--                                        job->job.progress_total,
--                                        job->job.progress_current,
-+                                        job->job.progress.total,
-+                                        job->job.progress.current,
-                                         job->speed);
- }
-@@ -XXX,XX +XXX,XX @@ static void block_job_event_completed(Notifier *n, void *opaque)
-     qapi_event_send_block_job_completed(job_type(&job->job),
-                                         job->job.id,
--                                        job->job.progress_total,
--                                        job->job.progress_current,
-+                                        job->job.progress.total,
-+                                        job->job.progress.current,
-                                         job->speed,
-                                         !!msg,
-                                         msg);
-@@ -XXX,XX +XXX,XX @@ static void block_job_event_ready(Notifier *n, void *opaque)
-     qapi_event_send_block_job_ready(job_type(&job->job),
-                                     job->job.id,
--                                    job->job.progress_total,
--                                    job->job.progress_current,
-+                                    job->job.progress.total,
-+                                    job->job.progress.current,
-                                     job->speed);
- }
-diff --git a/include/qemu/job.h b/include/qemu/job.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/qemu/job.h
-+++ b/include/qemu/job.h
-@@ -XXX,XX +XXX,XX @@
- #include "qapi/qapi-types-job.h"
- #include "qemu/queue.h"
-+#include "qemu/progress_meter.h"
- #include "qemu/coroutine.h"
- #include "block/aio.h"
-@@ -XXX,XX +XXX,XX @@ typedef struct Job {
-     /** True if this job should automatically dismiss itself */
-     bool auto_dismiss;
--    /**
--     * Current progress. The unit is arbitrary as long as the ratio between
--     * progress_current and progress_total represents the estimated percentage
--     * of work already done.
--     */
--    int64_t progress_current;
--
--    /** Estimated progress_current value at the completion of the job */
--    int64_t progress_total;
-+    ProgressMeter progress;
-     /**
-      * Return code from @run and/or @prepare callback(s).
-diff --git a/include/qemu/progress_meter.h b/include/qemu/progress_meter.h
-new file mode 100644
-index XXXXXXX..XXXXXXX
---- /dev/null
-+++ b/include/qemu/progress_meter.h
-@@ -XXX,XX +XXX,XX @@
-+/*
-+ * Helper functionality for some process progress tracking.
-+ *
-+ * Copyright (c) 2011 IBM Corp.
-+ * Copyright (c) 2012, 2018 Red Hat, Inc.
-+ * Copyright (c) 2020 Virtuozzo International GmbH
-+ *
-+ * Permission is hereby granted, free of charge, to any person obtaining a copy
-+ * of this software and associated documentation files (the "Software"), to deal
-+ * in the Software without restriction, including without limitation the rights
-+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-+ * copies of the Software, and to permit persons to whom the Software is
-+ * furnished to do so, subject to the following conditions:
-+ *
-+ * The above copyright notice and this permission notice shall be included in
-+ * all copies or substantial portions of the Software.
-+ *
-+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
-+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-+ * THE SOFTWARE.
-+ */
-+
-+#ifndef QEMU_PROGRESS_METER_H
-+#define QEMU_PROGRESS_METER_H
-+
-+typedef struct ProgressMeter {
-+    /**
-+     * Current progress. The unit is arbitrary as long as the ratio between
-+     * current and total represents the estimated percentage
-+     * of work already done.
-+     */
-+    uint64_t current;
-+
-+    /** Estimated current value at the completion of the process */
-+    uint64_t total;
-+} ProgressMeter;
-+
-+static inline void progress_work_done(ProgressMeter *pm, uint64_t done)
-+{
-+    pm->current += done;
-+}
-+
-+static inline void progress_set_remaining(ProgressMeter *pm, uint64_t remaining)
-+{
-+    pm->total = pm->current + remaining;
-+}
-+
-+static inline void progress_increase_remaining(ProgressMeter *pm,
-+                                               uint64_t delta)
-+{
-+    pm->total += delta;
-+}
-+
-+#endif /* QEMU_PROGRESS_METER_H */
-diff --git a/job-qmp.c b/job-qmp.c
-index XXXXXXX..XXXXXXX 100644
---- a/job-qmp.c
-+++ b/job-qmp.c
-@@ -XXX,XX +XXX,XX @@ static JobInfo *job_query_single(Job *job, Error **errp)
-         .id                 = g_strdup(job->id),
-         .type               = job_type(job),
-         .status             = job->status,
--        .current_progress   = job->progress_current,
--        .total_progress     = job->progress_total,
-+        .current_progress   = job->progress.current,
-+        .total_progress     = job->progress.total,
-         .has_error          = !!job->err,
-         .error              = job->err ? \
-                               g_strdup(error_get_pretty(job->err)) : NULL,
-diff --git a/job.c b/job.c
-index XXXXXXX..XXXXXXX 100644
---- a/job.c
-+++ b/job.c
-@@ -XXX,XX +XXX,XX @@ void job_unref(Job *job)
- void job_progress_update(Job *job, uint64_t done)
- {
--    job->progress_current += done;
-+    progress_work_done(&job->progress, done);
- }
- void job_progress_set_remaining(Job *job, uint64_t remaining)
- {
--    job->progress_total = job->progress_current + remaining;
-+    progress_set_remaining(&job->progress, remaining);
- }
- void job_progress_increase_remaining(Job *job, uint64_t delta)
- {
--    job->progress_total += delta;
-+    progress_increase_remaining(&job->progress, delta);
- }
- void job_event_cancelled(Job *job)
-diff --git a/qemu-img.c b/qemu-img.c
-index XXXXXXX..XXXXXXX 100644
---- a/qemu-img.c
-+++ b/qemu-img.c
-@@ -XXX,XX +XXX,XX @@ static void run_block_job(BlockJob *job, Error **errp)
-     do {
-         float progress = 0.0f;
-         aio_poll(aio_context, true);
--        if (job->job.progress_total) {
--            progress = (float)job->job.progress_current /
--                       job->job.progress_total * 100.f;
-+        if (job->job.progress.total) {
-+            progress = (float)job->job.progress.current /
-+                       job->job.progress.total * 100.f;
-         }
-         qemu_progress_print(progress, 0);
-     } while (!job_is_ready(&job->job) && !job_is_completed(&job->job));
---
-.24.1

-[PULL 12/19] block/block-copy: fix progress calculation
+Deleted patch
-From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-Assume we have two regions, A and B, and region B is in-flight now,
-region A is not yet touched, but it is unallocated and should be
-skipped.
-Correspondingly, as progress we have
-  total = A + B
-  current = 0
-If we reset unallocated region A and call progress_reset_callback,
-it will calculate 0 bytes dirty in the bitmap and call
-job_progress_set_remaining, which will set
-   total = current + 0 = 0 + 0 = 0
-So B bytes are effectively removed from the total accounting. When the
-job finishes we'll have
-   total = 0
-   current = B
-which is clearly wrong.
-This happens because we did not consider in-flight bytes: when
-calculating remaining, we should have set (in_flight + dirty_bytes)
-as remaining, not only dirty_bytes.
-To fix it, let's refactor the progress calculation, moving it into
-block-copy itself instead of fixing the callback. And, of course, track
-the in_flight bytes count.
-We still have to keep one callback to maintain the backup job's
-bytes_read calculation, but it will go away soon, when we turn the
-whole backup process into one block_copy call.
-Cc: qemu-stable@nongnu.org
-Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-Message-Id: <20200311103004.7649-3-vsementsov@virtuozzo.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- block/backup.c             | 13 ++-----------
- block/block-copy.c         | 16 ++++++++++++----
- include/block/block-copy.h | 15 +++++----------
- 3 files changed, 19 insertions(+), 25 deletions(-)
-diff --git a/block/backup.c b/block/backup.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/backup.c
-+++ b/block/backup.c
-@@ -XXX,XX +XXX,XX @@ static void backup_progress_bytes_callback(int64_t bytes, void *opaque)
-     BackupBlockJob *s = opaque;
-     s->bytes_read += bytes;
--    job_progress_update(&s->common.job, bytes);
--}
--
--static void backup_progress_reset_callback(void *opaque)
--{
--    BackupBlockJob *s = opaque;
--    uint64_t estimate = bdrv_get_dirty_count(s->bcs->copy_bitmap);
--
--    job_progress_set_remaining(&s->common.job, estimate);
- }
- static int coroutine_fn backup_do_cow(BackupBlockJob *job,
-@@ -XXX,XX +XXX,XX @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
-     job->cluster_size = cluster_size;
-     job->len = len;
--    block_copy_set_callbacks(bcs, backup_progress_bytes_callback,
--                             backup_progress_reset_callback, job);
-+    block_copy_set_progress_callback(bcs, backup_progress_bytes_callback, job);
-+    block_copy_set_progress_meter(bcs, &job->common.job.progress);
-     /* Required permissions are already taken by backup-top target */
-     block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
-diff --git a/block/block-copy.c b/block/block-copy.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/block-copy.c
-+++ b/block/block-copy.c
-@@ -XXX,XX +XXX,XX @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
-     return s;
- }
--void block_copy_set_callbacks(
-+void block_copy_set_progress_callback(
-         BlockCopyState *s,
-         ProgressBytesCallbackFunc progress_bytes_callback,
--        ProgressResetCallbackFunc progress_reset_callback,
-         void *progress_opaque)
- {
-     s->progress_bytes_callback = progress_bytes_callback;
--    s->progress_reset_callback = progress_reset_callback;
-     s->progress_opaque = progress_opaque;
- }
-+void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm)
-+{
-+    s->progress = pm;
-+}
-+
- /*
-  * block_copy_do_copy
-  *
-@@ -XXX,XX +XXX,XX @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
-     if (!ret) {
-         bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
--        s->progress_reset_callback(s->progress_opaque);
-+        progress_set_remaining(s->progress,
-+                               bdrv_get_dirty_count(s->copy_bitmap) +
-+                               s->in_flight_bytes);
-     }
-     *count = bytes;
-@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
-         trace_block_copy_process(s, start);
-         bdrv_reset_dirty_bitmap(s->copy_bitmap, start, chunk_end - start);
-+        s->in_flight_bytes += chunk_end - start;
-         co_get_from_shres(s->mem, chunk_end - start);
-         ret = block_copy_do_copy(s, start, chunk_end, error_is_read);
-         co_put_to_shres(s->mem, chunk_end - start);
-+        s->in_flight_bytes -= chunk_end - start;
-         if (ret < 0) {
-             bdrv_set_dirty_bitmap(s->copy_bitmap, start, chunk_end - start);
-             break;
-         }
-+        progress_work_done(s->progress, chunk_end - start);
-         s->progress_bytes_callback(chunk_end - start, s->progress_opaque);
-         start = chunk_end;
-         ret = 0;
-diff --git a/include/block/block-copy.h b/include/block/block-copy.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/block/block-copy.h
-+++ b/include/block/block-copy.h
-@@ -XXX,XX +XXX,XX @@ typedef struct BlockCopyInFlightReq {
- } BlockCopyInFlightReq;
- typedef void (*ProgressBytesCallbackFunc)(int64_t bytes, void *opaque);
--typedef void (*ProgressResetCallbackFunc)(void *opaque);
- typedef struct BlockCopyState {
-     /*
-      * BdrvChild objects are not owned or managed by block-copy. They are
-@@ -XXX,XX +XXX,XX @@ typedef struct BlockCopyState {
-     BdrvChild *source;
-     BdrvChild *target;
-     BdrvDirtyBitmap *copy_bitmap;
-+    int64_t in_flight_bytes;
-     int64_t cluster_size;
-     bool use_copy_range;
-     int64_t copy_size;
-@@ -XXX,XX +XXX,XX @@ typedef struct BlockCopyState {
-      */
-     bool skip_unallocated;
-+    ProgressMeter *progress;
-     /* progress_bytes_callback: called when some copying progress is done. */
-     ProgressBytesCallbackFunc progress_bytes_callback;
--
--    /*
--     * progress_reset_callback: called when some bytes reset from copy_bitmap
--     * (see @skip_unallocated above). The callee is assumed to recalculate how
--     * many bytes remain based on the dirty bit count of copy_bitmap.
--     */
--    ProgressResetCallbackFunc progress_reset_callback;
-     void *progress_opaque;
-     SharedResource *mem;
-@@ -XXX,XX +XXX,XX @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
-                                      BdrvRequestFlags write_flags,
-                                      Error **errp);
--void block_copy_set_callbacks(
-+void block_copy_set_progress_callback(
-         BlockCopyState *s,
-         ProgressBytesCallbackFunc progress_bytes_callback,
--        ProgressResetCallbackFunc progress_reset_callback,
-         void *progress_opaque);
-+void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
-+
- void block_copy_state_free(BlockCopyState *s);
- int64_t block_copy_reset_unallocated(BlockCopyState *s,
---
-.24.1
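
The accounting problem described in the commit message above can be reproduced outside QEMU. The following is a minimal standalone sketch, not QEMU code: it copies the two ProgressMeter helpers from the previous patch and adds a hypothetical driver, simulate_skip(), that models region A being skipped as unallocated while region B is still in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standalone copy of the ProgressMeter helpers from progress_meter.h. */
typedef struct ProgressMeter {
    uint64_t current;
    uint64_t total;
} ProgressMeter;

static inline void progress_work_done(ProgressMeter *pm, uint64_t done)
{
    pm->current += done;
}

static inline void progress_set_remaining(ProgressMeter *pm, uint64_t remaining)
{
    pm->total = pm->current + remaining;
}

/*
 * Hypothetical driver (not part of QEMU): region A (a bytes) is dirty but
 * unallocated, region B (b bytes) is already in flight and therefore no
 * longer dirty in the bitmap.
 */
static ProgressMeter simulate_skip(uint64_t a, uint64_t b, bool count_in_flight)
{
    ProgressMeter pm = { .current = 0, .total = a + b };
    uint64_t dirty = a;
    uint64_t in_flight = b;

    /* Region A is found to be unallocated: drop it and recompute total. */
    dirty -= a;
    progress_set_remaining(&pm, dirty + (count_in_flight ? in_flight : 0));

    /* Region B completes. */
    in_flight -= b;
    progress_work_done(&pm, b);

    return pm;
}
```

With count_in_flight set to false the meter ends with total = 0 and current = B, which is the inconsistent state the commit message describes. With it set to true, total and current both end at B.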

-[PULL 13/19] block/block-copy: specialcase first copy_range request
+Deleted patch
-From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-In block_copy_do_copy we fall back to read+write if copy_range fails.
-In this case copy_size is larger than the limit defined for buffered
-I/O, and there is a corresponding comment in the code. Still, backup
-copies data cluster by cluster, and most requests are limited to one
-cluster anyway, so the only source of this one badly limited request
-is the copy-before-write operation.
-A further patch will move backup to use block_copy directly; then, for
-cases where copy_range is not supported, the first request will be
-oversized in each backup. That's not good, so let's change it now.
-The fix is simple: just limit the first copy_range request like a
-buffer-based request. If it succeeds, set the larger copy_range limit.
-Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-Reviewed-by: Max Reitz <mreitz@redhat.com>
-Message-Id: <20200311103004.7649-4-vsementsov@virtuozzo.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- block/block-copy.c | 41 +++++++++++++++++++++++++++++++----------
- 1 file changed, 31 insertions(+), 10 deletions(-)
-diff --git a/block/block-copy.c b/block/block-copy.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/block-copy.c
-+++ b/block/block-copy.c
-@@ -XXX,XX +XXX,XX @@ void block_copy_state_free(BlockCopyState *s)
-     g_free(s);
- }
-+static uint32_t block_copy_max_transfer(BdrvChild *source, BdrvChild *target)
-+{
-+    return MIN_NON_ZERO(INT_MAX,
-+                        MIN_NON_ZERO(source->bs->bl.max_transfer,
-+                                     target->bs->bl.max_transfer));
-+}
-+
- BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
-                                      int64_t cluster_size,
-                                      BdrvRequestFlags write_flags, Error **errp)
- {
-     BlockCopyState *s;
-     BdrvDirtyBitmap *copy_bitmap;
--    uint32_t max_transfer =
--            MIN_NON_ZERO(INT_MAX,
--                         MIN_NON_ZERO(source->bs->bl.max_transfer,
--                                      target->bs->bl.max_transfer));
-     copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL,
-                                            errp);
-@@ -XXX,XX +XXX,XX @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
-         .mem = shres_create(BLOCK_COPY_MAX_MEM),
-     };
--    if (max_transfer < cluster_size) {
-+    if (block_copy_max_transfer(source, target) < cluster_size) {
-         /*
-          * copy_range does not respect max_transfer. We don't want to bother
-          * with requests smaller than block-copy cluster size, so fallback to
-@@ -XXX,XX +XXX,XX @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
-         s->copy_size = cluster_size;
-     } else {
-         /*
--         * copy_range does not respect max_transfer (it's a TODO), so we factor
--         * that in here.
-+         * We enable copy-range, but keep small copy_size, until first
-+         * successful copy_range (look at block_copy_do_copy).
-          */
-         s->use_copy_range = true;
--        s->copy_size = MIN(MAX(cluster_size, BLOCK_COPY_MAX_COPY_RANGE),
--                           QEMU_ALIGN_DOWN(max_transfer, cluster_size));
-+        s->copy_size = MAX(s->cluster_size, BLOCK_COPY_MAX_BUFFER);
-     }
-     QLIST_INIT(&s->inflight_reqs);
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
-             s->copy_size = MAX(s->cluster_size, BLOCK_COPY_MAX_BUFFER);
-             /* Fallback to read+write with allocated buffer */
-         } else {
-+            if (s->use_copy_range) {
-+                /*
-+                 * Successful copy-range. Now increase copy_size.  copy_range
-+                 * does not respect max_transfer (it's a TODO), so we factor
-+                 * that in here.
-+                 *
-+                 * Note: we double-check s->use_copy_range for the case when
-+                 * parallel block-copy request unsets it during previous
-+                 * bdrv_co_copy_range call.
-+                 */
-+                s->copy_size =
-+                        MIN(MAX(s->cluster_size, BLOCK_COPY_MAX_COPY_RANGE),
-+                            QEMU_ALIGN_DOWN(block_copy_max_transfer(s->source,
-+                                                                    s->target),
-+                                            s->cluster_size));
-+            }
-             goto out;
-         }
-     }
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
-     /*
-      * In case of failed copy_range request above, we may proceed with buffered
-      * request larger than BLOCK_COPY_MAX_BUFFER. Still, further requests will
--     * be properly limited, so don't care too much.
-+     * be properly limited, so don't care too much. Moreover the most likely
-+     * case (copy_range is unsupported for the configuration, so the very first
-+     * copy_range request fails) is handled by setting large copy_size only
-+     * after first successful copy_range.
-      */
-     bounce_buffer = qemu_blockalign(s->source->bs, nbytes);
---
-.24.1
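
The two copy_size values chosen by the patch above can be worked through with concrete numbers. This is a standalone sketch under assumptions: the helper names are illustrative, BLOCK_COPY_MAX_BUFFER is 1 MiB as defined in block-copy.c, and BLOCK_COPY_MAX_COPY_RANGE is assumed to be 16 MiB since its definition does not appear in this series. The case where max_transfer is smaller than the cluster size (copy_range disabled entirely) is not modelled.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define KiB 1024LL
#define MiB (1024 * KiB)

#define BLOCK_COPY_MAX_BUFFER     (1 * MiB)   /* as defined in block-copy.c */
#define BLOCK_COPY_MAX_COPY_RANGE (16 * MiB)  /* assumed, not shown in the patch */

static int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }
static int64_t max64(int64_t a, int64_t b) { return a > b ? a : b; }

/* Minimum of two limits where 0 means "no limit". */
static int64_t min_non_zero(int64_t a, int64_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return min64(a, b);
}

static int64_t align_down(int64_t n, int64_t m)
{
    return n / m * m;
}

/* Mirrors block_copy_max_transfer() using plain integers. */
static int64_t max_transfer(int64_t src_limit, int64_t dst_limit)
{
    return min_non_zero(INT_MAX, min_non_zero(src_limit, dst_limit));
}

/* copy_size before the first successful copy_range request. */
static int64_t initial_copy_size(int64_t cluster_size)
{
    return max64(cluster_size, BLOCK_COPY_MAX_BUFFER);
}

/* copy_size after a copy_range request has succeeded. */
static int64_t grown_copy_size(int64_t cluster_size,
                               int64_t src_limit, int64_t dst_limit)
{
    return min64(max64(cluster_size, BLOCK_COPY_MAX_COPY_RANGE),
                 align_down(max_transfer(src_limit, dst_limit), cluster_size));
}
```

For 64 KiB clusters and no transfer limits, the first request is capped at 1 MiB and later requests may grow to the assumed 16 MiB. A 4 MiB max_transfer on the target caps the grown size at 4 MiB instead.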

-[PULL 14/19] block/block-copy: use block_status
+Deleted patch
-From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-Use bdrv_block_status_above to choose an effective chunk size and to
-handle zeroes efficiently.
-This replaces the check for merely being allocated or not, and drops
-the old code path for it. Assistance by the backup job is dropped too,
-as caching block-status information is more difficult than just caching
-is-allocated information in our dirty bitmap, and the backup job is not
-a good place for this caching anyway.
-Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-Reviewed-by: Max Reitz <mreitz@redhat.com>
-Message-Id: <20200311103004.7649-5-vsementsov@virtuozzo.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- block/block-copy.c | 73 +++++++++++++++++++++++++++++++++++++---------
- block/trace-events |  1 +
- 2 files changed, 61 insertions(+), 13 deletions(-)
-diff --git a/block/block-copy.c b/block/block-copy.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/block-copy.c
-+++ b/block/block-copy.c
-@@ -XXX,XX +XXX,XX @@ void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm)
-  */
- static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
-                                            int64_t start, int64_t end,
--                                           bool *error_is_read)
-+                                           bool zeroes, bool *error_is_read)
- {
-     int ret;
-     int nbytes = MIN(end, s->len) - start;
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
-     assert(QEMU_IS_ALIGNED(end, s->cluster_size));
-     assert(end < s->len || end == QEMU_ALIGN_UP(s->len, s->cluster_size));
-+    if (zeroes) {
-+        ret = bdrv_co_pwrite_zeroes(s->target, start, nbytes, s->write_flags &
-+                                    ~BDRV_REQ_WRITE_COMPRESSED);
-+        if (ret < 0) {
-+            trace_block_copy_write_zeroes_fail(s, start, ret);
-+            if (error_is_read) {
-+                *error_is_read = false;
-+            }
-+        }
-+        return ret;
-+    }
-+
-     if (s->use_copy_range) {
-         ret = bdrv_co_copy_range(s->source, start, s->target, start, nbytes,
-                                  0, s->write_flags);
-@@ -XXX,XX +XXX,XX @@ out:
-     return ret;
- }
-+static int block_copy_block_status(BlockCopyState *s, int64_t offset,
-+                                   int64_t bytes, int64_t *pnum)
-+{
-+    int64_t num;
-+    BlockDriverState *base;
-+    int ret;
-+
-+    if (s->skip_unallocated && s->source->bs->backing) {
-+        base = s->source->bs->backing->bs;
-+    } else {
-+        base = NULL;
-+    }
-+
-+    ret = bdrv_block_status_above(s->source->bs, base, offset, bytes, &num,
-+                                  NULL, NULL);
-+    if (ret < 0 || num < s->cluster_size) {
-+        /*
-+         * On error or if failed to obtain large enough chunk just fallback to
-+         * copy one cluster.
-+         */
-+        num = s->cluster_size;
-+        ret = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_DATA;
-+    } else if (offset + num == s->len) {
-+        num = QEMU_ALIGN_UP(num, s->cluster_size);
-+    } else {
-+        num = QEMU_ALIGN_DOWN(num, s->cluster_size);
-+    }
-+
-+    *pnum = num;
-+    return ret;
-+}
-+
- /*
-  * Check if the cluster starting at offset is allocated or not.
-  * return via pnum the number of contiguous clusters sharing this allocation.
-@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
- {
-     int ret = 0;
-     int64_t end = bytes + start; /* bytes */
--    int64_t status_bytes;
-     BlockCopyInFlightReq req;
-     /*
-@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
-     block_copy_inflight_req_begin(s, &req, start, end);
-     while (start < end) {
--        int64_t next_zero, chunk_end;
-+        int64_t next_zero, chunk_end, status_bytes;
-         if (!bdrv_dirty_bitmap_get(s->copy_bitmap, start)) {
-             trace_block_copy_skip(s, start);
-@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
-             chunk_end = next_zero;
-         }
--        if (s->skip_unallocated) {
--            ret = block_copy_reset_unallocated(s, start, &status_bytes);
--            if (ret == 0) {
--                trace_block_copy_skip_range(s, start, status_bytes);
--                start += status_bytes;
--                continue;
--            }
--            /* Clamp to known allocated region */
--            chunk_end = MIN(chunk_end, start + status_bytes);
-+        ret = block_copy_block_status(s, start, chunk_end - start,
-+                                      &status_bytes);
-+        if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
-+            bdrv_reset_dirty_bitmap(s->copy_bitmap, start, status_bytes);
-+            progress_set_remaining(s->progress,
-+                                   bdrv_get_dirty_count(s->copy_bitmap) +
-+                                   s->in_flight_bytes);
-+            trace_block_copy_skip_range(s, start, status_bytes);
-+            start += status_bytes;
-+            continue;
-         }
-+        chunk_end = MIN(chunk_end, start + status_bytes);
-+
-         trace_block_copy_process(s, start);
-         bdrv_reset_dirty_bitmap(s->copy_bitmap, start, chunk_end - start);
-         s->in_flight_bytes += chunk_end - start;
-         co_get_from_shres(s->mem, chunk_end - start);
--        ret = block_copy_do_copy(s, start, chunk_end, error_is_read);
-+        ret = block_copy_do_copy(s, start, chunk_end, ret & BDRV_BLOCK_ZERO,
-+                                 error_is_read);
-         co_put_to_shres(s->mem, chunk_end - start);
-         s->in_flight_bytes -= chunk_end - start;
-         if (ret < 0) {
-diff --git a/block/trace-events b/block/trace-events
-index XXXXXXX..XXXXXXX 100644
---- a/block/trace-events
-+++ b/block/trace-events
-@@ -XXX,XX +XXX,XX @@ block_copy_process(void *bcs, int64_t start) "bcs %p start %"PRId64
- block_copy_copy_range_fail(void *bcs, int64_t start, int ret) "bcs %p start %"PRId64" ret %d"
- block_copy_read_fail(void *bcs, int64_t start, int ret) "bcs %p start %"PRId64" ret %d"
- block_copy_write_fail(void *bcs, int64_t start, int ret) "bcs %p start %"PRId64" ret %d"
-+block_copy_write_zeroes_fail(void *bcs, int64_t start, int ret) "bcs %p start %"PRId64" ret %d"
- # ../blockdev.c
- qmp_block_job_cancel(void *job) "job %p"
---
-.24.1
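
The rounding rules that block_copy_block_status() applies to the byte count returned by the status query are easy to get wrong at the end of an image whose length is not cluster-aligned. The sketch below isolates just that arithmetic; round_status_bytes() is a hypothetical name and takes a plain "failed" flag in place of the negative return code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static int64_t align_down(int64_t n, int64_t m)
{
    return n / m * m;
}

static int64_t align_up(int64_t n, int64_t m)
{
    return align_down(n + m - 1, m);
}

/*
 * Hypothetical helper mirroring the rounding in block_copy_block_status():
 *  - on error, or if the reported range is shorter than one cluster,
 *    fall back to exactly one cluster;
 *  - if the range reaches the end of the image, round up so that the
 *    partial last cluster is covered;
 *  - otherwise round down to a whole number of clusters.
 */
static int64_t round_status_bytes(bool failed, int64_t offset, int64_t num,
                                  int64_t cluster_size, int64_t len)
{
    if (failed || num < cluster_size) {
        return cluster_size;
    }
    if (offset + num == len) {
        return align_up(num, cluster_size);
    }
    return align_down(num, cluster_size);
}
```

For a 200 KiB image with 64 KiB clusters, a 100 KiB status range at offset 0 is rounded down to 64 KiB, while a 72 KiB range starting at 128 KiB ends exactly at the image length and is rounded up to 128 KiB.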

-[PULL 15/19] block/block-copy: factor out find_conflicting_inflight_req
+Deleted patch
-From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-Split out find_conflicting_inflight_req so that it can be used separately.
-Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-Reviewed-by: Max Reitz <mreitz@redhat.com>
-Message-Id: <20200311103004.7649-6-vsementsov@virtuozzo.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- block/block-copy.c | 31 +++++++++++++++++++------------
- 1 file changed, 19 insertions(+), 12 deletions(-)
-diff --git a/block/block-copy.c b/block/block-copy.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/block-copy.c
-+++ b/block/block-copy.c
-@@ -XXX,XX +XXX,XX @@
- #define BLOCK_COPY_MAX_BUFFER (1 * MiB)
- #define BLOCK_COPY_MAX_MEM (128 * MiB)
-+static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
-+                                                           int64_t start,
-+                                                           int64_t end)
-+{
-+    BlockCopyInFlightReq *req;
-+
-+    QLIST_FOREACH(req, &s->inflight_reqs, list) {
-+        if (end > req->start_byte && start < req->end_byte) {
-+            return req;
-+        }
-+    }
-+
-+    return NULL;
-+}
-+
- static void coroutine_fn block_copy_wait_inflight_reqs(BlockCopyState *s,
-                                                        int64_t start,
-                                                        int64_t end)
- {
-     BlockCopyInFlightReq *req;
--    bool waited;
--
--    do {
--        waited = false;
--        QLIST_FOREACH(req, &s->inflight_reqs, list) {
--            if (end > req->start_byte && start < req->end_byte) {
--                qemu_co_queue_wait(&req->wait_queue, NULL);
--                waited = true;
--                break;
--            }
--        }
--    } while (waited);
-+
-+    while ((req = find_conflicting_inflight_req(s, start, end))) {
-+        qemu_co_queue_wait(&req->wait_queue, NULL);
-+    }
- }
- static void block_copy_inflight_req_begin(BlockCopyState *s,
---
-.24.1
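
The conflict test factored out above is a half-open interval overlap check: [start, end) conflicts with a request when it begins before the request ends and ends after the request begins. A minimal standalone sketch follows, using a plain array in place of QEMU's QLIST and a hypothetical Req type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for BlockCopyInFlightReq: the range [start_byte, end_byte). */
typedef struct Req {
    int64_t start_byte;
    int64_t end_byte;
} Req;

/* Return the first request overlapping [start, end), or NULL if none does. */
static const Req *find_conflicting(const Req *reqs, size_t n,
                                   int64_t start, int64_t end)
{
    for (size_t i = 0; i < n; i++) {
        if (end > reqs[i].start_byte && start < reqs[i].end_byte) {
            return &reqs[i];
        }
    }
    return NULL;
}
```

Because both bounds are compared strictly, ranges that merely touch (one ends where the other starts) do not count as conflicting.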

-[PULL 16/19] block/block-copy: refactor interfaces to use bytes instead of end
+Deleted patch
-From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-We have a lot of "chunk_end - start" expressions; let's switch to a
-bytes/cur_bytes scheme instead.
-While at it, improve the checks on block_copy_do_copy parameters so
-that calculating nbytes cannot overflow, and use int64_t for bytes in
-block_copy for consistency.
-Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
-Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
-Reviewed-by: Max Reitz <mreitz@redhat.com>
-Message-Id: <20200311103004.7649-7-vsementsov@virtuozzo.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- block/block-copy.c         | 78 ++++++++++++++++++++------------------
- include/block/block-copy.h |  6 +--
- 2 files changed, 44 insertions(+), 40 deletions(-)
-diff --git a/block/block-copy.c b/block/block-copy.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/block-copy.c
-+++ b/block/block-copy.c
-@@ -XXX,XX +XXX,XX @@
- static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
-                                                            int64_t start,
--                                                           int64_t end)
-+                                                           int64_t bytes)
- {
-     BlockCopyInFlightReq *req;
-     QLIST_FOREACH(req, &s->inflight_reqs, list) {
--        if (end > req->start_byte && start < req->end_byte) {
-+        if (start + bytes > req->start && start < req->start + req->bytes) {
-             return req;
-         }
-     }
-@@ -XXX,XX +XXX,XX @@ static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
- static void coroutine_fn block_copy_wait_inflight_reqs(BlockCopyState *s,
-                                                        int64_t start,
--                                                       int64_t end)
-+                                                       int64_t bytes)
- {
-     BlockCopyInFlightReq *req;
--    while ((req = find_conflicting_inflight_req(s, start, end))) {
-+    while ((req = find_conflicting_inflight_req(s, start, bytes))) {
-         qemu_co_queue_wait(&req->wait_queue, NULL);
-     }
- }
- static void block_copy_inflight_req_begin(BlockCopyState *s,
-                                           BlockCopyInFlightReq *req,
--                                          int64_t start, int64_t end)
-+                                          int64_t start, int64_t bytes)
- {
--    req->start_byte = start;
--    req->end_byte = end;
-+    req->start = start;
-+    req->bytes = bytes;
-     qemu_co_queue_init(&req->wait_queue);
-     QLIST_INSERT_HEAD(&s->inflight_reqs, req, list);
- }
-@@ -XXX,XX +XXX,XX @@ void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm)
- /*
-  * block_copy_do_copy
-  *
-- * Do copy of cluser-aligned chunk. @end is allowed to exceed s->len only to
-- * cover last cluster when s->len is not aligned to clusters.
-+ * Do copy of cluster-aligned chunk. Requested region is allowed to exceed
-+ * s->len only to cover last cluster when s->len is not aligned to clusters.
-  *
-  * No sync here: nor bitmap neighter intersecting requests handling, only copy.
-  *
-  * Returns 0 on success.
-  */
- static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
--                                           int64_t start, int64_t end,
-+                                           int64_t start, int64_t bytes,
-                                            bool zeroes, bool *error_is_read)
- {
-     int ret;
--    int nbytes = MIN(end, s->len) - start;
-+    int64_t nbytes = MIN(start + bytes, s->len) - start;
-     void *bounce_buffer = NULL;
-+    assert(start >= 0 && bytes > 0 && INT64_MAX - start >= bytes);
-     assert(QEMU_IS_ALIGNED(start, s->cluster_size));
--    assert(QEMU_IS_ALIGNED(end, s->cluster_size));
--    assert(end < s->len || end == QEMU_ALIGN_UP(s->len, s->cluster_size));
-+    assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
-+    assert(start < s->len);
-+    assert(start + bytes <= s->len ||
-+           start + bytes == QEMU_ALIGN_UP(s->len, s->cluster_size));
-+    assert(nbytes < INT_MAX);
-     if (zeroes) {
-         ret = bdrv_co_pwrite_zeroes(s->target, start, nbytes, s->write_flags &
-@@ -XXX,XX +XXX,XX @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
- }
- int coroutine_fn block_copy(BlockCopyState *s,
--                            int64_t start, uint64_t bytes,
-+                            int64_t start, int64_t bytes,
-                             bool *error_is_read)
- {
-     int ret = 0;
--    int64_t end = bytes + start; /* bytes */
-     BlockCopyInFlightReq req;
-     /*
-@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
-            bdrv_get_aio_context(s->target->bs));
-     assert(QEMU_IS_ALIGNED(start, s->cluster_size));
--    assert(QEMU_IS_ALIGNED(end, s->cluster_size));
-+    assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
-     block_copy_wait_inflight_reqs(s, start, bytes);
--    block_copy_inflight_req_begin(s, &req, start, end);
-+    block_copy_inflight_req_begin(s, &req, start, bytes);
--    while (start < end) {
--        int64_t next_zero, chunk_end, status_bytes;
-+    while (bytes) {
-+        int64_t next_zero, cur_bytes, status_bytes;
-         if (!bdrv_dirty_bitmap_get(s->copy_bitmap, start)) {
-             trace_block_copy_skip(s, start);
-             start += s->cluster_size;
-+            bytes -= s->cluster_size;
-             continue; /* already copied */
-         }
--        chunk_end = MIN(end, start + s->copy_size);
-+        cur_bytes = MIN(bytes, s->copy_size);
-         next_zero = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, start,
--                                                chunk_end - start);
-+                                                cur_bytes);
-         if (next_zero >= 0) {
-             assert(next_zero > start); /* start is dirty */
--            assert(next_zero < chunk_end); /* no need to do MIN() */
--            chunk_end = next_zero;
-+            assert(next_zero < start + cur_bytes); /* no need to do MIN() */
-+            cur_bytes = next_zero - start;
-         }
--        ret = block_copy_block_status(s, start, chunk_end - start,
--                                      &status_bytes);
-+        ret = block_copy_block_status(s, start, cur_bytes, &status_bytes);
-         if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
-             bdrv_reset_dirty_bitmap(s->copy_bitmap, start, status_bytes);
-             progress_set_remaining(s->progress,
-@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
-                                    s->in_flight_bytes);
-             trace_block_copy_skip_range(s, start, status_bytes);
-             start += status_bytes;
-+            bytes -= status_bytes;
-             continue;
-         }
--        chunk_end = MIN(chunk_end, start + status_bytes);
-+        cur_bytes = MIN(cur_bytes, status_bytes);
-         trace_block_copy_process(s, start);
--        bdrv_reset_dirty_bitmap(s->copy_bitmap, start, chunk_end - start);
--        s->in_flight_bytes += chunk_end - start;
-+        bdrv_reset_dirty_bitmap(s->copy_bitmap, start, cur_bytes);
-+        s->in_flight_bytes += cur_bytes;
--        co_get_from_shres(s->mem, chunk_end - start);
--        ret = block_copy_do_copy(s, start, chunk_end, ret & BDRV_BLOCK_ZERO,
-+        co_get_from_shres(s->mem, cur_bytes);
-+        ret = block_copy_do_copy(s, start, cur_bytes, ret & BDRV_BLOCK_ZERO,
-                                  error_is_read);
--        co_put_to_shres(s->mem, chunk_end - start);
--        s->in_flight_bytes -= chunk_end - start;
-+        co_put_to_shres(s->mem, cur_bytes);
-+        s->in_flight_bytes -= cur_bytes;
-         if (ret < 0) {
--            bdrv_set_dirty_bitmap(s->copy_bitmap, start, chunk_end - start);
-+            bdrv_set_dirty_bitmap(s->copy_bitmap, start, cur_bytes);
-             break;
-         }
--        progress_work_done(s->progress, chunk_end - start);
--        s->progress_bytes_callback(chunk_end - start, s->progress_opaque);
--        start = chunk_end;
--        ret = 0;
-+        progress_work_done(s->progress, cur_bytes);
-+        s->progress_bytes_callback(cur_bytes, s->progress_opaque);
-+        start += cur_bytes;
-+        bytes -= cur_bytes;
-     }
-     block_copy_inflight_req_end(&req);
-diff --git a/include/block/block-copy.h b/include/block/block-copy.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/block/block-copy.h
-+++ b/include/block/block-copy.h
-@@ -XXX,XX +XXX,XX @@
- #include "qemu/co-shared-resource.h"
- typedef struct BlockCopyInFlightReq {
--    int64_t start_byte;
--    int64_t end_byte;
-+    int64_t start;
-+    int64_t bytes;
-     QLIST_ENTRY(BlockCopyInFlightReq) list;
-     CoQueue wait_queue; /* coroutines blocked on this request */
- } BlockCopyInFlightReq;
-@@ -XXX,XX +XXX,XX @@ void block_copy_state_free(BlockCopyState *s);
- int64_t block_copy_reset_unallocated(BlockCopyState *s,
-                                      int64_t offset, int64_t *count);
--int coroutine_fn block_copy(BlockCopyState *s, int64_t start, uint64_t bytes,
-+int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes,
-                             bool *error_is_read);
- #endif /* BLOCK_COPY_H */
---
-2.24.1

The following changes since commit ba29883206d92a29ad5a466e679ccfc2ee6132ef:

Merge remote-tracking branch 'remotes/borntraeger/tags/s390x-20200310' into staging (2020-03-10 16:50:28 +0000)

are available in the Git repository at:

https://github.com/XanClic/qemu.git tags/pull-block-2020-03-11

for you to fetch changes up to 397f4e9d83e9c0000905f0a988ba1aeda162571c:

block/block-copy: hide structure definitions (2020-03-11 12:42:30 +0100)

----------------------------------------------------------------
Block patches for the 5.0 softfreeze:
- qemu-img measure for LUKS
- Improve block-copy's performance by reducing inter-request
  dependencies
- Make curl's detection of accept-ranges more robust
- Memleak fixes
- iotest fix

----------------------------------------------------------------
David Edmondson (2):
  block/curl: HTTP header fields allow whitespace around values
  block/curl: HTTP header field names are case insensitive

Eric Blake (1):
  iotests: Fix nonportable use of od --endian

Pan Nengyuan (2):
  block/qcow2: do free crypto_opts in qcow2_close()
  qemu-img: free memory before re-assign

Stefan Hajnoczi (4):
  luks: extract qcrypto_block_calculate_payload_offset()
  luks: implement .bdrv_measure()
  qemu-img: allow qemu-img measure --object without a filename
  iotests: add 288 luks qemu-img measure test

Vladimir Sementsov-Ogievskiy (10):
  block/qcow2-threads: fix qcow2_decompress
  job: refactor progress to separate object
  block/block-copy: fix progress calculation
  block/block-copy: specialcase first copy_range request
  block/block-copy: use block_status
  block/block-copy: factor out find_conflicting_inflight_req
  block/block-copy: refactor interfaces to use bytes instead of end
  block/block-copy: rename start to offset in interfaces
  block/block-copy: reduce intersecting request lock
  block/block-copy: hide structure definitions

-- 
2.24.1

From: Stefan Hajnoczi <stefanha@redhat.com>

The qcow2 .bdrv_measure() code calculates the crypto payload offset.
This logic really belongs in crypto/block.c where it can be reused by
other image formats.

The "luks" block driver will need this same logic in order to implement
.bdrv_measure(), so extract the qcrypto_block_calculate_payload_offset()
function now.
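
The "fake creation" technique behind this helper can be sketched in
isolation. The sketch below is only an illustration under stated
assumptions: toy_format_create(), its made-up header layout, and the
callback types are hypothetical stand-ins, not the QEMU crypto API. It
shows how running a create routine with an init callback that stashes the
header length and a write callback that discards data yields the payload
offset without touching any image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types, loosely modelled on an init/write pair. */
typedef bool (*hdr_init_fn)(size_t headerlen, void *opaque);
typedef bool (*hdr_write_fn)(size_t offset, const uint8_t *buf,
                             size_t buflen, void *opaque);

/*
 * Hypothetical stand-in for a format's "create" routine: it reports the
 * header size through init_cb and emits the header through write_cb.
 * The layout (one 512-byte block plus one per key slot) is made up.
 */
static bool toy_format_create(size_t key_slots, hdr_init_fn init_cb,
                              hdr_write_fn write_cb, void *opaque)
{
    const uint8_t zeroes[512] = { 0 };
    size_t headerlen = sizeof(zeroes) * (1 + key_slots);
    size_t off;

    if (!init_cb(headerlen, opaque)) {
        return false;
    }
    for (off = 0; off < headerlen; off += sizeof(zeroes)) {
        if (!write_cb(off, zeroes, sizeof(zeroes), opaque)) {
            return false;
        }
    }
    return true;
}

/* Stash the header length in the caller-provided size_t. */
static bool measure_init(size_t headerlen, void *opaque)
{
    *(size_t *)opaque = headerlen;
    return true;
}

/* Discard the bytes: nothing is written to any image. */
static bool measure_write(size_t offset, const uint8_t *buf,
                          size_t buflen, void *opaque)
{
    (void)offset;
    (void)buf;
    (void)buflen;
    (void)opaque;
    return true;
}

/* Returns true on success and stores the header length in *len. */
static bool toy_calculate_payload_offset(size_t key_slots, size_t *len)
{
    assert(len != NULL);
    return toy_format_create(key_slots, measure_init, measure_write, len);
}
```

Because the callbacks are the only format-independent part, moving them
next to the create routine lets any caller reuse the measurement.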

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200221112522.1497712-2-stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c          | 74 +++++++++++-------------------------------
 crypto/block.c         | 36 ++++++++++++++++++++
 include/crypto/block.h | 22 +++++++++++++
 3 files changed, 77 insertions(+), 55 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
     return ret;
 }
 
-static ssize_t qcow2_measure_crypto_hdr_init_func(QCryptoBlock *block,
-        size_t headerlen, void *opaque, Error **errp)
-{
-    size_t *headerlenp = opaque;
-
-    /* Stash away the payload size */
-    *headerlenp = headerlen;
-    return 0;
-}
-
-static ssize_t qcow2_measure_crypto_hdr_write_func(QCryptoBlock *block,
-        size_t offset, const uint8_t *buf, size_t buflen,
-        void *opaque, Error **errp)
-{
-    /* Discard the bytes, we're not actually writing to an image */
-    return buflen;
-}
-
-/* Determine the number of bytes for the LUKS payload */
-static bool qcow2_measure_luks_headerlen(QemuOpts *opts, size_t *len,
-                                         Error **errp)
-{
-    QDict *opts_qdict;
-    QDict *cryptoopts_qdict;
-    QCryptoBlockCreateOptions *cryptoopts;
-    QCryptoBlock *crypto;
-
-    /* Extract "encrypt." options into a qdict */
-    opts_qdict = qemu_opts_to_qdict(opts, NULL);
-    qdict_extract_subqdict(opts_qdict, &cryptoopts_qdict, "encrypt.");
-    qobject_unref(opts_qdict);
-
-    /* Build QCryptoBlockCreateOptions object from qdict */
-    qdict_put_str(cryptoopts_qdict, "format", "luks");
-    cryptoopts = block_crypto_create_opts_init(cryptoopts_qdict, errp);
-    qobject_unref(cryptoopts_qdict);
-    if (!cryptoopts) {
-        return false;
-    }
-
-    /* Fake LUKS creation in order to determine the payload size */
-    crypto = qcrypto_block_create(cryptoopts, "encrypt.",
-                                  qcow2_measure_crypto_hdr_init_func,
-                                  qcow2_measure_crypto_hdr_write_func,
-                                  len, errp);
-    qapi_free_QCryptoBlockCreateOptions(cryptoopts);
-    if (!crypto) {
-        return false;
-    }
-
-    qcrypto_block_free(crypto);
-    return true;
-}
-
 static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
                                        Error **errp)
 {
@@ -XXX,XX +XXX,XX @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
     g_free(optstr);
 
     if (has_luks) {
+        g_autoptr(QCryptoBlockCreateOptions) create_opts = NULL;
+        QDict *opts_qdict;
+        QDict *cryptoopts;
         size_t headerlen;
 
-        if (!qcow2_measure_luks_headerlen(opts, &headerlen, &local_err)) {
+        opts_qdict = qemu_opts_to_qdict(opts, NULL);
+        qdict_extract_subqdict(opts_qdict, &cryptoopts, "encrypt.");
+        qobject_unref(opts_qdict);
+
+        qdict_put_str(cryptoopts, "format", "luks");
+
+        create_opts = block_crypto_create_opts_init(cryptoopts, errp);
+        qobject_unref(cryptoopts);
+        if (!create_opts) {
+            goto err;
+        }
+
+        if (!qcrypto_block_calculate_payload_offset(create_opts,
+                                                    "encrypt.",
+                                                    &headerlen,
+                                                    &local_err)) {
             goto err;
         }
 
diff --git a/crypto/block.c b/crypto/block.c
index XXXXXXX..XXXXXXX 100644
--- a/crypto/block.c
+++ b/crypto/block.c
@@ -XXX,XX +XXX,XX @@ QCryptoBlock *qcrypto_block_create(QCryptoBlockCreateOptions *options,
 }
 
 
+static ssize_t qcrypto_block_headerlen_hdr_init_func(QCryptoBlock *block,
+        size_t headerlen, void *opaque, Error **errp)
+{
+    size_t *headerlenp = opaque;
+
+    /* Stash away the payload size */
+    *headerlenp = headerlen;
+    return 0;
+}
+
+
+static ssize_t qcrypto_block_headerlen_hdr_write_func(QCryptoBlock *block,
+        size_t offset, const uint8_t *buf, size_t buflen,
+        void *opaque, Error **errp)
+{
+    /* Discard the bytes, we're not actually writing to an image */
+    return buflen;
+}
+
+
+bool
+qcrypto_block_calculate_payload_offset(QCryptoBlockCreateOptions *create_opts,
+                                       const char *optprefix,
+                                       size_t *len,
+                                       Error **errp)
+{
+    /* Fake LUKS creation in order to determine the payload size */
+    g_autoptr(QCryptoBlock) crypto =
+        qcrypto_block_create(create_opts, optprefix,
+                             qcrypto_block_headerlen_hdr_init_func,
+                             qcrypto_block_headerlen_hdr_write_func,
+                             len, errp);
+    return crypto != NULL;
+}
+
+
 QCryptoBlockInfo *qcrypto_block_get_info(QCryptoBlock *block,
                                          Error **errp)
 {
diff --git a/include/crypto/block.h b/include/crypto/block.h
index XXXXXXX..XXXXXXX 100644
--- a/include/crypto/block.h
+++ b/include/crypto/block.h
@@ -XXX,XX +XXX,XX @@ QCryptoBlock *qcrypto_block_create(QCryptoBlockCreateOptions *options,
                                    Error **errp);
 
 
+/**
+ * qcrypto_block_calculate_payload_offset:
+ * @create_opts: the encryption options
+ * @optprefix: name prefix for options
+ * @len: output for number of header bytes before payload
+ * @errp: pointer to a NULL-initialized error object
+ *
+ * Calculate the number of header bytes before the payload in an encrypted
+ * storage volume.  The header is an area before the payload that is reserved
+ * for encryption metadata.
+ *
+ * Returns: true on success, false on error
+ */
+bool
+qcrypto_block_calculate_payload_offset(QCryptoBlockCreateOptions *create_opts,
+                                       const char *optprefix,
+                                       size_t *len,
+                                       Error **errp);
+
+
 /**
  * qcrypto_block_get_info:
  * @block: the block encryption object
@@ -XXX,XX +XXX,XX @@ uint64_t qcrypto_block_get_sector_size(QCryptoBlock *block);
 void qcrypto_block_free(QCryptoBlock *block);
 
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoBlock, qcrypto_block_free)
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(QCryptoBlockCreateOptions,
+                              qapi_free_QCryptoBlockCreateOptions)
 
 #endif /* QCRYPTO_BLOCK_H */
-- 
2.24.1

From: Stefan Hajnoczi <stefanha@redhat.com>

Add qemu-img measure support in the "luks" block driver.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200221112522.1497712-3-stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/crypto.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/block/crypto.c b/block/crypto.c
index XXXXXXX..XXXXXXX 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -XXX,XX +XXX,XX @@ static int64_t block_crypto_getlength(BlockDriverState *bs)
 }
 
 
+static BlockMeasureInfo *block_crypto_measure(QemuOpts *opts,
+                                              BlockDriverState *in_bs,
+                                              Error **errp)
+{
+    g_autoptr(QCryptoBlockCreateOptions) create_opts = NULL;
+    Error *local_err = NULL;
+    BlockMeasureInfo *info;
+    uint64_t size;
+    size_t luks_payload_size;
+    QDict *cryptoopts;
+
+    /*
+     * Preallocation mode doesn't affect size requirements but we must consume
+     * the option.
+     */
+    g_free(qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC));
+
+    size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
+
+    if (in_bs) {
+        int64_t ssize = bdrv_getlength(in_bs);
+
+        if (ssize < 0) {
+            error_setg_errno(&local_err, -ssize,
+                             "Unable to get image virtual_size");
+            goto err;
+        }
+
+        size = ssize;
+    }
+
+    cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
+            &block_crypto_create_opts_luks, true);
+    qdict_put_str(cryptoopts, "format", "luks");
+    create_opts = block_crypto_create_opts_init(cryptoopts, &local_err);
+    qobject_unref(cryptoopts);
+    if (!create_opts) {
+        goto err;
+    }
+
+    if (!qcrypto_block_calculate_payload_offset(create_opts, NULL,
+                                                &luks_payload_size,
+                                                &local_err)) {
+        goto err;
+    }
+
+    /*
+     * Unallocated blocks are still encrypted so allocation status makes no
+     * difference to the file size.
+     */
+    info = g_new(BlockMeasureInfo, 1);
+    info->fully_allocated = luks_payload_size + size;
+    info->required = luks_payload_size + size;
+    return info;
+
+err:
+    error_propagate(errp, local_err);
+    return NULL;
+}
+
+
 static int block_crypto_probe_luks(const uint8_t *buf,
                                    int buf_size,
                                    const char *filename) {
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_crypto_luks = {
     .bdrv_co_preadv     = block_crypto_co_preadv,
     .bdrv_co_pwritev    = block_crypto_co_pwritev,
     .bdrv_getlength     = block_crypto_getlength,
+    .bdrv_measure       = block_crypto_measure,
     .bdrv_get_info      = block_crypto_get_info_luks,
     .bdrv_get_specific_info = block_crypto_get_specific_info_luks,
 
-- 
2.24.1

From: Stefan Hajnoczi <stefanha@redhat.com>

In most qemu-img sub-commands the --object option only makes sense when
there is a filename.  qemu-img measure is an exception because objects
may be referenced from the image creation options instead of an existing
image file.  Allow --object without a filename.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200221112522.1497712-4-stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 qemu-img.c                       | 6 ++----
 tests/qemu-iotests/178           | 2 +-
 tests/qemu-iotests/178.out.qcow2 | 8 ++++----
 tests/qemu-iotests/178.out.raw   | 8 ++++----
 4 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index XXXXXXX..XXXXXXX 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -XXX,XX +XXX,XX @@ static int img_measure(int argc, char **argv)
         filename = argv[optind];
     }
 
-    if (!filename &&
-        (object_opts || image_opts || fmt || snapshot_name || sn_opts)) {
-        error_report("--object, --image-opts, -f, and -l "
-                     "require a filename argument.");
+    if (!filename && (image_opts || fmt || snapshot_name || sn_opts)) {
+        error_report("--image-opts, -f, and -l require a filename argument.");
         goto out;
     }
     if (filename && img_size != UINT64_MAX) {
diff --git a/tests/qemu-iotests/178 b/tests/qemu-iotests/178
index XXXXXXX..XXXXXXX 100755
--- a/tests/qemu-iotests/178
+++ b/tests/qemu-iotests/178
@@ -XXX,XX +XXX,XX @@ _make_test_img 1G
 $QEMU_IMG measure # missing arguments
 $QEMU_IMG measure --size 2G "$TEST_IMG" # only one allowed
 $QEMU_IMG measure "$TEST_IMG" a # only one filename allowed
-$QEMU_IMG measure --object secret,id=sec0,data=MTIzNDU2,format=base64 # missing filename
+$QEMU_IMG measure --object secret,id=sec0,data=MTIzNDU2,format=base64 # size or filename needed
 $QEMU_IMG measure --image-opts # missing filename
 $QEMU_IMG measure -f qcow2 # missing filename
 $QEMU_IMG measure -l snap1 # missing filename
diff --git a/tests/qemu-iotests/178.out.qcow2 b/tests/qemu-iotests/178.out.qcow2
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/178.out.qcow2
+++ b/tests/qemu-iotests/178.out.qcow2
@@ -XXX,XX +XXX,XX @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 qemu-img: Either --size N or one filename must be specified.
 qemu-img: --size N cannot be used together with a filename.
 qemu-img: At most one filename argument is allowed.
-qemu-img: --object, --image-opts, -f, and -l require a filename argument.
-qemu-img: --object, --image-opts, -f, and -l require a filename argument.
-qemu-img: --object, --image-opts, -f, and -l require a filename argument.
-qemu-img: --object, --image-opts, -f, and -l require a filename argument.
+qemu-img: Either --size N or one filename must be specified.
+qemu-img: --image-opts, -f, and -l require a filename argument.
+qemu-img: --image-opts, -f, and -l require a filename argument.
+qemu-img: --image-opts, -f, and -l require a filename argument.
 qemu-img: Invalid option list: ,
 qemu-img: Invalid parameter 'snapshot.foo'
 qemu-img: Failed in parsing snapshot param 'snapshot.foo'
diff --git a/tests/qemu-iotests/178.out.raw b/tests/qemu-iotests/178.out.raw
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/178.out.raw
+++ b/tests/qemu-iotests/178.out.raw
@@ -XXX,XX +XXX,XX @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
 qemu-img: Either --size N or one filename must be specified.
 qemu-img: --size N cannot be used together with a filename.
 qemu-img: At most one filename argument is allowed.
-qemu-img: --object, --image-opts, -f, and -l require a filename argument.
-qemu-img: --object, --image-opts, -f, and -l require a filename argument.
-qemu-img: --object, --image-opts, -f, and -l require a filename argument.
-qemu-img: --object, --image-opts, -f, and -l require a filename argument.
+qemu-img: Either --size N or one filename must be specified.
+qemu-img: --image-opts, -f, and -l require a filename argument.
+qemu-img: --image-opts, -f, and -l require a filename argument.
+qemu-img: --image-opts, -f, and -l require a filename argument.
 qemu-img: Invalid option list: ,
 qemu-img: Invalid parameter 'snapshot.foo'
 qemu-img: Failed in parsing snapshot param 'snapshot.foo'
-- 
2.24.1

From: Stefan Hajnoczi <stefanha@redhat.com>

This test exercises the block/crypto.c "luks" block driver
.bdrv_measure() code.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200221112522.1497712-5-stefanha@redhat.com>
[mreitz: Renamed test from 282 to 288]
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/288     | 93 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/288.out | 30 ++++++++++++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 124 insertions(+)
 create mode 100755 tests/qemu-iotests/288
 create mode 100644 tests/qemu-iotests/288.out

diff --git a/tests/qemu-iotests/288 b/tests/qemu-iotests/288
new file mode 100755
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/qemu-iotests/288
@@ -XXX,XX +XXX,XX @@
+#!/usr/bin/env bash
+#
+# qemu-img measure tests for LUKS images
+#
+# Copyright (C) 2020 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=stefanha@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1    # failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+    rm -f "$TEST_IMG.converted"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.pattern
+
+_supported_fmt luks
+_supported_proto file
+_supported_os Linux
+
+SECRET=secret,id=sec0,data=passphrase
+
+echo "== measure 1G image file =="
+echo
+
+$QEMU_IMG measure --object "$SECRET" \
+	          -O "$IMGFMT" \
+		  -o key-secret=sec0,iter-time=10 \
+		  --size 1G
+
+echo
+echo "== create 1G image file (size should be no greater than measured) =="
+echo
+
+_make_test_img 1G
+stat -c "image file size in bytes: %s" "$TEST_IMG_FILE"
+
+echo
+echo "== modified 1G image file (size should be no greater than measured) =="
+echo
+
+$QEMU_IO --object "$SECRET" --image-opts "$TEST_IMG" -c "write -P 0x51 0x10000 0x400" | _filter_qemu_io | _filter_testdir
+stat -c "image file size in bytes: %s" "$TEST_IMG_FILE"
+
+echo
+echo "== measure preallocation=falloc 1G image file =="
+echo
+
+$QEMU_IMG measure --object "$SECRET" \
+	          -O "$IMGFMT" \
+		  -o key-secret=sec0,iter-time=10,preallocation=falloc \
+		  --size 1G
+
+echo
+echo "== measure with input image file =="
+echo
+
+IMGFMT=raw IMGKEYSECRET= IMGOPTS= _make_test_img 1G | _filter_imgfmt
+QEMU_IO_OPTIONS= IMGOPTSSYNTAX= $QEMU_IO -f raw -c "write -P 0x51 0x10000 0x400" "$TEST_IMG_FILE" | _filter_qemu_io | _filter_testdir
+$QEMU_IMG measure --object "$SECRET" \
+	          -O "$IMGFMT" \
+		  -o key-secret=sec0,iter-time=10 \
+		  -f raw \
+		  "$TEST_IMG_FILE"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/288.out b/tests/qemu-iotests/288.out
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/qemu-iotests/288.out
@@ -XXX,XX +XXX,XX @@
+QA output created by 288
+== measure 1G image file ==
+
+required size: 1075810304
+fully allocated size: 1075810304
+
+== create 1G image file (size should be no greater than measured) ==
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
+image file size in bytes: 1075810304
+
+== modified 1G image file (size should be no greater than measured) ==
+
+wrote 1024/1024 bytes at offset 65536
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+image file size in bytes: 1075810304
+
+== measure preallocation=falloc 1G image file ==
+
+required size: 1075810304
+fully allocated size: 1075810304
+
+== measure with input image file ==
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824
+wrote 1024/1024 bytes at offset 65536
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+required size: 1075810304
+fully allocated size: 1075810304
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -XXX,XX +XXX,XX @@
 283 auto quick
 284 rw
 286 rw quick
+288 quick
-- 
2.24.1

From: David Edmondson <david.edmondson@oracle.com>

RFC 7230 section 3.2 indicates that whitespace is permitted between
the field name and field value and after the field value.
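
A rough, self-contained sketch of such a whitespace-tolerant check is
shown below. It uses plain C library calls instead of GLib, and
header_accepts_byte_ranges() is a hypothetical name chosen for this
illustration only. The buffer is not assumed to be NUL-terminated, so
every access is bounded by the length.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative only: returns true if the header line is
 * "Accept-Ranges:" followed by optional whitespace, "bytes", and
 * nothing but whitespace (such as the trailing CRLF) afterwards.
 */
static bool header_accepts_byte_ranges(const char *header, size_t len)
{
    static const char name[] = "Accept-Ranges:";
    static const char value[] = "bytes";
    const size_t name_len = sizeof(name) - 1;
    const size_t value_len = sizeof(value) - 1;
    const char *p = header;
    const char *end;

    assert(header != NULL);
    end = header + len;

    if (len < name_len || strncmp(p, name, name_len) != 0) {
        return false;
    }
    p += name_len;

    /* Skip optional whitespace between the field name and the value. */
    while (p < end && isspace((unsigned char)*p)) {
        p++;
    }

    if ((size_t)(end - p) < value_len ||
        strncmp(p, value, value_len) != 0) {
        return false;
    }
    p += value_len;

    /* Only whitespace may follow the value. */
    while (p < end && isspace((unsigned char)*p)) {
        p++;
    }
    return p == end;
}
```

With this sketch, "Accept-Ranges:bytes" and "Accept-Ranges:   bytes  "
are both accepted, while "Accept-Ranges: bytesX" is rejected.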

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20200224101310.101169-2-david.edmondson@oracle.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/curl.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index XXXXXXX..XXXXXXX 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -XXX,XX +XXX,XX @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
 {
     BDRVCURLState *s = opaque;
     size_t realsize = size * nmemb;
-    const char *accept_line = "Accept-Ranges: bytes";
+    const char *header = (char *)ptr;
+    const char *end = header + realsize;
+    const char *accept_ranges = "Accept-Ranges:";
+    const char *bytes = "bytes";
 
-    if (realsize >= strlen(accept_line)
-        && strncmp((char *)ptr, accept_line, strlen(accept_line)) == 0) {
-        s->accept_range = true;
+    if (realsize >= strlen(accept_ranges)
+        && strncmp(header, accept_ranges, strlen(accept_ranges)) == 0) {
+
+        char *p = strchr(header, ':') + 1;
+
+        /* Skip whitespace between the header name and value. */
+        while (p < end && *p && g_ascii_isspace(*p)) {
+            p++;
+        }
+
+        if (end - p >= strlen(bytes)
+            && strncmp(p, bytes, strlen(bytes)) == 0) {
+
+            /* Check that there is nothing but whitespace after the value. */
+            p += strlen(bytes);
+            while (p < end && *p && g_ascii_isspace(*p)) {
+                p++;
+            }
+
+            if (p == end || !*p) {
+                s->accept_range = true;
+            }
+        }
     }
 
     return realsize;
-- 
2.24.1

From: David Edmondson <david.edmondson@oracle.com>

RFC 7230 section 3.2 indicates that HTTP header field names are case
insensitive.
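
For illustration, a case-insensitive field-name comparison can be
sketched without GLib as follows. field_name_matches() and ascii_lower()
are hypothetical helpers for this example; the patch itself relies on
g_ascii_strncasecmp(). The comparison is ASCII-only on purpose, so it
does not depend on the current locale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASCII-only lowercase conversion, independent of the locale. */
static char ascii_lower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/*
 * Illustrative only: returns true if the first strlen(name) bytes of
 * header equal name, ignoring ASCII case. header need not be
 * NUL-terminated; len bounds every access.
 */
static bool field_name_matches(const char *header, size_t len,
                               const char *name)
{
    size_t i;

    assert(header != NULL && name != NULL);
    for (i = 0; name[i] != '\0'; i++) {
        if (i >= len || ascii_lower(header[i]) != ascii_lower(name[i])) {
            return false;
        }
    }
    return true;
}
```

Only the field name is compared this way in the sketch; how the value is
matched is a separate question.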

Signed-off-by: David Edmondson <david.edmondson@oracle.com>
Message-Id: <20200224101310.101169-3-david.edmondson@oracle.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/curl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index XXXXXXX..XXXXXXX 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -XXX,XX +XXX,XX @@ static size_t curl_header_cb(void *ptr, size_t size, size_t nmemb, void *opaque)
     size_t realsize = size * nmemb;
     const char *header = (char *)ptr;
     const char *end = header + realsize;
-    const char *accept_ranges = "Accept-Ranges:";
+    const char *accept_ranges = "accept-ranges:";
     const char *bytes = "bytes";
 
     if (realsize >= strlen(accept_ranges)
-        && strncmp(header, accept_ranges, strlen(accept_ranges)) == 0) {
+        && g_ascii_strncasecmp(header, accept_ranges,
+                               strlen(accept_ranges)) == 0) {
 
         char *p = strchr(header, ':') + 1;
 
-- 
2.24.1

From: Eric Blake <eblake@redhat.com>

Tests 261 and 272 fail on RHEL 7 with coreutils 8.22, since od
--endian was not added until coreutils 8.23.  Fix this by manually
constructing the final value one byte at a time.
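
The byte-at-a-time assembly is the same arithmetic in any language. The
fix itself is shell code; the following C sketch, with hypothetical
helper names assemble_le() and assemble_be(), only illustrates how the
two byte orders differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian: the first byte is the least significant one. */
static uint64_t assemble_le(const uint8_t *buf, size_t n)
{
    uint64_t val = 0;
    size_t i;

    assert(buf != NULL && n <= 8);
    for (i = 0; i < n; i++) {
        val |= (uint64_t)buf[i] << (8 * i);
    }
    return val;
}

/* Big-endian: each new byte shifts the accumulated value up by 8 bits. */
static uint64_t assemble_be(const uint8_t *buf, size_t n)
{
    uint64_t val = 0;
    size_t i;

    assert(buf != NULL && n <= 8);
    for (i = 0; i < n; i++) {
        val = (val << 8) | buf[i];
    }
    return val;
}
```

For the two bytes 0xfe 0xff, the little-endian result is 0xfffe (65534)
and the big-endian result is 0xfeff (65279).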

Fixes: fc8ba423
Reported-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200226125424.481840-1-eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/common.rc | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -XXX,XX +XXX,XX @@ poke_file()
 # peek_file_le 'test.img' 512 2 => 65534
 peek_file_le()
 {
-    # Wrap in echo $() to strip spaces
-    echo $(od -j"$2" -N"$3" --endian=little -An -vtu"$3" "$1")
+    local val=0 shift=0 byte
+
+    # coreutils' od --endian is not portable, so manually assemble bytes.
+    for byte in $(od -j"$2" -N"$3" -An -v -tu1 "$1"); do
+        val=$(( val | (byte << shift) ))
+        shift=$((shift + 8))
+    done
+    printf %llu $val
 }
 
 # peek_file_be 'test.img' 512 2 => 65279
 peek_file_be()
 {
-    # Wrap in echo $() to strip spaces
-    echo $(od -j"$2" -N"$3" --endian=big -An -vtu"$3" "$1")
+    local val=0 byte
+
+    # coreutils' od --endian is not portable, so manually assemble bytes.
+    for byte in $(od -j"$2" -N"$3" -An -v -tu1 "$1"); do
+        val=$(( (val << 8) | byte ))
+    done
+    printf %llu $val
 }
 
-# peek_file_raw 'test.img' 512 2 => '\xff\xfe'
+# peek_file_raw 'test.img' 512 2 => '\xff\xfe'. Do not use if the raw data
+# is likely to contain \0 or trailing \n.
 peek_file_raw()
 {
     dd if="$1" bs=1 skip="$2" count="$3" status=none
-- 
2.24.1

From: Pan Nengyuan <pannengyuan@huawei.com>

'crypto_opts' is never freed in qcow2_close(). This patch fixes the leak shown in the stack trace below:

Direct leak of 24 byte(s) in 1 object(s) allocated from:
    #0 0x7f0edd81f970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970)
    #1 0x7f0edc6d149d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d)
    #2 0x55d7eaede63d in qobject_input_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qobject-input-visitor.c:295
    #3 0x55d7eaed78b8 in visit_start_struct /mnt/sdb/qemu-new/qemu_test/qemu/qapi/qapi-visit-core.c:49
    #4 0x55d7eaf5140b in visit_type_QCryptoBlockOpenOptions qapi/qapi-visit-crypto.c:290
    #5 0x55d7eae43af3 in block_crypto_open_opts_init /mnt/sdb/qemu-new/qemu_test/qemu/block/crypto.c:163
    #6 0x55d7eacd2924 in qcow2_update_options_prepare /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1148
    #7 0x55d7eacd33f7 in qcow2_update_options /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1232
    #8 0x55d7eacd9680 in qcow2_do_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1512
    #9 0x55d7eacdc55e in qcow2_open_entry /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1792
    #10 0x55d7eacdc8fe in qcow2_open /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:1819
    #11 0x55d7eac3742d in bdrv_open_driver /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1317
    #12 0x55d7eac3e990 in bdrv_open_common /mnt/sdb/qemu-new/qemu_test/qemu/block.c:1575
    #13 0x55d7eac4442c in bdrv_open_inherit /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3126
    #14 0x55d7eac45c3f in bdrv_open /mnt/sdb/qemu-new/qemu_test/qemu/block.c:3219
    #15 0x55d7ead8e8a4 in blk_new_open /mnt/sdb/qemu-new/qemu_test/qemu/block/block-backend.c:397
    #16 0x55d7eacde74c in qcow2_co_create /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3534
    #17 0x55d7eacdfa6d in qcow2_co_create_opts /mnt/sdb/qemu-new/qemu_test/qemu/block/qcow2.c:3668
    #18 0x55d7eac1c678 in bdrv_create_co_entry /mnt/sdb/qemu-new/qemu_test/qemu/block.c:485
    #19 0x55d7eb0024d2 in coroutine_trampoline /mnt/sdb/qemu-new/qemu_test/qemu/util/coroutine-ucontext.c:115

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200227012950.12256-2-pannengyuan@huawei.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -XXX,XX +XXX,XX @@ static void qcow2_close(BlockDriverState *bs)
 
     qcrypto_block_free(s->crypto);
     s->crypto = NULL;
+    qapi_free_QCryptoBlockOpenOptions(s->crypto_opts);
 
     g_free(s->unknown_header_fields);
     cleanup_unknown_header_ext(bs);
-- 
2.24.1

From: Pan Nengyuan <pannengyuan@huawei.com>

collect_image_check() is called twice in img_check(). On the second call, filename/format are allocated again without freeing the original memory.
This is not a big deal since the process exits anyway, but freeing the old data is cleaner and silences the warning reported by ASan.
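
The free-before-refill pattern used by the fix can be sketched outside of
QEMU. This is only an illustration under assumptions: Report, dup_string,
report_collect and report_recollect are hypothetical stand-ins (for
ImageCheck and collect_image_check()), and plain malloc/free replace the
glib and QAPI helpers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ImageCheck: owns two heap-allocated strings. */
typedef struct Report {
    char *filename;
    char *format;
} Report;

/* Duplicate a string with malloc(); returns NULL on allocation failure. */
static inline char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);

    if (copy) {
        memcpy(copy, s, n);
    }
    return copy;
}

/* Free the report and everything it owns; NULL is accepted. */
static inline void report_free(Report *r)
{
    if (r) {
        free(r->filename);
        free(r->format);
        free(r);
    }
}

/* Fill the report. The fields are overwritten without freeing old values. */
static inline void report_collect(Report *r, const char *filename,
                                  const char *format)
{
    r->filename = dup_string(filename);
    r->format = dup_string(format);
}

/* Second collection pass: release the old report before refilling. */
static inline Report *report_recollect(Report *old, const char *filename,
                                       const char *format)
{
    Report *fresh;

    report_free(old);
    fresh = calloc(1, sizeof(*fresh));
    if (fresh) {
        report_collect(fresh, filename, format);
    }
    return fresh;
}
```

Calling report_collect() twice on the same object would leak the first pair
of strings; report_recollect() avoids that by starting from a fresh object.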

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200227012950.12256-3-pannengyuan@huawei.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 qemu-img.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index XXXXXXX..XXXXXXX 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -XXX,XX +XXX,XX @@ static int img_check(int argc, char **argv)
                     check->corruptions_fixed);
         }
 
+        qapi_free_ImageCheck(check);
+        check = g_new0(ImageCheck, 1);
         ret = collect_image_check(bs, check, filename, fmt, 0);
 
         check->leaks_fixed          = leaks_fixed;
-- 
2.24.1

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

On success path we return what inflate() returns instead of 0. And it
most probably works for Z_STREAM_END as it is positive, but is
definitely broken for Z_BUF_ERROR.

While at it, switch to an errno return code, to be closer to the
qcow2_compress API (and usual expectations).

Invert the condition in the if to make it positive. Drop the dead
initialization of ret.
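
The return-code mapping described above can be sketched in isolation. This
is an illustration, not the patched function: the helper name
decompress_status is hypothetical, and the zlib return codes are redefined
locally (with the values zlib.h assigns them) so that the snippet builds
without zlib.

```c
#include <errno.h>
#include <stddef.h>

/* Local copies of zlib return codes, so the sketch needs no zlib.h. */
enum {
    SKETCH_Z_STREAM_END = 1,
    SKETCH_Z_BUF_ERROR  = -5,
};

/*
 * Hypothetical helper: translate the result of inflate(&strm, Z_FINISH)
 * and the remaining output space into 0 on success or -EIO on failure.
 * Z_BUF_ERROR is accepted because the source buffer may be only partly
 * consumed, as long as the destination buffer was filled completely.
 */
static inline int decompress_status(int zret, size_t avail_out)
{
    if ((zret == SKETCH_Z_STREAM_END || zret == SKETCH_Z_BUF_ERROR) &&
        avail_out == 0) {
        return 0;
    }
    return -EIO;
}
```

Before the fix, the success path handed the raw inflate() value back to the
caller, so a negative Z_BUF_ERROR looked like a failure even though the
output buffer was complete.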

Cc: qemu-stable@nongnu.org # v4.0
Fixes: 341926ab83e2b
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200302150930.16218-1-vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-threads.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -XXX,XX +XXX,XX @@ static ssize_t qcow2_compress(void *dest, size_t dest_size,
  * @src - source buffer, @src_size bytes
  *
  * Returns: 0 on success
- *          -1 on fail
+ *          -EIO on fail
  */
 static ssize_t qcow2_decompress(void *dest, size_t dest_size,
                                 const void *src, size_t src_size)
 {
-    int ret = 0;
+    int ret;
     z_stream strm;
 
     memset(&strm, 0, sizeof(strm));
@@ -XXX,XX +XXX,XX @@ static ssize_t qcow2_decompress(void *dest, size_t dest_size,
 
     ret = inflateInit2(&strm, -12);
     if (ret != Z_OK) {
-        return -1;
+        return -EIO;
     }
 
     ret = inflate(&strm, Z_FINISH);
-    if ((ret != Z_STREAM_END && ret != Z_BUF_ERROR) || strm.avail_out != 0) {
+    if ((ret == Z_STREAM_END || ret == Z_BUF_ERROR) && strm.avail_out == 0) {
         /*
          * We approve Z_BUF_ERROR because we need @dest buffer to be filled, but
          * @src buffer may be processed partly (because in qcow2 we know size of
          * compressed data with precision of one sector)
          */
-        ret = -1;
+        ret = 0;
+    } else {
+        ret = -EIO;
     }
 
     inflateEnd(&strm);
-- 
2.24.1

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Move the progress counters into a separate ProgressMeter structure, so
that it can be passed to the block-copy object in the next commit.
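
A minimal usage sketch of such a meter follows. The struct and the two
update helpers mirror what this patch adds; progress_percent is a
hypothetical helper added here for illustration only (it resembles the
calculation in qemu-img's run_block_job, but uses double).

```c
#include <stdint.h>

/* Same fields as the ProgressMeter struct added by this patch. */
typedef struct ProgressMeter {
    uint64_t current;   /* work done so far, in arbitrary units */
    uint64_t total;     /* estimated value of current at completion */
} ProgressMeter;

static inline void progress_work_done(ProgressMeter *pm, uint64_t done)
{
    pm->current += done;
}

static inline void progress_set_remaining(ProgressMeter *pm,
                                          uint64_t remaining)
{
    pm->total = pm->current + remaining;
}

/* Hypothetical helper (not part of the patch): percentage of work done. */
static inline double progress_percent(const ProgressMeter *pm)
{
    if (pm->total == 0) {
        return 0.0;   /* nothing estimated yet, avoid dividing by zero */
    }
    return (double)pm->current / (double)pm->total * 100.0;
}
```

Because the meter is a plain struct, a job and a helper object can both
hold a pointer to the same instance and update it independently.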

Cc: qemu-stable@nongnu.org
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-2-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 blockjob.c                    | 16 +++++-----
 include/qemu/job.h            | 11 ++-----
 include/qemu/progress_meter.h | 58 +++++++++++++++++++++++++++++++++++
 job-qmp.c                     |  4 +--
 job.c                         |  6 ++--
 qemu-img.c                    |  6 ++--
 6 files changed, 76 insertions(+), 25 deletions(-)
 create mode 100644 include/qemu/progress_meter.h

diff --git a/blockjob.c b/blockjob.c
index XXXXXXX..XXXXXXX 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -XXX,XX +XXX,XX @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
     info->device    = g_strdup(job->job.id);
     info->busy      = atomic_read(&job->job.busy);
     info->paused    = job->job.pause_count > 0;
-    info->offset    = job->job.progress_current;
-    info->len       = job->job.progress_total;
+    info->offset    = job->job.progress.current;
+    info->len       = job->job.progress.total;
     info->speed     = job->speed;
     info->io_status = job->iostatus;
     info->ready     = job_is_ready(&job->job),
@@ -XXX,XX +XXX,XX @@ static void block_job_event_cancelled(Notifier *n, void *opaque)
 
     qapi_event_send_block_job_cancelled(job_type(&job->job),
                                         job->job.id,
-                                        job->job.progress_total,
-                                        job->job.progress_current,
+                                        job->job.progress.total,
+                                        job->job.progress.current,
                                         job->speed);
 }
 
@@ -XXX,XX +XXX,XX @@ static void block_job_event_completed(Notifier *n, void *opaque)
 
     qapi_event_send_block_job_completed(job_type(&job->job),
                                         job->job.id,
-                                        job->job.progress_total,
-                                        job->job.progress_current,
+                                        job->job.progress.total,
+                                        job->job.progress.current,
                                         job->speed,
                                         !!msg,
                                         msg);
@@ -XXX,XX +XXX,XX @@ static void block_job_event_ready(Notifier *n, void *opaque)
 
     qapi_event_send_block_job_ready(job_type(&job->job),
                                     job->job.id,
-                                    job->job.progress_total,
-                                    job->job.progress_current,
+                                    job->job.progress.total,
+                                    job->job.progress.current,
                                     job->speed);
 }
 
diff --git a/include/qemu/job.h b/include/qemu/job.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -XXX,XX +XXX,XX @@
 
 #include "qapi/qapi-types-job.h"
 #include "qemu/queue.h"
+#include "qemu/progress_meter.h"
 #include "qemu/coroutine.h"
 #include "block/aio.h"
 
@@ -XXX,XX +XXX,XX @@ typedef struct Job {
     /** True if this job should automatically dismiss itself */
     bool auto_dismiss;
 
-    /**
-     * Current progress. The unit is arbitrary as long as the ratio between
-     * progress_current and progress_total represents the estimated percentage
-     * of work already done.
-     */
-    int64_t progress_current;
-
-    /** Estimated progress_current value at the completion of the job */
-    int64_t progress_total;
+    ProgressMeter progress;
 
     /**
      * Return code from @run and/or @prepare callback(s).
diff --git a/include/qemu/progress_meter.h b/include/qemu/progress_meter.h
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/include/qemu/progress_meter.h
@@ -XXX,XX +XXX,XX @@
+/*
+ * Helper functionality for some process progress tracking.
+ *
+ * Copyright (c) 2011 IBM Corp.
+ * Copyright (c) 2012, 2018 Red Hat, Inc.
+ * Copyright (c) 2020 Virtuozzo International GmbH
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef QEMU_PROGRESS_METER_H
+#define QEMU_PROGRESS_METER_H
+
+typedef struct ProgressMeter {
+    /**
+     * Current progress. The unit is arbitrary as long as the ratio between
+     * current and total represents the estimated percentage
+     * of work already done.
+     */
+    uint64_t current;
+
+    /** Estimated current value at the completion of the process */
+    uint64_t total;
+} ProgressMeter;
+
+static inline void progress_work_done(ProgressMeter *pm, uint64_t done)
+{
+    pm->current += done;
+}
+
+static inline void progress_set_remaining(ProgressMeter *pm, uint64_t remaining)
+{
+    pm->total = pm->current + remaining;
+}
+
+static inline void progress_increase_remaining(ProgressMeter *pm,
+                                               uint64_t delta)
+{
+    pm->total += delta;
+}
+
+#endif /* QEMU_PROGRESS_METER_H */
diff --git a/job-qmp.c b/job-qmp.c
index XXXXXXX..XXXXXXX 100644
--- a/job-qmp.c
+++ b/job-qmp.c
@@ -XXX,XX +XXX,XX @@ static JobInfo *job_query_single(Job *job, Error **errp)
         .id                 = g_strdup(job->id),
         .type               = job_type(job),
         .status             = job->status,
-        .current_progress   = job->progress_current,
-        .total_progress     = job->progress_total,
+        .current_progress   = job->progress.current,
+        .total_progress     = job->progress.total,
         .has_error          = !!job->err,
         .error              = job->err ? \
                               g_strdup(error_get_pretty(job->err)) : NULL,
diff --git a/job.c b/job.c
index XXXXXXX..XXXXXXX 100644
--- a/job.c
+++ b/job.c
@@ -XXX,XX +XXX,XX @@ void job_unref(Job *job)
 
 void job_progress_update(Job *job, uint64_t done)
 {
-    job->progress_current += done;
+    progress_work_done(&job->progress, done);
 }
 
 void job_progress_set_remaining(Job *job, uint64_t remaining)
 {
-    job->progress_total = job->progress_current + remaining;
+    progress_set_remaining(&job->progress, remaining);
 }
 
 void job_progress_increase_remaining(Job *job, uint64_t delta)
 {
-    job->progress_total += delta;
+    progress_increase_remaining(&job->progress, delta);
 }
 
 void job_event_cancelled(Job *job)
diff --git a/qemu-img.c b/qemu-img.c
index XXXXXXX..XXXXXXX 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -XXX,XX +XXX,XX @@ static void run_block_job(BlockJob *job, Error **errp)
     do {
         float progress = 0.0f;
         aio_poll(aio_context, true);
-        if (job->job.progress_total) {
-            progress = (float)job->job.progress_current /
-                       job->job.progress_total * 100.f;
+        if (job->job.progress.total) {
+            progress = (float)job->job.progress.current /
+                       job->job.progress.total * 100.f;
         }
         qemu_progress_print(progress, 0);
     } while (!job_is_ready(&job->job) && !job_is_completed(&job->job));
-- 
2.24.1

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Assume we have two regions, A and B. Region B is in flight now;
region A is not yet touched, but it is unallocated and should be
skipped.

Correspondingly, the progress is

  total = A + B
  current = 0

If we reset the unallocated region A and call progress_reset_callback,
it will calculate 0 bytes dirty in the bitmap and call
job_progress_set_remaining, which will set

  total = current + 0 = 0 + 0 = 0

So, B bytes are actually removed from the total accounting. When the job
finishes we'll have

  total = 0
  current = B

which doesn't sound good.

This happens because we didn't consider in-flight bytes: when
calculating remaining, we should have set (in_flight + dirty_bytes)
as remaining, not only dirty_bytes.

To fix it, let's refactor progress calculation, moving it to block-copy
itself instead of fixing callback. And, of course, track in_flight
bytes count.

We still have to keep one callback, to maintain the backup job's bytes_read
calculation, but it will go away soon, when we turn the whole backup
process into one block_copy call.
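
The accounting problem can be reproduced with a small model. This is a
sketch under assumptions: CopyAccounting and both reset helpers are
hypothetical simplifications, with the dirty bitmap reduced to a byte
counter.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the accounting state. */
typedef struct CopyAccounting {
    uint64_t current;         /* bytes already copied */
    uint64_t total;           /* estimated value of current at completion */
    uint64_t dirty_bytes;     /* bytes still marked dirty in the bitmap */
    uint64_t in_flight_bytes; /* bytes being copied right now */
} CopyAccounting;

/* Old behaviour: remaining work is taken from the dirty bitmap alone. */
static inline void reset_remaining_old(CopyAccounting *a)
{
    a->total = a->current + a->dirty_bytes;
}

/* Fixed behaviour: in-flight bytes still count as remaining work. */
static inline void reset_remaining_fixed(CopyAccounting *a)
{
    a->total = a->current + a->dirty_bytes + a->in_flight_bytes;
}
```

With region B in flight (its bits already cleared from the bitmap) and
region A just reset, dirty_bytes is 0. The old helper then sets total to 0,
while the fixed one keeps total equal to B.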

Cc: qemu-stable@nongnu.org
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <20200311103004.7649-3-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/backup.c             | 13 ++-----------
 block/block-copy.c         | 16 ++++++++++++----
 include/block/block-copy.h | 15 +++++----------
 3 files changed, 19 insertions(+), 25 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index XXXXXXX..XXXXXXX 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -XXX,XX +XXX,XX @@ static void backup_progress_bytes_callback(int64_t bytes, void *opaque)
     BackupBlockJob *s = opaque;
 
     s->bytes_read += bytes;
-    job_progress_update(&s->common.job, bytes);
-}
-
-static void backup_progress_reset_callback(void *opaque)
-{
-    BackupBlockJob *s = opaque;
-    uint64_t estimate = bdrv_get_dirty_count(s->bcs->copy_bitmap);
-
-    job_progress_set_remaining(&s->common.job, estimate);
 }
 
 static int coroutine_fn backup_do_cow(BackupBlockJob *job,
@@ -XXX,XX +XXX,XX @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     job->cluster_size = cluster_size;
     job->len = len;
 
-    block_copy_set_callbacks(bcs, backup_progress_bytes_callback,
-                             backup_progress_reset_callback, job);
+    block_copy_set_progress_callback(bcs, backup_progress_bytes_callback, job);
+    block_copy_set_progress_meter(bcs, &job->common.job.progress);
 
     /* Required permissions are already taken by backup-top target */
     block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
diff --git a/block/block-copy.c b/block/block-copy.c
index XXXXXXX..XXXXXXX 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -XXX,XX +XXX,XX @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
     return s;
 }
 
-void block_copy_set_callbacks(
+void block_copy_set_progress_callback(
         BlockCopyState *s,
         ProgressBytesCallbackFunc progress_bytes_callback,
-        ProgressResetCallbackFunc progress_reset_callback,
         void *progress_opaque)
 {
     s->progress_bytes_callback = progress_bytes_callback;
-    s->progress_reset_callback = progress_reset_callback;
     s->progress_opaque = progress_opaque;
 }
 
+void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm)
+{
+    s->progress = pm;
+}
+
 /*
  * block_copy_do_copy
  *
@@ -XXX,XX +XXX,XX @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
 
     if (!ret) {
         bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
-        s->progress_reset_callback(s->progress_opaque);
+        progress_set_remaining(s->progress,
+                               bdrv_get_dirty_count(s->copy_bitmap) +
+                               s->in_flight_bytes);
     }
 
     *count = bytes;
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
         trace_block_copy_process(s, start);
 
         bdrv_reset_dirty_bitmap(s->copy_bitmap, start, chunk_end - start);
+        s->in_flight_bytes += chunk_end - start;
 
         co_get_from_shres(s->mem, chunk_end - start);
         ret = block_copy_do_copy(s, start, chunk_end, error_is_read);
         co_put_to_shres(s->mem, chunk_end - start);
+        s->in_flight_bytes -= chunk_end - start;
         if (ret < 0) {
             bdrv_set_dirty_bitmap(s->copy_bitmap, start, chunk_end - start);
             break;
         }
 
+        progress_work_done(s->progress, chunk_end - start);
         s->progress_bytes_callback(chunk_end - start, s->progress_opaque);
         start = chunk_end;
         ret = 0;
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -XXX,XX +XXX,XX @@ typedef struct BlockCopyInFlightReq {
 } BlockCopyInFlightReq;
 
 typedef void (*ProgressBytesCallbackFunc)(int64_t bytes, void *opaque);
-typedef void (*ProgressResetCallbackFunc)(void *opaque);
 typedef struct BlockCopyState {
     /*
      * BdrvChild objects are not owned or managed by block-copy. They are
@@ -XXX,XX +XXX,XX @@ typedef struct BlockCopyState {
     BdrvChild *source;
     BdrvChild *target;
     BdrvDirtyBitmap *copy_bitmap;
+    int64_t in_flight_bytes;
     int64_t cluster_size;
     bool use_copy_range;
     int64_t copy_size;
@@ -XXX,XX +XXX,XX @@ typedef struct BlockCopyState {
      */
     bool skip_unallocated;
 
+    ProgressMeter *progress;
     /* progress_bytes_callback: called when some copying progress is done. */
     ProgressBytesCallbackFunc progress_bytes_callback;
-
-    /*
-     * progress_reset_callback: called when some bytes reset from copy_bitmap
-     * (see @skip_unallocated above). The callee is assumed to recalculate how
-     * many bytes remain based on the dirty bit count of copy_bitmap.
-     */
-    ProgressResetCallbackFunc progress_reset_callback;
     void *progress_opaque;
 
     SharedResource *mem;
@@ -XXX,XX +XXX,XX @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
                                      BdrvRequestFlags write_flags,
                                      Error **errp);
 
-void block_copy_set_callbacks(
+void block_copy_set_progress_callback(
         BlockCopyState *s,
         ProgressBytesCallbackFunc progress_bytes_callback,
-        ProgressResetCallbackFunc progress_reset_callback,
         void *progress_opaque);
 
+void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
+
 void block_copy_state_free(BlockCopyState *s);
 
 int64_t block_copy_reset_unallocated(BlockCopyState *s,
-- 
2.24.1

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

In block_copy_do_copy we fall back to read+write if copy_range fails.
In this case copy_size is larger than the limit defined for buffered IO,
and there is a corresponding comment. Still, backup copies data cluster by
cluster, and most requests are limited to one cluster anyway, so the
only source of this one badly limited request is the copy-before-write
operation.

A later patch will make backup use block_copy directly; then, for
cases where copy_range is not supported, the first request of each
backup will be oversized. That's not good, so let's change it now.

The fix is simple: limit the first copy_range request like a buffer-based
request. If it succeeds, set the larger copy_range limit.
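
The resulting size policy can be sketched as a small state machine. This is
a sketch under assumptions, not the QEMU code: CopyLimits and its helpers
are hypothetical, and the two SKETCH_* limits are illustrative values rather
than the definitions used by block-copy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative limits; assumed values, not taken from the patch. */
#define SKETCH_MAX_BUFFER      (1024 * 1024)       /* buffered I/O limit */
#define SKETCH_MAX_COPY_RANGE  (16 * 1024 * 1024)  /* copy_range limit */

typedef struct CopyLimits {
    int64_t cluster_size;
    int64_t max_transfer;   /* assumed to be already clamped to non-zero */
    bool use_copy_range;
    int64_t copy_size;
} CopyLimits;

static inline int64_t max64(int64_t a, int64_t b) { return a > b ? a : b; }
static inline int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }

/* Start small: even with copy_range enabled, use the buffered limit. */
static inline void limits_init(CopyLimits *l, int64_t cluster_size,
                               int64_t max_transfer)
{
    l->cluster_size = cluster_size;
    l->max_transfer = max_transfer;
    l->use_copy_range = max_transfer >= cluster_size;
    l->copy_size = l->use_copy_range
                   ? max64(cluster_size, SKETCH_MAX_BUFFER)
                   : cluster_size;
}

/* Update the limits after a copy_range attempt finished. */
static inline void limits_after_copy_range(CopyLimits *l, bool ok)
{
    if (!ok) {
        /* Fall back to buffered read+write with the small limit. */
        l->use_copy_range = false;
        l->copy_size = max64(l->cluster_size, SKETCH_MAX_BUFFER);
    } else if (l->use_copy_range) {
        /* Success: raise the limit, rounded down to whole clusters. */
        int64_t aligned = l->max_transfer / l->cluster_size * l->cluster_size;

        l->copy_size = min64(max64(l->cluster_size, SKETCH_MAX_COPY_RANGE),
                             aligned);
    }
}
```

The point of the design is that an unsupported copy_range costs at most one
request of the small size, instead of one oversized buffered request.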

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-4-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/block-copy.c | 41 +++++++++++++++++++++++++++++++----------
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index XXXXXXX..XXXXXXX 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -XXX,XX +XXX,XX @@ void block_copy_state_free(BlockCopyState *s)
     g_free(s);
 }
 
+static uint32_t block_copy_max_transfer(BdrvChild *source, BdrvChild *target)
+{
+    return MIN_NON_ZERO(INT_MAX,
+                        MIN_NON_ZERO(source->bs->bl.max_transfer,
+                                     target->bs->bl.max_transfer));
+}
+
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
                                      int64_t cluster_size,
                                      BdrvRequestFlags write_flags, Error **errp)
 {
     BlockCopyState *s;
     BdrvDirtyBitmap *copy_bitmap;
-    uint32_t max_transfer =
-            MIN_NON_ZERO(INT_MAX,
-                         MIN_NON_ZERO(source->bs->bl.max_transfer,
-                                      target->bs->bl.max_transfer));
 
     copy_bitmap = bdrv_create_dirty_bitmap(source->bs, cluster_size, NULL,
                                            errp);
@@ -XXX,XX +XXX,XX @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
         .mem = shres_create(BLOCK_COPY_MAX_MEM),
     };
 
-    if (max_transfer < cluster_size) {
+    if (block_copy_max_transfer(source, target) < cluster_size) {
         /*
          * copy_range does not respect max_transfer. We don't want to bother
          * with requests smaller than block-copy cluster size, so fallback to
@@ -XXX,XX +XXX,XX @@ BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
         s->copy_size = cluster_size;
     } else {
         /*
-         * copy_range does not respect max_transfer (it's a TODO), so we factor
-         * that in here.
+         * We enable copy-range, but keep small copy_size, until first
+         * successful copy_range (look at block_copy_do_copy).
          */
         s->use_copy_range = true;
-        s->copy_size = MIN(MAX(cluster_size, BLOCK_COPY_MAX_COPY_RANGE),
-                           QEMU_ALIGN_DOWN(max_transfer, cluster_size));
+        s->copy_size = MAX(s->cluster_size, BLOCK_COPY_MAX_BUFFER);
     }
 
     QLIST_INIT(&s->inflight_reqs);
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
             s->copy_size = MAX(s->cluster_size, BLOCK_COPY_MAX_BUFFER);
             /* Fallback to read+write with allocated buffer */
         } else {
+            if (s->use_copy_range) {
+                /*
+                 * Successful copy-range. Now increase copy_size.  copy_range
+                 * does not respect max_transfer (it's a TODO), so we factor
+                 * that in here.
+                 *
+                 * Note: we double-check s->use_copy_range for the case when
+                 * parallel block-copy request unsets it during previous
+                 * bdrv_co_copy_range call.
+                 */
+                s->copy_size =
+                        MIN(MAX(s->cluster_size, BLOCK_COPY_MAX_COPY_RANGE),
+                            QEMU_ALIGN_DOWN(block_copy_max_transfer(s->source,
+                                                                    s->target),
+                                            s->cluster_size));
+            }
             goto out;
         }
     }
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
     /*
      * In case of failed copy_range request above, we may proceed with buffered
      * request larger than BLOCK_COPY_MAX_BUFFER. Still, further requests will
-     * be properly limited, so don't care too much.
+     * be properly limited, so don't care too much. Moreover the most likely
+     * case (copy_range is unsupported for the configuration, so the very first
+     * copy_range request fails) is handled by setting large copy_size only
+     * after first successful copy_range.
      */
 
     bounce_buffer = qemu_blockalign(s->source->bs, nbytes);
-- 
2.24.1

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Use bdrv_block_status_above to choose an effective chunk size and to handle
zeroes efficiently.

This replaces the plain is-allocated check and drops the old code path
for it. Assistance by the backup job is dropped too, as caching
block-status information is more difficult than just caching
is-allocated information in our dirty bitmap, and the backup job is not
a good place for this caching anyway.
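
How the chunk size is derived from a block-status answer can be sketched
with plain integers. status_chunk_size is a hypothetical stand-alone helper;
it covers only the size rounding and leaves out the status flags that the
real function also returns.

```c
#include <stdint.h>

/*
 * Hypothetical helper: given the return value of a block-status query and
 * the number of bytes (num) it reported for the range starting at offset,
 * return how many bytes to handle as one chunk. len is the image length.
 */
static inline int64_t status_chunk_size(int status_ret, int64_t offset,
                                        int64_t num, int64_t len,
                                        int64_t cluster_size)
{
    if (status_ret < 0 || num < cluster_size) {
        /* Error or too-short answer: fall back to copying one cluster. */
        return cluster_size;
    }
    if (offset + num == len) {
        /* The chunk reaches the end of the image: round up over the tail. */
        return (num + cluster_size - 1) / cluster_size * cluster_size;
    }
    /* Otherwise only whole clusters are known to share the status. */
    return num / cluster_size * cluster_size;
}
```

Rounding down in the general case is the conservative choice: a partially
covered cluster may have a different status, so it is left for the next
query.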

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-5-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/block-copy.c | 73 +++++++++++++++++++++++++++++++++++++---------
 block/trace-events |  1 +
 2 files changed, 61 insertions(+), 13 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index XXXXXXX..XXXXXXX 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -XXX,XX +XXX,XX @@ void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm)
  */
 static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
                                            int64_t start, int64_t end,
-                                           bool *error_is_read)
+                                           bool zeroes, bool *error_is_read)
 {
     int ret;
     int nbytes = MIN(end, s->len) - start;
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
     assert(QEMU_IS_ALIGNED(end, s->cluster_size));
     assert(end < s->len || end == QEMU_ALIGN_UP(s->len, s->cluster_size));
 
+    if (zeroes) {
+        ret = bdrv_co_pwrite_zeroes(s->target, start, nbytes, s->write_flags &
+                                    ~BDRV_REQ_WRITE_COMPRESSED);
+        if (ret < 0) {
+            trace_block_copy_write_zeroes_fail(s, start, ret);
+            if (error_is_read) {
+                *error_is_read = false;
+            }
+        }
+        return ret;
+    }
+
     if (s->use_copy_range) {
         ret = bdrv_co_copy_range(s->source, start, s->target, start, nbytes,
                                  0, s->write_flags);
@@ -XXX,XX +XXX,XX @@ out:
     return ret;
 }
 
+static int block_copy_block_status(BlockCopyState *s, int64_t offset,
+                                   int64_t bytes, int64_t *pnum)
+{
+    int64_t num;
+    BlockDriverState *base;
+    int ret;
+
+    if (s->skip_unallocated && s->source->bs->backing) {
+        base = s->source->bs->backing->bs;
+    } else {
+        base = NULL;
+    }
+
+    ret = bdrv_block_status_above(s->source->bs, base, offset, bytes, &num,
+                                  NULL, NULL);
+    if (ret < 0 || num < s->cluster_size) {
+        /*
+         * On error or if failed to obtain large enough chunk just fallback to
+         * copy one cluster.
+         */
+        num = s->cluster_size;
+        ret = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_DATA;
+    } else if (offset + num == s->len) {
+        num = QEMU_ALIGN_UP(num, s->cluster_size);
+    } else {
+        num = QEMU_ALIGN_DOWN(num, s->cluster_size);
+    }
+
+    *pnum = num;
+    return ret;
+}
+
 /*
  * Check if the cluster starting at offset is allocated or not.
  * return via pnum the number of contiguous clusters sharing this allocation.
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
 {
     int ret = 0;
     int64_t end = bytes + start; /* bytes */
-    int64_t status_bytes;
     BlockCopyInFlightReq req;
 
     /*
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
     block_copy_inflight_req_begin(s, &req, start, end);
 
     while (start < end) {
-        int64_t next_zero, chunk_end;
+        int64_t next_zero, chunk_end, status_bytes;
 
         if (!bdrv_dirty_bitmap_get(s->copy_bitmap, start)) {
             trace_block_copy_skip(s, start);
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
             chunk_end = next_zero;
         }
 
-        if (s->skip_unallocated) {
-            ret = block_copy_reset_unallocated(s, start, &status_bytes);
-            if (ret == 0) {
-                trace_block_copy_skip_range(s, start, status_bytes);
-                start += status_bytes;
-                continue;
-            }
-            /* Clamp to known allocated region */
-            chunk_end = MIN(chunk_end, start + status_bytes);
+        ret = block_copy_block_status(s, start, chunk_end - start,
+                                      &status_bytes);
+        if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
+            bdrv_reset_dirty_bitmap(s->copy_bitmap, start, status_bytes);
+            progress_set_remaining(s->progress,
+                                   bdrv_get_dirty_count(s->copy_bitmap) +
+                                   s->in_flight_bytes);
+            trace_block_copy_skip_range(s, start, status_bytes);
+            start += status_bytes;
+            continue;
         }
 
+        chunk_end = MIN(chunk_end, start + status_bytes);
+
         trace_block_copy_process(s, start);
 
         bdrv_reset_dirty_bitmap(s->copy_bitmap, start, chunk_end - start);
         s->in_flight_bytes += chunk_end - start;
 
         co_get_from_shres(s->mem, chunk_end - start);
-        ret = block_copy_do_copy(s, start, chunk_end, error_is_read);
+        ret = block_copy_do_copy(s, start, chunk_end, ret & BDRV_BLOCK_ZERO,
+                                 error_is_read);
         co_put_to_shres(s->mem, chunk_end - start);
         s->in_flight_bytes -= chunk_end - start;
         if (ret < 0) {
diff --git a/block/trace-events b/block/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -XXX,XX +XXX,XX @@ block_copy_process(void *bcs, int64_t start) "bcs %p start %"PRId64
 block_copy_copy_range_fail(void *bcs, int64_t start, int ret) "bcs %p start %"PRId64" ret %d"
 block_copy_read_fail(void *bcs, int64_t start, int ret) "bcs %p start %"PRId64" ret %d"
 block_copy_write_fail(void *bcs, int64_t start, int ret) "bcs %p start %"PRId64" ret %d"
+block_copy_write_zeroes_fail(void *bcs, int64_t start, int ret) "bcs %p start %"PRId64" ret %d"
 
 # ../blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-- 
2.24.1

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Split find_conflicting_inflight_req to be used separately.

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

There are many "chunk_end - start" computations; let's switch to a
bytes/cur_bytes scheme instead.

While here, tighten the checks on the block_copy_do_copy parameters so
that calculating nbytes cannot overflow, and use int64_t for bytes in
block_copy for consistency.
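
The overflow concern can be illustrated in isolation. The following is a
minimal sketch, not the QEMU code: range_is_valid and clamp_to_len are
hypothetical helper names, used only to show why the check is written as
"INT64_MAX - start >= bytes" instead of computing "start + bytes" first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): true when [start, start + bytes)
 * is a non-empty range whose end still fits in int64_t. Because start >= 0
 * is tested first, the subtraction itself can never overflow.
 */
static bool range_is_valid(int64_t start, int64_t bytes)
{
    return start >= 0 && bytes > 0 && INT64_MAX - start >= bytes;
}

/*
 * Hypothetical helper: how many bytes of the range lie below len. This
 * mirrors the MIN(start + bytes, len) - start computation in the patch.
 */
static int64_t clamp_to_len(int64_t start, int64_t bytes, int64_t len)
{
    int64_t end;

    assert(range_is_valid(start, bytes) && start < len);
    end = start + bytes; /* safe only because the range was validated */
    return (end < len ? end : len) - start;
}
```

With a 64 KiB cluster and a 100000-byte image, the second cluster is
requested in full but only its first 34464 bytes are actually copied.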

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-7-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/block-copy.c         | 78 ++++++++++++++++++++------------------
 include/block/block-copy.h |  6 +--
 2 files changed, 44 insertions(+), 40 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index XXXXXXX..XXXXXXX 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -XXX,XX +XXX,XX @@
 
 static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
                                                            int64_t start,
-                                                           int64_t end)
+                                                           int64_t bytes)
 {
     BlockCopyInFlightReq *req;
 
     QLIST_FOREACH(req, &s->inflight_reqs, list) {
-        if (end > req->start_byte && start < req->end_byte) {
+        if (start + bytes > req->start && start < req->start + req->bytes) {
             return req;
         }
     }
@@ -XXX,XX +XXX,XX @@ static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
 
 static void coroutine_fn block_copy_wait_inflight_reqs(BlockCopyState *s,
                                                        int64_t start,
-                                                       int64_t end)
+                                                       int64_t bytes)
 {
     BlockCopyInFlightReq *req;
 
-    while ((req = find_conflicting_inflight_req(s, start, end))) {
+    while ((req = find_conflicting_inflight_req(s, start, bytes))) {
         qemu_co_queue_wait(&req->wait_queue, NULL);
     }
 }
 
 static void block_copy_inflight_req_begin(BlockCopyState *s,
                                           BlockCopyInFlightReq *req,
-                                          int64_t start, int64_t end)
+                                          int64_t start, int64_t bytes)
 {
-    req->start_byte = start;
-    req->end_byte = end;
+    req->start = start;
+    req->bytes = bytes;
     qemu_co_queue_init(&req->wait_queue);
     QLIST_INSERT_HEAD(&s->inflight_reqs, req, list);
 }
@@ -XXX,XX +XXX,XX @@ void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm)
 /*
  * block_copy_do_copy
  *
- * Do copy of cluser-aligned chunk. @end is allowed to exceed s->len only to
- * cover last cluster when s->len is not aligned to clusters.
+ * Do copy of cluster-aligned chunk. Requested region is allowed to exceed
+ * s->len only to cover last cluster when s->len is not aligned to clusters.
  *
  * No sync here: nor bitmap neighter intersecting requests handling, only copy.
  *
  * Returns 0 on success.
  */
 static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
-                                           int64_t start, int64_t end,
+                                           int64_t start, int64_t bytes,
                                            bool zeroes, bool *error_is_read)
 {
     int ret;
-    int nbytes = MIN(end, s->len) - start;
+    int64_t nbytes = MIN(start + bytes, s->len) - start;
     void *bounce_buffer = NULL;
 
+    assert(start >= 0 && bytes > 0 && INT64_MAX - start >= bytes);
     assert(QEMU_IS_ALIGNED(start, s->cluster_size));
-    assert(QEMU_IS_ALIGNED(end, s->cluster_size));
-    assert(end < s->len || end == QEMU_ALIGN_UP(s->len, s->cluster_size));
+    assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
+    assert(start < s->len);
+    assert(start + bytes <= s->len ||
+           start + bytes == QEMU_ALIGN_UP(s->len, s->cluster_size));
+    assert(nbytes < INT_MAX);
 
     if (zeroes) {
         ret = bdrv_co_pwrite_zeroes(s->target, start, nbytes, s->write_flags &
@@ -XXX,XX +XXX,XX @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
 }
 
 int coroutine_fn block_copy(BlockCopyState *s,
-                            int64_t start, uint64_t bytes,
+                            int64_t start, int64_t bytes,
                             bool *error_is_read)
 {
     int ret = 0;
-    int64_t end = bytes + start; /* bytes */
     BlockCopyInFlightReq req;
 
     /*
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
            bdrv_get_aio_context(s->target->bs));
 
     assert(QEMU_IS_ALIGNED(start, s->cluster_size));
-    assert(QEMU_IS_ALIGNED(end, s->cluster_size));
+    assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
 
     block_copy_wait_inflight_reqs(s, start, bytes);
-    block_copy_inflight_req_begin(s, &req, start, end);
+    block_copy_inflight_req_begin(s, &req, start, bytes);
 
-    while (start < end) {
-        int64_t next_zero, chunk_end, status_bytes;
+    while (bytes) {
+        int64_t next_zero, cur_bytes, status_bytes;
 
         if (!bdrv_dirty_bitmap_get(s->copy_bitmap, start)) {
             trace_block_copy_skip(s, start);
             start += s->cluster_size;
+            bytes -= s->cluster_size;
             continue; /* already copied */
         }
 
-        chunk_end = MIN(end, start + s->copy_size);
+        cur_bytes = MIN(bytes, s->copy_size);
 
         next_zero = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, start,
-                                                chunk_end - start);
+                                                cur_bytes);
         if (next_zero >= 0) {
             assert(next_zero > start); /* start is dirty */
-            assert(next_zero < chunk_end); /* no need to do MIN() */
-            chunk_end = next_zero;
+            assert(next_zero < start + cur_bytes); /* no need to do MIN() */
+            cur_bytes = next_zero - start;
         }
 
-        ret = block_copy_block_status(s, start, chunk_end - start,
-                                      &status_bytes);
+        ret = block_copy_block_status(s, start, cur_bytes, &status_bytes);
         if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
             bdrv_reset_dirty_bitmap(s->copy_bitmap, start, status_bytes);
             progress_set_remaining(s->progress,
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
                                    s->in_flight_bytes);
             trace_block_copy_skip_range(s, start, status_bytes);
             start += status_bytes;
+            bytes -= status_bytes;
             continue;
         }
 
-        chunk_end = MIN(chunk_end, start + status_bytes);
+        cur_bytes = MIN(cur_bytes, status_bytes);
 
         trace_block_copy_process(s, start);
 
-        bdrv_reset_dirty_bitmap(s->copy_bitmap, start, chunk_end - start);
-        s->in_flight_bytes += chunk_end - start;
+        bdrv_reset_dirty_bitmap(s->copy_bitmap, start, cur_bytes);
+        s->in_flight_bytes += cur_bytes;
 
-        co_get_from_shres(s->mem, chunk_end - start);
-        ret = block_copy_do_copy(s, start, chunk_end, ret & BDRV_BLOCK_ZERO,
+        co_get_from_shres(s->mem, cur_bytes);
+        ret = block_copy_do_copy(s, start, cur_bytes, ret & BDRV_BLOCK_ZERO,
                                  error_is_read);
-        co_put_to_shres(s->mem, chunk_end - start);
-        s->in_flight_bytes -= chunk_end - start;
+        co_put_to_shres(s->mem, cur_bytes);
+        s->in_flight_bytes -= cur_bytes;
         if (ret < 0) {
-            bdrv_set_dirty_bitmap(s->copy_bitmap, start, chunk_end - start);
+            bdrv_set_dirty_bitmap(s->copy_bitmap, start, cur_bytes);
             break;
         }
 
-        progress_work_done(s->progress, chunk_end - start);
-        s->progress_bytes_callback(chunk_end - start, s->progress_opaque);
-        start = chunk_end;
-        ret = 0;
+        progress_work_done(s->progress, cur_bytes);
+        s->progress_bytes_callback(cur_bytes, s->progress_opaque);
+        start += cur_bytes;
+        bytes -= cur_bytes;
     }
 
     block_copy_inflight_req_end(&req);
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -XXX,XX +XXX,XX @@
 #include "qemu/co-shared-resource.h"
 
 typedef struct BlockCopyInFlightReq {
-    int64_t start_byte;
-    int64_t end_byte;
+    int64_t start;
+    int64_t bytes;
     QLIST_ENTRY(BlockCopyInFlightReq) list;
     CoQueue wait_queue; /* coroutines blocked on this request */
 } BlockCopyInFlightReq;
@@ -XXX,XX +XXX,XX @@ void block_copy_state_free(BlockCopyState *s);
 int64_t block_copy_reset_unallocated(BlockCopyState *s,
                                      int64_t offset, int64_t *count);
 
-int coroutine_fn block_copy(BlockCopyState *s, int64_t start, uint64_t bytes,
+int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes,
                             bool *error_is_read);
 
 #endif /* BLOCK_COPY_H */
-- 
2.24.1

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

The offset/bytes pair is the more usual naming in the block layer, so
let's use it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-8-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/block-copy.c         | 82 +++++++++++++++++++-------------------
 include/block/block-copy.h |  4 +-
 2 files changed, 43 insertions(+), 43 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index XXXXXXX..XXXXXXX 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -XXX,XX +XXX,XX @@
 #define BLOCK_COPY_MAX_MEM (128 * MiB)
 
 static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
-                                                           int64_t start,
+                                                           int64_t offset,
                                                            int64_t bytes)
 {
     BlockCopyInFlightReq *req;
 
     QLIST_FOREACH(req, &s->inflight_reqs, list) {
-        if (start + bytes > req->start && start < req->start + req->bytes) {
+        if (offset + bytes > req->offset && offset < req->offset + req->bytes) {
             return req;
         }
     }
@@ -XXX,XX +XXX,XX @@ static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
 }
 
 static void coroutine_fn block_copy_wait_inflight_reqs(BlockCopyState *s,
-                                                       int64_t start,
+                                                       int64_t offset,
                                                        int64_t bytes)
 {
     BlockCopyInFlightReq *req;
 
-    while ((req = find_conflicting_inflight_req(s, start, bytes))) {
+    while ((req = find_conflicting_inflight_req(s, offset, bytes))) {
         qemu_co_queue_wait(&req->wait_queue, NULL);
     }
 }
 
 static void block_copy_inflight_req_begin(BlockCopyState *s,
                                           BlockCopyInFlightReq *req,
-                                          int64_t start, int64_t bytes)
+                                          int64_t offset, int64_t bytes)
 {
-    req->start = start;
+    req->offset = offset;
     req->bytes = bytes;
     qemu_co_queue_init(&req->wait_queue);
     QLIST_INSERT_HEAD(&s->inflight_reqs, req, list);
@@ -XXX,XX +XXX,XX @@ void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm)
  * Returns 0 on success.
  */
 static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
-                                           int64_t start, int64_t bytes,
+                                           int64_t offset, int64_t bytes,
                                            bool zeroes, bool *error_is_read)
 {
     int ret;
-    int64_t nbytes = MIN(start + bytes, s->len) - start;
+    int64_t nbytes = MIN(offset + bytes, s->len) - offset;
     void *bounce_buffer = NULL;
 
-    assert(start >= 0 && bytes > 0 && INT64_MAX - start >= bytes);
-    assert(QEMU_IS_ALIGNED(start, s->cluster_size));
+    assert(offset >= 0 && bytes > 0 && INT64_MAX - offset >= bytes);
+    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
-    assert(start < s->len);
-    assert(start + bytes <= s->len ||
-           start + bytes == QEMU_ALIGN_UP(s->len, s->cluster_size));
+    assert(offset < s->len);
+    assert(offset + bytes <= s->len ||
+           offset + bytes == QEMU_ALIGN_UP(s->len, s->cluster_size));
     assert(nbytes < INT_MAX);
 
     if (zeroes) {
-        ret = bdrv_co_pwrite_zeroes(s->target, start, nbytes, s->write_flags &
+        ret = bdrv_co_pwrite_zeroes(s->target, offset, nbytes, s->write_flags &
                                     ~BDRV_REQ_WRITE_COMPRESSED);
         if (ret < 0) {
-            trace_block_copy_write_zeroes_fail(s, start, ret);
+            trace_block_copy_write_zeroes_fail(s, offset, ret);
             if (error_is_read) {
                 *error_is_read = false;
             }
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
     }
 
     if (s->use_copy_range) {
-        ret = bdrv_co_copy_range(s->source, start, s->target, start, nbytes,
+        ret = bdrv_co_copy_range(s->source, offset, s->target, offset, nbytes,
                                  0, s->write_flags);
         if (ret < 0) {
-            trace_block_copy_copy_range_fail(s, start, ret);
+            trace_block_copy_copy_range_fail(s, offset, ret);
             s->use_copy_range = false;
             s->copy_size = MAX(s->cluster_size, BLOCK_COPY_MAX_BUFFER);
             /* Fallback to read+write with allocated buffer */
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn block_copy_do_copy(BlockCopyState *s,
 
     bounce_buffer = qemu_blockalign(s->source->bs, nbytes);
 
-    ret = bdrv_co_pread(s->source, start, nbytes, bounce_buffer, 0);
+    ret = bdrv_co_pread(s->source, offset, nbytes, bounce_buffer, 0);
     if (ret < 0) {
-        trace_block_copy_read_fail(s, start, ret);
+        trace_block_copy_read_fail(s, offset, ret);
         if (error_is_read) {
             *error_is_read = true;
         }
         goto out;
     }
 
-    ret = bdrv_co_pwrite(s->target, start, nbytes, bounce_buffer,
+    ret = bdrv_co_pwrite(s->target, offset, nbytes, bounce_buffer,
                          s->write_flags);
     if (ret < 0) {
-        trace_block_copy_write_fail(s, start, ret);
+        trace_block_copy_write_fail(s, offset, ret);
         if (error_is_read) {
             *error_is_read = false;
         }
@@ -XXX,XX +XXX,XX @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
 }
 
 int coroutine_fn block_copy(BlockCopyState *s,
-                            int64_t start, int64_t bytes,
+                            int64_t offset, int64_t bytes,
                             bool *error_is_read)
 {
     int ret = 0;
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
     assert(bdrv_get_aio_context(s->source->bs) ==
            bdrv_get_aio_context(s->target->bs));
 
-    assert(QEMU_IS_ALIGNED(start, s->cluster_size));
+    assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
 
-    block_copy_wait_inflight_reqs(s, start, bytes);
-    block_copy_inflight_req_begin(s, &req, start, bytes);
+    block_copy_wait_inflight_reqs(s, offset, bytes);
+    block_copy_inflight_req_begin(s, &req, offset, bytes);
 
     while (bytes) {
         int64_t next_zero, cur_bytes, status_bytes;
 
-        if (!bdrv_dirty_bitmap_get(s->copy_bitmap, start)) {
-            trace_block_copy_skip(s, start);
-            start += s->cluster_size;
+        if (!bdrv_dirty_bitmap_get(s->copy_bitmap, offset)) {
+            trace_block_copy_skip(s, offset);
+            offset += s->cluster_size;
             bytes -= s->cluster_size;
             continue; /* already copied */
         }
 
         cur_bytes = MIN(bytes, s->copy_size);
 
-        next_zero = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, start,
+        next_zero = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, offset,
                                                 cur_bytes);
         if (next_zero >= 0) {
-            assert(next_zero > start); /* start is dirty */
-            assert(next_zero < start + cur_bytes); /* no need to do MIN() */
-            cur_bytes = next_zero - start;
+            assert(next_zero > offset); /* offset is dirty */
+            assert(next_zero < offset + cur_bytes); /* no need to do MIN() */
+            cur_bytes = next_zero - offset;
         }
 
-        ret = block_copy_block_status(s, start, cur_bytes, &status_bytes);
+        ret = block_copy_block_status(s, offset, cur_bytes, &status_bytes);
         if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
-            bdrv_reset_dirty_bitmap(s->copy_bitmap, start, status_bytes);
+            bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, status_bytes);
             progress_set_remaining(s->progress,
                                    bdrv_get_dirty_count(s->copy_bitmap) +
                                    s->in_flight_bytes);
-            trace_block_copy_skip_range(s, start, status_bytes);
-            start += status_bytes;
+            trace_block_copy_skip_range(s, offset, status_bytes);
+            offset += status_bytes;
             bytes -= status_bytes;
             continue;
         }
 
         cur_bytes = MIN(cur_bytes, status_bytes);
 
-        trace_block_copy_process(s, start);
+        trace_block_copy_process(s, offset);
 
-        bdrv_reset_dirty_bitmap(s->copy_bitmap, start, cur_bytes);
+        bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, cur_bytes);
         s->in_flight_bytes += cur_bytes;
 
         co_get_from_shres(s->mem, cur_bytes);
-        ret = block_copy_do_copy(s, start, cur_bytes, ret & BDRV_BLOCK_ZERO,
+        ret = block_copy_do_copy(s, offset, cur_bytes, ret & BDRV_BLOCK_ZERO,
                                  error_is_read);
         co_put_to_shres(s->mem, cur_bytes);
         s->in_flight_bytes -= cur_bytes;
         if (ret < 0) {
-            bdrv_set_dirty_bitmap(s->copy_bitmap, start, cur_bytes);
+            bdrv_set_dirty_bitmap(s->copy_bitmap, offset, cur_bytes);
             break;
         }
 
         progress_work_done(s->progress, cur_bytes);
         s->progress_bytes_callback(cur_bytes, s->progress_opaque);
-        start += cur_bytes;
+        offset += cur_bytes;
         bytes -= cur_bytes;
     }
 
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -XXX,XX +XXX,XX @@
 #include "qemu/co-shared-resource.h"
 
 typedef struct BlockCopyInFlightReq {
-    int64_t start;
+    int64_t offset;
     int64_t bytes;
     QLIST_ENTRY(BlockCopyInFlightReq) list;
     CoQueue wait_queue; /* coroutines blocked on this request */
@@ -XXX,XX +XXX,XX @@ void block_copy_state_free(BlockCopyState *s);
 int64_t block_copy_reset_unallocated(BlockCopyState *s,
                                      int64_t offset, int64_t *count);
 
-int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes,
+int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
                             bool *error_is_read);
 
 #endif /* BLOCK_COPY_H */
-- 
2.24.1

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Currently, a block_copy operation locks the whole requested region. But
there is no reason to lock clusters that are already copied; doing so
only disturbs other parallel block_copy requests.

Let's instead do the following:

Lock only the sub-region that we are about to operate on. Then, after
copying all dirty sub-regions, wait for intersecting block-copy
requests; if they failed, retry the clusters they left dirty.
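
The retry scheme can be sketched as a single-threaded toy model. This is
an illustration under stated assumptions, not the QEMU implementation:
ToyState and the toy_* functions are hypothetical, one bool stands for one
cluster's dirty bit, and a failed parallel request is simulated by a
counter instead of a real coroutine. The error path (-errno) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NB_CLUSTERS 8

/* Toy model (hypothetical, not QEMU code): one dirty flag per cluster. */
typedef struct ToyState {
    bool dirty[NB_CLUSTERS];
    int copies_done;        /* clusters copied so far */
    int pending_failures;   /* simulated parallel requests that will fail */
    size_t failing_cluster; /* cluster each failed request re-dirties */
} ToyState;

/*
 * Copy every dirty cluster in [first, first + count).
 * Returns 1 if a dirty cluster was found and copied, 0 otherwise.
 */
static int toy_copy_dirty_clusters(ToyState *s, size_t first, size_t count)
{
    bool found_dirty = false;

    assert(first + count <= NB_CLUSTERS);
    for (size_t i = first; i < first + count; i++) {
        if (!s->dirty[i]) {
            continue; /* already copied, nothing to lock */
        }
        s->dirty[i] = false; /* claim only this cluster, then "copy" it */
        s->copies_done++;
        found_dirty = true;
    }
    return found_dirty;
}

/*
 * Stand-in for waiting on one intersecting in-flight request.
 * Returns false if there is nothing to wait for. Otherwise it simulates
 * a parallel request that failed and set its cluster dirty again.
 */
static bool toy_wait_one(ToyState *s)
{
    if (s->pending_failures == 0) {
        return false;
    }
    s->pending_failures--;
    s->dirty[s->failing_cluster] = true;
    return true;
}

/* Retry while progress was made or an intersecting request was waited on. */
static int toy_block_copy(ToyState *s, size_t first, size_t count)
{
    int ret;

    do {
        ret = toy_copy_dirty_clusters(s, first, count);
        if (ret == 0) {
            ret = toy_wait_one(s);
        }
    } while (ret > 0);

    return ret;
}
```

In this model the loop ends only when a pass finds no dirty clusters and
there is no intersecting request left to wait for, so any cluster that a
failed parallel request made dirty again is picked up by a later pass.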

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Message-Id: <20200311103004.7649-9-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/block-copy.c | 129 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 105 insertions(+), 24 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index XXXXXXX..XXXXXXX 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -XXX,XX +XXX,XX @@ static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
     return NULL;
 }
 
-static void coroutine_fn block_copy_wait_inflight_reqs(BlockCopyState *s,
-                                                       int64_t offset,
-                                                       int64_t bytes)
+/*
+ * If there are no intersecting requests return false. Otherwise, wait for the
+ * first found intersecting request to finish and return true.
+ */
+static bool coroutine_fn block_copy_wait_one(BlockCopyState *s, int64_t offset,
+                                             int64_t bytes)
 {
-    BlockCopyInFlightReq *req;
+    BlockCopyInFlightReq *req = find_conflicting_inflight_req(s, offset, bytes);
 
-    while ((req = find_conflicting_inflight_req(s, offset, bytes))) {
-        qemu_co_queue_wait(&req->wait_queue, NULL);
+    if (!req) {
+        return false;
     }
+
+    qemu_co_queue_wait(&req->wait_queue, NULL);
+
+    return true;
 }
 
+/* Called only on full-dirty region */
 static void block_copy_inflight_req_begin(BlockCopyState *s,
                                           BlockCopyInFlightReq *req,
                                           int64_t offset, int64_t bytes)
 {
+    assert(!find_conflicting_inflight_req(s, offset, bytes));
+
+    bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
+    s->in_flight_bytes += bytes;
+
     req->offset = offset;
     req->bytes = bytes;
     qemu_co_queue_init(&req->wait_queue);
     QLIST_INSERT_HEAD(&s->inflight_reqs, req, list);
 }
 
-static void coroutine_fn block_copy_inflight_req_end(BlockCopyInFlightReq *req)
+/*
+ * block_copy_inflight_req_shrink
+ *
+ * Drop the tail of the request to be handled later. Set dirty bits back and
+ * wake up all requests waiting for us (may be some of them are not intersecting
+ * with shrunk request)
+ */
+static void coroutine_fn block_copy_inflight_req_shrink(BlockCopyState *s,
+        BlockCopyInFlightReq *req, int64_t new_bytes)
 {
+    if (new_bytes == req->bytes) {
+        return;
+    }
+
+    assert(new_bytes > 0 && new_bytes < req->bytes);
+
+    s->in_flight_bytes -= req->bytes - new_bytes;
+    bdrv_set_dirty_bitmap(s->copy_bitmap,
+                          req->offset + new_bytes, req->bytes - new_bytes);
+
+    req->bytes = new_bytes;
+    qemu_co_queue_restart_all(&req->wait_queue);
+}
+
+static void coroutine_fn block_copy_inflight_req_end(BlockCopyState *s,
+                                                     BlockCopyInFlightReq *req,
+                                                     int ret)
+{
+    s->in_flight_bytes -= req->bytes;
+    if (ret < 0) {
+        bdrv_set_dirty_bitmap(s->copy_bitmap, req->offset, req->bytes);
+    }
     QLIST_REMOVE(req, list);
     qemu_co_queue_restart_all(&req->wait_queue);
 }
@@ -XXX,XX +XXX,XX @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
     return ret;
 }
 
-int coroutine_fn block_copy(BlockCopyState *s,
-                            int64_t offset, int64_t bytes,
-                            bool *error_is_read)
+/*
+ * block_copy_dirty_clusters
+ *
+ * Copy dirty clusters in @offset/@bytes range.
+ * Returns 1 if dirty clusters found and successfully copied, 0 if no dirty
+ * clusters found and -errno on failure.
+ */
+static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
+                                                  int64_t offset, int64_t bytes,
+                                                  bool *error_is_read)
 {
     int ret = 0;
-    BlockCopyInFlightReq req;
+    bool found_dirty = false;
 
     /*
      * block_copy() user is responsible for keeping source and target in same
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
     assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
     assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
 
-    block_copy_wait_inflight_reqs(s, offset, bytes);
-    block_copy_inflight_req_begin(s, &req, offset, bytes);
-
     while (bytes) {
+        BlockCopyInFlightReq req;
         int64_t next_zero, cur_bytes, status_bytes;
 
         if (!bdrv_dirty_bitmap_get(s->copy_bitmap, offset)) {
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
             continue; /* already copied */
         }
 
+        found_dirty = true;
+
         cur_bytes = MIN(bytes, s->copy_size);
 
         next_zero = bdrv_dirty_bitmap_next_zero(s->copy_bitmap, offset,
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
             assert(next_zero < offset + cur_bytes); /* no need to do MIN() */
             cur_bytes = next_zero - offset;
         }
+        block_copy_inflight_req_begin(s, &req, offset, cur_bytes);
 
         ret = block_copy_block_status(s, offset, cur_bytes, &status_bytes);
+        assert(ret >= 0); /* never fail */
+        cur_bytes = MIN(cur_bytes, status_bytes);
+        block_copy_inflight_req_shrink(s, &req, cur_bytes);
         if (s->skip_unallocated && !(ret & BDRV_BLOCK_ALLOCATED)) {
-            bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, status_bytes);
+            block_copy_inflight_req_end(s, &req, 0);
             progress_set_remaining(s->progress,
                                    bdrv_get_dirty_count(s->copy_bitmap) +
                                    s->in_flight_bytes);
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
             continue;
         }
 
-        cur_bytes = MIN(cur_bytes, status_bytes);
-
         trace_block_copy_process(s, offset);
 
-        bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, cur_bytes);
-        s->in_flight_bytes += cur_bytes;
-
         co_get_from_shres(s->mem, cur_bytes);
         ret = block_copy_do_copy(s, offset, cur_bytes, ret & BDRV_BLOCK_ZERO,
                                  error_is_read);
         co_put_to_shres(s->mem, cur_bytes);
-        s->in_flight_bytes -= cur_bytes;
+        block_copy_inflight_req_end(s, &req, ret);
         if (ret < 0) {
-            bdrv_set_dirty_bitmap(s->copy_bitmap, offset, cur_bytes);
-            break;
+            return ret;
         }
 
         progress_work_done(s->progress, cur_bytes);
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s,
         bytes -= cur_bytes;
     }
 
-    block_copy_inflight_req_end(&req);
+    return found_dirty;
+}
+
+/*
+ * block_copy
+ *
+ * Copy requested region, accordingly to dirty bitmap.
+ * Collaborate with parallel block_copy requests: if they succeed it will help
+ * us. If they fail, we will retry not-copied regions. So, if we return error,
+ * it means that some I/O operation failed in context of _this_ block_copy call,
+ * not some parallel operation.
+ */
+int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
+                            bool *error_is_read)
+{
+    int ret;
+
+    do {
+        ret = block_copy_dirty_clusters(s, offset, bytes, error_is_read);
+
+        if (ret == 0) {
+            ret = block_copy_wait_one(s, offset, bytes);
+        }
+
+        /*
+         * We retry in two cases:
+         * 1. Some progress done
+         *    Something was copied, which means that there were yield points
+         *    and some new dirty bits may have appeared (due to failed parallel
+         *    block-copy requests).
+         * 2. We have waited for some intersecting block-copy request
+         *    It may have failed and produced new dirty bits.
+         */
+    } while (ret > 0);
 
     return ret;
 }
-- 
2.24.1

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Hide structure definitions and add explicit API instead, to keep an
eye on the scope of the shared fields.
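
As a generic illustration of the pattern (an opaque type plus accessors),
here is a minimal sketch. It is not the QEMU code: Copier and the copier_*
functions are hypothetical names, and the two halves that would normally
live in a header and a .c file are shown in one block for brevity.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* What a public header would expose: an incomplete type and an API. */
typedef struct Copier Copier; /* hypothetical name */

Copier *copier_new(int64_t cluster_size);
void copier_free(Copier *c);
int64_t copier_cluster_size(const Copier *c);
bool copier_skip_unallocated(const Copier *c);
void copier_set_skip_unallocated(Copier *c, bool skip);

/* What stays private to the implementation file: the field layout. */
struct Copier {
    int64_t cluster_size;
    bool skip_unallocated;
};

Copier *copier_new(int64_t cluster_size)
{
    Copier *c = calloc(1, sizeof(*c));

    if (c) {
        c->cluster_size = cluster_size;
    }
    return c;
}

void copier_free(Copier *c)
{
    free(c);
}

int64_t copier_cluster_size(const Copier *c)
{
    return c->cluster_size;
}

bool copier_skip_unallocated(const Copier *c)
{
    return c->skip_unallocated;
}

void copier_set_skip_unallocated(Copier *c, bool skip)
{
    c->skip_unallocated = skip;
}
```

Callers that only see the header part cannot read or write the fields
directly, so every access to shared state goes through a function that
can be found by searching for its name.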

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20200311103004.7649-10-vsementsov@virtuozzo.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/backup-top.c         |  6 ++--
 block/backup.c             | 25 ++++++++--------
 block/block-copy.c         | 59 ++++++++++++++++++++++++++++++++++++++
 include/block/block-copy.h | 52 +++------------------------------
 4 files changed, 80 insertions(+), 62 deletions(-)

diff --git a/block/backup-top.c b/block/backup-top.c
index XXXXXXX..XXXXXXX 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVBackupTopState {
     BlockCopyState *bcs;
     BdrvChild *target;
     bool active;
+    int64_t cluster_size;
 } BDRVBackupTopState;
 
 static coroutine_fn int backup_top_co_preadv(
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t offset,
         return 0;
     }
 
-    off = QEMU_ALIGN_DOWN(offset, s->bcs->cluster_size);
-    end = QEMU_ALIGN_UP(offset + bytes, s->bcs->cluster_size);
+    off = QEMU_ALIGN_DOWN(offset, s->cluster_size);
+    end = QEMU_ALIGN_UP(offset + bytes, s->cluster_size);
 
     return block_copy(s->bcs, off, end - off, NULL);
 }
@@ -XXX,XX +XXX,XX @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
         goto fail;
     }
 
+    state->cluster_size = cluster_size;
     state->bcs = block_copy_state_new(top->backing, state->target,
                                       cluster_size, write_flags, &local_err);
     if (local_err) {
diff --git a/block/backup.c b/block/backup.c
index XXXXXXX..XXXXXXX 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -XXX,XX +XXX,XX @@ static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
 
     if (ret < 0 && job->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS) {
         /* If we failed and synced, merge in the bits we didn't copy: */
-        bdrv_dirty_bitmap_merge_internal(bm, job->bcs->copy_bitmap,
+        bdrv_dirty_bitmap_merge_internal(bm, block_copy_dirty_bitmap(job->bcs),
                                          NULL, true);
     }
 }
@@ -XXX,XX +XXX,XX @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
         return;
     }
 
-    bdrv_set_dirty_bitmap(backup_job->bcs->copy_bitmap, 0, backup_job->len);
+    bdrv_set_dirty_bitmap(block_copy_dirty_bitmap(backup_job->bcs), 0,
+                          backup_job->len);
 }
 
 static BlockErrorAction backup_error_action(BackupBlockJob *job,
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_loop(BackupBlockJob *job)
     BdrvDirtyBitmapIter *bdbi;
     int ret = 0;
 
-    bdbi = bdrv_dirty_iter_new(job->bcs->copy_bitmap);
+    bdbi = bdrv_dirty_iter_new(block_copy_dirty_bitmap(job->bcs));
     while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
         do {
             if (yield_and_check(job)) {
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_loop(BackupBlockJob *job)
     return ret;
 }
 
-static void backup_init_copy_bitmap(BackupBlockJob *job)
+static void backup_init_bcs_bitmap(BackupBlockJob *job)
 {
     bool ret;
     uint64_t estimate;
+    BdrvDirtyBitmap *bcs_bitmap = block_copy_dirty_bitmap(job->bcs);
 
     if (job->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
-        ret = bdrv_dirty_bitmap_merge_internal(job->bcs->copy_bitmap,
-                                               job->sync_bitmap,
+        ret = bdrv_dirty_bitmap_merge_internal(bcs_bitmap, job->sync_bitmap,
                                                NULL, true);
         assert(ret);
     } else {
@@ -XXX,XX +XXX,XX @@ static void backup_init_copy_bitmap(BackupBlockJob *job)
              * We can't hog the coroutine to initialize this thoroughly.
              * Set a flag and resume work when we are able to yield safely.
              */
-            job->bcs->skip_unallocated = true;
+            block_copy_set_skip_unallocated(job->bcs, true);
         }
-        bdrv_set_dirty_bitmap(job->bcs->copy_bitmap, 0, job->len);
+        bdrv_set_dirty_bitmap(bcs_bitmap, 0, job->len);
     }
 
-    estimate = bdrv_get_dirty_count(job->bcs->copy_bitmap);
+    estimate = bdrv_get_dirty_count(bcs_bitmap);
     job_progress_set_remaining(&job->common.job, estimate);
 }
 
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_run(Job *job, Error **errp)
     BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
     int ret = 0;
 
-    backup_init_copy_bitmap(s);
+    backup_init_bcs_bitmap(s);
 
     if (s->sync_mode == MIRROR_SYNC_MODE_TOP) {
         int64_t offset = 0;
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_run(Job *job, Error **errp)
 
             offset += count;
         }
-        s->bcs->skip_unallocated = false;
+        block_copy_set_skip_unallocated(s->bcs, false);
     }
 
     if (s->sync_mode == MIRROR_SYNC_MODE_NONE) {
         /*
-         * All bits are set in copy_bitmap to allow any cluster to be copied.
+         * All bits are set in bcs bitmap to allow any cluster to be copied.
          * This does not actually require them to be copied.
          */
         while (!job_is_cancelled(job)) {
diff --git a/block/block-copy.c b/block/block-copy.c
index XXXXXXX..XXXXXXX 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -XXX,XX +XXX,XX @@
 #define BLOCK_COPY_MAX_BUFFER (1 * MiB)
 #define BLOCK_COPY_MAX_MEM (128 * MiB)
 
+typedef struct BlockCopyInFlightReq {
+    int64_t offset;
+    int64_t bytes;
+    QLIST_ENTRY(BlockCopyInFlightReq) list;
+    CoQueue wait_queue; /* coroutines blocked on this request */
+} BlockCopyInFlightReq;
+
+typedef struct BlockCopyState {
+    /*
+     * BdrvChild objects are not owned or managed by block-copy. They are
+     * provided by block-copy user and user is responsible for appropriate
+     * permissions on these children.
+     */
+    BdrvChild *source;
+    BdrvChild *target;
+    BdrvDirtyBitmap *copy_bitmap;
+    int64_t in_flight_bytes;
+    int64_t cluster_size;
+    bool use_copy_range;
+    int64_t copy_size;
+    uint64_t len;
+    QLIST_HEAD(, BlockCopyInFlightReq) inflight_reqs;
+
+    BdrvRequestFlags write_flags;
+
+    /*
+     * skip_unallocated:
+     *
+     * Used by sync=top jobs, which first scan the source node for unallocated
+     * areas and clear them in the copy_bitmap.  During this process, the bitmap
+     * is thus not fully initialized: It may still have bits set for areas that
+     * are unallocated and should actually not be copied.
+     *
+     * This is indicated by skip_unallocated.
+     *
+     * In this case, block_copy() will query the source’s allocation status,
+     * skip unallocated regions, clear them in the copy_bitmap, and invoke
+     * block_copy_reset_unallocated() every time it does.
+     */
+    bool skip_unallocated;
+
+    ProgressMeter *progress;
+    /* progress_bytes_callback: called when some copying progress is done. */
+    ProgressBytesCallbackFunc progress_bytes_callback;
+    void *progress_opaque;
+
+    SharedResource *mem;
+} BlockCopyState;
+
 static BlockCopyInFlightReq *find_conflicting_inflight_req(BlockCopyState *s,
                                                            int64_t offset,
                                                            int64_t bytes)
@@ -XXX,XX +XXX,XX @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
 
     return ret;
 }
+
+BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s)
+{
+    return s->copy_bitmap;
+}
+
+void block_copy_set_skip_unallocated(BlockCopyState *s, bool skip)
+{
+    s->skip_unallocated = skip;
+}
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -XXX,XX +XXX,XX @@
 #include "block/block.h"
 #include "qemu/co-shared-resource.h"
 
-typedef struct BlockCopyInFlightReq {
-    int64_t offset;
-    int64_t bytes;
-    QLIST_ENTRY(BlockCopyInFlightReq) list;
-    CoQueue wait_queue; /* coroutines blocked on this request */
-} BlockCopyInFlightReq;
-
 typedef void (*ProgressBytesCallbackFunc)(int64_t bytes, void *opaque);
-typedef struct BlockCopyState {
-    /*
-     * BdrvChild objects are not owned or managed by block-copy. They are
-     * provided by block-copy user and user is responsible for appropriate
-     * permissions on these children.
-     */
-    BdrvChild *source;
-    BdrvChild *target;
-    BdrvDirtyBitmap *copy_bitmap;
-    int64_t in_flight_bytes;
-    int64_t cluster_size;
-    bool use_copy_range;
-    int64_t copy_size;
-    uint64_t len;
-    QLIST_HEAD(, BlockCopyInFlightReq) inflight_reqs;
-
-    BdrvRequestFlags write_flags;
-
-    /*
-     * skip_unallocated:
-     *
-     * Used by sync=top jobs, which first scan the source node for unallocated
-     * areas and clear them in the copy_bitmap.  During this process, the bitmap
-     * is thus not fully initialized: It may still have bits set for areas that
-     * are unallocated and should actually not be copied.
-     *
-     * This is indicated by skip_unallocated.
-     *
-     * In this case, block_copy() will query the source’s allocation status,
-     * skip unallocated regions, clear them in the copy_bitmap, and invoke
-     * block_copy_reset_unallocated() every time it does.
-     */
-    bool skip_unallocated;
-
-    ProgressMeter *progress;
-    /* progress_bytes_callback: called when some copying progress is done. */
-    ProgressBytesCallbackFunc progress_bytes_callback;
-    void *progress_opaque;
-
-    SharedResource *mem;
-} BlockCopyState;
+typedef struct BlockCopyState BlockCopyState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
                                      int64_t cluster_size,
@@ -XXX,XX +XXX,XX @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
 int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
                             bool *error_is_read);
 
+BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s);
+void block_copy_set_skip_unallocated(BlockCopyState *s, bool skip);
+
 #endif /* BLOCK_COPY_H */
-- 
2.24.1

The following changes since commit 474f3938d79ab36b9231c9ad3b5a9314c2aeacde:

Merge remote-tracking branch 'remotes/amarkovic/tags/mips-queue-jun-21-2019' into staging (2019-06-21 15:40:50 +0100)

are available in the Git repository at:

https://github.com/XanClic/qemu.git tags/pull-block-2019-06-24

for you to fetch changes up to ab5d4a30f7f3803ca5106b370969c1b7b54136f8:

iotests: Fix 205 for concurrent runs (2019-06-24 16:01:40 +0200)

----------------------------------------------------------------
Block patches:
- The SSH block driver now uses libssh instead of libssh2
- The VMDK block driver gets read-only support for the seSparse
  subformat
- Various fixes

---

v2:
- Squashed Pino's fix for pre-0.8 libssh into the libssh patch

----------------------------------------------------------------
Anton Nefedov (1):
  iotest 134: test cluster-misaligned encrypted write

Klaus Birkelund Jensen (1):
  nvme: do not advertise support for unsupported arbitration mechanism

Max Reitz (1):
  iotests: Fix 205 for concurrent runs

Pino Toscano (1):
  ssh: switch from libssh2 to libssh

Sam Eiderman (3):
  vmdk: Fix comment regarding max l1_size coverage
  vmdk: Reduce the max bound for L1 table size
  vmdk: Add read-only support for seSparse snapshots

Vladimir Sementsov-Ogievskiy (1):
  blockdev: enable non-root nodes for transaction drive-backup source

-- 
2.21.0

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

We forgot to enable non-root source nodes in the transaction's .prepare
handler, although they have been allowed in do_drive_backup since commit
a2d665c1bc362
    "blockdev: loosen restrictions on drive-backup source node"

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20190618140804.59214-1-vsementsov@virtuozzo.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 blockdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/blockdev.c b/blockdev.c
index XXXXXXX..XXXXXXX 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -XXX,XX +XXX,XX @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
     assert(common->action->type == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
     backup = common->action->u.drive_backup.data;
 
-    bs = qmp_get_root_bs(backup->device, errp);
+    bs = bdrv_lookup_bs(backup->device, backup->device, errp);
     if (!bs) {
         return;
     }
-- 
2.21.0

From: Anton Nefedov <anton.nefedov@virtuozzo.com>

COW (even empty/zero) areas require encryption too

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-id: 20190516143028.81155-1-anton.nefedov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/134     |  9 +++++++++
 tests/qemu-iotests/134.out | 10 ++++++++++
 2 files changed, 19 insertions(+)

diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
index XXXXXXX..XXXXXXX 100755
--- a/tests/qemu-iotests/134
+++ b/tests/qemu-iotests/134
@@ -XXX,XX +XXX,XX @@ echo
 echo "== reading whole image =="
 $QEMU_IO --object $SECRET -c "read 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
 
+echo
+echo "== rewriting cluster part =="
+$QEMU_IO --object $SECRET -c "write -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
+echo "== verify pattern =="
+$QEMU_IO --object $SECRET -c "read -P 0 0 512"  --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+$QEMU_IO --object $SECRET -c "read -P 0xb 512 512"  --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
 echo
 echo "== rewriting whole image =="
 $QEMU_IO --object $SECRET -c "write -P 0xa 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/134.out
+++ b/tests/qemu-iotests/134.out
@@ -XXX,XX +XXX,XX @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 encryption=on encrypt.
 read 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+== rewriting cluster part ==
+wrote 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify pattern ==
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 == rewriting whole image ==
 wrote 134217728/134217728 bytes at offset 0
 128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-- 
2.21.0

From: Sam Eiderman <shmuel.eiderman@oracle.com>

Commit b0651b8c246d ("vmdk: Move l1_size check into vmdk_add_extent")
extended the l1_size check from VMDK4 to VMDK3 but did not update the
default coverage in the moved comment.

The previous vmdk4 calculation:

(512 * 1024 * 1024) * 512(l2 entries) * 65536(grain) = 16PB

The added vmdk3 calculation:

(512 * 1024 * 1024) * 4096(l2 entries) * 512(grain) = 1PB

Add the vmdk3 calculation to the comment.

In any case, VMware does not offer virtual disks larger than 2TB for
vmdk4/vmdk3, or larger than 64TB for the new, undocumented seSparse
format, which is not yet implemented in QEMU.
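
As an illustration only (not part of this patch), the two coverage figures
above can be reproduced with a small helper. The function name
l1_coverage_bytes() is hypothetical; the inputs are the L1 limit and the
default L2 table and grain sizes quoted in this message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes of guest data addressable by `l1_entries`
 * L1 entries, where every L1 entry points to an L2 table of `l2_entries`
 * entries and every L2 entry maps one grain of `grain_bytes` bytes.
 * The caller must keep the product below 2^64.
 */
static uint64_t l1_coverage_bytes(uint64_t l1_entries, uint64_t l2_entries,
                                  uint64_t grain_bytes)
{
    assert(l1_entries > 0 && l2_entries > 0 && grain_bytes > 0);
    return l1_entries * l2_entries * grain_bytes;
}
```

With the 512M limit, l1_coverage_bytes(512 * 1024 * 1024, 512, 65536)
gives 2^54 bytes (16PB) for VMDK4 defaults, and
l1_coverage_bytes(512 * 1024 * 1024, 4096, 512) gives 2^50 bytes (1PB)
for VMDK3 defaults.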

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com
Reviewed-by: yuchenlin <yuchenlin@synology.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/vmdk.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index XXXXXXX..XXXXXXX 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
         return -EFBIG;
     }
     if (l1_size > 512 * 1024 * 1024) {
-        /* Although with big capacity and small l1_entry_sectors, we can get a
+        /*
+         * Although with big capacity and small l1_entry_sectors, we can get a
          * big l1_size, we don't want unbounded value to allocate the table.
-         * Limit it to 512M, which is 16PB for default cluster and L2 table
-         * size */
+         * Limit it to 512M, which is:
+         *     16PB - for default "Hosted Sparse Extent" (VMDK4)
+         *            cluster size: 64KB, L2 table size: 512 entries
+         *     1PB  - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
+         *            cluster size: 512B, L2 table size: 4096 entries
+         */
         error_setg(errp, "L1 size too big");
         return -EFBIG;
     }
-- 
2.21.0

From: Sam Eiderman <shmuel.eiderman@oracle.com>

512M L1 entries is a very loose bound; only 32M are required to store
the maximal supported VMDK file size of 2TB.
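
A rough sketch of the arithmetic behind the new bound, assuming the minimal
cluster size (512B) and minimal L2 table size (512 entries) named in the
updated code comment. The helper name l1_entries_needed() is hypothetical
and does not exist in block/vmdk.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of L1 entries needed to map `capacity_bytes`
 * of guest data, rounding up, when each L1 entry covers
 * l2_entries * grain_bytes bytes.
 */
static uint64_t l1_entries_needed(uint64_t capacity_bytes,
                                  uint64_t l2_entries, uint64_t grain_bytes)
{
    uint64_t bytes_per_l1_entry = l2_entries * grain_bytes;

    assert(bytes_per_l1_entry > 0);
    return (capacity_bytes + bytes_per_l1_entry - 1) / bytes_per_l1_entry;
}
```

Under these assumptions a 2TB image needs 8M L1 entries, and 32M entries
correspond to 8TB, so the 32M limit still leaves headroom above the 2TB
maximum.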

Fixed qemu-iotest 059: the failure now occurs earlier, on the impossible
L1 table size.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/vmdk.c               | 13 +++++++------
 tests/qemu-iotests/059.out |  2 +-
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index XXXXXXX..XXXXXXX 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
         error_setg(errp, "Invalid granularity, image may be corrupt");
         return -EFBIG;
     }
-    if (l1_size > 512 * 1024 * 1024) {
+    if (l1_size > 32 * 1024 * 1024) {
         /*
          * Although with big capacity and small l1_entry_sectors, we can get a
          * big l1_size, we don't want unbounded value to allocate the table.
-         * Limit it to 512M, which is:
-         *     16PB - for default "Hosted Sparse Extent" (VMDK4)
-         *            cluster size: 64KB, L2 table size: 512 entries
-         *     1PB  - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
-         *            cluster size: 512B, L2 table size: 4096 entries
+         * Limit it to 32M, which is enough to store:
+         *     8TB  - for both VMDK3 & VMDK4 with
+         *            minimal cluster size: 512B
+         *            minimal L2 table size: 512 entries
+         *            8 TB is still more than the maximal value supported for
+         *            VMDK3 & VMDK4 which is 2TB.
          */
         error_setg(errp, "L1 size too big");
         return -EFBIG;
diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/059.out
+++ b/tests/qemu-iotests/059.out
@@ -XXX,XX +XXX,XX @@ Offset          Length          Mapped to       File
 0x140000000     0x10000         0x50000         TEST_DIR/t-s003.vmdk
 
 === Testing afl image with a very large capacity ===
-qemu-img: Can't get image size 'TEST_DIR/afl9.IMGFMT': File too large
+qemu-img: Could not open 'TEST_DIR/afl9.IMGFMT': L1 size too big
 *** done
-- 
2.21.0

From: Sam Eiderman <shmuel.eiderman@oracle.com>

Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
QEMU).

This format had the following shortcomings:

    * Grain directory (L1) and grain table (L2) entries were 32-bit,
      allowing access to only 2TB (slightly less) of data.
    * The grain size (default) was 512 bytes - leading to data
      fragmentation and many grain tables.
    * For space reclamation purposes, it was necessary to find all the
      grains which are not pointed to by any grain table - so a reverse
      mapping of "offset of grain in vmdk" to "grain table" must be
      constructed - which takes large amounts of CPU/RAM.

The format specification can be found in VMware's documentation:
https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf

In ESXi 6.5, to support snapshot files larger than 2TB, a new format was
introduced: SESparse (Space Efficient).

This format fixes the above issues:

    * All entries are now 64-bit.
    * The grain size (default) is 4KB.
    * Grain directory and grain tables are now located at the beginning
      of the file.
      + seSparse format reserves space for all grain tables.
      + Grain tables can be addressed using an index.
      + Grains are located at the end of the file and can also be
        addressed with an index.
      - seSparse vmdks of large disks (64TB) have huge preallocated
        headers - mainly due to L2 tables, even for empty snapshots.
    * The header contains a reverse mapping ("backmap") of "offset of
      grain in vmdk" to "grain table" and a bitmap ("free bitmap") which
      specifies for each grain - whether it is allocated or not.
      Using these data structures we can implement space reclamation
      efficiently.
    * Due to the fact that the header now maintains two mappings:
        * The regular one (grain directory & grain tables)
        * A reverse one (backmap and free bitmap)
      These data structures can lose consistency upon crash and result
      in a corrupted VMDK.
      Therefore, a journal is also added to the VMDK and is replayed
      when VMware reopens the file after a crash.

Since ESXi 6.7 - SESparse is the only snapshot format available.

Unfortunately, VMware does not provide documentation regarding the new
seSparse format.

This commit is based on black-box research of the seSparse format.
Various in-guest block operations and their effect on the snapshot file
were tested.

The only VMware provided source of information (regarding the underlying
implementation) was a log file on the ESXi:

/var/log/hostd.log

Whenever an seSparse snapshot is created, the log is populated with
seSparse records.

Relevant log records are of the form:

[...] Const Header:
[...]  constMagic     = 0xcafebabe
[...]  version        = 2.1
[...]  capacity       = 204800
[...]  grainSize      = 8
[...]  grainTableSize = 64
[...]  flags          = 0
[...] Extents:
[...]  Header         : <1 : 1>
[...]  JournalHdr     : <2 : 2>
[...]  Journal        : <2048 : 2048>
[...]  GrainDirectory : <4096 : 2048>
[...]  GrainTables    : <6144 : 2048>
[...]  FreeBitmap     : <8192 : 2048>
[...]  BackMap        : <10240 : 2048>
[...]  Grain          : <12288 : 204800>
[...] Volatile Header:
[...] volatileMagic     = 0xcafecafe
[...] FreeGTNumber      = 0
[...] nextTxnSeqNumber  = 0
[...] replayJournal     = 0

The sizes that are seen in the log file are in sectors.
Extents are of the following format: <offset : size>
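
To make the units concrete, here is a minimal sketch that converts the
sector values from the log above into bytes and entry counts. It assumes a
sector size of 512 bytes and 64-bit table entries; the SectorExtent type
and the helper names are hypothetical and only serve this illustration.

```c
#include <stdint.h>

#define SECTOR_SIZE 512 /* assumed bytes per sector */

/* An <offset : size> pair as printed in hostd.log, both in sectors. */
typedef struct SectorExtent {
    uint64_t offset;
    uint64_t size;
} SectorExtent;

/* Convert a sector count or sector offset to bytes. */
static uint64_t sectors_to_bytes(uint64_t sectors)
{
    return sectors * SECTOR_SIZE;
}

/* First sector after the extent. */
static uint64_t extent_end(SectorExtent e)
{
    return e.offset + e.size;
}

/* Number of 64-bit entries that fit in a table of `table_sectors`. */
static uint64_t table_entries(uint64_t table_sectors)
{
    return sectors_to_bytes(table_sectors) / sizeof(uint64_t);
}
```

For the log above, capacity = 204800 sectors is 100 MiB, grainSize = 8
sectors is 4KB, and grainTableSize = 64 sectors holds 4096 entries, so one
grain table maps 4096 * 4KB = 16MB. The GrainDirectory extent
<4096 : 2048> ends at sector 6144, which is where GrainTables begins.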

This commit is a strict implementation which enforces:
    * magics
    * version number 2.1
    * grain size of 8 sectors  (4KB)
    * grain table size of 64 sectors
    * zero flags
    * extent locations

Additionally, this commit provides only a subset of the functionality
offered by the seSparse format:
    * Read-only
    * No journal replay
    * No space reclamation
    * No unmap support

Hence, journal header, journal, free bitmap and backmap extents are
unused, only the "classic" (L1 -> L2 -> data) grain access is
implemented.

However there are several differences in the grain access itself.
Grain directory (L1):
    * Grain directory entries are indexes (not offsets) to grain
      tables.
    * Valid grain directory entries have their highest nibble set to
      0x1.
    * Since grain tables are always located in the beginning of the
      file - the index can fit into 32 bits - so we can use its low
      part if it's valid.
Grain table (L2):
    * Grain table entries are indexes (not offsets) to grains.
    * If the highest nibble of the entry is:
        0x0:
            The grain is not allocated.
            The rest of the bytes are 0.
        0x1:
            The grain is unmapped - guest sees a zero grain.
            The rest of the bits point to the previously mapped grain,
            see 0x3 case.
        0x2:
            The grain is zero.
        0x3:
            The grain is allocated - to get the index calculate:
            ((entry & 0x0fff000000000000) >> 48) |
            ((entry & 0x0000ffffffffffff) << 12)
    * The difference between 0x1 and 0x2 is that 0x1 is an unallocated
      grain which results from the guest using sg_unmap to unmap the
      grain - but the grain itself still exists in the grain extent - a
      space reclamation procedure should delete it.
      Unmapping a zero grain has no effect (0x2 will not change to 0x1)
      but unmapping an unallocated grain will (0x0 to 0x1) - naturally.
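
The entry layout described above can be sketched as follows. This is an
illustration under the stated rules, not the code added by this patch: the
GrainState enum and all function names are hypothetical, and treating a
non-zero 0x0 entry or an unknown nibble as invalid is an assumption. How a
grain directory entry of zero is handled is not covered here.

```c
#include <assert.h>
#include <stdint.h>

typedef enum GrainState {
    GRAIN_UNALLOCATED, /* nibble 0x0, rest of the entry is zero */
    GRAIN_UNMAPPED,    /* nibble 0x1, guest sees a zero grain */
    GRAIN_ZERO,        /* nibble 0x2 */
    GRAIN_ALLOCATED,   /* nibble 0x3, entry carries a grain index */
    GRAIN_INVALID,     /* anything else (assumption) */
} GrainState;

/* Classify a grain table (L2) entry by its highest nibble. */
static GrainState se_sparse_l2_state(uint64_t entry)
{
    switch (entry >> 60) {
    case 0x0:
        return entry == 0 ? GRAIN_UNALLOCATED : GRAIN_INVALID;
    case 0x1:
        return GRAIN_UNMAPPED;
    case 0x2:
        return GRAIN_ZERO;
    case 0x3:
        return GRAIN_ALLOCATED;
    default:
        return GRAIN_INVALID;
    }
}

/* Grain index of an allocated (0x3) entry, using the formula above. */
static uint64_t se_sparse_grain_index(uint64_t entry)
{
    assert((entry >> 60) == 0x3);
    return ((entry & 0x0fff000000000000ULL) >> 48) |
           ((entry & 0x0000ffffffffffffULL) << 12);
}

/*
 * Grain directory (L1) entry: valid when the highest nibble is 0x1, in
 * which case the low 32 bits are used as the grain table index.
 * Returns 0 on success, -1 for an entry that is not valid.
 */
static int se_sparse_l1_index(uint64_t entry, uint32_t *index)
{
    if ((entry >> 60) != 0x1) {
        return -1;
    }
    *index = (uint32_t)entry;
    return 0;
}
```

For example, the L2 entry 0x3abc000000000123 is allocated, and the formula
yields the grain index 0x123abc: bits 48-59 supply the low 12 bits and
bits 0-47 supply the rest.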

In order to implement seSparse some fields had to be changed to support
both 32-bit and 64-bit entry sizes.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/vmdk.c | 358 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 342 insertions(+), 16 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index XXXXXXX..XXXXXXX 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -XXX,XX +XXX,XX @@ typedef struct {
     uint16_t compressAlgorithm;
 } QEMU_PACKED VMDK4Header;
 
+typedef struct VMDKSESparseConstHeader {
+    uint64_t magic;
+    uint64_t version;
+    uint64_t capacity;
+    uint64_t grain_size;
+    uint64_t grain_table_size;
+    uint64_t flags;
+    uint64_t reserved1;
+    uint64_t reserved2;
+    uint64_t reserved3;
+    uint64_t reserved4;
+    uint64_t volatile_header_offset;
+    uint64_t volatile_header_size;
+    uint64_t journal_header_offset;
+    uint64_t journal_header_size;
+    uint64_t journal_offset;
+    uint64_t journal_size;
+    uint64_t grain_dir_offset;
+    uint64_t grain_dir_size;
+    uint64_t grain_tables_offset;
+    uint64_t grain_tables_size;
+    uint64_t free_bitmap_offset;
+    uint64_t free_bitmap_size;
+    uint64_t backmap_offset;
+    uint64_t backmap_size;
+    uint64_t grains_offset;
+    uint64_t grains_size;
+    uint8_t pad[304];
+} QEMU_PACKED VMDKSESparseConstHeader;
+
+typedef struct VMDKSESparseVolatileHeader {
+    uint64_t magic;
+    uint64_t free_gt_number;
+    uint64_t next_txn_seq_number;
+    uint64_t replay_journal;
+    uint8_t pad[480];
+} QEMU_PACKED VMDKSESparseVolatileHeader;
+
 #define L2_CACHE_SIZE 16
 
 typedef struct VmdkExtent {
@@ -XXX,XX +XXX,XX @@ typedef struct VmdkExtent {
     bool compressed;
     bool has_marker;
     bool has_zero_grain;
+    bool sesparse;
+    uint64_t sesparse_l2_tables_offset;
+    uint64_t sesparse_clusters_offset;
+    int32_t entry_size;
     int version;
     int64_t sectors;
     int64_t end_sector;
     int64_t flat_start_offset;
     int64_t l1_table_offset;
     int64_t l1_backup_table_offset;
-    uint32_t *l1_table;
+    void *l1_table;
     uint32_t *l1_backup_table;
     unsigned int l1_size;
     uint32_t l1_entry_sectors;
 
     unsigned int l2_size;
-    uint32_t *l2_cache;
+    void *l2_cache;
     uint32_t l2_cache_offsets[L2_CACHE_SIZE];
     uint32_t l2_cache_counts[L2_CACHE_SIZE];
 
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
          *            minimal L2 table size: 512 entries
          *            8 TB is still more than the maximal value supported for
          *            VMDK3 & VMDK4 which is 2TB.
+         *     64TB - for "ESXi seSparse Extent"
+         *            minimal cluster size: 512B (default is 4KB)
+         *            L2 table size: 4096 entries (const).
+         *            64TB is more than the maximal value supported for
+         *            seSparse VMDKs (which is slightly less than 64TB)
          */
         error_setg(errp, "L1 size too big");
         return -EFBIG;
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
     extent->l2_size = l2_size;
     extent->cluster_sectors = flat ? sectors : cluster_sectors;
     extent->next_cluster_sector = ROUND_UP(nb_sectors, cluster_sectors);
+    extent->entry_size = sizeof(uint32_t);
 
     if (s->num_extents > 1) {
         extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
     int i;
 
     /* read the L1 table */
-    l1_size = extent->l1_size * sizeof(uint32_t);
+    l1_size = extent->l1_size * extent->entry_size;
     extent->l1_table = g_try_malloc(l1_size);
     if (l1_size && extent->l1_table == NULL) {
         return -ENOMEM;
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
         goto fail_l1;
     }
     for (i = 0; i < extent->l1_size; i++) {
-        le32_to_cpus(&extent->l1_table[i]);
+        if (extent->entry_size == sizeof(uint64_t)) {
+            le64_to_cpus((uint64_t *)extent->l1_table + i);
+        } else {
+            assert(extent->entry_size == sizeof(uint32_t));
+            le32_to_cpus((uint32_t *)extent->l1_table + i);
+        }
     }
 
     if (extent->l1_backup_table_offset) {
+        assert(!extent->sesparse);
         extent->l1_backup_table = g_try_malloc(l1_size);
         if (l1_size && extent->l1_backup_table == NULL) {
             ret = -ENOMEM;
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
     }
 
     extent->l2_cache =
-        g_new(uint32_t, extent->l2_size * L2_CACHE_SIZE);
+        g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE);
     return 0;
  fail_l1b:
     g_free(extent->l1_backup_table);
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
     return ret;
 }
 
+#define SESPARSE_CONST_HEADER_MAGIC UINT64_C(0x00000000cafebabe)
+#define SESPARSE_VOLATILE_HEADER_MAGIC UINT64_C(0x00000000cafecafe)
+
+/* Strict checks - format not officially documented */
+static int check_se_sparse_const_header(VMDKSESparseConstHeader *header,
+                                        Error **errp)
+{
+    header->magic = le64_to_cpu(header->magic);
+    header->version = le64_to_cpu(header->version);
+    header->grain_size = le64_to_cpu(header->grain_size);
+    header->grain_table_size = le64_to_cpu(header->grain_table_size);
+    header->flags = le64_to_cpu(header->flags);
+    header->reserved1 = le64_to_cpu(header->reserved1);
+    header->reserved2 = le64_to_cpu(header->reserved2);
+    header->reserved3 = le64_to_cpu(header->reserved3);
+    header->reserved4 = le64_to_cpu(header->reserved4);
+
+    header->volatile_header_offset =
+        le64_to_cpu(header->volatile_header_offset);
+    header->volatile_header_size = le64_to_cpu(header->volatile_header_size);
+
+    header->journal_header_offset = le64_to_cpu(header->journal_header_offset);
+    header->journal_header_size = le64_to_cpu(header->journal_header_size);
+
+    header->journal_offset = le64_to_cpu(header->journal_offset);
+    header->journal_size = le64_to_cpu(header->journal_size);
+
+    header->grain_dir_offset = le64_to_cpu(header->grain_dir_offset);
+    header->grain_dir_size = le64_to_cpu(header->grain_dir_size);
+
+    header->grain_tables_offset = le64_to_cpu(header->grain_tables_offset);
+    header->grain_tables_size = le64_to_cpu(header->grain_tables_size);
+
+    header->free_bitmap_offset = le64_to_cpu(header->free_bitmap_offset);
+    header->free_bitmap_size = le64_to_cpu(header->free_bitmap_size);
+
+    header->backmap_offset = le64_to_cpu(header->backmap_offset);
+    header->backmap_size = le64_to_cpu(header->backmap_size);
+
+    header->grains_offset = le64_to_cpu(header->grains_offset);
+    header->grains_size = le64_to_cpu(header->grains_size);
+
+    if (header->magic != SESPARSE_CONST_HEADER_MAGIC) {
+        error_setg(errp, "Bad const header magic: 0x%016" PRIx64,
+                   header->magic);
+        return -EINVAL;
+    }
+
+    if (header->version != 0x0000000200000001) {
+        error_setg(errp, "Unsupported version: 0x%016" PRIx64,
+                   header->version);
+        return -ENOTSUP;
+    }
+
+    if (header->grain_size != 8) {
+        error_setg(errp, "Unsupported grain size: %" PRIu64,
+                   header->grain_size);
+        return -ENOTSUP;
+    }
+
+    if (header->grain_table_size != 64) {
+        error_setg(errp, "Unsupported grain table size: %" PRIu64,
+                   header->grain_table_size);
+        return -ENOTSUP;
+    }
+
+    if (header->flags != 0) {
+        error_setg(errp, "Unsupported flags: 0x%016" PRIx64,
+                   header->flags);
+        return -ENOTSUP;
+    }
+
+    if (header->reserved1 != 0 || header->reserved2 != 0 ||
+        header->reserved3 != 0 || header->reserved4 != 0) {
+        error_setg(errp, "Unsupported reserved bits:"
+                   " 0x%016" PRIx64 " 0x%016" PRIx64
+                   " 0x%016" PRIx64 " 0x%016" PRIx64,
+                   header->reserved1, header->reserved2,
+                   header->reserved3, header->reserved4);
+        return -ENOTSUP;
+    }
+
+    /* check that padding is 0 */
+    if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
+        error_setg(errp, "Unsupported non-zero const header padding");
+        return -ENOTSUP;
+    }
+
+    return 0;
+}
+
+static int check_se_sparse_volatile_header(VMDKSESparseVolatileHeader *header,
+                                           Error **errp)
+{
+    header->magic = le64_to_cpu(header->magic);
+    header->free_gt_number = le64_to_cpu(header->free_gt_number);
+    header->next_txn_seq_number = le64_to_cpu(header->next_txn_seq_number);
+    header->replay_journal = le64_to_cpu(header->replay_journal);
+
+    if (header->magic != SESPARSE_VOLATILE_HEADER_MAGIC) {
+        error_setg(errp, "Bad volatile header magic: 0x%016" PRIx64,
+                   header->magic);
+        return -EINVAL;
+    }
+
+    if (header->replay_journal) {
+        error_setg(errp, "Image is dirty, Replaying journal not supported");
+        return -ENOTSUP;
+    }
+
+    /* check that padding is 0 */
+    if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
+        error_setg(errp, "Unsupported non-zero volatile header padding");
+        return -ENOTSUP;
+    }
+
+    return 0;
+}
+
+static int vmdk_open_se_sparse(BlockDriverState *bs,
+                               BdrvChild *file,
+                               int flags, Error **errp)
+{
+    int ret;
+    VMDKSESparseConstHeader const_header;
+    VMDKSESparseVolatileHeader volatile_header;
+    VmdkExtent *extent;
+
+    ret = bdrv_apply_auto_read_only(bs,
+            "No write support for seSparse images available", errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    assert(sizeof(const_header) == SECTOR_SIZE);
+
+    ret = bdrv_pread(file, 0, &const_header, sizeof(const_header));
+    if (ret < 0) {
+        bdrv_refresh_filename(file->bs);
+        error_setg_errno(errp, -ret,
+                         "Could not read const header from file '%s'",
+                         file->bs->filename);
+        return ret;
+    }
+
+    /* check const header */
+    ret = check_se_sparse_const_header(&const_header, errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    assert(sizeof(volatile_header) == SECTOR_SIZE);
+
+    ret = bdrv_pread(file,
+                     const_header.volatile_header_offset * SECTOR_SIZE,
+                     &volatile_header, sizeof(volatile_header));
+    if (ret < 0) {
+        bdrv_refresh_filename(file->bs);
+        error_setg_errno(errp, -ret,
+                         "Could not read volatile header from file '%s'",
+                         file->bs->filename);
+        return ret;
+    }
+
+    /* check volatile header */
+    ret = check_se_sparse_volatile_header(&volatile_header, errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = vmdk_add_extent(bs, file, false,
+                          const_header.capacity,
+                          const_header.grain_dir_offset * SECTOR_SIZE,
+                          0,
+                          const_header.grain_dir_size *
+                          SECTOR_SIZE / sizeof(uint64_t),
+                          const_header.grain_table_size *
+                          SECTOR_SIZE / sizeof(uint64_t),
+                          const_header.grain_size,
+                          &extent,
+                          errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    extent->sesparse = true;
+    extent->sesparse_l2_tables_offset = const_header.grain_tables_offset;
+    extent->sesparse_clusters_offset = const_header.grains_offset;
+    extent->entry_size = sizeof(uint64_t);
+
+    ret = vmdk_init_tables(bs, extent, errp);
+    if (ret) {
+        /* free extent allocated by vmdk_add_extent */
+        vmdk_free_last_extent(bs);
+    }
+
+    return ret;
+}
+
 static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
                                QDict *options, Error **errp);
 
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
          * RW [size in sectors] SPARSE "file-name.vmdk"
          * RW [size in sectors] VMFS "file-name.vmdk"
          * RW [size in sectors] VMFSSPARSE "file-name.vmdk"
+         * RW [size in sectors] SESPARSE "file-name.vmdk"
          */
         flat_offset = -1;
         matches = sscanf(p, "%10s %" SCNd64 " %10s \"%511[^\n\r\"]\" %" SCNd64,
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
 
         if (sectors <= 0 ||
             (strcmp(type, "FLAT") && strcmp(type, "SPARSE") &&
-             strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE")) ||
+             strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE") &&
+             strcmp(type, "SESPARSE")) ||
             (strcmp(access, "RW"))) {
             continue;
         }
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
                 return ret;
             }
             extent = &s->extents[s->num_extents - 1];
+        } else if (!strcmp(type, "SESPARSE")) {
+            ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
+            if (ret) {
+                bdrv_unref_child(bs, extent_file);
+                return ret;
+            }
+            extent = &s->extents[s->num_extents - 1];
         } else {
             error_setg(errp, "Unsupported extent type '%s'", type);
             bdrv_unref_child(bs, extent_file);
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
     if (strcmp(ct, "monolithicFlat") &&
         strcmp(ct, "vmfs") &&
         strcmp(ct, "vmfsSparse") &&
+        strcmp(ct, "seSparse") &&
         strcmp(ct, "twoGbMaxExtentSparse") &&
         strcmp(ct, "twoGbMaxExtentFlat")) {
         error_setg(errp, "Unsupported image type '%s'", ct);
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
 {
     unsigned int l1_index, l2_offset, l2_index;
     int min_index, i, j;
-    uint32_t min_count, *l2_table;
+    uint32_t min_count;
+    void *l2_table;
     bool zeroed = false;
     int64_t ret;
     int64_t cluster_sector;
+    unsigned int l2_size_bytes = extent->l2_size * extent->entry_size;
 
     if (m_data) {
         m_data->valid = 0;
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
     if (l1_index >= extent->l1_size) {
         return VMDK_ERROR;
     }
-    l2_offset = extent->l1_table[l1_index];
+    if (extent->sesparse) {
+        uint64_t l2_offset_u64;
+
+        assert(extent->entry_size == sizeof(uint64_t));
+
+        l2_offset_u64 = ((uint64_t *)extent->l1_table)[l1_index];
+        if (l2_offset_u64 == 0) {
+            l2_offset = 0;
+        } else if ((l2_offset_u64 & 0xffffffff00000000) != 0x1000000000000000) {
+            /*
+             * The topmost nibble is 0x1 if the grain table is allocated.
+             * Strict check: the top 4 bytes must be 0x10000000, because the
+             * maximum supported disk size is 64TB, so there are at most
+             * 64TB / 16MB grain directory entries, which fits in a uint32.
+             * 16MB is the only supported (default) grain table coverage.
+             */
+            return VMDK_ERROR;
+        } else {
+            l2_offset_u64 = l2_offset_u64 & 0x00000000ffffffff;
+            l2_offset_u64 = extent->sesparse_l2_tables_offset +
+                l2_offset_u64 * l2_size_bytes / SECTOR_SIZE;
+            if (l2_offset_u64 > 0x00000000ffffffff) {
+                return VMDK_ERROR;
+            }
+            l2_offset = (unsigned int)(l2_offset_u64);
+        }
+    } else {
+        assert(extent->entry_size == sizeof(uint32_t));
+        l2_offset = ((uint32_t *)extent->l1_table)[l1_index];
+    }
     if (!l2_offset) {
         return VMDK_UNALLOC;
     }
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
                     extent->l2_cache_counts[j] >>= 1;
                 }
             }
-            l2_table = extent->l2_cache + (i * extent->l2_size);
+            l2_table = (char *)extent->l2_cache + (i * l2_size_bytes);
             goto found;
         }
     }
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
             min_index = i;
         }
     }
-    l2_table = extent->l2_cache + (min_index * extent->l2_size);
+    l2_table = (char *)extent->l2_cache + (min_index * l2_size_bytes);
     BLKDBG_EVENT(extent->file, BLKDBG_L2_LOAD);
     if (bdrv_pread(extent->file,
                 (int64_t)l2_offset * 512,
                 l2_table,
-                extent->l2_size * sizeof(uint32_t)
-            ) != extent->l2_size * sizeof(uint32_t)) {
+                l2_size_bytes
+            ) != l2_size_bytes) {
         return VMDK_ERROR;
     }
 
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
     extent->l2_cache_counts[min_index] = 1;
  found:
     l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
-    cluster_sector = le32_to_cpu(l2_table[l2_index]);
 
-    if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
-        zeroed = true;
+    if (extent->sesparse) {
+        cluster_sector = le64_to_cpu(((uint64_t *)l2_table)[l2_index]);
+        switch (cluster_sector & 0xf000000000000000) {
+        case 0x0000000000000000:
+            /* unallocated grain */
+            if (cluster_sector != 0) {
+                return VMDK_ERROR;
+            }
+            break;
+        case 0x1000000000000000:
+            /* scsi-unmapped grain - fallthrough */
+        case 0x2000000000000000:
+            /* zero grain */
+            zeroed = true;
+            break;
+        case 0x3000000000000000:
+            /* allocated grain */
+            cluster_sector = (((cluster_sector & 0x0fff000000000000) >> 48) |
+                              ((cluster_sector & 0x0000ffffffffffff) << 12));
+            cluster_sector = extent->sesparse_clusters_offset +
+                cluster_sector * extent->cluster_sectors;
+            break;
+        default:
+            return VMDK_ERROR;
+        }
+    } else {
+        cluster_sector = le32_to_cpu(((uint32_t *)l2_table)[l2_index]);
+
+        if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
+            zeroed = true;
+        }
     }
 
     if (!cluster_sector || zeroed) {
         if (!allocate) {
             return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
         }
+        assert(!extent->sesparse);
 
         if (extent->next_cluster_sector >= VMDK_EXTENT_MAX_SECTORS) {
             return VMDK_ERROR;
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
             m_data->l1_index = l1_index;
             m_data->l2_index = l2_index;
             m_data->l2_offset = l2_offset;
-            m_data->l2_cache_entry = &l2_table[l2_index];
+            m_data->l2_cache_entry = ((uint32_t *)l2_table) + l2_index;
         }
     }
     *cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
@@ -XXX,XX +XXX,XX @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
         if (!extent) {
             return -EIO;
         }
+        if (extent->sesparse) {
+            return -ENOTSUP;
+        }
         offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
         n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
                              - offset_in_cluster);
-- 
2.21.0
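
The grain table entry handling in get_cluster_offset() above can be read in isolation as a small decoder. The following is a minimal sketch, not part of the patch: the names SeGrainKind and se_sparse_decode_gte are hypothetical, and the entry is assumed to have already been converted from little-endian. It mirrors the switch on the top nibble and the bit shuffle for allocated grains.

```c
#include <stdint.h>

/* Hypothetical classification of a seSparse grain table entry. */
typedef enum {
    SE_GRAIN_UNALLOCATED,
    SE_GRAIN_ZEROED,      /* scsi-unmapped (0x1) or zero grain (0x2) */
    SE_GRAIN_ALLOCATED,
    SE_GRAIN_INVALID,
} SeGrainKind;

/*
 * Decode one 64-bit grain table entry (host byte order).
 * For allocated grains, *grain_index receives the grain number:
 * bits 48..59 of the entry hold the low 12 bits of the index and
 * bits 0..47 hold the high 48 bits, as in the patch.
 */
static SeGrainKind se_sparse_decode_gte(uint64_t entry, uint64_t *grain_index)
{
    switch (entry >> 60) {
    case 0x0:
        /* Unallocated entries must be entirely zero. */
        return entry == 0 ? SE_GRAIN_UNALLOCATED : SE_GRAIN_INVALID;
    case 0x1:
    case 0x2:
        return SE_GRAIN_ZEROED;
    case 0x3:
        *grain_index = ((entry & 0x0fff000000000000ULL) >> 48) |
                       ((entry & 0x0000ffffffffffffULL) << 12);
        return SE_GRAIN_ALLOCATED;
    default:
        return SE_GRAIN_INVALID;
    }
}
```

In the patch, the sector of an allocated grain is then sesparse_clusters_offset plus the grain index multiplied by cluster_sectors.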

From: Pino Toscano <ptoscano@redhat.com>

Rewrite the implementation of the ssh block driver to use libssh instead
of libssh2.  The libssh library has various advantages over libssh2:
- easier API for authentication (for example for using ssh-agent)
- easier API for known_hosts handling
- supports newer types of keys in known_hosts

Use APIs and features that are only available in libssh 0.8
conditionally, so that older versions remain supported (although they
are not recommended).

Adjust iotest 207 to match the changed error message, and to look up
the default key type for localhost so that the fingerprint is compared
against the right key.
Contributed-by: Max Reitz <mreitz@redhat.com>

Adjust the various Docker/Travis scripts to use libssh when available
instead of libssh2. The mingw/mxe testing is dropped for now, as there
are no packages for it.

Signed-off-by: Pino Toscano <ptoscano@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20190620200840.17655-1-ptoscano@redhat.com
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 5873173.t2JhDm7DL7@lindworm.usersys.redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 configure                                     |  65 +-
 block/Makefile.objs                           |   6 +-
 block/ssh.c                                   | 652 ++++++++++--------
 .travis.yml                                   |   4 +-
 block/trace-events                            |  14 +-
 docs/qemu-block-drivers.texi                  |   2 +-
 .../dockerfiles/debian-win32-cross.docker     |   1 -
 .../dockerfiles/debian-win64-cross.docker     |   1 -
 tests/docker/dockerfiles/fedora.docker        |   4 +-
 tests/docker/dockerfiles/ubuntu.docker        |   2 +-
 tests/docker/dockerfiles/ubuntu1804.docker    |   2 +-
 tests/qemu-iotests/207                        |  54 +-
 tests/qemu-iotests/207.out                    |   2 +-
 13 files changed, 449 insertions(+), 360 deletions(-)

diff --git a/configure b/configure
index XXXXXXX..XXXXXXX 100755
--- a/configure
+++ b/configure
@@ -XXX,XX +XXX,XX @@ auth_pam=""
 vte=""
 virglrenderer=""
 tpm=""
-libssh2=""
+libssh=""
 live_block_migration="yes"
 numa=""
 tcmalloc="no"
@@ -XXX,XX +XXX,XX @@ for opt do
   ;;
   --enable-tpm) tpm="yes"
   ;;
-  --disable-libssh2) libssh2="no"
+  --disable-libssh) libssh="no"
   ;;
-  --enable-libssh2) libssh2="yes"
+  --enable-libssh) libssh="yes"
   ;;
   --disable-live-block-migration) live_block_migration="no"
   ;;
@@ -XXX,XX +XXX,XX @@ disabled with --disable-FEATURE, default is enabled if available:
   coroutine-pool  coroutine freelist (better performance)
   glusterfs       GlusterFS backend
   tpm             TPM support
-  libssh2         ssh block device support
+  libssh          ssh block device support
   numa            libnuma support
   libxml2         for Parallels image format
   tcmalloc        tcmalloc support
@@ -XXX,XX +XXX,XX @@ EOF
 fi
 
 ##########################################
-# libssh2 probe
-min_libssh2_version=1.2.8
-if test "$libssh2" != "no" ; then
-  if $pkg_config --atleast-version=$min_libssh2_version libssh2; then
-    libssh2_cflags=$($pkg_config libssh2 --cflags)
-    libssh2_libs=$($pkg_config libssh2 --libs)
-    libssh2=yes
+# libssh probe
+if test "$libssh" != "no" ; then
+  if $pkg_config --exists libssh; then
+    libssh_cflags=$($pkg_config libssh --cflags)
+    libssh_libs=$($pkg_config libssh --libs)
+    libssh=yes
   else
-    if test "$libssh2" = "yes" ; then
-      error_exit "libssh2 >= $min_libssh2_version required for --enable-libssh2"
+    if test "$libssh" = "yes" ; then
+      error_exit "libssh required for --enable-libssh"
     fi
-    libssh2=no
+    libssh=no
   fi
 fi
 
 ##########################################
-# libssh2_sftp_fsync probe
+# Check for libssh 0.8
+# This is done like this instead of using the LIBSSH_VERSION_* and
+# SSH_VERSION_* macros because some distributions in the past shipped
+# snapshots of the future 0.8 from Git, and those snapshots did not
+# have updated version numbers (still referring to 0.7.0).
 
-if test "$libssh2" = "yes"; then
+if test "$libssh" = "yes"; then
   cat > $TMPC <<EOF
-#include <stdio.h>
-#include <libssh2.h>
-#include <libssh2_sftp.h>
-int main(void) {
-    LIBSSH2_SESSION *session;
-    LIBSSH2_SFTP *sftp;
-    LIBSSH2_SFTP_HANDLE *sftp_handle;
-    session = libssh2_session_init ();
-    sftp = libssh2_sftp_init (session);
-    sftp_handle = libssh2_sftp_open (sftp, "/", 0, 0);
-    libssh2_sftp_fsync (sftp_handle);
-    return 0;
-}
+#include <libssh/libssh.h>
+int main(void) { return ssh_get_server_publickey(NULL, NULL); }
 EOF
-  # libssh2_cflags/libssh2_libs defined in previous test.
-  if compile_prog "$libssh2_cflags" "$libssh2_libs" ; then
-    QEMU_CFLAGS="-DHAS_LIBSSH2_SFTP_FSYNC $QEMU_CFLAGS"
+  if compile_prog "$libssh_cflags" "$libssh_libs"; then
+    libssh_cflags="-DHAVE_LIBSSH_0_8 $libssh_cflags"
   fi
 fi
 
@@ -XXX,XX +XXX,XX @@ echo "GlusterFS support $glusterfs"
 echo "gcov              $gcov_tool"
 echo "gcov enabled      $gcov"
 echo "TPM support       $tpm"
-echo "libssh2 support   $libssh2"
+echo "libssh support    $libssh"
 echo "QOM debugging     $qom_cast_debug"
 echo "Live block migration $live_block_migration"
 echo "lzo support       $lzo"
@@ -XXX,XX +XXX,XX @@ if test "$glusterfs_iocb_has_stat" = "yes" ; then
   echo "CONFIG_GLUSTERFS_IOCB_HAS_STAT=y" >> $config_host_mak
 fi
 
-if test "$libssh2" = "yes" ; then
-  echo "CONFIG_LIBSSH2=m" >> $config_host_mak
-  echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak
-  echo "LIBSSH2_LIBS=$libssh2_libs" >> $config_host_mak
+if test "$libssh" = "yes" ; then
+  echo "CONFIG_LIBSSH=m" >> $config_host_mak
+  echo "LIBSSH_CFLAGS=$libssh_cflags" >> $config_host_mak
+  echo "LIBSSH_LIBS=$libssh_libs" >> $config_host_mak
 fi
 
 if test "$live_block_migration" = "yes" ; then
diff --git a/block/Makefile.objs b/block/Makefile.objs
index XXXXXXX..XXXXXXX 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -XXX,XX +XXX,XX @@ block-obj-$(CONFIG_CURL) += curl.o
 block-obj-$(CONFIG_RBD) += rbd.o
 block-obj-$(CONFIG_GLUSTERFS) += gluster.o
 block-obj-$(CONFIG_VXHS) += vxhs.o
-block-obj-$(CONFIG_LIBSSH2) += ssh.o
+block-obj-$(CONFIG_LIBSSH) += ssh.o
 block-obj-y += accounting.o dirty-bitmap.o
 block-obj-y += write-threshold.o
 block-obj-y += backup.o
@@ -XXX,XX +XXX,XX @@ rbd.o-libs         := $(RBD_LIBS)
 gluster.o-cflags   := $(GLUSTERFS_CFLAGS)
 gluster.o-libs     := $(GLUSTERFS_LIBS)
 vxhs.o-libs        := $(VXHS_LIBS)
-ssh.o-cflags       := $(LIBSSH2_CFLAGS)
-ssh.o-libs         := $(LIBSSH2_LIBS)
+ssh.o-cflags       := $(LIBSSH_CFLAGS)
+ssh.o-libs         := $(LIBSSH_LIBS)
 block-obj-dmg-bz2-$(CONFIG_BZIP2) += dmg-bz2.o
 block-obj-$(if $(CONFIG_DMG),m,n) += $(block-obj-dmg-bz2-y)
 dmg-bz2.o-libs     := $(BZIP2_LIBS)
diff --git a/block/ssh.c b/block/ssh.c
index XXXXXXX..XXXXXXX 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -XXX,XX +XXX,XX @@
 
 #include "qemu/osdep.h"
 
-#include <libssh2.h>
-#include <libssh2_sftp.h>
+#include <libssh/libssh.h>
+#include <libssh/sftp.h>
 
 #include "block/block_int.h"
 #include "block/qdict.h"
@@ -XXX,XX +XXX,XX @@
 #include "trace.h"
 
 /*
- * TRACE_LIBSSH2=<bitmask> enables tracing in libssh2 itself.  Note
- * that this requires that libssh2 was specially compiled with the
- * `./configure --enable-debug' option, so most likely you will have
- * to compile it yourself.  The meaning of <bitmask> is described
- * here: http://www.libssh2.org/libssh2_trace.html
+ * TRACE_LIBSSH=<level> enables tracing in libssh itself.
+ * The meaning of <level> is described here:
+ * http://api.libssh.org/master/group__libssh__log.html
  */
-#define TRACE_LIBSSH2 0 /* or try: LIBSSH2_TRACE_SFTP */
+#define TRACE_LIBSSH  0 /* see: SSH_LOG_* */
 
 typedef struct BDRVSSHState {
     /* Coroutine. */
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVSSHState {
 
     /* SSH connection. */
     int sock;                         /* socket */
-    LIBSSH2_SESSION *session;         /* ssh session */
-    LIBSSH2_SFTP *sftp;               /* sftp session */
-    LIBSSH2_SFTP_HANDLE *sftp_handle; /* sftp remote file handle */
+    ssh_session session;              /* ssh session */
+    sftp_session sftp;                /* sftp session */
+    sftp_file sftp_handle;            /* sftp remote file handle */
 
-    /* See ssh_seek() function below. */
-    int64_t offset;
-    bool offset_op_read;
-
-    /* File attributes at open.  We try to keep the .filesize field
+    /*
+     * File attributes at open.  We try to keep the .size field
      * updated if it changes (eg by writing at the end of the file).
      */
-    LIBSSH2_SFTP_ATTRIBUTES attrs;
+    sftp_attributes attrs;
 
     InetSocketAddress *inet;
 
@@ -XXX,XX +XXX,XX @@ static void ssh_state_init(BDRVSSHState *s)
 {
     memset(s, 0, sizeof *s);
     s->sock = -1;
-    s->offset = -1;
     qemu_co_mutex_init(&s->lock);
 }
 
@@ -XXX,XX +XXX,XX @@ static void ssh_state_free(BDRVSSHState *s)
 {
     g_free(s->user);
 
+    if (s->attrs) {
+        sftp_attributes_free(s->attrs);
+    }
     if (s->sftp_handle) {
-        libssh2_sftp_close(s->sftp_handle);
+        sftp_close(s->sftp_handle);
     }
     if (s->sftp) {
-        libssh2_sftp_shutdown(s->sftp);
+        sftp_free(s->sftp);
     }
     if (s->session) {
-        libssh2_session_disconnect(s->session,
-                                   "from qemu ssh client: "
-                                   "user closed the connection");
-        libssh2_session_free(s->session);
-    }
-    if (s->sock >= 0) {
-        close(s->sock);
+        ssh_disconnect(s->session);
+        ssh_free(s->session); /* This frees s->sock */
     }
 }
 
@@ -XXX,XX +XXX,XX @@ session_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
     va_end(args);
 
     if (s->session) {
-        char *ssh_err;
+        const char *ssh_err;
         int ssh_err_code;
 
-        /* This is not an errno.  See <libssh2.h>. */
-        ssh_err_code = libssh2_session_last_error(s->session,
-                                                  &ssh_err, NULL, 0);
-        error_setg(errp, "%s: %s (libssh2 error code: %d)",
+        /* This is not an errno.  See <libssh/libssh.h>. */
+        ssh_err = ssh_get_error(s->session);
+        ssh_err_code = ssh_get_error_code(s->session);
+        error_setg(errp, "%s: %s (libssh error code: %d)",
                    msg, ssh_err, ssh_err_code);
     } else {
         error_setg(errp, "%s", msg);
@@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
     va_end(args);
 
     if (s->sftp) {
-        char *ssh_err;
+        const char *ssh_err;
         int ssh_err_code;
-        unsigned long sftp_err_code;
+        int sftp_err_code;
 
-        /* This is not an errno.  See <libssh2.h>. */
-        ssh_err_code = libssh2_session_last_error(s->session,
-                                                  &ssh_err, NULL, 0);
-        /* See <libssh2_sftp.h>. */
-        sftp_err_code = libssh2_sftp_last_error((s)->sftp);
+        /* This is not an errno.  See <libssh/libssh.h>. */
+        ssh_err = ssh_get_error(s->session);
+        ssh_err_code = ssh_get_error_code(s->session);
+        /* See <libssh/sftp.h>. */
+        sftp_err_code = sftp_get_error(s->sftp);
 
         error_setg(errp,
-                   "%s: %s (libssh2 error code: %d, sftp error code: %lu)",
+                   "%s: %s (libssh error code: %d, sftp error code: %d)",
                    msg, ssh_err, ssh_err_code, sftp_err_code);
     } else {
         error_setg(errp, "%s", msg);
@@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
 
 static void sftp_error_trace(BDRVSSHState *s, const char *op)
 {
-    char *ssh_err;
+    const char *ssh_err;
     int ssh_err_code;
-    unsigned long sftp_err_code;
+    int sftp_err_code;
 
-    /* This is not an errno.  See <libssh2.h>. */
-    ssh_err_code = libssh2_session_last_error(s->session,
-                                              &ssh_err, NULL, 0);
-    /* See <libssh2_sftp.h>. */
-    sftp_err_code = libssh2_sftp_last_error((s)->sftp);
+    /* This is not an errno.  See <libssh/libssh.h>. */
+    ssh_err = ssh_get_error(s->session);
+    ssh_err_code = ssh_get_error_code(s->session);
+    /* See <libssh/sftp.h>. */
+    sftp_err_code = sftp_get_error(s->sftp);
 
     trace_sftp_error(op, ssh_err, ssh_err_code, sftp_err_code);
 }
@@ -XXX,XX +XXX,XX @@ static void ssh_parse_filename(const char *filename, QDict *options,
     parse_uri(filename, options, errp);
 }
 
-static int check_host_key_knownhosts(BDRVSSHState *s,
-                                     const char *host, int port, Error **errp)
+static int check_host_key_knownhosts(BDRVSSHState *s, Error **errp)
 {
-    const char *home;
-    char *knh_file = NULL;
-    LIBSSH2_KNOWNHOSTS *knh = NULL;
-    struct libssh2_knownhost *found;
-    int ret, r;
-    const char *hostkey;
-    size_t len;
-    int type;
-
-    hostkey = libssh2_session_hostkey(s->session, &len, &type);
-    if (!hostkey) {
+    int ret;
+#ifdef HAVE_LIBSSH_0_8
+    enum ssh_known_hosts_e state;
+    int r;
+    ssh_key pubkey;
+    enum ssh_keytypes_e pubkey_type;
+    unsigned char *server_hash = NULL;
+    size_t server_hash_len;
+    char *fingerprint = NULL;
+
+    state = ssh_session_is_known_server(s->session);
+    trace_ssh_server_status(state);
+
+    switch (state) {
+    case SSH_KNOWN_HOSTS_OK:
+        /* OK */
+        trace_ssh_check_host_key_knownhosts();
+        break;
+    case SSH_KNOWN_HOSTS_CHANGED:
         ret = -EINVAL;
-        session_error_setg(errp, s, "failed to read remote host key");
+        r = ssh_get_server_publickey(s->session, &pubkey);
+        if (r == 0) {
+            r = ssh_get_publickey_hash(pubkey, SSH_PUBLICKEY_HASH_SHA256,
+                                       &server_hash, &server_hash_len);
+            pubkey_type = ssh_key_type(pubkey);
+            ssh_key_free(pubkey);
+        }
+        if (r == 0) {
+            fingerprint = ssh_get_fingerprint_hash(SSH_PUBLICKEY_HASH_SHA256,
+                                                   server_hash,
+                                                   server_hash_len);
+            ssh_clean_pubkey_hash(&server_hash);
+        }
+        if (fingerprint) {
+            error_setg(errp,
+                       "host key (%s key with fingerprint %s) does not match "
+                       "the one in known_hosts; this may be a possible attack",
+                       ssh_key_type_to_char(pubkey_type), fingerprint);
+            ssh_string_free_char(fingerprint);
+        } else {
+            error_setg(errp,
+                       "host key does not match the one in known_hosts; this "
+                       "may be a possible attack");
+        }
         goto out;
-    }
-
-    knh = libssh2_knownhost_init(s->session);
-    if (!knh) {
+    case SSH_KNOWN_HOSTS_OTHER:
         ret = -EINVAL;
-        session_error_setg(errp, s,
-                           "failed to initialize known hosts support");
+        error_setg(errp,
+                   "host key for this server not found, another type exists");
+        goto out;
+    case SSH_KNOWN_HOSTS_UNKNOWN:
+        ret = -EINVAL;
+        error_setg(errp, "no host key was found in known_hosts");
+        goto out;
+    case SSH_KNOWN_HOSTS_NOT_FOUND:
+        ret = -ENOENT;
+        error_setg(errp, "known_hosts file not found");
+        goto out;
+    case SSH_KNOWN_HOSTS_ERROR:
+        ret = -EINVAL;
+        error_setg(errp, "error while checking the host");
+        goto out;
+    default:
+        ret = -EINVAL;
+        error_setg(errp, "error while checking for known server (%d)", state);
         goto out;
     }
+#else /* !HAVE_LIBSSH_0_8 */
+    int state;
 
-    home = getenv("HOME");
-    if (home) {
-        knh_file = g_strdup_printf("%s/.ssh/known_hosts", home);
-    } else {
-        knh_file = g_strdup_printf("/root/.ssh/known_hosts");
-    }
-
-    /* Read all known hosts from OpenSSH-style known_hosts file. */
-    libssh2_knownhost_readfile(knh, knh_file, LIBSSH2_KNOWNHOST_FILE_OPENSSH);
+    state = ssh_is_server_known(s->session);
+    trace_ssh_server_status(state);
 
-    r = libssh2_knownhost_checkp(knh, host, port, hostkey, len,
-                                 LIBSSH2_KNOWNHOST_TYPE_PLAIN|
-                                 LIBSSH2_KNOWNHOST_KEYENC_RAW,
-                                 &found);
-    switch (r) {
-    case LIBSSH2_KNOWNHOST_CHECK_MATCH:
+    switch (state) {
+    case SSH_SERVER_KNOWN_OK:
         /* OK */
-        trace_ssh_check_host_key_knownhosts(found->key);
+        trace_ssh_check_host_key_knownhosts();
         break;
-    case LIBSSH2_KNOWNHOST_CHECK_MISMATCH:
+    case SSH_SERVER_KNOWN_CHANGED:
         ret = -EINVAL;
-        session_error_setg(errp, s,
-                      "host key does not match the one in known_hosts"
-                      " (found key %s)", found->key);
+        error_setg(errp,
+                   "host key does not match the one in known_hosts; this "
+                   "may be a possible attack");
         goto out;
-    case LIBSSH2_KNOWNHOST_CHECK_NOTFOUND:
+    case SSH_SERVER_FOUND_OTHER:
         ret = -EINVAL;
-        session_error_setg(errp, s, "no host key was found in known_hosts");
+        error_setg(errp,
+                   "host key for this server not found, another type exists");
+        goto out;
+    case SSH_SERVER_FILE_NOT_FOUND:
+        ret = -ENOENT;
+        error_setg(errp, "known_hosts file not found");
         goto out;
-    case LIBSSH2_KNOWNHOST_CHECK_FAILURE:
+    case SSH_SERVER_NOT_KNOWN:
         ret = -EINVAL;
-        session_error_setg(errp, s,
-                      "failure matching the host key with known_hosts");
+        error_setg(errp, "no host key was found in known_hosts");
+        goto out;
+    case SSH_SERVER_ERROR:
+        ret = -EINVAL;
+        error_setg(errp, "server error");
         goto out;
     default:
         ret = -EINVAL;
-        session_error_setg(errp, s, "unknown error matching the host key"
-                      " with known_hosts (%d)", r);
+        error_setg(errp, "error while checking for known server (%d)", state);
         goto out;
     }
+#endif /* !HAVE_LIBSSH_0_8 */
 
     /* known_hosts checking successful. */
     ret = 0;
 
  out:
-    if (knh != NULL) {
-        libssh2_knownhost_free(knh);
-    }
-    g_free(knh_file);
     return ret;
 }
 
@@ -XXX,XX +XXX,XX @@ static int compare_fingerprint(const unsigned char *fingerprint, size_t len,
 
 static int
 check_host_key_hash(BDRVSSHState *s, const char *hash,
-                    int hash_type, size_t fingerprint_len, Error **errp)
+                    enum ssh_publickey_hash_type type, Error **errp)
 {
-    const char *fingerprint;
-
-    fingerprint = libssh2_hostkey_hash(s->session, hash_type);
-    if (!fingerprint) {
+    int r;
+    ssh_key pubkey;
+    unsigned char *server_hash;
+    size_t server_hash_len;
+
+#ifdef HAVE_LIBSSH_0_8
+    r = ssh_get_server_publickey(s->session, &pubkey);
+#else
+    r = ssh_get_publickey(s->session, &pubkey);
+#endif
+    if (r != SSH_OK) {
         session_error_setg(errp, s, "failed to read remote host key");
         return -EINVAL;
     }
 
-    if(compare_fingerprint((unsigned char *) fingerprint, fingerprint_len,
-                           hash) != 0) {
+    r = ssh_get_publickey_hash(pubkey, type, &server_hash, &server_hash_len);
+    ssh_key_free(pubkey);
+    if (r != 0) {
+        session_error_setg(errp, s,
+                           "failed reading the hash of the server SSH key");
+        return -EINVAL;
+    }
+
+    r = compare_fingerprint(server_hash, server_hash_len, hash);
+    ssh_clean_pubkey_hash(&server_hash);
+    if (r != 0) {
         error_setg(errp, "remote host key does not match host_key_check '%s'",
                    hash);
         return -EPERM;
@@ -XXX,XX +XXX,XX @@ check_host_key_hash(BDRVSSHState *s, const char *hash,
     return 0;
 }
 
-static int check_host_key(BDRVSSHState *s, const char *host, int port,
-                          SshHostKeyCheck *hkc, Error **errp)
+static int check_host_key(BDRVSSHState *s, SshHostKeyCheck *hkc, Error **errp)
 {
     SshHostKeyCheckMode mode;
 
@@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port,
     case SSH_HOST_KEY_CHECK_MODE_HASH:
         if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_MD5) {
             return check_host_key_hash(s, hkc->u.hash.hash,
-                                       LIBSSH2_HOSTKEY_HASH_MD5, 16, errp);
+                                       SSH_PUBLICKEY_HASH_MD5, errp);
         } else if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_SHA1) {
             return check_host_key_hash(s, hkc->u.hash.hash,
-                                       LIBSSH2_HOSTKEY_HASH_SHA1, 20, errp);
+                                       SSH_PUBLICKEY_HASH_SHA1, errp);
         }
         g_assert_not_reached();
         break;
     case SSH_HOST_KEY_CHECK_MODE_KNOWN_HOSTS:
-        return check_host_key_knownhosts(s, host, port, errp);
+        return check_host_key_knownhosts(s, errp);
     default:
         g_assert_not_reached();
     }
@@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port,
     return -EINVAL;
 }
 
-static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
+static int authenticate(BDRVSSHState *s, Error **errp)
 {
     int r, ret;
-    const char *userauthlist;
-    LIBSSH2_AGENT *agent = NULL;
-    struct libssh2_agent_publickey *identity;
-    struct libssh2_agent_publickey *prev_identity = NULL;
+    int method;
 
-    userauthlist = libssh2_userauth_list(s->session, user, strlen(user));
-    if (strstr(userauthlist, "publickey") == NULL) {
+    /* Try to authenticate with the "none" method. */
+    r = ssh_userauth_none(s->session, NULL);
+    if (r == SSH_AUTH_ERROR) {
         ret = -EPERM;
-        error_setg(errp,
-                "remote server does not support \"publickey\" authentication");
+        session_error_setg(errp, s, "failed to authenticate using none "
+                                    "authentication");
         goto out;
-    }
-
-    /* Connect to ssh-agent and try each identity in turn. */
-    agent = libssh2_agent_init(s->session);
-    if (!agent) {
-        ret = -EINVAL;
-        session_error_setg(errp, s, "failed to initialize ssh-agent support");
-        goto out;
-    }
-    if (libssh2_agent_connect(agent)) {
-        ret = -ECONNREFUSED;
-        session_error_setg(errp, s, "failed to connect to ssh-agent");
-        goto out;
-    }
-    if (libssh2_agent_list_identities(agent)) {
-        ret = -EINVAL;
-        session_error_setg(errp, s,
-                           "failed requesting identities from ssh-agent");
+    } else if (r == SSH_AUTH_SUCCESS) {
+        /* Authenticated! */
+        ret = 0;
         goto out;
     }
 
-    for(;;) {
-        r = libssh2_agent_get_identity(agent, &identity, prev_identity);
-        if (r == 1) {           /* end of list */
-            break;
-        }
-        if (r < 0) {
+    method = ssh_userauth_list(s->session, NULL);
+    trace_ssh_auth_methods(method);
+
+    /*
+     * Try to authenticate with publickey, using the ssh-agent
+     * if available.
+     */
+    if (method & SSH_AUTH_METHOD_PUBLICKEY) {
+        r = ssh_userauth_publickey_auto(s->session, NULL, NULL);
+        if (r == SSH_AUTH_ERROR) {
             ret = -EINVAL;
-            session_error_setg(errp, s,
-                               "failed to obtain identity from ssh-agent");
+            session_error_setg(errp, s, "failed to authenticate using "
+                                        "publickey authentication");
             goto out;
-        }
-        r = libssh2_agent_userauth(agent, user, identity);
-        if (r == 0) {
+        } else if (r == SSH_AUTH_SUCCESS) {
             /* Authenticated! */
             ret = 0;
             goto out;
         }
-        /* Failed to authenticate with this identity, try the next one. */
-        prev_identity = identity;
     }
 
     ret = -EPERM;
@@ -XXX,XX +XXX,XX @@ static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
                "and the identities held by your ssh-agent");
 
  out:
-    if (agent != NULL) {
-        /* Note: libssh2 implementation implicitly calls
-         * libssh2_agent_disconnect if necessary.
-         */
-        libssh2_agent_free(agent);
-    }
-
     return ret;
 }
 
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
                           int ssh_flags, int creat_mode, Error **errp)
 {
     int r, ret;
-    long port = 0;
+    unsigned int port = 0;
+    int new_sock = -1;
 
     if (opts->has_user) {
         s->user = g_strdup(opts->user);
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
     s->inet = opts->server;
     opts->server = NULL;
 
-    if (qemu_strtol(s->inet->port, NULL, 10, &port) < 0) {
+    if (qemu_strtoui(s->inet->port, NULL, 10, &port) < 0) {
         error_setg(errp, "Use only numeric port value");
         ret = -EINVAL;
         goto err;
     }
 
     /* Open the socket and connect. */
-    s->sock = inet_connect_saddr(s->inet, errp);
-    if (s->sock < 0) {
+    new_sock = inet_connect_saddr(s->inet, errp);
+    if (new_sock < 0) {
         ret = -EIO;
         goto err;
     }
 
+    /*
+     * Try to disable the Nagle algorithm on TCP sockets to reduce latency,
+     * but do not fail if it cannot be disabled.
+     */
+    r = socket_set_nodelay(new_sock);
+    if (r < 0) {
+        warn_report("can't set TCP_NODELAY for the ssh server %s: %s",
+                    s->inet->host, strerror(errno));
+    }
+
     /* Create SSH session. */
-    s->session = libssh2_session_init();
+    s->session = ssh_new();
     if (!s->session) {
         ret = -EINVAL;
-        session_error_setg(errp, s, "failed to initialize libssh2 session");
+        session_error_setg(errp, s, "failed to initialize libssh session");
         goto err;
     }
 
-#if TRACE_LIBSSH2 != 0
-    libssh2_trace(s->session, TRACE_LIBSSH2);
-#endif
+    /*
+     * Make sure we are in blocking mode during the connection and
+     * authentication phases.
+     */
+    ssh_set_blocking(s->session, 1);
 
-    r = libssh2_session_handshake(s->session, s->sock);
-    if (r != 0) {
+    r = ssh_options_set(s->session, SSH_OPTIONS_USER, s->user);
+    if (r < 0) {
+        ret = -EINVAL;
+        session_error_setg(errp, s,
+                           "failed to set the user in the libssh session");
+        goto err;
+    }
+
+    r = ssh_options_set(s->session, SSH_OPTIONS_HOST, s->inet->host);
+    if (r < 0) {
+        ret = -EINVAL;
+        session_error_setg(errp, s,
+                           "failed to set the host in the libssh session");
+        goto err;
+    }
+
+    if (port > 0) {
+        r = ssh_options_set(s->session, SSH_OPTIONS_PORT, &port);
+        if (r < 0) {
+            ret = -EINVAL;
+            session_error_setg(errp, s,
+                               "failed to set the port in the libssh session");
+            goto err;
+        }
+    }
+
+    r = ssh_options_set(s->session, SSH_OPTIONS_COMPRESSION, "none");
+    if (r < 0) {
+        ret = -EINVAL;
+        session_error_setg(errp, s,
+                           "failed to disable the compression in the libssh "
+                           "session");
+        goto err;
+    }
+
+    /* Read ~/.ssh/config. */
+    r = ssh_options_parse_config(s->session, NULL);
+    if (r < 0) {
+        ret = -EINVAL;
+        session_error_setg(errp, s, "failed to parse ~/.ssh/config");
+        goto err;
+    }
+
+    r = ssh_options_set(s->session, SSH_OPTIONS_FD, &new_sock);
+    if (r < 0) {
+        ret = -EINVAL;
+        session_error_setg(errp, s,
+                           "failed to set the socket in the libssh session");
+        goto err;
+    }
+    /* libssh took ownership of the socket. */
+    s->sock = new_sock;
+    new_sock = -1;
+
+    /* Connect. */
+    r = ssh_connect(s->session);
+    if (r != SSH_OK) {
         ret = -EINVAL;
         session_error_setg(errp, s, "failed to establish SSH session");
         goto err;
     }
 
     /* Check the remote host's key against known_hosts. */
-    ret = check_host_key(s, s->inet->host, port, opts->host_key_check, errp);
+    ret = check_host_key(s, opts->host_key_check, errp);
     if (ret < 0) {
         goto err;
     }
 
     /* Authenticate. */
-    ret = authenticate(s, s->user, errp);
+    ret = authenticate(s, errp);
     if (ret < 0) {
         goto err;
     }
 
     /* Start SFTP. */
-    s->sftp = libssh2_sftp_init(s->session);
+    s->sftp = sftp_new(s->session);
     if (!s->sftp) {
-        session_error_setg(errp, s, "failed to initialize sftp handle");
+        session_error_setg(errp, s, "failed to create sftp handle");
+        ret = -EINVAL;
+        goto err;
+    }
+
+    r = sftp_init(s->sftp);
+    if (r < 0) {
+        sftp_error_setg(errp, s, "failed to initialize sftp handle");
         ret = -EINVAL;
         goto err;
     }
 
     /* Open the remote file. */
     trace_ssh_connect_to_ssh(opts->path, ssh_flags, creat_mode);
-    s->sftp_handle = libssh2_sftp_open(s->sftp, opts->path, ssh_flags,
-                                       creat_mode);
+    s->sftp_handle = sftp_open(s->sftp, opts->path, ssh_flags, creat_mode);
     if (!s->sftp_handle) {
-        session_error_setg(errp, s, "failed to open remote file '%s'",
-                           opts->path);
+        sftp_error_setg(errp, s, "failed to open remote file '%s'",
+                        opts->path);
         ret = -EINVAL;
         goto err;
     }
 
-    r = libssh2_sftp_fstat(s->sftp_handle, &s->attrs);
-    if (r < 0) {
+    /* Make sure the SFTP file is handled in blocking mode. */
+    sftp_file_set_blocking(s->sftp_handle);
+
+    s->attrs = sftp_fstat(s->sftp_handle);
+    if (!s->attrs) {
         sftp_error_setg(errp, s, "failed to read file attributes");
         return -EINVAL;
     }
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
     return 0;
 
  err:
+    if (s->attrs) {
+        sftp_attributes_free(s->attrs);
+    }
+    s->attrs = NULL;
     if (s->sftp_handle) {
-        libssh2_sftp_close(s->sftp_handle);
+        sftp_close(s->sftp_handle);
     }
     s->sftp_handle = NULL;
     if (s->sftp) {
-        libssh2_sftp_shutdown(s->sftp);
+        sftp_free(s->sftp);
     }
     s->sftp = NULL;
     if (s->session) {
-        libssh2_session_disconnect(s->session,
-                                   "from qemu ssh client: "
-                                   "error opening connection");
-        libssh2_session_free(s->session);
+        ssh_disconnect(s->session);
+        ssh_free(s->session);
     }
     s->session = NULL;
+    s->sock = -1;
+    if (new_sock >= 0) {
+        close(new_sock);
+    }
 
     return ret;
 }
@@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
 
     ssh_state_init(s);
 
-    ssh_flags = LIBSSH2_FXF_READ;
+    ssh_flags = 0;
     if (bdrv_flags & BDRV_O_RDWR) {
-        ssh_flags |= LIBSSH2_FXF_WRITE;
+        ssh_flags |= O_RDWR;
+    } else {
+        ssh_flags |= O_RDONLY;
     }
 
     opts = ssh_parse_options(options, errp);
@@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
     }
 
     /* Go non-blocking. */
-    libssh2_session_set_blocking(s->session, 0);
+    ssh_set_blocking(s->session, 0);
 
     qapi_free_BlockdevOptionsSsh(opts);
 
     return 0;
 
  err:
-    if (s->sock >= 0) {
-        close(s->sock);
-    }
-    s->sock = -1;
-
     qapi_free_BlockdevOptionsSsh(opts);
 
     return ret;
@@ -XXX,XX +XXX,XX @@ static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)
 {
     ssize_t ret;
     char c[1] = { '\0' };
-    int was_blocking = libssh2_session_get_blocking(s->session);
+    int was_blocking = ssh_is_blocking(s->session);
 
     /* offset must be strictly greater than the current size so we do
      * not overwrite anything */
-    assert(offset > 0 && offset > s->attrs.filesize);
+    assert(offset > 0 && offset > s->attrs->size);
 
-    libssh2_session_set_blocking(s->session, 1);
+    ssh_set_blocking(s->session, 1);
 
-    libssh2_sftp_seek64(s->sftp_handle, offset - 1);
-    ret = libssh2_sftp_write(s->sftp_handle, c, 1);
+    sftp_seek64(s->sftp_handle, offset - 1);
+    ret = sftp_write(s->sftp_handle, c, 1);
 
-    libssh2_session_set_blocking(s->session, was_blocking);
+    ssh_set_blocking(s->session, was_blocking);
 
     if (ret < 0) {
         sftp_error_setg(errp, s, "Failed to grow file");
         return -EIO;
     }
 
-    s->attrs.filesize = offset;
+    s->attrs->size = offset;
     return 0;
 }
 
@@ -XXX,XX +XXX,XX @@ static int ssh_co_create(BlockdevCreateOptions *options, Error **errp)
     ssh_state_init(&s);
 
     ret = connect_to_ssh(&s, opts->location,
-                         LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE|
-                         LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC,
+                         O_RDWR | O_CREAT | O_TRUNC,
                          0644, errp);
     if (ret < 0) {
         goto fail;
@@ -XXX,XX +XXX,XX @@ static int ssh_has_zero_init(BlockDriverState *bs)
     /* Assume false, unless we can positively prove it's true. */
     int has_zero_init = 0;
 
-    if (s->attrs.flags & LIBSSH2_SFTP_ATTR_PERMISSIONS) {
-        if (s->attrs.permissions & LIBSSH2_SFTP_S_IFREG) {
-            has_zero_init = 1;
-        }
+    if (s->attrs->type == SSH_FILEXFER_TYPE_REGULAR) {
+        has_zero_init = 1;
     }
 
     return has_zero_init;
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
         .co = qemu_coroutine_self()
     };
 
-    r = libssh2_session_block_directions(s->session);
+    r = ssh_get_poll_flags(s->session);
 
-    if (r & LIBSSH2_SESSION_BLOCK_INBOUND) {
+    if (r & SSH_READ_PENDING) {
         rd_handler = restart_coroutine;
     }
-    if (r & LIBSSH2_SESSION_BLOCK_OUTBOUND) {
+    if (r & SSH_WRITE_PENDING) {
         wr_handler = restart_coroutine;
     }
 
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
     trace_ssh_co_yield_back(s->sock);
 }
 
-/* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
- * in the remote file.  Notice that it just updates a field in the
- * sftp_handle structure, so there is no network traffic and it cannot
- * fail.
- *
- * However, `libssh2_sftp_seek64' does have a catastrophic effect on
- * performance since it causes the handle to throw away all in-flight
- * reads and buffered readahead data.  Therefore this function tries
- * to be intelligent about when to call the underlying libssh2 function.
- */
-#define SSH_SEEK_WRITE 0
-#define SSH_SEEK_READ  1
-#define SSH_SEEK_FORCE 2
-
-static void ssh_seek(BDRVSSHState *s, int64_t offset, int flags)
-{
-    bool op_read = (flags & SSH_SEEK_READ) != 0;
-    bool force = (flags & SSH_SEEK_FORCE) != 0;
-
-    if (force || op_read != s->offset_op_read || offset != s->offset) {
-        trace_ssh_seek(offset);
-        libssh2_sftp_seek64(s->sftp_handle, offset);
-        s->offset = offset;
-        s->offset_op_read = op_read;
-    }
-}
-
 static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
                                  int64_t offset, size_t size,
                                  QEMUIOVector *qiov)
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
 
     trace_ssh_read(offset, size);
 
-    ssh_seek(s, offset, SSH_SEEK_READ);
+    trace_ssh_seek(offset);
+    sftp_seek64(s->sftp_handle, offset);
 
     /* This keeps track of the current iovec element ('i'), where we
      * will write to next ('buf'), and the end of the current iovec
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
     buf = i->iov_base;
     end_of_vec = i->iov_base + i->iov_len;
 
-    /* libssh2 has a hard-coded limit of 2000 bytes per request,
-     * although it will also do readahead behind our backs.  Therefore
-     * we may have to do repeated reads here until we have read 'size'
-     * bytes.
-     */
     for (got = 0; got < size; ) {
+        size_t request_read_size;
     again:
-        trace_ssh_read_buf(buf, end_of_vec - buf);
-        r = libssh2_sftp_read(s->sftp_handle, buf, end_of_vec - buf);
-        trace_ssh_read_return(r);
+        /*
+         * The size of SFTP packets is limited to 32K bytes, so limit
+         * the amount of data requested to 16K, as libssh currently
+         * does not handle multiple requests on its own.
+         */
+        request_read_size = MIN(end_of_vec - buf, 16384);
+        trace_ssh_read_buf(buf, end_of_vec - buf, request_read_size);
+        r = sftp_read(s->sftp_handle, buf, request_read_size);
+        trace_ssh_read_return(r, sftp_get_error(s->sftp));
 
-        if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
+        if (r == SSH_AGAIN) {
             co_yield(s, bs);
             goto again;
         }
-        if (r < 0) {
-            sftp_error_trace(s, "read");
-            s->offset = -1;
-            return -EIO;
-        }
-        if (r == 0) {
+        if (r == SSH_EOF || (r == 0 && sftp_get_error(s->sftp) == SSH_FX_EOF)) {
             /* EOF: Short read so pad the buffer with zeroes and return it. */
             qemu_iovec_memset(qiov, got, 0, size - got);
             return 0;
         }
+        if (r <= 0) {
+            sftp_error_trace(s, "read");
+            return -EIO;
+        }
 
         got += r;
         buf += r;
-        s->offset += r;
         if (buf >= end_of_vec && got < size) {
             i++;
             buf = i->iov_base;
@@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
 
     trace_ssh_write(offset, size);
 
-    ssh_seek(s, offset, SSH_SEEK_WRITE);
+    trace_ssh_seek(offset);
+    sftp_seek64(s->sftp_handle, offset);
 
     /* This keeps track of the current iovec element ('i'), where we
      * will read from next ('buf'), and the end of the current iovec
@@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
     end_of_vec = i->iov_base + i->iov_len;
 
     for (written = 0; written < size; ) {
+        size_t request_write_size;
     again:
-        trace_ssh_write_buf(buf, end_of_vec - buf);
-        r = libssh2_sftp_write(s->sftp_handle, buf, end_of_vec - buf);
-        trace_ssh_write_return(r);
+        /*
+         * Avoid too large data packets, as libssh currently does not
+         * handle multiple requests on its own.
+         */
+        request_write_size = MIN(end_of_vec - buf, 131072);
+        trace_ssh_write_buf(buf, end_of_vec - buf, request_write_size);
+        r = sftp_write(s->sftp_handle, buf, request_write_size);
+        trace_ssh_write_return(r, sftp_get_error(s->sftp));
 
-        if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
+        if (r == SSH_AGAIN) {
             co_yield(s, bs);
             goto again;
         }
         if (r < 0) {
             sftp_error_trace(s, "write");
-            s->offset = -1;
             return -EIO;
         }
-        /* The libssh2 API is very unclear about this.  A comment in
-         * the code says "nothing was acked, and no EAGAIN was
-         * received!" which apparently means that no data got sent
-         * out, and the underlying channel didn't return any EAGAIN
-         * indication.  I think this is a bug in either libssh2 or
-         * OpenSSH (server-side).  In any case, forcing a seek (to
-         * discard libssh2 internal buffers), and then trying again
-         * works for me.
-         */
-        if (r == 0) {
-            ssh_seek(s, offset + written, SSH_SEEK_WRITE|SSH_SEEK_FORCE);
-            co_yield(s, bs);
-            goto again;
-        }
 
         written += r;
         buf += r;
-        s->offset += r;
         if (buf >= end_of_vec && written < size) {
             i++;
             buf = i->iov_base;
             end_of_vec = i->iov_base + i->iov_len;
         }
 
-        if (offset + written > s->attrs.filesize)
-            s->attrs.filesize = offset + written;
+        if (offset + written > s->attrs->size) {
+            s->attrs->size = offset + written;
+        }
     }
 
     return 0;
@@ -XXX,XX +XXX,XX @@ static void unsafe_flush_warning(BDRVSSHState *s, const char *what)
     }
 }
 
-#ifdef HAS_LIBSSH2_SFTP_FSYNC
+#ifdef HAVE_LIBSSH_0_8
 
 static coroutine_fn int ssh_flush(BDRVSSHState *s, BlockDriverState *bs)
 {
     int r;
 
     trace_ssh_flush();
+
+    if (!sftp_extension_supported(s->sftp, "fsync@openssh.com", "1")) {
+        unsafe_flush_warning(s, "OpenSSH >= 6.3");
+        return 0;
+    }
  again:
-    r = libssh2_sftp_fsync(s->sftp_handle);
-    if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
+    r = sftp_fsync(s->sftp_handle);
+    if (r == SSH_AGAIN) {
         co_yield(s, bs);
         goto again;
     }
-    if (r == LIBSSH2_ERROR_SFTP_PROTOCOL &&
-        libssh2_sftp_last_error(s->sftp) == LIBSSH2_FX_OP_UNSUPPORTED) {
-        unsafe_flush_warning(s, "OpenSSH >= 6.3");
-        return 0;
-    }
     if (r < 0) {
         sftp_error_trace(s, "fsync");
         return -EIO;
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
     return ret;
 }
 
-#else /* !HAS_LIBSSH2_SFTP_FSYNC */
+#else /* !HAVE_LIBSSH_0_8 */
 
 static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
 {
     BDRVSSHState *s = bs->opaque;
 
-    unsafe_flush_warning(s, "libssh2 >= 1.4.4");
+    unsafe_flush_warning(s, "libssh >= 0.8.0");
     return 0;
 }
 
-#endif /* !HAS_LIBSSH2_SFTP_FSYNC */
+#endif /* !HAVE_LIBSSH_0_8 */
 
 static int64_t ssh_getlength(BlockDriverState *bs)
 {
     BDRVSSHState *s = bs->opaque;
     int64_t length;
 
-    /* Note we cannot make a libssh2 call here. */
-    length = (int64_t) s->attrs.filesize;
+    /* Note we cannot make a libssh call here. */
+    length = (int64_t) s->attrs->size;
     trace_ssh_getlength(length);
 
     return length;
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn ssh_co_truncate(BlockDriverState *bs, int64_t offset,
         return -ENOTSUP;
     }
 
-    if (offset < s->attrs.filesize) {
+    if (offset < s->attrs->size) {
         error_setg(errp, "ssh driver does not support shrinking files");
         return -ENOTSUP;
     }
 
-    if (offset == s->attrs.filesize) {
+    if (offset == s->attrs->size) {
         return 0;
     }
 
@@ -XXX,XX +XXX,XX @@ static void bdrv_ssh_init(void)
 {
     int r;
 
-    r = libssh2_init(0);
+    r = ssh_init();
     if (r != 0) {
-        fprintf(stderr, "libssh2 initialization failed, %d\n", r);
+        fprintf(stderr, "libssh initialization failed, %d\n", r);
         exit(EXIT_FAILURE);
     }
 
+#if TRACE_LIBSSH != 0
+    ssh_set_log_level(TRACE_LIBSSH);
+#endif
+
     bdrv_register(&bdrv_ssh);
 }
 
diff --git a/.travis.yml b/.travis.yml
index XXXXXXX..XXXXXXX 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -XXX,XX +XXX,XX @@ addons:
       - libseccomp-dev
       - libspice-protocol-dev
       - libspice-server-dev
-      - libssh2-1-dev
+      - libssh-dev
       - liburcu-dev
       - libusb-1.0-0-dev
       - libvte-2.91-dev
@@ -XXX,XX +XXX,XX @@ matrix:
             - libseccomp-dev
             - libspice-protocol-dev
             - libspice-server-dev
-            - libssh2-1-dev
+            - libssh-dev
             - liburcu-dev
             - libusb-1.0-0-dev
             - libvte-2.91-dev
diff --git a/block/trace-events b/block/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -XXX,XX +XXX,XX @@ nbd_client_connect_success(const char *export_name) "export '%s'"
 # ssh.c
 ssh_restart_coroutine(void *co) "co=%p"
 ssh_flush(void) "fsync"
-ssh_check_host_key_knownhosts(const char *key) "host key OK: %s"
+ssh_check_host_key_knownhosts(void) "host key OK"
 ssh_connect_to_ssh(char *path, int flags, int mode) "opening file %s flags=0x%x creat_mode=0%o"
 ssh_co_yield(int sock, void *rd_handler, void *wr_handler) "s->sock=%d rd_handler=%p wr_handler=%p"
 ssh_co_yield_back(int sock) "s->sock=%d - back"
 ssh_getlength(int64_t length) "length=%" PRIi64
 ssh_co_create_opts(uint64_t size) "total_size=%" PRIu64
 ssh_read(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu"
-ssh_read_buf(void *buf, size_t size) "sftp_read buf=%p size=%zu"
-ssh_read_return(ssize_t ret) "sftp_read returned %zd"
+ssh_read_buf(void *buf, size_t size, size_t actual_size) "sftp_read buf=%p size=%zu (actual size=%zu)"
+ssh_read_return(ssize_t ret, int sftp_err) "sftp_read returned %zd (sftp error=%d)"
 ssh_write(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu"
-ssh_write_buf(void *buf, size_t size) "sftp_write buf=%p size=%zu"
-ssh_write_return(ssize_t ret) "sftp_write returned %zd"
+ssh_write_buf(void *buf, size_t size, size_t actual_size) "sftp_write buf=%p size=%zu (actual size=%zu)"
+ssh_write_return(ssize_t ret, int sftp_err) "sftp_write returned %zd (sftp error=%d)"
 ssh_seek(int64_t offset) "seeking to offset=%" PRIi64
+ssh_auth_methods(int methods) "auth methods=0x%x"
+ssh_server_status(int status) "server status=%d"
 
 # curl.c
 curl_timer_cb(long timeout_ms) "timer callback timeout_ms %ld"
@@ -XXX,XX +XXX,XX @@ sheepdog_snapshot_create(const char *sn_name, const char *id) "%s %s"
 sheepdog_snapshot_create_inode(const char *name, uint32_t snap, uint32_t vdi) "s->inode: name %s snap_id 0x%" PRIx32 " vdi 0x%" PRIx32
 
 # ssh.c
-sftp_error(const char *op, const char *ssh_err, int ssh_err_code, unsigned long sftp_err_code) "%s failed: %s (libssh2 error code: %d, sftp error code: %lu)"
+sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi
index XXXXXXX..XXXXXXX 100644
--- a/docs/qemu-block-drivers.texi
+++ b/docs/qemu-block-drivers.texi
@@ -XXX,XX +XXX,XX @@ print a warning when @code{fsync} is not supported:
 
 warning: ssh server @code{ssh.example.com:22} does not support fsync
 
-With sufficiently new versions of libssh2 and OpenSSH, @code{fsync} is
+With sufficiently new versions of libssh and OpenSSH, @code{fsync} is
 supported.
 
 @node disk_images_nvme
diff --git a/tests/docker/dockerfiles/debian-win32-cross.docker b/tests/docker/dockerfiles/debian-win32-cross.docker
index XXXXXXX..XXXXXXX 100644
--- a/tests/docker/dockerfiles/debian-win32-cross.docker
+++ b/tests/docker/dockerfiles/debian-win32-cross.docker
@@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \
         mxe-$TARGET-w64-mingw32.shared-curl \
         mxe-$TARGET-w64-mingw32.shared-glib \
         mxe-$TARGET-w64-mingw32.shared-libgcrypt \
-        mxe-$TARGET-w64-mingw32.shared-libssh2 \
         mxe-$TARGET-w64-mingw32.shared-libusb1 \
         mxe-$TARGET-w64-mingw32.shared-lzo \
         mxe-$TARGET-w64-mingw32.shared-nettle \
diff --git a/tests/docker/dockerfiles/debian-win64-cross.docker b/tests/docker/dockerfiles/debian-win64-cross.docker
index XXXXXXX..XXXXXXX 100644
--- a/tests/docker/dockerfiles/debian-win64-cross.docker
+++ b/tests/docker/dockerfiles/debian-win64-cross.docker
@@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \
         mxe-$TARGET-w64-mingw32.shared-curl \
         mxe-$TARGET-w64-mingw32.shared-glib \
         mxe-$TARGET-w64-mingw32.shared-libgcrypt \
-        mxe-$TARGET-w64-mingw32.shared-libssh2 \
         mxe-$TARGET-w64-mingw32.shared-libusb1 \
         mxe-$TARGET-w64-mingw32.shared-lzo \
         mxe-$TARGET-w64-mingw32.shared-nettle \
diff --git a/tests/docker/dockerfiles/fedora.docker b/tests/docker/dockerfiles/fedora.docker
index XXXXXXX..XXXXXXX 100644
--- a/tests/docker/dockerfiles/fedora.docker
+++ b/tests/docker/dockerfiles/fedora.docker
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
     libpng-devel \
     librbd-devel \
     libseccomp-devel \
-    libssh2-devel \
+    libssh-devel \
     libubsan \
     libusbx-devel \
     libxml2-devel \
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
     mingw32-gtk3 \
     mingw32-libjpeg-turbo \
     mingw32-libpng \
-    mingw32-libssh2 \
     mingw32-libtasn1 \
     mingw32-nettle \
     mingw32-pixman \
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
     mingw64-gtk3 \
     mingw64-libjpeg-turbo \
     mingw64-libpng \
-    mingw64-libssh2 \
     mingw64-libtasn1 \
     mingw64-nettle \
     mingw64-pixman \
diff --git a/tests/docker/dockerfiles/ubuntu.docker b/tests/docker/dockerfiles/ubuntu.docker
index XXXXXXX..XXXXXXX 100644
--- a/tests/docker/dockerfiles/ubuntu.docker
+++ b/tests/docker/dockerfiles/ubuntu.docker
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \
     libsnappy-dev \
     libspice-protocol-dev \
     libspice-server-dev \
-    libssh2-1-dev \
+    libssh-dev \
     libusb-1.0-0-dev \
     libusbredirhost-dev \
     libvdeplug-dev \
diff --git a/tests/docker/dockerfiles/ubuntu1804.docker b/tests/docker/dockerfiles/ubuntu1804.docker
index XXXXXXX..XXXXXXX 100644
--- a/tests/docker/dockerfiles/ubuntu1804.docker
+++ b/tests/docker/dockerfiles/ubuntu1804.docker
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \
     libsnappy-dev \
     libspice-protocol-dev \
     libspice-server-dev \
-    libssh2-1-dev \
+    libssh-dev \
     libusb-1.0-0-dev \
     libusbredirhost-dev \
     libvdeplug-dev \
diff --git a/tests/qemu-iotests/207 b/tests/qemu-iotests/207
index XXXXXXX..XXXXXXX 100755
--- a/tests/qemu-iotests/207
+++ b/tests/qemu-iotests/207
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
 
     iotests.img_info_log(remote_path)
 
-    md5_key = subprocess.check_output(
-        'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
-        'cut -d" " -f3 | base64 -d | md5sum -b | cut -d" " -f1',
-        shell=True).rstrip().decode('ascii')
+    keys = subprocess.check_output(
+        'ssh-keyscan 127.0.0.1 2>/dev/null | grep -v "^#" | ' +
+        'cut -d" " -f3',
+        shell=True).rstrip().decode('ascii').split('\n')
+
+    # Mappings of base64 representations to digests
+    md5_keys = {}
+    sha1_keys = {}
+
+    for key in keys:
+        md5_keys[key] = subprocess.check_output(
+            'echo %s | base64 -d | md5sum -b | cut -d" " -f1' % key,
+            shell=True).rstrip().decode('ascii')
+
+        sha1_keys[key] = subprocess.check_output(
+            'echo %s | base64 -d | sha1sum -b | cut -d" " -f1' % key,
+            shell=True).rstrip().decode('ascii')
 
     vm.launch()
+
+    # Find correct key first
+    matching_key = None
+    for key in keys:
+        result = vm.qmp('blockdev-add',
+                        driver='ssh', node_name='node0', path=disk_path,
+                        server={
+                             'host': '127.0.0.1',
+                             'port': '22',
+                        }, host_key_check={
+                             'mode': 'hash',
+                             'type': 'md5',
+                             'hash': md5_keys[key],
+                        })
+
+        if 'error' not in result:
+            vm.qmp('blockdev-del', node_name='node0')
+            matching_key = key
+            break
+
+    if matching_key is None:
+        vm.shutdown()
+        iotests.notrun('Did not find a key that fits 127.0.0.1')
+
     blockdev_create(vm, { 'driver': 'ssh',
                           'location': {
                               'path': disk_path,
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
                               'host-key-check': {
                                   'mode': 'hash',
                                   'type': 'md5',
-                                  'hash': md5_key,
+                                  'hash': md5_keys[matching_key],
                               }
                           },
                           'size': 8388608 })
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
 
     iotests.img_info_log(remote_path)
 
-    sha1_key = subprocess.check_output(
-        'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
-        'cut -d" " -f3 | base64 -d | sha1sum -b | cut -d" " -f1',
-        shell=True).rstrip().decode('ascii')
-
     vm.launch()
     blockdev_create(vm, { 'driver': 'ssh',
                           'location': {
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
                               'host-key-check': {
                                   'mode': 'hash',
                                   'type': 'sha1',
-                                  'hash': sha1_key,
+                                  'hash': sha1_keys[matching_key],
                               }
                           },
                           'size': 4194304 })
diff --git a/tests/qemu-iotests/207.out b/tests/qemu-iotests/207.out
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/207.out
+++ b/tests/qemu-iotests/207.out
@@ -XXX,XX +XXX,XX @@ virtual size: 4 MiB (4194304 bytes)
 
 {"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": {"driver": "ssh", "location": {"host-key-check": {"mode": "none"}, "path": "/this/is/not/an/existing/path", "server": {"host": "127.0.0.1", "port": "22"}}, "size": 4194304}}}
 {"return": {}}
-Job failed: failed to open remote file '/this/is/not/an/existing/path': Failed opening remote file (libssh2 error code: -31)
+Job failed: failed to open remote file '/this/is/not/an/existing/path': SFTP server: No such file (libssh error code: 1, sftp error code: 2)
 {"execute": "job-dismiss", "arguments": {"id": "job0"}}
 {"return": {}}
 
-- 
2.21.0
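
Note on the iotest 207 change above: the per-key MD5 and SHA-1 digests that the test now derives with `base64 -d | md5sum` and `base64 -d | sha1sum` can also be computed without spawning a shell. The following is only a minimal sketch of that idea, not part of the patch; `host_key_digests` and `sample_key` are hypothetical names, and the sample key is a made-up stand-in for one base64 field of `ssh-keyscan` output.

```python
import base64
import hashlib


def host_key_digests(b64_key):
    """Return (md5_hex, sha1_hex) for one base64-encoded host key blob.

    Hypothetical helper: mirrors 'base64 -d | md5sum' and
    'base64 -d | sha1sum' from the test, but stays in Python.
    """
    blob = base64.b64decode(b64_key)
    return hashlib.md5(blob).hexdigest(), hashlib.sha1(blob).hexdigest()


# Made-up stand-in for the third field of an ssh-keyscan output line.
sample_key = base64.b64encode(b"example host key blob").decode("ascii")

md5_hex, sha1_hex = host_key_digests(sample_key)
print("md5: ", md5_hex)
print("sha1:", sha1_hex)
```

Under these assumptions, the two hex strings would play the role of `md5_keys[key]` and `sha1_keys[key]` in the test, i.e. the values passed as 'hash' in the 'host-key-check' options.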