The following changes since commit f5fe7c17ac4e309e47e78f0f9761aebc8d2f2c81:

  Merge tag 'pull-tcg-20230823-2' of https://gitlab.com/rth7680/qemu into staging (2023-08-28 16:07:04 -0400)

are available in the Git repository at:

  https://gitlab.com/hreitz/qemu.git tags/pull-block-2023-09-01

for you to fetch changes up to 380448464dd89291cf7fd7434be6c225482a334d:

  tests/file-io-error: New test (2023-08-29 13:01:24 +0200)

----------------------------------------------------------------
Block patches
- Fix for file-posix's zoning code crashing on I/O errors
- Throttling refactoring

----------------------------------------------------------------
Hanna Czenczek (5):
      file-posix: Clear bs->bl.zoned on error
      file-posix: Check bs->bl.zoned for zone info
      file-posix: Fix zone update in I/O error path
      file-posix: Simplify raw_co_prw's 'out' zone code
      tests/file-io-error: New test

Zhenwei Pi (9):
      throttle: introduce enum ThrottleDirection
      test-throttle: use enum ThrottleDirection
      throttle: support read-only and write-only
      test-throttle: test read only and write only
      cryptodev: use NULL throttle timer cb for read direction
      throttle: use enum ThrottleDirection instead of bool is_write
      throttle: use THROTTLE_MAX/ARRAY_SIZE for hard code
      fsdev: Use ThrottleDirection instread of bool is_write
      block/throttle-groups: Use ThrottleDirection instread of bool is_write

 fsdev/qemu-fsdev-throttle.h                |   4 +-
 include/block/throttle-groups.h            |   6 +-
 include/qemu/throttle.h                    |  16 +-
 backends/cryptodev.c                       |  12 +-
 block/block-backend.c                      |   4 +-
 block/file-posix.c                         |  42 +++---
 block/throttle-groups.c                    | 163 +++++++++++----------
 block/throttle.c                           |   8 +-
 fsdev/qemu-fsdev-throttle.c                |  18 ++-
 hw/9pfs/cofile.c                           |   4 +-
 tests/unit/test-throttle.c                 |  76 +++++++++-
 util/throttle.c                            |  84 +++++++----
 tests/qemu-iotests/tests/file-io-error     | 119 +++++++++++++++
 tests/qemu-iotests/tests/file-io-error.out |  33 +++++
 14 files changed, 418 insertions(+), 171 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/file-io-error
 create mode 100644 tests/qemu-iotests/tests/file-io-error.out

--
2.41.0

The following changes since commit 474f3938d79ab36b9231c9ad3b5a9314c2aeacde:

  Merge remote-tracking branch 'remotes/amarkovic/tags/mips-queue-jun-21-2019' into staging (2019-06-21 15:40:50 +0100)

are available in the Git repository at:

  https://github.com/XanClic/qemu.git tags/pull-block-2019-06-24

for you to fetch changes up to ab5d4a30f7f3803ca5106b370969c1b7b54136f8:

  iotests: Fix 205 for concurrent runs (2019-06-24 16:01:40 +0200)

----------------------------------------------------------------
Block patches:
- The SSH block driver now uses libssh instead of libssh2
- The VMDK block driver gets read-only support for the seSparse
  subformat
- Various fixes

---

v2:
- Squashed Pino's fix for pre-0.8 libssh into the libssh patch

----------------------------------------------------------------
Anton Nefedov (1):
      iotest 134: test cluster-misaligned encrypted write

Klaus Birkelund Jensen (1):
      nvme: do not advertise support for unsupported arbitration mechanism

Max Reitz (1):
      iotests: Fix 205 for concurrent runs

Pino Toscano (1):
      ssh: switch from libssh2 to libssh

Sam Eiderman (3):
      vmdk: Fix comment regarding max l1_size coverage
      vmdk: Reduce the max bound for L1 table size
      vmdk: Add read-only support for seSparse snapshots

Vladimir Sementsov-Ogievskiy (1):
      blockdev: enable non-root nodes for transaction drive-backup source

 configure                                  |  65 +-
 block/Makefile.objs                        |   6 +-
 block/ssh.c                                | 652 ++++++++++--------
 block/vmdk.c                               | 372 +++++++++-
 blockdev.c                                 |   2 +-
 hw/block/nvme.c                            |   1 -
 .travis.yml                                |   4 +-
 block/trace-events                         |  14 +-
 docs/qemu-block-drivers.texi               |   2 +-
 .../dockerfiles/debian-win32-cross.docker  |   1 -
 .../dockerfiles/debian-win64-cross.docker  |   1 -
 tests/docker/dockerfiles/fedora.docker     |   4 +-
 tests/docker/dockerfiles/ubuntu.docker     |   2 +-
 tests/docker/dockerfiles/ubuntu1804.docker |   2 +-
 tests/qemu-iotests/059.out                 |   2 +-
 tests/qemu-iotests/134                     |   9 +
 tests/qemu-iotests/134.out                 |  10 +
 tests/qemu-iotests/205                     |   2 +-
 tests/qemu-iotests/207                     |  54 +-
 tests/qemu-iotests/207.out                 |   2 +-
 20 files changed, 823 insertions(+), 384 deletions(-)

--
2.21.0
Deleted patch
1
From: zhenwei pi <pizhenwei@bytedance.com>
2
1
3
Use enum ThrottleDirection instead of number index.
4
5
Reviewed-by: Alberto Garcia <berto@igalia.com>
6
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
7
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
8
Message-Id: <20230728022006.1098509-2-pizhenwei@bytedance.com>
9
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
10
---
11
include/qemu/throttle.h | 11 ++++++++---
12
util/throttle.c | 16 +++++++++-------
13
2 files changed, 17 insertions(+), 10 deletions(-)
14
15
diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
16
index XXXXXXX..XXXXXXX 100644
17
--- a/include/qemu/throttle.h
18
+++ b/include/qemu/throttle.h
19
@@ -XXX,XX +XXX,XX @@ typedef struct ThrottleState {
20
int64_t previous_leak; /* timestamp of the last leak done */
21
} ThrottleState;
22
23
+typedef enum {
24
+ THROTTLE_READ = 0,
25
+ THROTTLE_WRITE,
26
+ THROTTLE_MAX
27
+} ThrottleDirection;
28
+
29
typedef struct ThrottleTimers {
30
- QEMUTimer *timers[2]; /* timers used to do the throttling */
31
+ QEMUTimer *timers[THROTTLE_MAX]; /* timers used to do the throttling */
32
QEMUClockType clock_type; /* the clock used */
33
34
/* Callbacks */
35
- QEMUTimerCB *read_timer_cb;
36
- QEMUTimerCB *write_timer_cb;
37
+ QEMUTimerCB *timer_cb[THROTTLE_MAX];
38
void *timer_opaque;
39
} ThrottleTimers;
40
41
diff --git a/util/throttle.c b/util/throttle.c
42
index XXXXXXX..XXXXXXX 100644
43
--- a/util/throttle.c
44
+++ b/util/throttle.c
45
@@ -XXX,XX +XXX,XX @@ static bool throttle_compute_timer(ThrottleState *ts,
46
void throttle_timers_attach_aio_context(ThrottleTimers *tt,
47
AioContext *new_context)
48
{
49
- tt->timers[0] = aio_timer_new(new_context, tt->clock_type, SCALE_NS,
50
- tt->read_timer_cb, tt->timer_opaque);
51
- tt->timers[1] = aio_timer_new(new_context, tt->clock_type, SCALE_NS,
52
- tt->write_timer_cb, tt->timer_opaque);
53
+ tt->timers[THROTTLE_READ] =
54
+ aio_timer_new(new_context, tt->clock_type, SCALE_NS,
55
+ tt->timer_cb[THROTTLE_READ], tt->timer_opaque);
56
+ tt->timers[THROTTLE_WRITE] =
57
+ aio_timer_new(new_context, tt->clock_type, SCALE_NS,
58
+ tt->timer_cb[THROTTLE_WRITE], tt->timer_opaque);
59
}
60
61
/*
62
@@ -XXX,XX +XXX,XX @@ void throttle_timers_init(ThrottleTimers *tt,
63
memset(tt, 0, sizeof(ThrottleTimers));
64
65
tt->clock_type = clock_type;
66
- tt->read_timer_cb = read_timer_cb;
67
- tt->write_timer_cb = write_timer_cb;
68
+ tt->timer_cb[THROTTLE_READ] = read_timer_cb;
69
+ tt->timer_cb[THROTTLE_WRITE] = write_timer_cb;
70
tt->timer_opaque = timer_opaque;
71
throttle_timers_attach_aio_context(tt, aio_context);
72
}
73
@@ -XXX,XX +XXX,XX @@ void throttle_timers_detach_aio_context(ThrottleTimers *tt)
74
{
75
int i;
76
77
- for (i = 0; i < 2; i++) {
78
+ for (i = 0; i < THROTTLE_MAX; i++) {
79
throttle_timer_destroy(&tt->timers[i]);
80
}
81
}
82
--
83
2.41.0
Deleted patch
1
From: zhenwei pi <pizhenwei@bytedance.com>
2
1
3
Use enum ThrottleDirection instead in the throttle test codes.
4
5
Reviewed-by: Alberto Garcia <berto@igalia.com>
6
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
7
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
8
Message-Id: <20230728022006.1098509-3-pizhenwei@bytedance.com>
9
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
10
---
11
tests/unit/test-throttle.c | 6 +++---
12
1 file changed, 3 insertions(+), 3 deletions(-)
13
14
diff --git a/tests/unit/test-throttle.c b/tests/unit/test-throttle.c
15
index XXXXXXX..XXXXXXX 100644
16
--- a/tests/unit/test-throttle.c
17
+++ b/tests/unit/test-throttle.c
18
@@ -XXX,XX +XXX,XX @@ static void test_init(void)
19
20
/* check initialized fields */
21
g_assert(tt->clock_type == QEMU_CLOCK_VIRTUAL);
22
- g_assert(tt->timers[0]);
23
- g_assert(tt->timers[1]);
24
+ g_assert(tt->timers[THROTTLE_READ]);
25
+ g_assert(tt->timers[THROTTLE_WRITE]);
26
27
/* check other fields where cleared */
28
g_assert(!ts.previous_leak);
29
@@ -XXX,XX +XXX,XX @@ static void test_destroy(void)
30
throttle_timers_init(tt, ctx, QEMU_CLOCK_VIRTUAL,
31
read_timer_cb, write_timer_cb, &ts);
32
throttle_timers_destroy(tt);
33
- for (i = 0; i < 2; i++) {
34
+ for (i = 0; i < THROTTLE_MAX; i++) {
35
g_assert(!tt->timers[i]);
36
}
37
}
38
--
39
2.41.0
1
We duplicate the same condition three times here, pull it out to the top
1
From: Klaus Birkelund Jensen <klaus@birkelund.eu>
2
level.
3
2
4
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
3
The device mistakenly reports that the Weighted Round Robin with Urgent
5
Message-Id: <20230824155345.109765-5-hreitz@redhat.com>
4
Priority Class arbitration mechanism is supported.
6
Reviewed-by: Sam Li <faithilikerun@gmail.com>
5
6
It is not.
7
8
Signed-off-by: Klaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
9
Message-id: 20190606092530.14206-1-klaus@birkelund.eu
10
Acked-by: Maxim Levitsky <mlevitsk@redhat.com>
11
Signed-off-by: Max Reitz <mreitz@redhat.com>
7
---
12
---
8
block/file-posix.c | 18 +++++-------------
13
hw/block/nvme.c | 1 -
9
1 file changed, 5 insertions(+), 13 deletions(-)
14
1 file changed, 1 deletion(-)
10
15
11
diff --git a/block/file-posix.c b/block/file-posix.c
16
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
12
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
13
--- a/block/file-posix.c
18
--- a/hw/block/nvme.c
14
+++ b/block/file-posix.c
19
+++ b/hw/block/nvme.c
15
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
20
@@ -XXX,XX +XXX,XX @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
16
21
n->bar.cap = 0;
17
out:
22
NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
18
#if defined(CONFIG_BLKZONED)
23
NVME_CAP_SET_CQR(n->bar.cap, 1);
19
-{
24
- NVME_CAP_SET_AMS(n->bar.cap, 1);
20
- BlockZoneWps *wps = bs->wps;
25
NVME_CAP_SET_TO(n->bar.cap, 0xf);
21
- if (ret == 0) {
26
NVME_CAP_SET_CSS(n->bar.cap, 1);
22
- if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
27
NVME_CAP_SET_MPSMAX(n->bar.cap, 4);
23
- bs->bl.zoned != BLK_Z_NONE) {
24
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
25
+ bs->bl.zoned != BLK_Z_NONE) {
26
+ BlockZoneWps *wps = bs->wps;
27
+ if (ret == 0) {
28
uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
29
if (!BDRV_ZT_IS_CONV(*wp)) {
30
if (type & QEMU_AIO_ZONE_APPEND) {
31
@@ -XXX,XX +XXX,XX @@ out:
32
*wp = offset + bytes;
33
}
34
}
35
- }
36
- } else {
37
- if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
38
- bs->bl.zoned != BLK_Z_NONE) {
39
+ } else {
40
update_zones_wp(bs, s->fd, 0, 1);
41
}
42
- }
43
44
- if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
45
- bs->blk.zoned != BLK_Z_NONE) {
46
qemu_co_mutex_unlock(&wps->colock);
47
}
48
-}
49
#endif
50
return ret;
51
}
52
--
28
--
53
2.41.0
29
2.21.0
30
31
1
From: zhenwei pi <pizhenwei@bytedance.com>
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
2
3
'bool is_write' style is obsolete from throttle framework, adapt
3
We forget to enable it for transaction .prepare, while it is already
4
block throttle groups to the new style:
4
enabled in do_drive_backup since commit a2d665c1bc362
5
- use ThrottleDirection instead of 'bool is_write'. Ex,
5
"blockdev: loosen restrictions on drive-backup source node"
6
schedule_next_request(ThrottleGroupMember *tgm, bool is_write)
7
-> schedule_next_request(ThrottleGroupMember *tgm, ThrottleDirection direction)
8
6
9
- use THROTTLE_MAX instead of hard code. Ex, ThrottleGroupMember *tokens[2]
7
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
10
-> ThrottleGroupMember *tokens[THROTTLE_MAX]
8
Message-id: 20190618140804.59214-1-vsementsov@virtuozzo.com
9
Reviewed-by: John Snow <jsnow@redhat.com>
10
Signed-off-by: Max Reitz <mreitz@redhat.com>
11
---
12
blockdev.c | 2 +-
13
1 file changed, 1 insertion(+), 1 deletion(-)
11
14
12
- use ThrottleDirection instead of hard code on iteration. Ex, (i = 0; i < 2; i++)
15
diff --git a/blockdev.c b/blockdev.c
13
-> for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++)
14
15
Use a simple python script to test the new style:
#!/usr/bin/python3
import subprocess
import random
import time

commands = ['virsh blkdeviotune jammy vda --write-bytes-sec ', \
            'virsh blkdeviotune jammy vda --write-iops-sec ', \
            'virsh blkdeviotune jammy vda --read-bytes-sec ', \
            'virsh blkdeviotune jammy vda --read-iops-sec ']

for loop in range(1, 1000):
    time.sleep(random.randrange(3, 5))
    command = commands[random.randrange(0, 3)] + str(random.randrange(0, 1000000))
    subprocess.run(command, shell=True, check=True)

This works fine.
32
33
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
34
Message-Id: <20230728022006.1098509-10-pizhenwei@bytedance.com>
35
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
36
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
37
---
38
include/block/throttle-groups.h | 6 +-
39
block/block-backend.c | 4 +-
40
block/throttle-groups.c | 161 ++++++++++++++++----------------
41
block/throttle.c | 8 +-
42
4 files changed, 90 insertions(+), 89 deletions(-)
43
44
diff --git a/include/block/throttle-groups.h b/include/block/throttle-groups.h
45
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
46
--- a/include/block/throttle-groups.h
17
--- a/blockdev.c
47
+++ b/include/block/throttle-groups.h
18
+++ b/blockdev.c
48
@@ -XXX,XX +XXX,XX @@ typedef struct ThrottleGroupMember {
19
@@ -XXX,XX +XXX,XX @@ static void drive_backup_prepare(BlkActionState *common, Error **errp)
49
AioContext *aio_context;
20
assert(common->action->type == TRANSACTION_ACTION_KIND_DRIVE_BACKUP);
50
/* throttled_reqs_lock protects the CoQueues for throttled requests. */
21
backup = common->action->u.drive_backup.data;
51
CoMutex throttled_reqs_lock;
22
52
- CoQueue throttled_reqs[2];
23
- bs = qmp_get_root_bs(backup->device, errp);
53
+ CoQueue throttled_reqs[THROTTLE_MAX];
24
+ bs = bdrv_lookup_bs(backup->device, backup->device, errp);
54
25
if (!bs) {
55
/* Nonzero if the I/O limits are currently being ignored; generally
56
* it is zero. Accessed with atomic operations.
57
@@ -XXX,XX +XXX,XX @@ typedef struct ThrottleGroupMember {
58
* throttle_state tells us if I/O limits are configured. */
59
ThrottleState *throttle_state;
60
ThrottleTimers throttle_timers;
61
- unsigned pending_reqs[2];
62
+ unsigned pending_reqs[THROTTLE_MAX];
63
QLIST_ENTRY(ThrottleGroupMember) round_robin;
64
65
} ThrottleGroupMember;
66
@@ -XXX,XX +XXX,XX @@ void throttle_group_restart_tgm(ThrottleGroupMember *tgm);
67
68
void coroutine_fn throttle_group_co_io_limits_intercept(ThrottleGroupMember *tgm,
69
int64_t bytes,
70
- bool is_write);
71
+ ThrottleDirection direction);
72
void throttle_group_attach_aio_context(ThrottleGroupMember *tgm,
73
AioContext *new_context);
74
void throttle_group_detach_aio_context(ThrottleGroupMember *tgm);
75
diff --git a/block/block-backend.c b/block/block-backend.c
76
index XXXXXXX..XXXXXXX 100644
77
--- a/block/block-backend.c
78
+++ b/block/block-backend.c
79
@@ -XXX,XX +XXX,XX @@ blk_co_do_preadv_part(BlockBackend *blk, int64_t offset, int64_t bytes,
80
/* throttling disk I/O */
81
if (blk->public.throttle_group_member.throttle_state) {
82
throttle_group_co_io_limits_intercept(&blk->public.throttle_group_member,
83
- bytes, false);
84
+ bytes, THROTTLE_READ);
85
}
86
87
ret = bdrv_co_preadv_part(blk->root, offset, bytes, qiov, qiov_offset,
88
@@ -XXX,XX +XXX,XX @@ blk_co_do_pwritev_part(BlockBackend *blk, int64_t offset, int64_t bytes,
89
/* throttling disk I/O */
90
if (blk->public.throttle_group_member.throttle_state) {
91
throttle_group_co_io_limits_intercept(&blk->public.throttle_group_member,
92
- bytes, true);
93
+ bytes, THROTTLE_WRITE);
94
}
95
96
if (!blk->enable_write_cache) {
97
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
98
index XXXXXXX..XXXXXXX 100644
99
--- a/block/throttle-groups.c
100
+++ b/block/throttle-groups.c
101
@@ -XXX,XX +XXX,XX @@
102
103
static void throttle_group_obj_init(Object *obj);
104
static void throttle_group_obj_complete(UserCreatable *obj, Error **errp);
105
-static void timer_cb(ThrottleGroupMember *tgm, bool is_write);
106
+static void timer_cb(ThrottleGroupMember *tgm, ThrottleDirection direction);
107
108
/* The ThrottleGroup structure (with its ThrottleState) is shared
109
* among different ThrottleGroupMembers and it's independent from
110
@@ -XXX,XX +XXX,XX @@ struct ThrottleGroup {
111
QemuMutex lock; /* This lock protects the following four fields */
112
ThrottleState ts;
113
QLIST_HEAD(, ThrottleGroupMember) head;
114
- ThrottleGroupMember *tokens[2];
115
- bool any_timer_armed[2];
116
+ ThrottleGroupMember *tokens[THROTTLE_MAX];
117
+ bool any_timer_armed[THROTTLE_MAX];
118
QEMUClockType clock_type;
119
120
/* This field is protected by the global QEMU mutex */
121
@@ -XXX,XX +XXX,XX @@ static ThrottleGroupMember *throttle_group_next_tgm(ThrottleGroupMember *tgm)
122
* This assumes that tg->lock is held.
123
*
124
* @tgm: the ThrottleGroupMember
125
- * @is_write: the type of operation (read/write)
126
+ * @direction: the ThrottleDirection
127
* @ret: whether the ThrottleGroupMember has pending requests.
128
*/
129
static inline bool tgm_has_pending_reqs(ThrottleGroupMember *tgm,
130
- bool is_write)
131
+ ThrottleDirection direction)
132
{
133
- return tgm->pending_reqs[is_write];
134
+ return tgm->pending_reqs[direction];
135
}
136
137
/* Return the next ThrottleGroupMember in the round-robin sequence with pending
138
@@ -XXX,XX +XXX,XX @@ static inline bool tgm_has_pending_reqs(ThrottleGroupMember *tgm,
139
* This assumes that tg->lock is held.
140
*
141
* @tgm: the current ThrottleGroupMember
142
- * @is_write: the type of operation (read/write)
143
+ * @direction: the ThrottleDirection
144
* @ret: the next ThrottleGroupMember with pending requests, or tgm if
145
* there is none.
146
*/
147
static ThrottleGroupMember *next_throttle_token(ThrottleGroupMember *tgm,
148
- bool is_write)
149
+ ThrottleDirection direction)
150
{
151
ThrottleState *ts = tgm->throttle_state;
152
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
153
@@ -XXX,XX +XXX,XX @@ static ThrottleGroupMember *next_throttle_token(ThrottleGroupMember *tgm,
154
* it's being drained. Skip the round-robin search and return tgm
155
* immediately if it has pending requests. Otherwise we could be
156
* forcing it to wait for other member's throttled requests. */
157
- if (tgm_has_pending_reqs(tgm, is_write) &&
158
+ if (tgm_has_pending_reqs(tgm, direction) &&
159
qatomic_read(&tgm->io_limits_disabled)) {
160
return tgm;
161
}
162
163
- start = token = tg->tokens[is_write];
164
+ start = token = tg->tokens[direction];
165
166
/* get next bs round in round robin style */
167
token = throttle_group_next_tgm(token);
168
- while (token != start && !tgm_has_pending_reqs(token, is_write)) {
169
+ while (token != start && !tgm_has_pending_reqs(token, direction)) {
170
token = throttle_group_next_tgm(token);
171
}
172
173
@@ -XXX,XX +XXX,XX @@ static ThrottleGroupMember *next_throttle_token(ThrottleGroupMember *tgm,
174
* then decide the token is the current tgm because chances are
175
* the current tgm got the current request queued.
176
*/
177
- if (token == start && !tgm_has_pending_reqs(token, is_write)) {
178
+ if (token == start && !tgm_has_pending_reqs(token, direction)) {
179
token = tgm;
180
}
181
182
/* Either we return the original TGM, or one with pending requests */
183
- assert(token == tgm || tgm_has_pending_reqs(token, is_write));
184
+ assert(token == tgm || tgm_has_pending_reqs(token, direction));
185
186
return token;
187
}
188
@@ -XXX,XX +XXX,XX @@ static ThrottleGroupMember *next_throttle_token(ThrottleGroupMember *tgm,
189
* This assumes that tg->lock is held.
190
*
191
* @tgm: the current ThrottleGroupMember
192
- * @is_write: the type of operation (read/write)
193
+ * @direction: the ThrottleDirection
194
* @ret: whether the I/O request needs to be throttled or not
195
*/
196
static bool throttle_group_schedule_timer(ThrottleGroupMember *tgm,
197
- bool is_write)
198
+ ThrottleDirection direction)
199
{
200
ThrottleState *ts = tgm->throttle_state;
201
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
202
ThrottleTimers *tt = &tgm->throttle_timers;
203
- ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ;
204
bool must_wait;
205
206
if (qatomic_read(&tgm->io_limits_disabled)) {
207
@@ -XXX,XX +XXX,XX @@ static bool throttle_group_schedule_timer(ThrottleGroupMember *tgm,
208
}
209
210
/* Check if any of the timers in this group is already armed */
211
- if (tg->any_timer_armed[is_write]) {
212
+ if (tg->any_timer_armed[direction]) {
213
return true;
214
}
215
216
@@ -XXX,XX +XXX,XX @@ static bool throttle_group_schedule_timer(ThrottleGroupMember *tgm,
217
218
/* If a timer just got armed, set tgm as the current token */
219
if (must_wait) {
220
- tg->tokens[is_write] = tgm;
221
- tg->any_timer_armed[is_write] = true;
222
+ tg->tokens[direction] = tgm;
223
+ tg->any_timer_armed[direction] = true;
224
}
225
226
return must_wait;
227
@@ -XXX,XX +XXX,XX @@ static bool throttle_group_schedule_timer(ThrottleGroupMember *tgm,
228
* any request was actually pending.
229
*
230
* @tgm: the current ThrottleGroupMember
231
- * @is_write: the type of operation (read/write)
232
+ * @direction: the ThrottleDirection
233
*/
234
static bool coroutine_fn throttle_group_co_restart_queue(ThrottleGroupMember *tgm,
235
- bool is_write)
236
+ ThrottleDirection direction)
237
{
238
bool ret;
239
240
qemu_co_mutex_lock(&tgm->throttled_reqs_lock);
241
- ret = qemu_co_queue_next(&tgm->throttled_reqs[is_write]);
242
+ ret = qemu_co_queue_next(&tgm->throttled_reqs[direction]);
243
qemu_co_mutex_unlock(&tgm->throttled_reqs_lock);
244
245
return ret;
246
@@ -XXX,XX +XXX,XX @@ static bool coroutine_fn throttle_group_co_restart_queue(ThrottleGroupMember *tg
247
* This assumes that tg->lock is held.
248
*
249
* @tgm: the current ThrottleGroupMember
250
- * @is_write: the type of operation (read/write)
251
+ * @direction: the ThrottleDirection
252
*/
253
-static void schedule_next_request(ThrottleGroupMember *tgm, bool is_write)
254
+static void schedule_next_request(ThrottleGroupMember *tgm,
255
+ ThrottleDirection direction)
256
{
257
ThrottleState *ts = tgm->throttle_state;
258
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
259
@@ -XXX,XX +XXX,XX @@ static void schedule_next_request(ThrottleGroupMember *tgm, bool is_write)
260
ThrottleGroupMember *token;
261
262
/* Check if there's any pending request to schedule next */
263
- token = next_throttle_token(tgm, is_write);
264
- if (!tgm_has_pending_reqs(token, is_write)) {
265
+ token = next_throttle_token(tgm, direction);
266
+ if (!tgm_has_pending_reqs(token, direction)) {
267
return;
26
return;
268
}
27
}
269
270
/* Set a timer for the request if it needs to be throttled */
271
- must_wait = throttle_group_schedule_timer(token, is_write);
272
+ must_wait = throttle_group_schedule_timer(token, direction);
273
274
/* If it doesn't have to wait, queue it for immediate execution */
275
if (!must_wait) {
276
/* Give preference to requests from the current tgm */
277
if (qemu_in_coroutine() &&
278
- throttle_group_co_restart_queue(tgm, is_write)) {
279
+ throttle_group_co_restart_queue(tgm, direction)) {
280
token = tgm;
281
} else {
282
ThrottleTimers *tt = &token->throttle_timers;
283
int64_t now = qemu_clock_get_ns(tg->clock_type);
284
- timer_mod(tt->timers[is_write], now);
285
- tg->any_timer_armed[is_write] = true;
286
+ timer_mod(tt->timers[direction], now);
287
+ tg->any_timer_armed[direction] = true;
288
}
289
- tg->tokens[is_write] = token;
290
+ tg->tokens[direction] = token;
291
}
292
}
293
294
@@ -XXX,XX +XXX,XX @@ static void schedule_next_request(ThrottleGroupMember *tgm, bool is_write)
295
*
296
* @tgm: the current ThrottleGroupMember
297
* @bytes: the number of bytes for this I/O
298
- * @is_write: the type of operation (read/write)
299
+ * @direction: the ThrottleDirection
300
*/
301
void coroutine_fn throttle_group_co_io_limits_intercept(ThrottleGroupMember *tgm,
302
int64_t bytes,
303
- bool is_write)
304
+ ThrottleDirection direction)
305
{
306
bool must_wait;
307
ThrottleGroupMember *token;
308
ThrottleGroup *tg = container_of(tgm->throttle_state, ThrottleGroup, ts);
309
- ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ;
310
311
assert(bytes >= 0);
312
+ assert(direction < THROTTLE_MAX);
313
314
qemu_mutex_lock(&tg->lock);
315
316
/* First we check if this I/O has to be throttled. */
317
- token = next_throttle_token(tgm, is_write);
318
- must_wait = throttle_group_schedule_timer(token, is_write);
319
+ token = next_throttle_token(tgm, direction);
320
+ must_wait = throttle_group_schedule_timer(token, direction);
321
322
/* Wait if there's a timer set or queued requests of this type */
323
- if (must_wait || tgm->pending_reqs[is_write]) {
324
- tgm->pending_reqs[is_write]++;
325
+ if (must_wait || tgm->pending_reqs[direction]) {
326
+ tgm->pending_reqs[direction]++;
327
qemu_mutex_unlock(&tg->lock);
328
qemu_co_mutex_lock(&tgm->throttled_reqs_lock);
329
- qemu_co_queue_wait(&tgm->throttled_reqs[is_write],
330
+ qemu_co_queue_wait(&tgm->throttled_reqs[direction],
331
&tgm->throttled_reqs_lock);
332
qemu_co_mutex_unlock(&tgm->throttled_reqs_lock);
333
qemu_mutex_lock(&tg->lock);
334
- tgm->pending_reqs[is_write]--;
335
+ tgm->pending_reqs[direction]--;
336
}
337
338
/* The I/O will be executed, so do the accounting */
339
throttle_account(tgm->throttle_state, direction, bytes);
340
341
/* Schedule the next request */
342
- schedule_next_request(tgm, is_write);
343
+ schedule_next_request(tgm, direction);
344
345
qemu_mutex_unlock(&tg->lock);
346
}
347
348
typedef struct {
349
ThrottleGroupMember *tgm;
350
- bool is_write;
351
+ ThrottleDirection direction;
352
} RestartData;
353
354
static void coroutine_fn throttle_group_restart_queue_entry(void *opaque)
355
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn throttle_group_restart_queue_entry(void *opaque)
356
ThrottleGroupMember *tgm = data->tgm;
357
ThrottleState *ts = tgm->throttle_state;
358
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
359
- bool is_write = data->is_write;
360
+ ThrottleDirection direction = data->direction;
361
bool empty_queue;
362
363
- empty_queue = !throttle_group_co_restart_queue(tgm, is_write);
364
+ empty_queue = !throttle_group_co_restart_queue(tgm, direction);
365
366
/* If the request queue was empty then we have to take care of
367
* scheduling the next one */
368
if (empty_queue) {
369
qemu_mutex_lock(&tg->lock);
370
- schedule_next_request(tgm, is_write);
371
+ schedule_next_request(tgm, direction);
372
qemu_mutex_unlock(&tg->lock);
373
}
374
375
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn throttle_group_restart_queue_entry(void *opaque)
376
aio_wait_kick();
377
}
378
379
-static void throttle_group_restart_queue(ThrottleGroupMember *tgm, bool is_write)
380
+static void throttle_group_restart_queue(ThrottleGroupMember *tgm,
381
+ ThrottleDirection direction)
382
{
383
Coroutine *co;
384
RestartData *rd = g_new0(RestartData, 1);
385
386
rd->tgm = tgm;
387
- rd->is_write = is_write;
388
+ rd->direction = direction;
389
390
/* This function is called when a timer is fired or when
391
* throttle_group_restart_tgm() is called. Either way, there can
392
* be no timer pending on this tgm at this point */
393
- assert(!timer_pending(tgm->throttle_timers.timers[is_write]));
394
+ assert(!timer_pending(tgm->throttle_timers.timers[direction]));
395
396
qatomic_inc(&tgm->restart_pending);
397
398
@@ -XXX,XX +XXX,XX @@ static void throttle_group_restart_queue(ThrottleGroupMember *tgm, bool is_write
399
400
void throttle_group_restart_tgm(ThrottleGroupMember *tgm)
401
{
402
- int i;
403
+ ThrottleDirection dir;
404
405
if (tgm->throttle_state) {
406
- for (i = 0; i < 2; i++) {
407
- QEMUTimer *t = tgm->throttle_timers.timers[i];
408
+ for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) {
409
+ QEMUTimer *t = tgm->throttle_timers.timers[dir];
410
if (timer_pending(t)) {
411
/* If there's a pending timer on this tgm, fire it now */
412
timer_del(t);
413
- timer_cb(tgm, i);
414
+ timer_cb(tgm, dir);
415
} else {
416
/* Else run the next request from the queue manually */
417
- throttle_group_restart_queue(tgm, i);
418
+ throttle_group_restart_queue(tgm, dir);
419
}
420
}
421
}
422
@@ -XXX,XX +XXX,XX @@ void throttle_group_get_config(ThrottleGroupMember *tgm, ThrottleConfig *cfg)
423
* because it had been throttled.
424
*
425
* @tgm: the ThrottleGroupMember whose request had been throttled
426
- * @is_write: the type of operation (read/write)
427
+ * @direction: the ThrottleDirection
428
*/
429
-static void timer_cb(ThrottleGroupMember *tgm, bool is_write)
430
+static void timer_cb(ThrottleGroupMember *tgm, ThrottleDirection direction)
431
{
432
ThrottleState *ts = tgm->throttle_state;
433
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
434
435
/* The timer has just been fired, so we can update the flag */
436
qemu_mutex_lock(&tg->lock);
437
- tg->any_timer_armed[is_write] = false;
438
+ tg->any_timer_armed[direction] = false;
439
qemu_mutex_unlock(&tg->lock);
440
441
/* Run the request that was waiting for this timer */
442
- throttle_group_restart_queue(tgm, is_write);
443
+ throttle_group_restart_queue(tgm, direction);
444
}
445
446
static void read_timer_cb(void *opaque)
447
{
448
- timer_cb(opaque, false);
449
+ timer_cb(opaque, THROTTLE_READ);
450
}
451
452
static void write_timer_cb(void *opaque)
453
{
454
- timer_cb(opaque, true);
455
+ timer_cb(opaque, THROTTLE_WRITE);
456
}
457
458
/* Register a ThrottleGroupMember from the throttling group, also initializing
459
@@ -XXX,XX +XXX,XX @@ void throttle_group_register_tgm(ThrottleGroupMember *tgm,
460
const char *groupname,
461
AioContext *ctx)
462
{
463
- int i;
464
+ ThrottleDirection dir;
465
ThrottleState *ts = throttle_group_incref(groupname);
466
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
467
468
@@ -XXX,XX +XXX,XX @@ void throttle_group_register_tgm(ThrottleGroupMember *tgm,
469
470
QEMU_LOCK_GUARD(&tg->lock);
471
/* If the ThrottleGroup is new set this ThrottleGroupMember as the token */
472
- for (i = 0; i < 2; i++) {
473
- if (!tg->tokens[i]) {
474
- tg->tokens[i] = tgm;
475
+ for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) {
476
+ if (!tg->tokens[dir]) {
477
+ tg->tokens[dir] = tgm;
478
}
479
+ qemu_co_queue_init(&tgm->throttled_reqs[dir]);
480
}
481
482
QLIST_INSERT_HEAD(&tg->head, tgm, round_robin);
483
@@ -XXX,XX +XXX,XX @@ void throttle_group_register_tgm(ThrottleGroupMember *tgm,
484
write_timer_cb,
485
tgm);
486
qemu_co_mutex_init(&tgm->throttled_reqs_lock);
487
- qemu_co_queue_init(&tgm->throttled_reqs[0]);
488
- qemu_co_queue_init(&tgm->throttled_reqs[1]);
489
}
490
491
/* Unregister a ThrottleGroupMember from its group, removing it from the list,
492
@@ -XXX,XX +XXX,XX @@ void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
493
ThrottleState *ts = tgm->throttle_state;
494
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
495
ThrottleGroupMember *token;
496
- int i;
497
+ ThrottleDirection dir;
498
499
if (!ts) {
500
/* Discard already unregistered tgm */
501
@@ -XXX,XX +XXX,XX @@ void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
502
AIO_WAIT_WHILE(tgm->aio_context, qatomic_read(&tgm->restart_pending) > 0);
503
504
WITH_QEMU_LOCK_GUARD(&tg->lock) {
505
- for (i = 0; i < 2; i++) {
506
- assert(tgm->pending_reqs[i] == 0);
507
- assert(qemu_co_queue_empty(&tgm->throttled_reqs[i]));
508
- assert(!timer_pending(tgm->throttle_timers.timers[i]));
509
- if (tg->tokens[i] == tgm) {
510
+ for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) {
511
+ assert(tgm->pending_reqs[dir] == 0);
512
+ assert(qemu_co_queue_empty(&tgm->throttled_reqs[dir]));
513
+ assert(!timer_pending(tgm->throttle_timers.timers[dir]));
514
+ if (tg->tokens[dir] == tgm) {
515
token = throttle_group_next_tgm(tgm);
516
/* Take care of the case where this is the last tgm in the group */
517
if (token == tgm) {
518
token = NULL;
519
}
520
- tg->tokens[i] = token;
521
+ tg->tokens[dir] = token;
522
}
523
}
524
525
@@ -XXX,XX +XXX,XX @@ void throttle_group_detach_aio_context(ThrottleGroupMember *tgm)
526
{
527
ThrottleGroup *tg = container_of(tgm->throttle_state, ThrottleGroup, ts);
528
ThrottleTimers *tt = &tgm->throttle_timers;
529
- int i;
530
+ ThrottleDirection dir;
531
532
/* Requests must have been drained */
533
- assert(tgm->pending_reqs[0] == 0 && tgm->pending_reqs[1] == 0);
534
- assert(qemu_co_queue_empty(&tgm->throttled_reqs[0]));
535
- assert(qemu_co_queue_empty(&tgm->throttled_reqs[1]));
536
+ for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) {
537
+ assert(tgm->pending_reqs[dir] == 0);
538
+ assert(qemu_co_queue_empty(&tgm->throttled_reqs[dir]));
539
+ }
540
541
/* Kick off next ThrottleGroupMember, if necessary */
542
WITH_QEMU_LOCK_GUARD(&tg->lock) {
543
- for (i = 0; i < 2; i++) {
544
- if (timer_pending(tt->timers[i])) {
545
- tg->any_timer_armed[i] = false;
546
- schedule_next_request(tgm, i);
547
+ for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) {
548
+ if (timer_pending(tt->timers[dir])) {
549
+ tg->any_timer_armed[dir] = false;
550
+ schedule_next_request(tgm, dir);
551
}
552
}
553
}
554
diff --git a/block/throttle.c b/block/throttle.c
555
index XXXXXXX..XXXXXXX 100644
556
--- a/block/throttle.c
557
+++ b/block/throttle.c
558
@@ -XXX,XX +XXX,XX @@ throttle_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
559
{
560
561
ThrottleGroupMember *tgm = bs->opaque;
562
- throttle_group_co_io_limits_intercept(tgm, bytes, false);
563
+ throttle_group_co_io_limits_intercept(tgm, bytes, THROTTLE_READ);
564
565
return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
566
}
567
@@ -XXX,XX +XXX,XX @@ throttle_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
568
QEMUIOVector *qiov, BdrvRequestFlags flags)
569
{
570
ThrottleGroupMember *tgm = bs->opaque;
571
- throttle_group_co_io_limits_intercept(tgm, bytes, true);
572
+ throttle_group_co_io_limits_intercept(tgm, bytes, THROTTLE_WRITE);
573
574
return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
575
}
576
@@ -XXX,XX +XXX,XX @@ throttle_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int64_t bytes,
577
BdrvRequestFlags flags)
578
{
579
ThrottleGroupMember *tgm = bs->opaque;
580
- throttle_group_co_io_limits_intercept(tgm, bytes, true);
581
+ throttle_group_co_io_limits_intercept(tgm, bytes, THROTTLE_WRITE);
582
583
return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
584
}
585
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn GRAPH_RDLOCK
586
throttle_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
587
{
588
ThrottleGroupMember *tgm = bs->opaque;
589
- throttle_group_co_io_limits_intercept(tgm, bytes, true);
590
+ throttle_group_co_io_limits_intercept(tgm, bytes, THROTTLE_WRITE);
591
592
return bdrv_co_pdiscard(bs->file, offset, bytes);
593
}
594
--
28
--
595
2.41.0
29
2.21.0
30
31
1
This is a regression test for
1
From: Anton Nefedov <anton.nefedov@virtuozzo.com>
2
https://bugzilla.redhat.com/show_bug.cgi?id=2234374.
3
2
4
All this test needs to do is trigger an I/O error inside of file-posix
3
COW (even empty/zero) areas require encryption too
5
(specifically raw_co_prw()). One reliable way to do this without
6
requiring special privileges is to use a FUSE export, which allows us to
7
inject any error that we want, e.g. via blkdebug.
8
4
9
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
5
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
10
Message-Id: <20230824155345.109765-6-hreitz@redhat.com>
6
Reviewed-by: Eric Blake <eblake@redhat.com>
11
[hreitz: Fixed test to be skipped when there is no FUSE support, to
7
Reviewed-by: Max Reitz <mreitz@redhat.com>
12
suppress fusermount's allow_other warning, and to be skipped
8
Reviewed-by: Alberto Garcia <berto@igalia.com>
13
with $IMGOPTSSYNTAX enabled]
9
Message-id: 20190516143028.81155-1-anton.nefedov@virtuozzo.com
14
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
10
Signed-off-by: Max Reitz <mreitz@redhat.com>
15
---
11
---
16
tests/qemu-iotests/tests/file-io-error | 119 +++++++++++++++++++++
12
tests/qemu-iotests/134 | 9 +++++++++
17
tests/qemu-iotests/tests/file-io-error.out | 33 ++++++
13
tests/qemu-iotests/134.out | 10 ++++++++++
18
2 files changed, 152 insertions(+)
14
2 files changed, 19 insertions(+)
19
create mode 100755 tests/qemu-iotests/tests/file-io-error
20
create mode 100644 tests/qemu-iotests/tests/file-io-error.out
21
15
22
diff --git a/tests/qemu-iotests/tests/file-io-error b/tests/qemu-iotests/tests/file-io-error
16
diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
23
new file mode 100755
17
index XXXXXXX..XXXXXXX 100755
24
index XXXXXXX..XXXXXXX
18
--- a/tests/qemu-iotests/134
25
--- /dev/null
19
+++ b/tests/qemu-iotests/134
26
+++ b/tests/qemu-iotests/tests/file-io-error
20
@@ -XXX,XX +XXX,XX @@ echo
27
@@ -XXX,XX +XXX,XX @@
21
echo "== reading whole image =="
28
+#!/usr/bin/env bash
22
$QEMU_IO --object $SECRET -c "read 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
29
+# group: rw
23
30
+#
24
+echo
31
+# Produce an I/O error in file-posix, and hope that it is not catastrophic.
25
+echo "== rewriting cluster part =="
32
+# Regression test for: https://bugzilla.redhat.com/show_bug.cgi?id=2234374
26
+$QEMU_IO --object $SECRET -c "write -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
33
+#
34
+# Copyright (C) 2023 Red Hat, Inc.
35
+#
36
+# This program is free software; you can redistribute it and/or modify
37
+# it under the terms of the GNU General Public License as published by
38
+# the Free Software Foundation; either version 2 of the License, or
39
+# (at your option) any later version.
40
+#
41
+# This program is distributed in the hope that it will be useful,
42
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
43
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
44
+# GNU General Public License for more details.
45
+#
46
+# You should have received a copy of the GNU General Public License
47
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
48
+#
49
+
50
+seq=$(basename "$0")
51
+echo "QA output created by $seq"
52
+
53
+status=1    # failure is the default!
54
+
55
+_cleanup()
56
+{
57
+ _cleanup_qemu
58
+ rm -f "$TEST_DIR/fuse-export"
59
+}
60
+trap "_cleanup; exit \$status" 0 1 2 3 15
61
+
62
+# get standard environment, filters and checks
63
+. ../common.rc
64
+. ../common.filter
65
+. ../common.qemu
66
+
67
+# Format-agnostic (we do not use any), but we do test the file protocol
68
+_supported_proto file
69
+_require_drivers blkdebug null-co
70
+
71
+if [ "$IMGOPTSSYNTAX" = "true" ]; then
72
+ # We need `$QEMU_IO -f file` to work; IMGOPTSSYNTAX uses --image-opts,
73
+ # breaking -f.
74
+ _unsupported_fmt $IMGFMT
75
+fi
76
+
77
+# This is a regression test of a bug in which flie-posix would access zone
78
+# information in case of an I/O error even when there is no zone information,
79
+# resulting in a division by zero.
80
+# To reproduce the problem, we need to trigger an I/O error inside of
81
+# file-posix, which can be done (rootless) by providing a FUSE export that
82
+# presents only errors when accessed.
83
+
84
+_launch_qemu
85
+_send_qemu_cmd $QEMU_HANDLE \
86
+ "{'execute': 'qmp_capabilities'}" \
87
+ 'return'
88
+
89
+_send_qemu_cmd $QEMU_HANDLE \
90
+ "{'execute': 'blockdev-add',
91
+ 'arguments': {
92
+ 'driver': 'blkdebug',
93
+ 'node-name': 'node0',
94
+ 'inject-error': [{'event': 'none'}],
95
+ 'image': {
96
+ 'driver': 'null-co'
97
+ }
98
+ }}" \
99
+ 'return'
100
+
101
+# FUSE mountpoint must exist and be a regular file
102
+touch "$TEST_DIR/fuse-export"
103
+
104
+# The grep -v to filter fusermount's (benign) error when /etc/fuse.conf does
105
+# not contain user_allow_other and the subsequent check for missing FUSE support
106
+# have both been taken from iotest 308.
107
+output=$(_send_qemu_cmd $QEMU_HANDLE \
108
+ "{'execute': 'block-export-add',
109
+ 'arguments': {
110
+ 'id': 'exp0',
111
+ 'type': 'fuse',
112
+ 'node-name': 'node0',
113
+ 'mountpoint': '$TEST_DIR/fuse-export',
114
+ 'writable': true
115
+ }}" \
116
+ 'return' \
117
+ | grep -v 'option allow_other only allowed if')
118
+
119
+if echo "$output" | grep -q "Parameter 'type' does not accept value 'fuse'"; then
120
+ _notrun 'No FUSE support'
121
+fi
122
+echo "$output"
123
+
27
+
124
+echo
28
+echo
125
+# This should fail, but gracefully, i.e. just print an I/O error, not crash.
29
+echo "== verify pattern =="
126
+$QEMU_IO -f file -c 'write 0 64M' "$TEST_DIR/fuse-export" | _filter_qemu_io
30
+$QEMU_IO --object $SECRET -c "read -P 0 0 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
127
+echo
31
+$QEMU_IO --object $SECRET -c "read -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
128
+
32
+
129
+_send_qemu_cmd $QEMU_HANDLE \
33
echo
130
+ "{'execute': 'block-export-del',
34
echo "== rewriting whole image =="
131
+ 'arguments': {'id': 'exp0'}}" \
35
$QEMU_IO --object $SECRET -c "write -P 0xa 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
132
+ 'return'
36
diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
37
index XXXXXXX..XXXXXXX 100644
38
--- a/tests/qemu-iotests/134.out
39
+++ b/tests/qemu-iotests/134.out
40
@@ -XXX,XX +XXX,XX @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 encryption=on encrypt.
41
read 134217728/134217728 bytes at offset 0
42
128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
43
44
+== rewriting cluster part ==
45
+wrote 512/512 bytes at offset 512
46
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
133
+
47
+
134
+_send_qemu_cmd $QEMU_HANDLE \
48
+== verify pattern ==
135
+ '' \
49
+read 512/512 bytes at offset 0
136
+ 'BLOCK_EXPORT_DELETED'
50
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
51
+read 512/512 bytes at offset 512
52
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
137
+
53
+
138
+_send_qemu_cmd $QEMU_HANDLE \
54
== rewriting whole image ==
139
+ "{'execute': 'blockdev-del',
55
wrote 134217728/134217728 bytes at offset 0
140
+ 'arguments': {'node-name': 'node0'}}" \
56
128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
141
+ 'return'
142
+
143
+# success, all done
144
+echo "*** done"
145
+rm -f $seq.full
146
+status=0
147
diff --git a/tests/qemu-iotests/tests/file-io-error.out b/tests/qemu-iotests/tests/file-io-error.out
148
new file mode 100644
149
index XXXXXXX..XXXXXXX
150
--- /dev/null
151
+++ b/tests/qemu-iotests/tests/file-io-error.out
152
@@ -XXX,XX +XXX,XX @@
153
+QA output created by file-io-error
154
+{'execute': 'qmp_capabilities'}
155
+{"return": {}}
156
+{'execute': 'blockdev-add',
157
+ 'arguments': {
158
+ 'driver': 'blkdebug',
159
+ 'node-name': 'node0',
160
+ 'inject-error': [{'event': 'none'}],
161
+ 'image': {
162
+ 'driver': 'null-co'
163
+ }
164
+ }}
165
+{"return": {}}
166
+{'execute': 'block-export-add',
167
+ 'arguments': {
168
+ 'id': 'exp0',
169
+ 'type': 'fuse',
170
+ 'node-name': 'node0',
171
+ 'mountpoint': 'TEST_DIR/fuse-export',
172
+ 'writable': true
173
+ }}
174
+{"return": {}}
175
+
176
+write failed: Input/output error
177
+
178
+{'execute': 'block-export-del',
179
+ 'arguments': {'id': 'exp0'}}
180
+{"return": {}}
181
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "exp0"}}
182
+{'execute': 'blockdev-del',
183
+ 'arguments': {'node-name': 'node0'}}
184
+{"return": {}}
185
+*** done
186
--
57
--
187
2.41.0
58
2.21.0
59
60
1
We must check that zone information is present before running
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
2
update_zones_wp().
3
2
4
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2234374
3
Commit b0651b8c246d ("vmdk: Move l1_size check into vmdk_add_extent")
5
Fixes: Coverity CID 1512459
4
extended the l1_size check from VMDK4 to VMDK3 but did not update the
6
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
5
default coverage in the moved comment.
7
Message-Id: <20230824155345.109765-4-hreitz@redhat.com>
6
8
Reviewed-by: Sam Li <faithilikerun@gmail.com>
7
The previous vmdk4 calculation:
8
9
(512 * 1024 * 1024) * 512(l2 entries) * 65536(grain) = 16PB
10
11
The added vmdk3 calculation:
12
13
(512 * 1024 * 1024) * 4096(l2 entries) * 512(grain) = 1PB
14
15
Adding the calculation of vmdk3 to the comment.
16
17
In any case, VMware does not offer virtual disks more than 2TB for
18
vmdk4/vmdk3 or 64TB for the new undocumented seSparse format which is
19
not implemented yet in qemu.
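As a quick sanity check of the figures above (illustrative only, not part of
the patch; it treats 512 * 1024 * 1024 as the maximum number of L1 entries),
the two products can be reproduced with a small standalone C program:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t l1_entries = 512ULL * 1024 * 1024;      /* the l1_size cap */

        /* VMDK4 defaults: 512 L2 entries per table, 64 KiB grain */
        uint64_t vmdk4_max = l1_entries * 512 * 65536;   /* 2^54 B = 16 PiB */

        /* VMDK3 defaults: 4096 L2 entries per table, 512 B grain */
        uint64_t vmdk3_max = l1_entries * 4096 * 512;    /* 2^50 B = 1 PiB */

        printf("vmdk4 max: %" PRIu64 " bytes\n", vmdk4_max);
        printf("vmdk3 max: %" PRIu64 " bytes\n", vmdk3_max);
        return 0;
    }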
20
21
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
22
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
23
Reviewed-by: Liran Alon <liran.alon@oracle.com>
24
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
25
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
26
Message-id: 20190620091057.47441-2-shmuel.eiderman@oracle.com
27
Reviewed-by: yuchenlin <yuchenlin@synology.com>
28
Reviewed-by: Max Reitz <mreitz@redhat.com>
29
Signed-off-by: Max Reitz <mreitz@redhat.com>
9
---
30
---
10
block/file-posix.c | 3 ++-
31
block/vmdk.c | 11 ++++++++---
11
1 file changed, 2 insertions(+), 1 deletion(-)
32
1 file changed, 8 insertions(+), 3 deletions(-)
12
33
13
diff --git a/block/file-posix.c b/block/file-posix.c
34
diff --git a/block/vmdk.c b/block/vmdk.c
14
index XXXXXXX..XXXXXXX 100644
35
index XXXXXXX..XXXXXXX 100644
15
--- a/block/file-posix.c
36
--- a/block/vmdk.c
16
+++ b/block/file-posix.c
37
+++ b/block/vmdk.c
17
@@ -XXX,XX +XXX,XX @@ out:
38
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
18
}
39
return -EFBIG;
19
}
40
}
20
} else {
41
if (l1_size > 512 * 1024 * 1024) {
21
- if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
42
- /* Although with big capacity and small l1_entry_sectors, we can get a
22
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
43
+ /*
23
+ bs->bl.zoned != BLK_Z_NONE) {
44
+ * Although with big capacity and small l1_entry_sectors, we can get a
24
update_zones_wp(bs, s->fd, 0, 1);
45
* big l1_size, we don't want unbounded value to allocate the table.
25
}
46
- * Limit it to 512M, which is 16PB for default cluster and L2 table
47
- * size */
48
+ * Limit it to 512M, which is:
49
+ * 16PB - for default "Hosted Sparse Extent" (VMDK4)
50
+ * cluster size: 64KB, L2 table size: 512 entries
51
+ * 1PB - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
52
+ * cluster size: 512B, L2 table size: 4096 entries
53
+ */
54
error_setg(errp, "L1 size too big");
55
return -EFBIG;
26
}
56
}
27
--
57
--
28
2.41.0
58
2.21.0
59
60
1
Instead of checking bs->wps or bs->bl.zone_size for whether zone
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
2
information is present, check bs->bl.zoned. That is the flag that
3
raw_refresh_zoned_limits() reliably sets to indicate zone support. If
4
it is set to something other than BLK_Z_NONE, other values and objects
5
like bs->wps and bs->bl.zone_size must be non-null/zero and valid; if it
6
is not, we cannot rely on their validity.
7
2
8
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
3
512M of L1 entries is a very loose bound, only 32M are required to store
9
Message-Id: <20230824155345.109765-3-hreitz@redhat.com>
4
the maximal supported VMDK file size of 2TB.
10
Reviewed-by: Sam Li <faithilikerun@gmail.com>
5
6
Fixed qemu-iotest #59 - the failure now occurs earlier, on an impossible L1
7
table size.
8
9
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
10
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
11
Reviewed-by: Liran Alon <liran.alon@oracle.com>
12
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
13
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
14
Message-id: 20190620091057.47441-3-shmuel.eiderman@oracle.com
15
Reviewed-by: Max Reitz <mreitz@redhat.com>
16
Signed-off-by: Max Reitz <mreitz@redhat.com>
11
---
17
---
12
block/file-posix.c | 12 +++++++-----
18
block/vmdk.c | 13 +++++++------
13
1 file changed, 7 insertions(+), 5 deletions(-)
19
tests/qemu-iotests/059.out | 2 +-
20
2 files changed, 8 insertions(+), 7 deletions(-)
14
21
15
diff --git a/block/file-posix.c b/block/file-posix.c
22
diff --git a/block/vmdk.c b/block/vmdk.c
16
index XXXXXXX..XXXXXXX 100644
23
index XXXXXXX..XXXXXXX 100644
17
--- a/block/file-posix.c
24
--- a/block/vmdk.c
18
+++ b/block/file-posix.c
25
+++ b/block/vmdk.c
19
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
26
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
20
if (fd_open(bs) < 0)
27
error_setg(errp, "Invalid granularity, image may be corrupt");
21
return -EIO;
28
return -EFBIG;
22
#if defined(CONFIG_BLKZONED)
23
- if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && bs->wps) {
24
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
25
+ bs->bl.zoned != BLK_Z_NONE) {
26
qemu_co_mutex_lock(&bs->wps->colock);
27
- if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) {
28
+ if (type & QEMU_AIO_ZONE_APPEND) {
29
int index = offset / bs->bl.zone_size;
30
offset = bs->wps->wp[index];
31
}
32
@@ -XXX,XX +XXX,XX @@ out:
33
{
34
BlockZoneWps *wps = bs->wps;
35
if (ret == 0) {
36
- if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
37
- && wps && bs->bl.zone_size) {
38
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
39
+ bs->bl.zoned != BLK_Z_NONE) {
40
uint64_t *wp = &wps->wp[offset / bs->bl.zone_size];
41
if (!BDRV_ZT_IS_CONV(*wp)) {
42
if (type & QEMU_AIO_ZONE_APPEND) {
43
@@ -XXX,XX +XXX,XX @@ out:
44
}
45
}
29
}
46
30
- if (l1_size > 512 * 1024 * 1024) {
47
- if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && wps) {
31
+ if (l1_size > 32 * 1024 * 1024) {
48
+ if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) &&
32
/*
49
+ bs->blk.zoned != BLK_Z_NONE) {
33
* Although with big capacity and small l1_entry_sectors, we can get a
50
qemu_co_mutex_unlock(&wps->colock);
34
* big l1_size, we don't want unbounded value to allocate the table.
51
}
35
- * Limit it to 512M, which is:
52
}
36
- * 16PB - for default "Hosted Sparse Extent" (VMDK4)
37
- * cluster size: 64KB, L2 table size: 512 entries
38
- * 1PB - for default "ESXi Host Sparse Extent" (VMDK3/vmfsSparse)
39
- * cluster size: 512B, L2 table size: 4096 entries
40
+ * Limit it to 32M, which is enough to store:
41
+ * 8TB - for both VMDK3 & VMDK4 with
42
+ * minimal cluster size: 512B
43
+ * minimal L2 table size: 512 entries
44
+ * 8 TB is still more than the maximal value supported for
45
+ * VMDK3 & VMDK4 which is 2TB.
46
*/
47
error_setg(errp, "L1 size too big");
48
return -EFBIG;
49
diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
50
index XXXXXXX..XXXXXXX 100644
51
--- a/tests/qemu-iotests/059.out
52
+++ b/tests/qemu-iotests/059.out
53
@@ -XXX,XX +XXX,XX @@ Offset Length Mapped to File
54
0x140000000 0x10000 0x50000 TEST_DIR/t-s003.vmdk
55
56
=== Testing afl image with a very large capacity ===
57
-qemu-img: Can't get image size 'TEST_DIR/afl9.IMGFMT': File too large
58
+qemu-img: Could not open 'TEST_DIR/afl9.IMGFMT': L1 size too big
59
*** done
53
--
60
--
54
2.41.0
61
2.21.0
62
63
1
From: zhenwei pi <pizhenwei@bytedance.com>
1
From: Sam Eiderman <shmuel.eiderman@oracle.com>
2
2
3
Reviewed-by: Alberto Garcia <berto@igalia.com>
3
Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
4
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
4
QEMU).
5
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
5
6
Message-Id: <20230728022006.1098509-5-pizhenwei@bytedance.com>
6
This format was lacking in the following:
7
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
7
8
* Grain directory (L1) and grain table (L2) entries were 32-bit,
9
allowing access to only 2TB (slightly less) of data.
10
* The grain size (default) was 512 bytes - leading to data
11
fragmentation and many grain tables.
12
* For space reclamation purposes, it was necessary to find all the
13
grains which are not pointed to by any grain table - so a reverse
14
mapping of "offset of grain in vmdk" to "grain table" must be
15
constructed - which takes large amounts of CPU/RAM.
16
17
The format specification can be found in VMware's documentation:
18
https://www.vmware.com/support/developer/vddk/vmdk_50_technote.pdf
19
20
In ESXi 6.5, to support snapshot files larger than 2TB, a new format was
21
introduced: SESparse (Space Efficient).
22
23
This format fixes the above issues:
24
25
* All entries are now 64-bit.
26
* The grain size (default) is 4KB.
27
* Grain directory and grain tables are now located at the beginning
28
of the file.
29
+ seSparse format reserves space for all grain tables.
30
+ Grain tables can be addressed using an index.
31
+ Grains are located in the end of the file and can also be
32
addressed with an index.
33
- seSparse vmdks of large disks (64TB) have huge preallocated
34
headers - mainly due to L2 tables, even for empty snapshots.
35
* The header contains a reverse mapping ("backmap") of "offset of
36
grain in vmdk" to "grain table" and a bitmap ("free bitmap") which
37
specifies for each grain - whether it is allocated or not.
38
Using these data structures we can implement space reclamation
39
efficiently.
40
* Due to the fact that the header now maintains two mappings:
41
* The regular one (grain directory & grain tables)
42
* A reverse one (backmap and free bitmap)
43
These data structures can lose consistency upon crash and result
44
in a corrupted VMDK.
45
Therefore, a journal is also added to the VMDK and is replayed
46
when the VMware reopens the file after a crash.
47
48
Since ESXi 6.7 - SESparse is the only snapshot format available.
49
50
Unfortunately, VMware does not provide documentation regarding the new
51
seSparse format.
52
53
This commit is based on black-box research of the seSparse format.
54
Various in-guest block operations and their effect on the snapshot file
55
were tested.
56
57
The only VMware provided source of information (regarding the underlying
58
implementation) was a log file on the ESXi:
59
60
/var/log/hostd.log
61
62
Whenever an seSparse snapshot is created - the log is being populated
63
with seSparse records.
64
65
Relevant log records are of the form:
66
67
[...] Const Header:
68
[...] constMagic = 0xcafebabe
69
[...] version = 2.1
70
[...] capacity = 204800
71
[...] grainSize = 8
72
[...] grainTableSize = 64
73
[...] flags = 0
74
[...] Extents:
75
[...] Header : <1 : 1>
76
[...] JournalHdr : <2 : 2>
77
[...] Journal : <2048 : 2048>
78
[...] GrainDirectory : <4096 : 2048>
79
[...] GrainTables : <6144 : 2048>
80
[...] FreeBitmap : <8192 : 2048>
81
[...] BackMap : <10240 : 2048>
82
[...] Grain : <12288 : 204800>
83
[...] Volatile Header:
84
[...] volatileMagic = 0xcafecafe
85
[...] FreeGTNumber = 0
86
[...] nextTxnSeqNumber = 0
87
[...] replayJournal = 0
88
89
The sizes that are seen in the log file are in sectors.
90
Extents are of the following format: <offset : size>
91
92
This commit is a strict implementation which enforces:
93
* magics
94
* version number 2.1
95
* grain size of 8 sectors (4KB)
96
* grain table size of 64 sectors
97
* zero flags
98
* extent locations
99
100
Additionally, this commit provides only a subset of the functionality
101
offered by seSparse's format:
102
* Read-only
103
* No journal replay
104
* No space reclamation
105
* No unmap support
106
107
Hence, journal header, journal, free bitmap and backmap extents are
108
unused, only the "classic" (L1 -> L2 -> data) grain access is
109
implemented.
110
111
However there are several differences in the grain access itself.
112
Grain directory (L1):
113
* Grain directory entries are indexes (not offsets) to grain
114
tables.
115
* Valid grain directory entries have their highest nibble set to
116
0x1.
117
* Since grain tables are always located in the beginning of the
118
file - the index can fit into 32 bits - so we can use its low
119
part if it's valid.
120
Grain table (L2):
121
* Grain table entries are indexes (not offsets) to grains.
122
* If the highest nibble of the entry is:
123
0x0:
124
The grain is not allocated.
125
The rest of the bytes are 0.
126
0x1:
127
The grain is unmapped - guest sees a zero grain.
128
The rest of the bits point to the previously mapped grain,
129
see 0x3 case.
130
0x2:
131
The grain is zero.
132
0x3:
133
The grain is allocated - to get the index calculate:
134
((entry & 0x0fff000000000000) >> 48) |
135
((entry & 0x0000ffffffffffff) << 12)
136
* The difference between 0x1 and 0x2 is that 0x1 is an unallocated
137
grain which results from the guest using sg_unmap to unmap the
138
grain - but the grain itself still exists in the grain extent - a
139
space reclamation procedure should delete it.
140
Unmapping a zero grain has no effect (0x2 will not change to 0x1)
141
but unmapping an unallocated grain will (0x0 to 0x1) - naturally.
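To make the two-level lookup concrete, here is a small illustrative sketch
(not taken from the patch; the helper names are invented, and the real code in
get_cluster_offset() below is stricter, requiring the whole top 32 bits of a
valid grain directory entry to be 0x10000000):

    /* Sketch only; assumes <stdint.h> and <stdbool.h>. */

    /* Grain directory (L1): a valid entry carries a grain table index
     * in its low 32 bits. */
    static bool se_sparse_gd_entry_to_index(uint64_t entry, uint32_t *gt_index)
    {
        if ((entry & 0xf000000000000000) != 0x1000000000000000) {
            return false;                     /* unallocated or invalid */
        }
        *gt_index = (uint32_t)(entry & 0x00000000ffffffff);
        return true;
    }

    /* Grain table (L2): the top nibble selects the grain state. */
    typedef enum {
        SE_GRAIN_UNALLOCATED,   /* 0x0 */
        SE_GRAIN_UNMAPPED,      /* 0x1: reads as zero, old grain still on disk */
        SE_GRAIN_ZERO,          /* 0x2 */
        SE_GRAIN_ALLOCATED,     /* 0x3 */
        SE_GRAIN_INVALID,
    } SeGrainState;

    static SeGrainState se_sparse_gt_entry_decode(uint64_t entry,
                                                  uint64_t *grain_index)
    {
        switch (entry & 0xf000000000000000) {
        case 0x0000000000000000:
            return SE_GRAIN_UNALLOCATED;
        case 0x1000000000000000:
            return SE_GRAIN_UNMAPPED;
        case 0x2000000000000000:
            return SE_GRAIN_ZERO;
        case 0x3000000000000000:
            /* grain index, using the formula quoted above */
            *grain_index = ((entry & 0x0fff000000000000) >> 48) |
                           ((entry & 0x0000ffffffffffff) << 12);
            return SE_GRAIN_ALLOCATED;
        default:
            return SE_GRAIN_INVALID;
        }
    }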
142
143
In order to implement seSparse, some fields had to be changed to support
144
both 32-bit and 64-bit entry sizes.
145
146
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
147
Reviewed-by: Eyal Moscovici <eyal.moscovici@oracle.com>
148
Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
149
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
150
Message-id: 20190620091057.47441-4-shmuel.eiderman@oracle.com
151
Signed-off-by: Max Reitz <mreitz@redhat.com>
8
---
152
---
9
tests/unit/test-throttle.c | 66 ++++++++++++++++++++++++++++++++++++++
153
block/vmdk.c | 358 ++++++++++++++++++++++++++++++++++++++++++++++++---
10
1 file changed, 66 insertions(+)
154
1 file changed, 342 insertions(+), 16 deletions(-)
11
155
12
diff --git a/tests/unit/test-throttle.c b/tests/unit/test-throttle.c
156
diff --git a/block/vmdk.c b/block/vmdk.c
13
index XXXXXXX..XXXXXXX 100644
157
index XXXXXXX..XXXXXXX 100644
14
--- a/tests/unit/test-throttle.c
158
--- a/block/vmdk.c
15
+++ b/tests/unit/test-throttle.c
159
+++ b/block/vmdk.c
16
@@ -XXX,XX +XXX,XX @@ static void test_init(void)
160
@@ -XXX,XX +XXX,XX @@ typedef struct {
17
throttle_timers_destroy(tt);
161
uint16_t compressAlgorithm;
162
} QEMU_PACKED VMDK4Header;
163
164
+typedef struct VMDKSESparseConstHeader {
165
+ uint64_t magic;
166
+ uint64_t version;
167
+ uint64_t capacity;
168
+ uint64_t grain_size;
169
+ uint64_t grain_table_size;
170
+ uint64_t flags;
171
+ uint64_t reserved1;
172
+ uint64_t reserved2;
173
+ uint64_t reserved3;
174
+ uint64_t reserved4;
175
+ uint64_t volatile_header_offset;
176
+ uint64_t volatile_header_size;
177
+ uint64_t journal_header_offset;
178
+ uint64_t journal_header_size;
179
+ uint64_t journal_offset;
180
+ uint64_t journal_size;
181
+ uint64_t grain_dir_offset;
182
+ uint64_t grain_dir_size;
183
+ uint64_t grain_tables_offset;
184
+ uint64_t grain_tables_size;
185
+ uint64_t free_bitmap_offset;
186
+ uint64_t free_bitmap_size;
187
+ uint64_t backmap_offset;
188
+ uint64_t backmap_size;
189
+ uint64_t grains_offset;
190
+ uint64_t grains_size;
191
+ uint8_t pad[304];
192
+} QEMU_PACKED VMDKSESparseConstHeader;
193
+
194
+typedef struct VMDKSESparseVolatileHeader {
195
+ uint64_t magic;
196
+ uint64_t free_gt_number;
197
+ uint64_t next_txn_seq_number;
198
+ uint64_t replay_journal;
199
+ uint8_t pad[480];
200
+} QEMU_PACKED VMDKSESparseVolatileHeader;
201
+
202
#define L2_CACHE_SIZE 16
203
204
typedef struct VmdkExtent {
205
@@ -XXX,XX +XXX,XX @@ typedef struct VmdkExtent {
206
bool compressed;
207
bool has_marker;
208
bool has_zero_grain;
209
+ bool sesparse;
210
+ uint64_t sesparse_l2_tables_offset;
211
+ uint64_t sesparse_clusters_offset;
212
+ int32_t entry_size;
213
int version;
214
int64_t sectors;
215
int64_t end_sector;
216
int64_t flat_start_offset;
217
int64_t l1_table_offset;
218
int64_t l1_backup_table_offset;
219
- uint32_t *l1_table;
220
+ void *l1_table;
221
uint32_t *l1_backup_table;
222
unsigned int l1_size;
223
uint32_t l1_entry_sectors;
224
225
unsigned int l2_size;
226
- uint32_t *l2_cache;
227
+ void *l2_cache;
228
uint32_t l2_cache_offsets[L2_CACHE_SIZE];
229
uint32_t l2_cache_counts[L2_CACHE_SIZE];
230
231
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
232
* minimal L2 table size: 512 entries
233
* 8 TB is still more than the maximal value supported for
234
* VMDK3 & VMDK4 which is 2TB.
235
+ * 64TB - for "ESXi seSparse Extent"
236
+ * minimal cluster size: 512B (default is 4KB)
237
+ * L2 table size: 4096 entries (const).
238
+ * 64TB is more than the maximal value supported for
239
+ * seSparse VMDKs (which is slightly less than 64TB)
240
*/
241
error_setg(errp, "L1 size too big");
242
return -EFBIG;
243
@@ -XXX,XX +XXX,XX @@ static int vmdk_add_extent(BlockDriverState *bs,
244
extent->l2_size = l2_size;
245
extent->cluster_sectors = flat ? sectors : cluster_sectors;
246
extent->next_cluster_sector = ROUND_UP(nb_sectors, cluster_sectors);
247
+ extent->entry_size = sizeof(uint32_t);
248
249
if (s->num_extents > 1) {
250
extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
251
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
252
int i;
253
254
/* read the L1 table */
255
- l1_size = extent->l1_size * sizeof(uint32_t);
256
+ l1_size = extent->l1_size * extent->entry_size;
257
extent->l1_table = g_try_malloc(l1_size);
258
if (l1_size && extent->l1_table == NULL) {
259
return -ENOMEM;
260
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
261
goto fail_l1;
262
}
263
for (i = 0; i < extent->l1_size; i++) {
264
- le32_to_cpus(&extent->l1_table[i]);
265
+ if (extent->entry_size == sizeof(uint64_t)) {
266
+ le64_to_cpus((uint64_t *)extent->l1_table + i);
267
+ } else {
268
+ assert(extent->entry_size == sizeof(uint32_t));
269
+ le32_to_cpus((uint32_t *)extent->l1_table + i);
270
+ }
271
}
272
273
if (extent->l1_backup_table_offset) {
274
+ assert(!extent->sesparse);
275
extent->l1_backup_table = g_try_malloc(l1_size);
276
if (l1_size && extent->l1_backup_table == NULL) {
277
ret = -ENOMEM;
278
@@ -XXX,XX +XXX,XX @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
279
}
280
281
extent->l2_cache =
282
- g_new(uint32_t, extent->l2_size * L2_CACHE_SIZE);
283
+ g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE);
284
return 0;
285
fail_l1b:
286
g_free(extent->l1_backup_table);
287
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
288
return ret;
18
}
289
}
19
290
20
+static void test_init_readonly(void)
291
+#define SESPARSE_CONST_HEADER_MAGIC UINT64_C(0x00000000cafebabe)
292
+#define SESPARSE_VOLATILE_HEADER_MAGIC UINT64_C(0x00000000cafecafe)
293
+
294
+/* Strict checks - format not officially documented */
295
+static int check_se_sparse_const_header(VMDKSESparseConstHeader *header,
296
+ Error **errp)
21
+{
297
+{
22
+ int i;
298
+ header->magic = le64_to_cpu(header->magic);
23
+
299
+ header->version = le64_to_cpu(header->version);
24
+ tt = &tgm.throttle_timers;
300
+ header->grain_size = le64_to_cpu(header->grain_size);
25
+
301
+ header->grain_table_size = le64_to_cpu(header->grain_table_size);
26
+ /* fill the structures with crap */
302
+ header->flags = le64_to_cpu(header->flags);
27
+ memset(&ts, 1, sizeof(ts));
303
+ header->reserved1 = le64_to_cpu(header->reserved1);
28
+ memset(tt, 1, sizeof(*tt));
304
+ header->reserved2 = le64_to_cpu(header->reserved2);
29
+
305
+ header->reserved3 = le64_to_cpu(header->reserved3);
30
+ /* init structures */
306
+ header->reserved4 = le64_to_cpu(header->reserved4);
31
+ throttle_init(&ts);
307
+
32
+ throttle_timers_init(tt, ctx, QEMU_CLOCK_VIRTUAL,
308
+ header->volatile_header_offset =
33
+ read_timer_cb, NULL, &ts);
309
+ le64_to_cpu(header->volatile_header_offset);
34
+
310
+ header->volatile_header_size = le64_to_cpu(header->volatile_header_size);
35
+ /* check initialized fields */
311
+
36
+ g_assert(tt->clock_type == QEMU_CLOCK_VIRTUAL);
312
+ header->journal_header_offset = le64_to_cpu(header->journal_header_offset);
37
+ g_assert(tt->timers[THROTTLE_READ]);
313
+ header->journal_header_size = le64_to_cpu(header->journal_header_size);
38
+ g_assert(!tt->timers[THROTTLE_WRITE]);
314
+
39
+
315
+ header->journal_offset = le64_to_cpu(header->journal_offset);
40
+ /* check other fields were cleared */
316
+ header->journal_size = le64_to_cpu(header->journal_size);
41
+ g_assert(!ts.previous_leak);
317
+
42
+ g_assert(!ts.cfg.op_size);
318
+ header->grain_dir_offset = le64_to_cpu(header->grain_dir_offset);
43
+ for (i = 0; i < BUCKETS_COUNT; i++) {
319
+ header->grain_dir_size = le64_to_cpu(header->grain_dir_size);
44
+ g_assert(!ts.cfg.buckets[i].avg);
320
+
45
+ g_assert(!ts.cfg.buckets[i].max);
321
+ header->grain_tables_offset = le64_to_cpu(header->grain_tables_offset);
46
+ g_assert(!ts.cfg.buckets[i].level);
322
+ header->grain_tables_size = le64_to_cpu(header->grain_tables_size);
47
+ }
323
+
48
+
324
+ header->free_bitmap_offset = le64_to_cpu(header->free_bitmap_offset);
49
+ throttle_timers_destroy(tt);
325
+ header->free_bitmap_size = le64_to_cpu(header->free_bitmap_size);
326
+
327
+ header->backmap_offset = le64_to_cpu(header->backmap_offset);
328
+ header->backmap_size = le64_to_cpu(header->backmap_size);
329
+
330
+ header->grains_offset = le64_to_cpu(header->grains_offset);
331
+ header->grains_size = le64_to_cpu(header->grains_size);
332
+
333
+ if (header->magic != SESPARSE_CONST_HEADER_MAGIC) {
334
+ error_setg(errp, "Bad const header magic: 0x%016" PRIx64,
335
+ header->magic);
336
+ return -EINVAL;
337
+ }
338
+
339
+ if (header->version != 0x0000000200000001) {
340
+ error_setg(errp, "Unsupported version: 0x%016" PRIx64,
341
+ header->version);
342
+ return -ENOTSUP;
343
+ }
344
+
345
+ if (header->grain_size != 8) {
346
+ error_setg(errp, "Unsupported grain size: %" PRIu64,
347
+ header->grain_size);
348
+ return -ENOTSUP;
349
+ }
350
+
351
+ if (header->grain_table_size != 64) {
352
+ error_setg(errp, "Unsupported grain table size: %" PRIu64,
353
+ header->grain_table_size);
354
+ return -ENOTSUP;
355
+ }
356
+
357
+ if (header->flags != 0) {
358
+ error_setg(errp, "Unsupported flags: 0x%016" PRIx64,
359
+ header->flags);
360
+ return -ENOTSUP;
361
+ }
362
+
363
+ if (header->reserved1 != 0 || header->reserved2 != 0 ||
364
+ header->reserved3 != 0 || header->reserved4 != 0) {
365
+ error_setg(errp, "Unsupported reserved bits:"
366
+ " 0x%016" PRIx64 " 0x%016" PRIx64
367
+ " 0x%016" PRIx64 " 0x%016" PRIx64,
368
+ header->reserved1, header->reserved2,
369
+ header->reserved3, header->reserved4);
370
+ return -ENOTSUP;
371
+ }
372
+
373
+ /* check that padding is 0 */
374
+ if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
375
+ error_setg(errp, "Unsupported non-zero const header padding");
376
+ return -ENOTSUP;
377
+ }
378
+
379
+ return 0;
50
+}
380
+}
51
+
381
+
52
+static void test_init_writeonly(void)
382
+static int check_se_sparse_volatile_header(VMDKSESparseVolatileHeader *header,
383
+ Error **errp)
53
+{
384
+{
54
+ int i;
385
+ header->magic = le64_to_cpu(header->magic);
55
+
386
+ header->free_gt_number = le64_to_cpu(header->free_gt_number);
56
+ tt = &tgm.throttle_timers;
387
+ header->next_txn_seq_number = le64_to_cpu(header->next_txn_seq_number);
57
+
388
+ header->replay_journal = le64_to_cpu(header->replay_journal);
58
+ /* fill the structures with crap */
389
+
59
+ memset(&ts, 1, sizeof(ts));
390
+ if (header->magic != SESPARSE_VOLATILE_HEADER_MAGIC) {
60
+ memset(tt, 1, sizeof(*tt));
391
+ error_setg(errp, "Bad volatile header magic: 0x%016" PRIx64,
61
+
392
+ header->magic);
62
+ /* init structures */
393
+ return -EINVAL;
63
+ throttle_init(&ts);
394
+ }
64
+ throttle_timers_init(tt, ctx, QEMU_CLOCK_VIRTUAL,
395
+
65
+ NULL, write_timer_cb, &ts);
396
+ if (header->replay_journal) {
66
+
397
+ error_setg(errp, "Image is dirty, Replaying journal not supported");
67
+ /* check initialized fields */
398
+ return -ENOTSUP;
68
+ g_assert(tt->clock_type == QEMU_CLOCK_VIRTUAL);
399
+ }
69
+ g_assert(!tt->timers[THROTTLE_READ]);
400
+
70
+ g_assert(tt->timers[THROTTLE_WRITE]);
401
+ /* check that padding is 0 */
71
+
402
+ if (!buffer_is_zero(header->pad, sizeof(header->pad))) {
72
+ /* check other fields were cleared */
403
+ error_setg(errp, "Unsupported non-zero volatile header padding");
73
+ g_assert(!ts.previous_leak);
404
+ return -ENOTSUP;
74
+ g_assert(!ts.cfg.op_size);
405
+ }
75
+ for (i = 0; i < BUCKETS_COUNT; i++) {
406
+
76
+ g_assert(!ts.cfg.buckets[i].avg);
407
+ return 0;
77
+ g_assert(!ts.cfg.buckets[i].max);
78
+ g_assert(!ts.cfg.buckets[i].level);
79
+ }
80
+
81
+ throttle_timers_destroy(tt);
82
+}
408
+}
83
+
409
+
84
static void test_destroy(void)
410
+static int vmdk_open_se_sparse(BlockDriverState *bs,
411
+ BdrvChild *file,
412
+ int flags, Error **errp)
413
+{
414
+ int ret;
415
+ VMDKSESparseConstHeader const_header;
416
+ VMDKSESparseVolatileHeader volatile_header;
417
+ VmdkExtent *extent;
418
+
419
+ ret = bdrv_apply_auto_read_only(bs,
420
+ "No write support for seSparse images available", errp);
421
+ if (ret < 0) {
422
+ return ret;
423
+ }
424
+
425
+ assert(sizeof(const_header) == SECTOR_SIZE);
426
+
427
+ ret = bdrv_pread(file, 0, &const_header, sizeof(const_header));
428
+ if (ret < 0) {
429
+ bdrv_refresh_filename(file->bs);
430
+ error_setg_errno(errp, -ret,
431
+ "Could not read const header from file '%s'",
432
+ file->bs->filename);
433
+ return ret;
434
+ }
435
+
436
+ /* check const header */
437
+ ret = check_se_sparse_const_header(&const_header, errp);
438
+ if (ret < 0) {
439
+ return ret;
440
+ }
441
+
442
+ assert(sizeof(volatile_header) == SECTOR_SIZE);
443
+
444
+ ret = bdrv_pread(file,
445
+ const_header.volatile_header_offset * SECTOR_SIZE,
446
+ &volatile_header, sizeof(volatile_header));
447
+ if (ret < 0) {
448
+ bdrv_refresh_filename(file->bs);
449
+ error_setg_errno(errp, -ret,
450
+ "Could not read volatile header from file '%s'",
451
+ file->bs->filename);
452
+ return ret;
453
+ }
454
+
455
+ /* check volatile header */
456
+ ret = check_se_sparse_volatile_header(&volatile_header, errp);
457
+ if (ret < 0) {
458
+ return ret;
459
+ }
460
+
461
+ ret = vmdk_add_extent(bs, file, false,
462
+ const_header.capacity,
463
+ const_header.grain_dir_offset * SECTOR_SIZE,
464
+ 0,
465
+ const_header.grain_dir_size *
466
+ SECTOR_SIZE / sizeof(uint64_t),
467
+ const_header.grain_table_size *
468
+ SECTOR_SIZE / sizeof(uint64_t),
469
+ const_header.grain_size,
470
+ &extent,
471
+ errp);
472
+ if (ret < 0) {
473
+ return ret;
474
+ }
475
+
476
+ extent->sesparse = true;
477
+ extent->sesparse_l2_tables_offset = const_header.grain_tables_offset;
478
+ extent->sesparse_clusters_offset = const_header.grains_offset;
479
+ extent->entry_size = sizeof(uint64_t);
480
+
481
+ ret = vmdk_init_tables(bs, extent, errp);
482
+ if (ret) {
483
+ /* free extent allocated by vmdk_add_extent */
484
+ vmdk_free_last_extent(bs);
485
+ }
486
+
487
+ return ret;
488
+}
489
+
490
static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
491
QDict *options, Error **errp);
492
493
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
494
* RW [size in sectors] SPARSE "file-name.vmdk"
495
* RW [size in sectors] VMFS "file-name.vmdk"
496
* RW [size in sectors] VMFSSPARSE "file-name.vmdk"
497
+ * RW [size in sectors] SESPARSE "file-name.vmdk"
498
*/
499
flat_offset = -1;
500
matches = sscanf(p, "%10s %" SCNd64 " %10s \"%511[^\n\r\"]\" %" SCNd64,
501
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
502
503
if (sectors <= 0 ||
504
(strcmp(type, "FLAT") && strcmp(type, "SPARSE") &&
505
- strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE")) ||
506
+ strcmp(type, "VMFS") && strcmp(type, "VMFSSPARSE") &&
507
+ strcmp(type, "SESPARSE")) ||
508
(strcmp(access, "RW"))) {
509
continue;
510
}
511
@@ -XXX,XX +XXX,XX @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
512
return ret;
513
}
514
extent = &s->extents[s->num_extents - 1];
515
+ } else if (!strcmp(type, "SESPARSE")) {
516
+ ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
517
+ if (ret) {
518
+ bdrv_unref_child(bs, extent_file);
519
+ return ret;
520
+ }
521
+ extent = &s->extents[s->num_extents - 1];
522
} else {
523
error_setg(errp, "Unsupported extent type '%s'", type);
524
bdrv_unref_child(bs, extent_file);
525
@@ -XXX,XX +XXX,XX @@ static int vmdk_open_desc_file(BlockDriverState *bs, int flags, char *buf,
526
if (strcmp(ct, "monolithicFlat") &&
527
strcmp(ct, "vmfs") &&
528
strcmp(ct, "vmfsSparse") &&
529
+ strcmp(ct, "seSparse") &&
530
strcmp(ct, "twoGbMaxExtentSparse") &&
531
strcmp(ct, "twoGbMaxExtentFlat")) {
532
error_setg(errp, "Unsupported image type '%s'", ct);
533
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
85
{
534
{
86
int i;
535
unsigned int l1_index, l2_offset, l2_index;
87
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
536
int min_index, i, j;
88
g_test_add_func("/throttle/leak_bucket", test_leak_bucket);
537
- uint32_t min_count, *l2_table;
89
g_test_add_func("/throttle/compute_wait", test_compute_wait);
538
+ uint32_t min_count;
90
g_test_add_func("/throttle/init", test_init);
539
+ void *l2_table;
91
+ g_test_add_func("/throttle/init_readonly", test_init_readonly);
540
bool zeroed = false;
92
+ g_test_add_func("/throttle/init_writeonly", test_init_writeonly);
541
int64_t ret;
93
g_test_add_func("/throttle/destroy", test_destroy);
542
int64_t cluster_sector;
94
g_test_add_func("/throttle/have_timer", test_have_timer);
543
+ unsigned int l2_size_bytes = extent->l2_size * extent->entry_size;
95
g_test_add_func("/throttle/detach_attach", test_detach_attach);
544
545
if (m_data) {
546
m_data->valid = 0;
547
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
548
if (l1_index >= extent->l1_size) {
549
return VMDK_ERROR;
550
}
551
- l2_offset = extent->l1_table[l1_index];
552
+ if (extent->sesparse) {
553
+ uint64_t l2_offset_u64;
554
+
555
+ assert(extent->entry_size == sizeof(uint64_t));
556
+
557
+ l2_offset_u64 = ((uint64_t *)extent->l1_table)[l1_index];
558
+ if (l2_offset_u64 == 0) {
559
+ l2_offset = 0;
560
+ } else if ((l2_offset_u64 & 0xffffffff00000000) != 0x1000000000000000) {
561
+ /*
562
+ * Top most nibble is 0x1 if grain table is allocated.
563
+ * strict check - top most 4 bytes must be 0x10000000 since max
564
+ * supported size is 64TB for disk - so no more than 64TB / 16MB
565
+ * grain directories which is smaller than uint32,
566
+ * where 16MB is the only supported default grain table coverage.
567
+ */
568
+ return VMDK_ERROR;
569
+ } else {
570
+ l2_offset_u64 = l2_offset_u64 & 0x00000000ffffffff;
571
+ l2_offset_u64 = extent->sesparse_l2_tables_offset +
572
+ l2_offset_u64 * l2_size_bytes / SECTOR_SIZE;
573
+ if (l2_offset_u64 > 0x00000000ffffffff) {
574
+ return VMDK_ERROR;
575
+ }
576
+ l2_offset = (unsigned int)(l2_offset_u64);
577
+ }
578
+ } else {
579
+ assert(extent->entry_size == sizeof(uint32_t));
580
+ l2_offset = ((uint32_t *)extent->l1_table)[l1_index];
581
+ }
582
if (!l2_offset) {
583
return VMDK_UNALLOC;
584
}
585
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
586
extent->l2_cache_counts[j] >>= 1;
587
}
588
}
589
- l2_table = extent->l2_cache + (i * extent->l2_size);
590
+ l2_table = (char *)extent->l2_cache + (i * l2_size_bytes);
591
goto found;
592
}
593
}
594
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
595
min_index = i;
596
}
597
}
598
- l2_table = extent->l2_cache + (min_index * extent->l2_size);
599
+ l2_table = (char *)extent->l2_cache + (min_index * l2_size_bytes);
600
BLKDBG_EVENT(extent->file, BLKDBG_L2_LOAD);
601
if (bdrv_pread(extent->file,
602
(int64_t)l2_offset * 512,
603
l2_table,
604
- extent->l2_size * sizeof(uint32_t)
605
- ) != extent->l2_size * sizeof(uint32_t)) {
606
+ l2_size_bytes
607
+ ) != l2_size_bytes) {
608
return VMDK_ERROR;
609
}
610
611
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
612
extent->l2_cache_counts[min_index] = 1;
613
found:
614
l2_index = ((offset >> 9) / extent->cluster_sectors) % extent->l2_size;
615
- cluster_sector = le32_to_cpu(l2_table[l2_index]);
616
617
- if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
618
- zeroed = true;
619
+ if (extent->sesparse) {
620
+ cluster_sector = le64_to_cpu(((uint64_t *)l2_table)[l2_index]);
621
+ switch (cluster_sector & 0xf000000000000000) {
622
+ case 0x0000000000000000:
623
+ /* unallocated grain */
624
+ if (cluster_sector != 0) {
625
+ return VMDK_ERROR;
626
+ }
627
+ break;
628
+ case 0x1000000000000000:
629
+ /* scsi-unmapped grain - fallthrough */
630
+ case 0x2000000000000000:
631
+ /* zero grain */
632
+ zeroed = true;
633
+ break;
634
+ case 0x3000000000000000:
635
+ /* allocated grain */
636
+ cluster_sector = (((cluster_sector & 0x0fff000000000000) >> 48) |
637
+ ((cluster_sector & 0x0000ffffffffffff) << 12));
638
+ cluster_sector = extent->sesparse_clusters_offset +
639
+ cluster_sector * extent->cluster_sectors;
640
+ break;
641
+ default:
642
+ return VMDK_ERROR;
643
+ }
644
+ } else {
645
+ cluster_sector = le32_to_cpu(((uint32_t *)l2_table)[l2_index]);
646
+
647
+ if (extent->has_zero_grain && cluster_sector == VMDK_GTE_ZEROED) {
648
+ zeroed = true;
649
+ }
650
}
651
652
if (!cluster_sector || zeroed) {
653
if (!allocate) {
654
return zeroed ? VMDK_ZEROED : VMDK_UNALLOC;
655
}
656
+ assert(!extent->sesparse);
657
658
if (extent->next_cluster_sector >= VMDK_EXTENT_MAX_SECTORS) {
659
return VMDK_ERROR;
660
@@ -XXX,XX +XXX,XX @@ static int get_cluster_offset(BlockDriverState *bs,
661
m_data->l1_index = l1_index;
662
m_data->l2_index = l2_index;
663
m_data->l2_offset = l2_offset;
664
- m_data->l2_cache_entry = &l2_table[l2_index];
665
+ m_data->l2_cache_entry = ((uint32_t *)l2_table) + l2_index;
666
}
667
}
668
*cluster_offset = cluster_sector << BDRV_SECTOR_BITS;
669
@@ -XXX,XX +XXX,XX @@ static int vmdk_pwritev(BlockDriverState *bs, uint64_t offset,
670
if (!extent) {
671
return -EIO;
672
}
673
+ if (extent->sesparse) {
674
+ return -ENOTSUP;
675
+ }
676
offset_in_cluster = vmdk_find_offset_in_cluster(extent, offset);
677
n_bytes = MIN(bytes, extent->cluster_sectors * BDRV_SECTOR_SIZE
678
- offset_in_cluster);
96
--
679
--
97
2.41.0
680
2.21.0
681
682
1
From: zhenwei pi <pizhenwei@bytedance.com>
1
From: Pino Toscano <ptoscano@redhat.com>
2
2
3
Only one direction is necessary in several scenarios:
3
Rewrite the implementation of the ssh block driver to use libssh instead
4
- a read-only disk
4
of libssh2. The libssh library has various advantages over libssh2:
5
- a device on which all operations are considered *write* only. For example,
5
- easier API for authentication (for example for using ssh-agent)
6
encrypt/decrypt/sign/verify operations on a cryptodev use a single
6
- easier API for known_hosts handling
7
*write* timer (read timer callback is defined, but never invoked).
7
- supports newer types of keys in known_hosts
8
8
9
Allow a single direction in throttle; this reduces memory, and the upper
9
Use APIs/features available in libssh 0.8 conditionally, to support
10
layer no longer needs a dummy callback.
10
older versions (which are not recommended though).
11
11
12
Reviewed-by: Alberto Garcia <berto@igalia.com>
12
Adjust the iotest 207 according to the different error message, and to
13
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
13
find the default key type for localhost (to properly compare the
14
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
14
fingerprint with).
15
Message-Id: <20230728022006.1098509-4-pizhenwei@bytedance.com>
15
Contributed-by: Max Reitz <mreitz@redhat.com>
16
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
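For the throttle change above, a minimal usage sketch (not part of the patch;
my_ctx, my_write_cb and opaque are placeholders) mirroring test_init_writeonly
from the test patch earlier in this series:

    /* Sketch: throttle writes only; passing NULL for the read callback
     * means no read timer is created at all. */
    static void setup_write_only_throttling(AioContext *my_ctx,
                                            QEMUTimerCB *my_write_cb,
                                            void *opaque)
    {
        static ThrottleState ts;
        static ThrottleTimers tt;

        throttle_init(&ts);
        throttle_timers_init(&tt, my_ctx, QEMU_CLOCK_VIRTUAL,
                             NULL,           /* read direction unused */
                             my_write_cb,    /* write direction throttled */
                             opaque);
    }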
16
17
Adjust the various Docker/Travis scripts to use libssh when available
18
instead of libssh2. The mingw/mxe testing is dropped for now, as there
19
are no packages for it.
20
21
Signed-off-by: Pino Toscano <ptoscano@redhat.com>
22
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
23
Acked-by: Alex Bennée <alex.bennee@linaro.org>
24
Message-id: 20190620200840.17655-1-ptoscano@redhat.com
25
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
26
Message-id: 5873173.t2JhDm7DL7@lindworm.usersys.redhat.com
27
Signed-off-by: Max Reitz <mreitz@redhat.com>
17
---
28
---
18
util/throttle.c | 42 ++++++++++++++++++++++++++++--------------
29
configure | 65 +-
19
1 file changed, 28 insertions(+), 14 deletions(-)
30
block/Makefile.objs | 6 +-
31
block/ssh.c | 652 ++++++++++--------
32
.travis.yml | 4 +-
33
block/trace-events | 14 +-
34
docs/qemu-block-drivers.texi | 2 +-
35
.../dockerfiles/debian-win32-cross.docker | 1 -
36
.../dockerfiles/debian-win64-cross.docker | 1 -
37
tests/docker/dockerfiles/fedora.docker | 4 +-
38
tests/docker/dockerfiles/ubuntu.docker | 2 +-
39
tests/docker/dockerfiles/ubuntu1804.docker | 2 +-
40
tests/qemu-iotests/207 | 54 +-
41
tests/qemu-iotests/207.out | 2 +-
42
13 files changed, 449 insertions(+), 360 deletions(-)
20
43
21
diff --git a/util/throttle.c b/util/throttle.c
44
diff --git a/configure b/configure
45
index XXXXXXX..XXXXXXX 100755
46
--- a/configure
47
+++ b/configure
48
@@ -XXX,XX +XXX,XX @@ auth_pam=""
49
vte=""
50
virglrenderer=""
51
tpm=""
52
-libssh2=""
53
+libssh=""
54
live_block_migration="yes"
55
numa=""
56
tcmalloc="no"
57
@@ -XXX,XX +XXX,XX @@ for opt do
58
;;
59
--enable-tpm) tpm="yes"
60
;;
61
- --disable-libssh2) libssh2="no"
62
+ --disable-libssh) libssh="no"
63
;;
64
- --enable-libssh2) libssh2="yes"
65
+ --enable-libssh) libssh="yes"
66
;;
67
--disable-live-block-migration) live_block_migration="no"
68
;;
69
@@ -XXX,XX +XXX,XX @@ disabled with --disable-FEATURE, default is enabled if available:
70
coroutine-pool coroutine freelist (better performance)
71
glusterfs GlusterFS backend
72
tpm TPM support
73
- libssh2 ssh block device support
74
+ libssh ssh block device support
75
numa libnuma support
76
libxml2 for Parallels image format
77
tcmalloc tcmalloc support
78
@@ -XXX,XX +XXX,XX @@ EOF
79
fi
80
81
##########################################
82
-# libssh2 probe
83
-min_libssh2_version=1.2.8
84
-if test "$libssh2" != "no" ; then
85
- if $pkg_config --atleast-version=$min_libssh2_version libssh2; then
86
- libssh2_cflags=$($pkg_config libssh2 --cflags)
87
- libssh2_libs=$($pkg_config libssh2 --libs)
88
- libssh2=yes
89
+# libssh probe
90
+if test "$libssh" != "no" ; then
91
+ if $pkg_config --exists libssh; then
92
+ libssh_cflags=$($pkg_config libssh --cflags)
93
+ libssh_libs=$($pkg_config libssh --libs)
94
+ libssh=yes
95
else
96
- if test "$libssh2" = "yes" ; then
97
- error_exit "libssh2 >= $min_libssh2_version required for --enable-libssh2"
98
+ if test "$libssh" = "yes" ; then
99
+ error_exit "libssh required for --enable-libssh"
100
fi
101
- libssh2=no
102
+ libssh=no
103
fi
104
fi
105
106
##########################################
107
-# libssh2_sftp_fsync probe
108
+# Check for libssh 0.8
109
+# This is done like this instead of using the LIBSSH_VERSION_* and
110
+# SSH_VERSION_* macros because some distributions in the past shipped
111
+# snapshots of the future 0.8 from Git, and those snapshots did not
112
+# have updated version numbers (still referring to 0.7.0).
113
114
-if test "$libssh2" = "yes"; then
115
+if test "$libssh" = "yes"; then
116
cat > $TMPC <<EOF
117
-#include <stdio.h>
118
-#include <libssh2.h>
119
-#include <libssh2_sftp.h>
120
-int main(void) {
121
- LIBSSH2_SESSION *session;
122
- LIBSSH2_SFTP *sftp;
123
- LIBSSH2_SFTP_HANDLE *sftp_handle;
124
- session = libssh2_session_init ();
125
- sftp = libssh2_sftp_init (session);
126
- sftp_handle = libssh2_sftp_open (sftp, "/", 0, 0);
127
- libssh2_sftp_fsync (sftp_handle);
128
- return 0;
129
-}
130
+#include <libssh/libssh.h>
131
+int main(void) { return ssh_get_server_publickey(NULL, NULL); }
132
EOF
133
- # libssh2_cflags/libssh2_libs defined in previous test.
134
- if compile_prog "$libssh2_cflags" "$libssh2_libs" ; then
135
- QEMU_CFLAGS="-DHAS_LIBSSH2_SFTP_FSYNC $QEMU_CFLAGS"
136
+ if compile_prog "$libssh_cflags" "$libssh_libs"; then
137
+ libssh_cflags="-DHAVE_LIBSSH_0_8 $libssh_cflags"
138
fi
139
fi
140
141
@@ -XXX,XX +XXX,XX @@ echo "GlusterFS support $glusterfs"
142
echo "gcov $gcov_tool"
143
echo "gcov enabled $gcov"
144
echo "TPM support $tpm"
145
-echo "libssh2 support $libssh2"
146
+echo "libssh support $libssh"
147
echo "QOM debugging $qom_cast_debug"
148
echo "Live block migration $live_block_migration"
149
echo "lzo support $lzo"
150
@@ -XXX,XX +XXX,XX @@ if test "$glusterfs_iocb_has_stat" = "yes" ; then
151
echo "CONFIG_GLUSTERFS_IOCB_HAS_STAT=y" >> $config_host_mak
152
fi
153
154
-if test "$libssh2" = "yes" ; then
155
- echo "CONFIG_LIBSSH2=m" >> $config_host_mak
156
- echo "LIBSSH2_CFLAGS=$libssh2_cflags" >> $config_host_mak
157
- echo "LIBSSH2_LIBS=$libssh2_libs" >> $config_host_mak
158
+if test "$libssh" = "yes" ; then
159
+ echo "CONFIG_LIBSSH=m" >> $config_host_mak
160
+ echo "LIBSSH_CFLAGS=$libssh_cflags" >> $config_host_mak
161
+ echo "LIBSSH_LIBS=$libssh_libs" >> $config_host_mak
162
fi
163
164
if test "$live_block_migration" = "yes" ; then
165
diff --git a/block/Makefile.objs b/block/Makefile.objs
22
index XXXXXXX..XXXXXXX 100644
166
index XXXXXXX..XXXXXXX 100644
23
--- a/util/throttle.c
167
--- a/block/Makefile.objs
24
+++ b/util/throttle.c
168
+++ b/block/Makefile.objs
25
@@ -XXX,XX +XXX,XX @@ static bool throttle_compute_timer(ThrottleState *ts,
169
@@ -XXX,XX +XXX,XX @@ block-obj-$(CONFIG_CURL) += curl.o
26
void throttle_timers_attach_aio_context(ThrottleTimers *tt,
170
block-obj-$(CONFIG_RBD) += rbd.o
27
AioContext *new_context)
171
block-obj-$(CONFIG_GLUSTERFS) += gluster.o
172
block-obj-$(CONFIG_VXHS) += vxhs.o
173
-block-obj-$(CONFIG_LIBSSH2) += ssh.o
174
+block-obj-$(CONFIG_LIBSSH) += ssh.o
175
block-obj-y += accounting.o dirty-bitmap.o
176
block-obj-y += write-threshold.o
177
block-obj-y += backup.o
178
@@ -XXX,XX +XXX,XX @@ rbd.o-libs := $(RBD_LIBS)
179
gluster.o-cflags := $(GLUSTERFS_CFLAGS)
180
gluster.o-libs := $(GLUSTERFS_LIBS)
181
vxhs.o-libs := $(VXHS_LIBS)
182
-ssh.o-cflags := $(LIBSSH2_CFLAGS)
183
-ssh.o-libs := $(LIBSSH2_LIBS)
184
+ssh.o-cflags := $(LIBSSH_CFLAGS)
185
+ssh.o-libs := $(LIBSSH_LIBS)
186
block-obj-dmg-bz2-$(CONFIG_BZIP2) += dmg-bz2.o
187
block-obj-$(if $(CONFIG_DMG),m,n) += $(block-obj-dmg-bz2-y)
188
dmg-bz2.o-libs := $(BZIP2_LIBS)
189
diff --git a/block/ssh.c b/block/ssh.c
190
index XXXXXXX..XXXXXXX 100644
191
--- a/block/ssh.c
192
+++ b/block/ssh.c
193
@@ -XXX,XX +XXX,XX @@
194
195
#include "qemu/osdep.h"
196
197
-#include <libssh2.h>
198
-#include <libssh2_sftp.h>
199
+#include <libssh/libssh.h>
200
+#include <libssh/sftp.h>
201
202
#include "block/block_int.h"
203
#include "block/qdict.h"
204
@@ -XXX,XX +XXX,XX @@
205
#include "trace.h"
206
207
/*
208
- * TRACE_LIBSSH2=<bitmask> enables tracing in libssh2 itself. Note
209
- * that this requires that libssh2 was specially compiled with the
210
- * `./configure --enable-debug' option, so most likely you will have
211
- * to compile it yourself. The meaning of <bitmask> is described
212
- * here: http://www.libssh2.org/libssh2_trace.html
213
+ * TRACE_LIBSSH=<level> enables tracing in libssh itself.
214
+ * The meaning of <level> is described here:
215
+ * http://api.libssh.org/master/group__libssh__log.html
216
*/
217
-#define TRACE_LIBSSH2 0 /* or try: LIBSSH2_TRACE_SFTP */
218
+#define TRACE_LIBSSH 0 /* see: SSH_LOG_* */
219
220
typedef struct BDRVSSHState {
221
/* Coroutine. */
222
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVSSHState {
223
224
/* SSH connection. */
225
int sock; /* socket */
226
- LIBSSH2_SESSION *session; /* ssh session */
227
- LIBSSH2_SFTP *sftp; /* sftp session */
228
- LIBSSH2_SFTP_HANDLE *sftp_handle; /* sftp remote file handle */
229
+ ssh_session session; /* ssh session */
230
+ sftp_session sftp; /* sftp session */
231
+ sftp_file sftp_handle; /* sftp remote file handle */
232
233
- /* See ssh_seek() function below. */
234
- int64_t offset;
235
- bool offset_op_read;
236
-
237
- /* File attributes at open. We try to keep the .filesize field
238
+ /*
239
+ * File attributes at open. We try to keep the .size field
240
* updated if it changes (eg by writing at the end of the file).
241
*/
242
- LIBSSH2_SFTP_ATTRIBUTES attrs;
243
+ sftp_attributes attrs;
244
245
InetSocketAddress *inet;
246
247
@@ -XXX,XX +XXX,XX @@ static void ssh_state_init(BDRVSSHState *s)
28
{
248
{
29
- tt->timers[THROTTLE_READ] =
249
memset(s, 0, sizeof *s);
30
- aio_timer_new(new_context, tt->clock_type, SCALE_NS,
250
s->sock = -1;
31
- tt->timer_cb[THROTTLE_READ], tt->timer_opaque);
251
- s->offset = -1;
32
- tt->timers[THROTTLE_WRITE] =
252
qemu_co_mutex_init(&s->lock);
33
- aio_timer_new(new_context, tt->clock_type, SCALE_NS,
253
}
34
- tt->timer_cb[THROTTLE_WRITE], tt->timer_opaque);
254
35
+ ThrottleDirection dir;
255
@@ -XXX,XX +XXX,XX @@ static void ssh_state_free(BDRVSSHState *s)
36
+
256
{
37
+ for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) {
257
g_free(s->user);
38
+ if (tt->timer_cb[dir]) {
258
39
+ tt->timers[dir] =
259
+ if (s->attrs) {
40
+ aio_timer_new(new_context, tt->clock_type, SCALE_NS,
260
+ sftp_attributes_free(s->attrs);
41
+ tt->timer_cb[dir], tt->timer_opaque);
261
+ }
262
if (s->sftp_handle) {
263
- libssh2_sftp_close(s->sftp_handle);
264
+ sftp_close(s->sftp_handle);
265
}
266
if (s->sftp) {
267
- libssh2_sftp_shutdown(s->sftp);
268
+ sftp_free(s->sftp);
269
}
270
if (s->session) {
271
- libssh2_session_disconnect(s->session,
272
- "from qemu ssh client: "
273
- "user closed the connection");
274
- libssh2_session_free(s->session);
275
- }
276
- if (s->sock >= 0) {
277
- close(s->sock);
278
+ ssh_disconnect(s->session);
279
+ ssh_free(s->session); /* This frees s->sock */
280
}
281
}
282
283
@@ -XXX,XX +XXX,XX @@ session_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
284
va_end(args);
285
286
if (s->session) {
287
- char *ssh_err;
288
+ const char *ssh_err;
289
int ssh_err_code;
290
291
- /* This is not an errno. See <libssh2.h>. */
292
- ssh_err_code = libssh2_session_last_error(s->session,
293
- &ssh_err, NULL, 0);
294
- error_setg(errp, "%s: %s (libssh2 error code: %d)",
295
+ /* This is not an errno. See <libssh/libssh.h>. */
296
+ ssh_err = ssh_get_error(s->session);
297
+ ssh_err_code = ssh_get_error_code(s->session);
298
+ error_setg(errp, "%s: %s (libssh error code: %d)",
299
msg, ssh_err, ssh_err_code);
300
} else {
301
error_setg(errp, "%s", msg);
302
@@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
303
va_end(args);
304
305
if (s->sftp) {
306
- char *ssh_err;
307
+ const char *ssh_err;
308
int ssh_err_code;
309
- unsigned long sftp_err_code;
310
+ int sftp_err_code;
311
312
- /* This is not an errno. See <libssh2.h>. */
313
- ssh_err_code = libssh2_session_last_error(s->session,
314
- &ssh_err, NULL, 0);
315
- /* See <libssh2_sftp.h>. */
316
- sftp_err_code = libssh2_sftp_last_error((s)->sftp);
317
+ /* This is not an errno. See <libssh/libssh.h>. */
318
+ ssh_err = ssh_get_error(s->session);
319
+ ssh_err_code = ssh_get_error_code(s->session);
320
+ /* See <libssh/sftp.h>. */
321
+ sftp_err_code = sftp_get_error(s->sftp);
322
323
error_setg(errp,
324
- "%s: %s (libssh2 error code: %d, sftp error code: %lu)",
325
+ "%s: %s (libssh error code: %d, sftp error code: %d)",
326
msg, ssh_err, ssh_err_code, sftp_err_code);
327
} else {
328
error_setg(errp, "%s", msg);
329
@@ -XXX,XX +XXX,XX @@ sftp_error_setg(Error **errp, BDRVSSHState *s, const char *fs, ...)
330
331
static void sftp_error_trace(BDRVSSHState *s, const char *op)
332
{
333
- char *ssh_err;
334
+ const char *ssh_err;
335
int ssh_err_code;
336
- unsigned long sftp_err_code;
337
+ int sftp_err_code;
338
339
- /* This is not an errno. See <libssh2.h>. */
340
- ssh_err_code = libssh2_session_last_error(s->session,
341
- &ssh_err, NULL, 0);
342
- /* See <libssh2_sftp.h>. */
343
- sftp_err_code = libssh2_sftp_last_error((s)->sftp);
344
+ /* This is not an errno. See <libssh/libssh.h>. */
345
+ ssh_err = ssh_get_error(s->session);
346
+ ssh_err_code = ssh_get_error_code(s->session);
347
+ /* See <libssh/sftp.h>. */
348
+ sftp_err_code = sftp_get_error(s->sftp);
349
350
trace_sftp_error(op, ssh_err, ssh_err_code, sftp_err_code);
351
}
352
@@ -XXX,XX +XXX,XX @@ static void ssh_parse_filename(const char *filename, QDict *options,
353
parse_uri(filename, options, errp);
354
}
355
356
-static int check_host_key_knownhosts(BDRVSSHState *s,
357
- const char *host, int port, Error **errp)
358
+static int check_host_key_knownhosts(BDRVSSHState *s, Error **errp)
359
{
360
- const char *home;
361
- char *knh_file = NULL;
362
- LIBSSH2_KNOWNHOSTS *knh = NULL;
363
- struct libssh2_knownhost *found;
364
- int ret, r;
365
- const char *hostkey;
366
- size_t len;
367
- int type;
368
-
369
- hostkey = libssh2_session_hostkey(s->session, &len, &type);
370
- if (!hostkey) {
371
+ int ret;
372
+#ifdef HAVE_LIBSSH_0_8
373
+ enum ssh_known_hosts_e state;
374
+ int r;
375
+ ssh_key pubkey;
376
+ enum ssh_keytypes_e pubkey_type;
377
+ unsigned char *server_hash = NULL;
378
+ size_t server_hash_len;
379
+ char *fingerprint = NULL;
380
+
381
+ state = ssh_session_is_known_server(s->session);
382
+ trace_ssh_server_status(state);
383
+
384
+ switch (state) {
385
+ case SSH_KNOWN_HOSTS_OK:
386
+ /* OK */
387
+ trace_ssh_check_host_key_knownhosts();
388
+ break;
389
+ case SSH_KNOWN_HOSTS_CHANGED:
390
ret = -EINVAL;
391
- session_error_setg(errp, s, "failed to read remote host key");
392
+ r = ssh_get_server_publickey(s->session, &pubkey);
393
+ if (r == 0) {
394
+ r = ssh_get_publickey_hash(pubkey, SSH_PUBLICKEY_HASH_SHA256,
395
+ &server_hash, &server_hash_len);
396
+ pubkey_type = ssh_key_type(pubkey);
397
+ ssh_key_free(pubkey);
398
+ }
399
+ if (r == 0) {
400
+ fingerprint = ssh_get_fingerprint_hash(SSH_PUBLICKEY_HASH_SHA256,
401
+ server_hash,
402
+ server_hash_len);
403
+ ssh_clean_pubkey_hash(&server_hash);
404
+ }
405
+ if (fingerprint) {
406
+ error_setg(errp,
407
+ "host key (%s key with fingerprint %s) does not match "
408
+ "the one in known_hosts; this may be a possible attack",
409
+ ssh_key_type_to_char(pubkey_type), fingerprint);
410
+ ssh_string_free_char(fingerprint);
411
+ } else {
412
+ error_setg(errp,
413
+ "host key does not match the one in known_hosts; this "
414
+ "may be a possible attack");
415
+ }
416
goto out;
417
- }
418
-
419
- knh = libssh2_knownhost_init(s->session);
420
- if (!knh) {
421
+ case SSH_KNOWN_HOSTS_OTHER:
422
ret = -EINVAL;
423
- session_error_setg(errp, s,
424
- "failed to initialize known hosts support");
425
+ error_setg(errp,
426
+ "host key for this server not found, another type exists");
427
+ goto out;
428
+ case SSH_KNOWN_HOSTS_UNKNOWN:
429
+ ret = -EINVAL;
430
+ error_setg(errp, "no host key was found in known_hosts");
431
+ goto out;
432
+ case SSH_KNOWN_HOSTS_NOT_FOUND:
433
+ ret = -ENOENT;
434
+ error_setg(errp, "known_hosts file not found");
435
+ goto out;
436
+ case SSH_KNOWN_HOSTS_ERROR:
437
+ ret = -EINVAL;
438
+ error_setg(errp, "error while checking the host");
439
+ goto out;
440
+ default:
441
+ ret = -EINVAL;
442
+ error_setg(errp, "error while checking for known server (%d)", state);
443
goto out;
444
}
445
+#else /* !HAVE_LIBSSH_0_8 */
446
+ int state;
447
448
- home = getenv("HOME");
449
- if (home) {
450
- knh_file = g_strdup_printf("%s/.ssh/known_hosts", home);
451
- } else {
452
- knh_file = g_strdup_printf("/root/.ssh/known_hosts");
453
- }
454
-
455
- /* Read all known hosts from OpenSSH-style known_hosts file. */
456
- libssh2_knownhost_readfile(knh, knh_file, LIBSSH2_KNOWNHOST_FILE_OPENSSH);
457
+ state = ssh_is_server_known(s->session);
458
+ trace_ssh_server_status(state);
459
460
- r = libssh2_knownhost_checkp(knh, host, port, hostkey, len,
461
- LIBSSH2_KNOWNHOST_TYPE_PLAIN|
462
- LIBSSH2_KNOWNHOST_KEYENC_RAW,
463
- &found);
464
- switch (r) {
465
- case LIBSSH2_KNOWNHOST_CHECK_MATCH:
466
+ switch (state) {
467
+ case SSH_SERVER_KNOWN_OK:
468
/* OK */
469
- trace_ssh_check_host_key_knownhosts(found->key);
470
+ trace_ssh_check_host_key_knownhosts();
471
break;
472
- case LIBSSH2_KNOWNHOST_CHECK_MISMATCH:
473
+ case SSH_SERVER_KNOWN_CHANGED:
474
ret = -EINVAL;
475
- session_error_setg(errp, s,
476
- "host key does not match the one in known_hosts"
477
- " (found key %s)", found->key);
478
+ error_setg(errp,
479
+ "host key does not match the one in known_hosts; this "
480
+ "may be a possible attack");
481
goto out;
482
- case LIBSSH2_KNOWNHOST_CHECK_NOTFOUND:
483
+ case SSH_SERVER_FOUND_OTHER:
484
ret = -EINVAL;
485
- session_error_setg(errp, s, "no host key was found in known_hosts");
486
+ error_setg(errp,
487
+ "host key for this server not found, another type exists");
488
+ goto out;
489
+ case SSH_SERVER_FILE_NOT_FOUND:
490
+ ret = -ENOENT;
491
+ error_setg(errp, "known_hosts file not found");
492
goto out;
493
- case LIBSSH2_KNOWNHOST_CHECK_FAILURE:
494
+ case SSH_SERVER_NOT_KNOWN:
495
ret = -EINVAL;
496
- session_error_setg(errp, s,
497
- "failure matching the host key with known_hosts");
498
+ error_setg(errp, "no host key was found in known_hosts");
499
+ goto out;
500
+ case SSH_SERVER_ERROR:
501
+ ret = -EINVAL;
502
+ error_setg(errp, "server error");
503
goto out;
504
default:
505
ret = -EINVAL;
506
- session_error_setg(errp, s, "unknown error matching the host key"
507
- " with known_hosts (%d)", r);
508
+ error_setg(errp, "error while checking for known server (%d)", state);
509
goto out;
510
}
511
+#endif /* !HAVE_LIBSSH_0_8 */
512
513
/* known_hosts checking successful. */
514
ret = 0;
515
516
out:
517
- if (knh != NULL) {
518
- libssh2_knownhost_free(knh);
519
- }
520
- g_free(knh_file);
521
return ret;
522
}
523
524
@@ -XXX,XX +XXX,XX @@ static int compare_fingerprint(const unsigned char *fingerprint, size_t len,
525
526
static int
527
check_host_key_hash(BDRVSSHState *s, const char *hash,
528
- int hash_type, size_t fingerprint_len, Error **errp)
529
+ enum ssh_publickey_hash_type type, Error **errp)
530
{
531
- const char *fingerprint;
532
-
533
- fingerprint = libssh2_hostkey_hash(s->session, hash_type);
534
- if (!fingerprint) {
535
+ int r;
536
+ ssh_key pubkey;
537
+ unsigned char *server_hash;
538
+ size_t server_hash_len;
539
+
540
+#ifdef HAVE_LIBSSH_0_8
541
+ r = ssh_get_server_publickey(s->session, &pubkey);
542
+#else
543
+ r = ssh_get_publickey(s->session, &pubkey);
544
+#endif
545
+ if (r != SSH_OK) {
546
session_error_setg(errp, s, "failed to read remote host key");
547
return -EINVAL;
548
}
549
550
- if(compare_fingerprint((unsigned char *) fingerprint, fingerprint_len,
551
- hash) != 0) {
552
+ r = ssh_get_publickey_hash(pubkey, type, &server_hash, &server_hash_len);
553
+ ssh_key_free(pubkey);
554
+ if (r != 0) {
555
+ session_error_setg(errp, s,
556
+ "failed reading the hash of the server SSH key");
557
+ return -EINVAL;
558
+ }
559
+
560
+ r = compare_fingerprint(server_hash, server_hash_len, hash);
561
+ ssh_clean_pubkey_hash(&server_hash);
562
+ if (r != 0) {
563
error_setg(errp, "remote host key does not match host_key_check '%s'",
564
hash);
565
return -EPERM;
566
@@ -XXX,XX +XXX,XX @@ check_host_key_hash(BDRVSSHState *s, const char *hash,
567
return 0;
568
}
569
570
-static int check_host_key(BDRVSSHState *s, const char *host, int port,
571
- SshHostKeyCheck *hkc, Error **errp)
572
+static int check_host_key(BDRVSSHState *s, SshHostKeyCheck *hkc, Error **errp)
573
{
574
SshHostKeyCheckMode mode;
575
576
@@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port,
577
case SSH_HOST_KEY_CHECK_MODE_HASH:
578
if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_MD5) {
579
return check_host_key_hash(s, hkc->u.hash.hash,
580
- LIBSSH2_HOSTKEY_HASH_MD5, 16, errp);
581
+ SSH_PUBLICKEY_HASH_MD5, errp);
582
} else if (hkc->u.hash.type == SSH_HOST_KEY_CHECK_HASH_TYPE_SHA1) {
583
return check_host_key_hash(s, hkc->u.hash.hash,
584
- LIBSSH2_HOSTKEY_HASH_SHA1, 20, errp);
585
+ SSH_PUBLICKEY_HASH_SHA1, errp);
586
}
587
g_assert_not_reached();
588
break;
589
case SSH_HOST_KEY_CHECK_MODE_KNOWN_HOSTS:
590
- return check_host_key_knownhosts(s, host, port, errp);
591
+ return check_host_key_knownhosts(s, errp);
592
default:
593
g_assert_not_reached();
594
}
595
@@ -XXX,XX +XXX,XX @@ static int check_host_key(BDRVSSHState *s, const char *host, int port,
596
return -EINVAL;
597
}
598
599
-static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
600
+static int authenticate(BDRVSSHState *s, Error **errp)
601
{
602
int r, ret;
603
- const char *userauthlist;
604
- LIBSSH2_AGENT *agent = NULL;
605
- struct libssh2_agent_publickey *identity;
606
- struct libssh2_agent_publickey *prev_identity = NULL;
607
+ int method;
608
609
- userauthlist = libssh2_userauth_list(s->session, user, strlen(user));
610
- if (strstr(userauthlist, "publickey") == NULL) {
611
+ /* Try to authenticate with the "none" method. */
612
+ r = ssh_userauth_none(s->session, NULL);
613
+ if (r == SSH_AUTH_ERROR) {
614
ret = -EPERM;
615
- error_setg(errp,
616
- "remote server does not support \"publickey\" authentication");
617
+ session_error_setg(errp, s, "failed to authenticate using none "
618
+ "authentication");
619
goto out;
620
- }
621
-
622
- /* Connect to ssh-agent and try each identity in turn. */
623
- agent = libssh2_agent_init(s->session);
624
- if (!agent) {
625
- ret = -EINVAL;
626
- session_error_setg(errp, s, "failed to initialize ssh-agent support");
627
- goto out;
628
- }
629
- if (libssh2_agent_connect(agent)) {
630
- ret = -ECONNREFUSED;
631
- session_error_setg(errp, s, "failed to connect to ssh-agent");
632
- goto out;
633
- }
634
- if (libssh2_agent_list_identities(agent)) {
635
- ret = -EINVAL;
636
- session_error_setg(errp, s,
637
- "failed requesting identities from ssh-agent");
638
+ } else if (r == SSH_AUTH_SUCCESS) {
639
+ /* Authenticated! */
640
+ ret = 0;
641
goto out;
642
}
643
644
- for(;;) {
645
- r = libssh2_agent_get_identity(agent, &identity, prev_identity);
646
- if (r == 1) { /* end of list */
647
- break;
648
- }
649
- if (r < 0) {
650
+ method = ssh_userauth_list(s->session, NULL);
651
+ trace_ssh_auth_methods(method);
652
+
653
+ /*
654
+ * Try to authenticate with publickey, using the ssh-agent
655
+ * if available.
656
+ */
657
+ if (method & SSH_AUTH_METHOD_PUBLICKEY) {
658
+ r = ssh_userauth_publickey_auto(s->session, NULL, NULL);
659
+ if (r == SSH_AUTH_ERROR) {
660
ret = -EINVAL;
661
- session_error_setg(errp, s,
662
- "failed to obtain identity from ssh-agent");
663
+ session_error_setg(errp, s, "failed to authenticate using "
664
+ "publickey authentication");
665
goto out;
666
- }
667
- r = libssh2_agent_userauth(agent, user, identity);
668
- if (r == 0) {
669
+ } else if (r == SSH_AUTH_SUCCESS) {
670
/* Authenticated! */
671
ret = 0;
672
goto out;
673
}
674
- /* Failed to authenticate with this identity, try the next one. */
675
- prev_identity = identity;
676
}
677
678
ret = -EPERM;
679
@@ -XXX,XX +XXX,XX @@ static int authenticate(BDRVSSHState *s, const char *user, Error **errp)
680
"and the identities held by your ssh-agent");
681
682
out:
683
- if (agent != NULL) {
684
- /* Note: libssh2 implementation implicitly calls
685
- * libssh2_agent_disconnect if necessary.
686
- */
687
- libssh2_agent_free(agent);
688
- }
689
-
690
return ret;
691
}
692
693
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
694
int ssh_flags, int creat_mode, Error **errp)
695
{
696
int r, ret;
697
- long port = 0;
698
+ unsigned int port = 0;
699
+ int new_sock = -1;
700
701
if (opts->has_user) {
702
s->user = g_strdup(opts->user);
703
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
704
s->inet = opts->server;
705
opts->server = NULL;
706
707
- if (qemu_strtol(s->inet->port, NULL, 10, &port) < 0) {
708
+ if (qemu_strtoui(s->inet->port, NULL, 10, &port) < 0) {
709
error_setg(errp, "Use only numeric port value");
710
ret = -EINVAL;
711
goto err;
712
}
713
714
/* Open the socket and connect. */
715
- s->sock = inet_connect_saddr(s->inet, errp);
716
- if (s->sock < 0) {
717
+ new_sock = inet_connect_saddr(s->inet, errp);
718
+ if (new_sock < 0) {
719
ret = -EIO;
720
goto err;
721
}
722
723
+ /*
724
+ * Try to disable the Nagle algorithm on TCP sockets to reduce latency,
725
+ * but do not fail if it cannot be disabled.
726
+ */
727
+ r = socket_set_nodelay(new_sock);
728
+ if (r < 0) {
729
+ warn_report("can't set TCP_NODELAY for the ssh server %s: %s",
730
+ s->inet->host, strerror(errno));
731
+ }
732
+
733
/* Create SSH session. */
734
- s->session = libssh2_session_init();
735
+ s->session = ssh_new();
736
if (!s->session) {
737
ret = -EINVAL;
738
- session_error_setg(errp, s, "failed to initialize libssh2 session");
739
+ session_error_setg(errp, s, "failed to initialize libssh session");
740
goto err;
741
}
742
743
-#if TRACE_LIBSSH2 != 0
744
- libssh2_trace(s->session, TRACE_LIBSSH2);
745
-#endif
746
+ /*
747
+ * Make sure we are in blocking mode during the connection and
748
+ * authentication phases.
749
+ */
750
+ ssh_set_blocking(s->session, 1);
751
752
- r = libssh2_session_handshake(s->session, s->sock);
753
- if (r != 0) {
754
+ r = ssh_options_set(s->session, SSH_OPTIONS_USER, s->user);
755
+ if (r < 0) {
756
+ ret = -EINVAL;
757
+ session_error_setg(errp, s,
758
+ "failed to set the user in the libssh session");
759
+ goto err;
760
+ }
761
+
762
+ r = ssh_options_set(s->session, SSH_OPTIONS_HOST, s->inet->host);
763
+ if (r < 0) {
764
+ ret = -EINVAL;
765
+ session_error_setg(errp, s,
766
+ "failed to set the host in the libssh session");
767
+ goto err;
768
+ }
769
+
770
+ if (port > 0) {
771
+ r = ssh_options_set(s->session, SSH_OPTIONS_PORT, &port);
772
+ if (r < 0) {
773
+ ret = -EINVAL;
774
+ session_error_setg(errp, s,
775
+ "failed to set the port in the libssh session");
776
+ goto err;
42
+ }
777
+ }
43
+ }
778
+ }
779
+
780
+ r = ssh_options_set(s->session, SSH_OPTIONS_COMPRESSION, "none");
781
+ if (r < 0) {
782
+ ret = -EINVAL;
783
+ session_error_setg(errp, s,
784
+ "failed to disable the compression in the libssh "
785
+ "session");
786
+ goto err;
787
+ }
788
+
789
+ /* Read ~/.ssh/config. */
790
+ r = ssh_options_parse_config(s->session, NULL);
791
+ if (r < 0) {
792
+ ret = -EINVAL;
793
+ session_error_setg(errp, s, "failed to parse ~/.ssh/config");
794
+ goto err;
795
+ }
796
+
797
+ r = ssh_options_set(s->session, SSH_OPTIONS_FD, &new_sock);
798
+ if (r < 0) {
799
+ ret = -EINVAL;
800
+ session_error_setg(errp, s,
801
+ "failed to set the socket in the libssh session");
802
+ goto err;
803
+ }
804
+ /* libssh took ownership of the socket. */
805
+ s->sock = new_sock;
806
+ new_sock = -1;
807
+
808
+ /* Connect. */
809
+ r = ssh_connect(s->session);
810
+ if (r != SSH_OK) {
811
ret = -EINVAL;
812
session_error_setg(errp, s, "failed to establish SSH session");
813
goto err;
814
}
815
816
/* Check the remote host's key against known_hosts. */
817
- ret = check_host_key(s, s->inet->host, port, opts->host_key_check, errp);
818
+ ret = check_host_key(s, opts->host_key_check, errp);
819
if (ret < 0) {
820
goto err;
821
}
822
823
/* Authenticate. */
824
- ret = authenticate(s, s->user, errp);
825
+ ret = authenticate(s, errp);
826
if (ret < 0) {
827
goto err;
828
}
829
830
/* Start SFTP. */
831
- s->sftp = libssh2_sftp_init(s->session);
832
+ s->sftp = sftp_new(s->session);
833
if (!s->sftp) {
834
- session_error_setg(errp, s, "failed to initialize sftp handle");
835
+ session_error_setg(errp, s, "failed to create sftp handle");
836
+ ret = -EINVAL;
837
+ goto err;
838
+ }
839
+
840
+ r = sftp_init(s->sftp);
841
+ if (r < 0) {
842
+ sftp_error_setg(errp, s, "failed to initialize sftp handle");
843
ret = -EINVAL;
844
goto err;
845
}
846
847
/* Open the remote file. */
848
trace_ssh_connect_to_ssh(opts->path, ssh_flags, creat_mode);
849
- s->sftp_handle = libssh2_sftp_open(s->sftp, opts->path, ssh_flags,
850
- creat_mode);
851
+ s->sftp_handle = sftp_open(s->sftp, opts->path, ssh_flags, creat_mode);
852
if (!s->sftp_handle) {
853
- session_error_setg(errp, s, "failed to open remote file '%s'",
854
- opts->path);
855
+ sftp_error_setg(errp, s, "failed to open remote file '%s'",
856
+ opts->path);
857
ret = -EINVAL;
858
goto err;
859
}
860
861
- r = libssh2_sftp_fstat(s->sftp_handle, &s->attrs);
862
- if (r < 0) {
863
+ /* Make sure the SFTP file is handled in blocking mode. */
864
+ sftp_file_set_blocking(s->sftp_handle);
865
+
866
+ s->attrs = sftp_fstat(s->sftp_handle);
867
+ if (!s->attrs) {
868
sftp_error_setg(errp, s, "failed to read file attributes");
869
return -EINVAL;
870
}
871
@@ -XXX,XX +XXX,XX @@ static int connect_to_ssh(BDRVSSHState *s, BlockdevOptionsSsh *opts,
872
return 0;
873
874
err:
875
+ if (s->attrs) {
876
+ sftp_attributes_free(s->attrs);
877
+ }
878
+ s->attrs = NULL;
879
if (s->sftp_handle) {
880
- libssh2_sftp_close(s->sftp_handle);
881
+ sftp_close(s->sftp_handle);
882
}
883
s->sftp_handle = NULL;
884
if (s->sftp) {
885
- libssh2_sftp_shutdown(s->sftp);
886
+ sftp_free(s->sftp);
887
}
888
s->sftp = NULL;
889
if (s->session) {
890
- libssh2_session_disconnect(s->session,
891
- "from qemu ssh client: "
892
- "error opening connection");
893
- libssh2_session_free(s->session);
894
+ ssh_disconnect(s->session);
895
+ ssh_free(s->session);
896
}
897
s->session = NULL;
898
+ s->sock = -1;
899
+ if (new_sock >= 0) {
900
+ close(new_sock);
901
+ }
902
903
return ret;
44
}
904
}
45
905
@@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
46
/*
906
47
@@ -XXX,XX +XXX,XX @@ void throttle_timers_init(ThrottleTimers *tt,
907
ssh_state_init(s);
48
QEMUTimerCB *write_timer_cb,
908
49
void *timer_opaque)
909
- ssh_flags = LIBSSH2_FXF_READ;
910
+ ssh_flags = 0;
911
if (bdrv_flags & BDRV_O_RDWR) {
912
- ssh_flags |= LIBSSH2_FXF_WRITE;
913
+ ssh_flags |= O_RDWR;
914
+ } else {
915
+ ssh_flags |= O_RDONLY;
916
}
917
918
opts = ssh_parse_options(options, errp);
919
@@ -XXX,XX +XXX,XX @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags,
920
}
921
922
/* Go non-blocking. */
923
- libssh2_session_set_blocking(s->session, 0);
924
+ ssh_set_blocking(s->session, 0);
925
926
qapi_free_BlockdevOptionsSsh(opts);
927
928
return 0;
929
930
err:
931
- if (s->sock >= 0) {
932
- close(s->sock);
933
- }
934
- s->sock = -1;
935
-
936
qapi_free_BlockdevOptionsSsh(opts);
937
938
return ret;
939
@@ -XXX,XX +XXX,XX @@ static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)
50
{
940
{
51
+ assert(read_timer_cb || write_timer_cb);
941
ssize_t ret;
52
memset(tt, 0, sizeof(ThrottleTimers));
942
char c[1] = { '\0' };
53
943
- int was_blocking = libssh2_session_get_blocking(s->session);
54
tt->clock_type = clock_type;
944
+ int was_blocking = ssh_is_blocking(s->session);
55
@@ -XXX,XX +XXX,XX @@ void throttle_timers_init(ThrottleTimers *tt,
945
56
/* destroy a timer */
946
/* offset must be strictly greater than the current size so we do
57
static void throttle_timer_destroy(QEMUTimer **timer)
947
* not overwrite anything */
948
- assert(offset > 0 && offset > s->attrs.filesize);
949
+ assert(offset > 0 && offset > s->attrs->size);
950
951
- libssh2_session_set_blocking(s->session, 1);
952
+ ssh_set_blocking(s->session, 1);
953
954
- libssh2_sftp_seek64(s->sftp_handle, offset - 1);
955
- ret = libssh2_sftp_write(s->sftp_handle, c, 1);
956
+ sftp_seek64(s->sftp_handle, offset - 1);
957
+ ret = sftp_write(s->sftp_handle, c, 1);
958
959
- libssh2_session_set_blocking(s->session, was_blocking);
960
+ ssh_set_blocking(s->session, was_blocking);
961
962
if (ret < 0) {
963
sftp_error_setg(errp, s, "Failed to grow file");
964
return -EIO;
965
}
966
967
- s->attrs.filesize = offset;
968
+ s->attrs->size = offset;
969
return 0;
970
}
971
972
@@ -XXX,XX +XXX,XX @@ static int ssh_co_create(BlockdevCreateOptions *options, Error **errp)
973
ssh_state_init(&s);
974
975
ret = connect_to_ssh(&s, opts->location,
976
- LIBSSH2_FXF_READ|LIBSSH2_FXF_WRITE|
977
- LIBSSH2_FXF_CREAT|LIBSSH2_FXF_TRUNC,
978
+ O_RDWR | O_CREAT | O_TRUNC,
979
0644, errp);
980
if (ret < 0) {
981
goto fail;
982
@@ -XXX,XX +XXX,XX @@ static int ssh_has_zero_init(BlockDriverState *bs)
983
/* Assume false, unless we can positively prove it's true. */
984
int has_zero_init = 0;
985
986
- if (s->attrs.flags & LIBSSH2_SFTP_ATTR_PERMISSIONS) {
987
- if (s->attrs.permissions & LIBSSH2_SFTP_S_IFREG) {
988
- has_zero_init = 1;
989
- }
990
+ if (s->attrs->type == SSH_FILEXFER_TYPE_REGULAR) {
991
+ has_zero_init = 1;
992
}
993
994
return has_zero_init;
995
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
996
.co = qemu_coroutine_self()
997
};
998
999
- r = libssh2_session_block_directions(s->session);
1000
+ r = ssh_get_poll_flags(s->session);
1001
1002
- if (r & LIBSSH2_SESSION_BLOCK_INBOUND) {
1003
+ if (r & SSH_READ_PENDING) {
1004
rd_handler = restart_coroutine;
1005
}
1006
- if (r & LIBSSH2_SESSION_BLOCK_OUTBOUND) {
1007
+ if (r & SSH_WRITE_PENDING) {
1008
wr_handler = restart_coroutine;
1009
}
1010
1011
@@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs)
1012
trace_ssh_co_yield_back(s->sock);
1013
}
1014
1015
-/* SFTP has a function `libssh2_sftp_seek64' which seeks to a position
1016
- * in the remote file. Notice that it just updates a field in the
1017
- * sftp_handle structure, so there is no network traffic and it cannot
1018
- * fail.
1019
- *
1020
- * However, `libssh2_sftp_seek64' does have a catastrophic effect on
1021
- * performance since it causes the handle to throw away all in-flight
1022
- * reads and buffered readahead data. Therefore this function tries
1023
- * to be intelligent about when to call the underlying libssh2 function.
1024
- */
1025
-#define SSH_SEEK_WRITE 0
1026
-#define SSH_SEEK_READ 1
1027
-#define SSH_SEEK_FORCE 2
1028
-
1029
-static void ssh_seek(BDRVSSHState *s, int64_t offset, int flags)
1030
-{
1031
- bool op_read = (flags & SSH_SEEK_READ) != 0;
1032
- bool force = (flags & SSH_SEEK_FORCE) != 0;
1033
-
1034
- if (force || op_read != s->offset_op_read || offset != s->offset) {
1035
- trace_ssh_seek(offset);
1036
- libssh2_sftp_seek64(s->sftp_handle, offset);
1037
- s->offset = offset;
1038
- s->offset_op_read = op_read;
1039
- }
1040
-}
1041
-
1042
static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
1043
int64_t offset, size_t size,
1044
QEMUIOVector *qiov)
1045
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
1046
1047
trace_ssh_read(offset, size);
1048
1049
- ssh_seek(s, offset, SSH_SEEK_READ);
1050
+ trace_ssh_seek(offset);
1051
+ sftp_seek64(s->sftp_handle, offset);
1052
1053
/* This keeps track of the current iovec element ('i'), where we
1054
* will write to next ('buf'), and the end of the current iovec
1055
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_read(BDRVSSHState *s, BlockDriverState *bs,
1056
buf = i->iov_base;
1057
end_of_vec = i->iov_base + i->iov_len;
1058
1059
- /* libssh2 has a hard-coded limit of 2000 bytes per request,
1060
- * although it will also do readahead behind our backs. Therefore
1061
- * we may have to do repeated reads here until we have read 'size'
1062
- * bytes.
1063
- */
1064
for (got = 0; got < size; ) {
1065
+ size_t request_read_size;
1066
again:
1067
- trace_ssh_read_buf(buf, end_of_vec - buf);
1068
- r = libssh2_sftp_read(s->sftp_handle, buf, end_of_vec - buf);
1069
- trace_ssh_read_return(r);
1070
+ /*
1071
+ * The size of SFTP packets is limited to 32K bytes, so limit
1072
+ * the amount of data requested to 16K, as libssh currently
1073
+ * does not handle multiple requests on its own.
1074
+ */
1075
+ request_read_size = MIN(end_of_vec - buf, 16384);
1076
+ trace_ssh_read_buf(buf, end_of_vec - buf, request_read_size);
1077
+ r = sftp_read(s->sftp_handle, buf, request_read_size);
1078
+ trace_ssh_read_return(r, sftp_get_error(s->sftp));
1079
1080
- if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
1081
+ if (r == SSH_AGAIN) {
1082
co_yield(s, bs);
1083
goto again;
1084
}
1085
- if (r < 0) {
1086
- sftp_error_trace(s, "read");
1087
- s->offset = -1;
1088
- return -EIO;
1089
- }
1090
- if (r == 0) {
1091
+ if (r == SSH_EOF || (r == 0 && sftp_get_error(s->sftp) == SSH_FX_EOF)) {
1092
/* EOF: Short read so pad the buffer with zeroes and return it. */
1093
qemu_iovec_memset(qiov, got, 0, size - got);
1094
return 0;
1095
}
1096
+ if (r <= 0) {
1097
+ sftp_error_trace(s, "read");
1098
+ return -EIO;
1099
+ }
1100
1101
got += r;
1102
buf += r;
1103
- s->offset += r;
1104
if (buf >= end_of_vec && got < size) {
1105
i++;
1106
buf = i->iov_base;
1107
@@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
1108
1109
trace_ssh_write(offset, size);
1110
1111
- ssh_seek(s, offset, SSH_SEEK_WRITE);
1112
+ trace_ssh_seek(offset);
1113
+ sftp_seek64(s->sftp_handle, offset);
1114
1115
/* This keeps track of the current iovec element ('i'), where we
1116
* will read from next ('buf'), and the end of the current iovec
1117
@@ -XXX,XX +XXX,XX @@ static int ssh_write(BDRVSSHState *s, BlockDriverState *bs,
1118
end_of_vec = i->iov_base + i->iov_len;
1119
1120
for (written = 0; written < size; ) {
1121
+ size_t request_write_size;
1122
again:
1123
- trace_ssh_write_buf(buf, end_of_vec - buf);
1124
- r = libssh2_sftp_write(s->sftp_handle, buf, end_of_vec - buf);
1125
- trace_ssh_write_return(r);
1126
+ /*
1127
+ * Avoid too large data packets, as libssh currently does not
1128
+ * handle multiple requests on its own.
1129
+ */
1130
+ request_write_size = MIN(end_of_vec - buf, 131072);
1131
+ trace_ssh_write_buf(buf, end_of_vec - buf, request_write_size);
1132
+ r = sftp_write(s->sftp_handle, buf, request_write_size);
1133
+ trace_ssh_write_return(r, sftp_get_error(s->sftp));
1134
1135
- if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
1136
+ if (r == SSH_AGAIN) {
1137
co_yield(s, bs);
1138
goto again;
1139
}
1140
if (r < 0) {
1141
sftp_error_trace(s, "write");
1142
- s->offset = -1;
1143
return -EIO;
1144
}
1145
- /* The libssh2 API is very unclear about this. A comment in
1146
- * the code says "nothing was acked, and no EAGAIN was
1147
- * received!" which apparently means that no data got sent
1148
- * out, and the underlying channel didn't return any EAGAIN
1149
- * indication. I think this is a bug in either libssh2 or
1150
- * OpenSSH (server-side). In any case, forcing a seek (to
1151
- * discard libssh2 internal buffers), and then trying again
1152
- * works for me.
1153
- */
1154
- if (r == 0) {
1155
- ssh_seek(s, offset + written, SSH_SEEK_WRITE|SSH_SEEK_FORCE);
1156
- co_yield(s, bs);
1157
- goto again;
1158
- }
1159
1160
written += r;
1161
buf += r;
1162
- s->offset += r;
1163
if (buf >= end_of_vec && written < size) {
1164
i++;
1165
buf = i->iov_base;
1166
end_of_vec = i->iov_base + i->iov_len;
1167
}
1168
1169
- if (offset + written > s->attrs.filesize)
1170
- s->attrs.filesize = offset + written;
1171
+ if (offset + written > s->attrs->size) {
1172
+ s->attrs->size = offset + written;
1173
+ }
1174
}
1175
1176
return 0;
1177
@@ -XXX,XX +XXX,XX @@ static void unsafe_flush_warning(BDRVSSHState *s, const char *what)
1178
}
1179
}
1180
1181
-#ifdef HAS_LIBSSH2_SFTP_FSYNC
1182
+#ifdef HAVE_LIBSSH_0_8
1183
1184
static coroutine_fn int ssh_flush(BDRVSSHState *s, BlockDriverState *bs)
58
{
1185
{
59
- assert(*timer != NULL);
1186
int r;
60
+ if (*timer == NULL) {
1187
61
+ return;
1188
trace_ssh_flush();
1189
+
1190
+ if (!sftp_extension_supported(s->sftp, "fsync@openssh.com", "1")) {
1191
+ unsafe_flush_warning(s, "OpenSSH >= 6.3");
1192
+ return 0;
62
+ }
1193
+ }
63
1194
again:
64
timer_free(*timer);
1195
- r = libssh2_sftp_fsync(s->sftp_handle);
65
*timer = NULL;
1196
- if (r == LIBSSH2_ERROR_EAGAIN || r == LIBSSH2_ERROR_TIMEOUT) {
66
@@ -XXX,XX +XXX,XX @@ static void throttle_timer_destroy(QEMUTimer **timer)
1197
+ r = sftp_fsync(s->sftp_handle);
67
/* Remove timers from event loop */
1198
+ if (r == SSH_AGAIN) {
68
void throttle_timers_detach_aio_context(ThrottleTimers *tt)
1199
co_yield(s, bs);
1200
goto again;
1201
}
1202
- if (r == LIBSSH2_ERROR_SFTP_PROTOCOL &&
1203
- libssh2_sftp_last_error(s->sftp) == LIBSSH2_FX_OP_UNSUPPORTED) {
1204
- unsafe_flush_warning(s, "OpenSSH >= 6.3");
1205
- return 0;
1206
- }
1207
if (r < 0) {
1208
sftp_error_trace(s, "fsync");
1209
return -EIO;
1210
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
1211
return ret;
1212
}
1213
1214
-#else /* !HAS_LIBSSH2_SFTP_FSYNC */
1215
+#else /* !HAVE_LIBSSH_0_8 */
1216
1217
static coroutine_fn int ssh_co_flush(BlockDriverState *bs)
69
{
1218
{
70
- int i;
1219
BDRVSSHState *s = bs->opaque;
71
+ ThrottleDirection dir;
1220
72
1221
- unsafe_flush_warning(s, "libssh2 >= 1.4.4");
73
- for (i = 0; i < THROTTLE_MAX; i++) {
1222
+ unsafe_flush_warning(s, "libssh >= 0.8.0");
74
- throttle_timer_destroy(&tt->timers[i]);
1223
return 0;
75
+ for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) {
76
+ throttle_timer_destroy(&tt->timers[dir]);
77
}
78
}
1224
}
79
1225
80
@@ -XXX,XX +XXX,XX @@ void throttle_timers_destroy(ThrottleTimers *tt)
1226
-#endif /* !HAS_LIBSSH2_SFTP_FSYNC */
81
/* is any throttling timer configured */
1227
+#endif /* !HAVE_LIBSSH_0_8 */
82
bool throttle_timers_are_initialized(ThrottleTimers *tt)
1228
1229
static int64_t ssh_getlength(BlockDriverState *bs)
83
{
1230
{
84
- if (tt->timers[0]) {
1231
BDRVSSHState *s = bs->opaque;
85
- return true;
1232
int64_t length;
86
+ ThrottleDirection dir;
1233
87
+
1234
- /* Note we cannot make a libssh2 call here. */
88
+ for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) {
1235
- length = (int64_t) s->attrs.filesize;
89
+ if (tt->timers[dir]) {
1236
+ /* Note we cannot make a libssh call here. */
90
+ return true;
1237
+ length = (int64_t) s->attrs->size;
91
+ }
1238
trace_ssh_getlength(length);
92
}
1239
93
1240
return length;
94
return false;
1241
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn ssh_co_truncate(BlockDriverState *bs, int64_t offset,
95
@@ -XXX,XX +XXX,XX @@ bool throttle_schedule_timer(ThrottleState *ts,
1242
return -ENOTSUP;
1243
}
1244
1245
- if (offset < s->attrs.filesize) {
1246
+ if (offset < s->attrs->size) {
1247
error_setg(errp, "ssh driver does not support shrinking files");
1248
return -ENOTSUP;
1249
}
1250
1251
- if (offset == s->attrs.filesize) {
1252
+ if (offset == s->attrs->size) {
1253
return 0;
1254
}
1255
1256
@@ -XXX,XX +XXX,XX @@ static void bdrv_ssh_init(void)
96
{
1257
{
97
int64_t now = qemu_clock_get_ns(tt->clock_type);
1258
int r;
98
int64_t next_timestamp;
1259
99
+ QEMUTimer *timer;
1260
- r = libssh2_init(0);
100
bool must_wait;
1261
+ r = ssh_init();
101
1262
if (r != 0) {
102
+ timer = is_write ? tt->timers[THROTTLE_WRITE] : tt->timers[THROTTLE_READ];
1263
- fprintf(stderr, "libssh2 initialization failed, %d\n", r);
103
+ assert(timer);
1264
+ fprintf(stderr, "libssh initialization failed, %d\n", r);
104
+
1265
exit(EXIT_FAILURE);
105
must_wait = throttle_compute_timer(ts,
1266
}
106
is_write,
1267
107
now,
1268
+#if TRACE_LIBSSH != 0
108
@@ -XXX,XX +XXX,XX @@ bool throttle_schedule_timer(ThrottleState *ts,
1269
+ ssh_set_log_level(TRACE_LIBSSH);
109
}
1270
+#endif
110
1271
+
111
/* request throttled and timer pending -> do nothing */
1272
bdrv_register(&bdrv_ssh);
112
- if (timer_pending(tt->timers[is_write])) {
113
+ if (timer_pending(timer)) {
114
return true;
115
}
116
117
/* request throttled and timer not pending -> arm timer */
118
- timer_mod(tt->timers[is_write], next_timestamp);
119
+ timer_mod(timer, next_timestamp);
120
return true;
121
}
1273
}
122
1274
1275
diff --git a/.travis.yml b/.travis.yml
1276
index XXXXXXX..XXXXXXX 100644
1277
--- a/.travis.yml
1278
+++ b/.travis.yml
1279
@@ -XXX,XX +XXX,XX @@ addons:
1280
- libseccomp-dev
1281
- libspice-protocol-dev
1282
- libspice-server-dev
1283
- - libssh2-1-dev
1284
+ - libssh-dev
1285
- liburcu-dev
1286
- libusb-1.0-0-dev
1287
- libvte-2.91-dev
1288
@@ -XXX,XX +XXX,XX @@ matrix:
1289
- libseccomp-dev
1290
- libspice-protocol-dev
1291
- libspice-server-dev
1292
- - libssh2-1-dev
1293
+ - libssh-dev
1294
- liburcu-dev
1295
- libusb-1.0-0-dev
1296
- libvte-2.91-dev
1297
diff --git a/block/trace-events b/block/trace-events
1298
index XXXXXXX..XXXXXXX 100644
1299
--- a/block/trace-events
1300
+++ b/block/trace-events
1301
@@ -XXX,XX +XXX,XX @@ nbd_client_connect_success(const char *export_name) "export '%s'"
1302
# ssh.c
1303
ssh_restart_coroutine(void *co) "co=%p"
1304
ssh_flush(void) "fsync"
1305
-ssh_check_host_key_knownhosts(const char *key) "host key OK: %s"
1306
+ssh_check_host_key_knownhosts(void) "host key OK"
1307
ssh_connect_to_ssh(char *path, int flags, int mode) "opening file %s flags=0x%x creat_mode=0%o"
1308
ssh_co_yield(int sock, void *rd_handler, void *wr_handler) "s->sock=%d rd_handler=%p wr_handler=%p"
1309
ssh_co_yield_back(int sock) "s->sock=%d - back"
1310
ssh_getlength(int64_t length) "length=%" PRIi64
1311
ssh_co_create_opts(uint64_t size) "total_size=%" PRIu64
1312
ssh_read(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu"
1313
-ssh_read_buf(void *buf, size_t size) "sftp_read buf=%p size=%zu"
1314
-ssh_read_return(ssize_t ret) "sftp_read returned %zd"
1315
+ssh_read_buf(void *buf, size_t size, size_t actual_size) "sftp_read buf=%p size=%zu (actual size=%zu)"
1316
+ssh_read_return(ssize_t ret, int sftp_err) "sftp_read returned %zd (sftp error=%d)"
1317
ssh_write(int64_t offset, size_t size) "offset=%" PRIi64 " size=%zu"
1318
-ssh_write_buf(void *buf, size_t size) "sftp_write buf=%p size=%zu"
1319
-ssh_write_return(ssize_t ret) "sftp_write returned %zd"
1320
+ssh_write_buf(void *buf, size_t size, size_t actual_size) "sftp_write buf=%p size=%zu (actual size=%zu)"
1321
+ssh_write_return(ssize_t ret, int sftp_err) "sftp_write returned %zd (sftp error=%d)"
1322
ssh_seek(int64_t offset) "seeking to offset=%" PRIi64
1323
+ssh_auth_methods(int methods) "auth methods=0x%x"
1324
+ssh_server_status(int status) "server status=%d"
1325
1326
# curl.c
1327
curl_timer_cb(long timeout_ms) "timer callback timeout_ms %ld"
1328
@@ -XXX,XX +XXX,XX @@ sheepdog_snapshot_create(const char *sn_name, const char *id) "%s %s"
1329
sheepdog_snapshot_create_inode(const char *name, uint32_t snap, uint32_t vdi) "s->inode: name %s snap_id 0x%" PRIx32 " vdi 0x%" PRIx32
1330
1331
# ssh.c
1332
-sftp_error(const char *op, const char *ssh_err, int ssh_err_code, unsigned long sftp_err_code) "%s failed: %s (libssh2 error code: %d, sftp error code: %lu)"
1333
+sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
1334
diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi
1335
index XXXXXXX..XXXXXXX 100644
1336
--- a/docs/qemu-block-drivers.texi
1337
+++ b/docs/qemu-block-drivers.texi
1338
@@ -XXX,XX +XXX,XX @@ print a warning when @code{fsync} is not supported:
1339
1340
warning: ssh server @code{ssh.example.com:22} does not support fsync
1341
1342
-With sufficiently new versions of libssh2 and OpenSSH, @code{fsync} is
1343
+With sufficiently new versions of libssh and OpenSSH, @code{fsync} is
1344
supported.
1345
1346
@node disk_images_nvme
1347
diff --git a/tests/docker/dockerfiles/debian-win32-cross.docker b/tests/docker/dockerfiles/debian-win32-cross.docker
1348
index XXXXXXX..XXXXXXX 100644
1349
--- a/tests/docker/dockerfiles/debian-win32-cross.docker
1350
+++ b/tests/docker/dockerfiles/debian-win32-cross.docker
1351
@@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \
1352
mxe-$TARGET-w64-mingw32.shared-curl \
1353
mxe-$TARGET-w64-mingw32.shared-glib \
1354
mxe-$TARGET-w64-mingw32.shared-libgcrypt \
1355
- mxe-$TARGET-w64-mingw32.shared-libssh2 \
1356
mxe-$TARGET-w64-mingw32.shared-libusb1 \
1357
mxe-$TARGET-w64-mingw32.shared-lzo \
1358
mxe-$TARGET-w64-mingw32.shared-nettle \
1359
diff --git a/tests/docker/dockerfiles/debian-win64-cross.docker b/tests/docker/dockerfiles/debian-win64-cross.docker
1360
index XXXXXXX..XXXXXXX 100644
1361
--- a/tests/docker/dockerfiles/debian-win64-cross.docker
1362
+++ b/tests/docker/dockerfiles/debian-win64-cross.docker
1363
@@ -XXX,XX +XXX,XX @@ RUN DEBIAN_FRONTEND=noninteractive eatmydata \
1364
mxe-$TARGET-w64-mingw32.shared-curl \
1365
mxe-$TARGET-w64-mingw32.shared-glib \
1366
mxe-$TARGET-w64-mingw32.shared-libgcrypt \
1367
- mxe-$TARGET-w64-mingw32.shared-libssh2 \
1368
mxe-$TARGET-w64-mingw32.shared-libusb1 \
1369
mxe-$TARGET-w64-mingw32.shared-lzo \
1370
mxe-$TARGET-w64-mingw32.shared-nettle \
1371
diff --git a/tests/docker/dockerfiles/fedora.docker b/tests/docker/dockerfiles/fedora.docker
1372
index XXXXXXX..XXXXXXX 100644
1373
--- a/tests/docker/dockerfiles/fedora.docker
1374
+++ b/tests/docker/dockerfiles/fedora.docker
1375
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
1376
libpng-devel \
1377
librbd-devel \
1378
libseccomp-devel \
1379
- libssh2-devel \
1380
+ libssh-devel \
1381
libubsan \
1382
libusbx-devel \
1383
libxml2-devel \
1384
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
1385
mingw32-gtk3 \
1386
mingw32-libjpeg-turbo \
1387
mingw32-libpng \
1388
- mingw32-libssh2 \
1389
mingw32-libtasn1 \
1390
mingw32-nettle \
1391
mingw32-pixman \
1392
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES \
1393
mingw64-gtk3 \
1394
mingw64-libjpeg-turbo \
1395
mingw64-libpng \
1396
- mingw64-libssh2 \
1397
mingw64-libtasn1 \
1398
mingw64-nettle \
1399
mingw64-pixman \
1400
diff --git a/tests/docker/dockerfiles/ubuntu.docker b/tests/docker/dockerfiles/ubuntu.docker
1401
index XXXXXXX..XXXXXXX 100644
1402
--- a/tests/docker/dockerfiles/ubuntu.docker
1403
+++ b/tests/docker/dockerfiles/ubuntu.docker
1404
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \
1405
libsnappy-dev \
1406
libspice-protocol-dev \
1407
libspice-server-dev \
1408
- libssh2-1-dev \
1409
+ libssh-dev \
1410
libusb-1.0-0-dev \
1411
libusbredirhost-dev \
1412
libvdeplug-dev \
1413
diff --git a/tests/docker/dockerfiles/ubuntu1804.docker b/tests/docker/dockerfiles/ubuntu1804.docker
1414
index XXXXXXX..XXXXXXX 100644
1415
--- a/tests/docker/dockerfiles/ubuntu1804.docker
1416
+++ b/tests/docker/dockerfiles/ubuntu1804.docker
1417
@@ -XXX,XX +XXX,XX @@ ENV PACKAGES flex bison \
1418
libsnappy-dev \
1419
libspice-protocol-dev \
1420
libspice-server-dev \
1421
- libssh2-1-dev \
1422
+ libssh-dev \
1423
libusb-1.0-0-dev \
1424
libusbredirhost-dev \
1425
libvdeplug-dev \
1426
diff --git a/tests/qemu-iotests/207 b/tests/qemu-iotests/207
1427
index XXXXXXX..XXXXXXX 100755
1428
--- a/tests/qemu-iotests/207
1429
+++ b/tests/qemu-iotests/207
1430
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
1431
1432
iotests.img_info_log(remote_path)
1433
1434
- md5_key = subprocess.check_output(
1435
- 'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
1436
- 'cut -d" " -f3 | base64 -d | md5sum -b | cut -d" " -f1',
1437
- shell=True).rstrip().decode('ascii')
1438
+ keys = subprocess.check_output(
1439
+ 'ssh-keyscan 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
1440
+ 'cut -d" " -f3',
1441
+ shell=True).rstrip().decode('ascii').split('\n')
1442
+
1443
+ # Mappings of base64 representations to digests
1444
+ md5_keys = {}
1445
+ sha1_keys = {}
1446
+
1447
+ for key in keys:
1448
+ md5_keys[key] = subprocess.check_output(
1449
+ 'echo %s | base64 -d | md5sum -b | cut -d" " -f1' % key,
1450
+ shell=True).rstrip().decode('ascii')
1451
+
1452
+ sha1_keys[key] = subprocess.check_output(
1453
+ 'echo %s | base64 -d | sha1sum -b | cut -d" " -f1' % key,
1454
+ shell=True).rstrip().decode('ascii')
1455
1456
vm.launch()
1457
+
1458
+ # Find correct key first
1459
+ matching_key = None
1460
+ for key in keys:
1461
+ result = vm.qmp('blockdev-add',
1462
+ driver='ssh', node_name='node0', path=disk_path,
1463
+ server={
1464
+ 'host': '127.0.0.1',
1465
+ 'port': '22',
1466
+ }, host_key_check={
1467
+ 'mode': 'hash',
1468
+ 'type': 'md5',
1469
+ 'hash': md5_keys[key],
1470
+ })
1471
+
1472
+ if 'error' not in result:
1473
+ vm.qmp('blockdev-del', node_name='node0')
1474
+ matching_key = key
1475
+ break
1476
+
1477
+ if matching_key is None:
1478
+ vm.shutdown()
1479
+ iotests.notrun('Did not find a key that fits 127.0.0.1')
1480
+
1481
blockdev_create(vm, { 'driver': 'ssh',
1482
'location': {
1483
'path': disk_path,
1484
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
1485
'host-key-check': {
1486
'mode': 'hash',
1487
'type': 'md5',
1488
- 'hash': md5_key,
1489
+ 'hash': md5_keys[matching_key],
1490
}
1491
},
1492
'size': 8388608 })
1493
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
1494
1495
iotests.img_info_log(remote_path)
1496
1497
- sha1_key = subprocess.check_output(
1498
- 'ssh-keyscan -t rsa 127.0.0.1 2>/dev/null | grep -v "\\^#" | ' +
1499
- 'cut -d" " -f3 | base64 -d | sha1sum -b | cut -d" " -f1',
1500
- shell=True).rstrip().decode('ascii')
1501
-
1502
vm.launch()
1503
blockdev_create(vm, { 'driver': 'ssh',
1504
'location': {
1505
@@ -XXX,XX +XXX,XX @@ with iotests.FilePath('t.img') as disk_path, \
1506
'host-key-check': {
1507
'mode': 'hash',
1508
'type': 'sha1',
1509
- 'hash': sha1_key,
1510
+ 'hash': sha1_keys[matching_key],
1511
}
1512
},
1513
'size': 4194304 })
1514
diff --git a/tests/qemu-iotests/207.out b/tests/qemu-iotests/207.out
1515
index XXXXXXX..XXXXXXX 100644
1516
--- a/tests/qemu-iotests/207.out
1517
+++ b/tests/qemu-iotests/207.out
1518
@@ -XXX,XX +XXX,XX @@ virtual size: 4 MiB (4194304 bytes)
1519
1520
{"execute": "blockdev-create", "arguments": {"job-id": "job0", "options": {"driver": "ssh", "location": {"host-key-check": {"mode": "none"}, "path": "/this/is/not/an/existing/path", "server": {"host": "127.0.0.1", "port": "22"}}, "size": 4194304}}}
1521
{"return": {}}
1522
-Job failed: failed to open remote file '/this/is/not/an/existing/path': Failed opening remote file (libssh2 error code: -31)
1523
+Job failed: failed to open remote file '/this/is/not/an/existing/path': SFTP server: No such file (libssh error code: 1, sftp error code: 2)
1524
{"execute": "job-dismiss", "arguments": {"id": "job0"}}
1525
{"return": {}}
1526
123
--
1527
--
124
2.41.0
1528
2.21.0
1529
1530
Deleted patch
1
From: zhenwei pi <pizhenwei@bytedance.com>
2
1
3
Operations on a cryptodev are considered *write*-only; the callback
4
for the read direction is never invoked. Use NULL instead of an unreachable
5
path (cryptodev_backend_throttle_timer_cb on the read direction).
6
7
The dummy read timer (never invoked) is also removed here, which means
8
that the 'FIXME' tag is no longer needed.
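
For reference, the initialization call ends up looking roughly like this (a trimmed sketch using the identifiers from the hunk below; surrounding context omitted):

    throttle_init(&backend->ts);
    throttle_timers_init(&backend->tt, qemu_get_aio_context(),
                         QEMU_CLOCK_REALTIME,
                         NULL,                                 /* read: never invoked */
                         cryptodev_backend_throttle_timer_cb,  /* write */
                         backend);

Passing NULL for the unused read direction is fine because throttle_timers_init() now only asserts that at least one of the two callbacks is non-NULL.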
9
10
Reviewed-by: Alberto Garcia <berto@igalia.com>
11
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
12
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
13
Message-Id: <20230728022006.1098509-6-pizhenwei@bytedance.com>
14
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
15
---
16
backends/cryptodev.c | 3 +--
17
1 file changed, 1 insertion(+), 2 deletions(-)
18
19
diff --git a/backends/cryptodev.c b/backends/cryptodev.c
20
index XXXXXXX..XXXXXXX 100644
21
--- a/backends/cryptodev.c
22
+++ b/backends/cryptodev.c
23
@@ -XXX,XX +XXX,XX @@ static void cryptodev_backend_set_throttle(CryptoDevBackend *backend, int field,
24
if (!enabled) {
25
throttle_init(&backend->ts);
26
throttle_timers_init(&backend->tt, qemu_get_aio_context(),
27
- QEMU_CLOCK_REALTIME,
28
- cryptodev_backend_throttle_timer_cb, /* FIXME */
29
+ QEMU_CLOCK_REALTIME, NULL,
30
cryptodev_backend_throttle_timer_cb, backend);
31
}
32
33
--
34
2.41.0
Deleted patch
1
From: zhenwei pi <pizhenwei@bytedance.com>
2
1
3
The enum ThrottleDirection already exists; use ThrottleDirection instead
4
of 'bool is_write' in the throttle API, and adjust the related code in
5
block, fsdev, cryptodev and tests.
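
The shape of the conversion at a typical call site (trimmed from the throttle-groups hunk below):

    /* before: the direction was encoded as a bool */
    throttle_account(tgm->throttle_state, is_write, bytes);

    /* after: the direction is an explicit enum value */
    ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ;
    throttle_account(tgm->throttle_state, direction, bytes);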
6
7
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
8
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
9
Message-Id: <20230728022006.1098509-7-pizhenwei@bytedance.com>
10
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
11
---
12
include/qemu/throttle.h | 5 +++--
13
backends/cryptodev.c | 9 +++++----
14
block/throttle-groups.c | 6 ++++--
15
fsdev/qemu-fsdev-throttle.c | 8 +++++---
16
tests/unit/test-throttle.c | 4 ++--
17
util/throttle.c | 31 +++++++++++++++++--------------
18
6 files changed, 36 insertions(+), 27 deletions(-)
19
20
diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h
21
index XXXXXXX..XXXXXXX 100644
22
--- a/include/qemu/throttle.h
23
+++ b/include/qemu/throttle.h
24
@@ -XXX,XX +XXX,XX @@ void throttle_config_init(ThrottleConfig *cfg);
25
/* usage */
26
bool throttle_schedule_timer(ThrottleState *ts,
27
ThrottleTimers *tt,
28
- bool is_write);
29
+ ThrottleDirection direction);
30
31
-void throttle_account(ThrottleState *ts, bool is_write, uint64_t size);
32
+void throttle_account(ThrottleState *ts, ThrottleDirection direction,
33
+ uint64_t size);
34
void throttle_limits_to_config(ThrottleLimits *arg, ThrottleConfig *cfg,
35
Error **errp);
36
void throttle_config_to_limits(ThrottleConfig *cfg, ThrottleLimits *var);
37
diff --git a/backends/cryptodev.c b/backends/cryptodev.c
38
index XXXXXXX..XXXXXXX 100644
39
--- a/backends/cryptodev.c
40
+++ b/backends/cryptodev.c
41
@@ -XXX,XX +XXX,XX @@ static void cryptodev_backend_throttle_timer_cb(void *opaque)
42
continue;
43
}
44
45
- throttle_account(&backend->ts, true, ret);
46
+ throttle_account(&backend->ts, THROTTLE_WRITE, ret);
47
cryptodev_backend_operation(backend, op_info);
48
if (throttle_enabled(&backend->tc) &&
49
- throttle_schedule_timer(&backend->ts, &backend->tt, true)) {
50
+ throttle_schedule_timer(&backend->ts, &backend->tt,
51
+ THROTTLE_WRITE)) {
52
break;
53
}
54
}
55
@@ -XXX,XX +XXX,XX @@ int cryptodev_backend_crypto_operation(
56
goto do_account;
57
}
58
59
- if (throttle_schedule_timer(&backend->ts, &backend->tt, true) ||
60
+ if (throttle_schedule_timer(&backend->ts, &backend->tt, THROTTLE_WRITE) ||
61
!QTAILQ_EMPTY(&backend->opinfos)) {
62
QTAILQ_INSERT_TAIL(&backend->opinfos, op_info, next);
63
return 0;
64
@@ -XXX,XX +XXX,XX @@ do_account:
65
return ret;
66
}
67
68
- throttle_account(&backend->ts, true, ret);
69
+ throttle_account(&backend->ts, THROTTLE_WRITE, ret);
70
71
return cryptodev_backend_operation(backend, op_info);
72
}
73
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
74
index XXXXXXX..XXXXXXX 100644
75
--- a/block/throttle-groups.c
76
+++ b/block/throttle-groups.c
77
@@ -XXX,XX +XXX,XX @@ static bool throttle_group_schedule_timer(ThrottleGroupMember *tgm,
78
ThrottleState *ts = tgm->throttle_state;
79
ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
80
ThrottleTimers *tt = &tgm->throttle_timers;
81
+ ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ;
82
bool must_wait;
83
84
if (qatomic_read(&tgm->io_limits_disabled)) {
85
@@ -XXX,XX +XXX,XX @@ static bool throttle_group_schedule_timer(ThrottleGroupMember *tgm,
86
return true;
87
}
88
89
- must_wait = throttle_schedule_timer(ts, tt, is_write);
90
+ must_wait = throttle_schedule_timer(ts, tt, direction);
91
92
/* If a timer just got armed, set tgm as the current token */
93
if (must_wait) {
94
@@ -XXX,XX +XXX,XX @@ void coroutine_fn throttle_group_co_io_limits_intercept(ThrottleGroupMember *tgm
95
bool must_wait;
96
ThrottleGroupMember *token;
97
ThrottleGroup *tg = container_of(tgm->throttle_state, ThrottleGroup, ts);
98
+ ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ;
99
100
assert(bytes >= 0);
101
102
@@ -XXX,XX +XXX,XX @@ void coroutine_fn throttle_group_co_io_limits_intercept(ThrottleGroupMember *tgm
103
}
104
105
/* The I/O will be executed, so do the accounting */
106
- throttle_account(tgm->throttle_state, is_write, bytes);
107
+ throttle_account(tgm->throttle_state, direction, bytes);
108
109
/* Schedule the next request */
110
schedule_next_request(tgm, is_write);
111
diff --git a/fsdev/qemu-fsdev-throttle.c b/fsdev/qemu-fsdev-throttle.c
112
index XXXXXXX..XXXXXXX 100644
113
--- a/fsdev/qemu-fsdev-throttle.c
114
+++ b/fsdev/qemu-fsdev-throttle.c
115
@@ -XXX,XX +XXX,XX @@ void fsdev_throttle_init(FsThrottle *fst)
116
void coroutine_fn fsdev_co_throttle_request(FsThrottle *fst, bool is_write,
117
struct iovec *iov, int iovcnt)
118
{
119
+ ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ;
120
+
121
if (throttle_enabled(&fst->cfg)) {
122
- if (throttle_schedule_timer(&fst->ts, &fst->tt, is_write) ||
123
+ if (throttle_schedule_timer(&fst->ts, &fst->tt, direction) ||
124
!qemu_co_queue_empty(&fst->throttled_reqs[is_write])) {
125
qemu_co_queue_wait(&fst->throttled_reqs[is_write], NULL);
126
}
127
128
- throttle_account(&fst->ts, is_write, iov_size(iov, iovcnt));
129
+ throttle_account(&fst->ts, direction, iov_size(iov, iovcnt));
130
131
if (!qemu_co_queue_empty(&fst->throttled_reqs[is_write]) &&
132
- !throttle_schedule_timer(&fst->ts, &fst->tt, is_write)) {
133
+ !throttle_schedule_timer(&fst->ts, &fst->tt, direction)) {
134
qemu_co_queue_next(&fst->throttled_reqs[is_write]);
135
}
136
}
137
diff --git a/tests/unit/test-throttle.c b/tests/unit/test-throttle.c
138
index XXXXXXX..XXXXXXX 100644
139
--- a/tests/unit/test-throttle.c
140
+++ b/tests/unit/test-throttle.c
141
@@ -XXX,XX +XXX,XX @@ static bool do_test_accounting(bool is_ops, /* are we testing bps or ops */
142
throttle_config(&ts, QEMU_CLOCK_VIRTUAL, &cfg);
143
144
/* account a read */
145
- throttle_account(&ts, false, size);
146
+ throttle_account(&ts, THROTTLE_READ, size);
147
/* account a write */
148
- throttle_account(&ts, true, size);
149
+ throttle_account(&ts, THROTTLE_WRITE, size);
150
151
/* check total result */
152
index = to_test[is_ops][0];
153
diff --git a/util/throttle.c b/util/throttle.c
154
index XXXXXXX..XXXXXXX 100644
155
--- a/util/throttle.c
156
+++ b/util/throttle.c
157
@@ -XXX,XX +XXX,XX @@ int64_t throttle_compute_wait(LeakyBucket *bkt)
158
159
/* This function compute the time that must be waited while this IO
160
*
161
- * @is_write: true if the current IO is a write, false if it's a read
162
+ * @direction: throttle direction
163
* @ret: time to wait
164
*/
165
static int64_t throttle_compute_wait_for(ThrottleState *ts,
166
- bool is_write)
167
+ ThrottleDirection direction)
168
{
169
BucketType to_check[2][4] = { {THROTTLE_BPS_TOTAL,
170
THROTTLE_OPS_TOTAL,
171
@@ -XXX,XX +XXX,XX @@ static int64_t throttle_compute_wait_for(ThrottleState *ts,
172
int i;
173
174
for (i = 0; i < 4; i++) {
175
- BucketType index = to_check[is_write][i];
176
+ BucketType index = to_check[direction][i];
177
wait = throttle_compute_wait(&ts->cfg.buckets[index]);
178
if (wait > max_wait) {
179
max_wait = wait;
180
@@ -XXX,XX +XXX,XX @@ static int64_t throttle_compute_wait_for(ThrottleState *ts,
181
182
/* compute the timer for this type of operation
183
*
184
- * @is_write: the type of operation
185
+ * @direction: throttle direction
186
* @now: the current clock timestamp
187
* @next_timestamp: the resulting timer
188
* @ret: true if a timer must be set
189
*/
190
static bool throttle_compute_timer(ThrottleState *ts,
191
- bool is_write,
192
+ ThrottleDirection direction,
193
int64_t now,
194
int64_t *next_timestamp)
195
{
196
@@ -XXX,XX +XXX,XX @@ static bool throttle_compute_timer(ThrottleState *ts,
197
throttle_do_leak(ts, now);
198
199
/* compute the wait time if any */
200
- wait = throttle_compute_wait_for(ts, is_write);
201
+ wait = throttle_compute_wait_for(ts, direction);
202
203
/* if the code must wait compute when the next timer should fire */
204
if (wait) {
205
@@ -XXX,XX +XXX,XX @@ void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg)
206
* NOTE: this function is not unit tested due to it's usage of timer_mod
207
*
208
* @tt: the timers structure
209
- * @is_write: the type of operation (read/write)
210
+ * @direction: throttle direction
211
* @ret: true if the timer has been scheduled else false
212
*/
213
bool throttle_schedule_timer(ThrottleState *ts,
214
ThrottleTimers *tt,
215
- bool is_write)
216
+ ThrottleDirection direction)
217
{
218
int64_t now = qemu_clock_get_ns(tt->clock_type);
219
int64_t next_timestamp;
220
QEMUTimer *timer;
221
bool must_wait;
222
223
- timer = is_write ? tt->timers[THROTTLE_WRITE] : tt->timers[THROTTLE_READ];
224
+ assert(direction < THROTTLE_MAX);
225
+ timer = tt->timers[direction];
226
assert(timer);
227
228
must_wait = throttle_compute_timer(ts,
229
- is_write,
230
+ direction,
231
now,
232
&next_timestamp);
233
234
@@ -XXX,XX +XXX,XX @@ bool throttle_schedule_timer(ThrottleState *ts,
235
236
/* do the accounting for this operation
237
*
238
- * @is_write: the type of operation (read/write)
239
+ * @direction: throttle direction
240
* @size: the size of the operation
241
*/
242
-void throttle_account(ThrottleState *ts, bool is_write, uint64_t size)
243
+void throttle_account(ThrottleState *ts, ThrottleDirection direction,
244
+ uint64_t size)
245
{
246
const BucketType bucket_types_size[2][2] = {
247
{ THROTTLE_BPS_TOTAL, THROTTLE_BPS_READ },
248
@@ -XXX,XX +XXX,XX @@ void throttle_account(ThrottleState *ts, bool is_write, uint64_t size)
249
double units = 1.0;
250
unsigned i;
251
252
+ assert(direction < THROTTLE_MAX);
253
/* if cfg.op_size is defined and smaller than size we compute unit count */
254
if (ts->cfg.op_size && size > ts->cfg.op_size) {
255
units = (double) size / ts->cfg.op_size;
256
@@ -XXX,XX +XXX,XX @@ void throttle_account(ThrottleState *ts, bool is_write, uint64_t size)
257
for (i = 0; i < 2; i++) {
258
LeakyBucket *bkt;
259
260
- bkt = &ts->cfg.buckets[bucket_types_size[is_write][i]];
261
+ bkt = &ts->cfg.buckets[bucket_types_size[direction][i]];
262
bkt->level += size;
263
if (bkt->burst_length > 1) {
264
bkt->burst_level += size;
265
}
266
267
- bkt = &ts->cfg.buckets[bucket_types_units[is_write][i]];
268
+ bkt = &ts->cfg.buckets[bucket_types_units[direction][i]];
269
bkt->level += units;
270
if (bkt->burst_length > 1) {
271
bkt->burst_level += units;
272
--
273
2.41.0
diff view generated by jsdifflib
Deleted patch
1
From: zhenwei pi <pizhenwei@bytedance.com>
2
1
3
The first dimension of both to_check and
4
bucket_types_size/bucket_types_units is used as the throttle direction;
5
use THROTTLE_MAX instead of a hard-coded number. Also use ARRAY_SIZE()
6
to avoid a hard-coded number for the second dimension.
7
8
Hanna noticed that the two arrays should be static. Yes, turn them
9
into static variables.
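
Reduced to a self-contained sketch, the pattern looks like this (the enum matches the one used in this series, ARRAY_SIZE() is spelled out so the snippet stands on its own, and the table contents are placeholders only):

    #include <stddef.h>

    #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

    typedef enum {
        THROTTLE_READ = 0,
        THROTTLE_WRITE,
        THROTTLE_MAX,
    } ThrottleDirection;

    /* One row per direction: THROTTLE_MAX keeps the table in sync with the
     * enum, and ARRAY_SIZE() replaces the hard-coded column count. */
    static const int example_table[THROTTLE_MAX][2] = {
        { 1, 2 },   /* read */
        { 3, 4 },   /* write */
    };

    static int row_sum(ThrottleDirection dir)
    {
        int sum = 0;
        for (size_t i = 0; i < ARRAY_SIZE(example_table[dir]); i++) {
            sum += example_table[dir][i];
        }
        return sum;
    }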
10
11
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
12
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
13
Message-Id: <20230728022006.1098509-8-pizhenwei@bytedance.com>
14
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
15
---
16
util/throttle.c | 11 ++++++-----
17
1 file changed, 6 insertions(+), 5 deletions(-)
18
19
diff --git a/util/throttle.c b/util/throttle.c
20
index XXXXXXX..XXXXXXX 100644
21
--- a/util/throttle.c
22
+++ b/util/throttle.c
23
@@ -XXX,XX +XXX,XX @@ int64_t throttle_compute_wait(LeakyBucket *bkt)
24
static int64_t throttle_compute_wait_for(ThrottleState *ts,
25
ThrottleDirection direction)
26
{
27
- BucketType to_check[2][4] = { {THROTTLE_BPS_TOTAL,
28
+ static const BucketType to_check[THROTTLE_MAX][4] = {
29
+ {THROTTLE_BPS_TOTAL,
30
THROTTLE_OPS_TOTAL,
31
THROTTLE_BPS_READ,
32
THROTTLE_OPS_READ},
33
@@ -XXX,XX +XXX,XX @@ static int64_t throttle_compute_wait_for(ThrottleState *ts,
34
int64_t wait, max_wait = 0;
35
int i;
36
37
- for (i = 0; i < 4; i++) {
38
+ for (i = 0; i < ARRAY_SIZE(to_check[THROTTLE_READ]); i++) {
39
BucketType index = to_check[direction][i];
40
wait = throttle_compute_wait(&ts->cfg.buckets[index]);
41
if (wait > max_wait) {
42
@@ -XXX,XX +XXX,XX @@ bool throttle_schedule_timer(ThrottleState *ts,
43
void throttle_account(ThrottleState *ts, ThrottleDirection direction,
44
uint64_t size)
45
{
46
- const BucketType bucket_types_size[2][2] = {
47
+ static const BucketType bucket_types_size[THROTTLE_MAX][2] = {
48
{ THROTTLE_BPS_TOTAL, THROTTLE_BPS_READ },
49
{ THROTTLE_BPS_TOTAL, THROTTLE_BPS_WRITE }
50
};
51
- const BucketType bucket_types_units[2][2] = {
52
+ static const BucketType bucket_types_units[THROTTLE_MAX][2] = {
53
{ THROTTLE_OPS_TOTAL, THROTTLE_OPS_READ },
54
{ THROTTLE_OPS_TOTAL, THROTTLE_OPS_WRITE }
55
};
56
@@ -XXX,XX +XXX,XX @@ void throttle_account(ThrottleState *ts, ThrottleDirection direction,
57
units = (double) size / ts->cfg.op_size;
58
}
59
60
- for (i = 0; i < 2; i++) {
61
+ for (i = 0; i < ARRAY_SIZE(bucket_types_size[THROTTLE_READ]); i++) {
62
LeakyBucket *bkt;
63
64
bkt = &ts->cfg.buckets[bucket_types_size[direction][i]];
65
--
66
2.41.0
Deleted patch
1
From: zhenwei pi <pizhenwei@bytedance.com>
2
1
3
The 'bool is_write' style is obsolete in the throttle framework; adapt
4
fsdev to the new style.
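
The resulting 9pfs call sites (from the hw/9pfs/cofile.c hunk below) then read:

    fsdev_co_throttle_request(s->ctx.fst, THROTTLE_WRITE, iov, iovcnt);  /* v9fs_co_pwritev */
    fsdev_co_throttle_request(s->ctx.fst, THROTTLE_READ, iov, iovcnt);   /* v9fs_co_preadv */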
5
6
Cc: Greg Kurz <groug@kaod.org>
7
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
8
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
9
Message-Id: <20230728022006.1098509-9-pizhenwei@bytedance.com>
10
Reviewed-by: Greg Kurz <groug@kaod.org>
11
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
12
---
13
fsdev/qemu-fsdev-throttle.h | 4 ++--
14
fsdev/qemu-fsdev-throttle.c | 14 +++++++-------
15
hw/9pfs/cofile.c | 4 ++--
16
3 files changed, 11 insertions(+), 11 deletions(-)
17
18
diff --git a/fsdev/qemu-fsdev-throttle.h b/fsdev/qemu-fsdev-throttle.h
19
index XXXXXXX..XXXXXXX 100644
20
--- a/fsdev/qemu-fsdev-throttle.h
21
+++ b/fsdev/qemu-fsdev-throttle.h
22
@@ -XXX,XX +XXX,XX @@ typedef struct FsThrottle {
23
ThrottleState ts;
24
ThrottleTimers tt;
25
ThrottleConfig cfg;
26
- CoQueue throttled_reqs[2];
27
+ CoQueue throttled_reqs[THROTTLE_MAX];
28
} FsThrottle;
29
30
int fsdev_throttle_parse_opts(QemuOpts *, FsThrottle *, Error **);
31
32
void fsdev_throttle_init(FsThrottle *);
33
34
-void coroutine_fn fsdev_co_throttle_request(FsThrottle *, bool ,
35
+void coroutine_fn fsdev_co_throttle_request(FsThrottle *, ThrottleDirection ,
36
struct iovec *, int);
37
38
void fsdev_throttle_cleanup(FsThrottle *);
39
diff --git a/fsdev/qemu-fsdev-throttle.c b/fsdev/qemu-fsdev-throttle.c
40
index XXXXXXX..XXXXXXX 100644
41
--- a/fsdev/qemu-fsdev-throttle.c
42
+++ b/fsdev/qemu-fsdev-throttle.c
43
@@ -XXX,XX +XXX,XX @@ void fsdev_throttle_init(FsThrottle *fst)
44
}
45
}
46
47
-void coroutine_fn fsdev_co_throttle_request(FsThrottle *fst, bool is_write,
48
+void coroutine_fn fsdev_co_throttle_request(FsThrottle *fst,
49
+ ThrottleDirection direction,
50
struct iovec *iov, int iovcnt)
51
{
52
- ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ;
53
-
54
+ assert(direction < THROTTLE_MAX);
55
if (throttle_enabled(&fst->cfg)) {
56
if (throttle_schedule_timer(&fst->ts, &fst->tt, direction) ||
57
- !qemu_co_queue_empty(&fst->throttled_reqs[is_write])) {
58
- qemu_co_queue_wait(&fst->throttled_reqs[is_write], NULL);
59
+ !qemu_co_queue_empty(&fst->throttled_reqs[direction])) {
60
+ qemu_co_queue_wait(&fst->throttled_reqs[direction], NULL);
61
}
62
63
throttle_account(&fst->ts, direction, iov_size(iov, iovcnt));
64
65
- if (!qemu_co_queue_empty(&fst->throttled_reqs[is_write]) &&
66
+ if (!qemu_co_queue_empty(&fst->throttled_reqs[direction]) &&
67
!throttle_schedule_timer(&fst->ts, &fst->tt, direction)) {
68
- qemu_co_queue_next(&fst->throttled_reqs[is_write]);
69
+ qemu_co_queue_next(&fst->throttled_reqs[direction]);
70
}
71
}
72
}
73
diff --git a/hw/9pfs/cofile.c b/hw/9pfs/cofile.c
74
index XXXXXXX..XXXXXXX 100644
75
--- a/hw/9pfs/cofile.c
76
+++ b/hw/9pfs/cofile.c
77
@@ -XXX,XX +XXX,XX @@ int coroutine_fn v9fs_co_pwritev(V9fsPDU *pdu, V9fsFidState *fidp,
78
if (v9fs_request_cancelled(pdu)) {
79
return -EINTR;
80
}
81
- fsdev_co_throttle_request(s->ctx.fst, true, iov, iovcnt);
82
+ fsdev_co_throttle_request(s->ctx.fst, THROTTLE_WRITE, iov, iovcnt);
83
v9fs_co_run_in_worker(
84
{
85
err = s->ops->pwritev(&s->ctx, &fidp->fs, iov, iovcnt, offset);
86
@@ -XXX,XX +XXX,XX @@ int coroutine_fn v9fs_co_preadv(V9fsPDU *pdu, V9fsFidState *fidp,
87
if (v9fs_request_cancelled(pdu)) {
88
return -EINTR;
89
}
90
- fsdev_co_throttle_request(s->ctx.fst, false, iov, iovcnt);
91
+ fsdev_co_throttle_request(s->ctx.fst, THROTTLE_READ, iov, iovcnt);
92
v9fs_co_run_in_worker(
93
{
94
err = s->ops->preadv(&s->ctx, &fidp->fs, iov, iovcnt, offset);
95
--
96
2.41.0
1
bs->bl.zoned is what indicates whether the zone information is present
1
Tests should place their files into the test directory. This includes
2
and valid; it is the only thing that raw_refresh_zoned_limits() sets if
2
Unix sockets. 205 currently fails to do so, which prevents it from
3
CONFIG_BLKZONED is not defined, and it is also the only thing that it
3
being run concurrently.
4
sets if CONFIG_BLKZONED is defined, but there are no zones.
5
4
6
Make sure that it is always set to BLK_Z_NONE if there is an error
5
Signed-off-by: Max Reitz <mreitz@redhat.com>
7
anywhere in raw_refresh_zoned_limits() so that we do not accidentally
6
Message-id: 20190618210238.9524-1-mreitz@redhat.com
8
announce zones while our information is incomplete or invalid.
7
Reviewed-by: Eric Blake <eblake@redhat.com>
8
Signed-off-by: Max Reitz <mreitz@redhat.com>
9
---
10
tests/qemu-iotests/205 | 2 +-
11
1 file changed, 1 insertion(+), 1 deletion(-)
9
12
10
This also fixes a memory leak in the last error path in
13
diff --git a/tests/qemu-iotests/205 b/tests/qemu-iotests/205
11
raw_refresh_zoned_limits().
14
index XXXXXXX..XXXXXXX 100755
15
--- a/tests/qemu-iotests/205
16
+++ b/tests/qemu-iotests/205
17
@@ -XXX,XX +XXX,XX @@ import iotests
18
import time
19
from iotests import qemu_img_create, qemu_io, filter_qemu_io, QemuIoInteractive
20
21
-nbd_sock = 'nbd_sock'
22
+nbd_sock = os.path.join(iotests.test_dir, 'nbd_sock')
23
nbd_uri = 'nbd+unix:///exp?socket=' + nbd_sock
24
disk = os.path.join(iotests.test_dir, 'disk')
25
26
--
27
2.21.0
12
28
13
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
14
Message-Id: <20230824155345.109765-2-hreitz@redhat.com>
15
Reviewed-by: Sam Li <faithilikerun@gmail.com>
16
---
17
block/file-posix.c | 21 ++++++++++++---------
18
1 file changed, 12 insertions(+), 9 deletions(-)
19
29
20
diff --git a/block/file-posix.c b/block/file-posix.c
21
index XXXXXXX..XXXXXXX 100644
22
--- a/block/file-posix.c
23
+++ b/block/file-posix.c
24
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
25
BlockZoneModel zoned;
26
int ret;
27
28
- bs->bl.zoned = BLK_Z_NONE;
29
-
30
ret = get_sysfs_zoned_model(st, &zoned);
31
if (ret < 0 || zoned == BLK_Z_NONE) {
32
- return;
33
+ goto no_zoned;
34
}
35
bs->bl.zoned = zoned;
36
37
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
38
if (ret < 0) {
39
error_setg_errno(errp, -ret, "Unable to read chunk_sectors "
40
"sysfs attribute");
41
- return;
42
+ goto no_zoned;
43
} else if (!ret) {
44
error_setg(errp, "Read 0 from chunk_sectors sysfs attribute");
45
- return;
46
+ goto no_zoned;
47
}
48
bs->bl.zone_size = ret << BDRV_SECTOR_BITS;
49
50
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
51
if (ret < 0) {
52
error_setg_errno(errp, -ret, "Unable to read nr_zones "
53
"sysfs attribute");
54
- return;
55
+ goto no_zoned;
56
} else if (!ret) {
57
error_setg(errp, "Read 0 from nr_zones sysfs attribute");
58
- return;
59
+ goto no_zoned;
60
}
61
bs->bl.nr_zones = ret;
62
63
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
64
ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 0);
65
if (ret < 0) {
66
error_setg_errno(errp, -ret, "report wps failed");
67
- bs->wps = NULL;
68
- return;
69
+ goto no_zoned;
70
}
71
qemu_co_mutex_init(&bs->wps->colock);
72
+ return;
73
+
74
+no_zoned:
75
+ bs->bl.zoned = BLK_Z_NONE;
76
+ g_free(bs->wps);
77
+ bs->wps = NULL;
78
}
79
#else /* !defined(CONFIG_BLKZONED) */
80
static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st,
81
--
82
2.41.0
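
The error handling above funnels every failure through a single cleanup label, so no exit can leave bs->bl.zoned set while the rest of the zone information is missing. Reduced to a self-contained sketch (all names below are illustrative stand-ins, not the real file-posix code):

    #include <stdlib.h>
    #include <stdbool.h>

    struct zoned_state {
        bool zoned;
        int *wps;                       /* write-pointer array, allocated on demand */
    };

    /* Stand-ins for the sysfs/ioctl probes done by the real driver. */
    static int probe_model(struct zoned_state *s)    { (void)s; return 0; }
    static int probe_geometry(struct zoned_state *s) { (void)s; return 0; }
    static int probe_wps(struct zoned_state *s)      { (void)s; return 0; }

    static void refresh_zoned_limits_sketch(struct zoned_state *s)
    {
        s->wps = NULL;                  /* sketch assumes a fresh state */

        if (probe_model(s) < 0) {
            goto no_zoned;
        }
        s->zoned = true;

        if (probe_geometry(s) < 0) {
            goto no_zoned;
        }

        s->wps = calloc(16, sizeof(*s->wps));
        if (!s->wps || probe_wps(s) < 0) {
            goto no_zoned;              /* this path used to leak the allocation */
        }
        return;

    no_zoned:
        s->zoned = false;               /* never advertise zones after an error */
        free(s->wps);                   /* free(NULL) is a no-op */
        s->wps = NULL;
    }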