The following changes since commit 848a6caa88b9f082c89c9b41afa975761262981d:

  Merge tag 'migration-20230602-pull-request' of https://gitlab.com/juan.quintela/qemu into staging (2023-06-02 17:33:29 -0700)

are available in the Git repository at:

  https://gitlab.com/hreitz/qemu.git tags/pull-block-2023-06-05

for you to fetch changes up to 42a2890a76f4783cd1c212f27856edcf2b5e8a75:

  qcow2: add discard-no-unref option (2023-06-05 13:15:42 +0200)

----------------------------------------------------------------
Block patches

- Fix padding of unaligned vectored requests to match the host alignment
  for vectors with 1023 or 1024 buffers
- Refactor and fix bugs in parallels's image check functionality
- Add an option to the qcow2 driver to retain (qcow2-level) allocations
  on discard requests from the guest (while still forwarding the discard
  to the lower level and marking the range as zero)

----------------------------------------------------------------
Alexander Ivanov (12):
      parallels: Out of image offset in BAT leads to image inflation
      parallels: Fix high_off calculation in parallels_co_check()
      parallels: Fix image_end_offset and data_end after out-of-image check
      parallels: create parallels_set_bat_entry_helper() to assign BAT value
      parallels: Use generic infrastructure for BAT writing in
        parallels_co_check()
      parallels: Move check of unclean image to a separate function
      parallels: Move check of cluster outside image to a separate function
      parallels: Fix statistics calculation
      parallels: Move check of leaks to a separate function
      parallels: Move statistic collection to a separate function
      parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD
      parallels: Incorrect condition in out-of-image check

Hanna Czenczek (4):
      util/iov: Make qiov_slice() public
      block: Collapse padded I/O vecs exceeding IOV_MAX
      util/iov: Remove qemu_iovec_init_extended()
      iotests/iov-padding: New test

Jean-Louis Dupond (1):
      qcow2: add discard-no-unref option

 qapi/block-core.json                     |  12 ++
 block/qcow2.h                            |   3 +
 include/qemu/iov.h                       |   8 +-
 block/io.c                               | 166 ++++++++++++++++++--
 block/parallels.c                        | 190 ++++++++++++++++-------
 block/qcow2-cluster.c                    |  32 +++-
 block/qcow2.c                            |  18 +++
 util/iov.c                               |  89 ++---------
 qemu-options.hx                          |  12 ++
 tests/qemu-iotests/tests/iov-padding     |  85 ++++++++++
 tests/qemu-iotests/tests/iov-padding.out |  59 +++++++
 11 files changed, 523 insertions(+), 151 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/iov-padding
 create mode 100644 tests/qemu-iotests/tests/iov-padding.out

--
2.40.1

We want to inline qemu_iovec_init_extended() in block/io.c for padding
requests, and having access to qiov_slice() is useful for this. As a
public function, it is renamed to qemu_iovec_slice().

(We will need to count the number of I/O vector elements of a slice
there, and then later process this slice. Without qiov_slice(), we
would need to call qemu_iovec_subvec_niov(), and all further
IOV-processing functions may need to skip prefixing elements to
accommodate for a qiov_offset. Because qemu_iovec_subvec_niov()
internally calls qiov_slice(), we can just have the block/io.c code call
qiov_slice() itself, thus get the number of elements, and also create an
iovec array with the superfluous prefixing elements stripped, so the
following processing functions no longer need to skip them.)

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230411173418.19549-2-hreitz@redhat.com>
---
 include/qemu/iov.h |  3 +++
 util/iov.c         | 14 +++++++-------
 2 files changed, 10 insertions(+), 7 deletions(-)

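[Editor's note] For illustration only: a minimal sketch of how a caller might use the function made public above. The caller process_subrange() is hypothetical and not part of the series; it only follows the prototype and the @head/@tail/@niov semantics documented in the patch.

/* Hypothetical caller, for illustration only -- not part of this patch. */
#include "qemu/osdep.h"
#include "qemu/iov.h"

static void process_subrange(QEMUIOVector *qiov, size_t offset, size_t len)
{
    size_t head, tail;
    int niov;
    struct iovec *iov;

    /*
     * iov points into qiov's iovec array and covers [offset, offset + len):
     * the first element is entered at byte offset @head, and the last
     * element contains @tail extra bytes beyond the requested range.
     */
    iov = qemu_iovec_slice(qiov, offset, len, &head, &tail, &niov);

    /* hand (iov, niov, head, len) to code that walks raw iovec arrays */
    (void)iov; (void)niov; (void)head; (void)tail;
}
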
diff --git a/include/qemu/iov.h b/include/qemu/iov.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/iov.h
+++ b/include/qemu/iov.h
@@ -XXX,XX +XXX,XX @@ int qemu_iovec_init_extended(
void *tail_buf, size_t tail_len);
void qemu_iovec_init_slice(QEMUIOVector *qiov, QEMUIOVector *source,
size_t offset, size_t len);
+struct iovec *qemu_iovec_slice(QEMUIOVector *qiov,
+ size_t offset, size_t len,
+ size_t *head, size_t *tail, int *niov);
int qemu_iovec_subvec_niov(QEMUIOVector *qiov, size_t offset, size_t len);
void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len);
void qemu_iovec_concat(QEMUIOVector *dst,
diff --git a/util/iov.c b/util/iov.c
index XXXXXXX..XXXXXXX 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -XXX,XX +XXX,XX @@ static struct iovec *iov_skip_offset(struct iovec *iov, size_t offset,
}

/*
- * qiov_slice
+ * qemu_iovec_slice
*
* Find subarray of iovec's, containing requested range. @head would
* be offset in first iov (returned by the function), @tail would be
* count of extra bytes in last iovec (returned iov + @niov - 1).
*/
-static struct iovec *qiov_slice(QEMUIOVector *qiov,
- size_t offset, size_t len,
- size_t *head, size_t *tail, int *niov)
+struct iovec *qemu_iovec_slice(QEMUIOVector *qiov,
+ size_t offset, size_t len,
+ size_t *head, size_t *tail, int *niov)
{
struct iovec *iov, *end_iov;

@@ -XXX,XX +XXX,XX @@ int qemu_iovec_subvec_niov(QEMUIOVector *qiov, size_t offset, size_t len)
size_t head, tail;
int niov;

- qiov_slice(qiov, offset, len, &head, &tail, &niov);
+ qemu_iovec_slice(qiov, offset, len, &head, &tail, &niov);

return niov;
}
@@ -XXX,XX +XXX,XX @@ int qemu_iovec_init_extended(
}

if (mid_len) {
- mid_iov = qiov_slice(mid_qiov, mid_offset, mid_len,
- &mid_head, &mid_tail, &mid_niov);
+ mid_iov = qemu_iovec_slice(mid_qiov, mid_offset, mid_len,
+ &mid_head, &mid_tail, &mid_niov);
}

total_niov = !!head_len + mid_niov + !!tail_len;
--
2.40.1

When processing vectored guest requests that are not aligned to the
storage request alignment, we pad them by adding head and/or tail
buffers for a read-modify-write cycle.

The guest can submit I/O vectors up to IOV_MAX (1024) in length, but
with this padding, the vector can exceed that limit. As of
4c002cef0e9abe7135d7916c51abce47f7fc1ee2 ("util/iov: make
qemu_iovec_init_extended() honest"), we refuse to pad vectors beyond the
limit, instead returning an error to the guest.

To the guest, this appears as a random I/O error. We should not return
an I/O error to the guest when it issued a perfectly valid request.

Before 4c002cef0e9abe7135d7916c51abce47f7fc1ee2, we just made the vector
longer than IOV_MAX, which generally seems to work (because the guest
assumes a smaller alignment than we really have, file-posix's
raw_co_prw() will generally see bdrv_qiov_is_aligned() return false, and
so emulate the request, so that the IOV_MAX does not matter). However,
that does not seem exactly great.

I see two ways to fix this problem:
1. We split such long requests into two requests.
2. We join some elements of the vector into new buffers to make it
   shorter.

I am wary of (1), because it seems like it may have unintended side
effects.

(2) on the other hand seems relatively simple to implement, with
hopefully few side effects, so this patch does that.

To do this, the use of qemu_iovec_init_extended() in bdrv_pad_request()
is effectively replaced by the new function bdrv_create_padded_qiov(),
which not only wraps the request IOV with padding head/tail, but also
ensures that the resulting vector will not have more than IOV_MAX
elements. Putting that functionality into qemu_iovec_init_extended() is
infeasible because it requires allocating a bounce buffer; doing so
would require many more parameters (buffer alignment, how to initialize
the buffer, and out parameters like the buffer, its length, and the
original elements), which is not reasonable.

Conversely, it is not difficult to move qemu_iovec_init_extended()'s
functionality into bdrv_create_padded_qiov() by using public
qemu_iovec_* functions, so that is what this patch does.

Because bdrv_pad_request() was the only "serious" user of
qemu_iovec_init_extended(), the next patch will remove the latter
function, so the functionality is not implemented twice.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2141964
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230411173418.19549-3-hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 block/io.c | 166 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 151 insertions(+), 15 deletions(-)

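[Editor's note] The counting at the heart of the patch below, restated as a stand-alone sketch with assumed names, for illustration only: only the padding head and tail can push the element count past IOV_MAX, so at most surplus + 1 of the guest's elements ever need to be merged into one bounce buffer.

/* Illustrative sketch only -- mirrors the arithmetic used by the patch below. */
#include <stdbool.h>

enum { IOV_MAX_SKETCH = 1024 };   /* stand-in for the system IOV_MAX */

/*
 * Number of leading guest iovec elements that must be collapsed into a
 * single bounce buffer so that head + guest vector + tail fits IOV_MAX.
 */
static int collapse_count_sketch(int guest_niov, bool has_head, bool has_tail)
{
    int padded_niov = (has_head ? 1 : 0) + guest_niov + (has_tail ? 1 : 0);
    int surplus;

    if (padded_niov <= IOV_MAX_SKETCH) {
        return 0;                               /* no merging needed */
    }
    surplus = padded_niov - IOV_MAX_SKETCH;     /* at most 2: head and tail */
    return surplus + 1;                         /* merging N elements removes N - 1 */
}
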
10
diff --git a/block/io.c b/block/io.c
58
diff --git a/block/io.c b/block/io.c
11
index XXXXXXX..XXXXXXX 100644
59
index XXXXXXX..XXXXXXX 100644
12
--- a/block/io.c
60
--- a/block/io.c
13
+++ b/block/io.c
61
+++ b/block/io.c
14
@@ -XXX,XX +XXX,XX @@ void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
62
@@ -XXX,XX +XXX,XX @@ out:
15
63
* @merge_reads is true for small requests,
16
static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
64
* if @buf_len == @head + bytes + @tail. In this case it is possible that both
65
* head and tail exist but @buf_len == align and @tail_buf == @buf.
66
+ *
67
+ * @write is true for write requests, false for read requests.
68
+ *
69
+ * If padding makes the vector too long (exceeding IOV_MAX), then we need to
70
+ * merge existing vector elements into a single one. @collapse_bounce_buf acts
71
+ * as the bounce buffer in such cases. @pre_collapse_qiov has the pre-collapse
72
+ * I/O vector elements so for read requests, the data can be copied back after
73
+ * the read is done.
74
*/
75
typedef struct BdrvRequestPadding {
76
uint8_t *buf;
77
@@ -XXX,XX +XXX,XX @@ typedef struct BdrvRequestPadding {
78
size_t head;
79
size_t tail;
80
bool merge_reads;
81
+ bool write;
82
QEMUIOVector local_qiov;
83
+
84
+ uint8_t *collapse_bounce_buf;
85
+ size_t collapse_len;
86
+ QEMUIOVector pre_collapse_qiov;
87
} BdrvRequestPadding;
88
89
static bool bdrv_init_padding(BlockDriverState *bs,
90
int64_t offset, int64_t bytes,
91
+ bool write,
92
BdrvRequestPadding *pad)
17
{
93
{
18
+ dst->pdiscard_alignment = MAX(dst->pdiscard_alignment,
94
int64_t align = bs->bl.request_alignment;
19
+ src->pdiscard_alignment);
95
@@ -XXX,XX +XXX,XX @@ static bool bdrv_init_padding(BlockDriverState *bs,
20
dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
96
pad->tail_buf = pad->buf + pad->buf_len - align;
21
dst->max_transfer = MIN_NON_ZERO(dst->max_transfer, src->max_transfer);
97
}
22
dst->max_hw_transfer = MIN_NON_ZERO(dst->max_hw_transfer,
98
99
+ pad->write = write;
100
+
101
return true;
102
}
103
104
@@ -XXX,XX +XXX,XX @@ zero_mem:
105
return 0;
106
}
107
108
-static void bdrv_padding_destroy(BdrvRequestPadding *pad)
109
+/**
110
+ * Free *pad's associated buffers, and perform any necessary finalization steps.
111
+ */
112
+static void bdrv_padding_finalize(BdrvRequestPadding *pad)
113
{
114
+ if (pad->collapse_bounce_buf) {
115
+ if (!pad->write) {
116
+ /*
117
+ * If padding required elements in the vector to be collapsed into a
118
+ * bounce buffer, copy the bounce buffer content back
119
+ */
120
+ qemu_iovec_from_buf(&pad->pre_collapse_qiov, 0,
121
+ pad->collapse_bounce_buf, pad->collapse_len);
122
+ }
123
+ qemu_vfree(pad->collapse_bounce_buf);
124
+ qemu_iovec_destroy(&pad->pre_collapse_qiov);
125
+ }
126
if (pad->buf) {
127
qemu_vfree(pad->buf);
128
qemu_iovec_destroy(&pad->local_qiov);
129
@@ -XXX,XX +XXX,XX @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
130
memset(pad, 0, sizeof(*pad));
131
}
132
133
+/*
134
+ * Create pad->local_qiov by wrapping @iov in the padding head and tail, while
135
+ * ensuring that the resulting vector will not exceed IOV_MAX elements.
136
+ *
137
+ * To ensure this, when necessary, the first two or three elements of @iov are
138
+ * merged into pad->collapse_bounce_buf and replaced by a reference to that
139
+ * bounce buffer in pad->local_qiov.
140
+ *
141
+ * After performing a read request, the data from the bounce buffer must be
142
+ * copied back into pad->pre_collapse_qiov (e.g. by bdrv_padding_finalize()).
143
+ */
144
+static int bdrv_create_padded_qiov(BlockDriverState *bs,
145
+ BdrvRequestPadding *pad,
146
+ struct iovec *iov, int niov,
147
+ size_t iov_offset, size_t bytes)
148
+{
149
+ int padded_niov, surplus_count, collapse_count;
150
+
151
+ /* Assert this invariant */
152
+ assert(niov <= IOV_MAX);
153
+
154
+ /*
155
+ * Cannot pad if resulting length would exceed SIZE_MAX. Returning an error
156
+ * to the guest is not ideal, but there is little else we can do. At least
157
+ * this will practically never happen on 64-bit systems.
158
+ */
159
+ if (SIZE_MAX - pad->head < bytes ||
160
+ SIZE_MAX - pad->head - bytes < pad->tail)
161
+ {
162
+ return -EINVAL;
163
+ }
164
+
165
+ /* Length of the resulting IOV if we just concatenated everything */
166
+ padded_niov = !!pad->head + niov + !!pad->tail;
167
+
168
+ qemu_iovec_init(&pad->local_qiov, MIN(padded_niov, IOV_MAX));
169
+
170
+ if (pad->head) {
171
+ qemu_iovec_add(&pad->local_qiov, pad->buf, pad->head);
172
+ }
173
+
174
+ /*
175
+ * If padded_niov > IOV_MAX, we cannot just concatenate everything.
176
+ * Instead, merge the first two or three elements of @iov to reduce the
177
+ * number of vector elements as necessary.
178
+ */
179
+ if (padded_niov > IOV_MAX) {
180
+ /*
181
+ * Only head and tail can have lead to the number of entries exceeding
182
+ * IOV_MAX, so we can exceed it by the head and tail at most. We need
183
+ * to reduce the number of elements by `surplus_count`, so we merge that
184
+ * many elements plus one into one element.
185
+ */
186
+ surplus_count = padded_niov - IOV_MAX;
187
+ assert(surplus_count <= !!pad->head + !!pad->tail);
188
+ collapse_count = surplus_count + 1;
189
+
190
+ /*
191
+ * Move the elements to collapse into `pad->pre_collapse_qiov`, then
192
+ * advance `iov` (and associated variables) by those elements.
193
+ */
194
+ qemu_iovec_init(&pad->pre_collapse_qiov, collapse_count);
195
+ qemu_iovec_concat_iov(&pad->pre_collapse_qiov, iov,
196
+ collapse_count, iov_offset, SIZE_MAX);
197
+ iov += collapse_count;
198
+ iov_offset = 0;
199
+ niov -= collapse_count;
200
+ bytes -= pad->pre_collapse_qiov.size;
201
+
202
+ /*
203
+ * Construct the bounce buffer to match the length of the to-collapse
204
+ * vector elements, and for write requests, initialize it with the data
205
+ * from those elements. Then add it to `pad->local_qiov`.
206
+ */
207
+ pad->collapse_len = pad->pre_collapse_qiov.size;
208
+ pad->collapse_bounce_buf = qemu_blockalign(bs, pad->collapse_len);
209
+ if (pad->write) {
210
+ qemu_iovec_to_buf(&pad->pre_collapse_qiov, 0,
211
+ pad->collapse_bounce_buf, pad->collapse_len);
212
+ }
213
+ qemu_iovec_add(&pad->local_qiov,
214
+ pad->collapse_bounce_buf, pad->collapse_len);
215
+ }
216
+
217
+ qemu_iovec_concat_iov(&pad->local_qiov, iov, niov, iov_offset, bytes);
218
+
219
+ if (pad->tail) {
220
+ qemu_iovec_add(&pad->local_qiov,
221
+ pad->buf + pad->buf_len - pad->tail, pad->tail);
222
+ }
223
+
224
+ assert(pad->local_qiov.niov == MIN(padded_niov, IOV_MAX));
225
+ return 0;
226
+}
227
+
228
/*
229
* bdrv_pad_request
230
*
231
@@ -XXX,XX +XXX,XX @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
232
* read of padding, bdrv_padding_rmw_read() should be called separately if
233
* needed.
234
*
235
+ * @write is true for write requests, false for read requests.
236
+ *
237
* Request parameters (@qiov, &qiov_offset, &offset, &bytes) are in-out:
238
* - on function start they represent original request
239
* - on failure or when padding is not needed they are unchanged
240
@@ -XXX,XX +XXX,XX @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad)
241
static int bdrv_pad_request(BlockDriverState *bs,
242
QEMUIOVector **qiov, size_t *qiov_offset,
243
int64_t *offset, int64_t *bytes,
244
+ bool write,
245
BdrvRequestPadding *pad, bool *padded,
246
BdrvRequestFlags *flags)
247
{
248
int ret;
249
+ struct iovec *sliced_iov;
250
+ int sliced_niov;
251
+ size_t sliced_head, sliced_tail;
252
253
bdrv_check_qiov_request(*offset, *bytes, *qiov, *qiov_offset, &error_abort);
254
255
- if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {
256
+ if (!bdrv_init_padding(bs, *offset, *bytes, write, pad)) {
257
if (padded) {
258
*padded = false;
259
}
260
return 0;
261
}
262
263
- ret = qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
264
- *qiov, *qiov_offset, *bytes,
265
- pad->buf + pad->buf_len - pad->tail,
266
- pad->tail);
267
+ sliced_iov = qemu_iovec_slice(*qiov, *qiov_offset, *bytes,
268
+ &sliced_head, &sliced_tail,
269
+ &sliced_niov);
270
+
271
+ /* Guaranteed by bdrv_check_qiov_request() */
272
+ assert(*bytes <= SIZE_MAX);
273
+ ret = bdrv_create_padded_qiov(bs, pad, sliced_iov, sliced_niov,
274
+ sliced_head, *bytes);
275
if (ret < 0) {
276
- bdrv_padding_destroy(pad);
277
+ bdrv_padding_finalize(pad);
278
return ret;
279
}
280
*bytes += pad->head + pad->tail;
281
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
282
flags |= BDRV_REQ_COPY_ON_READ;
283
}
284
285
- ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad,
286
- NULL, &flags);
287
+ ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, false,
288
+ &pad, NULL, &flags);
289
if (ret < 0) {
290
goto fail;
291
}
292
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
293
bs->bl.request_alignment,
294
qiov, qiov_offset, flags);
295
tracked_request_end(&req);
296
- bdrv_padding_destroy(&pad);
297
+ bdrv_padding_finalize(&pad);
298
299
fail:
300
bdrv_dec_in_flight(bs);
301
@@ -XXX,XX +XXX,XX @@ bdrv_co_do_zero_pwritev(BdrvChild *child, int64_t offset, int64_t bytes,
302
/* This flag doesn't make sense for padding or zero writes */
303
flags &= ~BDRV_REQ_REGISTERED_BUF;
304
305
- padding = bdrv_init_padding(bs, offset, bytes, &pad);
306
+ padding = bdrv_init_padding(bs, offset, bytes, true, &pad);
307
if (padding) {
308
assert(!(flags & BDRV_REQ_NO_WAIT));
309
bdrv_make_request_serialising(req, align);
310
@@ -XXX,XX +XXX,XX @@ bdrv_co_do_zero_pwritev(BdrvChild *child, int64_t offset, int64_t bytes,
311
}
312
313
out:
314
- bdrv_padding_destroy(&pad);
315
+ bdrv_padding_finalize(&pad);
316
317
return ret;
318
}
319
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
320
* bdrv_co_do_zero_pwritev() does aligning by itself, so, we do
321
* alignment only if there is no ZERO flag.
322
*/
323
- ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad,
324
- &padded, &flags);
325
+ ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, true,
326
+ &pad, &padded, &flags);
327
if (ret < 0) {
328
return ret;
329
}
330
@@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
331
ret = bdrv_aligned_pwritev(child, &req, offset, bytes, align,
332
qiov, qiov_offset, flags);
333
334
- bdrv_padding_destroy(&pad);
335
+ bdrv_padding_finalize(&pad);
336
337
out:
338
tracked_request_end(&req);
--
2.40.1

bdrv_pad_request() was the main user of qemu_iovec_init_extended().
HEAD^ has removed that use, so we can remove qemu_iovec_init_extended()
now.

The only remaining user is qemu_iovec_init_slice(), which can easily
inline the small part it really needs.

Note that qemu_iovec_init_extended() offered a memcpy() optimization to
initialize the new I/O vector. qemu_iovec_concat_iov(), which is used
to replace its functionality, does not, but calls qemu_iovec_add() for
every single element. If we decide this optimization was important, we
will need to re-implement it in qemu_iovec_concat_iov(), which might
also benefit its pre-existing users.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230411173418.19549-4-hreitz@redhat.com>
---
 include/qemu/iov.h |  5 ---
 util/iov.c         | 79 +++++++---------------------------------------
 2 files changed, 11 insertions(+), 73 deletions(-)

diff --git a/include/qemu/iov.h b/include/qemu/iov.h
25
index XXXXXXX..XXXXXXX 100644
26
--- a/include/qemu/iov.h
27
+++ b/include/qemu/iov.h
28
@@ -XXX,XX +XXX,XX @@ static inline void *qemu_iovec_buf(QEMUIOVector *qiov)
29
30
void qemu_iovec_init(QEMUIOVector *qiov, int alloc_hint);
31
void qemu_iovec_init_external(QEMUIOVector *qiov, struct iovec *iov, int niov);
32
-int qemu_iovec_init_extended(
33
- QEMUIOVector *qiov,
34
- void *head_buf, size_t head_len,
35
- QEMUIOVector *mid_qiov, size_t mid_offset, size_t mid_len,
36
- void *tail_buf, size_t tail_len);
37
void qemu_iovec_init_slice(QEMUIOVector *qiov, QEMUIOVector *source,
38
size_t offset, size_t len);
39
struct iovec *qemu_iovec_slice(QEMUIOVector *qiov,
40
diff --git a/util/iov.c b/util/iov.c
41
index XXXXXXX..XXXXXXX 100644
42
--- a/util/iov.c
43
+++ b/util/iov.c
44
@@ -XXX,XX +XXX,XX @@ int qemu_iovec_subvec_niov(QEMUIOVector *qiov, size_t offset, size_t len)
45
return niov;
46
}
47
48
-/*
49
- * Compile new iovec, combining @head_buf buffer, sub-qiov of @mid_qiov,
50
- * and @tail_buf buffer into new qiov.
51
- */
52
-int qemu_iovec_init_extended(
53
- QEMUIOVector *qiov,
54
- void *head_buf, size_t head_len,
55
- QEMUIOVector *mid_qiov, size_t mid_offset, size_t mid_len,
56
- void *tail_buf, size_t tail_len)
57
-{
58
- size_t mid_head, mid_tail;
59
- int total_niov, mid_niov = 0;
60
- struct iovec *p, *mid_iov = NULL;
61
-
62
- assert(mid_qiov->niov <= IOV_MAX);
63
-
64
- if (SIZE_MAX - head_len < mid_len ||
65
- SIZE_MAX - head_len - mid_len < tail_len)
66
- {
67
- return -EINVAL;
68
- }
69
-
70
- if (mid_len) {
71
- mid_iov = qemu_iovec_slice(mid_qiov, mid_offset, mid_len,
72
- &mid_head, &mid_tail, &mid_niov);
73
- }
74
-
75
- total_niov = !!head_len + mid_niov + !!tail_len;
76
- if (total_niov > IOV_MAX) {
77
- return -EINVAL;
78
- }
79
-
80
- if (total_niov == 1) {
81
- qemu_iovec_init_buf(qiov, NULL, 0);
82
- p = &qiov->local_iov;
83
- } else {
84
- qiov->niov = qiov->nalloc = total_niov;
85
- qiov->size = head_len + mid_len + tail_len;
86
- p = qiov->iov = g_new(struct iovec, qiov->niov);
87
- }
88
-
89
- if (head_len) {
90
- p->iov_base = head_buf;
91
- p->iov_len = head_len;
92
- p++;
93
- }
94
-
95
- assert(!mid_niov == !mid_len);
96
- if (mid_niov) {
97
- memcpy(p, mid_iov, mid_niov * sizeof(*p));
98
- p[0].iov_base = (uint8_t *)p[0].iov_base + mid_head;
99
- p[0].iov_len -= mid_head;
100
- p[mid_niov - 1].iov_len -= mid_tail;
101
- p += mid_niov;
102
- }
103
-
104
- if (tail_len) {
105
- p->iov_base = tail_buf;
106
- p->iov_len = tail_len;
107
- }
108
-
109
- return 0;
110
-}
111
-
112
/*
113
* Check if the contents of subrange of qiov data is all zeroes.
114
*/
115
@@ -XXX,XX +XXX,XX @@ bool qemu_iovec_is_zero(QEMUIOVector *qiov, size_t offset, size_t bytes)
116
void qemu_iovec_init_slice(QEMUIOVector *qiov, QEMUIOVector *source,
117
size_t offset, size_t len)
118
{
119
- int ret;
120
+ struct iovec *slice_iov;
121
+ int slice_niov;
122
+ size_t slice_head, slice_tail;
123
124
assert(source->size >= len);
125
assert(source->size - len >= offset);
126
127
- /* We shrink the request, so we can't overflow neither size_t nor MAX_IOV */
128
- ret = qemu_iovec_init_extended(qiov, NULL, 0, source, offset, len, NULL, 0);
129
- assert(ret == 0);
130
+ slice_iov = qemu_iovec_slice(source, offset, len,
131
+ &slice_head, &slice_tail, &slice_niov);
132
+ if (slice_niov == 1) {
133
+ qemu_iovec_init_buf(qiov, slice_iov[0].iov_base + slice_head, len);
134
+ } else {
135
+ qemu_iovec_init(qiov, slice_niov);
136
+ qemu_iovec_concat_iov(qiov, slice_iov, slice_niov, slice_head, len);
137
+ }
138
}
139
140
void qemu_iovec_destroy(QEMUIOVector *qiov)
141
--
142
2.40.1

Test that even vectored IO requests with 1024 vector elements that are
not aligned to the device's request alignment will succeed.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-Id: <20230411173418.19549-5-hreitz@redhat.com>
---
 tests/qemu-iotests/tests/iov-padding     | 85 ++++++++++++++++++++++++
 tests/qemu-iotests/tests/iov-padding.out | 59 ++++++++++++++++
 2 files changed, 144 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/iov-padding
 create mode 100644 tests/qemu-iotests/tests/iov-padding.out

diff --git a/tests/qemu-iotests/tests/iov-padding b/tests/qemu-iotests/tests/iov-padding
16
new file mode 100755
17
index XXXXXXX..XXXXXXX
18
--- /dev/null
19
+++ b/tests/qemu-iotests/tests/iov-padding
20
@@ -XXX,XX +XXX,XX @@
21
+#!/usr/bin/env bash
22
+# group: rw quick
23
+#
24
+# Check the interaction of request padding (to fit alignment restrictions) with
25
+# vectored I/O from the guest
26
+#
27
+# Copyright Red Hat
28
+#
29
+# This program is free software; you can redistribute it and/or modify
30
+# it under the terms of the GNU General Public License as published by
31
+# the Free Software Foundation; either version 2 of the License, or
32
+# (at your option) any later version.
33
+#
34
+# This program is distributed in the hope that it will be useful,
35
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
36
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
37
+# GNU General Public License for more details.
38
+#
39
+# You should have received a copy of the GNU General Public License
40
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
41
+#
42
+
43
+seq=$(basename $0)
44
+echo "QA output created by $seq"
45
+
46
+status=1    # failure is the default!
47
+
48
+_cleanup()
49
+{
50
+ _cleanup_test_img
51
+}
52
+trap "_cleanup; exit \$status" 0 1 2 3 15
53
+
54
+# get standard environment, filters and checks
55
+cd ..
56
+. ./common.rc
57
+. ./common.filter
58
+
59
+_supported_fmt raw
60
+_supported_proto file
61
+
62
+_make_test_img 1M
63
+
64
+IMGSPEC="driver=blkdebug,align=4096,image.driver=file,image.filename=$TEST_IMG"
65
+
66
+# Four combinations:
67
+# - Offset 4096, length 1023 * 512 + 512: Fully aligned to 4k
68
+# - Offset 4096, length 1023 * 512 + 4096: Head is aligned, tail is not
69
+# - Offset 512, length 1023 * 512 + 512: Neither head nor tail are aligned
70
+# - Offset 512, length 1023 * 512 + 4096: Tail is aligned, head is not
71
+for start_offset in 4096 512; do
72
+ for last_element_length in 512 4096; do
73
+ length=$((1023 * 512 + $last_element_length))
74
+
75
+ echo
76
+ echo "== performing 1024-element vectored requests to image (offset: $start_offset; length: $length) =="
77
+
78
+ # Fill with data for testing
79
+ $QEMU_IO -c 'write -P 1 0 1M' "$TEST_IMG" | _filter_qemu_io
80
+
81
+ # 1023 512-byte buffers, and then one with length $last_element_length
82
+ cmd_params="-P 2 $start_offset $(yes 512 | head -n 1023 | tr '\n' ' ') $last_element_length"
83
+ QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS_NO_FMT" $QEMU_IO \
84
+ -c "writev $cmd_params" \
85
+ --image-opts \
86
+ "$IMGSPEC" \
87
+ | _filter_qemu_io
88
+
89
+ # Read all patterns -- read the part we just wrote with writev twice,
90
+ # once "normally", and once with a readv, so we see that that works, too
91
+ QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS_NO_FMT" $QEMU_IO \
92
+ -c "read -P 1 0 $start_offset" \
93
+ -c "read -P 2 $start_offset $length" \
94
+ -c "readv $cmd_params" \
95
+ -c "read -P 1 $((start_offset + length)) $((1024 * 1024 - length - start_offset))" \
96
+ --image-opts \
97
+ "$IMGSPEC" \
98
+ | _filter_qemu_io
99
+ done
100
+done
101
+
102
+# success, all done
103
+echo "*** done"
104
+rm -f $seq.full
105
+status=0
106
diff --git a/tests/qemu-iotests/tests/iov-padding.out b/tests/qemu-iotests/tests/iov-padding.out
107
new file mode 100644
108
index XXXXXXX..XXXXXXX
109
--- /dev/null
110
+++ b/tests/qemu-iotests/tests/iov-padding.out
111
@@ -XXX,XX +XXX,XX @@
112
+QA output created by iov-padding
113
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
114
+
115
+== performing 1024-element vectored requests to image (offset: 4096; length: 524288) ==
116
+wrote 1048576/1048576 bytes at offset 0
117
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
118
+wrote 524288/524288 bytes at offset 4096
119
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
120
+read 4096/4096 bytes at offset 0
121
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
122
+read 524288/524288 bytes at offset 4096
123
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
124
+read 524288/524288 bytes at offset 4096
125
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
126
+read 520192/520192 bytes at offset 528384
127
+508 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
128
+
129
+== performing 1024-element vectored requests to image (offset: 4096; length: 527872) ==
130
+wrote 1048576/1048576 bytes at offset 0
131
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
132
+wrote 527872/527872 bytes at offset 4096
133
+515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
134
+read 4096/4096 bytes at offset 0
135
+4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
136
+read 527872/527872 bytes at offset 4096
137
+515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
138
+read 527872/527872 bytes at offset 4096
139
+515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
140
+read 516608/516608 bytes at offset 531968
141
+504.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
142
+
143
+== performing 1024-element vectored requests to image (offset: 512; length: 524288) ==
144
+wrote 1048576/1048576 bytes at offset 0
145
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
146
+wrote 524288/524288 bytes at offset 512
147
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
148
+read 512/512 bytes at offset 0
149
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
150
+read 524288/524288 bytes at offset 512
151
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
152
+read 524288/524288 bytes at offset 512
153
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
154
+read 523776/523776 bytes at offset 524800
155
+511.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
156
+
157
+== performing 1024-element vectored requests to image (offset: 512; length: 527872) ==
158
+wrote 1048576/1048576 bytes at offset 0
159
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
160
+wrote 527872/527872 bytes at offset 512
161
+515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
162
+read 512/512 bytes at offset 0
163
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
164
+read 527872/527872 bytes at offset 512
165
+515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
166
+read 527872/527872 bytes at offset 512
167
+515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
168
+read 520192/520192 bytes at offset 528384
169
+508 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
170
+*** done
171
--
172
2.40.1

From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>

The data_end field in BDRVParallelsState is set to the biggest offset
present in the BAT. If this offset is outside of the image, any further
write will create the cluster at this offset and/or the image will be
truncated to this offset on close. This is definitely not correct.

Raise an error in parallels_open() if data_end points outside the image
and we are not performing a check (let the check repair the image). Set
data_end to the end of the cluster with the last correct offset.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Message-Id: <20230424093147.197643-2-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/parallels.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index XXXXXXX..XXXXXXX 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -XXX,XX +XXX,XX @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
BDRVParallelsState *s = bs->opaque;
ParallelsHeader ph;
int ret, size, i;
+ int64_t file_nb_sectors;
QemuOpts *opts = NULL;
Error *local_err = NULL;
char *buf;
@@ -XXX,XX +XXX,XX @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
return ret;
}

+ file_nb_sectors = bdrv_nb_sectors(bs->file->bs);
+ if (file_nb_sectors < 0) {
+ return -EINVAL;
+ }
+
ret = bdrv_pread(bs->file, 0, sizeof(ph), &ph, 0);
if (ret < 0) {
goto fail;
@@ -XXX,XX +XXX,XX @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,

for (i = 0; i < s->bat_size; i++) {
int64_t off = bat2sect(s, i);
+ if (off >= file_nb_sectors) {
+ if (flags & BDRV_O_CHECK) {
+ continue;
+ }
+ error_setg(errp, "parallels: Offset %" PRIi64 " in BAT[%d] entry "
+ "is larger than file size (%" PRIi64 ")",
+ off << BDRV_SECTOR_BITS, i,
+ file_nb_sectors << BDRV_SECTOR_BITS);
+ ret = -EINVAL;
+ goto fail;
+ }
if (off >= s->data_end) {
s->data_end = off + s->tracks;
}
--
2.40.1
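[Editor's note] For context, a rough sketch of how a parallels BAT entry maps to a position in the image file, which is why an entry pointing past the end of the file inflates the image on the next write. The helper names and exact scaling below are assumptions based on this patch, not a verbatim copy of the driver.

/* Editorial sketch only -- approximate BAT-entry interpretation. */
#include <stdbool.h>
#include <stdint.h>

/* A BAT entry holds a sector number, scaled by the image's off_multiplier. */
static inline uint64_t bat_entry_to_sector(uint32_t bat_entry,
                                           uint32_t off_multiplier)
{
    return (uint64_t)bat_entry * off_multiplier;
}

/*
 * The check added above, in isolation: an entry whose cluster would start at
 * or beyond the end of the file is rejected (or skipped when the image is
 * only being opened for a check).
 */
static inline bool bat_entry_is_outside_file(uint64_t entry_sector,
                                             int64_t file_nb_sectors)
{
    return entry_sector >= (uint64_t)file_nb_sectors;
}
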

From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>

Don't let high_off be more than the file size even if we don't fix the
image.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230424093147.197643-3-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/parallels.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index XXXXXXX..XXXXXXX 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
res->corruptions++;
if (fix & BDRV_FIX_ERRORS) {
- prev_off = 0;
s->bat_bitmap[i] = 0;
res->corruptions_fixed++;
flush_bat = true;
- continue;
}
+ prev_off = 0;
+ continue;
}

res->bfi.allocated_clusters++;
--
2.40.1

From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>

Set data_end to the end of the last cluster inside the image. This way
we can be sure that corrupted offsets in the BAT can't affect the image
size. If there are no allocated clusters, set image_end_offset from
data_end.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20230424093147.197643-4-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/parallels.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index XXXXXXX..XXXXXXX 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
}
}

- res->image_end_offset = high_off + s->cluster_size;
+ if (high_off == 0) {
+ res->image_end_offset = s->data_end << BDRV_SECTOR_BITS;
+ } else {
+ res->image_end_offset = high_off + s->cluster_size;
+ s->data_end = res->image_end_offset >> BDRV_SECTOR_BITS;
+ }
+
if (size > res->image_end_offset) {
int64_t count;
count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size);
--
2.40.1

From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>

This helper will be reused in the next patches during the
parallels_co_check rework to simplify its code.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230424093147.197643-5-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/parallels.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index XXXXXXX..XXXXXXX 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -XXX,XX +XXX,XX @@ static int64_t block_status(BDRVParallelsState *s, int64_t sector_num,
return start_off;
}

+static void parallels_set_bat_entry(BDRVParallelsState *s,
+ uint32_t index, uint32_t offset)
+{
+ s->bat_bitmap[index] = cpu_to_le32(offset);
+ bitmap_set(s->bat_dirty_bmap, bat_entry_off(index) / s->bat_dirty_block, 1);
+}
+
static int64_t coroutine_fn GRAPH_RDLOCK
allocate_clusters(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, int *pnum)
@@ -XXX,XX +XXX,XX @@ allocate_clusters(BlockDriverState *bs, int64_t sector_num,
}

for (i = 0; i < to_allocate; i++) {
- s->bat_bitmap[idx + i] = cpu_to_le32(s->data_end / s->off_multiplier);
+ parallels_set_bat_entry(s, idx + i, s->data_end / s->off_multiplier);
s->data_end += s->tracks;
- bitmap_set(s->bat_dirty_bmap,
- bat_entry_off(idx + i) / s->bat_dirty_block, 1);
}

return bat2sect(s, idx) + sector_num % s->tracks;
--
2.40.1

From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>

The BAT is written in the context of conventional operations over the
image, inside bdrv_co_flush(), when it calls the
parallels_co_flush_to_os() callback. Thus we should not modify the BAT
array directly, but call the parallels_set_bat_entry() helper and
bdrv_co_flush() further on. After that there is no need to manually
write the BAT and track its modification.

This makes the code more generic and allows splitting
parallels_set_bat_entry() for independent pieces.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20230424093147.197643-6-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/parallels.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index XXXXXXX..XXXXXXX 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
{
BDRVParallelsState *s = bs->opaque;
int64_t size, prev_off, high_off;
- int ret;
+ int ret = 0;
uint32_t i;
- bool flush_bat = false;

size = bdrv_getlength(bs->file->bs);
if (size < 0) {
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
res->corruptions++;
if (fix & BDRV_FIX_ERRORS) {
- s->bat_bitmap[i] = 0;
+ parallels_set_bat_entry(s, i, 0);
res->corruptions_fixed++;
- flush_bat = true;
}
prev_off = 0;
continue;
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
prev_off = off;
}

- ret = 0;
- if (flush_bat) {
- ret = bdrv_co_pwrite_sync(bs->file, 0, s->header_size, s->header, 0);
- if (ret < 0) {
- res->check_errors++;
- goto out;
- }
- }
-
if (high_off == 0) {
res->image_end_offset = s->data_end << BDRV_SECTOR_BITS;
} else {
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,

out:
qemu_co_mutex_unlock(&s->lock);
+
+ if (ret == 0) {
+ ret = bdrv_co_flush(bs);
+ if (ret < 0) {
+ res->check_errors++;
+ }
+ }
+
return ret;
--
2.40.1

From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>

We will add more and more checks, so we need a better code structure
in parallels_co_check. Let each check be performed in a separate loop
in a separate helper.

Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20230424093147.197643-7-alexander.ivanov@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/parallels.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index XXXXXXX..XXXXXXX 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -XXX,XX +XXX,XX @@ parallels_co_readv(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
return ret;
}

+static void parallels_check_unclean(BlockDriverState *bs,
+ BdrvCheckResult *res,
+ BdrvCheckMode fix)
+{
+ BDRVParallelsState *s = bs->opaque;
+
+ if (!s->header_unclean) {
+ return;
+ }
+
+ fprintf(stderr, "%s image was not closed correctly\n",
+ fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR");
+ res->corruptions++;
+ if (fix & BDRV_FIX_ERRORS) {
+ /* parallels_close will do the job right */
+ res->corruptions_fixed++;
+ s->header_unclean = false;
+ }
+}

static int coroutine_fn GRAPH_RDLOCK
parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
}

qemu_co_mutex_lock(&s->lock);
- if (s->header_unclean) {
- fprintf(stderr, "%s image was not closed correctly\n",
- fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR");
- res->corruptions++;
- if (fix & BDRV_FIX_ERRORS) {
- /* parallels_close will do the job right */
- res->corruptions_fixed++;
- s->header_unclean = false;
- }
- }
+
+ parallels_check_unclean(bs, res, fix);

res->bfi.total_clusters = s->bat_size;
res->bfi.compressed_clusters = 0; /* compression is not supported */
--
2.40.1
1
From: Akihiko Odaki <akihiko.odaki@gmail.com>
1
From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2
2
3
This commit introduces "punch hole" operation and optimizes transfer
3
We will add more and more checks so we need a better code structure in
4
block size for macOS.
4
parallels_co_check. Let each check performs in a separate loop in a
5
separate helper.
5
6
6
Thanks to Konstantin Nazarov for detailed analysis of a flaw in an
7
Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
7
old version of this change:
8
Reviewed-by: Denis V. Lunev <den@openvz.org>
8
https://gist.github.com/akihikodaki/87df4149e7ca87f18dc56807ec5a1bc5#gistcomment-3654667
9
Message-Id: <20230424093147.197643-8-alexander.ivanov@virtuozzo.com>
10
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
11
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
12
---
13
block/parallels.c | 75 +++++++++++++++++++++++++++++++----------------
14
1 file changed, 49 insertions(+), 26 deletions(-)
9
15
10
Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
16
diff --git a/block/parallels.c b/block/parallels.c
11
Message-id: 20210705130458.97642-1-akihiko.odaki@gmail.com
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
14
block/file-posix.c | 27 +++++++++++++++++++++++++--
15
1 file changed, 25 insertions(+), 2 deletions(-)
16
17
diff --git a/block/file-posix.c b/block/file-posix.c
18
index XXXXXXX..XXXXXXX 100644
17
index XXXXXXX..XXXXXXX 100644
19
--- a/block/file-posix.c
18
--- a/block/parallels.c
20
+++ b/block/file-posix.c
19
+++ b/block/parallels.c
21
@@ -XXX,XX +XXX,XX @@
20
@@ -XXX,XX +XXX,XX @@ static void parallels_check_unclean(BlockDriverState *bs,
22
#if defined(HAVE_HOST_BLOCK_DEVICE)
23
#include <paths.h>
24
#include <sys/param.h>
25
+#include <sys/mount.h>
26
#include <IOKit/IOKitLib.h>
27
#include <IOKit/IOBSD.h>
28
#include <IOKit/storage/IOMediaBSDClient.h>
29
@@ -XXX,XX +XXX,XX @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
30
return;
31
}
32
33
+#if defined(__APPLE__) && (__MACH__)
34
+ struct statfs buf;
35
+
36
+ if (!fstatfs(s->fd, &buf)) {
37
+ bs->bl.opt_transfer = buf.f_iosize;
38
+ bs->bl.pdiscard_alignment = buf.f_bsize;
39
+ }
40
+#endif
41
+
42
if (bs->sg || S_ISBLK(st.st_mode)) {
43
int ret = hdev_get_max_hw_transfer(s->fd, &st);
44
45
@@ -XXX,XX +XXX,XX @@ out:
46
}
21
}
47
}
22
}
48
23
49
+#if defined(CONFIG_FALLOCATE) || defined(BLKZEROOUT) || defined(BLKDISCARD)
24
+static int coroutine_fn GRAPH_RDLOCK
50
static int translate_err(int err)
25
+parallels_check_outside_image(BlockDriverState *bs, BdrvCheckResult *res,
26
+ BdrvCheckMode fix)
27
+{
28
+ BDRVParallelsState *s = bs->opaque;
29
+ uint32_t i;
30
+ int64_t off, high_off, size;
31
+
32
+ size = bdrv_getlength(bs->file->bs);
33
+ if (size < 0) {
34
+ res->check_errors++;
35
+ return size;
36
+ }
37
+
38
+ high_off = 0;
39
+ for (i = 0; i < s->bat_size; i++) {
40
+ off = bat2sect(s, i) << BDRV_SECTOR_BITS;
41
+ if (off > size) {
42
+ fprintf(stderr, "%s cluster %u is outside image\n",
43
+ fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
44
+ res->corruptions++;
45
+ if (fix & BDRV_FIX_ERRORS) {
46
+ parallels_set_bat_entry(s, i, 0);
47
+ res->corruptions_fixed++;
48
+ }
49
+ continue;
50
+ }
51
+ if (high_off < off) {
52
+ high_off = off;
53
+ }
54
+ }
55
+
56
+ if (high_off == 0) {
57
+ res->image_end_offset = s->data_end << BDRV_SECTOR_BITS;
58
+ } else {
59
+ res->image_end_offset = high_off + s->cluster_size;
60
+ s->data_end = res->image_end_offset >> BDRV_SECTOR_BITS;
61
+ }
62
+
63
+ return 0;
64
+}
65
+
66
static int coroutine_fn GRAPH_RDLOCK
67
parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
68
BdrvCheckMode fix)
51
{
69
{
52
if (err == -ENODEV || err == -ENOSYS || err == -EOPNOTSUPP ||
70
BDRVParallelsState *s = bs->opaque;
53
@@ -XXX,XX +XXX,XX @@ static int translate_err(int err)
71
- int64_t size, prev_off, high_off;
72
- int ret = 0;
73
+ int64_t size, prev_off;
74
+ int ret;
75
uint32_t i;
76
77
size = bdrv_getlength(bs->file->bs);
78
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
79
80
parallels_check_unclean(bs, res, fix);
81
82
+ ret = parallels_check_outside_image(bs, res, fix);
83
+ if (ret < 0) {
84
+ goto out;
85
+ }
86
+
87
res->bfi.total_clusters = s->bat_size;
88
res->bfi.compressed_clusters = 0; /* compression is not supported */
89
90
- high_off = 0;
91
prev_off = 0;
92
for (i = 0; i < s->bat_size; i++) {
93
int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
94
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
95
continue;
96
}
97
98
- /* cluster outside the image */
99
- if (off > size) {
100
- fprintf(stderr, "%s cluster %u is outside image\n",
101
- fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
102
- res->corruptions++;
103
- if (fix & BDRV_FIX_ERRORS) {
104
- parallels_set_bat_entry(s, i, 0);
105
- res->corruptions_fixed++;
106
- }
107
- prev_off = 0;
108
- continue;
109
- }
110
-
111
res->bfi.allocated_clusters++;
112
- if (off > high_off) {
113
- high_off = off;
114
- }
115
116
if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
117
res->bfi.fragmented_clusters++;
118
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
119
prev_off = off;
54
}
120
}
55
return err;
121
56
}
122
- if (high_off == 0) {
57
+#endif
123
- res->image_end_offset = s->data_end << BDRV_SECTOR_BITS;
58
124
- } else {
59
#ifdef CONFIG_FALLOCATE
125
- res->image_end_offset = high_off + s->cluster_size;
60
static int do_fallocate(int fd, int mode, off_t offset, off_t len)
126
- s->data_end = res->image_end_offset >> BDRV_SECTOR_BITS;
61
@@ -XXX,XX +XXX,XX @@ static int handle_aiocb_discard(void *opaque)
127
- }
62
}
128
-
63
} while (errno == EINTR);
129
if (size > res->image_end_offset) {
64
130
int64_t count;
65
- ret = -errno;
131
count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size);
66
+ ret = translate_err(-errno);
67
#endif
68
} else {
69
#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
70
ret = do_fallocate(s->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
71
aiocb->aio_offset, aiocb->aio_nbytes);
72
+ ret = translate_err(-errno);
73
+#elif defined(__APPLE__) && (__MACH__)
74
+ fpunchhole_t fpunchhole;
75
+ fpunchhole.fp_flags = 0;
76
+ fpunchhole.reserved = 0;
77
+ fpunchhole.fp_offset = aiocb->aio_offset;
78
+ fpunchhole.fp_length = aiocb->aio_nbytes;
79
+ if (fcntl(s->fd, F_PUNCHHOLE, &fpunchhole) == -1) {
80
+ ret = errno == ENODEV ? -ENOTSUP : -errno;
81
+ } else {
82
+ ret = 0;
83
+ }
84
#endif
85
}
86
87
- ret = translate_err(ret);
88
if (ret == -ENOTSUP) {
89
s->has_discard = false;
90
}
91
--
132
--
92
2.31.1
133
2.40.1
93
New patch
1
From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
1
2
3
Exclude out-of-image clusters from the allocated and fragmented cluster
4
calculation.
5
6
Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
7
Message-Id: <20230424093147.197643-9-alexander.ivanov@virtuozzo.com>
8
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
9
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
10
---
11
block/parallels.c | 6 +++++-
12
1 file changed, 5 insertions(+), 1 deletion(-)
13
14
diff --git a/block/parallels.c b/block/parallels.c
15
index XXXXXXX..XXXXXXX 100644
16
--- a/block/parallels.c
17
+++ b/block/parallels.c
18
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
19
prev_off = 0;
20
for (i = 0; i < s->bat_size; i++) {
21
int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
22
- if (off == 0) {
23
+ /*
24
+ * If BDRV_FIX_ERRORS is not set, out-of-image BAT entries were not
25
+ * fixed. Skip not allocated and out-of-image BAT entries.
26
+ */
27
+ if (off == 0 || off + s->cluster_size > res->image_end_offset) {
28
prev_off = 0;
29
continue;
30
}
31
--
32
2.40.1
1
From: Akihiko Odaki <akihiko.odaki@gmail.com>
1
From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2
2
3
The backend_defaults property allows users to control whether default block
3
We will add more and more checks so we need a better code structure
4
properties should be decided with backend information.
4
in parallels_co_check. Let each check be performed in a separate loop
5
in a separate helper.
5
6
6
If it is off, any backend information will be discarded, which is
7
Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
7
suitable if you plan to perform live migration to a different disk backend.
8
Message-Id: <20230424093147.197643-10-alexander.ivanov@virtuozzo.com>
9
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
10
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
11
---
12
block/parallels.c | 74 ++++++++++++++++++++++++++++-------------------
13
1 file changed, 45 insertions(+), 29 deletions(-)
8
14
9
If it is on, a block device may utilize backend information more
15
diff --git a/block/parallels.c b/block/parallels.c
10
aggressively.
11
12
By default, it is auto, which uses backend information for block
13
sizes and ignores the others, which is consistent with the older
14
versions.
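
As a rough usage illustration (the device, drive name and spelling below are
only an example, not taken from this patch), the property can be set like any
other qdev property:

    -device floppy,drive=none0,backend_defaults=off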
15
16
Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
17
Message-id: 20210705130458.97642-2-akihiko.odaki@gmail.com
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
19
---
20
include/hw/block/block.h | 3 +++
21
hw/block/block.c | 42 ++++++++++++++++++++++++++++++++++----
22
tests/qemu-iotests/172.out | 38 ++++++++++++++++++++++++++++++++++
23
3 files changed, 79 insertions(+), 4 deletions(-)
24
25
diff --git a/include/hw/block/block.h b/include/hw/block/block.h
26
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
27
--- a/include/hw/block/block.h
17
--- a/block/parallels.c
28
+++ b/include/hw/block/block.h
18
+++ b/block/parallels.c
29
@@ -XXX,XX +XXX,XX @@
19
@@ -XXX,XX +XXX,XX @@ parallels_check_outside_image(BlockDriverState *bs, BdrvCheckResult *res,
30
31
typedef struct BlockConf {
32
BlockBackend *blk;
33
+ OnOffAuto backend_defaults;
34
uint32_t physical_block_size;
35
uint32_t logical_block_size;
36
uint32_t min_io_size;
37
@@ -XXX,XX +XXX,XX @@ static inline unsigned int get_physical_block_exp(BlockConf *conf)
38
}
20
}
39
21
40
#define DEFINE_BLOCK_PROPERTIES_BASE(_state, _conf) \
22
static int coroutine_fn GRAPH_RDLOCK
41
+ DEFINE_PROP_ON_OFF_AUTO("backend_defaults", _state, \
23
-parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
42
+ _conf.backend_defaults, ON_OFF_AUTO_AUTO), \
24
- BdrvCheckMode fix)
43
DEFINE_PROP_BLOCKSIZE("logical_block_size", _state, \
25
+parallels_check_leak(BlockDriverState *bs, BdrvCheckResult *res,
44
_conf.logical_block_size), \
26
+ BdrvCheckMode fix)
45
DEFINE_PROP_BLOCKSIZE("physical_block_size", _state, \
46
diff --git a/hw/block/block.c b/hw/block/block.c
47
index XXXXXXX..XXXXXXX 100644
48
--- a/hw/block/block.c
49
+++ b/hw/block/block.c
50
@@ -XXX,XX +XXX,XX @@ bool blkconf_blocksizes(BlockConf *conf, Error **errp)
51
{
27
{
52
BlockBackend *blk = conf->blk;
28
BDRVParallelsState *s = bs->opaque;
53
BlockSizes blocksizes;
29
- int64_t size, prev_off;
54
- int backend_ret;
30
+ int64_t size;
55
+ BlockDriverState *bs;
31
int ret;
56
+ bool use_blocksizes;
32
- uint32_t i;
57
+ bool use_bs;
33
34
size = bdrv_getlength(bs->file->bs);
35
if (size < 0) {
36
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
37
return size;
38
}
39
40
+ if (size > res->image_end_offset) {
41
+ int64_t count;
42
+ count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size);
43
+ fprintf(stderr, "%s space leaked at the end of the image %" PRId64 "\n",
44
+ fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
45
+ size - res->image_end_offset);
46
+ res->leaks += count;
47
+ if (fix & BDRV_FIX_LEAKS) {
48
+ Error *local_err = NULL;
58
+
49
+
59
+ switch (conf->backend_defaults) {
50
+ /*
60
+ case ON_OFF_AUTO_AUTO:
51
+ * In order to really repair the image, we must shrink it.
61
+ use_blocksizes = !blk_probe_blocksizes(blk, &blocksizes);
52
+ * That means we have to pass exact=true.
62
+ use_bs = false;
53
+ */
63
+ break;
54
+ ret = bdrv_co_truncate(bs->file, res->image_end_offset, true,
64
+
55
+ PREALLOC_MODE_OFF, 0, &local_err);
65
+ case ON_OFF_AUTO_ON:
56
+ if (ret < 0) {
66
+ use_blocksizes = !blk_probe_blocksizes(blk, &blocksizes);
57
+ error_report_err(local_err);
67
+ bs = blk_bs(blk);
58
+ res->check_errors++;
68
+ use_bs = bs;
59
+ return ret;
69
+ break;
70
+
71
+ case ON_OFF_AUTO_OFF:
72
+ use_blocksizes = false;
73
+ use_bs = false;
74
+ break;
75
+
76
+ default:
77
+ abort();
78
+ }
79
80
- backend_ret = blk_probe_blocksizes(blk, &blocksizes);
81
/* fill in detected values if they are not defined via qemu command line */
82
if (!conf->physical_block_size) {
83
- if (!backend_ret) {
84
+ if (use_blocksizes) {
85
conf->physical_block_size = blocksizes.phys;
86
} else {
87
conf->physical_block_size = BDRV_SECTOR_SIZE;
88
}
89
}
90
if (!conf->logical_block_size) {
91
- if (!backend_ret) {
92
+ if (use_blocksizes) {
93
conf->logical_block_size = blocksizes.log;
94
} else {
95
conf->logical_block_size = BDRV_SECTOR_SIZE;
96
}
97
}
98
+ if (use_bs) {
99
+ if (!conf->opt_io_size) {
100
+ conf->opt_io_size = bs->bl.opt_transfer;
101
+ }
102
+ if (conf->discard_granularity == -1) {
103
+ if (bs->bl.pdiscard_alignment) {
104
+ conf->discard_granularity = bs->bl.pdiscard_alignment;
105
+ } else if (bs->bl.request_alignment != 1) {
106
+ conf->discard_granularity = bs->bl.request_alignment;
107
+ }
60
+ }
61
+ res->leaks_fixed += count;
108
+ }
62
+ }
109
+ }
63
+ }
110
64
+
111
if (conf->logical_block_size > conf->physical_block_size) {
65
+ return 0;
112
error_setg(errp,
66
+}
113
diff --git a/tests/qemu-iotests/172.out b/tests/qemu-iotests/172.out
67
+
114
index XXXXXXX..XXXXXXX 100644
68
+static int coroutine_fn GRAPH_RDLOCK
115
--- a/tests/qemu-iotests/172.out
69
+parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
116
+++ b/tests/qemu-iotests/172.out
70
+ BdrvCheckMode fix)
117
@@ -XXX,XX +XXX,XX @@ Testing:
71
+{
118
dev: floppy, id ""
72
+ BDRVParallelsState *s = bs->opaque;
119
unit = 0 (0x0)
73
+ int64_t prev_off;
120
drive = "floppy0"
74
+ int ret;
121
+ backend_defaults = "auto"
75
+ uint32_t i;
122
logical_block_size = 512 (512 B)
76
+
123
physical_block_size = 512 (512 B)
77
qemu_co_mutex_lock(&s->lock);
124
min_io_size = 0 (0 B)
78
125
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2
79
parallels_check_unclean(bs, res, fix);
126
dev: floppy, id ""
80
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
127
unit = 0 (0x0)
81
goto out;
128
drive = "floppy0"
82
}
129
+ backend_defaults = "auto"
83
130
logical_block_size = 512 (512 B)
84
+ ret = parallels_check_leak(bs, res, fix);
131
physical_block_size = 512 (512 B)
85
+ if (ret < 0) {
132
min_io_size = 0 (0 B)
86
+ goto out;
133
@@ -XXX,XX +XXX,XX @@ Testing: -fdb TEST_DIR/t.qcow2
87
+ }
134
dev: floppy, id ""
88
+
135
unit = 1 (0x1)
89
res->bfi.total_clusters = s->bat_size;
136
drive = "floppy1"
90
res->bfi.compressed_clusters = 0; /* compression is not supported */
137
+ backend_defaults = "auto"
91
138
logical_block_size = 512 (512 B)
92
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
139
physical_block_size = 512 (512 B)
93
prev_off = off;
140
min_io_size = 0 (0 B)
94
}
141
@@ -XXX,XX +XXX,XX @@ Testing: -fdb TEST_DIR/t.qcow2
95
142
dev: floppy, id ""
96
- if (size > res->image_end_offset) {
143
unit = 0 (0x0)
97
- int64_t count;
144
drive = "floppy0"
98
- count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size);
145
+ backend_defaults = "auto"
99
- fprintf(stderr, "%s space leaked at the end of the image %" PRId64 "\n",
146
logical_block_size = 512 (512 B)
100
- fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR",
147
physical_block_size = 512 (512 B)
101
- size - res->image_end_offset);
148
min_io_size = 0 (0 B)
102
- res->leaks += count;
149
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2 -fdb TEST_DIR/t.qcow2.2
103
- if (fix & BDRV_FIX_LEAKS) {
150
dev: floppy, id ""
104
- Error *local_err = NULL;
151
unit = 1 (0x1)
105
-
152
drive = "floppy1"
106
- /*
153
+ backend_defaults = "auto"
107
- * In order to really repair the image, we must shrink it.
154
logical_block_size = 512 (512 B)
108
- * That means we have to pass exact=true.
155
physical_block_size = 512 (512 B)
109
- */
156
min_io_size = 0 (0 B)
110
- ret = bdrv_co_truncate(bs->file, res->image_end_offset, true,
157
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2 -fdb TEST_DIR/t.qcow2.2
111
- PREALLOC_MODE_OFF, 0, &local_err);
158
dev: floppy, id ""
112
- if (ret < 0) {
159
unit = 0 (0x0)
113
- error_report_err(local_err);
160
drive = "floppy0"
114
- res->check_errors++;
161
+ backend_defaults = "auto"
115
- goto out;
162
logical_block_size = 512 (512 B)
116
- }
163
physical_block_size = 512 (512 B)
117
- res->leaks_fixed += count;
164
min_io_size = 0 (0 B)
118
- }
165
@@ -XXX,XX +XXX,XX @@ Testing: -fdb
119
- }
166
dev: floppy, id ""
120
-
167
unit = 1 (0x1)
121
out:
168
drive = "floppy1"
122
qemu_co_mutex_unlock(&s->lock);
169
+ backend_defaults = "auto"
123
170
logical_block_size = 512 (512 B)
171
physical_block_size = 512 (512 B)
172
min_io_size = 0 (0 B)
173
@@ -XXX,XX +XXX,XX @@ Testing: -fdb
174
dev: floppy, id ""
175
unit = 0 (0x0)
176
drive = "floppy0"
177
+ backend_defaults = "auto"
178
logical_block_size = 512 (512 B)
179
physical_block_size = 512 (512 B)
180
min_io_size = 0 (0 B)
181
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2
182
dev: floppy, id ""
183
unit = 0 (0x0)
184
drive = "floppy0"
185
+ backend_defaults = "auto"
186
logical_block_size = 512 (512 B)
187
physical_block_size = 512 (512 B)
188
min_io_size = 0 (0 B)
189
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2,index=1
190
dev: floppy, id ""
191
unit = 1 (0x1)
192
drive = "floppy1"
193
+ backend_defaults = "auto"
194
logical_block_size = 512 (512 B)
195
physical_block_size = 512 (512 B)
196
min_io_size = 0 (0 B)
197
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2,index=1
198
dev: floppy, id ""
199
unit = 0 (0x0)
200
drive = "floppy0"
201
+ backend_defaults = "auto"
202
logical_block_size = 512 (512 B)
203
physical_block_size = 512 (512 B)
204
min_io_size = 0 (0 B)
205
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2 -drive if=floppy,file=TEST_DIR/t
206
dev: floppy, id ""
207
unit = 1 (0x1)
208
drive = "floppy1"
209
+ backend_defaults = "auto"
210
logical_block_size = 512 (512 B)
211
physical_block_size = 512 (512 B)
212
min_io_size = 0 (0 B)
213
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2 -drive if=floppy,file=TEST_DIR/t
214
dev: floppy, id ""
215
unit = 0 (0x0)
216
drive = "floppy0"
217
+ backend_defaults = "auto"
218
logical_block_size = 512 (512 B)
219
physical_block_size = 512 (512 B)
220
min_io_size = 0 (0 B)
221
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -device floppy,drive=none0
222
dev: floppy, id ""
223
unit = 0 (0x0)
224
drive = "none0"
225
+ backend_defaults = "auto"
226
logical_block_size = 512 (512 B)
227
physical_block_size = 512 (512 B)
228
min_io_size = 0 (0 B)
229
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -device floppy,drive=none0,unit=1
230
dev: floppy, id ""
231
unit = 1 (0x1)
232
drive = "none0"
233
+ backend_defaults = "auto"
234
logical_block_size = 512 (512 B)
235
physical_block_size = 512 (512 B)
236
min_io_size = 0 (0 B)
237
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qco
238
dev: floppy, id ""
239
unit = 1 (0x1)
240
drive = "none1"
241
+ backend_defaults = "auto"
242
logical_block_size = 512 (512 B)
243
physical_block_size = 512 (512 B)
244
min_io_size = 0 (0 B)
245
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qco
246
dev: floppy, id ""
247
unit = 0 (0x0)
248
drive = "none0"
249
+ backend_defaults = "auto"
250
logical_block_size = 512 (512 B)
251
physical_block_size = 512 (512 B)
252
min_io_size = 0 (0 B)
253
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
254
dev: floppy, id ""
255
unit = 1 (0x1)
256
drive = "none0"
257
+ backend_defaults = "auto"
258
logical_block_size = 512 (512 B)
259
physical_block_size = 512 (512 B)
260
min_io_size = 0 (0 B)
261
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
262
dev: floppy, id ""
263
unit = 0 (0x0)
264
drive = "floppy0"
265
+ backend_defaults = "auto"
266
logical_block_size = 512 (512 B)
267
physical_block_size = 512 (512 B)
268
min_io_size = 0 (0 B)
269
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
270
dev: floppy, id ""
271
unit = 1 (0x1)
272
drive = "none0"
273
+ backend_defaults = "auto"
274
logical_block_size = 512 (512 B)
275
physical_block_size = 512 (512 B)
276
min_io_size = 0 (0 B)
277
@@ -XXX,XX +XXX,XX @@ Testing: -fda TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
278
dev: floppy, id ""
279
unit = 0 (0x0)
280
drive = "floppy0"
281
+ backend_defaults = "auto"
282
logical_block_size = 512 (512 B)
283
physical_block_size = 512 (512 B)
284
min_io_size = 0 (0 B)
285
@@ -XXX,XX +XXX,XX @@ Testing: -fdb TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
286
dev: floppy, id ""
287
unit = 0 (0x0)
288
drive = "none0"
289
+ backend_defaults = "auto"
290
logical_block_size = 512 (512 B)
291
physical_block_size = 512 (512 B)
292
min_io_size = 0 (0 B)
293
@@ -XXX,XX +XXX,XX @@ Testing: -fdb TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
294
dev: floppy, id ""
295
unit = 1 (0x1)
296
drive = "floppy1"
297
+ backend_defaults = "auto"
298
logical_block_size = 512 (512 B)
299
physical_block_size = 512 (512 B)
300
min_io_size = 0 (0 B)
301
@@ -XXX,XX +XXX,XX @@ Testing: -fdb TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
302
dev: floppy, id ""
303
unit = 0 (0x0)
304
drive = "none0"
305
+ backend_defaults = "auto"
306
logical_block_size = 512 (512 B)
307
physical_block_size = 512 (512 B)
308
min_io_size = 0 (0 B)
309
@@ -XXX,XX +XXX,XX @@ Testing: -fdb TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.qcow2.2 -device fl
310
dev: floppy, id ""
311
unit = 1 (0x1)
312
drive = "floppy1"
313
+ backend_defaults = "auto"
314
logical_block_size = 512 (512 B)
315
physical_block_size = 512 (512 B)
316
min_io_size = 0 (0 B)
317
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.q
318
dev: floppy, id ""
319
unit = 1 (0x1)
320
drive = "none0"
321
+ backend_defaults = "auto"
322
logical_block_size = 512 (512 B)
323
physical_block_size = 512 (512 B)
324
min_io_size = 0 (0 B)
325
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.q
326
dev: floppy, id ""
327
unit = 0 (0x0)
328
drive = "floppy0"
329
+ backend_defaults = "auto"
330
logical_block_size = 512 (512 B)
331
physical_block_size = 512 (512 B)
332
min_io_size = 0 (0 B)
333
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.q
334
dev: floppy, id ""
335
unit = 1 (0x1)
336
drive = "none0"
337
+ backend_defaults = "auto"
338
logical_block_size = 512 (512 B)
339
physical_block_size = 512 (512 B)
340
min_io_size = 0 (0 B)
341
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=floppy,file=TEST_DIR/t.qcow2 -drive if=none,file=TEST_DIR/t.q
342
dev: floppy, id ""
343
unit = 0 (0x0)
344
drive = "floppy0"
345
+ backend_defaults = "auto"
346
logical_block_size = 512 (512 B)
347
physical_block_size = 512 (512 B)
348
min_io_size = 0 (0 B)
349
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -global floppy.drive=none0 -device
350
dev: floppy, id ""
351
unit = 0 (0x0)
352
drive = "none0"
353
+ backend_defaults = "auto"
354
logical_block_size = 512 (512 B)
355
physical_block_size = 512 (512 B)
356
min_io_size = 0 (0 B)
357
@@ -XXX,XX +XXX,XX @@ Testing: -device floppy
358
dev: floppy, id ""
359
unit = 0 (0x0)
360
drive = ""
361
+ backend_defaults = "auto"
362
logical_block_size = 512 (512 B)
363
physical_block_size = 512 (512 B)
364
min_io_size = 0 (0 B)
365
@@ -XXX,XX +XXX,XX @@ Testing: -device floppy,drive-type=120
366
dev: floppy, id ""
367
unit = 0 (0x0)
368
drive = ""
369
+ backend_defaults = "auto"
370
logical_block_size = 512 (512 B)
371
physical_block_size = 512 (512 B)
372
min_io_size = 0 (0 B)
373
@@ -XXX,XX +XXX,XX @@ Testing: -device floppy,drive-type=144
374
dev: floppy, id ""
375
unit = 0 (0x0)
376
drive = ""
377
+ backend_defaults = "auto"
378
logical_block_size = 512 (512 B)
379
physical_block_size = 512 (512 B)
380
min_io_size = 0 (0 B)
381
@@ -XXX,XX +XXX,XX @@ Testing: -device floppy,drive-type=288
382
dev: floppy, id ""
383
unit = 0 (0x0)
384
drive = ""
385
+ backend_defaults = "auto"
386
logical_block_size = 512 (512 B)
387
physical_block_size = 512 (512 B)
388
min_io_size = 0 (0 B)
389
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -device floppy,drive=none0,drive-t
390
dev: floppy, id ""
391
unit = 0 (0x0)
392
drive = "none0"
393
+ backend_defaults = "auto"
394
logical_block_size = 512 (512 B)
395
physical_block_size = 512 (512 B)
396
min_io_size = 0 (0 B)
397
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -device floppy,drive=none0,drive-t
398
dev: floppy, id ""
399
unit = 0 (0x0)
400
drive = "none0"
401
+ backend_defaults = "auto"
402
logical_block_size = 512 (512 B)
403
physical_block_size = 512 (512 B)
404
min_io_size = 0 (0 B)
405
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -device floppy,drive=none0,logical
406
dev: floppy, id ""
407
unit = 0 (0x0)
408
drive = "none0"
409
+ backend_defaults = "auto"
410
logical_block_size = 512 (512 B)
411
physical_block_size = 512 (512 B)
412
min_io_size = 0 (0 B)
413
@@ -XXX,XX +XXX,XX @@ Testing: -drive if=none,file=TEST_DIR/t.qcow2 -device floppy,drive=none0,physica
414
dev: floppy, id ""
415
unit = 0 (0x0)
416
drive = "none0"
417
+ backend_defaults = "auto"
418
logical_block_size = 512 (512 B)
419
physical_block_size = 512 (512 B)
420
min_io_size = 0 (0 B)
421
--
124
--
422
2.31.1
125
2.40.1
423
New patch
1
From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
1
2
3
We will add more and more checks so we need a better code structure
4
in parallels_co_check. Let each check be performed in a separate loop
5
in a separate helper.
6
7
Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
8
Reviewed-by: Denis V. Lunev <den@openvz.org>
9
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
10
Message-Id: <20230424093147.197643-11-alexander.ivanov@virtuozzo.com>
11
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
12
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
13
---
14
block/parallels.c | 52 +++++++++++++++++++++++++++--------------------
15
1 file changed, 30 insertions(+), 22 deletions(-)
16
17
diff --git a/block/parallels.c b/block/parallels.c
18
index XXXXXXX..XXXXXXX 100644
19
--- a/block/parallels.c
20
+++ b/block/parallels.c
21
@@ -XXX,XX +XXX,XX @@ parallels_check_leak(BlockDriverState *bs, BdrvCheckResult *res,
22
return 0;
23
}
24
25
-static int coroutine_fn GRAPH_RDLOCK
26
-parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
27
- BdrvCheckMode fix)
28
+static void parallels_collect_statistics(BlockDriverState *bs,
29
+ BdrvCheckResult *res,
30
+ BdrvCheckMode fix)
31
{
32
BDRVParallelsState *s = bs->opaque;
33
- int64_t prev_off;
34
- int ret;
35
+ int64_t off, prev_off;
36
uint32_t i;
37
38
- qemu_co_mutex_lock(&s->lock);
39
-
40
- parallels_check_unclean(bs, res, fix);
41
-
42
- ret = parallels_check_outside_image(bs, res, fix);
43
- if (ret < 0) {
44
- goto out;
45
- }
46
-
47
- ret = parallels_check_leak(bs, res, fix);
48
- if (ret < 0) {
49
- goto out;
50
- }
51
-
52
res->bfi.total_clusters = s->bat_size;
53
res->bfi.compressed_clusters = 0; /* compression is not supported */
54
55
prev_off = 0;
56
for (i = 0; i < s->bat_size; i++) {
57
- int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS;
58
+ off = bat2sect(s, i) << BDRV_SECTOR_BITS;
59
/*
60
* If BDRV_FIX_ERRORS is not set, out-of-image BAT entries were not
61
* fixed. Skip not allocated and out-of-image BAT entries.
62
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
63
continue;
64
}
65
66
- res->bfi.allocated_clusters++;
67
-
68
if (prev_off != 0 && (prev_off + s->cluster_size) != off) {
69
res->bfi.fragmented_clusters++;
70
}
71
prev_off = off;
72
+ res->bfi.allocated_clusters++;
73
}
74
+}
75
+
76
+static int coroutine_fn GRAPH_RDLOCK
77
+parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
78
+ BdrvCheckMode fix)
79
+{
80
+ BDRVParallelsState *s = bs->opaque;
81
+ int ret;
82
+
83
+ qemu_co_mutex_lock(&s->lock);
84
+
85
+ parallels_check_unclean(bs, res, fix);
86
+
87
+ ret = parallels_check_outside_image(bs, res, fix);
88
+ if (ret < 0) {
89
+ goto out;
90
+ }
91
+
92
+ ret = parallels_check_leak(bs, res, fix);
93
+ if (ret < 0) {
94
+ goto out;
95
+ }
96
+
97
+ parallels_collect_statistics(bs, res, fix);
98
99
out:
100
qemu_co_mutex_unlock(&s->lock);
101
--
102
2.40.1
1
BHs must be deleted before the AioContext is finalized. If not, it's a
1
From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
2
bug and probably indicates that some part of the program still expects
3
the BH to run in the future. That can lead to memory leaks, inconsistent
4
state, or just hangs.
5
2
6
Unfortunately the assert(flags & BH_DELETED) call in aio_ctx_finalize()
3
Replace the way we use the mutex in parallels_co_check() for simpler
7
is difficult to debug because the assertion failure contains no
4
and less error-prone code.
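
A minimal sketch of the pattern this switches to (simplified, not the actual
parallels code; the helper is made up): leaving a WITH_QEMU_LOCK_GUARD block,
including via return, releases the lock automatically, so no separate
unlock/goto-out path is needed:

    #include "qemu/osdep.h"
    #include "qemu/coroutine.h"
    #include "qemu/lockable.h"

    /* Simplified illustration only -- not the actual parallels code. */
    static int coroutine_fn example_check(CoMutex *lock, int (*check_one)(void))
    {
        WITH_QEMU_LOCK_GUARD(lock) {
            int ret = check_one();
            if (ret < 0) {
                /* Returning from inside the block also releases the lock,
                 * so the old qemu_co_mutex_unlock()/goto-out path goes away. */
                return ret;
            }
        }
        /* The lock is likewise released on normal exit from the block. */
        return 0;
    }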
8
information about the BH!
9
5
10
Use the QEMUBH name field added in the previous patch to show a useful
6
Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
11
error when a leaked BH is detected.
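
For context, a minimal sketch of the expected BH lifecycle (illustrative only;
everything except the qemu_bh_* calls is made up). The point is that
qemu_bh_delete() must have run before the owning AioContext is finalized,
otherwise the check added here reports the leaked BH by name and aborts:

    #include "qemu/osdep.h"
    #include "qemu/main-loop.h"

    static QEMUBH *my_bh;                 /* hypothetical example BH */

    static void my_bh_cb(void *opaque)
    {
        /* deferred work goes here */
    }

    static void example_setup(void *opaque)
    {
        my_bh = qemu_bh_new(my_bh_cb, opaque);
        qemu_bh_schedule(my_bh);
    }

    static void example_teardown(void)
    {
        /* Must run before the owning AioContext is finalized. */
        qemu_bh_delete(my_bh);
    }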
7
Reviewed-by: Denis V. Lunev <den@openvz.org>
8
Message-Id: <20230424093147.197643-12-alexander.ivanov@virtuozzo.com>
9
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
10
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
11
---
12
block/parallels.c | 33 ++++++++++++++-------------------
13
1 file changed, 14 insertions(+), 19 deletions(-)
12
14
13
Suggested-by: Eric Ernst <eric.g.ernst@gmail.com>
15
diff --git a/block/parallels.c b/block/parallels.c
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
15
Message-Id: <20210414200247.917496-3-stefanha@redhat.com>
16
---
17
util/async.c | 16 ++++++++++++++--
18
1 file changed, 14 insertions(+), 2 deletions(-)
19
20
diff --git a/util/async.c b/util/async.c
21
index XXXXXXX..XXXXXXX 100644
16
index XXXXXXX..XXXXXXX 100644
22
--- a/util/async.c
17
--- a/block/parallels.c
23
+++ b/util/async.c
18
+++ b/block/parallels.c
24
@@ -XXX,XX +XXX,XX @@ aio_ctx_finalize(GSource *source)
19
@@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res,
25
assert(QSIMPLEQ_EMPTY(&ctx->bh_slice_list));
20
BDRVParallelsState *s = bs->opaque;
26
21
int ret;
27
while ((bh = aio_bh_dequeue(&ctx->bh_list, &flags))) {
22
28
- /* qemu_bh_delete() must have been called on BHs in this AioContext */
23
- qemu_co_mutex_lock(&s->lock);
29
- assert(flags & BH_DELETED);
24
+ WITH_QEMU_LOCK_GUARD(&s->lock) {
30
+ /*
25
+ parallels_check_unclean(bs, res, fix);
31
+ * qemu_bh_delete() must have been called on BHs in this AioContext. In
26
32
+ * many cases memory leaks, hangs, or inconsistent state occur when a
27
- parallels_check_unclean(bs, res, fix);
33
+ * BH is leaked because something still expects it to run.
28
+ ret = parallels_check_outside_image(bs, res, fix);
34
+ *
29
+ if (ret < 0) {
35
+ * If you hit this, fix the lifecycle of the BH so that
30
+ return ret;
36
+ * qemu_bh_delete() and any associated cleanup is called before the
37
+ * AioContext is finalized.
38
+ */
39
+ if (unlikely(!(flags & BH_DELETED))) {
40
+ fprintf(stderr, "%s: BH '%s' leaked, aborting...\n",
41
+ __func__, bh->name);
42
+ abort();
43
+ }
31
+ }
44
32
45
g_free(bh);
33
- ret = parallels_check_outside_image(bs, res, fix);
34
- if (ret < 0) {
35
- goto out;
36
- }
37
+ ret = parallels_check_leak(bs, res, fix);
38
+ if (ret < 0) {
39
+ return ret;
40
+ }
41
42
- ret = parallels_check_leak(bs, res, fix);
43
- if (ret < 0) {
44
- goto out;
45
+ parallels_collect_statistics(bs, res, fix);
46
}
46
}
47
48
- parallels_collect_statistics(bs, res, fix);
49
-
50
-out:
51
- qemu_co_mutex_unlock(&s->lock);
52
-
53
- if (ret == 0) {
54
- ret = bdrv_co_flush(bs);
55
- if (ret < 0) {
56
- res->check_errors++;
57
- }
58
+ ret = bdrv_co_flush(bs);
59
+ if (ret < 0) {
60
+ res->check_errors++;
61
}
62
63
return ret;
47
--
64
--
48
2.31.1
65
2.40.1
49
New patch
1
From: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
1
2
3
Every cluster referenced by the BAT must lie entirely within the image file.
4
Fix the check condition so that the whole cluster, not just its start offset, is verified.
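
A small self-contained illustration of the boundary case this fixes (sizes are
hypothetical): a cluster that starts before end-of-file but ends past it
slipped through the old "off > size" test:

    #include <assert.h>
    #include <stdint.h>

    /* The corrected test: the whole cluster, not just its start, must fit. */
    static int outside_image(int64_t off, int64_t cluster_size, int64_t size)
    {
        return off + cluster_size > size;
    }

    int main(void)
    {
        const int64_t MiB = 1024 * 1024;
        /* Entry at 9.5 MiB in a 10 MiB file: the old "off > size" test let it
         * pass although its last 0.5 MiB lies beyond end-of-file. */
        assert(outside_image(9 * MiB + MiB / 2, MiB, 10 * MiB));
        /* A cluster ending exactly at end-of-file is still fully inside. */
        assert(!outside_image(9 * MiB, MiB, 10 * MiB));
        return 0;
    }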
5
6
Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
7
Reviewed-by: Denis V. Lunev <den@openvz.org>
8
Message-Id: <20230424093147.197643-13-alexander.ivanov@virtuozzo.com>
9
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
10
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
11
---
12
block/parallels.c | 2 +-
13
1 file changed, 1 insertion(+), 1 deletion(-)
14
15
diff --git a/block/parallels.c b/block/parallels.c
16
index XXXXXXX..XXXXXXX 100644
17
--- a/block/parallels.c
18
+++ b/block/parallels.c
19
@@ -XXX,XX +XXX,XX @@ parallels_check_outside_image(BlockDriverState *bs, BdrvCheckResult *res,
20
high_off = 0;
21
for (i = 0; i < s->bat_size; i++) {
22
off = bat2sect(s, i) << BDRV_SECTOR_BITS;
23
- if (off > size) {
24
+ if (off + s->cluster_size > size) {
25
fprintf(stderr, "%s cluster %u is outside image\n",
26
fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
27
res->corruptions++;
28
--
29
2.40.1
New patch
1
1
From: Jean-Louis Dupond <jean-louis@dupond.be>
2
3
When we have, for example, a sparse qcow2 image and discard=unmap is enabled,
5
there can be a lot of fragmentation in the image after some time, especially on VMs
6
that do a lot of writes/deletes.
6
This causes the qcow2 image to grow even over 110% of its virtual size,
7
because the free gaps in the image get too small to allocate new
8
continuous clusters. So it allocates new space at the end of the image.
9
10
Disabling discard is not an option, as discard is needed to keep the
11
incremental backup size as low as possible. Without discard, the
12
incremental backups would become large, as qemu thinks it's just dirty
13
blocks but it doesn't know the blocks are unneeded.
14
So we need to avoid fragmentation but also 'empty' the unneeded blocks in
15
the image to have a small incremental backup.
16
17
In addition, we also want to send the discards further down the stack, so
18
the underlying blocks are still discarded.
19
20
Therefor we introduce a new qcow2 option "discard-no-unref".
21
When setting this option to true, discards will no longer have the qcow2
22
driver relinquish cluster allocations. Other than that, the request is
23
handled as normal: All clusters in range are marked as zero, and, if
24
pass-discard-request is true, it is passed further down the stack.
25
The only difference is that the now-zero clusters are preallocated
26
instead of being unallocated.
27
This will avoid fragmentation on the qcow2 image.
28
29
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
30
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
31
Message-Id: <20230605084523.34134-2-jean-louis@dupond.be>
32
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
33
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
34
---
35
qapi/block-core.json | 12 ++++++++++++
36
block/qcow2.h | 3 +++
37
block/qcow2-cluster.c | 32 ++++++++++++++++++++++++++++----
38
block/qcow2.c | 18 ++++++++++++++++++
39
qemu-options.hx | 12 ++++++++++++
40
5 files changed, 73 insertions(+), 4 deletions(-)
41
42
diff --git a/qapi/block-core.json b/qapi/block-core.json
43
index XXXXXXX..XXXXXXX 100644
44
--- a/qapi/block-core.json
45
+++ b/qapi/block-core.json
46
@@ -XXX,XX +XXX,XX @@
47
# @pass-discard-other: whether discard requests for the data source
48
# should be issued on other occasions where a cluster gets freed
49
#
50
+# @discard-no-unref: when enabled, discards from the guest will not cause
51
+# cluster allocations to be relinquished. This prevents qcow2 fragmentation
52
+# that would be caused by such discards. Besides potential
53
+# performance degradation, such fragmentation can lead to increased
54
+# allocation of clusters past the end of the image file,
55
+# resulting in image files whose file length can grow much larger
56
+# than their guest disk size would suggest.
57
+# If image file length is of concern (e.g. when storing qcow2
58
+# images directly on block devices), you should consider enabling
59
+# this option. (since 8.1)
60
+#
61
# @overlap-check: which overlap checks to perform for writes to the
62
# image, defaults to 'cached' (since 2.2)
63
#
64
@@ -XXX,XX +XXX,XX @@
65
'*pass-discard-request': 'bool',
66
'*pass-discard-snapshot': 'bool',
67
'*pass-discard-other': 'bool',
68
+ '*discard-no-unref': 'bool',
69
'*overlap-check': 'Qcow2OverlapChecks',
70
'*cache-size': 'int',
71
'*l2-cache-size': 'int',
72
diff --git a/block/qcow2.h b/block/qcow2.h
73
index XXXXXXX..XXXXXXX 100644
74
--- a/block/qcow2.h
75
+++ b/block/qcow2.h
76
@@ -XXX,XX +XXX,XX @@
77
#define QCOW2_OPT_DISCARD_REQUEST "pass-discard-request"
78
#define QCOW2_OPT_DISCARD_SNAPSHOT "pass-discard-snapshot"
79
#define QCOW2_OPT_DISCARD_OTHER "pass-discard-other"
80
+#define QCOW2_OPT_DISCARD_NO_UNREF "discard-no-unref"
81
#define QCOW2_OPT_OVERLAP "overlap-check"
82
#define QCOW2_OPT_OVERLAP_TEMPLATE "overlap-check.template"
83
#define QCOW2_OPT_OVERLAP_MAIN_HEADER "overlap-check.main-header"
84
@@ -XXX,XX +XXX,XX @@ typedef struct BDRVQcow2State {
85
86
bool discard_passthrough[QCOW2_DISCARD_MAX];
87
88
+ bool discard_no_unref;
89
+
90
int overlap_check; /* bitmask of Qcow2MetadataOverlap values */
91
bool signaled_corruption;
92
93
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
94
index XXXXXXX..XXXXXXX 100644
95
--- a/block/qcow2-cluster.c
96
+++ b/block/qcow2-cluster.c
97
@@ -XXX,XX +XXX,XX @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
98
uint64_t new_l2_bitmap = old_l2_bitmap;
99
QCow2ClusterType cluster_type =
100
qcow2_get_cluster_type(bs, old_l2_entry);
101
+ bool keep_reference = (cluster_type != QCOW2_CLUSTER_COMPRESSED) &&
102
+ !full_discard &&
103
+ (s->discard_no_unref &&
104
+ type == QCOW2_DISCARD_REQUEST);
105
106
/*
107
* If full_discard is true, the cluster should not read back as zeroes,
108
@@ -XXX,XX +XXX,XX @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
109
new_l2_entry = new_l2_bitmap = 0;
110
} else if (bs->backing || qcow2_cluster_is_allocated(cluster_type)) {
111
if (has_subclusters(s)) {
112
- new_l2_entry = 0;
113
+ if (keep_reference) {
114
+ new_l2_entry = old_l2_entry;
115
+ } else {
116
+ new_l2_entry = 0;
117
+ }
118
new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
119
} else {
120
- new_l2_entry = s->qcow_version >= 3 ? QCOW_OFLAG_ZERO : 0;
121
+ if (s->qcow_version >= 3) {
122
+ if (keep_reference) {
123
+ new_l2_entry |= QCOW_OFLAG_ZERO;
124
+ } else {
125
+ new_l2_entry = QCOW_OFLAG_ZERO;
126
+ }
127
+ } else {
128
+ new_l2_entry = 0;
129
+ }
130
}
131
}
132
133
@@ -XXX,XX +XXX,XX @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
134
if (has_subclusters(s)) {
135
set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
136
}
137
- /* Then decrease the refcount */
138
- qcow2_free_any_cluster(bs, old_l2_entry, type);
139
+ if (!keep_reference) {
140
+ /* Then decrease the refcount */
141
+ qcow2_free_any_cluster(bs, old_l2_entry, type);
142
+ } else if (s->discard_passthrough[type] &&
143
+ (cluster_type == QCOW2_CLUSTER_NORMAL ||
144
+ cluster_type == QCOW2_CLUSTER_ZERO_ALLOC)) {
145
+ /* If we keep the reference, pass on the discard still */
146
+ bdrv_pdiscard(s->data_file, old_l2_entry & L2E_OFFSET_MASK,
147
+ s->cluster_size);
148
+ }
149
}
150
151
qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
152
diff --git a/block/qcow2.c b/block/qcow2.c
153
index XXXXXXX..XXXXXXX 100644
154
--- a/block/qcow2.c
155
+++ b/block/qcow2.c
156
@@ -XXX,XX +XXX,XX @@ static const char *const mutable_opts[] = {
157
QCOW2_OPT_DISCARD_REQUEST,
158
QCOW2_OPT_DISCARD_SNAPSHOT,
159
QCOW2_OPT_DISCARD_OTHER,
160
+ QCOW2_OPT_DISCARD_NO_UNREF,
161
QCOW2_OPT_OVERLAP,
162
QCOW2_OPT_OVERLAP_TEMPLATE,
163
QCOW2_OPT_OVERLAP_MAIN_HEADER,
164
@@ -XXX,XX +XXX,XX @@ static QemuOptsList qcow2_runtime_opts = {
165
.type = QEMU_OPT_BOOL,
166
.help = "Generate discard requests when other clusters are freed",
167
},
168
+ {
169
+ .name = QCOW2_OPT_DISCARD_NO_UNREF,
170
+ .type = QEMU_OPT_BOOL,
171
+ .help = "Do not unreference discarded clusters",
172
+ },
173
{
174
.name = QCOW2_OPT_OVERLAP,
175
.type = QEMU_OPT_STRING,
176
@@ -XXX,XX +XXX,XX @@ typedef struct Qcow2ReopenState {
177
bool use_lazy_refcounts;
178
int overlap_check;
179
bool discard_passthrough[QCOW2_DISCARD_MAX];
180
+ bool discard_no_unref;
181
uint64_t cache_clean_interval;
182
QCryptoBlockOpenOptions *crypto_opts; /* Disk encryption runtime options */
183
} Qcow2ReopenState;
184
@@ -XXX,XX +XXX,XX @@ static int qcow2_update_options_prepare(BlockDriverState *bs,
185
r->discard_passthrough[QCOW2_DISCARD_OTHER] =
186
qemu_opt_get_bool(opts, QCOW2_OPT_DISCARD_OTHER, false);
187
188
+ r->discard_no_unref = qemu_opt_get_bool(opts, QCOW2_OPT_DISCARD_NO_UNREF,
189
+ false);
190
+ if (r->discard_no_unref && s->qcow_version < 3) {
191
+ error_setg(errp,
192
+ "discard-no-unref is only supported since qcow2 version 3");
193
+ ret = -EINVAL;
194
+ goto fail;
195
+ }
196
+
197
switch (s->crypt_method_header) {
198
case QCOW_CRYPT_NONE:
199
if (encryptfmt) {
200
@@ -XXX,XX +XXX,XX @@ static void qcow2_update_options_commit(BlockDriverState *bs,
201
s->discard_passthrough[i] = r->discard_passthrough[i];
202
}
203
204
+ s->discard_no_unref = r->discard_no_unref;
205
+
206
if (s->cache_clean_interval != r->cache_clean_interval) {
207
cache_clean_timer_del(bs);
208
s->cache_clean_interval = r->cache_clean_interval;
209
diff --git a/qemu-options.hx b/qemu-options.hx
210
index XXXXXXX..XXXXXXX 100644
211
--- a/qemu-options.hx
212
+++ b/qemu-options.hx
213
@@ -XXX,XX +XXX,XX @@ SRST
214
issued on other occasions where a cluster gets freed
215
(on/off; default: off)
216
217
+ ``discard-no-unref``
218
+ When enabled, discards from the guest will not cause cluster
219
+ allocations to be relinquished. This prevents qcow2 fragmentation
220
+ that would be caused by such discards. Besides potential
221
+ performance degradation, such fragmentation can lead to increased
222
+ allocation of clusters past the end of the image file,
223
+ resulting in image files whose file length can grow much larger
224
+ than their guest disk size would suggest.
225
+ If image file length is of concern (e.g. when storing qcow2
226
+ images directly on block devices), you should consider enabling
227
+ this option.
228
+
229
``overlap-check``
230
Which overlap checks to perform for writes to the image
231
(none/constant/cached/all; default: cached). For details or
232
--
233
2.40.1