1 | The following changes since commit 848a6caa88b9f082c89c9b41afa975761262981d: | 1 | The following changes since commit ac793156f650ae2d77834932d72224175ee69086: |
---|---|---|---|
2 | 2 | ||
3 | Merge tag 'migration-20230602-pull-request' of https://gitlab.com/juan.quintela/qemu into staging (2023-06-02 17:33:29 -0700) | 3 | Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201020-1' into staging (2020-10-20 21:11:35 +0100) |
4 | 4 | ||
5 | are available in the Git repository at: | 5 | are available in the Git repository at: |
6 | 6 | ||
7 | https://gitlab.com/hreitz/qemu.git tags/pull-block-2023-06-05 | 7 | https://gitlab.com/stefanha/qemu.git tags/block-pull-request |
8 | 8 | ||
9 | for you to fetch changes up to 42a2890a76f4783cd1c212f27856edcf2b5e8a75: | 9 | for you to fetch changes up to 32a3fd65e7e3551337fd26bfc0e2f899d70c028c: |
10 | 10 | ||
11 | qcow2: add discard-no-unref option (2023-06-05 13:15:42 +0200) | 11 | iotests: add commit top->base cases to 274 (2020-10-22 09:55:39 +0100) |
12 | 12 | ||
13 | ---------------------------------------------------------------- | 13 | ---------------------------------------------------------------- |
14 | Block patches | 14 | Pull request |
15 | 15 | ||
16 | - Fix padding of unaligned vectored requests to match the host alignment | 16 | v2: |
17 | for vectors with 1023 or 1024 buffers | 17 | * Fix format string issues on 32-bit hosts [Peter] |
18 | - Refactor and fix bugs in the parallels driver's image check functionality | 18 | * Fix qemu-nbd.c CONFIG_POSIX ifdef issue [Eric] |
19 | - Add an option to the qcow2 driver to retain (qcow2-level) allocations | 19 | * Fix missing eventfd.h header on macOS [Peter] |
20 | on discard requests from the guest (while still forwarding the discard | 20 | * Drop unreliable vhost-user-blk test (will send a new patch when ready) [Peter] |
21 | to the lower level and marking the range as zero) | 21 | |
22 | This pull request contains the vhost-user-blk server by Coiby Xu along with my | ||
23 | additions, block/nvme.c alignment and hardware error statistics by Philippe | ||
24 | Mathieu-Daudé, and bdrv_co_block_status_above() fixes by Vladimir | ||
25 | Sementsov-Ogievskiy. | ||
22 | 26 | ||
23 | ---------------------------------------------------------------- | 27 | ---------------------------------------------------------------- |
24 | Alexander Ivanov (12): | ||
25 | parallels: Out of image offset in BAT leads to image inflation | ||
26 | parallels: Fix high_off calculation in parallels_co_check() | ||
27 | parallels: Fix image_end_offset and data_end after out-of-image check | ||
28 | parallels: create parallels_set_bat_entry_helper() to assign BAT value | ||
29 | parallels: Use generic infrastructure for BAT writing in | ||
30 | parallels_co_check() | ||
31 | parallels: Move check of unclean image to a separate function | ||
32 | parallels: Move check of cluster outside image to a separate function | ||
33 | parallels: Fix statistics calculation | ||
34 | parallels: Move check of leaks to a separate function | ||
35 | parallels: Move statistic collection to a separate function | ||
36 | parallels: Replace qemu_co_mutex_lock by WITH_QEMU_LOCK_GUARD | ||
37 | parallels: Incorrect condition in out-of-image check | ||
38 | 28 | ||
39 | Hanna Czenczek (4): | 29 | Coiby Xu (6): |
40 | util/iov: Make qiov_slice() public | 30 | libvhost-user: Allow vu_message_read to be replaced |
41 | block: Collapse padded I/O vecs exceeding IOV_MAX | 31 | libvhost-user: remove watch for kick_fd when de-initialize vu-dev |
42 | util/iov: Remove qemu_iovec_init_extended() | 32 | util/vhost-user-server: generic vhost user server |
43 | iotests/iov-padding: New test | 33 | block: move logical block size check function to a common utility |
34 | function | ||
35 | block/export: vhost-user block device backend server | ||
36 | MAINTAINERS: Add vhost-user block device backend server maintainer | ||
44 | 37 | ||
45 | Jean-Louis Dupond (1): | 38 | Philippe Mathieu-Daudé (1): |
46 | qcow2: add discard-no-unref option | 39 | block/nvme: Add driver statistics for access alignment and hw errors |
47 | 40 | ||
48 | qapi/block-core.json | 12 ++ | 41 | Stefan Hajnoczi (16): |
49 | block/qcow2.h | 3 + | 42 | util/vhost-user-server: s/fileds/fields/ typo fix |
50 | include/qemu/iov.h | 8 +- | 43 | util/vhost-user-server: drop unnecessary QOM cast |
51 | block/io.c | 166 ++++++++++++++++++-- | 44 | util/vhost-user-server: drop unnecessary watch deletion |
52 | block/parallels.c | 190 ++++++++++++++++------- | 45 | block/export: consolidate request structs into VuBlockReq |
53 | block/qcow2-cluster.c | 32 +++- | 46 | util/vhost-user-server: drop unused DevicePanicNotifier |
54 | block/qcow2.c | 18 +++ | 47 | util/vhost-user-server: fix memory leak in vu_message_read() |
55 | util/iov.c | 89 ++--------- | 48 | util/vhost-user-server: check EOF when reading payload |
56 | qemu-options.hx | 12 ++ | 49 | util/vhost-user-server: rework vu_client_trip() coroutine lifecycle |
57 | tests/qemu-iotests/tests/iov-padding | 85 ++++++++++ | 50 | block/export: report flush errors |
58 | tests/qemu-iotests/tests/iov-padding.out | 59 +++++++ | 51 | block/export: convert vhost-user-blk server to block export API |
59 | 11 files changed, 523 insertions(+), 151 deletions(-) | 52 | util/vhost-user-server: move header to include/ |
60 | create mode 100755 tests/qemu-iotests/tests/iov-padding | 53 | util/vhost-user-server: use static library in meson.build |
61 | create mode 100644 tests/qemu-iotests/tests/iov-padding.out | 54 | qemu-storage-daemon: avoid compiling blockdev_ss twice |
55 | block: move block exports to libblockdev | ||
56 | block/export: add iothread and fixed-iothread options | ||
57 | block/export: add vhost-user-blk multi-queue support | ||
58 | |||
59 | Vladimir Sementsov-Ogievskiy (5): | ||
60 | block/io: fix bdrv_co_block_status_above | ||
61 | block/io: bdrv_common_block_status_above: support include_base | ||
62 | block/io: bdrv_common_block_status_above: support bs == base | ||
63 | block/io: fix bdrv_is_allocated_above | ||
64 | iotests: add commit top->base cases to 274 | ||
65 | |||
66 | MAINTAINERS | 9 + | ||
67 | qapi/block-core.json | 24 +- | ||
68 | qapi/block-export.json | 36 +- | ||
69 | block/coroutines.h | 2 + | ||
70 | block/export/vhost-user-blk-server.h | 19 + | ||
71 | contrib/libvhost-user/libvhost-user.h | 21 + | ||
72 | include/qemu/vhost-user-server.h | 65 +++ | ||
73 | util/block-helpers.h | 19 + | ||
74 | block/export/export.c | 37 +- | ||
75 | block/export/vhost-user-blk-server.c | 431 ++++++++++++++++++++ | ||
76 | block/io.c | 132 +++--- | ||
77 | block/nvme.c | 27 ++ | ||
78 | block/qcow2.c | 16 +- | ||
79 | contrib/libvhost-user/libvhost-user-glib.c | 2 +- | ||
80 | contrib/libvhost-user/libvhost-user.c | 15 +- | ||
81 | hw/core/qdev-properties-system.c | 31 +- | ||
82 | nbd/server.c | 2 - | ||
83 | qemu-nbd.c | 21 +- | ||
84 | softmmu/vl.c | 4 + | ||
85 | stubs/blk-exp-close-all.c | 7 + | ||
86 | tests/vhost-user-bridge.c | 2 + | ||
87 | tools/virtiofsd/fuse_virtio.c | 4 +- | ||
88 | util/block-helpers.c | 46 +++ | ||
89 | util/vhost-user-server.c | 446 +++++++++++++++++++++ | ||
90 | block/export/meson.build | 3 +- | ||
91 | contrib/libvhost-user/meson.build | 1 + | ||
92 | meson.build | 22 +- | ||
93 | nbd/meson.build | 2 + | ||
94 | storage-daemon/meson.build | 3 +- | ||
95 | stubs/meson.build | 1 + | ||
96 | tests/qemu-iotests/274 | 20 + | ||
97 | tests/qemu-iotests/274.out | 68 ++++ | ||
98 | util/meson.build | 4 + | ||
99 | 33 files changed, 1420 insertions(+), 122 deletions(-) | ||
100 | create mode 100644 block/export/vhost-user-blk-server.h | ||
101 | create mode 100644 include/qemu/vhost-user-server.h | ||
102 | create mode 100644 util/block-helpers.h | ||
103 | create mode 100644 block/export/vhost-user-blk-server.c | ||
104 | create mode 100644 stubs/blk-exp-close-all.c | ||
105 | create mode 100644 util/block-helpers.c | ||
106 | create mode 100644 util/vhost-user-server.c | ||
62 | 107 | ||
63 | -- | 108 | -- |
64 | 2.40.1 | 109 | 2.26.2 |
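As a usage illustration for the discard-no-unref option summarized in the block pull request above (this sketch is not part of the pull request; the image path, node names and guest device are made up):

    # Keep qcow2-level cluster allocations on guest discards while still
    # marking the range as zero and forwarding the discard to the file below.
    qemu-system-x86_64 \
        -blockdev driver=file,filename=/images/vm.qcow2,node-name=proto0,discard=unmap \
        -blockdev driver=qcow2,file=proto0,node-name=fmt0,discard=unmap,discard-no-unref=on \
        -device virtio-blk-pci,drive=fmt0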
1 | From: Jean-Louis Dupond <jean-louis@dupond.be> | 1 | From: Philippe Mathieu-Daudé <philmd@redhat.com> |
---|---|---|---|
2 | 2 | ||
3 | When, for example, we have a sparse qcow2 image with discard: unmap enabled, | 3 | Keep statistics of some hardware errors, and number of |
4 | there can be a lot of fragmentation in the image after some time, especially on VMs | 4 | aligned/unaligned I/O accesses. |
5 | that do a lot of writes and deletes. | ||
6 | This causes the qcow2 image to grow even over 110% of its virtual size, | ||
7 | because the free gaps in the image get too small to allocate new | ||
8 | contiguous clusters. So it allocates new space at the end of the image. | ||
9 | 5 | ||
10 | Disabling discard is not an option, as discard is needed to keep the | 6 | QMP example booting a full RHEL 8.3 aarch64 guest: |
11 | incremental backup size as low as possible. Without discard, the | ||
12 | incremental backups would become large, as QEMU only sees dirty | ||
13 | blocks and does not know that those blocks are no longer needed. | ||
14 | So we need to avoid fragmentation but also 'empty' the unneeded blocks in | ||
15 | the image to have a small incremental backup. | ||
16 | 7 | ||
17 | In addition, we also want to send the discards further down the stack, so | 8 | { "execute": "query-blockstats" } |
18 | the underlying blocks are still discarded. | 9 | { |
10 | "return": [ | ||
11 | { | ||
12 | "device": "", | ||
13 | "node-name": "drive0", | ||
14 | "stats": { | ||
15 | "flush_total_time_ns": 6026948, | ||
16 | "wr_highest_offset": 3383991230464, | ||
17 | "wr_total_time_ns": 807450995, | ||
18 | "failed_wr_operations": 0, | ||
19 | "failed_rd_operations": 0, | ||
20 | "wr_merged": 3, | ||
21 | "wr_bytes": 50133504, | ||
22 | "failed_unmap_operations": 0, | ||
23 | "failed_flush_operations": 0, | ||
24 | "account_invalid": false, | ||
25 | "rd_total_time_ns": 1846979900, | ||
26 | "flush_operations": 130, | ||
27 | "wr_operations": 659, | ||
28 | "rd_merged": 1192, | ||
29 | "rd_bytes": 218244096, | ||
30 | "account_failed": false, | ||
31 | "idle_time_ns": 2678641497, | ||
32 | "rd_operations": 7406, | ||
33 | }, | ||
34 | "driver-specific": { | ||
35 | "driver": "nvme", | ||
36 | "completion-errors": 0, | ||
37 | "unaligned-accesses": 2959, | ||
38 | "aligned-accesses": 4477 | ||
39 | }, | ||
40 | "qdev": "/machine/peripheral-anon/device[0]/virtio-backend" | ||
41 | } | ||
42 | ] | ||
43 | } | ||
19 | 44 | ||
20 | Therefore we introduce a new qcow2 option "discard-no-unref". | 45 | Suggested-by: Stefan Hajnoczi <stefanha@gmail.com> |
21 | When setting this option to true, discards will no longer have the qcow2 | 46 | Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> |
22 | driver relinquish cluster allocations. Other than that, the request is | 47 | Acked-by: Markus Armbruster <armbru@redhat.com> |
23 | handled as normal: All clusters in range are marked as zero, and, if | 48 | Message-id: 20201001162939.1567915-1-philmd@redhat.com |
24 | pass-discard-request is true, it is passed further down the stack. | 49 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
25 | The only difference is that the now-zero clusters are preallocated | ||
26 | instead of being unallocated. | ||
27 | This will avoid fragmentation on the qcow2 image. | ||
28 | |||
29 | Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621 | ||
30 | Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be> | ||
31 | Message-Id: <20230605084523.34134-2-jean-louis@dupond.be> | ||
32 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | ||
33 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
34 | --- | 50 | --- |
35 | qapi/block-core.json | 12 ++++++++++++ | 51 | qapi/block-core.json | 24 +++++++++++++++++++++++- |
36 | block/qcow2.h | 3 +++ | 52 | block/nvme.c | 27 +++++++++++++++++++++++++++ |
37 | block/qcow2-cluster.c | 32 ++++++++++++++++++++++++++++---- | 53 | 2 files changed, 50 insertions(+), 1 deletion(-) |
38 | block/qcow2.c | 18 ++++++++++++++++++ | ||
39 | qemu-options.hx | 12 ++++++++++++ | ||
40 | 5 files changed, 73 insertions(+), 4 deletions(-) | ||
41 | 54 | ||
42 | diff --git a/qapi/block-core.json b/qapi/block-core.json | 55 | diff --git a/qapi/block-core.json b/qapi/block-core.json |
43 | index XXXXXXX..XXXXXXX 100644 | 56 | index XXXXXXX..XXXXXXX 100644 |
44 | --- a/qapi/block-core.json | 57 | --- a/qapi/block-core.json |
45 | +++ b/qapi/block-core.json | 58 | +++ b/qapi/block-core.json |
46 | @@ -XXX,XX +XXX,XX @@ | 59 | @@ -XXX,XX +XXX,XX @@ |
47 | # @pass-discard-other: whether discard requests for the data source | 60 | 'discard-nb-failed': 'uint64', |
48 | # should be issued on other occasions where a cluster gets freed | 61 | 'discard-bytes-ok': 'uint64' } } |
49 | # | 62 | |
50 | +# @discard-no-unref: when enabled, discards from the guest will not cause | 63 | +## |
51 | +# cluster allocations to be relinquished. This prevents qcow2 fragmentation | 64 | +# @BlockStatsSpecificNvme: |
52 | +# that would be caused by such discards. Besides potential | ||
53 | +# performance degradation, such fragmentation can lead to increased | ||
54 | +# allocation of clusters past the end of the image file, | ||
55 | +# resulting in image files whose file length can grow much larger | ||
56 | +# than their guest disk size would suggest. | ||
57 | +# If image file length is of concern (e.g. when storing qcow2 | ||
58 | +# images directly on block devices), you should consider enabling | ||
59 | +# this option. (since 8.1) | ||
60 | +# | 65 | +# |
61 | # @overlap-check: which overlap checks to perform for writes to the | 66 | +# NVMe driver statistics |
62 | # image, defaults to 'cached' (since 2.2) | 67 | +# |
68 | +# @completion-errors: The number of completion errors. | ||
69 | +# | ||
70 | +# @aligned-accesses: The number of aligned accesses performed by | ||
71 | +# the driver. | ||
72 | +# | ||
73 | +# @unaligned-accesses: The number of unaligned accesses performed by | ||
74 | +# the driver. | ||
75 | +# | ||
76 | +# Since: 5.2 | ||
77 | +## | ||
78 | +{ 'struct': 'BlockStatsSpecificNvme', | ||
79 | + 'data': { | ||
80 | + 'completion-errors': 'uint64', | ||
81 | + 'aligned-accesses': 'uint64', | ||
82 | + 'unaligned-accesses': 'uint64' } } | ||
83 | + | ||
84 | ## | ||
85 | # @BlockStatsSpecific: | ||
63 | # | 86 | # |
64 | @@ -XXX,XX +XXX,XX @@ | 87 | @@ -XXX,XX +XXX,XX @@ |
65 | '*pass-discard-request': 'bool', | 88 | 'discriminator': 'driver', |
66 | '*pass-discard-snapshot': 'bool', | 89 | 'data': { |
67 | '*pass-discard-other': 'bool', | 90 | 'file': 'BlockStatsSpecificFile', |
68 | + '*discard-no-unref': 'bool', | 91 | - 'host_device': 'BlockStatsSpecificFile' } } |
69 | '*overlap-check': 'Qcow2OverlapChecks', | 92 | + 'host_device': 'BlockStatsSpecificFile', |
70 | '*cache-size': 'int', | 93 | + 'nvme': 'BlockStatsSpecificNvme' } } |
71 | '*l2-cache-size': 'int', | 94 | |
72 | diff --git a/block/qcow2.h b/block/qcow2.h | 95 | ## |
96 | # @BlockStats: | ||
97 | diff --git a/block/nvme.c b/block/nvme.c | ||
73 | index XXXXXXX..XXXXXXX 100644 | 98 | index XXXXXXX..XXXXXXX 100644 |
74 | --- a/block/qcow2.h | 99 | --- a/block/nvme.c |
75 | +++ b/block/qcow2.h | 100 | +++ b/block/nvme.c |
76 | @@ -XXX,XX +XXX,XX @@ | 101 | @@ -XXX,XX +XXX,XX @@ struct BDRVNVMeState { |
77 | #define QCOW2_OPT_DISCARD_REQUEST "pass-discard-request" | 102 | |
78 | #define QCOW2_OPT_DISCARD_SNAPSHOT "pass-discard-snapshot" | 103 | /* PCI address (required for nvme_refresh_filename()) */ |
79 | #define QCOW2_OPT_DISCARD_OTHER "pass-discard-other" | 104 | char *device; |
80 | +#define QCOW2_OPT_DISCARD_NO_UNREF "discard-no-unref" | ||
81 | #define QCOW2_OPT_OVERLAP "overlap-check" | ||
82 | #define QCOW2_OPT_OVERLAP_TEMPLATE "overlap-check.template" | ||
83 | #define QCOW2_OPT_OVERLAP_MAIN_HEADER "overlap-check.main-header" | ||
84 | @@ -XXX,XX +XXX,XX @@ typedef struct BDRVQcow2State { | ||
85 | |||
86 | bool discard_passthrough[QCOW2_DISCARD_MAX]; | ||
87 | |||
88 | + bool discard_no_unref; | ||
89 | + | 105 | + |
90 | int overlap_check; /* bitmask of Qcow2MetadataOverlap values */ | 106 | + struct { |
91 | bool signaled_corruption; | 107 | + uint64_t completion_errors; |
92 | 108 | + uint64_t aligned_accesses; | |
93 | diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c | 109 | + uint64_t unaligned_accesses; |
94 | index XXXXXXX..XXXXXXX 100644 | 110 | + } stats; |
95 | --- a/block/qcow2-cluster.c | 111 | }; |
96 | +++ b/block/qcow2-cluster.c | 112 | |
97 | @@ -XXX,XX +XXX,XX @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset, | 113 | #define NVME_BLOCK_OPT_DEVICE "device" |
98 | uint64_t new_l2_bitmap = old_l2_bitmap; | 114 | @@ -XXX,XX +XXX,XX @@ static bool nvme_process_completion(NVMeQueuePair *q) |
99 | QCow2ClusterType cluster_type = | 115 | break; |
100 | qcow2_get_cluster_type(bs, old_l2_entry); | ||
101 | + bool keep_reference = (cluster_type != QCOW2_CLUSTER_COMPRESSED) && | ||
102 | + !full_discard && | ||
103 | + (s->discard_no_unref && | ||
104 | + type == QCOW2_DISCARD_REQUEST); | ||
105 | |||
106 | /* | ||
107 | * If full_discard is true, the cluster should not read back as zeroes, | ||
108 | @@ -XXX,XX +XXX,XX @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset, | ||
109 | new_l2_entry = new_l2_bitmap = 0; | ||
110 | } else if (bs->backing || qcow2_cluster_is_allocated(cluster_type)) { | ||
111 | if (has_subclusters(s)) { | ||
112 | - new_l2_entry = 0; | ||
113 | + if (keep_reference) { | ||
114 | + new_l2_entry = old_l2_entry; | ||
115 | + } else { | ||
116 | + new_l2_entry = 0; | ||
117 | + } | ||
118 | new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES; | ||
119 | } else { | ||
120 | - new_l2_entry = s->qcow_version >= 3 ? QCOW_OFLAG_ZERO : 0; | ||
121 | + if (s->qcow_version >= 3) { | ||
122 | + if (keep_reference) { | ||
123 | + new_l2_entry |= QCOW_OFLAG_ZERO; | ||
124 | + } else { | ||
125 | + new_l2_entry = QCOW_OFLAG_ZERO; | ||
126 | + } | ||
127 | + } else { | ||
128 | + new_l2_entry = 0; | ||
129 | + } | ||
130 | } | ||
131 | } | 116 | } |
132 | 117 | ret = nvme_translate_error(c); | |
133 | @@ -XXX,XX +XXX,XX @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset, | 118 | + if (ret) { |
134 | if (has_subclusters(s)) { | 119 | + s->stats.completion_errors++; |
135 | set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap); | 120 | + } |
136 | } | 121 | q->cq.head = (q->cq.head + 1) % NVME_QUEUE_SIZE; |
137 | - /* Then decrease the refcount */ | 122 | if (!q->cq.head) { |
138 | - qcow2_free_any_cluster(bs, old_l2_entry, type); | 123 | q->cq_phase = !q->cq_phase; |
139 | + if (!keep_reference) { | 124 | @@ -XXX,XX +XXX,XX @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes, |
140 | + /* Then decrease the refcount */ | 125 | assert(QEMU_IS_ALIGNED(bytes, s->page_size)); |
141 | + qcow2_free_any_cluster(bs, old_l2_entry, type); | 126 | assert(bytes <= s->max_transfer); |
142 | + } else if (s->discard_passthrough[type] && | 127 | if (nvme_qiov_aligned(bs, qiov)) { |
143 | + (cluster_type == QCOW2_CLUSTER_NORMAL || | 128 | + s->stats.aligned_accesses++; |
144 | + cluster_type == QCOW2_CLUSTER_ZERO_ALLOC)) { | 129 | return nvme_co_prw_aligned(bs, offset, bytes, qiov, is_write, flags); |
145 | + /* If we keep the reference, pass on the discard still */ | ||
146 | + bdrv_pdiscard(s->data_file, old_l2_entry & L2E_OFFSET_MASK, | ||
147 | + s->cluster_size); | ||
148 | + } | ||
149 | } | 130 | } |
150 | 131 | + s->stats.unaligned_accesses++; | |
151 | qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice); | 132 | trace_nvme_prw_buffered(s, offset, bytes, qiov->niov, is_write); |
152 | diff --git a/block/qcow2.c b/block/qcow2.c | 133 | buf = qemu_try_memalign(s->page_size, bytes); |
153 | index XXXXXXX..XXXXXXX 100644 | 134 | |
154 | --- a/block/qcow2.c | 135 | @@ -XXX,XX +XXX,XX @@ static void nvme_unregister_buf(BlockDriverState *bs, void *host) |
155 | +++ b/block/qcow2.c | 136 | qemu_vfio_dma_unmap(s->vfio, host); |
156 | @@ -XXX,XX +XXX,XX @@ static const char *const mutable_opts[] = { | 137 | } |
157 | QCOW2_OPT_DISCARD_REQUEST, | 138 | |
158 | QCOW2_OPT_DISCARD_SNAPSHOT, | 139 | +static BlockStatsSpecific *nvme_get_specific_stats(BlockDriverState *bs) |
159 | QCOW2_OPT_DISCARD_OTHER, | 140 | +{ |
160 | + QCOW2_OPT_DISCARD_NO_UNREF, | 141 | + BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1); |
161 | QCOW2_OPT_OVERLAP, | 142 | + BDRVNVMeState *s = bs->opaque; |
162 | QCOW2_OPT_OVERLAP_TEMPLATE, | ||
163 | QCOW2_OPT_OVERLAP_MAIN_HEADER, | ||
164 | @@ -XXX,XX +XXX,XX @@ static QemuOptsList qcow2_runtime_opts = { | ||
165 | .type = QEMU_OPT_BOOL, | ||
166 | .help = "Generate discard requests when other clusters are freed", | ||
167 | }, | ||
168 | + { | ||
169 | + .name = QCOW2_OPT_DISCARD_NO_UNREF, | ||
170 | + .type = QEMU_OPT_BOOL, | ||
171 | + .help = "Do not unreference discarded clusters", | ||
172 | + }, | ||
173 | { | ||
174 | .name = QCOW2_OPT_OVERLAP, | ||
175 | .type = QEMU_OPT_STRING, | ||
176 | @@ -XXX,XX +XXX,XX @@ typedef struct Qcow2ReopenState { | ||
177 | bool use_lazy_refcounts; | ||
178 | int overlap_check; | ||
179 | bool discard_passthrough[QCOW2_DISCARD_MAX]; | ||
180 | + bool discard_no_unref; | ||
181 | uint64_t cache_clean_interval; | ||
182 | QCryptoBlockOpenOptions *crypto_opts; /* Disk encryption runtime options */ | ||
183 | } Qcow2ReopenState; | ||
184 | @@ -XXX,XX +XXX,XX @@ static int qcow2_update_options_prepare(BlockDriverState *bs, | ||
185 | r->discard_passthrough[QCOW2_DISCARD_OTHER] = | ||
186 | qemu_opt_get_bool(opts, QCOW2_OPT_DISCARD_OTHER, false); | ||
187 | |||
188 | + r->discard_no_unref = qemu_opt_get_bool(opts, QCOW2_OPT_DISCARD_NO_UNREF, | ||
189 | + false); | ||
190 | + if (r->discard_no_unref && s->qcow_version < 3) { | ||
191 | + error_setg(errp, | ||
192 | + "discard-no-unref is only supported since qcow2 version 3"); | ||
193 | + ret = -EINVAL; | ||
194 | + goto fail; | ||
195 | + } | ||
196 | + | 143 | + |
197 | switch (s->crypt_method_header) { | 144 | + stats->driver = BLOCKDEV_DRIVER_NVME; |
198 | case QCOW_CRYPT_NONE: | 145 | + stats->u.nvme = (BlockStatsSpecificNvme) { |
199 | if (encryptfmt) { | 146 | + .completion_errors = s->stats.completion_errors, |
200 | @@ -XXX,XX +XXX,XX @@ static void qcow2_update_options_commit(BlockDriverState *bs, | 147 | + .aligned_accesses = s->stats.aligned_accesses, |
201 | s->discard_passthrough[i] = r->discard_passthrough[i]; | 148 | + .unaligned_accesses = s->stats.unaligned_accesses, |
202 | } | 149 | + }; |
203 | |||
204 | + s->discard_no_unref = r->discard_no_unref; | ||
205 | + | 150 | + |
206 | if (s->cache_clean_interval != r->cache_clean_interval) { | 151 | + return stats; |
207 | cache_clean_timer_del(bs); | 152 | +} |
208 | s->cache_clean_interval = r->cache_clean_interval; | ||
209 | diff --git a/qemu-options.hx b/qemu-options.hx | ||
210 | index XXXXXXX..XXXXXXX 100644 | ||
211 | --- a/qemu-options.hx | ||
212 | +++ b/qemu-options.hx | ||
213 | @@ -XXX,XX +XXX,XX @@ SRST | ||
214 | issued on other occasions where a cluster gets freed | ||
215 | (on/off; default: off) | ||
216 | |||
217 | + ``discard-no-unref`` | ||
218 | + When enabled, discards from the guest will not cause cluster | ||
219 | + allocations to be relinquished. This prevents qcow2 fragmentation | ||
220 | + that would be caused by such discards. Besides potential | ||
221 | + performance degradation, such fragmentation can lead to increased | ||
222 | + allocation of clusters past the end of the image file, | ||
223 | + resulting in image files whose file length can grow much larger | ||
224 | + than their guest disk size would suggest. | ||
225 | + If image file length is of concern (e.g. when storing qcow2 | ||
226 | + images directly on block devices), you should consider enabling | ||
227 | + this option. | ||
228 | + | 153 | + |
229 | ``overlap-check`` | 154 | static const char *const nvme_strong_runtime_opts[] = { |
230 | Which overlap checks to perform for writes to the image | 155 | NVME_BLOCK_OPT_DEVICE, |
231 | (none/constant/cached/all; default: cached). For details or | 156 | NVME_BLOCK_OPT_NAMESPACE, |
157 | @@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_nvme = { | ||
158 | .bdrv_refresh_filename = nvme_refresh_filename, | ||
159 | .bdrv_refresh_limits = nvme_refresh_limits, | ||
160 | .strong_runtime_opts = nvme_strong_runtime_opts, | ||
161 | + .bdrv_get_specific_stats = nvme_get_specific_stats, | ||
162 | |||
163 | .bdrv_detach_aio_context = nvme_detach_aio_context, | ||
164 | .bdrv_attach_aio_context = nvme_attach_aio_context, | ||
232 | -- | 165 | -- |
233 | 2.40.1 | 166 | 2.26.2 |
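For reference, the same option can be set through QMP blockdev-add; a hedged sketch based only on the BlockdevOptionsQcow2 members visible in the qapi/block-core.json hunk above (node names and the filename are illustrative):

    { "execute": "blockdev-add",
      "arguments": {
        "driver": "qcow2",
        "node-name": "fmt0",
        "discard-no-unref": true,
        "pass-discard-request": true,
        "file": { "driver": "file", "filename": "/images/vm.qcow2" } } }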
1 | From: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 1 | From: Coiby Xu <coiby.xu@gmail.com> |
---|---|---|---|
2 | 2 | ||
3 | This helper will be reused in the next patches during the parallels_co_check | 3 | Allow vu_message_read to be replaced by one which will make use of the |
4 | rework to simplify its code. | 4 | QIOChannel functions. Thus reading a vhost-user message won't stall the |
5 | guest. For slave channel, we still use the default vu_message_read. | ||
5 | 6 | ||
6 | Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 7 | Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> |
7 | Reviewed-by: Denis V. Lunev <den@openvz.org> | 8 | Signed-off-by: Coiby Xu <coiby.xu@gmail.com> |
8 | Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> | 9 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> |
9 | Message-Id: <20230424093147.197643-5-alexander.ivanov@virtuozzo.com> | 10 | Message-id: 20200918080912.321299-2-coiby.xu@gmail.com |
10 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 11 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
11 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
12 | --- | 12 | --- |
13 | block/parallels.c | 11 ++++++++--- | 13 | contrib/libvhost-user/libvhost-user.h | 21 +++++++++++++++++++++ |
14 | 1 file changed, 8 insertions(+), 3 deletions(-) | 14 | contrib/libvhost-user/libvhost-user-glib.c | 2 +- |
15 | contrib/libvhost-user/libvhost-user.c | 14 +++++++------- | ||
16 | tests/vhost-user-bridge.c | 2 ++ | ||
17 | tools/virtiofsd/fuse_virtio.c | 4 ++-- | ||
18 | 5 files changed, 33 insertions(+), 10 deletions(-) | ||
15 | 19 | ||
16 | diff --git a/block/parallels.c b/block/parallels.c | 20 | diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h |
17 | index XXXXXXX..XXXXXXX 100644 | 21 | index XXXXXXX..XXXXXXX 100644 |
18 | --- a/block/parallels.c | 22 | --- a/contrib/libvhost-user/libvhost-user.h |
19 | +++ b/block/parallels.c | 23 | +++ b/contrib/libvhost-user/libvhost-user.h |
20 | @@ -XXX,XX +XXX,XX @@ static int64_t block_status(BDRVParallelsState *s, int64_t sector_num, | 24 | @@ -XXX,XX +XXX,XX @@ |
21 | return start_off; | 25 | */ |
26 | #define VHOST_USER_MAX_RAM_SLOTS 32 | ||
27 | |||
28 | +#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64) | ||
29 | + | ||
30 | typedef enum VhostSetConfigType { | ||
31 | VHOST_SET_CONFIG_TYPE_MASTER = 0, | ||
32 | VHOST_SET_CONFIG_TYPE_MIGRATION = 1, | ||
33 | @@ -XXX,XX +XXX,XX @@ typedef uint64_t (*vu_get_features_cb) (VuDev *dev); | ||
34 | typedef void (*vu_set_features_cb) (VuDev *dev, uint64_t features); | ||
35 | typedef int (*vu_process_msg_cb) (VuDev *dev, VhostUserMsg *vmsg, | ||
36 | int *do_reply); | ||
37 | +typedef bool (*vu_read_msg_cb) (VuDev *dev, int sock, VhostUserMsg *vmsg); | ||
38 | typedef void (*vu_queue_set_started_cb) (VuDev *dev, int qidx, bool started); | ||
39 | typedef bool (*vu_queue_is_processed_in_order_cb) (VuDev *dev, int qidx); | ||
40 | typedef int (*vu_get_config_cb) (VuDev *dev, uint8_t *config, uint32_t len); | ||
41 | @@ -XXX,XX +XXX,XX @@ struct VuDev { | ||
42 | bool broken; | ||
43 | uint16_t max_queues; | ||
44 | |||
45 | + /* @read_msg: custom method to read vhost-user message | ||
46 | + * | ||
47 | + * Read data from vhost_user socket fd and fill up | ||
48 | + * the passed VhostUserMsg *vmsg struct. | ||
49 | + * | ||
50 | + * If reading fails, it should close the received set of file | ||
51 | + * descriptors as socket message's auxiliary data. | ||
52 | + * | ||
53 | + * For the details, please refer to vu_message_read in libvhost-user.c | ||
54 | + * which will be used by default if not custom method is provided when | ||
55 | + * calling vu_init | ||
56 | + * | ||
57 | + * Returns: true if vhost-user message successfully received, | ||
58 | + * otherwise return false. | ||
59 | + * | ||
60 | + */ | ||
61 | + vu_read_msg_cb read_msg; | ||
62 | /* @set_watch: add or update the given fd to the watch set, | ||
63 | * call cb when condition is met */ | ||
64 | vu_set_watch_cb set_watch; | ||
65 | @@ -XXX,XX +XXX,XX @@ bool vu_init(VuDev *dev, | ||
66 | uint16_t max_queues, | ||
67 | int socket, | ||
68 | vu_panic_cb panic, | ||
69 | + vu_read_msg_cb read_msg, | ||
70 | vu_set_watch_cb set_watch, | ||
71 | vu_remove_watch_cb remove_watch, | ||
72 | const VuDevIface *iface); | ||
73 | diff --git a/contrib/libvhost-user/libvhost-user-glib.c b/contrib/libvhost-user/libvhost-user-glib.c | ||
74 | index XXXXXXX..XXXXXXX 100644 | ||
75 | --- a/contrib/libvhost-user/libvhost-user-glib.c | ||
76 | +++ b/contrib/libvhost-user/libvhost-user-glib.c | ||
77 | @@ -XXX,XX +XXX,XX @@ vug_init(VugDev *dev, uint16_t max_queues, int socket, | ||
78 | g_assert(dev); | ||
79 | g_assert(iface); | ||
80 | |||
81 | - if (!vu_init(&dev->parent, max_queues, socket, panic, set_watch, | ||
82 | + if (!vu_init(&dev->parent, max_queues, socket, panic, NULL, set_watch, | ||
83 | remove_watch, iface)) { | ||
84 | return false; | ||
85 | } | ||
86 | diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c | ||
87 | index XXXXXXX..XXXXXXX 100644 | ||
88 | --- a/contrib/libvhost-user/libvhost-user.c | ||
89 | +++ b/contrib/libvhost-user/libvhost-user.c | ||
90 | @@ -XXX,XX +XXX,XX @@ | ||
91 | /* The version of inflight buffer */ | ||
92 | #define INFLIGHT_VERSION 1 | ||
93 | |||
94 | -#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64) | ||
95 | - | ||
96 | /* The version of the protocol we support */ | ||
97 | #define VHOST_USER_VERSION 1 | ||
98 | #define LIBVHOST_USER_DEBUG 0 | ||
99 | @@ -XXX,XX +XXX,XX @@ have_userfault(void) | ||
22 | } | 100 | } |
23 | 101 | ||
24 | +static void parallels_set_bat_entry(BDRVParallelsState *s, | 102 | static bool |
25 | + uint32_t index, uint32_t offset) | 103 | -vu_message_read(VuDev *dev, int conn_fd, VhostUserMsg *vmsg) |
26 | +{ | 104 | +vu_message_read_default(VuDev *dev, int conn_fd, VhostUserMsg *vmsg) |
27 | + s->bat_bitmap[index] = cpu_to_le32(offset); | 105 | { |
28 | + bitmap_set(s->bat_dirty_bmap, bat_entry_off(index) / s->bat_dirty_block, 1); | 106 | char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * sizeof(int))] = {}; |
29 | +} | 107 | struct iovec iov = { |
30 | + | 108 | @@ -XXX,XX +XXX,XX @@ vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg) |
31 | static int64_t coroutine_fn GRAPH_RDLOCK | 109 | goto out; |
32 | allocate_clusters(BlockDriverState *bs, int64_t sector_num, | ||
33 | int nb_sectors, int *pnum) | ||
34 | @@ -XXX,XX +XXX,XX @@ allocate_clusters(BlockDriverState *bs, int64_t sector_num, | ||
35 | } | 110 | } |
36 | 111 | ||
37 | for (i = 0; i < to_allocate; i++) { | 112 | - if (!vu_message_read(dev, dev->slave_fd, &msg_reply)) { |
38 | - s->bat_bitmap[idx + i] = cpu_to_le32(s->data_end / s->off_multiplier); | 113 | + if (!vu_message_read_default(dev, dev->slave_fd, &msg_reply)) { |
39 | + parallels_set_bat_entry(s, idx + i, s->data_end / s->off_multiplier); | 114 | goto out; |
40 | s->data_end += s->tracks; | ||
41 | - bitmap_set(s->bat_dirty_bmap, | ||
42 | - bat_entry_off(idx + i) / s->bat_dirty_block, 1); | ||
43 | } | 115 | } |
44 | 116 | ||
45 | return bat2sect(s, idx) + sector_num % s->tracks; | 117 | @@ -XXX,XX +XXX,XX @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg) |
118 | /* Wait for QEMU to confirm that it's registered the handler for the | ||
119 | * faults. | ||
120 | */ | ||
121 | - if (!vu_message_read(dev, dev->sock, vmsg) || | ||
122 | + if (!dev->read_msg(dev, dev->sock, vmsg) || | ||
123 | vmsg->size != sizeof(vmsg->payload.u64) || | ||
124 | vmsg->payload.u64 != 0) { | ||
125 | vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table"); | ||
126 | @@ -XXX,XX +XXX,XX @@ vu_dispatch(VuDev *dev) | ||
127 | int reply_requested; | ||
128 | bool need_reply, success = false; | ||
129 | |||
130 | - if (!vu_message_read(dev, dev->sock, &vmsg)) { | ||
131 | + if (!dev->read_msg(dev, dev->sock, &vmsg)) { | ||
132 | goto end; | ||
133 | } | ||
134 | |||
135 | @@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev, | ||
136 | uint16_t max_queues, | ||
137 | int socket, | ||
138 | vu_panic_cb panic, | ||
139 | + vu_read_msg_cb read_msg, | ||
140 | vu_set_watch_cb set_watch, | ||
141 | vu_remove_watch_cb remove_watch, | ||
142 | const VuDevIface *iface) | ||
143 | @@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev, | ||
144 | |||
145 | dev->sock = socket; | ||
146 | dev->panic = panic; | ||
147 | + dev->read_msg = read_msg ? read_msg : vu_message_read_default; | ||
148 | dev->set_watch = set_watch; | ||
149 | dev->remove_watch = remove_watch; | ||
150 | dev->iface = iface; | ||
151 | @@ -XXX,XX +XXX,XX @@ static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync) | ||
152 | |||
153 | vu_message_write(dev, dev->slave_fd, &vmsg); | ||
154 | if (ack) { | ||
155 | - vu_message_read(dev, dev->slave_fd, &vmsg); | ||
156 | + vu_message_read_default(dev, dev->slave_fd, &vmsg); | ||
157 | } | ||
158 | return; | ||
159 | } | ||
160 | diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c | ||
161 | index XXXXXXX..XXXXXXX 100644 | ||
162 | --- a/tests/vhost-user-bridge.c | ||
163 | +++ b/tests/vhost-user-bridge.c | ||
164 | @@ -XXX,XX +XXX,XX @@ vubr_accept_cb(int sock, void *ctx) | ||
165 | VHOST_USER_BRIDGE_MAX_QUEUES, | ||
166 | conn_fd, | ||
167 | vubr_panic, | ||
168 | + NULL, | ||
169 | vubr_set_watch, | ||
170 | vubr_remove_watch, | ||
171 | &vuiface)) { | ||
172 | @@ -XXX,XX +XXX,XX @@ vubr_new(const char *path, bool client) | ||
173 | VHOST_USER_BRIDGE_MAX_QUEUES, | ||
174 | dev->sock, | ||
175 | vubr_panic, | ||
176 | + NULL, | ||
177 | vubr_set_watch, | ||
178 | vubr_remove_watch, | ||
179 | &vuiface)) { | ||
180 | diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c | ||
181 | index XXXXXXX..XXXXXXX 100644 | ||
182 | --- a/tools/virtiofsd/fuse_virtio.c | ||
183 | +++ b/tools/virtiofsd/fuse_virtio.c | ||
184 | @@ -XXX,XX +XXX,XX @@ int virtio_session_mount(struct fuse_session *se) | ||
185 | se->vu_socketfd = data_sock; | ||
186 | se->virtio_dev->se = se; | ||
187 | pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL); | ||
188 | - vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, fv_set_watch, | ||
189 | - fv_remove_watch, &fv_iface); | ||
190 | + vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, NULL, | ||
191 | + fv_set_watch, fv_remove_watch, &fv_iface); | ||
192 | |||
193 | return 0; | ||
194 | } | ||
46 | -- | 195 | -- |
47 | 2.40.1 | 196 | 2.26.2 |
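A minimal sketch of the right-hand API change (assumed caller code, not from the patch): vu_init() now takes an extra read_msg callback, and passing NULL keeps the built-in reader, as the libvhost-user-glib and vhost-user-bridge hunks show.

    /* my_read_msg() is hypothetical; a real implementation must fill *vmsg
     * (header, payload and any passed fds) from sock and return true. */
    static bool my_read_msg(VuDev *dev, int sock, VhostUserMsg *vmsg);

    if (!vu_init(&dev, max_queues, sock_fd, panic_cb,
                 my_read_msg,   /* or NULL to keep the default vu_message_read */
                 set_watch_cb, remove_watch_cb, &iface)) {
        /* initialization failed, e.g. because of an invalid socket fd */
    }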
1 | From: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 1 | From: Coiby Xu <coiby.xu@gmail.com> |
---|---|---|---|
2 | 2 | ||
3 | Don't let high_off be more than the file size even if we don't fix the | 3 | When the client is running in gdb and the quit command is issued in gdb, |
4 | image. | 4 | QEMU will still dispatch the event, which will cause a segmentation fault in |
5 | the callback function. | ||
5 | 6 | ||
6 | Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 7 | Signed-off-by: Coiby Xu <coiby.xu@gmail.com> |
7 | Reviewed-by: Denis V. Lunev <den@openvz.org> | 8 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> |
8 | Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> | 9 | Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> |
9 | Message-Id: <20230424093147.197643-3-alexander.ivanov@virtuozzo.com> | 10 | Message-id: 20200918080912.321299-3-coiby.xu@gmail.com |
10 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 11 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
11 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
12 | --- | 12 | --- |
13 | block/parallels.c | 4 ++-- | 13 | contrib/libvhost-user/libvhost-user.c | 1 + |
14 | 1 file changed, 2 insertions(+), 2 deletions(-) | 14 | 1 file changed, 1 insertion(+) |
15 | 15 | ||
16 | diff --git a/block/parallels.c b/block/parallels.c | 16 | diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c |
17 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
18 | --- a/block/parallels.c | 18 | --- a/contrib/libvhost-user/libvhost-user.c |
19 | +++ b/block/parallels.c | 19 | +++ b/contrib/libvhost-user/libvhost-user.c |
20 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 20 | @@ -XXX,XX +XXX,XX @@ vu_deinit(VuDev *dev) |
21 | fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i); | ||
22 | res->corruptions++; | ||
23 | if (fix & BDRV_FIX_ERRORS) { | ||
24 | - prev_off = 0; | ||
25 | s->bat_bitmap[i] = 0; | ||
26 | res->corruptions_fixed++; | ||
27 | flush_bat = true; | ||
28 | - continue; | ||
29 | } | ||
30 | + prev_off = 0; | ||
31 | + continue; | ||
32 | } | 21 | } |
33 | 22 | ||
34 | res->bfi.allocated_clusters++; | 23 | if (vq->kick_fd != -1) { |
24 | + dev->remove_watch(dev, vq->kick_fd); | ||
25 | close(vq->kick_fd); | ||
26 | vq->kick_fd = -1; | ||
27 | } | ||
35 | -- | 28 | -- |
36 | 2.40.1 | 29 | 2.26.2 |
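The parallels change on the left is exercised through the normal image check path; a hedged command-line sketch (the image file name is made up):

    # report-only check; a BAT entry pointing past the end of the file no
    # longer inflates the high_off value used for the statistics
    qemu-img check -f parallels disk.hds

    # let qemu-img repair what it can
    qemu-img check -r all -f parallels disk.hds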
New patch | |||
---|---|---|---|
1 | From: Coiby Xu <coiby.xu@gmail.com> | ||
1 | 2 | ||
3 | Sharing QEMU devices via vhost-user protocol. | ||
4 | |||
5 | Only one vhost-user client can connect to the server one time. | ||
6 | |||
7 | Suggested-by: Kevin Wolf <kwolf@redhat.com> | ||
8 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
9 | Signed-off-by: Coiby Xu <coiby.xu@gmail.com> | ||
10 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
11 | Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> | ||
12 | Message-id: 20200918080912.321299-4-coiby.xu@gmail.com | ||
13 | [Fixed size_t %lu -> %zu format string compiler error. | ||
14 | --Stefan] | ||
15 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
16 | --- | ||
17 | util/vhost-user-server.h | 65 ++++++ | ||
18 | util/vhost-user-server.c | 428 +++++++++++++++++++++++++++++++++++++++ | ||
19 | util/meson.build | 1 + | ||
20 | 3 files changed, 494 insertions(+) | ||
21 | create mode 100644 util/vhost-user-server.h | ||
22 | create mode 100644 util/vhost-user-server.c | ||
23 | |||
24 | diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h | ||
25 | new file mode 100644 | ||
26 | index XXXXXXX..XXXXXXX | ||
27 | --- /dev/null | ||
28 | +++ b/util/vhost-user-server.h | ||
29 | @@ -XXX,XX +XXX,XX @@ | ||
30 | +/* | ||
31 | + * Sharing QEMU devices via vhost-user protocol | ||
32 | + * | ||
33 | + * Copyright (c) Coiby Xu <coiby.xu@gmail.com>. | ||
34 | + * Copyright (c) 2020 Red Hat, Inc. | ||
35 | + * | ||
36 | + * This work is licensed under the terms of the GNU GPL, version 2 or | ||
37 | + * later. See the COPYING file in the top-level directory. | ||
38 | + */ | ||
39 | + | ||
40 | +#ifndef VHOST_USER_SERVER_H | ||
41 | +#define VHOST_USER_SERVER_H | ||
42 | + | ||
43 | +#include "contrib/libvhost-user/libvhost-user.h" | ||
44 | +#include "io/channel-socket.h" | ||
45 | +#include "io/channel-file.h" | ||
46 | +#include "io/net-listener.h" | ||
47 | +#include "qemu/error-report.h" | ||
48 | +#include "qapi/error.h" | ||
49 | +#include "standard-headers/linux/virtio_blk.h" | ||
50 | + | ||
51 | +typedef struct VuFdWatch { | ||
52 | + VuDev *vu_dev; | ||
53 | + int fd; /*kick fd*/ | ||
54 | + void *pvt; | ||
55 | + vu_watch_cb cb; | ||
56 | + bool processing; | ||
57 | + QTAILQ_ENTRY(VuFdWatch) next; | ||
58 | +} VuFdWatch; | ||
59 | + | ||
60 | +typedef struct VuServer VuServer; | ||
61 | +typedef void DevicePanicNotifierFn(VuServer *server); | ||
62 | + | ||
63 | +struct VuServer { | ||
64 | + QIONetListener *listener; | ||
65 | + AioContext *ctx; | ||
66 | + DevicePanicNotifierFn *device_panic_notifier; | ||
67 | + int max_queues; | ||
68 | + const VuDevIface *vu_iface; | ||
69 | + VuDev vu_dev; | ||
70 | + QIOChannel *ioc; /* The I/O channel with the client */ | ||
71 | + QIOChannelSocket *sioc; /* The underlying data channel with the client */ | ||
72 | + /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */ | ||
73 | + QIOChannel *ioc_slave; | ||
74 | + QIOChannelSocket *sioc_slave; | ||
75 | + Coroutine *co_trip; /* coroutine for processing VhostUserMsg */ | ||
76 | + QTAILQ_HEAD(, VuFdWatch) vu_fd_watches; | ||
77 | + /* restart coroutine co_trip if AIOContext is changed */ | ||
78 | + bool aio_context_changed; | ||
79 | + bool processing_msg; | ||
80 | +}; | ||
81 | + | ||
82 | +bool vhost_user_server_start(VuServer *server, | ||
83 | + SocketAddress *unix_socket, | ||
84 | + AioContext *ctx, | ||
85 | + uint16_t max_queues, | ||
86 | + DevicePanicNotifierFn *device_panic_notifier, | ||
87 | + const VuDevIface *vu_iface, | ||
88 | + Error **errp); | ||
89 | + | ||
90 | +void vhost_user_server_stop(VuServer *server); | ||
91 | + | ||
92 | +void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx); | ||
93 | + | ||
94 | +#endif /* VHOST_USER_SERVER_H */ | ||
95 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
96 | new file mode 100644 | ||
97 | index XXXXXXX..XXXXXXX | ||
98 | --- /dev/null | ||
99 | +++ b/util/vhost-user-server.c | ||
100 | @@ -XXX,XX +XXX,XX @@ | ||
101 | +/* | ||
102 | + * Sharing QEMU devices via vhost-user protocol | ||
103 | + * | ||
104 | + * Copyright (c) Coiby Xu <coiby.xu@gmail.com>. | ||
105 | + * Copyright (c) 2020 Red Hat, Inc. | ||
106 | + * | ||
107 | + * This work is licensed under the terms of the GNU GPL, version 2 or | ||
108 | + * later. See the COPYING file in the top-level directory. | ||
109 | + */ | ||
110 | +#include "qemu/osdep.h" | ||
111 | +#include "qemu/main-loop.h" | ||
112 | +#include "vhost-user-server.h" | ||
113 | + | ||
114 | +static void vmsg_close_fds(VhostUserMsg *vmsg) | ||
115 | +{ | ||
116 | + int i; | ||
117 | + for (i = 0; i < vmsg->fd_num; i++) { | ||
118 | + close(vmsg->fds[i]); | ||
119 | + } | ||
120 | +} | ||
121 | + | ||
122 | +static void vmsg_unblock_fds(VhostUserMsg *vmsg) | ||
123 | +{ | ||
124 | + int i; | ||
125 | + for (i = 0; i < vmsg->fd_num; i++) { | ||
126 | + qemu_set_nonblock(vmsg->fds[i]); | ||
127 | + } | ||
128 | +} | ||
129 | + | ||
130 | +static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc, | ||
131 | + gpointer opaque); | ||
132 | + | ||
133 | +static void close_client(VuServer *server) | ||
134 | +{ | ||
135 | + /* | ||
136 | + * Before closing the client | ||
137 | + * | ||
138 | + * 1. Let vu_client_trip stop processing new vhost-user msg | ||
139 | + * | ||
140 | + * 2. remove kick_handler | ||
141 | + * | ||
142 | + * 3. wait for the kick handler to be finished | ||
143 | + * | ||
144 | + * 4. wait for the current vhost-user msg to be finished processing | ||
145 | + */ | ||
146 | + | ||
147 | + QIOChannelSocket *sioc = server->sioc; | ||
148 | + /* When this is set vu_client_trip will stop new processing vhost-user message */ | ||
149 | + server->sioc = NULL; | ||
150 | + | ||
151 | + VuFdWatch *vu_fd_watch, *next; | ||
152 | + QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { | ||
153 | + aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL, | ||
154 | + NULL, NULL, NULL); | ||
155 | + } | ||
156 | + | ||
157 | + while (!QTAILQ_EMPTY(&server->vu_fd_watches)) { | ||
158 | + QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { | ||
159 | + if (!vu_fd_watch->processing) { | ||
160 | + QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next); | ||
161 | + g_free(vu_fd_watch); | ||
162 | + } | ||
163 | + } | ||
164 | + } | ||
165 | + | ||
166 | + while (server->processing_msg) { | ||
167 | + if (server->ioc->read_coroutine) { | ||
168 | + server->ioc->read_coroutine = NULL; | ||
169 | + qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL, | ||
170 | + NULL, server->ioc); | ||
171 | + server->processing_msg = false; | ||
172 | + } | ||
173 | + } | ||
174 | + | ||
175 | + vu_deinit(&server->vu_dev); | ||
176 | + object_unref(OBJECT(sioc)); | ||
177 | + object_unref(OBJECT(server->ioc)); | ||
178 | +} | ||
179 | + | ||
180 | +static void panic_cb(VuDev *vu_dev, const char *buf) | ||
181 | +{ | ||
182 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
183 | + | ||
184 | + /* avoid while loop in close_client */ | ||
185 | + server->processing_msg = false; | ||
186 | + | ||
187 | + if (buf) { | ||
188 | + error_report("vu_panic: %s", buf); | ||
189 | + } | ||
190 | + | ||
191 | + if (server->sioc) { | ||
192 | + close_client(server); | ||
193 | + } | ||
194 | + | ||
195 | + if (server->device_panic_notifier) { | ||
196 | + server->device_panic_notifier(server); | ||
197 | + } | ||
198 | + | ||
199 | + /* | ||
200 | + * Set the callback function for network listener so another | ||
201 | + * vhost-user client can connect to this server | ||
202 | + */ | ||
203 | + qio_net_listener_set_client_func(server->listener, | ||
204 | + vu_accept, | ||
205 | + server, | ||
206 | + NULL); | ||
207 | +} | ||
208 | + | ||
209 | +static bool coroutine_fn | ||
210 | +vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg) | ||
211 | +{ | ||
212 | + struct iovec iov = { | ||
213 | + .iov_base = (char *)vmsg, | ||
214 | + .iov_len = VHOST_USER_HDR_SIZE, | ||
215 | + }; | ||
216 | + int rc, read_bytes = 0; | ||
217 | + Error *local_err = NULL; | ||
218 | + /* | ||
219 | + * Store fds/nfds returned from qio_channel_readv_full into | ||
220 | + * temporary variables. | ||
221 | + * | ||
222 | + * VhostUserMsg is a packed structure, gcc will complain about passing | ||
223 | + * pointer to a packed structure member if we pass &VhostUserMsg.fd_num | ||
224 | + * and &VhostUserMsg.fds directly when calling qio_channel_readv_full, | ||
225 | + * thus two temporary variables nfds and fds are used here. | ||
226 | + */ | ||
227 | + size_t nfds = 0, nfds_t = 0; | ||
228 | + const size_t max_fds = G_N_ELEMENTS(vmsg->fds); | ||
229 | + int *fds_t = NULL; | ||
230 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
231 | + QIOChannel *ioc = server->ioc; | ||
232 | + | ||
233 | + if (!ioc) { | ||
234 | + error_report_err(local_err); | ||
235 | + goto fail; | ||
236 | + } | ||
237 | + | ||
238 | + assert(qemu_in_coroutine()); | ||
239 | + do { | ||
240 | + /* | ||
241 | + * qio_channel_readv_full may have short reads, keeping calling it | ||
242 | + * until getting VHOST_USER_HDR_SIZE or 0 bytes in total | ||
243 | + */ | ||
244 | + rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err); | ||
245 | + if (rc < 0) { | ||
246 | + if (rc == QIO_CHANNEL_ERR_BLOCK) { | ||
247 | + qio_channel_yield(ioc, G_IO_IN); | ||
248 | + continue; | ||
249 | + } else { | ||
250 | + error_report_err(local_err); | ||
251 | + return false; | ||
252 | + } | ||
253 | + } | ||
254 | + read_bytes += rc; | ||
255 | + if (nfds_t > 0) { | ||
256 | + if (nfds + nfds_t > max_fds) { | ||
257 | + error_report("A maximum of %zu fds are allowed, " | ||
258 | + "however got %zu fds now", | ||
259 | + max_fds, nfds + nfds_t); | ||
260 | + goto fail; | ||
261 | + } | ||
262 | + memcpy(vmsg->fds + nfds, fds_t, | ||
263 | + nfds_t *sizeof(vmsg->fds[0])); | ||
264 | + nfds += nfds_t; | ||
265 | + g_free(fds_t); | ||
266 | + } | ||
267 | + if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) { | ||
268 | + break; | ||
269 | + } | ||
270 | + iov.iov_base = (char *)vmsg + read_bytes; | ||
271 | + iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes; | ||
272 | + } while (true); | ||
273 | + | ||
274 | + vmsg->fd_num = nfds; | ||
275 | + /* qio_channel_readv_full will make socket fds blocking, unblock them */ | ||
276 | + vmsg_unblock_fds(vmsg); | ||
277 | + if (vmsg->size > sizeof(vmsg->payload)) { | ||
278 | + error_report("Error: too big message request: %d, " | ||
279 | + "size: vmsg->size: %u, " | ||
280 | + "while sizeof(vmsg->payload) = %zu", | ||
281 | + vmsg->request, vmsg->size, sizeof(vmsg->payload)); | ||
282 | + goto fail; | ||
283 | + } | ||
284 | + | ||
285 | + struct iovec iov_payload = { | ||
286 | + .iov_base = (char *)&vmsg->payload, | ||
287 | + .iov_len = vmsg->size, | ||
288 | + }; | ||
289 | + if (vmsg->size) { | ||
290 | + rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err); | ||
291 | + if (rc == -1) { | ||
292 | + error_report_err(local_err); | ||
293 | + goto fail; | ||
294 | + } | ||
295 | + } | ||
296 | + | ||
297 | + return true; | ||
298 | + | ||
299 | +fail: | ||
300 | + vmsg_close_fds(vmsg); | ||
301 | + | ||
302 | + return false; | ||
303 | +} | ||
304 | + | ||
305 | + | ||
306 | +static void vu_client_start(VuServer *server); | ||
307 | +static coroutine_fn void vu_client_trip(void *opaque) | ||
308 | +{ | ||
309 | + VuServer *server = opaque; | ||
310 | + | ||
311 | + while (!server->aio_context_changed && server->sioc) { | ||
312 | + server->processing_msg = true; | ||
313 | + vu_dispatch(&server->vu_dev); | ||
314 | + server->processing_msg = false; | ||
315 | + } | ||
316 | + | ||
317 | + if (server->aio_context_changed && server->sioc) { | ||
318 | + server->aio_context_changed = false; | ||
319 | + vu_client_start(server); | ||
320 | + } | ||
321 | +} | ||
322 | + | ||
323 | +static void vu_client_start(VuServer *server) | ||
324 | +{ | ||
325 | + server->co_trip = qemu_coroutine_create(vu_client_trip, server); | ||
326 | + aio_co_enter(server->ctx, server->co_trip); | ||
327 | +} | ||
328 | + | ||
329 | +/* | ||
330 | + * a wrapper for vu_kick_cb | ||
331 | + * | ||
332 | + * since aio_dispatch can only pass one user data pointer to the | ||
333 | + * callback function, pack VuDev and pvt into a struct. Then unpack it | ||
334 | + * and pass them to vu_kick_cb | ||
335 | + */ | ||
336 | +static void kick_handler(void *opaque) | ||
337 | +{ | ||
338 | + VuFdWatch *vu_fd_watch = opaque; | ||
339 | + vu_fd_watch->processing = true; | ||
340 | + vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt); | ||
341 | + vu_fd_watch->processing = false; | ||
342 | +} | ||
343 | + | ||
344 | + | ||
345 | +static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd) | ||
346 | +{ | ||
347 | + | ||
348 | + VuFdWatch *vu_fd_watch, *next; | ||
349 | + QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { | ||
350 | + if (vu_fd_watch->fd == fd) { | ||
351 | + return vu_fd_watch; | ||
352 | + } | ||
353 | + } | ||
354 | + return NULL; | ||
355 | +} | ||
356 | + | ||
357 | +static void | ||
358 | +set_watch(VuDev *vu_dev, int fd, int vu_evt, | ||
359 | + vu_watch_cb cb, void *pvt) | ||
360 | +{ | ||
361 | + | ||
362 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
363 | + g_assert(vu_dev); | ||
364 | + g_assert(fd >= 0); | ||
365 | + g_assert(cb); | ||
366 | + | ||
367 | + VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd); | ||
368 | + | ||
369 | + if (!vu_fd_watch) { | ||
370 | + VuFdWatch *vu_fd_watch = g_new0(VuFdWatch, 1); | ||
371 | + | ||
372 | + QTAILQ_INSERT_TAIL(&server->vu_fd_watches, vu_fd_watch, next); | ||
373 | + | ||
374 | + vu_fd_watch->fd = fd; | ||
375 | + vu_fd_watch->cb = cb; | ||
376 | + qemu_set_nonblock(fd); | ||
377 | + aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler, | ||
378 | + NULL, NULL, vu_fd_watch); | ||
379 | + vu_fd_watch->vu_dev = vu_dev; | ||
380 | + vu_fd_watch->pvt = pvt; | ||
381 | + } | ||
382 | +} | ||
383 | + | ||
384 | + | ||
385 | +static void remove_watch(VuDev *vu_dev, int fd) | ||
386 | +{ | ||
387 | + VuServer *server; | ||
388 | + g_assert(vu_dev); | ||
389 | + g_assert(fd >= 0); | ||
390 | + | ||
391 | + server = container_of(vu_dev, VuServer, vu_dev); | ||
392 | + | ||
393 | + VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd); | ||
394 | + | ||
395 | + if (!vu_fd_watch) { | ||
396 | + return; | ||
397 | + } | ||
398 | + aio_set_fd_handler(server->ioc->ctx, fd, true, NULL, NULL, NULL, NULL); | ||
399 | + | ||
400 | + QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next); | ||
401 | + g_free(vu_fd_watch); | ||
402 | +} | ||
403 | + | ||
404 | + | ||
405 | +static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc, | ||
406 | + gpointer opaque) | ||
407 | +{ | ||
408 | + VuServer *server = opaque; | ||
409 | + | ||
410 | + if (server->sioc) { | ||
411 | + warn_report("Only one vhost-user client is allowed to " | ||
412 | + "connect the server one time"); | ||
413 | + return; | ||
414 | + } | ||
415 | + | ||
416 | + if (!vu_init(&server->vu_dev, server->max_queues, sioc->fd, panic_cb, | ||
417 | + vu_message_read, set_watch, remove_watch, server->vu_iface)) { | ||
418 | + error_report("Failed to initialize libvhost-user"); | ||
419 | + return; | ||
420 | + } | ||
421 | + | ||
422 | + /* | ||
423 | + * Unset the callback function for network listener to make another | ||
424 | + * vhost-user client keeping waiting until this client disconnects | ||
425 | + */ | ||
426 | + qio_net_listener_set_client_func(server->listener, | ||
427 | + NULL, | ||
428 | + NULL, | ||
429 | + NULL); | ||
430 | + server->sioc = sioc; | ||
431 | + /* | ||
432 | + * Increase the object reference, so sioc will not freed by | ||
433 | + * qio_net_listener_channel_func which will call object_unref(OBJECT(sioc)) | ||
434 | + */ | ||
435 | + object_ref(OBJECT(server->sioc)); | ||
436 | + qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client"); | ||
437 | + server->ioc = QIO_CHANNEL(sioc); | ||
438 | + object_ref(OBJECT(server->ioc)); | ||
439 | + qio_channel_attach_aio_context(server->ioc, server->ctx); | ||
440 | + qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL); | ||
441 | + vu_client_start(server); | ||
442 | +} | ||
443 | + | ||
444 | + | ||
445 | +void vhost_user_server_stop(VuServer *server) | ||
446 | +{ | ||
447 | + if (server->sioc) { | ||
448 | + close_client(server); | ||
449 | + } | ||
450 | + | ||
451 | + if (server->listener) { | ||
452 | + qio_net_listener_disconnect(server->listener); | ||
453 | + object_unref(OBJECT(server->listener)); | ||
454 | + } | ||
455 | + | ||
456 | +} | ||
457 | + | ||
458 | +void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx) | ||
459 | +{ | ||
460 | + VuFdWatch *vu_fd_watch, *next; | ||
461 | + void *opaque = NULL; | ||
462 | + IOHandler *io_read = NULL; | ||
463 | + bool attach; | ||
464 | + | ||
465 | + server->ctx = ctx ? ctx : qemu_get_aio_context(); | ||
466 | + | ||
467 | + if (!server->sioc) { | ||
468 | + /* not yet serving any client*/ | ||
469 | + return; | ||
470 | + } | ||
471 | + | ||
472 | + if (ctx) { | ||
473 | + qio_channel_attach_aio_context(server->ioc, ctx); | ||
474 | + server->aio_context_changed = true; | ||
475 | + io_read = kick_handler; | ||
476 | + attach = true; | ||
477 | + } else { | ||
478 | + qio_channel_detach_aio_context(server->ioc); | ||
479 | + /* server->ioc->ctx keeps the old AioConext */ | ||
480 | + ctx = server->ioc->ctx; | ||
481 | + attach = false; | ||
482 | + } | ||
483 | + | ||
484 | + QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { | ||
485 | + if (vu_fd_watch->cb) { | ||
486 | + opaque = attach ? vu_fd_watch : NULL; | ||
487 | + aio_set_fd_handler(ctx, vu_fd_watch->fd, true, | ||
488 | + io_read, NULL, NULL, | ||
489 | + opaque); | ||
490 | + } | ||
491 | + } | ||
492 | +} | ||
493 | + | ||
494 | + | ||
495 | +bool vhost_user_server_start(VuServer *server, | ||
496 | + SocketAddress *socket_addr, | ||
497 | + AioContext *ctx, | ||
498 | + uint16_t max_queues, | ||
499 | + DevicePanicNotifierFn *device_panic_notifier, | ||
500 | + const VuDevIface *vu_iface, | ||
501 | + Error **errp) | ||
502 | +{ | ||
503 | + QIONetListener *listener = qio_net_listener_new(); | ||
504 | + if (qio_net_listener_open_sync(listener, socket_addr, 1, | ||
505 | + errp) < 0) { | ||
506 | + object_unref(OBJECT(listener)); | ||
507 | + return false; | ||
508 | + } | ||
509 | + | ||
510 | + /* zero out unspecified fileds */ | ||
511 | + *server = (VuServer) { | ||
512 | + .listener = listener, | ||
513 | + .vu_iface = vu_iface, | ||
514 | + .max_queues = max_queues, | ||
515 | + .ctx = ctx, | ||
516 | + .device_panic_notifier = device_panic_notifier, | ||
517 | + }; | ||
518 | + | ||
519 | + qio_net_listener_set_name(server->listener, "vhost-user-backend-listener"); | ||
520 | + | ||
521 | + qio_net_listener_set_client_func(server->listener, | ||
522 | + vu_accept, | ||
523 | + server, | ||
524 | + NULL); | ||
525 | + | ||
526 | + QTAILQ_INIT(&server->vu_fd_watches); | ||
527 | + return true; | ||
528 | +} | ||
529 | diff --git a/util/meson.build b/util/meson.build | ||
530 | index XXXXXXX..XXXXXXX 100644 | ||
531 | --- a/util/meson.build | ||
532 | +++ b/util/meson.build | ||
533 | @@ -XXX,XX +XXX,XX @@ if have_block | ||
534 | util_ss.add(files('main-loop.c')) | ||
535 | util_ss.add(files('nvdimm-utils.c')) | ||
536 | util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c')) | ||
537 | + util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c')) | ||
538 | util_ss.add(files('qemu-coroutine-sleep.c')) | ||
539 | util_ss.add(files('qemu-co-shared-resource.c')) | ||
540 | util_ss.add(files('thread-pool.c', 'qemu-timer.c')) | ||
541 | -- | ||
542 | 2.26.2 | ||
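As a rough usage sketch (not part of the patch above; the VuServer instance, socket address, AioContext, queue count and VuDevIface names below are placeholders), a device backend is expected to drive the new API roughly as the vhost-user-blk export added later in this series does:

    static const VuDevIface my_iface = {
        /* device-specific callbacks: get_features, process_msg, ... */
    };

    /* open the listener socket and serve one client at a time */
    if (!vhost_user_server_start(&server, socket_addr, ctx, MAX_QUEUES,
                                 NULL /* no panic notifier */, &my_iface,
                                 errp)) {
        return;
    }

    /* ... serve requests ... */

    /* disconnect the client (if any) and close the listener */
    vhost_user_server_stop(&server);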
1 | From: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 1 | From: Coiby Xu <coiby.xu@gmail.com> |
---|---|---|---|
2 | 2 | ||
3 | We will add more and more checks so we need a better code structure | 3 | Move the constants from hw/core/qdev-properties.c to |
4 | in parallels_co_check. Let each check be performed in a separate loop | 4 | util/block-helpers.h so that knowledge of the min/max values is |
5 | in a separate helper. | ||
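For illustration only (not taken from either patch; the object and property names here are invented), a caller validates a user-supplied size with the shared check_block_size() helper added below, mirroring the set_blocksize() hunk further down:

    Error *local_err = NULL;

    /* value 0 means "unset"; otherwise it must be a power of 2 in [512 B, 2 MiB] */
    check_block_size("myobject", "logical-block-size", value, &local_err);
    if (local_err) {
        error_propagate(errp, local_err);
        return;
    }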
6 | 5 | ||
7 | Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 6 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
8 | Reviewed-by: Denis V. Lunev <den@openvz.org> | 7 | Signed-off-by: Coiby Xu <coiby.xu@gmail.com> |
9 | Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> | 8 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> |
10 | Message-Id: <20230424093147.197643-7-alexander.ivanov@virtuozzo.com> | 9 | Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> |
11 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 10 | Acked-by: Eduardo Habkost <ehabkost@redhat.com> |
12 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 11 | Message-id: 20200918080912.321299-5-coiby.xu@gmail.com |
12 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
13 | --- | 13 | --- |
14 | block/parallels.c | 31 +++++++++++++++++++++---------- | 14 | util/block-helpers.h | 19 +++++++++++++ |
15 | 1 file changed, 21 insertions(+), 10 deletions(-) | 15 | hw/core/qdev-properties-system.c | 31 ++++----------------- |
16 | util/block-helpers.c | 46 ++++++++++++++++++++++++++++++++ | ||
17 | util/meson.build | 1 + | ||
18 | 4 files changed, 71 insertions(+), 26 deletions(-) | ||
19 | create mode 100644 util/block-helpers.h | ||
20 | create mode 100644 util/block-helpers.c | ||
16 | 21 | ||
17 | diff --git a/block/parallels.c b/block/parallels.c | 22 | diff --git a/util/block-helpers.h b/util/block-helpers.h |
23 | new file mode 100644 | ||
24 | index XXXXXXX..XXXXXXX | ||
25 | --- /dev/null | ||
26 | +++ b/util/block-helpers.h | ||
27 | @@ -XXX,XX +XXX,XX @@ | ||
28 | +#ifndef BLOCK_HELPERS_H | ||
29 | +#define BLOCK_HELPERS_H | ||
30 | + | ||
31 | +#include "qemu/units.h" | ||
32 | + | ||
33 | +/* lower limit is sector size */ | ||
34 | +#define MIN_BLOCK_SIZE INT64_C(512) | ||
35 | +#define MIN_BLOCK_SIZE_STR "512 B" | ||
36 | +/* | ||
37 | + * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and | ||
38 | + * matches qcow2 cluster size limit | ||
39 | + */ | ||
40 | +#define MAX_BLOCK_SIZE (2 * MiB) | ||
41 | +#define MAX_BLOCK_SIZE_STR "2 MiB" | ||
42 | + | ||
43 | +void check_block_size(const char *id, const char *name, int64_t value, | ||
44 | + Error **errp); | ||
45 | + | ||
46 | +#endif /* BLOCK_HELPERS_H */ | ||
47 | diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c | ||
18 | index XXXXXXX..XXXXXXX 100644 | 48 | index XXXXXXX..XXXXXXX 100644 |
19 | --- a/block/parallels.c | 49 | --- a/hw/core/qdev-properties-system.c |
20 | +++ b/block/parallels.c | 50 | +++ b/hw/core/qdev-properties-system.c |
21 | @@ -XXX,XX +XXX,XX @@ parallels_co_readv(BlockDriverState *bs, int64_t sector_num, int nb_sectors, | 51 | @@ -XXX,XX +XXX,XX @@ |
22 | return ret; | 52 | #include "sysemu/blockdev.h" |
53 | #include "net/net.h" | ||
54 | #include "hw/pci/pci.h" | ||
55 | +#include "util/block-helpers.h" | ||
56 | |||
57 | static bool check_prop_still_unset(DeviceState *dev, const char *name, | ||
58 | const void *old_val, const char *new_val, | ||
59 | @@ -XXX,XX +XXX,XX @@ const PropertyInfo qdev_prop_losttickpolicy = { | ||
60 | |||
61 | /* --- blocksize --- */ | ||
62 | |||
63 | -/* lower limit is sector size */ | ||
64 | -#define MIN_BLOCK_SIZE 512 | ||
65 | -#define MIN_BLOCK_SIZE_STR "512 B" | ||
66 | -/* | ||
67 | - * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and | ||
68 | - * matches qcow2 cluster size limit | ||
69 | - */ | ||
70 | -#define MAX_BLOCK_SIZE (2 * MiB) | ||
71 | -#define MAX_BLOCK_SIZE_STR "2 MiB" | ||
72 | - | ||
73 | static void set_blocksize(Object *obj, Visitor *v, const char *name, | ||
74 | void *opaque, Error **errp) | ||
75 | { | ||
76 | @@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name, | ||
77 | Property *prop = opaque; | ||
78 | uint32_t *ptr = qdev_get_prop_ptr(dev, prop); | ||
79 | uint64_t value; | ||
80 | + Error *local_err = NULL; | ||
81 | |||
82 | if (dev->realized) { | ||
83 | qdev_prop_set_after_realize(dev, name, errp); | ||
84 | @@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name, | ||
85 | if (!visit_type_size(v, name, &value, errp)) { | ||
86 | return; | ||
87 | } | ||
88 | - /* value of 0 means "unset" */ | ||
89 | - if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) { | ||
90 | - error_setg(errp, | ||
91 | - "Property %s.%s doesn't take value %" PRIu64 | ||
92 | - " (minimum: " MIN_BLOCK_SIZE_STR | ||
93 | - ", maximum: " MAX_BLOCK_SIZE_STR ")", | ||
94 | - dev->id ? : "", name, value); | ||
95 | + check_block_size(dev->id ? : "", name, value, &local_err); | ||
96 | + if (local_err) { | ||
97 | + error_propagate(errp, local_err); | ||
98 | return; | ||
99 | } | ||
100 | - | ||
101 | - /* We rely on power-of-2 blocksizes for bitmasks */ | ||
102 | - if ((value & (value - 1)) != 0) { | ||
103 | - error_setg(errp, | ||
104 | - "Property %s.%s doesn't take value '%" PRId64 "', " | ||
105 | - "it's not a power of 2", dev->id ?: "", name, (int64_t)value); | ||
106 | - return; | ||
107 | - } | ||
108 | - | ||
109 | *ptr = value; | ||
23 | } | 110 | } |
24 | 111 | ||
25 | +static void parallels_check_unclean(BlockDriverState *bs, | 112 | diff --git a/util/block-helpers.c b/util/block-helpers.c |
26 | + BdrvCheckResult *res, | 113 | new file mode 100644 |
27 | + BdrvCheckMode fix) | 114 | index XXXXXXX..XXXXXXX |
115 | --- /dev/null | ||
116 | +++ b/util/block-helpers.c | ||
117 | @@ -XXX,XX +XXX,XX @@ | ||
118 | +/* | ||
119 | + * Block utility functions | ||
120 | + * | ||
121 | + * Copyright IBM, Corp. 2011 | ||
122 | + * Copyright (c) 2020 Coiby Xu <coiby.xu@gmail.com> | ||
123 | + * | ||
124 | + * This work is licensed under the terms of the GNU GPL, version 2 or later. | ||
125 | + * See the COPYING file in the top-level directory. | ||
126 | + */ | ||
127 | + | ||
128 | +#include "qemu/osdep.h" | ||
129 | +#include "qapi/error.h" | ||
130 | +#include "qapi/qmp/qerror.h" | ||
131 | +#include "block-helpers.h" | ||
132 | + | ||
133 | +/** | ||
134 | + * check_block_size: | ||
135 | + * @id: The unique ID of the object | ||
136 | + * @name: The name of the property being validated | ||
137 | + * @value: The block size in bytes | ||
138 | + * @errp: A pointer to an area to store an error | ||
139 | + * | ||
140 | + * This function checks that the block size meets the following conditions: | ||
141 | + * 1. At least MIN_BLOCK_SIZE | ||
142 | + * 2. No larger than MAX_BLOCK_SIZE | ||
143 | + * 3. A power of 2 | ||
144 | + */ | ||
145 | +void check_block_size(const char *id, const char *name, int64_t value, | ||
146 | + Error **errp) | ||
28 | +{ | 147 | +{ |
29 | + BDRVParallelsState *s = bs->opaque; | 148 | + /* value of 0 means "unset" */ |
30 | + | 149 | + if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) { |
31 | + if (!s->header_unclean) { | 150 | + error_setg(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE, |
151 | + id, name, value, MIN_BLOCK_SIZE, MAX_BLOCK_SIZE); | ||
32 | + return; | 152 | + return; |
33 | + } | 153 | + } |
34 | + | 154 | + |
35 | + fprintf(stderr, "%s image was not closed correctly\n", | 155 | + /* We rely on power-of-2 blocksizes for bitmasks */ |
36 | + fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR"); | 156 | + if ((value & (value - 1)) != 0) { |
37 | + res->corruptions++; | 157 | + error_setg(errp, |
38 | + if (fix & BDRV_FIX_ERRORS) { | 158 | + "Property %s.%s doesn't take value '%" PRId64 |
39 | + /* parallels_close will do the job right */ | 159 | + "', it's not a power of 2", |
40 | + res->corruptions_fixed++; | 160 | + id, name, value); |
41 | + s->header_unclean = false; | 161 | + return; |
42 | + } | 162 | + } |
43 | +} | 163 | +} |
44 | 164 | diff --git a/util/meson.build b/util/meson.build | |
45 | static int coroutine_fn GRAPH_RDLOCK | 165 | index XXXXXXX..XXXXXXX 100644 |
46 | parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 166 | --- a/util/meson.build |
47 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 167 | +++ b/util/meson.build |
48 | } | 168 | @@ -XXX,XX +XXX,XX @@ if have_block |
49 | 169 | util_ss.add(files('nvdimm-utils.c')) | |
50 | qemu_co_mutex_lock(&s->lock); | 170 | util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c')) |
51 | - if (s->header_unclean) { | 171 | util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c')) |
52 | - fprintf(stderr, "%s image was not closed correctly\n", | 172 | + util_ss.add(files('block-helpers.c')) |
53 | - fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR"); | 173 | util_ss.add(files('qemu-coroutine-sleep.c')) |
54 | - res->corruptions++; | 174 | util_ss.add(files('qemu-co-shared-resource.c')) |
55 | - if (fix & BDRV_FIX_ERRORS) { | 175 | util_ss.add(files('thread-pool.c', 'qemu-timer.c')) |
56 | - /* parallels_close will do the job right */ | ||
57 | - res->corruptions_fixed++; | ||
58 | - s->header_unclean = false; | ||
59 | - } | ||
60 | - } | ||
61 | + | ||
62 | + parallels_check_unclean(bs, res, fix); | ||
63 | |||
64 | res->bfi.total_clusters = s->bat_size; | ||
65 | res->bfi.compressed_clusters = 0; /* compression is not supported */ | ||
66 | -- | 176 | -- |
67 | 2.40.1 | 177 | 2.26.2 |
New patch | |||
---|---|---|---|
1 | From: Coiby Xu <coiby.xu@gmail.com> | ||
1 | 2 | ||
3 | By making use of libvhost-user, a block device drive can be shared with | ||
4 | the connected vhost-user client. Only one client can connect to the | ||
5 | server at a time. | ||
6 | |||
7 | Since vhost-user-server needs a block drive to be created first, delay | ||
8 | the creation of this object. | ||
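As an illustrative invocation (not taken from the patch; the node name and socket path are made up), once a block node such as disk0 exists, the export could be created with something like:

    -object vhost-user-blk-server,node-name=disk0,unix-socket=/tmp/vhost-user-blk.sock,writable=on

node-name, unix-socket, writable and logical-block-size are the properties registered in vhost_user_blk_server_class_init() below.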
9 | |||
10 | Suggested-by: Kevin Wolf <kwolf@redhat.com> | ||
11 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
12 | Signed-off-by: Coiby Xu <coiby.xu@gmail.com> | ||
13 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
14 | Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> | ||
15 | Message-id: 20200918080912.321299-6-coiby.xu@gmail.com | ||
16 | [Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the | ||
17 | following compiler warning: | ||
18 | ../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=] | ||
19 | and fix "Invalid size %ld ..." ssize_t format string arguments for | ||
20 | 32-bit hosts. | ||
21 | --Stefan] | ||
22 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
23 | --- | ||
24 | block/export/vhost-user-blk-server.h | 36 ++ | ||
25 | block/export/vhost-user-blk-server.c | 661 +++++++++++++++++++++++++++ | ||
26 | softmmu/vl.c | 4 + | ||
27 | block/meson.build | 1 + | ||
28 | 4 files changed, 702 insertions(+) | ||
29 | create mode 100644 block/export/vhost-user-blk-server.h | ||
30 | create mode 100644 block/export/vhost-user-blk-server.c | ||
31 | |||
32 | diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h | ||
33 | new file mode 100644 | ||
34 | index XXXXXXX..XXXXXXX | ||
35 | --- /dev/null | ||
36 | +++ b/block/export/vhost-user-blk-server.h | ||
37 | @@ -XXX,XX +XXX,XX @@ | ||
38 | +/* | ||
39 | + * Sharing QEMU block devices via vhost-user protocol | ||
40 | + * | ||
41 | + * Copyright (c) Coiby Xu <coiby.xu@gmail.com>. | ||
42 | + * Copyright (c) 2020 Red Hat, Inc. | ||
43 | + * | ||
44 | + * This work is licensed under the terms of the GNU GPL, version 2 or | ||
45 | + * later. See the COPYING file in the top-level directory. | ||
46 | + */ | ||
47 | + | ||
48 | +#ifndef VHOST_USER_BLK_SERVER_H | ||
49 | +#define VHOST_USER_BLK_SERVER_H | ||
50 | +#include "util/vhost-user-server.h" | ||
51 | + | ||
52 | +typedef struct VuBlockDev VuBlockDev; | ||
53 | +#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server" | ||
54 | +#define VHOST_USER_BLK_SERVER(obj) \ | ||
55 | + OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER) | ||
56 | + | ||
57 | +/* vhost user block device */ | ||
58 | +struct VuBlockDev { | ||
59 | + Object parent_obj; | ||
60 | + char *node_name; | ||
61 | + SocketAddress *addr; | ||
62 | + AioContext *ctx; | ||
63 | + VuServer vu_server; | ||
64 | + bool running; | ||
65 | + uint32_t blk_size; | ||
66 | + BlockBackend *backend; | ||
67 | + QIOChannelSocket *sioc; | ||
68 | + QTAILQ_ENTRY(VuBlockDev) next; | ||
69 | + struct virtio_blk_config blkcfg; | ||
70 | + bool writable; | ||
71 | +}; | ||
72 | + | ||
73 | +#endif /* VHOST_USER_BLK_SERVER_H */ | ||
74 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
75 | new file mode 100644 | ||
76 | index XXXXXXX..XXXXXXX | ||
77 | --- /dev/null | ||
78 | +++ b/block/export/vhost-user-blk-server.c | ||
79 | @@ -XXX,XX +XXX,XX @@ | ||
80 | +/* | ||
81 | + * Sharing QEMU block devices via vhost-user protocol | ||
82 | + * | ||
83 | + * Parts of the code based on nbd/server.c. | ||
84 | + * | ||
85 | + * Copyright (c) Coiby Xu <coiby.xu@gmail.com>. | ||
86 | + * Copyright (c) 2020 Red Hat, Inc. | ||
87 | + * | ||
88 | + * This work is licensed under the terms of the GNU GPL, version 2 or | ||
89 | + * later. See the COPYING file in the top-level directory. | ||
90 | + */ | ||
91 | +#include "qemu/osdep.h" | ||
92 | +#include "block/block.h" | ||
93 | +#include "vhost-user-blk-server.h" | ||
94 | +#include "qapi/error.h" | ||
95 | +#include "qom/object_interfaces.h" | ||
96 | +#include "sysemu/block-backend.h" | ||
97 | +#include "util/block-helpers.h" | ||
98 | + | ||
99 | +enum { | ||
100 | + VHOST_USER_BLK_MAX_QUEUES = 1, | ||
101 | +}; | ||
102 | +struct virtio_blk_inhdr { | ||
103 | + unsigned char status; | ||
104 | +}; | ||
105 | + | ||
106 | +typedef struct VuBlockReq { | ||
107 | + VuVirtqElement *elem; | ||
108 | + int64_t sector_num; | ||
109 | + size_t size; | ||
110 | + struct virtio_blk_inhdr *in; | ||
111 | + struct virtio_blk_outhdr out; | ||
112 | + VuServer *server; | ||
113 | + struct VuVirtq *vq; | ||
114 | +} VuBlockReq; | ||
115 | + | ||
116 | +static void vu_block_req_complete(VuBlockReq *req) | ||
117 | +{ | ||
118 | + VuDev *vu_dev = &req->server->vu_dev; | ||
119 | + | ||
120 | + /* IO size with 1 extra status byte */ | ||
121 | + vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1); | ||
122 | + vu_queue_notify(vu_dev, req->vq); | ||
123 | + | ||
124 | + if (req->elem) { | ||
125 | + free(req->elem); | ||
126 | + } | ||
127 | + | ||
128 | + g_free(req); | ||
129 | +} | ||
130 | + | ||
131 | +static VuBlockDev *get_vu_block_device_by_server(VuServer *server) | ||
132 | +{ | ||
133 | + return container_of(server, VuBlockDev, vu_server); | ||
134 | +} | ||
135 | + | ||
136 | +static int coroutine_fn | ||
137 | +vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov, | ||
138 | + uint32_t iovcnt, uint32_t type) | ||
139 | +{ | ||
140 | + struct virtio_blk_discard_write_zeroes desc; | ||
141 | + ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc)); | ||
142 | + if (unlikely(size != sizeof(desc))) { | ||
143 | + error_report("Invalid size %zd, expect %zu", size, sizeof(desc)); | ||
144 | + return -EINVAL; | ||
145 | + } | ||
146 | + | ||
147 | + VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server); | ||
148 | + uint64_t range[2] = { le64_to_cpu(desc.sector) << 9, | ||
149 | + le32_to_cpu(desc.num_sectors) << 9 }; | ||
150 | + if (type == VIRTIO_BLK_T_DISCARD) { | ||
151 | + if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) { | ||
152 | + return 0; | ||
153 | + } | ||
154 | + } else if (type == VIRTIO_BLK_T_WRITE_ZEROES) { | ||
155 | + if (blk_co_pwrite_zeroes(vdev_blk->backend, | ||
156 | + range[0], range[1], 0) == 0) { | ||
157 | + return 0; | ||
158 | + } | ||
159 | + } | ||
160 | + | ||
161 | + return -EINVAL; | ||
162 | +} | ||
163 | + | ||
164 | +static void coroutine_fn vu_block_flush(VuBlockReq *req) | ||
165 | +{ | ||
166 | + VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server); | ||
167 | + BlockBackend *backend = vdev_blk->backend; | ||
168 | + blk_co_flush(backend); | ||
169 | +} | ||
170 | + | ||
171 | +struct req_data { | ||
172 | + VuServer *server; | ||
173 | + VuVirtq *vq; | ||
174 | + VuVirtqElement *elem; | ||
175 | +}; | ||
176 | + | ||
177 | +static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
178 | +{ | ||
179 | + struct req_data *data = opaque; | ||
180 | + VuServer *server = data->server; | ||
181 | + VuVirtq *vq = data->vq; | ||
182 | + VuVirtqElement *elem = data->elem; | ||
183 | + uint32_t type; | ||
184 | + VuBlockReq *req; | ||
185 | + | ||
186 | + VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
187 | + BlockBackend *backend = vdev_blk->backend; | ||
188 | + | ||
189 | + struct iovec *in_iov = elem->in_sg; | ||
190 | + struct iovec *out_iov = elem->out_sg; | ||
191 | + unsigned in_num = elem->in_num; | ||
192 | + unsigned out_num = elem->out_num; | ||
193 | + /* refer to hw/block/virtio_blk.c */ | ||
194 | + if (elem->out_num < 1 || elem->in_num < 1) { | ||
195 | + error_report("virtio-blk request missing headers"); | ||
196 | + free(elem); | ||
197 | + return; | ||
198 | + } | ||
199 | + | ||
200 | + req = g_new0(VuBlockReq, 1); | ||
201 | + req->server = server; | ||
202 | + req->vq = vq; | ||
203 | + req->elem = elem; | ||
204 | + | ||
205 | + if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out, | ||
206 | + sizeof(req->out)) != sizeof(req->out))) { | ||
207 | + error_report("virtio-blk request outhdr too short"); | ||
208 | + goto err; | ||
209 | + } | ||
210 | + | ||
211 | + iov_discard_front(&out_iov, &out_num, sizeof(req->out)); | ||
212 | + | ||
213 | + if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) { | ||
214 | + error_report("virtio-blk request inhdr too short"); | ||
215 | + goto err; | ||
216 | + } | ||
217 | + | ||
218 | + /* We always touch the last byte, so just see how big in_iov is. */ | ||
219 | + req->in = (void *)in_iov[in_num - 1].iov_base | ||
220 | + + in_iov[in_num - 1].iov_len | ||
221 | + - sizeof(struct virtio_blk_inhdr); | ||
222 | + iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr)); | ||
223 | + | ||
224 | + type = le32_to_cpu(req->out.type); | ||
225 | + switch (type & ~VIRTIO_BLK_T_BARRIER) { | ||
226 | + case VIRTIO_BLK_T_IN: | ||
227 | + case VIRTIO_BLK_T_OUT: { | ||
228 | + ssize_t ret = 0; | ||
229 | + bool is_write = type & VIRTIO_BLK_T_OUT; | ||
230 | + req->sector_num = le64_to_cpu(req->out.sector); | ||
231 | + | ||
232 | + int64_t offset = req->sector_num * vdev_blk->blk_size; | ||
233 | + QEMUIOVector qiov; | ||
234 | + if (is_write) { | ||
235 | + qemu_iovec_init_external(&qiov, out_iov, out_num); | ||
236 | + ret = blk_co_pwritev(backend, offset, qiov.size, | ||
237 | + &qiov, 0); | ||
238 | + } else { | ||
239 | + qemu_iovec_init_external(&qiov, in_iov, in_num); | ||
240 | + ret = blk_co_preadv(backend, offset, qiov.size, | ||
241 | + &qiov, 0); | ||
242 | + } | ||
243 | + if (ret >= 0) { | ||
244 | + req->in->status = VIRTIO_BLK_S_OK; | ||
245 | + } else { | ||
246 | + req->in->status = VIRTIO_BLK_S_IOERR; | ||
247 | + } | ||
248 | + break; | ||
249 | + } | ||
250 | + case VIRTIO_BLK_T_FLUSH: | ||
251 | + vu_block_flush(req); | ||
252 | + req->in->status = VIRTIO_BLK_S_OK; | ||
253 | + break; | ||
254 | + case VIRTIO_BLK_T_GET_ID: { | ||
255 | + size_t size = MIN(iov_size(&elem->in_sg[0], in_num), | ||
256 | + VIRTIO_BLK_ID_BYTES); | ||
257 | + snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk"); | ||
258 | + req->in->status = VIRTIO_BLK_S_OK; | ||
259 | + req->size = elem->in_sg[0].iov_len; | ||
260 | + break; | ||
261 | + } | ||
262 | + case VIRTIO_BLK_T_DISCARD: | ||
263 | + case VIRTIO_BLK_T_WRITE_ZEROES: { | ||
264 | + int rc; | ||
265 | + rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1], | ||
266 | + out_num, type); | ||
267 | + if (rc == 0) { | ||
268 | + req->in->status = VIRTIO_BLK_S_OK; | ||
269 | + } else { | ||
270 | + req->in->status = VIRTIO_BLK_S_IOERR; | ||
271 | + } | ||
272 | + break; | ||
273 | + } | ||
274 | + default: | ||
275 | + req->in->status = VIRTIO_BLK_S_UNSUPP; | ||
276 | + break; | ||
277 | + } | ||
278 | + | ||
279 | + vu_block_req_complete(req); | ||
280 | + return; | ||
281 | + | ||
282 | +err: | ||
283 | + free(elem); | ||
284 | + g_free(req); | ||
285 | + return; | ||
286 | +} | ||
287 | + | ||
288 | +static void vu_block_process_vq(VuDev *vu_dev, int idx) | ||
289 | +{ | ||
290 | + VuServer *server; | ||
291 | + VuVirtq *vq; | ||
292 | + struct req_data *req_data; | ||
293 | + | ||
294 | + server = container_of(vu_dev, VuServer, vu_dev); | ||
295 | + assert(server); | ||
296 | + | ||
297 | + vq = vu_get_queue(vu_dev, idx); | ||
298 | + assert(vq); | ||
299 | + VuVirtqElement *elem; | ||
300 | + while (1) { | ||
301 | + elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) + | ||
302 | + sizeof(VuBlockReq)); | ||
303 | + if (elem) { | ||
304 | + req_data = g_new0(struct req_data, 1); | ||
305 | + req_data->server = server; | ||
306 | + req_data->vq = vq; | ||
307 | + req_data->elem = elem; | ||
308 | + Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req, | ||
309 | + req_data); | ||
310 | + aio_co_enter(server->ioc->ctx, co); | ||
311 | + } else { | ||
312 | + break; | ||
313 | + } | ||
314 | + } | ||
315 | +} | ||
316 | + | ||
317 | +static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started) | ||
318 | +{ | ||
319 | + VuVirtq *vq; | ||
320 | + | ||
321 | + assert(vu_dev); | ||
322 | + | ||
323 | + vq = vu_get_queue(vu_dev, idx); | ||
324 | + vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL); | ||
325 | +} | ||
326 | + | ||
327 | +static uint64_t vu_block_get_features(VuDev *dev) | ||
328 | +{ | ||
329 | + uint64_t features; | ||
330 | + VuServer *server = container_of(dev, VuServer, vu_dev); | ||
331 | + VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
332 | + features = 1ull << VIRTIO_BLK_F_SIZE_MAX | | ||
333 | + 1ull << VIRTIO_BLK_F_SEG_MAX | | ||
334 | + 1ull << VIRTIO_BLK_F_TOPOLOGY | | ||
335 | + 1ull << VIRTIO_BLK_F_BLK_SIZE | | ||
336 | + 1ull << VIRTIO_BLK_F_FLUSH | | ||
337 | + 1ull << VIRTIO_BLK_F_DISCARD | | ||
338 | + 1ull << VIRTIO_BLK_F_WRITE_ZEROES | | ||
339 | + 1ull << VIRTIO_BLK_F_CONFIG_WCE | | ||
340 | + 1ull << VIRTIO_F_VERSION_1 | | ||
341 | + 1ull << VIRTIO_RING_F_INDIRECT_DESC | | ||
342 | + 1ull << VIRTIO_RING_F_EVENT_IDX | | ||
343 | + 1ull << VHOST_USER_F_PROTOCOL_FEATURES; | ||
344 | + | ||
345 | + if (!vdev_blk->writable) { | ||
346 | + features |= 1ull << VIRTIO_BLK_F_RO; | ||
347 | + } | ||
348 | + | ||
349 | + return features; | ||
350 | +} | ||
351 | + | ||
352 | +static uint64_t vu_block_get_protocol_features(VuDev *dev) | ||
353 | +{ | ||
354 | + return 1ull << VHOST_USER_PROTOCOL_F_CONFIG | | ||
355 | + 1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD; | ||
356 | +} | ||
357 | + | ||
358 | +static int | ||
359 | +vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len) | ||
360 | +{ | ||
361 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
362 | + VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
363 | + memcpy(config, &vdev_blk->blkcfg, len); | ||
364 | + | ||
365 | + return 0; | ||
366 | +} | ||
367 | + | ||
368 | +static int | ||
369 | +vu_block_set_config(VuDev *vu_dev, const uint8_t *data, | ||
370 | + uint32_t offset, uint32_t size, uint32_t flags) | ||
371 | +{ | ||
372 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
373 | + VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
374 | + uint8_t wce; | ||
375 | + | ||
376 | + /* don't support live migration */ | ||
377 | + if (flags != VHOST_SET_CONFIG_TYPE_MASTER) { | ||
378 | + return -EINVAL; | ||
379 | + } | ||
380 | + | ||
381 | + if (offset != offsetof(struct virtio_blk_config, wce) || | ||
382 | + size != 1) { | ||
383 | + return -EINVAL; | ||
384 | + } | ||
385 | + | ||
386 | + wce = *data; | ||
387 | + vdev_blk->blkcfg.wce = wce; | ||
388 | + blk_set_enable_write_cache(vdev_blk->backend, wce); | ||
389 | + return 0; | ||
390 | +} | ||
391 | + | ||
392 | +/* | ||
393 | + * When the client disconnects, it sends a VHOST_USER_NONE request | ||
394 | + * and vu_process_message will simply call exit, which causes the VM | ||
395 | + * to exit abruptly. | ||
396 | + * To avoid this issue, process the VHOST_USER_NONE request ahead | ||
397 | + * of vu_process_message. | ||
398 | + * | ||
399 | + */ | ||
400 | +static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply) | ||
401 | +{ | ||
402 | + if (vmsg->request == VHOST_USER_NONE) { | ||
403 | + dev->panic(dev, "disconnect"); | ||
404 | + return true; | ||
405 | + } | ||
406 | + return false; | ||
407 | +} | ||
408 | + | ||
409 | +static const VuDevIface vu_block_iface = { | ||
410 | + .get_features = vu_block_get_features, | ||
411 | + .queue_set_started = vu_block_queue_set_started, | ||
412 | + .get_protocol_features = vu_block_get_protocol_features, | ||
413 | + .get_config = vu_block_get_config, | ||
414 | + .set_config = vu_block_set_config, | ||
415 | + .process_msg = vu_block_process_msg, | ||
416 | +}; | ||
417 | + | ||
418 | +static void blk_aio_attached(AioContext *ctx, void *opaque) | ||
419 | +{ | ||
420 | + VuBlockDev *vub_dev = opaque; | ||
421 | + aio_context_acquire(ctx); | ||
422 | + vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx); | ||
423 | + aio_context_release(ctx); | ||
424 | +} | ||
425 | + | ||
426 | +static void blk_aio_detach(void *opaque) | ||
427 | +{ | ||
428 | + VuBlockDev *vub_dev = opaque; | ||
429 | + AioContext *ctx = vub_dev->vu_server.ctx; | ||
430 | + aio_context_acquire(ctx); | ||
431 | + vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL); | ||
432 | + aio_context_release(ctx); | ||
433 | +} | ||
434 | + | ||
435 | +static void | ||
436 | +vu_block_initialize_config(BlockDriverState *bs, | ||
437 | + struct virtio_blk_config *config, uint32_t blk_size) | ||
438 | +{ | ||
439 | + config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS; | ||
440 | + config->blk_size = blk_size; | ||
441 | + config->size_max = 0; | ||
442 | + config->seg_max = 128 - 2; | ||
443 | + config->min_io_size = 1; | ||
444 | + config->opt_io_size = 1; | ||
445 | + config->num_queues = VHOST_USER_BLK_MAX_QUEUES; | ||
446 | + config->max_discard_sectors = 32768; | ||
447 | + config->max_discard_seg = 1; | ||
448 | + config->discard_sector_alignment = config->blk_size >> 9; | ||
449 | + config->max_write_zeroes_sectors = 32768; | ||
450 | + config->max_write_zeroes_seg = 1; | ||
451 | +} | ||
452 | + | ||
453 | +static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp) | ||
454 | +{ | ||
455 | + | ||
456 | + BlockBackend *blk; | ||
457 | + Error *local_error = NULL; | ||
458 | + const char *node_name = vu_block_device->node_name; | ||
459 | + bool writable = vu_block_device->writable; | ||
460 | + uint64_t perm = BLK_PERM_CONSISTENT_READ; | ||
461 | + int ret; | ||
462 | + | ||
463 | + AioContext *ctx; | ||
464 | + | ||
465 | + BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error); | ||
466 | + | ||
467 | + if (!bs) { | ||
468 | + error_propagate(errp, local_error); | ||
469 | + return NULL; | ||
470 | + } | ||
471 | + | ||
472 | + if (bdrv_is_read_only(bs)) { | ||
473 | + writable = false; | ||
474 | + } | ||
475 | + | ||
476 | + if (writable) { | ||
477 | + perm |= BLK_PERM_WRITE; | ||
478 | + } | ||
479 | + | ||
480 | + ctx = bdrv_get_aio_context(bs); | ||
481 | + aio_context_acquire(ctx); | ||
482 | + bdrv_invalidate_cache(bs, NULL); | ||
483 | + aio_context_release(ctx); | ||
484 | + | ||
485 | + /* | ||
486 | + * Don't allow resize while the vhost-user server is running; | ||
487 | + * apart from that, we don't care what happens to the node. | ||
488 | + */ | ||
489 | + blk = blk_new(bdrv_get_aio_context(bs), perm, | ||
490 | + BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED | | ||
491 | + BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD); | ||
492 | + ret = blk_insert_bs(blk, bs, errp); | ||
493 | + | ||
494 | + if (ret < 0) { | ||
495 | + goto fail; | ||
496 | + } | ||
497 | + | ||
498 | + blk_set_enable_write_cache(blk, false); | ||
499 | + | ||
500 | + blk_set_allow_aio_context_change(blk, true); | ||
501 | + | ||
502 | + vu_block_device->blkcfg.wce = 0; | ||
503 | + vu_block_device->backend = blk; | ||
504 | + if (!vu_block_device->blk_size) { | ||
505 | + vu_block_device->blk_size = BDRV_SECTOR_SIZE; | ||
506 | + } | ||
507 | + vu_block_device->blkcfg.blk_size = vu_block_device->blk_size; | ||
508 | + blk_set_guest_block_size(blk, vu_block_device->blk_size); | ||
509 | + vu_block_initialize_config(bs, &vu_block_device->blkcfg, | ||
510 | + vu_block_device->blk_size); | ||
511 | + return vu_block_device; | ||
512 | + | ||
513 | +fail: | ||
514 | + blk_unref(blk); | ||
515 | + return NULL; | ||
516 | +} | ||
517 | + | ||
518 | +static void vu_block_deinit(VuBlockDev *vu_block_device) | ||
519 | +{ | ||
520 | + if (vu_block_device->backend) { | ||
521 | + blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached, | ||
522 | + blk_aio_detach, vu_block_device); | ||
523 | + } | ||
524 | + | ||
525 | + blk_unref(vu_block_device->backend); | ||
526 | +} | ||
527 | + | ||
528 | +static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device) | ||
529 | +{ | ||
530 | + vhost_user_server_stop(&vu_block_device->vu_server); | ||
531 | + vu_block_deinit(vu_block_device); | ||
532 | +} | ||
533 | + | ||
534 | +static void vhost_user_blk_server_start(VuBlockDev *vu_block_device, | ||
535 | + Error **errp) | ||
536 | +{ | ||
537 | + AioContext *ctx; | ||
538 | + SocketAddress *addr = vu_block_device->addr; | ||
539 | + | ||
540 | + if (!vu_block_init(vu_block_device, errp)) { | ||
541 | + return; | ||
542 | + } | ||
543 | + | ||
544 | + ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend)); | ||
545 | + | ||
546 | + if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx, | ||
547 | + VHOST_USER_BLK_MAX_QUEUES, | ||
548 | + NULL, &vu_block_iface, | ||
549 | + errp)) { | ||
550 | + goto error; | ||
551 | + } | ||
552 | + | ||
553 | + blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached, | ||
554 | + blk_aio_detach, vu_block_device); | ||
555 | + vu_block_device->running = true; | ||
556 | + return; | ||
557 | + | ||
558 | + error: | ||
559 | + vu_block_deinit(vu_block_device); | ||
560 | +} | ||
561 | + | ||
562 | +static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp) | ||
563 | +{ | ||
564 | + if (vus->running) { | ||
565 | + error_setg(errp, "The property can't be modified " | ||
566 | + "while the server is running"); | ||
567 | + return false; | ||
568 | + } | ||
569 | + return true; | ||
570 | +} | ||
571 | + | ||
572 | +static void vu_set_node_name(Object *obj, const char *value, Error **errp) | ||
573 | +{ | ||
574 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
575 | + | ||
576 | + if (!vu_prop_modifiable(vus, errp)) { | ||
577 | + return; | ||
578 | + } | ||
579 | + | ||
580 | + if (vus->node_name) { | ||
581 | + g_free(vus->node_name); | ||
582 | + } | ||
583 | + | ||
584 | + vus->node_name = g_strdup(value); | ||
585 | +} | ||
586 | + | ||
587 | +static char *vu_get_node_name(Object *obj, Error **errp) | ||
588 | +{ | ||
589 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
590 | + return g_strdup(vus->node_name); | ||
591 | +} | ||
592 | + | ||
593 | +static void free_socket_addr(SocketAddress *addr) | ||
594 | +{ | ||
595 | + g_free(addr->u.q_unix.path); | ||
596 | + g_free(addr); | ||
597 | +} | ||
598 | + | ||
599 | +static void vu_set_unix_socket(Object *obj, const char *value, | ||
600 | + Error **errp) | ||
601 | +{ | ||
602 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
603 | + | ||
604 | + if (!vu_prop_modifiable(vus, errp)) { | ||
605 | + return; | ||
606 | + } | ||
607 | + | ||
608 | + if (vus->addr) { | ||
609 | + free_socket_addr(vus->addr); | ||
610 | + } | ||
611 | + | ||
612 | + SocketAddress *addr = g_new0(SocketAddress, 1); | ||
613 | + addr->type = SOCKET_ADDRESS_TYPE_UNIX; | ||
614 | + addr->u.q_unix.path = g_strdup(value); | ||
615 | + vus->addr = addr; | ||
616 | +} | ||
617 | + | ||
618 | +static char *vu_get_unix_socket(Object *obj, Error **errp) | ||
619 | +{ | ||
620 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
621 | + return g_strdup(vus->addr->u.q_unix.path); | ||
622 | +} | ||
623 | + | ||
624 | +static bool vu_get_block_writable(Object *obj, Error **errp) | ||
625 | +{ | ||
626 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
627 | + return vus->writable; | ||
628 | +} | ||
629 | + | ||
630 | +static void vu_set_block_writable(Object *obj, bool value, Error **errp) | ||
631 | +{ | ||
632 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
633 | + | ||
634 | + if (!vu_prop_modifiable(vus, errp)) { | ||
635 | + return; | ||
636 | + } | ||
637 | + | ||
638 | + vus->writable = value; | ||
639 | +} | ||
640 | + | ||
641 | +static void vu_get_blk_size(Object *obj, Visitor *v, const char *name, | ||
642 | + void *opaque, Error **errp) | ||
643 | +{ | ||
644 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
645 | + uint32_t value = vus->blk_size; | ||
646 | + | ||
647 | + visit_type_uint32(v, name, &value, errp); | ||
648 | +} | ||
649 | + | ||
650 | +static void vu_set_blk_size(Object *obj, Visitor *v, const char *name, | ||
651 | + void *opaque, Error **errp) | ||
652 | +{ | ||
653 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
654 | + | ||
655 | + Error *local_err = NULL; | ||
656 | + uint32_t value; | ||
657 | + | ||
658 | + if (!vu_prop_modifiable(vus, errp)) { | ||
659 | + return; | ||
660 | + } | ||
661 | + | ||
662 | + visit_type_uint32(v, name, &value, &local_err); | ||
663 | + if (local_err) { | ||
664 | + goto out; | ||
665 | + } | ||
666 | + | ||
667 | + check_block_size(object_get_typename(obj), name, value, &local_err); | ||
668 | + if (local_err) { | ||
669 | + goto out; | ||
670 | + } | ||
671 | + | ||
672 | + vus->blk_size = value; | ||
673 | + | ||
674 | +out: | ||
675 | + error_propagate(errp, local_err); | ||
676 | +} | ||
677 | + | ||
678 | +static void vhost_user_blk_server_instance_finalize(Object *obj) | ||
679 | +{ | ||
680 | + VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj); | ||
681 | + | ||
682 | + vhost_user_blk_server_stop(vub); | ||
683 | + | ||
684 | + /* | ||
685 | + * Unlike object_property_add_str, object_class_property_add_str | ||
686 | + * doesn't have a release method. Thus manual memory freeing is | ||
687 | + * needed. | ||
688 | + */ | ||
689 | + free_socket_addr(vub->addr); | ||
690 | + g_free(vub->node_name); | ||
691 | +} | ||
692 | + | ||
693 | +static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp) | ||
694 | +{ | ||
695 | + VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj); | ||
696 | + | ||
697 | + vhost_user_blk_server_start(vub, errp); | ||
698 | +} | ||
699 | + | ||
700 | +static void vhost_user_blk_server_class_init(ObjectClass *klass, | ||
701 | + void *class_data) | ||
702 | +{ | ||
703 | + UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass); | ||
704 | + ucc->complete = vhost_user_blk_server_complete; | ||
705 | + | ||
706 | + object_class_property_add_bool(klass, "writable", | ||
707 | + vu_get_block_writable, | ||
708 | + vu_set_block_writable); | ||
709 | + | ||
710 | + object_class_property_add_str(klass, "node-name", | ||
711 | + vu_get_node_name, | ||
712 | + vu_set_node_name); | ||
713 | + | ||
714 | + object_class_property_add_str(klass, "unix-socket", | ||
715 | + vu_get_unix_socket, | ||
716 | + vu_set_unix_socket); | ||
717 | + | ||
718 | + object_class_property_add(klass, "logical-block-size", "uint32", | ||
719 | + vu_get_blk_size, vu_set_blk_size, | ||
720 | + NULL, NULL); | ||
721 | +} | ||
722 | + | ||
723 | +static const TypeInfo vhost_user_blk_server_info = { | ||
724 | + .name = TYPE_VHOST_USER_BLK_SERVER, | ||
725 | + .parent = TYPE_OBJECT, | ||
726 | + .instance_size = sizeof(VuBlockDev), | ||
727 | + .instance_finalize = vhost_user_blk_server_instance_finalize, | ||
728 | + .class_init = vhost_user_blk_server_class_init, | ||
729 | + .interfaces = (InterfaceInfo[]) { | ||
730 | + {TYPE_USER_CREATABLE}, | ||
731 | + {} | ||
732 | + }, | ||
733 | +}; | ||
734 | + | ||
735 | +static void vhost_user_blk_server_register_types(void) | ||
736 | +{ | ||
737 | + type_register_static(&vhost_user_blk_server_info); | ||
738 | +} | ||
739 | + | ||
740 | +type_init(vhost_user_blk_server_register_types) | ||
741 | diff --git a/softmmu/vl.c b/softmmu/vl.c | ||
742 | index XXXXXXX..XXXXXXX 100644 | ||
743 | --- a/softmmu/vl.c | ||
744 | +++ b/softmmu/vl.c | ||
745 | @@ -XXX,XX +XXX,XX @@ static bool object_create_initial(const char *type, QemuOpts *opts) | ||
746 | } | ||
747 | #endif | ||
748 | |||
749 | + /* Reason: vhost-user-blk-server property "node-name" */ | ||
750 | + if (g_str_equal(type, "vhost-user-blk-server")) { | ||
751 | + return false; | ||
752 | + } | ||
753 | /* | ||
754 | * Reason: filter-* property "netdev" etc. | ||
755 | */ | ||
756 | diff --git a/block/meson.build b/block/meson.build | ||
757 | index XXXXXXX..XXXXXXX 100644 | ||
758 | --- a/block/meson.build | ||
759 | +++ b/block/meson.build | ||
760 | @@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c') | ||
761 | block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit]) | ||
762 | block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c')) | ||
763 | block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c')) | ||
764 | +block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c')) | ||
765 | block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c')) | ||
766 | block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c')) | ||
767 | block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c')) | ||
768 | -- | ||
769 | 2.26.2 | ||
New patch | |||
---|---|---|---|
1 | From: Coiby Xu <coiby.xu@gmail.com> | ||
1 | 2 | ||
3 | Suggested-by: Stefano Garzarella <sgarzare@redhat.com> | ||
4 | Signed-off-by: Coiby Xu <coiby.xu@gmail.com> | ||
5 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
6 | Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> | ||
7 | Message-id: 20200918080912.321299-8-coiby.xu@gmail.com | ||
8 | [Removed reference to vhost-user-blk-test.c, it will be sent in a | ||
9 | separate pull request. | ||
10 | --Stefan] | ||
11 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
12 | --- | ||
13 | MAINTAINERS | 7 +++++++ | ||
14 | 1 file changed, 7 insertions(+) | ||
15 | |||
16 | diff --git a/MAINTAINERS b/MAINTAINERS | ||
17 | index XXXXXXX..XXXXXXX 100644 | ||
18 | --- a/MAINTAINERS | ||
19 | +++ b/MAINTAINERS | ||
20 | @@ -XXX,XX +XXX,XX @@ L: qemu-block@nongnu.org | ||
21 | S: Supported | ||
22 | F: tests/image-fuzzer/ | ||
23 | |||
24 | +Vhost-user block device backend server | ||
25 | +M: Coiby Xu <Coiby.Xu@gmail.com> | ||
26 | +S: Maintained | ||
27 | +F: block/export/vhost-user-blk-server.c | ||
28 | +F: util/vhost-user-server.c | ||
29 | +F: tests/qtest/libqos/vhost-user-blk.c | ||
30 | + | ||
31 | Replication | ||
32 | M: Wen Congyang <wencongyang2@huawei.com> | ||
33 | M: Xie Changlong <xiechanglong.d@gmail.com> | ||
34 | -- | ||
35 | 2.26.2 | ||
New patch | |||
---|---|---|---|
1 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
2 | Message-id: 20200924151549.913737-3-stefanha@redhat.com | ||
3 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
4 | --- | ||
5 | util/vhost-user-server.c | 2 +- | ||
6 | 1 file changed, 1 insertion(+), 1 deletion(-) | ||
1 | 7 | ||
8 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
9 | index XXXXXXX..XXXXXXX 100644 | ||
10 | --- a/util/vhost-user-server.c | ||
11 | +++ b/util/vhost-user-server.c | ||
12 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
13 | return false; | ||
14 | } | ||
15 | |||
16 | - /* zero out unspecified fileds */ | ||
17 | + /* zero out unspecified fields */ | ||
18 | *server = (VuServer) { | ||
19 | .listener = listener, | ||
20 | .vu_iface = vu_iface, | ||
21 | -- | ||
22 | 2.26.2 | ||
1 | From: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 1 | We already have access to the value with the correct type (ioc and sioc |
---|---|---|---|
2 | are the same QIOChannel). | ||
2 | 3 | ||
3 | All the offsets in the BAT must be lower than the file size. | 4 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
4 | Fix the check condition for correct check. | 5 | Message-id: 20200924151549.913737-4-stefanha@redhat.com |
5 | 6 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | |
6 | Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | ||
7 | Reviewed-by: Denis V. Lunev <den@openvz.org> | ||
8 | Message-Id: <20230424093147.197643-13-alexander.ivanov@virtuozzo.com> | ||
9 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | ||
10 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
11 | --- | 7 | --- |
12 | block/parallels.c | 2 +- | 8 | util/vhost-user-server.c | 2 +- |
13 | 1 file changed, 1 insertion(+), 1 deletion(-) | 9 | 1 file changed, 1 insertion(+), 1 deletion(-) |
14 | 10 | ||
15 | diff --git a/block/parallels.c b/block/parallels.c | 11 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c |
16 | index XXXXXXX..XXXXXXX 100644 | 12 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/block/parallels.c | 13 | --- a/util/vhost-user-server.c |
18 | +++ b/block/parallels.c | 14 | +++ b/util/vhost-user-server.c |
19 | @@ -XXX,XX +XXX,XX @@ parallels_check_outside_image(BlockDriverState *bs, BdrvCheckResult *res, | 15 | @@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc, |
20 | high_off = 0; | 16 | server->ioc = QIO_CHANNEL(sioc); |
21 | for (i = 0; i < s->bat_size; i++) { | 17 | object_ref(OBJECT(server->ioc)); |
22 | off = bat2sect(s, i) << BDRV_SECTOR_BITS; | 18 | qio_channel_attach_aio_context(server->ioc, server->ctx); |
23 | - if (off > size) { | 19 | - qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL); |
24 | + if (off + s->cluster_size > size) { | 20 | + qio_channel_set_blocking(server->ioc, false, NULL); |
25 | fprintf(stderr, "%s cluster %u is outside image\n", | 21 | vu_client_start(server); |
26 | fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i); | 22 | } |
27 | res->corruptions++; | 23 | |
28 | -- | 24 | -- |
29 | 2.40.1 | 25 | 2.26.2 |
26 | diff view generated by jsdifflib |
1 | bdrv_pad_request() was the main user of qemu_iovec_init_extended(). | 1 | Explicitly deleting watches is not necessary since libvhost-user calls |
---|---|---|---|
2 | HEAD^ has removed that use, so we can remove qemu_iovec_init_extended() | 2 | remove_watch() during vu_deinit(). Add an assertion to check this |
3 | now. | 3 | though. |
4 | 4 | ||
5 | The only remaining user is qemu_iovec_init_slice(), which can easily | 5 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
6 | inline the small part it really needs. | 6 | Message-id: 20200924151549.913737-5-stefanha@redhat.com |
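A minimal usage sketch of the surviving helper (illustrative only; src, offset and len are placeholders, not taken from the patch):

    QEMUIOVector slice;

    /* view of bytes [offset, offset + len) of src; buffers are shared, not copied */
    qemu_iovec_init_slice(&slice, &src, offset, len);
    /* ... use slice ... */
    qemu_iovec_destroy(&slice);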
7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
8 | --- | ||
9 | util/vhost-user-server.c | 19 ++++--------------- | ||
10 | 1 file changed, 4 insertions(+), 15 deletions(-) | ||
7 | 11 | ||
8 | Note that qemu_iovec_init_extended() offered a memcpy() optimization to | 12 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c |
9 | initialize the new I/O vector. qemu_iovec_concat_iov(), which is used | ||
10 | to replace its functionality, does not, but calls qemu_iovec_add() for | ||
11 | every single element. If we decide this optimization was important, we | ||
12 | will need to re-implement it in qemu_iovec_concat_iov(), which might | ||
13 | also benefit its pre-existing users. | ||
14 | |||
15 | Reviewed-by: Eric Blake <eblake@redhat.com> | ||
16 | Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> | ||
17 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
18 | Message-Id: <20230411173418.19549-4-hreitz@redhat.com> | ||
19 | --- | ||
20 | include/qemu/iov.h | 5 --- | ||
21 | util/iov.c | 79 +++++++--------------------------------------- | ||
22 | 2 files changed, 11 insertions(+), 73 deletions(-) | ||
23 | |||
24 | diff --git a/include/qemu/iov.h b/include/qemu/iov.h | ||
25 | index XXXXXXX..XXXXXXX 100644 | 13 | index XXXXXXX..XXXXXXX 100644 |
26 | --- a/include/qemu/iov.h | 14 | --- a/util/vhost-user-server.c |
27 | +++ b/include/qemu/iov.h | 15 | +++ b/util/vhost-user-server.c |
28 | @@ -XXX,XX +XXX,XX @@ static inline void *qemu_iovec_buf(QEMUIOVector *qiov) | 16 | @@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server) |
29 | 17 | /* When this is set vu_client_trip will stop new processing vhost-user message */ | |
30 | void qemu_iovec_init(QEMUIOVector *qiov, int alloc_hint); | 18 | server->sioc = NULL; |
31 | void qemu_iovec_init_external(QEMUIOVector *qiov, struct iovec *iov, int niov); | 19 | |
32 | -int qemu_iovec_init_extended( | 20 | - VuFdWatch *vu_fd_watch, *next; |
33 | - QEMUIOVector *qiov, | 21 | - QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { |
34 | - void *head_buf, size_t head_len, | 22 | - aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL, |
35 | - QEMUIOVector *mid_qiov, size_t mid_offset, size_t mid_len, | 23 | - NULL, NULL, NULL); |
36 | - void *tail_buf, size_t tail_len); | ||
37 | void qemu_iovec_init_slice(QEMUIOVector *qiov, QEMUIOVector *source, | ||
38 | size_t offset, size_t len); | ||
39 | struct iovec *qemu_iovec_slice(QEMUIOVector *qiov, | ||
40 | diff --git a/util/iov.c b/util/iov.c | ||
41 | index XXXXXXX..XXXXXXX 100644 | ||
42 | --- a/util/iov.c | ||
43 | +++ b/util/iov.c | ||
44 | @@ -XXX,XX +XXX,XX @@ int qemu_iovec_subvec_niov(QEMUIOVector *qiov, size_t offset, size_t len) | ||
45 | return niov; | ||
46 | } | ||
47 | |||
48 | -/* | ||
49 | - * Compile new iovec, combining @head_buf buffer, sub-qiov of @mid_qiov, | ||
50 | - * and @tail_buf buffer into new qiov. | ||
51 | - */ | ||
52 | -int qemu_iovec_init_extended( | ||
53 | - QEMUIOVector *qiov, | ||
54 | - void *head_buf, size_t head_len, | ||
55 | - QEMUIOVector *mid_qiov, size_t mid_offset, size_t mid_len, | ||
56 | - void *tail_buf, size_t tail_len) | ||
57 | -{ | ||
58 | - size_t mid_head, mid_tail; | ||
59 | - int total_niov, mid_niov = 0; | ||
60 | - struct iovec *p, *mid_iov = NULL; | ||
61 | - | ||
62 | - assert(mid_qiov->niov <= IOV_MAX); | ||
63 | - | ||
64 | - if (SIZE_MAX - head_len < mid_len || | ||
65 | - SIZE_MAX - head_len - mid_len < tail_len) | ||
66 | - { | ||
67 | - return -EINVAL; | ||
68 | - } | 24 | - } |
69 | - | 25 | - |
70 | - if (mid_len) { | 26 | - while (!QTAILQ_EMPTY(&server->vu_fd_watches)) { |
71 | - mid_iov = qemu_iovec_slice(mid_qiov, mid_offset, mid_len, | 27 | - QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { |
72 | - &mid_head, &mid_tail, &mid_niov); | 28 | - if (!vu_fd_watch->processing) { |
29 | - QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next); | ||
30 | - g_free(vu_fd_watch); | ||
31 | - } | ||
32 | - } | ||
73 | - } | 33 | - } |
74 | - | 34 | - |
75 | - total_niov = !!head_len + mid_niov + !!tail_len; | 35 | while (server->processing_msg) { |
76 | - if (total_niov > IOV_MAX) { | 36 | if (server->ioc->read_coroutine) { |
77 | - return -EINVAL; | 37 | server->ioc->read_coroutine = NULL; |
78 | - } | 38 | @@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server) |
79 | - | 39 | } |
80 | - if (total_niov == 1) { | 40 | |
81 | - qemu_iovec_init_buf(qiov, NULL, 0); | 41 | vu_deinit(&server->vu_dev); |
82 | - p = &qiov->local_iov; | 42 | + |
83 | - } else { | 43 | + /* vu_deinit() should have called remove_watch() */ |
84 | - qiov->niov = qiov->nalloc = total_niov; | 44 | + assert(QTAILQ_EMPTY(&server->vu_fd_watches)); |
85 | - qiov->size = head_len + mid_len + tail_len; | 45 | + |
86 | - p = qiov->iov = g_new(struct iovec, qiov->niov); | 46 | object_unref(OBJECT(sioc)); |
87 | - } | 47 | object_unref(OBJECT(server->ioc)); |
88 | - | ||
89 | - if (head_len) { | ||
90 | - p->iov_base = head_buf; | ||
91 | - p->iov_len = head_len; | ||
92 | - p++; | ||
93 | - } | ||
94 | - | ||
95 | - assert(!mid_niov == !mid_len); | ||
96 | - if (mid_niov) { | ||
97 | - memcpy(p, mid_iov, mid_niov * sizeof(*p)); | ||
98 | - p[0].iov_base = (uint8_t *)p[0].iov_base + mid_head; | ||
99 | - p[0].iov_len -= mid_head; | ||
100 | - p[mid_niov - 1].iov_len -= mid_tail; | ||
101 | - p += mid_niov; | ||
102 | - } | ||
103 | - | ||
104 | - if (tail_len) { | ||
105 | - p->iov_base = tail_buf; | ||
106 | - p->iov_len = tail_len; | ||
107 | - } | ||
108 | - | ||
109 | - return 0; | ||
110 | -} | ||
111 | - | ||
112 | /* | ||
113 | * Check if the contents of subrange of qiov data is all zeroes. | ||
114 | */ | ||
115 | @@ -XXX,XX +XXX,XX @@ bool qemu_iovec_is_zero(QEMUIOVector *qiov, size_t offset, size_t bytes) | ||
116 | void qemu_iovec_init_slice(QEMUIOVector *qiov, QEMUIOVector *source, | ||
117 | size_t offset, size_t len) | ||
118 | { | ||
119 | - int ret; | ||
120 | + struct iovec *slice_iov; | ||
121 | + int slice_niov; | ||
122 | + size_t slice_head, slice_tail; | ||
123 | |||
124 | assert(source->size >= len); | ||
125 | assert(source->size - len >= offset); | ||
126 | |||
127 | - /* We shrink the request, so we can't overflow neither size_t nor MAX_IOV */ | ||
128 | - ret = qemu_iovec_init_extended(qiov, NULL, 0, source, offset, len, NULL, 0); | ||
129 | - assert(ret == 0); | ||
130 | + slice_iov = qemu_iovec_slice(source, offset, len, | ||
131 | + &slice_head, &slice_tail, &slice_niov); | ||
132 | + if (slice_niov == 1) { | ||
133 | + qemu_iovec_init_buf(qiov, slice_iov[0].iov_base + slice_head, len); | ||
134 | + } else { | ||
135 | + qemu_iovec_init(qiov, slice_niov); | ||
136 | + qemu_iovec_concat_iov(qiov, slice_iov, slice_niov, slice_head, len); | ||
137 | + } | ||
138 | } | 48 | } |
139 | |||
140 | void qemu_iovec_destroy(QEMUIOVector *qiov) | ||
141 | -- | 49 | -- |
142 | 2.40.1 | 50 | 2.26.2 |
1 | From: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 1 | Only one struct is needed per request. Drop req_data and the separate |
---|---|---|---|
2 | VuBlockReq instance. Instead let vu_queue_pop() allocate everything at | ||
3 | once. | ||
2 | 4 | ||
3 | BAT is written in the context of conventional operations over the image | 5 | This fixes the req_data memory leak in vu_block_virtio_process_req(). |
4 | inside bdrv_co_flush() when it calls parallels_co_flush_to_os() callback. | ||
5 | Thus we should not modify BAT array directly, but call | ||
6 | parallels_set_bat_entry() helper and bdrv_co_flush() further on. After | ||
7 | that there is no need to manually write BAT and track its modification. | ||
8 | 6 | ||
9 | This makes code more generic and allows to split parallels_set_bat_entry() | 7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
10 | for independent pieces. | 8 | Message-id: 20200924151549.913737-6-stefanha@redhat.com |
9 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
10 | --- | ||
11 | block/export/vhost-user-blk-server.c | 68 +++++++++------------------- | ||
12 | 1 file changed, 21 insertions(+), 47 deletions(-) | ||
11 | 13 | ||
12 | Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 14 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c |
13 | Reviewed-by: Denis V. Lunev <den@openvz.org> | ||
14 | Message-Id: <20230424093147.197643-6-alexander.ivanov@virtuozzo.com> | ||
15 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | ||
16 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
17 | --- | ||
18 | block/parallels.c | 23 ++++++++++------------- | ||
19 | 1 file changed, 10 insertions(+), 13 deletions(-) | ||
20 | |||
21 | diff --git a/block/parallels.c b/block/parallels.c | ||
22 | index XXXXXXX..XXXXXXX 100644 | 15 | index XXXXXXX..XXXXXXX 100644 |
23 | --- a/block/parallels.c | 16 | --- a/block/export/vhost-user-blk-server.c |
24 | +++ b/block/parallels.c | 17 | +++ b/block/export/vhost-user-blk-server.c |
25 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 18 | @@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr { |
26 | { | 19 | }; |
27 | BDRVParallelsState *s = bs->opaque; | 20 | |
28 | int64_t size, prev_off, high_off; | 21 | typedef struct VuBlockReq { |
29 | - int ret; | 22 | - VuVirtqElement *elem; |
30 | + int ret = 0; | 23 | + VuVirtqElement elem; |
31 | uint32_t i; | 24 | int64_t sector_num; |
32 | - bool flush_bat = false; | 25 | size_t size; |
33 | 26 | struct virtio_blk_inhdr *in; | |
34 | size = bdrv_getlength(bs->file->bs); | 27 | @@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req) |
35 | if (size < 0) { | 28 | VuDev *vu_dev = &req->server->vu_dev; |
36 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 29 | |
37 | fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i); | 30 | /* IO size with 1 extra status byte */ |
38 | res->corruptions++; | 31 | - vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1); |
39 | if (fix & BDRV_FIX_ERRORS) { | 32 | + vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1); |
40 | - s->bat_bitmap[i] = 0; | 33 | vu_queue_notify(vu_dev, req->vq); |
41 | + parallels_set_bat_entry(s, i, 0); | 34 | |
42 | res->corruptions_fixed++; | 35 | - if (req->elem) { |
43 | - flush_bat = true; | 36 | - free(req->elem); |
44 | } | ||
45 | prev_off = 0; | ||
46 | continue; | ||
47 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | ||
48 | prev_off = off; | ||
49 | } | ||
50 | |||
51 | - ret = 0; | ||
52 | - if (flush_bat) { | ||
53 | - ret = bdrv_co_pwrite_sync(bs->file, 0, s->header_size, s->header, 0); | ||
54 | - if (ret < 0) { | ||
55 | - res->check_errors++; | ||
56 | - goto out; | ||
57 | - } | ||
58 | - } | 37 | - } |
59 | - | 38 | - |
60 | if (high_off == 0) { | 39 | - g_free(req); |
61 | res->image_end_offset = s->data_end << BDRV_SECTOR_BITS; | 40 | + free(req); |
62 | } else { | 41 | } |
63 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 42 | |
64 | 43 | static VuBlockDev *get_vu_block_device_by_server(VuServer *server) | |
65 | out: | 44 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_flush(VuBlockReq *req) |
66 | qemu_co_mutex_unlock(&s->lock); | 45 | blk_co_flush(backend); |
46 | } | ||
47 | |||
48 | -struct req_data { | ||
49 | - VuServer *server; | ||
50 | - VuVirtq *vq; | ||
51 | - VuVirtqElement *elem; | ||
52 | -}; | ||
53 | - | ||
54 | static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
55 | { | ||
56 | - struct req_data *data = opaque; | ||
57 | - VuServer *server = data->server; | ||
58 | - VuVirtq *vq = data->vq; | ||
59 | - VuVirtqElement *elem = data->elem; | ||
60 | + VuBlockReq *req = opaque; | ||
61 | + VuServer *server = req->server; | ||
62 | + VuVirtqElement *elem = &req->elem; | ||
63 | uint32_t type; | ||
64 | - VuBlockReq *req; | ||
65 | |||
66 | VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
67 | BlockBackend *backend = vdev_blk->backend; | ||
68 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
69 | struct iovec *out_iov = elem->out_sg; | ||
70 | unsigned in_num = elem->in_num; | ||
71 | unsigned out_num = elem->out_num; | ||
67 | + | 72 | + |
68 | + if (ret == 0) { | 73 | /* refer to hw/block/virtio_blk.c */ |
69 | + ret = bdrv_co_flush(bs); | 74 | if (elem->out_num < 1 || elem->in_num < 1) { |
70 | + if (ret < 0) { | 75 | error_report("virtio-blk request missing headers"); |
71 | + res->check_errors++; | 76 | - free(elem); |
72 | + } | 77 | - return; |
73 | + } | 78 | + goto err; |
79 | } | ||
80 | |||
81 | - req = g_new0(VuBlockReq, 1); | ||
82 | - req->server = server; | ||
83 | - req->vq = vq; | ||
84 | - req->elem = elem; | ||
85 | - | ||
86 | if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out, | ||
87 | sizeof(req->out)) != sizeof(req->out))) { | ||
88 | error_report("virtio-blk request outhdr too short"); | ||
89 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
90 | |||
91 | err: | ||
92 | free(elem); | ||
93 | - g_free(req); | ||
94 | - return; | ||
95 | } | ||
96 | |||
97 | static void vu_block_process_vq(VuDev *vu_dev, int idx) | ||
98 | { | ||
99 | - VuServer *server; | ||
100 | - VuVirtq *vq; | ||
101 | - struct req_data *req_data; | ||
102 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
103 | + VuVirtq *vq = vu_get_queue(vu_dev, idx); | ||
104 | |||
105 | - server = container_of(vu_dev, VuServer, vu_dev); | ||
106 | - assert(server); | ||
107 | - | ||
108 | - vq = vu_get_queue(vu_dev, idx); | ||
109 | - assert(vq); | ||
110 | - VuVirtqElement *elem; | ||
111 | while (1) { | ||
112 | - elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) + | ||
113 | - sizeof(VuBlockReq)); | ||
114 | - if (elem) { | ||
115 | - req_data = g_new0(struct req_data, 1); | ||
116 | - req_data->server = server; | ||
117 | - req_data->vq = vq; | ||
118 | - req_data->elem = elem; | ||
119 | - Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req, | ||
120 | - req_data); | ||
121 | - aio_co_enter(server->ioc->ctx, co); | ||
122 | - } else { | ||
123 | + VuBlockReq *req; | ||
74 | + | 124 | + |
75 | return ret; | 125 | + req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq)); |
126 | + if (!req) { | ||
127 | break; | ||
128 | } | ||
129 | + | ||
130 | + req->server = server; | ||
131 | + req->vq = vq; | ||
132 | + | ||
133 | + Coroutine *co = | ||
134 | + qemu_coroutine_create(vu_block_virtio_process_req, req); | ||
135 | + qemu_coroutine_enter(co); | ||
136 | } | ||
76 | } | 137 | } |
77 | 138 | ||
78 | -- | 139 | -- |
79 | 2.40.1 | 140 | 2.26.2 |
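For reference, a minimal sketch (not part of this series) of the request ownership pattern the vhost-user-blk change above relies on: the VuVirtqElement is embedded as the first member of the request struct, so the buffer returned by vu_queue_pop() is the request itself and a single free() releases it. The struct and function names below are illustrative only.

    #include "qemu/osdep.h"
    #include "contrib/libvhost-user/libvhost-user.h"

    /* Illustrative request type: the element must stay the first member so the
     * pointer returned by vu_queue_pop() can be used as the request itself. */
    typedef struct {
        VuVirtqElement elem;     /* filled in by vu_queue_pop() */
        /* per-request fields (sector number, status pointer, ...) go here */
    } ExampleReq;

    static void example_process_one(VuDev *vu_dev, VuVirtq *vq)
    {
        ExampleReq *req = vu_queue_pop(vu_dev, vq, sizeof(ExampleReq));
        if (!req) {
            return;                          /* queue is empty */
        }
        /* ... handle the request ... */
        vu_queue_push(vu_dev, vq, &req->elem, 0 /* bytes written by device */);
        vu_queue_notify(vu_dev, vq);
        free(req);                           /* one allocation, one free */
    }

Compared with the previous code there is no separate req_data allocation, so there is no g_free()/free() pair that can get out of sync.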
New patch | |||
---|---|---|---|
1 | The device panic notifier callback is not used. Drop it. | ||
1 | 2 | ||
3 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
4 | Message-id: 20200924151549.913737-7-stefanha@redhat.com | ||
5 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
6 | --- | ||
7 | util/vhost-user-server.h | 3 --- | ||
8 | block/export/vhost-user-blk-server.c | 3 +-- | ||
9 | util/vhost-user-server.c | 6 ------ | ||
10 | 3 files changed, 1 insertion(+), 11 deletions(-) | ||
11 | |||
12 | diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h | ||
13 | index XXXXXXX..XXXXXXX 100644 | ||
14 | --- a/util/vhost-user-server.h | ||
15 | +++ b/util/vhost-user-server.h | ||
16 | @@ -XXX,XX +XXX,XX @@ typedef struct VuFdWatch { | ||
17 | } VuFdWatch; | ||
18 | |||
19 | typedef struct VuServer VuServer; | ||
20 | -typedef void DevicePanicNotifierFn(VuServer *server); | ||
21 | |||
22 | struct VuServer { | ||
23 | QIONetListener *listener; | ||
24 | AioContext *ctx; | ||
25 | - DevicePanicNotifierFn *device_panic_notifier; | ||
26 | int max_queues; | ||
27 | const VuDevIface *vu_iface; | ||
28 | VuDev vu_dev; | ||
29 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
30 | SocketAddress *unix_socket, | ||
31 | AioContext *ctx, | ||
32 | uint16_t max_queues, | ||
33 | - DevicePanicNotifierFn *device_panic_notifier, | ||
34 | const VuDevIface *vu_iface, | ||
35 | Error **errp); | ||
36 | |||
37 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
38 | index XXXXXXX..XXXXXXX 100644 | ||
39 | --- a/block/export/vhost-user-blk-server.c | ||
40 | +++ b/block/export/vhost-user-blk-server.c | ||
41 | @@ -XXX,XX +XXX,XX @@ static void vhost_user_blk_server_start(VuBlockDev *vu_block_device, | ||
42 | ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend)); | ||
43 | |||
44 | if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx, | ||
45 | - VHOST_USER_BLK_MAX_QUEUES, | ||
46 | - NULL, &vu_block_iface, | ||
47 | + VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface, | ||
48 | errp)) { | ||
49 | goto error; | ||
50 | } | ||
51 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
52 | index XXXXXXX..XXXXXXX 100644 | ||
53 | --- a/util/vhost-user-server.c | ||
54 | +++ b/util/vhost-user-server.c | ||
55 | @@ -XXX,XX +XXX,XX @@ static void panic_cb(VuDev *vu_dev, const char *buf) | ||
56 | close_client(server); | ||
57 | } | ||
58 | |||
59 | - if (server->device_panic_notifier) { | ||
60 | - server->device_panic_notifier(server); | ||
61 | - } | ||
62 | - | ||
63 | /* | ||
64 | * Set the callback function for network listener so another | ||
65 | * vhost-user client can connect to this server | ||
66 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
67 | SocketAddress *socket_addr, | ||
68 | AioContext *ctx, | ||
69 | uint16_t max_queues, | ||
70 | - DevicePanicNotifierFn *device_panic_notifier, | ||
71 | const VuDevIface *vu_iface, | ||
72 | Error **errp) | ||
73 | { | ||
74 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
75 | .vu_iface = vu_iface, | ||
76 | .max_queues = max_queues, | ||
77 | .ctx = ctx, | ||
78 | - .device_panic_notifier = device_panic_notifier, | ||
79 | }; | ||
80 | |||
81 | qio_net_listener_set_name(server->listener, "vhost-user-backend-listener"); | ||
82 | -- | ||
83 | 2.26.2 | ||
New patch | |||
---|---|---|---|
1 | fds[] is leaked when qio_channel_readv_full() fails. | ||
1 | 2 | ||
3 | Use vmsg->fds[] instead of keeping a local fds[] array. Then we can | ||
4 | reuse goto fail to clean up fds. vmsg->fd_num must be zeroed before the | ||
5 | loop to make this safe. | ||
6 | |||
7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
8 | Message-id: 20200924151549.913737-8-stefanha@redhat.com | ||
9 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
10 | --- | ||
11 | util/vhost-user-server.c | 50 ++++++++++++++++++---------------------- | ||
12 | 1 file changed, 23 insertions(+), 27 deletions(-) | ||
13 | |||
14 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
15 | index XXXXXXX..XXXXXXX 100644 | ||
16 | --- a/util/vhost-user-server.c | ||
17 | +++ b/util/vhost-user-server.c | ||
18 | @@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg) | ||
19 | }; | ||
20 | int rc, read_bytes = 0; | ||
21 | Error *local_err = NULL; | ||
22 | - /* | ||
23 | - * Store fds/nfds returned from qio_channel_readv_full into | ||
24 | - * temporary variables. | ||
25 | - * | ||
26 | - * VhostUserMsg is a packed structure, gcc will complain about passing | ||
27 | - * pointer to a packed structure member if we pass &VhostUserMsg.fd_num | ||
28 | - * and &VhostUserMsg.fds directly when calling qio_channel_readv_full, | ||
29 | - * thus two temporary variables nfds and fds are used here. | ||
30 | - */ | ||
31 | - size_t nfds = 0, nfds_t = 0; | ||
32 | const size_t max_fds = G_N_ELEMENTS(vmsg->fds); | ||
33 | - int *fds_t = NULL; | ||
34 | VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
35 | QIOChannel *ioc = server->ioc; | ||
36 | |||
37 | + vmsg->fd_num = 0; | ||
38 | if (!ioc) { | ||
39 | error_report_err(local_err); | ||
40 | goto fail; | ||
41 | @@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg) | ||
42 | |||
43 | assert(qemu_in_coroutine()); | ||
44 | do { | ||
45 | + size_t nfds = 0; | ||
46 | + int *fds = NULL; | ||
47 | + | ||
48 | /* | ||
49 | * qio_channel_readv_full may have short reads, keeping calling it | ||
50 | * until getting VHOST_USER_HDR_SIZE or 0 bytes in total | ||
51 | */ | ||
52 | - rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err); | ||
53 | + rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, &local_err); | ||
54 | if (rc < 0) { | ||
55 | if (rc == QIO_CHANNEL_ERR_BLOCK) { | ||
56 | + assert(local_err == NULL); | ||
57 | qio_channel_yield(ioc, G_IO_IN); | ||
58 | continue; | ||
59 | } else { | ||
60 | error_report_err(local_err); | ||
61 | - return false; | ||
62 | + goto fail; | ||
63 | } | ||
64 | } | ||
65 | - read_bytes += rc; | ||
66 | - if (nfds_t > 0) { | ||
67 | - if (nfds + nfds_t > max_fds) { | ||
68 | + | ||
69 | + if (nfds > 0) { | ||
70 | + if (vmsg->fd_num + nfds > max_fds) { | ||
71 | error_report("A maximum of %zu fds are allowed, " | ||
72 | "however got %zu fds now", | ||
73 | - max_fds, nfds + nfds_t); | ||
74 | + max_fds, vmsg->fd_num + nfds); | ||
75 | + g_free(fds); | ||
76 | goto fail; | ||
77 | } | ||
78 | - memcpy(vmsg->fds + nfds, fds_t, | ||
79 | - nfds_t *sizeof(vmsg->fds[0])); | ||
80 | - nfds += nfds_t; | ||
81 | - g_free(fds_t); | ||
82 | + memcpy(vmsg->fds + vmsg->fd_num, fds, nfds * sizeof(vmsg->fds[0])); | ||
83 | + vmsg->fd_num += nfds; | ||
84 | + g_free(fds); | ||
85 | } | ||
86 | - if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) { | ||
87 | - break; | ||
88 | + | ||
89 | + if (rc == 0) { /* socket closed */ | ||
90 | + goto fail; | ||
91 | } | ||
92 | - iov.iov_base = (char *)vmsg + read_bytes; | ||
93 | - iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes; | ||
94 | - } while (true); | ||
95 | |||
96 | - vmsg->fd_num = nfds; | ||
97 | + iov.iov_base += rc; | ||
98 | + iov.iov_len -= rc; | ||
99 | + read_bytes += rc; | ||
100 | + } while (read_bytes != VHOST_USER_HDR_SIZE); | ||
101 | + | ||
102 | /* qio_channel_readv_full will make socket fds blocking, unblock them */ | ||
103 | vmsg_unblock_fds(vmsg); | ||
104 | if (vmsg->size > sizeof(vmsg->payload)) { | ||
105 | -- | ||
106 | 2.26.2 | ||
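For context, a minimal sketch (not part of this series) of the short-read handling the rewritten loop is built around: one iovec tracks the unread tail of the header, the coroutine yields while the channel would block, and the loop only ends once the whole header has arrived. File-descriptor passing and error reporting are omitted; the read_header() helper name is hypothetical.

    #include "qemu/osdep.h"
    #include "qapi/error.h"
    #include "io/channel.h"
    #include "contrib/libvhost-user/libvhost-user.h"

    /* Must run in coroutine context because of qio_channel_yield(). */
    static bool read_header(QIOChannel *ioc, VhostUserMsg *vmsg)
    {
        struct iovec iov = {
            .iov_base = (char *)vmsg,
            .iov_len = VHOST_USER_HDR_SIZE,
        };
        size_t read_bytes = 0;

        do {
            Error *local_err = NULL;
            ssize_t rc = qio_channel_readv(ioc, &iov, 1, &local_err);

            if (rc == QIO_CHANNEL_ERR_BLOCK) {
                qio_channel_yield(ioc, G_IO_IN);     /* wait until readable */
                continue;
            }
            if (rc <= 0) {                           /* error or EOF */
                error_free(local_err);
                return false;
            }
            iov.iov_base = (char *)iov.iov_base + rc;    /* skip what arrived */
            iov.iov_len -= rc;
            read_bytes += rc;
        } while (read_bytes != VHOST_USER_HDR_SIZE);

        return true;
    }

The real vu_message_read() additionally collects any file descriptors returned by qio_channel_readv_full() into vmsg->fds, as shown in the diff above.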
1 | From: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 1 | Unexpected EOF is an error that must be reported. |
---|---|---|---|
2 | 2 | ||
3 | Set data_end to the end of the last cluster inside the image. In such a | 3 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
4 | way we can be sure that corrupted offsets in the BAT can't affect the | 4 | Message-id: 20200924151549.913737-9-stefanha@redhat.com |
5 | image size. If there are no allocated clusters, set image_end_offset to | 5 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
6 | data_end. | 6 | --- |
7 | util/vhost-user-server.c | 6 ++++-- | ||
8 | 1 file changed, 4 insertions(+), 2 deletions(-) | ||
7 | 9 | ||
8 | Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 10 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c |
9 | Reviewed-by: Denis V. Lunev <den@openvz.org> | ||
10 | Message-Id: <20230424093147.197643-4-alexander.ivanov@virtuozzo.com> | ||
11 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | ||
12 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
13 | --- | ||
14 | block/parallels.c | 8 +++++++- | ||
15 | 1 file changed, 7 insertions(+), 1 deletion(-) | ||
16 | |||
17 | diff --git a/block/parallels.c b/block/parallels.c | ||
18 | index XXXXXXX..XXXXXXX 100644 | 11 | index XXXXXXX..XXXXXXX 100644 |
19 | --- a/block/parallels.c | 12 | --- a/util/vhost-user-server.c |
20 | +++ b/block/parallels.c | 13 | +++ b/util/vhost-user-server.c |
21 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 14 | @@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg) |
15 | }; | ||
16 | if (vmsg->size) { | ||
17 | rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err); | ||
18 | - if (rc == -1) { | ||
19 | - error_report_err(local_err); | ||
20 | + if (rc != 1) { | ||
21 | + if (local_err) { | ||
22 | + error_report_err(local_err); | ||
23 | + } | ||
24 | goto fail; | ||
22 | } | 25 | } |
23 | } | 26 | } |
24 | |||
25 | - res->image_end_offset = high_off + s->cluster_size; | ||
26 | + if (high_off == 0) { | ||
27 | + res->image_end_offset = s->data_end << BDRV_SECTOR_BITS; | ||
28 | + } else { | ||
29 | + res->image_end_offset = high_off + s->cluster_size; | ||
30 | + s->data_end = res->image_end_offset >> BDRV_SECTOR_BITS; | ||
31 | + } | ||
32 | + | ||
33 | if (size > res->image_end_offset) { | ||
34 | int64_t count; | ||
35 | count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size); | ||
36 | -- | 27 | -- |
37 | 2.40.1 | 28 | 2.26.2 |
1 | From: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 1 | The vu_client_trip() coroutine is leaked during AioContext switching. It |
---|---|---|---|
2 | is also unsafe to destroy the vu_dev in panic_cb() since its callers | ||
3 | still access it in some cases. | ||
2 | 4 | ||
3 | We will add more and more checks so we need a better code structure | 5 | Rework the lifecycle to solve these safety issues. |
4 | in parallels_co_check. Let each check performs in a separate loop | ||
5 | in a separate helper. | ||
6 | 6 | ||
7 | Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
8 | Reviewed-by: Denis V. Lunev <den@openvz.org> | 8 | Message-id: 20200924151549.913737-10-stefanha@redhat.com |
9 | Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> | 9 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
10 | Message-Id: <20230424093147.197643-11-alexander.ivanov@virtuozzo.com> | ||
11 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | ||
12 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
13 | --- | 10 | --- |
14 | block/parallels.c | 52 +++++++++++++++++++++++++++-------------------- | 11 | util/vhost-user-server.h | 29 ++-- |
15 | 1 file changed, 30 insertions(+), 22 deletions(-) | 12 | block/export/vhost-user-blk-server.c | 9 +- |
13 | util/vhost-user-server.c | 245 +++++++++++++++------------ | ||
14 | 3 files changed, 155 insertions(+), 128 deletions(-) | ||
16 | 15 | ||
17 | diff --git a/block/parallels.c b/block/parallels.c | 16 | diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h |
18 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
19 | --- a/block/parallels.c | 18 | --- a/util/vhost-user-server.h |
20 | +++ b/block/parallels.c | 19 | +++ b/util/vhost-user-server.h |
21 | @@ -XXX,XX +XXX,XX @@ parallels_check_leak(BlockDriverState *bs, BdrvCheckResult *res, | 20 | @@ -XXX,XX +XXX,XX @@ |
22 | return 0; | 21 | #include "qapi/error.h" |
23 | } | 22 | #include "standard-headers/linux/virtio_blk.h" |
24 | 23 | ||
25 | -static int coroutine_fn GRAPH_RDLOCK | 24 | +/* A kick fd that we monitor on behalf of libvhost-user */ |
26 | -parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 25 | typedef struct VuFdWatch { |
27 | - BdrvCheckMode fix) | 26 | VuDev *vu_dev; |
28 | +static void parallels_collect_statistics(BlockDriverState *bs, | 27 | int fd; /*kick fd*/ |
29 | + BdrvCheckResult *res, | 28 | void *pvt; |
30 | + BdrvCheckMode fix) | 29 | vu_watch_cb cb; |
31 | { | 30 | - bool processing; |
32 | BDRVParallelsState *s = bs->opaque; | 31 | QTAILQ_ENTRY(VuFdWatch) next; |
33 | - int64_t prev_off; | 32 | } VuFdWatch; |
34 | - int ret; | 33 | |
35 | + int64_t off, prev_off; | 34 | -typedef struct VuServer VuServer; |
36 | uint32_t i; | 35 | - |
37 | 36 | -struct VuServer { | |
38 | - qemu_co_mutex_lock(&s->lock); | 37 | +/** |
39 | - | 38 | + * VuServer: |
40 | - parallels_check_unclean(bs, res, fix); | 39 | + * A vhost-user server instance with user-defined VuDevIface callbacks. |
41 | - | 40 | + * Vhost-user device backends can be implemented using VuServer. VuDevIface |
42 | - ret = parallels_check_outside_image(bs, res, fix); | 41 | + * callbacks and virtqueue kicks run in the given AioContext. |
43 | - if (ret < 0) { | 42 | + */ |
44 | - goto out; | 43 | +typedef struct { |
44 | QIONetListener *listener; | ||
45 | + QEMUBH *restart_listener_bh; | ||
46 | AioContext *ctx; | ||
47 | int max_queues; | ||
48 | const VuDevIface *vu_iface; | ||
49 | + | ||
50 | + /* Protected by ctx lock */ | ||
51 | VuDev vu_dev; | ||
52 | QIOChannel *ioc; /* The I/O channel with the client */ | ||
53 | QIOChannelSocket *sioc; /* The underlying data channel with the client */ | ||
54 | - /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */ | ||
55 | - QIOChannel *ioc_slave; | ||
56 | - QIOChannelSocket *sioc_slave; | ||
57 | - Coroutine *co_trip; /* coroutine for processing VhostUserMsg */ | ||
58 | QTAILQ_HEAD(, VuFdWatch) vu_fd_watches; | ||
59 | - /* restart coroutine co_trip if AIOContext is changed */ | ||
60 | - bool aio_context_changed; | ||
61 | - bool processing_msg; | ||
62 | -}; | ||
63 | + | ||
64 | + Coroutine *co_trip; /* coroutine for processing VhostUserMsg */ | ||
65 | +} VuServer; | ||
66 | |||
67 | bool vhost_user_server_start(VuServer *server, | ||
68 | SocketAddress *unix_socket, | ||
69 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
70 | |||
71 | void vhost_user_server_stop(VuServer *server); | ||
72 | |||
73 | -void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx); | ||
74 | +void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx); | ||
75 | +void vhost_user_server_detach_aio_context(VuServer *server); | ||
76 | |||
77 | #endif /* VHOST_USER_SERVER_H */ | ||
78 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
79 | index XXXXXXX..XXXXXXX 100644 | ||
80 | --- a/block/export/vhost-user-blk-server.c | ||
81 | +++ b/block/export/vhost-user-blk-server.c | ||
82 | @@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_block_iface = { | ||
83 | static void blk_aio_attached(AioContext *ctx, void *opaque) | ||
84 | { | ||
85 | VuBlockDev *vub_dev = opaque; | ||
86 | - aio_context_acquire(ctx); | ||
87 | - vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx); | ||
88 | - aio_context_release(ctx); | ||
89 | + vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx); | ||
90 | } | ||
91 | |||
92 | static void blk_aio_detach(void *opaque) | ||
93 | { | ||
94 | VuBlockDev *vub_dev = opaque; | ||
95 | - AioContext *ctx = vub_dev->vu_server.ctx; | ||
96 | - aio_context_acquire(ctx); | ||
97 | - vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL); | ||
98 | - aio_context_release(ctx); | ||
99 | + vhost_user_server_detach_aio_context(&vub_dev->vu_server); | ||
100 | } | ||
101 | |||
102 | static void | ||
103 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
104 | index XXXXXXX..XXXXXXX 100644 | ||
105 | --- a/util/vhost-user-server.c | ||
106 | +++ b/util/vhost-user-server.c | ||
107 | @@ -XXX,XX +XXX,XX @@ | ||
108 | */ | ||
109 | #include "qemu/osdep.h" | ||
110 | #include "qemu/main-loop.h" | ||
111 | +#include "block/aio-wait.h" | ||
112 | #include "vhost-user-server.h" | ||
113 | |||
114 | +/* | ||
115 | + * Theory of operation: | ||
116 | + * | ||
117 | + * VuServer is started and stopped by vhost_user_server_start() and | ||
118 | + * vhost_user_server_stop() from the main loop thread. Starting the server | ||
119 | + * opens a vhost-user UNIX domain socket and listens for incoming connections. | ||
120 | + * Only one connection is allowed at a time. | ||
121 | + * | ||
122 | + * The connection is handled by the vu_client_trip() coroutine in the | ||
123 | + * VuServer->ctx AioContext. The coroutine consists of a vu_dispatch() loop | ||
124 | + * where libvhost-user calls vu_message_read() to receive the next vhost-user | ||
125 | + * protocol messages over the UNIX domain socket. | ||
126 | + * | ||
127 | + * When virtqueues are set up libvhost-user calls set_watch() to monitor kick | ||
128 | + * fds. These fds are also handled in the VuServer->ctx AioContext. | ||
129 | + * | ||
130 | + * Both vu_client_trip() and kick fd monitoring can be stopped by shutting down | ||
131 | + * the socket connection. Shutting down the socket connection causes | ||
132 | + * vu_message_read() to fail since no more data can be received from the socket. | ||
133 | + * After vu_dispatch() fails, vu_client_trip() calls vu_deinit() to stop | ||
134 | + * libvhost-user before terminating the coroutine. vu_deinit() calls | ||
135 | + * remove_watch() to stop monitoring kick fds and this stops virtqueue | ||
136 | + * processing. | ||
137 | + * | ||
138 | + * When vu_client_trip() has finished cleaning up it schedules a BH in the main | ||
139 | + * loop thread to accept the next client connection. | ||
140 | + * | ||
141 | + * When libvhost-user detects an error it calls panic_cb() and sets the | ||
142 | + * dev->broken flag. Both vu_client_trip() and kick fd processing stop when | ||
143 | + * the dev->broken flag is set. | ||
144 | + * | ||
145 | + * It is possible to switch AioContexts using | ||
146 | + * vhost_user_server_detach_aio_context() and | ||
147 | + * vhost_user_server_attach_aio_context(). They stop monitoring fds in the old | ||
148 | + * AioContext and resume monitoring in the new AioContext. The vu_client_trip() | ||
149 | + * coroutine remains in a yielded state during the switch. This is made | ||
150 | + * possible by QIOChannel's support for spurious coroutine re-entry in | ||
151 | + * qio_channel_yield(). The coroutine will restart I/O when re-entered from the | ||
152 | + * new AioContext. | ||
153 | + */ | ||
154 | + | ||
155 | static void vmsg_close_fds(VhostUserMsg *vmsg) | ||
156 | { | ||
157 | int i; | ||
158 | @@ -XXX,XX +XXX,XX @@ static void vmsg_unblock_fds(VhostUserMsg *vmsg) | ||
159 | } | ||
160 | } | ||
161 | |||
162 | -static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc, | ||
163 | - gpointer opaque); | ||
164 | - | ||
165 | -static void close_client(VuServer *server) | ||
166 | -{ | ||
167 | - /* | ||
168 | - * Before closing the client | ||
169 | - * | ||
170 | - * 1. Let vu_client_trip stop processing new vhost-user msg | ||
171 | - * | ||
172 | - * 2. remove kick_handler | ||
173 | - * | ||
174 | - * 3. wait for the kick handler to be finished | ||
175 | - * | ||
176 | - * 4. wait for the current vhost-user msg to be finished processing | ||
177 | - */ | ||
178 | - | ||
179 | - QIOChannelSocket *sioc = server->sioc; | ||
180 | - /* When this is set vu_client_trip will stop new processing vhost-user message */ | ||
181 | - server->sioc = NULL; | ||
182 | - | ||
183 | - while (server->processing_msg) { | ||
184 | - if (server->ioc->read_coroutine) { | ||
185 | - server->ioc->read_coroutine = NULL; | ||
186 | - qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL, | ||
187 | - NULL, server->ioc); | ||
188 | - server->processing_msg = false; | ||
189 | - } | ||
45 | - } | 190 | - } |
46 | - | 191 | - |
47 | - ret = parallels_check_leak(bs, res, fix); | 192 | - vu_deinit(&server->vu_dev); |
48 | - if (ret < 0) { | 193 | - |
49 | - goto out; | 194 | - /* vu_deinit() should have called remove_watch() */ |
195 | - assert(QTAILQ_EMPTY(&server->vu_fd_watches)); | ||
196 | - | ||
197 | - object_unref(OBJECT(sioc)); | ||
198 | - object_unref(OBJECT(server->ioc)); | ||
199 | -} | ||
200 | - | ||
201 | static void panic_cb(VuDev *vu_dev, const char *buf) | ||
202 | { | ||
203 | - VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
204 | - | ||
205 | - /* avoid while loop in close_client */ | ||
206 | - server->processing_msg = false; | ||
207 | - | ||
208 | - if (buf) { | ||
209 | - error_report("vu_panic: %s", buf); | ||
50 | - } | 210 | - } |
51 | - | 211 | - |
52 | res->bfi.total_clusters = s->bat_size; | 212 | - if (server->sioc) { |
53 | res->bfi.compressed_clusters = 0; /* compression is not supported */ | 213 | - close_client(server); |
54 | 214 | - } | |
55 | prev_off = 0; | 215 | - |
56 | for (i = 0; i < s->bat_size; i++) { | 216 | - /* |
57 | - int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS; | 217 | - * Set the callback function for network listener so another |
58 | + off = bat2sect(s, i) << BDRV_SECTOR_BITS; | 218 | - * vhost-user client can connect to this server |
59 | /* | 219 | - */ |
60 | * If BDRV_FIX_ERRORS is not set, out-of-image BAT entries were not | 220 | - qio_net_listener_set_client_func(server->listener, |
61 | * fixed. Skip not allocated and out-of-image BAT entries. | 221 | - vu_accept, |
62 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 222 | - server, |
63 | continue; | 223 | - NULL); |
64 | } | 224 | + error_report("vu_panic: %s", buf); |
65 | 225 | } | |
66 | - res->bfi.allocated_clusters++; | 226 | |
67 | - | 227 | static bool coroutine_fn |
68 | if (prev_off != 0 && (prev_off + s->cluster_size) != off) { | 228 | @@ -XXX,XX +XXX,XX @@ fail: |
69 | res->bfi.fragmented_clusters++; | 229 | return false; |
70 | } | 230 | } |
71 | prev_off = off; | 231 | |
72 | + res->bfi.allocated_clusters++; | 232 | - |
233 | -static void vu_client_start(VuServer *server); | ||
234 | static coroutine_fn void vu_client_trip(void *opaque) | ||
235 | { | ||
236 | VuServer *server = opaque; | ||
237 | + VuDev *vu_dev = &server->vu_dev; | ||
238 | |||
239 | - while (!server->aio_context_changed && server->sioc) { | ||
240 | - server->processing_msg = true; | ||
241 | - vu_dispatch(&server->vu_dev); | ||
242 | - server->processing_msg = false; | ||
243 | + while (!vu_dev->broken && vu_dispatch(vu_dev)) { | ||
244 | + /* Keep running */ | ||
245 | } | ||
246 | |||
247 | - if (server->aio_context_changed && server->sioc) { | ||
248 | - server->aio_context_changed = false; | ||
249 | - vu_client_start(server); | ||
250 | - } | ||
251 | -} | ||
252 | + vu_deinit(vu_dev); | ||
253 | + | ||
254 | + /* vu_deinit() should have called remove_watch() */ | ||
255 | + assert(QTAILQ_EMPTY(&server->vu_fd_watches)); | ||
256 | + | ||
257 | + object_unref(OBJECT(server->sioc)); | ||
258 | + server->sioc = NULL; | ||
259 | |||
260 | -static void vu_client_start(VuServer *server) | ||
261 | -{ | ||
262 | - server->co_trip = qemu_coroutine_create(vu_client_trip, server); | ||
263 | - aio_co_enter(server->ctx, server->co_trip); | ||
264 | + object_unref(OBJECT(server->ioc)); | ||
265 | + server->ioc = NULL; | ||
266 | + | ||
267 | + server->co_trip = NULL; | ||
268 | + if (server->restart_listener_bh) { | ||
269 | + qemu_bh_schedule(server->restart_listener_bh); | ||
270 | + } | ||
271 | + aio_wait_kick(); | ||
272 | } | ||
273 | |||
274 | /* | ||
275 | @@ -XXX,XX +XXX,XX @@ static void vu_client_start(VuServer *server) | ||
276 | static void kick_handler(void *opaque) | ||
277 | { | ||
278 | VuFdWatch *vu_fd_watch = opaque; | ||
279 | - vu_fd_watch->processing = true; | ||
280 | - vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt); | ||
281 | - vu_fd_watch->processing = false; | ||
282 | + VuDev *vu_dev = vu_fd_watch->vu_dev; | ||
283 | + | ||
284 | + vu_fd_watch->cb(vu_dev, 0, vu_fd_watch->pvt); | ||
285 | + | ||
286 | + /* Stop vu_client_trip() if an error occurred in vu_fd_watch->cb() */ | ||
287 | + if (vu_dev->broken) { | ||
288 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
289 | + | ||
290 | + qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); | ||
291 | + } | ||
292 | } | ||
293 | |||
294 | - | ||
295 | static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd) | ||
296 | { | ||
297 | |||
298 | @@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc, | ||
299 | qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client"); | ||
300 | server->ioc = QIO_CHANNEL(sioc); | ||
301 | object_ref(OBJECT(server->ioc)); | ||
302 | - qio_channel_attach_aio_context(server->ioc, server->ctx); | ||
303 | + | ||
304 | + /* TODO vu_message_write() spins if non-blocking! */ | ||
305 | qio_channel_set_blocking(server->ioc, false, NULL); | ||
306 | - vu_client_start(server); | ||
307 | + | ||
308 | + server->co_trip = qemu_coroutine_create(vu_client_trip, server); | ||
309 | + | ||
310 | + aio_context_acquire(server->ctx); | ||
311 | + vhost_user_server_attach_aio_context(server, server->ctx); | ||
312 | + aio_context_release(server->ctx); | ||
313 | } | ||
314 | |||
315 | - | ||
316 | void vhost_user_server_stop(VuServer *server) | ||
317 | { | ||
318 | + aio_context_acquire(server->ctx); | ||
319 | + | ||
320 | + qemu_bh_delete(server->restart_listener_bh); | ||
321 | + server->restart_listener_bh = NULL; | ||
322 | + | ||
323 | if (server->sioc) { | ||
324 | - close_client(server); | ||
325 | + VuFdWatch *vu_fd_watch; | ||
326 | + | ||
327 | + QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) { | ||
328 | + aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true, | ||
329 | + NULL, NULL, NULL, vu_fd_watch); | ||
330 | + } | ||
331 | + | ||
332 | + qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); | ||
333 | + | ||
334 | + AIO_WAIT_WHILE(server->ctx, server->co_trip); | ||
335 | } | ||
336 | |||
337 | + aio_context_release(server->ctx); | ||
338 | + | ||
339 | if (server->listener) { | ||
340 | qio_net_listener_disconnect(server->listener); | ||
341 | object_unref(OBJECT(server->listener)); | ||
73 | } | 342 | } |
74 | +} | 343 | +} |
75 | + | 344 | + |
76 | +static int coroutine_fn GRAPH_RDLOCK | 345 | +/* |
77 | +parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 346 | + * Allow the next client to connect to the server. Called from a BH in the main |
78 | + BdrvCheckMode fix) | 347 | + * loop. |
348 | + */ | ||
349 | +static void restart_listener_bh(void *opaque) | ||
79 | +{ | 350 | +{ |
80 | + BDRVParallelsState *s = bs->opaque; | 351 | + VuServer *server = opaque; |
81 | + int ret; | 352 | |
82 | + | 353 | + qio_net_listener_set_client_func(server->listener, vu_accept, server, |
83 | + qemu_co_mutex_lock(&s->lock); | 354 | + NULL); |
84 | + | 355 | } |
85 | + parallels_check_unclean(bs, res, fix); | 356 | |
86 | + | 357 | -void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx) |
87 | + ret = parallels_check_outside_image(bs, res, fix); | 358 | +/* Called with ctx acquired */ |
88 | + if (ret < 0) { | 359 | +void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx) |
89 | + goto out; | 360 | { |
361 | - VuFdWatch *vu_fd_watch, *next; | ||
362 | - void *opaque = NULL; | ||
363 | - IOHandler *io_read = NULL; | ||
364 | - bool attach; | ||
365 | + VuFdWatch *vu_fd_watch; | ||
366 | |||
367 | - server->ctx = ctx ? ctx : qemu_get_aio_context(); | ||
368 | + server->ctx = ctx; | ||
369 | |||
370 | if (!server->sioc) { | ||
371 | - /* not yet serving any client*/ | ||
372 | return; | ||
373 | } | ||
374 | |||
375 | - if (ctx) { | ||
376 | - qio_channel_attach_aio_context(server->ioc, ctx); | ||
377 | - server->aio_context_changed = true; | ||
378 | - io_read = kick_handler; | ||
379 | - attach = true; | ||
380 | - } else { | ||
381 | + qio_channel_attach_aio_context(server->ioc, ctx); | ||
382 | + | ||
383 | + QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) { | ||
384 | + aio_set_fd_handler(ctx, vu_fd_watch->fd, true, kick_handler, NULL, | ||
385 | + NULL, vu_fd_watch); | ||
90 | + } | 386 | + } |
91 | + | 387 | + |
92 | + ret = parallels_check_leak(bs, res, fix); | 388 | + aio_co_schedule(ctx, server->co_trip); |
93 | + if (ret < 0) { | 389 | +} |
94 | + goto out; | 390 | + |
95 | + } | 391 | +/* Called with server->ctx acquired */ |
96 | + | 392 | +void vhost_user_server_detach_aio_context(VuServer *server) |
97 | + parallels_collect_statistics(bs, res, fix); | 393 | +{ |
98 | 394 | + if (server->sioc) { | |
99 | out: | 395 | + VuFdWatch *vu_fd_watch; |
100 | qemu_co_mutex_unlock(&s->lock); | 396 | + |
397 | + QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) { | ||
398 | + aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true, | ||
399 | + NULL, NULL, NULL, vu_fd_watch); | ||
400 | + } | ||
401 | + | ||
402 | qio_channel_detach_aio_context(server->ioc); | ||
403 | - /* server->ioc->ctx keeps the old AioConext */ | ||
404 | - ctx = server->ioc->ctx; | ||
405 | - attach = false; | ||
406 | } | ||
407 | |||
408 | - QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { | ||
409 | - if (vu_fd_watch->cb) { | ||
410 | - opaque = attach ? vu_fd_watch : NULL; | ||
411 | - aio_set_fd_handler(ctx, vu_fd_watch->fd, true, | ||
412 | - io_read, NULL, NULL, | ||
413 | - opaque); | ||
414 | - } | ||
415 | - } | ||
416 | + server->ctx = NULL; | ||
417 | } | ||
418 | |||
419 | - | ||
420 | bool vhost_user_server_start(VuServer *server, | ||
421 | SocketAddress *socket_addr, | ||
422 | AioContext *ctx, | ||
423 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
424 | const VuDevIface *vu_iface, | ||
425 | Error **errp) | ||
426 | { | ||
427 | + QEMUBH *bh; | ||
428 | QIONetListener *listener = qio_net_listener_new(); | ||
429 | if (qio_net_listener_open_sync(listener, socket_addr, 1, | ||
430 | errp) < 0) { | ||
431 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
432 | return false; | ||
433 | } | ||
434 | |||
435 | + bh = qemu_bh_new(restart_listener_bh, server); | ||
436 | + | ||
437 | /* zero out unspecified fields */ | ||
438 | *server = (VuServer) { | ||
439 | .listener = listener, | ||
440 | + .restart_listener_bh = bh, | ||
441 | .vu_iface = vu_iface, | ||
442 | .max_queues = max_queues, | ||
443 | .ctx = ctx, | ||
101 | -- | 444 | -- |
102 | 2.40.1 | 445 | 2.26.2 |
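For reference, a minimal sketch (not part of this series) of how an export is expected to drive the new attach/detach entry points. The helper names are illustrative, and the blk_add_aio_context_notifier() registration is an assumption mirroring the blk_remove_aio_context_notifier() call used by the vhost-user-blk export code.

    #include "qemu/osdep.h"
    #include "sysemu/block-backend.h"
    #include "util/vhost-user-server.h"

    /* Runs after the BlockBackend has moved to a new AioContext: resume kick
     * fd monitoring and reschedule the vu_client_trip() coroutine there. */
    static void example_aio_attached(AioContext *ctx, void *opaque)
    {
        VuServer *server = opaque;
        vhost_user_server_attach_aio_context(server, ctx);
    }

    /* Runs before the BlockBackend leaves its AioContext: stop monitoring the
     * kick fds; the yielded coroutine is resumed again on the next attach. */
    static void example_aio_detach(void *opaque)
    {
        VuServer *server = opaque;
        vhost_user_server_detach_aio_context(server);
    }

    static void example_register(BlockBackend *blk, VuServer *server)
    {
        blk_add_aio_context_notifier(blk, example_aio_attached,
                                     example_aio_detach, server);
    }

Keeping the coroutine yielded across the switch is what makes this safe: qio_channel_yield() tolerates spurious re-entry, so vu_client_trip() simply resumes its I/O in the new AioContext.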
New patch | |||
---|---|---|---|
1 | Propagate the flush return value since errors are possible. | ||
1 | 2 | ||
3 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
4 | Message-id: 20200924151549.913737-11-stefanha@redhat.com | ||
5 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
6 | --- | ||
7 | block/export/vhost-user-blk-server.c | 11 +++++++---- | ||
8 | 1 file changed, 7 insertions(+), 4 deletions(-) | ||
9 | |||
10 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
11 | index XXXXXXX..XXXXXXX 100644 | ||
12 | --- a/block/export/vhost-user-blk-server.c | ||
13 | +++ b/block/export/vhost-user-blk-server.c | ||
14 | @@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov, | ||
15 | return -EINVAL; | ||
16 | } | ||
17 | |||
18 | -static void coroutine_fn vu_block_flush(VuBlockReq *req) | ||
19 | +static int coroutine_fn vu_block_flush(VuBlockReq *req) | ||
20 | { | ||
21 | VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server); | ||
22 | BlockBackend *backend = vdev_blk->backend; | ||
23 | - blk_co_flush(backend); | ||
24 | + return blk_co_flush(backend); | ||
25 | } | ||
26 | |||
27 | static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
28 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
29 | break; | ||
30 | } | ||
31 | case VIRTIO_BLK_T_FLUSH: | ||
32 | - vu_block_flush(req); | ||
33 | - req->in->status = VIRTIO_BLK_S_OK; | ||
34 | + if (vu_block_flush(req) == 0) { | ||
35 | + req->in->status = VIRTIO_BLK_S_OK; | ||
36 | + } else { | ||
37 | + req->in->status = VIRTIO_BLK_S_IOERR; | ||
38 | + } | ||
39 | break; | ||
40 | case VIRTIO_BLK_T_GET_ID: { | ||
41 | size_t size = MIN(iov_size(&elem->in_sg[0], in_num), | ||
42 | -- | ||
43 | 2.26.2 | ||
1 | From: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 1 | Use the new QAPI block exports API instead of defining our own QOM |
---|---|---|---|
2 | 2 | objects. | |
3 | Replace the way we use mutex in parallels_co_check() for simplier | 3 | |
4 | and less error prone code. | 4 | This is a large change because the lifecycle of VuBlockDev needs to |
5 | 5 | follow BlockExportDriver. QOM properties are replaced by QAPI options | |
6 | Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 6 | objects. |
7 | Reviewed-by: Denis V. Lunev <den@openvz.org> | 7 | |
8 | Message-Id: <20230424093147.197643-12-alexander.ivanov@virtuozzo.com> | 8 | VuBlockDev is renamed VuBlkExport and contains a BlockExport field. |
9 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 9 | Several fields can be dropped since BlockExport already has equivalents. |
10 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 10 | |
11 | The file names and meson build integration will be adjusted in a future | ||
12 | patch. libvhost-user should probably be built as a static library that | ||
13 | is linked into QEMU instead of as a .c file that results in duplicate | ||
14 | compilation. | ||
15 | |||
16 | The new command-line syntax is: | ||
17 | |||
18 | $ qemu-storage-daemon \ | ||
19 | --blockdev file,node-name=drive0,filename=test.img \ | ||
20 | --export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock | ||
21 | |||
22 | Note that unix-socket is optional because we may wish to accept chardevs | ||
23 | too in the future. | ||
24 | |||
25 | Markus noted that supported address families are not explicit in the | ||
26 | QAPI schema. It is unlikely that support for more address families will | ||
27 | be added since file descriptor passing is required and few address | ||
28 | families support it. If a new address family needs to be added, then the | ||
29 | QAPI 'features' syntax can be used to advertize them. | ||
30 | |||
31 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
32 | Acked-by: Markus Armbruster <armbru@redhat.com> | ||
33 | Message-id: 20200924151549.913737-12-stefanha@redhat.com | ||
34 | [Skip test on big-endian host architectures because this device doesn't | ||
35 | support them yet (as already mentioned in a code comment). | ||
36 | --Stefan] | ||
37 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
11 | --- | 38 | --- |
12 | block/parallels.c | 33 ++++++++++++++------------------- | 39 | qapi/block-export.json | 21 +- |
13 | 1 file changed, 14 insertions(+), 19 deletions(-) | 40 | block/export/vhost-user-blk-server.h | 23 +- |
14 | 41 | block/export/export.c | 6 + | |
15 | diff --git a/block/parallels.c b/block/parallels.c | 42 | block/export/vhost-user-blk-server.c | 452 +++++++-------------------- |
43 | util/vhost-user-server.c | 10 +- | ||
44 | block/export/meson.build | 1 + | ||
45 | block/meson.build | 1 - | ||
46 | 7 files changed, 156 insertions(+), 358 deletions(-) | ||
47 | |||
48 | diff --git a/qapi/block-export.json b/qapi/block-export.json | ||
16 | index XXXXXXX..XXXXXXX 100644 | 49 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/block/parallels.c | 50 | --- a/qapi/block-export.json |
18 | +++ b/block/parallels.c | 51 | +++ b/qapi/block-export.json |
19 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 52 | @@ -XXX,XX +XXX,XX @@ |
20 | BDRVParallelsState *s = bs->opaque; | 53 | 'data': { '*name': 'str', '*description': 'str', |
21 | int ret; | 54 | '*bitmap': 'str' } } |
22 | 55 | ||
23 | - qemu_co_mutex_lock(&s->lock); | 56 | +## |
24 | + WITH_QEMU_LOCK_GUARD(&s->lock) { | 57 | +# @BlockExportOptionsVhostUserBlk: |
25 | + parallels_check_unclean(bs, res, fix); | 58 | +# |
26 | 59 | +# A vhost-user-blk block export. | |
27 | - parallels_check_unclean(bs, res, fix); | 60 | +# |
28 | + ret = parallels_check_outside_image(bs, res, fix); | 61 | +# @addr: The vhost-user socket on which to listen. Both 'unix' and 'fd' |
29 | + if (ret < 0) { | 62 | +# SocketAddress types are supported. Passed fds must be UNIX domain |
30 | + return ret; | 63 | +# sockets. |
64 | +# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes. | ||
65 | +# | ||
66 | +# Since: 5.2 | ||
67 | +## | ||
68 | +{ 'struct': 'BlockExportOptionsVhostUserBlk', | ||
69 | + 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } } | ||
70 | + | ||
71 | ## | ||
72 | # @NbdServerAddOptions: | ||
73 | # | ||
74 | @@ -XXX,XX +XXX,XX @@ | ||
75 | # An enumeration of block export types | ||
76 | # | ||
77 | # @nbd: NBD export | ||
78 | +# @vhost-user-blk: vhost-user-blk export (since 5.2) | ||
79 | # | ||
80 | # Since: 4.2 | ||
81 | ## | ||
82 | { 'enum': 'BlockExportType', | ||
83 | - 'data': [ 'nbd' ] } | ||
84 | + 'data': [ 'nbd', 'vhost-user-blk' ] } | ||
85 | |||
86 | ## | ||
87 | # @BlockExportOptions: | ||
88 | @@ -XXX,XX +XXX,XX @@ | ||
89 | '*writethrough': 'bool' }, | ||
90 | 'discriminator': 'type', | ||
91 | 'data': { | ||
92 | - 'nbd': 'BlockExportOptionsNbd' | ||
93 | + 'nbd': 'BlockExportOptionsNbd', | ||
94 | + 'vhost-user-blk': 'BlockExportOptionsVhostUserBlk' | ||
95 | } } | ||
96 | |||
97 | ## | ||
98 | diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h | ||
99 | index XXXXXXX..XXXXXXX 100644 | ||
100 | --- a/block/export/vhost-user-blk-server.h | ||
101 | +++ b/block/export/vhost-user-blk-server.h | ||
102 | @@ -XXX,XX +XXX,XX @@ | ||
103 | |||
104 | #ifndef VHOST_USER_BLK_SERVER_H | ||
105 | #define VHOST_USER_BLK_SERVER_H | ||
106 | -#include "util/vhost-user-server.h" | ||
107 | |||
108 | -typedef struct VuBlockDev VuBlockDev; | ||
109 | -#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server" | ||
110 | -#define VHOST_USER_BLK_SERVER(obj) \ | ||
111 | - OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER) | ||
112 | +#include "block/export.h" | ||
113 | |||
114 | -/* vhost user block device */ | ||
115 | -struct VuBlockDev { | ||
116 | - Object parent_obj; | ||
117 | - char *node_name; | ||
118 | - SocketAddress *addr; | ||
119 | - AioContext *ctx; | ||
120 | - VuServer vu_server; | ||
121 | - bool running; | ||
122 | - uint32_t blk_size; | ||
123 | - BlockBackend *backend; | ||
124 | - QIOChannelSocket *sioc; | ||
125 | - QTAILQ_ENTRY(VuBlockDev) next; | ||
126 | - struct virtio_blk_config blkcfg; | ||
127 | - bool writable; | ||
128 | -}; | ||
129 | +/* For block/export/export.c */ | ||
130 | +extern const BlockExportDriver blk_exp_vhost_user_blk; | ||
131 | |||
132 | #endif /* VHOST_USER_BLK_SERVER_H */ | ||
133 | diff --git a/block/export/export.c b/block/export/export.c | ||
134 | index XXXXXXX..XXXXXXX 100644 | ||
135 | --- a/block/export/export.c | ||
136 | +++ b/block/export/export.c | ||
137 | @@ -XXX,XX +XXX,XX @@ | ||
138 | #include "sysemu/block-backend.h" | ||
139 | #include "block/export.h" | ||
140 | #include "block/nbd.h" | ||
141 | +#if CONFIG_LINUX | ||
142 | +#include "block/export/vhost-user-blk-server.h" | ||
143 | +#endif | ||
144 | #include "qapi/error.h" | ||
145 | #include "qapi/qapi-commands-block-export.h" | ||
146 | #include "qapi/qapi-events-block-export.h" | ||
147 | @@ -XXX,XX +XXX,XX @@ | ||
148 | |||
149 | static const BlockExportDriver *blk_exp_drivers[] = { | ||
150 | &blk_exp_nbd, | ||
151 | +#if CONFIG_LINUX | ||
152 | + &blk_exp_vhost_user_blk, | ||
153 | +#endif | ||
154 | }; | ||
155 | |||
156 | /* Only accessed from the main thread */ | ||
157 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
158 | index XXXXXXX..XXXXXXX 100644 | ||
159 | --- a/block/export/vhost-user-blk-server.c | ||
160 | +++ b/block/export/vhost-user-blk-server.c | ||
161 | @@ -XXX,XX +XXX,XX @@ | ||
162 | */ | ||
163 | #include "qemu/osdep.h" | ||
164 | #include "block/block.h" | ||
165 | +#include "contrib/libvhost-user/libvhost-user.h" | ||
166 | +#include "standard-headers/linux/virtio_blk.h" | ||
167 | +#include "util/vhost-user-server.h" | ||
168 | #include "vhost-user-blk-server.h" | ||
169 | #include "qapi/error.h" | ||
170 | #include "qom/object_interfaces.h" | ||
171 | @@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr { | ||
172 | unsigned char status; | ||
173 | }; | ||
174 | |||
175 | -typedef struct VuBlockReq { | ||
176 | +typedef struct VuBlkReq { | ||
177 | VuVirtqElement elem; | ||
178 | int64_t sector_num; | ||
179 | size_t size; | ||
180 | @@ -XXX,XX +XXX,XX @@ typedef struct VuBlockReq { | ||
181 | struct virtio_blk_outhdr out; | ||
182 | VuServer *server; | ||
183 | struct VuVirtq *vq; | ||
184 | -} VuBlockReq; | ||
185 | +} VuBlkReq; | ||
186 | |||
187 | -static void vu_block_req_complete(VuBlockReq *req) | ||
188 | +/* vhost user block device */ | ||
189 | +typedef struct { | ||
190 | + BlockExport export; | ||
191 | + VuServer vu_server; | ||
192 | + uint32_t blk_size; | ||
193 | + QIOChannelSocket *sioc; | ||
194 | + struct virtio_blk_config blkcfg; | ||
195 | + bool writable; | ||
196 | +} VuBlkExport; | ||
197 | + | ||
198 | +static void vu_blk_req_complete(VuBlkReq *req) | ||
199 | { | ||
200 | VuDev *vu_dev = &req->server->vu_dev; | ||
201 | |||
202 | @@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req) | ||
203 | free(req); | ||
204 | } | ||
205 | |||
206 | -static VuBlockDev *get_vu_block_device_by_server(VuServer *server) | ||
207 | -{ | ||
208 | - return container_of(server, VuBlockDev, vu_server); | ||
209 | -} | ||
210 | - | ||
211 | static int coroutine_fn | ||
212 | -vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov, | ||
213 | - uint32_t iovcnt, uint32_t type) | ||
214 | +vu_blk_discard_write_zeroes(BlockBackend *blk, struct iovec *iov, | ||
215 | + uint32_t iovcnt, uint32_t type) | ||
216 | { | ||
217 | struct virtio_blk_discard_write_zeroes desc; | ||
218 | ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc)); | ||
219 | @@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov, | ||
220 | return -EINVAL; | ||
221 | } | ||
222 | |||
223 | - VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server); | ||
224 | uint64_t range[2] = { le64_to_cpu(desc.sector) << 9, | ||
225 | le32_to_cpu(desc.num_sectors) << 9 }; | ||
226 | if (type == VIRTIO_BLK_T_DISCARD) { | ||
227 | - if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) { | ||
228 | + if (blk_co_pdiscard(blk, range[0], range[1]) == 0) { | ||
229 | return 0; | ||
230 | } | ||
231 | } else if (type == VIRTIO_BLK_T_WRITE_ZEROES) { | ||
232 | - if (blk_co_pwrite_zeroes(vdev_blk->backend, | ||
233 | - range[0], range[1], 0) == 0) { | ||
234 | + if (blk_co_pwrite_zeroes(blk, range[0], range[1], 0) == 0) { | ||
235 | return 0; | ||
236 | } | ||
237 | } | ||
238 | @@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov, | ||
239 | return -EINVAL; | ||
240 | } | ||
241 | |||
242 | -static int coroutine_fn vu_block_flush(VuBlockReq *req) | ||
243 | +static void coroutine_fn vu_blk_virtio_process_req(void *opaque) | ||
244 | { | ||
245 | - VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server); | ||
246 | - BlockBackend *backend = vdev_blk->backend; | ||
247 | - return blk_co_flush(backend); | ||
248 | -} | ||
249 | - | ||
250 | -static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
251 | -{ | ||
252 | - VuBlockReq *req = opaque; | ||
253 | + VuBlkReq *req = opaque; | ||
254 | VuServer *server = req->server; | ||
255 | VuVirtqElement *elem = &req->elem; | ||
256 | uint32_t type; | ||
257 | |||
258 | - VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
259 | - BlockBackend *backend = vdev_blk->backend; | ||
260 | + VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server); | ||
261 | + BlockBackend *blk = vexp->export.blk; | ||
262 | |||
263 | struct iovec *in_iov = elem->in_sg; | ||
264 | struct iovec *out_iov = elem->out_sg; | ||
265 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
266 | bool is_write = type & VIRTIO_BLK_T_OUT; | ||
267 | req->sector_num = le64_to_cpu(req->out.sector); | ||
268 | |||
269 | - int64_t offset = req->sector_num * vdev_blk->blk_size; | ||
270 | + if (is_write && !vexp->writable) { | ||
271 | + req->in->status = VIRTIO_BLK_S_IOERR; | ||
272 | + break; | ||
31 | + } | 273 | + } |
32 | 274 | + | |
33 | - ret = parallels_check_outside_image(bs, res, fix); | 275 | + int64_t offset = req->sector_num * vexp->blk_size; |
276 | QEMUIOVector qiov; | ||
277 | if (is_write) { | ||
278 | qemu_iovec_init_external(&qiov, out_iov, out_num); | ||
279 | - ret = blk_co_pwritev(backend, offset, qiov.size, | ||
280 | - &qiov, 0); | ||
281 | + ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0); | ||
282 | } else { | ||
283 | qemu_iovec_init_external(&qiov, in_iov, in_num); | ||
284 | - ret = blk_co_preadv(backend, offset, qiov.size, | ||
285 | - &qiov, 0); | ||
286 | + ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0); | ||
287 | } | ||
288 | if (ret >= 0) { | ||
289 | req->in->status = VIRTIO_BLK_S_OK; | ||
290 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
291 | break; | ||
292 | } | ||
293 | case VIRTIO_BLK_T_FLUSH: | ||
294 | - if (vu_block_flush(req) == 0) { | ||
295 | + if (blk_co_flush(blk) == 0) { | ||
296 | req->in->status = VIRTIO_BLK_S_OK; | ||
297 | } else { | ||
298 | req->in->status = VIRTIO_BLK_S_IOERR; | ||
299 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
300 | case VIRTIO_BLK_T_DISCARD: | ||
301 | case VIRTIO_BLK_T_WRITE_ZEROES: { | ||
302 | int rc; | ||
303 | - rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1], | ||
304 | - out_num, type); | ||
305 | + | ||
306 | + if (!vexp->writable) { | ||
307 | + req->in->status = VIRTIO_BLK_S_IOERR; | ||
308 | + break; | ||
309 | + } | ||
310 | + | ||
311 | + rc = vu_blk_discard_write_zeroes(blk, &elem->out_sg[1], out_num, type); | ||
312 | if (rc == 0) { | ||
313 | req->in->status = VIRTIO_BLK_S_OK; | ||
314 | } else { | ||
315 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
316 | break; | ||
317 | } | ||
318 | |||
319 | - vu_block_req_complete(req); | ||
320 | + vu_blk_req_complete(req); | ||
321 | return; | ||
322 | |||
323 | err: | ||
324 | - free(elem); | ||
325 | + free(req); | ||
326 | } | ||
327 | |||
328 | -static void vu_block_process_vq(VuDev *vu_dev, int idx) | ||
329 | +static void vu_blk_process_vq(VuDev *vu_dev, int idx) | ||
330 | { | ||
331 | VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
332 | VuVirtq *vq = vu_get_queue(vu_dev, idx); | ||
333 | |||
334 | while (1) { | ||
335 | - VuBlockReq *req; | ||
336 | + VuBlkReq *req; | ||
337 | |||
338 | - req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq)); | ||
339 | + req = vu_queue_pop(vu_dev, vq, sizeof(VuBlkReq)); | ||
340 | if (!req) { | ||
341 | break; | ||
342 | } | ||
343 | @@ -XXX,XX +XXX,XX @@ static void vu_block_process_vq(VuDev *vu_dev, int idx) | ||
344 | req->vq = vq; | ||
345 | |||
346 | Coroutine *co = | ||
347 | - qemu_coroutine_create(vu_block_virtio_process_req, req); | ||
348 | + qemu_coroutine_create(vu_blk_virtio_process_req, req); | ||
349 | qemu_coroutine_enter(co); | ||
350 | } | ||
351 | } | ||
352 | |||
353 | -static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started) | ||
354 | +static void vu_blk_queue_set_started(VuDev *vu_dev, int idx, bool started) | ||
355 | { | ||
356 | VuVirtq *vq; | ||
357 | |||
358 | assert(vu_dev); | ||
359 | |||
360 | vq = vu_get_queue(vu_dev, idx); | ||
361 | - vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL); | ||
362 | + vu_set_queue_handler(vu_dev, vq, started ? vu_blk_process_vq : NULL); | ||
363 | } | ||
364 | |||
365 | -static uint64_t vu_block_get_features(VuDev *dev) | ||
366 | +static uint64_t vu_blk_get_features(VuDev *dev) | ||
367 | { | ||
368 | uint64_t features; | ||
369 | VuServer *server = container_of(dev, VuServer, vu_dev); | ||
370 | - VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
371 | + VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server); | ||
372 | features = 1ull << VIRTIO_BLK_F_SIZE_MAX | | ||
373 | 1ull << VIRTIO_BLK_F_SEG_MAX | | ||
374 | 1ull << VIRTIO_BLK_F_TOPOLOGY | | ||
375 | @@ -XXX,XX +XXX,XX @@ static uint64_t vu_block_get_features(VuDev *dev) | ||
376 | 1ull << VIRTIO_RING_F_EVENT_IDX | | ||
377 | 1ull << VHOST_USER_F_PROTOCOL_FEATURES; | ||
378 | |||
379 | - if (!vdev_blk->writable) { | ||
380 | + if (!vexp->writable) { | ||
381 | features |= 1ull << VIRTIO_BLK_F_RO; | ||
382 | } | ||
383 | |||
384 | return features; | ||
385 | } | ||
386 | |||
387 | -static uint64_t vu_block_get_protocol_features(VuDev *dev) | ||
388 | +static uint64_t vu_blk_get_protocol_features(VuDev *dev) | ||
389 | { | ||
390 | return 1ull << VHOST_USER_PROTOCOL_F_CONFIG | | ||
391 | 1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD; | ||
392 | } | ||
393 | |||
394 | static int | ||
395 | -vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len) | ||
396 | +vu_blk_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len) | ||
397 | { | ||
398 | + /* TODO blkcfg must be little-endian for VIRTIO 1.0 */ | ||
399 | VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
400 | - VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
401 | - memcpy(config, &vdev_blk->blkcfg, len); | ||
402 | - | ||
403 | + VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server); | ||
404 | + memcpy(config, &vexp->blkcfg, len); | ||
405 | return 0; | ||
406 | } | ||
407 | |||
408 | static int | ||
409 | -vu_block_set_config(VuDev *vu_dev, const uint8_t *data, | ||
410 | +vu_blk_set_config(VuDev *vu_dev, const uint8_t *data, | ||
411 | uint32_t offset, uint32_t size, uint32_t flags) | ||
412 | { | ||
413 | VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
414 | - VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
415 | + VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server); | ||
416 | uint8_t wce; | ||
417 | |||
418 | /* don't support live migration */ | ||
419 | @@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data, | ||
420 | } | ||
421 | |||
422 | wce = *data; | ||
423 | - vdev_blk->blkcfg.wce = wce; | ||
424 | - blk_set_enable_write_cache(vdev_blk->backend, wce); | ||
425 | + vexp->blkcfg.wce = wce; | ||
426 | + blk_set_enable_write_cache(vexp->export.blk, wce); | ||
427 | return 0; | ||
428 | } | ||
429 | |||
430 | @@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data, | ||
431 | * of vu_process_message. | ||
432 | * | ||
433 | */ | ||
434 | -static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply) | ||
435 | +static int vu_blk_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply) | ||
436 | { | ||
437 | if (vmsg->request == VHOST_USER_NONE) { | ||
438 | dev->panic(dev, "disconnect"); | ||
439 | @@ -XXX,XX +XXX,XX @@ static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply) | ||
440 | return false; | ||
441 | } | ||
442 | |||
443 | -static const VuDevIface vu_block_iface = { | ||
444 | - .get_features = vu_block_get_features, | ||
445 | - .queue_set_started = vu_block_queue_set_started, | ||
446 | - .get_protocol_features = vu_block_get_protocol_features, | ||
447 | - .get_config = vu_block_get_config, | ||
448 | - .set_config = vu_block_set_config, | ||
449 | - .process_msg = vu_block_process_msg, | ||
450 | +static const VuDevIface vu_blk_iface = { | ||
451 | + .get_features = vu_blk_get_features, | ||
452 | + .queue_set_started = vu_blk_queue_set_started, | ||
453 | + .get_protocol_features = vu_blk_get_protocol_features, | ||
454 | + .get_config = vu_blk_get_config, | ||
455 | + .set_config = vu_blk_set_config, | ||
456 | + .process_msg = vu_blk_process_msg, | ||
457 | }; | ||
458 | |||
459 | static void blk_aio_attached(AioContext *ctx, void *opaque) | ||
460 | { | ||
461 | - VuBlockDev *vub_dev = opaque; | ||
462 | - vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx); | ||
463 | + VuBlkExport *vexp = opaque; | ||
464 | + vhost_user_server_attach_aio_context(&vexp->vu_server, ctx); | ||
465 | } | ||
466 | |||
467 | static void blk_aio_detach(void *opaque) | ||
468 | { | ||
469 | - VuBlockDev *vub_dev = opaque; | ||
470 | - vhost_user_server_detach_aio_context(&vub_dev->vu_server); | ||
471 | + VuBlkExport *vexp = opaque; | ||
472 | + vhost_user_server_detach_aio_context(&vexp->vu_server); | ||
473 | } | ||
474 | |||
475 | static void | ||
476 | -vu_block_initialize_config(BlockDriverState *bs, | ||
477 | +vu_blk_initialize_config(BlockDriverState *bs, | ||
478 | struct virtio_blk_config *config, uint32_t blk_size) | ||
479 | { | ||
480 | config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS; | ||
481 | @@ -XXX,XX +XXX,XX @@ vu_block_initialize_config(BlockDriverState *bs, | ||
482 | config->max_write_zeroes_seg = 1; | ||
483 | } | ||
484 | |||
485 | -static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp) | ||
486 | +static void vu_blk_exp_request_shutdown(BlockExport *exp) | ||
487 | { | ||
488 | + VuBlkExport *vexp = container_of(exp, VuBlkExport, export); | ||
489 | |||
490 | - BlockBackend *blk; | ||
491 | - Error *local_error = NULL; | ||
492 | - const char *node_name = vu_block_device->node_name; | ||
493 | - bool writable = vu_block_device->writable; | ||
494 | - uint64_t perm = BLK_PERM_CONSISTENT_READ; | ||
495 | - int ret; | ||
496 | - | ||
497 | - AioContext *ctx; | ||
498 | - | ||
499 | - BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error); | ||
500 | - | ||
501 | - if (!bs) { | ||
502 | - error_propagate(errp, local_error); | ||
503 | - return NULL; | ||
504 | - } | ||
505 | - | ||
506 | - if (bdrv_is_read_only(bs)) { | ||
507 | - writable = false; | ||
508 | - } | ||
509 | - | ||
510 | - if (writable) { | ||
511 | - perm |= BLK_PERM_WRITE; | ||
512 | - } | ||
513 | - | ||
514 | - ctx = bdrv_get_aio_context(bs); | ||
515 | - aio_context_acquire(ctx); | ||
516 | - bdrv_invalidate_cache(bs, NULL); | ||
517 | - aio_context_release(ctx); | ||
518 | - | ||
519 | - /* | ||
520 | - * Don't allow resize while the vhost user server is running, | ||
521 | - * otherwise we don't care what happens with the node. | ||
522 | - */ | ||
523 | - blk = blk_new(bdrv_get_aio_context(bs), perm, | ||
524 | - BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED | | ||
525 | - BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD); | ||
526 | - ret = blk_insert_bs(blk, bs, errp); | ||
527 | - | ||
34 | - if (ret < 0) { | 528 | - if (ret < 0) { |
529 | - goto fail; | ||
530 | - } | ||
531 | - | ||
532 | - blk_set_enable_write_cache(blk, false); | ||
533 | - | ||
534 | - blk_set_allow_aio_context_change(blk, true); | ||
535 | - | ||
536 | - vu_block_device->blkcfg.wce = 0; | ||
537 | - vu_block_device->backend = blk; | ||
538 | - if (!vu_block_device->blk_size) { | ||
539 | - vu_block_device->blk_size = BDRV_SECTOR_SIZE; | ||
540 | - } | ||
541 | - vu_block_device->blkcfg.blk_size = vu_block_device->blk_size; | ||
542 | - blk_set_guest_block_size(blk, vu_block_device->blk_size); | ||
543 | - vu_block_initialize_config(bs, &vu_block_device->blkcfg, | ||
544 | - vu_block_device->blk_size); | ||
545 | - return vu_block_device; | ||
546 | - | ||
547 | -fail: | ||
548 | - blk_unref(blk); | ||
549 | - return NULL; | ||
550 | -} | ||
551 | - | ||
552 | -static void vu_block_deinit(VuBlockDev *vu_block_device) | ||
553 | -{ | ||
554 | - if (vu_block_device->backend) { | ||
555 | - blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached, | ||
556 | - blk_aio_detach, vu_block_device); | ||
557 | - } | ||
558 | - | ||
559 | - blk_unref(vu_block_device->backend); | ||
560 | -} | ||
561 | - | ||
562 | -static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device) | ||
563 | -{ | ||
564 | - vhost_user_server_stop(&vu_block_device->vu_server); | ||
565 | - vu_block_deinit(vu_block_device); | ||
566 | -} | ||
567 | - | ||
568 | -static void vhost_user_blk_server_start(VuBlockDev *vu_block_device, | ||
569 | - Error **errp) | ||
570 | -{ | ||
571 | - AioContext *ctx; | ||
572 | - SocketAddress *addr = vu_block_device->addr; | ||
573 | - | ||
574 | - if (!vu_block_init(vu_block_device, errp)) { | ||
575 | - return; | ||
576 | - } | ||
577 | - | ||
578 | - ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend)); | ||
579 | - | ||
580 | - if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx, | ||
581 | - VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface, | ||
582 | - errp)) { | ||
583 | - goto error; | ||
584 | - } | ||
585 | - | ||
586 | - blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached, | ||
587 | - blk_aio_detach, vu_block_device); | ||
588 | - vu_block_device->running = true; | ||
589 | - return; | ||
590 | - | ||
591 | - error: | ||
592 | - vu_block_deinit(vu_block_device); | ||
593 | -} | ||
594 | - | ||
595 | -static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp) | ||
596 | -{ | ||
597 | - if (vus->running) { | ||
598 | - error_setg(errp, "The property can't be modified " | ||
599 | - "while the server is running"); | ||
600 | - return false; | ||
601 | - } | ||
602 | - return true; | ||
603 | -} | ||
604 | - | ||
605 | -static void vu_set_node_name(Object *obj, const char *value, Error **errp) | ||
606 | -{ | ||
607 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
608 | - | ||
609 | - if (!vu_prop_modifiable(vus, errp)) { | ||
610 | - return; | ||
611 | - } | ||
612 | - | ||
613 | - if (vus->node_name) { | ||
614 | - g_free(vus->node_name); | ||
615 | - } | ||
616 | - | ||
617 | - vus->node_name = g_strdup(value); | ||
618 | -} | ||
619 | - | ||
620 | -static char *vu_get_node_name(Object *obj, Error **errp) | ||
621 | -{ | ||
622 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
623 | - return g_strdup(vus->node_name); | ||
624 | -} | ||
625 | - | ||
626 | -static void free_socket_addr(SocketAddress *addr) | ||
627 | -{ | ||
628 | - g_free(addr->u.q_unix.path); | ||
629 | - g_free(addr); | ||
630 | -} | ||
631 | - | ||
632 | -static void vu_set_unix_socket(Object *obj, const char *value, | ||
633 | - Error **errp) | ||
634 | -{ | ||
635 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
636 | - | ||
637 | - if (!vu_prop_modifiable(vus, errp)) { | ||
638 | - return; | ||
639 | - } | ||
640 | - | ||
641 | - if (vus->addr) { | ||
642 | - free_socket_addr(vus->addr); | ||
643 | - } | ||
644 | - | ||
645 | - SocketAddress *addr = g_new0(SocketAddress, 1); | ||
646 | - addr->type = SOCKET_ADDRESS_TYPE_UNIX; | ||
647 | - addr->u.q_unix.path = g_strdup(value); | ||
648 | - vus->addr = addr; | ||
649 | + vhost_user_server_stop(&vexp->vu_server); | ||
650 | } | ||
651 | |||
652 | -static char *vu_get_unix_socket(Object *obj, Error **errp) | ||
653 | +static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, | ||
654 | + Error **errp) | ||
655 | { | ||
656 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
657 | - return g_strdup(vus->addr->u.q_unix.path); | ||
658 | -} | ||
659 | - | ||
660 | -static bool vu_get_block_writable(Object *obj, Error **errp) | ||
661 | -{ | ||
662 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
663 | - return vus->writable; | ||
664 | -} | ||
665 | - | ||
666 | -static void vu_set_block_writable(Object *obj, bool value, Error **errp) | ||
667 | -{ | ||
668 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
669 | - | ||
670 | - if (!vu_prop_modifiable(vus, errp)) { | ||
671 | - return; | ||
672 | - } | ||
673 | - | ||
674 | - vus->writable = value; | ||
675 | -} | ||
676 | - | ||
677 | -static void vu_get_blk_size(Object *obj, Visitor *v, const char *name, | ||
678 | - void *opaque, Error **errp) | ||
679 | -{ | ||
680 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
681 | - uint32_t value = vus->blk_size; | ||
682 | - | ||
683 | - visit_type_uint32(v, name, &value, errp); | ||
684 | -} | ||
685 | - | ||
686 | -static void vu_set_blk_size(Object *obj, Visitor *v, const char *name, | ||
687 | - void *opaque, Error **errp) | ||
688 | -{ | ||
689 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
690 | - | ||
691 | + VuBlkExport *vexp = container_of(exp, VuBlkExport, export); | ||
692 | + BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk; | ||
693 | Error *local_err = NULL; | ||
694 | - uint32_t value; | ||
695 | + uint64_t logical_block_size; | ||
696 | |||
697 | - if (!vu_prop_modifiable(vus, errp)) { | ||
698 | - return; | ||
699 | - } | ||
700 | + vexp->writable = opts->writable; | ||
701 | + vexp->blkcfg.wce = 0; | ||
702 | |||
703 | - visit_type_uint32(v, name, &value, &local_err); | ||
704 | - if (local_err) { | ||
35 | - goto out; | 705 | - goto out; |
36 | - } | 706 | + if (vu_opts->has_logical_block_size) { |
37 | + ret = parallels_check_leak(bs, res, fix); | 707 | + logical_block_size = vu_opts->logical_block_size; |
38 | + if (ret < 0) { | 708 | + } else { |
39 | + return ret; | 709 | + logical_block_size = BDRV_SECTOR_SIZE; |
40 | + } | 710 | } |
41 | 711 | - | |
42 | - ret = parallels_check_leak(bs, res, fix); | 712 | - check_block_size(object_get_typename(obj), name, value, &local_err); |
43 | - if (ret < 0) { | 713 | + check_block_size(exp->id, "logical-block-size", logical_block_size, |
714 | + &local_err); | ||
715 | if (local_err) { | ||
44 | - goto out; | 716 | - goto out; |
45 | + parallels_collect_statistics(bs, res, fix); | 717 | + error_propagate(errp, local_err); |
718 | + return -EINVAL; | ||
719 | + } | ||
720 | + vexp->blk_size = logical_block_size; | ||
721 | + blk_set_guest_block_size(exp->blk, logical_block_size); | ||
722 | + vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg, | ||
723 | + logical_block_size); | ||
724 | + | ||
725 | + blk_set_allow_aio_context_change(exp->blk, true); | ||
726 | + blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, | ||
727 | + vexp); | ||
728 | + | ||
729 | + if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx, | ||
730 | + VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface, | ||
731 | + errp)) { | ||
732 | + blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, | ||
733 | + blk_aio_detach, vexp); | ||
734 | + return -EADDRNOTAVAIL; | ||
46 | } | 735 | } |
47 | 736 | ||
48 | - parallels_collect_statistics(bs, res, fix); | 737 | - vus->blk_size = value; |
49 | - | 738 | - |
50 | -out: | 739 | -out: |
51 | - qemu_co_mutex_unlock(&s->lock); | 740 | - error_propagate(errp, local_err); |
52 | - | 741 | -} |
53 | - if (ret == 0) { | 742 | - |
54 | - ret = bdrv_co_flush(bs); | 743 | -static void vhost_user_blk_server_instance_finalize(Object *obj) |
55 | - if (ret < 0) { | 744 | -{ |
56 | - res->check_errors++; | 745 | - VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj); |
57 | - } | 746 | - |
58 | + ret = bdrv_co_flush(bs); | 747 | - vhost_user_blk_server_stop(vub); |
59 | + if (ret < 0) { | 748 | - |
60 | + res->check_errors++; | 749 | - /* |
61 | } | 750 | - * Unlike object_property_add_str, object_class_property_add_str |
62 | 751 | - * doesn't have a release method. Thus manual memory freeing is | |
63 | return ret; | 752 | - * needed. |
753 | - */ | ||
754 | - free_socket_addr(vub->addr); | ||
755 | - g_free(vub->node_name); | ||
756 | -} | ||
757 | - | ||
758 | -static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp) | ||
759 | -{ | ||
760 | - VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj); | ||
761 | - | ||
762 | - vhost_user_blk_server_start(vub, errp); | ||
763 | + return 0; | ||
764 | } | ||
765 | |||
766 | -static void vhost_user_blk_server_class_init(ObjectClass *klass, | ||
767 | - void *class_data) | ||
768 | +static void vu_blk_exp_delete(BlockExport *exp) | ||
769 | { | ||
770 | - UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass); | ||
771 | - ucc->complete = vhost_user_blk_server_complete; | ||
772 | - | ||
773 | - object_class_property_add_bool(klass, "writable", | ||
774 | - vu_get_block_writable, | ||
775 | - vu_set_block_writable); | ||
776 | - | ||
777 | - object_class_property_add_str(klass, "node-name", | ||
778 | - vu_get_node_name, | ||
779 | - vu_set_node_name); | ||
780 | - | ||
781 | - object_class_property_add_str(klass, "unix-socket", | ||
782 | - vu_get_unix_socket, | ||
783 | - vu_set_unix_socket); | ||
784 | + VuBlkExport *vexp = container_of(exp, VuBlkExport, export); | ||
785 | |||
786 | - object_class_property_add(klass, "logical-block-size", "uint32", | ||
787 | - vu_get_blk_size, vu_set_blk_size, | ||
788 | - NULL, NULL); | ||
789 | + blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, | ||
790 | + vexp); | ||
791 | } | ||
792 | |||
793 | -static const TypeInfo vhost_user_blk_server_info = { | ||
794 | - .name = TYPE_VHOST_USER_BLK_SERVER, | ||
795 | - .parent = TYPE_OBJECT, | ||
796 | - .instance_size = sizeof(VuBlockDev), | ||
797 | - .instance_finalize = vhost_user_blk_server_instance_finalize, | ||
798 | - .class_init = vhost_user_blk_server_class_init, | ||
799 | - .interfaces = (InterfaceInfo[]) { | ||
800 | - {TYPE_USER_CREATABLE}, | ||
801 | - {} | ||
802 | - }, | ||
803 | +const BlockExportDriver blk_exp_vhost_user_blk = { | ||
804 | + .type = BLOCK_EXPORT_TYPE_VHOST_USER_BLK, | ||
805 | + .instance_size = sizeof(VuBlkExport), | ||
806 | + .create = vu_blk_exp_create, | ||
807 | + .delete = vu_blk_exp_delete, | ||
808 | + .request_shutdown = vu_blk_exp_request_shutdown, | ||
809 | }; | ||
810 | - | ||
811 | -static void vhost_user_blk_server_register_types(void) | ||
812 | -{ | ||
813 | - type_register_static(&vhost_user_blk_server_info); | ||
814 | -} | ||
815 | - | ||
816 | -type_init(vhost_user_blk_server_register_types) | ||
817 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
818 | index XXXXXXX..XXXXXXX 100644 | ||
819 | --- a/util/vhost-user-server.c | ||
820 | +++ b/util/vhost-user-server.c | ||
821 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
822 | Error **errp) | ||
823 | { | ||
824 | QEMUBH *bh; | ||
825 | - QIONetListener *listener = qio_net_listener_new(); | ||
826 | + QIONetListener *listener; | ||
827 | + | ||
828 | + if (socket_addr->type != SOCKET_ADDRESS_TYPE_UNIX && | ||
829 | + socket_addr->type != SOCKET_ADDRESS_TYPE_FD) { | ||
830 | + error_setg(errp, "Only socket address types 'unix' and 'fd' are supported"); | ||
831 | + return false; | ||
832 | + } | ||
833 | + | ||
834 | + listener = qio_net_listener_new(); | ||
835 | if (qio_net_listener_open_sync(listener, socket_addr, 1, | ||
836 | errp) < 0) { | ||
837 | object_unref(OBJECT(listener)); | ||
838 | diff --git a/block/export/meson.build b/block/export/meson.build | ||
839 | index XXXXXXX..XXXXXXX 100644 | ||
840 | --- a/block/export/meson.build | ||
841 | +++ b/block/export/meson.build | ||
842 | @@ -1 +1,2 @@ | ||
843 | block_ss.add(files('export.c')) | ||
844 | +block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c')) | ||
845 | diff --git a/block/meson.build b/block/meson.build | ||
846 | index XXXXXXX..XXXXXXX 100644 | ||
847 | --- a/block/meson.build | ||
848 | +++ b/block/meson.build | ||
849 | @@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c') | ||
850 | block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit]) | ||
851 | block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c')) | ||
852 | block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c')) | ||
853 | -block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c')) | ||
854 | block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c')) | ||
855 | block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c')) | ||
856 | block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c')) | ||
64 | -- | 857 | -- |
65 | 2.40.1 | 858 | 2.26.2 |
859 | diff view generated by jsdifflib |
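For illustration only (the image path, node name, export id and socket path below are invented, and the exact option spelling may differ between QEMU versions), the converted vhost-user-blk export is meant to be driven through the generic block export interface, e.g. from qemu-storage-daemon:

    # Serve disk.img over vhost-user-blk on a UNIX socket (illustrative sketch)
    qemu-storage-daemon \
        --blockdev driver=file,filename=disk.img,node-name=disk0 \
        --export type=vhost-user-blk,id=exp0,node-name=disk0,writable=on,addr.type=unix,addr.path=/tmp/vhost-user-blk.sock

A guest front-end (e.g. a vhost-user-blk-pci device) would then connect to the same socket; only 'unix' and 'fd' socket address types are accepted, as enforced in vhost_user_server_start() above.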
New patch | |||
---|---|---|---|
1 | Headers used by other subsystems are located in include/. Also add the | ||
2 | vhost-user-server and vhost-user-blk-server headers to MAINTAINERS. | ||
1 | 3 | ||
4 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
5 | Message-id: 20200924151549.913737-13-stefanha@redhat.com | ||
6 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
7 | --- | ||
8 | MAINTAINERS | 4 +++- | ||
9 | {util => include/qemu}/vhost-user-server.h | 0 | ||
10 | block/export/vhost-user-blk-server.c | 2 +- | ||
11 | util/vhost-user-server.c | 2 +- | ||
12 | 4 files changed, 5 insertions(+), 3 deletions(-) | ||
13 | rename {util => include/qemu}/vhost-user-server.h (100%) | ||
14 | |||
15 | diff --git a/MAINTAINERS b/MAINTAINERS | ||
16 | index XXXXXXX..XXXXXXX 100644 | ||
17 | --- a/MAINTAINERS | ||
18 | +++ b/MAINTAINERS | ||
19 | @@ -XXX,XX +XXX,XX @@ Vhost-user block device backend server | ||
20 | M: Coiby Xu <Coiby.Xu@gmail.com> | ||
21 | S: Maintained | ||
22 | F: block/export/vhost-user-blk-server.c | ||
23 | -F: util/vhost-user-server.c | ||
24 | +F: block/export/vhost-user-blk-server.h | ||
25 | +F: include/qemu/vhost-user-server.h | ||
26 | F: tests/qtest/libqos/vhost-user-blk.c | ||
27 | +F: util/vhost-user-server.c | ||
28 | |||
29 | Replication | ||
30 | M: Wen Congyang <wencongyang2@huawei.com> | ||
31 | diff --git a/util/vhost-user-server.h b/include/qemu/vhost-user-server.h | ||
32 | similarity index 100% | ||
33 | rename from util/vhost-user-server.h | ||
34 | rename to include/qemu/vhost-user-server.h | ||
35 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
36 | index XXXXXXX..XXXXXXX 100644 | ||
37 | --- a/block/export/vhost-user-blk-server.c | ||
38 | +++ b/block/export/vhost-user-blk-server.c | ||
39 | @@ -XXX,XX +XXX,XX @@ | ||
40 | #include "block/block.h" | ||
41 | #include "contrib/libvhost-user/libvhost-user.h" | ||
42 | #include "standard-headers/linux/virtio_blk.h" | ||
43 | -#include "util/vhost-user-server.h" | ||
44 | +#include "qemu/vhost-user-server.h" | ||
45 | #include "vhost-user-blk-server.h" | ||
46 | #include "qapi/error.h" | ||
47 | #include "qom/object_interfaces.h" | ||
48 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
49 | index XXXXXXX..XXXXXXX 100644 | ||
50 | --- a/util/vhost-user-server.c | ||
51 | +++ b/util/vhost-user-server.c | ||
52 | @@ -XXX,XX +XXX,XX @@ | ||
53 | */ | ||
54 | #include "qemu/osdep.h" | ||
55 | #include "qemu/main-loop.h" | ||
56 | +#include "qemu/vhost-user-server.h" | ||
57 | #include "block/aio-wait.h" | ||
58 | -#include "vhost-user-server.h" | ||
59 | |||
60 | /* | ||
61 | * Theory of operation: | ||
62 | -- | ||
63 | 2.26.2 | ||
64 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build | ||
2 | the static library once and then reuse it throughout QEMU. | ||
1 | 3 | ||
4 | Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the | ||
5 | vhost-user tools (vhost-user-gpu, etc) do. | ||
6 | |||
7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
8 | Message-id: 20200924151549.913737-14-stefanha@redhat.com | ||
9 | [Added CONFIG_LINUX again because libvhost-user doesn't build on macOS. | ||
10 | --Stefan] | ||
11 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
12 | --- | ||
13 | block/export/export.c | 8 ++++---- | ||
14 | block/export/meson.build | 2 +- | ||
15 | contrib/libvhost-user/meson.build | 1 + | ||
16 | meson.build | 6 +++++- | ||
17 | util/meson.build | 4 +++- | ||
18 | 5 files changed, 14 insertions(+), 7 deletions(-) | ||
19 | |||
20 | diff --git a/block/export/export.c b/block/export/export.c | ||
21 | index XXXXXXX..XXXXXXX 100644 | ||
22 | --- a/block/export/export.c | ||
23 | +++ b/block/export/export.c | ||
24 | @@ -XXX,XX +XXX,XX @@ | ||
25 | #include "sysemu/block-backend.h" | ||
26 | #include "block/export.h" | ||
27 | #include "block/nbd.h" | ||
28 | -#if CONFIG_LINUX | ||
29 | -#include "block/export/vhost-user-blk-server.h" | ||
30 | -#endif | ||
31 | #include "qapi/error.h" | ||
32 | #include "qapi/qapi-commands-block-export.h" | ||
33 | #include "qapi/qapi-events-block-export.h" | ||
34 | #include "qemu/id.h" | ||
35 | +#ifdef CONFIG_VHOST_USER | ||
36 | +#include "vhost-user-blk-server.h" | ||
37 | +#endif | ||
38 | |||
39 | static const BlockExportDriver *blk_exp_drivers[] = { | ||
40 | &blk_exp_nbd, | ||
41 | -#if CONFIG_LINUX | ||
42 | +#ifdef CONFIG_VHOST_USER | ||
43 | &blk_exp_vhost_user_blk, | ||
44 | #endif | ||
45 | }; | ||
46 | diff --git a/block/export/meson.build b/block/export/meson.build | ||
47 | index XXXXXXX..XXXXXXX 100644 | ||
48 | --- a/block/export/meson.build | ||
49 | +++ b/block/export/meson.build | ||
50 | @@ -XXX,XX +XXX,XX @@ | ||
51 | block_ss.add(files('export.c')) | ||
52 | -block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c')) | ||
53 | +block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c')) | ||
54 | diff --git a/contrib/libvhost-user/meson.build b/contrib/libvhost-user/meson.build | ||
55 | index XXXXXXX..XXXXXXX 100644 | ||
56 | --- a/contrib/libvhost-user/meson.build | ||
57 | +++ b/contrib/libvhost-user/meson.build | ||
58 | @@ -XXX,XX +XXX,XX @@ | ||
59 | libvhost_user = static_library('vhost-user', | ||
60 | files('libvhost-user.c', 'libvhost-user-glib.c'), | ||
61 | build_by_default: false) | ||
62 | +vhost_user = declare_dependency(link_with: libvhost_user) | ||
63 | diff --git a/meson.build b/meson.build | ||
64 | index XXXXXXX..XXXXXXX 100644 | ||
65 | --- a/meson.build | ||
66 | +++ b/meson.build | ||
67 | @@ -XXX,XX +XXX,XX @@ trace_events_subdirs += [ | ||
68 | 'util', | ||
69 | ] | ||
70 | |||
71 | +vhost_user = not_found | ||
72 | +if 'CONFIG_VHOST_USER' in config_host | ||
73 | + subdir('contrib/libvhost-user') | ||
74 | +endif | ||
75 | + | ||
76 | subdir('qapi') | ||
77 | subdir('qobject') | ||
78 | subdir('stubs') | ||
79 | @@ -XXX,XX +XXX,XX @@ if have_tools | ||
80 | install: true) | ||
81 | |||
82 | if 'CONFIG_VHOST_USER' in config_host | ||
83 | - subdir('contrib/libvhost-user') | ||
84 | subdir('contrib/vhost-user-blk') | ||
85 | subdir('contrib/vhost-user-gpu') | ||
86 | subdir('contrib/vhost-user-input') | ||
87 | diff --git a/util/meson.build b/util/meson.build | ||
88 | index XXXXXXX..XXXXXXX 100644 | ||
89 | --- a/util/meson.build | ||
90 | +++ b/util/meson.build | ||
91 | @@ -XXX,XX +XXX,XX @@ if have_block | ||
92 | util_ss.add(files('main-loop.c')) | ||
93 | util_ss.add(files('nvdimm-utils.c')) | ||
94 | util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c')) | ||
95 | - util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c')) | ||
96 | + util_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: [ | ||
97 | + files('vhost-user-server.c'), vhost_user | ||
98 | + ]) | ||
99 | util_ss.add(files('block-helpers.c')) | ||
100 | util_ss.add(files('qemu-coroutine-sleep.c')) | ||
101 | util_ss.add(files('qemu-co-shared-resource.c')) | ||
102 | -- | ||
103 | 2.26.2 | ||
104 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | Introduce libblockdev.fa to avoid recompiling blockdev_ss twice. | ||
1 | 2 | ||
3 | Suggested-by: Paolo Bonzini <pbonzini@redhat.com> | ||
4 | Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> | ||
5 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
6 | Message-id: 20200929125516.186715-3-stefanha@redhat.com | ||
7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
8 | --- | ||
9 | meson.build | 12 ++++++++++-- | ||
10 | storage-daemon/meson.build | 3 +-- | ||
11 | 2 files changed, 11 insertions(+), 4 deletions(-) | ||
12 | |||
13 | diff --git a/meson.build b/meson.build | ||
14 | index XXXXXXX..XXXXXXX 100644 | ||
15 | --- a/meson.build | ||
16 | +++ b/meson.build | ||
17 | @@ -XXX,XX +XXX,XX @@ blockdev_ss.add(files( | ||
18 | # os-win32.c does not | ||
19 | blockdev_ss.add(when: 'CONFIG_POSIX', if_true: files('os-posix.c')) | ||
20 | softmmu_ss.add(when: 'CONFIG_WIN32', if_true: [files('os-win32.c')]) | ||
21 | -softmmu_ss.add_all(blockdev_ss) | ||
22 | |||
23 | common_ss.add(files('cpus-common.c')) | ||
24 | |||
25 | @@ -XXX,XX +XXX,XX @@ block = declare_dependency(link_whole: [libblock], | ||
26 | link_args: '@block.syms', | ||
27 | dependencies: [crypto, io]) | ||
28 | |||
29 | +blockdev_ss = blockdev_ss.apply(config_host, strict: false) | ||
30 | +libblockdev = static_library('blockdev', blockdev_ss.sources() + genh, | ||
31 | + dependencies: blockdev_ss.dependencies(), | ||
32 | + name_suffix: 'fa', | ||
33 | + build_by_default: false) | ||
34 | + | ||
35 | +blockdev = declare_dependency(link_whole: [libblockdev], | ||
36 | + dependencies: [block]) | ||
37 | + | ||
38 | qmp_ss = qmp_ss.apply(config_host, strict: false) | ||
39 | libqmp = static_library('qmp', qmp_ss.sources() + genh, | ||
40 | dependencies: qmp_ss.dependencies(), | ||
41 | @@ -XXX,XX +XXX,XX @@ foreach m : block_mods + softmmu_mods | ||
42 | install_dir: config_host['qemu_moddir']) | ||
43 | endforeach | ||
44 | |||
45 | -softmmu_ss.add(authz, block, chardev, crypto, io, qmp) | ||
46 | +softmmu_ss.add(authz, blockdev, chardev, crypto, io, qmp) | ||
47 | common_ss.add(qom, qemuutil) | ||
48 | |||
49 | common_ss.add_all(when: 'CONFIG_SOFTMMU', if_true: [softmmu_ss]) | ||
50 | diff --git a/storage-daemon/meson.build b/storage-daemon/meson.build | ||
51 | index XXXXXXX..XXXXXXX 100644 | ||
52 | --- a/storage-daemon/meson.build | ||
53 | +++ b/storage-daemon/meson.build | ||
54 | @@ -XXX,XX +XXX,XX @@ | ||
55 | qsd_ss = ss.source_set() | ||
56 | qsd_ss.add(files('qemu-storage-daemon.c')) | ||
57 | -qsd_ss.add(block, chardev, qmp, qom, qemuutil) | ||
58 | -qsd_ss.add_all(blockdev_ss) | ||
59 | +qsd_ss.add(blockdev, chardev, qmp, qom, qemuutil) | ||
60 | |||
61 | subdir('qapi') | ||
62 | |||
63 | -- | ||
64 | 2.26.2 | ||
65 | diff view generated by jsdifflib |
1 | Test that even vectored IO requests with 1024 vector elements that are | 1 | Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd. |
---|---|---|---|
2 | not aligned to the device's request alignment will succeed. | 2 | They are not used by other programs and are not otherwise needed in |
3 | libblock. | ||
3 | 4 | ||
5 | Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss. | ||
6 | Since bdrv_close_all() (libblock) calls blk_exp_close_all() | ||
7 | (libblockdev), a stub function is required. | ||
8 | |||
9 | Make qemu-nbd.c use signal handling utility functions instead of | ||
10 | duplicating the code. This helps because os-posix.c is in libblockdev | ||
11 | and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks. | ||
12 | Once we use the signal handling utility functions we also end up | ||
13 | providing the necessary symbol. | ||
14 | |||
15 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
16 | Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> | ||
4 | Reviewed-by: Eric Blake <eblake@redhat.com> | 17 | Reviewed-by: Eric Blake <eblake@redhat.com> |
5 | Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> | 18 | Message-id: 20200929125516.186715-4-stefanha@redhat.com |
6 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 19 | [Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake |
7 | Message-Id: <20230411173418.19549-5-hreitz@redhat.com> | 20 | --Stefan] |
21 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
8 | --- | 22 | --- |
9 | tests/qemu-iotests/tests/iov-padding | 85 ++++++++++++++++++++++++ | 23 | qemu-nbd.c | 21 ++++++++------------- |
10 | tests/qemu-iotests/tests/iov-padding.out | 59 ++++++++++++++++ | 24 | stubs/blk-exp-close-all.c | 7 +++++++ |
11 | 2 files changed, 144 insertions(+) | 25 | block/export/meson.build | 4 ++-- |
12 | create mode 100755 tests/qemu-iotests/tests/iov-padding | 26 | meson.build | 4 ++-- |
13 | create mode 100644 tests/qemu-iotests/tests/iov-padding.out | 27 | nbd/meson.build | 2 ++ |
28 | stubs/meson.build | 1 + | ||
29 | 6 files changed, 22 insertions(+), 17 deletions(-) | ||
30 | create mode 100644 stubs/blk-exp-close-all.c | ||
14 | 31 | ||
15 | diff --git a/tests/qemu-iotests/tests/iov-padding b/tests/qemu-iotests/tests/iov-padding | 32 | diff --git a/qemu-nbd.c b/qemu-nbd.c |
16 | new file mode 100755 | 33 | index XXXXXXX..XXXXXXX 100644 |
17 | index XXXXXXX..XXXXXXX | 34 | --- a/qemu-nbd.c |
18 | --- /dev/null | 35 | +++ b/qemu-nbd.c |
19 | +++ b/tests/qemu-iotests/tests/iov-padding | ||
20 | @@ -XXX,XX +XXX,XX @@ | 36 | @@ -XXX,XX +XXX,XX @@ |
21 | +#!/usr/bin/env bash | 37 | #include "qapi/error.h" |
22 | +# group: rw quick | 38 | #include "qemu/cutils.h" |
23 | +# | 39 | #include "sysemu/block-backend.h" |
24 | +# Check the interaction of request padding (to fit alignment restrictions) with | 40 | +#include "sysemu/runstate.h" /* for qemu_system_killed() prototype */ |
25 | +# vectored I/O from the guest | 41 | #include "block/block_int.h" |
26 | +# | 42 | #include "block/nbd.h" |
27 | +# Copyright Red Hat | 43 | #include "qemu/main-loop.h" |
28 | +# | 44 | @@ -XXX,XX +XXX,XX @@ QEMU_COPYRIGHT "\n" |
29 | +# This program is free software; you can redistribute it and/or modify | 45 | } |
30 | +# it under the terms of the GNU General Public License as published by | 46 | |
31 | +# the Free Software Foundation; either version 2 of the License, or | 47 | #ifdef CONFIG_POSIX |
32 | +# (at your option) any later version. | 48 | -static void termsig_handler(int signum) |
33 | +# | 49 | +/* |
34 | +# This program is distributed in the hope that it will be useful, | 50 | + * The client thread uses SIGTERM to interrupt the server. A signal |
35 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of | 51 | + * handler ensures that "qemu-nbd -v -c" exits with a nice status code. |
36 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | 52 | + */ |
37 | +# GNU General Public License for more details. | 53 | +void qemu_system_killed(int signum, pid_t pid) |
38 | +# | 54 | { |
39 | +# You should have received a copy of the GNU General Public License | 55 | qatomic_cmpxchg(&state, RUNNING, TERMINATE); |
40 | +# along with this program. If not, see <http://www.gnu.org/licenses/>. | 56 | qemu_notify_event(); |
41 | +# | 57 | @@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv) |
42 | + | 58 | BlockExportOptions *export_opts; |
43 | +seq=$(basename $0) | 59 | |
44 | +echo "QA output created by $seq" | 60 | #ifdef CONFIG_POSIX |
45 | + | 61 | - /* |
46 | +status=1 # failure is the default! | 62 | - * Exit gracefully on various signals, which includes SIGTERM used |
47 | + | 63 | - * by 'qemu-nbd -v -c'. |
48 | +_cleanup() | 64 | - */ |
49 | +{ | 65 | - struct sigaction sa_sigterm; |
50 | + _cleanup_test_img | 66 | - memset(&sa_sigterm, 0, sizeof(sa_sigterm)); |
51 | +} | 67 | - sa_sigterm.sa_handler = termsig_handler; |
52 | +trap "_cleanup; exit \$status" 0 1 2 3 15 | 68 | - sigaction(SIGTERM, &sa_sigterm, NULL); |
53 | + | 69 | - sigaction(SIGINT, &sa_sigterm, NULL); |
54 | +# get standard environment, filters and checks | 70 | - sigaction(SIGHUP, &sa_sigterm, NULL); |
55 | +cd .. | 71 | - |
56 | +. ./common.rc | 72 | - signal(SIGPIPE, SIG_IGN); |
57 | +. ./common.filter | 73 | + os_setup_early_signal_handling(); |
58 | + | 74 | + os_setup_signal_handling(); |
59 | +_supported_fmt raw | 75 | #endif |
60 | +_supported_proto file | 76 | |
61 | + | 77 | socket_init(); |
62 | +_make_test_img 1M | 78 | diff --git a/stubs/blk-exp-close-all.c b/stubs/blk-exp-close-all.c |
63 | + | ||
64 | +IMGSPEC="driver=blkdebug,align=4096,image.driver=file,image.filename=$TEST_IMG" | ||
65 | + | ||
66 | +# Four combinations: | ||
67 | +# - Offset 4096, length 1023 * 512 + 512: Fully aligned to 4k | ||
68 | +# - Offset 4096, length 1023 * 512 + 4096: Head is aligned, tail is not | ||
69 | +# - Offset 512, length 1023 * 512 + 512: Neither head nor tail are aligned | ||
70 | +# - Offset 512, length 1023 * 512 + 4096: Tail is aligned, head is not | ||
71 | +for start_offset in 4096 512; do | ||
72 | + for last_element_length in 512 4096; do | ||
73 | + length=$((1023 * 512 + $last_element_length)) | ||
74 | + | ||
75 | + echo | ||
76 | + echo "== performing 1024-element vectored requests to image (offset: $start_offset; length: $length) ==" | ||
77 | + | ||
78 | + # Fill with data for testing | ||
79 | + $QEMU_IO -c 'write -P 1 0 1M' "$TEST_IMG" | _filter_qemu_io | ||
80 | + | ||
81 | + # 1023 512-byte buffers, and then one with length $last_element_length | ||
82 | + cmd_params="-P 2 $start_offset $(yes 512 | head -n 1023 | tr '\n' ' ') $last_element_length" | ||
83 | + QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS_NO_FMT" $QEMU_IO \ | ||
84 | + -c "writev $cmd_params" \ | ||
85 | + --image-opts \ | ||
86 | + "$IMGSPEC" \ | ||
87 | + | _filter_qemu_io | ||
88 | + | ||
89 | + # Read all patterns -- read the part we just wrote with writev twice, | ||
90 | + # once "normally", and once with a readv, so we see that that works, too | ||
91 | + QEMU_IO_OPTIONS="$QEMU_IO_OPTIONS_NO_FMT" $QEMU_IO \ | ||
92 | + -c "read -P 1 0 $start_offset" \ | ||
93 | + -c "read -P 2 $start_offset $length" \ | ||
94 | + -c "readv $cmd_params" \ | ||
95 | + -c "read -P 1 $((start_offset + length)) $((1024 * 1024 - length - start_offset))" \ | ||
96 | + --image-opts \ | ||
97 | + "$IMGSPEC" \ | ||
98 | + | _filter_qemu_io | ||
99 | + done | ||
100 | +done | ||
101 | + | ||
102 | +# success, all done | ||
103 | +echo "*** done" | ||
104 | +rm -f $seq.full | ||
105 | +status=0 | ||
106 | diff --git a/tests/qemu-iotests/tests/iov-padding.out b/tests/qemu-iotests/tests/iov-padding.out | ||
107 | new file mode 100644 | 79 | new file mode 100644 |
108 | index XXXXXXX..XXXXXXX | 80 | index XXXXXXX..XXXXXXX |
109 | --- /dev/null | 81 | --- /dev/null |
110 | +++ b/tests/qemu-iotests/tests/iov-padding.out | 82 | +++ b/stubs/blk-exp-close-all.c |
111 | @@ -XXX,XX +XXX,XX @@ | 83 | @@ -XXX,XX +XXX,XX @@ |
112 | +QA output created by iov-padding | 84 | +#include "qemu/osdep.h" |
113 | +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 | 85 | +#include "block/export.h" |
114 | + | 86 | + |
115 | +== performing 1024-element vectored requests to image (offset: 4096; length: 524288) == | 87 | +/* Only used in programs that support block exports (libblockdev.fa) */ |
116 | +wrote 1048576/1048576 bytes at offset 0 | 88 | +void blk_exp_close_all(void) |
117 | +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 89 | +{ |
118 | +wrote 524288/524288 bytes at offset 4096 | 90 | +} |
119 | +512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 91 | diff --git a/block/export/meson.build b/block/export/meson.build |
120 | +read 4096/4096 bytes at offset 0 | 92 | index XXXXXXX..XXXXXXX 100644 |
121 | +4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 93 | --- a/block/export/meson.build |
122 | +read 524288/524288 bytes at offset 4096 | 94 | +++ b/block/export/meson.build |
123 | +512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 95 | @@ -XXX,XX +XXX,XX @@ |
124 | +read 524288/524288 bytes at offset 4096 | 96 | -block_ss.add(files('export.c')) |
125 | +512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 97 | -block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c')) |
126 | +read 520192/520192 bytes at offset 528384 | 98 | +blockdev_ss.add(files('export.c')) |
127 | +508 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 99 | +blockdev_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c')) |
128 | + | 100 | diff --git a/meson.build b/meson.build |
129 | +== performing 1024-element vectored requests to image (offset: 4096; length: 527872) == | 101 | index XXXXXXX..XXXXXXX 100644 |
130 | +wrote 1048576/1048576 bytes at offset 0 | 102 | --- a/meson.build |
131 | +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 103 | +++ b/meson.build |
132 | +wrote 527872/527872 bytes at offset 4096 | 104 | @@ -XXX,XX +XXX,XX @@ subdir('dump') |
133 | +515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 105 | |
134 | +read 4096/4096 bytes at offset 0 | 106 | block_ss.add(files( |
135 | +4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 107 | 'block.c', |
136 | +read 527872/527872 bytes at offset 4096 | 108 | - 'blockdev-nbd.c', |
137 | +515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 109 | 'blockjob.c', |
138 | +read 527872/527872 bytes at offset 4096 | 110 | 'job.c', |
139 | +515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 111 | 'qemu-io-cmds.c', |
140 | +read 516608/516608 bytes at offset 531968 | 112 | @@ -XXX,XX +XXX,XX @@ subdir('block') |
141 | +504.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 113 | |
142 | + | 114 | blockdev_ss.add(files( |
143 | +== performing 1024-element vectored requests to image (offset: 512; length: 524288) == | 115 | 'blockdev.c', |
144 | +wrote 1048576/1048576 bytes at offset 0 | 116 | + 'blockdev-nbd.c', |
145 | +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 117 | 'iothread.c', |
146 | +wrote 524288/524288 bytes at offset 512 | 118 | 'job-qmp.c', |
147 | +512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 119 | )) |
148 | +read 512/512 bytes at offset 0 | 120 | @@ -XXX,XX +XXX,XX @@ if have_tools |
149 | +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 121 | qemu_io = executable('qemu-io', files('qemu-io.c'), |
150 | +read 524288/524288 bytes at offset 512 | 122 | dependencies: [block, qemuutil], install: true) |
151 | +512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 123 | qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'), |
152 | +read 524288/524288 bytes at offset 512 | 124 | - dependencies: [block, qemuutil], install: true) |
153 | +512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 125 | + dependencies: [blockdev, qemuutil], install: true) |
154 | +read 523776/523776 bytes at offset 524800 | 126 | |
155 | +511.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 127 | subdir('storage-daemon') |
156 | + | 128 | subdir('contrib/rdmacm-mux') |
157 | +== performing 1024-element vectored requests to image (offset: 512; length: 527872) == | 129 | diff --git a/nbd/meson.build b/nbd/meson.build |
158 | +wrote 1048576/1048576 bytes at offset 0 | 130 | index XXXXXXX..XXXXXXX 100644 |
159 | +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 131 | --- a/nbd/meson.build |
160 | +wrote 527872/527872 bytes at offset 512 | 132 | +++ b/nbd/meson.build |
161 | +515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 133 | @@ -XXX,XX +XXX,XX @@ |
162 | +read 512/512 bytes at offset 0 | 134 | block_ss.add(files( |
163 | +512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 135 | 'client.c', |
164 | +read 527872/527872 bytes at offset 512 | 136 | 'common.c', |
165 | +515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 137 | +)) |
166 | +read 527872/527872 bytes at offset 512 | 138 | +blockdev_ss.add(files( |
167 | +515.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 139 | 'server.c', |
168 | +read 520192/520192 bytes at offset 528384 | 140 | )) |
169 | +508 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | 141 | diff --git a/stubs/meson.build b/stubs/meson.build |
170 | +*** done | 142 | index XXXXXXX..XXXXXXX 100644 |
143 | --- a/stubs/meson.build | ||
144 | +++ b/stubs/meson.build | ||
145 | @@ -XXX,XX +XXX,XX @@ | ||
146 | stub_ss.add(files('arch_type.c')) | ||
147 | stub_ss.add(files('bdrv-next-monitor-owned.c')) | ||
148 | stub_ss.add(files('blk-commit-all.c')) | ||
149 | +stub_ss.add(files('blk-exp-close-all.c')) | ||
150 | stub_ss.add(files('blockdev-close-all-bdrv-states.c')) | ||
151 | stub_ss.add(files('change-state-handler.c')) | ||
152 | stub_ss.add(files('cmos.c')) | ||
171 | -- | 153 | -- |
172 | 2.40.1 | 154 | 2.26.2 |
155 | diff view generated by jsdifflib |
1 | From: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 1 | Make it possible to specify the iothread where the export will run. By |
---|---|---|---|
2 | default the block node can be moved to other AioContexts later and the | ||
3 | export will follow. The fixed-iothread option forces strict behavior | ||
4 | that prevents changing AioContext while the export is active. See the | ||
5 | QAPI docs for details. | ||
2 | 6 | ||
3 | We will add more and more checks so we need a better code structure | 7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
4 | in parallels_co_check. Let each check performs in a separate loop | 8 | Message-id: 20200929125516.186715-5-stefanha@redhat.com |
5 | in a separate helper. | 9 | [Fix stray '#' character in block-export.json and add missing "(since: |
10 | 5.2)" as suggested by Eric Blake. | ||
11 | --Stefan] | ||
12 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
13 | --- | ||
14 | qapi/block-export.json | 11 ++++++++++ | ||
15 | block/export/export.c | 31 +++++++++++++++++++++++++++- | ||
16 | block/export/vhost-user-blk-server.c | 5 ++++- | ||
17 | nbd/server.c | 2 -- | ||
18 | 4 files changed, 45 insertions(+), 4 deletions(-) | ||
6 | 19 | ||
7 | Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 20 | diff --git a/qapi/block-export.json b/qapi/block-export.json |
8 | Message-Id: <20230424093147.197643-10-alexander.ivanov@virtuozzo.com> | ||
9 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | ||
10 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
11 | --- | ||
12 | block/parallels.c | 74 ++++++++++++++++++++++++++++------------------- | ||
13 | 1 file changed, 45 insertions(+), 29 deletions(-) | ||
14 | |||
15 | diff --git a/block/parallels.c b/block/parallels.c | ||
16 | index XXXXXXX..XXXXXXX 100644 | 21 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/block/parallels.c | 22 | --- a/qapi/block-export.json |
18 | +++ b/block/parallels.c | 23 | +++ b/qapi/block-export.json |
19 | @@ -XXX,XX +XXX,XX @@ parallels_check_outside_image(BlockDriverState *bs, BdrvCheckResult *res, | 24 | @@ -XXX,XX +XXX,XX @@ |
20 | } | 25 | # export before completion is signalled. (since: 5.2; |
21 | 26 | # default: false) | |
22 | static int coroutine_fn GRAPH_RDLOCK | 27 | # |
23 | -parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 28 | +# @iothread: The name of the iothread object where the export will run. The |
24 | - BdrvCheckMode fix) | 29 | +# default is to use the thread currently associated with the |
25 | +parallels_check_leak(BlockDriverState *bs, BdrvCheckResult *res, | 30 | +# block node. (since: 5.2) |
26 | + BdrvCheckMode fix) | 31 | +# |
32 | +# @fixed-iothread: True prevents the block node from being moved to another | ||
33 | +# thread while the export is active. If true and @iothread is | ||
34 | +# given, export creation fails if the block node cannot be | ||
35 | +# moved to the iothread. The default is false. (since: 5.2) | ||
36 | +# | ||
37 | # Since: 4.2 | ||
38 | ## | ||
39 | { 'union': 'BlockExportOptions', | ||
40 | 'base': { 'type': 'BlockExportType', | ||
41 | 'id': 'str', | ||
42 | + '*fixed-iothread': 'bool', | ||
43 | + '*iothread': 'str', | ||
44 | 'node-name': 'str', | ||
45 | '*writable': 'bool', | ||
46 | '*writethrough': 'bool' }, | ||
47 | diff --git a/block/export/export.c b/block/export/export.c | ||
48 | index XXXXXXX..XXXXXXX 100644 | ||
49 | --- a/block/export/export.c | ||
50 | +++ b/block/export/export.c | ||
51 | @@ -XXX,XX +XXX,XX @@ | ||
52 | |||
53 | #include "block/block.h" | ||
54 | #include "sysemu/block-backend.h" | ||
55 | +#include "sysemu/iothread.h" | ||
56 | #include "block/export.h" | ||
57 | #include "block/nbd.h" | ||
58 | #include "qapi/error.h" | ||
59 | @@ -XXX,XX +XXX,XX @@ static const BlockExportDriver *blk_exp_find_driver(BlockExportType type) | ||
60 | |||
61 | BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp) | ||
27 | { | 62 | { |
28 | BDRVParallelsState *s = bs->opaque; | 63 | + bool fixed_iothread = export->has_fixed_iothread && export->fixed_iothread; |
29 | - int64_t size, prev_off; | 64 | const BlockExportDriver *drv; |
30 | + int64_t size; | 65 | BlockExport *exp = NULL; |
66 | BlockDriverState *bs; | ||
67 | - BlockBackend *blk; | ||
68 | + BlockBackend *blk = NULL; | ||
69 | AioContext *ctx; | ||
70 | uint64_t perm; | ||
31 | int ret; | 71 | int ret; |
32 | - uint32_t i; | 72 | @@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp) |
33 | 73 | ctx = bdrv_get_aio_context(bs); | |
34 | size = bdrv_getlength(bs->file->bs); | 74 | aio_context_acquire(ctx); |
35 | if (size < 0) { | 75 | |
36 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 76 | + if (export->has_iothread) { |
37 | return size; | 77 | + IOThread *iothread; |
38 | } | 78 | + AioContext *new_ctx; |
39 | |||
40 | + if (size > res->image_end_offset) { | ||
41 | + int64_t count; | ||
42 | + count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size); | ||
43 | + fprintf(stderr, "%s space leaked at the end of the image %" PRId64 "\n", | ||
44 | + fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR", | ||
45 | + size - res->image_end_offset); | ||
46 | + res->leaks += count; | ||
47 | + if (fix & BDRV_FIX_LEAKS) { | ||
48 | + Error *local_err = NULL; | ||
49 | + | 79 | + |
50 | + /* | 80 | + iothread = iothread_by_id(export->iothread); |
51 | + * In order to really repair the image, we must shrink it. | 81 | + if (!iothread) { |
52 | + * That means we have to pass exact=true. | 82 | + error_setg(errp, "iothread \"%s\" not found", export->iothread); |
53 | + */ | 83 | + goto fail; |
54 | + ret = bdrv_co_truncate(bs->file, res->image_end_offset, true, | 84 | + } |
55 | + PREALLOC_MODE_OFF, 0, &local_err); | 85 | + |
56 | + if (ret < 0) { | 86 | + new_ctx = iothread_get_aio_context(iothread); |
57 | + error_report_err(local_err); | 87 | + |
58 | + res->check_errors++; | 88 | + ret = bdrv_try_set_aio_context(bs, new_ctx, errp); |
59 | + return ret; | 89 | + if (ret == 0) { |
60 | + } | 90 | + aio_context_release(ctx); |
61 | + res->leaks_fixed += count; | 91 | + aio_context_acquire(new_ctx); |
92 | + ctx = new_ctx; | ||
93 | + } else if (fixed_iothread) { | ||
94 | + goto fail; | ||
62 | + } | 95 | + } |
63 | + } | 96 | + } |
64 | + | 97 | + |
65 | + return 0; | 98 | /* |
66 | +} | 99 | * Block exports are used for non-shared storage migration. Make sure |
100 | * that BDRV_O_INACTIVE is cleared and the image is ready for write | ||
101 | @@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp) | ||
102 | } | ||
103 | |||
104 | blk = blk_new(ctx, perm, BLK_PERM_ALL); | ||
67 | + | 105 | + |
68 | +static int coroutine_fn GRAPH_RDLOCK | 106 | + if (!fixed_iothread) { |
69 | +parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 107 | + blk_set_allow_aio_context_change(blk, true); |
70 | + BdrvCheckMode fix) | ||
71 | +{ | ||
72 | + BDRVParallelsState *s = bs->opaque; | ||
73 | + int64_t prev_off; | ||
74 | + int ret; | ||
75 | + uint32_t i; | ||
76 | + | ||
77 | qemu_co_mutex_lock(&s->lock); | ||
78 | |||
79 | parallels_check_unclean(bs, res, fix); | ||
80 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | ||
81 | goto out; | ||
82 | } | ||
83 | |||
84 | + ret = parallels_check_leak(bs, res, fix); | ||
85 | + if (ret < 0) { | ||
86 | + goto out; | ||
87 | + } | 108 | + } |
88 | + | 109 | + |
89 | res->bfi.total_clusters = s->bat_size; | 110 | ret = blk_insert_bs(blk, bs, errp); |
90 | res->bfi.compressed_clusters = 0; /* compression is not supported */ | 111 | if (ret < 0) { |
91 | 112 | goto fail; | |
92 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 113 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c |
93 | prev_off = off; | 114 | index XXXXXXX..XXXXXXX 100644 |
115 | --- a/block/export/vhost-user-blk-server.c | ||
116 | +++ b/block/export/vhost-user-blk-server.c | ||
117 | @@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_blk_iface = { | ||
118 | static void blk_aio_attached(AioContext *ctx, void *opaque) | ||
119 | { | ||
120 | VuBlkExport *vexp = opaque; | ||
121 | + | ||
122 | + vexp->export.ctx = ctx; | ||
123 | vhost_user_server_attach_aio_context(&vexp->vu_server, ctx); | ||
124 | } | ||
125 | |||
126 | static void blk_aio_detach(void *opaque) | ||
127 | { | ||
128 | VuBlkExport *vexp = opaque; | ||
129 | + | ||
130 | vhost_user_server_detach_aio_context(&vexp->vu_server); | ||
131 | + vexp->export.ctx = NULL; | ||
132 | } | ||
133 | |||
134 | static void | ||
135 | @@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, | ||
136 | vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg, | ||
137 | logical_block_size); | ||
138 | |||
139 | - blk_set_allow_aio_context_change(exp->blk, true); | ||
140 | blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, | ||
141 | vexp); | ||
142 | |||
143 | diff --git a/nbd/server.c b/nbd/server.c | ||
144 | index XXXXXXX..XXXXXXX 100644 | ||
145 | --- a/nbd/server.c | ||
146 | +++ b/nbd/server.c | ||
147 | @@ -XXX,XX +XXX,XX @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args, | ||
148 | return ret; | ||
94 | } | 149 | } |
95 | 150 | ||
96 | - if (size > res->image_end_offset) { | 151 | - blk_set_allow_aio_context_change(blk, true); |
97 | - int64_t count; | ||
98 | - count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size); | ||
99 | - fprintf(stderr, "%s space leaked at the end of the image %" PRId64 "\n", | ||
100 | - fix & BDRV_FIX_LEAKS ? "Repairing" : "ERROR", | ||
101 | - size - res->image_end_offset); | ||
102 | - res->leaks += count; | ||
103 | - if (fix & BDRV_FIX_LEAKS) { | ||
104 | - Error *local_err = NULL; | ||
105 | - | 152 | - |
106 | - /* | 153 | QTAILQ_INIT(&exp->clients); |
107 | - * In order to really repair the image, we must shrink it. | 154 | exp->name = g_strdup(arg->name); |
108 | - * That means we have to pass exact=true. | 155 | exp->description = g_strdup(arg->description); |
109 | - */ | ||
110 | - ret = bdrv_co_truncate(bs->file, res->image_end_offset, true, | ||
111 | - PREALLOC_MODE_OFF, 0, &local_err); | ||
112 | - if (ret < 0) { | ||
113 | - error_report_err(local_err); | ||
114 | - res->check_errors++; | ||
115 | - goto out; | ||
116 | - } | ||
117 | - res->leaks_fixed += count; | ||
118 | - } | ||
119 | - } | ||
120 | - | ||
121 | out: | ||
122 | qemu_co_mutex_unlock(&s->lock); | ||
123 | |||
124 | -- | 156 | -- |
125 | 2.40.1 | 157 | 2.26.2 |
158 | diff view generated by jsdifflib |
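For illustration (names and paths are made up, and the exact CLI spelling may vary between versions), the new iothread/fixed-iothread export options could be exercised like this:

    # Pin the export's processing to a dedicated iothread and refuse
    # later AioContext changes (illustrative sketch)
    qemu-storage-daemon \
        --object iothread,id=iothread0 \
        --blockdev driver=file,filename=disk.img,node-name=disk0 \
        --export type=vhost-user-blk,id=exp0,node-name=disk0,iothread=iothread0,fixed-iothread=on,addr.type=unix,addr.path=/tmp/vhost-user-blk.sock

With fixed-iothread=on, export creation fails if the block node cannot be moved to iothread0; with the default, the export simply follows the node when it changes AioContext.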
1 | From: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 1 | Allow the number of queues to be configured using --export |
---|---|---|---|
2 | vhost-user-blk,num-queues=N. This setting should match the QEMU --device | ||
3 | vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers | ||
4 | its own value if the vhost-user-blk backend offers fewer queues than | ||
5 | QEMU. | ||
2 | 6 | ||
3 | data_end field in BDRVParallelsState is set to the biggest offset present | 7 | The vhost-user-blk-server.c code is already capable of multi-queue. All |
4 | in BAT. If this offset is outside of the image, any further write will | 8 | virtqueue processing runs in the same AioContext. No new locking is |
5 | create the cluster at this offset and/or the image will be truncated to | 9 | needed. |
6 | this offset on close. This is definitely not correct. | ||
7 | 10 | ||
8 | Raise an error in parallels_open() if data_end points outside the image | 11 | Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit. |
9 | and it is not a check (let the check to repaire the image). Set data_end | 12 | Note that the feature bit only announces the presence of the num_queues |
10 | to the end of the cluster with the last correct offset. | 13 | configuration space field. It does not promise that there is more than 1 |
14 | virtqueue, so we can set it unconditionally. | ||
11 | 15 | ||
12 | Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 16 | I tested multi-queue by running a random read fio test with numjobs=4 on |
13 | Message-Id: <20230424093147.197643-2-alexander.ivanov@virtuozzo.com> | 17 | an -smp 4 guest. After the benchmark finished the guest /proc/interrupts |
14 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 18 | file showed activity on all 4 virtio-blk MSI-X. The /sys/block/vda/mq/ |
15 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 19 | directory shows that Linux blk-mq has 4 queues configured. |
20 | |||
21 | An automated test is included in the next commit. | ||
22 | |||
23 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
24 | Acked-by: Markus Armbruster <armbru@redhat.com> | ||
25 | Message-id: 20201001144604.559733-2-stefanha@redhat.com | ||
26 | [Fixed accidental tab characters as suggested by Markus Armbruster | ||
27 | --Stefan] | ||
28 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
16 | --- | 29 | --- |
17 | block/parallels.c | 17 +++++++++++++++++ | 30 | qapi/block-export.json | 10 +++++++--- |
18 | 1 file changed, 17 insertions(+) | 31 | block/export/vhost-user-blk-server.c | 24 ++++++++++++++++++------ |
32 | 2 files changed, 25 insertions(+), 9 deletions(-) | ||
19 | 33 | ||
20 | diff --git a/block/parallels.c b/block/parallels.c | 34 | diff --git a/qapi/block-export.json b/qapi/block-export.json |
21 | index XXXXXXX..XXXXXXX 100644 | 35 | index XXXXXXX..XXXXXXX 100644 |
22 | --- a/block/parallels.c | 36 | --- a/qapi/block-export.json |
23 | +++ b/block/parallels.c | 37 | +++ b/qapi/block-export.json |
24 | @@ -XXX,XX +XXX,XX @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags, | 38 | @@ -XXX,XX +XXX,XX @@ |
25 | BDRVParallelsState *s = bs->opaque; | 39 | # SocketAddress types are supported. Passed fds must be UNIX domain |
26 | ParallelsHeader ph; | 40 | # sockets. |
27 | int ret, size, i; | 41 | # @logical-block-size: Logical block size in bytes. Defaults to 512 bytes. |
28 | + int64_t file_nb_sectors; | 42 | +# @num-queues: Number of request virtqueues. Must be greater than 0. Defaults |
29 | QemuOpts *opts = NULL; | 43 | +# to 1. |
44 | # | ||
45 | # Since: 5.2 | ||
46 | ## | ||
47 | { 'struct': 'BlockExportOptionsVhostUserBlk', | ||
48 | - 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } } | ||
49 | + 'data': { 'addr': 'SocketAddress', | ||
50 | + '*logical-block-size': 'size', | ||
51 | + '*num-queues': 'uint16'} } | ||
52 | |||
53 | ## | ||
54 | # @NbdServerAddOptions: | ||
55 | @@ -XXX,XX +XXX,XX @@ | ||
56 | { 'union': 'BlockExportOptions', | ||
57 | 'base': { 'type': 'BlockExportType', | ||
58 | 'id': 'str', | ||
59 | - '*fixed-iothread': 'bool', | ||
60 | - '*iothread': 'str', | ||
61 | + '*fixed-iothread': 'bool', | ||
62 | + '*iothread': 'str', | ||
63 | 'node-name': 'str', | ||
64 | '*writable': 'bool', | ||
65 | '*writethrough': 'bool' }, | ||
66 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
67 | index XXXXXXX..XXXXXXX 100644 | ||
68 | --- a/block/export/vhost-user-blk-server.c | ||
69 | +++ b/block/export/vhost-user-blk-server.c | ||
70 | @@ -XXX,XX +XXX,XX @@ | ||
71 | #include "util/block-helpers.h" | ||
72 | |||
73 | enum { | ||
74 | - VHOST_USER_BLK_MAX_QUEUES = 1, | ||
75 | + VHOST_USER_BLK_NUM_QUEUES_DEFAULT = 1, | ||
76 | }; | ||
77 | struct virtio_blk_inhdr { | ||
78 | unsigned char status; | ||
79 | @@ -XXX,XX +XXX,XX @@ static uint64_t vu_blk_get_features(VuDev *dev) | ||
80 | 1ull << VIRTIO_BLK_F_DISCARD | | ||
81 | 1ull << VIRTIO_BLK_F_WRITE_ZEROES | | ||
82 | 1ull << VIRTIO_BLK_F_CONFIG_WCE | | ||
83 | + 1ull << VIRTIO_BLK_F_MQ | | ||
84 | 1ull << VIRTIO_F_VERSION_1 | | ||
85 | 1ull << VIRTIO_RING_F_INDIRECT_DESC | | ||
86 | 1ull << VIRTIO_RING_F_EVENT_IDX | | ||
87 | @@ -XXX,XX +XXX,XX @@ static void blk_aio_detach(void *opaque) | ||
88 | |||
89 | static void | ||
90 | vu_blk_initialize_config(BlockDriverState *bs, | ||
91 | - struct virtio_blk_config *config, uint32_t blk_size) | ||
92 | + struct virtio_blk_config *config, | ||
93 | + uint32_t blk_size, | ||
94 | + uint16_t num_queues) | ||
95 | { | ||
96 | config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS; | ||
97 | config->blk_size = blk_size; | ||
98 | @@ -XXX,XX +XXX,XX @@ vu_blk_initialize_config(BlockDriverState *bs, | ||
99 | config->seg_max = 128 - 2; | ||
100 | config->min_io_size = 1; | ||
101 | config->opt_io_size = 1; | ||
102 | - config->num_queues = VHOST_USER_BLK_MAX_QUEUES; | ||
103 | + config->num_queues = num_queues; | ||
104 | config->max_discard_sectors = 32768; | ||
105 | config->max_discard_seg = 1; | ||
106 | config->discard_sector_alignment = config->blk_size >> 9; | ||
107 | @@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, | ||
108 | BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk; | ||
30 | Error *local_err = NULL; | 109 | Error *local_err = NULL; |
31 | char *buf; | 110 | uint64_t logical_block_size; |
32 | @@ -XXX,XX +XXX,XX @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags, | 111 | + uint16_t num_queues = VHOST_USER_BLK_NUM_QUEUES_DEFAULT; |
33 | return ret; | 112 | |
113 | vexp->writable = opts->writable; | ||
114 | vexp->blkcfg.wce = 0; | ||
115 | @@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, | ||
34 | } | 116 | } |
35 | 117 | vexp->blk_size = logical_block_size; | |
36 | + file_nb_sectors = bdrv_nb_sectors(bs->file->bs); | 118 | blk_set_guest_block_size(exp->blk, logical_block_size); |
37 | + if (file_nb_sectors < 0) { | 119 | + |
120 | + if (vu_opts->has_num_queues) { | ||
121 | + num_queues = vu_opts->num_queues; | ||
122 | + } | ||
123 | + if (num_queues == 0) { | ||
124 | + error_setg(errp, "num-queues must be greater than 0"); | ||
38 | + return -EINVAL; | 125 | + return -EINVAL; |
39 | + } | 126 | + } |
40 | + | 127 | + |
41 | ret = bdrv_pread(bs->file, 0, sizeof(ph), &ph, 0); | 128 | vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg, |
42 | if (ret < 0) { | 129 | - logical_block_size); |
43 | goto fail; | 130 | + logical_block_size, num_queues); |
44 | @@ -XXX,XX +XXX,XX @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags, | 131 | |
45 | 132 | blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, | |
46 | for (i = 0; i < s->bat_size; i++) { | 133 | vexp); |
47 | int64_t off = bat2sect(s, i); | 134 | |
48 | + if (off >= file_nb_sectors) { | 135 | if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx, |
49 | + if (flags & BDRV_O_CHECK) { | 136 | - VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface, |
50 | + continue; | 137 | - errp)) { |
51 | + } | 138 | + num_queues, &vu_blk_iface, errp)) { |
52 | + error_setg(errp, "parallels: Offset %" PRIi64 " in BAT[%d] entry " | 139 | blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, |
53 | + "is larger than file size (%" PRIi64 ")", | 140 | blk_aio_detach, vexp); |
54 | + off << BDRV_SECTOR_BITS, i, | 141 | return -EADDRNOTAVAIL; |
55 | + file_nb_sectors << BDRV_SECTOR_BITS); | ||
56 | + ret = -EINVAL; | ||
57 | + goto fail; | ||
58 | + } | ||
59 | if (off >= s->data_end) { | ||
60 | s->data_end = off + s->tracks; | ||
61 | } | ||
62 | -- | 142 | -- |
63 | 2.40.1 | 143 | 2.26.2 |
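
As an aside for readers skimming the vhost-user-blk hunk above: the new num-queues handling is the usual optional-field pattern (fall back to a default, reject invalid values). A minimal standalone sketch of that pattern follows; the names (pick_num_queues, NUM_QUEUES_DEFAULT) are illustrative, not QEMU APIs.

```c
/* Sketch of "optional option with default + validation"; not QEMU code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum { NUM_QUEUES_DEFAULT = 1 };

static int pick_num_queues(bool has_num_queues, uint16_t requested,
                           uint16_t *num_queues)
{
    uint16_t n = NUM_QUEUES_DEFAULT;

    if (has_num_queues) {
        n = requested;      /* caller supplied an explicit value */
    }
    if (n == 0) {
        fprintf(stderr, "num-queues must be greater than 0\n");
        return -1;
    }
    *num_queues = n;
    return 0;
}

int main(void)
{
    uint16_t n;

    if (pick_num_queues(true, 4, &n) == 0) {
        printf("using %u request virtqueues\n", n);
    }
    return 0;
}
```
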
1 | From: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 1 | From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
---|---|---|---|
2 | 2 | ||
3 | We will add more and more checks so we need a better code structure in | 3 | bdrv_co_block_status_above has several design problems with handling |
4 | parallels_co_check. Let each check be performed in a separate loop in a | 4 | short backing files: |
5 | separate helper. | ||
6 | 5 | ||
7 | Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 6 | 1. With want_zero=true, it may return ret with BDRV_BLOCK_ZERO but |
8 | Reviewed-by: Denis V. Lunev <den@openvz.org> | 7 | without the BDRV_BLOCK_ALLOCATED flag, when the short backing file |
9 | Message-Id: <20230424093147.197643-8-alexander.ivanov@virtuozzo.com> | 8 | that produces these after-EOF zeros is actually inside the requested |
10 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 9 | backing sequence. |
11 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 10 | |
11 | 2. With want_zero=false, it may return pnum=0 prior to actual EOF, | ||
12 | because of EOF of short backing file. | ||
13 | |||
14 | Fix these things, making logic about short backing files clearer. | ||
15 | |||
16 | With fixed bdrv_block_status_above we also have to improve is_zero in | ||
17 | qcow2 code, otherwise iotest 154 will fail, because with this patch we | ||
18 | stop merging zeros of different types (those produced by regions that are | ||
19 | unallocated in the whole backing chain vs those produced by short backing files). | ||
20 | |||
21 | Note also that this patch leaves for another day the general problem | ||
22 | around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated | ||
23 | vs go-to-backing. | ||
24 | |||
25 | Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> | ||
26 | Reviewed-by: Alberto Garcia <berto@igalia.com> | ||
27 | Reviewed-by: Eric Blake <eblake@redhat.com> | ||
28 | Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com | ||
29 | [Fix s/comes/come/ as suggested by Eric Blake | ||
30 | --Stefan] | ||
31 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
12 | --- | 32 | --- |
13 | block/parallels.c | 75 +++++++++++++++++++++++++++++++---------------- | 33 | block/io.c | 68 ++++++++++++++++++++++++++++++++++++++++----------- |
14 | 1 file changed, 49 insertions(+), 26 deletions(-) | 34 | block/qcow2.c | 16 ++++++++++-- |
35 | 2 files changed, 68 insertions(+), 16 deletions(-) | ||
15 | 36 | ||
16 | diff --git a/block/parallels.c b/block/parallels.c | 37 | diff --git a/block/io.c b/block/io.c |
17 | index XXXXXXX..XXXXXXX 100644 | 38 | index XXXXXXX..XXXXXXX 100644 |
18 | --- a/block/parallels.c | 39 | --- a/block/io.c |
19 | +++ b/block/parallels.c | 40 | +++ b/block/io.c |
20 | @@ -XXX,XX +XXX,XX @@ static void parallels_check_unclean(BlockDriverState *bs, | 41 | @@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs, |
21 | } | 42 | int64_t *map, |
22 | } | 43 | BlockDriverState **file) |
23 | 44 | { | |
24 | +static int coroutine_fn GRAPH_RDLOCK | 45 | + int ret; |
25 | +parallels_check_outside_image(BlockDriverState *bs, BdrvCheckResult *res, | 46 | BlockDriverState *p; |
26 | + BdrvCheckMode fix) | 47 | - int ret = 0; |
27 | +{ | 48 | - bool first = true; |
28 | + BDRVParallelsState *s = bs->opaque; | 49 | + int64_t eof = 0; |
29 | + uint32_t i; | 50 | |
30 | + int64_t off, high_off, size; | 51 | assert(bs != base); |
52 | - for (p = bs; p != base; p = bdrv_filter_or_cow_bs(p)) { | ||
31 | + | 53 | + |
32 | + size = bdrv_getlength(bs->file->bs); | 54 | + ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file); |
33 | + if (size < 0) { | 55 | + if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) { |
34 | + res->check_errors++; | 56 | + return ret; |
35 | + return size; | ||
36 | + } | 57 | + } |
37 | + | 58 | + |
38 | + high_off = 0; | 59 | + if (ret & BDRV_BLOCK_EOF) { |
39 | + for (i = 0; i < s->bat_size; i++) { | 60 | + eof = offset + *pnum; |
40 | + off = bat2sect(s, i) << BDRV_SECTOR_BITS; | ||
41 | + if (off > size) { | ||
42 | + fprintf(stderr, "%s cluster %u is outside image\n", | ||
43 | + fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i); | ||
44 | + res->corruptions++; | ||
45 | + if (fix & BDRV_FIX_ERRORS) { | ||
46 | + parallels_set_bat_entry(s, i, 0); | ||
47 | + res->corruptions_fixed++; | ||
48 | + } | ||
49 | + continue; | ||
50 | + } | ||
51 | + if (high_off < off) { | ||
52 | + high_off = off; | ||
53 | + } | ||
54 | + } | 61 | + } |
55 | + | 62 | + |
56 | + if (high_off == 0) { | 63 | + assert(*pnum <= bytes); |
57 | + res->image_end_offset = s->data_end << BDRV_SECTOR_BITS; | 64 | + bytes = *pnum; |
58 | + } else { | 65 | + |
59 | + res->image_end_offset = high_off + s->cluster_size; | 66 | + for (p = bdrv_filter_or_cow_bs(bs); p != base; |
60 | + s->data_end = res->image_end_offset >> BDRV_SECTOR_BITS; | 67 | + p = bdrv_filter_or_cow_bs(p)) |
68 | + { | ||
69 | ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map, | ||
70 | file); | ||
71 | if (ret < 0) { | ||
72 | - break; | ||
73 | + return ret; | ||
74 | } | ||
75 | - if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) { | ||
76 | + if (*pnum == 0) { | ||
77 | /* | ||
78 | - * Reading beyond the end of the file continues to read | ||
79 | - * zeroes, but we can only widen the result to the | ||
80 | - * unallocated length we learned from an earlier | ||
81 | - * iteration. | ||
82 | + * The top layer deferred to this layer, and because this layer is | ||
83 | + * short, any zeroes that we synthesize beyond EOF behave as if they | ||
84 | + * were allocated at this layer. | ||
85 | + * | ||
86 | + * We don't include BDRV_BLOCK_EOF into ret, as upper layer may be | ||
87 | + * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see | ||
88 | + * below. | ||
89 | */ | ||
90 | + assert(ret & BDRV_BLOCK_EOF); | ||
91 | *pnum = bytes; | ||
92 | + if (file) { | ||
93 | + *file = p; | ||
94 | + } | ||
95 | + ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED; | ||
96 | + break; | ||
97 | } | ||
98 | - if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) { | ||
99 | + if (ret & BDRV_BLOCK_ALLOCATED) { | ||
100 | + /* | ||
101 | + * We've found the node and the status, we must break. | ||
102 | + * | ||
103 | + * Drop BDRV_BLOCK_EOF, as it's not for upper layer, which may be | ||
104 | + * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see | ||
105 | + * below. | ||
106 | + */ | ||
107 | + ret &= ~BDRV_BLOCK_EOF; | ||
108 | break; | ||
109 | } | ||
110 | - /* [offset, pnum] unallocated on this layer, which could be only | ||
111 | - * the first part of [offset, bytes]. */ | ||
112 | - bytes = MIN(bytes, *pnum); | ||
113 | - first = false; | ||
114 | + | ||
115 | + /* | ||
116 | + * OK, [offset, offset + *pnum) region is unallocated on this layer, | ||
117 | + * let's continue the diving. | ||
118 | + */ | ||
119 | + assert(*pnum <= bytes); | ||
120 | + bytes = *pnum; | ||
61 | + } | 121 | + } |
62 | + | 122 | + |
63 | + return 0; | 123 | + if (offset + *pnum == eof) { |
64 | +} | 124 | + ret |= BDRV_BLOCK_EOF; |
125 | } | ||
65 | + | 126 | + |
66 | static int coroutine_fn GRAPH_RDLOCK | 127 | return ret; |
67 | parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 128 | } |
68 | BdrvCheckMode fix) | 129 | |
69 | { | 130 | diff --git a/block/qcow2.c b/block/qcow2.c |
70 | BDRVParallelsState *s = bs->opaque; | 131 | index XXXXXXX..XXXXXXX 100644 |
71 | - int64_t size, prev_off, high_off; | 132 | --- a/block/qcow2.c |
72 | - int ret = 0; | 133 | +++ b/block/qcow2.c |
73 | + int64_t size, prev_off; | 134 | @@ -XXX,XX +XXX,XX @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes) |
74 | + int ret; | 135 | if (!bytes) { |
75 | uint32_t i; | 136 | return true; |
76 | 137 | } | |
77 | size = bdrv_getlength(bs->file->bs); | 138 | - res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL); |
78 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 139 | - return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes; |
79 | |||
80 | parallels_check_unclean(bs, res, fix); | ||
81 | |||
82 | + ret = parallels_check_outside_image(bs, res, fix); | ||
83 | + if (ret < 0) { | ||
84 | + goto out; | ||
85 | + } | ||
86 | + | 140 | + |
87 | res->bfi.total_clusters = s->bat_size; | 141 | + /* |
88 | res->bfi.compressed_clusters = 0; /* compression is not supported */ | 142 | + * bdrv_block_status_above doesn't merge different types of zeros, for |
89 | 143 | + * example, zeros which come from the region which is unallocated in | |
90 | - high_off = 0; | 144 | + * the whole backing chain, and zeros which come because of a short |
91 | prev_off = 0; | 145 | + * backing file. So, we need a loop. |
92 | for (i = 0; i < s->bat_size; i++) { | 146 | + */ |
93 | int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS; | 147 | + do { |
94 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 148 | + res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL); |
95 | continue; | 149 | + offset += nr; |
96 | } | 150 | + bytes -= nr; |
97 | 151 | + } while (res >= 0 && (res & BDRV_BLOCK_ZERO) && nr && bytes); | |
98 | - /* cluster outside the image */ | 152 | + |
99 | - if (off > size) { | 153 | + return res >= 0 && (res & BDRV_BLOCK_ZERO) && bytes == 0; |
100 | - fprintf(stderr, "%s cluster %u is outside image\n", | 154 | } |
101 | - fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i); | 155 | |
102 | - res->corruptions++; | 156 | static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs, |
103 | - if (fix & BDRV_FIX_ERRORS) { | ||
104 | - parallels_set_bat_entry(s, i, 0); | ||
105 | - res->corruptions_fixed++; | ||
106 | - } | ||
107 | - prev_off = 0; | ||
108 | - continue; | ||
109 | - } | ||
110 | - | ||
111 | res->bfi.allocated_clusters++; | ||
112 | - if (off > high_off) { | ||
113 | - high_off = off; | ||
114 | - } | ||
115 | |||
116 | if (prev_off != 0 && (prev_off + s->cluster_size) != off) { | ||
117 | res->bfi.fragmented_clusters++; | ||
118 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | ||
119 | prev_off = off; | ||
120 | } | ||
121 | |||
122 | - if (high_off == 0) { | ||
123 | - res->image_end_offset = s->data_end << BDRV_SECTOR_BITS; | ||
124 | - } else { | ||
125 | - res->image_end_offset = high_off + s->cluster_size; | ||
126 | - s->data_end = res->image_end_offset >> BDRV_SECTOR_BITS; | ||
127 | - } | ||
128 | - | ||
129 | if (size > res->image_end_offset) { | ||
130 | int64_t count; | ||
131 | count = DIV_ROUND_UP(size - res->image_end_offset, s->cluster_size); | ||
132 | -- | 157 | -- |
133 | 2.40.1 | 158 | 2.26.2 |
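
The qcow2 is_zero() change above relies on looping over block status because different kinds of zeros are reported separately and never merged into one answer. Below is a self-contained sketch of that loop; query_status() is a toy stand-in (an assumption, not a real block-layer API) that reports zeros in 512-byte chunks to mimic piecewise answers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLK_ZERO 0x1

/* Toy status query: the whole range is zero, but only 512 bytes per call. */
static int query_status(int64_t offset, int64_t bytes, int64_t *nr)
{
    (void)offset;
    *nr = bytes < 512 ? bytes : 512;
    return BLK_ZERO;
}

static bool range_is_zero(int64_t offset, int64_t bytes)
{
    int res;
    int64_t nr;

    if (!bytes) {
        return true;
    }

    /* Keep asking and advancing until the answer stops being "zero"
     * or the whole requested range has been covered. */
    do {
        res = query_status(offset, bytes, &nr);
        offset += nr;
        bytes -= nr;
    } while (res >= 0 && (res & BLK_ZERO) && nr && bytes);

    return res >= 0 && (res & BLK_ZERO) && bytes == 0;
}

int main(void)
{
    printf("zero: %d\n", range_is_zero(0, 4096));
    return 0;
}
```
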
1 | We want to inline qemu_iovec_init_extended() in block/io.c for padding | 1 | From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
---|---|---|---|
2 | requests, and having access to qiov_slice() is useful for this. As a | ||
3 | public function, it is renamed to qemu_iovec_slice(). | ||
4 | 2 | ||
5 | (We will need to count the number of I/O vector elements of a slice | 3 | In order to reuse bdrv_common_block_status_above in |
6 | there, and then later process this slice. Without qiov_slice(), we | 4 | bdrv_is_allocated_above, let's support an include_base parameter. |
7 | would need to call qemu_iovec_subvec_niov(), and all further | ||
8 | IOV-processing functions may need to skip prefixing elements to | ||
9 | accommodate a qiov_offset. Because qemu_iovec_subvec_niov() | 7 | Reviewed-by: Alberto Garcia <berto@igalia.com> |
10 | internally calls qiov_slice(), we can just have the block/io.c code call | ||
11 | qiov_slice() itself, thus get the number of elements, and also create an | ||
12 | iovec array with the superfluous prefixing elements stripped, so the | ||
13 | following processing functions no longer need to skip them.) | ||
14 | 5 | ||
6 | Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> | ||
7 | Reviewed-by: Alberto Garcia <berto@igalia.com> | ||
15 | Reviewed-by: Eric Blake <eblake@redhat.com> | 8 | Reviewed-by: Eric Blake <eblake@redhat.com> |
16 | Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> | 9 | Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com |
17 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 10 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
18 | Message-Id: <20230411173418.19549-2-hreitz@redhat.com> | ||
19 | --- | 11 | --- |
20 | include/qemu/iov.h | 3 +++ | 12 | block/coroutines.h | 2 ++ |
21 | util/iov.c | 14 +++++++------- | 13 | block/io.c | 21 ++++++++++++++------- |
22 | 2 files changed, 10 insertions(+), 7 deletions(-) | 14 | 2 files changed, 16 insertions(+), 7 deletions(-) |
23 | 15 | ||
24 | diff --git a/include/qemu/iov.h b/include/qemu/iov.h | 16 | diff --git a/block/coroutines.h b/block/coroutines.h |
25 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
26 | --- a/include/qemu/iov.h | 18 | --- a/block/coroutines.h |
27 | +++ b/include/qemu/iov.h | 19 | +++ b/block/coroutines.h |
28 | @@ -XXX,XX +XXX,XX @@ int qemu_iovec_init_extended( | 20 | @@ -XXX,XX +XXX,XX @@ bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int bytes, |
29 | void *tail_buf, size_t tail_len); | 21 | int coroutine_fn |
30 | void qemu_iovec_init_slice(QEMUIOVector *qiov, QEMUIOVector *source, | 22 | bdrv_co_common_block_status_above(BlockDriverState *bs, |
31 | size_t offset, size_t len); | 23 | BlockDriverState *base, |
32 | +struct iovec *qemu_iovec_slice(QEMUIOVector *qiov, | 24 | + bool include_base, |
33 | + size_t offset, size_t len, | 25 | bool want_zero, |
34 | + size_t *head, size_t *tail, int *niov); | 26 | int64_t offset, |
35 | int qemu_iovec_subvec_niov(QEMUIOVector *qiov, size_t offset, size_t len); | 27 | int64_t bytes, |
36 | void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len); | 28 | @@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs, |
37 | void qemu_iovec_concat(QEMUIOVector *dst, | 29 | int generated_co_wrapper |
38 | diff --git a/util/iov.c b/util/iov.c | 30 | bdrv_common_block_status_above(BlockDriverState *bs, |
31 | BlockDriverState *base, | ||
32 | + bool include_base, | ||
33 | bool want_zero, | ||
34 | int64_t offset, | ||
35 | int64_t bytes, | ||
36 | diff --git a/block/io.c b/block/io.c | ||
39 | index XXXXXXX..XXXXXXX 100644 | 37 | index XXXXXXX..XXXXXXX 100644 |
40 | --- a/util/iov.c | 38 | --- a/block/io.c |
41 | +++ b/util/iov.c | 39 | +++ b/block/io.c |
42 | @@ -XXX,XX +XXX,XX @@ static struct iovec *iov_skip_offset(struct iovec *iov, size_t offset, | 40 | @@ -XXX,XX +XXX,XX @@ early_out: |
41 | int coroutine_fn | ||
42 | bdrv_co_common_block_status_above(BlockDriverState *bs, | ||
43 | BlockDriverState *base, | ||
44 | + bool include_base, | ||
45 | bool want_zero, | ||
46 | int64_t offset, | ||
47 | int64_t bytes, | ||
48 | @@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs, | ||
49 | BlockDriverState *p; | ||
50 | int64_t eof = 0; | ||
51 | |||
52 | - assert(bs != base); | ||
53 | + assert(include_base || bs != base); | ||
54 | + assert(!include_base || base); /* Can't include NULL base */ | ||
55 | |||
56 | ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file); | ||
57 | - if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) { | ||
58 | + if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) { | ||
59 | return ret; | ||
60 | } | ||
61 | |||
62 | @@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs, | ||
63 | assert(*pnum <= bytes); | ||
64 | bytes = *pnum; | ||
65 | |||
66 | - for (p = bdrv_filter_or_cow_bs(bs); p != base; | ||
67 | + for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base; | ||
68 | p = bdrv_filter_or_cow_bs(p)) | ||
69 | { | ||
70 | ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map, | ||
71 | @@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs, | ||
72 | break; | ||
73 | } | ||
74 | |||
75 | + if (p == base) { | ||
76 | + assert(include_base); | ||
77 | + break; | ||
78 | + } | ||
79 | + | ||
80 | /* | ||
81 | * OK, [offset, offset + *pnum) region is unallocated on this layer, | ||
82 | * let's continue the diving. | ||
83 | @@ -XXX,XX +XXX,XX @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base, | ||
84 | int64_t offset, int64_t bytes, int64_t *pnum, | ||
85 | int64_t *map, BlockDriverState **file) | ||
86 | { | ||
87 | - return bdrv_common_block_status_above(bs, base, true, offset, bytes, | ||
88 | + return bdrv_common_block_status_above(bs, base, false, true, offset, bytes, | ||
89 | pnum, map, file); | ||
43 | } | 90 | } |
44 | 91 | ||
45 | /* | 92 | @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset, |
46 | - * qiov_slice | 93 | int ret; |
47 | + * qemu_iovec_slice | 94 | int64_t dummy; |
48 | * | 95 | |
49 | * Find subarray of iovec's, containing requested range. @head would | 96 | - ret = bdrv_common_block_status_above(bs, bdrv_filter_or_cow_bs(bs), false, |
50 | * be offset in first iov (returned by the function), @tail would be | 97 | - offset, bytes, pnum ? pnum : &dummy, |
51 | * count of extra bytes in last iovec (returned iov + @niov - 1). | 98 | - NULL, NULL); |
52 | */ | 99 | + ret = bdrv_common_block_status_above(bs, bs, true, false, offset, |
53 | -static struct iovec *qiov_slice(QEMUIOVector *qiov, | 100 | + bytes, pnum ? pnum : &dummy, NULL, |
54 | - size_t offset, size_t len, | 101 | + NULL); |
55 | - size_t *head, size_t *tail, int *niov) | 102 | if (ret < 0) { |
56 | +struct iovec *qemu_iovec_slice(QEMUIOVector *qiov, | 103 | return ret; |
57 | + size_t offset, size_t len, | ||
58 | + size_t *head, size_t *tail, int *niov) | ||
59 | { | ||
60 | struct iovec *iov, *end_iov; | ||
61 | |||
62 | @@ -XXX,XX +XXX,XX @@ int qemu_iovec_subvec_niov(QEMUIOVector *qiov, size_t offset, size_t len) | ||
63 | size_t head, tail; | ||
64 | int niov; | ||
65 | |||
66 | - qiov_slice(qiov, offset, len, &head, &tail, &niov); | ||
67 | + qemu_iovec_slice(qiov, offset, len, &head, &tail, &niov); | ||
68 | |||
69 | return niov; | ||
70 | } | ||
71 | @@ -XXX,XX +XXX,XX @@ int qemu_iovec_init_extended( | ||
72 | } | 104 | } |
73 | |||
74 | if (mid_len) { | ||
75 | - mid_iov = qiov_slice(mid_qiov, mid_offset, mid_len, | ||
76 | - &mid_head, &mid_tail, &mid_niov); | ||
77 | + mid_iov = qemu_iovec_slice(mid_qiov, mid_offset, mid_len, | ||
78 | + &mid_head, &mid_tail, &mid_niov); | ||
79 | } | ||
80 | |||
81 | total_niov = !!head_len + mid_niov + !!tail_len; | ||
82 | -- | 105 | -- |
83 | 2.40.1 | 106 | 2.26.2 |
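
To illustrate what the qemu_iovec_slice() commit message above means by slicing an I/O vector, here is a standalone sketch of the idea, not QEMU's implementation. It assumes len > 0 and that [offset, offset + len) lies inside the vector.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/uio.h>

/* Find the subarray of iov[0..niov) covering [offset, offset + len):
 * *head is how far into the first returned element the range starts,
 * *tail is how many extra bytes the last element has beyond the range. */
static struct iovec *slice_iov(struct iovec *iov, int niov,
                               size_t offset, size_t len,
                               size_t *head, size_t *tail, int *sliced_niov)
{
    struct iovec *first = iov;
    struct iovec *end = iov + niov;

    /* Skip whole elements that lie entirely before the requested offset. */
    while (first < end && offset >= first->iov_len) {
        offset -= first->iov_len;
        first++;
    }
    *head = offset;

    /* Walk forward until the requested length is covered. */
    struct iovec *last = first;
    size_t covered = last->iov_len - offset;
    while (covered < len) {
        last++;
        covered += last->iov_len;
    }
    *tail = covered - len;
    *sliced_niov = (int)(last - first) + 1;
    return first;
}

int main(void)
{
    char a[100], b[200], c[300];
    struct iovec v[3] = {
        { .iov_base = a, .iov_len = sizeof(a) },
        { .iov_base = b, .iov_len = sizeof(b) },
        { .iov_base = c, .iov_len = sizeof(c) },
    };
    size_t head, tail;
    int niov;

    /* Slice covering bytes [150, 450) of the 600-byte vector. */
    struct iovec *s = slice_iov(v, 3, 150, 300, &head, &tail, &niov);
    printf("first element #%td, head=%zu, tail=%zu, niov=%d\n",
           s - v, head, tail, niov);
    return 0;
}
```
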
1 | From: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 1 | From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
---|---|---|---|
2 | 2 | ||
3 | Exclude out-of-image clusters from allocated and fragmented clusters | 3 | We are going to reuse bdrv_common_block_status_above in |
4 | calculation. | 4 | bdrv_is_allocated_above. bdrv_is_allocated_above may be called with |
5 | include_base == false and still bs == base (for ex. from img_rebase()). | ||
5 | 6 | ||
6 | Signed-off-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> | 7 | So, support this corner case. |
7 | Message-Id: <20230424093147.197643-9-alexander.ivanov@virtuozzo.com> | 8 | |
8 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 9 | Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
9 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 10 | Reviewed-by: Kevin Wolf <kwolf@redhat.com> |
11 | Reviewed-by: Eric Blake <eblake@redhat.com> | ||
12 | Reviewed-by: Alberto Garcia <berto@igalia.com> | ||
13 | Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com | ||
14 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
10 | --- | 15 | --- |
11 | block/parallels.c | 6 +++++- | 16 | block/io.c | 6 +++++- |
12 | 1 file changed, 5 insertions(+), 1 deletion(-) | 17 | 1 file changed, 5 insertions(+), 1 deletion(-) |
13 | 18 | ||
14 | diff --git a/block/parallels.c b/block/parallels.c | 19 | diff --git a/block/io.c b/block/io.c |
15 | index XXXXXXX..XXXXXXX 100644 | 20 | index XXXXXXX..XXXXXXX 100644 |
16 | --- a/block/parallels.c | 21 | --- a/block/io.c |
17 | +++ b/block/parallels.c | 22 | +++ b/block/io.c |
18 | @@ -XXX,XX +XXX,XX @@ parallels_co_check(BlockDriverState *bs, BdrvCheckResult *res, | 23 | @@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs, |
19 | prev_off = 0; | 24 | BlockDriverState *p; |
20 | for (i = 0; i < s->bat_size; i++) { | 25 | int64_t eof = 0; |
21 | int64_t off = bat2sect(s, i) << BDRV_SECTOR_BITS; | 26 | |
22 | - if (off == 0) { | 27 | - assert(include_base || bs != base); |
23 | + /* | 28 | assert(!include_base || base); /* Can't include NULL base */ |
24 | + * If BDRV_FIX_ERRORS is not set, out-of-image BAT entries were not | 29 | |
25 | + * fixed. Skip not allocated and out-of-image BAT entries. | 30 | + if (!include_base && bs == base) { |
26 | + */ | 31 | + *pnum = bytes; |
27 | + if (off == 0 || off + s->cluster_size > res->image_end_offset) { | 32 | + return 0; |
28 | prev_off = 0; | 33 | + } |
29 | continue; | 34 | + |
30 | } | 35 | ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file); |
36 | if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) { | ||
37 | return ret; | ||
31 | -- | 38 | -- |
32 | 2.40.1 | 39 | 2.26.2 |
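
The statistics fix above boils down to a counting rule: only entries that are allocated and fall inside the computed image end contribute to the allocated and fragmented counts. A small illustrative sketch of that rule (not the parallels driver itself, and with made-up structures):

```c
#include <stdint.h>
#include <stdio.h>

struct stats {
    int64_t allocated_clusters;
    int64_t fragmented_clusters;
};

static void collect_stats(const int64_t *cluster_off, int n,
                          int64_t cluster_size, int64_t image_end_offset,
                          struct stats *st)
{
    int64_t prev_off = 0;

    for (int i = 0; i < n; i++) {
        int64_t off = cluster_off[i];

        /* Skip unallocated entries and entries beyond the image end. */
        if (off == 0 || off + cluster_size > image_end_offset) {
            prev_off = 0;
            continue;
        }

        st->allocated_clusters++;
        /* A cluster is "fragmented" if it does not follow its predecessor. */
        if (prev_off != 0 && prev_off + cluster_size != off) {
            st->fragmented_clusters++;
        }
        prev_off = off;
    }
}

int main(void)
{
    const int64_t offs[] = { 0x10000, 0x20000, 0x40000, 0, 0x90000 };
    struct stats st = { 0 };

    collect_stats(offs, 5, 0x10000, 0x80000, &st);
    printf("allocated=%lld fragmented=%lld\n",
           (long long)st.allocated_clusters,
           (long long)st.fragmented_clusters);
    return 0;
}
```
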
1 | When processing vectored guest requests that are not aligned to the | 1 | From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
---|---|---|---|
2 | storage request alignment, we pad them by adding head and/or tail | ||
3 | buffers for a read-modify-write cycle. | ||
4 | 2 | ||
5 | The guest can submit I/O vectors up to IOV_MAX (1024) in length, but | 5 | generated at the level of the short backing file (if all overlays have |
6 | with this padding, the vector can exceed that limit. As of | 4 | after-EOF space as UNALLOCATED which is wrong, as on read the data is |
7 | 4c002cef0e9abe7135d7916c51abce47f7fc1ee2 ("util/iov: make | 5 | generated on the level of short backing file (if all overlays have |
8 | qemu_iovec_init_extended() honest"), we refuse to pad vectors beyond the | 6 | unallocated areas at that place). |
9 | limit, instead returning an error to the guest. | ||
10 | 7 | ||
11 | To the guest, this appears as a random I/O error. We should not return | 8 | Reusing bdrv_common_block_status_above fixes the issue and unifies code |
12 | an I/O error to the guest when it issued a perfectly valid request. | 9 | path. |
13 | 10 | ||
14 | Before 4c002cef0e9abe7135d7916c51abce47f7fc1ee2, we just made the vector | 11 | Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
15 | longer than IOV_MAX, which generally seems to work (because the guest | 12 | Reviewed-by: Eric Blake <eblake@redhat.com> |
16 | assumes a smaller alignment than we really have, file-posix's | 13 | Reviewed-by: Alberto Garcia <berto@igalia.com> |
17 | raw_co_prw() will generally see bdrv_qiov_is_aligned() return false, and | 14 | Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com |
18 | so emulate the request, so that the IOV_MAX does not matter). However, | 15 | [Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/. |
19 | that does not seem exactly great. | 16 | --Stefan] |
20 | 17 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | |
21 | I see two ways to fix this problem: | ||
22 | 1. We split such long requests into two requests. | ||
23 | 2. We join some elements of the vector into new buffers to make it | ||
24 | shorter. | ||
25 | |||
26 | I am wary of (1), because it seems like it may have unintended side | ||
27 | effects. | ||
28 | |||
29 | (2) on the other hand seems relatively simple to implement, with | ||
30 | hopefully few side effects, so this patch does that. | ||
31 | |||
32 | To do this, the use of qemu_iovec_init_extended() in bdrv_pad_request() | ||
33 | is effectively replaced by the new function bdrv_create_padded_qiov(), | ||
34 | which not only wraps the request IOV with padding head/tail, but also | ||
35 | ensures that the resulting vector will not have more than IOV_MAX | ||
36 | elements. Putting that functionality into qemu_iovec_init_extended() is | ||
37 | infeasible because it requires allocating a bounce buffer; doing so | ||
38 | would require many more parameters (buffer alignment, how to initialize | ||
39 | the buffer, and out parameters like the buffer, its length, and the | ||
40 | original elements), which is not reasonable. | ||
41 | |||
42 | Conversely, it is not difficult to move qemu_iovec_init_extended()'s | ||
43 | functionality into bdrv_create_padded_qiov() by using public | ||
44 | qemu_iovec_* functions, so that is what this patch does. | ||
45 | |||
46 | Because bdrv_pad_request() was the only "serious" user of | ||
47 | qemu_iovec_init_extended(), the next patch will remove the latter | ||
48 | function, so the functionality is not implemented twice. | ||
49 | |||
50 | Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2141964 | ||
51 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
52 | Message-Id: <20230411173418.19549-3-hreitz@redhat.com> | ||
53 | Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> | ||
54 | --- | 18 | --- |
55 | block/io.c | 166 ++++++++++++++++++++++++++++++++++++++++++++++++----- | 19 | block/io.c | 43 +++++-------------------------------------- |
56 | 1 file changed, 151 insertions(+), 15 deletions(-) | 20 | 1 file changed, 5 insertions(+), 38 deletions(-) |
57 | 21 | ||
58 | diff --git a/block/io.c b/block/io.c | 22 | diff --git a/block/io.c b/block/io.c |
59 | index XXXXXXX..XXXXXXX 100644 | 23 | index XXXXXXX..XXXXXXX 100644 |
60 | --- a/block/io.c | 24 | --- a/block/io.c |
61 | +++ b/block/io.c | 25 | +++ b/block/io.c |
62 | @@ -XXX,XX +XXX,XX @@ out: | 26 | @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset, |
63 | * @merge_reads is true for small requests, | 27 | * at 'offset + *pnum' may return the same allocation status (in other |
64 | * if @buf_len == @head + bytes + @tail. In this case it is possible that both | 28 | * words, the result is not necessarily the maximum possible range); |
65 | * head and tail exist but @buf_len == align and @tail_buf == @buf. | 29 | * but 'pnum' will only be 0 when end of file is reached. |
66 | + * | 30 | - * |
67 | + * @write is true for write requests, false for read requests. | ||
68 | + * | ||
69 | + * If padding makes the vector too long (exceeding IOV_MAX), then we need to | ||
70 | + * merge existing vector elements into a single one. @collapse_bounce_buf acts | ||
71 | + * as the bounce buffer in such cases. @pre_collapse_qiov has the pre-collapse | ||
72 | + * I/O vector elements so for read requests, the data can be copied back after | ||
73 | + * the read is done. | ||
74 | */ | 31 | */ |
75 | typedef struct BdrvRequestPadding { | 32 | int bdrv_is_allocated_above(BlockDriverState *top, |
76 | uint8_t *buf; | 33 | BlockDriverState *base, |
77 | @@ -XXX,XX +XXX,XX @@ typedef struct BdrvRequestPadding { | 34 | bool include_base, int64_t offset, |
78 | size_t head; | 35 | int64_t bytes, int64_t *pnum) |
79 | size_t tail; | ||
80 | bool merge_reads; | ||
81 | + bool write; | ||
82 | QEMUIOVector local_qiov; | ||
83 | + | ||
84 | + uint8_t *collapse_bounce_buf; | ||
85 | + size_t collapse_len; | ||
86 | + QEMUIOVector pre_collapse_qiov; | ||
87 | } BdrvRequestPadding; | ||
88 | |||
89 | static bool bdrv_init_padding(BlockDriverState *bs, | ||
90 | int64_t offset, int64_t bytes, | ||
91 | + bool write, | ||
92 | BdrvRequestPadding *pad) | ||
93 | { | 36 | { |
94 | int64_t align = bs->bl.request_alignment; | 37 | - BlockDriverState *intermediate; |
95 | @@ -XXX,XX +XXX,XX @@ static bool bdrv_init_padding(BlockDriverState *bs, | 38 | - int ret; |
96 | pad->tail_buf = pad->buf + pad->buf_len - align; | 39 | - int64_t n = bytes; |
40 | - | ||
41 | - assert(base || !include_base); | ||
42 | - | ||
43 | - intermediate = top; | ||
44 | - while (include_base || intermediate != base) { | ||
45 | - int64_t pnum_inter; | ||
46 | - int64_t size_inter; | ||
47 | - | ||
48 | - assert(intermediate); | ||
49 | - ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter); | ||
50 | - if (ret < 0) { | ||
51 | - return ret; | ||
52 | - } | ||
53 | - if (ret) { | ||
54 | - *pnum = pnum_inter; | ||
55 | - return 1; | ||
56 | - } | ||
57 | - | ||
58 | - size_inter = bdrv_getlength(intermediate); | ||
59 | - if (size_inter < 0) { | ||
60 | - return size_inter; | ||
61 | - } | ||
62 | - if (n > pnum_inter && | ||
63 | - (intermediate == top || offset + pnum_inter < size_inter)) { | ||
64 | - n = pnum_inter; | ||
65 | - } | ||
66 | - | ||
67 | - if (intermediate == base) { | ||
68 | - break; | ||
69 | - } | ||
70 | - | ||
71 | - intermediate = bdrv_filter_or_cow_bs(intermediate); | ||
72 | + int ret = bdrv_common_block_status_above(top, base, include_base, false, | ||
73 | + offset, bytes, pnum, NULL, NULL); | ||
74 | + if (ret < 0) { | ||
75 | + return ret; | ||
97 | } | 76 | } |
98 | 77 | ||
99 | + pad->write = write; | 78 | - *pnum = n; |
100 | + | 79 | - return 0; |
101 | return true; | 80 | + return !!(ret & BDRV_BLOCK_ALLOCATED); |
102 | } | 81 | } |
103 | 82 | ||
104 | @@ -XXX,XX +XXX,XX @@ zero_mem: | 83 | int coroutine_fn |
105 | return 0; | ||
106 | } | ||
107 | |||
108 | -static void bdrv_padding_destroy(BdrvRequestPadding *pad) | ||
109 | +/** | ||
110 | + * Free *pad's associated buffers, and perform any necessary finalization steps. | ||
111 | + */ | ||
112 | +static void bdrv_padding_finalize(BdrvRequestPadding *pad) | ||
113 | { | ||
114 | + if (pad->collapse_bounce_buf) { | ||
115 | + if (!pad->write) { | ||
116 | + /* | ||
117 | + * If padding required elements in the vector to be collapsed into a | ||
118 | + * bounce buffer, copy the bounce buffer content back | ||
119 | + */ | ||
120 | + qemu_iovec_from_buf(&pad->pre_collapse_qiov, 0, | ||
121 | + pad->collapse_bounce_buf, pad->collapse_len); | ||
122 | + } | ||
123 | + qemu_vfree(pad->collapse_bounce_buf); | ||
124 | + qemu_iovec_destroy(&pad->pre_collapse_qiov); | ||
125 | + } | ||
126 | if (pad->buf) { | ||
127 | qemu_vfree(pad->buf); | ||
128 | qemu_iovec_destroy(&pad->local_qiov); | ||
129 | @@ -XXX,XX +XXX,XX @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad) | ||
130 | memset(pad, 0, sizeof(*pad)); | ||
131 | } | ||
132 | |||
133 | +/* | ||
134 | + * Create pad->local_qiov by wrapping @iov in the padding head and tail, while | ||
135 | + * ensuring that the resulting vector will not exceed IOV_MAX elements. | ||
136 | + * | ||
137 | + * To ensure this, when necessary, the first two or three elements of @iov are | ||
138 | + * merged into pad->collapse_bounce_buf and replaced by a reference to that | ||
139 | + * bounce buffer in pad->local_qiov. | ||
140 | + * | ||
141 | + * After performing a read request, the data from the bounce buffer must be | ||
142 | + * copied back into pad->pre_collapse_qiov (e.g. by bdrv_padding_finalize()). | ||
143 | + */ | ||
144 | +static int bdrv_create_padded_qiov(BlockDriverState *bs, | ||
145 | + BdrvRequestPadding *pad, | ||
146 | + struct iovec *iov, int niov, | ||
147 | + size_t iov_offset, size_t bytes) | ||
148 | +{ | ||
149 | + int padded_niov, surplus_count, collapse_count; | ||
150 | + | ||
151 | + /* Assert this invariant */ | ||
152 | + assert(niov <= IOV_MAX); | ||
153 | + | ||
154 | + /* | ||
155 | + * Cannot pad if resulting length would exceed SIZE_MAX. Returning an error | ||
156 | + * to the guest is not ideal, but there is little else we can do. At least | ||
157 | + * this will practically never happen on 64-bit systems. | ||
158 | + */ | ||
159 | + if (SIZE_MAX - pad->head < bytes || | ||
160 | + SIZE_MAX - pad->head - bytes < pad->tail) | ||
161 | + { | ||
162 | + return -EINVAL; | ||
163 | + } | ||
164 | + | ||
165 | + /* Length of the resulting IOV if we just concatenated everything */ | ||
166 | + padded_niov = !!pad->head + niov + !!pad->tail; | ||
167 | + | ||
168 | + qemu_iovec_init(&pad->local_qiov, MIN(padded_niov, IOV_MAX)); | ||
169 | + | ||
170 | + if (pad->head) { | ||
171 | + qemu_iovec_add(&pad->local_qiov, pad->buf, pad->head); | ||
172 | + } | ||
173 | + | ||
174 | + /* | ||
175 | + * If padded_niov > IOV_MAX, we cannot just concatenate everything. | ||
176 | + * Instead, merge the first two or three elements of @iov to reduce the | ||
177 | + * number of vector elements as necessary. | ||
178 | + */ | ||
179 | + if (padded_niov > IOV_MAX) { | ||
180 | + /* | ||
181 | + * Only head and tail can have lead to the number of entries exceeding | ||
182 | + * IOV_MAX, so we can exceed it by the head and tail at most. We need | ||
183 | + * to reduce the number of elements by `surplus_count`, so we merge that | ||
184 | + * many elements plus one into one element. | ||
185 | + */ | ||
186 | + surplus_count = padded_niov - IOV_MAX; | ||
187 | + assert(surplus_count <= !!pad->head + !!pad->tail); | ||
188 | + collapse_count = surplus_count + 1; | ||
189 | + | ||
190 | + /* | ||
191 | + * Move the elements to collapse into `pad->pre_collapse_qiov`, then | ||
192 | + * advance `iov` (and associated variables) by those elements. | ||
193 | + */ | ||
194 | + qemu_iovec_init(&pad->pre_collapse_qiov, collapse_count); | ||
195 | + qemu_iovec_concat_iov(&pad->pre_collapse_qiov, iov, | ||
196 | + collapse_count, iov_offset, SIZE_MAX); | ||
197 | + iov += collapse_count; | ||
198 | + iov_offset = 0; | ||
199 | + niov -= collapse_count; | ||
200 | + bytes -= pad->pre_collapse_qiov.size; | ||
201 | + | ||
202 | + /* | ||
203 | + * Construct the bounce buffer to match the length of the to-collapse | ||
204 | + * vector elements, and for write requests, initialize it with the data | ||
205 | + * from those elements. Then add it to `pad->local_qiov`. | ||
206 | + */ | ||
207 | + pad->collapse_len = pad->pre_collapse_qiov.size; | ||
208 | + pad->collapse_bounce_buf = qemu_blockalign(bs, pad->collapse_len); | ||
209 | + if (pad->write) { | ||
210 | + qemu_iovec_to_buf(&pad->pre_collapse_qiov, 0, | ||
211 | + pad->collapse_bounce_buf, pad->collapse_len); | ||
212 | + } | ||
213 | + qemu_iovec_add(&pad->local_qiov, | ||
214 | + pad->collapse_bounce_buf, pad->collapse_len); | ||
215 | + } | ||
216 | + | ||
217 | + qemu_iovec_concat_iov(&pad->local_qiov, iov, niov, iov_offset, bytes); | ||
218 | + | ||
219 | + if (pad->tail) { | ||
220 | + qemu_iovec_add(&pad->local_qiov, | ||
221 | + pad->buf + pad->buf_len - pad->tail, pad->tail); | ||
222 | + } | ||
223 | + | ||
224 | + assert(pad->local_qiov.niov == MIN(padded_niov, IOV_MAX)); | ||
225 | + return 0; | ||
226 | +} | ||
227 | + | ||
228 | /* | ||
229 | * bdrv_pad_request | ||
230 | * | ||
231 | @@ -XXX,XX +XXX,XX @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad) | ||
232 | * read of padding, bdrv_padding_rmw_read() should be called separately if | ||
233 | * needed. | ||
234 | * | ||
235 | + * @write is true for write requests, false for read requests. | ||
236 | + * | ||
237 | * Request parameters (@qiov, &qiov_offset, &offset, &bytes) are in-out: | ||
238 | * - on function start they represent original request | ||
239 | * - on failure or when padding is not needed they are unchanged | ||
240 | @@ -XXX,XX +XXX,XX @@ static void bdrv_padding_destroy(BdrvRequestPadding *pad) | ||
241 | static int bdrv_pad_request(BlockDriverState *bs, | ||
242 | QEMUIOVector **qiov, size_t *qiov_offset, | ||
243 | int64_t *offset, int64_t *bytes, | ||
244 | + bool write, | ||
245 | BdrvRequestPadding *pad, bool *padded, | ||
246 | BdrvRequestFlags *flags) | ||
247 | { | ||
248 | int ret; | ||
249 | + struct iovec *sliced_iov; | ||
250 | + int sliced_niov; | ||
251 | + size_t sliced_head, sliced_tail; | ||
252 | |||
253 | bdrv_check_qiov_request(*offset, *bytes, *qiov, *qiov_offset, &error_abort); | ||
254 | |||
255 | - if (!bdrv_init_padding(bs, *offset, *bytes, pad)) { | ||
256 | + if (!bdrv_init_padding(bs, *offset, *bytes, write, pad)) { | ||
257 | if (padded) { | ||
258 | *padded = false; | ||
259 | } | ||
260 | return 0; | ||
261 | } | ||
262 | |||
263 | - ret = qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head, | ||
264 | - *qiov, *qiov_offset, *bytes, | ||
265 | - pad->buf + pad->buf_len - pad->tail, | ||
266 | - pad->tail); | ||
267 | + sliced_iov = qemu_iovec_slice(*qiov, *qiov_offset, *bytes, | ||
268 | + &sliced_head, &sliced_tail, | ||
269 | + &sliced_niov); | ||
270 | + | ||
271 | + /* Guaranteed by bdrv_check_qiov_request() */ | ||
272 | + assert(*bytes <= SIZE_MAX); | ||
273 | + ret = bdrv_create_padded_qiov(bs, pad, sliced_iov, sliced_niov, | ||
274 | + sliced_head, *bytes); | ||
275 | if (ret < 0) { | ||
276 | - bdrv_padding_destroy(pad); | ||
277 | + bdrv_padding_finalize(pad); | ||
278 | return ret; | ||
279 | } | ||
280 | *bytes += pad->head + pad->tail; | ||
281 | @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child, | ||
282 | flags |= BDRV_REQ_COPY_ON_READ; | ||
283 | } | ||
284 | |||
285 | - ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad, | ||
286 | - NULL, &flags); | ||
287 | + ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, false, | ||
288 | + &pad, NULL, &flags); | ||
289 | if (ret < 0) { | ||
290 | goto fail; | ||
291 | } | ||
292 | @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child, | ||
293 | bs->bl.request_alignment, | ||
294 | qiov, qiov_offset, flags); | ||
295 | tracked_request_end(&req); | ||
296 | - bdrv_padding_destroy(&pad); | ||
297 | + bdrv_padding_finalize(&pad); | ||
298 | |||
299 | fail: | ||
300 | bdrv_dec_in_flight(bs); | ||
301 | @@ -XXX,XX +XXX,XX @@ bdrv_co_do_zero_pwritev(BdrvChild *child, int64_t offset, int64_t bytes, | ||
302 | /* This flag doesn't make sense for padding or zero writes */ | ||
303 | flags &= ~BDRV_REQ_REGISTERED_BUF; | ||
304 | |||
305 | - padding = bdrv_init_padding(bs, offset, bytes, &pad); | ||
306 | + padding = bdrv_init_padding(bs, offset, bytes, true, &pad); | ||
307 | if (padding) { | ||
308 | assert(!(flags & BDRV_REQ_NO_WAIT)); | ||
309 | bdrv_make_request_serialising(req, align); | ||
310 | @@ -XXX,XX +XXX,XX @@ bdrv_co_do_zero_pwritev(BdrvChild *child, int64_t offset, int64_t bytes, | ||
311 | } | ||
312 | |||
313 | out: | ||
314 | - bdrv_padding_destroy(&pad); | ||
315 | + bdrv_padding_finalize(&pad); | ||
316 | |||
317 | return ret; | ||
318 | } | ||
319 | @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child, | ||
320 | * bdrv_co_do_zero_pwritev() does aligning by itself, so, we do | ||
321 | * alignment only if there is no ZERO flag. | ||
322 | */ | ||
323 | - ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, &pad, | ||
324 | - &padded, &flags); | ||
325 | + ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, &bytes, true, | ||
326 | + &pad, &padded, &flags); | ||
327 | if (ret < 0) { | ||
328 | return ret; | ||
329 | } | ||
330 | @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child, | ||
331 | ret = bdrv_aligned_pwritev(child, &req, offset, bytes, align, | ||
332 | qiov, qiov_offset, flags); | ||
333 | |||
334 | - bdrv_padding_destroy(&pad); | ||
335 | + bdrv_padding_finalize(&pad); | ||
336 | |||
337 | out: | ||
338 | tracked_request_end(&req); | ||
339 | -- | 84 | -- |
340 | 2.40.1 | 85 | 2.26.2 |
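
The collapse logic described in the commit message above is mostly arithmetic: if wrapping the guest vector with a padding head and/or tail would exceed IOV_MAX, merge surplus + 1 guest elements into a single bounce buffer so the final count fits. A back-of-the-envelope sketch of just that arithmetic; IOV_MAX_ELEMS stands in for the system IOV_MAX, and the function is illustrative, not QEMU's bdrv_create_padded_qiov().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define IOV_MAX_ELEMS 1024   /* stands in for IOV_MAX */

static int padded_element_count(int guest_niov, bool has_head, bool has_tail,
                                int *collapse_count)
{
    int padded = (has_head ? 1 : 0) + guest_niov + (has_tail ? 1 : 0);

    if (padded <= IOV_MAX_ELEMS) {
        *collapse_count = 0;            /* nothing to merge */
        return padded;
    }

    /* Only head and tail can push us over, so the surplus is at most 2. */
    int surplus = padded - IOV_MAX_ELEMS;
    assert(surplus <= (has_head ? 1 : 0) + (has_tail ? 1 : 0));

    /* Merging N guest elements into one bounce buffer removes N - 1
     * elements, so merge surplus + 1 of them. */
    *collapse_count = surplus + 1;
    return padded - surplus;            /* == IOV_MAX_ELEMS */
}

int main(void)
{
    int collapse;
    int n = padded_element_count(1024, true, true, &collapse);

    printf("final elements: %d, merged guest elements: %d\n", n, collapse);
    return 0;
}
```
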
New patch | |||
---|---|---|---|
1 | From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> | ||
1 | 2 | ||
3 | These cases are fixed by previous patches around block_status and | ||
4 | is_allocated. | ||
5 | |||
6 | Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> | ||
7 | Reviewed-by: Eric Blake <eblake@redhat.com> | ||
8 | Reviewed-by: Alberto Garcia <berto@igalia.com> | ||
9 | Message-id: 20200924194003.22080-6-vsementsov@virtuozzo.com | ||
10 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
11 | --- | ||
12 | tests/qemu-iotests/274 | 20 +++++++++++ | ||
13 | tests/qemu-iotests/274.out | 68 ++++++++++++++++++++++++++++++++++++++ | ||
14 | 2 files changed, 88 insertions(+) | ||
15 | |||
16 | diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274 | ||
17 | index XXXXXXX..XXXXXXX 100755 | ||
18 | --- a/tests/qemu-iotests/274 | ||
19 | +++ b/tests/qemu-iotests/274 | ||
20 | @@ -XXX,XX +XXX,XX @@ with iotests.FilePath('base') as base, \ | ||
21 | iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid) | ||
22 | iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid) | ||
23 | |||
24 | + iotests.log('=== Testing qemu-img commit (top -> base) ===') | ||
25 | + | ||
26 | + create_chain() | ||
27 | + iotests.qemu_img_log('commit', '-b', base, top) | ||
28 | + iotests.img_info_log(base) | ||
29 | + iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base) | ||
30 | + iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base) | ||
31 | + | ||
32 | + iotests.log('=== Testing QMP active commit (top -> base) ===') | ||
33 | + | ||
34 | + create_chain() | ||
35 | + with create_vm() as vm: | ||
36 | + vm.launch() | ||
37 | + vm.qmp_log('block-commit', device='top', base_node='base', | ||
38 | + job_id='job0', auto_dismiss=False) | ||
39 | + vm.run_job('job0', wait=5) | ||
40 | + | ||
41 | + iotests.img_info_log(mid) | ||
42 | + iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base) | ||
43 | + iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base) | ||
44 | |||
45 | iotests.log('== Resize tests ==') | ||
46 | |||
47 | diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out | ||
48 | index XXXXXXX..XXXXXXX 100644 | ||
49 | --- a/tests/qemu-iotests/274.out | ||
50 | +++ b/tests/qemu-iotests/274.out | ||
51 | @@ -XXX,XX +XXX,XX @@ read 1048576/1048576 bytes at offset 0 | ||
52 | read 1048576/1048576 bytes at offset 1048576 | ||
53 | 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | ||
54 | |||
55 | +=== Testing qemu-img commit (top -> base) === | ||
56 | +Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16 | ||
57 | + | ||
58 | +Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16 | ||
59 | + | ||
60 | +Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16 | ||
61 | + | ||
62 | +wrote 2097152/2097152 bytes at offset 0 | ||
63 | +2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | ||
64 | + | ||
65 | +Image committed. | ||
66 | + | ||
67 | +image: TEST_IMG | ||
68 | +file format: IMGFMT | ||
69 | +virtual size: 2 MiB (2097152 bytes) | ||
70 | +cluster_size: 65536 | ||
71 | +Format specific information: | ||
72 | + compat: 1.1 | ||
73 | + compression type: zlib | ||
74 | + lazy refcounts: false | ||
75 | + refcount bits: 16 | ||
76 | + corrupt: false | ||
77 | + extended l2: false | ||
78 | + | ||
79 | +read 1048576/1048576 bytes at offset 0 | ||
80 | +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | ||
81 | + | ||
82 | +read 1048576/1048576 bytes at offset 1048576 | ||
83 | +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | ||
84 | + | ||
85 | +=== Testing QMP active commit (top -> base) === | ||
86 | +Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16 | ||
87 | + | ||
88 | +Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16 | ||
89 | + | ||
90 | +Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16 | ||
91 | + | ||
92 | +wrote 2097152/2097152 bytes at offset 0 | ||
93 | +2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | ||
94 | + | ||
95 | +{"execute": "block-commit", "arguments": {"auto-dismiss": false, "base-node": "base", "device": "top", "job-id": "job0"}} | ||
96 | +{"return": {}} | ||
97 | +{"execute": "job-complete", "arguments": {"id": "job0"}} | ||
98 | +{"return": {}} | ||
99 | +{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}} | ||
100 | +{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}} | ||
101 | +{"execute": "job-dismiss", "arguments": {"id": "job0"}} | ||
102 | +{"return": {}} | ||
103 | +image: TEST_IMG | ||
104 | +file format: IMGFMT | ||
105 | +virtual size: 1 MiB (1048576 bytes) | ||
106 | +cluster_size: 65536 | ||
107 | +backing file: TEST_DIR/PID-base | ||
108 | +backing file format: IMGFMT | ||
109 | +Format specific information: | ||
110 | + compat: 1.1 | ||
111 | + compression type: zlib | ||
112 | + lazy refcounts: false | ||
113 | + refcount bits: 16 | ||
114 | + corrupt: false | ||
115 | + extended l2: false | ||
116 | + | ||
117 | +read 1048576/1048576 bytes at offset 0 | ||
118 | +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | ||
119 | + | ||
120 | +read 1048576/1048576 bytes at offset 1048576 | ||
121 | +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | ||
122 | + | ||
123 | == Resize tests == | ||
124 | === preallocation=off === | ||
125 | Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=6442450944 lazy_refcounts=off refcount_bits=16 | ||
126 | -- | ||
127 | 2.26.2 | ||
128 | diff view generated by jsdifflib |