The following changes since commit ac793156f650ae2d77834932d72224175ee69086:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201020-1' into staging (2020-10-20 21:11:35 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 32a3fd65e7e3551337fd26bfc0e2f899d70c028c:

  iotests: add commit top->base cases to 274 (2020-10-22 09:55:39 +0100)

----------------------------------------------------------------
Pull request

v2:
 * Fix format string issues on 32-bit hosts [Peter]
 * Fix qemu-nbd.c CONFIG_POSIX ifdef issue [Eric]
 * Fix missing eventfd.h header on macOS [Peter]
 * Drop unreliable vhost-user-blk test (will send a new patch when ready) [Peter]

This pull request contains the vhost-user-blk server by Coiby Xu along with my
additions, block/nvme.c alignment and hardware error statistics by Philippe
Mathieu-Daudé, and bdrv_co_block_status_above() fixes by Vladimir
Sementsov-Ogievskiy.

----------------------------------------------------------------

Coiby Xu (6):
      libvhost-user: Allow vu_message_read to be replaced
      libvhost-user: remove watch for kick_fd when de-initialize vu-dev
      util/vhost-user-server: generic vhost user server
      block: move logical block size check function to a common utility function
      block/export: vhost-user block device backend server
      MAINTAINERS: Add vhost-user block device backend server maintainer

Philippe Mathieu-Daudé (1):
      block/nvme: Add driver statistics for access alignment and hw errors

Stefan Hajnoczi (16):
      util/vhost-user-server: s/fileds/fields/ typo fix
      util/vhost-user-server: drop unnecessary QOM cast
      util/vhost-user-server: drop unnecessary watch deletion
      block/export: consolidate request structs into VuBlockReq
      util/vhost-user-server: drop unused DevicePanicNotifier
      util/vhost-user-server: fix memory leak in vu_message_read()
      util/vhost-user-server: check EOF when reading payload
      util/vhost-user-server: rework vu_client_trip() coroutine lifecycle
      block/export: report flush errors
      block/export: convert vhost-user-blk server to block export API
      util/vhost-user-server: move header to include/
      util/vhost-user-server: use static library in meson.build
      qemu-storage-daemon: avoid compiling blockdev_ss twice
      block: move block exports to libblockdev
      block/export: add iothread and fixed-iothread options
      block/export: add vhost-user-blk multi-queue support

Vladimir Sementsov-Ogievskiy (5):
      block/io: fix bdrv_co_block_status_above
      block/io: bdrv_common_block_status_above: support include_base
      block/io: bdrv_common_block_status_above: support bs == base
      block/io: fix bdrv_is_allocated_above
      iotests: add commit top->base cases to 274

 MAINTAINERS                                |   9 +
 qapi/block-core.json                       |  24 +-
 qapi/block-export.json                     |  36 +-
 block/coroutines.h                         |   2 +
 block/export/vhost-user-blk-server.h       |  19 +
 contrib/libvhost-user/libvhost-user.h      |  21 +
 include/qemu/vhost-user-server.h           |  65 +++
 util/block-helpers.h                       |  19 +
 block/export/export.c                      |  37 +-
 block/export/vhost-user-blk-server.c       | 431 ++++++++++++++++++++
 block/io.c                                 | 132 +++---
 block/nvme.c                               |  27 ++
 block/qcow2.c                              |  16 +-
 contrib/libvhost-user/libvhost-user-glib.c |   2 +-
 contrib/libvhost-user/libvhost-user.c      |  15 +-
 hw/core/qdev-properties-system.c           |  31 +-
 nbd/server.c                               |   2 -
 qemu-nbd.c                                 |  21 +-
 softmmu/vl.c                               |   4 +
 stubs/blk-exp-close-all.c                  |   7 +
 tests/vhost-user-bridge.c                  |   2 +
 tools/virtiofsd/fuse_virtio.c              |   4 +-
 util/block-helpers.c                       |  46 +++
 util/vhost-user-server.c                   | 446 +++++++++++++++++++++
 block/export/meson.build                   |   3 +-
 contrib/libvhost-user/meson.build          |   1 +
 meson.build                                |  22 +-
 nbd/meson.build                            |   2 +
 storage-daemon/meson.build                 |   3 +-
 stubs/meson.build                          |   1 +
 tests/qemu-iotests/274                     |  20 +
 tests/qemu-iotests/274.out                 |  68 ++++
 util/meson.build                           |   4 +
 33 files changed, 1420 insertions(+), 122 deletions(-)
 create mode 100644 block/export/vhost-user-blk-server.h
 create mode 100644 include/qemu/vhost-user-server.h
 create mode 100644 util/block-helpers.h
 create mode 100644 block/export/vhost-user-blk-server.c
 create mode 100644 stubs/blk-exp-close-all.c
 create mode 100644 util/block-helpers.c
 create mode 100644 util/vhost-user-server.c

--
2.26.2

From: Philippe Mathieu-Daudé <philmd@redhat.com>

Keep statistics of some hardware errors, and number of
aligned/unaligned I/O accesses.

QMP example booting a full RHEL 8.3 aarch64 guest:

{ "execute": "query-blockstats" }
{
    "return": [
        {
            "device": "",
            "node-name": "drive0",
            "stats": {
                "flush_total_time_ns": 6026948,
                "wr_highest_offset": 3383991230464,
                "wr_total_time_ns": 807450995,
                "failed_wr_operations": 0,
                "failed_rd_operations": 0,
                "wr_merged": 3,
                "wr_bytes": 50133504,
                "failed_unmap_operations": 0,
                "failed_flush_operations": 0,
                "account_invalid": false,
                "rd_total_time_ns": 1846979900,
                "flush_operations": 130,
                "wr_operations": 659,
                "rd_merged": 1192,
                "rd_bytes": 218244096,
                "account_failed": false,
                "idle_time_ns": 2678641497,
                "rd_operations": 7406,
            },
            "driver-specific": {
                "driver": "nvme",
                "completion-errors": 0,
                "unaligned-accesses": 2959,
                "aligned-accesses": 4477
            },
            "qdev": "/machine/peripheral-anon/device[0]/virtio-backend"
        }
    ]
}

Suggested-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-id: 20201001162939.1567915-1-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-core.json | 24 +++++++++++++++++++++-
 block/nvme.c         | 27 +++++++++++++++++++++++++++
 2 files changed, 50 insertions(+), 1 deletion(-)

55 | diff --git a/qapi/block-core.json b/qapi/block-core.json | ||
56 | index XXXXXXX..XXXXXXX 100644 | ||
57 | --- a/qapi/block-core.json | ||
58 | +++ b/qapi/block-core.json | ||
59 | @@ -XXX,XX +XXX,XX @@ | ||
60 | 'discard-nb-failed': 'uint64', | ||
61 | 'discard-bytes-ok': 'uint64' } } | ||
62 | |||
63 | +## | ||
64 | +# @BlockStatsSpecificNvme: | ||
65 | +# | ||
66 | +# NVMe driver statistics | ||
67 | +# | ||
68 | +# @completion-errors: The number of completion errors. | ||
69 | +# | ||
70 | +# @aligned-accesses: The number of aligned accesses performed by | ||
71 | +# the driver. | ||
72 | +# | ||
73 | +# @unaligned-accesses: The number of unaligned accesses performed by | ||
74 | +# the driver. | ||
75 | +# | ||
76 | +# Since: 5.2 | ||
77 | +## | ||
78 | +{ 'struct': 'BlockStatsSpecificNvme', | ||
79 | + 'data': { | ||
80 | + 'completion-errors': 'uint64', | ||
81 | + 'aligned-accesses': 'uint64', | ||
82 | + 'unaligned-accesses': 'uint64' } } | ||
83 | + | ||
84 | ## | ||
85 | # @BlockStatsSpecific: | ||
86 | # | ||
87 | @@ -XXX,XX +XXX,XX @@ | ||
88 | 'discriminator': 'driver', | ||
89 | 'data': { | ||
90 | 'file': 'BlockStatsSpecificFile', | ||
91 | - 'host_device': 'BlockStatsSpecificFile' } } | ||
92 | + 'host_device': 'BlockStatsSpecificFile', | ||
93 | + 'nvme': 'BlockStatsSpecificNvme' } } | ||
94 | |||
95 | ## | ||
96 | # @BlockStats: | ||
97 | diff --git a/block/nvme.c b/block/nvme.c | ||
98 | index XXXXXXX..XXXXXXX 100644 | ||
99 | --- a/block/nvme.c | ||
100 | +++ b/block/nvme.c | ||
101 | @@ -XXX,XX +XXX,XX @@ struct BDRVNVMeState { | ||
102 | |||
103 | /* PCI address (required for nvme_refresh_filename()) */ | ||
104 | char *device; | ||
105 | + | ||
106 | + struct { | ||
107 | + uint64_t completion_errors; | ||
108 | + uint64_t aligned_accesses; | ||
109 | + uint64_t unaligned_accesses; | ||
110 | + } stats; | ||
111 | }; | ||
112 | |||
113 | #define NVME_BLOCK_OPT_DEVICE "device" | ||
114 | @@ -XXX,XX +XXX,XX @@ static bool nvme_process_completion(NVMeQueuePair *q) | ||
115 | break; | ||
116 | } | ||
117 | ret = nvme_translate_error(c); | ||
118 | + if (ret) { | ||
119 | + s->stats.completion_errors++; | ||
120 | + } | ||
121 | q->cq.head = (q->cq.head + 1) % NVME_QUEUE_SIZE; | ||
122 | if (!q->cq.head) { | ||
123 | q->cq_phase = !q->cq_phase; | ||
124 | @@ -XXX,XX +XXX,XX @@ static int nvme_co_prw(BlockDriverState *bs, uint64_t offset, uint64_t bytes, | ||
125 | assert(QEMU_IS_ALIGNED(bytes, s->page_size)); | ||
126 | assert(bytes <= s->max_transfer); | ||
127 | if (nvme_qiov_aligned(bs, qiov)) { | ||
128 | + s->stats.aligned_accesses++; | ||
129 | return nvme_co_prw_aligned(bs, offset, bytes, qiov, is_write, flags); | ||
130 | } | ||
131 | + s->stats.unaligned_accesses++; | ||
132 | trace_nvme_prw_buffered(s, offset, bytes, qiov->niov, is_write); | ||
133 | buf = qemu_try_memalign(s->page_size, bytes); | ||
134 | |||
135 | @@ -XXX,XX +XXX,XX @@ static void nvme_unregister_buf(BlockDriverState *bs, void *host) | ||
136 | qemu_vfio_dma_unmap(s->vfio, host); | ||
137 | } | ||
138 | |||
139 | +static BlockStatsSpecific *nvme_get_specific_stats(BlockDriverState *bs) | ||
140 | +{ | ||
141 | + BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1); | ||
142 | + BDRVNVMeState *s = bs->opaque; | ||
143 | + | ||
144 | + stats->driver = BLOCKDEV_DRIVER_NVME; | ||
145 | + stats->u.nvme = (BlockStatsSpecificNvme) { | ||
146 | + .completion_errors = s->stats.completion_errors, | ||
147 | + .aligned_accesses = s->stats.aligned_accesses, | ||
148 | + .unaligned_accesses = s->stats.unaligned_accesses, | ||
149 | + }; | ||
150 | + | ||
151 | + return stats; | ||
152 | +} | ||
153 | + | ||
154 | static const char *const nvme_strong_runtime_opts[] = { | ||
155 | NVME_BLOCK_OPT_DEVICE, | ||
156 | NVME_BLOCK_OPT_NAMESPACE, | ||
157 | @@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_nvme = { | ||
158 | .bdrv_refresh_filename = nvme_refresh_filename, | ||
159 | .bdrv_refresh_limits = nvme_refresh_limits, | ||
160 | .strong_runtime_opts = nvme_strong_runtime_opts, | ||
161 | + .bdrv_get_specific_stats = nvme_get_specific_stats, | ||
162 | |||
163 | .bdrv_detach_aio_context = nvme_detach_aio_context, | ||
164 | .bdrv_attach_aio_context = nvme_attach_aio_context, | ||
165 | -- | ||
166 | 2.26.2 | ||
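The callback wired up above is the generic mechanism: a block driver publishes
driver-specific counters by returning a BlockStatsSpecific from
.bdrv_get_specific_stats and adding a matching branch to the BlockStatsSpecific
QAPI union. As a minimal sketch mirroring what this patch does for "nvme", a
hypothetical driver "foo" could look like the following; BDRVFooState,
BlockStatsSpecificFoo and BLOCKDEV_DRIVER_FOO are illustrative assumptions, not
part of this series:

static BlockStatsSpecific *foo_get_specific_stats(BlockDriverState *bs)
{
    BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1);
    BDRVFooState *s = bs->opaque;

    stats->driver = BLOCKDEV_DRIVER_FOO;        /* hypothetical QAPI enum value */
    stats->u.foo = (BlockStatsSpecificFoo) {
        .cache_hits   = s->stats.cache_hits,    /* counters kept in the       */
        .cache_misses = s->stats.cache_misses,  /* driver's own state struct  */
    };

    return stats;
}

static BlockDriver bdrv_foo = {
    /* ... other callbacks ... */
    .bdrv_get_specific_stats = foo_get_specific_stats,
};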
From: Coiby Xu <coiby.xu@gmail.com>

Allow vu_message_read to be replaced by one which will make use of the
QIOChannel functions. Thus reading vhost-user message won't stall the
guest. For slave channel, we still use the default vu_message_read.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20200918080912.321299-2-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/libvhost-user/libvhost-user.h      | 21 +++++++++++++++++++++
 contrib/libvhost-user/libvhost-user-glib.c |  2 +-
 contrib/libvhost-user/libvhost-user.c      | 14 +++++++-------
 tests/vhost-user-bridge.c                  |  2 ++
 tools/virtiofsd/fuse_virtio.c              |  4 ++--
 5 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/contrib/libvhost-user/libvhost-user.h b/contrib/libvhost-user/libvhost-user.h
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user.h
+++ b/contrib/libvhost-user/libvhost-user.h
@@ -XXX,XX +XXX,XX @@
  */
 #define VHOST_USER_MAX_RAM_SLOTS 32
 
+#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
+
 typedef enum VhostSetConfigType {
     VHOST_SET_CONFIG_TYPE_MASTER = 0,
     VHOST_SET_CONFIG_TYPE_MIGRATION = 1,
@@ -XXX,XX +XXX,XX @@ typedef uint64_t (*vu_get_features_cb) (VuDev *dev);
 typedef void (*vu_set_features_cb) (VuDev *dev, uint64_t features);
 typedef int (*vu_process_msg_cb) (VuDev *dev, VhostUserMsg *vmsg,
                                   int *do_reply);
+typedef bool (*vu_read_msg_cb) (VuDev *dev, int sock, VhostUserMsg *vmsg);
 typedef void (*vu_queue_set_started_cb) (VuDev *dev, int qidx, bool started);
 typedef bool (*vu_queue_is_processed_in_order_cb) (VuDev *dev, int qidx);
 typedef int (*vu_get_config_cb) (VuDev *dev, uint8_t *config, uint32_t len);
@@ -XXX,XX +XXX,XX @@ struct VuDev {
     bool broken;
     uint16_t max_queues;
 
+    /* @read_msg: custom method to read vhost-user message
+     *
+     * Read data from vhost_user socket fd and fill up
+     * the passed VhostUserMsg *vmsg struct.
+     *
+     * If reading fails, it should close the received set of file
+     * descriptors as socket message's auxiliary data.
+     *
+     * For the details, please refer to vu_message_read in libvhost-user.c
+     * which will be used by default if not custom method is provided when
+     * calling vu_init
+     *
+     * Returns: true if vhost-user message successfully received,
+     *          otherwise return false.
+     *
+     */
+    vu_read_msg_cb read_msg;
     /* @set_watch: add or update the given fd to the watch set,
      * call cb when condition is met */
     vu_set_watch_cb set_watch;
@@ -XXX,XX +XXX,XX @@ bool vu_init(VuDev *dev,
              uint16_t max_queues,
              int socket,
              vu_panic_cb panic,
+             vu_read_msg_cb read_msg,
              vu_set_watch_cb set_watch,
              vu_remove_watch_cb remove_watch,
              const VuDevIface *iface);
diff --git a/contrib/libvhost-user/libvhost-user-glib.c b/contrib/libvhost-user/libvhost-user-glib.c
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user-glib.c
+++ b/contrib/libvhost-user/libvhost-user-glib.c
@@ -XXX,XX +XXX,XX @@ vug_init(VugDev *dev, uint16_t max_queues, int socket,
     g_assert(dev);
     g_assert(iface);
 
-    if (!vu_init(&dev->parent, max_queues, socket, panic, set_watch,
+    if (!vu_init(&dev->parent, max_queues, socket, panic, NULL, set_watch,
                  remove_watch, iface)) {
         return false;
     }
diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -XXX,XX +XXX,XX @@
 /* The version of inflight buffer */
 #define INFLIGHT_VERSION 1
 
-#define VHOST_USER_HDR_SIZE offsetof(VhostUserMsg, payload.u64)
-
 /* The version of the protocol we support */
 #define VHOST_USER_VERSION 1
 #define LIBVHOST_USER_DEBUG 0
@@ -XXX,XX +XXX,XX @@ have_userfault(void)
 }
 
 static bool
-vu_message_read(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
+vu_message_read_default(VuDev *dev, int conn_fd, VhostUserMsg *vmsg)
 {
     char control[CMSG_SPACE(VHOST_MEMORY_BASELINE_NREGIONS * sizeof(int))] = {};
     struct iovec iov = {
@@ -XXX,XX +XXX,XX @@ vu_process_message_reply(VuDev *dev, const VhostUserMsg *vmsg)
         goto out;
     }
 
-    if (!vu_message_read(dev, dev->slave_fd, &msg_reply)) {
+    if (!vu_message_read_default(dev, dev->slave_fd, &msg_reply)) {
         goto out;
     }
 
@@ -XXX,XX +XXX,XX @@ vu_set_mem_table_exec_postcopy(VuDev *dev, VhostUserMsg *vmsg)
     /* Wait for QEMU to confirm that it's registered the handler for the
      * faults.
      */
-    if (!vu_message_read(dev, dev->sock, vmsg) ||
+    if (!dev->read_msg(dev, dev->sock, vmsg) ||
         vmsg->size != sizeof(vmsg->payload.u64) ||
         vmsg->payload.u64 != 0) {
         vu_panic(dev, "failed to receive valid ack for postcopy set-mem-table");
@@ -XXX,XX +XXX,XX @@ vu_dispatch(VuDev *dev)
     int reply_requested;
     bool need_reply, success = false;
 
-    if (!vu_message_read(dev, dev->sock, &vmsg)) {
+    if (!dev->read_msg(dev, dev->sock, &vmsg)) {
         goto end;
     }
 
@@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev,
         uint16_t max_queues,
         int socket,
         vu_panic_cb panic,
+        vu_read_msg_cb read_msg,
         vu_set_watch_cb set_watch,
         vu_remove_watch_cb remove_watch,
         const VuDevIface *iface)
@@ -XXX,XX +XXX,XX @@ vu_init(VuDev *dev,
 
     dev->sock = socket;
     dev->panic = panic;
+    dev->read_msg = read_msg ? read_msg : vu_message_read_default;
     dev->set_watch = set_watch;
     dev->remove_watch = remove_watch;
     dev->iface = iface;
@@ -XXX,XX +XXX,XX @@ static void _vu_queue_notify(VuDev *dev, VuVirtq *vq, bool sync)
 
     vu_message_write(dev, dev->slave_fd, &vmsg);
     if (ack) {
-        vu_message_read(dev, dev->slave_fd, &vmsg);
+        vu_message_read_default(dev, dev->slave_fd, &vmsg);
     }
     return;
 }
diff --git a/tests/vhost-user-bridge.c b/tests/vhost-user-bridge.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/vhost-user-bridge.c
+++ b/tests/vhost-user-bridge.c
@@ -XXX,XX +XXX,XX @@ vubr_accept_cb(int sock, void *ctx)
                  VHOST_USER_BRIDGE_MAX_QUEUES,
                  conn_fd,
                  vubr_panic,
+                 NULL,
                  vubr_set_watch,
                  vubr_remove_watch,
                  &vuiface)) {
@@ -XXX,XX +XXX,XX @@ vubr_new(const char *path, bool client)
                  VHOST_USER_BRIDGE_MAX_QUEUES,
                  dev->sock,
                  vubr_panic,
+                 NULL,
                  vubr_set_watch,
                  vubr_remove_watch,
                  &vuiface)) {
diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index XXXXXXX..XXXXXXX 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -XXX,XX +XXX,XX @@ int virtio_session_mount(struct fuse_session *se)
     se->vu_socketfd = data_sock;
    se->virtio_dev->se = se;
     pthread_rwlock_init(&se->virtio_dev->vu_dispatch_rwlock, NULL);
-    vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, fv_set_watch,
-            fv_remove_watch, &fv_iface);
+    vu_init(&se->virtio_dev->dev, 2, se->vu_socketfd, fv_panic, NULL,
+            fv_set_watch, fv_remove_watch, &fv_iface);
 
     return 0;
 }
--
2.26.2

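To make the new hook concrete, here is a minimal sketch of how a libvhost-user
application could supply its own reader after this change. It is illustrative
only: the my_* callback names and their empty bodies are assumptions, not code
from this series; the real QIOChannel-based reader arrives with the
vhost-user-server patches later in this pull request.

#include "libvhost-user.h"   /* contrib/libvhost-user/libvhost-user.h */

/* Placeholder event-loop glue; a real application wires these into its own
 * main loop (see libvhost-user-glib.c for a GLib example). */
static void my_panic(VuDev *dev, const char *err) { /* log and tear down */ }
static void my_set_watch(VuDev *dev, int fd, int condition,
                         vu_watch_cb cb, void *data) { /* watch fd */ }
static void my_remove_watch(VuDev *dev, int fd) { /* unwatch fd */ }

/* Same contract as vu_message_read_default(): fill *vmsg (header, payload and
 * any ancillary fds) from 'sock' and return true, or close the received fds
 * and return false on error. */
static bool my_read_msg(VuDev *dev, int sock, VhostUserMsg *vmsg)
{
    return false;   /* a real reader would avoid blocking the whole process */
}

static bool my_setup(VuDev *dev, uint16_t max_queues, int sock,
                     const VuDevIface *iface)
{
    /* Passing NULL for read_msg keeps the default blocking reader. */
    return vu_init(dev, max_queues, sock, my_panic, my_read_msg,
                   my_set_watch, my_remove_watch, iface);
}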
From: Coiby Xu <coiby.xu@gmail.com>

When the client is running inside gdb and the quit command is issued in
gdb, QEMU will still dispatch the event, which causes a segmentation
fault in the callback function.

Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-3-coiby.xu@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/libvhost-user/libvhost-user.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/libvhost-user/libvhost-user.c b/contrib/libvhost-user/libvhost-user.c
index XXXXXXX..XXXXXXX 100644
--- a/contrib/libvhost-user/libvhost-user.c
+++ b/contrib/libvhost-user/libvhost-user.c
@@ -XXX,XX +XXX,XX @@ vu_deinit(VuDev *dev)
         }

         if (vq->kick_fd != -1) {
+            dev->remove_watch(dev, vq->kick_fd);
             close(vq->kick_fd);
             vq->kick_fd = -1;
         }
--
2.26.2

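Annotated for clarity, the ordering this fix establishes for each virtqueue in
vu_deinit() is (a sketch of the loop body above, with comments added here):

if (vq->kick_fd != -1) {
    dev->remove_watch(dev, vq->kick_fd);   /* stop dispatching events first */
    close(vq->kick_fd);                    /* only then close the fd */
    vq->kick_fd = -1;                      /* so no stale callback can fire */
}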
From: Coiby Xu <coiby.xu@gmail.com>

Sharing QEMU devices via vhost-user protocol.

Only one vhost-user client can connect to the server at a time.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Coiby Xu <coiby.xu@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20200918080912.321299-4-coiby.xu@gmail.com
[Fixed size_t %lu -> %zu format string compiler error.
--Stefan]
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/vhost-user-server.h |  65 ++++++
 util/vhost-user-server.c | 428 +++++++++++++++++++++++++++++++++++++++
 util/meson.build         |   1 +
 3 files changed, 494 insertions(+)
 create mode 100644 util/vhost-user-server.h
 create mode 100644 util/vhost-user-server.c

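For orientation, this is roughly how a device backend is expected to drive the
new API as introduced here (later patches in this pull request move the header
to include/ and drop the unused DevicePanicNotifier). The MyExport struct,
my_iface and my_panic_notifier below are illustrative assumptions, not code
from the series:

#include "qemu/osdep.h"
#include "qemu/main-loop.h"
#include "vhost-user-server.h"   /* util/vhost-user-server.h at this point */

typedef struct MyExport {
    VuServer server;             /* serves one vhost-user client at a time */
} MyExport;

static const VuDevIface my_iface = {
    .process_msg = NULL,         /* fill in device-specific callbacks */
};

static void my_panic_notifier(VuServer *server)
{
    /* tear the backend down when the vhost-user session dies */
}

static bool my_export_start(MyExport *exp, SocketAddress *addr, Error **errp)
{
    return vhost_user_server_start(&exp->server, addr,
                                   qemu_get_aio_context(), /* main loop */
                                   1 /* max_queues */,
                                   my_panic_notifier, &my_iface, errp);
}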
24 | diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h | ||
25 | new file mode 100644 | ||
26 | index XXXXXXX..XXXXXXX | ||
27 | --- /dev/null | ||
28 | +++ b/util/vhost-user-server.h | ||
29 | @@ -XXX,XX +XXX,XX @@ | ||
30 | +/* | ||
31 | + * Sharing QEMU devices via vhost-user protocol | ||
32 | + * | ||
33 | + * Copyright (c) Coiby Xu <coiby.xu@gmail.com>. | ||
34 | + * Copyright (c) 2020 Red Hat, Inc. | ||
35 | + * | ||
36 | + * This work is licensed under the terms of the GNU GPL, version 2 or | ||
37 | + * later. See the COPYING file in the top-level directory. | ||
38 | + */ | ||
39 | + | ||
40 | +#ifndef VHOST_USER_SERVER_H | ||
41 | +#define VHOST_USER_SERVER_H | ||
42 | + | ||
43 | +#include "contrib/libvhost-user/libvhost-user.h" | ||
44 | +#include "io/channel-socket.h" | ||
45 | +#include "io/channel-file.h" | ||
46 | +#include "io/net-listener.h" | ||
47 | +#include "qemu/error-report.h" | ||
48 | +#include "qapi/error.h" | ||
49 | +#include "standard-headers/linux/virtio_blk.h" | ||
50 | + | ||
51 | +typedef struct VuFdWatch { | ||
52 | + VuDev *vu_dev; | ||
53 | + int fd; /*kick fd*/ | ||
54 | + void *pvt; | ||
55 | + vu_watch_cb cb; | ||
56 | + bool processing; | ||
57 | + QTAILQ_ENTRY(VuFdWatch) next; | ||
58 | +} VuFdWatch; | ||
59 | + | ||
60 | +typedef struct VuServer VuServer; | ||
61 | +typedef void DevicePanicNotifierFn(VuServer *server); | ||
62 | + | ||
63 | +struct VuServer { | ||
64 | + QIONetListener *listener; | ||
65 | + AioContext *ctx; | ||
66 | + DevicePanicNotifierFn *device_panic_notifier; | ||
67 | + int max_queues; | ||
68 | + const VuDevIface *vu_iface; | ||
69 | + VuDev vu_dev; | ||
70 | + QIOChannel *ioc; /* The I/O channel with the client */ | ||
71 | + QIOChannelSocket *sioc; /* The underlying data channel with the client */ | ||
72 | + /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */ | ||
73 | + QIOChannel *ioc_slave; | ||
74 | + QIOChannelSocket *sioc_slave; | ||
75 | + Coroutine *co_trip; /* coroutine for processing VhostUserMsg */ | ||
76 | + QTAILQ_HEAD(, VuFdWatch) vu_fd_watches; | ||
77 | + /* restart coroutine co_trip if AIOContext is changed */ | ||
78 | + bool aio_context_changed; | ||
79 | + bool processing_msg; | ||
80 | +}; | ||
81 | + | ||
82 | +bool vhost_user_server_start(VuServer *server, | ||
83 | + SocketAddress *unix_socket, | ||
84 | + AioContext *ctx, | ||
85 | + uint16_t max_queues, | ||
86 | + DevicePanicNotifierFn *device_panic_notifier, | ||
87 | + const VuDevIface *vu_iface, | ||
88 | + Error **errp); | ||
89 | + | ||
90 | +void vhost_user_server_stop(VuServer *server); | ||
91 | + | ||
92 | +void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx); | ||
93 | + | ||
94 | +#endif /* VHOST_USER_SERVER_H */ | ||
95 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
96 | new file mode 100644 | ||
97 | index XXXXXXX..XXXXXXX | ||
98 | --- /dev/null | ||
99 | +++ b/util/vhost-user-server.c | ||
100 | @@ -XXX,XX +XXX,XX @@ | ||
101 | +/* | ||
102 | + * Sharing QEMU devices via vhost-user protocol | ||
103 | + * | ||
104 | + * Copyright (c) Coiby Xu <coiby.xu@gmail.com>. | ||
105 | + * Copyright (c) 2020 Red Hat, Inc. | ||
106 | + * | ||
107 | + * This work is licensed under the terms of the GNU GPL, version 2 or | ||
108 | + * later. See the COPYING file in the top-level directory. | ||
109 | + */ | ||
110 | +#include "qemu/osdep.h" | ||
111 | +#include "qemu/main-loop.h" | ||
112 | +#include "vhost-user-server.h" | ||
113 | + | ||
114 | +static void vmsg_close_fds(VhostUserMsg *vmsg) | ||
115 | +{ | ||
116 | + int i; | ||
117 | + for (i = 0; i < vmsg->fd_num; i++) { | ||
118 | + close(vmsg->fds[i]); | ||
119 | + } | ||
120 | +} | ||
121 | + | ||
122 | +static void vmsg_unblock_fds(VhostUserMsg *vmsg) | ||
123 | +{ | ||
124 | + int i; | ||
125 | + for (i = 0; i < vmsg->fd_num; i++) { | ||
126 | + qemu_set_nonblock(vmsg->fds[i]); | ||
127 | + } | ||
128 | +} | ||
129 | + | ||
130 | +static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc, | ||
131 | + gpointer opaque); | ||
132 | + | ||
133 | +static void close_client(VuServer *server) | ||
134 | +{ | ||
135 | + /* | ||
136 | + * Before closing the client | ||
137 | + * | ||
138 | + * 1. Let vu_client_trip stop processing new vhost-user msg | ||
139 | + * | ||
140 | + * 2. remove kick_handler | ||
141 | + * | ||
142 | + * 3. wait for the kick handler to be finished | ||
143 | + * | ||
144 | + * 4. wait for the current vhost-user msg to be finished processing | ||
145 | + */ | ||
146 | + | ||
147 | + QIOChannelSocket *sioc = server->sioc; | ||
148 | + /* When this is set vu_client_trip will stop new processing vhost-user message */ | ||
149 | + server->sioc = NULL; | ||
150 | + | ||
151 | + VuFdWatch *vu_fd_watch, *next; | ||
152 | + QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { | ||
153 | + aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL, | ||
154 | + NULL, NULL, NULL); | ||
155 | + } | ||
156 | + | ||
157 | + while (!QTAILQ_EMPTY(&server->vu_fd_watches)) { | ||
158 | + QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { | ||
159 | + if (!vu_fd_watch->processing) { | ||
160 | + QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next); | ||
161 | + g_free(vu_fd_watch); | ||
162 | + } | ||
163 | + } | ||
164 | + } | ||
165 | + | ||
166 | + while (server->processing_msg) { | ||
167 | + if (server->ioc->read_coroutine) { | ||
168 | + server->ioc->read_coroutine = NULL; | ||
169 | + qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL, | ||
170 | + NULL, server->ioc); | ||
171 | + server->processing_msg = false; | ||
172 | + } | ||
173 | + } | ||
174 | + | ||
175 | + vu_deinit(&server->vu_dev); | ||
176 | + object_unref(OBJECT(sioc)); | ||
177 | + object_unref(OBJECT(server->ioc)); | ||
178 | +} | ||
179 | + | ||
180 | +static void panic_cb(VuDev *vu_dev, const char *buf) | ||
181 | +{ | ||
182 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
183 | + | ||
184 | + /* avoid while loop in close_client */ | ||
185 | + server->processing_msg = false; | ||
186 | + | ||
187 | + if (buf) { | ||
188 | + error_report("vu_panic: %s", buf); | ||
189 | + } | ||
190 | + | ||
191 | + if (server->sioc) { | ||
192 | + close_client(server); | ||
193 | + } | ||
194 | + | ||
195 | + if (server->device_panic_notifier) { | ||
196 | + server->device_panic_notifier(server); | ||
197 | + } | ||
198 | + | ||
199 | + /* | ||
200 | + * Set the callback function for network listener so another | ||
201 | + * vhost-user client can connect to this server | ||
202 | + */ | ||
203 | + qio_net_listener_set_client_func(server->listener, | ||
204 | + vu_accept, | ||
205 | + server, | ||
206 | + NULL); | ||
207 | +} | ||
208 | + | ||
209 | +static bool coroutine_fn | ||
210 | +vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg) | ||
211 | +{ | ||
212 | + struct iovec iov = { | ||
213 | + .iov_base = (char *)vmsg, | ||
214 | + .iov_len = VHOST_USER_HDR_SIZE, | ||
215 | + }; | ||
216 | + int rc, read_bytes = 0; | ||
217 | + Error *local_err = NULL; | ||
218 | + /* | ||
219 | + * Store fds/nfds returned from qio_channel_readv_full into | ||
220 | + * temporary variables. | ||
221 | + * | ||
222 | + * VhostUserMsg is a packed structure, gcc will complain about passing | ||
223 | + * pointer to a packed structure member if we pass &VhostUserMsg.fd_num | ||
224 | + * and &VhostUserMsg.fds directly when calling qio_channel_readv_full, | ||
225 | + * thus two temporary variables nfds and fds are used here. | ||
226 | + */ | ||
227 | + size_t nfds = 0, nfds_t = 0; | ||
228 | + const size_t max_fds = G_N_ELEMENTS(vmsg->fds); | ||
229 | + int *fds_t = NULL; | ||
230 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
231 | + QIOChannel *ioc = server->ioc; | ||
232 | + | ||
233 | + if (!ioc) { | ||
234 | + error_report_err(local_err); | ||
235 | + goto fail; | ||
236 | + } | ||
237 | + | ||
238 | + assert(qemu_in_coroutine()); | ||
239 | + do { | ||
240 | + /* | ||
241 | + * qio_channel_readv_full may have short reads, keeping calling it | ||
242 | + * until getting VHOST_USER_HDR_SIZE or 0 bytes in total | ||
243 | + */ | ||
244 | + rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err); | ||
245 | + if (rc < 0) { | ||
246 | + if (rc == QIO_CHANNEL_ERR_BLOCK) { | ||
247 | + qio_channel_yield(ioc, G_IO_IN); | ||
248 | + continue; | ||
249 | + } else { | ||
250 | + error_report_err(local_err); | ||
251 | + return false; | ||
252 | + } | ||
253 | + } | ||
254 | + read_bytes += rc; | ||
255 | + if (nfds_t > 0) { | ||
256 | + if (nfds + nfds_t > max_fds) { | ||
257 | + error_report("A maximum of %zu fds are allowed, " | ||
258 | + "however got %zu fds now", | ||
259 | + max_fds, nfds + nfds_t); | ||
260 | + goto fail; | ||
261 | + } | ||
262 | + memcpy(vmsg->fds + nfds, fds_t, | ||
263 | + nfds_t * sizeof(vmsg->fds[0])); | ||
264 | + nfds += nfds_t; | ||
265 | + g_free(fds_t); | ||
266 | + } | ||
267 | + if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) { | ||
268 | + break; | ||
269 | + } | ||
270 | + iov.iov_base = (char *)vmsg + read_bytes; | ||
271 | + iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes; | ||
272 | + } while (true); | ||
273 | + | ||
274 | + vmsg->fd_num = nfds; | ||
275 | + /* qio_channel_readv_full will make socket fds blocking, unblock them */ | ||
276 | + vmsg_unblock_fds(vmsg); | ||
277 | + if (vmsg->size > sizeof(vmsg->payload)) { | ||
278 | + error_report("Error: too big message request: %d, " | ||
279 | + "size: vmsg->size: %u, " | ||
280 | + "while sizeof(vmsg->payload) = %zu", | ||
281 | + vmsg->request, vmsg->size, sizeof(vmsg->payload)); | ||
282 | + goto fail; | ||
283 | + } | ||
284 | + | ||
285 | + struct iovec iov_payload = { | ||
286 | + .iov_base = (char *)&vmsg->payload, | ||
287 | + .iov_len = vmsg->size, | ||
288 | + }; | ||
289 | + if (vmsg->size) { | ||
290 | + rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err); | ||
291 | + if (rc == -1) { | ||
292 | + error_report_err(local_err); | ||
293 | + goto fail; | ||
294 | + } | ||
295 | + } | ||
296 | + | ||
297 | + return true; | ||
298 | + | ||
299 | +fail: | ||
300 | + vmsg_close_fds(vmsg); | ||
301 | + | ||
302 | + return false; | ||
303 | +} | ||
304 | + | ||
305 | + | ||
306 | +static void vu_client_start(VuServer *server); | ||
307 | +static coroutine_fn void vu_client_trip(void *opaque) | ||
308 | +{ | ||
309 | + VuServer *server = opaque; | ||
310 | + | ||
311 | + while (!server->aio_context_changed && server->sioc) { | ||
312 | + server->processing_msg = true; | ||
313 | + vu_dispatch(&server->vu_dev); | ||
314 | + server->processing_msg = false; | ||
315 | + } | ||
316 | + | ||
317 | + if (server->aio_context_changed && server->sioc) { | ||
318 | + server->aio_context_changed = false; | ||
319 | + vu_client_start(server); | ||
320 | + } | ||
321 | +} | ||
322 | + | ||
323 | +static void vu_client_start(VuServer *server) | ||
324 | +{ | ||
325 | + server->co_trip = qemu_coroutine_create(vu_client_trip, server); | ||
326 | + aio_co_enter(server->ctx, server->co_trip); | ||
327 | +} | ||
328 | + | ||
329 | +/* | ||
330 | + * a wrapper for vu_kick_cb | ||
331 | + * | ||
332 | + * since aio_dispatch can only pass one user data pointer to the | ||
333 | + * callback function, pack VuDev and pvt into a struct. Then unpack it | ||
334 | + * and pass them to vu_kick_cb | ||
335 | + */ | ||
336 | +static void kick_handler(void *opaque) | ||
337 | +{ | ||
338 | + VuFdWatch *vu_fd_watch = opaque; | ||
339 | + vu_fd_watch->processing = true; | ||
340 | + vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt); | ||
341 | + vu_fd_watch->processing = false; | ||
342 | +} | ||
343 | + | ||
344 | + | ||
345 | +static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd) | ||
346 | +{ | ||
347 | + | ||
348 | + VuFdWatch *vu_fd_watch, *next; | ||
349 | + QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { | ||
350 | + if (vu_fd_watch->fd == fd) { | ||
351 | + return vu_fd_watch; | ||
352 | + } | ||
353 | + } | ||
354 | + return NULL; | ||
355 | +} | ||
356 | + | ||
357 | +static void | ||
358 | +set_watch(VuDev *vu_dev, int fd, int vu_evt, | ||
359 | + vu_watch_cb cb, void *pvt) | ||
360 | +{ | ||
361 | + | ||
362 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
363 | + g_assert(vu_dev); | ||
364 | + g_assert(fd >= 0); | ||
365 | + g_assert(cb); | ||
366 | + | ||
367 | + VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd); | ||
368 | + | ||
369 | + if (!vu_fd_watch) { | ||
370 | + VuFdWatch *vu_fd_watch = g_new0(VuFdWatch, 1); | ||
371 | + | ||
372 | + QTAILQ_INSERT_TAIL(&server->vu_fd_watches, vu_fd_watch, next); | ||
373 | + | ||
374 | + vu_fd_watch->fd = fd; | ||
375 | + vu_fd_watch->cb = cb; | ||
376 | + qemu_set_nonblock(fd); | ||
377 | + aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler, | ||
378 | + NULL, NULL, vu_fd_watch); | ||
379 | + vu_fd_watch->vu_dev = vu_dev; | ||
380 | + vu_fd_watch->pvt = pvt; | ||
381 | + } | ||
382 | +} | ||
383 | + | ||
384 | + | ||
385 | +static void remove_watch(VuDev *vu_dev, int fd) | ||
386 | +{ | ||
387 | + VuServer *server; | ||
388 | + g_assert(vu_dev); | ||
389 | + g_assert(fd >= 0); | ||
390 | + | ||
391 | + server = container_of(vu_dev, VuServer, vu_dev); | ||
392 | + | ||
393 | + VuFdWatch *vu_fd_watch = find_vu_fd_watch(server, fd); | ||
394 | + | ||
395 | + if (!vu_fd_watch) { | ||
396 | + return; | ||
397 | + } | ||
398 | + aio_set_fd_handler(server->ioc->ctx, fd, true, NULL, NULL, NULL, NULL); | ||
399 | + | ||
400 | + QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next); | ||
401 | + g_free(vu_fd_watch); | ||
402 | +} | ||
403 | + | ||
404 | + | ||
405 | +static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc, | ||
406 | + gpointer opaque) | ||
407 | +{ | ||
408 | + VuServer *server = opaque; | ||
409 | + | ||
410 | + if (server->sioc) { | ||
411 | + warn_report("Only one vhost-user client is allowed to " | ||
412 | + "connect the server one time"); | ||
413 | + return; | ||
414 | + } | ||
415 | + | ||
416 | + if (!vu_init(&server->vu_dev, server->max_queues, sioc->fd, panic_cb, | ||
417 | + vu_message_read, set_watch, remove_watch, server->vu_iface)) { | ||
418 | + error_report("Failed to initialize libvhost-user"); | ||
419 | + return; | ||
420 | + } | ||
421 | + | ||
422 | + /* | ||
423 | + * Unset the callback function for the network listener so that other | ||
424 | + * vhost-user clients keep waiting until this client disconnects | ||
425 | + */ | ||
426 | + qio_net_listener_set_client_func(server->listener, | ||
427 | + NULL, | ||
428 | + NULL, | ||
429 | + NULL); | ||
430 | + server->sioc = sioc; | ||
431 | + /* | ||
432 | + * Increase the object reference, so sioc will not be freed by | ||
433 | + * qio_net_listener_channel_func which will call object_unref(OBJECT(sioc)) | ||
434 | + */ | ||
435 | + object_ref(OBJECT(server->sioc)); | ||
436 | + qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client"); | ||
437 | + server->ioc = QIO_CHANNEL(sioc); | ||
438 | + object_ref(OBJECT(server->ioc)); | ||
439 | + qio_channel_attach_aio_context(server->ioc, server->ctx); | ||
440 | + qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL); | ||
441 | + vu_client_start(server); | ||
442 | +} | ||
443 | + | ||
444 | + | ||
445 | +void vhost_user_server_stop(VuServer *server) | ||
446 | +{ | ||
447 | + if (server->sioc) { | ||
448 | + close_client(server); | ||
449 | + } | ||
450 | + | ||
451 | + if (server->listener) { | ||
452 | + qio_net_listener_disconnect(server->listener); | ||
453 | + object_unref(OBJECT(server->listener)); | ||
454 | + } | ||
455 | + | ||
456 | +} | ||
457 | + | ||
458 | +void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx) | ||
459 | +{ | ||
460 | + VuFdWatch *vu_fd_watch, *next; | ||
461 | + void *opaque = NULL; | ||
462 | + IOHandler *io_read = NULL; | ||
463 | + bool attach; | ||
464 | + | ||
465 | + server->ctx = ctx ? ctx : qemu_get_aio_context(); | ||
466 | + | ||
467 | + if (!server->sioc) { | ||
468 | + /* not yet serving any client */ | ||
469 | + return; | ||
470 | + } | ||
471 | + | ||
472 | + if (ctx) { | ||
473 | + qio_channel_attach_aio_context(server->ioc, ctx); | ||
474 | + server->aio_context_changed = true; | ||
475 | + io_read = kick_handler; | ||
476 | + attach = true; | ||
477 | + } else { | ||
478 | + qio_channel_detach_aio_context(server->ioc); | ||
479 | + /* server->ioc->ctx keeps the old AioContext */ | ||
480 | + ctx = server->ioc->ctx; | ||
481 | + attach = false; | ||
482 | + } | ||
483 | + | ||
484 | + QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { | ||
485 | + if (vu_fd_watch->cb) { | ||
486 | + opaque = attach ? vu_fd_watch : NULL; | ||
487 | + aio_set_fd_handler(ctx, vu_fd_watch->fd, true, | ||
488 | + io_read, NULL, NULL, | ||
489 | + opaque); | ||
490 | + } | ||
491 | + } | ||
492 | +} | ||
493 | + | ||
494 | + | ||
495 | +bool vhost_user_server_start(VuServer *server, | ||
496 | + SocketAddress *socket_addr, | ||
497 | + AioContext *ctx, | ||
498 | + uint16_t max_queues, | ||
499 | + DevicePanicNotifierFn *device_panic_notifier, | ||
500 | + const VuDevIface *vu_iface, | ||
501 | + Error **errp) | ||
502 | +{ | ||
503 | + QIONetListener *listener = qio_net_listener_new(); | ||
504 | + if (qio_net_listener_open_sync(listener, socket_addr, 1, | ||
505 | + errp) < 0) { | ||
506 | + object_unref(OBJECT(listener)); | ||
507 | + return false; | ||
508 | + } | ||
509 | + | ||
510 | + /* zero out unspecified fileds */ | ||
511 | + *server = (VuServer) { | ||
512 | + .listener = listener, | ||
513 | + .vu_iface = vu_iface, | ||
514 | + .max_queues = max_queues, | ||
515 | + .ctx = ctx, | ||
516 | + .device_panic_notifier = device_panic_notifier, | ||
517 | + }; | ||
518 | + | ||
519 | + qio_net_listener_set_name(server->listener, "vhost-user-backend-listener"); | ||
520 | + | ||
521 | + qio_net_listener_set_client_func(server->listener, | ||
522 | + vu_accept, | ||
523 | + server, | ||
524 | + NULL); | ||
525 | + | ||
526 | + QTAILQ_INIT(&server->vu_fd_watches); | ||
527 | + return true; | ||
528 | +} | ||
529 | diff --git a/util/meson.build b/util/meson.build | ||
530 | index XXXXXXX..XXXXXXX 100644 | ||
531 | --- a/util/meson.build | ||
532 | +++ b/util/meson.build | ||
533 | @@ -XXX,XX +XXX,XX @@ if have_block | ||
534 | util_ss.add(files('main-loop.c')) | ||
535 | util_ss.add(files('nvdimm-utils.c')) | ||
536 | util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c')) | ||
537 | + util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c')) | ||
538 | util_ss.add(files('qemu-coroutine-sleep.c')) | ||
539 | util_ss.add(files('qemu-co-shared-resource.c')) | ||
540 | util_ss.add(files('thread-pool.c', 'qemu-timer.c')) | ||
541 | -- | ||
542 | 2.26.2 | ||
543 | diff view generated by jsdifflib |
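A minimal sketch of how a device backend might consume the vhost_user_server_*() API introduced above. The API names and the signatures come from vhost-user-server.h in this patch; my_iface, my_export_start and my_export_stop are hypothetical illustration names, and the libvhost-user callbacks are left to the concrete backend:

    /*
     * Hypothetical consumer of the VuServer API; assumes the types pulled in
     * by vhost-user-server.h (VuServer, VuDevIface, SocketAddress, Error).
     */
    #include "qemu/osdep.h"
    #include "vhost-user-server.h"

    static VuServer server;

    /* The concrete device backend supplies the libvhost-user callbacks. */
    extern const VuDevIface my_iface;

    static bool my_export_start(SocketAddress *addr, AioContext *ctx,
                                Error **errp)
    {
        /* Opens the listener; only one client may be connected at a time. */
        return vhost_user_server_start(&server, addr, ctx, 1 /* max_queues */,
                                       NULL /* device_panic_notifier */,
                                       &my_iface, errp);
    }

    static void my_export_stop(void)
    {
        /* Disconnects the client (if any) and tears down the listener. */
        vhost_user_server_stop(&server);
    }

If the backing AioContext changes (e.g. an IOThread attach/detach), the backend is expected to call vhost_user_server_set_aio_context(), as the vhost-user-blk export later in this series does from its blk_add_aio_context_notifier() callbacks.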
New patch | |||
---|---|---|---|
1 | From: Coiby Xu <coiby.xu@gmail.com> | ||
1 | 2 | ||
3 | Move the constants from hw/core/qdev-properties.c to | ||
4 | util/block-helpers.h so that knowledge of the min/max values is shared. | ||
5 | |||
6 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
7 | Signed-off-by: Coiby Xu <coiby.xu@gmail.com> | ||
8 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
9 | Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> | ||
10 | Acked-by: Eduardo Habkost <ehabkost@redhat.com> | ||
11 | Message-id: 20200918080912.321299-5-coiby.xu@gmail.com | ||
12 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
13 | --- | ||
14 | util/block-helpers.h | 19 +++++++++++++ | ||
15 | hw/core/qdev-properties-system.c | 31 ++++----------------- | ||
16 | util/block-helpers.c | 46 ++++++++++++++++++++++++++++++++ | ||
17 | util/meson.build | 1 + | ||
18 | 4 files changed, 71 insertions(+), 26 deletions(-) | ||
19 | create mode 100644 util/block-helpers.h | ||
20 | create mode 100644 util/block-helpers.c | ||
21 | |||
22 | diff --git a/util/block-helpers.h b/util/block-helpers.h | ||
23 | new file mode 100644 | ||
24 | index XXXXXXX..XXXXXXX | ||
25 | --- /dev/null | ||
26 | +++ b/util/block-helpers.h | ||
27 | @@ -XXX,XX +XXX,XX @@ | ||
28 | +#ifndef BLOCK_HELPERS_H | ||
29 | +#define BLOCK_HELPERS_H | ||
30 | + | ||
31 | +#include "qemu/units.h" | ||
32 | + | ||
33 | +/* lower limit is sector size */ | ||
34 | +#define MIN_BLOCK_SIZE INT64_C(512) | ||
35 | +#define MIN_BLOCK_SIZE_STR "512 B" | ||
36 | +/* | ||
37 | + * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and | ||
38 | + * matches qcow2 cluster size limit | ||
39 | + */ | ||
40 | +#define MAX_BLOCK_SIZE (2 * MiB) | ||
41 | +#define MAX_BLOCK_SIZE_STR "2 MiB" | ||
42 | + | ||
43 | +void check_block_size(const char *id, const char *name, int64_t value, | ||
44 | + Error **errp); | ||
45 | + | ||
46 | +#endif /* BLOCK_HELPERS_H */ | ||
47 | diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c | ||
48 | index XXXXXXX..XXXXXXX 100644 | ||
49 | --- a/hw/core/qdev-properties-system.c | ||
50 | +++ b/hw/core/qdev-properties-system.c | ||
51 | @@ -XXX,XX +XXX,XX @@ | ||
52 | #include "sysemu/blockdev.h" | ||
53 | #include "net/net.h" | ||
54 | #include "hw/pci/pci.h" | ||
55 | +#include "util/block-helpers.h" | ||
56 | |||
57 | static bool check_prop_still_unset(DeviceState *dev, const char *name, | ||
58 | const void *old_val, const char *new_val, | ||
59 | @@ -XXX,XX +XXX,XX @@ const PropertyInfo qdev_prop_losttickpolicy = { | ||
60 | |||
61 | /* --- blocksize --- */ | ||
62 | |||
63 | -/* lower limit is sector size */ | ||
64 | -#define MIN_BLOCK_SIZE 512 | ||
65 | -#define MIN_BLOCK_SIZE_STR "512 B" | ||
66 | -/* | ||
67 | - * upper limit is arbitrary, 2 MiB looks sufficient for all sensible uses, and | ||
68 | - * matches qcow2 cluster size limit | ||
69 | - */ | ||
70 | -#define MAX_BLOCK_SIZE (2 * MiB) | ||
71 | -#define MAX_BLOCK_SIZE_STR "2 MiB" | ||
72 | - | ||
73 | static void set_blocksize(Object *obj, Visitor *v, const char *name, | ||
74 | void *opaque, Error **errp) | ||
75 | { | ||
76 | @@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name, | ||
77 | Property *prop = opaque; | ||
78 | uint32_t *ptr = qdev_get_prop_ptr(dev, prop); | ||
79 | uint64_t value; | ||
80 | + Error *local_err = NULL; | ||
81 | |||
82 | if (dev->realized) { | ||
83 | qdev_prop_set_after_realize(dev, name, errp); | ||
84 | @@ -XXX,XX +XXX,XX @@ static void set_blocksize(Object *obj, Visitor *v, const char *name, | ||
85 | if (!visit_type_size(v, name, &value, errp)) { | ||
86 | return; | ||
87 | } | ||
88 | - /* value of 0 means "unset" */ | ||
89 | - if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) { | ||
90 | - error_setg(errp, | ||
91 | - "Property %s.%s doesn't take value %" PRIu64 | ||
92 | - " (minimum: " MIN_BLOCK_SIZE_STR | ||
93 | - ", maximum: " MAX_BLOCK_SIZE_STR ")", | ||
94 | - dev->id ? : "", name, value); | ||
95 | + check_block_size(dev->id ? : "", name, value, &local_err); | ||
96 | + if (local_err) { | ||
97 | + error_propagate(errp, local_err); | ||
98 | return; | ||
99 | } | ||
100 | - | ||
101 | - /* We rely on power-of-2 blocksizes for bitmasks */ | ||
102 | - if ((value & (value - 1)) != 0) { | ||
103 | - error_setg(errp, | ||
104 | - "Property %s.%s doesn't take value '%" PRId64 "', " | ||
105 | - "it's not a power of 2", dev->id ?: "", name, (int64_t)value); | ||
106 | - return; | ||
107 | - } | ||
108 | - | ||
109 | *ptr = value; | ||
110 | } | ||
111 | |||
112 | diff --git a/util/block-helpers.c b/util/block-helpers.c | ||
113 | new file mode 100644 | ||
114 | index XXXXXXX..XXXXXXX | ||
115 | --- /dev/null | ||
116 | +++ b/util/block-helpers.c | ||
117 | @@ -XXX,XX +XXX,XX @@ | ||
118 | +/* | ||
119 | + * Block utility functions | ||
120 | + * | ||
121 | + * Copyright IBM, Corp. 2011 | ||
122 | + * Copyright (c) 2020 Coiby Xu <coiby.xu@gmail.com> | ||
123 | + * | ||
124 | + * This work is licensed under the terms of the GNU GPL, version 2 or later. | ||
125 | + * See the COPYING file in the top-level directory. | ||
126 | + */ | ||
127 | + | ||
128 | +#include "qemu/osdep.h" | ||
129 | +#include "qapi/error.h" | ||
130 | +#include "qapi/qmp/qerror.h" | ||
131 | +#include "block-helpers.h" | ||
132 | + | ||
133 | +/** | ||
134 | + * check_block_size: | ||
135 | + * @id: The unique ID of the object | ||
136 | + * @name: The name of the property being validated | ||
137 | + * @value: The block size in bytes | ||
138 | + * @errp: A pointer to an area to store an error | ||
139 | + * | ||
140 | + * This function checks that the block size meets the following conditions: | ||
141 | + * 1. At least MIN_BLOCK_SIZE | ||
142 | + * 2. No larger than MAX_BLOCK_SIZE | ||
143 | + * 3. A power of 2 | ||
144 | + */ | ||
145 | +void check_block_size(const char *id, const char *name, int64_t value, | ||
146 | + Error **errp) | ||
147 | +{ | ||
148 | + /* value of 0 means "unset" */ | ||
149 | + if (value && (value < MIN_BLOCK_SIZE || value > MAX_BLOCK_SIZE)) { | ||
150 | + error_setg(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE, | ||
151 | + id, name, value, MIN_BLOCK_SIZE, MAX_BLOCK_SIZE); | ||
152 | + return; | ||
153 | + } | ||
154 | + | ||
155 | + /* We rely on power-of-2 blocksizes for bitmasks */ | ||
156 | + if ((value & (value - 1)) != 0) { | ||
157 | + error_setg(errp, | ||
158 | + "Property %s.%s doesn't take value '%" PRId64 | ||
159 | + "', it's not a power of 2", | ||
160 | + id, name, value); | ||
161 | + return; | ||
162 | + } | ||
163 | +} | ||
164 | diff --git a/util/meson.build b/util/meson.build | ||
165 | index XXXXXXX..XXXXXXX 100644 | ||
166 | --- a/util/meson.build | ||
167 | +++ b/util/meson.build | ||
168 | @@ -XXX,XX +XXX,XX @@ if have_block | ||
169 | util_ss.add(files('nvdimm-utils.c')) | ||
170 | util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c')) | ||
171 | util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c')) | ||
172 | + util_ss.add(files('block-helpers.c')) | ||
173 | util_ss.add(files('qemu-coroutine-sleep.c')) | ||
174 | util_ss.add(files('qemu-co-shared-resource.c')) | ||
175 | util_ss.add(files('thread-pool.c', 'qemu-timer.c')) | ||
176 | -- | ||
177 | 2.26.2 | ||
178 | diff view generated by jsdifflib |
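The new check_block_size() helper is meant to be called wherever a user-supplied block size is accepted, with failures propagated through the usual Error machinery. A short sketch of a caller (object, property and function names here are hypothetical, not from the patch):

    #include "qemu/osdep.h"
    #include "qapi/error.h"
    #include "util/block-helpers.h"

    /* Hypothetical setter validating a user-supplied block size property. */
    static void my_set_blocksize(int64_t value, uint32_t *out, Error **errp)
    {
        Error *local_err = NULL;

        check_block_size("my-object", "logical-block-size", value, &local_err);
        if (local_err) {
            /* out of [512 B, 2 MiB] or not a power of two */
            error_propagate(errp, local_err);
            return;
        }
        /* value is either 0 ("unset") or a valid power-of-two block size */
        *out = value;
    }

Both in-tree users in this series, qdev's set_blocksize() and the vhost-user-blk export's "logical-block-size" property, follow this same pattern.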
New patch | |||
---|---|---|---|
1 | From: Coiby Xu <coiby.xu@gmail.com> | ||
1 | 2 | ||
3 | By making use of libvhost-user, a block device can be shared with the | ||
4 | connected vhost-user client. Only one client can connect to the | ||
5 | server at a time. | ||
6 | |||
7 | Since vhost-user-server needs a block drive to be created first, delay | ||
8 | the creation of this object. | ||
9 | |||
10 | Suggested-by: Kevin Wolf <kwolf@redhat.com> | ||
11 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
12 | Signed-off-by: Coiby Xu <coiby.xu@gmail.com> | ||
13 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
14 | Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> | ||
15 | Message-id: 20200918080912.321299-6-coiby.xu@gmail.com | ||
16 | [Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the | ||
17 | following compiler warning: | ||
18 | ../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=] | ||
19 | and fix "Invalid size %ld ..." ssize_t format string arguments for | ||
20 | 32-bit hosts. | ||
21 | --Stefan] | ||
22 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
23 | --- | ||
24 | block/export/vhost-user-blk-server.h | 36 ++ | ||
25 | block/export/vhost-user-blk-server.c | 661 +++++++++++++++++++++++++++ | ||
26 | softmmu/vl.c | 4 + | ||
27 | block/meson.build | 1 + | ||
28 | 4 files changed, 702 insertions(+) | ||
29 | create mode 100644 block/export/vhost-user-blk-server.h | ||
30 | create mode 100644 block/export/vhost-user-blk-server.c | ||
31 | |||
32 | diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h | ||
33 | new file mode 100644 | ||
34 | index XXXXXXX..XXXXXXX | ||
35 | --- /dev/null | ||
36 | +++ b/block/export/vhost-user-blk-server.h | ||
37 | @@ -XXX,XX +XXX,XX @@ | ||
38 | +/* | ||
39 | + * Sharing QEMU block devices via vhost-user protocol | ||
40 | + * | ||
41 | + * Copyright (c) Coiby Xu <coiby.xu@gmail.com>. | ||
42 | + * Copyright (c) 2020 Red Hat, Inc. | ||
43 | + * | ||
44 | + * This work is licensed under the terms of the GNU GPL, version 2 or | ||
45 | + * later. See the COPYING file in the top-level directory. | ||
46 | + */ | ||
47 | + | ||
48 | +#ifndef VHOST_USER_BLK_SERVER_H | ||
49 | +#define VHOST_USER_BLK_SERVER_H | ||
50 | +#include "util/vhost-user-server.h" | ||
51 | + | ||
52 | +typedef struct VuBlockDev VuBlockDev; | ||
53 | +#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server" | ||
54 | +#define VHOST_USER_BLK_SERVER(obj) \ | ||
55 | + OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER) | ||
56 | + | ||
57 | +/* vhost user block device */ | ||
58 | +struct VuBlockDev { | ||
59 | + Object parent_obj; | ||
60 | + char *node_name; | ||
61 | + SocketAddress *addr; | ||
62 | + AioContext *ctx; | ||
63 | + VuServer vu_server; | ||
64 | + bool running; | ||
65 | + uint32_t blk_size; | ||
66 | + BlockBackend *backend; | ||
67 | + QIOChannelSocket *sioc; | ||
68 | + QTAILQ_ENTRY(VuBlockDev) next; | ||
69 | + struct virtio_blk_config blkcfg; | ||
70 | + bool writable; | ||
71 | +}; | ||
72 | + | ||
73 | +#endif /* VHOST_USER_BLK_SERVER_H */ | ||
74 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
75 | new file mode 100644 | ||
76 | index XXXXXXX..XXXXXXX | ||
77 | --- /dev/null | ||
78 | +++ b/block/export/vhost-user-blk-server.c | ||
79 | @@ -XXX,XX +XXX,XX @@ | ||
80 | +/* | ||
81 | + * Sharing QEMU block devices via vhost-user protocol | ||
82 | + * | ||
83 | + * Parts of the code based on nbd/server.c. | ||
84 | + * | ||
85 | + * Copyright (c) Coiby Xu <coiby.xu@gmail.com>. | ||
86 | + * Copyright (c) 2020 Red Hat, Inc. | ||
87 | + * | ||
88 | + * This work is licensed under the terms of the GNU GPL, version 2 or | ||
89 | + * later. See the COPYING file in the top-level directory. | ||
90 | + */ | ||
91 | +#include "qemu/osdep.h" | ||
92 | +#include "block/block.h" | ||
93 | +#include "vhost-user-blk-server.h" | ||
94 | +#include "qapi/error.h" | ||
95 | +#include "qom/object_interfaces.h" | ||
96 | +#include "sysemu/block-backend.h" | ||
97 | +#include "util/block-helpers.h" | ||
98 | + | ||
99 | +enum { | ||
100 | + VHOST_USER_BLK_MAX_QUEUES = 1, | ||
101 | +}; | ||
102 | +struct virtio_blk_inhdr { | ||
103 | + unsigned char status; | ||
104 | +}; | ||
105 | + | ||
106 | +typedef struct VuBlockReq { | ||
107 | + VuVirtqElement *elem; | ||
108 | + int64_t sector_num; | ||
109 | + size_t size; | ||
110 | + struct virtio_blk_inhdr *in; | ||
111 | + struct virtio_blk_outhdr out; | ||
112 | + VuServer *server; | ||
113 | + struct VuVirtq *vq; | ||
114 | +} VuBlockReq; | ||
115 | + | ||
116 | +static void vu_block_req_complete(VuBlockReq *req) | ||
117 | +{ | ||
118 | + VuDev *vu_dev = &req->server->vu_dev; | ||
119 | + | ||
120 | + /* IO size with 1 extra status byte */ | ||
121 | + vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1); | ||
122 | + vu_queue_notify(vu_dev, req->vq); | ||
123 | + | ||
124 | + if (req->elem) { | ||
125 | + free(req->elem); | ||
126 | + } | ||
127 | + | ||
128 | + g_free(req); | ||
129 | +} | ||
130 | + | ||
131 | +static VuBlockDev *get_vu_block_device_by_server(VuServer *server) | ||
132 | +{ | ||
133 | + return container_of(server, VuBlockDev, vu_server); | ||
134 | +} | ||
135 | + | ||
136 | +static int coroutine_fn | ||
137 | +vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov, | ||
138 | + uint32_t iovcnt, uint32_t type) | ||
139 | +{ | ||
140 | + struct virtio_blk_discard_write_zeroes desc; | ||
141 | + ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc)); | ||
142 | + if (unlikely(size != sizeof(desc))) { | ||
143 | + error_report("Invalid size %zd, expect %zu", size, sizeof(desc)); | ||
144 | + return -EINVAL; | ||
145 | + } | ||
146 | + | ||
147 | + VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server); | ||
148 | + uint64_t range[2] = { le64_to_cpu(desc.sector) << 9, | ||
149 | + le32_to_cpu(desc.num_sectors) << 9 }; | ||
150 | + if (type == VIRTIO_BLK_T_DISCARD) { | ||
151 | + if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) { | ||
152 | + return 0; | ||
153 | + } | ||
154 | + } else if (type == VIRTIO_BLK_T_WRITE_ZEROES) { | ||
155 | + if (blk_co_pwrite_zeroes(vdev_blk->backend, | ||
156 | + range[0], range[1], 0) == 0) { | ||
157 | + return 0; | ||
158 | + } | ||
159 | + } | ||
160 | + | ||
161 | + return -EINVAL; | ||
162 | +} | ||
163 | + | ||
164 | +static void coroutine_fn vu_block_flush(VuBlockReq *req) | ||
165 | +{ | ||
166 | + VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server); | ||
167 | + BlockBackend *backend = vdev_blk->backend; | ||
168 | + blk_co_flush(backend); | ||
169 | +} | ||
170 | + | ||
171 | +struct req_data { | ||
172 | + VuServer *server; | ||
173 | + VuVirtq *vq; | ||
174 | + VuVirtqElement *elem; | ||
175 | +}; | ||
176 | + | ||
177 | +static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
178 | +{ | ||
179 | + struct req_data *data = opaque; | ||
180 | + VuServer *server = data->server; | ||
181 | + VuVirtq *vq = data->vq; | ||
182 | + VuVirtqElement *elem = data->elem; | ||
183 | + uint32_t type; | ||
184 | + VuBlockReq *req; | ||
185 | + | ||
186 | + VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
187 | + BlockBackend *backend = vdev_blk->backend; | ||
188 | + | ||
189 | + struct iovec *in_iov = elem->in_sg; | ||
190 | + struct iovec *out_iov = elem->out_sg; | ||
191 | + unsigned in_num = elem->in_num; | ||
192 | + unsigned out_num = elem->out_num; | ||
193 | + /* refer to hw/block/virtio_blk.c */ | ||
194 | + if (elem->out_num < 1 || elem->in_num < 1) { | ||
195 | + error_report("virtio-blk request missing headers"); | ||
196 | + free(elem); | ||
197 | + return; | ||
198 | + } | ||
199 | + | ||
200 | + req = g_new0(VuBlockReq, 1); | ||
201 | + req->server = server; | ||
202 | + req->vq = vq; | ||
203 | + req->elem = elem; | ||
204 | + | ||
205 | + if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out, | ||
206 | + sizeof(req->out)) != sizeof(req->out))) { | ||
207 | + error_report("virtio-blk request outhdr too short"); | ||
208 | + goto err; | ||
209 | + } | ||
210 | + | ||
211 | + iov_discard_front(&out_iov, &out_num, sizeof(req->out)); | ||
212 | + | ||
213 | + if (in_iov[in_num - 1].iov_len < sizeof(struct virtio_blk_inhdr)) { | ||
214 | + error_report("virtio-blk request inhdr too short"); | ||
215 | + goto err; | ||
216 | + } | ||
217 | + | ||
218 | + /* We always touch the last byte, so just see how big in_iov is. */ | ||
219 | + req->in = (void *)in_iov[in_num - 1].iov_base | ||
220 | + + in_iov[in_num - 1].iov_len | ||
221 | + - sizeof(struct virtio_blk_inhdr); | ||
222 | + iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr)); | ||
223 | + | ||
224 | + type = le32_to_cpu(req->out.type); | ||
225 | + switch (type & ~VIRTIO_BLK_T_BARRIER) { | ||
226 | + case VIRTIO_BLK_T_IN: | ||
227 | + case VIRTIO_BLK_T_OUT: { | ||
228 | + ssize_t ret = 0; | ||
229 | + bool is_write = type & VIRTIO_BLK_T_OUT; | ||
230 | + req->sector_num = le64_to_cpu(req->out.sector); | ||
231 | + | ||
232 | + int64_t offset = req->sector_num * vdev_blk->blk_size; | ||
233 | + QEMUIOVector qiov; | ||
234 | + if (is_write) { | ||
235 | + qemu_iovec_init_external(&qiov, out_iov, out_num); | ||
236 | + ret = blk_co_pwritev(backend, offset, qiov.size, | ||
237 | + &qiov, 0); | ||
238 | + } else { | ||
239 | + qemu_iovec_init_external(&qiov, in_iov, in_num); | ||
240 | + ret = blk_co_preadv(backend, offset, qiov.size, | ||
241 | + &qiov, 0); | ||
242 | + } | ||
243 | + if (ret >= 0) { | ||
244 | + req->in->status = VIRTIO_BLK_S_OK; | ||
245 | + } else { | ||
246 | + req->in->status = VIRTIO_BLK_S_IOERR; | ||
247 | + } | ||
248 | + break; | ||
249 | + } | ||
250 | + case VIRTIO_BLK_T_FLUSH: | ||
251 | + vu_block_flush(req); | ||
252 | + req->in->status = VIRTIO_BLK_S_OK; | ||
253 | + break; | ||
254 | + case VIRTIO_BLK_T_GET_ID: { | ||
255 | + size_t size = MIN(iov_size(&elem->in_sg[0], in_num), | ||
256 | + VIRTIO_BLK_ID_BYTES); | ||
257 | + snprintf(elem->in_sg[0].iov_base, size, "%s", "vhost_user_blk"); | ||
258 | + req->in->status = VIRTIO_BLK_S_OK; | ||
259 | + req->size = elem->in_sg[0].iov_len; | ||
260 | + break; | ||
261 | + } | ||
262 | + case VIRTIO_BLK_T_DISCARD: | ||
263 | + case VIRTIO_BLK_T_WRITE_ZEROES: { | ||
264 | + int rc; | ||
265 | + rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1], | ||
266 | + out_num, type); | ||
267 | + if (rc == 0) { | ||
268 | + req->in->status = VIRTIO_BLK_S_OK; | ||
269 | + } else { | ||
270 | + req->in->status = VIRTIO_BLK_S_IOERR; | ||
271 | + } | ||
272 | + break; | ||
273 | + } | ||
274 | + default: | ||
275 | + req->in->status = VIRTIO_BLK_S_UNSUPP; | ||
276 | + break; | ||
277 | + } | ||
278 | + | ||
279 | + vu_block_req_complete(req); | ||
280 | + return; | ||
281 | + | ||
282 | +err: | ||
283 | + free(elem); | ||
284 | + g_free(req); | ||
285 | + return; | ||
286 | +} | ||
287 | + | ||
288 | +static void vu_block_process_vq(VuDev *vu_dev, int idx) | ||
289 | +{ | ||
290 | + VuServer *server; | ||
291 | + VuVirtq *vq; | ||
292 | + struct req_data *req_data; | ||
293 | + | ||
294 | + server = container_of(vu_dev, VuServer, vu_dev); | ||
295 | + assert(server); | ||
296 | + | ||
297 | + vq = vu_get_queue(vu_dev, idx); | ||
298 | + assert(vq); | ||
299 | + VuVirtqElement *elem; | ||
300 | + while (1) { | ||
301 | + elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) + | ||
302 | + sizeof(VuBlockReq)); | ||
303 | + if (elem) { | ||
304 | + req_data = g_new0(struct req_data, 1); | ||
305 | + req_data->server = server; | ||
306 | + req_data->vq = vq; | ||
307 | + req_data->elem = elem; | ||
308 | + Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req, | ||
309 | + req_data); | ||
310 | + aio_co_enter(server->ioc->ctx, co); | ||
311 | + } else { | ||
312 | + break; | ||
313 | + } | ||
314 | + } | ||
315 | +} | ||
316 | + | ||
317 | +static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started) | ||
318 | +{ | ||
319 | + VuVirtq *vq; | ||
320 | + | ||
321 | + assert(vu_dev); | ||
322 | + | ||
323 | + vq = vu_get_queue(vu_dev, idx); | ||
324 | + vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL); | ||
325 | +} | ||
326 | + | ||
327 | +static uint64_t vu_block_get_features(VuDev *dev) | ||
328 | +{ | ||
329 | + uint64_t features; | ||
330 | + VuServer *server = container_of(dev, VuServer, vu_dev); | ||
331 | + VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
332 | + features = 1ull << VIRTIO_BLK_F_SIZE_MAX | | ||
333 | + 1ull << VIRTIO_BLK_F_SEG_MAX | | ||
334 | + 1ull << VIRTIO_BLK_F_TOPOLOGY | | ||
335 | + 1ull << VIRTIO_BLK_F_BLK_SIZE | | ||
336 | + 1ull << VIRTIO_BLK_F_FLUSH | | ||
337 | + 1ull << VIRTIO_BLK_F_DISCARD | | ||
338 | + 1ull << VIRTIO_BLK_F_WRITE_ZEROES | | ||
339 | + 1ull << VIRTIO_BLK_F_CONFIG_WCE | | ||
340 | + 1ull << VIRTIO_F_VERSION_1 | | ||
341 | + 1ull << VIRTIO_RING_F_INDIRECT_DESC | | ||
342 | + 1ull << VIRTIO_RING_F_EVENT_IDX | | ||
343 | + 1ull << VHOST_USER_F_PROTOCOL_FEATURES; | ||
344 | + | ||
345 | + if (!vdev_blk->writable) { | ||
346 | + features |= 1ull << VIRTIO_BLK_F_RO; | ||
347 | + } | ||
348 | + | ||
349 | + return features; | ||
350 | +} | ||
351 | + | ||
352 | +static uint64_t vu_block_get_protocol_features(VuDev *dev) | ||
353 | +{ | ||
354 | + return 1ull << VHOST_USER_PROTOCOL_F_CONFIG | | ||
355 | + 1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD; | ||
356 | +} | ||
357 | + | ||
358 | +static int | ||
359 | +vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len) | ||
360 | +{ | ||
361 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
362 | + VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
363 | + memcpy(config, &vdev_blk->blkcfg, len); | ||
364 | + | ||
365 | + return 0; | ||
366 | +} | ||
367 | + | ||
368 | +static int | ||
369 | +vu_block_set_config(VuDev *vu_dev, const uint8_t *data, | ||
370 | + uint32_t offset, uint32_t size, uint32_t flags) | ||
371 | +{ | ||
372 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
373 | + VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
374 | + uint8_t wce; | ||
375 | + | ||
376 | + /* don't support live migration */ | ||
377 | + if (flags != VHOST_SET_CONFIG_TYPE_MASTER) { | ||
378 | + return -EINVAL; | ||
379 | + } | ||
380 | + | ||
381 | + if (offset != offsetof(struct virtio_blk_config, wce) || | ||
382 | + size != 1) { | ||
383 | + return -EINVAL; | ||
384 | + } | ||
385 | + | ||
386 | + wce = *data; | ||
387 | + vdev_blk->blkcfg.wce = wce; | ||
388 | + blk_set_enable_write_cache(vdev_blk->backend, wce); | ||
389 | + return 0; | ||
390 | +} | ||
391 | + | ||
392 | +/* | ||
393 | + * When the client disconnects, it sends a VHOST_USER_NONE request | ||
394 | + * and vu_process_message will simply call exit, which causes the VM | ||
395 | + * to exit abruptly. | ||
396 | + * To avoid this issue, process the VHOST_USER_NONE request ahead | ||
397 | + * of vu_process_message. | ||
398 | + * | ||
399 | + */ | ||
400 | +static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply) | ||
401 | +{ | ||
402 | + if (vmsg->request == VHOST_USER_NONE) { | ||
403 | + dev->panic(dev, "disconnect"); | ||
404 | + return true; | ||
405 | + } | ||
406 | + return false; | ||
407 | +} | ||
408 | + | ||
409 | +static const VuDevIface vu_block_iface = { | ||
410 | + .get_features = vu_block_get_features, | ||
411 | + .queue_set_started = vu_block_queue_set_started, | ||
412 | + .get_protocol_features = vu_block_get_protocol_features, | ||
413 | + .get_config = vu_block_get_config, | ||
414 | + .set_config = vu_block_set_config, | ||
415 | + .process_msg = vu_block_process_msg, | ||
416 | +}; | ||
417 | + | ||
418 | +static void blk_aio_attached(AioContext *ctx, void *opaque) | ||
419 | +{ | ||
420 | + VuBlockDev *vub_dev = opaque; | ||
421 | + aio_context_acquire(ctx); | ||
422 | + vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx); | ||
423 | + aio_context_release(ctx); | ||
424 | +} | ||
425 | + | ||
426 | +static void blk_aio_detach(void *opaque) | ||
427 | +{ | ||
428 | + VuBlockDev *vub_dev = opaque; | ||
429 | + AioContext *ctx = vub_dev->vu_server.ctx; | ||
430 | + aio_context_acquire(ctx); | ||
431 | + vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL); | ||
432 | + aio_context_release(ctx); | ||
433 | +} | ||
434 | + | ||
435 | +static void | ||
436 | +vu_block_initialize_config(BlockDriverState *bs, | ||
437 | + struct virtio_blk_config *config, uint32_t blk_size) | ||
438 | +{ | ||
439 | + config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS; | ||
440 | + config->blk_size = blk_size; | ||
441 | + config->size_max = 0; | ||
442 | + config->seg_max = 128 - 2; | ||
443 | + config->min_io_size = 1; | ||
444 | + config->opt_io_size = 1; | ||
445 | + config->num_queues = VHOST_USER_BLK_MAX_QUEUES; | ||
446 | + config->max_discard_sectors = 32768; | ||
447 | + config->max_discard_seg = 1; | ||
448 | + config->discard_sector_alignment = config->blk_size >> 9; | ||
449 | + config->max_write_zeroes_sectors = 32768; | ||
450 | + config->max_write_zeroes_seg = 1; | ||
451 | +} | ||
452 | + | ||
453 | +static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp) | ||
454 | +{ | ||
455 | + | ||
456 | + BlockBackend *blk; | ||
457 | + Error *local_error = NULL; | ||
458 | + const char *node_name = vu_block_device->node_name; | ||
459 | + bool writable = vu_block_device->writable; | ||
460 | + uint64_t perm = BLK_PERM_CONSISTENT_READ; | ||
461 | + int ret; | ||
462 | + | ||
463 | + AioContext *ctx; | ||
464 | + | ||
465 | + BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error); | ||
466 | + | ||
467 | + if (!bs) { | ||
468 | + error_propagate(errp, local_error); | ||
469 | + return NULL; | ||
470 | + } | ||
471 | + | ||
472 | + if (bdrv_is_read_only(bs)) { | ||
473 | + writable = false; | ||
474 | + } | ||
475 | + | ||
476 | + if (writable) { | ||
477 | + perm |= BLK_PERM_WRITE; | ||
478 | + } | ||
479 | + | ||
480 | + ctx = bdrv_get_aio_context(bs); | ||
481 | + aio_context_acquire(ctx); | ||
482 | + bdrv_invalidate_cache(bs, NULL); | ||
483 | + aio_context_release(ctx); | ||
484 | + | ||
485 | + /* | ||
486 | + * Don't allow resize while the vhost user server is running, | ||
487 | + * otherwise we don't care what happens with the node. | ||
488 | + */ | ||
489 | + blk = blk_new(bdrv_get_aio_context(bs), perm, | ||
490 | + BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED | | ||
491 | + BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD); | ||
492 | + ret = blk_insert_bs(blk, bs, errp); | ||
493 | + | ||
494 | + if (ret < 0) { | ||
495 | + goto fail; | ||
496 | + } | ||
497 | + | ||
498 | + blk_set_enable_write_cache(blk, false); | ||
499 | + | ||
500 | + blk_set_allow_aio_context_change(blk, true); | ||
501 | + | ||
502 | + vu_block_device->blkcfg.wce = 0; | ||
503 | + vu_block_device->backend = blk; | ||
504 | + if (!vu_block_device->blk_size) { | ||
505 | + vu_block_device->blk_size = BDRV_SECTOR_SIZE; | ||
506 | + } | ||
507 | + vu_block_device->blkcfg.blk_size = vu_block_device->blk_size; | ||
508 | + blk_set_guest_block_size(blk, vu_block_device->blk_size); | ||
509 | + vu_block_initialize_config(bs, &vu_block_device->blkcfg, | ||
510 | + vu_block_device->blk_size); | ||
511 | + return vu_block_device; | ||
512 | + | ||
513 | +fail: | ||
514 | + blk_unref(blk); | ||
515 | + return NULL; | ||
516 | +} | ||
517 | + | ||
518 | +static void vu_block_deinit(VuBlockDev *vu_block_device) | ||
519 | +{ | ||
520 | + if (vu_block_device->backend) { | ||
521 | + blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached, | ||
522 | + blk_aio_detach, vu_block_device); | ||
523 | + } | ||
524 | + | ||
525 | + blk_unref(vu_block_device->backend); | ||
526 | +} | ||
527 | + | ||
528 | +static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device) | ||
529 | +{ | ||
530 | + vhost_user_server_stop(&vu_block_device->vu_server); | ||
531 | + vu_block_deinit(vu_block_device); | ||
532 | +} | ||
533 | + | ||
534 | +static void vhost_user_blk_server_start(VuBlockDev *vu_block_device, | ||
535 | + Error **errp) | ||
536 | +{ | ||
537 | + AioContext *ctx; | ||
538 | + SocketAddress *addr = vu_block_device->addr; | ||
539 | + | ||
540 | + if (!vu_block_init(vu_block_device, errp)) { | ||
541 | + return; | ||
542 | + } | ||
543 | + | ||
544 | + ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend)); | ||
545 | + | ||
546 | + if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx, | ||
547 | + VHOST_USER_BLK_MAX_QUEUES, | ||
548 | + NULL, &vu_block_iface, | ||
549 | + errp)) { | ||
550 | + goto error; | ||
551 | + } | ||
552 | + | ||
553 | + blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached, | ||
554 | + blk_aio_detach, vu_block_device); | ||
555 | + vu_block_device->running = true; | ||
556 | + return; | ||
557 | + | ||
558 | + error: | ||
559 | + vu_block_deinit(vu_block_device); | ||
560 | +} | ||
561 | + | ||
562 | +static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp) | ||
563 | +{ | ||
564 | + if (vus->running) { | ||
565 | + error_setg(errp, "The property can't be modified " | ||
566 | + "while the server is running"); | ||
567 | + return false; | ||
568 | + } | ||
569 | + return true; | ||
570 | +} | ||
571 | + | ||
572 | +static void vu_set_node_name(Object *obj, const char *value, Error **errp) | ||
573 | +{ | ||
574 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
575 | + | ||
576 | + if (!vu_prop_modifiable(vus, errp)) { | ||
577 | + return; | ||
578 | + } | ||
579 | + | ||
580 | + if (vus->node_name) { | ||
581 | + g_free(vus->node_name); | ||
582 | + } | ||
583 | + | ||
584 | + vus->node_name = g_strdup(value); | ||
585 | +} | ||
586 | + | ||
587 | +static char *vu_get_node_name(Object *obj, Error **errp) | ||
588 | +{ | ||
589 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
590 | + return g_strdup(vus->node_name); | ||
591 | +} | ||
592 | + | ||
593 | +static void free_socket_addr(SocketAddress *addr) | ||
594 | +{ | ||
595 | + g_free(addr->u.q_unix.path); | ||
596 | + g_free(addr); | ||
597 | +} | ||
598 | + | ||
599 | +static void vu_set_unix_socket(Object *obj, const char *value, | ||
600 | + Error **errp) | ||
601 | +{ | ||
602 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
603 | + | ||
604 | + if (!vu_prop_modifiable(vus, errp)) { | ||
605 | + return; | ||
606 | + } | ||
607 | + | ||
608 | + if (vus->addr) { | ||
609 | + free_socket_addr(vus->addr); | ||
610 | + } | ||
611 | + | ||
612 | + SocketAddress *addr = g_new0(SocketAddress, 1); | ||
613 | + addr->type = SOCKET_ADDRESS_TYPE_UNIX; | ||
614 | + addr->u.q_unix.path = g_strdup(value); | ||
615 | + vus->addr = addr; | ||
616 | +} | ||
617 | + | ||
618 | +static char *vu_get_unix_socket(Object *obj, Error **errp) | ||
619 | +{ | ||
620 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
621 | + return g_strdup(vus->addr->u.q_unix.path); | ||
622 | +} | ||
623 | + | ||
624 | +static bool vu_get_block_writable(Object *obj, Error **errp) | ||
625 | +{ | ||
626 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
627 | + return vus->writable; | ||
628 | +} | ||
629 | + | ||
630 | +static void vu_set_block_writable(Object *obj, bool value, Error **errp) | ||
631 | +{ | ||
632 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
633 | + | ||
634 | + if (!vu_prop_modifiable(vus, errp)) { | ||
635 | + return; | ||
636 | + } | ||
637 | + | ||
638 | + vus->writable = value; | ||
639 | +} | ||
640 | + | ||
641 | +static void vu_get_blk_size(Object *obj, Visitor *v, const char *name, | ||
642 | + void *opaque, Error **errp) | ||
643 | +{ | ||
644 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
645 | + uint32_t value = vus->blk_size; | ||
646 | + | ||
647 | + visit_type_uint32(v, name, &value, errp); | ||
648 | +} | ||
649 | + | ||
650 | +static void vu_set_blk_size(Object *obj, Visitor *v, const char *name, | ||
651 | + void *opaque, Error **errp) | ||
652 | +{ | ||
653 | + VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
654 | + | ||
655 | + Error *local_err = NULL; | ||
656 | + uint32_t value; | ||
657 | + | ||
658 | + if (!vu_prop_modifiable(vus, errp)) { | ||
659 | + return; | ||
660 | + } | ||
661 | + | ||
662 | + visit_type_uint32(v, name, &value, &local_err); | ||
663 | + if (local_err) { | ||
664 | + goto out; | ||
665 | + } | ||
666 | + | ||
667 | + check_block_size(object_get_typename(obj), name, value, &local_err); | ||
668 | + if (local_err) { | ||
669 | + goto out; | ||
670 | + } | ||
671 | + | ||
672 | + vus->blk_size = value; | ||
673 | + | ||
674 | +out: | ||
675 | + error_propagate(errp, local_err); | ||
676 | +} | ||
677 | + | ||
678 | +static void vhost_user_blk_server_instance_finalize(Object *obj) | ||
679 | +{ | ||
680 | + VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj); | ||
681 | + | ||
682 | + vhost_user_blk_server_stop(vub); | ||
683 | + | ||
684 | + /* | ||
685 | + * Unlike object_property_add_str, object_class_property_add_str | ||
686 | + * doesn't have a release method. Thus manual memory freeing is | ||
687 | + * needed. | ||
688 | + */ | ||
689 | + free_socket_addr(vub->addr); | ||
690 | + g_free(vub->node_name); | ||
691 | +} | ||
692 | + | ||
693 | +static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp) | ||
694 | +{ | ||
695 | + VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj); | ||
696 | + | ||
697 | + vhost_user_blk_server_start(vub, errp); | ||
698 | +} | ||
699 | + | ||
700 | +static void vhost_user_blk_server_class_init(ObjectClass *klass, | ||
701 | + void *class_data) | ||
702 | +{ | ||
703 | + UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass); | ||
704 | + ucc->complete = vhost_user_blk_server_complete; | ||
705 | + | ||
706 | + object_class_property_add_bool(klass, "writable", | ||
707 | + vu_get_block_writable, | ||
708 | + vu_set_block_writable); | ||
709 | + | ||
710 | + object_class_property_add_str(klass, "node-name", | ||
711 | + vu_get_node_name, | ||
712 | + vu_set_node_name); | ||
713 | + | ||
714 | + object_class_property_add_str(klass, "unix-socket", | ||
715 | + vu_get_unix_socket, | ||
716 | + vu_set_unix_socket); | ||
717 | + | ||
718 | + object_class_property_add(klass, "logical-block-size", "uint32", | ||
719 | + vu_get_blk_size, vu_set_blk_size, | ||
720 | + NULL, NULL); | ||
721 | +} | ||
722 | + | ||
723 | +static const TypeInfo vhost_user_blk_server_info = { | ||
724 | + .name = TYPE_VHOST_USER_BLK_SERVER, | ||
725 | + .parent = TYPE_OBJECT, | ||
726 | + .instance_size = sizeof(VuBlockDev), | ||
727 | + .instance_finalize = vhost_user_blk_server_instance_finalize, | ||
728 | + .class_init = vhost_user_blk_server_class_init, | ||
729 | + .interfaces = (InterfaceInfo[]) { | ||
730 | + {TYPE_USER_CREATABLE}, | ||
731 | + {} | ||
732 | + }, | ||
733 | +}; | ||
734 | + | ||
735 | +static void vhost_user_blk_server_register_types(void) | ||
736 | +{ | ||
737 | + type_register_static(&vhost_user_blk_server_info); | ||
738 | +} | ||
739 | + | ||
740 | +type_init(vhost_user_blk_server_register_types) | ||
741 | diff --git a/softmmu/vl.c b/softmmu/vl.c | ||
742 | index XXXXXXX..XXXXXXX 100644 | ||
743 | --- a/softmmu/vl.c | ||
744 | +++ b/softmmu/vl.c | ||
745 | @@ -XXX,XX +XXX,XX @@ static bool object_create_initial(const char *type, QemuOpts *opts) | ||
746 | } | ||
747 | #endif | ||
748 | |||
749 | + /* Reason: vhost-user-blk-server property "node-name" */ | ||
750 | + if (g_str_equal(type, "vhost-user-blk-server")) { | ||
751 | + return false; | ||
752 | + } | ||
753 | /* | ||
754 | * Reason: filter-* property "netdev" etc. | ||
755 | */ | ||
756 | diff --git a/block/meson.build b/block/meson.build | ||
757 | index XXXXXXX..XXXXXXX 100644 | ||
758 | --- a/block/meson.build | ||
759 | +++ b/block/meson.build | ||
760 | @@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c') | ||
761 | block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit]) | ||
762 | block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c')) | ||
763 | block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c')) | ||
764 | +block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c')) | ||
765 | block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c')) | ||
766 | block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c')) | ||
767 | block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c')) | ||
768 | -- | ||
769 | 2.26.2 | ||
770 | diff view generated by jsdifflib |
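For context on how the new object is meant to be used, a hedged sketch follows; the socket path, node name and IDs are made up, and the shared guest memory setup that vhost-user requires on the consuming VM (e.g. memory-backend-memfd with share=on) is omitted. The export is created against an existing block node with -object, and a guest then consumes it through the existing vhost-user-blk device:

    # exporting VM / storage process
    -blockdev node-name=drive0,driver=qcow2,file.driver=file,file.filename=disk.qcow2 \
    -object vhost-user-blk-server,id=vub0,node-name=drive0,unix-socket=/tmp/vub.sock,writable=on

    # consuming VM
    -chardev socket,id=char0,path=/tmp/vub.sock \
    -device vhost-user-blk-pci,chardev=char0

The property names (node-name, unix-socket, writable, logical-block-size) are exactly those registered in vhost_user_blk_server_class_init() above; delaying object creation in softmmu/vl.c is what allows node-name to refer to a -blockdev defined on the same command line.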
New patch | |||
---|---|---|---|
1 | From: Coiby Xu <coiby.xu@gmail.com> | ||
1 | 2 | ||
3 | Suggested-by: Stefano Garzarella <sgarzare@redhat.com> | ||
4 | Signed-off-by: Coiby Xu <coiby.xu@gmail.com> | ||
5 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
6 | Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> | ||
7 | Message-id: 20200918080912.321299-8-coiby.xu@gmail.com | ||
8 | [Removed reference to vhost-user-blk-test.c, it will be sent in a | ||
9 | separate pull request. | ||
10 | --Stefan] | ||
11 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
12 | --- | ||
13 | MAINTAINERS | 7 +++++++ | ||
14 | 1 file changed, 7 insertions(+) | ||
15 | |||
16 | diff --git a/MAINTAINERS b/MAINTAINERS | ||
17 | index XXXXXXX..XXXXXXX 100644 | ||
18 | --- a/MAINTAINERS | ||
19 | +++ b/MAINTAINERS | ||
20 | @@ -XXX,XX +XXX,XX @@ L: qemu-block@nongnu.org | ||
21 | S: Supported | ||
22 | F: tests/image-fuzzer/ | ||
23 | |||
24 | +Vhost-user block device backend server | ||
25 | +M: Coiby Xu <Coiby.Xu@gmail.com> | ||
26 | +S: Maintained | ||
27 | +F: block/export/vhost-user-blk-server.c | ||
28 | +F: util/vhost-user-server.c | ||
29 | +F: tests/qtest/libqos/vhost-user-blk.c | ||
30 | + | ||
31 | Replication | ||
32 | M: Wen Congyang <wencongyang2@huawei.com> | ||
33 | M: Xie Changlong <xiechanglong.d@gmail.com> | ||
34 | -- | ||
35 | 2.26.2 | ||
36 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
2 | Message-id: 20200924151549.913737-3-stefanha@redhat.com | ||
3 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
4 | --- | ||
5 | util/vhost-user-server.c | 2 +- | ||
6 | 1 file changed, 1 insertion(+), 1 deletion(-) | ||
1 | 7 | ||
8 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
9 | index XXXXXXX..XXXXXXX 100644 | ||
10 | --- a/util/vhost-user-server.c | ||
11 | +++ b/util/vhost-user-server.c | ||
12 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
13 | return false; | ||
14 | } | ||
15 | |||
16 | - /* zero out unspecified fileds */ | ||
17 | + /* zero out unspecified fields */ | ||
18 | *server = (VuServer) { | ||
19 | .listener = listener, | ||
20 | .vu_iface = vu_iface, | ||
21 | -- | ||
22 | 2.26.2 | ||
23 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | We already have access to the value with the correct type (ioc and sioc | ||
2 | are the same QIOChannel). | ||
1 | 3 | ||
4 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
5 | Message-id: 20200924151549.913737-4-stefanha@redhat.com | ||
6 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
7 | --- | ||
8 | util/vhost-user-server.c | 2 +- | ||
9 | 1 file changed, 1 insertion(+), 1 deletion(-) | ||
10 | |||
11 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
12 | index XXXXXXX..XXXXXXX 100644 | ||
13 | --- a/util/vhost-user-server.c | ||
14 | +++ b/util/vhost-user-server.c | ||
15 | @@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc, | ||
16 | server->ioc = QIO_CHANNEL(sioc); | ||
17 | object_ref(OBJECT(server->ioc)); | ||
18 | qio_channel_attach_aio_context(server->ioc, server->ctx); | ||
19 | - qio_channel_set_blocking(QIO_CHANNEL(server->sioc), false, NULL); | ||
20 | + qio_channel_set_blocking(server->ioc, false, NULL); | ||
21 | vu_client_start(server); | ||
22 | } | ||
23 | |||
24 | -- | ||
25 | 2.26.2 | ||
26 | diff view generated by jsdifflib |
1 | Instead of checking bs->wps or bs->bl.zone_size for whether zone | 1 | Explicitly deleting watches is not necessary since libvhost-user calls |
---|---|---|---|
2 | information is present, check bs->bl.zoned. That is the flag that | 2 | remove_watch() during vu_deinit(). Add an assertion to check this |
3 | raw_refresh_zoned_limits() reliably sets to indicate zone support. If | 3 | though. |
4 | it is set to something other than BLK_Z_NONE, other values and objects | ||
5 | like bs->wps and bs->bl.zone_size must be non-null/zero and valid; if it | ||
6 | is not, we cannot rely on their validity. | ||
7 | 4 | ||
8 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 5 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
9 | Message-Id: <20230824155345.109765-3-hreitz@redhat.com> | 6 | Message-id: 20200924151549.913737-5-stefanha@redhat.com |
10 | Reviewed-by: Sam Li <faithilikerun@gmail.com> | 7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
11 | --- | 8 | --- |
12 | block/file-posix.c | 12 +++++++----- | 9 | util/vhost-user-server.c | 19 ++++--------------- |
13 | 1 file changed, 7 insertions(+), 5 deletions(-) | 10 | 1 file changed, 4 insertions(+), 15 deletions(-) |
14 | 11 | ||
15 | diff --git a/block/file-posix.c b/block/file-posix.c | 12 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c |
16 | index XXXXXXX..XXXXXXX 100644 | 13 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/block/file-posix.c | 14 | --- a/util/vhost-user-server.c |
18 | +++ b/block/file-posix.c | 15 | +++ b/util/vhost-user-server.c |
19 | @@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset, | 16 | @@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server) |
20 | if (fd_open(bs) < 0) | 17 | /* When this is set vu_client_trip will stop new processing vhost-user message */ |
21 | return -EIO; | 18 | server->sioc = NULL; |
22 | #if defined(CONFIG_BLKZONED) | 19 | |
23 | - if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && bs->wps) { | 20 | - VuFdWatch *vu_fd_watch, *next; |
24 | + if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && | 21 | - QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { |
25 | + bs->bl.zoned != BLK_Z_NONE) { | 22 | - aio_set_fd_handler(server->ioc->ctx, vu_fd_watch->fd, true, NULL, |
26 | qemu_co_mutex_lock(&bs->wps->colock); | 23 | - NULL, NULL, NULL); |
27 | - if (type & QEMU_AIO_ZONE_APPEND && bs->bl.zone_size) { | 24 | - } |
28 | + if (type & QEMU_AIO_ZONE_APPEND) { | 25 | - |
29 | int index = offset / bs->bl.zone_size; | 26 | - while (!QTAILQ_EMPTY(&server->vu_fd_watches)) { |
30 | offset = bs->wps->wp[index]; | 27 | - QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { |
31 | } | 28 | - if (!vu_fd_watch->processing) { |
32 | @@ -XXX,XX +XXX,XX @@ out: | 29 | - QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next); |
33 | { | 30 | - g_free(vu_fd_watch); |
34 | BlockZoneWps *wps = bs->wps; | 31 | - } |
35 | if (ret == 0) { | 32 | - } |
36 | - if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) | 33 | - } |
37 | - && wps && bs->bl.zone_size) { | 34 | - |
38 | + if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && | 35 | while (server->processing_msg) { |
39 | + bs->bl.zoned != BLK_Z_NONE) { | 36 | if (server->ioc->read_coroutine) { |
40 | uint64_t *wp = &wps->wp[offset / bs->bl.zone_size]; | 37 | server->ioc->read_coroutine = NULL; |
41 | if (!BDRV_ZT_IS_CONV(*wp)) { | 38 | @@ -XXX,XX +XXX,XX @@ static void close_client(VuServer *server) |
42 | if (type & QEMU_AIO_ZONE_APPEND) { | ||
43 | @@ -XXX,XX +XXX,XX @@ out: | ||
44 | } | ||
45 | } | 39 | } |
46 | 40 | ||
47 | - if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && wps) { | 41 | vu_deinit(&server->vu_dev); |
48 | + if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && | 42 | + |
49 | + bs->bl.zoned != BLK_Z_NONE) { | 43 | + /* vu_deinit() should have called remove_watch() */ |
50 | qemu_co_mutex_unlock(&wps->colock); | 44 | + assert(QTAILQ_EMPTY(&server->vu_fd_watches)); |
51 | } | 45 | + |
46 | object_unref(OBJECT(sioc)); | ||
47 | object_unref(OBJECT(server->ioc)); | ||
52 | } | 48 | } |
53 | -- | 49 | -- |
54 | 2.41.0 | 50 | 2.26.2 |
51 | diff view generated by jsdifflib |
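A minimal sketch of the invariant the file-posix change above relies on, using made-up stand-in types (zone_model, limits, state and zone_wp are illustrative names, not the QEMU structures): one authoritative flag guards every access to the dependent zone state, so callers never test bs->wps or bs->bl.zone_size individually.

    #include <assert.h>
    #include <stddef.h>

    enum zone_model { ZONED_NONE = 0, ZONED_HOST_MANAGED };

    struct limits {
        enum zone_model zoned;  /* set by the refresh path; single source of truth */
        size_t zone_size;       /* only meaningful when zoned != ZONED_NONE */
    };

    struct state {
        struct limits bl;
        unsigned long *wps;     /* write pointers; only valid when zoned */
    };

    static unsigned long zone_wp(const struct state *bs, size_t offset)
    {
        /* The flag implies the rest of the zone state is valid. */
        assert(bs->bl.zoned != ZONED_NONE);
        assert(bs->wps != NULL && bs->bl.zone_size != 0);
        return bs->wps[offset / bs->bl.zone_size];
    }

Once the flag is the single point of truth, clearing it is enough to send every later check down the non-zoned path instead of dereferencing state that may no longer be valid.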
1 | From: zhenwei pi <pizhenwei@bytedance.com> | 1 | Only one struct is needed per request. Drop req_data and the separate |
---|---|---|---|
2 | VuBlockReq instance. Instead let vu_queue_pop() allocate everything at | ||
3 | once. | ||
2 | 4 | ||
3 | The 'bool is_write' style is obsolete in the throttle framework; adapt | 5 | This fixes the req_data memory leak in vu_block_virtio_process_req(). |
4 | fsdev to the new style. | ||
5 | 6 | ||
6 | Cc: Greg Kurz <groug@kaod.org> | 7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
7 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 8 | Message-id: 20200924151549.913737-6-stefanha@redhat.com |
8 | Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> | 9 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
9 | Message-Id: <20230728022006.1098509-9-pizhenwei@bytedance.com> | ||
10 | Reviewed-by: Greg Kurz <groug@kaod.org> | ||
11 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
12 | --- | 10 | --- |
13 | fsdev/qemu-fsdev-throttle.h | 4 ++-- | 11 | block/export/vhost-user-blk-server.c | 68 +++++++++------------------- |
14 | fsdev/qemu-fsdev-throttle.c | 14 +++++++------- | 12 | 1 file changed, 21 insertions(+), 47 deletions(-) |
15 | hw/9pfs/cofile.c | 4 ++-- | ||
16 | 3 files changed, 11 insertions(+), 11 deletions(-) | ||
17 | 13 | ||
18 | diff --git a/fsdev/qemu-fsdev-throttle.h b/fsdev/qemu-fsdev-throttle.h | 14 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c |
19 | index XXXXXXX..XXXXXXX 100644 | 15 | index XXXXXXX..XXXXXXX 100644 |
20 | --- a/fsdev/qemu-fsdev-throttle.h | 16 | --- a/block/export/vhost-user-blk-server.c |
21 | +++ b/fsdev/qemu-fsdev-throttle.h | 17 | +++ b/block/export/vhost-user-blk-server.c |
22 | @@ -XXX,XX +XXX,XX @@ typedef struct FsThrottle { | 18 | @@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr { |
23 | ThrottleState ts; | 19 | }; |
24 | ThrottleTimers tt; | 20 | |
25 | ThrottleConfig cfg; | 21 | typedef struct VuBlockReq { |
26 | - CoQueue throttled_reqs[2]; | 22 | - VuVirtqElement *elem; |
27 | + CoQueue throttled_reqs[THROTTLE_MAX]; | 23 | + VuVirtqElement elem; |
28 | } FsThrottle; | 24 | int64_t sector_num; |
29 | 25 | size_t size; | |
30 | int fsdev_throttle_parse_opts(QemuOpts *, FsThrottle *, Error **); | 26 | struct virtio_blk_inhdr *in; |
31 | 27 | @@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req) | |
32 | void fsdev_throttle_init(FsThrottle *); | 28 | VuDev *vu_dev = &req->server->vu_dev; |
33 | 29 | ||
34 | -void coroutine_fn fsdev_co_throttle_request(FsThrottle *, bool , | 30 | /* IO size with 1 extra status byte */ |
35 | +void coroutine_fn fsdev_co_throttle_request(FsThrottle *, ThrottleDirection , | 31 | - vu_queue_push(vu_dev, req->vq, req->elem, req->size + 1); |
36 | struct iovec *, int); | 32 | + vu_queue_push(vu_dev, req->vq, &req->elem, req->size + 1); |
37 | 33 | vu_queue_notify(vu_dev, req->vq); | |
38 | void fsdev_throttle_cleanup(FsThrottle *); | 34 | |
39 | diff --git a/fsdev/qemu-fsdev-throttle.c b/fsdev/qemu-fsdev-throttle.c | 35 | - if (req->elem) { |
40 | index XXXXXXX..XXXXXXX 100644 | 36 | - free(req->elem); |
41 | --- a/fsdev/qemu-fsdev-throttle.c | 37 | - } |
42 | +++ b/fsdev/qemu-fsdev-throttle.c | 38 | - |
43 | @@ -XXX,XX +XXX,XX @@ void fsdev_throttle_init(FsThrottle *fst) | 39 | - g_free(req); |
40 | + free(req); | ||
41 | } | ||
42 | |||
43 | static VuBlockDev *get_vu_block_device_by_server(VuServer *server) | ||
44 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_flush(VuBlockReq *req) | ||
45 | blk_co_flush(backend); | ||
46 | } | ||
47 | |||
48 | -struct req_data { | ||
49 | - VuServer *server; | ||
50 | - VuVirtq *vq; | ||
51 | - VuVirtqElement *elem; | ||
52 | -}; | ||
53 | - | ||
54 | static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
55 | { | ||
56 | - struct req_data *data = opaque; | ||
57 | - VuServer *server = data->server; | ||
58 | - VuVirtq *vq = data->vq; | ||
59 | - VuVirtqElement *elem = data->elem; | ||
60 | + VuBlockReq *req = opaque; | ||
61 | + VuServer *server = req->server; | ||
62 | + VuVirtqElement *elem = &req->elem; | ||
63 | uint32_t type; | ||
64 | - VuBlockReq *req; | ||
65 | |||
66 | VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
67 | BlockBackend *backend = vdev_blk->backend; | ||
68 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
69 | struct iovec *out_iov = elem->out_sg; | ||
70 | unsigned in_num = elem->in_num; | ||
71 | unsigned out_num = elem->out_num; | ||
72 | + | ||
73 | /* refer to hw/block/virtio_blk.c */ | ||
74 | if (elem->out_num < 1 || elem->in_num < 1) { | ||
75 | error_report("virtio-blk request missing headers"); | ||
76 | - free(elem); | ||
77 | - return; | ||
78 | + goto err; | ||
79 | } | ||
80 | |||
81 | - req = g_new0(VuBlockReq, 1); | ||
82 | - req->server = server; | ||
83 | - req->vq = vq; | ||
84 | - req->elem = elem; | ||
85 | - | ||
86 | if (unlikely(iov_to_buf(out_iov, out_num, 0, &req->out, | ||
87 | sizeof(req->out)) != sizeof(req->out))) { | ||
88 | error_report("virtio-blk request outhdr too short"); | ||
89 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
90 | |||
91 | err: | ||
92 | free(elem); | ||
93 | - g_free(req); | ||
94 | - return; | ||
95 | } | ||
96 | |||
97 | static void vu_block_process_vq(VuDev *vu_dev, int idx) | ||
98 | { | ||
99 | - VuServer *server; | ||
100 | - VuVirtq *vq; | ||
101 | - struct req_data *req_data; | ||
102 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
103 | + VuVirtq *vq = vu_get_queue(vu_dev, idx); | ||
104 | |||
105 | - server = container_of(vu_dev, VuServer, vu_dev); | ||
106 | - assert(server); | ||
107 | - | ||
108 | - vq = vu_get_queue(vu_dev, idx); | ||
109 | - assert(vq); | ||
110 | - VuVirtqElement *elem; | ||
111 | while (1) { | ||
112 | - elem = vu_queue_pop(vu_dev, vq, sizeof(VuVirtqElement) + | ||
113 | - sizeof(VuBlockReq)); | ||
114 | - if (elem) { | ||
115 | - req_data = g_new0(struct req_data, 1); | ||
116 | - req_data->server = server; | ||
117 | - req_data->vq = vq; | ||
118 | - req_data->elem = elem; | ||
119 | - Coroutine *co = qemu_coroutine_create(vu_block_virtio_process_req, | ||
120 | - req_data); | ||
121 | - aio_co_enter(server->ioc->ctx, co); | ||
122 | - } else { | ||
123 | + VuBlockReq *req; | ||
124 | + | ||
125 | + req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq)); | ||
126 | + if (!req) { | ||
127 | break; | ||
128 | } | ||
129 | + | ||
130 | + req->server = server; | ||
131 | + req->vq = vq; | ||
132 | + | ||
133 | + Coroutine *co = | ||
134 | + qemu_coroutine_create(vu_block_virtio_process_req, req); | ||
135 | + qemu_coroutine_enter(co); | ||
44 | } | 136 | } |
45 | } | 137 | } |
46 | 138 | ||
47 | -void coroutine_fn fsdev_co_throttle_request(FsThrottle *fst, bool is_write, | ||
48 | +void coroutine_fn fsdev_co_throttle_request(FsThrottle *fst, | ||
49 | + ThrottleDirection direction, | ||
50 | struct iovec *iov, int iovcnt) | ||
51 | { | ||
52 | - ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ; | ||
53 | - | ||
54 | + assert(direction < THROTTLE_MAX); | ||
55 | if (throttle_enabled(&fst->cfg)) { | ||
56 | if (throttle_schedule_timer(&fst->ts, &fst->tt, direction) || | ||
57 | - !qemu_co_queue_empty(&fst->throttled_reqs[is_write])) { | ||
58 | - qemu_co_queue_wait(&fst->throttled_reqs[is_write], NULL); | ||
59 | + !qemu_co_queue_empty(&fst->throttled_reqs[direction])) { | ||
60 | + qemu_co_queue_wait(&fst->throttled_reqs[direction], NULL); | ||
61 | } | ||
62 | |||
63 | throttle_account(&fst->ts, direction, iov_size(iov, iovcnt)); | ||
64 | |||
65 | - if (!qemu_co_queue_empty(&fst->throttled_reqs[is_write]) && | ||
66 | + if (!qemu_co_queue_empty(&fst->throttled_reqs[direction]) && | ||
67 | !throttle_schedule_timer(&fst->ts, &fst->tt, direction)) { | ||
68 | - qemu_co_queue_next(&fst->throttled_reqs[is_write]); | ||
69 | + qemu_co_queue_next(&fst->throttled_reqs[direction]); | ||
70 | } | ||
71 | } | ||
72 | } | ||
73 | diff --git a/hw/9pfs/cofile.c b/hw/9pfs/cofile.c | ||
74 | index XXXXXXX..XXXXXXX 100644 | ||
75 | --- a/hw/9pfs/cofile.c | ||
76 | +++ b/hw/9pfs/cofile.c | ||
77 | @@ -XXX,XX +XXX,XX @@ int coroutine_fn v9fs_co_pwritev(V9fsPDU *pdu, V9fsFidState *fidp, | ||
78 | if (v9fs_request_cancelled(pdu)) { | ||
79 | return -EINTR; | ||
80 | } | ||
81 | - fsdev_co_throttle_request(s->ctx.fst, true, iov, iovcnt); | ||
82 | + fsdev_co_throttle_request(s->ctx.fst, THROTTLE_WRITE, iov, iovcnt); | ||
83 | v9fs_co_run_in_worker( | ||
84 | { | ||
85 | err = s->ops->pwritev(&s->ctx, &fidp->fs, iov, iovcnt, offset); | ||
86 | @@ -XXX,XX +XXX,XX @@ int coroutine_fn v9fs_co_preadv(V9fsPDU *pdu, V9fsFidState *fidp, | ||
87 | if (v9fs_request_cancelled(pdu)) { | ||
88 | return -EINTR; | ||
89 | } | ||
90 | - fsdev_co_throttle_request(s->ctx.fst, false, iov, iovcnt); | ||
91 | + fsdev_co_throttle_request(s->ctx.fst, THROTTLE_READ, iov, iovcnt); | ||
92 | v9fs_co_run_in_worker( | ||
93 | { | ||
94 | err = s->ops->preadv(&s->ctx, &fidp->fs, iov, iovcnt, offset); | ||
95 | -- | 139 | -- |
96 | 2.41.0 | 140 | 2.26.2 |
141 | diff view generated by jsdifflib |
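The req_data removal above leans on a detail visible in the diff: VuVirtqElement is now the first member of VuBlockReq, and vu_queue_pop() is handed sizeof(VuBlockReq), so a single allocation covers both the element and the request. A stripped-down sketch of that idiom follows; Element, Request and queue_pop() are hypothetical stand-ins, not the libvhost-user API.

    #include <stdlib.h>

    typedef struct {
        unsigned index;
        unsigned in_num, out_num;
        /* descriptor arrays would follow in the real thing */
    } Element;

    typedef struct {
        Element elem;           /* must stay first so the two pointers alias */
        long sector_num;
        unsigned char status;
    } Request;

    /* Stand-in for vu_queue_pop(): the caller passes the size of its wrapper
     * struct and gets element plus request in one block (or NULL when the
     * queue is empty). */
    static void *queue_pop(size_t sz)
    {
        return calloc(1, sz < sizeof(Element) ? sizeof(Element) : sz);
    }

    int main(void)
    {
        Request *req = queue_pop(sizeof(*req));
        if (req) {
            req->sector_num = 42;   /* fill in per-request state directly */
            free(req);              /* one free in the completion path */
        }
        return 0;
    }

Freeing that single block when the request completes is all the cleanup needed, which is how the patch closes the req_data leak.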
New patch | |||
---|---|---|---|
1 | The device panic notifier callback is not used. Drop it. | ||
1 | 2 | ||
3 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
4 | Message-id: 20200924151549.913737-7-stefanha@redhat.com | ||
5 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
6 | --- | ||
7 | util/vhost-user-server.h | 3 --- | ||
8 | block/export/vhost-user-blk-server.c | 3 +-- | ||
9 | util/vhost-user-server.c | 6 ------ | ||
10 | 3 files changed, 1 insertion(+), 11 deletions(-) | ||
11 | |||
12 | diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h | ||
13 | index XXXXXXX..XXXXXXX 100644 | ||
14 | --- a/util/vhost-user-server.h | ||
15 | +++ b/util/vhost-user-server.h | ||
16 | @@ -XXX,XX +XXX,XX @@ typedef struct VuFdWatch { | ||
17 | } VuFdWatch; | ||
18 | |||
19 | typedef struct VuServer VuServer; | ||
20 | -typedef void DevicePanicNotifierFn(VuServer *server); | ||
21 | |||
22 | struct VuServer { | ||
23 | QIONetListener *listener; | ||
24 | AioContext *ctx; | ||
25 | - DevicePanicNotifierFn *device_panic_notifier; | ||
26 | int max_queues; | ||
27 | const VuDevIface *vu_iface; | ||
28 | VuDev vu_dev; | ||
29 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
30 | SocketAddress *unix_socket, | ||
31 | AioContext *ctx, | ||
32 | uint16_t max_queues, | ||
33 | - DevicePanicNotifierFn *device_panic_notifier, | ||
34 | const VuDevIface *vu_iface, | ||
35 | Error **errp); | ||
36 | |||
37 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
38 | index XXXXXXX..XXXXXXX 100644 | ||
39 | --- a/block/export/vhost-user-blk-server.c | ||
40 | +++ b/block/export/vhost-user-blk-server.c | ||
41 | @@ -XXX,XX +XXX,XX @@ static void vhost_user_blk_server_start(VuBlockDev *vu_block_device, | ||
42 | ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend)); | ||
43 | |||
44 | if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx, | ||
45 | - VHOST_USER_BLK_MAX_QUEUES, | ||
46 | - NULL, &vu_block_iface, | ||
47 | + VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface, | ||
48 | errp)) { | ||
49 | goto error; | ||
50 | } | ||
51 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
52 | index XXXXXXX..XXXXXXX 100644 | ||
53 | --- a/util/vhost-user-server.c | ||
54 | +++ b/util/vhost-user-server.c | ||
55 | @@ -XXX,XX +XXX,XX @@ static void panic_cb(VuDev *vu_dev, const char *buf) | ||
56 | close_client(server); | ||
57 | } | ||
58 | |||
59 | - if (server->device_panic_notifier) { | ||
60 | - server->device_panic_notifier(server); | ||
61 | - } | ||
62 | - | ||
63 | /* | ||
64 | * Set the callback function for network listener so another | ||
65 | * vhost-user client can connect to this server | ||
66 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
67 | SocketAddress *socket_addr, | ||
68 | AioContext *ctx, | ||
69 | uint16_t max_queues, | ||
70 | - DevicePanicNotifierFn *device_panic_notifier, | ||
71 | const VuDevIface *vu_iface, | ||
72 | Error **errp) | ||
73 | { | ||
74 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
75 | .vu_iface = vu_iface, | ||
76 | .max_queues = max_queues, | ||
77 | .ctx = ctx, | ||
78 | - .device_panic_notifier = device_panic_notifier, | ||
79 | }; | ||
80 | |||
81 | qio_net_listener_set_name(server->listener, "vhost-user-backend-listener"); | ||
82 | -- | ||
83 | 2.26.2 | ||
84 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | fds[] is leaked when qio_channel_readv_full() fails. | ||
1 | 2 | ||
3 | Use vmsg->fds[] instead of keeping a local fds[] array. Then we can | ||
4 | reuse goto fail to clean up fds. vmsg->fd_num must be zeroed before the | ||
5 | loop to make this safe. | ||
6 | |||
7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
8 | Message-id: 20200924151549.913737-8-stefanha@redhat.com | ||
9 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
10 | --- | ||
11 | util/vhost-user-server.c | 50 ++++++++++++++++++---------------------- | ||
12 | 1 file changed, 23 insertions(+), 27 deletions(-) | ||
13 | |||
14 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
15 | index XXXXXXX..XXXXXXX 100644 | ||
16 | --- a/util/vhost-user-server.c | ||
17 | +++ b/util/vhost-user-server.c | ||
18 | @@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg) | ||
19 | }; | ||
20 | int rc, read_bytes = 0; | ||
21 | Error *local_err = NULL; | ||
22 | - /* | ||
23 | - * Store fds/nfds returned from qio_channel_readv_full into | ||
24 | - * temporary variables. | ||
25 | - * | ||
26 | - * VhostUserMsg is a packed structure, gcc will complain about passing | ||
27 | - * pointer to a packed structure member if we pass &VhostUserMsg.fd_num | ||
28 | - * and &VhostUserMsg.fds directly when calling qio_channel_readv_full, | ||
29 | - * thus two temporary variables nfds and fds are used here. | ||
30 | - */ | ||
31 | - size_t nfds = 0, nfds_t = 0; | ||
32 | const size_t max_fds = G_N_ELEMENTS(vmsg->fds); | ||
33 | - int *fds_t = NULL; | ||
34 | VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
35 | QIOChannel *ioc = server->ioc; | ||
36 | |||
37 | + vmsg->fd_num = 0; | ||
38 | if (!ioc) { | ||
39 | error_report_err(local_err); | ||
40 | goto fail; | ||
41 | @@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg) | ||
42 | |||
43 | assert(qemu_in_coroutine()); | ||
44 | do { | ||
45 | + size_t nfds = 0; | ||
46 | + int *fds = NULL; | ||
47 | + | ||
48 | /* | ||
49 | * qio_channel_readv_full may have short reads, keeping calling it | ||
50 | * until getting VHOST_USER_HDR_SIZE or 0 bytes in total | ||
51 | */ | ||
52 | - rc = qio_channel_readv_full(ioc, &iov, 1, &fds_t, &nfds_t, &local_err); | ||
53 | + rc = qio_channel_readv_full(ioc, &iov, 1, &fds, &nfds, &local_err); | ||
54 | if (rc < 0) { | ||
55 | if (rc == QIO_CHANNEL_ERR_BLOCK) { | ||
56 | + assert(local_err == NULL); | ||
57 | qio_channel_yield(ioc, G_IO_IN); | ||
58 | continue; | ||
59 | } else { | ||
60 | error_report_err(local_err); | ||
61 | - return false; | ||
62 | + goto fail; | ||
63 | } | ||
64 | } | ||
65 | - read_bytes += rc; | ||
66 | - if (nfds_t > 0) { | ||
67 | - if (nfds + nfds_t > max_fds) { | ||
68 | + | ||
69 | + if (nfds > 0) { | ||
70 | + if (vmsg->fd_num + nfds > max_fds) { | ||
71 | error_report("A maximum of %zu fds are allowed, " | ||
72 | "however got %zu fds now", | ||
73 | - max_fds, nfds + nfds_t); | ||
74 | + max_fds, vmsg->fd_num + nfds); | ||
75 | + g_free(fds); | ||
76 | goto fail; | ||
77 | } | ||
78 | - memcpy(vmsg->fds + nfds, fds_t, | ||
79 | - nfds_t *sizeof(vmsg->fds[0])); | ||
80 | - nfds += nfds_t; | ||
81 | - g_free(fds_t); | ||
82 | + memcpy(vmsg->fds + vmsg->fd_num, fds, nfds * sizeof(vmsg->fds[0])); | ||
83 | + vmsg->fd_num += nfds; | ||
84 | + g_free(fds); | ||
85 | } | ||
86 | - if (read_bytes == VHOST_USER_HDR_SIZE || rc == 0) { | ||
87 | - break; | ||
88 | + | ||
89 | + if (rc == 0) { /* socket closed */ | ||
90 | + goto fail; | ||
91 | } | ||
92 | - iov.iov_base = (char *)vmsg + read_bytes; | ||
93 | - iov.iov_len = VHOST_USER_HDR_SIZE - read_bytes; | ||
94 | - } while (true); | ||
95 | |||
96 | - vmsg->fd_num = nfds; | ||
97 | + iov.iov_base += rc; | ||
98 | + iov.iov_len -= rc; | ||
99 | + read_bytes += rc; | ||
100 | + } while (read_bytes != VHOST_USER_HDR_SIZE); | ||
101 | + | ||
102 | /* qio_channel_readv_full will make socket fds blocking, unblock them */ | ||
103 | vmsg_unblock_fds(vmsg); | ||
104 | if (vmsg->size > sizeof(vmsg->payload)) { | ||
105 | -- | ||
106 | 2.26.2 | ||
107 | diff view generated by jsdifflib |
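The fd handling above is a small pattern worth spelling out: zero vmsg->fd_num before the read loop, append each received batch straight into the message's own fds[] array, and free the temporary batch buffer on every path, so that a single fail label can close whatever was collected so far. A simplified sketch with generic names (msg, append_fds, MAX_FDS; not the QIOChannel API):

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_FDS 8

    struct msg {
        int fds[MAX_FDS];
        size_t fd_num;          /* zeroed by the caller before the loop */
    };

    /* Append one batch of received fds; returns false on overflow so the
     * caller can jump to its fail label and close everything recorded. */
    static bool append_fds(struct msg *m, int *batch, size_t nfds)
    {
        bool ok = m->fd_num + nfds <= MAX_FDS;

        if (ok) {
            memcpy(m->fds + m->fd_num, batch, nfds * sizeof(m->fds[0]));
            m->fd_num += nfds;
        }
        free(batch);            /* the batch buffer is consumed either way */
        return ok;
    }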
1 | We must check that zone information is present before running | 1 | Unexpected EOF is an error that must be reported. |
---|---|---|---|
2 | update_zones_wp(). | ||
3 | 2 | ||
4 | Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2234374 | 3 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
5 | Fixes: Coverity CID 1512459 | 4 | Message-id: 20200924151549.913737-9-stefanha@redhat.com |
6 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 5 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
7 | Message-Id: <20230824155345.109765-4-hreitz@redhat.com> | ||
8 | Reviewed-by: Sam Li <faithilikerun@gmail.com> | ||
9 | --- | 6 | --- |
10 | block/file-posix.c | 3 ++- | 7 | util/vhost-user-server.c | 6 ++++-- |
11 | 1 file changed, 2 insertions(+), 1 deletion(-) | 8 | 1 file changed, 4 insertions(+), 2 deletions(-) |
12 | 9 | ||
13 | diff --git a/block/file-posix.c b/block/file-posix.c | 10 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c |
14 | index XXXXXXX..XXXXXXX 100644 | 11 | index XXXXXXX..XXXXXXX 100644 |
15 | --- a/block/file-posix.c | 12 | --- a/util/vhost-user-server.c |
16 | +++ b/block/file-posix.c | 13 | +++ b/util/vhost-user-server.c |
17 | @@ -XXX,XX +XXX,XX @@ out: | 14 | @@ -XXX,XX +XXX,XX @@ vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg) |
18 | } | 15 | }; |
19 | } | 16 | if (vmsg->size) { |
20 | } else { | 17 | rc = qio_channel_readv_all_eof(ioc, &iov_payload, 1, &local_err); |
21 | - if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) { | 18 | - if (rc == -1) { |
22 | + if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && | 19 | - error_report_err(local_err); |
23 | + bs->bl.zoned != BLK_Z_NONE) { | 20 | + if (rc != 1) { |
24 | update_zones_wp(bs, s->fd, 0, 1); | 21 | + if (local_err) { |
22 | + error_report_err(local_err); | ||
23 | + } | ||
24 | goto fail; | ||
25 | } | 25 | } |
26 | } | 26 | } |
27 | -- | 27 | -- |
28 | 2.41.0 | 28 | 2.26.2 |
29 | diff view generated by jsdifflib |
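The one-line condition change above (rc == -1 becomes rc != 1) encodes a three-way return convention: 1 means the payload was read in full, a negative value is an error with local_err set, and 0 is end-of-file, which, as the new if (local_err) guard suggests, arrives without an error object. Treating anything other than 1 as failure is what makes unexpected EOF fatal. A generic stand-in (check_read is an illustrative helper, not a QEMU function):

    #include <stdio.h>

    /* rc: 1 = success, 0 = clean EOF, negative = error with a message set. */
    static int check_read(int rc, const char *err_msg)
    {
        if (rc != 1) {              /* EOF and error both count as failure */
            if (err_msg) {          /* only real errors carry a message */
                fprintf(stderr, "read failed: %s\n", err_msg);
            }
            return -1;
        }
        return 0;
    }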
New patch | |||
---|---|---|---|
1 | The vu_client_trip() coroutine is leaked during AioContext switching. It | ||
2 | is also unsafe to destroy the vu_dev in panic_cb() since its callers | ||
3 | still access it in some cases. | ||
1 | 4 | ||
5 | Rework the lifecycle to solve these safety issues. | ||
6 | |||
7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
8 | Message-id: 20200924151549.913737-10-stefanha@redhat.com | ||
9 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
10 | --- | ||
11 | util/vhost-user-server.h | 29 ++-- | ||
12 | block/export/vhost-user-blk-server.c | 9 +- | ||
13 | util/vhost-user-server.c | 245 +++++++++++++++------------ | ||
14 | 3 files changed, 155 insertions(+), 128 deletions(-) | ||
15 | |||
16 | diff --git a/util/vhost-user-server.h b/util/vhost-user-server.h | ||
17 | index XXXXXXX..XXXXXXX 100644 | ||
18 | --- a/util/vhost-user-server.h | ||
19 | +++ b/util/vhost-user-server.h | ||
20 | @@ -XXX,XX +XXX,XX @@ | ||
21 | #include "qapi/error.h" | ||
22 | #include "standard-headers/linux/virtio_blk.h" | ||
23 | |||
24 | +/* A kick fd that we monitor on behalf of libvhost-user */ | ||
25 | typedef struct VuFdWatch { | ||
26 | VuDev *vu_dev; | ||
27 | int fd; /*kick fd*/ | ||
28 | void *pvt; | ||
29 | vu_watch_cb cb; | ||
30 | - bool processing; | ||
31 | QTAILQ_ENTRY(VuFdWatch) next; | ||
32 | } VuFdWatch; | ||
33 | |||
34 | -typedef struct VuServer VuServer; | ||
35 | - | ||
36 | -struct VuServer { | ||
37 | +/** | ||
38 | + * VuServer: | ||
39 | + * A vhost-user server instance with user-defined VuDevIface callbacks. | ||
40 | + * Vhost-user device backends can be implemented using VuServer. VuDevIface | ||
41 | + * callbacks and virtqueue kicks run in the given AioContext. | ||
42 | + */ | ||
43 | +typedef struct { | ||
44 | QIONetListener *listener; | ||
45 | + QEMUBH *restart_listener_bh; | ||
46 | AioContext *ctx; | ||
47 | int max_queues; | ||
48 | const VuDevIface *vu_iface; | ||
49 | + | ||
50 | + /* Protected by ctx lock */ | ||
51 | VuDev vu_dev; | ||
52 | QIOChannel *ioc; /* The I/O channel with the client */ | ||
53 | QIOChannelSocket *sioc; /* The underlying data channel with the client */ | ||
54 | - /* IOChannel for fd provided via VHOST_USER_SET_SLAVE_REQ_FD */ | ||
55 | - QIOChannel *ioc_slave; | ||
56 | - QIOChannelSocket *sioc_slave; | ||
57 | - Coroutine *co_trip; /* coroutine for processing VhostUserMsg */ | ||
58 | QTAILQ_HEAD(, VuFdWatch) vu_fd_watches; | ||
59 | - /* restart coroutine co_trip if AIOContext is changed */ | ||
60 | - bool aio_context_changed; | ||
61 | - bool processing_msg; | ||
62 | -}; | ||
63 | + | ||
64 | + Coroutine *co_trip; /* coroutine for processing VhostUserMsg */ | ||
65 | +} VuServer; | ||
66 | |||
67 | bool vhost_user_server_start(VuServer *server, | ||
68 | SocketAddress *unix_socket, | ||
69 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
70 | |||
71 | void vhost_user_server_stop(VuServer *server); | ||
72 | |||
73 | -void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx); | ||
74 | +void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx); | ||
75 | +void vhost_user_server_detach_aio_context(VuServer *server); | ||
76 | |||
77 | #endif /* VHOST_USER_SERVER_H */ | ||
78 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
79 | index XXXXXXX..XXXXXXX 100644 | ||
80 | --- a/block/export/vhost-user-blk-server.c | ||
81 | +++ b/block/export/vhost-user-blk-server.c | ||
82 | @@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_block_iface = { | ||
83 | static void blk_aio_attached(AioContext *ctx, void *opaque) | ||
84 | { | ||
85 | VuBlockDev *vub_dev = opaque; | ||
86 | - aio_context_acquire(ctx); | ||
87 | - vhost_user_server_set_aio_context(&vub_dev->vu_server, ctx); | ||
88 | - aio_context_release(ctx); | ||
89 | + vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx); | ||
90 | } | ||
91 | |||
92 | static void blk_aio_detach(void *opaque) | ||
93 | { | ||
94 | VuBlockDev *vub_dev = opaque; | ||
95 | - AioContext *ctx = vub_dev->vu_server.ctx; | ||
96 | - aio_context_acquire(ctx); | ||
97 | - vhost_user_server_set_aio_context(&vub_dev->vu_server, NULL); | ||
98 | - aio_context_release(ctx); | ||
99 | + vhost_user_server_detach_aio_context(&vub_dev->vu_server); | ||
100 | } | ||
101 | |||
102 | static void | ||
103 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
104 | index XXXXXXX..XXXXXXX 100644 | ||
105 | --- a/util/vhost-user-server.c | ||
106 | +++ b/util/vhost-user-server.c | ||
107 | @@ -XXX,XX +XXX,XX @@ | ||
108 | */ | ||
109 | #include "qemu/osdep.h" | ||
110 | #include "qemu/main-loop.h" | ||
111 | +#include "block/aio-wait.h" | ||
112 | #include "vhost-user-server.h" | ||
113 | |||
114 | +/* | ||
115 | + * Theory of operation: | ||
116 | + * | ||
117 | + * VuServer is started and stopped by vhost_user_server_start() and | ||
118 | + * vhost_user_server_stop() from the main loop thread. Starting the server | ||
119 | + * opens a vhost-user UNIX domain socket and listens for incoming connections. | ||
120 | + * Only one connection is allowed at a time. | ||
121 | + * | ||
122 | + * The connection is handled by the vu_client_trip() coroutine in the | ||
123 | + * VuServer->ctx AioContext. The coroutine consists of a vu_dispatch() loop | ||
124 | + * where libvhost-user calls vu_message_read() to receive the next vhost-user | ||
125 | + * protocol messages over the UNIX domain socket. | ||
126 | + * | ||
127 | + * When virtqueues are set up libvhost-user calls set_watch() to monitor kick | ||
128 | + * fds. These fds are also handled in the VuServer->ctx AioContext. | ||
129 | + * | ||
130 | + * Both vu_client_trip() and kick fd monitoring can be stopped by shutting down | ||
131 | + * the socket connection. Shutting down the socket connection causes | ||
132 | + * vu_message_read() to fail since no more data can be received from the socket. | ||
133 | + * After vu_dispatch() fails, vu_client_trip() calls vu_deinit() to stop | ||
134 | + * libvhost-user before terminating the coroutine. vu_deinit() calls | ||
135 | + * remove_watch() to stop monitoring kick fds and this stops virtqueue | ||
136 | + * processing. | ||
137 | + * | ||
138 | + * When vu_client_trip() has finished cleaning up it schedules a BH in the main | ||
139 | + * loop thread to accept the next client connection. | ||
140 | + * | ||
141 | + * When libvhost-user detects an error it calls panic_cb() and sets the | ||
142 | + * dev->broken flag. Both vu_client_trip() and kick fd processing stop when | ||
143 | + * the dev->broken flag is set. | ||
144 | + * | ||
145 | + * It is possible to switch AioContexts using | ||
146 | + * vhost_user_server_detach_aio_context() and | ||
147 | + * vhost_user_server_attach_aio_context(). They stop monitoring fds in the old | ||
148 | + * AioContext and resume monitoring in the new AioContext. The vu_client_trip() | ||
149 | + * coroutine remains in a yielded state during the switch. This is made | ||
150 | + * possible by QIOChannel's support for spurious coroutine re-entry in | ||
151 | + * qio_channel_yield(). The coroutine will restart I/O when re-entered from the | ||
152 | + * new AioContext. | ||
153 | + */ | ||
154 | + | ||
155 | static void vmsg_close_fds(VhostUserMsg *vmsg) | ||
156 | { | ||
157 | int i; | ||
158 | @@ -XXX,XX +XXX,XX @@ static void vmsg_unblock_fds(VhostUserMsg *vmsg) | ||
159 | } | ||
160 | } | ||
161 | |||
162 | -static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc, | ||
163 | - gpointer opaque); | ||
164 | - | ||
165 | -static void close_client(VuServer *server) | ||
166 | -{ | ||
167 | - /* | ||
168 | - * Before closing the client | ||
169 | - * | ||
170 | - * 1. Let vu_client_trip stop processing new vhost-user msg | ||
171 | - * | ||
172 | - * 2. remove kick_handler | ||
173 | - * | ||
174 | - * 3. wait for the kick handler to be finished | ||
175 | - * | ||
176 | - * 4. wait for the current vhost-user msg to be finished processing | ||
177 | - */ | ||
178 | - | ||
179 | - QIOChannelSocket *sioc = server->sioc; | ||
180 | - /* When this is set vu_client_trip will stop new processing vhost-user message */ | ||
181 | - server->sioc = NULL; | ||
182 | - | ||
183 | - while (server->processing_msg) { | ||
184 | - if (server->ioc->read_coroutine) { | ||
185 | - server->ioc->read_coroutine = NULL; | ||
186 | - qio_channel_set_aio_fd_handler(server->ioc, server->ioc->ctx, NULL, | ||
187 | - NULL, server->ioc); | ||
188 | - server->processing_msg = false; | ||
189 | - } | ||
190 | - } | ||
191 | - | ||
192 | - vu_deinit(&server->vu_dev); | ||
193 | - | ||
194 | - /* vu_deinit() should have called remove_watch() */ | ||
195 | - assert(QTAILQ_EMPTY(&server->vu_fd_watches)); | ||
196 | - | ||
197 | - object_unref(OBJECT(sioc)); | ||
198 | - object_unref(OBJECT(server->ioc)); | ||
199 | -} | ||
200 | - | ||
201 | static void panic_cb(VuDev *vu_dev, const char *buf) | ||
202 | { | ||
203 | - VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
204 | - | ||
205 | - /* avoid while loop in close_client */ | ||
206 | - server->processing_msg = false; | ||
207 | - | ||
208 | - if (buf) { | ||
209 | - error_report("vu_panic: %s", buf); | ||
210 | - } | ||
211 | - | ||
212 | - if (server->sioc) { | ||
213 | - close_client(server); | ||
214 | - } | ||
215 | - | ||
216 | - /* | ||
217 | - * Set the callback function for network listener so another | ||
218 | - * vhost-user client can connect to this server | ||
219 | - */ | ||
220 | - qio_net_listener_set_client_func(server->listener, | ||
221 | - vu_accept, | ||
222 | - server, | ||
223 | - NULL); | ||
224 | + error_report("vu_panic: %s", buf); | ||
225 | } | ||
226 | |||
227 | static bool coroutine_fn | ||
228 | @@ -XXX,XX +XXX,XX @@ fail: | ||
229 | return false; | ||
230 | } | ||
231 | |||
232 | - | ||
233 | -static void vu_client_start(VuServer *server); | ||
234 | static coroutine_fn void vu_client_trip(void *opaque) | ||
235 | { | ||
236 | VuServer *server = opaque; | ||
237 | + VuDev *vu_dev = &server->vu_dev; | ||
238 | |||
239 | - while (!server->aio_context_changed && server->sioc) { | ||
240 | - server->processing_msg = true; | ||
241 | - vu_dispatch(&server->vu_dev); | ||
242 | - server->processing_msg = false; | ||
243 | + while (!vu_dev->broken && vu_dispatch(vu_dev)) { | ||
244 | + /* Keep running */ | ||
245 | } | ||
246 | |||
247 | - if (server->aio_context_changed && server->sioc) { | ||
248 | - server->aio_context_changed = false; | ||
249 | - vu_client_start(server); | ||
250 | - } | ||
251 | -} | ||
252 | + vu_deinit(vu_dev); | ||
253 | + | ||
254 | + /* vu_deinit() should have called remove_watch() */ | ||
255 | + assert(QTAILQ_EMPTY(&server->vu_fd_watches)); | ||
256 | + | ||
257 | + object_unref(OBJECT(server->sioc)); | ||
258 | + server->sioc = NULL; | ||
259 | |||
260 | -static void vu_client_start(VuServer *server) | ||
261 | -{ | ||
262 | - server->co_trip = qemu_coroutine_create(vu_client_trip, server); | ||
263 | - aio_co_enter(server->ctx, server->co_trip); | ||
264 | + object_unref(OBJECT(server->ioc)); | ||
265 | + server->ioc = NULL; | ||
266 | + | ||
267 | + server->co_trip = NULL; | ||
268 | + if (server->restart_listener_bh) { | ||
269 | + qemu_bh_schedule(server->restart_listener_bh); | ||
270 | + } | ||
271 | + aio_wait_kick(); | ||
272 | } | ||
273 | |||
274 | /* | ||
275 | @@ -XXX,XX +XXX,XX @@ static void vu_client_start(VuServer *server) | ||
276 | static void kick_handler(void *opaque) | ||
277 | { | ||
278 | VuFdWatch *vu_fd_watch = opaque; | ||
279 | - vu_fd_watch->processing = true; | ||
280 | - vu_fd_watch->cb(vu_fd_watch->vu_dev, 0, vu_fd_watch->pvt); | ||
281 | - vu_fd_watch->processing = false; | ||
282 | + VuDev *vu_dev = vu_fd_watch->vu_dev; | ||
283 | + | ||
284 | + vu_fd_watch->cb(vu_dev, 0, vu_fd_watch->pvt); | ||
285 | + | ||
286 | + /* Stop vu_client_trip() if an error occurred in vu_fd_watch->cb() */ | ||
287 | + if (vu_dev->broken) { | ||
288 | + VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
289 | + | ||
290 | + qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); | ||
291 | + } | ||
292 | } | ||
293 | |||
294 | - | ||
295 | static VuFdWatch *find_vu_fd_watch(VuServer *server, int fd) | ||
296 | { | ||
297 | |||
298 | @@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc, | ||
299 | qio_channel_set_name(QIO_CHANNEL(sioc), "vhost-user client"); | ||
300 | server->ioc = QIO_CHANNEL(sioc); | ||
301 | object_ref(OBJECT(server->ioc)); | ||
302 | - qio_channel_attach_aio_context(server->ioc, server->ctx); | ||
303 | + | ||
304 | + /* TODO vu_message_write() spins if non-blocking! */ | ||
305 | qio_channel_set_blocking(server->ioc, false, NULL); | ||
306 | - vu_client_start(server); | ||
307 | + | ||
308 | + server->co_trip = qemu_coroutine_create(vu_client_trip, server); | ||
309 | + | ||
310 | + aio_context_acquire(server->ctx); | ||
311 | + vhost_user_server_attach_aio_context(server, server->ctx); | ||
312 | + aio_context_release(server->ctx); | ||
313 | } | ||
314 | |||
315 | - | ||
316 | void vhost_user_server_stop(VuServer *server) | ||
317 | { | ||
318 | + aio_context_acquire(server->ctx); | ||
319 | + | ||
320 | + qemu_bh_delete(server->restart_listener_bh); | ||
321 | + server->restart_listener_bh = NULL; | ||
322 | + | ||
323 | if (server->sioc) { | ||
324 | - close_client(server); | ||
325 | + VuFdWatch *vu_fd_watch; | ||
326 | + | ||
327 | + QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) { | ||
328 | + aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true, | ||
329 | + NULL, NULL, NULL, vu_fd_watch); | ||
330 | + } | ||
331 | + | ||
332 | + qio_channel_shutdown(server->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL); | ||
333 | + | ||
334 | + AIO_WAIT_WHILE(server->ctx, server->co_trip); | ||
335 | } | ||
336 | |||
337 | + aio_context_release(server->ctx); | ||
338 | + | ||
339 | if (server->listener) { | ||
340 | qio_net_listener_disconnect(server->listener); | ||
341 | object_unref(OBJECT(server->listener)); | ||
342 | } | ||
343 | +} | ||
344 | + | ||
345 | +/* | ||
346 | + * Allow the next client to connect to the server. Called from a BH in the main | ||
347 | + * loop. | ||
348 | + */ | ||
349 | +static void restart_listener_bh(void *opaque) | ||
350 | +{ | ||
351 | + VuServer *server = opaque; | ||
352 | |||
353 | + qio_net_listener_set_client_func(server->listener, vu_accept, server, | ||
354 | + NULL); | ||
355 | } | ||
356 | |||
357 | -void vhost_user_server_set_aio_context(VuServer *server, AioContext *ctx) | ||
358 | +/* Called with ctx acquired */ | ||
359 | +void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx) | ||
360 | { | ||
361 | - VuFdWatch *vu_fd_watch, *next; | ||
362 | - void *opaque = NULL; | ||
363 | - IOHandler *io_read = NULL; | ||
364 | - bool attach; | ||
365 | + VuFdWatch *vu_fd_watch; | ||
366 | |||
367 | - server->ctx = ctx ? ctx : qemu_get_aio_context(); | ||
368 | + server->ctx = ctx; | ||
369 | |||
370 | if (!server->sioc) { | ||
371 | - /* not yet serving any client*/ | ||
372 | return; | ||
373 | } | ||
374 | |||
375 | - if (ctx) { | ||
376 | - qio_channel_attach_aio_context(server->ioc, ctx); | ||
377 | - server->aio_context_changed = true; | ||
378 | - io_read = kick_handler; | ||
379 | - attach = true; | ||
380 | - } else { | ||
381 | + qio_channel_attach_aio_context(server->ioc, ctx); | ||
382 | + | ||
383 | + QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) { | ||
384 | + aio_set_fd_handler(ctx, vu_fd_watch->fd, true, kick_handler, NULL, | ||
385 | + NULL, vu_fd_watch); | ||
386 | + } | ||
387 | + | ||
388 | + aio_co_schedule(ctx, server->co_trip); | ||
389 | +} | ||
390 | + | ||
391 | +/* Called with server->ctx acquired */ | ||
392 | +void vhost_user_server_detach_aio_context(VuServer *server) | ||
393 | +{ | ||
394 | + if (server->sioc) { | ||
395 | + VuFdWatch *vu_fd_watch; | ||
396 | + | ||
397 | + QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) { | ||
398 | + aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true, | ||
399 | + NULL, NULL, NULL, vu_fd_watch); | ||
400 | + } | ||
401 | + | ||
402 | qio_channel_detach_aio_context(server->ioc); | ||
403 | - /* server->ioc->ctx keeps the old AioConext */ | ||
404 | - ctx = server->ioc->ctx; | ||
405 | - attach = false; | ||
406 | } | ||
407 | |||
408 | - QTAILQ_FOREACH_SAFE(vu_fd_watch, &server->vu_fd_watches, next, next) { | ||
409 | - if (vu_fd_watch->cb) { | ||
410 | - opaque = attach ? vu_fd_watch : NULL; | ||
411 | - aio_set_fd_handler(ctx, vu_fd_watch->fd, true, | ||
412 | - io_read, NULL, NULL, | ||
413 | - opaque); | ||
414 | - } | ||
415 | - } | ||
416 | + server->ctx = NULL; | ||
417 | } | ||
418 | |||
419 | - | ||
420 | bool vhost_user_server_start(VuServer *server, | ||
421 | SocketAddress *socket_addr, | ||
422 | AioContext *ctx, | ||
423 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
424 | const VuDevIface *vu_iface, | ||
425 | Error **errp) | ||
426 | { | ||
427 | + QEMUBH *bh; | ||
428 | QIONetListener *listener = qio_net_listener_new(); | ||
429 | if (qio_net_listener_open_sync(listener, socket_addr, 1, | ||
430 | errp) < 0) { | ||
431 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, | ||
432 | return false; | ||
433 | } | ||
434 | |||
435 | + bh = qemu_bh_new(restart_listener_bh, server); | ||
436 | + | ||
437 | /* zero out unspecified fields */ | ||
438 | *server = (VuServer) { | ||
439 | .listener = listener, | ||
440 | + .restart_listener_bh = bh, | ||
441 | .vu_iface = vu_iface, | ||
442 | .max_queues = max_queues, | ||
443 | .ctx = ctx, | ||
444 | -- | ||
445 | 2.26.2 | ||
446 | diff view generated by jsdifflib |
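The new "Theory of operation" comment in the patch above reduces to one ordering rule for teardown: shut down the socket, let the dispatch loop fail on its own, deinitialize (which removes the kick-fd watches), and only then re-arm the listener for the next client. The toy model below just walks that sequence so the ordering is easy to follow; it is plain C with made-up names (server, dispatch_one, deinit, client_trip), not the QEMU coroutine or AioContext APIs.

    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct server {
        bool connected;         /* socket still usable? */
        int watches;            /* kick fds currently monitored */
    };

    static bool dispatch_one(struct server *s)
    {
        return s->connected;    /* fails once the socket has been shut down */
    }

    static void deinit(struct server *s)
    {
        s->watches = 0;         /* remove_watch() for every kick fd */
    }

    static void client_trip(struct server *s)
    {
        while (dispatch_one(s)) {
            s->connected = false;   /* stand-in for the connection ending */
        }
        deinit(s);
        assert(s->watches == 0);    /* mirrors the assertion kept in the code */
        printf("schedule listener restart\n");
    }

    int main(void)
    {
        struct server s = { .connected = true, .watches = 2 };
        client_trip(&s);
        return 0;
    }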
New patch | |||
---|---|---|---|
1 | Propagate the flush return value since errors are possible. | ||
1 | 2 | ||
3 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
4 | Message-id: 20200924151549.913737-11-stefanha@redhat.com | ||
5 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
6 | --- | ||
7 | block/export/vhost-user-blk-server.c | 11 +++++++---- | ||
8 | 1 file changed, 7 insertions(+), 4 deletions(-) | ||
9 | |||
10 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
11 | index XXXXXXX..XXXXXXX 100644 | ||
12 | --- a/block/export/vhost-user-blk-server.c | ||
13 | +++ b/block/export/vhost-user-blk-server.c | ||
14 | @@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov, | ||
15 | return -EINVAL; | ||
16 | } | ||
17 | |||
18 | -static void coroutine_fn vu_block_flush(VuBlockReq *req) | ||
19 | +static int coroutine_fn vu_block_flush(VuBlockReq *req) | ||
20 | { | ||
21 | VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server); | ||
22 | BlockBackend *backend = vdev_blk->backend; | ||
23 | - blk_co_flush(backend); | ||
24 | + return blk_co_flush(backend); | ||
25 | } | ||
26 | |||
27 | static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
28 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
29 | break; | ||
30 | } | ||
31 | case VIRTIO_BLK_T_FLUSH: | ||
32 | - vu_block_flush(req); | ||
33 | - req->in->status = VIRTIO_BLK_S_OK; | ||
34 | + if (vu_block_flush(req) == 0) { | ||
35 | + req->in->status = VIRTIO_BLK_S_OK; | ||
36 | + } else { | ||
37 | + req->in->status = VIRTIO_BLK_S_IOERR; | ||
38 | + } | ||
39 | break; | ||
40 | case VIRTIO_BLK_T_GET_ID: { | ||
41 | size_t size = MIN(iov_size(&elem->in_sg[0], in_num), | ||
42 | -- | ||
43 | 2.26.2 | ||
44 | diff view generated by jsdifflib |
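What the status propagation above amounts to, in miniature (STATUS_OK, STATUS_IOERR and flush_status are local stand-ins for the virtio-blk constants and the real request handler): the flush result now selects the status byte instead of being discarded.

    enum { STATUS_OK = 0, STATUS_IOERR = 1 };

    /* flush_ret is what blk_co_flush() returned: 0 on success, non-zero on failure. */
    static unsigned char flush_status(int flush_ret)
    {
        return flush_ret == 0 ? STATUS_OK : STATUS_IOERR;
    }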
1 | From: zhenwei pi <pizhenwei@bytedance.com> | 1 | Use the new QAPI block exports API instead of defining our own QOM |
---|---|---|---|
2 | 2 | objects. | |
3 | enum ThrottleDirection is already there, so use it instead | 3 | |
4 | of 'bool is_write' in the throttle API, and adapt the related code in | 4 | This is a large change because the lifecycle of VuBlockDev needs to |
5 | block, fsdev, cryptodev and the tests. | 5 | follow BlockExportDriver. QOM properties are replaced by QAPI options |
6 | 6 | objects. | |
7 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 7 | |
8 | Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> | 8 | VuBlockDev is renamed VuBlkExport and contains a BlockExport field. |
9 | Message-Id: <20230728022006.1098509-7-pizhenwei@bytedance.com> | 9 | Several fields can be dropped since BlockExport already has equivalents. |
10 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 10 | |
11 | The file names and meson build integration will be adjusted in a future | ||
12 | patch. libvhost-user should probably be built as a static library that | ||
13 | is linked into QEMU instead of as a .c file that results in duplicate | ||
14 | compilation. | ||
15 | |||
16 | The new command-line syntax is: | ||
17 | |||
18 | $ qemu-storage-daemon \ | ||
19 | --blockdev file,node-name=drive0,filename=test.img \ | ||
20 | --export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock | ||
21 | |||
22 | Note that unix-socket is optional because we may wish to accept chardevs | ||
23 | too in the future. | ||
24 | |||
25 | Markus noted that supported address families are not explicit in the | ||
26 | QAPI schema. It is unlikely that support for more address families will | ||
27 | be added since file descriptor passing is required and few address | ||
28 | families support it. If a new address family needs to be added, then the | ||
29 | QAPI 'features' syntax can be used to advertise them. |
30 | |||
31 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
32 | Acked-by: Markus Armbruster <armbru@redhat.com> | ||
33 | Message-id: 20200924151549.913737-12-stefanha@redhat.com | ||
34 | [Skip test on big-endian host architectures because this device doesn't | ||
35 | support them yet (as already mentioned in a code comment). | ||
36 | --Stefan] | ||
37 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
11 | --- | 38 | --- |
12 | include/qemu/throttle.h | 5 +++-- | 39 | qapi/block-export.json | 21 +- |
13 | backends/cryptodev.c | 9 +++++---- | 40 | block/export/vhost-user-blk-server.h | 23 +- |
14 | block/throttle-groups.c | 6 ++++-- | 41 | block/export/export.c | 6 + |
15 | fsdev/qemu-fsdev-throttle.c | 8 +++++--- | 42 | block/export/vhost-user-blk-server.c | 452 +++++++-------------------- |
16 | tests/unit/test-throttle.c | 4 ++-- | 43 | util/vhost-user-server.c | 10 +- |
17 | util/throttle.c | 31 +++++++++++++++++-------------- | 44 | block/export/meson.build | 1 + |
18 | 6 files changed, 36 insertions(+), 27 deletions(-) | 45 | block/meson.build | 1 - |
19 | 46 | 7 files changed, 156 insertions(+), 358 deletions(-) | |
20 | diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h | 47 | |
48 | diff --git a/qapi/block-export.json b/qapi/block-export.json | ||
21 | index XXXXXXX..XXXXXXX 100644 | 49 | index XXXXXXX..XXXXXXX 100644 |
22 | --- a/include/qemu/throttle.h | 50 | --- a/qapi/block-export.json |
23 | +++ b/include/qemu/throttle.h | 51 | +++ b/qapi/block-export.json |
24 | @@ -XXX,XX +XXX,XX @@ void throttle_config_init(ThrottleConfig *cfg); | 52 | @@ -XXX,XX +XXX,XX @@ |
25 | /* usage */ | 53 | 'data': { '*name': 'str', '*description': 'str', |
26 | bool throttle_schedule_timer(ThrottleState *ts, | 54 | '*bitmap': 'str' } } |
27 | ThrottleTimers *tt, | 55 | |
28 | - bool is_write); | 56 | +## |
29 | + ThrottleDirection direction); | 57 | +# @BlockExportOptionsVhostUserBlk: |
30 | 58 | +# | |
31 | -void throttle_account(ThrottleState *ts, bool is_write, uint64_t size); | 59 | +# A vhost-user-blk block export. |
32 | +void throttle_account(ThrottleState *ts, ThrottleDirection direction, | 60 | +# |
33 | + uint64_t size); | 61 | +# @addr: The vhost-user socket on which to listen. Both 'unix' and 'fd' |
34 | void throttle_limits_to_config(ThrottleLimits *arg, ThrottleConfig *cfg, | 62 | +# SocketAddress types are supported. Passed fds must be UNIX domain |
35 | Error **errp); | 63 | +# sockets. |
36 | void throttle_config_to_limits(ThrottleConfig *cfg, ThrottleLimits *var); | 64 | +# @logical-block-size: Logical block size in bytes. Defaults to 512 bytes. |
37 | diff --git a/backends/cryptodev.c b/backends/cryptodev.c | 65 | +# |
66 | +# Since: 5.2 | ||
67 | +## | ||
68 | +{ 'struct': 'BlockExportOptionsVhostUserBlk', | ||
69 | + 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } } | ||
70 | + | ||
71 | ## | ||
72 | # @NbdServerAddOptions: | ||
73 | # | ||
74 | @@ -XXX,XX +XXX,XX @@ | ||
75 | # An enumeration of block export types | ||
76 | # | ||
77 | # @nbd: NBD export | ||
78 | +# @vhost-user-blk: vhost-user-blk export (since 5.2) | ||
79 | # | ||
80 | # Since: 4.2 | ||
81 | ## | ||
82 | { 'enum': 'BlockExportType', | ||
83 | - 'data': [ 'nbd' ] } | ||
84 | + 'data': [ 'nbd', 'vhost-user-blk' ] } | ||
85 | |||
86 | ## | ||
87 | # @BlockExportOptions: | ||
88 | @@ -XXX,XX +XXX,XX @@ | ||
89 | '*writethrough': 'bool' }, | ||
90 | 'discriminator': 'type', | ||
91 | 'data': { | ||
92 | - 'nbd': 'BlockExportOptionsNbd' | ||
93 | + 'nbd': 'BlockExportOptionsNbd', | ||
94 | + 'vhost-user-blk': 'BlockExportOptionsVhostUserBlk' | ||
95 | } } | ||
96 | |||
97 | ## | ||
98 | diff --git a/block/export/vhost-user-blk-server.h b/block/export/vhost-user-blk-server.h | ||
38 | index XXXXXXX..XXXXXXX 100644 | 99 | index XXXXXXX..XXXXXXX 100644 |
39 | --- a/backends/cryptodev.c | 100 | --- a/block/export/vhost-user-blk-server.h |
40 | +++ b/backends/cryptodev.c | 101 | +++ b/block/export/vhost-user-blk-server.h |
41 | @@ -XXX,XX +XXX,XX @@ static void cryptodev_backend_throttle_timer_cb(void *opaque) | 102 | @@ -XXX,XX +XXX,XX @@ |
42 | continue; | 103 | |
104 | #ifndef VHOST_USER_BLK_SERVER_H | ||
105 | #define VHOST_USER_BLK_SERVER_H | ||
106 | -#include "util/vhost-user-server.h" | ||
107 | |||
108 | -typedef struct VuBlockDev VuBlockDev; | ||
109 | -#define TYPE_VHOST_USER_BLK_SERVER "vhost-user-blk-server" | ||
110 | -#define VHOST_USER_BLK_SERVER(obj) \ | ||
111 | - OBJECT_CHECK(VuBlockDev, obj, TYPE_VHOST_USER_BLK_SERVER) | ||
112 | +#include "block/export.h" | ||
113 | |||
114 | -/* vhost user block device */ | ||
115 | -struct VuBlockDev { | ||
116 | - Object parent_obj; | ||
117 | - char *node_name; | ||
118 | - SocketAddress *addr; | ||
119 | - AioContext *ctx; | ||
120 | - VuServer vu_server; | ||
121 | - bool running; | ||
122 | - uint32_t blk_size; | ||
123 | - BlockBackend *backend; | ||
124 | - QIOChannelSocket *sioc; | ||
125 | - QTAILQ_ENTRY(VuBlockDev) next; | ||
126 | - struct virtio_blk_config blkcfg; | ||
127 | - bool writable; | ||
128 | -}; | ||
129 | +/* For block/export/export.c */ | ||
130 | +extern const BlockExportDriver blk_exp_vhost_user_blk; | ||
131 | |||
132 | #endif /* VHOST_USER_BLK_SERVER_H */ | ||
133 | diff --git a/block/export/export.c b/block/export/export.c | ||
134 | index XXXXXXX..XXXXXXX 100644 | ||
135 | --- a/block/export/export.c | ||
136 | +++ b/block/export/export.c | ||
137 | @@ -XXX,XX +XXX,XX @@ | ||
138 | #include "sysemu/block-backend.h" | ||
139 | #include "block/export.h" | ||
140 | #include "block/nbd.h" | ||
141 | +#if CONFIG_LINUX | ||
142 | +#include "block/export/vhost-user-blk-server.h" | ||
143 | +#endif | ||
144 | #include "qapi/error.h" | ||
145 | #include "qapi/qapi-commands-block-export.h" | ||
146 | #include "qapi/qapi-events-block-export.h" | ||
147 | @@ -XXX,XX +XXX,XX @@ | ||
148 | |||
149 | static const BlockExportDriver *blk_exp_drivers[] = { | ||
150 | &blk_exp_nbd, | ||
151 | +#if CONFIG_LINUX | ||
152 | + &blk_exp_vhost_user_blk, | ||
153 | +#endif | ||
154 | }; | ||
155 | |||
156 | /* Only accessed from the main thread */ | ||
157 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
158 | index XXXXXXX..XXXXXXX 100644 | ||
159 | --- a/block/export/vhost-user-blk-server.c | ||
160 | +++ b/block/export/vhost-user-blk-server.c | ||
161 | @@ -XXX,XX +XXX,XX @@ | ||
162 | */ | ||
163 | #include "qemu/osdep.h" | ||
164 | #include "block/block.h" | ||
165 | +#include "contrib/libvhost-user/libvhost-user.h" | ||
166 | +#include "standard-headers/linux/virtio_blk.h" | ||
167 | +#include "util/vhost-user-server.h" | ||
168 | #include "vhost-user-blk-server.h" | ||
169 | #include "qapi/error.h" | ||
170 | #include "qom/object_interfaces.h" | ||
171 | @@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr { | ||
172 | unsigned char status; | ||
173 | }; | ||
174 | |||
175 | -typedef struct VuBlockReq { | ||
176 | +typedef struct VuBlkReq { | ||
177 | VuVirtqElement elem; | ||
178 | int64_t sector_num; | ||
179 | size_t size; | ||
180 | @@ -XXX,XX +XXX,XX @@ typedef struct VuBlockReq { | ||
181 | struct virtio_blk_outhdr out; | ||
182 | VuServer *server; | ||
183 | struct VuVirtq *vq; | ||
184 | -} VuBlockReq; | ||
185 | +} VuBlkReq; | ||
186 | |||
187 | -static void vu_block_req_complete(VuBlockReq *req) | ||
188 | +/* vhost user block device */ | ||
189 | +typedef struct { | ||
190 | + BlockExport export; | ||
191 | + VuServer vu_server; | ||
192 | + uint32_t blk_size; | ||
193 | + QIOChannelSocket *sioc; | ||
194 | + struct virtio_blk_config blkcfg; | ||
195 | + bool writable; | ||
196 | +} VuBlkExport; | ||
197 | + | ||
198 | +static void vu_blk_req_complete(VuBlkReq *req) | ||
199 | { | ||
200 | VuDev *vu_dev = &req->server->vu_dev; | ||
201 | |||
202 | @@ -XXX,XX +XXX,XX @@ static void vu_block_req_complete(VuBlockReq *req) | ||
203 | free(req); | ||
204 | } | ||
205 | |||
206 | -static VuBlockDev *get_vu_block_device_by_server(VuServer *server) | ||
207 | -{ | ||
208 | - return container_of(server, VuBlockDev, vu_server); | ||
209 | -} | ||
210 | - | ||
211 | static int coroutine_fn | ||
212 | -vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov, | ||
213 | - uint32_t iovcnt, uint32_t type) | ||
214 | +vu_blk_discard_write_zeroes(BlockBackend *blk, struct iovec *iov, | ||
215 | + uint32_t iovcnt, uint32_t type) | ||
216 | { | ||
217 | struct virtio_blk_discard_write_zeroes desc; | ||
218 | ssize_t size = iov_to_buf(iov, iovcnt, 0, &desc, sizeof(desc)); | ||
219 | @@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov, | ||
220 | return -EINVAL; | ||
221 | } | ||
222 | |||
223 | - VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server); | ||
224 | uint64_t range[2] = { le64_to_cpu(desc.sector) << 9, | ||
225 | le32_to_cpu(desc.num_sectors) << 9 }; | ||
226 | if (type == VIRTIO_BLK_T_DISCARD) { | ||
227 | - if (blk_co_pdiscard(vdev_blk->backend, range[0], range[1]) == 0) { | ||
228 | + if (blk_co_pdiscard(blk, range[0], range[1]) == 0) { | ||
229 | return 0; | ||
43 | } | 230 | } |
44 | 231 | } else if (type == VIRTIO_BLK_T_WRITE_ZEROES) { | |
45 | - throttle_account(&backend->ts, true, ret); | 232 | - if (blk_co_pwrite_zeroes(vdev_blk->backend, |
46 | + throttle_account(&backend->ts, THROTTLE_WRITE, ret); | 233 | - range[0], range[1], 0) == 0) { |
47 | cryptodev_backend_operation(backend, op_info); | 234 | + if (blk_co_pwrite_zeroes(blk, range[0], range[1], 0) == 0) { |
48 | if (throttle_enabled(&backend->tc) && | 235 | return 0; |
49 | - throttle_schedule_timer(&backend->ts, &backend->tt, true)) { | 236 | } |
50 | + throttle_schedule_timer(&backend->ts, &backend->tt, | 237 | } |
51 | + THROTTLE_WRITE)) { | 238 | @@ -XXX,XX +XXX,XX @@ vu_block_discard_write_zeroes(VuBlockReq *req, struct iovec *iov, |
239 | return -EINVAL; | ||
240 | } | ||
241 | |||
242 | -static int coroutine_fn vu_block_flush(VuBlockReq *req) | ||
243 | +static void coroutine_fn vu_blk_virtio_process_req(void *opaque) | ||
244 | { | ||
245 | - VuBlockDev *vdev_blk = get_vu_block_device_by_server(req->server); | ||
246 | - BlockBackend *backend = vdev_blk->backend; | ||
247 | - return blk_co_flush(backend); | ||
248 | -} | ||
249 | - | ||
250 | -static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
251 | -{ | ||
252 | - VuBlockReq *req = opaque; | ||
253 | + VuBlkReq *req = opaque; | ||
254 | VuServer *server = req->server; | ||
255 | VuVirtqElement *elem = &req->elem; | ||
256 | uint32_t type; | ||
257 | |||
258 | - VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
259 | - BlockBackend *backend = vdev_blk->backend; | ||
260 | + VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server); | ||
261 | + BlockBackend *blk = vexp->export.blk; | ||
262 | |||
263 | struct iovec *in_iov = elem->in_sg; | ||
264 | struct iovec *out_iov = elem->out_sg; | ||
265 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
266 | bool is_write = type & VIRTIO_BLK_T_OUT; | ||
267 | req->sector_num = le64_to_cpu(req->out.sector); | ||
268 | |||
269 | - int64_t offset = req->sector_num * vdev_blk->blk_size; | ||
270 | + if (is_write && !vexp->writable) { | ||
271 | + req->in->status = VIRTIO_BLK_S_IOERR; | ||
272 | + break; | ||
273 | + } | ||
274 | + | ||
275 | + int64_t offset = req->sector_num * vexp->blk_size; | ||
276 | QEMUIOVector qiov; | ||
277 | if (is_write) { | ||
278 | qemu_iovec_init_external(&qiov, out_iov, out_num); | ||
279 | - ret = blk_co_pwritev(backend, offset, qiov.size, | ||
280 | - &qiov, 0); | ||
281 | + ret = blk_co_pwritev(blk, offset, qiov.size, &qiov, 0); | ||
282 | } else { | ||
283 | qemu_iovec_init_external(&qiov, in_iov, in_num); | ||
284 | - ret = blk_co_preadv(backend, offset, qiov.size, | ||
285 | - &qiov, 0); | ||
286 | + ret = blk_co_preadv(blk, offset, qiov.size, &qiov, 0); | ||
287 | } | ||
288 | if (ret >= 0) { | ||
289 | req->in->status = VIRTIO_BLK_S_OK; | ||
290 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
291 | break; | ||
292 | } | ||
293 | case VIRTIO_BLK_T_FLUSH: | ||
294 | - if (vu_block_flush(req) == 0) { | ||
295 | + if (blk_co_flush(blk) == 0) { | ||
296 | req->in->status = VIRTIO_BLK_S_OK; | ||
297 | } else { | ||
298 | req->in->status = VIRTIO_BLK_S_IOERR; | ||
299 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
300 | case VIRTIO_BLK_T_DISCARD: | ||
301 | case VIRTIO_BLK_T_WRITE_ZEROES: { | ||
302 | int rc; | ||
303 | - rc = vu_block_discard_write_zeroes(req, &elem->out_sg[1], | ||
304 | - out_num, type); | ||
305 | + | ||
306 | + if (!vexp->writable) { | ||
307 | + req->in->status = VIRTIO_BLK_S_IOERR; | ||
308 | + break; | ||
309 | + } | ||
310 | + | ||
311 | + rc = vu_blk_discard_write_zeroes(blk, &elem->out_sg[1], out_num, type); | ||
312 | if (rc == 0) { | ||
313 | req->in->status = VIRTIO_BLK_S_OK; | ||
314 | } else { | ||
315 | @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_block_virtio_process_req(void *opaque) | ||
316 | break; | ||
317 | } | ||
318 | |||
319 | - vu_block_req_complete(req); | ||
320 | + vu_blk_req_complete(req); | ||
321 | return; | ||
322 | |||
323 | err: | ||
324 | - free(elem); | ||
325 | + free(req); | ||
326 | } | ||
327 | |||
328 | -static void vu_block_process_vq(VuDev *vu_dev, int idx) | ||
329 | +static void vu_blk_process_vq(VuDev *vu_dev, int idx) | ||
330 | { | ||
331 | VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
332 | VuVirtq *vq = vu_get_queue(vu_dev, idx); | ||
333 | |||
334 | while (1) { | ||
335 | - VuBlockReq *req; | ||
336 | + VuBlkReq *req; | ||
337 | |||
338 | - req = vu_queue_pop(vu_dev, vq, sizeof(VuBlockReq)); | ||
339 | + req = vu_queue_pop(vu_dev, vq, sizeof(VuBlkReq)); | ||
340 | if (!req) { | ||
52 | break; | 341 | break; |
53 | } | 342 | } |
343 | @@ -XXX,XX +XXX,XX @@ static void vu_block_process_vq(VuDev *vu_dev, int idx) | ||
344 | req->vq = vq; | ||
345 | |||
346 | Coroutine *co = | ||
347 | - qemu_coroutine_create(vu_block_virtio_process_req, req); | ||
348 | + qemu_coroutine_create(vu_blk_virtio_process_req, req); | ||
349 | qemu_coroutine_enter(co); | ||
54 | } | 350 | } |
55 | @@ -XXX,XX +XXX,XX @@ int cryptodev_backend_crypto_operation( | 351 | } |
56 | goto do_account; | 352 | |
353 | -static void vu_block_queue_set_started(VuDev *vu_dev, int idx, bool started) | ||
354 | +static void vu_blk_queue_set_started(VuDev *vu_dev, int idx, bool started) | ||
355 | { | ||
356 | VuVirtq *vq; | ||
357 | |||
358 | assert(vu_dev); | ||
359 | |||
360 | vq = vu_get_queue(vu_dev, idx); | ||
361 | - vu_set_queue_handler(vu_dev, vq, started ? vu_block_process_vq : NULL); | ||
362 | + vu_set_queue_handler(vu_dev, vq, started ? vu_blk_process_vq : NULL); | ||
363 | } | ||
364 | |||
365 | -static uint64_t vu_block_get_features(VuDev *dev) | ||
366 | +static uint64_t vu_blk_get_features(VuDev *dev) | ||
367 | { | ||
368 | uint64_t features; | ||
369 | VuServer *server = container_of(dev, VuServer, vu_dev); | ||
370 | - VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
371 | + VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server); | ||
372 | features = 1ull << VIRTIO_BLK_F_SIZE_MAX | | ||
373 | 1ull << VIRTIO_BLK_F_SEG_MAX | | ||
374 | 1ull << VIRTIO_BLK_F_TOPOLOGY | | ||
375 | @@ -XXX,XX +XXX,XX @@ static uint64_t vu_block_get_features(VuDev *dev) | ||
376 | 1ull << VIRTIO_RING_F_EVENT_IDX | | ||
377 | 1ull << VHOST_USER_F_PROTOCOL_FEATURES; | ||
378 | |||
379 | - if (!vdev_blk->writable) { | ||
380 | + if (!vexp->writable) { | ||
381 | features |= 1ull << VIRTIO_BLK_F_RO; | ||
57 | } | 382 | } |
58 | 383 | ||
59 | - if (throttle_schedule_timer(&backend->ts, &backend->tt, true) || | 384 | return features; |
60 | + if (throttle_schedule_timer(&backend->ts, &backend->tt, THROTTLE_WRITE) || | 385 | } |
61 | !QTAILQ_EMPTY(&backend->opinfos)) { | 386 | |
62 | QTAILQ_INSERT_TAIL(&backend->opinfos, op_info, next); | 387 | -static uint64_t vu_block_get_protocol_features(VuDev *dev) |
63 | return 0; | 388 | +static uint64_t vu_blk_get_protocol_features(VuDev *dev) |
64 | @@ -XXX,XX +XXX,XX @@ do_account: | 389 | { |
65 | return ret; | 390 | return 1ull << VHOST_USER_PROTOCOL_F_CONFIG | |
391 | 1ull << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD; | ||
392 | } | ||
393 | |||
394 | static int | ||
395 | -vu_block_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len) | ||
396 | +vu_blk_get_config(VuDev *vu_dev, uint8_t *config, uint32_t len) | ||
397 | { | ||
398 | + /* TODO blkcfg must be little-endian for VIRTIO 1.0 */ | ||
399 | VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
400 | - VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
401 | - memcpy(config, &vdev_blk->blkcfg, len); | ||
402 | - | ||
403 | + VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server); | ||
404 | + memcpy(config, &vexp->blkcfg, len); | ||
405 | return 0; | ||
406 | } | ||
407 | |||
408 | static int | ||
409 | -vu_block_set_config(VuDev *vu_dev, const uint8_t *data, | ||
410 | +vu_blk_set_config(VuDev *vu_dev, const uint8_t *data, | ||
411 | uint32_t offset, uint32_t size, uint32_t flags) | ||
412 | { | ||
413 | VuServer *server = container_of(vu_dev, VuServer, vu_dev); | ||
414 | - VuBlockDev *vdev_blk = get_vu_block_device_by_server(server); | ||
415 | + VuBlkExport *vexp = container_of(server, VuBlkExport, vu_server); | ||
416 | uint8_t wce; | ||
417 | |||
418 | /* don't support live migration */ | ||
419 | @@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data, | ||
66 | } | 420 | } |
67 | 421 | ||
68 | - throttle_account(&backend->ts, true, ret); | 422 | wce = *data; |
69 | + throttle_account(&backend->ts, THROTTLE_WRITE, ret); | 423 | - vdev_blk->blkcfg.wce = wce; |
70 | 424 | - blk_set_enable_write_cache(vdev_blk->backend, wce); | |
71 | return cryptodev_backend_operation(backend, op_info); | 425 | + vexp->blkcfg.wce = wce; |
72 | } | 426 | + blk_set_enable_write_cache(vexp->export.blk, wce); |
73 | diff --git a/block/throttle-groups.c b/block/throttle-groups.c | 427 | return 0; |
428 | } | ||
429 | |||
430 | @@ -XXX,XX +XXX,XX @@ vu_block_set_config(VuDev *vu_dev, const uint8_t *data, | ||
431 | * of vu_process_message. | ||
432 | * | ||
433 | */ | ||
434 | -static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply) | ||
435 | +static int vu_blk_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply) | ||
436 | { | ||
437 | if (vmsg->request == VHOST_USER_NONE) { | ||
438 | dev->panic(dev, "disconnect"); | ||
439 | @@ -XXX,XX +XXX,XX @@ static int vu_block_process_msg(VuDev *dev, VhostUserMsg *vmsg, int *do_reply) | ||
440 | return false; | ||
441 | } | ||
442 | |||
443 | -static const VuDevIface vu_block_iface = { | ||
444 | - .get_features = vu_block_get_features, | ||
445 | - .queue_set_started = vu_block_queue_set_started, | ||
446 | - .get_protocol_features = vu_block_get_protocol_features, | ||
447 | - .get_config = vu_block_get_config, | ||
448 | - .set_config = vu_block_set_config, | ||
449 | - .process_msg = vu_block_process_msg, | ||
450 | +static const VuDevIface vu_blk_iface = { | ||
451 | + .get_features = vu_blk_get_features, | ||
452 | + .queue_set_started = vu_blk_queue_set_started, | ||
453 | + .get_protocol_features = vu_blk_get_protocol_features, | ||
454 | + .get_config = vu_blk_get_config, | ||
455 | + .set_config = vu_blk_set_config, | ||
456 | + .process_msg = vu_blk_process_msg, | ||
457 | }; | ||
458 | |||
459 | static void blk_aio_attached(AioContext *ctx, void *opaque) | ||
460 | { | ||
461 | - VuBlockDev *vub_dev = opaque; | ||
462 | - vhost_user_server_attach_aio_context(&vub_dev->vu_server, ctx); | ||
463 | + VuBlkExport *vexp = opaque; | ||
464 | + vhost_user_server_attach_aio_context(&vexp->vu_server, ctx); | ||
465 | } | ||
466 | |||
467 | static void blk_aio_detach(void *opaque) | ||
468 | { | ||
469 | - VuBlockDev *vub_dev = opaque; | ||
470 | - vhost_user_server_detach_aio_context(&vub_dev->vu_server); | ||
471 | + VuBlkExport *vexp = opaque; | ||
472 | + vhost_user_server_detach_aio_context(&vexp->vu_server); | ||
473 | } | ||
474 | |||
475 | static void | ||
476 | -vu_block_initialize_config(BlockDriverState *bs, | ||
477 | +vu_blk_initialize_config(BlockDriverState *bs, | ||
478 | struct virtio_blk_config *config, uint32_t blk_size) | ||
479 | { | ||
480 | config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS; | ||
481 | @@ -XXX,XX +XXX,XX @@ vu_block_initialize_config(BlockDriverState *bs, | ||
482 | config->max_write_zeroes_seg = 1; | ||
483 | } | ||
484 | |||
485 | -static VuBlockDev *vu_block_init(VuBlockDev *vu_block_device, Error **errp) | ||
486 | +static void vu_blk_exp_request_shutdown(BlockExport *exp) | ||
487 | { | ||
488 | + VuBlkExport *vexp = container_of(exp, VuBlkExport, export); | ||
489 | |||
490 | - BlockBackend *blk; | ||
491 | - Error *local_error = NULL; | ||
492 | - const char *node_name = vu_block_device->node_name; | ||
493 | - bool writable = vu_block_device->writable; | ||
494 | - uint64_t perm = BLK_PERM_CONSISTENT_READ; | ||
495 | - int ret; | ||
496 | - | ||
497 | - AioContext *ctx; | ||
498 | - | ||
499 | - BlockDriverState *bs = bdrv_lookup_bs(node_name, node_name, &local_error); | ||
500 | - | ||
501 | - if (!bs) { | ||
502 | - error_propagate(errp, local_error); | ||
503 | - return NULL; | ||
504 | - } | ||
505 | - | ||
506 | - if (bdrv_is_read_only(bs)) { | ||
507 | - writable = false; | ||
508 | - } | ||
509 | - | ||
510 | - if (writable) { | ||
511 | - perm |= BLK_PERM_WRITE; | ||
512 | - } | ||
513 | - | ||
514 | - ctx = bdrv_get_aio_context(bs); | ||
515 | - aio_context_acquire(ctx); | ||
516 | - bdrv_invalidate_cache(bs, NULL); | ||
517 | - aio_context_release(ctx); | ||
518 | - | ||
519 | - /* | ||
520 | - * Don't allow resize while the vhost user server is running, | ||
521 | - * otherwise we don't care what happens with the node. | ||
522 | - */ | ||
523 | - blk = blk_new(bdrv_get_aio_context(bs), perm, | ||
524 | - BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED | | ||
525 | - BLK_PERM_WRITE | BLK_PERM_GRAPH_MOD); | ||
526 | - ret = blk_insert_bs(blk, bs, errp); | ||
527 | - | ||
528 | - if (ret < 0) { | ||
529 | - goto fail; | ||
530 | - } | ||
531 | - | ||
532 | - blk_set_enable_write_cache(blk, false); | ||
533 | - | ||
534 | - blk_set_allow_aio_context_change(blk, true); | ||
535 | - | ||
536 | - vu_block_device->blkcfg.wce = 0; | ||
537 | - vu_block_device->backend = blk; | ||
538 | - if (!vu_block_device->blk_size) { | ||
539 | - vu_block_device->blk_size = BDRV_SECTOR_SIZE; | ||
540 | - } | ||
541 | - vu_block_device->blkcfg.blk_size = vu_block_device->blk_size; | ||
542 | - blk_set_guest_block_size(blk, vu_block_device->blk_size); | ||
543 | - vu_block_initialize_config(bs, &vu_block_device->blkcfg, | ||
544 | - vu_block_device->blk_size); | ||
545 | - return vu_block_device; | ||
546 | - | ||
547 | -fail: | ||
548 | - blk_unref(blk); | ||
549 | - return NULL; | ||
550 | -} | ||
551 | - | ||
552 | -static void vu_block_deinit(VuBlockDev *vu_block_device) | ||
553 | -{ | ||
554 | - if (vu_block_device->backend) { | ||
555 | - blk_remove_aio_context_notifier(vu_block_device->backend, blk_aio_attached, | ||
556 | - blk_aio_detach, vu_block_device); | ||
557 | - } | ||
558 | - | ||
559 | - blk_unref(vu_block_device->backend); | ||
560 | -} | ||
561 | - | ||
562 | -static void vhost_user_blk_server_stop(VuBlockDev *vu_block_device) | ||
563 | -{ | ||
564 | - vhost_user_server_stop(&vu_block_device->vu_server); | ||
565 | - vu_block_deinit(vu_block_device); | ||
566 | -} | ||
567 | - | ||
568 | -static void vhost_user_blk_server_start(VuBlockDev *vu_block_device, | ||
569 | - Error **errp) | ||
570 | -{ | ||
571 | - AioContext *ctx; | ||
572 | - SocketAddress *addr = vu_block_device->addr; | ||
573 | - | ||
574 | - if (!vu_block_init(vu_block_device, errp)) { | ||
575 | - return; | ||
576 | - } | ||
577 | - | ||
578 | - ctx = bdrv_get_aio_context(blk_bs(vu_block_device->backend)); | ||
579 | - | ||
580 | - if (!vhost_user_server_start(&vu_block_device->vu_server, addr, ctx, | ||
581 | - VHOST_USER_BLK_MAX_QUEUES, &vu_block_iface, | ||
582 | - errp)) { | ||
583 | - goto error; | ||
584 | - } | ||
585 | - | ||
586 | - blk_add_aio_context_notifier(vu_block_device->backend, blk_aio_attached, | ||
587 | - blk_aio_detach, vu_block_device); | ||
588 | - vu_block_device->running = true; | ||
589 | - return; | ||
590 | - | ||
591 | - error: | ||
592 | - vu_block_deinit(vu_block_device); | ||
593 | -} | ||
594 | - | ||
595 | -static bool vu_prop_modifiable(VuBlockDev *vus, Error **errp) | ||
596 | -{ | ||
597 | - if (vus->running) { | ||
598 | - error_setg(errp, "The property can't be modified " | ||
599 | - "while the server is running"); | ||
600 | - return false; | ||
601 | - } | ||
602 | - return true; | ||
603 | -} | ||
604 | - | ||
605 | -static void vu_set_node_name(Object *obj, const char *value, Error **errp) | ||
606 | -{ | ||
607 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
608 | - | ||
609 | - if (!vu_prop_modifiable(vus, errp)) { | ||
610 | - return; | ||
611 | - } | ||
612 | - | ||
613 | - if (vus->node_name) { | ||
614 | - g_free(vus->node_name); | ||
615 | - } | ||
616 | - | ||
617 | - vus->node_name = g_strdup(value); | ||
618 | -} | ||
619 | - | ||
620 | -static char *vu_get_node_name(Object *obj, Error **errp) | ||
621 | -{ | ||
622 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
623 | - return g_strdup(vus->node_name); | ||
624 | -} | ||
625 | - | ||
626 | -static void free_socket_addr(SocketAddress *addr) | ||
627 | -{ | ||
628 | - g_free(addr->u.q_unix.path); | ||
629 | - g_free(addr); | ||
630 | -} | ||
631 | - | ||
632 | -static void vu_set_unix_socket(Object *obj, const char *value, | ||
633 | - Error **errp) | ||
634 | -{ | ||
635 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
636 | - | ||
637 | - if (!vu_prop_modifiable(vus, errp)) { | ||
638 | - return; | ||
639 | - } | ||
640 | - | ||
641 | - if (vus->addr) { | ||
642 | - free_socket_addr(vus->addr); | ||
643 | - } | ||
644 | - | ||
645 | - SocketAddress *addr = g_new0(SocketAddress, 1); | ||
646 | - addr->type = SOCKET_ADDRESS_TYPE_UNIX; | ||
647 | - addr->u.q_unix.path = g_strdup(value); | ||
648 | - vus->addr = addr; | ||
649 | + vhost_user_server_stop(&vexp->vu_server); | ||
650 | } | ||
651 | |||
652 | -static char *vu_get_unix_socket(Object *obj, Error **errp) | ||
653 | +static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, | ||
654 | + Error **errp) | ||
655 | { | ||
656 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
657 | - return g_strdup(vus->addr->u.q_unix.path); | ||
658 | -} | ||
659 | - | ||
660 | -static bool vu_get_block_writable(Object *obj, Error **errp) | ||
661 | -{ | ||
662 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
663 | - return vus->writable; | ||
664 | -} | ||
665 | - | ||
666 | -static void vu_set_block_writable(Object *obj, bool value, Error **errp) | ||
667 | -{ | ||
668 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
669 | - | ||
670 | - if (!vu_prop_modifiable(vus, errp)) { | ||
671 | - return; | ||
672 | - } | ||
673 | - | ||
674 | - vus->writable = value; | ||
675 | -} | ||
676 | - | ||
677 | -static void vu_get_blk_size(Object *obj, Visitor *v, const char *name, | ||
678 | - void *opaque, Error **errp) | ||
679 | -{ | ||
680 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
681 | - uint32_t value = vus->blk_size; | ||
682 | - | ||
683 | - visit_type_uint32(v, name, &value, errp); | ||
684 | -} | ||
685 | - | ||
686 | -static void vu_set_blk_size(Object *obj, Visitor *v, const char *name, | ||
687 | - void *opaque, Error **errp) | ||
688 | -{ | ||
689 | - VuBlockDev *vus = VHOST_USER_BLK_SERVER(obj); | ||
690 | - | ||
691 | + VuBlkExport *vexp = container_of(exp, VuBlkExport, export); | ||
692 | + BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk; | ||
693 | Error *local_err = NULL; | ||
694 | - uint32_t value; | ||
695 | + uint64_t logical_block_size; | ||
696 | |||
697 | - if (!vu_prop_modifiable(vus, errp)) { | ||
698 | - return; | ||
699 | - } | ||
700 | + vexp->writable = opts->writable; | ||
701 | + vexp->blkcfg.wce = 0; | ||
702 | |||
703 | - visit_type_uint32(v, name, &value, &local_err); | ||
704 | - if (local_err) { | ||
705 | - goto out; | ||
706 | + if (vu_opts->has_logical_block_size) { | ||
707 | + logical_block_size = vu_opts->logical_block_size; | ||
708 | + } else { | ||
709 | + logical_block_size = BDRV_SECTOR_SIZE; | ||
710 | } | ||
711 | - | ||
712 | - check_block_size(object_get_typename(obj), name, value, &local_err); | ||
713 | + check_block_size(exp->id, "logical-block-size", logical_block_size, | ||
714 | + &local_err); | ||
715 | if (local_err) { | ||
716 | - goto out; | ||
717 | + error_propagate(errp, local_err); | ||
718 | + return -EINVAL; | ||
719 | + } | ||
720 | + vexp->blk_size = logical_block_size; | ||
721 | + blk_set_guest_block_size(exp->blk, logical_block_size); | ||
722 | + vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg, | ||
723 | + logical_block_size); | ||
724 | + | ||
725 | + blk_set_allow_aio_context_change(exp->blk, true); | ||
726 | + blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, | ||
727 | + vexp); | ||
728 | + | ||
729 | + if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx, | ||
730 | + VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface, | ||
731 | + errp)) { | ||
732 | + blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, | ||
733 | + blk_aio_detach, vexp); | ||
734 | + return -EADDRNOTAVAIL; | ||
735 | } | ||
736 | |||
737 | - vus->blk_size = value; | ||
738 | - | ||
739 | -out: | ||
740 | - error_propagate(errp, local_err); | ||
741 | -} | ||
742 | - | ||
743 | -static void vhost_user_blk_server_instance_finalize(Object *obj) | ||
744 | -{ | ||
745 | - VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj); | ||
746 | - | ||
747 | - vhost_user_blk_server_stop(vub); | ||
748 | - | ||
749 | - /* | ||
750 | - * Unlike object_property_add_str, object_class_property_add_str | ||
751 | - * doesn't have a release method. Thus manual memory freeing is | ||
752 | - * needed. | ||
753 | - */ | ||
754 | - free_socket_addr(vub->addr); | ||
755 | - g_free(vub->node_name); | ||
756 | -} | ||
757 | - | ||
758 | -static void vhost_user_blk_server_complete(UserCreatable *obj, Error **errp) | ||
759 | -{ | ||
760 | - VuBlockDev *vub = VHOST_USER_BLK_SERVER(obj); | ||
761 | - | ||
762 | - vhost_user_blk_server_start(vub, errp); | ||
763 | + return 0; | ||
764 | } | ||
765 | |||
766 | -static void vhost_user_blk_server_class_init(ObjectClass *klass, | ||
767 | - void *class_data) | ||
768 | +static void vu_blk_exp_delete(BlockExport *exp) | ||
769 | { | ||
770 | - UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass); | ||
771 | - ucc->complete = vhost_user_blk_server_complete; | ||
772 | - | ||
773 | - object_class_property_add_bool(klass, "writable", | ||
774 | - vu_get_block_writable, | ||
775 | - vu_set_block_writable); | ||
776 | - | ||
777 | - object_class_property_add_str(klass, "node-name", | ||
778 | - vu_get_node_name, | ||
779 | - vu_set_node_name); | ||
780 | - | ||
781 | - object_class_property_add_str(klass, "unix-socket", | ||
782 | - vu_get_unix_socket, | ||
783 | - vu_set_unix_socket); | ||
784 | + VuBlkExport *vexp = container_of(exp, VuBlkExport, export); | ||
785 | |||
786 | - object_class_property_add(klass, "logical-block-size", "uint32", | ||
787 | - vu_get_blk_size, vu_set_blk_size, | ||
788 | - NULL, NULL); | ||
789 | + blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, | ||
790 | + vexp); | ||
791 | } | ||
792 | |||
793 | -static const TypeInfo vhost_user_blk_server_info = { | ||
794 | - .name = TYPE_VHOST_USER_BLK_SERVER, | ||
795 | - .parent = TYPE_OBJECT, | ||
796 | - .instance_size = sizeof(VuBlockDev), | ||
797 | - .instance_finalize = vhost_user_blk_server_instance_finalize, | ||
798 | - .class_init = vhost_user_blk_server_class_init, | ||
799 | - .interfaces = (InterfaceInfo[]) { | ||
800 | - {TYPE_USER_CREATABLE}, | ||
801 | - {} | ||
802 | - }, | ||
803 | +const BlockExportDriver blk_exp_vhost_user_blk = { | ||
804 | + .type = BLOCK_EXPORT_TYPE_VHOST_USER_BLK, | ||
805 | + .instance_size = sizeof(VuBlkExport), | ||
806 | + .create = vu_blk_exp_create, | ||
807 | + .delete = vu_blk_exp_delete, | ||
808 | + .request_shutdown = vu_blk_exp_request_shutdown, | ||
809 | }; | ||
810 | - | ||
811 | -static void vhost_user_blk_server_register_types(void) | ||
812 | -{ | ||
813 | - type_register_static(&vhost_user_blk_server_info); | ||
814 | -} | ||
815 | - | ||
816 | -type_init(vhost_user_blk_server_register_types) | ||
817 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
74 | index XXXXXXX..XXXXXXX 100644 | 818 | index XXXXXXX..XXXXXXX 100644 |
75 | --- a/block/throttle-groups.c | 819 | --- a/util/vhost-user-server.c |
76 | +++ b/block/throttle-groups.c | 820 | +++ b/util/vhost-user-server.c |
77 | @@ -XXX,XX +XXX,XX @@ static bool throttle_group_schedule_timer(ThrottleGroupMember *tgm, | 821 | @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, |
78 | ThrottleState *ts = tgm->throttle_state; | 822 | Error **errp) |
79 | ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts); | 823 | { |
80 | ThrottleTimers *tt = &tgm->throttle_timers; | 824 | QEMUBH *bh; |
81 | + ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ; | 825 | - QIONetListener *listener = qio_net_listener_new(); |
82 | bool must_wait; | 826 | + QIONetListener *listener; |
83 | 827 | + | |
84 | if (qatomic_read(&tgm->io_limits_disabled)) { | 828 | + if (socket_addr->type != SOCKET_ADDRESS_TYPE_UNIX && |
85 | @@ -XXX,XX +XXX,XX @@ static bool throttle_group_schedule_timer(ThrottleGroupMember *tgm, | 829 | + socket_addr->type != SOCKET_ADDRESS_TYPE_FD) { |
86 | return true; | 830 | + error_setg(errp, "Only socket address types 'unix' and 'fd' are supported"); |
87 | } | 831 | + return false; |
88 | 832 | + } | |
89 | - must_wait = throttle_schedule_timer(ts, tt, is_write); | 833 | + |
90 | + must_wait = throttle_schedule_timer(ts, tt, direction); | 834 | + listener = qio_net_listener_new(); |
91 | 835 | if (qio_net_listener_open_sync(listener, socket_addr, 1, | |
92 | /* If a timer just got armed, set tgm as the current token */ | 836 | errp) < 0) { |
93 | if (must_wait) { | 837 | object_unref(OBJECT(listener)); |
94 | @@ -XXX,XX +XXX,XX @@ void coroutine_fn throttle_group_co_io_limits_intercept(ThrottleGroupMember *tgm | 838 | diff --git a/block/export/meson.build b/block/export/meson.build |
95 | bool must_wait; | ||
96 | ThrottleGroupMember *token; | ||
97 | ThrottleGroup *tg = container_of(tgm->throttle_state, ThrottleGroup, ts); | ||
98 | + ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ; | ||
99 | |||
100 | assert(bytes >= 0); | ||
101 | |||
102 | @@ -XXX,XX +XXX,XX @@ void coroutine_fn throttle_group_co_io_limits_intercept(ThrottleGroupMember *tgm | ||
103 | } | ||
104 | |||
105 | /* The I/O will be executed, so do the accounting */ | ||
106 | - throttle_account(tgm->throttle_state, is_write, bytes); | ||
107 | + throttle_account(tgm->throttle_state, direction, bytes); | ||
108 | |||
109 | /* Schedule the next request */ | ||
110 | schedule_next_request(tgm, is_write); | ||
111 | diff --git a/fsdev/qemu-fsdev-throttle.c b/fsdev/qemu-fsdev-throttle.c | ||
112 | index XXXXXXX..XXXXXXX 100644 | 839 | index XXXXXXX..XXXXXXX 100644 |
113 | --- a/fsdev/qemu-fsdev-throttle.c | 840 | --- a/block/export/meson.build |
114 | +++ b/fsdev/qemu-fsdev-throttle.c | 841 | +++ b/block/export/meson.build |
115 | @@ -XXX,XX +XXX,XX @@ void fsdev_throttle_init(FsThrottle *fst) | 842 | @@ -1 +1,2 @@ |
116 | void coroutine_fn fsdev_co_throttle_request(FsThrottle *fst, bool is_write, | 843 | block_ss.add(files('export.c')) |
117 | struct iovec *iov, int iovcnt) | 844 | +block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c')) |
118 | { | 845 | diff --git a/block/meson.build b/block/meson.build |
119 | + ThrottleDirection direction = is_write ? THROTTLE_WRITE : THROTTLE_READ; | ||
120 | + | ||
121 | if (throttle_enabled(&fst->cfg)) { | ||
122 | - if (throttle_schedule_timer(&fst->ts, &fst->tt, is_write) || | ||
123 | + if (throttle_schedule_timer(&fst->ts, &fst->tt, direction) || | ||
124 | !qemu_co_queue_empty(&fst->throttled_reqs[is_write])) { | ||
125 | qemu_co_queue_wait(&fst->throttled_reqs[is_write], NULL); | ||
126 | } | ||
127 | |||
128 | - throttle_account(&fst->ts, is_write, iov_size(iov, iovcnt)); | ||
129 | + throttle_account(&fst->ts, direction, iov_size(iov, iovcnt)); | ||
130 | |||
131 | if (!qemu_co_queue_empty(&fst->throttled_reqs[is_write]) && | ||
132 | - !throttle_schedule_timer(&fst->ts, &fst->tt, is_write)) { | ||
133 | + !throttle_schedule_timer(&fst->ts, &fst->tt, direction)) { | ||
134 | qemu_co_queue_next(&fst->throttled_reqs[is_write]); | ||
135 | } | ||
136 | } | ||
137 | diff --git a/tests/unit/test-throttle.c b/tests/unit/test-throttle.c | ||
138 | index XXXXXXX..XXXXXXX 100644 | 846 | index XXXXXXX..XXXXXXX 100644 |
139 | --- a/tests/unit/test-throttle.c | 847 | --- a/block/meson.build |
140 | +++ b/tests/unit/test-throttle.c | 848 | +++ b/block/meson.build |
141 | @@ -XXX,XX +XXX,XX @@ static bool do_test_accounting(bool is_ops, /* are we testing bps or ops */ | 849 | @@ -XXX,XX +XXX,XX @@ block_ss.add(when: 'CONFIG_WIN32', if_true: files('file-win32.c', 'win32-aio.c') |
142 | throttle_config(&ts, QEMU_CLOCK_VIRTUAL, &cfg); | 850 | block_ss.add(when: 'CONFIG_POSIX', if_true: [files('file-posix.c'), coref, iokit]) |
143 | 851 | block_ss.add(when: 'CONFIG_LIBISCSI', if_true: files('iscsi-opts.c')) | |
144 | /* account a read */ | 852 | block_ss.add(when: 'CONFIG_LINUX', if_true: files('nvme.c')) |
145 | - throttle_account(&ts, false, size); | 853 | -block_ss.add(when: 'CONFIG_LINUX', if_true: files('export/vhost-user-blk-server.c', '../contrib/libvhost-user/libvhost-user.c')) |
146 | + throttle_account(&ts, THROTTLE_READ, size); | 854 | block_ss.add(when: 'CONFIG_REPLICATION', if_true: files('replication.c')) |
147 | /* account a write */ | 855 | block_ss.add(when: 'CONFIG_SHEEPDOG', if_true: files('sheepdog.c')) |
148 | - throttle_account(&ts, true, size); | 856 | block_ss.add(when: ['CONFIG_LINUX_AIO', libaio], if_true: files('linux-aio.c')) |
149 | + throttle_account(&ts, THROTTLE_WRITE, size); | ||
150 | |||
151 | /* check total result */ | ||
152 | index = to_test[is_ops][0]; | ||
153 | diff --git a/util/throttle.c b/util/throttle.c | ||
154 | index XXXXXXX..XXXXXXX 100644 | ||
155 | --- a/util/throttle.c | ||
156 | +++ b/util/throttle.c | ||
157 | @@ -XXX,XX +XXX,XX @@ int64_t throttle_compute_wait(LeakyBucket *bkt) | ||
158 | |||
159 | /* This function computes the time that must be waited while this IO | ||
160 | * | ||
161 | - * @is_write: true if the current IO is a write, false if it's a read | ||
162 | + * @direction: throttle direction | ||
163 | * @ret: time to wait | ||
164 | */ | ||
165 | static int64_t throttle_compute_wait_for(ThrottleState *ts, | ||
166 | - bool is_write) | ||
167 | + ThrottleDirection direction) | ||
168 | { | ||
169 | BucketType to_check[2][4] = { {THROTTLE_BPS_TOTAL, | ||
170 | THROTTLE_OPS_TOTAL, | ||
171 | @@ -XXX,XX +XXX,XX @@ static int64_t throttle_compute_wait_for(ThrottleState *ts, | ||
172 | int i; | ||
173 | |||
174 | for (i = 0; i < 4; i++) { | ||
175 | - BucketType index = to_check[is_write][i]; | ||
176 | + BucketType index = to_check[direction][i]; | ||
177 | wait = throttle_compute_wait(&ts->cfg.buckets[index]); | ||
178 | if (wait > max_wait) { | ||
179 | max_wait = wait; | ||
180 | @@ -XXX,XX +XXX,XX @@ static int64_t throttle_compute_wait_for(ThrottleState *ts, | ||
181 | |||
182 | /* compute the timer for this type of operation | ||
183 | * | ||
184 | - * @is_write: the type of operation | ||
185 | + * @direction: throttle direction | ||
186 | * @now: the current clock timestamp | ||
187 | * @next_timestamp: the resulting timer | ||
188 | * @ret: true if a timer must be set | ||
189 | */ | ||
190 | static bool throttle_compute_timer(ThrottleState *ts, | ||
191 | - bool is_write, | ||
192 | + ThrottleDirection direction, | ||
193 | int64_t now, | ||
194 | int64_t *next_timestamp) | ||
195 | { | ||
196 | @@ -XXX,XX +XXX,XX @@ static bool throttle_compute_timer(ThrottleState *ts, | ||
197 | throttle_do_leak(ts, now); | ||
198 | |||
199 | /* compute the wait time if any */ | ||
200 | - wait = throttle_compute_wait_for(ts, is_write); | ||
201 | + wait = throttle_compute_wait_for(ts, direction); | ||
202 | |||
203 | /* if the code must wait compute when the next timer should fire */ | ||
204 | if (wait) { | ||
205 | @@ -XXX,XX +XXX,XX @@ void throttle_get_config(ThrottleState *ts, ThrottleConfig *cfg) | ||
206 | * NOTE: this function is not unit tested due to its usage of timer_mod | ||
207 | * | ||
208 | * @tt: the timers structure | ||
209 | - * @is_write: the type of operation (read/write) | ||
210 | + * @direction: throttle direction | ||
211 | * @ret: true if the timer has been scheduled else false | ||
212 | */ | ||
213 | bool throttle_schedule_timer(ThrottleState *ts, | ||
214 | ThrottleTimers *tt, | ||
215 | - bool is_write) | ||
216 | + ThrottleDirection direction) | ||
217 | { | ||
218 | int64_t now = qemu_clock_get_ns(tt->clock_type); | ||
219 | int64_t next_timestamp; | ||
220 | QEMUTimer *timer; | ||
221 | bool must_wait; | ||
222 | |||
223 | - timer = is_write ? tt->timers[THROTTLE_WRITE] : tt->timers[THROTTLE_READ]; | ||
224 | + assert(direction < THROTTLE_MAX); | ||
225 | + timer = tt->timers[direction]; | ||
226 | assert(timer); | ||
227 | |||
228 | must_wait = throttle_compute_timer(ts, | ||
229 | - is_write, | ||
230 | + direction, | ||
231 | now, | ||
232 | &next_timestamp); | ||
233 | |||
234 | @@ -XXX,XX +XXX,XX @@ bool throttle_schedule_timer(ThrottleState *ts, | ||
235 | |||
236 | /* do the accounting for this operation | ||
237 | * | ||
238 | - * @is_write: the type of operation (read/write) | ||
239 | + * @direction: throttle direction | ||
240 | * @size: the size of the operation | ||
241 | */ | ||
242 | -void throttle_account(ThrottleState *ts, bool is_write, uint64_t size) | ||
243 | +void throttle_account(ThrottleState *ts, ThrottleDirection direction, | ||
244 | + uint64_t size) | ||
245 | { | ||
246 | const BucketType bucket_types_size[2][2] = { | ||
247 | { THROTTLE_BPS_TOTAL, THROTTLE_BPS_READ }, | ||
248 | @@ -XXX,XX +XXX,XX @@ void throttle_account(ThrottleState *ts, bool is_write, uint64_t size) | ||
249 | double units = 1.0; | ||
250 | unsigned i; | ||
251 | |||
252 | + assert(direction < THROTTLE_MAX); | ||
253 | /* if cfg.op_size is defined and smaller than size we compute unit count */ | ||
254 | if (ts->cfg.op_size && size > ts->cfg.op_size) { | ||
255 | units = (double) size / ts->cfg.op_size; | ||
256 | @@ -XXX,XX +XXX,XX @@ void throttle_account(ThrottleState *ts, bool is_write, uint64_t size) | ||
257 | for (i = 0; i < 2; i++) { | ||
258 | LeakyBucket *bkt; | ||
259 | |||
260 | - bkt = &ts->cfg.buckets[bucket_types_size[is_write][i]]; | ||
261 | + bkt = &ts->cfg.buckets[bucket_types_size[direction][i]]; | ||
262 | bkt->level += size; | ||
263 | if (bkt->burst_length > 1) { | ||
264 | bkt->burst_level += size; | ||
265 | } | ||
266 | |||
267 | - bkt = &ts->cfg.buckets[bucket_types_units[is_write][i]]; | ||
268 | + bkt = &ts->cfg.buckets[bucket_types_units[direction][i]]; | ||
269 | bkt->level += units; | ||
270 | if (bkt->burst_length > 1) { | ||
271 | bkt->burst_level += units; | ||
272 | -- | 857 | -- |
273 | 2.41.0 | 858 | 2.26.2 |
859 | diff view generated by jsdifflib |
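As a usage illustration for the export conversion above: with the new BlockExportDriver, a vhost-user-blk export is created through the generic block export interface instead of a QOM object. A rough, untested QMP sketch (the export id, node name and socket path are placeholders; the option names follow the BlockExportOptionsVhostUserBlk fields read by vu_blk_exp_create() above):

    { "execute": "block-export-add",
      "arguments": {
          "type": "vhost-user-blk",
          "id": "vub0",
          "node-name": "disk0",
          "writable": true,
          "logical-block-size": 512,
          "addr": { "type": "unix", "path": "/tmp/vub0.sock" } } }

Per the check added to vhost_user_server_start(), only 'unix' and 'fd' socket address types are accepted for the addr field.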
New patch | |||
---|---|---|---|
1 | Headers used by other subsystems are located in include/. Also add the | ||
2 | vhost-user-server and vhost-user-blk-server headers to MAINTAINERS. | ||
1 | 3 | ||
4 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
5 | Message-id: 20200924151549.913737-13-stefanha@redhat.com | ||
6 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
7 | --- | ||
8 | MAINTAINERS | 4 +++- | ||
9 | {util => include/qemu}/vhost-user-server.h | 0 | ||
10 | block/export/vhost-user-blk-server.c | 2 +- | ||
11 | util/vhost-user-server.c | 2 +- | ||
12 | 4 files changed, 5 insertions(+), 3 deletions(-) | ||
13 | rename {util => include/qemu}/vhost-user-server.h (100%) | ||
14 | |||
15 | diff --git a/MAINTAINERS b/MAINTAINERS | ||
16 | index XXXXXXX..XXXXXXX 100644 | ||
17 | --- a/MAINTAINERS | ||
18 | +++ b/MAINTAINERS | ||
19 | @@ -XXX,XX +XXX,XX @@ Vhost-user block device backend server | ||
20 | M: Coiby Xu <Coiby.Xu@gmail.com> | ||
21 | S: Maintained | ||
22 | F: block/export/vhost-user-blk-server.c | ||
23 | -F: util/vhost-user-server.c | ||
24 | +F: block/export/vhost-user-blk-server.h | ||
25 | +F: include/qemu/vhost-user-server.h | ||
26 | F: tests/qtest/libqos/vhost-user-blk.c | ||
27 | +F: util/vhost-user-server.c | ||
28 | |||
29 | Replication | ||
30 | M: Wen Congyang <wencongyang2@huawei.com> | ||
31 | diff --git a/util/vhost-user-server.h b/include/qemu/vhost-user-server.h | ||
32 | similarity index 100% | ||
33 | rename from util/vhost-user-server.h | ||
34 | rename to include/qemu/vhost-user-server.h | ||
35 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
36 | index XXXXXXX..XXXXXXX 100644 | ||
37 | --- a/block/export/vhost-user-blk-server.c | ||
38 | +++ b/block/export/vhost-user-blk-server.c | ||
39 | @@ -XXX,XX +XXX,XX @@ | ||
40 | #include "block/block.h" | ||
41 | #include "contrib/libvhost-user/libvhost-user.h" | ||
42 | #include "standard-headers/linux/virtio_blk.h" | ||
43 | -#include "util/vhost-user-server.h" | ||
44 | +#include "qemu/vhost-user-server.h" | ||
45 | #include "vhost-user-blk-server.h" | ||
46 | #include "qapi/error.h" | ||
47 | #include "qom/object_interfaces.h" | ||
48 | diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c | ||
49 | index XXXXXXX..XXXXXXX 100644 | ||
50 | --- a/util/vhost-user-server.c | ||
51 | +++ b/util/vhost-user-server.c | ||
52 | @@ -XXX,XX +XXX,XX @@ | ||
53 | */ | ||
54 | #include "qemu/osdep.h" | ||
55 | #include "qemu/main-loop.h" | ||
56 | +#include "qemu/vhost-user-server.h" | ||
57 | #include "block/aio-wait.h" | ||
58 | -#include "vhost-user-server.h" | ||
59 | |||
60 | /* | ||
61 | * Theory of operation: | ||
62 | -- | ||
63 | 2.26.2 | ||
64 | diff view generated by jsdifflib |
New patch | |||
---|---|---|---|
1 | Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build | ||
2 | the static library once and then reuse it throughout QEMU. | ||
1 | 3 | ||
4 | Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the | ||
5 | vhost-user tools (vhost-user-gpu, etc) do. | ||
6 | |||
7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
8 | Message-id: 20200924151549.913737-14-stefanha@redhat.com | ||
9 | [Added CONFIG_LINUX again because libvhost-user doesn't build on macOS. | ||
10 | --Stefan] | ||
11 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
12 | --- | ||
13 | block/export/export.c | 8 ++++---- | ||
14 | block/export/meson.build | 2 +- | ||
15 | contrib/libvhost-user/meson.build | 1 + | ||
16 | meson.build | 6 +++++- | ||
17 | util/meson.build | 4 +++- | ||
18 | 5 files changed, 14 insertions(+), 7 deletions(-) | ||
19 | |||
20 | diff --git a/block/export/export.c b/block/export/export.c | ||
21 | index XXXXXXX..XXXXXXX 100644 | ||
22 | --- a/block/export/export.c | ||
23 | +++ b/block/export/export.c | ||
24 | @@ -XXX,XX +XXX,XX @@ | ||
25 | #include "sysemu/block-backend.h" | ||
26 | #include "block/export.h" | ||
27 | #include "block/nbd.h" | ||
28 | -#if CONFIG_LINUX | ||
29 | -#include "block/export/vhost-user-blk-server.h" | ||
30 | -#endif | ||
31 | #include "qapi/error.h" | ||
32 | #include "qapi/qapi-commands-block-export.h" | ||
33 | #include "qapi/qapi-events-block-export.h" | ||
34 | #include "qemu/id.h" | ||
35 | +#ifdef CONFIG_VHOST_USER | ||
36 | +#include "vhost-user-blk-server.h" | ||
37 | +#endif | ||
38 | |||
39 | static const BlockExportDriver *blk_exp_drivers[] = { | ||
40 | &blk_exp_nbd, | ||
41 | -#if CONFIG_LINUX | ||
42 | +#ifdef CONFIG_VHOST_USER | ||
43 | &blk_exp_vhost_user_blk, | ||
44 | #endif | ||
45 | }; | ||
46 | diff --git a/block/export/meson.build b/block/export/meson.build | ||
47 | index XXXXXXX..XXXXXXX 100644 | ||
48 | --- a/block/export/meson.build | ||
49 | +++ b/block/export/meson.build | ||
50 | @@ -XXX,XX +XXX,XX @@ | ||
51 | block_ss.add(files('export.c')) | ||
52 | -block_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-blk-server.c', '../../contrib/libvhost-user/libvhost-user.c')) | ||
53 | +block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c')) | ||
54 | diff --git a/contrib/libvhost-user/meson.build b/contrib/libvhost-user/meson.build | ||
55 | index XXXXXXX..XXXXXXX 100644 | ||
56 | --- a/contrib/libvhost-user/meson.build | ||
57 | +++ b/contrib/libvhost-user/meson.build | ||
58 | @@ -XXX,XX +XXX,XX @@ | ||
59 | libvhost_user = static_library('vhost-user', | ||
60 | files('libvhost-user.c', 'libvhost-user-glib.c'), | ||
61 | build_by_default: false) | ||
62 | +vhost_user = declare_dependency(link_with: libvhost_user) | ||
63 | diff --git a/meson.build b/meson.build | ||
64 | index XXXXXXX..XXXXXXX 100644 | ||
65 | --- a/meson.build | ||
66 | +++ b/meson.build | ||
67 | @@ -XXX,XX +XXX,XX @@ trace_events_subdirs += [ | ||
68 | 'util', | ||
69 | ] | ||
70 | |||
71 | +vhost_user = not_found | ||
72 | +if 'CONFIG_VHOST_USER' in config_host | ||
73 | + subdir('contrib/libvhost-user') | ||
74 | +endif | ||
75 | + | ||
76 | subdir('qapi') | ||
77 | subdir('qobject') | ||
78 | subdir('stubs') | ||
79 | @@ -XXX,XX +XXX,XX @@ if have_tools | ||
80 | install: true) | ||
81 | |||
82 | if 'CONFIG_VHOST_USER' in config_host | ||
83 | - subdir('contrib/libvhost-user') | ||
84 | subdir('contrib/vhost-user-blk') | ||
85 | subdir('contrib/vhost-user-gpu') | ||
86 | subdir('contrib/vhost-user-input') | ||
87 | diff --git a/util/meson.build b/util/meson.build | ||
88 | index XXXXXXX..XXXXXXX 100644 | ||
89 | --- a/util/meson.build | ||
90 | +++ b/util/meson.build | ||
91 | @@ -XXX,XX +XXX,XX @@ if have_block | ||
92 | util_ss.add(files('main-loop.c')) | ||
93 | util_ss.add(files('nvdimm-utils.c')) | ||
94 | util_ss.add(files('qemu-coroutine.c', 'qemu-coroutine-lock.c', 'qemu-coroutine-io.c')) | ||
95 | - util_ss.add(when: 'CONFIG_LINUX', if_true: files('vhost-user-server.c')) | ||
96 | + util_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: [ | ||
97 | + files('vhost-user-server.c'), vhost_user | ||
98 | + ]) | ||
99 | util_ss.add(files('block-helpers.c')) | ||
100 | util_ss.add(files('qemu-coroutine-sleep.c')) | ||
101 | util_ss.add(files('qemu-co-shared-resource.c')) | ||
102 | -- | ||
103 | 2.26.2 | ||
104 | diff view generated by jsdifflib |
1 | We duplicate the same condition three times here, pull it out to the top | 1 | Introduce libblkdev.fa to avoid recompiling blockdev_ss twice. |
---|---|---|---|
2 | level. | ||
3 | 2 | ||
4 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 3 | Suggested-by: Paolo Bonzini <pbonzini@redhat.com> |
5 | Message-Id: <20230824155345.109765-5-hreitz@redhat.com> | 4 | Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> |
6 | Reviewed-by: Sam Li <faithilikerun@gmail.com> | 5 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
6 | Message-id: 20200929125516.186715-3-stefanha@redhat.com | ||
7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
7 | --- | 8 | --- |
8 | block/file-posix.c | 18 +++++------------- | 9 | meson.build | 12 ++++++++++-- |
9 | 1 file changed, 5 insertions(+), 13 deletions(-) | 10 | storage-daemon/meson.build | 3 +-- |
11 | 2 files changed, 11 insertions(+), 4 deletions(-) | ||
10 | 12 | ||
11 | diff --git a/block/file-posix.c b/block/file-posix.c | 13 | diff --git a/meson.build b/meson.build |
12 | index XXXXXXX..XXXXXXX 100644 | 14 | index XXXXXXX..XXXXXXX 100644 |
13 | --- a/block/file-posix.c | 15 | --- a/meson.build |
14 | +++ b/block/file-posix.c | 16 | +++ b/meson.build |
15 | @@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset, | 17 | @@ -XXX,XX +XXX,XX @@ blockdev_ss.add(files( |
16 | 18 | # os-win32.c does not | |
17 | out: | 19 | blockdev_ss.add(when: 'CONFIG_POSIX', if_true: files('os-posix.c')) |
18 | #if defined(CONFIG_BLKZONED) | 20 | softmmu_ss.add(when: 'CONFIG_WIN32', if_true: [files('os-win32.c')]) |
19 | -{ | 21 | -softmmu_ss.add_all(blockdev_ss) |
20 | - BlockZoneWps *wps = bs->wps; | 22 | |
21 | - if (ret == 0) { | 23 | common_ss.add(files('cpus-common.c')) |
22 | - if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && | 24 | |
23 | - bs->bl.zoned != BLK_Z_NONE) { | 25 | @@ -XXX,XX +XXX,XX @@ block = declare_dependency(link_whole: [libblock], |
24 | + if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && | 26 | link_args: '@block.syms', |
25 | + bs->bl.zoned != BLK_Z_NONE) { | 27 | dependencies: [crypto, io]) |
26 | + BlockZoneWps *wps = bs->wps; | 28 | |
27 | + if (ret == 0) { | 29 | +blockdev_ss = blockdev_ss.apply(config_host, strict: false) |
28 | uint64_t *wp = &wps->wp[offset / bs->bl.zone_size]; | 30 | +libblockdev = static_library('blockdev', blockdev_ss.sources() + genh, |
29 | if (!BDRV_ZT_IS_CONV(*wp)) { | 31 | + dependencies: blockdev_ss.dependencies(), |
30 | if (type & QEMU_AIO_ZONE_APPEND) { | 32 | + name_suffix: 'fa', |
31 | @@ -XXX,XX +XXX,XX @@ out: | 33 | + build_by_default: false) |
32 | *wp = offset + bytes; | 34 | + |
33 | } | 35 | +blockdev = declare_dependency(link_whole: [libblockdev], |
34 | } | 36 | + dependencies: [block]) |
35 | - } | 37 | + |
36 | - } else { | 38 | qmp_ss = qmp_ss.apply(config_host, strict: false) |
37 | - if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && | 39 | libqmp = static_library('qmp', qmp_ss.sources() + genh, |
38 | - bs->bl.zoned != BLK_Z_NONE) { | 40 | dependencies: qmp_ss.dependencies(), |
39 | + } else { | 41 | @@ -XXX,XX +XXX,XX @@ foreach m : block_mods + softmmu_mods |
40 | update_zones_wp(bs, s->fd, 0, 1); | 42 | install_dir: config_host['qemu_moddir']) |
41 | } | 43 | endforeach |
42 | - } | 44 | |
43 | 45 | -softmmu_ss.add(authz, block, chardev, crypto, io, qmp) | |
44 | - if ((type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) && | 46 | +softmmu_ss.add(authz, blockdev, chardev, crypto, io, qmp) |
45 | - bs->blk.zoned != BLK_Z_NONE) { | 47 | common_ss.add(qom, qemuutil) |
46 | qemu_co_mutex_unlock(&wps->colock); | 48 | |
47 | } | 49 | common_ss.add_all(when: 'CONFIG_SOFTMMU', if_true: [softmmu_ss]) |
48 | -} | 50 | diff --git a/storage-daemon/meson.build b/storage-daemon/meson.build |
49 | #endif | 51 | index XXXXXXX..XXXXXXX 100644 |
50 | return ret; | 52 | --- a/storage-daemon/meson.build |
51 | } | 53 | +++ b/storage-daemon/meson.build |
54 | @@ -XXX,XX +XXX,XX @@ | ||
55 | qsd_ss = ss.source_set() | ||
56 | qsd_ss.add(files('qemu-storage-daemon.c')) | ||
57 | -qsd_ss.add(block, chardev, qmp, qom, qemuutil) | ||
58 | -qsd_ss.add_all(blockdev_ss) | ||
59 | +qsd_ss.add(blockdev, chardev, qmp, qom, qemuutil) | ||
60 | |||
61 | subdir('qapi') | ||
62 | |||
52 | -- | 63 | -- |
53 | 2.41.0 | 64 | 2.26.2 |
65 | diff view generated by jsdifflib |
1 | This is a regression test for | 1 | Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd. |
---|---|---|---|
2 | https://bugzilla.redhat.com/show_bug.cgi?id=2234374. | 2 | They are not used by other programs and are not otherwise needed in |
3 | libblock. | ||
3 | 4 | ||
4 | All this test needs to do is trigger an I/O error inside of file-posix | 5 | Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss. |
5 | (specifically raw_co_prw()). One reliable way to do this without | 6 | Since bdrv_close_all() (libblock) calls blk_exp_close_all() |
6 | requiring special privileges is to use a FUSE export, which allows us to | 7 | (libblockdev) a stub function is required.. |
7 | inject any error that we want, e.g. via blkdebug. | ||
8 | 8 | ||
9 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 9 | Make qemu-nbd.c use signal handling utility functions instead of |
10 | Message-Id: <20230824155345.109765-6-hreitz@redhat.com> | 10 | duplicating the code. This helps because os-posix.c is in libblockdev |
11 | [hreitz: Fixed test to be skipped when there is no FUSE support, to | 11 | and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks. |
12 | suppress fusermount's allow_other warning, and to be skipped | 12 | Once we use the signal handling utility functions we also end up |
13 | with $IMGOPTSSYNTAX enabled] | 13 | providing the necessary symbol. |
14 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 14 | |
15 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
16 | Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> | ||
17 | Reviewed-by: Eric Blake <eblake@redhat.com> | ||
18 | Message-id: 20200929125516.186715-4-stefanha@redhat.com | ||
19 | [Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake | ||
20 | --Stefan] | ||
21 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
15 | --- | 22 | --- |
16 | tests/qemu-iotests/tests/file-io-error | 119 +++++++++++++++++++++ | 23 | qemu-nbd.c | 21 ++++++++------------- |
17 | tests/qemu-iotests/tests/file-io-error.out | 33 ++++++ | 24 | stubs/blk-exp-close-all.c | 7 +++++++ |
18 | 2 files changed, 152 insertions(+) | 25 | block/export/meson.build | 4 ++-- |
19 | create mode 100755 tests/qemu-iotests/tests/file-io-error | 26 | meson.build | 4 ++-- |
20 | create mode 100644 tests/qemu-iotests/tests/file-io-error.out | 27 | nbd/meson.build | 2 ++ |
28 | stubs/meson.build | 1 + | ||
29 | 6 files changed, 22 insertions(+), 17 deletions(-) | ||
30 | create mode 100644 stubs/blk-exp-close-all.c | ||
21 | 31 | ||
22 | diff --git a/tests/qemu-iotests/tests/file-io-error b/tests/qemu-iotests/tests/file-io-error | 32 | diff --git a/qemu-nbd.c b/qemu-nbd.c |
23 | new file mode 100755 | 33 | index XXXXXXX..XXXXXXX 100644 |
24 | index XXXXXXX..XXXXXXX | 34 | --- a/qemu-nbd.c |
25 | --- /dev/null | 35 | +++ b/qemu-nbd.c |
26 | +++ b/tests/qemu-iotests/tests/file-io-error | ||
27 | @@ -XXX,XX +XXX,XX @@ | 36 | @@ -XXX,XX +XXX,XX @@ |
28 | +#!/usr/bin/env bash | 37 | #include "qapi/error.h" |
29 | +# group: rw | 38 | #include "qemu/cutils.h" |
30 | +# | 39 | #include "sysemu/block-backend.h" |
31 | +# Produce an I/O error in file-posix, and hope that it is not catastrophic. | 40 | +#include "sysemu/runstate.h" /* for qemu_system_killed() prototype */ |
32 | +# Regression test for: https://bugzilla.redhat.com/show_bug.cgi?id=2234374 | 41 | #include "block/block_int.h" |
33 | +# | 42 | #include "block/nbd.h" |
34 | +# Copyright (C) 2023 Red Hat, Inc. | 43 | #include "qemu/main-loop.h" |
35 | +# | 44 | @@ -XXX,XX +XXX,XX @@ QEMU_COPYRIGHT "\n" |
36 | +# This program is free software; you can redistribute it and/or modify | 45 | } |
37 | +# it under the terms of the GNU General Public License as published by | 46 | |
38 | +# the Free Software Foundation; either version 2 of the License, or | 47 | #ifdef CONFIG_POSIX |
39 | +# (at your option) any later version. | 48 | -static void termsig_handler(int signum) |
40 | +# | 49 | +/* |
41 | +# This program is distributed in the hope that it will be useful, | 50 | + * The client thread uses SIGTERM to interrupt the server. A signal |
42 | +# but WITHOUT ANY WARRANTY; without even the implied warranty of | 51 | + * handler ensures that "qemu-nbd -v -c" exits with a nice status code. |
43 | +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | 52 | + */ |
44 | +# GNU General Public License for more details. | 53 | +void qemu_system_killed(int signum, pid_t pid) |
45 | +# | 54 | { |
46 | +# You should have received a copy of the GNU General Public License | 55 | qatomic_cmpxchg(&state, RUNNING, TERMINATE); |
47 | +# along with this program. If not, see <http://www.gnu.org/licenses/>. | 56 | qemu_notify_event(); |
48 | +# | 57 | @@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv) |
49 | + | 58 | BlockExportOptions *export_opts; |
50 | +seq=$(basename "$0") | 59 | |
51 | +echo "QA output created by $seq" | 60 | #ifdef CONFIG_POSIX |
52 | + | 61 | - /* |
53 | +status=1 # failure is the default! | 62 | - * Exit gracefully on various signals, which includes SIGTERM used |
54 | + | 63 | - * by 'qemu-nbd -v -c'. |
55 | +_cleanup() | 64 | - */ |
56 | +{ | 65 | - struct sigaction sa_sigterm; |
57 | + _cleanup_qemu | 66 | - memset(&sa_sigterm, 0, sizeof(sa_sigterm)); |
58 | + rm -f "$TEST_DIR/fuse-export" | 67 | - sa_sigterm.sa_handler = termsig_handler; |
59 | +} | 68 | - sigaction(SIGTERM, &sa_sigterm, NULL); |
60 | +trap "_cleanup; exit \$status" 0 1 2 3 15 | 69 | - sigaction(SIGINT, &sa_sigterm, NULL); |
61 | + | 70 | - sigaction(SIGHUP, &sa_sigterm, NULL); |
62 | +# get standard environment, filters and checks | 71 | - |
63 | +. ../common.rc | 72 | - signal(SIGPIPE, SIG_IGN); |
64 | +. ../common.filter | 73 | + os_setup_early_signal_handling(); |
65 | +. ../common.qemu | 74 | + os_setup_signal_handling(); |
66 | + | 75 | #endif |
67 | +# Format-agnostic (we do not use any), but we do test the file protocol | 76 | |
68 | +_supported_proto file | 77 | socket_init(); |
69 | +_require_drivers blkdebug null-co | 78 | diff --git a/stubs/blk-exp-close-all.c b/stubs/blk-exp-close-all.c |
70 | + | ||
71 | +if [ "$IMGOPTSSYNTAX" = "true" ]; then | ||
72 | + # We need `$QEMU_IO -f file` to work; IMGOPTSSYNTAX uses --image-opts, | ||
73 | + # breaking -f. | ||
74 | + _unsupported_fmt $IMGFMT | ||
75 | +fi | ||
76 | + | ||
77 | +# This is a regression test of a bug in which flie-posix would access zone | ||
78 | +# information in case of an I/O error even when there is no zone information, | ||
79 | +# resulting in a division by zero. | ||
80 | +# To reproduce the problem, we need to trigger an I/O error inside of | ||
81 | +# file-posix, which can be done (rootless) by providing a FUSE export that | ||
82 | +# presents only errors when accessed. | ||
83 | + | ||
84 | +_launch_qemu | ||
85 | +_send_qemu_cmd $QEMU_HANDLE \ | ||
86 | + "{'execute': 'qmp_capabilities'}" \ | ||
87 | + 'return' | ||
88 | + | ||
89 | +_send_qemu_cmd $QEMU_HANDLE \ | ||
90 | + "{'execute': 'blockdev-add', | ||
91 | + 'arguments': { | ||
92 | + 'driver': 'blkdebug', | ||
93 | + 'node-name': 'node0', | ||
94 | + 'inject-error': [{'event': 'none'}], | ||
95 | + 'image': { | ||
96 | + 'driver': 'null-co' | ||
97 | + } | ||
98 | + }}" \ | ||
99 | + 'return' | ||
100 | + | ||
101 | +# FUSE mountpoint must exist and be a regular file | ||
102 | +touch "$TEST_DIR/fuse-export" | ||
103 | + | ||
104 | +# The grep -v to filter fusermount's (benign) error when /etc/fuse.conf does | ||
105 | +# not contain user_allow_other and the subsequent check for missing FUSE support | ||
106 | +# have both been taken from iotest 308. | ||
107 | +output=$(_send_qemu_cmd $QEMU_HANDLE \ | ||
108 | + "{'execute': 'block-export-add', | ||
109 | + 'arguments': { | ||
110 | + 'id': 'exp0', | ||
111 | + 'type': 'fuse', | ||
112 | + 'node-name': 'node0', | ||
113 | + 'mountpoint': '$TEST_DIR/fuse-export', | ||
114 | + 'writable': true | ||
115 | + }}" \ | ||
116 | + 'return' \ | ||
117 | + | grep -v 'option allow_other only allowed if') | ||
118 | + | ||
119 | +if echo "$output" | grep -q "Parameter 'type' does not accept value 'fuse'"; then | ||
120 | + _notrun 'No FUSE support' | ||
121 | +fi | ||
122 | +echo "$output" | ||
123 | + | ||
124 | +echo | ||
125 | +# This should fail, but gracefully, i.e. just print an I/O error, not crash. | ||
126 | +$QEMU_IO -f file -c 'write 0 64M' "$TEST_DIR/fuse-export" | _filter_qemu_io | ||
127 | +echo | ||
128 | + | ||
129 | +_send_qemu_cmd $QEMU_HANDLE \ | ||
130 | + "{'execute': 'block-export-del', | ||
131 | + 'arguments': {'id': 'exp0'}}" \ | ||
132 | + 'return' | ||
133 | + | ||
134 | +_send_qemu_cmd $QEMU_HANDLE \ | ||
135 | + '' \ | ||
136 | + 'BLOCK_EXPORT_DELETED' | ||
137 | + | ||
138 | +_send_qemu_cmd $QEMU_HANDLE \ | ||
139 | + "{'execute': 'blockdev-del', | ||
140 | + 'arguments': {'node-name': 'node0'}}" \ | ||
141 | + 'return' | ||
142 | + | ||
143 | +# success, all done | ||
144 | +echo "*** done" | ||
145 | +rm -f $seq.full | ||
146 | +status=0 | ||
147 | diff --git a/tests/qemu-iotests/tests/file-io-error.out b/tests/qemu-iotests/tests/file-io-error.out | ||
148 | new file mode 100644 | 79 | new file mode 100644 |
149 | index XXXXXXX..XXXXXXX | 80 | index XXXXXXX..XXXXXXX |
150 | --- /dev/null | 81 | --- /dev/null |
151 | +++ b/tests/qemu-iotests/tests/file-io-error.out | 82 | +++ b/stubs/blk-exp-close-all.c |
152 | @@ -XXX,XX +XXX,XX @@ | 83 | @@ -XXX,XX +XXX,XX @@ |
153 | +QA output created by file-io-error | 84 | +#include "qemu/osdep.h" |
154 | +{'execute': 'qmp_capabilities'} | 85 | +#include "block/export.h" |
155 | +{"return": {}} | ||
156 | +{'execute': 'blockdev-add', | ||
157 | + 'arguments': { | ||
158 | + 'driver': 'blkdebug', | ||
159 | + 'node-name': 'node0', | ||
160 | + 'inject-error': [{'event': 'none'}], | ||
161 | + 'image': { | ||
162 | + 'driver': 'null-co' | ||
163 | + } | ||
164 | + }} | ||
165 | +{"return": {}} | ||
166 | +{'execute': 'block-export-add', | ||
167 | + 'arguments': { | ||
168 | + 'id': 'exp0', | ||
169 | + 'type': 'fuse', | ||
170 | + 'node-name': 'node0', | ||
171 | + 'mountpoint': 'TEST_DIR/fuse-export', | ||
172 | + 'writable': true | ||
173 | + }} | ||
174 | +{"return": {}} | ||
175 | + | 86 | + |
176 | +write failed: Input/output error | 87 | +/* Only used in programs that support block exports (libblockdev.fa) */ |
177 | + | 88 | +void blk_exp_close_all(void) |
178 | +{'execute': 'block-export-del', | 89 | +{ |
179 | + 'arguments': {'id': 'exp0'}} | 90 | +} |
180 | +{"return": {}} | 91 | diff --git a/block/export/meson.build b/block/export/meson.build |
181 | +{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_EXPORT_DELETED", "data": {"id": "exp0"}} | 92 | index XXXXXXX..XXXXXXX 100644 |
182 | +{'execute': 'blockdev-del', | 93 | --- a/block/export/meson.build |
183 | + 'arguments': {'node-name': 'node0'}} | 94 | +++ b/block/export/meson.build |
184 | +{"return": {}} | 95 | @@ -XXX,XX +XXX,XX @@ |
185 | +*** done | 96 | -block_ss.add(files('export.c')) |
97 | -block_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c')) | ||
98 | +blockdev_ss.add(files('export.c')) | ||
99 | +blockdev_ss.add(when: ['CONFIG_LINUX', 'CONFIG_VHOST_USER'], if_true: files('vhost-user-blk-server.c')) | ||
100 | diff --git a/meson.build b/meson.build | ||
101 | index XXXXXXX..XXXXXXX 100644 | ||
102 | --- a/meson.build | ||
103 | +++ b/meson.build | ||
104 | @@ -XXX,XX +XXX,XX @@ subdir('dump') | ||
105 | |||
106 | block_ss.add(files( | ||
107 | 'block.c', | ||
108 | - 'blockdev-nbd.c', | ||
109 | 'blockjob.c', | ||
110 | 'job.c', | ||
111 | 'qemu-io-cmds.c', | ||
112 | @@ -XXX,XX +XXX,XX @@ subdir('block') | ||
113 | |||
114 | blockdev_ss.add(files( | ||
115 | 'blockdev.c', | ||
116 | + 'blockdev-nbd.c', | ||
117 | 'iothread.c', | ||
118 | 'job-qmp.c', | ||
119 | )) | ||
120 | @@ -XXX,XX +XXX,XX @@ if have_tools | ||
121 | qemu_io = executable('qemu-io', files('qemu-io.c'), | ||
122 | dependencies: [block, qemuutil], install: true) | ||
123 | qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'), | ||
124 | - dependencies: [block, qemuutil], install: true) | ||
125 | + dependencies: [blockdev, qemuutil], install: true) | ||
126 | |||
127 | subdir('storage-daemon') | ||
128 | subdir('contrib/rdmacm-mux') | ||
129 | diff --git a/nbd/meson.build b/nbd/meson.build | ||
130 | index XXXXXXX..XXXXXXX 100644 | ||
131 | --- a/nbd/meson.build | ||
132 | +++ b/nbd/meson.build | ||
133 | @@ -XXX,XX +XXX,XX @@ | ||
134 | block_ss.add(files( | ||
135 | 'client.c', | ||
136 | 'common.c', | ||
137 | +)) | ||
138 | +blockdev_ss.add(files( | ||
139 | 'server.c', | ||
140 | )) | ||
141 | diff --git a/stubs/meson.build b/stubs/meson.build | ||
142 | index XXXXXXX..XXXXXXX 100644 | ||
143 | --- a/stubs/meson.build | ||
144 | +++ b/stubs/meson.build | ||
145 | @@ -XXX,XX +XXX,XX @@ | ||
146 | stub_ss.add(files('arch_type.c')) | ||
147 | stub_ss.add(files('bdrv-next-monitor-owned.c')) | ||
148 | stub_ss.add(files('blk-commit-all.c')) | ||
149 | +stub_ss.add(files('blk-exp-close-all.c')) | ||
150 | stub_ss.add(files('blockdev-close-all-bdrv-states.c')) | ||
151 | stub_ss.add(files('change-state-handler.c')) | ||
152 | stub_ss.add(files('cmos.c')) | ||
186 | -- | 153 | -- |
187 | 2.41.0 | 154 | 2.26.2 |
155 | diff view generated by jsdifflib |
1 | From: zhenwei pi <pizhenwei@bytedance.com> | 1 | Make it possible to specify the iothread where the export will run. By |
---|---|---|---|
2 | default the block node can be moved to other AioContexts later and the | ||
3 | export will follow. The fixed-iothread option forces strict behavior | ||
4 | that prevents changing AioContext while the export is active. See the | ||
5 | QAPI docs for details. | ||
2 | 6 | ||
3 | Reviewed-by: Alberto Garcia <berto@igalia.com> | 7 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
4 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 8 | Message-id: 20200929125516.186715-5-stefanha@redhat.com |
5 | Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> | 9 | [Fix stray '#' character in block-export.json and add missing "(since: |
6 | Message-Id: <20230728022006.1098509-5-pizhenwei@bytedance.com> | 10 | 5.2)" as suggested by Eric Blake. |
7 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 11 | --Stefan] |
12 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
8 | --- | 13 | --- |
9 | tests/unit/test-throttle.c | 66 ++++++++++++++++++++++++++++++++++++++ | 14 | qapi/block-export.json | 11 ++++++++++ |
10 | 1 file changed, 66 insertions(+) | 15 | block/export/export.c | 31 +++++++++++++++++++++++++++- |
16 | block/export/vhost-user-blk-server.c | 5 ++++- | ||
17 | nbd/server.c | 2 -- | ||
18 | 4 files changed, 45 insertions(+), 4 deletions(-) | ||
11 | 19 | ||
12 | diff --git a/tests/unit/test-throttle.c b/tests/unit/test-throttle.c | 20 | diff --git a/qapi/block-export.json b/qapi/block-export.json |
13 | index XXXXXXX..XXXXXXX 100644 | 21 | index XXXXXXX..XXXXXXX 100644 |
14 | --- a/tests/unit/test-throttle.c | 22 | --- a/qapi/block-export.json |
15 | +++ b/tests/unit/test-throttle.c | 23 | +++ b/qapi/block-export.json |
16 | @@ -XXX,XX +XXX,XX @@ static void test_init(void) | 24 | @@ -XXX,XX +XXX,XX @@ |
17 | throttle_timers_destroy(tt); | 25 | # export before completion is signalled. (since: 5.2; |
18 | } | 26 | # default: false) |
19 | 27 | # | |
20 | +static void test_init_readonly(void) | 28 | +# @iothread: The name of the iothread object where the export will run. The |
21 | +{ | 29 | +# default is to use the thread currently associated with the |
22 | + int i; | 30 | +# block node. (since: 5.2) |
31 | +# | ||
32 | +# @fixed-iothread: True prevents the block node from being moved to another | ||
33 | +# thread while the export is active. If true and @iothread is | ||
34 | +# given, export creation fails if the block node cannot be | ||
35 | +# moved to the iothread. The default is false. (since: 5.2) | ||
36 | +# | ||
37 | # Since: 4.2 | ||
38 | ## | ||
39 | { 'union': 'BlockExportOptions', | ||
40 | 'base': { 'type': 'BlockExportType', | ||
41 | 'id': 'str', | ||
42 | + '*fixed-iothread': 'bool', | ||
43 | + '*iothread': 'str', | ||
44 | 'node-name': 'str', | ||
45 | '*writable': 'bool', | ||
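To illustrate the two new members documented above, a minimal sketch (the iothread and export names are hypothetical, the iothread object would be created beforehand, e.g. with object-add, and export-type-specific options are omitted):

    { "execute": "block-export-add",
      "arguments": {
          "type": "nbd",
          "id": "exp0",
          "node-name": "disk0",
          "iothread": "iothread1",
          "fixed-iothread": true } }

With fixed-iothread set to true, export creation fails if disk0 cannot be moved to iothread1; with the default of false, the export simply stays in the node's current AioContext when the move is not possible.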
46 | '*writethrough': 'bool' }, | ||
47 | diff --git a/block/export/export.c b/block/export/export.c | ||
48 | index XXXXXXX..XXXXXXX 100644 | ||
49 | --- a/block/export/export.c | ||
50 | +++ b/block/export/export.c | ||
51 | @@ -XXX,XX +XXX,XX @@ | ||
52 | |||
53 | #include "block/block.h" | ||
54 | #include "sysemu/block-backend.h" | ||
55 | +#include "sysemu/iothread.h" | ||
56 | #include "block/export.h" | ||
57 | #include "block/nbd.h" | ||
58 | #include "qapi/error.h" | ||
59 | @@ -XXX,XX +XXX,XX @@ static const BlockExportDriver *blk_exp_find_driver(BlockExportType type) | ||
60 | |||
61 | BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp) | ||
62 | { | ||
63 | + bool fixed_iothread = export->has_fixed_iothread && export->fixed_iothread; | ||
64 | const BlockExportDriver *drv; | ||
65 | BlockExport *exp = NULL; | ||
66 | BlockDriverState *bs; | ||
67 | - BlockBackend *blk; | ||
68 | + BlockBackend *blk = NULL; | ||
69 | AioContext *ctx; | ||
70 | uint64_t perm; | ||
71 | int ret; | ||
72 | @@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp) | ||
73 | ctx = bdrv_get_aio_context(bs); | ||
74 | aio_context_acquire(ctx); | ||
75 | |||
76 | + if (export->has_iothread) { | ||
77 | + IOThread *iothread; | ||
78 | + AioContext *new_ctx; | ||
23 | + | 79 | + |
24 | + tt = &tgm.throttle_timers; | 80 | + iothread = iothread_by_id(export->iothread); |
81 | + if (!iothread) { | ||
82 | + error_setg(errp, "iothread \"%s\" not found", export->iothread); | ||
83 | + goto fail; | ||
84 | + } | ||
25 | + | 85 | + |
26 | + /* fill the structures with crap */ | 86 | + new_ctx = iothread_get_aio_context(iothread); |
27 | + memset(&ts, 1, sizeof(ts)); | ||
28 | + memset(tt, 1, sizeof(*tt)); | ||
29 | + | 87 | + |
30 | + /* init structures */ | 88 | + ret = bdrv_try_set_aio_context(bs, new_ctx, errp); |
31 | + throttle_init(&ts); | 89 | + if (ret == 0) { |
32 | + throttle_timers_init(tt, ctx, QEMU_CLOCK_VIRTUAL, | 90 | + aio_context_release(ctx); |
33 | + read_timer_cb, NULL, &ts); | 91 | + aio_context_acquire(new_ctx); |
34 | + | 92 | + ctx = new_ctx; |
35 | + /* check initialized fields */ | 93 | + } else if (fixed_iothread) { |
36 | + g_assert(tt->clock_type == QEMU_CLOCK_VIRTUAL); | 94 | + goto fail; |
37 | + g_assert(tt->timers[THROTTLE_READ]); | 95 | + } |
38 | + g_assert(!tt->timers[THROTTLE_WRITE]); | ||
39 | + | ||
40 | + /* check other fields were cleared */ | ||
41 | + g_assert(!ts.previous_leak); | ||
42 | + g_assert(!ts.cfg.op_size); | ||
43 | + for (i = 0; i < BUCKETS_COUNT; i++) { | ||
44 | + g_assert(!ts.cfg.buckets[i].avg); | ||
45 | + g_assert(!ts.cfg.buckets[i].max); | ||
46 | + g_assert(!ts.cfg.buckets[i].level); | ||
47 | + } | 96 | + } |
48 | + | 97 | + |
49 | + throttle_timers_destroy(tt); | 98 | /* |
50 | +} | 99 | * Block exports are used for non-shared storage migration. Make sure |
100 | * that BDRV_O_INACTIVE is cleared and the image is ready for write | ||
101 | @@ -XXX,XX +XXX,XX @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp) | ||
102 | } | ||
103 | |||
104 | blk = blk_new(ctx, perm, BLK_PERM_ALL); | ||
51 | + | 105 | + |
52 | +static void test_init_writeonly(void) | 106 | + if (!fixed_iothread) { |
53 | +{ | 107 | + blk_set_allow_aio_context_change(blk, true); |
54 | + int i; | ||
55 | + | ||
56 | + tt = &tgm.throttle_timers; | ||
57 | + | ||
58 | + /* fill the structures with crap */ | ||
59 | + memset(&ts, 1, sizeof(ts)); | ||
60 | + memset(tt, 1, sizeof(*tt)); | ||
61 | + | ||
62 | + /* init structures */ | ||
63 | + throttle_init(&ts); | ||
64 | + throttle_timers_init(tt, ctx, QEMU_CLOCK_VIRTUAL, | ||
65 | + NULL, write_timer_cb, &ts); | ||
66 | + | ||
67 | + /* check initialized fields */ | ||
68 | + g_assert(tt->clock_type == QEMU_CLOCK_VIRTUAL); | ||
69 | + g_assert(!tt->timers[THROTTLE_READ]); | ||
70 | + g_assert(tt->timers[THROTTLE_WRITE]); | ||
71 | + | ||
72 | + /* check other fields were cleared */ | ||
73 | + g_assert(!ts.previous_leak); | ||
74 | + g_assert(!ts.cfg.op_size); | ||
75 | + for (i = 0; i < BUCKETS_COUNT; i++) { | ||
76 | + g_assert(!ts.cfg.buckets[i].avg); | ||
77 | + g_assert(!ts.cfg.buckets[i].max); | ||
78 | + g_assert(!ts.cfg.buckets[i].level); | ||
79 | + } | 108 | + } |
80 | + | 109 | + |
81 | + throttle_timers_destroy(tt); | 110 | ret = blk_insert_bs(blk, bs, errp); |
82 | +} | 111 | if (ret < 0) { |
112 | goto fail; | ||
113 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
114 | index XXXXXXX..XXXXXXX 100644 | ||
115 | --- a/block/export/vhost-user-blk-server.c | ||
116 | +++ b/block/export/vhost-user-blk-server.c | ||
117 | @@ -XXX,XX +XXX,XX @@ static const VuDevIface vu_blk_iface = { | ||
118 | static void blk_aio_attached(AioContext *ctx, void *opaque) | ||
119 | { | ||
120 | VuBlkExport *vexp = opaque; | ||
83 | + | 121 | + |
84 | static void test_destroy(void) | 122 | + vexp->export.ctx = ctx; |
123 | vhost_user_server_attach_aio_context(&vexp->vu_server, ctx); | ||
124 | } | ||
125 | |||
126 | static void blk_aio_detach(void *opaque) | ||
85 | { | 127 | { |
86 | int i; | 128 | VuBlkExport *vexp = opaque; |
87 | @@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv) | 129 | + |
88 | g_test_add_func("/throttle/leak_bucket", test_leak_bucket); | 130 | vhost_user_server_detach_aio_context(&vexp->vu_server); |
89 | g_test_add_func("/throttle/compute_wait", test_compute_wait); | 131 | + vexp->export.ctx = NULL; |
90 | g_test_add_func("/throttle/init", test_init); | 132 | } |
91 | + g_test_add_func("/throttle/init_readonly", test_init_readonly); | 133 | |
92 | + g_test_add_func("/throttle/init_writeonly", test_init_writeonly); | 134 | static void |
93 | g_test_add_func("/throttle/destroy", test_destroy); | 135 | @@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, |
94 | g_test_add_func("/throttle/have_timer", test_have_timer); | 136 | vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg, |
95 | g_test_add_func("/throttle/detach_attach", test_detach_attach); | 137 | logical_block_size); |
138 | |||
139 | - blk_set_allow_aio_context_change(exp->blk, true); | ||
140 | blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, | ||
141 | vexp); | ||
142 | |||
143 | diff --git a/nbd/server.c b/nbd/server.c | ||
144 | index XXXXXXX..XXXXXXX 100644 | ||
145 | --- a/nbd/server.c | ||
146 | +++ b/nbd/server.c | ||
147 | @@ -XXX,XX +XXX,XX @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args, | ||
148 | return ret; | ||
149 | } | ||
150 | |||
151 | - blk_set_allow_aio_context_change(blk, true); | ||
152 | - | ||
153 | QTAILQ_INIT(&exp->clients); | ||
154 | exp->name = g_strdup(arg->name); | ||
155 | exp->description = g_strdup(arg->description); | ||
96 | -- | 156 | -- |
97 | 2.41.0 | 157 | 2.26.2 |
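As an aside on the API these tests exercise: a caller that only ever throttles one direction can now pass NULL for the unused callback. A minimal sketch, assuming the usual throttle headers; the function name and callback are illustrative, not taken from the series:

    #include "qemu/osdep.h"
    #include "qemu/throttle.h"

    /* Set up throttling for a read-only device.  Passing NULL as the write
     * callback means only tt->timers[THROTTLE_READ] gets created. */
    static void example_readonly_throttle_setup(ThrottleState *ts,
                                                ThrottleTimers *tt,
                                                AioContext *ctx,
                                                QEMUTimerCB *read_cb,
                                                void *opaque)
    {
        throttle_init(ts);
        throttle_timers_init(tt, ctx, QEMU_CLOCK_VIRTUAL,
                             read_cb, NULL /* no write throttling */, opaque);
    }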
1 | bs->bl.zoned is what indicates whether the zone information is present | 1 | Allow the number of queues to be configured using --export |
---|---|---|---|
2 | and valid; it is the only thing that raw_refresh_zoned_limits() sets if | 2 | vhost-user-blk,num-queues=N. This setting should match the QEMU --device |
3 | CONFIG_BLKZONED is not defined, and it is also the only thing that it | 3 | vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers |
4 | sets if CONFIG_BLKZONED is defined, but there are no zones. | 4 | its own value if the vhost-user-blk backend offers fewer queues than |
5 | QEMU. | ||
5 | 6 | ||
6 | Make sure that it is always set to BLK_Z_NONE if there is an error | 7 | The vhost-user-blk-server.c code is already capable of multi-queue. All |
7 | anywhere in raw_refresh_zoned_limits() so that we do not accidentally | 8 | virtqueue processing runs in the same AioContext. No new locking is |
8 | announce zones while our information is incomplete or invalid. | 9 | needed. |
9 | 10 | ||
10 | This also fixes a memory leak in the last error path in | 11 | Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit. |
11 | raw_refresh_zoned_limits(). | 12 | Note that the feature bit only announces the presence of the num_queues |
13 | configuration space field. It does not promise that there is more than 1 | ||
14 | virtqueue, so we can set it unconditionally. | ||
12 | 15 | ||
13 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 16 | I tested multi-queue by running a random read fio test with numjobs=4 on |
14 | Message-Id: <20230824155345.109765-2-hreitz@redhat.com> | 17 | an -smp 4 guest. After the benchmark finished the guest /proc/interrupts |
15 | Reviewed-by: Sam Li <faithilikerun@gmail.com> | 18 | file showed activity on all 4 virtio-blk MSI-X. The /sys/block/vda/mq/ |
19 | directory shows that Linux blk-mq has 4 queues configured. | ||
20 | |||
21 | An automated test is included in the next commit. | ||
22 | |||
23 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
24 | Acked-by: Markus Armbruster <armbru@redhat.com> | ||
25 | Message-id: 20201001144604.559733-2-stefanha@redhat.com | ||
26 | [Fixed accidental tab characters as suggested by Markus Armbruster | ||
27 | --Stefan] | ||
28 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
16 | --- | 29 | --- |
17 | block/file-posix.c | 21 ++++++++++++--------- | 30 | qapi/block-export.json | 10 +++++++--- |
18 | 1 file changed, 12 insertions(+), 9 deletions(-) | 31 | block/export/vhost-user-blk-server.c | 24 ++++++++++++++++++------ |
32 | 2 files changed, 25 insertions(+), 9 deletions(-) | ||
19 | 33 | ||
20 | diff --git a/block/file-posix.c b/block/file-posix.c | 34 | diff --git a/qapi/block-export.json b/qapi/block-export.json |
21 | index XXXXXXX..XXXXXXX 100644 | 35 | index XXXXXXX..XXXXXXX 100644 |
22 | --- a/block/file-posix.c | 36 | --- a/qapi/block-export.json |
23 | +++ b/block/file-posix.c | 37 | +++ b/qapi/block-export.json |
24 | @@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st, | 38 | @@ -XXX,XX +XXX,XX @@ |
25 | BlockZoneModel zoned; | 39 | # SocketAddress types are supported. Passed fds must be UNIX domain |
26 | int ret; | 40 | # sockets. |
27 | 41 | # @logical-block-size: Logical block size in bytes. Defaults to 512 bytes. | |
28 | - bs->bl.zoned = BLK_Z_NONE; | 42 | +# @num-queues: Number of request virtqueues. Must be greater than 0. Defaults |
29 | - | 43 | +# to 1. |
30 | ret = get_sysfs_zoned_model(st, &zoned); | 44 | # |
31 | if (ret < 0 || zoned == BLK_Z_NONE) { | 45 | # Since: 5.2 |
32 | - return; | 46 | ## |
33 | + goto no_zoned; | 47 | { 'struct': 'BlockExportOptionsVhostUserBlk', |
48 | - 'data': { 'addr': 'SocketAddress', '*logical-block-size': 'size' } } | ||
49 | + 'data': { 'addr': 'SocketAddress', | ||
50 | + '*logical-block-size': 'size', | ||
51 | + '*num-queues': 'uint16'} } | ||
52 | |||
53 | ## | ||
54 | # @NbdServerAddOptions: | ||
55 | @@ -XXX,XX +XXX,XX @@ | ||
56 | { 'union': 'BlockExportOptions', | ||
57 | 'base': { 'type': 'BlockExportType', | ||
58 | 'id': 'str', | ||
59 | - '*fixed-iothread': 'bool', | ||
60 | - '*iothread': 'str', | ||
61 | + '*fixed-iothread': 'bool', | ||
62 | + '*iothread': 'str', | ||
63 | 'node-name': 'str', | ||
64 | '*writable': 'bool', | ||
65 | '*writethrough': 'bool' }, | ||
66 | diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c | ||
67 | index XXXXXXX..XXXXXXX 100644 | ||
68 | --- a/block/export/vhost-user-blk-server.c | ||
69 | +++ b/block/export/vhost-user-blk-server.c | ||
70 | @@ -XXX,XX +XXX,XX @@ | ||
71 | #include "util/block-helpers.h" | ||
72 | |||
73 | enum { | ||
74 | - VHOST_USER_BLK_MAX_QUEUES = 1, | ||
75 | + VHOST_USER_BLK_NUM_QUEUES_DEFAULT = 1, | ||
76 | }; | ||
77 | struct virtio_blk_inhdr { | ||
78 | unsigned char status; | ||
79 | @@ -XXX,XX +XXX,XX @@ static uint64_t vu_blk_get_features(VuDev *dev) | ||
80 | 1ull << VIRTIO_BLK_F_DISCARD | | ||
81 | 1ull << VIRTIO_BLK_F_WRITE_ZEROES | | ||
82 | 1ull << VIRTIO_BLK_F_CONFIG_WCE | | ||
83 | + 1ull << VIRTIO_BLK_F_MQ | | ||
84 | 1ull << VIRTIO_F_VERSION_1 | | ||
85 | 1ull << VIRTIO_RING_F_INDIRECT_DESC | | ||
86 | 1ull << VIRTIO_RING_F_EVENT_IDX | | ||
87 | @@ -XXX,XX +XXX,XX @@ static void blk_aio_detach(void *opaque) | ||
88 | |||
89 | static void | ||
90 | vu_blk_initialize_config(BlockDriverState *bs, | ||
91 | - struct virtio_blk_config *config, uint32_t blk_size) | ||
92 | + struct virtio_blk_config *config, | ||
93 | + uint32_t blk_size, | ||
94 | + uint16_t num_queues) | ||
95 | { | ||
96 | config->capacity = bdrv_getlength(bs) >> BDRV_SECTOR_BITS; | ||
97 | config->blk_size = blk_size; | ||
98 | @@ -XXX,XX +XXX,XX @@ vu_blk_initialize_config(BlockDriverState *bs, | ||
99 | config->seg_max = 128 - 2; | ||
100 | config->min_io_size = 1; | ||
101 | config->opt_io_size = 1; | ||
102 | - config->num_queues = VHOST_USER_BLK_MAX_QUEUES; | ||
103 | + config->num_queues = num_queues; | ||
104 | config->max_discard_sectors = 32768; | ||
105 | config->max_discard_seg = 1; | ||
106 | config->discard_sector_alignment = config->blk_size >> 9; | ||
107 | @@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, | ||
108 | BlockExportOptionsVhostUserBlk *vu_opts = &opts->u.vhost_user_blk; | ||
109 | Error *local_err = NULL; | ||
110 | uint64_t logical_block_size; | ||
111 | + uint16_t num_queues = VHOST_USER_BLK_NUM_QUEUES_DEFAULT; | ||
112 | |||
113 | vexp->writable = opts->writable; | ||
114 | vexp->blkcfg.wce = 0; | ||
115 | @@ -XXX,XX +XXX,XX @@ static int vu_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, | ||
34 | } | 116 | } |
35 | bs->bl.zoned = zoned; | 117 | vexp->blk_size = logical_block_size; |
36 | 118 | blk_set_guest_block_size(exp->blk, logical_block_size); | |
37 | @@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st, | ||
38 | if (ret < 0) { | ||
39 | error_setg_errno(errp, -ret, "Unable to read chunk_sectors " | ||
40 | "sysfs attribute"); | ||
41 | - return; | ||
42 | + goto no_zoned; | ||
43 | } else if (!ret) { | ||
44 | error_setg(errp, "Read 0 from chunk_sectors sysfs attribute"); | ||
45 | - return; | ||
46 | + goto no_zoned; | ||
47 | } | ||
48 | bs->bl.zone_size = ret << BDRV_SECTOR_BITS; | ||
49 | |||
50 | @@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st, | ||
51 | if (ret < 0) { | ||
52 | error_setg_errno(errp, -ret, "Unable to read nr_zones " | ||
53 | "sysfs attribute"); | ||
54 | - return; | ||
55 | + goto no_zoned; | ||
56 | } else if (!ret) { | ||
57 | error_setg(errp, "Read 0 from nr_zones sysfs attribute"); | ||
58 | - return; | ||
59 | + goto no_zoned; | ||
60 | } | ||
61 | bs->bl.nr_zones = ret; | ||
62 | |||
63 | @@ -XXX,XX +XXX,XX @@ static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st, | ||
64 | ret = get_zones_wp(bs, s->fd, 0, bs->bl.nr_zones, 0); | ||
65 | if (ret < 0) { | ||
66 | error_setg_errno(errp, -ret, "report wps failed"); | ||
67 | - bs->wps = NULL; | ||
68 | - return; | ||
69 | + goto no_zoned; | ||
70 | } | ||
71 | qemu_co_mutex_init(&bs->wps->colock); | ||
72 | + return; | ||
73 | + | 119 | + |
74 | +no_zoned: | 120 | + if (vu_opts->has_num_queues) { |
75 | + bs->bl.zoned = BLK_Z_NONE; | 121 | + num_queues = vu_opts->num_queues; |
76 | + g_free(bs->wps); | 122 | + } |
77 | + bs->wps = NULL; | 123 | + if (num_queues == 0) { |
78 | } | 124 | + error_setg(errp, "num-queues must be greater than 0"); |
79 | #else /* !defined(CONFIG_BLKZONED) */ | 125 | + return -EINVAL; |
80 | static void raw_refresh_zoned_limits(BlockDriverState *bs, struct stat *st, | 126 | + } |
127 | + | ||
128 | vu_blk_initialize_config(blk_bs(exp->blk), &vexp->blkcfg, | ||
129 | - logical_block_size); | ||
130 | + logical_block_size, num_queues); | ||
131 | |||
132 | blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, | ||
133 | vexp); | ||
134 | |||
135 | if (!vhost_user_server_start(&vexp->vu_server, vu_opts->addr, exp->ctx, | ||
136 | - VHOST_USER_BLK_MAX_QUEUES, &vu_blk_iface, | ||
137 | - errp)) { | ||
138 | + num_queues, &vu_blk_iface, errp)) { | ||
139 | blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, | ||
140 | blk_aio_detach, vexp); | ||
141 | return -EADDRNOTAVAIL; | ||
81 | -- | 142 | -- |
82 | 2.41.0 | 143 | 2.26.2 |
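A note on the VIRTIO_BLK_F_MQ remark above: the bit only announces that the num_queues config field is valid, so a guest driver typically consumes it along these lines (simplified sketch, not QEMU or Linux code; the struct is a trimmed stand-in for the real config layout):

    #include <stdint.h>

    #define VIRTIO_BLK_F_MQ 12   /* presence of the num_queues config field */

    struct blk_config_view {
        uint16_t num_queues;
        /* other fields elided */
    };

    /* Fall back to a single queue when the device does not offer MQ. */
    static uint16_t effective_num_queues(uint64_t device_features,
                                         const struct blk_config_view *cfg)
    {
        if (device_features & (1ull << VIRTIO_BLK_F_MQ)) {
            return cfg->num_queues;
        }
        return 1;
    }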
1 | From: zhenwei pi <pizhenwei@bytedance.com> | 1 | From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
---|---|---|---|
2 | 2 | ||
3 | Only one direction is necessary in several scenarios: | 3 | bdrv_co_block_status_above has several design problems with handling |
4 | - a read-only disk | 4 | short backing files: |
5 | - operations on a device that are considered *write* only. For example, | ||
6 | encrypt/decrypt/sign/verify operations on a cryptodev use a single | ||
7 | *write* timer (the read timer callback is defined, but never invoked). | ||
8 | 5 | ||
9 | Allow a single direction in throttle; this reduces memory, and the upper | 6 | 1. With want_zero=true, it may return ret with BDRV_BLOCK_ZERO but |
10 | layer no longer needs a dummy callback. | 7 | without the BDRV_BLOCK_ALLOCATED flag, when in fact the short backing file |
8 | that produces these after-EOF zeros is inside the requested backing | ||
9 | sequence. | ||
11 | 10 | ||
11 | 2. With want_zero=false, it may return pnum=0 prior to actual EOF, | ||
12 | because of the EOF of a short backing file. | ||
13 | |||
14 | Fix these things, making logic about short backing files clearer. | ||
15 | |||
16 | With bdrv_block_status_above fixed, we also have to improve is_zero in | ||
17 | the qcow2 code, otherwise iotest 154 will fail, because with this patch | ||
18 | we stop merging zeros of different types (zeros from regions that are fully | ||
19 | unallocated in the whole backing chain vs zeros from short backing files). | ||
20 | |||
21 | Note also that this patch leaves for another day the general problem | ||
22 | around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated | ||
23 | vs go-to-backing. | ||
24 | |||
25 | Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> | ||
12 | Reviewed-by: Alberto Garcia <berto@igalia.com> | 26 | Reviewed-by: Alberto Garcia <berto@igalia.com> |
13 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 27 | Reviewed-by: Eric Blake <eblake@redhat.com> |
14 | Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> | 28 | Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com |
15 | Message-Id: <20230728022006.1098509-4-pizhenwei@bytedance.com> | 29 | [Fix s/comes/come/ as suggested by Eric Blake |
16 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 30 | --Stefan] |
31 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
17 | --- | 32 | --- |
18 | util/throttle.c | 42 ++++++++++++++++++++++++++++-------------- | 33 | block/io.c | 68 ++++++++++++++++++++++++++++++++++++++++----------- |
19 | 1 file changed, 28 insertions(+), 14 deletions(-) | 34 | block/qcow2.c | 16 ++++++++++-- |
35 | 2 files changed, 68 insertions(+), 16 deletions(-) | ||
20 | 36 | ||
21 | diff --git a/util/throttle.c b/util/throttle.c | 37 | diff --git a/block/io.c b/block/io.c |
22 | index XXXXXXX..XXXXXXX 100644 | 38 | index XXXXXXX..XXXXXXX 100644 |
23 | --- a/util/throttle.c | 39 | --- a/block/io.c |
24 | +++ b/util/throttle.c | 40 | +++ b/block/io.c |
25 | @@ -XXX,XX +XXX,XX @@ static bool throttle_compute_timer(ThrottleState *ts, | 41 | @@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs, |
26 | void throttle_timers_attach_aio_context(ThrottleTimers *tt, | 42 | int64_t *map, |
27 | AioContext *new_context) | 43 | BlockDriverState **file) |
28 | { | 44 | { |
29 | - tt->timers[THROTTLE_READ] = | 45 | + int ret; |
30 | - aio_timer_new(new_context, tt->clock_type, SCALE_NS, | 46 | BlockDriverState *p; |
31 | - tt->timer_cb[THROTTLE_READ], tt->timer_opaque); | 47 | - int ret = 0; |
32 | - tt->timers[THROTTLE_WRITE] = | 48 | - bool first = true; |
33 | - aio_timer_new(new_context, tt->clock_type, SCALE_NS, | 49 | + int64_t eof = 0; |
34 | - tt->timer_cb[THROTTLE_WRITE], tt->timer_opaque); | 50 | |
35 | + ThrottleDirection dir; | 51 | assert(bs != base); |
52 | - for (p = bs; p != base; p = bdrv_filter_or_cow_bs(p)) { | ||
36 | + | 53 | + |
37 | + for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) { | 54 | + ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file); |
38 | + if (tt->timer_cb[dir]) { | 55 | + if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) { |
39 | + tt->timers[dir] = | 56 | + return ret; |
40 | + aio_timer_new(new_context, tt->clock_type, SCALE_NS, | ||
41 | + tt->timer_cb[dir], tt->timer_opaque); | ||
42 | + } | ||
43 | + } | 57 | + } |
58 | + | ||
59 | + if (ret & BDRV_BLOCK_EOF) { | ||
60 | + eof = offset + *pnum; | ||
61 | + } | ||
62 | + | ||
63 | + assert(*pnum <= bytes); | ||
64 | + bytes = *pnum; | ||
65 | + | ||
66 | + for (p = bdrv_filter_or_cow_bs(bs); p != base; | ||
67 | + p = bdrv_filter_or_cow_bs(p)) | ||
68 | + { | ||
69 | ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map, | ||
70 | file); | ||
71 | if (ret < 0) { | ||
72 | - break; | ||
73 | + return ret; | ||
74 | } | ||
75 | - if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) { | ||
76 | + if (*pnum == 0) { | ||
77 | /* | ||
78 | - * Reading beyond the end of the file continues to read | ||
79 | - * zeroes, but we can only widen the result to the | ||
80 | - * unallocated length we learned from an earlier | ||
81 | - * iteration. | ||
82 | + * The top layer deferred to this layer, and because this layer is | ||
83 | + * short, any zeroes that we synthesize beyond EOF behave as if they | ||
84 | + * were allocated at this layer. | ||
85 | + * | ||
86 | + * We don't include BDRV_BLOCK_EOF into ret, as upper layer may be | ||
87 | + * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see | ||
88 | + * below. | ||
89 | */ | ||
90 | + assert(ret & BDRV_BLOCK_EOF); | ||
91 | *pnum = bytes; | ||
92 | + if (file) { | ||
93 | + *file = p; | ||
94 | + } | ||
95 | + ret = BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED; | ||
96 | + break; | ||
97 | } | ||
98 | - if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) { | ||
99 | + if (ret & BDRV_BLOCK_ALLOCATED) { | ||
100 | + /* | ||
101 | + * We've found the node and the status, we must break. | ||
102 | + * | ||
103 | + * Drop BDRV_BLOCK_EOF, as it's not for upper layer, which may be | ||
104 | + * larger. We'll add BDRV_BLOCK_EOF if needed at function end, see | ||
105 | + * below. | ||
106 | + */ | ||
107 | + ret &= ~BDRV_BLOCK_EOF; | ||
108 | break; | ||
109 | } | ||
110 | - /* [offset, pnum] unallocated on this layer, which could be only | ||
111 | - * the first part of [offset, bytes]. */ | ||
112 | - bytes = MIN(bytes, *pnum); | ||
113 | - first = false; | ||
114 | + | ||
115 | + /* | ||
116 | + * OK, [offset, offset + *pnum) region is unallocated on this layer, | ||
117 | + * let's continue the diving. | ||
118 | + */ | ||
119 | + assert(*pnum <= bytes); | ||
120 | + bytes = *pnum; | ||
121 | + } | ||
122 | + | ||
123 | + if (offset + *pnum == eof) { | ||
124 | + ret |= BDRV_BLOCK_EOF; | ||
125 | } | ||
126 | + | ||
127 | return ret; | ||
44 | } | 128 | } |
45 | 129 | ||
46 | /* | 130 | diff --git a/block/qcow2.c b/block/qcow2.c |
47 | @@ -XXX,XX +XXX,XX @@ void throttle_timers_init(ThrottleTimers *tt, | 131 | index XXXXXXX..XXXXXXX 100644 |
48 | QEMUTimerCB *write_timer_cb, | 132 | --- a/block/qcow2.c |
49 | void *timer_opaque) | 133 | +++ b/block/qcow2.c |
50 | { | 134 | @@ -XXX,XX +XXX,XX @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes) |
51 | + assert(read_timer_cb || write_timer_cb); | 135 | if (!bytes) { |
52 | memset(tt, 0, sizeof(ThrottleTimers)); | ||
53 | |||
54 | tt->clock_type = clock_type; | ||
55 | @@ -XXX,XX +XXX,XX @@ void throttle_timers_init(ThrottleTimers *tt, | ||
56 | /* destroy a timer */ | ||
57 | static void throttle_timer_destroy(QEMUTimer **timer) | ||
58 | { | ||
59 | - assert(*timer != NULL); | ||
60 | + if (*timer == NULL) { | ||
61 | + return; | ||
62 | + } | ||
63 | |||
64 | timer_free(*timer); | ||
65 | *timer = NULL; | ||
66 | @@ -XXX,XX +XXX,XX @@ static void throttle_timer_destroy(QEMUTimer **timer) | ||
67 | /* Remove timers from event loop */ | ||
68 | void throttle_timers_detach_aio_context(ThrottleTimers *tt) | ||
69 | { | ||
70 | - int i; | ||
71 | + ThrottleDirection dir; | ||
72 | |||
73 | - for (i = 0; i < THROTTLE_MAX; i++) { | ||
74 | - throttle_timer_destroy(&tt->timers[i]); | ||
75 | + for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) { | ||
76 | + throttle_timer_destroy(&tt->timers[dir]); | ||
77 | } | ||
78 | } | ||
79 | |||
80 | @@ -XXX,XX +XXX,XX @@ void throttle_timers_destroy(ThrottleTimers *tt) | ||
81 | /* is any throttling timer configured */ | ||
82 | bool throttle_timers_are_initialized(ThrottleTimers *tt) | ||
83 | { | ||
84 | - if (tt->timers[0]) { | ||
85 | - return true; | ||
86 | + ThrottleDirection dir; | ||
87 | + | ||
88 | + for (dir = THROTTLE_READ; dir < THROTTLE_MAX; dir++) { | ||
89 | + if (tt->timers[dir]) { | ||
90 | + return true; | ||
91 | + } | ||
92 | } | ||
93 | |||
94 | return false; | ||
95 | @@ -XXX,XX +XXX,XX @@ bool throttle_schedule_timer(ThrottleState *ts, | ||
96 | { | ||
97 | int64_t now = qemu_clock_get_ns(tt->clock_type); | ||
98 | int64_t next_timestamp; | ||
99 | + QEMUTimer *timer; | ||
100 | bool must_wait; | ||
101 | |||
102 | + timer = is_write ? tt->timers[THROTTLE_WRITE] : tt->timers[THROTTLE_READ]; | ||
103 | + assert(timer); | ||
104 | + | ||
105 | must_wait = throttle_compute_timer(ts, | ||
106 | is_write, | ||
107 | now, | ||
108 | @@ -XXX,XX +XXX,XX @@ bool throttle_schedule_timer(ThrottleState *ts, | ||
109 | } | ||
110 | |||
111 | /* request throttled and timer pending -> do nothing */ | ||
112 | - if (timer_pending(tt->timers[is_write])) { | ||
113 | + if (timer_pending(timer)) { | ||
114 | return true; | 136 | return true; |
115 | } | 137 | } |
116 | 138 | - res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL); | |
117 | /* request throttled and timer not pending -> arm timer */ | 139 | - return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes; |
118 | - timer_mod(tt->timers[is_write], next_timestamp); | 140 | + |
119 | + timer_mod(timer, next_timestamp); | 141 | + /* |
120 | return true; | 142 | + * bdrv_block_status_above doesn't merge different types of zeros, for |
143 | + * example, zeros which come from the region which is unallocated in | ||
144 | + * the whole backing chain, and zeros which come because of a short | ||
145 | + * backing file. So, we need a loop. | ||
146 | + */ | ||
147 | + do { | ||
148 | + res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL); | ||
149 | + offset += nr; | ||
150 | + bytes -= nr; | ||
151 | + } while (res >= 0 && (res & BDRV_BLOCK_ZERO) && nr && bytes); | ||
152 | + | ||
153 | + return res >= 0 && (res & BDRV_BLOCK_ZERO) && bytes == 0; | ||
121 | } | 154 | } |
122 | 155 | ||
156 | static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs, | ||
123 | -- | 157 | -- |
124 | 2.41.0 | 158 | 2.26.2 |
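For readers following the is_zero() hunk above: because zeros of different types are no longer merged into one extent, a caller that wants to know whether a whole range reads as zeroes has to loop over bdrv_block_status_above(). A sketch of such a helper (hypothetical name, mirroring the qcow2 change):

    #include "qemu/osdep.h"
    #include "block/block.h"

    /* Return true if every byte in [offset, offset + bytes) reads as zeroes,
     * accumulating zero ranges that may have different origins. */
    static bool range_reads_as_zeroes(BlockDriverState *bs,
                                      int64_t offset, int64_t bytes)
    {
        while (bytes > 0) {
            int64_t nr;
            int ret = bdrv_block_status_above(bs, NULL, offset, bytes,
                                              &nr, NULL, NULL);

            if (ret < 0 || !(ret & BDRV_BLOCK_ZERO) || nr == 0) {
                return false;
            }
            offset += nr;
            bytes -= nr;
        }
        return true;
    }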
1 | From: zhenwei pi <pizhenwei@bytedance.com> | 1 | From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
---|---|---|---|
2 | 2 | ||
3 | Use enum ThrottleDirection instead of a numeric index. | 3 | In order to reuse bdrv_common_block_status_above in |
4 | 4 | bdrv_is_allocated_above, let's support an include_base parameter. |
4 | 5 | ||
6 | Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> | ||
5 | Reviewed-by: Alberto Garcia <berto@igalia.com> | 7 | Reviewed-by: Alberto Garcia <berto@igalia.com> |
6 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 8 | Reviewed-by: Eric Blake <eblake@redhat.com> |
7 | Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> | 9 | Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com |
8 | Message-Id: <20230728022006.1098509-2-pizhenwei@bytedance.com> | 10 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
9 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
10 | --- | 11 | --- |
11 | include/qemu/throttle.h | 11 ++++++++--- | 12 | block/coroutines.h | 2 ++ |
12 | util/throttle.c | 16 +++++++++------- | 13 | block/io.c | 21 ++++++++++++++------- |
13 | 2 files changed, 17 insertions(+), 10 deletions(-) | 14 | 2 files changed, 16 insertions(+), 7 deletions(-) |
14 | 15 | ||
15 | diff --git a/include/qemu/throttle.h b/include/qemu/throttle.h | 16 | diff --git a/block/coroutines.h b/block/coroutines.h |
16 | index XXXXXXX..XXXXXXX 100644 | 17 | index XXXXXXX..XXXXXXX 100644 |
17 | --- a/include/qemu/throttle.h | 18 | --- a/block/coroutines.h |
18 | +++ b/include/qemu/throttle.h | 19 | +++ b/block/coroutines.h |
19 | @@ -XXX,XX +XXX,XX @@ typedef struct ThrottleState { | 20 | @@ -XXX,XX +XXX,XX @@ bdrv_pwritev(BdrvChild *child, int64_t offset, unsigned int bytes, |
20 | int64_t previous_leak; /* timestamp of the last leak done */ | 21 | int coroutine_fn |
21 | } ThrottleState; | 22 | bdrv_co_common_block_status_above(BlockDriverState *bs, |
22 | 23 | BlockDriverState *base, | |
23 | +typedef enum { | 24 | + bool include_base, |
24 | + THROTTLE_READ = 0, | 25 | bool want_zero, |
25 | + THROTTLE_WRITE, | 26 | int64_t offset, |
26 | + THROTTLE_MAX | 27 | int64_t bytes, |
27 | +} ThrottleDirection; | 28 | @@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs, |
29 | int generated_co_wrapper | ||
30 | bdrv_common_block_status_above(BlockDriverState *bs, | ||
31 | BlockDriverState *base, | ||
32 | + bool include_base, | ||
33 | bool want_zero, | ||
34 | int64_t offset, | ||
35 | int64_t bytes, | ||
36 | diff --git a/block/io.c b/block/io.c | ||
37 | index XXXXXXX..XXXXXXX 100644 | ||
38 | --- a/block/io.c | ||
39 | +++ b/block/io.c | ||
40 | @@ -XXX,XX +XXX,XX @@ early_out: | ||
41 | int coroutine_fn | ||
42 | bdrv_co_common_block_status_above(BlockDriverState *bs, | ||
43 | BlockDriverState *base, | ||
44 | + bool include_base, | ||
45 | bool want_zero, | ||
46 | int64_t offset, | ||
47 | int64_t bytes, | ||
48 | @@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs, | ||
49 | BlockDriverState *p; | ||
50 | int64_t eof = 0; | ||
51 | |||
52 | - assert(bs != base); | ||
53 | + assert(include_base || bs != base); | ||
54 | + assert(!include_base || base); /* Can't include NULL base */ | ||
55 | |||
56 | ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file); | ||
57 | - if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED) { | ||
58 | + if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) { | ||
59 | return ret; | ||
60 | } | ||
61 | |||
62 | @@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs, | ||
63 | assert(*pnum <= bytes); | ||
64 | bytes = *pnum; | ||
65 | |||
66 | - for (p = bdrv_filter_or_cow_bs(bs); p != base; | ||
67 | + for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base; | ||
68 | p = bdrv_filter_or_cow_bs(p)) | ||
69 | { | ||
70 | ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map, | ||
71 | @@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs, | ||
72 | break; | ||
73 | } | ||
74 | |||
75 | + if (p == base) { | ||
76 | + assert(include_base); | ||
77 | + break; | ||
78 | + } | ||
28 | + | 79 | + |
29 | typedef struct ThrottleTimers { | 80 | /* |
30 | - QEMUTimer *timers[2]; /* timers used to do the throttling */ | 81 | * OK, [offset, offset + *pnum) region is unallocated on this layer, |
31 | + QEMUTimer *timers[THROTTLE_MAX]; /* timers used to do the throttling */ | 82 | * let's continue the diving. |
32 | QEMUClockType clock_type; /* the clock used */ | 83 | @@ -XXX,XX +XXX,XX @@ int bdrv_block_status_above(BlockDriverState *bs, BlockDriverState *base, |
33 | 84 | int64_t offset, int64_t bytes, int64_t *pnum, | |
34 | /* Callbacks */ | 85 | int64_t *map, BlockDriverState **file) |
35 | - QEMUTimerCB *read_timer_cb; | ||
36 | - QEMUTimerCB *write_timer_cb; | ||
37 | + QEMUTimerCB *timer_cb[THROTTLE_MAX]; | ||
38 | void *timer_opaque; | ||
39 | } ThrottleTimers; | ||
40 | |||
41 | diff --git a/util/throttle.c b/util/throttle.c | ||
42 | index XXXXXXX..XXXXXXX 100644 | ||
43 | --- a/util/throttle.c | ||
44 | +++ b/util/throttle.c | ||
45 | @@ -XXX,XX +XXX,XX @@ static bool throttle_compute_timer(ThrottleState *ts, | ||
46 | void throttle_timers_attach_aio_context(ThrottleTimers *tt, | ||
47 | AioContext *new_context) | ||
48 | { | 86 | { |
49 | - tt->timers[0] = aio_timer_new(new_context, tt->clock_type, SCALE_NS, | 87 | - return bdrv_common_block_status_above(bs, base, true, offset, bytes, |
50 | - tt->read_timer_cb, tt->timer_opaque); | 88 | + return bdrv_common_block_status_above(bs, base, false, true, offset, bytes, |
51 | - tt->timers[1] = aio_timer_new(new_context, tt->clock_type, SCALE_NS, | 89 | pnum, map, file); |
52 | - tt->write_timer_cb, tt->timer_opaque); | ||
53 | + tt->timers[THROTTLE_READ] = | ||
54 | + aio_timer_new(new_context, tt->clock_type, SCALE_NS, | ||
55 | + tt->timer_cb[THROTTLE_READ], tt->timer_opaque); | ||
56 | + tt->timers[THROTTLE_WRITE] = | ||
57 | + aio_timer_new(new_context, tt->clock_type, SCALE_NS, | ||
58 | + tt->timer_cb[THROTTLE_WRITE], tt->timer_opaque); | ||
59 | } | 90 | } |
60 | 91 | ||
61 | /* | 92 | @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset, |
62 | @@ -XXX,XX +XXX,XX @@ void throttle_timers_init(ThrottleTimers *tt, | 93 | int ret; |
63 | memset(tt, 0, sizeof(ThrottleTimers)); | 94 | int64_t dummy; |
64 | 95 | ||
65 | tt->clock_type = clock_type; | 96 | - ret = bdrv_common_block_status_above(bs, bdrv_filter_or_cow_bs(bs), false, |
66 | - tt->read_timer_cb = read_timer_cb; | 97 | - offset, bytes, pnum ? pnum : &dummy, |
67 | - tt->write_timer_cb = write_timer_cb; | 98 | - NULL, NULL); |
68 | + tt->timer_cb[THROTTLE_READ] = read_timer_cb; | 99 | + ret = bdrv_common_block_status_above(bs, bs, true, false, offset, |
69 | + tt->timer_cb[THROTTLE_WRITE] = write_timer_cb; | 100 | + bytes, pnum ? pnum : &dummy, NULL, |
70 | tt->timer_opaque = timer_opaque; | 101 | + NULL); |
71 | throttle_timers_attach_aio_context(tt, aio_context); | 102 | if (ret < 0) { |
72 | } | 103 | return ret; |
73 | @@ -XXX,XX +XXX,XX @@ void throttle_timers_detach_aio_context(ThrottleTimers *tt) | ||
74 | { | ||
75 | int i; | ||
76 | |||
77 | - for (i = 0; i < 2; i++) { | ||
78 | + for (i = 0; i < THROTTLE_MAX; i++) { | ||
79 | throttle_timer_destroy(&tt->timers[i]); | ||
80 | } | 104 | } |
81 | } | ||
82 | -- | 105 | -- |
83 | 2.41.0 | 106 | 2.26.2 |
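Elsewhere in the series the bool is_write convention is mapped onto this enum (see the throttle_schedule_timer() hunk earlier, which indexes the timer array by direction); the translation is the obvious one. Illustrative helper, not taken from the patches:

    #include "qemu/osdep.h"
    #include "qemu/throttle.h"

    /* Translate the legacy bool is_write flag into a ThrottleDirection. */
    static ThrottleDirection throttle_direction_from_is_write(bool is_write)
    {
        return is_write ? THROTTLE_WRITE : THROTTLE_READ;
    }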
1 | From: zhenwei pi <pizhenwei@bytedance.com> | 1 | From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
---|---|---|---|
2 | 2 | ||
3 | The first dimension of both to_check and | 3 | We are going to reuse bdrv_common_block_status_above in |
4 | bucket_types_size/bucket_types_units is used as the throttle direction, | 4 | bdrv_is_allocated_above. bdrv_is_allocated_above may be called with |
5 | so use THROTTLE_MAX instead of a hard-coded number. Also use ARRAY_SIZE() | 5 | include_base == false and still bs == base (for ex. from img_rebase()). |
6 | to avoid a hard-coded number for the second dimension. | 6 | |
7 | 7 | So, support this corner case. |
8 | Hanna noticed that the two arrays should be static, so turn them | 8 | |
9 | into static variables. | ||
10 | 8 | ||
11 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 9 | Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
12 | Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> | 10 | Reviewed-by: Kevin Wolf <kwolf@redhat.com> |
13 | Message-Id: <20230728022006.1098509-8-pizhenwei@bytedance.com> | 11 | Reviewed-by: Eric Blake <eblake@redhat.com> |
14 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 12 | Reviewed-by: Alberto Garcia <berto@igalia.com> |
13 | Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com | ||
14 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
15 | --- | 15 | --- |
16 | util/throttle.c | 11 ++++++----- | 16 | block/io.c | 6 +++++- |
17 | 1 file changed, 6 insertions(+), 5 deletions(-) | 17 | 1 file changed, 5 insertions(+), 1 deletion(-) |
18 | 18 | ||
19 | diff --git a/util/throttle.c b/util/throttle.c | 19 | diff --git a/block/io.c b/block/io.c |
20 | index XXXXXXX..XXXXXXX 100644 | 20 | index XXXXXXX..XXXXXXX 100644 |
21 | --- a/util/throttle.c | 21 | --- a/block/io.c |
22 | +++ b/util/throttle.c | 22 | +++ b/block/io.c |
23 | @@ -XXX,XX +XXX,XX @@ int64_t throttle_compute_wait(LeakyBucket *bkt) | 23 | @@ -XXX,XX +XXX,XX @@ bdrv_co_common_block_status_above(BlockDriverState *bs, |
24 | static int64_t throttle_compute_wait_for(ThrottleState *ts, | 24 | BlockDriverState *p; |
25 | ThrottleDirection direction) | 25 | int64_t eof = 0; |
26 | { | 26 | |
27 | - BucketType to_check[2][4] = { {THROTTLE_BPS_TOTAL, | 27 | - assert(include_base || bs != base); |
28 | + static const BucketType to_check[THROTTLE_MAX][4] = { | 28 | assert(!include_base || base); /* Can't include NULL base */ |
29 | + {THROTTLE_BPS_TOTAL, | 29 | |
30 | THROTTLE_OPS_TOTAL, | 30 | + if (!include_base && bs == base) { |
31 | THROTTLE_BPS_READ, | 31 | + *pnum = bytes; |
32 | THROTTLE_OPS_READ}, | 32 | + return 0; |
33 | @@ -XXX,XX +XXX,XX @@ static int64_t throttle_compute_wait_for(ThrottleState *ts, | 33 | + } |
34 | int64_t wait, max_wait = 0; | 34 | + |
35 | int i; | 35 | ret = bdrv_co_block_status(bs, want_zero, offset, bytes, pnum, map, file); |
36 | 36 | if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) { | |
37 | - for (i = 0; i < 4; i++) { | 37 | return ret; |
38 | + for (i = 0; i < ARRAY_SIZE(to_check[THROTTLE_READ]); i++) { | ||
39 | BucketType index = to_check[direction][i]; | ||
40 | wait = throttle_compute_wait(&ts->cfg.buckets[index]); | ||
41 | if (wait > max_wait) { | ||
42 | @@ -XXX,XX +XXX,XX @@ bool throttle_schedule_timer(ThrottleState *ts, | ||
43 | void throttle_account(ThrottleState *ts, ThrottleDirection direction, | ||
44 | uint64_t size) | ||
45 | { | ||
46 | - const BucketType bucket_types_size[2][2] = { | ||
47 | + static const BucketType bucket_types_size[THROTTLE_MAX][2] = { | ||
48 | { THROTTLE_BPS_TOTAL, THROTTLE_BPS_READ }, | ||
49 | { THROTTLE_BPS_TOTAL, THROTTLE_BPS_WRITE } | ||
50 | }; | ||
51 | - const BucketType bucket_types_units[2][2] = { | ||
52 | + static const BucketType bucket_types_units[THROTTLE_MAX][2] = { | ||
53 | { THROTTLE_OPS_TOTAL, THROTTLE_OPS_READ }, | ||
54 | { THROTTLE_OPS_TOTAL, THROTTLE_OPS_WRITE } | ||
55 | }; | ||
56 | @@ -XXX,XX +XXX,XX @@ void throttle_account(ThrottleState *ts, ThrottleDirection direction, | ||
57 | units = (double) size / ts->cfg.op_size; | ||
58 | } | ||
59 | |||
60 | - for (i = 0; i < 2; i++) { | ||
61 | + for (i = 0; i < ARRAY_SIZE(bucket_types_size[THROTTLE_READ]); i++) { | ||
62 | LeakyBucket *bkt; | ||
63 | |||
64 | bkt = &ts->cfg.buckets[bucket_types_size[direction][i]]; | ||
65 | -- | 38 | -- |
66 | 2.41.0 | 39 | 2.26.2 |
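The ARRAY_SIZE() change follows the usual pattern of deriving the loop bound from the table itself rather than hard-coding it; a self-contained illustration (the table and function here are made up):

    #include "qemu/osdep.h"   /* provides ARRAY_SIZE() */

    static const int example_table[2][4] = {
        { 1, 2, 3, 4 },
        { 5, 6, 7, 8 },
    };

    /* Sum one row without hard-coding the number of columns. */
    static int example_sum_row(int row)
    {
        int sum = 0;

        for (size_t i = 0; i < ARRAY_SIZE(example_table[0]); i++) {
            sum += example_table[row][i];
        }
        return sum;
    }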
1 | From: zhenwei pi <pizhenwei@bytedance.com> | 1 | From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
---|---|---|---|
2 | 2 | ||
3 | Operations on a cryptodev are considered *write* only; the callback | 3 | bdrv_is_allocated_above wrongly handles short backing files: it reports |
4 | for the read direction is never invoked. Use NULL instead of an unreachable | 4 | after-EOF space as UNALLOCATED, which is wrong, as on read the data is |
5 | path (cryptodev_backend_throttle_timer_cb on the read direction). | 5 | generated at the level of the short backing file (if all overlays have |
6 | unallocated areas at that place). | ||
6 | 7 | ||
8 | The dummy read timer (never invoked) is already removed here, so the | 7 | Reusing bdrv_common_block_status_above fixes the issue and unifies the |
9 | 'FIXME' tag is no longer needed. | 8 | code paths. |
9 | 10 | ||
11 | Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> | ||
12 | Reviewed-by: Eric Blake <eblake@redhat.com> | ||
10 | Reviewed-by: Alberto Garcia <berto@igalia.com> | 13 | Reviewed-by: Alberto Garcia <berto@igalia.com> |
11 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 14 | Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com |
12 | Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> | 15 | [Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/. |
13 | Message-Id: <20230728022006.1098509-6-pizhenwei@bytedance.com> | 16 | --Stefan] |
14 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | 17 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
15 | --- | 18 | --- |
16 | backends/cryptodev.c | 3 +-- | 19 | block/io.c | 43 +++++-------------------------------------- |
17 | 1 file changed, 1 insertion(+), 2 deletions(-) | 20 | 1 file changed, 5 insertions(+), 38 deletions(-) |
18 | 21 | ||
19 | diff --git a/backends/cryptodev.c b/backends/cryptodev.c | 22 | diff --git a/block/io.c b/block/io.c |
20 | index XXXXXXX..XXXXXXX 100644 | 23 | index XXXXXXX..XXXXXXX 100644 |
21 | --- a/backends/cryptodev.c | 24 | --- a/block/io.c |
22 | +++ b/backends/cryptodev.c | 25 | +++ b/block/io.c |
23 | @@ -XXX,XX +XXX,XX @@ static void cryptodev_backend_set_throttle(CryptoDevBackend *backend, int field, | 26 | @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t offset, |
24 | if (!enabled) { | 27 | * at 'offset + *pnum' may return the same allocation status (in other |
25 | throttle_init(&backend->ts); | 28 | * words, the result is not necessarily the maximum possible range); |
26 | throttle_timers_init(&backend->tt, qemu_get_aio_context(), | 29 | * but 'pnum' will only be 0 when end of file is reached. |
27 | - QEMU_CLOCK_REALTIME, | 30 | - * |
28 | - cryptodev_backend_throttle_timer_cb, /* FIXME */ | 31 | */ |
29 | + QEMU_CLOCK_REALTIME, NULL, | 32 | int bdrv_is_allocated_above(BlockDriverState *top, |
30 | cryptodev_backend_throttle_timer_cb, backend); | 33 | BlockDriverState *base, |
34 | bool include_base, int64_t offset, | ||
35 | int64_t bytes, int64_t *pnum) | ||
36 | { | ||
37 | - BlockDriverState *intermediate; | ||
38 | - int ret; | ||
39 | - int64_t n = bytes; | ||
40 | - | ||
41 | - assert(base || !include_base); | ||
42 | - | ||
43 | - intermediate = top; | ||
44 | - while (include_base || intermediate != base) { | ||
45 | - int64_t pnum_inter; | ||
46 | - int64_t size_inter; | ||
47 | - | ||
48 | - assert(intermediate); | ||
49 | - ret = bdrv_is_allocated(intermediate, offset, bytes, &pnum_inter); | ||
50 | - if (ret < 0) { | ||
51 | - return ret; | ||
52 | - } | ||
53 | - if (ret) { | ||
54 | - *pnum = pnum_inter; | ||
55 | - return 1; | ||
56 | - } | ||
57 | - | ||
58 | - size_inter = bdrv_getlength(intermediate); | ||
59 | - if (size_inter < 0) { | ||
60 | - return size_inter; | ||
61 | - } | ||
62 | - if (n > pnum_inter && | ||
63 | - (intermediate == top || offset + pnum_inter < size_inter)) { | ||
64 | - n = pnum_inter; | ||
65 | - } | ||
66 | - | ||
67 | - if (intermediate == base) { | ||
68 | - break; | ||
69 | - } | ||
70 | - | ||
71 | - intermediate = bdrv_filter_or_cow_bs(intermediate); | ||
72 | + int ret = bdrv_common_block_status_above(top, base, include_base, false, | ||
73 | + offset, bytes, pnum, NULL, NULL); | ||
74 | + if (ret < 0) { | ||
75 | + return ret; | ||
31 | } | 76 | } |
32 | 77 | ||
78 | - *pnum = n; | ||
79 | - return 0; | ||
80 | + return !!(ret & BDRV_BLOCK_ALLOCATED); | ||
81 | } | ||
82 | |||
83 | int coroutine_fn | ||
33 | -- | 84 | -- |
34 | 2.41.0 | 85 | 2.26.2 |
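To make the new bdrv_is_allocated_above() semantics concrete for callers: the function now returns the block-status ALLOCATED bit as a boolean, with *pnum covering the matching prefix of the range. A sketch of a caller (the surrounding function and names are illustrative):

    #include "qemu/osdep.h"
    #include "block/block.h"

    /* Check whether the start of [offset, offset + bytes) is served by the
     * chain from 'top' down to (but excluding) 'base'. */
    static int example_check_above(BlockDriverState *top, BlockDriverState *base,
                                   int64_t offset, int64_t bytes)
    {
        int64_t pnum;
        int ret = bdrv_is_allocated_above(top, base, false, offset, bytes, &pnum);

        if (ret < 0) {
            return ret;     /* error propagated from block status */
        }
        if (ret) {
            /* the first pnum bytes come from top..above-base; after this fix
             * that also covers zeroes synthesized past a short backing EOF */
            return 1;
        }
        /* the first pnum bytes fall through to 'base' or below */
        return 0;
    }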
1 | From: zhenwei pi <pizhenwei@bytedance.com> | 1 | From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> |
---|---|---|---|
2 | 2 | ||
3 | Use enum ThrottleDirection in the throttle test code as well. | 3 | These cases are fixed by previous patches around block_status and |
4 | is_allocated. | ||
4 | 5 | ||
6 | Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> | ||
7 | Reviewed-by: Eric Blake <eblake@redhat.com> | ||
5 | Reviewed-by: Alberto Garcia <berto@igalia.com> | 8 | Reviewed-by: Alberto Garcia <berto@igalia.com> |
6 | Reviewed-by: Hanna Czenczek <hreitz@redhat.com> | 9 | Message-id: 20200924194003.22080-6-vsementsov@virtuozzo.com |
7 | Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> | 10 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> |
8 | Message-Id: <20230728022006.1098509-3-pizhenwei@bytedance.com> | ||
9 | Signed-off-by: Hanna Czenczek <hreitz@redhat.com> | ||
10 | --- | 11 | --- |
11 | tests/unit/test-throttle.c | 6 +++--- | 12 | tests/qemu-iotests/274 | 20 +++++++++++ |
12 | 1 file changed, 3 insertions(+), 3 deletions(-) | 13 | tests/qemu-iotests/274.out | 68 ++++++++++++++++++++++++++++++++++++++ |
14 | 2 files changed, 88 insertions(+) | ||
13 | 15 | ||
14 | diff --git a/tests/unit/test-throttle.c b/tests/unit/test-throttle.c | 16 | diff --git a/tests/qemu-iotests/274 b/tests/qemu-iotests/274 |
17 | index XXXXXXX..XXXXXXX 100755 | ||
18 | --- a/tests/qemu-iotests/274 | ||
19 | +++ b/tests/qemu-iotests/274 | ||
20 | @@ -XXX,XX +XXX,XX @@ with iotests.FilePath('base') as base, \ | ||
21 | iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, mid) | ||
22 | iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), mid) | ||
23 | |||
24 | + iotests.log('=== Testing qemu-img commit (top -> base) ===') | ||
25 | + | ||
26 | + create_chain() | ||
27 | + iotests.qemu_img_log('commit', '-b', base, top) | ||
28 | + iotests.img_info_log(base) | ||
29 | + iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base) | ||
30 | + iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base) | ||
31 | + | ||
32 | + iotests.log('=== Testing QMP active commit (top -> base) ===') | ||
33 | + | ||
34 | + create_chain() | ||
35 | + with create_vm() as vm: | ||
36 | + vm.launch() | ||
37 | + vm.qmp_log('block-commit', device='top', base_node='base', | ||
38 | + job_id='job0', auto_dismiss=False) | ||
39 | + vm.run_job('job0', wait=5) | ||
40 | + | ||
41 | + iotests.img_info_log(mid) | ||
42 | + iotests.qemu_io_log('-c', 'read -P 1 0 %d' % size_short, base) | ||
43 | + iotests.qemu_io_log('-c', 'read -P 0 %d %d' % (size_short, size_diff), base) | ||
44 | |||
45 | iotests.log('== Resize tests ==') | ||
46 | |||
47 | diff --git a/tests/qemu-iotests/274.out b/tests/qemu-iotests/274.out | ||
15 | index XXXXXXX..XXXXXXX 100644 | 48 | index XXXXXXX..XXXXXXX 100644 |
16 | --- a/tests/unit/test-throttle.c | 49 | --- a/tests/qemu-iotests/274.out |
17 | +++ b/tests/unit/test-throttle.c | 50 | +++ b/tests/qemu-iotests/274.out |
18 | @@ -XXX,XX +XXX,XX @@ static void test_init(void) | 51 | @@ -XXX,XX +XXX,XX @@ read 1048576/1048576 bytes at offset 0 |
19 | 52 | read 1048576/1048576 bytes at offset 1048576 | |
20 | /* check initialized fields */ | 53 | 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) |
21 | g_assert(tt->clock_type == QEMU_CLOCK_VIRTUAL); | 54 | |
22 | - g_assert(tt->timers[0]); | 55 | +=== Testing qemu-img commit (top -> base) === |
23 | - g_assert(tt->timers[1]); | 56 | +Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16 |
24 | + g_assert(tt->timers[THROTTLE_READ]); | 57 | + |
25 | + g_assert(tt->timers[THROTTLE_WRITE]); | 58 | +Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16 |
26 | 59 | + | |
27 | /* check other fields where cleared */ | 60 | +Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16 |
28 | g_assert(!ts.previous_leak); | 61 | + |
29 | @@ -XXX,XX +XXX,XX @@ static void test_destroy(void) | 62 | +wrote 2097152/2097152 bytes at offset 0 |
30 | throttle_timers_init(tt, ctx, QEMU_CLOCK_VIRTUAL, | 63 | +2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) |
31 | read_timer_cb, write_timer_cb, &ts); | 64 | + |
32 | throttle_timers_destroy(tt); | 65 | +Image committed. |
33 | - for (i = 0; i < 2; i++) { | 66 | + |
34 | + for (i = 0; i < THROTTLE_MAX; i++) { | 67 | +image: TEST_IMG |
35 | g_assert(!tt->timers[i]); | 68 | +file format: IMGFMT |
36 | } | 69 | +virtual size: 2 MiB (2097152 bytes) |
37 | } | 70 | +cluster_size: 65536 |
71 | +Format specific information: | ||
72 | + compat: 1.1 | ||
73 | + compression type: zlib | ||
74 | + lazy refcounts: false | ||
75 | + refcount bits: 16 | ||
76 | + corrupt: false | ||
77 | + extended l2: false | ||
78 | + | ||
79 | +read 1048576/1048576 bytes at offset 0 | ||
80 | +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | ||
81 | + | ||
82 | +read 1048576/1048576 bytes at offset 1048576 | ||
83 | +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | ||
84 | + | ||
85 | +=== Testing QMP active commit (top -> base) === | ||
86 | +Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 lazy_refcounts=off refcount_bits=16 | ||
87 | + | ||
88 | +Formatting 'TEST_DIR/PID-mid', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=1048576 backing_file=TEST_DIR/PID-base backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16 | ||
89 | + | ||
90 | +Formatting 'TEST_DIR/PID-top', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=2097152 backing_file=TEST_DIR/PID-mid backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16 | ||
91 | + | ||
92 | +wrote 2097152/2097152 bytes at offset 0 | ||
93 | +2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | ||
94 | + | ||
95 | +{"execute": "block-commit", "arguments": {"auto-dismiss": false, "base-node": "base", "device": "top", "job-id": "job0"}} | ||
96 | +{"return": {}} | ||
97 | +{"execute": "job-complete", "arguments": {"id": "job0"}} | ||
98 | +{"return": {}} | ||
99 | +{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}} | ||
100 | +{"data": {"device": "job0", "len": 1048576, "offset": 1048576, "speed": 0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}} | ||
101 | +{"execute": "job-dismiss", "arguments": {"id": "job0"}} | ||
102 | +{"return": {}} | ||
103 | +image: TEST_IMG | ||
104 | +file format: IMGFMT | ||
105 | +virtual size: 1 MiB (1048576 bytes) | ||
106 | +cluster_size: 65536 | ||
107 | +backing file: TEST_DIR/PID-base | ||
108 | +backing file format: IMGFMT | ||
109 | +Format specific information: | ||
110 | + compat: 1.1 | ||
111 | + compression type: zlib | ||
112 | + lazy refcounts: false | ||
113 | + refcount bits: 16 | ||
114 | + corrupt: false | ||
115 | + extended l2: false | ||
116 | + | ||
117 | +read 1048576/1048576 bytes at offset 0 | ||
118 | +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | ||
119 | + | ||
120 | +read 1048576/1048576 bytes at offset 1048576 | ||
121 | +1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) | ||
122 | + | ||
123 | == Resize tests == | ||
124 | === preallocation=off === | ||
125 | Formatting 'TEST_DIR/PID-base', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=6442450944 lazy_refcounts=off refcount_bits=16 | ||
38 | -- | 126 | -- |
39 | 2.41.0 | 127 | 2.26.2 |