Series comparison

-[Qemu-devel] [PULL 0/7] Block patches
+[PULL 0/8] Block patches
-The following changes since commit 6cb4f6db4f4367faa33da85b15f75bbbd2bed2a6:
+The following changes since commit c6a5fc2ac76c5ab709896ee1b0edd33685a67ed1:
-  Merge remote-tracking branch 'remotes/cleber/tags/python-next-pull-request' into staging (2019-03-07 16:16:02 +0000)
+  decodetree: Add --output-null for meson testing (2023-05-31 19:56:42 -0700)
 are available in the Git repository at:
-  git://github.com/stefanha/qemu.git tags/block-pull-request
+  https://gitlab.com/stefanha/qemu.git tags/block-pull-request
-for you to fetch changes up to 6ca206204fa773c8626d59caf2a5676d6cc35f52:
+for you to fetch changes up to 98b126f5e3228a346c774e569e26689943b401dd:
-  iothread: document about why we need explicit aio_poll() (2019-03-08 10:20:57 +0000)
+  qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa (2023-06-01 11:08:21 -0400)
 ----------------------------------------------------------------
 Pull request
+- Stefano Garzarella's blkio block driver 'fd' parameter
+- My thread-local blk_io_plug() series
 ----------------------------------------------------------------
-Anastasiia Rusakova (1):
+Stefan Hajnoczi (6):
-  hw/block/virtio-blk: Clean req->dev repetitions
+  block: add blk_io_plug_call() API
   block/nvme: convert to blk_io_plug_call() API
   block/blkio: convert to blk_io_plug_call() API
   block/io_uring: convert to blk_io_plug_call() API
   block/linux-aio: convert to blk_io_plug_call() API
   block: remove bdrv_co_io_plug() API
-Peter Xu (5):
+Stefano Garzarella (2):
-  iothread: replace init_done_cond with a semaphore
+  block/blkio: use qemu_open() to support fd passing for virtio-blk
-  iothread: create the gcontext unconditionally
+  qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa
   iothread: create main loop unconditionally
   iothread: push gcontext earlier in the thread_fn
   iothread: document about why we need explicit aio_poll()
-Stefan Hajnoczi (1):
+ MAINTAINERS                       |   1 +
-  MAINTAINERS: add missing support status fields
+ qapi/block-core.json              |   6 ++
+ meson.build                       |   4 +
- MAINTAINERS               |  3 ++
+ include/block/block-io.h          |   3 -
- include/sysemu/iothread.h |  5 +--
+ include/block/block_int-common.h  |  11 ---
- hw/block/virtio-blk.c     | 16 ++++---
+ include/block/raw-aio.h           |  14 ---
- iothread.c                | 90 +++++++++++++++++++--------------------
+ include/sysemu/block-backend-io.h |  13 +--
-files changed, 57 insertions(+), 57 deletions(-)
+ block/blkio.c                     |  96 ++++++++++++------
  block/block-backend.c             |  22 -----
  block/file-posix.c                |  38 -------
  block/io.c                        |  37 -------
  block/io_uring.c                  |  44 ++++-----
  block/linux-aio.c                 |  41 +++-----
  block/nvme.c                      |  44 +++------
  block/plug.c                      | 159 ++++++++++++++++++++++++++++++
  hw/block/dataplane/xen-block.c    |   8 +-
  hw/block/virtio-blk.c             |   4 +-
  hw/scsi/virtio-scsi.c             |   6 +-
  block/meson.build                 |   1 +
  block/trace-events                |   6 +-
 files changed, 293 insertions(+), 265 deletions(-)
  create mode 100644 block/plug.c
 --
-.20.1
+.40.1

-[Qemu-devel] [PULL 1/7] MAINTAINERS: add missing support status fields
+[PULL 1/8] block: add blk_io_plug_call() API
-This patch adds the "S:" line for areas of the codebase that currently
+Introduce a new API for thread-local blk_io_plug() that does not
-lack a support status field.
+traverse the block graph. The goal is to make blk_io_plug() multi-queue
+friendly.
-Note that there are a few more areas that are more abstract and do not
-correspond to a specific set of files.  They have not been modified.
+Instead of having block drivers track whether or not we're in a plugged
+section, provide an API that allows them to defer a function call until
-Cc: Alex Bennée <alex.bennee@linaro.org>
+we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
 called multiple times with the same fn/opaque pair, then fn() is only
 called once at the end of the plugged section, resulting in batching.
 This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
 blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
 because the plug state is now thread-local.
 Later patches convert block drivers to blk_io_plug_call() and then we
 can finally remove .bdrv_co_io_plug() once all block drivers have been
 converted.
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-Reviewed-by: Thomas Huth <thuth@redhat.com>
+Reviewed-by: Eric Blake <eblake@redhat.com>
-Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
+Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
-Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Acked-by: Kevin Wolf <kwolf@redhat.com>
-Message-id: 20190301163518.20702-1-stefanha@redhat.com
+Message-id: 20230530180959.1108766-2-stefanha@redhat.com
 Message-Id: <20190301163518.20702-1-stefanha@redhat.com>
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 ---
- MAINTAINERS | 3 +++
+ MAINTAINERS                       |   1 +
-file changed, 3 insertions(+)
+ include/sysemu/block-backend-io.h |  13 +--
  block/block-backend.c             |  22 -----
  block/plug.c                      | 159 ++++++++++++++++++++++++++++++
  hw/block/dataplane/xen-block.c    |   8 +-
  hw/block/virtio-blk.c             |   4 +-
  hw/scsi/virtio-scsi.c             |   6 +-
  block/meson.build                 |   1 +
 files changed, 173 insertions(+), 41 deletions(-)
  create mode 100644 block/plug.c
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
-@@ -XXX,XX +XXX,XX @@ F: include/hw/tricore/
+@@ -XXX,XX +XXX,XX @@ F: util/aio-*.c
+ F: util/aio-*.h
- Multiarch Linux User Tests
+ F: util/fdmon-*.c
- M: Alex Bennée <alex.bennee@linaro.org>
+ F: block/io.c
-+S: Maintained
++F: block/plug.c
- F: tests/tcg/multiarch/
+ F: migration/block*
+ F: include/block/aio.h
- Guest CPU Cores (KVM):
+ F: include/block/aio-wait.h
-@@ -XXX,XX +XXX,XX @@ F: qemu.sasl
+diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
- Coroutines
+index XXXXXXX..XXXXXXX 100644
- M: Stefan Hajnoczi <stefanha@redhat.com>
+--- a/include/sysemu/block-backend-io.h
- M: Kevin Wolf <kwolf@redhat.com>
++++ b/include/sysemu/block-backend-io.h
-+S: Maintained
+@@ -XXX,XX +XXX,XX @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
- F: util/*coroutine*
+ int blk_get_max_iov(BlockBackend *blk);
- F: include/qemu/coroutine*
+ int blk_get_max_hw_iov(BlockBackend *blk);
- F: tests/test-coroutine.c
-@@ -XXX,XX +XXX,XX @@ F: .gitlab-ci.yml
+-/*
- Guest Test Compilation Support
+- * blk_io_plug/unplug are thread-local operations. This means that multiple
- M: Alex Bennée <alex.bennee@linaro.org>
+- * IOThreads can simultaneously call plug/unplug, but the caller must ensure
- R: Philippe Mathieu-Daudé <f4bug@amsat.org>
+- * that each unplug() is called in the same IOThread of the matching plug().
-+S: Maintained
+- */
- F: tests/tcg/Makefile
+-void coroutine_fn blk_co_io_plug(BlockBackend *blk);
- F: tests/tcg/Makefile.include
+-void co_wrapper blk_io_plug(BlockBackend *blk);
- L: qemu-devel@nongnu.org
+-
 -void coroutine_fn blk_co_io_unplug(BlockBackend *blk);
 -void co_wrapper blk_io_unplug(BlockBackend *blk);
 +void blk_io_plug(void);
 +void blk_io_unplug(void);
 +void blk_io_plug_call(void (*fn)(void *), void *opaque);
  AioContext *blk_get_aio_context(BlockBackend *blk);
  BlockAcctStats *blk_get_stats(BlockBackend *blk);
 diff --git a/block/block-backend.c b/block/block-backend.c
 index XXXXXXX..XXXXXXX 100644
 --- a/block/block-backend.c
 +++ b/block/block-backend.c
@@ -XXX,XX +XXX,XX @@ void blk_add_insert_bs_notifier(BlockBackend *blk, Notifier *notify)
      notifier_list_add(&blk->insert_bs_notifiers, notify);
  }
 -void coroutine_fn blk_co_io_plug(BlockBackend *blk)
 -{
 -    BlockDriverState *bs = blk_bs(blk);
 -    IO_CODE();
 -    GRAPH_RDLOCK_GUARD();
 -
 -    if (bs) {
 -        bdrv_co_io_plug(bs);
 -    }
 -}
 -
 -void coroutine_fn blk_co_io_unplug(BlockBackend *blk)
 -{
 -    BlockDriverState *bs = blk_bs(blk);
 -    IO_CODE();
 -    GRAPH_RDLOCK_GUARD();
 -
 -    if (bs) {
 -        bdrv_co_io_unplug(bs);
 -    }
 -}
 -
  BlockAcctStats *blk_get_stats(BlockBackend *blk)
  {
      IO_CODE();
 diff --git a/block/plug.c b/block/plug.c
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
 +++ b/block/plug.c
@@ -XXX,XX +XXX,XX @@
 +/* SPDX-License-Identifier: GPL-2.0-or-later */
 +/*
 + * Block I/O plugging
 + *
 + * Copyright Red Hat.
 + *
 + * This API defers a function call within a blk_io_plug()/blk_io_unplug()
 + * section, allowing multiple calls to batch up. This is a performance
 + * optimization that is used in the block layer to submit several I/O requests
 + * at once instead of individually:
 + *
 + *   blk_io_plug(); <-- start of plugged region
 + *   ...
 + *   blk_io_plug_call(my_func, my_obj); <-- deferred my_func(my_obj) call
 + *   blk_io_plug_call(my_func, my_obj); <-- another
 + *   blk_io_plug_call(my_func, my_obj); <-- another
 + *   ...
 + *   blk_io_unplug(); <-- end of plugged region, my_func(my_obj) is called once
 + *
 + * This code is actually generic and not tied to the block layer. If another
 + * subsystem needs this functionality, it could be renamed.
 + */
 +
 +#include "qemu/osdep.h"
 +#include "qemu/coroutine-tls.h"
 +#include "qemu/notify.h"
 +#include "qemu/thread.h"
 +#include "sysemu/block-backend.h"
 +
 +/* A function call that has been deferred until unplug() */
 +typedef struct {
 +    void (*fn)(void *);
 +    void *opaque;
 +} UnplugFn;
 +
 +/* Per-thread state */
 +typedef struct {
 +    unsigned count;       /* how many times has plug() been called? */
 +    GArray *unplug_fns;   /* functions to call at unplug time */
 +} Plug;
 +
 +/* Use get_ptr_plug() to fetch this thread-local value */
 +QEMU_DEFINE_STATIC_CO_TLS(Plug, plug);
 +
 +/* Called at thread cleanup time */
 +static void blk_io_plug_atexit(Notifier *n, void *value)
 +{
 +    Plug *plug = get_ptr_plug();
 +    g_array_free(plug->unplug_fns, TRUE);
 +}
 +
 +/* This won't involve coroutines, so use __thread */
 +static __thread Notifier blk_io_plug_atexit_notifier;
 +
 +/**
 + * blk_io_plug_call:
 + * @fn: a function pointer to be invoked
 + * @opaque: a user-defined argument to @fn()
 + *
 + * Call @fn(@opaque) immediately if not within a blk_io_plug()/blk_io_unplug()
 + * section.
 + *
 + * Otherwise defer the call until the end of the outermost
 + * blk_io_plug()/blk_io_unplug() section in this thread. If the same
 + * @fn/@opaque pair has already been deferred, it will only be called once upon
 + * blk_io_unplug() so that accumulated calls are batched into a single call.
 + *
 + * The caller must ensure that @opaque is not freed before @fn() is invoked.
 + */
 +void blk_io_plug_call(void (*fn)(void *), void *opaque)
 +{
 +    Plug *plug = get_ptr_plug();
 +
 +    /* Call immediately if we're not plugged */
 +    if (plug->count == 0) {
 +        fn(opaque);
 +        return;
 +    }
 +
 +    GArray *array = plug->unplug_fns;
 +    if (!array) {
 +        array = g_array_new(FALSE, FALSE, sizeof(UnplugFn));
 +        plug->unplug_fns = array;
 +        blk_io_plug_atexit_notifier.notify = blk_io_plug_atexit;
 +        qemu_thread_atexit_add(&blk_io_plug_atexit_notifier);
 +    }
 +
 +    UnplugFn *fns = (UnplugFn *)array->data;
 +    UnplugFn new_fn = {
 +        .fn = fn,
 +        .opaque = opaque,
 +    };
 +
 +    /*
 +     * There won't be many, so do a linear search. If this becomes a bottleneck
 +     * then a binary search (glib 2.62+) or different data structure could be
 +     * used.
 +     */
 +    for (guint i = 0; i < array->len; i++) {
 +        if (memcmp(&fns[i], &new_fn, sizeof(new_fn)) == 0) {
 +            return; /* already exists */
 +        }
 +    }
 +
 +    g_array_append_val(array, new_fn);
 +}
 +
 +/**
 + * blk_io_plug: Defer blk_io_plug_call() functions until blk_io_unplug()
 + *
 + * blk_io_plug/unplug are thread-local operations. This means that multiple
 + * threads can simultaneously call plug/unplug, but the caller must ensure that
 + * each unplug() is called in the same thread as the matching plug().
 + *
 + * Nesting is supported. blk_io_plug_call() functions are only called at the
 + * outermost blk_io_unplug().
 + */
 +void blk_io_plug(void)
 +{
 +    Plug *plug = get_ptr_plug();
 +
 +    assert(plug->count < UINT32_MAX);
 +
 +    plug->count++;
 +}
 +
 +/**
 + * blk_io_unplug: Run any pending blk_io_plug_call() functions
 + *
 + * There must have been a matching blk_io_plug() call in the same thread prior
 + * to this blk_io_unplug() call.
 + */
 +void blk_io_unplug(void)
 +{
 +    Plug *plug = get_ptr_plug();
 +
 +    assert(plug->count > 0);
 +
 +    if (--plug->count > 0) {
 +        return;
 +    }
 +
 +    GArray *array = plug->unplug_fns;
 +    if (!array) {
 +        return;
 +    }
 +
 +    UnplugFn *fns = (UnplugFn *)array->data;
 +
 +    for (guint i = 0; i < array->len; i++) {
 +        fns[i].fn(fns[i].opaque);
 +    }
 +
 +    /*
 +     * This resets the array without freeing memory so that appending is cheap
 +     * in the future.
 +     */
 +    g_array_set_size(array, 0);
 +}
 diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/block/dataplane/xen-block.c
 +++ b/hw/block/dataplane/xen-block.c
@@ -XXX,XX +XXX,XX @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
       * is below us.
       */
      if (inflight_atstart > IO_PLUG_THRESHOLD) {
 -        blk_io_plug(dataplane->blk);
 +        blk_io_plug();
      }
      while (rc != rp) {
          /* pull request from ring */
@@ -XXX,XX +XXX,XX @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
          if (inflight_atstart > IO_PLUG_THRESHOLD &&
              batched >= inflight_atstart) {
 -            blk_io_unplug(dataplane->blk);
 +            blk_io_unplug();
          }
          xen_block_do_aio(request);
          if (inflight_atstart > IO_PLUG_THRESHOLD) {
              if (batched >= inflight_atstart) {
 -                blk_io_plug(dataplane->blk);
 +                blk_io_plug();
                  batched = 0;
              } else {
                  batched++;
@@ -XXX,XX +XXX,XX @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
          }
      }
      if (inflight_atstart > IO_PLUG_THRESHOLD) {
 -        blk_io_unplug(dataplane->blk);
 +        blk_io_unplug();
      }
      return done_something;
 diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/block/virtio-blk.c
 +++ b/hw/block/virtio-blk.c
@@ -XXX,XX +XXX,XX @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
      bool suppress_notifications = virtio_queue_get_notification(vq);
      aio_context_acquire(blk_get_aio_context(s->blk));
 -    blk_io_plug(s->blk);
 +    blk_io_plug();
      do {
          if (suppress_notifications) {
@@ -XXX,XX +XXX,XX @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
          virtio_blk_submit_multireq(s, &mrb);
      }
 -    blk_io_unplug(s->blk);
 +    blk_io_unplug();
      aio_context_release(blk_get_aio_context(s->blk));
  }
 diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/scsi/virtio-scsi.c
 +++ b/hw/scsi/virtio-scsi.c
@@ -XXX,XX +XXX,XX @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq *req)
          return -ENOBUFS;
      }
      scsi_req_ref(req->sreq);
 -    blk_io_plug(d->conf.blk);
 +    blk_io_plug();
      object_unref(OBJECT(d));
      return 0;
  }
@@ -XXX,XX +XXX,XX @@ static void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req)
      if (scsi_req_enqueue(sreq)) {
          scsi_req_continue(sreq);
      }
 -    blk_io_unplug(sreq->dev->conf.blk);
 +    blk_io_unplug();
      scsi_req_unref(sreq);
  }
@@ -XXX,XX +XXX,XX @@ static void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
                  while (!QTAILQ_EMPTY(&reqs)) {
                      req = QTAILQ_FIRST(&reqs);
                      QTAILQ_REMOVE(&reqs, req, next);
 -                    blk_io_unplug(req->sreq->dev->conf.blk);
 +                    blk_io_unplug();
                      scsi_req_unref(req->sreq);
                      virtqueue_detach_element(req->vq, &req->elem, 0);
                      virtio_scsi_free_req(req);
 diff --git a/block/meson.build b/block/meson.build
 index XXXXXXX..XXXXXXX 100644
 --- a/block/meson.build
 +++ b/block/meson.build
@@ -XXX,XX +XXX,XX @@ block_ss.add(files(
    'mirror.c',
    'nbd.c',
    'null.c',
 +  'plug.c',
    'qapi.c',
    'qcow2-bitmap.c',
    'qcow2-cache.c',
 --
-.20.1
+.40.1
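
Note: the deferral and de-duplication behaviour that block/plug.c describes can be tried in isolation. Below is a minimal standalone sketch, assuming a fixed-size array in place of the GArray and omitting the thread-exit cleanup. All sketch_* names and SKETCH_MAX_DEFERRED are hypothetical and are not part of the patch; __thread is the GCC/Clang extension the patch itself relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed capacity; the patch uses a growable GArray instead. */
#define SKETCH_MAX_DEFERRED 16

/* A deferred call, mirroring UnplugFn in the patch. */
typedef struct {
    void (*fn)(void *);
    void *opaque;
} SketchDeferred;

/* Per-thread plug state, mirroring Plug in the patch. */
typedef struct {
    unsigned count;                              /* plug nesting depth */
    size_t len;                                  /* number of pending calls */
    SketchDeferred pending[SKETCH_MAX_DEFERRED]; /* calls to run at unplug */
} SketchPlug;

static __thread SketchPlug sketch_state;

/* Enter a plugged section; sections may nest. */
void sketch_plug(void)
{
    sketch_state.count++;
}

/* Run fn(opaque) now if unplugged, otherwise defer it once per pair. */
void sketch_plug_call(void (*fn)(void *), void *opaque)
{
    SketchPlug *p = &sketch_state;

    if (p->count == 0) {
        fn(opaque);
        return;
    }

    /* Linear search: an identical fn/opaque pair is only queued once. */
    for (size_t i = 0; i < p->len; i++) {
        if (p->pending[i].fn == fn && p->pending[i].opaque == opaque) {
            return;
        }
    }

    assert(p->len < SKETCH_MAX_DEFERRED);
    p->pending[p->len].fn = fn;
    p->pending[p->len].opaque = opaque;
    p->len++;
}

/* Leave a plugged section; the outermost unplug runs the pending calls. */
void sketch_unplug(void)
{
    SketchPlug *p = &sketch_state;

    assert(p->count > 0);
    if (--p->count > 0) {
        return;
    }

    for (size_t i = 0; i < p->len; i++) {
        p->pending[i].fn(p->pending[i].opaque);
    }
    p->len = 0; /* keep the storage, forget the entries */
}

/* Example callback: counts how many times it has been invoked. */
void sketch_count(void *opaque)
{
    (*(unsigned *)opaque)++;
}
```

Calling sketch_plug_call(sketch_count, &n) several times between sketch_plug() and sketch_unplug() increments n only once, at the outermost unplug. Outside a plugged section every call runs immediately.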

-[Qemu-devel] [PULL 3/7] iothread: replace init_done_cond with a semaphore
+[PULL 2/8] block/nvme: convert to blk_io_plug_call() API
-From: Peter Xu <peterx@redhat.com>
+Stop using the .bdrv_co_io_plug() API because it is not multi-queue
 block layer friendly. Use the new blk_io_plug_call() API to batch I/O
 submission instead.
-Only sending an init-done message using lock+cond seems an overkill to
+Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-me.  Replacing it with a simpler semaphore.
+Reviewed-by: Eric Blake <eblake@redhat.com>
+Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
-Meanwhile, init the semaphore unconditionally, then we can destroy it
+Acked-by: Kevin Wolf <kwolf@redhat.com>
-unconditionally too in finalize which seems cleaner.
+Message-id: 20230530180959.1108766-3-stefanha@redhat.com
 Signed-off-by: Peter Xu <peterx@redhat.com>
 Message-id: 20190306115532.23025-2-peterx@redhat.com
 Message-Id: <20190306115532.23025-2-peterx@redhat.com>
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 ---
- include/sysemu/iothread.h |  3 +--
+ block/nvme.c       | 44 ++++++++++++--------------------------------
- iothread.c                | 17 ++++-------------
+ block/trace-events |  1 -
-files changed, 5 insertions(+), 15 deletions(-)
+files changed, 12 insertions(+), 33 deletions(-)
-diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h
+diff --git a/block/nvme.c b/block/nvme.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/sysemu/iothread.h
+--- a/block/nvme.c
-+++ b/include/sysemu/iothread.h
++++ b/block/nvme.c
-@@ -XXX,XX +XXX,XX @@ typedef struct {
+@@ -XXX,XX +XXX,XX @@
-     GMainContext *worker_context;
+ #include "qemu/vfio-helpers.h"
-     GMainLoop *main_loop;
+ #include "block/block-io.h"
-     GOnce once;
+ #include "block/block_int.h"
--    QemuMutex init_done_lock;
++#include "sysemu/block-backend.h"
--    QemuCond init_done_cond;    /* is thread initialization done? */
+ #include "sysemu/replay.h"
-+    QemuSemaphore init_done_sem; /* is thread init done? */
+ #include "trace.h"
-     bool stopping;              /* has iothread_stop() been called? */
-     bool running;               /* should iothread_run() continue? */
+@@ -XXX,XX +XXX,XX @@ struct BDRVNVMeState {
-     int thread_id;
+     int blkshift;
-diff --git a/iothread.c b/iothread.c
-index XXXXXXX..XXXXXXX 100644
+     uint64_t max_transfer;
---- a/iothread.c
+-    bool plugged;
-+++ b/iothread.c
-@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
+     bool supports_write_zeroes;
-     rcu_register_thread();
+     bool supports_discard;
+@@ -XXX,XX +XXX,XX @@ static void nvme_kick(NVMeQueuePair *q)
-     my_iothread = iothread;
+ {
--    qemu_mutex_lock(&iothread->init_done_lock);
+     BDRVNVMeState *s = q->s;
-     iothread->thread_id = qemu_get_thread_id();
--    qemu_cond_signal(&iothread->init_done_cond);
+-    if (s->plugged || !q->need_kick) {
--    qemu_mutex_unlock(&iothread->init_done_lock);
++    if (!q->need_kick) {
 +    qemu_sem_post(&iothread->init_done_sem);
      while (iothread->running) {
          aio_poll(iothread->ctx, true);
@@ -XXX,XX +XXX,XX @@ static void iothread_instance_init(Object *obj)
      iothread->poll_max_ns = IOTHREAD_POLL_MAX_NS_DEFAULT;
      iothread->thread_id = -1;
 +    qemu_sem_init(&iothread->init_done_sem, 0);
  }
  static void iothread_instance_finalize(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void iothread_instance_finalize(Object *obj)
      iothread_stop(iothread);
 -    if (iothread->thread_id != -1) {
 -        qemu_cond_destroy(&iothread->init_done_cond);
 -        qemu_mutex_destroy(&iothread->init_done_lock);
 -    }
      /*
       * Before glib2 2.33.10, there is a glib2 bug that GSource context
       * pointer may not be cleared even if the context has already been
@@ -XXX,XX +XXX,XX @@ static void iothread_instance_finalize(Object *obj)
          g_main_context_unref(iothread->worker_context);
          iothread->worker_context = NULL;
      }
 +    qemu_sem_destroy(&iothread->init_done_sem);
  }
  static void iothread_complete(UserCreatable *obj, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void iothread_complete(UserCreatable *obj, Error **errp)
          return;
      }
+     trace_nvme_kick(s, q->index);
--    qemu_mutex_init(&iothread->init_done_lock);
+@@ -XXX,XX +XXX,XX @@ static bool nvme_process_completion(NVMeQueuePair *q)
--    qemu_cond_init(&iothread->init_done_cond);
+     NvmeCqe *c;
-     iothread->once = (GOnce) G_ONCE_INIT;
+     trace_nvme_process_completion(s, q->index, q->inflight);
-     /* This assumes we are called from a thread with useful CPU affinity for us
+-    if (s->plugged) {
-@@ -XXX,XX +XXX,XX @@ static void iothread_complete(UserCreatable *obj, Error **errp)
+-        trace_nvme_process_completion_queue_plugged(s, q->index);
-     g_free(name);
+-        return false;
+-    }
-     /* Wait for initialization to complete */
--    qemu_mutex_lock(&iothread->init_done_lock);
+     /*
-     while (iothread->thread_id == -1) {
+      * Support re-entrancy when a request cb() function invokes aio_poll().
--        qemu_cond_wait(&iothread->init_done_cond,
+@@ -XXX,XX +XXX,XX @@ static void nvme_trace_command(const NvmeCmd *cmd)
 -                       &iothread->init_done_lock);
 +        qemu_sem_wait(&iothread->init_done_sem);
      }
--    qemu_mutex_unlock(&iothread->init_done_lock);
  }
- typedef struct {
++static void nvme_unplug_fn(void *opaque)
 +{
 +    NVMeQueuePair *q = opaque;
 +
 +    QEMU_LOCK_GUARD(&q->lock);
 +    nvme_kick(q);
 +    nvme_process_completion(q);
 +}
 +
  static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
                                  NvmeCmd *cmd, BlockCompletionFunc cb,
                                  void *opaque)
@@ -XXX,XX +XXX,XX @@ static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
             q->sq.tail * NVME_SQ_ENTRY_BYTES, cmd, sizeof(*cmd));
      q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE;
      q->need_kick++;
 -    nvme_kick(q);
 -    nvme_process_completion(q);
 +    blk_io_plug_call(nvme_unplug_fn, q);
      qemu_mutex_unlock(&q->lock);
  }
@@ -XXX,XX +XXX,XX @@ static void nvme_attach_aio_context(BlockDriverState *bs,
      }
  }
 -static void coroutine_fn nvme_co_io_plug(BlockDriverState *bs)
 -{
 -    BDRVNVMeState *s = bs->opaque;
 -    assert(!s->plugged);
 -    s->plugged = true;
 -}
 -
 -static void coroutine_fn nvme_co_io_unplug(BlockDriverState *bs)
 -{
 -    BDRVNVMeState *s = bs->opaque;
 -    assert(s->plugged);
 -    s->plugged = false;
 -    for (unsigned i = INDEX_IO(0); i < s->queue_count; i++) {
 -        NVMeQueuePair *q = s->queues[i];
 -        qemu_mutex_lock(&q->lock);
 -        nvme_kick(q);
 -        nvme_process_completion(q);
 -        qemu_mutex_unlock(&q->lock);
 -    }
 -}
 -
  static bool nvme_register_buf(BlockDriverState *bs, void *host, size_t size,
                                Error **errp)
  {
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_nvme = {
      .bdrv_detach_aio_context  = nvme_detach_aio_context,
      .bdrv_attach_aio_context  = nvme_attach_aio_context,
 -    .bdrv_co_io_plug          = nvme_co_io_plug,
 -    .bdrv_co_io_unplug        = nvme_co_io_unplug,
 -
      .bdrv_register_buf        = nvme_register_buf,
      .bdrv_unregister_buf      = nvme_unregister_buf,
  };
 diff --git a/block/trace-events b/block/trace-events
 index XXXXXXX..XXXXXXX 100644
 --- a/block/trace-events
 +++ b/block/trace-events
@@ -XXX,XX +XXX,XX @@ nvme_kick(void *s, unsigned q_index) "s %p q #%u"
  nvme_dma_flush_queue_wait(void *s) "s %p"
  nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) "cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
  nvme_process_completion(void *s, unsigned q_index, int inflight) "s %p q #%u inflight %d"
 -nvme_process_completion_queue_plugged(void *s, unsigned q_index) "s %p q #%u"
  nvme_complete_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
  nvme_submit_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
  nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int c6, int c7) "%02x %02x %02x %02x %02x %02x %02x %02x"
 --
-.20.1
+.40.1
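
Note: the driver-side pattern in this conversion (count pending work in need_kick, then let one deferred callback flush it) can be illustrated without QEMU. The following is a rough sketch under several assumptions: the demo_* plug helpers are a simplified stand-in for blk_io_plug()/blk_io_unplug()/blk_io_plug_call(), DemoQueue is a hypothetical reduction of NVMeQueuePair, and all locking is left out.

```c
#include <assert.h>
#include <stddef.h>

/* --- Simplified stand-in for the plug API (hypothetical names) --- */

#define DEMO_MAX_DEFERRED 8

typedef struct {
    void (*fn)(void *);
    void *opaque;
} DemoDeferred;

static __thread unsigned demo_plug_depth;
static __thread size_t demo_pending_len;
static __thread DemoDeferred demo_pending[DEMO_MAX_DEFERRED];

void demo_plug(void)
{
    demo_plug_depth++;
}

void demo_plug_call(void (*fn)(void *), void *opaque)
{
    if (demo_plug_depth == 0) {
        fn(opaque); /* not plugged: run right away */
        return;
    }
    for (size_t i = 0; i < demo_pending_len; i++) {
        if (demo_pending[i].fn == fn && demo_pending[i].opaque == opaque) {
            return; /* already deferred */
        }
    }
    assert(demo_pending_len < DEMO_MAX_DEFERRED);
    demo_pending[demo_pending_len].fn = fn;
    demo_pending[demo_pending_len].opaque = opaque;
    demo_pending_len++;
}

void demo_unplug(void)
{
    assert(demo_plug_depth > 0);
    if (--demo_plug_depth > 0) {
        return;
    }
    for (size_t i = 0; i < demo_pending_len; i++) {
        demo_pending[i].fn(demo_pending[i].opaque);
    }
    demo_pending_len = 0;
}

/* --- Driver side: hypothetical queue loosely modelled on NVMeQueuePair --- */

typedef struct {
    unsigned need_kick; /* requests queued but not yet signalled */
    unsigned kicks;     /* number of doorbell writes issued */
    unsigned submitted; /* requests the device has been told about */
} DemoQueue;

/* Signal the device once for everything queued so far. */
static void demo_kick(DemoQueue *q)
{
    if (!q->need_kick) {
        return;
    }
    q->submitted += q->need_kick;
    q->need_kick = 0;
    q->kicks++;
}

/* Deferred callback, playing the role of nvme_unplug_fn() in the patch. */
static void demo_unplug_fn(void *opaque)
{
    demo_kick(opaque);
}

/* Queue one request and schedule (or immediately perform) the kick. */
void demo_submit(DemoQueue *q)
{
    q->need_kick++;
    demo_plug_call(demo_unplug_fn, q);
}
```

With this shape the driver no longer needs its own plugged flag: three demo_submit() calls inside a demo_plug()/demo_unplug() section produce a single kick covering all three requests, while a submit outside any section kicks immediately.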

-[Qemu-devel] [PULL 4/7] iothread: create the gcontext unconditionally
+[PULL 3/8] block/blkio: convert to blk_io_plug_call() API
-From: Peter Xu <peterx@redhat.com>
+Stop using the .bdrv_co_io_plug() API because it is not multi-queue
 block layer friendly. Use the new blk_io_plug_call() API to batch I/O
 submission instead.
-In existing code we create the gcontext dynamically at the first
+Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-access of the gcontext from caller.  That can bring some complexity
+Reviewed-by: Eric Blake <eblake@redhat.com>
-and potential races during using iothread.  Since the context itself
+Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
-is not that big a resource, and we won't have millions of iothread,
+Acked-by: Kevin Wolf <kwolf@redhat.com>
-let's simply create the gcontext unconditionally.
+Message-id: 20230530180959.1108766-4-stefanha@redhat.com
 This also prepares for moving the thread context push operation earlier
 (currently it is pushed only right before we start running the
 gmainloop).
 Remove the g_once since it is no longer necessary, and introduce a new
 run_gcontext boolean to indicate whether we want to run the gcontext.
 Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
 Signed-off-by: Peter Xu <peterx@redhat.com>
 Message-id: 20190306115532.23025-3-peterx@redhat.com
 Message-Id: <20190306115532.23025-3-peterx@redhat.com>
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 ---
- include/sysemu/iothread.h |  2 +-
+ block/blkio.c | 43 ++++++++++++++++++++++++-------------------
- iothread.c                | 43 +++++++++++++++++++--------------------
+file changed, 24 insertions(+), 19 deletions(-)
 files changed, 22 insertions(+), 23 deletions(-)
-diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h
+diff --git a/block/blkio.c b/block/blkio.c
 index XXXXXXX..XXXXXXX 100644
---- a/include/sysemu/iothread.h
+--- a/block/blkio.c
-+++ b/include/sysemu/iothread.h
++++ b/block/blkio.c
-@@ -XXX,XX +XXX,XX @@ typedef struct {
+@@ -XXX,XX +XXX,XX @@
+ #include "qemu/error-report.h"
-     QemuThread thread;
+ #include "qapi/qmp/qdict.h"
-     AioContext *ctx;
+ #include "qemu/module.h"
-+    bool run_gcontext;          /* whether we should run gcontext */
++#include "sysemu/block-backend.h"
-     GMainContext *worker_context;
+ #include "exec/memory.h" /* for ram_block_discard_disable() */
-     GMainLoop *main_loop;
--    GOnce once;
+ #include "block/block-io.h"
-     QemuSemaphore init_done_sem; /* is thread init done? */
+@@ -XXX,XX +XXX,XX @@ static void blkio_detach_aio_context(BlockDriverState *bs)
-     bool stopping;              /* has iothread_stop() been called? */
+                        NULL, NULL, NULL);
      bool running;               /* should iothread_run() continue? */
 diff --git a/iothread.c b/iothread.c
 index XXXXXXX..XXXXXXX 100644
 --- a/iothread.c
 +++ b/iothread.c
@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
           * We must check the running state again in case it was
           * changed in previous aio_poll()
           */
 -        if (iothread->running && atomic_read(&iothread->worker_context)) {
 +        if (iothread->running && atomic_read(&iothread->run_gcontext)) {
              GMainLoop *loop;
              g_main_context_push_thread_default(iothread->worker_context);
@@ -XXX,XX +XXX,XX @@ static void iothread_instance_init(Object *obj)
      iothread->poll_max_ns = IOTHREAD_POLL_MAX_NS_DEFAULT;
      iothread->thread_id = -1;
      qemu_sem_init(&iothread->init_done_sem, 0);
 +    /* By default, we don't run gcontext */
 +    atomic_set(&iothread->run_gcontext, 0);
  }
- static void iothread_instance_finalize(Object *obj)
+-/* Call with s->blkio_lock held to submit I/O after enqueuing a new request */
-@@ -XXX,XX +XXX,XX @@ static void iothread_instance_finalize(Object *obj)
+-static void blkio_submit_io(BlockDriverState *bs)
-     qemu_sem_destroy(&iothread->init_done_sem);
++/*
 + * Called by blk_io_unplug() or immediately if not plugged. Called without
 + * blkio_lock.
 + */
 +static void blkio_unplug_fn(void *opaque)
  {
 -    if (qatomic_read(&bs->io_plugged) == 0) {
 -        BDRVBlkioState *s = bs->opaque;
 +    BDRVBlkioState *s = opaque;
 +    WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
          blkioq_do_io(s->blkioq, NULL, 0, 0, NULL);
      }
  }
-+static void iothread_init_gcontext(IOThread *iothread)
++/*
 + * Schedule I/O submission after enqueuing a new request. Called without
 + * blkio_lock.
 + */
 +static void blkio_submit_io(BlockDriverState *bs)
 +{
-+    GSource *source;
++    BDRVBlkioState *s = bs->opaque;
 +
-+    iothread->worker_context = g_main_context_new();
++    blk_io_plug_call(blkio_unplug_fn, s);
 +    source = aio_get_g_source(iothread_get_aio_context(iothread));
 +    g_source_attach(source, iothread->worker_context);
 +    g_source_unref(source);
 +}
 +
- static void iothread_complete(UserCreatable *obj, Error **errp)
+ static int coroutine_fn
  blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
  {
-     Error *local_error = NULL;
+@@ -XXX,XX +XXX,XX @@ blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
-@@ -XXX,XX +XXX,XX @@ static void iothread_complete(UserCreatable *obj, Error **errp)
-         return;
+     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
          blkioq_discard(s->blkioq, offset, bytes, &cod, 0);
 -        blkio_submit_io(bs);
      }
-+    /*
++    blkio_submit_io(bs);
-+     * Init one GMainContext for the iothread unconditionally, even if
+     qemu_coroutine_yield();
-+     * it's not used
+     return cod.ret;
-+     */
+ }
-+    iothread_init_gcontext(iothread);
+@@ -XXX,XX +XXX,XX @@ blkio_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
-+
-     aio_context_set_poll_params(iothread->ctx,
+     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
-                                 iothread->poll_max_ns,
+         blkioq_readv(s->blkioq, offset, iov, iovcnt, &cod, 0);
-                                 iothread->poll_grow,
+-        blkio_submit_io(bs);
@@ -XXX,XX +XXX,XX @@ static void iothread_complete(UserCreatable *obj, Error **errp)
          return;
      }
--    iothread->once = (GOnce) G_ONCE_INIT;
++    blkio_submit_io(bs);
      qemu_coroutine_yield();
      if (use_bounce_buffer) {
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn blkio_co_pwritev(BlockDriverState *bs, int64_t offset,
      WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
          blkioq_writev(s->blkioq, offset, iov, iovcnt, &cod, blkio_flags);
 -        blkio_submit_io(bs);
      }
 +    blkio_submit_io(bs);
      qemu_coroutine_yield();
      if (use_bounce_buffer) {
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn blkio_co_flush(BlockDriverState *bs)
      WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
          blkioq_flush(s->blkioq, &cod, 0);
 -        blkio_submit_io(bs);
      }
 +    blkio_submit_io(bs);
      qemu_coroutine_yield();
      return cod.ret;
  }
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn blkio_co_pwrite_zeroes(BlockDriverState *bs,
      WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
          blkioq_write_zeroes(s->blkioq, offset, bytes, &cod, blkio_flags);
 -        blkio_submit_io(bs);
      }
 +    blkio_submit_io(bs);
      qemu_coroutine_yield();
      return cod.ret;
  }
 -static void coroutine_fn blkio_co_io_unplug(BlockDriverState *bs)
 -{
 -    BDRVBlkioState *s = bs->opaque;
 -
-     /* This assumes we are called from a thread with useful CPU affinity for us
+-    WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
-      * to inherit.
+-        blkio_submit_io(bs);
-      */
+-    }
@@ -XXX,XX +XXX,XX @@ IOThreadInfoList *qmp_query_iothreads(Error **errp)
      return head;
  }
 -static gpointer iothread_g_main_context_init(gpointer opaque)
 -{
 -    AioContext *ctx;
 -    IOThread *iothread = opaque;
 -    GSource *source;
 -
 -    iothread->worker_context = g_main_context_new();
 -
 -    ctx = iothread_get_aio_context(iothread);
 -    source = aio_get_g_source(ctx);
 -    g_source_attach(source, iothread->worker_context);
 -    g_source_unref(source);
 -
 -    aio_notify(iothread->ctx);
 -    return NULL;
 -}
 -
- GMainContext *iothread_get_g_main_context(IOThread *iothread)
+ typedef enum {
- {
+     BMRR_OK,
--    g_once(&iothread->once, iothread_g_main_context_init, iothread);
+     BMRR_SKIP,
--
+@@ -XXX,XX +XXX,XX @@ static void blkio_refresh_limits(BlockDriverState *bs, Error **errp)
-+    atomic_set(&iothread->run_gcontext, 1);
+         .bdrv_co_pwritev         = blkio_co_pwritev, \
-+    aio_notify(iothread->ctx);
+         .bdrv_co_flush_to_disk   = blkio_co_flush, \
-     return iothread->worker_context;
+         .bdrv_co_pwrite_zeroes   = blkio_co_pwrite_zeroes, \
- }
+-        .bdrv_co_io_unplug       = blkio_co_io_unplug, \
+         .bdrv_refresh_limits     = blkio_refresh_limits, \
          .bdrv_register_buf       = blkio_register_buf, \
          .bdrv_unregister_buf     = blkio_unregister_buf, \
 --
-.20.1
+.40.1

-New patch
+[PULL 4/8] block/io_uring: convert to blk_io_plug_call() API
+Stop using the .bdrv_co_io_plug() API because it is not multi-queue
+block layer friendly. Use the new blk_io_plug_call() API to batch I/O
+submission instead.
+Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
+Reviewed-by: Eric Blake <eblake@redhat.com>
+Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
+Acked-by: Kevin Wolf <kwolf@redhat.com>
+Message-id: 20230530180959.1108766-5-stefanha@redhat.com
+Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
+---
+ include/block/raw-aio.h |  7 -------
+ block/file-posix.c      | 10 ----------
+ block/io_uring.c        | 44 ++++++++++++++++-------------------------
+ block/trace-events      |  5 ++---
+files changed, 19 insertions(+), 47 deletions(-)
+diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
+index XXXXXXX..XXXXXXX 100644
+--- a/include/block/raw-aio.h
++++ b/include/block/raw-aio.h
+@@ -XXX,XX +XXX,XX @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset,
+                                   QEMUIOVector *qiov, int type);
+ void luring_detach_aio_context(LuringState *s, AioContext *old_context);
+ void luring_attach_aio_context(LuringState *s, AioContext *new_context);
+-
+-/*
+- * luring_io_plug/unplug work in the thread's current AioContext, therefore the
+- * caller must ensure that they are paired in the same IOThread.
+- */
+-void luring_io_plug(void);
+-void luring_io_unplug(void);
+ #endif
+ #ifdef _WIN32
+diff --git a/block/file-posix.c b/block/file-posix.c
+index XXXXXXX..XXXXXXX 100644
+--- a/block/file-posix.c
++++ b/block/file-posix.c
+@@ -XXX,XX +XXX,XX @@ static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
+         laio_io_plug();
+     }
+ #endif
+-#ifdef CONFIG_LINUX_IO_URING
+-    if (s->use_linux_io_uring) {
+-        luring_io_plug();
+-    }
+-#endif
+ }
+ static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
+@@ -XXX,XX +XXX,XX @@ static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
+         laio_io_unplug(s->aio_max_batch);
+     }
+ #endif
+-#ifdef CONFIG_LINUX_IO_URING
+-    if (s->use_linux_io_uring) {
+-        luring_io_unplug();
+-    }
+-#endif
+ }
+ static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
+diff --git a/block/io_uring.c b/block/io_uring.c
+index XXXXXXX..XXXXXXX 100644
+--- a/block/io_uring.c
++++ b/block/io_uring.c
+@@ -XXX,XX +XXX,XX @@
+ #include "block/raw-aio.h"
+ #include "qemu/coroutine.h"
+ #include "qapi/error.h"
++#include "sysemu/block-backend.h"
+ #include "trace.h"
+ /* Only used for assertions.  */
+@@ -XXX,XX +XXX,XX @@ typedef struct LuringAIOCB {
+ } LuringAIOCB;
+ typedef struct LuringQueue {
+-    int plugged;
+     unsigned int in_queue;
+     unsigned int in_flight;
+     bool blocked;
+@@ -XXX,XX +XXX,XX @@ static void luring_process_completions_and_submit(LuringState *s)
+ {
+     luring_process_completions(s);
+-    if (!s->io_q.plugged && s->io_q.in_queue > 0) {
++    if (s->io_q.in_queue > 0) {
+         ioq_submit(s);
+     }
+ }
+@@ -XXX,XX +XXX,XX @@ static void qemu_luring_poll_ready(void *opaque)
+ static void ioq_init(LuringQueue *io_q)
+ {
+     QSIMPLEQ_INIT(&io_q->submit_queue);
+-    io_q->plugged = 0;
+     io_q->in_queue = 0;
+     io_q->in_flight = 0;
+     io_q->blocked = false;
+ }
+-void luring_io_plug(void)
++static void luring_unplug_fn(void *opaque)
+ {
+-    AioContext *ctx = qemu_get_current_aio_context();
+-    LuringState *s = aio_get_linux_io_uring(ctx);
+-    trace_luring_io_plug(s);
+-    s->io_q.plugged++;
+-}
+-
+-void luring_io_unplug(void)
+-{
+-    AioContext *ctx = qemu_get_current_aio_context();
+-    LuringState *s = aio_get_linux_io_uring(ctx);
+-    assert(s->io_q.plugged);
+-    trace_luring_io_unplug(s, s->io_q.blocked, s->io_q.plugged,
+-                           s->io_q.in_queue, s->io_q.in_flight);
+-    if (--s->io_q.plugged == 0 &&
+-        !s->io_q.blocked && s->io_q.in_queue > 0) {
++    LuringState *s = opaque;
++    trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
++                           s->io_q.in_flight);
++    if (!s->io_q.blocked && s->io_q.in_queue > 0) {
+         ioq_submit(s);
+     }
+ }
+@@ -XXX,XX +XXX,XX @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
+     QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
+     s->io_q.in_queue++;
+-    trace_luring_do_submit(s, s->io_q.blocked, s->io_q.plugged,
+-                           s->io_q.in_queue, s->io_q.in_flight);
+-    if (!s->io_q.blocked &&
+-        (!s->io_q.plugged ||
+-         s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES)) {
+-        ret = ioq_submit(s);
+-        trace_luring_do_submit_done(s, ret);
+-        return ret;
++    trace_luring_do_submit(s, s->io_q.blocked, s->io_q.in_queue,
++                           s->io_q.in_flight);
++    if (!s->io_q.blocked) {
++        if (s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES) {
++            ret = ioq_submit(s);
++            trace_luring_do_submit_done(s, ret);
++            return ret;
++        }
++
++        blk_io_plug_call(luring_unplug_fn, s);
+     }
+     return 0;
+ }
+diff --git a/block/trace-events b/block/trace-events
+index XXXXXXX..XXXXXXX 100644
+--- a/block/trace-events
++++ b/block/trace-events
+@@ -XXX,XX +XXX,XX @@ file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "
+ # io_uring.c
+ luring_init_state(void *s, size_t size) "s %p size %zu"
+ luring_cleanup_state(void *s) "%p freed"
+-luring_io_plug(void *s) "LuringState %p plug"
+-luring_io_unplug(void *s, int blocked, int plugged, int queued, int inflight) "LuringState %p blocked %d plugged %d queued %d inflight %d"
+-luring_do_submit(void *s, int blocked, int plugged, int queued, int inflight) "LuringState %p blocked %d plugged %d queued %d inflight %d"
++luring_unplug_fn(void *s, int blocked, int queued, int inflight) "LuringState %p blocked %d queued %d inflight %d"
++luring_do_submit(void *s, int blocked, int queued, int inflight) "LuringState %p blocked %d queued %d inflight %d"
+ luring_do_submit_done(void *s, int ret) "LuringState %p submitted to kernel %d"
+ luring_co_submit(void *bs, void *s, void *luringcb, int fd, uint64_t offset, size_t nbytes, int type) "bs %p s %p luringcb %p fd %d offset %" PRId64 " nbytes %zd type %d"
+ luring_process_completion(void *s, void *aiocb, int ret) "LuringState %p luringcb %p ret %d"
+--
+.40.1
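
The batching behaviour this patch relies on can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not QEMU's implementation: every toy_* name, the fixed-size array and the duplicate check are hypothetical. It models the semantics the series describes: callbacks are tracked per thread, and a deferred callback runs once, when the outermost unplug happens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; names and sizes are illustrative, not QEMU's. */
#define TOY_MAX_DEFERRED 16

typedef void (*toy_unplug_fn)(void *opaque);

typedef struct {
    toy_unplug_fn fn;
    void *opaque;
} ToyDeferred;

/* One plug state per thread, so no locking is needed. */
static _Thread_local unsigned toy_plug_depth;
static _Thread_local ToyDeferred toy_deferred[TOY_MAX_DEFERRED];
static _Thread_local size_t toy_ndeferred;

/* Run fn(opaque) now if unplugged, else once at the outermost unplug. */
void toy_plug_call(toy_unplug_fn fn, void *opaque)
{
    if (toy_plug_depth == 0) {
        fn(opaque);
        return;
    }
    for (size_t i = 0; i < toy_ndeferred; i++) {
        if (toy_deferred[i].fn == fn && toy_deferred[i].opaque == opaque) {
            return; /* already scheduled: batch into a single call */
        }
    }
    assert(toy_ndeferred < TOY_MAX_DEFERRED);
    toy_deferred[toy_ndeferred++] = (ToyDeferred){ fn, opaque };
}

/* Enter a plugged section; sections may nest. */
void toy_plug(void)
{
    toy_plug_depth++;
}

/* Leave a plugged section; only the outermost exit runs the callbacks. */
void toy_unplug(void)
{
    assert(toy_plug_depth > 0);
    if (--toy_plug_depth > 0) {
        return; /* still inside a nested plugged section */
    }
    for (size_t i = 0; i < toy_ndeferred; i++) {
        toy_deferred[i].fn(toy_deferred[i].opaque);
    }
    toy_ndeferred = 0;
}

/* Example callback: counts how many "submissions" took place. */
static void toy_count_submit(void *opaque)
{
    (*(int *)opaque)++;
}
```

In this model a driver queues its request and then calls toy_plug_call() with its own submit function, much as luring_do_submit() calls blk_io_plug_call(luring_unplug_fn, s) above. Several requests queued inside one plugged section therefore lead to a single submission.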

-[Qemu-devel] [PULL 7/7] iothread: document about why we need explicit aio_poll()
+[PULL 5/8] block/linux-aio: convert to blk_io_plug_call() API
-From: Peter Xu <peterx@redhat.com>
+Stop using the .bdrv_co_io_plug() API because it is not multi-queue
+block layer friendly. Use the new blk_io_plug_call() API to batch I/O
-After consulting Paolo I know why we'd better keep the explicit
+submission instead.
-aio_poll() in iothread_run().  Document it directly into the code so
-that future readers will know the answer from day one.
+Note that a dev_max_batch check is dropped in laio_io_unplug() because
+the semantics of unplug_fn() are different from .bdrv_co_unplug():
-Signed-off-by: Peter Xu <peterx@redhat.com>
+1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
-Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
+   not every time blk_io_unplug() is called.
-Message-id: 20190306115532.23025-6-peterx@redhat.com
+2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
-Message-Id: <20190306115532.23025-6-peterx@redhat.com>
+   way to get per-BlockDriverState fields like dev_max_batch.
 Therefore this condition cannot be moved to laio_unplug_fn(). It is not
 obvious that this condition affects performance in practice, so I am
 removing it instead of trying to come up with a more complex mechanism
 to preserve the condition.
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 Reviewed-by: Eric Blake <eblake@redhat.com>
 Acked-by: Kevin Wolf <kwolf@redhat.com>
 Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
 Message-id: 20230530180959.1108766-6-stefanha@redhat.com
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 ---
- iothread.c | 9 +++++++++
+ include/block/raw-aio.h |  7 -------
-file changed, 9 insertions(+)
+ block/file-posix.c      | 28 ----------------------------
+ block/linux-aio.c       | 41 +++++++++++------------------------------
-diff --git a/iothread.c b/iothread.c
+files changed, 11 insertions(+), 65 deletions(-)
 diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
 index XXXXXXX..XXXXXXX 100644
---- a/iothread.c
+--- a/include/block/raw-aio.h
-+++ b/iothread.c
++++ b/include/block/raw-aio.h
-@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
+@@ -XXX,XX +XXX,XX @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,
-     qemu_sem_post(&iothread->init_done_sem);
+ void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
-     while (iothread->running) {
+ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
-+        /*
+-
-+         * Note: from functional-wise the g_main_loop_run() below can
+-/*
-+         * already cover the aio_poll() events, but we can't run the
+- * laio_io_plug/unplug work in the thread's current AioContext, therefore the
-+         * main loop unconditionally because explicit aio_poll() here
+- * caller must ensure that they are paired in the same IOThread.
-+         * is faster than g_main_loop_run() when we do not need the
+- */
-+         * gcontext at all (e.g., pure block layer iothreads).  In
+-void laio_io_plug(void);
-+         * other words, when we want to run the gcontext with the
+-void laio_io_unplug(uint64_t dev_max_batch);
-+         * iothread we need to pay some performance for functionality.
+ #endif
-+         */
+ /* io_uring.c - Linux io_uring implementation */
-         aio_poll(iothread->ctx, true);
+ #ifdef CONFIG_LINUX_IO_URING
+diff --git a/block/file-posix.c b/block/file-posix.c
-         /*
+index XXXXXXX..XXXXXXX 100644
 --- a/block/file-posix.c
 +++ b/block/file-posix.c
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, int64_t offset,
      return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
  }
 -static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
 -{
 -    BDRVRawState __attribute__((unused)) *s = bs->opaque;
 -#ifdef CONFIG_LINUX_AIO
 -    if (s->use_linux_aio) {
 -        laio_io_plug();
 -    }
 -#endif
 -}
 -
 -static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
 -{
 -    BDRVRawState __attribute__((unused)) *s = bs->opaque;
 -#ifdef CONFIG_LINUX_AIO
 -    if (s->use_linux_aio) {
 -        laio_io_unplug(s->aio_max_batch);
 -    }
 -#endif
 -}
 -
  static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
  {
      BDRVRawState *s = bs->opaque;
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_file = {
      .bdrv_co_copy_range_from = raw_co_copy_range_from,
      .bdrv_co_copy_range_to  = raw_co_copy_range_to,
      .bdrv_refresh_limits = raw_refresh_limits,
 -    .bdrv_co_io_plug        = raw_co_io_plug,
 -    .bdrv_co_io_unplug      = raw_co_io_unplug,
      .bdrv_attach_aio_context = raw_aio_attach_aio_context,
      .bdrv_co_truncate                   = raw_co_truncate,
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
      .bdrv_co_copy_range_from = raw_co_copy_range_from,
      .bdrv_co_copy_range_to  = raw_co_copy_range_to,
      .bdrv_refresh_limits = raw_refresh_limits,
 -    .bdrv_co_io_plug        = raw_co_io_plug,
 -    .bdrv_co_io_unplug      = raw_co_io_unplug,
      .bdrv_attach_aio_context = raw_aio_attach_aio_context,
      .bdrv_co_truncate                   = raw_co_truncate,
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_cdrom = {
      .bdrv_co_pwritev        = raw_co_pwritev,
      .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
      .bdrv_refresh_limits    = cdrom_refresh_limits,
 -    .bdrv_co_io_plug        = raw_co_io_plug,
 -    .bdrv_co_io_unplug      = raw_co_io_unplug,
      .bdrv_attach_aio_context = raw_aio_attach_aio_context,
      .bdrv_co_truncate                   = raw_co_truncate,
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_cdrom = {
      .bdrv_co_pwritev        = raw_co_pwritev,
      .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
      .bdrv_refresh_limits    = cdrom_refresh_limits,
 -    .bdrv_co_io_plug        = raw_co_io_plug,
 -    .bdrv_co_io_unplug      = raw_co_io_unplug,
      .bdrv_attach_aio_context = raw_aio_attach_aio_context,
      .bdrv_co_truncate                   = raw_co_truncate,
 diff --git a/block/linux-aio.c b/block/linux-aio.c
 index XXXXXXX..XXXXXXX 100644
 --- a/block/linux-aio.c
 +++ b/block/linux-aio.c
@@ -XXX,XX +XXX,XX @@
  #include "qemu/event_notifier.h"
  #include "qemu/coroutine.h"
  #include "qapi/error.h"
 +#include "sysemu/block-backend.h"
  /* Only used for assertions.  */
  #include "qemu/coroutine_int.h"
@@ -XXX,XX +XXX,XX @@ struct qemu_laiocb {
  };
  typedef struct {
 -    int plugged;
      unsigned int in_queue;
      unsigned int in_flight;
      bool blocked;
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
  {
      qemu_laio_process_completions(s);
 -    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
 +    if (!QSIMPLEQ_EMPTY(&s->io_q.pending)) {
          ioq_submit(s);
      }
  }
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_poll_ready(EventNotifier *opaque)
  static void ioq_init(LaioQueue *io_q)
  {
      QSIMPLEQ_INIT(&io_q->pending);
 -    io_q->plugged = 0;
      io_q->in_queue = 0;
      io_q->in_flight = 0;
      io_q->blocked = false;
@@ -XXX,XX +XXX,XX @@ static uint64_t laio_max_batch(LinuxAioState *s, uint64_t dev_max_batch)
      return max_batch;
  }
 -void laio_io_plug(void)
 +static void laio_unplug_fn(void *opaque)
  {
 -    AioContext *ctx = qemu_get_current_aio_context();
 -    LinuxAioState *s = aio_get_linux_aio(ctx);
 +    LinuxAioState *s = opaque;
 -    s->io_q.plugged++;
 -}
 -
 -void laio_io_unplug(uint64_t dev_max_batch)
 -{
 -    AioContext *ctx = qemu_get_current_aio_context();
 -    LinuxAioState *s = aio_get_linux_aio(ctx);
 -
 -    assert(s->io_q.plugged);
 -    s->io_q.plugged--;
 -
 -    /*
 -     * Why max batch checking is performed here:
 -     * Another BDS may have queued requests with a higher dev_max_batch and
 -     * therefore in_queue could now exceed our dev_max_batch. Re-check the max
 -     * batch so we can honor our device's dev_max_batch.
 -     */
 -    if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||
 -        (!s->io_q.plugged &&
 -         !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) {
 +    if (!s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
          ioq_submit(s);
      }
  }
@@ -XXX,XX +XXX,XX @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
      QSIMPLEQ_INSERT_TAIL(&s->io_q.pending, laiocb, next);
      s->io_q.in_queue++;
 -    if (!s->io_q.blocked &&
 -        (!s->io_q.plugged ||
 -         s->io_q.in_queue >= laio_max_batch(s, dev_max_batch))) {
 -        ioq_submit(s);
 +    if (!s->io_q.blocked) {
 +        if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch)) {
 +            ioq_submit(s);
 +        } else {
 +            blk_io_plug_call(laio_unplug_fn, s);
 +        }
      }
      return 0;
 --
-.20.1
+.40.1
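
The new control flow in laio_do_submit() reduces to a three-way decision, restated below as a pure function so each case can be checked in isolation. This is a hedged sketch: the enum, the function name and the parameter layout are hypothetical, and only the ordering of the checks is taken from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the order of the checks follows the patch. */
typedef enum {
    TOY_HOLD,            /* queue is blocked: leave the request queued */
    TOY_SUBMIT_NOW,      /* batch limit reached: submit immediately */
    TOY_DEFER_TO_UNPLUG, /* register the unplug callback and wait */
} ToySubmitAction;

ToySubmitAction toy_laio_decide(bool blocked, unsigned int in_queue,
                                uint64_t max_batch)
{
    assert(max_batch > 0); /* a zero batch limit would be meaningless */

    if (blocked) {
        return TOY_HOLD;
    }
    if (in_queue >= max_batch) {
        return TOY_SUBMIT_NOW;
    }
    return TOY_DEFER_TO_UNPLUG;
}
```

The io_uring conversion earlier in the series has the same shape, except that its threshold compares in_flight + in_queue against MAX_ENTRIES.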

-[Qemu-devel] [PULL 6/7] iothread: push gcontext earlier in the thread_fn
+[PULL 6/8] block: remove bdrv_co_io_plug() API
-From: Peter Xu <peterx@redhat.com>
+No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
+function pointers.
 We were pushing the context until right before running the gmainloop.
 Now since we have everything unconditionally, we can move this
 earlier.
 One benefit is that now it's done even before init_done_sem, so as
 long as the iothread user calls iothread_create() and completes, we
 know that the thread stack is ready.
 Signed-off-by: Peter Xu <peterx@redhat.com>
 Message-id: 20190306115532.23025-5-peterx@redhat.com
 Message-Id: <20190306115532.23025-5-peterx@redhat.com>
 [Tweaked comment wording as discussed with Peter Xu.
 --Stefan]
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
+Reviewed-by: Eric Blake <eblake@redhat.com>
+Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
+Acked-by: Kevin Wolf <kwolf@redhat.com>
+Message-id: 20230530180959.1108766-7-stefanha@redhat.com
+Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 ---
- iothread.c | 9 ++++++---
+ include/block/block-io.h         |  3 ---
-file changed, 6 insertions(+), 3 deletions(-)
+ include/block/block_int-common.h | 11 ----------
  block/io.c                       | 37 --------------------------------
 files changed, 51 deletions(-)
-diff --git a/iothread.c b/iothread.c
+diff --git a/include/block/block-io.h b/include/block/block-io.h
 index XXXXXXX..XXXXXXX 100644
---- a/iothread.c
+--- a/include/block/block-io.h
-+++ b/iothread.c
++++ b/include/block/block-io.h
-@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
+@@ -XXX,XX +XXX,XX @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx);
-     IOThread *iothread = opaque;
+ AioContext *child_of_bds_get_parent_aio_context(BdrvChild *c);
-     rcu_register_thread();
 -void coroutine_fn GRAPH_RDLOCK bdrv_co_io_plug(BlockDriverState *bs);
 -void coroutine_fn GRAPH_RDLOCK bdrv_co_io_unplug(BlockDriverState *bs);
 -
-+    /*
+ bool coroutine_fn GRAPH_RDLOCK
-+     * g_main_context_push_thread_default() must be called before anything
+ bdrv_co_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
-+     * in this new thread uses glib.
+                                    uint32_t granularity, Error **errp);
-+     */
+diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
-+    g_main_context_push_thread_default(iothread->worker_context);
+index XXXXXXX..XXXXXXX 100644
-     my_iothread = iothread;
+--- a/include/block/block_int-common.h
-     iothread->thread_id = qemu_get_thread_id();
++++ b/include/block/block_int-common.h
-     qemu_sem_post(&iothread->init_done_sem);
+@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
-@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
+     void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_debug_event)(
-          * changed in previous aio_poll()
+         BlockDriverState *bs, BlkdebugEvent event);
-          */
-         if (iothread->running && atomic_read(&iothread->run_gcontext)) {
+-    /* io queue for linux-aio */
--            g_main_context_push_thread_default(iothread->worker_context);
+-    void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_plug)(BlockDriverState *bs);
-             g_main_loop_run(iothread->main_loop);
+-    void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_unplug)(
--            g_main_context_pop_thread_default(iothread->worker_context);
+-        BlockDriverState *bs);
-         }
+-
-     }
+     bool (*bdrv_supports_persistent_dirty_bitmap)(BlockDriverState *bs);
-+    g_main_context_pop_thread_default(iothread->worker_context);
+     bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_can_store_new_dirty_bitmap)(
-     rcu_unregister_thread();
+@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
-     return NULL;
+     unsigned int in_flight;
      unsigned int serialising_in_flight;
 -    /*
 -     * counter for nested bdrv_io_plug.
 -     * Accessed with atomic ops.
 -     */
 -    unsigned io_plugged;
 -
      /* do we need to tell the quest if we have a volatile write cache? */
      int enable_write_cache;
 diff --git a/block/io.c b/block/io.c
 index XXXXXXX..XXXXXXX 100644
 --- a/block/io.c
 +++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ void *qemu_try_blockalign0(BlockDriverState *bs, size_t size)
      return mem;
  }
+-void coroutine_fn bdrv_co_io_plug(BlockDriverState *bs)
+-{
+-    BdrvChild *child;
+-    IO_CODE();
+-    assert_bdrv_graph_readable();
+-
+-    QLIST_FOREACH(child, &bs->children, next) {
+-        bdrv_co_io_plug(child->bs);
+-    }
+-
+-    if (qatomic_fetch_inc(&bs->io_plugged) == 0) {
+-        BlockDriver *drv = bs->drv;
+-        if (drv && drv->bdrv_co_io_plug) {
+-            drv->bdrv_co_io_plug(bs);
+-        }
+-    }
+-}
+-
+-void coroutine_fn bdrv_co_io_unplug(BlockDriverState *bs)
+-{
+-    BdrvChild *child;
+-    IO_CODE();
+-    assert_bdrv_graph_readable();
+-
+-    assert(bs->io_plugged);
+-    if (qatomic_fetch_dec(&bs->io_plugged) == 1) {
+-        BlockDriver *drv = bs->drv;
+-        if (drv && drv->bdrv_co_io_unplug) {
+-            drv->bdrv_co_io_unplug(bs);
+-        }
+-    }
+-
+-    QLIST_FOREACH(child, &bs->children, next) {
+-        bdrv_co_io_unplug(child->bs);
+-    }
+-}
+-
+ /* Helper that undoes bdrv_register_buf() when it fails partway through */
+ static void GRAPH_RDLOCK
+ bdrv_register_buf_rollback(BlockDriverState *bs, void *host, size_t size,
 --
-.20.1
+.40.1

-[Qemu-devel] [PULL 2/7] hw/block/virtio-blk: Clean req->dev repetitions
+[PULL 7/8] block/blkio: use qemu_open() to support fd passing for virtio-blk
-From: Anastasiia Rusakova <arusakova917@gmail.com>
+From: Stefano Garzarella <sgarzare@redhat.com>
-Some functions sometimes uses req->dev even though a local variable
+Some virtio-blk drivers (e.g. virtio-blk-vhost-vdpa) support fd
-VirtIOBlock* s = req->dev has already been defined.
+passing. Let's expose this to the user, so the management layer
-Updated places to use s everywhere in the file.
+can pass the file descriptor of an already opened path.
-Signed-off-by: Anastasiia Rusakova <arusakova917@gmail.com>
+If the libblkio virtio-blk driver supports fd passing, let's always
-Message-id: 20190307161925.4158-1-rusakova.nastasia@icloud.com
+use qemu_open() to open the `path`, so we can handle fd passing
-Message-Id: <20190307161925.4158-1-rusakova.nastasia@icloud.com>
+from the management layer through the "/dev/fdset/N" special path.
 Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
 Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
 Message-id: 20230530071941.8954-2-sgarzare@redhat.com
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 ---
- hw/block/virtio-blk.c | 16 +++++++++-------
+ block/blkio.c | 53 ++++++++++++++++++++++++++++++++++++++++++---------
-file changed, 9 insertions(+), 7 deletions(-)
+file changed, 44 insertions(+), 9 deletions(-)
-diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
+diff --git a/block/blkio.c b/block/blkio.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/block/virtio-blk.c
+--- a/block/blkio.c
-+++ b/hw/block/virtio-blk.c
++++ b/block/blkio.c
-@@ -XXX,XX +XXX,XX @@ static void virtio_blk_rw_complete(void *opaque, int ret)
+@@ -XXX,XX +XXX,XX @@ static int blkio_virtio_blk_common_open(BlockDriverState *bs,
-         }
+ {
+     const char *path = qdict_get_try_str(options, "path");
-         if (ret) {
+     BDRVBlkioState *s = bs->opaque;
--            int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
+-    int ret;
-+            int p = virtio_ldl_p(VIRTIO_DEVICE(s), &req->out.type);
++    bool fd_supported = false;
-             bool is_read = !(p & VIRTIO_BLK_T_OUT);
++    int fd, ret;
-             /* Note that memory may be dirtied on read failure.  If the
-              * virtio request is not completed here, as is the case for
+     if (!path) {
-@@ -XXX,XX +XXX,XX @@ static void virtio_blk_rw_complete(void *opaque, int ret)
+         error_setg(errp, "missing 'path' option");
-         }
+         return -EINVAL;
          virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
 -        block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
 +        block_acct_done(blk_get_stats(s->blk), &req->acct);
          virtio_blk_free_request(req);
      }
-     aio_context_release(blk_get_aio_context(s->conf.conf.blk));
-@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_scsi_req(VirtIOBlockReq *req)
+-    ret = blkio_set_str(s->blkio, "path", path);
- {
+-    qdict_del(options, "path");
-     int status = VIRTIO_BLK_S_OK;
+-    if (ret < 0) {
-     struct virtio_scsi_inhdr *scsi = NULL;
+-        error_setg_errno(errp, -ret, "failed to set path: %s",
--    VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
+-                         blkio_get_error_msg());
--    VirtQueueElement *elem = &req->elem;
+-        return ret;
-     VirtIOBlock *blk = req->dev;
+-    }
-+    VirtIODevice *vdev = VIRTIO_DEVICE(blk);
+-
-+    VirtQueueElement *elem = &req->elem;
+     if (!(flags & BDRV_O_NOCACHE)) {
+         error_setg(errp, "cache.direct=off is not supported");
- #ifdef __linux__
+         return -EINVAL;
-     int i;
+     }
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
  static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)
  {
 -    block_acct_start(blk_get_stats(req->dev->blk), &req->acct, 0,
 +    VirtIOBlock *s = req->dev;
 +
-+    block_acct_start(blk_get_stats(s->blk), &req->acct, 0,
++    if (blkio_get_int(s->blkio, "fd", &fd) == 0) {
-                      BLOCK_ACCT_FLUSH);
++        fd_supported = true;
++    }
-     /*
++
-      * Make sure all outstanding writes are posted to the backing device.
++    /*
-      */
++     * If the libblkio driver supports fd passing, let's always use qemu_open()
-     if (mrb->is_write && mrb->num_reqs > 0) {
++     * to open the `path`, so we can handle fd passing from the management
--        virtio_blk_submit_multireq(req->dev->blk, mrb);
++     * layer through the "/dev/fdset/N" special path.
-+        virtio_blk_submit_multireq(s->blk, mrb);
++     */
-     }
++    if (fd_supported) {
--    blk_aio_flush(req->dev->blk, virtio_blk_flush_complete, req);
++        int open_flags;
-+    blk_aio_flush(s->blk, virtio_blk_flush_complete, req);
++
 +        if (flags & BDRV_O_RDWR) {
 +            open_flags = O_RDWR;
 +        } else {
 +            open_flags = O_RDONLY;
 +        }
 +
 +        fd = qemu_open(path, open_flags, errp);
 +        if (fd < 0) {
 +            return -EINVAL;
 +        }
 +
 +        ret = blkio_set_int(s->blkio, "fd", fd);
 +        if (ret < 0) {
 +            error_setg_errno(errp, -ret, "failed to set fd: %s",
 +                             blkio_get_error_msg());
 +            qemu_close(fd);
 +            return ret;
 +        }
 +    } else {
 +        ret = blkio_set_str(s->blkio, "path", path);
 +        if (ret < 0) {
 +            error_setg_errno(errp, -ret, "failed to set path: %s",
 +                             blkio_get_error_msg());
 +            return ret;
 +        }
 +    }
 +
 +    qdict_del(options, "path");
 +
      return 0;
  }
- static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
 --
-.20.1
+.40.1
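
The commit message refers to the "/dev/fdset/N" special path without showing what such a path looks like to the code that receives it. The helper below is a hypothetical, stand-alone sketch of recognising that form; it is not the parsing code behind qemu_open(), and the function name and its strictness (digits only, no trailing characters) are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return true and store N in *fdset_id if path has
 * the form "/dev/fdset/N", where N is a non-negative decimal number.
 */
bool toy_parse_fdset_path(const char *path, long *fdset_id)
{
    static const char prefix[] = "/dev/fdset/";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *digits;
    char *end;
    long id;

    assert(path != NULL && fdset_id != NULL);

    if (strncmp(path, prefix, prefix_len) != 0) {
        return false; /* an ordinary path, e.g. a character device node */
    }
    digits = path + prefix_len;
    if (*digits < '0' || *digits > '9') {
        return false; /* rejects an empty id, a sign and leading spaces */
    }
    errno = 0;
    id = strtol(digits, &end, 10);
    if (errno != 0 || *end != '\0') {
        return false; /* overflow or trailing characters */
    }
    *fdset_id = id;
    return true;
}
```

With a check like this, "/dev/fdset/3" is treated as a reference to fd set 3, while any other value of 'path' is opened as a normal file name.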

-[Qemu-devel] [PULL 5/7] iothread: create main loop unconditionally
+[PULL 8/8] qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa
-From: Peter Xu <peterx@redhat.com>
+From: Stefano Garzarella <sgarzare@redhat.com>
-Since we've have the gcontext always there, create the main loop
+The virtio-blk-vhost-vdpa driver in libblkio 1.3.0 supports the fd
-altogether.  The iothread_run() is even cleaner.
+passing through the new 'fd' property.
-Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
+Since now we are using qemu_open() on '@path' if the virtio-blk driver
-Signed-off-by: Peter Xu <peterx@redhat.com>
+supports the fd passing, let's announce it.
-Message-id: 20190306115532.23025-4-peterx@redhat.com
+In this way, the management layer can pass the file descriptor of an
-Message-Id: <20190306115532.23025-4-peterx@redhat.com>
+already opened vhost-vdpa character device. This is useful especially
 when the device can only be accessed with certain privileges.
 Add the '@fdset' feature only when the virtio-blk-vhost-vdpa driver
 in libblkio supports it.
 Suggested-by: Markus Armbruster <armbru@redhat.com>
 Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
 Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
 Message-id: 20230530071941.8954-3-sgarzare@redhat.com
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 ---
- iothread.c | 12 +++---------
+ qapi/block-core.json | 6 ++++++
-file changed, 3 insertions(+), 9 deletions(-)
+ meson.build          | 4 ++++
 files changed, 10 insertions(+)
-diff --git a/iothread.c b/iothread.c
+diff --git a/qapi/block-core.json b/qapi/block-core.json
 index XXXXXXX..XXXXXXX 100644
---- a/iothread.c
+--- a/qapi/block-core.json
-+++ b/iothread.c
++++ b/qapi/block-core.json
-@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
+@@ -XXX,XX +XXX,XX @@
-          * changed in previous aio_poll()
+ #
-          */
+ # @path: path to the vhost-vdpa character device.
-         if (iothread->running && atomic_read(&iothread->run_gcontext)) {
+ #
--            GMainLoop *loop;
++# Features:
--
++# @fdset: Member @path supports the special "/dev/fdset/N" path
-             g_main_context_push_thread_default(iothread->worker_context);
++#     (since 8.1)
--            iothread->main_loop =
++#
--                g_main_loop_new(iothread->worker_context, TRUE);
+ # Since: 7.2
--            loop = iothread->main_loop;
+ ##
--
+ { 'struct': 'BlockdevOptionsVirtioBlkVhostVdpa',
-             g_main_loop_run(iothread->main_loop);
+   'data': { 'path': 'str' },
--            iothread->main_loop = NULL;
++  'features': [ { 'name' :'fdset',
--            g_main_loop_unref(loop);
++                  'if': 'CONFIG_BLKIO_VHOST_VDPA_FD' } ],
--
+   'if': 'CONFIG_BLKIO' }
-             g_main_context_pop_thread_default(iothread->worker_context);
-         }
+ ##
-     }
+diff --git a/meson.build b/meson.build
-@@ -XXX,XX +XXX,XX @@ static void iothread_instance_finalize(Object *obj)
+index XXXXXXX..XXXXXXX 100644
-     if (iothread->worker_context) {
+--- a/meson.build
-         g_main_context_unref(iothread->worker_context);
++++ b/meson.build
-         iothread->worker_context = NULL;
+@@ -XXX,XX +XXX,XX @@ config_host_data.set('CONFIG_LZO', lzo.found())
-+        g_main_loop_unref(iothread->main_loop);
+ config_host_data.set('CONFIG_MPATH', mpathpersist.found())
-+        iothread->main_loop = NULL;
+ config_host_data.set('CONFIG_MPATH_NEW_API', mpathpersist_new_api)
-     }
+ config_host_data.set('CONFIG_BLKIO', blkio.found())
-     qemu_sem_destroy(&iothread->init_done_sem);
++if blkio.found()
- }
++  config_host_data.set('CONFIG_BLKIO_VHOST_VDPA_FD',
-@@ -XXX,XX +XXX,XX @@ static void iothread_init_gcontext(IOThread *iothread)
++                       blkio.version().version_compare('>=1.3.0'))
-     source = aio_get_g_source(iothread_get_aio_context(iothread));
++endif
-     g_source_attach(source, iothread->worker_context);
+ config_host_data.set('CONFIG_CURL', curl.found())
-     g_source_unref(source);
+ config_host_data.set('CONFIG_CURSES', curses.found())
-+    iothread->main_loop = g_main_loop_new(iothread->worker_context, TRUE);
+ config_host_data.set('CONFIG_GBM', gbm.found())
  }
  static void iothread_complete(UserCreatable *obj, Error **errp)
 --
-.20.1
+.40.1

The following changes since commit 6cb4f6db4f4367faa33da85b15f75bbbd2bed2a6:

Merge remote-tracking branch 'remotes/cleber/tags/python-next-pull-request' into staging (2019-03-07 16:16:02 +0000)

are available in the Git repository at:

git://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 6ca206204fa773c8626d59caf2a5676d6cc35f52:

iothread: document about why we need explicit aio_poll() (2019-03-08 10:20:57 +0000)

----------------------------------------------------------------
Pull request

----------------------------------------------------------------

Anastasiia Rusakova (1):
  hw/block/virtio-blk: Clean req->dev repetitions

Peter Xu (5):
  iothread: replace init_done_cond with a semaphore
  iothread: create the gcontext unconditionally
  iothread: create main loop unconditionally
  iothread: push gcontext earlier in the thread_fn
  iothread: document about why we need explicit aio_poll()

Stefan Hajnoczi (1):
  MAINTAINERS: add missing support status fields

MAINTAINERS               |  3 ++
 include/sysemu/iothread.h |  5 +--
 hw/block/virtio-blk.c     | 16 ++++---
 iothread.c                | 90 +++++++++++++++++++--------------------
 4 files changed, 57 insertions(+), 57 deletions(-)

-- 
2.20.1

This patch adds the "S:" line for areas of the codebase that currently
lack a support status field.

Note that there are a few more areas that are more abstract and do not
correspond to a specific set of files.  They have not been modified.

Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190301163518.20702-1-stefanha@redhat.com
Message-Id: <20190301163518.20702-1-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 MAINTAINERS | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: include/hw/tricore/
 
 Multiarch Linux User Tests
 M: Alex Bennée <alex.bennee@linaro.org>
+S: Maintained
 F: tests/tcg/multiarch/
 
 Guest CPU Cores (KVM):
@@ -XXX,XX +XXX,XX @@ F: qemu.sasl
 Coroutines
 M: Stefan Hajnoczi <stefanha@redhat.com>
 M: Kevin Wolf <kwolf@redhat.com>
+S: Maintained
 F: util/*coroutine*
 F: include/qemu/coroutine*
 F: tests/test-coroutine.c
@@ -XXX,XX +XXX,XX @@ F: .gitlab-ci.yml
 Guest Test Compilation Support
 M: Alex Bennée <alex.bennee@linaro.org>
 R: Philippe Mathieu-Daudé <f4bug@amsat.org>
+S: Maintained
 F: tests/tcg/Makefile
 F: tests/tcg/Makefile.include
 L: qemu-devel@nongnu.org
-- 
2.20.1

From: Anastasiia Rusakova <arusakova917@gmail.com>

Some functions use req->dev even though a local variable
VirtIOBlock *s = req->dev has already been defined.
Update these places to use the local variable throughout the file.

Signed-off-by: Anastasiia Rusakova <arusakova917@gmail.com>
Message-id: 20190307161925.4158-1-rusakova.nastasia@icloud.com
Message-Id: <20190307161925.4158-1-rusakova.nastasia@icloud.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/virtio-blk.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_rw_complete(void *opaque, int ret)
         }
 
         if (ret) {
-            int p = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
+            int p = virtio_ldl_p(VIRTIO_DEVICE(s), &req->out.type);
             bool is_read = !(p & VIRTIO_BLK_T_OUT);
             /* Note that memory may be dirtied on read failure.  If the
              * virtio request is not completed here, as is the case for
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_rw_complete(void *opaque, int ret)
         }
 
         virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
-        block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
+        block_acct_done(blk_get_stats(s->blk), &req->acct);
         virtio_blk_free_request(req);
     }
     aio_context_release(blk_get_aio_context(s->conf.conf.blk));
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_scsi_req(VirtIOBlockReq *req)
 {
     int status = VIRTIO_BLK_S_OK;
     struct virtio_scsi_inhdr *scsi = NULL;
-    VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
-    VirtQueueElement *elem = &req->elem;
     VirtIOBlock *blk = req->dev;
+    VirtIODevice *vdev = VIRTIO_DEVICE(blk);
+    VirtQueueElement *elem = &req->elem;
 
 #ifdef __linux__
     int i;
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
 
 static void virtio_blk_handle_flush(VirtIOBlockReq *req, MultiReqBuffer *mrb)
 {
-    block_acct_start(blk_get_stats(req->dev->blk), &req->acct, 0,
+    VirtIOBlock *s = req->dev;
+
+    block_acct_start(blk_get_stats(s->blk), &req->acct, 0,
                      BLOCK_ACCT_FLUSH);
 
     /*
      * Make sure all outstanding writes are posted to the backing device.
      */
     if (mrb->is_write && mrb->num_reqs > 0) {
-        virtio_blk_submit_multireq(req->dev->blk, mrb);
+        virtio_blk_submit_multireq(s->blk, mrb);
     }
-    blk_aio_flush(req->dev->blk, virtio_blk_flush_complete, req);
+    blk_aio_flush(s->blk, virtio_blk_flush_complete, req);
 }
 
 static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
-- 
2.20.1

From: Peter Xu <peterx@redhat.com>

Using a lock plus a condition variable just to send an init-done
message seems like overkill to me.  Replace it with a simpler semaphore.

Meanwhile, initialize the semaphore unconditionally so that we can also
destroy it unconditionally in finalize, which seems cleaner.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20190306115532.23025-2-peterx@redhat.com
Message-Id: <20190306115532.23025-2-peterx@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/sysemu/iothread.h |  3 +--
 iothread.c                | 17 ++++-------------
 2 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/iothread.h
+++ b/include/sysemu/iothread.h
@@ -XXX,XX +XXX,XX @@ typedef struct {
     GMainContext *worker_context;
     GMainLoop *main_loop;
     GOnce once;
-    QemuMutex init_done_lock;
-    QemuCond init_done_cond;    /* is thread initialization done? */
+    QemuSemaphore init_done_sem; /* is thread init done? */
     bool stopping;              /* has iothread_stop() been called? */
     bool running;               /* should iothread_run() continue? */
     int thread_id;
diff --git a/iothread.c b/iothread.c
index XXXXXXX..XXXXXXX 100644
--- a/iothread.c
+++ b/iothread.c
@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
     rcu_register_thread();
 
     my_iothread = iothread;
-    qemu_mutex_lock(&iothread->init_done_lock);
     iothread->thread_id = qemu_get_thread_id();
-    qemu_cond_signal(&iothread->init_done_cond);
-    qemu_mutex_unlock(&iothread->init_done_lock);
+    qemu_sem_post(&iothread->init_done_sem);
 
     while (iothread->running) {
         aio_poll(iothread->ctx, true);
@@ -XXX,XX +XXX,XX @@ static void iothread_instance_init(Object *obj)
 
     iothread->poll_max_ns = IOTHREAD_POLL_MAX_NS_DEFAULT;
     iothread->thread_id = -1;
+    qemu_sem_init(&iothread->init_done_sem, 0);
 }
 
 static void iothread_instance_finalize(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void iothread_instance_finalize(Object *obj)
 
     iothread_stop(iothread);
 
-    if (iothread->thread_id != -1) {
-        qemu_cond_destroy(&iothread->init_done_cond);
-        qemu_mutex_destroy(&iothread->init_done_lock);
-    }
     /*
      * Before glib2 2.33.10, there is a glib2 bug that GSource context
      * pointer may not be cleared even if the context has already been
@@ -XXX,XX +XXX,XX @@ static void iothread_instance_finalize(Object *obj)
         g_main_context_unref(iothread->worker_context);
         iothread->worker_context = NULL;
     }
+    qemu_sem_destroy(&iothread->init_done_sem);
 }
 
 static void iothread_complete(UserCreatable *obj, Error **errp)
@@ -XXX,XX +XXX,XX @@ static void iothread_complete(UserCreatable *obj, Error **errp)
         return;
     }
 
-    qemu_mutex_init(&iothread->init_done_lock);
-    qemu_cond_init(&iothread->init_done_cond);
     iothread->once = (GOnce) G_ONCE_INIT;
 
     /* This assumes we are called from a thread with useful CPU affinity for us
@@ -XXX,XX +XXX,XX @@ static void iothread_complete(UserCreatable *obj, Error **errp)
     g_free(name);
 
     /* Wait for initialization to complete */
-    qemu_mutex_lock(&iothread->init_done_lock);
     while (iothread->thread_id == -1) {
-        qemu_cond_wait(&iothread->init_done_cond,
-                       &iothread->init_done_lock);
+        qemu_sem_wait(&iothread->init_done_sem);
     }
-    qemu_mutex_unlock(&iothread->init_done_lock);
 }
 
 typedef struct {
-- 
2.20.1
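
The handshake described in the commit message above can be sketched
outside QEMU with plain POSIX primitives. This is only an illustration
under stated assumptions: the Worker type, worker_start(), worker_stop()
and the placeholder thread id 42 are hypothetical names, and sem_t
stands in for QemuSemaphore. It assumes a platform with unnamed POSIX
semaphores, such as Linux.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical stand-in for IOThread; not QEMU's actual structure. */
typedef struct {
    pthread_t thread;
    sem_t init_done_sem;    /* posted once the worker has set thread_id */
    int thread_id;          /* -1 until the worker has initialized */
} Worker;

static void *worker_run(void *opaque)
{
    Worker *w = opaque;

    w->thread_id = 42;              /* placeholder for a real thread id */
    sem_post(&w->init_done_sem);    /* tell the creator that init is done */
    return NULL;
}

/* Start the worker and block until it reports init done; 0 on success. */
static int worker_start(Worker *w)
{
    w->thread_id = -1;
    if (sem_init(&w->init_done_sem, 0, 0) != 0) {
        return -1;
    }
    if (pthread_create(&w->thread, NULL, worker_run, w) != 0) {
        sem_destroy(&w->init_done_sem);
        return -1;
    }
    /* Retry only if a signal interrupted the wait. */
    while (sem_wait(&w->init_done_sem) == -1 && errno == EINTR) {
        /* nothing to do, just wait again */
    }
    assert(w->thread_id != -1);
    return 0;
}

/* Join the worker and release the semaphore. */
static void worker_stop(Worker *w)
{
    pthread_join(w->thread, NULL);
    sem_destroy(&w->init_done_sem);
}
```

Compared with a mutex plus condition variable, the creator needs no
lock around the wait: a post that happens before the wait is remembered
by the semaphore's count, so the wakeup cannot be lost.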

From: Peter Xu <peterx@redhat.com>

In the existing code we create the gcontext dynamically on the caller's
first access to it.  That adds complexity and potential races when
using an iothread.  Since the context itself is not that big a
resource, and we won't have millions of iothreads, let's simply create
the gcontext unconditionally.

This also prepares for moving the thread context push operation
earlier (currently it is only pushed right before we start running the
gmainloop).

Remove the g_once since it is no longer necessary, and introduce a new
run_gcontext boolean to indicate whether we want to run the gcontext.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20190306115532.23025-3-peterx@redhat.com
Message-Id: <20190306115532.23025-3-peterx@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/sysemu/iothread.h |  2 +-
 iothread.c                | 43 +++++++++++++++++++--------------------
 2 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/iothread.h
+++ b/include/sysemu/iothread.h
@@ -XXX,XX +XXX,XX @@ typedef struct {
 
     QemuThread thread;
     AioContext *ctx;
+    bool run_gcontext;          /* whether we should run gcontext */
     GMainContext *worker_context;
     GMainLoop *main_loop;
-    GOnce once;
     QemuSemaphore init_done_sem; /* is thread init done? */
     bool stopping;              /* has iothread_stop() been called? */
     bool running;               /* should iothread_run() continue? */
diff --git a/iothread.c b/iothread.c
index XXXXXXX..XXXXXXX 100644
--- a/iothread.c
+++ b/iothread.c
@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
          * We must check the running state again in case it was
          * changed in previous aio_poll()
          */
-        if (iothread->running && atomic_read(&iothread->worker_context)) {
+        if (iothread->running && atomic_read(&iothread->run_gcontext)) {
             GMainLoop *loop;
 
             g_main_context_push_thread_default(iothread->worker_context);
@@ -XXX,XX +XXX,XX @@ static void iothread_instance_init(Object *obj)
     iothread->poll_max_ns = IOTHREAD_POLL_MAX_NS_DEFAULT;
     iothread->thread_id = -1;
     qemu_sem_init(&iothread->init_done_sem, 0);
+    /* By default, we don't run gcontext */
+    atomic_set(&iothread->run_gcontext, 0);
 }
 
 static void iothread_instance_finalize(Object *obj)
@@ -XXX,XX +XXX,XX @@ static void iothread_instance_finalize(Object *obj)
     qemu_sem_destroy(&iothread->init_done_sem);
 }
 
+static void iothread_init_gcontext(IOThread *iothread)
+{
+    GSource *source;
+
+    iothread->worker_context = g_main_context_new();
+    source = aio_get_g_source(iothread_get_aio_context(iothread));
+    g_source_attach(source, iothread->worker_context);
+    g_source_unref(source);
+}
+
 static void iothread_complete(UserCreatable *obj, Error **errp)
 {
     Error *local_error = NULL;
@@ -XXX,XX +XXX,XX @@ static void iothread_complete(UserCreatable *obj, Error **errp)
         return;
     }
 
+    /*
+     * Init one GMainContext for the iothread unconditionally, even if
+     * it's not used
+     */
+    iothread_init_gcontext(iothread);
+
     aio_context_set_poll_params(iothread->ctx,
                                 iothread->poll_max_ns,
                                 iothread->poll_grow,
@@ -XXX,XX +XXX,XX @@ static void iothread_complete(UserCreatable *obj, Error **errp)
         return;
     }
 
-    iothread->once = (GOnce) G_ONCE_INIT;
-
     /* This assumes we are called from a thread with useful CPU affinity for us
      * to inherit.
      */
@@ -XXX,XX +XXX,XX @@ IOThreadInfoList *qmp_query_iothreads(Error **errp)
     return head;
 }
 
-static gpointer iothread_g_main_context_init(gpointer opaque)
-{
-    AioContext *ctx;
-    IOThread *iothread = opaque;
-    GSource *source;
-
-    iothread->worker_context = g_main_context_new();
-
-    ctx = iothread_get_aio_context(iothread);
-    source = aio_get_g_source(ctx);
-    g_source_attach(source, iothread->worker_context);
-    g_source_unref(source);
-
-    aio_notify(iothread->ctx);
-    return NULL;
-}
-
 GMainContext *iothread_get_g_main_context(IOThread *iothread)
 {
-    g_once(&iothread->once, iothread_g_main_context_init, iothread);
-
+    atomic_set(&iothread->run_gcontext, 1);
+    aio_notify(iothread->ctx);
     return iothread->worker_context;
 }
 
-- 
2.20.1
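
The pattern in the patch above (create the resource eagerly, and use a
separate atomic flag to record that somebody wants it to run) can be
sketched in isolation. This is a rough illustration only: Context,
EagerWorker and the eager_worker_*() functions are hypothetical names,
C11 atomics stand in for QEMU's atomic_set()/atomic_read(), and the
aio_notify() wakeup is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for GMainContext. */
typedef struct {
    int unused;
} Context;

typedef struct {
    Context *worker_context;    /* created eagerly at init time */
    atomic_bool run_context;    /* set once a caller asks for the context */
} EagerWorker;

/* Create the context unconditionally, even if nobody ends up using it. */
static bool eager_worker_init(EagerWorker *w)
{
    w->worker_context = calloc(1, sizeof(*w->worker_context));
    atomic_init(&w->run_context, false);
    return w->worker_context != NULL;
}

/* The getter no longer creates anything; it only flips the flag. */
static Context *eager_worker_get_context(EagerWorker *w)
{
    assert(w->worker_context != NULL);
    atomic_store(&w->run_context, true);
    return w->worker_context;
}

/* The worker loop would poll this to decide whether to run the context. */
static bool eager_worker_should_run(EagerWorker *w)
{
    return atomic_load(&w->run_context);
}

static void eager_worker_destroy(EagerWorker *w)
{
    free(w->worker_context);
    w->worker_context = NULL;
}
```

Because the pointer is written once before any other thread can see the
object, only the boolean needs atomic access, which is what lets the
g_once machinery go away.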

From: Peter Xu <peterx@redhat.com>

Since the gcontext is now always there, create the main loop along
with it.  This makes iothread_run() even cleaner.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20190306115532.23025-4-peterx@redhat.com
Message-Id: <20190306115532.23025-4-peterx@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 iothread.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/iothread.c b/iothread.c
index XXXXXXX..XXXXXXX 100644
--- a/iothread.c
+++ b/iothread.c
@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
          * changed in previous aio_poll()
          */
         if (iothread->running && atomic_read(&iothread->run_gcontext)) {
-            GMainLoop *loop;
-
             g_main_context_push_thread_default(iothread->worker_context);
-            iothread->main_loop =
-                g_main_loop_new(iothread->worker_context, TRUE);
-            loop = iothread->main_loop;
-
             g_main_loop_run(iothread->main_loop);
-            iothread->main_loop = NULL;
-            g_main_loop_unref(loop);
-
             g_main_context_pop_thread_default(iothread->worker_context);
         }
     }
@@ -XXX,XX +XXX,XX @@ static void iothread_instance_finalize(Object *obj)
     if (iothread->worker_context) {
         g_main_context_unref(iothread->worker_context);
         iothread->worker_context = NULL;
+        g_main_loop_unref(iothread->main_loop);
+        iothread->main_loop = NULL;
     }
     qemu_sem_destroy(&iothread->init_done_sem);
 }
@@ -XXX,XX +XXX,XX @@ static void iothread_init_gcontext(IOThread *iothread)
     source = aio_get_g_source(iothread_get_aio_context(iothread));
     g_source_attach(source, iothread->worker_context);
     g_source_unref(source);
+    iothread->main_loop = g_main_loop_new(iothread->worker_context, TRUE);
 }
 
 static void iothread_complete(UserCreatable *obj, Error **errp)
-- 
2.20.1

From: Peter Xu <peterx@redhat.com>

We were not pushing the context until right before running the
gmainloop.  Now that everything is created unconditionally, we can
move the push earlier.

One benefit is that the push now happens even before init_done_sem is
posted, so once the iothread user's call to iothread_create() has
completed, we know that the thread stack is ready.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-id: 20190306115532.23025-5-peterx@redhat.com
Message-Id: <20190306115532.23025-5-peterx@redhat.com>

[Tweaked comment wording as discussed with Peter Xu.
--Stefan]

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 iothread.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/iothread.c b/iothread.c
index XXXXXXX..XXXXXXX 100644
--- a/iothread.c
+++ b/iothread.c
@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
     IOThread *iothread = opaque;
 
     rcu_register_thread();
-
+    /*
+     * g_main_context_push_thread_default() must be called before anything
+     * in this new thread uses glib.
+     */
+    g_main_context_push_thread_default(iothread->worker_context);
     my_iothread = iothread;
     iothread->thread_id = qemu_get_thread_id();
     qemu_sem_post(&iothread->init_done_sem);
@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
          * changed in previous aio_poll()
          */
         if (iothread->running && atomic_read(&iothread->run_gcontext)) {
-            g_main_context_push_thread_default(iothread->worker_context);
             g_main_loop_run(iothread->main_loop);
-            g_main_context_pop_thread_default(iothread->worker_context);
         }
     }
 
+    g_main_context_pop_thread_default(iothread->worker_context);
     rcu_unregister_thread();
     return NULL;
 }
-- 
2.20.1

From: Peter Xu <peterx@redhat.com>

After consulting Paolo I now know why we had better keep the explicit
aio_poll() in iothread_run().  Document it directly in the code so
that future readers will know the answer from day one.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-id: 20190306115532.23025-6-peterx@redhat.com
Message-Id: <20190306115532.23025-6-peterx@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 iothread.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/iothread.c b/iothread.c
index XXXXXXX..XXXXXXX 100644
--- a/iothread.c
+++ b/iothread.c
@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
     qemu_sem_post(&iothread->init_done_sem);
 
     while (iothread->running) {
+        /*
+         * Note: from functional-wise the g_main_loop_run() below can
+         * already cover the aio_poll() events, but we can't run the
+         * main loop unconditionally because explicit aio_poll() here
+         * is faster than g_main_loop_run() when we do not need the
+         * gcontext at all (e.g., pure block layer iothreads).  In
+         * other words, when we want to run the gcontext with the
+         * iothread we need to pay some performance for functionality.
+         */
         aio_poll(iothread->ctx, true);
 
         /*
-- 
2.20.1

The following changes since commit c6a5fc2ac76c5ab709896ee1b0edd33685a67ed1:

decodetree: Add --output-null for meson testing (2023-05-31 19:56:42 -0700)

are available in the Git repository at:

https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 98b126f5e3228a346c774e569e26689943b401dd:

qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa (2023-06-01 11:08:21 -0400)

----------------------------------------------------------------
Pull request

- Stefano Garzarella's blkio block driver 'fd' parameter
- My thread-local blk_io_plug() series

----------------------------------------------------------------

Stefan Hajnoczi (6):
  block: add blk_io_plug_call() API
  block/nvme: convert to blk_io_plug_call() API
  block/blkio: convert to blk_io_plug_call() API
  block/io_uring: convert to blk_io_plug_call() API
  block/linux-aio: convert to blk_io_plug_call() API
  block: remove bdrv_co_io_plug() API

Stefano Garzarella (2):
  block/blkio: use qemu_open() to support fd passing for virtio-blk
  qapi: add '@fdset' feature for BlockdevOptionsVirtioBlkVhostVdpa

-- 
2.40.1

Introduce a new API for thread-local blk_io_plug() that does not
traverse the block graph. The goal is to make blk_io_plug() multi-queue
friendly.

Instead of having block drivers track whether or not we're in a plugged
section, provide an API that allows them to defer a function call until
we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
called multiple times with the same fn/opaque pair, then fn() is only
called once at the end of the plugged section, resulting in batching.

This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
because the plug state is now thread-local.

Later patches convert block drivers to blk_io_plug_call() and then we
can finally remove .bdrv_co_io_plug() once all block drivers have been
converted.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 MAINTAINERS                       |   1 +
 include/sysemu/block-backend-io.h |  13 +--
 block/block-backend.c             |  22 -----
 block/plug.c                      | 159 ++++++++++++++++++++++++++++++
 hw/block/dataplane/xen-block.c    |   8 +-
 hw/block/virtio-blk.c             |   4 +-
 hw/scsi/virtio-scsi.c             |   6 +-
 block/meson.build                 |   1 +
 8 files changed, 173 insertions(+), 41 deletions(-)
 create mode 100644 block/plug.c

diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: util/aio-*.c
 F: util/aio-*.h
 F: util/fdmon-*.c
 F: block/io.c
+F: block/plug.c
 F: migration/block*
 F: include/block/aio.h
 F: include/block/aio-wait.h
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index XXXXXXX..XXXXXXX 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -XXX,XX +XXX,XX @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-/*
- * blk_io_plug/unplug are thread-local operations. This means that multiple
- * IOThreads can simultaneously call plug/unplug, but the caller must ensure
- * that each unplug() is called in the same IOThread of the matching plug().
- */
-void coroutine_fn blk_co_io_plug(BlockBackend *blk);
-void co_wrapper blk_io_plug(BlockBackend *blk);
-
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk);
-void co_wrapper blk_io_unplug(BlockBackend *blk);
+void blk_io_plug(void);
+void blk_io_unplug(void);
+void blk_io_plug_call(void (*fn)(void *), void *opaque);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
diff --git a/block/block-backend.c b/block/block-backend.c
index XXXXXXX..XXXXXXX 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -XXX,XX +XXX,XX @@ void blk_add_insert_bs_notifier(BlockBackend *blk, Notifier *notify)
     notifier_list_add(&blk->insert_bs_notifiers, notify);
 }
 
-void coroutine_fn blk_co_io_plug(BlockBackend *blk)
-{
-    BlockDriverState *bs = blk_bs(blk);
-    IO_CODE();
-    GRAPH_RDLOCK_GUARD();
-
-    if (bs) {
-        bdrv_co_io_plug(bs);
-    }
-}
-
-void coroutine_fn blk_co_io_unplug(BlockBackend *blk)
-{
-    BlockDriverState *bs = blk_bs(blk);
-    IO_CODE();
-    GRAPH_RDLOCK_GUARD();
-
-    if (bs) {
-        bdrv_co_io_unplug(bs);
-    }
-}
-
 BlockAcctStats *blk_get_stats(BlockBackend *blk)
 {
     IO_CODE();
diff --git a/block/plug.c b/block/plug.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/block/plug.c
@@ -XXX,XX +XXX,XX @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Block I/O plugging
+ *
+ * Copyright Red Hat.
+ *
+ * This API defers a function call within a blk_io_plug()/blk_io_unplug()
+ * section, allowing multiple calls to batch up. This is a performance
+ * optimization that is used in the block layer to submit several I/O requests
+ * at once instead of individually:
+ *
+ *   blk_io_plug(); <-- start of plugged region
+ *   ...
+ *   blk_io_plug_call(my_func, my_obj); <-- deferred my_func(my_obj) call
+ *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   ...
+ *   blk_io_unplug(); <-- end of plugged region, my_func(my_obj) is called once
+ *
+ * This code is actually generic and not tied to the block layer. If another
+ * subsystem needs this functionality, it could be renamed.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/coroutine-tls.h"
+#include "qemu/notify.h"
+#include "qemu/thread.h"
+#include "sysemu/block-backend.h"
+
+/* A function call that has been deferred until unplug() */
+typedef struct {
+    void (*fn)(void *);
+    void *opaque;
+} UnplugFn;
+
+/* Per-thread state */
+typedef struct {
+    unsigned count;       /* how many times has plug() been called? */
+    GArray *unplug_fns;   /* functions to call at unplug time */
+} Plug;
+
+/* Use get_ptr_plug() to fetch this thread-local value */
+QEMU_DEFINE_STATIC_CO_TLS(Plug, plug);
+
+/* Called at thread cleanup time */
+static void blk_io_plug_atexit(Notifier *n, void *value)
+{
+    Plug *plug = get_ptr_plug();
+    g_array_free(plug->unplug_fns, TRUE);
+}
+
+/* This won't involve coroutines, so use __thread */
+static __thread Notifier blk_io_plug_atexit_notifier;
+
+/**
+ * blk_io_plug_call:
+ * @fn: a function pointer to be invoked
+ * @opaque: a user-defined argument to @fn()
+ *
+ * Call @fn(@opaque) immediately if not within a blk_io_plug()/blk_io_unplug()
+ * section.
+ *
+ * Otherwise defer the call until the end of the outermost
+ * blk_io_plug()/blk_io_unplug() section in this thread. If the same
+ * @fn/@opaque pair has already been deferred, it will only be called once upon
+ * blk_io_unplug() so that accumulated calls are batched into a single call.
+ *
+ * The caller must ensure that @opaque is not freed before @fn() is invoked.
+ */
+void blk_io_plug_call(void (*fn)(void *), void *opaque)
+{
+    Plug *plug = get_ptr_plug();
+
+    /* Call immediately if we're not plugged */
+    if (plug->count == 0) {
+        fn(opaque);
+        return;
+    }
+
+    GArray *array = plug->unplug_fns;
+    if (!array) {
+        array = g_array_new(FALSE, FALSE, sizeof(UnplugFn));
+        plug->unplug_fns = array;
+        blk_io_plug_atexit_notifier.notify = blk_io_plug_atexit;
+        qemu_thread_atexit_add(&blk_io_plug_atexit_notifier);
+    }
+
+    UnplugFn *fns = (UnplugFn *)array->data;
+    UnplugFn new_fn = {
+        .fn = fn,
+        .opaque = opaque,
+    };
+
+    /*
+     * There won't be many, so do a linear search. If this becomes a bottleneck
+     * then a binary search (glib 2.62+) or different data structure could be
+     * used.
+     */
+    for (guint i = 0; i < array->len; i++) {
+        if (memcmp(&fns[i], &new_fn, sizeof(new_fn)) == 0) {
+            return; /* already exists */
+        }
+    }
+
+    g_array_append_val(array, new_fn);
+}
+
+/**
+ * blk_io_plug: Defer blk_io_plug_call() functions until blk_io_unplug()
+ *
+ * blk_io_plug/unplug are thread-local operations. This means that multiple
+ * threads can simultaneously call plug/unplug, but the caller must ensure that
+ * each unplug() is called in the same thread of the matching plug().
+ *
+ * Nesting is supported. blk_io_plug_call() functions are only called at the
+ * outermost blk_io_unplug().
+ */
+void blk_io_plug(void)
+{
+    Plug *plug = get_ptr_plug();
+
+    assert(plug->count < UINT32_MAX);
+
+    plug->count++;
+}
+
+/**
+ * blk_io_unplug: Run any pending blk_io_plug_call() functions
+ *
+ * There must have been a matching blk_io_plug() call in the same thread prior
+ * to this blk_io_unplug() call.
+ */
+void blk_io_unplug(void)
+{
+    Plug *plug = get_ptr_plug();
+
+    assert(plug->count > 0);
+
+    if (--plug->count > 0) {
+        return;
+    }
+
+    GArray *array = plug->unplug_fns;
+    if (!array) {
+        return;
+    }
+
+    UnplugFn *fns = (UnplugFn *)array->data;
+
+    for (guint i = 0; i < array->len; i++) {
+        fns[i].fn(fns[i].opaque);
+    }
+
+    /*
+     * This resets the array without freeing memory so that appending is cheap
+     * in the future.
+     */
+    g_array_set_size(array, 0);
+}
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -XXX,XX +XXX,XX @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
      * is below us.
      */
     if (inflight_atstart > IO_PLUG_THRESHOLD) {
-        blk_io_plug(dataplane->blk);
+        blk_io_plug();
     }
     while (rc != rp) {
         /* pull request from ring */
@@ -XXX,XX +XXX,XX @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
 
         if (inflight_atstart > IO_PLUG_THRESHOLD &&
             batched >= inflight_atstart) {
-            blk_io_unplug(dataplane->blk);
+            blk_io_unplug();
         }
         xen_block_do_aio(request);
         if (inflight_atstart > IO_PLUG_THRESHOLD) {
             if (batched >= inflight_atstart) {
-                blk_io_plug(dataplane->blk);
+                blk_io_plug();
                 batched = 0;
             } else {
                 batched++;
@@ -XXX,XX +XXX,XX @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
         }
     }
     if (inflight_atstart > IO_PLUG_THRESHOLD) {
-        blk_io_unplug(dataplane->blk);
+        blk_io_unplug();
     }
 
     return done_something;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -XXX,XX +XXX,XX @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
     bool suppress_notifications = virtio_queue_get_notification(vq);
 
     aio_context_acquire(blk_get_aio_context(s->blk));
-    blk_io_plug(s->blk);
+    blk_io_plug();
 
     do {
         if (suppress_notifications) {
@@ -XXX,XX +XXX,XX @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
         virtio_blk_submit_multireq(s, &mrb);
     }
 
-    blk_io_unplug(s->blk);
+    blk_io_unplug();
     aio_context_release(blk_get_aio_context(s->blk));
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -XXX,XX +XXX,XX @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq *req)
         return -ENOBUFS;
     }
     scsi_req_ref(req->sreq);
-    blk_io_plug(d->conf.blk);
+    blk_io_plug();
     object_unref(OBJECT(d));
     return 0;
 }
@@ -XXX,XX +XXX,XX @@ static void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req)
     if (scsi_req_enqueue(sreq)) {
         scsi_req_continue(sreq);
     }
-    blk_io_unplug(sreq->dev->conf.blk);
+    blk_io_unplug();
     scsi_req_unref(sreq);
 }
 
@@ -XXX,XX +XXX,XX @@ static void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
                 while (!QTAILQ_EMPTY(&reqs)) {
                     req = QTAILQ_FIRST(&reqs);
                     QTAILQ_REMOVE(&reqs, req, next);
-                    blk_io_unplug(req->sreq->dev->conf.blk);
+                    blk_io_unplug();
                     scsi_req_unref(req->sreq);
                     virtqueue_detach_element(req->vq, &req->elem, 0);
                     virtio_scsi_free_req(req);
diff --git a/block/meson.build b/block/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -XXX,XX +XXX,XX @@ block_ss.add(files(
   'mirror.c',
   'nbd.c',
   'null.c',
+  'plug.c',
   'qapi.c',
   'qcow2-bitmap.c',
   'qcow2-cache.c',
-- 
2.40.1

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.
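
The batching behaviour this conversion relies on can be sketched roughly as follows. This is a simplified, hypothetical model and not the code in block/plug.c: every `plug_model_*` name, the fixed-size array, and the limit of 16 entries are assumptions made for illustration. It shows a per-thread nesting counter, a callback that runs immediately when the thread is not plugged, and a repeated (fn, opaque) pair that is queued only once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model only; names and the fixed limit are assumptions. */
#define PLUG_MODEL_MAX_CALLS 16

typedef void (*PlugModelFn)(void *opaque);

typedef struct {
    PlugModelFn fn;
    void *opaque;
} PlugModelCall;

/* Per-thread state: nesting depth and the list of deferred calls. */
static _Thread_local unsigned plug_model_depth;
static _Thread_local PlugModelCall plug_model_calls[PLUG_MODEL_MAX_CALLS];
static _Thread_local size_t plug_model_num_calls;

void plug_model_plug(void)
{
    plug_model_depth++;
}

void plug_model_call(PlugModelFn fn, void *opaque)
{
    if (plug_model_depth == 0) {
        fn(opaque); /* not plugged: run right away */
        return;
    }
    for (size_t i = 0; i < plug_model_num_calls; i++) {
        if (plug_model_calls[i].fn == fn &&
            plug_model_calls[i].opaque == opaque) {
            return; /* already queued: one call covers the whole batch */
        }
    }
    assert(plug_model_num_calls < PLUG_MODEL_MAX_CALLS);
    plug_model_calls[plug_model_num_calls++] = (PlugModelCall){ fn, opaque };
}

void plug_model_unplug(void)
{
    PlugModelCall calls[PLUG_MODEL_MAX_CALLS];
    size_t n;

    assert(plug_model_depth > 0);
    if (--plug_model_depth > 0) {
        return; /* still nested: keep deferring */
    }

    /* Copy the list first so callbacks may safely queue new calls. */
    n = plug_model_num_calls;
    memcpy(calls, plug_model_calls, n * sizeof(calls[0]));
    plug_model_num_calls = 0;
    for (size_t i = 0; i < n; i++) {
        calls[i].fn(calls[i].opaque);
    }
}

/* Example callback: counts how many times a "kick" actually happened. */
void plug_model_count_kick(void *opaque)
{
    (*(int *)opaque)++;
}
```

In this model, a driver that calls `plug_model_call()` once per submitted command, as nvme_submit_command() does with blk_io_plug_call() in the diff below, ends up with a single kick per plugged section.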

diff --git a/block/nvme.c b/block/nvme.c
index XXXXXXX..XXXXXXX 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/vfio-helpers.h"
 #include "block/block-io.h"
 #include "block/block_int.h"
+#include "sysemu/block-backend.h"
 #include "sysemu/replay.h"
 #include "trace.h"
 
@@ -XXX,XX +XXX,XX @@ struct BDRVNVMeState {
     int blkshift;
 
     uint64_t max_transfer;
-    bool plugged;
 
     bool supports_write_zeroes;
     bool supports_discard;
@@ -XXX,XX +XXX,XX @@ static void nvme_kick(NVMeQueuePair *q)
 {
     BDRVNVMeState *s = q->s;
 
-    if (s->plugged || !q->need_kick) {
+    if (!q->need_kick) {
         return;
     }
     trace_nvme_kick(s, q->index);
@@ -XXX,XX +XXX,XX @@ static bool nvme_process_completion(NVMeQueuePair *q)
     NvmeCqe *c;
 
     trace_nvme_process_completion(s, q->index, q->inflight);
-    if (s->plugged) {
-        trace_nvme_process_completion_queue_plugged(s, q->index);
-        return false;
-    }
 
     /*
      * Support re-entrancy when a request cb() function invokes aio_poll().
@@ -XXX,XX +XXX,XX @@ static void nvme_trace_command(const NvmeCmd *cmd)
     }
 }
 
+static void nvme_unplug_fn(void *opaque)
+{
+    NVMeQueuePair *q = opaque;
+
+    QEMU_LOCK_GUARD(&q->lock);
+    nvme_kick(q);
+    nvme_process_completion(q);
+}
+
 static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
                                 NvmeCmd *cmd, BlockCompletionFunc cb,
                                 void *opaque)
@@ -XXX,XX +XXX,XX @@ static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
            q->sq.tail * NVME_SQ_ENTRY_BYTES, cmd, sizeof(*cmd));
     q->sq.tail = (q->sq.tail + 1) % NVME_QUEUE_SIZE;
     q->need_kick++;
-    nvme_kick(q);
-    nvme_process_completion(q);
+    blk_io_plug_call(nvme_unplug_fn, q);
     qemu_mutex_unlock(&q->lock);
 }
 
@@ -XXX,XX +XXX,XX @@ static void nvme_attach_aio_context(BlockDriverState *bs,
     }
 }
 
-static void coroutine_fn nvme_co_io_plug(BlockDriverState *bs)
-{
-    BDRVNVMeState *s = bs->opaque;
-    assert(!s->plugged);
-    s->plugged = true;
-}
-
-static void coroutine_fn nvme_co_io_unplug(BlockDriverState *bs)
-{
-    BDRVNVMeState *s = bs->opaque;
-    assert(s->plugged);
-    s->plugged = false;
-    for (unsigned i = INDEX_IO(0); i < s->queue_count; i++) {
-        NVMeQueuePair *q = s->queues[i];
-        qemu_mutex_lock(&q->lock);
-        nvme_kick(q);
-        nvme_process_completion(q);
-        qemu_mutex_unlock(&q->lock);
-    }
-}
-
 static bool nvme_register_buf(BlockDriverState *bs, void *host, size_t size,
                               Error **errp)
 {
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_nvme = {
     .bdrv_detach_aio_context  = nvme_detach_aio_context,
     .bdrv_attach_aio_context  = nvme_attach_aio_context,
 
-    .bdrv_co_io_plug          = nvme_co_io_plug,
-    .bdrv_co_io_unplug        = nvme_co_io_unplug,
-
     .bdrv_register_buf        = nvme_register_buf,
     .bdrv_unregister_buf      = nvme_unregister_buf,
 };
diff --git a/block/trace-events b/block/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -XXX,XX +XXX,XX @@ nvme_kick(void *s, unsigned q_index) "s %p q #%u"
 nvme_dma_flush_queue_wait(void *s) "s %p"
 nvme_error(int cmd_specific, int sq_head, int sqid, int cid, int status) "cmd_specific %d sq_head %d sqid %d cid %d status 0x%x"
 nvme_process_completion(void *s, unsigned q_index, int inflight) "s %p q #%u inflight %d"
-nvme_process_completion_queue_plugged(void *s, unsigned q_index) "s %p q #%u"
 nvme_complete_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
 nvme_submit_command(void *s, unsigned q_index, int cid) "s %p q #%u cid %d"
 nvme_submit_command_raw(int c0, int c1, int c2, int c3, int c4, int c5, int c6, int c7) "%02x %02x %02x %02x %02x %02x %02x %02x"
-- 
2.40.1

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

diff --git a/block/blkio.c b/block/blkio.c
index XXXXXXX..XXXXXXX 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
 #include "qemu/module.h"
+#include "sysemu/block-backend.h"
 #include "exec/memory.h" /* for ram_block_discard_disable() */
 
 #include "block/block-io.h"
@@ -XXX,XX +XXX,XX @@ static void blkio_detach_aio_context(BlockDriverState *bs)
                        NULL, NULL, NULL);
 }
 
-/* Call with s->blkio_lock held to submit I/O after enqueuing a new request */
-static void blkio_submit_io(BlockDriverState *bs)
+/*
+ * Called by blk_io_unplug() or immediately if not plugged. Called without
+ * blkio_lock.
+ */
+static void blkio_unplug_fn(void *opaque)
 {
-    if (qatomic_read(&bs->io_plugged) == 0) {
-        BDRVBlkioState *s = bs->opaque;
+    BDRVBlkioState *s = opaque;
 
+    WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_do_io(s->blkioq, NULL, 0, 0, NULL);
     }
 }
 
+/*
+ * Schedule I/O submission after enqueuing a new request. Called without
+ * blkio_lock.
+ */
+static void blkio_submit_io(BlockDriverState *bs)
+{
+    BDRVBlkioState *s = bs->opaque;
+
+    blk_io_plug_call(blkio_unplug_fn, s);
+}
+
 static int coroutine_fn
 blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
@@ -XXX,XX +XXX,XX @@ blkio_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_discard(s->blkioq, offset, bytes, &cod, 0);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
     return cod.ret;
 }
@@ -XXX,XX +XXX,XX @@ blkio_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_readv(s->blkioq, offset, iov, iovcnt, &cod, 0);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
 
     if (use_bounce_buffer) {
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn blkio_co_pwritev(BlockDriverState *bs, int64_t offset,
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_writev(s->blkioq, offset, iov, iovcnt, &cod, blkio_flags);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
 
     if (use_bounce_buffer) {
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn blkio_co_flush(BlockDriverState *bs)
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_flush(s->blkioq, &cod, 0);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
     return cod.ret;
 }
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn blkio_co_pwrite_zeroes(BlockDriverState *bs,
 
     WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
         blkioq_write_zeroes(s->blkioq, offset, bytes, &cod, blkio_flags);
-        blkio_submit_io(bs);
     }
 
+    blkio_submit_io(bs);
     qemu_coroutine_yield();
     return cod.ret;
 }
 
-static void coroutine_fn blkio_co_io_unplug(BlockDriverState *bs)
-{
-    BDRVBlkioState *s = bs->opaque;
-
-    WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
-        blkio_submit_io(bs);
-    }
-}
-
 typedef enum {
     BMRR_OK,
     BMRR_SKIP,
@@ -XXX,XX +XXX,XX @@ static void blkio_refresh_limits(BlockDriverState *bs, Error **errp)
         .bdrv_co_pwritev         = blkio_co_pwritev, \
         .bdrv_co_flush_to_disk   = blkio_co_flush, \
         .bdrv_co_pwrite_zeroes   = blkio_co_pwrite_zeroes, \
-        .bdrv_co_io_unplug       = blkio_co_io_unplug, \
         .bdrv_refresh_limits     = blkio_refresh_limits, \
         .bdrv_register_buf       = blkio_register_buf, \
         .bdrv_unregister_buf     = blkio_unregister_buf, \
-- 
2.40.1

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 20230530180959.1108766-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/raw-aio.h |  7 -------
 block/file-posix.c      | 10 ----------
 block/io_uring.c        | 44 ++++++++++++++++-------------------------
 block/trace-events      |  5 ++---
 4 files changed, 19 insertions(+), 47 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -XXX,XX +XXX,XX @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset,
                                   QEMUIOVector *qiov, int type);
 void luring_detach_aio_context(LuringState *s, AioContext *old_context);
 void luring_attach_aio_context(LuringState *s, AioContext *new_context);
-
-/*
- * luring_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void luring_io_plug(void);
-void luring_io_unplug(void);
 #endif
 
 #ifdef _WIN32
diff --git a/block/file-posix.c b/block/file-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
         laio_io_plug();
     }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-    if (s->use_linux_io_uring) {
-        luring_io_plug();
-    }
-#endif
 }
 
 static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
@@ -XXX,XX +XXX,XX @@ static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
         laio_io_unplug(s->aio_max_batch);
     }
 #endif
-#ifdef CONFIG_LINUX_IO_URING
-    if (s->use_linux_io_uring) {
-        luring_io_unplug();
-    }
-#endif
 }
 
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
diff --git a/block/io_uring.c b/block/io_uring.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -XXX,XX +XXX,XX @@
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 #include "trace.h"
 
 /* Only used for assertions.  */
@@ -XXX,XX +XXX,XX @@ typedef struct LuringAIOCB {
 } LuringAIOCB;
 
 typedef struct LuringQueue {
-    int plugged;
     unsigned int in_queue;
     unsigned int in_flight;
     bool blocked;
@@ -XXX,XX +XXX,XX @@ static void luring_process_completions_and_submit(LuringState *s)
 {
     luring_process_completions(s);
 
-    if (!s->io_q.plugged && s->io_q.in_queue > 0) {
+    if (s->io_q.in_queue > 0) {
         ioq_submit(s);
     }
 }
@@ -XXX,XX +XXX,XX @@ static void qemu_luring_poll_ready(void *opaque)
 static void ioq_init(LuringQueue *io_q)
 {
     QSIMPLEQ_INIT(&io_q->submit_queue);
-    io_q->plugged = 0;
     io_q->in_queue = 0;
     io_q->in_flight = 0;
     io_q->blocked = false;
 }
 
-void luring_io_plug(void)
+static void luring_unplug_fn(void *opaque)
 {
-    AioContext *ctx = qemu_get_current_aio_context();
-    LuringState *s = aio_get_linux_io_uring(ctx);
-    trace_luring_io_plug(s);
-    s->io_q.plugged++;
-}
-
-void luring_io_unplug(void)
-{
-    AioContext *ctx = qemu_get_current_aio_context();
-    LuringState *s = aio_get_linux_io_uring(ctx);
-    assert(s->io_q.plugged);
-    trace_luring_io_unplug(s, s->io_q.blocked, s->io_q.plugged,
-                           s->io_q.in_queue, s->io_q.in_flight);
-    if (--s->io_q.plugged == 0 &&
-        !s->io_q.blocked && s->io_q.in_queue > 0) {
+    LuringState *s = opaque;
+    trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
+                           s->io_q.in_flight);
+    if (!s->io_q.blocked && s->io_q.in_queue > 0) {
         ioq_submit(s);
     }
 }
@@ -XXX,XX +XXX,XX @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
 
     QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
     s->io_q.in_queue++;
-    trace_luring_do_submit(s, s->io_q.blocked, s->io_q.plugged,
-                           s->io_q.in_queue, s->io_q.in_flight);
-    if (!s->io_q.blocked &&
-        (!s->io_q.plugged ||
-         s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES)) {
-        ret = ioq_submit(s);
-        trace_luring_do_submit_done(s, ret);
-        return ret;
+    trace_luring_do_submit(s, s->io_q.blocked, s->io_q.in_queue,
+                           s->io_q.in_flight);
+    if (!s->io_q.blocked) {
+        if (s->io_q.in_flight + s->io_q.in_queue >= MAX_ENTRIES) {
+            ret = ioq_submit(s);
+            trace_luring_do_submit_done(s, ret);
+            return ret;
+        }
+
+        blk_io_plug_call(luring_unplug_fn, s);
     }
     return 0;
 }
diff --git a/block/trace-events b/block/trace-events
index XXXXXXX..XXXXXXX 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -XXX,XX +XXX,XX @@ file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "
 # io_uring.c
 luring_init_state(void *s, size_t size) "s %p size %zu"
 luring_cleanup_state(void *s) "%p freed"
-luring_io_plug(void *s) "LuringState %p plug"
-luring_io_unplug(void *s, int blocked, int plugged, int queued, int inflight) "LuringState %p blocked %d plugged %d queued %d inflight %d"
-luring_do_submit(void *s, int blocked, int plugged, int queued, int inflight) "LuringState %p blocked %d plugged %d queued %d inflight %d"
+luring_unplug_fn(void *s, int blocked, int queued, int inflight) "LuringState %p blocked %d queued %d inflight %d"
+luring_do_submit(void *s, int blocked, int queued, int inflight) "LuringState %p blocked %d queued %d inflight %d"
 luring_do_submit_done(void *s, int ret) "LuringState %p submitted to kernel %d"
 luring_co_submit(void *bs, void *s, void *luringcb, int fd, uint64_t offset, size_t nbytes, int type) "bs %p s %p luringcb %p fd %d offset %" PRId64 " nbytes %zd type %d"
 luring_process_completion(void *s, void *aiocb, int ret) "LuringState %p luringcb %p ret %d"
-- 
2.40.1

Stop using the .bdrv_co_io_plug() API because it is not multi-queue
block layer friendly. Use the new blk_io_plug_call() API to batch I/O
submission instead.

Note that a dev_max_batch check is dropped in laio_io_unplug() because
the semantics of unplug_fn() are different from .bdrv_co_io_unplug():
1. unplug_fn() is only called when the last blk_io_unplug() call occurs,
   not every time blk_io_unplug() is called.
2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no
   way to get per-BlockDriverState fields like dev_max_batch.

Therefore this condition cannot be moved to laio_unplug_fn(). It is not
obvious that this condition affects performance in practice, so I am
removing it instead of trying to come up with a more complex mechanism
to preserve the condition.
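
To make the resulting submission pattern concrete, here is a small hypothetical model of the new laio_do_submit() and laio_unplug_fn() logic while the thread is plugged. The `batch_model_*` names and the `BatchModel` struct are assumptions for illustration. The model ignores the `blocked` state and assumes every submission drains the whole queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not the LinuxAioState used by block/linux-aio.c. */
typedef struct {
    unsigned in_queue;   /* requests waiting to be submitted */
    unsigned submits;    /* number of batch submissions so far */
    bool unplug_pending; /* stands in for a queued blk_io_plug_call() */
} BatchModel;

/* Assumption: one submission drains the whole queue. */
static void batch_model_submit(BatchModel *m)
{
    m->in_queue = 0;
    m->submits++;
}

/* Shape of the new laio_do_submit() while plugged. */
void batch_model_enqueue(BatchModel *m, unsigned max_batch)
{
    m->in_queue++;
    if (m->in_queue >= max_batch) {
        batch_model_submit(m);    /* batch is full: submit immediately */
    } else {
        m->unplug_pending = true; /* defer until the last unplug */
    }
}

/* Shape of laio_unplug_fn(): no per-device max batch is available here. */
void batch_model_unplug(BatchModel *m)
{
    if (m->unplug_pending && m->in_queue > 0) {
        batch_model_submit(m);
    }
    m->unplug_pending = false;
}
```

With a max batch of 4 and 10 queued requests, this model submits after the 4th and 8th request and once more at unplug, so the per-request max batch check still bounds the batch size even though the check at unplug time is gone.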

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230530180959.1108766-6-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/raw-aio.h |  7 -------
 block/file-posix.c      | 28 ----------------------------
 block/linux-aio.c       | 41 +++++++++++------------------------------
 3 files changed, 11 insertions(+), 65 deletions(-)

diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -XXX,XX +XXX,XX @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov,
 
 void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context);
 void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context);
-
-/*
- * laio_io_plug/unplug work in the thread's current AioContext, therefore the
- * caller must ensure that they are paired in the same IOThread.
- */
-void laio_io_plug(void);
-void laio_io_unplug(uint64_t dev_max_batch);
 #endif
 /* io_uring.c - Linux io_uring implementation */
 #ifdef CONFIG_LINUX_IO_URING
diff --git a/block/file-posix.c b/block/file-posix.c
index XXXXXXX..XXXXXXX 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_pwritev(BlockDriverState *bs, int64_t offset,
     return raw_co_prw(bs, offset, bytes, qiov, QEMU_AIO_WRITE);
 }
 
-static void coroutine_fn raw_co_io_plug(BlockDriverState *bs)
-{
-    BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-    if (s->use_linux_aio) {
-        laio_io_plug();
-    }
-#endif
-}
-
-static void coroutine_fn raw_co_io_unplug(BlockDriverState *bs)
-{
-    BDRVRawState __attribute__((unused)) *s = bs->opaque;
-#ifdef CONFIG_LINUX_AIO
-    if (s->use_linux_aio) {
-        laio_io_unplug(s->aio_max_batch);
-    }
-#endif
-}
-
 static int coroutine_fn raw_co_flush_to_disk(BlockDriverState *bs)
 {
     BDRVRawState *s = bs->opaque;
@@ -XXX,XX +XXX,XX @@ BlockDriver bdrv_file = {
     .bdrv_co_copy_range_from = raw_co_copy_range_from,
     .bdrv_co_copy_range_to  = raw_co_copy_range_to,
     .bdrv_refresh_limits = raw_refresh_limits,
-    .bdrv_co_io_plug        = raw_co_io_plug,
-    .bdrv_co_io_unplug      = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate                   = raw_co_truncate,
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_device = {
     .bdrv_co_copy_range_from = raw_co_copy_range_from,
     .bdrv_co_copy_range_to  = raw_co_copy_range_to,
     .bdrv_refresh_limits = raw_refresh_limits,
-    .bdrv_co_io_plug        = raw_co_io_plug,
-    .bdrv_co_io_unplug      = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate                   = raw_co_truncate,
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_co_pwritev        = raw_co_pwritev,
     .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
     .bdrv_refresh_limits    = cdrom_refresh_limits,
-    .bdrv_co_io_plug        = raw_co_io_plug,
-    .bdrv_co_io_unplug      = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate                   = raw_co_truncate,
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_co_pwritev        = raw_co_pwritev,
     .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
     .bdrv_refresh_limits    = cdrom_refresh_limits,
-    .bdrv_co_io_plug        = raw_co_io_plug,
-    .bdrv_co_io_unplug      = raw_co_io_unplug,
     .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
     .bdrv_co_truncate                   = raw_co_truncate,
diff --git a/block/linux-aio.c b/block/linux-aio.c
index XXXXXXX..XXXXXXX 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
 #include "qapi/error.h"
+#include "sysemu/block-backend.h"
 
 /* Only used for assertions.  */
 #include "qemu/coroutine_int.h"
@@ -XXX,XX +XXX,XX @@ struct qemu_laiocb {
 };
 
 typedef struct {
-    int plugged;
     unsigned int in_queue;
     unsigned int in_flight;
     bool blocked;
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
 {
     qemu_laio_process_completions(s);
 
-    if (!s->io_q.plugged && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
+    if (!QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
 }
@@ -XXX,XX +XXX,XX @@ static void qemu_laio_poll_ready(EventNotifier *opaque)
 static void ioq_init(LaioQueue *io_q)
 {
     QSIMPLEQ_INIT(&io_q->pending);
-    io_q->plugged = 0;
     io_q->in_queue = 0;
     io_q->in_flight = 0;
     io_q->blocked = false;
@@ -XXX,XX +XXX,XX @@ static uint64_t laio_max_batch(LinuxAioState *s, uint64_t dev_max_batch)
     return max_batch;
 }
 
-void laio_io_plug(void)
+static void laio_unplug_fn(void *opaque)
 {
-    AioContext *ctx = qemu_get_current_aio_context();
-    LinuxAioState *s = aio_get_linux_aio(ctx);
+    LinuxAioState *s = opaque;
 
-    s->io_q.plugged++;
-}
-
-void laio_io_unplug(uint64_t dev_max_batch)
-{
-    AioContext *ctx = qemu_get_current_aio_context();
-    LinuxAioState *s = aio_get_linux_aio(ctx);
-
-    assert(s->io_q.plugged);
-    s->io_q.plugged--;
-
-    /*
-     * Why max batch checking is performed here:
-     * Another BDS may have queued requests with a higher dev_max_batch and
-     * therefore in_queue could now exceed our dev_max_batch. Re-check the max
-     * batch so we can honor our device's dev_max_batch.
-     */
-    if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch) ||
-        (!s->io_q.plugged &&
-         !s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending))) {
+    if (!s->io_q.blocked && !QSIMPLEQ_EMPTY(&s->io_q.pending)) {
         ioq_submit(s);
     }
 }
@@ -XXX,XX +XXX,XX @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
 
     QSIMPLEQ_INSERT_TAIL(&s->io_q.pending, laiocb, next);
     s->io_q.in_queue++;
-    if (!s->io_q.blocked &&
-        (!s->io_q.plugged ||
-         s->io_q.in_queue >= laio_max_batch(s, dev_max_batch))) {
-        ioq_submit(s);
+    if (!s->io_q.blocked) {
+        if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch)) {
+            ioq_submit(s);
+        } else {
+            blk_io_plug_call(laio_unplug_fn, s);
+        }
     }
 
     return 0;
-- 
2.40.1

No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
function pointers.

diff --git a/include/block/block-io.h b/include/block/block-io.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -XXX,XX +XXX,XX @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, AioContext *old_ctx);
 
 AioContext *child_of_bds_get_parent_aio_context(BdrvChild *c);
 
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_plug(BlockDriverState *bs);
-void coroutine_fn GRAPH_RDLOCK bdrv_co_io_unplug(BlockDriverState *bs);
-
 bool coroutine_fn GRAPH_RDLOCK
 bdrv_co_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name,
                                    uint32_t granularity, Error **errp);
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -XXX,XX +XXX,XX @@ struct BlockDriver {
     void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_debug_event)(
         BlockDriverState *bs, BlkdebugEvent event);
 
-    /* io queue for linux-aio */
-    void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_plug)(BlockDriverState *bs);
-    void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_unplug)(
-        BlockDriverState *bs);
-
     bool (*bdrv_supports_persistent_dirty_bitmap)(BlockDriverState *bs);
 
     bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_can_store_new_dirty_bitmap)(
@@ -XXX,XX +XXX,XX @@ struct BlockDriverState {
     unsigned int in_flight;
     unsigned int serialising_in_flight;
 
-    /*
-     * counter for nested bdrv_io_plug.
-     * Accessed with atomic ops.
-     */
-    unsigned io_plugged;
-
     /* do we need to tell the quest if we have a volatile write cache? */
     int enable_write_cache;
 
diff --git a/block/io.c b/block/io.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io.c
+++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ void *qemu_try_blockalign0(BlockDriverState *bs, size_t size)
     return mem;
 }
 
-void coroutine_fn bdrv_co_io_plug(BlockDriverState *bs)
-{
-    BdrvChild *child;
-    IO_CODE();
-    assert_bdrv_graph_readable();
-
-    QLIST_FOREACH(child, &bs->children, next) {
-        bdrv_co_io_plug(child->bs);
-    }
-
-    if (qatomic_fetch_inc(&bs->io_plugged) == 0) {
-        BlockDriver *drv = bs->drv;
-        if (drv && drv->bdrv_co_io_plug) {
-            drv->bdrv_co_io_plug(bs);
-        }
-    }
-}
-
-void coroutine_fn bdrv_co_io_unplug(BlockDriverState *bs)
-{
-    BdrvChild *child;
-    IO_CODE();
-    assert_bdrv_graph_readable();
-
-    assert(bs->io_plugged);
-    if (qatomic_fetch_dec(&bs->io_plugged) == 1) {
-        BlockDriver *drv = bs->drv;
-        if (drv && drv->bdrv_co_io_unplug) {
-            drv->bdrv_co_io_unplug(bs);
-        }
-    }
-
-    QLIST_FOREACH(child, &bs->children, next) {
-        bdrv_co_io_unplug(child->bs);
-    }
-}
-
 /* Helper that undoes bdrv_register_buf() when it fails partway through */
 static void GRAPH_RDLOCK
 bdrv_register_buf_rollback(BlockDriverState *bs, void *host, size_t size,
-- 
2.40.1

From: Stefano Garzarella <sgarzare@redhat.com>

Some virtio-blk drivers (e.g. virtio-blk-vhost-vdpa) support fd
passing. Let's expose this to the user, so that the management layer
can pass the file descriptor of an already opened path.

If the libblkio virtio-blk driver supports fd passing, let's always
use qemu_open() to open the `path`, so we can handle fd passing
from the management layer through the "/dev/fdset/N" special path.
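
For readers unfamiliar with the special path, the sketch below shows how a "/dev/fdset/N" path could be recognized. It is only an illustration: `parse_fdset_path()` is a hypothetical helper and not part of QEMU, where qemu_open() handles fd sets internally.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns true and stores N in *fdset_id if path has
 * the form "/dev/fdset/N" with a non-negative decimal N, false otherwise.
 */
bool parse_fdset_path(const char *path, long *fdset_id)
{
    static const char prefix[] = "/dev/fdset/";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *digits;
    char *end;
    long id;

    if (strncmp(path, prefix, prefix_len) != 0) {
        return false; /* ordinary path, e.g. a character device node */
    }

    digits = path + prefix_len;
    if (*digits < '0' || *digits > '9') {
        return false; /* rejects an empty ID, signs, and leading spaces */
    }

    errno = 0;
    id = strtol(digits, &end, 10);
    if (errno != 0 || *end != '\0') {
        return false; /* overflow or trailing garbage */
    }

    *fdset_id = id;
    return true;
}
```

A path that does not match is treated as a regular file name, which is why blkio_virtio_blk_common_open() in the diff below can hand `path` to qemu_open() in both cases.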

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230530071941.8954-2-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/blkio.c | 53 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/block/blkio.c b/block/blkio.c
index XXXXXXX..XXXXXXX 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -XXX,XX +XXX,XX @@ static int blkio_virtio_blk_common_open(BlockDriverState *bs,
 {
     const char *path = qdict_get_try_str(options, "path");
     BDRVBlkioState *s = bs->opaque;
-    int ret;
+    bool fd_supported = false;
+    int fd, ret;
 
     if (!path) {
         error_setg(errp, "missing 'path' option");
         return -EINVAL;
     }
 
-    ret = blkio_set_str(s->blkio, "path", path);
-    qdict_del(options, "path");
-    if (ret < 0) {
-        error_setg_errno(errp, -ret, "failed to set path: %s",
-                         blkio_get_error_msg());
-        return ret;
-    }
-
     if (!(flags & BDRV_O_NOCACHE)) {
         error_setg(errp, "cache.direct=off is not supported");
         return -EINVAL;
     }
+
+    if (blkio_get_int(s->blkio, "fd", &fd) == 0) {
+        fd_supported = true;
+    }
+
+    /*
+     * If the libblkio driver supports fd passing, let's always use qemu_open()
+     * to open the `path`, so we can handle fd passing from the management
+     * layer through the "/dev/fdset/N" special path.
+     */
+    if (fd_supported) {
+        int open_flags;
+
+        if (flags & BDRV_O_RDWR) {
+            open_flags = O_RDWR;
+        } else {
+            open_flags = O_RDONLY;
+        }
+
+        fd = qemu_open(path, open_flags, errp);
+        if (fd < 0) {
+            return -EINVAL;
+        }
+
+        ret = blkio_set_int(s->blkio, "fd", fd);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "failed to set fd: %s",
+                             blkio_get_error_msg());
+            qemu_close(fd);
+            return ret;
+        }
+    } else {
+        ret = blkio_set_str(s->blkio, "path", path);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "failed to set path: %s",
+                             blkio_get_error_msg());
+            return ret;
+        }
+    }
+
+    qdict_del(options, "path");
+
     return 0;
 }
 
-- 
2.40.1

From: Stefano Garzarella <sgarzare@redhat.com>

The virtio-blk-vhost-vdpa driver in libblkio 1.3.0 supports fd
passing through the new 'fd' property.

Since we now use qemu_open() on '@path' when the virtio-blk driver
supports fd passing, let's announce it.
This way, the management layer can pass the file descriptor of an
already opened vhost-vdpa character device. This is especially useful
when the device can only be accessed with certain privileges.

Add the '@fdset' feature only when the virtio-blk-vhost-vdpa driver
in libblkio supports it.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-id: 20230530071941.8954-3-sgarzare@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-core.json | 6 ++++++
 meson.build          | 4 ++++
 2 files changed, 10 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -XXX,XX +XXX,XX @@
 #
 # @path: path to the vhost-vdpa character device.
 #
+# Features:
+# @fdset: Member @path supports the special "/dev/fdset/N" path
+#     (since 8.1)
+#
 # Since: 7.2
 ##
 { 'struct': 'BlockdevOptionsVirtioBlkVhostVdpa',
   'data': { 'path': 'str' },
+  'features': [ { 'name' :'fdset',
+                  'if': 'CONFIG_BLKIO_VHOST_VDPA_FD' } ],
   'if': 'CONFIG_BLKIO' }
 
 ##
diff --git a/meson.build b/meson.build
index XXXXXXX..XXXXXXX 100644
--- a/meson.build
+++ b/meson.build
@@ -XXX,XX +XXX,XX @@ config_host_data.set('CONFIG_LZO', lzo.found())
 config_host_data.set('CONFIG_MPATH', mpathpersist.found())
 config_host_data.set('CONFIG_MPATH_NEW_API', mpathpersist_new_api)
 config_host_data.set('CONFIG_BLKIO', blkio.found())
+if blkio.found()
+  config_host_data.set('CONFIG_BLKIO_VHOST_VDPA_FD',
+                       blkio.version().version_compare('>=1.3.0'))
+endif
 config_host_data.set('CONFIG_CURL', curl.found())
 config_host_data.set('CONFIG_CURSES', curses.found())
 config_host_data.set('CONFIG_GBM', gbm.found())
-- 
2.40.1