The following changes since commit 0ab4537f08e09b13788db67efd760592fb7db769:

  Merge remote-tracking branch 'remotes/stefanberger/tags/pull-tpm-2018-03-07-1' into staging (2018-03-08 12:56:39 +0000)

are available in the Git repository at:

  git://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 4486e89c219c0d1b9bd8dfa0b1dd5b0d51ff2268:

  vl: introduce vm_shutdown() (2018-03-08 17:38:51 +0000)

----------------------------------------------------------------

----------------------------------------------------------------

Deepa Srinivasan (1):
  block: Fix qemu crash when using scsi-block

Fam Zheng (1):
  README: Fix typo 'git-publish'

Sergio Lopez (1):
  virtio-blk: dataplane: Don't batch notifications if EVENT_IDX is
    present

Stefan Hajnoczi (4):
  block: add aio_wait_bh_oneshot()
  virtio-blk: fix race between .ioeventfd_stop() and vq handler
  virtio-scsi: fix race between .ioeventfd_stop() and vq handler
  vl: introduce vm_shutdown()

 include/block/aio-wait.h        | 13 +++++++++++
 include/sysemu/iothread.h       |  1 -
 include/sysemu/sysemu.h         |  1 +
 block/block-backend.c           | 51 ++++++++++++++++++++---------------------
 cpus.c                          | 16 ++++++++++---
 hw/block/dataplane/virtio-blk.c | 39 +++++++++++++++++++++++--------
 hw/scsi/virtio-scsi-dataplane.c |  9 ++++----
 iothread.c                      | 31 -------------------------
 util/aio-wait.c                 | 31 +++++++++++++++++++++++++
 vl.c                            | 13 +++--------
 README                          |  2 +-
 11 files changed, 122 insertions(+), 85 deletions(-)

--
2.14.3
From: Deepa Srinivasan <deepa.srinivasan@oracle.com>

Starting qemu with the following arguments causes qemu to segfault:
... -device lsi,id=lsi0 -drive file=iscsi:<...>,format=raw,if=none,node-name=
iscsi1 -device scsi-block,bus=lsi0.0,id=<...>,drive=iscsi1

This patch fixes blk_aio_ioctl() so it does not pass stack addresses to
blk_aio_ioctl_entry() which may be invoked after blk_aio_ioctl() returns. More
details about the bug follow.

blk_aio_ioctl() invokes blk_aio_prwv() with blk_aio_ioctl_entry as the
coroutine parameter. blk_aio_prwv() ultimately calls aio_co_enter().

When blk_aio_ioctl() is executed from within a coroutine context (e.g.
iscsi_bh_cb()), aio_co_enter() adds the coroutine (blk_aio_ioctl_entry) to
the current coroutine's wakeup queue. blk_aio_ioctl() then returns.

When blk_aio_ioctl_entry() executes later, it accesses an invalid pointer:
....
    BlkRwCo *rwco = &acb->rwco;

    rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset,
                             rwco->qiov->iov[0].iov_base); <--- qiov is
                                                                invalid here
...

In the case when blk_aio_ioctl() is called from a non-coroutine context,
blk_aio_ioctl_entry() executes immediately. But if bdrv_co_ioctl() calls
qemu_coroutine_yield(), blk_aio_ioctl() will return. When the coroutine
execution is complete, control returns to blk_aio_ioctl_entry() after the call
to blk_co_ioctl(). There is no invalid reference after this point, but the
function is still holding on to invalid pointers.

The fix is to change blk_aio_prwv() to accept a void pointer for the IO buffer
rather than a QEMUIOVector. blk_aio_prwv() passes this through in BlkRwCo and
the coroutine function casts it to QEMUIOVector or uses the void pointer
directly.

Signed-off-by: Deepa Srinivasan <deepa.srinivasan@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/block-backend.c | 51 +++++++++++++++++++++++--------------------------
 1 file changed, 25 insertions(+), 26 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index XXXXXXX..XXXXXXX 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_pwritev(BlockBackend *blk, int64_t offset,
 typedef struct BlkRwCo {
     BlockBackend *blk;
     int64_t offset;
-    QEMUIOVector *qiov;
+    void *iobuf;
     int ret;
     BdrvRequestFlags flags;
 } BlkRwCo;
@@ -XXX,XX +XXX,XX @@ typedef struct BlkRwCo {
 static void blk_read_entry(void *opaque)
 {
     BlkRwCo *rwco = opaque;
+    QEMUIOVector *qiov = rwco->iobuf;
 
-    rwco->ret = blk_co_preadv(rwco->blk, rwco->offset, rwco->qiov->size,
-                              rwco->qiov, rwco->flags);
+    rwco->ret = blk_co_preadv(rwco->blk, rwco->offset, qiov->size,
+                              qiov, rwco->flags);
 }
 
 static void blk_write_entry(void *opaque)
 {
     BlkRwCo *rwco = opaque;
+    QEMUIOVector *qiov = rwco->iobuf;
 
-    rwco->ret = blk_co_pwritev(rwco->blk, rwco->offset, rwco->qiov->size,
-                               rwco->qiov, rwco->flags);
+    rwco->ret = blk_co_pwritev(rwco->blk, rwco->offset, qiov->size,
+                               qiov, rwco->flags);
 }
 
 static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
@@ -XXX,XX +XXX,XX @@ static int blk_prw(BlockBackend *blk, int64_t offset, uint8_t *buf,
     rwco = (BlkRwCo) {
         .blk    = blk,
         .offset = offset,
-        .qiov   = &qiov,
+        .iobuf  = &qiov,
         .flags  = flags,
         .ret    = NOT_DONE,
     };
@@ -XXX,XX +XXX,XX @@ static void blk_aio_complete_bh(void *opaque)
 }
 
 static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
-                                QEMUIOVector *qiov, CoroutineEntry co_entry,
+                                void *iobuf, CoroutineEntry co_entry,
                                 BdrvRequestFlags flags,
                                 BlockCompletionFunc *cb, void *opaque)
 {
@@ -XXX,XX +XXX,XX @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
     acb->rwco = (BlkRwCo) {
         .blk    = blk,
         .offset = offset,
-        .qiov   = qiov,
+        .iobuf  = iobuf,
         .flags  = flags,
         .ret    = NOT_DONE,
     };
@@ -XXX,XX +XXX,XX @@ static void blk_aio_read_entry(void *opaque)
 {
     BlkAioEmAIOCB *acb = opaque;
     BlkRwCo *rwco = &acb->rwco;
+    QEMUIOVector *qiov = rwco->iobuf;
 
-    assert(rwco->qiov->size == acb->bytes);
+    assert(qiov->size == acb->bytes);
     rwco->ret = blk_co_preadv(rwco->blk, rwco->offset, acb->bytes,
-                              rwco->qiov, rwco->flags);
+                              qiov, rwco->flags);
     blk_aio_complete(acb);
 }
 
@@ -XXX,XX +XXX,XX @@ static void blk_aio_write_entry(void *opaque)
 {
     BlkAioEmAIOCB *acb = opaque;
     BlkRwCo *rwco = &acb->rwco;
+    QEMUIOVector *qiov = rwco->iobuf;
 
-    assert(!rwco->qiov || rwco->qiov->size == acb->bytes);
+    assert(!qiov || qiov->size == acb->bytes);
     rwco->ret = blk_co_pwritev(rwco->blk, rwco->offset, acb->bytes,
-                               rwco->qiov, rwco->flags);
+                               qiov, rwco->flags);
     blk_aio_complete(acb);
 }
 
@@ -XXX,XX +XXX,XX @@ int blk_co_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
 static void blk_ioctl_entry(void *opaque)
 {
     BlkRwCo *rwco = opaque;
+    QEMUIOVector *qiov = rwco->iobuf;
+
     rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset,
-                             rwco->qiov->iov[0].iov_base);
+                             qiov->iov[0].iov_base);
 }
 
 int blk_ioctl(BlockBackend *blk, unsigned long int req, void *buf)
@@ -XXX,XX +XXX,XX @@ static void blk_aio_ioctl_entry(void *opaque)
     BlkAioEmAIOCB *acb = opaque;
     BlkRwCo *rwco = &acb->rwco;
 
-    rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset,
-                             rwco->qiov->iov[0].iov_base);
+    rwco->ret = blk_co_ioctl(rwco->blk, rwco->offset, rwco->iobuf);
+
     blk_aio_complete(acb);
 }
 
 BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *buf,
                           BlockCompletionFunc *cb, void *opaque)
 {
-    QEMUIOVector qiov;
-    struct iovec iov;
-
-    iov = (struct iovec) {
-        .iov_base = buf,
-        .iov_len = 0,
-    };
-    qemu_iovec_init_external(&qiov, &iov, 1);
-
-    return blk_aio_prwv(blk, req, 0, &qiov, blk_aio_ioctl_entry, 0, cb, opaque);
+    return blk_aio_prwv(blk, req, 0, buf, blk_aio_ioctl_entry, 0, cb, opaque);
 }
 
 int blk_co_pdiscard(BlockBackend *blk, int64_t offset, int bytes)
@@ -XXX,XX +XXX,XX @@ int blk_truncate(BlockBackend *blk, int64_t offset, PreallocMode prealloc,
 static void blk_pdiscard_entry(void *opaque)
 {
     BlkRwCo *rwco = opaque;
-    rwco->ret = blk_co_pdiscard(rwco->blk, rwco->offset, rwco->qiov->size);
+    QEMUIOVector *qiov = rwco->iobuf;
+
+    rwco->ret = blk_co_pdiscard(rwco->blk, rwco->offset, qiov->size);
 }
 
 int blk_pdiscard(BlockBackend *blk, int64_t offset, int bytes)
--
2.14.3
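To make the lifetime bug fixed by the patch above concrete, here is a minimal
standalone C sketch (illustrative only, not QEMU code; all names are
hypothetical). A context struct on the caller's stack is handed to a callback
that runs after the caller has returned, so the callback dereferences a dead
stack frame, which is the same shape as blk_aio_ioctl() handing a stack
QEMUIOVector to blk_aio_ioctl_entry():

    #include <stdio.h>

    typedef void DeferredFn(void *opaque);

    /* Stands in for scheduling a coroutine/BH that runs later. */
    static DeferredFn *pending_fn;
    static void *pending_opaque;

    static void schedule_deferred(DeferredFn *fn, void *opaque)
    {
        pending_fn = fn;
        pending_opaque = opaque;
    }

    struct io_ctx {
        int value;
    };

    static void deferred_cb(void *opaque)
    {
        struct io_ctx *ctx = opaque;        /* dangling: dead stack frame */
        printf("value = %d\n", ctx->value); /* undefined behavior */
    }

    static void start_request(void)
    {
        struct io_ctx ctx = { .value = 42 }; /* dies when we return */
        schedule_deferred(deferred_cb, &ctx);
    }   /* &ctx is now invalid, but the callback still holds it */

    int main(void)
    {
        start_request();
        pending_fn(pending_opaque); /* may print garbage or crash */
        return 0;
    }

The patch avoids this pattern by storing the ioctl buffer pointer by value in
the heap-allocated AIOCB (BlkRwCo.iobuf) instead of pointing into the caller's
stack.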
From: Fam Zheng <famz@redhat.com>

Reported-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180306024328.19195-1-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 README | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README b/README
index XXXXXXX..XXXXXXX 100644
--- a/README
+++ b/README
@@ -XXX,XX +XXX,XX @@ The QEMU website is also maintained under source control.
   git clone git://git.qemu.org/qemu-web.git
   https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/
 
-A 'git-profile' utility was created to make above process less
+A 'git-publish' utility was created to make above process less
 cumbersome, and is highly recommended for making regular contributions,
 or even just for sending consecutive patch series revisions. It also
 requires a working 'git send-email' setup, and by default doesn't
--
2.14.3
From: Sergio Lopez <slp@redhat.com>

Commit 5b2ffbe4d99843fd8305c573a100047a8c962327 ("virtio-blk: dataplane:
notify guest as a batch") deferred guest notification to a BH in order
to batch notifications, with the purpose of avoiding flooding the guest
with interrupts.

This optimization came with a cost. The average latency perceived in the
guest is increased by a few microseconds, and when multiple IO
operations finish at the same time, the guest won't be notified until
all completions from each operation have been run. By contrast,
virtio-scsi issues the notification at the end of each completion.

On the other hand, we now have the EVENT_IDX feature, which allows
better coordination between QEMU and the guest OS to avoid sending
unnecessary interrupts.

With this change, virtio-blk/dataplane only batches notifications if the
EVENT_IDX feature is not present.

Some numbers obtained with fio (ioengine=sync, iodepth=1, direct=1):
- Test specs:
  * fio-3.4 (ioengine=sync, iodepth=1, direct=1)
  * qemu master
  * virtio-blk with a dedicated iothread (default poll-max-ns)
  * backend: null_blk nr_devices=1 irqmode=2 completion_nsec=280000
  * 8 vCPUs pinned to isolated physical cores
  * Emulator and iothread also pinned to separate isolated cores
  * variance between runs < 1%

- Not patched:
  * numjobs=1: lat_avg=327.32 irqs=29998
  * numjobs=4: lat_avg=337.89 irqs=29073
  * numjobs=8: lat_avg=342.98 irqs=28643

- Patched:
  * numjobs=1: lat_avg=323.92 irqs=30262
  * numjobs=4: lat_avg=332.65 irqs=29520
  * numjobs=8: lat_avg=335.54 irqs=29323

Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-id: 20180307114459.26636-1-slp@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -XXX,XX +XXX,XX @@ struct VirtIOBlockDataPlane {
     VirtIODevice *vdev;
     QEMUBH *bh;                     /* bh for guest notification */
     unsigned long *batch_notify_vqs;
+    bool batch_notifications;
 
     /* Note that these EventNotifiers are assigned by value.  This is
      * fine as long as you do not call event_notifier_cleanup on them
@@ -XXX,XX +XXX,XX @@ struct VirtIOBlockDataPlane {
 /* Raise an interrupt to signal guest, if necessary */
 void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, VirtQueue *vq)
 {
-    set_bit(virtio_get_queue_index(vq), s->batch_notify_vqs);
-    qemu_bh_schedule(s->bh);
+    if (s->batch_notifications) {
+        set_bit(virtio_get_queue_index(vq), s->batch_notify_vqs);
+        qemu_bh_schedule(s->bh);
+    } else {
+        virtio_notify_irqfd(s->vdev, vq);
+    }
 }
 
 static void notify_guest_bh(void *opaque)
@@ -XXX,XX +XXX,XX @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 
     s->starting = true;
 
+    if (!virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
+        s->batch_notifications = true;
+    } else {
+        s->batch_notifications = false;
+    }
+
     /* Set up guest notifier (irq) */
     r = k->set_guest_notifiers(qbus->parent, nvqs, true);
     if (r != 0) {
--
2.14.3
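For background on why EVENT_IDX makes host-side batching redundant: with this
feature the guest driver publishes the used-ring index after which it wants an
interrupt, and the device suppresses the rest. The check below mirrors
vring_need_event() from the VIRTIO specification; it is a simplified
standalone sketch, not QEMU's implementation:

    #include <stdint.h>
    #include <stdio.h>

    /* Notify only if event_idx falls in the window (old_idx, new_idx],
     * using unsigned 16-bit arithmetic so ring index wraparound works. */
    static int vring_need_event(uint16_t event_idx, uint16_t new_idx,
                                uint16_t old_idx)
    {
        return (uint16_t)(new_idx - event_idx - 1) <
               (uint16_t)(new_idx - old_idx);
    }

    int main(void)
    {
        /* The device advanced the used index from 10 to 13 in one batch. */
        uint16_t old_idx = 10, new_idx = 13;

        /* Guest asked for an interrupt after entry 11: notify (1). */
        printf("event_idx=11 -> %d\n", vring_need_event(11, new_idx, old_idx));
        /* Guest asked for entry 20, not reached yet: suppress (0). */
        printf("event_idx=20 -> %d\n", vring_need_event(20, new_idx, old_idx));
        return 0;
    }

Since the guest already filters interrupts per completion, notifying
immediately via virtio_notify_irqfd() does not flood it, which is why the
patch disables the BH batching when the feature is negotiated.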
Sometimes it's necessary for the main loop thread to run a BH in an
IOThread and wait for its completion. This primitive is useful during
startup/shutdown to synchronize and avoid race conditions.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20180307144205.20619-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/aio-wait.h | 13 +++++++++++++
 util/aio-wait.c          | 31 +++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/include/block/aio-wait.h b/include/block/aio-wait.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/aio-wait.h
+++ b/include/block/aio-wait.h
@@ -XXX,XX +XXX,XX @@ typedef struct {
  */
 void aio_wait_kick(AioWait *wait);
 
+/**
+ * aio_wait_bh_oneshot:
+ * @ctx: the aio context
+ * @cb: the BH callback function
+ * @opaque: user data for the BH callback function
+ *
+ * Run a BH in @ctx and wait for it to complete.
+ *
+ * Must be called from the main loop thread with @ctx acquired exactly once.
+ * Note that main loop event processing may occur.
+ */
+void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque);
+
 #endif /* QEMU_AIO_WAIT */
diff --git a/util/aio-wait.c b/util/aio-wait.c
index XXXXXXX..XXXXXXX 100644
--- a/util/aio-wait.c
+++ b/util/aio-wait.c
@@ -XXX,XX +XXX,XX @@ void aio_wait_kick(AioWait *wait)
         aio_bh_schedule_oneshot(qemu_get_aio_context(), dummy_bh_cb, NULL);
     }
 }
+
+typedef struct {
+    AioWait wait;
+    bool done;
+    QEMUBHFunc *cb;
+    void *opaque;
+} AioWaitBHData;
+
+/* Context: BH in IOThread */
+static void aio_wait_bh(void *opaque)
+{
+    AioWaitBHData *data = opaque;
+
+    data->cb(data->opaque);
+
+    data->done = true;
+    aio_wait_kick(&data->wait);
+}
+
+void aio_wait_bh_oneshot(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
+{
+    AioWaitBHData data = {
+        .cb = cb,
+        .opaque = opaque,
+    };
+
+    assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+
+    aio_bh_schedule_oneshot(ctx, aio_wait_bh, &data);
+    AIO_WAIT_WHILE(&data.wait, ctx, !data.done);
+}
--
2.14.3
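As a usage sketch for the new primitive (hypothetical device code, not part of
this series): a teardown step can be pushed into an IOThread's AioContext and
waited on synchronously. Note the contrast with the blk_aio_ioctl() bug fixed
earlier in this pull request: the stack-allocated AioWaitBHData above is safe
because aio_wait_bh_oneshot() does not return until the BH has run.

    #include "qemu/osdep.h"
    #include "block/aio-wait.h"

    typedef struct {
        AioContext *ctx;    /* IOThread context the device runs in */
        /* ... device state ... */
    } MyDevice;

    /* Context: BH in IOThread */
    static void my_device_stop_bh(void *opaque)
    {
        MyDevice *dev = opaque;

        /* Detach fd handlers, quiesce the device, etc. This runs in the
         * IOThread, so it cannot race with handlers in that context. */
        (void)dev;
    }

    /* Context: QEMU main loop thread, with dev->ctx acquired exactly once */
    static void my_device_stop(MyDevice *dev)
    {
        aio_wait_bh_oneshot(dev->ctx, my_device_stop_bh, dev);
        /* The BH has completed by the time we get here. */
    }

The next two patches in this series apply exactly this pattern to virtio-blk
and virtio-scsi dataplane teardown.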
If the main loop thread invokes .ioeventfd_stop() just as the vq handler
function begins in the IOThread then the handler may lose the race for
the AioContext lock. By the time the vq handler is able to acquire the
AioContext lock the ioeventfd has already been removed and the handler
isn't supposed to run anymore!

Use the new aio_wait_bh_oneshot() function to perform ioeventfd removal
from within the IOThread. This way no races with the vq handler are
possible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20180307144205.20619-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -XXX,XX +XXX,XX @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
     return -ENOSYS;
 }
 
+/* Stop notifications for new requests from guest.
+ *
+ * Context: BH in IOThread
+ */
+static void virtio_blk_data_plane_stop_bh(void *opaque)
+{
+    VirtIOBlockDataPlane *s = opaque;
+    unsigned i;
+
+    for (i = 0; i < s->conf->num_queues; i++) {
+        VirtQueue *vq = virtio_get_queue(s->vdev, i);
+
+        virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, NULL);
+    }
+}
+
 /* Context: QEMU global mutex held */
 void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 {
@@ -XXX,XX +XXX,XX @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
     trace_virtio_blk_data_plane_stop(s);
 
     aio_context_acquire(s->ctx);
-
-    /* Stop notifications for new requests from guest */
-    for (i = 0; i < nvqs; i++) {
-        VirtQueue *vq = virtio_get_queue(s->vdev, i);
-
-        virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, NULL);
-    }
+    aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s);
 
     /* Drain and switch bs back to the QEMU main loop */
     blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
--
2.14.3
If the main loop thread invokes .ioeventfd_stop() just as the vq handler
function begins in the IOThread then the handler may lose the race for
the AioContext lock. By the time the vq handler is able to acquire the
AioContext lock the ioeventfd has already been removed and the handler
isn't supposed to run anymore!

Use the new aio_wait_bh_oneshot() function to perform ioeventfd removal
from within the IOThread. This way no races with the vq handler are
possible.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20180307144205.20619-4-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/virtio-scsi-dataplane.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -XXX,XX +XXX,XX @@ static int virtio_scsi_vring_init(VirtIOSCSI *s, VirtQueue *vq, int n,
     return 0;
 }
 
-/* assumes s->ctx held */
-static void virtio_scsi_clear_aio(VirtIOSCSI *s)
+/* Context: BH in IOThread */
+static void virtio_scsi_dataplane_stop_bh(void *opaque)
 {
+    VirtIOSCSI *s = opaque;
     VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
     int i;
 
@@ -XXX,XX +XXX,XX @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev)
     return 0;
 
 fail_vrings:
-    virtio_scsi_clear_aio(s);
+    aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
     aio_context_release(s->ctx);
     for (i = 0; i < vs->conf.num_queues + 2; i++) {
         virtio_bus_set_host_notifier(VIRTIO_BUS(qbus), i, false);
@@ -XXX,XX +XXX,XX @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev)
     s->dataplane_stopping = true;
 
     aio_context_acquire(s->ctx);
-    virtio_scsi_clear_aio(s);
+    aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s);
     aio_context_release(s->ctx);
 
     blk_drain_all(); /* ensure there are no in-flight requests */
314 | + return NULL; | ||
315 | + } | ||
316 | + } | ||
317 | + | ||
318 | + if (job_id) { | ||
319 | + if (flags & BLOCK_JOB_INTERNAL) { | ||
320 | + error_setg(errp, "Cannot specify job ID for internal block job"); | ||
321 | + return NULL; | ||
322 | + } | ||
323 | + | ||
324 | + if (!id_wellformed(job_id)) { | ||
325 | + error_setg(errp, "Invalid job ID '%s'", job_id); | ||
326 | + return NULL; | ||
327 | + } | ||
328 | + | ||
329 | + if (block_job_get(job_id)) { | ||
330 | + error_setg(errp, "Job ID '%s' already in use", job_id); | ||
331 | + return NULL; | ||
332 | + } | ||
333 | + } | ||
334 | + | ||
335 | + blk = blk_new(perm, shared_perm); | ||
336 | + ret = blk_insert_bs(blk, bs, errp); | ||
337 | + if (ret < 0) { | ||
338 | + blk_unref(blk); | ||
339 | + return NULL; | ||
340 | + } | ||
341 | + | ||
342 | + job = g_malloc0(driver->instance_size); | ||
343 | + job->driver = driver; | ||
344 | + job->id = g_strdup(job_id); | ||
345 | + job->blk = blk; | ||
346 | + job->cb = cb; | ||
347 | + job->opaque = opaque; | ||
348 | + job->busy = false; | ||
349 | + job->paused = true; | ||
350 | + job->pause_count = 1; | ||
351 | + job->refcnt = 1; | ||
352 | + | ||
353 | + error_setg(&job->blocker, "block device is in use by block job: %s", | ||
354 | + BlockJobType_lookup[driver->job_type]); | ||
355 | + block_job_add_bdrv(job, "main node", bs, 0, BLK_PERM_ALL, &error_abort); | ||
356 | + bs->job = job; | ||
357 | + | ||
358 | + blk_set_dev_ops(blk, &block_job_dev_ops, job); | ||
359 | + bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker); | ||
360 | + | ||
361 | + QLIST_INSERT_HEAD(&block_jobs, job, job_list); | ||
362 | + | ||
363 | + blk_add_aio_context_notifier(blk, block_job_attached_aio_context, | ||
364 | + block_job_detach_aio_context, job); | ||
365 | + | ||
366 | + /* Only set speed when necessary to avoid NotSupported error */ | ||
367 | + if (speed != 0) { | ||
368 | + Error *local_err = NULL; | ||
369 | + | ||
370 | + block_job_set_speed(job, speed, &local_err); | ||
371 | + if (local_err) { | ||
372 | + block_job_unref(job); | ||
373 | + error_propagate(errp, local_err); | ||
374 | + return NULL; | ||
375 | + } | ||
376 | + } | ||
377 | + return job; | ||
378 | +} | ||
379 | + | ||
380 | void block_job_pause_all(void) | ||
381 | { | ||
382 | BlockJob *job = NULL; | ||
383 | @@ -XXX,XX +XXX,XX @@ void block_job_pause_all(void) | ||
384 | } | ||
385 | } | ||
386 | |||
387 | +void block_job_early_fail(BlockJob *job) | ||
388 | +{ | ||
389 | + block_job_unref(job); | ||
390 | +} | ||
391 | + | ||
392 | +void block_job_completed(BlockJob *job, int ret) | ||
393 | +{ | ||
394 | + assert(blk_bs(job->blk)->job == job); | ||
395 | + assert(!job->completed); | ||
396 | + job->completed = true; | ||
397 | + job->ret = ret; | ||
398 | + if (!job->txn) { | ||
399 | + block_job_completed_single(job); | ||
400 | + } else if (ret < 0 || block_job_is_cancelled(job)) { | ||
401 | + block_job_completed_txn_abort(job); | ||
402 | + } else { | ||
403 | + block_job_completed_txn_success(job); | ||
404 | + } | ||
405 | +} | ||
406 | + | ||
407 | +static bool block_job_should_pause(BlockJob *job) | ||
408 | +{ | ||
409 | + return job->pause_count > 0; | ||
410 | +} | ||
411 | + | ||
412 | +void coroutine_fn block_job_pause_point(BlockJob *job) | ||
413 | +{ | ||
414 | + assert(job && block_job_started(job)); | ||
415 | + | ||
416 | + if (!block_job_should_pause(job)) { | ||
417 | + return; | ||
418 | + } | ||
419 | + if (block_job_is_cancelled(job)) { | ||
420 | + return; | ||
421 | + } | ||
422 | + | ||
423 | + if (job->driver->pause) { | ||
424 | + job->driver->pause(job); | ||
425 | + } | ||
426 | + | ||
427 | + if (block_job_should_pause(job) && !block_job_is_cancelled(job)) { | ||
428 | + job->paused = true; | ||
429 | + job->busy = false; | ||
430 | + qemu_coroutine_yield(); /* wait for block_job_resume() */ | ||
431 | + job->busy = true; | ||
432 | + job->paused = false; | ||
433 | + } | ||
434 | + | ||
435 | + if (job->driver->resume) { | ||
436 | + job->driver->resume(job); | ||
437 | + } | ||
438 | +} | ||
439 | + | ||
440 | void block_job_resume_all(void) | ||
441 | { | ||
442 | BlockJob *job = NULL; | ||
443 | @@ -XXX,XX +XXX,XX @@ void block_job_resume_all(void) | ||
444 | } | ||
445 | } | ||
446 | |||
447 | +void block_job_enter(BlockJob *job) | ||
448 | +{ | ||
449 | + if (job->co && !job->busy) { | ||
450 | + bdrv_coroutine_enter(blk_bs(job->blk), job->co); | ||
451 | + } | ||
452 | +} | ||
453 | + | ||
454 | +bool block_job_is_cancelled(BlockJob *job) | ||
455 | +{ | ||
456 | + return job->cancelled; | ||
457 | +} | ||
458 | + | ||
459 | +void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns) | ||
460 | +{ | ||
461 | + assert(job->busy); | ||
462 | + | ||
463 | + /* Check cancellation *before* setting busy = false, too! */ | ||
464 | + if (block_job_is_cancelled(job)) { | ||
465 | + return; | ||
466 | + } | ||
467 | + | ||
468 | + job->busy = false; | ||
469 | + if (!block_job_should_pause(job)) { | ||
470 | + co_aio_sleep_ns(blk_get_aio_context(job->blk), type, ns); | ||
471 | + } | ||
472 | + job->busy = true; | ||
473 | + | ||
474 | + block_job_pause_point(job); | ||
475 | +} | ||
476 | + | ||
477 | +void block_job_yield(BlockJob *job) | ||
478 | +{ | ||
479 | + assert(job->busy); | ||
480 | + | ||
481 | + /* Check cancellation *before* setting busy = false, too! */ | ||
482 | + if (block_job_is_cancelled(job)) { | ||
483 | + return; | ||
484 | + } | ||
485 | + | ||
486 | + job->busy = false; | ||
487 | + if (!block_job_should_pause(job)) { | ||
488 | + qemu_coroutine_yield(); | ||
489 | + } | ||
490 | + job->busy = true; | ||
491 | + | ||
492 | + block_job_pause_point(job); | ||
493 | +} | ||
494 | + | ||
495 | void block_job_event_ready(BlockJob *job) | ||
496 | { | ||
497 | job->ready = true; | ||
498 | -- | 55 | -- |
499 | 2.9.3 | 56 | 2.14.3 |
500 | 57 | ||
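For orientation, the driver-facing half of the API that the left-hand patch groups together is used roughly as follows. This is a minimal sketch, not code from either series: ExampleJob, example_job_run() and example_copy_chunk() are invented names, while block_job_is_cancelled(), block_job_sleep_ns() and the pause semantics are the ones shown above.

    typedef struct ExampleJob {
        BlockJob common;
        int64_t offset;
        int64_t end;
    } ExampleJob;

    static void coroutine_fn example_job_run(void *opaque)
    {
        ExampleJob *s = opaque;
        int ret = 0;

        while (s->offset < s->end && !block_job_is_cancelled(&s->common)) {
            /* Sleeping also services pause requests: block_job_sleep_ns()
             * ends with a call to block_job_pause_point(). */
            block_job_sleep_ns(&s->common, QEMU_CLOCK_REALTIME, 0);

            /* Hypothetical I/O step; assumed to advance s->offset. */
            ret = example_copy_chunk(s);
            if (ret < 0) {
                break;
            }
        }

        /* Completion is reported with block_job_completed(); real drivers
         * defer it to the main loop first, which is out of scope here. */
    }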
1 | From: Paolo Bonzini <pbonzini@redhat.com> | 1 | Commit 00d09fdbbae5f7864ce754913efc84c12fdf9f1a ("vl: pause vcpus before |
---|---|---|---|
2 | stopping iothreads") and commit dce8921b2baaf95974af8176406881872067adfa | ||
3 | ("iothread: Stop threads before main() quits") tried to work around the | ||
4 | fact that emulation was still active during termination by stopping | ||
5 | iothreads. They suffer from race conditions: | ||
6 | 1. virtio_scsi_handle_cmd_vq() racing with iothread_stop_all() hits the | ||
7 | virtio_scsi_ctx_check() assertion failure because the BDS AioContext | ||
8 | has been modified by iothread_stop_all(). | ||
9 | 2. Guest vq kick racing with main loop termination leaves a readable | ||
10 | ioeventfd that is handled by the next aio_poll() when external | ||
11 | clients are enabled again, resulting in unwanted emulation activity. | ||
2 | 12 | ||
3 | Outside blockjob.c, block_job_unref is only used when a block job fails | 13 | This patch obsoletes those commits by fully disabling emulation activity |
4 | to start, and block_job_ref is not used at all. The reference counting | 14 | when vcpus are stopped. |
5 | is thus well hidden. Introduce a separate function to be used | 15 | |
6 | by block jobs; because block_job_ref and block_job_unref now become | ||
7 | static, move them earlier in blockjob.c. | ||
8 | 15 | ||
9 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> | 16 | Use the new vm_shutdown() function instead of pause_all_vcpus() so that |
10 | Reviewed-by: John Snow <jsnow@redhat.com> | 17 | vm change state handlers are invoked too. Virtio devices will now stop |
11 | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> | 18 | their ioeventfds, preventing further emulation activity after vm_stop(). |
12 | Reviewed-by: Jeff Cody <jcody@redhat.com> | 19 | |
13 | Message-id: 20170508141310.8674-4-pbonzini@redhat.com | 20 | Note that vm_stop(RUN_STATE_SHUTDOWN) cannot be used because it emits a |
14 | Signed-off-by: Jeff Cody <jcody@redhat.com> | 21 | QMP STOP event that may affect existing clients. |
22 | |||
23 | It is no longer necessary to call replay_disable_events() directly since | ||
24 | vm_shutdown() does so already. | ||
25 | |||
26 | Drop iothread_stop_all() since it is no longer used. | ||
27 | |||
28 | Cc: Fam Zheng <famz@redhat.com> | ||
29 | Cc: Kevin Wolf <kwolf@redhat.com> | ||
30 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
31 | Reviewed-by: Fam Zheng <famz@redhat.com> | ||
32 | Acked-by: Paolo Bonzini <pbonzini@redhat.com> | ||
33 | Message-id: 20180307144205.20619-5-stefanha@redhat.com | ||
34 | Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
15 | --- | 35 | --- |
16 | block/backup.c | 2 +- | 36 | include/sysemu/iothread.h | 1 - |
17 | block/commit.c | 2 +- | 37 | include/sysemu/sysemu.h | 1 + |
18 | block/mirror.c | 2 +- | 38 | cpus.c | 16 +++++++++++++--- |
19 | blockjob.c | 47 ++++++++++++++++++++++++++------------------ | 39 | iothread.c | 31 ------------------------------- |
20 | include/block/blockjob_int.h | 15 +++----------- | 40 | vl.c | 13 +++---------- |
21 | tests/test-blockjob.c | 10 +++++----- | 41 | 5 files changed, 17 insertions(+), 45 deletions(-) |
22 | 6 files changed, 39 insertions(+), 39 deletions(-) | ||
23 | 42 | ||
24 | diff --git a/block/backup.c b/block/backup.c | 43 | diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h |
25 | index XXXXXXX..XXXXXXX 100644 | 44 | index XXXXXXX..XXXXXXX 100644 |
26 | --- a/block/backup.c | 45 | --- a/include/sysemu/iothread.h |
27 | +++ b/block/backup.c | 46 | +++ b/include/sysemu/iothread.h |
28 | @@ -XXX,XX +XXX,XX @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs, | 47 | @@ -XXX,XX +XXX,XX @@ typedef struct { |
48 | char *iothread_get_id(IOThread *iothread); | ||
49 | IOThread *iothread_by_id(const char *id); | ||
50 | AioContext *iothread_get_aio_context(IOThread *iothread); | ||
51 | -void iothread_stop_all(void); | ||
52 | GMainContext *iothread_get_g_main_context(IOThread *iothread); | ||
53 | |||
54 | /* | ||
55 | diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h | ||
56 | index XXXXXXX..XXXXXXX 100644 | ||
57 | --- a/include/sysemu/sysemu.h | ||
58 | +++ b/include/sysemu/sysemu.h | ||
59 | @@ -XXX,XX +XXX,XX @@ void vm_start(void); | ||
60 | int vm_prepare_start(void); | ||
61 | int vm_stop(RunState state); | ||
62 | int vm_stop_force_state(RunState state); | ||
63 | +int vm_shutdown(void); | ||
64 | |||
65 | typedef enum WakeupReason { | ||
66 | /* Always keep QEMU_WAKEUP_REASON_NONE = 0 */ | ||
67 | diff --git a/cpus.c b/cpus.c | ||
68 | index XXXXXXX..XXXXXXX 100644 | ||
69 | --- a/cpus.c | ||
70 | +++ b/cpus.c | ||
71 | @@ -XXX,XX +XXX,XX @@ void cpu_synchronize_all_pre_loadvm(void) | ||
29 | } | 72 | } |
30 | if (job) { | 73 | } |
31 | backup_clean(&job->common); | 74 | |
32 | - block_job_unref(&job->common); | 75 | -static int do_vm_stop(RunState state) |
33 | + block_job_early_fail(&job->common); | 76 | +static int do_vm_stop(RunState state, bool send_stop) |
77 | { | ||
78 | int ret = 0; | ||
79 | |||
80 | @@ -XXX,XX +XXX,XX @@ static int do_vm_stop(RunState state) | ||
81 | pause_all_vcpus(); | ||
82 | runstate_set(state); | ||
83 | vm_state_notify(0, state); | ||
84 | - qapi_event_send_stop(&error_abort); | ||
85 | + if (send_stop) { | ||
86 | + qapi_event_send_stop(&error_abort); | ||
87 | + } | ||
34 | } | 88 | } |
35 | 89 | ||
36 | return NULL; | 90 | bdrv_drain_all(); |
37 | diff --git a/block/commit.c b/block/commit.c | 91 | @@ -XXX,XX +XXX,XX @@ static int do_vm_stop(RunState state) |
38 | index XXXXXXX..XXXXXXX 100644 | 92 | return ret; |
39 | --- a/block/commit.c | ||
40 | +++ b/block/commit.c | ||
41 | @@ -XXX,XX +XXX,XX @@ fail: | ||
42 | if (commit_top_bs) { | ||
43 | bdrv_set_backing_hd(overlay_bs, top, &error_abort); | ||
44 | } | ||
45 | - block_job_unref(&s->common); | ||
46 | + block_job_early_fail(&s->common); | ||
47 | } | 93 | } |
48 | 94 | ||
49 | 95 | +/* Special vm_stop() variant for terminating the process. Historically clients | |
50 | diff --git a/block/mirror.c b/block/mirror.c | 96 | + * did not expect a QMP STOP event and so we need to retain compatibility. |
51 | index XXXXXXX..XXXXXXX 100644 | 97 | + */ |
52 | --- a/block/mirror.c | 98 | +int vm_shutdown(void) |
53 | +++ b/block/mirror.c | ||
54 | @@ -XXX,XX +XXX,XX @@ fail: | ||
55 | |||
56 | g_free(s->replaces); | ||
57 | blk_unref(s->target); | ||
58 | - block_job_unref(&s->common); | ||
59 | + block_job_early_fail(&s->common); | ||
60 | } | ||
61 | |||
62 | bdrv_child_try_set_perm(mirror_top_bs->backing, 0, BLK_PERM_ALL, | ||
63 | diff --git a/blockjob.c b/blockjob.c | ||
64 | index XXXXXXX..XXXXXXX 100644 | ||
65 | --- a/blockjob.c | ||
66 | +++ b/blockjob.c | ||
67 | @@ -XXX,XX +XXX,XX @@ BlockJob *block_job_get(const char *id) | ||
68 | return NULL; | ||
69 | } | ||
70 | |||
71 | +static void block_job_ref(BlockJob *job) | ||
72 | +{ | 99 | +{ |
73 | + ++job->refcnt; | 100 | + return do_vm_stop(RUN_STATE_SHUTDOWN, false); |
74 | +} | 101 | +} |
75 | + | 102 | + |
76 | +static void block_job_attached_aio_context(AioContext *new_context, | 103 | static bool cpu_can_run(CPUState *cpu) |
77 | + void *opaque); | ||
78 | +static void block_job_detach_aio_context(void *opaque); | ||
79 | + | ||
80 | +static void block_job_unref(BlockJob *job) | ||
81 | +{ | ||
82 | + if (--job->refcnt == 0) { | ||
83 | + BlockDriverState *bs = blk_bs(job->blk); | ||
84 | + bs->job = NULL; | ||
85 | + block_job_remove_all_bdrv(job); | ||
86 | + blk_remove_aio_context_notifier(job->blk, | ||
87 | + block_job_attached_aio_context, | ||
88 | + block_job_detach_aio_context, job); | ||
89 | + blk_unref(job->blk); | ||
90 | + error_free(job->blocker); | ||
91 | + g_free(job->id); | ||
92 | + QLIST_REMOVE(job, job_list); | ||
93 | + g_free(job); | ||
94 | + } | ||
95 | +} | ||
96 | + | ||
97 | static void block_job_attached_aio_context(AioContext *new_context, | ||
98 | void *opaque) | ||
99 | { | 104 | { |
100 | @@ -XXX,XX +XXX,XX @@ void block_job_start(BlockJob *job) | 105 | if (cpu->stop) { |
101 | bdrv_coroutine_enter(blk_bs(job->blk), job->co); | 106 | @@ -XXX,XX +XXX,XX @@ int vm_stop(RunState state) |
107 | return 0; | ||
108 | } | ||
109 | |||
110 | - return do_vm_stop(state); | ||
111 | + return do_vm_stop(state, true); | ||
102 | } | 112 | } |
103 | 113 | ||
104 | -void block_job_ref(BlockJob *job) | 114 | /** |
105 | +void block_job_early_fail(BlockJob *job) | 115 | diff --git a/iothread.c b/iothread.c |
106 | { | 116 | index XXXXXXX..XXXXXXX 100644 |
107 | - ++job->refcnt; | 117 | --- a/iothread.c |
118 | +++ b/iothread.c | ||
119 | @@ -XXX,XX +XXX,XX @@ void iothread_stop(IOThread *iothread) | ||
120 | qemu_thread_join(&iothread->thread); | ||
121 | } | ||
122 | |||
123 | -static int iothread_stop_iter(Object *object, void *opaque) | ||
124 | -{ | ||
125 | - IOThread *iothread; | ||
126 | - | ||
127 | - iothread = (IOThread *)object_dynamic_cast(object, TYPE_IOTHREAD); | ||
128 | - if (!iothread) { | ||
129 | - return 0; | ||
130 | - } | ||
131 | - iothread_stop(iothread); | ||
132 | - return 0; | ||
108 | -} | 133 | -} |
109 | - | 134 | - |
110 | -void block_job_unref(BlockJob *job) | 135 | static void iothread_instance_init(Object *obj) |
136 | { | ||
137 | IOThread *iothread = IOTHREAD(obj); | ||
138 | @@ -XXX,XX +XXX,XX @@ IOThreadInfoList *qmp_query_iothreads(Error **errp) | ||
139 | return head; | ||
140 | } | ||
141 | |||
142 | -void iothread_stop_all(void) | ||
111 | -{ | 143 | -{ |
112 | - if (--job->refcnt == 0) { | 144 | - Object *container = object_get_objects_root(); |
113 | - BlockDriverState *bs = blk_bs(job->blk); | 145 | - BlockDriverState *bs; |
114 | - bs->job = NULL; | 146 | - BdrvNextIterator it; |
115 | - block_job_remove_all_bdrv(job); | 147 | - |
116 | - blk_remove_aio_context_notifier(job->blk, | 148 | - for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) { |
117 | - block_job_attached_aio_context, | 149 | - AioContext *ctx = bdrv_get_aio_context(bs); |
118 | - block_job_detach_aio_context, job); | 150 | - if (ctx == qemu_get_aio_context()) { |
119 | - blk_unref(job->blk); | 151 | - continue; |
120 | - error_free(job->blocker); | 152 | - } |
121 | - g_free(job->id); | 153 | - aio_context_acquire(ctx); |
122 | - QLIST_REMOVE(job, job_list); | 154 | - bdrv_set_aio_context(bs, qemu_get_aio_context()); |
123 | - g_free(job); | 155 | - aio_context_release(ctx); |
124 | - } | 156 | - } |
125 | + block_job_unref(job); | 157 | - |
126 | } | 158 | - object_child_foreach(container, iothread_stop_iter, NULL); |
127 | 159 | -} | |
128 | static void block_job_completed_single(BlockJob *job) | 160 | - |
129 | diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h | 161 | static gpointer iothread_g_main_context_init(gpointer opaque) |
162 | { | ||
163 | AioContext *ctx; | ||
164 | diff --git a/vl.c b/vl.c | ||
130 | index XXXXXXX..XXXXXXX 100644 | 165 | index XXXXXXX..XXXXXXX 100644 |
131 | --- a/include/block/blockjob_int.h | 166 | --- a/vl.c |
132 | +++ b/include/block/blockjob_int.h | 167 | +++ b/vl.c |
133 | @@ -XXX,XX +XXX,XX @@ void block_job_sleep_ns(BlockJob *job, QEMUClockType type, int64_t ns); | 168 | @@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv, char **envp) |
134 | void block_job_yield(BlockJob *job); | 169 | os_setup_post(); |
135 | 170 | ||
136 | /** | 171 | main_loop(); |
137 | - * block_job_ref: | 172 | - replay_disable_events(); |
138 | + * block_job_early_fail: | 173 | |
139 | * @bs: The block device. | 174 | - /* The ordering of the following is delicate. Stop vcpus to prevent new |
140 | * | 175 | - * I/O requests being queued by the guest. Then stop IOThreads (this |
141 | - * Grab a reference to the block job. Should be paired with block_job_unref. | 176 | - * includes a drain operation and completes all request processing). At |
142 | + * The block job could not be started, free it. | 177 | - * this point emulated devices are still associated with their IOThreads |
143 | */ | 178 | - * (if any) but no longer have any work to do. Only then can we close |
144 | -void block_job_ref(BlockJob *job); | 179 | - * block devices safely because we know there is no more I/O coming. |
145 | - | 180 | - */ |
146 | -/** | 181 | - pause_all_vcpus(); |
147 | - * block_job_unref: | 182 | - iothread_stop_all(); |
148 | - * @bs: The block device. | 183 | + /* No more vcpu or device emulation activity beyond this point */ |
149 | - * | 184 | + vm_shutdown(); |
150 | - * Release reference to the block job and release resources if it is the last | 185 | + |
151 | - * reference. | 186 | bdrv_close_all(); |
152 | - */ | 187 | |
153 | -void block_job_unref(BlockJob *job); | 188 | res_free(); |
154 | +void block_job_early_fail(BlockJob *job); | ||
155 | |||
156 | /** | ||
157 | * block_job_completed: | ||
158 | diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c | ||
159 | index XXXXXXX..XXXXXXX 100644 | ||
160 | --- a/tests/test-blockjob.c | ||
161 | +++ b/tests/test-blockjob.c | ||
162 | @@ -XXX,XX +XXX,XX @@ static void test_job_ids(void) | ||
163 | job[1] = do_test_id(blk[1], "id0", false); | ||
164 | |||
165 | /* But once job[0] finishes we can reuse its ID */ | ||
166 | - block_job_unref(job[0]); | ||
167 | + block_job_early_fail(job[0]); | ||
168 | job[1] = do_test_id(blk[1], "id0", true); | ||
169 | |||
170 | /* No job ID specified, defaults to the backend name ('drive1') */ | ||
171 | - block_job_unref(job[1]); | ||
172 | + block_job_early_fail(job[1]); | ||
173 | job[1] = do_test_id(blk[1], NULL, true); | ||
174 | |||
175 | /* Duplicate job ID */ | ||
176 | @@ -XXX,XX +XXX,XX @@ static void test_job_ids(void) | ||
177 | /* This one is valid */ | ||
178 | job[2] = do_test_id(blk[2], "id_2", true); | ||
179 | |||
180 | - block_job_unref(job[0]); | ||
181 | - block_job_unref(job[1]); | ||
182 | - block_job_unref(job[2]); | ||
183 | + block_job_early_fail(job[0]); | ||
184 | + block_job_early_fail(job[1]); | ||
185 | + block_job_early_fail(job[2]); | ||
186 | |||
187 | destroy_blk(blk[0]); | ||
188 | destroy_blk(blk[1]); | ||
189 | -- | 189 | -- |
190 | 2.9.3 | 190 | 2.14.3 |
191 | 191 | ||
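To show where the renamed helper fits, here is a hypothetical creation path patterned after the backup/commit/mirror hunks above. Only block_job_create() and block_job_early_fail() are real entry points from the patch; ExampleJob, example_job_driver and example_job_setup() are assumptions for the sketch.

    static BlockJob *example_job_create(BlockDriverState *bs, Error **errp)
    {
        ExampleJob *s;

        s = block_job_create(NULL, &example_job_driver, bs,
                             BLK_PERM_CONSISTENT_READ, BLK_PERM_ALL,
                             0 /* speed */, BLOCK_JOB_DEFAULT,
                             NULL, NULL, errp);
        if (!s) {
            return NULL;
        }

        if (example_job_setup(s, errp) < 0) {   /* hypothetical setup step */
            /* The job never ran, so undo creation with the dedicated
             * helper; block_job_unref() is no longer visible here. */
            block_job_early_fail(&s->common);
            return NULL;
        }

        return &s->common;
    }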
Deleted patch | |||
---|---|---|---|
1 | From: Paolo Bonzini <pbonzini@redhat.com> | ||
2 | 1 | ||
3 | The new function helps respect the invariant that the coroutine is | ||
4 | entered with user_paused false, a pause count of zero, and no error | ||
5 | recorded in the iostatus. | ||
6 | |||
7 | Resetting the iostatus is now common to all of block_job_cancel_async, | ||
8 | block_job_user_resume and block_job_iostatus_reset, albeit with slight | ||
9 | differences: | ||
10 | |||
11 | - block_job_cancel_async resets the iostatus, and resumes the job if | ||
12 | there was an error, but the coroutine is not restarted immediately. | ||
13 | For example, the caller may continue with a call to block_job_finish_sync. | ||
14 | |||
15 | - block_job_user_resume resets the iostatus. It wants to resume the job | ||
16 | unconditionally, even if there was no error. | ||
17 | |||
18 | - block_job_iostatus_reset doesn't resume the job at all. Maybe that's | ||
19 | a bug but it should be fixed separately. | ||
20 | |||
21 | block_job_iostatus_reset implements the least common denominator, so | ||
22 | add some checking but otherwise leave it as the entry point for | ||
23 | resetting the iostatus. | ||
24 | |||
25 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
26 | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> | ||
27 | Message-id: 20170508141310.8674-8-pbonzini@redhat.com | ||
28 | Signed-off-by: Jeff Cody <jcody@redhat.com> | ||
29 | --- | ||
30 | blockjob.c | 24 ++++++++++++++++++++---- | ||
31 | 1 file changed, 20 insertions(+), 4 deletions(-) | ||
32 | |||
33 | diff --git a/blockjob.c b/blockjob.c | ||
34 | index XXXXXXX..XXXXXXX 100644 | ||
35 | --- a/blockjob.c | ||
36 | +++ b/blockjob.c | ||
37 | @@ -XXX,XX +XXX,XX @@ static void block_job_completed_single(BlockJob *job) | ||
38 | block_job_unref(job); | ||
39 | } | ||
40 | |||
41 | +static void block_job_cancel_async(BlockJob *job) | ||
42 | +{ | ||
43 | + if (job->iostatus != BLOCK_DEVICE_IO_STATUS_OK) { | ||
44 | + block_job_iostatus_reset(job); | ||
45 | + } | ||
46 | + if (job->user_paused) { | ||
47 | + /* Do not call block_job_enter here, the caller will handle it. */ | ||
48 | + job->user_paused = false; | ||
49 | + job->pause_count--; | ||
50 | + } | ||
51 | + job->cancelled = true; | ||
52 | +} | ||
53 | + | ||
54 | static void block_job_completed_txn_abort(BlockJob *job) | ||
55 | { | ||
56 | AioContext *ctx; | ||
57 | @@ -XXX,XX +XXX,XX @@ static void block_job_completed_txn_abort(BlockJob *job) | ||
58 | * them; this job, however, may or may not be cancelled, depending | ||
59 | * on the caller, so leave it. */ | ||
60 | if (other_job != job) { | ||
61 | - other_job->cancelled = true; | ||
62 | + block_job_cancel_async(other_job); | ||
63 | } | ||
64 | continue; | ||
65 | } | ||
66 | @@ -XXX,XX +XXX,XX @@ bool block_job_user_paused(BlockJob *job) | ||
67 | void block_job_user_resume(BlockJob *job) | ||
68 | { | ||
69 | if (job && job->user_paused && job->pause_count > 0) { | ||
70 | - job->user_paused = false; | ||
71 | block_job_iostatus_reset(job); | ||
72 | + job->user_paused = false; | ||
73 | block_job_resume(job); | ||
74 | } | ||
75 | } | ||
76 | @@ -XXX,XX +XXX,XX @@ void block_job_user_resume(BlockJob *job) | ||
77 | void block_job_cancel(BlockJob *job) | ||
78 | { | ||
79 | if (block_job_started(job)) { | ||
80 | - job->cancelled = true; | ||
81 | - block_job_iostatus_reset(job); | ||
82 | + block_job_cancel_async(job); | ||
83 | block_job_enter(job); | ||
84 | } else { | ||
85 | block_job_completed(job, -ECANCELED); | ||
86 | @@ -XXX,XX +XXX,XX @@ void block_job_yield(BlockJob *job) | ||
87 | |||
88 | void block_job_iostatus_reset(BlockJob *job) | ||
89 | { | ||
90 | + if (job->iostatus == BLOCK_DEVICE_IO_STATUS_OK) { | ||
91 | + return; | ||
92 | + } | ||
93 | + assert(job->user_paused && job->pause_count > 0); | ||
94 | job->iostatus = BLOCK_DEVICE_IO_STATUS_OK; | ||
95 | } | ||
96 | |||
97 | -- | ||
98 | 2.9.3 | ||
99 | |||
100 | diff view generated by jsdifflib |
Deleted patch | |||
---|---|---|---|
1 | From: Paolo Bonzini <pbonzini@redhat.com> | ||
2 | 1 | ||
3 | Unlike test-blockjob-txn, QMP releases the reference to the transaction | ||
4 | before the jobs finish. Thus, qemu-iotest 124 showed a failure while | ||
5 | working on the next patch that the unit tests did not have. Make | ||
6 | the test a little nastier. | ||
7 | |||
8 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
9 | Reviewed-by: John Snow <jsnow@redhat.com> | ||
10 | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> | ||
11 | Message-id: 20170508141310.8674-10-pbonzini@redhat.com | ||
12 | Signed-off-by: Jeff Cody <jcody@redhat.com> | ||
13 | --- | ||
14 | tests/test-blockjob-txn.c | 7 +++++-- | ||
15 | 1 file changed, 5 insertions(+), 2 deletions(-) | ||
16 | |||
17 | diff --git a/tests/test-blockjob-txn.c b/tests/test-blockjob-txn.c | ||
18 | index XXXXXXX..XXXXXXX 100644 | ||
19 | --- a/tests/test-blockjob-txn.c | ||
20 | +++ b/tests/test-blockjob-txn.c | ||
21 | @@ -XXX,XX +XXX,XX @@ static void test_pair_jobs(int expected1, int expected2) | ||
22 | block_job_start(job1); | ||
23 | block_job_start(job2); | ||
24 | |||
25 | + /* Release our reference now to trigger as many nice | ||
26 | + * use-after-free bugs as possible. | ||
27 | + */ | ||
28 | + block_job_txn_unref(txn); | ||
29 | + | ||
30 | if (expected1 == -ECANCELED) { | ||
31 | block_job_cancel(job1); | ||
32 | } | ||
33 | @@ -XXX,XX +XXX,XX @@ static void test_pair_jobs(int expected1, int expected2) | ||
34 | |||
35 | g_assert_cmpint(result1, ==, expected1); | ||
36 | g_assert_cmpint(result2, ==, expected2); | ||
37 | - | ||
38 | - block_job_txn_unref(txn); | ||
39 | } | ||
40 | |||
41 | static void test_pair_jobs_success(void) | ||
42 | -- | ||
43 | 2.9.3 | ||
44 | |||
45 | diff view generated by jsdifflib |
Deleted patch | |||
---|---|---|---|
1 | From: Paolo Bonzini <pbonzini@redhat.com> | ||
2 | 1 | ||
3 | This splits the part that touches job states from the part that invokes | ||
4 | callbacks. It will make the code simpler to understand once job states will | ||
5 | be protected by a different mutex than the AioContext lock. | ||
6 | |||
7 | Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> | ||
8 | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> | ||
9 | Message-id: 20170508141310.8674-11-pbonzini@redhat.com | ||
10 | Signed-off-by: Jeff Cody <jcody@redhat.com> | ||
11 | --- | ||
12 | blockjob.c | 35 ++++++++++++++++++++++------------- | ||
13 | 1 file changed, 22 insertions(+), 13 deletions(-) | ||
14 | |||
15 | diff --git a/blockjob.c b/blockjob.c | ||
16 | index XXXXXXX..XXXXXXX 100644 | ||
17 | --- a/blockjob.c | ||
18 | +++ b/blockjob.c | ||
19 | @@ -XXX,XX +XXX,XX @@ void block_job_start(BlockJob *job) | ||
20 | |||
21 | static void block_job_completed_single(BlockJob *job) | ||
22 | { | ||
23 | + assert(job->completed); | ||
24 | + | ||
25 | if (!job->ret) { | ||
26 | if (job->driver->commit) { | ||
27 | job->driver->commit(job); | ||
28 | @@ -XXX,XX +XXX,XX @@ static int block_job_finish_sync(BlockJob *job, | ||
29 | |||
30 | block_job_ref(job); | ||
31 | |||
32 | - finish(job, &local_err); | ||
33 | + if (finish) { | ||
34 | + finish(job, &local_err); | ||
35 | + } | ||
36 | if (local_err) { | ||
37 | error_propagate(errp, local_err); | ||
38 | block_job_unref(job); | ||
39 | @@ -XXX,XX +XXX,XX @@ static void block_job_completed_txn_abort(BlockJob *job) | ||
40 | { | ||
41 | AioContext *ctx; | ||
42 | BlockJobTxn *txn = job->txn; | ||
43 | - BlockJob *other_job, *next; | ||
44 | + BlockJob *other_job; | ||
45 | |||
46 | if (txn->aborting) { | ||
47 | /* | ||
48 | @@ -XXX,XX +XXX,XX @@ static void block_job_completed_txn_abort(BlockJob *job) | ||
49 | return; | ||
50 | } | ||
51 | txn->aborting = true; | ||
52 | + block_job_txn_ref(txn); | ||
53 | + | ||
54 | /* We are the first failed job. Cancel other jobs. */ | ||
55 | QLIST_FOREACH(other_job, &txn->jobs, txn_list) { | ||
56 | ctx = blk_get_aio_context(other_job->blk); | ||
57 | aio_context_acquire(ctx); | ||
58 | } | ||
59 | + | ||
60 | + /* Other jobs are effectively cancelled by us, set the status for | ||
61 | + * them; this job, however, may or may not be cancelled, depending | ||
62 | + * on the caller, so leave it. */ | ||
63 | QLIST_FOREACH(other_job, &txn->jobs, txn_list) { | ||
64 | - if (other_job == job || other_job->completed) { | ||
65 | - /* Other jobs are "effectively" cancelled by us, set the status for | ||
66 | - * them; this job, however, may or may not be cancelled, depending | ||
67 | - * on the caller, so leave it. */ | ||
68 | - if (other_job != job) { | ||
69 | - block_job_cancel_async(other_job); | ||
70 | - } | ||
71 | - continue; | ||
72 | + if (other_job != job) { | ||
73 | + block_job_cancel_async(other_job); | ||
74 | } | ||
75 | - block_job_cancel_sync(other_job); | ||
76 | - assert(other_job->completed); | ||
77 | } | ||
78 | - QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) { | ||
79 | + while (!QLIST_EMPTY(&txn->jobs)) { | ||
80 | + other_job = QLIST_FIRST(&txn->jobs); | ||
81 | ctx = blk_get_aio_context(other_job->blk); | ||
82 | + if (!other_job->completed) { | ||
83 | + assert(other_job->cancelled); | ||
84 | + block_job_finish_sync(other_job, NULL, NULL); | ||
85 | + } | ||
86 | block_job_completed_single(other_job); | ||
87 | aio_context_release(ctx); | ||
88 | } | ||
89 | + | ||
90 | + block_job_txn_unref(txn); | ||
91 | } | ||
92 | |||
93 | static void block_job_completed_txn_success(BlockJob *job) | ||
94 | -- | ||
95 | 2.9.3 | ||
96 | |||
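Condensed, the reorganized abort path now reads as two phases. This is a paraphrase of the hunks above rather than a verbatim copy, with the AioContext acquire/release pairs elided for brevity.

    txn->aborting = true;
    block_job_txn_ref(txn);

    /* Phase 1: touch job state only; no callbacks run yet. */
    QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
        if (other_job != job) {
            block_job_cancel_async(other_job);
        }
    }

    /* Phase 2: wait for each job, then invoke its callbacks. */
    while (!QLIST_EMPTY(&txn->jobs)) {
        other_job = QLIST_FIRST(&txn->jobs);
        if (!other_job->completed) {
            assert(other_job->cancelled);
            block_job_finish_sync(other_job, NULL, NULL);
        }
        block_job_completed_single(other_job);
    }

    block_job_txn_unref(txn);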
Deleted patch | |||
---|---|---|---|
1 | On currently released versions of glusterfs, glfs_lseek() will sometimes | ||
2 | return invalid values for SEEK_DATA or SEEK_HOLE. For SEEK_DATA and | ||
3 | SEEK_HOLE, the returned value should be >= the passed offset, or < 0 in | ||
4 | the case of error: | ||
5 | 1 | ||
6 | LSEEK(2): | ||
7 | |||
8 | off_t lseek(int fd, off_t offset, int whence); | ||
9 | |||
10 | [...] | ||
11 | |||
12 | SEEK_HOLE | ||
13 | Adjust the file offset to the next hole in the file greater | ||
14 | than or equal to offset. If offset points into the middle of | ||
15 | a hole, then the file offset is set to offset. If there is no | ||
16 | hole past offset, then the file offset is adjusted to the end | ||
17 | of the file (i.e., there is an implicit hole at the end of | ||
18 | any file). | ||
19 | |||
20 | [...] | ||
21 | |||
22 | RETURN VALUE | ||
23 | Upon successful completion, lseek() returns the resulting | ||
24 | offset location as measured in bytes from the beginning of the | ||
25 | file. On error, the value (off_t) -1 is returned and errno is | ||
26 | set to indicate the error | ||
27 | |||
28 | However, occasionally glfs_lseek() for SEEK_HOLE/DATA will return a | ||
29 | value less than the passed offset, yet greater than zero. | ||
30 | |||
31 | For instance, here are example values observed from this call: | ||
32 | |||
33 | offs = glfs_lseek(s->fd, start, SEEK_HOLE); | ||
34 | if (offs < 0) { | ||
35 | return -errno; /* D1 and (H3 or H4) */ | ||
36 | } | ||
37 | |||
38 | start == 7608336384 | ||
39 | offs == 7607877632 | ||
40 | |||
41 | This causes QEMU to abort on the assertion. When this value is | ||
42 | returned, errno is also 0. | ||
43 | |||
44 | This is a known, reported bug in glusterfs: | ||
45 | https://bugzilla.redhat.com/show_bug.cgi?id=1425293 | ||
46 | |||
47 | Although this is being fixed in gluster, we should still work around it | ||
48 | in QEMU, given that multiple released versions of gluster behave this | ||
49 | way. | ||
50 | |||
51 | This patch treats the return case of (offs < start) the same as if an | ||
52 | error value other than ENXIO is returned; we will assume we learned | ||
53 | nothing, and there are no holes in the file. | ||
54 | |||
55 | Signed-off-by: Jeff Cody <jcody@redhat.com> | ||
56 | Reviewed-by: Eric Blake <eblake@redhat.com> | ||
57 | Reviewed-by: Niels de Vos <ndevos@redhat.com> | ||
58 | Message-id: 87c0140e9407c08f6e74b04131b610f2e27c014c.1495560397.git.jcody@redhat.com | ||
59 | Signed-off-by: Jeff Cody <jcody@redhat.com> | ||
60 | --- | ||
61 | block/gluster.c | 18 ++++++++++++++++-- | ||
62 | 1 file changed, 16 insertions(+), 2 deletions(-) | ||
63 | |||
64 | diff --git a/block/gluster.c b/block/gluster.c | ||
65 | index XXXXXXX..XXXXXXX 100644 | ||
66 | --- a/block/gluster.c | ||
67 | +++ b/block/gluster.c | ||
68 | @@ -XXX,XX +XXX,XX @@ static int find_allocation(BlockDriverState *bs, off_t start, | ||
69 | if (offs < 0) { | ||
70 | return -errno; /* D3 or D4 */ | ||
71 | } | ||
72 | - assert(offs >= start); | ||
73 | + | ||
74 | + if (offs < start) { | ||
75 | + /* This is not a valid return by lseek(). We are safe to just return | ||
76 | + * -EIO in this case, and we'll treat it like D4. Unfortunately some | ||
77 | + * versions of gluster server will return offs < start, so an assert | ||
78 | + * here will unnecessarily abort QEMU. */ | ||
79 | + return -EIO; | ||
80 | + } | ||
81 | |||
82 | if (offs > start) { | ||
83 | /* D2: in hole, next data at offs */ | ||
84 | @@ -XXX,XX +XXX,XX @@ static int find_allocation(BlockDriverState *bs, off_t start, | ||
85 | if (offs < 0) { | ||
86 | return -errno; /* D1 and (H3 or H4) */ | ||
87 | } | ||
88 | - assert(offs >= start); | ||
89 | + | ||
90 | + if (offs < start) { | ||
91 | + /* This is not a valid return by lseek(). We are safe to just return | ||
92 | + * -EIO in this case, and we'll treat it like H4. Unfortunately some | ||
93 | + * versions of gluster server will return offs < start, so an assert | ||
94 | + * here will unnecessarily abort QEMU. */ | ||
95 | + return -EIO; | ||
96 | + } | ||
97 | |||
98 | if (offs > start) { | ||
99 | /* | ||
100 | -- | ||
101 | 2.9.3 | ||
102 | |||
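The same defensive pattern applies to any SEEK_DATA/SEEK_HOLE caller that cannot fully trust the underlying implementation. A generic sketch with plain lseek(), not the gluster wrapper, assumed to live in a function that returns a negative errno value:

    off_t offs = lseek(fd, start, SEEK_HOLE);
    if (offs < 0) {
        return -errno;   /* a real error: errno is meaningful here */
    }
    if (offs < start) {
        /* Out-of-spec result; errno may even be 0, as in the report
         * above.  Assume nothing was learned and report an I/O error
         * rather than asserting. */
        return -EIO;
    }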