Series comparison

-[Qemu-devel] [PULL 00/61] Block layer patches
+[PULL 00/10] Block layer patches
-The following changes since commit 4c8c1cc544dbd5e2564868e61c5037258e393832:
+The following changes since commit ec11dc41eec5142b4776db1296972c6323ba5847:
-  Merge remote-tracking branch 'remotes/vivier/tags/m68k-for-2.10-pull-request' into staging (2017-06-22 19:01:58 +0100)
+  Merge tag 'pull-misc-2022-05-11' of git://repo.or.cz/qemu/armbru into staging (2022-05-11 09:00:26 -0700)
-are available in the git repository at:
+are available in the Git repository at:
   git://repo.or.cz/qemu/kevin.git tags/for-upstream
-for you to fetch changes up to 1512008812410ca4054506a7c44343088abdd977:
+for you to fetch changes up to f70625299ecc9ba577c87f3d1d75012c747c7d88:
-  Merge remote-tracking branch 'mreitz/tags/pull-block-2017-06-23' into queue-block (2017-06-23 14:09:12 +0200)
+  qemu-iotests: inline common.config into common.rc (2022-05-12 15:42:49 +0200)
 ----------------------------------------------------------------
 Block layer patches
+- coroutine: Fix crashes due to too large pool batch size
+- fdc: Prevent end-of-track overrun
+- nbd: MULTI_CONN for shared writable exports
+- iotests test runner improvements
 ----------------------------------------------------------------
-Alberto Garcia (9):
+Daniel P. Berrangé (2):
-      throttle: Update throttle-groups.c documentation
+      tests/qemu-iotests: print intent to run a test in TAP mode
-      qcow2: Remove unused Error variable in do_perform_cow()
+      .gitlab-ci.d: export meson testlog.txt as an artifact
       qcow2: Use unsigned int for both members of Qcow2COWRegion
       qcow2: Make perform_cow() call do_perform_cow() twice
       qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()
       qcow2: Allow reading both COW regions with only one request
       qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()
       qcow2: Merge the writing of the COW regions with the guest data
       qcow2: Use offset_into_cluster() and offset_to_l2_index()
-Kevin Wolf (37):
+Eric Blake (2):
-      commit: Fix completion with extra reference
+      qemu-nbd: Pass max connections to blockdev layer
-      qemu-iotests: Allow starting new qemu after cleanup
+      nbd/server: Allow MULTI_CONN for shared writable exports
       qemu-iotests: Test exiting qemu with running job
       doc: Document generic -blockdev options
       doc: Document driver-specific -blockdev options
       qed: Use bottom half to resume waiting requests
       qed: Make qed_read_table() synchronous
       qed: Remove callback from qed_read_table()
       qed: Remove callback from qed_read_l2_table()
       qed: Remove callback from qed_find_cluster()
       qed: Make qed_read_backing_file() synchronous
       qed: Make qed_copy_from_backing_file() synchronous
       qed: Remove callback from qed_copy_from_backing_file()
       qed: Make qed_write_header() synchronous
       qed: Remove callback from qed_write_header()
       qed: Make qed_write_table() synchronous
       qed: Remove GenericCB
       qed: Remove callback from qed_write_table()
       qed: Make qed_aio_read_data() synchronous
       qed: Make qed_aio_write_main() synchronous
       qed: Inline qed_commit_l2_update()
       qed: Add return value to qed_aio_write_l1_update()
       qed: Add return value to qed_aio_write_l2_update()
       qed: Add return value to qed_aio_write_main()
       qed: Add return value to qed_aio_write_cow()
       qed: Add return value to qed_aio_write_inplace/alloc()
       qed: Add return value to qed_aio_read/write_data()
       qed: Remove ret argument from qed_aio_next_io()
       qed: Remove recursion in qed_aio_next_io()
       qed: Implement .bdrv_co_readv/writev
       qed: Use CoQueue for serialising allocations
       qed: Simplify request handling
       qed: Use a coroutine for need_check_timer
       qed: Add coroutine_fn to I/O path functions
       qed: Use bdrv_co_* for coroutine_fns
       block: Remove bdrv_aio_readv/writev/flush()
       Merge remote-tracking branch 'mreitz/tags/pull-block-2017-06-23' into queue-block
-Manos Pitsidianakis (1):
+Hanna Reitz (1):
-      block: change variable names in BlockDriverState
+      iotests/testrunner: Flush after run_test()
-Max Reitz (3):
+Kevin Wolf (2):
-      blkdebug: Catch bs->exact_filename overflow
+      coroutine: Rename qemu_coroutine_inc/dec_pool_size()
-      blkverify: Catch bs->exact_filename overflow
+      coroutine: Revert to constant batch size
       block: Do not strcmp() with NULL uri->scheme
-Stefan Hajnoczi (10):
+Paolo Bonzini (1):
-      block: count bdrv_co_rw_vmstate() requests
+      qemu-iotests: inline common.config into common.rc
       block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
       migration: avoid recursive AioContext locking in save_vmstate()
       migration: use bdrv_drain_all_begin/end() instead bdrv_drain_all()
       virtio-pci: use ioeventfd even when KVM is disabled
       migration: hold AioContext lock for loadvm qemu_fclose()
       qemu-iotests: 068: extract _qemu() function
       qemu-iotests: 068: use -drive/-device instead of -hda
       qemu-iotests: 068: test iothread mode
       qemu-img: don't shadow opts variable in img_dd()
-Stephen Bates (1):
+Philippe Mathieu-Daudé (2):
-      nvme: Add support for Read Data and Write Data in CMBs.
+      hw/block/fdc: Prevent end-of-track overrun (CVE-2021-3507)
       tests/qtest/fdc-test: Add a regression test for CVE-2021-3507
-sochin.jiang (1):
+ qapi/block-export.json                           |   8 +-
-      fix: avoid an infinite loop or a dangling pointer problem in img_commit
+ docs/interop/nbd.txt                             |   1 +
  docs/tools/qemu-nbd.rst                          |   3 +-
  include/block/nbd.h                              |   5 +-
  include/qemu/coroutine.h                         |   6 +-
  blockdev-nbd.c                                   |  13 +-
  hw/block/fdc.c                                   |   8 ++
  hw/block/virtio-blk.c                            |   6 +-
  nbd/server.c                                     |  10 +-
  qemu-nbd.c                                       |   2 +-
  tests/qtest/fdc-test.c                           |  21 ++++
  util/qemu-coroutine.c                            |  26 ++--
  tests/qemu-iotests/testrunner.py                 |   4 +
  .gitlab-ci.d/buildtest-template.yml              |  12 +-
  MAINTAINERS                                      |   1 +
  tests/qemu-iotests/common.config                 |  41 -------
  tests/qemu-iotests/common.rc                     |  31 +++--
  tests/qemu-iotests/tests/nbd-multiconn           | 145 +++++++++++++++++++++++
  tests/qemu-iotests/tests/nbd-multiconn.out       |   5 +
  tests/qemu-iotests/tests/nbd-qemu-allocation.out |   2 +-
 files changed, 261 insertions(+), 89 deletions(-)
  delete mode 100644 tests/qemu-iotests/common.config
  create mode 100755 tests/qemu-iotests/tests/nbd-multiconn
  create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out
- block/Makefile.objs            |   2 +-
- block/blkdebug.c               |  46 +--
- block/blkreplay.c              |   8 +-
- block/blkverify.c              |  12 +-
- block/block-backend.c          |  22 +-
- block/commit.c                 |   7 +
- block/file-posix.c             |  34 +-
- block/io.c                     | 240 ++-----------
- block/iscsi.c                  |  20 +-
- block/mirror.c                 |   8 +-
- block/nbd-client.c             |   8 +-
- block/nbd-client.h             |   4 +-
- block/nbd.c                    |   6 +-
- block/nfs.c                    |   2 +-
- block/qcow2-cluster.c          | 201 ++++++++---
- block/qcow2.c                  |  94 +++--
- block/qcow2.h                  |  11 +-
- block/qed-cluster.c            | 124 +++----
- block/qed-gencb.c              |  33 --
- block/qed-table.c              | 261 +++++---------
- block/qed.c                    | 779 ++++++++++++++++-------------------------
- block/qed.h                    |  54 +--
- block/raw-format.c             |   8 +-
- block/rbd.c                    |   4 +-
- block/sheepdog.c               |  12 +-
- block/ssh.c                    |   2 +-
- block/throttle-groups.c        |   2 +-
- block/trace-events             |   3 -
- blockjob.c                     |   4 +-
- hw/block/nvme.c                |  83 +++--
- hw/block/nvme.h                |   1 +
- hw/virtio/virtio-pci.c         |   2 +-
- include/block/block.h          |  16 +-
- include/block/block_int.h      |   6 +-
- include/block/blockjob.h       |  18 +
- include/sysemu/block-backend.h |  20 +-
- migration/savevm.c             |  32 +-
- qemu-img.c                     |  29 +-
- qemu-io-cmds.c                 |  46 +--
- qemu-options.hx                | 221 ++++++++++--
- tests/qemu-iotests/068         |  37 +-
- tests/qemu-iotests/068.out     |  11 +-
- tests/qemu-iotests/185         | 206 +++++++++++
- tests/qemu-iotests/185.out     |  59 ++++
- tests/qemu-iotests/common.qemu |   3 +
- tests/qemu-iotests/group       |   1 +
-files changed, 1477 insertions(+), 1325 deletions(-)
- delete mode 100644 block/qed-gencb.c
- create mode 100755 tests/qemu-iotests/185
- create mode 100644 tests/qemu-iotests/185.out

-[Qemu-devel] [PULL 01/61] commit: Fix completion with extra reference
+Deleted patch
-commit_complete() can't assume that after its block_job_completed() the
-job is actually immediately freed; someone else may still be holding
-references. In this case, the op blockers on the intermediate nodes make
-the graph reconfiguration in the completion code fail.
-Call block_job_remove_all_bdrv() manually so that we know for sure that
-any blockers on intermediate nodes are given up.
-Cc: qemu-stable@nongnu.org
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Eric Blake <eblake@redhat.com>
-Reviewed-by: Max Reitz <mreitz@redhat.com>
----
- block/commit.c | 7 +++++++
-file changed, 7 insertions(+)
-diff --git a/block/commit.c b/block/commit.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/commit.c
-+++ b/block/commit.c
-@@ -XXX,XX +XXX,XX @@ static void commit_complete(BlockJob *job, void *opaque)
-     }
-     g_free(s->backing_file_str);
-     blk_unref(s->top);
-+
-+    /* If there is more than one reference to the job (e.g. if called from
-+     * block_job_finish_sync()), block_job_completed() won't free it and
-+     * therefore the blockers on the intermediate nodes remain. This would
-+     * cause bdrv_set_backing_hd() to fail. */
-+    block_job_remove_all_bdrv(job);
-+
-     block_job_completed(&s->common, ret);
-     g_free(data);
---
-.8.3.1

-[Qemu-devel] [PULL 02/61] qemu-iotests: Allow starting new qemu after cleanup
+Deleted patch
-After _cleanup_qemu(), test cases should be able to start the next qemu
-process and call _cleanup_qemu() for that one as well. For this to work
-cleanly, we need to improve the cleanup so that the second invocation
-doesn't try to kill the qemu instances from the first invocation a
-second time (which would result in error messages).
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Eric Blake <eblake@redhat.com>
-Reviewed-by: Max Reitz <mreitz@redhat.com>
----
- tests/qemu-iotests/common.qemu | 3 +++
-file changed, 3 insertions(+)
-diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
-index XXXXXXX..XXXXXXX 100644
---- a/tests/qemu-iotests/common.qemu
-+++ b/tests/qemu-iotests/common.qemu
-@@ -XXX,XX +XXX,XX @@ function _cleanup_qemu()
-         rm -f "${QEMU_FIFO_IN}_${i}" "${QEMU_FIFO_OUT}_${i}"
-         eval "exec ${QEMU_IN[$i]}<&-"   # close file descriptors
-         eval "exec ${QEMU_OUT[$i]}<&-"
-+
-+        unset QEMU_IN[$i]
-+        unset QEMU_OUT[$i]
-     done
- }
---
-.8.3.1

-[Qemu-devel] [PULL 19/61] qcow2: Make perform_cow() call do_perform_cow() twice
+[PULL 01/10] coroutine: Rename qemu_coroutine_inc/dec_pool_size()
-From: Alberto Garcia <berto@igalia.com>
+It's true that these functions currently affect the batch size in which
 coroutines are reused (i.e. moved from the global release pool to the
 allocation pool of a specific thread), but this is a bug and will be
 fixed in a separate patch.
-Instead of calling perform_cow() twice with a different COW region
+In fact, the comment in the header file already just promises that it
-each time, call it just once and make perform_cow() handle both
+influences the pool size, so reflect this in the name of the functions.
-regions.
+As a nice side effect, the shorter function name makes some line
 wrapping unnecessary.
-This patch simply moves code around. The next one will do the actual
+Cc: qemu-stable@nongnu.org
-reordering of the COW operations.
+Signed-off-by: Kevin Wolf <kwolf@redhat.com>
+Message-Id: <20220510151020.105528-2-kwolf@redhat.com>
 Signed-off-by: Alberto Garcia <berto@igalia.com>
 Reviewed-by: Eric Blake <eblake@redhat.com>
 Reviewed-by: Kevin Wolf <kwolf@redhat.com>
 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
 ---
- block/qcow2-cluster.c | 36 ++++++++++++++++++++++--------------
+ include/qemu/coroutine.h | 6 +++---
-file changed, 22 insertions(+), 14 deletions(-)
+ hw/block/virtio-blk.c    | 6 ++----
  util/qemu-coroutine.c    | 4 ++--
 files changed, 7 insertions(+), 9 deletions(-)
-diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
+diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
 index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2-cluster.c
+--- a/include/qemu/coroutine.h
-+++ b/block/qcow2-cluster.c
++++ b/include/qemu/coroutine.h
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
+@@ -XXX,XX +XXX,XX @@ void coroutine_fn yield_until_fd_readable(int fd);
-     struct iovec iov;
+ /**
-     int ret;
+  * Increase coroutine pool size
+  */
-+    if (bytes == 0) {
+-void qemu_coroutine_increase_pool_batch_size(unsigned int additional_pool_size);
-+        return 0;
++void qemu_coroutine_inc_pool_size(unsigned int additional_pool_size);
-+    }
-+
+ /**
-     iov.iov_len = bytes;
+- * Devcrease coroutine pool size
-     iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
++ * Decrease coroutine pool size
-     if (iov.iov_base == NULL) {
+  */
-@@ -XXX,XX +XXX,XX @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
+-void qemu_coroutine_decrease_pool_batch_size(unsigned int additional_pool_size);
-     return cluster_offset;
++void qemu_coroutine_dec_pool_size(unsigned int additional_pool_size);
  #include "qemu/lockable.h"
 diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
 index XXXXXXX..XXXXXXX 100644
 --- a/hw/block/virtio-blk.c
 +++ b/hw/block/virtio-blk.c
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
      for (i = 0; i < conf->num_queues; i++) {
          virtio_add_queue(vdev, conf->queue_size, virtio_blk_handle_output);
      }
 -    qemu_coroutine_increase_pool_batch_size(conf->num_queues * conf->queue_size
 -                                            / 2);
 +    qemu_coroutine_inc_pool_size(conf->num_queues * conf->queue_size / 2);
      virtio_blk_data_plane_create(vdev, conf, &s->dataplane, &err);
      if (err != NULL) {
          error_propagate(errp, err);
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_unrealize(DeviceState *dev)
      for (i = 0; i < conf->num_queues; i++) {
          virtio_del_queue(vdev, i);
      }
 -    qemu_coroutine_decrease_pool_batch_size(conf->num_queues * conf->queue_size
 -                                            / 2);
 +    qemu_coroutine_dec_pool_size(conf->num_queues * conf->queue_size / 2);
      qemu_del_vm_change_state_handler(s->change);
      blockdev_mark_auto_del(s->blk);
      virtio_cleanup(vdev);
 diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
 index XXXXXXX..XXXXXXX 100644
 --- a/util/qemu-coroutine.c
 +++ b/util/qemu-coroutine.c
@@ -XXX,XX +XXX,XX @@ AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co)
      return co->ctx;
  }
--static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, Qcow2COWRegion *r)
+-void qemu_coroutine_increase_pool_batch_size(unsigned int additional_pool_size)
-+static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
++void qemu_coroutine_inc_pool_size(unsigned int additional_pool_size)
  {
-     BDRVQcow2State *s = bs->opaque;
+     qatomic_add(&pool_batch_size, additional_pool_size);
 +    Qcow2COWRegion *start = &m->cow_start;
 +    Qcow2COWRegion *end = &m->cow_end;
      int ret;
 -    if (r->nb_bytes == 0) {
 +    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
          return 0;
      }
      qemu_co_mutex_unlock(&s->lock);
 -    ret = do_perform_cow(bs, m->offset, m->alloc_offset, r->offset, r->nb_bytes);
 -    qemu_co_mutex_lock(&s->lock);
 -
 +    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
 +                         start->offset, start->nb_bytes);
      if (ret < 0) {
 -        return ret;
 +        goto fail;
      }
 +    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
 +                         end->offset, end->nb_bytes);
 +
 +fail:
 +    qemu_co_mutex_lock(&s->lock);
 +
      /*
       * Before we update the L2 table to actually point to the new cluster, we
       * need to be sure that the refcounts have been increased and COW was
       * handled.
       */
 -    qcow2_cache_depends_on_flush(s->l2_table_cache);
 +    if (ret == 0) {
 +        qcow2_cache_depends_on_flush(s->l2_table_cache);
 +    }
 -    return 0;
 +    return ret;
  }
- int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
+-void qemu_coroutine_decrease_pool_batch_size(unsigned int removing_pool_size)
-@@ -XXX,XX +XXX,XX @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
++void qemu_coroutine_dec_pool_size(unsigned int removing_pool_size)
-     }
+ {
+     qatomic_sub(&pool_batch_size, removing_pool_size);
-     /* copy content of unmodified sectors */
+ }
 -    ret = perform_cow(bs, m, &m->cow_start);
 -    if (ret < 0) {
 -        goto err;
 -    }
 -
 -    ret = perform_cow(bs, m, &m->cow_end);
 +    ret = perform_cow(bs, m);
      if (ret < 0) {
          goto err;
      }
 --
-.8.3.1
+.35.3

-[Qemu-devel] [PULL 59/61] blkverify: Catch bs->exact_filename overflow
+[PULL 02/10] coroutine: Revert to constant batch size
-From: Max Reitz <mreitz@redhat.com>
+Commit 4c41c69e changed the way the coroutine pool is sized because for
 virtio-blk devices with a large queue size and heavy I/O, it was just
 too small and caused coroutines to be deleted and reallocated soon
 afterwards. The change made the size dynamic based on the number of
 queues and the queue size of virtio-blk devices.
-The bs->exact_filename field may not be sufficient to store the full
+There are two important numbers here: Slightly simplified, when a
-blkverify node filename. In this case, we should not generate a filename
+coroutine terminates, it is generally stored in the global release pool
-at all instead of an unusable one.
+up to a certain pool size, and if the pool is full, it is freed.
 Conversely, when allocating a new coroutine, the coroutines in the
 release pool are reused if the pool already has reached a certain
 minimum size (the batch size), otherwise we allocate new coroutines.
 The problem after commit 4c41c69e is that it not only increases the
 maximum pool size (which is the intended effect), but also the batch
 size for reusing coroutines (which is a bug). It means that in cases
 with many devices and/or many queues (the number of queues defaults to
 the number of vcpus for virtio-blk-pci), many thousand coroutines could be
 sitting in the release pool without being reused.
 This is not only a waste of memory and allocations, but it actually
 makes the QEMU process likely to hit the vm.max_map_count limit on Linux
 because each coroutine requires two mappings (its stack and the guard
 page for the stack), causing it to abort() in qemu_alloc_stack() because
 when the limit is hit, mprotect() starts to fail with ENOMEM.
 In order to fix the problem, change the batch size back to 64 to avoid
 uselessly accumulating coroutines in the release pool, but keep the
 dynamic maximum pool size so that coroutines aren't freed too early
 in heavy I/O scenarios.
 Note that this fix doesn't strictly make it impossible to hit the limit,
 but this would only happen if most of the coroutines are actually in use
 at the same time, not just sitting in a pool. This is the same behaviour
 as we already had before commit 4c41c69e. Fully preventing this would
 require allowing qemu_coroutine_create() to return an error, but it
 doesn't seem to be a scenario that people hit in practice.
 Cc: qemu-stable@nongnu.org
-Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
+Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2079938
-Signed-off-by: Max Reitz <mreitz@redhat.com>
+Fixes: 4c41c69e05fe28c0f95f8abd2ebf407e95a4f04b
-Message-id: 20170613172006.19685-3-mreitz@redhat.com
+Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Alberto Garcia <berto@igalia.com>
+Message-Id: <20220510151020.105528-3-kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
+Tested-by: Hiroki Narukawa <hnarukaw@yahoo-corp.jp>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
+Signed-off-by: Kevin Wolf <kwolf@redhat.com>
 ---
- block/blkverify.c | 12 ++++++++----
+ util/qemu-coroutine.c | 22 ++++++++++++++--------
-file changed, 8 insertions(+), 4 deletions(-)
+file changed, 14 insertions(+), 8 deletions(-)
-diff --git a/block/blkverify.c b/block/blkverify.c
+diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
 index XXXXXXX..XXXXXXX 100644
---- a/block/blkverify.c
+--- a/util/qemu-coroutine.c
-+++ b/block/blkverify.c
++++ b/util/qemu-coroutine.c
-@@ -XXX,XX +XXX,XX @@ static void blkverify_refresh_filename(BlockDriverState *bs, QDict *options)
+@@ -XXX,XX +XXX,XX @@
-     if (bs->file->bs->exact_filename[0]
+ #include "qemu/coroutine-tls.h"
-         && s->test_file->bs->exact_filename[0])
+ #include "block/aio.h"
-     {
--        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
+-/** Initial batch size is 64, and is increased on demand */
--                 "blkverify:%s:%s",
++/**
--                 bs->file->bs->exact_filename,
++ * The minimal batch size is always 64, coroutines from the release_pool are
--                 s->test_file->bs->exact_filename);
++ * reused as soon as there are 64 coroutines in it. The maximum pool size starts
-+        int ret = snprintf(bs->exact_filename, sizeof(bs->exact_filename),
++ * with 64 and is increased on demand so that coroutines are not deleted even if
-+                           "blkverify:%s:%s",
++ * they are not immediately reused.
-+                           bs->file->bs->exact_filename,
++ */
-+                           s->test_file->bs->exact_filename);
+ enum {
-+        if (ret >= sizeof(bs->exact_filename)) {
+-    POOL_INITIAL_BATCH_SIZE = 64,
-+            /* An overflow makes the filename unusable, so do not report any */
++    POOL_MIN_BATCH_SIZE = 64,
-+            bs->exact_filename[0] = 0;
++    POOL_INITIAL_MAX_SIZE = 64,
-+        }
+ };
-     }
  /** Free list to speed up creation */
  static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
 -static unsigned int pool_batch_size = POOL_INITIAL_BATCH_SIZE;
 +static unsigned int pool_max_size = POOL_INITIAL_MAX_SIZE;
  static unsigned int release_pool_size;
  typedef QSLIST_HEAD(, Coroutine) CoroutineQSList;
@@ -XXX,XX +XXX,XX @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque)
          co = QSLIST_FIRST(alloc_pool);
          if (!co) {
 -            if (release_pool_size > qatomic_read(&pool_batch_size)) {
 +            if (release_pool_size > POOL_MIN_BATCH_SIZE) {
                  /* Slow path; a good place to register the destructor, too.  */
                  Notifier *notifier = get_ptr_coroutine_pool_cleanup_notifier();
                  if (!notifier->notify) {
@@ -XXX,XX +XXX,XX @@ static void coroutine_delete(Coroutine *co)
      co->caller = NULL;
      if (CONFIG_COROUTINE_POOL) {
 -        if (release_pool_size < qatomic_read(&pool_batch_size) * 2) {
 +        if (release_pool_size < qatomic_read(&pool_max_size) * 2) {
              QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
              qatomic_inc(&release_pool_size);
              return;
          }
 -        if (get_alloc_pool_size() < qatomic_read(&pool_batch_size)) {
 +        if (get_alloc_pool_size() < qatomic_read(&pool_max_size)) {
              QSLIST_INSERT_HEAD(get_ptr_alloc_pool(), co, pool_next);
              set_alloc_pool_size(get_alloc_pool_size() + 1);
              return;
@@ -XXX,XX +XXX,XX @@ AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co)
  void qemu_coroutine_inc_pool_size(unsigned int additional_pool_size)
  {
 -    qatomic_add(&pool_batch_size, additional_pool_size);
 +    qatomic_add(&pool_max_size, additional_pool_size);
  }
+ void qemu_coroutine_dec_pool_size(unsigned int removing_pool_size)
+ {
+-    qatomic_sub(&pool_batch_size, removing_pool_size);
++    qatomic_sub(&pool_max_size, removing_pool_size);
+ }
 --
-.8.3.1
+.35.3
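
The sizing rule described in patch 02 can be sketched outside QEMU. This is a toy, single-threaded Python model, not QEMU code: the class, method and function names are hypothetical. Only the constants and the three size comparisons are taken from the diff above. Moving the whole release pool into one allocation pool on reuse is a simplifying assumption, and atomics and per-thread pools are left out.

```python
POOL_MIN_BATCH_SIZE = 64      # constant reuse threshold introduced by the patch
POOL_INITIAL_MAX_SIZE = 64    # starting value of the dynamic maximum pool size


class ReleasePoolModel:
    """Hypothetical model of the release pool plus one thread's alloc pool."""

    def __init__(self, dynamic_batch):
        # dynamic_batch=True imitates the behaviour before the fix, where
        # the reuse threshold grew together with the maximum pool size.
        self.dynamic_batch = dynamic_batch
        self.pool_max_size = POOL_INITIAL_MAX_SIZE
        self.release_pool_size = 0
        self.alloc_pool_size = 0

    def inc_pool_size(self, additional):
        # Counterpart of qemu_coroutine_inc_pool_size() in the diff.
        self.pool_max_size += additional

    def create(self):
        batch = self.pool_max_size if self.dynamic_batch else POOL_MIN_BATCH_SIZE
        if self.alloc_pool_size == 0 and self.release_pool_size > batch:
            # Assumption: the whole release pool moves to the alloc pool.
            self.alloc_pool_size = self.release_pool_size
            self.release_pool_size = 0
        if self.alloc_pool_size > 0:
            self.alloc_pool_size -= 1
            return "reused"
        return "new"

    def delete(self):
        if self.release_pool_size < self.pool_max_size * 2:
            self.release_pool_size += 1
            return "pooled"
        if self.alloc_pool_size < self.pool_max_size:
            self.alloc_pool_size += 1
            return "pooled"
        return "freed"


def burst(dynamic_batch, count=500, extra=1024):
    """Terminate `count` coroutines, then create `count` new ones.

    Returns (fresh allocations, coroutines left idle in the release pool).
    """
    pool = ReleasePoolModel(dynamic_batch)
    pool.inc_pool_size(extra)
    for _ in range(count):
        pool.delete()
    results = [pool.create() for _ in range(count)]
    return results.count("new"), pool.release_pool_size
```

In this model, `burst(False)` reuses every pooled coroutine, while `burst(True)` allocates 500 new ones and leaves the 500 pooled ones idle. That idle accumulation is what the commit message describes.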

-[Qemu-devel] [PULL 07/61] migration: use bdrv_drain_all_begin/end() instead bdrv_drain_all()
+[PULL 03/10] iotests/testrunner: Flush after run_test()
-From: Stefan Hajnoczi <stefanha@redhat.com>
+From: Hanna Reitz <hreitz@redhat.com>
-blk/bdrv_drain_all() only takes effect for a single instant and then
+When stdout is not a terminal, the buffer may not be flushed at each end
-resumes block jobs, guest devices, and other external clients like the
+of line, so we should flush after each test is done.  This is especially
-NBD server.  This can be handy when performing a synchronous drain
+apparent when run by check-block, in two ways:
 before terminating the program, for example.
-Monitor commands usually need to quiesce I/O across an entire code
+First, when running make check-block -jX with X > 1, progress indication
-region so blk/bdrv_drain_all() is not suitable.  They must use
+was missing, even though testrunner.py does theoretically print each
-bdrv_drain_all_begin/end() to mark the region.  This prevents new I/O
+test's status once it has been run, even in multi-processing mode.
-requests from slipping in or worse - block jobs completing and modifying
+Flushing after each test restores this progress indication.
 the graph.
-I audited other blk/bdrv_drain_all() callers but did not find anything
+Second, sometimes make check-block failed altogether, with an error
-that needs a similar fix.  This patch fixes the savevm/loadvm commands.
+message that "too few tests [were] run".  I presume that's because one
-Although I haven't encountered a real-world issue, this makes the code
+worker process in the job pool did not get to flush its stdout before
-safer.
+the main process exited, and so meson did not get to see that worker's
 test results.  In any case, by flushing at the end of run_test(), the
 problem has disappeared for me.
-Suggested-by: Kevin Wolf <kwolf@redhat.com>
+Signed-off-by: Hanna Reitz <hreitz@redhat.com>
-Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
+Message-Id: <20220506134215.10086-1-hreitz@redhat.com>
 Reviewed-by: Eric Blake <eblake@redhat.com>
 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
 ---
- migration/savevm.c | 18 +++++++++++++++---
+ tests/qemu-iotests/testrunner.py | 1 +
-file changed, 15 insertions(+), 3 deletions(-)
+file changed, 1 insertion(+)
-diff --git a/migration/savevm.c b/migration/savevm.c
+diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
 index XXXXXXX..XXXXXXX 100644
---- a/migration/savevm.c
+--- a/tests/qemu-iotests/testrunner.py
-+++ b/migration/savevm.c
++++ b/tests/qemu-iotests/testrunner.py
-@@ -XXX,XX +XXX,XX @@ int save_snapshot(const char *name, Error **errp)
+@@ -XXX,XX +XXX,XX @@ def run_test(self, test: str,
-     }
+             else:
-     vm_stop(RUN_STATE_SAVE_VM);
+                 print(res.casenotrun)
-+    bdrv_drain_all_begin();
++        sys.stdout.flush()
-+
+         return res
-     aio_context_acquire(aio_context);
+     def run_tests(self, tests: List[str], jobs: int = 1) -> bool:
      memset(sn, 0, sizeof(*sn));
@@ -XXX,XX +XXX,XX @@ int save_snapshot(const char *name, Error **errp)
      if (aio_context) {
          aio_context_release(aio_context);
      }
 +
 +    bdrv_drain_all_end();
 +
      if (saved_vm_running) {
          vm_start();
      }
@@ -XXX,XX +XXX,XX @@ int load_snapshot(const char *name, Error **errp)
      }
      /* Flush all IO requests so they don't interfere with the new state.  */
 -    bdrv_drain_all();
 +    bdrv_drain_all_begin();
      ret = bdrv_all_goto_snapshot(name, &bs);
      if (ret < 0) {
          error_setg(errp, "Error %d while activating snapshot '%s' on '%s'",
                       ret, name, bdrv_get_device_name(bs));
 -        return ret;
 +        goto err_drain;
      }
      /* restore the VM state */
      f = qemu_fopen_bdrv(bs_vm_state, 0);
      if (!f) {
          error_setg(errp, "Could not open VM state file");
 -        return -EINVAL;
 +        ret = -EINVAL;
 +        goto err_drain;
      }
      qemu_system_reset(SHUTDOWN_CAUSE_NONE);
@@ -XXX,XX +XXX,XX @@ int load_snapshot(const char *name, Error **errp)
      ret = qemu_loadvm_state(f);
      aio_context_release(aio_context);
 +    bdrv_drain_all_end();
 +
      migration_incoming_state_destroy();
      if (ret < 0) {
          error_setg(errp, "Error %d while loading VM state", ret);
@@ -XXX,XX +XXX,XX @@ int load_snapshot(const char *name, Error **errp)
      }
      return 0;
 +
 +err_drain:
 +    bdrv_drain_all_end();
 +    return ret;
  }
  void vmstate_register_ram(MemoryRegion *mr, DeviceState *dev)
 --
-.8.3.1
+.35.3
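
The buffering effect that patch 03 works around can be reproduced in isolation. The sketch below is not part of testrunner.py, and `run_child` is a hypothetical helper. It starts a child Python process whose stdout is a pipe, and uses `os._exit()` as a stand-in for a worker that ends without the normal interpreter shutdown. How pool workers actually terminate is an assumption here, as it is in the commit message.

```python
import os
import subprocess
import sys

# Child program: print one result line, optionally flush, then exit
# abruptly so that buffered stdout data is never written out.
CHILD = """
import os, sys
print("ok qcow2 011")
{flush}
os._exit(0)
"""


def run_child(flush):
    """Run the child with stdout connected to a pipe and return its output."""
    code = CHILD.format(flush="sys.stdout.flush()" if flush else "pass")
    # Drop PYTHONUNBUFFERED so the child uses default block buffering on pipes.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    proc = subprocess.run([sys.executable, "-c", code],
                          stdout=subprocess.PIPE, text=True,
                          env=env, check=True)
    return proc.stdout


if __name__ == "__main__":
    print(repr(run_child(flush=True)))
    print(repr(run_child(flush=False)))
```

With the explicit flush the parent receives the result line. Without it the line stays in the child's buffer and is lost, which would match the "too few tests run" symptom.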

-[Qemu-devel] [PULL 11/61] virtio-pci: use ioeventfd even when KVM is disabled
+[PULL 04/10] tests/qemu-iotests: print intent to run a test in TAP mode
-From: Stefan Hajnoczi <stefanha@redhat.com>
+From: Daniel P. Berrangé <berrange@redhat.com>
-Old kvm.ko versions only supported a tiny number of ioeventfds so
+When running I/O tests using TAP output mode, we get a single TAP test
-virtio-pci avoids ioeventfds when kvm_has_many_ioeventfds() returns 0.
+with a sub-test reported for each I/O test that is run. The output looks
 something like this:
-Do not check kvm_has_many_ioeventfds() when KVM is disabled since it
+..123
-always returns 0.  Since commit 8c56c1a592b5092d91da8d8943c17777d6462a6f
+ ok qcow2 011
-("memory: emulate ioeventfd") it has been possible to use ioeventfds in
+ ok qcow2 012
-qtest or TCG mode.
+ ok qcow2 013
  ok qcow2 217
  ...
-This patch makes -device virtio-blk-pci,iothread=iothread0 work even
+If everything runs or fails normally this is fine, but periodically we
-when KVM is disabled.
+have been seeing the test harness abort early before all 123 tests have
 been run, just leaving a fairly useless message like
-I have tested that virtio-blk-pci works under TCG both with and without
+  TAP parsing error: Too few tests run (expected 123, got 107)
 iothread.
-Cc: Michael S. Tsirkin <mst@redhat.com>
+we have no idea which tests were running at the time the test harness
-Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
+abruptly exited. This change causes us to print a message about our
-Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
+intent to run each test, so we have a record of what is active at the
 time the harness exits abnormally.
 ..123
  # running qcow2 011
  ok qcow2 011
  # running qcow2 012
  ok qcow2 012
  # running qcow2 013
  ok qcow2 013
  # running qcow2 217
  ok qcow2 217
  ...
 Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
 Message-Id: <20220509124134.867431-2-berrange@redhat.com>
 Reviewed-by: Thomas Huth <thuth@redhat.com>
 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
 ---
- hw/virtio/virtio-pci.c | 2 +-
+ tests/qemu-iotests/testrunner.py | 3 +++
- 1 file changed, 1 insertion(+), 1 deletion(-)
+ 1 file changed, 3 insertions(+)
-diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
+diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
 index XXXXXXX..XXXXXXX 100644
---- a/hw/virtio/virtio-pci.c
+--- a/tests/qemu-iotests/testrunner.py
-+++ b/hw/virtio/virtio-pci.c
++++ b/tests/qemu-iotests/testrunner.py
-@@ -XXX,XX +XXX,XX @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ def run_test(self, test: str,
-     bool pcie_port = pci_bus_is_express(pci_dev->bus) &&
+                                      starttime=start,
-                      !pci_bus_is_root(pci_dev->bus);
+                                      lasttime=last_el,
+                                      end = '\n' if mp else '\r')
--    if (!kvm_has_many_ioeventfds()) {
++        else:
-+    if (kvm_enabled() && !kvm_has_many_ioeventfds()) {
++            testname = os.path.basename(test)
-         proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
++            print(f'# running {self.env.imgfmt} {testname}')
-     }
          res = self.do_run_test(test, mp)
 --
-.8.3.1
+.35.3
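
The effect of the "# running" lines in this patch can be sketched outside the test runner. The snippet below is a minimal illustration, not the code in testrunner.py: `tap_lines` and `last_unfinished` are hypothetical helpers, every test is assumed to pass, and tests are assumed to run one at a time. It shows how a truncated TAP log still names the test that was active when the harness died.

```python
import os
from typing import Iterable, List, Optional


def tap_lines(imgfmt: str, tests: Iterable[str]) -> List[str]:
    """Build a TAP stream that announces each test before reporting it.

    Hypothetical helper: every test is reported as passing.
    """
    tests = list(tests)
    lines = [f'1..{len(tests)}']
    for test in tests:
        testname = os.path.basename(test)
        # A TAP comment line: parsers skip it, but it reaches the log
        # before the test result does.
        lines.append(f'# running {imgfmt} {testname}')
        lines.append(f'ok {imgfmt} {testname}')
    return lines


def last_unfinished(log: Iterable[str]) -> Optional[str]:
    """Return the test that was announced but never reported, if any.

    Assumes tests run sequentially, so at most one test is pending.
    """
    pending = None
    for line in log:
        line = line.strip()
        if line.startswith('# running '):
            pending = line[len('# running '):]
        elif line.startswith(('ok ', 'not ok ')):
            pending = None
    return pending
```

Feeding `last_unfinished` a log that stops right after a "# running" line returns that test's name; a complete log returns None.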

-[Qemu-devel] [PULL 15/61] qemu-iotests: 068: test iothread mode
+[PULL 05/10] .gitlab-ci.d: export meson testlog.txt as an artifact
-From: Stefan Hajnoczi <stefanha@redhat.com>
+From: Daniel P. Berrangé <berrange@redhat.com>
-Perform the savevm/loadvm test with both iothread on and off.  This
+When running 'make check' we only get a summary of progress on the
-covers the recently found savevm/loadvm hang when iothread is enabled.
+console. Fortunately meson/ninja have saved the raw test output to a
 logfile. Exposing this log will make it easier to debug failures that
 happen in CI.
-Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
+Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
 Message-Id: <20220509124134.867431-3-berrange@redhat.com>
 Reviewed-by: Thomas Huth <thuth@redhat.com>
 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
 ---
- tests/qemu-iotests/068     | 23 ++++++++++++++---------
+ .gitlab-ci.d/buildtest-template.yml | 12 ++++++++++--
- tests/qemu-iotests/068.out | 11 ++++++++++-
+ 1 file changed, 10 insertions(+), 2 deletions(-)
  2 files changed, 24 insertions(+), 10 deletions(-)
-diff --git a/tests/qemu-iotests/068 b/tests/qemu-iotests/068
+diff --git a/.gitlab-ci.d/buildtest-template.yml b/.gitlab-ci.d/buildtest-template.yml
-index XXXXXXX..XXXXXXX 100755
+index XXXXXXX..XXXXXXX 100644
---- a/tests/qemu-iotests/068
+--- a/.gitlab-ci.d/buildtest-template.yml
-+++ b/tests/qemu-iotests/068
++++ b/.gitlab-ci.d/buildtest-template.yml
-@@ -XXX,XX +XXX,XX @@ _supported_os Linux
+@@ -XXX,XX +XXX,XX @@
- IMGOPTS="compat=1.1"
+         make -j"$JOBS" $MAKE_CHECK_ARGS ;
- IMG_SIZE=128K
+       fi
--echo
+-.native_test_job_template:
--echo "=== Saving and reloading a VM state to/from a qcow2 image ==="
++.common_test_job_template:
--echo
+   stage: test
--_make_test_img $IMG_SIZE
+   image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest
--
+   script:
- case "$QEMU_DEFAULT_MACHINE" in
+@@ -XXX,XX +XXX,XX @@
-   s390-ccw-virtio)
+     # Avoid recompiling by hiding ninja with NINJA=":"
-       platform_parm="-no-shutdown"
+     - make NINJA=":" $MAKE_CHECK_ARGS
-@@ -XXX,XX +XXX,XX @@ _qemu()
-     _filter_qemu | _filter_hmp
++.native_test_job_template:
- }
++  extends: .common_test_job_template
++  artifacts:
--# Give qemu some time to boot before saving the VM state
++    name: "$CI_JOB_NAME-$CI_COMMIT_REF_SLUG"
--bash -c 'sleep 1; echo -e "savevm 0\nquit"' | _qemu
++    expire_in: 7 days
--# Now try to continue from that VM state (this should just work)
++    paths:
--echo quit | _qemu -loadvm 0
++      - build/meson-logs/testlog.txt
 +for extra_args in \
 +    "" \
 +    "-object iothread,id=iothread0 -set device.hba0.iothread=iothread0"; do
 +    echo
 +    echo "=== Saving and reloading a VM state to/from a qcow2 image ($extra_args) ==="
 +    echo
 +
-+    _make_test_img $IMG_SIZE
+ .avocado_test_job_template:
-+
+-  extends: .native_test_job_template
-+    # Give qemu some time to boot before saving the VM state
++  extends: .common_test_job_template
-+    bash -c 'sleep 1; echo -e "savevm 0\nquit"' | _qemu $extra_args
+   cache:
-+    # Now try to continue from that VM state (this should just work)
+     key: "${CI_JOB_NAME}-cache"
-+    echo quit | _qemu $extra_args -loadvm 0
+     paths:
 +done
  # success, all done
  echo "*** done"
 diff --git a/tests/qemu-iotests/068.out b/tests/qemu-iotests/068.out
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/qemu-iotests/068.out
 +++ b/tests/qemu-iotests/068.out
@@ -XXX,XX +XXX,XX @@
  QA output created by 068
 -=== Saving and reloading a VM state to/from a qcow2 image ===
 +=== Saving and reloading a VM state to/from a qcow2 image () ===
 +
 +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
 +QEMU X.Y.Z monitor - type 'help' for more information
 +(qemu) savevm 0
 +(qemu) quit
 +QEMU X.Y.Z monitor - type 'help' for more information
 +(qemu) quit
 +
 +=== Saving and reloading a VM state to/from a qcow2 image (-object iothread,id=iothread0 -set device.hba0.iothread=iothread0) ===
  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
  QEMU X.Y.Z monitor - type 'help' for more information
 --
-.8.3.1
+.35.3

-[Qemu-devel] [PULL 60/61] block: Do not strcmp() with NULL uri->scheme
+[PULL 06/10] hw/block/fdc: Prevent end-of-track overrun (CVE-2021-3507)
-From: Max Reitz <mreitz@redhat.com>
+From: Philippe Mathieu-Daudé <philmd@redhat.com>
-uri_parse(...)->scheme may be NULL. In fact, probably every field may be
+Per the 82078 datasheet, if the end-of-track (EOT byte in
-NULL, and the callers do test this for all of the other fields but not
+the FIFO) is more than the number of sectors per side, the
-for scheme (except for block/gluster.c; block/vxhs.c does not access
+command is terminated unsuccessfully:
 that field at all).
-We can easily fix this by using g_strcmp0() instead of strcmp().
+* 5.2.5 DATA TRANSFER TERMINATION
   The 82078 supports terminal count explicitly through
   the TC pin and implicitly through the underrun/over-
   run and end-of-track (EOT) functions. For full sector
   transfers, the EOT parameter can define the last
   sector to be transferred in a single or multisector
   transfer. If the last sector to be transferred is a par-
   tial sector, the host can stop transferring the data in
   mid-sector, and the 82078 will continue to complete
   the sector as if a hardware TC was received. The
   only difference between these implicit functions and
   TC is that they return "abnormal termination" result
   status. Such status indications can be ignored if they
   were expected.
 * 6.1.3 READ TRACK
   This command terminates when the EOT specified
   number of sectors have been read. If the 82078
   does not find an ID Address Mark on the diskette
   after the second occurrence of a pulse on the
   INDX# pin, then it sets the IC code in Status Regis-
   ter 0 to "01" (Abnormal termination), sets the MA bit
   in Status Register 1 to "1", and terminates the com-
   mand.
 * 6.1.6 VERIFY
   Refer to Table 6-6 and Table 6-7 for information
   concerning the values of MT and EC versus SC and
   EOT value.
 * Table 6-6. Result Phase Table
 * Table 6-7. Verify Command Result Phase Table
 Fix by aborting the transfer when EOT > # Sectors Per Side.
 Cc: qemu-stable@nongnu.org
-Signed-off-by: Max Reitz <mreitz@redhat.com>
+Cc: Hervé Poussineau <hpoussin@reactos.org>
-Message-id: 20170613205726.13544-1-mreitz@redhat.com
+Fixes: baca51faff0 ("floppy driver: disk geometry auto detect")
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
+Reported-by: Alexander Bulekov <alxndr@bu.edu>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
+Resolves: https://gitlab.com/qemu-project/qemu/-/issues/339
 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Message-Id: <20211118115733.4038610-2-philmd@redhat.com>
 Reviewed-by: Hanna Reitz <hreitz@redhat.com>
 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
 ---
- block/nbd.c      | 6 +++---
+ hw/block/fdc.c | 8 ++++++++
- block/nfs.c      | 2 +-
+ 1 file changed, 8 insertions(+)
  block/sheepdog.c | 6 +++---
  block/ssh.c      | 2 +-
  4 files changed, 8 insertions(+), 8 deletions(-)
-diff --git a/block/nbd.c b/block/nbd.c
+diff --git a/hw/block/fdc.c b/hw/block/fdc.c
 index XXXXXXX..XXXXXXX 100644
---- a/block/nbd.c
+--- a/hw/block/fdc.c
-+++ b/block/nbd.c
++++ b/hw/block/fdc.c
-@@ -XXX,XX +XXX,XX @@ static int nbd_parse_uri(const char *filename, QDict *options)
+@@ -XXX,XX +XXX,XX @@ static void fdctrl_start_transfer(FDCtrl *fdctrl, int direction)
-     }
+         int tmp;
+         fdctrl->data_len = 128 << (fdctrl->fifo[5] > 7 ? 7 : fdctrl->fifo[5]);
-     /* transport */
+         tmp = (fdctrl->fifo[6] - ks + 1);
--    if (!strcmp(uri->scheme, "nbd")) {
++        if (tmp < 0) {
-+    if (!g_strcmp0(uri->scheme, "nbd")) {
++            FLOPPY_DPRINTF("invalid EOT: %d\n", tmp);
-         is_unix = false;
++            fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM, FD_SR1_MA, 0x00);
--    } else if (!strcmp(uri->scheme, "nbd+tcp")) {
++            fdctrl->fifo[3] = kt;
-+    } else if (!g_strcmp0(uri->scheme, "nbd+tcp")) {
++            fdctrl->fifo[4] = kh;
-         is_unix = false;
++            fdctrl->fifo[5] = ks;
--    } else if (!strcmp(uri->scheme, "nbd+unix")) {
++            return;
-+    } else if (!g_strcmp0(uri->scheme, "nbd+unix")) {
++        }
-         is_unix = true;
+         if (fdctrl->fifo[0] & 0x80)
-     } else {
+             tmp += fdctrl->fifo[6];
-         ret = -EINVAL;
+         fdctrl->data_len *= tmp;
 diff --git a/block/nfs.c b/block/nfs.c
 index XXXXXXX..XXXXXXX 100644
 --- a/block/nfs.c
 +++ b/block/nfs.c
@@ -XXX,XX +XXX,XX @@ static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
          error_setg(errp, "Invalid URI specified");
          goto out;
      }
 -    if (strcmp(uri->scheme, "nfs") != 0) {
 +    if (g_strcmp0(uri->scheme, "nfs") != 0) {
          error_setg(errp, "URI scheme must be 'nfs'");
          goto out;
      }
 diff --git a/block/sheepdog.c b/block/sheepdog.c
 index XXXXXXX..XXXXXXX 100644
 --- a/block/sheepdog.c
 +++ b/block/sheepdog.c
@@ -XXX,XX +XXX,XX @@ static void sd_parse_uri(SheepdogConfig *cfg, const char *filename,
      }
      /* transport */
 -    if (!strcmp(uri->scheme, "sheepdog")) {
 +    if (!g_strcmp0(uri->scheme, "sheepdog")) {
          is_unix = false;
 -    } else if (!strcmp(uri->scheme, "sheepdog+tcp")) {
 +    } else if (!g_strcmp0(uri->scheme, "sheepdog+tcp")) {
          is_unix = false;
 -    } else if (!strcmp(uri->scheme, "sheepdog+unix")) {
 +    } else if (!g_strcmp0(uri->scheme, "sheepdog+unix")) {
          is_unix = true;
      } else {
          error_setg(&err, "URI scheme must be 'sheepdog', 'sheepdog+tcp',"
 diff --git a/block/ssh.c b/block/ssh.c
 index XXXXXXX..XXXXXXX 100644
 --- a/block/ssh.c
 +++ b/block/ssh.c
@@ -XXX,XX +XXX,XX @@ static int parse_uri(const char *filename, QDict *options, Error **errp)
          return -EINVAL;
      }
 -    if (strcmp(uri->scheme, "ssh") != 0) {
 +    if (g_strcmp0(uri->scheme, "ssh") != 0) {
          error_setg(errp, "URI scheme must be 'ssh'");
          goto err;
      }
 --
-.8.3.1
+.35.3
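
The length arithmetic guarded by the fdc hunk above can be modelled in a few lines. This is a Python sketch of the C statements visible in the hunk, not QEMU code: `transfer_length` is a hypothetical name, `ks` is passed in because the hunk does not show where it is assigned, and returning None stands in for the call to fdctrl_stop_transfer().

```python
from typing import Optional, Sequence


def transfer_length(fifo: Sequence[int], ks: int) -> Optional[int]:
    """Model the byte-count computation shown in the fdc.c hunk.

    Returns the transfer length in bytes, or None when the request
    is rejected because the sector count would be negative.
    """
    # Sector size: 128 << N, with N clamped to 7 as in the C code.
    sector_size = 128 << min(fifo[5], 7)
    # fifo[6] is the EOT value; tmp is the sector count derived from it.
    tmp = fifo[6] - ks + 1
    if tmp < 0:
        # The patch stops the transfer here with abnormal termination
        # instead of letting a negative count reach the multiplication.
        return None
    if fifo[0] & 0x80:
        # Bit 7 of the command byte set: the C code adds fifo[6] again.
        tmp += fifo[6]
    return sector_size * tmp
```

For example, with fifo[5] = 2 (512-byte sectors), fifo[6] = 18 and ks = 1 the model yields 18 sectors' worth of data, while fifo[6] = 2 with ks = 5 is rejected.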

-[Qemu-devel] [PULL 20/61] qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()
+[PULL 07/10] tests/qtest/fdc-test: Add a regression test for CVE-2021-3507
-From: Alberto Garcia <berto@igalia.com>
+From: Philippe Mathieu-Daudé <philmd@redhat.com>
-This patch splits do_perform_cow() into three separate functions to
+Add the reproducer from https://gitlab.com/qemu-project/qemu/-/issues/339
 read, encrypt and write the COW regions.
-perform_cow() can now read both regions first, then encrypt them and
+Without the previous commit, when running 'make check-qtest-i386'
-finally write them to disk. The memory allocation is also done in
+with QEMU configured with '--enable-sanitizers' we get:
 this function now, using one single buffer large enough to hold both
 regions.
-Signed-off-by: Alberto Garcia <berto@igalia.com>
+  ==4028352==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619000062a00 at pc 0x5626d03c491a bp 0x7ffdb4199410 sp 0x7ffdb4198bc0
-Reviewed-by: Kevin Wolf <kwolf@redhat.com>
+  READ of size 786432 at 0x619000062a00 thread T0
       #0 0x5626d03c4919 in __asan_memcpy (qemu-system-i386+0x1e65919)
       #1 0x5626d1c023cc in flatview_write_continue softmmu/physmem.c:2787:13
       #2 0x5626d1bf0c0f in flatview_write softmmu/physmem.c:2822:14
       #3 0x5626d1bf0798 in address_space_write softmmu/physmem.c:2914:18
       #4 0x5626d1bf0f37 in address_space_rw softmmu/physmem.c:2924:16
       #5 0x5626d1bf14c8 in cpu_physical_memory_rw softmmu/physmem.c:2933:5
       #6 0x5626d0bd5649 in cpu_physical_memory_write include/exec/cpu-common.h:82:5
       #7 0x5626d0bd0a07 in i8257_dma_write_memory hw/dma/i8257.c:452:9
       #8 0x5626d09f825d in fdctrl_transfer_handler hw/block/fdc.c:1616:13
       #9 0x5626d0a048b4 in fdctrl_start_transfer hw/block/fdc.c:1539:13
       #10 0x5626d09f4c3e in fdctrl_write_data hw/block/fdc.c:2266:13
       #11 0x5626d09f22f7 in fdctrl_write hw/block/fdc.c:829:9
       #12 0x5626d1c20bc5 in portio_write softmmu/ioport.c:207:17
   0x619000062a00 is located 0 bytes to the right of 512-byte region [0x619000062800,0x619000062a00)
   allocated by thread T0 here:
       #0 0x5626d03c66ec in posix_memalign (qemu-system-i386+0x1e676ec)
       #1 0x5626d2b988d4 in qemu_try_memalign util/oslib-posix.c:210:11
       #2 0x5626d2b98b0c in qemu_memalign util/oslib-posix.c:226:27
       #3 0x5626d09fbaf0 in fdctrl_realize_common hw/block/fdc.c:2341:20
       #4 0x5626d0a150ed in isabus_fdc_realize hw/block/fdc-isa.c:113:5
       #5 0x5626d2367935 in device_set_realized hw/core/qdev.c:531:13
   SUMMARY: AddressSanitizer: heap-buffer-overflow (qemu-system-i386+0x1e65919) in __asan_memcpy
   Shadow bytes around the buggy address:
     0x0c32800044f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
     0x0c3280004500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x0c3280004510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x0c3280004520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     0x0c3280004530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   =>0x0c3280004540:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
     0x0c3280004550: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
     0x0c3280004560: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
     0x0c3280004570: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
     0x0c3280004580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
     0x0c3280004590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
   Shadow byte legend (one shadow byte represents 8 application bytes):
     Addressable:           00
     Heap left redzone:       fa
     Freed heap region:       fd
   ==4028352==ABORTING
 [ kwolf: Added snapshot=on to prevent write file lock failure ]
 Reported-by: Alexander Bulekov <alxndr@bu.edu>
 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
 Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
 ---
- block/qcow2-cluster.c | 117 +++++++++++++++++++++++++++++++++++++-------------
+ tests/qtest/fdc-test.c | 21 +++++++++++++++++++++
- 1 file changed, 87 insertions(+), 30 deletions(-)
+ 1 file changed, 21 insertions(+)
-diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
+diff --git a/tests/qtest/fdc-test.c b/tests/qtest/fdc-test.c
 index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2-cluster.c
+--- a/tests/qtest/fdc-test.c
-+++ b/block/qcow2-cluster.c
++++ b/tests/qtest/fdc-test.c
-@@ -XXX,XX +XXX,XX @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
+@@ -XXX,XX +XXX,XX @@ static void test_cve_2021_20196(void)
-     return 0;
+     qtest_quit(s);
  }
--static int coroutine_fn do_perform_cow(BlockDriverState *bs,
++static void test_cve_2021_3507(void)
--                                       uint64_t src_cluster_offset,
++{
--                                       uint64_t cluster_offset,
++    QTestState *s;
--                                       unsigned offset_in_cluster,
++
--                                       unsigned bytes)
++    s = qtest_initf("-nographic -m 32M -nodefaults "
-+static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
++                    "-drive file=%s,format=raw,if=floppy,snapshot=on",
-+                                            uint64_t src_cluster_offset,
++                    test_image);
-+                                            unsigned offset_in_cluster,
++    qtest_outl(s, 0x9, 0x0a0206);
-+                                            uint8_t *buffer,
++    qtest_outw(s, 0x3f4, 0x1600);
-+                                            unsigned bytes)
++    qtest_outw(s, 0x3f4, 0x0000);
- {
++    qtest_outw(s, 0x3f4, 0x0000);
--    BDRVQcow2State *s = bs->opaque;
++    qtest_outw(s, 0x3f4, 0x0000);
-     QEMUIOVector qiov;
++    qtest_outw(s, 0x3f4, 0x0200);
--    struct iovec iov;
++    qtest_outw(s, 0x3f4, 0x0200);
-+    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
++    qtest_outw(s, 0x3f4, 0x0000);
-     int ret;
++    qtest_outw(s, 0x3f4, 0x0000);
++    qtest_outw(s, 0x3f4, 0x0000);
-     if (bytes == 0) {
++    qtest_quit(s);
          return 0;
      }
 -    iov.iov_len = bytes;
 -    iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
 -    if (iov.iov_base == NULL) {
 -        return -ENOMEM;
 -    }
 -
      qemu_iovec_init_external(&qiov, &iov, 1);
      BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
      if (!bs->drv) {
 -        ret = -ENOMEDIUM;
 -        goto out;
 +        return -ENOMEDIUM;
      }
      /* Call .bdrv_co_readv() directly instead of using the public block-layer
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
      ret = bs->drv->bdrv_co_preadv(bs, src_cluster_offset + offset_in_cluster,
                                    bytes, &qiov, 0);
      if (ret < 0) {
 -        goto out;
 +        return ret;
      }
 -    if (bs->encrypted) {
 +    return 0;
 +}
 +
-+static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
+ int main(int argc, char **argv)
-+                                                uint64_t src_cluster_offset,
+ {
-+                                                unsigned offset_in_cluster,
+     int fd;
-+                                                uint8_t *buffer,
+@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
-+                                                unsigned bytes)
+     qtest_add_func("/fdc/read_no_dma_19", test_read_no_dma_19);
-+{
+     qtest_add_func("/fdc/fuzz-registers", fuzz_registers);
-+    if (bytes && bs->encrypted) {
+     qtest_add_func("/fdc/fuzz/cve_2021_20196", test_cve_2021_20196);
-+        BDRVQcow2State *s = bs->opaque;
++    qtest_add_func("/fdc/fuzz/cve_2021_3507", test_cve_2021_3507);
-         int64_t sector = (src_cluster_offset + offset_in_cluster)
-                          >> BDRV_SECTOR_BITS;
+     ret = g_test_run();
          assert(s->cipher);
          assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
          assert((bytes & ~BDRV_SECTOR_MASK) == 0);
 -        if (qcow2_encrypt_sectors(s, sector, iov.iov_base, iov.iov_base,
 +        if (qcow2_encrypt_sectors(s, sector, buffer, buffer,
                                    bytes >> BDRV_SECTOR_BITS, true, NULL) < 0) {
 -            ret = -EIO;
 -            goto out;
 +            return false;
          }
      }
 +    return true;
 +}
 +
 +static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
 +                                             uint64_t cluster_offset,
 +                                             unsigned offset_in_cluster,
 +                                             uint8_t *buffer,
 +                                             unsigned bytes)
 +{
 +    QEMUIOVector qiov;
 +    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
 +    int ret;
 +
 +    if (bytes == 0) {
 +        return 0;
 +    }
 +
 +    qemu_iovec_init_external(&qiov, &iov, 1);
      ret = qcow2_pre_write_overlap_check(bs, 0,
              cluster_offset + offset_in_cluster, bytes);
      if (ret < 0) {
 -        goto out;
 +        return ret;
      }
      BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
      ret = bdrv_co_pwritev(bs->file, cluster_offset + offset_in_cluster,
                            bytes, &qiov, 0);
      if (ret < 0) {
 -        goto out;
 +        return ret;
      }
 -    ret = 0;
 -out:
 -    qemu_vfree(iov.iov_base);
 -    return ret;
 +    return 0;
  }
@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
      BDRVQcow2State *s = bs->opaque;
      Qcow2COWRegion *start = &m->cow_start;
      Qcow2COWRegion *end = &m->cow_end;
 +    unsigned buffer_size;
 +    uint8_t *start_buffer, *end_buffer;
      int ret;
 +    assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
 +
      if (start->nb_bytes == 0 && end->nb_bytes == 0) {
          return 0;
      }
 +    /* Reserve a buffer large enough to store the data from both the
 +     * start and end COW regions. Add some padding in the middle if
 +     * necessary to make sure that the end region is optimally aligned */
 +    buffer_size = QEMU_ALIGN_UP(start->nb_bytes, bdrv_opt_mem_align(bs)) +
 +        end->nb_bytes;
 +    start_buffer = qemu_try_blockalign(bs, buffer_size);
 +    if (start_buffer == NULL) {
 +        return -ENOMEM;
 +    }
 +    /* The part of the buffer where the end region is located */
 +    end_buffer = start_buffer + buffer_size - end->nb_bytes;
 +
      qemu_co_mutex_unlock(&s->lock);
 -    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
 -                         start->offset, start->nb_bytes);
 +    /* First we read the existing data from both COW regions */
 +    ret = do_perform_cow_read(bs, m->offset, start->offset,
 +                              start_buffer, start->nb_bytes);
      if (ret < 0) {
          goto fail;
      }
 -    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
 -                         end->offset, end->nb_bytes);
 +    ret = do_perform_cow_read(bs, m->offset, end->offset,
 +                              end_buffer, end->nb_bytes);
 +    if (ret < 0) {
 +        goto fail;
 +    }
 +
 +    /* Encrypt the data if necessary before writing it */
 +    if (bs->encrypted) {
 +        if (!do_perform_cow_encrypt(bs, m->offset, start->offset,
 +                                    start_buffer, start->nb_bytes) ||
 +            !do_perform_cow_encrypt(bs, m->offset, end->offset,
 +                                    end_buffer, end->nb_bytes)) {
 +            ret = -EIO;
 +            goto fail;
 +        }
 +    }
 +
 +    /* And now we can write everything */
 +    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
 +                               start_buffer, start->nb_bytes);
 +    if (ret < 0) {
 +        goto fail;
 +    }
 +    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset,
 +                               end_buffer, end->nb_bytes);
  fail:
      qemu_co_mutex_lock(&s->lock);
@@ -XXX,XX +XXX,XX @@ fail:
          qcow2_cache_depends_on_flush(s->l2_table_cache);
      }
 +    qemu_vfree(start_buffer);
      return ret;
  }
 --
-.8.3.1
+.35.3

-[Qemu-devel] [PULL 16/61] nvme: Add support for Read Data and Write Data in CMBs.
+[PULL 08/10] qemu-nbd: Pass max connections to blockdev layer
-From: Stephen Bates <sbates@raithlin.com>
+From: Eric Blake <eblake@redhat.com>
-Add the ability for the NVMe model to support both the RDS and WDS
+The next patch wants to adjust whether the NBD server code advertises
-modes in the Controller Memory Buffer.
+MULTI_CONN based on whether it is known if the server limits to
 exactly one client.  For a server started by QMP, this information is
 obtained through nbd_server_start (which can support more than one
 export); but for qemu-nbd (which supports exactly one export), it is
 controlled only by the command-line option -e/--shared.  Since we
 already have a hook function used by qemu-nbd, it's easiest to just
 alter its signature to fit our needs.
-Although not currently supported in the upstreamed Linux kernel a fork
+Signed-off-by: Eric Blake <eblake@redhat.com>
-with support exists [1] and user-space test programs that build on
+Message-Id: <20220512004924.417153-2-eblake@redhat.com>
 this also exist [2].
 Useful for testing CMB functionality in preparation for real CMB
 enabled NVMe devices (coming soon).
 [1] https://github.com/sbates130272/linux-p2pmem
 [2] https://github.com/sbates130272/p2pmem-test
 Signed-off-by: Stephen Bates <sbates@raithlin.com>
 Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
 Reviewed-by: Keith Busch <keith.busch@intel.com>
 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
 ---
- hw/block/nvme.c | 83 +++++++++++++++++++++++++++++++++++++++------------------
+ include/block/nbd.h | 2 +-
- hw/block/nvme.h |  1 +
+ blockdev-nbd.c      | 8 ++++----
- 2 files changed, 58 insertions(+), 26 deletions(-)
+ qemu-nbd.c          | 2 +-
  3 files changed, 6 insertions(+), 6 deletions(-)
-diff --git a/hw/block/nvme.c b/hw/block/nvme.c
+diff --git a/include/block/nbd.h b/include/block/nbd.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/block/nvme.c
+--- a/include/block/nbd.h
-+++ b/hw/block/nvme.c
++++ b/include/block/nbd.h
-@@ -XXX,XX +XXX,XX @@
+@@ -XXX,XX +XXX,XX @@ void nbd_client_new(QIOChannelSocket *sioc,
-  *              cmb_size_mb=<cmb_size_mb[optional]>
+ void nbd_client_get(NBDClient *client);
-  *
+ void nbd_client_put(NBDClient *client);
-  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
-- * offset 0 in BAR2 and supports SQS only for now.
+-void nbd_server_is_qemu_nbd(bool value);
-+ * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
++void nbd_server_is_qemu_nbd(int max_connections);
-  */
+ bool nbd_server_is_running(void);
+ void nbd_server_start(SocketAddress *addr, const char *tls_creds,
- #include "qemu/osdep.h"
+                       const char *tls_authz, uint32_t max_connections,
-@@ -XXX,XX +XXX,XX @@ static void nvme_isr_notify(NvmeCtrl *n, NvmeCQueue *cq)
+diff --git a/blockdev-nbd.c b/blockdev-nbd.c
-     }
+index XXXXXXX..XXXXXXX 100644
 --- a/blockdev-nbd.c
 +++ b/blockdev-nbd.c
@@ -XXX,XX +XXX,XX @@ typedef struct NBDServerData {
  } NBDServerData;
  static NBDServerData *nbd_server;
 -static bool is_qemu_nbd;
 +static int qemu_nbd_connections = -1; /* Non-negative if this is qemu-nbd */
  static void nbd_update_server_watch(NBDServerData *s);
 -void nbd_server_is_qemu_nbd(bool value)
 +void nbd_server_is_qemu_nbd(int max_connections)
  {
 -    is_qemu_nbd = value;
 +    qemu_nbd_connections = max_connections;
  }
--static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
+ bool nbd_server_is_running(void)
 -    uint32_t len, NvmeCtrl *n)
 +static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
 +                             uint64_t prp2, uint32_t len, NvmeCtrl *n)
  {
-     hwaddr trans_len = n->page_size - (prp1 % n->page_size);
+-    return nbd_server || is_qemu_nbd;
-     trans_len = MIN(len, trans_len);
++    return nbd_server || qemu_nbd_connections >= 0;
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
      if (!prp1) {
          return NVME_INVALID_FIELD | NVME_DNR;
 +    } else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
 +               prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
 +        qsg->nsg = 0;
 +        qemu_iovec_init(iov, num_prps);
 +        qemu_iovec_add(iov, (void *)&n->cmbuf[prp1 - n->ctrl_mem.addr], trans_len);
 +    } else {
 +        pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
 +        qemu_sglist_add(qsg, prp1, trans_len);
      }
 -
 -    pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
 -    qemu_sglist_add(qsg, prp1, trans_len);
      len -= trans_len;
      if (len) {
          if (!prp2) {
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
              nents = (len + n->page_size - 1) >> n->page_bits;
              prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
 -            pci_dma_read(&n->parent_obj, prp2, (void *)prp_list, prp_trans);
 +            nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
              while (len != 0) {
                  uint64_t prp_ent = le64_to_cpu(prp_list[i]);
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
                      i = 0;
                      nents = (len + n->page_size - 1) >> n->page_bits;
                      prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
 -                    pci_dma_read(&n->parent_obj, prp_ent, (void *)prp_list,
 +                    nvme_addr_read(n, prp_ent, (void *)prp_list,
                          prp_trans);
                      prp_ent = le64_to_cpu(prp_list[i]);
                  }
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
                  }
                  trans_len = MIN(len, n->page_size);
 -                qemu_sglist_add(qsg, prp_ent, trans_len);
 +                if (qsg->nsg){
 +                    qemu_sglist_add(qsg, prp_ent, trans_len);
 +                } else {
 +                    qemu_iovec_add(iov, (void *)&n->cmbuf[prp_ent - n->ctrl_mem.addr], trans_len);
 +                }
                  len -= trans_len;
                  i++;
              }
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
              if (prp2 & (n->page_size - 1)) {
                  goto unmap;
              }
 -            qemu_sglist_add(qsg, prp2, len);
 +            if (qsg->nsg) {
 +                qemu_sglist_add(qsg, prp2, len);
 +            } else {
 +                qemu_iovec_add(iov, (void *)&n->cmbuf[prp2 - n->ctrl_mem.addr], trans_len);
 +            }
          }
      }
      return NVME_SUCCESS;
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
      uint64_t prp1, uint64_t prp2)
  {
      QEMUSGList qsg;
 +    QEMUIOVector iov;
 +    uint16_t status = NVME_SUCCESS;
 -    if (nvme_map_prp(&qsg, prp1, prp2, len, n)) {
 +    if (nvme_map_prp(&qsg, &iov, prp1, prp2, len, n)) {
          return NVME_INVALID_FIELD | NVME_DNR;
      }
 -    if (dma_buf_read(ptr, len, &qsg)) {
 +    if (qsg.nsg > 0) {
 +        if (dma_buf_read(ptr, len, &qsg)) {
 +            status = NVME_INVALID_FIELD | NVME_DNR;
 +        }
          qemu_sglist_destroy(&qsg);
 -        return NVME_INVALID_FIELD | NVME_DNR;
 +    } else {
 +        if (qemu_iovec_to_buf(&iov, 0, ptr, len) != len) {
 +            status = NVME_INVALID_FIELD | NVME_DNR;
 +        }
 +        qemu_iovec_destroy(&iov);
      }
 -    qemu_sglist_destroy(&qsg);
 -    return NVME_SUCCESS;
 +    return status;
  }
- static void nvme_post_cqes(void *opaque)
+ static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
-@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
+diff --git a/qemu-nbd.c b/qemu-nbd.c
          return NVME_LBA_RANGE | NVME_DNR;
      }
 -    if (nvme_map_prp(&req->qsg, prp1, prp2, data_size, n)) {
 +    if (nvme_map_prp(&req->qsg, &req->iov, prp1, prp2, data_size, n)) {
          block_acct_invalid(blk_get_stats(n->conf.blk), acct);
          return NVME_INVALID_FIELD | NVME_DNR;
      }
 -    assert((nlb << data_shift) == req->qsg.size);
 -
 -    req->has_sg = true;
      dma_acct_start(n->conf.blk, &req->acct, &req->qsg, acct);
 -    req->aiocb = is_write ?
 -        dma_blk_write(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
 -                      nvme_rw_cb, req) :
 -        dma_blk_read(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
 -                     nvme_rw_cb, req);
 +    if (req->qsg.nsg > 0) {
 +        req->has_sg = true;
 +        req->aiocb = is_write ?
 +            dma_blk_write(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
 +                          nvme_rw_cb, req) :
 +            dma_blk_read(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
 +                         nvme_rw_cb, req);
 +    } else {
 +        req->has_sg = false;
 +        req->aiocb = is_write ?
 +            blk_aio_pwritev(n->conf.blk, data_offset, &req->iov, 0, nvme_rw_cb,
 +                            req) :
 +            blk_aio_preadv(n->conf.blk, data_offset, &req->iov, 0, nvme_rw_cb,
 +                           req);
 +    }
      return NVME_NO_COMPLETE;
  }
@@ -XXX,XX +XXX,XX @@ static int nvme_init(PCIDevice *pci_dev)
          NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
          NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
          NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
 -        NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 0);
 -        NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 0);
 +        NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
 +        NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
          NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
          NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->cmb_size_mb);
 +        n->cmbloc = n->bar.cmbloc;
 +        n->cmbsz = n->bar.cmbsz;
 +
          n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
          memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
                                "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
 diff --git a/hw/block/nvme.h b/hw/block/nvme.h
 index XXXXXXX..XXXXXXX 100644
---- a/hw/block/nvme.h
+--- a/qemu-nbd.c
-+++ b/hw/block/nvme.h
++++ b/qemu-nbd.c
-@@ -XXX,XX +XXX,XX @@ typedef struct NvmeRequest {
+@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
-     NvmeCqe                 cqe;
-     BlockAcctCookie         acct;
+     bs->detect_zeroes = detect_zeroes;
-     QEMUSGList              qsg;
-+    QEMUIOVector            iov;
+-    nbd_server_is_qemu_nbd(true);
-     QTAILQ_ENTRY(NvmeRequest)entry;
++    nbd_server_is_qemu_nbd(shared);
- } NvmeRequest;
+     export_opts = g_new(BlockExportOptions, 1);
      *export_opts = (BlockExportOptions) {
 --
-.8.3.1
+.35.3

-[Qemu-devel] [PULL 03/61] qemu-iotests: Test exiting qemu with running job
+[PULL 09/10] nbd/server: Allow MULTI_CONN for shared writable exports
-When qemu is exited, all running jobs should be cancelled successfully.
+From: Eric Blake <eblake@redhat.com>
-This adds a test for this for all types of block jobs that currently
-exist in qemu.
+According to the NBD spec, a server that advertises
+NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
 not see any cache inconsistencies: when properly separated by a single
 flush, actions performed by one client will be visible to another
 client, regardless of which client did the flush.
 We always satisfy these conditions in qemu - even when we support
 multiple clients, ALL clients go through a single point of reference
 into the block layer, with no local caching.  The effect of one client
 is instantly visible to the next client.  Even if our backend were a
 network device, we argue that any multi-path caching effects that
 would cause inconsistencies in back-to-back actions not seeing the
 effect of previous actions would be a bug in that backend, and not the
 fault of caching in qemu.  As such, it is safe to unconditionally
 advertise CAN_MULTI_CONN for any qemu NBD server situation that
 supports parallel clients.
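As a rough illustration of the "single point of reference" argument above, here is a toy Python model. It is not QEMU or libnbd code: `SharedBackend` and `ToyClient` are hypothetical names, and the in-memory bytearray only stands in for the block layer. It assumes what the message states, namely that no client keeps a local cache.

```python
class SharedBackend:
    """Hypothetical stand-in for the block layer: one image shared by all clients."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)


class ToyClient:
    """Hypothetical client that keeps no local cache; every call hits the backend."""

    def __init__(self, backend: SharedBackend) -> None:
        self.backend = backend

    def pwrite(self, buf: bytes, offset: int) -> None:
        self.backend.data[offset:offset + len(buf)] = buf

    def pread(self, count: int, offset: int) -> bytes:
        return bytes(self.backend.data[offset:offset + count])

    def flush(self) -> None:
        # Nothing is buffered per client in this model, so there is nothing to push.
        pass


backend = SharedBackend(4096)
clients = [ToyClient(backend) for _ in range(3)]

# One client writes, a different one flushes, a third reads back.
clients[1].pwrite(b"\x03" * 512, 0)
clients[2].flush()
result = clients[0].pread(512, 0)
```

Because every client goes through the same object, it makes no difference which of them issues the flush. That is the property the multi-conn promise requires.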
 Note, however, that we don't want to advertise CAN_MULTI_CONN when we
 know that a second client cannot connect (for historical reasons,
 qemu-nbd defaults to a single connection while nbd-server-add and QMP
 commands default to unlimited connections; but we already have
 existing means to let either style of NBD server creation alter those
 defaults).  This is visible by no longer advertising MULTI_CONN for
 'qemu-nbd -r' without -e, as in the iotest nbd-qemu-allocation.
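The rule described above can be sketched in Python. This is a simplified model, not the server code. The bit positions are those of the NBD transmission flags. The base flag set is copied from the read-only export in the nbd-qemu-allocation expected output (0x58f before, 0x48f after), not derived from QEMU's negotiation logic. The helper names are hypothetical.

```python
# NBD transmission flag bits that appear in the expected output of the iotest.
NBD_FLAG_HAS_FLAGS = 1 << 0
NBD_FLAG_READ_ONLY = 1 << 1
NBD_FLAG_SEND_FLUSH = 1 << 2
NBD_FLAG_SEND_FUA = 1 << 3
NBD_FLAG_SEND_DF = 1 << 7
NBD_FLAG_CAN_MULTI_CONN = 1 << 8
NBD_FLAG_SEND_CACHE = 1 << 10


def advertises_multi_conn(max_connections: int) -> bool:
    """Hypothetical helper: 0 means unlimited, so only a limit of exactly 1
    rules out a second client."""
    return max_connections != 1


def readonly_export_flags(max_connections: int) -> int:
    """Hypothetical helper: flags for a read-only export as shown in the iotest output."""
    flags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_READ_ONLY | NBD_FLAG_SEND_FLUSH |
             NBD_FLAG_SEND_FUA | NBD_FLAG_SEND_DF | NBD_FLAG_SEND_CACHE)
    if advertises_multi_conn(max_connections):
        flags |= NBD_FLAG_CAN_MULTI_CONN
    return flags


single = readonly_export_flags(1)     # qemu-nbd default: one connection
unlimited = readonly_export_flags(0)  # QMP default: unlimited connections
```

The two values differ only in bit 8, which matches the single-flag change in the test's expected output.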
 The harder part of this patch is setting up an iotest to demonstrate
 behavior of multiple NBD clients to a single server.  It might be
 possible with parallel qemu-io processes, but I found it easier to do
 in python with the help of libnbd, and help from Nir and Vladimir in
 writing the test.
 Signed-off-by: Eric Blake <eblake@redhat.com>
 Suggested-by: Nir Soffer <nsoffer@redhat.com>
 Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
 Message-Id: <20220512004924.417153-3-eblake@redhat.com>
 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Eric Blake <eblake@redhat.com>
 ---
- tests/qemu-iotests/185     | 206 +++++++++++++++++++++++++++++++++++++++++++++
+ qapi/block-export.json                        |   8 +-
- tests/qemu-iotests/185.out |  59 +++++++++++++
+ docs/interop/nbd.txt                          |   1 +
- tests/qemu-iotests/group   |   1 +
+ docs/tools/qemu-nbd.rst                       |   3 +-
-files changed, 266 insertions(+)
+ include/block/nbd.h                           |   3 +-
- create mode 100755 tests/qemu-iotests/185
+ blockdev-nbd.c                                |   5 +
- create mode 100644 tests/qemu-iotests/185.out
+ nbd/server.c                                  |  10 +-
+ MAINTAINERS                                   |   1 +
-diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
+ tests/qemu-iotests/tests/nbd-multiconn        | 145 ++++++++++++++++++
  tests/qemu-iotests/tests/nbd-multiconn.out    |   5 +
  .../tests/nbd-qemu-allocation.out             |   2 +-
 files changed, 172 insertions(+), 11 deletions(-)
  create mode 100755 tests/qemu-iotests/tests/nbd-multiconn
  create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out
 diff --git a/qapi/block-export.json b/qapi/block-export.json
 index XXXXXXX..XXXXXXX 100644
 --- a/qapi/block-export.json
 +++ b/qapi/block-export.json
@@ -XXX,XX +XXX,XX @@
  #             recreated on the fly while the NBD server is active.
  #             If missing, it will default to denying access (since 4.0).
  # @max-connections: The maximum number of connections to allow at the same
 -#                   time, 0 for unlimited. (since 5.2; default: 0)
 +#                   time, 0 for unlimited. Setting this to 1 also stops
 +#                   the server from advertising multiple client support
 +#                   (since 5.2; default: 0)
  #
  # Since: 4.2
  ##
@@ -XXX,XX +XXX,XX @@
  #             recreated on the fly while the NBD server is active.
  #             If missing, it will default to denying access (since 4.0).
  # @max-connections: The maximum number of connections to allow at the same
 -#                   time, 0 for unlimited. (since 5.2; default: 0)
 +#                   time, 0 for unlimited. Setting this to 1 also stops
 +#                   the server from advertising multiple client support
 +#                   (since 5.2; default: 0).
  #
  # Returns: error if the server is already running.
  #
 diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/interop/nbd.txt
 +++ b/docs/interop/nbd.txt
@@ -XXX,XX +XXX,XX @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
  * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
  NBD_CMD_FLAG_FAST_ZERO
  * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 +* 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
 diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst
 index XXXXXXX..XXXXXXX 100644
 --- a/docs/tools/qemu-nbd.rst
 +++ b/docs/tools/qemu-nbd.rst
@@ -XXX,XX +XXX,XX @@ driver options if :option:`--image-opts` is specified.
  .. option:: -e, --shared=NUM
    Allow up to *NUM* clients to share the device (default
 -  ``1``), 0 for unlimited. Safe for readers, but for now,
 -  consistency is not guaranteed between multiple writers.
 +  ``1``), 0 for unlimited.
  .. option:: -t, --persistent
 diff --git a/include/block/nbd.h b/include/block/nbd.h
 index XXXXXXX..XXXXXXX 100644
 --- a/include/block/nbd.h
 +++ b/include/block/nbd.h
@@ -XXX,XX +XXX,XX @@
  /*
 - *  Copyright (C) 2016-2020 Red Hat, Inc.
 + *  Copyright (C) 2016-2022 Red Hat, Inc.
   *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
   *
   *  Network Block Device
@@ -XXX,XX +XXX,XX @@ void nbd_client_put(NBDClient *client);
  void nbd_server_is_qemu_nbd(int max_connections);
  bool nbd_server_is_running(void);
 +int nbd_server_max_connections(void);
  void nbd_server_start(SocketAddress *addr, const char *tls_creds,
                        const char *tls_authz, uint32_t max_connections,
                        Error **errp);
 diff --git a/blockdev-nbd.c b/blockdev-nbd.c
 index XXXXXXX..XXXXXXX 100644
 --- a/blockdev-nbd.c
 +++ b/blockdev-nbd.c
@@ -XXX,XX +XXX,XX @@ bool nbd_server_is_running(void)
      return nbd_server || qemu_nbd_connections >= 0;
  }
 +int nbd_server_max_connections(void)
 +{
 +    return nbd_server ? nbd_server->max_connections : qemu_nbd_connections;
 +}
 +
  static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
  {
      nbd_client_put(client);
 diff --git a/nbd/server.c b/nbd/server.c
 index XXXXXXX..XXXXXXX 100644
 --- a/nbd/server.c
 +++ b/nbd/server.c
@@ -XXX,XX +XXX,XX @@
  /*
 - *  Copyright (C) 2016-2021 Red Hat, Inc.
 + *  Copyright (C) 2016-2022 Red Hat, Inc.
   *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
   *
   *  Network Block Device Server Side
@@ -XXX,XX +XXX,XX @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
      int64_t size;
      uint64_t perm, shared_perm;
      bool readonly = !exp_args->writable;
 -    bool shared = !exp_args->writable;
      BlockDirtyBitmapOrStrList *bitmaps;
      size_t i;
      int ret;
@@ -XXX,XX +XXX,XX @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
      exp->description = g_strdup(arg->description);
      exp->nbdflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH |
                       NBD_FLAG_SEND_FUA | NBD_FLAG_SEND_CACHE);
 +
 +    if (nbd_server_max_connections() != 1) {
 +        exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
 +    }
      if (readonly) {
          exp->nbdflags |= NBD_FLAG_READ_ONLY;
 -        if (shared) {
 -            exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
 -        }
      } else {
          exp->nbdflags |= (NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES |
                            NBD_FLAG_SEND_FAST_ZERO);
 diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
 --- a/MAINTAINERS
 +++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: qemu-nbd.*
  F: blockdev-nbd.c
  F: docs/interop/nbd.txt
  F: docs/tools/qemu-nbd.rst
 +F: tests/qemu-iotests/tests/*nbd*
  T: git https://repo.or.cz/qemu/ericb.git nbd
  T: git https://src.openvz.org/scm/~vsementsov/qemu.git nbd
 diff --git a/tests/qemu-iotests/tests/nbd-multiconn b/tests/qemu-iotests/tests/nbd-multiconn
 new file mode 100755
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/tests/qemu-iotests/185
++++ b/tests/qemu-iotests/tests/nbd-multiconn
 @@ -XXX,XX +XXX,XX @@
-+#!/bin/bash
++#!/usr/bin/env python3
-+#
++# group: rw auto quick
-+# Test exiting qemu while jobs are still running
++#
-+#
++# Test cases for NBD multi-conn advertisement
-+# Copyright (C) 2017 Red Hat, Inc.
++#
 +# Copyright (C) 2022 Red Hat, Inc.
 +#
 +# This program is free software; you can redistribute it and/or modify
 +# it under the terms of the GNU General Public License as published by
 +# the Free Software Foundation; either version 2 of the License, or
 +# (at your option) any later version.
 ...
 +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 +# GNU General Public License for more details.
 +#
 +# You should have received a copy of the GNU General Public License
 +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
-+#
++
-+
++import os
-+# creator
++from contextlib import contextmanager
-+owner=kwolf@redhat.com
++import iotests
-+
++from iotests import qemu_img_create, qemu_io
-+seq=`basename $0`
++
-+echo "QA output created by $seq"
++
-+
++disk = os.path.join(iotests.test_dir, 'disk')
-+here=`pwd`
++size = '4M'
-+status=1 # failure is the default!
++nbd_sock = os.path.join(iotests.sock_dir, 'nbd_sock')
-+
++nbd_uri = 'nbd+unix:///{}?socket=' + nbd_sock
-+MIG_SOCKET="${TEST_DIR}/migrate"
++
 +
-+_cleanup()
++@contextmanager
-+{
++def open_nbd(export_name):
-+    rm -f "${TEST_IMG}.mid"
++    h = nbd.NBD()
-+    rm -f "${TEST_IMG}.copy"
++    try:
-+    _cleanup_test_img
++        h.connect_uri(nbd_uri.format(export_name))
-+    _cleanup_qemu
++        yield h
-+}
++    finally:
-+trap "_cleanup; exit \$status" 0 1 2 3 15
++        h.shutdown()
 +
-+# get standard environment, filters and checks
++class TestNbdMulticonn(iotests.QMPTestCase):
-+. ./common.rc
++    def setUp(self):
-+. ./common.filter
++        qemu_img_create('-f', iotests.imgfmt, disk, size)
-+. ./common.qemu
++        qemu_io('-c', 'w -P 1 0 2M', '-c', 'w -P 2 2M 2M', disk)
 +
-+_supported_fmt qcow2
++        self.vm = iotests.VM()
-+_supported_proto file
++        self.vm.launch()
-+_supported_os Linux
++        result = self.vm.qmp('blockdev-add', {
-+
++            'driver': 'qcow2',
-+size=64M
++            'node-name': 'n',
-+TEST_IMG="${TEST_IMG}.base" _make_test_img $size
++            'file': {'driver': 'file', 'filename': disk}
-+
++        })
-+echo
++        self.assert_qmp(result, 'return', {})
-+echo === Starting VM ===
++
-+echo
++    def tearDown(self):
-+
++        self.vm.shutdown()
-+qemu_comm_method="qmp"
++        os.remove(disk)
-+
++        try:
-+_launch_qemu \
++            os.remove(nbd_sock)
-+    -drive file="${TEST_IMG}.base",cache=$CACHEMODE,driver=$IMGFMT,id=disk
++        except OSError:
-+h=$QEMU_HANDLE
++            pass
-+_send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
++
-+
++    @contextmanager
-+echo
++    def run_server(self, max_connections=None):
-+echo === Creating backing chain ===
++        args = {
-+echo
++            'addr': {
-+
++                'type': 'unix',
-+_send_qemu_cmd $h \
++                'data': {'path': nbd_sock}
-+    "{ 'execute': 'blockdev-snapshot-sync',
++            }
-+       'arguments': { 'device': 'disk',
++        }
-+                      'snapshot-file': '$TEST_IMG.mid',
++        if max_connections is not None:
-+                      'format': '$IMGFMT',
++            args['max-connections'] = max_connections
-+                      'mode': 'absolute-paths' } }" \
++
-+    "return"
++        result = self.vm.qmp('nbd-server-start', args)
-+
++        self.assert_qmp(result, 'return', {})
-+_send_qemu_cmd $h \
++        yield
-+    "{ 'execute': 'human-monitor-command',
++
-+       'arguments': { 'command-line':
++        result = self.vm.qmp('nbd-server-stop')
-+                      'qemu-io disk \"write 0 4M\"' } }" \
++        self.assert_qmp(result, 'return', {})
-+    "return"
++
-+
++    def add_export(self, name, writable=None):
-+_send_qemu_cmd $h \
++        args = {
-+    "{ 'execute': 'blockdev-snapshot-sync',
++            'type': 'nbd',
-+       'arguments': { 'device': 'disk',
++            'id': name,
-+                      'snapshot-file': '$TEST_IMG',
++            'node-name': 'n',
-+                      'format': '$IMGFMT',
++            'name': name,
-+                      'mode': 'absolute-paths' } }" \
++        }
-+    "return"
++        if writable is not None:
-+
++            args['writable'] = writable
-+echo
++
-+echo === Start commit job and exit qemu ===
++        result = self.vm.qmp('block-export-add', args)
-+echo
++        self.assert_qmp(result, 'return', {})
 +
-+# Note that the reference output intentionally includes the 'offset' field in
++    def test_default_settings(self):
-+# BLOCK_JOB_CANCELLED events for all of the following block jobs. They are
++        with self.run_server():
-+# predictable and any change in the offsets would hint at a bug in the job
++            self.add_export('r')
-+# throttling code.
++            self.add_export('w', writable=True)
-+#
++            with open_nbd('r') as h:
-+# In order to achieve these predictable offsets, all of the following tests
++                self.assertTrue(h.can_multi_conn())
-+# use speed=65536. Each job will perform exactly one iteration before it has
++            with open_nbd('w') as h:
-+# to sleep at least for a second, which is plenty of time for the 'quit' QMP
++                self.assertTrue(h.can_multi_conn())
-+# command to be received (after receiving the command, the rest runs
++
-+# synchronously, so jobs can arbitrarily continue or complete).
++    def test_limited_connections(self):
-+#
++        with self.run_server(max_connections=1):
-+# The buffer size for commit and streaming is 512k (waiting for 8 seconds after
++            self.add_export('r')
-+# the first request), for active commit and mirror it's large enough to cover
++            self.add_export('w', writable=True)
-+# the full 4M, and for backup it's the qcow2 cluster size, which we know is
++            with open_nbd('r') as h:
-+# 64k. As all of these are at least as large as the speed, we are sure that the
++                self.assertFalse(h.can_multi_conn())
-+# offset doesn't advance after the first iteration before qemu exits.
++            with open_nbd('w') as h:
-+
++                self.assertFalse(h.can_multi_conn())
-+_send_qemu_cmd $h \
++
-+    "{ 'execute': 'block-commit',
++    def test_parallel_writes(self):
-+       'arguments': { 'device': 'disk',
++        with self.run_server():
-+                      'base':'$TEST_IMG.base',
++            self.add_export('w', writable=True)
-+                      'top': '$TEST_IMG.mid',
++
-+                      'speed': 65536 } }" \
++            clients = [nbd.NBD() for _ in range(3)]
-+    "return"
++            for c in clients:
-+
++                c.connect_uri(nbd_uri.format('w'))
-+_send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
++                self.assertTrue(c.can_multi_conn())
-+wait=1 _cleanup_qemu
++
-+
++            initial_data = clients[0].pread(1024 * 1024, 0)
-+echo
++            self.assertEqual(initial_data, b'\x01' * 1024 * 1024)
-+echo === Start active commit job and exit qemu ===
++
-+echo
++            updated_data = b'\x03' * 1024 * 1024
-+
++            clients[1].pwrite(updated_data, 0)
-+_launch_qemu \
++            clients[2].flush()
-+    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
++            current_data = clients[0].pread(1024 * 1024, 0)
-+h=$QEMU_HANDLE
++
-+_send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
++            self.assertEqual(updated_data, current_data)
 +
-+_send_qemu_cmd $h \
++            for i in range(3):
-+    "{ 'execute': 'block-commit',
++                clients[i].shutdown()
-+       'arguments': { 'device': 'disk',
++
-+                      'base':'$TEST_IMG.base',
++
-+                      'speed': 65536 } }" \
++if __name__ == '__main__':
-+    "return"
++    try:
-+
++        # Easier to use libnbd than to try and set up parallel
-+_send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
++        # 'qemu-nbd --list' or 'qemu-io' processes, but not all systems
-+wait=1 _cleanup_qemu
++        # have libnbd installed.
-+
++        import nbd  # type: ignore
-+echo
++
-+echo === Start mirror job and exit qemu ===
++        iotests.main(supported_fmts=['qcow2'])
-+echo
++    except ImportError:
-+
++        iotests.notrun('libnbd not installed')
-+_launch_qemu \
+diff --git a/tests/qemu-iotests/tests/nbd-multiconn.out b/tests/qemu-iotests/tests/nbd-multiconn.out
 +    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
 +h=$QEMU_HANDLE
 +_send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
 +
 +_send_qemu_cmd $h \
 +    "{ 'execute': 'drive-mirror',
 +       'arguments': { 'device': 'disk',
 +                      'target': '$TEST_IMG.copy',
 +                      'format': '$IMGFMT',
 +                      'sync': 'full',
 +                      'speed': 65536 } }" \
 +    "return"
 +
 +_send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 +wait=1 _cleanup_qemu
 +
 +echo
 +echo === Start backup job and exit qemu ===
 +echo
 +
 +_launch_qemu \
 +    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
 +h=$QEMU_HANDLE
 +_send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
 +
 +_send_qemu_cmd $h \
 +    "{ 'execute': 'drive-backup',
 +       'arguments': { 'device': 'disk',
 +                      'target': '$TEST_IMG.copy',
 +                      'format': '$IMGFMT',
 +                      'sync': 'full',
 +                      'speed': 65536 } }" \
 +    "return"
 +
 +_send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 +wait=1 _cleanup_qemu
 +
 +echo
 +echo === Start streaming job and exit qemu ===
 +echo
 +
 +_launch_qemu \
 +    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
 +h=$QEMU_HANDLE
 +_send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
 +
 +_send_qemu_cmd $h \
 +    "{ 'execute': 'block-stream',
 +       'arguments': { 'device': 'disk',
 +                      'speed': 65536 } }" \
 +    "return"
 +
 +_send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
 +wait=1 _cleanup_qemu
 +
 +_check_test_img
 +
 +# success, all done
 +echo "*** done"
 +rm -f $seq.full
 +status=0
 diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
 new file mode 100644
 index XXXXXXX..XXXXXXX
 --- /dev/null
-+++ b/tests/qemu-iotests/185.out
++++ b/tests/qemu-iotests/tests/nbd-multiconn.out
 @@ -XXX,XX +XXX,XX @@
-+QA output created by 185
++...
-+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864
++----------------------------------------------------------------------
-+
++Ran 3 tests
-+=== Starting VM ===
++
-+
++OK
-+{"return": {}}
+diff --git a/tests/qemu-iotests/tests/nbd-qemu-allocation.out b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
-+
+index XXXXXXX..XXXXXXX 100644
-+=== Creating backing chain ===
+--- a/tests/qemu-iotests/tests/nbd-qemu-allocation.out
-+
++++ b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
-+Formatting 'TEST_DIR/t.qcow2.mid', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.base backing_fmt=qcow2 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
+@@ -XXX,XX +XXX,XX @@ wrote 2097152/2097152 bytes at offset 1048576
-+{"return": {}}
+ exports available: 1
-+wrote 4194304/4194304 bytes at offset 0
+  export: ''
-+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+   size:  4194304
-+{"return": ""}
+-  flags: 0x58f ( readonly flush fua df multi cache )
-+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.mid backing_fmt=qcow2 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
++  flags: 0x48f ( readonly flush fua df cache )
-+{"return": {}}
+   min block: 1
-+
+   opt block: 4096
-+=== Start commit job and exit qemu ===
+   max block: 33554432
 +
 +{"return": {}}
 +{"return": {}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 524288, "speed": 65536, "type": "commit"}}
 +
 +=== Start active commit job and exit qemu ===
 +
 +{"return": {}}
 +{"return": {}}
 +{"return": {}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "commit"}}
 +
 +=== Start mirror job and exit qemu ===
 +
 +{"return": {}}
 +Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
 +{"return": {}}
 +{"return": {}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "mirror"}}
 +
 +=== Start backup job and exit qemu ===
 +
 +{"return": {}}
 +Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
 +{"return": {}}
 +{"return": {}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 65536, "speed": 65536, "type": "backup"}}
 +
 +=== Start streaming job and exit qemu ===
 +
 +{"return": {}}
 +{"return": {}}
 +{"return": {}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
 +{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 524288, "speed": 65536, "type": "stream"}}
 +No errors were found on the image.
 +*** done
 diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/qemu-iotests/group
 +++ b/tests/qemu-iotests/group
@@ -XXX,XX +XXX,XX @@
 rw auto migration
 rw auto quick
 rw auto migration
 +185 rw auto
 --
-.8.3.1
+.35.3

-[Qemu-devel] [PULL 05/61] block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
+Deleted patch
-From: Stefan Hajnoczi <stefanha@redhat.com>
-Calling aio_poll() directly may have been fine previously, but this is
-the future, man!  The difference between an aio_poll() loop and
-BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext
-around aio_poll().
-This allows the IOThread to run fd handlers or BHs to complete the
-request.  Failure to release the AioContext causes deadlocks.
-Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object
-iothread.
-Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-Reviewed-by: Eric Blake <eblake@redhat.com>
-Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
----
- block/io.c | 4 +---
-file changed, 1 insertion(+), 3 deletions(-)
-diff --git a/block/io.c b/block/io.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/io.c
-+++ b/block/io.c
-@@ -XXX,XX +XXX,XX @@ bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
-         Coroutine *co = qemu_coroutine_create(bdrv_co_rw_vmstate_entry, &data);
-         bdrv_coroutine_enter(bs, co);
--        while (data.ret == -EINPROGRESS) {
--            aio_poll(bdrv_get_aio_context(bs), true);
--        }
-+        BDRV_POLL_WHILE(bs, data.ret == -EINPROGRESS);
-         return data.ret;
-     }
- }
---
-.8.3.1

-[Qemu-devel] [PULL 06/61] migration: avoid recursive AioContext locking in save_vmstate()
+Deleted patch
-From: Stefan Hajnoczi <stefanha@redhat.com>
-AioContext was designed to allow nested acquire/release calls.  It uses
-a recursive mutex so callers don't need to worry about nesting...or so
-we thought.
-BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
-the AioContext temporarily around aio_poll().  This gives IOThreads a
-chance to acquire the AioContext to process I/O completions.
-It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
-BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
-will not be able to acquire the AioContext if it was acquired
-multiple times.
-Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
-this patch simply avoids nested locking in save_vmstate().  It's the
-simplest fix and we should step back to consider the big picture with
-all the recent changes to block layer threading.
-This patch is the final fix to solve 'savevm' hanging with -object
-iothread.
-Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-Reviewed-by: Eric Blake <eblake@redhat.com>
-Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
----
- migration/savevm.c | 12 +++++++++++-
-file changed, 11 insertions(+), 1 deletion(-)
-diff --git a/migration/savevm.c b/migration/savevm.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/savevm.c
-+++ b/migration/savevm.c
-@@ -XXX,XX +XXX,XX @@ int save_snapshot(const char *name, Error **errp)
-         goto the_end;
-     }
-+    /* The bdrv_all_create_snapshot() call that follows acquires the AioContext
-+     * for itself.  BDRV_POLL_WHILE() does not support nested locking because
-+     * it only releases the lock once.  Therefore synchronous I/O will deadlock
-+     * unless we release the AioContext before bdrv_all_create_snapshot().
-+     */
-+    aio_context_release(aio_context);
-+    aio_context = NULL;
-+
-     ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
-     if (ret < 0) {
-         error_setg(errp, "Error while creating snapshot on '%s'",
-@@ -XXX,XX +XXX,XX @@ int save_snapshot(const char *name, Error **errp)
-     ret = 0;
-  the_end:
--    aio_context_release(aio_context);
-+    if (aio_context) {
-+        aio_context_release(aio_context);
-+    }
-     if (saved_vm_running) {
-         vm_start();
-     }
---
-.8.3.1

-[Qemu-devel] [PULL 08/61] doc: Document generic -blockdev options
+Deleted patch
-This adds documentation for the -blockdev options that apply to all
-nodes independent of the block driver used.
-All options that are shared by -blockdev and -drive are now explained in
-the section for -blockdev. The documentation of -drive mentions that all
--blockdev options are accepted as well.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Eric Blake <eblake@redhat.com>
-Reviewed-by: Max Reitz <mreitz@redhat.com>
----
- qemu-options.hx | 108 +++++++++++++++++++++++++++++++++++++++++---------------
-file changed, 79 insertions(+), 29 deletions(-)
-diff --git a/qemu-options.hx b/qemu-options.hx
-index XXXXXXX..XXXXXXX 100644
---- a/qemu-options.hx
-+++ b/qemu-options.hx
-@@ -XXX,XX +XXX,XX @@ DEF("blockdev", HAS_ARG, QEMU_OPTION_blockdev,
-     "          [,read-only=on|off][,detect-zeroes=on|off|unmap]\n"
-     "          [,driver specific parameters...]\n"
-     "                configure a block backend\n", QEMU_ARCH_ALL)
-+STEXI
-+@item -blockdev @var{option}[,@var{option}[,@var{option}[,...]]]
-+@findex -blockdev
-+
-+Define a new block driver node.
-+
-+@table @option
-+@item Valid options for any block driver node:
-+
-+@table @code
-+@item driver
-+Specifies the block driver to use for the given node.
-+@item node-name
-+This defines the name of the block driver node by which it will be referenced
-+later. The name must be unique, i.e. it must not match the name of a different
-+block driver node, or (if you use @option{-drive} as well) the ID of a drive.
-+
-+If no node name is specified, it is automatically generated. The generated node
-+name is not intended to be predictable and changes between QEMU invocations.
-+For the top level, an explicit node name must be specified.
-+@item read-only
-+Open the node read-only. Guest write attempts will fail.
-+@item cache.direct
-+The host page cache can be avoided with @option{cache.direct=on}. This will
-+attempt to do disk IO directly to the guest's memory. QEMU may still perform an
-+internal copy of the data.
-+@item cache.no-flush
-+In case you don't care about data integrity over host failures, you can use
-+@option{cache.no-flush=on}. This option tells QEMU that it never needs to write
-+any data to the disk but can instead keep things in cache. If anything goes
-+wrong, like your host losing power, the disk storage getting disconnected
-+accidentally, etc. your image will most probably be rendered unusable.
-+@item discard=@var{discard}
-+@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and controls
-+whether @code{discard} (also known as @code{trim} or @code{unmap}) requests are
-+ignored or passed to the filesystem. Some machine types may not support
-+discard requests.
-+@item detect-zeroes=@var{detect-zeroes}
-+@var{detect-zeroes} is "off", "on" or "unmap" and enables the automatic
-+conversion of plain zero writes by the OS to driver specific optimized
-+zero write commands. You may even choose "unmap" if @var{discard} is set
-+to "unmap" to allow a zero write to be converted to an @code{unmap} operation.
-+@end table
-+
-+@end table
-+
-+ETEXI
- DEF("drive", HAS_ARG, QEMU_OPTION_drive,
-     "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
-@@ -XXX,XX +XXX,XX @@ STEXI
- @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
- @findex -drive
--Define a new drive. Valid options are:
-+Define a new drive. This includes creating a block driver node (the backend) as
-+well as a guest device, and is mostly a shortcut for defining the corresponding
-+@option{-blockdev} and @option{-device} options.
-+
-+@option{-drive} accepts all options that are accepted by @option{-blockdev}. In
-+addition, it knows the following options:
- @table @option
- @item file=@var{file}
-@@ -XXX,XX +XXX,XX @@ These options have the same definition as they have in @option{-hdachs}.
- @var{snapshot} is "on" or "off" and controls snapshot mode for the given drive
- (see @option{-snapshot}).
- @item cache=@var{cache}
--@var{cache} is "none", "writeback", "unsafe", "directsync" or "writethrough" and controls how the host cache is used to access block data.
-+@var{cache} is "none", "writeback", "unsafe", "directsync" or "writethrough"
-+and controls how the host cache is used to access block data. This is a
-+shortcut that sets the @option{cache.direct} and @option{cache.no-flush}
-+options (as in @option{-blockdev}), and additionally @option{cache.writeback},
-+which provides a default for the @option{write-cache} option of block guest
-+devices (as in @option{-device}). The modes correspond to the following
-+settings:
-+
-+@c Our texi2pod.pl script doesn't support @multitable, so fall back to using
-+@c plain ASCII art (well, UTF-8 art really). This looks okay both in the manpage
-+@c and the HTML output.
-+@example
-+@             │ cache.writeback   cache.direct   cache.no-flush
-+─────────────┼─────────────────────────────────────────────────
-+writeback    │ on                off            off
-+none         │ on                on             off
-+writethrough │ off               off            off
-+directsync   │ off               on             off
-+unsafe       │ on                off            on
-+@end example
-+
-+The default mode is @option{cache=writeback}.
-+
- @item aio=@var{aio}
- @var{aio} is "threads", or "native" and selects between pthread based disk I/O and native Linux AIO.
--@item discard=@var{discard}
--@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap}) requests are ignored or passed to the filesystem.  Some machine types may not support discard requests.
- @item format=@var{format}
- Specify which disk @var{format} will be used rather than detecting
- the format.  Can be used to specify format=raw to avoid interpreting
-@@ -XXX,XX +XXX,XX @@ Specify which @var{action} to take on write and read errors. Valid actions are:
- "report" (report the error to the guest), "enospc" (pause QEMU only if the
- host disk is full; report the error to the guest otherwise).
- The default setting is @option{werror=enospc} and @option{rerror=report}.
--@item readonly
--Open drive @option{file} as read-only. Guest write attempts will fail.
- @item copy-on-read=@var{copy-on-read}
- @var{copy-on-read} is "on" or "off" and enables whether to copy read backing
- file sectors into the image file.
--@item detect-zeroes=@var{detect-zeroes}
--@var{detect-zeroes} is "off", "on" or "unmap" and enables the automatic
--conversion of plain zero writes by the OS to driver specific optimized
--zero write commands. You may even choose "unmap" if @var{discard} is set
--to "unmap" to allow a zero write to be converted to an UNMAP operation.
- @item bps=@var{b},bps_rd=@var{r},bps_wr=@var{w}
- Specify bandwidth throttling limits in bytes per second, either for all request
- types or for reads or writes only.  Small values can lead to timeouts or hangs
-@@ -XXX,XX +XXX,XX @@ prevent guests from circumventing throttling limits by using many small disks
- instead of a single larger disk.
- @end table
--By default, the @option{cache=writeback} mode is used. It will report data
-+By default, the @option{cache.writeback=on} mode is used. It will report data
- writes as completed as soon as the data is present in the host page cache.
- This is safe as long as your guest OS makes sure to correctly flush disk caches
- where needed. If your guest OS does not handle volatile disk write caches
- correctly and your host crashes or loses power, then the guest may experience
- data corruption.
--For such guests, you should consider using @option{cache=writethrough}. This
-+For such guests, you should consider using @option{cache.writeback=off}. This
- means that the host page cache will be used to read and write data, but write
- notification will be sent to the guest only after QEMU has made sure to flush
- each write to the disk. Be aware that this has a major impact on performance.
--The host page cache can be avoided entirely with @option{cache=none}.  This will
--attempt to do disk IO directly to the guest's memory.  QEMU may still perform
--an internal copy of the data. Note that this is considered a writeback mode and
--the guest OS must handle the disk write cache correctly in order to avoid data
--corruption on host crashes.
--
--The host page cache can be avoided while only sending write notifications to
--the guest when the data has been flushed to the disk using
--@option{cache=directsync}.
--
--In case you don't care about data integrity over host failures, use
--@option{cache=unsafe}. This option tells QEMU that it never needs to write any
--data to the disk but can instead keep things in cache. If anything goes wrong,
--like your host losing power, the disk storage getting disconnected accidentally,
--etc. your image will most probably be rendered unusable.   When using
--the @option{-snapshot} option, unsafe caching is always used.
-+When using the @option{-snapshot} option, unsafe caching is always used.
- Copy-on-read avoids accessing the same backing file sectors repeatedly and is
- useful when the backing file is over a slow network.  By default copy-on-read
---
-.8.3.1
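
The cache mode table added to qemu-options.hx in the patch above can be read as a simple lookup. The following is a minimal sketch of that mapping, not QEMU's actual parsing code; the CacheFlags type and the cache_mode_to_flags() helper are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the three flags named in the table. */
typedef struct CacheFlags {
    bool writeback;  /* cache.writeback */
    bool direct;     /* cache.direct */
    bool no_flush;   /* cache.no-flush */
} CacheFlags;

/*
 * Translate a cache= mode name into the three flags, following the
 * documented table row by row. Returns 0 on success and -1 for an
 * unknown mode name.
 */
static int cache_mode_to_flags(const char *mode, CacheFlags *out)
{
    static const struct {
        const char *name;
        CacheFlags flags;
    } table[] = {
        { "writeback",    { true,  false, false } },
        { "none",         { true,  true,  false } },
        { "writethrough", { false, false, false } },
        { "directsync",   { false, true,  false } },
        { "unsafe",       { true,  false, true  } },
    };

    assert(mode != NULL && out != NULL);
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (strcmp(mode, table[i].name) == 0) {
            *out = table[i].flags;
            return 0;
        }
    }
    return -1;
}
```

Read this way, cache=none and cache=directsync differ only in cache.writeback, which matches the documentation's remark that cache=none is still considered a writeback mode.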

-[Qemu-devel] [PULL 09/61] doc: Document driver-specific -blockdev options
+Deleted patch
-This documents the driver-specific options for the raw, qcow2 and file
-block drivers for the man page. For everything else, we refer to the
-QAPI documentation.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Eric Blake <eblake@redhat.com>
-Reviewed-by: Max Reitz <mreitz@redhat.com>
----
- qemu-options.hx | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
-file changed, 114 insertions(+), 1 deletion(-)
-diff --git a/qemu-options.hx b/qemu-options.hx
-index XXXXXXX..XXXXXXX 100644
---- a/qemu-options.hx
-+++ b/qemu-options.hx
-@@ -XXX,XX +XXX,XX @@ STEXI
- @item -blockdev @var{option}[,@var{option}[,@var{option}[,...]]]
- @findex -blockdev
--Define a new block driver node.
-+Define a new block driver node. Some of the options apply to all block drivers,
-+other options are only accepted for a specific block driver. See below for a
-+list of generic options and options for the most common block drivers.
-+
-+Options that expect a reference to another node (e.g. @code{file}) can be
-+given in two ways. Either you specify the node name of an already existing node
-+(file=@var{node-name}), or you define a new node inline, adding options
-+for the referenced node after a dot (file.filename=@var{path},file.aio=native).
-+
-+A block driver node created with @option{-blockdev} can be used for a guest
-+device by specifying its node name for the @code{drive} property in a
-+@option{-device} argument that defines a block device.
- @table @option
- @item Valid options for any block driver node:
-@@ -XXX,XX +XXX,XX @@ zero write commands. You may even choose "unmap" if @var{discard} is set
- to "unmap" to allow a zero write to be converted to an @code{unmap} operation.
- @end table
-+@item Driver-specific options for @code{file}
-+
-+This is the protocol-level block driver for accessing regular files.
-+
-+@table @code
-+@item filename
-+The path to the image file in the local filesystem
-+@item aio
-+Specifies the AIO backend (threads/native, default: threads)
-+@end table
-+Example:
-+@example
-+-blockdev driver=file,node-name=disk,filename=disk.img
-+@end example
-+
-+@item Driver-specific options for @code{raw}
-+
-+This is the image format block driver for raw images. It is usually
-+stacked on top of a protocol level block driver such as @code{file}.
-+
-+@table @code
-+@item file
-+Reference to or definition of the data source block driver node
-+(e.g. a @code{file} driver node)
-+@end table
-+Example 1:
-+@example
-+-blockdev driver=file,node-name=disk_file,filename=disk.img
-+-blockdev driver=raw,node-name=disk,file=disk_file
-+@end example
-+Example 2:
-+@example
-+-blockdev driver=raw,node-name=disk,file.driver=file,file.filename=disk.img
-+@end example
-+
-+@item Driver-specific options for @code{qcow2}
-+
-+This is the image format block driver for qcow2 images. It is usually
-+stacked on top of a protocol level block driver such as @code{file}.
-+
-+@table @code
-+@item file
-+Reference to or definition of the data source block driver node
-+(e.g. a @code{file} driver node)
-+
-+@item backing
-+Reference to or definition of the backing file block device (default is taken
-+from the image file). It is allowed to pass an empty string here in order to
-+disable the default backing file.
-+
-+@item lazy-refcounts
-+Whether to enable the lazy refcounts feature (on/off; default is taken from the
-+image file)
-+
-+@item cache-size
-+The maximum total size of the L2 table and refcount block caches in bytes
-+(default: 1048576 bytes or 8 clusters, whichever is larger)
-+
-+@item l2-cache-size
-+The maximum size of the L2 table cache in bytes
-+(default: 4/5 of the total cache size)
-+
-+@item refcount-cache-size
-+The maximum size of the refcount block cache in bytes
-+(default: 1/5 of the total cache size)
-+
-+@item cache-clean-interval
-+Clean unused entries in the L2 and refcount caches. The interval is in seconds.
-+The default value is 0 and it disables this feature.
-+
-+@item pass-discard-request
-+Whether discard requests to the qcow2 device should be forwarded to the data
-+source (on/off; default: on if discard=unmap is specified, off otherwise)
-+
-+@item pass-discard-snapshot
-+Whether discard requests for the data source should be issued when a snapshot
-+operation (e.g. deleting a snapshot) frees clusters in the qcow2 file (on/off;
-+default: on)
-+
-+@item pass-discard-other
-+Whether discard requests for the data source should be issued on other
-+occasions where a cluster gets freed (on/off; default: off)
-+
-+@item overlap-check
-+Which overlap checks to perform for writes to the image
-+(none/constant/cached/all; default: cached). For details or finer
-+granularity control refer to the QAPI documentation of @code{blockdev-add}.
-+@end table
-+
-+Example 1:
-+@example
-+-blockdev driver=file,node-name=my_file,filename=/tmp/disk.qcow2
-+-blockdev driver=qcow2,node-name=hda,file=my_file,overlap-check=none,cache-size=16777216
-+@end example
-+Example 2:
-+@example
-+-blockdev driver=qcow2,node-name=disk,file.driver=http,file.filename=http://example.com/image.qcow2
-+@end example
-+
-+@item Driver-specific options for other drivers
-+Please refer to the QAPI documentation of the @code{blockdev-add} QMP command.
-+
- @end table
- ETEXI
---
-.8.3.1
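
The documented qcow2 cache defaults in the patch above (total size of 1048576 bytes or 8 clusters, whichever is larger, split 4/5 for the L2 table cache and 1/5 for the refcount block cache) can be worked through with a small calculation. This is a sketch of the arithmetic as the documentation states it, not the driver's real sizing logic, which may round or clamp differently; CacheSizes and default_cache_sizes() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: all sizes in bytes. */
typedef struct CacheSizes {
    uint64_t total;     /* cache-size */
    uint64_t l2;        /* l2-cache-size */
    uint64_t refcount;  /* refcount-cache-size */
} CacheSizes;

/*
 * Compute the documented defaults for a given cluster size.
 * Integer division is used, so the L2 share absorbs the remainder.
 */
static CacheSizes default_cache_sizes(uint64_t cluster_size)
{
    CacheSizes c;
    uint64_t eight_clusters;

    assert(cluster_size > 0);
    eight_clusters = 8 * cluster_size;

    /* "1048576 bytes or 8 clusters, whichever is larger" */
    c.total = eight_clusters > 1048576 ? eight_clusters : 1048576;
    /* "1/5 of the total cache size" */
    c.refcount = c.total / 5;
    /* "4/5 of the total cache size" (the rest) */
    c.l2 = c.total - c.refcount;
    return c;
}
```

With 64 KiB clusters, 8 clusters are only 512 KiB, so the 1 MiB floor applies; with 2 MiB clusters the 8-cluster term wins and the total becomes 16 MiB.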

-[Qemu-devel] [PULL 10/61] throttle: Update throttle-groups.c documentation
+Deleted patch
-From: Alberto Garcia <berto@igalia.com>
-There used to be throttle_timers_{detach,attach}_aio_context() calls
-in bdrv_set_aio_context(), but since 7ca7f0f6db1fedd28d490795d778cf239
-they are now in blk_set_aio_context().
-Signed-off-by: Alberto Garcia <berto@igalia.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
----
- block/throttle-groups.c | 2 +-
-file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/block/throttle-groups.c b/block/throttle-groups.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/throttle-groups.c
-+++ b/block/throttle-groups.c
-@@ -XXX,XX +XXX,XX @@
-  * Again, all this is handled internally and is mostly transparent to
-  * the outside. The 'throttle_timers' field however has an additional
-  * constraint because it may be temporarily invalid (see for example
-- * bdrv_set_aio_context()). Therefore in this file a thread will
-+ * blk_set_aio_context()). Therefore in this file a thread will
-  * access some other BlockBackend's timers only after verifying that
-  * that BlockBackend has throttled requests in the queue.
-  */
---
-.8.3.1

-[Qemu-devel] [PULL 12/61] migration: hold AioContext lock for loadvm qemu_fclose()
+Deleted patch
-From: Stefan Hajnoczi <stefanha@redhat.com>
-migration_incoming_state_destroy() uses qemu_fclose() on the vmstate
-file.  Make sure to call it inside an AioContext acquire/release region.
-This fixes a 'qemu: qemu_mutex_unlock: Operation not permitted' abort
-in loadvm.
-This patch closes the vmstate file before ending the drained region.
-Previously we closed the vmstate file after ending the drained region.
-The order does not matter.
-Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
----
- migration/savevm.c | 2 +-
-file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/migration/savevm.c b/migration/savevm.c
-index XXXXXXX..XXXXXXX 100644
---- a/migration/savevm.c
-+++ b/migration/savevm.c
-@@ -XXX,XX +XXX,XX @@ int load_snapshot(const char *name, Error **errp)
-     aio_context_acquire(aio_context);
-     ret = qemu_loadvm_state(f);
-+    migration_incoming_state_destroy();
-     aio_context_release(aio_context);
-     bdrv_drain_all_end();
--    migration_incoming_state_destroy();
-     if (ret < 0) {
-         error_setg(errp, "Error %d while loading VM state", ret);
-         return ret;
---
-.8.3.1

-[Qemu-devel] [PULL 13/61] qemu-iotests: 068: extract _qemu() function
+Deleted patch
-From: Stefan Hajnoczi <stefanha@redhat.com>
-Avoid duplicating the QEMU command-line.
-Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
----
- tests/qemu-iotests/068 | 15 +++++++++------
-file changed, 9 insertions(+), 6 deletions(-)
-diff --git a/tests/qemu-iotests/068 b/tests/qemu-iotests/068
-index XXXXXXX..XXXXXXX 100755
---- a/tests/qemu-iotests/068
-+++ b/tests/qemu-iotests/068
-@@ -XXX,XX +XXX,XX @@ case "$QEMU_DEFAULT_MACHINE" in
-       ;;
- esac
--# Give qemu some time to boot before saving the VM state
--bash -c 'sleep 1; echo -e "savevm 0\nquit"' |\
--    $QEMU $platform_parm -nographic -monitor stdio -serial none -hda "$TEST_IMG" |\
-+_qemu()
-+{
-+    $QEMU $platform_parm -nographic -monitor stdio -serial none -hda "$TEST_IMG" \
-+          "$@" |\
-     _filter_qemu | _filter_hmp
-+}
-+
-+# Give qemu some time to boot before saving the VM state
-+bash -c 'sleep 1; echo -e "savevm 0\nquit"' | _qemu
- # Now try to continue from that VM state (this should just work)
--echo quit |\
--    $QEMU $platform_parm -nographic -monitor stdio -serial none -hda "$TEST_IMG" -loadvm 0 |\
--    _filter_qemu | _filter_hmp
-+echo quit | _qemu -loadvm 0
- # success, all done
- echo "*** done"
---
-.8.3.1

-[Qemu-devel] [PULL 18/61] qcow2: Use unsigned int for both members of Qcow2COWRegion
+Deleted patch
-From: Alberto Garcia <berto@igalia.com>
-Qcow2COWRegion has two attributes:
-- The offset of the COW region from the start of the first cluster
-  touched by the I/O request. Since it's always going to be positive
-  and the maximum request size is at most INT_MAX, we can use a
-  regular unsigned int to store this offset.
-- The size of the COW region in bytes. This is guaranteed to be >= 0,
-  so we should use an unsigned type instead.
-On x86_64 this reduces the size of Qcow2COWRegion from 16 to 8 bytes.
-It will also help keep some assertions simpler now that we know that
-there are no negative numbers.
-The prototype of do_perform_cow() is also updated to reflect these
-changes.
-Signed-off-by: Alberto Garcia <berto@igalia.com>
-Reviewed-by: Eric Blake <eblake@redhat.com>
-Reviewed-by: Kevin Wolf <kwolf@redhat.com>
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
----
- block/qcow2-cluster.c | 4 ++--
- block/qcow2.h         | 4 ++--
-files changed, 4 insertions(+), 4 deletions(-)
-diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2-cluster.c
-+++ b/block/qcow2-cluster.c
-@@ -XXX,XX +XXX,XX @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
- static int coroutine_fn do_perform_cow(BlockDriverState *bs,
-                                        uint64_t src_cluster_offset,
-                                        uint64_t cluster_offset,
--                                       int offset_in_cluster,
--                                       int bytes)
-+                                       unsigned offset_in_cluster,
-+                                       unsigned bytes)
- {
-     BDRVQcow2State *s = bs->opaque;
-     QEMUIOVector qiov;
-diff --git a/block/qcow2.h b/block/qcow2.h
-index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2.h
-+++ b/block/qcow2.h
-@@ -XXX,XX +XXX,XX @@ typedef struct Qcow2COWRegion {
-      * Offset of the COW region in bytes from the start of the first cluster
-      * touched by the request.
-      */
--    uint64_t    offset;
-+    unsigned    offset;
-     /** Number of bytes to copy */
--    int         nb_bytes;
-+    unsigned    nb_bytes;
- } Qcow2COWRegion;
- /**
---
-.8.3.1
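
The size reduction claimed in the commit message above comes from alignment padding: a uint64_t member forces 8-byte alignment, so the old layout carries 4 bytes of tail padding after the int. A minimal sketch that makes this visible, assuming a typical x86_64 ABI with a 4-byte int; OldCOWRegion and NewCOWRegion are stand-in names for the layouts before and after the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Layout before the patch: 8 + 4 bytes, padded to 16 on x86_64. */
typedef struct OldCOWRegion {
    uint64_t offset;
    int      nb_bytes;
} OldCOWRegion;

/* Layout after the patch: two unsigned ints, no padding needed. */
typedef struct NewCOWRegion {
    unsigned offset;
    unsigned nb_bytes;
} NewCOWRegion;

/* Print both sizes; the exact numbers depend on the target ABI. */
static void print_cow_region_sizes(void)
{
    assert(sizeof(NewCOWRegion) <= sizeof(OldCOWRegion));
    printf("old: %zu bytes, new: %zu bytes\n",
           sizeof(OldCOWRegion), sizeof(NewCOWRegion));
}
```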

-[Qemu-devel] [PULL 22/61] qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()
+Deleted patch
-From: Alberto Garcia <berto@igalia.com>
-Instead of passing a single buffer pointer to do_perform_cow_write(),
-pass a QEMUIOVector. This will allow us to merge the write requests
-for the COW regions and the actual data into a single one.
-Although do_perform_cow_read() does not strictly need to change its
-API, we're doing it here as well for consistency.
-Signed-off-by: Alberto Garcia <berto@igalia.com>
-Reviewed-by: Kevin Wolf <kwolf@redhat.com>
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
----
- block/qcow2-cluster.c | 51 ++++++++++++++++++++++++---------------------------
-file changed, 24 insertions(+), 27 deletions(-)
-diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2-cluster.c
-+++ b/block/qcow2-cluster.c
-@@ -XXX,XX +XXX,XX @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
- static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
-                                             uint64_t src_cluster_offset,
-                                             unsigned offset_in_cluster,
--                                            uint8_t *buffer,
--                                            unsigned bytes)
-+                                            QEMUIOVector *qiov)
- {
--    QEMUIOVector qiov;
--    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
-     int ret;
--    if (bytes == 0) {
-+    if (qiov->size == 0) {
-         return 0;
-     }
--    qemu_iovec_init_external(&qiov, &iov, 1);
--
-     BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
-     if (!bs->drv) {
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
-      * which can lead to deadlock when block layer copy-on-read is enabled.
-      */
-     ret = bs->drv->bdrv_co_preadv(bs, src_cluster_offset + offset_in_cluster,
--                                  bytes, &qiov, 0);
-+                                  qiov->size, qiov, 0);
-     if (ret < 0) {
-         return ret;
-     }
-@@ -XXX,XX +XXX,XX @@ static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
- static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
-                                              uint64_t cluster_offset,
-                                              unsigned offset_in_cluster,
--                                             uint8_t *buffer,
--                                             unsigned bytes)
-+                                             QEMUIOVector *qiov)
- {
--    QEMUIOVector qiov;
--    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
-     int ret;
--    if (bytes == 0) {
-+    if (qiov->size == 0) {
-         return 0;
-     }
--    qemu_iovec_init_external(&qiov, &iov, 1);
--
-     ret = qcow2_pre_write_overlap_check(bs, 0,
--            cluster_offset + offset_in_cluster, bytes);
-+            cluster_offset + offset_in_cluster, qiov->size);
-     if (ret < 0) {
-         return ret;
-     }
-     BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
-     ret = bdrv_co_pwritev(bs->file, cluster_offset + offset_in_cluster,
--                          bytes, &qiov, 0);
-+                          qiov->size, qiov, 0);
-     if (ret < 0) {
-         return ret;
-     }
-@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
-     unsigned data_bytes = end->offset - (start->offset + start->nb_bytes);
-     bool merge_reads;
-     uint8_t *start_buffer, *end_buffer;
-+    QEMUIOVector qiov;
-     int ret;
-     assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
-@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
-     /* The part of the buffer where the end region is located */
-     end_buffer = start_buffer + buffer_size - end->nb_bytes;
-+    qemu_iovec_init(&qiov, 1);
-+
-     qemu_co_mutex_unlock(&s->lock);
-     /* First we read the existing data from both COW regions. We
-      * either read the whole region in one go, or the start and end
-      * regions separately. */
-     if (merge_reads) {
--        ret = do_perform_cow_read(bs, m->offset, start->offset,
--                                  start_buffer, buffer_size);
-+        qemu_iovec_add(&qiov, start_buffer, buffer_size);
-+        ret = do_perform_cow_read(bs, m->offset, start->offset, &qiov);
-     } else {
--        ret = do_perform_cow_read(bs, m->offset, start->offset,
--                                  start_buffer, start->nb_bytes);
-+        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
-+        ret = do_perform_cow_read(bs, m->offset, start->offset, &qiov);
-         if (ret < 0) {
-             goto fail;
-         }
--        ret = do_perform_cow_read(bs, m->offset, end->offset,
--                                  end_buffer, end->nb_bytes);
-+        qemu_iovec_reset(&qiov);
-+        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
-+        ret = do_perform_cow_read(bs, m->offset, end->offset, &qiov);
-     }
-     if (ret < 0) {
-         goto fail;
-@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
-     }
-     /* And now we can write everything */
--    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
--                               start_buffer, start->nb_bytes);
-+    qemu_iovec_reset(&qiov);
-+    qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
-+    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
-     if (ret < 0) {
-         goto fail;
-     }
--    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset,
--                               end_buffer, end->nb_bytes);
-+    qemu_iovec_reset(&qiov);
-+    qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
-+    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
- fail:
-     qemu_co_mutex_lock(&s->lock);
-@@ -XXX,XX +XXX,XX @@ fail:
-     }
-     qemu_vfree(start_buffer);
-+    qemu_iovec_destroy(&qiov);
-     return ret;
- }
---
-.8.3.1

-[Qemu-devel] [PULL 23/61] qcow2: Merge the writing of the COW regions with the guest data
+Deleted patch
-From: Alberto Garcia <berto@igalia.com>
-If the guest tries to write data that results in the allocation of a
-new cluster, instead of writing the guest data first and then the data
-from the COW regions, write everything together using one single I/O
-operation.
-This can improve the write performance by 25% or more, depending on
-several factors such as the media type, the cluster size and the I/O
-request size.
-Signed-off-by: Alberto Garcia <berto@igalia.com>
-Reviewed-by: Kevin Wolf <kwolf@redhat.com>
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
----
- block/qcow2-cluster.c | 40 ++++++++++++++++++++++++--------
- block/qcow2.c         | 64 +++++++++++++++++++++++++++++++++++++++++++--------
- block/qcow2.h         |  7 ++++++
-files changed, 91 insertions(+), 20 deletions(-)
-diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2-cluster.c
-+++ b/block/qcow2-cluster.c
-@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
-     assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
-     assert(start->nb_bytes + end->nb_bytes <= UINT_MAX - data_bytes);
-     assert(start->offset + start->nb_bytes <= end->offset);
-+    assert(!m->data_qiov || m->data_qiov->size == data_bytes);
-     if (start->nb_bytes == 0 && end->nb_bytes == 0) {
-         return 0;
-@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
-     /* The part of the buffer where the end region is located */
-     end_buffer = start_buffer + buffer_size - end->nb_bytes;
--    qemu_iovec_init(&qiov, 1);
-+    qemu_iovec_init(&qiov, 2 + (m->data_qiov ? m->data_qiov->niov : 0));
-     qemu_co_mutex_unlock(&s->lock);
-     /* First we read the existing data from both COW regions. We
-@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
-         }
-     }
--    /* And now we can write everything */
--    qemu_iovec_reset(&qiov);
--    qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
--    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
--    if (ret < 0) {
--        goto fail;
-+    /* And now we can write everything. If we have the guest data we
-+     * can write everything in one single operation */
-+    if (m->data_qiov) {
-+        qemu_iovec_reset(&qiov);
-+        if (start->nb_bytes) {
-+            qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
-+        }
-+        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
-+        if (end->nb_bytes) {
-+            qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
-+        }
-+        /* NOTE: we have a write_aio blkdebug event here followed by
-+         * a cow_write one in do_perform_cow_write(), but there's only
-+         * one single I/O operation */
-+        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-+        ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
-+    } else {
-+        /* If there's no guest data then write both COW regions separately */
-+        qemu_iovec_reset(&qiov);
-+        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
-+        ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
-+        if (ret < 0) {
-+            goto fail;
-+        }
-+
-+        qemu_iovec_reset(&qiov);
-+        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
-+        ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
-     }
--    qemu_iovec_reset(&qiov);
--    qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
--    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
- fail:
-     qemu_co_mutex_lock(&s->lock);
-diff --git a/block/qcow2.c b/block/qcow2.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2.c
-+++ b/block/qcow2.c
-@@ -XXX,XX +XXX,XX @@ fail:
-     return ret;
- }
-+/* Check if it's possible to merge a write request with the writing of
-+ * the data from the COW regions */
-+static bool merge_cow(uint64_t offset, unsigned bytes,
-+                      QEMUIOVector *hd_qiov, QCowL2Meta *l2meta)
-+{
-+    QCowL2Meta *m;
-+
-+    for (m = l2meta; m != NULL; m = m->next) {
-+        /* If both COW regions are empty then there's nothing to merge */
-+        if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
-+            continue;
-+        }
-+
-+        /* The data (middle) region must be immediately after the
-+         * start region */
-+        if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
-+            continue;
-+        }
-+
-+        /* The end region must be immediately after the data (middle)
-+         * region */
-+        if (m->offset + m->cow_end.offset != offset + bytes) {
-+            continue;
-+        }
-+
-+        /* Make sure that adding both COW regions to the QEMUIOVector
-+         * does not exceed IOV_MAX */
-+        if (hd_qiov->niov > IOV_MAX - 2) {
-+            continue;
-+        }
-+
-+        m->data_qiov = hd_qiov;
-+        return true;
-+    }
-+
-+    return false;
-+}
-+
- static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
-                                          uint64_t bytes, QEMUIOVector *qiov,
-                                          int flags)
-@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
-             goto fail;
-         }
--        qemu_co_mutex_unlock(&s->lock);
--        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
--        trace_qcow2_writev_data(qemu_coroutine_self(),
--                                cluster_offset + offset_in_cluster);
--        ret = bdrv_co_pwritev(bs->file,
--                              cluster_offset + offset_in_cluster,
--                              cur_bytes, &hd_qiov, 0);
--        qemu_co_mutex_lock(&s->lock);
--        if (ret < 0) {
--            goto fail;
-+        /* If we need to do COW, check if it's possible to merge the
-+         * writing of the guest data together with that of the COW regions.
-+         * If it's not possible (or not necessary) then write the
-+         * guest data now. */
-+        if (!merge_cow(offset, cur_bytes, &hd_qiov, l2meta)) {
-+            qemu_co_mutex_unlock(&s->lock);
-+            BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-+            trace_qcow2_writev_data(qemu_coroutine_self(),
-+                                    cluster_offset + offset_in_cluster);
-+            ret = bdrv_co_pwritev(bs->file,
-+                                  cluster_offset + offset_in_cluster,
-+                                  cur_bytes, &hd_qiov, 0);
-+            qemu_co_mutex_lock(&s->lock);
-+            if (ret < 0) {
-+                goto fail;
-+            }
-         }
-         while (l2meta != NULL) {
-diff --git a/block/qcow2.h b/block/qcow2.h
-index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2.h
-+++ b/block/qcow2.h
-@@ -XXX,XX +XXX,XX @@ typedef struct QCowL2Meta
-      */
-     Qcow2COWRegion cow_end;
-+    /**
-+     * The I/O vector with the data from the actual guest write request.
-+     * If non-NULL, this is meant to be merged together with the data
-+     * from @cow_start and @cow_end into one single write operation.
-+     */
-+    QEMUIOVector *data_qiov;
-+
-     /** Pointer to next L2Meta of the same write request */
-     struct QCowL2Meta *next;
---
-.8.3.1

-[Qemu-devel] [PULL 24/61] qcow2: Use offset_into_cluster() and offset_to_l2_index()
+Deleted patch
-From: Alberto Garcia <berto@igalia.com>
-We already have functions for doing these calculations, so let's use
-them instead of doing everything by hand. This makes the code a bit
-more readable.
-Signed-off-by: Alberto Garcia <berto@igalia.com>
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
----
- block/qcow2-cluster.c | 4 ++--
- block/qcow2.c         | 2 +-
-files changed, 3 insertions(+), 3 deletions(-)
-diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2-cluster.c
-+++ b/block/qcow2-cluster.c
-@@ -XXX,XX +XXX,XX @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
-     /* find the cluster offset for the given disk offset */
--    l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
-+    l2_index = offset_to_l2_index(s, offset);
-     *cluster_offset = be64_to_cpu(l2_table[l2_index]);
-     nb_clusters = size_to_clusters(s, bytes_needed);
-@@ -XXX,XX +XXX,XX @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
-     /* find the cluster offset for the given disk offset */
--    l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
-+    l2_index = offset_to_l2_index(s, offset);
-     *new_l2_table = l2_table;
-     *new_l2_index = l2_index;
-diff --git a/block/qcow2.c b/block/qcow2.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qcow2.c
-+++ b/block/qcow2.c
-@@ -XXX,XX +XXX,XX @@ static int validate_table_offset(BlockDriverState *bs, uint64_t offset,
-     }
-     /* Tables must be cluster aligned */
--    if (offset & (s->cluster_size - 1)) {
-+    if (offset_into_cluster(s, offset) != 0) {
-         return -EINVAL;
-     }
---
-.8.3.1
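
Editorial sketch: the open-coded expressions removed by the patch above suggest what the two helpers compute. The following is a minimal illustration only, not the QEMU definitions; `DemoState`, `demo_state()` and the `demo_` prefixed functions are hypothetical names, and the sketch assumes the cluster size and L2 table size are powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver state; only the fields used here. */
typedef struct DemoState {
    int cluster_bits;       /* log2 of the cluster size */
    uint64_t cluster_size;  /* 1 << cluster_bits */
    uint64_t l2_size;       /* entries per L2 table, a power of two */
} DemoState;

/* Build a demo state from the two exponents (illustrative helper). */
static inline DemoState demo_state(int cluster_bits, int l2_bits)
{
    assert(cluster_bits > 0 && cluster_bits < 32);
    assert(l2_bits > 0 && l2_bits < 32);
    DemoState s = {
        cluster_bits,
        UINT64_C(1) << cluster_bits,
        UINT64_C(1) << l2_bits,
    };
    return s;
}

/* Byte offset of a guest offset within its cluster. */
static inline uint64_t demo_offset_into_cluster(const DemoState *s,
                                                uint64_t offset)
{
    return offset & (s->cluster_size - 1);
}

/* Index of the L2 entry that covers a guest offset. */
static inline int demo_offset_to_l2_index(const DemoState *s, uint64_t offset)
{
    return (int)((offset >> s->cluster_bits) & (s->l2_size - 1));
}
```

With 64 KiB clusters, a table offset is cluster aligned exactly when `demo_offset_into_cluster()` returns 0, which is the check the last hunk of the patch expresses.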

-[Qemu-devel] [PULL 28/61] qed: Remove callback from qed_read_l2_table()
+Deleted patch
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Eric Blake <eblake@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed-cluster.c | 94 ++++++++++++++++++-----------------------------------
- block/qed-table.c   | 15 +++------
- block/qed.h         |  3 +-
-files changed, 36 insertions(+), 76 deletions(-)
-diff --git a/block/qed-cluster.c b/block/qed-cluster.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed-cluster.c
-+++ b/block/qed-cluster.c
-@@ -XXX,XX +XXX,XX @@ static unsigned int qed_count_contiguous_clusters(BDRVQEDState *s,
-     return i - index;
- }
--typedef struct {
--    BDRVQEDState *s;
--    uint64_t pos;
--    size_t len;
--
--    QEDRequest *request;
--
--    /* User callback */
--    QEDFindClusterFunc *cb;
--    void *opaque;
--} QEDFindClusterCB;
--
--static void qed_find_cluster_cb(void *opaque, int ret)
--{
--    QEDFindClusterCB *find_cluster_cb = opaque;
--    BDRVQEDState *s = find_cluster_cb->s;
--    QEDRequest *request = find_cluster_cb->request;
--    uint64_t offset = 0;
--    size_t len = 0;
--    unsigned int index;
--    unsigned int n;
--
--    qed_acquire(s);
--    if (ret) {
--        goto out;
--    }
--
--    index = qed_l2_index(s, find_cluster_cb->pos);
--    n = qed_bytes_to_clusters(s,
--                              qed_offset_into_cluster(s, find_cluster_cb->pos) +
--                              find_cluster_cb->len);
--    n = qed_count_contiguous_clusters(s, request->l2_table->table,
--                                      index, n, &offset);
--
--    if (qed_offset_is_unalloc_cluster(offset)) {
--        ret = QED_CLUSTER_L2;
--    } else if (qed_offset_is_zero_cluster(offset)) {
--        ret = QED_CLUSTER_ZERO;
--    } else if (qed_check_cluster_offset(s, offset)) {
--        ret = QED_CLUSTER_FOUND;
--    } else {
--        ret = -EINVAL;
--    }
--
--    len = MIN(find_cluster_cb->len, n * s->header.cluster_size -
--              qed_offset_into_cluster(s, find_cluster_cb->pos));
--
--out:
--    find_cluster_cb->cb(find_cluster_cb->opaque, ret, offset, len);
--    qed_release(s);
--    g_free(find_cluster_cb);
--}
--
- /**
-  * Find the offset of a data cluster
-  *
-@@ -XXX,XX +XXX,XX @@ out:
- void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
-                       size_t len, QEDFindClusterFunc *cb, void *opaque)
- {
--    QEDFindClusterCB *find_cluster_cb;
-     uint64_t l2_offset;
-+    uint64_t offset = 0;
-+    unsigned int index;
-+    unsigned int n;
-+    int ret;
-     /* Limit length to L2 boundary.  Requests are broken up at the L2 boundary
-      * so that a request acts on one L2 table at a time.
-@@ -XXX,XX +XXX,XX @@ void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
-         return;
-     }
--    find_cluster_cb = g_malloc(sizeof(*find_cluster_cb));
--    find_cluster_cb->s = s;
--    find_cluster_cb->pos = pos;
--    find_cluster_cb->len = len;
--    find_cluster_cb->cb = cb;
--    find_cluster_cb->opaque = opaque;
--    find_cluster_cb->request = request;
-+    ret = qed_read_l2_table(s, request, l2_offset);
-+    qed_acquire(s);
-+    if (ret) {
-+        goto out;
-+    }
-+
-+    index = qed_l2_index(s, pos);
-+    n = qed_bytes_to_clusters(s,
-+                              qed_offset_into_cluster(s, pos) + len);
-+    n = qed_count_contiguous_clusters(s, request->l2_table->table,
-+                                      index, n, &offset);
-+
-+    if (qed_offset_is_unalloc_cluster(offset)) {
-+        ret = QED_CLUSTER_L2;
-+    } else if (qed_offset_is_zero_cluster(offset)) {
-+        ret = QED_CLUSTER_ZERO;
-+    } else if (qed_check_cluster_offset(s, offset)) {
-+        ret = QED_CLUSTER_FOUND;
-+    } else {
-+        ret = -EINVAL;
-+    }
-+
-+    len = MIN(len,
-+              n * s->header.cluster_size - qed_offset_into_cluster(s, pos));
--    qed_read_l2_table(s, request, l2_offset,
--                      qed_find_cluster_cb, find_cluster_cb);
-+out:
-+    cb(opaque, ret, offset, len);
-+    qed_release(s);
- }
-diff --git a/block/qed-table.c b/block/qed-table.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed-table.c
-+++ b/block/qed-table.c
-@@ -XXX,XX +XXX,XX @@ int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
-     return ret;
- }
--void qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset,
--                       BlockCompletionFunc *cb, void *opaque)
-+int qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset)
- {
-     int ret;
-@@ -XXX,XX +XXX,XX @@ void qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset,
-     /* Check for cached L2 entry */
-     request->l2_table = qed_find_l2_cache_entry(&s->l2_cache, offset);
-     if (request->l2_table) {
--        cb(opaque, 0);
--        return;
-+        return 0;
-     }
-     request->l2_table = qed_alloc_l2_cache_entry(&s->l2_cache);
-@@ -XXX,XX +XXX,XX @@ void qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset,
-     }
-     qed_release(s);
--    cb(opaque, ret);
-+    return ret;
- }
- int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request, uint64_t offset)
- {
--    int ret = -EINPROGRESS;
--
--    qed_read_l2_table(s, request, offset, qed_sync_cb, &ret);
--    BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS);
--
--    return ret;
-+    return qed_read_l2_table(s, request, offset);
- }
- void qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
-diff --git a/block/qed.h b/block/qed.h
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.h
-+++ b/block/qed.h
-@@ -XXX,XX +XXX,XX @@ int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
-                             unsigned int n);
- int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
-                            uint64_t offset);
--void qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset,
--                       BlockCompletionFunc *cb, void *opaque);
-+int qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset);
- void qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
-                         unsigned int index, unsigned int n, bool flush,
-                         BlockCompletionFunc *cb, void *opaque);
---
-.8.3.1

-[Qemu-devel] [PULL 29/61] qed: Remove callback from qed_find_cluster()
+Deleted patch
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed-cluster.c | 39 ++++++++++++++++++++++-----------------
- block/qed.c         | 24 +++++++++++-------------
- block/qed.h         |  4 ++--
-files changed, 35 insertions(+), 32 deletions(-)
-diff --git a/block/qed-cluster.c b/block/qed-cluster.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed-cluster.c
-+++ b/block/qed-cluster.c
-@@ -XXX,XX +XXX,XX @@ static unsigned int qed_count_contiguous_clusters(BDRVQEDState *s,
-  * @s:          QED state
-  * @request:    L2 cache entry
-  * @pos:        Byte position in device
-- * @len:        Number of bytes
-- * @cb:         Completion function
-- * @opaque:     User data for completion function
-+ * @len:        Number of bytes (may be shortened on return)
-+ * @img_offset: Contains offset in the image file on success
-  *
-  * This function translates a position in the block device to an offset in the
-- * image file.  It invokes the cb completion callback to report back the
-- * translated offset or unallocated range in the image file.
-+ * image file. The translated offset or unallocated range in the image file is
-+ * reported back in *img_offset and *len.
-  *
-  * If the L2 table exists, request->l2_table points to the L2 table cache entry
-  * and the caller must free the reference when they are finished.  The cache
-  * entry is exposed in this way to avoid callers having to read the L2 table
-  * again later during request processing.  If request->l2_table is non-NULL it
-  * will be unreferenced before taking on the new cache entry.
-+ *
-+ * On success QED_CLUSTER_FOUND is returned and img_offset/len are a contiguous
-+ * range in the image file.
-+ *
-+ * On failure QED_CLUSTER_L2 or QED_CLUSTER_L1 is returned for missing L2 or L1
-+ * table offset, respectively. len is number of contiguous unallocated bytes.
-  */
--void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
--                      size_t len, QEDFindClusterFunc *cb, void *opaque)
-+int qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
-+                     size_t *len, uint64_t *img_offset)
- {
-     uint64_t l2_offset;
-     uint64_t offset = 0;
-@@ -XXX,XX +XXX,XX @@ void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
-     /* Limit length to L2 boundary.  Requests are broken up at the L2 boundary
-      * so that a request acts on one L2 table at a time.
-      */
--    len = MIN(len, (((pos >> s->l1_shift) + 1) << s->l1_shift) - pos);
-+    *len = MIN(*len, (((pos >> s->l1_shift) + 1) << s->l1_shift) - pos);
-     l2_offset = s->l1_table->offsets[qed_l1_index(s, pos)];
-     if (qed_offset_is_unalloc_cluster(l2_offset)) {
--        cb(opaque, QED_CLUSTER_L1, 0, len);
--        return;
-+        *img_offset = 0;
-+        return QED_CLUSTER_L1;
-     }
-     if (!qed_check_table_offset(s, l2_offset)) {
--        cb(opaque, -EINVAL, 0, 0);
--        return;
-+        *img_offset = *len = 0;
-+        return -EINVAL;
-     }
-     ret = qed_read_l2_table(s, request, l2_offset);
-@@ -XXX,XX +XXX,XX @@ void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
-     }
-     index = qed_l2_index(s, pos);
--    n = qed_bytes_to_clusters(s,
--                              qed_offset_into_cluster(s, pos) + len);
-+    n = qed_bytes_to_clusters(s, qed_offset_into_cluster(s, pos) + *len);
-     n = qed_count_contiguous_clusters(s, request->l2_table->table,
-                                       index, n, &offset);
-@@ -XXX,XX +XXX,XX @@ void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
-         ret = -EINVAL;
-     }
--    len = MIN(len,
--              n * s->header.cluster_size - qed_offset_into_cluster(s, pos));
-+    *len = MIN(*len,
-+               n * s->header.cluster_size - qed_offset_into_cluster(s, pos));
- out:
--    cb(opaque, ret, offset, len);
-+    *img_offset = offset;
-     qed_release(s);
-+    return ret;
- }
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
-         .file = file,
-     };
-     QEDRequest request = { .l2_table = NULL };
-+    uint64_t offset;
-+    int ret;
--    qed_find_cluster(s, &request, cb.pos, len, qed_is_allocated_cb, &cb);
-+    ret = qed_find_cluster(s, &request, cb.pos, &len, &offset);
-+    qed_is_allocated_cb(&cb, ret, offset, len);
--    /* Now sleep if the callback wasn't invoked immediately */
--    while (cb.status == BDRV_BLOCK_OFFSET_MASK) {
--        cb.co = qemu_coroutine_self();
--        qemu_coroutine_yield();
--    }
-+    /* The callback was invoked immediately */
-+    assert(cb.status != BDRV_BLOCK_OFFSET_MASK);
-     qed_unref_l2_cache_entry(request.l2_table);
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
-  *              or -errno
-  * @offset:     Cluster offset in bytes
-  * @len:        Length in bytes
-- *
-- * Callback from qed_find_cluster().
-  */
- static void qed_aio_write_data(void *opaque, int ret,
-                                uint64_t offset, size_t len)
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_data(void *opaque, int ret,
-  *              or -errno
-  * @offset:     Cluster offset in bytes
-  * @len:        Length in bytes
-- *
-- * Callback from qed_find_cluster().
-  */
- static void qed_aio_read_data(void *opaque, int ret,
-                               uint64_t offset, size_t len)
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb, int ret)
-     BDRVQEDState *s = acb_to_s(acb);
-     QEDFindClusterFunc *io_fn = (acb->flags & QED_AIOCB_WRITE) ?
-                                 qed_aio_write_data : qed_aio_read_data;
-+    uint64_t offset;
-+    size_t len;
-     trace_qed_aio_next_io(s, acb, ret, acb->cur_pos + acb->cur_qiov.size);
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb, int ret)
-     }
-     /* Find next cluster and start I/O */
--    qed_find_cluster(s, &acb->request,
--                      acb->cur_pos, acb->end_pos - acb->cur_pos,
--                      io_fn, acb);
-+    len = acb->end_pos - acb->cur_pos;
-+    ret = qed_find_cluster(s, &acb->request, acb->cur_pos, &len, &offset);
-+    io_fn(acb, ret, offset, len);
- }
- static BlockAIOCB *qed_aio_setup(BlockDriverState *bs,
-diff --git a/block/qed.h b/block/qed.h
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.h
-+++ b/block/qed.h
-@@ -XXX,XX +XXX,XX @@ int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
- /**
-  * Cluster functions
-  */
--void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
--                      size_t len, QEDFindClusterFunc *cb, void *opaque);
-+int qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
-+                     size_t *len, uint64_t *img_offset);
- /**
-  * Consistency check
---
-.8.3.1
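
Editorial sketch: the patch above replaces a completion callback with a return value plus out-parameters, and `len` becomes an in/out argument that may be shortened. The following toy function illustrates that calling convention together with the boundary clamp from the patch. `demo_find_range()` and `DEMO_MIN` are hypothetical names, and the identity mapping used for the image offset is a placeholder, not how QED translates offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Clamp *len so that the range starting at pos does not cross the next
 * (1 << l1_shift)-byte boundary, and report an image offset through
 * *img_offset. Returns 0 on success.
 */
static inline int demo_find_range(uint64_t pos, int l1_shift,
                                  size_t *len, uint64_t *img_offset)
{
    assert(len != NULL && img_offset != NULL);

    /* First byte position that belongs to the next table. */
    uint64_t boundary = ((pos >> l1_shift) + 1) << l1_shift;

    *len = (size_t)DEMO_MIN((uint64_t)*len, boundary - pos);
    *img_offset = pos; /* placeholder: identity mapping, illustration only */
    return 0;
}
```

A caller initialises `len` with the requested length, passes its address, and afterwards uses the possibly shortened value, much as `qed_aio_next_io()` does in the patch.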

-[Qemu-devel] [PULL 32/61] qed: Remove callback from qed_copy_from_backing_file()
+Deleted patch
-With this change, qed_aio_write_prefill() and qed_aio_write_postfill()
-collapse into a single function. This is reflected by a rename of the
-combined function to qed_aio_write_cow().
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Eric Blake <eblake@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 57 +++++++++++++++++++++++----------------------------------
-file changed, 23 insertions(+), 34 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static int qed_read_backing_file(BDRVQEDState *s, uint64_t pos,
-  * @pos:        Byte position in device
-  * @len:        Number of bytes
-  * @offset:     Byte offset in image file
-- * @cb:         Completion function
-- * @opaque:     User data for completion function
-  */
--static void qed_copy_from_backing_file(BDRVQEDState *s, uint64_t pos,
--                                       uint64_t len, uint64_t offset,
--                                       BlockCompletionFunc *cb,
--                                       void *opaque)
-+static int qed_copy_from_backing_file(BDRVQEDState *s, uint64_t pos,
-+                                      uint64_t len, uint64_t offset)
- {
-     QEMUIOVector qiov;
-     QEMUIOVector *backing_qiov = NULL;
-@@ -XXX,XX +XXX,XX @@ static void qed_copy_from_backing_file(BDRVQEDState *s, uint64_t pos,
-     /* Skip copy entirely if there is no work to do */
-     if (len == 0) {
--        cb(opaque, 0);
--        return;
-+        return 0;
-     }
-     iov = (struct iovec) {
-@@ -XXX,XX +XXX,XX @@ static void qed_copy_from_backing_file(BDRVQEDState *s, uint64_t pos,
-     ret = 0;
- out:
-     qemu_vfree(iov.iov_base);
--    cb(opaque, ret);
-+    return ret;
- }
- /**
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_main(void *opaque, int ret)
- }
- /**
-- * Populate back untouched region of new data cluster
-+ * Populate untouched regions of new data cluster
-  */
--static void qed_aio_write_postfill(void *opaque, int ret)
-+static void qed_aio_write_cow(void *opaque, int ret)
- {
-     QEDAIOCB *acb = opaque;
-     BDRVQEDState *s = acb_to_s(acb);
--    uint64_t start = acb->cur_pos + acb->cur_qiov.size;
--    uint64_t len =
--        qed_start_of_cluster(s, start + s->header.cluster_size - 1) - start;
--    uint64_t offset = acb->cur_cluster +
--                      qed_offset_into_cluster(s, acb->cur_pos) +
--                      acb->cur_qiov.size;
-+    uint64_t start, len, offset;
-+
-+    /* Populate front untouched region of new data cluster */
-+    start = qed_start_of_cluster(s, acb->cur_pos);
-+    len = qed_offset_into_cluster(s, acb->cur_pos);
-+    trace_qed_aio_write_prefill(s, acb, start, len, acb->cur_cluster);
-+    ret = qed_copy_from_backing_file(s, start, len, acb->cur_cluster);
-     if (ret) {
-         qed_aio_complete(acb, ret);
-         return;
-     }
--    trace_qed_aio_write_postfill(s, acb, start, len, offset);
--    qed_copy_from_backing_file(s, start, len, offset,
--                                qed_aio_write_main, acb);
--}
-+    /* Populate back untouched region of new data cluster */
-+    start = acb->cur_pos + acb->cur_qiov.size;
-+    len = qed_start_of_cluster(s, start + s->header.cluster_size - 1) - start;
-+    offset = acb->cur_cluster +
-+             qed_offset_into_cluster(s, acb->cur_pos) +
-+             acb->cur_qiov.size;
--/**
-- * Populate front untouched region of new data cluster
-- */
--static void qed_aio_write_prefill(void *opaque, int ret)
--{
--    QEDAIOCB *acb = opaque;
--    BDRVQEDState *s = acb_to_s(acb);
--    uint64_t start = qed_start_of_cluster(s, acb->cur_pos);
--    uint64_t len = qed_offset_into_cluster(s, acb->cur_pos);
-+    trace_qed_aio_write_postfill(s, acb, start, len, offset);
-+    ret = qed_copy_from_backing_file(s, start, len, offset);
--    trace_qed_aio_write_prefill(s, acb, start, len, acb->cur_cluster);
--    qed_copy_from_backing_file(s, start, len, acb->cur_cluster,
--                                qed_aio_write_postfill, acb);
-+    qed_aio_write_main(acb, ret);
- }
- /**
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
-         cb = qed_aio_write_zero_cluster;
-     } else {
--        cb = qed_aio_write_prefill;
-+        cb = qed_aio_write_cow;
-         acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
-     }
---
-.8.3.1
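
Editorial sketch: the combined function in the patch above computes two untouched regions of a newly allocated cluster, one in front of the guest data and one behind it. The arithmetic can be tried in isolation as follows. `DemoRegion` and the `demo_` functions are hypothetical, and `demo_start_of_cluster()` assumes a power-of-two cluster size and rounds down by masking, which is an assumption about what `qed_start_of_cluster()` does.

```c
#include <assert.h>
#include <stdint.h>

typedef struct DemoRegion {
    uint64_t start; /* byte position in the device */
    uint64_t len;   /* number of bytes to copy from the backing file */
} DemoRegion;

/* Round pos down to the start of its cluster (power-of-two size assumed). */
static inline uint64_t demo_start_of_cluster(uint64_t cluster_size,
                                             uint64_t pos)
{
    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);
    return pos & ~(cluster_size - 1);
}

/* Region between the start of the cluster and the first written byte. */
static inline DemoRegion demo_front_region(uint64_t cluster_size, uint64_t pos)
{
    DemoRegion r;
    r.start = demo_start_of_cluster(cluster_size, pos);
    r.len = pos - r.start;
    return r;
}

/* Region between the last written byte and the end of its cluster. */
static inline DemoRegion demo_back_region(uint64_t cluster_size, uint64_t pos,
                                          uint64_t size)
{
    DemoRegion r;
    r.start = pos + size;
    r.len = demo_start_of_cluster(cluster_size,
                                  r.start + cluster_size - 1) - r.start;
    return r;
}
```

For a 1000-byte write at position 70000 with 64 KiB clusters, the front region is 4464 bytes and the back region is 60072 bytes; together with the written data they cover exactly one cluster.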

-[Qemu-devel] [PULL 34/61] qed: Remove callback from qed_write_header()
+Deleted patch
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 32 ++++++++++++--------------------
-file changed, 12 insertions(+), 20 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ int qed_write_header_sync(BDRVQEDState *s)
-  * This function only updates known header fields in-place and does not affect
-  * extra data after the QED header.
-  */
--static void qed_write_header(BDRVQEDState *s, BlockCompletionFunc cb,
--                             void *opaque)
-+static int qed_write_header(BDRVQEDState *s)
- {
-     /* We must write full sectors for O_DIRECT but cannot necessarily generate
-      * the data following the header if an unrecognized compat feature is
-@@ -XXX,XX +XXX,XX @@ static void qed_write_header(BDRVQEDState *s, BlockCompletionFunc cb,
-     ret = 0;
- out:
-     qemu_vfree(buf);
--    cb(opaque, ret);
-+    return ret;
- }
- static uint64_t qed_max_image_size(uint32_t cluster_size, uint32_t table_size)
-@@ -XXX,XX +XXX,XX @@ static void qed_unplug_allocating_write_reqs(BDRVQEDState *s)
-     }
- }
--static void qed_finish_clear_need_check(void *opaque, int ret)
--{
--    /* Do nothing */
--}
--
--static void qed_flush_after_clear_need_check(void *opaque, int ret)
--{
--    BDRVQEDState *s = opaque;
--
--    bdrv_aio_flush(s->bs, qed_finish_clear_need_check, s);
--
--    /* No need to wait until flush completes */
--    qed_unplug_allocating_write_reqs(s);
--}
--
- static void qed_clear_need_check(void *opaque, int ret)
- {
-     BDRVQEDState *s = opaque;
-@@ -XXX,XX +XXX,XX @@ static void qed_clear_need_check(void *opaque, int ret)
-     }
-     s->header.features &= ~QED_F_NEED_CHECK;
--    qed_write_header(s, qed_flush_after_clear_need_check, s);
-+    ret = qed_write_header(s);
-+    (void) ret;
-+
-+    qed_unplug_allocating_write_reqs(s);
-+
-+    ret = bdrv_flush(s->bs);
-+    (void) ret;
- }
- static void qed_need_check_timer_cb(void *opaque)
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
- {
-     BDRVQEDState *s = acb_to_s(acb);
-     BlockCompletionFunc *cb;
-+    int ret;
-     /* Cancel timer when the first allocating request comes in */
-     if (QSIMPLEQ_EMPTY(&s->allocating_write_reqs)) {
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
-     if (qed_should_set_need_check(s)) {
-         s->header.features |= QED_F_NEED_CHECK;
--        qed_write_header(s, cb, acb);
-+        ret = qed_write_header(s);
-+        cb(acb, ret);
-     } else {
-         cb(acb, 0);
-     }
---
-.8.3.1

-[Qemu-devel] [PULL 37/61] qed: Remove callback from qed_write_table()
+Deleted patch
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed-table.c | 47 ++++++++++++-----------------------------------
- block/qed.c       | 12 +++++++-----
- block/qed.h       |  8 +++-----
-files changed, 22 insertions(+), 45 deletions(-)
-diff --git a/block/qed-table.c b/block/qed-table.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed-table.c
-+++ b/block/qed-table.c
-@@ -XXX,XX +XXX,XX @@ out:
-  * @index:      Index of first element
-  * @n:          Number of elements
-  * @flush:      Whether or not to sync to disk
-- * @cb:         Completion function
-- * @opaque:     Argument for completion function
-  */
--static void qed_write_table(BDRVQEDState *s, uint64_t offset, QEDTable *table,
--                            unsigned int index, unsigned int n, bool flush,
--                            BlockCompletionFunc *cb, void *opaque)
-+static int qed_write_table(BDRVQEDState *s, uint64_t offset, QEDTable *table,
-+                           unsigned int index, unsigned int n, bool flush)
- {
-     unsigned int sector_mask = BDRV_SECTOR_SIZE / sizeof(uint64_t) - 1;
-     unsigned int start, end, i;
-@@ -XXX,XX +XXX,XX @@ static void qed_write_table(BDRVQEDState *s, uint64_t offset, QEDTable *table,
-     ret = 0;
- out:
-     qemu_vfree(new_table);
--    cb(opaque, ret);
--}
--
--/**
-- * Propagate return value from async callback
-- */
--static void qed_sync_cb(void *opaque, int ret)
--{
--    *(int *)opaque = ret;
-+    return ret;
- }
- int qed_read_l1_table_sync(BDRVQEDState *s)
-@@ -XXX,XX +XXX,XX @@ int qed_read_l1_table_sync(BDRVQEDState *s)
-     return qed_read_table(s, s->header.l1_table_offset, s->l1_table);
- }
--void qed_write_l1_table(BDRVQEDState *s, unsigned int index, unsigned int n,
--                        BlockCompletionFunc *cb, void *opaque)
-+int qed_write_l1_table(BDRVQEDState *s, unsigned int index, unsigned int n)
- {
-     BLKDBG_EVENT(s->bs->file, BLKDBG_L1_UPDATE);
--    qed_write_table(s, s->header.l1_table_offset,
--                    s->l1_table, index, n, false, cb, opaque);
-+    return qed_write_table(s, s->header.l1_table_offset,
-+                           s->l1_table, index, n, false);
- }
- int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
-                             unsigned int n)
- {
--    int ret = -EINPROGRESS;
--
--    qed_write_l1_table(s, index, n, qed_sync_cb, &ret);
--    BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS);
--
--    return ret;
-+    return qed_write_l1_table(s, index, n);
- }
- int qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset)
-@@ -XXX,XX +XXX,XX @@ int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request, uint64_t offset
-     return qed_read_l2_table(s, request, offset);
- }
--void qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
--                        unsigned int index, unsigned int n, bool flush,
--                        BlockCompletionFunc *cb, void *opaque)
-+int qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
-+                       unsigned int index, unsigned int n, bool flush)
- {
-     BLKDBG_EVENT(s->bs->file, BLKDBG_L2_UPDATE);
--    qed_write_table(s, request->l2_table->offset,
--                    request->l2_table->table, index, n, flush, cb, opaque);
-+    return qed_write_table(s, request->l2_table->offset,
-+                           request->l2_table->table, index, n, flush);
- }
- int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
-                             unsigned int index, unsigned int n, bool flush)
- {
--    int ret = -EINPROGRESS;
--
--    qed_write_l2_table(s, request, index, n, flush, qed_sync_cb, &ret);
--    BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS);
--
--    return ret;
-+    return qed_write_l2_table(s, request, index, n, flush);
- }
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_l1_update(void *opaque, int ret)
-     index = qed_l1_index(s, acb->cur_pos);
-     s->l1_table->offsets[index] = acb->request.l2_table->offset;
--    qed_write_l1_table(s, index, 1, qed_commit_l2_update, acb);
-+    ret = qed_write_l1_table(s, index, 1);
-+    qed_commit_l2_update(acb, ret);
- }
- /**
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
-     if (need_alloc) {
-         /* Write out the whole new L2 table */
--        qed_write_l2_table(s, &acb->request, 0, s->table_nelems, true,
--                           qed_aio_write_l1_update, acb);
-+        ret = qed_write_l2_table(s, &acb->request, 0, s->table_nelems, true);
-+        qed_aio_write_l1_update(acb, ret);
-     } else {
-         /* Write out only the updated part of the L2 table */
--        qed_write_l2_table(s, &acb->request, index, acb->cur_nclusters, false,
--                           qed_aio_next_io_cb, acb);
-+        ret = qed_write_l2_table(s, &acb->request, index, acb->cur_nclusters,
-+                                 false);
-+        qed_aio_next_io(acb, ret);
-     }
-     return;
-diff --git a/block/qed.h b/block/qed.h
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.h
-+++ b/block/qed.h
-@@ -XXX,XX +XXX,XX @@ void qed_commit_l2_cache_entry(L2TableCache *l2_cache, CachedL2Table *l2_table);
-  * Table I/O functions
-  */
- int qed_read_l1_table_sync(BDRVQEDState *s);
--void qed_write_l1_table(BDRVQEDState *s, unsigned int index, unsigned int n,
--                        BlockCompletionFunc *cb, void *opaque);
-+int qed_write_l1_table(BDRVQEDState *s, unsigned int index, unsigned int n);
- int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
-                             unsigned int n);
- int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
-                            uint64_t offset);
- int qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset);
--void qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
--                        unsigned int index, unsigned int n, bool flush,
--                        BlockCompletionFunc *cb, void *opaque);
-+int qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
-+                       unsigned int index, unsigned int n, bool flush);
- int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
-                             unsigned int index, unsigned int n, bool flush);
---
-.8.3.1

-[Qemu-devel] [PULL 39/61] qed: Make qed_aio_write_main() synchronous
+Deleted patch
-Note that this code is generally not running in coroutine context, so
-this is an actual blocking synchronous operation. We'll fix this in a
-moment.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 61 +++++++++++++++++++------------------------------------------
-file changed, 19 insertions(+), 42 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_start_io(QEDAIOCB *acb)
-     qed_aio_next_io(acb, 0);
- }
--static void qed_aio_next_io_cb(void *opaque, int ret)
--{
--    QEDAIOCB *acb = opaque;
--
--    qed_aio_next_io(acb, ret);
--}
--
- static void qed_plug_allocating_write_reqs(BDRVQEDState *s)
- {
-     assert(!s->allocating_write_reqs_plugged);
-@@ -XXX,XX +XXX,XX @@ err:
-     qed_aio_complete(acb, ret);
- }
--static void qed_aio_write_l2_update_cb(void *opaque, int ret)
--{
--    QEDAIOCB *acb = opaque;
--    qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
--}
--
--/**
-- * Flush new data clusters before updating the L2 table
-- *
-- * This flush is necessary when a backing file is in use.  A crash during an
-- * allocating write could result in empty clusters in the image.  If the write
-- * only touched a subregion of the cluster, then backing image sectors have
-- * been lost in the untouched region.  The solution is to flush after writing a
-- * new data cluster and before updating the L2 table.
-- */
--static void qed_aio_write_flush_before_l2_update(void *opaque, int ret)
--{
--    QEDAIOCB *acb = opaque;
--    BDRVQEDState *s = acb_to_s(acb);
--
--    if (!bdrv_aio_flush(s->bs->file->bs, qed_aio_write_l2_update_cb, opaque)) {
--        qed_aio_complete(acb, -EIO);
--    }
--}
--
- /**
-  * Write data to the image file
-  */
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_main(void *opaque, int ret)
-     BDRVQEDState *s = acb_to_s(acb);
-     uint64_t offset = acb->cur_cluster +
-                       qed_offset_into_cluster(s, acb->cur_pos);
--    BlockCompletionFunc *next_fn;
-     trace_qed_aio_write_main(s, acb, ret, offset, acb->cur_qiov.size);
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_main(void *opaque, int ret)
-         return;
-     }
-+    BLKDBG_EVENT(s->bs->file, BLKDBG_WRITE_AIO);
-+    ret = bdrv_pwritev(s->bs->file, offset, &acb->cur_qiov);
-+    if (ret >= 0) {
-+        ret = 0;
-+    }
-+
-     if (acb->find_cluster_ret == QED_CLUSTER_FOUND) {
--        next_fn = qed_aio_next_io_cb;
-+        qed_aio_next_io(acb, ret);
-     } else {
-         if (s->bs->backing) {
--            next_fn = qed_aio_write_flush_before_l2_update;
--        } else {
--            next_fn = qed_aio_write_l2_update_cb;
-+            /*
-+             * Flush new data clusters before updating the L2 table
-+             *
-+             * This flush is necessary when a backing file is in use.  A crash
-+             * during an allocating write could result in empty clusters in the
-+             * image.  If the write only touched a subregion of the cluster,
-+             * then backing image sectors have been lost in the untouched
-+             * region.  The solution is to flush after writing a new data
-+             * cluster and before updating the L2 table.
-+             */
-+            ret = bdrv_flush(s->bs->file->bs);
-         }
-+        qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
-     }
--
--    BLKDBG_EVENT(s->bs->file, BLKDBG_WRITE_AIO);
--    bdrv_aio_writev(s->bs->file, offset / BDRV_SECTOR_SIZE,
--                    &acb->cur_qiov, acb->cur_qiov.size / BDRV_SECTOR_SIZE,
--                    next_fn, acb);
- }
- /**
---
-.8.3.1

-[Qemu-devel] [PULL 40/61] qed: Inline qed_commit_l2_update()
+Deleted patch
-qed_commit_l2_update() is unconditionally called at the end of
-qed_aio_write_l1_update(). Inline it.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 36 ++++++++++++++----------------------
- 1 file changed, 14 insertions(+), 22 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
- }
- /**
-- * Commit the current L2 table to the cache
-+ * Update L1 table with new L2 table offset and write it out
-  */
--static void qed_commit_l2_update(void *opaque, int ret)
-+static void qed_aio_write_l1_update(void *opaque, int ret)
- {
-     QEDAIOCB *acb = opaque;
-     BDRVQEDState *s = acb_to_s(acb);
-     CachedL2Table *l2_table = acb->request.l2_table;
-     uint64_t l2_offset = l2_table->offset;
-+    int index;
-+
-+    if (ret) {
-+        qed_aio_complete(acb, ret);
-+        return;
-+    }
-+    index = qed_l1_index(s, acb->cur_pos);
-+    s->l1_table->offsets[index] = l2_table->offset;
-+
-+    ret = qed_write_l1_table(s, index, 1);
-+
-+    /* Commit the current L2 table to the cache */
-     qed_commit_l2_cache_entry(&s->l2_cache, l2_table);
-     /* This is guaranteed to succeed because we just committed the entry to the
-@@ -XXX,XX +XXX,XX @@ static void qed_commit_l2_update(void *opaque, int ret)
-     qed_aio_next_io(acb, ret);
- }
--/**
-- * Update L1 table with new L2 table offset and write it out
-- */
--static void qed_aio_write_l1_update(void *opaque, int ret)
--{
--    QEDAIOCB *acb = opaque;
--    BDRVQEDState *s = acb_to_s(acb);
--    int index;
--
--    if (ret) {
--        qed_aio_complete(acb, ret);
--        return;
--    }
--
--    index = qed_l1_index(s, acb->cur_pos);
--    s->l1_table->offsets[index] = acb->request.l2_table->offset;
--
--    ret = qed_write_l1_table(s, index, 1);
--    qed_commit_l2_update(acb, ret);
--}
- /**
-  * Update L2 table with new cluster offsets and write them out
---
-.8.3.1

-[Qemu-devel] [PULL 41/61] qed: Add return value to qed_aio_write_l1_update()
+Deleted patch
-Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
-just return an error code and let the caller handle it.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 19 +++++++++----------
- 1 file changed, 9 insertions(+), 10 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
- /**
-  * Update L1 table with new L2 table offset and write it out
-  */
--static void qed_aio_write_l1_update(void *opaque, int ret)
-+static int qed_aio_write_l1_update(QEDAIOCB *acb)
- {
--    QEDAIOCB *acb = opaque;
-     BDRVQEDState *s = acb_to_s(acb);
-     CachedL2Table *l2_table = acb->request.l2_table;
-     uint64_t l2_offset = l2_table->offset;
--    int index;
--
--    if (ret) {
--        qed_aio_complete(acb, ret);
--        return;
--    }
-+    int index, ret;
-     index = qed_l1_index(s, acb->cur_pos);
-     s->l1_table->offsets[index] = l2_table->offset;
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_l1_update(void *opaque, int ret)
-     acb->request.l2_table = qed_find_l2_cache_entry(&s->l2_cache, l2_offset);
-     assert(acb->request.l2_table != NULL);
--    qed_aio_next_io(acb, ret);
-+    return ret;
- }
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
-     if (need_alloc) {
-         /* Write out the whole new L2 table */
-         ret = qed_write_l2_table(s, &acb->request, 0, s->table_nelems, true);
--        qed_aio_write_l1_update(acb, ret);
-+        if (ret) {
-+            goto err;
-+        }
-+        ret = qed_aio_write_l1_update(acb);
-+        qed_aio_next_io(acb, ret);
-+
-     } else {
-         /* Write out only the updated part of the L2 table */
-         ret = qed_write_l2_table(s, &acb->request, index, acb->cur_nclusters,
---
-.8.3.1

-[Qemu-devel] [PULL 42/61] qed: Add return value to qed_aio_write_l2_update()
+Deleted patch
-Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
-just return an error code and let the caller handle it.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 43 ++++++++++++++++++++++++++-----------------
- 1 file changed, 26 insertions(+), 17 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_l1_update(QEDAIOCB *acb)
- /**
-  * Update L2 table with new cluster offsets and write them out
-  */
--static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
-+static int qed_aio_write_l2_update(QEDAIOCB *acb, uint64_t offset)
- {
-     BDRVQEDState *s = acb_to_s(acb);
-     bool need_alloc = acb->find_cluster_ret == QED_CLUSTER_L1;
--    int index;
--
--    if (ret) {
--        goto err;
--    }
-+    int index, ret;
-     if (need_alloc) {
-         qed_unref_l2_cache_entry(acb->request.l2_table);
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
-         /* Write out the whole new L2 table */
-         ret = qed_write_l2_table(s, &acb->request, 0, s->table_nelems, true);
-         if (ret) {
--            goto err;
-+            return ret;
-         }
--        ret = qed_aio_write_l1_update(acb);
--        qed_aio_next_io(acb, ret);
--
-+        return qed_aio_write_l1_update(acb);
-     } else {
-         /* Write out only the updated part of the L2 table */
-         ret = qed_write_l2_table(s, &acb->request, index, acb->cur_nclusters,
-                                  false);
--        qed_aio_next_io(acb, ret);
-+        if (ret) {
-+            return ret;
-+        }
-     }
--    return;
--
--err:
--    qed_aio_complete(acb, ret);
-+    return 0;
- }
- /**
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_main(void *opaque, int ret)
-              */
-             ret = bdrv_flush(s->bs->file->bs);
-         }
--        qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
-+        if (ret) {
-+            goto err;
-+        }
-+        ret = qed_aio_write_l2_update(acb, acb->cur_cluster);
-+        if (ret) {
-+            goto err;
-+        }
-+        qed_aio_next_io(acb, 0);
-     }
-+    return;
-+
-+err:
-+    qed_aio_complete(acb, ret);
- }
- /**
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_zero_cluster(void *opaque, int ret)
-         return;
-     }
--    qed_aio_write_l2_update(acb, 0, 1);
-+    ret = qed_aio_write_l2_update(acb, 1);
-+    if (ret < 0) {
-+        qed_aio_complete(acb, ret);
-+        return;
-+    }
-+    qed_aio_next_io(acb, 0);
- }
- /**
---
-.8.3.1

-[Qemu-devel] [PULL 44/61] qed: Add return value to qed_aio_write_cow()
+Deleted patch
-Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
-just return an error code and let the caller handle it.
-While refactoring qed_aio_write_alloc() to accommodate the change,
-qed_aio_write_zero_cluster() ended up with a single line, so I chose to
-inline that line and remove the function completely.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 58 +++++++++++++++++++++-------------------------------------
- 1 file changed, 21 insertions(+), 37 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_main(QEDAIOCB *acb)
- /**
-  * Populate untouched regions of new data cluster
-  */
--static void qed_aio_write_cow(void *opaque, int ret)
-+static int qed_aio_write_cow(QEDAIOCB *acb)
- {
--    QEDAIOCB *acb = opaque;
-     BDRVQEDState *s = acb_to_s(acb);
-     uint64_t start, len, offset;
-+    int ret;
-     /* Populate front untouched region of new data cluster */
-     start = qed_start_of_cluster(s, acb->cur_pos);
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_cow(void *opaque, int ret)
-     trace_qed_aio_write_prefill(s, acb, start, len, acb->cur_cluster);
-     ret = qed_copy_from_backing_file(s, start, len, acb->cur_cluster);
--    if (ret) {
--        qed_aio_complete(acb, ret);
--        return;
-+    if (ret < 0) {
-+        return ret;
-     }
-     /* Populate back untouched region of new data cluster */
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_cow(void *opaque, int ret)
-     trace_qed_aio_write_postfill(s, acb, start, len, offset);
-     ret = qed_copy_from_backing_file(s, start, len, offset);
--    if (ret) {
--        qed_aio_complete(acb, ret);
--        return;
--    }
--
--    ret = qed_aio_write_main(acb);
-     if (ret < 0) {
--        qed_aio_complete(acb, ret);
--        return;
-+        return ret;
-     }
--    qed_aio_next_io(acb, 0);
-+
-+    return qed_aio_write_main(acb);
- }
- /**
-@@ -XXX,XX +XXX,XX @@ static bool qed_should_set_need_check(BDRVQEDState *s)
-     return !(s->header.features & QED_F_NEED_CHECK);
- }
--static void qed_aio_write_zero_cluster(void *opaque, int ret)
--{
--    QEDAIOCB *acb = opaque;
--
--    if (ret) {
--        qed_aio_complete(acb, ret);
--        return;
--    }
--
--    ret = qed_aio_write_l2_update(acb, 1);
--    if (ret < 0) {
--        qed_aio_complete(acb, ret);
--        return;
--    }
--    qed_aio_next_io(acb, 0);
--}
--
- /**
-  * Write new data cluster
-  *
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_zero_cluster(void *opaque, int ret)
- static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
- {
-     BDRVQEDState *s = acb_to_s(acb);
--    BlockCompletionFunc *cb;
-     int ret;
-     /* Cancel timer when the first allocating request comes in */
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
-             qed_aio_start_io(acb);
-             return;
-         }
--
--        cb = qed_aio_write_zero_cluster;
-     } else {
--        cb = qed_aio_write_cow;
-         acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
-     }
-     if (qed_should_set_need_check(s)) {
-         s->header.features |= QED_F_NEED_CHECK;
-         ret = qed_write_header(s);
--        cb(acb, ret);
-+        if (ret < 0) {
-+            qed_aio_complete(acb, ret);
-+            return;
-+        }
-+    }
-+
-+    if (acb->flags & QED_AIOCB_ZERO) {
-+        ret = qed_aio_write_l2_update(acb, 1);
-     } else {
--        cb(acb, 0);
-+        ret = qed_aio_write_cow(acb);
-     }
-+    if (ret < 0) {
-+        qed_aio_complete(acb, ret);
-+        return;
-+    }
-+    qed_aio_next_io(acb, 0);
- }
- /**
---
-.8.3.1

-[Qemu-devel] [PULL 45/61] qed: Add return value to qed_aio_write_inplace/alloc()
+Deleted patch
-Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
-just return an error code and let the caller handle it.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 43 ++++++++++++++++++++-----------------------
- 1 file changed, 20 insertions(+), 23 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static bool qed_should_set_need_check(BDRVQEDState *s)
-  *
-  * This path is taken when writing to previously unallocated clusters.
-  */
--static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
-+static int qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
- {
-     BDRVQEDState *s = acb_to_s(acb);
-     int ret;
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
-     }
-     if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs) ||
-         s->allocating_write_reqs_plugged) {
--        return; /* wait for existing request to finish */
-+        return -EINPROGRESS; /* wait for existing request to finish */
-     }
-     acb->cur_nclusters = qed_bytes_to_clusters(s,
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
-     if (acb->flags & QED_AIOCB_ZERO) {
-         /* Skip ahead if the clusters are already zero */
-         if (acb->find_cluster_ret == QED_CLUSTER_ZERO) {
--            qed_aio_start_io(acb);
--            return;
-+            return 0;
-         }
-     } else {
-         acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
-         s->header.features |= QED_F_NEED_CHECK;
-         ret = qed_write_header(s);
-         if (ret < 0) {
--            qed_aio_complete(acb, ret);
--            return;
-+            return ret;
-         }
-     }
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
-         ret = qed_aio_write_cow(acb);
-     }
-     if (ret < 0) {
--        qed_aio_complete(acb, ret);
--        return;
-+        return ret;
-     }
--    qed_aio_next_io(acb, 0);
-+    return 0;
- }
- /**
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
-  *
-  * This path is taken when writing to already allocated clusters.
-  */
--static void qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
-+static int qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
- {
--    int ret;
--
-     /* Allocate buffer for zero writes */
-     if (acb->flags & QED_AIOCB_ZERO) {
-         struct iovec *iov = acb->qiov->iov;
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
-         if (!iov->iov_base) {
-             iov->iov_base = qemu_try_blockalign(acb->common.bs, iov->iov_len);
-             if (iov->iov_base == NULL) {
--                qed_aio_complete(acb, -ENOMEM);
--                return;
-+                return -ENOMEM;
-             }
-             memset(iov->iov_base, 0, iov->iov_len);
-         }
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
-     qemu_iovec_concat(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
-     /* Do the actual write */
--    ret = qed_aio_write_main(acb);
--    if (ret < 0) {
--        qed_aio_complete(acb, ret);
--        return;
--    }
--    qed_aio_next_io(acb, 0);
-+    return qed_aio_write_main(acb);
- }
- /**
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_data(void *opaque, int ret,
-     switch (ret) {
-     case QED_CLUSTER_FOUND:
--        qed_aio_write_inplace(acb, offset, len);
-+        ret = qed_aio_write_inplace(acb, offset, len);
-         break;
-     case QED_CLUSTER_L2:
-     case QED_CLUSTER_L1:
-     case QED_CLUSTER_ZERO:
--        qed_aio_write_alloc(acb, len);
-+        ret = qed_aio_write_alloc(acb, len);
-         break;
-     default:
--        qed_aio_complete(acb, ret);
-+        assert(ret < 0);
-         break;
-     }
-+
-+    if (ret < 0) {
-+        if (ret != -EINPROGRESS) {
-+            qed_aio_complete(acb, ret);
-+        }
-+        return;
-+    }
-+    qed_aio_next_io(acb, 0);
- }
- /**
---
-.8.3.1

-[Qemu-devel] [PULL 47/61] qed: Remove ret argument from qed_aio_next_io()
+Deleted patch
-All callers pass ret = 0, so we can just remove it.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 17 ++++++-----------
- 1 file changed, 6 insertions(+), 11 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static CachedL2Table *qed_new_l2_table(BDRVQEDState *s)
-     return l2_table;
- }
--static void qed_aio_next_io(QEDAIOCB *acb, int ret);
-+static void qed_aio_next_io(QEDAIOCB *acb);
- static void qed_aio_start_io(QEDAIOCB *acb)
- {
--    qed_aio_next_io(acb, 0);
-+    qed_aio_next_io(acb);
- }
- static void qed_plug_allocating_write_reqs(BDRVQEDState *s)
-@@ -XXX,XX +XXX,XX @@ static int qed_aio_read_data(void *opaque, int ret, uint64_t offset, size_t len)
- /**
-  * Begin next I/O or complete the request
-  */
--static void qed_aio_next_io(QEDAIOCB *acb, int ret)
-+static void qed_aio_next_io(QEDAIOCB *acb)
- {
-     BDRVQEDState *s = acb_to_s(acb);
-     uint64_t offset;
-     size_t len;
-+    int ret;
--    trace_qed_aio_next_io(s, acb, ret, acb->cur_pos + acb->cur_qiov.size);
-+    trace_qed_aio_next_io(s, acb, 0, acb->cur_pos + acb->cur_qiov.size);
-     if (acb->backing_qiov) {
-         qemu_iovec_destroy(acb->backing_qiov);
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb, int ret)
-         acb->backing_qiov = NULL;
-     }
--    /* Handle I/O error */
--    if (ret) {
--        qed_aio_complete(acb, ret);
--        return;
--    }
--
-     acb->qiov_offset += acb->cur_qiov.size;
-     acb->cur_pos += acb->cur_qiov.size;
-     qemu_iovec_reset(&acb->cur_qiov);
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb, int ret)
-         }
-         return;
-     }
--    qed_aio_next_io(acb, 0);
-+    qed_aio_next_io(acb);
- }
- static BlockAIOCB *qed_aio_setup(BlockDriverState *bs,
---
-.8.3.1

-[Qemu-devel] [PULL 49/61] qed: Implement .bdrv_co_readv/writev
+Deleted patch
-Most of the qed code is now synchronous and matches the coroutine model.
-One notable exception is the serialisation between requests which can
-still schedule a callback. Before we can replace this with coroutine
-locks, let's convert the driver's external interfaces to the coroutine
-versions.
-We need to be careful to handle both requests that call the completion
-callback directly from the calling coroutine (i.e. fully synchronous
-code) and requests that involve some callback, so that we need to yield
-and wait for the completion callback coming from outside the coroutine.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 97 ++++++++++++++++++++++++++-----------------------------------
- 1 file changed, 42 insertions(+), 55 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb)
-     }
- }
--static BlockAIOCB *qed_aio_setup(BlockDriverState *bs,
--                                 int64_t sector_num,
--                                 QEMUIOVector *qiov, int nb_sectors,
--                                 BlockCompletionFunc *cb,
--                                 void *opaque, int flags)
-+typedef struct QEDRequestCo {
-+    Coroutine *co;
-+    bool done;
-+    int ret;
-+} QEDRequestCo;
-+
-+static void qed_co_request_cb(void *opaque, int ret)
- {
--    QEDAIOCB *acb = qemu_aio_get(&qed_aiocb_info, bs, cb, opaque);
-+    QEDRequestCo *co = opaque;
--    trace_qed_aio_setup(bs->opaque, acb, sector_num, nb_sectors,
--                        opaque, flags);
-+    co->done = true;
-+    co->ret = ret;
-+    qemu_coroutine_enter_if_inactive(co->co);
-+}
-+
-+static int coroutine_fn qed_co_request(BlockDriverState *bs, int64_t sector_num,
-+                                       QEMUIOVector *qiov, int nb_sectors,
-+                                       int flags)
-+{
-+    QEDRequestCo co = {
-+        .co     = qemu_coroutine_self(),
-+        .done   = false,
-+    };
-+    QEDAIOCB *acb = qemu_aio_get(&qed_aiocb_info, bs, qed_co_request_cb, &co);
-+
-+    trace_qed_aio_setup(bs->opaque, acb, sector_num, nb_sectors, &co, flags);
-     acb->flags = flags;
-     acb->qiov = qiov;
-@@ -XXX,XX +XXX,XX @@ static BlockAIOCB *qed_aio_setup(BlockDriverState *bs,
-     /* Start request */
-     qed_aio_start_io(acb);
--    return &acb->common;
--}
--static BlockAIOCB *bdrv_qed_aio_readv(BlockDriverState *bs,
--                                      int64_t sector_num,
--                                      QEMUIOVector *qiov, int nb_sectors,
--                                      BlockCompletionFunc *cb,
--                                      void *opaque)
--{
--    return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
-+    if (!co.done) {
-+        qemu_coroutine_yield();
-+    }
-+
-+    return co.ret;
- }
--static BlockAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
--                                       int64_t sector_num,
--                                       QEMUIOVector *qiov, int nb_sectors,
--                                       BlockCompletionFunc *cb,
--                                       void *opaque)
-+static int coroutine_fn bdrv_qed_co_readv(BlockDriverState *bs,
-+                                          int64_t sector_num, int nb_sectors,
-+                                          QEMUIOVector *qiov)
- {
--    return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb,
--                         opaque, QED_AIOCB_WRITE);
-+    return qed_co_request(bs, sector_num, qiov, nb_sectors, 0);
- }
--typedef struct {
--    Coroutine *co;
--    int ret;
--    bool done;
--} QEDWriteZeroesCB;
--
--static void coroutine_fn qed_co_pwrite_zeroes_cb(void *opaque, int ret)
-+static int coroutine_fn bdrv_qed_co_writev(BlockDriverState *bs,
-+                                           int64_t sector_num, int nb_sectors,
-+                                           QEMUIOVector *qiov)
- {
--    QEDWriteZeroesCB *cb = opaque;
--
--    cb->done = true;
--    cb->ret = ret;
--    if (cb->co) {
--        aio_co_wake(cb->co);
--    }
-+    return qed_co_request(bs, sector_num, qiov, nb_sectors, QED_AIOCB_WRITE);
- }
- static int coroutine_fn bdrv_qed_co_pwrite_zeroes(BlockDriverState *bs,
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_qed_co_pwrite_zeroes(BlockDriverState *bs,
-                                                   int count,
-                                                   BdrvRequestFlags flags)
- {
--    BlockAIOCB *blockacb;
-     BDRVQEDState *s = bs->opaque;
--    QEDWriteZeroesCB cb = { .done = false };
-     QEMUIOVector qiov;
-     struct iovec iov;
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_qed_co_pwrite_zeroes(BlockDriverState *bs,
-     iov.iov_len = count;
-     qemu_iovec_init_external(&qiov, &iov, 1);
--    blockacb = qed_aio_setup(bs, offset >> BDRV_SECTOR_BITS, &qiov,
--                             count >> BDRV_SECTOR_BITS,
--                             qed_co_pwrite_zeroes_cb, &cb,
--                             QED_AIOCB_WRITE | QED_AIOCB_ZERO);
--    if (!blockacb) {
--        return -EIO;
--    }
--    if (!cb.done) {
--        cb.co = qemu_coroutine_self();
--        qemu_coroutine_yield();
--    }
--    assert(cb.done);
--    return cb.ret;
-+    return qed_co_request(bs, offset >> BDRV_SECTOR_BITS, &qiov,
-+                          count >> BDRV_SECTOR_BITS,
-+                          QED_AIOCB_WRITE | QED_AIOCB_ZERO);
- }
- static int bdrv_qed_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
-@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_qed = {
-     .bdrv_create              = bdrv_qed_create,
-     .bdrv_has_zero_init       = bdrv_has_zero_init_1,
-     .bdrv_co_get_block_status = bdrv_qed_co_get_block_status,
--    .bdrv_aio_readv           = bdrv_qed_aio_readv,
--    .bdrv_aio_writev          = bdrv_qed_aio_writev,
-+    .bdrv_co_readv            = bdrv_qed_co_readv,
-+    .bdrv_co_writev           = bdrv_qed_co_writev,
-     .bdrv_co_pwrite_zeroes    = bdrv_qed_co_pwrite_zeroes,
-     .bdrv_truncate            = bdrv_qed_truncate,
-     .bdrv_getlength           = bdrv_qed_getlength,
---
-.8.3.1

-[Qemu-devel] [PULL 50/61] qed: Use CoQueue for serialising allocations
+Deleted patch
-Now that we're running in coroutine context, the ad-hoc serialisation
-code (which drops a request that has to wait out of coroutine context)
-can be replaced by a CoQueue.
-This means that when we resume a serialised request, it is running in
-coroutine context again and its I/O isn't blocking any more.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 49 +++++++++++++++++--------------------------------
- block/qed.h |  3 ++-
- 2 files changed, 19 insertions(+), 33 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static void qed_plug_allocating_write_reqs(BDRVQEDState *s)
- static void qed_unplug_allocating_write_reqs(BDRVQEDState *s)
- {
--    QEDAIOCB *acb;
--
-     assert(s->allocating_write_reqs_plugged);
-     s->allocating_write_reqs_plugged = false;
--
--    acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs);
--    if (acb) {
--        qed_aio_start_io(acb);
--    }
-+    qemu_co_enter_next(&s->allocating_write_reqs);
- }
- static void qed_clear_need_check(void *opaque, int ret)
-@@ -XXX,XX +XXX,XX @@ static void qed_need_check_timer_cb(void *opaque)
-     BDRVQEDState *s = opaque;
-     /* The timer should only fire when allocating writes have drained */
--    assert(!QSIMPLEQ_FIRST(&s->allocating_write_reqs));
-+    assert(!s->allocating_acb);
-     trace_qed_need_check_timer_cb(s);
-@@ -XXX,XX +XXX,XX @@ static int bdrv_qed_do_open(BlockDriverState *bs, QDict *options, int flags,
-     int ret;
-     s->bs = bs;
--    QSIMPLEQ_INIT(&s->allocating_write_reqs);
-+    qemu_co_queue_init(&s->allocating_write_reqs);
-     ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
-     if (ret < 0) {
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete_bh(void *opaque)
-     qed_release(s);
- }
--static void qed_resume_alloc_bh(void *opaque)
--{
--    qed_aio_start_io(opaque);
--}
--
- static void qed_aio_complete(QEDAIOCB *acb, int ret)
- {
-     BDRVQEDState *s = acb_to_s(acb);
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
-      * next request in the queue.  This ensures that we don't cycle through
-      * requests multiple times but rather finish one at a time completely.
-      */
--    if (acb == QSIMPLEQ_FIRST(&s->allocating_write_reqs)) {
--        QEDAIOCB *next_acb;
--        QSIMPLEQ_REMOVE_HEAD(&s->allocating_write_reqs, next);
--        next_acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs);
--        if (next_acb) {
--            aio_bh_schedule_oneshot(bdrv_get_aio_context(acb->common.bs),
--                                    qed_resume_alloc_bh, next_acb);
-+    if (acb == s->allocating_acb) {
-+        s->allocating_acb = NULL;
-+        if (!qemu_co_queue_empty(&s->allocating_write_reqs)) {
-+            qemu_co_enter_next(&s->allocating_write_reqs);
-         } else if (s->header.features & QED_F_NEED_CHECK) {
-             qed_start_need_check_timer(s);
-         }
-@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
-     int ret;
-     /* Cancel timer when the first allocating request comes in */
--    if (QSIMPLEQ_EMPTY(&s->allocating_write_reqs)) {
-+    if (s->allocating_acb == NULL) {
-         qed_cancel_need_check_timer(s);
-     }
-     /* Freeze this request if another allocating write is in progress */
--    if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs)) {
--        QSIMPLEQ_INSERT_TAIL(&s->allocating_write_reqs, acb, next);
--    }
--    if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs) ||
--        s->allocating_write_reqs_plugged) {
--        return -EINPROGRESS; /* wait for existing request to finish */
-+    if (s->allocating_acb != acb || s->allocating_write_reqs_plugged) {
-+        if (s->allocating_acb != NULL) {
-+            qemu_co_queue_wait(&s->allocating_write_reqs, NULL);
-+            assert(s->allocating_acb == NULL);
-+        }
-+        s->allocating_acb = acb;
-+        return -EAGAIN; /* start over with looking up table entries */
-     }
-     acb->cur_nclusters = qed_bytes_to_clusters(s,
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb)
-             ret = qed_aio_read_data(acb, ret, offset, len);
-         }
--        if (ret < 0) {
--            if (ret != -EINPROGRESS) {
--                qed_aio_complete(acb, ret);
--            }
-+        if (ret < 0 && ret != -EAGAIN) {
-+            qed_aio_complete(acb, ret);
-             return;
-         }
-     }
-diff --git a/block/qed.h b/block/qed.h
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.h
-+++ b/block/qed.h
-@@ -XXX,XX +XXX,XX @@ typedef struct {
-     uint32_t l2_mask;
-     /* Allocating write request queue */
--    QSIMPLEQ_HEAD(, QEDAIOCB) allocating_write_reqs;
-+    QEDAIOCB *allocating_acb;
-+    CoQueue allocating_write_reqs;
-     bool allocating_write_reqs_plugged;
-     /* Periodic flush and clear need check flag */
---
-.8.3.1

-[Qemu-devel] [PULL 51/61] qed: Simplify request handling
+[PULL 10/10] qemu-iotests: inline common.config into common.rc
-Now that we process a request in the same coroutine from beginning to
+From: Paolo Bonzini <pbonzini@redhat.com>
 end and don't drop out of it any more, we can look like a proper
 coroutine-based driver and simply call qed_aio_next_io() and get a
 return value from it instead of spawning an additional coroutine that
 reenters the parent when it's done.
+common.rc has some complicated logic to find the common.config that
+dates back to xfstests and is completely unnecessary now.  Just include
+the contents of the file.
+Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
+Message-Id: <20220505094723.732116-1-pbonzini@redhat.com>
 Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
 ---
- block/qed.c | 101 +++++++++++++-----------------------------------------------
+ tests/qemu-iotests/common.config | 41 --------------------------------
- block/qed.h |   3 +-
+ tests/qemu-iotests/common.rc     | 31 ++++++++++++++----------
- 2 files changed, 22 insertions(+), 82 deletions(-)
+ 2 files changed, 19 insertions(+), 53 deletions(-)
  delete mode 100644 tests/qemu-iotests/common.config
-diff --git a/block/qed.c b/block/qed.c
+diff --git a/tests/qemu-iotests/common.config b/tests/qemu-iotests/common.config
-index XXXXXXX..XXXXXXX 100644
+deleted file mode 100644
---- a/block/qed.c
+index XXXXXXX..XXXXXXX
-+++ b/block/qed.c
+--- a/tests/qemu-iotests/common.config
 +++ /dev/null
 @@ -XXX,XX +XXX,XX @@
- #include "qapi/qmp/qerror.h"
+-#!/usr/bin/env bash
- #include "sysemu/block-backend.h"
+-#
+-# Copyright (C) 2009 Red Hat, Inc.
--static const AIOCBInfo qed_aiocb_info = {
+-# Copyright (c) 2000-2003,2006 Silicon Graphics, Inc.  All Rights Reserved.
--    .aiocb_size         = sizeof(QEDAIOCB),
+-#
--};
+-# This program is free software; you can redistribute it and/or
 -# modify it under the terms of the GNU General Public License as
 -# published by the Free Software Foundation.
 -#
 -# This program is distributed in the hope that it would be useful,
 -# but WITHOUT ANY WARRANTY; without even the implied warranty of
 -# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 -# GNU General Public License for more details.
 -#
 -# You should have received a copy of the GNU General Public License
 -# along with this program.  If not, see <http://www.gnu.org/licenses/>.
 -#
 -# all tests should use a common language setting to prevent golden
 -# output mismatches.
 -export LANG=C
 -
- static int bdrv_qed_probe(const uint8_t *buf, int buf_size,
+-PATH=".:$PATH"
                            const char *filename)
  {
@@ -XXX,XX +XXX,XX @@ static CachedL2Table *qed_new_l2_table(BDRVQEDState *s)
      return l2_table;
  }
 -static void qed_aio_next_io(QEDAIOCB *acb);
 -
--static void qed_aio_start_io(QEDAIOCB *acb)
+-HOSTOS=$(uname -s)
 -arch=$(uname -m)
 -[[ "$arch" =~ "ppc64" ]] && qemu_arch=ppc64 || qemu_arch="$arch"
 -
 -# make sure we have a standard umask
 -umask 022
 -
 -_optstr_add()
 -{
--    qed_aio_next_io(acb);
+-    if [ -n "$1" ]; then
 -        echo "$1,$2"
 -    else
 -        echo "$2"
 -    fi
 -}
 -
- static void qed_plug_allocating_write_reqs(BDRVQEDState *s)
+-# make sure this script returns success
 -true
 diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
 index XXXXXXX..XXXXXXX 100644
 --- a/tests/qemu-iotests/common.rc
 +++ b/tests/qemu-iotests/common.rc
@@ -XXX,XX +XXX,XX @@
  # along with this program.  If not, see <http://www.gnu.org/licenses/>.
  #
 +export LANG=C
 +
 +PATH=".:$PATH"
 +
 +HOSTOS=$(uname -s)
 +arch=$(uname -m)
 +[[ "$arch" =~ "ppc64" ]] && qemu_arch=ppc64 || qemu_arch="$arch"
 +
 +# make sure we have a standard umask
 +umask 022
 +
  # bail out, setting up .notrun file
  _notrun()
  {
-     assert(!s->allocating_write_reqs_plugged);
+@@ -XXX,XX +XXX,XX @@ peek_file_raw()
-@@ -XXX,XX +XXX,XX @@ static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
+     dd if="$1" bs=1 skip="$2" count="$3" status=none
  static BDRVQEDState *acb_to_s(QEDAIOCB *acb)
  {
 -    return acb->common.bs->opaque;
 +    return acb->bs->opaque;
  }
- /**
+-config=common.config
-@@ -XXX,XX +XXX,XX @@ static void qed_update_l2_table(BDRVQEDState *s, QEDTable *table, int index,
+-test -f $config || config=../common.config
-     }
+-if ! test -f $config
- }
+-then
+-    echo "$0: failed to find common.config"
--static void qed_aio_complete_bh(void *opaque)
+-    exit 1
--{
+-fi
--    QEDAIOCB *acb = opaque;
+-if ! . $config
--    BDRVQEDState *s = acb_to_s(acb);
+-    then
--    BlockCompletionFunc *cb = acb->common.cb;
+-    echo "$0: failed to source common.config"
--    void *user_opaque = acb->common.opaque;
+-    exit 1
--    int ret = acb->bh_ret;
+-fi
--
++_optstr_add()
--    qemu_aio_unref(acb);
++{
--
++    if [ -n "$1" ]; then
--    /* Invoke callback */
++        echo "$1,$2"
--    qed_acquire(s);
++    else
--    cb(user_opaque, ret);
++        echo "$2"
--    qed_release(s);
++    fi
--}
++}
--
--static void qed_aio_complete(QEDAIOCB *acb, int ret)
+ # Set the variables to the empty string to turn Valgrind off
-+static void qed_aio_complete(QEDAIOCB *acb)
+ # for specific processes, e.g.
  {
      BDRVQEDState *s = acb_to_s(acb);
 -    trace_qed_aio_complete(s, acb, ret);
 -
      /* Free resources */
      qemu_iovec_destroy(&acb->cur_qiov);
      qed_unref_l2_cache_entry(acb->request.l2_table);
@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
          acb->qiov->iov[0].iov_base = NULL;
      }
 -    /* Arrange for a bh to invoke the completion function */
 -    acb->bh_ret = ret;
 -    aio_bh_schedule_oneshot(bdrv_get_aio_context(acb->common.bs),
 -                            qed_aio_complete_bh, acb);
 -
      /* Start next allocating write request waiting behind this one.  Note that
       * requests enqueue themselves when they first hit an unallocated cluster
       * but they wait until the entire request is finished before waking up the
@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
          struct iovec *iov = acb->qiov->iov;
          if (!iov->iov_base) {
 -            iov->iov_base = qemu_try_blockalign(acb->common.bs, iov->iov_len);
 +            iov->iov_base = qemu_try_blockalign(acb->bs, iov->iov_len);
              if (iov->iov_base == NULL) {
                  return -ENOMEM;
              }
@@ -XXX,XX +XXX,XX @@ static int qed_aio_read_data(void *opaque, int ret, uint64_t offset, size_t len)
  {
      QEDAIOCB *acb = opaque;
      BDRVQEDState *s = acb_to_s(acb);
 -    BlockDriverState *bs = acb->common.bs;
 +    BlockDriverState *bs = acb->bs;
      /* Adjust offset into cluster */
      offset += qed_offset_into_cluster(s, acb->cur_pos);
@@ -XXX,XX +XXX,XX @@ static int qed_aio_read_data(void *opaque, int ret, uint64_t offset, size_t len)
  /**
   * Begin next I/O or complete the request
   */
 -static void qed_aio_next_io(QEDAIOCB *acb)
 +static int qed_aio_next_io(QEDAIOCB *acb)
  {
      BDRVQEDState *s = acb_to_s(acb);
      uint64_t offset;
@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb)
          /* Complete request */
          if (acb->cur_pos >= acb->end_pos) {
 -            qed_aio_complete(acb, 0);
 -            return;
 +            ret = 0;
 +            break;
          }
          /* Find next cluster and start I/O */
          len = acb->end_pos - acb->cur_pos;
          ret = qed_find_cluster(s, &acb->request, acb->cur_pos, &len, &offset);
          if (ret < 0) {
 -            qed_aio_complete(acb, ret);
 -            return;
 +            break;
          }
          if (acb->flags & QED_AIOCB_WRITE) {
@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb)
          }
          if (ret < 0 && ret != -EAGAIN) {
 -            qed_aio_complete(acb, ret);
 -            return;
 +            break;
          }
      }
 -}
 -typedef struct QEDRequestCo {
 -    Coroutine *co;
 -    bool done;
 -    int ret;
 -} QEDRequestCo;
 -
 -static void qed_co_request_cb(void *opaque, int ret)
 -{
 -    QEDRequestCo *co = opaque;
 -
 -    co->done = true;
 -    co->ret = ret;
 -    qemu_coroutine_enter_if_inactive(co->co);
 +    trace_qed_aio_complete(s, acb, ret);
 +    qed_aio_complete(acb);
 +    return ret;
  }
  static int coroutine_fn qed_co_request(BlockDriverState *bs, int64_t sector_num,
                                         QEMUIOVector *qiov, int nb_sectors,
                                         int flags)
  {
 -    QEDRequestCo co = {
 -        .co     = qemu_coroutine_self(),
 -        .done   = false,
 +    QEDAIOCB acb = {
 +        .bs         = bs,
 +        .cur_pos    = (uint64_t) sector_num * BDRV_SECTOR_SIZE,
 +        .end_pos    = (sector_num + nb_sectors) * BDRV_SECTOR_SIZE,
 +        .qiov       = qiov,
 +        .flags      = flags,
      };
 -    QEDAIOCB *acb = qemu_aio_get(&qed_aiocb_info, bs, qed_co_request_cb, &co);
 -
 -    trace_qed_aio_setup(bs->opaque, acb, sector_num, nb_sectors, &co, flags);
 +    qemu_iovec_init(&acb.cur_qiov, qiov->niov);
 -    acb->flags = flags;
 -    acb->qiov = qiov;
 -    acb->qiov_offset = 0;
 -    acb->cur_pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE;
 -    acb->end_pos = acb->cur_pos + nb_sectors * BDRV_SECTOR_SIZE;
 -    acb->backing_qiov = NULL;
 -    acb->request.l2_table = NULL;
 -    qemu_iovec_init(&acb->cur_qiov, qiov->niov);
 +    trace_qed_aio_setup(bs->opaque, &acb, sector_num, nb_sectors, NULL, flags);
      /* Start request */
 -    qed_aio_start_io(acb);
 -
 -    if (!co.done) {
 -        qemu_coroutine_yield();
 -    }
 -
 -    return co.ret;
 +    return qed_aio_next_io(&acb);
  }
  static int coroutine_fn bdrv_qed_co_readv(BlockDriverState *bs,
 diff --git a/block/qed.h b/block/qed.h
 index XXXXXXX..XXXXXXX 100644
 --- a/block/qed.h
 +++ b/block/qed.h
@@ -XXX,XX +XXX,XX @@ enum {
  };
  typedef struct QEDAIOCB {
 -    BlockAIOCB common;
 -    int bh_ret;                     /* final return status for completion bh */
 +    BlockDriverState *bs;
      QSIMPLEQ_ENTRY(QEDAIOCB) next;  /* next request */
      int flags;                      /* QED_AIOCB_* bits ORed together */
      uint64_t end_pos;               /* request end on block device, in bytes */
 --
-.8.3.1
+.35.3
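
A brief sketch of how the _optstr_add() helper, which the patch above moves from common.config into common.rc, can be used. The function body is copied from the diff; the option strings passed to it are illustrative placeholders and are not taken from the test suite.

```bash
#!/usr/bin/env bash
# _optstr_add as shown in the diff: append an option to a
# comma-separated option string, without a leading comma when the
# string is still empty.
_optstr_add()
{
    if [ -n "$1" ]; then
        echo "$1,$2"
    else
        echo "$2"
    fi
}

# The option names below are made-up examples.
opts=""
opts=$(_optstr_add "$opts" "size=1M")
opts=$(_optstr_add "$opts" "cluster_size=64k")
echo "$opts"
```

Because the helper now lives in common.rc itself, a test that sources common.rc gets it without the separate common.config lookup.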

-[Qemu-devel] [PULL 52/61] qed: Use a coroutine for need_check_timer
+Deleted patch
-This fixes the last place where we degraded from AIO to actual blocking
-synchronous I/O requests. Putting it into a coroutine means that instead
-of blocking, the coroutine simply yields while doing I/O.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 33 +++++++++++++++++----------------
-file changed, 17 insertions(+), 16 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static void qed_unplug_allocating_write_reqs(BDRVQEDState *s)
-     qemu_co_enter_next(&s->allocating_write_reqs);
- }
--static void qed_clear_need_check(void *opaque, int ret)
-+static void qed_need_check_timer_entry(void *opaque)
- {
-     BDRVQEDState *s = opaque;
-+    int ret;
--    if (ret) {
-+    /* The timer should only fire when allocating writes have drained */
-+    assert(!s->allocating_acb);
-+
-+    trace_qed_need_check_timer_cb(s);
-+
-+    qed_acquire(s);
-+    qed_plug_allocating_write_reqs(s);
-+
-+    /* Ensure writes are on disk before clearing flag */
-+    ret = bdrv_co_flush(s->bs->file->bs);
-+    qed_release(s);
-+    if (ret < 0) {
-         qed_unplug_allocating_write_reqs(s);
-         return;
-     }
-@@ -XXX,XX +XXX,XX @@ static void qed_clear_need_check(void *opaque, int ret)
-     qed_unplug_allocating_write_reqs(s);
--    ret = bdrv_flush(s->bs);
-+    ret = bdrv_co_flush(s->bs);
-     (void) ret;
- }
- static void qed_need_check_timer_cb(void *opaque)
- {
--    BDRVQEDState *s = opaque;
--
--    /* The timer should only fire when allocating writes have drained */
--    assert(!s->allocating_acb);
--
--    trace_qed_need_check_timer_cb(s);
--
--    qed_acquire(s);
--    qed_plug_allocating_write_reqs(s);
--
--    /* Ensure writes are on disk before clearing flag */
--    bdrv_aio_flush(s->bs->file->bs, qed_clear_need_check, s);
--    qed_release(s);
-+    Coroutine *co = qemu_coroutine_create(qed_need_check_timer_entry, opaque);
-+    qemu_coroutine_enter(co);
- }
- void qed_acquire(BDRVQEDState *s)
---
-.8.3.1

-[Qemu-devel] [PULL 53/61] qed: Add coroutine_fn to I/O path functions
+Deleted patch
-Now that we stay in coroutine context for the whole request when doing
-reads or writes, we can add coroutine_fn annotations to many functions
-that can do I/O or yield directly.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed-cluster.c |  5 +++--
- block/qed.c         | 44 ++++++++++++++++++++++++--------------------
- block/qed.h         |  5 +++--
-files changed, 30 insertions(+), 24 deletions(-)
-diff --git a/block/qed-cluster.c b/block/qed-cluster.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed-cluster.c
-+++ b/block/qed-cluster.c
-@@ -XXX,XX +XXX,XX @@ static unsigned int qed_count_contiguous_clusters(BDRVQEDState *s,
-  * On failure QED_CLUSTER_L2 or QED_CLUSTER_L1 is returned for missing L2 or L1
-  * table offset, respectively. len is number of contiguous unallocated bytes.
-  */
--int qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
--                     size_t *len, uint64_t *img_offset)
-+int coroutine_fn qed_find_cluster(BDRVQEDState *s, QEDRequest *request,
-+                                  uint64_t pos, size_t *len,
-+                                  uint64_t *img_offset)
- {
-     uint64_t l2_offset;
-     uint64_t offset = 0;
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ int qed_write_header_sync(BDRVQEDState *s)
-  * This function only updates known header fields in-place and does not affect
-  * extra data after the QED header.
-  */
--static int qed_write_header(BDRVQEDState *s)
-+static int coroutine_fn qed_write_header(BDRVQEDState *s)
- {
-     /* We must write full sectors for O_DIRECT but cannot necessarily generate
-      * the data following the header if an unrecognized compat feature is
-@@ -XXX,XX +XXX,XX @@ static void qed_unplug_allocating_write_reqs(BDRVQEDState *s)
-     qemu_co_enter_next(&s->allocating_write_reqs);
- }
--static void qed_need_check_timer_entry(void *opaque)
-+static void coroutine_fn qed_need_check_timer_entry(void *opaque)
- {
-     BDRVQEDState *s = opaque;
-     int ret;
-@@ -XXX,XX +XXX,XX @@ static BDRVQEDState *acb_to_s(QEDAIOCB *acb)
-  * This function reads qiov->size bytes starting at pos from the backing file.
-  * If there is no backing file then zeroes are read.
-  */
--static int qed_read_backing_file(BDRVQEDState *s, uint64_t pos,
--                                 QEMUIOVector *qiov,
--                                 QEMUIOVector **backing_qiov)
-+static int coroutine_fn qed_read_backing_file(BDRVQEDState *s, uint64_t pos,
-+                                              QEMUIOVector *qiov,
-+                                              QEMUIOVector **backing_qiov)
- {
-     uint64_t backing_length = 0;
-     size_t size;
-@@ -XXX,XX +XXX,XX @@ static int qed_read_backing_file(BDRVQEDState *s, uint64_t pos,
-  * @len:        Number of bytes
-  * @offset:     Byte offset in image file
-  */
--static int qed_copy_from_backing_file(BDRVQEDState *s, uint64_t pos,
--                                      uint64_t len, uint64_t offset)
-+static int coroutine_fn qed_copy_from_backing_file(BDRVQEDState *s,
-+                                                   uint64_t pos, uint64_t len,
-+                                                   uint64_t offset)
- {
-     QEMUIOVector qiov;
-     QEMUIOVector *backing_qiov = NULL;
-@@ -XXX,XX +XXX,XX @@ out:
-  * The cluster offset may be an allocated byte offset in the image file, the
-  * zero cluster marker, or the unallocated cluster marker.
-  */
--static void qed_update_l2_table(BDRVQEDState *s, QEDTable *table, int index,
--                                unsigned int n, uint64_t cluster)
-+static void coroutine_fn qed_update_l2_table(BDRVQEDState *s, QEDTable *table,
-+                                             int index, unsigned int n,
-+                                             uint64_t cluster)
- {
-     int i;
-     for (i = index; i < index + n; i++) {
-@@ -XXX,XX +XXX,XX @@ static void qed_update_l2_table(BDRVQEDState *s, QEDTable *table, int index,
-     }
- }
--static void qed_aio_complete(QEDAIOCB *acb)
-+static void coroutine_fn qed_aio_complete(QEDAIOCB *acb)
- {
-     BDRVQEDState *s = acb_to_s(acb);
-@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete(QEDAIOCB *acb)
- /**
-  * Update L1 table with new L2 table offset and write it out
-  */
--static int qed_aio_write_l1_update(QEDAIOCB *acb)
-+static int coroutine_fn qed_aio_write_l1_update(QEDAIOCB *acb)
- {
-     BDRVQEDState *s = acb_to_s(acb);
-     CachedL2Table *l2_table = acb->request.l2_table;
-@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_l1_update(QEDAIOCB *acb)
- /**
-  * Update L2 table with new cluster offsets and write them out
-  */
--static int qed_aio_write_l2_update(QEDAIOCB *acb, uint64_t offset)
-+static int coroutine_fn qed_aio_write_l2_update(QEDAIOCB *acb, uint64_t offset)
- {
-     BDRVQEDState *s = acb_to_s(acb);
-     bool need_alloc = acb->find_cluster_ret == QED_CLUSTER_L1;
-@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_l2_update(QEDAIOCB *acb, uint64_t offset)
- /**
-  * Write data to the image file
-  */
--static int qed_aio_write_main(QEDAIOCB *acb)
-+static int coroutine_fn qed_aio_write_main(QEDAIOCB *acb)
- {
-     BDRVQEDState *s = acb_to_s(acb);
-     uint64_t offset = acb->cur_cluster +
-@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_main(QEDAIOCB *acb)
- /**
-  * Populate untouched regions of new data cluster
-  */
--static int qed_aio_write_cow(QEDAIOCB *acb)
-+static int coroutine_fn qed_aio_write_cow(QEDAIOCB *acb)
- {
-     BDRVQEDState *s = acb_to_s(acb);
-     uint64_t start, len, offset;
-@@ -XXX,XX +XXX,XX @@ static bool qed_should_set_need_check(BDRVQEDState *s)
-  *
-  * This path is taken when writing to previously unallocated clusters.
-  */
--static int qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
-+static int coroutine_fn qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
- {
-     BDRVQEDState *s = acb_to_s(acb);
-     int ret;
-@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
-  *
-  * This path is taken when writing to already allocated clusters.
-  */
--static int qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
-+static int coroutine_fn qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset,
-+                                              size_t len)
- {
-     /* Allocate buffer for zero writes */
-     if (acb->flags & QED_AIOCB_ZERO) {
-@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
-  * @offset:     Cluster offset in bytes
-  * @len:        Length in bytes
-  */
--static int qed_aio_write_data(void *opaque, int ret,
--                              uint64_t offset, size_t len)
-+static int coroutine_fn qed_aio_write_data(void *opaque, int ret,
-+                                           uint64_t offset, size_t len)
- {
-     QEDAIOCB *acb = opaque;
-@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_data(void *opaque, int ret,
-  * @offset:     Cluster offset in bytes
-  * @len:        Length in bytes
-  */
--static int qed_aio_read_data(void *opaque, int ret, uint64_t offset, size_t len)
-+static int coroutine_fn qed_aio_read_data(void *opaque, int ret,
-+                                          uint64_t offset, size_t len)
- {
-     QEDAIOCB *acb = opaque;
-     BDRVQEDState *s = acb_to_s(acb);
-@@ -XXX,XX +XXX,XX @@ static int qed_aio_read_data(void *opaque, int ret, uint64_t offset, size_t len)
- /**
-  * Begin next I/O or complete the request
-  */
--static int qed_aio_next_io(QEDAIOCB *acb)
-+static int coroutine_fn qed_aio_next_io(QEDAIOCB *acb)
- {
-     BDRVQEDState *s = acb_to_s(acb);
-     uint64_t offset;
-diff --git a/block/qed.h b/block/qed.h
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.h
-+++ b/block/qed.h
-@@ -XXX,XX +XXX,XX @@ int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
- /**
-  * Cluster functions
-  */
--int qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
--                     size_t *len, uint64_t *img_offset);
-+int coroutine_fn qed_find_cluster(BDRVQEDState *s, QEDRequest *request,
-+                                  uint64_t pos, size_t *len,
-+                                  uint64_t *img_offset);
- /**
-  * Consistency check
---
-.8.3.1

-[Qemu-devel] [PULL 54/61] qed: Use bdrv_co_* for coroutine_fns
+Deleted patch
-All functions that are marked coroutine_fn can directly call the
-bdrv_co_* version of functions instead of going through the wrapper.
-Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- block/qed.c | 16 +++++++++-------
-file changed, 9 insertions(+), 7 deletions(-)
-diff --git a/block/qed.c b/block/qed.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/qed.c
-+++ b/block/qed.c
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_write_header(BDRVQEDState *s)
-     };
-     qemu_iovec_init_external(&qiov, &iov, 1);
--    ret = bdrv_preadv(s->bs->file, 0, &qiov);
-+    ret = bdrv_co_preadv(s->bs->file, 0, qiov.size, &qiov, 0);
-     if (ret < 0) {
-         goto out;
-     }
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_write_header(BDRVQEDState *s)
-     /* Update header */
-     qed_header_cpu_to_le(&s->header, (QEDHeader *) buf);
--    ret = bdrv_pwritev(s->bs->file, 0, &qiov);
-+    ret = bdrv_co_pwritev(s->bs->file, 0, qiov.size,  &qiov, 0);
-     if (ret < 0) {
-         goto out;
-     }
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_read_backing_file(BDRVQEDState *s, uint64_t pos,
-     qemu_iovec_concat(*backing_qiov, qiov, 0, size);
-     BLKDBG_EVENT(s->bs->file, BLKDBG_READ_BACKING_AIO);
--    ret = bdrv_preadv(s->bs->backing, pos, *backing_qiov);
-+    ret = bdrv_co_preadv(s->bs->backing, pos, size, *backing_qiov, 0);
-     if (ret < 0) {
-         return ret;
-     }
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_copy_from_backing_file(BDRVQEDState *s,
-     }
-     BLKDBG_EVENT(s->bs->file, BLKDBG_COW_WRITE);
--    ret = bdrv_pwritev(s->bs->file, offset, &qiov);
-+    ret = bdrv_co_pwritev(s->bs->file, offset, qiov.size, &qiov, 0);
-     if (ret < 0) {
-         goto out;
-     }
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_aio_write_main(QEDAIOCB *acb)
-     trace_qed_aio_write_main(s, acb, 0, offset, acb->cur_qiov.size);
-     BLKDBG_EVENT(s->bs->file, BLKDBG_WRITE_AIO);
--    ret = bdrv_pwritev(s->bs->file, offset, &acb->cur_qiov);
-+    ret = bdrv_co_pwritev(s->bs->file, offset, acb->cur_qiov.size,
-+                          &acb->cur_qiov, 0);
-     if (ret < 0) {
-         return ret;
-     }
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_aio_write_main(QEDAIOCB *acb)
-              * region.  The solution is to flush after writing a new data
-              * cluster and before updating the L2 table.
-              */
--            ret = bdrv_flush(s->bs->file->bs);
-+            ret = bdrv_co_flush(s->bs->file->bs);
-             if (ret < 0) {
-                 return ret;
-             }
-@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_aio_read_data(void *opaque, int ret,
-     }
-     BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
--    ret = bdrv_preadv(bs->file, offset, &acb->cur_qiov);
-+    ret = bdrv_co_preadv(bs->file, offset, acb->cur_qiov.size,
-+                         &acb->cur_qiov, 0);
-     if (ret < 0) {
-         return ret;
-     }
---
-.8.3.1

-[Qemu-devel] [PULL 57/61] fix: avoid an infinite loop or a dangling pointer problem in img_commit
+Deleted patch
-From: "sochin.jiang" <sochin.jiang@huawei.com>
-img_commit could fall into an infinite loop calling run_block_job() if
-its block job fails on any I/O error. Fix this known problem.
-Signed-off-by: sochin.jiang <sochin.jiang@huawei.com>
-Message-id: 1497509253-28941-1-git-send-email-sochin.jiang@huawei.com
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- blockjob.c               |  4 ++--
- include/block/blockjob.h | 18 ++++++++++++++++++
- qemu-img.c               | 20 +++++++++++++-------
-files changed, 33 insertions(+), 9 deletions(-)
-diff --git a/blockjob.c b/blockjob.c
-index XXXXXXX..XXXXXXX 100644
---- a/blockjob.c
-+++ b/blockjob.c
-@@ -XXX,XX +XXX,XX @@ static void block_job_resume(BlockJob *job)
-     block_job_enter(job);
- }
--static void block_job_ref(BlockJob *job)
-+void block_job_ref(BlockJob *job)
- {
-     ++job->refcnt;
- }
-@@ -XXX,XX +XXX,XX @@ static void block_job_attached_aio_context(AioContext *new_context,
-                                            void *opaque);
- static void block_job_detach_aio_context(void *opaque);
--static void block_job_unref(BlockJob *job)
-+void block_job_unref(BlockJob *job)
- {
-     if (--job->refcnt == 0) {
-         BlockDriverState *bs = blk_bs(job->blk);
-diff --git a/include/block/blockjob.h b/include/block/blockjob.h
-index XXXXXXX..XXXXXXX 100644
---- a/include/block/blockjob.h
-+++ b/include/block/blockjob.h
-@@ -XXX,XX +XXX,XX @@ void block_job_iostatus_reset(BlockJob *job);
- BlockJobTxn *block_job_txn_new(void);
- /**
-+ * block_job_ref:
-+ *
-+ * Add a reference to BlockJob refcnt, it will be decreased with
-+ * block_job_unref, and then be freed if it comes to be the last
-+ * reference.
-+ */
-+void block_job_ref(BlockJob *job);
-+
-+/**
-+ * block_job_unref:
-+ *
-+ * Release a reference that was previously acquired with block_job_ref
-+ * or block_job_create. If it's the last reference to the object, it will be
-+ * freed.
-+ */
-+void block_job_unref(BlockJob *job);
-+
-+/**
-  * block_job_txn_unref:
-  *
-  * Release a reference that was previously acquired with block_job_txn_add_job
-diff --git a/qemu-img.c b/qemu-img.c
-index XXXXXXX..XXXXXXX 100644
---- a/qemu-img.c
-+++ b/qemu-img.c
-@@ -XXX,XX +XXX,XX @@ static void common_block_job_cb(void *opaque, int ret)
- static void run_block_job(BlockJob *job, Error **errp)
- {
-     AioContext *aio_context = blk_get_aio_context(job->blk);
-+    int ret = 0;
--    /* FIXME In error cases, the job simply goes away and we access a dangling
--     * pointer below. */
-     aio_context_acquire(aio_context);
-+    block_job_ref(job);
-     do {
-         aio_poll(aio_context, true);
-         qemu_progress_print(job->len ?
-                             ((float)job->offset / job->len * 100.f) : 0.0f, 0);
--    } while (!job->ready);
-+    } while (!job->ready && !job->completed);
--    block_job_complete_sync(job, errp);
-+    if (!job->completed) {
-+        ret = block_job_complete_sync(job, errp);
-+    } else {
-+        ret = job->ret;
-+    }
-+    block_job_unref(job);
-     aio_context_release(aio_context);
--    /* A block job may finish instantaneously without publishing any progress,
--     * so just signal completion here */
--    qemu_progress_print(100.f, 0);
-+    /* publish completion progress only when success */
-+    if (!ret) {
-+        qemu_progress_print(100.f, 0);
-+    }
- }
- static int img_commit(int argc, char **argv)
---
-.8.3.1

-[Qemu-devel] [PULL 58/61] blkdebug: Catch bs->exact_filename overflow
+Deleted patch
-From: Max Reitz <mreitz@redhat.com>
-The bs->exact_filename field may not be sufficient to store the full
-blkdebug node filename. In this case, we should not generate a filename
-at all instead of an unusable one.
-Cc: qemu-stable@nongnu.org
-Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
-Message-id: 20170613172006.19685-2-mreitz@redhat.com
-Reviewed-by: Alberto Garcia <berto@igalia.com>
-Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
-Signed-off-by: Max Reitz <mreitz@redhat.com>
----
- block/blkdebug.c | 10 +++++++---
-file changed, 7 insertions(+), 3 deletions(-)
-diff --git a/block/blkdebug.c b/block/blkdebug.c
-index XXXXXXX..XXXXXXX 100644
---- a/block/blkdebug.c
-+++ b/block/blkdebug.c
-@@ -XXX,XX +XXX,XX @@ static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
-     }
-     if (!force_json && bs->file->bs->exact_filename[0]) {
--        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
--                 "blkdebug:%s:%s", s->config_file ?: "",
--                 bs->file->bs->exact_filename);
-+        int ret = snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-+                           "blkdebug:%s:%s", s->config_file ?: "",
-+                           bs->file->bs->exact_filename);
-+        if (ret >= sizeof(bs->exact_filename)) {
-+            /* An overflow makes the filename unusable, so do not report any */
-+            bs->exact_filename[0] = 0;
-+        }
-     }
-     opts = qdict_new();
---
-.8.3.1

The following changes since commit 4c8c1cc544dbd5e2564868e61c5037258e393832:

Merge remote-tracking branch 'remotes/vivier/tags/m68k-for-2.10-pull-request' into staging (2017-06-22 19:01:58 +0100)

are available in the git repository at:

git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 1512008812410ca4054506a7c44343088abdd977:

Merge remote-tracking branch 'mreitz/tags/pull-block-2017-06-23' into queue-block (2017-06-23 14:09:12 +0200)

----------------------------------------------------------------

Block layer patches

----------------------------------------------------------------
Alberto Garcia (9):
      throttle: Update throttle-groups.c documentation
      qcow2: Remove unused Error variable in do_perform_cow()
      qcow2: Use unsigned int for both members of Qcow2COWRegion
      qcow2: Make perform_cow() call do_perform_cow() twice
      qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()
      qcow2: Allow reading both COW regions with only one request
      qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()
      qcow2: Merge the writing of the COW regions with the guest data
      qcow2: Use offset_into_cluster() and offset_to_l2_index()

Kevin Wolf (37):
      commit: Fix completion with extra reference
      qemu-iotests: Allow starting new qemu after cleanup
      qemu-iotests: Test exiting qemu with running job
      doc: Document generic -blockdev options
      doc: Document driver-specific -blockdev options
      qed: Use bottom half to resume waiting requests
      qed: Make qed_read_table() synchronous
      qed: Remove callback from qed_read_table()
      qed: Remove callback from qed_read_l2_table()
      qed: Remove callback from qed_find_cluster()
      qed: Make qed_read_backing_file() synchronous
      qed: Make qed_copy_from_backing_file() synchronous
      qed: Remove callback from qed_copy_from_backing_file()
      qed: Make qed_write_header() synchronous
      qed: Remove callback from qed_write_header()
      qed: Make qed_write_table() synchronous
      qed: Remove GenericCB
      qed: Remove callback from qed_write_table()
      qed: Make qed_aio_read_data() synchronous
      qed: Make qed_aio_write_main() synchronous
      qed: Inline qed_commit_l2_update()
      qed: Add return value to qed_aio_write_l1_update()
      qed: Add return value to qed_aio_write_l2_update()
      qed: Add return value to qed_aio_write_main()
      qed: Add return value to qed_aio_write_cow()
      qed: Add return value to qed_aio_write_inplace/alloc()
      qed: Add return value to qed_aio_read/write_data()
      qed: Remove ret argument from qed_aio_next_io()
      qed: Remove recursion in qed_aio_next_io()
      qed: Implement .bdrv_co_readv/writev
      qed: Use CoQueue for serialising allocations
      qed: Simplify request handling
      qed: Use a coroutine for need_check_timer
      qed: Add coroutine_fn to I/O path functions
      qed: Use bdrv_co_* for coroutine_fns
      block: Remove bdrv_aio_readv/writev/flush()
      Merge remote-tracking branch 'mreitz/tags/pull-block-2017-06-23' into queue-block

Manos Pitsidianakis (1):
      block: change variable names in BlockDriverState

Max Reitz (3):
      blkdebug: Catch bs->exact_filename overflow
      blkverify: Catch bs->exact_filename overflow
      block: Do not strcmp() with NULL uri->scheme

Stefan Hajnoczi (10):
      block: count bdrv_co_rw_vmstate() requests
      block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
      migration: avoid recursive AioContext locking in save_vmstate()
      migration: use bdrv_drain_all_begin/end() instead bdrv_drain_all()
      virtio-pci: use ioeventfd even when KVM is disabled
      migration: hold AioContext lock for loadvm qemu_fclose()
      qemu-iotests: 068: extract _qemu() function
      qemu-iotests: 068: use -drive/-device instead of -hda
      qemu-iotests: 068: test iothread mode
      qemu-img: don't shadow opts variable in img_dd()

Stephen Bates (1):
      nvme: Add support for Read Data and Write Data in CMBs.

sochin.jiang (1):
      fix: avoid an infinite loop or a dangling pointer problem in img_commit

commit_complete() can't assume that after its block_job_completed() the
job is actually immediately freed; someone else may still be holding
references. In this case, the op blockers on the intermediate nodes make
the graph reconfiguration in the completion code fail.

Call block_job_remove_all_bdrv() manually so that we know for sure that
any blockers on intermediate nodes are given up.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/commit.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/block/commit.c b/block/commit.c
index XXXXXXX..XXXXXXX 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -XXX,XX +XXX,XX @@ static void commit_complete(BlockJob *job, void *opaque)
     }
     g_free(s->backing_file_str);
     blk_unref(s->top);
+
+    /* If there is more than one reference to the job (e.g. if called from
+     * block_job_finish_sync()), block_job_completed() won't free it and
+     * therefore the blockers on the intermediate nodes remain. This would
+     * cause bdrv_set_backing_hd() to fail. */
+    block_job_remove_all_bdrv(job);
+
     block_job_completed(&s->common, ret);
     g_free(data);
 
-- 
1.8.3.1

After _cleanup_qemu(), test cases should be able to start the next qemu
process and call _cleanup_qemu() for that one as well. For this to work
cleanly, we need to improve the cleanup so that the second invocation
doesn't try to kill the qemu instances from the first invocation a
second time (which would result in error messages).
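
The effect of forgetting cleaned-up instances can be pictured with a small
bash sketch. The array and function names below are hypothetical stand-ins,
not the real common.qemu helpers, and the FIFO and file descriptor handling
is left out:

```shell
#!/bin/bash
# Hypothetical bookkeeping: one array slot per launched instance.
declare -a HANDLES=()
declare -a CLEANED=()

launch() {
    # Record a new instance under the given index.
    HANDLES[$1]="instance-$1"
}

cleanup_all() {
    local i
    # "${!HANDLES[@]}" expands to the indices that are still set.
    for i in "${!HANDLES[@]}"; do
        CLEANED+=("${HANDLES[$i]}")
        # Forget the instance so a later call does not visit it again.
        unset "HANDLES[$i]"
    done
}

launch 0
cleanup_all      # cleans instance-0
launch 1
cleanup_all      # cleans only instance-1
echo "${CLEANED[*]}"
```

Without the unset, the second cleanup_all call would visit instance-0 again,
which is the double kill this patch avoids.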

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/common.qemu | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/qemu-iotests/common.qemu b/tests/qemu-iotests/common.qemu
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/common.qemu
+++ b/tests/qemu-iotests/common.qemu
@@ -XXX,XX +XXX,XX @@ function _cleanup_qemu()
         rm -f "${QEMU_FIFO_IN}_${i}" "${QEMU_FIFO_OUT}_${i}"
         eval "exec ${QEMU_IN[$i]}<&-"   # close file descriptors
         eval "exec ${QEMU_OUT[$i]}<&-"
+
+        unset QEMU_IN[$i]
+        unset QEMU_OUT[$i]
     done
 }
-- 
1.8.3.1

When qemu exits, all running block jobs should be cancelled successfully.
This adds a test that checks this for every type of block job that
currently exists in qemu.
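
The comment in the new test relies on speed=65536 forcing a delay after the
first iteration. A quick check of that arithmetic, assuming the throttle
simply charges the bytes processed against the per-second budget (the helper
name is hypothetical):

```shell
#!/bin/bash
speed=65536                 # bytes per second, as passed to each job
commit_buf=$((512 * 1024))  # buffer size the test comment gives for commit/stream
backup_buf=65536            # qcow2 cluster size, used by backup

# Seconds a job has to wait after processing $1 bytes in one iteration.
delay_after() { echo $(( $1 / speed )); }

echo "commit/stream: $(delay_after "$commit_buf")s"
echo "backup: $(delay_after "$backup_buf")s"
```

Under that assumption the commit and stream jobs wait 8 seconds and the
backup job 1 second, both long enough for the 'quit' command to arrive first.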

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/185     | 206 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/185.out |  59 +++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 266 insertions(+)
 create mode 100755 tests/qemu-iotests/185
 create mode 100644 tests/qemu-iotests/185.out

diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
new file mode 100755
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/qemu-iotests/185
@@ -XXX,XX +XXX,XX @@
+#!/bin/bash
+#
+# Test exiting qemu while jobs are still running
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=kwolf@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1 # failure is the default!
+
+MIG_SOCKET="${TEST_DIR}/migrate"
+
+_cleanup()
+{
+    rm -f "${TEST_IMG}.mid"
+    rm -f "${TEST_IMG}.copy"
+    _cleanup_test_img
+    _cleanup_qemu
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+size=64M
+TEST_IMG="${TEST_IMG}.base" _make_test_img $size
+
+echo
+echo === Starting VM ===
+echo
+
+qemu_comm_method="qmp"
+
+_launch_qemu \
+    -drive file="${TEST_IMG}.base",cache=$CACHEMODE,driver=$IMGFMT,id=disk
+h=$QEMU_HANDLE
+_send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
+
+echo
+echo === Creating backing chain ===
+echo
+
+_send_qemu_cmd $h \
+    "{ 'execute': 'blockdev-snapshot-sync',
+       'arguments': { 'device': 'disk',
+                      'snapshot-file': '$TEST_IMG.mid',
+                      'format': '$IMGFMT',
+                      'mode': 'absolute-paths' } }" \
+    "return"
+
+_send_qemu_cmd $h \
+    "{ 'execute': 'human-monitor-command',
+       'arguments': { 'command-line':
+                      'qemu-io disk \"write 0 4M\"' } }" \
+    "return"
+
+_send_qemu_cmd $h \
+    "{ 'execute': 'blockdev-snapshot-sync',
+       'arguments': { 'device': 'disk',
+                      'snapshot-file': '$TEST_IMG',
+                      'format': '$IMGFMT',
+                      'mode': 'absolute-paths' } }" \
+    "return"
+
+echo
+echo === Start commit job and exit qemu ===
+echo
+
+# Note that the reference output intentionally includes the 'offset' field in
+# BLOCK_JOB_CANCELLED events for all of the following block jobs. They are
+# predictable and any change in the offsets would hint at a bug in the job
+# throttling code.
+#
+# In order to achieve these predictable offsets, all of the following tests
+# use speed=65536. Each job will perform exactly one iteration before it has
+# to sleep at least for a second, which is plenty of time for the 'quit' QMP
+# command to be received (after receiving the command, the rest runs
+# synchronously, so jobs can arbitrarily continue or complete).
+#
+# The buffer size for commit and streaming is 512k (waiting for 8 seconds after
+# the first request), for active commit and mirror it's large enough to cover
+# the full 4M, and for backup it's the qcow2 cluster size, which we know is
+# 64k. As all of these are at least as large as the speed, we are sure that the
+# offset doesn't advance after the first iteration before qemu exits.
+
+_send_qemu_cmd $h \
+    "{ 'execute': 'block-commit',
+       'arguments': { 'device': 'disk',
+                      'base':'$TEST_IMG.base',
+                      'top': '$TEST_IMG.mid',
+                      'speed': 65536 } }" \
+    "return"
+
+_send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
+wait=1 _cleanup_qemu
+
+echo
+echo === Start active commit job and exit qemu ===
+echo
+
+_launch_qemu \
+    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
+h=$QEMU_HANDLE
+_send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
+
+_send_qemu_cmd $h \
+    "{ 'execute': 'block-commit',
+       'arguments': { 'device': 'disk',
+                      'base':'$TEST_IMG.base',
+                      'speed': 65536 } }" \
+    "return"
+
+_send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
+wait=1 _cleanup_qemu
+
+echo
+echo === Start mirror job and exit qemu ===
+echo
+
+_launch_qemu \
+    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
+h=$QEMU_HANDLE
+_send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
+
+_send_qemu_cmd $h \
+    "{ 'execute': 'drive-mirror',
+       'arguments': { 'device': 'disk',
+                      'target': '$TEST_IMG.copy',
+                      'format': '$IMGFMT',
+                      'sync': 'full',
+                      'speed': 65536 } }" \
+    "return"
+
+_send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
+wait=1 _cleanup_qemu
+
+echo
+echo === Start backup job and exit qemu ===
+echo
+
+_launch_qemu \
+    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
+h=$QEMU_HANDLE
+_send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
+
+_send_qemu_cmd $h \
+    "{ 'execute': 'drive-backup',
+       'arguments': { 'device': 'disk',
+                      'target': '$TEST_IMG.copy',
+                      'format': '$IMGFMT',
+                      'sync': 'full',
+                      'speed': 65536 } }" \
+    "return"
+
+_send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
+wait=1 _cleanup_qemu
+
+echo
+echo === Start streaming job and exit qemu ===
+echo
+
+_launch_qemu \
+    -drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
+h=$QEMU_HANDLE
+_send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" 'return'
+
+_send_qemu_cmd $h \
+    "{ 'execute': 'block-stream',
+       'arguments': { 'device': 'disk',
+                      'speed': 65536 } }" \
+    "return"
+
+_send_qemu_cmd $h "{ 'execute': 'quit' }" "return"
+wait=1 _cleanup_qemu
+
+_check_test_img
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/qemu-iotests/185.out
@@ -XXX,XX +XXX,XX @@
+QA output created by 185
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864
+
+=== Starting VM ===
+
+{"return": {}}
+
+=== Creating backing chain ===
+
+Formatting 'TEST_DIR/t.qcow2.mid', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.base backing_fmt=qcow2 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
+{"return": {}}
+wrote 4194304/4194304 bytes at offset 0
+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+{"return": ""}
+Formatting 'TEST_DIR/t.qcow2', fmt=qcow2 size=67108864 backing_file=TEST_DIR/t.qcow2.mid backing_fmt=qcow2 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
+{"return": {}}
+
+=== Start commit job and exit qemu ===
+
+{"return": {}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 524288, "speed": 65536, "type": "commit"}}
+
+=== Start active commit job and exit qemu ===
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "commit"}}
+
+=== Start mirror job and exit qemu ===
+
+{"return": {}}
+Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
+{"return": {}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 4194304, "offset": 4194304, "speed": 65536, "type": "mirror"}}
+
+=== Start backup job and exit qemu ===
+
+{"return": {}}
+Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
+{"return": {}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 65536, "speed": 65536, "type": "backup"}}
+
+=== Start streaming job and exit qemu ===
+
+{"return": {}}
+{"return": {}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_CANCELLED", "data": {"device": "disk", "len": 67108864, "offset": 524288, "speed": 65536, "type": "stream"}}
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -XXX,XX +XXX,XX @@
 181 rw auto migration
 182 rw auto quick
 183 rw auto migration
+185 rw auto
-- 
1.8.3.1

From: Stefan Hajnoczi <stefanha@redhat.com>

Calling aio_poll() directly may have been fine previously, but this is
the future, man!  The difference between an aio_poll() loop and
BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext
around aio_poll().

This allows the IOThread to run fd handlers or BHs to complete the
request.  Failure to release the AioContext causes deadlocks.

Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object
iothread.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io.c
+++ b/block/io.c
@@ -XXX,XX +XXX,XX @@ bdrv_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
         Coroutine *co = qemu_coroutine_create(bdrv_co_rw_vmstate_entry, &data);
 
         bdrv_coroutine_enter(bs, co);
-        while (data.ret == -EINPROGRESS) {
-            aio_poll(bdrv_get_aio_context(bs), true);
-        }
+        BDRV_POLL_WHILE(bs, data.ret == -EINPROGRESS);
         return data.ret;
     }
 }
-- 
1.8.3.1

From: Stefan Hajnoczi <stefanha@redhat.com>

AioContext was designed to allow nested acquire/release calls.  It uses
a recursive mutex so callers don't need to worry about nesting...or so
we thought.

BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
the AioContext temporarily around aio_poll().  This gives IOThreads a
chance to acquire the AioContext to process I/O completions.

It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
will not be able to acquire the AioContext if it was acquired
multiple times.

Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
this patch simply avoids nested locking in save_vmstate().  It's the
simplest fix and we should step back to consider the big picture with
all the recent changes to block layer threading.

This patch is the final fix to solve 'savevm' hanging with -object
iothread.
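
As a toy model only (plain counters in a shell script, not QEMU's AioContext
API), the problem can be pictured as a hold count that a single release does
not bring back to zero:

```shell
#!/bin/bash
# Toy model: a recursive lock is an owner plus a hold count.
depth=0
acquire() { depth=$((depth + 1)); }
release() { depth=$((depth - 1)); }
# Another thread can only take the lock once the count is back at zero.
other_thread_can_acquire() { [ "$depth" -eq 0 ]; }

acquire   # outer caller
acquire   # nested acquire further down the call chain
release   # the single release done around polling

if other_thread_can_acquire; then
    echo "free"
else
    echo "still held"
fi
```

With two nested acquires, one release leaves the count at 1, so the other
thread in this model never gets the lock while the poll loop waits for it.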

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 migration/savevm.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -XXX,XX +XXX,XX @@ int save_snapshot(const char *name, Error **errp)
         goto the_end;
     }
 
+    /* The bdrv_all_create_snapshot() call that follows acquires the AioContext
+     * for itself.  BDRV_POLL_WHILE() does not support nested locking because
+     * it only releases the lock once.  Therefore synchronous I/O will deadlock
+     * unless we release the AioContext before bdrv_all_create_snapshot().
+     */
+    aio_context_release(aio_context);
+    aio_context = NULL;
+
     ret = bdrv_all_create_snapshot(sn, bs, vm_state_size, &bs);
     if (ret < 0) {
         error_setg(errp, "Error while creating snapshot on '%s'",
@@ -XXX,XX +XXX,XX @@ int save_snapshot(const char *name, Error **errp)
     ret = 0;
 
  the_end:
-    aio_context_release(aio_context);
+    if (aio_context) {
+        aio_context_release(aio_context);
+    }
     if (saved_vm_running) {
         vm_start();
     }
-- 
1.8.3.1

From: Stefan Hajnoczi <stefanha@redhat.com>

blk/bdrv_drain_all() only takes effect for a single instant and then
resumes block jobs, guest devices, and other external clients like the
NBD server.  This can be handy when performing a synchronous drain
before terminating the program, for example.

Monitor commands usually need to quiesce I/O across an entire code
region so blk/bdrv_drain_all() is not suitable.  They must use
bdrv_drain_all_begin/end() to mark the region.  This prevents new I/O
requests from slipping in or worse - block jobs completing and modifying
the graph.

I audited other blk/bdrv_drain_all() callers but did not find anything
that needs a similar fix.  This patch fixes the savevm/loadvm commands.
Although I haven't encountered a real-world issue, this makes the code
safer.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 migration/savevm.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -XXX,XX +XXX,XX @@ int save_snapshot(const char *name, Error **errp)
     }
     vm_stop(RUN_STATE_SAVE_VM);
 
+    bdrv_drain_all_begin();
+
     aio_context_acquire(aio_context);
 
     memset(sn, 0, sizeof(*sn));
@@ -XXX,XX +XXX,XX @@ int save_snapshot(const char *name, Error **errp)
     if (aio_context) {
         aio_context_release(aio_context);
     }
+
+    bdrv_drain_all_end();
+
     if (saved_vm_running) {
         vm_start();
     }
@@ -XXX,XX +XXX,XX @@ int load_snapshot(const char *name, Error **errp)
     }
 
     /* Flush all IO requests so they don't interfere with the new state.  */
-    bdrv_drain_all();
+    bdrv_drain_all_begin();
 
     ret = bdrv_all_goto_snapshot(name, &bs);
     if (ret < 0) {
         error_setg(errp, "Error %d while activating snapshot '%s' on '%s'",
                      ret, name, bdrv_get_device_name(bs));
-        return ret;
+        goto err_drain;
     }
 
     /* restore the VM state */
     f = qemu_fopen_bdrv(bs_vm_state, 0);
     if (!f) {
         error_setg(errp, "Could not open VM state file");
-        return -EINVAL;
+        ret = -EINVAL;
+        goto err_drain;
     }
 
     qemu_system_reset(SHUTDOWN_CAUSE_NONE);
@@ -XXX,XX +XXX,XX @@ int load_snapshot(const char *name, Error **errp)
     ret = qemu_loadvm_state(f);
     aio_context_release(aio_context);
 
+    bdrv_drain_all_end();
+
     migration_incoming_state_destroy();
     if (ret < 0) {
         error_setg(errp, "Error %d while loading VM state", ret);
@@ -XXX,XX +XXX,XX @@ int load_snapshot(const char *name, Error **errp)
     }
 
     return 0;
+
+err_drain:
+    bdrv_drain_all_end();
+    return ret;
 }
 
 void vmstate_register_ram(MemoryRegion *mr, DeviceState *dev)
-- 
1.8.3.1

This adds documentation for the -blockdev options that apply to all
nodes independent of the block driver used.

All options that are shared by -blockdev and -drive are now explained in
the section for -blockdev. The documentation of -drive mentions that all
-blockdev options are accepted as well.
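
The cache= table added to the -drive documentation in this patch can be read
as a simple lookup. A minimal sketch, with the values copied from that table
and a hypothetical function name:

```shell
#!/bin/bash
cache_mode_flags() {
    # Prints three words: cache.writeback cache.direct cache.no-flush
    case "$1" in
        writeback)    echo "on off off" ;;
        none)         echo "on on off" ;;
        writethrough) echo "off off off" ;;
        directsync)   echo "off on off" ;;
        unsafe)       echo "on off on" ;;
        *)            echo "unknown cache mode: $1" >&2; return 1 ;;
    esac
}

# Split the three words into positional parameters.
set -- $(cache_mode_flags none)
echo "cache.writeback=$1 cache.direct=$2 cache.no-flush=$3"
```

This only mirrors the documented mapping; it does not inspect how QEMU
itself parses the option.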

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 qemu-options.hx | 108 +++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 79 insertions(+), 29 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index XXXXXXX..XXXXXXX 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -XXX,XX +XXX,XX @@ DEF("blockdev", HAS_ARG, QEMU_OPTION_blockdev,
     "          [,read-only=on|off][,detect-zeroes=on|off|unmap]\n"
     "          [,driver specific parameters...]\n"
     "                configure a block backend\n", QEMU_ARCH_ALL)
+STEXI
+@item -blockdev @var{option}[,@var{option}[,@var{option}[,...]]]
+@findex -blockdev
+
+Define a new block driver node.
+
+@table @option
+@item Valid options for any block driver node:
+
+@table @code
+@item driver
+Specifies the block driver to use for the given node.
+@item node-name
+This defines the name of the block driver node by which it will be referenced
+later. The name must be unique, i.e. it must not match the name of a different
+block driver node, or (if you use @option{-drive} as well) the ID of a drive.
+
+If no node name is specified, it is automatically generated. The generated node
+name is not intended to be predictable and changes between QEMU invocations.
+For the top level, an explicit node name must be specified.
+@item read-only
+Open the node read-only. Guest write attempts will fail.
+@item cache.direct
+The host page cache can be avoided with @option{cache.direct=on}. This will
+attempt to do disk IO directly to the guest's memory. QEMU may still perform an
+internal copy of the data.
+@item cache.no-flush
+In case you don't care about data integrity over host failures, you can use
+@option{cache.no-flush=on}. This option tells QEMU that it never needs to write
+any data to the disk but can instead keep things in cache. If anything goes
+wrong, like your host losing power, the disk storage getting disconnected
+accidentally, etc. your image will most probably be rendered unusable.
+@item discard=@var{discard}
+@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and controls
+whether @code{discard} (also known as @code{trim} or @code{unmap}) requests are
+ignored or passed to the filesystem. Some machine types may not support
+discard requests.
+@item detect-zeroes=@var{detect-zeroes}
+@var{detect-zeroes} is "off", "on" or "unmap" and enables the automatic
+conversion of plain zero writes by the OS to driver specific optimized
+zero write commands. You may even choose "unmap" if @var{discard} is set
+to "unmap" to allow a zero write to be converted to an @code{unmap} operation.
+@end table
+
+@end table
+
+ETEXI
 
 DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
@@ -XXX,XX +XXX,XX @@ STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
 @findex -drive
 
-Define a new drive. Valid options are:
+Define a new drive. This includes creating a block driver node (the backend) as
+well as a guest device, and is mostly a shortcut for defining the corresponding
+@option{-blockdev} and @option{-device} options.
+
+@option{-drive} accepts all options that are accepted by @option{-blockdev}. In
+addition, it knows the following options:
 
 @table @option
 @item file=@var{file}
@@ -XXX,XX +XXX,XX @@ These options have the same definition as they have in @option{-hdachs}.
 @var{snapshot} is "on" or "off" and controls snapshot mode for the given drive
 (see @option{-snapshot}).
 @item cache=@var{cache}
-@var{cache} is "none", "writeback", "unsafe", "directsync" or "writethrough" and controls how the host cache is used to access block data.
+@var{cache} is "none", "writeback", "unsafe", "directsync" or "writethrough"
+and controls how the host cache is used to access block data. This is a
+shortcut that sets the @option{cache.direct} and @option{cache.no-flush}
+options (as in @option{-blockdev}), and additionally @option{cache.writeback},
+which provides a default for the @option{write-cache} option of block guest
+devices (as in @option{-device}). The modes correspond to the following
+settings:
+
+@c Our texi2pod.pl script doesn't support @multitable, so fall back to using
+@c plain ASCII art (well, UTF-8 art really). This looks okay both in the manpage
+@c and the HTML output.
+@example
+@             │ cache.writeback   cache.direct   cache.no-flush
+─────────────┼─────────────────────────────────────────────────
+writeback    │ on                off            off
+none         │ on                on             off
+writethrough │ off               off            off
+directsync   │ off               on             off
+unsafe       │ on                off            on
+@end example
+
+The default mode is @option{cache=writeback}.
+
 @item aio=@var{aio}
 @var{aio} is "threads", or "native" and selects between pthread based disk I/O and native Linux AIO.
-@item discard=@var{discard}
-@var{discard} is one of "ignore" (or "off") or "unmap" (or "on") and controls whether @dfn{discard} (also known as @dfn{trim} or @dfn{unmap}) requests are ignored or passed to the filesystem.  Some machine types may not support discard requests.
 @item format=@var{format}
 Specify which disk @var{format} will be used rather than detecting
 the format.  Can be used to specify format=raw to avoid interpreting
@@ -XXX,XX +XXX,XX @@ Specify which @var{action} to take on write and read errors. Valid actions are:
 "report" (report the error to the guest), "enospc" (pause QEMU only if the
 host disk is full; report the error to the guest otherwise).
 The default setting is @option{werror=enospc} and @option{rerror=report}.
-@item readonly
-Open drive @option{file} as read-only. Guest write attempts will fail.
 @item copy-on-read=@var{copy-on-read}
 @var{copy-on-read} is "on" or "off" and enables whether to copy read backing
 file sectors into the image file.
-@item detect-zeroes=@var{detect-zeroes}
-@var{detect-zeroes} is "off", "on" or "unmap" and enables the automatic
-conversion of plain zero writes by the OS to driver specific optimized
-zero write commands. You may even choose "unmap" if @var{discard} is set
-to "unmap" to allow a zero write to be converted to an UNMAP operation.
 @item bps=@var{b},bps_rd=@var{r},bps_wr=@var{w}
 Specify bandwidth throttling limits in bytes per second, either for all request
 types or for reads or writes only.  Small values can lead to timeouts or hangs
@@ -XXX,XX +XXX,XX @@ prevent guests from circumventing throttling limits by using many small disks
 instead of a single larger disk.
 @end table
 
-By default, the @option{cache=writeback} mode is used. It will report data
+By default, the @option{cache.writeback=on} mode is used. It will report data
 writes as completed as soon as the data is present in the host page cache.
 This is safe as long as your guest OS makes sure to correctly flush disk caches
 where needed. If your guest OS does not handle volatile disk write caches
 correctly and your host crashes or loses power, then the guest may experience
 data corruption.
 
-For such guests, you should consider using @option{cache=writethrough}. This
+For such guests, you should consider using @option{cache.writeback=off}. This
 means that the host page cache will be used to read and write data, but write
 notification will be sent to the guest only after QEMU has made sure to flush
 each write to the disk. Be aware that this has a major impact on performance.
 
-The host page cache can be avoided entirely with @option{cache=none}.  This will
-attempt to do disk IO directly to the guest's memory.  QEMU may still perform
-an internal copy of the data. Note that this is considered a writeback mode and
-the guest OS must handle the disk write cache correctly in order to avoid data
-corruption on host crashes.
-
-The host page cache can be avoided while only sending write notifications to
-the guest when the data has been flushed to the disk using
-@option{cache=directsync}.
-
-In case you don't care about data integrity over host failures, use
-@option{cache=unsafe}. This option tells QEMU that it never needs to write any
-data to the disk but can instead keep things in cache. If anything goes wrong,
-like your host losing power, the disk storage getting disconnected accidentally,
-etc. your image will most probably be rendered unusable.   When using
-the @option{-snapshot} option, unsafe caching is always used.
+When using the @option{-snapshot} option, unsafe caching is always used.
 
 Copy-on-read avoids accessing the same backing file sectors repeatedly and is
 useful when the backing file is over a slow network.  By default copy-on-read
-- 
1.8.3.1

This documents the driver-specific options for the raw, qcow2 and file
block drivers for the man page. For everything else, we refer to the
QAPI documentation.
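
The two ways of referring to a child node that the new text describes can be
put side by side as follows. This is only a sketch that prints command lines;
the image path, the node names and the choice of virtio-blk-pci as the guest
device are assumptions made for illustration:

```shell
#!/bin/bash
img=disk.img   # hypothetical image path

# 1. Define the file node first, then reference it by node name.
by_reference="-blockdev driver=file,node-name=disk_file,filename=$img"
by_reference="$by_reference -blockdev driver=raw,node-name=disk,file=disk_file"

# 2. Define the file node inline, using dotted option names.
inline="-blockdev driver=raw,node-name=disk,file.driver=file,file.filename=$img"

# Either way, a guest device attaches to the top node through drive=.
device="-device virtio-blk-pci,drive=disk"

printf '%s\n' "qemu-system-x86_64 $by_reference $device" \
              "qemu-system-x86_64 $inline $device"
```

In both variants the guest device refers to the node name "disk".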

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 qemu-options.hx | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 114 insertions(+), 1 deletion(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index XXXXXXX..XXXXXXX 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -XXX,XX +XXX,XX @@ STEXI
 @item -blockdev @var{option}[,@var{option}[,@var{option}[,...]]]
 @findex -blockdev
 
-Define a new block driver node.
+Define a new block driver node. Some of the options apply to all block drivers,
+other options are only accepted for a specific block driver. See below for a
+list of generic options and options for the most common block drivers.
+
+Options that expect a reference to another node (e.g. @code{file}) can be
+given in two ways. Either you specify the node name of an already existing node
+(file=@var{node-name}), or you define a new node inline, adding options
+for the referenced node after a dot (file.filename=@var{path},file.aio=native).
+
+A block driver node created with @option{-blockdev} can be used for a guest
+device by specifying its node name for the @code{drive} property in a
+@option{-device} argument that defines a block device.
 
 @table @option
 @item Valid options for any block driver node:
@@ -XXX,XX +XXX,XX @@ zero write commands. You may even choose "unmap" if @var{discard} is set
 to "unmap" to allow a zero write to be converted to an @code{unmap} operation.
 @end table
 
+@item Driver-specific options for @code{file}
+
+This is the protocol-level block driver for accessing regular files.
+
+@table @code
+@item filename
+The path to the image file in the local filesystem
+@item aio
+Specifies the AIO backend (threads/native, default: threads)
+@end table
+Example:
+@example
+-blockdev driver=file,node-name=disk,filename=disk.img
+@end example
+
+@item Driver-specific options for @code{raw}
+
+This is the image format block driver for raw images. It is usually
+stacked on top of a protocol level block driver such as @code{file}.
+
+@table @code
+@item file
+Reference to or definition of the data source block driver node
+(e.g. a @code{file} driver node)
+@end table
+Example 1:
+@example
+-blockdev driver=file,node-name=disk_file,filename=disk.img
+-blockdev driver=raw,node-name=disk,file=disk_file
+@end example
+Example 2:
+@example
+-blockdev driver=raw,node-name=disk,file.driver=file,file.filename=disk.img
+@end example
+
+@item Driver-specific options for @code{qcow2}
+
+This is the image format block driver for qcow2 images. It is usually
+stacked on top of a protocol level block driver such as @code{file}.
+
+@table @code
+@item file
+Reference to or definition of the data source block driver node
+(e.g. a @code{file} driver node)
+
+@item backing
+Reference to or definition of the backing file block device (default is taken
+from the image file). It is allowed to pass an empty string here in order to
+disable the default backing file.
+
+@item lazy-refcounts
+Whether to enable the lazy refcounts feature (on/off; default is taken from the
+image file)
+
+@item cache-size
+The maximum total size of the L2 table and refcount block caches in bytes
+(default: 1048576 bytes or 8 clusters, whichever is larger)
+
+@item l2-cache-size
+The maximum size of the L2 table cache in bytes
+(default: 4/5 of the total cache size)
+
+@item refcount-cache-size
+The maximum size of the refcount block cache in bytes
+(default: 1/5 of the total cache size)
+
+@item cache-clean-interval
+Clean unused entries in the L2 and refcount caches. The interval is in seconds.
+The default value is 0 and it disables this feature.
+
+@item pass-discard-request
+Whether discard requests to the qcow2 device should be forwarded to the data
+source (on/off; default: on if discard=unmap is specified, off otherwise)
+
+@item pass-discard-snapshot
+Whether discard requests for the data source should be issued when a snapshot
+operation (e.g. deleting a snapshot) frees clusters in the qcow2 file (on/off;
+default: on)
+
+@item pass-discard-other
+Whether discard requests for the data source should be issued on other
+occasions where a cluster gets freed (on/off; default: off)
+
+@item overlap-check
+Which overlap checks to perform for writes to the image
+(none/constant/cached/all; default: cached). For details or finer
+granularity control refer to the QAPI documentation of @code{blockdev-add}.
+@end table
+
+Example 1:
+@example
+-blockdev driver=file,node-name=my_file,filename=/tmp/disk.qcow2
+-blockdev driver=qcow2,node-name=hda,file=my_file,overlap-check=none,cache-size=16777216
+@end example
+Example 2:
+@example
+-blockdev driver=qcow2,node-name=disk,file.driver=http,file.filename=http://example.com/image.qcow2
+@end example
+
+@item Driver-specific options for other drivers
+Please refer to the QAPI documentation of the @code{blockdev-add} QMP command.
+
 @end table
 
 ETEXI
-- 
1.8.3.1

From: Alberto Garcia <berto@igalia.com>

There used to be throttle_timers_{detach,attach}_aio_context() calls
in bdrv_set_aio_context(), but since 7ca7f0f6db1fedd28d490795d778cf239
they are now in blk_set_aio_context().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/throttle-groups.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index XXXXXXX..XXXXXXX 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -XXX,XX +XXX,XX @@
  * Again, all this is handled internally and is mostly transparent to
  * the outside. The 'throttle_timers' field however has an additional
  * constraint because it may be temporarily invalid (see for example
- * bdrv_set_aio_context()). Therefore in this file a thread will
+ * blk_set_aio_context()). Therefore in this file a thread will
  * access some other BlockBackend's timers only after verifying that
  * that BlockBackend has throttled requests in the queue.
  */
-- 
1.8.3.1

From: Stefan Hajnoczi <stefanha@redhat.com>

Old kvm.ko versions only supported a tiny number of ioeventfds so
virtio-pci avoids ioeventfds when kvm_has_many_ioeventfds() returns 0.

Do not check kvm_has_many_ioeventfds() when KVM is disabled since it
always returns 0.  Since commit 8c56c1a592b5092d91da8d8943c17777d6462a6f
("memory: emulate ioeventfd") it has been possible to use ioeventfds in
qtest or TCG mode.

This patch makes -device virtio-blk-pci,iothread=iothread0 work even
when KVM is disabled.

I have tested that virtio-blk-pci works under TCG both with and without
iothread.
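
The resulting decision can be sketched as follows. This is only an
illustration of the new condition: may_use_ioeventfd() and its two
parameters are hypothetical names standing in for the results of
kvm_enabled() and kvm_has_many_ioeventfds(); they are not QEMU functions.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not a QEMU function: returns true when virtio-pci
 * may keep VIRTIO_PCI_FLAG_USE_IOEVENTFD set after this patch.
 */
static bool may_use_ioeventfd(bool kvm_is_enabled, bool kvm_many_ioeventfds)
{
    /* Only a KVM that supports too few ioeventfds forces them off. */
    return !(kvm_is_enabled && !kvm_many_ioeventfds);
}
```

With KVM disabled (qtest or TCG) the helper returns true regardless of
the second argument, which is the case this patch fixes.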

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/virtio/virtio-pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -XXX,XX +XXX,XX @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error **errp)
     bool pcie_port = pci_bus_is_express(pci_dev->bus) &&
                      !pci_bus_is_root(pci_dev->bus);
 
-    if (!kvm_has_many_ioeventfds()) {
+    if (kvm_enabled() && !kvm_has_many_ioeventfds()) {
         proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
     }
 
-- 
1.8.3.1

From: Stefan Hajnoczi <stefanha@redhat.com>

migration_incoming_state_destroy() uses qemu_fclose() on the vmstate
file.  Make sure to call it inside an AioContext acquire/release region.

This fixes a 'qemu: qemu_mutex_unlock: Operation not permitted' abort
in loadvm.

This patch closes the vmstate file before ending the drained region.
Previously we closed the vmstate file after ending the drained region.
The order does not matter.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 migration/savevm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index XXXXXXX..XXXXXXX 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -XXX,XX +XXX,XX @@ int load_snapshot(const char *name, Error **errp)
 
     aio_context_acquire(aio_context);
     ret = qemu_loadvm_state(f);
+    migration_incoming_state_destroy();
     aio_context_release(aio_context);
 
     bdrv_drain_all_end();
 
-    migration_incoming_state_destroy();
     if (ret < 0) {
         error_setg(errp, "Error %d while loading VM state", ret);
         return ret;
-- 
1.8.3.1

From: Stefan Hajnoczi <stefanha@redhat.com>

Avoid duplicating the QEMU command-line.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/068 | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/068 b/tests/qemu-iotests/068
index XXXXXXX..XXXXXXX 100755
--- a/tests/qemu-iotests/068
+++ b/tests/qemu-iotests/068
@@ -XXX,XX +XXX,XX @@ case "$QEMU_DEFAULT_MACHINE" in
       ;;
 esac
 
-# Give qemu some time to boot before saving the VM state
-bash -c 'sleep 1; echo -e "savevm 0\nquit"' |\
-    $QEMU $platform_parm -nographic -monitor stdio -serial none -hda "$TEST_IMG" |\
+_qemu()
+{
+    $QEMU $platform_parm -nographic -monitor stdio -serial none -hda "$TEST_IMG" \
+          "$@" |\
     _filter_qemu | _filter_hmp
+}
+
+# Give qemu some time to boot before saving the VM state
+bash -c 'sleep 1; echo -e "savevm 0\nquit"' | _qemu
 # Now try to continue from that VM state (this should just work)
-echo quit |\
-    $QEMU $platform_parm -nographic -monitor stdio -serial none -hda "$TEST_IMG" -loadvm 0 |\
-    _filter_qemu | _filter_hmp
+echo quit | _qemu -loadvm 0
 
 # success, all done
 echo "*** done"
-- 
1.8.3.1

From: Stefan Hajnoczi <stefanha@redhat.com>

Perform the savevm/loadvm test with both iothread on and off.  This
covers the recently found savevm/loadvm hang when iothread is enabled.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/068     | 23 ++++++++++++++---------
 tests/qemu-iotests/068.out | 11 ++++++++++-
 2 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/068 b/tests/qemu-iotests/068
index XXXXXXX..XXXXXXX 100755
--- a/tests/qemu-iotests/068
+++ b/tests/qemu-iotests/068
@@ -XXX,XX +XXX,XX @@ _supported_os Linux
 IMGOPTS="compat=1.1"
 IMG_SIZE=128K
 
-echo
-echo "=== Saving and reloading a VM state to/from a qcow2 image ==="
-echo
-_make_test_img $IMG_SIZE
-
 case "$QEMU_DEFAULT_MACHINE" in
   s390-ccw-virtio)
       platform_parm="-no-shutdown"
@@ -XXX,XX +XXX,XX @@ _qemu()
     _filter_qemu | _filter_hmp
 }
 
-# Give qemu some time to boot before saving the VM state
-bash -c 'sleep 1; echo -e "savevm 0\nquit"' | _qemu
-# Now try to continue from that VM state (this should just work)
-echo quit | _qemu -loadvm 0
+for extra_args in \
+    "" \
+    "-object iothread,id=iothread0 -set device.hba0.iothread=iothread0"; do
+    echo
+    echo "=== Saving and reloading a VM state to/from a qcow2 image ($extra_args) ==="
+    echo
+
+    _make_test_img $IMG_SIZE
+
+    # Give qemu some time to boot before saving the VM state
+    bash -c 'sleep 1; echo -e "savevm 0\nquit"' | _qemu $extra_args
+    # Now try to continue from that VM state (this should just work)
+    echo quit | _qemu $extra_args -loadvm 0
+done
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/068.out b/tests/qemu-iotests/068.out
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/068.out
+++ b/tests/qemu-iotests/068.out
@@ -XXX,XX +XXX,XX @@
 QA output created by 068
 
-=== Saving and reloading a VM state to/from a qcow2 image ===
+=== Saving and reloading a VM state to/from a qcow2 image () ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) savevm 0
+(qemu) quit
+QEMU X.Y.Z monitor - type 'help' for more information
+(qemu) quit
+
+=== Saving and reloading a VM state to/from a qcow2 image (-object iothread,id=iothread0 -set device.hba0.iothread=iothread0) ===
 
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=131072
 QEMU X.Y.Z monitor - type 'help' for more information
-- 
1.8.3.1

From: Stephen Bates <sbates@raithlin.com>

Add the ability for the NVMe model to support both the RDS and WDS
modes in the Controller Memory Buffer.

Although this is not currently supported in the upstream Linux kernel,
a fork with support exists [1], and user-space test programs that build
on it also exist [2].

This is useful for testing CMB functionality in preparation for real
CMB-enabled NVMe devices (coming soon).

[1] https://github.com/sbates130272/linux-p2pmem
[2] https://github.com/sbates130272/p2pmem-test

Signed-off-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/block/nvme.c | 83 +++++++++++++++++++++++++++++++++++++++------------------
 hw/block/nvme.h |  1 +
 2 files changed, 58 insertions(+), 26 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -XXX,XX +XXX,XX @@
  *              cmb_size_mb=<cmb_size_mb[optional]>
  *
  * Note cmb_size_mb denotes size of CMB in MB. CMB is assumed to be at
- * offset 0 in BAR2 and supports SQS only for now.
+ * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
  */
 
 #include "qemu/osdep.h"
@@ -XXX,XX +XXX,XX @@ static void nvme_isr_notify(NvmeCtrl *n, NvmeCQueue *cq)
     }
 }
 
-static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
-    uint32_t len, NvmeCtrl *n)
+static uint16_t nvme_map_prp(QEMUSGList *qsg, QEMUIOVector *iov, uint64_t prp1,
+                             uint64_t prp2, uint32_t len, NvmeCtrl *n)
 {
     hwaddr trans_len = n->page_size - (prp1 % n->page_size);
     trans_len = MIN(len, trans_len);
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
 
     if (!prp1) {
         return NVME_INVALID_FIELD | NVME_DNR;
+    } else if (n->cmbsz && prp1 >= n->ctrl_mem.addr &&
+               prp1 < n->ctrl_mem.addr + int128_get64(n->ctrl_mem.size)) {
+        qsg->nsg = 0;
+        qemu_iovec_init(iov, num_prps);
+        qemu_iovec_add(iov, (void *)&n->cmbuf[prp1 - n->ctrl_mem.addr], trans_len);
+    } else {
+        pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
+        qemu_sglist_add(qsg, prp1, trans_len);
     }
-
-    pci_dma_sglist_init(qsg, &n->parent_obj, num_prps);
-    qemu_sglist_add(qsg, prp1, trans_len);
     len -= trans_len;
     if (len) {
         if (!prp2) {
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
 
             nents = (len + n->page_size - 1) >> n->page_bits;
             prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-            pci_dma_read(&n->parent_obj, prp2, (void *)prp_list, prp_trans);
+            nvme_addr_read(n, prp2, (void *)prp_list, prp_trans);
             while (len != 0) {
                 uint64_t prp_ent = le64_to_cpu(prp_list[i]);
 
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
                     i = 0;
                     nents = (len + n->page_size - 1) >> n->page_bits;
                     prp_trans = MIN(n->max_prp_ents, nents) * sizeof(uint64_t);
-                    pci_dma_read(&n->parent_obj, prp_ent, (void *)prp_list,
+                    nvme_addr_read(n, prp_ent, (void *)prp_list,
                         prp_trans);
                     prp_ent = le64_to_cpu(prp_list[i]);
                 }
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
                 }
 
                 trans_len = MIN(len, n->page_size);
-                qemu_sglist_add(qsg, prp_ent, trans_len);
+                if (qsg->nsg){
+                    qemu_sglist_add(qsg, prp_ent, trans_len);
+                } else {
+                    qemu_iovec_add(iov, (void *)&n->cmbuf[prp_ent - n->ctrl_mem.addr], trans_len);
+                }
                 len -= trans_len;
                 i++;
             }
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_map_prp(QEMUSGList *qsg, uint64_t prp1, uint64_t prp2,
             if (prp2 & (n->page_size - 1)) {
                 goto unmap;
             }
-            qemu_sglist_add(qsg, prp2, len);
+            if (qsg->nsg) {
+                qemu_sglist_add(qsg, prp2, len);
+            } else {
+                qemu_iovec_add(iov, (void *)&n->cmbuf[prp2 - n->ctrl_mem.addr], trans_len);
+            }
         }
     }
     return NVME_SUCCESS;
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_dma_read_prp(NvmeCtrl *n, uint8_t *ptr, uint32_t len,
     uint64_t prp1, uint64_t prp2)
 {
     QEMUSGList qsg;
+    QEMUIOVector iov;
+    uint16_t status = NVME_SUCCESS;
 
-    if (nvme_map_prp(&qsg, prp1, prp2, len, n)) {
+    if (nvme_map_prp(&qsg, &iov, prp1, prp2, len, n)) {
         return NVME_INVALID_FIELD | NVME_DNR;
     }
-    if (dma_buf_read(ptr, len, &qsg)) {
+    if (qsg.nsg > 0) {
+        if (dma_buf_read(ptr, len, &qsg)) {
+            status = NVME_INVALID_FIELD | NVME_DNR;
+        }
         qemu_sglist_destroy(&qsg);
-        return NVME_INVALID_FIELD | NVME_DNR;
+    } else {
+        if (qemu_iovec_to_buf(&iov, 0, ptr, len) != len) {
+            status = NVME_INVALID_FIELD | NVME_DNR;
+        }
+        qemu_iovec_destroy(&iov);
     }
-    qemu_sglist_destroy(&qsg);
-    return NVME_SUCCESS;
+    return status;
 }
 
 static void nvme_post_cqes(void *opaque)
@@ -XXX,XX +XXX,XX @@ static uint16_t nvme_rw(NvmeCtrl *n, NvmeNamespace *ns, NvmeCmd *cmd,
         return NVME_LBA_RANGE | NVME_DNR;
     }
 
-    if (nvme_map_prp(&req->qsg, prp1, prp2, data_size, n)) {
+    if (nvme_map_prp(&req->qsg, &req->iov, prp1, prp2, data_size, n)) {
         block_acct_invalid(blk_get_stats(n->conf.blk), acct);
         return NVME_INVALID_FIELD | NVME_DNR;
     }
 
-    assert((nlb << data_shift) == req->qsg.size);
-
-    req->has_sg = true;
     dma_acct_start(n->conf.blk, &req->acct, &req->qsg, acct);
-    req->aiocb = is_write ?
-        dma_blk_write(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
-                      nvme_rw_cb, req) :
-        dma_blk_read(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
-                     nvme_rw_cb, req);
+    if (req->qsg.nsg > 0) {
+        req->has_sg = true;
+        req->aiocb = is_write ?
+            dma_blk_write(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
+                          nvme_rw_cb, req) :
+            dma_blk_read(n->conf.blk, &req->qsg, data_offset, BDRV_SECTOR_SIZE,
+                         nvme_rw_cb, req);
+    } else {
+        req->has_sg = false;
+        req->aiocb = is_write ?
+            blk_aio_pwritev(n->conf.blk, data_offset, &req->iov, 0, nvme_rw_cb,
+                            req) :
+            blk_aio_preadv(n->conf.blk, data_offset, &req->iov, 0, nvme_rw_cb,
+                           req);
+    }
 
     return NVME_NO_COMPLETE;
 }
@@ -XXX,XX +XXX,XX @@ static int nvme_init(PCIDevice *pci_dev)
         NVME_CMBSZ_SET_SQS(n->bar.cmbsz, 1);
         NVME_CMBSZ_SET_CQS(n->bar.cmbsz, 0);
         NVME_CMBSZ_SET_LISTS(n->bar.cmbsz, 0);
-        NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 0);
-        NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 0);
+        NVME_CMBSZ_SET_RDS(n->bar.cmbsz, 1);
+        NVME_CMBSZ_SET_WDS(n->bar.cmbsz, 1);
         NVME_CMBSZ_SET_SZU(n->bar.cmbsz, 2); /* MBs */
         NVME_CMBSZ_SET_SZ(n->bar.cmbsz, n->cmb_size_mb);
 
+        n->cmbloc = n->bar.cmbloc;
+        n->cmbsz = n->bar.cmbsz;
+
         n->cmbuf = g_malloc0(NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
         memory_region_init_io(&n->ctrl_mem, OBJECT(n), &nvme_cmb_ops, n,
                               "nvme-cmb", NVME_CMBSZ_GETSIZE(n->bar.cmbsz));
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -XXX,XX +XXX,XX @@ typedef struct NvmeRequest {
     NvmeCqe                 cqe;
     BlockAcctCookie         acct;
     QEMUSGList              qsg;
+    QEMUIOVector            iov;
     QTAILQ_ENTRY(NvmeRequest)entry;
 } NvmeRequest;
 
-- 
1.8.3.1

From: Alberto Garcia <berto@igalia.com>

Qcow2COWRegion has two attributes:

- The offset of the COW region from the start of the first cluster
  touched by the I/O request. Since it's always going to be non-negative
  and the maximum request size is at most INT_MAX, we can use a
  regular unsigned int to store this offset.

- The size of the COW region in bytes. This is guaranteed to be >= 0,
  so we should use an unsigned type instead.

On x86_64 this reduces the size of Qcow2COWRegion from 16 to 8 bytes.
It will also help keep some assertions simpler now that we know that
there are no negative numbers.

The prototype of do_perform_cow() is also updated to reflect these
changes.
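
The size reduction can be checked with a small standalone sketch. The
two structs below are hypothetical stand-ins, not the QEMU definitions:
the old layout is assumed to pair a 64-bit offset with an int, which is
consistent with the 16-byte figure above, and the sizes hold for the
usual x86_64 ABI.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed old layout: 8-byte offset, 4-byte size, 4 bytes of padding. */
struct old_cow_region {
    uint64_t offset;
    int      nb_bytes;
};

/* New layout described above: two unsigned ints, no padding. */
struct new_cow_region {
    unsigned offset;
    unsigned nb_bytes;
};

static size_t old_region_size(void)
{
    return sizeof(struct old_cow_region);
}

static size_t new_region_size(void)
{
    return sizeof(struct new_cow_region);
}
```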

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cluster.c | 4 ++--
 block/qcow2.h         | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

From: Alberto Garcia <berto@igalia.com>

Instead of calling perform_cow() twice with a different COW region
each time, call it just once and make perform_cow() handle both
regions.

This patch simply moves code around. The next one will do the actual
reordering of the COW operations.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cluster.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
     struct iovec iov;
     int ret;
 
+    if (bytes == 0) {
+        return 0;
+    }
+
     iov.iov_len = bytes;
     iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
     if (iov.iov_base == NULL) {
@@ -XXX,XX +XXX,XX @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
     return cluster_offset;
 }
 
-static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, Qcow2COWRegion *r)
+static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
 {
     BDRVQcow2State *s = bs->opaque;
+    Qcow2COWRegion *start = &m->cow_start;
+    Qcow2COWRegion *end = &m->cow_end;
     int ret;
 
-    if (r->nb_bytes == 0) {
+    if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
     }
 
     qemu_co_mutex_unlock(&s->lock);
-    ret = do_perform_cow(bs, m->offset, m->alloc_offset, r->offset, r->nb_bytes);
-    qemu_co_mutex_lock(&s->lock);
-
+    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
+                         start->offset, start->nb_bytes);
     if (ret < 0) {
-        return ret;
+        goto fail;
     }
 
+    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
+                         end->offset, end->nb_bytes);
+
+fail:
+    qemu_co_mutex_lock(&s->lock);
+
     /*
      * Before we update the L2 table to actually point to the new cluster, we
      * need to be sure that the refcounts have been increased and COW was
      * handled.
      */
-    qcow2_cache_depends_on_flush(s->l2_table_cache);
+    if (ret == 0) {
+        qcow2_cache_depends_on_flush(s->l2_table_cache);
+    }
 
-    return 0;
+    return ret;
 }
 
 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
@@ -XXX,XX +XXX,XX @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
     }
 
     /* copy content of unmodified sectors */
-    ret = perform_cow(bs, m, &m->cow_start);
-    if (ret < 0) {
-        goto err;
-    }
-
-    ret = perform_cow(bs, m, &m->cow_end);
+    ret = perform_cow(bs, m);
     if (ret < 0) {
         goto err;
     }
-- 
1.8.3.1

From: Alberto Garcia <berto@igalia.com>

This patch splits do_perform_cow() into three separate functions to
read, encrypt and write the COW regions.

perform_cow() can now read both regions first, then encrypt them and
finally write them to disk. The memory allocation is also done in
this function now, using one single buffer large enough to hold both
regions.
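
The layout of that buffer can be sketched in isolation. This is a
minimal illustration, not the QEMU code: cow_buffer_layout() and struct
cow_layout are hypothetical names, and mem_align stands for the value
that bdrv_opt_mem_align() would return.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be non-zero). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

struct cow_layout {
    size_t buffer_size;   /* total bytes to allocate */
    size_t end_offset;    /* where the end region starts in the buffer */
};

/*
 * The start region sits at offset 0. Padding is added after it so that
 * the end region begins on an aligned boundary.
 */
static struct cow_layout cow_buffer_layout(size_t start_bytes,
                                           size_t end_bytes,
                                           size_t mem_align)
{
    struct cow_layout l;

    assert(mem_align != 0);
    l.buffer_size = align_up(start_bytes, mem_align) + end_bytes;
    l.end_offset = l.buffer_size - end_bytes;
    return l;
}
```

For example, with start_bytes=1000, end_bytes=300 and mem_align=512 the
buffer is 1324 bytes long and the end region starts at offset 1024.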

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cluster.c | 117 +++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 87 insertions(+), 30 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -XXX,XX +XXX,XX @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
     return 0;
 }
 
-static int coroutine_fn do_perform_cow(BlockDriverState *bs,
-                                       uint64_t src_cluster_offset,
-                                       uint64_t cluster_offset,
-                                       unsigned offset_in_cluster,
-                                       unsigned bytes)
+static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
+                                            uint64_t src_cluster_offset,
+                                            unsigned offset_in_cluster,
+                                            uint8_t *buffer,
+                                            unsigned bytes)
 {
-    BDRVQcow2State *s = bs->opaque;
     QEMUIOVector qiov;
-    struct iovec iov;
+    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
     int ret;
 
     if (bytes == 0) {
         return 0;
     }
 
-    iov.iov_len = bytes;
-    iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
-    if (iov.iov_base == NULL) {
-        return -ENOMEM;
-    }
-
     qemu_iovec_init_external(&qiov, &iov, 1);
 
     BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
 
     if (!bs->drv) {
-        ret = -ENOMEDIUM;
-        goto out;
+        return -ENOMEDIUM;
     }
 
     /* Call .bdrv_co_readv() directly instead of using the public block-layer
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn do_perform_cow(BlockDriverState *bs,
     ret = bs->drv->bdrv_co_preadv(bs, src_cluster_offset + offset_in_cluster,
                                   bytes, &qiov, 0);
     if (ret < 0) {
-        goto out;
+        return ret;
     }
 
-    if (bs->encrypted) {
+    return 0;
+}
+
+static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
+                                                uint64_t src_cluster_offset,
+                                                unsigned offset_in_cluster,
+                                                uint8_t *buffer,
+                                                unsigned bytes)
+{
+    if (bytes && bs->encrypted) {
+        BDRVQcow2State *s = bs->opaque;
         int64_t sector = (src_cluster_offset + offset_in_cluster)
                          >> BDRV_SECTOR_BITS;
         assert(s->cipher);
         assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
         assert((bytes & ~BDRV_SECTOR_MASK) == 0);
-        if (qcow2_encrypt_sectors(s, sector, iov.iov_base, iov.iov_base,
+        if (qcow2_encrypt_sectors(s, sector, buffer, buffer,
                                   bytes >> BDRV_SECTOR_BITS, true, NULL) < 0) {
-            ret = -EIO;
-            goto out;
+            return false;
         }
     }
+    return true;
+}
+
+static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
+                                             uint64_t cluster_offset,
+                                             unsigned offset_in_cluster,
+                                             uint8_t *buffer,
+                                             unsigned bytes)
+{
+    QEMUIOVector qiov;
+    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
+    int ret;
+
+    if (bytes == 0) {
+        return 0;
+    }
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
 
     ret = qcow2_pre_write_overlap_check(bs, 0,
             cluster_offset + offset_in_cluster, bytes);
     if (ret < 0) {
-        goto out;
+        return ret;
     }
 
     BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
     ret = bdrv_co_pwritev(bs->file, cluster_offset + offset_in_cluster,
                           bytes, &qiov, 0);
     if (ret < 0) {
-        goto out;
+        return ret;
     }
 
-    ret = 0;
-out:
-    qemu_vfree(iov.iov_base);
-    return ret;
+    return 0;
 }
 
 
@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     BDRVQcow2State *s = bs->opaque;
     Qcow2COWRegion *start = &m->cow_start;
     Qcow2COWRegion *end = &m->cow_end;
+    unsigned buffer_size;
+    uint8_t *start_buffer, *end_buffer;
     int ret;
 
+    assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
+
     if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
     }
 
+    /* Reserve a buffer large enough to store the data from both the
+     * start and end COW regions. Add some padding in the middle if
+     * necessary to make sure that the end region is optimally aligned */
+    buffer_size = QEMU_ALIGN_UP(start->nb_bytes, bdrv_opt_mem_align(bs)) +
+        end->nb_bytes;
+    start_buffer = qemu_try_blockalign(bs, buffer_size);
+    if (start_buffer == NULL) {
+        return -ENOMEM;
+    }
+    /* The part of the buffer where the end region is located */
+    end_buffer = start_buffer + buffer_size - end->nb_bytes;
+
     qemu_co_mutex_unlock(&s->lock);
-    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
-                         start->offset, start->nb_bytes);
+    /* First we read the existing data from both COW regions */
+    ret = do_perform_cow_read(bs, m->offset, start->offset,
+                              start_buffer, start->nb_bytes);
     if (ret < 0) {
         goto fail;
     }
 
-    ret = do_perform_cow(bs, m->offset, m->alloc_offset,
-                         end->offset, end->nb_bytes);
+    ret = do_perform_cow_read(bs, m->offset, end->offset,
+                              end_buffer, end->nb_bytes);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /* Encrypt the data if necessary before writing it */
+    if (bs->encrypted) {
+        if (!do_perform_cow_encrypt(bs, m->offset, start->offset,
+                                    start_buffer, start->nb_bytes) ||
+            !do_perform_cow_encrypt(bs, m->offset, end->offset,
+                                    end_buffer, end->nb_bytes)) {
+            ret = -EIO;
+            goto fail;
+        }
+    }
+
+    /* And now we can write everything */
+    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
+                               start_buffer, start->nb_bytes);
+    if (ret < 0) {
+        goto fail;
+    }
 
+    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset,
+                               end_buffer, end->nb_bytes);
 fail:
     qemu_co_mutex_lock(&s->lock);
 
@@ -XXX,XX +XXX,XX @@ fail:
         qcow2_cache_depends_on_flush(s->l2_table_cache);
     }
 
+    qemu_vfree(start_buffer);
     return ret;
 }
 
-- 
1.8.3.1

From: Alberto Garcia <berto@igalia.com>

Instead of passing a single buffer pointer to do_perform_cow_write(),
pass a QEMUIOVector. This will allow us to merge the write requests
for the COW regions and the actual data into a single one.

Although do_perform_cow_read() does not strictly need to change its
API, we're doing it here as well for consistency.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cluster.c | 51 ++++++++++++++++++++++++---------------------------
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -XXX,XX +XXX,XX @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
 static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
                                             uint64_t src_cluster_offset,
                                             unsigned offset_in_cluster,
-                                            uint8_t *buffer,
-                                            unsigned bytes)
+                                            QEMUIOVector *qiov)
 {
-    QEMUIOVector qiov;
-    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
     int ret;
 
-    if (bytes == 0) {
+    if (qiov->size == 0) {
         return 0;
     }
 
-    qemu_iovec_init_external(&qiov, &iov, 1);
-
     BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
 
     if (!bs->drv) {
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
      * which can lead to deadlock when block layer copy-on-read is enabled.
      */
     ret = bs->drv->bdrv_co_preadv(bs, src_cluster_offset + offset_in_cluster,
-                                  bytes, &qiov, 0);
+                                  qiov->size, qiov, 0);
     if (ret < 0) {
         return ret;
     }
@@ -XXX,XX +XXX,XX @@ static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
 static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
                                              uint64_t cluster_offset,
                                              unsigned offset_in_cluster,
-                                             uint8_t *buffer,
-                                             unsigned bytes)
+                                             QEMUIOVector *qiov)
 {
-    QEMUIOVector qiov;
-    struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
     int ret;
 
-    if (bytes == 0) {
+    if (qiov->size == 0) {
         return 0;
     }
 
-    qemu_iovec_init_external(&qiov, &iov, 1);
-
     ret = qcow2_pre_write_overlap_check(bs, 0,
-            cluster_offset + offset_in_cluster, bytes);
+            cluster_offset + offset_in_cluster, qiov->size);
     if (ret < 0) {
         return ret;
     }
 
     BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
     ret = bdrv_co_pwritev(bs->file, cluster_offset + offset_in_cluster,
-                          bytes, &qiov, 0);
+                          qiov->size, qiov, 0);
     if (ret < 0) {
         return ret;
     }
@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     unsigned data_bytes = end->offset - (start->offset + start->nb_bytes);
     bool merge_reads;
     uint8_t *start_buffer, *end_buffer;
+    QEMUIOVector qiov;
     int ret;
 
     assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     /* The part of the buffer where the end region is located */
     end_buffer = start_buffer + buffer_size - end->nb_bytes;
 
+    qemu_iovec_init(&qiov, 1);
+
     qemu_co_mutex_unlock(&s->lock);
     /* First we read the existing data from both COW regions. We
      * either read the whole region in one go, or the start and end
      * regions separately. */
     if (merge_reads) {
-        ret = do_perform_cow_read(bs, m->offset, start->offset,
-                                  start_buffer, buffer_size);
+        qemu_iovec_add(&qiov, start_buffer, buffer_size);
+        ret = do_perform_cow_read(bs, m->offset, start->offset, &qiov);
     } else {
-        ret = do_perform_cow_read(bs, m->offset, start->offset,
-                                  start_buffer, start->nb_bytes);
+        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+        ret = do_perform_cow_read(bs, m->offset, start->offset, &qiov);
         if (ret < 0) {
             goto fail;
         }
 
-        ret = do_perform_cow_read(bs, m->offset, end->offset,
-                                  end_buffer, end->nb_bytes);
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+        ret = do_perform_cow_read(bs, m->offset, end->offset, &qiov);
     }
     if (ret < 0) {
         goto fail;
@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     }
 
     /* And now we can write everything */
-    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
-                               start_buffer, start->nb_bytes);
+    qemu_iovec_reset(&qiov);
+    qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
     if (ret < 0) {
         goto fail;
     }
 
-    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset,
-                               end_buffer, end->nb_bytes);
+    qemu_iovec_reset(&qiov);
+    qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
 fail:
     qemu_co_mutex_lock(&s->lock);
 
@@ -XXX,XX +XXX,XX @@ fail:
     }
 
     qemu_vfree(start_buffer);
+    qemu_iovec_destroy(&qiov);
     return ret;
 }
 
-- 
1.8.3.1

From: Alberto Garcia <berto@igalia.com>

If the guest tries to write data that results in the allocation of a
new cluster, instead of writing the guest data first and then the data
from the COW regions, write everything together using one single I/O
operation.

This can improve the write performance by 25% or more, depending on
several factors such as the media type, the cluster size and the I/O
request size.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cluster.c | 40 ++++++++++++++++++++++++--------
 block/qcow2.c         | 64 +++++++++++++++++++++++++++++++++++++++++++--------
 block/qcow2.h         |  7 ++++++
 3 files changed, 91 insertions(+), 20 deletions(-)
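
Note (illustration only, not part of the patch): the merged write described
above can be sketched outside QEMU with a plain POSIX iovec. This is a
minimal sketch, assuming struct iovec and writev() as stand-ins for
QEMUIOVector and do_perform_cow_write(); build_merged_iov() and
write_merged() are hypothetical names. As in the patch, empty COW regions
are skipped and the caller must leave room for two extra entries.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper, not QEMU code: lay out
 * [COW start][guest data][COW end] in a single iovec array.
 * 'out' must have room for data_cnt + 2 entries.
 * Returns the number of entries actually used. */
int build_merged_iov(struct iovec *out,
                     void *cow_start, size_t start_len,
                     const struct iovec *data, int data_cnt,
                     void *cow_end, size_t end_len)
{
    int n = 0;

    assert(out != NULL && data_cnt >= 0);

    /* Skip the start region if there is nothing to copy */
    if (start_len > 0) {
        out[n].iov_base = cow_start;
        out[n].iov_len = start_len;
        n++;
    }

    /* The guest data goes in the middle, entry by entry */
    for (int i = 0; i < data_cnt; i++) {
        out[n++] = data[i];
    }

    /* Skip the end region if there is nothing to copy */
    if (end_len > 0) {
        out[n].iov_base = cow_end;
        out[n].iov_len = end_len;
        n++;
    }

    return n;
}

/* One gather write instead of up to three separate writes */
ssize_t write_merged(int fd, const struct iovec *iov, int iovcnt)
{
    return writev(fd, iov, iovcnt);
}
```

The point of the layout is that the three pieces are contiguous on disk, so
a single vectored write can cover them without copying the guest data into
the COW buffer.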

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
     assert(start->nb_bytes + end->nb_bytes <= UINT_MAX - data_bytes);
     assert(start->offset + start->nb_bytes <= end->offset);
+    assert(!m->data_qiov || m->data_qiov->size == data_bytes);
 
     if (start->nb_bytes == 0 && end->nb_bytes == 0) {
         return 0;
@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
     /* The part of the buffer where the end region is located */
     end_buffer = start_buffer + buffer_size - end->nb_bytes;
 
-    qemu_iovec_init(&qiov, 1);
+    qemu_iovec_init(&qiov, 2 + (m->data_qiov ? m->data_qiov->niov : 0));
 
     qemu_co_mutex_unlock(&s->lock);
     /* First we read the existing data from both COW regions. We
@@ -XXX,XX +XXX,XX @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
         }
     }
 
-    /* And now we can write everything */
-    qemu_iovec_reset(&qiov);
-    qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
-    ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
-    if (ret < 0) {
-        goto fail;
+    /* And now we can write everything. If we have the guest data we
+     * can write everything in one single operation */
+    if (m->data_qiov) {
+        qemu_iovec_reset(&qiov);
+        if (start->nb_bytes) {
+            qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+        }
+        qemu_iovec_concat(&qiov, m->data_qiov, 0, data_bytes);
+        if (end->nb_bytes) {
+            qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+        }
+        /* NOTE: we have a write_aio blkdebug event here followed by
+         * a cow_write one in do_perform_cow_write(), but there's only
+         * one single I/O operation */
+        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+        ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
+    } else {
+        /* If there's no guest data then write both COW regions separately */
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, start_buffer, start->nb_bytes);
+        ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, &qiov);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        qemu_iovec_reset(&qiov);
+        qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
+        ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
     }
 
-    qemu_iovec_reset(&qiov);
-    qemu_iovec_add(&qiov, end_buffer, end->nb_bytes);
-    ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, &qiov);
 fail:
     qemu_co_mutex_lock(&s->lock);
 
diff --git a/block/qcow2.c b/block/qcow2.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -XXX,XX +XXX,XX @@ fail:
     return ret;
 }
 
+/* Check if it's possible to merge a write request with the writing of
+ * the data from the COW regions */
+static bool merge_cow(uint64_t offset, unsigned bytes,
+                      QEMUIOVector *hd_qiov, QCowL2Meta *l2meta)
+{
+    QCowL2Meta *m;
+
+    for (m = l2meta; m != NULL; m = m->next) {
+        /* If both COW regions are empty then there's nothing to merge */
+        if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
+            continue;
+        }
+
+        /* The data (middle) region must be immediately after the
+         * start region */
+        if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
+            continue;
+        }
+
+        /* The end region must be immediately after the data (middle)
+         * region */
+        if (m->offset + m->cow_end.offset != offset + bytes) {
+            continue;
+        }
+
+        /* Make sure that adding both COW regions to the QEMUIOVector
+         * does not exceed IOV_MAX */
+        if (hd_qiov->niov > IOV_MAX - 2) {
+            continue;
+        }
+
+        m->data_qiov = hd_qiov;
+        return true;
+    }
+
+    return false;
+}
+
 static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
                                          uint64_t bytes, QEMUIOVector *qiov,
                                          int flags)
@@ -XXX,XX +XXX,XX @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
             goto fail;
         }
 
-        qemu_co_mutex_unlock(&s->lock);
-        BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-        trace_qcow2_writev_data(qemu_coroutine_self(),
-                                cluster_offset + offset_in_cluster);
-        ret = bdrv_co_pwritev(bs->file,
-                              cluster_offset + offset_in_cluster,
-                              cur_bytes, &hd_qiov, 0);
-        qemu_co_mutex_lock(&s->lock);
-        if (ret < 0) {
-            goto fail;
+        /* If we need to do COW, check if it's possible to merge the
+         * writing of the guest data together with that of the COW regions.
+         * If it's not possible (or not necessary) then write the
+         * guest data now. */
+        if (!merge_cow(offset, cur_bytes, &hd_qiov, l2meta)) {
+            qemu_co_mutex_unlock(&s->lock);
+            BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+            trace_qcow2_writev_data(qemu_coroutine_self(),
+                                    cluster_offset + offset_in_cluster);
+            ret = bdrv_co_pwritev(bs->file,
+                                  cluster_offset + offset_in_cluster,
+                                  cur_bytes, &hd_qiov, 0);
+            qemu_co_mutex_lock(&s->lock);
+            if (ret < 0) {
+                goto fail;
+            }
         }
 
         while (l2meta != NULL) {
diff --git a/block/qcow2.h b/block/qcow2.h
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -XXX,XX +XXX,XX @@ typedef struct QCowL2Meta
      */
     Qcow2COWRegion cow_end;
 
+    /**
+     * The I/O vector with the data from the actual guest write request.
+     * If non-NULL, this is meant to be merged together with the data
+     * from @cow_start and @cow_end into one single write operation.
+     */
+    QEMUIOVector *data_qiov;
+
     /** Pointer to next L2Meta of the same write request */
     struct QCowL2Meta *next;
 
-- 
1.8.3.1

From: Alberto Garcia <berto@igalia.com>

We already have functions for doing these calculations, so let's use
them instead of doing everything by hand. This makes the code a bit
more readable.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-cluster.c | 4 ++--
 block/qcow2.c         | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
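
Note (illustration only, not part of the patch): a minimal sketch of the
arithmetic that the two helpers replace, taken from the open-coded
expressions removed below. struct cluster_geom and the sketch_* names are
hypothetical stand-ins for the relevant BDRVQcow2State fields and the real
helpers, whose exact definitions are not shown in this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields used by the helpers */
struct cluster_geom {
    unsigned cluster_bits;   /* log2 of the cluster size */
    uint64_t cluster_size;   /* 1 << cluster_bits */
    unsigned l2_size;        /* entries per L2 table, a power of two */
};

/* Byte offset inside the cluster that contains 'offset' */
uint64_t sketch_offset_into_cluster(const struct cluster_geom *g,
                                    uint64_t offset)
{
    assert(g->cluster_size == (UINT64_C(1) << g->cluster_bits));
    return offset & (g->cluster_size - 1);
}

/* Index of the L2 entry that maps 'offset' within its L2 table */
unsigned sketch_offset_to_l2_index(const struct cluster_geom *g,
                                   uint64_t offset)
{
    /* The mask below only works for power-of-two table sizes */
    assert((g->l2_size & (g->l2_size - 1)) == 0);
    return (offset >> g->cluster_bits) & (g->l2_size - 1);
}
```

With 64k clusters and 8192 entries per L2 table (assumed values), an offset
of three clusters plus 10 bytes gives an L2 index of 3 and an in-cluster
offset of 10; a table offset is cluster aligned exactly when the in-cluster
offset is 0.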

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -XXX,XX +XXX,XX @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
 
     /* find the cluster offset for the given disk offset */
 
-    l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
+    l2_index = offset_to_l2_index(s, offset);
     *cluster_offset = be64_to_cpu(l2_table[l2_index]);
 
     nb_clusters = size_to_clusters(s, bytes_needed);
@@ -XXX,XX +XXX,XX @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
 
     /* find the cluster offset for the given disk offset */
 
-    l2_index = (offset >> s->cluster_bits) & (s->l2_size - 1);
+    l2_index = offset_to_l2_index(s, offset);
 
     *new_l2_table = l2_table;
     *new_l2_index = l2_index;
diff --git a/block/qcow2.c b/block/qcow2.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -XXX,XX +XXX,XX @@ static int validate_table_offset(BlockDriverState *bs, uint64_t offset,
     }
 
     /* Tables must be cluster aligned */
-    if (offset & (s->cluster_size - 1)) {
+    if (offset_into_cluster(s, offset) != 0) {
         return -EINVAL;
     }
 
-- 
1.8.3.1

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed-cluster.c | 94 ++++++++++++++++++-----------------------------------
 block/qed-table.c   | 15 +++------
 block/qed.h         |  3 +-
 3 files changed, 36 insertions(+), 76 deletions(-)

diff --git a/block/qed-cluster.c b/block/qed-cluster.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed-cluster.c
+++ b/block/qed-cluster.c
@@ -XXX,XX +XXX,XX @@ static unsigned int qed_count_contiguous_clusters(BDRVQEDState *s,
     return i - index;
 }
 
-typedef struct {
-    BDRVQEDState *s;
-    uint64_t pos;
-    size_t len;
-
-    QEDRequest *request;
-
-    /* User callback */
-    QEDFindClusterFunc *cb;
-    void *opaque;
-} QEDFindClusterCB;
-
-static void qed_find_cluster_cb(void *opaque, int ret)
-{
-    QEDFindClusterCB *find_cluster_cb = opaque;
-    BDRVQEDState *s = find_cluster_cb->s;
-    QEDRequest *request = find_cluster_cb->request;
-    uint64_t offset = 0;
-    size_t len = 0;
-    unsigned int index;
-    unsigned int n;
-
-    qed_acquire(s);
-    if (ret) {
-        goto out;
-    }
-
-    index = qed_l2_index(s, find_cluster_cb->pos);
-    n = qed_bytes_to_clusters(s,
-                              qed_offset_into_cluster(s, find_cluster_cb->pos) +
-                              find_cluster_cb->len);
-    n = qed_count_contiguous_clusters(s, request->l2_table->table,
-                                      index, n, &offset);
-
-    if (qed_offset_is_unalloc_cluster(offset)) {
-        ret = QED_CLUSTER_L2;
-    } else if (qed_offset_is_zero_cluster(offset)) {
-        ret = QED_CLUSTER_ZERO;
-    } else if (qed_check_cluster_offset(s, offset)) {
-        ret = QED_CLUSTER_FOUND;
-    } else {
-        ret = -EINVAL;
-    }
-
-    len = MIN(find_cluster_cb->len, n * s->header.cluster_size -
-              qed_offset_into_cluster(s, find_cluster_cb->pos));
-
-out:
-    find_cluster_cb->cb(find_cluster_cb->opaque, ret, offset, len);
-    qed_release(s);
-    g_free(find_cluster_cb);
-}
-
 /**
  * Find the offset of a data cluster
  *
@@ -XXX,XX +XXX,XX @@ out:
 void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
                       size_t len, QEDFindClusterFunc *cb, void *opaque)
 {
-    QEDFindClusterCB *find_cluster_cb;
     uint64_t l2_offset;
+    uint64_t offset = 0;
+    unsigned int index;
+    unsigned int n;
+    int ret;
 
     /* Limit length to L2 boundary.  Requests are broken up at the L2 boundary
      * so that a request acts on one L2 table at a time.
@@ -XXX,XX +XXX,XX @@ void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
         return;
     }
 
-    find_cluster_cb = g_malloc(sizeof(*find_cluster_cb));
-    find_cluster_cb->s = s;
-    find_cluster_cb->pos = pos;
-    find_cluster_cb->len = len;
-    find_cluster_cb->cb = cb;
-    find_cluster_cb->opaque = opaque;
-    find_cluster_cb->request = request;
+    ret = qed_read_l2_table(s, request, l2_offset);
+    qed_acquire(s);
+    if (ret) {
+        goto out;
+    }
+
+    index = qed_l2_index(s, pos);
+    n = qed_bytes_to_clusters(s,
+                              qed_offset_into_cluster(s, pos) + len);
+    n = qed_count_contiguous_clusters(s, request->l2_table->table,
+                                      index, n, &offset);
+
+    if (qed_offset_is_unalloc_cluster(offset)) {
+        ret = QED_CLUSTER_L2;
+    } else if (qed_offset_is_zero_cluster(offset)) {
+        ret = QED_CLUSTER_ZERO;
+    } else if (qed_check_cluster_offset(s, offset)) {
+        ret = QED_CLUSTER_FOUND;
+    } else {
+        ret = -EINVAL;
+    }
+
+    len = MIN(len,
+              n * s->header.cluster_size - qed_offset_into_cluster(s, pos));
 
-    qed_read_l2_table(s, request, l2_offset,
-                      qed_find_cluster_cb, find_cluster_cb);
+out:
+    cb(opaque, ret, offset, len);
+    qed_release(s);
 }
diff --git a/block/qed-table.c b/block/qed-table.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed-table.c
+++ b/block/qed-table.c
@@ -XXX,XX +XXX,XX @@ int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
     return ret;
 }
 
-void qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset,
-                       BlockCompletionFunc *cb, void *opaque)
+int qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset)
 {
     int ret;
 
@@ -XXX,XX +XXX,XX @@ void qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset,
     /* Check for cached L2 entry */
     request->l2_table = qed_find_l2_cache_entry(&s->l2_cache, offset);
     if (request->l2_table) {
-        cb(opaque, 0);
-        return;
+        return 0;
     }
 
     request->l2_table = qed_alloc_l2_cache_entry(&s->l2_cache);
@@ -XXX,XX +XXX,XX @@ void qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset,
     }
     qed_release(s);
 
-    cb(opaque, ret);
+    return ret;
 }
 
 int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request, uint64_t offset)
 {
-    int ret = -EINPROGRESS;
-
-    qed_read_l2_table(s, request, offset, qed_sync_cb, &ret);
-    BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS);
-
-    return ret;
+    return qed_read_l2_table(s, request, offset);
 }
 
 void qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
diff --git a/block/qed.h b/block/qed.h
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -XXX,XX +XXX,XX @@ int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
                             unsigned int n);
 int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
                            uint64_t offset);
-void qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset,
-                       BlockCompletionFunc *cb, void *opaque);
+int qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset);
 void qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
                         unsigned int index, unsigned int n, bool flush,
                         BlockCompletionFunc *cb, void *opaque);
-- 
1.8.3.1

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed-cluster.c | 39 ++++++++++++++++++++++-----------------
 block/qed.c         | 24 +++++++++++-------------
 block/qed.h         |  4 ++--
 3 files changed, 35 insertions(+), 32 deletions(-)
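
Note (illustration only, not part of the patch): with the callback gone,
the length becomes an in/out parameter that the function may shorten. The
sketch below isolates the "limit length to L2 boundary" step from
qed_find_cluster() in that style. sketch_clamp_to_l2_boundary() is a
hypothetical name, and l1_shift is assumed to be the log2 of the number of
bytes covered by one L2 table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shorten *len so that [pos, pos + *len) does not cross into the
 * range covered by the next L2 table. */
void sketch_clamp_to_l2_boundary(uint64_t pos, size_t *len,
                                 unsigned l1_shift)
{
    uint64_t next_boundary;
    uint64_t room;

    assert(len != NULL);

    /* First byte covered by the following L2 table */
    next_boundary = ((pos >> l1_shift) + 1) << l1_shift;
    room = next_boundary - pos;

    if (*len > room) {
        *len = (size_t)room;
    }
}
```

A caller passes the requested length in and reads the usable length back,
which is the same contract the new qed_find_cluster() signature uses for
@len.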

diff --git a/block/qed-cluster.c b/block/qed-cluster.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed-cluster.c
+++ b/block/qed-cluster.c
@@ -XXX,XX +XXX,XX @@ static unsigned int qed_count_contiguous_clusters(BDRVQEDState *s,
  * @s:          QED state
  * @request:    L2 cache entry
  * @pos:        Byte position in device
- * @len:        Number of bytes
- * @cb:         Completion function
- * @opaque:     User data for completion function
+ * @len:        Number of bytes (may be shortened on return)
+ * @img_offset: Contains offset in the image file on success
  *
  * This function translates a position in the block device to an offset in the
- * image file.  It invokes the cb completion callback to report back the
- * translated offset or unallocated range in the image file.
+ * image file. The translated offset or unallocated range in the image file is
+ * reported back in *img_offset and *len.
  *
  * If the L2 table exists, request->l2_table points to the L2 table cache entry
  * and the caller must free the reference when they are finished.  The cache
  * entry is exposed in this way to avoid callers having to read the L2 table
  * again later during request processing.  If request->l2_table is non-NULL it
  * will be unreferenced before taking on the new cache entry.
+ *
+ * On success QED_CLUSTER_FOUND is returned and img_offset/len are a contiguous
+ * range in the image file.
+ *
+ * On failure QED_CLUSTER_L2 or QED_CLUSTER_L1 is returned for missing L2 or L1
+ * table offset, respectively. len is number of contiguous unallocated bytes.
  */
-void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
-                      size_t len, QEDFindClusterFunc *cb, void *opaque)
+int qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
+                     size_t *len, uint64_t *img_offset)
 {
     uint64_t l2_offset;
     uint64_t offset = 0;
@@ -XXX,XX +XXX,XX @@ void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
     /* Limit length to L2 boundary.  Requests are broken up at the L2 boundary
      * so that a request acts on one L2 table at a time.
      */
-    len = MIN(len, (((pos >> s->l1_shift) + 1) << s->l1_shift) - pos);
+    *len = MIN(*len, (((pos >> s->l1_shift) + 1) << s->l1_shift) - pos);
 
     l2_offset = s->l1_table->offsets[qed_l1_index(s, pos)];
     if (qed_offset_is_unalloc_cluster(l2_offset)) {
-        cb(opaque, QED_CLUSTER_L1, 0, len);
-        return;
+        *img_offset = 0;
+        return QED_CLUSTER_L1;
     }
     if (!qed_check_table_offset(s, l2_offset)) {
-        cb(opaque, -EINVAL, 0, 0);
-        return;
+        *img_offset = *len = 0;
+        return -EINVAL;
     }
 
     ret = qed_read_l2_table(s, request, l2_offset);
@@ -XXX,XX +XXX,XX @@ void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
     }
 
     index = qed_l2_index(s, pos);
-    n = qed_bytes_to_clusters(s,
-                              qed_offset_into_cluster(s, pos) + len);
+    n = qed_bytes_to_clusters(s, qed_offset_into_cluster(s, pos) + *len);
     n = qed_count_contiguous_clusters(s, request->l2_table->table,
                                       index, n, &offset);
 
@@ -XXX,XX +XXX,XX @@ void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
         ret = -EINVAL;
     }
 
-    len = MIN(len,
-              n * s->header.cluster_size - qed_offset_into_cluster(s, pos));
+    *len = MIN(*len,
+               n * s->header.cluster_size - qed_offset_into_cluster(s, pos));
 
 out:
-    cb(opaque, ret, offset, len);
+    *img_offset = offset;
     qed_release(s);
+    return ret;
 }
diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
         .file = file,
     };
     QEDRequest request = { .l2_table = NULL };
+    uint64_t offset;
+    int ret;
 
-    qed_find_cluster(s, &request, cb.pos, len, qed_is_allocated_cb, &cb);
+    ret = qed_find_cluster(s, &request, cb.pos, &len, &offset);
+    qed_is_allocated_cb(&cb, ret, offset, len);
 
-    /* Now sleep if the callback wasn't invoked immediately */
-    while (cb.status == BDRV_BLOCK_OFFSET_MASK) {
-        cb.co = qemu_coroutine_self();
-        qemu_coroutine_yield();
-    }
+    /* The callback was invoked immediately */
+    assert(cb.status != BDRV_BLOCK_OFFSET_MASK);
 
     qed_unref_l2_cache_entry(request.l2_table);
 
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
  *              or -errno
  * @offset:     Cluster offset in bytes
  * @len:        Length in bytes
- *
- * Callback from qed_find_cluster().
  */
 static void qed_aio_write_data(void *opaque, int ret,
                                uint64_t offset, size_t len)
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_data(void *opaque, int ret,
  *              or -errno
  * @offset:     Cluster offset in bytes
  * @len:        Length in bytes
- *
- * Callback from qed_find_cluster().
  */
 static void qed_aio_read_data(void *opaque, int ret,
                               uint64_t offset, size_t len)
@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb, int ret)
     BDRVQEDState *s = acb_to_s(acb);
     QEDFindClusterFunc *io_fn = (acb->flags & QED_AIOCB_WRITE) ?
                                 qed_aio_write_data : qed_aio_read_data;
+    uint64_t offset;
+    size_t len;
 
     trace_qed_aio_next_io(s, acb, ret, acb->cur_pos + acb->cur_qiov.size);
 
@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb, int ret)
     }
 
     /* Find next cluster and start I/O */
-    qed_find_cluster(s, &acb->request,
-                      acb->cur_pos, acb->end_pos - acb->cur_pos,
-                      io_fn, acb);
+    len = acb->end_pos - acb->cur_pos;
+    ret = qed_find_cluster(s, &acb->request, acb->cur_pos, &len, &offset);
+    io_fn(acb, ret, offset, len);
 }
 
 static BlockAIOCB *qed_aio_setup(BlockDriverState *bs,
diff --git a/block/qed.h b/block/qed.h
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -XXX,XX +XXX,XX @@ int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
 /**
  * Cluster functions
  */
-void qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
-                      size_t len, QEDFindClusterFunc *cb, void *opaque);
+int qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
+                     size_t *len, uint64_t *img_offset);
 
 /**
  * Consistency check
-- 
1.8.3.1

With this change, qed_aio_write_prefill() and qed_aio_write_postfill()
collapse into a single function. This is reflected by a rename of the
combined function to qed_aio_write_cow().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 57 +++++++++++++++++++++++----------------------------------
 1 file changed, 23 insertions(+), 34 deletions(-)
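
Note (illustration only, not part of the patch): a minimal sketch of the
shape the combined function takes once the copy helper returns a value
instead of invoking a callback. The region arithmetic mirrors the patch,
assuming that qed_start_of_cluster() rounds down to a cluster boundary and
that the cluster size is a power of two. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct region {
    uint64_t start;   /* byte position in the device */
    uint64_t len;     /* number of bytes to populate */
};

/* Untouched bytes between the cluster start and the write */
struct region sketch_cow_front(uint64_t cur_pos, uint64_t cluster_size)
{
    struct region r;

    assert((cluster_size & (cluster_size - 1)) == 0);
    r.start = cur_pos & ~(cluster_size - 1);
    r.len = cur_pos & (cluster_size - 1);
    return r;
}

/* Untouched bytes between the end of the write and the cluster end */
struct region sketch_cow_back(uint64_t cur_pos, uint64_t write_len,
                              uint64_t cluster_size)
{
    struct region r;

    assert((cluster_size & (cluster_size - 1)) == 0);
    r.start = cur_pos + write_len;
    r.len = ((r.start + cluster_size - 1) & ~(cluster_size - 1)) - r.start;
    return r;
}

/* Hypothetical copy routine type: returns 0 or a negative errno */
typedef int (*sketch_copy_fn)(struct region r, void *opaque);

/* Both fills run back to back; the first error stops the sequence */
int sketch_write_cow(uint64_t cur_pos, uint64_t write_len,
                     uint64_t cluster_size, sketch_copy_fn copy,
                     void *opaque)
{
    int ret = copy(sketch_cow_front(cur_pos, cluster_size), opaque);

    if (ret) {
        return ret;
    }
    return copy(sketch_cow_back(cur_pos, write_len, cluster_size), opaque);
}
```

Compared with two chained callbacks, the error path and the order of the
two fills are visible in a single function body.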

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static int qed_read_backing_file(BDRVQEDState *s, uint64_t pos,
  * @pos:        Byte position in device
  * @len:        Number of bytes
  * @offset:     Byte offset in image file
- * @cb:         Completion function
- * @opaque:     User data for completion function
  */
-static void qed_copy_from_backing_file(BDRVQEDState *s, uint64_t pos,
-                                       uint64_t len, uint64_t offset,
-                                       BlockCompletionFunc *cb,
-                                       void *opaque)
+static int qed_copy_from_backing_file(BDRVQEDState *s, uint64_t pos,
+                                      uint64_t len, uint64_t offset)
 {
     QEMUIOVector qiov;
     QEMUIOVector *backing_qiov = NULL;
@@ -XXX,XX +XXX,XX @@ static void qed_copy_from_backing_file(BDRVQEDState *s, uint64_t pos,
 
     /* Skip copy entirely if there is no work to do */
     if (len == 0) {
-        cb(opaque, 0);
-        return;
+        return 0;
     }
 
     iov = (struct iovec) {
@@ -XXX,XX +XXX,XX @@ static void qed_copy_from_backing_file(BDRVQEDState *s, uint64_t pos,
     ret = 0;
 out:
     qemu_vfree(iov.iov_base);
-    cb(opaque, ret);
+    return ret;
 }
 
 /**
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_main(void *opaque, int ret)
 }
 
 /**
- * Populate back untouched region of new data cluster
+ * Populate untouched regions of new data cluster
  */
-static void qed_aio_write_postfill(void *opaque, int ret)
+static void qed_aio_write_cow(void *opaque, int ret)
 {
     QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
-    uint64_t start = acb->cur_pos + acb->cur_qiov.size;
-    uint64_t len =
-        qed_start_of_cluster(s, start + s->header.cluster_size - 1) - start;
-    uint64_t offset = acb->cur_cluster +
-                      qed_offset_into_cluster(s, acb->cur_pos) +
-                      acb->cur_qiov.size;
+    uint64_t start, len, offset;
+
+    /* Populate front untouched region of new data cluster */
+    start = qed_start_of_cluster(s, acb->cur_pos);
+    len = qed_offset_into_cluster(s, acb->cur_pos);
 
+    trace_qed_aio_write_prefill(s, acb, start, len, acb->cur_cluster);
+    ret = qed_copy_from_backing_file(s, start, len, acb->cur_cluster);
     if (ret) {
         qed_aio_complete(acb, ret);
         return;
     }
 
-    trace_qed_aio_write_postfill(s, acb, start, len, offset);
-    qed_copy_from_backing_file(s, start, len, offset,
-                                qed_aio_write_main, acb);
-}
+    /* Populate back untouched region of new data cluster */
+    start = acb->cur_pos + acb->cur_qiov.size;
+    len = qed_start_of_cluster(s, start + s->header.cluster_size - 1) - start;
+    offset = acb->cur_cluster +
+             qed_offset_into_cluster(s, acb->cur_pos) +
+             acb->cur_qiov.size;
 
-/**
- * Populate front untouched region of new data cluster
- */
-static void qed_aio_write_prefill(void *opaque, int ret)
-{
-    QEDAIOCB *acb = opaque;
-    BDRVQEDState *s = acb_to_s(acb);
-    uint64_t start = qed_start_of_cluster(s, acb->cur_pos);
-    uint64_t len = qed_offset_into_cluster(s, acb->cur_pos);
+    trace_qed_aio_write_postfill(s, acb, start, len, offset);
+    ret = qed_copy_from_backing_file(s, start, len, offset);
 
-    trace_qed_aio_write_prefill(s, acb, start, len, acb->cur_cluster);
-    qed_copy_from_backing_file(s, start, len, acb->cur_cluster,
-                                qed_aio_write_postfill, acb);
+    qed_aio_write_main(acb, ret);
 }
 
 /**
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 
         cb = qed_aio_write_zero_cluster;
     } else {
-        cb = qed_aio_write_prefill;
+        cb = qed_aio_write_cow;
         acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
     }
 
-- 
1.8.3.1

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 32 ++++++++++++--------------------
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ int qed_write_header_sync(BDRVQEDState *s)
  * This function only updates known header fields in-place and does not affect
  * extra data after the QED header.
  */
-static void qed_write_header(BDRVQEDState *s, BlockCompletionFunc cb,
-                             void *opaque)
+static int qed_write_header(BDRVQEDState *s)
 {
     /* We must write full sectors for O_DIRECT but cannot necessarily generate
      * the data following the header if an unrecognized compat feature is
@@ -XXX,XX +XXX,XX @@ static void qed_write_header(BDRVQEDState *s, BlockCompletionFunc cb,
     ret = 0;
 out:
     qemu_vfree(buf);
-    cb(opaque, ret);
+    return ret;
 }
 
 static uint64_t qed_max_image_size(uint32_t cluster_size, uint32_t table_size)
@@ -XXX,XX +XXX,XX @@ static void qed_unplug_allocating_write_reqs(BDRVQEDState *s)
     }
 }
 
-static void qed_finish_clear_need_check(void *opaque, int ret)
-{
-    /* Do nothing */
-}
-
-static void qed_flush_after_clear_need_check(void *opaque, int ret)
-{
-    BDRVQEDState *s = opaque;
-
-    bdrv_aio_flush(s->bs, qed_finish_clear_need_check, s);
-
-    /* No need to wait until flush completes */
-    qed_unplug_allocating_write_reqs(s);
-}
-
 static void qed_clear_need_check(void *opaque, int ret)
 {
     BDRVQEDState *s = opaque;
@@ -XXX,XX +XXX,XX @@ static void qed_clear_need_check(void *opaque, int ret)
     }
 
     s->header.features &= ~QED_F_NEED_CHECK;
-    qed_write_header(s, qed_flush_after_clear_need_check, s);
+    ret = qed_write_header(s);
+    (void) ret;
+
+    qed_unplug_allocating_write_reqs(s);
+
+    ret = bdrv_flush(s->bs);
+    (void) ret;
 }
 
 static void qed_need_check_timer_cb(void *opaque)
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 {
     BDRVQEDState *s = acb_to_s(acb);
     BlockCompletionFunc *cb;
+    int ret;
 
     /* Cancel timer when the first allocating request comes in */
     if (QSIMPLEQ_EMPTY(&s->allocating_write_reqs)) {
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 
     if (qed_should_set_need_check(s)) {
         s->header.features |= QED_F_NEED_CHECK;
-        qed_write_header(s, cb, acb);
+        ret = qed_write_header(s);
+        cb(acb, ret);
     } else {
         cb(acb, 0);
     }
-- 
1.8.3.1

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed-table.c | 47 ++++++++++++-----------------------------------
 block/qed.c       | 12 +++++++-----
 block/qed.h       |  8 +++-----
 3 files changed, 22 insertions(+), 45 deletions(-)

diff --git a/block/qed-table.c b/block/qed-table.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed-table.c
+++ b/block/qed-table.c
@@ -XXX,XX +XXX,XX @@ out:
  * @index:      Index of first element
  * @n:          Number of elements
  * @flush:      Whether or not to sync to disk
- * @cb:         Completion function
- * @opaque:     Argument for completion function
  */
-static void qed_write_table(BDRVQEDState *s, uint64_t offset, QEDTable *table,
-                            unsigned int index, unsigned int n, bool flush,
-                            BlockCompletionFunc *cb, void *opaque)
+static int qed_write_table(BDRVQEDState *s, uint64_t offset, QEDTable *table,
+                           unsigned int index, unsigned int n, bool flush)
 {
     unsigned int sector_mask = BDRV_SECTOR_SIZE / sizeof(uint64_t) - 1;
     unsigned int start, end, i;
@@ -XXX,XX +XXX,XX @@ static void qed_write_table(BDRVQEDState *s, uint64_t offset, QEDTable *table,
     ret = 0;
 out:
     qemu_vfree(new_table);
-    cb(opaque, ret);
-}
-
-/**
- * Propagate return value from async callback
- */
-static void qed_sync_cb(void *opaque, int ret)
-{
-    *(int *)opaque = ret;
+    return ret;
 }
 
 int qed_read_l1_table_sync(BDRVQEDState *s)
@@ -XXX,XX +XXX,XX @@ int qed_read_l1_table_sync(BDRVQEDState *s)
     return qed_read_table(s, s->header.l1_table_offset, s->l1_table);
 }
 
-void qed_write_l1_table(BDRVQEDState *s, unsigned int index, unsigned int n,
-                        BlockCompletionFunc *cb, void *opaque)
+int qed_write_l1_table(BDRVQEDState *s, unsigned int index, unsigned int n)
 {
     BLKDBG_EVENT(s->bs->file, BLKDBG_L1_UPDATE);
-    qed_write_table(s, s->header.l1_table_offset,
-                    s->l1_table, index, n, false, cb, opaque);
+    return qed_write_table(s, s->header.l1_table_offset,
+                           s->l1_table, index, n, false);
 }
 
 int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
                             unsigned int n)
 {
-    int ret = -EINPROGRESS;
-
-    qed_write_l1_table(s, index, n, qed_sync_cb, &ret);
-    BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS);
-
-    return ret;
+    return qed_write_l1_table(s, index, n);
 }
 
 int qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset)
@@ -XXX,XX +XXX,XX @@ int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request, uint64_t offset
     return qed_read_l2_table(s, request, offset);
 }
 
-void qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
-                        unsigned int index, unsigned int n, bool flush,
-                        BlockCompletionFunc *cb, void *opaque)
+int qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
+                       unsigned int index, unsigned int n, bool flush)
 {
     BLKDBG_EVENT(s->bs->file, BLKDBG_L2_UPDATE);
-    qed_write_table(s, request->l2_table->offset,
-                    request->l2_table->table, index, n, flush, cb, opaque);
+    return qed_write_table(s, request->l2_table->offset,
+                           request->l2_table->table, index, n, flush);
 }
 
 int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
                             unsigned int index, unsigned int n, bool flush)
 {
-    int ret = -EINPROGRESS;
-
-    qed_write_l2_table(s, request, index, n, flush, qed_sync_cb, &ret);
-    BDRV_POLL_WHILE(s->bs, ret == -EINPROGRESS);
-
-    return ret;
+    return qed_write_l2_table(s, request, index, n, flush);
 }
diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_l1_update(void *opaque, int ret)
     index = qed_l1_index(s, acb->cur_pos);
     s->l1_table->offsets[index] = acb->request.l2_table->offset;
 
-    qed_write_l1_table(s, index, 1, qed_commit_l2_update, acb);
+    ret = qed_write_l1_table(s, index, 1);
+    qed_commit_l2_update(acb, ret);
 }
 
 /**
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
 
     if (need_alloc) {
         /* Write out the whole new L2 table */
-        qed_write_l2_table(s, &acb->request, 0, s->table_nelems, true,
-                           qed_aio_write_l1_update, acb);
+        ret = qed_write_l2_table(s, &acb->request, 0, s->table_nelems, true);
+        qed_aio_write_l1_update(acb, ret);
     } else {
         /* Write out only the updated part of the L2 table */
-        qed_write_l2_table(s, &acb->request, index, acb->cur_nclusters, false,
-                           qed_aio_next_io_cb, acb);
+        ret = qed_write_l2_table(s, &acb->request, index, acb->cur_nclusters,
+                                 false);
+        qed_aio_next_io(acb, ret);
     }
     return;
 
diff --git a/block/qed.h b/block/qed.h
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -XXX,XX +XXX,XX @@ void qed_commit_l2_cache_entry(L2TableCache *l2_cache, CachedL2Table *l2_table);
  * Table I/O functions
  */
 int qed_read_l1_table_sync(BDRVQEDState *s);
-void qed_write_l1_table(BDRVQEDState *s, unsigned int index, unsigned int n,
-                        BlockCompletionFunc *cb, void *opaque);
+int qed_write_l1_table(BDRVQEDState *s, unsigned int index, unsigned int n);
 int qed_write_l1_table_sync(BDRVQEDState *s, unsigned int index,
                             unsigned int n);
 int qed_read_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
                            uint64_t offset);
 int qed_read_l2_table(BDRVQEDState *s, QEDRequest *request, uint64_t offset);
-void qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
-                        unsigned int index, unsigned int n, bool flush,
-                        BlockCompletionFunc *cb, void *opaque);
+int qed_write_l2_table(BDRVQEDState *s, QEDRequest *request,
+                       unsigned int index, unsigned int n, bool flush);
 int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
                             unsigned int index, unsigned int n, bool flush);
 
-- 
1.8.3.1

Note that this code is generally not running in coroutine context, so
this is an actual blocking synchronous operation. We'll fix this in a
moment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 61 +++++++++++++++++++------------------------------------------
 1 file changed, 19 insertions(+), 42 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static void qed_aio_start_io(QEDAIOCB *acb)
     qed_aio_next_io(acb, 0);
 }
 
-static void qed_aio_next_io_cb(void *opaque, int ret)
-{
-    QEDAIOCB *acb = opaque;
-
-    qed_aio_next_io(acb, ret);
-}
-
 static void qed_plug_allocating_write_reqs(BDRVQEDState *s)
 {
     assert(!s->allocating_write_reqs_plugged);
@@ -XXX,XX +XXX,XX @@ err:
     qed_aio_complete(acb, ret);
 }
 
-static void qed_aio_write_l2_update_cb(void *opaque, int ret)
-{
-    QEDAIOCB *acb = opaque;
-    qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
-}
-
-/**
- * Flush new data clusters before updating the L2 table
- *
- * This flush is necessary when a backing file is in use.  A crash during an
- * allocating write could result in empty clusters in the image.  If the write
- * only touched a subregion of the cluster, then backing image sectors have
- * been lost in the untouched region.  The solution is to flush after writing a
- * new data cluster and before updating the L2 table.
- */
-static void qed_aio_write_flush_before_l2_update(void *opaque, int ret)
-{
-    QEDAIOCB *acb = opaque;
-    BDRVQEDState *s = acb_to_s(acb);
-
-    if (!bdrv_aio_flush(s->bs->file->bs, qed_aio_write_l2_update_cb, opaque)) {
-        qed_aio_complete(acb, -EIO);
-    }
-}
-
 /**
  * Write data to the image file
  */
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_main(void *opaque, int ret)
     BDRVQEDState *s = acb_to_s(acb);
     uint64_t offset = acb->cur_cluster +
                       qed_offset_into_cluster(s, acb->cur_pos);
-    BlockCompletionFunc *next_fn;
 
     trace_qed_aio_write_main(s, acb, ret, offset, acb->cur_qiov.size);
 
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_main(void *opaque, int ret)
         return;
     }
 
+    BLKDBG_EVENT(s->bs->file, BLKDBG_WRITE_AIO);
+    ret = bdrv_pwritev(s->bs->file, offset, &acb->cur_qiov);
+    if (ret >= 0) {
+        ret = 0;
+    }
+
     if (acb->find_cluster_ret == QED_CLUSTER_FOUND) {
-        next_fn = qed_aio_next_io_cb;
+        qed_aio_next_io(acb, ret);
     } else {
         if (s->bs->backing) {
-            next_fn = qed_aio_write_flush_before_l2_update;
-        } else {
-            next_fn = qed_aio_write_l2_update_cb;
+            /*
+             * Flush new data clusters before updating the L2 table
+             *
+             * This flush is necessary when a backing file is in use.  A crash
+             * during an allocating write could result in empty clusters in the
+             * image.  If the write only touched a subregion of the cluster,
+             * then backing image sectors have been lost in the untouched
+             * region.  The solution is to flush after writing a new data
+             * cluster and before updating the L2 table.
+             */
+            ret = bdrv_flush(s->bs->file->bs);
         }
+        qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
     }
-
-    BLKDBG_EVENT(s->bs->file, BLKDBG_WRITE_AIO);
-    bdrv_aio_writev(s->bs->file, offset / BDRV_SECTOR_SIZE,
-                    &acb->cur_qiov, acb->cur_qiov.size / BDRV_SECTOR_SIZE,
-                    next_fn, acb);
 }
 
 /**
-- 
1.8.3.1

qed_commit_l2_update() is unconditionally called at the end of
qed_aio_write_l1_update(). Inline it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 36 ++++++++++++++----------------------
 1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
 }
 
 /**
- * Commit the current L2 table to the cache
+ * Update L1 table with new L2 table offset and write it out
  */
-static void qed_commit_l2_update(void *opaque, int ret)
+static void qed_aio_write_l1_update(void *opaque, int ret)
 {
     QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
     CachedL2Table *l2_table = acb->request.l2_table;
     uint64_t l2_offset = l2_table->offset;
+    int index;
+
+    if (ret) {
+        qed_aio_complete(acb, ret);
+        return;
+    }
 
+    index = qed_l1_index(s, acb->cur_pos);
+    s->l1_table->offsets[index] = l2_table->offset;
+
+    ret = qed_write_l1_table(s, index, 1);
+
+    /* Commit the current L2 table to the cache */
     qed_commit_l2_cache_entry(&s->l2_cache, l2_table);
 
     /* This is guaranteed to succeed because we just committed the entry to the
@@ -XXX,XX +XXX,XX @@ static void qed_commit_l2_update(void *opaque, int ret)
     qed_aio_next_io(acb, ret);
 }
 
-/**
- * Update L1 table with new L2 table offset and write it out
- */
-static void qed_aio_write_l1_update(void *opaque, int ret)
-{
-    QEDAIOCB *acb = opaque;
-    BDRVQEDState *s = acb_to_s(acb);
-    int index;
-
-    if (ret) {
-        qed_aio_complete(acb, ret);
-        return;
-    }
-
-    index = qed_l1_index(s, acb->cur_pos);
-    s->l1_table->offsets[index] = acb->request.l2_table->offset;
-
-    ret = qed_write_l1_table(s, index, 1);
-    qed_commit_l2_update(acb, ret);
-}
 
 /**
  * Update L2 table with new cluster offsets and write them out
-- 
1.8.3.1

Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
just return an error code and let the caller handle it.
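
The shape of this conversion can be shown outside of QEMU. The following is
only a minimal sketch, not the driver code: Request, complete(),
write_table(), update_l1() and run_request() are hypothetical names invented
for the illustration. The helper reports a status to its caller, and only the
top-level function decides how the request ends.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request state; it merely stands in for QEDAIOCB here. */
typedef struct Request {
    bool fail_write;     /* make the simulated table write fail */
    int completed_with;  /* status that was passed to complete() */
    int completions;     /* how many times complete() was called */
} Request;

/* The single place where a request is finished. */
static void complete(Request *req, int ret)
{
    req->completed_with = ret;
    req->completions++;
}

/* Simulated synchronous table write: 0 on success, negative errno on error. */
static int write_table(Request *req)
{
    return req->fail_write ? -EIO : 0;
}

/* The helper returns a status instead of completing the request itself. */
static int update_l1(Request *req)
{
    int ret = write_table(req);

    if (ret < 0) {
        return ret;
    }
    /* ... further bookkeeping would go here ... */
    return 0;
}

/* Only the top-level caller turns the status into a completion. */
static void run_request(Request *req)
{
    int ret = update_l1(req);

    complete(req, ret);
}
```

With this structure every request is completed exactly once, whether the
helper succeeded or failed, and no helper needs to know what happens after it
returns.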

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
 /**
  * Update L1 table with new L2 table offset and write it out
  */
-static void qed_aio_write_l1_update(void *opaque, int ret)
+static int qed_aio_write_l1_update(QEDAIOCB *acb)
 {
-    QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
     CachedL2Table *l2_table = acb->request.l2_table;
     uint64_t l2_offset = l2_table->offset;
-    int index;
-
-    if (ret) {
-        qed_aio_complete(acb, ret);
-        return;
-    }
+    int index, ret;
 
     index = qed_l1_index(s, acb->cur_pos);
     s->l1_table->offsets[index] = l2_table->offset;
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_l1_update(void *opaque, int ret)
     acb->request.l2_table = qed_find_l2_cache_entry(&s->l2_cache, l2_offset);
     assert(acb->request.l2_table != NULL);
 
-    qed_aio_next_io(acb, ret);
+    return ret;
 }
 
 
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
     if (need_alloc) {
         /* Write out the whole new L2 table */
         ret = qed_write_l2_table(s, &acb->request, 0, s->table_nelems, true);
-        qed_aio_write_l1_update(acb, ret);
+        if (ret) {
+            goto err;
+        }
+        ret = qed_aio_write_l1_update(acb);
+        qed_aio_next_io(acb, ret);
+
     } else {
         /* Write out only the updated part of the L2 table */
         ret = qed_write_l2_table(s, &acb->request, index, acb->cur_nclusters,
-- 
1.8.3.1

Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
just return an error code and let the caller handle it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 43 ++++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_l1_update(QEDAIOCB *acb)
 /**
  * Update L2 table with new cluster offsets and write them out
  */
-static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
+static int qed_aio_write_l2_update(QEDAIOCB *acb, uint64_t offset)
 {
     BDRVQEDState *s = acb_to_s(acb);
     bool need_alloc = acb->find_cluster_ret == QED_CLUSTER_L1;
-    int index;
-
-    if (ret) {
-        goto err;
-    }
+    int index, ret;
 
     if (need_alloc) {
         qed_unref_l2_cache_entry(acb->request.l2_table);
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
         /* Write out the whole new L2 table */
         ret = qed_write_l2_table(s, &acb->request, 0, s->table_nelems, true);
         if (ret) {
-            goto err;
+            return ret;
         }
-        ret = qed_aio_write_l1_update(acb);
-        qed_aio_next_io(acb, ret);
-
+        return qed_aio_write_l1_update(acb);
     } else {
         /* Write out only the updated part of the L2 table */
         ret = qed_write_l2_table(s, &acb->request, index, acb->cur_nclusters,
                                  false);
-        qed_aio_next_io(acb, ret);
+        if (ret) {
+            return ret;
+        }
     }
-    return;
-
-err:
-    qed_aio_complete(acb, ret);
+    return 0;
 }
 
 /**
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_main(void *opaque, int ret)
              */
             ret = bdrv_flush(s->bs->file->bs);
         }
-        qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
+        if (ret) {
+            goto err;
+        }
+        ret = qed_aio_write_l2_update(acb, acb->cur_cluster);
+        if (ret) {
+            goto err;
+        }
+        qed_aio_next_io(acb, 0);
     }
+    return;
+
+err:
+    qed_aio_complete(acb, ret);
 }
 
 /**
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_zero_cluster(void *opaque, int ret)
         return;
     }
 
-    qed_aio_write_l2_update(acb, 0, 1);
+    ret = qed_aio_write_l2_update(acb, 1);
+    if (ret < 0) {
+        qed_aio_complete(acb, ret);
+        return;
+    }
+    qed_aio_next_io(acb, 0);
 }
 
 /**
-- 
1.8.3.1

Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
just return an error code and let the caller handle it.

While refactoring qed_aio_write_alloc() to accommodate the change,
qed_aio_write_zero_cluster() ended up with a single line, so I chose to
inline that line and remove the function completely.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 58 +++++++++++++++++++++-------------------------------------
 1 file changed, 21 insertions(+), 37 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_main(QEDAIOCB *acb)
 /**
  * Populate untouched regions of new data cluster
  */
-static void qed_aio_write_cow(void *opaque, int ret)
+static int qed_aio_write_cow(QEDAIOCB *acb)
 {
-    QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
     uint64_t start, len, offset;
+    int ret;
 
     /* Populate front untouched region of new data cluster */
     start = qed_start_of_cluster(s, acb->cur_pos);
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_cow(void *opaque, int ret)
 
     trace_qed_aio_write_prefill(s, acb, start, len, acb->cur_cluster);
     ret = qed_copy_from_backing_file(s, start, len, acb->cur_cluster);
-    if (ret) {
-        qed_aio_complete(acb, ret);
-        return;
+    if (ret < 0) {
+        return ret;
     }
 
     /* Populate back untouched region of new data cluster */
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_cow(void *opaque, int ret)
 
     trace_qed_aio_write_postfill(s, acb, start, len, offset);
     ret = qed_copy_from_backing_file(s, start, len, offset);
-    if (ret) {
-        qed_aio_complete(acb, ret);
-        return;
-    }
-
-    ret = qed_aio_write_main(acb);
     if (ret < 0) {
-        qed_aio_complete(acb, ret);
-        return;
+        return ret;
     }
-    qed_aio_next_io(acb, 0);
+
+    return qed_aio_write_main(acb);
 }
 
 /**
@@ -XXX,XX +XXX,XX @@ static bool qed_should_set_need_check(BDRVQEDState *s)
     return !(s->header.features & QED_F_NEED_CHECK);
 }
 
-static void qed_aio_write_zero_cluster(void *opaque, int ret)
-{
-    QEDAIOCB *acb = opaque;
-
-    if (ret) {
-        qed_aio_complete(acb, ret);
-        return;
-    }
-
-    ret = qed_aio_write_l2_update(acb, 1);
-    if (ret < 0) {
-        qed_aio_complete(acb, ret);
-        return;
-    }
-    qed_aio_next_io(acb, 0);
-}
-
 /**
  * Write new data cluster
  *
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_zero_cluster(void *opaque, int ret)
 static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 {
     BDRVQEDState *s = acb_to_s(acb);
-    BlockCompletionFunc *cb;
     int ret;
 
     /* Cancel timer when the first allocating request comes in */
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
             qed_aio_start_io(acb);
             return;
         }
-
-        cb = qed_aio_write_zero_cluster;
     } else {
-        cb = qed_aio_write_cow;
         acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
     }
 
     if (qed_should_set_need_check(s)) {
         s->header.features |= QED_F_NEED_CHECK;
         ret = qed_write_header(s);
-        cb(acb, ret);
+        if (ret < 0) {
+            qed_aio_complete(acb, ret);
+            return;
+        }
+    }
+
+    if (acb->flags & QED_AIOCB_ZERO) {
+        ret = qed_aio_write_l2_update(acb, 1);
     } else {
-        cb(acb, 0);
+        ret = qed_aio_write_cow(acb);
     }
+    if (ret < 0) {
+        qed_aio_complete(acb, ret);
+        return;
+    }
+    qed_aio_next_io(acb, 0);
 }
 
 /**
-- 
1.8.3.1

Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
just return an error code and let the caller handle it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 43 ++++++++++++++++++++-----------------------
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static bool qed_should_set_need_check(BDRVQEDState *s)
  *
  * This path is taken when writing to previously unallocated clusters.
  */
-static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
+static int qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 {
     BDRVQEDState *s = acb_to_s(acb);
     int ret;
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
     }
     if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs) ||
         s->allocating_write_reqs_plugged) {
-        return; /* wait for existing request to finish */
+        return -EINPROGRESS; /* wait for existing request to finish */
     }
 
     acb->cur_nclusters = qed_bytes_to_clusters(s,
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
     if (acb->flags & QED_AIOCB_ZERO) {
         /* Skip ahead if the clusters are already zero */
         if (acb->find_cluster_ret == QED_CLUSTER_ZERO) {
-            qed_aio_start_io(acb);
-            return;
+            return 0;
         }
     } else {
         acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
         s->header.features |= QED_F_NEED_CHECK;
         ret = qed_write_header(s);
         if (ret < 0) {
-            qed_aio_complete(acb, ret);
-            return;
+            return ret;
         }
     }
 
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
         ret = qed_aio_write_cow(acb);
     }
     if (ret < 0) {
-        qed_aio_complete(acb, ret);
-        return;
+        return ret;
     }
-    qed_aio_next_io(acb, 0);
+    return 0;
 }
 
 /**
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
  *
  * This path is taken when writing to already allocated clusters.
  */
-static void qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
+static int qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
 {
-    int ret;
-
     /* Allocate buffer for zero writes */
     if (acb->flags & QED_AIOCB_ZERO) {
         struct iovec *iov = acb->qiov->iov;
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
         if (!iov->iov_base) {
             iov->iov_base = qemu_try_blockalign(acb->common.bs, iov->iov_len);
             if (iov->iov_base == NULL) {
-                qed_aio_complete(acb, -ENOMEM);
-                return;
+                return -ENOMEM;
             }
             memset(iov->iov_base, 0, iov->iov_len);
         }
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
     qemu_iovec_concat(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
 
     /* Do the actual write */
-    ret = qed_aio_write_main(acb);
-    if (ret < 0) {
-        qed_aio_complete(acb, ret);
-        return;
-    }
-    qed_aio_next_io(acb, 0);
+    return qed_aio_write_main(acb);
 }
 
 /**
@@ -XXX,XX +XXX,XX @@ static void qed_aio_write_data(void *opaque, int ret,
 
     switch (ret) {
     case QED_CLUSTER_FOUND:
-        qed_aio_write_inplace(acb, offset, len);
+        ret = qed_aio_write_inplace(acb, offset, len);
         break;
 
     case QED_CLUSTER_L2:
     case QED_CLUSTER_L1:
     case QED_CLUSTER_ZERO:
-        qed_aio_write_alloc(acb, len);
+        ret = qed_aio_write_alloc(acb, len);
         break;
 
     default:
-        qed_aio_complete(acb, ret);
+        assert(ret < 0);
         break;
     }
+
+    if (ret < 0) {
+        if (ret != -EINPROGRESS) {
+            qed_aio_complete(acb, ret);
+        }
+        return;
+    }
+    qed_aio_next_io(acb, 0);
 }
 
 /**
-- 
1.8.3.1

All callers pass ret = 0, so we can just remove it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 17 ++++++-----------
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static CachedL2Table *qed_new_l2_table(BDRVQEDState *s)
     return l2_table;
 }
 
-static void qed_aio_next_io(QEDAIOCB *acb, int ret);
+static void qed_aio_next_io(QEDAIOCB *acb);
 
 static void qed_aio_start_io(QEDAIOCB *acb)
 {
-    qed_aio_next_io(acb, 0);
+    qed_aio_next_io(acb);
 }
 
 static void qed_plug_allocating_write_reqs(BDRVQEDState *s)
@@ -XXX,XX +XXX,XX @@ static int qed_aio_read_data(void *opaque, int ret, uint64_t offset, size_t len)
 /**
  * Begin next I/O or complete the request
  */
-static void qed_aio_next_io(QEDAIOCB *acb, int ret)
+static void qed_aio_next_io(QEDAIOCB *acb)
 {
     BDRVQEDState *s = acb_to_s(acb);
     uint64_t offset;
     size_t len;
+    int ret;
 
-    trace_qed_aio_next_io(s, acb, ret, acb->cur_pos + acb->cur_qiov.size);
+    trace_qed_aio_next_io(s, acb, 0, acb->cur_pos + acb->cur_qiov.size);
 
     if (acb->backing_qiov) {
         qemu_iovec_destroy(acb->backing_qiov);
@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb, int ret)
         acb->backing_qiov = NULL;
     }
 
-    /* Handle I/O error */
-    if (ret) {
-        qed_aio_complete(acb, ret);
-        return;
-    }
-
     acb->qiov_offset += acb->cur_qiov.size;
     acb->cur_pos += acb->cur_qiov.size;
     qemu_iovec_reset(&acb->cur_qiov);
@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb, int ret)
         }
         return;
     }
-    qed_aio_next_io(acb, 0);
+    qed_aio_next_io(acb);
 }
 
 static BlockAIOCB *qed_aio_setup(BlockDriverState *bs,
-- 
1.8.3.1

Most of the qed code is now synchronous and matches the coroutine model.
One notable exception is the serialisation between requests which can
still schedule a callback. Before we can replace this with coroutine
locks, let's convert the driver's external interfaces to the coroutine
versions.

We need to be careful to handle both requests that call the completion
callback directly from the calling coroutine (i.e. fully synchronous
code) and requests that involve some callback, so that we need to yield
and wait for the completion callback coming from outside the coroutine.
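
The two completion paths can be modelled in plain C. This is a simplified
sketch under stated assumptions, not the patch itself: plain C has no
coroutines, so fake_yield() is a hypothetical stand-in for
qemu_coroutine_yield() that simply runs the deferred completion, and
RequestCo, start_request() and do_request() are invented for the example. The
point is the "done" flag: the waiter suspends only if the callback has not
already run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void CompletionFunc(void *opaque, int ret);

/* Hypothetical state shared between the waiter and the completion callback. */
typedef struct RequestCo {
    bool done;
    int ret;
    int yields;   /* counts how often the waiter had to suspend */
} RequestCo;

/* One deferred completion, as if scheduled from outside the coroutine. */
static CompletionFunc *pending_cb;
static void *pending_opaque;
static int pending_ret;

/* Completion callback: record the result for the waiter. */
static void request_cb(void *opaque, int ret)
{
    RequestCo *co = opaque;

    co->done = true;
    co->ret = ret;
}

/* Start a request that either completes inline or is deferred. */
static void start_request(bool defer, int ret, CompletionFunc *cb, void *opaque)
{
    if (defer) {
        pending_cb = cb;
        pending_opaque = opaque;
        pending_ret = ret;
    } else {
        cb(opaque, ret);   /* fully synchronous completion */
    }
}

/* Stand-in for a yield: control returns once the deferred callback ran. */
static void fake_yield(RequestCo *co)
{
    co->yields++;
    if (pending_cb) {
        CompletionFunc *cb = pending_cb;

        pending_cb = NULL;
        cb(pending_opaque, pending_ret);
    }
}

/* Issue a request and wait for it, suspending only when necessary. */
static int do_request(bool defer, int ret, int *yields)
{
    RequestCo co = { .done = false };

    start_request(defer, ret, request_cb, &co);
    if (!co.done) {
        fake_yield(&co);
    }
    assert(co.done);
    if (yields) {
        *yields = co.yields;
    }
    return co.ret;
}
```

A request that completes inline never reaches the yield, while a deferred one
suspends exactly once and picks up the result afterwards.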

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 97 ++++++++++++++++++++++++++-----------------------------------
 1 file changed, 42 insertions(+), 55 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb)
     }
 }
 
-static BlockAIOCB *qed_aio_setup(BlockDriverState *bs,
-                                 int64_t sector_num,
-                                 QEMUIOVector *qiov, int nb_sectors,
-                                 BlockCompletionFunc *cb,
-                                 void *opaque, int flags)
+typedef struct QEDRequestCo {
+    Coroutine *co;
+    bool done;
+    int ret;
+} QEDRequestCo;
+
+static void qed_co_request_cb(void *opaque, int ret)
 {
-    QEDAIOCB *acb = qemu_aio_get(&qed_aiocb_info, bs, cb, opaque);
+    QEDRequestCo *co = opaque;
 
-    trace_qed_aio_setup(bs->opaque, acb, sector_num, nb_sectors,
-                        opaque, flags);
+    co->done = true;
+    co->ret = ret;
+    qemu_coroutine_enter_if_inactive(co->co);
+}
+
+static int coroutine_fn qed_co_request(BlockDriverState *bs, int64_t sector_num,
+                                       QEMUIOVector *qiov, int nb_sectors,
+                                       int flags)
+{
+    QEDRequestCo co = {
+        .co     = qemu_coroutine_self(),
+        .done   = false,
+    };
+    QEDAIOCB *acb = qemu_aio_get(&qed_aiocb_info, bs, qed_co_request_cb, &co);
+
+    trace_qed_aio_setup(bs->opaque, acb, sector_num, nb_sectors, &co, flags);
 
     acb->flags = flags;
     acb->qiov = qiov;
@@ -XXX,XX +XXX,XX @@ static BlockAIOCB *qed_aio_setup(BlockDriverState *bs,
 
     /* Start request */
     qed_aio_start_io(acb);
-    return &acb->common;
-}
 
-static BlockAIOCB *bdrv_qed_aio_readv(BlockDriverState *bs,
-                                      int64_t sector_num,
-                                      QEMUIOVector *qiov, int nb_sectors,
-                                      BlockCompletionFunc *cb,
-                                      void *opaque)
-{
-    return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
+    if (!co.done) {
+        qemu_coroutine_yield();
+    }
+
+    return co.ret;
 }
 
-static BlockAIOCB *bdrv_qed_aio_writev(BlockDriverState *bs,
-                                       int64_t sector_num,
-                                       QEMUIOVector *qiov, int nb_sectors,
-                                       BlockCompletionFunc *cb,
-                                       void *opaque)
+static int coroutine_fn bdrv_qed_co_readv(BlockDriverState *bs,
+                                          int64_t sector_num, int nb_sectors,
+                                          QEMUIOVector *qiov)
 {
-    return qed_aio_setup(bs, sector_num, qiov, nb_sectors, cb,
-                         opaque, QED_AIOCB_WRITE);
+    return qed_co_request(bs, sector_num, qiov, nb_sectors, 0);
 }
 
-typedef struct {
-    Coroutine *co;
-    int ret;
-    bool done;
-} QEDWriteZeroesCB;
-
-static void coroutine_fn qed_co_pwrite_zeroes_cb(void *opaque, int ret)
+static int coroutine_fn bdrv_qed_co_writev(BlockDriverState *bs,
+                                           int64_t sector_num, int nb_sectors,
+                                           QEMUIOVector *qiov)
 {
-    QEDWriteZeroesCB *cb = opaque;
-
-    cb->done = true;
-    cb->ret = ret;
-    if (cb->co) {
-        aio_co_wake(cb->co);
-    }
+    return qed_co_request(bs, sector_num, qiov, nb_sectors, QED_AIOCB_WRITE);
 }
 
 static int coroutine_fn bdrv_qed_co_pwrite_zeroes(BlockDriverState *bs,
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_qed_co_pwrite_zeroes(BlockDriverState *bs,
                                                   int count,
                                                   BdrvRequestFlags flags)
 {
-    BlockAIOCB *blockacb;
     BDRVQEDState *s = bs->opaque;
-    QEDWriteZeroesCB cb = { .done = false };
     QEMUIOVector qiov;
     struct iovec iov;
 
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_qed_co_pwrite_zeroes(BlockDriverState *bs,
     iov.iov_len = count;
 
     qemu_iovec_init_external(&qiov, &iov, 1);
-    blockacb = qed_aio_setup(bs, offset >> BDRV_SECTOR_BITS, &qiov,
-                             count >> BDRV_SECTOR_BITS,
-                             qed_co_pwrite_zeroes_cb, &cb,
-                             QED_AIOCB_WRITE | QED_AIOCB_ZERO);
-    if (!blockacb) {
-        return -EIO;
-    }
-    if (!cb.done) {
-        cb.co = qemu_coroutine_self();
-        qemu_coroutine_yield();
-    }
-    assert(cb.done);
-    return cb.ret;
+    return qed_co_request(bs, offset >> BDRV_SECTOR_BITS, &qiov,
+                          count >> BDRV_SECTOR_BITS,
+                          QED_AIOCB_WRITE | QED_AIOCB_ZERO);
 }
 
 static int bdrv_qed_truncate(BlockDriverState *bs, int64_t offset, Error **errp)
@@ -XXX,XX +XXX,XX @@ static BlockDriver bdrv_qed = {
     .bdrv_create              = bdrv_qed_create,
     .bdrv_has_zero_init       = bdrv_has_zero_init_1,
     .bdrv_co_get_block_status = bdrv_qed_co_get_block_status,
-    .bdrv_aio_readv           = bdrv_qed_aio_readv,
-    .bdrv_aio_writev          = bdrv_qed_aio_writev,
+    .bdrv_co_readv            = bdrv_qed_co_readv,
+    .bdrv_co_writev           = bdrv_qed_co_writev,
     .bdrv_co_pwrite_zeroes    = bdrv_qed_co_pwrite_zeroes,
     .bdrv_truncate            = bdrv_qed_truncate,
     .bdrv_getlength           = bdrv_qed_getlength,
-- 
1.8.3.1

Now that we're running in coroutine context, the ad-hoc serialisation
code (which drops a request that has to wait out of coroutine context)
can be replaced by a CoQueue.

This means that when we resume a serialised request, it is running in
coroutine context again and its I/O isn't blocking any more.
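
As a rough model of the serialisation, assuming nothing about how CoQueue is
implemented: the sketch below keeps one current owner plus a FIFO of waiting
request ids. AllocState, alloc_begin() and alloc_end() are hypothetical names,
and ownership is handed directly to the oldest waiter, which is a
simplification of waking a coroutine that then re-checks the condition itself.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WAITERS 8
#define NO_OWNER (-1)

/* Hypothetical model: one allocating request at a time, others wait in FIFO. */
typedef struct AllocState {
    int owner;                 /* id of the request currently allocating */
    int waiters[MAX_WAITERS];  /* ring buffer of waiting request ids */
    int head;                  /* index of the oldest waiter */
    int count;                 /* number of queued waiters */
} AllocState;

static void alloc_state_init(AllocState *s)
{
    s->owner = NO_OWNER;
    s->head = 0;
    s->count = 0;
}

/* Returns true if the request may proceed now, false if it was queued. */
static bool alloc_begin(AllocState *s, int id)
{
    if (s->owner == NO_OWNER) {
        s->owner = id;
        return true;
    }
    assert(s->count < MAX_WAITERS);
    s->waiters[(s->head + s->count) % MAX_WAITERS] = id;
    s->count++;
    return false;
}

/* Finish the current request and hand over to the oldest waiter, if any.
 * Returns the id of the new owner, or NO_OWNER when the queue is empty. */
static int alloc_end(AllocState *s)
{
    s->owner = NO_OWNER;
    if (s->count > 0) {
        s->owner = s->waiters[s->head];
        s->head = (s->head + 1) % MAX_WAITERS;
        s->count--;
    }
    return s->owner;
}
```

Requests are resumed in arrival order, one at a time, which matches the
intent of finishing one allocating write completely before starting the next.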

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 49 +++++++++++++++++--------------------------------
 block/qed.h |  3 ++-
 2 files changed, 19 insertions(+), 33 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static void qed_plug_allocating_write_reqs(BDRVQEDState *s)
 
 static void qed_unplug_allocating_write_reqs(BDRVQEDState *s)
 {
-    QEDAIOCB *acb;
-
     assert(s->allocating_write_reqs_plugged);
 
     s->allocating_write_reqs_plugged = false;
-
-    acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs);
-    if (acb) {
-        qed_aio_start_io(acb);
-    }
+    qemu_co_enter_next(&s->allocating_write_reqs);
 }
 
 static void qed_clear_need_check(void *opaque, int ret)
@@ -XXX,XX +XXX,XX @@ static void qed_need_check_timer_cb(void *opaque)
     BDRVQEDState *s = opaque;
 
     /* The timer should only fire when allocating writes have drained */
-    assert(!QSIMPLEQ_FIRST(&s->allocating_write_reqs));
+    assert(!s->allocating_acb);
 
     trace_qed_need_check_timer_cb(s);
 
@@ -XXX,XX +XXX,XX @@ static int bdrv_qed_do_open(BlockDriverState *bs, QDict *options, int flags,
     int ret;
 
     s->bs = bs;
-    QSIMPLEQ_INIT(&s->allocating_write_reqs);
+    qemu_co_queue_init(&s->allocating_write_reqs);
 
     ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
     if (ret < 0) {
@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete_bh(void *opaque)
     qed_release(s);
 }
 
-static void qed_resume_alloc_bh(void *opaque)
-{
-    qed_aio_start_io(opaque);
-}
-
 static void qed_aio_complete(QEDAIOCB *acb, int ret)
 {
     BDRVQEDState *s = acb_to_s(acb);
@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
      * next request in the queue.  This ensures that we don't cycle through
      * requests multiple times but rather finish one at a time completely.
      */
-    if (acb == QSIMPLEQ_FIRST(&s->allocating_write_reqs)) {
-        QEDAIOCB *next_acb;
-        QSIMPLEQ_REMOVE_HEAD(&s->allocating_write_reqs, next);
-        next_acb = QSIMPLEQ_FIRST(&s->allocating_write_reqs);
-        if (next_acb) {
-            aio_bh_schedule_oneshot(bdrv_get_aio_context(acb->common.bs),
-                                    qed_resume_alloc_bh, next_acb);
+    if (acb == s->allocating_acb) {
+        s->allocating_acb = NULL;
+        if (!qemu_co_queue_empty(&s->allocating_write_reqs)) {
+            qemu_co_enter_next(&s->allocating_write_reqs);
         } else if (s->header.features & QED_F_NEED_CHECK) {
             qed_start_need_check_timer(s);
         }
@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
     int ret;
 
     /* Cancel timer when the first allocating request comes in */
-    if (QSIMPLEQ_EMPTY(&s->allocating_write_reqs)) {
+    if (s->allocating_acb == NULL) {
         qed_cancel_need_check_timer(s);
     }
 
     /* Freeze this request if another allocating write is in progress */
-    if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs)) {
-        QSIMPLEQ_INSERT_TAIL(&s->allocating_write_reqs, acb, next);
-    }
-    if (acb != QSIMPLEQ_FIRST(&s->allocating_write_reqs) ||
-        s->allocating_write_reqs_plugged) {
-        return -EINPROGRESS; /* wait for existing request to finish */
+    if (s->allocating_acb != acb || s->allocating_write_reqs_plugged) {
+        if (s->allocating_acb != NULL) {
+            qemu_co_queue_wait(&s->allocating_write_reqs, NULL);
+            assert(s->allocating_acb == NULL);
+        }
+        s->allocating_acb = acb;
+        return -EAGAIN; /* start over with looking up table entries */
     }
 
     acb->cur_nclusters = qed_bytes_to_clusters(s,
@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb)
             ret = qed_aio_read_data(acb, ret, offset, len);
         }
 
-        if (ret < 0) {
-            if (ret != -EINPROGRESS) {
-                qed_aio_complete(acb, ret);
-            }
+        if (ret < 0 && ret != -EAGAIN) {
+            qed_aio_complete(acb, ret);
             return;
         }
     }
diff --git a/block/qed.h b/block/qed.h
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -XXX,XX +XXX,XX @@ typedef struct {
     uint32_t l2_mask;
 
     /* Allocating write request queue */
-    QSIMPLEQ_HEAD(, QEDAIOCB) allocating_write_reqs;
+    QEDAIOCB *allocating_acb;
+    CoQueue allocating_write_reqs;
     bool allocating_write_reqs_plugged;
 
     /* Periodic flush and clear need check flag */
-- 
1.8.3.1

Now that we process a request in the same coroutine from beginning to
end and don't drop out of it any more, we can look like a proper
coroutine-based driver and simply call qed_aio_next_io() and get a
return value from it instead of spawning an additional coroutine that
reenters the parent when it's done.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 101 +++++++++++++-----------------------------------------------
 block/qed.h |   3 +-
 2 files changed, 22 insertions(+), 82 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@
 #include "qapi/qmp/qerror.h"
 #include "sysemu/block-backend.h"
 
-static const AIOCBInfo qed_aiocb_info = {
-    .aiocb_size         = sizeof(QEDAIOCB),
-};
-
 static int bdrv_qed_probe(const uint8_t *buf, int buf_size,
                           const char *filename)
 {
@@ -XXX,XX +XXX,XX @@ static CachedL2Table *qed_new_l2_table(BDRVQEDState *s)
     return l2_table;
 }
 
-static void qed_aio_next_io(QEDAIOCB *acb);
-
-static void qed_aio_start_io(QEDAIOCB *acb)
-{
-    qed_aio_next_io(acb);
-}
-
 static void qed_plug_allocating_write_reqs(BDRVQEDState *s)
 {
     assert(!s->allocating_write_reqs_plugged);
@@ -XXX,XX +XXX,XX @@ static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
 
 static BDRVQEDState *acb_to_s(QEDAIOCB *acb)
 {
-    return acb->common.bs->opaque;
+    return acb->bs->opaque;
 }
 
 /**
@@ -XXX,XX +XXX,XX @@ static void qed_update_l2_table(BDRVQEDState *s, QEDTable *table, int index,
     }
 }
 
-static void qed_aio_complete_bh(void *opaque)
-{
-    QEDAIOCB *acb = opaque;
-    BDRVQEDState *s = acb_to_s(acb);
-    BlockCompletionFunc *cb = acb->common.cb;
-    void *user_opaque = acb->common.opaque;
-    int ret = acb->bh_ret;
-
-    qemu_aio_unref(acb);
-
-    /* Invoke callback */
-    qed_acquire(s);
-    cb(user_opaque, ret);
-    qed_release(s);
-}
-
-static void qed_aio_complete(QEDAIOCB *acb, int ret)
+static void qed_aio_complete(QEDAIOCB *acb)
 {
     BDRVQEDState *s = acb_to_s(acb);
 
-    trace_qed_aio_complete(s, acb, ret);
-
     /* Free resources */
     qemu_iovec_destroy(&acb->cur_qiov);
     qed_unref_l2_cache_entry(acb->request.l2_table);
@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete(QEDAIOCB *acb, int ret)
         acb->qiov->iov[0].iov_base = NULL;
     }
 
-    /* Arrange for a bh to invoke the completion function */
-    acb->bh_ret = ret;
-    aio_bh_schedule_oneshot(bdrv_get_aio_context(acb->common.bs),
-                            qed_aio_complete_bh, acb);
-
     /* Start next allocating write request waiting behind this one.  Note that
      * requests enqueue themselves when they first hit an unallocated cluster
      * but they wait until the entire request is finished before waking up the
@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
         struct iovec *iov = acb->qiov->iov;
 
         if (!iov->iov_base) {
-            iov->iov_base = qemu_try_blockalign(acb->common.bs, iov->iov_len);
+            iov->iov_base = qemu_try_blockalign(acb->bs, iov->iov_len);
             if (iov->iov_base == NULL) {
                 return -ENOMEM;
             }
@@ -XXX,XX +XXX,XX @@ static int qed_aio_read_data(void *opaque, int ret, uint64_t offset, size_t len)
 {
     QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
-    BlockDriverState *bs = acb->common.bs;
+    BlockDriverState *bs = acb->bs;
 
     /* Adjust offset into cluster */
     offset += qed_offset_into_cluster(s, acb->cur_pos);
@@ -XXX,XX +XXX,XX @@ static int qed_aio_read_data(void *opaque, int ret, uint64_t offset, size_t len)
 /**
  * Begin next I/O or complete the request
  */
-static void qed_aio_next_io(QEDAIOCB *acb)
+static int qed_aio_next_io(QEDAIOCB *acb)
 {
     BDRVQEDState *s = acb_to_s(acb);
     uint64_t offset;
@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb)
 
         /* Complete request */
         if (acb->cur_pos >= acb->end_pos) {
-            qed_aio_complete(acb, 0);
-            return;
+            ret = 0;
+            break;
         }
 
         /* Find next cluster and start I/O */
         len = acb->end_pos - acb->cur_pos;
         ret = qed_find_cluster(s, &acb->request, acb->cur_pos, &len, &offset);
         if (ret < 0) {
-            qed_aio_complete(acb, ret);
-            return;
+            break;
         }
 
         if (acb->flags & QED_AIOCB_WRITE) {
@@ -XXX,XX +XXX,XX @@ static void qed_aio_next_io(QEDAIOCB *acb)
         }
 
         if (ret < 0 && ret != -EAGAIN) {
-            qed_aio_complete(acb, ret);
-            return;
+            break;
         }
     }
-}
 
-typedef struct QEDRequestCo {
-    Coroutine *co;
-    bool done;
-    int ret;
-} QEDRequestCo;
-
-static void qed_co_request_cb(void *opaque, int ret)
-{
-    QEDRequestCo *co = opaque;
-
-    co->done = true;
-    co->ret = ret;
-    qemu_coroutine_enter_if_inactive(co->co);
+    trace_qed_aio_complete(s, acb, ret);
+    qed_aio_complete(acb);
+    return ret;
 }
 
 static int coroutine_fn qed_co_request(BlockDriverState *bs, int64_t sector_num,
                                        QEMUIOVector *qiov, int nb_sectors,
                                        int flags)
 {
-    QEDRequestCo co = {
-        .co     = qemu_coroutine_self(),
-        .done   = false,
+    QEDAIOCB acb = {
+        .bs         = bs,
+        .cur_pos    = (uint64_t) sector_num * BDRV_SECTOR_SIZE,
+        .end_pos    = (sector_num + nb_sectors) * BDRV_SECTOR_SIZE,
+        .qiov       = qiov,
+        .flags      = flags,
     };
-    QEDAIOCB *acb = qemu_aio_get(&qed_aiocb_info, bs, qed_co_request_cb, &co);
-
-    trace_qed_aio_setup(bs->opaque, acb, sector_num, nb_sectors, &co, flags);
+    qemu_iovec_init(&acb.cur_qiov, qiov->niov);
 
-    acb->flags = flags;
-    acb->qiov = qiov;
-    acb->qiov_offset = 0;
-    acb->cur_pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE;
-    acb->end_pos = acb->cur_pos + nb_sectors * BDRV_SECTOR_SIZE;
-    acb->backing_qiov = NULL;
-    acb->request.l2_table = NULL;
-    qemu_iovec_init(&acb->cur_qiov, qiov->niov);
+    trace_qed_aio_setup(bs->opaque, &acb, sector_num, nb_sectors, NULL, flags);
 
     /* Start request */
-    qed_aio_start_io(acb);
-
-    if (!co.done) {
-        qemu_coroutine_yield();
-    }
-
-    return co.ret;
+    return qed_aio_next_io(&acb);
 }
 
 static int coroutine_fn bdrv_qed_co_readv(BlockDriverState *bs,
diff --git a/block/qed.h b/block/qed.h
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -XXX,XX +XXX,XX @@ enum {
 };
 
 typedef struct QEDAIOCB {
-    BlockAIOCB common;
-    int bh_ret;                     /* final return status for completion bh */
+    BlockDriverState *bs;
     QSIMPLEQ_ENTRY(QEDAIOCB) next;  /* next request */
     int flags;                      /* QED_AIOCB_* bits ORed together */
     uint64_t end_pos;               /* request end on block device, in bytes */
-- 
1.8.3.1

This fixes the last place where we degraded from AIO to actual blocking
synchronous I/O requests. Putting it into a coroutine means that instead
of blocking, the coroutine simply yields while doing I/O.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static void qed_unplug_allocating_write_reqs(BDRVQEDState *s)
     qemu_co_enter_next(&s->allocating_write_reqs);
 }
 
-static void qed_clear_need_check(void *opaque, int ret)
+static void qed_need_check_timer_entry(void *opaque)
 {
     BDRVQEDState *s = opaque;
+    int ret;
 
-    if (ret) {
+    /* The timer should only fire when allocating writes have drained */
+    assert(!s->allocating_acb);
+
+    trace_qed_need_check_timer_cb(s);
+
+    qed_acquire(s);
+    qed_plug_allocating_write_reqs(s);
+
+    /* Ensure writes are on disk before clearing flag */
+    ret = bdrv_co_flush(s->bs->file->bs);
+    qed_release(s);
+    if (ret < 0) {
         qed_unplug_allocating_write_reqs(s);
         return;
     }
@@ -XXX,XX +XXX,XX @@ static void qed_clear_need_check(void *opaque, int ret)
 
     qed_unplug_allocating_write_reqs(s);
 
-    ret = bdrv_flush(s->bs);
+    ret = bdrv_co_flush(s->bs);
     (void) ret;
 }
 
 static void qed_need_check_timer_cb(void *opaque)
 {
-    BDRVQEDState *s = opaque;
-
-    /* The timer should only fire when allocating writes have drained */
-    assert(!s->allocating_acb);
-
-    trace_qed_need_check_timer_cb(s);
-
-    qed_acquire(s);
-    qed_plug_allocating_write_reqs(s);
-
-    /* Ensure writes are on disk before clearing flag */
-    bdrv_aio_flush(s->bs->file->bs, qed_clear_need_check, s);
-    qed_release(s);
+    Coroutine *co = qemu_coroutine_create(qed_need_check_timer_entry, opaque);
+    qemu_coroutine_enter(co);
 }
 
 void qed_acquire(BDRVQEDState *s)
-- 
1.8.3.1

Now that we stay in coroutine context for the whole request when doing
reads or writes, we can add coroutine_fn annotations to many functions
that can do I/O or yield directly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed-cluster.c |  5 +++--
 block/qed.c         | 44 ++++++++++++++++++++++++--------------------
 block/qed.h         |  5 +++--
 3 files changed, 30 insertions(+), 24 deletions(-)

diff --git a/block/qed-cluster.c b/block/qed-cluster.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed-cluster.c
+++ b/block/qed-cluster.c
@@ -XXX,XX +XXX,XX @@ static unsigned int qed_count_contiguous_clusters(BDRVQEDState *s,
  * On failure QED_CLUSTER_L2 or QED_CLUSTER_L1 is returned for missing L2 or L1
  * table offset, respectively. len is number of contiguous unallocated bytes.
  */
-int qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
-                     size_t *len, uint64_t *img_offset)
+int coroutine_fn qed_find_cluster(BDRVQEDState *s, QEDRequest *request,
+                                  uint64_t pos, size_t *len,
+                                  uint64_t *img_offset)
 {
     uint64_t l2_offset;
     uint64_t offset = 0;
diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ int qed_write_header_sync(BDRVQEDState *s)
  * This function only updates known header fields in-place and does not affect
  * extra data after the QED header.
  */
-static int qed_write_header(BDRVQEDState *s)
+static int coroutine_fn qed_write_header(BDRVQEDState *s)
 {
     /* We must write full sectors for O_DIRECT but cannot necessarily generate
      * the data following the header if an unrecognized compat feature is
@@ -XXX,XX +XXX,XX @@ static void qed_unplug_allocating_write_reqs(BDRVQEDState *s)
     qemu_co_enter_next(&s->allocating_write_reqs);
 }
 
-static void qed_need_check_timer_entry(void *opaque)
+static void coroutine_fn qed_need_check_timer_entry(void *opaque)
 {
     BDRVQEDState *s = opaque;
     int ret;
@@ -XXX,XX +XXX,XX @@ static BDRVQEDState *acb_to_s(QEDAIOCB *acb)
  * This function reads qiov->size bytes starting at pos from the backing file.
  * If there is no backing file then zeroes are read.
  */
-static int qed_read_backing_file(BDRVQEDState *s, uint64_t pos,
-                                 QEMUIOVector *qiov,
-                                 QEMUIOVector **backing_qiov)
+static int coroutine_fn qed_read_backing_file(BDRVQEDState *s, uint64_t pos,
+                                              QEMUIOVector *qiov,
+                                              QEMUIOVector **backing_qiov)
 {
     uint64_t backing_length = 0;
     size_t size;
@@ -XXX,XX +XXX,XX @@ static int qed_read_backing_file(BDRVQEDState *s, uint64_t pos,
  * @len:        Number of bytes
  * @offset:     Byte offset in image file
  */
-static int qed_copy_from_backing_file(BDRVQEDState *s, uint64_t pos,
-                                      uint64_t len, uint64_t offset)
+static int coroutine_fn qed_copy_from_backing_file(BDRVQEDState *s,
+                                                   uint64_t pos, uint64_t len,
+                                                   uint64_t offset)
 {
     QEMUIOVector qiov;
     QEMUIOVector *backing_qiov = NULL;
@@ -XXX,XX +XXX,XX @@ out:
  * The cluster offset may be an allocated byte offset in the image file, the
  * zero cluster marker, or the unallocated cluster marker.
  */
-static void qed_update_l2_table(BDRVQEDState *s, QEDTable *table, int index,
-                                unsigned int n, uint64_t cluster)
+static void coroutine_fn qed_update_l2_table(BDRVQEDState *s, QEDTable *table,
+                                             int index, unsigned int n,
+                                             uint64_t cluster)
 {
     int i;
     for (i = index; i < index + n; i++) {
@@ -XXX,XX +XXX,XX @@ static void qed_update_l2_table(BDRVQEDState *s, QEDTable *table, int index,
     }
 }
 
-static void qed_aio_complete(QEDAIOCB *acb)
+static void coroutine_fn qed_aio_complete(QEDAIOCB *acb)
 {
     BDRVQEDState *s = acb_to_s(acb);
 
@@ -XXX,XX +XXX,XX @@ static void qed_aio_complete(QEDAIOCB *acb)
 /**
  * Update L1 table with new L2 table offset and write it out
  */
-static int qed_aio_write_l1_update(QEDAIOCB *acb)
+static int coroutine_fn qed_aio_write_l1_update(QEDAIOCB *acb)
 {
     BDRVQEDState *s = acb_to_s(acb);
     CachedL2Table *l2_table = acb->request.l2_table;
@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_l1_update(QEDAIOCB *acb)
 /**
  * Update L2 table with new cluster offsets and write them out
  */
-static int qed_aio_write_l2_update(QEDAIOCB *acb, uint64_t offset)
+static int coroutine_fn qed_aio_write_l2_update(QEDAIOCB *acb, uint64_t offset)
 {
     BDRVQEDState *s = acb_to_s(acb);
     bool need_alloc = acb->find_cluster_ret == QED_CLUSTER_L1;
@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_l2_update(QEDAIOCB *acb, uint64_t offset)
 /**
  * Write data to the image file
  */
-static int qed_aio_write_main(QEDAIOCB *acb)
+static int coroutine_fn qed_aio_write_main(QEDAIOCB *acb)
 {
     BDRVQEDState *s = acb_to_s(acb);
     uint64_t offset = acb->cur_cluster +
@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_main(QEDAIOCB *acb)
 /**
  * Populate untouched regions of new data cluster
  */
-static int qed_aio_write_cow(QEDAIOCB *acb)
+static int coroutine_fn qed_aio_write_cow(QEDAIOCB *acb)
 {
     BDRVQEDState *s = acb_to_s(acb);
     uint64_t start, len, offset;
@@ -XXX,XX +XXX,XX @@ static bool qed_should_set_need_check(BDRVQEDState *s)
  *
  * This path is taken when writing to previously unallocated clusters.
  */
-static int qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
+static int coroutine_fn qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 {
     BDRVQEDState *s = acb_to_s(acb);
     int ret;
@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
  *
  * This path is taken when writing to already allocated clusters.
  */
-static int qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
+static int coroutine_fn qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset,
+                                              size_t len)
 {
     /* Allocate buffer for zero writes */
     if (acb->flags & QED_AIOCB_ZERO) {
@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_inplace(QEDAIOCB *acb, uint64_t offset, size_t len)
  * @offset:     Cluster offset in bytes
  * @len:        Length in bytes
  */
-static int qed_aio_write_data(void *opaque, int ret,
-                              uint64_t offset, size_t len)
+static int coroutine_fn qed_aio_write_data(void *opaque, int ret,
+                                           uint64_t offset, size_t len)
 {
     QEDAIOCB *acb = opaque;
 
@@ -XXX,XX +XXX,XX @@ static int qed_aio_write_data(void *opaque, int ret,
  * @offset:     Cluster offset in bytes
  * @len:        Length in bytes
  */
-static int qed_aio_read_data(void *opaque, int ret, uint64_t offset, size_t len)
+static int coroutine_fn qed_aio_read_data(void *opaque, int ret,
+                                          uint64_t offset, size_t len)
 {
     QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
@@ -XXX,XX +XXX,XX @@ static int qed_aio_read_data(void *opaque, int ret, uint64_t offset, size_t len)
 /**
  * Begin next I/O or complete the request
  */
-static int qed_aio_next_io(QEDAIOCB *acb)
+static int coroutine_fn qed_aio_next_io(QEDAIOCB *acb)
 {
     BDRVQEDState *s = acb_to_s(acb);
     uint64_t offset;
diff --git a/block/qed.h b/block/qed.h
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.h
+++ b/block/qed.h
@@ -XXX,XX +XXX,XX @@ int qed_write_l2_table_sync(BDRVQEDState *s, QEDRequest *request,
 /**
  * Cluster functions
  */
-int qed_find_cluster(BDRVQEDState *s, QEDRequest *request, uint64_t pos,
-                     size_t *len, uint64_t *img_offset);
+int coroutine_fn qed_find_cluster(BDRVQEDState *s, QEDRequest *request,
+                                  uint64_t pos, size_t *len,
+                                  uint64_t *img_offset);
 
 /**
  * Consistency check
-- 
1.8.3.1

All functions that are marked coroutine_fn can directly call the
bdrv_co_* version of functions instead of going through the wrapper.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/qed.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index XXXXXXX..XXXXXXX 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_write_header(BDRVQEDState *s)
     };
     qemu_iovec_init_external(&qiov, &iov, 1);
 
-    ret = bdrv_preadv(s->bs->file, 0, &qiov);
+    ret = bdrv_co_preadv(s->bs->file, 0, qiov.size, &qiov, 0);
     if (ret < 0) {
         goto out;
     }
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_write_header(BDRVQEDState *s)
     /* Update header */
     qed_header_cpu_to_le(&s->header, (QEDHeader *) buf);
 
-    ret = bdrv_pwritev(s->bs->file, 0, &qiov);
+    ret = bdrv_co_pwritev(s->bs->file, 0, qiov.size,  &qiov, 0);
     if (ret < 0) {
         goto out;
     }
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_read_backing_file(BDRVQEDState *s, uint64_t pos,
     qemu_iovec_concat(*backing_qiov, qiov, 0, size);
 
     BLKDBG_EVENT(s->bs->file, BLKDBG_READ_BACKING_AIO);
-    ret = bdrv_preadv(s->bs->backing, pos, *backing_qiov);
+    ret = bdrv_co_preadv(s->bs->backing, pos, size, *backing_qiov, 0);
     if (ret < 0) {
         return ret;
     }
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_copy_from_backing_file(BDRVQEDState *s,
     }
 
     BLKDBG_EVENT(s->bs->file, BLKDBG_COW_WRITE);
-    ret = bdrv_pwritev(s->bs->file, offset, &qiov);
+    ret = bdrv_co_pwritev(s->bs->file, offset, qiov.size, &qiov, 0);
     if (ret < 0) {
         goto out;
     }
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_aio_write_main(QEDAIOCB *acb)
     trace_qed_aio_write_main(s, acb, 0, offset, acb->cur_qiov.size);
 
     BLKDBG_EVENT(s->bs->file, BLKDBG_WRITE_AIO);
-    ret = bdrv_pwritev(s->bs->file, offset, &acb->cur_qiov);
+    ret = bdrv_co_pwritev(s->bs->file, offset, acb->cur_qiov.size,
+                          &acb->cur_qiov, 0);
     if (ret < 0) {
         return ret;
     }
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_aio_write_main(QEDAIOCB *acb)
              * region.  The solution is to flush after writing a new data
              * cluster and before updating the L2 table.
              */
-            ret = bdrv_flush(s->bs->file->bs);
+            ret = bdrv_co_flush(s->bs->file->bs);
             if (ret < 0) {
                 return ret;
             }
@@ -XXX,XX +XXX,XX @@ static int coroutine_fn qed_aio_read_data(void *opaque, int ret,
     }
 
     BLKDBG_EVENT(bs->file, BLKDBG_READ_AIO);
-    ret = bdrv_preadv(bs->file, offset, &acb->cur_qiov);
+    ret = bdrv_co_preadv(bs->file, offset, acb->cur_qiov.size,
+                         &acb->cur_qiov, 0);
     if (ret < 0) {
         return ret;
     }
-- 
1.8.3.1

From: "sochin.jiang" <sochin.jiang@huawei.com>

img_commit could fall into an infinite loop calling run_block_job() if
its block job fails on an I/O error. Fix this known problem.
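
The loop condition can be sketched with a toy model in plain C. This is not QEMU code: struct toy_job, toy_poll() and toy_run_job() are hypothetical stand-ins for BlockJob, aio_poll() and run_block_job(), and the max_polls cap exists only so that the old condition can be shown without actually hanging.

```c
#include <stdbool.h>

/* Toy stand-in for a block job; only the fields the loop looks at. */
struct toy_job {
    bool ready;             /* job reached the point where it can be completed */
    bool completed;         /* job finished, possibly with an error */
    int ret;                /* final result, valid once completed is set */
    int steps_until_error;  /* hypothetical: number of polls before it fails */
};

/* Hypothetical replacement for aio_poll(): advances the toy job by one step. */
static void toy_poll(struct toy_job *job)
{
    if (job->steps_until_error > 0 && --job->steps_until_error == 0) {
        job->completed = true;
        job->ret = -5; /* stands in for -EIO */
    }
}

/* Polls until the job is ready or, if check_completed is set, completed.
 * Returns the job result, 0 if the job became ready, or 1 if max_polls
 * iterations passed, which models the loop that would never terminate. */
static int toy_run_job(struct toy_job *job, bool check_completed, int max_polls)
{
    int polls = 0;

    do {
        toy_poll(job);
        if (++polls >= max_polls) {
            return 1;
        }
    } while (!job->ready && !(check_completed && job->completed));

    return job->completed ? job->ret : 0;
}
```

With check_completed set to false, a job that fails before becoming ready keeps the loop spinning until the cap is hit; with it set to true, the loop exits as soon as the failure is recorded and the error code is returned.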

Signed-off-by: sochin.jiang <sochin.jiang@huawei.com>
Message-id: 1497509253-28941-1-git-send-email-sochin.jiang@huawei.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 blockjob.c               |  4 ++--
 include/block/blockjob.h | 18 ++++++++++++++++++
 qemu-img.c               | 20 +++++++++++++-------
 3 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index XXXXXXX..XXXXXXX 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -XXX,XX +XXX,XX @@ static void block_job_resume(BlockJob *job)
     block_job_enter(job);
 }
 
-static void block_job_ref(BlockJob *job)
+void block_job_ref(BlockJob *job)
 {
     ++job->refcnt;
 }
@@ -XXX,XX +XXX,XX @@ static void block_job_attached_aio_context(AioContext *new_context,
                                            void *opaque);
 static void block_job_detach_aio_context(void *opaque);
 
-static void block_job_unref(BlockJob *job)
+void block_job_unref(BlockJob *job)
 {
     if (--job->refcnt == 0) {
         BlockDriverState *bs = blk_bs(job->blk);
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -XXX,XX +XXX,XX @@ void block_job_iostatus_reset(BlockJob *job);
 BlockJobTxn *block_job_txn_new(void);
 
 /**
+ * block_job_ref:
+ *
+ * Add a reference to BlockJob refcnt, it will be decreased with
+ * block_job_unref, and then be freed if it comes to be the last
+ * reference.
+ */
+void block_job_ref(BlockJob *job);
+
+/**
+ * block_job_unref:
+ *
+ * Release a reference that was previously acquired with block_job_ref
+ * or block_job_create. If it's the last reference to the object, it will be
+ * freed.
+ */
+void block_job_unref(BlockJob *job);
+
+/**
  * block_job_txn_unref:
  *
  * Release a reference that was previously acquired with block_job_txn_add_job
diff --git a/qemu-img.c b/qemu-img.c
index XXXXXXX..XXXXXXX 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -XXX,XX +XXX,XX @@ static void common_block_job_cb(void *opaque, int ret)
 static void run_block_job(BlockJob *job, Error **errp)
 {
     AioContext *aio_context = blk_get_aio_context(job->blk);
+    int ret = 0;
 
-    /* FIXME In error cases, the job simply goes away and we access a dangling
-     * pointer below. */
     aio_context_acquire(aio_context);
+    block_job_ref(job);
     do {
         aio_poll(aio_context, true);
         qemu_progress_print(job->len ?
                             ((float)job->offset / job->len * 100.f) : 0.0f, 0);
-    } while (!job->ready);
+    } while (!job->ready && !job->completed);
 
-    block_job_complete_sync(job, errp);
+    if (!job->completed) {
+        ret = block_job_complete_sync(job, errp);
+    } else {
+        ret = job->ret;
+    }
+    block_job_unref(job);
     aio_context_release(aio_context);
 
-    /* A block job may finish instantaneously without publishing any progress,
-     * so just signal completion here */
-    qemu_progress_print(100.f, 0);
+    /* publish completion progress only when success */
+    if (!ret) {
+        qemu_progress_print(100.f, 0);
+    }
 }
 
 static int img_commit(int argc, char **argv)
-- 
1.8.3.1

From: Max Reitz <mreitz@redhat.com>

The bs->exact_filename field may not be sufficient to store the full
blkdebug node filename. In this case, we should not generate a filename
at all instead of an unusable one.
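
The check relies on snprintf() returning the length the complete string would have had, so a return value greater than or equal to the buffer size indicates truncation. A minimal standalone sketch of the same pattern follows; build_filename() is a hypothetical helper used only for illustration and is not part of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats "prefix:config:child" into buf.  If the
 * result does not fit, buf is set to the empty string and -1 is returned,
 * so callers never see a truncated, unusable name. */
static int build_filename(char *buf, size_t size, const char *prefix,
                          const char *config, const char *child)
{
    int ret;

    assert(buf != NULL && size > 0);

    ret = snprintf(buf, size, "%s:%s:%s", prefix, config, child);

    /* snprintf() returns the would-be length excluding the NUL terminator,
     * so ret >= size means the output was cut short. */
    if (ret < 0 || (size_t)ret >= size) {
        buf[0] = '\0';
        return -1;
    }
    return 0;
}
```

Calling build_filename() with a 16-byte buffer and the arguments "blkdebug", "" and "test.qcow2" needs 21 bytes including the terminator, so the helper reports failure and leaves an empty string rather than a clipped name.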

Cc: qemu-stable@nongnu.org
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170613172006.19685-2-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/blkdebug.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index XXXXXXX..XXXXXXX 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -XXX,XX +XXX,XX @@ static void blkdebug_refresh_filename(BlockDriverState *bs, QDict *options)
     }
 
     if (!force_json && bs->file->bs->exact_filename[0]) {
-        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "blkdebug:%s:%s", s->config_file ?: "",
-                 bs->file->bs->exact_filename);
+        int ret = snprintf(bs->exact_filename, sizeof(bs->exact_filename),
+                           "blkdebug:%s:%s", s->config_file ?: "",
+                           bs->file->bs->exact_filename);
+        if (ret >= sizeof(bs->exact_filename)) {
+            /* An overflow makes the filename unusable, so do not report any */
+            bs->exact_filename[0] = 0;
+        }
     }
 
     opts = qdict_new();
-- 
1.8.3.1

From: Max Reitz <mreitz@redhat.com>

The bs->exact_filename field may not be sufficient to store the full
blkverify node filename. In this case, we should not generate a filename
at all instead of an unusable one.

Cc: qemu-stable@nongnu.org
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170613172006.19685-3-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/blkverify.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/blkverify.c b/block/blkverify.c
index XXXXXXX..XXXXXXX 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -XXX,XX +XXX,XX @@ static void blkverify_refresh_filename(BlockDriverState *bs, QDict *options)
     if (bs->file->bs->exact_filename[0]
         && s->test_file->bs->exact_filename[0])
     {
-        snprintf(bs->exact_filename, sizeof(bs->exact_filename),
-                 "blkverify:%s:%s",
-                 bs->file->bs->exact_filename,
-                 s->test_file->bs->exact_filename);
+        int ret = snprintf(bs->exact_filename, sizeof(bs->exact_filename),
+                           "blkverify:%s:%s",
+                           bs->file->bs->exact_filename,
+                           s->test_file->bs->exact_filename);
+        if (ret >= sizeof(bs->exact_filename)) {
+            /* An overflow makes the filename unusable, so do not report any */
+            bs->exact_filename[0] = 0;
+        }
     }
 }
 
-- 
1.8.3.1

From: Max Reitz <mreitz@redhat.com>

uri_parse(...)->scheme may be NULL. In fact, probably every field may be
NULL, and the callers do test this for all of the other fields but not
for scheme (except for block/gluster.c; block/vxhs.c does not access
that field at all).

We can easily fix this by using g_strcmp0() instead of strcmp().
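
For readers without GLib at hand, the NULL handling that makes this safe can be sketched in plain C. null_safe_strcmp() below is a hypothetical stand-in that mirrors the documented behaviour of g_strcmp0() (NULL sorts before any non-NULL string, two NULL pointers compare equal), and scheme_is_unix() is an illustrative helper modelled loosely on the NBD transport check, not the actual QEMU function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for g_strcmp0(): like strcmp(), but NULL sorts
 * before any non-NULL string and two NULL pointers compare equal. */
static int null_safe_strcmp(const char *a, const char *b)
{
    if (a == NULL) {
        return b == NULL ? 0 : -1;
    }
    if (b == NULL) {
        return 1;
    }
    return strcmp(a, b);
}

/* Illustrative helper: returns 1 if the scheme selects a Unix socket,
 * 0 for TCP, and -1 for a missing or unknown scheme. */
static int scheme_is_unix(const char *scheme)
{
    if (!null_safe_strcmp(scheme, "nbd") ||
        !null_safe_strcmp(scheme, "nbd+tcp")) {
        return 0;
    }
    if (!null_safe_strcmp(scheme, "nbd+unix")) {
        return 1;
    }
    return -1;
}
```

A NULL scheme simply fails every comparison and ends up in the error branch, where plain strcmp() would have dereferenced a NULL pointer.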

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170613205726.13544-1-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/nbd.c      | 6 +++---
 block/nfs.c      | 2 +-
 block/sheepdog.c | 6 +++---
 block/ssh.c      | 2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index XXXXXXX..XXXXXXX 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -XXX,XX +XXX,XX @@ static int nbd_parse_uri(const char *filename, QDict *options)
     }
 
     /* transport */
-    if (!strcmp(uri->scheme, "nbd")) {
+    if (!g_strcmp0(uri->scheme, "nbd")) {
         is_unix = false;
-    } else if (!strcmp(uri->scheme, "nbd+tcp")) {
+    } else if (!g_strcmp0(uri->scheme, "nbd+tcp")) {
         is_unix = false;
-    } else if (!strcmp(uri->scheme, "nbd+unix")) {
+    } else if (!g_strcmp0(uri->scheme, "nbd+unix")) {
         is_unix = true;
     } else {
         ret = -EINVAL;
diff --git a/block/nfs.c b/block/nfs.c
index XXXXXXX..XXXXXXX 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -XXX,XX +XXX,XX @@ static int nfs_parse_uri(const char *filename, QDict *options, Error **errp)
         error_setg(errp, "Invalid URI specified");
         goto out;
     }
-    if (strcmp(uri->scheme, "nfs") != 0) {
+    if (g_strcmp0(uri->scheme, "nfs") != 0) {
         error_setg(errp, "URI scheme must be 'nfs'");
         goto out;
     }
diff --git a/block/sheepdog.c b/block/sheepdog.c
index XXXXXXX..XXXXXXX 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -XXX,XX +XXX,XX @@ static void sd_parse_uri(SheepdogConfig *cfg, const char *filename,
     }
 
     /* transport */
-    if (!strcmp(uri->scheme, "sheepdog")) {
+    if (!g_strcmp0(uri->scheme, "sheepdog")) {
         is_unix = false;
-    } else if (!strcmp(uri->scheme, "sheepdog+tcp")) {
+    } else if (!g_strcmp0(uri->scheme, "sheepdog+tcp")) {
         is_unix = false;
-    } else if (!strcmp(uri->scheme, "sheepdog+unix")) {
+    } else if (!g_strcmp0(uri->scheme, "sheepdog+unix")) {
         is_unix = true;
     } else {
         error_setg(&err, "URI scheme must be 'sheepdog', 'sheepdog+tcp',"
diff --git a/block/ssh.c b/block/ssh.c
index XXXXXXX..XXXXXXX 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -XXX,XX +XXX,XX @@ static int parse_uri(const char *filename, QDict *options, Error **errp)
         return -EINVAL;
     }
 
-    if (strcmp(uri->scheme, "ssh") != 0) {
+    if (g_strcmp0(uri->scheme, "ssh") != 0) {
         error_setg(errp, "URI scheme must be 'ssh'");
         goto err;
     }
-- 
1.8.3.1

The following changes since commit ec11dc41eec5142b4776db1296972c6323ba5847:

  Merge tag 'pull-misc-2022-05-11' of git://repo.or.cz/qemu/armbru into staging (2022-05-11 09:00:26 -0700)

are available in the Git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to f70625299ecc9ba577c87f3d1d75012c747c7d88:

  qemu-iotests: inline common.config into common.rc (2022-05-12 15:42:49 +0200)

----------------------------------------------------------------
Block layer patches

- coroutine: Fix crashes due to too large pool batch size
- fdc: Prevent end-of-track overrun
- nbd: MULTI_CONN for shared writable exports
- iotests test runner improvements

----------------------------------------------------------------
Daniel P. Berrangé (2):
      tests/qemu-iotests: print intent to run a test in TAP mode
      .gitlab-ci.d: export meson testlog.txt as an artifact

Eric Blake (2):
      qemu-nbd: Pass max connections to blockdev layer
      nbd/server: Allow MULTI_CONN for shared writable exports

Hanna Reitz (1):
      iotests/testrunner: Flush after run_test()

Kevin Wolf (2):
      coroutine: Rename qemu_coroutine_inc/dec_pool_size()
      coroutine: Revert to constant batch size

Paolo Bonzini (1):
      qemu-iotests: inline common.config into common.rc

Philippe Mathieu-Daudé (2):
      hw/block/fdc: Prevent end-of-track overrun (CVE-2021-3507)
      tests/qtest/fdc-test: Add a regression test for CVE-2021-3507

 qapi/block-export.json                           |   8 +-
 docs/interop/nbd.txt                             |   1 +
 docs/tools/qemu-nbd.rst                          |   3 +-
 include/block/nbd.h                              |   5 +-
 include/qemu/coroutine.h                         |   6 +-
 blockdev-nbd.c                                   |  13 +-
 hw/block/fdc.c                                   |   8 ++
 hw/block/virtio-blk.c                            |   6 +-
 nbd/server.c                                     |  10 +-
 qemu-nbd.c                                       |   2 +-
 tests/qtest/fdc-test.c                           |  21 ++++
 util/qemu-coroutine.c                            |  26 ++--
 tests/qemu-iotests/testrunner.py                 |   4 +
 .gitlab-ci.d/buildtest-template.yml              |  12 +-
 MAINTAINERS                                      |   1 +
 tests/qemu-iotests/common.config                 |  41 -------
 tests/qemu-iotests/common.rc                     |  31 +++--
 tests/qemu-iotests/tests/nbd-multiconn           | 145 +++++++++++++++++++++++
 tests/qemu-iotests/tests/nbd-multiconn.out       |   5 +
 tests/qemu-iotests/tests/nbd-qemu-allocation.out |   2 +-
 20 files changed, 261 insertions(+), 89 deletions(-)
 delete mode 100644 tests/qemu-iotests/common.config
 create mode 100755 tests/qemu-iotests/tests/nbd-multiconn
 create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out

It's true that these functions currently affect the batch size in which
coroutines are reused (i.e. moved from the global release pool to the
allocation pool of a specific thread), but this is a bug and will be
fixed in a separate patch.

In fact, the comment in the header file already just promises that it
influences the pool size, so reflect this in the name of the functions.
As a nice side effect, the shorter function name makes some line
wrapping unnecessary.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220510151020.105528-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/coroutine.h | 6 +++---
 hw/block/virtio-blk.c    | 6 ++----
 util/qemu-coroutine.c    | 4 ++--
 3 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index XXXXXXX..XXXXXXX 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -XXX,XX +XXX,XX @@ void coroutine_fn yield_until_fd_readable(int fd);
 /**
  * Increase coroutine pool size
  */
-void qemu_coroutine_increase_pool_batch_size(unsigned int additional_pool_size);
+void qemu_coroutine_inc_pool_size(unsigned int additional_pool_size);
 
 /**
- * Devcrease coroutine pool size
+ * Decrease coroutine pool size
  */
-void qemu_coroutine_decrease_pool_batch_size(unsigned int additional_pool_size);
+void qemu_coroutine_dec_pool_size(unsigned int additional_pool_size);
 
 #include "qemu/lockable.h"
 
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
     for (i = 0; i < conf->num_queues; i++) {
         virtio_add_queue(vdev, conf->queue_size, virtio_blk_handle_output);
     }
-    qemu_coroutine_increase_pool_batch_size(conf->num_queues * conf->queue_size
-                                            / 2);
+    qemu_coroutine_inc_pool_size(conf->num_queues * conf->queue_size / 2);
     virtio_blk_data_plane_create(vdev, conf, &s->dataplane, &err);
     if (err != NULL) {
         error_propagate(errp, err);
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_unrealize(DeviceState *dev)
     for (i = 0; i < conf->num_queues; i++) {
         virtio_del_queue(vdev, i);
     }
-    qemu_coroutine_decrease_pool_batch_size(conf->num_queues * conf->queue_size
-                                            / 2);
+    qemu_coroutine_dec_pool_size(conf->num_queues * conf->queue_size / 2);
     qemu_del_vm_change_state_handler(s->change);
     blockdev_mark_auto_del(s->blk);
     virtio_cleanup(vdev);
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index XXXXXXX..XXXXXXX 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -XXX,XX +XXX,XX @@ AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co)
     return co->ctx;
 }
 
-void qemu_coroutine_increase_pool_batch_size(unsigned int additional_pool_size)
+void qemu_coroutine_inc_pool_size(unsigned int additional_pool_size)
 {
     qatomic_add(&pool_batch_size, additional_pool_size);
 }
 
-void qemu_coroutine_decrease_pool_batch_size(unsigned int removing_pool_size)
+void qemu_coroutine_dec_pool_size(unsigned int removing_pool_size)
 {
     qatomic_sub(&pool_batch_size, removing_pool_size);
 }
-- 
2.35.3

Commit 4c41c69e changed the way the coroutine pool is sized because for
virtio-blk devices with a large queue size and heavy I/O, it was just
too small and caused coroutines to be deleted and reallocated soon
afterwards. The change made the size dynamic based on the number of
queues and the queue size of virtio-blk devices.

There are two important numbers here: Slightly simplified, when a
coroutine terminates, it is generally stored in the global release pool
up to a certain pool size, and if the pool is full, it is freed.
Conversely, when allocating a new coroutine, the coroutines in the
release pool are reused if the pool already has reached a certain
minimum size (the batch size), otherwise we allocate new coroutines.

The problem after commit 4c41c69e is that it not only increases the
maximum pool size (which is the intended effect), but also the batch
size for reusing coroutines (which is a bug). It means that in cases
with many devices, many queues (the number of queues defaults to the
number of vcpus for virtio-blk-pci) and/or a large queue size, many
thousand coroutines could be sitting in the release pool without being
reused.

This is not only a waste of memory and allocations, but it actually
makes the QEMU process likely to hit the vm.max_map_count limit on Linux
because each coroutine requires two mappings (its stack and the guard
page for the stack). Once the limit is hit, mprotect() starts to fail
with ENOMEM and QEMU abort()s in qemu_alloc_stack().

In order to fix the problem, change the batch size back to 64 to avoid
uselessly accumulating coroutines in the release pool, but keep the
dynamic maximum pool size so that coroutines aren't freed too early
in heavy I/O scenarios.
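
As a rough, single-threaded model of the two thresholds (a sketch only:
struct pool_model and the model_*() names are hypothetical, and the
per-thread allocation pool check in coroutine_delete() is left out), the
behaviour after this fix looks like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Constant reuse threshold, as restored by this patch. */
enum { MODEL_MIN_BATCH_SIZE = 64 };

/* Hypothetical model state; the real code uses globals and atomics. */
struct pool_model {
    unsigned int max_size;      /* grows and shrinks with virtio-blk queues */
    unsigned int release_size;  /* coroutines parked in the release pool */
};

/* A coroutine terminates: returns true if it is pooled, false if freed. */
static bool model_release(struct pool_model *p)
{
    assert(p != NULL);
    if (p->release_size < p->max_size * 2) {
        p->release_size++;
        return true;
    }
    return false;
}

/*
 * A thread's allocation pool is empty: returns how many coroutines it can
 * take over from the release pool, or 0 if a fresh one must be allocated.
 */
static unsigned int model_refill(struct pool_model *p)
{
    assert(p != NULL);
    if (p->release_size > MODEL_MIN_BATCH_SIZE) {
        unsigned int taken = p->release_size;
        p->release_size = 0;
        return taken;
    }
    return 0;
}
```

In this model a large max_size only delays freeing. Reuse still starts
as soon as more than 64 coroutines are parked, so the release pool no
longer has to grow to thousands of entries before it is drained.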

Note that this fix doesn't strictly make it impossible to hit the limit,
but this would only happen if most of the coroutines are actually in use
at the same time, not just sitting in a pool. This is the same behaviour
as we already had before commit 4c41c69e. Fully preventing this would
require allowing qemu_coroutine_create() to return an error, but it
doesn't seem to be a scenario that people hit in practice.

Cc: qemu-stable@nongnu.org
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2079938
Fixes: 4c41c69e05fe28c0f95f8abd2ebf407e95a4f04b
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220510151020.105528-3-kwolf@redhat.com>
Tested-by: Hiroki Narukawa <hnarukaw@yahoo-corp.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 util/qemu-coroutine.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index XXXXXXX..XXXXXXX 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -XXX,XX +XXX,XX @@
 #include "qemu/coroutine-tls.h"
 #include "block/aio.h"
 
-/** Initial batch size is 64, and is increased on demand */
+/**
+ * The minimal batch size is always 64, coroutines from the release_pool are
+ * reused as soon as there are 64 coroutines in it. The maximum pool size starts
+ * with 64 and is increased on demand so that coroutines are not deleted even if
+ * they are not immediately reused.
+ */
 enum {
-    POOL_INITIAL_BATCH_SIZE = 64,
+    POOL_MIN_BATCH_SIZE = 64,
+    POOL_INITIAL_MAX_SIZE = 64,
 };
 
 /** Free list to speed up creation */
 static QSLIST_HEAD(, Coroutine) release_pool = QSLIST_HEAD_INITIALIZER(pool);
-static unsigned int pool_batch_size = POOL_INITIAL_BATCH_SIZE;
+static unsigned int pool_max_size = POOL_INITIAL_MAX_SIZE;
 static unsigned int release_pool_size;
 
 typedef QSLIST_HEAD(, Coroutine) CoroutineQSList;
@@ -XXX,XX +XXX,XX @@ Coroutine *qemu_coroutine_create(CoroutineEntry *entry, void *opaque)
 
         co = QSLIST_FIRST(alloc_pool);
         if (!co) {
-            if (release_pool_size > qatomic_read(&pool_batch_size)) {
+            if (release_pool_size > POOL_MIN_BATCH_SIZE) {
                 /* Slow path; a good place to register the destructor, too.  */
                 Notifier *notifier = get_ptr_coroutine_pool_cleanup_notifier();
                 if (!notifier->notify) {
@@ -XXX,XX +XXX,XX @@ static void coroutine_delete(Coroutine *co)
     co->caller = NULL;
 
     if (CONFIG_COROUTINE_POOL) {
-        if (release_pool_size < qatomic_read(&pool_batch_size) * 2) {
+        if (release_pool_size < qatomic_read(&pool_max_size) * 2) {
             QSLIST_INSERT_HEAD_ATOMIC(&release_pool, co, pool_next);
             qatomic_inc(&release_pool_size);
             return;
         }
-        if (get_alloc_pool_size() < qatomic_read(&pool_batch_size)) {
+        if (get_alloc_pool_size() < qatomic_read(&pool_max_size)) {
             QSLIST_INSERT_HEAD(get_ptr_alloc_pool(), co, pool_next);
             set_alloc_pool_size(get_alloc_pool_size() + 1);
             return;
@@ -XXX,XX +XXX,XX @@ AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co)
 
 void qemu_coroutine_inc_pool_size(unsigned int additional_pool_size)
 {
-    qatomic_add(&pool_batch_size, additional_pool_size);
+    qatomic_add(&pool_max_size, additional_pool_size);
 }
 
 void qemu_coroutine_dec_pool_size(unsigned int removing_pool_size)
 {
-    qatomic_sub(&pool_batch_size, removing_pool_size);
+    qatomic_sub(&pool_max_size, removing_pool_size);
 }
-- 
2.35.3

From: Hanna Reitz <hreitz@redhat.com>

When stdout is not a terminal, the buffer may not be flushed at each end
of line, so we should flush after each test is done.  This is especially
apparent when run by check-block, in two ways:

First, when running make check-block -jX with X > 1, progress indication
was missing, even though testrunner.py does theoretically print each
test's status once it has been run, even in multi-processing mode.
Flushing after each test restores this progress indication.

Second, sometimes make check-block failed altogether, with an error
message that "too few tests [were] run".  I presume that's because one
worker process in the job pool did not get to flush its stdout before
the main process exited, and so meson did not get to see that worker's
test results.  In any case, by flushing at the end of run_test(), the
problem has disappeared for me.
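
The same buffering rule applies to C stdio, where stdout is fully
buffered when it is not a terminal. As an analogy only (testrunner.py
itself is Python, and report_test_result() is a hypothetical name), the
"print, then flush" pattern can be sketched as:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical reporter: writes one result line and flushes immediately,
 * so the line reaches a pipe or file even if the process exits soon after.
 * Returns 0 on success, -1 if writing or flushing failed.
 */
static int report_test_result(FILE *out, const char *imgfmt,
                              const char *testname, bool passed)
{
    assert(out != NULL);

    if (fprintf(out, "%s %s %s\n", passed ? "ok" : "not ok",
                imgfmt, testname) < 0) {
        return -1;
    }
    /* Without this, the line may sit in the stdio buffer indefinitely. */
    return fflush(out) == 0 ? 0 : -1;
}
```

Flushing once per test costs little here, because each test runs far
longer than a single write takes.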

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220506134215.10086-1-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/testrunner.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/testrunner.py
+++ b/tests/qemu-iotests/testrunner.py
@@ -XXX,XX +XXX,XX @@ def run_test(self, test: str,
             else:
                 print(res.casenotrun)
 
+        sys.stdout.flush()
         return res
 
     def run_tests(self, tests: List[str], jobs: int = 1) -> bool:
-- 
2.35.3

From: Daniel P. Berrangé <berrange@redhat.com>

When running I/O tests using TAP output mode, we get a single TAP test
with a sub-test reported for each I/O test that is run. The output looks
something like this:

1..123
 ok qcow2 011
 ok qcow2 012
 ok qcow2 013
 ok qcow2 217
 ...

If everything runs or fails normally this is fine, but periodically we
have been seeing the test harness abort early before all 123 tests have
been run, just leaving a fairly useless message like:

TAP parsing error: Too few tests run (expected 123, got 107)

With only this message, we have no idea which tests were running at the
time the test harness abruptly exited. This change causes us to print a
message about our intent to run each test, so we have a record of what
is active at the time the harness exits abnormally.

1..123
 # running qcow2 011
 ok qcow2 011
 # running qcow2 012
 ok qcow2 012
 # running qcow2 013
 ok qcow2 013
 # running qcow2 217
 ok qcow2 217
 ...

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220509124134.867431-2-berrange@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/testrunner.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/testrunner.py
+++ b/tests/qemu-iotests/testrunner.py
@@ -XXX,XX +XXX,XX @@ def run_test(self, test: str,
                                      starttime=start,
                                      lasttime=last_el,
                                      end = '\n' if mp else '\r')
+        else:
+            testname = os.path.basename(test)
+            print(f'# running {self.env.imgfmt} {testname}')
 
         res = self.do_run_test(test, mp)
 
-- 
2.35.3

From: Daniel P. Berrangé <berrange@redhat.com>

When running 'make check' we only get a summary of progress on the
console. Fortunately meson/ninja have saved the raw test output to a
logfile. Exposing this log will make it easier to debug failures that
happen in CI.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220509124134.867431-3-berrange@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 .gitlab-ci.d/buildtest-template.yml | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/.gitlab-ci.d/buildtest-template.yml b/.gitlab-ci.d/buildtest-template.yml
index XXXXXXX..XXXXXXX 100644
--- a/.gitlab-ci.d/buildtest-template.yml
+++ b/.gitlab-ci.d/buildtest-template.yml
@@ -XXX,XX +XXX,XX @@
         make -j"$JOBS" $MAKE_CHECK_ARGS ;
       fi
 
-.native_test_job_template:
+.common_test_job_template:
   stage: test
   image: $CI_REGISTRY_IMAGE/qemu/$IMAGE:latest
   script:
@@ -XXX,XX +XXX,XX @@
     # Avoid recompiling by hiding ninja with NINJA=":"
     - make NINJA=":" $MAKE_CHECK_ARGS
 
+.native_test_job_template:
+  extends: .common_test_job_template
+  artifacts:
+    name: "$CI_JOB_NAME-$CI_COMMIT_REF_SLUG"
+    expire_in: 7 days
+    paths:
+      - build/meson-logs/testlog.txt
+
 .avocado_test_job_template:
-  extends: .native_test_job_template
+  extends: .common_test_job_template
   cache:
     key: "${CI_JOB_NAME}-cache"
     paths:
-- 
2.35.3

From: Philippe Mathieu-Daudé <philmd@redhat.com>

Per the 82078 datasheet, if the end-of-track (EOT byte in
the FIFO) is more than the number of sectors per side, the
command is terminated unsuccessfully:

* 5.2.5 DATA TRANSFER TERMINATION

  The 82078 supports terminal count explicitly through
  the TC pin and implicitly through the underrun/overrun
  and end-of-track (EOT) functions. For full sector
  transfers, the EOT parameter can define the last
  sector to be transferred in a single or multisector
  transfer. If the last sector to be transferred is a
  partial sector, the host can stop transferring the data in
  mid-sector, and the 82078 will continue to complete
  the sector as if a hardware TC was received. The
  only difference between these implicit functions and
  TC is that they return "abnormal termination" result
  status. Such status indications can be ignored if they
  were expected.

* 6.1.3 READ TRACK

  This command terminates when the EOT specified
  number of sectors have been read. If the 82078
  does not find an ID Address Mark on the diskette
  after the second occurrence of a pulse on the
  INDX# pin, then it sets the IC code in Status Register 0
  to "01" (Abnormal termination), sets the MA bit
  in Status Register 1 to "1", and terminates the
  command.

* 6.1.6 VERIFY

  Refer to Table 6-6 and Table 6-7 for information
  concerning the values of MT and EC versus SC and
  EOT value.

* Table 6-6. Result Phase Table

* Table 6-7. Verify Command Result Phase Table

Fix by aborting the transfer when EOT > # Sectors Per Side.
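
For illustration, the length computation guarded by this check can be
modelled as a standalone function. This is a sketch, not the code in
fdc.c: fdc_model_transfer_len() is a hypothetical name, the start sector
ks is passed in as a parameter, and -1 stands in for the abnormal
termination path:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the length computation in fdctrl_start_transfer().
 * fifo[5] is the sector size code, fifo[6] the EOT byte, and bit 7 of
 * fifo[0] the multi-track flag. Returns the transfer length in bytes,
 * or -1 if the sector count derived from EOT is negative.
 */
static long fdc_model_transfer_len(const uint8_t fifo[9], uint8_t ks)
{
    assert(fifo != NULL);

    long sector_len = 128L << (fifo[5] > 7 ? 7 : fifo[5]);
    int sectors = fifo[6] - ks + 1;

    if (sectors < 0) {
        /* Invalid EOT: the patch stops the transfer here. */
        return -1;
    }
    if (fifo[0] & 0x80) {
        sectors += fifo[6];
    }
    return sector_len * sectors;
}
```

Without the check, a negative sector count is multiplied into the data
length, which is how the oversized DMA transfer in the reproducer comes
about.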

Cc: qemu-stable@nongnu.org
Cc: Hervé Poussineau <hpoussin@reactos.org>
Fixes: baca51faff0 ("floppy driver: disk geometry auto detect")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/339
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211118115733.4038610-2-philmd@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/block/fdc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -XXX,XX +XXX,XX @@ static void fdctrl_start_transfer(FDCtrl *fdctrl, int direction)
         int tmp;
         fdctrl->data_len = 128 << (fdctrl->fifo[5] > 7 ? 7 : fdctrl->fifo[5]);
         tmp = (fdctrl->fifo[6] - ks + 1);
+        if (tmp < 0) {
+            FLOPPY_DPRINTF("invalid EOT: %d\n", tmp);
+            fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM, FD_SR1_MA, 0x00);
+            fdctrl->fifo[3] = kt;
+            fdctrl->fifo[4] = kh;
+            fdctrl->fifo[5] = ks;
+            return;
+        }
         if (fdctrl->fifo[0] & 0x80)
             tmp += fdctrl->fifo[6];
         fdctrl->data_len *= tmp;
-- 
2.35.3

From: Philippe Mathieu-Daudé <philmd@redhat.com>

Add the reproducer from https://gitlab.com/qemu-project/qemu/-/issues/339

Without the previous commit, when running 'make check-qtest-i386'
with QEMU configured with '--enable-sanitizers' we get:

==4028352==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619000062a00 at pc 0x5626d03c491a bp 0x7ffdb4199410 sp 0x7ffdb4198bc0
  READ of size 786432 at 0x619000062a00 thread T0
      #0 0x5626d03c4919 in __asan_memcpy (qemu-system-i386+0x1e65919)
      #1 0x5626d1c023cc in flatview_write_continue softmmu/physmem.c:2787:13
      #2 0x5626d1bf0c0f in flatview_write softmmu/physmem.c:2822:14
      #3 0x5626d1bf0798 in address_space_write softmmu/physmem.c:2914:18
      #4 0x5626d1bf0f37 in address_space_rw softmmu/physmem.c:2924:16
      #5 0x5626d1bf14c8 in cpu_physical_memory_rw softmmu/physmem.c:2933:5
      #6 0x5626d0bd5649 in cpu_physical_memory_write include/exec/cpu-common.h:82:5
      #7 0x5626d0bd0a07 in i8257_dma_write_memory hw/dma/i8257.c:452:9
      #8 0x5626d09f825d in fdctrl_transfer_handler hw/block/fdc.c:1616:13
      #9 0x5626d0a048b4 in fdctrl_start_transfer hw/block/fdc.c:1539:13
      #10 0x5626d09f4c3e in fdctrl_write_data hw/block/fdc.c:2266:13
      #11 0x5626d09f22f7 in fdctrl_write hw/block/fdc.c:829:9
      #12 0x5626d1c20bc5 in portio_write softmmu/ioport.c:207:17

0x619000062a00 is located 0 bytes to the right of 512-byte region [0x619000062800,0x619000062a00)
  allocated by thread T0 here:
      #0 0x5626d03c66ec in posix_memalign (qemu-system-i386+0x1e676ec)
      #1 0x5626d2b988d4 in qemu_try_memalign util/oslib-posix.c:210:11
      #2 0x5626d2b98b0c in qemu_memalign util/oslib-posix.c:226:27
      #3 0x5626d09fbaf0 in fdctrl_realize_common hw/block/fdc.c:2341:20
      #4 0x5626d0a150ed in isabus_fdc_realize hw/block/fdc-isa.c:113:5
      #5 0x5626d2367935 in device_set_realized hw/core/qdev.c:531:13

SUMMARY: AddressSanitizer: heap-buffer-overflow (qemu-system-i386+0x1e65919) in __asan_memcpy
  Shadow bytes around the buggy address:
    0x0c32800044f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c3280004510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c3280004520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c3280004530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  =>0x0c3280004540:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004550: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004560: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004570: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Heap left redzone:       fa
    Freed heap region:       fd
  ==4028352==ABORTING

[ kwolf: Added snapshot=on to prevent write file lock failure ]

Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qtest/fdc-test.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tests/qtest/fdc-test.c b/tests/qtest/fdc-test.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/qtest/fdc-test.c
+++ b/tests/qtest/fdc-test.c
@@ -XXX,XX +XXX,XX @@ static void test_cve_2021_20196(void)
     qtest_quit(s);
 }
 
+static void test_cve_2021_3507(void)
+{
+    QTestState *s;
+
+    s = qtest_initf("-nographic -m 32M -nodefaults "
+                    "-drive file=%s,format=raw,if=floppy,snapshot=on",
+                    test_image);
+    qtest_outl(s, 0x9, 0x0a0206);
+    qtest_outw(s, 0x3f4, 0x1600);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outw(s, 0x3f4, 0x0200);
+    qtest_outw(s, 0x3f4, 0x0200);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_outw(s, 0x3f4, 0x0000);
+    qtest_quit(s);
+}
+
 int main(int argc, char **argv)
 {
     int fd;
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
     qtest_add_func("/fdc/read_no_dma_19", test_read_no_dma_19);
     qtest_add_func("/fdc/fuzz-registers", fuzz_registers);
     qtest_add_func("/fdc/fuzz/cve_2021_20196", test_cve_2021_20196);
+    qtest_add_func("/fdc/fuzz/cve_2021_3507", test_cve_2021_3507);
 
     ret = g_test_run();
 
-- 
2.35.3

From: Eric Blake <eblake@redhat.com>

The next patch wants to adjust whether the NBD server code advertises
MULTI_CONN based on whether it is known if the server limits to
exactly one client.  For a server started by QMP, this information is
obtained through nbd_server_start (which can support more than one
export); but for qemu-nbd (which supports exactly one export), it is
controlled only by the command-line option -e/--shared.  Since we
already have a hook function used by qemu-nbd, it's easiest to just
alter its signature to fit our needs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220512004924.417153-2-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/nbd.h | 2 +-
 blockdev-nbd.c      | 8 ++++----
 qemu-nbd.c          | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -XXX,XX +XXX,XX @@ void nbd_client_new(QIOChannelSocket *sioc,
 void nbd_client_get(NBDClient *client);
 void nbd_client_put(NBDClient *client);
 
-void nbd_server_is_qemu_nbd(bool value);
+void nbd_server_is_qemu_nbd(int max_connections);
 bool nbd_server_is_running(void);
 void nbd_server_start(SocketAddress *addr, const char *tls_creds,
                       const char *tls_authz, uint32_t max_connections,
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index XXXXXXX..XXXXXXX 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -XXX,XX +XXX,XX @@ typedef struct NBDServerData {
 } NBDServerData;
 
 static NBDServerData *nbd_server;
-static bool is_qemu_nbd;
+static int qemu_nbd_connections = -1; /* Non-negative if this is qemu-nbd */
 
 static void nbd_update_server_watch(NBDServerData *s);
 
-void nbd_server_is_qemu_nbd(bool value)
+void nbd_server_is_qemu_nbd(int max_connections)
 {
-    is_qemu_nbd = value;
+    qemu_nbd_connections = max_connections;
 }
 
 bool nbd_server_is_running(void)
 {
-    return nbd_server || is_qemu_nbd;
+    return nbd_server || qemu_nbd_connections >= 0;
 }
 
 static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
diff --git a/qemu-nbd.c b/qemu-nbd.c
index XXXXXXX..XXXXXXX 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv)
 
     bs->detect_zeroes = detect_zeroes;
 
-    nbd_server_is_qemu_nbd(true);
+    nbd_server_is_qemu_nbd(shared);
 
     export_opts = g_new(BlockExportOptions, 1);
     *export_opts = (BlockExportOptions) {
-- 
2.35.3

From: Eric Blake <eblake@redhat.com>

According to the NBD spec, a server that advertises
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.

We always satisfy these conditions in qemu - even when we support
multiple clients, ALL clients go through a single point of reference
into the block layer, with no local caching.  The effect of one client
is instantly visible to the next client.  Even if our backend were a
network device, we argue that any multi-path caching effects that
would cause inconsistencies in back-to-back actions not seeing the
effect of previous actions would be a bug in that backend, and not the
fault of caching in qemu.  As such, it is safe to unconditionally
advertise CAN_MULTI_CONN for any qemu NBD server situation that
supports parallel clients.

Note, however, that we don't want to advertise CAN_MULTI_CONN when we
know that a second client cannot connect (for historical reasons,
qemu-nbd defaults to a single connection while nbd-server-add and QMP
commands default to unlimited connections; but we already have
existing means to let either style of NBD server creation alter those
defaults).  This is visible by no longer advertising MULTI_CONN for
'qemu-nbd -r' without -e, as in the iotest nbd-qemu-allocation.
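
The resulting rule can be summarised in a few lines. This is a sketch
under the assumption that the connection limit is the only input;
model_advertise_multi_conn() is a hypothetical name and not the code in
nbd/server.c:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical summary of the advertising rule described above.
 * max_connections follows the documented convention: 0 means unlimited,
 * any positive value is the maximum number of simultaneous clients.
 */
static bool model_advertise_multi_conn(int max_connections)
{
    assert(max_connections >= 0);

    /* Only a server limited to exactly one client withholds the flag. */
    return max_connections != 1;
}
```

So qemu-nbd with its default of one connection does not advertise the
flag, while 'qemu-nbd -e 0' or a QMP-started server with the default
max-connections of 0 does.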

The harder part of this patch is setting up an iotest to demonstrate
behavior of multiple NBD clients to a single server.  It might be
possible with parallel qemu-io processes, but I found it easier to do
in python with the help of libnbd, and help from Nir and Vladimir in
writing the test.

Signed-off-by: Eric Blake <eblake@redhat.com>
Suggested-by: Nir Soffer <nsoffer@redhat.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
Message-Id: <20220512004924.417153-3-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qapi/block-export.json                        |   8 +-
 docs/interop/nbd.txt                          |   1 +
 docs/tools/qemu-nbd.rst                       |   3 +-
 include/block/nbd.h                           |   3 +-
 blockdev-nbd.c                                |   5 +
 nbd/server.c                                  |  10 +-
 MAINTAINERS                                   |   1 +
 tests/qemu-iotests/tests/nbd-multiconn        | 145 ++++++++++++++++++
 tests/qemu-iotests/tests/nbd-multiconn.out    |   5 +
 .../tests/nbd-qemu-allocation.out             |   2 +-
 10 files changed, 172 insertions(+), 11 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/nbd-multiconn
 create mode 100644 tests/qemu-iotests/tests/nbd-multiconn.out

diff --git a/qapi/block-export.json b/qapi/block-export.json
index XXXXXXX..XXXXXXX 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -XXX,XX +XXX,XX @@
 #             recreated on the fly while the NBD server is active.
 #             If missing, it will default to denying access (since 4.0).
 # @max-connections: The maximum number of connections to allow at the same
-#                   time, 0 for unlimited. (since 5.2; default: 0)
+#                   time, 0 for unlimited. Setting this to 1 also stops
+#                   the server from advertising multiple client support
+#                   (since 5.2; default: 0)
 #
 # Since: 4.2
 ##
@@ -XXX,XX +XXX,XX @@
 #             recreated on the fly while the NBD server is active.
 #             If missing, it will default to denying access (since 4.0).
 # @max-connections: The maximum number of connections to allow at the same
-#                   time, 0 for unlimited. (since 5.2; default: 0)
+#                   time, 0 for unlimited. Setting this to 1 also stops
+#                   the server from advertising multiple client support
+#                   (since 5.2; default: 0).
 #
 # Returns: error if the server is already running.
 #
diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index XXXXXXX..XXXXXXX 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -XXX,XX +XXX,XX @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 * 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
+* 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
diff --git a/docs/tools/qemu-nbd.rst b/docs/tools/qemu-nbd.rst
index XXXXXXX..XXXXXXX 100644
--- a/docs/tools/qemu-nbd.rst
+++ b/docs/tools/qemu-nbd.rst
@@ -XXX,XX +XXX,XX @@ driver options if :option:`--image-opts` is specified.
 .. option:: -e, --shared=NUM
 
   Allow up to *NUM* clients to share the device (default
-  ``1``), 0 for unlimited. Safe for readers, but for now,
-  consistency is not guaranteed between multiple writers.
+  ``1``), 0 for unlimited.
 
 .. option:: -t, --persistent
 
diff --git a/include/block/nbd.h b/include/block/nbd.h
index XXXXXXX..XXXXXXX 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -XXX,XX +XXX,XX @@
 /*
- *  Copyright (C) 2016-2020 Red Hat, Inc.
+ *  Copyright (C) 2016-2022 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device
@@ -XXX,XX +XXX,XX @@ void nbd_client_put(NBDClient *client);
 
 void nbd_server_is_qemu_nbd(int max_connections);
 bool nbd_server_is_running(void);
+int nbd_server_max_connections(void);
 void nbd_server_start(SocketAddress *addr, const char *tls_creds,
                       const char *tls_authz, uint32_t max_connections,
                       Error **errp);
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index XXXXXXX..XXXXXXX 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -XXX,XX +XXX,XX @@ bool nbd_server_is_running(void)
     return nbd_server || qemu_nbd_connections >= 0;
 }
 
+int nbd_server_max_connections(void)
+{
+    return nbd_server ? nbd_server->max_connections : qemu_nbd_connections;
+}
+
 static void nbd_blockdev_client_closed(NBDClient *client, bool ignored)
 {
     nbd_client_put(client);
diff --git a/nbd/server.c b/nbd/server.c
index XXXXXXX..XXXXXXX 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -XXX,XX +XXX,XX @@
 /*
- *  Copyright (C) 2016-2021 Red Hat, Inc.
+ *  Copyright (C) 2016-2022 Red Hat, Inc.
  *  Copyright (C) 2005  Anthony Liguori <anthony@codemonkey.ws>
  *
  *  Network Block Device Server Side
@@ -XXX,XX +XXX,XX @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
     int64_t size;
     uint64_t perm, shared_perm;
     bool readonly = !exp_args->writable;
-    bool shared = !exp_args->writable;
     BlockDirtyBitmapOrStrList *bitmaps;
     size_t i;
     int ret;
@@ -XXX,XX +XXX,XX @@ static int nbd_export_create(BlockExport *blk_exp, BlockExportOptions *exp_args,
     exp->description = g_strdup(arg->description);
     exp->nbdflags = (NBD_FLAG_HAS_FLAGS | NBD_FLAG_SEND_FLUSH |
                      NBD_FLAG_SEND_FUA | NBD_FLAG_SEND_CACHE);
+
+    if (nbd_server_max_connections() != 1) {
+        exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
+    }
     if (readonly) {
         exp->nbdflags |= NBD_FLAG_READ_ONLY;
-        if (shared) {
-            exp->nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
-        }
     } else {
         exp->nbdflags |= (NBD_FLAG_SEND_TRIM | NBD_FLAG_SEND_WRITE_ZEROES |
                           NBD_FLAG_SEND_FAST_ZERO);
diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: qemu-nbd.*
 F: blockdev-nbd.c
 F: docs/interop/nbd.txt
 F: docs/tools/qemu-nbd.rst
+F: tests/qemu-iotests/tests/*nbd*
 T: git https://repo.or.cz/qemu/ericb.git nbd
 T: git https://src.openvz.org/scm/~vsementsov/qemu.git nbd
 
diff --git a/tests/qemu-iotests/tests/nbd-multiconn b/tests/qemu-iotests/tests/nbd-multiconn
new file mode 100755
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/qemu-iotests/tests/nbd-multiconn
@@ -XXX,XX +XXX,XX @@
+#!/usr/bin/env python3
+# group: rw auto quick
+#
+# Test cases for NBD multi-conn advertisement
+#
+# Copyright (C) 2022 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import os
+from contextlib import contextmanager
+import iotests
+from iotests import qemu_img_create, qemu_io
+
+
+disk = os.path.join(iotests.test_dir, 'disk')
+size = '4M'
+nbd_sock = os.path.join(iotests.sock_dir, 'nbd_sock')
+nbd_uri = 'nbd+unix:///{}?socket=' + nbd_sock
+
+
+@contextmanager
+def open_nbd(export_name):
+    h = nbd.NBD()
+    try:
+        h.connect_uri(nbd_uri.format(export_name))
+        yield h
+    finally:
+        h.shutdown()
+
+class TestNbdMulticonn(iotests.QMPTestCase):
+    def setUp(self):
+        qemu_img_create('-f', iotests.imgfmt, disk, size)
+        qemu_io('-c', 'w -P 1 0 2M', '-c', 'w -P 2 2M 2M', disk)
+
+        self.vm = iotests.VM()
+        self.vm.launch()
+        result = self.vm.qmp('blockdev-add', {
+            'driver': 'qcow2',
+            'node-name': 'n',
+            'file': {'driver': 'file', 'filename': disk}
+        })
+        self.assert_qmp(result, 'return', {})
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(disk)
+        try:
+            os.remove(nbd_sock)
+        except OSError:
+            pass
+
+    @contextmanager
+    def run_server(self, max_connections=None):
+        args = {
+            'addr': {
+                'type': 'unix',
+                'data': {'path': nbd_sock}
+            }
+        }
+        if max_connections is not None:
+            args['max-connections'] = max_connections
+
+        result = self.vm.qmp('nbd-server-start', args)
+        self.assert_qmp(result, 'return', {})
+        yield
+
+        result = self.vm.qmp('nbd-server-stop')
+        self.assert_qmp(result, 'return', {})
+
+    def add_export(self, name, writable=None):
+        args = {
+            'type': 'nbd',
+            'id': name,
+            'node-name': 'n',
+            'name': name,
+        }
+        if writable is not None:
+            args['writable'] = writable
+
+        result = self.vm.qmp('block-export-add', args)
+        self.assert_qmp(result, 'return', {})
+
+    def test_default_settings(self):
+        with self.run_server():
+            self.add_export('r')
+            self.add_export('w', writable=True)
+            with open_nbd('r') as h:
+                self.assertTrue(h.can_multi_conn())
+            with open_nbd('w') as h:
+                self.assertTrue(h.can_multi_conn())
+
+    def test_limited_connections(self):
+        with self.run_server(max_connections=1):
+            self.add_export('r')
+            self.add_export('w', writable=True)
+            with open_nbd('r') as h:
+                self.assertFalse(h.can_multi_conn())
+            with open_nbd('w') as h:
+                self.assertFalse(h.can_multi_conn())
+
+    def test_parallel_writes(self):
+        with self.run_server():
+            self.add_export('w', writable=True)
+
+            clients = [nbd.NBD() for _ in range(3)]
+            for c in clients:
+                c.connect_uri(nbd_uri.format('w'))
+                self.assertTrue(c.can_multi_conn())
+
+            initial_data = clients[0].pread(1024 * 1024, 0)
+            self.assertEqual(initial_data, b'\x01' * 1024 * 1024)
+
+            updated_data = b'\x03' * 1024 * 1024
+            clients[1].pwrite(updated_data, 0)
+            clients[2].flush()
+            current_data = clients[0].pread(1024 * 1024, 0)
+
+            self.assertEqual(updated_data, current_data)
+
+            for c in clients:
+                c.shutdown()
+
+
+if __name__ == '__main__':
+    try:
+        # Easier to use libnbd than to try and set up parallel
+        # 'qemu-nbd --list' or 'qemu-io' processes, but not all systems
+        # have libnbd installed.
+        import nbd  # type: ignore
+
+        iotests.main(supported_fmts=['qcow2'])
+    except ImportError:
+        iotests.notrun('libnbd not installed')
diff --git a/tests/qemu-iotests/tests/nbd-multiconn.out b/tests/qemu-iotests/tests/nbd-multiconn.out
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/tests/qemu-iotests/tests/nbd-multiconn.out
@@ -XXX,XX +XXX,XX @@
+...
+----------------------------------------------------------------------
+Ran 3 tests
+
+OK
diff --git a/tests/qemu-iotests/tests/nbd-qemu-allocation.out b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/tests/nbd-qemu-allocation.out
+++ b/tests/qemu-iotests/tests/nbd-qemu-allocation.out
@@ -XXX,XX +XXX,XX @@ wrote 2097152/2097152 bytes at offset 1048576
 exports available: 1
  export: ''
   size:  4194304
-  flags: 0x58f ( readonly flush fua df multi cache )
+  flags: 0x48f ( readonly flush fua df cache )
   min block: 1
   opt block: 4096
   max block: 33554432
-- 
2.35.3
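
The expected-output change in nbd-qemu-allocation.out above (0x58f becoming 0x48f) drops exactly one bit, the multi-conn flag. This is presumably because that test runs qemu-nbd with its default --shared=1, so nbd_server_max_connections() returns 1 and the export no longer advertises NBD_FLAG_CAN_MULTI_CONN. The sketch below is only an illustrative model of that decision, not QEMU code: the helper names are hypothetical, the bit positions are taken from the NBD protocol's transmission flags, and the short flag names mirror the listing in the expected output.

```python
# Illustrative model only; none of these helpers exist in QEMU.
# Bit positions follow the NBD protocol's transmission flags. Bit 0
# (NBD_FLAG_HAS_FLAGS) is left out because the listing in the expected
# output above does not print a name for it.
FLAG_NAMES = {
    1: "readonly",
    2: "flush",
    3: "fua",
    4: "rotational",
    5: "trim",
    6: "zeroes",
    7: "df",
    8: "multi",
    9: "resize",
    10: "cache",
    11: "fast-zero",
}

NBD_FLAG_CAN_MULTI_CONN = 1 << 8


def decode_flags(value):
    """Return the short names of the flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(FLAG_NAMES.items())
            if value & (1 << bit)]


def server_max_connections(qmp_server_limit, qemu_nbd_limit):
    """Model of nbd_server_max_connections(): use the limit of the server
    started via QMP if there is one (not None), else qemu-nbd's limit."""
    if qmp_server_limit is not None:
        return qmp_server_limit
    return qemu_nbd_limit


def advertises_multi_conn(max_connections):
    """Model of the new check in nbd_export_create(): multi-conn is
    advertised unless the connection limit is exactly 1 (0 = unlimited)."""
    return max_connections != 1


if __name__ == "__main__":
    print(decode_flags(0x58f))
    print(decode_flags(0x48f))
    print(advertises_multi_conn(server_max_connections(None, 1)))
```

In this model the writable flag plays no role in the multi-conn decision, which matches the removal of the `shared` variable in the nbd/server.c hunk.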

From: Paolo Bonzini <pbonzini@redhat.com>

common.rc has some complicated logic for locating common.config.  That
logic dates back to xfstests and is completely unnecessary now, so just
inline the contents of the file into common.rc.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220505094723.732116-1-pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/common.config | 41 --------------------------------
 tests/qemu-iotests/common.rc     | 31 ++++++++++++++----------
 2 files changed, 19 insertions(+), 53 deletions(-)
 delete mode 100644 tests/qemu-iotests/common.config

diff --git a/tests/qemu-iotests/common.config b/tests/qemu-iotests/common.config
deleted file mode 100644
index XXXXXXX..XXXXXXX
--- a/tests/qemu-iotests/common.config
+++ /dev/null
@@ -XXX,XX +XXX,XX @@
-#!/usr/bin/env bash
-#
-# Copyright (C) 2009 Red Hat, Inc.
-# Copyright (c) 2000-2003,2006 Silicon Graphics, Inc.  All Rights Reserved.
-#
-# This program is free software; you can redistribute it and/or
-# modify it under the terms of the GNU General Public License as
-# published by the Free Software Foundation.
-#
-# This program is distributed in the hope that it would be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-# GNU General Public License for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with this program.  If not, see <http://www.gnu.org/licenses/>.
-#
-# all tests should use a common language setting to prevent golden
-# output mismatches.
-export LANG=C
-
-PATH=".:$PATH"
-
-HOSTOS=$(uname -s)
-arch=$(uname -m)
-[[ "$arch" =~ "ppc64" ]] && qemu_arch=ppc64 || qemu_arch="$arch"
-
-# make sure we have a standard umask
-umask 022
-
-_optstr_add()
-{
-    if [ -n "$1" ]; then
-        echo "$1,$2"
-    else
-        echo "$2"
-    fi
-}
-
-# make sure this script returns success
-true
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index XXXXXXX..XXXXXXX 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -XXX,XX +XXX,XX @@
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.
 #
 
+export LANG=C
+
+PATH=".:$PATH"
+
+HOSTOS=$(uname -s)
+arch=$(uname -m)
+[[ "$arch" =~ "ppc64" ]] && qemu_arch=ppc64 || qemu_arch="$arch"
+
+# make sure we have a standard umask
+umask 022
+
 # bail out, setting up .notrun file
 _notrun()
 {
@@ -XXX,XX +XXX,XX @@ peek_file_raw()
     dd if="$1" bs=1 skip="$2" count="$3" status=none
 }
 
-config=common.config
-test -f $config || config=../common.config
-if ! test -f $config
-then
-    echo "$0: failed to find common.config"
-    exit 1
-fi
-if ! . $config
-    then
-    echo "$0: failed to source common.config"
-    exit 1
-fi
+_optstr_add()
+{
+    if [ -n "$1" ]; then
+        echo "$1,$2"
+    else
+        echo "$2"
+    fi
+}
 
 # Set the variables to the empty string to turn Valgrind off
 # for specific processes, e.g.
-- 
2.35.3