The following changes since commit e47f81b617684c4546af286d307b69014a83538a:

  Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging (2019-02-07 18:53:25 +0000)

are available in the Git repository at:

  git://github.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to 55140166667bb555c5d05165b93b25557d2e6397:

  tests/virtio-blk: add test for WRITE_ZEROES command (2019-02-11 11:58:17 +0800)

----------------------------------------------------------------
Pull request

User-visible changes:

 * virtio-blk: DISCARD and WRITE_ZEROES support

----------------------------------------------------------------

Peter Xu (1):
  iothread: fix iothread hang when stop too soon

Stefano Garzarella (7):
  virtio-blk: cleanup using VirtIOBlock *s and VirtIODevice *vdev
  virtio-blk: add acct_failed param to virtio_blk_handle_rw_error()
  virtio-blk: add host_features field in VirtIOBlock
  virtio-blk: add "discard" and "write-zeroes" properties
  virtio-blk: add DISCARD and WRITE_ZEROES features
  tests/virtio-blk: change assert on data_size in virtio_blk_request()
  tests/virtio-blk: add test for WRITE_ZEROES command

Vladimir Sementsov-Ogievskiy (1):
  qemugdb/coroutine: fix arch_prctl has unknown return type

 include/hw/virtio/virtio-blk.h |   5 +-
 hw/block/virtio-blk.c          | 236 +++++++++++++++++++++++++++++----
 hw/core/machine.c              |   2 +
 iothread.c                     |   6 +-
 tests/virtio-blk-test.c        |  75 ++++++++++-
 scripts/qemugdb/coroutine.py   |   2 +-
 6 files changed, 297 insertions(+), 29 deletions(-)

--
2.20.1

The following changes since commit 801f3db7564dcce8a37a70833c0abe40ec19f8ce:

  Merge remote-tracking branch 'remotes/philmd/tags/kconfig-20210720' into staging (2021-07-20 19:30:28 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to d7ddd0a1618a75b31dc308bb37365ce1da972154:

  linux-aio: limit the batch size using `aio-max-batch` parameter (2021-07-21 13:47:50 +0100)

----------------------------------------------------------------
Pull request

Stefano's performance regression fix for commit 2558cb8dd4 ("linux-aio:
increasing MAX_EVENTS to a larger hardcoded value").

----------------------------------------------------------------

Stefano Garzarella (3):
  iothread: generalize iothread_set_param/iothread_get_param
  iothread: add aio-max-batch parameter
  linux-aio: limit the batch size using `aio-max-batch` parameter

 qapi/misc.json            |  6 ++-
 qapi/qom.json             |  7 +++-
 include/block/aio.h       | 12 ++++++
 include/sysemu/iothread.h |  3 ++
 block/linux-aio.c         |  9 ++++-
 iothread.c                | 82 ++++++++++++++++++++++++++++++++++-----
 monitor/hmp-cmds.c        |  2 +
 util/aio-posix.c          | 12 ++++++
 util/aio-win32.c          |  5 +++
 util/async.c              |  2 +
 qemu-options.hx           |  8 +++-
 11 files changed, 134 insertions(+), 14 deletions(-)

--
2.31.1
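
For readers who want to try the new `aio-max-batch` parameter, here is an
illustrative invocation (the image path, object id and values are examples,
not taken from the series):

  qemu-system-x86_64 \
      -object iothread,id=iothread1,aio-max-batch=16 \
      -drive if=none,id=drive0,file=test.img,format=raw,cache=none,aio=native \
      -device virtio-blk-pci,drive=drive0,iothread=iothread1

  # the limit can also be changed at run time from the HMP monitor:
  (qemu) qom-set /objects/iothread1 aio-max-batch 32
  (qemu) info iothreads

A value of 0 keeps the engine's built-in default, which this series sets to
32 for Linux AIO.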
Deleted patch
1
From: Peter Xu <peterx@redhat.com>
2
1
3
Lukas reported a hard-to-reproduce QMP iothread hang on s390 where
4
QEMU might hang at pthread_join() of the QMP monitor iothread before
5
quitting:
6
7
Thread 1
8
#0 0x000003ffad10932c in pthread_join
9
#1 0x0000000109e95750 in qemu_thread_join
10
at /home/thuth/devel/qemu/util/qemu-thread-posix.c:570
11
#2 0x0000000109c95a1c in iothread_stop
12
#3 0x0000000109bb0874 in monitor_cleanup
13
#4 0x0000000109b55042 in main
14
15
While the iothread is still in the main loop:
16
17
Thread 4
18
#0 0x000003ffad0010e4 in ??
19
#1 0x000003ffad553958 in g_main_context_iterate.isra.19
20
#2 0x000003ffad553d90 in g_main_loop_run
21
#3 0x0000000109c9585a in iothread_run
22
at /home/thuth/devel/qemu/iothread.c:74
23
#4 0x0000000109e94752 in qemu_thread_start
24
at /home/thuth/devel/qemu/util/qemu-thread-posix.c:502
25
#5 0x000003ffad10825a in start_thread
26
#6 0x000003ffad00dcf2 in thread_start
27
28
IMHO it's because there's a race between the main thread and the iothread
when stopping the thread, in the following sequence:

      main thread                     iothread
      ===========                     ==============
                                      aio_poll()
      iothread_get_g_main_context
        set iothread->worker_context
      iothread_stop
        schedule iothread_stop_bh
                                        execute iothread_stop_bh [1]
                                        set iothread->running=false
                                        (since main_loop==NULL so
                                         skip to quit main loop.
                                         Note: although main_loop is
                                         NULL but worker_context is
                                         not!)
                                      atomic_read(&iothread->worker_context) [2]
                                        create main_loop object
                                        g_main_loop_run() [3]
      pthread_join() [4]

We can see that when iothread_stop_bh() is executed at [1] it is possible
that main_loop is still NULL, because it is only created at the first
check of worker_context, which happens later at [2]. The iothread then
hangs in the main loop [3] and starves the main thread too [4].

The simple solution is to check the "running" variable again before
checking worker_context.

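In other words, the fixed loop in iothread_run() behaves roughly as in the
sketch below (simplified; main-loop creation and teardown are elided):

    while (iothread->running) {
        aio_poll(iothread->ctx, true);

        /*
         * Re-check "running": iothread_stop_bh() may have run inside the
         * aio_poll() above and cleared it before main_loop was created.
         */
        if (iothread->running && atomic_read(&iothread->worker_context)) {
            g_main_context_push_thread_default(iothread->worker_context);
            /* ... create iothread->main_loop and g_main_loop_run() it ... */
            g_main_context_pop_thread_default(iothread->worker_context);
        }
    }
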
CC: Thomas Huth <thuth@redhat.com>
59
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
60
CC: Stefan Hajnoczi <stefanha@redhat.com>
61
CC: Lukáš Doktor <ldoktor@redhat.com>
62
CC: Markus Armbruster <armbru@redhat.com>
63
CC: Eric Blake <eblake@redhat.com>
64
CC: Paolo Bonzini <pbonzini@redhat.com>
65
Reported-by: Lukáš Doktor <ldoktor@redhat.com>
66
Signed-off-by: Peter Xu <peterx@redhat.com>
67
Tested-by: Thomas Huth <thuth@redhat.com>
68
Message-id: 20190129051432.22023-1-peterx@redhat.com
69
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
70
---
71
iothread.c | 6 +++++-
72
1 file changed, 5 insertions(+), 1 deletion(-)
73
74
diff --git a/iothread.c b/iothread.c
75
index XXXXXXX..XXXXXXX 100644
76
--- a/iothread.c
77
+++ b/iothread.c
78
@@ -XXX,XX +XXX,XX @@ static void *iothread_run(void *opaque)
79
while (iothread->running) {
80
aio_poll(iothread->ctx, true);
81
82
- if (atomic_read(&iothread->worker_context)) {
83
+ /*
84
+ * We must check the running state again in case it was
85
+ * changed in previous aio_poll()
86
+ */
87
+ if (iothread->running && atomic_read(&iothread->worker_context)) {
88
GMainLoop *loop;
89
90
g_main_context_push_thread_default(iothread->worker_context);
91
--
92
2.20.1
93
94
Deleted patch
1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2
1
3
The 'qemu coroutine' gdb command results in the following error output:
4
5
Python Exception <class 'gdb.error'> 'arch_prctl' has unknown return
6
type; cast the call to its declared return type: Error occurred in
7
Python command: 'arch_prctl' has unknown return type; cast the call to
8
its declared return type
9
10
Fix it by giving gdb what it wants: the arch_prctl return type.
11
12
Information on the topic:
13
https://sourceware.org/gdb/onlinedocs/gdb/Calling.html
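
The same workaround applies to any inferior call on a function gdb has no
return-type information for; an illustrative interactive session:

    (gdb) call arch_prctl(0x1003, $rsp - 120)
    'arch_prctl' has unknown return type; cast the call to its declared return type
    (gdb) call (int) arch_prctl(0x1003, $rsp - 120)
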
14
15
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
16
Message-id: 20190206151425.105871-1-vsementsov@virtuozzo.com
17
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
---
19
scripts/qemugdb/coroutine.py | 2 +-
20
1 file changed, 1 insertion(+), 1 deletion(-)
21
22
diff --git a/scripts/qemugdb/coroutine.py b/scripts/qemugdb/coroutine.py
23
index XXXXXXX..XXXXXXX 100644
24
--- a/scripts/qemugdb/coroutine.py
25
+++ b/scripts/qemugdb/coroutine.py
26
@@ -XXX,XX +XXX,XX @@ def get_fs_base():
27
pthread_self().'''
28
# %rsp - 120 is scratch space according to the SystemV ABI
29
old = gdb.parse_and_eval('*(uint64_t*)($rsp - 120)')
30
- gdb.execute('call arch_prctl(0x1003, $rsp - 120)', False, True)
31
+ gdb.execute('call (int)arch_prctl(0x1003, $rsp - 120)', False, True)
32
fs_base = gdb.parse_and_eval('*(uint64_t*)($rsp - 120)')
33
gdb.execute('set *(uint64_t*)($rsp - 120) = %s' % old, False, True)
34
return fs_base
35
--
36
2.20.1
37
38
1
From: Stefano Garzarella <sgarzare@redhat.com>
1
From: Stefano Garzarella <sgarzare@redhat.com>
2
2
3
This patch adds support for the DISCARD and WRITE_ZEROES commands,
3
Changes in preparation for the next patches, where we add a new
4
which were introduced in the virtio-blk protocol to provide
4
parameter not related to the poll mechanism.
5
better performance when using an SSD backend.
6
5
7
We support only one segment per request since multiple segments
6
Let's add two new generic functions (iothread_set_param and
8
are not widely used and there are no userspace APIs that allow
7
iothread_get_param) that we use to set and get IOThread
9
applications to submit multiple segments in a single call.
8
parameters.
10
9
11
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
10
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
14
Acked-by: Pankaj Gupta <pagupta@redhat.com>
11
Message-id: 20210721094211.69853-2-sgarzare@redhat.com
15
Message-id: 20190208134950.187665-5-sgarzare@redhat.com
16
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
---
13
---
18
include/hw/virtio/virtio-blk.h | 2 +
14
iothread.c | 27 +++++++++++++++++++++++----
19
hw/block/virtio-blk.c | 184 +++++++++++++++++++++++++++++++++
15
1 file changed, 23 insertions(+), 4 deletions(-)
20
2 files changed, 186 insertions(+)
21
16
22
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
17
diff --git a/iothread.c b/iothread.c
23
index XXXXXXX..XXXXXXX 100644
18
index XXXXXXX..XXXXXXX 100644
24
--- a/include/hw/virtio/virtio-blk.h
19
--- a/iothread.c
25
+++ b/include/hw/virtio/virtio-blk.h
20
+++ b/iothread.c
26
@@ -XXX,XX +XXX,XX @@ struct VirtIOBlkConf
21
@@ -XXX,XX +XXX,XX @@ static PollParamInfo poll_shrink_info = {
27
uint32_t request_merging;
22
"poll-shrink", offsetof(IOThread, poll_shrink),
28
uint16_t num_queues;
29
uint16_t queue_size;
30
+ uint32_t max_discard_sectors;
31
+ uint32_t max_write_zeroes_sectors;
32
};
23
};
33
24
34
struct VirtIOBlockDataPlane;
25
-static void iothread_get_poll_param(Object *obj, Visitor *v,
35
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
26
+static void iothread_get_param(Object *obj, Visitor *v,
36
index XXXXXXX..XXXXXXX 100644
27
const char *name, void *opaque, Error **errp)
37
--- a/hw/block/virtio-blk.c
28
{
38
+++ b/hw/block/virtio-blk.c
29
IOThread *iothread = IOTHREAD(obj);
39
@@ -XXX,XX +XXX,XX @@ out:
30
@@ -XXX,XX +XXX,XX @@ static void iothread_get_poll_param(Object *obj, Visitor *v,
40
aio_context_release(blk_get_aio_context(s->conf.conf.blk));
31
visit_type_int64(v, name, field, errp);
41
}
32
}
42
33
43
+static void virtio_blk_discard_write_zeroes_complete(void *opaque, int ret)
34
-static void iothread_set_poll_param(Object *obj, Visitor *v,
44
+{
35
+static bool iothread_set_param(Object *obj, Visitor *v,
45
+ VirtIOBlockReq *req = opaque;
36
const char *name, void *opaque, Error **errp)
46
+ VirtIOBlock *s = req->dev;
37
{
47
+ bool is_write_zeroes = (virtio_ldl_p(VIRTIO_DEVICE(s), &req->out.type) &
38
IOThread *iothread = IOTHREAD(obj);
48
+ ~VIRTIO_BLK_T_BARRIER) == VIRTIO_BLK_T_WRITE_ZEROES;
39
@@ -XXX,XX +XXX,XX @@ static void iothread_set_poll_param(Object *obj, Visitor *v,
49
+
40
int64_t value;
50
+ aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
41
51
+ if (ret) {
42
if (!visit_type_int64(v, name, &value, errp)) {
52
+ if (virtio_blk_handle_rw_error(req, -ret, false, is_write_zeroes)) {
43
- return;
53
+ goto out;
44
+ return false;
54
+ }
45
}
55
+ }
46
56
+
47
if (value < 0) {
57
+ virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
48
error_setg(errp, "%s value must be in range [0, %" PRId64 "]",
58
+ if (is_write_zeroes) {
49
info->name, INT64_MAX);
59
+ block_acct_done(blk_get_stats(s->blk), &req->acct);
50
- return;
60
+ }
51
+ return false;
61
+ virtio_blk_free_request(req);
52
}
62
+
53
63
+out:
54
*field = value;
64
+ aio_context_release(blk_get_aio_context(s->conf.conf.blk));
55
56
+ return true;
65
+}
57
+}
66
+
58
+
67
#ifdef __linux__
59
+static void iothread_get_poll_param(Object *obj, Visitor *v,
68
60
+ const char *name, void *opaque, Error **errp)
69
typedef struct {
70
@@ -XXX,XX +XXX,XX @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
71
return true;
72
}
73
74
+static uint8_t virtio_blk_handle_discard_write_zeroes(VirtIOBlockReq *req,
75
+ struct virtio_blk_discard_write_zeroes *dwz_hdr, bool is_write_zeroes)
76
+{
61
+{
77
+ VirtIOBlock *s = req->dev;
78
+ VirtIODevice *vdev = VIRTIO_DEVICE(s);
79
+ uint64_t sector;
80
+ uint32_t num_sectors, flags, max_sectors;
81
+ uint8_t err_status;
82
+ int bytes;
83
+
62
+
84
+ sector = virtio_ldq_p(vdev, &dwz_hdr->sector);
63
+ iothread_get_param(obj, v, name, opaque, errp);
85
+ num_sectors = virtio_ldl_p(vdev, &dwz_hdr->num_sectors);
86
+ flags = virtio_ldl_p(vdev, &dwz_hdr->flags);
87
+ max_sectors = is_write_zeroes ? s->conf.max_write_zeroes_sectors :
88
+ s->conf.max_discard_sectors;
89
+
90
+ /*
91
+ * max_sectors is at most BDRV_REQUEST_MAX_SECTORS, this check
92
+ * make us sure that "num_sectors << BDRV_SECTOR_BITS" can fit in
93
+ * the integer variable.
94
+ */
95
+ if (unlikely(num_sectors > max_sectors)) {
96
+ err_status = VIRTIO_BLK_S_IOERR;
97
+ goto err;
98
+ }
99
+
100
+ bytes = num_sectors << BDRV_SECTOR_BITS;
101
+
102
+ if (unlikely(!virtio_blk_sect_range_ok(s, sector, bytes))) {
103
+ err_status = VIRTIO_BLK_S_IOERR;
104
+ goto err;
105
+ }
106
+
107
+ /*
108
+ * The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for discard
109
+ * and write zeroes commands if any unknown flag is set.
110
+ */
111
+ if (unlikely(flags & ~VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)) {
112
+ err_status = VIRTIO_BLK_S_UNSUPP;
113
+ goto err;
114
+ }
115
+
116
+ if (is_write_zeroes) { /* VIRTIO_BLK_T_WRITE_ZEROES */
117
+ int blk_aio_flags = 0;
118
+
119
+ if (flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP) {
120
+ blk_aio_flags |= BDRV_REQ_MAY_UNMAP;
121
+ }
122
+
123
+ block_acct_start(blk_get_stats(s->blk), &req->acct, bytes,
124
+ BLOCK_ACCT_WRITE);
125
+
126
+ blk_aio_pwrite_zeroes(s->blk, sector << BDRV_SECTOR_BITS,
127
+ bytes, blk_aio_flags,
128
+ virtio_blk_discard_write_zeroes_complete, req);
129
+ } else { /* VIRTIO_BLK_T_DISCARD */
130
+ /*
131
+ * The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for
132
+ * discard commands if the unmap flag is set.
133
+ */
134
+ if (unlikely(flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)) {
135
+ err_status = VIRTIO_BLK_S_UNSUPP;
136
+ goto err;
137
+ }
138
+
139
+ blk_aio_pdiscard(s->blk, sector << BDRV_SECTOR_BITS, bytes,
140
+ virtio_blk_discard_write_zeroes_complete, req);
141
+ }
142
+
143
+ return VIRTIO_BLK_S_OK;
144
+
145
+err:
146
+ if (is_write_zeroes) {
147
+ block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_WRITE);
148
+ }
149
+ return err_status;
150
+}
64
+}
151
+
65
+
152
static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
66
+static void iothread_set_poll_param(Object *obj, Visitor *v,
153
{
67
+ const char *name, void *opaque, Error **errp)
154
uint32_t type;
68
+{
155
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
69
+ IOThread *iothread = IOTHREAD(obj);
156
virtio_blk_free_request(req);
157
break;
158
}
159
+ /*
160
+ * VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES are defined with
161
+ * VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch statement,
162
+ * so we must mask it for these requests, then we will check if it is set.
163
+ */
164
+ case VIRTIO_BLK_T_DISCARD & ~VIRTIO_BLK_T_OUT:
165
+ case VIRTIO_BLK_T_WRITE_ZEROES & ~VIRTIO_BLK_T_OUT:
166
+ {
167
+ struct virtio_blk_discard_write_zeroes dwz_hdr;
168
+ size_t out_len = iov_size(out_iov, out_num);
169
+ bool is_write_zeroes = (type & ~VIRTIO_BLK_T_BARRIER) ==
170
+ VIRTIO_BLK_T_WRITE_ZEROES;
171
+ uint8_t err_status;
172
+
70
+
173
+ /*
71
+ if (!iothread_set_param(obj, v, name, opaque, errp)) {
174
+ * Unsupported if VIRTIO_BLK_T_OUT is not set or the request contains
175
+ * more than one segment.
176
+ */
177
+ if (unlikely(!(type & VIRTIO_BLK_T_OUT) ||
178
+ out_len > sizeof(dwz_hdr))) {
179
+ virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
180
+ virtio_blk_free_request(req);
181
+ return 0;
182
+ }
183
+
184
+ if (unlikely(iov_to_buf(out_iov, out_num, 0, &dwz_hdr,
185
+ sizeof(dwz_hdr)) != sizeof(dwz_hdr))) {
186
+ virtio_error(vdev, "virtio-blk discard/write_zeroes header"
187
+ " too short");
188
+ return -1;
189
+ }
190
+
191
+ err_status = virtio_blk_handle_discard_write_zeroes(req, &dwz_hdr,
192
+ is_write_zeroes);
193
+ if (err_status != VIRTIO_BLK_S_OK) {
194
+ virtio_blk_req_complete(req, err_status);
195
+ virtio_blk_free_request(req);
196
+ }
197
+
198
+ break;
199
+ }
200
default:
201
virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
202
virtio_blk_free_request(req);
203
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
204
blkcfg.alignment_offset = 0;
205
blkcfg.wce = blk_enable_write_cache(s->blk);
206
virtio_stw_p(vdev, &blkcfg.num_queues, s->conf.num_queues);
207
+ if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_DISCARD)) {
208
+ virtio_stl_p(vdev, &blkcfg.max_discard_sectors,
209
+ s->conf.max_discard_sectors);
210
+ virtio_stl_p(vdev, &blkcfg.discard_sector_alignment,
211
+ blk_size >> BDRV_SECTOR_BITS);
212
+ /*
213
+ * We support only one segment per request since multiple segments
214
+ * are not widely used and there are no userspace APIs that allow
215
+ * applications to submit multiple segments in a single call.
216
+ */
217
+ virtio_stl_p(vdev, &blkcfg.max_discard_seg, 1);
218
+ }
219
+ if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_WRITE_ZEROES)) {
220
+ virtio_stl_p(vdev, &blkcfg.max_write_zeroes_sectors,
221
+ s->conf.max_write_zeroes_sectors);
222
+ blkcfg.write_zeroes_may_unmap = 1;
223
+ virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
224
+ }
225
memcpy(config, &blkcfg, sizeof(struct virtio_blk_config));
226
}
227
228
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
229
return;
230
}
231
232
+ if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_DISCARD) &&
233
+ (!conf->max_discard_sectors ||
234
+ conf->max_discard_sectors > BDRV_REQUEST_MAX_SECTORS)) {
235
+ error_setg(errp, "invalid max-discard-sectors property (%" PRIu32 ")"
236
+ ", must be between 1 and %d",
237
+ conf->max_discard_sectors, (int)BDRV_REQUEST_MAX_SECTORS);
238
+ return;
72
+ return;
239
+ }
73
+ }
240
+
74
+
241
+ if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_WRITE_ZEROES) &&
75
if (iothread->ctx) {
242
+ (!conf->max_write_zeroes_sectors ||
76
aio_context_set_poll_params(iothread->ctx,
243
+ conf->max_write_zeroes_sectors > BDRV_REQUEST_MAX_SECTORS)) {
77
iothread->poll_max_ns,
244
+ error_setg(errp, "invalid max-write-zeroes-sectors property (%" PRIu32
245
+ "), must be between 1 and %d",
246
+ conf->max_write_zeroes_sectors,
247
+ (int)BDRV_REQUEST_MAX_SECTORS);
248
+ return;
249
+ }
250
+
251
virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
252
sizeof(struct virtio_blk_config));
253
254
@@ -XXX,XX +XXX,XX @@ static Property virtio_blk_properties[] = {
255
VIRTIO_BLK_F_DISCARD, true),
256
DEFINE_PROP_BIT64("write-zeroes", VirtIOBlock, host_features,
257
VIRTIO_BLK_F_WRITE_ZEROES, true),
258
+ DEFINE_PROP_UINT32("max-discard-sectors", VirtIOBlock,
259
+ conf.max_discard_sectors, BDRV_REQUEST_MAX_SECTORS),
260
+ DEFINE_PROP_UINT32("max-write-zeroes-sectors", VirtIOBlock,
261
+ conf.max_write_zeroes_sectors, BDRV_REQUEST_MAX_SECTORS),
262
DEFINE_PROP_END_OF_LIST(),
263
};
264
265
--
78
--
266
2.20.1
79
2.31.1
267
80
268
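
Usage note for the DISCARD/WRITE_ZEROES patch above: the behaviour is
controlled by the new device properties. An illustrative command-line
fragment (the values are examples only):

  -device virtio-blk-pci,drive=drive0,discard=on,write-zeroes=on,max-discard-sectors=32768

Both max-discard-sectors and max-write-zeroes-sectors default to
BDRV_REQUEST_MAX_SECTORS and must stay in the range
[1, BDRV_REQUEST_MAX_SECTORS], as enforced in virtio_blk_device_realize()
above.
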
1
From: Stefano Garzarella <sgarzare@redhat.com>
1
From: Stefano Garzarella <sgarzare@redhat.com>
2
2
3
If the WRITE_ZEROES feature is enabled, we check this command
3
The `aio-max-batch` parameter will be propagated to AIO engines
4
in test_basic().
4
and it will be used to control the maximum number of queued requests.
5
5
6
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
6
When the number of queued requests reaches `aio-max-batch`,
7
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7
the engine invokes the system call to forward the requests to the kernel.
8
Acked-by: Thomas Huth <thuth@redhat.com>
8
9
This parameter allows us to control the maximum batch size and so limit
the latency that requests can accumulate while queued in the AIO engine
queue.
12
13
If `aio-max-batch` is equal to 0 (default value), the AIO engine will
14
use its default maximum batch size value.
15
9
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
16
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
10
Acked-by: Pankaj Gupta <pagupta@redhat.com>
17
Message-id: 20210721094211.69853-3-sgarzare@redhat.com
11
Message-id: 20190208134950.187665-7-sgarzare@redhat.com
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
19
---
14
tests/virtio-blk-test.c | 60 +++++++++++++++++++++++++++++++++++++++++
20
qapi/misc.json | 6 ++++-
15
1 file changed, 60 insertions(+)
21
qapi/qom.json | 7 ++++-
16
22
include/block/aio.h | 12 +++++++++
17
diff --git a/tests/virtio-blk-test.c b/tests/virtio-blk-test.c
23
include/sysemu/iothread.h | 3 +++
18
index XXXXXXX..XXXXXXX 100644
24
iothread.c | 55 +++++++++++++++++++++++++++++++++++----
19
--- a/tests/virtio-blk-test.c
25
monitor/hmp-cmds.c | 2 ++
20
+++ b/tests/virtio-blk-test.c
26
util/aio-posix.c | 12 +++++++++
21
@@ -XXX,XX +XXX,XX @@ static void test_basic(QVirtioDevice *dev, QGuestAllocator *alloc,
27
util/aio-win32.c | 5 ++++
22
28
util/async.c | 2 ++
23
guest_free(alloc, req_addr);
29
qemu-options.hx | 8 ++++--
24
30
10 files changed, 103 insertions(+), 9 deletions(-)
25
+ if (features & (1u << VIRTIO_BLK_F_WRITE_ZEROES)) {
31
26
+ struct virtio_blk_discard_write_zeroes dwz_hdr;
32
diff --git a/qapi/misc.json b/qapi/misc.json
27
+ void *expected;
33
index XXXXXXX..XXXXXXX 100644
28
+
34
--- a/qapi/misc.json
29
+ /*
35
+++ b/qapi/misc.json
30
+ * WRITE_ZEROES request on the same sector of previous test where
36
@@ -XXX,XX +XXX,XX @@
31
+ * we wrote "TEST".
37
# @poll-shrink: how many ns will be removed from polling time, 0 means that
32
+ */
38
# it's not configured (since 2.9)
33
+ req.type = VIRTIO_BLK_T_WRITE_ZEROES;
39
#
34
+ req.data = (char *) &dwz_hdr;
40
+# @aio-max-batch: maximum number of requests in a batch for the AIO engine,
35
+ dwz_hdr.sector = 0;
41
+# 0 means that the engine will use its default (since 6.1)
36
+ dwz_hdr.num_sectors = 1;
42
+#
37
+ dwz_hdr.flags = 0;
43
# Since: 2.0
38
+
44
##
39
+ req_addr = virtio_blk_request(alloc, dev, &req, sizeof(dwz_hdr));
45
{ 'struct': 'IOThreadInfo',
40
+
46
@@ -XXX,XX +XXX,XX @@
41
+ free_head = qvirtqueue_add(vq, req_addr, 16, false, true);
47
'thread-id': 'int',
42
+ qvirtqueue_add(vq, req_addr + 16, sizeof(dwz_hdr), false, true);
48
'poll-max-ns': 'int',
43
+ qvirtqueue_add(vq, req_addr + 16 + sizeof(dwz_hdr), 1, true, false);
49
'poll-grow': 'int',
44
+
50
- 'poll-shrink': 'int' } }
45
+ qvirtqueue_kick(dev, vq, free_head);
51
+ 'poll-shrink': 'int',
46
+
52
+ 'aio-max-batch': 'int' } }
47
+ qvirtio_wait_used_elem(dev, vq, free_head, NULL,
53
48
+ QVIRTIO_BLK_TIMEOUT_US);
54
##
49
+ status = readb(req_addr + 16 + sizeof(dwz_hdr));
55
# @query-iothreads:
50
+ g_assert_cmpint(status, ==, 0);
56
diff --git a/qapi/qom.json b/qapi/qom.json
51
+
57
index XXXXXXX..XXXXXXX 100644
52
+ guest_free(alloc, req_addr);
58
--- a/qapi/qom.json
53
+
59
+++ b/qapi/qom.json
54
+ /* Read request to check if the sector contains all zeroes */
60
@@ -XXX,XX +XXX,XX @@
55
+ req.type = VIRTIO_BLK_T_IN;
61
# algorithm detects it is spending too long polling without
56
+ req.ioprio = 1;
62
# encountering events. 0 selects a default behaviour (default: 0)
57
+ req.sector = 0;
63
#
58
+ req.data = g_malloc0(512);
64
+# @aio-max-batch: maximum number of requests in a batch for the AIO engine,
59
+
65
+# 0 means that the engine will use its default
60
+ req_addr = virtio_blk_request(alloc, dev, &req, 512);
66
+# (default:0, since 6.1)
61
+
67
+#
62
+ g_free(req.data);
68
# Since: 2.0
63
+
69
##
64
+ free_head = qvirtqueue_add(vq, req_addr, 16, false, true);
70
{ 'struct': 'IothreadProperties',
65
+ qvirtqueue_add(vq, req_addr + 16, 512, true, true);
71
'data': { '*poll-max-ns': 'int',
66
+ qvirtqueue_add(vq, req_addr + 528, 1, true, false);
72
'*poll-grow': 'int',
67
+
73
- '*poll-shrink': 'int' } }
68
+ qvirtqueue_kick(dev, vq, free_head);
74
+ '*poll-shrink': 'int',
69
+
75
+ '*aio-max-batch': 'int' } }
70
+ qvirtio_wait_used_elem(dev, vq, free_head, NULL,
76
71
+ QVIRTIO_BLK_TIMEOUT_US);
77
##
72
+ status = readb(req_addr + 528);
78
# @MemoryBackendProperties:
73
+ g_assert_cmpint(status, ==, 0);
79
diff --git a/include/block/aio.h b/include/block/aio.h
74
+
80
index XXXXXXX..XXXXXXX 100644
75
+ data = g_malloc(512);
81
--- a/include/block/aio.h
76
+ expected = g_malloc0(512);
82
+++ b/include/block/aio.h
77
+ memread(req_addr + 16, data, 512);
83
@@ -XXX,XX +XXX,XX @@ struct AioContext {
78
+ g_assert_cmpmem(data, 512, expected, 512);
84
int64_t poll_grow; /* polling time growth factor */
79
+ g_free(expected);
85
int64_t poll_shrink; /* polling time shrink factor */
80
+ g_free(data);
86
81
+
87
+ /* AIO engine parameters */
82
+ guest_free(alloc, req_addr);
88
+ int64_t aio_max_batch; /* maximum number of requests in a batch */
89
+
90
/*
91
* List of handlers participating in userspace polling. Protected by
92
* ctx->list_lock. Iterated and modified mostly by the event loop thread
93
@@ -XXX,XX +XXX,XX @@ void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
94
int64_t grow, int64_t shrink,
95
Error **errp);
96
97
+/**
98
+ * aio_context_set_aio_params:
99
+ * @ctx: the aio context
100
+ * @max_batch: maximum number of requests in a batch, 0 means that the
101
+ * engine will use its default
102
+ */
103
+void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch,
104
+ Error **errp);
105
+
106
#endif
107
diff --git a/include/sysemu/iothread.h b/include/sysemu/iothread.h
108
index XXXXXXX..XXXXXXX 100644
109
--- a/include/sysemu/iothread.h
110
+++ b/include/sysemu/iothread.h
111
@@ -XXX,XX +XXX,XX @@ struct IOThread {
112
int64_t poll_max_ns;
113
int64_t poll_grow;
114
int64_t poll_shrink;
115
+
116
+ /* AioContext AIO engine parameters */
117
+ int64_t aio_max_batch;
118
};
119
typedef struct IOThread IOThread;
120
121
diff --git a/iothread.c b/iothread.c
122
index XXXXXXX..XXXXXXX 100644
123
--- a/iothread.c
124
+++ b/iothread.c
125
@@ -XXX,XX +XXX,XX @@ static void iothread_init_gcontext(IOThread *iothread)
126
iothread->main_loop = g_main_loop_new(iothread->worker_context, TRUE);
127
}
128
129
+static void iothread_set_aio_context_params(IOThread *iothread, Error **errp)
130
+{
131
+ ERRP_GUARD();
132
+
133
+ aio_context_set_poll_params(iothread->ctx,
134
+ iothread->poll_max_ns,
135
+ iothread->poll_grow,
136
+ iothread->poll_shrink,
137
+ errp);
138
+ if (*errp) {
139
+ return;
83
+ }
140
+ }
84
+
141
+
85
if (features & (1u << VIRTIO_F_ANY_LAYOUT)) {
142
+ aio_context_set_aio_params(iothread->ctx,
86
/* Write and read with 2 descriptor layout */
143
+ iothread->aio_max_batch,
87
/* Write request */
144
+ errp);
145
+}
146
+
147
static void iothread_complete(UserCreatable *obj, Error **errp)
148
{
149
Error *local_error = NULL;
150
@@ -XXX,XX +XXX,XX @@ static void iothread_complete(UserCreatable *obj, Error **errp)
151
*/
152
iothread_init_gcontext(iothread);
153
154
- aio_context_set_poll_params(iothread->ctx,
155
- iothread->poll_max_ns,
156
- iothread->poll_grow,
157
- iothread->poll_shrink,
158
- &local_error);
159
+ iothread_set_aio_context_params(iothread, &local_error);
160
if (local_error) {
161
error_propagate(errp, local_error);
162
aio_context_unref(iothread->ctx);
163
@@ -XXX,XX +XXX,XX @@ static PollParamInfo poll_grow_info = {
164
static PollParamInfo poll_shrink_info = {
165
"poll-shrink", offsetof(IOThread, poll_shrink),
166
};
167
+static PollParamInfo aio_max_batch_info = {
168
+ "aio-max-batch", offsetof(IOThread, aio_max_batch),
169
+};
170
171
static void iothread_get_param(Object *obj, Visitor *v,
172
const char *name, void *opaque, Error **errp)
173
@@ -XXX,XX +XXX,XX @@ static void iothread_set_poll_param(Object *obj, Visitor *v,
174
}
175
}
176
177
+static void iothread_get_aio_param(Object *obj, Visitor *v,
178
+ const char *name, void *opaque, Error **errp)
179
+{
180
+
181
+ iothread_get_param(obj, v, name, opaque, errp);
182
+}
183
+
184
+static void iothread_set_aio_param(Object *obj, Visitor *v,
185
+ const char *name, void *opaque, Error **errp)
186
+{
187
+ IOThread *iothread = IOTHREAD(obj);
188
+
189
+ if (!iothread_set_param(obj, v, name, opaque, errp)) {
190
+ return;
191
+ }
192
+
193
+ if (iothread->ctx) {
194
+ aio_context_set_aio_params(iothread->ctx,
195
+ iothread->aio_max_batch,
196
+ errp);
197
+ }
198
+}
199
+
200
static void iothread_class_init(ObjectClass *klass, void *class_data)
201
{
202
UserCreatableClass *ucc = USER_CREATABLE_CLASS(klass);
203
@@ -XXX,XX +XXX,XX @@ static void iothread_class_init(ObjectClass *klass, void *class_data)
204
iothread_get_poll_param,
205
iothread_set_poll_param,
206
NULL, &poll_shrink_info);
207
+ object_class_property_add(klass, "aio-max-batch", "int",
208
+ iothread_get_aio_param,
209
+ iothread_set_aio_param,
210
+ NULL, &aio_max_batch_info);
211
}
212
213
static const TypeInfo iothread_info = {
214
@@ -XXX,XX +XXX,XX @@ static int query_one_iothread(Object *object, void *opaque)
215
info->poll_max_ns = iothread->poll_max_ns;
216
info->poll_grow = iothread->poll_grow;
217
info->poll_shrink = iothread->poll_shrink;
218
+ info->aio_max_batch = iothread->aio_max_batch;
219
220
QAPI_LIST_APPEND(*tail, info);
221
return 0;
222
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
223
index XXXXXXX..XXXXXXX 100644
224
--- a/monitor/hmp-cmds.c
225
+++ b/monitor/hmp-cmds.c
226
@@ -XXX,XX +XXX,XX @@ void hmp_info_iothreads(Monitor *mon, const QDict *qdict)
227
monitor_printf(mon, " poll-max-ns=%" PRId64 "\n", value->poll_max_ns);
228
monitor_printf(mon, " poll-grow=%" PRId64 "\n", value->poll_grow);
229
monitor_printf(mon, " poll-shrink=%" PRId64 "\n", value->poll_shrink);
230
+ monitor_printf(mon, " aio-max-batch=%" PRId64 "\n",
231
+ value->aio_max_batch);
232
}
233
234
qapi_free_IOThreadInfoList(info_list);
235
diff --git a/util/aio-posix.c b/util/aio-posix.c
236
index XXXXXXX..XXXXXXX 100644
237
--- a/util/aio-posix.c
238
+++ b/util/aio-posix.c
239
@@ -XXX,XX +XXX,XX @@ void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
240
241
aio_notify(ctx);
242
}
243
+
244
+void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch,
245
+ Error **errp)
246
+{
247
+ /*
248
+ * No thread synchronization here, it doesn't matter if an incorrect value
249
+ * is used once.
250
+ */
251
+ ctx->aio_max_batch = max_batch;
252
+
253
+ aio_notify(ctx);
254
+}
255
diff --git a/util/aio-win32.c b/util/aio-win32.c
256
index XXXXXXX..XXXXXXX 100644
257
--- a/util/aio-win32.c
258
+++ b/util/aio-win32.c
259
@@ -XXX,XX +XXX,XX @@ void aio_context_set_poll_params(AioContext *ctx, int64_t max_ns,
260
error_setg(errp, "AioContext polling is not implemented on Windows");
261
}
262
}
263
+
264
+void aio_context_set_aio_params(AioContext *ctx, int64_t max_batch,
265
+ Error **errp)
266
+{
267
+}
268
diff --git a/util/async.c b/util/async.c
269
index XXXXXXX..XXXXXXX 100644
270
--- a/util/async.c
271
+++ b/util/async.c
272
@@ -XXX,XX +XXX,XX @@ AioContext *aio_context_new(Error **errp)
273
ctx->poll_grow = 0;
274
ctx->poll_shrink = 0;
275
276
+ ctx->aio_max_batch = 0;
277
+
278
return ctx;
279
fail:
280
g_source_destroy(&ctx->source);
281
diff --git a/qemu-options.hx b/qemu-options.hx
282
index XXXXXXX..XXXXXXX 100644
283
--- a/qemu-options.hx
284
+++ b/qemu-options.hx
285
@@ -XXX,XX +XXX,XX @@ SRST
286
287
CN=laptop.example.com,O=Example Home,L=London,ST=London,C=GB
288
289
- ``-object iothread,id=id,poll-max-ns=poll-max-ns,poll-grow=poll-grow,poll-shrink=poll-shrink``
290
+ ``-object iothread,id=id,poll-max-ns=poll-max-ns,poll-grow=poll-grow,poll-shrink=poll-shrink,aio-max-batch=aio-max-batch``
291
Creates a dedicated event loop thread that devices can be
292
assigned to. This is known as an IOThread. By default device
293
emulation happens in vCPU threads or the main event loop thread.
294
@@ -XXX,XX +XXX,XX @@ SRST
295
the polling time when the algorithm detects it is spending too
296
long polling without encountering events.
297
298
- The polling parameters can be modified at run-time using the
299
+ The ``aio-max-batch`` parameter is the maximum number of requests
300
+ in a batch for the AIO engine, 0 means that the engine will use
301
+ its default.
302
+
303
+ The IOThread parameters can be modified at run-time using the
304
``qom-set`` command (where ``iothread1`` is the IOThread's
305
``id``):
306
88
--
307
--
89
2.20.1
308
2.31.1
90
309
91
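
Usage note for the aio-max-batch patch above: with the IOThreadInfo
addition the value is also reported over QMP. An illustrative exchange
(the numbers are made up):

  -> { "execute": "query-iothreads" }
  <- { "return": [ { "id": "iothread1", "thread-id": 12345,
                     "poll-max-ns": 32768, "poll-grow": 0,
                     "poll-shrink": 0, "aio-max-batch": 0 } ] }
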
1
From: Stefano Garzarella <sgarzare@redhat.com>
1
From: Stefano Garzarella <sgarzare@redhat.com>
2
2
3
In several places we are still using req->dev or VIRTIO_DEVICE(req->dev)
3
When there are multiple queues attached to the same AIO context,
4
even though we have already defined the s and vdev pointers:
4
some requests may experience high latency, since in the worst case
5
VirtIOBlock *s = req->dev;
5
the AIO engine queue is only flushed when it is full (MAX_EVENTS) or
6
VirtIODevice *vdev = VIRTIO_DEVICE(s);
6
there are no more queues plugged.
7
8
Commit 2558cb8dd4 ("linux-aio: increasing MAX_EVENTS to a larger
9
hardcoded value") changed MAX_EVENTS from 128 to 1024, to increase
10
the number of in-flight requests. But this change also increased
11
the potential maximum batch to 1024 elements.
12
13
When there is a single queue attached to the AIO context, the issue
14
is mitigated by laio_io_unplug(), which flushes the queue every time
it is invoked, since no other queues can be plugged.
16
17
Let's use the new `aio-max-batch` IOThread parameter to mitigate
18
this issue, limiting the number of requests in a batch.
19
20
We also define a default value (32): this value was obtained by running
some benchmarks and represents a good tradeoff between the latency
increase while a request is queued and the cost of the io_submit(2)
system call.
7
24
8
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
25
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
9
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
26
Message-id: 20210721094211.69853-4-sgarzare@redhat.com
10
Message-id: 20190208142347.214815-1-sgarzare@redhat.com
11
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
27
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
12
---
28
---
13
hw/block/virtio-blk.c | 22 +++++++++-------------
29
block/linux-aio.c | 9 ++++++++-
14
1 file changed, 9 insertions(+), 13 deletions(-)
30
1 file changed, 8 insertions(+), 1 deletion(-)
15
31
16
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
32
diff --git a/block/linux-aio.c b/block/linux-aio.c
17
index XXXXXXX..XXXXXXX 100644
33
index XXXXXXX..XXXXXXX 100644
18
--- a/hw/block/virtio-blk.c
34
--- a/block/linux-aio.c
19
+++ b/hw/block/virtio-blk.c
35
+++ b/block/linux-aio.c
20
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
36
@@ -XXX,XX +XXX,XX @@
21
static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
37
*/
22
bool is_read)
38
#define MAX_EVENTS 1024
23
{
39
24
- BlockErrorAction action = blk_get_error_action(req->dev->blk,
40
+/* Maximum number of requests in a batch. (default value) */
25
- is_read, error);
41
+#define DEFAULT_MAX_BATCH 32
26
VirtIOBlock *s = req->dev;
42
+
27
+ BlockErrorAction action = blk_get_error_action(s->blk, is_read, error);
43
struct qemu_laiocb {
28
44
Coroutine *co;
29
if (action == BLOCK_ERROR_ACTION_STOP) {
45
LinuxAioState *ctx;
30
/* Break the link as the next request is going to be parsed from the
46
@@ -XXX,XX +XXX,XX @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
31
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_flush_complete(void *opaque, int ret)
47
LinuxAioState *s = laiocb->ctx;
48
struct iocb *iocbs = &laiocb->iocb;
49
QEMUIOVector *qiov = laiocb->qiov;
50
+ int64_t max_batch = s->aio_context->aio_max_batch ?: DEFAULT_MAX_BATCH;
51
+
52
+ /* limit the batch with the number of available events */
53
+ max_batch = MIN_NON_ZERO(MAX_EVENTS - s->io_q.in_flight, max_batch);
54
55
switch (type) {
56
case QEMU_AIO_WRITE:
57
@@ -XXX,XX +XXX,XX @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
58
s->io_q.in_queue++;
59
if (!s->io_q.blocked &&
60
(!s->io_q.plugged ||
61
- s->io_q.in_flight + s->io_q.in_queue >= MAX_EVENTS)) {
62
+ s->io_q.in_queue >= max_batch)) {
63
ioq_submit(s);
32
}
64
}
33
65
34
virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
35
- block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
36
+ block_acct_done(blk_get_stats(s->blk), &req->acct);
37
virtio_blk_free_request(req);
38
39
out:
40
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
41
- sizeof(struct virtio_blk_inhdr);
42
iov_discard_back(in_iov, &in_num, sizeof(struct virtio_blk_inhdr));
43
44
- type = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type);
45
+ type = virtio_ldl_p(vdev, &req->out.type);
46
47
/* VIRTIO_BLK_T_OUT defines the command direction. VIRTIO_BLK_T_BARRIER
48
* is an optional flag. Although a guest should not send this flag if
49
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
50
case VIRTIO_BLK_T_IN:
51
{
52
bool is_write = type & VIRTIO_BLK_T_OUT;
53
- req->sector_num = virtio_ldq_p(VIRTIO_DEVICE(req->dev),
54
- &req->out.sector);
55
+ req->sector_num = virtio_ldq_p(vdev, &req->out.sector);
56
57
if (is_write) {
58
qemu_iovec_init_external(&req->qiov, out_iov, out_num);
59
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
60
req->qiov.size / BDRV_SECTOR_SIZE);
61
}
62
63
- if (!virtio_blk_sect_range_ok(req->dev, req->sector_num,
64
- req->qiov.size)) {
65
+ if (!virtio_blk_sect_range_ok(s, req->sector_num, req->qiov.size)) {
66
virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
67
- block_acct_invalid(blk_get_stats(req->dev->blk),
68
+ block_acct_invalid(blk_get_stats(s->blk),
69
is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
70
virtio_blk_free_request(req);
71
return 0;
72
}
73
74
- block_acct_start(blk_get_stats(req->dev->blk),
75
- &req->acct, req->qiov.size,
76
+ block_acct_start(blk_get_stats(s->blk), &req->acct, req->qiov.size,
77
is_write ? BLOCK_ACCT_WRITE : BLOCK_ACCT_READ);
78
79
/* merge would exceed maximum number of requests or IO direction
80
* changes */
81
if (mrb->num_reqs > 0 && (mrb->num_reqs == VIRTIO_BLK_MAX_MERGE_REQS ||
82
is_write != mrb->is_write ||
83
- !req->dev->conf.request_merging)) {
84
- virtio_blk_submit_multireq(req->dev->blk, mrb);
85
+ !s->conf.request_merging)) {
86
+ virtio_blk_submit_multireq(s->blk, mrb);
87
}
88
89
assert(mrb->num_reqs < VIRTIO_BLK_MAX_MERGE_REQS);
90
--
66
--
91
2.20.1
67
2.31.1
92
68
93
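
To recap the linux-aio hunk above, the submission path now behaves roughly
as in this simplified sketch (not the literal code):

    /* 0 means "use the engine default" (DEFAULT_MAX_BATCH == 32) */
    int64_t max_batch = s->aio_context->aio_max_batch ?: DEFAULT_MAX_BATCH;

    /* never queue more requests than there are free completion slots */
    max_batch = MIN_NON_ZERO(MAX_EVENTS - s->io_q.in_flight, max_batch);

    s->io_q.in_queue++;
    if (!s->io_q.blocked &&
        (!s->io_q.plugged || s->io_q.in_queue >= max_batch)) {
        ioq_submit(s);    /* flush the queued requests with io_submit(2) */
    }
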
Deleted patch
1
From: Stefano Garzarella <sgarzare@redhat.com>
2
1
3
We add the acct_failed param so that virtio_blk_handle_rw_error() can
also be used when calling block_acct_failed() is not required (e.g. when
a discard operation fails).
6
7
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
8
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
9
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
10
Acked-by: Pankaj Gupta <pagupta@redhat.com>
11
Message-id: 20190208134950.187665-2-sgarzare@redhat.com
12
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
13
---
14
hw/block/virtio-blk.c | 10 ++++++----
15
1 file changed, 6 insertions(+), 4 deletions(-)
16
17
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
18
index XXXXXXX..XXXXXXX 100644
19
--- a/hw/block/virtio-blk.c
20
+++ b/hw/block/virtio-blk.c
21
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
22
}
23
24
static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
25
- bool is_read)
26
+ bool is_read, bool acct_failed)
27
{
28
VirtIOBlock *s = req->dev;
29
BlockErrorAction action = blk_get_error_action(s->blk, is_read, error);
30
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_rw_error(VirtIOBlockReq *req, int error,
31
s->rq = req;
32
} else if (action == BLOCK_ERROR_ACTION_REPORT) {
33
virtio_blk_req_complete(req, VIRTIO_BLK_S_IOERR);
34
- block_acct_failed(blk_get_stats(s->blk), &req->acct);
35
+ if (acct_failed) {
36
+ block_acct_failed(blk_get_stats(s->blk), &req->acct);
37
+ }
38
virtio_blk_free_request(req);
39
}
40
41
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_rw_complete(void *opaque, int ret)
42
* the memory until the request is completed (which will
43
* happen on the other side of the migration).
44
*/
45
- if (virtio_blk_handle_rw_error(req, -ret, is_read)) {
46
+ if (virtio_blk_handle_rw_error(req, -ret, is_read, true)) {
47
continue;
48
}
49
}
50
@@ -XXX,XX +XXX,XX @@ static void virtio_blk_flush_complete(void *opaque, int ret)
51
52
aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
53
if (ret) {
54
- if (virtio_blk_handle_rw_error(req, -ret, 0)) {
55
+ if (virtio_blk_handle_rw_error(req, -ret, 0, true)) {
56
goto out;
57
}
58
}
59
--
60
2.20.1
61
62
Deleted patch
1
From: Stefano Garzarella <sgarzare@redhat.com>
2
1
3
Since the configurable features for virtio-blk are growing, this patch
adds a host_features field to struct VirtIOBlock (as in virtio-net).
This way we can avoid adding new fields for new properties and we can
set the VIRTIO_BLK_F* flags directly in host_features.

We update the "config-wce" and "scsi" property definitions to use the
new host_features field without changing the behaviour.
10
11
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
12
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
13
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
14
Acked-by: Pankaj Gupta <pagupta@redhat.com>
15
Message-id: 20190208134950.187665-3-sgarzare@redhat.com
16
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
17
---
18
include/hw/virtio/virtio-blk.h | 3 +--
19
hw/block/virtio-blk.c | 16 +++++++++-------
20
2 files changed, 10 insertions(+), 9 deletions(-)
21
22
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
23
index XXXXXXX..XXXXXXX 100644
24
--- a/include/hw/virtio/virtio-blk.h
25
+++ b/include/hw/virtio/virtio-blk.h
26
@@ -XXX,XX +XXX,XX @@ struct VirtIOBlkConf
27
BlockConf conf;
28
IOThread *iothread;
29
char *serial;
30
- uint32_t scsi;
31
- uint32_t config_wce;
32
uint32_t request_merging;
33
uint16_t num_queues;
34
uint16_t queue_size;
35
@@ -XXX,XX +XXX,XX @@ typedef struct VirtIOBlock {
36
bool dataplane_disabled;
37
bool dataplane_started;
38
struct VirtIOBlockDataPlane *dataplane;
39
+ uint64_t host_features;
40
} VirtIOBlock;
41
42
typedef struct VirtIOBlockReq {
43
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
44
index XXXXXXX..XXXXXXX 100644
45
--- a/hw/block/virtio-blk.c
46
+++ b/hw/block/virtio-blk.c
47
@@ -XXX,XX +XXX,XX @@ static int virtio_blk_handle_scsi_req(VirtIOBlockReq *req)
48
*/
49
scsi = (void *)elem->in_sg[elem->in_num - 2].iov_base;
50
51
- if (!blk->conf.scsi) {
52
+ if (!virtio_has_feature(blk->host_features, VIRTIO_BLK_F_SCSI)) {
53
status = VIRTIO_BLK_S_UNSUPP;
54
goto fail;
55
}
56
@@ -XXX,XX +XXX,XX @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
57
{
58
VirtIOBlock *s = VIRTIO_BLK(vdev);
59
60
+ /* Firstly sync all virtio-blk possible supported features */
61
+ features |= s->host_features;
62
+
63
virtio_add_feature(&features, VIRTIO_BLK_F_SEG_MAX);
64
virtio_add_feature(&features, VIRTIO_BLK_F_GEOMETRY);
65
virtio_add_feature(&features, VIRTIO_BLK_F_TOPOLOGY);
66
virtio_add_feature(&features, VIRTIO_BLK_F_BLK_SIZE);
67
if (virtio_has_feature(features, VIRTIO_F_VERSION_1)) {
68
- if (s->conf.scsi) {
69
+ if (virtio_has_feature(s->host_features, VIRTIO_BLK_F_SCSI)) {
70
error_setg(errp, "Please set scsi=off for virtio-blk devices in order to use virtio 1.0");
71
return 0;
72
}
73
@@ -XXX,XX +XXX,XX @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
74
virtio_add_feature(&features, VIRTIO_BLK_F_SCSI);
75
}
76
77
- if (s->conf.config_wce) {
78
- virtio_add_feature(&features, VIRTIO_BLK_F_CONFIG_WCE);
79
- }
80
if (blk_enable_write_cache(s->blk)) {
81
virtio_add_feature(&features, VIRTIO_BLK_F_WCE);
82
}
83
@@ -XXX,XX +XXX,XX @@ static Property virtio_blk_properties[] = {
84
DEFINE_BLOCK_ERROR_PROPERTIES(VirtIOBlock, conf.conf),
85
DEFINE_BLOCK_CHS_PROPERTIES(VirtIOBlock, conf.conf),
86
DEFINE_PROP_STRING("serial", VirtIOBlock, conf.serial),
87
- DEFINE_PROP_BIT("config-wce", VirtIOBlock, conf.config_wce, 0, true),
88
+ DEFINE_PROP_BIT64("config-wce", VirtIOBlock, host_features,
89
+ VIRTIO_BLK_F_CONFIG_WCE, true),
90
#ifdef __linux__
91
- DEFINE_PROP_BIT("scsi", VirtIOBlock, conf.scsi, 0, false),
92
+ DEFINE_PROP_BIT64("scsi", VirtIOBlock, host_features,
93
+ VIRTIO_BLK_F_SCSI, false),
94
#endif
95
DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
96
true),
97
--
98
2.20.1
99
100
Deleted patch
1
From: Stefano Garzarella <sgarzare@redhat.com>
2
1
3
In order to avoid migration issues, we enable the DISCARD and
WRITE_ZEROES features only for machine types >= 4.0.

As discussed with Michael S. Tsirkin and Stefan Hajnoczi on the
list [1], the DISCARD operation should not have security implications
(e.g. page cache attacks), so we can enable it by default.
9
10
[1] https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg00504.html
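
In practice the defaults therefore depend on the machine type; an
illustrative comparison (the machine names are only examples):

  # 4.0+ machine types: discard and write-zeroes default to on
  -machine pc-q35-4.0 -device virtio-blk-pci,drive=drive0

  # 3.1 and older machine types keep the old guest ABI: both default to off
  -machine pc-q35-3.1 -device virtio-blk-pci,drive=drive0
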
11
12
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
13
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
14
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
15
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
16
Acked-by: Pankaj Gupta <pagupta@redhat.com>
17
Message-id: 20190208134950.187665-4-sgarzare@redhat.com
18
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
19
---
20
hw/block/virtio-blk.c | 4 ++++
21
hw/core/machine.c | 2 ++
22
2 files changed, 6 insertions(+)
23
24
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
25
index XXXXXXX..XXXXXXX 100644
26
--- a/hw/block/virtio-blk.c
27
+++ b/hw/block/virtio-blk.c
28
@@ -XXX,XX +XXX,XX @@ static Property virtio_blk_properties[] = {
29
DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 128),
30
DEFINE_PROP_LINK("iothread", VirtIOBlock, conf.iothread, TYPE_IOTHREAD,
31
IOThread *),
32
+ DEFINE_PROP_BIT64("discard", VirtIOBlock, host_features,
33
+ VIRTIO_BLK_F_DISCARD, true),
34
+ DEFINE_PROP_BIT64("write-zeroes", VirtIOBlock, host_features,
35
+ VIRTIO_BLK_F_WRITE_ZEROES, true),
36
DEFINE_PROP_END_OF_LIST(),
37
};
38
39
diff --git a/hw/core/machine.c b/hw/core/machine.c
40
index XXXXXXX..XXXXXXX 100644
41
--- a/hw/core/machine.c
42
+++ b/hw/core/machine.c
43
@@ -XXX,XX +XXX,XX @@ GlobalProperty hw_compat_3_1[] = {
44
{ "usb-kbd", "serial", "42" },
45
{ "usb-mouse", "serial", "42" },
46
{ "usb-kbd", "serial", "42" },
47
+ { "virtio-blk-device", "discard", "false" },
48
+ { "virtio-blk-device", "write-zeroes", "false" },
49
};
50
const size_t hw_compat_3_1_len = G_N_ELEMENTS(hw_compat_3_1);
51
52
--
53
2.20.1
54
55
Deleted patch
1
From: Stefano Garzarella <sgarzare@redhat.com>
2
1
3
The size of the data in virtio_blk_request() must be a multiple
4
of 512 bytes for IN and OUT requests, or a multiple of the size
5
of struct virtio_blk_discard_write_zeroes for DISCARD and
6
WRITE_ZEROES requests.
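
For reference, the segment header used by DISCARD and WRITE_ZEROES (as
defined in the Linux/VIRTIO headers) is:

    struct virtio_blk_discard_write_zeroes {
        __le64 sector;        /*  8 bytes */
        __le32 num_sectors;   /*  4 bytes */
        __le32 flags;         /*  4 bytes */
    };                        /* 16 bytes total */

so a single-segment DISCARD/WRITE_ZEROES request passes data_size == 16,
while IN and OUT requests keep the 512-byte granularity.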
7
8
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
9
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
10
Reviewed-by: Thomas Huth <thuth@redhat.com>
11
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
12
Acked-by: Pankaj Gupta <pagupta@redhat.com>
13
Message-id: 20190208134950.187665-6-sgarzare@redhat.com
14
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
15
---
16
tests/virtio-blk-test.c | 15 ++++++++++++++-
17
1 file changed, 14 insertions(+), 1 deletion(-)
18
19
diff --git a/tests/virtio-blk-test.c b/tests/virtio-blk-test.c
20
index XXXXXXX..XXXXXXX 100644
21
--- a/tests/virtio-blk-test.c
22
+++ b/tests/virtio-blk-test.c
23
@@ -XXX,XX +XXX,XX @@ static uint64_t virtio_blk_request(QGuestAllocator *alloc, QVirtioDevice *d,
24
uint64_t addr;
25
uint8_t status = 0xFF;
26
27
- g_assert_cmpuint(data_size % 512, ==, 0);
28
+ switch (req->type) {
29
+ case VIRTIO_BLK_T_IN:
30
+ case VIRTIO_BLK_T_OUT:
31
+ g_assert_cmpuint(data_size % 512, ==, 0);
32
+ break;
33
+ case VIRTIO_BLK_T_DISCARD:
34
+ case VIRTIO_BLK_T_WRITE_ZEROES:
35
+ g_assert_cmpuint(data_size %
36
+ sizeof(struct virtio_blk_discard_write_zeroes), ==, 0);
37
+ break;
38
+ default:
39
+ g_assert_cmpuint(data_size, ==, 0);
40
+ }
41
+
42
addr = guest_alloc(alloc, sizeof(*req) + data_size);
43
44
virtio_blk_fix_request(d, req);
45
--
46
2.20.1
47
48