Series comparison

-[Qemu-devel] [PULL 0/2] Block patches
+[PULL for-6.1 0/3] Block patches
-The following changes since commit 33f18cf7dca7741d3647d514040904ce83edd73d:
+The following changes since commit 3521ade3510eb5cefb2e27a101667f25dad89935:
-  Merge remote-tracking branch 'remotes/kraxel/tags/audio-20190821-pull-request' into staging (2019-08-21 15:18:50 +0100)
+  Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-07-29' into staging (2021-07-29 13:17:20 +0100)
 are available in the Git repository at:
-  https://github.com/stefanha/qemu.git tags/block-pull-request
+  https://gitlab.com/stefanha/qemu.git tags/block-pull-request
-for you to fetch changes up to 5d4c1ed3d46d7e2010b389fe5f3376f605182ab0:
+for you to fetch changes up to cc8eecd7f105a1dff5876adeb238a14696061a4a:
-  vhost-user-scsi: prevent using uninitialized vqs (2019-08-22 16:52:23 +0100)
+  MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver (2021-07-29 17:17:34 +0100)
 ----------------------------------------------------------------
 Pull request
+The main fix here is for io_uring. Spurious -EAGAIN errors can happen and the
+request needs to be resubmitted.
+The MAINTAINERS changes carry no risk and we might as well include them in QEMU
+6.1.
 ----------------------------------------------------------------
-Raphael Norwitz (1):
+Fabian Ebner (1):
-  vhost-user-scsi: prevent using uninitialized vqs
+  block/io_uring: resubmit when result is -EAGAIN
-Stefan Hajnoczi (1):
+Philippe Mathieu-Daudé (1):
-  util/async: hold AioContext ref to prevent use-after-free
+  MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver
- hw/scsi/vhost-user-scsi.c | 2 +-
+Stefano Garzarella (1):
- util/async.c              | 8 ++++++++
+  MAINTAINERS: add Stefano Garzarella as io_uring reviewer
- 2 files changed, 9 insertions(+), 1 deletion(-)
  MAINTAINERS      |  2 ++
  block/io_uring.c | 16 +++++++++++++++-
  2 files changed, 17 insertions(+), 1 deletion(-)
 --
-2.21.0
+2.31.1

-New patch
+[PULL for-6.1 1/3] MAINTAINERS: add Stefano Garzarella as io_uring reviewer
+From: Stefano Garzarella <sgarzare@redhat.com>
+I've been working with io_uring for a while so I'd like to help
+with reviews.
+Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
+Message-Id: <20210728131515.131045-1-sgarzare@redhat.com>
+Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
+---
+ MAINTAINERS | 1 +
+ 1 file changed, 1 insertion(+)
+diff --git a/MAINTAINERS b/MAINTAINERS
+index XXXXXXX..XXXXXXX 100644
+--- a/MAINTAINERS
++++ b/MAINTAINERS
+@@ -XXX,XX +XXX,XX @@ Linux io_uring
+ M: Aarushi Mehta <mehta.aaru20@gmail.com>
+ M: Julia Suvorova <jusual@redhat.com>
+ M: Stefan Hajnoczi <stefanha@redhat.com>
++R: Stefano Garzarella <sgarzare@redhat.com>
+ L: qemu-block@nongnu.org
+ S: Maintained
+ F: block/io_uring.c
+--
+2.31.1

-[Qemu-devel] [PULL 2/2] vhost-user-scsi: prevent using uninitialized vqs
+[PULL for-6.1 2/3] block/io_uring: resubmit when result is -EAGAIN
-From: Raphael Norwitz <raphael.norwitz@nutanix.com>
+From: Fabian Ebner <f.ebner@proxmox.com>
-Of the 3 virtqueues, seabios only sets cmd, leaving ctrl
+Linux SCSI can throw spurious -EAGAIN in some corner cases in its
-and event without a physical address. This can cause
+completion path, which will end up being the result in the completed
-vhost_verify_ring_part_mapping to return ENOMEM, causing
+io_uring request.
 the following logs:
-qemu-system-x86_64: Unable to map available ring for ring 0
+Resubmitting such requests should allow block jobs to complete, even
-qemu-system-x86_64: Verify ring failure on region 0
+if such spurious errors are encountered.
-The qemu commit e6cc11d64fc998c11a4dfcde8fda3fc33a74d844
+Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
-has already resolved the issue for vhost scsi devices but
+Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
-the fix was never applied to vhost-user scsi devices.
+Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
+Message-id: 20210729091029.65369-1-f.ebner@proxmox.com
 Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
 Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
 Message-id: 1560299717-177734-1-git-send-email-raphael.norwitz@nutanix.com
 Message-Id: <1560299717-177734-1-git-send-email-raphael.norwitz@nutanix.com>
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 ---
- hw/scsi/vhost-user-scsi.c | 2 +-
+ block/io_uring.c | 16 +++++++++++++++-
- 1 file changed, 1 insertion(+), 1 deletion(-)
+ 1 file changed, 15 insertions(+), 1 deletion(-)
-diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
+diff --git a/block/io_uring.c b/block/io_uring.c
 index XXXXXXX..XXXXXXX 100644
---- a/hw/scsi/vhost-user-scsi.c
+--- a/block/io_uring.c
-+++ b/hw/scsi/vhost-user-scsi.c
++++ b/block/io_uring.c
-@@ -XXX,XX +XXX,XX @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
+@@ -XXX,XX +XXX,XX @@ static void luring_process_completions(LuringState *s)
-     }
+         total_bytes = ret + luringcb->total_read;
-     vsc->dev.nvqs = 2 + vs->conf.num_queues;
+         if (ret < 0) {
--    vsc->dev.vqs = g_new(struct vhost_virtqueue, vsc->dev.nvqs);
+-            if (ret == -EINTR) {
-+    vsc->dev.vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs);
++            /*
-     vsc->dev.vq_index = 0;
++             * Only writev/readv/fsync requests on regular files or host block
-     vsc->dev.backend_features = 0;
++             * devices are submitted. Therefore -EAGAIN is not expected but it's
-     vqs = vsc->dev.vqs;
++             * known to happen sometimes with Linux SCSI. Submit again and hope
 +             * the request completes successfully.
 +             *
 +             * For more information, see:
 +             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
 +             *
 +             * If the code is changed to submit other types of requests in the
 +             * future, then this workaround may need to be extended to deal with
 +             * genuine -EAGAIN results that should not be resubmitted
 +             * immediately.
 +             */
 +            if (ret == -EINTR || ret == -EAGAIN) {
                  luring_resubmit(s, luringcb);
                  continue;
              }
 --
-2.21.0
+2.31.1

-[Qemu-devel] [PULL 1/2] util/async: hold AioContext ref to prevent use-after-free
+[PULL for-6.1 3/3] MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver
-The tests/test-bdrv-drain /bdrv-drain/iothread/drain test case does the
+From: Philippe Mathieu-Daudé <philmd@redhat.com>
 following:
-1. The preadv coroutine calls aio_bh_schedule_oneshot() and then yields.
+I'm interested in following the activity around the NVMe bdrv.
 2. The one-shot BH executes in another AioContext.  All it does is call
    aio_co_wakeup(preadv_co).
 3. The preadv coroutine is re-entered and returns.
-There is a race condition in aio_co_wake() where the preadv coroutine
+Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
-returns and the test case destroys the preadv IOThread.  aio_co_wake()
+Message-id: 20210728183340.2018313-1-philmd@redhat.com
 can still be running in the other AioContext and it performs an access
 to the freed IOThread AioContext.
 Here is the race in aio_co_schedule():
   QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                             co, co_scheduled_next);
   <-- race: co may execute before we invoke qemu_bh_schedule()!
   qemu_bh_schedule(ctx->co_schedule_bh);
 So if co causes ctx to be freed then we're in trouble.  Fix this problem
 by holding a reference to ctx.
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
 Message-id: 20190723190623.21537-1-stefanha@redhat.com
 Message-Id: <20190723190623.21537-1-stefanha@redhat.com>
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 ---
- util/async.c | 8 ++++++++
+ MAINTAINERS | 1 +
- 1 file changed, 8 insertions(+)
+ 1 file changed, 1 insertion(+)
-diff --git a/util/async.c b/util/async.c
+diff --git a/MAINTAINERS b/MAINTAINERS
 index XXXXXXX..XXXXXXX 100644
---- a/util/async.c
+--- a/MAINTAINERS
-+++ b/util/async.c
++++ b/MAINTAINERS
-@@ -XXX,XX +XXX,XX @@ void aio_co_schedule(AioContext *ctx, Coroutine *co)
+@@ -XXX,XX +XXX,XX @@ F: block/null.c
-         abort();
+ NVMe Block Driver
-     }
+ M: Stefan Hajnoczi <stefanha@redhat.com>
+ R: Fam Zheng <fam@euphon.net>
-+    /* The coroutine might run and release the last ctx reference before we
++R: Philippe Mathieu-Daudé <philmd@redhat.com>
-+     * invoke qemu_bh_schedule().  Take a reference to keep ctx alive until
+ L: qemu-block@nongnu.org
-+     * we're done.
+ S: Supported
-+     */
+ F: block/nvme*
 +    aio_context_ref(ctx);
 +
      QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                                co, co_scheduled_next);
      qemu_bh_schedule(ctx->co_schedule_bh);
 +
 +    aio_context_unref(ctx);
  }
  void aio_co_wake(struct Coroutine *co)
 --
-2.21.0
+2.31.1

The tests/test-bdrv-drain /bdrv-drain/iothread/drain test case does the
following:

1. The preadv coroutine calls aio_bh_schedule_oneshot() and then yields.
2. The one-shot BH executes in another AioContext.  All it does is call
   aio_co_wake(preadv_co).
3. The preadv coroutine is re-entered and returns.

There is a race condition in aio_co_wake() where the preadv coroutine
returns and the test case destroys the preadv IOThread.  aio_co_wake()
can still be running in the other AioContext and it performs an access
to the freed IOThread AioContext.

Here is the race in aio_co_schedule():

  QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                            co, co_scheduled_next);
  <-- race: co may execute before we invoke qemu_bh_schedule()!
  qemu_bh_schedule(ctx->co_schedule_bh);

So if co causes ctx to be freed then we're in trouble.  Fix this problem
by holding a reference to ctx.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20190723190623.21537-1-stefanha@redhat.com
Message-Id: <20190723190623.21537-1-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/async.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/util/async.c b/util/async.c
index XXXXXXX..XXXXXXX 100644
--- a/util/async.c
+++ b/util/async.c
@@ -XXX,XX +XXX,XX @@ void aio_co_schedule(AioContext *ctx, Coroutine *co)
         abort();
     }
 
+    /* The coroutine might run and release the last ctx reference before we
+     * invoke qemu_bh_schedule().  Take a reference to keep ctx alive until
+     * we're done.
+     */
+    aio_context_ref(ctx);
+
     QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                               co, co_scheduled_next);
     qemu_bh_schedule(ctx->co_schedule_bh);
+
+    aio_context_unref(ctx);
 }
 
 void aio_co_wake(struct Coroutine *co)
-- 
2.21.0
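This fix follows a general rule. Before publishing an object to code that might drop the last reference, take your own reference. Release it only after your final access. The following single-threaded C11 sketch replays the worst-case interleaving deterministically, so no real race is involved.

`Ctx`, `ctx_new()`, `ctx_ref()`, `ctx_unref()`, `ctx_schedule()` and `racing_coroutine()` are hypothetical stand-ins, not QEMU's API. In QEMU the corresponding calls are `aio_context_ref()` and `aio_context_unref()`, as used in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for AioContext: a reference count plus a flag. */
typedef struct Ctx {
    atomic_int refcnt;
    int bh_scheduled;
} Ctx;

/* Counts how many contexts were freed, so the sketch can be checked. */
static int ctx_freed;

Ctx *ctx_new(void)
{
    Ctx *ctx = calloc(1, sizeof(*ctx));
    if (!ctx) {
        abort();
    }
    atomic_init(&ctx->refcnt, 1); /* the creator owns one reference */
    return ctx;
}

void ctx_ref(Ctx *ctx)
{
    atomic_fetch_add(&ctx->refcnt, 1);
}

void ctx_unref(Ctx *ctx)
{
    /* atomic_fetch_sub() returns the value before the decrement. */
    if (atomic_fetch_sub(&ctx->refcnt, 1) == 1) {
        ctx_freed++;
        free(ctx);
    }
}

/* Hypothetical: the coroutine runs as soon as it is published and drops
 * the owner's reference, like the test destroying the IOThread. */
static void racing_coroutine(Ctx *ctx)
{
    ctx_unref(ctx);
}

void ctx_schedule(Ctx *ctx)
{
    ctx_ref(ctx);           /* keep ctx alive until we are done with it */
    racing_coroutine(ctx);  /* worst case: co runs before we continue */
    ctx->bh_scheduled = 1;  /* still valid because we hold our own ref */
    ctx_unref(ctx);         /* may free ctx; no accesses after this */
}
```

Suppose the `ctx_ref()`/`ctx_unref()` pair were removed from `ctx_schedule()` and the caller held only one reference. The write to `bh_scheduled` would then touch freed memory. That is the same shape as the `qemu_bh_schedule(ctx->co_schedule_bh)` access after `QSLIST_INSERT_HEAD_ATOMIC()`.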

From: Raphael Norwitz <raphael.norwitz@nutanix.com>

Of the 3 virtqueues, seabios only sets cmd, leaving ctrl
and event without a physical address. This can cause
vhost_verify_ring_part_mapping to return ENOMEM, causing
the following logs:

qemu-system-x86_64: Unable to map available ring for ring 0
qemu-system-x86_64: Verify ring failure on region 0

The qemu commit e6cc11d64fc998c11a4dfcde8fda3fc33a74d844
has already resolved the issue for vhost scsi devices but
the fix was never applied to vhost-user scsi devices.

Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1560299717-177734-1-git-send-email-raphael.norwitz@nutanix.com
Message-Id: <1560299717-177734-1-git-send-email-raphael.norwitz@nutanix.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/scsi/vhost-user-scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index XXXXXXX..XXXXXXX 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -XXX,XX +XXX,XX @@ static void vhost_user_scsi_realize(DeviceState *dev, Error **errp)
     }
 
     vsc->dev.nvqs = 2 + vs->conf.num_queues;
-    vsc->dev.vqs = g_new(struct vhost_virtqueue, vsc->dev.nvqs);
+    vsc->dev.vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs);
     vsc->dev.vq_index = 0;
     vsc->dev.backend_features = 0;
     vqs = vsc->dev.vqs;
-- 
2.21.0
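The one-line change from `g_new()` to `g_new0()` matters because `g_new0()` zero-fills the allocation, much like `calloc()`. Rings the guest never configured then read as address 0 instead of leftover heap contents.

The sketch below assumes, as the commit message implies, that the verification step treats a zero ring address as "not set up" and skips it. `struct vq`, `vqs_alloc()` and `vqs_count_mappable()` are hypothetical reductions, not QEMU's `struct vhost_virtqueue` or `vhost_verify_ring_part_mapping()`. It uses plain `calloc()` instead of GLib so it stays self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for struct vhost_virtqueue. */
struct vq {
    uint64_t desc_phys;   /* 0 means the guest never set this ring up */
    uint64_t avail_phys;
};

/* Zero-filled allocation, analogous to g_new0(struct vq, n). */
struct vq *vqs_alloc(size_t n)
{
    struct vq *vqs = calloc(n, sizeof(*vqs));
    if (!vqs) {
        abort();
    }
    return vqs;
}

/* Hypothetical verification pass: returns how many rings it would try to
 * map. Rings with a zero address are skipped, which only works if unset
 * fields are reliably zero rather than uninitialized heap memory. */
size_t vqs_count_mappable(const struct vq *vqs, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (vqs[i].desc_phys == 0) {
            continue;
        }
        count++;
    }
    return count;
}
```

With `malloc()` in place of `calloc()`, the ctrl and event entries could hold arbitrary values. The check would then try to map bogus rings, which matches the "Unable to map available ring" failure quoted above.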

The following changes since commit 3521ade3510eb5cefb2e27a101667f25dad89935:

  Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-07-29' into staging (2021-07-29 13:17:20 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to cc8eecd7f105a1dff5876adeb238a14696061a4a:

  MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver (2021-07-29 17:17:34 +0100)

----------------------------------------------------------------
Pull request

The main fix here is for io_uring. Spurious -EAGAIN errors can happen and the
request needs to be resubmitted.

The MAINTAINERS changes carry no risk and we might as well include them in QEMU
6.1.

----------------------------------------------------------------

Fabian Ebner (1):
  block/io_uring: resubmit when result is -EAGAIN

Philippe Mathieu-Daudé (1):
  MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver

Stefano Garzarella (1):
  MAINTAINERS: add Stefano Garzarella as io_uring reviewer

 MAINTAINERS      |  2 ++
 block/io_uring.c | 16 +++++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

-- 
2.31.1

From: Fabian Ebner <f.ebner@proxmox.com>

Linux SCSI can throw spurious -EAGAIN in some corner cases in its
completion path, which will end up being the result in the completed
io_uring request.

Resubmitting such requests should allow block jobs to complete, even
if such spurious errors are encountered.

Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Message-id: 20210729091029.65369-1-f.ebner@proxmox.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io_uring.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -XXX,XX +XXX,XX @@ static void luring_process_completions(LuringState *s)
         total_bytes = ret + luringcb->total_read;
 
         if (ret < 0) {
-            if (ret == -EINTR) {
+            /*
+             * Only writev/readv/fsync requests on regular files or host block
+             * devices are submitted. Therefore -EAGAIN is not expected but it's
+             * known to happen sometimes with Linux SCSI. Submit again and hope
+             * the request completes successfully.
+             *
+             * For more information, see:
+             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
+             *
+             * If the code is changed to submit other types of requests in the
+             * future, then this workaround may need to be extended to deal with
+             * genuine -EAGAIN results that should not be resubmitted
+             * immediately.
+             */
+            if (ret == -EINTR || ret == -EAGAIN) {
                 luring_resubmit(s, luringcb);
                 continue;
             }
-- 
2.31.1
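The patched condition resubmits on `-EINTR` or `-EAGAIN` and passes every other result (success or a genuine error) to the caller. A minimal synchronous sketch of that decision follows.

`Req`, `submit_and_wait()` and `complete_request()` are hypothetical. A canned list of results stands in for the kernel's completion values. QEMU's real path is asynchronous: `luring_resubmit()` re-queues the request and completion processing continues. Like the patched condition, the sketch retries without a bound, which the patch comment flags as something to revisit if other request types are ever submitted.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical request: a fake completion source returns queued results. */
typedef struct Req {
    const int *results;  /* results the fake "kernel" reports, in order */
    int next;            /* index of the next result to report */
    int submissions;     /* how many times the request was submitted */
} Req;

/* Hypothetical stand-in for submitting a request and reaping its result. */
static int submit_and_wait(Req *req)
{
    req->submissions++;
    return req->results[req->next++];
}

/* Mirrors the patched check: -EINTR and -EAGAIN are treated as spurious
 * and resubmitted; any other result is returned to the caller. */
int complete_request(Req *req)
{
    for (;;) {
        int ret = submit_and_wait(req);
        if (ret == -EINTR || ret == -EAGAIN) {
            continue; /* spurious failure: submit the same request again */
        }
        return ret;
    }
}
```

Treating `-EAGAIN` as retryable is only safe because, as the patch comment says, the driver submits only readv/writev/fsync on regular files or host block devices. For those requests a genuine "try again later" is not expected.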