Series comparison

-[PULL for-6.1 0/3] Block patches
+[Qemu-devel] [PULL for-2.11-rc1 v2 0/2] Block patches
-The following changes since commit 3521ade3510eb5cefb2e27a101667f25dad89935:
+The following changes since commit b0fbe46ad82982b289a44ee2495b59b0bad8a842:
-  Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-07-29' into staging (2021-07-29 13:17:20 +0100)
+  Update version for v2.11.0-rc0 release (2017-11-07 16:05:28 +0000)
-are available in the Git repository at:
+are available in the git repository at:
-  https://gitlab.com/stefanha/qemu.git tags/block-pull-request
+  git://github.com/stefanha/qemu.git tags/block-pull-request
-for you to fetch changes up to cc8eecd7f105a1dff5876adeb238a14696061a4a:
+for you to fetch changes up to ef6dada8b44e1e7c4bec5c1115903af9af415b50:
-  MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver (2021-07-29 17:17:34 +0100)
+  util/async: use atomic_mb_set in qemu_bh_cancel (2017-11-08 19:09:15 +0000)
 ----------------------------------------------------------------
 Pull request
-The main fix here is for io_uring. Spurious -EAGAIN errors can happen and the
+v2:
-request needs to be resubmitted.
+ * v1 emails 2/3 and 3/3 weren't sent due to an email failure
+ * Included Sergio's updated wording in the commit description
 The MAINTAINERS changes carry no risk and we might as well include them in QEMU
 6.1.
 ----------------------------------------------------------------
-Fabian Ebner (1):
+Sergio Lopez (1):
-  block/io_uring: resubmit when result is -EAGAIN
+  util/async: use atomic_mb_set in qemu_bh_cancel
-Philippe Mathieu-Daudé (1):
+Stefan Hajnoczi (1):
-  MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver
+  tests-aio-multithread: fix /aio/multi/schedule race condition
-Stefano Garzarella (1):
+ tests/test-aio-multithread.c | 5 ++---
-  MAINTAINERS: add Stefano Garzarella as io_uring reviewer
+ util/async.c                 | 2 +-
+ 2 files changed, 3 insertions(+), 4 deletions(-)
  MAINTAINERS      |  2 ++
  block/io_uring.c | 16 +++++++++++++++-
  2 files changed, 17 insertions(+), 1 deletion(-)
 --
-2.31.1
+2.13.6

-[PULL for-6.1 1/3] MAINTAINERS: add Stefano Garzarella as io_uring reviewer
+Deleted patch
-From: Stefano Garzarella <sgarzare@redhat.com>
-I've been working with io_uring for a while so I'd like to help
-with reviews.
-Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
-Message-Id: <20210728131515.131045-1-sgarzare@redhat.com>
-Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
----
- MAINTAINERS | 1 +
- 1 file changed, 1 insertion(+)
-diff --git a/MAINTAINERS b/MAINTAINERS
-index XXXXXXX..XXXXXXX 100644
---- a/MAINTAINERS
-+++ b/MAINTAINERS
-@@ -XXX,XX +XXX,XX @@ Linux io_uring
- M: Aarushi Mehta <mehta.aaru20@gmail.com>
- M: Julia Suvorova <jusual@redhat.com>
- M: Stefan Hajnoczi <stefanha@redhat.com>
-+R: Stefano Garzarella <sgarzare@redhat.com>
- L: qemu-block@nongnu.org
- S: Maintained
- F: block/io_uring.c
---
-2.31.1

-[PULL for-6.1 3/3] MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver
+[Qemu-devel] [PULL for-2.11-rc1 v2 1/2] tests-aio-multithread: fix /aio/multi/schedule race condition
-From: Philippe Mathieu-Daudé <philmd@redhat.com>
+test_multi_co_schedule_entry() set to_schedule[id] in the final loop
 iteration before terminating the coroutine.  There is a race condition
 where the main thread attempts to enter the terminating or terminated
 coroutine when signalling coroutines to stop:
-I'm interested in following the activity around the NVMe bdrv.
+  atomic_mb_set(&now_stopping, true);
   for (i = 0; i < NUM_CONTEXTS; i++) {
       ctx_run(i, finish_cb, NULL);  <--- enters dead coroutine!
       to_schedule[i] = NULL;
   }
-Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
+Make sure only to set to_schedule[id] if this coroutine really needs to
-Message-id: 20210728183340.2018313-1-philmd@redhat.com
+be scheduled!
 Reported-by: "R.Nageswara Sastry" <nasastry@in.ibm.com>
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
 Message-id: 20171106190233.1175-1-stefanha@redhat.com
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 ---
- MAINTAINERS | 1 +
+ tests/test-aio-multithread.c | 5 ++---
- 1 file changed, 1 insertion(+)
+ 1 file changed, 2 insertions(+), 3 deletions(-)
-diff --git a/MAINTAINERS b/MAINTAINERS
+diff --git a/tests/test-aio-multithread.c b/tests/test-aio-multithread.c
 index XXXXXXX..XXXXXXX 100644
---- a/MAINTAINERS
+--- a/tests/test-aio-multithread.c
-+++ b/MAINTAINERS
++++ b/tests/test-aio-multithread.c
-@@ -XXX,XX +XXX,XX @@ F: block/null.c
+@@ -XXX,XX +XXX,XX @@ static void finish_cb(void *opaque)
- NVMe Block Driver
+ static coroutine_fn void test_multi_co_schedule_entry(void *opaque)
- M: Stefan Hajnoczi <stefanha@redhat.com>
+ {
- R: Fam Zheng <fam@euphon.net>
+     g_assert(to_schedule[id] == NULL);
-+R: Philippe Mathieu-Daudé <philmd@redhat.com>
+-    atomic_mb_set(&to_schedule[id], qemu_coroutine_self());
- L: qemu-block@nongnu.org
- S: Supported
+     while (!atomic_mb_read(&now_stopping)) {
- F: block/nvme*
+         int n;
          n = g_test_rand_int_range(0, NUM_CONTEXTS);
          schedule_next(n);
 +
 +        atomic_mb_set(&to_schedule[id], qemu_coroutine_self());
          qemu_coroutine_yield();
 -
          g_assert(to_schedule[id] == NULL);
 -        atomic_mb_set(&to_schedule[id], qemu_coroutine_self());
      }
  }
 --
-2.31.1
+2.13.6

-[PULL for-6.1 2/3] block/io_uring: resubmit when result is -EAGAIN
+[Qemu-devel] [PULL for-2.11-rc1 v2 2/2] util/async: use atomic_mb_set in qemu_bh_cancel
-From: Fabian Ebner <f.ebner@proxmox.com>
+From: Sergio Lopez <slp@redhat.com>
-Linux SCSI can throw spurious -EAGAIN in some corner cases in its
+Commit b7a745d added a qemu_bh_cancel call to the completion function
-completion path, which will end up being the result in the completed
+as an optimization to prevent it from unnecessarily rescheduling itself.
 io_uring request.
-Resubmitting such requests should allow block jobs to complete, even
+This completion function is scheduled from worker_thread, after setting
-if such spurious errors are encountered.
+the state of a ThreadPoolElement to THREAD_DONE.
-Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
+This was considered to be safe, as the completion function restarts the
-Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
+loop just after the call to qemu_bh_cancel. But, as this loop lacks a HW
-Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
+memory barrier, the read of req->state may actually happen _before_ the
-Message-id: 20210729091029.65369-1-f.ebner@proxmox.com
+call, seeing it still as THREAD_QUEUED, and ending the completion
 function without having processed a pending TPE linked at pool->head:
          worker thread             |            I/O thread
 ------------------------------------------------------------------------
                                    | speculatively read req->state
 req->state = THREAD_DONE;          |
 qemu_bh_schedule(p->completion_bh) |
   bh->scheduled = 1;               |
                                    | qemu_bh_cancel(p->completion_bh)
                                    |   bh->scheduled = 0;
                                    | if (req->state == THREAD_DONE)
                                    |   // sees THREAD_QUEUED
 The source of the misunderstanding was that qemu_bh_cancel is now being
 used by the _consumer_ rather than the producer, and therefore now needs
 to have acquire semantics just like e.g. aio_bh_poll.
 In some situations, if there are no other independent requests in the
 same aio context that could eventually trigger the scheduling of the
 completion function, the omitted TPE and all operations pending on it
 will get stuck forever.
 [Added Sergio's updated wording about the HW memory barrier.
 --Stefan]
 Signed-off-by: Sergio Lopez <slp@redhat.com>
 Message-id: 20171108063447.2842-1-slp@redhat.com
 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
 ---
- block/io_uring.c | 16 +++++++++++++++-
+ util/async.c | 2 +-
- 1 file changed, 15 insertions(+), 1 deletion(-)
+ 1 file changed, 1 insertion(+), 1 deletion(-)
-diff --git a/block/io_uring.c b/block/io_uring.c
+diff --git a/util/async.c b/util/async.c
 index XXXXXXX..XXXXXXX 100644
---- a/block/io_uring.c
+--- a/util/async.c
-+++ b/block/io_uring.c
++++ b/util/async.c
-@@ -XXX,XX +XXX,XX @@ static void luring_process_completions(LuringState *s)
+@@ -XXX,XX +XXX,XX @@ void qemu_bh_schedule(QEMUBH *bh)
-         total_bytes = ret + luringcb->total_read;
+  */
+ void qemu_bh_cancel(QEMUBH *bh)
-         if (ret < 0) {
+ {
--            if (ret == -EINTR) {
+-    bh->scheduled = 0;
-+            /*
++    atomic_mb_set(&bh->scheduled, 0);
-+             * Only writev/readv/fsync requests on regular files or host block
+ }
-+             * devices are submitted. Therefore -EAGAIN is not expected but it's
-+             * known to happen sometimes with Linux SCSI. Submit again and hope
+ /* This func is async.The bottom half will do the delete action at the finial
 +             * the request completes successfully.
 +             *
 +             * For more information, see:
 +             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
 +             *
 +             * If the code is changed to submit other types of requests in the
 +             * future, then this workaround may need to be extended to deal with
 +             * genuine -EAGAIN results that should not be resubmitted
 +             * immediately.
 +             */
 +            if (ret == -EINTR || ret == -EAGAIN) {
                  luring_resubmit(s, luringcb);
                  continue;
              }
 --
-2.31.1
+2.13.6

The following changes since commit 3521ade3510eb5cefb2e27a101667f25dad89935:

Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-07-29' into staging (2021-07-29 13:17:20 +0100)

are available in the Git repository at:

https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to cc8eecd7f105a1dff5876adeb238a14696061a4a:

MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver (2021-07-29 17:17:34 +0100)

----------------------------------------------------------------
Pull request

The main fix here is for io_uring. Spurious -EAGAIN errors can happen and the
request needs to be resubmitted.

The MAINTAINERS changes carry no risk and we might as well include them in QEMU
6.1.

----------------------------------------------------------------

Fabian Ebner (1):
  block/io_uring: resubmit when result is -EAGAIN

Philippe Mathieu-Daudé (1):
  MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver

Stefano Garzarella (1):
  MAINTAINERS: add Stefano Garzarella as io_uring reviewer

 MAINTAINERS      |  2 ++
 block/io_uring.c | 16 +++++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

-- 
2.31.1

From: Fabian Ebner <f.ebner@proxmox.com>

Linux SCSI can throw spurious -EAGAIN in some corner cases in its
completion path, which will end up being the result in the completed
io_uring request.

Resubmitting such requests should allow block jobs to complete, even
if such spurious errors are encountered.

Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Message-id: 20210729091029.65369-1-f.ebner@proxmox.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io_uring.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -XXX,XX +XXX,XX @@ static void luring_process_completions(LuringState *s)
         total_bytes = ret + luringcb->total_read;
 
         if (ret < 0) {
-            if (ret == -EINTR) {
+            /*
+             * Only writev/readv/fsync requests on regular files or host block
+             * devices are submitted. Therefore -EAGAIN is not expected but it's
+             * known to happen sometimes with Linux SCSI. Submit again and hope
+             * the request completes successfully.
+             *
+             * For more information, see:
+             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
+             *
+             * If the code is changed to submit other types of requests in the
+             * future, then this workaround may need to be extended to deal with
+             * genuine -EAGAIN results that should not be resubmitted
+             * immediately.
+             */
+            if (ret == -EINTR || ret == -EAGAIN) {
                 luring_resubmit(s, luringcb);
                 continue;
             }
-- 
2.31.1
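
The resubmission policy in the hunk above can be sketched in isolation.
This is only an illustration under stated assumptions, not QEMU code:
FakeRequest, fake_submit_and_complete(), should_resubmit() and
run_request() are hypothetical names, and the max_attempts bound is an
addition for the sketch (the patch itself resubmits without a retry
limit).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request that replays a scripted list of completion results. */
typedef struct {
    const int *results;   /* results returned by successive completions */
    size_t nresults;      /* number of entries in results */
    size_t next;          /* index of the next result to return */
    int submissions;      /* how many times the request was (re)submitted */
} FakeRequest;

/* Stand-in for submitting the request and reaping its completion. */
static int fake_submit_and_complete(FakeRequest *req)
{
    assert(req->next < req->nresults);
    req->submissions++;
    return req->results[req->next++];
}

/* Results treated as transient, mirroring the condition in the patch. */
static bool should_resubmit(int ret)
{
    return ret == -EINTR || ret == -EAGAIN;
}

/*
 * Submit until a non-transient result arrives or max_attempts is used up.
 * Returns the last result seen (-EAGAIN if max_attempts is not positive).
 */
static int run_request(FakeRequest *req, int max_attempts)
{
    int ret = -EAGAIN;

    for (int i = 0; i < max_attempts; i++) {
        ret = fake_submit_and_complete(req);
        if (!should_resubmit(ret)) {
            break;
        }
    }
    return ret;
}
```

A request scripted to return -EAGAIN, then -EINTR, then a byte count is
submitted three times and yields the byte count; any other negative
result, such as -EIO, is returned to the caller on first sight.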

test_multi_co_schedule_entry() set to_schedule[id] in the final loop
iteration before terminating the coroutine.  There is a race condition
where the main thread attempts to enter the terminating or terminated
coroutine when signalling coroutines to stop:

  atomic_mb_set(&now_stopping, true);
  for (i = 0; i < NUM_CONTEXTS; i++) {
      ctx_run(i, finish_cb, NULL);  <--- enters dead coroutine!
      to_schedule[i] = NULL;
  }

Make sure only to set to_schedule[id] if this coroutine really needs to
be scheduled!

Reported-by: "R.Nageswara Sastry" <nasastry@in.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20171106190233.1175-1-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/test-aio-multithread.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/tests/test-aio-multithread.c b/tests/test-aio-multithread.c
index XXXXXXX..XXXXXXX 100644
--- a/tests/test-aio-multithread.c
+++ b/tests/test-aio-multithread.c
@@ -XXX,XX +XXX,XX @@ static void finish_cb(void *opaque)
 static coroutine_fn void test_multi_co_schedule_entry(void *opaque)
 {
     g_assert(to_schedule[id] == NULL);
-    atomic_mb_set(&to_schedule[id], qemu_coroutine_self());
 
     while (!atomic_mb_read(&now_stopping)) {
         int n;
 
         n = g_test_rand_int_range(0, NUM_CONTEXTS);
         schedule_next(n);
+
+        atomic_mb_set(&to_schedule[id], qemu_coroutine_self());
         qemu_coroutine_yield();
-
         g_assert(to_schedule[id] == NULL);
-        atomic_mb_set(&to_schedule[id], qemu_coroutine_self());
     }
 }
 
-- 
2.13.6
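
The ordering change in the hunk above can be modelled without real
coroutines or threads. The following is a simplified single-threaded
sketch, assuming the scheduler clears the slot before resuming the
coroutine. FakeCoroutine, FakeContext, resume_old(), resume_new() and
stop_would_enter_dead() are hypothetical names that exist only for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a coroutine handle. */
typedef struct {
    bool terminated;       /* set once the entry function has returned */
} FakeCoroutine;

/* Hypothetical per-context state. */
typedef struct {
    FakeCoroutine *slot;   /* models to_schedule[id] */
    bool now_stopping;     /* models the now_stopping flag */
} FakeContext;

/*
 * Old ordering: after resuming from the yield, publish the handle first
 * and only then evaluate the loop condition.
 */
static void resume_old(FakeContext *ctx, FakeCoroutine *co)
{
    assert(ctx->slot == NULL);
    ctx->slot = co;                 /* published unconditionally */
    if (ctx->now_stopping) {
        co->terminated = true;      /* loop exits, slot still points at co */
    }
}

/*
 * New ordering: evaluate the loop condition first and publish the handle
 * only when the coroutine is about to yield again.
 */
static void resume_new(FakeContext *ctx, FakeCoroutine *co)
{
    assert(ctx->slot == NULL);
    if (ctx->now_stopping) {
        co->terminated = true;      /* loop exits, slot stays NULL */
        return;
    }
    ctx->slot = co;                 /* published right before yielding */
}

/* True if the stop loop would try to enter a terminated coroutine. */
static bool stop_would_enter_dead(const FakeContext *ctx)
{
    return ctx->slot != NULL && ctx->slot->terminated;
}
```

With now_stopping already set, resume_old() leaves a terminated
coroutine in the slot, while resume_new() leaves the slot empty.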

From: Sergio Lopez <slp@redhat.com>

Commit b7a745d added a qemu_bh_cancel call to the completion function
as an optimization to prevent it from unnecessarily rescheduling itself.

This completion function is scheduled from worker_thread, after setting
the state of a ThreadPoolElement to THREAD_DONE.

This was considered to be safe, as the completion function restarts the
loop just after the call to qemu_bh_cancel. But, as this loop lacks a HW
memory barrier, the read of req->state may actually happen _before_ the
call, seeing it still as THREAD_QUEUED, and ending the completion
function without having processed a pending TPE linked at pool->head:

         worker thread             |            I/O thread
------------------------------------------------------------------------
                                   | speculatively read req->state
req->state = THREAD_DONE;          |
qemu_bh_schedule(p->completion_bh) |
  bh->scheduled = 1;               |
                                   | qemu_bh_cancel(p->completion_bh)
                                   |   bh->scheduled = 0;
                                   | if (req->state == THREAD_DONE)
                                   |   // sees THREAD_QUEUED

The source of the misunderstanding was that qemu_bh_cancel is now being
used by the _consumer_ rather than the producer, and therefore now needs
to have acquire semantics just like e.g. aio_bh_poll.

In some situations, if there are no other independent requests in the
same aio context that could eventually trigger the scheduling of the
completion function, the omitted TPE and all operations pending on it
will get stuck forever.

[Added Sergio's updated wording about the HW memory barrier.
--Stefan]

Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-id: 20171108063447.2842-1-slp@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/async.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/async.c b/util/async.c
index XXXXXXX..XXXXXXX 100644
--- a/util/async.c
+++ b/util/async.c
@@ -XXX,XX +XXX,XX @@ void qemu_bh_schedule(QEMUBH *bh)
  */
 void qemu_bh_cancel(QEMUBH *bh)
 {
-    bh->scheduled = 0;
+    atomic_mb_set(&bh->scheduled, 0);
 }
 
 /* This func is async.The bottom half will do the delete action at the finial
-- 
2.13.6
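
The store-then-load pattern that the patch protects can be sketched with
C11 atomics. This is an illustration under assumptions, not QEMU code:
FakeBH, fake_bh_init(), producer_complete() and
consumer_cancel_and_check() are hypothetical names, and the sequentially
consistent atomic_store() is used here as a rough C11 analogue of
atomic_mb_set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { REQ_QUEUED, REQ_DONE };

/* Hypothetical model of a bottom half plus the request it completes. */
typedef struct {
    atomic_int scheduled;   /* models bh->scheduled */
    atomic_int req_state;   /* models req->state */
} FakeBH;

static void fake_bh_init(FakeBH *bh)
{
    assert(bh);
    atomic_init(&bh->scheduled, 0);
    atomic_init(&bh->req_state, REQ_QUEUED);
}

/* Producer (worker thread): mark the request done, then schedule. */
static void producer_complete(FakeBH *bh)
{
    atomic_store(&bh->req_state, REQ_DONE);
    atomic_store(&bh->scheduled, 1);
}

/*
 * Consumer (I/O thread): cancel, then look at the request state.
 * Both operations are sequentially consistent, so the load cannot be
 * ordered before the store. Either the consumer observes REQ_DONE, or
 * the producer's store to scheduled comes later and leaves it at 1.
 */
static bool consumer_cancel_and_check(FakeBH *bh)
{
    atomic_store(&bh->scheduled, 0);
    return atomic_load(&bh->req_state) == REQ_DONE;
}
```

The ordering guarantee only matters when the two sides run concurrently.
Run back to back on one thread, the consumer sees REQ_DONE after
producer_complete() and leaves scheduled at 0.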