The following changes since commit 3521ade3510eb5cefb2e27a101667f25dad89935:

  Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-07-29' into staging (2021-07-29 13:17:20 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to cc8eecd7f105a1dff5876adeb238a14696061a4a:

  MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver (2021-07-29 17:17:34 +0100)

----------------------------------------------------------------
Pull request

The main fix here is for io_uring. Spurious -EAGAIN errors can happen and the
request needs to be resubmitted.

The MAINTAINERS changes carry no risk and we might as well include them in QEMU
6.1.

----------------------------------------------------------------

Fabian Ebner (1):
  block/io_uring: resubmit when result is -EAGAIN

Philippe Mathieu-Daudé (1):
  MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver

Stefano Garzarella (1):
  MAINTAINERS: add Stefano Garzarella as io_uring reviewer

 MAINTAINERS      |  2 ++
 block/io_uring.c | 16 +++++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

--
2.31.1
From: Philippe Mathieu-Daudé <philmd@redhat.com>

I'm interested in following the activity around the NVMe bdrv.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210728183340.2018313-1-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: block/null.c
 NVMe Block Driver
 M: Stefan Hajnoczi <stefanha@redhat.com>
 R: Fam Zheng <fam@euphon.net>
+R: Philippe Mathieu-Daudé <philmd@redhat.com>
 L: qemu-block@nongnu.org
 S: Supported
 F: block/nvme*
--
2.31.1
From: Fabian Ebner <f.ebner@proxmox.com>

Linux SCSI can throw spurious -EAGAIN in some corner cases in its
completion path, which will end up being the result in the completed
io_uring request.

Resubmitting such requests should allow block jobs to complete, even
if such spurious errors are encountered.

Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Message-id: 20210729091029.65369-1-f.ebner@proxmox.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io_uring.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -XXX,XX +XXX,XX @@ static void luring_process_completions(LuringState *s)
         total_bytes = ret + luringcb->total_read;

         if (ret < 0) {
-            if (ret == -EINTR) {
+            /*
+             * Only writev/readv/fsync requests on regular files or host block
+             * devices are submitted. Therefore -EAGAIN is not expected but it's
+             * known to happen sometimes with Linux SCSI. Submit again and hope
+             * the request completes successfully.
+             *
+             * For more information, see:
+             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
+             *
+             * If the code is changed to submit other types of requests in the
+             * future, then this workaround may need to be extended to deal with
+             * genuine -EAGAIN results that should not be resubmitted
+             * immediately.
+             */
+            if (ret == -EINTR || ret == -EAGAIN) {
                 luring_resubmit(s, luringcb);
                 continue;
             }
--
2.31.1
From: Stefano Garzarella <sgarzare@redhat.com>

I've been working with io_uring for a while so I'd like to help
with reviews.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210728131515.131045-1-sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ Linux io_uring
 M: Aarushi Mehta <mehta.aaru20@gmail.com>
 M: Julia Suvorova <jusual@redhat.com>
 M: Stefan Hajnoczi <stefanha@redhat.com>
+R: Stefano Garzarella <sgarzare@redhat.com>
 L: qemu-block@nongnu.org
 S: Maintained
 F: block/io_uring.c
--
2.31.1