The following changes since commit 3521ade3510eb5cefb2e27a101667f25dad89935:

  Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-07-29' into staging (2021-07-29 13:17:20 +0100)

are available in the Git repository at:

  https://gitlab.com/stefanha/qemu.git tags/block-pull-request

for you to fetch changes up to cc8eecd7f105a1dff5876adeb238a14696061a4a:

  MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver (2021-07-29 17:17:34 +0100)

----------------------------------------------------------------
Pull request

The main fix here is for io_uring. Spurious -EAGAIN errors can happen and the
request needs to be resubmitted.

The MAINTAINERS changes carry no risk and we might as well include them in QEMU
6.1.

----------------------------------------------------------------

Fabian Ebner (1):
  block/io_uring: resubmit when result is -EAGAIN

Philippe Mathieu-Daudé (1):
  MAINTAINERS: Added myself as a reviewer for the NVMe Block Driver

Stefano Garzarella (1):
  MAINTAINERS: add Stefano Garzarella as io_uring reviewer

 MAINTAINERS      |  2 ++
 block/io_uring.c | 16 +++++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

--
2.31.1
From: Philippe Mathieu-Daudé <philmd@redhat.com>

I'm interested in following the activity around the NVMe bdrv.

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210728183340.2018313-1-philmd@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ F: block/null.c
 NVMe Block Driver
 M: Stefan Hajnoczi <stefanha@redhat.com>
 R: Fam Zheng <fam@euphon.net>
+R: Philippe Mathieu-Daudé <philmd@redhat.com>
 L: qemu-block@nongnu.org
 S: Supported
 F: block/nvme*
--
2.31.1
From: Fabian Ebner <f.ebner@proxmox.com>

Linux SCSI can throw spurious -EAGAIN in some corner cases in its
completion path, which will end up being the result in the completed
io_uring request.

Resubmitting such requests should allow block jobs to complete, even
if such spurious errors are encountered.

Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Message-id: 20210729091029.65369-1-f.ebner@proxmox.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/io_uring.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index XXXXXXX..XXXXXXX 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -XXX,XX +XXX,XX @@ static void luring_process_completions(LuringState *s)
         total_bytes = ret + luringcb->total_read;

         if (ret < 0) {
-            if (ret == -EINTR) {
+            /*
+             * Only writev/readv/fsync requests on regular files or host block
+             * devices are submitted. Therefore -EAGAIN is not expected but it's
+             * known to happen sometimes with Linux SCSI. Submit again and hope
+             * the request completes successfully.
+             *
+             * For more information, see:
+             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
+             *
+             * If the code is changed to submit other types of requests in the
+             * future, then this workaround may need to be extended to deal with
+             * genuine -EAGAIN results that should not be resubmitted
+             * immediately.
+             */
+            if (ret == -EINTR || ret == -EAGAIN) {
                 luring_resubmit(s, luringcb);
                 continue;
             }
--
2.31.1
From: Stefano Garzarella <sgarzare@redhat.com>

I've been working with io_uring for a while so I'd like to help
with reviews.

Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20210728131515.131045-1-sgarzare@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index XXXXXXX..XXXXXXX 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -XXX,XX +XXX,XX @@ Linux io_uring
 M: Aarushi Mehta <mehta.aaru20@gmail.com>
 M: Julia Suvorova <jusual@redhat.com>
 M: Stefan Hajnoczi <stefanha@redhat.com>
+R: Stefano Garzarella <sgarzare@redhat.com>
 L: qemu-block@nongnu.org
 S: Maintained
 F: block/io_uring.c
--
2.31.1