There's a cross-dependency between closing the block exports and
draining the block layer. The latter needs that we close all export's
client connections to ensure they won't queue more requests, but the
exports may have coroutines yielding in the block layer, which implies
they can't be fully closed until we drain it.
To break this cross-dependency, this change adds a "bool wait"
argument to blk_exp_close_all() and blk_exp_close_all_type(), so
callers can decide whether they want to wait for the exports to be
fully quiesced, or just return after requesting them to shut down.
Then, in bdrv_close_all we make two calls, one without waiting to
close all client connections, and another after draining the block
layer, this time waiting for the exports to be fully quiesced.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900505
Signed-off-by: Sergio Lopez <slp@redhat.com>
---
block.c | 20 +++++++++++++++++++-
block/export/export.c | 10 ++++++----
blockdev-nbd.c | 2 +-
include/block/export.h | 4 ++--
qemu-nbd.c | 2 +-
stubs/blk-exp-close-all.c | 2 +-
6 files changed, 30 insertions(+), 10 deletions(-)
diff --git a/block.c b/block.c
index bc8a66ab6e..41db70ac07 100644
--- a/block.c
+++ b/block.c
@@ -4472,13 +4472,31 @@ static void bdrv_close(BlockDriverState *bs)
void bdrv_close_all(void)
{
assert(job_next(NULL) == NULL);
- blk_exp_close_all();
+
+ /*
+ * There's a cross-dependency between closing the block exports and
+ * draining the block layer. The latter needs that we close all export's
+ * client connections to ensure they won't queue more requests, but the
+ * exports may have coroutines yielding in the block layer, which implies
+ * they can't be fully closed until we drain it.
+ *
+ * Make a first call to close all export's client connections, without
+ * waiting for each export to be fully quiesced.
+ */
+ blk_exp_close_all(false);
/* Drop references from requests still in flight, such as canceled block
* jobs whose AIO context has not been polled yet */
bdrv_drain_all();
blk_remove_all_bs();
+
+ /*
+ * Make a second call to shut down the exports, this time waiting for them
+ * to be fully quiesced.
+ */
+ blk_exp_close_all(true);
+
blockdev_close_all_bdrv_states();
assert(QTAILQ_EMPTY(&all_bdrv_states));
diff --git a/block/export/export.c b/block/export/export.c
index bad6f21b1c..0124ebd9f9 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -280,7 +280,7 @@ static bool blk_exp_has_type(BlockExportType type)
}
/* type == BLOCK_EXPORT_TYPE__MAX for all types */
-void blk_exp_close_all_type(BlockExportType type)
+void blk_exp_close_all_type(BlockExportType type, bool wait)
{
BlockExport *exp, *next;
@@ -293,12 +293,14 @@ void blk_exp_close_all_type(BlockExportType type)
blk_exp_request_shutdown(exp);
}
- AIO_WAIT_WHILE(NULL, blk_exp_has_type(type));
+ if (wait) {
+ AIO_WAIT_WHILE(NULL, blk_exp_has_type(type));
+ }
}
-void blk_exp_close_all(void)
+void blk_exp_close_all(bool wait)
{
- blk_exp_close_all_type(BLOCK_EXPORT_TYPE__MAX);
+ blk_exp_close_all_type(BLOCK_EXPORT_TYPE__MAX, wait);
}
void qmp_block_export_add(BlockExportOptions *export, Error **errp)
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index d8443d235b..d71d4da7c2 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -266,7 +266,7 @@ void qmp_nbd_server_stop(Error **errp)
return;
}
- blk_exp_close_all_type(BLOCK_EXPORT_TYPE_NBD);
+ blk_exp_close_all_type(BLOCK_EXPORT_TYPE_NBD, true);
nbd_server_free(nbd_server);
nbd_server = NULL;
diff --git a/include/block/export.h b/include/block/export.h
index 7feb02e10d..71c25928ce 100644
--- a/include/block/export.h
+++ b/include/block/export.h
@@ -83,7 +83,7 @@ BlockExport *blk_exp_find(const char *id);
void blk_exp_ref(BlockExport *exp);
void blk_exp_unref(BlockExport *exp);
void blk_exp_request_shutdown(BlockExport *exp);
-void blk_exp_close_all(void);
-void blk_exp_close_all_type(BlockExportType type);
+void blk_exp_close_all(bool wait);
+void blk_exp_close_all_type(BlockExportType type, bool wait);
#endif
diff --git a/qemu-nbd.c b/qemu-nbd.c
index a7075c5419..928f4466f6 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -1122,7 +1122,7 @@ int main(int argc, char **argv)
do {
main_loop_wait(false);
if (state == TERMINATE) {
- blk_exp_close_all();
+ blk_exp_close_all(true);
state = TERMINATED;
}
} while (state != TERMINATED);
diff --git a/stubs/blk-exp-close-all.c b/stubs/blk-exp-close-all.c
index 1c71316763..ecd0ce611f 100644
--- a/stubs/blk-exp-close-all.c
+++ b/stubs/blk-exp-close-all.c
@@ -2,6 +2,6 @@
#include "block/export.h"
/* Only used in programs that support block exports (libblockdev.fa) */
-void blk_exp_close_all(void)
+void blk_exp_close_all(bool wait)
{
}
--
2.26.2
Am 14.12.2020 um 18:05 hat Sergio Lopez geschrieben:
> There's a cross-dependency between closing the block exports and
> draining the block layer. The latter needs that we close all export's
> client connections to ensure they won't queue more requests, but the
> exports may have coroutines yielding in the block layer, which implies
> they can't be fully closed until we drain it.
A coroutine that yielded must have some way to be reentered. So I guess
the quesiton becomes why they aren't reentered until drain. We do
process events:
AIO_WAIT_WHILE(NULL, blk_exp_has_type(type));
So in theory, anything that would finalise the block export closing
should still execute.
What is the difference that drain makes compared to a simple
AIO_WAIT_WHILE, so that coroutine are reentered during drain, but not
during AIO_WAIT_WHILE?
This is an even more interesting question because the NBD server isn't a
block node nor a BdrvChildClass implementation, so it shouldn't even
notice a drain operation.
Kevin
> To break this cross-dependency, this change adds a "bool wait"
> argument to blk_exp_close_all() and blk_exp_close_all_type(), so
> callers can decide whether they want to wait for the exports to be
> fully quiesced, or just return after requesting them to shut down.
>
> Then, in bdrv_close_all we make two calls, one without waiting to
> close all client connections, and another after draining the block
> layer, this time waiting for the exports to be fully quiesced.
>
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900505
> Signed-off-by: Sergio Lopez <slp@redhat.com>
> ---
> block.c | 20 +++++++++++++++++++-
> block/export/export.c | 10 ++++++----
> blockdev-nbd.c | 2 +-
> include/block/export.h | 4 ++--
> qemu-nbd.c | 2 +-
> stubs/blk-exp-close-all.c | 2 +-
> 6 files changed, 30 insertions(+), 10 deletions(-)
>
> diff --git a/block.c b/block.c
> index bc8a66ab6e..41db70ac07 100644
> --- a/block.c
> +++ b/block.c
> @@ -4472,13 +4472,31 @@ static void bdrv_close(BlockDriverState *bs)
> void bdrv_close_all(void)
> {
> assert(job_next(NULL) == NULL);
> - blk_exp_close_all();
> +
> + /*
> + * There's a cross-dependency between closing the block exports and
> + * draining the block layer. The latter needs that we close all export's
> + * client connections to ensure they won't queue more requests, but the
> + * exports may have coroutines yielding in the block layer, which implies
> + * they can't be fully closed until we drain it.
> + *
> + * Make a first call to close all export's client connections, without
> + * waiting for each export to be fully quiesced.
> + */
> + blk_exp_close_all(false);
>
> /* Drop references from requests still in flight, such as canceled block
> * jobs whose AIO context has not been polled yet */
> bdrv_drain_all();
>
> blk_remove_all_bs();
> +
> + /*
> + * Make a second call to shut down the exports, this time waiting for them
> + * to be fully quiesced.
> + */
> + blk_exp_close_all(true);
> +
> blockdev_close_all_bdrv_states();
>
> assert(QTAILQ_EMPTY(&all_bdrv_states));
> diff --git a/block/export/export.c b/block/export/export.c
> index bad6f21b1c..0124ebd9f9 100644
> --- a/block/export/export.c
> +++ b/block/export/export.c
> @@ -280,7 +280,7 @@ static bool blk_exp_has_type(BlockExportType type)
> }
>
> /* type == BLOCK_EXPORT_TYPE__MAX for all types */
> -void blk_exp_close_all_type(BlockExportType type)
> +void blk_exp_close_all_type(BlockExportType type, bool wait)
> {
> BlockExport *exp, *next;
>
> @@ -293,12 +293,14 @@ void blk_exp_close_all_type(BlockExportType type)
> blk_exp_request_shutdown(exp);
> }
>
> - AIO_WAIT_WHILE(NULL, blk_exp_has_type(type));
> + if (wait) {
> + AIO_WAIT_WHILE(NULL, blk_exp_has_type(type));
> + }
> }
>
> -void blk_exp_close_all(void)
> +void blk_exp_close_all(bool wait)
> {
> - blk_exp_close_all_type(BLOCK_EXPORT_TYPE__MAX);
> + blk_exp_close_all_type(BLOCK_EXPORT_TYPE__MAX, wait);
> }
>
> void qmp_block_export_add(BlockExportOptions *export, Error **errp)
> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index d8443d235b..d71d4da7c2 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c
> @@ -266,7 +266,7 @@ void qmp_nbd_server_stop(Error **errp)
> return;
> }
>
> - blk_exp_close_all_type(BLOCK_EXPORT_TYPE_NBD);
> + blk_exp_close_all_type(BLOCK_EXPORT_TYPE_NBD, true);
>
> nbd_server_free(nbd_server);
> nbd_server = NULL;
> diff --git a/include/block/export.h b/include/block/export.h
> index 7feb02e10d..71c25928ce 100644
> --- a/include/block/export.h
> +++ b/include/block/export.h
> @@ -83,7 +83,7 @@ BlockExport *blk_exp_find(const char *id);
> void blk_exp_ref(BlockExport *exp);
> void blk_exp_unref(BlockExport *exp);
> void blk_exp_request_shutdown(BlockExport *exp);
> -void blk_exp_close_all(void);
> -void blk_exp_close_all_type(BlockExportType type);
> +void blk_exp_close_all(bool wait);
> +void blk_exp_close_all_type(BlockExportType type, bool wait);
>
> #endif
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index a7075c5419..928f4466f6 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -1122,7 +1122,7 @@ int main(int argc, char **argv)
> do {
> main_loop_wait(false);
> if (state == TERMINATE) {
> - blk_exp_close_all();
> + blk_exp_close_all(true);
> state = TERMINATED;
> }
> } while (state != TERMINATED);
> diff --git a/stubs/blk-exp-close-all.c b/stubs/blk-exp-close-all.c
> index 1c71316763..ecd0ce611f 100644
> --- a/stubs/blk-exp-close-all.c
> +++ b/stubs/blk-exp-close-all.c
> @@ -2,6 +2,6 @@
> #include "block/export.h"
>
> /* Only used in programs that support block exports (libblockdev.fa) */
> -void blk_exp_close_all(void)
> +void blk_exp_close_all(bool wait)
> {
> }
> --
> 2.26.2
>
On Tue, Dec 15, 2020 at 04:34:05PM +0100, Kevin Wolf wrote: > Am 14.12.2020 um 18:05 hat Sergio Lopez geschrieben: > > There's a cross-dependency between closing the block exports and > > draining the block layer. The latter needs that we close all export's > > client connections to ensure they won't queue more requests, but the > > exports may have coroutines yielding in the block layer, which implies > > they can't be fully closed until we drain it. > > A coroutine that yielded must have some way to be reentered. So I guess > the quesiton becomes why they aren't reentered until drain. We do > process events: > > AIO_WAIT_WHILE(NULL, blk_exp_has_type(type)); > > So in theory, anything that would finalise the block export closing > should still execute. > > What is the difference that drain makes compared to a simple > AIO_WAIT_WHILE, so that coroutine are reentered during drain, but not > during AIO_WAIT_WHILE? > > This is an even more interesting question because the NBD server isn't a > block node nor a BdrvChildClass implementation, so it shouldn't even > notice a drain operation. I agree in that this deserves a deeper analysis. I'm going to drop this patch from the series, and will re-analyze the issue later. Thanks, Sergio.
On Tue, Dec 15, 2020 at 04:34:05PM +0100, Kevin Wolf wrote:
> Am 14.12.2020 um 18:05 hat Sergio Lopez geschrieben:
> > There's a cross-dependency between closing the block exports and
> > draining the block layer. The latter needs that we close all export's
> > client connections to ensure they won't queue more requests, but the
> > exports may have coroutines yielding in the block layer, which implies
> > they can't be fully closed until we drain it.
>
> A coroutine that yielded must have some way to be reentered. So I guess
> the quesiton becomes why they aren't reentered until drain. We do
> process events:
>
> AIO_WAIT_WHILE(NULL, blk_exp_has_type(type));
>
> So in theory, anything that would finalise the block export closing
> should still execute.
>
> What is the difference that drain makes compared to a simple
> AIO_WAIT_WHILE, so that coroutine are reentered during drain, but not
> during AIO_WAIT_WHILE?
>
> This is an even more interesting question because the NBD server isn't a
> block node nor a BdrvChildClass implementation, so it shouldn't even
> notice a drain operation.
OK, took a deeper dive into the issue. While shutting down the guest,
some co-routines from the NBD server are stuck here:
nbd_trip
nbd_handle_request
nbd_do_cmd_read
nbd_co_send_sparse_read
blk_pread
blk_prw
blk_read_entry
blk_do_preadv
blk_wait_while_drained
qemu_co_queue_wait
This happens because bdrv_close_all() is called after
bdrv_drain_all_begin(), so all block backends are quiesced.
An alternative approach to this patch would be moving
blk_exp_close_all() to vl.c:qemu_cleanup, before
bdrv_drain_all_begin().
Do you have a preference for one of these options?
Thanks,
Sergio.
© 2016 - 2025 Red Hat, Inc.