[RFC] qemu_cleanup: do vm_shutdown() before bdrv_drain_all_begin()

Vladimir Sementsov-Ogievskiy posted 1 patch 2 years, 9 months ago
Test checkpatch passed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20210730142907.18626-1-vsementsov@virtuozzo.com
softmmu/runstate.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
[RFC] qemu_cleanup: do vm_shutdown() before bdrv_drain_all_begin()
Posted by Vladimir Sementsov-Ogievskiy 2 years, 9 months ago
It doesn't seem good to stop handling I/O while the guest is still running.
For example, it leads to the following:

After bdrv_drain_all_begin(), during vm_shutdown(), scsi_dma_writev()
calls blk_aio_pwritev(). As we are in a drained section, the request waits
in blk_wait_while_drained().

Next, during bdrv_close_all(), the bs is removed from the blk and the blk
drain finishes. So the request is resumed and fails with -ENOMEDIUM. The
corresponding BLOCK_IO_ERROR event is emitted and ends up in the libvirt
log. That doesn't seem good.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

Hi all!

In our product (v5.2 based) we hit -ENOMEDIUM BLOCK_IO_ERROR events
during QEMU termination (by SIGTERM). I don't have a reproducer for
master, but the problem still seems possible there.

Ideas on how to reproduce it are welcome.

Also, I thought that issuing blk_* requests from SCSI was not possible
during a drained section, and that the blk_wait_while_drained() logic was
introduced for IDE. Which code is responsible for not issuing SCSI
requests during drained sections? Maybe it is racy, or it may be our
downstream problem, I don't know :(
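
To make the ordering easier to see, here is a stand-alone, single-threaded
toy model of the sequence (illustration only, not QEMU code; names such as
ToyBlk and toy_guest_write() are made up). The first run models the current
order (drain first, stop the guest later): a guest write issued while drained
gets parked and later resumes with -ENOMEDIUM once the medium is gone. The
second run models the order proposed by this patch.

/*
 * Toy model of the qemu_cleanup() ordering issue -- illustration only,
 * not QEMU code.  Build: gcc -Wall -o drain-order drain-order.c
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#ifndef ENOMEDIUM
#define ENOMEDIUM 123   /* Linux value, for non-Linux builds of this toy */
#endif

/* A fake block backend: may be drained, may have a medium attached. */
typedef struct {
    bool drained;        /* models blk->quiesce_counter != 0 */
    bool has_medium;     /* models blk_bs(blk) != NULL */
    bool request_queued; /* models a request parked while drained */
} ToyBlk;

/* A guest-originated write, as a SCSI device could issue before vm_shutdown(). */
static void toy_guest_write(ToyBlk *blk)
{
    if (blk->drained) {
        /* Like blk_wait_while_drained(): park the request until drain ends. */
        printf("guest write: backend drained, request queued\n");
        blk->request_queued = true;
        return;
    }
    printf("guest write: completes with %s\n",
           blk->has_medium ? "success" : "-ENOMEDIUM");
}

/* Like bdrv_close_all(): detach the medium, which also ends the blk drain. */
static void toy_close_all(ToyBlk *blk)
{
    blk->has_medium = false;
    blk->drained = false;
    if (blk->request_queued) {
        blk->request_queued = false;
        /* The parked request resumes and now fails -> BLOCK_IO_ERROR. */
        printf("queued write resumed: fails with -ENOMEDIUM (%d)\n", -ENOMEDIUM);
    }
}

int main(void)
{
    ToyBlk old_order = { .has_medium = true };
    ToyBlk new_order = { .has_medium = true };

    printf("=== current order: drain first, stop the guest later ===\n");
    old_order.drained = true;       /* bdrv_drain_all_begin() */
    toy_guest_write(&old_order);    /* guest still running: request is parked */
    /* vm_shutdown() happens here, too late for the parked request */
    toy_close_all(&old_order);      /* bdrv_close_all(): spurious -ENOMEDIUM */

    printf("=== proposed order: stop the guest first, then drain ===\n");
    /* vm_shutdown(): no further guest requests after this point */
    new_order.drained = true;       /* bdrv_drain_all_begin() */
    toy_close_all(&new_order);      /* nothing queued, no error event */

    return 0;
}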

 softmmu/runstate.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 10d9b7365a..1966d773f3 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -797,21 +797,18 @@ void qemu_cleanup(void)
      */
     blk_exp_close_all();
 
+    /* No more vcpu or device emulation activity beyond this point */
+    vm_shutdown();
+    replay_finish();
+
     /*
      * We must cancel all block jobs while the block layer is drained,
      * or cancelling will be affected by throttling and thus may block
      * for an extended period of time.
-     * vm_shutdown() will bdrv_drain_all(), so we may as well include
-     * it in the drained section.
      * We do not need to end this section, because we do not want any
      * requests happening from here on anyway.
      */
     bdrv_drain_all_begin();
-
-    /* No more vcpu or device emulation activity beyond this point */
-    vm_shutdown();
-    replay_finish();
-
     job_cancel_sync_all();
     bdrv_close_all();
 
-- 
2.29.2


Re: [RFC] qemu_cleanup: do vm_shutdown() before bdrv_drain_all_begin()
Posted by Vladimir Sementsov-Ogievskiy 2 years, 7 months ago
ping

30.07.2021 17:29, Vladimir Sementsov-Ogievskiy wrote:
> It doesn't seem good to stop handling I/O while the guest is still running.
> For example, it leads to the following:
> 
> After bdrv_drain_all_begin(), during vm_shutdown(), scsi_dma_writev()
> calls blk_aio_pwritev(). As we are in a drained section, the request waits
> in blk_wait_while_drained().
> 
> Next, during bdrv_close_all(), the bs is removed from the blk and the blk
> drain finishes. So the request is resumed and fails with -ENOMEDIUM. The
> corresponding BLOCK_IO_ERROR event is emitted and ends up in the libvirt
> log. That doesn't seem good.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> Hi all!
> 
> In our product (v5.2 based) we hit -ENOMEDIUM BLOCK_IO_ERROR events
> during QEMU termination (by SIGTERM). I don't have a reproducer for
> master, but the problem still seems possible there.
> 
> Ideas on how to reproduce it are welcome.
> 
> Also, I thought that issuing blk_* requests from SCSI was not possible
> during a drained section, and that the blk_wait_while_drained() logic was
> introduced for IDE. Which code is responsible for not issuing SCSI
> requests during drained sections? Maybe it is racy, or it may be our
> downstream problem, I don't know :(
> 
>   softmmu/runstate.c | 11 ++++-------
>   1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index 10d9b7365a..1966d773f3 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -797,21 +797,18 @@ void qemu_cleanup(void)
>        */
>       blk_exp_close_all();
>   
> +    /* No more vcpu or device emulation activity beyond this point */
> +    vm_shutdown();
> +    replay_finish();
> +
>       /*
>        * We must cancel all block jobs while the block layer is drained,
>        * or cancelling will be affected by throttling and thus may block
>        * for an extended period of time.
> -     * vm_shutdown() will bdrv_drain_all(), so we may as well include
> -     * it in the drained section.
>        * We do not need to end this section, because we do not want any
>        * requests happening from here on anyway.
>        */
>       bdrv_drain_all_begin();
> -
> -    /* No more vcpu or device emulation activity beyond this point */
> -    vm_shutdown();
> -    replay_finish();
> -
>       job_cancel_sync_all();
>       bdrv_close_all();
>   
> 


-- 
Best regards,
Vladimir