vl.c/exit: pause cpus before closing block devices

[Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Dr. David Alan Gilbert (git) 6 years, 9 months ago

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

There's a rare exit seg if the guest is accessing
IO during exit.
It's always hitting the atomic_inc(&bs->in_flight) with a NULL
bs. This was added recently in 99723548  but I don't see it
as the cause.

Flip vl.c around so we pause the cpus before closing the block devices,
that way we shouldn't have anything trying to access them when
they're gone.

This was originally Red Hat bz https://bugzilla.redhat.com/show_bug.cgi?id=1451015

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reported-by: Cong Li <coli@redhat.com>

--
This is a very rare race, I'll leave it running in a loop to see if
we hit anything else and to check this really fixes it.

I do worry if there are other cases that can trigger this - e.g.
hot-unplug or ejecting a CD.

---
 vl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/vl.c b/vl.c
index f7560de622..d53f715e91 100644
--- a/vl.c
+++ b/vl.c
@@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
     replay_disable_events();
     iothread_stop_all();
 
-    bdrv_close_all();
     pause_all_vcpus();
+    bdrv_close_all();
     res_free();
 
     /* vhost-user must be cleaned up before chardevs.  */
-- 
2.13.0

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Stefan Hajnoczi 6 years, 9 months ago

On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> There's a rare exit seg if the guest is accessing
> IO during exit.
> It's always hitting the atomic_inc(&bs->in_flight) with a NULL
> bs. This was added recently in 99723548  but I don't see it
> as the cause.
> 
> Flip vl.c around so we pause the cpus before closing the block devices,
> that way we shouldn't have anything trying to access them when
> they're gone.
> 
> This was originally Red Hat bz https://bugzilla.redhat.com/show_bug.cgi?id=1451015
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reported-by: Cong Li <coli@redhat.com>
> 
> --
> This is a very rare race, I'll leave it running in a loop to see if
> we hit anything else and to check this really fixes it.
> 
> I do worry if there are other cases that can trigger this - e.g.
> hot-unplug or ejecting a CD.
> 
> ---
>  vl.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Dr. David Alan Gilbert 6 years, 9 months ago

* Stefan Hajnoczi (stefanha@gmail.com) wrote:
> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > There's a rare exit seg if the guest is accessing
> > IO during exit.
> > It's always hitting the atomic_inc(&bs->in_flight) with a NULL
> > bs. This was added recently in 99723548  but I don't see it
> > as the cause.
> > 
> > Flip vl.c around so we pause the cpus before closing the block devices,
> > that way we shouldn't have anything trying to access them when
> > they're gone.
> > 
> > This was originally Red Hat bz https://bugzilla.redhat.com/show_bug.cgi?id=1451015
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Reported-by: Cong Li <coli@redhat.com>
> > 
> > --
> > This is a very rare race, I'll leave it running in a loop to see if
> > we hit anything else and to check this really fixes it.
> > 
> > I do worry if there are other cases that can trigger this - e.g.
> > hot-unplug or ejecting a CD.
> > 
> > ---
> >  vl.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Thanks;  and the test I left running seems solid - ~12k runs
over the weekend with no seg.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by John Snow 6 years, 9 months ago


On 07/17/2017 06:26 AM, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@gmail.com) wrote:
>> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>
>>> There's a rare exit seg if the guest is accessing
>>> IO during exit.
>>> It's always hitting the atomic_inc(&bs->in_flight) with a NULL
>>> bs. This was added recently in 99723548  but I don't see it
>>> as the cause.
>>>
>>> Flip vl.c around so we pause the cpus before closing the block devices,
>>> that way we shouldn't have anything trying to access them when
>>> they're gone.
>>>
>>> This was originally Red Hat bz https://bugzilla.redhat.com/show_bug.cgi?id=1451015
>>>
>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>> Reported-by: Cong Li <coli@redhat.com>
>>>
>>> --
>>> This is a very rare race, I'll leave it running in a loop to see if
>>> we hit anything else and to check this really fixes it.
>>>
>>> I do worry if there are other cases that can trigger this - e.g.
>>> hot-unplug or ejecting a CD.
>>>
>>> ---
>>>  vl.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Thanks;  and the test I left running seems solid - ~12k runs
> over the weekend with no seg.
> 
> Dave
> 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 

the root cause of this bug is related to this as well:
https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html

From commit 99723548 we started assuming (incorrectly?) that blk_
functions always WILL have an attached BDS, but this is not always true,
for instance, flushing the cache from an empty CDROM.

Paolo, can we move the flight counter increment outside of the
block-backend layer, is that safe?

--js

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Stefan Hajnoczi 6 years, 8 months ago

On Mon, Jul 17, 2017 at 5:43 PM, John Snow <jsnow@redhat.com> wrote:
> On 07/17/2017 06:26 AM, Dr. David Alan Gilbert wrote:
>> * Stefan Hajnoczi (stefanha@gmail.com) wrote:
>>> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
>>>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>>>
>>>> There's a rare exit seg if the guest is accessing
>>>> IO during exit.
>>>> It's always hitting the atomic_inc(&bs->in_flight) with a NULL
>>>> bs. This was added recently in 99723548  but I don't see it
>>>> as the cause.
>>>>
>>>> Flip vl.c around so we pause the cpus before closing the block devices,
>>>> that way we shouldn't have anything trying to access them when
>>>> they're gone.
>>>>
>>>> This was originally Red Hat bz https://bugzilla.redhat.com/show_bug.cgi?id=1451015
>>>>
>>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> Reported-by: Cong Li <coli@redhat.com>
>>>>
>>>> --
>>>> This is a very rare race, I'll leave it running in a loop to see if
>>>> we hit anything else and to check this really fixes it.
>>>>
>>>> I do worry if there are other cases that can trigger this - e.g.
>>>> hot-unplug or ejecting a CD.
>>>>
>>>> ---
>>>>  vl.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>
>> Thanks;  and the test I left running seems solid - ~12k runs
>> over the weekend with no seg.
>>
>> Dave
>>
>> --
>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>
>
> the root cause of this bug is related to this as well:
> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>
> From commit 99723548 we started assuming (incorrectly?) that blk_
> functions always WILL have an attached BDS, but this is not always true,
> for instance, flushing the cache from an empty CDROM.
>
> Paolo, can we move the flight counter increment outside of the
> block-backend layer, is that safe?

I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
regardless of the throttling timer issue discussed below.  BB cannot
assume that the BDS graph is non-empty.

Stefan

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Paolo Bonzini 6 years, 8 months ago

On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>> the root cause of this bug is related to this as well:
>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>
>> From commit 99723548 we started assuming (incorrectly?) that blk_
>> functions always WILL have an attached BDS, but this is not always true,
>> for instance, flushing the cache from an empty CDROM.
>>
>> Paolo, can we move the flight counter increment outside of the
>> block-backend layer, is that safe?
> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
> regardless of the throttling timer issue discussed below.  BB cannot
> assume that the BDS graph is non-empty.

Can we make bdrv_aio_* return NULL (even temporarily) if there is no
attached BDS?  That would make it much easier to fix.

Paolo

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Kevin Wolf 6 years, 8 months ago

Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
> >> the root cause of this bug is related to this as well:
> >> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
> >>
> >> From commit 99723548 we started assuming (incorrectly?) that blk_
> >> functions always WILL have an attached BDS, but this is not always true,
> >> for instance, flushing the cache from an empty CDROM.
> >>
> >> Paolo, can we move the flight counter increment outside of the
> >> block-backend layer, is that safe?
> > I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
> > regardless of the throttling timer issue discussed below.  BB cannot
> > assume that the BDS graph is non-empty.
> 
> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
> attached BDS?  That would make it much easier to fix.

Would the proper fix be much more complicated than the following? I must
admit that I don't fully understand the current state of affairs with
respect to threading, AioContext etc. so I may well be missing
something.

Note that my blk_drain() implementation doesn't necessarily drain
blk_bs(blk) completely, but only those requests that came from the
specific BlockBackend. I think this is what the callers want, but
if otherwise, it shouldn't be hard to change.

Kevin

diff --git a/block.c b/block.c
index f1c3e98a4d..07decc3b5f 100644
--- a/block.c
+++ b/block.c
@@ -4563,7 +4563,7 @@ out:
 
 AioContext *bdrv_get_aio_context(BlockDriverState *bs)
 {
-    return bs->aio_context;
+    return bs ? bs->aio_context : qemu_get_aio_context();
 }
 
 void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
diff --git a/block/block-backend.c b/block/block-backend.c
index 968438c149..649d11b4ef 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -68,6 +68,9 @@ struct BlockBackend {
     NotifierList remove_bs_notifiers, insert_bs_notifiers;
 
     int quiesce_counter;
+
+    /* Number of in-flight requests. Accessed with atomic ops. */
+    unsigned int in_flight;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -1109,6 +1112,16 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
     return bdrv_make_zero(blk->root, flags);
 }
 
+static void blk_inc_in_flight(BlockBackend *blk)
+{
+    atomic_inc(&blk->in_flight);
+}
+
+static void blk_dec_in_flight(BlockBackend *blk)
+{
+    atomic_dec(&blk->in_flight);
+}
+
 static void error_callback_bh(void *opaque)
 {
     struct BlockBackendAIOCB *acb = opaque;
@@ -1147,7 +1160,7 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
 static void blk_aio_complete(BlkAioEmAIOCB *acb)
 {
     if (acb->has_returned) {
-        bdrv_dec_in_flight(acb->common.bs);
+        blk_dec_in_flight(acb->rwco.blk);
         acb->common.cb(acb->common.opaque, acb->rwco.ret);
         qemu_aio_unref(acb);
     }
@@ -1168,7 +1181,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
     BlkAioEmAIOCB *acb;
     Coroutine *co;
 
-    bdrv_inc_in_flight(blk_bs(blk));
+    blk_inc_in_flight(blk);
     acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
     acb->rwco = (BlkRwCo) {
         .blk    = blk,
@@ -1405,8 +1418,16 @@ int blk_flush(BlockBackend *blk)
 
 void blk_drain(BlockBackend *blk)
 {
-    if (blk_bs(blk)) {
-        bdrv_drain(blk_bs(blk));
+    AioContext *ctx = blk_get_aio_context(blk);
+
+    while (atomic_read(&blk->in_flight)) {
+        aio_context_acquire(ctx);
+        aio_poll(ctx, false);
+        aio_context_release(ctx);
+
+        if (blk_bs(blk)) {
+            bdrv_drain(blk_bs(blk));
+        }
     }
 }
 
@@ -1453,10 +1474,11 @@ static void send_qmp_error_event(BlockBackend *blk,
                                  bool is_read, int error)
 {
     IoOperationType optype;
+    BlockDriverState *bs = blk_bs(blk);
 
     optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
     qapi_event_send_block_io_error(blk_name(blk),
-                                   bdrv_get_node_name(blk_bs(blk)), optype,
+                                   bs ? bdrv_get_node_name(bs) : "", optype,
                                    action, blk_iostatus_is_enabled(blk),
                                    error == ENOSPC, strerror(error),
                                    &error_abort);
diff --git a/tests/ide-test.c b/tests/ide-test.c
index bfd79ddbdc..ffbfb04271 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -689,6 +689,24 @@ static void test_flush_nodev(void)
     ide_test_quit();
 }
 
+static void test_flush_empty_drive(void)
+{
+    QPCIDevice *dev;
+    QPCIBar bmdma_bar, ide_bar;
+
+    ide_test_start("-device ide-cd,bus=ide.0");
+    dev = get_pci_device(&bmdma_bar, &ide_bar);
+
+    /* FLUSH CACHE command on device 0*/
+    qpci_io_writeb(dev, ide_bar, reg_device, 0);
+    qpci_io_writeb(dev, ide_bar, reg_command, CMD_FLUSH_CACHE);
+
+    /* Just testing that qemu doesn't crash... */
+
+    free_pci_device(dev);
+    ide_test_quit();
+}
+
 static void test_pci_retry_flush(void)
 {
     test_retry_flush("pc");
@@ -954,6 +972,7 @@ int main(int argc, char **argv)
 
     qtest_add_func("/ide/flush", test_flush);
     qtest_add_func("/ide/flush/nodev", test_flush_nodev);
+    qtest_add_func("/ide/flush/empty_drive", test_flush_empty_drive);
     qtest_add_func("/ide/flush/retry_pci", test_pci_retry_flush);
     qtest_add_func("/ide/flush/retry_isa", test_isa_retry_flush);
 
diff --git a/tests/test-block-backend.c b/tests/test-block-backend.c
new file mode 100644
index 0000000000..5348781a42
--- /dev/null
+++ b/tests/test-block-backend.c
@@ -0,0 +1,62 @@
+/*
+ * BlockBackend tests
+ *
+ * Copyright (c) 2017 Kevin Wolf <kwolf@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "block/block.h"
+#include "sysemu/block-backend.h"
+#include "qapi/error.h"
+
+static void test_drain_aio_error_flush_cb(void *opaque, int ret)
+{
+    bool *completed = opaque;
+
+    g_assert(ret == -ENOMEDIUM);
+    *completed = true;
+}
+
+static void test_drain_aio_error(void)
+{
+    BlockBackend *blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
+    BlockAIOCB *acb;
+    bool completed = false;
+
+    acb = blk_aio_flush(blk, test_drain_aio_error_flush_cb, &completed);
+    g_assert(acb != NULL);
+    g_assert(completed == false);
+
+    blk_drain(blk);
+    g_assert(completed == true);
+}
+
+int main(int argc, char **argv)
+{
+    bdrv_init();
+    qemu_init_main_loop(&error_abort);
+
+    g_test_init(&argc, &argv, NULL);
+
+    g_test_add_func("/block-backend/drain_aio_error", test_drain_aio_error);
+
+    return g_test_run();
+}
diff --git a/tests/Makefile.include b/tests/Makefile.include
index 59e536bf0b..7beb65397b 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -56,6 +56,7 @@ check-unit-y += tests/test-hbitmap$(EXESUF)
 gcov-files-test-hbitmap-y = blockjob.c
 check-unit-y += tests/test-blockjob$(EXESUF)
 check-unit-y += tests/test-blockjob-txn$(EXESUF)
+check-unit-y += tests/test-block-backend$(EXESUF)
 check-unit-y += tests/test-x86-cpuid$(EXESUF)
 # all code tested by test-x86-cpuid is inside topology.h
 gcov-files-test-x86-cpuid-y =
@@ -556,6 +557,7 @@ tests/test-aio-multithread$(EXESUF): tests/test-aio-multithread.o $(test-block-o
 tests/test-throttle$(EXESUF): tests/test-throttle.o $(test-block-obj-y)
 tests/test-blockjob$(EXESUF): tests/test-blockjob.o $(test-block-obj-y) $(test-util-obj-y)
 tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o $(test-block-obj-y) $(test-util-obj-y)
+tests/test-block-backend$(EXESUF): tests/test-block-backend.o $(test-block-obj-y) $(test-util-obj-y)
 tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
 tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
 tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) $(test-crypto-obj-y)

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Paolo Bonzini 6 years, 8 months ago

On 08/08/2017 12:02, Kevin Wolf wrote:
> Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>> the root cause of this bug is related to this as well:
>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>
>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>> functions always WILL have an attached BDS, but this is not always true,
>>>> for instance, flushing the cache from an empty CDROM.
>>>>
>>>> Paolo, can we move the flight counter increment outside of the
>>>> block-backend layer, is that safe?
>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>> regardless of the throttling timer issue discussed below.  BB cannot
>>> assume that the BDS graph is non-empty.
>>
>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>> attached BDS?  That would make it much easier to fix.
> 
> Would the proper fix be much more complicated than the following? I must
> admit that I don't fully understand the current state of affairs with
> respect to threading, AioContext etc. so I may well be missing
> something.

Not much, but it's not complete either.  The issues I see are that: 1)
blk_drain_all does not take the new counter into account; 2)
bdrv_drain_all callers need to be audited to see if they should be
blk_drain_all (or more likely, only device BlockBackends should be drained).

Paolo

> Note that my blk_drain() implementation doesn't necessarily drain
> blk_bs(blk) completely, but only those requests that came from the
> specific BlockBackend. I think this is what the callers want, but
> if otherwise, it shouldn't be hard to change.

Yes, this should be what they want.

Paolo

> Kevin
> 
> diff --git a/block.c b/block.c
> index f1c3e98a4d..07decc3b5f 100644
> --- a/block.c
> +++ b/block.c
> @@ -4563,7 +4563,7 @@ out:
>  
>  AioContext *bdrv_get_aio_context(BlockDriverState *bs)
>  {
> -    return bs->aio_context;
> +    return bs ? bs->aio_context : qemu_get_aio_context();
>  }
>  
>  void bdrv_coroutine_enter(BlockDriverState *bs, Coroutine *co)
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 968438c149..649d11b4ef 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -68,6 +68,9 @@ struct BlockBackend {
>      NotifierList remove_bs_notifiers, insert_bs_notifiers;
>  
>      int quiesce_counter;
> +
> +    /* Number of in-flight requests. Accessed with atomic ops. */
> +    unsigned int in_flight;
>  };
>  
>  typedef struct BlockBackendAIOCB {
> @@ -1109,6 +1112,16 @@ int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
>      return bdrv_make_zero(blk->root, flags);
>  }
>  
> +static void blk_inc_in_flight(BlockBackend *blk)
> +{
> +    atomic_inc(&blk->in_flight);
> +}
> +
> +static void blk_dec_in_flight(BlockBackend *blk)
> +{
> +    atomic_dec(&blk->in_flight);
> +}
> +
>  static void error_callback_bh(void *opaque)
>  {
>      struct BlockBackendAIOCB *acb = opaque;
> @@ -1147,7 +1160,7 @@ static const AIOCBInfo blk_aio_em_aiocb_info = {
>  static void blk_aio_complete(BlkAioEmAIOCB *acb)
>  {
>      if (acb->has_returned) {
> -        bdrv_dec_in_flight(acb->common.bs);
> +        blk_dec_in_flight(acb->rwco.blk);
>          acb->common.cb(acb->common.opaque, acb->rwco.ret);
>          qemu_aio_unref(acb);
>      }
> @@ -1168,7 +1181,7 @@ static BlockAIOCB *blk_aio_prwv(BlockBackend *blk, int64_t offset, int bytes,
>      BlkAioEmAIOCB *acb;
>      Coroutine *co;
>  
> -    bdrv_inc_in_flight(blk_bs(blk));
> +    blk_inc_in_flight(blk);
>      acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
>      acb->rwco = (BlkRwCo) {
>          .blk    = blk,
> @@ -1405,8 +1418,16 @@ int blk_flush(BlockBackend *blk)
>  
>  void blk_drain(BlockBackend *blk)
>  {
> -    if (blk_bs(blk)) {
> -        bdrv_drain(blk_bs(blk));
> +    AioContext *ctx = blk_get_aio_context(blk);
> +
> +    while (atomic_read(&blk->in_flight)) {
> +        aio_context_acquire(ctx);
> +        aio_poll(ctx, false);
> +        aio_context_release(ctx);
> +
> +        if (blk_bs(blk)) {
> +            bdrv_drain(blk_bs(blk));
> +        }
>      }
>  }
>  
> @@ -1453,10 +1474,11 @@ static void send_qmp_error_event(BlockBackend *blk,
>                                   bool is_read, int error)
>  {
>      IoOperationType optype;
> +    BlockDriverState *bs = blk_bs(blk);
>  
>      optype = is_read ? IO_OPERATION_TYPE_READ : IO_OPERATION_TYPE_WRITE;
>      qapi_event_send_block_io_error(blk_name(blk),
> -                                   bdrv_get_node_name(blk_bs(blk)), optype,
> +                                   bs ? bdrv_get_node_name(bs) : "", optype,
>                                     action, blk_iostatus_is_enabled(blk),
>                                     error == ENOSPC, strerror(error),
>                                     &error_abort);
> diff --git a/tests/ide-test.c b/tests/ide-test.c
> index bfd79ddbdc..ffbfb04271 100644
> --- a/tests/ide-test.c
> +++ b/tests/ide-test.c
> @@ -689,6 +689,24 @@ static void test_flush_nodev(void)
>      ide_test_quit();
>  }
>  
> +static void test_flush_empty_drive(void)
> +{
> +    QPCIDevice *dev;
> +    QPCIBar bmdma_bar, ide_bar;
> +
> +    ide_test_start("-device ide-cd,bus=ide.0");
> +    dev = get_pci_device(&bmdma_bar, &ide_bar);
> +
> +    /* FLUSH CACHE command on device 0*/
> +    qpci_io_writeb(dev, ide_bar, reg_device, 0);
> +    qpci_io_writeb(dev, ide_bar, reg_command, CMD_FLUSH_CACHE);
> +
> +    /* Just testing that qemu doesn't crash... */
> +
> +    free_pci_device(dev);
> +    ide_test_quit();
> +}
> +
>  static void test_pci_retry_flush(void)
>  {
>      test_retry_flush("pc");
> @@ -954,6 +972,7 @@ int main(int argc, char **argv)
>  
>      qtest_add_func("/ide/flush", test_flush);
>      qtest_add_func("/ide/flush/nodev", test_flush_nodev);
> +    qtest_add_func("/ide/flush/empty_drive", test_flush_empty_drive);
>      qtest_add_func("/ide/flush/retry_pci", test_pci_retry_flush);
>      qtest_add_func("/ide/flush/retry_isa", test_isa_retry_flush);
>  
> diff --git a/tests/test-block-backend.c b/tests/test-block-backend.c
> new file mode 100644
> index 0000000000..5348781a42
> --- /dev/null
> +++ b/tests/test-block-backend.c
> @@ -0,0 +1,62 @@
> +/*
> + * BlockBackend tests
> + *
> + * Copyright (c) 2017 Kevin Wolf <kwolf@redhat.com>
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "block/block.h"
> +#include "sysemu/block-backend.h"
> +#include "qapi/error.h"
> +
> +static void test_drain_aio_error_flush_cb(void *opaque, int ret)
> +{
> +    bool *completed = opaque;
> +
> +    g_assert(ret == -ENOMEDIUM);
> +    *completed = true;
> +}
> +
> +static void test_drain_aio_error(void)
> +{
> +    BlockBackend *blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL);
> +    BlockAIOCB *acb;
> +    bool completed = false;
> +
> +    acb = blk_aio_flush(blk, test_drain_aio_error_flush_cb, &completed);
> +    g_assert(acb != NULL);
> +    g_assert(completed == false);
> +
> +    blk_drain(blk);
> +    g_assert(completed == true);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +    bdrv_init();
> +    qemu_init_main_loop(&error_abort);
> +
> +    g_test_init(&argc, &argv, NULL);
> +
> +    g_test_add_func("/block-backend/drain_aio_error", test_drain_aio_error);
> +
> +    return g_test_run();
> +}
> diff --git a/tests/Makefile.include b/tests/Makefile.include
> index 59e536bf0b..7beb65397b 100644
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -56,6 +56,7 @@ check-unit-y += tests/test-hbitmap$(EXESUF)
>  gcov-files-test-hbitmap-y = blockjob.c
>  check-unit-y += tests/test-blockjob$(EXESUF)
>  check-unit-y += tests/test-blockjob-txn$(EXESUF)
> +check-unit-y += tests/test-block-backend$(EXESUF)
>  check-unit-y += tests/test-x86-cpuid$(EXESUF)
>  # all code tested by test-x86-cpuid is inside topology.h
>  gcov-files-test-x86-cpuid-y =
> @@ -556,6 +557,7 @@ tests/test-aio-multithread$(EXESUF): tests/test-aio-multithread.o $(test-block-o
>  tests/test-throttle$(EXESUF): tests/test-throttle.o $(test-block-obj-y)
>  tests/test-blockjob$(EXESUF): tests/test-blockjob.o $(test-block-obj-y) $(test-util-obj-y)
>  tests/test-blockjob-txn$(EXESUF): tests/test-blockjob-txn.o $(test-block-obj-y) $(test-util-obj-y)
> +tests/test-block-backend$(EXESUF): tests/test-block-backend.o $(test-block-obj-y) $(test-util-obj-y)
>  tests/test-thread-pool$(EXESUF): tests/test-thread-pool.o $(test-block-obj-y)
>  tests/test-iov$(EXESUF): tests/test-iov.o $(test-util-obj-y)
>  tests/test-hbitmap$(EXESUF): tests/test-hbitmap.o $(test-util-obj-y) $(test-crypto-obj-y)
>

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Kevin Wolf 6 years, 8 months ago

Am 08.08.2017 um 13:04 hat Paolo Bonzini geschrieben:
> On 08/08/2017 12:02, Kevin Wolf wrote:
> > Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
> >> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
> >>>> the root cause of this bug is related to this as well:
> >>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
> >>>>
> >>>> From commit 99723548 we started assuming (incorrectly?) that blk_
> >>>> functions always WILL have an attached BDS, but this is not always true,
> >>>> for instance, flushing the cache from an empty CDROM.
> >>>>
> >>>> Paolo, can we move the flight counter increment outside of the
> >>>> block-backend layer, is that safe?
> >>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
> >>> regardless of the throttling timer issue discussed below.  BB cannot
> >>> assume that the BDS graph is non-empty.
> >>
> >> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
> >> attached BDS?  That would make it much easier to fix.
> > 
> > Would the proper fix be much more complicated than the following? I must
> > admit that I don't fully understand the current state of affairs with
> > respect to threading, AioContext etc. so I may well be missing
> > something.
> 
> Not much, but it's not complete either.  The issues I see are that: 1)
> blk_drain_all does not take the new counter into account;

Ok, I think this does the trick:

void blk_drain_all(void)
{
    BlockBackend *blk = NULL;

    bdrv_drain_all_begin();
    while ((blk = blk_all_next(blk)) != NULL) {
        blk_drain(blk);
    }
    bdrv_drain_all_end();
}

> 2) bdrv_drain_all callers need to be audited to see if they should be
> blk_drain_all (or more likely, only device BlockBackends should be drained).

qmp_transaction() is unclear to me. It should be changed in some way
anyway because it uses bdrv_drain_all() rather than a begin/end pair.

do_vm_stop() and vm_stop_force_state() probably want blk_drain_all().

xen_invalidate_map_cache() - wtf? Looks like the wrong layer to do this,
but I guess blk_drain_all(), too.

block_migration_cleanup() is just lazy and really means a blk_drain()
for its own BlockBackends. blk_drain_all() as the simple conversion.

migration/savevm: Migration wants blk_drain_all() to get the devices
quiesced.

qemu-io: blk_drain_all(), too.

Hm, looks like there won't be many callers of bdrv_drain_all() left. :-)

> > Note that my blk_drain() implementation doesn't necessarily drain
> > blk_bs(blk) completely, but only those requests that came from the
> > specific BlockBackend. I think this is what the callers want, but
> > if otherwise, it shouldn't be hard to change.
> 
> Yes, this should be what they want.

Apparently not; block jobs don't complete with it any more. I haven't
checked in detail, but it makes sense that they can have a BH (e.g. for
block_job_defer_to_main_loop) without a request being in flight.

So I'm including an unconditional bdrv_drain() again. Or I guess,
calling aio_poll() unconditionally and including its return value
in the loop condition would be the cleaner approach?

Kevin

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Paolo Bonzini 6 years, 8 months ago

On 08/08/2017 13:56, Kevin Wolf wrote:
> Am 08.08.2017 um 13:04 hat Paolo Bonzini geschrieben:
>> On 08/08/2017 12:02, Kevin Wolf wrote:
>>> Am 04.08.2017 um 13:46 hat Paolo Bonzini geschrieben:
>>>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>>>> the root cause of this bug is related to this as well:
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>>>
>>>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>>>> functions always WILL have an attached BDS, but this is not always true,
>>>>>> for instance, flushing the cache from an empty CDROM.
>>>>>>
>>>>>> Paolo, can we move the flight counter increment outside of the
>>>>>> block-backend layer, is that safe?
>>>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>>>> regardless of the throttling timer issue discussed below.  BB cannot
>>>>> assume that the BDS graph is non-empty.
>>>>
>>>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>>>> attached BDS?  That would make it much easier to fix.
>>>
>>> Would the proper fix be much more complicated than the following? I must
>>> admit that I don't fully understand the current state of affairs with
>>> respect to threading, AioContext etc. so I may well be missing
>>> something.
>>
>> Not much, but it's not complete either.  The issues I see are that: 1)
>> blk_drain_all does not take the new counter into account;
> 
> Ok, I think this does the trick:
> 
> void blk_drain_all(void)
> {
>     BlockBackend *blk = NULL;
> 
>     bdrv_drain_all_begin();
>     while ((blk = blk_all_next(blk)) != NULL) {
>         blk_drain(blk);
>     }
>     bdrv_drain_all_end();
> }

Small note: blk_drain_all is called while no AioContext has been
acquired, while blk_drain requires you to acquire blk's AioContext.

And this of course should be split in blk_drain_all_begin/blk_drain_all_end.

>> 2) bdrv_drain_all callers need to be audited to see if they should be
>> blk_drain_all (or more likely, only device BlockBackends should be drained).
> 
> qmp_transaction() is unclear to me. It should be changed in some way
> anyway because it uses bdrv_drain_all() rather than a begin/end pair.

I think every ->prepare should do drained_begin and ->clean should end
the section.  Most of them are doing it already.

> do_vm_stop() and vm_stop_force_state() probably want blk_drain_all().

Or just all devices.  From the October 2016 discussion, "qdev
drive properties would install a vmstate notifier to quiesce their own
BlockBackend rather than relying on a blind bdrv_drain_all from vm_stop"
(though there'll surely be some non-qdevified straggler).

> xen_invalidate_map_cache() - wtf? Looks like the wrong layer to do this,
> but I guess blk_drain_all(), too.

It seems to be an advisory operation triggered by balloon deflation.
Whatever.

> block_migration_cleanup() is just lazy and really means a blk_drain()
> for its own BlockBackends. blk_drain_all() as the simple conversion.
> 
> migration/savevm: Migration wants blk_drain_all() to get the devices
> quiesced.

It already does vm_stop though.  It doesn't seem to be necessary at all.

> qemu-io: blk_drain_all(), too.

Or just blk_drain(qemuio_blk).

> Hm, looks like there won't be many callers of bdrv_drain_all() left. :-)

Not many blk_drain_all(), and that's not a bad thing.

You missed bdrv_reopen_multiple.  Maybe it also should not do anything,
and the caller should do begin/end instead.

>>> Note that my blk_drain() implementation doesn't necessarily drain
>>> blk_bs(blk) completely, but only those requests that came from the
>>> specific BlockBackend. I think this is what the callers want, but
>>> if otherwise, it shouldn't be hard to change.
>>
>> Yes, this should be what they want.
> 
> Apparently not; block jobs don't complete with it any more. I haven't
> checked in detail, but it makes sense that they can have a BH (e.g. for
> block_job_defer_to_main_loop) without a request being in flight.

That BH should be handled here:

    while (!job->completed) {
        aio_poll(qemu_get_aio_context(), true);
    }

so that should be okay.  (Block jobs have been a huge pain every time I
touched bdrv_drain, indeed...).

> So I'm including an unconditional bdrv_drain() again. Or I guess,
> calling aio_poll() unconditionally and including its return value
> in the loop condition would be the cleaner approach?

Note that if you have no block device then you don't have an AioContext
for aio_poll() to use.  In fact, we're really close to removing the
AioContext dependency for submitting I/O requests (AioContext remains
only as a place for network drivers to register their sockets' I/O
handlers), so you shouldn't use it from block-backend.c.

That's why I suggested returning NULL from AIO operations if there is no
blk_bs(blk).  It may be a little less tidy (I'm more worried about
static analysis lossage than anything else), but it does make some
sense.  Most callers are checking blk_is_available or blk_is_inserted
already, which explains why we're not seeing more crashes in
block-backend.c.

If you can set aside the blk_bs(blk) == NULL case, you're basically
reinventing BDRV_POLL_WHILE at this point.  If you add a bdrv_wakeup to
blk_dec_in_flight, and use BDRV_POLL_WHILE in blk_drain, then things
should in theory just work (though most likely they won't for some
reason that takes hours to debug).

Paolo

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Stefan Hajnoczi 6 years, 8 months ago

On Fri, Aug 04, 2017 at 01:46:17PM +0200, Paolo Bonzini wrote:
> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
> >> the root cause of this bug is related to this as well:
> >> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
> >>
> >> From commit 99723548 we started assuming (incorrectly?) that blk_
> >> functions always WILL have an attached BDS, but this is not always true,
> >> for instance, flushing the cache from an empty CDROM.
> >>
> >> Paolo, can we move the flight counter increment outside of the
> >> block-backend layer, is that safe?
> > I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
> > regardless of the throttling timer issue discussed below.  BB cannot
> > assume that the BDS graph is non-empty.
> 
> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
> attached BDS?  That would make it much easier to fix.

There are many blk_aio_*() callers.  Returning NULL forces them to
perform extra error handling.

When you say "temporarily" do you mean it returns NULL but schedules a
one-shot BH to invoke the callback?  I wonder if we can use a singleton
aiocb instead of NULL for -ENOMEDIUM errors.

Stefan

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Kevin Wolf 6 years, 8 months ago

Am 08.08.2017 um 14:53 hat Stefan Hajnoczi geschrieben:
> On Fri, Aug 04, 2017 at 01:46:17PM +0200, Paolo Bonzini wrote:
> > On 04/08/2017 11:58, Stefan Hajnoczi wrote:
> > >> the root cause of this bug is related to this as well:
> > >> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
> > >>
> > >> From commit 99723548 we started assuming (incorrectly?) that blk_
> > >> functions always WILL have an attached BDS, but this is not always true,
> > >> for instance, flushing the cache from an empty CDROM.
> > >>
> > >> Paolo, can we move the flight counter increment outside of the
> > >> block-backend layer, is that safe?
> > > I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
> > > regardless of the throttling timer issue discussed below.  BB cannot
> > > assume that the BDS graph is non-empty.
> > 
> > Can we make bdrv_aio_* return NULL (even temporarily) if there is no
> > attached BDS?  That would make it much easier to fix.
> 
> There are many blk_aio_*() callers.  Returning NULL forces them to
> perform extra error handling.

Yes, that's my concern. We removed NULL returns a long time ago. Most
callers probably don't check for it any more.

> When you say "temporarily" do you mean it returns NULL but schedules a
> one-shot BH to invoke the callback?  I wonder if we can use a singleton
> aiocb instead of NULL for -ENOMEDIUM errors.

This doesn't help. As soon as you involve BHs, you need to consider them
during blk_drain(), otherwise the drain can return too early.

And if you want to consider them during blk_drain()... well, I made an
attempt, maybe we can make it work with some more changes. But I'm
starting to see that it's not a trivial change; though admittedly, the
NULL return thing doesn't look trivial either.

Kevin

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Paolo Bonzini 6 years, 8 months ago

On 08/08/2017 14:53, Stefan Hajnoczi wrote:
> On Fri, Aug 04, 2017 at 01:46:17PM +0200, Paolo Bonzini wrote:
>> On 04/08/2017 11:58, Stefan Hajnoczi wrote:
>>>> the root cause of this bug is related to this as well:
>>>> https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg02945.html
>>>>
>>>> From commit 99723548 we started assuming (incorrectly?) that blk_
>>>> functions always WILL have an attached BDS, but this is not always true,
>>>> for instance, flushing the cache from an empty CDROM.
>>>>
>>>> Paolo, can we move the flight counter increment outside of the
>>>> block-backend layer, is that safe?
>>> I think the bdrv_inc_in_flight(blk_bs(blk)) needs to be fixed
>>> regardless of the throttling timer issue discussed below.  BB cannot
>>> assume that the BDS graph is non-empty.
>>
>> Can we make bdrv_aio_* return NULL (even temporarily) if there is no
>> attached BDS?  That would make it much easier to fix.
> 
> There are many blk_aio_*() callers.  Returning NULL forces them to
> perform extra error handling.

Most of them either are for non-removable devices and check for a
non-empty BB at startup.  The others (ide-cd and scsi-cd, sd) check the
BlockBackend's blk_is_available or blk_is_inserted state themselves.

It does require some auditing of course, remember that anything that
would return NULL, would now be crashing already in bdrv_inc_in_flight.
We'd be seeing NULL pointer dereferences left and right, if it were so bad.

> When you say "temporarily" do you mean it returns NULL but schedules a
> one-shot BH to invoke the callback?  I wonder if we can use a singleton
> aiocb instead of NULL for -ENOMEDIUM errors.

No, I mean undoing that in 2.11.

Paolo

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Alberto Garcia 6 years, 8 months ago

On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
> --- a/vl.c
> +++ b/vl.c
> @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
>      replay_disable_events();
>      iothread_stop_all();
>  
> -    bdrv_close_all();
>      pause_all_vcpus();
> +    bdrv_close_all();
>      res_free();

I haven't debugged it yet, but in my computer iotest 093 stops working
(it never finishes) after this change.

Berto

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Dr. David Alan Gilbert 6 years, 8 months ago

* Alberto Garcia (berto@igalia.com) wrote:
> On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git) wrote:
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
> >      replay_disable_events();
> >      iothread_stop_all();
> >  
> > -    bdrv_close_all();
> >      pause_all_vcpus();
> > +    bdrv_close_all();
> >      res_free();
> 
> I haven't debugged it yet, but in my computer iotest 093 stops working
> (it never finishes) after this change.

Yes, I can reproduce that here (I've got to explicitly run 093 - it
doesn't do it automatically for me):

(gdb) where
#0  0x00007feee3780b76 in ppoll () at /lib64/libc.so.6
#1  0x0000564aed293199 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>)
    at /usr/include/bits/poll2.h:77
#2  0x0000564aed293199 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>)
    at /home/dgilbert/git/qemu/util/qemu-timer.c:322
#3  0x0000564aed294de1 in aio_poll (ctx=ctx@entry=0x564aefa578e0, blocking=<optimized out>)
    at /home/dgilbert/git/qemu/util/aio-posix.c:622
#4  0x0000564aed215924 in bdrv_drain_recurse (bs=bs@entry=0x564aefa6c320) at /home/dgilbert/git/qemu/block/io.c:188
#5  0x0000564aed216345 in bdrv_drain_all_begin () at /home/dgilbert/git/qemu/block/io.c:353
#6  0x0000564aed2164b9 in bdrv_drain_all () at /home/dgilbert/git/qemu/block/io.c:382
#7  0x0000564aed1c9b63 in bdrv_close_all () at /home/dgilbert/git/qemu/block.c:3113
#8  0x0000564aeceb04fe in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
    at /home/dgilbert/git/qemu/vl.c:4795

Dave

> Berto
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Paolo Bonzini 6 years, 8 months ago


----- Original Message -----
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> To: "Alberto Garcia" <berto@igalia.com>
> Cc: qemu-devel@nongnu.org, pbonzini@redhat.com, jsnow@redhat.com
> Sent: Thursday, August 3, 2017 6:45:17 PM
> Subject: Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
> 
> * Alberto Garcia (berto@igalia.com) wrote:
> > On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git)
> > wrote:
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
> > >      replay_disable_events();
> > >      iothread_stop_all();
> > >  
> > > -    bdrv_close_all();
> > >      pause_all_vcpus();
> > > +    bdrv_close_all();
> > >      res_free();
> > 
> > I haven't debugged it yet, but in my computer iotest 093 stops working
> > (it never finishes) after this change.
> 
> Yes, I can reproduce that here (I've got to explicitly run 093 - it
> doesn't do it automatically for me):

The culprit so to speak is this:

        if (qtest_enabled()) {
            /* For testing block IO throttling only */
            tg->clock_type = QEMU_CLOCK_VIRTUAL;
        }

So after pause_all_vcpus(), the clock doesn't advance and bdrv_close_all
hangs.  Should throttling be disabled by the time bdrv_close drains the
BlockDriverState, and likewise for bdrv_close_all?

Paolo

Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices

Posted by Stefan Hajnoczi 6 years, 8 months ago

On Thu, Aug 3, 2017 at 11:36 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> ----- Original Message -----
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> To: "Alberto Garcia" <berto@igalia.com>
>> Cc: qemu-devel@nongnu.org, pbonzini@redhat.com, jsnow@redhat.com
>> Sent: Thursday, August 3, 2017 6:45:17 PM
>> Subject: Re: [Qemu-devel] [PATCH] vl.c/exit: pause cpus before closing block devices
>>
>> * Alberto Garcia (berto@igalia.com) wrote:
>> > On Thu, Jul 13, 2017 at 08:01:16PM +0100, Dr. David Alan Gilbert (git)
>> > wrote:
>> > > --- a/vl.c
>> > > +++ b/vl.c
>> > > @@ -4787,8 +4787,8 @@ int main(int argc, char **argv, char **envp)
>> > >      replay_disable_events();
>> > >      iothread_stop_all();
>> > >
>> > > -    bdrv_close_all();
>> > >      pause_all_vcpus();
>> > > +    bdrv_close_all();
>> > >      res_free();
>> >
>> > I haven't debugged it yet, but in my computer iotest 093 stops working
>> > (it never finishes) after this change.
>>
>> Yes, I can reproduce that here (I've got to explicitly run 093 - it
>> doesn't do it automatically for me):
>
> The culprit so to speak is this:
>
>         if (qtest_enabled()) {
>             /* For testing block IO throttling only */
>             tg->clock_type = QEMU_CLOCK_VIRTUAL;
>         }
>
> So after pause_all_vcpus(), the clock doesn't advance and bdrv_close_all
> hangs.  Should throttling be disabled by the time bdrv_close drains the
> BlockDriverState, and likewise for bdrv_close_all?

bdrv_drain_all() is called before blk_remove_all_bs(), so throttling
is still enabled while draining:

void bdrv_close_all(void)
{
    block_job_cancel_sync_all();
    nbd_export_close_all();

    /* Drop references from requests still in flight, such as canceled block
     * jobs whose AIO context has not been polled yet */
    bdrv_drain_all();

    blk_remove_all_bs();

Perhaps we need a bdrv_throttle_disable_all() function.  Manos is
moving throttling to a block driver so there is no longer a 1:1
relationship between BB and throttling - there might be multiple
throtting nodes in a BDS graph.

Stefan