[Qemu-devel] [PATCH v2] throttle: fix a qemu crash problem when calling blk_delete

sochin jiang posted 1 patch 6 years, 6 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/1508816031-82709-1-git-send-email-sochin.jiang@huawei.com
Test checkpatch passed
Test docker passed
Test s390x passed
block/block-backend.c   | 2 +-
block/throttle-groups.c | 4 +++-
2 files changed, 4 insertions(+), 2 deletions(-)
[Qemu-devel] [PATCH v2] throttle: fix a qemu crash problem when calling blk_delete
Posted by sochin jiang 6 years, 6 months ago
commit 7ca7f0 moves the throttling related part of the BDS life cycle
management to BlockBackend, adds call to
throttle_timers_detach_aio_context in blk_remove_bs.  commit 1606e
remove a block device from its throttle group in blk_delete by calling
blk_io_limits_disable, this fix an easily reproducible qemu crash. But
delete a BB without a BDS inserted could easily cause a qemu crash too
by calling bdrv_drained_begin in blk_io_limits_disable. Say, a simply
drive_add and then a drive_del command.

This patch removes draining BDS by calling throttle_group_unregister_tgm
directly instead of blk_io_limits_disable, leaves draining operation to
blk_remove_bs in case that there is no BDS inserted. Futhermore, make sure
throttle timers are initialized or attached before throttle_timers_destroy
is called in throttle_group_unregister_tgm.

Signed-off-by: sochin jiang <sochin.jiang@huawei.com>
---
 block/block-backend.c   | 2 +-
 block/throttle-groups.c | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 45d9101..39c7cca 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -341,7 +341,7 @@ static void blk_delete(BlockBackend *blk)
     assert(!blk->name);
     assert(!blk->dev);
     if (blk->public.throttle_group_member.throttle_state) {
-        blk_io_limits_disable(blk);
+        throttle_group_unregister_tgm(&blk->public.throttle_group_member);
     }
     if (blk->root) {
         blk_remove_bs(blk);
diff --git a/block/throttle-groups.c b/block/throttle-groups.c
index b291a88..c5f9af3 100644
--- a/block/throttle-groups.c
+++ b/block/throttle-groups.c
@@ -576,7 +576,9 @@ void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
 
     /* remove the current tgm from the list */
     QLIST_REMOVE(tgm, round_robin);
-    throttle_timers_destroy(&tgm->throttle_timers);
+    if (throttle_timers_are_initialized(&tgm->throttle_timers)) {
+        throttle_timers_destroy(&tgm->throttle_timers);
+    }
     qemu_mutex_unlock(&tg->lock);
 
     throttle_group_unref(&tg->ts);
-- 
1.8.3.1


Re: [Qemu-devel] [PATCH v2] throttle: fix a qemu crash problem when calling blk_delete
Posted by Alberto Garcia 6 years, 6 months ago
On Tue 24 Oct 2017 05:33:51 AM CEST, sochin jiang wrote:
> --- a/block/throttle-groups.c
> +++ b/block/throttle-groups.c
> @@ -576,7 +576,9 @@ void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
>  
>      /* remove the current tgm from the list */
>      QLIST_REMOVE(tgm, round_robin);
> -    throttle_timers_destroy(&tgm->throttle_timers);
> +    if (throttle_timers_are_initialized(&tgm->throttle_timers)) {
> +        throttle_timers_destroy(&tgm->throttle_timers);
> +    }
>      qemu_mutex_unlock(&tg->lock);
>  
>      throttle_group_unref(&tg->ts);

I don't know what the rest of the people think, but I'm not completely
convinced that it's a good idea to have an active ThrottleState inside a
ThrottleGroupMember without timers.

Perhaps in blk_remove_bs() after detaching the AioContext from the BDS
we can attach the default one (what you would get with
blk_get_aio_context()).

On the other hand I think that Manos's series to remove the legacy
throttling code gets rid of BlockBackend.ThrottleGroupMember completely.

Berto

Re: [Qemu-devel] [Qemu-block] [PATCH v2] throttle: fix a qemu crash problem when calling blk_delete
Posted by Stefan Hajnoczi 6 years, 5 months ago
On Wed, Oct 25, 2017 at 10:40:47AM +0200, Alberto Garcia wrote:
> On Tue 24 Oct 2017 05:33:51 AM CEST, sochin jiang wrote:
> > --- a/block/throttle-groups.c
> > +++ b/block/throttle-groups.c
> > @@ -576,7 +576,9 @@ void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
> >  
> >      /* remove the current tgm from the list */
> >      QLIST_REMOVE(tgm, round_robin);
> > -    throttle_timers_destroy(&tgm->throttle_timers);
> > +    if (throttle_timers_are_initialized(&tgm->throttle_timers)) {
> > +        throttle_timers_destroy(&tgm->throttle_timers);
> > +    }
> >      qemu_mutex_unlock(&tg->lock);
> >  
> >      throttle_group_unref(&tg->ts);
> 
> I don't know what the rest of the people think, but I'm not completely
> convinced that it's a good idea to have an active ThrottleState inside a
> ThrottleGroupMember without timers.
> 
> Perhaps in blk_remove_bs() after detaching the AioContext from the BDS
> we can attach the default one (what you would get with
> blk_get_aio_context()).
> 
> On the other hand I think that Manos's series to remove the legacy
> throttling code gets rid of BlockBackend.ThrottleGroupMember completely.

What is the status of Manos' series?

It would be good to merge it and then consider currently open throttling
bugs again.

Stefan
Re: [Qemu-devel] [Qemu-block] [PATCH v2] throttle: fix a qemu crash problem when calling blk_delete
Posted by Manos Pitsidianakis 6 years, 5 months ago
On Fri, Nov 03, 2017 at 02:26:33PM +0000, Stefan Hajnoczi wrote:
>On Wed, Oct 25, 2017 at 10:40:47AM +0200, Alberto Garcia wrote:
>> On Tue 24 Oct 2017 05:33:51 AM CEST, sochin jiang wrote:
>> > --- a/block/throttle-groups.c
>> > +++ b/block/throttle-groups.c
>> > @@ -576,7 +576,9 @@ void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
>> >
>> >      /* remove the current tgm from the list */
>> >      QLIST_REMOVE(tgm, round_robin);
>> > -    throttle_timers_destroy(&tgm->throttle_timers);
>> > +    if (throttle_timers_are_initialized(&tgm->throttle_timers)) {
>> > +        throttle_timers_destroy(&tgm->throttle_timers);
>> > +    }
>> >      qemu_mutex_unlock(&tg->lock);
>> >
>> >      throttle_group_unref(&tg->ts);
>>
>> I don't know what the rest of the people think, but I'm not completely
>> convinced that it's a good idea to have an active ThrottleState inside a
>> ThrottleGroupMember without timers.
>>
>> Perhaps in blk_remove_bs() after detaching the AioContext from the BDS
>> we can attach the default one (what you would get with
>> blk_get_aio_context()).
>>
>> On the other hand I think that Manos's series to remove the legacy
>> throttling code gets rid of BlockBackend.ThrottleGroupMember completely.
>
>What is the status of Manos' series?
>
>It would be good to merge it and then consider currently open throttling
>bugs again.
>

I have been busy with schoolwork the past weeks, but I'm close to 
finishing some changes Kevin recommended for this series. I hope they 
will be ready soon.
Re: [Qemu-devel] [Qemu-block] [PATCH v2] throttle: fix a qemu crash problem when calling blk_delete
Posted by Stefan Hajnoczi 6 years, 5 months ago
On Tue, Oct 24, 2017 at 11:33:51AM +0800, sochin jiang wrote:
> commit 7ca7f0 moves the throttling related part of the BDS life cycle
> management to BlockBackend, adds call to
> throttle_timers_detach_aio_context in blk_remove_bs.  commit 1606e
> remove a block device from its throttle group in blk_delete by calling
> blk_io_limits_disable, this fix an easily reproducible qemu crash. But
> delete a BB without a BDS inserted could easily cause a qemu crash too
> by calling bdrv_drained_begin in blk_io_limits_disable. Say, a simply
> drive_add and then a drive_del command.
> 
> This patch removes draining BDS by calling throttle_group_unregister_tgm
> directly instead of blk_io_limits_disable, leaves draining operation to
> blk_remove_bs in case that there is no BDS inserted. Futhermore, make sure
> throttle timers are initialized or attached before throttle_timers_destroy
> is called in throttle_group_unregister_tgm.
> 
> Signed-off-by: sochin jiang <sochin.jiang@huawei.com>
> ---
>  block/block-backend.c   | 2 +-
>  block/throttle-groups.c | 4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 45d9101..39c7cca 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -341,7 +341,7 @@ static void blk_delete(BlockBackend *blk)
>      assert(!blk->name);
>      assert(!blk->dev);
>      if (blk->public.throttle_group_member.throttle_state) {
> -        blk_io_limits_disable(blk);
> +        throttle_group_unregister_tgm(&blk->public.throttle_group_member);

The following assertions fail without the drain when there are pending
requests:

  void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
  {
      ThrottleState *ts = tgm->throttle_state;
      ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
      ThrottleGroupMember *token;
      int i;

      if (!ts) {
          /* Discard already unregistered tgm */
          return;
      }

      assert(tgm->pending_reqs[0] == 0 && tgm->pending_reqs[1] == 0);
      assert(qemu_co_queue_empty(&tgm->throttled_reqs[0]));
      assert(qemu_co_queue_empty(&tgm->throttled_reqs[1]));

A safer approach is making blk_io_limits_disable(blk) skip the draining
when blk_bs(blk) == NULL.  That is the only case where we are 100% sure
that there are no pending requests.
Re: [Qemu-devel] [Qemu-block] [PATCH v2] throttle: fix a qemu crash problem when calling blk_delete
Posted by Alberto Garcia 6 years, 5 months ago
On Thu 09 Nov 2017 06:12:10 PM CET, Stefan Hajnoczi wrote:
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index 45d9101..39c7cca 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -341,7 +341,7 @@ static void blk_delete(BlockBackend *blk)
>>      assert(!blk->name);
>>      assert(!blk->dev);
>>      if (blk->public.throttle_group_member.throttle_state) {
>> -        blk_io_limits_disable(blk);
>> +        throttle_group_unregister_tgm(&blk->public.throttle_group_member);
>
> The following assertions fail without the drain when there are pending
> requests:
>
>   void throttle_group_unregister_tgm(ThrottleGroupMember *tgm)
>   {
>       ThrottleState *ts = tgm->throttle_state;
>       ThrottleGroup *tg = container_of(ts, ThrottleGroup, ts);
>       ThrottleGroupMember *token;
>       int i;
>
>       if (!ts) {
>           /* Discard already unregistered tgm */
>           return;
>       }
>
>       assert(tgm->pending_reqs[0] == 0 && tgm->pending_reqs[1] == 0);
>       assert(qemu_co_queue_empty(&tgm->throttled_reqs[0]));
>       assert(qemu_co_queue_empty(&tgm->throttled_reqs[1]));
>
> A safer approach is making blk_io_limits_disable(blk) skip the
> draining when blk_bs(blk) == NULL.  That is the only case where we are
> 100% sure that there are no pending requests.

I think so too.

And as I said in a previous e-mail I don't know if we should have a
valid blk->public.throttle_group_member.throttle_state with no timers at
all. I'd rather attach the default AioContext while the BDS is removed.

I'll try to prepare a patch with all of this together.

Berto