[PATCH v3] migration: Don't allow migration if vm is in POSTMIGRATE

Tuguoyi posted 1 patch 3 years, 4 months ago
Posted by Tuguoyi 3 years, 4 months ago
The following steps cause a QEMU assertion failure:
- pause the VM with 'virsh suspend'
- create an external snapshot of memory and disk with 'virsh snapshot-create-as'
- repeat the above operation; QEMU then crashes

The backtrace looks like:
#0  0x00007fbf958c5c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007fbf958c9028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007fbf958bebf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007fbf958beca2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x000055ca8decd39d in bdrv_inactivate_recurse (bs=0x55ca90c80400) at /build/qemu-5.0/block.c:5724
#5  0x000055ca8dece967 in bdrv_inactivate_all () at /build//qemu-5.0/block.c:5792
#6  0x000055ca8de5539d in qemu_savevm_state_complete_precopy_non_iterable (inactivate_disks=true, in_postcopy=false, f=0x55ca907044b0)
    at /build/qemu-5.0/migration/savevm.c:1401
#7  qemu_savevm_state_complete_precopy (f=0x55ca907044b0, iterable_only=iterable_only@entry=false, inactivate_disks=inactivate_disks@entry=true)
    at /build/qemu-5.0/migration/savevm.c:1453
#8  0x000055ca8de4f581 in migration_completion (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:2941
#9  migration_iteration_run (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3295
#10 migration_thread (opaque=opaque@entry=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3459
#11 0x000055ca8dfc6716 in qemu_thread_start (args=<optimized out>) at /build/qemu-5.0/util/qemu-thread-posix.c:519
#12 0x00007fbf95c5f184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007fbf9598cbed in clone () from /lib/x86_64-linux-gnu/libc.so.6

When the first migration completes, bdrv_inactivate_all() sets the
BDRV_O_INACTIVE flag in bs->open_flags. During the second migration,
bdrv_inactivate_recurse() asserts that BDRV_O_INACTIVE is not already set
in bs->open_flags; since it is, the assertion fails and QEMU crashes.
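
The double inactivation described above can be pictured with a small
stand-alone model. This is only a sketch, not QEMU code: ToyBlockState,
TOY_O_INACTIVE and toy_inactivate() are made-up names, the flag value is
arbitrary, and the function returns false at the point where the backtrace
shows bdrv_inactivate_recurse() aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BDRV_O_INACTIVE; the bit value is arbitrary. */
#define TOY_O_INACTIVE 0x0800

/* Hypothetical stand-in for a block node: only the flags word is modeled. */
typedef struct ToyBlockState {
    int open_flags;
} ToyBlockState;

/*
 * Model of inactivating a node when a migration completes.
 * Returns true if the node was active and is now marked inactive.
 * Returns false if the node was already inactive, which is the case
 * where the real code fails its assertion and calls abort().
 */
bool toy_inactivate(ToyBlockState *bs)
{
    assert(bs != NULL);

    if (bs->open_flags & TOY_O_INACTIVE) {
        return false;   /* second migration: flag left over from the first */
    }

    bs->open_flags |= TOY_O_INACTIVE;
    return true;
}
```

Calling toy_inactivate() twice on the same node succeeds the first time and
fails the second time, because nothing clears the flag in between.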

As Vladimir suggested, this patch makes migrate_prepare() check the VM state
and return an error if it is RUN_STATE_POSTMIGRATE.
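
As a rough illustration of the guard, here is a self-contained model of the
check. It is a sketch under assumptions: ToyRunState and
toy_migrate_prepare() are hypothetical names, the run state is passed in as
an argument instead of being queried with runstate_check(), and a plain
string pointer stands in for the Error object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of run states; only POSTMIGRATE matters here. */
typedef enum ToyRunState {
    TOY_RUN_STATE_RUNNING,
    TOY_RUN_STATE_PAUSED,
    TOY_RUN_STATE_POSTMIGRATE
} ToyRunState;

/*
 * Model of the new check: refuse to start a migration when the VM is
 * still paused from a previous migration. On refusal, *errp points to
 * a static message; otherwise *errp is set to NULL.
 */
bool toy_migrate_prepare(ToyRunState state, const char **errp)
{
    assert(errp != NULL);

    if (state == TOY_RUN_STATE_POSTMIGRATE) {
        *errp = "Can't migrate the vm that was paused due to "
                "previous migration";
        return false;
    }

    *errp = NULL;
    return true;
}
```

With this guard in front, a second migration request in the POSTMIGRATE
state is rejected with an error before any block device is inactivated
again.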

Signed-off-by: Tuguoyi <tu.guoyi@h3c.com>
---
 migration/migration.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 87a9b59..5e33962 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2115,6 +2115,12 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
         return false;
     }
 
+    if (runstate_check(RUN_STATE_POSTMIGRATE)) {
+        error_setg(errp, "Can't migrate the vm that was paused due to "
+                   "previous migration");
+        return false;
+    }
+
     if (migration_is_blocked(errp)) {
         return false;
     }
-- 
2.7.4

[Patch v2]: https://lists.gnu.org/archive/html/qemu-devel/2020-12/msg01318.html
[Patch v1]: https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05950.html
Re: [PATCH v3] migration: Don't allow migration if vm is in POSTMIGRATE
Posted by Pankaj Gupta 3 years, 4 months ago
> The following steps cause a QEMU assertion failure:
> - pause the VM with 'virsh suspend'
> - create an external snapshot of memory and disk with 'virsh snapshot-create-as'
> - repeat the above operation; QEMU then crashes
>
> The backtrace looks like:
> #0  0x00007fbf958c5c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007fbf958c9028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007fbf958bebf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x00007fbf958beca2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x000055ca8decd39d in bdrv_inactivate_recurse (bs=0x55ca90c80400) at /build/qemu-5.0/block.c:5724
> #5  0x000055ca8dece967 in bdrv_inactivate_all () at /build//qemu-5.0/block.c:5792
> #6  0x000055ca8de5539d in qemu_savevm_state_complete_precopy_non_iterable (inactivate_disks=true, in_postcopy=false, f=0x55ca907044b0)
>     at /build/qemu-5.0/migration/savevm.c:1401
> #7  qemu_savevm_state_complete_precopy (f=0x55ca907044b0, iterable_only=iterable_only@entry=false, inactivate_disks=inactivate_disks@entry=true)
>     at /build/qemu-5.0/migration/savevm.c:1453
> #8  0x000055ca8de4f581 in migration_completion (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:2941
> #9  migration_iteration_run (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3295
> #10 migration_thread (opaque=opaque@entry=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3459
> #11 0x000055ca8dfc6716 in qemu_thread_start (args=<optimized out>) at /build/qemu-5.0/util/qemu-thread-posix.c:519
> #12 0x00007fbf95c5f184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #13 0x00007fbf9598cbed in clone () from /lib/x86_64-linux-gnu/libc.so.6
>
> When the first migration completes, bdrv_inactivate_all() sets the
> BDRV_O_INACTIVE flag in bs->open_flags. During the second migration,
> bdrv_inactivate_recurse() asserts that BDRV_O_INACTIVE is not already set
> in bs->open_flags; since it is, the assertion fails and QEMU crashes.
>
> As Vladimir suggested, this patch makes migrate_prepare() check the VM state
> and return an error if it is RUN_STATE_POSTMIGRATE.
>
> Signed-off-by: Tuguoyi <tu.guoyi@h3c.com>
A similar issue was reported by Li Zhang (+CC), with an almost identical
patch [3] to fix it.

Reported-by: Li Zhang <li.zhang@cloud.ionos.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>

[3] https://marc.info/?l=qemu-devel&m=160749859831357&w=2
> ---
>  migration/migration.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 87a9b59..5e33962 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2115,6 +2115,12 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
>          return false;
>      }
>
> +    if (runstate_check(RUN_STATE_POSTMIGRATE)) {
> +        error_setg(errp, "Can't migrate the vm that was paused due to "
> +                   "previous migration");
> +        return false;
> +    }
> +
>      if (migration_is_blocked(errp)) {
>          return false;
>      }
> --
> 2.7.4
>
> [Patch v2]: https://lists.gnu.org/archive/html/qemu-devel/2020-12/msg01318.html
> [Patch v1]: https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05950.html

RE: [PATCH v3] migration: Don't allow migration if vm is in POSTMIGRATE
Posted by Tuguoyi 3 years, 4 months ago
Ping.
It seems no one has picked up this patch yet.


> -----Original Message-----
> From: Pankaj Gupta [mailto:pankaj.gupta.linux@gmail.com]
> Sent: Wednesday, December 09, 2020 10:21 PM
> To: tuguoyi (Cloud) <tu.guoyi@h3c.com>
> Cc: Juan Quintela <quintela@redhat.com>; Dr. David Alan Gilbert
> <dgilbert@redhat.com>; vsementsov@virtuozzo.com;
> qemu-devel@nongnu.org; Li Zhang <li.zhang@cloud.ionos.com>
> Subject: Re: [PATCH v3] migration: Don't allow migration if vm is in
> POSTMIGRATE
Re: [PATCH v3] migration: Don't allow migration if vm is in POSTMIGRATE
Posted by Dr. David Alan Gilbert 3 years, 4 months ago
* Tuguoyi (tu.guoyi@h3c.com) wrote:
> The following steps cause a QEMU assertion failure:
> - pause the VM with 'virsh suspend'
> - create an external snapshot of memory and disk with 'virsh snapshot-create-as'
> - repeat the above operation; QEMU then crashes
> 
> The backtrace looks like:
> #0  0x00007fbf958c5c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007fbf958c9028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007fbf958bebf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x00007fbf958beca2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x000055ca8decd39d in bdrv_inactivate_recurse (bs=0x55ca90c80400) at /build/qemu-5.0/block.c:5724
> #5  0x000055ca8dece967 in bdrv_inactivate_all () at /build//qemu-5.0/block.c:5792
> #6  0x000055ca8de5539d in qemu_savevm_state_complete_precopy_non_iterable (inactivate_disks=true, in_postcopy=false, f=0x55ca907044b0)
>     at /build/qemu-5.0/migration/savevm.c:1401
> #7  qemu_savevm_state_complete_precopy (f=0x55ca907044b0, iterable_only=iterable_only@entry=false, inactivate_disks=inactivate_disks@entry=true)
>     at /build/qemu-5.0/migration/savevm.c:1453
> #8  0x000055ca8de4f581 in migration_completion (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:2941
> #9  migration_iteration_run (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3295
> #10 migration_thread (opaque=opaque@entry=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3459
> #11 0x000055ca8dfc6716 in qemu_thread_start (args=<optimized out>) at /build/qemu-5.0/util/qemu-thread-posix.c:519
> #12 0x00007fbf95c5f184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #13 0x00007fbf9598cbed in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> When the first migration completes, bdrv_inactivate_all() sets the
> BDRV_O_INACTIVE flag in bs->open_flags. During the second migration,
> bdrv_inactivate_recurse() asserts that BDRV_O_INACTIVE is not already set
> in bs->open_flags; since it is, the assertion fails and QEMU crashes.
> 
> As Vladimir suggested, this patch makes migrate_prepare() check the VM state
> and return an error if it is RUN_STATE_POSTMIGRATE.
> 
> Signed-off-by: Tuguoyi <tu.guoyi@h3c.com>

Yes, we've had this problem for a long, long time, and there are a bunch of
other similar conditions. The real answer is to figure out some command
for explicitly handing back control of the block devices (without
actually restarting the CPUs).

However, this is a reasonable patch to cover the common case.

Queued

> ---
>  migration/migration.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 87a9b59..5e33962 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2115,6 +2115,12 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
>          return false;
>      }
>  
> +    if (runstate_check(RUN_STATE_POSTMIGRATE)) {
> +        error_setg(errp, "Can't migrate the vm that was paused due to "
> +                   "previous migration");
> +        return false;
> +    }
> +
>      if (migration_is_blocked(errp)) {
>          return false;
>      }
> -- 
> 2.7.4
> 
> [Patch v2]: https://lists.gnu.org/archive/html/qemu-devel/2020-12/msg01318.html
> [Patch v1]: https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg05950.html
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Re: [PATCH v3] migration: Don't allow migration if vm is in POSTMIGRATE
Posted by Vladimir Sementsov-Ogievskiy 3 years, 4 months ago
08.12.2020 04:46, Tuguoyi wrote:
> The following steps cause a QEMU assertion failure:
> - pause the VM with 'virsh suspend'
> - create an external snapshot of memory and disk with 'virsh snapshot-create-as'
> - repeat the above operation; QEMU then crashes
> 
> The backtrace looks like:
> #0  0x00007fbf958c5c37 in raise () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007fbf958c9028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007fbf958bebf6 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x00007fbf958beca2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
> #4  0x000055ca8decd39d in bdrv_inactivate_recurse (bs=0x55ca90c80400) at /build/qemu-5.0/block.c:5724
> #5  0x000055ca8dece967 in bdrv_inactivate_all () at /build//qemu-5.0/block.c:5792
> #6  0x000055ca8de5539d in qemu_savevm_state_complete_precopy_non_iterable (inactivate_disks=true, in_postcopy=false, f=0x55ca907044b0)
>      at /build/qemu-5.0/migration/savevm.c:1401
> #7  qemu_savevm_state_complete_precopy (f=0x55ca907044b0, iterable_only=iterable_only@entry=false, inactivate_disks=inactivate_disks@entry=true)
>      at /build/qemu-5.0/migration/savevm.c:1453
> #8  0x000055ca8de4f581 in migration_completion (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:2941
> #9  migration_iteration_run (s=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3295
> #10 migration_thread (opaque=opaque@entry=0x55ca8f64d9f0) at /build/qemu-5.0/migration/migration.c:3459
> #11 0x000055ca8dfc6716 in qemu_thread_start (args=<optimized out>) at /build/qemu-5.0/util/qemu-thread-posix.c:519
> #12 0x00007fbf95c5f184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #13 0x00007fbf9598cbed in clone () from /lib/x86_64-linux-gnu/libc.so.6
> 
> When the first migration completes, bdrv_inactivate_all() sets the
> BDRV_O_INACTIVE flag in bs->open_flags. During the second migration,
> bdrv_inactivate_recurse() asserts that BDRV_O_INACTIVE is not already set
> in bs->open_flags; since it is, the assertion fails and QEMU crashes.
> 
> As Vladimir suggested, this patch makes migrate_prepare() check the VM state
> and return an error if it is RUN_STATE_POSTMIGRATE.
> 
> Signed-off-by: Tuguoyi <tu.guoyi@h3c.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir