[Qemu-devel] [PATCH] migration: Don't activate block devices if using -S

Dr. David Alan Gilbert (git) posted 1 patch 6 years ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20180328170207.49512-1-dgilbert@redhat.com
Test checkpatch passed
Test docker-build@min-glib failed
Test docker-mingw@fedora passed
Test docker-quick@centos6 failed
Test s390x passed
There is a newer version of this series
migration/migration.c | 22 +++++++++++++++-------
1 file changed, 15 insertions(+), 7 deletions(-)
[Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Dr. David Alan Gilbert (git) 6 years ago
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Activating the block devices causes the locks to be taken on
the backing file.  If we're running with -S and the destination libvirt
hasn't started the destination with 'cont', it expects the locks to
still be untaken.

Don't activate the block devices if we're not going to autostart the VM;
'cont' will do that anyway.

bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 52a5092add..58bd382730 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -306,13 +306,21 @@ static void process_incoming_migration_bh(void *opaque)
     Error *local_err = NULL;
     MigrationIncomingState *mis = opaque;
 
-    /* Make sure all file formats flush their mutable metadata.
-     * If we get an error here, just don't restart the VM yet. */
-    bdrv_invalidate_cache_all(&local_err);
-    if (local_err) {
-        error_report_err(local_err);
-        local_err = NULL;
-        autostart = false;
+    /* Only fire up the block code now if we're going to restart the
+     * VM, else 'cont' will do it.
+     * This causes file locking to happen; so we don't want it to happen
+     * unless we really are starting the VM.
+     */
+    if (autostart && (!global_state_received() ||
+        global_state_get_runstate() == RUN_STATE_RUNNING)) {
+        /* Make sure all file formats flush their mutable metadata.
+         * If we get an error here, just don't restart the VM yet. */
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            local_err = NULL;
+            autostart = false;
+        }
     }
 
     /*
-- 
2.14.3
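
[Editor's note: for context on why deferring activation is safe, the 'cont'
path re-activates the images itself before (re)starting the VM. A rough,
abridged sketch of qmp_cont() from this era (simplified from memory, not the
exact upstream code):

    void qmp_cont(Error **errp)
    {
        Error *local_err = NULL;

        /* Re-activate all images; this is where the file locks get
         * taken if the incoming side skipped activation above. */
        bdrv_invalidate_cache_all(&local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            return;
        }

        if (runstate_check(RUN_STATE_INMIGRATE)) {
            autostart = true;   /* start once migration completes */
        } else {
            vm_start();
        }
    }
]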


Re: [Qemu-devel] [PATCH for-2.12] migration: Don't activate block devices if using -S
Posted by Eric Blake 6 years ago
On 03/28/2018 12:02 PM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Activating the block devices causes the locks to be taken on
> the backing file.  If we're running with -S and the destination libvirt
> hasn't started the destination with 'cont', it expects the locks to
> still be untaken.
> 
> Don't activate the block devices if we're not going to autostart the VM;
> 'cont' will do that anyway.
> 
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   migration/migration.c | 22 +++++++++++++++-------
>   1 file changed, 15 insertions(+), 7 deletions(-)

Sounds like 2.12 material.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Dr. David Alan Gilbert 6 years ago
* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Activating the block devices causes the locks to be taken on
> the backing file.  If we're running with -S and the destination libvirt
> hasn't started the destination with 'cont', it expects the locks to
> still be untaken.
> 
> Don't activate the block devices if we're not going to autostart the VM;
> 'cont' will do that anyway.
> 
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Queued

> ---
>  migration/migration.c | 22 +++++++++++++++-------
>  1 file changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 52a5092add..58bd382730 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -306,13 +306,21 @@ static void process_incoming_migration_bh(void *opaque)
>      Error *local_err = NULL;
>      MigrationIncomingState *mis = opaque;
>  
> -    /* Make sure all file formats flush their mutable metadata.
> -     * If we get an error here, just don't restart the VM yet. */
> -    bdrv_invalidate_cache_all(&local_err);
> -    if (local_err) {
> -        error_report_err(local_err);
> -        local_err = NULL;
> -        autostart = false;
> +    /* Only fire up the block code now if we're going to restart the
> +     * VM, else 'cont' will do it.
> +     * This causes file locking to happen; so we don't want it to happen
> +     * unless we really are starting the VM.
> +     */
> +    if (autostart && (!global_state_received() ||
> +        global_state_get_runstate() == RUN_STATE_RUNNING)) {
> +        /* Make sure all file formats flush their mutable metadata.
> +         * If we get an error here, just don't restart the VM yet. */
> +        bdrv_invalidate_cache_all(&local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            local_err = NULL;
> +            autostart = false;
> +        }
>      }
>  
>      /*
> -- 
> 2.14.3
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by no-reply@patchew.org 6 years ago
Hi,

This series failed docker-build@min-glib build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

Type: series
Message-id: 20180328170207.49512-1-dgilbert@redhat.com
Subject: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S

=== TEST SCRIPT BEGIN ===
#!/bin/bash
set -e
git submodule update --init dtc
# Let docker tests dump environment info
export SHOW_ENV=1
export J=8
time make docker-test-build@min-glib
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
4862dfc311 migration: Don't activate block devices if using -S

=== OUTPUT BEGIN ===
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-_1xmie25/src/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
  BUILD   min-glib
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-_1xmie25/src'
  GEN     /var/tmp/patchew-tester-tmp-_1xmie25/src/docker-src.2018-03-31-03.55.58.6863/qemu.tar
Cloning into '/var/tmp/patchew-tester-tmp-_1xmie25/src/docker-src.2018-03-31-03.55.58.6863/qemu.tar.vroot'...
done.
Checking out files: 100% (6066/6066), done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-_1xmie25/src/docker-src.2018-03-31-03.55.58.6863/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into '/var/tmp/patchew-tester-tmp-_1xmie25/src/docker-src.2018-03-31-03.55.58.6863/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
tar: /var/tmp/patchew-tester-tmp-_1xmie25/src/docker-src.2018-03-31-03.55.58.6863/qemu.tar: Wrote only 2048 of 10240 bytes
tar: Error is not recoverable: exiting now
failed to create tar file
  COPY    RUNNER
    RUN test-build in qemu:min-glib 
tar: Unexpected EOF in archive
tar: rmtlseek not stopped at a record boundary
tar: Error is not recoverable: exiting now
/var/tmp/qemu/run: line 32: prep_fail: command not found
Environment variables:
HOSTNAME=e1cbd43a966e
MAKEFLAGS= -j8
J=8
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
TARGET_LIST=
SHLVL=1
HOME=/root
TEST_DIR=/tmp/qemu-test
FEATURES= dtc
DEBUG=
_=/usr/bin/env

/var/tmp/qemu/run: line 52: cd: /tmp/qemu-test/src/tests/docker: No such file or directory
/var/tmp/qemu/run: line 57: /test-build: No such file or directory
/var/tmp/qemu/run: line 57: exec: /test-build: cannot execute: No such file or directory
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 407, in <module>
    sys.exit(main())
  File "./tests/docker/docker.py", line 404, in main
    return args.cmdobj.run(args, argv)
  File "./tests/docker/docker.py", line 261, in run
    return Docker().run(argv, args.keep, quiet=args.quiet)
  File "./tests/docker/docker.py", line 229, in run
    quiet=quiet)
  File "./tests/docker/docker.py", line 147, in _do_check
    return subprocess.check_call(self._command + cmd, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'run', '--label', 'com.qemu.instance.uuid=0111c03c34b911e88e0d52540069c830', '-u', '0', '--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/root/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-_1xmie25/src/docker-src.2018-03-31-03.55.58.6863:/var/tmp/qemu:z,ro', 'qemu:min-glib', '/var/tmp/qemu/run', 'test-build']' returned non-zero exit status 126
make[1]: *** [tests/docker/Makefile.include:129: docker-run] Error 1
make[1]: Leaving directory '/var/tmp/patchew-tester-tmp-_1xmie25/src'
make: *** [tests/docker/Makefile.include:163: docker-run-test-build@min-glib] Error 2

real	0m36.921s
user	0m8.999s
sys	0m6.657s
=== OUTPUT END ===

Test command exited with code: 2


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Kevin Wolf 6 years ago
On 28.03.2018 at 19:02, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Activating the block devices causes the locks to be taken on
> the backing file.  If we're running with -S and the destination libvirt
> hasn't started the destination with 'cont', it expects the locks to
> still be untaken.
> 
> Don't activate the block devices if we're not going to autostart the VM;
> 'cont' will do that anyway.
> 
> bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

I'm not sure that this is a good idea. Going back to my old writeup of
the migration phases...

https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html

...the phase between migration completion and 'cont' is described like
this:

    b) Migration converges:
       Both VMs are stopped (assuming -S is given on the destination,
       otherwise this phase is skipped), the destination is in control of
       the resources

This patch changes the definition of the phase so that neither side is
in control of the resources. We lose the phase where the destination is
in control, but the VM isn't running yet. This feels like a problem to
me.

Consider a case where the management tool keeps a mirror job with
sync=none running to expose all I/O requests to some external process.
It needs to shut down the old block job on the source in the
'pre-switchover' state, and start a new block job on the destination
when the destination controls the images, but the VM doesn't run yet (so
that it doesn't miss an I/O request). This patch removes the migration
phase that the management tool needs to implement this correctly.
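
To make the ordering concrete, the flow being described looks roughly
like this (the QMP commands exist in this era, but the device names,
job name and NBD URL are invented for the example):

    # source, once migration pauses in the 'pre-switchover' state:
    { "execute": "block-job-cancel", "arguments": { "device": "mirror0" } }
    { "execute": "migrate-continue", "arguments": { "state": "pre-switchover" } }

    # destination, after migration completes but before the VM runs:
    { "execute": "drive-mirror", "arguments": { "device": "drive0",
        "target": "nbd://mirror-host:10809/export", "sync": "none",
        "mode": "existing" } }
    { "execute": "cont" }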

If we need a "neither side has control" phase, we might need to
introduce it in addition to the existing phases rather than replacing a
phase that is still needed in other cases.

Kevin

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Dr. David Alan Gilbert 6 years ago
* Kevin Wolf (kwolf@redhat.com) wrote:
> On 28.03.2018 at 19:02, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Activating the block devices causes the locks to be taken on
> > the backing file.  If we're running with -S and the destination libvirt
> > hasn't started the destination with 'cont', it expects the locks to
> > still be untaken.
> > 
> > Don't activate the block devices if we're not going to autostart the VM;
> > 'cont' will do that anyway.
> > 
> > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
> I'm not sure that this is a good idea. Going back to my old writeup of
> the migration phases...
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html
> 
> ...the phase between migration completion and 'cont' is described like
> this:
> 
>     b) Migration converges:
>        Both VMs are stopped (assuming -S is given on the destination,
>        otherwise this phase is skipped), the destination is in control of
>        the resources
> 
> This patch changes the definition of the phase so that neither side is
> in control of the resources. We lose the phase where the destination is
> in control, but the VM isn't running yet. This feels like a problem to
> me.

But see Jiri's writeup on that bz;  libvirt is hitting the opposite
problem;   in this corner case they can't have the destination taking
control yet.

> Consider a case where the management tool keeps a mirror job with
> sync=none running to expose all I/O requests to some external process.
> It needs to shut down the old block job on the source in the
> 'pre-switchover' state, and start a new block job on the destination
> when the destination controls the images, but the VM doesn't run yet (so
> that it doesn't miss an I/O request). This patch removes the migration
> phase that the management tool needs to implement this correctly.
> 
> If we need a "neither side has control" phase, we might need to
> introduce it in addition to the existing phases rather than replacing a
> phase that is still needed in other cases.

This is yet another phase to be added.
IMHO this needs the management tool to explicitly take control in the
case you're talking about.

> Kevin

Dave (out this week)

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Kevin Wolf 6 years ago
On 03.04.2018 at 22:52, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > On 28.03.2018 at 19:02, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Activating the block devices causes the locks to be taken on
> > > the backing file.  If we're running with -S and the destination libvirt
> > > hasn't started the destination with 'cont', it expects the locks to
> > > still be untaken.
> > > 
> > > Don't activate the block devices if we're not going to autostart the VM;
> > > 'cont' will do that anyway.
> > > 
> > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > 
> > I'm not sure that this is a good idea. Going back to my old writeup of
> > the migration phases...
> > 
> > https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html
> > 
> > ...the phase between migration completion and 'cont' is described like
> > this:
> > 
> >     b) Migration converges:
> >        Both VMs are stopped (assuming -S is given on the destination,
> >        otherwise this phase is skipped), the destination is in control of
> >        the resources
> > 
> > This patch changes the definition of the phase so that neither side is
> > in control of the resources. We lose the phase where the destination is
> > in control, but the VM isn't running yet. This feels like a problem to
> > me.
> 
> But see Jiri's writeup on that bz;  libvirt is hitting the opposite
> problem;   in this corner case they can't have the destination taking
> control yet.

I wonder if they can't already grant the destination QEMU the necessary
permission in the pre-switchover phase. Just a thought, I don't know how
this works in detail, so it might not be possible after all.

> > Consider a case where the management tool keeps a mirror job with
> > sync=none running to expose all I/O requests to some external process.
> > It needs to shut down the old block job on the source in the
> > 'pre-switchover' state, and start a new block job on the destination
> > when the destination controls the images, but the VM doesn't run yet (so
> > that it doesn't miss an I/O request). This patch removes the migration
> > phase that the management tool needs to implement this correctly.
> > 
> > If we need a "neither side has control" phase, we might need to
> > introduce it in addition to the existing phases rather than replacing a
> > phase that is still needed in other cases.
> 
> This is yet another phase to be added.
> IMHO this needs the management tool to explicitly take control in the
> case you're talking about.

What kind of mechanism do you have in mind there?

Maybe what could work would be separate QMP commands to inactivate (and
possibly for symmetry activate) all block nodes. Then the management
tool could use the pre-switchover phase to shut down its block jobs
etc., inactivate all block nodes, transfer its own locks and then call
migrate-continue.
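
A sketch of how that could look on the source (migrate-set-capabilities
and migrate-continue are real commands; 'x-block-inactivate-all' is
hypothetical, named here only for illustration):

    { "execute": "migrate-set-capabilities", "arguments":
        { "capabilities": [ { "capability": "pause-before-switchover",
                              "state": true } ] } }
    ... migration runs and pauses in the 'pre-switchover' state ...
    # management tool quiesces its block jobs, then:
    { "execute": "x-block-inactivate-all" }
    { "execute": "migrate-continue", "arguments": { "state": "pre-switchover" } }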

Kevin

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Dr. David Alan Gilbert 6 years ago
* Kevin Wolf (kwolf@redhat.com) wrote:
> On 03.04.2018 at 22:52, Dr. David Alan Gilbert wrote:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > On 28.03.2018 at 19:02, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Activating the block devices causes the locks to be taken on
> > > > the backing file.  If we're running with -S and the destination libvirt
> > > > hasn't started the destination with 'cont', it expects the locks to
> > > > still be untaken.
> > > > 
> > > > Don't activate the block devices if we're not going to autostart the VM;
> > > > 'cont' will do that anyway.
> > > > 
> > > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > 
> > > I'm not sure that this is a good idea. Going back to my old writeup of
> > > the migration phases...
> > > 
> > > https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html
> > > 
> > > ...the phase between migration completion and 'cont' is described like
> > > this:
> > > 
> > >     b) Migration converges:
> > >        Both VMs are stopped (assuming -S is given on the destination,
> > >        otherwise this phase is skipped), the destination is in control of
> > >        the resources
> > > 
> > > This patch changes the definition of the phase so that neither side is
> > > in control of the resources. We lose the phase where the destination is
> > > in control, but the VM isn't running yet. This feels like a problem to
> > > me.
> > 
> > But see Jiri's writeup on that bz;  libvirt is hitting the opposite
> > problem;   in this corner case they can't have the destination taking
> > control yet.
> 
> I wonder if they can't already grant the destination QEMU the necessary
> permission in the pre-switchover phase. Just a thought, I don't know how
> this works in detail, so it might not be possible after all.

It's a fairly hairy failure case they had; if I remember correctly it's:
  a) Start migration
  b) Migration gets to completion point
  c) Destination is still paused
  d) Libvirt is restarted on the source
  e) Since libvirt was restarted it fails the migration (and hence knows
     the destination won't be started)
  f) It now tries to resume the qemu on the source

(f) fails because (b) caused the locks to be taken on the destination;
hence this patch stops doing that.  It's a case we don't really think
about - i.e. that the migration has actually completed and all the data
is on the destination, but libvirt decides for some other reason to
abandon migration.

> > > Consider a case where the management tool keeps a mirror job with
> > > sync=none running to expose all I/O requests to some external process.
> > > It needs to shut down the old block job on the source in the
> > > 'pre-switchover' state, and start a new block job on the destination
> > > when the destination controls the images, but the VM doesn't run yet (so
> > > that it doesn't miss an I/O request). This patch removes the migration
> > > phase that the management tool needs to implement this correctly.
> > > 
> > > If we need a "neither side has control" phase, we might need to
> > > introduce it in addition to the existing phases rather than replacing a
> > > phase that is still needed in other cases.
> > 
> > This is yet another phase to be added.
> > IMHO this needs the management tool to explicitly take control in the
> > case you're talking about.
> 
> What kind of mechanism do you have in mind there?
> 
> Maybe what could work would be separate QMP commands to inactivate (and
> possibly for symmetry activate) all block nodes. Then the management
> tool could use the pre-switchover phase to shut down its block jobs
> etc., inactivate all block nodes, transfer its own locks and then call
> migrate-continue.

Yes it was a 'block-activate' that I'd wondered about.  One complication
is that if this is now under the control of the management layer then we
should stop asserting when the block devices aren't in the expected
state and just cleanly fail the command instead.
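
Roughly the sort of change that implies in the block layer (illustrative
only; the real assertion sites and error handling differ):

    /* today (roughly): a write to an inactive node aborts QEMU */
    assert(!(bs->open_flags & BDRV_O_INACTIVE));

    /* what management-controlled activation would want instead */
    if (bs->open_flags & BDRV_O_INACTIVE) {
        return -EPERM;   /* node handed over; fail the request cleanly */
    }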

Dave

> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Kevin Wolf 6 years ago
On 09.04.2018 at 12:27, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > On 03.04.2018 at 22:52, Dr. David Alan Gilbert wrote:
> > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > On 28.03.2018 at 19:02, Dr. David Alan Gilbert (git) wrote:
> > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > 
> > > > > Activating the block devices causes the locks to be taken on
> > > > > the backing file.  If we're running with -S and the destination libvirt
> > > > > hasn't started the destination with 'cont', it expects the locks to
> > > > > still be untaken.
> > > > > 
> > > > > Don't activate the block devices if we're not going to autostart the VM;
> > > > > 'cont' will do that anyway.
> > > > > 
> > > > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > 
> > > > I'm not sure that this is a good idea. Going back to my old writeup of
> > > > the migration phases...
> > > > 
> > > > https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html
> > > > 
> > > > ...the phase between migration completion and 'cont' is described like
> > > > this:
> > > > 
> > > >     b) Migration converges:
> > > >        Both VMs are stopped (assuming -S is given on the destination,
> > > >        otherwise this phase is skipped), the destination is in control of
> > > >        the resources
> > > > 
> > > > This patch changes the definition of the phase so that neither side is
> > > > in control of the resources. We lose the phase where the destination is
> > > > in control, but the VM isn't running yet. This feels like a problem to
> > > > me.
> > > 
> > > But see Jiri's writeup on that bz;  libvirt is hitting the opposite
> > > problem;   in this corner case they can't have the destination taking
> > > control yet.
> > 
> > I wonder if they can't already grant the destination QEMU the necessary
> > permission in the pre-switchover phase. Just a thought, I don't know how
> > this works in detail, so it might not be possible after all.
> 
> It's a fairly hairy failure case they had; if I remember correctly it's:
>   a) Start migration
>   b) Migration gets to completion point
>   c) Destination is still paused
>   d) Libvirt is restarted on the source
>   e) Since libvirt was restarted it fails the migration (and hence knows
>      the destination won't be started)
>   f) It now tries to resume the qemu on the source
> 
> (f) fails because (b) caused the locks to be taken on the destination;
> hence this patch stops doing that.  It's a case we don't really think
> about - i.e. that the migration has actually completed and all the data
> is on the destination, but libvirt decides for some other reason to
> abandon migration.

If you do remember correctly, that scenario doesn't feel tricky at all.
libvirt needs to quit the destination qemu, which will inactivate the
images on the destination and release the lock, and then it can continue
the source.

In fact, this is so straightforward that I wonder what else libvirt is
doing. Is the destination qemu only shut down after trying to continue
the source? That would be libvirt using the wrong order of steps.

> > > > Consider a case where the management tool keeps a mirror job with
> > > > sync=none running to expose all I/O requests to some external process.
> > > > It needs to shut down the old block job on the source in the
> > > > 'pre-switchover' state, and start a new block job on the destination
> > > > when the destination controls the images, but the VM doesn't run yet (so
> > > > that it doesn't miss an I/O request). This patch removes the migration
> > > > phase that the management tool needs to implement this correctly.
> > > > 
> > > > If we need a "neither side has control" phase, we might need to
> > > > introduce it in addition to the existing phases rather than replacing a
> > > > phase that is still needed in other cases.
> > > 
> > > This is yet another phase to be added.
> > > IMHO this needs the management tool to explicitly take control in the
> > > case you're talking about.
> > 
> > What kind of mechanism do you have in mind there?
> > 
> > Maybe what could work would be separate QMP commands to inactivate (and
> > possibly for symmetry activate) all block nodes. Then the management
> > tool could use the pre-switchover phase to shut down its block jobs
> > etc., inactivate all block nodes, transfer its own locks and then call
> > migrate-continue.
> 
> Yes it was a 'block-activate' that I'd wondered about.  One complication
> is that if this is now under the control of the management layer then we
> should stop asserting when the block devices aren't in the expected
> state and just cleanly fail the command instead.

Requiring an explicit 'block-activate' on the destination would be an
incompatible change, so you would have to introduce a new option for
that. 'block-inactivate' on the source feels a bit simpler.

And yes, you're probably right that we would have to be more careful to
catch inactive images without crashing. On the other hand, it would
become a state that is easier to test because it can be directly
influenced via QMP rather than being only a side effect of migration.

Kevin

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Dr. David Alan Gilbert 6 years ago
* Kevin Wolf (kwolf@redhat.com) wrote:
> On 09.04.2018 at 12:27, Dr. David Alan Gilbert wrote:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > On 03.04.2018 at 22:52, Dr. David Alan Gilbert wrote:
> > > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > > On 28.03.2018 at 19:02, Dr. David Alan Gilbert (git) wrote:
> > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > 
> > > > > > Activating the block devices causes the locks to be taken on
> > > > > > the backing file.  If we're running with -S and the destination libvirt
> > > > > > hasn't started the destination with 'cont', it expects the locks to
> > > > > > still be untaken.
> > > > > > 
> > > > > > Don't activate the block devices if we're not going to autostart the VM;
> > > > > > 'cont' will do that anyway.
> > > > > > 
> > > > > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> > > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > 
> > > > > I'm not sure that this is a good idea. Going back to my old writeup of
> > > > > the migration phases...
> > > > > 
> > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html
> > > > > 
> > > > > ...the phase between migration completion and 'cont' is described like
> > > > > this:
> > > > > 
> > > > >     b) Migration converges:
> > > > >        Both VMs are stopped (assuming -S is given on the destination,
> > > > >        otherwise this phase is skipped), the destination is in control of
> > > > >        the resources
> > > > > 
> > > > > This patch changes the definition of the phase so that neither side is
> > > > > in control of the resources. We lose the phase where the destination is
> > > > > in control, but the VM isn't running yet. This feels like a problem to
> > > > > me.
> > > > 
> > > > But see Jiri's writeup on that bz;  libvirt is hitting the opposite
> > > > problem;   in this corner case they can't have the destination taking
> > > > control yet.
> > > 
> > > I wonder if they can't already grant the destination QEMU the necessary
> > > permission in the pre-switchover phase. Just a thought, I don't know how
> > > this works in detail, so it might not be possible after all.
> > 
> > It's a fairly hairy failure case they had; if I remember correctly it's:
> >   a) Start migration
> >   b) Migration gets to completion point
> >   c) Destination is still paused
> >   d) Libvirt is restarted on the source
> >   e) Since libvirt was restarted it fails the migration (and hence knows
> >      the destination won't be started)
> >   f) It now tries to resume the qemu on the source
> > 
> > (f) fails because (b) caused the locks to be taken on the destination;
> > hence this patch stops doing that.  It's a case we don't really think
> > about - i.e. that the migration has actually completed and all the data
> > is on the destination, but libvirt decides for some other reason to
> > abandon migration.
> 
> If you do remember correctly, that scenario doesn't feel tricky at all.
> libvirt needs to quit the destination qemu, which will inactivate the
> images on the destination and release the lock, and then it can continue
> the source.
> 
> In fact, this is so straightforward that I wonder what else libvirt is
> doing. Is the destination qemu only shut down after trying to continue
> the source? That would be libvirt using the wrong order of steps.

I'll leave Jiri to reply to this; I think this is a case of the source
realising libvirt has restarted, then trying to recover all of its VMs
without being in a position to check on the destination.

> > > > > Consider a case where the management tool keeps a mirror job with
> > > > > sync=none running to expose all I/O requests to some external process.
> > > > > It needs to shut down the old block job on the source in the
> > > > > 'pre-switchover' state, and start a new block job on the destination
> > > > > when the destination controls the images, but the VM doesn't run yet (so
> > > > > that it doesn't miss an I/O request). This patch removes the migration
> > > > > phase that the management tool needs to implement this correctly.
> > > > > 
> > > > > If we need a "neither side has control" phase, we might need to
> > > > > introduce it in addition to the existing phases rather than replacing a
> > > > > phase that is still needed in other cases.
> > > > 
> > > > This is yet another phase to be added.
> > > > IMHO this needs the management tool to explicitly take control in the
> > > > case you're talking about.
> > > 
> > > What kind of mechanism do you have in mind there?
> > > 
> > > Maybe what could work would be separate QMP commands to inactivate (and
> > > possibly for symmetry activate) all block nodes. Then the management
> > > tool could use the pre-switchover phase to shut down its block jobs
> > > etc., inactivate all block nodes, transfer its own locks and then call
> > > migrate-continue.
> > 
> > Yes it was a 'block-activate' that I'd wondered about.  One complication
> > is that if this is now under the control of the management layer then we
> > should stop asserting when the block devices aren't in the expected
> > state and just cleanly fail the command instead.
> 
> Requiring an explicit 'block-activate' on the destination would be an
> incompatible change, so you would have to introduce a new option for
> that. 'block-inactivate' on the source feels a bit simpler.

I'd only want the 'block-activate' in the case of this new block-job
you're suggesting; not in the case of normal migrates - they'd still get
it when they do 'cont' - so the change in behaviour is only with that
block-job case that must start before the end of migrate.

> And yes, you're probably right that we would have to be more careful to
> catch inactive images without crashing. On the other hand, it would
> become a state that is easier to test because it can be directly
> influenced via QMP rather than being only a side effect of migration.

Yes; but crashing is really bad, so we should really stop
asserting all over.

Dave

> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Kevin Wolf 6 years ago
On 09.04.2018 at 16:04, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > On 09.04.2018 at 12:27, Dr. David Alan Gilbert wrote:
> > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > On 03.04.2018 at 22:52, Dr. David Alan Gilbert wrote:
> > > > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > > > On 28.03.2018 at 19:02, Dr. David Alan Gilbert (git) wrote:
> > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > > 
> > > > > > > Activating the block devices causes the locks to be taken on
> > > > > > > the backing file.  If we're running with -S and the destination libvirt
> > > > > > > hasn't started the destination with 'cont', it expects the locks to
> > > > > > > still be untaken.
> > > > > > > 
> > > > > > > Don't activate the block devices if we're not going to autostart the VM;
> > > > > > > 'cont' will do that anyway.
> > > > > > > 
> > > > > > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> > > > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > > 
> > > > > > I'm not sure that this is a good idea. Going back to my old writeup of
> > > > > > the migration phases...
> > > > > > 
> > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html
> > > > > > 
> > > > > > ...the phase between migration completion and 'cont' is described like
> > > > > > this:
> > > > > > 
> > > > > >     b) Migration converges:
> > > > > >        Both VMs are stopped (assuming -S is given on the destination,
> > > > > >        otherwise this phase is skipped), the destination is in control of
> > > > > >        the resources
> > > > > > 
> > > > > > This patch changes the definition of the phase so that neither side is
> > > > > > in control of the resources. We lose the phase where the destination is
> > > > > > in control, but the VM isn't running yet. This feels like a problem to
> > > > > > me.
> > > > > 
> > > > > But see Jiri's writeup on that bz;  libvirt is hitting the opposite
> > > > > problem;   in this corner case they can't have the destination taking
> > > > > control yet.
> > > > 
> > > > I wonder if they can't already grant the destination QEMU the necessary
> > > > permission in the pre-switchover phase. Just a thought, I don't know how
> > > > this works in detail, so it might not be possible after all.
> > > 
> > > It's a fairly hairy failure case they had; if I remember correctly it's:
> > >   a) Start migration
> > >   b) Migration gets to completion point
> > >   c) Destination is still paused
> > >   d) Libvirt is restarted on the source
> > >   e) Since libvirt was restarted it fails the migration (and hence knows
> > >      the destination won't be started)
> > >   f) It now tries to resume the qemu on the source
> > > 
> > > (f) fails because (b) caused the locks to be taken on the destination;
> > > hence this patch stops doing that.  It's a case we don't really think
> > > about - i.e. that the migration has actually completed and all the data
> > > is on the destination, but libvirt decides for some other reason to
> > > abandon migration.
> > 
> > If you do remember correctly, that scenario doesn't feel tricky at all.
> > libvirt needs to quit the destination qemu, which will inactivate the
> > images on the destination and release the lock, and then it can continue
> > the source.
> > 
> > In fact, this is so straightforward that I wonder what else libvirt is
> > doing. Is the destination qemu only shut down after trying to continue
> > the source? That would be libvirt using the wrong order of steps.
> 
> I'll leave Jiri to reply to this; I think this is a case of the source
> realising libvirt has restarted, then trying to recover all of its VMs
> without being in a position to check on the destination.
> 
> > > > > > Consider a case where the management tool keeps a mirror job with
> > > > > > sync=none running to expose all I/O requests to some external process.
> > > > > > It needs to shut down the old block job on the source in the
> > > > > > 'pre-switchover' state, and start a new block job on the destination
> > > > > > when the destination controls the images, but the VM doesn't run yet (so
> > > > > > that it doesn't miss an I/O request). This patch removes the migration
> > > > > > phase that the management tool needs to implement this correctly.
> > > > > > 
> > > > > > If we need a "neither side has control" phase, we might need to
> > > > > > introduce it in addition to the existing phases rather than replacing a
> > > > > > phase that is still needed in other cases.
> > > > > 
> > > > > This is yet another phase to be added.
> > > > > IMHO this needs the management tool to explicitly take control in the
> > > > > case you're talking about.
> > > > 
> > > > What kind of mechanism do you have in mind there?
> > > > 
> > > > Maybe what could work would be separate QMP commands to inactivate (and
> > > > possibly for symmetry activate) all block nodes. Then the management
> > > > tool could use the pre-switchover phase to shut down its block jobs
> > > > etc., inactivate all block nodes, transfer its own locks and then call
> > > > migrate-continue.
> > > 
> > > Yes it was a 'block-activate' that I'd wondered about.  One complication
> > > is that if this is now under the control of the management layer then we
> > > should stop asserting when the block devices aren't in the expected
> > > state and just cleanly fail the command instead.
> > 
> > Requiring an explicit 'block-activate' on the destination would be an
> > incompatible change, so you would have to introduce a new option for
> > that. 'block-inactivate' on the source feels a bit simpler.
> 
> I'd only want the 'block-activate' in the case of this new block-job
> you're suggesting; not in the case of normal migrates - they'd still get
> it when they do 'cont' - so the change in behaviour is only with that
> block-job case that must start before the end of migrate.

I'm not aware of having suggested a new block job?

> > And yes, you're probably right that we would have to be more careful to
> > catch inactive images without crashing. On the other hand, it would
> > become a state that is easier to test because it can be directly
> > influenced via QMP rather than being only a side effect of migration.
> 
> Yes; but crashing is really bad, so we should really stop
> asserting all over.

Are you aware of any wrong assertions currently?

The thing is, inactive images can only happen in a fairly restricted set
of scenarios today - either on the source after migration completed, or
on the destination before it completed. If you get any write I/O
requests in these states, that's a QEMU bug, so assertions to catch
these bugs feel right to me.

Kevin

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Dr. David Alan Gilbert 6 years ago
* Kevin Wolf (kwolf@redhat.com) wrote:
> On 09.04.2018 at 16:04, Dr. David Alan Gilbert wrote:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > On 09.04.2018 at 12:27, Dr. David Alan Gilbert wrote:
> > > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > > On 03.04.2018 at 22:52, Dr. David Alan Gilbert wrote:
> > > > > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > > > > On 28.03.2018 at 19:02, Dr. David Alan Gilbert (git) wrote:
> > > > > > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > > > > 
> > > > > > > > Activating the block devices causes the locks to be taken on
> > > > > > > > the backing file.  If we're running with -S and the destination libvirt
> > > > > > > > hasn't started the destination with 'cont', it expects the locks to
> > > > > > > > still be untaken.
> > > > > > > > 
> > > > > > > > Don't activate the block devices if we're not going to autostart the VM;
> > > > > > > > 'cont' will do that anyway.
> > > > > > > > 
> > > > > > > > bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854
> > > > > > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > > > > 
> > > > > > > I'm not sure that this is a good idea. Going back to my old writeup of
> > > > > > > the migration phases...
> > > > > > > 
> > > > > > > https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg07917.html
> > > > > > > 
> > > > > > > ...the phase between migration completion and 'cont' is described like
> > > > > > > this:
> > > > > > > 
> > > > > > >     b) Migration converges:
> > > > > > >        Both VMs are stopped (assuming -S is given on the destination,
> > > > > > >        otherwise this phase is skipped), the destination is in control of
> > > > > > >        the resources
> > > > > > > 
> > > > > > > This patch changes the definition of the phase so that neither side is
> > > > > > > in control of the resources. We lose the phase where the destination is
> > > > > > > in control, but the VM isn't running yet. This feels like a problem to
> > > > > > > me.
> > > > > > 
> > > > > > But see Jiri's writeup on that bz;  libvirt is hitting the opposite
> > > > > > problem;   in this corner case they can't have the destination taking
> > > > > > control yet.
> > > > > 
> > > > > I wonder if they can't already grant the destination QEMU the necessary
> > > > > permission in the pre-switchover phase. Just a thought, I don't know how
> > > > > this works in detail, so it might not be possible after all.
> > > > 
> > > > It's a fairly hairy failure case they had; if I remember correctly it's:
> > > >   a) Start migration
> > > >   b) Migration gets to completion point
> > > >   c) Destination is still paused
> > > >   d) Libvirt is restarted on the source
> > > >   e) Since libvirt was restarted it fails the migration (and hence knows
> > > >      the destination won't be started)
> > > >   f) It now tries to resume the qemu on the source
> > > > 
> > > > (f) fails because (b) caused the locks to be taken on the destination;
> > > > hence this patch stops doing that.  It's a case we don't really think
> > > > about - i.e. that the migration has actually completed and all the data
> > > > is on the destination, but libvirt decides for some other reason to
> > > > abandon migration.
> > > 
> > > If you do remember correctly, that scenario doesn't feel tricky at all.
> > > libvirt needs to quit the destination qemu, which will inactivate the
> > > images on the destination and release the lock, and then it can continue
> > > the source.
> > > 
> > > In fact, this is so straightforward that I wonder what else libvirt is
> > > doing. Is the destination qemu only shut down after trying to continue
> > > the source? That would be libvirt using the wrong order of steps.
> > 
> > I'll leave Jiri to reply to this; I think this is a case of the source
> > realising libvirt has restarted, then trying to recover all of its VMs
> > without being in a position to check on the destination.
> > 
> > > > > > > Consider a case where the management tool keeps a mirror job with
> > > > > > > sync=none running to expose all I/O requests to some external process.
> > > > > > > It needs to shut down the old block job on the source in the
> > > > > > > 'pre-switchover' state, and start a new block job on the destination
> > > > > > > when the destination controls the images, but the VM doesn't run yet (so
> > > > > > > that it doesn't miss an I/O request). This patch removes the migration
> > > > > > > phase that the management tool needs to implement this correctly.
> > > > > > > 
> > > > > > > If we need a "neither side has control" phase, we might need to
> > > > > > > introduce it in addition to the existing phases rather than replacing a
> > > > > > > phase that is still needed in other cases.
> > > > > > 
> > > > > > This is yet another phase to be added.
> > > > > > IMHO this needs the management tool to explicitly take control in the
> > > > > > case you're talking about.
> > > > > 
> > > > > What kind of mechanism do you have in mind there?
> > > > > 
> > > > > Maybe what could work would be separate QMP commands to inactivate (and
> > > > > possibly for symmetry activate) all block nodes. Then the management
> > > > > tool could use the pre-switchover phase to shut down its block jobs
> > > > > etc., inactivate all block nodes, transfer its own locks and then call
> > > > > migrate-continue.
> > > > 
> > > > Yes it was a 'block-activate' that I'd wondered about.  One complication
> > > > is that if this is now under the control of the management layer then we
> > > > should stop asserting when the block devices aren't in the expected
> > > > state and just cleanly fail the command instead.
> > > 
> > > Requiring an explicit 'block-activate' on the destination would be an
> > > incompatible change, so you would have to introduce a new option for
> > > that. 'block-inactivate' on the source feels a bit simpler.
> > 
> > I'd only want the 'block-activate' in the case of this new block-job
> > you're suggesting; not in the case of normal migrates - they'd still get
> > it when they do 'cont' - so the change in behaviour is only with that
> > block-job case that must start before the end of migrate.
> 
> I'm not aware of having suggested a new block job?

I'm referring to your concern in your first reply in the thread:
     Consider a case where the management tool keeps a mirror job with
     sync=none running to expose all I/O requests to some external process.

> > > And yes, you're probably right that we would have to be more careful to
> > > catch inactive images without crashing. On the other hand, it would
> > > become a state that is easier to test because it can be directly
> > > influenced via QMP rather than being only a side effect of migration.
> > 
> > Yes; but crashing is really bad, so we should really stop
> > asserting all over.
> 
> Are you aware of any wrong assertions currently?

Well, there's https://bugzilla.redhat.com/show_bug.cgi?id=1408653  that
I've not looked at for a while.
But we have had a few lately.

> The thing is, inactive images can only happen in a fairly restricted set
> of scenarios today - either on the source after migration completed, or
> on the destination before it completed. If you get any write I/O
> requests in these states, that's a QEMU bug, so assertions to catch
> these bugs feel right to me.

But if we add the 'block-inactivate' command you suggest, then it could
be a management screwup rather than a qemu bug, and so assertions feel
wrong.

Dave


> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Jiri Denemark 6 years ago
On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote:
> On 09.04.2018 at 12:27, Dr. David Alan Gilbert wrote:
> > It's a fairly hairy failure case they had; if I remember correctly it's:
> >   a) Start migration
> >   b) Migration gets to completion point
> >   c) Destination is still paused
> >   d) Libvirt is restarted on the source
> >   e) Since libvirt was restarted it fails the migration (and hence knows
> >      the destination won't be started)
> >   f) It now tries to resume the qemu on the source
> > 
> > (f) fails because (b) caused the locks to be taken on the destination;
> > hence this patch stops doing that.  It's a case we don't really think
> > about - i.e. that the migration has actually completed and all the data
> > is on the destination, but libvirt decides for some other reason to
> > abandon migration.
> 
> If you do remember correctly, that scenario doesn't feel tricky at all.
> libvirt needs to quit the destination qemu, which will inactivate the
> images on the destination and release the lock, and then it can continue
> the source.
> 
> In fact, this is so straightforward that I wonder what else libvirt is
> doing. Is the destination qemu only shut down after trying to continue
> the source? That would be libvirt using the wrong order of steps.

There's no connection between the two libvirt daemons in the case we're
talking about so they can't really synchronize the actions. The
destination daemon will kill the new QEMU process and the source will
resume the old one, but the order is completely random.

...
> > Yes it was a 'block-activate' that I'd wondered about.  One complication
> > is that if this is now under the control of the management layer then we
> > should stop asserting when the block devices aren't in the expected
> > state and just cleanly fail the command instead.
> 
> Requiring an explicit 'block-activate' on the destination would be an
> incompatible change, so you would have to introduce a new option for
> that. 'block-inactivate' on the source feels a bit simpler.

As I said in another email, the explicit block-activate command could
depend on a migration capability similarly to how pre-switchover state
works.
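
Something along these lines, presumably (capability and command names
are illustrative; QEMU later added a 'late-block-activate' capability
in this spirit):

    { "execute": "migrate-set-capabilities", "arguments":
        { "capabilities": [ { "capability": "late-block-activate",
                              "state": true } ] } }
    ... incoming migration completes; images stay inactive ...
    { "execute": "block-activate" }
    { "execute": "cont" }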

Jirka

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Kevin Wolf 6 years ago
On 10.04.2018 at 09:36, Jiri Denemark wrote:
> On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote:
> > On 09.04.2018 at 12:27, Dr. David Alan Gilbert wrote:
> > > It's a fairly hairy failure case they had; if I remember correctly it's:
> > >   a) Start migration
> > >   b) Migration gets to completion point
> > >   c) Destination is still paused
> > >   d) Libvirt is restarted on the source
> > >   e) Since libvirt was restarted it fails the migration (and hence knows
> > >      the destination won't be started)
> > >   f) It now tries to resume the qemu on the source
> > > 
> > > (f) fails because (b) caused the locks to be taken on the destination;
> > > hence this patch stops doing that.  It's a case we don't really think
> > > about - i.e. that the migration has actually completed and all the data
> > > is on the destination, but libvirt decides for some other reason to
> > > abandon migration.
> > 
> > If you do remember correctly, that scenario doesn't feel tricky at all.
> > libvirt needs to quit the destination qemu, which will inactivate the
> > images on the destination and release the lock, and then it can continue
> > the source.
> > 
> > In fact, this is so straightforward that I wonder what else libvirt is
> > doing. Is the destination qemu only shut down after trying to continue
> > the source? That would be libvirt using the wrong order of steps.
> 
> There's no connection between the two libvirt daemons in the case we're
> talking about so they can't really synchronize the actions. The
> destination daemon will kill the new QEMU process and the source will
> resume the old one, but the order is completely random.

Hm, okay...

> > > Yes it was a 'block-activate' that I'd wondered about.  One complication
> > > is that if this is now under the control of the management layer then we
> > > should stop asserting when the block devices aren't in the expected
> > > state and just cleanly fail the command instead.
> > 
> > Requiring an explicit 'block-activate' on the destination would be an
> > incompatible change, so you would have to introduce a new option for
> > that. 'block-inactivate' on the source feels a bit simpler.
> 
> As I said in another email, the explicit block-activate command could
> depend on a migration capability similarly to how pre-switchover state
> works.

Yeah, that's exactly the thing that we wouldn't need if we could use
'block-inactivate' on the source instead. It feels a bit wrong to
design a more involved QEMU interface around the libvirt internals, but
as long as we implement both sides for symmetry and libvirt just happens
to pick the destination side for now, I think it's okay.

By the way, are block devices the only thing that needs to be explicitly
activated? For example, what about qemu_announce_self() for network
cards, do we need to delay that, too?

In any case, I think this patch needs to be reverted for 2.12 because
it's wrong, and then we can create the proper solution in the 2.13
timeframe.

Kevin

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Dr. David Alan Gilbert 6 years ago
* Kevin Wolf (kwolf@redhat.com) wrote:
> On 10.04.2018 at 09:36, Jiri Denemark wrote:
> > On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote:
> > > On 09.04.2018 at 12:27, Dr. David Alan Gilbert wrote:
> > > > It's a fairly hairy failure case they had; if I remember correctly it's:
> > > >   a) Start migration
> > > >   b) Migration gets to completion point
> > > >   c) Destination is still paused
> > > >   d) Libvirt is restarted on the source
> > > >   e) Since libvirt was restarted it fails the migration (and hence knows
> > > >      the destination won't be started)
> > > >   f) It now tries to resume the qemu on the source
> > > > 
> > > > (f) fails because (b) caused the locks to be taken on the destination;
> > > > hence this patch stops doing that.  It's a case we don't really think
> > > > about - i.e. that the migration has actually completed and all the data
> > > > is on the destination, but libvirt decides for some other reason to
> > > > abandon migration.
> > > 
> > > If you do remember correctly, that scenario doesn't feel tricky at all.
> > > libvirt needs to quit the destination qemu, which will inactivate the
> > > images on the destination and release the lock, and then it can continue
> > > the source.
> > > 
> > > In fact, this is so straightforward that I wonder what else libvirt is
> > > doing. Is the destination qemu only shut down after trying to continue
> > > the source? That would be libvirt using the wrong order of steps.
> > 
> > There's no connection between the two libvirt daemons in the case we're
> > talking about so they can't really synchronize the actions. The
> > destination daemon will kill the new QEMU process and the source will
> > resume the old one, but the order is completely random.
> 
> Hm, okay...
> 
> > > > Yes it was a 'block-activate' that I'd wondered about.  One complication
> > > > is that if this is now under the control of the management layer then we
> > > > should stop asserting when the block devices aren't in the expected
> > > > state and just cleanly fail the command instead.
> > > 
> > > Requiring an explicit 'block-activate' on the destination would be an
> > > incompatible change, so you would have to introduce a new option for
> > > that. 'block-inactivate' on the source feels a bit simpler.
> > 
> > As I said in another email, the explicit block-activate command could
> > depend on a migration capability similarly to how pre-switchover state
> > works.
> 
> Yeah, that's exactly the thing that we wouldn't need if we could use
> 'block-inactivate' on the source instead. It feels a bit wrong to
> design a more involved QEMU interface around the libvirt internals,

It's not necessarily 'libvirt internals' - it's a case of them having to
cope with recovering from failures that happen around migration; it's
not an easy problem, and if they've got a way to stop both sides running
at the same time that's pretty important.

> but
> as long as we implement both sides for symmetry and libvirt just happens
> to pick the destination side for now, I think it's okay.
> 
> By the way, are block devices the only thing that needs to be explicitly
> activated? For example, what about qemu_announce_self() for network
> cards, do we need to delay that, too?
> 
> In any case, I think this patch needs to be reverted for 2.12 because
> it's wrong, and then we can create the proper solution in the 2.13
> timeframe.

What case does this break?
I'm a bit wary of reverting this, which fixes a known problem, on the
basis that it causes a theoretical problem.

Dave

> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Kevin Wolf 6 years ago
On 10.04.2018 at 10:45, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > On 10.04.2018 at 09:36, Jiri Denemark wrote:
> > > On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote:
> > > > On 09.04.2018 at 12:27, Dr. David Alan Gilbert wrote:
> > > > > It's a fairly hairy failure case they had; if I remember correctly it's:
> > > > >   a) Start migration
> > > > >   b) Migration gets to completion point
> > > > >   c) Destination is still paused
> > > > >   d) Libvirt is restarted on the source
> > > > >   e) Since libvirt was restarted it fails the migration (and hence knows
> > > > >      the destination won't be started)
> > > > >   f) It now tries to resume the qemu on the source
> > > > > 
> > > > > (f) fails because (b) caused the locks to be taken on the destination;
> > > > > hence this patch stops doing that.  It's a case we don't really think
> > > > > about - i.e. that the migration has actually completed and all the data
> > > > > is on the destination, but libvirt decides for some other reason to
> > > > > abandon migration.
> > > > 
> > > > If you do remember correctly, that scenario doesn't feel tricky at all.
> > > > libvirt needs to quit the destination qemu, which will inactivate the
> > > > images on the destination and release the lock, and then it can continue
> > > > the source.
> > > > 
> > > > In fact, this is so straightforward that I wonder what else libvirt is
> > > > doing. Is the destination qemu only shut down after trying to continue
> > > > the source? That would be libvirt using the wrong order of steps.
> > > 
> > > There's no connection between the two libvirt daemons in the case we're
> > > talking about so they can't really synchronize the actions. The
> > > destination daemon will kill the new QEMU process and the source will
> > > resume the old one, but the order is completely random.
> > 
> > Hm, okay...
> > 
> > > > > Yes it was a 'block-activate' that I'd wondered about.  One complication
> > > > > is that if this is now under the control of the management layer then we
> > > > > should stop asserting when the block devices aren't in the expected
> > > > > state and just cleanly fail the command instead.
> > > > 
> > > > Requiring an explicit 'block-activate' on the destination would be an
> > > > incompatible change, so you would have to introduce a new option for
> > > > that. 'block-inactivate' on the source feels a bit simpler.
> > > 
> > > As I said in another email, the explicit block-activate command could
> > > depend on a migration capability similarly to how pre-switchover state
> > > works.
> > 
> > Yeah, that's exactly the thing that we wouldn't need if we could use
> > 'block-inactivate' on the source instead. It feels a bit wrong to
> > design a more involved QEMU interface around the libvirt internals,
> 
> It's not necessarily 'libvirt internals' - it's a case of them having to
> cope with recovering from failures that happen around migration; it's
> not an easy problem, and if they've got a way to stop both sides running
> at the same time that's pretty important.

The 'libvirt internals' isn't that it needs an additional state where
neither source nor destination QEMU owns the images, but that it has to
be between migration completion and image activation on the destination
rather than between image inactivation on the source and migration
completion. The latter would be much easier for qemu, but apparently it
doesn't work for libvirt because of how it works internally.

But as I said, I'd just implement both for symmetry and then management
tools can pick whatever makes their life easier.
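
For the source-side variant, a sketch of how a management tool might drive
it in the pre-switchover phase ('block-inactivate' is the hypothetical
piece; the other commands already exist):

  (source, once the MIGRATION event reports status "pre-switchover"
   and the block jobs have been shut down)
  -> { "execute": "block-inactivate" }
  <- { "return": {} }
  -> { "execute": "migrate-continue",
       "arguments": { "state": "pre-switchover" } }
  <- { "return": {} }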

> > but
> > as long as we implement both sides for symmetry and libvirt just happens
> > to pick the destination side for now, I think it's okay.
> > 
> > By the way, are block devices the only thing that needs to be explicitly
> > activated? For example, what about qemu_announce_self() for network
> > cards, do we need to delay that, too?
> > 
> > In any case, I think this patch needs to be reverted for 2.12 because
> > it's wrong, and then we can create the proper solution in the 2.13
> > timeframe.
> 
> What case does this break?
> I'm a bit wary of reverting this, which fixes a known problem, on the
> basis that it causes a theoretical problem.

It breaks the API. And the final design we're having in mind now is
compatible with the old API, not with the new one exposed by this patch,
so that switch would break the API again to get back to the old state.

Do you know all the scripts that people are using around QEMU? I don't,
but I know that plenty of them exist, so I don't think we can declare
this API breakage purely theoretical.

Yes, the patch fixes a known problem, but also a problem that is a rare
corner case error that you can only hit with really bad timing. Do we
really want to risk unconditionally breaking success cases for fixing a
mostly theoretical corner case error path (with the failure mode that
the guest is paused when it shouldn't be)?

Kevin

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Dr. David Alan Gilbert 6 years ago
* Kevin Wolf (kwolf@redhat.com) wrote:
> On 10.04.2018 at 10:45, Dr. David Alan Gilbert wrote:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > On 10.04.2018 at 09:36, Jiri Denemark wrote:
> > > > On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote:
> > > > > On 09.04.2018 at 12:27, Dr. David Alan Gilbert wrote:
> > > > > > It's a fairly hairy failure case they had; if I remember correctly it's:
> > > > > >   a) Start migration
> > > > > >   b) Migration gets to completion point
> > > > > >   c) Destination is still paused
> > > > > >   d) Libvirt is restarted on the source
> > > > > >   e) Since libvirt was restarted it fails the migration (and hence knows
> > > > > >      the destination won't be started)
> > > > > >   f) It now tries to resume the qemu on the source
> > > > > > 
> > > > > > (f) fails because (b) caused the locks to be taken on the destination;
> > > > > > hence this patch stops doing that.  It's a case we don't really think
> > > > > > about - i.e. that the migration has actually completed and all the data
> > > > > > is on the destination, but libvirt decides for some other reason to
> > > > > > abandon migration.
> > > > > 
> > > > > If you do remember correctly, that scenario doesn't feel tricky at all.
> > > > > libvirt needs to quit the destination qemu, which will inactivate the
> > > > > images on the destination and release the lock, and then it can continue
> > > > > the source.
> > > > > 
> > > > > In fact, this is so straightforward that I wonder what else libvirt is
> > > > > doing. Is the destination qemu only shut down after trying to continue
> > > > > the source? That would be libvirt using the wrong order of steps.
> > > > 
> > > > There's no connection between the two libvirt daemons in the case we're
> > > > talking about so they can't really synchronize the actions. The
> > > > destination daemon will kill the new QEMU process and the source will
> > > > resume the old one, but the order is completely random.
> > > 
> > > Hm, okay...
> > > 
> > > > > > Yes it was a 'block-activate' that I'd wondered about.  One complication
> > > > > > is that if this is now under the control of the management layer then we
> > > > > > should stop asserting when the block devices aren't in the expected
> > > > > > state and just cleanly fail the command instead.
> > > > > 
> > > > > Requiring an explicit 'block-activate' on the destination would be an
> > > > > incompatible change, so you would have to introduce a new option for
> > > > > that. 'block-inactivate' on the source feels a bit simpler.
> > > > 
> > > > As I said in another email, the explicit block-activate command could
> > > > depend on a migration capability similarly to how pre-switchover state
> > > > works.
> > > 
> > > Yeah, that's exactly the thing that we wouldn't need if we could use
> > > 'block-inactivate' on the source instead. It feels a bit wrong to
> > > design a more involved QEMU interface around the libvirt internals,
> > 
> > It's not necessarily 'libvirt internals' - it's a case of them having to
> > cope with recovering from failures that happen around migration; it's
> > not an easy problem, and if they've got a way to stop both sides running
> > at the same time that's pretty important.
> 
> The 'libvirt internals' isn't that it needs an additional state where
> neither source nor destination QEMU owns the images, but that it has to
> be between migration completion and image activation on the destination
> rather than between image inactivation on the source and migration
> completion. The latter would be much easier for qemu, but apparently it
> doesn't work for libvirt because of how it works internally.

I suspect this is actually a fundamental requirement for ensuring that we
don't end up with a QEMU running on both sides, rather than a consequence
of how libvirt is structured.

> But as I said, I'd just implement both for symmetry and then management
> tools can pick whatever makes their life easier.
> 
> > > but
> > > as long as we implement both sides for symmetry and libvirt just happens
> > > to pick the destination side for now, I think it's okay.
> > > 
> > > By the way, are block devices the only thing that needs to be explicitly
> > > activated? For example, what about qemu_announce_self() for network
> > > cards, do we need to delay that, too?
> > > 
> > > In any case, I think this patch needs to be reverted for 2.12 because
> > > it's wrong, and then we can create the proper solution in the 2.13
> > > timeframe.
> > 
> > What case does this break?
> > I'm a bit wary of reverting this, which fixes a known problem, on the
> > basis that it causes a theoretical problem.
> 
> It breaks the API. And the final design we're having in mind now is
> compatible with the old API, not with the new one exposed by this patch,
> so that switch would break the API again to get back to the old state.
> 
> Do you know all the scripts that people are using around QEMU? I don't,
> but I know that plenty of them exist, so I don't think we can declare
> this API breakage purely theoretical.
> 
> Yes, the patch fixes a known problem, but also a problem that is a rare
> corner case error that you can only hit with really bad timing. Do we
> really want to risk unconditionally breaking success cases for fixing a
> mostly theoretical corner case error path (with the failure mode that
> the guest is paused when it shouldn't be)?

Hmm; having chatted to Jiri I'm OK with reverting it, on the condition
that I actually understand how this alternative would work first.

I can't currently see how a block-inactivate would be used.
I also can't see how a block-activate would be used unless it's combined
with the change that you're asking to revert.

Can you explain the way you see it working?

Dave

> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Kevin Wolf 6 years ago
On 10.04.2018 at 12:40, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > On 10.04.2018 at 10:45, Dr. David Alan Gilbert wrote:
> > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > On 10.04.2018 at 09:36, Jiri Denemark wrote:
> > > > > On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote:
> > > > > > On 09.04.2018 at 12:27, Dr. David Alan Gilbert wrote:
> > > > > > > It's a fairly hairy failure case they had; if I remember correctly it's:
> > > > > > >   a) Start migration
> > > > > > >   b) Migration gets to completion point
> > > > > > >   c) Destination is still paused
> > > > > > >   d) Libvirt is restarted on the source
> > > > > > >   e) Since libvirt was restarted it fails the migration (and hence knows
> > > > > > >      the destination won't be started)
> > > > > > >   f) It now tries to resume the qemu on the source
> > > > > > > 
> > > > > > > (f) fails because (b) caused the locks to be taken on the destination;
> > > > > > > hence this patch stops doing that.  It's a case we don't really think
> > > > > > > about - i.e. that the migration has actually completed and all the data
> > > > > > > is on the destination, but libvirt decides for some other reason to
> > > > > > > abandon migration.
> > > > > > 
> > > > > > If you do remember correctly, that scenario doesn't feel tricky at all.
> > > > > > libvirt needs to quit the destination qemu, which will inactivate the
> > > > > > images on the destination and release the lock, and then it can continue
> > > > > > the source.
> > > > > > 
> > > > > > In fact, this is so straightforward that I wonder what else libvirt is
> > > > > > doing. Is the destination qemu only shut down after trying to continue
> > > > > > the source? That would be libvirt using the wrong order of steps.
> > > > > 
> > > > > There's no connection between the two libvirt daemons in the case we're
> > > > > talking about so they can't really synchronize the actions. The
> > > > > destination daemon will kill the new QEMU process and the source will
> > > > > resume the old one, but the order is completely random.
> > > > 
> > > > Hm, okay...
> > > > 
> > > > > > > Yes it was a 'block-activate' that I'd wondered about.  One complication
> > > > > > > is that if this is now under the control of the management layer then we
> > > > > > > should stop asserting when the block devices aren't in the expected
> > > > > > > state and just cleanly fail the command instead.
> > > > > > 
> > > > > > Requiring an explicit 'block-activate' on the destination would be an
> > > > > > incompatible change, so you would have to introduce a new option for
> > > > > > that. 'block-inactivate' on the source feels a bit simpler.
> > > > > 
> > > > > As I said in another email, the explicit block-activate command could
> > > > > depend on a migration capability similarly to how pre-switchover state
> > > > > works.
> > > > 
> > > > Yeah, that's exactly the thing that we wouldn't need if we could use
> > > > 'block-inactivate' on the source instead. It feels a bit wrong to
> > > > design a more involved QEMU interface around the libvirt internals,
> > > 
> > > It's not necessarily 'libvirt internals' - it's a case of them having to
> > > cope with recovering from failures that happen around migration; it's
> > > not an easy problem, and if they've got a way to stop both sides running
> > > at the same time that's pretty important.
> > 
> > The 'libvirt internals' isn't that it needs an additional state where
> > neither source nor destination QEMU owns the images, but that it has to
> > be between migration completion and image activation on the destination
> > rather than between image inactivation on the source and migration
> > completion. The latter would be much easier for qemu, but apparently it
> > doesn't work for libvirt because of how it works internally.
> 
> I suspect this is actually a fundamental requirement for ensuring that we
> don't end up with a QEMU running on both sides, rather than a consequence
> of how libvirt is structured.

I don't think so. In theory, both options can provide the same. If
anything, it's related specifically to the phases that Jirka described
that libvirt uses to implement migration.

> > But as I said, I'd just implement both for symmetry and then management
> > tools can pick whatever makes their life easier.
> > 
> > > > but
> > > > as long as we implement both sides for symmetry and libvirt just happens
> > > > to pick the destination side for now, I think it's okay.
> > > > 
> > > > By the way, are block devices the only thing that needs to be explicitly
> > > > activated? For example, what about qemu_announce_self() for network
> > > > cards, do we need to delay that, too?
> > > > 
> > > > In any case, I think this patch needs to be reverted for 2.12 because
> > > > it's wrong, and then we can create the proper solution in the 2.13
> > > > timeframe.
> > > 
> > > What case does this break?
> > > I'm a bit wary of reverting this, which fixes a known problem, on the
> > > basis that it causes a theoretical problem.
> > 
> > It breaks the API. And the final design we're having in mind now is
> > compatible with the old API, not with the new one exposed by this patch,
> > so that switch would break the API again to get back to the old state.
> > 
> > Do you know all the scripts that people are using around QEMU? I don't,
> > but I know that plenty of them exist, so I don't think we can declare
> > this API breakage purely theoretical.
> > 
> > Yes, the patch fixes a known problem, but also a problem that is a rare
> > corner case error that you can only hit with really bad timing. Do we
> > really want to risk unconditionally breaking success cases for fixing a
> > mostly theoretical corner case error path (with the failure mode that
> > the guest is paused when it shouldn't be)?
> 
> Hmm; having chatted to Jiri I'm OK with reverting it, on the condition
> that I actually understand how this alternative would work first.
> 
> I can't currently see how a block-inactivate would be used.
> I also can't see how a block-activate would be used unless it's combined
> with the change that you're asking to revert.
> 
> Can you explain the way you see it working?

The key is making the delayed activation of block devices (and probably
delayed announcement of NICs? - you didn't answer that part) optional
instead of making it the default.

We can use Jirka's suggestion of adding a migration capability that
enables it, or I suppose a new option to -incoming could work, too. It
doesn't really matter what the syntax is, but the management tool must
request it explicitly.
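
For the capability variant, the destination would presumably have to be
started with '-incoming defer' so that the capability can be set before
any migration data is read; roughly (again with a made-up capability
name):

  -> { "execute": "migrate-set-capabilities",
       "arguments": { "capabilities": [
           { "capability": "late-block-activate", "state": true } ] } }
  <- { "return": {} }
  -> { "execute": "migrate-incoming",
       "arguments": { "uri": "tcp:0:4444" } }
  <- { "return": {} }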

Kevin

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Dr. David Alan Gilbert 6 years ago
* Kevin Wolf (kwolf@redhat.com) wrote:
> On 10.04.2018 at 12:40, Dr. David Alan Gilbert wrote:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > On 10.04.2018 at 10:45, Dr. David Alan Gilbert wrote:
> > > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > > On 10.04.2018 at 09:36, Jiri Denemark wrote:
> > > > > > On Mon, Apr 09, 2018 at 15:40:03 +0200, Kevin Wolf wrote:
> > > > > > > On 09.04.2018 at 12:27, Dr. David Alan Gilbert wrote:
> > > > > > > > It's a fairly hairy failure case they had; if I remember correctly it's:
> > > > > > > >   a) Start migration
> > > > > > > >   b) Migration gets to completion point
> > > > > > > >   c) Destination is still paused
> > > > > > > >   d) Libvirt is restarted on the source
> > > > > > > >   e) Since libvirt was restarted it fails the migration (and hence knows
> > > > > > > >      the destination won't be started)
> > > > > > > >   f) It now tries to resume the qemu on the source
> > > > > > > > 
> > > > > > > > (f) fails because (b) caused the locks to be taken on the destination;
> > > > > > > > hence this patch stops doing that.  It's a case we don't really think
> > > > > > > > about - i.e. that the migration has actually completed and all the data
> > > > > > > > is on the destination, but libvirt decides for some other reason to
> > > > > > > > abandon migration.
> > > > > > > 
> > > > > > > If you do remember correctly, that scenario doesn't feel tricky at all.
> > > > > > > libvirt needs to quit the destination qemu, which will inactivate the
> > > > > > > images on the destination and release the lock, and then it can continue
> > > > > > > the source.
> > > > > > > 
> > > > > > > In fact, this is so straightforward that I wonder what else libvirt is
> > > > > > > doing. Is the destination qemu only shut down after trying to continue
> > > > > > > the source? That would be libvirt using the wrong order of steps.
> > > > > > 
> > > > > > There's no connection between the two libvirt daemons in the case we're
> > > > > > talking about so they can't really synchronize the actions. The
> > > > > > destination daemon will kill the new QEMU process and the source will
> > > > > > resume the old one, but the order is completely random.
> > > > > 
> > > > > Hm, okay...
> > > > > 
> > > > > > > > Yes it was a 'block-activate' that I'd wondered about.  One complication
> > > > > > > > is that if this is now under the control of the management layer then we
> > > > > > > > should stop asserting when the block devices aren't in the expected
> > > > > > > > state and just cleanly fail the command instead.
> > > > > > > 
> > > > > > > Requiring an explicit 'block-activate' on the destination would be an
> > > > > > > incompatible change, so you would have to introduce a new option for
> > > > > > > that. 'block-inactivate' on the source feels a bit simpler.
> > > > > > 
> > > > > > As I said in another email, the explicit block-activate command could
> > > > > > depend on a migration capability similarly to how pre-switchover state
> > > > > > works.
> > > > > 
> > > > > Yeah, that's exactly the thing that we wouldn't need if we could use
> > > > > 'block-inactivate' on the source instead. It feels a bit wrong to
> > > > > design a more involved QEMU interface around the libvirt internals,
> > > > 
> > > > It's not necessarily 'libvirt internals' - it's a case of them having to
> > > > cope with recovering from failures that happen around migration; it's
> > > > not an easy problem, and if they've got a way to stop both sides running
> > > > at the same time that's pretty important.
> > > 
> > > The 'libvirt internals' isn't that it needs an additional state where
> > > neither source nor destination QEMU owns the images, but that it has to
> > > be between migration completion and image activation on the destination
> > > rather than between image inactivation on the source and migration
> > > completion. The latter would be much easier for qemu, but apparently it
> > > doesn't work for libvirt because of how it works internally.
> > 
> > I suspect this is actually a fundamental requirement for ensuring that we
> > don't end up with a QEMU running on both sides, rather than a consequence
> > of how libvirt is structured.
> 
> I don't think so. In theory, both options can provide the same. If
> anything, it's related specifically to the phases that Jirka described
> that libvirt uses to implement migration.
> 
> > > But as I said, I'd just implement both for symmetry and then management
> > > tools can pick whatever makes their life easier.
> > > 
> > > > > but
> > > > > as long as we implement both sides for symmetry and libvirt just happens
> > > > > to pick the destination side for now, I think it's okay.
> > > > > 
> > > > > By the way, are block devices the only thing that needs to be explicitly
> > > > > activated? For example, what about qemu_announce_self() for network
> > > > > cards, do we need to delay that, too?
> > > > > 
> > > > > In any case, I think this patch needs to be reverted for 2.12 because
> > > > > it's wrong, and then we can create the proper solution in the 2.13
> > > > > timeframe.
> > > > 
> > > > What case does this break?
> > > > I'm a bit wary of reverting this, which fixes a known problem, on the
> > > > basis that it causes a theoretical problem.
> > > 
> > > It breaks the API. And the final design we're having in mind now is
> > > compatible with the old API, not with the new one exposed by this patch,
> > > so that switch would break the API again to get back to the old state.
> > > 
> > > Do you know all the scripts that people are using around QEMU? I don't,
> > > but I know that plenty of them exist, so I don't think we can declare
> > > this API breakage purely theoretical.
> > > 
> > > Yes, the patch fixes a known problem, but also a problem that is a rare
> > > corner case error that you can only hit with really bad timing. Do we
> > > really want to risk unconditionally breaking success cases for fixing a
> > > mostly theoretical corner case error path (with the failure mode that
> > > the guest is paused when it shouldn't be)?
> > 
> > Hmm; having chatted to Jiri I'm OK with reverting it, on the condition
> > that I actually understand how this alternative would work first.
> > 
> > I can't currently see how a block-inactivate would be used.
> > I also can't see how a block-activate would be used unless it's combined
> > with the change that you're asking to revert.
> > 
> > Can you explain the way you see it working?
> 
> The key is making the delayed activation of block devices (and probably
> delayed announcement of NICs? - you didn't answer that part) optional
> instead of making it the default.

NIC announcements are broken in similar but slightly different ways; we
did have a series on list to help a while ago but it never got merged;
I'd like to keep that mess separate.

> We can use Jirka's suggestion of adding a migration capability that
> enables it, or I suppose a new option to -incoming could work, too. It
> doesn't really matter what the syntax is, but the management tool must
> request it explicitly.

A new capability makes it easy to gate the change in behaviour that this
patch added; I'll do that first thing for 2.13 (given today is the rc3
tag, it's too late).

However, once we turn this on, we'd also need the 'block-activate' command
to cope with a block user that must start prior to the 'cont' while this
behaviour is active.
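
As a sketch of that destination flow, with an NBD export standing in for
the block user that has to be up before the guest runs ('block-activate'
is the hypothetical piece; 'nbd-server-start' and 'cont' exist today):

  -> { "execute": "block-activate" }
  <- { "return": {} }
  -> { "execute": "nbd-server-start",
       "arguments": { "addr": { "type": "inet",
           "data": { "host": "0.0.0.0", "port": "10809" } } } }
  <- { "return": {} }
  -> { "execute": "cont" }
  <- { "return": {} }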

Dave

> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Kevin Wolf 6 years ago
On 10.04.2018 at 16:22, Dr. David Alan Gilbert wrote:
> * Kevin Wolf (kwolf@redhat.com) wrote:
> > On 10.04.2018 at 12:40, Dr. David Alan Gilbert wrote:
> > > Hmm; having chatted to Jiri I'm OK with reverting it, on the condition
> > > that I actually understand how this alternative would work first.
> > > 
> > > I can't currently see how a block-inactivate would be used.
> > > I also can't see how a block-activate would be used unless it's combined
> > > with the change that you're asking to revert.
> > > 
> > > Can you explain the way you see it working?
> > 
> > The key is making the delayed activation of block devices (and probably
> > delayed announcement of NICs? - you didn't answer that part) optional
> > instead of making it the default.
> 
> NIC announcements are broken in similar but slightly different ways; we
> did have a series on list to help a while ago but it never got merged;
> I'd like to keep that mess separate.

Okay. I just thought that it would make sense to have clear migration
phases that are the same for all external resources that the QEMU
processes use.

> > We can use Jirka's suggestion of adding a migration capability that
> > enables it, or I suppose a new option to -incoming could work, too. It
> > doesn't really matter what the syntax is, but the management tool must
> > request it explicitly.
> 
> A new capability makes it easy to gate the change in behaviour that this
> patch added; I'll do that first thing for 2.13 (given today is the rc3
> tag, it's too late).
> 
> However, once we turn this on, we'd also need the 'block-activate' command
> to cope with a block user that must start prior to the 'cont' while this
> behaviour is active.

Yes, that's right.

Kevin

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Jiri Denemark 6 years ago
On Tue, Apr 10, 2018 at 16:47:56 +0200, Kevin Wolf wrote:
> On 10.04.2018 at 16:22, Dr. David Alan Gilbert wrote:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > On 10.04.2018 at 12:40, Dr. David Alan Gilbert wrote:
> > > > Hmm; having chatted to Jiri I'm OK with reverting it, on the condition
> > > > that I actually understand how this alternative would work first.
> > > > 
> > > > I can't currently see how a block-inactivate would be used.
> > > > I also can't see how a block-activate would be used unless it's combined
> > > > with the change that you're asking to revert.
> > > > 
> > > > Can you explain the way you see it working?
> > > 
> > > The key is making the delayed activation of block devices (and probably
> > > delayed announcement of NICs? - you didn't answer that part) optional
> > > instead of making it the default.
> > 
> > NIC announcements are broken in similar but slightly different ways; we
> > did have a series on list to help a while ago but it never got merged;
> > I'd like to keep that mess separate.
> 
> Okay. I just thought that it would make sense to have clear migration
> phases that are the same for all external resources that the QEMU
> processes use.

I don't think NIC announcements should be delayed in this specific case
since we're dealing with a failure recovery which should be rare in
comparison to successful migration when we want NIC announcements to be
sent early. In other words, any NIC issues should be solved separately
and Laine would likely be a better person for discussing them since he
has a broader knowledge of all the fancy network stuff which libvirt
needs to coordinate with.

Jirka

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Kevin Wolf 6 years ago
On 11.04.2018 at 12:01, Jiri Denemark wrote:
> On Tue, Apr 10, 2018 at 16:47:56 +0200, Kevin Wolf wrote:
> > On 10.04.2018 at 16:22, Dr. David Alan Gilbert wrote:
> > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > On 10.04.2018 at 12:40, Dr. David Alan Gilbert wrote:
> > > > > Hmm; having chatted to Jiri I'm OK with reverting it, on the condition
> > > > > that I actually understand how this alternative would work first.
> > > > > 
> > > > > I can't currently see how a block-inactivate would be used.
> > > > > I also can't see how a block-activate would be used unless it's combined
> > > > > with the change that you're asking to revert.
> > > > > 
> > > > > Can you explain the way you see it working?
> > > > 
> > > > The key is making the delayed activation of block devices (and probably
> > > > delayed announcement of NICs? - you didn't answer that part) optional
> > > > instead of making it the default.
> > > 
> > > NIC announcements are broken in similar but slightly different ways; we
> > > did have a series on list to help a while ago but it never got merged;
> > > I'd like to keep that mess separate.
> > 
> > Okay. I just thought that it would make sense to have clear migration
> > phases that are the same for all external resources that the QEMU
> > processes use.
> 
> I don't think NIC announcements should be delayed in this specific case
> since we're dealing with a failure recovery which should be rare in
> comparison to successful migration when we want NIC announcements to be
> sent early. In other words, any NIC issues should be solved separately
> and Laine would likely be a better person for discussing them since he
> has a broader knowledge of all the fancy network stuff which libvirt
> needs to coordinate with.

Well, if I were the migration maintainer, I would insist on a properly
designed phase model that solves the problem once and for all because it
would be clear where everything belongs. We could still have bugs in the
future, but that would be internal implementation bugs with no effect on
the API.

But I'm not the maintainer and Dave prefers to deal with it basically as
a bunch of one-off fixes, and that will work, too. It will probably
clutter up the external API a bit (because the management tool will have
to separately address migration of block devices, network devices and
possibly other things in the future), but that shouldn't matter much for
libvirt. Maybe what we do need is some documentation of the recommended
process for performing a live migration so that management tools know
which QMP commands they need to issue when.
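
As a rough sketch, such documentation might describe today's
pre-switchover flow roughly like this (simplified, error handling
omitted):

  (source)
  -> { "execute": "migrate-set-capabilities",
       "arguments": { "capabilities": [
           { "capability": "pause-before-switchover", "state": true } ] } }
  -> { "execute": "migrate", "arguments": { "uri": "tcp:dst:4444" } }
  ... wait for the MIGRATION event with status "pre-switchover",
      then shut down block jobs etc. ...
  -> { "execute": "migrate-continue",
       "arguments": { "state": "pre-switchover" } }

  (destination, started with -S)
  ... wait for the MIGRATION event with status "completed" ...
  -> { "execute": "cont" }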

Kevin

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Dr. David Alan Gilbert 6 years ago
* Kevin Wolf (kwolf@redhat.com) wrote:
> On 11.04.2018 at 12:01, Jiri Denemark wrote:
> > On Tue, Apr 10, 2018 at 16:47:56 +0200, Kevin Wolf wrote:
> > > On 10.04.2018 at 16:22, Dr. David Alan Gilbert wrote:
> > > > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > > > On 10.04.2018 at 12:40, Dr. David Alan Gilbert wrote:
> > > > > > Hmm; having chatted to Jiri I'm OK with reverting it, on the condition
> > > > > > that I actually understand how this alternative would work first.
> > > > > > 
> > > > > > I can't currently see how a block-inactivate would be used.
> > > > > > I also can't see how a block-activate would be used unless it's combined
> > > > > > with the change that you're asking to revert.
> > > > > > 
> > > > > > Can you explain the way you see it working?
> > > > > 
> > > > > The key is making the delayed activation of block devices (and probably
> > > > > delayed announcement of NICs? - you didn't answer that part) optional
> > > > > instead of making it the default.
> > > > 
> > > > NIC announcements are broken in similar but slightly different ways; we
> > > > did have a series on list to help a while ago but it never got merged;
> > > > I'd like to keep that mess separate.
> > > 
> > > Okay. I just thought that it would make sense to have clear migration
> > > phases that are the same for all external resources that the QEMU
> > > processes use.
> > 
> > I don't think NIC announcements should be delayed in this specific case
> > since we're dealing with a failure recovery which should be rare in
> > comparison to successful migration when we want NIC announcements to be
> > sent early. In other words, any NIC issues should be solved separately
> > and Laine would likely be a better person for discussing them since he
> > has a broader knowledge of all the fancy network stuff which libvirt
> > needs to coordinate with.
> 
> Well, if I were the migration maintainer, I would insist on a properly
> designed phase model that solves the problem once and for all because it
> would be clear where everything belongs. We could still have bugs in the
> future, but that would be internal implementation bugs with no effect on
> the API.

My main reason for believing this wouldn't work is that most of the
things we've had recently have been cases where we found subtle
constraints that we previously didn't realise, and hence wouldn't have
included if we had been writing down the mythical phase model.

I'd have loved to have had some more discussion about what those
requirements were _before_ block locking went in a few versions back,
because unsurprisingly adding hard locking constraints shook a lot of
problems out (and IMHO was a much bigger API change than this change).

> But I'm not the maintainer and Dave prefers to deal with it basically as
> a bunch of one-off fixes, and that will work, too. It will probably
> clutter up the external API a bit (because the management tool will have
> to separately address migration of block devices, network devices and
> possibly other things in the future), but that shouldn't matter much for
> libvirt. Maybe what we do need is some documentation of the recommended
> process for performing a live migration so that management tools know
> which QMP commands they need to issue when.

I'd like to keep the networking stuff separate because it's got a whole
bunch of other interactions that we found out about last time we tried to
fix it; in particular OpenStack's networking interaction with Libvirt isn't
quite what's expected, and OpenStack has a whole bunch of different
network configurations whose behaviour when we change something is fun.
Oh, and it depends heavily on the guest - which fortunately the block
stuff doesn't (because a modern virtio-net guest does the announce itself,
and so it only ever happens once the guest CPU is running, which
simplifies stuff massively).

Dave



> Kevin
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: Don't activate block devices if using -S
Posted by Jiri Denemark 6 years ago
On Wed, Apr 04, 2018 at 12:03:03 +0200, Kevin Wolf wrote:
On 03.04.2018 at 22:52, Dr. David Alan Gilbert wrote:
> > * Kevin Wolf (kwolf@redhat.com) wrote:
> > > Consider a case where the management tool keeps a mirror job with
> > > sync=none running to expose all I/O requests to some external process.
> > > It needs to shut down the old block job on the source in the
> > > 'pre-switchover' state, and start a new block job on the destination
> > > when the destination controls the images, but the VM doesn't run yet (so
> > > that it doesn't miss an I/O request). This patch removes the migration
> > > phase that the management tool needs to implement this correctly.
> > > 
> > > If we need a "neither side has control" phase, we might need to
> > > introduce it in addition to the existing phases rather than replacing a
> > > phase that is still needed in other cases.
> > 
> > This is yet another phase to be added.
> > IMHO this needs the management tool to explicitly take control in the
> > case you're talking about.
> 
> What kind of mechanism do you have in mind there?
> 
> Maybe what could work would be separate QMP commands to inactivate (and
> possibly for symmetry activate) all block nodes. Then the management
> tool could use the pre-switchover phase to shut down its block jobs
> etc., inactivate all block nodes, transfer its own locks and then call
> migrate-continue.

Libvirt's migration protocol consists of several phases; the ones
relevant to QEMU are:

1. destination libvirtd starts a new QEMU process with -incoming
2. source libvirtd calls the migrate QMP command and waits until migration
   completes; in this step we actually wait for the pre-switchover state,
   shut down all block jobs, and call migrate-continue
3. destination libvirtd calls the cont QMP command

The daemons only communicate between these steps, i.e., the result of
each step is transferred to the other side of the migration. In other
words, transferring locks and other state can only happen between these
steps.

IIUC what you suggested would require step 2 to be split so that some
work can be done on the destination between the "migrate" command and
the completion of migration. This would be pretty complicated and I don't think
the problem we're trying to solve would be worth it. Not to mention that
it would multiply the number of possible code paths in the migration
code.

However, I don't think the extra step is even necessary. We could just
have a separate QMP command to activate block nodes. Thus the source
would wait for pre-switchover, shut down all block jobs and call
migrate-continue. Once migration completes, control would be
transferred to the destination, which would call the new command to
activate block nodes followed by cont. That is, the "cont" command would
just be replaced with two commands. And similarly to the
pre-switchover state, this could be controlled by a new migration
capability to make sure all block nodes are activated automatically with
an old libvirt which doesn't know anything about the new command. This
would solve compatibility with non-libvirt use cases too.

Jirka