[PATCH] migration: Don't try and recover return path in non-postcopy

Dr. David Alan Gilbert (git) posted 1 patch 4 years, 6 months ago
Test FreeBSD passed
Test docker-mingw@fedora passed
Test asan passed
Test docker-quick@centos7 passed
Test checkpatch passed
Test docker-clang@ubuntu passed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20191007103507.31308-1-dgilbert@redhat.com
Maintainers: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>
migration/migration.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] migration: Don't try and recover return path in non-postcopy
Posted by Dr. David Alan Gilbert (git) 4 years, 6 months ago
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In normal precopy we can't do reconnection recovery - but we also
don't need to, since you can just rerun migration.
At the moment if the 'return-path' capability is on, we use
the return path in precopy to give a postiive 'OK' to the end
of migration; however if migration fails then we fall into
the postcopy recovery path and hang.  This fixes it by only
running the return path in the postcopy case.

Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5f7e4d15e9..d5d9b31bb7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2481,7 +2481,7 @@ retry:
 out:
     res = qemu_file_get_error(rp);
     if (res) {
-        if (res == -EIO) {
+        if (res == -EIO && migration_in_postcopy()) {
             /*
              * Maybe there is something we can do: it looks like a
              * network down issue, and we pause for a recovery.
-- 
2.21.0
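
For readers following the thread, the one-line change above can be modelled outside QEMU as a small decision function. This is only a sketch under stated assumptions: the `RpOutcome` enum and both function names are hypothetical, `res` stands in for the value of qemu_file_get_error(rp), `in_postcopy` stands in for migration_in_postcopy(), and treating every other error as a plain failure is an assumption, since the hunk does not show the rest of the branch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; QEMU does not define these. */
typedef enum {
    RP_OUTCOME_OK,       /* no error recorded on the return path */
    RP_OUTCOME_RECOVER,  /* pause and wait for a postcopy reconnection */
    RP_OUTCOME_FAIL      /* give up; in precopy the migration can be rerun */
} RpOutcome;

/* Model of the condition before the patch: -EIO alone selects recovery. */
RpOutcome rp_outcome_before(int res)
{
    if (res == 0) {
        return RP_OUTCOME_OK;
    }
    return res == -EIO ? RP_OUTCOME_RECOVER : RP_OUTCOME_FAIL;
}

/* Model of the condition after the patch: recovery also needs postcopy. */
RpOutcome rp_outcome_after(int res, bool in_postcopy)
{
    if (res == 0) {
        return RP_OUTCOME_OK;
    }
    if (res == -EIO && in_postcopy) {
        return RP_OUTCOME_RECOVER;
    }
    return RP_OUTCOME_FAIL;
}

/* Small self-check covering the case described in the commit message. */
void rp_outcome_selfcheck(void)
{
    /* Precopy with return-path enabled: the old check chose recovery. */
    assert(rp_outcome_before(-EIO) == RP_OUTCOME_RECOVER);
    /* With the patch, the same error in precopy is a plain failure. */
    assert(rp_outcome_after(-EIO, false) == RP_OUTCOME_FAIL);
    /* In postcopy the recovery path is still taken. */
    assert(rp_outcome_after(-EIO, true) == RP_OUTCOME_RECOVER);
}
```

The only behavioural difference between the two models is the precopy -EIO case, which is the hang reported in this thread.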


Re: [PATCH] migration: Don't try and recover return path in non-postcopy
Posted by Dr. David Alan Gilbert 4 years, 6 months ago
* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> In normal precopy we can't do reconnection recovery - but we also
> don't need to, since you can just rerun migration.
> At the moment if the 'return-path' capability is on, we use
> the return path in precopy to give a postiive 'OK' to the end
> of migration; however if migration fails then we fall into
> the postcopy recovery path and hang.  This fixes it by only
> running the return path in the postcopy case.
> 
> Reported-by: Greg Kurz <groug@kaod.org>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Queued

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [PATCH] migration: Don't try and recover return path in non-postcopy
Posted by Peter Xu 4 years, 6 months ago
On Mon, Oct 07, 2019 at 11:35:07AM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> In normal precopy we can't do reconnection recovery - but we also
> don't need to, since you can just rerun migration.
> At the moment if the 'return-path' capability is on, we use
> the return path in precopy to give a postiive 'OK' to the end
> of migration; however if migration fails then we fall into
> the postcopy recovery path and hang.  This fixes it by only
> running the return path in the postcopy case.
> 
> Reported-by: Greg Kurz <groug@kaod.org>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  migration/migration.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 5f7e4d15e9..d5d9b31bb7 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2481,7 +2481,7 @@ retry:
>  out:
>      res = qemu_file_get_error(rp);
>      if (res) {
> -        if (res == -EIO) {
> +        if (res == -EIO && migration_in_postcopy()) {

Makes sense!  I saw that in qemu_loadvm_state_main() we're using
(postcopy_state_get() == POSTCOPY_INCOMING_RUNNING) to check.  That
also makes sense because I think we can't really do the recovery if the
migration stream failed in a state like POSTCOPY_INCOMING_DISCARD, even
if it switched to POSTCOPY_ACTIVE... However that should really be a
very rare corner case even if it's true, and afaict it's no worse...

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu
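
The corner case raised in the review above can be sketched as a toy model. Everything here is an assumption made for illustration: the `ModelIncomingState` enum only mirrors the two destination states named in the review plus a catch-all, and the three function names are hypothetical. It is not QEMU code and does not claim to match how the two sides actually exchange state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy destination states: the two named in the review plus a catch-all. */
typedef enum {
    MODEL_INCOMING_OTHER,
    MODEL_INCOMING_DISCARD,
    MODEL_INCOMING_RUNNING
} ModelIncomingState;

/* Source side after the patch: pause for recovery on -EIO in postcopy. */
bool model_src_pauses(int res, bool src_in_postcopy)
{
    return res == -EIO && src_in_postcopy;
}

/* Destination side, per the review: recovery only from the running state. */
bool model_dst_can_recover(ModelIncomingState dst)
{
    return dst == MODEL_INCOMING_RUNNING;
}

/* True when the source would wait for a recovery the destination cannot do. */
bool model_mismatch(int res, bool src_in_postcopy, ModelIncomingState dst)
{
    return model_src_pauses(res, src_in_postcopy) && !model_dst_can_recover(dst);
}

/* Self-check for the window described in the review. */
void model_selfcheck(void)
{
    /* Source already in postcopy, destination still discarding: mismatch. */
    assert(model_mismatch(-EIO, true, MODEL_INCOMING_DISCARD));
    /* Both sides fully in postcopy: no mismatch. */
    assert(!model_mismatch(-EIO, true, MODEL_INCOMING_RUNNING));
}
```

In this model the mismatch can only occur once the source is already in postcopy, so the precopy case fixed by the patch is not affected by it.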

Re: [PATCH] migration: Don't try and recover return path in non-postcopy
Posted by Greg Kurz 4 years, 6 months ago
On Mon,  7 Oct 2019 11:35:07 +0100
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> In normal precopy we can't do reconnection recovery - but we also
> don't need to, since you can just rerun migration.
> At the moment if the 'return-path' capability is on, we use
> the return path in precopy to give a postiive 'OK' to the end

s/postiive/positive

> of migration; however if migration fails then we fall into
> the postcopy recovery path and hang.  This fixes it by only
> running the return path in the postcopy case.
> 
> Reported-by: Greg Kurz <groug@kaod.org>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---

Thanks !

Tested-by: Greg Kurz <groug@kaod.org>
