[PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'

Dr. David Alan Gilbert (git) posted 1 patch 4 years, 7 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20190923174942.12182-1-dgilbert@redhat.com
Maintainers: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>
[PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
Posted by Dr. David Alan Gilbert (git) 4 years, 7 months ago
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Various parts of the migration code do different things when they're
in postcopy mode; prior to this patch this has been 'postcopy-active'.
This patch extends 'in_postcopy' to include 'postcopy-paused' and
'postcopy-recover'.

In particular, when you set the max-postcopy-bandwidth parameter, this
only affects the current migration fd if we're 'in_postcopy';
this leads to a race in the postcopy recovery test where it increases
the speed from 4k/sec to unlimited, but that increase can get ignored
if the change is made between the point at which the reconnection
happens and it transitions back to active.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/migration.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 01863a95f5..5f7e4d15e9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
 {
     MigrationState *s = migrate_get_current();
 
-    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    switch (s->state) {
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_PAUSED:
+    case MIGRATION_STATUS_POSTCOPY_RECOVER:
+        return true;
+    default:
+        return false;
+    }
 }
 
 bool migration_in_postcopy_after_devices(MigrationState *s)
-- 
2.21.0
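
The race described in the commit message can be modelled in isolation. The
following is a minimal sketch, not QEMU code: every Sketch*/sketch_* name is
hypothetical, the status enum is a cut-down stand-in for the real
MIGRATION_STATUS_* values, 4096 stands in for the test's "4k/sec", and a
limit of 0 is assumed to mean "unlimited". It only illustrates why a
bandwidth change made while the state is 'postcopy-recover' was dropped by
the old predicate and is applied by the new one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the migration states. */
typedef enum {
    SKETCH_STATUS_ACTIVE,
    SKETCH_STATUS_POSTCOPY_ACTIVE,
    SKETCH_STATUS_POSTCOPY_PAUSED,
    SKETCH_STATUS_POSTCOPY_RECOVER,
    SKETCH_STATUS_COMPLETED,
} SketchStatus;

typedef struct {
    SketchStatus state;
    int64_t fd_rate_limit; /* limit applied to the live channel; 0 = unlimited (assumption) */
} SketchMigration;

/* Pre-patch behaviour: only 'postcopy-active' counts as being in postcopy. */
static bool sketch_in_postcopy_old(const SketchMigration *s)
{
    return s->state == SKETCH_STATUS_POSTCOPY_ACTIVE;
}

/* Post-patch behaviour: the paused and recover states count as well. */
static bool sketch_in_postcopy_new(const SketchMigration *s)
{
    switch (s->state) {
    case SKETCH_STATUS_POSTCOPY_ACTIVE:
    case SKETCH_STATUS_POSTCOPY_PAUSED:
    case SKETCH_STATUS_POSTCOPY_RECOVER:
        return true;
    default:
        return false;
    }
}

typedef bool (*SketchInPostcopyFn)(const SketchMigration *);

/*
 * Models setting the postcopy bandwidth parameter: the new limit reaches
 * the live channel only if the predicate says we are in postcopy.
 * Returns true if the limit was applied to the channel.
 */
static bool sketch_set_postcopy_bandwidth(SketchMigration *s, int64_t limit,
                                          SketchInPostcopyFn in_postcopy)
{
    if (!in_postcopy(s)) {
        return false; /* change is not applied to the current channel */
    }
    s->fd_rate_limit = limit;
    return true;
}
```

With a SketchMigration in SKETCH_STATUS_POSTCOPY_RECOVER and a limit of 4096,
calling sketch_set_postcopy_bandwidth() with sketch_in_postcopy_old leaves the
limit at 4096, while the same call with sketch_in_postcopy_new lifts it to 0.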


Re: [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
Posted by Alex Bennée 4 years, 7 months ago
Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

I'm stress testing it now.

> ---
>  migration/migration.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 01863a95f5..5f7e4d15e9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
>  {
>      MigrationState *s = migrate_get_current();
>
> -    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +    switch (s->state) {
> +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
> +        return true;
> +    default:
> +        return false;
> +    }
>  }
>
>  bool migration_in_postcopy_after_devices(MigrationState *s)


--
Alex Bennée

Re: [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
Posted by Dr. David Alan Gilbert 4 years, 7 months ago
* Dr. David Alan Gilbert (git) (dgilbert@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
> 
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Queued

> ---
>  migration/migration.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 01863a95f5..5f7e4d15e9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
>  {
>      MigrationState *s = migrate_get_current();
>  
> -    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +    switch (s->state) {
> +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
> +        return true;
> +    default:
> +        return false;
> +    }
>  }
>  
>  bool migration_in_postcopy_after_devices(MigrationState *s)
> -- 
> 2.21.0
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
Posted by Markus Armbruster 4 years, 7 months ago
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> writes:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

This seems to fix the intermittent hangs I observed and bisected to
commit 8504ddeca0 "migration: Fix postcopy bw for recovery".

Tested-by: Markus Armbruster <armbru@redhat.com>

Re: [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
Posted by Alex Bennée 4 years, 7 months ago
Dr. David Alan Gilbert (git) <dgilbert@redhat.com> writes:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

In my xenial stress test I ran it 100 times and it never triggered the 180s
timeout I set on my retry.py script:

Tested-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  migration/migration.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 01863a95f5..5f7e4d15e9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1659,7 +1659,14 @@ bool migration_in_postcopy(void)
>  {
>      MigrationState *s = migrate_get_current();
>
> -    return (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
> +    switch (s->state) {
> +    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> +    case MIGRATION_STATUS_POSTCOPY_PAUSED:
> +    case MIGRATION_STATUS_POSTCOPY_RECOVER:
> +        return true;
> +    default:
> +        return false;
> +    }
>  }
>
>  bool migration_in_postcopy_after_devices(MigrationState *s)


--
Alex Bennée

Re: [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
Posted by Juan Quintela 4 years, 7 months ago
"Dr. David Alan Gilbert (git)" <dgilbert@redhat.com> wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
>
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Re: [PATCH] migration/postcopy: Recognise the recovery states as 'in_postcopy'
Posted by Peter Xu 4 years, 7 months ago
On Mon, Sep 23, 2019 at 06:49:42PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Various parts of the migration code do different things when they're
> in postcopy mode; prior to this patch this has been 'postcopy-active'.
> This patch extends 'in_postcopy' to include 'postcopy-paused' and
> 'postcopy-recover'.
> 
> In particular, when you set the max-postcopy-bandwidth parameter, this
> only affects the current migration fd if we're 'in_postcopy';
> this leads to a race in the postcopy recovery test where it increases
> the speed from 4k/sec to unlimited, but that increase can get ignored
> if the change is made between the point at which the reconnection
> happens and it transitions back to active.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Yeh this makes quite a lot of sense to me...

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu