[Qemu-devel] [PATCH] migration: incoming postcopy advise sanity checks

Greg Kurz posted 1 patch 6 years, 1 month ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/151790197381.27004.13241184632371976036.stgit@bahia.lan
Test checkpatch passed
Test docker-build@min-glib passed
Test docker-mingw@fedora passed
Test docker-quick@centos6 passed
Test ppc passed
Test s390x passed
There is a newer version of this series
migration/savevm.c |   18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
[Qemu-devel] [PATCH] migration: incoming postcopy advise sanity checks
Posted by Greg Kurz 6 years, 1 month ago
If postcopy-ram was set on the source but not on the destination,
migration doesn't occur, the destination prints an error and boots
the guest:

qemu-system-ppc64: Expected vmdescription section, but got 0

We end up with two running instances.

This behaviour was introduced in 2.11 by commit 58110f0acb1a "migration:
split common postcopy out of ram postcopy" to prepare ground for the
upcoming dirty bitmap postcopy support. It adds a new case where the
source may send an empty postcopy advise because dirty bitmap doesn't
need to check page sizes like RAM postcopy does.

If the source has enabled postcopy-ram, then it sends an advise with
the page size values. If the destination hasn't enabled postcopy-ram,
then loadvm_postcopy_handle_advise() leaves the page size values on
the stream and returns. This confuses qemu_loadvm_state() later on
and causes the destination to start execution.

As discussed several times, postcopy-ram should be enabled both sides
to be functional. This patch changes the destination to perform some
extra checks on the advise length to ensure this is the case. Otherwise
an error is returned and migration is aborted.

Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 migration/savevm.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index b7908f62be3c..1c516fcbb8d7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1376,7 +1376,8 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
  * *might* happen - it might be skipped if precopy transferred everything
  * quickly.
  */
-static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
+static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
+                                         uint16_t len)
 {
     PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
     uint64_t remote_pagesize_summary, local_pagesize_summary, remote_tps;
@@ -1387,8 +1388,19 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
         return -1;
     }
 
-    if (!migrate_postcopy_ram()) {
+    switch (len) {
+    case 0:
+        /* The source hasn't enabled postcopy-ram. Nothing to do. */
         return 0;
+    case 8 + 8:
+        if (!migrate_postcopy_ram()) {
+            error_report("RAM postcopy is disabled");
+            return -EINVAL;
+        }
+        break;
+    default:
+        error_report("CMD_POSTCOPY_ADVISE invalid length (%d)", len);
+        return -EINVAL;
     }
 
     if (!postcopy_ram_supported_by_host(mis)) {
@@ -1807,7 +1819,7 @@ static int loadvm_process_command(QEMUFile *f)
         return loadvm_handle_cmd_packaged(mis);
 
     case MIG_CMD_POSTCOPY_ADVISE:
-        return loadvm_postcopy_handle_advise(mis);
+        return loadvm_postcopy_handle_advise(mis, len);
 
     case MIG_CMD_POSTCOPY_LISTEN:
         return loadvm_postcopy_handle_listen(mis);


Re: [Qemu-devel] [PATCH] migration: incoming postcopy advise sanity checks
Posted by Vladimir Sementsov-Ogievskiy 6 years, 1 month ago
06.02.2018 10:26, Greg Kurz wrote:
> If postcopy-ram was set on the source but not on the destination,
> migration doesn't occur, the destination prints an error and boots
> the guest:
>
> qemu-system-ppc64: Expected vmdescription section, but got 0
>
> We end up with two running instances.
>
> This behaviour was introduced in 2.11 by commit 58110f0acb1a "migration:
> split common postcopy out of ram postcopy" to prepare ground for the
> upcoming dirty bitmap postcopy support. It adds a new case where the
> source may send an empty postcopy advise because dirty bitmap doesn't
> need to check page sizes like RAM postcopy does.
>
> If the source has enabled postcopy-ram, then it sends an advise with
> the page size values. If the destination hasn't enabled postcopy-ram,
> then loadvm_postcopy_handle_advise() leaves the page size values on
> the stream and returns. This confuses qemu_loadvm_state() later on
> and causes the destination to start execution.
>
> As discussed several times, postcopy-ram should be enabled both sides
> to be functional. This patch changes the destination to perform some
> extra checks on the advise length to ensure this is the case. Otherwise
> an error is returned and migration is aborted.
>
> Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>   migration/savevm.c |   18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index b7908f62be3c..1c516fcbb8d7 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1376,7 +1376,8 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
>    * *might* happen - it might be skipped if precopy transferred everything
>    * quickly.
>    */
> -static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
> +static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
> +                                         uint16_t len)
>   {
>       PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
>       uint64_t remote_pagesize_summary, local_pagesize_summary, remote_tps;
> @@ -1387,8 +1388,19 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
>           return -1;
>       }
>   
> -    if (!migrate_postcopy_ram()) {
> +    switch (len) {
> +    case 0:
> +        /* The source hasn't enabled postcopy-ram. Nothing to do. */

should we error-out here if (migrate_postcopy_ram()) ?

>           return 0;
> +    case 8 + 8:
> +        if (!migrate_postcopy_ram()) {
> +            error_report("RAM postcopy is disabled");
> +            return -EINVAL;
> +        }
> +        break;
> +    default:
> +        error_report("CMD_POSTCOPY_ADVISE invalid length (%d)", len);
> +        return -EINVAL;
>       }
>   
>       if (!postcopy_ram_supported_by_host(mis)) {
> @@ -1807,7 +1819,7 @@ static int loadvm_process_command(QEMUFile *f)
>           return loadvm_handle_cmd_packaged(mis);
>   
>       case MIG_CMD_POSTCOPY_ADVISE:
> -        return loadvm_postcopy_handle_advise(mis);
> +        return loadvm_postcopy_handle_advise(mis, len);
>   
>       case MIG_CMD_POSTCOPY_LISTEN:
>           return loadvm_postcopy_handle_listen(mis);
>


-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH] migration: incoming postcopy advise sanity checks
Posted by Greg Kurz 6 years, 1 month ago
On Tue, 6 Feb 2018 10:49:47 +0300
Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:

> 06.02.2018 10:26, Greg Kurz wrote:
> > If postcopy-ram was set on the source but not on the destination,
> > migration doesn't occur, the destination prints an error and boots
> > the guest:
> >
> > qemu-system-ppc64: Expected vmdescription section, but got 0
> >
> > We end up with two running instances.
> >
> > This behaviour was introduced in 2.11 by commit 58110f0acb1a "migration:
> > split common postcopy out of ram postcopy" to prepare ground for the
> > upcoming dirty bitmap postcopy support. It adds a new case where the
> > source may send an empty postcopy advise because dirty bitmap doesn't
> > need to check page sizes like RAM postcopy does.
> >
> > If the source has enabled postcopy-ram, then it sends an advise with
> > the page size values. If the destination hasn't enabled postcopy-ram,
> > then loadvm_postcopy_handle_advise() leaves the page size values on
> > the stream and returns. This confuses qemu_loadvm_state() later on
> > and causes the destination to start execution.
> >
> > As discussed several times, postcopy-ram should be enabled both sides
> > to be functional. This patch changes the destination to perform some
> > extra checks on the advise length to ensure this is the case. Otherwise
> > an error is returned and migration is aborted.
> >
> > Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >   migration/savevm.c |   18 +++++++++++++++---
> >   1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/migration/savevm.c b/migration/savevm.c
> > index b7908f62be3c..1c516fcbb8d7 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -1376,7 +1376,8 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
> >    * *might* happen - it might be skipped if precopy transferred everything
> >    * quickly.
> >    */
> > -static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
> > +static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
> > +                                         uint16_t len)
> >   {
> >       PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
> >       uint64_t remote_pagesize_summary, local_pagesize_summary, remote_tps;
> > @@ -1387,8 +1388,19 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
> >           return -1;
> >       }
> >   
> > -    if (!migrate_postcopy_ram()) {
> > +    switch (len) {
> > +    case 0:
> > +        /* The source hasn't enabled postcopy-ram. Nothing to do. */  
> 
> should we error-out here if (migrate_postcopy_ram()) ?
> 

I was also thinking so at first, but if the source hasn't enabled postcopy-ram,
then RAM postcopy won't happen. Not sure why we should error out...

> >           return 0;
> > +    case 8 + 8:
> > +        if (!migrate_postcopy_ram()) {
> > +            error_report("RAM postcopy is disabled");
> > +            return -EINVAL;
> > +        }
> > +        break;
> > +    default:
> > +        error_report("CMD_POSTCOPY_ADVISE invalid length (%d)", len);
> > +        return -EINVAL;
> >       }
> >   
> >       if (!postcopy_ram_supported_by_host(mis)) {
> > @@ -1807,7 +1819,7 @@ static int loadvm_process_command(QEMUFile *f)
> >           return loadvm_handle_cmd_packaged(mis);
> >   
> >       case MIG_CMD_POSTCOPY_ADVISE:
> > -        return loadvm_postcopy_handle_advise(mis);
> > +        return loadvm_postcopy_handle_advise(mis, len);
> >   
> >       case MIG_CMD_POSTCOPY_LISTEN:
> >           return loadvm_postcopy_handle_listen(mis);
> >  
> 
> 


Re: [Qemu-devel] [PATCH] migration: incoming postcopy advise sanity checks
Posted by Vladimir Sementsov-Ogievskiy 6 years, 1 month ago
06.02.2018 11:43, Greg Kurz wrote:
> On Tue, 6 Feb 2018 10:49:47 +0300
> Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>
>> 06.02.2018 10:26, Greg Kurz wrote:
>>> If postcopy-ram was set on the source but not on the destination,
>>> migration doesn't occur, the destination prints an error and boots
>>> the guest:
>>>
>>> qemu-system-ppc64: Expected vmdescription section, but got 0
>>>
>>> We end up with two running instances.
>>>
>>> This behaviour was introduced in 2.11 by commit 58110f0acb1a "migration:
>>> split common postcopy out of ram postcopy" to prepare ground for the
>>> upcoming dirty bitmap postcopy support. It adds a new case where the
>>> source may send an empty postcopy advise because dirty bitmap doesn't
>>> need to check page sizes like RAM postcopy does.
>>>
>>> If the source has enabled postcopy-ram, then it sends an advise with
>>> the page size values. If the destination hasn't enabled postcopy-ram,
>>> then loadvm_postcopy_handle_advise() leaves the page size values on
>>> the stream and returns. This confuses qemu_loadvm_state() later on
>>> and causes the destination to start execution.
>>>
>>> As discussed several times, postcopy-ram should be enabled both sides
>>> to be functional. This patch changes the destination to perform some
>>> extra checks on the advise length to ensure this is the case. Otherwise
>>> an error is returned and migration is aborted.
>>>
>>> Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
>>> Signed-off-by: Greg Kurz <groug@kaod.org>
>>> ---
>>>    migration/savevm.c |   18 +++++++++++++++---
>>>    1 file changed, 15 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>> index b7908f62be3c..1c516fcbb8d7 100644
>>> --- a/migration/savevm.c
>>> +++ b/migration/savevm.c
>>> @@ -1376,7 +1376,8 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
>>>     * *might* happen - it might be skipped if precopy transferred everything
>>>     * quickly.
>>>     */
>>> -static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
>>> +static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
>>> +                                         uint16_t len)
>>>    {
>>>        PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
>>>        uint64_t remote_pagesize_summary, local_pagesize_summary, remote_tps;
>>> @@ -1387,8 +1388,19 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
>>>            return -1;
>>>        }
>>>    
>>> -    if (!migrate_postcopy_ram()) {
>>> +    switch (len) {
>>> +    case 0:
>>> +        /* The source hasn't enabled postcopy-ram. Nothing to do. */
>> should we error-out here if (migrate_postcopy_ram()) ?
>>
> I was also thinking so at first, but if the source hasn't enabled postcopy-ram,
> then RAM postcopy won't happen. Not sure why we should error out...

if user enables dirty-bitmaps postcopy on source, and enables 
postcopy-ram on target,
we will be here, with migrate_postcopy_ram()=true. and it should be error.

>
>>>            return 0;
>>> +    case 8 + 8:
>>> +        if (!migrate_postcopy_ram()) {
>>> +            error_report("RAM postcopy is disabled");
>>> +            return -EINVAL;
>>> +        }
>>> +        break;
>>> +    default:
>>> +        error_report("CMD_POSTCOPY_ADVISE invalid length (%d)", len);
>>> +        return -EINVAL;
>>>        }
>>>    
>>>        if (!postcopy_ram_supported_by_host(mis)) {
>>> @@ -1807,7 +1819,7 @@ static int loadvm_process_command(QEMUFile *f)
>>>            return loadvm_handle_cmd_packaged(mis);
>>>    
>>>        case MIG_CMD_POSTCOPY_ADVISE:
>>> -        return loadvm_postcopy_handle_advise(mis);
>>> +        return loadvm_postcopy_handle_advise(mis, len);
>>>    
>>>        case MIG_CMD_POSTCOPY_LISTEN:
>>>            return loadvm_postcopy_handle_listen(mis);
>>>   
>>


-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH] migration: incoming postcopy advise sanity checks
Posted by Dr. David Alan Gilbert 6 years, 1 month ago
* Vladimir Sementsov-Ogievskiy (vsementsov@virtuozzo.com) wrote:
> 06.02.2018 11:43, Greg Kurz wrote:
> > On Tue, 6 Feb 2018 10:49:47 +0300
> > Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
> > 
> > > 06.02.2018 10:26, Greg Kurz wrote:
> > > > If postcopy-ram was set on the source but not on the destination,
> > > > migration doesn't occur, the destination prints an error and boots
> > > > the guest:
> > > > 
> > > > qemu-system-ppc64: Expected vmdescription section, but got 0
> > > > 
> > > > We end up with two running instances.
> > > > 
> > > > This behaviour was introduced in 2.11 by commit 58110f0acb1a "migration:
> > > > split common postcopy out of ram postcopy" to prepare ground for the
> > > > upcoming dirty bitmap postcopy support. It adds a new case where the
> > > > source may send an empty postcopy advise because dirty bitmap doesn't
> > > > need to check page sizes like RAM postcopy does.
> > > > 
> > > > If the source has enabled postcopy-ram, then it sends an advise with
> > > > the page size values. If the destination hasn't enabled postcopy-ram,
> > > > then loadvm_postcopy_handle_advise() leaves the page size values on
> > > > the stream and returns. This confuses qemu_loadvm_state() later on
> > > > and causes the destination to start execution.
> > > > 
> > > > As discussed several times, postcopy-ram should be enabled both sides
> > > > to be functional. This patch changes the destination to perform some
> > > > extra checks on the advise length to ensure this is the case. Otherwise
> > > > an error is returned and migration is aborted.
> > > > 
> > > > Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> > > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > > > ---
> > > >    migration/savevm.c |   18 +++++++++++++++---
> > > >    1 file changed, 15 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > > index b7908f62be3c..1c516fcbb8d7 100644
> > > > --- a/migration/savevm.c
> > > > +++ b/migration/savevm.c
> > > > @@ -1376,7 +1376,8 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
> > > >     * *might* happen - it might be skipped if precopy transferred everything
> > > >     * quickly.
> > > >     */
> > > > -static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
> > > > +static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
> > > > +                                         uint16_t len)
> > > >    {
> > > >        PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
> > > >        uint64_t remote_pagesize_summary, local_pagesize_summary, remote_tps;
> > > > @@ -1387,8 +1388,19 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
> > > >            return -1;
> > > >        }
> > > > -    if (!migrate_postcopy_ram()) {
> > > > +    switch (len) {
> > > > +    case 0:
> > > > +        /* The source hasn't enabled postcopy-ram. Nothing to do. */
> > > should we error-out here if (migrate_postcopy_ram()) ?
> > > 
> > I was also thinking so at first, but if the source hasn't enabled postcopy-ram,
> > then RAM postcopy won't happen. Not sure why we should error out...
> 
> if user enables dirty-bitmaps postcopy on source, and enables postcopy-ram
> on target,
> we will be here, with migrate_postcopy_ram()=true. and it should be error.

In that case it's probably worth it.

> > 
> > > >            return 0;
> > > > +    case 8 + 8:
> > > > +        if (!migrate_postcopy_ram()) {
> > > > +            error_report("RAM postcopy is disabled");

Can you make that a little bit more descriptive please:
    error_report("RAM postcopy is disabled but have 16 byte advise");
should do.

Thanks,

Dave

> > > > +            return -EINVAL;
> > > > +        }
> > > > +        break;
> > > > +    default:
> > > > +        error_report("CMD_POSTCOPY_ADVISE invalid length (%d)", len);
> > > > +        return -EINVAL;
> > > >        }
> > > >        if (!postcopy_ram_supported_by_host(mis)) {
> > > > @@ -1807,7 +1819,7 @@ static int loadvm_process_command(QEMUFile *f)
> > > >            return loadvm_handle_cmd_packaged(mis);
> > > >        case MIG_CMD_POSTCOPY_ADVISE:
> > > > -        return loadvm_postcopy_handle_advise(mis);
> > > > +        return loadvm_postcopy_handle_advise(mis, len);
> > > >        case MIG_CMD_POSTCOPY_LISTEN:
> > > >            return loadvm_postcopy_handle_listen(mis);
> > > 
> 
> 
> -- 
> Best regards,
> Vladimir
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK