[Qemu-devel] [PATCH] migration: always initialize RAMBlock.bmap to 1 for new migration

Ivan Ren posted 1 patch 4 years, 9 months ago
Test checkpatch passed
Test s390x passed
Test asan passed
Test docker-mingw@fedora passed
Test FreeBSD passed
Test docker-clang@ubuntu passed
Failed in applying to current master
[Qemu-devel] [PATCH] migration: always initialize RAMBlock.bmap to 1 for new migration
Posted by Ivan Ren 4 years, 9 months ago
Reproduce the problem:
migrate
migrate_cancel
migrate

An error happens during memory migration.

The reason is as follows:
1. qemu start, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] all set to
   1 by a series of cpu_physical_memory_set_dirty_range
2. migration start: ram_init_bitmaps
   - memory_global_dirty_log_start: begin log diry
   - memory_global_dirty_log_sync: sync dirty bitmap to
     ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
   - migration_bitmap_sync_range: sync ram_list.
     dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
     and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
3. migration data...
4. migrate_cancel, which stops dirty logging
5. migration start: ram_init_bitmaps
   - memory_global_dirty_log_start: begin log diry
   - memory_global_dirty_log_sync: sync dirty bitmap to
     ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
   - migration_bitmap_sync_range: sync ram_list.
     dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
     and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero

   Here RAMBlock.bmap only has the newly logged dirty pages; it does not
   contain all of the guest pages.

Signed-off-by: Ivan Ren <ivanren@tencent.com>
---
 migration/ram.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 908517fc2b..bbebaee0c1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3173,11 +3173,11 @@ static int ram_state_init(RAMState **rsp)
     QSIMPLEQ_INIT(&(*rsp)->src_page_requests);
 
     /*
+     * Count the total number of pages used by ram blocks not including any
+     * gaps due to alignment or unplugs.
      * This must match with the initial values of dirty bitmap.
-     * Currently we initialize the dirty bitmap to all zeros so
-     * here the total dirty page count is zero.
      */
-    (*rsp)->migration_dirty_pages = 0;
+    (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
     ram_state_reset(*rsp);
 
     return 0;
@@ -3196,12 +3196,13 @@ static void ram_list_init_bitmaps(void)
              * The initial dirty bitmap for migration must be set with all
              * ones to make sure we'll migrate every guest RAM page to
              * destination.
-             * Here we didn't set RAMBlock.bmap simply because it is already
-             * set in ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] in
-             * ram_block_add, and that's where we'll sync the dirty bitmaps.
-             * Here setting RAMBlock.bmap would be fine too but not necessary.
+             * Here we set RAMBlock.bmap all to 1 because when rebegin a
+             * new migration after a failed migration, ram_list.
+             * dirty_memory[DIRTY_MEMORY_MIGRATION] don't include the whole
+             * guest memory.
              */
             block->bmap = bitmap_new(pages);
+            bitmap_set(block->bmap, 0, pages);
             if (migrate_postcopy_ram()) {
                 block->unsentmap = bitmap_new(pages);
                 bitmap_set(block->unsentmap, 0, pages);
-- 
2.17.2 (Apple Git-113)


Re: [Qemu-devel] [PATCH] migration: always initialize RAMBlock.bmap to 1 for new migration
Posted by Wei Yang 4 years, 8 months ago
Thanks, I didn't notice this case.

On Sun, Jul 14, 2019 at 10:51:19PM +0800, Ivan Ren wrote:
>Reproduce the problem:
>migrate
>migrate_cancel
>migrate
>
>Error happen for memory migration
>
>The reason as follows:
>1. qemu start, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] all set to
>   1 by a series of cpu_physical_memory_set_dirty_range
>2. migration start:ram_init_bitmaps
>   - memory_global_dirty_log_start: begin log diry

s/diry/dirty/

>   - memory_global_dirty_log_sync: sync dirty bitmap to
>     ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
>   - migration_bitmap_sync_range: sync ram_list.
>     dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
>     and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
>3. migration data...
>4. migrate_cancel, will stop log dirty
>5. migration start:ram_init_bitmaps
>   - memory_global_dirty_log_start: begin log diry

same as above

>   - memory_global_dirty_log_sync: sync dirty bitmap to
>     ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
>   - migration_bitmap_sync_range: sync ram_list.
>     dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
>     and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
>
>   Here RAMBlock.bmap only have new logged dirty pages, don't contain
>   the whole guest pages.
>
>Signed-off-by: Ivan Ren <ivanren@tencent.com>
>---
> migration/ram.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
>
>diff --git a/migration/ram.c b/migration/ram.c
>index 908517fc2b..bbebaee0c1 100644
>--- a/migration/ram.c
>+++ b/migration/ram.c
>@@ -3173,11 +3173,11 @@ static int ram_state_init(RAMState **rsp)
>     QSIMPLEQ_INIT(&(*rsp)->src_page_requests);
> 
>     /*
>+     * Count the total number of pages used by ram blocks not including any
>+     * gaps due to alignment or unplugs.
>      * This must match with the initial values of dirty bitmap.
>-     * Currently we initialize the dirty bitmap to all zeros so
>-     * here the total dirty page count is zero.
>      */
>-    (*rsp)->migration_dirty_pages = 0;
>+    (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
>     ram_state_reset(*rsp);
> 
>     return 0;
>@@ -3196,12 +3196,13 @@ static void ram_list_init_bitmaps(void)
>              * The initial dirty bitmap for migration must be set with all
>              * ones to make sure we'll migrate every guest RAM page to
>              * destination.
>-             * Here we didn't set RAMBlock.bmap simply because it is already
>-             * set in ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] in
>-             * ram_block_add, and that's where we'll sync the dirty bitmaps.
>-             * Here setting RAMBlock.bmap would be fine too but not necessary.
>+             * Here we set RAMBlock.bmap all to 1 because when rebegin a
>+             * new migration after a failed migration, ram_list.

It is after failure or cancel?

>+             * dirty_memory[DIRTY_MEMORY_MIGRATION] don't include the whole
>+             * guest memory.
>              */
>             block->bmap = bitmap_new(pages);
>+            bitmap_set(block->bmap, 0, pages);
>             if (migrate_postcopy_ram()) {
>                 block->unsentmap = bitmap_new(pages);
>                 bitmap_set(block->unsentmap, 0, pages);
>-- 
>2.17.2 (Apple Git-113)
>


Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>

-- 
Wei Yang
Help you, Help me

Re: [Qemu-devel] [PATCH] migration: always initialize RAMBlock.bmap to 1 for new migration
Posted by Ivan Ren 4 years, 8 months ago
>>-             * Here we didn't set RAMBlock.bmap simply because it is already
>>-             * set in ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] in
>>-             * ram_block_add, and that's where we'll sync the dirty bitmaps.
>>-             * Here setting RAMBlock.bmap would be fine too but not necessary.
>>+             * Here we set RAMBlock.bmap all to 1 because when rebegin a
>>+             * new migration after a failed migration, ram_list.
>
>It is after failure or cancel?

Here we can treat cancel as a special fail.

Thanks.

On Thu, Aug 1, 2019 at 10:56 AM Wei Yang <richardw.yang@linux.intel.com>
wrote:

> Thanks, I didn't notice this case.
>
> On Sun, Jul 14, 2019 at 10:51:19PM +0800, Ivan Ren wrote:
> >Reproduce the problem:
> >migrate
> >migrate_cancel
> >migrate
> >
> >Error happen for memory migration
> >
> >The reason as follows:
> >1. qemu start, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] all set to
> >   1 by a series of cpu_physical_memory_set_dirty_range
> >2. migration start:ram_init_bitmaps
> >   - memory_global_dirty_log_start: begin log diry
>
> s/diry/dirty/
>
> >   - memory_global_dirty_log_sync: sync dirty bitmap to
> >     ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
> >   - migration_bitmap_sync_range: sync ram_list.
> >     dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
> >     and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
> >3. migration data...
> >4. migrate_cancel, will stop log dirty
> >5. migration start:ram_init_bitmaps
> >   - memory_global_dirty_log_start: begin log diry
>
> same as above
>
> >   - memory_global_dirty_log_sync: sync dirty bitmap to
> >     ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
> >   - migration_bitmap_sync_range: sync ram_list.
> >     dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
> >     and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
> >
> >   Here RAMBlock.bmap only have new logged dirty pages, don't contain
> >   the whole guest pages.
> >
> >Signed-off-by: Ivan Ren <ivanren@tencent.com>
> >---
> > migration/ram.c | 15 ++++++++-------
> > 1 file changed, 8 insertions(+), 7 deletions(-)
> >
> >diff --git a/migration/ram.c b/migration/ram.c
> >index 908517fc2b..bbebaee0c1 100644
> >--- a/migration/ram.c
> >+++ b/migration/ram.c
> >@@ -3173,11 +3173,11 @@ static int ram_state_init(RAMState **rsp)
> >     QSIMPLEQ_INIT(&(*rsp)->src_page_requests);
> >
> >     /*
> >+     * Count the total number of pages used by ram blocks not including any
> >+     * gaps due to alignment or unplugs.
> >      * This must match with the initial values of dirty bitmap.
> >-     * Currently we initialize the dirty bitmap to all zeros so
> >-     * here the total dirty page count is zero.
> >      */
> >-    (*rsp)->migration_dirty_pages = 0;
> >+    (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
> >     ram_state_reset(*rsp);
> >
> >     return 0;
> >@@ -3196,12 +3196,13 @@ static void ram_list_init_bitmaps(void)
> >              * The initial dirty bitmap for migration must be set with all
> >              * ones to make sure we'll migrate every guest RAM page to
> >              * destination.
> >-             * Here we didn't set RAMBlock.bmap simply because it is already
> >-             * set in ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] in
> >-             * ram_block_add, and that's where we'll sync the dirty bitmaps.
> >-             * Here setting RAMBlock.bmap would be fine too but not necessary.
> >+             * Here we set RAMBlock.bmap all to 1 because when rebegin a
> >+             * new migration after a failed migration, ram_list.
>
> It is after failure or cancel?
>
> >+             * dirty_memory[DIRTY_MEMORY_MIGRATION] don't include the whole
> >+             * guest memory.
> >              */
> >             block->bmap = bitmap_new(pages);
> >+            bitmap_set(block->bmap, 0, pages);
> >             if (migrate_postcopy_ram()) {
> >                 block->unsentmap = bitmap_new(pages);
> >                 bitmap_set(block->unsentmap, 0, pages);
> >--
> >2.17.2 (Apple Git-113)
> >
>
>
> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>
>
> --
> Wei Yang
> Help you, Help me
>
Re: [Qemu-devel] [PATCH] migration: always initialize RAMBlock.bmap to 1 for new migration
Posted by Wei Yang 4 years, 8 months ago
On Thu, Aug 01, 2019 at 03:58:54PM +0800, Ivan Ren wrote:
>>>-             * Here we didn't set RAMBlock.bmap simply because it is already
>>>-             * set in ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] in
>>>-             * ram_block_add, and that's where we'll sync the dirty bitmaps.
>>>-             * Here setting RAMBlock.bmap would be fine too but not necessary.
>>>+             * Here we set RAMBlock.bmap all to 1 because when rebegin a
>>>+             * new migration after a failed migration, ram_list.
>>
>>It is after failure or cancel?
>
>Here we can treat cancel as a special fail.
>

Well, I am not sure this is officially documented. Would this lead to
confusion?

>Thanks.
>

-- 
Wei Yang
Help you, Help me

Re: [Qemu-devel] [PATCH] migration: always initialize RAMBlock.bmap to 1 for new migration
Posted by Dr. David Alan Gilbert 4 years, 9 months ago
* Ivan Ren (renyime@gmail.com) wrote:
> Reproduce the problem:
> migrate
> migrate_cancel
> migrate
> 
> Error happen for memory migration

Can we fix this by just reverting 03158519384?

Dave

> The reason as follows:
> 1. qemu start, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] all set to
>    1 by a series of cpu_physical_memory_set_dirty_range
> 2. migration start:ram_init_bitmaps
>    - memory_global_dirty_log_start: begin log diry
>    - memory_global_dirty_log_sync: sync dirty bitmap to
>      ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
>    - migration_bitmap_sync_range: sync ram_list.
>      dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
>      and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
> 3. migration data...
> 4. migrate_cancel, will stop log dirty
> 5. migration start:ram_init_bitmaps
>    - memory_global_dirty_log_start: begin log diry
>    - memory_global_dirty_log_sync: sync dirty bitmap to
>      ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
>    - migration_bitmap_sync_range: sync ram_list.
>      dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
>      and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
> 
>    Here RAMBlock.bmap only have new logged dirty pages, don't contain
>    the whole guest pages.
> 
> Signed-off-by: Ivan Ren <ivanren@tencent.com>
> ---
>  migration/ram.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index 908517fc2b..bbebaee0c1 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3173,11 +3173,11 @@ static int ram_state_init(RAMState **rsp)
>      QSIMPLEQ_INIT(&(*rsp)->src_page_requests);
>  
>      /*
> +     * Count the total number of pages used by ram blocks not including any
> +     * gaps due to alignment or unplugs.
>       * This must match with the initial values of dirty bitmap.
> -     * Currently we initialize the dirty bitmap to all zeros so
> -     * here the total dirty page count is zero.
>       */
> -    (*rsp)->migration_dirty_pages = 0;
> +    (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
>      ram_state_reset(*rsp);
>  
>      return 0;
> @@ -3196,12 +3196,13 @@ static void ram_list_init_bitmaps(void)
>               * The initial dirty bitmap for migration must be set with all
>               * ones to make sure we'll migrate every guest RAM page to
>               * destination.
> -             * Here we didn't set RAMBlock.bmap simply because it is already
> -             * set in ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] in
> -             * ram_block_add, and that's where we'll sync the dirty bitmaps.
> -             * Here setting RAMBlock.bmap would be fine too but not necessary.
> +             * Here we set RAMBlock.bmap all to 1 because when rebegin a
> +             * new migration after a failed migration, ram_list.
> +             * dirty_memory[DIRTY_MEMORY_MIGRATION] don't include the whole
> +             * guest memory.
>               */
>              block->bmap = bitmap_new(pages);
> +            bitmap_set(block->bmap, 0, pages);
>              if (migrate_postcopy_ram()) {
>                  block->unsentmap = bitmap_new(pages);
>                  bitmap_set(block->unsentmap, 0, pages);
> -- 
> 2.17.2 (Apple Git-113)
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: always initialize RAMBlock.bmap to 1 for new migration
Posted by Peter Xu 4 years, 9 months ago
On Sun, Jul 14, 2019 at 10:51:19PM +0800, Ivan Ren wrote:
> Reproduce the problem:
> migrate
> migrate_cancel
> migrate
> 
> Error happen for memory migration

Can mention "this mostly revert 0315851938 but with comments kept"
when merge...

> 
> The reason as follows:
> 1. qemu start, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] all set to
>    1 by a series of cpu_physical_memory_set_dirty_range
> 2. migration start:ram_init_bitmaps
>    - memory_global_dirty_log_start: begin log diry
>    - memory_global_dirty_log_sync: sync dirty bitmap to
>      ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
>    - migration_bitmap_sync_range: sync ram_list.
>      dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
>      and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
> 3. migration data...
> 4. migrate_cancel, will stop log dirty
> 5. migration start:ram_init_bitmaps
>    - memory_global_dirty_log_start: begin log diry
>    - memory_global_dirty_log_sync: sync dirty bitmap to
>      ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
>    - migration_bitmap_sync_range: sync ram_list.
>      dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
>      and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
> 
>    Here RAMBlock.bmap only have new logged dirty pages, don't contain
>    the whole guest pages.

Fixes: 03158519384f158

> 
> Signed-off-by: Ivan Ren <ivanren@tencent.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

I think this is a bit severe and should be rc2 material.  Dave/Juan?

Thanks,

-- 
Peter Xu

Re: [Qemu-devel] [PATCH] migration: always initialize RAMBlock.bmap to 1 for new migration
Posted by Dr. David Alan Gilbert 4 years, 9 months ago
* Peter Xu (zhexu@redhat.com) wrote:
> On Sun, Jul 14, 2019 at 10:51:19PM +0800, Ivan Ren wrote:
> > Reproduce the problem:
> > migrate
> > migrate_cancel
> > migrate
> > 
> > Error happen for memory migration
> 
> Can mention "this mostly revert 0315851938 but with comments kept"
> when merge...
> 
> > 
> > The reason as follows:
> > 1. qemu start, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] all set to
> >    1 by a series of cpu_physical_memory_set_dirty_range
> > 2. migration start:ram_init_bitmaps
> >    - memory_global_dirty_log_start: begin log diry
> >    - memory_global_dirty_log_sync: sync dirty bitmap to
> >      ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
> >    - migration_bitmap_sync_range: sync ram_list.
> >      dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
> >      and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
> > 3. migration data...
> > 4. migrate_cancel, will stop log dirty
> > 5. migration start:ram_init_bitmaps
> >    - memory_global_dirty_log_start: begin log diry
> >    - memory_global_dirty_log_sync: sync dirty bitmap to
> >      ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
> >    - migration_bitmap_sync_range: sync ram_list.
> >      dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
> >      and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
> > 
> >    Here RAMBlock.bmap only have new logged dirty pages, don't contain
> >    the whole guest pages.
> 
> Fixes: 03158519384f158
> 
> > 
> > Signed-off-by: Ivan Ren <ivanren@tencent.com>
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> I think this is a bit severe and should be rc2 material.  Dave/Juan?

Yes agreed; I've added it to the planning/4.1 page.

> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK