Reproduce the problem:
migrate
migrate_cancel
migrate
An error happens for memory migration.
The reason is as follows:
1. qemu start, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] all set to
1 by a series of cpu_physical_memory_set_dirty_range
2. migration start:ram_init_bitmaps
- memory_global_dirty_log_start: begin log diry
- memory_global_dirty_log_sync: sync dirty bitmap to
ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
- migration_bitmap_sync_range: sync ram_list.
dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
3. migration data...
4. migrate_cancel, will stop log dirty
5. migration start:ram_init_bitmaps
- memory_global_dirty_log_start: begin log diry
- memory_global_dirty_log_sync: sync dirty bitmap to
ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]
- migration_bitmap_sync_range: sync ram_list.
dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap
and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero
Here RAMBlock.bmap only contains the newly logged dirty pages, not
the whole of guest memory.
Signed-off-by: Ivan Ren <ivanren@tencent.com>
---
migration/ram.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 908517fc2b..bbebaee0c1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3173,11 +3173,11 @@ static int ram_state_init(RAMState **rsp)
QSIMPLEQ_INIT(&(*rsp)->src_page_requests);
/*
+ * Count the total number of pages used by ram blocks not including any
+ * gaps due to alignment or unplugs.
* This must match with the initial values of dirty bitmap.
- * Currently we initialize the dirty bitmap to all zeros so
- * here the total dirty page count is zero.
*/
- (*rsp)->migration_dirty_pages = 0;
+ (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
ram_state_reset(*rsp);
return 0;
@@ -3196,12 +3196,13 @@ static void ram_list_init_bitmaps(void)
* The initial dirty bitmap for migration must be set with all
* ones to make sure we'll migrate every guest RAM page to
* destination.
- * Here we didn't set RAMBlock.bmap simply because it is already
- * set in ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] in
- * ram_block_add, and that's where we'll sync the dirty bitmaps.
- * Here setting RAMBlock.bmap would be fine too but not necessary.
+ * Here we set RAMBlock.bmap all to 1 because when rebegin a
+ * new migration after a failed migration, ram_list.
+ * dirty_memory[DIRTY_MEMORY_MIGRATION] don't include the whole
+ * guest memory.
*/
block->bmap = bitmap_new(pages);
+ bitmap_set(block->bmap, 0, pages);
if (migrate_postcopy_ram()) {
block->unsentmap = bitmap_new(pages);
bitmap_set(block->unsentmap, 0, pages);
--
2.17.2 (Apple Git-113)
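The fix above can be illustrated with a small standalone sketch (not QEMU code; `TARGET_PAGE_BITS`, `bitmap_new_all_ones()` and `initial_dirty_pages()` are hypothetical names, not the real QEMU APIs). It mirrors the two ideas in the patch: the initial `migration_dirty_pages` count must equal the total number of guest pages, and `RAMBlock.bmap` must start with every bit set, so that a second migration after a cancelled one still sends every page:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TARGET_PAGE_BITS 12  /* 4 KiB target pages, as on most QEMU targets */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Allocate a one-bit-per-page bitmap with all 'pages' bits set to 1,
 * analogous to bitmap_new() followed by bitmap_set(bmap, 0, pages). */
static unsigned long *bitmap_new_all_ones(size_t pages)
{
    size_t words = (pages + BITS_PER_LONG - 1) / BITS_PER_LONG;
    unsigned long *bmap = calloc(words, sizeof(unsigned long));
    if (!bmap) {
        return NULL;
    }
    for (size_t i = 0; i < pages; i++) {
        bmap[i / BITS_PER_LONG] |= 1UL << (i % BITS_PER_LONG);
    }
    return bmap;
}

/* Check whether 'page' is marked dirty (i.e. still needs sending). */
static bool bitmap_test(const unsigned long *bmap, size_t page)
{
    return bmap[page / BITS_PER_LONG] & (1UL << (page % BITS_PER_LONG));
}

/* Mirrors: migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS */
static size_t initial_dirty_pages(size_t ram_bytes)
{
    return ram_bytes >> TARGET_PAGE_BITS;
}
```

With this initialization, the count and the bitmap agree from the start: for a 1 GiB guest, `initial_dirty_pages()` yields 262144 pages and every one of them is set in the bitmap, regardless of what an earlier cancelled migration left behind in `ram_list.dirty_memory[]`.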
Thanks, I didn't notice this case.

On Sun, Jul 14, 2019 at 10:51:19PM +0800, Ivan Ren wrote:
>Reproduce the problem:
>migrate
>migrate_cancel
>migrate
[...]
>2. migration start:ram_init_bitmaps
>   - memory_global_dirty_log_start: begin log diry

s/diry/dirty/

[...]
>5. migration start:ram_init_bitmaps
>   - memory_global_dirty_log_start: begin log diry

same as above

[...]
>+     * Here we set RAMBlock.bmap all to 1 because when rebegin a
>+     * new migration after a failed migration, ram_list.

It is after failure or cancel?

[...]

Reviewed-by: Wei Yang <richardw.yang@linux.intel.com>

--
Wei Yang
Help you, Help me
On Thu, Aug 1, 2019 at 10:56 AM Wei Yang <richardw.yang@linux.intel.com> wrote:
>>-     * Here we didn't set RAMBlock.bmap simply because it is already
>>-     * set in ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] in
>>-     * ram_block_add, and that's where we'll sync the dirty bitmaps.
>>-     * Here setting RAMBlock.bmap would be fine too but not necessary.
>>+     * Here we set RAMBlock.bmap all to 1 because when rebegin a
>>+     * new migration after a failed migration, ram_list.
>
>It is after failure or cancel?

Here we can treat cancel as a special fail.

Thanks.
On Thu, Aug 01, 2019 at 03:58:54PM +0800, Ivan Ren wrote:
>>>+     * Here we set RAMBlock.bmap all to 1 because when rebegin a
>>>+     * new migration after a failed migration, ram_list.
>>
>>It is after failure or cancel?
>
>Here we can treat cancel as a special fail.

Well, I am not sure this is officially documented. Would this lead to
confusion?

--
Wei Yang
Help you, Help me
* Ivan Ren (renyime@gmail.com) wrote:
> Reproduce the problem:
> migrate
> migrate_cancel
> migrate
>
> Error happen for memory migration

Can we fix this by just reverting 03158519384 ?

Dave

[...]

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Sun, Jul 14, 2019 at 10:51:19PM +0800, Ivan Ren wrote:
> Reproduce the problem:
> migrate
> migrate_cancel
> migrate
>
> Error happen for memory migration

Can mention "this mostly revert 0315851938 but with comments kept"
when merge...

[...]
> Here RAMBlock.bmap only have new logged dirty pages, don't contain
> the whole guest pages.

Fixes: 03158519384f158

> Signed-off-by: Ivan Ren <ivanren@tencent.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

I think this is a bit severe and should be rc2 material.  Dave/Juan?

Thanks,

--
Peter Xu
* Peter Xu (zhexu@redhat.com) wrote:
> On Sun, Jul 14, 2019 at 10:51:19PM +0800, Ivan Ren wrote:
[...]
> Fixes: 03158519384f158
>
> > Signed-off-by: Ivan Ren <ivanren@tencent.com>
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> I think this is a bit severe and should be rc2 material.  Dave/Juan?

Yes agreed; I've added it to the planning/4.1 page.

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK