RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()

RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Posted by Wang, Wei W 4 years, 7 months ago
On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
> On 02.07.21 04:48, Wang, Wei W wrote:
> > On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> >> On 01.07.21 14:51, Peter Xu wrote:
> 
> I think that clearly shows the issue.
> 
> My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
> Assume the VM reports all pages within a 1GB chunk as free (easy with a fresh
> VM). While processing hints, we will clear the bits from the dirty bmap in the
> RAMBlock. As we will never migrate any page of that 1GB chunk, we will not
> actually clear the dirty bitmap of the memory region. When re-syncing, we will
> set all bits in the dirty bmap again from the dirty bitmap in the memory
> region. Thus, many of our hints end up being mostly ignored. The smaller the
> clear bmap chunk, the more extreme the issue.
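
If I restate the theory as a pseudo-timeline (assuming 1GB chunks, i.e.
clear_bmap_shift == 18; illustration only, not actual code):

/*
 * qemu_guest_free_page_hint():   bitmap_clear(rb->bmap, start, npages)
 *     -> dirty bmap bits cleared, but the chunk's clear_bmap bit stays set
 * migration thread:              never finds a dirty page in that chunk
 *     -> memory_region_clear_dirty_bitmap() is never called for it
 * migration_bitmap_sync():       re-reads the still-set memory region bitmap
 *     -> all bits in the chunk get set again, so the hint was wasted
 */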

OK, that looks possible. We need to clear the related bits from the memory region
bitmap before skipping the free pages. Could you try the patch below:

diff --git a/migration/ram.c b/migration/ram.c
index ace8ad431c..a1f6df3e6c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -811,6 +811,26 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
     return next;
 }

+
+static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
+                                                       RAMBlock *rb,
+                                                       unsigned long page)
+{
+    uint8_t shift;
+    hwaddr size, start;
+
+    if (!rb->clear_bmap || !clear_bmap_test_and_clear(rb, page))
+        return;
+
+    shift = rb->clear_bmap_shift;
+    assert(shift >= 6);
+
+    size = 1ULL << (TARGET_PAGE_BITS + shift);
+    start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
+    trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
+    memory_region_clear_dirty_bitmap(rb->mr, start, size);
+}
+
 static inline bool migration_bitmap_clear_dirty(RAMState *rs,
                                                 RAMBlock *rb,
                                                 unsigned long page)
@@ -827,26 +847,9 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
      * the page in the chunk we clear the remote dirty bitmap for all.
      * Clearing it earlier won't be a problem, but too late will.
      */
-    if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
-        uint8_t shift = rb->clear_bmap_shift;
-        hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
-        hwaddr start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
-
-        /*
-         * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
-         * can make things easier sometimes since then start address
-         * of the small chunk will always be 64 pages aligned so the
-         * bitmap will always be aligned to unsigned long.  We should
-         * even be able to remove this restriction but I'm simply
-         * keeping it.
-         */
-        assert(shift >= 6);
-        trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
-        memory_region_clear_dirty_bitmap(rb->mr, start, size);
-    }
+    migration_clear_memory_region_dirty_bitmap(rs, rb, page);

     ret = test_and_clear_bit(page, rb->bmap);
-
     if (ret) {
         rs->migration_dirty_pages--;
     }
@@ -2746,7 +2749,7 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
 {
     RAMBlock *block;
     ram_addr_t offset;
-    size_t used_len, start, npages;
+    size_t used_len, start, npages, page_to_clear, i = 0;
     MigrationState *s = migrate_get_current();

     /* This function is currently expected to be used during live migration */
@@ -2775,6 +2778,19 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
         start = offset >> TARGET_PAGE_BITS;
         npages = used_len >> TARGET_PAGE_BITS;

+        /*
+         * The skipped free pages are equivalent to sent pages from clear_bmap's
+         * perspective, so clear the bits from the memory region bitmap which
+         * are initially set. Otherwise those skipped pages will be sent in
+         * the next round after syncing from the memory region bitmap.
+         */
+        do {
+            page_to_clear = start + (i++ << block->clear_bmap_shift);
+            migration_clear_memory_region_dirty_bitmap(ram_state,
+                                                       block,
+                                                       page_to_clear);
+        } while (i <= npages >> block->clear_bmap_shift);
+
         qemu_mutex_lock(&ram_state->bitmap_mutex);
         ram_state->migration_dirty_pages -=
                       bitmap_count_one_with_offset(block->bmap, start, npages);

Best,
Wei
Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Posted by David Hildenbrand 4 years, 7 months ago
On 03.07.21 04:53, Wang, Wei W wrote:
> On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
>> On 02.07.21 04:48, Wang, Wei W wrote:
>>> On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
>>>> On 01.07.21 14:51, Peter Xu wrote:
>>
>> I think that clearly shows the issue.
>>
>> My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
>> Assume the VM reports all pages within a 1GB chunk as free (easy with a fresh
>> VM). While processing hints, we will clear the bits from the dirty bmap in the
>> RAMBlock. As we will never migrate any page of that 1GB chunk, we will not
>> actually clear the dirty bitmap of the memory region. When re-syncing, we will
>> set all bits in the dirty bmap again from the dirty bitmap in the memory
>> region. Thus, many of our hints end up being mostly ignored. The smaller the
>> clear bmap chunk, the more extreme the issue.
> 
> OK, that looks possible. We need to clear the related bits from the memory region
> bitmap before skipping the free pages. Could you try the patch below:

I did a quick test (with the memhog example) and it seems like it mostly 
works. However, we're now doing the bitmap clearing from another thread,
racing with the migration thread. Especially:

1. Racing with clear_bmap_set() via cpu_physical_memory_sync_dirty_bitmap()
2. Racing with migration_bitmap_clear_dirty()

So that might need some thought, if I'm not wrong.

The simplest approach would be removing/freeing the clear_bmap via 
PRECOPY_NOTIFY_SETUP(), similar to 
precopy_enable_free_page_optimization() we had before. Of course, this 
will skip the clear_bmap optimization.
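
A rough sketch of that idea (hypothetical notifier; names and plumbing are
illustrative only, assuming the hinting code registers a precopy notifier):

static int free_page_hint_precopy_notify(NotifierWithReturn *n, void *data)
{
    PrecopyNotifyData *pnd = data;
    RAMBlock *rb;

    if (pnd->reason == PRECOPY_NOTIFY_SETUP) {
        /* Drop the clear_bmap: the dirty log is then cleared eagerly at sync. */
        RAMBLOCK_FOREACH(rb) {
            g_free(rb->clear_bmap);
            rb->clear_bmap = NULL;
        }
    }
    return 0;
}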

-- 
Thanks,

David / dhildenb


RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Posted by Wang, Wei W 4 years, 7 months ago
On Monday, July 5, 2021 9:42 PM, David Hildenbrand wrote:
> On 03.07.21 04:53, Wang, Wei W wrote:
> > On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
> >> On 02.07.21 04:48, Wang, Wei W wrote:
> >>> On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> >>>> On 01.07.21 14:51, Peter Xu wrote:
> >>
> >> I think that clearly shows the issue.
> >>
> >> My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
> >> Assume the VM reports all pages within a 1GB chunk as free (easy with
> >> a fresh VM). While processing hints, we will clear the bits from the
> >> dirty bmap in the RAMBlock. As we will never migrate any page of that
> >> 1GB chunk, we will not actually clear the dirty bitmap of the memory
> >> region. When re-syncing, we will set all bits in the dirty bmap
> >> again from the dirty bitmap in the memory region. Thus, many of our
> >> hints end up being mostly ignored. The smaller the clear bmap chunk, the
> more extreme the issue.
> >
> > OK, that looks possible. We need to clear the related bits from the
> > memory region bitmap before skipping the free pages. Could you try the
> > patch below:
> 
> I did a quick test (with the memhog example) and it seems like it mostly works.
> However, we're now doing the bitmap clearing from another thread, racing with the
> migration thread. Especially:
> 
> 1. Racing with clear_bmap_set() via cpu_physical_memory_sync_dirty_bitmap()
> 2. Racing with migration_bitmap_clear_dirty()
> 
> So that might need some thought, if I'm not wrong.

I think this is similar to the non-clear_bmap case, where rb->bmap is set or cleared by
the migration thread and qemu_guest_free_page_hint. For example, the migration thread
could find a bit set in rb->bmap before qemu_guest_free_page_hint clears that bit in time.
The result is that the free page is transferred, which isn't necessary, but won't cause any issue.
This is very rare in practice.
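
The interleaving I have in mind (sketch):

/*
 * migration thread                       qemu_guest_free_page_hint()
 * ----------------                       ---------------------------
 * test_and_clear_bit(page, rb->bmap)
 *     -> returns 1, page is queued
 *                                        bitmap_clear(rb->bmap, start, npages)
 *                                            -> too late for this page
 * ram_save_page(page)
 *     -> a free page gets transferred: wasted work, but still correct
 */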

> 
> The simplest approach would be removing/freeing the clear_bmap via
> PRECOPY_NOTIFY_SETUP(), similar to
> precopy_enable_free_page_optimization() we had before. Of course, this will
> skip the clear_bmap optimization.

Not necessarily, I think. The two optimizations are essentially not mutually exclusive.
At least, they work well together now. If other implementation issues are reported,
we can look into them then.

Best,
Wei
Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Posted by David Hildenbrand 4 years, 7 months ago
On 06.07.21 11:41, Wang, Wei W wrote:
> On Monday, July 5, 2021 9:42 PM, David Hildenbrand wrote:
>> On 03.07.21 04:53, Wang, Wei W wrote:
>>> On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
>>>> On 02.07.21 04:48, Wang, Wei W wrote:
>>>>> On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
>>>>>> On 01.07.21 14:51, Peter Xu wrote:
>>>>
>>>> I think that clearly shows the issue.
>>>>
>>>> My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
>>>> Assume the VM reports all pages within a 1GB chunk as free (easy with
>>>> a fresh VM). While processing hints, we will clear the bits from the
>>>> dirty bmap in the RAMBlock. As we will never migrate any page of that
>>>> 1GB chunk, we will not actually clear the dirty bitmap of the memory
>>>> region. When re-syncing, we will set all bits in the dirty bmap
>>>> again from the dirty bitmap in the memory region. Thus, many of our
>>>> hints end up being mostly ignored. The smaller the clear bmap chunk, the
>> more extreme the issue.
>>>
>>> OK, that looks possible. We need to clear the related bits from the
>>> memory region bitmap before skipping the free pages. Could you try the
>>> patch below:
>>
>> I did a quick test (with the memhog example) and it seems like it mostly works.
>> However, we're now doing the bitmap clearing from another thread, racing with the
>> migration thread. Especially:
>>
>> 1. Racing with clear_bmap_set() via cpu_physical_memory_sync_dirty_bitmap()
>> 2. Racing with migration_bitmap_clear_dirty()
>>
>> So that might need some thought, if I'm not wrong.
> 
> I think this is similar to the non-clear_bmap case, where rb->bmap is set or cleared by
> the migration thread and qemu_guest_free_page_hint. For example, the migration thread
> could find a bit set in rb->bmap before qemu_guest_free_page_hint clears that bit in time.
> The result is that the free page is transferred, which isn't necessary, but won't cause any issue.
> This is very rare in practice.

Here are my concerns regarding races:

a) We now end up calling migration_clear_memory_region_dirty_bitmap() 
without holding the bitmap_mutex. We have to clarify if that is ok. At 
least migration_bitmap_clear_dirty() holds it *while* clearing the log 
and migration_bitmap_sync() while setting bits in the clear_bmap. I 
think we also have to hold the mutex on the new path.

b) We can now end up calling memory_region_clear_dirty_bitmap() 
concurrently; I'm not sure all notifiers can handle that. I'd assume it 
is okay, but I wouldn't bet on it.
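
For a), I'd expect the new loop to move under the existing locked section in
qemu_guest_free_page_hint(), roughly (sketch only, based on the patch above):

    qemu_mutex_lock(&ram_state->bitmap_mutex);
    do {
        page_to_clear = start + (i++ << block->clear_bmap_shift);
        migration_clear_memory_region_dirty_bitmap(ram_state, block,
                                                   page_to_clear);
    } while (i <= npages >> block->clear_bmap_shift);
    ram_state->migration_dirty_pages -=
                  bitmap_count_one_with_offset(block->bmap, start, npages);
    bitmap_clear(block->bmap, start, npages);
    qemu_mutex_unlock(&ram_state->bitmap_mutex);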

Maybe Peter can help clarifying.

-- 
Thanks,

David / dhildenb


Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Posted by Peter Xu 4 years, 7 months ago
On Sat, Jul 03, 2021 at 02:53:27AM +0000, Wang, Wei W wrote:
> On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
> > On 02.07.21 04:48, Wang, Wei W wrote:
> > > On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> > >> On 01.07.21 14:51, Peter Xu wrote:
> > 
> > I think that clearly shows the issue.
> > 
> > My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
> > Assume the VM reports all pages within a 1GB chunk as free (easy with a fresh
> > VM). While processing hints, we will clear the bits from the dirty bmap in the
> > RAMBlock. As we will never migrate any page of that 1GB chunk, we will not
> > actually clear the dirty bitmap of the memory region. When re-syncing, we will
> > set all bits in the dirty bmap again from the dirty bitmap in the memory
> > region. Thus, many of our hints end up being mostly ignored. The smaller the
> > clear bmap chunk, the more extreme the issue.
> 
> OK, that looks possible. We need to clear the related bits from the memory region
> bitmap before skipping the free pages. Could you try the patch below:
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index ace8ad431c..a1f6df3e6c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -811,6 +811,26 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
>      return next;
>  }
> 
> +
> +static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
> +                                                       RAMBlock *rb,
> +                                                       unsigned long page)
> +{
> +    uint8_t shift;
> +    hwaddr size, start;
> +
> +    if (!rb->clear_bmap || !clear_bmap_test_and_clear(rb, page))
> +        return;
> +
> +    shift = rb->clear_bmap_shift;
> +    assert(shift >= 6);
> +
> +    size = 1ULL << (TARGET_PAGE_BITS + shift);
> +    start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
> +    trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
> +    memory_region_clear_dirty_bitmap(rb->mr, start, size);
> +}
> +
>  static inline bool migration_bitmap_clear_dirty(RAMState *rs,
>                                                  RAMBlock *rb,
>                                                  unsigned long page)
> @@ -827,26 +847,9 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
>       * the page in the chunk we clear the remote dirty bitmap for all.
>       * Clearing it earlier won't be a problem, but too late will.
>       */
> -    if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
> -        uint8_t shift = rb->clear_bmap_shift;
> -        hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
> -        hwaddr start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
> -
> -        /*
> -         * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
> -         * can make things easier sometimes since then start address
> -         * of the small chunk will always be 64 pages aligned so the
> -         * bitmap will always be aligned to unsigned long.  We should
> -         * even be able to remove this restriction but I'm simply
> -         * keeping it.
> -         */
> -        assert(shift >= 6);
> -        trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
> -        memory_region_clear_dirty_bitmap(rb->mr, start, size);
> -    }
> +    migration_clear_memory_region_dirty_bitmap(rs, rb, page);
> 
>      ret = test_and_clear_bit(page, rb->bmap);
> -
>      if (ret) {
>          rs->migration_dirty_pages--;
>      }
> @@ -2746,7 +2749,7 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
>  {
>      RAMBlock *block;
>      ram_addr_t offset;
> -    size_t used_len, start, npages;
> +    size_t used_len, start, npages, page_to_clear, i = 0;
>      MigrationState *s = migrate_get_current();
> 
>      /* This function is currently expected to be used during live migration */
> @@ -2775,6 +2778,19 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
>          start = offset >> TARGET_PAGE_BITS;
>          npages = used_len >> TARGET_PAGE_BITS;
> 
> +        /*
> +         * The skipped free pages are equivalent to sent pages from clear_bmap's
> +         * perspective, so clear the bits from the memory region bitmap which
> +         * are initially set. Otherwise those skipped pages will be sent in
> +         * the next round after syncing from the memory region bitmap.
> +         */
> +       do {
> +            page_to_clear = start + (i++ << block->clear_bmap_shift);

Why "i" needs to be shifted?

> +            migration_clear_memory_region_dirty_bitmap(ram_state,
> +                                                       block,
> +                                                       page_to_clear);
> +       } while (i <= npages >> block->clear_bmap_shift);

I agree with David that this would be better placed inside the mutex section; if so,
we'd also touch up the comment for bitmap_mutex.  Or is there a reason to explicitly
not do so?

> +
>          qemu_mutex_lock(&ram_state->bitmap_mutex);
>          ram_state->migration_dirty_pages -=
>                        bitmap_count_one_with_offset(block->bmap, start, npages);

After my patch (move mutex out of migration_bitmap_clear_dirty()), maybe we can
call migration_bitmap_clear_dirty() directly here rather than introducing a new
helper?  It'll do all the dirty/clear bmap ops, including dirty page accounting.
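
I.e., roughly (untested sketch, on top of that patch where the caller takes
bitmap_mutex):

    qemu_mutex_lock(&ram_state->bitmap_mutex);
    for (i = 0; i < npages; i++) {
        migration_bitmap_clear_dirty(ram_state, block, start + i);
    }
    qemu_mutex_unlock(&ram_state->bitmap_mutex);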

Thanks,

-- 
Peter Xu


RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Posted by Wang, Wei W 4 years, 7 months ago
On Wednesday, July 7, 2021 1:47 AM, Peter Xu wrote:
> On Sat, Jul 03, 2021 at 02:53:27AM +0000, Wang, Wei W wrote:
> > +       do {
> > +            page_to_clear = start + (i++ << block->clear_bmap_shift);
> 
> Why "i" needs to be shifted?

It just moves on to the next clear chunk, no?
For example, a (1 << 18)-page chunk (i.e. 1GB with 4KB pages).

> 
> > +            migration_clear_memory_region_dirty_bitmap(ram_state,
> > +                                                       block,
> > +                                                       page_to_clear);
> > +       } while (i <= npages >> block->clear_bmap_shift);
> 
> I agree with David that this would be better placed inside the mutex section; if so,
> we'd also touch up the comment for bitmap_mutex.  Or is there a reason to explicitly
> not do so?

clear_bmap_test_and_clear() already uses atomic operations on clear_bmap.
But it's also OK with me if you feel safer moving it under the lock.
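
(For reference, the flavor of operation I mean -- a generic sketch, not the
actual QEMU helper:)

static bool test_and_clear_bit_atomic(unsigned long *word, unsigned int bit)
{
    unsigned long mask = 1UL << bit;
    unsigned long old = __atomic_fetch_and(word, ~mask, __ATOMIC_SEQ_CST);

    return old & mask;  /* true iff the bit was set before we cleared it */
}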

Best,
Wei

Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Posted by Peter Xu 4 years, 7 months ago
On Wed, Jul 07, 2021 at 08:34:50AM +0000, Wang, Wei W wrote:
> On Wednesday, July 7, 2021 1:47 AM, Peter Xu wrote:
> > On Sat, Jul 03, 2021 at 02:53:27AM +0000, Wang, Wei W wrote:
> > > +       do {
> > > +            page_to_clear = start + (i++ << block->clear_bmap_shift);
> > 
> > Why "i" needs to be shifted?
> 
> It just moves on to the next clear chunk, no?
> For example, a (1 << 18)-page chunk (i.e. 1GB with 4KB pages).

But doesn't migration_clear_memory_region_dirty_bitmap() already do the shifting?

> 
> > 
> > > +            migration_clear_memory_region_dirty_bitmap(ram_state,
> > > +                                                       block,
> > > +                                                       page_to_clear);
> > > +       } while (i <= npages >> block->clear_bmap_shift);
> > 
> > I agree with David that this would be better placed inside the mutex section; if so,
> > we'd also touch up the comment for bitmap_mutex.  Or is there a reason to explicitly
> > not do so?
> 
> clear_bmap_test_and_clear() already uses atomic operations on clear_bmap.
> But it's also OK with me if you feel safer moving it under the lock.

I see, yes, it seems OK to be out of the lock.  Or we could use the mutex to protect all of
them and then make the clear_bmap* helpers non-atomic too, just like the dirty bmap.

Thanks,

-- 
Peter Xu


RE: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Posted by Wang, Wei W 4 years, 7 months ago
On Thursday, July 8, 2021 12:55 AM, Peter Xu wrote:
> On Wed, Jul 07, 2021 at 08:34:50AM +0000, Wang, Wei W wrote:
> > On Wednesday, July 7, 2021 1:47 AM, Peter Xu wrote:
> > > On Sat, Jul 03, 2021 at 02:53:27AM +0000, Wang, Wei W wrote:
> > > > +       do {
> > > > +            page_to_clear = start + (i++ << block->clear_bmap_shift);
> > >
> > > Why "i" needs to be shifted?
> >
> > It just moves on to the next clear chunk, no?
> > For example, a (1 << 18)-page chunk (i.e. 1GB with 4KB pages).
> 
> But doesn't migration_clear_memory_region_dirty_bitmap() already do the shifting?
> 

Please see this example: start = 0, npages = 2 * (1 << 18), i.e. we have 2 chunks of pages to clear, starting from 0.
First chunk: from 0 to (1 << 18);
Second chunk: from (1 << 18) to 2 * (1 << 18).

To clear the second chunk, we need to pass (start + (1 << 18)) to migration_clear_memory_region_dirty_bitmap(),
and clear_bmap_test_and_clear() there will do ">> 18" to transform it into the clear_bmap chunk id, which is 1.
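
In code form (a standalone sketch of just the index arithmetic):

#include <stdio.h>

int main(void)
{
    unsigned long start = 0;
    unsigned int shift = 18;  /* clear_bmap_shift: (1 << 18) pages = 1GB */

    /* Page offset passed to migration_clear_memory_region_dirty_bitmap()
     * for the second chunk: */
    unsigned long page = start + (1UL << shift);

    /* clear_bmap_test_and_clear() then does ">> shift" to get the chunk id: */
    printf("chunk id = %lu\n", page >> shift);  /* prints 1 */
    return 0;
}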

Best,
Wei