[PATCH v2] migration: Count new_dirty instead of real_dirty

Keqian Zhu posted 1 patch 3 years, 11 months ago
[PATCH v2] migration: Count new_dirty instead of real_dirty
Posted by Keqian Zhu 3 years, 11 months ago
real_dirty_pages becomes equal to the total ram size after the dirty
log sync in ram_init_bitmaps, because the bitmap of each ramblock is
initialized with all bits set, so the old path counts every page as
"real dirty" at the beginning.

This causes a wrong dirty rate and false-positive throttling at the
end of the first ram save iteration.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>

---
Changelog:

v2:
 - use new_dirty_pages instead of accu_dirty_pages.
 - adjust commit messages.

---
 include/exec/ram_addr.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
index 7b5c24e928..a95e2e7c25 100644
--- a/include/exec/ram_addr.h
+++ b/include/exec/ram_addr.h
@@ -443,7 +443,7 @@ static inline
 uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
                                                ram_addr_t start,
                                                ram_addr_t length,
-                                               uint64_t *real_dirty_pages)
+                                               uint64_t *new_dirty_pages)
 {
     ram_addr_t addr;
     unsigned long word = BIT_WORD((start + rb->offset) >> TARGET_PAGE_BITS);
@@ -469,7 +469,6 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
             if (src[idx][offset]) {
                 unsigned long bits = atomic_xchg(&src[idx][offset], 0);
                 unsigned long new_dirty;
-                *real_dirty_pages += ctpopl(bits);
                 new_dirty = ~dest[k];
                 dest[k] |= bits;
                 new_dirty &= bits;
@@ -502,7 +501,6 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
                         start + addr + offset,
                         TARGET_PAGE_SIZE,
                         DIRTY_MEMORY_MIGRATION)) {
-                *real_dirty_pages += 1;
                 long k = (start + addr) >> TARGET_PAGE_BITS;
                 if (!test_and_set_bit(k, dest)) {
                     num_dirty++;
@@ -511,6 +509,7 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
         }
     }
 
+    *new_dirty_pages += num_dirty;
     return num_dirty;
 }
 #endif
-- 
2.19.1
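
For context, the all-set initialization the commit message refers to
happens when the migration bitmaps are created; roughly, as of QEMU
5.0 (a paraphrased sketch, not the exact code):

    /* migration/ram.c, ram_list_init_bitmaps() (sketch) */
    RAMBLOCK_FOREACH_NOT_IGNORED(block) {
        unsigned long pages = block->max_length >> TARGET_PAGE_BITS;

        block->bmap = bitmap_new(pages);
        bitmap_set(block->bmap, 0, pages); /* every page starts dirty */
    }

Since the global ram_list dirty bitmaps are likewise all set when the
first sync runs, the old *real_dirty_pages += ctpopl(bits) path adds
every page of guest RAM to the counter, while the new scheme counts
only bits not already present in the (all-ones) per-ramblock bitmap,
which is zero on that first pass.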


Re: [PATCH v2] migration: Count new_dirty instead of real_dirty
Posted by Dr. David Alan Gilbert 3 years, 11 months ago
* Keqian Zhu (zhukeqian1@huawei.com) wrote:
> real_dirty_pages becomes equal to the total ram size after the dirty
> log sync in ram_init_bitmaps, because the bitmap of each ramblock is
> initialized with all bits set, so the old path counts every page as
> "real dirty" at the beginning.
> 
> This causes a wrong dirty rate and false-positive throttling at the
> end of the first ram save iteration.
> 
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>

Since this function already returns num_dirty, why not just change the
caller to increment a counter based off the return value?

Can you point to the code which is using this value that triggers the
throttle?

Dave


> ---
> Changelog:
> 
> v2:
>  - use new_dirty_pages instead of accu_dirty_pages.
>  - adjust commit messages.
> 
> ---
>  include/exec/ram_addr.h | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h
> index 7b5c24e928..a95e2e7c25 100644
> --- a/include/exec/ram_addr.h
> +++ b/include/exec/ram_addr.h
> @@ -443,7 +443,7 @@ static inline
>  uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
>                                                 ram_addr_t start,
>                                                 ram_addr_t length,
> -                                               uint64_t *real_dirty_pages)
> +                                               uint64_t *new_dirty_pages)
>  {
>      ram_addr_t addr;
>      unsigned long word = BIT_WORD((start + rb->offset) >> TARGET_PAGE_BITS);
> @@ -469,7 +469,6 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
>              if (src[idx][offset]) {
>                  unsigned long bits = atomic_xchg(&src[idx][offset], 0);
>                  unsigned long new_dirty;
> -                *real_dirty_pages += ctpopl(bits);
>                  new_dirty = ~dest[k];
>                  dest[k] |= bits;
>                  new_dirty &= bits;
> @@ -502,7 +501,6 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
>                          start + addr + offset,
>                          TARGET_PAGE_SIZE,
>                          DIRTY_MEMORY_MIGRATION)) {
> -                *real_dirty_pages += 1;
>                  long k = (start + addr) >> TARGET_PAGE_BITS;
>                  if (!test_and_set_bit(k, dest)) {
>                      num_dirty++;
> @@ -511,6 +509,7 @@ uint64_t cpu_physical_memory_sync_dirty_bitmap(RAMBlock *rb,
>          }
>      }
>  
> +    *new_dirty_pages += num_dirty;
>      return num_dirty;
>  }
>  #endif
> -- 
> 2.19.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
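
What Dave suggests might look like this on the caller side (a sketch
against the migration/ram.c of that era, where
migration_bitmap_sync_range() is the only caller; not a tested patch):

    static void migration_bitmap_sync_range(RAMState *rs, RAMBlock *rb)
    {
        /* let the callee just return the count ... */
        uint64_t new_dirty_pages =
            cpu_physical_memory_sync_dirty_bitmap(rb, 0, rb->used_length);

        /* ... and do all the accounting here */
        rs->migration_dirty_pages += new_dirty_pages;
        rs->num_dirty_pages_period += new_dirty_pages;
    }

with the uint64_t * parameter dropped from
cpu_physical_memory_sync_dirty_bitmap() entirely.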


Re: [PATCH v2] migration: Count new_dirty instead of real_dirty
Posted by zhukeqian 3 years, 11 months ago
Hi Dave,

On 2020/6/16 17:35, Dr. David Alan Gilbert wrote:
> * Keqian Zhu (zhukeqian1@huawei.com) wrote:
>> real_dirty_pages becomes equal to the total ram size after the dirty
>> log sync in ram_init_bitmaps, because the bitmap of each ramblock is
>> initialized with all bits set, so the old path counts every page as
>> "real dirty" at the beginning.
>>
>> This causes a wrong dirty rate and false-positive throttling at the
>> end of the first ram save iteration.
>>
>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> 
> Since this function already returns num_dirty, why not just change the
> caller to increment a counter based off the return value?
Yes, that would be better :-).

> 
> Can you point to the code which is using this value that triggers the
> throttle?
> 
In migration_trigger_throttle(), rs->num_dirty_pages_period is used;
it corresponds to real_dirty_pages here.

Thanks,
Keqian

> Dave
> 
> 
[...]
>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> .
> 
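
For reference, the consumer Keqian points at is the auto-converge
check fed by rs->num_dirty_pages_period. Paraphrased (names and exact
placement vary across QEMU versions, so treat this as a sketch):

    if (migrate_auto_converge() && !blk_mig_bulk_active()) {
        if ((rs->num_dirty_pages_period * TARGET_PAGE_SIZE >
             (bytes_xfer_now - rs->bytes_xfer_prev) / 2) &&
            (++rs->dirty_rate_high_cnt >= 2)) {
            rs->dirty_rate_high_cnt = 0;
            mig_throttle_guest_down(); /* start/increase CPU throttle */
        }
    }

An inflated num_dirty_pages_period on the first sync can therefore
trip the threshold and throttle the guest spuriously.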

Re: [PATCH v2] migration: Count new_dirty instead of real_dirty
Posted by Dr. David Alan Gilbert 3 years, 11 months ago
* zhukeqian (zhukeqian1@huawei.com) wrote:
> Hi Dave,
> 
> On 2020/6/16 17:35, Dr. David Alan Gilbert wrote:
> > * Keqian Zhu (zhukeqian1@huawei.com) wrote:
> >> real_dirty_pages becomes equal to the total ram size after the dirty
> >> log sync in ram_init_bitmaps, because the bitmap of each ramblock is
> >> initialized with all bits set, so the old path counts every page as
> >> "real dirty" at the beginning.
> >>
> >> This causes a wrong dirty rate and false-positive throttling at the
> >> end of the first ram save iteration.
> >>
> >> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> > 
> > Since this function already returns num_dirty, why not just change the
> > caller to increment a counter based off the return value?
> Yes, that would be better :-).
> 
> > 
> > Can you point to the code which is using this value that triggers the
> > throttle?
> > 
> In migration_trigger_throttle(), rs->num_dirty_pages_period is used;
> it corresponds to real_dirty_pages here.

OK; so is the problem not the same as the check that's in there for
blk_mig_bulk_active() - don't we need to do the same trick for ram bulk
migration (i.e. the first pass)?

Dave

> Thanks,
> Keqian
> 
> > Dave
> > 
> > 
> [...]
> >>
> >>
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> > 
> > .
> > 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


Re: [PATCH v2] migration: Count new_dirty instead of real_dirty
Posted by zhukeqian 3 years, 11 months ago
Hi Dave,

On 2020/6/16 17:58, Dr. David Alan Gilbert wrote:
> * zhukeqian (zhukeqian1@huawei.com) wrote:
>> Hi Dave,
>>
>> On 2020/6/16 17:35, Dr. David Alan Gilbert wrote:
>>> * Keqian Zhu (zhukeqian1@huawei.com) wrote:
>>>> real_dirty_pages becomes equal to the total ram size after the dirty
>>>> log sync in ram_init_bitmaps, because the bitmap of each ramblock is
>>>> initialized with all bits set, so the old path counts every page as
>>>> "real dirty" at the beginning.
>>>>
>>>> This causes a wrong dirty rate and false-positive throttling at the
>>>> end of the first ram save iteration.
>>>>
>>>> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
>>>
>>> Since this function already returns num_dirty, why not just change the
>>> caller to increment a counter based off the return value?
>> Yes, that would be better :-).
>>
>>>
>>> Can you point to the code which is using this value that triggers the
>>> throttle?
>>>
>> In migration_trigger_throttle(), rs->num_dirty_pages_period is used;
>> it corresponds to real_dirty_pages here.
> 
> OK; so is the problem not the same as the check that's in there for
> blk_mig_bulk_active() - don't we need to do the same trick for ram bulk
> migration (i.e. the first pass)?
> 
Sorry, I do not quite get your idea. Could you give some sample
code?

> Dave
> 
>> Thanks,
>> Keqian
>>
>>> Dave
>>>
>>>
>> [...]
>>>>
>>>>
>>> --
>>> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
>>>
>>> .
>>>
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 
> .
>
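
The existing trick Dave refers to is, presumably, the
!blk_mig_bulk_active() guard shown in the earlier sketch: the
auto-converge check is simply skipped while block migration is in its
bulk phase, because that phase would otherwise look like a huge dirty
rate. The analogous change for RAM might be a hypothetical extra
condition such as:

    /* hypothetical: also skip throttling during the bulk (first)
     * pass over RAM, when every page is still marked dirty */
    if (migrate_auto_converge() && !blk_mig_bulk_active() &&
        !rs->ram_bulk_stage) {
        ...
    }

(rs->ram_bulk_stage was a RAMState flag of that era marking the first
pass; whether it is the right condition is exactly what the thread
leaves open.)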