[PATCH 0/5] KVM: Dirty ring and auto converge optimization

Overview
========
This series of patches optimizes the performance of live
migration when using the dirty ring and auto-converge.

The optimization covers the following aspects:
1. When traversing the memslots to collect dirty pages in dirty
ring mode, call log_sync_global only once per synchronization
pass, because a single log_sync_global call already collects the
dirty pages of all memslots on all vCPUs (sketch 1 below).

2. Dynamically adjust the load of the dirty ring reaper thread
to reduce ring-full events. This lessens the impact on the
guest, makes dirty page collection more efficient, and thus
speeds up the migration (sketch 2 below).

3. When collecting dirty pages from KVM, skip
kvm_cpu_synchronize_kick_all while the vCPUs are being
throttled, and call it only once before pausing the virtual
machine. kvm_cpu_synchronize_kick_all becomes very
time-consuming when the CPUs are throttled, and few dirty pages
are produced in that state, so calling it once right before the
VM is paused is enough to guarantee that no dirty pages are
lost while keeping the migration efficient (sketch 3 below).

4. Based on how the dirty ring collects dirty pages, introduce
a new dirty page rate calculation that yields a more accurate
dirty page rate (sketch 4 below).

5. Use the more accurate dirty page rate, together with the
current system bandwidth and migration parameters, to directly
calculate the throttle level needed for the migration to
converge, instead of repeatedly probing for a workable
throttle. Dropping that time-consuming trial-and-error process
greatly reduces the migration time (sketch 5 below).
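
Sketch 1, a minimal C illustration of item 1. This is not the
patch code: MemSlot, sync_dirty_log and the listener loop are
simplified stand-ins for QEMU's memory listener machinery.

    /* Sketch only: call the global dirty ring sync once per pass. */
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct MemSlot { int id; } MemSlot;

    /* Assumed helper: drains the dirty rings of all vCPUs in one call. */
    static void log_sync_global(void) { /* harvest all rings */ }

    static void sync_dirty_log(MemSlot *slots, size_t n)
    {
        bool global_synced = false;

        for (size_t i = 0; i < n; i++) {
            if (!global_synced) {
                /* One call already covers every memslot and vCPU... */
                log_sync_global();
                global_synced = true;  /* ...so skip redundant calls. */
            }
            /* per-slot dirty bitmap processing continues here */
            (void)slots[i];
        }
    }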
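
Sketch 2, an illustration of item 2. The occupancy thresholds
and interval bounds are assumptions for the example, not values
taken from the patch.

    /* Sketch only: adapt the reaper thread's polling interval to the
     * observed ring occupancy. */
    #include <stdint.h>

    #define MIN_INTERVAL_MS 10
    #define MAX_INTERVAL_MS 1000

    static uint64_t next_poll_interval_ms(uint64_t cur_ms,
                                          uint32_t harvested,
                                          uint32_t ring_size)
    {
        if (harvested > ring_size / 2) {
            cur_ms /= 2;    /* filling fast: poll sooner, avoid ring-full */
        } else if (harvested < ring_size / 8) {
            cur_ms *= 2;    /* mostly idle: back off, reduce overhead */
        }
        if (cur_ms < MIN_INTERVAL_MS) cur_ms = MIN_INTERVAL_MS;
        if (cur_ms > MAX_INTERVAL_MS) cur_ms = MAX_INTERVAL_MS;
        return cur_ms;
    }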
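
Sketch 3, an illustration of item 3. migration_throttle_active()
is an assumed stand-in for the real throttle query, and the stub
bodies only make the example self-contained.

    /* Sketch only: gate the expensive all-vCPU kick on throttle state. */
    #include <stdbool.h>

    static bool migration_throttle_active(void) { return false; /* stub */ }
    static void kvm_cpu_synchronize_kick_all(void) { /* kick every vCPU */ }

    static void reap_dirty_rings(bool before_vm_stop)
    {
        /* While auto-converge throttles the vCPUs, kicking them all is
         * very slow and yields few extra dirty pages; one kick right
         * before the VM is paused is enough to avoid losing pages. */
        if (before_vm_stop || !migration_throttle_active()) {
            kvm_cpu_synchronize_kick_all();
        }
        /* ... harvest the per-vCPU dirty rings ... */
    }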
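
Sketch 4, an illustration of item 4. The counter and function
names are hypothetical; the point is that every GFN popped from
a vCPU ring is one dirtied page, so counting over a sampling
window gives the rate without a bitmap scan.

    /* Sketch only: derive the dirty page rate from ring harvesting. */
    #include <stdint.h>

    #define PAGE_SIZE 4096ULL

    static uint64_t dirty_pages_harvested;  /* bumped by the reaper thread */

    static uint64_t dirty_rate_mib_per_sec(uint64_t window_ms)
    {
        uint64_t pages = dirty_pages_harvested;

        dirty_pages_harvested = 0;           /* start a new window */
        /* pages/window_ms * 1000 = pages/s; scale by page size to MiB/s */
        return pages * 1000 / window_ms * PAGE_SIZE / (1024 * 1024);
    }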
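
Sketch 5, an illustration of item 5. It assumes the dirty rate
scales roughly linearly with vCPU run time, so convergence needs
dirty_rate * (1 - throttle/100) < bandwidth; the function solves
that directly instead of stepping the throttle up by trial.

    /* Sketch only: compute the throttle directly instead of probing. */
    #include <stdint.h>

    static int required_throttle_pct(uint64_t dirty_rate_mibps,
                                     uint64_t bandwidth_mibps)
    {
        if (dirty_rate_mibps <= bandwidth_mibps) {
            return 0;                        /* already converging */
        }
        /* Solve dirty_rate * (1 - t/100) = bandwidth for t. */
        int t = 100 - (int)(bandwidth_mibps * 100 / dirty_rate_mibps);
        return t > 99 ? 99 : t;              /* cap as auto-converge does */
    }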

Testing
=======
     Test environment:
     Host: 64 CPUs (Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz),
           512G memory,
           10G NIC
     VM: 2 vCPUs/4G memory and 8 vCPUs/32G memory
     memory stress: run the stress tool in the VM to generate memory stress

     Test1: Massive live migration (each test item run 50 to 200 times)
     Test command: virsh -t migrate $vm --live --p2p --unsafe
     --undefinesource --persistent --auto-converge  --migrateuri
     tcp://${data_ip_remote}
     *********** Use optimized dirty ring ***********
     ring_size  mem_stress VM   average_migration_time(ms)
     4096      1G       2C4G     15888
     4096      3G       2C4G     13320
     65536     1G       2C4G     10036
     65536     3G       2C4G     12132
     4096      4G       8C32G    53629
     4096      8G       8C32G    62474
     4096      30G      8C32G    99025
     65536     4G       8C32G    45563
     65536     8G       8C32G    61114
     65536     30G      8C32G    102087
     *********** Use unoptimized dirty ring ***********
     ring_size  mem_stress VM   average_migration_time(ms)
     4096      1G       2C4G     23992
     4096      3G       2C4G     44234
     65536     1G       2C4G     24546
     65536     3G       2C4G     44939
     4096      4G       8C32G    88441
     4096      8G       8C32G    may not complete
     4096      30G      8C32G    602884
     65536     4G       8C32G    335535
     65536     8G       8C32G    1249232
     65536     30G      8C32G    616939
     *********** Use bitmap dirty tracking  ***********
     ring_size  mem_stress VM   average_migration_time(ms)
     0         1G       2C4G     24597
     0         3G       2C4G     45254
     0         4G       8C32G    103773
     0         8G       8C32G    129626
     0         30G      8C32G    588212

Test1 result:
     Compared with the old bitmap method and the unoptimized dirty ring,
     the sorted data show that the optimized dirty ring greatly reduces
     the migration time. The improvement is most pronounced when the VM
     has a large memory and the memory pressure is high, where it
     achieves a five- to six-fold speedup.

     During testing we also found that, under certain memory pressure,
     migration with the unoptimized dirty ring could fail to complete
     for a long time. The optimized dirty ring did not hit this problem.

     Test2: qemu guestperf test
     Test command parameters: --auto-converge --stress-mem XX --downtime 300
     --bandwidth 10000
     *********** Use optimized dirty ring ***********
     ring_size stress VM    significant_perf_  max_memory_update_ cost_time(s)
                            drop_duration(s)   speed(ms/GB)
     4096       3G    2C4G        5.5           2962             23.5
     65536      3G    2C4G        6             3160             25
     4096       3G    8C32G       13            7921             38
     4096       6G    8C32G       16            11.6K            46
     4096       10G   8C32G       12.1          11.2K            47.6
     4096       20G   8C32G       20            20.2K            71
     4096       30G   8C32G       29.5          29K              94.5
     65536      3G    8C32G       14            8700             40
     65536      6G    8C32G       15            12K              46
     65536      10G   8C32G       11.5          11.1K            47.5
     65536      20G   8C32G       21            20.9K            72
     65536      30G   8C32G       29.5          29.1K            94.5
     *********** Use unoptimized dirty ring ***********
     ring_size stress VM    significant_perf_  max_memory_update_ cost_time(s)
                            drop_duration(s)   speed(ms/GB)
     4096        3G    2C4G        23            2766            46
     65536       3G    2C4G        22.2          3283            46
     4096        3G    8C32G       62            48.8K           106
     4096        6G    8C32G       68            23.87K          124
     4096        10G   8C32G       91            16.87K          190
     4096        20G   8C32G       152.8         28.65K          336.8
     4096        30G   8C32G       187           41.19K          502
     65536       3G    8C32G       71            12.7K           67
     65536       6G    8C32G       63            12K             46
     65536       10G   8C32G       88            25.3K           120
     65536       20G   8C32G       157.3         25K             391
     65536       30G   8C32G       171           30.8K           487
     *********** Use bitmap dirty tracking  ***********
     ring_size stress VM    significant_perf_  max_memory_update_ cost_time(s)
                            drop_duration(s)   speed(ms/GB)
     0           3G    2C4G        18             3300            38
     0           3G    8C32G       38             7571            66
     0           6G    8C32G       61.5           10.5K           115.5
     0           10G   8C32G       110            13.68K          180
     0           20G   8C32G       161.6          24.4K           280
     0           30G   8C32G       221.5          28.4K           337.5

Test2 result:
     The above data show that guest performance (guestperf) during
     migration with the optimized dirty ring is significantly better
     than with the unoptimized dirty ring, and slightly better than
     with the bitmap method.

     With the optimized dirty ring, the migration time is greatly
     reduced, and the period of significant memory performance
     degradation is markedly shorter than with either the bitmap mode
     or the unoptimized dirty ring mode. The optimized dirty ring
     therefore better limits the impact on the guest's memory accesses
     during migration.

Please review, thanks.

Chongyun Wu (5):
     kvm,memory: Optimize dirty page collection for dirty ring
     kvm: Dynamically control the load of the reaper thread
     kvm: Dirty ring autoconverge optimization for kvm_cpu_synchronize_kick_all
     kvm: Introduce a dirty rate calculation method based on dirty ring
     migration: Calculate the appropriate throttle for autoconverge

  accel/kvm/kvm-all.c   | 239 ++++++++++++++++++++++++++++++++++++--
  include/sysemu/cpus.h |   1 +
  include/sysemu/kvm.h  |   2 +
  migration/migration.c |  13 ++++++
  migration/migration.h |   2 +
  migration/ram.c       |  64 +++++++++++++++++++++++--
  softmmu/cpus.c        |  18 ++++++++
  softmmu/memory.c      |   6 +++
  8 files changed, 316 insertions(+), 29 deletions(-)
-- 
1.8.3.1

-- 
Best Regards,
Chongyun Wu