This is the first version of the auto-converge refinements; for
details about the RFC version, see:
https://patchew.org/QEMU/cover.1725891841.git.yong.huang@smartx.com/
This series introduces two refinements, "background sync" and
"responsive throttle".
1. background sync:
The original auto-converge throttle logic does not scale well because
migration_trigger_throttle() is called only once per iteration, so it
is not invoked for a long time when a single iteration takes a long
time.
Background sync fixes this by syncing the dirty bitmap in the
background and throttling automatically once it detects that an
iteration has lasted a long time during the migration.
Background sync is implemented transparently; no new interface is
added for upper-layer applications.
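As a rough illustration of the idea (not the code in this series), the
following self-contained simulation contrasts syncing only at iteration
boundaries with an additional periodic watcher. Every name here
(MigrationSim, watcher_tick, the 5-second threshold, the 1-second tick)
is a hypothetical stand-in chosen for the sketch.

```python
# Illustrative simulation only; this is NOT QEMU code.
# All names and the 5-second threshold are hypothetical.
from dataclasses import dataclass, field


@dataclass
class MigrationSim:
    long_iteration_s: int = 5   # hypothetical "iteration is long" threshold
    last_sync_s: int = 0        # time of the most recent dirty bitmap sync
    sync_log: list = field(default_factory=list)

    def sync_and_check_throttle(self, now_s, source):
        # Stand-in for "sync the dirty bitmap, then evaluate the throttle".
        self.last_sync_s = now_s
        self.sync_log.append((now_s, source))

    def end_of_iteration(self, now_s):
        # Original behaviour: the check happens only at iteration boundaries.
        self.sync_and_check_throttle(now_s, "iteration")

    def watcher_tick(self, now_s):
        # Background sync: act only if no sync has happened for a long
        # time, which means the current iteration is running long.
        if now_s - self.last_sync_s >= self.long_iteration_s:
            self.sync_and_check_throttle(now_s, "background")


def simulate(iteration_end_times, total_s, tick_s=1):
    """Step simulated time forward and return the list of syncs performed."""
    sim = MigrationSim()
    ends = set(iteration_end_times)
    for now in range(tick_s, total_s + 1, tick_s):
        if now in ends:
            sim.end_of_iteration(now)
        else:
            sim.watcher_tick(now)
    return sim.sync_log


if __name__ == "__main__":
    # One short iteration (ends at t=2) followed by a long one (ends at t=20).
    for when, source in simulate([2, 20], total_s=20):
        print(f"t={when:>2}s sync from {source}")
```

In this toy run the long second iteration gets intermediate syncs from
the watcher instead of waiting until the iteration ends, which is the
behaviour the description above aims for.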
2. responsive throttle:
The original auto-converge throttle logic uses a single criterion to
determine whether the migration is converging; once that criterion has
been met twice, it starts the CPU throttle or increases the throttle
percentage. Consequently, here too migration_trigger_throttle() is not
invoked for a long time when a single iteration takes a long time.
Responsive throttle introduces a second criterion to help detect
whether the migration is converging; if either of the two criteria is
met, migration_trigger_throttle() is called. This makes it more likely
that the CPU throttle is activated, thereby accelerating the
migration.
The feature is enabled with the 'cpu-responsive-throttle' option.
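The "either of two criteria" decision can be sketched as below. This is
a minimal model under stated assumptions, not the logic of the patches:
the first criterion is assumed to be "dirty bytes exceed a percentage
of transferred bytes, observed twice", and the second criterion is not
described in this cover letter, so it is passed in as an opaque boolean.
ThrottleDecider and all of its members are hypothetical names.

```python
# Illustrative model only; this is NOT QEMU code.
# ThrottleDecider and its parameters are hypothetical names.
class ThrottleDecider:
    def __init__(self, responsive=False, trigger_threshold_pct=50):
        self.responsive = responsive                  # models 'cpu-responsive-throttle'
        self.trigger_threshold_pct = trigger_threshold_pct  # assumed default
        self.high_dirty_count = 0

    def first_criterion(self, dirty_bytes, xfer_bytes):
        # Assumption: the criterion counts periods in which the guest
        # dirtied more than trigger_threshold_pct of what was transferred,
        # and is met once that has been seen twice.
        if dirty_bytes > xfer_bytes * self.trigger_threshold_pct / 100:
            self.high_dirty_count += 1
        if self.high_dirty_count >= 2:
            self.high_dirty_count = 0
            return True
        return False

    def should_throttle(self, dirty_bytes, xfer_bytes, second_criterion_met):
        # The second criterion is a placeholder: the cover letter does not
        # say what it measures, so the caller supplies its outcome.
        first = self.first_criterion(dirty_bytes, xfer_bytes)
        return first or (self.responsive and second_criterion_met)


if __name__ == "__main__":
    original = ThrottleDecider(responsive=False)
    refined = ThrottleDecider(responsive=True)
    # Same observation for both: high dirty ratio, second criterion met.
    print("original:", original.should_throttle(80, 100, True))
    print("refined: ", refined.should_throttle(80, 100, True))
```

With the option off, the first observation alone never triggers the
throttle; with it on, the second criterion can trigger it immediately.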
We tested these two features in the following environment:
a. Test tool:
guestperf
Refer to the following link to see details:
https://github.com/qemu/qemu/tree/master/tests/migration/guestperf
b. Test VM scale:
CPU: 16; Memory: 100GB
c. Average bandwidth between source and destination for migration:
1.59 Gbits/sec
We used the stress tool contained in initrd-stress.img to update
ramsize MB of memory on every CPU in the guest; see the following link
for the source code:
https://github.com/qemu/qemu/blob/master/tests/migration/stress.c
The following command was executed to compare our refined QEMU with
the original QEMU:
# python3.6 guestperf.py --binary /path/to/qemu-kvm --cpus 16 \
--mem 100 --max-iters 200 --max-time 1200 --dst-host {dst_ip} \
--kernel /path/to/vmlinuz --initrd /path/to/initrd-stress.img \
--transport tcp --downtime 500 --auto-converge --auto-converge-step 10 \
--verbose --stress-mem {ramsize}
We set ramsize to 150MB to simulate a light load, 3000MB for a
moderate load, and 5000MB for a heavy load. Each scenario was executed
three times.
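For anyone who wants to repeat the runs, the three scenarios and three
repetitions can be expanded into a list of command lines as sketched
below. The flags and placeholder paths are copied from the command
above; the helper names and the example destination address
(192.0.2.10, a documentation-only address) are assumptions made for
this sketch, and the script only builds the commands without running
them.

```python
# Sketch for building the test matrix; helper names are hypothetical.
SCENARIOS_MB = {"light": 150, "moderate": 3000, "heavy": 5000}
RUNS_PER_SCENARIO = 3


def guestperf_cmd(ramsize_mb, dst_ip):
    """Return the guestperf argv for one run with the given stress size."""
    return [
        "python3.6", "guestperf.py",
        "--binary", "/path/to/qemu-kvm",
        "--cpus", "16", "--mem", "100",
        "--max-iters", "200", "--max-time", "1200",
        "--dst-host", dst_ip,
        "--kernel", "/path/to/vmlinuz",
        "--initrd", "/path/to/initrd-stress.img",
        "--transport", "tcp", "--downtime", "500",
        "--auto-converge", "--auto-converge-step", "10",
        "--verbose", "--stress-mem", str(ramsize_mb),
    ]


def build_test_matrix(dst_ip):
    """Return (scenario, run number, argv) for every run of every scenario."""
    return [
        (name, run, guestperf_cmd(ramsize_mb, dst_ip))
        for name, ramsize_mb in SCENARIOS_MB.items()
        for run in range(1, RUNS_PER_SCENARIO + 1)
    ]


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder destination, not a real test host.
    for name, run, argv in build_test_matrix("192.0.2.10"):
        print(f"[{name} #{run}]", " ".join(argv))
```

Each argv could then be handed to subprocess.run() once for the
original binary and once for the refined one.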
The following data shows the migration test results as the stress
increases.
ramsize: 150MB
|------------+-----------+----------+-----------+--------------|
|            | totaltime | downtime | iteration | max throttle |
|            | (ms)      | (ms)     | count     | percent      |
|------------+-----------+----------+-----------+--------------|
| original   |    123685 |      490 |        87 |          99% |
|            |    116249 |      542 |        45 |          60% |
|            |    107772 |      587 |         8 |           0% |
|------------+-----------+----------+-----------+--------------|
| background |    113744 |     1654 |        16 |          20% |
| sync       |    122623 |      758 |        60 |          80% |
|            |    112668 |      547 |        23 |          20% |
|------------+-----------+----------+-----------+--------------|
| background |    113660 |      573 |         5 |           0% |
| sync +     |    109357 |      576 |         6 |           0% |
| responsive |    126792 |      494 |        37 |          99% |
| throttle   |           |          |           |              |
|------------+-----------+----------+-----------+--------------|
ramsize: 3000MB
|------------+-----------+----------+-----------+--------------|
|            | totaltime | downtime | iteration | max throttle |
|            | (ms)      | (ms)     | count     | percent      |
|------------+-----------+----------+-----------+--------------|
| original   |    404398 |      515 |        26 |          99% |
|            |    392552 |      528 |        25 |          99% |
|            |    400113 |      447 |        24 |          99% |
|------------+-----------+----------+-----------+--------------|
| background |    239151 |      681 |        25 |          99% |
| sync       |    295047 |      587 |        41 |          99% |
|            |    289936 |      681 |        34 |          99% |
|------------+-----------+----------+-----------+--------------|
| background |    212786 |      487 |        22 |          99% |
| sync +     |    225246 |      666 |        23 |          99% |
| responsive |    244053 |      572 |        27 |          99% |
| throttle   |           |          |           |              |
|------------+-----------+----------+-----------+--------------|
ramsize: 5000MB
|------------+-----------+----------+-----------+--------------|
|            | totaltime | downtime | iteration | max throttle |
|            | (ms)      | (ms)     | count     | percent      |
|------------+-----------+----------+-----------+--------------|
| original   |    566357 |      644 |        22 |          99% |
|            |    607471 |      320 |        23 |          99% |
|            |    603136 |      417 |        22 |          99% |
|------------+-----------+----------+-----------+--------------|
| background |    284605 |      793 |        27 |          99% |
| sync       |    272270 |      668 |        28 |          99% |
|            |    267543 |      545 |        28 |          99% |
|------------+-----------+----------+-----------+--------------|
| background |    226446 |      413 |        22 |          99% |
| sync +     |    232082 |      494 |        23 |          99% |
| responsive |    269863 |      533 |        23 |          99% |
| throttle   |           |          |           |              |
|------------+-----------+----------+-----------+--------------|
To summarize, the data above shows no sign of negative optimization:
total time is comparable under light load, and under moderate and heavy
load the refinements reduce it significantly, which shortens the period
of guest performance degradation.
We also examined the memory performance curves generated from the test
runs above; they show no negative optimization either, although the
performance degradation sets in more quickly. Since the graphs are
inconvenient to include here, readers are encouraged to verify this
independently.
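The total-time comparison can be reproduced from the tables with a
short script. The numbers below are the totaltime columns copied from
the tables above; the dictionary layout and the summarize() helper are
only one possible way to arrange them.

```python
# Average total time per configuration, relative to the original QEMU.
# Values are the totaltime (ms) columns from the result tables.
from statistics import mean

TOTALTIME_MS = {
    150: {
        "original": [123685, 116249, 107772],
        "background sync": [113744, 122623, 112668],
        "background sync + responsive throttle": [113660, 109357, 126792],
    },
    3000: {
        "original": [404398, 392552, 400113],
        "background sync": [239151, 295047, 289936],
        "background sync + responsive throttle": [212786, 225246, 244053],
    },
    5000: {
        "original": [566357, 607471, 603136],
        "background sync": [284605, 272270, 267543],
        "background sync + responsive throttle": [226446, 232082, 269863],
    },
}


def summarize(results):
    """Return (ramsize, config, mean ms, change vs. original in %) rows."""
    rows = []
    for ramsize, configs in results.items():
        base = mean(configs["original"])
        for name, runs in configs.items():
            avg = mean(runs)
            rows.append((ramsize, name, avg, (avg - base) / base * 100))
    return rows


for ramsize, name, avg, change_pct in summarize(TOTALTIME_MS):
    print(f"{ramsize:>5}MB  {name:<40} {avg:>9.0f} ms  {change_pct:+6.1f}%")
```

On these three-run averages, total time stays within about 1% of the
original at 150MB, drops by roughly 31% and 43% at 3000MB, and by
roughly 54% and 59% at 5000MB for background sync alone and for both
refinements combined, respectively.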
Please review; any comments and advice are appreciated.
Thanks,
Yong
Hyman Huang (7):
migration: Introduce structs for background sync
migration: Refine util functions to support background sync
qapi/migration: Introduce the iteration-count
migration: Implement background sync watcher
migration: Support background dirty bitmap sync and throttle
qapi/migration: Introduce cpu-responsive-throttle parameter
migration: Support responsive CPU throttle
include/exec/ram_addr.h | 107 ++++++++++++++-
include/exec/ramblock.h | 45 +++++++
migration/migration-hmp-cmds.c | 8 ++
migration/migration-stats.h | 4 +
migration/migration.c | 13 ++
migration/options.c | 20 +++
migration/options.h | 1 +
migration/ram.c | 236 ++++++++++++++++++++++++++++++---
migration/ram.h | 3 +
migration/trace-events | 4 +
qapi/migration.json | 22 ++-
tests/qtest/migration-test.c | 40 ++++++
12 files changed, 479 insertions(+), 24 deletions(-)
--
2.39.1