[PATCH v7 0/3] support dirty restraint on vCPU

huangy81@chinatelecom.cn posted 3 patches 2 years, 4 months ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/cover.1638267778.git.huangy81@chinatelecom.cn
There is a newer version of this series
[PATCH v7 0/3] support dirty restraint on vCPU
Posted by huangy81@chinatelecom.cn 2 years, 4 months ago
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>

The patch [2/3] has not been touched so far. Any corrections and
suggestions are welcome.

Please review, thanks!

v7:
- rebase on master
- polish the comments and error messages according to the
  advice given by Markus
- introduce the dirtylimit_enabled function to check whether the dirty
  page limit is enabled before cancelling it.

v6:
- rebase on master
- fix dirtylimit setup crash found by Markus
- polish the comments according to the advice given by Markus
- adjust the QEMU QMP command tag to 7.0

v5:
- rebase on master
- adjust the throttle algorithm by removing the tuning in the
  RESTRAINT_RATIO case so that the dirty page rate can reach the quota
  more quickly.
- fix percentage update in throttle iteration.

v4:
- rebase on master
- modify the following points according to the advice given by Markus
  1. move the definition into migration.json
  2. polish the comments of set-dirty-limit
  3. do the syntax check and change dirty rate to dirty page rate

Thanks for the careful reviews by Markus.

Please review, thanks!

v3:
- rebase on master
- modify the following points according to the advice given by Markus
  1. remove DirtyRateQuotaVcpu and use its fields as options directly
  2. add comments detailing what the dirtylimit setup does
  3. explain how to use dirtylimit in combination with the existing QMP
     commands "calc-dirty-rate" and "query-dirty-rate" in the documentation.

Thanks for the careful reviews by Markus.

Please review, thanks!

Hyman

v2:
- rebase on master
- modify the following points according to the advice given by Juan
  1. rename dirtyrestraint to dirtylimit
  2. implement the full lifecycle of dirtylimit_calc, including
     dirtylimit_calc and dirtylimit_calc_quit
  3. introduce a 'quit' field in dirtylimit_calc_state to implement
     dirtylimit_calc_quit
  4. remove ready_cond and ready_mtx since they may not be suitable
  5. put the 'record_dirtypage' function code at the beginning of the
     file
  6. remove the unnecessary return;
- other modifications made after code review
  1. introduce 'bmap' and 'nr' fields in dirtylimit_state to record the
     number of running threads forked by dirtylimit
  2. stop the dirtyrate calculation thread once all the dirtylimit
     threads are stopped
  3. do some renaming
     dirtyrate calculation thread -> dirtylimit-calc
     dirtylimit thread -> dirtylimit-{cpu_index}
     function name do_dirtyrestraint -> dirtylimit_check
     QMP command dirty-restraint -> set-dirty-limit
     QMP command dirty-restraint-cancel -> cancel-dirty-limit
     header file dirtyrestraint.h -> dirtylimit.h

Please review, thanks !

Thanks for the accurate and timely advice given by Juan. We would really
appreciate any corrections and suggestions about this patchset.

Best Regards !

Hyman

v1:
This patchset introduces a mechanism to impose a dirty restraint on vCPUs,
aiming to keep each vCPU running within a certain dirty page rate given by
the user. Dirty restraint on vCPUs may be an alternative way to implement
convergence logic for live migration, which in theory could improve guest
memory performance during migration compared with the traditional method.

In the current live migration implementation, the convergence
logic throttles all vCPUs of the VM, which has some side effects:
- 'read processes' on a vCPU are unnecessarily penalized
- the throttle percentage is increased step by step, which seems to
  struggle to find the optimal value when the dirty rate is high
- it is hard to predict the remaining migration time once the
  throttling percentage reaches 99%

To a certain extent, the dirty restraint mechanism can mitigate these
effects by throttling at vCPU granularity during migration.

The implementation is rather straightforward: we periodically calculate
the per-vCPU dirty rate via the dirty ring mechanism, as commit 0e21bf246
"implement dirty-ring dirtyrate calculation" does. Each vCPU on which a
dirty restraint is imposed is then throttled periodically, as auto-converge
does. After each throttling step we compare the current dirty rate with
the quota; if the current dirty rate is still above the quota, we increase
the throttling percentage until it drops below the quota.
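
In pseudo-Python, the feedback loop looks roughly like this (just a sketch
to make the idea concrete; the helper names, the 5% step and the 1s period
are illustrative assumptions of mine, not the actual code in patch [2/3]):

import time

QUOTA_MBPS = 400       # quota dirty page rate given by the user
STEP_PCT = 5           # illustrative throttle increment per iteration

def sample_dirty_rate(cpu_index):
    """Stand-in for the periodic dirty-ring based measurement."""
    return 500          # pretend the vCPU currently dirties 500 MB/s

def apply_throttle(cpu_index, pct):
    """Stand-in for making the vCPU sleep pct% of each period."""
    print("vcpu %d: throttle %d%%" % (cpu_index, pct))

def dirtylimit_loop(cpu_index, iterations=10):
    pct = 0
    for _ in range(iterations):
        current = sample_dirty_rate(cpu_index)
        if current > QUOTA_MBPS and pct < 99:
            pct += STEP_PCT     # still above quota: throttle harder
        elif current < QUOTA_MBPS and pct > 0:
            pct -= STEP_PCT     # back under quota: relax a bit
        apply_throttle(cpu_index, pct)
        time.sleep(1)           # re-evaluate each period

dirtylimit_loop(cpu_index=1)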

This patchset is the basis for implementing a new auto-converge method
for live migration. We introduce two QMP commands to impose/cancel the
dirty restraint on a specified vCPU, so it can also serve as an independent
API for upper-layer applications such as libvirt, which can use it to
implement convergence logic during live migration, supplemented with the
QMP 'calc-dirty-rate' command or similar.
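
For illustration, an upper-layer app could drive the commands over the QMP
socket roughly as below (only a sketch: the socket path matches the test
setup later in this thread, and the assumption that cancel-dirty-limit
takes a cpu-index argument is mine):

import json
import socket
import time

s = socket.socket(socket.AF_UNIX)
s.connect("/tmp/qmp-sock")
f = s.makefile("rw")              # QMP speaks newline-delimited JSON

def qmp(execute, arguments=None):
    cmd = {"execute": execute}
    if arguments is not None:
        cmd["arguments"] = arguments
    f.write(json.dumps(cmd) + "\n")
    f.flush()
    return json.loads(f.readline())   # note: may return an async event

json.loads(f.readline())              # consume the QMP greeting
qmp("qmp_capabilities")               # enter command mode

print(qmp("calc-dirty-rate", {"calc-time": 1}))
time.sleep(2)                         # wait for the measurement to finish
print(qmp("query-dirty-rate"))
print(qmp("set-dirty-limit", {"cpu-index": 1, "dirty-rate": 400}))
print(qmp("cancel-dirty-limit", {"cpu-index": 1}))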

We post this patchset as an RFC; any corrections and suggestions about the
implementation, API, throttling algorithm, etc. are very much
appreciated!

Please review, thanks !

Best Regards !

Hyman Huang (3):
  migration/dirtyrate: implement vCPU dirtyrate calculation periodically
  cpu-throttle: implement vCPU throttle
  cpus-common: implement dirty page limit on vCPU

 cpus-common.c                 |  48 +++++++
 include/exec/memory.h         |   5 +-
 include/hw/core/cpu.h         |   9 ++
 include/sysemu/cpu-throttle.h |  30 ++++
 include/sysemu/dirtylimit.h   |  44 ++++++
 migration/dirtyrate.c         | 139 +++++++++++++++++--
 migration/dirtyrate.h         |   2 +
 qapi/migration.json           |  43 ++++++
 softmmu/cpu-throttle.c        | 316 ++++++++++++++++++++++++++++++++++++++++++
 softmmu/trace-events          |   5 +
 softmmu/vl.c                  |   1 +
 11 files changed, 631 insertions(+), 11 deletions(-)
 create mode 100644 include/sysemu/dirtylimit.h

-- 
1.8.3.1


Re: [PATCH v7 0/3] support dirty restraint on vCPU
Posted by Peter Xu 2 years, 4 months ago
On Tue, Nov 30, 2021 at 06:28:10PM +0800, huangy81@chinatelecom.cn wrote:
> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
> 
> The patch [2/3] has not been touched so far. Any corrections and
> suggestions are welcome.

I played with it today, but the vcpu didn't get throttled as expected.

What I did was start two workloads with 500mb/s, each pinned on one vcpu
thread:

[root@fedora ~]# pgrep -fa mig_mon
595 ./mig_mon mm_dirty 1000 500 sequential
604 ./mig_mon mm_dirty 1000 500 sequential
[root@fedora ~]# taskset -pc 595
pid 595's current affinity list: 2
[root@fedora ~]# taskset -pc 604
pid 604's current affinity list: 3

Then start throttle with 100mb/s:

(QEMU) set-dirty-limit cpu-index=3 dirty-rate=100
{"return": {}}
(QEMU) set-dirty-limit cpu-index=2 dirty-rate=100
{"return": {}}

I can see the workload dropped a tiny little bit (perhaps 500mb -> 499mb), then
it keeps going..

Further throttling doesn't work either:

(QEMU) set-dirty-limit cpu-index=2 dirty-rate=10
{"return": {}}

Funnily, the ssh client got slowed down instead... :(

Yong, how did you test it?

-- 
Peter Xu


Re: [PATCH v7 0/3] support dirty restraint on vCPU
Posted by Hyman Huang 2 years, 4 months ago
1.
Start the VM with kernel + initrd.img, using the following QEMU command line:

[root@Hyman_server1 fast_qemu]# cat vm.sh
#!/bin/bash
/usr/bin/qemu-system-x86_64 \
     -display none -vga none \
     -name guest=simple_vm,debug-threads=on \
     -monitor stdio \
     -machine pc-i440fx-2.12 \
     -accel kvm,dirty-ring-size=65536 -cpu host \
     -kernel /home/work/fast_qemu/vmlinuz-5.13.0-rc4+ \
     -initrd /home/work/fast_qemu/initrd-stress.img \
     -append "noapic edd=off printk.time=1 noreplace-smp 
cgroup_disable=memory pci=noearly console=ttyS0 debug ramsize=1500 
ratio=1 sleep=1" \
     -chardev file,id=charserial0,path=/var/log/vm_console.log \
     -serial chardev:charserial0 \
     -qmp unix:/tmp/qmp-sock,server,nowait \
     -D /var/log/vm.log \
     --trace events=/home/work/fast_qemu/events \
     -m 4096 -smp 2 -device sga

2.
Enable the dirtylimit trace events, which will be output to /var/log/vm.log:
[root@Hyman_server1 fast_qemu]# cat /home/work/fast_qemu/events
dirtylimit_state_init
dirtylimit_vcpu
dirtylimit_impose
dirtyrate_do_calculate_vcpu


3.
Connect to the QMP server with the low-level QMP client and run set-dirty-limit:

[root@Hyman_server1 my_qemu]# python3.6 ./scripts/qmp/qmp-shell -v -p /tmp/qmp-sock

Welcome to the QMP low-level shell!
Connected to QEMU 6.1.92

(QEMU) set-dirty-limit cpu-index=1 dirty-rate=400

{
     "arguments": {
         "cpu-index": 1,
         "dirty-rate": 400
     },
     "execute": "set-dirty-limit"
}

4.
Observe the vCPU's current dirty rate and quota dirty rate:

[root@Hyman_server1 ~]# tail -f /var/log/vm.log
dirtylimit_state_init dirtylimit state init: max cpus 2
dirtylimit_vcpu CPU[1] set quota dirtylimit 400
dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 0, 
percentage 0
dirtyrate_do_calculate_vcpu vcpu[0]: 1075 MB/s
dirtyrate_do_calculate_vcpu vcpu[1]: 1061 MB/s
dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 1061, 
percentage 62
dirtyrate_do_calculate_vcpu vcpu[0]: 1133 MB/s
dirtyrate_do_calculate_vcpu vcpu[1]: 380 MB/s
dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 380, 
percentage 57
dirtyrate_do_calculate_vcpu vcpu[0]: 1227 MB/s
dirtyrate_do_calculate_vcpu vcpu[1]: 464 MB/s

We can observe that vcpu-1's dirty rate is about 400 MB/s with the dirty
page limit set, while vcpu-0 is not affected.

5.
Observe the VM stress info:
[root@Hyman_server1 fast_qemu]# tail -f /var/log/vm_console.log
[    0.838051] Run /init as init process
[    0.839216]   with arguments:
[    0.840153]     /init
[    0.840882]   with environment:
[    0.841884]     HOME=/
[    0.842649]     TERM=linux
[    0.843478]     edd=off
[    0.844233]     ramsize=1500
[    0.845079]     ratio=1
[    0.845829]     sleep=1
/init (00001): INFO: RAM 1500 MiB across 2 CPUs, ratio 1, sleep 1 us
[    1.158011] random: init: uninitialized urandom read (4096 bytes read)
[    1.448205] random: init: uninitialized urandom read (4096 bytes read)
/init (00001): INFO: 1638282593684ms copied 1 GB in 00729ms
/init (00110): INFO: 1638282593964ms copied 1 GB in 00719ms
/init (00001): INFO: 1638282594405ms copied 1 GB in 00719ms
/init (00110): INFO: 1638282594677ms copied 1 GB in 00713ms
/init (00001): INFO: 1638282595093ms copied 1 GB in 00686ms
/init (00110): INFO: 1638282595339ms copied 1 GB in 00662ms
/init (00001): INFO: 1638282595764ms copied 1 GB in 00670m

PS: the kernel and initrd images come from:

kernel image: vmlinuz-5.13.0-rc4+, a normal CentOS vmlinuz copied from
the /boot directory

initrd.img: initrd-stress.img, which only contains a stress binary
compiled from the QEMU source tests/migration/stress.c and run as init
in the VM.

You can view the README.md file of my project
"met" (https://github.com/newfriday/met) to compile the initrd-stress.img. :)

On 11/30/21 20:57, Peter Xu wrote:
> On Tue, Nov 30, 2021 at 06:28:10PM +0800, huangy81@chinatelecom.cn wrote:
>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>
>> The patch [2/3] has not been touched so far. Any corrections and
> suggestions are welcome.
> 
> I played with it today, but the vcpu didn't get throttled as expected.
> 
> What I did was starting two workload with 500mb/s, each pinned on one vcpu
> thread:
> 
> [root@fedora ~]# pgrep -fa mig_mon
> 595 ./mig_mon mm_dirty 1000 500 sequential
> 604 ./mig_mon mm_dirty 1000 500 sequential
> [root@fedora ~]# taskset -pc 595
> pid 595's current affinity list: 2
> [root@fedora ~]# taskset -pc 604
> pid 604's current affinity list: 3
> 
> Then start throttle with 100mb/s:
> 
> (QEMU) set-dirty-limit cpu-index=3 dirty-rate=100
> {"return": {}}
> (QEMU) set-dirty-limit cpu-index=2 dirty-rate=100
> {"return": {}}
> 
> I can see the workload dropped a tiny little bit (perhaps 500mb -> 499mb), then
> it keeps going..
> 
> Further throttle won't work too:
> 
> (QEMU) set-dirty-limit cpu-index=2 dirty-rate=10
> {"return": {}}
> 
> Funnily, the ssh client got slowed down instead... :(
> 
> Yong, how did you test it?
> 

-- 
Best Regards
Hyman Huang(黄勇)

Re: [PATCH v7 0/3] support dirty restraint on vCPU
Posted by Hyman Huang 2 years, 4 months ago

On 11/30/21 22:57, Hyman Huang wrote:
> 1.
> Start vm with kernel+initrd.img with qemu command line as following:
> 
> [root@Hyman_server1 fast_qemu]# cat vm.sh
> #!/bin/bash
> /usr/bin/qemu-system-x86_64 \
>      -display none -vga none \
>      -name guest=simple_vm,debug-threads=on \
>      -monitor stdio \
>      -machine pc-i440fx-2.12 \
>      -accel kvm,dirty-ring-size=65536 -cpu host \
>      -kernel /home/work/fast_qemu/vmlinuz-5.13.0-rc4+ \
>      -initrd /home/work/fast_qemu/initrd-stress.img \
>      -append "noapic edd=off printk.time=1 noreplace-smp 
> cgroup_disable=memory pci=noearly console=ttyS0 debug ramsize=1500 
> ratio=1 sleep=1" \
>      -chardev file,id=charserial0,path=/var/log/vm_console.log \
>      -serial chardev:charserial0 \
>      -qmp unix:/tmp/qmp-sock,server,nowait \
>      -D /var/log/vm.log \
>      --trace events=/home/work/fast_qemu/events \
>      -m 4096 -smp 2 -device sga
> 
> 2.
> Enable the dirtylimit trace event which will output to /var/log/vm.log
> [root@Hyman_server1 fast_qemu]# cat /home/work/fast_qemu/events
> dirtylimit_state_init
> dirtylimit_vcpu
> dirtylimit_impose
> dirtyrate_do_calculate_vcpu
> 
> 
> 3.
> Connect the qmp server with low level qmp client and set-dirty-limit
> 
> [root@Hyman_server1 my_qemu]# python3.6 ./scripts/qmp/qmp-shell -v -p 
> /tmp/qmp-sock
> 
> Welcome to the QMP low-level shell!
> Connected to QEMU 6.1.92
> 
> (QEMU) set-dirty-limit cpu-index=1 dirty-rate=400
> 
> 
> {
>      "arguments": {
>          "cpu-index": 1,
>          "dirty-rate": 400
>      },
>      "execute": "set-dirty-limit"
> }
> 
> 4.
> observe the vcpu current dirty rate and quota dirty rate...
> 
> [root@Hyman_server1 ~]# tail -f /var/log/vm.log
> dirtylimit_state_init dirtylimit state init: max cpus 2
> dirtylimit_vcpu CPU[1] set quota dirtylimit 400
> dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 0, 
> percentage 0
> dirtyrate_do_calculate_vcpu vcpu[0]: 1075 MB/s
> dirtyrate_do_calculate_vcpu vcpu[1]: 1061 MB/s
> dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 1061, 
> percentage 62
> dirtyrate_do_calculate_vcpu vcpu[0]: 1133 MB/s
> dirtyrate_do_calculate_vcpu vcpu[1]: 380 MB/s
> dirtylimit_impose CPU[1] impose dirtylimit: quota 400, current 380, 
> percentage 57
> dirtyrate_do_calculate_vcpu vcpu[0]: 1227 MB/s
> dirtyrate_do_calculate_vcpu vcpu[1]: 464 MB/s
> 
> We can observe that vcpu-1's dirtyrate is about 400MB/s with dirty page 
> limit set and the vcpu-0 is not affected.
> 
> 5.
> observe the vm stress info...
> [root@Hyman_server1 fast_qemu]# tail -f /var/log/vm_console.log
> [    0.838051] Run /init as init process
> [    0.839216]   with arguments:
> [    0.840153]     /init
> [    0.840882]   with environment:
> [    0.841884]     HOME=/
> [    0.842649]     TERM=linux
> [    0.843478]     edd=off
> [    0.844233]     ramsize=1500
> [    0.845079]     ratio=1
> [    0.845829]     sleep=1
> /init (00001): INFO: RAM 1500 MiB across 2 CPUs, ratio 1, sleep 1 us
> [    1.158011] random: init: uninitialized urandom read (4096 bytes read)
> [    1.448205] random: init: uninitialized urandom read (4096 bytes read)
> /init (00001): INFO: 1638282593684ms copied 1 GB in 00729ms
> /init (00110): INFO: 1638282593964ms copied 1 GB in 00719ms
> /init (00001): INFO: 1638282594405ms copied 1 GB in 00719ms
> /init (00110): INFO: 1638282594677ms copied 1 GB in 00713ms
> /init (00001): INFO: 1638282595093ms copied 1 GB in 00686ms
> /init (00110): INFO: 1638282595339ms copied 1 GB in 00662ms
> /init (00001): INFO: 1638282595764ms copied 1 GB in 00670m
> 
> PS: the kernel and initrd images comes from:
> 
> kernel image: vmlinuz-5.13.0-rc4+, normal centos vmlinuz copied from 
> /boot directory
> 
> initrd.img: initrd-stress.img, only contains a stress binary, which 
> compiled from qemu source tests/migration/stress.c and run as init
> in vm.
> 
> you can view README.md file of my project 
> "met"(https://github.com/newfriday/met) to compile the 
> initrd-stress.img. :)
> 
> On 11/30/21 20:57, Peter Xu wrote:
>> On Tue, Nov 30, 2021 at 06:28:10PM +0800, huangy81@chinatelecom.cn wrote:
>>> From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
>>>
>>> The patch [2/3] has not been touched so far. Any corrections and
>>> suggestions are welcome.
>>
>> I played with it today, but the vcpu didn't get throttled as expected.
>>
>> What I did was starting two workload with 500mb/s, each pinned on one 
>> vcpu
>> thread:
>>
>> [root@fedora ~]# pgrep -fa mig_mon
>> 595 ./mig_mon mm_dirty 1000 500 sequential
>> 604 ./mig_mon mm_dirty 1000 500 sequential
>> [root@fedora ~]# taskset -pc 595
>> pid 595's current affinity list: 2
>> [root@fedora ~]# taskset -pc 604
>> pid 604's current affinity list: 3
>>
>> Then start throttle with 100mb/s:
>>
>> (QEMU) set-dirty-limit cpu-index=3 dirty-rate=100
>> {"return": {}}
>> (QEMU) set-dirty-limit cpu-index=2 dirty-rate=100
>> {"return": {}}
>>
>> I can see the workload dropped a tiny little bit (perhaps 500mb -> 
>> 499mb), then
>> it keeps going..
The test steps I listed above assume that the dirty rate calculated by
dirtylimit_calc_func via the dirty ring is accurate, which differs from
your test policy.

The macro DIRTYLIMIT_CALC_TIME_MS, used as the calculation period in
migration/dirtyrate.c, has a big effect on the result. So "how we define
the right dirty rate" is worth discussing.

Anyway, one of our targets is to improve memory performance during
migration, so I think the memory write/read speed in the VM is a convincing
metric. I'll test the dirty rate the way you mentioned and analyze the
result.
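
For example, a toy calculation (the numbers are made up, only to show the
effect of the window length on the measured rate):

PAGE_SIZE = 4096

def rate_mbps(dirtied_pages, window_ms):
    # dirty rate = dirtied bytes / window length
    return dirtied_pages * PAGE_SIZE / (1 << 20) / (window_ms / 1000.0)

burst = 256 * 1024                  # 1 GiB dirtied in a single 200 ms burst
print(rate_mbps(burst, 200))        # ~5120 MB/s if measured over 200 ms
print(rate_mbps(burst, 1000))       # ~1024 MB/s if measured over 1 s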

>>
>> Further throttle won't work too:
>>
>> (QEMU) set-dirty-limit cpu-index=2 dirty-rate=10
>> {"return": {}}
>>
>> Funnily, the ssh client got slowed down instead... :(
>>
>> Yong, how did you test it?
>>
> 

-- 
Best Regards
Hyman Huang(黄勇)