[PATCH RFC 00/12] *** multiple RDMA channels for migration ***

Zhimin Feng posted 12 patches 4 years, 2 months ago
Test docker-mingw@fedora failed
Test checkpatch passed
Test docker-quick@centos7 passed
Test FreeBSD passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20200109045922.904-1-fengzhimin1@huawei.com
Maintainers: Juan Quintela <quintela@redhat.com>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Markus Armbruster <armbru@redhat.com>, Eric Blake <eblake@redhat.com>
migration/migration.c |   55 +-
migration/migration.h |    6 +
migration/rdma.c      | 1320 +++++++++++++++++++++++++++++++++++++----
monitor/hmp-cmds.c    |    7 +
qapi/migration.json   |   27 +-
5 files changed, 1285 insertions(+), 130 deletions(-)
[PATCH RFC 00/12] *** multiple RDMA channels for migration ***
Posted by Zhimin Feng 4 years, 2 months ago
From: fengzhimin <fengzhimin1@huawei.com>

Currently there is a single channel for RDMA migration, which means
the bandwidth of a 25 Gigabit NIC is not fully utilized. Inspired by
multifd, we use two RDMA channels to send RAM pages; we call this
MultiRDMA.

We compare the migration performance of MultiRDMA with the original
single-channel RDMA migration. The VM used for migration is configured
as follows:
- the VM uses 4 KiB pages;
- the number of VCPUs is 4;
- the total memory is 16 GB;
- the 'mempress' tool is used to stress the VM's memory (mempress 8000 500);
- a 25 Gigabit NIC is used for the migration;

For the original RDMA and MultiRDMA migration, the total migration
times of the VM are as follows:
+++++++++++++++++++++++++++++++++++++++++++++++++++
|               | NOT rdma-pin-all | rdma-pin-all |
+++++++++++++++++++++++++++++++++++++++++++++++++++
| original RDMA |       18 s       |     23 s     |
---------------------------------------------------
|   MultiRDMA   |       13 s       |     18 s     |
+++++++++++++++++++++++++++++++++++++++++++++++++++

For NOT rdma-pin-all migration, MultiRDMA reduces the total migration
time by about 27.8% (from 18 s to 13 s).
For rdma-pin-all migration, MultiRDMA reduces the total migration
time by about 21.7% (from 23 s to 18 s).

Test the multiRDMA migration like this:
'virsh migrate --live --rdma-parallel --migrateuri
rdma://hostname domain qemu+tcp://hostname/system'
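
To test directly with QEMU (without libvirt), the new capability and
parameter can presumably be enabled from the HMP monitor on both source
and destination before migrating. This is only a sketch: 'multirdma' is
an assumed capability name and the port is arbitrary; 'multi-rdma-channels'
is taken from the patch titles:

  (qemu) migrate_set_capability multirdma on
  (qemu) migrate_set_parameter multi-rdma-channels 2
  (qemu) migrate -d rdma://destination-host:4444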


fengzhimin (12):
  migration: Add multiRDMA capability support
  migration: Export the 'migration_incoming_setup' function           
             and add the 'migrate_use_rdma_pin_all' function
  migration: Create the multi-rdma-channels parameter
  migration/rdma: Create multiRDMA migration threads
  migration/rdma: Create the multiRDMA channels
  migration/rdma: Transmit initial package
  migration/rdma: Be sure all channels are created
  migration/rdma: register memory for multiRDMA channels
  migration/rdma: Wait for all multiRDMA to complete registration
  migration/rdma: use multiRDMA to send RAM block for rdma-pin-all mode
  migration/rdma: use multiRDMA to send RAM block for NOT rdma-pin-all
                  mode
  migration/rdma: only register the virt-ram block for MultiRDMA

 migration/migration.c |   55 +-
 migration/migration.h |    6 +
 migration/rdma.c      | 1320 +++++++++++++++++++++++++++++++++++++----
 monitor/hmp-cmds.c    |    7 +
 qapi/migration.json   |   27 +-
 5 files changed, 1285 insertions(+), 130 deletions(-)

-- 
2.19.1



Re: [PATCH RFC 00/12] *** multiple RDMA channels for migration ***
Posted by no-reply@patchew.org 4 years, 2 months ago
Patchew URL: https://patchew.org/QEMU/20200109045922.904-1-fengzhimin1@huawei.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  LINK    aarch64-softmmu/qemu-system-aarch64w.exe
../migration/migration.o: In function `migrate_fd_cleanup':
/tmp/qemu-test/src/migration/migration.c:1549: undefined reference to `multiRDMA_save_cleanup'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:206: qemu-system-x86_64w.exe] Error 1
make: *** [Makefile:483: x86_64-softmmu/all] Error 2
make: *** Waiting for unfinished jobs....
../migration/migration.o: In function `migrate_fd_cleanup':
/tmp/qemu-test/src/migration/migration.c:1549: undefined reference to `multiRDMA_save_cleanup'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:206: qemu-system-aarch64w.exe] Error 1
make: *** [Makefile:483: aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=b69d215b8e4143ba8f9e54fa9d5a6cbc', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-1frpag52/src/docker-src.2020-01-09-05.35.46.27299:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=b69d215b8e4143ba8f9e54fa9d5a6cbc
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-1frpag52/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    2m32.657s
user    0m8.573s


The full log is available at
http://patchew.org/logs/20200109045922.904-1-fengzhimin1@huawei.com/testing.docker-mingw@fedora/?type=message.
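
The undefined reference above suggests that migrate_fd_cleanup() calls
multiRDMA_save_cleanup() unconditionally, while migration/rdma.c (which
would provide it) is not built for the mingw targets. A minimal sketch of
one possible fix, assuming the usual CONFIG_RDMA build guard and a shared
header, would be a no-op fallback:

  /* sketch only: assumes rdma.c is compiled only when CONFIG_RDMA is set */
  #ifdef CONFIG_RDMA
  void multiRDMA_save_cleanup(void);
  #else
  static inline void multiRDMA_save_cleanup(void)
  {
      /* nothing to clean up when QEMU is built without RDMA support */
  }
  #endif
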
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Re: [PATCH RFC 00/12] *** multiple RDMA channels for migration ***
Posted by Dr. David Alan Gilbert 4 years, 2 months ago
* Zhimin Feng (fengzhimin1@huawei.com) wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
> 
> Currently there is a single channel for RDMA migration, which means
> the bandwidth of a 25 Gigabit NIC is not fully utilized. Inspired by
> multifd, we use two RDMA channels to send RAM pages; we call this
> MultiRDMA.
> 
> We compare the migration performance of MultiRDMA with the original
> single-channel RDMA migration. The VM used for migration is configured
> as follows:
> - the VM uses 4 KiB pages;
> - the number of VCPUs is 4;
> - the total memory is 16 GB;
> - the 'mempress' tool is used to stress the VM's memory (mempress 8000 500);
> - a 25 Gigabit NIC is used for the migration;
> 
> For the original RDMA and MultiRDMA migration, the total migration
> times of the VM are as follows:
> +++++++++++++++++++++++++++++++++++++++++++++++++++
> |               | NOT rdma-pin-all | rdma-pin-all |
> +++++++++++++++++++++++++++++++++++++++++++++++++++
> | original RDMA |       18 s       |     23 s     |
> ---------------------------------------------------
> |   MultiRDMA   |       13 s       |     18 s     |
> +++++++++++++++++++++++++++++++++++++++++++++++++++

Very nice.

> For NOT rdma-pin-all migration, MultiRDMA reduces the total migration
> time by about 27.8% (from 18 s to 13 s).
> For rdma-pin-all migration, MultiRDMA reduces the total migration
> time by about 21.7% (from 23 s to 18 s).
> 
> Test the multiRDMA migration like this:
> 'virsh migrate --live --rdma-parallel --migrateuri
> rdma://hostname domain qemu+tcp://hostname/system'

It will take me a while to finish the review; but another
general suggestion is to add more trace_ calls; it will make it easier
to diagnose problems later.
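
As a concrete illustration of that suggestion, QEMU's tracing works by
adding an event to the subdirectory's trace-events file and calling the
generated trace_ function from the code; the event name and argument
below are hypothetical, just to show the shape:

  # in migration/trace-events:
  multiRDMA_send_thread_start(int id) "channel %d"

  /* in migration/rdma.c (with "trace.h" included): */
  trace_multiRDMA_send_thread_start(channel_id);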

Dave

> 
> fengzhimin (12):
>   migration: Add multiRDMA capability support
>   migration: Export the 'migration_incoming_setup' function           
>              and add the 'migrate_use_rdma_pin_all' function
>   migration: Create the multi-rdma-channels parameter
>   migration/rdma: Create multiRDMA migration threads
>   migration/rdma: Create the multiRDMA channels
>   migration/rdma: Transmit initial package
>   migration/rdma: Be sure all channels are created
>   migration/rdma: register memory for multiRDMA channels
>   migration/rdma: Wait for all multiRDMA to complete registration
>   migration/rdma: use multiRDMA to send RAM block for rdma-pin-all mode
>   migration/rdma: use multiRDMA to send RAM block for NOT rdma-pin-all
>                   mode
>   migration/rdma: only register the virt-ram block for MultiRDMA
> 
>  migration/migration.c |   55 +-
>  migration/migration.h |    6 +
>  migration/rdma.c      | 1320 +++++++++++++++++++++++++++++++++++++----
>  monitor/hmp-cmds.c    |    7 +
>  qapi/migration.json   |   27 +-
>  5 files changed, 1285 insertions(+), 130 deletions(-)
> 
> -- 
> 2.19.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


RE: [PATCH RFC 00/12] *** multiple RDMA channels for migration ***
Posted by fengzhimin 4 years, 2 months ago
Thanks for your review. I will add more trace_ calls in the next version (V2) and modify it according to your suggestions.

-----Original Message-----
From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com] 
Sent: Thursday, January 16, 2020 3:57 AM
To: fengzhimin <fengzhimin1@huawei.com>
Cc: quintela@redhat.com; armbru@redhat.com; eblake@redhat.com; qemu-devel@nongnu.org; Zhanghailiang <zhang.zhanghailiang@huawei.com>; jemmy858585@gmail.com
Subject: Re: [PATCH RFC 00/12] *** multiple RDMA channels for migration ***

* Zhimin Feng (fengzhimin1@huawei.com) wrote:
> From: fengzhimin <fengzhimin1@huawei.com>
> 
> Currently there is a single channel for RDMA migration, which means
> the bandwidth of a 25 Gigabit NIC is not fully utilized. Inspired by
> multifd, we use two RDMA channels to send RAM pages; we call this
> MultiRDMA.
> 
> We compare the migration performance of MultiRDMA with the original
> single-channel RDMA migration. The VM used for migration is configured
> as follows:
> - the VM uses 4 KiB pages;
> - the number of VCPUs is 4;
> - the total memory is 16 GB;
> - the 'mempress' tool is used to stress the VM's memory (mempress 8000 500);
> - a 25 Gigabit NIC is used for the migration;
> 
> For the original RDMA and MultiRDMA migration, the total migration
> times of the VM are as follows:
> +++++++++++++++++++++++++++++++++++++++++++++++++++
> |               | NOT rdma-pin-all | rdma-pin-all |
> +++++++++++++++++++++++++++++++++++++++++++++++++++
> | original RDMA |       18 s       |     23 s     |
> ---------------------------------------------------
> |   MultiRDMA   |       13 s       |     18 s     |
> +++++++++++++++++++++++++++++++++++++++++++++++++++

Very nice.

> For NOT rdma-pin-all migration, MultiRDMA reduces the total
> migration time by about 27.8% (from 18 s to 13 s).
> For rdma-pin-all migration, MultiRDMA reduces the total
> migration time by about 21.7% (from 23 s to 18 s).
> 
> Test the multiRDMA migration like this:
> 'virsh migrate --live --rdma-parallel --migrateuri rdma://hostname 
> domain qemu+tcp://hostname/system'

It will take me a while to finish the review; but another general suggestion is to add more trace_ calls; it will make it easier to diagnose problems later.

Dave

> 
> fengzhimin (12):
>   migration: Add multiRDMA capability support
>   migration: Export the 'migration_incoming_setup' function           
>              and add the 'migrate_use_rdma_pin_all' function
>   migration: Create the multi-rdma-channels parameter
>   migration/rdma: Create multiRDMA migration threads
>   migration/rdma: Create the multiRDMA channels
>   migration/rdma: Transmit initial package
>   migration/rdma: Be sure all channels are created
>   migration/rdma: register memory for multiRDMA channels
>   migration/rdma: Wait for all multiRDMA to complete registration
>   migration/rdma: use multiRDMA to send RAM block for rdma-pin-all mode
>   migration/rdma: use multiRDMA to send RAM block for NOT rdma-pin-all
>                   mode
>   migration/rdma: only register the virt-ram block for MultiRDMA
> 
>  migration/migration.c |   55 +-
>  migration/migration.h |    6 +
>  migration/rdma.c      | 1320 +++++++++++++++++++++++++++++++++++++----
>  monitor/hmp-cmds.c    |    7 +
>  qapi/migration.json   |   27 +-
>  5 files changed, 1285 insertions(+), 130 deletions(-)
> 
> --
> 2.19.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK