[Qemu-devel] [RFC v3 0/2] Add live migration support in the PVRDMA device

Sukrit Bhatnagar posted 2 patches 4 years, 9 months ago
Tests: checkpatch, s390x, asan, docker-mingw@fedora and FreeBSD passed; docker-clang@ubuntu failed
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20190720234803.18938-1-skrtbhtngr@gmail.com
Maintainers: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Yuval Shaia <yuval.shaia@oracle.com>
hw/rdma/vmw/pvrdma_main.c | 94 +++++++++++++++++++++++++++++++++++----
1 file changed, 86 insertions(+), 8 deletions(-)
[Qemu-devel] [RFC v3 0/2] Add live migration support in the PVRDMA device
Posted by Sukrit Bhatnagar 4 years, 9 months ago
In v2, we had successful migration of PCI and MSIX states as well as
various DMA addresses and ring page information.
This series enables the migration of various GIDs used by the device.

We have switched to a setup with two hosts and two VMs running on them.
Migrations are now performed over the local network. This resolves the
same-host issue with libvirt.

We have also performed various ping-pong tests (ibv_rc_pingpong) in the
guest(s) after adding GID migration support. The current status is:
- ping-pong to localhost succeeds, when performed before starting the
  migration and after the completion of migration.
- ping-pong to a peer succeeds, both before and after migration as above,
  provided that both VMs are running on/migrated to the same host.
  So, if two VMs were started on two different hosts, and one of them
  was migrated to the other host, the ping-pong was successful.
  Similarly, if two VMs are migrated to the same host, then after migration,
  the ping-pong was successful.
- ping-pong to a peer on the remote host does not work yet.

Our next goal is to achieve successful migration with live traffic.

This series can also be found at:
https://github.com/skrtbhtngr/qemu/tree/gsoc19


History:

v2 -> v3:
- remove struct PVRDMAMigTmp and VMSTATE_WITH_TMP
- use predefined PVRDMA_HW_NAME for the vmsd name
- add vmsd for gids and a gid table field in pvrdma_state
- perform gid registration in pvrdma_post_load
- define pvrdma_post_save to unregister gids in the source host
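
The GID handling listed above (register on the destination after load,
unregister on the source after save) can be sketched in isolation. This is
only a self-contained mock, not the code from the patch: every Mock*/mock_*
name, the table size and the backend structure are assumptions made for
illustration, and no QEMU API is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MOCK_MAX_GIDS 4 /* hypothetical size, not the device's real limit */

/* Stand-in for a 128-bit GID. */
typedef struct {
    uint8_t raw[16];
} MockGid;

/* Stand-in for the host-side RDMA backend that tracks registered GIDs. */
typedef struct {
    MockGid gids[MOCK_MAX_GIDS];
    bool registered[MOCK_MAX_GIDS];
} MockBackend;

/* Stand-in for the device state; only the migrated GID table is modelled. */
typedef struct {
    MockGid gid_table[MOCK_MAX_GIDS];
    MockBackend *backend;
} MockDevState;

static bool mock_gid_is_empty(const MockGid *gid)
{
    static const MockGid zero = {{0}};
    return memcmp(gid, &zero, sizeof(zero)) == 0;
}

static void mock_add_gid(MockBackend *be, int idx, const MockGid *gid)
{
    assert(idx >= 0 && idx < MOCK_MAX_GIDS);
    be->gids[idx] = *gid;
    be->registered[idx] = true;
}

static void mock_del_gid(MockBackend *be, int idx)
{
    assert(idx >= 0 && idx < MOCK_MAX_GIDS);
    memset(&be->gids[idx], 0, sizeof(be->gids[idx]));
    be->registered[idx] = false;
}

/* Destination side: register every non-empty GID that arrived in the stream. */
static int mock_post_load(void *opaque, int version_id)
{
    MockDevState *dev = opaque;
    (void)version_id;
    for (int i = 0; i < MOCK_MAX_GIDS; i++) {
        if (!mock_gid_is_empty(&dev->gid_table[i])) {
            mock_add_gid(dev->backend, i, &dev->gid_table[i]);
        }
    }
    return 0;
}

/* Source side: once the state has been saved, drop the host registrations. */
static int mock_post_save(void *opaque)
{
    MockDevState *dev = opaque;
    for (int i = 0; i < MOCK_MAX_GIDS; i++) {
        if (dev->backend->registered[i]) {
            mock_del_gid(dev->backend, i);
        }
    }
    return 0;
}
```

In this mock the GID table itself is the only migrated data; the backend
registrations are rebuilt from it on the destination rather than copied.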

v1 -> v2:
- modify load_dsr() to make it idempotent
- switch to VMStateDescription
- add fields for PCI and MSIX state
- define a temporary struct PVRDMAMigTmp to use WITH_TMP macro
- perform mappings to CQ and event notification rings at load
- vmxnet3 issue solved by Marcel's patch
- BounceBuffer issue solved automatically by switching to VMStateDescription
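
As an illustration of what "idempotent" means for load_dsr(), here is a
minimal mock in which the mapping step is skipped when a mapping already
exists, so the function can be called again at load time without mapping
twice. This is a sketch under assumptions, not the patch itself: MockDev,
mock_dma_map(), mock_dma_unmap(), mock_load_dsr() and MOCK_DSR_LEN are
hypothetical names, and the real function may achieve idempotence
differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MOCK_DSR_LEN 64 /* hypothetical size of the shared region */

typedef struct {
    uint64_t dsr_dma; /* guest address of the DSR, restored by migration */
    void *dsr;        /* host mapping of the DSR, NULL when unmapped */
    int map_calls;    /* counts real mapping operations, for illustration */
} MockDev;

/* Stand-in for a DMA mapping helper; here it only allocates host memory. */
static void *mock_dma_map(MockDev *dev, uint64_t dma, size_t len)
{
    (void)dma;
    assert(len > 0);
    dev->map_calls++;
    return calloc(1, len);
}

/* Stand-in for the matching unmap helper. */
static void mock_dma_unmap(MockDev *dev)
{
    free(dev->dsr);
    dev->dsr = NULL;
}

/* Idempotent: a second call with the DSR already mapped maps nothing new. */
static int mock_load_dsr(MockDev *dev)
{
    if (!dev->dsr) {
        dev->dsr = mock_dma_map(dev, dev->dsr_dma, MOCK_DSR_LEN);
        if (!dev->dsr) {
            return -1;
        }
    }
    /* Dependent setup (ring mappings and so on) would follow here. */
    return 0;
}
```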


Link(s) to v2:
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01848.html
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01849.html
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01850.html

Link(s) to v1:
https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04924.html
https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04923.html

Sukrit Bhatnagar (2):
  hw/pvrdma: make DSR mapping idempotent in load_dsr()
  hw/pvrdma: add live migration support

 hw/rdma/vmw/pvrdma_main.c | 94 +++++++++++++++++++++++++++++++++++----
 1 file changed, 86 insertions(+), 8 deletions(-)

-- 
2.21.0


Re: [Qemu-devel] [RFC v3 0/2] Add live migration support in the PVRDMA device
Posted by Yuval Shaia 4 years, 8 months ago
On Sun, Jul 21, 2019 at 05:18:01AM +0530, Sukrit Bhatnagar wrote:
> In v2, we had successful migration of PCI and MSIX states as well as
> various DMA addresses and ring page information.
> This series enables the migration of various GIDs used by the device.
> 
> We have switched to a setup with two hosts and two VMs running on them.
> Migrations are now performed over the local network. This resolves the
> same-host issue with libvirt.
> 
> We have also performed various ping-pong tests (ibv_rc_pingpong) in the
> guest(s) after adding GID migration support. The current status is:
> - ping-pong to localhost succeeds, when performed before starting the
>   migration and after the completion of migration.
> - ping-pong to a peer succeeds, both before and after migration as above,
>   provided that both VMs are running on/migrated to the same host.
>   So, if two VMs were started on two different hosts, and one of them
>   was migrated to the other host, the ping-pong was successful.
>   Similarly, if two VMs are migrated to the same host, then after migration,
>   the ping-pong was successful.
> - ping-pong to a peer on the remote host does not work yet.
> 
> Our next goal is to achieve successful migration with live traffic.

As this is a major milestone which enables live migration (albeit only when
there are no QPs), I believe we are OK for a patch.

Yuval
