[Qemu-devel] [RFC v2 0/2] Add live migration support in the PVRDMA device
Maintainers: Yuval Shaia <yuval.shaia@oracle.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Posted by Sukrit Bhatnagar 4 years, 9 months ago
Changes in v2:

* Modify load_dsr() so that the dsr mapping is skipped when dsr is
  already non-NULL. Also move free_dsr() out of load_dsr() and call it
  just before, where needed. Together these changes let us call
  load_dsr() even when the dsr mapping has already been done and simply
  continue with the rest of the mappings.

* Use VMStateDescription instead of SaveVMHandlers to describe migration
  state. Also add fields for parent PCI object and MSIX.

* Use a temporary structure (struct PVRDMAMigTmp) to hold some fields
  during migration. These fields, such as cmd_slot_dma and resp_slot_dma
  inside the dsr, do not fit the VMSTATE macros because their container
  (dsr_info->dsr) is not ready until it is mapped on the dest.

* Map the CQ and event notification rings after the state is loaded.
  This extends the mappings performed in v1, following the flow of
  load_dsr(). All the mappings are successfully done on the dest on
  state load.

Link(s) to v1:
https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04924.html
https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04923.html


Things working now (were not working at the time of v1):

* vmxnet3 now migrates successfully. The issue was in the migration of
  its PCI configuration space, and it is solved by the patch Marcel sent:
  https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01500.html

* The BounceBuffer problem that was failing the DMA mapping calls in the
  state load logic is gone. I am not sure exactly why; my guess is that
  adding the PCI and MSIX state to the migration solved it.


What is still needed:

* A workaround to get libvirt to support same-host migration. With the
  problems faced in v1 (mentioned above) out of the way, we can move
  further, and for that we will need this.

Sukrit Bhatnagar (2):
  hw/pvrdma: make DSR mapping idempotent in load_dsr()
  hw/pvrdma: add live migration support

 hw/rdma/vmw/pvrdma_main.c | 104 +++++++++++++++++++++++++++++++++++---
 1 file changed, 96 insertions(+), 8 deletions(-)

-- 
2.21.0


Re: [Qemu-devel] [RFC v2 0/2] Add live migration support in the PVRDMA device
Posted by Marcel Apfelbaum 4 years, 9 months ago
Hi Sukrit,

On 7/6/19 7:09 AM, Sukrit Bhatnagar wrote:
> Changes in v2:
>
> * Modify load_dsr() such that dsr mapping is not performed if dsr value
>    is non-NULL. Also move free_dsr() out of load_dsr() and call it right
>    before if needed. These two changes will allow us to call load_dsr()
>    even when we have already done dsr mapping and would like to go on
>    with the rest of mappings.
>
> * Use VMStateDescription instead of SaveVMHandlers to describe migration
>    state. Also add fields for parent PCI object and MSIX.
>
> * Use a temporary structure (struct PVRDMAMigTmp) to hold some fields
>    during migration. These fields, such as cmd_slot_dma and resp_slot_dma
>    inside dsr, do not fit into VMSTATE macros as their container
>    (dsr_info->dsr) will not be ready until it is mapped on the dest.
>
> * Perform mappings to CQ and event notification rings after the state is
>    loaded. This is an extension to the mappings performed in v1;
>    following the flow of load_dsr(). All the mappings are successfully
>    done on the dest on state load.

Nice!

> Link(s) to v1:
> https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04924.html
> https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04923.html
>
>
> Things working now (were not working at the time of v1):
>
> * vmxnet3 is migrating successfully. The issue was in the migration of
>    its PCI configuration space, and is solved by the patch Marcel had sent:
>    https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01500.html
>
> * There is no problem due to BounceBuffers which were failing the dma mapping
>    calls in state load logic earlier. Not sure exactly how it went away. I am
>    guessing that adding the PCI and MSIX state to migration solved the issue.
>

I am sure it was connected somehow. Anyway, I am glad we can continue
with the project.

> What is still needed:
>
> * A workaround to get libvirt to support same-host migration. Since
>    the problems faced in v1 (mentioned above) are out of the way, we
>    can move further, and in doing so, we will need this.

[Adding Daniel and Michal]
Is there any way to test live migration for libvirt domains on the same host?
Even a 'hack' would be enough.

Sukrit, another way you could do it is by enabling nested virtualization
and having 2 VMs as hosts. I suppose the migration will take some time
though...


Thanks,
Marcel

> Sukrit Bhatnagar (2):
>    hw/pvrdma: make DSR mapping idempotent in load_dsr()
>    hw/pvrdma: add live migration support
>
>   hw/rdma/vmw/pvrdma_main.c | 104 +++++++++++++++++++++++++++++++++++---
>   1 file changed, 96 insertions(+), 8 deletions(-)
>


Re: [Qemu-devel] [RFC v2 0/2] Add live migration support in the PVRDMA device
Posted by Daniel P. Berrangé 4 years, 9 months ago
On Sat, Jul 06, 2019 at 10:04:55PM +0300, Marcel Apfelbaum wrote:
> Hi Sukrit,
> 
> On 7/6/19 7:09 AM, Sukrit Bhatnagar wrote:
> > Changes in v2:
> > 
> > * Modify load_dsr() such that dsr mapping is not performed if dsr value
> >    is non-NULL. Also move free_dsr() out of load_dsr() and call it right
> >    before if needed. These two changes will allow us to call load_dsr()
> >    even when we have already done dsr mapping and would like to go on
> >    with the rest of mappings.
> > 
> > * Use VMStateDescription instead of SaveVMHandlers to describe migration
> >    state. Also add fields for parent PCI object and MSIX.
> > 
> > * Use a temporary structure (struct PVRDMAMigTmp) to hold some fields
> >    during migration. These fields, such as cmd_slot_dma and resp_slot_dma
> >    inside dsr, do not fit into VMSTATE macros as their container
> >    (dsr_info->dsr) will not be ready until it is mapped on the dest.
> > 
> > * Perform mappings to CQ and event notification rings after the state is
> >    loaded. This is an extension to the mappings performed in v1;
> >    following the flow of load_dsr(). All the mappings are successfully
> >    done on the dest on state load.
> 
> Nice!
> 
> > Link(s) to v1:
> > https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04924.html
> > https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04923.html
> > 
> > 
> > Things working now (were not working at the time of v1):
> > 
> > * vmxnet3 is migrating successfully. The issue was in the migration of
> >    its PCI configuration space, and is solved by the patch Marcel had sent:
> >    https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01500.html
> > 
> > * There is no problem due to BounceBuffers which were failing the dma mapping
> >    calls in state load logic earlier. Not sure exactly how it went away. I am
> >    guessing that adding the PCI and MSIX state to migration solved the issue.
> > 
> 
> I am sure it was connected somehow, anyway, I am glad we can continue
> with the project.
> 
> > What is still needed:
> > 
> > * A workaround to get libvirt to support same-host migration. Since
> >    the problems faced in v1 (mentioned above) are out of the way, we
> >    can move further, and in doing so, we will need this.
> 
> [Adding Daniel  and Michal]
> Is there any way to test live migration for libvirt domains on the same host?
> Even a 'hack' would be enough.

Create two VMs on your host & run inside those. Or create two containers
if you want a lighter-weight solution. You must have two completely
independent libvirtd instances, sharing nothing, except optionally where
you store disk images.
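With that setup in place, the migration step itself could look something like the following. The hostname and domain name are placeholders, and this assumes each nested host runs its own system libvirtd reachable over ssh:

```shell
# From the source host: live-migrate domain "demo" to the libvirtd
# instance on the other nested host, over the ssh transport.
virsh migrate --live demo qemu+ssh://dest-host/system
```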

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC v2 0/2] Add live migration support in the PVRDMA device
Posted by Marcel Apfelbaum 4 years, 9 months ago

On 7/8/19 12:38 PM, Daniel P. Berrangé wrote:
> On Sat, Jul 06, 2019 at 10:04:55PM +0300, Marcel Apfelbaum wrote:
>> Hi Sukrit,
>>
>> On 7/6/19 7:09 AM, Sukrit Bhatnagar wrote:
>>> Changes in v2:
>>>
>>> * Modify load_dsr() such that dsr mapping is not performed if dsr value
>>>     is non-NULL. Also move free_dsr() out of load_dsr() and call it right
>>>     before if needed. These two changes will allow us to call load_dsr()
>>>     even when we have already done dsr mapping and would like to go on
>>>     with the rest of mappings.
>>>
>>> * Use VMStateDescription instead of SaveVMHandlers to describe migration
>>>     state. Also add fields for parent PCI object and MSIX.
>>>
>>> * Use a temporary structure (struct PVRDMAMigTmp) to hold some fields
>>>     during migration. These fields, such as cmd_slot_dma and resp_slot_dma
>>>     inside dsr, do not fit into VMSTATE macros as their container
>>>     (dsr_info->dsr) will not be ready until it is mapped on the dest.
>>>
>>> * Perform mappings to CQ and event notification rings after the state is
>>>     loaded. This is an extension to the mappings performed in v1;
>>>     following the flow of load_dsr(). All the mappings are successfully
>>>     done on the dest on state load.
>> Nice!
>>
>>> Link(s) to v1:
>>> https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04924.html
>>> https://lists.gnu.org/archive/html/qemu-devel/2019-06/msg04923.html
>>>
>>>
>>> Things working now (were not working at the time of v1):
>>>
>>> * vmxnet3 is migrating successfully. The issue was in the migration of
>>>     its PCI configuration space, and is solved by the patch Marcel had sent:
>>>     https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg01500.html
>>>
>>> * There is no problem due to BounceBuffers which were failing the dma mapping
>>>     calls in state load logic earlier. Not sure exactly how it went away. I am
>>>     guessing that adding the PCI and MSIX state to migration solved the issue.
>>>
>> I am sure it was connected somehow, anyway, I am glad we can continue
>> with the project.
>>
>>> What is still needed:
>>>
>>> * A workaround to get libvirt to support same-host migration. Since
>>>     the problems faced in v1 (mentioned above) are out of the way, we
>>>     can move further, and in doing so, we will need this.
>> [Adding Daniel  and Michal]
>> Is there any way to test live migration for libvirt domains on the same host?
>> Even a 'hack' would be enough.
> Create two VMs on your host & run inside those. Or create two containers
> if you want a lighter weight solution. You must have two completely
> independent libvirtd instances, sharing nothing, except optionally where
> you store disk images.

We'll work with a live CD, so no storage is needed.

Thank you for the help!
Marcel

> Regards,
> Daniel