[Qemu-devel] [PATCH 5/5] migration/rdma: Send error during cancelling

Dr. David Alan Gilbert (git) posted 5 patches 8 years, 7 months ago
There is a newer version of this series
Posted by Dr. David Alan Gilbert (git) 8 years, 7 months ago
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

When we issue a cancel and clean up the RDMA channel,
send a CONTROL_ERROR to get the destination to quit.

The rdma_cleanup code waits for the event to come back
from the rdma_disconnect, but that won't happen until the
destination quits, and there's currently nothing to force
it.

Note this makes a cancel work while the destination is
alive; it already works if the destination is truly dead.
It doesn't fix the case where the destination is hung
(we get stuck waiting for the rdma_disconnect event).

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/rdma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index bfb0a43740..3d17db3a23 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2260,7 +2260,9 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
     int ret, idx;
 
     if (rdma->cm_id && rdma->connected) {
-        if (rdma->error_state && !rdma->received_error) {
+        if ((rdma->error_state ||
+             migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
+            !rdma->received_error) {
             RDMAControlHeader head = { .len = 0,
                                        .type = RDMA_CONTROL_ERROR,
                                        .repeat = 1,
-- 
2.13.0
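
For readers following the hunk above, the changed condition can be modelled
in isolation. This is only a minimal sketch, not QEMU code: the Sketch* types
and sketch_should_send_error() are hypothetical stand-ins that mirror just the
fields the condition reads (cm_id, connected, error_state, received_error) and
the migration state that migrate_get_current()->state returns in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the migration state; only the two values
 * needed to illustrate the condition are modelled. */
typedef enum {
    SKETCH_STATUS_ACTIVE,
    SKETCH_STATUS_CANCELLING,
} SketchMigrationStatus;

/* Hypothetical stand-in for RDMAContext; only the fields read by the
 * condition in qemu_rdma_cleanup() are modelled. */
typedef struct {
    bool has_cm_id;       /* models rdma->cm_id being non-NULL */
    bool connected;       /* models rdma->connected */
    int  error_state;     /* models rdma->error_state (non-zero on error) */
    bool received_error;  /* models rdma->received_error */
} SketchRDMAContext;

/* Returns true when the cleanup path would send a control error to the
 * destination, following the condition as changed by this patch. */
static bool sketch_should_send_error(const SketchRDMAContext *rdma,
                                     SketchMigrationStatus state)
{
    if (!rdma->has_cm_id || !rdma->connected) {
        return false;  /* nothing to tell the peer without a live channel */
    }
    return (rdma->error_state || state == SKETCH_STATUS_CANCELLING) &&
           !rdma->received_error;
}
```

With a connected channel and no local error, the sketch returns false for
SKETCH_STATUS_ACTIVE and true for SKETCH_STATUS_CANCELLING, which is the
behavioural difference the patch introduces. It returns false again once
received_error is set, since the destination has already reported an error.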


Re: [Qemu-devel] [PATCH 5/5] migration/rdma: Send error during cancelling
Posted by Peter Xu 8 years, 7 months ago
On Tue, Jul 04, 2017 at 07:49:15PM +0100, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> When we issue a cancel and clean up the RDMA channel,
> send a CONTROL_ERROR to get the destination to quit.
> 
> The rdma_cleanup code waits for the event to come back
> from the rdma_disconnect, but that won't happen until the
> destination quits, and there's currently nothing to force
> it.
> 
> Note this makes a cancel work while the destination is
> alive; it already works if the destination is truly dead.
> It doesn't fix the case where the destination is hung
> (we get stuck waiting for the rdma_disconnect event).
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Looks like we'll print this as well when we cancel the migration
(before sending the RDMA_CONTROL_ERROR):

  error_report("Early error. Sending error.");

But I don't think it really matters. So:

Reviewed-by: Peter Xu <peterx@redhat.com>

> ---
>  migration/rdma.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index bfb0a43740..3d17db3a23 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2260,7 +2260,9 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>      int ret, idx;
>  
>      if (rdma->cm_id && rdma->connected) {
> -        if (rdma->error_state && !rdma->received_error) {
> +        if ((rdma->error_state ||
> +             migrate_get_current()->state == MIGRATION_STATUS_CANCELLING) &&
> +            !rdma->received_error) {
>              RDMAControlHeader head = { .len = 0,
>                                         .type = RDMA_CONTROL_ERROR,
>                                         .repeat = 1,
> -- 
> 2.13.0
> 

-- 
Peter Xu