From: Jiri Denemark
To: libvir-list@redhat.com
Subject: [libvirt PATCH 42/80] qemu: Improve post-copy migration handling on reconnect
Date: Tue, 10 May 2022 17:21:03 +0200

When the libvirt daemon is restarted during an active post-copy
migration, we do not always need to mark the migration as broken. In
this phase libvirt is not really needed for the migration to finish
successfully. In fact, the migration could even have finished while
libvirt was not running, or it may still be happily running.

Signed-off-by: Jiri Denemark
Reviewed-by: Peter Krempa
---
 src/qemu/qemu_migration.c | 27 +++++++++++++++++++++++++++
 src/qemu/qemu_migration.h |  6 ++++++
 src/qemu/qemu_process.c   | 39 +++++++++++++++++++++++++++++----------
 3 files changed, 62 insertions(+), 10 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index dacea63610..854dfd43c1 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -2460,6 +2460,33 @@ qemuMigrationSrcBeginPhaseBlockDirtyBitmaps(qemuMigrationCookie *mig,
 }
 
 
+int
+qemuMigrationAnyRefreshStatus(virQEMUDriver *driver,
+                              virDomainObj *vm,
+                              virDomainAsyncJob asyncJob,
+                              virDomainJobStatus *status)
+{
+    g_autoptr(virDomainJobData) jobData = NULL;
+    qemuDomainJobDataPrivate *priv;
+
+    jobData = virDomainJobDataInit(&qemuJobDataPrivateDataCallbacks);
+    priv = jobData->privateData;
+
+    if (qemuMigrationAnyFetchStats(driver, vm, asyncJob, jobData, NULL) < 0)
+        return -1;
+
+    qemuMigrationUpdateJobType(jobData);
+    VIR_DEBUG("QEMU reports domain '%s' is in '%s' migration state, "
+              "translated as %d",
+              vm->def->name,
+              qemuMonitorMigrationStatusTypeToString(priv->stats.mig.status),
+              jobData->status);
+
+    *status = jobData->status;
+    return 0;
+}
+
+
 /* The caller is supposed to lock the vm and start a migration job. */
 static char *
 qemuMigrationSrcBeginPhase(virQEMUDriver *driver,
diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h
index eeb69a52bf..9351d6ac51 100644
--- a/src/qemu/qemu_migration.h
+++ b/src/qemu/qemu_migration.h
@@ -279,3 +279,9 @@ qemuMigrationSrcFetchMirrorStats(virQEMUDriver *driver,
                                  virDomainObj *vm,
                                  virDomainAsyncJob asyncJob,
                                  virDomainJobData *jobData);
+
+int
+qemuMigrationAnyRefreshStatus(virQEMUDriver *driver,
+                              virDomainObj *vm,
+                              virDomainAsyncJob asyncJob,
+                              virDomainJobStatus *status);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 7b347a9061..1cb00af6f1 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3591,10 +3591,8 @@ qemuProcessRecoverMigrationIn(virQEMUDriver *driver,
         /* migration finished, we started resuming the domain but didn't
          * confirm success or failure yet; killing it seems safest unless
          * we already started guest CPUs or we were in post-copy mode */
-        if (virDomainObjIsPostcopy(vm, VIR_DOMAIN_JOB_OPERATION_MIGRATION_IN)) {
-            qemuMigrationDstPostcopyFailed(vm);
+        if (virDomainObjIsPostcopy(vm, VIR_DOMAIN_JOB_OPERATION_MIGRATION_IN))
             return 1;
-        }
 
         if (state != VIR_DOMAIN_RUNNING) {
             VIR_DEBUG("Killing migrated domain %s", vm->def->name);
@@ -3661,10 +3659,8 @@ qemuProcessRecoverMigrationOut(virQEMUDriver *driver,
          * of Finish3 step; third party needs to check what to do next; in
          * post-copy mode we can use PAUSED_POSTCOPY_FAILED state for this
          */
-        if (postcopy) {
-            qemuMigrationSrcPostcopyFailed(vm);
+        if (postcopy)
             return 1;
-        }
         break;
 
     case QEMU_MIGRATION_PHASE_CONFIRM3_CANCELLED:
@@ -3672,10 +3668,8 @@ qemuProcessRecoverMigrationOut(virQEMUDriver *driver,
          * post-copy mode there's no way back, so let's just mark the domain
          * as broken in that case
          */
-        if (postcopy) {
-            qemuMigrationSrcPostcopyFailed(vm);
+        if (postcopy)
             return 1;
-        }
 
         VIR_DEBUG("Resuming domain %s after failed migration",
                   vm->def->name);
@@ -3713,6 +3707,7 @@ qemuProcessRecoverMigration(virQEMUDriver *driver,
                             qemuDomainJobObj *job,
                             unsigned int *stopFlags)
 {
+    virDomainJobStatus migStatus = VIR_DOMAIN_JOB_STATUS_NONE;
     qemuDomainJobPrivate *jobPriv = job->privateData;
     virDomainState state;
     int reason;
@@ -3720,6 +3715,8 @@ qemuProcessRecoverMigration(virQEMUDriver *driver,
 
     state = virDomainObjGetState(vm, &reason);
 
+    qemuMigrationAnyRefreshStatus(driver, vm, VIR_ASYNC_JOB_NONE, &migStatus);
+
     if (job->asyncJob == VIR_ASYNC_JOB_MIGRATION_OUT) {
         rc = qemuProcessRecoverMigrationOut(driver, vm, job,
                                             state, reason, stopFlags);
@@ -3731,7 +3728,29 @@ qemuProcessRecoverMigration(virQEMUDriver *driver,
         return -1;
 
     if (rc > 0) {
-        qemuProcessRestoreMigrationJob(vm, job);
+        if (migStatus == VIR_DOMAIN_JOB_STATUS_POSTCOPY) {
+            VIR_DEBUG("Post-copy migration of domain %s still running, it "
+                      "will be handled as unattended", vm->def->name);
+            qemuProcessRestoreMigrationJob(vm, job);
+            return 0;
+        }
+
+        if (migStatus != VIR_DOMAIN_JOB_STATUS_HYPERVISOR_COMPLETED) {
+            if (job->asyncJob == VIR_ASYNC_JOB_MIGRATION_OUT)
+                qemuMigrationSrcPostcopyFailed(vm);
+            else
+                qemuMigrationDstPostcopyFailed(vm);
+
+            qemuProcessRestoreMigrationJob(vm, job);
+            return 0;
+        }
+
+        VIR_DEBUG("Post-copy migration of domain %s already finished",
+                  vm->def->name);
+        if (job->asyncJob == VIR_ASYNC_JOB_MIGRATION_OUT)
+            qemuMigrationSrcComplete(driver, vm, VIR_ASYNC_JOB_NONE);
+        else
+            qemuMigrationDstComplete(driver, vm, true, VIR_ASYNC_JOB_NONE, job);
         return 0;
     }
 
-- 
2.35.1
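
The three-way decision this patch adds to qemuProcessRecoverMigration()
for the post-copy case (rc > 0) can be sketched in isolation as below.
This is only an illustration under stated assumptions, not libvirt code:
the Sketch* enums and both functions are hypothetical stand-ins for
virDomainJobStatus and for the calls to qemuProcessRestoreMigrationJob(),
qemuMigration{Src,Dst}PostcopyFailed() and qemuMigration{Src,Dst}Complete().

```c
/* Hypothetical stand-in for the virDomainJobStatus values the patch tests. */
typedef enum {
    SKETCH_STATUS_NONE,                 /* initial value; kept if the refresh fails */
    SKETCH_STATUS_POSTCOPY,             /* post-copy still running in QEMU */
    SKETCH_STATUS_HYPERVISOR_COMPLETED, /* QEMU already finished the migration */
    SKETCH_STATUS_FAILED,               /* any other state QEMU may report */
} SketchJobStatus;

/* Hypothetical names for the three outcomes of the recovery code. */
typedef enum {
    SKETCH_ACTION_KEEP_UNATTENDED, /* restore the job, let migration continue */
    SKETCH_ACTION_MARK_FAILED,     /* mark post-copy as failed, restore the job */
    SKETCH_ACTION_COMPLETE,        /* run the normal completion path */
} SketchRecoveryAction;

/* Same order of checks as in the patch: POSTCOPY first, then anything that
 * is not HYPERVISOR_COMPLETED, and completion as the remaining case. */
static SketchRecoveryAction
sketchPostcopyRecoveryAction(SketchJobStatus status)
{
    if (status == SKETCH_STATUS_POSTCOPY)
        return SKETCH_ACTION_KEEP_UNATTENDED;

    if (status != SKETCH_STATUS_HYPERVISOR_COMPLETED)
        return SKETCH_ACTION_MARK_FAILED;

    return SKETCH_ACTION_COMPLETE;
}

/* Human-readable label for an action, e.g. for debug output. */
static const char *
sketchRecoveryActionToString(SketchRecoveryAction action)
{
    switch (action) {
    case SKETCH_ACTION_KEEP_UNATTENDED:
        return "keep-unattended";
    case SKETCH_ACTION_MARK_FAILED:
        return "mark-failed";
    case SKETCH_ACTION_COMPLETE:
        return "complete";
    }
    return "unknown";
}
```

Because migStatus starts as VIR_DOMAIN_JOB_STATUS_NONE and the return
value of qemuMigrationAnyRefreshStatus() is not checked, a failed refresh
falls into the middle branch; in the sketch this corresponds to
sketchPostcopyRecoveryAction(SKETCH_STATUS_NONE) returning
SKETCH_ACTION_MARK_FAILED.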