From: Jiri Denemark
To: libvir-list@redhat.com
Cc: Peter Krempa, Pavel Hrdina
Subject: [libvirt PATCH v2 14/81] qemu: Restore failed migration job on reconnect
Date: Wed, 1 Jun 2022 14:49:14 +0200
Message-Id: <8cea42b357c374cc165a57bf0c386c4e16cb1fea.1654087150.git.jdenemar@redhat.com>

Since we keep the migration job active when post-copy migration fails,
we need to restore it when reconnecting to running domains.

Signed-off-by: Jiri Denemark
Reviewed-by: Peter Krempa
Reviewed-by: Pavel Hrdina
---
Notes:
    Version 2:
    - no change

 src/qemu/qemu_process.c | 128 ++++++++++++++++++++++++++++++----------
 1 file changed, 96 insertions(+), 32 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index f5a45c898d..081b049672 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3385,20 +3385,48 @@ qemuProcessCleanupMigrationJob(virQEMUDriver *driver,
 }
 
 
+static void
+qemuProcessRestoreMigrationJob(virDomainObj *vm,
+                               qemuDomainJobObj *job)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    qemuDomainJobPrivate *jobPriv = job->privateData;
+    virDomainJobOperation op;
+    unsigned long long allowedJobs;
+
+    if (job->asyncJob == VIR_ASYNC_JOB_MIGRATION_IN) {
+        op = VIR_DOMAIN_JOB_OPERATION_MIGRATION_IN;
+        allowedJobs = VIR_JOB_NONE;
+    } else {
+        op = VIR_DOMAIN_JOB_OPERATION_MIGRATION_OUT;
+        allowedJobs = VIR_JOB_DEFAULT_MASK | JOB_MASK(VIR_JOB_MIGRATION_OP);
+    }
+
+    qemuDomainObjRestoreAsyncJob(vm, job->asyncJob, job->phase, op,
+                                 QEMU_DOMAIN_JOB_STATS_TYPE_MIGRATION,
+                                 VIR_DOMAIN_JOB_STATUS_PAUSED,
+                                 allowedJobs);
+
+    job->privateData = g_steal_pointer(&priv->job.privateData);
+    priv->job.privateData = jobPriv;
+    priv->job.apiFlags = job->apiFlags;
+
+    qemuDomainCleanupAdd(vm, qemuProcessCleanupMigrationJob);
+}
+
+
+/*
+ * Returns
+ *   -1 on error, the domain will be killed,
+ *    0 the domain should remain running with the migration job discarded,
+ *    1 the daemon was restarted during post-copy phase
+ */
 static int
 qemuProcessRecoverMigrationIn(virQEMUDriver *driver,
                               virDomainObj *vm,
-                              const qemuDomainJobObj *job,
-                              virDomainState state,
-                              int reason)
+                              qemuDomainJobObj *job,
+                              virDomainState state)
 {
-
-    qemuDomainJobPrivate *jobPriv = job->privateData;
-    bool postcopy = (state == VIR_DOMAIN_PAUSED &&
-                     reason == VIR_DOMAIN_PAUSED_POSTCOPY_FAILED) ||
-                    (state == VIR_DOMAIN_RUNNING &&
-                     reason == VIR_DOMAIN_RUNNING_POSTCOPY);
-
     VIR_DEBUG("Active incoming migration in phase %s",
               qemuMigrationJobPhaseTypeToString(job->phase));
 
@@ -3435,32 +3463,37 @@ qemuProcessRecoverMigrationIn(virQEMUDriver *driver,
         /* migration finished, we started resuming the domain but didn't
          * confirm success or failure yet; killing it seems safest unless
          * we already started guest CPUs or we were in post-copy mode */
-        if (postcopy) {
+        if (virDomainObjIsPostcopy(vm, VIR_DOMAIN_JOB_OPERATION_MIGRATION_IN)) {
             qemuMigrationDstPostcopyFailed(vm);
-        } else if (state != VIR_DOMAIN_RUNNING) {
+            return 1;
+        }
+
+        if (state != VIR_DOMAIN_RUNNING) {
             VIR_DEBUG("Killing migrated domain %s", vm->def->name);
             return -1;
         }
         break;
     }
 
-    qemuMigrationParamsReset(driver, vm, VIR_ASYNC_JOB_NONE,
-                             jobPriv->migParams, job->apiFlags);
     return 0;
 }
 
+
+/*
+ * Returns
+ *   -1 on error, the domain will be killed,
+ *    0 the domain should remain running with the migration job discarded,
+ *    1 the daemon was restarted during post-copy phase
+ */
 static int
 qemuProcessRecoverMigrationOut(virQEMUDriver *driver,
                                virDomainObj *vm,
-                               const qemuDomainJobObj *job,
+                               qemuDomainJobObj *job,
                                virDomainState state,
                                int reason,
                                unsigned int *stopFlags)
 {
-    qemuDomainJobPrivate *jobPriv = job->privateData;
-    bool postcopy = state == VIR_DOMAIN_PAUSED &&
-                    (reason == VIR_DOMAIN_PAUSED_POSTCOPY ||
-                     reason == VIR_DOMAIN_PAUSED_POSTCOPY_FAILED);
+    bool postcopy = virDomainObjIsPostcopy(vm, VIR_DOMAIN_JOB_OPERATION_MIGRATION_OUT);
     bool resume = false;
 
     VIR_DEBUG("Active outgoing migration in phase %s",
@@ -3500,8 +3533,10 @@ qemuProcessRecoverMigrationOut(virQEMUDriver *driver,
          * of Finish3 step; third party needs to check what to do next; in
          * post-copy mode we can use PAUSED_POSTCOPY_FAILED state for this
          */
-        if (postcopy)
+        if (postcopy) {
             qemuMigrationSrcPostcopyFailed(vm);
+            return 1;
+        }
         break;
 
     case QEMU_MIGRATION_PHASE_CONFIRM3_CANCELLED:
@@ -3511,11 +3546,12 @@ qemuProcessRecoverMigrationOut(virQEMUDriver *driver,
          */
         if (postcopy) {
             qemuMigrationSrcPostcopyFailed(vm);
-        } else {
-            VIR_DEBUG("Resuming domain %s after failed migration",
-                      vm->def->name);
-            resume = true;
+            return 1;
         }
+
+        VIR_DEBUG("Resuming domain %s after failed migration",
+                  vm->def->name);
+        resume = true;
         break;
 
     case QEMU_MIGRATION_PHASE_CONFIRM3:
@@ -3539,15 +3575,49 @@ qemuProcessRecoverMigrationOut(virQEMUDriver *driver,
         }
     }
 
+    return 0;
+}
+
+
+static int
+qemuProcessRecoverMigration(virQEMUDriver *driver,
+                            virDomainObj *vm,
+                            qemuDomainJobObj *job,
+                            unsigned int *stopFlags)
+{
+    qemuDomainJobPrivate *jobPriv = job->privateData;
+    virDomainState state;
+    int reason;
+    int rc;
+
+    state = virDomainObjGetState(vm, &reason);
+
+    if (job->asyncJob == VIR_ASYNC_JOB_MIGRATION_OUT) {
+        rc = qemuProcessRecoverMigrationOut(driver, vm, job,
+                                            state, reason, stopFlags);
+    } else {
+        rc = qemuProcessRecoverMigrationIn(driver, vm, job, state);
+    }
+
+    if (rc < 0)
+        return -1;
+
+    if (rc > 0) {
+        qemuProcessRestoreMigrationJob(vm, job);
+        return 0;
+    }
+
     qemuMigrationParamsReset(driver, vm, VIR_ASYNC_JOB_NONE,
                              jobPriv->migParams, job->apiFlags);
+
     return 0;
 }
 
+
 static int
 qemuProcessRecoverJob(virQEMUDriver *driver,
                       virDomainObj *vm,
-                      const qemuDomainJobObj *job,
+                      qemuDomainJobObj *job,
                       unsigned int *stopFlags)
 {
     qemuDomainObjPrivate *priv = vm->privateData;
@@ -3565,14 +3635,8 @@ qemuProcessRecoverJob(virQEMUDriver *driver,
 
     switch (job->asyncJob) {
     case VIR_ASYNC_JOB_MIGRATION_OUT:
-        if (qemuProcessRecoverMigrationOut(driver, vm, job,
-                                           state, reason, stopFlags) < 0)
-            return -1;
-        break;
-
     case VIR_ASYNC_JOB_MIGRATION_IN:
-        if (qemuProcessRecoverMigrationIn(driver, vm, job,
-                                          state, reason) < 0)
+        if (qemuProcessRecoverMigration(driver, vm, job, stopFlags) < 0)
             return -1;
         break;
 
-- 
2.35.1
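
For readers who want the recovery contract at a glance: the helpers in this
patch return -1 (error, the domain gets killed), 0 (keep the domain running
and discard the migration job), or 1 (the daemon was restarted while the job
was in post-copy, so the job must be restored). The standalone C sketch below
mirrors that dispatch, assuming nothing beyond what the patch shows;
recover_phase() and the printf markers are illustrative stand-ins, not
libvirt API.

/* Condensed sketch of the -1/0/1 recovery contract introduced above.
 * recover_phase() stands in for qemuProcessRecoverMigrationIn/Out; the
 * printfs mark where qemuProcessRestoreMigrationJob() and
 * qemuMigrationParamsReset() run in the real code. */
#include <stdio.h>

enum recover_rc {
    RECOVER_KILL = -1,    /* error: the domain will be killed */
    RECOVER_DISCARD = 0,  /* keep running, discard the migration job */
    RECOVER_POSTCOPY = 1, /* daemon restarted during post-copy phase */
};

/* Pretend the daemon came back while an outgoing post-copy migration
 * was in flight; a real implementation inspects the job phase and the
 * domain state instead. */
static enum recover_rc
recover_phase(int outgoing)
{
    return outgoing ? RECOVER_POSTCOPY : RECOVER_DISCARD;
}

static int
recover_migration(int outgoing)
{
    enum recover_rc rc = recover_phase(outgoing);

    if (rc == RECOVER_KILL)
        return -1;                        /* caller tears the domain down */

    if (rc == RECOVER_POSTCOPY) {
        /* keep the async job alive so the migration can be handled later */
        printf("restore migration job\n");
        return 0;
    }

    /* the job is over; roll migration parameters back to their defaults */
    printf("reset migration parameters\n");
    return 0;
}

int main(void)
{
    return recover_migration(1) < 0;
}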