From: Jiri Denemark
To: libvir-list@redhat.com
Subject: [libvirt PATCH 3/4] qemu: Remember failed post-copy migration in job
Date: Thu, 15 Dec 2022 15:37:43 +0100
Message-Id: <02fea52fe4614cdc0fe948ec504d86adbf0eeda4.1671114079.git.jdenemar@redhat.com>

When post-copy migration fails, the domain stays running on the
destination with the VIR_DOMAIN_RUNNING_POSTCOPY_FAILED reason. Both the
state and the reason can later be overwritten if the domain gets paused
for another reason (such as an I/O error). Thus we need a separate place
to remember that post-copy migration failed, so that the migration can
be resumed.
https://bugzilla.redhat.com/show_bug.cgi?id=2111948

Signed-off-by: Jiri Denemark
---
 src/conf/domain_conf.c    |  7 ++++++-
 src/conf/virdomainjob.c   |  1 +
 src/conf/virdomainjob.h   |  1 +
 src/qemu/qemu_domainjob.c |  9 +++++++++
 src/qemu/qemu_migration.c | 34 +++++++++++++++++++++++-----------
 src/qemu/qemu_process.c   | 15 +++++++++++++++
 6 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 9e2eea79e7..f83586c549 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -27874,8 +27874,13 @@ virDomainObjGetState(virDomainObj *dom, int *reason)

 bool
 virDomainObjIsFailedPostcopy(virDomainObj *dom,
-                             virDomainJobObj *job G_GNUC_UNUSED)
+                             virDomainJobObj *job)
 {
+    if (job && job->asyncPaused &&
+        (job->asyncJob == VIR_ASYNC_JOB_MIGRATION_IN ||
+         job->asyncJob == VIR_ASYNC_JOB_MIGRATION_OUT))
+        return true;
+
     return ((dom->state.state == VIR_DOMAIN_PAUSED &&
              dom->state.reason == VIR_DOMAIN_PAUSED_POSTCOPY_FAILED) ||
             (dom->state.state == VIR_DOMAIN_RUNNING &&
diff --git a/src/conf/virdomainjob.c b/src/conf/virdomainjob.c
index 256b665a42..c4cbbe8f6d 100644
--- a/src/conf/virdomainjob.c
+++ b/src/conf/virdomainjob.c
@@ -174,6 +174,7 @@ virDomainObjResetAsyncJob(virDomainJobObj *job)
     job->asyncOwner = 0;
     g_clear_pointer(&job->asyncOwnerAPI, g_free);
     job->asyncStarted = 0;
+    job->asyncPaused = false;
     job->phase = 0;
     job->mask = VIR_JOB_DEFAULT_MASK;
     job->abortJob = false;
diff --git a/src/conf/virdomainjob.h b/src/conf/virdomainjob.h
index b1ac36a2fa..0d62bab287 100644
--- a/src/conf/virdomainjob.h
+++ b/src/conf/virdomainjob.h
@@ -176,6 +176,7 @@ struct _virDomainJobObj {
     unsigned long long asyncOwner;      /* Thread which set current async job */
     char *asyncOwnerAPI;                /* The API which owns the async job */
     unsigned long long asyncStarted;    /* When the current async job started */
+    bool asyncPaused;                   /* The async job is paused */
     int phase;                          /* Job phase (mainly for migrations) */
     unsigned long long mask;            /* Jobs allowed during async job */
     virDomainJobData *current;          /* async job progress data */
diff --git a/src/qemu/qemu_domainjob.c b/src/qemu/qemu_domainjob.c
index 8d958b9d21..27beb5229f 100644
--- a/src/qemu/qemu_domainjob.c
+++ b/src/qemu/qemu_domainjob.c
@@ -695,6 +695,8 @@ qemuDomainObjPrivateXMLFormatJob(virBuffer *buf,
     if (vm->job->asyncJob != VIR_ASYNC_JOB_NONE) {
         virBufferAsprintf(&attrBuf, " flags='0x%x'", vm->job->apiFlags);
         virBufferAsprintf(&attrBuf, " asyncStarted='%llu'", vm->job->asyncStarted);
+        if (vm->job->asyncPaused)
+            virBufferAddLit(&attrBuf, " asyncPaused='yes'");
     }

     if (vm->job->cb &&
@@ -732,6 +734,7 @@ qemuDomainObjPrivateXMLParseJob(virDomainObj *vm,

     if ((tmp = virXPathString("string(@async)", ctxt))) {
         int async;
+        virTristateBool paused;

         if ((async = virDomainAsyncJobTypeFromString(tmp)) < 0) {
             virReportError(VIR_ERR_INTERNAL_ERROR,
@@ -757,6 +760,12 @@ qemuDomainObjPrivateXMLParseJob(virDomainObj *vm,
                            _("Invalid async job start"));
             return -1;
         }
+
+        if (virXMLPropTristateBool(ctxt->node, "asyncPaused", VIR_XML_PROP_NONE,
+                                   &paused) < 0)
+            return -1;
+
+        vm->job->asyncPaused = paused == VIR_TRISTATE_BOOL_YES;
     }

     if (virXMLPropUInt(ctxt->node, "flags", 16, VIR_XML_PROP_NONE,
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 27a74795d6..f258e7d700 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -1666,17 +1666,19 @@ qemuMigrationSrcPostcopyFailed(virDomainObj *vm)

     state = virDomainObjGetState(vm, &reason);

-    VIR_DEBUG("%s/%s",
+    VIR_DEBUG("%s/%s, asyncPaused=%u",
               virDomainStateTypeToString(state),
-              virDomainStateReasonToString(state, reason));
+              virDomainStateReasonToString(state, reason),
+              vm->job->asyncPaused);

     if (state != VIR_DOMAIN_PAUSED ||
-        reason == VIR_DOMAIN_PAUSED_POSTCOPY_FAILED)
+        virDomainObjIsFailedPostcopy(vm, vm->job))
         return;

     VIR_WARN("Migration of domain %s failed during post-copy; "
              "leaving the domain paused", vm->def->name);

+    vm->job->asyncPaused = true;
     virDomainObjSetState(vm, VIR_DOMAIN_PAUSED,
                          VIR_DOMAIN_PAUSED_POSTCOPY_FAILED);
     event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_SUSPENDED,
@@ -1696,21 +1698,31 @@ qemuMigrationDstPostcopyFailed(virDomainObj *vm)

     state = virDomainObjGetState(vm, &reason);

-    VIR_DEBUG("%s/%s",
+    VIR_DEBUG("%s/%s, asyncPaused=%u",
               virDomainStateTypeToString(state),
-              virDomainStateReasonToString(state, reason));
+              virDomainStateReasonToString(state, reason),
+              vm->job->asyncPaused);

-    if (state != VIR_DOMAIN_RUNNING ||
-        reason == VIR_DOMAIN_RUNNING_POSTCOPY_FAILED)
+    if ((state != VIR_DOMAIN_RUNNING && state != VIR_DOMAIN_PAUSED) ||
+        virDomainObjIsFailedPostcopy(vm, vm->job))
         return;

     VIR_WARN("Incoming migration of domain '%s' failed during post-copy; "
              "leaving the domain running", vm->def->name);

-    virDomainObjSetState(vm, VIR_DOMAIN_RUNNING,
-                         VIR_DOMAIN_RUNNING_POSTCOPY_FAILED);
-    event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_RESUMED,
-                                              VIR_DOMAIN_EVENT_RESUMED_POSTCOPY_FAILED);
+    vm->job->asyncPaused = true;
+    if (state == VIR_DOMAIN_RUNNING) {
+        virDomainObjSetState(vm, VIR_DOMAIN_RUNNING,
+                             VIR_DOMAIN_RUNNING_POSTCOPY_FAILED);
+        event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_RESUMED,
+                                                  VIR_DOMAIN_EVENT_RESUMED_POSTCOPY_FAILED);
+    } else {
+        /* The domain was paused for other reasons (I/O error, ...) so we don't
+         * want to rewrite the original reason and just emit a postcopy-failed
+         * event. */
+        event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_SUSPENDED,
+                                                  VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY_FAILED);
+    }
     virObjectEventStateQueue(driver->domainEventState, event);
 }

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 6091c9f1a9..017a05d57e 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -712,6 +712,15 @@ qemuProcessHandleResume(qemuMonitor *mon G_GNUC_UNUSED,
                   vm->def->name, virDomainRunningReasonTypeToString(reason),
                   eventDetail);

+        /* When a domain is running in (failed) post-copy migration on the
+         * destination host, we need to make sure to set the appropriate reason
+         * here. */
+        if (virDomainObjIsPostcopy(vm, vm->job)) {
+            if (virDomainObjIsFailedPostcopy(vm, vm->job))
+                reason = VIR_DOMAIN_RUNNING_POSTCOPY_FAILED;
+            else
+                reason = VIR_DOMAIN_RUNNING_POSTCOPY;
+        }
         virDomainObjSetState(vm, VIR_DOMAIN_RUNNING, reason);
         event = virDomainEventLifecycleNewFromObj(vm,
                                                   VIR_DOMAIN_EVENT_RESUMED,
@@ -1491,6 +1500,7 @@ qemuProcessHandleMigrationStatus(qemuMonitor *mon G_GNUC_UNUSED,
                   vm->def->name,
                   virDomainStateTypeToString(state),
                   NULLSTR(virDomainStateReasonToString(state, reason)));
+        vm->job->asyncPaused = false;
         virDomainObjSetState(vm, state, reason);
         event = virDomainEventLifecycleNewFromObj(vm, eventType, eventDetail);
         qemuDomainSaveStatus(vm);
@@ -3420,6 +3430,7 @@ qemuProcessRestoreMigrationJob(virDomainObj *vm,
     job->privateData = g_steal_pointer(&vm->job->privateData);
     vm->job->privateData = jobPriv;
     vm->job->apiFlags = job->apiFlags;
+    vm->job->asyncPaused = job->asyncPaused;

     qemuDomainCleanupAdd(vm, qemuProcessCleanupMigrationJob);
 }
@@ -3645,6 +3656,7 @@ qemuProcessRecoverMigration(virQEMUDriver *driver,
         if (migStatus == VIR_DOMAIN_JOB_STATUS_POSTCOPY) {
             VIR_DEBUG("Post-copy migration of domain %s still running, it will be handled as unattended",
                       vm->def->name);
+            vm->job->asyncPaused = false;
             return 0;
         }

@@ -3653,6 +3665,9 @@ qemuProcessRecoverMigration(virQEMUDriver *driver,
             qemuMigrationSrcPostcopyFailed(vm);
         else
             qemuMigrationDstPostcopyFailed(vm);
+        /* Set the asyncPaused flag in case we're reconnecting to a domain
+         * started by an older libvirt. */
+        vm->job->asyncPaused = true;
         return 0;
     }

-- 
2.39.0