From: Jiri Denemark
To: libvir-list@redhat.com
Subject: [libvirt PATCH v2 04/81] qemu: Keep domain running on dst on failed post-copy migration
Date: Wed, 1 Jun 2022 14:49:04 +0200
Message-Id: <6c28c740c13e8afe7409a667bd00c093a5feb7bd.1654087150.git.jdenemar@redhat.com>

There's no need to artificially pause a domain when post-copy fails
from our point of view unless the connection to QEMU is broken too,
as the migration may still be progressing well.
Signed-off-by: Jiri Denemark
Reviewed-by: Peter Krempa
---

Notes:
    Version 2:
    - commit message and warning text updated
    - dropped dead code from qemuMigrationSrcPostcopyFailed
    - source domain is always paused once it enters post-copy, handling
      RUNNING state there was a leftover from before this patch

 src/qemu/qemu_migration.c | 51 ++++++++++++++++++++++++++-------------
 src/qemu/qemu_migration.h |  6 +++--
 src/qemu/qemu_process.c   |  8 +++---
 3 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 6cc68a567a..326e17ddd7 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -1577,34 +1577,51 @@ qemuMigrationSrcIsSafe(virDomainDef *def,
 
 
 void
-qemuMigrationAnyPostcopyFailed(virQEMUDriver *driver,
-                               virDomainObj *vm)
+qemuMigrationSrcPostcopyFailed(virDomainObj *vm)
 {
     virDomainState state;
     int reason;
 
     state = virDomainObjGetState(vm, &reason);
 
-    if (state != VIR_DOMAIN_PAUSED &&
-        state != VIR_DOMAIN_RUNNING)
-        return;
+    VIR_DEBUG("%s/%s",
+              virDomainStateTypeToString(state),
+              virDomainStateReasonToString(state, reason));
 
-    if (state == VIR_DOMAIN_PAUSED &&
+    if (state != VIR_DOMAIN_PAUSED ||
         reason == VIR_DOMAIN_PAUSED_POSTCOPY_FAILED)
         return;
 
     VIR_WARN("Migration of domain %s failed during post-copy; "
              "leaving the domain paused", vm->def->name);
 
-    if (state == VIR_DOMAIN_RUNNING) {
-        if (qemuProcessStopCPUs(driver, vm,
-                                VIR_DOMAIN_PAUSED_POSTCOPY_FAILED,
-                                VIR_ASYNC_JOB_MIGRATION_IN) < 0)
-            VIR_WARN("Unable to pause guest CPUs for %s", vm->def->name);
-    } else {
-        virDomainObjSetState(vm, VIR_DOMAIN_PAUSED,
-                             VIR_DOMAIN_PAUSED_POSTCOPY_FAILED);
-    }
+    virDomainObjSetState(vm, VIR_DOMAIN_PAUSED,
+                         VIR_DOMAIN_PAUSED_POSTCOPY_FAILED);
+}
+
+
+void
+qemuMigrationDstPostcopyFailed(virDomainObj *vm)
+{
+    virDomainState state;
+    int reason;
+
+    state = virDomainObjGetState(vm, &reason);
+
+    VIR_DEBUG("%s/%s",
+              virDomainStateTypeToString(state),
+              virDomainStateReasonToString(state, reason));
+
+    if (state != VIR_DOMAIN_RUNNING ||
+        reason == VIR_DOMAIN_RUNNING_POSTCOPY_FAILED)
+        return;
+
+    VIR_WARN("Migration protocol failed during incoming migration of domain "
+             "%s, but QEMU keeps migrating; leaving the domain running, the "
+             "migration will be handled as unattended", vm->def->name);
+
+    virDomainObjSetState(vm, VIR_DOMAIN_RUNNING,
+                         VIR_DOMAIN_RUNNING_POSTCOPY_FAILED);
 }
 
 
@@ -3453,7 +3470,7 @@ qemuMigrationSrcConfirmPhase(virQEMUDriver *driver,
 
     if (virDomainObjGetState(vm, &reason) == VIR_DOMAIN_PAUSED &&
         reason == VIR_DOMAIN_PAUSED_POSTCOPY)
-        qemuMigrationAnyPostcopyFailed(driver, vm);
+        qemuMigrationSrcPostcopyFailed(vm);
     else
         qemuMigrationSrcRestoreDomainState(driver, vm);
 
@@ -5826,7 +5843,7 @@ qemuMigrationDstFinish(virQEMUDriver *driver,
                                              VIR_DOMAIN_EVENT_STOPPED_FAILED);
             virObjectEventStateQueue(driver->domainEventState, event);
         } else {
-            qemuMigrationAnyPostcopyFailed(driver, vm);
+            qemuMigrationDstPostcopyFailed(vm);
         }
     }
 
diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h
index a8afa66119..c4e4228282 100644
--- a/src/qemu/qemu_migration.h
+++ b/src/qemu/qemu_migration.h
@@ -251,8 +251,10 @@ qemuMigrationDstRun(virQEMUDriver *driver,
                     virDomainAsyncJob asyncJob);
 
 void
-qemuMigrationAnyPostcopyFailed(virQEMUDriver *driver,
-                               virDomainObj *vm);
+qemuMigrationSrcPostcopyFailed(virDomainObj *vm);
+
+void
+qemuMigrationDstPostcopyFailed(virDomainObj *vm);
 
 int
 qemuMigrationSrcFetchMirrorStats(virQEMUDriver *driver,
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index e8936cd623..0d39c67dfc 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3411,7 +3411,7 @@ qemuProcessRecoverMigrationIn(virQEMUDriver *driver,
      * confirm success or failure yet; killing it seems safest unless
      * we already started guest CPUs or we were in post-copy mode */
     if (postcopy) {
-        qemuMigrationAnyPostcopyFailed(driver, vm);
+        qemuMigrationDstPostcopyFailed(vm);
     } else if (state != VIR_DOMAIN_RUNNING) {
         VIR_DEBUG("Killing migrated domain %s", vm->def->name);
         return -1;
@@ -3462,7 +3462,7 @@ qemuProcessRecoverMigrationOut(virQEMUDriver *driver,
      * post-copy mode
      */
     if (postcopy) {
-        qemuMigrationAnyPostcopyFailed(driver, vm);
+        qemuMigrationSrcPostcopyFailed(vm);
     } else {
         VIR_DEBUG("Cancelling unfinished migration of domain %s",
                   vm->def->name);
@@ -3480,7 +3480,7 @@ qemuProcessRecoverMigrationOut(virQEMUDriver *driver,
      * post-copy mode we can use PAUSED_POSTCOPY_FAILED state for this
      */
     if (postcopy)
-        qemuMigrationAnyPostcopyFailed(driver, vm);
+        qemuMigrationSrcPostcopyFailed(vm);
         break;
 
     case QEMU_MIGRATION_PHASE_CONFIRM3_CANCELLED:
@@ -3489,7 +3489,7 @@ qemuProcessRecoverMigrationOut(virQEMUDriver *driver,
      * as broken in that case
      */
     if (postcopy) {
-        qemuMigrationAnyPostcopyFailed(driver, vm);
+        qemuMigrationSrcPostcopyFailed(vm);
     } else {
         VIR_DEBUG("Resuming domain %s after failed migration",
                   vm->def->name);
-- 
2.35.1