From: Juraj Marcin
To: qemu-devel@nongnu.org
Cc: Juraj Marcin, Jiri Denemark, Peter Xu, "Dr. David Alan Gilbert", Fabiano Rosas
Subject: [PATCH 1/4] migration: Do not try to start VM if disk activation fails
Date: Mon, 15 Sep 2025 13:59:12 +0200
Message-ID: <20250915115918.3520735-2-jmarcin@redhat.com>
In-Reply-To: <20250915115918.3520735-1-jmarcin@redhat.com>
References: <20250915115918.3520735-1-jmarcin@redhat.com>

From: Peter Xu

If a rare split brain happens (e.g. dest QEMU somehow started running and
took the shared drive locks), src QEMU may no longer be able to activate
the drives. In this case, src QEMU shouldn't start the VM, or it might
later crash the block layer with something like:

  bdrv_co_write_req_prepare: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
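
The rule stated above (never start the VM on the source if its drives
cannot be re-activated) can be modelled as a small pure function. This is
a simplified sketch, not QEMU code: SrcAction, BlockActivateFn,
src_failure_action() and the two activation callbacks are hypothetical
stand-ins for migration_block_activate() and the vm_start() path in
migration_iteration_finish(), and the RUN_STATE_SHUTDOWN check is left out.

```c
#include <stdbool.h>

/* Hypothetical outcome of the source-side failure path (not a QEMU type). */
typedef enum {
    SRC_RESTART_VM,    /* drives re-activated, guest was running: resume */
    SRC_STAY_STOPPED,  /* drives re-activated, guest was already stopped */
    SRC_WAIT_FOR_MGMT, /* activation failed: keep guest stopped, mgmt decides */
} SrcAction;

/* Stand-in for migration_block_activate(): false means the drive locks
 * could not be re-taken, e.g. because dest QEMU already holds them. */
typedef bool (*BlockActivateFn)(void);

static SrcAction src_failure_action(BlockActivateFn activate, bool vm_was_live)
{
    if (!activate()) {
        /* Possible split brain: dest may own the storage and hold the only
         * consistent RAM, so the source must not start the guest. */
        return SRC_WAIT_FOR_MGMT;
    }
    return vm_was_live ? SRC_RESTART_VM : SRC_STAY_STOPPED;
}

/* Trivial activation callbacks for exercising the sketch. */
static bool activate_ok(void)   { return true; }
static bool activate_fail(void) { return false; }
```

For example, src_failure_action(activate_fail, true) yields
SRC_WAIT_FOR_MGMT even though the guest was running before migration.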

Meanwhile, src QEMU cannot try to continue either, even if dest QEMU
releases the drive locks (e.g. via QMP "stop"). Once dest QEMU has started
running, its RAM is the only version consistent with the current state of
the shared storage.

Signed-off-by: Peter Xu
Reviewed-by: Fabiano Rosas
---
 migration/migration.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 10c216d25d..54dac3db88 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3502,6 +3502,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
 
 static void migration_iteration_finish(MigrationState *s)
 {
+    Error *local_err = NULL;
+
     bql_lock();
 
     /*
@@ -3525,11 +3527,28 @@ static void migration_iteration_finish(MigrationState *s)
     case MIGRATION_STATUS_FAILED:
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_CANCELLING:
-        /*
-         * Re-activate the block drives if they're inactivated. Note, COLO
-         * shouldn't use block_active at all, so it should be no-op there.
-         */
-        migration_block_activate(NULL);
+        if (!migration_block_activate(&local_err)) {
+            /*
+             * Re-activate the block drives if they're inactivated.
+             *
+             * If it fails (e.g. in case of a split brain, where dest QEMU
+             * might have taken some of the drive locks and running!), do
+             * not start VM, instead wait for mgmt to decide the next step.
+             *
+             * If dest already started, it means dest QEMU should contain
+             * all the data it needs and it properly owns all the drive
+             * locks. Then even if src QEMU got a FAILED in migration, it
+             * normally should mean we should treat the migration as
+             * COMPLETED.
+             *
+             * NOTE: it's not safe anymore to start VM on src now even if
+             * dest would release the drive locks. It's because as long as
+             * dest started running then only dest QEMU's RAM is consistent
+             * with the shared storage.
+             */
+            error_free(local_err);
+            break;
+        }
         if (runstate_is_live(s->vm_old_state)) {
             if (!runstate_check(RUN_STATE_SHUTDOWN)) {
                 vm_start();
-- 
2.51.0

From: Juraj Marcin
To: qemu-devel@nongnu.org
Cc: Juraj Marcin, Jiri Denemark, Peter Xu, "Dr. David Alan Gilbert", Fabiano Rosas
Subject: [PATCH 2/4] migration: Accept MigrationStatus in migration_has_failed()
Date: Mon, 15 Sep 2025 13:59:13 +0200
Message-ID: <20250915115918.3520735-3-jmarcin@redhat.com>
In-Reply-To: <20250915115918.3520735-1-jmarcin@redhat.com>
References: <20250915115918.3520735-1-jmarcin@redhat.com>

From: Juraj Marcin

This allows the helper to be reused with MigrationIncomingState as well.
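
As a minimal sketch of why the signature change helps, the types below are
reduced, hypothetical stand-ins for QEMU's MigrationStatus, MigrationState
and MigrationIncomingState (each carrying only a state field). Once the
helper takes the status value instead of a MigrationState pointer, the
same check serves both the source and the destination.

```c
#include <stdbool.h>

/* Reduced, hypothetical subset of the MigrationStatus values. */
typedef enum {
    MIGRATION_STATUS_ACTIVE,
    MIGRATION_STATUS_COMPLETED,
    MIGRATION_STATUS_FAILED,
    MIGRATION_STATUS_CANCELLED,
} MigrationStatus;

/* Simplified stand-ins: both sides track their status in `state`. */
typedef struct { MigrationStatus state; } MigrationState;
typedef struct { MigrationStatus state; } MigrationIncomingState;

/* Depends only on the status value, not on which side owns it. */
static bool migration_has_failed(MigrationStatus state)
{
    return state == MIGRATION_STATUS_CANCELLED ||
           state == MIGRATION_STATUS_FAILED;
}
```

The destination can then call migration_has_failed(mis->state) in the
same way the source calls migration_has_failed(s->state).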

Signed-off-by: Juraj Marcin
---
 migration/migration.c | 8 ++++----
 migration/migration.h | 2 +-
 migration/multifd.c   | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 54dac3db88..2c0b3a7229 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1542,7 +1542,7 @@ static void migration_cleanup(MigrationState *s)
         /* It is used on info migrate. We can't free it */
         error_report_err(error_copy(s->error));
     }
-    type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
+    type = migration_has_failed(s->state) ? MIG_EVENT_PRECOPY_FAILED :
                                      MIG_EVENT_PRECOPY_DONE;
     migration_call_notifiers(s, type, NULL);
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
@@ -1700,10 +1700,10 @@ int migration_call_notifiers(MigrationState *s, MigrationEventType type,
     return ret;
 }
 
-bool migration_has_failed(MigrationState *s)
+bool migration_has_failed(MigrationStatus state)
 {
-    return (s->state == MIGRATION_STATUS_CANCELLED ||
-            s->state == MIGRATION_STATUS_FAILED);
+    return (state == MIGRATION_STATUS_CANCELLED ||
+            state == MIGRATION_STATUS_FAILED);
 }
 
 bool migration_in_postcopy(void)
diff --git a/migration/migration.h b/migration/migration.h
index 01329bf824..2c2331f40d 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -535,7 +535,7 @@ bool migration_is_blocked(Error **errp);
 bool migration_in_postcopy(void);
 bool migration_postcopy_is_alive(MigrationStatus state);
 MigrationState *migrate_get_current(void);
-bool migration_has_failed(MigrationState *);
+bool migration_has_failed(MigrationStatus state);
 bool migrate_mode_is_cpr(MigrationState *);
 
 uint64_t ram_get_total_transferred_pages(void);
diff --git a/migration/multifd.c b/migration/multifd.c
index b255778855..c569f91f2c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -568,7 +568,7 @@ void multifd_send_shutdown(void)
              * already failed. If the migration succeeded, errors are
              * not expected but there's no need to kill the source.
              */
-            if (local_err && !migration_has_failed(migrate_get_current())) {
+            if (local_err && !migration_has_failed(migrate_get_current()->state)) {
                 warn_report(
                     "multifd_send_%d: Failed to terminate TLS connection: %s",
                     p->id, error_get_pretty(local_err));
-- 
2.51.0

From: Juraj Marcin
To: qemu-devel@nongnu.org
Cc: Juraj Marcin, Jiri Denemark, Peter Xu, "Dr. David Alan Gilbert", Fabiano Rosas
Subject: [PATCH 3/4] migration: Refactor incoming cleanup into migration_incoming_finish()
Date: Mon, 15 Sep 2025 13:59:14 +0200
Message-ID: <20250915115918.3520735-4-jmarcin@redhat.com>
In-Reply-To: <20250915115918.3520735-1-jmarcin@redhat.com>
References: <20250915115918.3520735-1-jmarcin@redhat.com>

From: Juraj Marcin

Currently, there are two functions that are
responsible for cleaning up the incoming migration state. With successful
precopy, it's the main thread, and with successful postcopy, it's the
listen thread. However, if postcopy fails during the device load, both
functions will try to do the cleanup. Moreover, when the exit-on-error
parameter was added, it was applied only to precopy.

This patch moves the common cleanup and exit-on-error handling into a
helper function that can be called from either the precopy or the
postcopy path, reducing the duplication. If the listen thread has been
started (the postcopy state is at least LISTENING), the listen thread is
responsible for all cleanup and exiting; otherwise it is the main thread's
responsibility.

Signed-off-by: Juraj Marcin
---
 migration/migration.c | 64 ++++++++++++++++++++++++-------------------
 migration/migration.h |  1 +
 migration/savevm.c    | 48 +++++++++++---------------------
 3 files changed, 53 insertions(+), 60 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 2c0b3a7229..7222e3de13 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -442,9 +442,19 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
 void migration_incoming_state_destroy(void)
 {
     struct MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyState ps = postcopy_state_get();
 
     multifd_recv_cleanup();
 
+    if (mis->have_listen_thread) {
+        qemu_thread_join(&mis->listen_thread);
+        mis->have_listen_thread = false;
+    }
+
+    if (ps != POSTCOPY_INCOMING_NONE) {
+        postcopy_ram_incoming_cleanup(mis);
+    }
+
     /*
      * RAM state cleanup needs to happen after multifd cleanup, because
      * multifd threads can use some of its states (receivedmap).
@@ -809,6 +819,23 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
     cpr_state_close();
 }
 
+void migration_incoming_finish(void)
+{
+    MigrationState *s = migrate_get_current();
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    migration_incoming_state_destroy();
+
+    if (migration_has_failed(mis->state) && mis->exit_on_error) {
+        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+            error_report_err(s->error);
+            s->error = NULL;
+        }
+
+        exit(EXIT_FAILURE);
+    }
+}
+
 static void process_incoming_migration_bh(void *opaque)
 {
     MigrationIncomingState *mis = opaque;
@@ -861,7 +888,7 @@ static void process_incoming_migration_bh(void *opaque)
      */
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_COMPLETED);
-    migration_incoming_state_destroy();
+    migration_incoming_finish();
 }
 
 static void coroutine_fn
@@ -888,23 +915,13 @@ process_incoming_migration_co(void *opaque)
 
     ps = postcopy_state_get();
     trace_process_incoming_migration_co_end(ret, ps);
-    if (ps != POSTCOPY_INCOMING_NONE) {
-        if (ps == POSTCOPY_INCOMING_ADVISE) {
-            /*
-             * Where a migration had postcopy enabled (and thus went to advise)
-             * but managed to complete within the precopy period, we can use
-             * the normal exit.
-             */
-            postcopy_ram_incoming_cleanup(mis);
-        } else if (ret >= 0) {
-            /*
-             * Postcopy was started, cleanup should happen at the end of the
-             * postcopy thread.
-             */
-            trace_process_incoming_migration_co_postcopy_end_main();
-            goto out;
-        }
-        /* Else if something went wrong then just fall out of the normal exit */
+    if (ps >= POSTCOPY_INCOMING_LISTENING) {
+        /*
+         * Postcopy was started, cleanup should happen at the end of the
+         * postcopy thread.
+         */
+        trace_process_incoming_migration_co_postcopy_end_main();
+        goto out;
     }
 
     if (ret < 0) {
@@ -926,16 +943,7 @@ fail:
     migrate_set_error(s, local_err);
     error_free(local_err);
 
-    migration_incoming_state_destroy();
-
-    if (mis->exit_on_error) {
-        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
-            error_report_err(s->error);
-            s->error = NULL;
-        }
-
-        exit(EXIT_FAILURE);
-    }
+    migration_incoming_finish();
 out:
     /* Pairs with the refcount taken in qmp_migrate_incoming() */
     migrate_incoming_unref_outgoing_state();
diff --git a/migration/migration.h b/migration/migration.h
index 2c2331f40d..67e3318467 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -518,6 +518,7 @@ void migrate_set_state(MigrationStatus *state, MigrationStatus old_state,
 void migration_fd_process_incoming(QEMUFile *f);
 void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp);
 void migration_incoming_process(void);
+void migration_incoming_finish(void);
 
 bool migration_has_all_channels(void);
 
diff --git a/migration/savevm.c b/migration/savevm.c
index fabbeb296a..d7eb416d48 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2069,6 +2069,11 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
     return 0;
 }
 
+static void postcopy_ram_listen_thread_bh(void *opaque)
+{
+    migration_incoming_finish();
+}
+
 /*
  * Triggered by a postcopy_listen command; this thread takes over reading
  * the input stream, leaving the main thread free to carry on loading the rest
@@ -2122,52 +2127,31 @@ static void *postcopy_ram_listen_thread(void *opaque)
                          "bitmaps may be lost, and present migrated dirty "
                          "bitmaps are correctly migrated and valid.",
                          __func__, load_res);
-            load_res = 0; /* prevent further exit() */
         } else {
             error_report("%s: loadvm failed: %d", __func__, load_res);
             migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                                            MIGRATION_STATUS_FAILED);
+            goto out;
         }
     }
-    if (load_res >= 0) {
-        /*
-         * This looks good, but it's possible that the device loading in the
-         * main thread hasn't finished yet, and so we might not be in 'RUN'
-         * state yet; wait for the end of the main thread.
-         */
-        qemu_event_wait(&mis->main_thread_load_event);
-    }
-    postcopy_ram_incoming_cleanup(mis);
-
-    if (load_res < 0) {
-        /*
-         * If something went wrong then we have a bad state so exit;
-         * depending how far we got it might be possible at this point
-         * to leave the guest running and fire MCEs for pages that never
-         * arrived as a desperate recovery step.
-         */
-        rcu_unregister_thread();
-        exit(EXIT_FAILURE);
-    }
+    /*
+     * This looks good, but it's possible that the device loading in the
+     * main thread hasn't finished yet, and so we might not be in 'RUN'
+     * state yet; wait for the end of the main thread.
+     */
+    qemu_event_wait(&mis->main_thread_load_event);
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                                    MIGRATION_STATUS_COMPLETED);
-    /*
-     * If everything has worked fine, then the main thread has waited
-     * for us to start, and we're the last use of the mis.
-     * (If something broke then qemu will have to exit anyway since it's
-     * got a bad migration state).
- */ - bql_lock(); - migration_incoming_state_destroy(); - bql_unlock(); =20 +out: rcu_unregister_thread(); - mis->have_listen_thread =3D false; postcopy_state_set(POSTCOPY_INCOMING_END); =20 object_unref(OBJECT(migr)); =20 + migration_bh_schedule(postcopy_ram_listen_thread_bh, NULL); + return NULL; } =20 @@ -2217,7 +2201,7 @@ static int loadvm_postcopy_handle_listen(MigrationInc= omingState *mis) mis->have_listen_thread =3D true; postcopy_thread_create(mis, &mis->listen_thread, MIGRATION_THREAD_DST_LISTEN, - postcopy_ram_listen_thread, QEMU_THREAD_DETACHE= D); + postcopy_ram_listen_thread, QEMU_THREAD_JOINABL= E); trace_loadvm_postcopy_handle_listen("return"); =20 return 0; --=20 2.51.0 From nobody Sun Sep 28 16:36:37 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=quarantine dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1757937740; cv=none; d=zohomail.com; s=zohoarc; b=Ra0RUWGMFKXCE4TPbSJ68zBf8BE1S52j1CFX+7nz2rVQnNom7Lu3bs9QUHzwN596YUUaW4ub9vvQw9+H3WF2CPxgZRosxO5UjK7fAATkEr17jOrYjNp8JoW2twgLdiC8/3vSJ2zGGlkEYGCKYRJxmKXDiUu+TSLJ5dg3ZMTqa3g= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1757937740; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=0tp5oeg6dvweY9fzo/znq4Z2Pm2mGPwulbHE8Lic6BM=; b=UFpoxojigy2femJWuKaUsqgkBcjDBGEXgagmKNSrs2sku1AQXaR94B3HBOBl2X+Z30213IxFevFECoTFgPP6wPcftt3ORaTC/gwZRpT1N0YrtdovTxM5+CUj8wyj58UXkQuk0yL9Cvytw8ZaNj2MDuGBUnXQAzJZ4ZUEZTQyZ7U= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) 
smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 175793774094660.20472126787888; Mon, 15 Sep 2025 05:02:20 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1uy7sx-0004Ce-Hs; Mon, 15 Sep 2025 08:00:43 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uy7se-00049U-51 for qemu-devel@nongnu.org; Mon, 15 Sep 2025 08:00:24 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uy7sP-0004dl-5p for qemu-devel@nongnu.org; Mon, 15 Sep 2025 08:00:18 -0400 Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-178-mfCfdxoTOguO1-mMIvy6ag-1; Mon, 15 Sep 2025 07:59:58 -0400 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4CF981953946; Mon, 15 Sep 2025 11:59:57 +0000 (UTC) Received: from fedora.redhat.com (unknown [10.45.224.193]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id CC9A319540EB; Mon, 15 Sep 2025 11:59:54 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1757937603; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0tp5oeg6dvweY9fzo/znq4Z2Pm2mGPwulbHE8Lic6BM=; b=FbvuRH4qi7fSAn3xaD8l4qK6pFDOZmkK2nFPl0pIkDARLKs+Roj0yZrG74u9yIoACFHgZY fbioMy+34sJdhlB78gEQ9C+rvHryHkdFCI6Tuo7B1RNSyiK+RqfJMKzxKrfrdrHAaxZxqC GS/KxBOY71NKyh0cpt3WfoA/cOMXxss= X-MC-Unique: mfCfdxoTOguO1-mMIvy6ag-1 X-Mimecast-MFC-AGG-ID: mfCfdxoTOguO1-mMIvy6ag_1757937597 From: Juraj Marcin To: qemu-devel@nongnu.org Cc: Juraj Marcin , Jiri Denemark , Peter Xu , "Dr. David Alan Gilbert" , Fabiano Rosas Subject: [PATCH 4/4] migration: Introduce POSTCOPY_DEVICE state Date: Mon, 15 Sep 2025 13:59:15 +0200 Message-ID: <20250915115918.3520735-5-jmarcin@redhat.com> In-Reply-To: <20250915115918.3520735-1-jmarcin@redhat.com> References: <20250915115918.3520735-1-jmarcin@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=jmarcin@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.035, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: 
From: Juraj Marcin

Currently, when postcopy starts, the source VM begins switchover and sends
a package containing the state of all non-postcopiable devices. When the
destination loads this package, the switchover is complete and the
destination VM starts. However, if loading the device state fails or the
destination side crashes, the source side is already in POSTCOPY_ACTIVE
state and cannot be recovered. This happens even though the source still
holds the most up-to-date machine state, because the destination has not
yet started.

This patch introduces a new POSTCOPY_DEVICE state. It is active while the
destination machine is loading the device state and is not yet running.
While in this state, the source side can still be resumed if the
migration fails.

To transition from POSTCOPY_DEVICE to POSTCOPY_ACTIVE, the source side
waits for the PONG reply to a PING that is placed in the package just
before the POSTCOPY_RUN command that starts the destination VM. Because
the destination processes the package in order, receiving this PONG means
the device state preceding it has been loaded. Thus, this change does not
require any changes on the destination side and is effective even with
older destination versions.
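The ordering argument above can be modelled in a few lines of C. This is a simplified, hypothetical model of the source side only, not QEMU code: `SrcState`, `set_state()`, `on_pong()`, `on_error()` and `simulate()` are illustrative names, `set_state()` only imitates the compare-and-set behaviour of `migrate_set_state()`, and `on_error()` ignores QEMU's distinction between I/O errors and other errors. Only the 0x42 ping value is taken from the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical, simplified model of the source-side states touched by
 * this patch. Names are illustrative, not QEMU identifiers.
 */
typedef enum {
    SRC_DEVICE,          /* VM stopped, device-state package being built */
    SRC_POSTCOPY_DEVICE, /* package sent, destination not running yet */
    SRC_POSTCOPY_ACTIVE, /* destination confirmed the package was loaded */
    SRC_POSTCOPY_PAUSED, /* postcopy interrupted, waiting for recovery */
    SRC_FAILED,          /* migration failed, source VM may resume */
} SrcState;

/* Same value the patch uses for QEMU_VM_PING_PACKAGED_LOADED. */
#define PING_PACKAGED_LOADED 0x42u

/* Move to new_state only if the current state is still old_state. */
static void set_state(SrcState *s, SrcState old_state, SrcState new_state)
{
    if (*s == old_state) {
        *s = new_state;
    }
}

/* Return-path handler for a PONG carrying val. */
static void on_pong(SrcState *s, unsigned val)
{
    if (val == PING_PACKAGED_LOADED) {
        /* The package is processed in order, so this PONG implies the
         * device state placed before the ping has been loaded. */
        set_state(s, SRC_POSTCOPY_DEVICE, SRC_POSTCOPY_ACTIVE);
    }
}

/* Simplified error path: pause only once the destination is running;
 * before that, fail so the source can resume with its own state. */
static void on_error(SrcState *s)
{
    if (*s == SRC_POSTCOPY_ACTIVE) {
        *s = SRC_POSTCOPY_PAUSED;
    } else {
        *s = SRC_FAILED;
    }
}

/* Drive one switchover: the package has been sent, then n_pongs PONG
 * values arrive, then optionally an error occurs. Returns the final state. */
SrcState simulate(const unsigned *pongs, int n_pongs, bool error)
{
    SrcState s = SRC_DEVICE;

    /* Package (device state, PING 0x42, POSTCOPY_RUN) has been sent. */
    set_state(&s, SRC_DEVICE, SRC_POSTCOPY_DEVICE);
    for (int i = 0; i < n_pongs; i++) {
        on_pong(&s, pongs[i]);
    }
    if (error) {
        on_error(&s);
    }
    return s;
}
```

With postcopy-ram enabled, the package also carries PING 3 ahead of the 0x42 ping. In this model, `simulate((unsigned[]){3, 0x42}, 2, true)` ends in `SRC_POSTCOPY_PAUSED`, while `simulate((unsigned[]){3}, 1, true)` ends in `SRC_FAILED`. An error that arrives before the 0x42 PONG therefore leaves the source, which still holds the authoritative state, free to resume.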
Signed-off-by: Juraj Marcin
---
 migration/migration.c                 | 23 ++++++++++++++++++-----
 migration/savevm.h                    |  2 ++
 migration/trace-events                |  1 +
 qapi/migration.json                   |  8 ++++++--
 tests/qtest/migration/precopy-tests.c |  3 ++-
 5 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 7222e3de13..e63a7487be 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1223,6 +1223,7 @@ bool migration_is_running(void)
 
     switch (s->state) {
     case MIGRATION_STATUS_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_DEVICE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
     case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
@@ -1244,6 +1245,7 @@ static bool migration_is_active(void)
     MigrationState *s = current_migration;
 
     return (s->state == MIGRATION_STATUS_ACTIVE ||
+            s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
             s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }
 
@@ -1366,6 +1368,7 @@ static void fill_source_migration_info(MigrationInfo *info)
         break;
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_CANCELLING:
+    case MIGRATION_STATUS_POSTCOPY_DEVICE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
@@ -1419,6 +1422,7 @@ static void fill_destination_migration_info(MigrationInfo *info)
     case MIGRATION_STATUS_CANCELLING:
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_DEVICE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
     case MIGRATION_STATUS_POSTCOPY_RECOVER:
@@ -1719,6 +1723,7 @@ bool migration_in_postcopy(void)
     MigrationState *s = migrate_get_current();
 
     switch (s->state) {
+    case MIGRATION_STATUS_POSTCOPY_DEVICE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
     case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
@@ -2564,6 +2569,11 @@ static void *source_return_path_thread(void *opaque)
             tmp32 = ldl_be_p(buf);
             trace_source_return_path_thread_pong(tmp32);
             qemu_sem_post(&ms->rp_state.rp_pong_acks);
+            if (tmp32 == QEMU_VM_PING_PACKAGED_LOADED) {
+                trace_source_return_path_thread_dst_started();
+                migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+                                  MIGRATION_STATUS_POSTCOPY_ACTIVE);
+            }
             break;
 
         case MIG_RP_MSG_REQ_PAGES:
@@ -2814,6 +2824,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     if (migrate_postcopy_ram()) {
         qemu_savevm_send_ping(fb, 3);
     }
+    qemu_savevm_send_ping(fb, QEMU_VM_PING_PACKAGED_LOADED);
 
     qemu_savevm_send_postcopy_run(fb);
 
@@ -2871,7 +2882,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 
     /* Now, switchover looks all fine, switching to postcopy-active */
     migrate_set_state(&ms->state, MIGRATION_STATUS_DEVICE,
-                      MIGRATION_STATUS_POSTCOPY_ACTIVE);
+                      MIGRATION_STATUS_POSTCOPY_DEVICE);
 
     bql_unlock();
 
@@ -3035,7 +3046,8 @@ static void migration_completion(MigrationState *s)
 
     if (s->state == MIGRATION_STATUS_ACTIVE) {
         ret = migration_completion_precopy(s);
-    } else if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
+    } else if (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
+               s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
         migration_completion_postcopy(s);
     } else {
         ret = -1;
@@ -3311,8 +3323,8 @@ static MigThrError migration_detect_error(MigrationState *s)
         return postcopy_pause(s);
     } else {
         /*
-         * For precopy (or postcopy with error outside IO), we fail
-         * with no time.
+         * For precopy (or postcopy with error outside IO, or before dest
+         * starts), we fail with no time.
          */
         migrate_set_state(&s->state, state, MIGRATION_STATUS_FAILED);
         trace_migration_thread_file_err();
@@ -3447,7 +3459,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
 {
     uint64_t must_precopy, can_postcopy, pending_size;
     Error *local_err = NULL;
-    bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
+    bool in_postcopy = (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
+                        s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
     bool can_switchover = migration_can_switchover(s);
     bool complete_ready;
 
diff --git a/migration/savevm.h b/migration/savevm.h
index 2d5e9c7166..c4de0325eb 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -29,6 +29,8 @@
 #define QEMU_VM_COMMAND              0x08
 #define QEMU_VM_SECTION_FOOTER       0x7e
 
+#define QEMU_VM_PING_PACKAGED_LOADED 0x42
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_non_migratable_list(strList **reasons);
 int qemu_savevm_state_prepare(Error **errp);
diff --git a/migration/trace-events b/migration/trace-events
index 706db97def..007b5c407e 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -191,6 +191,7 @@ source_return_path_thread_pong(uint32_t val) "0x%x"
 source_return_path_thread_shut(uint32_t val) "0x%x"
 source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
 source_return_path_thread_switchover_acked(void) ""
+source_return_path_thread_dst_started(void) ""
 migration_thread_low_pending(uint64_t pending) "%" PRIu64
 migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
 process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
diff --git a/qapi/migration.json b/qapi/migration.json
index 2387c21e9c..89a20d858d 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -142,6 +142,10 @@
 # @postcopy-active: like active, but now in postcopy mode.
 #     (since 2.5)
 #
+# @postcopy-device: like postcopy-active, but the destination is still
+#     loading device state and is not running yet.  If migration fails
+#     during this state, the source side will resume.  (since 10.2)
+#
 # @postcopy-paused: during postcopy but paused.  (since 3.0)
 #
 # @postcopy-recover-setup: setup phase for a postcopy recovery
@@ -173,8 +177,8 @@
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'postcopy-active', 'postcopy-paused',
-            'postcopy-recover-setup',
+            'active', 'postcopy-device', 'postcopy-active',
+            'postcopy-paused', 'postcopy-recover-setup',
             'postcopy-recover', 'completed', 'failed', 'colo',
             'pre-switchover', 'device', 'wait-unplug' ] }
 ##
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index bb38292550..57ca623de5 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -1316,13 +1316,14 @@ void migration_test_add_precopy(MigrationTestEnv *env)
     }
 
     /* ensure new status don't go unnoticed */
-    assert(MIGRATION_STATUS__MAX == 15);
+    assert(MIGRATION_STATUS__MAX == 16);
 
     for (int i = MIGRATION_STATUS_NONE; i < MIGRATION_STATUS__MAX; i++) {
         switch (i) {
         case MIGRATION_STATUS_DEVICE: /* happens too fast */
         case MIGRATION_STATUS_WAIT_UNPLUG: /* no support in tests */
         case MIGRATION_STATUS_COLO: /* no support in tests */
+        case MIGRATION_STATUS_POSTCOPY_DEVICE: /* postcopy can't be cancelled */
         case MIGRATION_STATUS_POSTCOPY_ACTIVE: /* postcopy can't be cancelled */
         case MIGRATION_STATUS_POSTCOPY_PAUSED:
         case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
-- 
2.51.0