From nobody Fri Nov 14 16:50:37 2025
From: Juraj Marcin
To: qemu-devel@nongnu.org
Cc: Juraj Marcin, "Dr. David Alan Gilbert", Fabiano Rosas, Peter Xu, Jiri Denemark
Subject: [PATCH v4 1/8] migration: Flush migration channel after sending data of CMD_PACKAGED
Date: Mon, 3 Nov 2025 19:32:50 +0100
Message-ID: <20251103183301.3840862-2-jmarcin@redhat.com>
In-Reply-To: <20251103183301.3840862-1-jmarcin@redhat.com>
References: <20251103183301.3840862-1-jmarcin@redhat.com>

From: Juraj Marcin

If the length of the data sent after CMD_PACKAGED is just right, and
there is not much data to send afterward, it is possible part of the
CMD_PACKAGED payload will get left behind in the sending buffer. This
causes the destination side to hang while it tries to load the whole
package and initiate postcopy.

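For illustration only, the buffering effect described above can be sketched
with a toy channel. This is not QEMU's QEMUFile code; ToyChannel, toy_flush(),
toy_put_buffer() and toy_send_packaged() are hypothetical names, and the
8-byte buffer size is an assumption chosen to keep the numbers small. The
sketch shows that a writer which only drains its buffer when it fills up can
leave the tail of a payload unsent until an explicit flush.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUF_SIZE 8 /* assumed buffer size, for illustration only */

/* Hypothetical stand-in for a buffered migration channel. */
typedef struct {
    unsigned char buf[TOY_BUF_SIZE]; /* bytes queued but not yet sent */
    size_t pending;                  /* number of queued bytes */
    size_t sent;                     /* bytes that reached the "wire" */
} ToyChannel;

/* Push everything that is queued onto the wire. */
static void toy_flush(ToyChannel *c)
{
    c->sent += c->pending;
    c->pending = 0;
}

/* Queue data; the buffer is drained only when it becomes full. */
static void toy_put_buffer(ToyChannel *c, const unsigned char *data,
                           size_t len)
{
    while (len > 0) {
        size_t room = TOY_BUF_SIZE - c->pending;
        size_t n = len < room ? len : room;

        memcpy(c->buf + c->pending, data, n);
        c->pending += n;
        data += n;
        len -= n;
        if (c->pending == TOY_BUF_SIZE) {
            toy_flush(c);
        }
    }
}

/* Send a payload and return how many of its bytes are still unsent. */
static size_t toy_send_packaged(ToyChannel *c, const unsigned char *payload,
                                size_t len, bool flush_after)
{
    toy_put_buffer(c, payload, len);
    if (flush_after) {
        toy_flush(c);
    }
    return c->pending;
}
```

With the assumed 8-byte buffer and a 20-byte payload, 4 bytes stay queued
unless flush_after is set. A receiver waiting for the whole payload would
keep waiting for those bytes, which is the kind of hang the explicit flush
is meant to avoid.
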
Signed-off-by: Juraj Marcin
Reviewed-by: Peter Xu
---
 migration/savevm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 232cae090b..fa017378db 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1142,6 +1142,7 @@ int qemu_savevm_send_packaged(QEMUFile *f, const uint8_t *buf, size_t len)
     qemu_savevm_command_send(f, MIG_CMD_PACKAGED, 4, (uint8_t *)&tmp);
 
     qemu_put_buffer(f, buf, len);
+    qemu_fflush(f);
 
     return 0;
 }
-- 
2.51.0

From nobody Fri Nov 14 16:50:37 2025
From: Juraj Marcin
To: qemu-devel@nongnu.org
Cc: Juraj Marcin, "Dr. David Alan Gilbert", Fabiano Rosas, Peter Xu, Jiri Denemark
Subject: [PATCH v4 2/8] migration: Do not try to start VM if disk activation fails
Date: Mon, 3 Nov 2025 19:32:51 +0100
Message-ID: <20251103183301.3840862-3-jmarcin@redhat.com>
In-Reply-To: <20251103183301.3840862-1-jmarcin@redhat.com>
References: <20251103183301.3840862-1-jmarcin@redhat.com>

From: Peter Xu

If a rare split brain happens (e.g. dest QEMU started running somehow,
taking shared drive locks), src QEMU may not be able to activate the
drives anymore. In this case, src QEMU shouldn't start the VM or it
might crash the block layer later with something like:

Meanwhile, src QEMU cannot try to continue either, even if dest QEMU can
release the drive locks (e.g. by QMP "stop"), because once dest QEMU has
started running, its RAM is the only version that is consistent with the
current state of the shared storage.

Signed-off-by: Peter Xu
---
 migration/migration.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 5e74993b46..6e647c7c4a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3526,6 +3526,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
 
 static void migration_iteration_finish(MigrationState *s)
 {
+    Error *local_err = NULL;
+
     bql_lock();
 
     /*
@@ -3549,11 +3551,28 @@ static void migration_iteration_finish(MigrationState *s)
     case MIGRATION_STATUS_FAILED:
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_CANCELLING:
-        /*
-         * Re-activate the block drives if they're inactivated. Note, COLO
-         * shouldn't use block_active at all, so it should be no-op there.
-         */
-        migration_block_activate(NULL);
+        if (!migration_block_activate(&local_err)) {
+            /*
+             * Re-activate the block drives if they're inactivated.
+             *
+             * If it fails (e.g. in case of a split brain, where dest QEMU
+             * might have taken some of the drive locks and running!), do
+             * not start VM, instead wait for mgmt to decide the next step.
+             *
+             * If dest already started, it means dest QEMU should contain
+             * all the data it needs and it properly owns all the drive
+             * locks. Then even if src QEMU got a FAILED in migration, it
+             * normally should mean we should treat the migration as
+             * COMPLETED.
+             *
+             * NOTE: it's not safe anymore to start VM on src now even if
+             * dest would release the drive locks. It's because as long as
+             * dest started running then only dest QEMU's RAM is consistent
+             * with the shared storage.
+             */
+            error_free(local_err);
+            break;
+        }
         if (runstate_is_live(s->vm_old_state)) {
             if (!runstate_check(RUN_STATE_SHUTDOWN)) {
                 vm_start();
-- 
2.51.0

From nobody Fri Nov 14 16:50:37 2025
From: Juraj Marcin
To: qemu-devel@nongnu.org
Cc: Juraj Marcin, "Dr. David Alan Gilbert", Fabiano Rosas, Peter Xu, Jiri Denemark
Subject: [PATCH v4 3/8] migration: Move postcopy_ram_listen_thread() to postcopy-ram.c
Date: Mon, 3 Nov 2025 19:32:52 +0100
Message-ID: <20251103183301.3840862-4-jmarcin@redhat.com>
In-Reply-To: <20251103183301.3840862-1-jmarcin@redhat.com>
References: <20251103183301.3840862-1-jmarcin@redhat.com>

From: Juraj Marcin

This patch addresses a TODO about moving postcopy_ram_listen_thread() to
postcopy file.

Signed-off-by: Juraj Marcin
Reviewed-by: Peter Xu
---
 migration/postcopy-ram.c | 105 ++++++++++++++++++++++++++++++++++++++
 migration/postcopy-ram.h |   2 +
 migration/savevm.c       | 107 ---------------------------------------
 3 files changed, 107 insertions(+), 107 deletions(-)

diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 5471efb4f0..880b11f154 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2077,3 +2077,108 @@ bool postcopy_is_paused(MigrationStatus status)
     return status == MIGRATION_STATUS_POSTCOPY_PAUSED ||
         status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
 }
+
+/*
+ * Triggered by a postcopy_listen command; this thread takes over reading
+ * the input stream, leaving the main thread free to carry on loading the rest
+ * of the device state (from RAM).
+ */
+void *postcopy_ram_listen_thread(void *opaque)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    QEMUFile *f = mis->from_src_file;
+    int load_res;
+    MigrationState *migr = migrate_get_current();
+    Error *local_err = NULL;
+
+    object_ref(OBJECT(migr));
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
+                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    qemu_event_set(&mis->thread_sync_event);
+    trace_postcopy_ram_listen_thread_start();
+
+    rcu_register_thread();
+    /*
+     * Because we're a thread and not a coroutine we can't yield
+     * in qemu_file, and thus we must be blocking now.
+     */
+    qemu_file_set_blocking(f, true, &error_fatal);
+
+    /* TODO: sanity check that only postcopiable data will be loaded here */
+    load_res = qemu_loadvm_state_main(f, mis, &local_err);
+
+    /*
+     * This is tricky, but, mis->from_src_file can change after it
+     * returns, when postcopy recovery happened. In the future, we may
+     * want a wrapper for the QEMUFile handle.
+     */
+    f = mis->from_src_file;
+
+    /* And non-blocking again so we don't block in any cleanup */
+    qemu_file_set_blocking(f, false, &error_fatal);
+
+    trace_postcopy_ram_listen_thread_exit();
+    if (load_res < 0) {
+        qemu_file_set_error(f, load_res);
+        dirty_bitmap_mig_cancel_incoming();
+        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
+            !migrate_postcopy_ram() && migrate_dirty_bitmaps())
+        {
+            error_report("%s: loadvm failed during postcopy: %d: %s. All states "
+                         "are migrated except dirty bitmaps. Some dirty "
+                         "bitmaps may be lost, and present migrated dirty "
+                         "bitmaps are correctly migrated and valid.",
+                         __func__, load_res, error_get_pretty(local_err));
+            g_clear_pointer(&local_err, error_free);
+            load_res = 0; /* prevent further exit() */
+        } else {
+            error_prepend(&local_err,
+                          "loadvm failed during postcopy: %d: ", load_res);
+            migrate_set_error(migr, local_err);
+            g_clear_pointer(&local_err, error_report_err);
+            migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                              MIGRATION_STATUS_FAILED);
+        }
+    }
+    if (load_res >= 0) {
+        /*
+         * This looks good, but it's possible that the device loading in the
+         * main thread hasn't finished yet, and so we might not be in 'RUN'
+         * state yet; wait for the end of the main thread.
+         */
+        qemu_event_wait(&mis->main_thread_load_event);
+    }
+    postcopy_ram_incoming_cleanup(mis);
+
+    if (load_res < 0) {
+        /*
+         * If something went wrong then we have a bad state so exit;
+         * depending how far we got it might be possible at this point
+         * to leave the guest running and fire MCEs for pages that never
+         * arrived as a desperate recovery step.
+         */
+        rcu_unregister_thread();
+        exit(EXIT_FAILURE);
+    }
+
+    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                                   MIGRATION_STATUS_COMPLETED);
+    /*
+     * If everything has worked fine, then the main thread has waited
+     * for us to start, and we're the last use of the mis.
+     * (If something broke then qemu will have to exit anyway since it's
+     * got a bad migration state).
+     */
+    bql_lock();
+    migration_incoming_state_destroy();
+    bql_unlock();
+
+    rcu_unregister_thread();
+    mis->have_listen_thread = false;
+    postcopy_state_set(POSTCOPY_INCOMING_END);
+
+    object_unref(OBJECT(migr));
+
+    return NULL;
+}
diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
index ca19433b24..3e26db3e6b 100644
--- a/migration/postcopy-ram.h
+++ b/migration/postcopy-ram.h
@@ -199,4 +199,6 @@ bool postcopy_is_paused(MigrationStatus status);
 void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid,
                                    RAMBlock *rb);
 
+void *postcopy_ram_listen_thread(void *opaque);
+
 #endif
diff --git a/migration/savevm.c b/migration/savevm.c
index fa017378db..2f7ed0db64 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2088,113 +2088,6 @@ static int loadvm_postcopy_ram_handle_discard(MigrationIncomingState *mis,
     return 0;
 }
 
-/*
- * Triggered by a postcopy_listen command; this thread takes over reading
- * the input stream, leaving the main thread free to carry on loading the rest
- * of the device state (from RAM).
- * (TODO:This could do with being in a postcopy file - but there again it's
- * just another input loop, not that postcopy specific)
- */
-static void *postcopy_ram_listen_thread(void *opaque)
-{
-    MigrationIncomingState *mis = migration_incoming_get_current();
-    QEMUFile *f = mis->from_src_file;
-    int load_res;
-    MigrationState *migr = migrate_get_current();
-    Error *local_err = NULL;
-
-    object_ref(OBJECT(migr));
-
-    migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);
-    qemu_event_set(&mis->thread_sync_event);
-    trace_postcopy_ram_listen_thread_start();
-
-    rcu_register_thread();
-    /*
-     * Because we're a thread and not a coroutine we can't yield
-     * in qemu_file, and thus we must be blocking now.
-     */
-    qemu_file_set_blocking(f, true, &error_fatal);
-
-    /* TODO: sanity check that only postcopiable data will be loaded here */
-    load_res = qemu_loadvm_state_main(f, mis, &local_err);
-
-    /*
-     * This is tricky, but, mis->from_src_file can change after it
-     * returns, when postcopy recovery happened. In the future, we may
-     * want a wrapper for the QEMUFile handle.
-     */
-    f = mis->from_src_file;
-
-    /* And non-blocking again so we don't block in any cleanup */
-    qemu_file_set_blocking(f, false, &error_fatal);
-
-    trace_postcopy_ram_listen_thread_exit();
-    if (load_res < 0) {
-        qemu_file_set_error(f, load_res);
-        dirty_bitmap_mig_cancel_incoming();
-        if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
-            !migrate_postcopy_ram() && migrate_dirty_bitmaps())
-        {
-            error_report("%s: loadvm failed during postcopy: %d: %s. All states "
-                         "are migrated except dirty bitmaps. Some dirty "
-                         "bitmaps may be lost, and present migrated dirty "
-                         "bitmaps are correctly migrated and valid.",
-                         __func__, load_res, error_get_pretty(local_err));
-            g_clear_pointer(&local_err, error_free);
-            load_res = 0; /* prevent further exit() */
-        } else {
-            error_prepend(&local_err,
-                          "loadvm failed during postcopy: %d: ", load_res);
-            migrate_set_error(migr, local_err);
-            g_clear_pointer(&local_err, error_report_err);
-            migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-                              MIGRATION_STATUS_FAILED);
-        }
-    }
-    if (load_res >= 0) {
-        /*
-         * This looks good, but it's possible that the device loading in the
-         * main thread hasn't finished yet, and so we might not be in 'RUN'
-         * state yet; wait for the end of the main thread.
-         */
-        qemu_event_wait(&mis->main_thread_load_event);
-    }
-    postcopy_ram_incoming_cleanup(mis);
-
-    if (load_res < 0) {
-        /*
-         * If something went wrong then we have a bad state so exit;
-         * depending how far we got it might be possible at this point
-         * to leave the guest running and fire MCEs for pages that never
-         * arrived as a desperate recovery step.
-         */
-        rcu_unregister_thread();
-        exit(EXIT_FAILURE);
-    }
-
-    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-                                   MIGRATION_STATUS_COMPLETED);
-    /*
-     * If everything has worked fine, then the main thread has waited
-     * for us to start, and we're the last use of the mis.
-     * (If something broke then qemu will have to exit anyway since it's
-     * got a bad migration state).
-     */
-    bql_lock();
-    migration_incoming_state_destroy();
-    bql_unlock();
-
-    rcu_unregister_thread();
-    mis->have_listen_thread = false;
-    postcopy_state_set(POSTCOPY_INCOMING_END);
-
-    object_unref(OBJECT(migr));
-
-    return NULL;
-}
-
 /* After this message we must be able to immediately receive postcopy data */
 static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis,
                                          Error **errp)
-- 
2.51.0

From nobody Fri Nov 14 16:50:37 2025
h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=Iwv9Ne8VGwdQgfdYohuZ2aslPMNYDvxBjP8G9NMIogM=; b=cuq9L5xAL2/fPlLVXtqgiAAJxiyZcarxcdIpP20tyLbmFp/KS0ekTfN90Kpbgs5SW19OBQtjUpvMi/vmlZtT4TGm62KCceMbSENjwl/GCYzQKEaxvbnVxn5VSgizFCIDI7zHhhOOJkB5BfFrioHcfrZvuOL1OZ4lciaqf8ovQL0= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1762194909305749.4874270900806; Mon, 3 Nov 2025 10:35:09 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1vFzNH-00063M-Nz; Mon, 03 Nov 2025 13:33:51 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vFzNF-00062h-Pz for qemu-devel@nongnu.org; Mon, 03 Nov 2025 13:33:49 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vFzN8-000723-FC for qemu-devel@nongnu.org; Mon, 03 Nov 2025 13:33:49 -0500 Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-634-0QgamVrePFOPAWLrfkxXCg-1; Mon, 03 Nov 2025 13:33:35 -0500 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with 
cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 7511518ADBD7; Mon, 3 Nov 2025 18:33:23 +0000 (UTC) Received: from fedora.redhat.com (unknown [10.44.32.249]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 0252C1800659; Mon, 3 Nov 2025 18:33:16 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1762194818; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Iwv9Ne8VGwdQgfdYohuZ2aslPMNYDvxBjP8G9NMIogM=; b=DzRkooAGD/DlhESsQ2r2lOm+THIoP0gbLW560oYUkRzC8AifZX4j9fnMaW0YCnc4jqbdAW t28VZ1PYN0lQp7q3727OvppI1tXSgL8bIA9cV+pbLb1N8LfGtrhV947IFicEkmebXtUBEO FB9bjCCtD4e1rGyxu4iSEPnmWDII5kU= X-MC-Unique: 0QgamVrePFOPAWLrfkxXCg-1 X-Mimecast-MFC-AGG-ID: 0QgamVrePFOPAWLrfkxXCg_1762194814 From: Juraj Marcin To: qemu-devel@nongnu.org Cc: Juraj Marcin , "Dr. 
David Alan Gilbert" , Fabiano Rosas , Peter Xu , Jiri Denemark Subject: [PATCH v4 4/8] migration: Introduce postcopy incoming setup and cleanup functions Date: Mon, 3 Nov 2025 19:32:53 +0100 Message-ID: <20251103183301.3840862-5-jmarcin@redhat.com> In-Reply-To: <20251103183301.3840862-1-jmarcin@redhat.com> References: <20251103183301.3840862-1-jmarcin@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=jmarcin@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1762194909994158500 Content-Type: text/plain; charset="utf-8" From: Juraj Marcin After moving postcopy_ram_listen_thread() to the postcopy file, this patch introduces a pair of functions, postcopy_incoming_setup() and postcopy_incoming_cleanup(). These functions encapsulate setup and cleanup of all incoming postcopy resources: postcopy-ram and the postcopy listen thread.
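As an illustration only (not part of the patch), the setup/cleanup pairing described above can be modeled in a few lines of standalone C. Everything named SetupModel or model_* is a hypothetical stand-in and not a QEMU API; the sketch only assumes that a failed RAM setup is rolled back before the error is returned, and that the listen thread is marked as started only after RAM setup succeeded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the incoming state; NOT QEMU's real struct. */
typedef struct {
    bool postcopy_ram_enabled;  /* models migrate_postcopy_ram() */
    bool ram_ready;             /* models "postcopy RAM has been set up" */
    bool have_listen_thread;    /* models "listen thread was started" */
    bool fail_ram_setup;        /* knob to force the RAM step to fail */
} SetupModel;

/* Models the RAM-specific setup step; fails when the knob is set. */
static int model_ram_setup(SetupModel *m)
{
    if (m->fail_ram_setup) {
        return -1;
    }
    m->ram_ready = true;
    return 0;
}

/* Models the RAM-specific cleanup step; safe to call after a failed setup. */
static int model_ram_cleanup(SetupModel *m)
{
    m->ram_ready = false;
    return 0;
}

/* One entry point that sets up every incoming postcopy resource. */
int model_incoming_setup(SetupModel *m)
{
    if (m->postcopy_ram_enabled && model_ram_setup(m) != 0) {
        model_ram_cleanup(m);   /* roll back the partial setup */
        return -1;
    }
    m->have_listen_thread = true;
    return 0;
}

/* One entry point that tears down what the setup created. */
int model_incoming_cleanup(SetupModel *m)
{
    int rc = 0;

    if (m->postcopy_ram_enabled) {
        rc = model_ram_cleanup(m);
    }
    return rc;
}
```

Callers in this model only ever see the two entry points, which is the property the pair of functions is meant to provide.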
Furthermore, this patch also renames the postcopy_ram_listen_thread to postcopy_listen_thread, as this thread handles not only postcopy-ram, but also dirty-bitmaps and in the future it could handle other postcopiable devices. Signed-off-by: Juraj Marcin Reviewed-by: Peter Xu --- migration/migration.c | 2 +- migration/postcopy-ram.c | 44 ++++++++++++++++++++++++++++++++++++++-- migration/postcopy-ram.h | 3 ++- migration/savevm.c | 25 ++--------------------- 4 files changed, 47 insertions(+), 27 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 6e647c7c4a..9a367f717e 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -892,7 +892,7 @@ process_incoming_migration_co(void *opaque) * but managed to complete within the precopy period, we can u= se * the normal exit. */ - postcopy_ram_incoming_cleanup(mis); + postcopy_incoming_cleanup(mis); } else if (ret >=3D 0) { /* * Postcopy was started, cleanup should happen at the end of t= he diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index 880b11f154..b47c955763 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -2083,7 +2083,7 @@ bool postcopy_is_paused(MigrationStatus status) * the input stream, leaving the main thread free to carry on loading the = rest * of the device state (from RAM). 
*/ -void *postcopy_ram_listen_thread(void *opaque) +static void *postcopy_listen_thread(void *opaque) { MigrationIncomingState *mis =3D migration_incoming_get_current(); QEMUFile *f =3D mis->from_src_file; @@ -2149,7 +2149,7 @@ void *postcopy_ram_listen_thread(void *opaque) */ qemu_event_wait(&mis->main_thread_load_event); } - postcopy_ram_incoming_cleanup(mis); + postcopy_incoming_cleanup(mis); =20 if (load_res < 0) { /* @@ -2182,3 +2182,43 @@ void *postcopy_ram_listen_thread(void *opaque) =20 return NULL; } + +int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp) +{ + /* + * Sensitise RAM - can now generate requests for blocks that don't exi= st + * However, at this point the CPU shouldn't be running, and the IO + * shouldn't be doing anything yet so don't actually expect requests + */ + if (migrate_postcopy_ram()) { + if (postcopy_ram_incoming_setup(mis)) { + postcopy_ram_incoming_cleanup(mis); + error_setg(errp, "Failed to setup incoming postcopy RAM blocks= "); + return -1; + } + } + + trace_loadvm_postcopy_handle_listen("after uffd"); + + if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, errp)) { + return -1; + } + + mis->have_listen_thread =3D true; + postcopy_thread_create(mis, &mis->listen_thread, + MIGRATION_THREAD_DST_LISTEN, + postcopy_listen_thread, QEMU_THREAD_DETACHED); + + return 0; +} + +int postcopy_incoming_cleanup(MigrationIncomingState *mis) +{ + int rc =3D 0; + + if (migrate_postcopy_ram()) { + rc =3D postcopy_ram_incoming_cleanup(mis); + } + + return rc; +} diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h index 3e26db3e6b..a080dd65a7 100644 --- a/migration/postcopy-ram.h +++ b/migration/postcopy-ram.h @@ -199,6 +199,7 @@ bool postcopy_is_paused(MigrationStatus status); void mark_postcopy_blocktime_begin(uintptr_t addr, uint32_t ptid, RAMBlock *rb); =20 -void *postcopy_ram_listen_thread(void *opaque); +int postcopy_incoming_setup(MigrationIncomingState *mis, Error **errp); +int 
postcopy_incoming_cleanup(MigrationIncomingState *mis); =20 #endif diff --git a/migration/savevm.c b/migration/savevm.c index 2f7ed0db64..01b5a8bfff 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2113,32 +2113,11 @@ static int loadvm_postcopy_handle_listen(MigrationI= ncomingState *mis, =20 trace_loadvm_postcopy_handle_listen("after discard"); =20 - /* - * Sensitise RAM - can now generate requests for blocks that don't exi= st - * However, at this point the CPU shouldn't be running, and the IO - * shouldn't be doing anything yet so don't actually expect requests - */ - if (migrate_postcopy_ram()) { - if (postcopy_ram_incoming_setup(mis)) { - postcopy_ram_incoming_cleanup(mis); - error_setg(errp, "Failed to setup incoming postcopy RAM blocks= "); - return -1; - } - } + int rc =3D postcopy_incoming_setup(mis, errp); =20 - trace_loadvm_postcopy_handle_listen("after uffd"); - - if (postcopy_notify(POSTCOPY_NOTIFY_INBOUND_LISTEN, errp)) { - return -1; - } - - mis->have_listen_thread =3D true; - postcopy_thread_create(mis, &mis->listen_thread, - MIGRATION_THREAD_DST_LISTEN, - postcopy_ram_listen_thread, QEMU_THREAD_DETACHE= D); trace_loadvm_postcopy_handle_listen("return"); =20 - return 0; + return rc; } =20 static void loadvm_postcopy_handle_run_bh(void *opaque) --=20 2.51.0 From nobody Fri Nov 14 16:50:37 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=quarantine dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1762194909; cv=none; d=zohomail.com; s=zohoarc; b=AF9hrsOeKSGM6k4Cis81v8AwaA7wENrat9odHhQES9XnJFRFi9iSQnSaAMBuHjlTunxdPL/Pzb4S6yNSE2onzYmGPMCM7qO7ZbD89H1PcHKufxreoRqUW9HAgIS1QThlNaFcxKsXvBxyW00uf3Ap7TI74frcMysFa8uQIP26Ibk= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1762194908; 
h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=0Vw1B+WAklixK/OgjgJ07zfBa9fc2MEiBscojqAcb3k=; b=JWdrgm0CBfcGyMiEu13+58v2sEBTqontXqaZ2y5qS2oDwSQXO0vLE1oxztgp6Bt9aRgpltT+vtAEDrDwGWoseGosv/Aw8WskE+UHWm1Md7A+Df3xYypfO6Vm3BB/NUYSYsKnWfPLGZbi5lWH9FGy5XrCJAvxhiufNnLL15HLn3w= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1762194908991678.6340124151037; Mon, 3 Nov 2025 10:35:08 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1vFzNS-00064i-RK; Mon, 03 Nov 2025 13:34:02 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vFzNR-00064K-PY for qemu-devel@nongnu.org; Mon, 03 Nov 2025 13:34:01 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vFzNF-00075y-Po for qemu-devel@nongnu.org; Mon, 03 Nov 2025 13:33:59 -0500 Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-108-tzFpCPrtNVm-xUTXnCt-Sw-1; Mon, 03 Nov 2025 13:33:44 -0500 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with 
cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 2DDC419373E6; Mon, 3 Nov 2025 18:33:27 +0000 (UTC) Received: from fedora.redhat.com (unknown [10.44.32.249]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 3426D1800576; Mon, 3 Nov 2025 18:33:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1762194827; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=0Vw1B+WAklixK/OgjgJ07zfBa9fc2MEiBscojqAcb3k=; b=hQEiCGfyq0hGBl/EpxDc/lKkNAGTyUYevv+IN5xmVUYtoUZSxGbz3lv+Lv4WjTQmGZV8uo WUmdFuc8XIKq9/PrsF1ylYgBX9jg7JsG91KIf6tGjA40RoE5itDENpiSDi9yb4CH++kCH2 nvsnMtymK6ULt6KUNH01lKi4gjfdBIw= X-MC-Unique: tzFpCPrtNVm-xUTXnCt-Sw-1 X-Mimecast-MFC-AGG-ID: tzFpCPrtNVm-xUTXnCt-Sw_1762194823 From: Juraj Marcin To: qemu-devel@nongnu.org Cc: Juraj Marcin , "Dr. 
David Alan Gilbert" , Fabiano Rosas , Peter Xu , Jiri Denemark Subject: [PATCH v4 5/8] migration: Refactor all incoming cleanup into migration_incoming_state_destroy() Date: Mon, 3 Nov 2025 19:32:54 +0100 Message-ID: <20251103183301.3840862-6-jmarcin@redhat.com> In-Reply-To: <20251103183301.3840862-1-jmarcin@redhat.com> References: <20251103183301.3840862-1-jmarcin@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=jmarcin@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, SPF_HELO_PASS=-0.001, T_SPF_TEMPERROR=0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1762194911155154100 Content-Type: text/plain; charset="utf-8" From: Juraj Marcin Currently, there are two functions that are responsible for calling the cleanup of the incoming migration state. With successful precopy, it's the incoming migration coroutine, and with successful postcopy it's the postcopy listen thread. However, if postcopy fails during the device load, both functions will try to do the cleanup.
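As an illustration only (not part of the patch), the hazard above comes down to one cleanup routine having two possible callers. A minimal standalone model of a single-owner rule, keyed on whether the listen thread was started, might look like the following. CleanupModel and the model_* functions are hypothetical names, not QEMU code, and the model deliberately ignores threading and locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the incoming state; NOT QEMU's real struct. */
typedef struct {
    bool have_listen_thread;  /* set once the listen thread was started */
    int destroy_calls;        /* counts how often cleanup actually ran */
} CleanupModel;

/* Models the common cleanup function; must run exactly once. */
static void model_state_destroy(CleanupModel *m)
{
    m->destroy_calls++;
}

/* Models the end of the incoming migration coroutine. */
void model_coroutine_end(CleanupModel *m)
{
    if (m->have_listen_thread) {
        /* The listen thread owns cleanup from here on; do nothing. */
        return;
    }
    model_state_destroy(m);
}

/* Models the bottom half scheduled when the listen thread finishes. */
void model_listen_thread_bh(CleanupModel *m)
{
    /* Only reachable if the listen thread was actually started. */
    assert(m->have_listen_thread);
    model_state_destroy(m);
    m->have_listen_thread = false;
}
```

With a single flag deciding the owner, neither a precopy-only run nor a run that entered postcopy can reach the cleanup twice in this model.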
This patch refactors all cleanup that needs to be done on the incoming side into a common function and defines a clear boundary for who is responsible for the cleanup. The incoming migration coroutine is responsible for calling the cleanup function, unless the listen thread has been started, in which case the postcopy listen thread runs the incoming migration cleanup in its bottom half (BH). Signed-off-by: Juraj Marcin Fixes: 9535435795 ("migration: push Error **errp into qemu_loadvm_state()") Reviewed-by: Peter Xu --- migration/migration.c | 44 +++++++++------------------- migration/migration.h | 1 + migration/postcopy-ram.c | 63 +++++++++++++++++++++------------------- migration/trace-events | 2 +- 4 files changed, 49 insertions(+), 61 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 9a367f717e..637be71bfe 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -438,10 +438,15 @@ void migration_incoming_transport_cleanup(MigrationIn= comingState *mis) =20 void migration_incoming_state_destroy(void) { - struct MigrationIncomingState *mis =3D migration_incoming_get_current(= ); + MigrationIncomingState *mis =3D migration_incoming_get_current(); + PostcopyState ps =3D postcopy_state_get(); =20 multifd_recv_cleanup(); =20 + if (ps !=3D POSTCOPY_INCOMING_NONE) { + postcopy_incoming_cleanup(mis); + } + /* * RAM state cleanup needs to happen after multifd cleanup, because * multifd threads can use some of its states (receivedmap). 
@@ -866,7 +871,6 @@ process_incoming_migration_co(void *opaque) { MigrationState *s =3D migrate_get_current(); MigrationIncomingState *mis =3D migration_incoming_get_current(); - PostcopyState ps; int ret; Error *local_err =3D NULL; =20 @@ -883,25 +887,14 @@ process_incoming_migration_co(void *opaque) =20 trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed"); =20 - ps =3D postcopy_state_get(); - trace_process_incoming_migration_co_end(ret, ps); - if (ps !=3D POSTCOPY_INCOMING_NONE) { - if (ps =3D=3D POSTCOPY_INCOMING_ADVISE) { - /* - * Where a migration had postcopy enabled (and thus went to ad= vise) - * but managed to complete within the precopy period, we can u= se - * the normal exit. - */ - postcopy_incoming_cleanup(mis); - } else if (ret >=3D 0) { - /* - * Postcopy was started, cleanup should happen at the end of t= he - * postcopy thread. - */ - trace_process_incoming_migration_co_postcopy_end_main(); - goto out; - } - /* Else if something went wrong then just fall out of the normal e= xit */ + trace_process_incoming_migration_co_end(ret); + if (mis->have_listen_thread) { + /* + * Postcopy was started, cleanup should happen at the end of the + * postcopy listen thread. + */ + trace_process_incoming_migration_co_postcopy_end_main(); + goto out; } =20 if (ret < 0) { @@ -933,15 +926,6 @@ fail: } =20 exit(EXIT_FAILURE); - } else { - /* - * Report the error here in case that QEMU abruptly exits - * when postcopy is enabled. 
- */ - WITH_QEMU_LOCK_GUARD(&s->error_mutex) { - error_report_err(s->error); - s->error =3D NULL; - } } out: /* Pairs with the refcount taken in qmp_migrate_incoming() */ diff --git a/migration/migration.h b/migration/migration.h index 01329bf824..4a37f7202c 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -254,6 +254,7 @@ struct MigrationIncomingState { MigrationIncomingState *migration_incoming_get_current(void); void migration_incoming_state_destroy(void); void migration_incoming_transport_cleanup(MigrationIncomingState *mis); +void migration_incoming_qemu_exit(void); /* * Functions to work with blocktime context */ diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index b47c955763..48cbb46c27 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -2078,6 +2078,24 @@ bool postcopy_is_paused(MigrationStatus status) status =3D=3D MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP; } =20 +static void postcopy_listen_thread_bh(void *opaque) +{ + MigrationIncomingState *mis =3D migration_incoming_get_current(); + + migration_incoming_state_destroy(); + + if (mis->state =3D=3D MIGRATION_STATUS_FAILED) { + /* + * If something went wrong then we have a bad state so exit; + * we only could have gotten here if something failed before + * POSTCOPY_INCOMING_RUNNING (for example device load), otherwise + * postcopy migration would pause inside qemu_loadvm_state_main(). + * Failing dirty-bitmaps won't fail the whole migration. 
+ */ + exit(1); + } +} + /* * Triggered by a postcopy_listen command; this thread takes over reading * the input stream, leaving the main thread free to carry on loading the = rest @@ -2131,53 +2149,38 @@ static void *postcopy_listen_thread(void *opaque) "bitmaps are correctly migrated and valid.", __func__, load_res, error_get_pretty(local_err)); g_clear_pointer(&local_err, error_free); - load_res =3D 0; /* prevent further exit() */ } else { + /* + * Something went fatally wrong and we have a bad state, QEMU = will + * exit depending on if postcopy-exit-on-error is true, but the + * migration cannot be recovered. + */ error_prepend(&local_err, "loadvm failed during postcopy: %d: ", load_res); migrate_set_error(migr, local_err); g_clear_pointer(&local_err, error_report_err); migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIV= E, MIGRATION_STATUS_FAILED); + goto out; } } - if (load_res >=3D 0) { - /* - * This looks good, but it's possible that the device loading in t= he - * main thread hasn't finished yet, and so we might not be in 'RUN' - * state yet; wait for the end of the main thread. - */ - qemu_event_wait(&mis->main_thread_load_event); - } - postcopy_incoming_cleanup(mis); - - if (load_res < 0) { - /* - * If something went wrong then we have a bad state so exit; - * depending how far we got it might be possible at this point - * to leave the guest running and fire MCEs for pages that never - * arrived as a desperate recovery step. - */ - rcu_unregister_thread(); - exit(EXIT_FAILURE); - } + /* + * This looks good, but it's possible that the device loading in the + * main thread hasn't finished yet, and so we might not be in 'RUN' + * state yet; wait for the end of the main thread. 
+ */ + qemu_event_wait(&mis->main_thread_load_event); =20 migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, MIGRATION_STATUS_COMPLETED); - /* - * If everything has worked fine, then the main thread has waited - * for us to start, and we're the last use of the mis. - * (If something broke then qemu will have to exit anyway since it's - * got a bad migration state). - */ - bql_lock(); - migration_incoming_state_destroy(); - bql_unlock(); =20 +out: rcu_unregister_thread(); mis->have_listen_thread =3D false; postcopy_state_set(POSTCOPY_INCOMING_END); =20 + migration_bh_schedule(postcopy_listen_thread_bh, NULL); + object_unref(OBJECT(migr)); =20 return NULL; diff --git a/migration/trace-events b/migration/trace-events index e8edd1fbba..772636f3ac 100644 --- a/migration/trace-events +++ b/migration/trace-events @@ -193,7 +193,7 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRI= u32 source_return_path_thread_switchover_acked(void) "" migration_thread_low_pending(uint64_t pending) "%" PRIu64 migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t ba= ndwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_sp= ent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %"= PRId64 -process_incoming_migration_co_end(int ret, int ps) "ret=3D%d postcopy-stat= e=3D%d" +process_incoming_migration_co_end(int ret) "ret=3D%d" process_incoming_migration_co_postcopy_end_main(void) "" postcopy_preempt_enabled(bool value) "%d" migration_precopy_complete(void) "" --=20 2.51.0 From nobody Fri Nov 14 16:50:37 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=quarantine dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1762194916; cv=none; d=zohomail.com; s=zohoarc; 
b=WWblrIGj/xu8w1KsuF1VCu1NhUzAaUmZdCr2ENoAlkI3ME8rpniY2cHnq6ikim1C1IT46f6YdIfP0JOcecp8+44mRCebRxREqXQ23/ePQp5gR16km9eW1ScMoaRkz2ji99KSqZaD3FMmCrdIPoTmyclSz9qxphtrGFAoNmh3hEQ= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1762194916; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=nwOLooqjrQgvrxWbQhGQsWKmFK2ppIjlU+Vt120JKEI=; b=HaAsKrJsByErkJtGjzYNSRytwVcD3eNyPhezoXLTvb2YlxK1Eu5En+YTb299QEbAmFapfKWWP+eh4/fRYXUdn/SQ22q5ucZYmdF/LBnQ4Jn6LZrq0Z2UIsE582D6fHsoqFtbK8RYElnZoyT3Vo00kL+kwgAgyAo+JhvNFPZ6sbA= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1762194916089817.6328567919771; Mon, 3 Nov 2025 10:35:16 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1vFzNa-00065Z-Ku; Mon, 03 Nov 2025 13:34:10 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vFzNX-00065G-R5 for qemu-devel@nongnu.org; Mon, 03 Nov 2025 13:34:08 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vFzNH-00076D-CP for qemu-devel@nongnu.org; Mon, 03 Nov 2025 13:34:06 -0500 Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS 
(version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-552-Jmfcj8j8PzeziUBKxf3kKQ-1; Mon, 03 Nov 2025 13:33:46 -0500 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 9971D1954B1B; Mon, 3 Nov 2025 18:33:29 +0000 (UTC) Received: from fedora.redhat.com (unknown [10.44.32.249]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 8DE90180058D; Mon, 3 Nov 2025 18:33:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1762194828; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=nwOLooqjrQgvrxWbQhGQsWKmFK2ppIjlU+Vt120JKEI=; b=T4wwpCObmIcBTDW4L2H+6yOqVikizh/g6IkQKEXua/vJzlC77ajglBnLiyNvGmvO1CR/oE CP9RNuTID6ekxNvVwqDxLx6jq8OcgW+UiKNQkpIn1jZgVq8KrOPfnXK+07U024ILEsF2Zr wjjKzI8u0ZBichsEIgqm905fkVljeBI= X-MC-Unique: Jmfcj8j8PzeziUBKxf3kKQ-1 X-Mimecast-MFC-AGG-ID: Jmfcj8j8PzeziUBKxf3kKQ_1762194825 From: Juraj Marcin To: qemu-devel@nongnu.org Cc: Juraj Marcin , "Dr. 
David Alan Gilbert" , Fabiano Rosas , Peter Xu , Jiri Denemark Subject: [PATCH v4 6/8] migration: Respect exit-on-error when migration fails before resuming Date: Mon, 3 Nov 2025 19:32:55 +0100 Message-ID: <20251103183301.3840862-7-jmarcin@redhat.com> In-Reply-To: <20251103183301.3840862-1-jmarcin@redhat.com> References: <20251103183301.3840862-1-jmarcin@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=jmarcin@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1762194917798158500 Content-Type: text/plain; charset="utf-8" From: Juraj Marcin When exit-on-error was added to migration, it wasn't added to postcopy. Even though postcopy migration will usually pause rather than fail, in cases where it does fail unrecoverably before the destination side has been started, honoring exit-on-error (when it is disabled) keeps QEMU running so that management can query the error.
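As an illustration only (not part of the patch), the decision taken after the listen thread finishes can be reduced to a small pure function. The enum and function names below are hypothetical and not QEMU identifiers; the sketch assumes only what the message above states, namely that the process should exit on failure solely when exit-on-error is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of outcomes; NOT QEMU's MigrationStatus. */
typedef enum {
    MODEL_STATUS_COMPLETED,
    MODEL_STATUS_FAILED
} ModelStatus;

/* What the destination does once incoming cleanup has run. */
typedef enum {
    MODEL_KEEP_RUNNING,     /* stay alive; the error remains queryable */
    MODEL_REPORT_AND_EXIT   /* print the stored error, then exit */
} ModelAction;

/* Exit only when the migration failed AND exit-on-error is enabled. */
ModelAction model_after_listen_thread(ModelStatus status, bool exit_on_error)
{
    if (status == MODEL_STATUS_FAILED && exit_on_error) {
        return MODEL_REPORT_AND_EXIT;
    }
    return MODEL_KEEP_RUNNING;
}
```

In this model a failed migration with exit-on-error disabled falls through to MODEL_KEEP_RUNNING, which corresponds to the case where management can still query the error.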
Signed-off-by: Juraj Marcin Reviewed-by: Peter Xu --- migration/postcopy-ram.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index 48cbb46c27..91431f02a4 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -2080,11 +2080,16 @@ bool postcopy_is_paused(MigrationStatus status) =20 static void postcopy_listen_thread_bh(void *opaque) { + MigrationState *s =3D migrate_get_current(); MigrationIncomingState *mis =3D migration_incoming_get_current(); =20 migration_incoming_state_destroy(); =20 - if (mis->state =3D=3D MIGRATION_STATUS_FAILED) { + if (mis->state =3D=3D MIGRATION_STATUS_FAILED && mis->exit_on_error) { + WITH_QEMU_LOCK_GUARD(&s->error_mutex) { + error_report_err(s->error); + s->error =3D NULL; + } /* * If something went wrong then we have a bad state so exit; * we only could have gotten here if something failed before --=20 2.51.0 From nobody Fri Nov 14 16:50:37 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=quarantine dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1762194947; cv=none; d=zohomail.com; s=zohoarc; b=KmpnYZ5lDskOstEBVXgvKX94NRvmJeXhH8DEVn7vA3ekGcr+t4+fLOdgiQ2mG4LEa4QdtEzDCDG1X6H2yId1wqDMja7gxyFttgO69fkcdK2UsmHQbA6csAaFkNOBHa1QZgc8e0/Ma4CtxuDHshWc29S8LH6J7qrGhGuYMWQQBaE= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1762194947; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=CWk9uithZtSX14pfAQlE3GNVT9zUP4frfhot39QuP/w=; 
b=faLwJQS3LQ8X90N/fjBW91WmkLjTi2n/FLVQydi7hne/CqhupJnNa67zymZB8LsYmZSesT3hMFSmnHrwGSSuGP0v5LM2B+XvWSmk6EkqXtPQD/6tcNSQ9Enjzm1QFMI7mqs9ED66ukYQcXrT+jlrodZbnT9/X1kiBbOHitadrms= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1762194947333725.4107850158587; Mon, 3 Nov 2025 10:35:47 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1vFzNc-00066O-A8; Mon, 03 Nov 2025 13:34:12 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vFzNa-000666-Gg for qemu-devel@nongnu.org; Mon, 03 Nov 2025 13:34:11 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vFzNP-00077l-PX for qemu-devel@nongnu.org; Mon, 03 Nov 2025 13:34:10 -0500 Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-659-nvGazUQhNBuZS3qME8uvgw-1; Mon, 03 Nov 2025 13:33:52 -0500 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 7A830183451C; Mon, 3 Nov 2025 
18:33:32 +0000 (UTC) Received: from fedora.redhat.com (unknown [10.44.32.249]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 4549E180058A; Mon, 3 Nov 2025 18:33:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1762194835; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CWk9uithZtSX14pfAQlE3GNVT9zUP4frfhot39QuP/w=; b=FibArQ8YDoYwr53VhtfJl1DR8MmrfSI/gkTt9weKjdVgUT9f8+cTs66zG3ty1BFKOOycPu As5hphxrN+jg1D6QR/ea/eBdyKfBBCtJw7AM+7plUcQj0IBsM0drC4DNjvyYl2dKxhv5Mz bjjQVL08Dmp7qMbV6qH1pxu2G0pDPMw= X-MC-Unique: nvGazUQhNBuZS3qME8uvgw-1 X-Mimecast-MFC-AGG-ID: nvGazUQhNBuZS3qME8uvgw_1762194831 From: Juraj Marcin To: qemu-devel@nongnu.org Cc: Juraj Marcin , "Dr. David Alan Gilbert" , Fabiano Rosas , Peter Xu , Jiri Denemark Subject: [PATCH v4 7/8] migration: Make postcopy listen thread joinable Date: Mon, 3 Nov 2025 19:32:56 +0100 Message-ID: <20251103183301.3840862-8-jmarcin@redhat.com> In-Reply-To: <20251103183301.3840862-1-jmarcin@redhat.com> References: <20251103183301.3840862-1-jmarcin@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=jmarcin@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, 
RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_PASS=-0.001, T_SPF_TEMPERROR=0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1762194948098158500 Content-Type: text/plain; charset="utf-8" From: Juraj Marcin This patch makes the listen thread joinable instead of detached, and joins it alongside other postcopy threads. Signed-off-by: Juraj Marcin Reviewed-by: Peter Xu --- migration/postcopy-ram.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c index 91431f02a4..8405cce7b4 100644 --- a/migration/postcopy-ram.c +++ b/migration/postcopy-ram.c @@ -2181,7 +2181,6 @@ static void *postcopy_listen_thread(void *opaque) =20 out: rcu_unregister_thread(); - mis->have_listen_thread =3D false; postcopy_state_set(POSTCOPY_INCOMING_END); =20 migration_bh_schedule(postcopy_listen_thread_bh, NULL); @@ -2215,7 +2214,7 @@ int postcopy_incoming_setup(MigrationIncomingState *m= is, Error **errp) mis->have_listen_thread =3D true; postcopy_thread_create(mis, &mis->listen_thread, MIGRATION_THREAD_DST_LISTEN, - postcopy_listen_thread, QEMU_THREAD_DETACHED); + postcopy_listen_thread, QEMU_THREAD_JOINABLE); =20 return 0; } @@ -2224,6 +2223,11 @@ int postcopy_incoming_cleanup(MigrationIncomingState= *mis) { int rc =3D 0; =20 + if (mis->have_listen_thread) { + qemu_thread_join(&mis->listen_thread); + mis->have_listen_thread =3D false; + } + if (migrate_postcopy_ram()) { rc =3D postcopy_ram_incoming_cleanup(mis); } --=20 2.51.0 From nobody Fri Nov 14 16:50:37 2025 Delivered-To: importer@patchew.org Authentication-Results: 
mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=quarantine dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1762194931; cv=none; d=zohomail.com; s=zohoarc; b=naMQADFH/RvCIvGUodFYFabT9KA4MLSnQeQaP5CYKlYOke+cPJ4EFRa9Aeq9R7bwuIa0UbxkB+x6d9bJENf8JqVn6o+1V9X8BekDMef/RP7PRvYjd3NOV006kT0GmG/ZWzeDRoEtRM/ZWKsrDwVmXc7/Tc+aMeV+5SURggmlC0A= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1762194931; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=pnvX5Q8pGOtmw2ly1J1yluVuYdn6KceFXTHdKsc1p40=; b=l3yJU8JfXM3wQEGQ1oTjjEQH+cv5oi9ayQpWat0uV5tyQ+FdnQutGu08J6UFq++3K1BbPYMWwdtzGQUVGcl0MklGskPZQVgpfTA0CZOwzrSR811zv7F6sFohIZmvtCQCC2WFaioPVR8ya3NAfWib4XNSPVTwKoPBbpv4N0Iz/0c= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1762194931861769.0491123228298; Mon, 3 Nov 2025 10:35:31 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1vFzNd-00066C-IU; Mon, 03 Nov 2025 13:34:13 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vFzNX-00065H-R4 for qemu-devel@nongnu.org; Mon, 03 Nov 2025 13:34:08 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps 
(TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1vFzNT-00078G-7C for qemu-devel@nongnu.org; Mon, 03 Nov 2025 13:34:06 -0500 Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-633-AwLC2zxcNcSTxq9ffN9pvw-1; Mon, 03 Nov 2025 13:33:54 -0500 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 0B9B21955EA9; Mon, 3 Nov 2025 18:33:35 +0000 (UTC) Received: from fedora.redhat.com (unknown [10.44.32.249]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id E6F79180035A; Mon, 3 Nov 2025 18:33:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1762194838; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pnvX5Q8pGOtmw2ly1J1yluVuYdn6KceFXTHdKsc1p40=; b=MqHJdWKhQbNpbnS8AD7b/s53+yZkDQAoaL2H5FkswQpMXxj5k9pZnDTuVQVyjQIVexfD1P zEuYmJM66a4XgxB2jx6YqHQDV0Gyq1l6U0ULAQxMCwUJ4ws6wstxSfHy+zGrGxxCNebi76 VCEguXowUKVWwESwOylrC+PALZuuGdo= X-MC-Unique: AwLC2zxcNcSTxq9ffN9pvw-1 X-Mimecast-MFC-AGG-ID: AwLC2zxcNcSTxq9ffN9pvw_1762194833 From: Juraj Marcin To: qemu-devel@nongnu.org Cc: Juraj Marcin , "Dr. 
David Alan Gilbert" , Fabiano Rosas , Peter Xu , Jiri Denemark Subject: [PATCH v4 8/8] migration: Introduce POSTCOPY_DEVICE state Date: Mon, 3 Nov 2025 19:32:57 +0100 Message-ID: <20251103183301.3840862-9-jmarcin@redhat.com> In-Reply-To: <20251103183301.3840862-1-jmarcin@redhat.com> References: <20251103183301.3840862-1-jmarcin@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=jmarcin@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1762194946182158500 Content-Type: text/plain; charset="utf-8" From: Juraj Marcin Currently, when postcopy starts, the source VM starts switchover and sends a package containing the state of all non-postcopiable devices. When the destination loads this package, the switchover is complete and the destination VM starts. 
However, if the device state load fails or the destination side
crashes, the source side is already in POSTCOPY_ACTIVE state and cannot
be recovered, even though it still has the most up-to-date machine
state because the destination has not yet started.

This patch introduces a new POSTCOPY_DEVICE state, which is active while
the destination machine is loading the device state and is not yet
running. While in this state, the source side can be resumed if the
migration fails. A return path is required for this state to function;
otherwise, it is skipped in favor of POSTCOPY_ACTIVE.

To transition from POSTCOPY_DEVICE to POSTCOPY_ACTIVE, the source
side uses a PONG message that is a response to a PING message processed
just before the POSTCOPY_RUN command that starts the destination VM.
Thus, this feature is effective even if the destination side does not
yet support this new state.

Signed-off-by: Juraj Marcin
---
 migration/migration.c                 | 50 ++++++++++++++++++++++++---
 migration/migration.h                 |  3 ++
 migration/postcopy-ram.c              | 10 ++++--
 migration/savevm.c                    |  5 +++
 migration/savevm.h                    |  2 ++
 migration/trace-events                |  1 +
 qapi/migration.json                   | 10 ++++--
 tests/qemu-iotests/194                |  2 +-
 tests/qtest/migration/precopy-tests.c |  3 +-
 9 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 637be71bfe..c2daab6bdd 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1206,6 +1206,7 @@ bool migration_is_running(void)
 
     switch (s->state) {
     case MIGRATION_STATUS_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_DEVICE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
     case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
@@ -1227,6 +1228,7 @@ static bool migration_is_active(void)
     MigrationState *s = current_migration;
 
     return (s->state == MIGRATION_STATUS_ACTIVE ||
+            s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
             s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 }
 
@@ -1349,6 +1351,7 @@ static void fill_source_migration_info(MigrationInfo *info)
         break;
     case MIGRATION_STATUS_ACTIVE:
     case MIGRATION_STATUS_CANCELLING:
+    case MIGRATION_STATUS_POSTCOPY_DEVICE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_PRE_SWITCHOVER:
     case MIGRATION_STATUS_DEVICE:
@@ -1402,6 +1405,7 @@ static void fill_destination_migration_info(MigrationInfo *info)
     case MIGRATION_STATUS_CANCELLING:
     case MIGRATION_STATUS_CANCELLED:
     case MIGRATION_STATUS_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_DEVICE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
     case MIGRATION_STATUS_POSTCOPY_RECOVER:
@@ -1732,6 +1736,7 @@ bool migration_in_postcopy(void)
     MigrationState *s = migrate_get_current();
 
     switch (s->state) {
+    case MIGRATION_STATUS_POSTCOPY_DEVICE:
     case MIGRATION_STATUS_POSTCOPY_ACTIVE:
     case MIGRATION_STATUS_POSTCOPY_PAUSED:
     case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
@@ -1833,6 +1838,9 @@ int migrate_init(MigrationState *s, Error **errp)
     memset(&mig_stats, 0, sizeof(mig_stats));
     migration_reset_vfio_bytes_transferred();
 
+    s->postcopy_package_loaded = false;
+    qemu_event_reset(&s->postcopy_package_loaded_event);
+
     return 0;
 }
 
@@ -2568,6 +2576,11 @@ static void *source_return_path_thread(void *opaque)
             tmp32 = ldl_be_p(buf);
             trace_source_return_path_thread_pong(tmp32);
             qemu_sem_post(&ms->rp_state.rp_pong_acks);
+            if (tmp32 == QEMU_VM_PING_PACKAGED_LOADED) {
+                trace_source_return_path_thread_postcopy_package_loaded();
+                ms->postcopy_package_loaded = true;
+                qemu_event_set(&ms->postcopy_package_loaded_event);
+            }
             break;
 
         case MIG_RP_MSG_REQ_PAGES:
@@ -2813,6 +2826,15 @@ static int postcopy_start(MigrationState *ms, Error **errp)
     if (migrate_postcopy_ram()) {
         qemu_savevm_send_ping(fb, 3);
     }
+    if (ms->rp_state.rp_thread_created) {
+        /*
+         * This ping will tell us that all non-postcopiable device state has been
+         * successfully loaded and the destination is about to start. When
+         * response is received, it will trigger transition from POSTCOPY_DEVICE
+         * to POSTCOPY_ACTIVE state.
+         */
+        qemu_savevm_send_ping(fb, QEMU_VM_PING_PACKAGED_LOADED);
+    }
 
     qemu_savevm_send_postcopy_run(fb);
 
@@ -2868,8 +2890,13 @@ static int postcopy_start(MigrationState *ms, Error **errp)
      */
     migration_rate_set(migrate_max_postcopy_bandwidth());
 
-    /* Now, switchover looks all fine, switching to postcopy-active */
+    /*
+     * Now, switchover looks all fine, switching to POSTCOPY_DEVICE, or
+     * directly to POSTCOPY_ACTIVE if there is no return path.
+     */
     migrate_set_state(&ms->state, MIGRATION_STATUS_DEVICE,
+                      ms->rp_state.rp_thread_created ?
+                      MIGRATION_STATUS_POSTCOPY_DEVICE :
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
     bql_unlock();
@@ -3311,8 +3338,8 @@ static MigThrError migration_detect_error(MigrationState *s)
         return postcopy_pause(s);
     } else {
         /*
-         * For precopy (or postcopy with error outside IO), we fail
-         * with no time.
+         * For precopy (or postcopy with error outside IO, or before dest
+         * starts), we fail with no time.
          */
         migrate_set_state(&s->state, state, MIGRATION_STATUS_FAILED);
         trace_migration_thread_file_err();
@@ -3447,7 +3474,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
 {
     uint64_t must_precopy, can_postcopy, pending_size;
     Error *local_err = NULL;
-    bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
+    bool in_postcopy = (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE ||
+                        s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
     bool can_switchover = migration_can_switchover(s);
     bool complete_ready;
 
@@ -3463,6 +3491,18 @@ static MigIterateState migration_iteration_run(MigrationState *s)
          * POSTCOPY_ACTIVE it means switchover already happened.
          */
         complete_ready = !pending_size;
+        if (s->state == MIGRATION_STATUS_POSTCOPY_DEVICE &&
+            (s->postcopy_package_loaded || complete_ready)) {
+            /*
+             * If package has been loaded, the event is set and we will
+             * immediately transition to POSTCOPY_ACTIVE. If we are ready for
+             * completion, we need to wait for destination to load the postcopy
+             * package before actually completing.
+             */
+            qemu_event_wait(&s->postcopy_package_loaded_event);
+            migrate_set_state(&s->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+                              MIGRATION_STATUS_POSTCOPY_ACTIVE);
+        }
     } else {
         /*
          * Exact pending reporting is only needed for precopy. Taking RAM
@@ -4117,6 +4157,7 @@ static void migration_instance_finalize(Object *obj)
     qemu_sem_destroy(&ms->rp_state.rp_pong_acks);
     qemu_sem_destroy(&ms->postcopy_qemufile_src_sem);
     error_free(ms->error);
+    qemu_event_destroy(&ms->postcopy_package_loaded_event);
 }
 
 static void migration_instance_init(Object *obj)
@@ -4138,6 +4179,7 @@ static void migration_instance_init(Object *obj)
     qemu_sem_init(&ms->wait_unplug_sem, 0);
     qemu_sem_init(&ms->postcopy_qemufile_src_sem, 0);
     qemu_mutex_init(&ms->qemu_file_lock);
+    qemu_event_init(&ms->postcopy_package_loaded_event, 0);
 }
 
 /*
diff --git a/migration/migration.h b/migration/migration.h
index 4a37f7202c..213b33fe6e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -510,6 +510,9 @@ struct MigrationState {
     /* Is this a rdma migration */
     bool rdma_migration;
 
+    bool postcopy_package_loaded;
+    QemuEvent postcopy_package_loaded_event;
+
     GSource *hup_source;
 };
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 8405cce7b4..3f98dcb6fd 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2117,7 +2117,8 @@ static void *postcopy_listen_thread(void *opaque)
     object_ref(OBJECT(migr));
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE,
-                                   MIGRATION_STATUS_POSTCOPY_ACTIVE);
+                      mis->to_src_file ? MIGRATION_STATUS_POSTCOPY_DEVICE :
+                                         MIGRATION_STATUS_POSTCOPY_ACTIVE);
     qemu_event_set(&mis->thread_sync_event);
     trace_postcopy_ram_listen_thread_start();
 
@@ -2164,8 +2165,7 @@ static void *postcopy_listen_thread(void *opaque)
                           "loadvm failed during postcopy: %d: ", load_res);
             migrate_set_error(migr, local_err);
             g_clear_pointer(&local_err, error_report_err);
-            migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-                              MIGRATION_STATUS_FAILED);
+            migrate_set_state(&mis->state, mis->state, MIGRATION_STATUS_FAILED);
             goto out;
         }
     }
@@ -2176,6 +2176,10 @@ static void *postcopy_listen_thread(void *opaque)
      */
     qemu_event_wait(&mis->main_thread_load_event);
 
+    /*
+     * Device load in the main thread has finished, we should be in
+     * POSTCOPY_ACTIVE now.
+     */
     migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                                    MIGRATION_STATUS_COMPLETED);
 
diff --git a/migration/savevm.c b/migration/savevm.c
index 01b5a8bfff..62cc2ce25c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2170,6 +2170,11 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis, Error **errp)
         return -1;
     }
 
+    /* We might be already in POSTCOPY_ACTIVE if there is no return path */
+    if (mis->state == MIGRATION_STATUS_POSTCOPY_DEVICE) {
+        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_DEVICE,
+                          MIGRATION_STATUS_POSTCOPY_ACTIVE);
+    }
     postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
     migration_bh_schedule(loadvm_postcopy_handle_run_bh, mis);
 
diff --git a/migration/savevm.h b/migration/savevm.h
index c337e3e3d1..125a2507b7 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -29,6 +29,8 @@
 #define QEMU_VM_COMMAND 0x08
 #define QEMU_VM_SECTION_FOOTER 0x7e
 
+#define QEMU_VM_PING_PACKAGED_LOADED 0x42
+
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_non_migratable_list(strList **reasons);
 int qemu_savevm_state_prepare(Error **errp);
diff --git a/migration/trace-events b/migration/trace-events
index 772636f3ac..bf11b62b17 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -191,6 +191,7 @@ source_return_path_thread_pong(uint32_t val) "0x%x"
 source_return_path_thread_shut(uint32_t val) "0x%x"
 source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
 source_return_path_thread_switchover_acked(void) ""
+source_return_path_thread_postcopy_package_loaded(void) ""
 migration_thread_low_pending(uint64_t pending) "%" PRIu64
 migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
 process_incoming_migration_co_end(int ret) "ret=%d"
diff --git a/qapi/migration.json b/qapi/migration.json
index c7a6737cc1..93f71de3fe 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -142,6 +142,12 @@
 # @postcopy-active: like active, but now in postcopy mode.
 #     (since 2.5)
 #
+# @postcopy-device: like postcopy-active, but the destination is still
+#     loading device state and is not running yet. If migration fails
+#     during this state, the source side will resume. If there is no
+#     return-path from destination to source, this state is skipped.
+#     (since 10.2)
+#
 # @postcopy-paused: during postcopy but paused. (since 3.0)
 #
 # @postcopy-recover-setup: setup phase for a postcopy recovery
@@ -173,8 +179,8 @@
 ##
 { 'enum': 'MigrationStatus',
   'data': [ 'none', 'setup', 'cancelling', 'cancelled',
-            'active', 'postcopy-active', 'postcopy-paused',
-            'postcopy-recover-setup',
+            'active', 'postcopy-device', 'postcopy-active',
+            'postcopy-paused', 'postcopy-recover-setup',
             'postcopy-recover', 'completed', 'failed', 'colo',
             'pre-switchover', 'device', 'wait-unplug' ] }
 ##
diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index e114c0b269..806624394d 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -76,7 +76,7 @@ with iotests.FilePath('source.img') as source_img_path, \
 
     while True:
         event1 = source_vm.event_wait('MIGRATION')
-        if event1['data']['status'] == 'postcopy-active':
+        if event1['data']['status'] in ('postcopy-device', 'postcopy-active'):
             # This event is racy, it depends do we really do postcopy or bitmap
             # was migrated during downtime (and no data to migrate in postcopy
             # phase). So, don't log it.
diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
index bb38292550..57ca623de5 100644
--- a/tests/qtest/migration/precopy-tests.c
+++ b/tests/qtest/migration/precopy-tests.c
@@ -1316,13 +1316,14 @@ void migration_test_add_precopy(MigrationTestEnv *env)
     }
 
     /* ensure new status don't go unnoticed */
-    assert(MIGRATION_STATUS__MAX == 15);
+    assert(MIGRATION_STATUS__MAX == 16);
 
     for (int i = MIGRATION_STATUS_NONE; i < MIGRATION_STATUS__MAX; i++) {
         switch (i) {
         case MIGRATION_STATUS_DEVICE: /* happens too fast */
         case MIGRATION_STATUS_WAIT_UNPLUG: /* no support in tests */
         case MIGRATION_STATUS_COLO: /* no support in tests */
+        case MIGRATION_STATUS_POSTCOPY_DEVICE: /* postcopy can't be cancelled */
         case MIGRATION_STATUS_POSTCOPY_ACTIVE: /* postcopy can't be cancelled */
         case MIGRATION_STATUS_POSTCOPY_PAUSED:
         case MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP:
-- 
2.51.0
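
The source-side transitions described in the 8/8 commit message can be
summarized with a small standalone model. This is only a sketch and not
QEMU code: every Sketch*/sketch_* name is hypothetical, and only the state
names and the 0x42 marker value are taken from the patch. It also
simplifies the real flow, where the return-path thread merely records the
PONG (flag plus event) and the migration thread performs the actual
POSTCOPY_DEVICE to POSTCOPY_ACTIVE transition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the MIGRATION_STATUS_* values used here. */
typedef enum {
    SKETCH_POSTCOPY_DEVICE,
    SKETCH_POSTCOPY_ACTIVE,
    SKETCH_POSTCOPY_PAUSED,
    SKETCH_FAILED,
} SketchStatus;

/* Same value the patch assigns to QEMU_VM_PING_PACKAGED_LOADED. */
#define SKETCH_PING_PACKAGE_LOADED 0x42u

/*
 * State the source enters once switchover succeeded: POSTCOPY_DEVICE
 * when a return path exists, otherwise directly POSTCOPY_ACTIVE.
 */
static SketchStatus sketch_after_switchover(bool have_return_path)
{
    return have_return_path ? SKETCH_POSTCOPY_DEVICE
                            : SKETCH_POSTCOPY_ACTIVE;
}

/*
 * Only the PONG carrying the "package loaded" marker ends
 * POSTCOPY_DEVICE; any other PONG value leaves the state unchanged.
 */
static SketchStatus sketch_on_pong(SketchStatus s, uint32_t value)
{
    if (s == SKETCH_POSTCOPY_DEVICE && value == SKETCH_PING_PACKAGE_LOADED) {
        return SKETCH_POSTCOPY_ACTIVE;
    }
    return s;
}

/*
 * Channel error while in one of the two modeled postcopy states:
 * before the destination runs, the migration simply fails and the
 * source can resume; afterwards it can only pause for recovery.
 */
static SketchStatus sketch_on_io_error(SketchStatus s)
{
    assert(s == SKETCH_POSTCOPY_DEVICE || s == SKETCH_POSTCOPY_ACTIVE);
    return s == SKETCH_POSTCOPY_ACTIVE ? SKETCH_POSTCOPY_PAUSED
                                       : SKETCH_FAILED;
}
```

The point of the model is the asymmetry in sketch_on_io_error(): the same
error is recoverable on the source in POSTCOPY_DEVICE, because the
destination has not started running yet, but not in POSTCOPY_ACTIVE.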