From: Ben Chaney
Date: Wed, 03 Dec 2025 13:51:18 -0500
Subject: [PATCH v3 1/8] migration: stop vm earlier for cpr
Message-Id: <20251203-cpr-tap-v3-1-3c12e0a61f8e@akamai.com>
In-Reply-To: <20251203-cpr-tap-v3-0-3c12e0a61f8e@akamai.com>
To: qemu-devel@nongnu.org
Cc: Peter Xu, Fabiano Rosas, "Michael S. Tsirkin", Stefano Garzarella,
 Jason Wang, Alex Williamson, Cédric Le Goater, Eric Blake,
 Markus Armbruster, Stefan Weil, Daniel P. Berrangé, Paolo Bonzini,
 Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham, Ben Chaney,
 Steve Sistare

From: Steve Sistare

Stop the VM earlier for CPR, before cpr_state_save(), which causes the
new QEMU to proceed and initialize devices. We must guarantee that
devices are stopped in the old QEMU, and that all source notifiers have
been called, before the devices are initialized in the new QEMU.

Signed-off-by: Steve Sistare
Signed-off-by: Ben Chaney
---
 migration/migration.c | 57 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 48 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index c2daab6bdd..6d40697767 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1657,6 +1657,7 @@ void migration_cancel(void)
                           MIGRATION_STATUS_CANCELLED);
         cpr_state_close();
         migrate_hup_delete(s);
+        vm_resume(s->vm_old_state);
     }
 }
 
@@ -2216,6 +2217,7 @@ void qmp_migrate(const char *uri, bool has_channels,
     MigrationAddress *addr = NULL;
     MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
     MigrationChannel *cpr_channel = NULL;
+    bool stopped = false;
 
     /*
      * Having preliminary checks for uri and channel
@@ -2268,6 +2270,46 @@ void qmp_migrate(const char *uri, bool has_channels,
         return;
     }
 
+    /*
+     * CPR-transfer ordering:
+     *
+     *   SOURCE                                  TARGET
+     *   ------                                  ------
+     *                                           cpr_state_load() blocks
+     *   |                                       |
+     *   | 1. migration_stop_vm()                |
+     *   |    VM stopped, devices quiesced       |
+     *   |                                       | Waiting for
+     *   | 2. notifiers (PRECOPY_SETUP)          | FDs from source
+     *   |    vhost_reset_owner() releases       |
+     *   |    device ownership                   |
+     *   |                                       |
+     *   | 3. cpr_state_save() ---- FDs -------> |
+     *   |                                       |
+     *   v                                       v
+     *   postmigrate                             Device init begins
+     *                                           - cpr_find_fd()
+     *                                           - vhost_dev_init()
+     *                                           - VHOST_SET_OWNER
+     *
+     * Step 3 is the synchronization/cut-over point. Target proceeds immediately
+     * upon receiving FDs, so steps 1-2 must complete before it; otherwise:
+     * - Target's VHOST_SET_OWNER fails with -EBUSY (source still owns)
+     * - Race between source I/O and target device init
+     *
+     * We stop the VM early (before FD transfer) to prevent this race.
+     * Unlike regular migration, CPR-transfer passes memory via FD (memfd)
+     * rather than copying RAM, so early VM stop should have minimal downtime.
+     */
+    if (migrate_mode_is_cpr(s)) {
+        int ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
+        if (ret < 0) {
+            error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
+            goto out;
+        }
+        stopped = true;
+    }
+
     if (!cpr_state_save(cpr_channel, &local_err)) {
         goto out;
     }
@@ -2294,6 +2336,9 @@ out:
     if (local_err) {
         migration_connect_set_error(s, local_err);
         error_propagate(errp, local_err);
+        if (stopped) {
+            vm_resume(s->vm_old_state);
+        }
     }
 }
 
@@ -2339,6 +2384,9 @@ static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
         }
         migration_connect_set_error(s, local_err);
         error_propagate(errp, local_err);
+        if (migrate_mode_is_cpr(s)) {
+            vm_resume(s->vm_old_state);
+        }
         return;
     }
 }
@@ -4028,7 +4076,6 @@ void migration_connect(MigrationState *s, Error *error_in)
     Error *local_err = NULL;
     uint64_t rate_limit;
     bool resume = (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP);
-    int ret;
 
     /*
      * If there's a previous error, free it and prepare for another one.
@@ -4099,14 +4146,6 @@ void migration_connect(MigrationState *s, Error *error_in)
         return;
     }
 
-    if (migrate_mode_is_cpr(s)) {
-        ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
-        if (ret < 0) {
-            error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
-            goto fail;
-        }
-    }
-
     /*
      * Take a refcount to make sure the migration object won't get freed by
      * the main thread already in migration_shutdown().
-- 
2.34.1
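The ordering requirement described in the commit message and in the new comment can be illustrated with a small C model. This is a sketch under stated assumptions, not QEMU code. toy_host, toy_stop_vm(), toy_run_notifiers(), toy_save_state() and toy_target_set_owner() are hypothetical stand-ins for migration_stop_vm(), the PRECOPY_SETUP notifiers that call vhost_reset_owner(), cpr_state_save(), and the target's VHOST_SET_OWNER. The model assumes that the target claims the device as soon as the FDs arrive, which matches the cut-over described in the comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Toy model of the CPR-transfer cut-over. Every name below is a
 * hypothetical stand-in, not a QEMU API.
 */
enum toy_owner { OWNER_NONE, OWNER_SOURCE, OWNER_TARGET };

struct toy_host {
    bool vm_running;        /* source VM state */
    enum toy_owner owner;   /* who holds the modelled vhost device */
    bool stop_fails;        /* inject a failure in step 1 */
    bool save_fails;        /* inject a failure in step 3 */
};

/* Target: runs as soon as FDs arrive (stand-in for VHOST_SET_OWNER). */
static int toy_target_set_owner(struct toy_host *h)
{
    if (h->owner != OWNER_NONE) {
        return -EBUSY;              /* source still owns the device */
    }
    h->owner = OWNER_TARGET;
    return 0;
}

/* Step 1 (stand-in for migration_stop_vm()). */
static int toy_stop_vm(struct toy_host *h)
{
    if (h->stop_fails) {
        return -EINVAL;
    }
    h->vm_running = false;
    return 0;
}

/* Step 2 (stand-in for notifiers calling vhost_reset_owner()). */
static void toy_run_notifiers(struct toy_host *h)
{
    if (h->owner == OWNER_SOURCE) {
        h->owner = OWNER_NONE;
    }
}

/* Step 3 (stand-in for cpr_state_save()): the target proceeds at once. */
static int toy_save_state(struct toy_host *h)
{
    if (h->save_fails) {
        return -EIO;
    }
    return toy_target_set_owner(h);
}

/* Patched ordering: steps 1-2 before the hand-off, resume on failure. */
static int toy_migrate_new(struct toy_host *h)
{
    bool stopped = false;
    int ret;

    ret = toy_stop_vm(h);
    if (ret < 0) {
        goto out;                   /* never stopped, so nothing to resume */
    }
    stopped = true;
    toy_run_notifiers(h);
    ret = toy_save_state(h);
out:
    if (ret < 0 && stopped) {
        h->vm_running = true;       /* like vm_resume(s->vm_old_state) */
    }
    return ret;
}

/* Old ordering: FDs are handed off before the VM is stopped. */
static int toy_migrate_old(struct toy_host *h)
{
    int ret = toy_save_state(h);    /* target races ahead here */

    toy_stop_vm(h);
    toy_run_notifiers(h);
    return ret;
}
```

With the old ordering, toy_migrate_old() hands off the FDs while the source still owns the device, so the target gets -EBUSY. toy_migrate_new() stops the VM and runs the notifiers first. If a later step fails, it resumes the VM, but only when the VM was actually stopped. This mirrors the `stopped` flag that the patch adds to qmp_migrate().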