From: Juraj Marcin
To: qemu-devel@nongnu.org
Cc: Juraj Marcin, Peter Xu, "Dr. David Alan Gilbert", Jiri Denemark, Fabiano Rosas
Subject: [PATCH v3 4/7] migration: Refactor all incoming cleanup into migration_incoming_state_destroy()
Date: Thu, 30 Oct 2025 22:49:08 +0100
Message-ID: <20251030214915.1411860-5-jmarcin@redhat.com>
In-Reply-To: <20251030214915.1411860-1-jmarcin@redhat.com>
References: <20251030214915.1411860-1-jmarcin@redhat.com>

From: Juraj Marcin

Currently, there are two functions that are responsible for calling the
cleanup of the incoming migration state. With successful precopy, it's
the incoming migration coroutine, and with successful postcopy it's the
postcopy listen thread. However, if postcopy fails during the device
load, both functions will try to do the cleanup.
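To make the overlap concrete, here is a deliberately simplified model of the pre-patch ownership rule. It is not QEMU code: the function `old_rule_cleanup_runs()` and its two boolean inputs are hypothetical, and the model ignores threads, the BQL and the actual cleanup work. It only counts how many code paths would attempt cleanup for one incoming migration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the pre-patch rule (not QEMU code): returns how
 * many code paths would attempt the incoming-state cleanup for one
 * incoming migration.
 */
static int old_rule_cleanup_runs(bool listen_thread_started, bool main_load_ok)
{
    int runs = 0;

    /*
     * Incoming coroutine: it skips cleanup only when postcopy has been
     * started and the main-thread load succeeded; on failure it falls
     * through to its own cleanup path.
     */
    if (!(listen_thread_started && main_load_ok)) {
        runs++;
    }

    /* Postcopy listen thread: it always cleans up once it was started. */
    if (listen_thread_started) {
        runs++;
    }

    return runs;
}
```

With the listen thread started and the device load in the main thread failing, the model counts two cleanup attempts, which is the case described above; every other combination counts one.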
This patch refactors all cleanup that needs to be done on the incoming
side into a common function and defines a clear boundary for who is
responsible for the cleanup. The incoming migration coroutine is
responsible for calling the cleanup function, unless the listen thread
has been started, in which case the postcopy listen thread runs the
incoming migration cleanup in its BH.

Signed-off-by: Juraj Marcin
Reviewed-by: Peter Xu
---
 migration/migration.c    | 44 +++++++++-------------------
 migration/migration.h    |  1 +
 migration/postcopy-ram.c | 63 +++++++++++++++++++++-------------------
 migration/trace-events   |  2 +-
 4 files changed, 49 insertions(+), 61 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 9a367f717e..637be71bfe 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -438,10 +438,15 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
 
 void migration_incoming_state_destroy(void)
 {
-    struct MigrationIncomingState *mis = migration_incoming_get_current();
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyState ps = postcopy_state_get();
 
     multifd_recv_cleanup();
 
+    if (ps != POSTCOPY_INCOMING_NONE) {
+        postcopy_incoming_cleanup(mis);
+    }
+
     /*
      * RAM state cleanup needs to happen after multifd cleanup, because
      * multifd threads can use some of its states (receivedmap).
@@ -866,7 +871,6 @@ process_incoming_migration_co(void *opaque)
 {
     MigrationState *s = migrate_get_current();
     MigrationIncomingState *mis = migration_incoming_get_current();
-    PostcopyState ps;
     int ret;
     Error *local_err = NULL;
 
@@ -883,25 +887,14 @@ process_incoming_migration_co(void *opaque)
 
     trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
 
-    ps = postcopy_state_get();
-    trace_process_incoming_migration_co_end(ret, ps);
-    if (ps != POSTCOPY_INCOMING_NONE) {
-        if (ps == POSTCOPY_INCOMING_ADVISE) {
-            /*
-             * Where a migration had postcopy enabled (and thus went to advise)
-             * but managed to complete within the precopy period, we can use
-             * the normal exit.
-             */
-            postcopy_incoming_cleanup(mis);
-        } else if (ret >= 0) {
-            /*
-             * Postcopy was started, cleanup should happen at the end of the
-             * postcopy thread.
-             */
-            trace_process_incoming_migration_co_postcopy_end_main();
-            goto out;
-        }
-        /* Else if something went wrong then just fall out of the normal exit */
+    trace_process_incoming_migration_co_end(ret);
+    if (mis->have_listen_thread) {
+        /*
+         * Postcopy was started, cleanup should happen at the end of the
+         * postcopy listen thread.
+         */
+        trace_process_incoming_migration_co_postcopy_end_main();
+        goto out;
     }
 
     if (ret < 0) {
@@ -933,15 +926,6 @@ fail:
         }
 
         exit(EXIT_FAILURE);
-    } else {
-        /*
-         * Report the error here in case that QEMU abruptly exits
-         * when postcopy is enabled.
-         */
-        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
-            error_report_err(s->error);
-            s->error = NULL;
-        }
     }
 out:
     /* Pairs with the refcount taken in qmp_migrate_incoming() */
diff --git a/migration/migration.h b/migration/migration.h
index 01329bf824..4a37f7202c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -254,6 +254,7 @@ struct MigrationIncomingState {
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
 void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
+void migration_incoming_qemu_exit(void);
 /*
  * Functions to work with blocktime context
  */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b47c955763..48cbb46c27 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2078,6 +2078,24 @@ bool postcopy_is_paused(MigrationStatus status)
         status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
 }
 
+static void postcopy_listen_thread_bh(void *opaque)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    migration_incoming_state_destroy();
+
+    if (mis->state == MIGRATION_STATUS_FAILED) {
+        /*
+         * If something went wrong then we have a bad state so exit;
+         * we only could have gotten here if something failed before
+         * POSTCOPY_INCOMING_RUNNING (for example device load), otherwise
+         * postcopy migration would pause inside qemu_loadvm_state_main().
+         * Failing dirty-bitmaps won't fail the whole migration.
+         */
+        exit(1);
+    }
+}
+
 /*
  * Triggered by a postcopy_listen command; this thread takes over reading
  * the input stream, leaving the main thread free to carry on loading the rest
@@ -2131,53 +2149,38 @@ static void *postcopy_listen_thread(void *opaque)
                         "bitmaps are correctly migrated and valid.",
                         __func__, load_res, error_get_pretty(local_err));
             g_clear_pointer(&local_err, error_free);
-            load_res = 0; /* prevent further exit() */
         } else {
+            /*
+             * Something went fatally wrong and we have a bad state, QEMU will
+             * exit depending on if postcopy-exit-on-error is true, but the
+             * migration cannot be recovered.
+             */
             error_prepend(&local_err,
                           "loadvm failed during postcopy: %d: ", load_res);
             migrate_set_error(migr, local_err);
             g_clear_pointer(&local_err, error_report_err);
             migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                               MIGRATION_STATUS_FAILED);
+            goto out;
         }
     }
-    if (load_res >= 0) {
-        /*
-         * This looks good, but it's possible that the device loading in the
-         * main thread hasn't finished yet, and so we might not be in 'RUN'
-         * state yet; wait for the end of the main thread.
-         */
-        qemu_event_wait(&mis->main_thread_load_event);
-    }
-    postcopy_incoming_cleanup(mis);
-
-    if (load_res < 0) {
-        /*
-         * If something went wrong then we have a bad state so exit;
-         * depending how far we got it might be possible at this point
-         * to leave the guest running and fire MCEs for pages that never
-         * arrived as a desperate recovery step.
-         */
-        rcu_unregister_thread();
-        exit(EXIT_FAILURE);
-    }
+    /*
+     * This looks good, but it's possible that the device loading in the
+     * main thread hasn't finished yet, and so we might not be in 'RUN'
+     * state yet; wait for the end of the main thread.
+     */
+    qemu_event_wait(&mis->main_thread_load_event);
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                       MIGRATION_STATUS_COMPLETED);
-    /*
-     * If everything has worked fine, then the main thread has waited
-     * for us to start, and we're the last use of the mis.
-     * (If something broke then qemu will have to exit anyway since it's
-     * got a bad migration state).
-     */
-    bql_lock();
-    migration_incoming_state_destroy();
-    bql_unlock();
 
+out:
     rcu_unregister_thread();
     mis->have_listen_thread = false;
     postcopy_state_set(POSTCOPY_INCOMING_END);
 
+    migration_bh_schedule(postcopy_listen_thread_bh, NULL);
+
     object_unref(OBJECT(migr));
 
     return NULL;
diff --git a/migration/trace-events b/migration/trace-events
index e8edd1fbba..772636f3ac 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -193,7 +193,7 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
 source_return_path_thread_switchover_acked(void) ""
 migration_thread_low_pending(uint64_t pending) "%" PRIu64
 migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
-process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
+process_incoming_migration_co_end(int ret) "ret=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 postcopy_preempt_enabled(bool value) "%d"
 migration_precopy_complete(void) ""
-- 
2.51.0
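The ownership rule from the commit message (the incoming coroutine cleans up unless the postcopy listen thread was started, in which case the listen thread's BH does it) can be sketched as a single-threaded model. Everything below is hypothetical and simplified: `schedule_bh()` and `run_main_loop_once()` only stand in for `migration_bh_schedule()` and the main loop, and the real `migration_incoming_state_destroy()` does far more than bump a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, single-threaded model of the cleanup ownership rule;
 * none of these names are QEMU APIs.
 */
typedef void (*BhFunc)(void);

static BhFunc pending_bh;   /* stands in for a scheduled bottom half */
static int destroy_runs;    /* how often the incoming state was destroyed */

/* Stand-in for migration_incoming_state_destroy(). */
static void incoming_state_destroy(void)
{
    destroy_runs++;
}

/* Stand-in for postcopy_listen_thread_bh(): performs the cleanup. */
static void listen_thread_bh(void)
{
    incoming_state_destroy();
}

/* Stand-in for migration_bh_schedule(); the model holds one pending BH. */
static void schedule_bh(BhFunc fn)
{
    assert(pending_bh == NULL);
    pending_bh = fn;
}

/* Runs the pending BH, if any, as one main-loop iteration would. */
static void run_main_loop_once(void)
{
    if (pending_bh != NULL) {
        BhFunc fn = pending_bh;
        pending_bh = NULL;
        fn();
    }
}

/* Counts cleanups for one simulated incoming migration under the new rule. */
static int new_rule_cleanup_runs(bool listen_thread_started, bool main_load_ok)
{
    destroy_runs = 0;
    pending_bh = NULL;

    /* The load outcome no longer decides who owns the cleanup. */
    (void)main_load_ok;

    if (listen_thread_started) {
        /* Listen thread exit path: always hand cleanup to the main loop. */
        schedule_bh(listen_thread_bh);
    } else {
        /* Incoming coroutine: it owns cleanup when no listen thread ran. */
        incoming_state_destroy();
    }

    run_main_loop_once();
    return destroy_runs;
}
```

In this model every combination of inputs runs cleanup exactly once. Deferring the listen thread's cleanup to a BH also moves it onto the main loop, which runs BHs with the BQL held; that is presumably why the patch no longer needs the explicit bql_lock()/bql_unlock() pair around migration_incoming_state_destroy().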