From nobody Sun Sep 28 17:05:12 2025 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=quarantine dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1757937763; cv=none; d=zohomail.com; s=zohoarc; b=Jqhn/ZXFsQvsWNUPPwZkX0l48Rh9HCJpdMso1VvE8D1bWqePUu1HMX29930GpFkFPoe7nQz3T7PdVUjDkvX3YBVDfCEydfwDTJU7YlK2u0ePyQdqRaexIqe6iLDWiW4GVNOWulsl0tREvnydnwd8MQr7t7x1KwPs1GHTavR28Qw= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1757937763; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=i7vTnrxV4V0xgC6bS9L8wnCdMT60fT69SjLq8DL/rM4=; b=lM4HNcUwoZTWGvgeCk4yVGpULOhD3S3hODbwxOB25zXkTTRznQKwJYf2c8R0fWrNQ2xUN6hbH5jwFNpxFwi43zPSK5wd+inT3shI2G6fQMY2ftyZgwRWhZrxFH0ezFQwcuY6+SBgeKKEJi7+HEIwmA2RE7FKRWPKzhEZvA8ckuU= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1757937763238780.7706393815128; Mon, 15 Sep 2025 05:02:43 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1uy7sf-00048y-99; Mon, 15 Sep 2025 08:00:25 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uy7sP-00046Y-9c for qemu-devel@nongnu.org; Mon, 15 Sep 2025 08:00:10 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1uy7sE-0004aa-Jf for qemu-devel@nongnu.org; Mon, 15 Sep 2025 08:00:06 -0400 Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-404-Y16CNe5eMJCZ1dHw3Pme7w-1; Mon, 15 Sep 2025 07:59:50 -0400 Received: from mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.17]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id E63271944F11; Mon, 15 Sep 2025 11:59:49 +0000 (UTC) Received: from fedora.redhat.com (unknown [10.45.224.193]) by mx-prod-int-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 2CD2119432AE; Mon, 15 Sep 2025 11:59:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1757937594; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=i7vTnrxV4V0xgC6bS9L8wnCdMT60fT69SjLq8DL/rM4=; b=P3WPHfD20Pu14WPLd/Ilh6xiur+NXcQf1IvR7T9WBxBDAkfwfeJEC48a537eg+t9irGvKl Mzxv+1VwHJY8OHlIb2ZANDpext565jdfZGXF+e/PQchqN7ao4o7dqvrI2VDZkzQvlNLdub OQh+oT22ac/90LkNjjWKRXUua6XRDPw= X-MC-Unique: Y16CNe5eMJCZ1dHw3Pme7w-1 X-Mimecast-MFC-AGG-ID: Y16CNe5eMJCZ1dHw3Pme7w_1757937590 From: Juraj Marcin To: qemu-devel@nongnu.org Cc: Juraj Marcin , Jiri Denemark , Peter Xu , "Dr. David Alan Gilbert" , Fabiano Rosas Subject: [PATCH 3/4] migration: Refactor incoming cleanup into migration_incoming_finish() Date: Mon, 15 Sep 2025 13:59:14 +0200 Message-ID: <20250915115918.3520735-4-jmarcin@redhat.com> In-Reply-To: <20250915115918.3520735-1-jmarcin@redhat.com> References: <20250915115918.3520735-1-jmarcin@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 3.0 on 10.30.177.17 Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.133.124; envelope-from=jmarcin@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.035, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_VALIDITY_CERTIFIED_BLOCKED=0.001, RCVD_IN_VALIDITY_RPBL_BLOCKED=0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1757937767643116600 Content-Type: text/plain; charset="utf-8" From: Juraj Marcin Currently, there are two functions that are responsible for cleanup of the incoming migration state. With successful precopy, it's the main thread and with successful postcopy it's the listen thread. However, if postcopy fails during in the device load, both functions will try to do the cleanup. Moreover, when exit-on-error parameter was added, it was applied only to precopy. This patch refactors common cleanup and exiting on error into a helper function that can be started either from precopy or postcopy, reducing the duplication. If the listen thread has been started (the postcopy state is at least LISTENING), the listen thread is responsible for all cleanup and exiting, otherwise it's the main thread's responsibility. Signed-off-by: Juraj Marcin --- migration/migration.c | 64 ++++++++++++++++++++++++------------------- migration/migration.h | 1 + migration/savevm.c | 48 +++++++++++--------------------- 3 files changed, 53 insertions(+), 60 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 2c0b3a7229..7222e3de13 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -442,9 +442,19 @@ void migration_incoming_transport_cleanup(MigrationInc= omingState *mis) void migration_incoming_state_destroy(void) { struct MigrationIncomingState *mis =3D migration_incoming_get_current(= ); + PostcopyState ps =3D postcopy_state_get(); =20 multifd_recv_cleanup(); =20 + if (mis->have_listen_thread) { + qemu_thread_join(&mis->listen_thread); + mis->have_listen_thread =3D false; + } + + if (ps !=3D POSTCOPY_INCOMING_NONE) { + postcopy_ram_incoming_cleanup(mis); + } + /* * RAM state cleanup needs to happen after multifd cleanup, because * multifd threads can use some of its states (receivedmap). @@ -809,6 +819,23 @@ static void qemu_start_incoming_migration(const char *= uri, bool has_channels, cpr_state_close(); } =20 +void migration_incoming_finish(void) +{ + MigrationState *s =3D migrate_get_current(); + MigrationIncomingState *mis =3D migration_incoming_get_current(); + + migration_incoming_state_destroy(); + + if (migration_has_failed(mis->state) && mis->exit_on_error) { + WITH_QEMU_LOCK_GUARD(&s->error_mutex) { + error_report_err(s->error); + s->error =3D NULL; + } + + exit(EXIT_FAILURE); + } +} + static void process_incoming_migration_bh(void *opaque) { MigrationIncomingState *mis =3D opaque; @@ -861,7 +888,7 @@ static void process_incoming_migration_bh(void *opaque) */ migrate_set_state(&mis->state, MIGRATION_STATUS_ACTIVE, MIGRATION_STATUS_COMPLETED); - migration_incoming_state_destroy(); + migration_incoming_finish(); } =20 static void coroutine_fn @@ -888,23 +915,13 @@ process_incoming_migration_co(void *opaque) =20 ps =3D postcopy_state_get(); trace_process_incoming_migration_co_end(ret, ps); - if (ps !=3D POSTCOPY_INCOMING_NONE) { - if (ps =3D=3D POSTCOPY_INCOMING_ADVISE) { - /* - * Where a migration had postcopy enabled (and thus went to ad= vise) - * but managed to complete within the precopy period, we can u= se - * the normal exit. - */ - postcopy_ram_incoming_cleanup(mis); - } else if (ret >=3D 0) { - /* - * Postcopy was started, cleanup should happen at the end of t= he - * postcopy thread. - */ - trace_process_incoming_migration_co_postcopy_end_main(); - goto out; - } - /* Else if something went wrong then just fall out of the normal e= xit */ + if (ps >=3D POSTCOPY_INCOMING_LISTENING) { + /* + * Postcopy was started, cleanup should happen at the end of the + * postcopy thread. + */ + trace_process_incoming_migration_co_postcopy_end_main(); + goto out; } =20 if (ret < 0) { @@ -926,16 +943,7 @@ fail: migrate_set_error(s, local_err); error_free(local_err); =20 - migration_incoming_state_destroy(); - - if (mis->exit_on_error) { - WITH_QEMU_LOCK_GUARD(&s->error_mutex) { - error_report_err(s->error); - s->error =3D NULL; - } - - exit(EXIT_FAILURE); - } + migration_incoming_finish(); out: /* Pairs with the refcount taken in qmp_migrate_incoming() */ migrate_incoming_unref_outgoing_state(); diff --git a/migration/migration.h b/migration/migration.h index 2c2331f40d..67e3318467 100644 --- a/migration/migration.h +++ b/migration/migration.h @@ -518,6 +518,7 @@ void migrate_set_state(MigrationStatus *state, Migratio= nStatus old_state, void migration_fd_process_incoming(QEMUFile *f); void migration_ioc_process_incoming(QIOChannel *ioc, Error **errp); void migration_incoming_process(void); +void migration_incoming_finish(void); =20 bool migration_has_all_channels(void); =20 diff --git a/migration/savevm.c b/migration/savevm.c index fabbeb296a..d7eb416d48 100644 --- a/migration/savevm.c +++ b/migration/savevm.c @@ -2069,6 +2069,11 @@ static int loadvm_postcopy_ram_handle_discard(Migrat= ionIncomingState *mis, return 0; } =20 +static void postcopy_ram_listen_thread_bh(void *opaque) +{ + migration_incoming_finish(); +} + /* * Triggered by a postcopy_listen command; this thread takes over reading * the input stream, leaving the main thread free to carry on loading the = rest @@ -2122,52 +2127,31 @@ static void *postcopy_ram_listen_thread(void *opaqu= e) "bitmaps may be lost, and present migrated dirty " "bitmaps are correctly migrated and valid.", __func__, load_res); - load_res =3D 0; /* prevent further exit() */ } else { error_report("%s: loadvm failed: %d", __func__, load_res); migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIV= E, MIGRATION_STATUS_FAILED); + goto out; } } - if (load_res >=3D 0) { - /* - * This looks good, but it's possible that the device loading in t= he - * main thread hasn't finished yet, and so we might not be in 'RUN' - * state yet; wait for the end of the main thread. - */ - qemu_event_wait(&mis->main_thread_load_event); - } - postcopy_ram_incoming_cleanup(mis); - - if (load_res < 0) { - /* - * If something went wrong then we have a bad state so exit; - * depending how far we got it might be possible at this point - * to leave the guest running and fire MCEs for pages that never - * arrived as a desperate recovery step. - */ - rcu_unregister_thread(); - exit(EXIT_FAILURE); - } + /* + * This looks good, but it's possible that the device loading in the + * main thread hasn't finished yet, and so we might not be in 'RUN' + * state yet; wait for the end of the main thread. + */ + qemu_event_wait(&mis->main_thread_load_event); =20 migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE, MIGRATION_STATUS_COMPLETED); - /* - * If everything has worked fine, then the main thread has waited - * for us to start, and we're the last use of the mis. - * (If something broke then qemu will have to exit anyway since it's - * got a bad migration state). - */ - bql_lock(); - migration_incoming_state_destroy(); - bql_unlock(); =20 +out: rcu_unregister_thread(); - mis->have_listen_thread =3D false; postcopy_state_set(POSTCOPY_INCOMING_END); =20 object_unref(OBJECT(migr)); =20 + migration_bh_schedule(postcopy_ram_listen_thread_bh, NULL); + return NULL; } =20 @@ -2217,7 +2201,7 @@ static int loadvm_postcopy_handle_listen(MigrationInc= omingState *mis) mis->have_listen_thread =3D true; postcopy_thread_create(mis, &mis->listen_thread, MIGRATION_THREAD_DST_LISTEN, - postcopy_ram_listen_thread, QEMU_THREAD_DETACHE= D); + postcopy_ram_listen_thread, QEMU_THREAD_JOINABL= E); trace_loadvm_postcopy_handle_listen("return"); =20 return 0; --=20 2.51.0