From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Fabiano Rosas, David Hildenbrand, peterx@redhat.com, Paolo Bonzini, Juraj Marcin
Subject: [PULL 33/36] migration: Refactor all incoming cleanup into migration_incoming_state_destroy()
Date: Mon, 3 Nov 2025 16:06:22 -0500
Message-ID: <20251103210625.3689448-34-peterx@redhat.com>
In-Reply-To: <20251103210625.3689448-1-peterx@redhat.com>
References: <20251103210625.3689448-1-peterx@redhat.com>

From: Juraj Marcin

Currently, two functions are responsible for calling the cleanup of the
incoming migration state. After a successful precopy, it is the incoming
migration coroutine; after a successful postcopy, it is the postcopy
listen thread. However, if postcopy fails during the device load, both
functions try to do the cleanup.
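One way to picture the fix for this double cleanup is a single-owner rule: the coroutine runs the cleanup only when no listen thread has been started, and otherwise the listen thread does. The following is a minimal, single-threaded C model of that rule under stated assumptions; IncomingModel, incoming_cleanup(), coroutine_end() and listen_thread_end() are hypothetical names, not QEMU code, and the model ignores locking and timing entirely.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the incoming side (not QEMU API): it only
 * tracks who owns the cleanup and how often the cleanup ran.
 */
typedef struct {
    bool have_listen_thread;  /* set once the postcopy listen thread starts */
    int cleanup_count;        /* number of times cleanup has run */
} IncomingModel;

/* Stand-in for the common cleanup function; must run exactly once. */
static void incoming_cleanup(IncomingModel *m)
{
    m->cleanup_count++;
    assert(m->cleanup_count == 1);
}

/* End of the incoming coroutine: clean up only if no listen thread owns it. */
static void coroutine_end(IncomingModel *m)
{
    if (m->have_listen_thread) {
        return;  /* the listen thread is responsible for cleanup */
    }
    incoming_cleanup(m);
}

/* End of the listen thread: once started, it owns the cleanup. */
static void listen_thread_end(IncomingModel *m)
{
    incoming_cleanup(m);
}
```

With a precopy-only run, coroutine_end() performs the single cleanup; with a listen thread present, coroutine_end() returns early and listen_thread_end() performs it, so no path runs the cleanup twice.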
This patch moves all cleanup that needs to be done on the incoming side
into a common function and defines a clear boundary for who is
responsible for the cleanup. The incoming migration coroutine is
responsible for calling the cleanup function, unless the listen thread
has been started, in which case the postcopy listen thread runs the
incoming migration cleanup in its BH.

Signed-off-by: Juraj Marcin
Fixes: 9535435795 ("migration: push Error **errp into qemu_loadvm_state()")
Reviewed-by: Peter Xu
Link: https://lore.kernel.org/r/20251103183301.3840862-6-jmarcin@redhat.com
Signed-off-by: Peter Xu
---
 migration/migration.h    |  1 +
 migration/migration.c    | 44 +++++++++------------------
 migration/postcopy-ram.c | 63 +++++++++++++++++++++-------------------
 migration/trace-events   |  2 +-
 4 files changed, 49 insertions(+), 61 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index 01329bf824..4a37f7202c 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -254,6 +254,7 @@ struct MigrationIncomingState {
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
 void migration_incoming_transport_cleanup(MigrationIncomingState *mis);
+void migration_incoming_qemu_exit(void);
 /*
  * Functions to work with blocktime context
  */
diff --git a/migration/migration.c b/migration/migration.c
index 9a367f717e..637be71bfe 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -438,10 +438,15 @@ void migration_incoming_transport_cleanup(MigrationIncomingState *mis)
 
 void migration_incoming_state_destroy(void)
 {
-    struct MigrationIncomingState *mis = migration_incoming_get_current();
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyState ps = postcopy_state_get();
 
     multifd_recv_cleanup();
 
+    if (ps != POSTCOPY_INCOMING_NONE) {
+        postcopy_incoming_cleanup(mis);
+    }
+
     /*
      * RAM state cleanup needs to happen after multifd cleanup, because
      * multifd threads can use some of its states (receivedmap).
@@ -866,7 +871,6 @@ process_incoming_migration_co(void *opaque)
 {
     MigrationState *s = migrate_get_current();
     MigrationIncomingState *mis = migration_incoming_get_current();
-    PostcopyState ps;
     int ret;
     Error *local_err = NULL;
 
@@ -883,25 +887,14 @@ process_incoming_migration_co(void *opaque)
 
     trace_vmstate_downtime_checkpoint("dst-precopy-loadvm-completed");
 
-    ps = postcopy_state_get();
-    trace_process_incoming_migration_co_end(ret, ps);
-    if (ps != POSTCOPY_INCOMING_NONE) {
-        if (ps == POSTCOPY_INCOMING_ADVISE) {
-            /*
-             * Where a migration had postcopy enabled (and thus went to advise)
-             * but managed to complete within the precopy period, we can use
-             * the normal exit.
-             */
-            postcopy_incoming_cleanup(mis);
-        } else if (ret >= 0) {
-            /*
-             * Postcopy was started, cleanup should happen at the end of the
-             * postcopy thread.
-             */
-            trace_process_incoming_migration_co_postcopy_end_main();
-            goto out;
-        }
-        /* Else if something went wrong then just fall out of the normal exit */
+    trace_process_incoming_migration_co_end(ret);
+    if (mis->have_listen_thread) {
+        /*
+         * Postcopy was started, cleanup should happen at the end of the
+         * postcopy listen thread.
+         */
+        trace_process_incoming_migration_co_postcopy_end_main();
+        goto out;
     }
 
     if (ret < 0) {
@@ -933,15 +926,6 @@ fail:
         }
 
         exit(EXIT_FAILURE);
-    } else {
-        /*
-         * Report the error here in case that QEMU abruptly exits
-         * when postcopy is enabled.
-         */
-        WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
-            error_report_err(s->error);
-            s->error = NULL;
-        }
     }
 out:
     /* Pairs with the refcount taken in qmp_migrate_incoming() */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index b47c955763..48cbb46c27 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -2078,6 +2078,24 @@ bool postcopy_is_paused(MigrationStatus status)
         status == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP;
 }
 
+static void postcopy_listen_thread_bh(void *opaque)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    migration_incoming_state_destroy();
+
+    if (mis->state == MIGRATION_STATUS_FAILED) {
+        /*
+         * If something went wrong then we have a bad state so exit;
+         * we only could have gotten here if something failed before
+         * POSTCOPY_INCOMING_RUNNING (for example device load), otherwise
+         * postcopy migration would pause inside qemu_loadvm_state_main().
+         * Failing dirty-bitmaps won't fail the whole migration.
+         */
+        exit(1);
+    }
+}
+
 /*
  * Triggered by a postcopy_listen command; this thread takes over reading
  * the input stream, leaving the main thread free to carry on loading the rest
@@ -2131,53 +2149,38 @@ static void *postcopy_listen_thread(void *opaque)
                          "bitmaps are correctly migrated and valid.",
                          __func__, load_res, error_get_pretty(local_err));
             g_clear_pointer(&local_err, error_free);
-            load_res = 0; /* prevent further exit() */
         } else {
+            /*
+             * Something went fatally wrong and we have a bad state, QEMU will
+             * exit depending on if postcopy-exit-on-error is true, but the
+             * migration cannot be recovered.
+             */
             error_prepend(&local_err,
                           "loadvm failed during postcopy: %d: ", load_res);
             migrate_set_error(migr, local_err);
             g_clear_pointer(&local_err, error_report_err);
             migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                                            MIGRATION_STATUS_FAILED);
+            goto out;
         }
     }
-    if (load_res >= 0) {
-        /*
-         * This looks good, but it's possible that the device loading in the
-         * main thread hasn't finished yet, and so we might not be in 'RUN'
-         * state yet; wait for the end of the main thread.
-         */
-        qemu_event_wait(&mis->main_thread_load_event);
-    }
-    postcopy_incoming_cleanup(mis);
-
-    if (load_res < 0) {
-        /*
-         * If something went wrong then we have a bad state so exit;
-         * depending how far we got it might be possible at this point
-         * to leave the guest running and fire MCEs for pages that never
-         * arrived as a desperate recovery step.
-         */
-        rcu_unregister_thread();
-        exit(EXIT_FAILURE);
-    }
+    /*
+     * This looks good, but it's possible that the device loading in the
+     * main thread hasn't finished yet, and so we might not be in 'RUN'
+     * state yet; wait for the end of the main thread.
+     */
+    qemu_event_wait(&mis->main_thread_load_event);
 
     migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                                    MIGRATION_STATUS_COMPLETED);
-    /*
-     * If everything has worked fine, then the main thread has waited
-     * for us to start, and we're the last use of the mis.
-     * (If something broke then qemu will have to exit anyway since it's
-     * got a bad migration state).
-     */
-    bql_lock();
-    migration_incoming_state_destroy();
-    bql_unlock();
 
+out:
     rcu_unregister_thread();
     mis->have_listen_thread = false;
     postcopy_state_set(POSTCOPY_INCOMING_END);
 
+    migration_bh_schedule(postcopy_listen_thread_bh, NULL);
+
     object_unref(OBJECT(migr));
 
     return NULL;
diff --git a/migration/trace-events b/migration/trace-events
index e8edd1fbba..772636f3ac 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -193,7 +193,7 @@ source_return_path_thread_resume_ack(uint32_t v) "%"PRIu32
 source_return_path_thread_switchover_acked(void) ""
 migration_thread_low_pending(uint64_t pending) "%" PRIu64
 migrate_transferred(uint64_t transferred, uint64_t time_spent, uint64_t bandwidth, uint64_t avail_bw, uint64_t size) "transferred %" PRIu64 " time_spent %" PRIu64 " bandwidth %" PRIu64 " switchover_bw %" PRIu64 " max_size %" PRId64
-process_incoming_migration_co_end(int ret, int ps) "ret=%d postcopy-state=%d"
+process_incoming_migration_co_end(int ret) "ret=%d"
 process_incoming_migration_co_postcopy_end_main(void) ""
 postcopy_preempt_enabled(bool value) "%d"
 migration_precopy_complete(void) ""
-- 
2.50.1
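With this patch the postcopy listen thread no longer calls migration_incoming_state_destroy() itself under bql_lock(); it schedules postcopy_listen_thread_bh() with migration_bh_schedule() so that the cleanup runs in the main loop. The sketch below is a rough, hypothetical model of that hand-off pattern in plain C with POSIX threads: the worker only queues a callback, and the main thread later dispatches it. BHQueue, bh_schedule(), bh_run_pending(), ListenModel and run_listen_model() are illustrative names, not QEMU APIs, and pthread_join() merely stands in for the main loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define BH_QUEUE_MAX 8  /* fixed capacity, enough for a sketch */

/* Hypothetical names throughout; this is not QEMU's bottom-half API. */
typedef void BHFunc(void *opaque);

typedef struct {
    pthread_mutex_t lock;
    BHFunc *fns[BH_QUEUE_MAX];
    void *args[BH_QUEUE_MAX];
    size_t count;
} BHQueue;

/* Any thread: queue a callback for the main thread; never run it here. */
static void bh_schedule(BHQueue *q, BHFunc *fn, void *opaque)
{
    pthread_mutex_lock(&q->lock);
    assert(q->count < BH_QUEUE_MAX);
    q->fns[q->count] = fn;
    q->args[q->count] = opaque;
    q->count++;
    pthread_mutex_unlock(&q->lock);
}

/* Main thread: drain the queue, then run callbacks without the lock held. */
static size_t bh_run_pending(BHQueue *q)
{
    BHFunc *fns[BH_QUEUE_MAX];
    void *args[BH_QUEUE_MAX];
    size_t n;

    pthread_mutex_lock(&q->lock);
    n = q->count;
    for (size_t i = 0; i < n; i++) {
        fns[i] = q->fns[i];
        args[i] = q->args[i];
    }
    q->count = 0;
    pthread_mutex_unlock(&q->lock);

    for (size_t i = 0; i < n; i++) {
        fns[i](args[i]);
    }
    return n;
}

typedef struct {
    BHQueue *queue;
    pthread_t main_thread;
    bool cleaned_up;
    bool cleanup_on_main;
} ListenModel;

/* Stand-in for the cleanup BH: records which thread it ran on. */
static void listen_cleanup_bh(void *opaque)
{
    ListenModel *m = opaque;

    m->cleanup_on_main = pthread_equal(pthread_self(), m->main_thread);
    m->cleaned_up = true;
}

/* Stand-in for the listen thread: finish its work, then defer cleanup. */
static void *listen_worker(void *opaque)
{
    ListenModel *m = opaque;

    bh_schedule(m->queue, listen_cleanup_bh, m);
    return NULL;
}

/* Returns true if the deferred cleanup ran exactly once, on the main thread. */
static bool run_listen_model(void)
{
    BHQueue q = { .count = 0 };
    ListenModel m = { .queue = &q, .main_thread = pthread_self() };
    pthread_t tid;
    size_t ran;

    pthread_mutex_init(&q.lock, NULL);
    if (pthread_create(&tid, NULL, listen_worker, &m) != 0) {
        pthread_mutex_destroy(&q.lock);
        return false;
    }
    pthread_join(tid, NULL);  /* stands in for the main loop waiting */
    ran = bh_run_pending(&q);
    pthread_mutex_destroy(&q.lock);
    return ran == 1 && m.cleaned_up && m.cleanup_on_main;
}
```

Here the worker never touches the shared state it is cleaning up; it only enqueues listen_cleanup_bh(), which then runs on the thread that calls bh_run_pending(). Running the callbacks after dropping the queue lock means a callback could schedule another callback without deadlocking.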