From nobody Sat May 4 21:01:26 2024
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: "Dr . David Alan Gilbert", peterx@redhat.com, Juan Quintela
Date: Mon, 11 Jun 2018 14:02:27 +0800
Message-Id: <20180611060228.2998-2-peterx@redhat.com>
In-Reply-To: <20180611060228.2998-1-peterx@redhat.com>
References: <20180611060228.2998-1-peterx@redhat.com>
Subject: [Qemu-devel] [PATCH 1/2] migration: unbreak postcopy recovery

It was broken due to recent changes in two parts:

- migration_incoming_process() is now called even for postcopy
  recovery sessions (while it shouldn't be)
- We no longer call migration_fd_process_incoming() (except in the
  RDMA code), so the postcopy recovery logic is fully bypassed

Fix this up to make sure we only call migration_incoming_process() when
necessary, and move the postcopy recovery logic much earlier, to the
entry point of incoming migration.  Rename
migration_fd_process_incoming() to postcopy_try_recover() since it is
mostly for the recovery process, then touch up the RDMA code to suit it.

While at it, refactor the incoming port handling to have one single
entry point for incoming migration.  Then we can avoid calling
migration_incoming_process() everywhere, which is really error prone.

Fixes: 36c2f8be2c ("migration: Delay start of migration main routines")
Signed-off-by: Peter Xu
---
 migration/ram.h       |  2 +-
 migration/exec.c      |  3 ---
 migration/fd.c        |  3 ---
 migration/migration.c | 34 ++++++++++++++++++++++++++--------
 migration/ram.c       | 11 +++++------
 migration/rdma.c      |  3 ++-
 migration/socket.c    |  5 -----
 7 files changed, 34 insertions(+), 27 deletions(-)

diff --git a/migration/ram.h b/migration/ram.h
index d386f4d641..457bf54b8c 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -46,7 +46,7 @@ int multifd_save_cleanup(Error **errp);
 int multifd_load_setup(void);
 int multifd_load_cleanup(Error **errp);
 bool multifd_recv_all_channels_created(void);
-void multifd_recv_new_channel(QIOChannel *ioc);
+bool multifd_recv_new_channel(QIOChannel *ioc);
 
 uint64_t ram_pagesize_summary(void);
 int ram_save_queue_pages(const char *rbname, ram_addr_t start, ram_addr_t len);
diff --git a/migration/exec.c b/migration/exec.c
index 0bbeb63c97..375d2e1b54 100644
--- a/migration/exec.c
+++ b/migration/exec.c
@@ -49,9 +49,6 @@ static gboolean exec_accept_incoming_migration(QIOChannel *ioc,
 {
     migration_channel_process_incoming(ioc);
     object_unref(OBJECT(ioc));
-    if (!migrate_use_multifd()) {
-        migration_incoming_process();
-    }
     return G_SOURCE_REMOVE;
 }
 
diff --git a/migration/fd.c b/migration/fd.c
index fee34ffdc0..a7c13df4ad 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -49,9 +49,6 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
 {
     migration_channel_process_incoming(ioc);
     object_unref(OBJECT(ioc));
-    if (!migrate_use_multifd()) {
-        migration_incoming_process();
-    }
     return G_SOURCE_REMOVE;
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index 1e99ec9b7e..b8ed3dcd2f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -461,7 +461,8 @@ void migration_incoming_process(void)
     qemu_coroutine_enter(co);
 }
 
-void migration_fd_process_incoming(QEMUFile *f)
+/* Returns true if recovered from a paused migration, otherwise false */
+static bool postcopy_try_recover(QEMUFile *f)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
 
@@ -486,23 +487,40 @@ void migration_fd_process_incoming(QEMUFile *f)
          * that source is ready to reply to page requests.
          */
         qemu_sem_post(&mis->postcopy_pause_sem_dst);
-    } else {
-        /* New incoming migration */
-        migration_incoming_setup(f);
-        migration_incoming_process();
+        return true;
     }
+
+    return false;
 }
 
 void migration_ioc_process_incoming(QIOChannel *ioc)
 {
     MigrationIncomingState *mis = migration_incoming_get_current();
+    QEMUFile *f = qemu_fopen_channel_input(ioc);
+    bool start_migration;
+
+    /* If it's a recovery attempt, we're done */
+    if (postcopy_try_recover(f)) {
+        return;
+    }
 
     if (!mis->from_src_file) {
-        QEMUFile *f = qemu_fopen_channel_input(ioc);
+        /* The first connection (multifd may have multiple) */
         migration_incoming_setup(f);
-        return;
+        /*
+         * Common migration only needs one channel, so we can start
+         * right now.  Multifd needs more than one channel, we wait.
+         */
+        start_migration = !migrate_use_multifd();
+    } else {
+        /* Multiple connections */
+        assert(migrate_use_multifd());
+        start_migration = multifd_recv_new_channel(ioc);
+    }
+
+    if (start_migration) {
+        migration_incoming_process();
     }
-    multifd_recv_new_channel(ioc);
 }
 
 /**
diff --git a/migration/ram.c b/migration/ram.c
index a500015a2f..0d8f38d968 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -871,7 +871,8 @@ bool multifd_recv_all_channels_created(void)
     return thread_count == atomic_read(&multifd_recv_state->count);
 }
 
-void multifd_recv_new_channel(QIOChannel *ioc)
+/* Return true if multifd is ready for the migration, otherwise false */
+bool multifd_recv_new_channel(QIOChannel *ioc)
 {
     MultiFDRecvParams *p;
     Error *local_err = NULL;
@@ -880,7 +881,7 @@ void multifd_recv_new_channel(QIOChannel *ioc)
     id = multifd_recv_initial_packet(ioc, &local_err);
     if (id < 0) {
         multifd_recv_terminate_threads(local_err);
-        return;
+        return false;
     }
 
     p = &multifd_recv_state->params[id];
@@ -888,7 +889,7 @@ void multifd_recv_new_channel(QIOChannel *ioc)
         error_setg(&local_err, "multifd: received id '%d' already setup'",
                    id);
         multifd_recv_terminate_threads(local_err);
-        return;
+        return false;
     }
     p->c = ioc;
     object_ref(OBJECT(ioc));
@@ -897,9 +898,7 @@ void multifd_recv_new_channel(QIOChannel *ioc)
     qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
                        QEMU_THREAD_JOINABLE);
     atomic_inc(&multifd_recv_state->count);
-    if (multifd_recv_state->count == migrate_multifd_channels()) {
-        migration_incoming_process();
-    }
+    return multifd_recv_state->count == migrate_multifd_channels();
 }
 
 /**
diff --git a/migration/rdma.c b/migration/rdma.c
index 05aee3d591..0f5ee987c6 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -3687,7 +3687,8 @@ static void rdma_accept_incoming_migration(void *opaque)
     }
 
     rdma->migration_started_on_destination = 1;
-    migration_fd_process_incoming(f);
+    migration_incoming_setup(f);
+    migration_incoming_process();
 }
 
 void rdma_start_incoming_migration(const char *host_port, Error **errp)
diff --git a/migration/socket.c b/migration/socket.c
index 3456eb76e9..f4c8174400 100644
--- a/migration/socket.c
+++ b/migration/socket.c
@@ -168,12 +168,7 @@ static void socket_accept_incoming_migration(QIONetListener *listener,
     if (migration_has_all_channels()) {
         /* Close listening socket as its no longer needed */
         qio_net_listener_disconnect(listener);
-
         object_unref(OBJECT(listener));
-
-        if (!migrate_use_multifd()) {
-            migration_incoming_process();
-        }
     }
 }
 
-- 
2.17.1

From nobody Sat May 4 21:01:26 2024
From: Peter Xu
To: qemu-devel@nongnu.org
Cc: "Dr . David Alan Gilbert", peterx@redhat.com, Juan Quintela
Date: Mon, 11 Jun 2018 14:02:28 +0800
Message-Id: <20180611060228.2998-3-peterx@redhat.com>
In-Reply-To: <20180611060228.2998-1-peterx@redhat.com>
References: <20180611060228.2998-1-peterx@redhat.com>
Subject: [Qemu-devel] [PATCH 2/2] migration: delay postcopy paused state

Before this patch we first set up the postcopy-paused state and only
then cleaned up the QEMUFile handles.  That can be racy if there is a
very fast "migrate-recover" command running in parallel.  Fix that up
by cleaning up the handles before switching to the paused state.

Reported-by: Peter Maydell
Signed-off-by: Peter Xu
Reviewed-by: Juan Quintela
---
 migration/savevm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index c2f34ffc7c..851d74e8b6 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2194,9 +2194,6 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     /* Clear the triggered bit to allow one recovery */
     mis->postcopy_recover_triggered = false;
 
-    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
-                      MIGRATION_STATUS_POSTCOPY_PAUSED);
-
     assert(mis->from_src_file);
     qemu_file_shutdown(mis->from_src_file);
     qemu_fclose(mis->from_src_file);
@@ -2209,6 +2206,9 @@ static bool postcopy_pause_incoming(MigrationIncomingState *mis)
     mis->to_src_file = NULL;
     qemu_mutex_unlock(&mis->rp_mutex);
 
+    migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
+                      MIGRATION_STATUS_POSTCOPY_PAUSED);
+
     /* Notify the fault thread for the invalidated file handle */
     postcopy_fault_thread_notify(mis);
 
-- 
2.17.1
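
For readers who want to check the control flow of patch 1/2 without
building QEMU, the decision made at the single entry point can be
sketched as a standalone model.  This is only an illustration under
stated assumptions: struct incoming_model and the model_*() helpers are
hypothetical stand-ins invented here, not QEMU types or functions, and
the real code works on QIOChannel/QEMUFile objects rather than flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the incoming-side state (not QEMU's struct). */
struct incoming_model {
    bool paused;            /* destination sits in postcopy-paused state */
    bool have_main_channel; /* models mis->from_src_file != NULL */
    bool use_multifd;       /* models migrate_use_multifd() */
    int multifd_channels;   /* models migrate_multifd_channels() */
    int multifd_count;      /* multifd channels created so far */
    int started;            /* times the main routine was started */
    int recovered;          /* times a paused migration was resumed */
};

/* Models postcopy_try_recover(): true if this connection is a recovery. */
static bool model_try_recover(struct incoming_model *m)
{
    if (!m->paused) {
        return false;
    }
    m->paused = false;
    m->have_main_channel = true; /* the new connection replaces the old one */
    m->recovered++;
    return true;
}

/* Models multifd_recv_new_channel(): true once all channels exist. */
static bool model_multifd_new_channel(struct incoming_model *m)
{
    m->multifd_count++;
    return m->multifd_count == m->multifd_channels;
}

/* Models migration_ioc_process_incoming(): one call per new connection. */
static void model_process_incoming(struct incoming_model *m)
{
    bool start_migration;

    /* A recovery attempt never restarts the main routine. */
    if (model_try_recover(m)) {
        return;
    }

    if (!m->have_main_channel) {
        /* First connection: start now unless multifd needs more channels. */
        m->have_main_channel = true;
        start_migration = !m->use_multifd;
    } else {
        /* Later connections only make sense with multifd enabled. */
        assert(m->use_multifd);
        start_migration = model_multifd_new_channel(m);
    }

    if (start_migration) {
        m->started++;
    }
}
```

In this model the start of the main routine is decided in exactly one
place, which is the property the patch is after: a plain migration
starts on its first connection, a multifd migration starts once the last
multifd channel arrives, and a recovery connection returns early.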