From: Peter Xu
To: qemu-devel@nongnu.org
Cc: Laurent Vivier, Andrea Arcangeli, Juan Quintela, Alexey Perevalov, peterx@redhat.com, "Dr . David Alan Gilbert"
Date: Fri, 28 Jul 2017 16:06:38 +0800
Message-Id: <1501229198-30588-30-git-send-email-peterx@redhat.com>
In-Reply-To: <1501229198-30588-1-git-send-email-peterx@redhat.com>
References: <1501229198-30588-1-git-send-email-peterx@redhat.com>
Subject: [Qemu-devel] [RFC 29/29] migration: reset migrate thread vars when resumed

First, a MigThrError enumeration is introduced to better describe the
result of migration_detect_error(). This lets migration_thread() know
whether a recovery has happened.

Then, if a recovery is detected, migration_thread() resets its local
variables to prepare for the resumed migration.

Signed-off-by: Peter Xu
---
 migration/migration.c | 40 +++++++++++++++++++++++++++++----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index ecebe30..439bc22 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2159,6 +2159,15 @@ static bool postcopy_should_start(MigrationState *s)
     return atomic_read(&s->start_postcopy) || s->start_postcopy_fast;
 }
 
+typedef enum MigThrError {
+    /* No error detected */
+    MIG_THR_ERR_NONE = 0,
+    /* Detected error, but resumed successfully */
+    MIG_THR_ERR_RECOVERED = 1,
+    /* Detected fatal error, need to exit */
+    MIG_THR_ERR_FATAL = 2,
+} MigThrError;
+
 static int postcopy_resume_handshake(MigrationState *s)
 {
     qemu_mutex_lock(&s->resume_lock);
@@ -2209,10 +2218,10 @@ static int postcopy_do_resume(MigrationState *s)
 
 /*
  * We don't return until we are in a safe state to continue current
- * postcopy migration. Returns true to continue the migration, or
- * false to terminate current migration.
+ * postcopy migration. Returns MIG_THR_ERR_RECOVERED if recovered, or
+ * MIG_THR_ERR_FATAL if an unrecoverable failure happened.
  */
-static bool postcopy_pause(MigrationState *s)
+static MigThrError postcopy_pause(MigrationState *s)
 {
     assert(s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
@@ -2247,7 +2256,7 @@ do_pause:
         if (postcopy_do_resume(s) == 0) {
             /* Let's continue! */
             trace_postcopy_pause_continued();
-            return true;
+            return MIG_THR_ERR_RECOVERED;
         } else {
             /*
              * Something wrong happened during the recovery, let's
@@ -2258,12 +2267,11 @@ do_pause:
         }
     } else {
         /* This is not right... Time to quit. */
-        return false;
+        return MIG_THR_ERR_FATAL;
     }
 }
 
-/* Return true if we want to stop the migration, otherwise false. */
-static bool migration_detect_error(MigrationState *s)
+static MigThrError migration_detect_error(MigrationState *s)
 {
     int ret;
 
@@ -2272,7 +2280,7 @@ static bool migration_detect_error(MigrationState *s)
 
     if (!ret) {
         /* Everything is fine */
-        return false;
+        return MIG_THR_ERR_NONE;
     }
 
     if (s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
@@ -2281,7 +2289,7 @@ static bool migration_detect_error(MigrationState *s)
          * while. After that, it can be continued by a
          * recovery phase.
          */
-        return !postcopy_pause(s);
+        return postcopy_pause(s);
     } else {
         /*
          * For precopy (or postcopy with error outside IO), we fail
@@ -2291,7 +2299,7 @@ static bool migration_detect_error(MigrationState *s)
         trace_migration_thread_file_err();
 
         /* Time to stop the migration, now. */
-        return true;
+        return MIG_THR_ERR_FATAL;
     }
 }
 
@@ -2319,6 +2327,7 @@ static void *migration_thread(void *opaque)
     /* The active state we expect to be in; ACTIVE or POSTCOPY_ACTIVE */
     enum MigrationStatus current_active_state = MIGRATION_STATUS_ACTIVE;
     bool enable_colo = migrate_colo_enabled();
+    MigThrError thr_error;
 
     rcu_register_thread();
 
@@ -2395,8 +2404,17 @@ static void *migration_thread(void *opaque)
          * Try to detect any kind of failures, and see whether we
          * should stop the migration now.
          */
-        if (migration_detect_error(s)) {
+        thr_error = migration_detect_error(s);
+        if (thr_error == MIG_THR_ERR_FATAL) {
+            /* Stop migration */
             break;
+        } else if (thr_error == MIG_THR_ERR_RECOVERED) {
+            /*
+             * Just recovered from, e.g., a network failure; reset
+             * all the local variables.
+             */
+            initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+            initial_bytes = 0;
         }
 
         current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-- 
2.7.4
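
For reviewers who want to try the control flow outside of QEMU, here is a
minimal standalone sketch of the pattern this patch introduces: a tri-state
result replaces the old bool, so the loop can tell "recovered" apart from
"no error" and reset its measurement baseline. Only the MigThrError enum is
taken from the patch. LoopStats, run_loop(), the events array, and the use
of the iteration index as a clock are assumptions made for illustration and
are not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tri-state result, as introduced by the patch. */
typedef enum MigThrError {
    MIG_THR_ERR_NONE = 0,      /* No error detected */
    MIG_THR_ERR_RECOVERED = 1, /* Detected error, but resumed successfully */
    MIG_THR_ERR_FATAL = 2,     /* Detected fatal error, need to exit */
} MigThrError;

/* Hypothetical bookkeeping for the sketch; not a QEMU structure. */
typedef struct LoopStats {
    int64_t window_start;  /* "time" at which the current window began */
    uint64_t window_bytes; /* bytes counted since window_start */
    int iterations;        /* iterations that ran to completion */
    int resets;            /* number of baseline resets after recovery */
} LoopStats;

/*
 * Hypothetical driver: events[i] stands in for the value that
 * migration_detect_error() would return on iteration i, and the
 * iteration index stands in for the clock.
 */
static LoopStats run_loop(const MigThrError *events, size_t n_events)
{
    LoopStats st = { 0, 0, 0, 0 };

    assert(events != NULL || n_events == 0);

    for (size_t i = 0; i < n_events; i++) {
        MigThrError thr_error = events[i];

        if (thr_error == MIG_THR_ERR_FATAL) {
            /* Stop the loop, like the break in migration_thread(). */
            break;
        } else if (thr_error == MIG_THR_ERR_RECOVERED) {
            /* Start a fresh measurement window after the resume. */
            st.window_start = (int64_t)i;
            st.window_bytes = 0;
            st.resets++;
        }

        /* Pretend each completed iteration transfers 100 bytes. */
        st.window_bytes += 100;
        st.iterations++;
    }

    return st;
}
```

Feeding the sequence NONE, RECOVERED, NONE, FATAL, NONE into run_loop()
completes three iterations with one reset: the window restarts at index 1
and the loop stops at the fatal event without looking at the last entry.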