From: Fam Zheng <famz@redhat.com>
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Peter Maydell, zhanghailiang, Juan Quintela,
    "Dr. David Alan Gilbert", peterx@redhat.com, mreitz@redhat.com,
    stefanha@redhat.com, jsnow@redhat.com
Date: Sat, 17 Jun 2017 00:06:58 +0800
Message-Id: <20170616160658.32290-1-famz@redhat.com>
Subject: [Qemu-devel] [PATCH] migration: Fix race of image locking between src and dst

Previously, the dst side would immediately try to lock the write byte
upon receiving QEMU_VM_EOF, while the src side only calls
bdrv_inactivate_all() after sending it. If the src host is under load,
dst may fail to acquire the lock because it races with src releasing
it. Fix this by hoisting the bdrv_inactivate_all() operation before the
sending of QEMU_VM_EOF.

N.B. A further improvement would be to cleanly hand over the locks
between src and dst, so that there is no window in which a third QEMU
process could steal the locks and prevent both src and dst from
running.

Reported-by: Peter Maydell
Signed-off-by: Fam Zheng
Reviewed-by: Juan Quintela
---
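For reviewers who want to see the race in isolation, the stand-alone
sketch below reproduces the ordering problem outside of QEMU. It is an
illustrative program, not QEMU code and not part of this patch: a
flock() on a scratch file stands in for the image write-byte lock, a
pipe stands in for the migration stream carrying QEMU_VM_EOF, and every
name in it is invented for the demo.

/*
 * Illustrative demo of the race (not QEMU code).  flock() on a scratch
 * file plays the image write-byte lock; a pipe plays the migration
 * stream carrying QEMU_VM_EOF.
 */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>

int main(void)
{
    const char *path = "/tmp/image-lock-demo";
    int stream[2];
    int src_fd = open(path, O_CREAT | O_RDWR, 0600);
    pid_t pid;

    if (src_fd < 0 || pipe(stream) < 0) {
        perror("setup");
        return 1;
    }
    flock(src_fd, LOCK_EX);                 /* src owns the image lock */

    pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {                         /* dst side */
        char eof;
        /* reopen: flock() locks belong to the open file description */
        int dst_fd = open(path, O_RDWR);

        read(stream[0], &eof, 1);           /* wait for "QEMU_VM_EOF" */
        if (flock(dst_fd, LOCK_EX | LOCK_NB) < 0) {
            printf("dst: lock still held by src (the race)\n");
            _exit(1);
        }
        printf("dst: image lock acquired\n");
        _exit(0);
    }

    /*
     * Fixed ordering: drop the lock (the bdrv_inactivate_all()
     * analogue) before sending EOF.  Swapping the next two lines
     * restores the old ordering and makes dst fail intermittently,
     * which is the reported race.
     */
    flock(src_fd, LOCK_UN);
    write(stream[1], "E", 1);

    wait(NULL);
    unlink(path);
    return 0;
}

With the fixed ordering, the unlock is sequenced before dst's read()
returns, so dst's non-blocking lock attempt always succeeds; the patch
below establishes the same ordering inside
qemu_savevm_state_complete_precopy().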
 migration/colo.c      |  2 +-
 migration/migration.c | 19 +++++++------------
 migration/savevm.c    | 19 +++++++++++++++----
 migration/savevm.h    |  3 ++-
 4 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/migration/colo.c b/migration/colo.c
index c436d63..c4ba4c3 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -352,7 +352,7 @@ static int colo_do_checkpoint_transaction(MigrationState *s,
     qemu_savevm_state_header(fb);
     qemu_savevm_state_begin(fb);
     qemu_mutex_lock_iothread();
-    qemu_savevm_state_complete_precopy(fb, false);
+    qemu_savevm_state_complete_precopy(fb, false, false);
     qemu_mutex_unlock_iothread();

     qemu_fflush(fb);
diff --git a/migration/migration.c b/migration/migration.c
index b9d8798..f588329 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1553,7 +1553,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      * Cause any non-postcopiable, but iterative devices to
      * send out their final data.
      */
-    qemu_savevm_state_complete_precopy(ms->to_dst_file, true);
+    qemu_savevm_state_complete_precopy(ms->to_dst_file, true, false);

     /*
      * in Finish migrate and with the io-lock held everything should
@@ -1597,7 +1597,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
      */
     qemu_savevm_send_postcopy_listen(fb);

-    qemu_savevm_state_complete_precopy(fb, false);
+    qemu_savevm_state_complete_precopy(fb, false, false);
     qemu_savevm_send_ping(fb, 3);

     qemu_savevm_send_postcopy_run(fb);
@@ -1695,20 +1695,15 @@ static void migration_completion(MigrationState *s, int current_active_state,
         ret = global_state_store();

         if (!ret) {
+            bool inactivate = !migrate_colo_enabled();
             ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
             if (ret >= 0) {
                 qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
-                qemu_savevm_state_complete_precopy(s->to_dst_file, false);
+                ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
+                                                         inactivate);
             }
-            /*
-             * Don't mark the image with BDRV_O_INACTIVE flag if
-             * we will go into COLO stage later.
-             */
-            if (ret >= 0 && !migrate_colo_enabled()) {
-                ret = bdrv_inactivate_all();
-                if (ret >= 0) {
-                    s->block_inactive = true;
-                }
+            if (inactivate && ret >= 0) {
+                s->block_inactive = true;
             }
         }
         qemu_mutex_unlock_iothread();
diff --git a/migration/savevm.c b/migration/savevm.c
index f32a82d..6bfd489 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1104,7 +1104,8 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
     qemu_fflush(f);
 }

-void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
+int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
+                                       bool inactivate_disks)
 {
     QJSON *vmdesc;
     int vmdesc_len;
@@ -1138,12 +1139,12 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
         save_section_footer(f, se);
         if (ret < 0) {
             qemu_file_set_error(f, ret);
-            return;
+            return -1;
         }
     }

     if (iterable_only) {
-        return;
+        return 0;
     }

     vmdesc = qjson_new();
@@ -1173,6 +1174,15 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
         json_end_object(vmdesc);
     }

+    if (inactivate_disks) {
+        /* Inactivate before sending QEMU_VM_EOF so that the
+         * bdrv_invalidate_cache_all() on the other end won't fail. */
+        ret = bdrv_inactivate_all();
+        if (ret) {
+            qemu_file_set_error(f, ret);
+            return ret;
+        }
+    }
     if (!in_postcopy) {
         /* Postcopy stream will still be going */
         qemu_put_byte(f, QEMU_VM_EOF);
@@ -1190,6 +1200,7 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
     qjson_destroy(vmdesc);

     qemu_fflush(f);
+    return 0;
 }

 /* Give an estimate of the amount left to be transferred,
@@ -1263,7 +1274,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)

     ret = qemu_file_get_error(f);
     if (ret == 0) {
-        qemu_savevm_state_complete_precopy(f, false);
+        qemu_savevm_state_complete_precopy(f, false, false);
         ret = qemu_file_get_error(f);
     }
     qemu_savevm_state_cleanup();
diff --git a/migration/savevm.h b/migration/savevm.h
index 45b59c1..5a2ed11 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -35,7 +35,8 @@ void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
 void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
-void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only);
+int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
+                                       bool inactivate_disks);
 void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
                                uint64_t *res_non_postcopiable,
                                uint64_t *res_postcopiable);
--
2.9.4