From nobody Thu May 16 17:55:05 2024
From: Stefan Hajnoczi
Date: Fri, 27 Apr 2018 17:23:11 +0100
Message-Id: <20180427162312.18583-2-stefanha@redhat.com>
In-Reply-To: <20180427162312.18583-1-stefanha@redhat.com>
References: <20180427162312.18583-1-stefanha@redhat.com>
Subject: [Qemu-devel] [PATCH v2 1/2] block/file-posix: implement bdrv_co_invalidate_cache() on Linux
Cc: Kevin Wolf, Fam Zheng, Sergio Lopez, qemu-block@nongnu.org, "Dr. David Alan Gilbert", Markus Armbruster, Stefan Hajnoczi, Max Reitz

On Linux posix_fadvise(POSIX_FADV_DONTNEED) invalidates pages*. Use this
to drop page cache on the destination host during shared storage
migration. This way the destination host will read the latest copy of
the data and will not use stale data from the page cache.

The flow is as follows:

1. Source host writes out all dirty pages and inactivates drives.
2. QEMU_VM_EOF is sent on migration stream.
3. Destination host invalidates caches before accessing drives.

This patch enables live migration even with -drive cache.direct=off.

* Terms and conditions may apply, please see patch for details.

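For readers who want to try the mechanism outside of QEMU, the flush-then-fadvise sequence described above can be sketched as a standalone helper. This is only an illustration under stated assumptions: a Linux host with glibc, a hypothetical function name drop_page_cache(), and plain fsync() standing in for bdrv_co_flush(), which exists only inside QEMU.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose posix_fadvise() on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: write out dirty pages for fd,
 * then ask the kernel to drop its cached pages for the whole file.
 * Returns 0 on success or a negative errno value.
 */
int drop_page_cache(int fd)
{
    int ret;

    /* Dirty pages are not invalidated, so write them out first. */
    if (fsync(fd) < 0) {
        return -errno;
    }

    /* offset 0, len 0 covers the whole file; failure is a positive errno. */
    ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    return ret == 0 ? 0 : -ret;
}
```

As in the patch, the request is advisory: pages that are dirty, locked, or mapped by another process may stay in the cache even when the call returns 0.
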
Signed-off-by: Stefan Hajnoczi
Reviewed-by: Fam Zheng
---
 block/file-posix.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 3794c0007a..3707ea2d1c 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2236,6 +2236,49 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     return ret | BDRV_BLOCK_OFFSET_VALID;
 }
 
+static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
+                                                 Error **errp)
+{
+    BDRVRawState *s = bs->opaque;
+    int ret;
+
+    ret = fd_open(bs);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "The file descriptor is not open");
+        return;
+    }
+
+    if (s->open_flags & O_DIRECT) {
+        return; /* No host kernel page cache */
+    }
+
+#if defined(__linux__)
+    /* This sets the scene for the next syscall... */
+    ret = bdrv_co_flush(bs);
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "flush failed");
+        return;
+    }
+
+    /* Linux does not invalidate pages that are dirty, locked, or mmapped by a
+     * process. These limitations are okay because we just fsynced the file,
+     * we don't use mmap, and the file should not be in use by other processes.
+     */
+    ret = posix_fadvise(s->fd, 0, 0, POSIX_FADV_DONTNEED);
+    if (ret != 0) { /* the return value is a positive errno */
+        error_setg_errno(errp, ret, "fadvise failed");
+        return;
+    }
+#else /* __linux__ */
+    /* Do nothing. Live migration to a remote host with cache.direct=off is
+     * unsupported on other host operating systems. Cache consistency issues
+     * may occur but no error is reported here, partly because that's the
+     * historical behavior and partly because it's hard to differentiate valid
+     * configurations that should not cause errors.
+     */
+#endif /* !__linux__ */
+}
+
 static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,
     int64_t offset, int bytes,
     BlockCompletionFunc *cb, void *opaque)
@@ -2328,6 +2371,7 @@ BlockDriver bdrv_file = {
     .bdrv_co_create_opts = raw_co_create_opts,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
     .bdrv_co_block_status = raw_co_block_status,
+    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
     .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
 
     .bdrv_co_preadv = raw_co_preadv,
@@ -2805,6 +2849,7 @@ static BlockDriver bdrv_host_device = {
     .bdrv_reopen_abort = raw_reopen_abort,
     .bdrv_co_create_opts = hdev_co_create_opts,
     .create_opts = &raw_create_opts,
+    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
     .bdrv_co_pwrite_zeroes = hdev_co_pwrite_zeroes,
 
     .bdrv_co_preadv = raw_co_preadv,
@@ -2927,6 +2972,7 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_reopen_abort = raw_reopen_abort,
     .bdrv_co_create_opts = hdev_co_create_opts,
     .create_opts = &raw_create_opts,
+    .bdrv_co_invalidate_cache = raw_co_invalidate_cache,
 
 
     .bdrv_co_preadv = raw_co_preadv,
-- 
2.14.3

From nobody Thu May 16 17:55:05 2024
From: Stefan Hajnoczi
Date: Fri, 27 Apr 2018 17:23:12 +0100
Message-Id: <20180427162312.18583-3-stefanha@redhat.com>
In-Reply-To: <20180427162312.18583-1-stefanha@redhat.com>
References: <20180427162312.18583-1-stefanha@redhat.com>
Subject: [Qemu-devel] [PATCH v2 2/2] block/file-posix: add x-check-page-cache=on|off option
Cc: Kevin Wolf, Fam Zheng, Sergio Lopez, qemu-block@nongnu.org, "Dr. David Alan Gilbert", Markus Armbruster, Stefan Hajnoczi, Max Reitz

mincore(2) checks whether pages are resident. Use it to verify that
page cache has been dropped.

You can trigger a verification failure by mmapping the image file from
another process that loads a byte from a page, forcing it to become
resident. bdrv_co_invalidate_cache() will fail while that process is
alive.

Signed-off-by: Stefan Hajnoczi
Reviewed-by: Fam Zheng
---
 qapi/block-core.json |   7 +++-
 block/file-posix.c   | 100 +++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index c50517bff3..21c3470234 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2530,6 +2530,10 @@
 # @locking: whether to enable file locking. If set to 'auto', only enable
 #           when Open File Descriptor (OFD) locking API is available
 #           (default: auto, since 2.10)
+# @x-check-cache-dropped: whether to check that page cache was dropped on live
+#                         migration. May cause noticeable delays if the image
+#                         file is large, do not use in production.
+#                         (default: off) (since: 2.13)
 #
 # Since: 2.9
 ##
@@ -2537,7 +2541,8 @@
   'data': { 'filename': 'str',
             '*pr-manager': 'str',
             '*locking': 'OnOffAuto',
-            '*aio': 'BlockdevAioOptions' } }
+            '*aio': 'BlockdevAioOptions',
+            '*x-check-cache-dropped': 'bool' } }
 
 ##
 # @BlockdevOptionsNull:
diff --git a/block/file-posix.c b/block/file-posix.c
index 3707ea2d1c..5a602cfe37 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -161,6 +161,7 @@ typedef struct BDRVRawState {
     bool page_cache_inconsistent:1;
     bool has_fallocate;
     bool needs_alignment;
+    bool check_cache_dropped;
 
     PRManager *pr_mgr;
 } BDRVRawState;
@@ -168,6 +169,7 @@ typedef struct BDRVRawState {
 typedef struct BDRVRawReopenState {
     int fd;
     int open_flags;
+    bool check_cache_dropped;
 } BDRVRawReopenState;
 
 static int fd_open(BlockDriverState *bs);
@@ -415,6 +417,11 @@ static QemuOptsList raw_runtime_opts = {
             .type = QEMU_OPT_STRING,
             .help = "id of persistent reservation manager object (default: none)",
         },
+        {
+            .name = "x-check-cache-dropped",
+            .type = QEMU_OPT_BOOL,
+            .help = "check that page cache was dropped on live migration (default: off)"
+        },
         { /* end of list */ }
     },
 };
@@ -500,6 +507,9 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
         }
     }
 
+    s->check_cache_dropped = qemu_opt_get_bool(opts, "x-check-cache-dropped",
+                                               false);
+
     s->open_flags = open_flags;
     raw_parse_flags(bdrv_flags, &s->open_flags);
 
@@ -777,6 +787,7 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 {
     BDRVRawState *s;
     BDRVRawReopenState *rs;
+    QemuOpts *opts;
     int ret = 0;
     Error *local_err = NULL;
 
@@ -787,6 +798,19 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 
     state->opaque = g_new0(BDRVRawReopenState, 1);
     rs = state->opaque;
+    rs->fd = -1;
+
+    /* Handle options changes */
+    opts = qemu_opts_create(&raw_runtime_opts, NULL, 0, &error_abort);
+    qemu_opts_absorb_qdict(opts, state->options, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        ret = -EINVAL;
+        goto out;
+    }
+
+    rs->check_cache_dropped = qemu_opt_get_bool(opts, "x-check-cache-dropped",
+                                                s->check_cache_dropped);
 
     if (s->type == FTYPE_CD) {
         rs->open_flags |= O_NONBLOCK;
@@ -794,8 +818,6 @@ static int raw_reopen_prepare(BDRVReopenState *state,
 
     raw_parse_flags(state->flags, &rs->open_flags);
 
-    rs->fd = -1;
-
     int fcntl_flags = O_APPEND | O_NONBLOCK;
 #ifdef O_NOATIME
     fcntl_flags |= O_NOATIME;
@@ -850,6 +872,8 @@ static int raw_reopen_prepare(BDRVReopenState *state,
         }
     }
 
+out:
+    qemu_opts_del(opts);
     return ret;
 }
 
@@ -858,6 +882,7 @@ static void raw_reopen_commit(BDRVReopenState *state)
     BDRVRawReopenState *rs = state->opaque;
     BDRVRawState *s = state->bs->opaque;
 
+    s->check_cache_dropped = rs->check_cache_dropped;
     s->open_flags = rs->open_flags;
 
     qemu_close(s->fd);
@@ -2236,6 +2261,73 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     return ret | BDRV_BLOCK_OFFSET_VALID;
 }
 
+#if defined(__linux__)
+/* Verify that the file is not in the page cache */
+static void check_cache_dropped(BlockDriverState *bs, Error **errp)
+{
+    const size_t window_size = 128 * 1024 * 1024;
+    BDRVRawState *s = bs->opaque;
+    void *window = NULL;
+    size_t length = 0;
+    unsigned char *vec;
+    size_t page_size;
+    off_t offset;
+    off_t end;
+
+    /* mincore(2) page status information requires 1 byte per page */
+    page_size = sysconf(_SC_PAGESIZE);
+    vec = g_malloc(DIV_ROUND_UP(window_size, page_size));
+
+    end = raw_getlength(bs);
+
+    for (offset = 0; offset < end; offset += window_size) {
+        void *new_window;
+        size_t new_length;
+        size_t vec_end;
+        size_t i;
+        int ret;
+
+        /* Unmap previous window if size has changed */
+        new_length = MIN(end - offset, window_size);
+        if (new_length != length) {
+            munmap(window, length);
+            window = NULL;
+            length = 0;
+        }
+
+        new_window = mmap(window, new_length, PROT_NONE, MAP_PRIVATE,
+                          s->fd, offset);
+        if (new_window == MAP_FAILED) {
+            error_setg_errno(errp, errno, "mmap failed");
+            break;
+        }
+
+        window = new_window;
+        length = new_length;
+
+        ret = mincore(window, length, vec);
+        if (ret < 0) {
+            error_setg_errno(errp, errno, "mincore failed");
+            break;
+        }
+
+        vec_end = DIV_ROUND_UP(length, page_size);
+        for (i = 0; i < vec_end; i++) {
+            if (vec[i] & 0x1) {
+                error_setg(errp, "page cache still in use!");
+                break;
+            }
+        }
+    }
+
+    if (window) {
+        munmap(window, length);
+    }
+
+    g_free(vec);
+}
+#endif /* __linux__ */
+
 static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
                                                  Error **errp)
 {
@@ -2269,6 +2361,10 @@ static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
         error_setg_errno(errp, ret, "fadvise failed");
         return;
     }
+
+    if (s->check_cache_dropped) {
+        check_cache_dropped(bs, errp);
+    }
 #else /* __linux__ */
     /* Do nothing. Live migration to a remote host with cache.direct=off is
      * unsupported on other host operating systems. Cache consistency issues
-- 
2.14.3
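
The mincore(2) residency check used by this patch can also be tried in isolation. The following is a minimal sketch, assuming a Linux host with glibc; count_resident_pages() is a hypothetical name, and unlike check_cache_dropped() it maps the requested range in one piece instead of sliding a 128 MiB window, so it is only suitable for small files.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* expose mincore() on glibc */
#endif
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the patch's code: count how many pages in the
 * first len bytes of fd are resident in the page cache.
 * Returns the page count, or a negative errno value on failure.
 */
long count_resident_pages(int fd, size_t len)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page_size - 1) / page_size;
    unsigned char *vec;
    long resident = 0;
    void *map;
    size_t i;
    int err;

    if (len == 0) {
        return 0;
    }

    /* mincore(2) reports one status byte per page */
    vec = malloc(npages);
    if (!vec) {
        return -ENOMEM;
    }

    /* PROT_NONE is enough: the mapping is only inspected, never accessed */
    map = mmap(NULL, len, PROT_NONE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) {
        err = errno;
        free(vec);
        return -err;
    }

    if (mincore(map, len, vec) < 0) {
        err = errno;
        munmap(map, len);
        free(vec);
        return -err;
    }

    /* Bit 0 of each byte is set when the page is resident */
    for (i = 0; i < npages; i++) {
        resident += vec[i] & 0x1;
    }

    munmap(map, len);
    free(vec);
    return resident;
}
```

A result of 0 after a successful posix_fadvise(POSIX_FADV_DONTNEED) call corresponds to the case the patch treats as success; any positive count corresponds to its "page cache still in use!" error.
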