From nobody Mon Apr 29 08:18:43 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (208.118.235.17 [208.118.235.17]) by mx.zohomail.com with SMTPS id 1507081602087491.46177711055816; Tue, 3 Oct 2017 18:46:42 -0700 (PDT) Received: from localhost ([::1]:60950 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzYla-0001yk-5R for importer@patchew.org; Tue, 03 Oct 2017 21:46:30 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:58518) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzYjE-0000en-C3 for qemu-devel@nongnu.org; Tue, 03 Oct 2017 21:44:07 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dzYjD-0001JG-Ac for qemu-devel@nongnu.org; Tue, 03 Oct 2017 21:44:04 -0400 Received: from mx1.redhat.com ([209.132.183.28]:50466) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dzYj8-0001BJ-Et; Tue, 03 Oct 2017 21:43:58 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6CD3DC04AC43; Wed, 4 Oct 2017 01:43:57 +0000 (UTC) Received: from red.redhat.com (ovpn-120-2.rdu2.redhat.com [10.10.120.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5F3135C898; Wed, 4 Oct 2017 01:43:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 6CD3DC04AC43 Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=eblake@redhat.com From: Eric Blake To: qemu-devel@nongnu.org Date: Tue, 3 Oct 2017 20:43:43 -0500 Message-Id: <20171004014347.25099-2-eblake@redhat.com> In-Reply-To: <20171004014347.25099-1-eblake@redhat.com> References: <20171004014347.25099-1-eblake@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Wed, 04 Oct 2017 01:43:57 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 1/5] qemu-io: Add -C for opening with copy-on-read X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, qemu-block@nongnu.org, jcody@redhat.com, Max Reitz , stefanha@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Make it easier to enable copy-on-read during iotests, by exposing a new bool option to main and open. Signed-off-by: Eric Blake Reviewed-by: Jeff Cody Reviewed-by: Kevin Wolf Reviewed-by: John Snow Reviewed-by: Stefan Hajnoczi --- qemu-io.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/qemu-io.c b/qemu-io.c index 265445ad89..c70bde3eb1 100644 --- a/qemu-io.c +++ b/qemu-io.c @@ -102,6 +102,7 @@ static void open_help(void) " Opens a file for subsequent use by all of the other qemu-io commands.\n" " -r, -- open file read-only\n" " -s, -- use snapshot file\n" +" -C, -- use copy-on-read\n" " -n, -- disable host cache, short for -t none\n" " -U, -- force shared permissions\n" " -k, -- use kernel AIO implementation (on Linux only)\n" @@ -120,7 +121,7 @@ static const cmdinfo_t open_cmd =3D { .argmin =3D 1, .argmax =3D -1, .flags =3D CMD_NOFILE_OK, - .args =3D "[-rsnkU] [-t cache] [-d discard] [-o options] [path]", + .args =3D "[-rsCnkU] [-t cache] [-d discard] [-o options] [path]= ", .oneline =3D "open the file specified by path", .help =3D open_help, }; @@ -145,7 +146,7 @@ static int open_f(BlockBackend *blk, int argc, char **a= rgv) QDict *opts; bool force_share =3D false; - while ((c =3D getopt(argc, argv, "snro:kt:d:U")) !=3D -1) { + while ((c =3D getopt(argc, argv, "snCro:kt:d:U")) !=3D -1) { switch (c) { case 's': flags |=3D BDRV_O_SNAPSHOT; @@ -154,6 +155,9 @@ static int open_f(BlockBackend *blk, int argc, char **a= rgv) flags |=3D BDRV_O_NOCACHE; writethrough =3D false; break; + case 'C': + flags |=3D BDRV_O_COPY_ON_READ; + break; case 'r': readonly =3D 1; break; @@ -251,6 +255,7 @@ static void usage(const char *name) " -r, --read-only export read-only\n" " -s, --snapshot use snapshot file\n" " -n, --nocache disable host cache, short for -t none\n" +" -C, --copy-on-read enable copy-on-read\n" " -m, --misalign misalign allocations for O_DIRECT\n" " -k, --native-aio use kernel AIO implementation (on Linux only)\n" " -t, --cache=3DMODE use the given cache mode for the image\n" @@ -439,7 +444,7 @@ static QemuOptsList file_opts =3D { int main(int argc, char **argv) { int readonly =3D 0; - const char *sopt =3D "hVc:d:f:rsnmkt:T:U"; + const char *sopt =3D "hVc:d:f:rsnCmkt:T:U"; const struct option lopt[] =3D { { "help", no_argument, NULL, 'h' }, { "version", no_argument, NULL, 'V' }, @@ -448,6 +453,7 @@ int main(int argc, char **argv) { "read-only", no_argument, NULL, 'r' }, { "snapshot", no_argument, NULL, 's' }, { "nocache", no_argument, NULL, 'n' }, + { "copy-on-read", no_argument, NULL, 'C' }, { "misalign", no_argument, NULL, 'm' }, { "native-aio", no_argument, NULL, 'k' }, { "discard", required_argument, NULL, 'd' }, @@ -492,6 +498,9 @@ int main(int argc, char **argv) flags |=3D BDRV_O_NOCACHE; writethrough =3D false; break; + case 'C': + flags |=3D BDRV_O_COPY_ON_READ; + break; case 'd': if (bdrv_parse_discard_flags(optarg, &flags) < 0) { error_report("Invalid discard option: %s", optarg); --=20 2.13.6 From nobody Mon Apr 29 08:18:43 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (208.118.235.17 [208.118.235.17]) by mx.zohomail.com with SMTPS id 1507081611351541.8389246566187; Tue, 3 Oct 2017 18:46:51 -0700 (PDT) Received: from localhost ([::1]:60952 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzYlh-00025D-FB for importer@patchew.org; Tue, 03 Oct 2017 21:46:37 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:58546) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzYjH-0000gG-9Y for qemu-devel@nongnu.org; Tue, 03 Oct 2017 21:44:08 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dzYjG-0001Kq-8w for qemu-devel@nongnu.org; Tue, 03 Oct 2017 21:44:07 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53810) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dzYjB-0001Fu-Qd; Tue, 03 Oct 2017 21:44:01 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 9BBFA882FF; Wed, 4 Oct 2017 01:44:00 +0000 (UTC) Received: from red.redhat.com (ovpn-120-2.rdu2.redhat.com [10.10.120.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id A7E805C881; Wed, 4 Oct 2017 01:43:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 9BBFA882FF Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=eblake@redhat.com From: Eric Blake To: qemu-devel@nongnu.org Date: Tue, 3 Oct 2017 20:43:44 -0500 Message-Id: <20171004014347.25099-3-eblake@redhat.com> In-Reply-To: <20171004014347.25099-1-eblake@redhat.com> References: <20171004014347.25099-1-eblake@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 04 Oct 2017 01:44:00 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 2/5] block: Uniform handling of 0-length bdrv_get_block_status() X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, Fam Zheng , qemu-block@nongnu.org, jcody@redhat.com, Max Reitz , stefanha@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Handle a 0-length block status request up front, with a uniform return value claiming the area is not allocated. Most callers don't pass a length of 0 to bdrv_get_block_status() and friends; but it definitely happens with a 0-length read when copy-on-read is enabled. While we could audit all callers to ensure that they never make a 0-length request, and then assert that fact, it was just as easy to fix things to always report success (as long as the callers are careful to not go into an infinite loop). However, we had inconsistent behavior on whether the status is reported as allocated or defers to the backing layer, depending on what callbacks the driver implements, and possibly wasting quite a few CPU cycles to get to that answer. Consistently reporting unallocated up front doesn't really hurt anything, and makes it easier both for callers (0-length requests now have well-defined behavior) and for drivers (drivers don't have to deal with 0-length requests). Signed-off-by: Eric Blake Reviewed-by: Stefan Hajnoczi --- v2: new patch --- block/io.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/io.c b/block/io.c index e0f904583f..1f5baac41d 100644 --- a/block/io.c +++ b/block/io.c @@ -1773,9 +1773,9 @@ static int64_t coroutine_fn bdrv_co_get_block_status(= BlockDriverState *bs, return total_sectors; } - if (sector_num >=3D total_sectors) { + if (sector_num >=3D total_sectors || !nb_sectors) { *pnum =3D 0; - return BDRV_BLOCK_EOF; + return sector_num >=3D total_sectors ? BDRV_BLOCK_EOF : 0; } n =3D total_sectors - sector_num; --=20 2.13.6 From nobody Mon Apr 29 08:18:43 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1507081759244182.5629899277816; Tue, 3 Oct 2017 18:49:19 -0700 (PDT) Received: from localhost ([::1]:60963 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzYoA-0003qq-Co for importer@patchew.org; Tue, 03 Oct 2017 21:49:10 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:58576) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzYjI-0000hN-AI for qemu-devel@nongnu.org; Tue, 03 Oct 2017 21:44:09 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dzYjH-0001Lm-Es for qemu-devel@nongnu.org; Tue, 03 Oct 2017 21:44:08 -0400 Received: from mx1.redhat.com ([209.132.183.28]:60180) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dzYjD-0001II-4t; Tue, 03 Oct 2017 21:44:03 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 317B3820F8; Wed, 4 Oct 2017 01:44:02 +0000 (UTC) Received: from red.redhat.com (ovpn-120-2.rdu2.redhat.com [10.10.120.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id D73F45C542; Wed, 4 Oct 2017 01:44:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 317B3820F8 Authentication-Results: ext-mx02.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx02.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=eblake@redhat.com From: Eric Blake To: qemu-devel@nongnu.org Date: Tue, 3 Oct 2017 20:43:45 -0500 Message-Id: <20171004014347.25099-4-eblake@redhat.com> In-Reply-To: <20171004014347.25099-1-eblake@redhat.com> References: <20171004014347.25099-1-eblake@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Wed, 04 Oct 2017 01:44:02 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 3/5] block: Add blkdebug hook for copy-on-read X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, Fam Zheng , qemu-block@nongnu.org, jcody@redhat.com, Markus Armbruster , Max Reitz , stefanha@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Make it possible to inject errors on writes performed during a read operation due to copy-on-read semantics. Signed-off-by: Eric Blake Reviewed-by: Jeff Cody Reviewed-by: Kevin Wolf Reviewed-by: John Snow Reviewed-by: Stefan Hajnoczi --- qapi/block-core.json | 5 ++++- block/io.c | 1 + 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/qapi/block-core.json b/qapi/block-core.json index 750bb0c77c..ab96e348e6 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2538,6 +2538,8 @@ # # @l1_shrink_free_l2_clusters: discard the l2 tables. (since 2.11) # +# @cor_write: a write due to copy-on-read (since 2.11) +# # Since: 2.9 ## { 'enum': 'BlkdebugEvent', 'prefix': 'BLKDBG', @@ -2555,7 +2557,8 @@ 'flush_to_disk', 'pwritev_rmw_head', 'pwritev_rmw_after_head', 'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev', 'pwritev_zero', 'pwritev_done', 'empty_image_prepare', - 'l1_shrink_write_table', 'l1_shrink_free_l2_clusters' ] } + 'l1_shrink_write_table', 'l1_shrink_free_l2_clusters', + 'cor_write'] } ## # @BlkdebugInjectErrorOptions: diff --git a/block/io.c b/block/io.c index 1f5baac41d..d656a0485b 100644 --- a/block/io.c +++ b/block/io.c @@ -983,6 +983,7 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvCh= ild *child, goto err; } + bdrv_debug_event(bs, BLKDBG_COR_WRITE); if (drv->bdrv_co_pwrite_zeroes && buffer_is_zero(bounce_buffer, iov.iov_len)) { /* FIXME: Should we (perhaps conditionally) be setting --=20 2.13.6 From nobody Mon Apr 29 08:18:43 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1507081891168645.6319491738379; Tue, 3 Oct 2017 18:51:31 -0700 (PDT) Received: from localhost ([::1]:60977 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzYqQ-0005R2-Ex for importer@patchew.org; Tue, 03 Oct 2017 21:51:30 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:58599) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzYjK-0000ju-JH for qemu-devel@nongnu.org; Tue, 03 Oct 2017 21:44:13 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dzYjJ-0001Mq-2O for qemu-devel@nongnu.org; Tue, 03 Oct 2017 21:44:10 -0400 Received: from mx1.redhat.com ([209.132.183.28]:53870) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dzYjE-0001Jp-P7; Tue, 03 Oct 2017 21:44:04 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id BCC3280478; Wed, 4 Oct 2017 01:44:03 +0000 (UTC) Received: from red.redhat.com (ovpn-120-2.rdu2.redhat.com [10.10.120.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 6B9205C881; Wed, 4 Oct 2017 01:44:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com BCC3280478 Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=eblake@redhat.com From: Eric Blake To: qemu-devel@nongnu.org Date: Tue, 3 Oct 2017 20:43:46 -0500 Message-Id: <20171004014347.25099-5-eblake@redhat.com> In-Reply-To: <20171004014347.25099-1-eblake@redhat.com> References: <20171004014347.25099-1-eblake@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Wed, 04 Oct 2017 01:44:03 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 4/5] block: Perform copy-on-read in loop X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, Fam Zheng , qemu-block@nongnu.org, jcody@redhat.com, qemu-stable@nongnu.org, Max Reitz , stefanha@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Improve our braindead copy-on-read implementation. Pre-patch, we have multiple issues: - we create a bounce buffer and perform a write for the entire request, even if the active image already has 99% of the clusters occupied, and really only needs to copy-on-read the remaining 1% of the clusters - our bounce buffer was as large as the read request, and can needlessly exhaust our memory by using double the memory of the request size (the original request plus our bounce buffer), rather than a capped maximum overhead beyond the original - if a driver has a max_transfer limit, we are bypassing the normal code in bdrv_aligned_preadv() that fragments to that limit, and instead attempt to read the entire buffer from the driver in one go, which some drivers may assert on - a client can request a large request of nearly 2G such that rounding the request out to cluster boundaries results in a byte count larger than 2G. While this cannot exceed 32 bits, it DOES have some follow-on problems: -- the call to bdrv_driver_pread() can assert for exceeding BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks .bdrv_co_preadv -- if the buffer is all zeroes, the subsequent call to bdrv_co_do_pwrite_zeroes is a no-op due to a negative size, which means we did not actually copy on read Fix all of these issues by breaking up the action into a loop, where each iteration is capped to sane limits. Also, querying the allocation status allows us to optimize: when data is already present in the active layer, we don't need to bounce. Note that the code has a telling comment that copy-on-read should probably be a filter driver rather than a bolt-in hack in io.c; but that remains a task for another day. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake Reviewed-by: Kevin Wolf Reviewed-by: Stefan Hajnoczi --- v2: avoid uninit ret on 0-length op [patchew, Kevin] --- block/io.c | 120 +++++++++++++++++++++++++++++++++++++++++----------------= ---- 1 file changed, 82 insertions(+), 38 deletions(-) diff --git a/block/io.c b/block/io.c index d656a0485b..1e246315a7 100644 --- a/block/io.c +++ b/block/io.c @@ -34,6 +34,9 @@ #define NOT_DONE 0x7fffffff /* used while emulated sync operation in progr= ess */ +/* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */ +#define MAX_BOUNCE_BUFFER (32768 << BDRV_SECTOR_BITS) + static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int bytes, BdrvRequestFlags flags); @@ -945,11 +948,14 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(Bdrv= Child *child, BlockDriver *drv =3D bs->drv; struct iovec iov; - QEMUIOVector bounce_qiov; + QEMUIOVector local_qiov; int64_t cluster_offset; unsigned int cluster_bytes; size_t skip_bytes; int ret; + int max_transfer =3D MIN_NON_ZERO(bs->bl.max_transfer, + BDRV_REQUEST_MAX_BYTES); + unsigned int progress =3D 0; /* FIXME We cannot require callers to have write permissions when all = they * are doing is a read request. If we did things right, write permissi= ons @@ -961,53 +967,95 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(Bdrv= Child *child, // assert(child->perm & (BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE)); /* Cover entire cluster so no additional backing file I/O is required = when - * allocating cluster in the image file. + * allocating cluster in the image file. Note that this value may exc= eed + * BDRV_REQUEST_MAX_BYTES (even when the original read did not), which + * is one reason we loop rather than doing it all at once. */ bdrv_round_to_clusters(bs, offset, bytes, &cluster_offset, &cluster_by= tes); + skip_bytes =3D offset - cluster_offset; trace_bdrv_co_do_copy_on_readv(bs, offset, bytes, cluster_offset, cluster_bytes); - iov.iov_len =3D cluster_bytes; - iov.iov_base =3D bounce_buffer =3D qemu_try_blockalign(bs, iov.iov_len= ); + bounce_buffer =3D qemu_try_blockalign(bs, + MIN(MIN(max_transfer, cluster_byte= s), + MAX_BOUNCE_BUFFER)); if (bounce_buffer =3D=3D NULL) { ret =3D -ENOMEM; goto err; } - qemu_iovec_init_external(&bounce_qiov, &iov, 1); + while (cluster_bytes) { + int64_t pnum; - ret =3D bdrv_driver_preadv(bs, cluster_offset, cluster_bytes, - &bounce_qiov, 0); - if (ret < 0) { - goto err; - } + ret =3D bdrv_is_allocated(bs, cluster_offset, + MIN(cluster_bytes, max_transfer), &pnum); + if (ret < 0) { + /* Safe to treat errors in querying allocation as if + * unallocated; we'll probably fail again soon on the + * read, but at least that will set a decent errno. + */ + pnum =3D MIN(cluster_bytes, max_transfer); + } - bdrv_debug_event(bs, BLKDBG_COR_WRITE); - if (drv->bdrv_co_pwrite_zeroes && - buffer_is_zero(bounce_buffer, iov.iov_len)) { - /* FIXME: Should we (perhaps conditionally) be setting - * BDRV_REQ_MAY_UNMAP, if it will allow for a sparser copy - * that still correctly reads as zero? */ - ret =3D bdrv_co_do_pwrite_zeroes(bs, cluster_offset, cluster_bytes= , 0); - } else { - /* This does not change the data on the disk, it is not necessary - * to flush even in cache=3Dwritethrough mode. - */ - ret =3D bdrv_driver_pwritev(bs, cluster_offset, cluster_bytes, - &bounce_qiov, 0); - } + assert(skip_bytes < pnum); - if (ret < 0) { - /* It might be okay to ignore write errors for guest requests. If= this - * is a deliberate copy-on-read then we don't want to ignore the e= rror. - * Simply report it in all cases. - */ - goto err; - } + if (ret <=3D 0) { + /* Must copy-on-read; use the bounce buffer */ + iov.iov_base =3D bounce_buffer; + iov.iov_len =3D pnum =3D MIN(pnum, MAX_BOUNCE_BUFFER); + qemu_iovec_init_external(&local_qiov, &iov, 1); + + ret =3D bdrv_driver_preadv(bs, cluster_offset, pnum, + &local_qiov, 0); + if (ret < 0) { + goto err; + } - skip_bytes =3D offset - cluster_offset; - qemu_iovec_from_buf(qiov, 0, bounce_buffer + skip_bytes, bytes); + bdrv_debug_event(bs, BLKDBG_COR_WRITE); + if (drv->bdrv_co_pwrite_zeroes && + buffer_is_zero(bounce_buffer, pnum)) { + /* FIXME: Should we (perhaps conditionally) be setting + * BDRV_REQ_MAY_UNMAP, if it will allow for a sparser copy + * that still correctly reads as zero? */ + ret =3D bdrv_co_do_pwrite_zeroes(bs, cluster_offset, pnum,= 0); + } else { + /* This does not change the data on the disk, it is not + * necessary to flush even in cache=3Dwritethrough mode. + */ + ret =3D bdrv_driver_pwritev(bs, cluster_offset, pnum, + &local_qiov, 0); + } + + if (ret < 0) { + /* It might be okay to ignore write errors for guest + * requests. If this is a deliberate copy-on-read + * then we don't want to ignore the error. Simply + * report it in all cases. + */ + goto err; + } + + qemu_iovec_from_buf(qiov, progress, bounce_buffer + skip_bytes, + pnum - skip_bytes); + } else { + /* Read directly into the destination */ + qemu_iovec_init(&local_qiov, qiov->niov); + qemu_iovec_concat(&local_qiov, qiov, progress, pnum - skip_byt= es); + ret =3D bdrv_driver_preadv(bs, offset + progress, local_qiov.s= ize, + &local_qiov, 0); + qemu_iovec_destroy(&local_qiov); + if (ret < 0) { + goto err; + } + } + + cluster_offset +=3D pnum; + cluster_bytes -=3D pnum; + progress +=3D pnum - skip_bytes; + skip_bytes =3D 0; + } + ret =3D 0; err: qemu_vfree(bounce_buffer); @@ -1213,9 +1261,6 @@ int coroutine_fn bdrv_co_readv(BdrvChild *child, int6= 4_t sector_num, return bdrv_co_do_readv(child, sector_num, nb_sectors, qiov, 0); } -/* Maximum buffer for write zeroes fallback, in bytes */ -#define MAX_WRITE_ZEROES_BOUNCE_BUFFER (32768 << BDRV_SECTOR_BITS) - static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int bytes, BdrvRequestFlags flags) { @@ -1230,8 +1275,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(Bloc= kDriverState *bs, int max_write_zeroes =3D MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MA= X); int alignment =3D MAX(bs->bl.pwrite_zeroes_alignment, bs->bl.request_alignment); - int max_transfer =3D MIN_NON_ZERO(bs->bl.max_transfer, - MAX_WRITE_ZEROES_BOUNCE_BUFFER); + int max_transfer =3D MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFF= ER); assert(alignment % bs->bl.request_alignment =3D=3D 0); head =3D offset % alignment; --=20 2.13.6 From nobody Mon Apr 29 08:18:43 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1507081806704896.8134483730468; Tue, 3 Oct 2017 18:50:06 -0700 (PDT) Received: from localhost ([::1]:60964 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzYp0-0004MA-MD for importer@patchew.org; Tue, 03 Oct 2017 21:50:02 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:58617) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1dzYjL-0000jw-UM for qemu-devel@nongnu.org; Tue, 03 Oct 2017 21:44:13 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1dzYjK-0001NM-9H for qemu-devel@nongnu.org; Tue, 03 Oct 2017 21:44:12 -0400 Received: from mx1.redhat.com ([209.132.183.28]:37610) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1dzYjG-0001KS-2e; Tue, 03 Oct 2017 21:44:06 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 1E9723B724; Wed, 4 Oct 2017 01:44:05 +0000 (UTC) Received: from red.redhat.com (ovpn-120-2.rdu2.redhat.com [10.10.120.2]) by smtp.corp.redhat.com (Postfix) with ESMTP id 06C414DA81; Wed, 4 Oct 2017 01:44:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 1E9723B724 Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=eblake@redhat.com From: Eric Blake To: qemu-devel@nongnu.org Date: Tue, 3 Oct 2017 20:43:47 -0500 Message-Id: <20171004014347.25099-6-eblake@redhat.com> In-Reply-To: <20171004014347.25099-1-eblake@redhat.com> References: <20171004014347.25099-1-eblake@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Wed, 04 Oct 2017 01:44:05 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2 5/5] iotests: Add test 197 for covering copy-on-read X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, qemu-block@nongnu.org, jcody@redhat.com, Max Reitz , stefanha@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Add a test for qcow2 copy-on-read behavior, including exposure for the just-fixed bugs. The copy-on-read behavior is always to a qcow2 image, but the test is careful to allow running with most image protocol/format combos as the backing file being copied from (luks being the exception, as it is harder to pass the right secret to all the right places). In fact, for './check nbd', this appears to be the first time we've had a qcow2 image wrapping NBD, requiring an additional line in _filter_img_create to match the similar line in _filter_img_info. Invoking blkdebug to prove we don't write too much took some effort to get working; and it requires that $TEST_WRAP (based on $TEST_DIR) not be subject to word splitting. We may decide later to have the entire iotests suite use relative rather than absolute names, to avoid problems inherited by the absolute name of $PWD or $TEST_DIR, at which point the sanity check in this commit could be simplified. Signed-off-by: Eric Blake Reviewed-by: Stefan Hajnoczi --- v2: test 0-length query [Kevin], sanity check TEST_DIR [Jeff] I only tested with -raw, -qcow2, -qed, and -nbd. I won't be surprised if the test fails in some other setup... --- tests/qemu-iotests/common.filter | 1 + tests/qemu-iotests/197 | 102 +++++++++++++++++++++++++++++++++++= ++++ tests/qemu-iotests/197.out | 26 ++++++++++ tests/qemu-iotests/group | 1 + 4 files changed, 130 insertions(+) create mode 100755 tests/qemu-iotests/197 create mode 100644 tests/qemu-iotests/197.out diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.f= ilter index 9d5442ecd9..227b37e941 100644 --- a/tests/qemu-iotests/common.filter +++ b/tests/qemu-iotests/common.filter @@ -111,6 +111,7 @@ _filter_img_create() sed -e "s#$IMGPROTO:$TEST_DIR#TEST_DIR#g" \ -e "s#$TEST_DIR#TEST_DIR#g" \ -e "s#$IMGFMT#IMGFMT#g" \ + -e 's#nbd:127.0.0.1:10810#TEST_DIR/t.IMGFMT#g' \ -e "s# encryption=3Doff##g" \ -e "s# cluster_size=3D[0-9]\\+##g" \ -e "s# table_size=3D[0-9]\\+##g" \ diff --git a/tests/qemu-iotests/197 b/tests/qemu-iotests/197 new file mode 100755 index 0000000000..cc85388039 --- /dev/null +++ b/tests/qemu-iotests/197 @@ -0,0 +1,102 @@ +#!/bin/bash +# +# Test case for copy-on-read into qcow2 +# +# Copyright (C) 2017 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see . +# + +# creator +owner=3Deblake@redhat.com + +seq=3D"$(basename $0)" +echo "QA output created by $seq" + +here=3D"$PWD" +status=3D1 # failure is the default! + +# get standard environment, filters and checks +. ./common.rc +. ./common.filter + +TEST_WRAP=3D"$TEST_DIR/t.wrap.qcow2" +BLKDBG_CONF=3D"$TEST_DIR/blkdebug.conf" + +# Sanity check: our use of blkdebug fails if $TEST_DIR contains spaces +# or other problems +case "$TEST_DIR" in + *[^-_a-zA-Z0-9/]*) + _notrun "Suspicious TEST_DIR=3D'$TEST_DIR', cowardly refusing to r= un" ;; +esac + +_cleanup() +{ + _cleanup_test_img + rm -f "$BLKDBG_CONF" +} +trap "_cleanup; exit \$status" 0 1 2 3 15 + +# Test is supported for any backing file; but we force qcow2 for our wrapp= er. +_supported_fmt generic +_supported_proto generic +_supported_os Linux +# LUKS support may be possible, but it complicates things. +_unsupported_fmt luks + +echo +echo '=3D=3D=3D Copy-on-read =3D=3D=3D' +echo + +# Prep the images +_make_test_img 4G +$QEMU_IO -c "write -P 55 3G 1k" "$TEST_IMG" | _filter_qemu_io +IMGPROTO=3Dfile IMGFMT=3Dqcow2 IMGOPTS=3D TEST_IMG_FILE=3D"$TEST_WRAP" \ + _make_test_img -F "$IMGFMT" -b "$TEST_IMG" | _filter_img_create +$QEMU_IO -f qcow2 -c "write -z -u 1M 64k" "$TEST_WRAP" | _filter_qemu_io + +# Ensure that a read of two clusters, but where one is already allocated, +# does not re-write the allocated cluster +cat > "$BLKDBG_CONF" <&1 | _filter_testdir + +# Break the backing chain, and show that images are identical, and that +# we properly copied over explicit zeros. +$QEMU_IMG rebase -u -b "" -f qcow2 "$TEST_WRAP" +$QEMU_IO -f qcow2 -c map "$TEST_WRAP" +_check_test_img +$QEMU_IMG compare -f $IMGFMT -F qcow2 "$TEST_IMG" "$TEST_WRAP" + +# success, all done +echo '*** done' +status=3D0 diff --git a/tests/qemu-iotests/197.out b/tests/qemu-iotests/197.out new file mode 100644 index 0000000000..52b4137d7b --- /dev/null +++ b/tests/qemu-iotests/197.out @@ -0,0 +1,26 @@ +QA output created by 197 + +=3D=3D=3D Copy-on-read =3D=3D=3D + +Formatting 'TEST_DIR/t.IMGFMT', fmt=3DIMGFMT size=3D4294967296 +wrote 1024/1024 bytes at offset 3221225472 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +Formatting 'TEST_DIR/t.wrap.IMGFMT', fmt=3DIMGFMT size=3D4294967296 backin= g_file=3DTEST_DIR/t.IMGFMT backing_fmt=3DIMGFMT +wrote 65536/65536 bytes at offset 1048576 +64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 131072/131072 bytes at offset 1048576 +128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 0/0 bytes at offset 0 +0 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 2147483136/2147483136 bytes at offset 1024 +2 GiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +read 1024/1024 bytes at offset 3221226496 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +can't open device TEST_DIR/t.wrap.qcow2: Can't use copy-on-read on read-on= ly device +2 GiB (0x80010000) bytes allocated at offset 0 bytes (0x0) +1023.938 MiB (0x3fff0000) bytes not allocated at offset 2 GiB (0x80010000) +64 KiB (0x10000) bytes allocated at offset 3 GiB (0xc0000000) +1023.938 MiB (0x3fff0000) bytes not allocated at offset 3 GiB (0xc0010000) +No errors were found on the image. +Images are identical. +*** done diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group index cdccee319e..7a302a67cf 100644 --- a/tests/qemu-iotests/group +++ b/tests/qemu-iotests/group @@ -191,3 +191,4 @@ 192 rw auto quick 194 rw auto migration quick 195 rw auto quick +197 rw auto quick --=20 2.13.6