From: Kevin Wolf
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, peter.maydell@linaro.org, qemu-devel@nongnu.org
Date: Tue, 26 Sep 2017 16:21:31 +0200
Message-Id: <20170926142133.2498-23-kwolf@redhat.com>
In-Reply-To: <20170926142133.2498-1-kwolf@redhat.com>
References: <20170926142133.2498-1-kwolf@redhat.com>
Subject: [Qemu-devel] [PULL 22/24] qcow2: add shrink image support

From: Pavel Butsykin

This patch adds shrinking of the image file for qcow2. As a result, this
allows us to reduce the virtual image size and free up space on the disk
without copying the image. The image can be fragmented, and shrinking is done
by punching holes in the image file.

Signed-off-by: Pavel Butsykin
Reviewed-by: Max Reitz
Reviewed-by: John Snow
Message-id: 20170918124230.8152-4-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz
---
 qapi/block-core.json   |   8 +++-
 block/qcow2.h          |  14 ++++++
 block/qcow2-cluster.c  |  50 +++++++++++++++++++++
 block/qcow2-refcount.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.c          |  43 ++++++++++++++----
 5 files changed, 225 insertions(+), 10 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index c69a395804..750bb0c77c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2533,6 +2533,11 @@
 #
 # Trigger events supported by blkdebug.
 #
+# @l1_shrink_write_table:      write zeros to the l1 table to shrink image.
+#                              (since 2.11)
+#
+# @l1_shrink_free_l2_clusters: discard the l2 tables. (since 2.11)
+#
 # Since: 2.9
 ##
 { 'enum': 'BlkdebugEvent', 'prefix': 'BLKDBG',
@@ -2549,7 +2554,8 @@
             'cluster_alloc_bytes', 'cluster_free', 'flush_to_os',
             'flush_to_disk', 'pwritev_rmw_head', 'pwritev_rmw_after_head',
             'pwritev_rmw_tail', 'pwritev_rmw_after_tail', 'pwritev',
-            'pwritev_zero', 'pwritev_done', 'empty_image_prepare' ] }
+            'pwritev_zero', 'pwritev_done', 'empty_image_prepare',
+            'l1_shrink_write_table', 'l1_shrink_free_l2_clusters' ] }
 
 ##
 # @BlkdebugInjectErrorOptions:
diff --git a/block/qcow2.h b/block/qcow2.h
index 52c374e9ed..5a289a81e2 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -521,6 +521,18 @@ static inline uint64_t refcount_diff(uint64_t r1, uint64_t r2)
     return r1 > r2 ? r1 - r2 : r2 - r1;
 }
 
+static inline
+uint32_t offset_to_reftable_index(BDRVQcow2State *s, uint64_t offset)
+{
+    return offset >> (s->refcount_block_bits + s->cluster_bits);
+}
+
+static inline uint64_t get_refblock_offset(BDRVQcow2State *s, uint64_t offset)
+{
+    uint32_t index = offset_to_reftable_index(s, offset);
+    return s->refcount_table[index] & REFT_OFFSET_MASK;
+}
+
 /* qcow2.c functions */
 int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
                   int64_t sector_num, int nb_sectors);
@@ -584,10 +596,12 @@ int qcow2_inc_refcounts_imrt(BlockDriverState *bs, BdrvCheckResult *res,
 int qcow2_change_refcount_order(BlockDriverState *bs, int refcount_order,
                                 BlockDriverAmendStatusCB *status_cb,
                                 void *cb_opaque, Error **errp);
+int qcow2_shrink_reftable(BlockDriverState *bs);
 
 /* qcow2-cluster.c functions */
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
                         bool exact_size);
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t max_size);
 int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index);
 int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset);
 int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t sector_num,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 0d4824993c..d2518d1893 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -32,6 +32,56 @@
 #include "qemu/bswap.h"
 #include "trace.h"
 
+int qcow2_shrink_l1_table(BlockDriverState *bs, uint64_t exact_size)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int new_l1_size, i, ret;
+
+    if (exact_size >= s->l1_size) {
+        return 0;
+    }
+
+    new_l1_size = exact_size;
+
+#ifdef DEBUG_ALLOC2
+    fprintf(stderr, "shrink l1_table from %d to %d\n", s->l1_size, new_l1_size);
+#endif
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_WRITE_TABLE);
+    ret = bdrv_pwrite_zeroes(bs->file, s->l1_table_offset +
+                                       new_l1_size * sizeof(uint64_t),
+                             (s->l1_size - new_l1_size) * sizeof(uint64_t), 0);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = bdrv_flush(bs->file->bs);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_SHRINK_FREE_L2_CLUSTERS);
+    for (i = s->l1_size - 1; i > new_l1_size - 1; i--) {
+        if ((s->l1_table[i] & L1E_OFFSET_MASK) == 0) {
+            continue;
+        }
+        qcow2_free_clusters(bs, s->l1_table[i] & L1E_OFFSET_MASK,
+                            s->cluster_size, QCOW2_DISCARD_ALWAYS);
+        s->l1_table[i] = 0;
+    }
+    return 0;
+
+fail:
+    /*
+     * If the write in the l1_table failed the image may contain a partially
+     * overwritten l1_table. In this case it would be better to clear the
+     * l1_table in memory to avoid possible image corruption.
+     */
+    memset(s->l1_table + new_l1_size, 0,
+           (s->l1_size - new_l1_size) * sizeof(uint64_t));
+    return ret;
+}
+
 int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
                         bool exact_size)
 {
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 8c17c0e3aa..88d5a3f1ad 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -29,6 +29,7 @@
 #include "block/qcow2.h"
 #include "qemu/range.h"
 #include "qemu/bswap.h"
+#include "qemu/cutils.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, uint64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -3061,3 +3062,122 @@ done:
     qemu_vfree(new_refblock);
     return ret;
 }
+
+static int qcow2_discard_refcount_block(BlockDriverState *bs,
+                                        uint64_t discard_block_offs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t refblock_offs = get_refblock_offset(s, discard_block_offs);
+    uint64_t cluster_index = discard_block_offs >> s->cluster_bits;
+    uint32_t block_index = cluster_index & (s->refcount_block_size - 1);
+    void *refblock;
+    int ret;
+
+    assert(discard_block_offs != 0);
+
+    ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
+                          &refblock);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (s->get_refcount(refblock, block_index) != 1) {
+        qcow2_signal_corruption(bs, true, -1, -1, "Invalid refcount:"
+                                " refblock offset %#" PRIx64
+                                ", reftable index %u"
+                                ", block offset %#" PRIx64
+                                ", refcount %#" PRIx64,
+                                refblock_offs,
+                                offset_to_reftable_index(s, discard_block_offs),
+                                discard_block_offs,
+                                s->get_refcount(refblock, block_index));
+        qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+        return -EINVAL;
+    }
+    s->set_refcount(refblock, block_index, 0);
+
+    qcow2_cache_entry_mark_dirty(bs, s->refcount_block_cache, refblock);
+
+    qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+    if (cluster_index < s->free_cluster_index) {
+        s->free_cluster_index = cluster_index;
+    }
+
+    refblock = qcow2_cache_is_table_offset(bs, s->refcount_block_cache,
+                                           discard_block_offs);
+    if (refblock) {
+        /* discard refblock from the cache if refblock is cached */
+        qcow2_cache_discard(bs, s->refcount_block_cache, refblock);
+    }
+    update_refcount_discard(bs, discard_block_offs, s->cluster_size);
+
+    return 0;
+}
+
+int qcow2_shrink_reftable(BlockDriverState *bs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    uint64_t *reftable_tmp =
+        g_malloc(s->refcount_table_size * sizeof(uint64_t));
+    int i, ret;
+
+    for (i = 0; i < s->refcount_table_size; i++) {
+        int64_t refblock_offs = s->refcount_table[i] & REFT_OFFSET_MASK;
+        void *refblock;
+        bool unused_block;
+
+        if (refblock_offs == 0) {
+            reftable_tmp[i] = 0;
+            continue;
+        }
+        ret = qcow2_cache_get(bs, s->refcount_block_cache, refblock_offs,
+                              &refblock);
+        if (ret < 0) {
+            goto out;
+        }
+
+        /* the refblock has own reference */
+        if (i == offset_to_reftable_index(s, refblock_offs)) {
+            uint64_t block_index = (refblock_offs >> s->cluster_bits) &
+                                   (s->refcount_block_size - 1);
+            uint64_t refcount = s->get_refcount(refblock, block_index);
+
+            s->set_refcount(refblock, block_index, 0);
+
+            unused_block = buffer_is_zero(refblock, s->cluster_size);
+
+            s->set_refcount(refblock, block_index, refcount);
+        } else {
+            unused_block = buffer_is_zero(refblock, s->cluster_size);
+        }
+        qcow2_cache_put(bs, s->refcount_block_cache, &refblock);
+
+        reftable_tmp[i] = unused_block ? 0 : cpu_to_be64(s->refcount_table[i]);
+    }
+
+    ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset, reftable_tmp,
+                           s->refcount_table_size * sizeof(uint64_t));
+    /*
+     * If the write in the reftable failed the image may contain a partially
+     * overwritten reftable. In this case it would be better to clear the
+     * reftable in memory to avoid possible image corruption.
+     */
+    for (i = 0; i < s->refcount_table_size; i++) {
+        if (s->refcount_table[i] && !reftable_tmp[i]) {
+            if (ret == 0) {
+                ret = qcow2_discard_refcount_block(bs, s->refcount_table[i] &
+                                                       REFT_OFFSET_MASK);
+            }
+            s->refcount_table[i] = 0;
+        }
+    }
+
+    if (!s->cache_discards) {
+        qcow2_process_discards(bs, ret);
+    }
+
+out:
+    g_free(reftable_tmp);
+    return ret;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index d33fb3ecdd..970006fc1d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3104,18 +3104,43 @@ static int qcow2_truncate(BlockDriverState *bs, int64_t offset,
     }
 
     old_length = bs->total_sectors * 512;
+    new_l1_size = size_to_l1(s, offset);
 
-    /* shrinking is currently not supported */
     if (offset < old_length) {
-        error_setg(errp, "qcow2 doesn't support shrinking images yet");
-        return -ENOTSUP;
-    }
+        if (prealloc != PREALLOC_MODE_OFF) {
+            error_setg(errp,
+                       "Preallocation can't be used for shrinking an image");
+            return -EINVAL;
+        }
 
-    new_l1_size = size_to_l1(s, offset);
-    ret = qcow2_grow_l1_table(bs, new_l1_size, true);
-    if (ret < 0) {
-        error_setg_errno(errp, -ret, "Failed to grow the L1 table");
-        return ret;
+        ret = qcow2_cluster_discard(bs, ROUND_UP(offset, s->cluster_size),
+                                    old_length - ROUND_UP(offset,
+                                                          s->cluster_size),
+                                    QCOW2_DISCARD_ALWAYS, true);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to discard cropped clusters");
+            return ret;
+        }
+
+        ret = qcow2_shrink_l1_table(bs, new_l1_size);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret,
+                             "Failed to reduce the number of L2 tables");
+            return ret;
+        }
+
+        ret = qcow2_shrink_reftable(bs);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret,
+                             "Failed to discard unused refblocks");
+            return ret;
+        }
+    } else {
+        ret = qcow2_grow_l1_table(bs, new_l1_size, true);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "Failed to grow the L1 table");
+            return ret;
+        }
     }
 
     switch (prealloc) {
-- 
2.13.5
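
For readers following the refcount arithmetic in this patch, here is a minimal
standalone sketch of how offset_to_reftable_index() maps a host offset to a
reftable entry, and of the "refblock has own reference" test used in
qcow2_shrink_reftable(). It is not QEMU code: struct geometry, make_geometry(),
reftable_index(), refblock_index() and refblock_covers_itself() are
hypothetical names, and the relation
refcount_block_bits = cluster_bits - (refcount_order - 3) is an assumption
about the qcow2 layout that this patch does not state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few BDRVQcow2State fields used here. */
struct geometry {
    int cluster_bits;        /* log2 of the cluster size in bytes */
    int refcount_block_bits; /* log2 of the refcount entries per refblock */
};

/*
 * Assumption: refcount entries are (1 << refcount_order) bits wide, so one
 * cluster holds 2^(cluster_bits - (refcount_order - 3)) of them.
 */
static struct geometry make_geometry(int cluster_bits, int refcount_order)
{
    struct geometry g = {
        .cluster_bits = cluster_bits,
        .refcount_block_bits = cluster_bits - (refcount_order - 3),
    };
    return g;
}

/* Same shift as offset_to_reftable_index() in the patch. */
static uint32_t reftable_index(const struct geometry *g, uint64_t offset)
{
    return offset >> (g->refcount_block_bits + g->cluster_bits);
}

/* Position of a cluster's refcount entry inside its refblock. */
static uint32_t refblock_index(const struct geometry *g, uint64_t offset)
{
    uint64_t cluster_index = offset >> g->cluster_bits;
    return cluster_index & ((UINT64_C(1) << g->refcount_block_bits) - 1);
}

/*
 * True if the refblock stored at refblock_offs and referenced by reftable
 * entry i also holds the refcount of its own cluster.
 */
static bool refblock_covers_itself(const struct geometry *g, uint32_t i,
                                   uint64_t refblock_offs)
{
    return i == reftable_index(g, refblock_offs);
}
```

Under these assumptions, 64 KiB clusters (cluster_bits = 16) with 16-bit
refcounts (refcount_order = 4) give refcount_block_bits = 15, so one refblock
covers 2^31 bytes and 0x80000000 is the first offset that maps to reftable
index 1. A refblock covering itself would look empty only after its own entry
is masked out, which seems to be why the patch temporarily zeroes that entry
before calling buffer_is_zero().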