If a qcow2 file is preallocated, it can no longer guarantee that it
initially appears as filled with zeroes.
So implement .bdrv_has_zero_init() by checking whether the file is
preallocated; if so, forward the call to the underlying storage node,
except for when it is encrypted: Encrypted preallocated images always
return effectively random data, so .bdrv_has_zero_init() must always
return 0 for them.
Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
block/qcow2.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 89 insertions(+), 1 deletion(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 039bdc2f7e..730fd53890 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4631,6 +4631,94 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs,
     return spec_info;
 }

+/*
+ * Return 1 if the file only contains zero and unallocated clusters.
+ * Return 0 if it contains compressed or normal clusters.
+ * Return -errno on error.
+ */
+static int qcow2_is_zero(BlockDriverState *bs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int l1_i;
+    int ret;
+
+    if (bs->backing) {
+        return 0;
+    }
+
+    for (l1_i = 0; l1_i < s->l1_size; l1_i++) {
+        uint64_t l2_offset = s->l1_table[l1_i] & L1E_OFFSET_MASK;
+        int slice_start_i;
+
+        if (!l2_offset) {
+            continue;
+        }
+
+        for (slice_start_i = 0; slice_start_i < s->l2_size;
+             slice_start_i += s->l2_slice_size)
+        {
+            uint64_t *l2_slice;
+            int l2_slice_i;
+
+            ret = qcow2_cache_get(bs, s->l2_table_cache,
+                                  l2_offset + slice_start_i * sizeof(uint64_t),
+                                  (void **)&l2_slice);
+            if (ret < 0) {
+                return ret;
+            }
+
+            for (l2_slice_i = 0; l2_slice_i < s->l2_slice_size; l2_slice_i++) {
+                uint64_t l2_entry = be64_to_cpu(l2_slice[l2_slice_i]);
+
+                switch (qcow2_get_cluster_type(bs, l2_entry)) {
+                case QCOW2_CLUSTER_UNALLOCATED:
+                case QCOW2_CLUSTER_ZERO_PLAIN:
+                case QCOW2_CLUSTER_ZERO_ALLOC:
+                    break;
+
+                case QCOW2_CLUSTER_NORMAL:
+                case QCOW2_CLUSTER_COMPRESSED:
+                    qcow2_cache_put(s->l2_table_cache, (void **)&l2_slice);
+                    return 0;
+
+                default:
+                    abort();
+                }
+            }
+
+            qcow2_cache_put(s->l2_table_cache, (void **)&l2_slice);
+        }
+    }
+
+    return 1;
+}
+
+static int qcow2_has_zero_init(BlockDriverState *bs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int ret;
+
+    if (qemu_in_coroutine()) {
+        qemu_co_mutex_lock(&s->lock);
+    }
+    /* Check preallocation status */
+    ret = qcow2_is_zero(bs);
+    if (qemu_in_coroutine()) {
+        qemu_co_mutex_unlock(&s->lock);
+    }
+    if (ret < 0) {
+        return 0;
+    }
+
+    if (ret == 1) {
+        return 1;
+    } else if (bs->encrypted) {
+        return 0;
+    } else {
+        return bdrv_has_zero_init(s->data_file->bs);
+    }
+}
+
 static int qcow2_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
                               int64_t pos)
 {
@@ -5186,7 +5274,7 @@ BlockDriver bdrv_qcow2 = {
     .bdrv_child_perm = bdrv_format_default_perms,
     .bdrv_co_create_opts = qcow2_co_create_opts,
     .bdrv_co_create = qcow2_co_create,
-    .bdrv_has_zero_init = bdrv_has_zero_init_1,
+    .bdrv_has_zero_init = qcow2_has_zero_init,
     .bdrv_co_block_status = qcow2_co_block_status,
     .bdrv_co_preadv = qcow2_co_preadv,
--
2.21.0
On 15.07.2019 at 12:45, Max Reitz wrote:
> If a qcow2 file is preallocated, it can no longer guarantee that it
> initially appears as filled with zeroes.
>
> So implement .bdrv_has_zero_init() by checking whether the file is
> preallocated; if so, forward the call to the underlying storage node,
> except for when it is encrypted: Encrypted preallocated images always
> return effectively random data, so .bdrv_has_zero_init() must always
> return 0 for them.
>
> Reported-by: Stefano Garzarella <sgarzare@redhat.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Hm... This patch only really works directly after image creation (which
is indeed where .bdrv_has_zero_init is used). Why do we have to have a
full qcow2_is_zero() that loops over the whole image just to find out
whether it's preallocated? Wouldn't looking at a single data cluster be
enough?

Kevin
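
The trade-off raised here, a full scan versus a single probe, can be
illustrated with a toy model. The following is only a sketch: the
toy_* names and the flat array standing in for the L2 entries are
hypothetical and are not QEMU code. It also assumes that preallocation
maps every cluster at creation time, which is what would make a single
probe sufficient directly after image creation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for qcow2 cluster types (hypothetical, not QEMU's enum). */
enum toy_cluster_type {
    TOY_UNALLOCATED,
    TOY_ZERO,
    TOY_NORMAL,
};

/*
 * Full scan, in the spirit of qcow2_is_zero() from the patch:
 * returns true only if no entry maps data.
 */
static bool toy_is_zero_full_scan(const enum toy_cluster_type *l2, size_t n)
{
    assert(l2 != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (l2[i] == TOY_NORMAL) {
            return false;
        }
    }
    return true;
}

/*
 * Single probe: inspect only the first entry. This is only meaningful
 * directly after creation, under the assumption that a preallocated
 * image has all of its clusters mapped.
 */
static bool toy_is_zero_single_probe(const enum toy_cluster_type *l2, size_t n)
{
    assert(l2 != NULL || n == 0);
    return n == 0 || l2[0] != TOY_NORMAL;
}
```

On a fresh or a fully preallocated toy image both functions agree. They
diverge on an image that has been written to somewhere other than the
first cluster, which is the situation a caller creates when it asks
about an image that is not new.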
On 16.07.19 18:54, Kevin Wolf wrote:
> On 15.07.2019 at 12:45, Max Reitz wrote:
>> If a qcow2 file is preallocated, it can no longer guarantee that it
>> initially appears as filled with zeroes.
>>
>> So implement .bdrv_has_zero_init() by checking whether the file is
>> preallocated; if so, forward the call to the underlying storage node,
>> except for when it is encrypted: Encrypted preallocated images always
>> return effectively random data, so .bdrv_has_zero_init() must always
>> return 0 for them.
>>
>> Reported-by: Stefano Garzarella <sgarzare@redhat.com>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>
> Hm... This patch only really works directly after image creation (which
> is indeed where .bdrv_has_zero_init is used). Why do we have to have a
> full qcow2_is_zero() that loops over the whole image just to find out
> whether it's preallocated? Wouldn't looking at a single data cluster be
> enough?

Hm. I would like to agree (because you’re right), but now I see that
the callers of bdrv_has_zero_init() don’t necessarily hold to that
convention.

For example, qemu-img convert has the -n flag, but that doesn’t stop it
from invoking bdrv_has_zero_init(). Which is a bug, of course.

$ ./qemu-img create -f qcow2 src.qcow2 64M
$ ./qemu-img create -f qcow2 dest.qcow2 64M
$ ./qemu-io -c 'write -P 42 0 64M' dest.qcow2
$ ./qemu-img convert -n src.qcow2 dest.qcow2
$ ./qemu-img compare src.qcow2 dest.qcow2
Content mismatch at offset 0!

Aw, man, why does this keep happening... :-/

OK, so qemu-img convert -n is easy to fix. But there are more callers:

mirror: Uses this function to inquire whether it needs to zero the
target before actually doing something useful. There is no guarantee
that the target is a new image. Well, it just isn’t with mode=existing
or blockdev-mirror.

parallels: Whether to write zeroes to newly added image areas. That
actually sounds correct, because those new areas cannot point to any
data yet. Well, maybe not correct, because bdrv_has_zero_init() is not
the same as “when this image grows, new areas will be zero”, but at
least bdrv_has_zero_init() will return false if the latter is false.

vhdx: Similarly to parallels, it uses this information to check whether
it needs to zero new areas when growing an image file.

raw/vmdk/vpc: Just passing through info from their storage child.

Hm, OK. So mirror and qemu-img need fixing. That sounds possible.

Max
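
The caller-side convention discussed above (only trust
bdrv_has_zero_init() for an image that was just created) can be
sketched as a small decision function. This is a hypothetical
illustration, not the fix that was applied to mirror or qemu-img; the
toy_* names are made up for this example, and has_zero_init stands for
whatever value the driver callback returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of where the target image came from. */
enum toy_target_origin {
    TOY_TARGET_NEWLY_CREATED,   /* created by the current operation */
    TOY_TARGET_EXISTING,        /* e.g. convert -n or mode=existing */
};

/*
 * Decide whether a caller has to zero the target explicitly.
 * The has_zero_init result is only meaningful for a new image, so an
 * existing target is always treated as containing arbitrary data.
 */
static bool toy_must_zero_target(enum toy_target_origin origin,
                                 int has_zero_init)
{
    assert(has_zero_init == 0 || has_zero_init == 1);

    if (origin == TOY_TARGET_EXISTING) {
        return true;
    }
    return !has_zero_init;
}
```

With this shape, the qemu-img convert -n reproducer above would take
the TOY_TARGET_EXISTING branch and never consult the zero-init result
at all.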