If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
undo any previous preallocation, but just adds the zero flag to all
relevant L2 entries. If an external data file is in use, a write_zeroes
request to the data file is made instead.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/qcow2.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/block/qcow2.c b/block/qcow2.c
index 9cfbdfc939..bd632405d1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1726,6 +1726,7 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
bs->supported_zero_flags = header.version >= 3 ?
BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK : 0;
+ bs->supported_truncate_flags = BDRV_REQ_ZERO_WRITE;
/* Repair image if dirty */
if (!(flags & (BDRV_O_CHECK | BDRV_O_INACTIVE)) && !bs->read_only &&
@@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
g_assert_not_reached();
}
+ if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
+ uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
+ uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
+
+ /* Use zero clusters as much as we can */
+ ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Failed to zero out the new area");
+ goto fail;
+ }
+
+ /* Write explicit zeros for the unaligned head */
+ if (zero_start > old_length) {
+ uint8_t *buf = qemu_blockalign0(bs, s->cluster_size);
+ QEMUIOVector qiov;
+ qemu_iovec_init_buf(&qiov, buf, zero_start - old_length);
+
+ qemu_co_mutex_unlock(&s->lock);
+ ret = qcow2_co_pwritev_part(bs, old_length, qiov.size, &qiov, 0, 0);
+ qemu_co_mutex_lock(&s->lock);
+
+ qemu_vfree(buf);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Failed to zero out the new area");
+ goto fail;
+ }
+ }
+ }
+
if (prealloc != PREALLOC_MODE_OFF) {
/* Flush metadata before actually changing the image size */
ret = qcow2_write_caches(bs);
--
2.25.3
On 22.04.20 17:21, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> block/qcow2.c | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 9cfbdfc939..bd632405d1 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
[...]
> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> g_assert_not_reached();
> }
>
> + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
> +
> + /* Use zero clusters as much as we can */
> + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
It’s kind of a pity that this changes the cluster mappings that were
established when using falloc/full preallocation already (i.e., they
become preallocated zero clusters then, so when writing to them, we need
COW again).
But falloc/full preallocation do not guarantee that the new data is
zero, so I suppose this is the only thing we can reasonably do.
I personally don’t really care about whether zero_end is aligned or not,
so either way:
Reviewed-by: Max Reitz <mreitz@redhat.com>
Am 23.04.2020 um 12:53 hat Max Reitz geschrieben:
> On 22.04.20 17:21, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> >
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> > block/qcow2.c | 30 ++++++++++++++++++++++++++++++
> > 1 file changed, 30 insertions(+)
> >
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index 9cfbdfc939..bd632405d1 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
>
> [...]
>
> > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> > g_assert_not_reached();
> > }
> >
> > + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
> > +
> > + /* Use zero clusters as much as we can */
> > + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
>
> It’s kind of a pity that this changes the cluster mappings that were
> established when using falloc/full preallocation already (i.e., they
> become preallocated zero clusters then, so when writing to them, we need
> COW again).
>
> But falloc/full preallocation do not guarantee that the new data is
> zero, so I suppose this is the only thing we can reasonably do.
If we really want, I guess we could make full preallocation first try
passing BDRV_REQ_ZERO_WRITE to the protocol layer and if that succeeds,
we could skip setting the zero cluster flag at the qcow2 level.
Feels like a separate patch, though.
Kevin
On 23.04.20 15:25, Kevin Wolf wrote:
> Am 23.04.2020 um 12:53 hat Max Reitz geschrieben:
>> On 22.04.20 17:21, Kevin Wolf wrote:
>>> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
>>> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
>>> undo any previous preallocation, but just adds the zero flag to all
>>> relevant L2 entries. If an external data file is in use, a write_zeroes
>>> request to the data file is made instead.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>> block/qcow2.c | 30 ++++++++++++++++++++++++++++++
>>> 1 file changed, 30 insertions(+)
>>>
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 9cfbdfc939..bd632405d1 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>
>> [...]
>>
>>> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>>> g_assert_not_reached();
>>> }
>>>
>>> + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
>>> + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
>>> + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
>>> +
>>> + /* Use zero clusters as much as we can */
>>> + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
>>
>> It’s kind of a pity that this changes the cluster mappings that were
>> established when using falloc/full preallocation already (i.e., they
>> become preallocated zero clusters then, so when writing to them, we need
>> COW again).
>>
>> But falloc/full preallocation do not guarantee that the new data is
>> zero, so I suppose this is the only thing we can reasonably do.
>
> If we really want, I guess we could make full preallocation first try
> passing BDRV_REQ_ZERO_WRITE to the protocol layer and if that succeeds,
> we could skip setting the zero cluster flag at the qcow2 level.
That might be nice.
> Feels like a separate patch, though.
Definitely, yes.
Max
On 4/22/20 10:21 AM, Kevin Wolf wrote:
> If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> undo any previous preallocation, but just adds the zero flag to all
> relevant L2 entries. If an external data file is in use, a write_zeroes
> request to the data file is made instead.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> block/qcow2.c | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> g_assert_not_reached();
> }
>
> + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
This rounds up beyond the new size...
> +
> + /* Use zero clusters as much as we can */
> + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
and then requests that the extra be zeroed. Does that always work, even
when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file?
If so,
Reviewed-by: Eric Blake <eblake@redhat.com>
otherwise, you may have to treat the tail specially, the same way you
treated an unaligned head.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
Am 22.04.2020 um 17:33 hat Eric Blake geschrieben:
> On 4/22/20 10:21 AM, Kevin Wolf wrote:
> > If BDRV_REQ_ZERO_WRITE is set and we're extending the image, calling
> > qcow2_cluster_zeroize() with flags=0 does the right thing: It doesn't
> > undo any previous preallocation, but just adds the zero flag to all
> > relevant L2 entries. If an external data file is in use, a write_zeroes
> > request to the data file is made instead.
> >
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> > block/qcow2.c | 30 ++++++++++++++++++++++++++++++
> > 1 file changed, 30 insertions(+)
> >
>
> > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> > g_assert_not_reached();
> > }
> > + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
>
> This rounds up beyond the new size...
>
> > +
> > + /* Use zero clusters as much as we can */
> > + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
>
> and then requests that the extra be zeroed. Does that always work, even
> when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file?
You mean the data_file_is_raw() path in qcow2_cluster_zeroize()? It's
currently not a code path that is run because we only set
BDRV_REQ_ZERO_WRITE for truncate if the image has a backing file, and
data_file_is_raw() doesn't work with backing files.
But hypothetically, if someone called truncate with BDRV_REQ_ZERO_WRITE
for such a file, I think it would fail.
> If so,
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> otherwise, you may have to treat the tail specially, the same way you
> treated an unaligned head.
Actually, do I even need to round the tail?
/* Caller must pass aligned values, except at image end */
assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would
still set the zero flag for the partial last cluster and for the
external data file, bdrv_co_pwrite_zeroes() would have the correct size.
Kevin
On 4/22/20 10:58 AM, Kevin Wolf wrote:
>>> @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
>>> g_assert_not_reached();
>>> }
>>> + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
>>> + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
>>> + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
>>
>> This rounds up beyond the new size...
>>
>>> +
>>> + /* Use zero clusters as much as we can */
>>> + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
>>
>> and then requests that the extra be zeroed. Does that always work, even
>> when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file?
>
> You mean the data_file_is_raw() path in qcow2_cluster_zeroize()? It's
> currently not a code path that is run because we only set
> BDRV_REQ_ZERO_WRITE for truncate if the image has a backing file, and
> data_file_is_raw() doesn't work with backing files.
Good point.
>
> But hypothetically, if someone called truncate with BDRV_REQ_ZERO_WRITE
> for such a file, I think it would fail.
>
>> If so,
>>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>
>> otherwise, you may have to treat the tail specially, the same way you
>> treated an unaligned head.
>
> Actually, do I even need to round the tail?
>
> /* Caller must pass aligned values, except at image end */
> assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
> assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
> end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
>
> So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would
> still set the zero flag for the partial last cluster and for the
> external data file, bdrv_co_pwrite_zeroes() would have the correct size.
Then I'm in favor of NOT rounding the tail. That's an easy enough
change and we've now justified that it does what we want, so R-b stands
with that one-line tweak.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
Am 22.04.2020 um 18:14 hat Eric Blake geschrieben:
> On 4/22/20 10:58 AM, Kevin Wolf wrote:
>
> > > > @@ -4214,6 +4215,35 @@ static int coroutine_fn qcow2_co_truncate(BlockDriverState *bs, int64_t offset,
> > > > g_assert_not_reached();
> > > > }
> > > > + if ((flags & BDRV_REQ_ZERO_WRITE) && offset > old_length) {
> > > > + uint64_t zero_start = QEMU_ALIGN_UP(old_length, s->cluster_size);
> > > > + uint64_t zero_end = QEMU_ALIGN_UP(offset, s->cluster_size);
> > >
> > > This rounds up beyond the new size...
> > >
> > > > +
> > > > + /* Use zero clusters as much as we can */
> > > > + ret = qcow2_cluster_zeroize(bs, zero_start, zero_end - zero_start, 0);
> > >
> > > and then requests that the extra be zeroed. Does that always work, even
> > > when it results in pdrv_co_pwrite_zeroes beyond the end of s->data_file?
> >
> > You mean the data_file_is_raw() path in qcow2_cluster_zeroize()? It's
> > currently not a code path that is run because we only set
> > BDRV_REQ_ZERO_WRITE for truncate if the image has a backing file, and
> > data_file_is_raw() doesn't work with backing files.
>
> Good point.
>
> >
> > But hypothetically, if someone called truncate with BDRV_REQ_ZERO_WRITE
> > for such a file, I think it would fail.
> >
> > > If so,
> > >
> > > Reviewed-by: Eric Blake <eblake@redhat.com>
> > >
> > > otherwise, you may have to treat the tail specially, the same way you
> > > treated an unaligned head.
> >
> > Actually, do I even need to round the tail?
> >
> > /* Caller must pass aligned values, except at image end */
> > assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
> > assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
> > end_offset == bs->total_sectors << BDRV_SECTOR_BITS);
> >
> > So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would
> > still set the zero flag for the partial last cluster and for the
> > external data file, bdrv_co_pwrite_zeroes() would have the correct size.
>
> Then I'm in favor of NOT rounding the tail. That's an easy enough change
> and we've now justified that it does what we want, so R-b stands with that
> one-line tweak.
Would have been too easy... bs->total_sectors isn't updated yet, so the
assertion does fail.
I can make the assertion check end_offset >= ... instead. That should
still check what we wanted to check here and allow the unaligned
extension.
This feels like the better option to me compared to updating
bs->total_sectors earlier and then undoing that change in every error
path.
Kevin
On 4/23/20 8:23 AM, Kevin Wolf wrote: >>> >>> So qcow2_cluster_zeroize() seems to accept the unaligned tail. It would >>> still set the zero flag for the partial last cluster and for the >>> external data file, bdrv_co_pwrite_zeroes() would have the correct size. >> >> Then I'm in favor of NOT rounding the tail. That's an easy enough change >> and we've now justified that it does what we want, so R-b stands with that >> one-line tweak. > > Would have been too easy... bs->total_sectors isn't updated yet, so the > assertion does fail. > > I can make the assertion check end_offset >= ... instead. That should > still check what we wanted to check here and allow the unaligned > extension. Yes, that works for me. > > This feels like the better option to me compared to updating > bs->total_sectors earlier and then undoing that change in every error > path. Indeed. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
© 2016 - 2026 Red Hat, Inc.