[PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero

Eric Blake posted 11 patches 5 months, 1 week ago
Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, John Snow <jsnow@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>, Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>, Ronnie Sahlberg <ronniesahlberg@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Peter Lieven <pl@dlhnet.de>, Eric Blake <eblake@redhat.com>, "Denis V. Lunev" <den@openvz.org>, Alberto Garcia <berto@igalia.com>, Ilya Dryomov <idryomov@gmail.com>, Stefan Weil <sw@weilnetz.de>, Markus Armbruster <armbru@redhat.com>
[PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero
Posted by Eric Blake 5 months, 1 week ago
When doing a sync=full mirroring, QMP drive-mirror requests full
zeroing if it did not just create the destination, and blockdev-mirror
requests full zeroing unconditionally.  This is because during a full
sync, we must ensure that the portions of the disk that are not
otherwise touched by the source still read as zero upon completion.

However, in mirror_dirty_init(), we were blindly assuming that if the
destination allows punching holes, we should pre-zero the entire
image; and if it does not allow punching holes, then treat the entire
source as dirty rather than mirroring just the allocated portions of
the source.  Without the ability to punch holes, this results in the
destination file being fully allocated; and even when punching holes
is supported, it causes duplicate I/O to the portions of the
destination corresponding to chunks of the source that are allocated
but read as zero.

Smarter is to avoid the pre-zeroing pass over the destination if it
can be proved the destination already reads as zero.  Note that a
later patch will then further improve things to skip writing to the
destination for parts of the image where the source is zero; but even
with just this patch, it is possible to see a difference for any BDS
that can quickly report that it already reads as zero.

Note, however, that if the destination was opened with
"detect-zeroes": "unmap", then the user wants us to punch holes where
possible for any zeroes in the source; in that case, we are better off
doing unmap up front in bulk.

Signed-off-by: Eric Blake <eblake@redhat.com>

---

v3: add exemption for "detect-zeroes":"unmap" on destination
---
 block/mirror.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 34c6c5252e1..4059bf96854 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -849,13 +849,30 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
     bdrv_graph_co_rdunlock();

     if (s->zero_target) {
+        offset = 0;
+        bdrv_graph_co_rdlock();
+        ret = bdrv_co_is_all_zeroes(target_bs);
+        bdrv_graph_co_rdunlock();
+        if (ret < 0) {
+            return ret;
+        }
+        /*
+         * If the destination already reads as zero, and we are not
+         * requested to punch holes into existing zeroes, then we can
+         * skip pre-zeroing the destination.
+         */
+        if (ret > 0 &&
+            (target_bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP ||
+             !bdrv_can_write_zeroes_with_unmap(target_bs))) {
+            offset = s->bdev_length;
+        }
         if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
             bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
             return 0;
         }

         s->initial_zeroing_ongoing = true;
-        for (offset = 0; offset < s->bdev_length; ) {
+        while (offset < s->bdev_length) {
             int bytes = MIN(s->bdev_length - offset,
                             QEMU_ALIGN_DOWN(INT_MAX, s->granularity));

-- 
2.49.0
Re: [PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero
Posted by Sunny Zhu 5 months, 1 week ago
> When doing a sync=full mirroring, QMP drive-mirror requests full
> zeroing if it did not just create the destination, and blockdev-mirror
> requests full zeroing unconditionally.  This is because during a full
> sync, we must ensure that the portions of the disk that are not
> otherwise touched by the source still read as zero upon completion.
> 
> However, in mirror_dirty_init(), we were blindly assuming that if the
> destination allows punching holes, we should pre-zero the entire
> image; and if it does not allow punching holes, then treat the entire
> source as dirty rather than mirroring just the allocated portions of
> the source.  Without the ability to punch holes, this results in the
> destination file being fully allocated; and even when punching holes
> is supported, it causes duplicate I/O to the portions of the
> destination corresponding to chunks of the source that are allocated
> but read as zero.
> 
> Smarter is to avoid the pre-zeroing pass over the destination if it
> can be proved the destination already reads as zero.  Note that a
> later patch will then further improve things to skip writing to the
> destination for parts of the image where the source is zero; but even
> with just this patch, it is possible to see a difference for any BDS
> that can quickly report that it already reads as zero.
> 
> Note, however, that if the destination was opened with
> "detect-zeroes": "unmap", then the user wants us to punch holes where
> possible for any zeroes in the source; in that case, we are better off
> doing unmap up front in bulk.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> 
> v3: add exemption for "detect-zeroes":"unmap" on destination
> ---
>  block/mirror.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 34c6c5252e1..4059bf96854 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -849,13 +849,30 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
>      bdrv_graph_co_rdunlock();
> 
>      if (s->zero_target) {
> +        offset = 0;
> +        bdrv_graph_co_rdlock();
> +        ret = bdrv_co_is_all_zeroes(target_bs);
> +        bdrv_graph_co_rdunlock();
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        /*
> +         * If the destination already reads as zero, and we are not
> +         * requested to punch holes into existing zeroes, then we can
> +         * skip pre-zeroing the destination.
> +         */
> +        if (ret > 0 &&
> +            (target_bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP ||
> +             !bdrv_can_write_zeroes_with_unmap(target_bs))) {
> +            offset = s->bdev_length;

If when bdrv_can_write_zeroes_with_unmap(target_bs) == true, we prefer to
punch holes regardless of whether target_bs already reads as zero, then
execute bdrv_co_is_all_zeroes() in advance might be wasteful.

    if (bdrv_can_write_zeroes_with_unmap(target_bs)) {
        initial_zeroing();
    } else {
        ret = bdrv_co_is_all_zeroes(target_bs);
        ...
    }

> +        }
>          if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
>              bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
>              return 0;

When ret > 0, We should not return directly here.

>          }
> 
>          s->initial_zeroing_ongoing = true;
> -        for (offset = 0; offset < s->bdev_length; ) {
> +        while (offset < s->bdev_length) {
>              int bytes = MIN(s->bdev_length - offset,
>                              QEMU_ALIGN_DOWN(INT_MAX, s->granularity));
Re: [PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero
Posted by Eric Blake 5 months ago
On Thu, May 01, 2025 at 12:09:26AM +0800, Sunny Zhu wrote:
> > When doing a sync=full mirroring, QMP drive-mirror requests full
> > zeroing if it did not just create the destination, and blockdev-mirror
> > requests full zeroing unconditionally.  This is because during a full
> > sync, we must ensure that the portions of the disk that are not
> > otherwise touched by the source still read as zero upon completion.
> > 
> > However, in mirror_dirty_init(), we were blindly assuming that if the
> > destination allows punching holes, we should pre-zero the entire
> > image; and if it does not allow punching holes, then treat the entire
> > source as dirty rather than mirroring just the allocated portions of
> > the source.  Without the ability to punch holes, this results in the
> > destination file being fully allocated; and even when punching holes
> > is supported, it causes duplicate I/O to the portions of the
> > destination corresponding to chunks of the source that are allocated
> > but read as zero.
> > 
> > Smarter is to avoid the pre-zeroing pass over the destination if it
> > can be proved the destination already reads as zero.  Note that a
> > later patch will then further improve things to skip writing to the
> > destination for parts of the image where the source is zero; but even
> > with just this patch, it is possible to see a difference for any BDS
> > that can quickly report that it already reads as zero.
> > 
> > Note, however, that if the destination was opened with
> > "detect-zeroes": "unmap", then the user wants us to punch holes where
> > possible for any zeroes in the source; in that case, we are better off
> > doing unmap up front in bulk.
> > 
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > 
> > ---
> > 
> > v3: add exemption for "detect-zeroes":"unmap" on destination
> > ---
> >  block/mirror.c | 19 ++++++++++++++++++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/mirror.c b/block/mirror.c
> > index 34c6c5252e1..4059bf96854 100644
> > --- a/block/mirror.c
> > +++ b/block/mirror.c
> > @@ -849,13 +849,30 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
> >      bdrv_graph_co_rdunlock();
> > 
> >      if (s->zero_target) {
> > +        offset = 0;
> > +        bdrv_graph_co_rdlock();
> > +        ret = bdrv_co_is_all_zeroes(target_bs);
> > +        bdrv_graph_co_rdunlock();
> > +        if (ret < 0) {
> > +            return ret;
> > +        }
> > +        /*
> > +         * If the destination already reads as zero, and we are not
> > +         * requested to punch holes into existing zeroes, then we can
> > +         * skip pre-zeroing the destination.
> > +         */
> > +        if (ret > 0 &&
> > +            (target_bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP ||
> > +             !bdrv_can_write_zeroes_with_unmap(target_bs))) {
> > +            offset = s->bdev_length;
> 
> If when bdrv_can_write_zeroes_with_unmap(target_bs) == true, we prefer to
> punch holes regardless of whether target_bs already reads as zero, then
> execute bdrv_co_is_all_zeroes() in advance might be wasteful.

Hmm.  bdrv_co_is_all_zeroes() is supposed to be fast, but you're right
that even faster than a syscall or two is no syscalls at all.

> 
>     if (bdrv_can_write_zeroes_with_unmap(target_bs)) {
>         initial_zeroing();
>     } else {
>         ret = bdrv_co_is_all_zeroes(target_bs);
>         ...
>     }

That's a bigger refactoring; probably worth doing in a separate patch.
It looks like I should probably do a v4 along those lines, to see how
it compares.

> 
> > +        }
> >          if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
> >              bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
> >              return 0;
> 
> When ret > 0, We should not return directly here.

That's pre-existing, but correct.  If we can't write zeroes with
unmap, then we mark the entire image dirty (every zero in the source
will result in zeroes in the dest - as it was before this series); and
we have also marked the zero bitmap (writing zeroes will be a no-op if
the zero bitmap says that is safe).  The rest of this function has two
purposes: finish pre-zeroing (well, there's nothing to pre-zero if we
can't punch holes, and especially nothing to pre-zero if we already
know the image reads as all zero), and populate the dirty bitmap (we
just populated the entire map here, so it's not worth trying to
populate the map with finer granularity later), so returning here is
the right thing to do.

> 
> >          }
> > 
> >          s->initial_zeroing_ongoing = true;
> > -        for (offset = 0; offset < s->bdev_length; ) {
> > +        while (offset < s->bdev_length) {
> >              int bytes = MIN(s->bdev_length - offset,
> >                              QEMU_ALIGN_DOWN(INT_MAX, s->granularity));
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org
Re: [PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero
Posted by Sunny Zhu 5 months ago
On Thu, 1 May 2025 12:33:14 -0500, Eric wrote:
> > > +         * If the destination already reads as zero, and we are not
> > > +         * requested to punch holes into existing zeroes, then we can
> > > +         * skip pre-zeroing the destination.
> > > +         */
> > > +        if (ret > 0 &&
> > > +            (target_bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP ||
> > > +             !bdrv_can_write_zeroes_with_unmap(target_bs))) {
> > > +            offset = s->bdev_length;
> > 
> > If when bdrv_can_write_zeroes_with_unmap(target_bs) == true, we prefer to
> > punch holes regardless of whether target_bs already reads as zero, then
> > execute bdrv_co_is_all_zeroes() in advance might be wasteful.
> 
> Hmm.  bdrv_co_is_all_zeroes() is supposed to be fast, but you're right
> that even faster than a syscall or two is no syscalls at all.

Indeed, bdrv_co_is_all_zeroes() is supposed to be fast, and since we are on the
management plane rather than the data plane, the impact here should be negligible.

> 
> > 
> >     if (bdrv_can_write_zeroes_with_unmap(target_bs)) {
> >         initial_zeroing();
> >     } else {
> >         ret = bdrv_co_is_all_zeroes(target_bs);
> >         ...
> >     }
> 
> That's a bigger refactoring; probably worth doing in a separate patch.
> It looks like I should probably do a v4 along those lines, to see how
> it compares.
> 
> > 
> > > +        }
> > >          if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
> > >              bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
> > >              return 0;
> > 
> > When ret > 0, We should not return directly here.
> 
> That's pre-existing, but correct.  If we can't write zeroes with
> unmap, then we mark the entire image dirty (every zero in the source
> will result in zeroes in the dest - as it was before this series); and
> we have also marked the zero bitmap (writing zeroes will be a no-op if
> the zero bitmap says that is safe).  The rest of this function has two
> purposes: finish pre-zeroing (well, there's nothing to pre-zero if we
> can't punch holes, and especially nothing to pre-zero if we already
> know the image reads as all zero), and populate the dirty bitmap (we
> just populated the entire map here, so it's not worth trying to
> populate the map with finer granularity later), so returning here is
> the right thing to do.

I apologize that my previous explanation missed the key point. What I meant is:
when bdrv_co_is_all_zeroes() > 0 && !bdrv_can_write_zeroes_with_unmap(target_bs),
we set offset = s->bdev_length;

However here, when !bdrv_can_write_zeroes_with_unmap(target_bs) is true, we directly
return, which means that offset = s->bdev_length (i.e., our logic to skip pre-zeroing
the destination) will never take effect.

The appropriate modification here should be:

    if (!ret && !bdrv_can_write_zeroes_with_unmap(target_bs)) {
        bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
        return 0;

> 
> > 
> > >          }
> > > 
> > >          s->initial_zeroing_ongoing = true;
> > > -        for (offset = 0; offset < s->bdev_length; ) {
> > > +        while (offset < s->bdev_length) {
> > >              int bytes = MIN(s->bdev_length - offset,
> > >                              QEMU_ALIGN_DOWN(INT_MAX, s->granularity));
> > 
>