block: Fix slow pre-zeroing in qemu-img convert

[Qemu-devel] [PATCH 0/6] block: Fix slow pre-zeroing in qemu-img convert

Posted by Kevin Wolf 5 years, 1 month ago

If qemu-img convert sees that the target image isn't zero-initialised
yet, it tries to do an efficient zero write for the whole image first
to save the overhead of repeated explicit zero writes during the
conversion. Obviously, this provides only an advantage if the
pre-zeroing is actually efficient. Otherwise, we can end up writing
zeroes slowly while zeroing out the whole image, and then overwrite the
same blocks again with real data, potentially doubling the written data.

Additionally, commit 9776f0db changed NBD to advertise for all NBD
devices that zero writes with unmap is supported for all NBD devices, no
matter whether the storage of NBD server actually can do this or whether
we would fall back to explicit zero writes.

This series adds a new BDRV_REQ_NO_FALLBACK flag for writing zeroes,
which makes the request return an error if it can't guarantee that we
don't end up running a slow fallback path.

For NBD, this means that we still support zero writes from guests, but
qemu-img convert will not try to use it to zero out the whole image.
This is potentially suboptimal if the server does actually support
efficient zero writes. We'd need an NBD protocol extension to transfer
this flag so we can re-enable the qemu-img convert feature for the NBD
driver without regressing on storage that can't provide efficient zero
writes.

Kevin Wolf (6):
  block: Remove error messages in bdrv_make_zero()
  block: Add BDRV_REQ_NO_FALLBACK
  block: Advertise BDRV_REQ_NO_FALLBACK in filter drivers
  file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes
  qemu-img: Use BDRV_REQ_NO_FALLBACK for pre-zeroing
  qemu-io: Add write -n for BDRV_REQ_NO_FALLBACK

 include/block/block.h   |  7 ++++++-
 include/block/raw-aio.h |  1 +
 block/blkdebug.c        |  2 +-
 block/copy-on-read.c    |  7 +++----
 block/file-posix.c      | 24 ++++++++++++++++--------
 block/io.c              | 16 +++++++++++-----
 block/mirror.c          |  3 ++-
 block/raw-format.c      |  2 +-
 qemu-img.c              |  2 +-
 qemu-io-cmds.c          | 13 +++++++++++--
 10 files changed, 53 insertions(+), 24 deletions(-)

-- 
2.20.1

Re: [Qemu-devel] [PATCH 0/6] block: Fix slow pre-zeroing in qemu-img convert

Posted by Eric Blake 5 years, 1 month ago

On 3/22/19 8:21 AM, Kevin Wolf wrote:
> If qemu-img convert sees that the target image isn't zero-initialised
> yet, it tries to do an efficient zero write for the whole image first
> to save the overhead of repeated explicit zero writes during the
> conversion. Obviously, this provides only an advantage if the
> pre-zeroing is actually efficient. Otherwise, we can end up writing
> zeroes slowly while zeroing out the whole image, and then overwrite the
> same blocks again with real data, potentially doubling the written data.
> 
> Additionally, commit 9776f0db changed NBD to advertise for all NBD
> devices that zero writes with unmap is supported for all NBD devices, no
> matter whether the storage of NBD server actually can do this or whether
> we would fall back to explicit zero writes.
> 
> This series adds a new BDRV_REQ_NO_FALLBACK flag for writing zeroes,
> which makes the request return an error if it can't guarantee that we
> don't end up running a slow fallback path.
> 
> For NBD, this means that we still support zero writes from guests, but
> qemu-img convert will not try to use it to zero out the whole image.
> This is potentially suboptimal if the server does actually support
> efficient zero writes. We'd need an NBD protocol extension to transfer
> this flag so we can re-enable the qemu-img convert feature for the NBD
> driver without regressing on storage that can't provide efficient zero
> writes.

I haven't reviewed closely yet, but the summary is sane, so:
Acked-by: Eric Blake <eblake@redhat.com>

and I'll be cross-posting another mail soon with the desired NBD
protocol extension.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH for-4.0? 0/6] block: Fix slow pre-zeroing in qemu-img convert

Posted by Eric Blake 5 years, 1 month ago

On 3/22/19 9:21 AM, Eric Blake wrote:
> On 3/22/19 8:21 AM, Kevin Wolf wrote:
>> If qemu-img convert sees that the target image isn't zero-initialised
>> yet, it tries to do an efficient zero write for the whole image first
>> to save the overhead of repeated explicit zero writes during the
>> conversion. Obviously, this provides only an advantage if the
>> pre-zeroing is actually efficient. Otherwise, we can end up writing
>> zeroes slowly while zeroing out the whole image, and then overwrite the
>> same blocks again with real data, potentially doubling the written data.
>>
>> Additionally, commit 9776f0db changed NBD to advertise for all NBD
>> devices that zero writes with unmap is supported for all NBD devices, no
>> matter whether the storage of NBD server actually can do this or whether
>> we would fall back to explicit zero writes.
>>
>> This series adds a new BDRV_REQ_NO_FALLBACK flag for writing zeroes,
>> which makes the request return an error if it can't guarantee that we
>> don't end up running a slow fallback path.
>>
>> For NBD, this means that we still support zero writes from guests, but
>> qemu-img convert will not try to use it to zero out the whole image.
>> This is potentially suboptimal if the server does actually support
>> efficient zero writes. We'd need an NBD protocol extension to transfer
>> this flag so we can re-enable the qemu-img convert feature for the NBD
>> driver without regressing on storage that can't provide efficient zero
>> writes.
> 
> I haven't reviewed closely yet, but the summary is sane, so:
> Acked-by: Eric Blake <eblake@redhat.com>
> 
> and I'll be cross-posting another mail soon with the desired NBD
> protocol extension.

I also meant to add - since the performance slowdown is so noticeable,
and we are still at -rc0, I think this series is reasonable as 4.0
material as a bug fix (performance bugs rank lower than correctness
bugs, but they are still bugs)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH 0/6] block: Fix slow pre-zeroing in qemu-img convert

Posted by Kevin Wolf 5 years, 1 month ago

Am 22.03.2019 um 14:21 hat Kevin Wolf geschrieben:
> If qemu-img convert sees that the target image isn't zero-initialised
> yet, it tries to do an efficient zero write for the whole image first
> to save the overhead of repeated explicit zero writes during the
> conversion. Obviously, this provides only an advantage if the
> pre-zeroing is actually efficient. Otherwise, we can end up writing
> zeroes slowly while zeroing out the whole image, and then overwrite the
> same blocks again with real data, potentially doubling the written data.
> 
> Additionally, commit 9776f0db changed NBD to advertise for all NBD
> devices that zero writes with unmap is supported for all NBD devices, no
> matter whether the storage of NBD server actually can do this or whether
> we would fall back to explicit zero writes.
> 
> This series adds a new BDRV_REQ_NO_FALLBACK flag for writing zeroes,
> which makes the request return an error if it can't guarantee that we
> don't end up running a slow fallback path.
> 
> For NBD, this means that we still support zero writes from guests, but
> qemu-img convert will not try to use it to zero out the whole image.
> This is potentially suboptimal if the server does actually support
> efficient zero writes. We'd need an NBD protocol extension to transfer
> this flag so we can re-enable the qemu-img convert feature for the NBD
> driver without regressing on storage that can't provide efficient zero
> writes.

Applied to the block branch.

Kevin

Re: [Qemu-devel] [PATCH 0/6] block: Fix slow pre-zeroing in qemu-img convert

Posted by Nir Soffer 5 years, 1 month ago

On Fri, Mar 22, 2019 at 3:22 PM Kevin Wolf <kwolf@redhat.com> wrote:

> If qemu-img convert sees that the target image isn't zero-initialised
> yet, it tries to do an efficient zero write for the whole image first
> to save the overhead of repeated explicit zero writes during the
> conversion. Obviously, this provides only an advantage if the
> pre-zeroing is actually efficient. Otherwise, we can end up writing
> zeroes slowly while zeroing out the whole image, and then overwrite the
> same blocks again with real data, potentially doubling the written data.
>
> Additionally, commit 9776f0db changed NBD to advertise for all NBD
> devices that zero writes with unmap is supported for all NBD devices, no
> matter whether the storage of NBD server actually can do this or whether
> we would fall back to explicit zero writes.
>
> This series adds a new BDRV_REQ_NO_FALLBACK flag for writing zeroes,
> which makes the request return an error if it can't guarantee that we
> don't end up running a slow fallback path.
>
> For NBD, this means that we still support zero writes from guests, but
> qemu-img convert will not try to use it to zero out the whole image.
> This is potentially suboptimal if the server does actually support
> efficient zero writes. We'd need an NBD protocol extension to transfer
> this flag so we can re-enable the qemu-img convert feature for the NBD
> driver without regressing on storage that can't provide efficient zero
> writes.
>

Thanks for fixing this! This will be a very important performance fix for
importing
VMs.

The change description sounds good to me.

Can you make these patches available on some git repo to make it easy to
test?

Nir

Kevin Wolf (6):
>   block: Remove error messages in bdrv_make_zero()
>   block: Add BDRV_REQ_NO_FALLBACK
>   block: Advertise BDRV_REQ_NO_FALLBACK in filter drivers
>   file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes
>   qemu-img: Use BDRV_REQ_NO_FALLBACK for pre-zeroing
>   qemu-io: Add write -n for BDRV_REQ_NO_FALLBACK
>
>  include/block/block.h   |  7 ++++++-
>  include/block/raw-aio.h |  1 +
>  block/blkdebug.c        |  2 +-
>  block/copy-on-read.c    |  7 +++----
>  block/file-posix.c      | 24 ++++++++++++++++--------
>  block/io.c              | 16 +++++++++++-----
>  block/mirror.c          |  3 ++-
>  block/raw-format.c      |  2 +-
>  qemu-img.c              |  2 +-
>  qemu-io-cmds.c          | 13 +++++++++++--
>  10 files changed, 53 insertions(+), 24 deletions(-)
>
> --
> 2.20.1
>
>

Re: [Qemu-devel] [PATCH 0/6] block: Fix slow pre-zeroing in qemu-img convert

Posted by Eric Blake 5 years, 1 month ago

On 3/22/19 3:22 PM, Nir Soffer wrote:

> Thanks for fixing this! This will be a very important performance fix for
> importing
> VMs.
> 
> The change description sounds good to me.
> 
> Can you make these patches available on some git repo to make it easy to
> test?

Based on commit titles, this branch looks reasonably close (at the moment):

https://repo.or.cz/qemu/kevin.git/shortlog/refs/heads/qcow2-data-file

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH 0/6] block: Fix slow pre-zeroing in qemu-img convert

Posted by Nir Soffer 5 years, 1 month ago

On Fri, Mar 22, 2019 at 10:47 PM Eric Blake <eblake@redhat.com> wrote:

> On 3/22/19 3:22 PM, Nir Soffer wrote:
>
> > Thanks for fixing this! This will be a very important performance fix for
> > importing
> > VMs.
> >
> > The change description sounds good to me.
> >
> > Can you make these patches available on some git repo to make it easy to
> > test?
>
> Based on commit titles, this branch looks reasonably close (at the moment):
>
> https://repo.or.cz/qemu/kevin.git/shortlog/refs/heads/qcow2-data-file

I tested the patches and they work correctly with NBD server (tested both
nbdkit
and qemu-nbd), skipping pre zero step, and zeroing only the unallocated
areas.

However zeroing is always done without BDRV_REQ_MAY_UNMAP, creating
fully allocated images when the expected result is sparse image.

I posted a patch with possible fix, explaining this issue with mode details.
I don't see it yest in mail archive, but the patch is available here:
https://github.com/nirs/qemu/commit/5023405ee42dc0d8e9a03ac03d247079f719b455

Nir