include/block/block.h | 7 ++++++- include/block/raw-aio.h | 1 + block/blkdebug.c | 2 +- block/copy-on-read.c | 7 +++---- block/file-posix.c | 24 ++++++++++++++++-------- block/io.c | 16 +++++++++++----- block/mirror.c | 3 ++- block/raw-format.c | 2 +- qemu-img.c | 2 +- qemu-io-cmds.c | 13 +++++++++++-- 10 files changed, 53 insertions(+), 24 deletions(-)
If qemu-img convert sees that the target image isn't zero-initialised yet, it tries to do an efficient zero write for the whole image first to save the overhead of repeated explicit zero writes during the conversion. Obviously, this provides only an advantage if the pre-zeroing is actually efficient. Otherwise, we can end up writing zeroes slowly while zeroing out the whole image, and then overwrite the same blocks again with real data, potentially doubling the written data. Additionally, commit 9776f0db changed NBD to advertise for all NBD devices that zero writes with unmap is supported for all NBD devices, no matter whether the storage of NBD server actually can do this or whether we would fall back to explicit zero writes. This series adds a new BDRV_REQ_NO_FALLBACK flag for writing zeroes, which makes the request return an error if it can't guarantee that we don't end up running a slow fallback path. For NBD, this means that we still support zero writes from guests, but qemu-img convert will not try to use it to zero out the whole image. This is potentially suboptimal if the server does actually support efficient zero writes. We'd need an NBD protocol extension to transfer this flag so we can re-enable the qemu-img convert feature for the NBD driver without regressing on storage that can't provide efficient zero writes. Kevin Wolf (6): block: Remove error messages in bdrv_make_zero() block: Add BDRV_REQ_NO_FALLBACK block: Advertise BDRV_REQ_NO_FALLBACK in filter drivers file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes qemu-img: Use BDRV_REQ_NO_FALLBACK for pre-zeroing qemu-io: Add write -n for BDRV_REQ_NO_FALLBACK include/block/block.h | 7 ++++++- include/block/raw-aio.h | 1 + block/blkdebug.c | 2 +- block/copy-on-read.c | 7 +++---- block/file-posix.c | 24 ++++++++++++++++-------- block/io.c | 16 +++++++++++----- block/mirror.c | 3 ++- block/raw-format.c | 2 +- qemu-img.c | 2 +- qemu-io-cmds.c | 13 +++++++++++-- 10 files changed, 53 insertions(+), 24 deletions(-) -- 2.20.1
On 3/22/19 8:21 AM, Kevin Wolf wrote: > If qemu-img convert sees that the target image isn't zero-initialised > yet, it tries to do an efficient zero write for the whole image first > to save the overhead of repeated explicit zero writes during the > conversion. Obviously, this provides only an advantage if the > pre-zeroing is actually efficient. Otherwise, we can end up writing > zeroes slowly while zeroing out the whole image, and then overwrite the > same blocks again with real data, potentially doubling the written data. > > Additionally, commit 9776f0db changed NBD to advertise for all NBD > devices that zero writes with unmap is supported for all NBD devices, no > matter whether the storage of NBD server actually can do this or whether > we would fall back to explicit zero writes. > > This series adds a new BDRV_REQ_NO_FALLBACK flag for writing zeroes, > which makes the request return an error if it can't guarantee that we > don't end up running a slow fallback path. > > For NBD, this means that we still support zero writes from guests, but > qemu-img convert will not try to use it to zero out the whole image. > This is potentially suboptimal if the server does actually support > efficient zero writes. We'd need an NBD protocol extension to transfer > this flag so we can re-enable the qemu-img convert feature for the NBD > driver without regressing on storage that can't provide efficient zero > writes. I haven't reviewed closely yet, but the summary is sane, so: Acked-by: Eric Blake <eblake@redhat.com> and I'll be cross-posting another mail soon with the desired NBD protocol extension. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
On 3/22/19 9:21 AM, Eric Blake wrote: > On 3/22/19 8:21 AM, Kevin Wolf wrote: >> If qemu-img convert sees that the target image isn't zero-initialised >> yet, it tries to do an efficient zero write for the whole image first >> to save the overhead of repeated explicit zero writes during the >> conversion. Obviously, this provides only an advantage if the >> pre-zeroing is actually efficient. Otherwise, we can end up writing >> zeroes slowly while zeroing out the whole image, and then overwrite the >> same blocks again with real data, potentially doubling the written data. >> >> Additionally, commit 9776f0db changed NBD to advertise for all NBD >> devices that zero writes with unmap is supported for all NBD devices, no >> matter whether the storage of NBD server actually can do this or whether >> we would fall back to explicit zero writes. >> >> This series adds a new BDRV_REQ_NO_FALLBACK flag for writing zeroes, >> which makes the request return an error if it can't guarantee that we >> don't end up running a slow fallback path. >> >> For NBD, this means that we still support zero writes from guests, but >> qemu-img convert will not try to use it to zero out the whole image. >> This is potentially suboptimal if the server does actually support >> efficient zero writes. We'd need an NBD protocol extension to transfer >> this flag so we can re-enable the qemu-img convert feature for the NBD >> driver without regressing on storage that can't provide efficient zero >> writes. > > I haven't reviewed closely yet, but the summary is sane, so: > Acked-by: Eric Blake <eblake@redhat.com> > > and I'll be cross-posting another mail soon with the desired NBD > protocol extension. I also meant to add - since the performance slowdown is so noticeable, and we are still at -rc0, I think this series is reasonable as 4.0 material as a bug fix (performance bugs rank lower than correctness bugs, but they are still bugs) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
Am 22.03.2019 um 14:21 hat Kevin Wolf geschrieben: > If qemu-img convert sees that the target image isn't zero-initialised > yet, it tries to do an efficient zero write for the whole image first > to save the overhead of repeated explicit zero writes during the > conversion. Obviously, this provides only an advantage if the > pre-zeroing is actually efficient. Otherwise, we can end up writing > zeroes slowly while zeroing out the whole image, and then overwrite the > same blocks again with real data, potentially doubling the written data. > > Additionally, commit 9776f0db changed NBD to advertise for all NBD > devices that zero writes with unmap is supported for all NBD devices, no > matter whether the storage of NBD server actually can do this or whether > we would fall back to explicit zero writes. > > This series adds a new BDRV_REQ_NO_FALLBACK flag for writing zeroes, > which makes the request return an error if it can't guarantee that we > don't end up running a slow fallback path. > > For NBD, this means that we still support zero writes from guests, but > qemu-img convert will not try to use it to zero out the whole image. > This is potentially suboptimal if the server does actually support > efficient zero writes. We'd need an NBD protocol extension to transfer > this flag so we can re-enable the qemu-img convert feature for the NBD > driver without regressing on storage that can't provide efficient zero > writes. Applied to the block branch. Kevin
On Fri, Mar 22, 2019 at 3:22 PM Kevin Wolf <kwolf@redhat.com> wrote: > If qemu-img convert sees that the target image isn't zero-initialised > yet, it tries to do an efficient zero write for the whole image first > to save the overhead of repeated explicit zero writes during the > conversion. Obviously, this provides only an advantage if the > pre-zeroing is actually efficient. Otherwise, we can end up writing > zeroes slowly while zeroing out the whole image, and then overwrite the > same blocks again with real data, potentially doubling the written data. > > Additionally, commit 9776f0db changed NBD to advertise for all NBD > devices that zero writes with unmap is supported for all NBD devices, no > matter whether the storage of NBD server actually can do this or whether > we would fall back to explicit zero writes. > > This series adds a new BDRV_REQ_NO_FALLBACK flag for writing zeroes, > which makes the request return an error if it can't guarantee that we > don't end up running a slow fallback path. > > For NBD, this means that we still support zero writes from guests, but > qemu-img convert will not try to use it to zero out the whole image. > This is potentially suboptimal if the server does actually support > efficient zero writes. We'd need an NBD protocol extension to transfer > this flag so we can re-enable the qemu-img convert feature for the NBD > driver without regressing on storage that can't provide efficient zero > writes. > Thanks for fixing this! This will be a very important performance fix for importing VMs. The change description sounds good to me. Can you make these patches available on some git repo to make it easy to test? Nir Kevin Wolf (6): > block: Remove error messages in bdrv_make_zero() > block: Add BDRV_REQ_NO_FALLBACK > block: Advertise BDRV_REQ_NO_FALLBACK in filter drivers > file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes > qemu-img: Use BDRV_REQ_NO_FALLBACK for pre-zeroing > qemu-io: Add write -n for BDRV_REQ_NO_FALLBACK > > include/block/block.h | 7 ++++++- > include/block/raw-aio.h | 1 + > block/blkdebug.c | 2 +- > block/copy-on-read.c | 7 +++---- > block/file-posix.c | 24 ++++++++++++++++-------- > block/io.c | 16 +++++++++++----- > block/mirror.c | 3 ++- > block/raw-format.c | 2 +- > qemu-img.c | 2 +- > qemu-io-cmds.c | 13 +++++++++++-- > 10 files changed, 53 insertions(+), 24 deletions(-) > > -- > 2.20.1 > >
On 3/22/19 3:22 PM, Nir Soffer wrote: > Thanks for fixing this! This will be a very important performance fix for > importing > VMs. > > The change description sounds good to me. > > Can you make these patches available on some git repo to make it easy to > test? Based on commit titles, this branch looks reasonably close (at the moment): https://repo.or.cz/qemu/kevin.git/shortlog/refs/heads/qcow2-data-file -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org
On Fri, Mar 22, 2019 at 10:47 PM Eric Blake <eblake@redhat.com> wrote: > On 3/22/19 3:22 PM, Nir Soffer wrote: > > > Thanks for fixing this! This will be a very important performance fix for > > importing > > VMs. > > > > The change description sounds good to me. > > > > Can you make these patches available on some git repo to make it easy to > > test? > > Based on commit titles, this branch looks reasonably close (at the moment): > > https://repo.or.cz/qemu/kevin.git/shortlog/refs/heads/qcow2-data-file I tested the patches and they work correctly with NBD server (tested both nbdkit and qemu-nbd), skipping pre zero step, and zeroing only the unallocated areas. However zeroing is always done without BDRV_REQ_MAY_UNMAP, creating fully allocated images when the expected result is sparse image. I posted a patch with possible fix, explaining this issue with mode details. I don't see it yest in mail archive, but the patch is available here: https://github.com/nirs/qemu/commit/5023405ee42dc0d8e9a03ac03d247079f719b455 Nir
© 2016 - 2024 Red Hat, Inc.