[PATCH for-4.2 v2 0/3] qcow2: Fix data corruption on XFS

Max Reitz posted 3 patches 4 years, 4 months ago
Test asan passed
Test checkpatch passed
Test FreeBSD passed
Test docker-mingw@fedora passed
Test docker-clang@ubuntu passed
Test docker-quick@centos7 failed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20191101152510.11719-1-mreitz@redhat.com
Maintainers: Max Reitz <mreitz@redhat.com>, Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>, Stefan Hajnoczi <stefanha@redhat.com>
include/block/block_int.h |  4 ++++
block/file-posix.c        | 36 +++++++++++++++++++++++++++++++++
block/io.c                | 42 ++++++++++++++++++++++++++++-----------
3 files changed, 70 insertions(+), 12 deletions(-)
[PATCH for-4.2 v2 0/3] qcow2: Fix data corruption on XFS
Posted by Max Reitz 4 years, 4 months ago
Hi,

As I reasoned here:
https://lists.nongnu.org/archive/html/qemu-block/2019-11/msg00027.html
I’m no longer convinced of reverting c8bb23cbdbe.  I could see a
significant performance improvement from it on ext4 with aio=native in a
guest that does random 4k writes, and as such I feel it would be a
regression to revert it for 4.2.

To work around the XFS corruption, we still need the other three patches
from the series, of course.  We cannot restrict the workaround to just
XFS, because maybe the file is on a remote filesystem and then we don’t
know anything about the host configuration.

The performance impact should still be minimal because this just
serializes post-EOF write-zeroes and data writes, and that just doesn’t
happen very often, even with handle_alloc_space() in qcow2.


I would love to have more time to make a decision, but there simply
isn’t any.  Patches for 4.1.1 are to be collected on Monday, AFAIU.


v2:
- Dropped patch 1
- Forgot a “coroutine_fn” in patch 2 (it isn’t really important,
  qemu_coroutine_self() works in non-coroutine functions, too, but
  calling bdrv_(co_)get_self_request() from a non-coroutine just doesn’t
  make any sense)
- Patch 3:
  - Reverted the commit message to the RFC state to reflect that this is
    specifically because of handle_alloc_space()
  - Dropped the two lines that said there’d be no performance impact at
    all because no driver would submit zero writes beyond the EOF
    (because qcow2 now still does)


git-backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/3:[----] [--] 'block: Make wait/mark serialising requests public'
002/3:[0004] [FC] 'block: Add bdrv_co_get_self_request()'
003/3:[0002] [FC] 'block/file-posix: Let post-EOF fallocate serialize'


Max Reitz (3):
  block: Make wait/mark serialising requests public
  block: Add bdrv_co_get_self_request()
  block/file-posix: Let post-EOF fallocate serialize

 include/block/block_int.h |  4 ++++
 block/file-posix.c        | 36 +++++++++++++++++++++++++++++++++
 block/io.c                | 42 ++++++++++++++++++++++++++++-----------
 3 files changed, 70 insertions(+), 12 deletions(-)

-- 
2.21.0


Re: [PATCH for-4.2 v2 0/3] qcow2: Fix data corruption on XFS
Posted by no-reply@patchew.org 4 years, 4 months ago
Patchew URL: https://patchew.org/QEMU/20191101152510.11719-1-mreitz@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    check-qtest-x86_64: tests/ahci-test
  TEST    check-unit: tests/test-aio-multithread
test-aio-multithread: /tmp/qemu-test/src/util/async.c:283: aio_ctx_finalize: Assertion `!qemu_lockcnt_count(&ctx->list_lock)' failed.
ERROR - too few tests run (expected 6, got 2)
make: *** [check-unit] Error 1
make: *** Waiting for unfinished jobs....
  TEST    iotest-qcow2: 013
  TEST    iotest-qcow2: 017
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=bf88cbb2b1b4497f8f4fc2b674903b08', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-vhhub41q/src/docker-src.2019-11-01-11.37.23.29941:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=bf88cbb2b1b4497f8f4fc2b674903b08
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-vhhub41q/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    11m36.434s
user    0m9.145s


The full log is available at
http://patchew.org/logs/20191101152510.11719-1-mreitz@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Re: [PATCH for-4.2 v2 0/3] qcow2: Fix data corruption on XFS
Posted by Max Reitz 4 years, 4 months ago
On 01.11.19 16:48, no-reply@patchew.org wrote:
> Patchew URL: https://patchew.org/QEMU/20191101152510.11719-1-mreitz@redhat.com/
> 
> 
> 
> Hi,
> 
> This series failed the docker-quick@centos7 build test. Please find the testing commands and
> their output below. If you have Docker installed, you can probably reproduce it
> locally.
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> make docker-image-centos7 V=1 NETWORK=1
> time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
> === TEST SCRIPT END ===
> 
>   TEST    check-qtest-x86_64: tests/ahci-test
>   TEST    check-unit: tests/test-aio-multithread
> test-aio-multithread: /tmp/qemu-test/src/util/async.c:283: aio_ctx_finalize: Assertion `!qemu_lockcnt_count(&ctx->list_lock)' failed.
> ERROR - too few tests run (expected 6, got 2)
> make: *** [check-unit] Error 1
> make: *** Waiting for unfinished jobs....

This doesn’t seem related to this series and I can’t reproduce it
locally, so I’ll just ignore it.

Max

Re: [PATCH for-4.2 v2 0/3] qcow2: Fix data corruption on XFS
Posted by Max Reitz 4 years, 4 months ago
On 01.11.19 16:25, Max Reitz wrote:
> Hi,
> 
> As I reasoned here:
> https://lists.nongnu.org/archive/html/qemu-block/2019-11/msg00027.html
> I’m no longer convinced of reverting c8bb23cbdbe.  I could see a
> significant performance improvement from it on ext4 with aio=native in a
> guest that does random 4k writes, and as such I feel it would be a
> regression to revert it for 4.2.
> 
> To work around the XFS corruption, we still need the other three patches
> from the series, of course.  We cannot restrict the workaround to just
> XFS, because maybe the file is on a remote filesystem and then we don’t
> know anything about the host configuration.
> 
> The performance impact should still be minimal because this just
> serializes post-EOF write-zeroes and data writes, and that just doesn’t
> happen very often, even with handle_alloc_space() in qcow2.
> 
> 
> I would love to have more time to make a decision, but there simply
> isn’t any.  Patches for 4.1.1 are to be collected on Monday, AFAIU.

I would have liked some reviews on this series (so I waited over the
weekend, even though I didn’t expect any), but now I’ve applied it
anyway (and sent a pull request).  I understand it was difficult last
week because of KVM Forum.

AFAIU we need it today for stable, and there didn’t seem to be any
opposition on these patches here, just on the revert of c8bb23cbdbe.

I welcome you to still review the series and then shout “STOP” at the
pull request if you find it necessary.

Max

Re: [PATCH for-4.2 v2 0/3] qcow2: Fix data corruption on XFS
Posted by Stefan Hajnoczi 4 years, 4 months ago
On Fri, Nov 01, 2019 at 04:25:07PM +0100, Max Reitz wrote:
> Hi,
> 
> As I reasoned here:
> https://lists.nongnu.org/archive/html/qemu-block/2019-11/msg00027.html
> I’m no longer convinced of reverting c8bb23cbdbe.  I could see a
> significant performance improvement from it on ext4 with aio=native in a
> guest that does random 4k writes, and as such I feel it would be a
> regression to revert it for 4.2.
> 
> To work around the XFS corruption, we still need the other three patches
> from the series, of course.  We cannot restrict the workaround to just
> XFS, because maybe the file is on a remote filesystem and then we don’t
> know anything about the host configuration.
> 
> The performance impact should still be minimal because this just
> serializes post-EOF write-zeroes and data writes, and that just doesn’t
> happen very often, even with handle_alloc_space() in qcow2.
> 
> 
> I would love to have more time to make a decision, but there simply
> isn’t any.  Patches for 4.1.1 are to be collected on Monday, AFAIU.
> 
> 
> v2:
> - Dropped patch 1
> - Forgot a “coroutine_fn” in patch 2 (it isn’t really important,
>   qemu_coroutine_self() works in non-coroutine functions, too, but
>   calling bdrv_(co_)get_self_request() from a non-coroutine just doesn’t
>   make any sense)
> - Patch 3:
>   - Reverted the commit message to the RFC state to reflect that this is
>     specifically because of handle_alloc_space()
>   - Dropped the two lines that said there’d be no performance impact at
>     all because no driver would submit zero writes beyond the EOF
>     (because qcow2 now still does)
> 
> 
> git-backport-diff against v1:
> 
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
> 
> 001/3:[----] [--] 'block: Make wait/mark serialising requests public'
> 002/3:[0004] [FC] 'block: Add bdrv_co_get_self_request()'
> 003/3:[0002] [FC] 'block/file-posix: Let post-EOF fallocate serialize'
> 
> 
> Max Reitz (3):
>   block: Make wait/mark serialising requests public
>   block: Add bdrv_co_get_self_request()
>   block/file-posix: Let post-EOF fallocate serialize
> 
>  include/block/block_int.h |  4 ++++
>  block/file-posix.c        | 36 +++++++++++++++++++++++++++++++++
>  block/io.c                | 42 ++++++++++++++++++++++++++++-----------
>  3 files changed, 70 insertions(+), 12 deletions(-)
> 
> -- 
> 2.21.0
> 

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>