Public bug reported:
Hi,
Flushes on the primary VM (PVM) fail after the secondary VM (SVM) is killed, which leaves the primary guest's filesystem unavailable.
qemu version: 5.2.0
host/guest os: CentOS Linux release 7.6.1810 (Core)
Reproduce steps:
1. create colo vm following https://github.com/qemu/qemu/blob/master/docs/COLO-FT.txt
2. kill the secondary vm (don't remove the nbd child from quorum on the primary vm) and wait for a minute. The interval depends on the guest os.
result: the primary vm's file system shuts down because of a flush cache error.
After several tests, I found that qemu-5.0.0 works well, and that the
commit
https://git.qemu.org/?p=qemu.git;a=commit;h=883833e29cb800b4d92b5d4736252f4004885191
("block: Flush all children in generic code") introduced this change.
Both virtio-blk and ide are affected.
I think the failed flush of the nbd node (below replication) makes bdrv_co_flush(quorum_bs) fail. Here is the call stack:
#0 bdrv_co_flush (bs=0x56242b3cc0b0=nbd_bs) at ../block/io.c:2856
#1 0x0000562428b0f399 in bdrv_co_flush (bs=0x56242b3c7e00=replication_bs) at ../block/io.c:2920
#2 0x0000562428b0f399 in bdrv_co_flush (bs=0x56242a4ad800=quorum_bs) at ../block/io.c:2920
#3 0x0000562428b70d56 in blk_do_flush (blk=0x56242a4ad4a0) at ../block/block-backend.c:1672
#4 0x0000562428b70d87 in blk_aio_flush_entry (opaque=0x7fd0980073f0) at ../block/block-backend.c:1680
#5 0x0000562428c5f9a7 in coroutine_trampoline (i0=-1409269904, i1=32721) at ../util/coroutine-ucontext.c:173
I am not sure whether I am using colo improperly. Can we assume that the
nbd child of quorum is removed immediately after the svm crashes? Or is
this really a bug? Would the following patch fix it? Any help is
appreciated. Thanks a lot!
diff --git a/block/quorum.c b/block/quorum.c
index cfc1436..f2c0805 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1279,7 +1279,7 @@ static BlockDriver bdrv_quorum = {
.bdrv_dirname = quorum_dirname,
.bdrv_co_block_status = quorum_co_block_status,
- .bdrv_co_flush_to_disk = quorum_co_flush,
+ .bdrv_co_flush = quorum_co_flush,
.bdrv_getlength = quorum_getlength,
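To make the suspected failure mode concrete, here is a toy model in C. It is not QEMU code: `Node`, `flush_all_children` and `flush_with_vote` are hypothetical names, and the tree is only an assumption modelled on the call stack above (quorum -> replication -> nbd, plus a healthy local disk). The sketch contrasts a flush that recurses into every child and fails on the first error with a quorum-style flush that succeeds as long as enough children succeed. Whether QEMU's generic path behaves this way for quorum should be checked against block/io.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Toy stand-in for a block node. Hypothetical; not QEMU's BlockDriverState. */
typedef struct Node Node;
struct Node {
    const char *name;
    int own_flush_result;   /* 0 on success, negative errno on failure */
    int vote_threshold;     /* only meaningful for the quorum-like node */
    size_t num_children;
    const Node *children[MAX_CHILDREN];
};

/* "Flush every child" behaviour: any failing descendant fails the flush. */
static int flush_all_children(const Node *n)
{
    if (n->own_flush_result < 0) {
        return n->own_flush_result;
    }
    for (size_t i = 0; i < n->num_children; i++) {
        int ret = flush_all_children(n->children[i]);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}

/* Quorum-style behaviour: succeed when at least vote_threshold
 * children flush successfully, otherwise report the last error seen. */
static int flush_with_vote(const Node *q)
{
    size_t ok = 0;
    int last_err = -EIO;

    assert(q->vote_threshold > 0);
    for (size_t i = 0; i < q->num_children; i++) {
        int ret = flush_all_children(q->children[i]);
        if (ret == 0) {
            ok++;
        } else {
            last_err = ret;
        }
    }
    return ok >= (size_t)q->vote_threshold ? 0 : last_err;
}
```

With a healthy local child, a replication child whose nbd child returns -EIO, and a vote threshold of 1, `flush_all_children()` on the quorum node reports -EIO while `flush_with_vote()` reports success. That difference is what the proposed patch is trying to restore.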
** Affects: qemu
Importance: Undecided
Status: New
** Patch added: "primary guest kernel message"
https://bugs.launchpad.net/bugs/1923583/+attachment/5487235/+files/primary_guest_dmesg.log
https://bugs.launchpad.net/bugs/1923583
Title:
colo: pvm flush failed after svm killed
Status in QEMU:
New
Patchew URL: https://patchew.org/QEMU/161830261172.29345.7866671962411605196.malonedeb@wampee.canonical.com/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 161830261172.29345.7866671962411605196.malonedeb@wampee.canonical.com
Subject: [Bug 1923583] [NEW] colo: pvm flush failed after svm killed

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/161830261172.29345.7866671962411605196.malonedeb@wampee.canonical.com -> patchew/161830261172.29345.7866671962411605196.malonedeb@wampee.canonical.com
 - [tag update] patchew/20210413081008.3409459-1-f4bug@amsat.org -> patchew/20210413081008.3409459-1-f4bug@amsat.org
Switched to a new branch 'test'
f43885d colo: pvm flush failed after svm killed

=== OUTPUT BEGIN ===
ERROR: Missing Signed-off-by: line(s)

total: 1 errors, 0 warnings, 8 lines checked

Commit f43885d3a7e9 (colo: pvm flush failed after svm killed) has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1

The full log is available at
http://patchew.org/logs/161830261172.29345.7866671962411605196.malonedeb@wampee.canonical.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com