This tests exercises graph locking, draining, and graph modifications
with AioContext switches a lot. Amongst others, it serves as a
regression test for bdrv_graph_wrlock() deadlocking because it is called
with a locked AioContext and for AioContext handling in the NBD server.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
tests/qemu-iotests/iotests.py | 4 ++
.../qemu-iotests/tests/graph-changes-while-io | 56 +++++++++++++++++--
.../tests/graph-changes-while-io.out | 4 +-
3 files changed, 58 insertions(+), 6 deletions(-)
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 3e82c634cf..7073579a7d 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -462,6 +462,10 @@ def qmp(self, cmd: str, args: Optional[Dict[str, object]] = None) \
assert self._qmp is not None
return self._qmp.cmd(cmd, args)
+ def get_qmp(self) -> QEMUMonitorProtocol:
+ assert self._qmp is not None
+ return self._qmp
+
def stop(self, kill_signal=15):
self._p.send_signal(kill_signal)
self._p.wait()
diff --git a/tests/qemu-iotests/tests/graph-changes-while-io b/tests/qemu-iotests/tests/graph-changes-while-io
index 7664f33689..750e7d4d38 100755
--- a/tests/qemu-iotests/tests/graph-changes-while-io
+++ b/tests/qemu-iotests/tests/graph-changes-while-io
@@ -22,19 +22,19 @@
import os
from threading import Thread
import iotests
-from iotests import imgfmt, qemu_img, qemu_img_create, QMPTestCase, \
- QemuStorageDaemon
+from iotests import imgfmt, qemu_img, qemu_img_create, qemu_io, \
+ QMPTestCase, QemuStorageDaemon
top = os.path.join(iotests.test_dir, 'top.img')
nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock')
-def do_qemu_img_bench() -> None:
+def do_qemu_img_bench(count: int = 2000000) -> None:
"""
Do some I/O requests on `nbd_sock`.
"""
- qemu_img('bench', '-f', 'raw', '-c', '2000000',
+ qemu_img('bench', '-f', 'raw', '-c', str(count),
f'nbd+unix:///node0?socket={nbd_sock}')
@@ -84,6 +84,54 @@ class TestGraphChangesWhileIO(QMPTestCase):
bench_thr.join()
+ def test_commit_while_io(self) -> None:
+ # Run qemu-img bench in the background
+ bench_thr = Thread(target=do_qemu_img_bench, args=(200000, ))
+ bench_thr.start()
+
+ qemu_io('-c', 'write 0 64k', top)
+ qemu_io('-c', 'write 128k 64k', top)
+
+ result = self.qsd.qmp('blockdev-add', {
+ 'driver': imgfmt,
+ 'node-name': 'overlay',
+ 'backing': None,
+ 'file': {
+ 'driver': 'file',
+ 'filename': top
+ }
+ })
+ self.assert_qmp(result, 'return', {})
+
+ result = self.qsd.qmp('blockdev-snapshot', {
+ 'node': 'node0',
+ 'overlay': 'overlay',
+ })
+ self.assert_qmp(result, 'return', {})
+
+ # While qemu-img bench is running, repeatedly commit overlay to node0
+ while bench_thr.is_alive():
+ result = self.qsd.qmp('block-commit', {
+ 'job-id': 'job0',
+ 'device': 'overlay',
+ })
+ self.assert_qmp(result, 'return', {})
+
+ result = self.qsd.qmp('block-job-cancel', {
+ 'device': 'job0',
+ })
+ self.assert_qmp(result, 'return', {})
+
+ cancelled = False
+ while not cancelled:
+ for event in self.qsd.get_qmp().get_events(wait=10.0):
+ if event['event'] != 'JOB_STATUS_CHANGE':
+ continue
+ if event['data']['status'] == 'null':
+ cancelled = True
+
+ bench_thr.join()
+
if __name__ == '__main__':
# Format must support raw backing files
iotests.main(supported_fmts=['qcow', 'qcow2', 'qed'],
diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out b/tests/qemu-iotests/tests/graph-changes-while-io.out
index ae1213e6f8..fbc63e62f8 100644
--- a/tests/qemu-iotests/tests/graph-changes-while-io.out
+++ b/tests/qemu-iotests/tests/graph-changes-while-io.out
@@ -1,5 +1,5 @@
-.
+..
----------------------------------------------------------------------
-Ran 1 tests
+Ran 2 tests
OK
--
2.40.1
On Wed, May 17, 2023 at 05:28:34PM +0200, Kevin Wolf wrote:
> This tests exercises graph locking, draining, and graph modifications
> with AioContext switches a lot. Amongst others, it serves as a
> regression test for bdrv_graph_wrlock() deadlocking because it is called
> with a locked AioContext and for AioContext handling in the NBD server.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
I've now confirmed the following setups with just './check
graph-changes-while-io -qcow2':
patch 3 alone => test fails with wrlock assertion (good, we're
catching the 8.0 regression where the new assertion failure is
tripping)
patch 1 and 3 => test fails differently than patch 3 alone (good,
we're exposing the fact that NBD had a pre-existing bug, regardless of
whether the added rwlock made it easier to spot)
patch 2 and 3 => test passes (good, patch 2 appears to have fixed this
particular bug, and when we are ready to revert patch 1 because we get
rid of AioContext locking we'll still be okay)
patch 1, 2, and 3 => test passes (good, we fixed the NBD bug, and have
regression testing in place for a scenario that previously wasn't
getting good testing)
As such, I'm happy to supply:
Tested-by: Eric Blake <eblake@redhat.com>
Now on to the patch itself...
> ---
> tests/qemu-iotests/iotests.py | 4 ++
> .../qemu-iotests/tests/graph-changes-while-io | 56 +++++++++++++++++--
> .../tests/graph-changes-while-io.out | 4 +-
> 3 files changed, 58 insertions(+), 6 deletions(-)
>
> @@ -84,6 +84,54 @@ class TestGraphChangesWhileIO(QMPTestCase):
>
> bench_thr.join()
>
> + def test_commit_while_io(self) -> None:
> + # Run qemu-img bench in the background
> + bench_thr = Thread(target=do_qemu_img_bench, args=(200000, ))
TIL - you can create a 1-item tuple in Python. It caught me
off-guard, but makes sense now that I've re-read it.
> +
> + # While qemu-img bench is running, repeatedly commit overlay to node0
> + while bench_thr.is_alive():
> + result = self.qsd.qmp('block-commit', {
> + 'job-id': 'job0',
> + 'device': 'overlay',
> + })
> + self.assert_qmp(result, 'return', {})
> +
> + result = self.qsd.qmp('block-job-cancel', {
> + 'device': 'job0',
> + })
> + self.assert_qmp(result, 'return', {})
> +
> + cancelled = False
> + while not cancelled:
> + for event in self.qsd.get_qmp().get_events(wait=10.0):
The updated test took about 34 seconds on my machine; long enough that
it is rightfully not part of './check -g quick', but still reasonable
that I had no problems reproducing the issue while the test was
running.
> + if event['event'] != 'JOB_STATUS_CHANGE':
> + continue
> + if event['data']['status'] == 'null':
> + cancelled = True
> +
> + bench_thr.join()
> +
It feels a bit odd that the test is skipped during './check -nbd', yet
it IS utilizing nbd, and the fix in patch 2 is indeed in NBD code.
But that's not a flaw in the test itself, just a limitation of what
images we need in order to set up the NBD service in a way to trigger
the problem.
Reviewed-by: Eric Blake <eblake@redhat.com>
I'm in a spot where I can quickly queue this through my NBD tree so we
can get it backported to 8.0.1; pull request coming up, provided the
full series passes a few more tests on my end.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
© 2016 - 2026 Red Hat, Inc.