block/graph-lock: Disable locking for now

[PATCH 3/3] iotests: Test commit with iothreads and ongoing I/O

Posted by Kevin Wolf 2 years, 8 months ago

This tests exercises graph locking, draining, and graph modifications
with AioContext switches a lot. Amongst others, it serves as a
regression test for bdrv_graph_wrlock() deadlocking because it is called
with a locked AioContext and for AioContext handling in the NBD server.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/iotests.py                 |  4 ++
 .../qemu-iotests/tests/graph-changes-while-io | 56 +++++++++++++++++--
 .../tests/graph-changes-while-io.out          |  4 +-
 3 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 3e82c634cf..7073579a7d 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -462,6 +462,10 @@ def qmp(self, cmd: str, args: Optional[Dict[str, object]] = None) \
         assert self._qmp is not None
         return self._qmp.cmd(cmd, args)
 
+    def get_qmp(self) -> QEMUMonitorProtocol:
+        assert self._qmp is not None
+        return self._qmp
+
     def stop(self, kill_signal=15):
         self._p.send_signal(kill_signal)
         self._p.wait()
diff --git a/tests/qemu-iotests/tests/graph-changes-while-io b/tests/qemu-iotests/tests/graph-changes-while-io
index 7664f33689..750e7d4d38 100755
--- a/tests/qemu-iotests/tests/graph-changes-while-io
+++ b/tests/qemu-iotests/tests/graph-changes-while-io
@@ -22,19 +22,19 @@
 import os
 from threading import Thread
 import iotests
-from iotests import imgfmt, qemu_img, qemu_img_create, QMPTestCase, \
-        QemuStorageDaemon
+from iotests import imgfmt, qemu_img, qemu_img_create, qemu_io, \
+        QMPTestCase, QemuStorageDaemon
 
 
 top = os.path.join(iotests.test_dir, 'top.img')
 nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock')
 
 
-def do_qemu_img_bench() -> None:
+def do_qemu_img_bench(count: int = 2000000) -> None:
     """
     Do some I/O requests on `nbd_sock`.
     """
-    qemu_img('bench', '-f', 'raw', '-c', '2000000',
+    qemu_img('bench', '-f', 'raw', '-c', str(count),
              f'nbd+unix:///node0?socket={nbd_sock}')
 
 
@@ -84,6 +84,54 @@ class TestGraphChangesWhileIO(QMPTestCase):
 
         bench_thr.join()
 
+    def test_commit_while_io(self) -> None:
+        # Run qemu-img bench in the background
+        bench_thr = Thread(target=do_qemu_img_bench, args=(200000, ))
+        bench_thr.start()
+
+        qemu_io('-c', 'write 0 64k', top)
+        qemu_io('-c', 'write 128k 64k', top)
+
+        result = self.qsd.qmp('blockdev-add', {
+            'driver': imgfmt,
+            'node-name': 'overlay',
+            'backing': None,
+            'file': {
+                'driver': 'file',
+                'filename': top
+            }
+        })
+        self.assert_qmp(result, 'return', {})
+
+        result = self.qsd.qmp('blockdev-snapshot', {
+            'node': 'node0',
+            'overlay': 'overlay',
+        })
+        self.assert_qmp(result, 'return', {})
+
+        # While qemu-img bench is running, repeatedly commit overlay to node0
+        while bench_thr.is_alive():
+            result = self.qsd.qmp('block-commit', {
+                'job-id': 'job0',
+                'device': 'overlay',
+            })
+            self.assert_qmp(result, 'return', {})
+
+            result = self.qsd.qmp('block-job-cancel', {
+                'device': 'job0',
+            })
+            self.assert_qmp(result, 'return', {})
+
+            cancelled = False
+            while not cancelled:
+                for event in self.qsd.get_qmp().get_events(wait=10.0):
+                    if event['event'] != 'JOB_STATUS_CHANGE':
+                        continue
+                    if event['data']['status'] == 'null':
+                        cancelled = True
+
+        bench_thr.join()
+
 if __name__ == '__main__':
     # Format must support raw backing files
     iotests.main(supported_fmts=['qcow', 'qcow2', 'qed'],
diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out b/tests/qemu-iotests/tests/graph-changes-while-io.out
index ae1213e6f8..fbc63e62f8 100644
--- a/tests/qemu-iotests/tests/graph-changes-while-io.out
+++ b/tests/qemu-iotests/tests/graph-changes-while-io.out
@@ -1,5 +1,5 @@
-.
+..
 ----------------------------------------------------------------------
-Ran 1 tests
+Ran 2 tests
 
 OK
-- 
2.40.1

Re: [PATCH 3/3] iotests: Test commit with iothreads and ongoing I/O

Posted by Eric Blake 2 years, 8 months ago

On Wed, May 17, 2023 at 05:28:34PM +0200, Kevin Wolf wrote:
> This tests exercises graph locking, draining, and graph modifications
> with AioContext switches a lot. Amongst others, it serves as a
> regression test for bdrv_graph_wrlock() deadlocking because it is called
> with a locked AioContext and for AioContext handling in the NBD server.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

I've now confirmed the following setups with just './check
graph-changes-while-io -qcow2':

patch 3 alone => test fails with wrlock assertion (good, we're
catching the 8.0 regression where the new assertion failure is
tripping)

patch 1 and 3 => test fails differently than patch 3 alone (good,
we're exposing the fact that NBD had a pre-existing bug, regardless of
whether the added rwlock made it easier to spot)

patch 2 and 3 => test passes (good, patch 2 appears to have fixed this
particular bug, and when we are ready to revert patch 1 because we get
rid of AioContext locking we'll still be okay)

patch 1, 2, and 3 => test passes (good, we fixed the NBD bug, and have
regression testing in place for a scenario that previously wasn't
getting good testing)

As such, I'm happy to supply:

Tested-by: Eric Blake <eblake@redhat.com>


Now on to the patch itself...

> ---
>  tests/qemu-iotests/iotests.py                 |  4 ++
>  .../qemu-iotests/tests/graph-changes-while-io | 56 +++++++++++++++++--
>  .../tests/graph-changes-while-io.out          |  4 +-
>  3 files changed, 58 insertions(+), 6 deletions(-)
> 

> @@ -84,6 +84,54 @@ class TestGraphChangesWhileIO(QMPTestCase):
>  
>          bench_thr.join()
>  
> +    def test_commit_while_io(self) -> None:
> +        # Run qemu-img bench in the background
> +        bench_thr = Thread(target=do_qemu_img_bench, args=(200000, ))

TIL - you can create a 1-item tuple in Python.  It caught me
off-guard, but makes sense now that I've re-read it.

> +
> +        # While qemu-img bench is running, repeatedly commit overlay to node0
> +        while bench_thr.is_alive():
> +            result = self.qsd.qmp('block-commit', {
> +                'job-id': 'job0',
> +                'device': 'overlay',
> +            })
> +            self.assert_qmp(result, 'return', {})
> +
> +            result = self.qsd.qmp('block-job-cancel', {
> +                'device': 'job0',
> +            })
> +            self.assert_qmp(result, 'return', {})
> +
> +            cancelled = False
> +            while not cancelled:
> +                for event in self.qsd.get_qmp().get_events(wait=10.0):

The updated test took about 34 seconds on my machine; long enough that
it is rightfully not part of './check -g quick', but still reasonable
that I had no problems reproducing the issue while the test was
running.

> +                    if event['event'] != 'JOB_STATUS_CHANGE':
> +                        continue
> +                    if event['data']['status'] == 'null':
> +                        cancelled = True
> +
> +        bench_thr.join()
> +

It feels a bit odd that the test is skipped during './check -nbd', yet
it IS utilizing nbd, and the fix in patch 2 is indeed in NBD code.
But that's not a flaw in the test itself, just a limitation of what
images we need in order to set up the NBD service in a way to trigger
the problem.

Reviewed-by: Eric Blake <eblake@redhat.com>

I'm in a spot where I can quickly queue this through my NBD tree so we
can get it backported to 8.0.1; pull request coming up, provided the
full series passes a few more tests on my end.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[PATCH 1/3] graph-lock: Disable locking for now
[PATCH 2/3] nbd/server: Fix drained_poll to wake coroutine in right AioContext
[PATCH 3/3] iotests: Test commit with iothreads and ongoing I/O