[PATCH v3 0/7] Switch iotests to using Async QMP

John Snow posted 7 patches 1 week, 4 days ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20211012223445.1051101-1-jsnow@redhat.com
Maintainers: Cleber Rosa <crosa@redhat.com>, Hanna Reitz <hreitz@redhat.com>, John Snow <jsnow@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>, Kevin Wolf <kwolf@redhat.com>, Eduardo Habkost <ehabkost@redhat.com>
python/qemu/aqmp/__init__.py              |  12 --
python/qemu/aqmp/legacy.py                | 138 ++++++++++++++++++++++
python/qemu/machine/machine.py            |  85 +++++++++----
scripts/simplebench/bench_block_job.py    |   3 +-
tests/qemu-iotests/040                    |   7 +-
tests/qemu-iotests/218                    |   2 +-
tests/qemu-iotests/255                    |   2 +-
tests/qemu-iotests/iotests.py             |  20 +++-
tests/qemu-iotests/tests/mirror-top-perms |  17 ++-
9 files changed, 238 insertions(+), 48 deletions(-)
create mode 100644 python/qemu/aqmp/legacy.py

[PATCH v3 0/7] Switch iotests to using Async QMP

Posted by John Snow 1 week, 4 days ago
Based-on: <20211012214152.802483-1-jsnow@redhat.com>
          [PULL 00/10] Python patches
GitLab: https://gitlab.com/jsnow/qemu/-/commits/python-aqmp-iotest-wrapper
CI: https://gitlab.com/jsnow/qemu/-/pipelines/387210591

Hiya,

This series continues where the last two AQMP series left off and adds a
synchronous 'legacy' wrapper around the new AQMP interface, then drops
it straight into iotests to prove that AQMP is functional and totally
cool and fine. The disruption and churn to iotests is pretty minimal.
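
For readers who have not seen the pattern, a minimal sketch of a synchronous facade over an asyncio client follows. It is not the code in python/qemu/aqmp/legacy.py; `ToyAsyncClient`, `SyncWrapper`, `_sync` and `cmd` are hypothetical names used only for illustration, and the private-event-loop design is an assumption.

```python
import asyncio
from typing import Any, Awaitable, Dict, Optional, TypeVar

T = TypeVar('T')


class ToyAsyncClient:
    """Hypothetical stand-in for an asyncio-based QMP client."""

    async def execute(self, cmd: str) -> Dict[str, Any]:
        await asyncio.sleep(0)  # pretend to do socket I/O
        return {'return': {'command': cmd}}


class SyncWrapper:
    """Blocking facade that owns a private event loop (an assumption)."""

    def __init__(self, client: ToyAsyncClient) -> None:
        self._client = client
        self._loop = asyncio.new_event_loop()

    def _sync(self, future: Awaitable[T],
              timeout: Optional[float] = None) -> T:
        # Drive the loop until the awaitable finishes or the timeout expires.
        return self._loop.run_until_complete(
            asyncio.wait_for(future, timeout=timeout))

    def cmd(self, name: str, timeout: Optional[float] = 5.0) -> Dict[str, Any]:
        # Callers see an ordinary blocking call; the coroutine stays hidden.
        return self._sync(self._client.execute(name), timeout)

    def close(self) -> None:
        self._loop.close()
```

Callers never touch `await`: `SyncWrapper(ToyAsyncClient()).cmd('query-status')` blocks until the reply arrives.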

In the event that a regression happens and I am not physically proximate
to inflict damage upon, one may set the QEMU_PYTHON_LEGACY_QMP variable
to any non-empty string as it pleases you to engage the QMP machinery
you are used to.
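
A rough sketch of how such a switch could be read, assuming only what is stated above (any non-empty value engages the old machinery). The function names and backend labels are hypothetical and are not taken from the patches.

```python
import os
from typing import Mapping


def legacy_qmp_requested(environ: Mapping[str, str] = os.environ) -> bool:
    """True when QEMU_PYTHON_LEGACY_QMP is set to any non-empty string."""
    return bool(environ.get('QEMU_PYTHON_LEGACY_QMP'))


def pick_backend(environ: Mapping[str, str] = os.environ) -> str:
    # Hypothetical labels; a real caller would pick a class to import here.
    return 'legacy-qmp' if legacy_qmp_requested(environ) else 'aqmp-wrapper'
```

Note that under the "any non-empty string" rule a value of "0" still selects the legacy path; only an unset or empty variable leaves the new wrapper active.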

I'd like to try and get this committed early in the 6.2 development
cycle to give ample time to smooth over any possible regressions. I've
tested it locally and via gitlab CI, across Python versions 3.6 through
3.10, and "worksforme". If something bad happens, we can revert the
actual switch-flip very trivially.

V3:

001/7:[----] [--] 'python/machine: remove has_quit argument'
002/7:[0002] [FC] 'python/machine: Handle QMP errors on close more meticulously'
003/7:[----] [--] 'python/aqmp: Remove scary message'
004/7:[0006] [FC] 'iotests: Accommodate async QMP Exception classes'
005/7:[0003] [FC] 'iotests: Conditionally silence certain AQMP errors'
006/7:[0009] [FC] 'python/aqmp: Create sync QMP wrapper for iotests'
007/7:[----] [--] 'python, iotests: replace qmp with aqmp'

002: Account for force-kill cases, too.
003: Shuffled earlier into the series to prevent a mid-series regression.
004: Rewrite the imports to be less "heterogeneous" ;)
005: Add in a TODO for me to trip over in the future.
006: Fix a bug surfaced by a new iotest where waiting with pull_event for a
     timeout of 0.0 will cause a timeout exception to be raised even if there
     was an event ready to be read.

V2:

001/17:[----] [--] 'python/aqmp: add greeting property to QMPClient'
002/17:[----] [--] 'python/aqmp: add .empty() method to EventListener'
003/17:[----] [--] 'python/aqmp: Return cleared events from EventListener.clear()'
004/17:[0007] [FC] 'python/aqmp: add send_fd_scm'
005/17:[down] 'python/aqmp: Add dict conversion method to Greeting object'
006/17:[down] 'python/aqmp: Reduce severity of EOFError-caused loop terminations'
007/17:[down] 'python/aqmp: Disable logging messages by default'

008/17:[0002] [FC] 'python/qmp: clear events on get_events() call'
009/17:[----] [--] 'python/qmp: add send_fd_scm directly to QEMUMonitorProtocol'
010/17:[----] [--] 'python, iotests: remove socket_scm_helper'
011/17:[0013] [FC] 'python/machine: remove has_quit argument'
012/17:[down] 'python/machine: Handle QMP errors on close more meticulously'

013/17:[0009] [FC] 'iotests: Accommodate async QMP Exception classes'
014/17:[down] 'iotests: Conditionally silence certain AQMP errors'

015/17:[0016] [FC] 'python/aqmp: Create sync QMP wrapper for iotests'
016/17:[0002] [FC] 'python/aqmp: Remove scary message'
017/17:[----] [--] 'python, iotests: replace qmp with aqmp'

- Rebased on jsnow/python, which was recently rebased on origin/master.
- Make aqmp's send_fd_scm method bark if the socket isn't AF_UNIX (Hanna)
- Uh... modify send_fd_scm so it doesn't break when Python 3.11 comes out ...
  See the commit message for more detail.
- Drop the "python/aqmp: Create MessageModel and StandaloneModel class"
  patch and replace with a far simpler method that just adds an
  _asdict() method.
- Add patches 06 and 07 to change how the AQMP library handles logging.
- Adjust docstring in patch 08 (Hanna)
- Rename "_has_quit" attribute to "_quit_issued" (Hanna)
- Renamed patch 12, simplified the logic in _soft_shutdown a tiny bit.
- Fixed bad exception handling logic in 13 (Hanna)
- Introduce a helper in patch 14 to silence log output when it's unwanted.
- Small addition of _get_greeting() helper in patch 15, coinciding with the
  new patch 05 here.
- Contextual changes in 16.

John Snow (7):
  python/machine: remove has_quit argument
  python/machine: Handle QMP errors on close more meticulously
  python/aqmp: Remove scary message
  iotests: Accommodate async QMP Exception classes
  iotests: Conditionally silence certain AQMP errors
  python/aqmp: Create sync QMP wrapper for iotests
  python, iotests: replace qmp with aqmp

 python/qemu/aqmp/__init__.py              |  12 --
 python/qemu/aqmp/legacy.py                | 138 ++++++++++++++++++++++
 python/qemu/machine/machine.py            |  85 +++++++++----
 scripts/simplebench/bench_block_job.py    |   3 +-
 tests/qemu-iotests/040                    |   7 +-
 tests/qemu-iotests/218                    |   2 +-
 tests/qemu-iotests/255                    |   2 +-
 tests/qemu-iotests/iotests.py             |  20 +++-
 tests/qemu-iotests/tests/mirror-top-perms |  17 ++-
 9 files changed, 238 insertions(+), 48 deletions(-)
 create mode 100644 python/qemu/aqmp/legacy.py

-- 
2.31.1



Re: [PATCH v3 0/7] Switch iotests to using Async QMP

Posted by Hanna Reitz 1 week, 3 days ago
On 13.10.21 00:34, John Snow wrote:
> I'd like to try and get this committed early in the 6.2 development
> cycle to give ample time to smooth over any possible regressions. I've
> tested it locally and via gitlab CI, across Python versions 3.6 through
> 3.10, and "worksforme". If something bad happens, we can revert the
> actual switch-flip very trivially.

So running iotests locally, I got one failure:

$ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300
[...]
300                             fail       [10:28:06] [10:28:11]   5.1s                 output mismatch (see 300.out.bad)
--- /home/maxx/projects/qemu/tests/qemu-iotests/300.out
+++ 300.out.bad
@@ -1,4 +1,5 @@
-.......................................
+..............ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader: ConnectionResetError: [Errno 104] Connection reset by peer
+.........................
  ----------------------------------------------------------------------
  Ran 39 tests
[...]


I’m afraid I can’t really give a reproducer or anything.  It feels like 
just some random spurious timing-related error.  Although then again, 
300 does have an `except machine.AbnormalShutdown` clause at one 
point...  So perhaps that’s the culprit, and we need to disable logging 
there.

Hanna


Re: [PATCH v3 0/7] Switch iotests to using Async QMP

Posted by John Snow 1 week, 3 days ago
On Wed, Oct 13, 2021 at 4:45 AM Hanna Reitz <hreitz@redhat.com> wrote:

> So running iotests locally, I got one failure:
>
> $ TEST_DIR=/tmp/vdi-tests ./check -c writethrough -vdi 300
> [...]
> 300                             fail       [10:28:06] [10:28:11]
> 5.1s                 output mismatch (see 300.out.bad)
> --- /home/maxx/projects/qemu/tests/qemu-iotests/300.out
> +++ 300.out.bad
> @@ -1,4 +1,5 @@
> -.......................................
> +..............ERROR:qemu.aqmp.qmp_client.qemu-b-222963:Task.Reader:
> ConnectionResetError: [Errno 104] Connection reset by peer
> +.........................
>   ----------------------------------------------------------------------
>   Ran 39 tests
> [...]
>
>
Oh, unfortunate.


>
> I’m afraid I can’t really give a reproducer or anything.  It feels like
>

Thank you for the report!


> just some random spurious timing-related error.  Although then again,
> 300 does have an `except machine.AbnormalShutdown` clause at one
> point...  So perhaps that’s the culprit, and we need to disable logging
> there.
>
>
I'll investigate!

Re: [PATCH v3 0/7] Switch iotests to using Async QMP

Posted by John Snow 1 week, 3 days ago
On Wed, Oct 13, 2021 at 8:51 AM John Snow <jsnow@redhat.com> wrote:

> I'll investigate!
>

Unfortunately, even in a loop some 150 times I couldn't reproduce this one.
As you point out, it appears to be just a failure caused by logging. The
test logic itself completes as expected.

Still, I would expect a "clean" shutdown of the destination host (where
the destination process fails to load the migration stream and voluntarily
exits with an error code) to end with a FIN/ACK for TCP or ... uh, whatever
happens for a UNIX socket. Where's the Connection Reset coming from? Did
the destination VM process *crash*?

I'm not so sure that I *should* silence this error, but I also can't
reproduce it at all to answer these questions, so uh. uhhh. I guess I will
just hammer it on a loop a few hundred times more and see if I get lucky.

Re: [PATCH v3 0/7] Switch iotests to using Async QMP

Posted by Hanna Reitz 1 week, 3 days ago
On 13.10.21 16:00, John Snow wrote:
> I'm not so sure that I *should* silence this error, but I also can't 
> reproduce it at all to answer these questions, so uh. uhhh. I guess I 
> will just hammer it on a loop a few hundred times more and see if I 
> get lucky.

I could reproduce it by running 20 instances concurrently.  (This needs a
change to testrunner.py so that the .out.bad files of the concurrent
runs don’t collide:

diff --git a/tests/qemu-iotests/testrunner.py b/tests/qemu-iotests/testrunner.py
index a56b6da396..fd0a3a1eeb 100644
--- a/tests/qemu-iotests/testrunner.py
+++ b/tests/qemu-iotests/testrunner.py
@@ -221,7 +221,7 @@ def find_reference(self, test: str) -> str:

      def do_run_test(self, test: str) -> TestResult:
          f_test = Path(test)
-        f_bad = Path(f_test.name + '.out.bad')
+        f_bad = Path(f'{os.getpid()}-{f_test.name}.out.bad')
          f_notrun = Path(f_test.name + '.notrun')
          f_casenotrun = Path(f_test.name + '.casenotrun')
          f_reference = Path(self.find_reference(test))

)

And then:

$ while TEST_DIR=/tmp/vdi-$$ ./check -vdi 300; do :; done

Which pretty quickly shows the error in at least one of those loops 
(under a minute).

As far as I can tell, changing the log level in 300 does indeed fix it:

diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
index 10f9f2a8da..096f5dabf0 100755
--- a/tests/qemu-iotests/300
+++ b/tests/qemu-iotests/300
@@ -27,6 +27,7 @@ from typing import Dict, List, Optional
  from qemu.machine import machine

  import iotests
+from iotests import change_log_level


  BlockBitmapMapping = List[Dict[str, object]]
@@ -464,7 +465,8 @@ class TestBlockBitmapMappingErrors(TestDirtyBitmapMigration):
          # Expect abnormal shutdown of the destination VM because of
          # the failed migration
          try:
-            self.vm_b.shutdown()
+            with change_log_level('qemu.aqmp'):
+                self.vm_b.shutdown()
          except machine.AbnormalShutdown:
              pass
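
The diff imports change_log_level from iotests, but its definition is not shown in this thread. A minimal sketch of what such a helper might look like, assuming it is a context manager that raises a named logger's threshold and restores it afterwards (the signature and the CRITICAL default are assumptions):

```python
import logging
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def change_log_level(logger_name: str,
                     level: int = logging.CRITICAL) -> Iterator[None]:
    """Temporarily set a logger's level, restoring the old one on exit."""
    logger = logging.getLogger(logger_name)
    saved = logger.level
    logger.setLevel(level)
    try:
        yield
    finally:
        # Restore even if the body raises (e.g. AbnormalShutdown).
        logger.setLevel(saved)
```

Because the restore sits in a `finally` clause, an exception raised inside the `with` block still propagates to the surrounding `except` while the logger returns to its previous level.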


Re: [PATCH v3 0/7] Switch iotests to using Async QMP

Posted by John Snow 1 week, 3 days ago
On Wed, Oct 13, 2021 at 10:49 AM Hanna Reitz <hreitz@redhat.com> wrote:

> As far as I can tell, changing the log level in 300 does indeed fix it:
>
> diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
> index 10f9f2a8da..096f5dabf0 100755
> --- a/tests/qemu-iotests/300
> +++ b/tests/qemu-iotests/300
> @@ -27,6 +27,7 @@ from typing import Dict, List, Optional
>   from qemu.machine import machine
>
>   import iotests
> +from iotests import change_log_level
>
>
>   BlockBitmapMapping = List[Dict[str, object]]
> @@ -464,7 +465,8 @@ class TestBlockBitmapMappingErrors(TestDirtyBitmapMigration):
>           # Expect abnormal shutdown of the destination VM because of
>           # the failed migration
>           try:
> -            self.vm_b.shutdown()
> +            with change_log_level('qemu.aqmp'):
> +                self.vm_b.shutdown()
>           except machine.AbnormalShutdown:
>               pass
>
>
Thanks for this! That helped a lot.

Long story short, what is happening here is that in some cases, we manage
to issue a 'quit' command before we acknowledge the EOF from the target VM,
so we get an 'ExecInterruptedError' from the top half and ECONNRESET from
the bottom half.
This gives me some faith that there's no more sinister problem going on
here.

There's really no way to "universally" solve this, though: no matter how
recently we checked that the peer appeared to be online, it's always
possible that we go to send a command and only then find out that it
isn't true anymore.

However, there's a cuter way to solve this particular instance: just change
test 300 to wait() for the VM to self-exit instead of trying to shut it
down ourselves. That avoids the race condition in exactly what type of
error message we find when shutting down a dead-or-dying VM.
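
As a toy analogue of that idea (not the actual change to test 300, and not the QEMUMachine API), the sketch below uses a plain subprocess that exits on its own with an error code. The parent only waits for it and never sends a command, so there is no write to a dying peer that could race with its exit. `wait_for_self_exit` and `demo` are hypothetical names.

```python
import subprocess
import sys


def wait_for_self_exit(proc: subprocess.Popen, timeout: float = 30.0) -> int:
    """Reap a child that is expected to exit by itself; send it nothing."""
    return proc.wait(timeout=timeout)


def demo() -> int:
    # Stand-in for a destination VM that fails to load the migration
    # stream and voluntarily exits with an error code.
    child = subprocess.Popen([sys.executable, '-c', 'import sys; sys.exit(1)'])
    return wait_for_self_exit(child)
```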

Thanks for the assistance on debugging this one.

--js