[v1] iotests: make meson aware of individual I/O tests

[PATCH 0/5] iotests: make meson aware of individual I/O tests

Posted by Daniel P. Berrangé 2 years, 11 months ago

To just repeat the patch 5 description...

Currently meson registers a single test that invokes an entire group of
I/O tests, hiding the test granularity from meson. There are various
downsides of doing this

 * You cannot ask 'meson test' to invoke a single I/O test
 * The meson test timeout can't be applied to the individual
   tests
 * Meson only gets a pass/fail for the overall I/O test group
   not individual tests
 * If a CI job gets killed by the GitLab timeout, we don't
   get visibility into how far through the I/O tests
   execution got.

This is not really specific to the I/O tests, the problem is common
to any case of us running a test which is in fact another test
harness which runs many tests. It would be nice to have meson have
the full view of all tests run. Adapting the I/O tests is as easy
win in this respect.

This switches meson to perform test discovery by invoking 'check' in
dry-run mode. It then registers one meson test case for each I/O
test. Parallel execution remains disabled since the I/O tests do not
use self contained execution environments and thus conflict with
each other.

Compare contrast output from a current job:

  https://gitlab.com/qemu-project/qemu/-/jobs/3863603546

[quote]
204/224 qemu:block / qemu-iotests qcow2   OK 329.94s   119 subtests passed
[/quote]

Vs what is seen with this series:

  https://gitlab.com/berrange/qemu/-/jobs/3865975463

[quote]
204/350 qemu:block / qemu-iotests-qcow2-001   OK    2.16s   1 subtests passed
205/350 qemu:block / qemu-iotests-qcow2-002   OK    2.77s   1 subtests passed

...snip...

329/350 qemu:block / qemu-iotests-qcow2-qemu-img-close-errors       OK    6.19s   1 subtests passed
330/350 qemu:block / qemu-iotests-qcow2-qsd-jobs          OK    0.55s   1 subtests passed
[/quote]

A few tweaks were needed to the iotests runner because it had a few
assumptions about it always running in a tree that has already been
built, which is obviously not the case at the time meson does test
discovery.

Daniel P. Berrangé (5):
  iotests: explicitly pass source/build dir to 'check' command
  iotests: allow test discovery before building
  iotests: strip subdir path when listing tests
  iotests: print TAP protocol version when reporting tests
  iotests: register each I/O test separately with meson

 tests/qemu-iotests/check         | 11 +++++++++--
 tests/qemu-iotests/meson.build   | 33 ++++++++++++++++++++++++++------
 tests/qemu-iotests/testenv.py    | 22 +++++++++++++++++----
 tests/qemu-iotests/testrunner.py |  1 +
 4 files changed, 55 insertions(+), 12 deletions(-)

-- 
2.39.2

Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

Posted by Thomas Huth 2 years, 11 months ago

On 02/03/2023 19.46, Daniel P. Berrangé wrote:
> To just repeat the patch 5 description...
> 
> Currently meson registers a single test that invokes an entire group of
> I/O tests, hiding the test granularity from meson. There are various
> downsides of doing this
> 
>   * You cannot ask 'meson test' to invoke a single I/O test
>   * The meson test timeout can't be applied to the individual
>     tests
>   * Meson only gets a pass/fail for the overall I/O test group
>     not individual tests
>   * If a CI job gets killed by the GitLab timeout, we don't
>     get visibility into how far through the I/O tests
>     execution got.
> 
> This is not really specific to the I/O tests, the problem is common
> to any case of us running a test which is in fact another test
> harness which runs many tests. It would be nice to have meson have
> the full view of all tests run. Adapting the I/O tests is as easy
> win in this respect.
> 
> This switches meson to perform test discovery by invoking 'check' in
> dry-run mode. It then registers one meson test case for each I/O
> test. Parallel execution remains disabled since the I/O tests do not
> use self contained execution environments and thus conflict with
> each other.

Great to see some movement in this area again!

Some questions/remarks:

1) Could you remove tests/check-block.sh now? See also:
    https://lore.kernel.org/all/20220209101530.3442837-9-thuth@redhat.com/

2) With regards to parallel execution ... I think it should be
    possible nowadays - the "check" script is normally also run
    with the "-j" switch by the tests/check-block.sh script, so
    if you remove the possibility to run in parallel, it's a
    regression from the previous behavior!

3) When I tried this last year, I had a weird problem that
    the terminal sometimes gets messed up ... I wasn't able
    to track it down back then - could you check by running
    "make check-block" many times (>10 times) to see whether
    it happens with your series or not?

  Thomas

Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

Posted by Daniel P. Berrangé 2 years, 11 months ago

On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:
> On 02/03/2023 19.46, Daniel P. Berrangé wrote:
> 3) When I tried this last year, I had a weird problem that
>    the terminal sometimes gets messed up ... I wasn't able
>    to track it down back then - could you check by running
>    "make check-block" many times (>10 times) to see whether
>    it happens with your series or not?

I've just seen this - echo got disabled on my terminal.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

Posted by Daniel P. Berrangé 2 years, 11 months ago

On Fri, Mar 03, 2023 at 10:45:40AM +0000, Daniel P. Berrangé wrote:
> On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:
> > On 02/03/2023 19.46, Daniel P. Berrangé wrote:
> > 3) When I tried this last year, I had a weird problem that
> >    the terminal sometimes gets messed up ... I wasn't able
> >    to track it down back then - could you check by running
> >    "make check-block" many times (>10 times) to see whether
> >    it happens with your series or not?
> 
> I've just seen this - echo got disabled on my terminal.

The problem is that testrunner.py script doing

# We want to save current tty settings during test run,
# since an aborting qemu call may leave things screwed up.
@contextmanager
def savetty() -> Iterator[None]:
    isterm = sys.stdin.isatty()
    if isterm:
        fd = sys.stdin.fileno()
        attr = termios.tcgetattr(fd)

    try:
        yield
    finally:
        if isterm:
            termios.tcsetattr(fd, termios.TCSADRAIN, attr)

When invoking 'check' this wraps around execution of the entire
'check' script. IOW it saves/restores the terminal once.

When we invoke 'check' in parallel it will save/restore the same
terminal for each invokation.

If the 'termios.tcgetattr' call runs at the same time as there is
a QEMU running which has put the terminal in raw mode, then when
'check' exits it'll "restore" the terminal to raw mode.

IOW, this savetty() logic is fundamentally unsafe when invoking
'check' in parallel.

AFAICT, the more reliable thing todo would be to spawn a new PTY
when invoking each test, so there's no cleanup required. If QEMU
aborts leaving the TTY messed up it dosn't matter was it was a
temporary throwaway PTY.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

Posted by Thomas Huth 2 years, 11 months ago

On 03/03/2023 14.06, Daniel P. Berrangé wrote:
> On Fri, Mar 03, 2023 at 10:45:40AM +0000, Daniel P. Berrangé wrote:
>> On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:
>>> On 02/03/2023 19.46, Daniel P. Berrangé wrote:
>>> 3) When I tried this last year, I had a weird problem that
>>>     the terminal sometimes gets messed up ... I wasn't able
>>>     to track it down back then - could you check by running
>>>     "make check-block" many times (>10 times) to see whether
>>>     it happens with your series or not?
>>
>> I've just seen this - echo got disabled on my terminal.
> 
> The problem is that testrunner.py script doing
> 
> # We want to save current tty settings during test run,
> # since an aborting qemu call may leave things screwed up.
> @contextmanager
> def savetty() -> Iterator[None]:
>      isterm = sys.stdin.isatty()
>      if isterm:
>          fd = sys.stdin.fileno()
>          attr = termios.tcgetattr(fd)
> 
>      try:
>          yield
>      finally:
>          if isterm:
>              termios.tcsetattr(fd, termios.TCSADRAIN, attr)
> 
> 
> When invoking 'check' this wraps around execution of the entire
> 'check' script. IOW it saves/restores the terminal once.
> 
> When we invoke 'check' in parallel it will save/restore the same
> terminal for each invokation.
> 
> If the 'termios.tcgetattr' call runs at the same time as there is
> a QEMU running which has put the terminal in raw mode, then when
> 'check' exits it'll "restore" the terminal to raw mode.
> 
> IOW, this savetty() logic is fundamentally unsafe when invoking
> 'check' in parallel.

Hmm, couldn't we e.g. also simply skip this termios stuff when running with 
"--tap" ?

  Thomas

Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

Posted by Daniel P. Berrangé 2 years, 11 months ago

On Fri, Mar 03, 2023 at 04:49:36PM +0100, Thomas Huth wrote:
> On 03/03/2023 14.06, Daniel P. Berrangé wrote:
> > On Fri, Mar 03, 2023 at 10:45:40AM +0000, Daniel P. Berrangé wrote:
> > > On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:
> > > > On 02/03/2023 19.46, Daniel P. Berrangé wrote:
> > > > 3) When I tried this last year, I had a weird problem that
> > > >     the terminal sometimes gets messed up ... I wasn't able
> > > >     to track it down back then - could you check by running
> > > >     "make check-block" many times (>10 times) to see whether
> > > >     it happens with your series or not?
> > > 
> > > I've just seen this - echo got disabled on my terminal.
> > 
> > The problem is that testrunner.py script doing
> > 
> > # We want to save current tty settings during test run,
> > # since an aborting qemu call may leave things screwed up.
> > @contextmanager
> > def savetty() -> Iterator[None]:
> >      isterm = sys.stdin.isatty()
> >      if isterm:
> >          fd = sys.stdin.fileno()
> >          attr = termios.tcgetattr(fd)
> > 
> >      try:
> >          yield
> >      finally:
> >          if isterm:
> >              termios.tcsetattr(fd, termios.TCSADRAIN, attr)
> > 
> > 
> > When invoking 'check' this wraps around execution of the entire
> > 'check' script. IOW it saves/restores the terminal once.
> > 
> > When we invoke 'check' in parallel it will save/restore the same
> > terminal for each invokation.
> > 
> > If the 'termios.tcgetattr' call runs at the same time as there is
> > a QEMU running which has put the terminal in raw mode, then when
> > 'check' exits it'll "restore" the terminal to raw mode.
> > 
> > IOW, this savetty() logic is fundamentally unsafe when invoking
> > 'check' in parallel.
> 
> Hmm, couldn't we e.g. also simply skip this termios stuff when running with
> "--tap" ?

It actually turns out to be way simpler. We merely need to set
stdin=subprocess.DEVNULL. There is no valid reason for the test
cases to be reading for stdin, as that would block execution.
By setting stdin==/dev/null, the isatty(0) checks will fail and
thus QEMU won't mess with termios, and thus we don't need any
save/restore dance either.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

Posted by Daniel P. Berrangé 2 years, 11 months ago

On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:
> On 02/03/2023 19.46, Daniel P. Berrangé wrote:
> > To just repeat the patch 5 description...
> > 
> > Currently meson registers a single test that invokes an entire group of
> > I/O tests, hiding the test granularity from meson. There are various
> > downsides of doing this
> > 
> >   * You cannot ask 'meson test' to invoke a single I/O test
> >   * The meson test timeout can't be applied to the individual
> >     tests
> >   * Meson only gets a pass/fail for the overall I/O test group
> >     not individual tests
> >   * If a CI job gets killed by the GitLab timeout, we don't
> >     get visibility into how far through the I/O tests
> >     execution got.
> > 
> > This is not really specific to the I/O tests, the problem is common
> > to any case of us running a test which is in fact another test
> > harness which runs many tests. It would be nice to have meson have
> > the full view of all tests run. Adapting the I/O tests is as easy
> > win in this respect.
> > 
> > This switches meson to perform test discovery by invoking 'check' in
> > dry-run mode. It then registers one meson test case for each I/O
> > test. Parallel execution remains disabled since the I/O tests do not
> > use self contained execution environments and thus conflict with
> > each other.
> 
> Great to see some movement in this area again!
> 
> Some questions/remarks:
> 
> 1) Could you remove tests/check-block.sh now? See also:
>    https://lore.kernel.org/all/20220209101530.3442837-9-thuth@redhat.com/

Possibly, I wasn't sure if that was wanted as a general entry
point for humans, or was solely for meson ?

> 2) With regards to parallel execution ... I think it should be
>    possible nowadays - the "check" script is normally also run
>    with the "-j" switch by the tests/check-block.sh script, so
>    if you remove the possibility to run in parallel, it's a
>    regression from the previous behavior!

Hmmm, I got *masses* of test failures when running in parallel
but I'll check again to be sure.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

Posted by Thomas Huth 2 years, 11 months ago

On 03/03/2023 09.53, Daniel P. Berrangé wrote:
> On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:
>> On 02/03/2023 19.46, Daniel P. Berrangé wrote:
>>> To just repeat the patch 5 description...
>>>
>>> Currently meson registers a single test that invokes an entire group of
>>> I/O tests, hiding the test granularity from meson. There are various
>>> downsides of doing this
>>>
>>>    * You cannot ask 'meson test' to invoke a single I/O test
>>>    * The meson test timeout can't be applied to the individual
>>>      tests
>>>    * Meson only gets a pass/fail for the overall I/O test group
>>>      not individual tests
>>>    * If a CI job gets killed by the GitLab timeout, we don't
>>>      get visibility into how far through the I/O tests
>>>      execution got.
>>>
>>> This is not really specific to the I/O tests, the problem is common
>>> to any case of us running a test which is in fact another test
>>> harness which runs many tests. It would be nice to have meson have
>>> the full view of all tests run. Adapting the I/O tests is as easy
>>> win in this respect.
>>>
>>> This switches meson to perform test discovery by invoking 'check' in
>>> dry-run mode. It then registers one meson test case for each I/O
>>> test. Parallel execution remains disabled since the I/O tests do not
>>> use self contained execution environments and thus conflict with
>>> each other.
>>
>> Great to see some movement in this area again!
>>
>> Some questions/remarks:
>>
>> 1) Could you remove tests/check-block.sh now? See also:
>>     https://lore.kernel.org/all/20220209101530.3442837-9-thuth@redhat.com/
> 
> Possibly, I wasn't sure if that was wanted as a general entry
> point for humans, or was solely for meson ?

I think this script was only ever used for "make check-block", I never heard 
of anybody really using this script directly in a regular fashion. Humans 
rather run the tests/qemu-iotests/check script directly. Also see its origins:

  https://gitlab.com/qemu-project/qemu/-/commit/b8c6f29eb84cd3ccbbf

  HTH,
   Thomas

Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

Posted by Daniel P. Berrangé 2 years, 11 months ago

On Fri, Mar 03, 2023 at 08:53:32AM +0000, Daniel P. Berrangé wrote:
> On Fri, Mar 03, 2023 at 09:30:39AM +0100, Thomas Huth wrote:
> > On 02/03/2023 19.46, Daniel P. Berrangé wrote:
> > > To just repeat the patch 5 description...
> > > 
> > > Currently meson registers a single test that invokes an entire group of
> > > I/O tests, hiding the test granularity from meson. There are various
> > > downsides of doing this
> > > 
> > >   * You cannot ask 'meson test' to invoke a single I/O test
> > >   * The meson test timeout can't be applied to the individual
> > >     tests
> > >   * Meson only gets a pass/fail for the overall I/O test group
> > >     not individual tests
> > >   * If a CI job gets killed by the GitLab timeout, we don't
> > >     get visibility into how far through the I/O tests
> > >     execution got.
> > > 
> > > This is not really specific to the I/O tests, the problem is common
> > > to any case of us running a test which is in fact another test
> > > harness which runs many tests. It would be nice to have meson have
> > > the full view of all tests run. Adapting the I/O tests is as easy
> > > win in this respect.
> > > 
> > > This switches meson to perform test discovery by invoking 'check' in
> > > dry-run mode. It then registers one meson test case for each I/O
> > > test. Parallel execution remains disabled since the I/O tests do not
> > > use self contained execution environments and thus conflict with
> > > each other.
> > 
> > Great to see some movement in this area again!
> > 
> > Some questions/remarks:
> > 
> > 1) Could you remove tests/check-block.sh now? See also:
> >    https://lore.kernel.org/all/20220209101530.3442837-9-thuth@redhat.com/
> 
> Possibly, I wasn't sure if that was wanted as a general entry
> point for humans, or was solely for meson ?
> 
> > 2) With regards to parallel execution ... I think it should be
> >    possible nowadays - the "check" script is normally also run
> >    with the "-j" switch by the tests/check-block.sh script, so
> >    if you remove the possibility to run in parallel, it's a
> >    regression from the previous behavior!
> 
> Hmmm, I got *masses* of test failures when running in parallel
> but I'll check again to be sure.

Ooooh, the 'check' script only uses a unique sub-dir for
tests when passing -j with a value >= 2. It also merely
appends the test name, so it is still broken for anyone
who runs multiple copies of 'check' in parallel for
different configurations, even if they passed -j. It
needs to just always use an isolated subdir no matter
what, with process level uniqueness, not merely test.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH 0/5] iotests: make meson aware of individual I/O tests

Posted by Daniel P. Berrangé 2 years, 11 months ago

On Thu, Mar 02, 2023 at 06:46:01PM +0000, Daniel P. Berrangé wrote:
> To just repeat the patch 5 description...
> 
> Currently meson registers a single test that invokes an entire group of
> I/O tests, hiding the test granularity from meson. There are various
> downsides of doing this
> 
>  * You cannot ask 'meson test' to invoke a single I/O test
>  * The meson test timeout can't be applied to the individual
>    tests
>  * Meson only gets a pass/fail for the overall I/O test group
>    not individual tests

Another big benefit is that we get to see the execution time of
every I/O test, so can identify which handful of tests are making
it take almost 5 minutes to run in gitlab...

The really big 4 are:

199/314 qemu:block / qemu-iotests-qcow2-040   OK    16.62s   1 subtests passed
200/314 qemu:block / qemu-iotests-qcow2-041   OK    44.51s   1 subtests passed
265/314 qemu:block / qemu-iotests-qcow2-191   OK    17.93s   1 subtests passed
282/314 qemu:block / qemu-iotests-qcow2-271   OK    25.03s   1 subtests passed

there are a bunch in the ~5 second range, and then most are
1 second or less.

>  * If a CI job gets killed by the GitLab timeout, we don't
>    get visibility into how far through the I/O tests
>    execution got.
> 
> This is not really specific to the I/O tests, the problem is common
> to any case of us running a test which is in fact another test
> harness which runs many tests. It would be nice to have meson have
> the full view of all tests run. Adapting the I/O tests is as easy
> win in this respect.
> 
> This switches meson to perform test discovery by invoking 'check' in
> dry-run mode. It then registers one meson test case for each I/O
> test. Parallel execution remains disabled since the I/O tests do not
> use self contained execution environments and thus conflict with
> each other.
> 
> Compare contrast output from a current job:
> 
>   https://gitlab.com/qemu-project/qemu/-/jobs/3863603546
> 
> [quote]
> 204/224 qemu:block / qemu-iotests qcow2   OK 329.94s   119 subtests passed
> [/quote]
> 
> Vs what is seen with this series:
> 
>   https://gitlab.com/berrange/qemu/-/jobs/3865975463
> 
> [quote]
> 204/350 qemu:block / qemu-iotests-qcow2-001   OK    2.16s   1 subtests passed
> 205/350 qemu:block / qemu-iotests-qcow2-002   OK    2.77s   1 subtests passed
> 
> ...snip...
> 
> 329/350 qemu:block / qemu-iotests-qcow2-qemu-img-close-errors       OK    6.19s   1 subtests passed
> 330/350 qemu:block / qemu-iotests-qcow2-qsd-jobs          OK    0.55s   1 subtests passed
> [/quote]
> 
> A few tweaks were needed to the iotests runner because it had a few
> assumptions about it always running in a tree that has already been
> built, which is obviously not the case at the time meson does test
> discovery.
> 
> Daniel P. Berrangé (5):
>   iotests: explicitly pass source/build dir to 'check' command
>   iotests: allow test discovery before building
>   iotests: strip subdir path when listing tests
>   iotests: print TAP protocol version when reporting tests
>   iotests: register each I/O test separately with meson
> 
>  tests/qemu-iotests/check         | 11 +++++++++--
>  tests/qemu-iotests/meson.build   | 33 ++++++++++++++++++++++++++------
>  tests/qemu-iotests/testenv.py    | 22 +++++++++++++++++----
>  tests/qemu-iotests/testrunner.py |  1 +
>  4 files changed, 55 insertions(+), 12 deletions(-)
> 
> -- 
> 2.39.2
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|