[v1] Final changes to fix test suite on macOS

[PATCH 3/4] qemuxml2argvtest: Increase timeout on macOS

Posted by Roman Bolshakov 4 years ago

The test takes 100+ seconds if all test suite is run with meson and 45+
seconds if it's run directly. According to the output of sample tool,
most of the time (~72 seconds) is spent in poll():

Sort by top of stack, same collapsed (when >= 5):
        poll  (in libsystem_kernel.dylib)                         72396
        tiny_free_no_lock  (in libsystem_malloc.dylib)             1726
        tiny_malloc_should_clear  (in libsystem_malloc.dylib)      1671
        free_tiny  (in libsystem_malloc.dylib)                     1287
        tiny_free_list_add_ptr  (in libsystem_malloc.dylib)        1258
        tiny_malloc_from_free_list  (in libsystem_malloc.dylib)    1256

Closes https://gitlab.com/libvirt/libvirt/-/issues/58
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
---
 tests/meson.build | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tests/meson.build b/tests/meson.build
index 8decdfa20c..3b77a211bb 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -571,7 +571,13 @@ foreach data : tests
     ],
     export_dynamic: true,
   )
-  test(data['name'], test_bin, env: tests_env)
+  if data['name'] == 'qemuxml2argvtest' and host_machine.system() == 'darwin'
+    timeout = 180
+  else
+    # default meson timeout
+    timeout = 30
+  endif
+  test(data['name'], test_bin, env: tests_env, timeout: timeout)
 endforeach
 
 
-- 
2.29.2

Re: [PATCH 3/4] qemuxml2argvtest: Increase timeout on macOS

Posted by Michal Privoznik 4 years ago

On 11/8/20 10:24 PM, Roman Bolshakov wrote:
> The test takes 100+ seconds if all test suite is run with meson and 45+
> seconds if it's run directly. According to the output of sample tool,
> most of the time (~72 seconds) is spent in poll():
> 
> Sort by top of stack, same collapsed (when >= 5):
>          poll  (in libsystem_kernel.dylib)                         72396
>          tiny_free_no_lock  (in libsystem_malloc.dylib)             1726
>          tiny_malloc_should_clear  (in libsystem_malloc.dylib)      1671
>          free_tiny  (in libsystem_malloc.dylib)                     1287
>          tiny_free_list_add_ptr  (in libsystem_malloc.dylib)        1258
>          tiny_malloc_from_free_list  (in libsystem_malloc.dylib)    1256
> 
> Closes https://gitlab.com/libvirt/libvirt/-/issues/58
> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
> ---
>   tests/meson.build | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/meson.build b/tests/meson.build
> index 8decdfa20c..3b77a211bb 100644
> --- a/tests/meson.build
> +++ b/tests/meson.build
> @@ -571,7 +571,13 @@ foreach data : tests
>       ],
>       export_dynamic: true,
>     )
> -  test(data['name'], test_bin, env: tests_env)
> +  if data['name'] == 'qemuxml2argvtest' and host_machine.system() == 'darwin'
> +    timeout = 180
> +  else
> +    # default meson timeout
> +    timeout = 30
> +  endif
> +  test(data['name'], test_bin, env: tests_env, timeout: timeout)
>   endforeach
>   
>   
> 

I think the last time I wanted to increase the timeout I was told that 
it is machine specific and since I know my machine the best I should 
use: meson test --timeout-multiplier=X

Do you need this for the next cirrus patch? If so I think we should just 
add --timeout-multiplier= there.

Michal

Re: [PATCH 3/4] qemuxml2argvtest: Increase timeout on macOS

Posted by Andrea Bolognani 4 years ago

On Fri, 2020-11-13 at 16:58 +0100, Michal Privoznik wrote:
> On 11/8/20 10:24 PM, Roman Bolshakov wrote:
> > +  if data['name'] == 'qemuxml2argvtest' and host_machine.system() == 'darwin'
> > +    timeout = 180
> > +  else
> > +    # default meson timeout
> > +    timeout = 30
> > +  endif
> > +  test(data['name'], test_bin, env: tests_env, timeout: timeout)
> 
> I think the last time I wanted to increase the timeout I was told that 
> it is machine specific and since I know my machine the best I should 
> use: meson test --timeout-multiplier=X

Agreed: in my local test environment, for example, the test suite
completes successfully without tweaking the timeout at all. If the
Cirrus CI builders are slower and require a larger timeout that's
perfectly fine, but we should apply the tweak for that environment
only using the command line argument.

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH 3/4] qemuxml2argvtest: Increase timeout on macOS

Posted by Roman Bolshakov 4 years ago

On Fri, Nov 13, 2020 at 04:58:36PM +0100, Michal Privoznik wrote:
> On 11/8/20 10:24 PM, Roman Bolshakov wrote:
> > The test takes 100+ seconds if all test suite is run with meson and 45+
> > seconds if it's run directly. According to the output of sample tool,
> > most of the time (~72 seconds) is spent in poll():
> > 
> > Sort by top of stack, same collapsed (when >= 5):
> >          poll  (in libsystem_kernel.dylib)                         72396
> >          tiny_free_no_lock  (in libsystem_malloc.dylib)             1726
> >          tiny_malloc_should_clear  (in libsystem_malloc.dylib)      1671
> >          free_tiny  (in libsystem_malloc.dylib)                     1287
> >          tiny_free_list_add_ptr  (in libsystem_malloc.dylib)        1258
> >          tiny_malloc_from_free_list  (in libsystem_malloc.dylib)    1256
> > 
> > Closes https://gitlab.com/libvirt/libvirt/-/issues/58
> > Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
> > ---
> >   tests/meson.build | 8 +++++++-
> >   1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tests/meson.build b/tests/meson.build
> > index 8decdfa20c..3b77a211bb 100644
> > --- a/tests/meson.build
> > +++ b/tests/meson.build
> > @@ -571,7 +571,13 @@ foreach data : tests
> >       ],
> >       export_dynamic: true,
> >     )
> > -  test(data['name'], test_bin, env: tests_env)
> > +  if data['name'] == 'qemuxml2argvtest' and host_machine.system() == 'darwin'
> > +    timeout = 180
> > +  else
> > +    # default meson timeout
> > +    timeout = 30
> > +  endif
> > +  test(data['name'], test_bin, env: tests_env, timeout: timeout)
> >   endforeach
> > 
> 
> I think the last time I wanted to increase the timeout I was told that it is
> machine specific and since I know my machine the best I should use: meson
> test --timeout-multiplier=X
> 

But qumexml2argvtest has 1000+ test cases. And prior to the series the
test never fully passed. In some sense the series raises a baseline (but
with a higher timeout than on Linux).

> Do you need this for the next cirrus patch? If so I think we should just add
> --timeout-multiplier= there.
> 

No, I needed it for local environment. And I profiled it a bit to find
where the time is spent [1]. I can also build a flamegraph if it helps.

FWIW, it takes less time in Cirrus CI [2] than on MBP Pro 2012 but
timeout margin is quite narrow for default timeout:

90/127 qemuxml2argvtest                        OK      29.4992618560791 s

So, I've measured test execution time on a few apple laptops at hand.

On 16" MBP 2019 I get:
 90/285 libvirt / qemuxml2argvtest      OK             22.11s

On 15" MBP 2012:
 80/117 qemuxml2argvtest                OK             48.31s

On 13" MBA 2015 running on battery:
 80/117 qemuxml2argvtest                OK             50.20s

On 13" MBA 2015 running on A/C power:
 80/117 qemuxml2argvtest                OK             48.71s

Without the explicit timeout everyone has to remember that at least "-t 2"
has to be always passed to "meson test" on darwin. In my opinion,
project defaults should be reasonable without tweaking, to allow nice
"meson test -C build" to run all tests and be sure that they run without
failures if no issues are actually introduced, i.e. there should be no
false positives. This is especially important for "meson dist".

What do you think about other timeout values for qemuxml2argvtest on
darwin: 60s, 90s, 120s, 150s? Is there a chance that one of them would
be ok?

1. https://www.dropbox.com/s/0rxid8me0976m69/qemuxml2argvtest_2020-11-08_220422_4J2C.sample.txt.xz?dl=0
2. https://gitlab.com/roolebo/libvirt/-/jobs/837071634

Thanks,
Roman

Re: [PATCH 3/4] qemuxml2argvtest: Increase timeout on macOS

Posted by Andrea Bolognani 4 years ago

On Tue, 2020-11-17 at 01:34 +0300, Roman Bolshakov wrote:
> On Fri, Nov 13, 2020 at 04:58:36PM +0100, Michal Privoznik wrote:
> > On 11/8/20 10:24 PM, Roman Bolshakov wrote:
> > > +  if data['name'] == 'qemuxml2argvtest' and host_machine.system() == 'darwin'
> > > +    timeout = 180
> > > +  else
> > > +    # default meson timeout
> > > +    timeout = 30
> > > +  endif
> > > +  test(data['name'], test_bin, env: tests_env, timeout: timeout)
> > >   endforeach
> > 
> > I think the last time I wanted to increase the timeout I was told that it is
> > machine specific and since I know my machine the best I should use: meson
> > test --timeout-multiplier=X
> 
> But qumexml2argvtest has 1000+ test cases. And prior to the series the
> test never fully passed. In some sense the series raises a baseline (but
> with a higher timeout than on Linux).
> 
> > Do you need this for the next cirrus patch? If so I think we should just add
> > --timeout-multiplier= there.
> 
> No, I needed it for local environment. And I profiled it a bit to find
> where the time is spent [1]. I can also build a flamegraph if it helps.
> 
> FWIW, it takes less time in Cirrus CI [2] than on MBP Pro 2012 but
> timeout margin is quite narrow for default timeout:
> 
> 90/127 qemuxml2argvtest                        OK      29.4992618560791 s
> 
> So, I've measured test execution time on a few apple laptops at hand.
> 
> On 16" MBP 2019 I get:
>  90/285 libvirt / qemuxml2argvtest      OK             22.11s
> 
> On 15" MBP 2012:
>  80/117 qemuxml2argvtest                OK             48.31s
> 
> On 13" MBA 2015 running on battery:
>  80/117 qemuxml2argvtest                OK             50.20s
> 
> On 13" MBA 2015 running on A/C power:
>  80/117 qemuxml2argvtest                OK             48.71s
> 
> Without the explicit timeout everyone has to remember that at least "-t 2"
> has to be always passed to "meson test" on darwin. In my opinion,
> project defaults should be reasonable without tweaking, to allow nice
> "meson test -C build" to run all tests and be sure that they run without
> failures if no issues are actually introduced, i.e. there should be no
> false positives. This is especially important for "meson dist".
> 
> What do you think about other timeout values for qemuxml2argvtest on
> darwin: 60s, 90s, 120s, 150s? Is there a chance that one of them would
> be ok?

It's true that we make accomodations so that things work out of the
box as much as possible, and this would fall in line with that trend.

You've almost convinced we want this, but yeah we should make this
maybe 90s - 180s seems quite excessive, given your measurements.

I wonder if we should just bump the timeout for this test
unconditionally: we know it's generally the one that takes the
longest, so even on non-Apple hardware we might very well find that
all other tests finish before the default timeout kicks in but this
one can't quite make it.

That's pretty much hwhat happened not too long ago in Debian: then
6.7.0 was imported, it FTBFS on a bunch of slow architectures due to
this specific test timing out:

  https://buildd.debian.org/status/fetch.php?pkg=libvirt&arch=armel&ver=6.7.0-1&stamp=1599582160&raw=0

The solution was to pass --timeout-multiplier

  https://salsa.debian.org/libvirt-team/libvirt/-/commit/b327f9a356c81657aef905eb0d94f188d7a805bd
  https://salsa.debian.org/libvirt-team/libvirt/-/commit/2a7b4f4ad9285bfc9e6dd2d248e426924ef01140

Even if you bumped the timeout unconditionally, we wouldn't be able
to drop that patch: if you look at the latest build for mipsel

  https://buildd.debian.org/status/fetch.php?pkg=libvirt&arch=mipsel&ver=6.9.0-1&stamp=1605523316&raw=0

for example, you'll see that *several* tests take longer than 30s to
finish, even those that take a fraction of qemuxml2argv on x86_64, so
I wouldn't be too keen on keeping track of them and singling them out
in meson.build.

But, one thing is old, slow architectures that people almost
certainly don't do day-to-day development on, and another thing is
relatively recent MacBooks :)

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH 3/4] qemuxml2argvtest: Increase timeout on macOS

Posted by Roman Bolshakov 4 years ago

On Tue, Nov 17, 2020 at 11:01:21AM +0100, Andrea Bolognani wrote:
> On Tue, 2020-11-17 at 01:34 +0300, Roman Bolshakov wrote:
> > On Fri, Nov 13, 2020 at 04:58:36PM +0100, Michal Privoznik wrote:
> > > On 11/8/20 10:24 PM, Roman Bolshakov wrote:
> > > > +  if data['name'] == 'qemuxml2argvtest' and host_machine.system() == 'darwin'
> > > > +    timeout = 180
> > > > +  else
> > > > +    # default meson timeout
> > > > +    timeout = 30
> > > > +  endif
> > > > +  test(data['name'], test_bin, env: tests_env, timeout: timeout)
> > > >   endforeach
> > > 
> > > I think the last time I wanted to increase the timeout I was told that it is
> > > machine specific and since I know my machine the best I should use: meson
> > > test --timeout-multiplier=X
> > 
> > But qumexml2argvtest has 1000+ test cases. And prior to the series the
> > test never fully passed. In some sense the series raises a baseline (but
> > with a higher timeout than on Linux).
> > 
> > > Do you need this for the next cirrus patch? If so I think we should just add
> > > --timeout-multiplier= there.
> > 
> > No, I needed it for local environment. And I profiled it a bit to find
> > where the time is spent [1]. I can also build a flamegraph if it helps.
> > 
> > FWIW, it takes less time in Cirrus CI [2] than on MBP Pro 2012 but
> > timeout margin is quite narrow for default timeout:
> > 
> > 90/127 qemuxml2argvtest                        OK      29.4992618560791 s
> > 
> > So, I've measured test execution time on a few apple laptops at hand.
> > 
> > On 16" MBP 2019 I get:
> >  90/285 libvirt / qemuxml2argvtest      OK             22.11s
> > 
> > On 15" MBP 2012:
> >  80/117 qemuxml2argvtest                OK             48.31s
> > 
> > On 13" MBA 2015 running on battery:
> >  80/117 qemuxml2argvtest                OK             50.20s
> > 
> > On 13" MBA 2015 running on A/C power:
> >  80/117 qemuxml2argvtest                OK             48.71s
> > 
> > Without the explicit timeout everyone has to remember that at least "-t 2"
> > has to be always passed to "meson test" on darwin. In my opinion,
> > project defaults should be reasonable without tweaking, to allow nice
> > "meson test -C build" to run all tests and be sure that they run without
> > failures if no issues are actually introduced, i.e. there should be no
> > false positives. This is especially important for "meson dist".
> > 
> > What do you think about other timeout values for qemuxml2argvtest on
> > darwin: 60s, 90s, 120s, 150s? Is there a chance that one of them would
> > be ok?
> 
> It's true that we make accomodations so that things work out of the
> box as much as possible, and this would fall in line with that trend.
> 
> You've almost convinced we want this, but yeah we should make this
> maybe 90s - 180s seems quite excessive, given your measurements.
> 

Ok, I can change it to 90s.

> I wonder if we should just bump the timeout for this test
> unconditionally: we know it's generally the one that takes the
> longest, so even on non-Apple hardware we might very well find that
> all other tests finish before the default timeout kicks in but this
> one can't quite make it.
> 

I'm fine if it's unconditionally increased on all platforms for
qemuxml2argvtest. I'll send the change in v2 if nobody has objections.

> That's pretty much what happened not too long ago in Debian: then
> 6.7.0 was imported, it FTBFS on a bunch of slow architectures due to
> this specific test timing out:
> 
>   https://buildd.debian.org/status/fetch.php?pkg=libvirt&arch=armel&ver=6.7.0-1&stamp=1599582160&raw=0
> 
> The solution was to pass --timeout-multiplier
> 
>   https://salsa.debian.org/libvirt-team/libvirt/-/commit/b327f9a356c81657aef905eb0d94f188d7a805bd
>   https://salsa.debian.org/libvirt-team/libvirt/-/commit/2a7b4f4ad9285bfc9e6dd2d248e426924ef01140
> 
> Even if you bumped the timeout unconditionally, we wouldn't be able
> to drop that patch: if you look at the latest build for mipsel
> 
>   https://buildd.debian.org/status/fetch.php?pkg=libvirt&arch=mipsel&ver=6.9.0-1&stamp=1605523316&raw=0
> 
> for example, you'll see that *several* tests take longer than 30s to
> finish, even those that take a fraction of qemuxml2argv on x86_64, so
> I wouldn't be too keen on keeping track of them and singling them out
> in meson.build.
> 
> But, one thing is old, slow architectures that people almost
> certainly don't do day-to-day development on, and another thing is
> relatively recent MacBooks :)
> 

I doubt there are many users/developers of libvirtd on macOS right now
because there's no upstream Hypervisor.framework and
Virtualization.framework support. libvirt is used as a standalone client
tool [1] and for virt-manager [2]. The only usable qemu domain in
libvirtd is TCG.

With regards to HVF, I'm ressurecting my HVF patchset [3] as all
outstanding issues (syntax check, the tests, qemucapsprobe) are mostly
fixed or about to be fixed. There's also an outstanding patch series to
QEMU that improves reporting of non KVM accelerators [4]. Once we add
HVF and may be HAXM domain support, there will be much more interest in
libvirt, qemu on macOS. So, convenient development environment would be
important to ease further contributions.

1. https://formulae.brew.sh/formula/libvirt
2. https://github.com/jeffreywildman/homebrew-virt-manager
3. https://www.redhat.com/archives/libvir-list/2018-November/msg00802.html
4. https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg03944.html

Thanks,
Roman