build-aux/meson.build | 1 + tests/meson.build | 14 ++++++-------- 2 files changed, 7 insertions(+), 8 deletions(-)
Since we've switched to meson our tests run with a timeout (meson
uses 30 seconds as the default). However, not every machine that
builds libvirt is fast enough to run every test under 30 seconds
(each test binary has its own timeout, but still). For instance
when building a package for distro on a farm that's under load.
Or on a generally slow ARM hardware. While each developer can
tune their command line for building by adding
--timeout-multiplier=10, this is hard to do for aforementioned
build farms.
It's time to admit that not everybody has the latest, top shelf
CPU and increase the timeout.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
Successfully tested on my RasPi 2B, where the top three longest test
were:
qemuxml2argvtest OK 142.43s
virschematest OK 40.82s
virnettlssessiontest OK 38.64s
and syntax-check:
sc_spacing-check OK 50.26
sc_avoid_if_before_free OK 42.30s
sc_prohibit_cross_inclusion OK 40.51s
build-aux/meson.build | 1 +
tests/meson.build | 14 ++++++--------
2 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/build-aux/meson.build b/build-aux/meson.build
index c506feefd2..e7b2db8759 100644
--- a/build-aux/meson.build
+++ b/build-aux/meson.build
@@ -44,6 +44,7 @@ if git
potfiles_dep,
],
suite: 'syntax-check',
+ timeout: 120,
)
endforeach
endif
diff --git a/tests/meson.build b/tests/meson.build
index 0de0783839..702090d594 100644
--- a/tests/meson.build
+++ b/tests/meson.build
@@ -546,6 +546,10 @@ if conf.has('WITH_YAJL')
]
endif
+# Increase the default timeout because some tests may run longer,
+# esp. on slower systems.
+timeout = 180
+
foreach data : tests
test_sources = '@0@.c'.format(data['name'])
test_bin = executable(
@@ -577,12 +581,6 @@ foreach data : tests
],
export_dynamic: true,
)
- if data['name'] == 'qemuxml2argvtest'
- timeout = 90
- else
- # default meson timeout
- timeout = 30
- endif
test(data['name'], test_bin, env: tests_env, timeout: timeout)
endforeach
@@ -683,7 +681,7 @@ endif
foreach name : test_scripts
script = find_program(name)
- test(name, script, env: tests_env)
+ test(name, script, env: tests_env, timeout: timeout)
endforeach
add_test_setup(
@@ -703,6 +701,6 @@ add_test_setup(
'--suppressions=@0@'.format(meson.current_source_dir() / '.valgrind.supp'),
'--error-exitcode=1',
],
- # default timeout in meson is 30s
+ # Tests take a lot longer when run under Valgrind
timeout_multiplier: 4,
)
--
2.26.2
On 1/27/21 2:59 PM, Michal Privoznik wrote: > Since we've switched to meson our tests run with a timeout (meson > uses 30 seconds as the default). However, not every machine that > builds libvirt is fast enough to run every test under 30 seconds > (each test binary has its own timeout, but still). For instance > when building a package for distro on a farm that's under load. > Or on a generally slow ARM hardware. While each developer can > tune their command line for building by adding > --timeout-multiplier=10, this is hard to do for aforementioned > build farms. > > It's time to admit that not everybody has the latest, top shelf > CPU and increase the timeout. > > Signed-off-by: Michal Privoznik <mprivozn@redhat.com> > --- This sure will help these build farms environments, but what about the cases where an actual timeout means that there is something wrong with the code? E.g. commits 46d88d8dba56 and 2ba0b7497ce7 were only possible because I was seeing tests timing out in Power hosts when the 30 sec timeout was being enforced. A 120 second default timeout for the majority of the test cases is a long time. virschematest in this laptop I use takes 2.5 sec to complete. If I do something wrong in the code and now the test is now 4 times slower (10 sec) I will not be able to detect it (I'll need to start keeping track or something). You'll have to run the test suit on your RasPi 2B to see that something went wrong because the timeout is better tuned to your RasPI than this laptop, but then the code is already upstream. And the tests will get more complex and will naturally take longer to complete. Eventually this timeout might no be enough. Increase the timeout again? Meson 0.57 has support for disabling test timeout entirely with --timeout-multiplier=0, disabling test timeout entirely. Can't we bump the meson version to 0.57 and then tell the distros to use timeout-multiplier=0? That will solve the problems for them, I keep the short timeouts for development, and we won't need to keep bumping the test timeout once every 2 years or something because the s390x TCG enviroment in Fedora COPR is timing out again. Thanks, DHB > > Successfully tested on my RasPi 2B, where the top three longest test > were: > > qemuxml2argvtest OK 142.43s > virschematest OK 40.82s > virnettlssessiontest OK 38.64s > > and syntax-check: > > sc_spacing-check OK 50.26 > sc_avoid_if_before_free OK 42.30s > sc_prohibit_cross_inclusion OK 40.51s > > build-aux/meson.build | 1 + > tests/meson.build | 14 ++++++-------- > 2 files changed, 7 insertions(+), 8 deletions(-) > > diff --git a/build-aux/meson.build b/build-aux/meson.build > index c506feefd2..e7b2db8759 100644 > --- a/build-aux/meson.build > +++ b/build-aux/meson.build > @@ -44,6 +44,7 @@ if git > potfiles_dep, > ], > suite: 'syntax-check', > + timeout: 120, > ) > endforeach > endif > diff --git a/tests/meson.build b/tests/meson.build > index 0de0783839..702090d594 100644 > --- a/tests/meson.build > +++ b/tests/meson.build > @@ -546,6 +546,10 @@ if conf.has('WITH_YAJL') > ] > endif > > +# Increase the default timeout because some tests may run longer, > +# esp. on slower systems. > +timeout = 180 > + > foreach data : tests > test_sources = '@0@.c'.format(data['name']) > test_bin = executable( > @@ -577,12 +581,6 @@ foreach data : tests > ], > export_dynamic: true, > ) > - if data['name'] == 'qemuxml2argvtest' > - timeout = 90 > - else > - # default meson timeout > - timeout = 30 > - endif > test(data['name'], test_bin, env: tests_env, timeout: timeout) > endforeach > > @@ -683,7 +681,7 @@ endif > > foreach name : test_scripts > script = find_program(name) > - test(name, script, env: tests_env) > + test(name, script, env: tests_env, timeout: timeout) > endforeach > > add_test_setup( > @@ -703,6 +701,6 @@ add_test_setup( > '--suppressions=@0@'.format(meson.current_source_dir() / '.valgrind.supp'), > '--error-exitcode=1', > ], > - # default timeout in meson is 30s > + # Tests take a lot longer when run under Valgrind > timeout_multiplier: 4, > ) >
On Fri, Jan 29, 2021 at 09:30:24AM -0300, Daniel Henrique Barboza wrote: > > > On 1/27/21 2:59 PM, Michal Privoznik wrote: > > Since we've switched to meson our tests run with a timeout (meson > > uses 30 seconds as the default). However, not every machine that > > builds libvirt is fast enough to run every test under 30 seconds > > (each test binary has its own timeout, but still). For instance > > when building a package for distro on a farm that's under load. > > Or on a generally slow ARM hardware. While each developer can > > tune their command line for building by adding > > --timeout-multiplier=10, this is hard to do for aforementioned > > build farms. > > > > It's time to admit that not everybody has the latest, top shelf > > CPU and increase the timeout. > > > > Signed-off-by: Michal Privoznik <mprivozn@redhat.com> > > --- > > This sure will help these build farms environments, but what about the cases > where an actual timeout means that there is something wrong with the code? > E.g. commits 46d88d8dba56 and 2ba0b7497ce7 were only possible because I was > seeing tests timing out in Power hosts when the 30 sec timeout was being > enforced. > > A 120 second default timeout for the majority of the test cases is a long time. > virschematest in this laptop I use takes 2.5 sec to complete. If I do something > wrong in the code and now the test is now 4 times slower (10 sec) I will not be > able to detect it (I'll need to start keeping track or something). You'll have to > run the test suit on your RasPi 2B to see that something went wrong because the > timeout is better tuned to your RasPI than this laptop, but then the code is already > upstream. > > And the tests will get more complex and will naturally take longer to complete. > Eventually this timeout might no be enough. Increase the timeout again? > > Meson 0.57 has support for disabling test timeout entirely with --timeout-multiplier=0, > disabling test timeout entirely. Can't we bump the meson version to 0.57 and > then tell the distros to use timeout-multiplier=0? That will solve the problems > for them, I keep the short timeouts for development, and we won't need to keep > bumping the test timeout once every 2 years or something because the s390x > TCG enviroment in Fedora COPR is timing out again. Agreed. The timeout is useful and we can always find a machine where the current timeout will not be good enough. I don't think it is a good idea to increase the timeout like this for only few specific use-cases. Bumping meson version to 0.57 just only for this tiny nice to have feature is not something we should do. It would make building libvirt unnecessary complicated for most contributors not using latest bleeding-edge software. For example meson 0.57 is not even in Fedora Rawhide, Gentoo, Archlinux and possibly on most of the similar distributions. Pavel
On 1/29/21 1:45 PM, Pavel Hrdina wrote: > On Fri, Jan 29, 2021 at 09:30:24AM -0300, Daniel Henrique Barboza wrote: >> >> >> On 1/27/21 2:59 PM, Michal Privoznik wrote: >>> Since we've switched to meson our tests run with a timeout (meson >>> uses 30 seconds as the default). However, not every machine that >>> builds libvirt is fast enough to run every test under 30 seconds >>> (each test binary has its own timeout, but still). For instance >>> when building a package for distro on a farm that's under load. >>> Or on a generally slow ARM hardware. While each developer can >>> tune their command line for building by adding >>> --timeout-multiplier=10, this is hard to do for aforementioned >>> build farms. >>> >>> It's time to admit that not everybody has the latest, top shelf >>> CPU and increase the timeout. >>> >>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> >>> --- >> >> This sure will help these build farms environments, but what about the cases >> where an actual timeout means that there is something wrong with the code? >> E.g. commits 46d88d8dba56 and 2ba0b7497ce7 were only possible because I was >> seeing tests timing out in Power hosts when the 30 sec timeout was being >> enforced. >> >> A 120 second default timeout for the majority of the test cases is a long time. >> virschematest in this laptop I use takes 2.5 sec to complete. If I do something >> wrong in the code and now the test is now 4 times slower (10 sec) I will not be >> able to detect it (I'll need to start keeping track or something). You'll have to >> run the test suit on your RasPi 2B to see that something went wrong because the >> timeout is better tuned to your RasPI than this laptop, but then the code is already >> upstream. >> >> And the tests will get more complex and will naturally take longer to complete. >> Eventually this timeout might no be enough. Increase the timeout again? >> >> Meson 0.57 has support for disabling test timeout entirely with --timeout-multiplier=0, >> disabling test timeout entirely. Can't we bump the meson version to 0.57 and >> then tell the distros to use timeout-multiplier=0? That will solve the problems >> for them, I keep the short timeouts for development, and we won't need to keep >> bumping the test timeout once every 2 years or something because the s390x >> TCG enviroment in Fedora COPR is timing out again. > > Agreed. The timeout is useful and we can always find a machine where the > current timeout will not be good enough. I don't think it is a good idea > to increase the timeout like this for only few specific use-cases. So we agree that timeouts are evil :-) Back in the autotools days we did not use any timeout at all and I don't remember that causing any trouble. But maybe it was so painful that I blocked it in my mind :-) Michal
On Fri, Jan 29, 2021 at 02:07:31PM +0100, Michal Privoznik wrote: > On 1/29/21 1:45 PM, Pavel Hrdina wrote: > > On Fri, Jan 29, 2021 at 09:30:24AM -0300, Daniel Henrique Barboza wrote: > > > > > > > > > On 1/27/21 2:59 PM, Michal Privoznik wrote: > > > > Since we've switched to meson our tests run with a timeout (meson > > > > uses 30 seconds as the default). However, not every machine that > > > > builds libvirt is fast enough to run every test under 30 seconds > > > > (each test binary has its own timeout, but still). For instance > > > > when building a package for distro on a farm that's under load. > > > > Or on a generally slow ARM hardware. While each developer can > > > > tune their command line for building by adding > > > > --timeout-multiplier=10, this is hard to do for aforementioned > > > > build farms. > > > > > > > > It's time to admit that not everybody has the latest, top shelf > > > > CPU and increase the timeout. > > > > > > > > Signed-off-by: Michal Privoznik <mprivozn@redhat.com> > > > > --- > > > > > > This sure will help these build farms environments, but what about the cases > > > where an actual timeout means that there is something wrong with the code? > > > E.g. commits 46d88d8dba56 and 2ba0b7497ce7 were only possible because I was > > > seeing tests timing out in Power hosts when the 30 sec timeout was being > > > enforced. > > > > > > A 120 second default timeout for the majority of the test cases is a long time. > > > virschematest in this laptop I use takes 2.5 sec to complete. If I do something > > > wrong in the code and now the test is now 4 times slower (10 sec) I will not be > > > able to detect it (I'll need to start keeping track or something). You'll have to > > > run the test suit on your RasPi 2B to see that something went wrong because the > > > timeout is better tuned to your RasPI than this laptop, but then the code is already > > > upstream. > > > > > > And the tests will get more complex and will naturally take longer to complete. > > > Eventually this timeout might no be enough. Increase the timeout again? > > > > > > Meson 0.57 has support for disabling test timeout entirely with --timeout-multiplier=0, > > > disabling test timeout entirely. Can't we bump the meson version to 0.57 and > > > then tell the distros to use timeout-multiplier=0? That will solve the problems > > > for them, I keep the short timeouts for development, and we won't need to keep > > > bumping the test timeout once every 2 years or something because the s390x > > > TCG enviroment in Fedora COPR is timing out again. > > > > Agreed. The timeout is useful and we can always find a machine where the > > current timeout will not be good enough. I don't think it is a good idea > > to increase the timeout like this for only few specific use-cases. > > So we agree that timeouts are evil :-) Back in the autotools days we did not No we don't :) timeouts are useful in test suites to reveal issues that something takes a long time when it shouldn't. Pavel > use any timeout at all and I don't remember that causing any trouble. But > maybe it was so painful that I blocked it in my mind :-) > > Michal
On 1/29/21 2:25 PM, Pavel Hrdina wrote: > On Fri, Jan 29, 2021 at 02:07:31PM +0100, Michal Privoznik wrote: >> On 1/29/21 1:45 PM, Pavel Hrdina wrote: >>> On Fri, Jan 29, 2021 at 09:30:24AM -0300, Daniel Henrique Barboza wrote: >>>> >> So we agree that timeouts are evil :-) Back in the autotools days we did not > > No we don't :) timeouts are useful in test suites to reveal issues that > something takes a long time when it shouldn't. I view timeout as a prevention mechanism for a test that went into an endless loop. If we really care about timing then there are way better techniques than meson's test timeout. Or put differently, I don't record times for individual test cases and I would not notice if a test doubles its run time. Michal
On Fri, Jan 29, 2021 at 09:30:24AM -0300, Daniel Henrique Barboza wrote: > > > On 1/27/21 2:59 PM, Michal Privoznik wrote: > > Since we've switched to meson our tests run with a timeout (meson > > uses 30 seconds as the default). However, not every machine that > > builds libvirt is fast enough to run every test under 30 seconds > > (each test binary has its own timeout, but still). For instance > > when building a package for distro on a farm that's under load. > > Or on a generally slow ARM hardware. While each developer can > > tune their command line for building by adding > > --timeout-multiplier=10, this is hard to do for aforementioned > > build farms. > > > > It's time to admit that not everybody has the latest, top shelf > > CPU and increase the timeout. > > > > Signed-off-by: Michal Privoznik <mprivozn@redhat.com> > > --- > > This sure will help these build farms environments, but what about the cases > where an actual timeout means that there is something wrong with the code? > E.g. commits 46d88d8dba56 and 2ba0b7497ce7 were only possible because I was > seeing tests timing out in Power hosts when the 30 sec timeout was being > enforced. > > A 120 second default timeout for the majority of the test cases is a long time. > virschematest in this laptop I use takes 2.5 sec to complete. If I do something > wrong in the code and now the test is now 4 times slower (10 sec) I will not be > able to detect it (I'll need to start keeping track or something). You'll have to > run the test suit on your RasPi 2B to see that something went wrong because the > timeout is better tuned to your RasPI than this laptop, but then the code is already > upstream. > > And the tests will get more complex and will naturally take longer to complete. > Eventually this timeout might no be enough. Increase the timeout again? > > Meson 0.57 has support for disabling test timeout entirely with --timeout-multiplier=0, > disabling test timeout entirely. Can't we bump the meson version to 0.57 and > then tell the distros to use timeout-multiplier=0? That will solve the problems > for them, I keep the short timeouts for development, and we won't need to keep > bumping the test timeout once every 2 years or something because the s390x > TCG enviroment in Fedora COPR is timing out again. We don't even need to bump the meson versionin libvirt. If someone ones the functionality from 0.57, then can simply install that version of meson. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 1/29/21 1:30 PM, Daniel Henrique Barboza wrote: > > > On 1/27/21 2:59 PM, Michal Privoznik wrote: >> Since we've switched to meson our tests run with a timeout (meson >> uses 30 seconds as the default). However, not every machine that >> builds libvirt is fast enough to run every test under 30 seconds >> (each test binary has its own timeout, but still). For instance >> when building a package for distro on a farm that's under load. >> Or on a generally slow ARM hardware. While each developer can >> tune their command line for building by adding >> --timeout-multiplier=10, this is hard to do for aforementioned >> build farms. >> >> It's time to admit that not everybody has the latest, top shelf >> CPU and increase the timeout. >> >> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> >> --- > > This sure will help these build farms environments, but what about the > cases > where an actual timeout means that there is something wrong with the code? > E.g. commits 46d88d8dba56 and 2ba0b7497ce7 were only possible because I was > seeing tests timing out in Power hosts when the 30 sec timeout was being > enforced. > > A 120 second default timeout for the majority of the test cases is a > long time. > virschematest in this laptop I use takes 2.5 sec to complete. If I do > something > wrong in the code and now the test is now 4 times slower (10 sec) I will > not be > able to detect it (I'll need to start keeping track or something). With 30 second timeout you won't detect that either. Using timeout as an indicator of test failure is wrong IMO. And if I were not lazy and fixed 'check-access' test suite then we would see instantly what tests are accessing paths in the host (=> depend on host configuration). > You'll have to > run the test suit on your RasPi 2B to see that something went wrong > because the > timeout is better tuned to your RasPI than this laptop, but then the > code is already > upstream. So should we make timeouts shorter then? Why is 30 seconds sweet spot? > > And the tests will get more complex and will naturally take longer to > complete. > Eventually this timeout might no be enough. Increase the timeout again? Sure, why not? We adapt to newer gcc/clang/$whatever, why not timeout? Michal
On 1/29/21 10:04 AM, Michal Privoznik wrote: > On 1/29/21 1:30 PM, Daniel Henrique Barboza wrote: >> >> >> On 1/27/21 2:59 PM, Michal Privoznik wrote: >>> Since we've switched to meson our tests run with a timeout (meson >>> uses 30 seconds as the default). However, not every machine that >>> builds libvirt is fast enough to run every test under 30 seconds >>> (each test binary has its own timeout, but still). For instance >>> when building a package for distro on a farm that's under load. >>> Or on a generally slow ARM hardware. While each developer can >>> tune their command line for building by adding >>> --timeout-multiplier=10, this is hard to do for aforementioned >>> build farms. >>> >>> It's time to admit that not everybody has the latest, top shelf >>> CPU and increase the timeout. >>> >>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com> >>> --- >> >> This sure will help these build farms environments, but what about the cases >> where an actual timeout means that there is something wrong with the code? >> E.g. commits 46d88d8dba56 and 2ba0b7497ce7 were only possible because I was >> seeing tests timing out in Power hosts when the 30 sec timeout was being >> enforced. >> >> A 120 second default timeout for the majority of the test cases is a long time. >> virschematest in this laptop I use takes 2.5 sec to complete. If I do something >> wrong in the code and now the test is now 4 times slower (10 sec) I will not be >> able to detect it (I'll need to start keeping track or something). > > With 30 second timeout you won't detect that either. Using timeout as an indicator of test failure is wrong IMO. And if I were not lazy and fixed 'check-access' test suite then we would see instantly what tests are accessing paths in the host (=> depend on host configuration). 30 sec is too long for most tests :) Let's put it this way: I don't think most of us keeps track of how much a certain test takes to complete during our dev, and the 30 sec timeout is a marker to see if we messed up or not. A 120 timeout is too extreme for my dev env (and I believe most if not all of us can run the test suit without any problems during development). All that said, I just ran 'ninja -C build test' and verified that we don't provide a total run time for all tests. We provide the time taken for all 306 tests, but not a total. If we change the script to output the total time taken to run all tests in the test suit, in a successful run (i.e. no failed tests), then I wouldn't mind the timeout increase. I can run the test suit in master, get the total time taken, do some coding, run again, compare the new total. This comparison will give me a hint of whether something went too wrong, then I can compare the numbers myself to see what happened. In this case I wouldn't mind removing all test timeouts or increasing the timeout to help the distros or what have you. Thanks, DHB > >> You'll have to >> run the test suit on your RasPi 2B to see that something went wrong because the >> timeout is better tuned to your RasPI than this laptop, but then the code is already >> upstream. > > So should we make timeouts shorter then? Why is 30 seconds sweet spot? > >> >> And the tests will get more complex and will naturally take longer to complete. >> Eventually this timeout might no be enough. Increase the timeout again? > > Sure, why not? We adapt to newer gcc/clang/$whatever, why not timeout? > > Michal >
On Wed, Jan 27, 2021 at 06:59:58PM +0100, Michal Privoznik wrote: > Since we've switched to meson our tests run with a timeout (meson > uses 30 seconds as the default). However, not every machine that > builds libvirt is fast enough to run every test under 30 seconds > (each test binary has its own timeout, but still). For instance > when building a package for distro on a farm that's under load. > Or on a generally slow ARM hardware. While each developer can > tune their command line for building by adding > --timeout-multiplier=10, this is hard to do for aforementioned > build farms. I don't get why it is hard for build farms. Someone, somwhere is writing the script that invokes meson & ninja with some args. Why is it hard to add --timeout-multiplier=10 too ? > It's time to admit that not everybody has the latest, top shelf > CPU and increase the timeout. I'm not convinced we want to optimize for the slowest hardware we can find, especially when there's an easy option of setting --timeout-multiplier=10. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Fri, 2021-01-29 at 12:48 +0000, Daniel P. Berrangé wrote: > On Wed, Jan 27, 2021 at 06:59:58PM +0100, Michal Privoznik wrote: > > Since we've switched to meson our tests run with a timeout (meson > > uses 30 seconds as the default). However, not every machine that > > builds libvirt is fast enough to run every test under 30 seconds > > (each test binary has its own timeout, but still). For instance > > when building a package for distro on a farm that's under load. > > Or on a generally slow ARM hardware. While each developer can > > tune their command line for building by adding > > --timeout-multiplier=10, this is hard to do for aforementioned > > build farms. > > I don't get why it is hard for build farms. Someone, somwhere > is writing the script that invokes meson & ninja with some > args. Why is it hard to add --timeout-multiplier=10 too ? > > > It's time to admit that not everybody has the latest, top shelf > > CPU and increase the timeout. > > I'm not convinced we want to optimize for the slowest hardware > we can find, especially when there's an easy option of setting > --timeout-multiplier=10. It's not complicated to add the option, but the fact that Debian, SUSE and now Fedora all need to specify a timeout multiplier hints to the fact that perhaps the default timeout is just too small. -- Andrea Bolognani / Red Hat / Virtualization
On Fri, Jan 29, 2021 at 03:46:18PM +0100, Andrea Bolognani wrote: > On Fri, 2021-01-29 at 12:48 +0000, Daniel P. Berrangé wrote: > > On Wed, Jan 27, 2021 at 06:59:58PM +0100, Michal Privoznik wrote: > > > Since we've switched to meson our tests run with a timeout (meson > > > uses 30 seconds as the default). However, not every machine that > > > builds libvirt is fast enough to run every test under 30 seconds > > > (each test binary has its own timeout, but still). For instance > > > when building a package for distro on a farm that's under load. > > > Or on a generally slow ARM hardware. While each developer can > > > tune their command line for building by adding > > > --timeout-multiplier=10, this is hard to do for aforementioned > > > build farms. > > > > I don't get why it is hard for build farms. Someone, somwhere > > is writing the script that invokes meson & ninja with some > > args. Why is it hard to add --timeout-multiplier=10 too ? > > > > > It's time to admit that not everybody has the latest, top shelf > > > CPU and increase the timeout. > > > > I'm not convinced we want to optimize for the slowest hardware > > we can find, especially when there's an easy option of setting > > --timeout-multiplier=10. > > It's not complicated to add the option, but the fact that Debian, > SUSE and now Fedora all need to specify a timeout multiplier hints to > the fact that perhaps the default timeout is just too small. AFAIK, Fedora hasn't set any timeout multiplier in our builds. Some specific tests have an increased timeout, but that's reasonable because they are very big tests, and that's a targetted increase. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 1/29/21 3:53 PM, Daniel P. Berrangé wrote: > On Fri, Jan 29, 2021 at 03:46:18PM +0100, Andrea Bolognani wrote: >> On Fri, 2021-01-29 at 12:48 +0000, Daniel P. Berrangé wrote: >>> On Wed, Jan 27, 2021 at 06:59:58PM +0100, Michal Privoznik wrote: >>>> Since we've switched to meson our tests run with a timeout (meson >>>> uses 30 seconds as the default). However, not every machine that >>>> builds libvirt is fast enough to run every test under 30 seconds >>>> (each test binary has its own timeout, but still). For instance >>>> when building a package for distro on a farm that's under load. >>>> Or on a generally slow ARM hardware. While each developer can >>>> tune their command line for building by adding >>>> --timeout-multiplier=10, this is hard to do for aforementioned >>>> build farms. >>> >>> I don't get why it is hard for build farms. Someone, somwhere >>> is writing the script that invokes meson & ninja with some >>> args. Why is it hard to add --timeout-multiplier=10 too ? >>> >>>> It's time to admit that not everybody has the latest, top shelf >>>> CPU and increase the timeout. >>> >>> I'm not convinced we want to optimize for the slowest hardware >>> we can find, especially when there's an easy option of setting >>> --timeout-multiplier=10. >> >> It's not complicated to add the option, but the fact that Debian, >> SUSE and now Fedora all need to specify a timeout multiplier hints to >> the fact that perhaps the default timeout is just too small. It's an arbitrary number that was chosen outside of libvirt devel community and IIUC can change anytime meson devels please. So effectively the only control we have is this --timeout-multiplier which is still not guaranteed to yield correct results (e.g. if I now use multiplier of value 10 to allow 300s timeout but the default timeout is then changed to 20s, all of a sudfen I have 200s timeout). > > AFAIK, Fedora hasn't set any timeout multiplier in our builds. Only because Cole did not merge it, because I came up with idea for this patch. Here's Cole's original patch: https://www.redhat.com/archives/libvir-list/2021-January/msg00919.html But okay, looks like we don't have an agreement here so I won't push this patch. But Cole should still push his. Michal
On Fri, Jan 29, 2021 at 04:14:30PM +0100, Michal Privoznik wrote: > On 1/29/21 3:53 PM, Daniel P. Berrangé wrote: > > On Fri, Jan 29, 2021 at 03:46:18PM +0100, Andrea Bolognani wrote: > > > On Fri, 2021-01-29 at 12:48 +0000, Daniel P. Berrangé wrote: > > > > On Wed, Jan 27, 2021 at 06:59:58PM +0100, Michal Privoznik wrote: > > > > > Since we've switched to meson our tests run with a timeout (meson > > > > > uses 30 seconds as the default). However, not every machine that > > > > > builds libvirt is fast enough to run every test under 30 seconds > > > > > (each test binary has its own timeout, but still). For instance > > > > > when building a package for distro on a farm that's under load. > > > > > Or on a generally slow ARM hardware. While each developer can > > > > > tune their command line for building by adding > > > > > --timeout-multiplier=10, this is hard to do for aforementioned > > > > > build farms. > > > > > > > > I don't get why it is hard for build farms. Someone, somwhere > > > > is writing the script that invokes meson & ninja with some > > > > args. Why is it hard to add --timeout-multiplier=10 too ? > > > > > > > > > It's time to admit that not everybody has the latest, top shelf > > > > > CPU and increase the timeout. > > > > > > > > I'm not convinced we want to optimize for the slowest hardware > > > > we can find, especially when there's an easy option of setting > > > > --timeout-multiplier=10. > > > > > > It's not complicated to add the option, but the fact that Debian, > > > SUSE and now Fedora all need to specify a timeout multiplier hints to > > > the fact that perhaps the default timeout is just too small. > > It's an arbitrary number that was chosen outside of libvirt devel community > and IIUC can change anytime meson devels please. So effectively the only > control we have is this --timeout-multiplier which is still not guaranteed > to yield correct results (e.g. if I now use multiplier of value 10 to allow > 300s timeout but the default timeout is then changed to 20s, all of a sudfen > I have 200s timeout). > > > > > AFAIK, Fedora hasn't set any timeout multiplier in our builds. > > Only because Cole did not merge it, because I came up with idea for this > patch. Here's Cole's original patch: > > https://www.redhat.com/archives/libvir-list/2021-January/msg00919.html > > But okay, looks like we don't have an agreement here so I won't push this > patch. But Cole should still push his. Oh that's not a real Fedora build - that's using COPR where s390 is provided by QEMU TCG emulation. It is unsuprising that is slow as treacle, and IMHO optimizing for that env is not desirable. IOW, I'm not convinced we should merge Cole's either. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Fri, 2021-01-29 at 14:53 +0000, Daniel P. Berrangé wrote: > On Fri, Jan 29, 2021 at 03:46:18PM +0100, Andrea Bolognani wrote: > > On Fri, 2021-01-29 at 12:48 +0000, Daniel P. Berrangé wrote: > > > I'm not convinced we want to optimize for the slowest hardware > > > we can find, especially when there's an easy option of setting > > > --timeout-multiplier=10. > > > > It's not complicated to add the option, but the fact that Debian, > > SUSE and now Fedora all need to specify a timeout multiplier hints to > > the fact that perhaps the default timeout is just too small. > > AFAIK, Fedora hasn't set any timeout multiplier in our builds. With commit 70307548d1622bfeffd60c86e62098d165c95074 Author: Cole Robinson <crobinso@redhat.com> Date: Thu Jan 21 16:54:45 2021 -0500 spec: Increase meson test timeout 10x Tests time out when building in slow environments, like emulated s390x in Fedora copr. Bump up the test timeout Reviewed-by: Neal Gompa <ngompa13@gmail.com> Reviewed-by: Andrea Bolognani <abologna@redhat.com> Signed-off-by: Cole Robinson <crobinso@redhat.com> the timeout multiplier has been added to the upstream spec file, which the Fedora spec file is based on, so while you're correct that Fedora is not yet using it, it's fair to assume it will soon make its way there for the benefit of e.g. the virt-preview COPR. -- Andrea Bolognani / Red Hat / Virtualization
On Fri, Jan 29, 2021 at 04:20:53PM +0100, Andrea Bolognani wrote: > On Fri, 2021-01-29 at 14:53 +0000, Daniel P. Berrangé wrote: > > On Fri, Jan 29, 2021 at 03:46:18PM +0100, Andrea Bolognani wrote: > > > On Fri, 2021-01-29 at 12:48 +0000, Daniel P. Berrangé wrote: > > > > I'm not convinced we want to optimize for the slowest hardware > > > > we can find, especially when there's an easy option of setting > > > > --timeout-multiplier=10. > > > > > > It's not complicated to add the option, but the fact that Debian, > > > SUSE and now Fedora all need to specify a timeout multiplier hints to > > > the fact that perhaps the default timeout is just too small. > > > > AFAIK, Fedora hasn't set any timeout multiplier in our builds. > > With > > commit 70307548d1622bfeffd60c86e62098d165c95074 > Author: Cole Robinson <crobinso@redhat.com> > Date: Thu Jan 21 16:54:45 2021 -0500 > > spec: Increase meson test timeout 10x > > Tests time out when building in slow environments, like emulated > s390x in Fedora copr. Bump up the test timeout > > Reviewed-by: Neal Gompa <ngompa13@gmail.com> > Reviewed-by: Andrea Bolognani <abologna@redhat.com> > Signed-off-by: Cole Robinson <crobinso@redhat.com> > > the timeout multiplier has been added to the upstream spec file, > which the Fedora spec file is based on, so while you're correct that > Fedora is not yet using it, it's fair to assume it will soon make its > way there for the benefit of e.g. the virt-preview COPR. I think this is the wrong solution. IMHO RPM should be making th %meson_test macro include the timeout arg automatically when running on a emulated environment. This would fix the problem for all users of meson, without causing unecessarily long timeouts for native builds. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Fri, 2021-01-29 at 15:28 +0000, Daniel P. Berrangé wrote: > On Fri, Jan 29, 2021 at 04:20:53PM +0100, Andrea Bolognani wrote: > > the timeout multiplier has been added to the upstream spec file, > > which the Fedora spec file is based on, so while you're correct that > > Fedora is not yet using it, it's fair to assume it will soon make its > > way there for the benefit of e.g. the virt-preview COPR. > > I think this is the wrong solution. IMHO RPM should be making th > %meson_test macro include the timeout arg automatically when > running on a emulated environment. This would fix the problem for > all users of meson, without causing unecessarily long timeouts > for native builds. How likely is it that some broken test will take longer than 30 seconds and less than 5 minutes to run on your average developer's laptop? My experience suggests tests either take way less than half a minute to complete, or go on spinning forever. So yeah, in the unlikely event that your changes have introduced an infinite wait in one of the tests, it will take you a couple more minutes to realize that; however, considering how fast the test suite is under meson you'd probably get suspicious way before actually hitting the timeout, and introducing such an issue is hopefully not a very frequent occurrence anyway. Note that emulated environments are not the only ones hitting this: native builds on slow architectures failing because of the timeout are the reason why I introduced the timeout multiplier in Debian, for example. Finally, not everyone uses RPMs, and even those who do might not want to use "rpmbuild" as a substitute for "meson test" - especially not on the kind of hardware where the default test timeout is a problem! Raising the timeout at the meson level makes it possible for people to just run "meson test" and be reasonably confident it will only fail if there's an actual bug in libvirt. -- Andrea Bolognani / Red Hat / Virtualization
On Fri, Jan 29, 2021 at 04:56:07PM +0100, Andrea Bolognani wrote: > On Fri, 2021-01-29 at 15:28 +0000, Daniel P. Berrangé wrote: > > On Fri, Jan 29, 2021 at 04:20:53PM +0100, Andrea Bolognani wrote: > > > the timeout multiplier has been added to the upstream spec file, > > > which the Fedora spec file is based on, so while you're correct that > > > Fedora is not yet using it, it's fair to assume it will soon make its > > > way there for the benefit of e.g. the virt-preview COPR. > > > > I think this is the wrong solution. IMHO RPM should be making th > > %meson_test macro include the timeout arg automatically when > > running on a emulated environment. This would fix the problem for > > all users of meson, without causing unecessarily long timeouts > > for native builds. > > How likely is it that some broken test will take longer than 30 > seconds and less than 5 minutes to run on your average developer's > laptop? My experience suggests tests either take way less than half a > minute to complete, or go on spinning forever. > > So yeah, in the unlikely event that your changes have introduced an > infinite wait in one of the tests, it will take you a couple more > minutes to realize that; however, considering how fast the test suite > is under meson you'd probably get suspicious way before actually > hitting the timeout, and introducing such an issue is hopefully not a > very frequent occurrence anyway. > > Note that emulated environments are not the only ones hitting this: > native builds on slow architectures failing because of the timeout > are the reason why I introduced the timeout multiplier in Debian, for > example. > > Finally, not everyone uses RPMs, and even those who do might not want > to use "rpmbuild" as a substitute for "meson test" - especially not > on the kind of hardware where the default test timeout is a problem! > > Raising the timeout at the meson level makes it possible for people > to just run "meson test" and be reasonably confident it will only > fail if there's an actual bug in libvirt. If it is such a broad problem then the default should be raised in meson itself rather than individual apps using meson. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
© 2016 - 2024 Red Hat, Inc.