[PATCH] tests: increase timeout per instance of bios-tables-test

Igor Mammedov posted 1 patch 1 month, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20240716125930.620861-1-imammedo@redhat.com
Maintainers: Thomas Huth <thuth@redhat.com>, Laurent Vivier <lvivier@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
tests/qtest/meson.build | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] tests: increase timeout per instance of bios-tables-test
Posted by Igor Mammedov 1 month, 3 weeks ago
CI often fails 'cross-i686-tci' job due to runner slowness
Log shows that test almost complete, with a few remaining
when bios-tables-test timeout hits:

  19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
    TIMEOUT        610.02s   killed by signal 15 SIGTERM
  ...
  stderr:
  TAP parsing error: Too few tests run (expected 8, got 7)

At the same time overall job running time is only ~30 out of 1hr allowed.

Increase bios-tables-test instance timeout on 5min as a fix
for slow CI runners.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
 tests/qtest/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index 6508bfb1a2..ff9200f882 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -1,6 +1,6 @@
 slow_qtests = {
   'aspeed_smc-test': 360,
-  'bios-tables-test' : 610,
+  'bios-tables-test' : 910,
   'cdrom-test' : 610,
   'device-introspect-test' : 720,
   'migration-test' : 480,
-- 
2.43.0
Re: [PATCH] tests: increase timeout per instance of bios-tables-test
Posted by Michael S. Tsirkin 1 month, 3 weeks ago
On Tue, Jul 16, 2024 at 02:59:30PM +0200, Igor Mammedov wrote:
> CI often fails 'cross-i686-tci' job due to runner slowness
> Log shows that test almost complete, with a few remaining
> when bios-tables-test timeout hits:
> 
>   19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
>     TIMEOUT        610.02s   killed by signal 15 SIGTERM
>   ...
>   stderr:
>   TAP parsing error: Too few tests run (expected 8, got 7)
> 
> At the same time overall job running time is only ~30 out of 1hr allowed.
> 
> Increase bios-tables-test instance timeout on 5min as a fix
> for slow CI runners.
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>

We can't just keep increasing the timeout.
The issue is checking wall time on a busy host,
isn't it? Let's check CPU time instead.

> ---
>  tests/qtest/meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> index 6508bfb1a2..ff9200f882 100644
> --- a/tests/qtest/meson.build
> +++ b/tests/qtest/meson.build
> @@ -1,6 +1,6 @@
>  slow_qtests = {
>    'aspeed_smc-test': 360,
> -  'bios-tables-test' : 610,
> +  'bios-tables-test' : 910,
>    'cdrom-test' : 610,
>    'device-introspect-test' : 720,
>    'migration-test' : 480,
> -- 
> 2.43.0
Re: [PATCH] tests: increase timeout per instance of bios-tables-test
Posted by Thomas Huth 1 month, 3 weeks ago
On 16/07/2024 15.06, Michael S. Tsirkin wrote:
> On Tue, Jul 16, 2024 at 02:59:30PM +0200, Igor Mammedov wrote:
>> CI often fails 'cross-i686-tci' job due to runner slowness
>> Log shows that test almost complete, with a few remaining
>> when bios-tables-test timeout hits:
>>
>>    19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
>>      TIMEOUT        610.02s   killed by signal 15 SIGTERM
>>    ...
>>    stderr:
>>    TAP parsing error: Too few tests run (expected 8, got 7)
>>
>> At the same time overall job running time is only ~30 out of 1hr allowed.
>>
>> Increase bios-tables-test instance timeout on 5min as a fix
>> for slow CI runners.
>>
>> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> 
> We can't just keep increasing the timeout.
> The issue is checking wall time on a busy host,
> isn't it? Let's check CPU time instead.

The timeout setting comes from meson, not sure whether you can switch that 
easily to use CPU time instead of wall time?

Anyway, if the bios-tables-test is getting more and more complex, it's maybe 
not such a good idea to run it in a job that is using TCI ... Maybe it's 
best to remove aarch64-softmmu from the cross-i686-tci job?

  Thomas
Re: [PATCH] tests: increase timeout per instance of bios-tables-test
Posted by Peter Maydell 1 month, 3 weeks ago
On Tue, 16 Jul 2024 at 18:45, Thomas Huth <thuth@redhat.com> wrote:
>
> On 16/07/2024 15.06, Michael S. Tsirkin wrote:
> > On Tue, Jul 16, 2024 at 02:59:30PM +0200, Igor Mammedov wrote:
> >> CI often fails 'cross-i686-tci' job due to runner slowness
> >> Log shows that test almost complete, with a few remaining
> >> when bios-tables-test timeout hits:
> >>
> >>    19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
> >>      TIMEOUT        610.02s   killed by signal 15 SIGTERM
> >>    ...
> >>    stderr:
> >>    TAP parsing error: Too few tests run (expected 8, got 7)
> >>
> >> At the same time overall job running time is only ~30 out of 1hr allowed.
> >>
> >> Increase bios-tables-test instance timeout on 5min as a fix
> >> for slow CI runners.
> >>
> >> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> >
> > We can't just keep increasing the timeout.
> > The issue is checking wall time on a busy host,
> > isn't it? Let's check CPU time instead.
>
> The timeout setting comes from meson, not sure whether you can switch that
> easily to use CPU time instead of wall time?
>
> Anyway, if the bios-tables-test is getting more and more complex, it's maybe
> not such a good idea to run it in a job that is using TCI ... Maybe it's
> best to remove aarch64-softmmu from the cross-i686-tci job?

It's one of the few tests that actually runs code in the guest:
we definitely shouldn't reduce the coverage of the actual TCI
part of the TCI job, I think.

I continue to think we need to find out why this CI job is
perpetually flaky and fix the underlying cause, not simply
increase timeouts or drop test cases or configs from it.

-- PMM
Re: [PATCH] tests: increase timeout per instance of bios-tables-test
Posted by Igor Mammedov 1 month, 3 weeks ago
On Tue, 16 Jul 2024 09:06:59 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Jul 16, 2024 at 02:59:30PM +0200, Igor Mammedov wrote:
> > CI often fails 'cross-i686-tci' job due to runner slowness
> > Log shows that test almost complete, with a few remaining
> > when bios-tables-test timeout hits:
> > 
> >   19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
> >     TIMEOUT        610.02s   killed by signal 15 SIGTERM
> >   ...
> >   stderr:
> >   TAP parsing error: Too few tests run (expected 8, got 7)
> > 
> > At the same time overall job running time is only ~30 out of 1hr allowed.
> > 
> > Increase bios-tables-test instance timeout on 5min as a fix
> > for slow CI runners.
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>  
> 
> We can't just keep increasing the timeout.
in this case I'm following precedent
https://gitlab.com/qemu-project/qemu/-/commit/a1f5a47b60d119859d974bed4d66db745448aac6
I'm not saying that the right approach (though seems to work for now)

> The issue is checking wall time on a busy host,
> isn't it? Let's check CPU time instead.
It likely won't help as we still racing with wallclock
overall job timeout (which sometimes triggers failure too,
I guess it depends on stars alignment and load on the host).

Anyways, I don't have know-how when it comes to meson,
to do more than this patch.

with this patch 'cross-i686-tci' job passes for me,
but we have msys2-64bit failing atm due timeouts as well
(seems to be limited to sparc tests)

> 
> > ---
> >  tests/qtest/meson.build | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
> > index 6508bfb1a2..ff9200f882 100644
> > --- a/tests/qtest/meson.build
> > +++ b/tests/qtest/meson.build
> > @@ -1,6 +1,6 @@
> >  slow_qtests = {
> >    'aspeed_smc-test': 360,
> > -  'bios-tables-test' : 610,
> > +  'bios-tables-test' : 910,
> >    'cdrom-test' : 610,
> >    'device-introspect-test' : 720,
> >    'migration-test' : 480,
> > -- 
> > 2.43.0  
>
Re: [PATCH] tests: increase timeout per instance of bios-tables-test
Posted by Peter Maydell 1 month, 3 weeks ago
On Tue, 16 Jul 2024 at 14:07, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jul 16, 2024 at 02:59:30PM +0200, Igor Mammedov wrote:
> > CI often fails 'cross-i686-tci' job due to runner slowness
> > Log shows that test almost complete, with a few remaining
> > when bios-tables-test timeout hits:
> >
> >   19/270 qemu:qtest+qtest-aarch64 / qtest-aarch64/bios-tables-test
> >     TIMEOUT        610.02s   killed by signal 15 SIGTERM
> >   ...
> >   stderr:
> >   TAP parsing error: Too few tests run (expected 8, got 7)
> >
> > At the same time overall job running time is only ~30 out of 1hr allowed.
> >
> > Increase bios-tables-test instance timeout on 5min as a fix
> > for slow CI runners.
> >
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
>
> We can't just keep increasing the timeout.
> The issue is checking wall time on a busy host,
> isn't it? Let's check CPU time instead.

The last time this came up Stefan had a look and reckoned
that the problem was that we try to run the qtests in
parallel when the CI host is only giving us one CPU:

https://lore.kernel.org/qemu-devel/9692cfcb-ef59-4cec-8452-8bfb859e8a6c@weilnetz.de/

thanks
-- PMM