The migration-test is a long-running test whose subtests all launch
at least two QEMU processes. This means that if for example the host
has 4 CPUs then 'make check' defaults to a parallelism of 5, and if
we launch 5 migration-tests in parallel then we will be running 10
QEMU instances on a 4 CPU system. If the system is not very fast
then the test can spuriously time out because the different tests are
all stealing CPU from each other. This seems to particularly be a
problem on our S390 CI job and the cross-i686-tci CI job.
Force meson to run migration-test non-parallel, so there is never any
other test running at the same time as it. This will slow down
overall test execution time somewhat, but hopefully make our CI less
flaky.
The downside is that because each migration-test instance runs for
between 2 and 5 minutes and we run it for five architectures this
significantly increases the runtime. For an all-architectures build
on my local machine 'make check -j8' goes from
real 8m19.127s
user 31m47.534s
sys 19m42.650s
to
real 20m31.218s
user 32m48.712s
sys 19m52.133s
more than doubling the wallclock runtime.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
Also, looking at these figures we spend a *lot* of our overall
'make check' time on migration-test. Do we really need to do
that much for every architecture?
It's unfortunate that meson doesn't let us say "parallel is
OK, but not very parallel". One other approach would be
to have mtest2make say "run tests at half the parallelism that
-jN suggests, rather than at that parallelism", I guess...
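That half-parallelism idea might look roughly like this at the shell level. This is a sketch only: halve_parallelism and the way it would be wired into mtest2make are hypothetical, not existing QEMU code; the only real detail assumed is that 'meson test' accepts a --num-processes option.

```shell
# Sketch of "run tests at half the -jN parallelism": derive the test
# job count from the make job count instead of using it directly.
halve_parallelism() {
    local jobs=$1
    local half=$(( jobs / 2 ))
    # Never drop below one test process.
    if [ "$half" -lt 1 ]; then
        half=1
    fi
    echo "$half"
}

# e.g. 'make check -j8' would end up invoking something like
#   meson test --num-processes 4 ...
halve_parallelism 8   # prints 4
halve_parallelism 1   # prints 1
```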
---
tests/qtest/meson.build | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index fc852f3d8ba..dbf2b8e2be1 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -17,6 +17,21 @@ slow_qtests = {
'vmgenid-test': 610,
}
+# Tests which override the default of "can run in parallel".
+# Don't use this to work around test bugs which prevent parallelism.
+# Do document why we need to make a particular test serialized.
+# Do be sparing with use of this: tests listed here will not be
+# run in parallel with any other test, not merely not with other
+# instances of themselves.
+#
+# The migration-test's subtests will each kick off two QEMU processes,
+# so allowing multiple migration-tests in parallel can overload the
+# host system and result in intermittent timeouts. So we only want to
+# run one migration-test at once.
+qtests_parallelism = {
+ 'migration-test': false,
+}
+
qtests_generic = [
'cdrom-test',
'device-introspect-test',
@@ -411,6 +426,7 @@ foreach dir : target_dirs
protocol: 'tap',
timeout: slow_qtests.get(test, 60),
priority: slow_qtests.get(test, 60),
+ is_parallel: qtests_parallelism.get(test, true),
suite: ['qtest', 'qtest-' + target_base])
endforeach
endforeach
--
2.34.1
Peter Maydell <peter.maydell@linaro.org> writes:

> The migration-test is a long-running test whose subtests all launch
> at least two QEMU processes. This means that if for example the host
> has 4 CPUs then 'make check' defaults to a parallelism of 5, and if
> we launch 5 migration-tests in parallel then we will be running 10
> QEMU instances on a 4 CPU system. If the system is not very fast
> then the test can spuriously time out because the different tests are
> all stealing CPU from each other. This seems to particularly be a
> problem on our S390 CI job and the cross-i686-tci CI job.
>
> Force meson to run migration-test non-parallel, so there is never any
> other test running at the same time as it. This will slow down
> overall test execution time somewhat, but hopefully make our CI less
> flaky.
>
> The downside is that because each migration-test instance runs for
> between 2 and 5 minutes and we run it for five architectures this
> significantly increases the runtime. For an all-architectures build
> on my local machine 'make check -j8' goes from
>
> real 8m19.127s
> user 31m47.534s
> sys 19m42.650s
>
> to
>
> real 20m31.218s
> user 32m48.712s
> sys 19m52.133s
>
> more than doubling the wallclock runtime.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> Also, looking at these figures we spend a *lot* of our overall
> 'make check' time on migration-test. Do we really need to do
> that much for every architecture?

I guess one question is: are we getting value from all the extra
migration tests? There certainly seem to be some sub-tests that are
slower than the others, and I assume each is testing only a small delta
on the tests before it.

On s390x it seems the native test runs in pretty much the same time as
the other TCG guests. Do we exercise any extra migration code by running
tests for every architecture, as opposed to one KVM/native hypervisor
and one TCG one?

--
Alex Bennée
Virtualisation Tech Lead @ Linaro
On Mon, 9 Sept 2024 at 16:23, Alex Bennée <alex.bennee@linaro.org> wrote:
> I guess one question is: are we getting value from all the extra
> migration tests? There certainly seem to be some sub-tests that are
> slower than the others, and I assume each is testing only a small delta
> on the tests before it.
>
> On s390x it seems the native test runs in pretty much the same time as
> the other TCG guests. Do we exercise any extra migration code by running
> tests for every architecture, as opposed to one KVM/native hypervisor
> and one TCG one?

s390 is an interesting one because Christian pointed out that although
it has "KVM" support, we're actually running on a VM under z/VM, and so
when we run a CI test under "-accel KVM" that's actually nested-KVM and
its effects on the host CPU's TLB could be such that it's actually worse
than using TCG...

-- PMM