Based on many runs, the average run time for these 4 tests is around
250 seconds, with 320 seconds being the ceiling. In any case, the
default 120 seconds timeout is inappropriate in my experience.

Let's increase the timeout so these tests get a chance to complete.

Signed-off-by: Cleber Rosa <crosa@redhat.com>
---
tests/avocado/intel_iommu.py | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tests/avocado/intel_iommu.py b/tests/avocado/intel_iommu.py
index f04ee1cf9d..24bfad0756 100644
--- a/tests/avocado/intel_iommu.py
+++ b/tests/avocado/intel_iommu.py
@@ -25,6 +25,8 @@ class IntelIOMMU(LinuxTest):
     :avocado: tags=flaky
     """
 
+    timeout = 360
+
     IOMMU_ADDON = ',iommu_platform=on,disable-modern=off,disable-legacy=on'
     kernel_path = None
     initrd_path = None
--
2.43.0
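
For readers less familiar with how such an override takes effect: the patch
relies on a class attribute shadowing a default inherited from a base class.
The sketch below mimics that lookup with plain Python classes. BaseTest,
IntelIOMMULike, OtherTest and effective_timeout are hypothetical stand-ins,
not the real Avocado or avocado_qemu code, and the 120 second default is
simply the figure quoted in the commit message.

```python
# Hypothetical stand-ins: these classes only mimic how a class-level
# "timeout" attribute shadows a default inherited from a base class.
class BaseTest:
    # Default taken from the commit message (120 seconds).
    timeout = 120


class IntelIOMMULike(BaseTest):
    # Same override as in the patch; it applies to every test in the class.
    timeout = 360


class OtherTest(BaseTest):
    # No override, so the inherited default is used.
    pass


def effective_timeout(test_cls):
    """Return the timeout a runner would read from this test class."""
    return getattr(test_cls, "timeout", None)


for cls in (IntelIOMMULike, OtherTest):
    print(cls.__name__, effective_timeout(cls))
```

Because the attribute lives on the class, all four test methods of the class
would share the same limit in this model.
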
Cleber Rosa <crosa@redhat.com> writes:

> Based on many runs, the average run time for these 4 tests is around
> 250 seconds, with 320 seconds being the ceiling. In any case, the
> default 120 seconds timeout is inappropriate in my experience.

I would rather see these tests updated to fix:

- Don't use such an old Fedora 31 image
- Avoid updating image packages (when will RH stop serving them?)
- The "test" is a fairly basic check of dmesg/sysfs output

I think building a buildroot image with the tools pre-installed (with
perhaps more testing) would be a better use of our limited test time.

FWIW the runtime on my machine is:

➜ env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
JOB ID : 5c582ccf274f3aee279c2208f969a7af8ceb9943
JOB LOG : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
(1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
(2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
(3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
(4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
RESULTS : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME : 255.43 s

--
Alex Bennée
Virtualisation Tech Lead @ Linaro

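
The numbers in the job output above can be checked against both timeout
values with a few lines of arithmetic. This is only a sketch: it assumes the
timeout is enforced per test rather than per job, and the constant names are
made up for the illustration.

```python
# Per-test runtimes in seconds, copied from the job output above.
runtimes = {
    "test_intel_iommu": 44.21,
    "test_intel_iommu_strict": 78.60,
    "test_intel_iommu_strict_cm": 65.57,
    "test_intel_iommu_pt": 66.63,
}

DEFAULT_TIMEOUT = 120   # default quoted in the commit message
PROPOSED_TIMEOUT = 360  # value set by the patch

total = round(sum(runtimes.values()), 2)
slowest = max(runtimes, key=runtimes.get)

# Assumption: the timeout is enforced per test, so each runtime is
# compared on its own rather than against the job total.
over_default = [n for n, secs in runtimes.items() if secs > DEFAULT_TIMEOUT]
over_proposed = [n for n, secs in runtimes.items() if secs > PROPOSED_TIMEOUT]

print(f"job total: {total} s, slowest: {slowest} ({runtimes[slowest]} s)")
print("over default timeout:", over_default)
print("over proposed timeout:", over_proposed)
```

On this particular run no single test exceeds 120 seconds, while the sum of
the four tests is close to the reported JOB TIME, so the 250 second figure in
the commit message may describe the whole job rather than one test.
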
Alex Bennée <alex.bennee@linaro.org> writes:

> Cleber Rosa <crosa@redhat.com> writes:
>
>> Based on many runs, the average run time for these 4 tests is around
>> 250 seconds, with 320 seconds being the ceiling. In any case, the
>> default 120 seconds timeout is inappropriate in my experience.
>
> I would rather see these tests updated to fix:
>
> - Don't use such an old Fedora 31 image

I remember proposing a bump in Fedora version used by default in
avocado_qemu.LinuxTest (which would propagate to tests such as
boot_linux.py and others), but that was not well accepted. I can
definitely work on such a version bump again.

> - Avoid updating image packages (when will RH stop serving them?)

IIUC the only reason for updating the packages is to test the network
from the guest, and could/should be done another way.

Eric, could you confirm this?

> - The "test" is a fairly basic check of dmesg/sysfs output

Maybe the network is also an implicit check here. Let's see what Eric
has to say.

>
> I think building a buildroot image with the tools pre-installed (with
> perhaps more testing) would be a better use of our limited test time.
>
> FWIW the runtime on my machine is:
>
> ➜ env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
> JOB ID : 5c582ccf274f3aee279c2208f969a7af8ceb9943
> JOB LOG : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
> (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
> (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
> (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
> (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
> RESULTS : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
> JOB TIME : 255.43 s
>

Yes, I've also seen similar runtimes in other environments... so it
looks like it depends a lot on the "dnf -y install numactl-devel". If
that can be removed, the tests would have much more predictable runtimes.

Hi Cleber,

On 12/13/23 21:08, Cleber Rosa wrote:
> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> Cleber Rosa <crosa@redhat.com> writes:
>>
>>> Based on many runs, the average run time for these 4 tests is around
>>> 250 seconds, with 320 seconds being the ceiling. In any case, the
>>> default 120 seconds timeout is inappropriate in my experience.
>> I would rather see these tests updated to fix:
>>
>> - Don't use such an old Fedora 31 image
> I remember proposing a bump in Fedora version used by default in
> avocado_qemu.LinuxTest (which would propagate to tests such as
> boot_linux.py and others), but that was not well accepted. I can
> definitely work on such a version bump again.
>
>> - Avoid updating image packages (when will RH stop serving them?)
> IIUC the only reason for updating the packages is to test the network
> from the guest, and could/should be done another way.
>
> Eric, could you confirm this?

Sorry for the delay. Yes effectively I used the dnf install to stress
the viommu. In the past I was able to trigger viommu bugs that way
whereas getting an IP @ for the guest was just successful.

>
>> - The "test" is a fairly basic check of dmesg/sysfs output
> Maybe the network is also an implicit check here. Let's see what Eric
> has to say.

To be honest I do not remember how avocado does the check in itself; my
guess is that if the dnf install does not complete you get a timeout and
the test fails. But you may be more knowledgeable on this than me ;-)

Thanks

Eric

>
>> I think building a buildroot image with the tools pre-installed (with
>> perhaps more testing) would be a better use of our limited test time.
>>
>> FWIW the runtime on my machine is:
>>
>> ➜ env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
>> JOB ID : 5c582ccf274f3aee279c2208f969a7af8ceb9943
>> JOB LOG : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
>> (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
>> (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
>> (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
>> (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
>> RESULTS : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>> JOB TIME : 255.43 s
>>
> Yes, I've also seen similar runtimes in other environments... so it
> looks like it depends a lot on the "dnf -y install numactl-devel". If
> that can be removed, the tests would have much more predictable runtimes.
>

Eric Auger <eric.auger@redhat.com> writes:

> Hi Cleber,
>
> On 12/13/23 21:08, Cleber Rosa wrote:
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>>> Cleber Rosa <crosa@redhat.com> writes:
>>>
>>>> Based on many runs, the average run time for these 4 tests is around
>>>> 250 seconds, with 320 seconds being the ceiling. In any case, the
>>>> default 120 seconds timeout is inappropriate in my experience.
>>> I would rather see these tests updated to fix:
>>>
>>> - Don't use such an old Fedora 31 image
>> I remember proposing a bump in Fedora version used by default in
>> avocado_qemu.LinuxTest (which would propagate to tests such as
>> boot_linux.py and others), but that was not well accepted. I can
>> definitely work on such a version bump again.
>>
>>> - Avoid updating image packages (when will RH stop serving them?)
>> IIUC the only reason for updating the packages is to test the network
>> from the guest, and could/should be done another way.
>>
>> Eric, could you confirm this?
> Sorry for the delay. Yes effectively I used the dnf install to stress
> the viommu. In the past I was able to trigger viommu bugs that way
> whereas getting an IP @ for the guest was just successful.
>>
>>> - The "test" is a fairly basic check of dmesg/sysfs output
>> Maybe the network is also an implicit check here. Let's see what Eric
>> has to say.
>
> To be honest I do not remember how avocado does the check in itself; my
> guess is that if the dnf install does not complete you get a timeout and
> the test fails. But you may be more knowledgeable on this than me ;-)

I guess the problem is relying on external infrastructure can lead to
unpredictable results. However it's a lot easier to configure user mode
networking just to pull something off the internet than have a local
netperf or some such setup to generate local traffic.

I guess there is no loopback like setup which would sufficiently
exercise the code?

>
> Thanks
>
> Eric
>>
>>> I think building a buildroot image with the tools pre-installed (with
>>> perhaps more testing) would be a better use of our limited test time.
>>>
>>> FWIW the runtime on my machine is:
>>>
>>> ➜ env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
>>> JOB ID : 5c582ccf274f3aee279c2208f969a7af8ceb9943
>>> JOB LOG : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
>>> (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
>>> (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
>>> (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
>>> (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
>>> RESULTS : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>>> JOB TIME : 255.43 s
>>>
>> Yes, I've also seen similar runtimes in other environments... so it
>> looks like it depends a lot on the "dnf -y install numactl-devel". If
>> that can be removed, the tests would have much more predictable runtimes.
>>

--
Alex Bennée
Virtualisation Tech Lead @ Linaro

On 12/14/23 10:41, Alex Bennée wrote:
> Eric Auger <eric.auger@redhat.com> writes:
>
>> Hi Cleber,
>>
>> On 12/13/23 21:08, Cleber Rosa wrote:
>>> Alex Bennée <alex.bennee@linaro.org> writes:
>>>
>>>> Cleber Rosa <crosa@redhat.com> writes:
>>>>
>>>>> Based on many runs, the average run time for these 4 tests is around
>>>>> 250 seconds, with 320 seconds being the ceiling. In any case, the
>>>>> default 120 seconds timeout is inappropriate in my experience.
>>>> I would rather see these tests updated to fix:
>>>>
>>>> - Don't use such an old Fedora 31 image
>>> I remember proposing a bump in Fedora version used by default in
>>> avocado_qemu.LinuxTest (which would propagate to tests such as
>>> boot_linux.py and others), but that was not well accepted. I can
>>> definitely work on such a version bump again.
>>>
>>>> - Avoid updating image packages (when will RH stop serving them?)
>>> IIUC the only reason for updating the packages is to test the network
>>> from the guest, and could/should be done another way.
>>>
>>> Eric, could you confirm this?
>> Sorry for the delay. Yes effectively I used the dnf install to stress
>> the viommu. In the past I was able to trigger viommu bugs that way
>> whereas getting an IP @ for the guest was just successful.
>>>> - The "test" is a fairly basic check of dmesg/sysfs output
>>> Maybe the network is also an implicit check here. Let's see what Eric
>>> has to say.
>> To be honest I do not remember how avocado does the check in itself; my
>> guess is that if the dnf install does not complete you get a timeout and
>> the test fails. But you may be more knowledgeable on this than me ;-)
> I guess the problem is relying on external infrastructure can lead to
> unpredictable results. However it's a lot easier to configure user mode
> networking just to pull something off the internet than have a local
> netperf or some such setup to generate local traffic.
>
> I guess there is no loopback like setup which would sufficiently
> exercise the code?

I don't think so. This test is a reproducer for a bug I encountered and
fixed in the past.

Besides, I am totally fine with moving the test out of the gating CI and
just keeping it as a tier2 test, as suggested by Phil.

Thanks

Eric

>
>> Thanks
>>
>> Eric
>>>> I think building a buildroot image with the tools pre-installed (with
>>>> perhaps more testing) would be a better use of our limited test time.
>>>>
>>>> FWIW the runtime on my machine is:
>>>>
>>>> ➜ env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
>>>> JOB ID : 5c582ccf274f3aee279c2208f969a7af8ceb9943
>>>> JOB LOG : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
>>>> (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
>>>> (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
>>>> (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
>>>> (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
>>>> RESULTS : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>>>> JOB TIME : 255.43 s
>>>>
>>> Yes, I've also seen similar runtimes in other environments... so it
>>> looks like it depends a lot on the "dnf -y install numactl-devel". If
>>> that can be removed, the tests would have much more predictable runtimes.
>>>

On 14/12/23 08:24, Eric Auger wrote:
> Hi Cleber,
>
> On 12/13/23 21:08, Cleber Rosa wrote:
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>>> Cleber Rosa <crosa@redhat.com> writes:
>>>
>>>> Based on many runs, the average run time for these 4 tests is around
>>>> 250 seconds, with 320 seconds being the ceiling. In any case, the
>>>> default 120 seconds timeout is inappropriate in my experience.
>>> I would rather see these tests updated to fix:
>>>
>>> - Don't use such an old Fedora 31 image
>> I remember proposing a bump in Fedora version used by default in
>> avocado_qemu.LinuxTest (which would propagate to tests such as
>> boot_linux.py and others), but that was not well accepted. I can
>> definitely work on such a version bump again.
>>
>>> - Avoid updating image packages (when will RH stop serving them?)
>> IIUC the only reason for updating the packages is to test the network
>> from the guest, and could/should be done another way.
>>
>> Eric, could you confirm this?
> Sorry for the delay. Yes effectively I used the dnf install to stress
> the viommu. In the past I was able to trigger viommu bugs that way
> whereas getting an IP @ for the guest was just successful.

Maybe this test is useful as what Daniel described as "Tier 2" [*],
that maintainers run locally but don't need to be gating CI? That
would save us some resources there.

[*] https://lore.kernel.org/qemu-devel/20200427152036.GI1244803@redhat.com/

>>
>>> - The "test" is a fairly basic check of dmesg/sysfs output
>> Maybe the network is also an implicit check here. Let's see what Eric
>> has to say.
>
> To be honest I do not remember how avocado does the check in itself; my
> guess is that if the dnf install does not complete you get a timeout and
> the test fails. But you may be more knowledgeable on this than me ;-)
>
> Thanks
>
> Eric

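
One possible way to keep a test runnable locally while leaving it out of
gating runs is an opt-in environment variable, similar in spirit to the
QEMU_TEST_FLAKY_TESTS=1 setting visible in the job output earlier in the
thread. The sketch below uses the standard library's unittest rather than
Avocado; flaky_tests_enabled and Tier2Example are hypothetical names, and
this is not a description of how QEMU's CI actually selects tests.

```python
import os
import unittest


def flaky_tests_enabled(environ=os.environ):
    """Return True when the opt-in variable is set to a non-empty value."""
    return bool(environ.get("QEMU_TEST_FLAKY_TESTS"))


@unittest.skipUnless(flaky_tests_enabled(),
                     "set QEMU_TEST_FLAKY_TESTS=1 to run this test")
class Tier2Example(unittest.TestCase):
    """Hypothetical test class that only runs when explicitly requested."""

    def test_placeholder(self):
        # Stand-in body; a real test would boot the guest here.
        self.assertTrue(True)
```

Saved to a file, this can be run with "python -m unittest" with and without
the variable set to see the test skipped or executed.
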
On 2023/12/12 2:01, Alex Bennée wrote:
> Cleber Rosa <crosa@redhat.com> writes:
>
>> Based on many runs, the average run time for these 4 tests is around
>> 250 seconds, with 320 seconds being the ceiling. In any case, the
>> default 120 seconds timeout is inappropriate in my experience.
>
> I would rather see these tests updated to fix:
>
> - Don't use such an old Fedora 31 image
> - Avoid updating image packages (when will RH stop serving them?)
> - The "test" is a fairly basic check of dmesg/sysfs output
>
> I think building a buildroot image with the tools pre-installed (with
> perhaps more testing) would be a better use of our limited test time.

That's what tests/avocado/netdev-ethtool.py does, but I don't like it
much because building a buildroot image takes a long time and results in
a somewhat big binary blob.

I would rather have a script that runs mkosi[1] to make an image; it
downloads packages from the distributor, so it will take much less time
than using buildroot. The CI system can run the script and cache the
image.

[1] https://github.com/systemd/mkosi

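
The "run the script and cache the image" idea can be sketched as a cache
keyed on the image configuration, so the builder only runs when the
configuration changes. Everything here is an assumption made for
illustration: cached_image, fake_build and the configuration string are
hypothetical, the string is not mkosi syntax, and the real build step (for
example, invoking an image builder) is left to a caller-supplied function.

```python
import hashlib
import tempfile
from pathlib import Path


def cached_image(config_text, cache_dir, build):
    """Return the image path for config_text, building it at most once.

    build is a caller-supplied callable (hypothetical) that writes the
    image to the path it is given.
    """
    # Key the cache on the configuration so any change forces a rebuild.
    key = hashlib.sha256(config_text.encode("utf-8")).hexdigest()[:16]
    image = Path(cache_dir) / f"guest-{key}.img"
    if not image.exists():
        image.parent.mkdir(parents=True, exist_ok=True)
        build(image)
    return image


# Demonstration with a fake builder that just records its calls.
calls = []


def fake_build(path):
    calls.append(path)
    path.write_bytes(b"placeholder image")


with tempfile.TemporaryDirectory() as tmp:
    config = "placeholder configuration"  # not real mkosi syntax
    first = cached_image(config, tmp, fake_build)
    second = cached_image(config, tmp, fake_build)
    rebuilt = len(calls)

print("builder invocations:", rebuilt)
```

The second call finds the file produced by the first and skips the build,
which is the behaviour a CI cache would be expected to provide.
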
Akihiko Odaki <akihiko.odaki@daynix.com> writes:

> On 2023/12/12 2:01, Alex Bennée wrote:
>> Cleber Rosa <crosa@redhat.com> writes:
>>
>>> Based on many runs, the average run time for these 4 tests is around
>>> 250 seconds, with 320 seconds being the ceiling. In any case, the
>>> default 120 seconds timeout is inappropriate in my experience.
>> I would rather see these tests updated to fix:
>> - Don't use such an old Fedora 31 image
>> - Avoid updating image packages (when will RH stop serving them?)
>> - The "test" is a fairly basic check of dmesg/sysfs output
>> I think building a buildroot image with the tools pre-installed (with
>> perhaps more testing) would be a better use of our limited test time.
>
> That's what tests/avocado/netdev-ethtool.py does, but I don't like it
> much because building a buildroot image takes a long time and results in
> a somewhat big binary blob.
>
> I would rather have a script that runs mkosi[1] to make an image; it
> downloads packages from the distributor, so it will take much less time
> than using buildroot. The CI system can run the script and cache the
> image.

I'm all for smaller, more directed test cases and I'm less worried about
exactly how things are built. I only use buildroot personally because
I'm familiar with it and it makes it easy to build testcases for
multiple architectures.

> [1] https://github.com/systemd/mkosi

If that works for you go for it ;-)

--
Alex Bennée
Virtualisation Tech Lead @ Linaro