[PATCH 03/10] tests/avocado/intel_iommu.py: increase timeout

Cleber Rosa posted 10 patches 11 months, 3 weeks ago
Maintainers: Cleber Rosa <crosa@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Wainer dos Santos Moschetta <wainersm@redhat.com>, Beraldo Leal <bleal@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, David Woodhouse <dwmw2@infradead.org>, Paul Durrant <paul@xen.org>, Paolo Bonzini <pbonzini@redhat.com>, Radoslaw Biernacki <rad@semihalf.com>, Peter Maydell <peter.maydell@linaro.org>, Leif Lindholm <quic_llindhol@quicinc.com>, Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>, Akihiko Odaki <akihiko.odaki@daynix.com>, Sriram Yagnaraman <sriram.yagnaraman@est.tech>, Jiaxun Yang <jiaxun.yang@flygoat.com>
[PATCH 03/10] tests/avocado/intel_iommu.py: increase timeout
Posted by Cleber Rosa 11 months, 3 weeks ago
Based on many runs, the average run time for these 4 tests is around
250 seconds, with 320 seconds being the ceiling.  In any way, the
default 120 seconds timeout is inappropriate in my experience.

Let's increase the timeout so these tests get a chance to completion.

Signed-off-by: Cleber Rosa <crosa@redhat.com>
---
 tests/avocado/intel_iommu.py | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/avocado/intel_iommu.py b/tests/avocado/intel_iommu.py
index f04ee1cf9d..24bfad0756 100644
--- a/tests/avocado/intel_iommu.py
+++ b/tests/avocado/intel_iommu.py
@@ -25,6 +25,8 @@ class IntelIOMMU(LinuxTest):
     :avocado: tags=flaky
     """
 
+    timeout = 360
+
     IOMMU_ADDON = ',iommu_platform=on,disable-modern=off,disable-legacy=on'
     kernel_path = None
     initrd_path = None
-- 
2.43.0
Re: [PATCH 03/10] tests/avocado/intel_iommu.py: increase timeout
Posted by Alex Bennée 11 months, 3 weeks ago
Cleber Rosa <crosa@redhat.com> writes:

> Based on many runs, the average run time for these 4 tests is around
> 250 seconds, with 320 seconds being the ceiling.  In any way, the
> default 120 seconds timeout is inappropriate in my experience.

I would rather see these tests updated to fix:

 - Don't use such an old Fedora 31 image
 - Avoid updating image packages (when will RH stop serving them?)
 - The "test" is a fairly basic check of dmesg/sysfs output

I think building a buildroot image with the tools pre-installed (with
perhaps more testing) would be a better use of our limited test time.

FWIW the runtime on my machine is:

➜  env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
JOB ID     : 5c582ccf274f3aee279c2208f969a7af8ceb9943
JOB LOG    : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
 (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
 (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
 (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
 (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
RESULTS    : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 255.43 s

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
Re: [PATCH 03/10] tests/avocado/intel_iommu.py: increase timeout
Posted by Cleber Rosa 11 months, 2 weeks ago
Alex Bennée <alex.bennee@linaro.org> writes:

> Cleber Rosa <crosa@redhat.com> writes:
>
>> Based on many runs, the average run time for these 4 tests is around
>> 250 seconds, with 320 seconds being the ceiling.  In any way, the
>> default 120 seconds timeout is inappropriate in my experience.
>
> I would rather see these tests updated to fix:
>
>  - Don't use such an old Fedora 31 image

I remember proposing a bump in Fedora version used by default in
avocado_qemu.LinuxTest (which would propagate to tests such as
boot_linux.py and others), but that was not well accepted.  I can
definitely work on such a version bump again.

>  - Avoid updating image packages (when will RH stop serving them?)

IIUC the only reason for updating the packages is to test the network
from the guest, and could/should be done another way.

Eric, could you confirm this?

>  - The "test" is a fairly basic check of dmesg/sysfs output

Maybe the network is also an implicit check here.  Let's see what Eric
has to say.

>
> I think building a buildroot image with the tools pre-installed (with
> perhaps more testing) would be a better use of our limited test time.
>
> FWIW the runtime on my machine is:
>
> ➜  env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
> JOB ID     : 5c582ccf274f3aee279c2208f969a7af8ceb9943
> JOB LOG    : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
>  (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
>  (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
>  (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
>  (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
> RESULTS    : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
> JOB TIME   : 255.43 s
>

Yes, I've also seen similar runtimes in other environments... so it
looks like it depends a lot on the "dnf -y install numactl-devel".  If
that can be removed, the tests would have much more predictable runtimes.
Re: [PATCH 03/10] tests/avocado/intel_iommu.py: increase timeout
Posted by Eric Auger 11 months, 2 weeks ago
Hi Cleber,

On 12/13/23 21:08, Cleber Rosa wrote:
> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> Cleber Rosa <crosa@redhat.com> writes:
>>
>>> Based on many runs, the average run time for these 4 tests is around
>>> 250 seconds, with 320 seconds being the ceiling.  In any way, the
>>> default 120 seconds timeout is inappropriate in my experience.
>> I would rather see these tests updated to fix:
>>
>>  - Don't use such an old Fedora 31 image
> I remember proposing a bump in Fedora version used by default in
> avocado_qemu.LinuxTest (which would propagate to tests such as
> boot_linux.py and others), but that was not well accepted.  I can
> definitely work on such a version bump again.
>
>>  - Avoid updating image packages (when will RH stop serving them?)
> IIUC the only reason for updating the packages is to test the network
> from the guest, and could/should be done another way.
>
> Eric, could you confirm this?
Sorry for the delay. Yes effectively I used the dnf install to stress
the viommu. In the past I was able to trigger viommu bugs that way
whereas getting an IP @ for the guest was just successful.
>
>>  - The "test" is a fairly basic check of dmesg/sysfs output
> Maybe the network is also an implicit check here.  Let's see what Eric
> has to say.

To be honest I do not remember how avocado does the check in itself; my
guess if that if the dnf install does not complete you get a timeout and
the test fails. But you may be more knowledged on this than me ;-)

Thanks

Eric
>
>> I think building a buildroot image with the tools pre-installed (with
>> perhaps more testing) would be a better use of our limited test time.
>>
>> FWIW the runtime on my machine is:
>>
>> ➜  env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
>> JOB ID     : 5c582ccf274f3aee279c2208f969a7af8ceb9943
>> JOB LOG    : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
>>  (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
>>  (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
>>  (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
>>  (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
>> RESULTS    : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>> JOB TIME   : 255.43 s
>>
> Yes, I've also seen similar runtimes in other environments... so it
> looks like it depends a lot on the "dnf -y install numactl-devel".  If
> that can be removed, the tests would have much more predictable runtimes.
>


Re: [PATCH 03/10] tests/avocado/intel_iommu.py: increase timeout
Posted by Alex Bennée 11 months, 2 weeks ago
Eric Auger <eric.auger@redhat.com> writes:

> Hi Cleber,
>
> On 12/13/23 21:08, Cleber Rosa wrote:
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>>> Cleber Rosa <crosa@redhat.com> writes:
>>>
>>>> Based on many runs, the average run time for these 4 tests is around
>>>> 250 seconds, with 320 seconds being the ceiling.  In any way, the
>>>> default 120 seconds timeout is inappropriate in my experience.
>>> I would rather see these tests updated to fix:
>>>
>>>  - Don't use such an old Fedora 31 image
>> I remember proposing a bump in Fedora version used by default in
>> avocado_qemu.LinuxTest (which would propagate to tests such as
>> boot_linux.py and others), but that was not well accepted.  I can
>> definitely work on such a version bump again.
>>
>>>  - Avoid updating image packages (when will RH stop serving them?)
>> IIUC the only reason for updating the packages is to test the network
>> from the guest, and could/should be done another way.
>>
>> Eric, could you confirm this?
> Sorry for the delay. Yes effectively I used the dnf install to stress
> the viommu. In the past I was able to trigger viommu bugs that way
> whereas getting an IP @ for the guest was just successful.
>>
>>>  - The "test" is a fairly basic check of dmesg/sysfs output
>> Maybe the network is also an implicit check here.  Let's see what Eric
>> has to say.
>
> To be honest I do not remember how avocado does the check in itself; my
> guess if that if the dnf install does not complete you get a timeout and
> the test fails. But you may be more knowledged on this than me ;-)

I guess the problem is relying on external infrastructure can lead to
unpredictable results. However its a lot easier to configure user mode
networking just to pull something off the internet than have a local
netperf or some such setup to generate local traffic.

I guess there is no loopback like setup which would sufficiently
exercise the code?

>
> Thanks
>
> Eric
>>
>>> I think building a buildroot image with the tools pre-installed (with
>>> perhaps more testing) would be a better use of our limited test time.
>>>
>>> FWIW the runtime on my machine is:
>>>
>>> ➜  env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
>>> JOB ID     : 5c582ccf274f3aee279c2208f969a7af8ceb9943
>>> JOB LOG    : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
>>>  (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
>>>  (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
>>>  (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
>>>  (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
>>> RESULTS    : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>>> JOB TIME   : 255.43 s
>>>
>> Yes, I've also seen similar runtimes in other environments... so it
>> looks like it depends a lot on the "dnf -y install numactl-devel".  If
>> that can be removed, the tests would have much more predictable runtimes.
>>

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro
Re: [PATCH 03/10] tests/avocado/intel_iommu.py: increase timeout
Posted by Eric Auger 11 months, 2 weeks ago

On 12/14/23 10:41, Alex Bennée wrote:
> Eric Auger <eric.auger@redhat.com> writes:
>
>> Hi Cleber,
>>
>> On 12/13/23 21:08, Cleber Rosa wrote:
>>> Alex Bennée <alex.bennee@linaro.org> writes:
>>>
>>>> Cleber Rosa <crosa@redhat.com> writes:
>>>>
>>>>> Based on many runs, the average run time for these 4 tests is around
>>>>> 250 seconds, with 320 seconds being the ceiling.  In any way, the
>>>>> default 120 seconds timeout is inappropriate in my experience.
>>>> I would rather see these tests updated to fix:
>>>>
>>>>  - Don't use such an old Fedora 31 image
>>> I remember proposing a bump in Fedora version used by default in
>>> avocado_qemu.LinuxTest (which would propagate to tests such as
>>> boot_linux.py and others), but that was not well accepted.  I can
>>> definitely work on such a version bump again.
>>>
>>>>  - Avoid updating image packages (when will RH stop serving them?)
>>> IIUC the only reason for updating the packages is to test the network
>>> from the guest, and could/should be done another way.
>>>
>>> Eric, could you confirm this?
>> Sorry for the delay. Yes effectively I used the dnf install to stress
>> the viommu. In the past I was able to trigger viommu bugs that way
>> whereas getting an IP @ for the guest was just successful.
>>>>  - The "test" is a fairly basic check of dmesg/sysfs output
>>> Maybe the network is also an implicit check here.  Let's see what Eric
>>> has to say.
>> To be honest I do not remember how avocado does the check in itself; my
>> guess if that if the dnf install does not complete you get a timeout and
>> the test fails. But you may be more knowledged on this than me ;-)
> I guess the problem is relying on external infrastructure can lead to
> unpredictable results. However its a lot easier to configure user mode
> networking just to pull something off the internet than have a local
> netperf or some such setup to generate local traffic.
>
> I guess there is no loopback like setup which would sufficiently
> exercise the code?

I don't think so. This test is a reproducer for a bug I encountered and
fixed in the past.
Besudes, I am totally fine moving the test out of the gating CI and just
keep it as a tier2 test, as suggested by Phil.

Thanks

Eric
>
>> Thanks
>>
>> Eric
>>>> I think building a buildroot image with the tools pre-installed (with
>>>> perhaps more testing) would be a better use of our limited test time.
>>>>
>>>> FWIW the runtime on my machine is:
>>>>
>>>> ➜  env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
>>>> JOB ID     : 5c582ccf274f3aee279c2208f969a7af8ceb9943
>>>> JOB LOG    : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
>>>>  (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
>>>>  (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
>>>>  (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
>>>>  (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
>>>> RESULTS    : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>>>> JOB TIME   : 255.43 s
>>>>
>>> Yes, I've also seen similar runtimes in other environments... so it
>>> looks like it depends a lot on the "dnf -y install numactl-devel".  If
>>> that can be removed, the tests would have much more predictable runtimes.
>>>


Re: [PATCH 03/10] tests/avocado/intel_iommu.py: increase timeout
Posted by Philippe Mathieu-Daudé 11 months, 2 weeks ago
On 14/12/23 08:24, Eric Auger wrote:
> Hi Cleber,
> 
> On 12/13/23 21:08, Cleber Rosa wrote:
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>>> Cleber Rosa <crosa@redhat.com> writes:
>>>
>>>> Based on many runs, the average run time for these 4 tests is around
>>>> 250 seconds, with 320 seconds being the ceiling.  In any way, the
>>>> default 120 seconds timeout is inappropriate in my experience.
>>> I would rather see these tests updated to fix:
>>>
>>>   - Don't use such an old Fedora 31 image
>> I remember proposing a bump in Fedora version used by default in
>> avocado_qemu.LinuxTest (which would propagate to tests such as
>> boot_linux.py and others), but that was not well accepted.  I can
>> definitely work on such a version bump again.
>>
>>>   - Avoid updating image packages (when will RH stop serving them?)
>> IIUC the only reason for updating the packages is to test the network
>> from the guest, and could/should be done another way.
>>
>> Eric, could you confirm this?
> Sorry for the delay. Yes effectively I used the dnf install to stress
> the viommu. In the past I was able to trigger viommu bugs that way
> whereas getting an IP @ for the guest was just successful.

Maybe this test is useful as what Daniel described as "Tier 2" [*],
that maintainers run locally but don't need to be gating CI? That
would save us some resources there.

[*] https://lore.kernel.org/qemu-devel/20200427152036.GI1244803@redhat.com/

>>
>>>   - The "test" is a fairly basic check of dmesg/sysfs output
>> Maybe the network is also an implicit check here.  Let's see what Eric
>> has to say.
> 
> To be honest I do not remember how avocado does the check in itself; my
> guess if that if the dnf install does not complete you get a timeout and
> the test fails. But you may be more knowledged on this than me ;-)
> 
> Thanks
> 
> Eric


Re: [PATCH 03/10] tests/avocado/intel_iommu.py: increase timeout
Posted by Akihiko Odaki 11 months, 2 weeks ago
On 2023/12/12 2:01, Alex Bennée wrote:
> Cleber Rosa <crosa@redhat.com> writes:
> 
>> Based on many runs, the average run time for these 4 tests is around
>> 250 seconds, with 320 seconds being the ceiling.  In any way, the
>> default 120 seconds timeout is inappropriate in my experience.
> 
> I would rather see these tests updated to fix:
> 
>   - Don't use such an old Fedora 31 image
>   - Avoid updating image packages (when will RH stop serving them?)
>   - The "test" is a fairly basic check of dmesg/sysfs output
> 
> I think building a buildroot image with the tools pre-installed (with
> perhaps more testing) would be a better use of our limited test time.

That's what tests/avocado/netdev-ethtool.py does, but I don't like it 
much because building a buildroot image takes long and results in a 
somewhat big binary blob.

I rather prefer to have some script that runs mkosi[1] to make an image; 
it downloads packages from distributor so it will take much less than 
using buildroot. The CI system can run the script and cache the image.

[1] https://github.com/systemd/mkosi

Re: [PATCH 03/10] tests/avocado/intel_iommu.py: increase timeout
Posted by Alex Bennée 11 months, 2 weeks ago
Akihiko Odaki <akihiko.odaki@daynix.com> writes:

> On 2023/12/12 2:01, Alex Bennée wrote:
>> Cleber Rosa <crosa@redhat.com> writes:
>> 
>>> Based on many runs, the average run time for these 4 tests is around
>>> 250 seconds, with 320 seconds being the ceiling.  In any way, the
>>> default 120 seconds timeout is inappropriate in my experience.
>> I would rather see these tests updated to fix:
>>   - Don't use such an old Fedora 31 image
>>   - Avoid updating image packages (when will RH stop serving them?)
>>   - The "test" is a fairly basic check of dmesg/sysfs output
>> I think building a buildroot image with the tools pre-installed
>> (with
>> perhaps more testing) would be a better use of our limited test time.
>
> That's what tests/avocado/netdev-ethtool.py does, but I don't like it
> much because building a buildroot image takes long and results in a
> somewhat big binary blob.
>
> I rather prefer to have some script that runs mkosi[1] to make an
> image; it downloads packages from distributor so it will take much
> less than using buildroot. The CI system can run the script and cache
> the image.

I'm all more smaller more directed test cases and I'm less worried about
exactly how things are built. I only use buildroot personally because
I'm familiar with it and it makes it easy to build testcases for
multiple architectures.

> [1] https://github.com/systemd/mkosi

If that works for you go for it ;-)

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro