riscv64/test_boston.py: fix intermittent test timeout

[PATCH] riscv64/test_boston.py: fix intermittent test timeout

Posted by Daniel Henrique Barboza 1 week, 4 days ago

The recently added Boston MIPS board selftest times out consistently on a
machine running 'make check-functional' with -j 16:

18/18 func-thorough+func-riscv64-thorough+thorough - qemu:func-riscv64-boston
      TIMEOUT        120.09s   killed by signal 15 SIGTERM

The reason is quite boring: it is testing too much stuff.

Note that functional tests aren't supposed to be used as stress tests,
i.e. they don't have to test every single corner case that might affect
the board. They are supposed to catch the most common user oopsies. A
timeout, in this context, is more likely to be read as something abnormal
slowing down the emulation, not as a lack of CPU horsepower to run all
the tests before the deadline.

Some of the tests claim to exercise odd SMP CPU counts: one "ensures
proper core distribution across clusters", the other is "validating proper
handling of larger asymmetric SMP configurations". But no SMP/NUMA check
is made anywhere after boot, so in the end we're just testing whether the
board is able to boot with 7/35 CPUs. As far as these tests are concerned,
we could have a completely broken, but bootable, SMP topology with 7/35
CPUs and we would be oblivious to it.
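The missing post-boot check could look roughly like the sketch below. It is a minimal sketch, assuming the guest's /sys/devices/system/cpu/online contents have already been captured from the console (for example by running `cat /sys/devices/system/cpu/online`); `parse_cpu_list` and `check_online_cpus` are hypothetical helpers, not part of the qemu_test framework. Even this only verifies the online CPU count, not the cluster or NUMA layout the removed docstrings talk about.

```python
def parse_cpu_list(text: str) -> set[int]:
    """Parse a Linux CPU list such as "0-6" or "0-3,8,10-11" into CPU ids.

    This is the range-list format used by /sys/devices/system/cpu/online.
    """
    cpus: set[int] = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # tolerate empty input or a stray trailing comma
        if "-" in chunk:
            start, end = (int(x) for x in chunk.split("-", 1))
            if end < start:
                raise ValueError(f"bad range: {chunk!r}")
            cpus.update(range(start, end + 1))  # ranges are inclusive
        else:
            cpus.add(int(chunk))
    return cpus


def check_online_cpus(online_text: str, smp_count: int) -> None:
    """Hypothetical post-boot check: the guest must report exactly
    smp_count online CPUs, numbered 0..smp_count-1."""
    cpus = parse_cpu_list(online_text)
    expected = set(range(smp_count))
    if cpus != expected:
        missing = sorted(expected - cpus)
        extra = sorted(cpus - expected)
        raise AssertionError(
            f"expected {smp_count} online CPUs, "
            f"missing={missing}, extra={extra}")
```

With 35 CPUs the guest would be expected to report "0-34". Checking cluster placement would need more than this, e.g. the per-CPU files under /sys/devices/system/cpu/cpuN/topology/, which is beyond what the removed tests attempted.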

Remove the 7 and 35 SMP tests, keeping the minimum (2 CPUs) and maximum
(64 CPUs) tests. With these changes the test now runs with a comfortable
margin before the timeout:

17/18 func-thorough+func-riscv64-thorough+thorough - qemu:func-riscv64-boston
      OK              61.28s   3 subtests passed

Fixes: e71111e26b ("test/functional: Add test for boston-aia board")
Signed-off-by: Daniel Henrique Barboza <daniel.barboza@oss.qualcomm.com>
---
 tests/functional/riscv64/test_boston.py | 19 -------------------
 1 file changed, 19 deletions(-)

diff --git a/tests/functional/riscv64/test_boston.py b/tests/functional/riscv64/test_boston.py
index 385de6a61d..13c44ca3e3 100755
--- a/tests/functional/riscv64/test_boston.py
+++ b/tests/functional/riscv64/test_boston.py
@@ -63,25 +63,6 @@ def test_boston_boot_linux_min_cpus(self):
         """
         self._boot_linux_test(smp_count=2)
 
-    def test_boston_boot_linux_7_cpus(self):
-        """
-        Test Linux kernel boot with 7 CPUs
-
-        7 CPUs is a special configuration that tests odd CPU count
-        handling and ensures proper core distribution across clusters.
-        """
-        self._boot_linux_test(smp_count=7)
-
-    def test_boston_boot_linux_35_cpus(self):
-        """
-        Test Linux kernel boot with 35 CPUs
-
-        35 CPUs is a special configuration that tests a non-power-of-2
-        CPU count above 32, validating proper handling of larger
-        asymmetric SMP configurations.
-        """
-        self._boot_linux_test(smp_count=35)
-
     def test_boston_boot_linux_max_cpus(self):
         """
         Test Linux kernel boot with maximum supported CPU count (64)
-- 
2.43.0

Re: [PATCH] riscv64/test_boston.py: fix intermittent test timeout

Posted by Philippe Mathieu-Daudé 6 hours ago

On 26/1/26 18:45, Daniel Henrique Barboza wrote:

As CI often fails without this patch, I'm going to merge it via my tree.

Regards,

Phil.

Re: [PATCH] riscv64/test_boston.py: fix intermittent test timeout

Posted by Philippe Mathieu-Daudé 1 day, 8 hours ago

On 26/1/26 18:45, Daniel Henrique Barboza wrote:
> Some of the tests claim to exercise odd SMP CPU counts: one "ensures
> proper core distribution across clusters", the other is "validating proper
> handling of larger asymmetric SMP configurations". But no SMP/NUMA check
> is made anywhere after boot, so in the end we're just testing whether the
> board is able to boot with 7/35 CPUs. As far as these tests are concerned,
> we could have a completely broken, but bootable, SMP topology with 7/35
> CPUs and we would be oblivious to it.

Good justification. Otherwise I'd have suggested the @skipSlowTest()
decorator.
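For comparison, the @skipSlowTest() alternative keeps a test but makes it opt-in. A minimal sketch of that idea with plain unittest, assuming the opt-in is the QEMU_TEST_ALLOW_SLOW environment variable; `skip_slow_test` and `BostonSketch` are hypothetical names, not the actual qemu_test code:

```python
import os
import unittest


def skip_slow_test():
    """Hypothetical stand-in for qemu_test's @skipSlowTest(): skip the
    decorated test unless QEMU_TEST_ALLOW_SLOW is set (assumed name)."""
    return unittest.skipUnless(os.getenv("QEMU_TEST_ALLOW_SLOW"),
                               "Test is slow and QEMU_TEST_ALLOW_SLOW not set")


class BostonSketch(unittest.TestCase):
    def _boot_linux_test(self, smp_count):
        pass  # placeholder for the real boot helper in test_boston.py

    @skip_slow_test()
    def test_boston_boot_linux_35_cpus(self):
        # Only runs when the slow-test opt-in is present in the environment.
        self._boot_linux_test(smp_count=35)
```

Note that unittest.skipUnless evaluates its condition when the decorator is applied, so the variable must be set before the test module is imported.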

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

Re: [PATCH] riscv64/test_boston.py: fix intermittent test timeout

Posted by Djordje Todorovic 1 day, 18 hours ago

On 1/26/26 18:45, Daniel Henrique Barboza wrote:

LGTM, thank you!

Reviewed-by: Djordje Todorovic <Djordje.Todorovic@htecgroup.com>