Richard Henderson <richard.henderson@linaro.org> writes:
> On 8/23/23 06:04, Thomas Huth wrote:
>> On 06/08/2023 05.36, Richard Henderson wrote:
>>> The following changes since commit 6db03ccc7f4ca33c99debaac290066f4500a2dfb:
>>>
>>> Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into
>>> staging (2023-08-04 14:47:00 -0700)
>>>
>>> are available in the Git repository at:
>>>
>>> https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230805
>>>
>>> for you to fetch changes up to 843246699425adfb6b81f927c16c9c6249b51e1d:
>>>
>>> linux-user/elfload: Set V in ELF_HWCAP for RISC-V (2023-08-05 18:17:20 +0000)
>>>
>>> ----------------------------------------------------------------
>>> accel/tcg: Do not issue misaligned i/o
>>> accel/tcg: Call save_iotlb_data from io_readx
>>> gdbstub: use 0 ("any process") on packets with no PID
>>> linux-user: Fixes for MAP_FIXED_NOREPLACE
>>> linux-user: Fixes for brk
>>> linux-user: Adjust task_unmapped_base for reserved_va
>>> linux-user: Use ELF_ET_DYN_BASE for ET_DYN with interpreter
>>> linux-user: Remove host != guest page size workarounds in brk and image load
>>> linux-user: Set V in ELF_HWCAP for RISC-V
>>> *-user: Remove last_brk as unused
>> Hi Richard,
>> I noticed that we currently have two failing Avocado jobs in our CI,
>> avocado-system-centos and avocado-system-opensuse, where the
>> boot_linux.py:BootLinuxX8664.test_pc_i440fx_tcg and the
>> boot_linux.py:BootLinuxX8664.test_pc_q35_tcg are now apparently
>> crashing. If I've got the history right, it started with your pull
>> request here; in the preceding one from Paolo, everything is still
>> green:
>> https://gitlab.com/qemu-project/qemu/-/pipelines/956543770
>> But here the jobs started failing:
>> https://gitlab.com/qemu-project/qemu/-/pipelines/957458385
>> Could you please have a look?
>
> It's some sort of timing issue, which sometimes goes away when re-run.
> I was re-running tests *a lot* in order to get them to go green while
> running the 8.1 release.

There is a definite regression point for the test_pc_q35 case:

./tests/venv/bin/avocado run ./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg
JOB ID : b8ea329d3353db7a47eb955fcad2f26b2dbe9f29
JOB LOG : /home/alex.bennee/avocado/job-results/job-2023-08-24T15.27-b8ea329/job.log
(1/1) ./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg: PASS (110.70 s)
RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME : 111.22 s
🕙15:29:06 alex.bennee@hackbox2:qemu.git/builds/bisect (190aba8) (BISECTING) [$!?] took 1m51s
➜ make -j30
[1/8] Generating qemu-version.h with a custom command (wrapped by meson to capture output)
[2/8] Compiling C object qga/qemu-ga.p/main.c.o
[3/8] Compiling C object libqmp.fa.p/monitor_qmp-cmds-control.c.o
[4/8] Compiling C object libqemu-x86_64-softmmu.fa.p/accel_tcg_cputlb.c.o
[5/8] Compiling C object libcommon.fa.p/softmmu_vl.c.o
[6/8] Linking static target libqmp.fa
[7/8] Linking target qga/qemu-ga
[8/8] Linking target qemu-system-x86_64
🕙15:30:12 alex.bennee@hackbox2:qemu.git/builds/bisect (f7eaf9d) (BISECTING) [$!?] took 5s
➜ ./tests/venv/bin/avocado run ./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg
JOB ID : 56768272dee373062792251ee3445cc81092634e
JOB LOG : /home/alex.bennee/avocado/job-results/job-2023-08-24T15.30-5676827/job.log
(1/1) ./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '1-./tests/avocado/boot_linux.py:BootLinuxX8664.test_pc_q35_tcg', 'logdir': '/home/alex.bennee/avocado/job-results/job-2023-08-24T15.30-5676827/test-results... (480.28 s)
RESULTS : PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
JOB TIME : 480.80 s
which bisects to:
commit f7eaf9d702efdd02481d5f1c25f7d8e0ffb64c6e (HEAD, refs/bisect/bad)
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Tue Aug 1 10:46:03 2023 -0700

    accel/tcg: Do not issue misaligned i/o

    In the single-page case we were issuing misaligned i/o to
    the memory subsystem, which does not handle it properly.
    Split such accesses via do_{ld,st}_mmio_*.

    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1800
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
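
For readers unfamiliar with what "split such accesses" means in the commit
message above, here is a rough Python sketch of the general idea. It is not
QEMU's do_{ld,st}_mmio_* code: the function name split_access and the
max_size parameter are made up for illustration, and the 8-byte default is
an assumption. It breaks a possibly misaligned access into pieces that are
each naturally aligned to their own size.

```python
from typing import List, Tuple


def split_access(addr: int, size: int, max_size: int = 8) -> List[Tuple[int, int]]:
    """Split the byte range [addr, addr + size) into naturally aligned pieces.

    Each piece has a power-of-two length no larger than max_size and starts
    at an address that is a multiple of that length. Hypothetical helper,
    for illustration only.
    """
    if addr < 0 or size <= 0:
        raise ValueError("addr must be >= 0 and size must be > 0")
    if max_size <= 0 or max_size & (max_size - 1):
        raise ValueError("max_size must be a power of two")

    pieces = []
    while size > 0:
        # Largest power of two that does not exceed the remaining size.
        chunk = 1 << (size.bit_length() - 1)
        # Limit the piece to the alignment of addr (its lowest set bit);
        # address 0 is aligned to any size, so it imposes no limit.
        if addr:
            chunk = min(chunk, addr & -addr)
        chunk = min(chunk, max_size)
        pieces.append((addr, chunk))
        addr += chunk
        size -= chunk
    return pieces
```

With this sketch, a 4-byte access at 0x1003 becomes three pieces,
(0x1003, 1), (0x1004, 2) and (0x1006, 1), while an aligned 4-byte access at
0x1000 stays a single piece.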
>
> For instance, with very little added except for your s390x pull, the
> same BootLinuxX8664.test_pc_i440fx_tcg test passes:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/4931341744#L136
>
> In the failing i440fx_tcg test, you can even see it's a timing issue:
>
> https://qemu-project.gitlab.io/-/qemu/-/jobs/4813804725/artifacts/build/tests/results/latest/test-results/02-tests_avocado_boot_linux.py_BootLinuxX8664.test_pc_i440fx_tcg/debug.log
>
> 23:42:30 DEBUG| [ 61.003328] Sending NMI from CPU 0 to CPUs 1:
> 23:42:30 DEBUG| [ 61.007829] INFO: NMI handler
> (nmi_cpu_backtrace_handler) took too long to run: 2.622 msecs
> 23:42:30 DEBUG| [ 61.003328] NMI backtrace for cpu 1 skipped: idling
> at native_safe_halt+0xe/0x10
> 23:42:30 DEBUG| [ 61.003328] rcu: rcu_sched kthread starved for
> 60002 jiffies! g-963 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
> 23:42:30 DEBUG| [ 61.003328] rcu: RCU grace-period kthread stack dump:
> 23:42:30 DEBUG| [ 61.003328] rcu_sched I 0 10 2 0x80004000
> 23:42:30 DEBUG| [ 61.003328] Call Trace:
> 23:42:30 DEBUG| [ 61.003328] ? __schedule+0x29f/0x680
> ...
>
>
> r~
--
Alex Bennée
Virtualisation Tech Lead @ Linaro