On 21/10/2024 15.59, Peter Maydell wrote:
> On Mon, 21 Oct 2024 at 14:55, Thomas Huth <thuth@redhat.com> wrote:
>>
>> On 21/10/2024 15.18, Thomas Huth wrote:
>>> On 21/10/2024 15.00, Peter Maydell wrote:
>>>> On Mon, 21 Oct 2024 at 12:35, Thomas Huth <thuth@redhat.com> wrote:
>>>>>
>>>>> The following changes since commit f1dd640896ee2b50cb34328f2568aad324702954:
>>>>>
>>>>> Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into
>>>>> staging (2024-10-18 10:42:56 +0100)
>>>>>
>>>>> are available in the Git repository at:
>>>>>
>>>>> https://gitlab.com/thuth/qemu.git tags/pull-request-2024-10-21
>>>>>
>>>>> for you to fetch changes up to ee772a332af8f23acf604ad0fb5132f886b0eb16:
>>>>>
>>>>> tests/functional: Convert the Avocado sh4 tuxrun test (2024-10-21
>>>>> 13:25:12 +0200)
>>>>>
>>>>> ----------------------------------------------------------------
>>>>> * Convert the Tuxrun Avocado tests to the new functional framework
>>>>> * Update the OpenBSD CI image to OpenBSD v7.6
>>>>> * Bump timeout of the ide-test
>>>>> * New maintainer for the QTests
>>>>> * Disable the pci-bridge on s390x by default
>>>>>
>>>>> ----------------------------------------------------------------
>>>>
>>>> Couple of failures on the functional-tests:
>>>>
>>>> https://gitlab.com/qemu-project/qemu/-/jobs/8140716604
>>>>
>>>> 7/28 qemu:func-thorough+func-aarch64-thorough+thorough /
>>>> func-aarch64-aarch64_tuxrun TIMEOUT 120.06s killed by signal 15
>>>> SIGTERM
>>>>
>>>> https://gitlab.com/qemu-project/qemu/-/jobs/8140716520
>>>>
>>>> 14/17 qemu:func-thorough+func-loongarch64-thorough+thorough /
>>>> func-loongarch64-loongarch64_virt TIMEOUT 60.09s killed by signal 15
>>>> SIGTERM
>>>>
>>>> I'm retrying to see if these are intermittent, but they suggest that we
>>>> should bump the timeouts for these tests.
>>>
>>> Everything was fine with the gitlab shared runners
>>> (https://gitlab.com/thuth/qemu/-/pipelines/1504882880), but yes, it's
>>> likely the private runners being slow again...
>>>
>>> So please don't merge it yet; I'll go through the jobs of the private
>>> runners and update the timeouts of the failed jobs and of the ones that
>>> are getting close to the limit.
>>
>> Actually, looking at it again, the func-loongarch64-loongarch64_virt test is
>> not a new one; it was merged quite a while ago already. And in previous
>> runs, it only took 6-10 seconds:
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/8125336852#L810
>> https://gitlab.com/qemu-project/qemu/-/jobs/8111434905#L740
>>
>> So maybe this was indeed just a temporary blip in the test runners? Could
>> you please rerun the jobs to see how long they take then?
>
> The alpine job passed on the retry:
> https://gitlab.com/qemu-project/qemu/-/jobs/8141648479
> and the func-loongarch64-loongarch64_virt test took 5.08s.
>
> The opensuse job failed again:
> https://gitlab.com/qemu-project/qemu/-/jobs/8141649069
> 7/28 qemu:func-thorough+func-aarch64-thorough+thorough /
> func-aarch64-aarch64_tuxrun TIMEOUT 120.04s killed by signal 15
> SIGTERM
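(If this really were just slow runners, the fix would be a one-line bump of
the per-test timeout in tests/functional/meson.build - a sketch, assuming the
test_timeouts dictionary there still has its current shape, with the new
value picked arbitrarily:

  test_timeouts = {
    ...
    'aarch64_tuxrun' : 240,   # hypothetical bump from the current 120s
    ...
  }

But it does not look like a plain timeout issue here.)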
Looking at the log files of the job, I can see the following in
https://gitlab.com/qemu-project/qemu/-/jobs/8141649069/artifacts/browse/build/tests/functional/aarch64/test_aarch64_tuxrun.TuxRunAarch64Test.test_arm64be/console.log:
2024-10-21 13:20:32,844: Run /sbin/init as init process
2024-10-21 13:20:34,043: EXT4-fs (vda): re-mounted. Opts: (null). Quota mode: none.
2024-10-21 13:20:34,350: Starting syslogd: OK
2024-10-21 13:20:34,423: Starting klogd: OK
2024-10-21 13:20:34,667: Running sysctl: OK
2024-10-21 13:20:34,739: Saving 2048 bits of non-creditable seed for next boot
2024-10-21 13:20:34,966: Starting network: blk_update_request: I/O error, dev vda, sector 5824 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,028: blk_update_request: I/O error, dev vda, sector 8848 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,051: OK
2024-10-21 13:20:35,088: blk_update_request: I/O error, dev vda, sector 12936 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,149: blk_update_request: I/O error, dev vda, sector 17032 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,181: Welcome to TuxTest
2024-10-21 13:20:35,882: tuxtest login: blk_update_request: I/O error, dev vda, sector 21128 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,882: blk_update_request: I/O error, dev vda, sector 25224 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,882: blk_update_request: I/O error, dev vda, sector 29320 op 0x9:(WRITE_ZEROES) flags 0x800 phys_seg 0 prio class 0
2024-10-21 13:20:35,887: root
So this is indeed more than just a timeout setting that is too small - all of
these errors are failing WRITE_ZEROES requests (op 0x9) on the virtio disk...
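One way to narrow it down might be to retry with WRITE_ZEROES support
disabled on the device, so that the guest falls back to plain writes - a
hedged example (virtio-blk has had a "write-zeroes" property since QEMU 4.0;
the device type and drive id the test actually uses are assumptions here):

  -device virtio-blk-pci,drive=vd0,write-zeroes=off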
I don't get the virtio errors when running the test locally, though.
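For reference, this is roughly how such a test can be run standalone from the
build directory (along the lines of docs/devel/testing/functional.rst; the
exact paths are assumptions for a typical out-of-tree build):

  cd build
  export PYTHONPATH=../python:../tests/functional
  export QEMU_TEST_QEMU_BINARY=$PWD/qemu-system-aarch64
  python3 ../tests/functional/test_aarch64_tuxrun.py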
I guess this needs some more investigation first ... it's probably best if I
respin the pull request without this patch for now, until this is understood
and fixed.
Thomas