On 16/11/2022 18:20, Alex Bennée wrote:
> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
>> This pull request causes the following CI failure:
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/3328449477
>>
>> I haven't figured out the root cause of the failure. Maybe the pull
>> request just exposes a latent failure. Please take a look and we can
>> try again for -rc2.
>
> OK after a lot of digging I've come to the following conclusion:
>
> * the Fuloong 2E machine never enables the FIFO on the 16550 (s->fcr & UART_FCR_FE)
> * as a result if qemu_chr_fe_write(&s->chr, &s->tsr, 1) fails with -EAGAIN
> - a serial_watch_cb is queued
> - s->tsr_retry++
> * additional serial_ioport_write's overwrite s->thr
> * the console output gets corrupted
>
> You can see the effect by comparing the serial write and xmit values:
>
> ➜ grep serial_write alex.log | cut -d ' ' -f 6 | xxd -r -p | head -n 10
> [ 0.000000] Initializing cgroup subsys cpuset
> [ 0.000000] Initializing cgroup subsys cpu
> [ 0.000000] Initializing cgroup subsys cpuacct
> [ 0.000000] Linux version 3.16.0-6-loongson-2e (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 Debian 3.16.56-1+deb8u1 (2018-05-08)
> [ 0.000000] memsize=256, highmemsize=0
> [ 0.000000] CpuClock = 533080000
> [ 0.000000] bootconsole [early0] enabled
> [ 0.000000] CPU0 revision is: 00006302 (ICT Loongson-2)
> [ 0.000000] FPU revision is: 00000501
> [ 0.000000] Checking for the multiply/shift bug... no.
> 🕙18:27:17 alex@zen:qemu.git/builds/all on pr/141122-misc-for-7.2-1 [$!?⇕]
> ➜ grep serial_xmit alex.log | cut -d ' ' -f 2 | xxd -r -p | head -n 10
> [ 0.000000] Initializing cgroup subsys cpuset
> [ 0.000000] Initializing cgroup subsys cpu
> [ 0.000000] Initializing cgroup subsys cpuacct
> [ 0.000000] Linux version 3.16.0-6-loongson-2e (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 Debian 33 0.000000] bootconsole [early0] enabled
> [ 0.000000] CPU0 revision is: 00006302 (ICT Loongson-2)
> [ 0.000000] FPU revision is: 00000501
> [ 0.000000] Checking for the multiply/shift bug... no.
> [ 0.000000] Checking for the daddiu bug... no.
> [ 0.000000] Determined physical RAM map:
> [ 0.000000] memory: 000
>
> As a result the check for the pattern fails:
>
> console_pattern = 'Kernel command line: %s' % kernel_command_line
> self.wait_for_console_pattern(console_pattern)
>
> resulting in a timeout and test failure.
>
> In effect the configuration makes the output dependent on how fast the
> avocado test can drain the socket as there is no buffering elsewhere in
> the system. The changes in:
>
> Subject: [PULL 02/10] tests/avocado: improve behaviour waiting for login prompts
>
> make this failure more likely to happen - I think because .peek() and
> .readline() have different buffering strategies. Options
> include:
>
> - enable the 16550 FIFO for the Loongson kernel (command line option?)
> - increase the buffering of the python socket.socket() code
>
> I can get it to pass by shuffling the time.sleep() and a few other
> checks around but that seems flaky at best.

Nice work! This is the well-known problem whereby the kernel sometimes expects the
BIOS to have pre-configured the serial ports, which of course never happens when
booting directly with -kernel.
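
To make the failure mode concrete, here is a toy Python model of the transmit path
as Alex describes it. It is only a sketch and not QEMU's serial.c: the class, method
and parameter names are made up for illustration, and "backend_ready=False" stands in
for qemu_chr_fe_write() returning -EAGAIN.

```python
from collections import deque


class ToyUart:
    """Toy 16550-style transmitter (hypothetical model, not QEMU code)."""

    def __init__(self, fifo_enabled):
        self.fifo_enabled = fifo_enabled
        self.fifo = deque()  # transmit FIFO, only used when enabled
        self.thr = None      # single-byte transmit holding register
        self.tsr = None      # byte currently waiting on the backend
        self.sent = []       # bytes the backend actually accepted

    def guest_write(self, byte):
        if self.fifo_enabled:
            self.fifo.append(byte)
        else:
            # No FIFO: a byte still sitting in thr is silently replaced.
            self.thr = byte

    def xmit(self, backend_ready):
        while True:
            if self.tsr is None:
                if self.fifo_enabled:
                    if not self.fifo:
                        return
                    self.tsr = self.fifo.popleft()
                else:
                    if self.thr is None:
                        return
                    self.tsr, self.thr = self.thr, None
            if not backend_ready:
                return  # stand-in for -EAGAIN: keep tsr and retry later
            self.sent.append(self.tsr)
            self.tsr = None


def run(fifo_enabled, data=b"ABCDEF", stalled=(1, 2, 3)):
    """Write data one byte at a time; the backend stalls on the given steps."""
    uart = ToyUart(fifo_enabled)
    for i, byte in enumerate(data):
        uart.guest_write(byte)
        uart.xmit(backend_ready=(i not in stalled))
    uart.xmit(backend_ready=True)  # final drain
    return bytes(uart.sent)
```

With the backend stalled for three writes, run(False) returns b"ABEF" (C and D are
lost to overwrites) while run(True) returns b"ABCDEF".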

Given that the fuloong2e machine already has a mini "trampoline" bootloader, would it
be possible to tweak write_bootloader() at
https://gitlab.com/qemu-project/qemu/-/blob/master/hw/mips/fuloong2e.c#L166 to set
UART_FCR_FE on the available UARTs before jumping into the kernel?
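
For reference, the register write itself is small. Below is a rough Python sketch of
the (port, value) pairs the trampoline would need to emit. It assumes the standard
16550 layout (FCR at offset 2 from the UART base) and the bit values used in
hw/char/serial.c. The function name is hypothetical, and the UART base addresses are
left as a parameter because I haven't checked how the fuloong2e maps them.

```python
UART_FCR = 2          # FIFO Control Register offset on a 16550 (write-only)
UART_FCR_FE = 0x01    # FIFO enable
UART_FCR_RFR = 0x02   # receive FIFO reset
UART_FCR_XFR = 0x04   # transmit FIFO reset


def fifo_enable_writes(uart_bases, clear_fifos=True):
    """Return (port, value) byte writes that enable the FIFO on each UART.

    uart_bases is an assumption supplied by the caller; clearing both FIFOs
    while enabling them is optional but leaves the UART in a known state.
    """
    value = UART_FCR_FE
    if clear_fifos:
        value |= UART_FCR_RFR | UART_FCR_XFR
    return [(base + UART_FCR, value) for base in uart_bases]
```

For example, with the conventional COM1 port 0x3f8 (an assumption here),
fifo_enable_writes([0x3f8]) gives [(0x3fa, 0x07)]. In write_bootloader() this would
become a couple of store instructions emitted before the jump to the kernel.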

ATB,
Mark.