[PATCH v3 00/29] PowerPC interrupt rework

Matheus Ferst posted 29 patches 3 years, 3 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20221011204829.1641124-1-matheus.ferst@eldorado.org.br
Maintainers: "Cédric Le Goater" <clg@kaod.org>, Daniel Henrique Barboza <danielhb413@gmail.com>, David Gibson <david@gibson.dropbear.id.au>, Greg Kurz <groug@kaod.org>
There is a newer version of this series
hw/ppc/pnv_core.c        |   1 +
hw/ppc/ppc.c             |  17 +-
hw/ppc/spapr_hcall.c     |   6 +
hw/ppc/spapr_rtas.c      |   2 +-
hw/ppc/trace-events      |   2 +-
target/ppc/cpu.c         |   4 +
target/ppc/cpu.h         |  43 +-
target/ppc/cpu_init.c    | 212 +---------
target/ppc/excp_helper.c | 887 ++++++++++++++++++++++++++++++++++-----
target/ppc/helper.h      |   1 +
target/ppc/helper_regs.c |   2 +
target/ppc/misc_helper.c |  11 +-
target/ppc/translate.c   |   2 +
13 files changed, 833 insertions(+), 357 deletions(-)
[PATCH v3 00/29] PowerPC interrupt rework
Posted by Matheus Ferst 3 years, 3 months ago
Link to v2: https://lists.gnu.org/archive/html/qemu-ppc/2022-09/msg00556.html
This series is also available as a git branch: https://github.com/PPC64/qemu/tree/ferst-interrupt-fix-v3
Patches without review: 3-27

This new version rebases the patch series on the current master and
fixes some problems pointed out by Fabiano on v2.

Matheus Ferst (29):
  target/ppc: define PPC_INTERRUPT_* values directly
  target/ppc: always use ppc_set_irq to set env->pending_interrupts
  target/ppc: split interrupt masking and delivery from ppc_hw_interrupt
  target/ppc: prepare to split interrupt masking and delivery by excp_model
  target/ppc: create an interrupt masking method for POWER9/POWER10
  target/ppc: remove unused interrupts from p9_next_unmasked_interrupt
  target/ppc: create an interrupt deliver method for POWER9/POWER10
  target/ppc: remove unused interrupts from p9_deliver_interrupt
  target/ppc: remove generic architecture checks from p9_deliver_interrupt
  target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER9
  target/ppc: add power-saving interrupt masking logic to p9_next_unmasked_interrupt
  target/ppc: create an interrupt masking method for POWER8
  target/ppc: remove unused interrupts from p8_next_unmasked_interrupt
  target/ppc: create an interrupt deliver method for POWER8
  target/ppc: remove unused interrupts from p8_deliver_interrupt
  target/ppc: remove generic architecture checks from p8_deliver_interrupt
  target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER8
  target/ppc: add power-saving interrupt masking logic to p8_next_unmasked_interrupt
  target/ppc: create an interrupt masking method for POWER7
  target/ppc: remove unused interrupts from p7_next_unmasked_interrupt
  target/ppc: create an interrupt deliver method for POWER7
  target/ppc: remove unused interrupts from p7_deliver_interrupt
  target/ppc: remove generic architecture checks from p7_deliver_interrupt
  target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER7
  target/ppc: add power-saving interrupt masking logic to p7_next_unmasked_interrupt
  target/ppc: remove ppc_store_lpcr from CONFIG_USER_ONLY builds
  target/ppc: introduce ppc_maybe_interrupt
  target/ppc: unify cpu->has_work based on cs->interrupt_request
  target/ppc: move the p*_interrupt_powersave methods to excp_helper.c

 hw/ppc/pnv_core.c        |   1 +
 hw/ppc/ppc.c             |  17 +-
 hw/ppc/spapr_hcall.c     |   6 +
 hw/ppc/spapr_rtas.c      |   2 +-
 hw/ppc/trace-events      |   2 +-
 target/ppc/cpu.c         |   4 +
 target/ppc/cpu.h         |  43 +-
 target/ppc/cpu_init.c    | 212 +---------
 target/ppc/excp_helper.c | 887 ++++++++++++++++++++++++++++++++++-----
 target/ppc/helper.h      |   1 +
 target/ppc/helper_regs.c |   2 +
 target/ppc/misc_helper.c |  11 +-
 target/ppc/translate.c   |   2 +
 13 files changed, 833 insertions(+), 357 deletions(-)

-- 
2.25.1
Re: [PATCH v3 00/29] PowerPC interrupt rework
Posted by Daniel Henrique Barboza 3 years, 3 months ago
Matheus,

Nice cleanup and rework. The code is way better now.

I send a PR today without this series. As soon as that gets merged I'll
push these patches to ppc-next. This will give us more time to test the
changes and see if we can detect any unintended changes/bugs.

Thanks,

Daniel

On 10/11/22 17:48, Matheus Ferst wrote:
> Link to v2: https://lists.gnu.org/archive/html/qemu-ppc/2022-09/msg00556.html
> This series is also available as a git branch: https://github.com/PPC64/qemu/tree/ferst-interrupt-fix-v3
> Patches without review: 3-27
> 
> This new version rebases the patch series on the current master and
> fixes some problems pointed out by Fabiano on v2.
> 
> Matheus Ferst (29):
>    target/ppc: define PPC_INTERRUPT_* values directly
>    target/ppc: always use ppc_set_irq to set env->pending_interrupts
>    target/ppc: split interrupt masking and delivery from ppc_hw_interrupt
>    target/ppc: prepare to split interrupt masking and delivery by excp_model
>    target/ppc: create an interrupt masking method for POWER9/POWER10
>    target/ppc: remove unused interrupts from p9_next_unmasked_interrupt
>    target/ppc: create an interrupt deliver method for POWER9/POWER10
>    target/ppc: remove unused interrupts from p9_deliver_interrupt
>    target/ppc: remove generic architecture checks from p9_deliver_interrupt
>    target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER9
>    target/ppc: add power-saving interrupt masking logic to p9_next_unmasked_interrupt
>    target/ppc: create an interrupt masking method for POWER8
>    target/ppc: remove unused interrupts from p8_next_unmasked_interrupt
>    target/ppc: create an interrupt deliver method for POWER8
>    target/ppc: remove unused interrupts from p8_deliver_interrupt
>    target/ppc: remove generic architecture checks from p8_deliver_interrupt
>    target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER8
>    target/ppc: add power-saving interrupt masking logic to p8_next_unmasked_interrupt
>    target/ppc: create an interrupt masking method for POWER7
>    target/ppc: remove unused interrupts from p7_next_unmasked_interrupt
>    target/ppc: create an interrupt deliver method for POWER7
>    target/ppc: remove unused interrupts from p7_deliver_interrupt
>    target/ppc: remove generic architecture checks from p7_deliver_interrupt
>    target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER7
>    target/ppc: add power-saving interrupt masking logic to p7_next_unmasked_interrupt
>    target/ppc: remove ppc_store_lpcr from CONFIG_USER_ONLY builds
>    target/ppc: introduce ppc_maybe_interrupt
>    target/ppc: unify cpu->has_work based on cs->interrupt_request
>    target/ppc: move the p*_interrupt_powersave methods to excp_helper.c
> 
>   hw/ppc/pnv_core.c        |   1 +
>   hw/ppc/ppc.c             |  17 +-
>   hw/ppc/spapr_hcall.c     |   6 +
>   hw/ppc/spapr_rtas.c      |   2 +-
>   hw/ppc/trace-events      |   2 +-
>   target/ppc/cpu.c         |   4 +
>   target/ppc/cpu.h         |  43 +-
>   target/ppc/cpu_init.c    | 212 +---------
>   target/ppc/excp_helper.c | 887 ++++++++++++++++++++++++++++++++++-----
>   target/ppc/helper.h      |   1 +
>   target/ppc/helper_regs.c |   2 +
>   target/ppc/misc_helper.c |  11 +-
>   target/ppc/translate.c   |   2 +
>   13 files changed, 833 insertions(+), 357 deletions(-)
>
Re: [PATCH v3 00/29] PowerPC interrupt rework
Posted by Daniel Henrique Barboza 3 years, 3 months ago
Matheus,

This series fails 'make check-avocado' in an e500 test. This is the error output:


& make -j && \
	make check-avocado AVOCADO_TESTS=tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500

(...)

Fetching asset from tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500
JOB ID     : 506b6b07bf40cf1bffcf3911a0f9b8948de6553c
JOB LOG    : /home/danielhb/qemu/build/tests/results/job-2022-10-19T17.37-506b6b0/job.log
  (1/1) tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '1-tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500', 'logdir': '/home/danielhb/qemu/build/tests/results/job-2022-10-19T17.37-506b6b0/test-... (120.31 s)
RESULTS    : PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
JOB TIME   : 121.00 s



'git bisect' pointed the following commit as the culprit:

d9bdb6192edc5c74cda754a6cd32237b1b9272f0 is the first bad commit
commit d9bdb6192edc5c74cda754a6cd32237b1b9272f0
Author: Matheus Ferst <matheus.ferst@eldorado.org.br>
Date:   Tue Oct 11 17:48:27 2022 -0300

     target/ppc: introduce ppc_maybe_interrupt
     

This would be patch 27.


As a benchmark, this test when successful takes around 11 seconds in my test
env:

  (33/42) tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500: PASS (11.02 s)


Cedric's qemu-ppc-boot test suit works fine with this series, so I'd say that
this avocado test is doing something else that is causing the problem.


I'll test patches 1-26 later and see if all tests pass. In that case I'll push
1-26 to ppc-next and then you can work on 27-29.


Thanks,


Daniel



On 10/11/22 17:48, Matheus Ferst wrote:
> Link to v2: https://lists.gnu.org/archive/html/qemu-ppc/2022-09/msg00556.html
> This series is also available as a git branch: https://github.com/PPC64/qemu/tree/ferst-interrupt-fix-v3
> Patches without review: 3-27
> 
> This new version rebases the patch series on the current master and
> fixes some problems pointed out by Fabiano on v2.
> 
> Matheus Ferst (29):
>    target/ppc: define PPC_INTERRUPT_* values directly
>    target/ppc: always use ppc_set_irq to set env->pending_interrupts
>    target/ppc: split interrupt masking and delivery from ppc_hw_interrupt
>    target/ppc: prepare to split interrupt masking and delivery by excp_model
>    target/ppc: create an interrupt masking method for POWER9/POWER10
>    target/ppc: remove unused interrupts from p9_next_unmasked_interrupt
>    target/ppc: create an interrupt deliver method for POWER9/POWER10
>    target/ppc: remove unused interrupts from p9_deliver_interrupt
>    target/ppc: remove generic architecture checks from p9_deliver_interrupt
>    target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER9
>    target/ppc: add power-saving interrupt masking logic to p9_next_unmasked_interrupt
>    target/ppc: create an interrupt masking method for POWER8
>    target/ppc: remove unused interrupts from p8_next_unmasked_interrupt
>    target/ppc: create an interrupt deliver method for POWER8
>    target/ppc: remove unused interrupts from p8_deliver_interrupt
>    target/ppc: remove generic architecture checks from p8_deliver_interrupt
>    target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER8
>    target/ppc: add power-saving interrupt masking logic to p8_next_unmasked_interrupt
>    target/ppc: create an interrupt masking method for POWER7
>    target/ppc: remove unused interrupts from p7_next_unmasked_interrupt
>    target/ppc: create an interrupt deliver method for POWER7
>    target/ppc: remove unused interrupts from p7_deliver_interrupt
>    target/ppc: remove generic architecture checks from p7_deliver_interrupt
>    target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER7
>    target/ppc: add power-saving interrupt masking logic to p7_next_unmasked_interrupt
>    target/ppc: remove ppc_store_lpcr from CONFIG_USER_ONLY builds
>    target/ppc: introduce ppc_maybe_interrupt
>    target/ppc: unify cpu->has_work based on cs->interrupt_request
>    target/ppc: move the p*_interrupt_powersave methods to excp_helper.c
> 
>   hw/ppc/pnv_core.c        |   1 +
>   hw/ppc/ppc.c             |  17 +-
>   hw/ppc/spapr_hcall.c     |   6 +
>   hw/ppc/spapr_rtas.c      |   2 +-
>   hw/ppc/trace-events      |   2 +-
>   target/ppc/cpu.c         |   4 +
>   target/ppc/cpu.h         |  43 +-
>   target/ppc/cpu_init.c    | 212 +---------
>   target/ppc/excp_helper.c | 887 ++++++++++++++++++++++++++++++++++-----
>   target/ppc/helper.h      |   1 +
>   target/ppc/helper_regs.c |   2 +
>   target/ppc/misc_helper.c |  11 +-
>   target/ppc/translate.c   |   2 +
>   13 files changed, 833 insertions(+), 357 deletions(-)
>
Re: [PATCH v3 00/29] PowerPC interrupt rework
Posted by Daniel Henrique Barboza 3 years, 3 months ago

On 10/19/22 18:55, Daniel Henrique Barboza wrote:
> Matheus,
> 
> This series fails 'make check-avocado' in an e500 test. This is the error output:

Scrap that.

This avocado test is also failing on master 10% of the time, give or take.
It might be case that patch 27 makes the failure more consistent, but I can't
say it's the culprit.


I'll take a closer look and see if I can diagnose one particular commit that
is making the patch fail 1 out of 10 times. It can be case where I might need
to skip the test altogether.


Thanks,


Daniel


> 
> 
> & make -j && \
>      make check-avocado AVOCADO_TESTS=tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500
> 
> (...)
> 
> Fetching asset from tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500
> JOB ID     : 506b6b07bf40cf1bffcf3911a0f9b8948de6553c
> JOB LOG    : /home/danielhb/qemu/build/tests/results/job-2022-10-19T17.37-506b6b0/job.log
>   (1/1) tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '1-tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500', 'logdir': '/home/danielhb/qemu/build/tests/results/job-2022-10-19T17.37-506b6b0/test-... (120.31 s)
> RESULTS    : PASS 0 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 1 | CANCEL 0
> JOB TIME   : 121.00 s
> 
> 
> 
> 'git bisect' pointed the following commit as the culprit:
> 
> d9bdb6192edc5c74cda754a6cd32237b1b9272f0 is the first bad commit
> commit d9bdb6192edc5c74cda754a6cd32237b1b9272f0
> Author: Matheus Ferst <matheus.ferst@eldorado.org.br>
> Date:   Tue Oct 11 17:48:27 2022 -0300
> 
>      target/ppc: introduce ppc_maybe_interrupt
> 
> This would be patch 27.
> 
> 
> As a benchmark, this test when successful takes around 11 seconds in my test
> env:
> 
>   (33/42) tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500: PASS (11.02 s)
> 
> 
> Cedric's qemu-ppc-boot test suit works fine with this series, so I'd say that
> this avocado test is doing something else that is causing the problem.
> 
> 
> I'll test patches 1-26 later and see if all tests pass. In that case I'll push
> 1-26 to ppc-next and then you can work on 27-29.
> 
> 
> Thanks,
> 
> 
> Daniel
> 
> 
> 
> On 10/11/22 17:48, Matheus Ferst wrote:
>> Link to v2: https://lists.gnu.org/archive/html/qemu-ppc/2022-09/msg00556.html
>> This series is also available as a git branch: https://github.com/PPC64/qemu/tree/ferst-interrupt-fix-v3
>> Patches without review: 3-27
>>
>> This new version rebases the patch series on the current master and
>> fixes some problems pointed out by Fabiano on v2.
>>
>> Matheus Ferst (29):
>>    target/ppc: define PPC_INTERRUPT_* values directly
>>    target/ppc: always use ppc_set_irq to set env->pending_interrupts
>>    target/ppc: split interrupt masking and delivery from ppc_hw_interrupt
>>    target/ppc: prepare to split interrupt masking and delivery by excp_model
>>    target/ppc: create an interrupt masking method for POWER9/POWER10
>>    target/ppc: remove unused interrupts from p9_next_unmasked_interrupt
>>    target/ppc: create an interrupt deliver method for POWER9/POWER10
>>    target/ppc: remove unused interrupts from p9_deliver_interrupt
>>    target/ppc: remove generic architecture checks from p9_deliver_interrupt
>>    target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER9
>>    target/ppc: add power-saving interrupt masking logic to p9_next_unmasked_interrupt
>>    target/ppc: create an interrupt masking method for POWER8
>>    target/ppc: remove unused interrupts from p8_next_unmasked_interrupt
>>    target/ppc: create an interrupt deliver method for POWER8
>>    target/ppc: remove unused interrupts from p8_deliver_interrupt
>>    target/ppc: remove generic architecture checks from p8_deliver_interrupt
>>    target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER8
>>    target/ppc: add power-saving interrupt masking logic to p8_next_unmasked_interrupt
>>    target/ppc: create an interrupt masking method for POWER7
>>    target/ppc: remove unused interrupts from p7_next_unmasked_interrupt
>>    target/ppc: create an interrupt deliver method for POWER7
>>    target/ppc: remove unused interrupts from p7_deliver_interrupt
>>    target/ppc: remove generic architecture checks from p7_deliver_interrupt
>>    target/ppc: move power-saving interrupt masking out of cpu_has_work_POWER7
>>    target/ppc: add power-saving interrupt masking logic to p7_next_unmasked_interrupt
>>    target/ppc: remove ppc_store_lpcr from CONFIG_USER_ONLY builds
>>    target/ppc: introduce ppc_maybe_interrupt
>>    target/ppc: unify cpu->has_work based on cs->interrupt_request
>>    target/ppc: move the p*_interrupt_powersave methods to excp_helper.c
>>
>>   hw/ppc/pnv_core.c        |   1 +
>>   hw/ppc/ppc.c             |  17 +-
>>   hw/ppc/spapr_hcall.c     |   6 +
>>   hw/ppc/spapr_rtas.c      |   2 +-
>>   hw/ppc/trace-events      |   2 +-
>>   target/ppc/cpu.c         |   4 +
>>   target/ppc/cpu.h         |  43 +-
>>   target/ppc/cpu_init.c    | 212 +---------
>>   target/ppc/excp_helper.c | 887 ++++++++++++++++++++++++++++++++++-----
>>   target/ppc/helper.h      |   1 +
>>   target/ppc/helper_regs.c |   2 +
>>   target/ppc/misc_helper.c |  11 +-
>>   target/ppc/translate.c   |   2 +
>>   13 files changed, 833 insertions(+), 357 deletions(-)
>>

Re: [PATCH v3 00/29] PowerPC interrupt rework
Posted by Matheus K. Ferst 3 years, 3 months ago
On 20/10/2022 08:18, Daniel Henrique Barboza wrote:
> On 10/19/22 18:55, Daniel Henrique Barboza wrote:
>> Matheus,
>>
>> This series fails 'make check-avocado' in an e500 test. This is the 
>> error output:
> 
> Scrap that.
> 
> This avocado test is also failing on master 10% of the time, give or take.
> It might be case that patch 27 makes the failure more consistent, but I 
> can't
> say it's the culprit.
> 
> 
> I'll take a closer look and see if I can diagnose one particular commit 
> that
> is making the patch fail 1 out of 10 times. It can be case where I might 
> need
> to skip the test altogether.
> 

Nice catch. I guess we need a gen_icount_io_start before calling 
helper_ppc_maybe_interrupt, so maybe it's better to make a 
gen_ppc_maybe_interrupt that calls icount and the helper. I'll give it a 
bit more testing and re-spin the series.

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>
Re: [PATCH v3 00/29] PowerPC interrupt rework
Posted by Daniel Henrique Barboza 3 years, 3 months ago
Matheus,

I did some digging yesterday. There are 2 distinct things happening:

- the apparent problem with the avocado test. After doing more and more tests
it seems like the test failure rate is lower than 10%. With a simple script
to exercise it in my laptop:

n=1
while [ 1 ]; do
	make -j check-avocado \
		AVOCADO_TESTS='tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500' ;
	if [ $? -ne 0 ]; then
		echo "test failed after $n interactions"
		exit 1
	fi
	((n=n+1))
done

In master I managed to get up to 100+ runs without failure. Sometimes I get 90,
50, 30 runs before failure and so on. This is an OK failure rate in my opinion,
so if any code contribution does not dramatically increase this failure rate I'm
fine with it. This also means that I'll not be skipping the test.

- back to this series, I couldn't manage to get a single successful run with
patch 27 applied. On the other hand, running the aforementioned script with
patches 1-26 I just got 96 test runs before the first failure. This is enough
evidence for me to believe that, yeah, patch 27 is really doing something that is
messing with the icount replay for e500 one way or the other.


All that said, patches 1-26 are queued in ppc-next.


On 10/20/22 10:40, Matheus K. Ferst wrote:
> On 20/10/2022 08:18, Daniel Henrique Barboza wrote:
>> On 10/19/22 18:55, Daniel Henrique Barboza wrote:
>>> Matheus,
>>>
>>> This series fails 'make check-avocado' in an e500 test. This is the error output:
>>
>> Scrap that.
>>
>> This avocado test is also failing on master 10% of the time, give or take.
>> It might be case that patch 27 makes the failure more consistent, but I can't
>> say it's the culprit.
>>
>>
>> I'll take a closer look and see if I can diagnose one particular commit that
>> is making the patch fail 1 out of 10 times. It can be case where I might need
>> to skip the test altogether.
>>
> 
> Nice catch. I guess we need a gen_icount_io_start before calling helper_ppc_maybe_interrupt, so maybe it's better to make a gen_ppc_maybe_interrupt that calls icount and the helper. I'll give it a bit more testing and re-spin the series.


Don't need to re-spin everything (unless you needed to do some changes in
the patches prior). Just resend patch 27+.


Thanks,


Daniel

> 
> Thanks,
> Matheus K. Ferst
> Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
> Analista de Software
> Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>
>
Re: [PATCH v3 00/29] PowerPC interrupt rework
Posted by Matheus K. Ferst 3 years, 3 months ago
On 21/10/2022 07:56, Daniel Henrique Barboza wrote:
> Matheus,
> 
> I did some digging yesterday. There are 2 distinct things happening:
> 
> - the apparent problem with the avocado test. After doing more and more 
> tests
> it seems like the test failure rate is lower than 10%. With a simple script
> to exercise it in my laptop:
> 
> n=1
> while [ 1 ]; do
>         make -j check-avocado \
>                 
> AVOCADO_TESTS='tests/avocado/replay_kernel.py:ReplayKernelNormal.test_ppc64_e500' ;
>         if [ $? -ne 0 ]; then
>                 echo "test failed after $n interactions"
>                 exit 1
>         fi
>         ((n=n+1))
> done
> 
> In master I managed to get up to 100+ runs without failure. Sometimes I 
> get 90,
> 50, 30 runs before failure and so on. This is an OK failure rate in my 
> opinion,
> so if any code contribution does not dramatically increase this failure 
> rate I'm
> fine with it. This also means that I'll not be skipping the test.
> 

Thanks for this testing, I suspect we may have more than one bug that 
causes this test failure.

> - back to this series, I couldn't manage to get a single successful run 
> with
> patch 27 applied. On the other hand, running the aforementioned script with
> patches 1-26 I just got 96 test runs before the first failure. This is 
> enough
> evidence for me to believe that, yeah, patch 27 is really doing 
> something that is
> messing with the icount replay for e500 one way or the other.
>
Patch 27 is definitely wrong - other places that write in special 
registers and SPRs that may cause an interrupt (e.g., 
gen_helper_store_decr, gen_mtmsr[d]) call gen_io_start, so we also 
should use it before helper_ppc_maybe_interrupt. Without that call, we 
hit the cpu_abort in icount_handle_interrupt when using icount if 
writee[i] unmasks a pending interrupt.

The current writee[i] may be wrong in not calling it too, as it may 
cause an interrupt to be delivered. However, before the interrupt 
rework, CPU_INTERRUPT_HARD was set somewhere else, so it wouldn't 
trigger the abort.

That said, even after adding this call I still see failures after ~200 
iterations of this test, so we may have more problems to tackle here. 
However, it's not a CPU abort anymore, the second QEMU invocation exits 
with zero without writing anything to the console.

> 
> All that said, patches 1-26 are queued in ppc-next.
> 
> 
> On 10/20/22 10:40, Matheus K. Ferst wrote:
>> On 20/10/2022 08:18, Daniel Henrique Barboza wrote:
>>> On 10/19/22 18:55, Daniel Henrique Barboza wrote:
>>>> Matheus,
>>>>
>>>> This series fails 'make check-avocado' in an e500 test. This is the 
>>>> error output:
>>>
>>> Scrap that.
>>>
>>> This avocado test is also failing on master 10% of the time, give or 
>>> take.
>>> It might be case that patch 27 makes the failure more consistent, but 
>>> I can't
>>> say it's the culprit.
>>>
>>>
>>> I'll take a closer look and see if I can diagnose one particular 
>>> commit that
>>> is making the patch fail 1 out of 10 times. It can be case where I 
>>> might need
>>> to skip the test altogether.
>>>
>>
>> Nice catch. I guess we need a gen_icount_io_start before calling 
>> helper_ppc_maybe_interrupt, so maybe it's better to make a 
>> gen_ppc_maybe_interrupt that calls icount and the helper. I'll give it 
>> a bit more testing and re-spin the series.
> 
> 
> Don't need to re-spin everything (unless you needed to do some changes in
> the patches prior). Just resend patch 27+.
> 
> 

Ok, I'll send 27-29 with based on ppc-next.

Thanks,
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>