[Qemu-devel] [PATCH v2] spapr: ignore decr interrupts when MSR_EE is disabled

Nikunj A Dadhania posted 1 patch 6 years, 9 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170714040437.22034-1-nikunj@linux.vnet.ibm.com
Test FreeBSD passed
Test checkpatch passed
Test docker passed
Test s390x passed
target/ppc/translate_init.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
[Qemu-devel] [PATCH v2] spapr: ignore decr interrupts when MSR_EE is disabled
Posted by Nikunj A Dadhania 6 years, 9 months ago
Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.

When reset happens, all the CPUs are in halted state. First CPU is brought out
of reset and secondary CPUs would be initialized by the guest kernel using a
rtas call start-cpu.

However, in case of TCG, decrementer interrupts keep on coming and waking the
secondary CPUs up.

These secondary CPUs would see the decrementer interrupt pending, which makes
cpu::has_work() to bring them out of wait loop and start executing
tcg_exec_cpu().

The problem with this is all the CPUs wake up and start booting SLOF image,
causing the following exception(4 CPUs TCG VM):

[   81.440850] reboot: Restarting system

SLOF
S
SLOF
SLOFLOF[0[0m **********************************************************************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
[0m **********************************************************************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
[0m *************************************m**********[?25l **********************************************************************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
***********************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
ERROR: Flatten device tree not available!

 exception 300
SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8
ERROR: Flatten device tree not available!

 exception 300
SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8

Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 target/ppc/translate_init.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/target/ppc/translate_init.c b/target/ppc/translate_init.c
index 783bf98..8b98076 100644
--- a/target/ppc/translate_init.c
+++ b/target/ppc/translate_init.c
@@ -8536,7 +8536,8 @@ static bool cpu_has_work_POWER7(CPUState *cs)
         }
         if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
             (env->spr[SPR_LPCR] & LPCR_P7_PECE1)) {
-            return true;
+            /* Return true only when MSR_EE is enabled */
+            return msr_ee;
         }
         if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) &&
             (env->spr[SPR_LPCR] & LPCR_P7_PECE2)) {
@@ -8693,7 +8694,8 @@ static bool cpu_has_work_POWER8(CPUState *cs)
         }
         if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
             (env->spr[SPR_LPCR] & LPCR_P8_PECE3)) {
-            return true;
+            /* Return true only when MSR_EE is enabled */
+            return msr_ee;
         }
         if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) &&
             (env->spr[SPR_LPCR] & LPCR_P8_PECE4)) {
@@ -8876,7 +8878,8 @@ static bool cpu_has_work_POWER9(CPUState *cs)
         /* Decrementer Exception */
         if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
             (env->spr[SPR_LPCR] & LPCR_DEE)) {
-            return true;
+            /* Return true only when MSR_EE is enabled */
+            return msr_ee;
         }
         /* Machine Check or Hypervisor Maintenance Exception */
         if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK |
-- 
2.9.3


Re: [Qemu-devel] [Qemu-ppc] [PATCH v2] spapr: ignore decr interrupts when MSR_EE is disabled
Posted by Benjamin Herrenschmidt 6 years, 9 months ago
On Fri, 2017-07-14 at 09:34 +0530, Nikunj A Dadhania wrote:
> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
> 
> When reset happens, all the CPUs are in halted state. First CPU is brought out
> of reset and secondary CPUs would be initialized by the guest kernel using a
> rtas call start-cpu.
> 
> However, in case of TCG, decrementer interrupts keep on coming and waking the
> secondary CPUs up.
> 
> These secondary CPUs would see the decrementer interrupt pending, which makes
> cpu::has_work() to bring them out of wait loop and start executing
> tcg_exec_cpu().
> 
> The problem with this is all the CPUs wake up and start booting SLOF image,
> causing the following exception(4 CPUs TCG VM):

This patch isn't right....

> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -8536,7 +8536,8 @@ static bool cpu_has_work_POWER7(CPUState *cs)
>          }
>          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
>              (env->spr[SPR_LPCR] & LPCR_P7_PECE1)) {
> -            return true;
> +            /* Return true only when MSR_EE is enabled */
> +            return msr_ee;
>          }

No, the HW will not check MSR_EE, in fact, MSR_EE can purposefully be
left off and we might expect a wakeup still.

I think the right fix is for reset to clear the LPCR PECE bits.

 Cheers
Ben.
 
> >          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) &&
>              (env->spr[SPR_LPCR] & LPCR_P7_PECE2)) {
> @@ -8693,7 +8694,8 @@ static bool cpu_has_work_POWER8(CPUState *cs)
>          }
>          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
>              (env->spr[SPR_LPCR] & LPCR_P8_PECE3)) {
> -            return true;
> +            /* Return true only when MSR_EE is enabled */
> +            return msr_ee;
>          }
>          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK)) &&
>              (env->spr[SPR_LPCR] & LPCR_P8_PECE4)) {
> @@ -8876,7 +8878,8 @@ static bool cpu_has_work_POWER9(CPUState *cs)
>          /* Decrementer Exception */
>          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
>              (env->spr[SPR_LPCR] & LPCR_DEE)) {
> -            return true;
> +            /* Return true only when MSR_EE is enabled */
> +            return msr_ee;
>          }
>          /* Machine Check or Hypervisor Maintenance Exception */
>          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_MCK |


Re: [Qemu-devel] [Qemu-ppc] [PATCH v2] spapr: ignore decr interrupts when MSR_EE is disabled
Posted by Nikunj Dadhania 6 years, 9 months ago
On 15-Jul-2017 8:10 AM, "Benjamin Herrenschmidt" <benh@au1.ibm.com> wrote:

On Fri, 2017-07-14 at 09:34 +0530, Nikunj A Dadhania wrote:
> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>
> When reset happens, all the CPUs are in halted state. First CPU is
brought out
> of reset and secondary CPUs would be initialized by the guest kernel
using a
> rtas call start-cpu.
>
> However, in case of TCG, decrementer interrupts keep on coming and waking
the
> secondary CPUs up.
>
> These secondary CPUs would see the decrementer interrupt pending, which
makes
> cpu::has_work() to bring them out of wait loop and start executing
> tcg_exec_cpu().
>
> The problem with this is all the CPUs wake up and start booting SLOF
image,
> causing the following exception(4 CPUs TCG VM):

This patch isn't right....

> --- a/target/ppc/translate_init.c
> +++ b/target/ppc/translate_init.c
> @@ -8536,7 +8536,8 @@ static bool cpu_has_work_POWER7(CPUState *cs)
>          }
>          if ((env->pending_interrupts & (1u << PPC_INTERRUPT_DECR)) &&
>              (env->spr[SPR_LPCR] & LPCR_P7_PECE1)) {
> -            return true;
> +            /* Return true only when MSR_EE is enabled */
> +            return msr_ee;
>          }

No, the HW will not check MSR_EE, in fact, MSR_EE can purposefully be
left off and we might expect a wakeup still.


Here we had a decr timer callback which is started during machine init and
remains on even during reset.

In case of HW, does the decr stop on reset?


I think the right fix is for reset to clear the LPCR PECE bits.


Ok, let me try that out.

Regards
Nikunj