[Qemu-devel] [PATCH v3] spapr: disable decrementer during reset

Nikunj A Dadhania posted 1 patch 6 years, 9 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170717041639.16137-1-nikunj@linux.vnet.ibm.com
Test FreeBSD passed
Test checkpatch passed
Test docker passed
Test s390x passed
hw/ppc/spapr_cpu_core.c | 9 +++++++++
hw/ppc/spapr_rtas.c     | 8 ++++++++
2 files changed, 17 insertions(+)
[Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Posted by Nikunj A Dadhania 6 years, 9 months ago
Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.

When reset happens, all the CPUs are in halted state. First CPU is brought out
of reset and secondary CPUs would be initialized by the guest kernel using a
rtas call start-cpu.

However, in case of TCG, decrementer interrupts keep on coming and waking the
secondary CPUs up.

These secondary CPUs would see the decrementer interrupt pending, which makes
cpu::has_work() to bring them out of wait loop and start executing
tcg_exec_cpu().

The problem with this is all the CPUs wake up and start booting SLOF image,
causing the following exception(4 CPUs TCG VM):

[   81.440850] reboot: Restarting system

SLOF
S
SLOF
SLOFLOF[0[0m **********************************************************************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
[0m **********************************************************************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
[0m *************************************m**********[?25l **********************************************************************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
***********************
QEMU Starting
 Build Date = Mar  3 2017 13:29:19
 FW Version = git-66d250ef0fd06bb8
ERROR: Flatten device tree not available!

 exception 300
SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8
ERROR: Flatten device tree not available!

 exception 300
SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8

During reset, disable decrementer interrupt for secondary CPUs and enable them
when the secondary CPUs are brought online by rtas start-cpu call.

Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
 hw/ppc/spapr_cpu_core.c | 9 +++++++++
 hw/ppc/spapr_rtas.c     | 8 ++++++++
 2 files changed, 17 insertions(+)

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index ea278ce..bbfe8c2 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -87,6 +87,15 @@ static void spapr_cpu_reset(void *opaque)
 
     env->spr[SPR_HIOR] = 0;
 
+    /* Disable DECR for secondary cpus */
+    if (cs != first_cpu) {
+        if (env->mmu_model == POWERPC_MMU_3_00) {
+            env->spr[SPR_LPCR] &= ~LPCR_DEE;
+        } else {
+            /* P7 and P8 both have same bit for DECR */
+            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
+        }
+    }
     /*
      * This is a hack for the benefit of KVM PR - it abuses the SDR1
      * slot in kvm_sregs to communicate the userspace address of the
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 94a2799..4623d1d 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -174,6 +174,14 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
         kvm_cpu_synchronize_state(cs);
 
         env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
+
+        /* Enable DECR interrupt */
+        if (env->mmu_model == POWERPC_MMU_3_00) {
+            env->spr[SPR_LPCR] |= LPCR_DEE;
+        } else {
+            /* P7 and P8 both have same bit for DECR */
+            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
+        }
         env->nip = start;
         env->gpr[3] = r3;
         cs->halted = 0;
-- 
2.9.3


Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Posted by David Gibson 6 years, 9 months ago
On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
> 
> When reset happens, all the CPUs are in halted state. First CPU is brought out
> of reset and secondary CPUs would be initialized by the guest kernel using a
> rtas call start-cpu.
> 
> However, in case of TCG, decrementer interrupts keep on coming and waking the
> secondary CPUs up.
> 
> These secondary CPUs would see the decrementer interrupt pending, which makes
> cpu::has_work() to bring them out of wait loop and start executing
> tcg_exec_cpu().
> 
> The problem with this is all the CPUs wake up and start booting SLOF image,
> causing the following exception(4 CPUs TCG VM):

Ok, I'm still trying to understand why the behaviour on reboot is
different from the first boot.  AFAICT on initial boot, the LPCR will
have DEE / PECE3 enabled.  So why aren't we getting the same problem
then?


> 
> [   81.440850] reboot: Restarting system
> 
> SLOF
> S
> SLOF
> SLOFLOF[0[0m **********************************************************************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> [0m **********************************************************************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> [0m *************************************m**********[?25l **********************************************************************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> ***********************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> ERROR: Flatten device tree not available!
> 
>  exception 300
> SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
> SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8
> ERROR: Flatten device tree not available!
> 
>  exception 300
> SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
> SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8
> 
> During reset, disable decrementer interrupt for secondary CPUs and enable them
> when the secondary CPUs are brought online by rtas start-cpu call.
> 
> Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_cpu_core.c | 9 +++++++++
>  hw/ppc/spapr_rtas.c     | 8 ++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index ea278ce..bbfe8c2 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -87,6 +87,15 @@ static void spapr_cpu_reset(void *opaque)
>  
>      env->spr[SPR_HIOR] = 0;
>  
> +    /* Disable DECR for secondary cpus */
> +    if (cs != first_cpu) {
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>      /*
>       * This is a hack for the benefit of KVM PR - it abuses the SDR1
>       * slot in kvm_sregs to communicate the userspace address of the
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 94a2799..4623d1d 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -174,6 +174,14 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
>          kvm_cpu_synchronize_state(cs);
>  
>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
> +
> +        /* Enable DECR interrupt */
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] |= LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
> +        }
>          env->nip = start;
>          env->gpr[3] = r3;
>          cs->halted = 0;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Posted by Nikunj A Dadhania 6 years, 9 months ago
David Gibson <david@gibson.dropbear.id.au> writes:

> On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
>> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>> 
>> When reset happens, all the CPUs are in halted state. First CPU is brought out
>> of reset and secondary CPUs would be initialized by the guest kernel using a
>> rtas call start-cpu.
>> 
>> However, in case of TCG, decrementer interrupts keep on coming and waking the
>> secondary CPUs up.
>> 
>> These secondary CPUs would see the decrementer interrupt pending, which makes
>> cpu::has_work() to bring them out of wait loop and start executing
>> tcg_exec_cpu().
>> 
>> The problem with this is all the CPUs wake up and start booting SLOF image,
>> causing the following exception(4 CPUs TCG VM):
>
> Ok, I'm still trying to understand why the behaviour on reboot is
> different from the first boot.

During first boot, the cpu is in the stopped state, so
cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
until rtas start-cpu. Therefore, we never check the cpu_has_work()

In case of reboot, all CPUs are resumed after reboot. So we check the
next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
DECR interrupt and remove the CPU from halted state as the CPU has work.

> AFAICT on initial boot, the LPCR will
> have DEE / PECE3 enabled.  So why aren't we getting the same problem
> then?

Regards
Nikunj


Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Posted by Nikunj A Dadhania 6 years, 9 months ago
David Gibson <david@gibson.dropbear.id.au> writes:

> On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
>> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>> 
>> When reset happens, all the CPUs are in halted state. First CPU is brought out
>> of reset and secondary CPUs would be initialized by the guest kernel using a
>> rtas call start-cpu.
>> 
>> However, in case of TCG, decrementer interrupts keep on coming and waking the
>> secondary CPUs up.
>> 
>> These secondary CPUs would see the decrementer interrupt pending, which makes
>> cpu::has_work() to bring them out of wait loop and start executing
>> tcg_exec_cpu().
>> 
>> The problem with this is all the CPUs wake up and start booting SLOF image,
>> causing the following exception(4 CPUs TCG VM):
>
> Ok, I'm still trying to understand why the behaviour on reboot is
> different from the first boot.

During first boot, the cpu is in the stopped state, so
cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
until rtas start-cpu. Therefore, we never check the cpu_has_work()

In case of reboot, all CPUs are resumed after reboot. So we check the
next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
DECR interrupt and remove the CPU from halted state as the CPU has work.

> AFAICT on initial boot, the LPCR will
> have DEE / PECE3 enabled.  So why aren't we getting the same problem
> then?

Regards
Nikunj


Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Posted by David Gibson 6 years, 9 months ago
On Tue, Jul 18, 2017 at 10:53:01AM +0530, Nikunj A Dadhania wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
> >> 
> >> When reset happens, all the CPUs are in halted state. First CPU is brought out
> >> of reset and secondary CPUs would be initialized by the guest kernel using a
> >> rtas call start-cpu.
> >> 
> >> However, in case of TCG, decrementer interrupts keep on coming and waking the
> >> secondary CPUs up.
> >> 
> >> These secondary CPUs would see the decrementer interrupt pending, which makes
> >> cpu::has_work() to bring them out of wait loop and start executing
> >> tcg_exec_cpu().
> >> 
> >> The problem with this is all the CPUs wake up and start booting SLOF image,
> >> causing the following exception(4 CPUs TCG VM):
> >
> > Ok, I'm still trying to understand why the behaviour on reboot is
> > different from the first boot.
> 
> During first boot, the cpu is in the stopped state, so
> cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
> until rtas start-cpu. Therefore, we never check the cpu_has_work()
> 
> In case of reboot, all CPUs are resumed after reboot. So we check the
> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
> DECR interrupt and remove the CPU from halted state as the CPU has
> work.

Ok, so it sounds like we should set stopped on all the secondary CPUs
on reset as well.  What's causing them to be resumed after the reset
at the moment?

> > AFAICT on initial boot, the LPCR will
> > have DEE / PECE3 enabled.  So why aren't we getting the same problem
> > then?
> 
> Regards
> Nikunj
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Posted by Nikunj A Dadhania 6 years, 9 months ago
David Gibson <david@gibson.dropbear.id.au> writes:

> On Tue, Jul 18, 2017 at 10:53:01AM +0530, Nikunj A Dadhania wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>> 
>> > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
>> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>> >> 
>> >> When reset happens, all the CPUs are in halted state. First CPU is brought out
>> >> of reset and secondary CPUs would be initialized by the guest kernel using a
>> >> rtas call start-cpu.
>> >> 
>> >> However, in case of TCG, decrementer interrupts keep on coming and waking the
>> >> secondary CPUs up.
>> >> 
>> >> These secondary CPUs would see the decrementer interrupt pending, which makes
>> >> cpu::has_work() to bring them out of wait loop and start executing
>> >> tcg_exec_cpu().
>> >> 
>> >> The problem with this is all the CPUs wake up and start booting SLOF image,
>> >> causing the following exception(4 CPUs TCG VM):
>> >
>> > Ok, I'm still trying to understand why the behaviour on reboot is
>> > different from the first boot.
>> 
>> During first boot, the cpu is in the stopped state, so
>> cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
>> until rtas start-cpu. Therefore, we never check the cpu_has_work()
>> 
>> In case of reboot, all CPUs are resumed after reboot. So we check the
>> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
>> DECR interrupt and remove the CPU from halted state as the CPU has
>> work.
>
> Ok, so it sounds like we should set stopped on all the secondary CPUs
> on reset as well.  What's causing them to be resumed after the reset
> at the moment?

That is part of the main loop in vl.c, when reset is requested. All the
vcpus are paused (stopped == true) then system reset is issued, and all
cpus are resumed (stopped == false). Which is correct.

static bool main_loop_should_exit(void)
{
[...]
    request = qemu_reset_requested();
    if (request) {
        pause_all_vcpus();
        qemu_system_reset(request);
        resume_all_vcpus();
        if (!runstate_check(RUN_STATE_RUNNING) &&
                !runstate_check(RUN_STATE_INMIGRATE)) {
            runstate_set(RUN_STATE_PRELAUNCH);
        }
    }
[...]
}

The CPUs are in halted state, i.e. cpu::halted = 1. We have set
cpu::halted = 0 for the primary CPU, aka first_cpu, in
spapr.c::ppc_spapr_reset()

In case of TCG, we have a decr callback (per CPU) scheduled once the
machine is started which keeps on running and interrupting irrespective
of the state of the machine.

tb_env->decr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_decr_cb, cpu);

Which keeps on queueing the interrupt to the CPUs. All the other CPUs
are in a tight loop checking cpu_thread_is_idle(), which returns false
when it sees the decrementer interrupt, the cpu starts executing:

cpu_exec()
  -> cpu_handle_halt()
     -> sees cpu_has_work() and sets cpu->halted = 0

And the execution resumes, when it shouldnt have. Ideally, for secondary
CPUs, cpu::halted flag is cleared in rtas start-cpu call.

Initially, I thought we should not have interrupt during reset state.
That was the reason of ignoring decr interrupt when msr_ee was disabled
in my previous patch. BenH suggested that it is wrong from HW
perspective.

Regards,
Nikunj


Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Posted by David Gibson 6 years, 9 months ago
On Wed, Jul 19, 2017 at 09:20:52AM +0530, Nikunj A Dadhania wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Tue, Jul 18, 2017 at 10:53:01AM +0530, Nikunj A Dadhania wrote:
> >> David Gibson <david@gibson.dropbear.id.au> writes:
> >> 
> >> > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
> >> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
> >> >> 
> >> >> When reset happens, all the CPUs are in halted state. First CPU is brought out
> >> >> of reset and secondary CPUs would be initialized by the guest kernel using a
> >> >> rtas call start-cpu.
> >> >> 
> >> >> However, in case of TCG, decrementer interrupts keep on coming and waking the
> >> >> secondary CPUs up.
> >> >> 
> >> >> These secondary CPUs would see the decrementer interrupt pending, which makes
> >> >> cpu::has_work() to bring them out of wait loop and start executing
> >> >> tcg_exec_cpu().
> >> >> 
> >> >> The problem with this is all the CPUs wake up and start booting SLOF image,
> >> >> causing the following exception(4 CPUs TCG VM):
> >> >
> >> > Ok, I'm still trying to understand why the behaviour on reboot is
> >> > different from the first boot.
> >> 
> >> During first boot, the cpu is in the stopped state, so
> >> cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
> >> until rtas start-cpu. Therefore, we never check the cpu_has_work()
> >> 
> >> In case of reboot, all CPUs are resumed after reboot. So we check the
> >> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
> >> DECR interrupt and remove the CPU from halted state as the CPU has
> >> work.
> >
> > Ok, so it sounds like we should set stopped on all the secondary CPUs
> > on reset as well.  What's causing them to be resumed after the reset
> > at the moment?
> 
> That is part of the main loop in vl.c, when reset is requested. All the
> vcpus are paused (stopped == true) then system reset is issued, and all
> cpus are resumed (stopped == false). Which is correct.

is it?  Seems we have a different value of 'stopped' on the first boot
compared to reoboots, which doesn't seem right.

> static bool main_loop_should_exit(void)
> {
> [...]
>     request = qemu_reset_requested();
>     if (request) {
>         pause_all_vcpus();
>         qemu_system_reset(request);
>         resume_all_vcpus();
>         if (!runstate_check(RUN_STATE_RUNNING) &&
>                 !runstate_check(RUN_STATE_INMIGRATE)) {
>             runstate_set(RUN_STATE_PRELAUNCH);
>         }
>     }
> [...]
> }
> 
> The CPUs are in halted state, i.e. cpu::halted = 1. We have set
> cpu::halted = 0 for the primary CPU, aka first_cpu, in
> spapr.c::ppc_spapr_reset()
> 
> In case of TCG, we have a decr callback (per CPU) scheduled once the
> machine is started which keeps on running and interrupting irrespective
> of the state of the machine.

Right.  The thing is "halted" means waiting-for-interrupt; it's mostly
used for things like x86 hlt instruction, H_CEDE, short-term nap modes
and so forth.  We're abusing it a little for keeping the secondary
CPUs offline until they're explicitly started.

Trouble is, there isn't a clearly better option - stopped is
automatically turned off after reset as above, so it can't be used for
"stopped under firmware / hypervisor authority".

> tb_env->decr_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, &cpu_ppc_decr_cb, cpu);
> 
> Which keeps on queueing the interrupt to the CPUs. All the other CPUs
> are in a tight loop checking cpu_thread_is_idle(), which returns false
> when it sees the decrementer interrupt, the cpu starts executing:
> 
> cpu_exec()
>   -> cpu_handle_halt()
>      -> sees cpu_has_work() and sets cpu->halted = 0
> 
> And the execution resumes, when it shouldnt have. Ideally, for secondary
> CPUs, cpu::halted flag is cleared in rtas start-cpu call.
> 
> Initially, I thought we should not have interrupt during reset state.
> That was the reason of ignoring decr interrupt when msr_ee was disabled
> in my previous patch. BenH suggested that it is wrong from HW
> perspective.
> 
> Regards,
> Nikunj
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Posted by Nikunj A Dadhania 6 years, 7 months ago
David Gibson <david@gibson.dropbear.id.au> writes:

> On Wed, Jul 19, 2017 at 09:20:52AM +0530, Nikunj A Dadhania wrote:
>> David Gibson <david@gibson.dropbear.id.au> writes:
>> 
>> > On Tue, Jul 18, 2017 at 10:53:01AM +0530, Nikunj A Dadhania wrote:
>> >> David Gibson <david@gibson.dropbear.id.au> writes:
>> >> 
>> >> > On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
>> >> >> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>> >> >> 
>> >> >> When reset happens, all the CPUs are in halted state. First CPU is brought out
>> >> >> of reset and secondary CPUs would be initialized by the guest kernel using a
>> >> >> rtas call start-cpu.
>> >> >> 
>> >> >> However, in case of TCG, decrementer interrupts keep on coming and waking the
>> >> >> secondary CPUs up.
>> >> >> 
>> >> >> These secondary CPUs would see the decrementer interrupt pending, which makes
>> >> >> cpu::has_work() to bring them out of wait loop and start executing
>> >> >> tcg_exec_cpu().
>> >> >> 
>> >> >> The problem with this is all the CPUs wake up and start booting SLOF image,
>> >> >> causing the following exception(4 CPUs TCG VM):
>> >> >
>> >> > Ok, I'm still trying to understand why the behaviour on reboot is
>> >> > different from the first boot.
>> >> 
>> >> During first boot, the cpu is in the stopped state, so
>> >> cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
>> >> until rtas start-cpu. Therefore, we never check the cpu_has_work()
>> >> 
>> >> In case of reboot, all CPUs are resumed after reboot. So we check the
>> >> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
>> >> DECR interrupt and remove the CPU from halted state as the CPU has
>> >> work.
>> >
>> > Ok, so it sounds like we should set stopped on all the secondary CPUs
>> > on reset as well.  What's causing them to be resumed after the reset
>> > at the moment?
>> 
>> That is part of the main loop in vl.c, when reset is requested. All the
>> vcpus are paused (stopped == true) then system reset is issued, and all
>> cpus are resumed (stopped == false). Which is correct.
>
> is it?  Seems we have a different value of 'stopped' on the first boot
> compared to reoboots, which doesn't seem right.

I have checked this with more debugging prints (patch at the end), as
you said, first boot and reboot does not have different value of
cpu::stopped

     cpu_ppc_decr_excp-cpu1: stop 0 stopped 1 halted 0 SPR_LPCR 0
     spapr_cpu_reset-cpu1: stop 0 stopped 1 halted 1 SPR_LPCR 2000
     spapr_cpu_reset-cpu1: stop 0 stopped 1 halted 1 SPR_LPCR 2000
     resume_all_vcpus-cpu0: stop 0 stopped 0 halted 0
     resume_all_vcpus-cpu1: stop 0 stopped 0 halted 1
     
     
     SLOF **********************************************************************
     QEMU Starting
     
     [ boot fine and then we reboot ]


     cpu_ppc_decr_excp-cpu1: stop 0 stopped 0 halted 0 SPR_LPCR 2000
     cpu_ppc_decr_excp-cpu1: stop 0 stopped 0 halted 0 SPR_LPCR 2000
     cpu_ppc_decr_excp-cpu1: stop 0 stopped 0 halted 0 SPR_LPCR 2000
     cpu_ppc_decr_excp-cpu1: stop 0 stopped 0 halted 1 SPR_LPCR 2000
     [   54.044388] reboot: Restarting system
     spapr_cpu_reset-cpu1: stop 0 stopped 1 halted 1 SPR_LPCR 2000
     resume_all_vcpus-cpu0: stop 0 stopped 0 halted 0
     resume_all_vcpus-cpu1: stop 0 stopped 0 halted 1
     cpu_ppc_decr_excp-cpu1: stop 0 stopped 0 halted 1 SPR_LPCR 2000
     
     
     
     
     SSLLOOFF ***[0m **********************************************************************
     QEMU Starting
      Build Date = Aug 16 2017 12:38:57
      FW Version = git-685af54d8a47a42d
     *******************************************************************
     QEMU Starting
      Build Date = Aug 16 2017 12:38:57
      FW Version = git-685af54d8a47a42d
     ERROR: Flatten device tree not available!
     
      SPRG2 = RSPRG3 = 00000000000000000 0 

One difference I see here is, the decr exception is delivered before
reset in case of first boot for secondary cpu, and then I do not see any
decr exception until rtas-start.

In case of a reboot, we are getting decr_exception at some interval,
then there is spapr_cpu_reset, after that the cpus are resumed, the
interrupt gets delivered just after that which brings out the CPU-1 from
halted state. Other thing is, could it be a stale interrupt, delivered
late? As I do not see any such prints after that.

Regards
Nikunj




diff --git a/cpus.c b/cpus.c
index 9bed61e..f6cfe65 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1638,6 +1638,8 @@ void resume_all_vcpus(void)
     qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
     CPU_FOREACH(cpu) {
         cpu_resume(cpu);
+        fprintf(stderr, "%s-cpu%d: stop %d stopped %d halted %d\n",
+                __func__, cpu->cpu_index, cpu->stop, cpu->stopped, cpu->halted);
     }
 }
 
diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
index 224184d..14823a8 100644
--- a/hw/ppc/ppc.c
+++ b/hw/ppc/ppc.c
@@ -702,8 +702,14 @@ uint64_t cpu_ppc_load_purr (CPUPPCState *env)
  */
 static inline void cpu_ppc_decr_excp(PowerPCCPU *cpu)
 {
+    CPUState *cs = CPU(cpu);
+    CPUPPCState *env = &cpu->env;
     /* Raise it */
     LOG_TB("raise decrementer exception\n");
+    if (first_cpu != cs) {
+        fprintf(stderr, "%s-cpu%d: stop %d stopped %d halted %d SPR_LPCR %llx\n",
+                __func__, cs->cpu_index, cs->stop, cs->stopped, cs->halted, (env->spr[SPR_LPCR] & LPCR_P8_PECE3));
+    }
     ppc_set_irq(cpu, PPC_INTERRUPT_DECR, 1);
 }
 
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index ea278ce..5d01081 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -86,6 +86,10 @@ static void spapr_cpu_reset(void *opaque)
     cs->halted = 1;
 
     env->spr[SPR_HIOR] = 0;
+    if (first_cpu != cs) {
+        fprintf(stderr, "%s-cpu%d: stop %d stopped %d halted %d SPR_LPCR %llx\n",
+                __func__, cs->cpu_index, cs->stop, cs->stopped, cs->halted, (env->spr[SPR_LPCR] & LPCR_P8_PECE3));
+    }
 
     /*
      * This is a hack for the benefit of KVM PR - it abuses the SDR1



Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Posted by Nikunj A Dadhania 6 years, 9 months ago
David Gibson <david@gibson.dropbear.id.au> writes:

> On Mon, Jul 17, 2017 at 09:46:39AM +0530, Nikunj A Dadhania wrote:
>> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
>> 
>> When reset happens, all the CPUs are in halted state. First CPU is brought out
>> of reset and secondary CPUs would be initialized by the guest kernel using a
>> rtas call start-cpu.
>> 
>> However, in case of TCG, decrementer interrupts keep on coming and waking the
>> secondary CPUs up.
>> 
>> These secondary CPUs would see the decrementer interrupt pending, which makes
>> cpu::has_work() to bring them out of wait loop and start executing
>> tcg_exec_cpu().
>> 
>> The problem with this is all the CPUs wake up and start booting SLOF image,
>> causing the following exception(4 CPUs TCG VM):
>
> Ok, I'm still trying to understand why the behaviour on reboot is
> different from the first boot.

During first boot, the cpu is in the stopped state, so
cpus.c:cpu_thread_is_idle returns true and CPU remains in halted state
until rtas start-cpu. Therefore, we never check the cpu_has_work()

In case of reboot, all CPUs are resumed after reboot. So we check the
next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
DECR interrupt and remove the CPU from halted state as the CPU has work.

> AFAICT on initial boot, the LPCR will
> have DEE / PECE3 enabled.  So why aren't we getting the same problem
> then?

Regards
Nikunj


Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Posted by Benjamin Herrenschmidt 6 years, 9 months ago
On Tue, 2017-07-18 at 10:56 +0530, Nikunj A Dadhania wrote:
> In case of reboot, all CPUs are resumed after reboot. So we check the
> next condition cpu_has_work() in cpu_thread_is_idle(), where we see a
> DECR interrupt and remove the CPU from halted state as the CPU has work.

Shouldn't we put them back in the stopped state then ?

Cheers,
Ben.


Re: [Qemu-devel] [PATCH v3] spapr: disable decrementer during reset
Posted by Cédric Le Goater 6 years, 6 months ago
On 07/17/2017 06:16 AM, Nikunj A Dadhania wrote:
> Rebooting a SMP TCG guest is broken for both single/multi threaded TCG.
> 
> When reset happens, all the CPUs are in halted state. First CPU is brought out
> of reset and secondary CPUs would be initialized by the guest kernel using a
> rtas call start-cpu.
> 
> However, in case of TCG, decrementer interrupts keep on coming and waking the
> secondary CPUs up.
> 
> These secondary CPUs would see the decrementer interrupt pending, which makes
> cpu::has_work() to bring them out of wait loop and start executing
> tcg_exec_cpu().
> 
> The problem with this is all the CPUs wake up and start booting SLOF image,
> causing the following exception(4 CPUs TCG VM):
> 
> [   81.440850] reboot: Restarting system
> 
> SLOF
> S
> SLOF
> SLOFLOF[0[0m **********************************************************************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> [0m **********************************************************************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> [0m *************************************m**********[?25l **********************************************************************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> ***********************
> QEMU Starting
>  Build Date = Mar  3 2017 13:29:19
>  FW Version = git-66d250ef0fd06bb8
> ERROR: Flatten device tree not available!
> 
>  exception 300
> SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
> SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8
> ERROR: Flatten device tree not available!
> 
>  exception 300
> SRR0 = 00000000000060e4  SRR1 = 800000008000000000000000
> SPRG2 = 0000000000400000  SPRG3 = 0000000000004bd8
> 
> During reset, disable decrementer interrupt for secondary CPUs and enable them
> when the secondary CPUs are brought online by rtas start-cpu call.
> 
> Reported-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr_cpu_core.c | 9 +++++++++
>  hw/ppc/spapr_rtas.c     | 8 ++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index ea278ce..bbfe8c2 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -87,6 +87,15 @@ static void spapr_cpu_reset(void *opaque)
>  
>      env->spr[SPR_HIOR] = 0;
>  
> +    /* Disable DECR for secondary cpus */
> +    if (cs != first_cpu) {
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>      /*
>       * This is a hack for the benefit of KVM PR - it abuses the SDR1
>       * slot in kvm_sregs to communicate the userspace address of the
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 94a2799..4623d1d 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -174,6 +174,14 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPRMachineState *spapr,
>          kvm_cpu_synchronize_state(cs);
>  
>          env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
> +
> +        /* Enable DECR interrupt */
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] |= LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] |= LPCR_P8_PECE3;
> +        }
>          env->nip = start;
>          env->gpr[3] = r3;
>          cs->halted = 0;


you should also disable the decrementer in the stop-cpu call
when a cpu is hot unplugged.

One thing I don't explain is why the decrementer interrupt 
does not wake up the cpu after being  hot unplugged. I am not 
sure the code doing :

	env->msr = 0;

is responsible of this behavior. 

With the xive patchset, I am having issues in that area. 
the decrementer wakes up the cpu just after hot unplugged
and I need to set the LPCR to disable it. 

I wonder why XICS is immune to that. 

C.