[Qemu-devel] [PATCH 0/2] disable the decrementer interrupt when a CPU is unplugged

Cédric Le Goater posted 2 patches 6 years, 6 months ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20171005164959.26024-1-clg@kaod.org
Test checkpatch passed
Test docker passed
Test s390x passed
There is a newer version of this series
[Qemu-devel] [PATCH 0/2] disable the decrementer interrupt when a CPU is unplugged
Posted by Cédric Le Goater 6 years, 6 months ago
Hello,

When a CPU is stopped with the 'stop-self' RTAS call, its 'halted'
state is set to 1 and, in this case, the MSR is no longer taken into
account in the cpu_has_work() routine. Only the pending hardware
interrupts are checked, against their LPCR:PECE* enable bits.
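To make the wakeup rule above concrete, here is a minimal C sketch of how a halted CPU might decide whether it has work. Everything in it is hypothetical: `struct sketch_cpu`, `sketch_has_work()` and the `SK_*` macros are illustrative names, not QEMU's `cpu_has_work()` or its `LPCR_*` definitions, and the bit positions (49 for EEE, 50 for DEE) are assumed from our reading of the ISA 3.0 LPCR layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit SPR. */
#define PPC_BIT(n) (1ULL << (63 - (n)))

/* Hypothetical names; positions assumed from the ISA 3.0 LPCR layout. */
#define SK_LPCR_EEE PPC_BIT(49) /* external exception exit enable */
#define SK_LPCR_DEE PPC_BIT(50) /* decrementer exit enable */

/* Hypothetical pending-interrupt flags. */
#define SK_IRQ_EXT  (1u << 0)
#define SK_IRQ_DECR (1u << 1)

/* Hypothetical, heavily simplified CPU model (not QEMU's CPUPPCState). */
struct sketch_cpu {
    bool halted;
    bool msr_ee;       /* MSR[EE] */
    uint64_t lpcr;
    unsigned pending;  /* SK_IRQ_* bits */
};

static bool sketch_has_work(const struct sketch_cpu *cpu)
{
    assert(cpu != NULL);
    if (cpu->halted) {
        /* Halted: MSR[EE] is ignored, only the LPCR exit enables count. */
        if ((cpu->pending & SK_IRQ_EXT) && (cpu->lpcr & SK_LPCR_EEE)) {
            return true;
        }
        if ((cpu->pending & SK_IRQ_DECR) && (cpu->lpcr & SK_LPCR_DEE)) {
            return true;
        }
        return false;
    }
    /* Running: any pending interrupt counts once MSR[EE] is set. */
    return cpu->msr_ee && cpu->pending != 0;
}
```

In this model, a halted CPU with only a DECR pending reports work while DEE is set, even with MSR[EE] clear, and stays asleep once DEE is cleared.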

If the DECR timer fires after 'stop-self' is called and before the CPU
reaches the 'stop' state, the nearly-dead CPU will have some work to do
and the guest will crash. This case happens very frequently with the
not-yet-upstream P9 XIVE exploitation mode. In XICS mode, the DECR
occasionally fires too, but only after the 'stop' state is reached, so
there is no work to be done and the guest survives.

I suspect there is a race between the QEMU main loop triggering the
timers and the TCG CPU thread, but I could not quite identify the root
cause. To be safe, let's disable the decrementer interrupt in the LPCR
when the CPU is halted and re-enable it when the CPU is restarted.
Resetting the MSR is now pointless, so remove this dubious workaround.
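The proposed fix can be sketched as follows, as a hypothetical model rather than the actual patch: `sketch_rtas_stop_self()` and `sketch_rtas_start_cpu()` are illustrative stand-ins for the stop-self and start-cpu RTAS handlers, `struct sketch_cpu` is a simplified made-up CPU state, and only the assumed DEE bit (position 50, from our reading of the ISA 3.0 LPCR layout) is toggled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PPC_BIT(n) (1ULL << (63 - (n)))
#define SK_LPCR_DEE PPC_BIT(50) /* hypothetical name, assumed DECR exit enable */

/* Hypothetical, simplified CPU state. */
struct sketch_cpu {
    bool halted;
    uint64_t lpcr;
};

/* Hypothetical stand-in for the stop-self RTAS handler. */
static void sketch_rtas_stop_self(struct sketch_cpu *cpu)
{
    assert(cpu != NULL);
    /* Mask DECR wakeups first, so that in this model the CPU is never
     * halted with the decrementer still able to wake it. */
    cpu->lpcr &= ~SK_LPCR_DEE;
    cpu->halted = true;
}

/* Hypothetical stand-in for the start-cpu RTAS handler. */
static void sketch_rtas_start_cpu(struct sketch_cpu *cpu)
{
    assert(cpu != NULL);
    /* Re-enable DECR wakeups before the CPU runs again. */
    cpu->lpcr |= SK_LPCR_DEE;
    cpu->halted = false;
}
```

This sketch is single-threaded and does not model the suspected race between the main loop and the TCG CPU thread; it only shows where the enable bit is cleared and restored.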

Thanks,

C.

Cédric Le Goater (2):
  spapr/rtas: disable the decrementer interrupt when a CPU is unplugged
  spapr/rtas: do not reset the MSR in stop-self command

 hw/ppc/spapr_rtas.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

-- 
2.13.6


Re: [Qemu-devel] [PATCH 0/2] disable the decrementer interrupt when a CPU is unplugged
Posted by Nikunj A Dadhania 6 years, 6 months ago
Cédric Le Goater <clg@kaod.org> writes:

> Hello,
>
> When a CPU is stopped with the 'stop-self' RTAS call, its 'halted'
> state is set to 1 and, in this case, the MSR is no longer taken into
> account in the cpu_has_work() routine. Only the pending hardware
> interrupts are checked, against their LPCR:PECE* enable bits.
>
> If the DECR timer fires after 'stop-self' is called and before the CPU
> reaches the 'stop' state, the nearly-dead CPU will have some work to do
> and the guest will crash. This case happens very frequently with the
> not-yet-upstream P9 XIVE exploitation mode. In XICS mode, the DECR
> occasionally fires too, but only after the 'stop' state is reached, so
> there is no work to be done and the guest survives.
>
> I suspect there is a race between the QEMU main loop triggering the
> timers and the TCG CPU thread, but I could not quite identify the root
> cause. To be safe, let's disable the decrementer interrupt in the LPCR
> when the CPU is halted and re-enable it when the CPU is restarted.

Moreover, disabling the DECR in the reset path solves the TCG multi-CPU
reboot case, as the reboot path does not go through the stop-self RTAS call.

diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 3e20b1d886..c5150ee590 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -86,6 +86,15 @@ static void spapr_cpu_reset(void *opaque)
     cs->halted = 1;
 
     env->spr[SPR_HIOR] = 0;
+    /* Disable DECR for secondary cpus */
+    if (cs != first_cpu) {
+        if (env->mmu_model == POWERPC_MMU_3_00) {
+            env->spr[SPR_LPCR] &= ~LPCR_DEE;
+        } else {
+            /* P7 and P8 both have same bit for DECR */
+            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
+        }
+    }
 }
 
 static void spapr_cpu_destroy(PowerPCCPU *cpu)


Regards
Nikunj


Re: [Qemu-devel] [PATCH 0/2] disable the decrementer interrupt when a CPU is unplugged
Posted by Cédric Le Goater 6 years, 6 months ago
On 10/06/2017 08:10 AM, Nikunj A Dadhania wrote:
> Cédric Le Goater <clg@kaod.org> writes:
> 
>> Hello,
>>
>> When a CPU is stopped with the 'stop-self' RTAS call, its 'halted'
>> state is set to 1 and, in this case, the MSR is no longer taken into
>> account in the cpu_has_work() routine. Only the pending hardware
>> interrupts are checked, against their LPCR:PECE* enable bits.
>>
>> If the DECR timer fires after 'stop-self' is called and before the CPU
>> reaches the 'stop' state, the nearly-dead CPU will have some work to do
>> and the guest will crash. This case happens very frequently with the
>> not-yet-upstream P9 XIVE exploitation mode. In XICS mode, the DECR
>> occasionally fires too, but only after the 'stop' state is reached, so
>> there is no work to be done and the guest survives.
>>
>> I suspect there is a race between the QEMU main loop triggering the
>> timers and the TCG CPU thread, but I could not quite identify the root
>> cause. To be safe, let's disable the decrementer interrupt in the LPCR
>> when the CPU is halted and re-enable it when the CPU is restarted.
> 
> Moreover, disabling the DECR in the reset path solves the TCG multi-CPU
> reboot case, as the reboot path does not go through the stop-self RTAS call.

Yes. I was going to restart the thread on that topic.

Let's see how these two little patches are discussed first. Then we/you
can resend the missing reset hunk, which is needed to perform a TCG
reboot.

Thanks,  

C.


> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 3e20b1d886..c5150ee590 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -86,6 +86,15 @@ static void spapr_cpu_reset(void *opaque)
>      cs->halted = 1;
>  
>      env->spr[SPR_HIOR] = 0;
> +    /* Disable DECR for secondary cpus */
> +    if (cs != first_cpu) {
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>  }
>  
>  static void spapr_cpu_destroy(PowerPCCPU *cpu)
> 
> 
> Regards
> Nikunj
> 


Re: [Qemu-devel] [PATCH 0/2] disable the decrementer interrupt when a CPU is unplugged
Posted by Benjamin Herrenschmidt 6 years, 6 months ago
On Fri, 2017-10-06 at 11:40 +0530, Nikunj A Dadhania wrote:
> Cédric Le Goater <clg@kaod.org> writes:
> 
> > Hello,
> > 
> > When a CPU is stopped with the 'stop-self' RTAS call, its 'halted'
> > state is set to 1 and, in this case, the MSR is no longer taken into
> > account in the cpu_has_work() routine. Only the pending hardware
> > interrupts are checked, against their LPCR:PECE* enable bits.
> >
> > If the DECR timer fires after 'stop-self' is called and before the CPU
> > reaches the 'stop' state, the nearly-dead CPU will have some work to do
> > and the guest will crash. This case happens very frequently with the
> > not-yet-upstream P9 XIVE exploitation mode. In XICS mode, the DECR
> > occasionally fires too, but only after the 'stop' state is reached, so
> > there is no work to be done and the guest survives.
> >
> > I suspect there is a race between the QEMU main loop triggering the
> > timers and the TCG CPU thread, but I could not quite identify the root
> > cause. To be safe, let's disable the decrementer interrupt in the LPCR
> > when the CPU is halted and re-enable it when the CPU is restarted.
> 
> Moreover, disabling the DECR in the reset path solves the TCG multi-CPU
> reboot case, as the reboot path does not go through the stop-self RTAS call.

Shouldn't we do it in set_papr too, and only turn it on for the boot CPU
and in the start-cpu RTAS call? Same with the other PECEs, in fact...

> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 3e20b1d886..c5150ee590 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -86,6 +86,15 @@ static void spapr_cpu_reset(void *opaque)
>      cs->halted = 1;
>  
>      env->spr[SPR_HIOR] = 0;
> +    /* Disable DECR for secondary cpus */
> +    if (cs != first_cpu) {
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>  }
>  
>  static void spapr_cpu_destroy(PowerPCCPU *cpu)
> 
> 
> Regards
> Nikunj

Re: [Qemu-devel] [PATCH 0/2] disable the decrementer interrupt when a CPU is unplugged
Posted by Cédric Le Goater 6 years, 6 months ago
On 10/06/2017 09:46 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2017-10-06 at 11:40 +0530, Nikunj A Dadhania wrote:
>> Cédric Le Goater <clg@kaod.org> writes:
>>
>>> Hello,
>>>
>>> When a CPU is stopped with the 'stop-self' RTAS call, its 'halted'
>>> state is set to 1 and, in this case, the MSR is no longer taken into
>>> account in the cpu_has_work() routine. Only the pending hardware
>>> interrupts are checked, against their LPCR:PECE* enable bits.
>>>
>>> If the DECR timer fires after 'stop-self' is called and before the CPU
>>> reaches the 'stop' state, the nearly-dead CPU will have some work to do
>>> and the guest will crash. This case happens very frequently with the
>>> not-yet-upstream P9 XIVE exploitation mode. In XICS mode, the DECR
>>> occasionally fires too, but only after the 'stop' state is reached, so
>>> there is no work to be done and the guest survives.
>>>
>>> I suspect there is a race between the QEMU main loop triggering the
>>> timers and the TCG CPU thread, but I could not quite identify the root
>>> cause. To be safe, let's disable the decrementer interrupt in the LPCR
>>> when the CPU is halted and re-enable it when the CPU is restarted.
>>
>> Moreover, disabling the DECR in the reset path solves the TCG multi-CPU
>> reboot case, as the reboot path does not go through the stop-self RTAS call.
> 
> Shouldn't we do it in set_papr too, and only turn it on for the boot CPU
> and in the start-cpu RTAS call? Same with the other PECEs, in fact...

Yes, I agree.

In cpu_ppc_set_papr(), we should set the PECE* bits only for the boot
CPU and then let the start-cpu and stop-self RTAS calls do the enabling
and disabling.
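The scheme agreed on here (wakeup sources enabled at PAPR setup only for the boot CPU, then toggled by start-cpu and stop-self) might look roughly like this. All names are hypothetical (`sketch_set_papr()` is not `cpu_ppc_set_papr()`), the bit positions are assumed from our reading of the ISA 3.0 LPCR layout, and which exit enables belong in `SK_LPCR_WAKE_MASK` is itself an assumption; QEMU's own `LPCR_*` definitions are the reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PPC_BIT(n) (1ULL << (63 - (n)))
/* Hypothetical names; positions assumed from the ISA 3.0 LPCR layout.
 * The choice of bits in the wake mask is also an assumption. */
#define SK_LPCR_EEE PPC_BIT(49) /* external exception exit enable */
#define SK_LPCR_DEE PPC_BIT(50) /* decrementer exit enable */
#define SK_LPCR_OEE PPC_BIT(51) /* other exit enable */
#define SK_LPCR_WAKE_MASK (SK_LPCR_EEE | SK_LPCR_DEE | SK_LPCR_OEE)

/* Hypothetical, simplified CPU state. */
struct sketch_cpu {
    bool is_boot_cpu;
    bool halted;
    uint64_t lpcr;
};

/* Hypothetical PAPR setup step: only the boot CPU starts with its
 * wakeup sources enabled; secondaries start with them masked. */
static void sketch_set_papr(struct sketch_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->lpcr &= ~SK_LPCR_WAKE_MASK;
    if (cpu->is_boot_cpu) {
        cpu->lpcr |= SK_LPCR_WAKE_MASK;
    }
}

/* Hypothetical start-cpu handler: enable wakeups, then run. */
static void sketch_rtas_start_cpu(struct sketch_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->lpcr |= SK_LPCR_WAKE_MASK;
    cpu->halted = false;
}

/* Hypothetical stop-self handler: mask wakeups, then halt. */
static void sketch_rtas_stop_self(struct sketch_cpu *cpu)
{
    assert(cpu != NULL);
    cpu->lpcr &= ~SK_LPCR_WAKE_MASK;
    cpu->halted = true;
}
```

Keeping the bits in a single mask macro keeps the three places that touch them in agreement, which matters once more than the DECR enable is managed this way.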

I will respin the patchset. 

C.

>> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
>> index 3e20b1d886..c5150ee590 100644
>> --- a/hw/ppc/spapr_cpu_core.c
>> +++ b/hw/ppc/spapr_cpu_core.c
>> @@ -86,6 +86,15 @@ static void spapr_cpu_reset(void *opaque)
>>      cs->halted = 1;
>>  
>>      env->spr[SPR_HIOR] = 0;
>> +    /* Disable DECR for secondary cpus */
>> +    if (cs != first_cpu) {
>> +        if (env->mmu_model == POWERPC_MMU_3_00) {
>> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
>> +        } else {
>> +            /* P7 and P8 both have same bit for DECR */
>> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
>> +        }
>> +    }
>>  }
>>  
>>  static void spapr_cpu_destroy(PowerPCCPU *cpu)
>>
>>
>> Regards
>> Nikunj


Re: [Qemu-devel] [PATCH 0/2] disable the decrementer interrupt when a CPU is unplugged
Posted by Nikunj A Dadhania 6 years, 6 months ago
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Fri, 2017-10-06 at 11:40 +0530, Nikunj A Dadhania wrote:
>> Cédric Le Goater <clg@kaod.org> writes:
>> 
>> > Hello,
>> > 
>> > When a CPU is stopped with the 'stop-self' RTAS call, its 'halted'
>> > state is set to 1 and, in this case, the MSR is no longer taken into
>> > account in the cpu_has_work() routine. Only the pending hardware
>> > interrupts are checked, against their LPCR:PECE* enable bits.
>> >
>> > If the DECR timer fires after 'stop-self' is called and before the CPU
>> > reaches the 'stop' state, the nearly-dead CPU will have some work to do
>> > and the guest will crash. This case happens very frequently with the
>> > not-yet-upstream P9 XIVE exploitation mode. In XICS mode, the DECR
>> > occasionally fires too, but only after the 'stop' state is reached, so
>> > there is no work to be done and the guest survives.
>> >
>> > I suspect there is a race between the QEMU main loop triggering the
>> > timers and the TCG CPU thread, but I could not quite identify the root
>> > cause. To be safe, let's disable the decrementer interrupt in the LPCR
>> > when the CPU is halted and re-enable it when the CPU is restarted.
>> 
>> Moreover, disabling the DECR in the reset path solves the TCG multi-CPU
>> reboot case, as the reboot path does not go through the stop-self RTAS call.
>
> Shouldn't we do it in set_papr too, and only turn it on for the boot CPU
> and in the start-cpu RTAS call? Same with the other PECEs, in fact...

Yes, +1 for that

Regards
Nikunj


Re: [Qemu-devel] [PATCH 0/2] disable the decrementer interrupt when a CPU is unplugged
Posted by David Gibson 6 years, 6 months ago
On Fri, Oct 06, 2017 at 11:40:02AM +0530, Nikunj A Dadhania wrote:
> Cédric Le Goater <clg@kaod.org> writes:
> 
> > Hello,
> >
> > When a CPU is stopped with the 'stop-self' RTAS call, its 'halted'
> > state is set to 1 and, in this case, the MSR is no longer taken into
> > account in the cpu_has_work() routine. Only the pending hardware
> > interrupts are checked, against their LPCR:PECE* enable bits.
> >
> > If the DECR timer fires after 'stop-self' is called and before the CPU
> > reaches the 'stop' state, the nearly-dead CPU will have some work to do
> > and the guest will crash. This case happens very frequently with the
> > not-yet-upstream P9 XIVE exploitation mode. In XICS mode, the DECR
> > occasionally fires too, but only after the 'stop' state is reached, so
> > there is no work to be done and the guest survives.
> >
> > I suspect there is a race between the QEMU main loop triggering the
> > timers and the TCG CPU thread, but I could not quite identify the root
> > cause. To be safe, let's disable the decrementer interrupt in the LPCR
> > when the CPU is halted and re-enable it when the CPU is restarted.
> 
> Moreover, disabling the DECR in the reset path solves the TCG multi-CPU
> reboot case, as the reboot path does not go through the stop-self RTAS call.
> 
> diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
> index 3e20b1d886..c5150ee590 100644
> --- a/hw/ppc/spapr_cpu_core.c
> +++ b/hw/ppc/spapr_cpu_core.c
> @@ -86,6 +86,15 @@ static void spapr_cpu_reset(void *opaque)
>      cs->halted = 1;
>  
>      env->spr[SPR_HIOR] = 0;
> +    /* Disable DECR for secondary cpus */
> +    if (cs != first_cpu) {
> +        if (env->mmu_model == POWERPC_MMU_3_00) {
> +            env->spr[SPR_LPCR] &= ~LPCR_DEE;
> +        } else {
> +            /* P7 and P8 both have same bit for DECR */
> +            env->spr[SPR_LPCR] &= ~LPCR_P8_PECE3;
> +        }
> +    }
>  }

This seems reasonable.

>  
>  static void spapr_cpu_destroy(PowerPCCPU *cpu)
> 
> 
> Regards
> Nikunj
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson