[Qemu-devel] [PATCH 06/13] spapr/xive: fix migration of the XiveTCTX under TCG

Cédric Le Goater posted 13 patches 6 years, 10 months ago
There is a newer version of this series
[Qemu-devel] [PATCH 06/13] spapr/xive: fix migration of the XiveTCTX under TCG
Posted by Cédric Le Goater 6 years, 10 months ago
When the thread interrupt management state is retrieved from the KVM
VCPU, word2 is saved under the QEMU XIVE thread context to print out
the OS CAM line under the QEMU monitor.

This breaks the migration of a TCG guest (and with KVM when
kernel_irqchip=off) because the matching algorithm of the presenter
relies on the OS CAM value. Fix with an extra reset of the thread
contexts to restore the expected value.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr_irq.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 233c97c5ecd9..ba27d9d8e972 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -363,7 +363,31 @@ static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
 
 static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
 {
-    return spapr_xive_post_load(spapr->xive, version_id);
+    CPUState *cs;
+    int ret;
+
+    ret = spapr_xive_post_load(spapr->xive, version_id);
+    if (ret) {
+        return ret;
+    }
+
+    /*
+     * When the states are collected from the KVM XIVE device, word2
+     * of the XiveTCTX is set to print out the OS CAM line under the
+     * QEMU monitor.
+     *
+     * This breaks the migration on a TCG guest (or on KVM with
+     * kernel_irqchip=off) because the matching algorithm of the
+     * presenter relies on the OS CAM value. Fix with an extra reset
+     * of the thread contexts to restore the expected value.
+     */
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        /* (TCG) Set the OS CAM line of the thread interrupt context. */
+        spapr_xive_set_tctx_os_cam(cpu->tctx);
+    }
+    return 0;
 }
 
 static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
-- 
2.20.1


Re: [Qemu-devel] [PATCH 06/13] spapr/xive: fix migration of the XiveTCTX under TCG
Posted by David Gibson 6 years, 9 months ago
On Mon, Jan 07, 2019 at 07:39:39PM +0100, Cédric Le Goater wrote:
> When the thread interrupt management state is retrieved from the KVM
> VCPU, word2 is saved under the QEMU XIVE thread context to print out
> the OS CAM line under the QEMU monitor.
> 
> This breaks the migration of a TCG guest (and with KVM when
> kernel_irqchip=off) because the matching algorithm of the presenter
> relies on the OS CAM value. Fix with an extra reset of the thread
> contexts to restore the expected value.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Why is the CAM value you get from KVM different from the one you
expect in qemu?

> ---
>  hw/ppc/spapr_irq.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 233c97c5ecd9..ba27d9d8e972 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -363,7 +363,31 @@ static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
>  
>  static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>  {
> -    return spapr_xive_post_load(spapr->xive, version_id);
> +    CPUState *cs;
> +    int ret;
> +
> +    ret = spapr_xive_post_load(spapr->xive, version_id);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    /*
> +     * When the states are collected from the KVM XIVE device, word2
> +     * of the XiveTCTX is set to print out the OS CAM line under the
> +     * QEMU monitor.
> +     *
> +     * This breaks the migration on a TCG guest (or on KVM with
> +     * kernel_irqchip=off) because the matching algorithm of the
> +     * presenter relies on the OS CAM value. Fix with an extra reset
> +     * of the thread contexts to restore the expected value.
> +     */
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +
> +        /* (TCG) Set the OS CAM line of the thread interrupt context. */
> +        spapr_xive_set_tctx_os_cam(cpu->tctx);
> +    }
> +    return 0;
>  }
>  
>  static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [PATCH 06/13] spapr/xive: fix migration of the XiveTCTX under TCG
Posted by Cédric Le Goater 6 years, 9 months ago
On 2/8/19 6:36 AM, David Gibson wrote:
> On Mon, Jan 07, 2019 at 07:39:39PM +0100, Cédric Le Goater wrote:
>> When the thread interrupt management state is retrieved from the KVM
>> VCPU, word2 is saved under the QEMU XIVE thread context to print out
>> the OS CAM line under the QEMU monitor.
>>
>> This breaks the migration of a TCG guest (and with KVM when
>> kernel_irqchip=off) because the matching algorithm of the presenter
>> relies on the OS CAM value. Fix with an extra reset of the thread
>> contexts to restore the expected value.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> Why is the CAM value you get from KVM different from the one you
> expect in qemu?

An NVT base identifier is allocated for each VM at the OPAL level
and each vCPU getd an increment of this value. It is pushed in the 
OS CAM line when the vCPU is scheduled to run.
 
KVM XIVE names this identifier a VP id. 

The QEMU emulation of XIVE uses a fixed value for the NVT base 
identifier.

C.

 
>> ---
>>  hw/ppc/spapr_irq.c | 26 +++++++++++++++++++++++++-
>>  1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index 233c97c5ecd9..ba27d9d8e972 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -363,7 +363,31 @@ static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
>>  
>>  static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>>  {
>> -    return spapr_xive_post_load(spapr->xive, version_id);
>> +    CPUState *cs;
>> +    int ret;
>> +
>> +    ret = spapr_xive_post_load(spapr->xive, version_id);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    /*
>> +     * When the states are collected from the KVM XIVE device, word2
>> +     * of the XiveTCTX is set to print out the OS CAM line under the
>> +     * QEMU monitor.
>> +     *
>> +     * This breaks the migration on a TCG guest (or on KVM with
>> +     * kernel_irqchip=off) because the matching algorithm of the
>> +     * presenter relies on the OS CAM value. Fix with an extra reset
>> +     * of the thread contexts to restore the expected value.
>> +     */
>> +    CPU_FOREACH(cs) {
>> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
>> +
>> +        /* (TCG) Set the OS CAM line of the thread interrupt context. */
>> +        spapr_xive_set_tctx_os_cam(cpu->tctx);
>> +    }
>> +    return 0;
>>  }
>>  
>>  static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
> 


Re: [Qemu-devel] [PATCH 06/13] spapr/xive: fix migration of the XiveTCTX under TCG
Posted by David Gibson 6 years, 8 months ago
On Fri, Feb 08, 2019 at 08:12:12AM +0100, Cédric Le Goater wrote:
> On 2/8/19 6:36 AM, David Gibson wrote:
> > On Mon, Jan 07, 2019 at 07:39:39PM +0100, Cédric Le Goater wrote:
> >> When the thread interrupt management state is retrieved from the KVM
> >> VCPU, word2 is saved under the QEMU XIVE thread context to print out
> >> the OS CAM line under the QEMU monitor.
> >>
> >> This breaks the migration of a TCG guest (and with KVM when
> >> kernel_irqchip=off) because the matching algorithm of the presenter
> >> relies on the OS CAM value. Fix with an extra reset of the thread
> >> contexts to restore the expected value.
> >>
> >> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > 
> > Why is the CAM value you get from KVM different from the one you
> > expect in qemu?
> 
> An NVT base identifier is allocated for each VM at the OPAL level
> and each vCPU getd an increment of this value. It is pushed in the 
> OS CAM line when the vCPU is scheduled to run.
>  
> KVM XIVE names this identifier a VP id. 
> 
> The QEMU emulation of XIVE uses a fixed value for the NVT base 
> identifier.

Ah, I see.

Hmm.  Really this highlights why I'm uneasy migrating the whole TCTX
as just a blob of registers, even though only some of them are really
runtime state, and others are machine configuration that can be worked
out separately at the two ends.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [Qemu-devel] [PATCH 06/13] spapr/xive: fix migration of the XiveTCTX under TCG
Posted by Cédric Le Goater 6 years, 8 months ago
On 2/12/19 1:22 AM, David Gibson wrote:
> On Fri, Feb 08, 2019 at 08:12:12AM +0100, Cédric Le Goater wrote:
>> On 2/8/19 6:36 AM, David Gibson wrote:
>>> On Mon, Jan 07, 2019 at 07:39:39PM +0100, Cédric Le Goater wrote:
>>>> When the thread interrupt management state is retrieved from the KVM
>>>> VCPU, word2 is saved under the QEMU XIVE thread context to print out
>>>> the OS CAM line under the QEMU monitor.
>>>>
>>>> This breaks the migration of a TCG guest (and with KVM when
>>>> kernel_irqchip=off) because the matching algorithm of the presenter
>>>> relies on the OS CAM value. Fix with an extra reset of the thread
>>>> contexts to restore the expected value.
>>>>
>>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>>
>>> Why is the CAM value you get from KVM different from the one you
>>> expect in qemu?
>>
>> An NVT base identifier is allocated for each VM at the OPAL level
>> and each vCPU getd an increment of this value. It is pushed in the 
>> OS CAM line when the vCPU is scheduled to run.
>>  
>> KVM XIVE names this identifier a VP id. 
>>
>> The QEMU emulation of XIVE uses a fixed value for the NVT base 
>> identifier.
> 
> Ah, I see.
> 
> Hmm.  Really this highlights why I'm uneasy migrating the whole TCTX
> as just a blob of registers, even though only some of them are really
> runtime state, and others are machine configuration that can be worked
> out separately at the two ends.

This is really a special case :

1. migration kernel_irqchip=on -> kernel_irqchip=off
2. debug info from KVM is squashing TCG state

We could ignore the VP id configured by KVM but it seems interesting
to retrieve. 

C.