[Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop

Heyi Guo posted 1 patch 5 years, 1 month ago
Test checkpatch passed
Test asan failed
Test docker-mingw@fedora passed
Test docker-clang@ubuntu passed
git fetch https://github.com/patchew-project/qemu tags/patchew/1552377439-24640-1-git-send-email-guoheyi@huawei.com
Maintainers: Peter Maydell <peter.maydell@linaro.org>
[Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop
Posted by Heyi Guo 5 years, 1 month ago
When we stop a VM with the QEMU monitor command "stop", leave it stopped
for more than 30 seconds, and then resume it with "cont", Linux in the VM
complains of "soft lockup - CPU#x stuck for xxs!", as shown below:

[ 2783.809517] watchdog: BUG: soft lockup - CPU#3 stuck for 2395s!
[ 2783.809559] watchdog: BUG: soft lockup - CPU#2 stuck for 2395s!
[ 2783.809561] watchdog: BUG: soft lockup - CPU#1 stuck for 2395s!
[ 2783.809563] Modules linked in...

This is because the guest Linux kernel relies on the generic timer's
virtual counter for its soft lockup watchdog, and that counter
(CNTVCT_EL0) keeps running while the VM is stopped by QEMU.
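As a rough illustration of why a running counter triggers the warning, the check below derives elapsed time from two counter samples. It is a simplified model, not the kernel's actual watchdog code; COUNTER_FREQ_HZ, the threshold argument, and the function names are hypothetical (a real guest reads the frequency from CNTFRQ_EL0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counter frequency; a real guest reads CNTFRQ_EL0. */
#define COUNTER_FREQ_HZ 100000000ULL

/* Whole seconds elapsed between two samples of a free-running counter. */
static uint64_t ticks_to_seconds(uint64_t start, uint64_t now)
{
    assert(now >= start);
    return (now - start) / COUNTER_FREQ_HZ;
}

/*
 * Simplified model of a soft lockup check: 'touched' is the counter value
 * when the watchdog last got to run, 'now' is the current value. The check
 * fires if more than threshold_s seconds of counter time have passed.
 */
static bool looks_like_soft_lockup(uint64_t touched, uint64_t now,
                                   uint64_t threshold_s)
{
    return ticks_to_seconds(touched, now) > threshold_s;
}
```

If the counter keeps running across a 60-second pause, samples taken just before and just after the pause differ by about 61 seconds, so `looks_like_soft_lockup(0, 61 * COUNTER_FREQ_HZ, 20)` returns true even though the guest never actually stalled; with a frozen counter the same check sees only about one second.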

This patch fixes the issue by saving the value of CNTVCT_EL0 when the VM
is stopped and restoring it when the VM is resumed.
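The save/restore works because the guest-visible virtual counter is the physical counter minus an offset. The model below is userspace arithmetic only, with hypothetical names; it assumes that writing KVM_REG_ARM_TIMER_CNT makes KVM recompute that offset so the guest reads back exactly the written value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one VM's virtual counter state. */
struct vtimer_model {
    uint64_t cntvoff;   /* offset subtracted from the physical counter */
};

/* Guest-visible CNTVCT = physical counter - offset (mod 2^64). */
static uint64_t read_cntvct(const struct vtimer_model *vt, uint64_t phys)
{
    return phys - vt->cntvoff;
}

/* Assumed effect of a counter write: choose the offset so that the
 * guest reads exactly 'value' at physical time 'phys'. */
static void write_cntvct(struct vtimer_model *vt, uint64_t phys,
                         uint64_t value)
{
    vt->cntvoff = phys - value;
}

/* Save the counter on "stop", restore it on "cont", and return how far
 * the guest counter moved across the pause. */
static uint64_t guest_jump_across_pause(struct vtimer_model *vt,
                                        uint64_t phys_at_pause,
                                        uint64_t phys_at_resume)
{
    uint64_t saved = read_cntvct(vt, phys_at_pause);
    write_cntvct(vt, phys_at_resume, saved);
    return read_cntvct(vt, phys_at_resume) - saved;
}
```

After the restore, the guest counter resumes from the saved value and then advances normally, so the time spent stopped is simply absent from the guest's view.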

Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Heyi Guo <guoheyi@huawei.com>
---
 target/arm/kvm.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 79a79f0..73b9ecb 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -39,11 +39,77 @@ static bool cap_has_inject_serror_esr;
 
 static ARMHostCPUFeatures arm_host_cpu_features;
 
+static int get_vcpu_timer_tick(CPUState *cs, uint64_t *tick_at_pause)
+{
+    int err;
+    struct kvm_one_reg reg;
+
+    reg.id = KVM_REG_ARM_TIMER_CNT;
+    reg.addr = (uintptr_t) tick_at_pause;
+
+    err = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
+    return err;
+}
+
+static int set_vcpu_timer_tick(CPUState *cs, uint64_t tick_at_pause)
+{
+    int err;
+    struct kvm_one_reg reg;
+
+    reg.id = KVM_REG_ARM_TIMER_CNT;
+    reg.addr = (uintptr_t) &tick_at_pause;
+
+    err = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
+    return err;
+}
+
+static void arch_timer_change_state_handler(void *opaque, int running,
+                                            RunState state)
+{
+    static uint64_t hw_ticks_at_paused;
+    static RunState pre_state = RUN_STATE__MAX;
+    int err;
+    CPUState *cs = (CPUState *)opaque;
+
+    switch (state) {
+    case RUN_STATE_PAUSED:
+        err = get_vcpu_timer_tick(cs, &hw_ticks_at_paused);
+        if (err) {
+            error_report("Get vcpu timer tick failed: %d", err);
+        }
+        break;
+    case RUN_STATE_RUNNING:
+        if (pre_state == RUN_STATE_PAUSED) {
+            err = set_vcpu_timer_tick(cs, hw_ticks_at_paused);
+            if (err) {
+                error_report("Resume vcpu timer tick failed: %d", err);
+            }
+        }
+        break;
+    default:
+        break;
+    }
+
+    pre_state = state;
+}
+
 int kvm_arm_vcpu_init(CPUState *cs)
 {
     ARMCPU *cpu = ARM_CPU(cs);
     struct kvm_vcpu_init init;
 
+    /*
+     * Only add change state handler for arch timer once, for KVM will help to
+     * synchronize virtual timer of all VCPUs.
+     */
+    static bool arch_timer_change_state_handler_added;
+
+
+    if (!arch_timer_change_state_handler_added) {
+        qemu_add_vm_change_state_handler(arch_timer_change_state_handler, cs);
+        arch_timer_change_state_handler_added = true;
+    }
+
     init.target = cpu->kvm_target;
     memcpy(init.features, cpu->kvm_init_features, sizeof(init.features));
 
-- 
1.8.3.1
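The decision logic of arch_timer_change_state_handler() can be exercised outside QEMU with a small model. The enum and struct below are hypothetical stand-ins for RunState and the vCPU register; the two assignments mark where the patch calls get_vcpu_timer_tick() and set_vcpu_timer_tick().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's RunState values. */
enum model_runstate { MODEL_RUNNING, MODEL_PAUSED, MODEL_OTHER, MODEL_MAX };

struct timer_saver {
    uint64_t saved_ticks;           /* counter value captured at pause */
    enum model_runstate pre_state;  /* previous state seen by the handler */
    uint64_t guest_counter;         /* stands in for the vCPU counter reg */
};

static void change_state(struct timer_saver *s, enum model_runstate state)
{
    switch (state) {
    case MODEL_PAUSED:
        /* Where the patch calls get_vcpu_timer_tick(). */
        s->saved_ticks = s->guest_counter;
        break;
    case MODEL_RUNNING:
        /* Restore only when resuming from a pause, not on first start. */
        if (s->pre_state == MODEL_PAUSED) {
            /* Where the patch calls set_vcpu_timer_tick(). */
            s->guest_counter = s->saved_ticks;
        }
        break;
    default:
        break;
    }
    s->pre_state = state;
}
```

Starting from MODEL_MAX mirrors the patch's initial pre_state of RUN_STATE__MAX, so the first transition to running leaves the counter untouched.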


Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop
Posted by Richard Henderson 5 years, 1 month ago
On 3/12/19 12:57 AM, Heyi Guo wrote:
>  int kvm_arm_vcpu_init(CPUState *cs)
>  {
>      ARMCPU *cpu = ARM_CPU(cs);
>      struct kvm_vcpu_init init;
>  
> +    /*
> +     * Only add change state handler for arch timer once, for KVM will help to
> +     * synchronize virtual timer of all VCPUs.
> +     */
> +    static bool arch_timer_change_state_handler_added;
> +
> +
> +    if (!arch_timer_change_state_handler_added) {
> +        qemu_add_vm_change_state_handler(arch_timer_change_state_handler, cs);
> +        arch_timer_change_state_handler_added = true;
> +    }

Which means that this will not trigger for the second time that you pause the
vm.  You need to key this off of something else, like cpu id 0.
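One reading of the suggestion is to drop the function-local static flag and register the handler only while initializing the vCPU whose index is 0 (in QEMU, CPUState has a cpu_index field). The struct and registry below are hypothetical stand-ins for CPUState and qemu_add_vm_change_state_handler(); the model only shows the registration condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CPUState. */
struct model_cpu {
    int cpu_index;
};

/* Hypothetical registry standing in for qemu_add_vm_change_state_handler(). */
#define MODEL_MAX_HANDLERS 8
static const struct model_cpu *registered_opaque[MODEL_MAX_HANDLERS];
static size_t n_registered;

static void model_add_handler(const struct model_cpu *cs)
{
    assert(n_registered < MODEL_MAX_HANDLERS);
    registered_opaque[n_registered++] = cs;
}

/* Register once per VM by keying off vCPU 0 instead of a static flag. */
static void model_vcpu_init(const struct model_cpu *cs)
{
    if (cs->cpu_index == 0) {
        model_add_handler(cs);
    }
}
```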


r~

Re: [Qemu-devel] [RFC] arm/cpu: fix soft lockup panic after resuming from stop
Posted by Heyi Guo 5 years, 1 month ago
Hi Richard,



On 2019/3/12 22:59, Richard Henderson wrote:
> On 3/12/19 12:57 AM, Heyi Guo wrote:
>>   int kvm_arm_vcpu_init(CPUState *cs)
>>   {
>>       ARMCPU *cpu = ARM_CPU(cs);
>>       struct kvm_vcpu_init init;
>>   
>> +    /*
>> +     * Only add change state handler for arch timer once, for KVM will help to
>> +     * synchronize virtual timer of all VCPUs.
>> +     */
>> +    static bool arch_timer_change_state_handler_added;
>> +
>> +
>> +    if (!arch_timer_change_state_handler_added) {
>> +        qemu_add_vm_change_state_handler(arch_timer_change_state_handler, cs);
>> +        arch_timer_change_state_handler_added = true;
>> +    }
> Which means that this will not trigger for the second time that you pause the
> vm.  You need to key this off of something else, like cpu id 0.
I don't quite understand. Do you mean the handler will be deactivated after one trigger? Or something else?

My understanding is that a VM change state handler stays in effect for the whole lifetime of the VM, so one handler per VM should be enough, regardless of which vCPU's initialization registers it. I also stopped and resumed the same VM several times, and fortunately the rough code worked.
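The argument rests on handlers staying on the notifier list and being called on every state change, rather than being consumed after one call. A minimal model of such a list follows; the names are hypothetical, and the callback drops the RunState argument that QEMU's real handler type also takes.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*model_state_cb)(void *opaque, int running);

/* Hypothetical notifier list: entries stay until explicitly removed. */
struct model_notifier {
    model_state_cb cb[4];
    void *opaque[4];
    size_t n;
};

static void model_notifier_add(struct model_notifier *l, model_state_cb cb,
                               void *opaque)
{
    assert(l->n < 4);
    l->cb[l->n] = cb;
    l->opaque[l->n] = opaque;
    l->n++;
}

/* Every state change walks the whole list; nothing is removed here. */
static void model_notify(struct model_notifier *l, int running)
{
    for (size_t i = 0; i < l->n; i++) {
        l->cb[i](l->opaque[i], running);
    }
}

/* Example handler: count how many times the VM was paused. */
static void count_pauses(void *opaque, int running)
{
    if (!running) {
        (*(int *)opaque)++;
    }
}
```

Registering count_pauses once and then pausing and resuming three times leaves the counter at 3, with the list still holding a single entry.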

Thanks,

Heyi
