From nobody Wed Nov 5 18:39:28 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; dkim=fail; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1536306072471155.89797734879266; Fri, 7 Sep 2018 00:41:12 -0700 (PDT) Received: from localhost ([::1]:37157 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fyBOB-00087w-CL for importer@patchew.org; Fri, 07 Sep 2018 03:41:11 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45845) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fyBFs-0000FZ-1n for qemu-devel@nongnu.org; Fri, 07 Sep 2018 03:32:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fyBFr-0005L1-46 for qemu-devel@nongnu.org; Fri, 07 Sep 2018 03:32:36 -0400 Received: from ozlabs.org ([203.11.71.1]:37413) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fyBFq-00055M-Em; Fri, 07 Sep 2018 03:32:35 -0400 Received: by ozlabs.org (Postfix, from userid 1007) id 4268Jz3fQXz9sj3; Fri, 7 Sep 2018 17:32:02 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gibson.dropbear.id.au; s=201602; t=1536305523; bh=68xbPUPojAuRJqwktuX9TUyw3j2CxDbj6f+pSKLmj+g=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=bGAuSZlvCB1xeglOUREDwPxTzmC90N3MnHqn5mRclwOtqJtRpGD36zkXSQ8ipq4Er 3FlScaLDZdZBduTaJKCLopyfBkmLZXVUdwShFOuxp8tKm8OOH83HQdW/pbLk6+Fs0L DNdbTd/XYpC7vDvKjCN1W2BczzFL1AjbD3XixX6Y= From: David Gibson To: peter.maydell@linaro.org Date: Fri, 7 Sep 2018 17:31:53 +1000 Message-Id: <20180907073155.26200-13-david@gibson.dropbear.id.au> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20180907073155.26200-1-david@gibson.dropbear.id.au> References: <20180907073155.26200-1-david@gibson.dropbear.id.au> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 203.11.71.1 Subject: [Qemu-devel] [PULL 12/14] Fix a deadlock case in the CPU hotplug flow X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: lvivier@redhat.com, Jose Ricardo Ziviani , mark.cave-ayland@ilande.co.uk, qemu-devel@nongnu.org, groug@kaod.org, qemu-ppc@nongnu.org, clg@kaod.org, David Gibson Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: fail (Header signature does not verify) X-ZohoMail: RDKM_2 RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" From: Jose Ricardo Ziviani We need to set cs->halted to 1 before calling ppc_set_compat. The reason is that ppc_set_compat kicks up the new thread created to manage the hotplugged KVM virtual CPU and the code drives directly to KVM_RUN ioctl. When cs->halted is 1, the code: int kvm_cpu_exec(CPUState *cpu) ... if (kvm_arch_process_async_events(cpu)) { atomic_set(&cpu->exit_request, 0); return EXCP_HLT; } ... returns before it reaches KVM_RUN, giving time to the main thread to finish its job. Otherwise we can fall in a deadlock because the KVM thread will issue the KVM_RUN ioctl while the main thread is setting up KVM registers. Depending on how these jobs are scheduled we'll end up freezing QEMU. The following output shows kvm_vcpu_ioctl sleeping because it cannot get the mutex and never will. PS: kvm_vcpu_ioctl was triggered kvm_set_one_reg - compat_pvr. STATE: TASK_UNINTERRUPTIBLE|TASK_WAKEKILL PID: 61564 TASK: c000003e981e0780 CPU: 48 COMMAND: "qemu-system-ppc" #0 [c000003e982679a0] __schedule at c000000000b10a44 #1 [c000003e98267a60] schedule at c000000000b113a8 #2 [c000003e98267a90] schedule_preempt_disabled at c000000000b11910 #3 [c000003e98267ab0] __mutex_lock at c000000000b132ec #4 [c000003e98267bc0] kvm_vcpu_ioctl at c00800000ea03140 [kvm] #5 [c000003e98267d20] do_vfs_ioctl at c000000000407d30 #6 [c000003e98267dc0] ksys_ioctl at c000000000408674 #7 [c000003e98267e10] sys_ioctl at c0000000004086f8 #8 [c000003e98267e30] system_call at c00000000000b488 crash> struct -x kvm.vcpus 0xc000003da0000000 vcpus =3D {0xc000003db4880000, 0xc000003d52b80000, 0xc0000039e9c80000, 0xc0= 00003d0e200000, 0xc000003d58280000, 0x0, 0x0, ...} crash> struct -x kvm_vcpu.mutex.owner 0xc000003d58280000 mutex.owner =3D { counter =3D 0xc000003a23a5c881 <- flag 1: waiters }, crash> bt 0xc000003a23a5c880 PID: 61579 TASK: c000003a23a5c880 CPU: 9 COMMAND: "CPU 4/KVM" (active) crash> struct -x kvm_vcpu.mutex.wait_list 0xc000003d58280000 mutex.wait_list =3D { next =3D 0xc000003e98267b10, prev =3D 0xc000003e98267b10 }, crash> struct -x mutex_waiter.task 0xc000003e98267b10 task =3D 0xc000003e981e0780 The following command-line was used to reproduce the problem (note: gdb and trace can change the results). $ qemu-ppc/build/ppc64-softmmu/qemu-system-ppc64 -cpu host \ -enable-kvm -m 4096 \ -smp 4,maxcpus=3D8,sockets=3D1,cores=3D2,threads=3D4 \ -display none -nographic \ -drive file=3Ddisk1.qcow2,format=3Dqcow2 ... (qemu) device_add host-spapr-cpu-core,core-id=3D4 [no interaction is possible after it, only SIGKILL to take the terminal back] Signed-off-by: Jose Ricardo Ziviani Reviewed-by: Greg Kurz Signed-off-by: David Gibson --- hw/ppc/spapr_cpu_core.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c index 876f0b3d9d..a73b244a3f 100644 --- a/hw/ppc/spapr_cpu_core.c +++ b/hw/ppc/spapr_cpu_core.c @@ -34,16 +34,16 @@ static void spapr_cpu_reset(void *opaque) =20 cpu_reset(cs); =20 - /* Set compatibility mode to match the boot CPU, which was either set - * by the machine reset code or by CAS. This should never fail. - */ - ppc_set_compat(cpu, POWERPC_CPU(first_cpu)->compat_pvr, &error_abort); - /* All CPUs start halted. CPU0 is unhalted from the machine level * reset code and the rest are explicitly started up by the guest * using an RTAS call */ cs->halted =3D 1; =20 + /* Set compatibility mode to match the boot CPU, which was either set + * by the machine reset code or by CAS. This should never fail. + */ + ppc_set_compat(cpu, POWERPC_CPU(first_cpu)->compat_pvr, &error_abort); + env->spr[SPR_HIOR] =3D 0; =20 lpcr =3D env->spr[SPR_LPCR]; --=20 2.17.1