From nobody Fri May 17 09:38:25 2024
From: Keqian Zhu
Reply-to: Keqian Zhu
To: Peter Maydell, Igor Mammedov, David Hildenbrand, Stefan Hajnoczi
CC: Zenghui Yu
Subject: [PATCH v1 1/2] system/cpus: Fix pause_all_vcpus() under concurrent environment
Date: Sun, 17 Mar 2024 16:37:03 +0800
Message-ID: <20240317083704.23244-2-zhukeqian1@huawei.com>
In-Reply-To: <20240317083704.23244-1-zhukeqian1@huawei.com>
References: <20240317083704.23244-1-zhukeqian1@huawei.com>

Both the main loop thread and vCPU threads are allowed to call
pause_all_vcpus(), and in general resume_all_vcpus() is called
after it. There are two issues in pause_all_vcpus():

1. While thread T1 waits on qemu_pause_cond with the BQL unlocked,
   another thread may call pause_all_vcpus() and resume_all_vcpus().
   T1 then gets stuck, because the condition all_vcpus_paused()
   stays false.

2. After all_vcpus_paused() has been checked as true, we unlock the
   BQL in order to re-take replay_mutex. While the BQL is unlocked,
   the vCPU state may be changed by another thread, so we must retry.

Signed-off-by: Keqian Zhu
---
 system/cpus.c | 29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/system/cpus.c b/system/cpus.c
index 68d161d96b..4e41abe23e 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -571,12 +571,14 @@ static bool all_vcpus_paused(void)
     return true;
 }
 
-void pause_all_vcpus(void)
+static void request_pause_all_vcpus(void)
 {
     CPUState *cpu;
 
-    qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false);
     CPU_FOREACH(cpu) {
+        if (cpu->stopped) {
+            continue;
+        }
         if (qemu_cpu_is_self(cpu)) {
             qemu_cpu_stop(cpu, true);
         } else {
@@ -584,6 +586,14 @@ void pause_all_vcpus(void)
             qemu_cpu_kick(cpu);
         }
     }
+}
+
+void pause_all_vcpus(void)
+{
+    qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false);
+
+retry:
+    request_pause_all_vcpus();
 
     /* We need to drop the replay_lock so any vCPU threads woken up
      * can finish their replay tasks
@@ -592,14 +602,23 @@ void pause_all_vcpus(void)
 
     while (!all_vcpus_paused()) {
         qemu_cond_wait(&qemu_pause_cond, &bql);
-        CPU_FOREACH(cpu) {
-            qemu_cpu_kick(cpu);
-        }
+        /* While we waited on qemu_pause_cond the bql was unlocked,
+         * so the vcpu's state may have been changed by another thread;
+         * we must request the pause state on all vcpus again.
+         */
+        request_pause_all_vcpus();
     }
 
     bql_unlock();
     replay_mutex_lock();
     bql_lock();
+
+    /* While the bql was unlocked, the vcpu's state may have been
+     * changed by another thread, so we must retry.
+     */
+    if (!all_vcpus_paused()) {
+        goto retry;
+    }
 }
 
 void cpu_resume(CPUState *cpu)
-- 
2.33.0

From nobody Fri May 17 09:38:25 2024
From: Keqian Zhu
Reply-to: Keqian Zhu
To: Peter Maydell, Igor Mammedov, David Hildenbrand, Stefan Hajnoczi
CC: Zenghui Yu
Subject: [PATCH v1 2/2] system/cpus: Fix resume_all_vcpus() under vCPU hotplug condition
Date: Sun, 17 Mar 2024 16:37:04 +0800
Message-ID: <20240317083704.23244-3-zhukeqian1@huawei.com>
In-Reply-To: <20240317083704.23244-1-zhukeqian1@huawei.com>
References: <20240317083704.23244-1-zhukeqian1@huawei.com>

When a vCPU is hotplugged, qemu_init_vcpu() is called. In this
function, we set the vCPU state to stopped and then wait for the
vCPU thread to be created.

Because the vCPU state is stopped, the thread informs us that it has
been created and then waits on halt_cond. After we have realized the
vCPU object, we resume the vCPU thread.

However, while we wait for the vCPU thread to be created, the BQL is
unlocked, and another thread is allowed to call resume_all_vcpus(),
which will resume the un-realized vCPU.

Fix this by filtering out un-realized vCPUs in resume_all_vcpus().

Signed-off-by: Keqian Zhu
---
 system/cpus.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/system/cpus.c b/system/cpus.c
index 4e41abe23e..8871f5dfa9 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -638,6 +638,9 @@ void resume_all_vcpus(void)
 
     qemu_clock_enable(QEMU_CLOCK_VIRTUAL, true);
     CPU_FOREACH(cpu) {
+        if (!object_property_get_bool(OBJECT(cpu), "realized", &error_abort)) {
+            continue;
+        }
         cpu_resume(cpu);
     }
 }
-- 
2.33.0
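
Illustration only, not part of the series: the filter added to
resume_all_vcpus() in patch 2/2 can be modelled outside QEMU roughly as
follows. This is a minimal sketch under stated assumptions. ToyCPU,
toy_cpu_resume() and toy_resume_all() are hypothetical names, a plain
bool field stands in for the QOM "realized" property that the patch
reads with object_property_get_bool(), and no locking or threading is
modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's CPUState; not the real structure. */
typedef struct ToyCPU {
    bool realized;  /* assumed: set once device realization has completed */
    bool stopped;   /* true while the vCPU is parked */
} ToyCPU;

/* Simplified analogue of cpu_resume(): clear the stopped flag. */
static void toy_cpu_resume(ToyCPU *cpu)
{
    cpu->stopped = false;
}

/*
 * Simplified analogue of the patched resume_all_vcpus(): skip every
 * vCPU that is not realized yet. Returns the number of vCPUs resumed.
 */
static size_t toy_resume_all(ToyCPU *cpus, size_t n)
{
    size_t resumed = 0;

    assert(cpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!cpus[i].realized) {
            continue;   /* still being hotplugged: leave it stopped */
        }
        toy_cpu_resume(&cpus[i]);
        resumed++;
    }
    return resumed;
}
```

In this model, a vCPU that is still inside its hotplug path (realized
is false) keeps its stopped flag across a resume-all call, which is the
behaviour the commit message asks for. The thread that realizes the
vCPU remains responsible for resuming it afterwards.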