From nobody Fri Dec 19 16:06:41 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 79A47C6FA8F for ; Wed, 30 Aug 2023 19:22:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S241132AbjH3TTm (ORCPT ); Wed, 30 Aug 2023 15:19:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41326 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S243246AbjH3KaE (ORCPT ); Wed, 30 Aug 2023 06:30:04 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B3BB8C0; Wed, 30 Aug 2023 03:30:00 -0700 (PDT) Date: Wed, 30 Aug 2023 10:29:58 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1693391399; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qiHeCeWMqDfMSLrY6Ovkw5z04+RV1jb4Pn6ptsA182g=; b=C7zeWOUFsIxIOmrcGuA8IfFFwpcUzR2rVivjQI4/pT4xX9yEq6w5+H5ODHp/6EQujgdV80 CodiI1EZdivpvPRUGqjLf4/pgJ5Av9Ta6tgGMM30VdTaWu1v2kXd+FVaX/VzQ3IjepvLMc TdoZKaRA5MuMu9qB3RaPonvBcs2NaGtmvv6KJE3GNLPGBGwejWarKv7mDm/7/qb7kG6fvv MzNqqau1d6NUZBqCfKkjPOy980SVmgRWw9ay8yrnxvyC8cGjM3QUoxrFpKtF7gfEkQiYkk TGQDsBCl8XknJYvq/WVOWTnlNaBU+UG4Ic49nkqDCTy6Lk4Ngs+csvLD5AmJXw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1693391399; h=from:from:sender:sender:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qiHeCeWMqDfMSLrY6Ovkw5z04+RV1jb4Pn6ptsA182g=; b=CvDNKDlmsb+IHEd3R72G0w2INwWC9axP1vaDZvpoeUlOA0ojqtGDxJ975TiP8Zsi1Y+5y0 vqo+XOIrEc9aYnCA== From: "tip-bot2 for Thomas Gleixner" Sender: tip-bot2@linutronix.de Reply-to: linux-kernel@vger.kernel.org To: linux-tip-commits@vger.kernel.org Subject: [tip: smp/urgent] cpu/hotplug: Prevent self deadlock on CPU hot-unplug Cc: Xiongfeng Wang , Thomas Gleixner , Yu Liao , Vincent Guittot , stable@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org In-Reply-To: <8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com> References: <8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com> MIME-Version: 1.0 Message-ID: <169339139819.27769.17621875462076784482.tip-bot2@tip-bot2> Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The following commit has been merged into the smp/urgent branch of tip: Commit-ID: 2b8272ff4a70b866106ae13c36be7ecbef5d5da2 Gitweb: https://git.kernel.org/tip/2b8272ff4a70b866106ae13c36be7ecbe= f5d5da2 Author: Thomas Gleixner AuthorDate: Wed, 23 Aug 2023 10:47:02 +02:00 Committer: Thomas Gleixner CommitterDate: Wed, 30 Aug 2023 12:24:22 +02:00 cpu/hotplug: Prevent self deadlock on CPU hot-unplug Xiongfeng reported and debugged a self deadlock of the task which initiates and controls a CPU hot-unplug operation vs. the CFS bandwidth timer. CPU1 CPU2 T1 sets cfs_quota starts hrtimer cfs_bandwidth 'period_timer' T1 is migrated to CPU2 =09 T1 initiates offlining of CPU1 Hotplug operation starts ... 'period_timer' expires and is re-enqueued on CPU1 ... take_cpu_down() CPU1 shuts down and does not handle timers anymore. They have to be migrated in the post dead hotplug steps by the control task. T1 runs the post dead offline operation T1 is scheduled out T1 waits for 'period_timer' to expire T1 waits there forever if it is scheduled out before it can execute the hrt= imer offline callback hrtimers_dead_cpu(). Cure this by delegating the hotplug control operation to a worker thread on an online CPU. This takes the initiating user space task, which might be affected by the bandwidth timer, completely out of the picture. Reported-by: Xiongfeng Wang Signed-off-by: Thomas Gleixner Tested-by: Yu Liao Acked-by: Vincent Guittot Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@hua= wei.com Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx --- kernel/cpu.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index f6811c8..6de7c6b 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1487,8 +1487,22 @@ out: return ret; } =20 +struct cpu_down_work { + unsigned int cpu; + enum cpuhp_state target; +}; + +static long __cpu_down_maps_locked(void *arg) +{ + struct cpu_down_work *work =3D arg; + + return _cpu_down(work->cpu, 0, work->target); +} + static int cpu_down_maps_locked(unsigned int cpu, enum cpuhp_state target) { + struct cpu_down_work work =3D { .cpu =3D cpu, .target =3D target, }; + /* * If the platform does not support hotplug, report it explicitly to * differentiate it from a transient offlining failure. @@ -1497,7 +1511,15 @@ static int cpu_down_maps_locked(unsigned int cpu, en= um cpuhp_state target) return -EOPNOTSUPP; if (cpu_hotplug_disabled) return -EBUSY; - return _cpu_down(cpu, 0, target); + + /* + * Ensure that the control task does not run on the to be offlined + * CPU to prevent a deadlock against cfs_b->period_timer. + */ + cpu =3D cpumask_any_but(cpu_online_mask, cpu); + if (cpu >=3D nr_cpu_ids) + return -EBUSY; + return work_on_cpu(cpu, __cpu_down_maps_locked, &work); } =20 static int cpu_down(unsigned int cpu, enum cpuhp_state target)