From: Andrea Righi
To: Tejun Heo, David Vernet, Changwoo Min
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH] sched_ext: Fix lock imbalance in dispatch_to_local_dsq()
Date: Fri, 24 Jan 2025 00:42:20 +0100
Message-ID: <20250123234220.36680-1-arighi@nvidia.com>
X-Mailer: git-send-email 2.48.1

While performing the rq locking dance in dispatch_to_local_dsq(), we may
trigger the following lock imbalance condition, in particular when
multiple tasks are rapidly changing CPU affinity (e.g., when running
`stress-ng --race-sched 0`):

[   13.413579] =====================================
[   13.413660] WARNING: bad unlock balance detected!
[   13.413729] 6.13.0-virtme #15 Not tainted
[   13.413792] -------------------------------------
[   13.413859] kworker/1:1/80 is trying to release lock (&rq->__lock) at:
[   13.413954] [] dispatch_to_local_dsq+0x108/0x1a0
[   13.414111] but there are no more locks to release!
[   13.414176]
[   13.414176] other info that might help us debug this:
[   13.414258] 1 lock held by kworker/1:1/80:
[   13.414318]  #0: ffff8b66feb41698 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x20/0x90
[   13.414612]
[   13.414612] stack backtrace:
[   13.415255] CPU: 1 UID: 0 PID: 80 Comm: kworker/1:1 Not tainted 6.13.0-virtme #15
[   13.415505] Workqueue:  0x0 (events)
[   13.415567] Sched_ext: dsp_local_on (enabled+all), task: runnable_at=-2ms
[   13.415570] Call Trace:
[   13.415700]
[   13.415744]  dump_stack_lvl+0x78/0xe0
[   13.415806]  ? dispatch_to_local_dsq+0x108/0x1a0
[   13.415884]  print_unlock_imbalance_bug+0x11b/0x130
[   13.415965]  ? dispatch_to_local_dsq+0x108/0x1a0
[   13.416226]  lock_release+0x231/0x2c0
[   13.416326]  _raw_spin_unlock+0x1b/0x40
[   13.416422]  dispatch_to_local_dsq+0x108/0x1a0
[   13.416554]  flush_dispatch_buf+0x199/0x1d0
[   13.416652]  balance_one+0x194/0x370
[   13.416751]  balance_scx+0x61/0x1e0
[   13.416848]  prev_balance+0x43/0xb0
[   13.416947]  __pick_next_task+0x6b/0x1b0
[   13.417052]  __schedule+0x20d/0x1740

This happens because dispatch_to_local_dsq() races with
dispatch_dequeue(): when the latter wins, we incorrectly assume that the
task has been moved to dst_rq and later unlock an rq that we never
locked.

Fix this by treating the task as still being on src_rq in this specific
scenario.

Fixes: 4d3ca89bdd31 ("sched_ext: Refactor consume_remote_task()")
Signed-off-by: Andrea Righi
---
 kernel/sched/ext.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index a24d48cebfb7..7500b1a26757 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2617,6 +2617,8 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
 		/* if the destination CPU is idle, wake it up */
 		if (sched_class_above(p->sched_class, dst_rq->curr->sched_class))
 			resched_curr(dst_rq);
+	} else {
+		dst_rq = src_rq;
 	}
 
 	/* switch back to @rq lock */
-- 
2.48.1