From: Andrea Righi
To: Tejun Heo, David Vernet, Changwoo Min
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH v2] sched_ext: Fix lock imbalance in dispatch_to_local_dsq()
Date: Fri, 24 Jan 2025 08:24:25 +0100
Message-ID: <20250124072425.47795-1-arighi@nvidia.com>

While performing the rq locking dance in dispatch_to_local_dsq(), we may
trigger the following lock imbalance condition, in particular when
multiple tasks are rapidly changing CPU affinity (e.g., when running
`stress-ng --race-sched 0`):

[ 13.413579] =====================================
[ 13.413660] WARNING: bad unlock balance detected!
[ 13.413729] 6.13.0-virtme #15 Not tainted
[ 13.413792] -------------------------------------
[ 13.413859] kworker/1:1/80 is trying to release lock (&rq->__lock) at:
[ 13.413954] [] dispatch_to_local_dsq+0x108/0x1a0
[ 13.414111] but there are no more locks to release!
[ 13.414176]
[ 13.414176] other info that might help us debug this:
[ 13.414258] 1 lock held by kworker/1:1/80:
[ 13.414318] #0: ffff8b66feb41698 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x20/0x90
[ 13.414612]
[ 13.414612] stack backtrace:
[ 13.415255] CPU: 1 UID: 0 PID: 80 Comm: kworker/1:1 Not tainted 6.13.0-virtme #15
[ 13.415505] Workqueue: 0x0 (events)
[ 13.415567] Sched_ext: dsp_local_on (enabled+all), task: runnable_at=-2ms
[ 13.415570] Call Trace:
[ 13.415700]
[ 13.415744] dump_stack_lvl+0x78/0xe0
[ 13.415806] ? dispatch_to_local_dsq+0x108/0x1a0
[ 13.415884] print_unlock_imbalance_bug+0x11b/0x130
[ 13.415965] ? dispatch_to_local_dsq+0x108/0x1a0
[ 13.416226] lock_release+0x231/0x2c0
[ 13.416326] _raw_spin_unlock+0x1b/0x40
[ 13.416422] dispatch_to_local_dsq+0x108/0x1a0
[ 13.416554] flush_dispatch_buf+0x199/0x1d0
[ 13.416652] balance_one+0x194/0x370
[ 13.416751] balance_scx+0x61/0x1e0
[ 13.416848] prev_balance+0x43/0xb0
[ 13.416947] __pick_next_task+0x6b/0x1b0
[ 13.417052] __schedule+0x20d/0x1740

This happens because dispatch_to_local_dsq() races with
dispatch_dequeue(): when the latter wins, we incorrectly assume that the
task has been moved to dst_rq and release dst_rq's lock, even though we
are still holding src_rq's lock.

Fix this by correctly assuming that the task is still in src_rq in this
specific scenario.

Fixes: 4d3ca89bdd31 ("sched_ext: Refactor consume_remote_task()")
Signed-off-by: Andrea Righi
---
 kernel/sched/ext.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

ChangeLog v1 -> v2:
 - more comments to clarify the race with dequeue
 - rebase to tip

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 3f3f6baac917..92b69c57b400 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2598,7 +2598,10 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
 		raw_spin_rq_lock(src_rq);
 	}
 
-	/* task_rq couldn't have changed if we're still the holding cpu */
+	/*
+	 * If p->scx.holding_cpu still matches the current CPU, task_rq(p)
+	 * has not changed and we can safely move the task to @dst_rq.
+	 */
 	if (likely(p->scx.holding_cpu == raw_smp_processor_id()) &&
 	    !WARN_ON_ONCE(src_rq != task_rq(p))) {
 		/*
@@ -2617,6 +2620,13 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
 		/* if the destination CPU is idle, wake it up */
 		if (sched_class_above(p->sched_class, dst_rq->curr->sched_class))
 			resched_curr(dst_rq);
+	} else {
+		/*
+		 * Otherwise, if dequeue wins the race, we no longer have
+		 * exclusive ownership of the task and we must keep it in
+		 * its original @src_dsq.
+		 */
+		dst_rq = src_rq;
 	}
 
 	/* switch back to @rq lock */
-- 
2.48.1