From: Andrea Righi
To: Tejun Heo, David Vernet, Changwoo Min
Cc: Yury Norov, linux-kernel@vger.kernel.org
Subject: [PATCH sched_ext/for-6.14] sched_ext: Move built-in idle CPU selection policy to a separate file
Date: Thu, 23 Jan 2025 23:02:02 +0100
Message-ID: <20250123220202.16274-1-arighi@nvidia.com>
X-Mailer: git-send-email 2.48.1

As ext.c is becoming quite large, move the idle CPU selection policy to
separate files (ext_idle.c / ext_idle.h) for better code readability.

Moreover, group all the idle CPU selection kfuncs together in the same
btf_kfunc_id_set block.

No functional changes; this is purely a code reorganization.

Suggested-by: Yury Norov
Signed-off-by: Andrea Righi
---
 MAINTAINERS                 |   2 +
 kernel/sched/build_policy.c |   1 +
 kernel/sched/ext.c          | 753 +-----------------------------------
 kernel/sched/ext_idle.c     | 722 ++++++++++++++++++++++++++++++++++
 kernel/sched/ext_idle.h     |  45 +++
 5 files changed, 787 insertions(+), 736 deletions(-)
 create mode 100644 kernel/sched/ext_idle.c
 create mode 100644 kernel/sched/ext_idle.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 023df277737d..cd3d5b139a11 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20935,6 +20935,8 @@ T:	git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git
 F:	include/linux/sched/ext.h
 F:	kernel/sched/ext.h
 F:	kernel/sched/ext.c
+F:	kernel/sched/ext_idle.c
+F:	kernel/sched/ext_idle.h
 F:	tools/sched_ext/
 F:	tools/testing/selftests/sched_ext
 
diff --git a/kernel/sched/build_policy.c b/kernel/sched/build_policy.c
index fae1f5c921eb..72d97aa8b726 100644
--- a/kernel/sched/build_policy.c
+++ b/kernel/sched/build_policy.c
@@ -61,6 +61,7 @@
 
 #ifdef CONFIG_SCHED_CLASS_EXT
 # include "ext.c"
+# include "ext_idle.c"
 #endif
 
 #include "syscalls.c"
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 564f250e7689..a24d48cebfb7 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -6,6 +6,9 @@
  * Copyright (c) 2022 Tejun Heo
  * Copyright (c) 2022 David Vernet
  */
+#include
+#include "ext_idle.h"
+
 #define SCX_OP_IDX(op) (offsetof(struct sched_ext_ops, op) / sizeof(void (*)(void)))
 
 enum scx_consts {
@@ -883,12 +886,6 @@ static bool scx_warned_zero_slice;
 static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_last);
 static DEFINE_STATIC_KEY_FALSE(scx_ops_enq_exiting);
 static DEFINE_STATIC_KEY_FALSE(scx_ops_cpu_preempt);
-static DEFINE_STATIC_KEY_FALSE(scx_builtin_idle_enabled);
-
-#ifdef CONFIG_SMP
-static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_llc);
-static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_numa);
-#endif
 
 static struct static_key_false scx_has_op[SCX_OPI_END] =
 	{ [0 ... SCX_OPI_END-1] = STATIC_KEY_FALSE_INIT };
@@ -896,6 +893,17 @@ static struct static_key_false scx_has_op[SCX_OPI_END] =
 static atomic_t scx_exit_kind = ATOMIC_INIT(SCX_EXIT_DONE);
 static struct scx_exit_info *scx_exit_info;
 
+#define scx_ops_error_kind(err, fmt, args...) \
+	scx_ops_exit_kind((err), 0, fmt, ##args)
+
+#define scx_ops_exit(code, fmt, args...) \
+	scx_ops_exit_kind(SCX_EXIT_UNREG_KERN, (code), fmt, ##args)
+
+#define scx_ops_error(fmt, args...) \
+	scx_ops_error_kind(SCX_EXIT_ERROR, fmt, ##args)
+
+#define SCX_HAS_OP(op) static_branch_likely(&scx_has_op[SCX_OP_IDX(op)])
+
 static atomic_long_t scx_nr_rejected = ATOMIC_LONG_INIT(0);
 static atomic_long_t scx_hotplug_seq = ATOMIC_LONG_INIT(0);
 
@@ -923,21 +931,6 @@ static unsigned long scx_watchdog_timestamp = INITIAL_JIFFIES;
 
 static struct delayed_work scx_watchdog_work;
 
-/* idle tracking */
-#ifdef CONFIG_SMP
-#ifdef CONFIG_CPUMASK_OFFSTACK
-#define CL_ALIGNED_IF_ONSTACK
-#else
-#define CL_ALIGNED_IF_ONSTACK __cacheline_aligned_in_smp
-#endif
-
-static struct {
-	cpumask_var_t cpu;
-	cpumask_var_t smt;
-} idle_masks CL_ALIGNED_IF_ONSTACK;
-
-#endif /* CONFIG_SMP */
-
 /* for %SCX_KICK_WAIT */
 static unsigned long __percpu *scx_kick_cpus_pnt_seqs;
 
@@ -1024,17 +1017,6 @@ static __printf(3, 4) void scx_ops_exit_kind(enum scx_exit_kind kind,
 					     s64 exit_code,
 					     const char *fmt, ...);
 
-#define scx_ops_error_kind(err, fmt, args...) \
-	scx_ops_exit_kind((err), 0, fmt, ##args)
-
-#define scx_ops_exit(code, fmt, args...) \
-	scx_ops_exit_kind(SCX_EXIT_UNREG_KERN, (code), fmt, ##args)
-
-#define scx_ops_error(fmt, args...) \
-	scx_ops_error_kind(SCX_EXIT_ERROR, fmt, ##args)
-
-#define SCX_HAS_OP(op) static_branch_likely(&scx_has_op[SCX_OP_IDX(op)])
-
 static long jiffies_delta_msecs(unsigned long at, unsigned long now)
 {
 	if (time_after(at, now))
@@ -3169,416 +3151,6 @@ bool scx_prio_less(const struct task_struct *a, const struct task_struct *b,
 
 #ifdef CONFIG_SMP
 
-static bool test_and_clear_cpu_idle(int cpu)
-{
-#ifdef CONFIG_SCHED_SMT
-	/*
-	 * SMT mask should be cleared whether we can claim @cpu or not. The SMT
-	 * cluster is not wholly idle either way. This also prevents
-	 * scx_pick_idle_cpu() from getting caught in an infinite loop.
-	 */
-	if (sched_smt_active()) {
-		const struct cpumask *smt = cpu_smt_mask(cpu);
-
-		/*
-		 * If offline, @cpu is not its own sibling and
-		 * scx_pick_idle_cpu() can get caught in an infinite loop as
-		 * @cpu is never cleared from idle_masks.smt. Ensure that @cpu
-		 * is eventually cleared.
-		 *
-		 * NOTE: Use cpumask_intersects() and cpumask_test_cpu() to
-		 * reduce memory writes, which may help alleviate cache
-		 * coherence pressure.
-		 */
-		if (cpumask_intersects(smt, idle_masks.smt))
-			cpumask_andnot(idle_masks.smt, idle_masks.smt, smt);
-		else if (cpumask_test_cpu(cpu, idle_masks.smt))
-			__cpumask_clear_cpu(cpu, idle_masks.smt);
-	}
-#endif
-	return cpumask_test_and_clear_cpu(cpu, idle_masks.cpu);
-}
-
-static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
-{
-	int cpu;
-
-retry:
-	if (sched_smt_active()) {
-		cpu = cpumask_any_and_distribute(idle_masks.smt, cpus_allowed);
-		if (cpu < nr_cpu_ids)
-			goto found;
-
-		if (flags & SCX_PICK_IDLE_CORE)
-			return -EBUSY;
-	}
-
-	cpu = cpumask_any_and_distribute(idle_masks.cpu, cpus_allowed);
-	if (cpu >= nr_cpu_ids)
-		return -EBUSY;
-
-found:
-	if (test_and_clear_cpu_idle(cpu))
-		return cpu;
-	else
-		goto retry;
-}
-
-/*
- * Return the amount of CPUs in the same LLC domain of @cpu (or zero if the LLC
- * domain is not defined).
- */
-static unsigned int llc_weight(s32 cpu)
-{
-	struct sched_domain *sd;
-
-	sd = rcu_dereference(per_cpu(sd_llc, cpu));
-	if (!sd)
-		return 0;
-
-	return sd->span_weight;
-}
-
-/*
- * Return the cpumask representing the LLC domain of @cpu (or NULL if the LLC
- * domain is not defined).
- */
-static struct cpumask *llc_span(s32 cpu)
-{
-	struct sched_domain *sd;
-
-	sd = rcu_dereference(per_cpu(sd_llc, cpu));
-	if (!sd)
-		return 0;
-
-	return sched_domain_span(sd);
-}
-
-/*
- * Return the amount of CPUs in the same NUMA domain of @cpu (or zero if the
- * NUMA domain is not defined).
- */
-static unsigned int numa_weight(s32 cpu)
-{
-	struct sched_domain *sd;
-	struct sched_group *sg;
-
-	sd = rcu_dereference(per_cpu(sd_numa, cpu));
-	if (!sd)
-		return 0;
-	sg = sd->groups;
-	if (!sg)
-		return 0;
-
-	return sg->group_weight;
-}
-
-/*
- * Return the cpumask representing the NUMA domain of @cpu (or NULL if the NUMA
- * domain is not defined).
- */
-static struct cpumask *numa_span(s32 cpu)
-{
-	struct sched_domain *sd;
-	struct sched_group *sg;
-
-	sd = rcu_dereference(per_cpu(sd_numa, cpu));
-	if (!sd)
-		return NULL;
-	sg = sd->groups;
-	if (!sg)
-		return NULL;
-
-	return sched_group_span(sg);
-}
-
-/*
- * Return true if the LLC domains do not perfectly overlap with the NUMA
- * domains, false otherwise.
- */
-static bool llc_numa_mismatch(void)
-{
-	int cpu;
-
-	/*
-	 * We need to scan all online CPUs to verify whether their scheduling
-	 * domains overlap.
-	 *
-	 * While it is rare to encounter architectures with asymmetric NUMA
-	 * topologies, CPU hotplugging or virtualized environments can result
-	 * in asymmetric configurations.
-	 *
-	 * For example:
-	 *
-	 *  NUMA 0:
-	 *    - LLC 0: cpu0..cpu7
-	 *    - LLC 1: cpu8..cpu15 [offline]
-	 *
-	 *  NUMA 1:
-	 *    - LLC 0: cpu16..cpu23
-	 *    - LLC 1: cpu24..cpu31
-	 *
-	 * In this case, if we only check the first online CPU (cpu0), we might
-	 * incorrectly assume that the LLC and NUMA domains are fully
-	 * overlapping, which is incorrect (as NUMA 1 has two distinct LLC
-	 * domains).
-	 */
-	for_each_online_cpu(cpu)
-		if (llc_weight(cpu) != numa_weight(cpu))
-			return true;
-
-	return false;
-}
-
-/*
- * Initialize topology-aware scheduling.
- *
- * Detect if the system has multiple LLC or multiple NUMA domains and enable
- * cache-aware / NUMA-aware scheduling optimizations in the default CPU idle
- * selection policy.
- *
- * Assumption: the kernel's internal topology representation assumes that each
- * CPU belongs to a single LLC domain, and that each LLC domain is entirely
- * contained within a single NUMA node.
- */
-static void update_selcpu_topology(void)
-{
-	bool enable_llc = false, enable_numa = false;
-	unsigned int nr_cpus;
-	s32 cpu = cpumask_first(cpu_online_mask);
-
-	/*
-	 * Enable LLC domain optimization only when there are multiple LLC
-	 * domains among the online CPUs. If all online CPUs are part of a
-	 * single LLC domain, the idle CPU selection logic can choose any
-	 * online CPU without bias.
-	 *
-	 * Note that it is sufficient to check the LLC domain of the first
-	 * online CPU to determine whether a single LLC domain includes all
-	 * CPUs.
-	 */
-	rcu_read_lock();
-	nr_cpus = llc_weight(cpu);
-	if (nr_cpus > 0) {
-		if (nr_cpus < num_online_cpus())
-			enable_llc = true;
-		pr_debug("sched_ext: LLC=%*pb weight=%u\n",
-			 cpumask_pr_args(llc_span(cpu)), llc_weight(cpu));
-	}
-
-	/*
-	 * Enable NUMA optimization only when there are multiple NUMA domains
-	 * among the online CPUs and the NUMA domains don't perfectly overlaps
-	 * with the LLC domains.
-	 *
-	 * If all CPUs belong to the same NUMA node and the same LLC domain,
-	 * enabling both NUMA and LLC optimizations is unnecessary, as checking
-	 * for an idle CPU in the same domain twice is redundant.
-	 */
-	nr_cpus = numa_weight(cpu);
-	if (nr_cpus > 0) {
-		if (nr_cpus < num_online_cpus() && llc_numa_mismatch())
-			enable_numa = true;
-		pr_debug("sched_ext: NUMA=%*pb weight=%u\n",
-			 cpumask_pr_args(numa_span(cpu)), numa_weight(cpu));
-	}
-	rcu_read_unlock();
-
-	pr_debug("sched_ext: LLC idle selection %s\n",
-		 str_enabled_disabled(enable_llc));
-	pr_debug("sched_ext: NUMA idle selection %s\n",
-		 str_enabled_disabled(enable_numa));
-
-	if (enable_llc)
-		static_branch_enable_cpuslocked(&scx_selcpu_topo_llc);
-	else
-		static_branch_disable_cpuslocked(&scx_selcpu_topo_llc);
-	if (enable_numa)
-		static_branch_enable_cpuslocked(&scx_selcpu_topo_numa);
-	else
-		static_branch_disable_cpuslocked(&scx_selcpu_topo_numa);
-}
-
-/*
- * Built-in CPU idle selection policy:
- *
- * 1. Prioritize full-idle cores:
- *   - always prioritize CPUs from fully idle cores (both logical CPUs are
- *     idle) to avoid interference caused by SMT.
- *
- * 2. Reuse the same CPU:
- *   - prefer the last used CPU to take advantage of cached data (L1, L2) and
- *     branch prediction optimizations.
- *
- * 3. Pick a CPU within the same LLC (Last-Level Cache):
- *   - if the above conditions aren't met, pick a CPU that shares the same LLC
- *     to maintain cache locality.
- *
- * 4. Pick a CPU within the same NUMA node, if enabled:
- *   - choose a CPU from the same NUMA node to reduce memory access latency.
- *
- * 5. Pick any idle CPU usable by the task.
- *
- * Step 3 and 4 are performed only if the system has, respectively, multiple
- * LLC domains / multiple NUMA nodes (see scx_selcpu_topo_llc and
- * scx_selcpu_topo_numa).
- *
- * NOTE: tasks that can only run on 1 CPU are excluded by this logic, because
- * we never call ops.select_cpu() for them, see select_task_rq().
- */
-static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
-			      u64 wake_flags, bool *found)
-{
-	const struct cpumask *llc_cpus = NULL;
-	const struct cpumask *numa_cpus = NULL;
-	s32 cpu;
-
-	*found = false;
-
-	/*
-	 * This is necessary to protect llc_cpus.
-	 */
-	rcu_read_lock();
-
-	/*
-	 * Determine the scheduling domain only if the task is allowed to run
-	 * on all CPUs.
-	 *
-	 * This is done primarily for efficiency, as it avoids the overhead of
-	 * updating a cpumask every time we need to select an idle CPU (which
-	 * can be costly in large SMP systems), but it also aligns logically:
-	 * if a task's scheduling domain is restricted by user-space (through
-	 * CPU affinity), the task will simply use the flat scheduling domain
-	 * defined by user-space.
-	 */
-	if (p->nr_cpus_allowed >= num_possible_cpus()) {
-		if (static_branch_maybe(CONFIG_NUMA, &scx_selcpu_topo_numa))
-			numa_cpus = numa_span(prev_cpu);
-
-		if (static_branch_maybe(CONFIG_SCHED_MC, &scx_selcpu_topo_llc))
-			llc_cpus = llc_span(prev_cpu);
-	}
-
-	/*
-	 * If WAKE_SYNC, try to migrate the wakee to the waker's CPU.
-	 */
-	if (wake_flags & SCX_WAKE_SYNC) {
-		cpu = smp_processor_id();
-
-		/*
-		 * If the waker's CPU is cache affine and prev_cpu is idle,
-		 * then avoid a migration.
-		 */
-		if (cpus_share_cache(cpu, prev_cpu) &&
-		    test_and_clear_cpu_idle(prev_cpu)) {
-			cpu = prev_cpu;
-			goto cpu_found;
-		}
-
-		/*
-		 * If the waker's local DSQ is empty, and the system is under
-		 * utilized, try to wake up @p to the local DSQ of the waker.
-		 *
-		 * Checking only for an empty local DSQ is insufficient as it
-		 * could give the wakee an unfair advantage when the system is
-		 * oversaturated.
-		 *
-		 * Checking only for the presence of idle CPUs is also
-		 * insufficient as the local DSQ of the waker could have tasks
-		 * piled up on it even if there is an idle core elsewhere on
-		 * the system.
-		 */
-		if (!cpumask_empty(idle_masks.cpu) &&
-		    !(current->flags & PF_EXITING) &&
-		    cpu_rq(cpu)->scx.local_dsq.nr == 0) {
-			if (cpumask_test_cpu(cpu, p->cpus_ptr))
-				goto cpu_found;
-		}
-	}
-
-	/*
-	 * If CPU has SMT, any wholly idle CPU is likely a better pick than
-	 * partially idle @prev_cpu.
-	 */
-	if (sched_smt_active()) {
-		/*
-		 * Keep using @prev_cpu if it's part of a fully idle core.
-		 */
-		if (cpumask_test_cpu(prev_cpu, idle_masks.smt) &&
-		    test_and_clear_cpu_idle(prev_cpu)) {
-			cpu = prev_cpu;
-			goto cpu_found;
-		}
-
-		/*
-		 * Search for any fully idle core in the same LLC domain.
-		 */
-		if (llc_cpus) {
-			cpu = scx_pick_idle_cpu(llc_cpus, SCX_PICK_IDLE_CORE);
-			if (cpu >= 0)
-				goto cpu_found;
-		}
-
-		/*
-		 * Search for any fully idle core in the same NUMA node.
-		 */
-		if (numa_cpus) {
-			cpu = scx_pick_idle_cpu(numa_cpus, SCX_PICK_IDLE_CORE);
-			if (cpu >= 0)
-				goto cpu_found;
-		}
-
-		/*
-		 * Search for any full idle core usable by the task.
-		 */
-		cpu = scx_pick_idle_cpu(p->cpus_ptr, SCX_PICK_IDLE_CORE);
-		if (cpu >= 0)
-			goto cpu_found;
-	}
-
-	/*
-	 * Use @prev_cpu if it's idle.
-	 */
-	if (test_and_clear_cpu_idle(prev_cpu)) {
-		cpu = prev_cpu;
-		goto cpu_found;
-	}
-
-	/*
-	 * Search for any idle CPU in the same LLC domain.
-	 */
-	if (llc_cpus) {
-		cpu = scx_pick_idle_cpu(llc_cpus, 0);
-		if (cpu >= 0)
-			goto cpu_found;
-	}
-
-	/*
-	 * Search for any idle CPU in the same NUMA node.
-	 */
-	if (numa_cpus) {
-		cpu = scx_pick_idle_cpu(numa_cpus, 0);
-		if (cpu >= 0)
-			goto cpu_found;
-	}
-
-	/*
-	 * Search for any idle CPU usable by the task.
-	 */
-	cpu = scx_pick_idle_cpu(p->cpus_ptr, 0);
-	if (cpu >= 0)
-		goto cpu_found;
-
-	rcu_read_unlock();
-	return prev_cpu;
-
-cpu_found:
-	rcu_read_unlock();
-
-	*found = true;
-	return cpu;
-}
-
 static int select_task_rq_scx(struct task_struct *p, int prev_cpu, int wake_flags)
 {
 	/*
@@ -3645,90 +3217,6 @@ static void set_cpus_allowed_scx(struct task_struct *p,
 				 (struct cpumask *)p->cpus_ptr);
 }
 
-static void reset_idle_masks(void)
-{
-	/*
-	 * Consider all online cpus idle. Should converge to the actual state
-	 * quickly.
-	 */
-	cpumask_copy(idle_masks.cpu, cpu_online_mask);
-	cpumask_copy(idle_masks.smt, cpu_online_mask);
-}
-
-static void update_builtin_idle(int cpu, bool idle)
-{
-	assign_cpu(cpu, idle_masks.cpu, idle);
-
-#ifdef CONFIG_SCHED_SMT
-	if (sched_smt_active()) {
-		const struct cpumask *smt = cpu_smt_mask(cpu);
-
-		if (idle) {
-			/*
-			 * idle_masks.smt handling is racy but that's fine as
-			 * it's only for optimization and self-correcting.
-			 */
-			if (!cpumask_subset(smt, idle_masks.cpu))
-				return;
-			cpumask_or(idle_masks.smt, idle_masks.smt, smt);
-		} else {
-			cpumask_andnot(idle_masks.smt, idle_masks.smt, smt);
-		}
-	}
-#endif
-}
-
-/*
- * Update the idle state of a CPU to @idle.
- *
- * If @do_notify is true, ops.update_idle() is invoked to notify the scx
- * scheduler of an actual idle state transition (idle to busy or vice
- * versa). If @do_notify is false, only the idle state in the idle masks is
- * refreshed without invoking ops.update_idle().
- *
- * This distinction is necessary, because an idle CPU can be "reserved" and
- * awakened via scx_bpf_pick_idle_cpu() + scx_bpf_kick_cpu(), marking it as
- * busy even if no tasks are dispatched. In this case, the CPU may return
- * to idle without a true state transition. Refreshing the idle masks
- * without invoking ops.update_idle() ensures accurate idle state tracking
- * while avoiding unnecessary updates and maintaining balanced state
- * transitions.
- */
-void __scx_update_idle(struct rq *rq, bool idle, bool do_notify)
-{
-	int cpu = cpu_of(rq);
-
-	lockdep_assert_rq_held(rq);
-
-	/*
-	 * Trigger ops.update_idle() only when transitioning from a task to
-	 * the idle thread and vice versa.
-	 *
-	 * Idle transitions are indicated by do_notify being set to true,
-	 * managed by put_prev_task_idle()/set_next_task_idle().
-	 */
-	if (SCX_HAS_OP(update_idle) && do_notify && !scx_rq_bypassing(rq))
-		SCX_CALL_OP(SCX_KF_REST, update_idle, cpu_of(rq), idle);
-
-	/*
-	 * Update the idle masks:
-	 * - for real idle transitions (do_notify == true)
-	 * - for idle-to-idle transitions (indicated by the previous task
-	 *   being the idle thread, managed by pick_task_idle())
-	 *
-	 * Skip updating idle masks if the previous task is not the idle
-	 * thread, since set_next_task_idle() has already handled it when
-	 * transitioning from a task to the idle thread (calling this
-	 * function with do_notify == true).
-	 *
-	 * In this way we can avoid updating the idle masks twice,
-	 * unnecessarily.
-	 */
-	if (static_branch_likely(&scx_builtin_idle_enabled))
-		if (do_notify || is_idle_task(rq->curr))
-			update_builtin_idle(cpu, idle);
-}
-
 static void handle_hotplug(struct rq *rq, bool online)
 {
 	int cpu = cpu_of(rq);
@@ -3768,12 +3256,6 @@ static void rq_offline_scx(struct rq *rq)
 	rq->scx.flags &= ~SCX_RQ_ONLINE;
 }
 
-#else /* CONFIG_SMP */
-
-static bool test_and_clear_cpu_idle(int cpu) { return false; }
-static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags) { return -EBUSY; }
-static void reset_idle_masks(void) {}
-
 #endif /* CONFIG_SMP */
 
 static bool check_rq_for_timeouts(struct rq *rq)
@@ -5632,9 +5114,8 @@ static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
 			static_branch_enable_cpuslocked(&scx_has_op[i]);
 
 	check_hotplug_seq(ops);
-#ifdef CONFIG_SMP
 	update_selcpu_topology();
-#endif
+
 	cpus_read_unlock();
 
 	ret = validate_ops(ops);
@@ -6325,10 +5806,8 @@ void __init init_sched_ext_class(void)
 		   SCX_TG_ONLINE);
 
 	BUG_ON(rhashtable_init(&dsq_hash, &dsq_hash_params));
-#ifdef CONFIG_SMP
-	BUG_ON(!alloc_cpumask_var(&idle_masks.cpu, GFP_KERNEL));
-	BUG_ON(!alloc_cpumask_var(&idle_masks.smt, GFP_KERNEL));
-#endif
+	init_idle_masks();
+
 	scx_kick_cpus_pnt_seqs =
 		__alloc_percpu(sizeof(scx_kick_cpus_pnt_seqs[0]) * nr_cpu_ids,
 			       __alignof__(scx_kick_cpus_pnt_seqs[0]));
@@ -6361,62 +5840,6 @@ void __init init_sched_ext_class(void)
 /********************************************************************************
  * Helpers that can be called from the BPF scheduler.
  */
-#include
-
-__bpf_kfunc_start_defs();
-
-static bool check_builtin_idle_enabled(void)
-{
-	if (static_branch_likely(&scx_builtin_idle_enabled))
-		return true;
-
-	scx_ops_error("built-in idle tracking is disabled");
-	return false;
-}
-
-/**
- * scx_bpf_select_cpu_dfl - The default implementation of ops.select_cpu()
- * @p: task_struct to select a CPU for
- * @prev_cpu: CPU @p was on previously
- * @wake_flags: %SCX_WAKE_* flags
- * @is_idle: out parameter indicating whether the returned CPU is idle
- *
- * Can only be called from ops.select_cpu() if the built-in CPU selection is
- * enabled - ops.update_idle() is missing or %SCX_OPS_KEEP_BUILTIN_IDLE is set.
- * @p, @prev_cpu and @wake_flags match ops.select_cpu().
- *
- * Returns the picked CPU with *@is_idle indicating whether the picked CPU is
- * currently idle and thus a good candidate for direct dispatching.
- */
-__bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
-				       u64 wake_flags, bool *is_idle)
-{
-	if (!check_builtin_idle_enabled())
-		goto prev_cpu;
-
-	if (!scx_kf_allowed(SCX_KF_SELECT_CPU))
-		goto prev_cpu;
-
-#ifdef CONFIG_SMP
-	return scx_select_cpu_dfl(p, prev_cpu, wake_flags, is_idle);
-#endif
-
-prev_cpu:
-	*is_idle = false;
-	return prev_cpu;
-}
-
-__bpf_kfunc_end_defs();
-
-BTF_KFUNCS_START(scx_kfunc_ids_select_cpu)
-BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_RCU)
-BTF_KFUNCS_END(scx_kfunc_ids_select_cpu)
-
-static const struct btf_kfunc_id_set scx_kfunc_set_select_cpu = {
-	.owner = THIS_MODULE,
-	.set = &scx_kfunc_ids_select_cpu,
-};
-
 static bool scx_dsq_insert_preamble(struct task_struct *p, u64 enq_flags)
 {
 	if (!scx_kf_allowed(SCX_KF_ENQUEUE | SCX_KF_DISPATCH))
@@ -7475,142 +6898,6 @@ __bpf_kfunc void scx_bpf_put_cpumask(const struct cpumask *cpumask)
 	 */
 }
 
-/**
- * scx_bpf_get_idle_cpumask - Get a referenced kptr to the idle-tracking
- * per-CPU cpumask.
- *
- * Returns NULL if idle tracking is not enabled, or running on a UP kernel.
- */
-__bpf_kfunc const struct cpumask *scx_bpf_get_idle_cpumask(void)
-{
-	if (!check_builtin_idle_enabled())
-		return cpu_none_mask;
-
-#ifdef CONFIG_SMP
-	return idle_masks.cpu;
-#else
-	return cpu_none_mask;
-#endif
-}
-
-/**
- * scx_bpf_get_idle_smtmask - Get a referenced kptr to the idle-tracking,
- * per-physical-core cpumask. Can be used to determine if an entire physical
- * core is free.
- *
- * Returns NULL if idle tracking is not enabled, or running on a UP kernel.
- */
-__bpf_kfunc const struct cpumask *scx_bpf_get_idle_smtmask(void)
-{
-	if (!check_builtin_idle_enabled())
-		return cpu_none_mask;
-
-#ifdef CONFIG_SMP
-	if (sched_smt_active())
-		return idle_masks.smt;
-	else
-		return idle_masks.cpu;
-#else
-	return cpu_none_mask;
-#endif
-}
-
-/**
- * scx_bpf_put_idle_cpumask - Release a previously acquired referenced kptr to
- * either the percpu, or SMT idle-tracking cpumask.
- * @idle_mask: &cpumask to use
- */
-__bpf_kfunc void scx_bpf_put_idle_cpumask(const struct cpumask *idle_mask)
-{
-	/*
-	 * Empty function body because we aren't actually acquiring or releasing
-	 * a reference to a global idle cpumask, which is read-only in the
-	 * caller and is never released. The acquire / release semantics here
-	 * are just used to make the cpumask a trusted pointer in the caller.
-	 */
-}
-
-/**
- * scx_bpf_test_and_clear_cpu_idle - Test and clear @cpu's idle state
- * @cpu: cpu to test and clear idle for
- *
- * Returns %true if @cpu was idle and its idle state was successfully cleared.
- * %false otherwise.
- *
- * Unavailable if ops.update_idle() is implemented and
- * %SCX_OPS_KEEP_BUILTIN_IDLE is not set.
- */
-__bpf_kfunc bool scx_bpf_test_and_clear_cpu_idle(s32 cpu)
-{
-	if (!check_builtin_idle_enabled())
-		return false;
-
-	if (ops_cpu_valid(cpu, NULL))
-		return test_and_clear_cpu_idle(cpu);
-	else
-		return false;
-}
-
-/**
- * scx_bpf_pick_idle_cpu - Pick and claim an idle cpu
- * @cpus_allowed: Allowed cpumask
- * @flags: %SCX_PICK_IDLE_CPU_* flags
- *
- * Pick and claim an idle cpu in @cpus_allowed. Returns the picked idle cpu
- * number on success. -%EBUSY if no matching cpu was found.
- *
- * Idle CPU tracking may race against CPU scheduling state transitions. For
- * example, this function may return -%EBUSY as CPUs are transitioning into the
- * idle state. If the caller then assumes that there will be dispatch events on
- * the CPUs as they were all busy, the scheduler may end up stalling with CPUs
- * idling while there are pending tasks. Use scx_bpf_pick_any_cpu() and
- * scx_bpf_kick_cpu() to guarantee that there will be at least one dispatch
- * event in the near future.
- *
- * Unavailable if ops.update_idle() is implemented and
- * %SCX_OPS_KEEP_BUILTIN_IDLE is not set.
- */
-__bpf_kfunc s32 scx_bpf_pick_idle_cpu(const struct cpumask *cpus_allowed,
-				      u64 flags)
-{
-	if (!check_builtin_idle_enabled())
-		return -EBUSY;
-
-	return scx_pick_idle_cpu(cpus_allowed, flags);
-}
-
-/**
- * scx_bpf_pick_any_cpu - Pick and claim an idle cpu if available or pick any CPU
- * @cpus_allowed: Allowed cpumask
- * @flags: %SCX_PICK_IDLE_CPU_* flags
- *
- * Pick and claim an idle cpu in @cpus_allowed. If none is available, pick any
- * CPU in @cpus_allowed. Guaranteed to succeed and returns the picked idle cpu
- * number if @cpus_allowed is not empty. -%EBUSY is returned if @cpus_allowed is
- * empty.
- *
- * If ops.update_idle() is implemented and %SCX_OPS_KEEP_BUILTIN_IDLE is not
- * set, this function can't tell which CPUs are idle and will always pick any
- * CPU.
- */
-__bpf_kfunc s32 scx_bpf_pick_any_cpu(const struct cpumask *cpus_allowed,
-				     u64 flags)
-{
-	s32 cpu;
-
-	if (static_branch_likely(&scx_builtin_idle_enabled)) {
-		cpu = scx_pick_idle_cpu(cpus_allowed, flags);
-		if (cpu >= 0)
-			return cpu;
-	}
-
-	cpu = cpumask_any_distribute(cpus_allowed);
-	if (cpu < nr_cpu_ids)
-		return cpu;
-	else
-		return -EBUSY;
-}
-
 /**
  * scx_bpf_task_running - Is task currently running?
  * @p: task of interest
@@ -7750,12 +7037,6 @@ BTF_ID_FLAGS(func, scx_bpf_nr_cpu_ids)
 BTF_ID_FLAGS(func, scx_bpf_get_possible_cpumask, KF_ACQUIRE)
 BTF_ID_FLAGS(func, scx_bpf_get_online_cpumask, KF_ACQUIRE)
 BTF_ID_FLAGS(func, scx_bpf_put_cpumask, KF_RELEASE)
-BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask, KF_ACQUIRE)
-BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask, KF_ACQUIRE)
-BTF_ID_FLAGS(func, scx_bpf_put_idle_cpumask, KF_RELEASE)
-BTF_ID_FLAGS(func, scx_bpf_test_and_clear_cpu_idle)
-BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_RCU)
-BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_task_running, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_task_cpu, KF_RCU)
 BTF_ID_FLAGS(func, scx_bpf_cpu_rq)
diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
new file mode 100644
index 000000000000..ca99fc58af91
--- /dev/null
+++ b/kernel/sched/ext_idle.c
@@ -0,0 +1,722 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * BPF extensible scheduler class: Documentation/scheduler/sched-ext.rst
+ *
+ * Built-in idle CPU tracking policy.
+ *
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ * Copyright (c) 2022 Tejun Heo
+ * Copyright (c) 2022 David Vernet
+ * Copyright (c) 2024 Andrea Righi
+ */
+#ifdef CONFIG_SMP
+#ifdef CONFIG_CPUMASK_OFFSTACK
+#define CL_ALIGNED_IF_ONSTACK
+#else
+#define CL_ALIGNED_IF_ONSTACK __cacheline_aligned_in_smp
+#endif
+
+static struct {
+	cpumask_var_t cpu;
+	cpumask_var_t smt;
+} idle_masks CL_ALIGNED_IF_ONSTACK;
+
+static bool test_and_clear_cpu_idle(int cpu)
+{
+#ifdef CONFIG_SCHED_SMT
+	/*
+	 * SMT mask should be cleared whether we can claim @cpu or not. The SMT
+	 * cluster is not wholly idle either way. This also prevents
+	 * scx_pick_idle_cpu() from getting caught in an infinite loop.
+	 */
+	if (sched_smt_active()) {
+		const struct cpumask *smt = cpu_smt_mask(cpu);
+
+		/*
+		 * If offline, @cpu is not its own sibling and
+		 * scx_pick_idle_cpu() can get caught in an infinite loop as
+		 * @cpu is never cleared from idle_masks.smt. Ensure that @cpu
+		 * is eventually cleared.
+		 *
+		 * NOTE: Use cpumask_intersects() and cpumask_test_cpu() to
+		 * reduce memory writes, which may help alleviate cache
+		 * coherence pressure.
+		 */
+		if (cpumask_intersects(smt, idle_masks.smt))
+			cpumask_andnot(idle_masks.smt, idle_masks.smt, smt);
+		else if (cpumask_test_cpu(cpu, idle_masks.smt))
+			__cpumask_clear_cpu(cpu, idle_masks.smt);
+	}
+#endif
+	return cpumask_test_and_clear_cpu(cpu, idle_masks.cpu);
+}
+
+static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
+{
+	int cpu;
+
+retry:
+	if (sched_smt_active()) {
+		cpu = cpumask_any_and_distribute(idle_masks.smt, cpus_allowed);
+		if (cpu < nr_cpu_ids)
+			goto found;
+
+		if (flags & SCX_PICK_IDLE_CORE)
+			return -EBUSY;
+	}
+
+	cpu = cpumask_any_and_distribute(idle_masks.cpu, cpus_allowed);
+	if (cpu >= nr_cpu_ids)
+		return -EBUSY;
+
+found:
+	if (test_and_clear_cpu_idle(cpu))
+		return cpu;
+	else
+		goto retry;
+}
+
+/*
+ * Return the amount of CPUs in the same LLC domain of @cpu (or zero if the LLC
+ * domain is not defined).
+ */
+static unsigned int llc_weight(s32 cpu)
+{
+	struct sched_domain *sd;
+
+	sd = rcu_dereference(per_cpu(sd_llc, cpu));
+	if (!sd)
+		return 0;
+
+	return sd->span_weight;
+}
+
+/*
+ * Return the cpumask representing the LLC domain of @cpu (or NULL if the LLC
+ * domain is not defined).
+ */
+static struct cpumask *llc_span(s32 cpu)
+{
+	struct sched_domain *sd;
+
+	sd = rcu_dereference(per_cpu(sd_llc, cpu));
+	if (!sd)
+		return 0;
+
+	return sched_domain_span(sd);
+}
+
+/*
+ * Return the amount of CPUs in the same NUMA domain of @cpu (or zero if the
+ * NUMA domain is not defined).
+ */
+static unsigned int numa_weight(s32 cpu)
+{
+	struct sched_domain *sd;
+	struct sched_group *sg;
+
+	sd = rcu_dereference(per_cpu(sd_numa, cpu));
+	if (!sd)
+		return 0;
+	sg = sd->groups;
+	if (!sg)
+		return 0;
+
+	return sg->group_weight;
+}
+
+/*
+ * Return the cpumask representing the NUMA domain of @cpu (or NULL if the NUMA
+ * domain is not defined).
+ */
+static struct cpumask *numa_span(s32 cpu)
+{
+	struct sched_domain *sd;
+	struct sched_group *sg;
+
+	sd = rcu_dereference(per_cpu(sd_numa, cpu));
+	if (!sd)
+		return NULL;
+	sg = sd->groups;
+	if (!sg)
+		return NULL;
+
+	return sched_group_span(sg);
+}
+
+/*
+ * Return true if the LLC domains do not perfectly overlap with the NUMA
+ * domains, false otherwise.
+ */
+static bool llc_numa_mismatch(void)
+{
+	int cpu;
+
+	/*
+	 * We need to scan all online CPUs to verify whether their scheduling
+	 * domains overlap.
+	 *
+	 * While it is rare to encounter architectures with asymmetric NUMA
+	 * topologies, CPU hotplugging or virtualized environments can result
+	 * in asymmetric configurations.
+	 *
+	 * For example:
+	 *
+	 *  NUMA 0:
+	 *    - LLC 0: cpu0..cpu7
+	 *    - LLC 1: cpu8..cpu15 [offline]
+	 *
+	 *  NUMA 1:
+	 *    - LLC 0: cpu16..cpu23
+	 *    - LLC 1: cpu24..cpu31
+	 *
+	 * In this case, if we only check the first online CPU (cpu0), we might
+	 * incorrectly assume that the LLC and NUMA domains are fully
+	 * overlapping, which is incorrect (as NUMA 1 has two distinct LLC
+	 * domains).
+	 */
+	for_each_online_cpu(cpu)
+		if (llc_weight(cpu) != numa_weight(cpu))
+			return true;
+
+	return false;
+}
+
+/*
+ * Initialize topology-aware scheduling.
+ *
+ * Detect if the system has multiple LLC or multiple NUMA domains and enable
+ * cache-aware / NUMA-aware scheduling optimizations in the default CPU idle
+ * selection policy.
+ *
+ * Assumption: the kernel's internal topology representation assumes that each
+ * CPU belongs to a single LLC domain, and that each LLC domain is entirely
+ * contained within a single NUMA node.
+ */
+static void update_selcpu_topology(void)
+{
+	bool enable_llc = false, enable_numa = false;
+	unsigned int nr_cpus;
+	s32 cpu = cpumask_first(cpu_online_mask);
+
+	/*
+	 * Enable LLC domain optimization only when there are multiple LLC
+	 * domains among the online CPUs. If all online CPUs are part of a
+	 * single LLC domain, the idle CPU selection logic can choose any
+	 * online CPU without bias.
+	 *
+	 * Note that it is sufficient to check the LLC domain of the first
+	 * online CPU to determine whether a single LLC domain includes all
+	 * CPUs.
+	 */
+	rcu_read_lock();
+	nr_cpus = llc_weight(cpu);
+	if (nr_cpus > 0) {
+		if (nr_cpus < num_online_cpus())
+			enable_llc = true;
+		pr_debug("sched_ext: LLC=%*pb weight=%u\n",
+			 cpumask_pr_args(llc_span(cpu)), llc_weight(cpu));
+	}
+
+	/*
+	 * Enable NUMA optimization only when there are multiple NUMA domains
+	 * among the online CPUs and the NUMA domains don't perfectly overlaps
+	 * with the LLC domains.
+	 *
+	 * If all CPUs belong to the same NUMA node and the same LLC domain,
+	 * enabling both NUMA and LLC optimizations is unnecessary, as checking
+	 * for an idle CPU in the same domain twice is redundant.
+	 */
+	nr_cpus = numa_weight(cpu);
+	if (nr_cpus > 0) {
+		if (nr_cpus < num_online_cpus() && llc_numa_mismatch())
+			enable_numa = true;
+		pr_debug("sched_ext: NUMA=%*pb weight=%u\n",
+			 cpumask_pr_args(numa_span(cpu)), numa_weight(cpu));
+	}
+	rcu_read_unlock();
+
+	pr_debug("sched_ext: LLC idle selection %s\n",
+		 str_enabled_disabled(enable_llc));
+	pr_debug("sched_ext: NUMA idle selection %s\n",
+		 str_enabled_disabled(enable_numa));
+
+	if (enable_llc)
+		static_branch_enable_cpuslocked(&scx_selcpu_topo_llc);
+	else
+		static_branch_disable_cpuslocked(&scx_selcpu_topo_llc);
+	if (enable_numa)
+		static_branch_enable_cpuslocked(&scx_selcpu_topo_numa);
+	else
+		static_branch_disable_cpuslocked(&scx_selcpu_topo_numa);
+}
+
+/*
+ * Built-in CPU idle selection policy:
+ *
+ * 1. Prioritize full-idle cores:
+ *   - always prioritize CPUs from fully idle cores (both logical CPUs are
+ *     idle) to avoid interference caused by SMT.
+ *
+ * 2. Reuse the same CPU:
+ *   - prefer the last used CPU to take advantage of cached data (L1, L2) and
+ *     branch prediction optimizations.
+ *
+ * 3. Pick a CPU within the same LLC (Last-Level Cache):
+ *   - if the above conditions aren't met, pick a CPU that shares the same LLC
+ *     to maintain cache locality.
+ *
+ * 4. Pick a CPU within the same NUMA node, if enabled:
+ *   - choose a CPU from the same NUMA node to reduce memory access latency.
+ *
+ * 5. Pick any idle CPU usable by the task.
+ *
+ * Step 3 and 4 are performed only if the system has, respectively, multiple
+ * LLC domains / multiple NUMA nodes (see scx_selcpu_topo_llc and
+ * scx_selcpu_topo_numa).
+ *
+ * NOTE: tasks that can only run on 1 CPU are excluded by this logic, because
+ * we never call ops.select_cpu() for them, see select_task_rq().
+ */
+static s32 scx_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
+			      u64 wake_flags, bool *found)
+{
+	const struct cpumask *llc_cpus = NULL;
+	const struct cpumask *numa_cpus = NULL;
+	s32 cpu;
+
+	*found = false;
+
+	/*
+	 * This is necessary to protect llc_cpus.
+	 */
+	rcu_read_lock();
+
+	/*
+	 * Determine the scheduling domain only if the task is allowed to run
+	 * on all CPUs.
+	 *
+	 * This is done primarily for efficiency, as it avoids the overhead of
+	 * updating a cpumask every time we need to select an idle CPU (which
+	 * can be costly in large SMP systems), but it also aligns logically:
+	 * if a task's scheduling domain is restricted by user-space (through
+	 * CPU affinity), the task will simply use the flat scheduling domain
+	 * defined by user-space.
+	 */
+	if (p->nr_cpus_allowed >= num_possible_cpus()) {
+		if (static_branch_maybe(CONFIG_NUMA, &scx_selcpu_topo_numa))
+			numa_cpus = numa_span(prev_cpu);
+
+		if (static_branch_maybe(CONFIG_SCHED_MC, &scx_selcpu_topo_llc))
+			llc_cpus = llc_span(prev_cpu);
+	}
+
+	/*
+	 * If WAKE_SYNC, try to migrate the wakee to the waker's CPU.
+	 */
+	if (wake_flags & SCX_WAKE_SYNC) {
+		cpu = smp_processor_id();
+
+		/*
+		 * If the waker's CPU is cache affine and prev_cpu is idle,
+		 * then avoid a migration.
+		 */
+		if (cpus_share_cache(cpu, prev_cpu) &&
+		    test_and_clear_cpu_idle(prev_cpu)) {
+			cpu = prev_cpu;
+			goto cpu_found;
+		}
+
+		/*
+		 * If the waker's local DSQ is empty, and the system is under
+		 * utilized, try to wake up @p to the local DSQ of the waker.
+		 *
+		 * Checking only for an empty local DSQ is insufficient as it
+		 * could give the wakee an unfair advantage when the system is
+		 * oversaturated.
+		 *
+		 * Checking only for the presence of idle CPUs is also
+		 * insufficient as the local DSQ of the waker could have tasks
+		 * piled up on it even if there is an idle core elsewhere on
+		 * the system.
+		 */
+		if (!cpumask_empty(idle_masks.cpu) &&
+		    !(current->flags & PF_EXITING) &&
+		    cpu_rq(cpu)->scx.local_dsq.nr == 0) {
+			if (cpumask_test_cpu(cpu, p->cpus_ptr))
+				goto cpu_found;
+		}
+	}
+
+	/*
+	 * If CPU has SMT, any wholly idle CPU is likely a better pick than
+	 * partially idle @prev_cpu.
+	 */
+	if (sched_smt_active()) {
+		/*
+		 * Keep using @prev_cpu if it's part of a fully idle core.
+		 */
+		if (cpumask_test_cpu(prev_cpu, idle_masks.smt) &&
+		    test_and_clear_cpu_idle(prev_cpu)) {
+			cpu = prev_cpu;
+			goto cpu_found;
+		}
+
+		/*
+		 * Search for any fully idle core in the same LLC domain.
+		 */
+		if (llc_cpus) {
+			cpu = scx_pick_idle_cpu(llc_cpus, SCX_PICK_IDLE_CORE);
+			if (cpu >= 0)
+				goto cpu_found;
+		}
+
+		/*
+		 * Search for any fully idle core in the same NUMA node.
+		 */
+		if (numa_cpus) {
+			cpu = scx_pick_idle_cpu(numa_cpus, SCX_PICK_IDLE_CORE);
+			if (cpu >= 0)
+				goto cpu_found;
+		}
+
+		/*
+		 * Search for any full idle core usable by the task.
+		 */
+		cpu = scx_pick_idle_cpu(p->cpus_ptr, SCX_PICK_IDLE_CORE);
+		if (cpu >= 0)
+			goto cpu_found;
+	}
+
+	/*
+	 * Use @prev_cpu if it's idle.
+	 */
+	if (test_and_clear_cpu_idle(prev_cpu)) {
+		cpu = prev_cpu;
+		goto cpu_found;
+	}
+
+	/*
+	 * Search for any idle CPU in the same LLC domain.
+	 */
+	if (llc_cpus) {
+		cpu = scx_pick_idle_cpu(llc_cpus, 0);
+		if (cpu >= 0)
+			goto cpu_found;
+	}
+
+	/*
+	 * Search for any idle CPU in the same NUMA node.
+	 */
+	if (numa_cpus) {
+		cpu = scx_pick_idle_cpu(numa_cpus, 0);
+		if (cpu >= 0)
+			goto cpu_found;
+	}
+
+	/*
+	 * Search for any idle CPU usable by the task.
+	 */
+	cpu = scx_pick_idle_cpu(p->cpus_ptr, 0);
+	if (cpu >= 0)
+		goto cpu_found;
+
+	rcu_read_unlock();
+	return prev_cpu;
+
+cpu_found:
+	rcu_read_unlock();
+
+	*found = true;
+	return cpu;
+}
+
+static void reset_idle_masks(void)
+{
+	/*
+	 * Consider all online cpus idle. Should converge to the actual state
+	 * quickly.
+	 */
+	cpumask_copy(idle_masks.cpu, cpu_online_mask);
+	cpumask_copy(idle_masks.smt, cpu_online_mask);
+}
+
+static void init_idle_masks(void)
+{
+	BUG_ON(!alloc_cpumask_var(&idle_masks.cpu, GFP_KERNEL));
+	BUG_ON(!alloc_cpumask_var(&idle_masks.smt, GFP_KERNEL));
+}
+
+static void update_builtin_idle(int cpu, bool idle)
+{
+	assign_cpu(cpu, idle_masks.cpu, idle);
+
+#ifdef CONFIG_SCHED_SMT
+	if (sched_smt_active()) {
+		const struct cpumask *smt = cpu_smt_mask(cpu);
+
+		if (idle) {
+			/*
+			 * idle_masks.smt handling is racy but that's fine as
+			 * it's only for optimization and self-correcting.
+			 */
+			if (!cpumask_subset(smt, idle_masks.cpu))
+				return;
+			cpumask_or(idle_masks.smt, idle_masks.smt, smt);
+		} else {
+			cpumask_andnot(idle_masks.smt, idle_masks.smt, smt);
+		}
+	}
+#endif
+}
+
+/*
+ * Update the idle state of a CPU to @idle.
+ *
+ * If @do_notify is true, ops.update_idle() is invoked to notify the scx
+ * scheduler of an actual idle state transition (idle to busy or vice
+ * versa). If @do_notify is false, only the idle state in the idle masks is
+ * refreshed without invoking ops.update_idle().
+ *
+ * This distinction is necessary, because an idle CPU can be "reserved" and
+ * awakened via scx_bpf_pick_idle_cpu() + scx_bpf_kick_cpu(), marking it as
+ * busy even if no tasks are dispatched. In this case, the CPU may return
+ * to idle without a true state transition. Refreshing the idle masks
+ * without invoking ops.update_idle() ensures accurate idle state tracking
+ * while avoiding unnecessary updates and maintaining balanced state
+ * transitions.
+ */
+void __scx_update_idle(struct rq *rq, bool idle, bool do_notify)
+{
+	int cpu = cpu_of(rq);
+
+	lockdep_assert_rq_held(rq);
+
+	/*
+	 * Trigger ops.update_idle() only when transitioning from a task to
+	 * the idle thread and vice versa.
+	 *
+	 * Idle transitions are indicated by do_notify being set to true,
+	 * managed by put_prev_task_idle()/set_next_task_idle().
+	 */
+	if (SCX_HAS_OP(update_idle) && do_notify && !scx_rq_bypassing(rq))
+		SCX_CALL_OP(SCX_KF_REST, update_idle, cpu_of(rq), idle);
+
+	/*
+	 * Update the idle masks:
+	 * - for real idle transitions (do_notify == true)
+	 * - for idle-to-idle transitions (indicated by the previous task
+	 *   being the idle thread, managed by pick_task_idle())
+	 *
+	 * Skip updating idle masks if the previous task is not the idle
+	 * thread, since set_next_task_idle() has already handled it when
+	 * transitioning from a task to the idle thread (calling this
+	 * function with do_notify == true).
+	 *
+	 * In this way we can avoid updating the idle masks twice,
+	 * unnecessarily.
+	 */
+	if (static_branch_likely(&scx_builtin_idle_enabled))
+		if (do_notify || is_idle_task(rq->curr))
+			update_builtin_idle(cpu, idle);
+}
+#endif /* CONFIG_SMP */
+
+/********************************************************************************
+ * Helpers that can be called from the BPF scheduler.
+ */
+__bpf_kfunc_start_defs();
+
+static bool check_builtin_idle_enabled(void)
+{
+	if (static_branch_likely(&scx_builtin_idle_enabled))
+		return true;
+
+	scx_ops_error("built-in idle tracking is disabled");
+	return false;
+}
+
+/**
+ * scx_bpf_select_cpu_dfl - The default implementation of ops.select_cpu()
+ * @p: task_struct to select a CPU for
+ * @prev_cpu: CPU @p was on previously
+ * @wake_flags: %SCX_WAKE_* flags
+ * @is_idle: out parameter indicating whether the returned CPU is idle
+ *
+ * Can only be called from ops.select_cpu() if the built-in CPU selection is
+ * enabled - ops.update_idle() is missing or %SCX_OPS_KEEP_BUILTIN_IDLE is set.
+ * @p, @prev_cpu and @wake_flags match ops.select_cpu().
+ *
+ * Returns the picked CPU with *@is_idle indicating whether the picked CPU is
+ * currently idle and thus a good candidate for direct dispatching.
+ */
+__bpf_kfunc s32 scx_bpf_select_cpu_dfl(struct task_struct *p, s32 prev_cpu,
+				       u64 wake_flags, bool *is_idle)
+{
+	if (!check_builtin_idle_enabled())
+		goto prev_cpu;
+
+	if (!scx_kf_allowed(SCX_KF_SELECT_CPU))
+		goto prev_cpu;
+
+#ifdef CONFIG_SMP
+	return scx_select_cpu_dfl(p, prev_cpu, wake_flags, is_idle);
+#endif
+
+prev_cpu:
+	*is_idle = false;
+	return prev_cpu;
+}
+
+/**
+ * scx_bpf_get_idle_cpumask - Get a referenced kptr to the idle-tracking
+ * per-CPU cpumask.
+ *
+ * Returns NULL if idle tracking is not enabled, or running on a UP kernel.
+ */
+__bpf_kfunc const struct cpumask *scx_bpf_get_idle_cpumask(void)
+{
+	if (!check_builtin_idle_enabled())
+		return cpu_none_mask;
+
+#ifdef CONFIG_SMP
+	return idle_masks.cpu;
+#else
+	return cpu_none_mask;
+#endif
+}
+
+/**
+ * scx_bpf_get_idle_smtmask - Get a referenced kptr to the idle-tracking,
+ * per-physical-core cpumask. Can be used to determine if an entire physical
+ * core is free.
+ *
+ * Returns NULL if idle tracking is not enabled, or running on a UP kernel.
+ */
+__bpf_kfunc const struct cpumask *scx_bpf_get_idle_smtmask(void)
+{
+	if (!check_builtin_idle_enabled())
+		return cpu_none_mask;
+
+#ifdef CONFIG_SMP
+	if (sched_smt_active())
+		return idle_masks.smt;
+	else
+		return idle_masks.cpu;
+#else
+	return cpu_none_mask;
+#endif
+}
+
+/**
+ * scx_bpf_put_idle_cpumask - Release a previously acquired referenced kptr to
+ * either the percpu, or SMT idle-tracking cpumask.
+ * @idle_mask: &cpumask to use
+ */
+__bpf_kfunc void scx_bpf_put_idle_cpumask(const struct cpumask *idle_mask)
+{
+	/*
+	 * Empty function body because we aren't actually acquiring or releasing
+	 * a reference to a global idle cpumask, which is read-only in the
+	 * caller and is never released. The acquire / release semantics here
+	 * are just used to make the cpumask a trusted pointer in the caller.
+	 */
+}
+
+/**
+ * scx_bpf_test_and_clear_cpu_idle - Test and clear @cpu's idle state
+ * @cpu: cpu to test and clear idle for
+ *
+ * Returns %true if @cpu was idle and its idle state was successfully cleared.
+ * %false otherwise.
+ *
+ * Unavailable if ops.update_idle() is implemented and
+ * %SCX_OPS_KEEP_BUILTIN_IDLE is not set.
+ */
+__bpf_kfunc bool scx_bpf_test_and_clear_cpu_idle(s32 cpu)
+{
+	if (!check_builtin_idle_enabled())
+		return false;
+
+	if (ops_cpu_valid(cpu, NULL))
+		return test_and_clear_cpu_idle(cpu);
+	else
+		return false;
+}
+
+/**
+ * scx_bpf_pick_idle_cpu - Pick and claim an idle cpu
+ * @cpus_allowed: Allowed cpumask
+ * @flags: %SCX_PICK_IDLE_CPU_* flags
+ *
+ * Pick and claim an idle cpu in @cpus_allowed. Returns the picked idle cpu
+ * number on success. -%EBUSY if no matching cpu was found.
+ *
+ * Idle CPU tracking may race against CPU scheduling state transitions. For
+ * example, this function may return -%EBUSY as CPUs are transitioning into the
+ * idle state. If the caller then assumes that there will be dispatch events on
+ * the CPUs as they were all busy, the scheduler may end up stalling with CPUs
+ * idling while there are pending tasks. Use scx_bpf_pick_any_cpu() and
+ * scx_bpf_kick_cpu() to guarantee that there will be at least one dispatch
+ * event in the near future.
+ *
+ * Unavailable if ops.update_idle() is implemented and
+ * %SCX_OPS_KEEP_BUILTIN_IDLE is not set.
+ */
+__bpf_kfunc s32 scx_bpf_pick_idle_cpu(const struct cpumask *cpus_allowed,
+				      u64 flags)
+{
+	if (!check_builtin_idle_enabled())
+		return -EBUSY;
+
+	return scx_pick_idle_cpu(cpus_allowed, flags);
+}
+
+/**
+ * scx_bpf_pick_any_cpu - Pick and claim an idle cpu if available or pick any CPU
+ * @cpus_allowed: Allowed cpumask
+ * @flags: %SCX_PICK_IDLE_CPU_* flags
+ *
+ * Pick and claim an idle cpu in @cpus_allowed. If none is available, pick any
+ * CPU in @cpus_allowed.
Guaranteed to succeed and returns the picked idle cpu
+ * number if @cpus_allowed is not empty. -%EBUSY is returned if @cpus_allowed is
+ * empty.
+ *
+ * If ops.update_idle() is implemented and %SCX_OPS_KEEP_BUILTIN_IDLE is not
+ * set, this function can't tell which CPUs are idle and will always pick any
+ * CPU.
+ */
+__bpf_kfunc s32 scx_bpf_pick_any_cpu(const struct cpumask *cpus_allowed,
+				     u64 flags)
+{
+	s32 cpu;
+
+	if (static_branch_likely(&scx_builtin_idle_enabled)) {
+		cpu = scx_pick_idle_cpu(cpus_allowed, flags);
+		if (cpu >= 0)
+			return cpu;
+	}
+
+	cpu = cpumask_any_distribute(cpus_allowed);
+	if (cpu < nr_cpu_ids)
+		return cpu;
+	else
+		return -EBUSY;
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(scx_kfunc_ids_select_cpu)
+BTF_ID_FLAGS(func, scx_bpf_select_cpu_dfl, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_cpumask, KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_get_idle_smtmask, KF_ACQUIRE)
+BTF_ID_FLAGS(func, scx_bpf_put_idle_cpumask, KF_RELEASE)
+BTF_ID_FLAGS(func, scx_bpf_test_and_clear_cpu_idle)
+BTF_ID_FLAGS(func, scx_bpf_pick_idle_cpu, KF_RCU)
+BTF_ID_FLAGS(func, scx_bpf_pick_any_cpu, KF_RCU)
+BTF_KFUNCS_END(scx_kfunc_ids_select_cpu)
+
+const struct btf_kfunc_id_set scx_kfunc_set_select_cpu = {
+	.owner = THIS_MODULE,
+	.set = &scx_kfunc_ids_select_cpu,
+};
diff --git a/kernel/sched/ext_idle.h b/kernel/sched/ext_idle.h
new file mode 100644
index 000000000000..c1385af1ceeb
--- /dev/null
+++ b/kernel/sched/ext_idle.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * BPF extensible scheduler class: Documentation/scheduler/sched-ext.rst
+ *
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ * Copyright (c) 2022 Tejun Heo
+ * Copyright (c) 2022 David Vernet
+ * Copyright (c) 2024 Andrea Righi
+ */
+#ifndef _KERNEL_SCHED_EXT_IDLE_H
+
+static DEFINE_STATIC_KEY_FALSE(scx_builtin_idle_enabled);
+
+#ifdef CONFIG_SMP
+static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_llc);
+static DEFINE_STATIC_KEY_FALSE(scx_selcpu_topo_numa);
+
+static void update_selcpu_topology(void);
+static void reset_idle_masks(void);
+static void init_idle_masks(void);
+static bool test_and_clear_cpu_idle(int cpu);
+
+static s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags);
+static s32 scx_select_cpu_dfl(struct task_struct *p,
+			      s32 prev_cpu, u64 wake_flags, bool *found);
+#else /* !CONFIG_SMP */
+static inline void update_selcpu_topology(void) {}
+static inline void reset_idle_masks(void) {}
+static inline void init_idle_masks(void) {}
+static inline bool test_and_clear_cpu_idle(int cpu) { return false; }
+
+static inline s32 scx_pick_idle_cpu(const struct cpumask *cpus_allowed, u64 flags)
+{
+	return -EBUSY;
+}
+static inline s32 scx_select_cpu_dfl(struct task_struct *p,
+			      s32 prev_cpu, u64 wake_flags, bool *found)
+{
+	return -EBUSY;
+}
+#endif /* CONFIG_SMP */
+
+extern const struct btf_kfunc_id_set scx_kfunc_set_select_cpu;
+
+#endif /* _KERNEL_SCHED_EXT_IDLE_H */
-- 
2.48.1
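
Illustrative note, not part of the patch: the claim-and-retry behaviour of update_builtin_idle(), test_and_clear_cpu_idle() and scx_pick_idle_cpu() can be tried outside the kernel with a small user-space model. This is only a sketch under stated assumptions. Every model_* name, MODEL_NR_CPUS and MODEL_PICK_IDLE_CORE are hypothetical. CPUs 2k and 2k+1 are assumed to be SMT siblings, plain 64-bit words stand in for cpumasks, -1 stands in for -EBUSY, and the lowest-numbered candidate is always picked, whereas the kernel code uses cpumask_any_and_distribute(). The model is single-threaded, so the retry path that handles a lost race in the kernel is never actually taken here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS		8		/* hypothetical machine size */
#define MODEL_PICK_IDLE_CORE	(1u << 0)	/* stands in for SCX_PICK_IDLE_CORE */

/* Bit N set: CPU N is idle (cpu), or CPU N's whole core is idle (smt). */
static struct {
	uint64_t cpu;
	uint64_t smt;
} model_idle;

/* Assumed topology: CPUs 2k and 2k+1 share a core. */
static uint64_t model_smt_mask(int cpu)
{
	return 3ULL << (cpu & ~1);
}

/* Lowest set bit of @mask, or MODEL_NR_CPUS if no CPU in range is set. */
static int model_first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (1ULL << cpu))
			return cpu;
	return MODEL_NR_CPUS;
}

/* Models update_builtin_idle(): record an idle or busy transition. */
static void model_update_idle(int cpu, bool idle)
{
	uint64_t smt = model_smt_mask(cpu);

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (idle) {
		model_idle.cpu |= 1ULL << cpu;
		/* The core counts as idle only once every sibling is idle. */
		if ((model_idle.cpu & smt) == smt)
			model_idle.smt |= smt;
	} else {
		model_idle.cpu &= ~(1ULL << cpu);
		model_idle.smt &= ~smt;
	}
}

/* Models test_and_clear_cpu_idle(): claim @cpu if it is still idle. */
static bool model_test_and_clear_cpu_idle(int cpu)
{
	bool was_idle = (model_idle.cpu & (1ULL << cpu)) != 0;

	/* The core is no longer wholly idle whether or not the claim wins. */
	model_idle.smt &= ~model_smt_mask(cpu);
	model_idle.cpu &= ~(1ULL << cpu);
	return was_idle;
}

/* Models scx_pick_idle_cpu(): returns a claimed CPU, or -1 for -EBUSY. */
static int model_pick_idle_cpu(uint64_t allowed, unsigned int flags)
{
	for (;;) {
		/* Prefer a CPU that sits on a fully idle core. */
		int cpu = model_first_cpu(model_idle.smt & allowed);

		if (cpu >= MODEL_NR_CPUS) {
			if (flags & MODEL_PICK_IDLE_CORE)
				return -1;
			cpu = model_first_cpu(model_idle.cpu & allowed);
			if (cpu >= MODEL_NR_CPUS)
				return -1;
		}
		if (model_test_and_clear_cpu_idle(cpu))
			return cpu;
		/* Claim failed: the masks were updated, so look again. */
	}
}
```

With CPUs 0-3 marked idle and CPU 1 then marked busy, a MODEL_PICK_IDLE_CORE request skips the half-busy core and claims CPU 2. A later request without the flag falls back to the per-CPU mask and claims CPU 0.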