From: K Prateek Nayak
Subject: [RFC PATCH 1/3] sched/fair: Move SHARED_RUNQ related structs and definitions into sched.h
Date: Thu, 31 Aug 2023 16:15:06 +0530
Message-ID: <20230831104508.7619-2-kprateek.nayak@amd.com>
In-Reply-To: <20230831104508.7619-1-kprateek.nayak@amd.com>
References: <31aeb639-1d66-2d12-1673-c19fed0ab33a@amd.com>
 <20230831104508.7619-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Move struct shared_runq_shard, struct shared_runq, SHARED_RUNQ_SHARD_SZ
and SHARED_RUNQ_MAX_SHARDS definitions into sched.h.

Signed-off-by: K Prateek Nayak
---
 kernel/sched/fair.c  | 68 --------------------------------------------
 kernel/sched/sched.h | 68 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+), 68 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d67d86d3bfdf..bf844ffa79c2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -139,74 +139,6 @@ static int __init setup_sched_thermal_decay_shift(char *str)
 }
 __setup("sched_thermal_decay_shift=", setup_sched_thermal_decay_shift);

-/**
- * struct shared_runq - Per-LLC queue structure for enqueuing and migrating
- * runnable tasks within an LLC.
- *
- * struct shared_runq_shard - A structure containing a task list and a spinlock
- * for a subset of cores in a struct shared_runq.
- *
- * WHAT
- * ====
- *
- * This structure enables the scheduler to be more aggressively work
- * conserving, by placing waking tasks on a per-LLC FIFO queue shard that can
- * then be pulled from when another core in the LLC is going to go idle.
- *
- * struct rq stores two pointers in its struct cfs_rq:
- *
- * 1. The per-LLC struct shared_runq which contains one or more shards of
- *    enqueued tasks.
- *
- * 2. The shard inside of the per-LLC struct shared_runq which contains the
- *    list of runnable tasks for that shard.
- *
- * Waking tasks are enqueued in the calling CPU's struct shared_runq_shard in
- * __enqueue_entity(), and are opportunistically pulled from the shared_runq in
- * newidle_balance(). Pulling from shards is an O(# shards) operation.
- *
- * There is currently no task-stealing between shared_runqs in different LLCs,
- * which means that shared_runq is not fully work conserving. This could be
- * added at a later time, with tasks likely only being stolen across
- * shared_runqs on the same NUMA node to avoid violating NUMA affinities.
- *
- * HOW
- * ===
- *
- * A struct shared_runq_shard is comprised of a list, and a spinlock for
- * synchronization. Given that the critical section for a shared_runq is
- * typically a fast list operation, and that the shared_runq_shard is localized
- * to a subset of cores on a single LLC (plus other cores in the LLC that pull
- * from the shard in newidle_balance()), the spinlock will typically only be
- * contended on workloads that do little else other than hammer the runqueue.
- *
- * WHY
- * ===
- *
- * As mentioned above, the main benefit of shared_runq is that it enables more
- * aggressive work conservation in the scheduler. This can benefit workloads
- * that benefit more from CPU utilization than from L1/L2 cache locality.
- *
- * shared_runqs are segmented across LLCs both to avoid contention on the
- * shared_runq spinlock by minimizing the number of CPUs that could contend on
- * it, as well as to strike a balance between work conservation, and L3 cache
- * locality.
- */
-struct shared_runq_shard {
-	struct list_head list;
-	raw_spinlock_t lock;
-} ____cacheline_aligned;
-
-/* This would likely work better as a configurable knob via debugfs */
-#define SHARED_RUNQ_SHARD_SZ 6
-#define SHARED_RUNQ_MAX_SHARDS \
-	((NR_CPUS / SHARED_RUNQ_SHARD_SZ) + (NR_CPUS % SHARED_RUNQ_SHARD_SZ != 0))
-
-struct shared_runq {
-	unsigned int num_shards;
-	struct shared_runq_shard shards[SHARED_RUNQ_MAX_SHARDS];
-} ____cacheline_aligned;
-
 #ifdef CONFIG_SMP

 static DEFINE_PER_CPU(struct shared_runq, shared_runqs);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b504f8f4416b..f50176f720b1 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -545,6 +545,74 @@ do { \
 # define u64_u32_load(var)	u64_u32_load_copy(var, var##_copy)
 # define u64_u32_store(var, val) u64_u32_store_copy(var, var##_copy, val)

+/**
+ * struct shared_runq - Per-LLC queue structure for enqueuing and migrating
+ * runnable tasks within an LLC.
+ *
+ * struct shared_runq_shard - A structure containing a task list and a spinlock
+ * for a subset of cores in a struct shared_runq.
+ *
+ * WHAT
+ * ====
+ *
+ * This structure enables the scheduler to be more aggressively work
+ * conserving, by placing waking tasks on a per-LLC FIFO queue shard that can
+ * then be pulled from when another core in the LLC is going to go idle.
+ *
+ * struct rq stores two pointers in its struct cfs_rq:
+ *
+ * 1. The per-LLC struct shared_runq which contains one or more shards of
+ *    enqueued tasks.
+ *
+ * 2. The shard inside of the per-LLC struct shared_runq which contains the
+ *    list of runnable tasks for that shard.
+ *
+ * Waking tasks are enqueued in the calling CPU's struct shared_runq_shard in
+ * __enqueue_entity(), and are opportunistically pulled from the shared_runq in
+ * newidle_balance(). Pulling from shards is an O(# shards) operation.
+ *
+ * There is currently no task-stealing between shared_runqs in different LLCs,
+ * which means that shared_runq is not fully work conserving. This could be
+ * added at a later time, with tasks likely only being stolen across
+ * shared_runqs on the same NUMA node to avoid violating NUMA affinities.
+ *
+ * HOW
+ * ===
+ *
+ * A struct shared_runq_shard is comprised of a list, and a spinlock for
+ * synchronization. Given that the critical section for a shared_runq is
+ * typically a fast list operation, and that the shared_runq_shard is localized
+ * to a subset of cores on a single LLC (plus other cores in the LLC that pull
+ * from the shard in newidle_balance()), the spinlock will typically only be
+ * contended on workloads that do little else other than hammer the runqueue.
+ *
+ * WHY
+ * ===
+ *
+ * As mentioned above, the main benefit of shared_runq is that it enables more
+ * aggressive work conservation in the scheduler. This can benefit workloads
+ * that benefit more from CPU utilization than from L1/L2 cache locality.
+ *
+ * shared_runqs are segmented across LLCs both to avoid contention on the
+ * shared_runq spinlock by minimizing the number of CPUs that could contend on
+ * it, as well as to strike a balance between work conservation, and L3 cache
+ * locality.
+ */
+struct shared_runq_shard {
+	struct list_head list;
+	raw_spinlock_t lock;
+} ____cacheline_aligned;
+
+/* This would likely work better as a configurable knob via debugfs */
+#define SHARED_RUNQ_SHARD_SZ 6
+#define SHARED_RUNQ_MAX_SHARDS \
+	((NR_CPUS / SHARED_RUNQ_SHARD_SZ) + (NR_CPUS % SHARED_RUNQ_SHARD_SZ != 0))
+
+struct shared_runq {
+	unsigned int num_shards;
+	struct shared_runq_shard shards[SHARED_RUNQ_MAX_SHARDS];
+} ____cacheline_aligned;
+
 /* CFS-related fields in a runqueue */
 struct cfs_rq {
 	struct load_weight load;
-- 
2.34.1
From: K Prateek Nayak
Subject: [RFC PATCH 2/3] sched/fair: Improve integration of SHARED_RUNQ feature within newidle_balance
Date: Thu, 31 Aug 2023 16:15:07 +0530
Message-ID: <20230831104508.7619-3-kprateek.nayak@amd.com>
In-Reply-To: <20230831104508.7619-1-kprateek.nayak@amd.com>
References: <31aeb639-1d66-2d12-1673-c19fed0ab33a@amd.com>
 <20230831104508.7619-1-kprateek.nayak@amd.com>

This patch takes the relevant optimizations from [1] in
newidle_balance(). Following is the breakdown:

- Check "rq->rd->overload" before jumping into newidle_balance, even
  with the SHARED_RUNQ feature enabled.

- Call update_next_balance() for all domains up to the MC domain when
  the SHARED_RUNQ path is taken.

- Account the cost of shared_runq_pick_next_task() and update curr_cost
  and sd->max_newidle_lb_cost accordingly.

- Move the initial rq_unpin_lock() logic around. Also, the caller of
  shared_runq_pick_next_task() is responsible for calling
  rq_repin_lock() if the return value is non zero. (It still needs to
  be verified that everything is right with LOCKDEP.)

- Include a fix to skip directly to above the LLC domain when calling
  load_balance() in newidle_balance().

All other surgery from [1] has been removed.

Link: https://lore.kernel.org/all/31aeb639-1d66-2d12-1673-c19fed0ab33a@amd.com/ [1]
Signed-off-by: K Prateek Nayak
---
 kernel/sched/fair.c | 94 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 67 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf844ffa79c2..446ffdad49e1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -337,7 +337,6 @@ static int shared_runq_pick_next_task(struct rq *rq, struct rq_flags *rf)
 		rq_unpin_lock(rq, &src_rf);
 		raw_spin_unlock_irqrestore(&p->pi_lock, src_rf.flags);
 	}
-	rq_repin_lock(rq, rf);

 	return ret;
 }
@@ -12276,50 +12275,83 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 	if (!cpu_active(this_cpu))
 		return 0;

-	if (sched_feat(SHARED_RUNQ)) {
-		pulled_task = shared_runq_pick_next_task(this_rq, rf);
-		if (pulled_task)
-			return pulled_task;
-	}
-
 	/*
 	 * We must set idle_stamp _before_ calling idle_balance(), such that we
 	 * measure the duration of idle_balance() as idle time.
 	 */
 	this_rq->idle_stamp = rq_clock(this_rq);

-	/*
-	 * This is OK, because current is on_cpu, which avoids it being picked
-	 * for load-balance and preemption/IRQs are still disabled avoiding
-	 * further scheduler activity on it and we're being very careful to
-	 * re-start the picking loop.
-	 */
-	rq_unpin_lock(this_rq, rf);
 	rcu_read_lock();
-	sd = rcu_dereference_check_sched_domain(this_rq->sd);
-
-	/*
-	 * Skip <= LLC domains as they likely won't have any tasks if the
-	 * shared runq is empty.
-	 */
-	if (sched_feat(SHARED_RUNQ)) {
+	if (sched_feat(SHARED_RUNQ))
 		sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
-		if (likely(sd))
-			sd = sd->parent;
-	}
+	else
+		sd = rcu_dereference_check_sched_domain(this_rq->sd);

 	if (!READ_ONCE(this_rq->rd->overload) ||
-	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
+	    /* Look at rq->avg_idle iff SHARED_RUNQ is disabled */
+	    (!sched_feat(SHARED_RUNQ) && sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {

-		if (sd)
+		while (sd) {
 			update_next_balance(sd, &next_balance);
+			sd = sd->child;
+		}
+
 		rcu_read_unlock();

 		goto out;
 	}
+
+	if (sched_feat(SHARED_RUNQ)) {
+		struct sched_domain *tmp = sd;
+
+		t0 = sched_clock_cpu(this_cpu);
+
+		/* Do update_next_balance() for all domains within LLC */
+		while (tmp) {
+			update_next_balance(tmp, &next_balance);
+			tmp = tmp->child;
+		}
+
+		pulled_task = shared_runq_pick_next_task(this_rq, rf);
+		if (pulled_task) {
+			if (sd) {
+				curr_cost = sched_clock_cpu(this_cpu) - t0;
+				/*
+				 * Will help bail out of scans of higher domains
+				 * slightly earlier.
+				 */
+				update_newidle_cost(sd, curr_cost);
+			}
+
+			rcu_read_unlock();
+			goto out_swq;
+		}
+
+		if (sd) {
+			t1 = sched_clock_cpu(this_cpu);
+			curr_cost += t1 - t0;
+			update_newidle_cost(sd, curr_cost);
+		}
+
+		/*
+		 * Since shared_runq_pick_next_task() can take a while,
+		 * check if the CPU was targeted for a wakeup in the
+		 * meantime.
+		 */
+		if (this_rq->ttwu_pending) {
+			rcu_read_unlock();
+			return 0;
+		}
+	}
 	rcu_read_unlock();

+	/*
+	 * This is OK, because current is on_cpu, which avoids it being picked
+	 * for load-balance and preemption/IRQs are still disabled avoiding
+	 * further scheduler activity on it and we're being very careful to
+	 * re-start the picking loop.
+	 */
+	rq_unpin_lock(this_rq, rf);
 	raw_spin_rq_unlock(this_rq);

 	t0 = sched_clock_cpu(this_cpu);
@@ -12335,6 +12367,13 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
 			break;

+		/*
+		 * Skip <= LLC domains as they likely won't have any tasks if the
+		 * shared runq is empty.
+		 */
+		if (sched_feat(SHARED_RUNQ) && (sd->flags & SD_SHARE_PKG_RESOURCES))
+			continue;
+
 		if (sd->flags & SD_BALANCE_NEWIDLE) {

 			pulled_task = load_balance(this_cpu, this_rq,
@@ -12361,6 +12400,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)

 	raw_spin_rq_lock(this_rq);

+out_swq:
 	if (curr_cost > this_rq->max_idle_balance_cost)
 		this_rq->max_idle_balance_cost = curr_cost;

-- 
2.34.1
cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6745.21; Thu, 31 Aug 2023 10:46:23 +0000 Received: from SN1PEPF0002636E.namprd02.prod.outlook.com (2603:10b6:806:25:cafe::c1) by SA9P221CA0016.outlook.office365.com (2603:10b6:806:25::21) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6745.22 via Frontend Transport; Thu, 31 Aug 2023 10:46:23 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by SN1PEPF0002636E.mail.protection.outlook.com (10.167.241.139) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.6745.17 via Frontend Transport; Thu, 31 Aug 2023 10:46:22 +0000 Received: from BLR5CG134614W.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.27; Thu, 31 Aug 2023 05:46:16 -0500 From: K Prateek Nayak To: CC: , , , , , , , , , , , , , , , , , Subject: [RFC PATCH 3/3] sched/fair: Add a per-shard overload flag Date: Thu, 31 Aug 2023 16:15:08 +0530 Message-ID: <20230831104508.7619-4-kprateek.nayak@amd.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230831104508.7619-1-kprateek.nayak@amd.com> References: <31aeb639-1d66-2d12-1673-c19fed0ab33a@amd.com> <20230831104508.7619-1-kprateek.nayak@amd.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: 
SN1PEPF0002636E:EE_|SJ2PR12MB7920:EE_ X-MS-Office365-Filtering-Correlation-Id: 9fe56222-ee3f-464e-5872-08dbaa0f84af X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: GOFkzdm//JE8Vc3vXMV91uLMpXNvcOCqti3w+O6iGqmGOKv1B9RRBBwe4Q3GxBpqLA9SXhcAZ9OP/vtuR0MYzJ1oFzsKCxzxCwxx5BoavxH20dkEZKdlg/0NwJp2+a4z10W+klLYyer+s7tWa5U0GhDC13deCrdKeLMAHtX1hgi34qJZqg98W3GpDDY6Haig7NEeFoRX2SfscPJiAmMsix0weH7ufQUGc7w+8TFK7kCJYCZ4A4YJ5dh+wdDsB98883lVe61e6V7YTcwA8TeePjn591xqm/SBhQLGoN2LNfj7kp1vyCL+hrPy6lX5fM4l2MxF+IMZimHwPxNeBsFQ6LNAFKPlDjPF14Zynrfez3tIinU+U+YFSUDvR1omqQvhmKAaFIGGnlbV0lOkwZleZ1ad1dA/eUsenVIyebQczcF2iEAoVwU/PvgWQFPqpCt0JRBxuKzRfmycQkRgElgc31YrhJdz3VACIy0F/sEh6H1A2tY7+4i9pCC9l0VqO9kGgGhkQWEtMsWOLpUsQtf6Z6aNyhcxHN/t2TmrHO+SKRamkQk6m1p7WDkfOngfs8s+Q57GqMrwPI2KW2Y0SxVVQQ0YbTR0ucFjyvZso9Etjjr6i3Yfz5a65ZOz4KmuVYQ6+5e5VoDhuQpRjBccY5+ty2Rg+S5l1jsYt04AKbimKEYpUDxtcDjZXeDuvzPmYzjWaQwZKoyKv7nxZJYVfu46Y3Nf5E4i7GTluk5lnLePSrMR5GuwytACUFhOPg1DIrhKEXm8lB3L3qTPqr+77xqORw== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230031)(4636009)(376002)(39860400002)(136003)(346002)(396003)(451199024)(82310400011)(186009)(1800799009)(40470700004)(36840700001)(46966006)(4326008)(8936002)(8676002)(5660300002)(316002)(6916009)(36756003)(54906003)(2906002)(70206006)(70586007)(7416002)(41300700001)(40460700003)(7696005)(36860700001)(16526019)(336012)(26005)(426003)(1076003)(40480700001)(81166007)(82740400003)(6666004)(356005)(478600001)(2616005)(47076005)(83380400001)(86362001)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 31 Aug 2023 10:46:22.9747 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 9fe56222-ee3f-464e-5872-08dbaa0f84af X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: 
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Even with the two patches, I still observe the following lock contention
when profiling the tbench 128-clients run with IBS:

  -   12.61%  swapper          [kernel.vmlinux]         [k] native_queued_spin_lock_slowpath
     - 10.94% native_queued_spin_lock_slowpath
        - 10.73% _raw_spin_lock
           - 9.57% __schedule
                schedule_idle
                do_idle
              + cpu_startup_entry
           - 0.82% task_rq_lock
                newidle_balance
                pick_next_task_fair
                __schedule
                schedule_idle
                do_idle
              + cpu_startup_entry

Since David mentioned that the rq->avg_idle check is probably not the
right step towards the solution, this experiment introduces a per-shard
"overload" flag. Similar to "rq->rd->overload", the per-shard overload
flag signals the possibility that one or more runqueues covered by the
shard's domain have a queued task. The shard's overload flag is set at
the same time as "rq->rd->overload", and is cleared when the shard's
list is found to be empty.
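The pattern above (a racy flag checked before the lock, lazily cleared
when the list turns out to be empty) can be illustrated with a minimal
user-space sketch using C11 atomics. All names here (`struct shard`,
`shard_enqueue`, `shard_try_pop`, `nr_queued`) are hypothetical stand-ins
for the kernel structures, and a plain counter stands in for the task
list; the point is only the fast-path shape, not the kernel locking:

```c
#include <stdatomic.h>

/*
 * Sketch of the per-shard overload flag: consumers first test a racy
 * "overload" flag and only proceed to the (elided) locked section when
 * the flag suggests the list may be non-empty.  A stale 1 merely costs
 * one extra check, after which the consumer clears the flag itself.
 */
struct shard {
	atomic_int overload;   /* 0: very likely nothing to pull */
	int        nr_queued;  /* stands in for the shard's task list */
};

/* Producer side: mirrors add_nr_running() setting the flag. */
void shard_enqueue(struct shard *s)
{
	s->nr_queued++;
	if (!atomic_load_explicit(&s->overload, memory_order_relaxed))
		atomic_store_explicit(&s->overload, 1, memory_order_relaxed);
}

/* Consumer side: mirrors shared_runq_pop_task(); returns 1 if a
 * "task" was pulled, 0 if either bail-out path was taken. */
int shard_try_pop(struct shard *s)
{
	if (!atomic_load_explicit(&s->overload, memory_order_relaxed))
		return 0;               /* fast path: skip the lock entirely */

	if (s->nr_queued == 0) {        /* list empty: lazily clear the flag */
		atomic_store_explicit(&s->overload, 0, memory_order_relaxed);
		return 0;
	}

	/* a real implementation would take shard->lock around this */
	s->nr_queued--;
	return 1;
}
```

As in the patch, the flag is deliberately not guarded by the shard lock:
a false positive only means one wasted lock acquisition, while a false
negative self-corrects on the next enqueue.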
With these changes, following are the results for tbench 128-clients:

    tip                            : 1.00 (var: 1.00%)
    tip + v3 + series till patch 2 : 0.41 (var: 1.15%) (diff: -58.81%)
    tip + v3 + full series         : 1.01 (var: 0.36%) (diff: +00.92%)

Signed-off-by: K Prateek Nayak
---
 kernel/sched/fair.c  | 13 +++++++++++--
 kernel/sched/sched.h | 17 +++++++++++++++++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 446ffdad49e1..31fe109fdaf0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -186,6 +186,7 @@ static void shared_runq_reassign_domains(void)
 		rq->cfs.shared_runq = shared_runq;
 		rq->cfs.shard = &shared_runq->shards[shard_idx];
 		rq_unlock(rq, &rf);
+		WRITE_ONCE(rq->cfs.shard->overload, 0);
 	}
 }
 
@@ -202,6 +203,7 @@ static void __shared_runq_drain(struct shared_runq *shared_runq)
 		list_for_each_entry_safe(p, tmp, &shard->list, shared_runq_node)
 			list_del_init(&p->shared_runq_node);
 		raw_spin_unlock(&shard->lock);
+		WRITE_ONCE(shard->overload, 0);
 	}
 }
 
@@ -258,13 +260,20 @@ shared_runq_pop_task(struct shared_runq_shard *shard, int target)
 {
 	struct task_struct *p;
 
-	if (list_empty(&shard->list))
+	if (!READ_ONCE(shard->overload))
 		return NULL;
 
+	if (list_empty(&shard->list)) {
+		WRITE_ONCE(shard->overload, 0);
+		return NULL;
+	}
+
 	raw_spin_lock(&shard->lock);
 	p = list_first_entry_or_null(&shard->list, struct task_struct,
 				     shared_runq_node);
-	if (p && is_cpu_allowed(p, target))
+	if (!p)
+		WRITE_ONCE(shard->overload, 0);
+	else if (is_cpu_allowed(p, target))
 		list_del_init(&p->shared_runq_node);
 	else
 		p = NULL;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f50176f720b1..e8d4d948f742 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -601,6 +601,20 @@ do {						\
 struct shared_runq_shard {
 	struct list_head list;
 	raw_spinlock_t lock;
+	/*
+	 * shared_runq_shard can contain running tasks.
+	 * In such cases where all the tasks are running,
+	 * it is futile to attempt to pull tasks from the
+	 * list. Overload flag is used to indicate case
+	 * where one or more rq in the shard domain may
+	 * have a queued task. If the flag is 0, it is
+	 * very likely that all tasks in the shard are
+	 * running and cannot be migrated. This is not
+	 * guarded by the shard lock, and since it may
+	 * be updated often, it is placed into its own
+	 * cacheline.
+	 */
+	int overload ____cacheline_aligned;
 } ____cacheline_aligned;
 
 /* This would likely work better as a configurable knob via debugfs */
@@ -2585,6 +2599,9 @@ static inline void add_nr_running(struct rq *rq, unsigned count)
 	if (prev_nr < 2 && rq->nr_running >= 2) {
 		if (!READ_ONCE(rq->rd->overload))
 			WRITE_ONCE(rq->rd->overload, 1);
+
+		if (rq->cfs.shard && !READ_ONCE(rq->cfs.shard->overload))
+			WRITE_ONCE(rq->cfs.shard->overload, 1);
 	}
 #endif
 
-- 
2.34.1