From nobody Thu Dec 18 15:32:02 2025
From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, "Valentin Schneider", "Paul E. McKenney", Imran Khan, Leonardo Bras, "Guo Ren", Rik van Riel, Tejun Heo, Cruz Zhao, Lai Jiangshan, Joel Fernandes, Zqiang, Julia Lawall, "Gautham R. Shenoy", K Prateek Nayak
Subject: [PATCH 1/3] sched/core: Remove the unnecessary need_resched() check in nohz_csd_func()
Date: Wed, 10 Jul 2024 09:02:08 +0000
Message-ID: <20240710090210.41856-2-kprateek.nayak@amd.com>
In-Reply-To: <20240710090210.41856-1-kprateek.nayak@amd.com>
References: <20240710090210.41856-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

The need_resched() check currently in nohz_csd_func() can be tracked
to have been added in scheduler_ipi() back in 2011 via commit
ca38062e57e9 ("sched: Use
resched IPI to kick off the nohz idle balance")

Since then, it has travelled quite a bit but it seems like an idle_cpu()
check currently is sufficient to detect the need to bail out from an
idle load balancing. To justify this removal, consider all the following
cases where an idle load balancing could race with a task wakeup:

o Since commit f3dd3f674555b ("sched: Remove the limitation of WF_ON_CPU
  on wakelist if wakee cpu is idle") a target perceived to be idle
  (target_rq->nr_running == 0) will return true for
  ttwu_queue_cond(target) which will offload the task wakeup to the idle
  target via an IPI.

  In all such cases target_rq->ttwu_pending will be set to 1 before
  queuing the wake function.

  If an idle load balance races here, the following scenarios are
  possible:

  - The CPU is not in TIF_POLLING_NRFLAG mode in which case an actual
    IPI is sent to the CPU to wake it out of idle. If the
    nohz_csd_func() queues before sched_ttwu_pending(), the idle load
    balance will bail out since idle_cpu(target) returns 0 because
    target_rq->ttwu_pending is 1. If the nohz_csd_func() is queued after
    sched_ttwu_pending() it should see rq->nr_running to be non-zero and
    bail out of idle load balancing.

  - The CPU is in TIF_POLLING_NRFLAG mode and instead of an actual IPI,
    the sender will simply set TIF_NEED_RESCHED for the target to put it
    out of idle and flush_smp_call_function_queue() in do_idle() will
    execute the call function. Depending on the ordering of the queuing
    of nohz_csd_func() and sched_ttwu_pending(), the idle_cpu() check in
    nohz_csd_func() should either see target_rq->ttwu_pending = 1 or
    target_rq->nr_running to be non-zero if there is a genuine task
    wakeup racing with the idle load balance kick.

o The waker CPU perceives the target CPU to be busy
  (target_rq->nr_running != 0) but the CPU is in fact going idle and due
  to a series of unfortunate events, the system reaches a case where the
  waker CPU decides to perform the wakeup by itself in ttwu_queue() on
  the target CPU but target is concurrently selected for idle load
  balance (Can this happen? I'm not sure, but we'll consider its
  possibility to estimate the worst case scenario).

  ttwu_do_activate() calls enqueue_task() which would increment
  "rq->nr_running", after which it calls wakeup_preempt() which is
  responsible for setting TIF_NEED_RESCHED (via a resched IPI or by
  setting TIF_NEED_RESCHED on a TIF_POLLING_NRFLAG idle CPU). The key
  thing to note in this case is that rq->nr_running is already non-zero
  in case of a wakeup before TIF_NEED_RESCHED is set, which would lead
  to the idle_cpu() check returning false.

In all cases, it seems that the need_resched() check is unnecessary when
checking for idle_cpu() first since an impending wakeup racing with idle
load balancer will either set the "rq->ttwu_pending" or indicate a newly
woken task via "rq->nr_running".

Chasing the reason why this check might have existed in the first place,
I came across Peter's suggestion on the first iteration of Suresh's
patch from 2011 [1] where the condition to raise the SCHED_SOFTIRQ was:

	sched_ttwu_do_pending(list);

	if (unlikely((rq->idle == current) &&
	    rq->nohz_balance_kick &&
	    !need_resched()))
		raise_softirq_irqoff(SCHED_SOFTIRQ);

However, since this was preceded by sched_ttwu_do_pending() which is the
equivalent of sched_ttwu_pending() in the current upstream kernel, the
need_resched() check was necessary to catch a newly queued task. Peter
suggested modifying it to:

	if (idle_cpu() && rq->nohz_balance_kick && !need_resched())
		raise_softirq_irqoff(SCHED_SOFTIRQ);

where idle_cpu() seems to have replaced the "rq->idle == current" check.

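The race cases above can be pictured with a small userspace model. This
is only a sketch and not kernel code: struct toy_rq and every toy_*()
helper are hypothetical names introduced for illustration, and
toy_idle_cpu() assumes idle_cpu() boils down to "the idle task is
current, nr_running is 0 and ttwu_pending is 0".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the runqueue state discussed above. */
struct toy_rq {
	bool curr_is_idle;		/* models rq->curr == rq->idle */
	unsigned int nr_running;	/* models rq->nr_running */
	unsigned int ttwu_pending;	/* models rq->ttwu_pending */
	bool need_resched;		/* models TIF_NEED_RESCHED of the idle task */
};

/* Assumed shape of idle_cpu(): idle task running, nothing queued or pending. */
bool toy_idle_cpu(const struct toy_rq *rq)
{
	assert(rq != NULL);
	return rq->curr_is_idle && !rq->nr_running && !rq->ttwu_pending;
}

/* Condition before this patch: idle_cpu() && !need_resched(). */
bool toy_kick_old(const struct toy_rq *rq)
{
	return toy_idle_cpu(rq) && !rq->need_resched;
}

/* Condition after this patch: idle_cpu() alone. */
bool toy_kick_new(const struct toy_rq *rq)
{
	return toy_idle_cpu(rq);
}

/* Waker offloads the wakeup: ttwu_pending is set before the IPI is queued. */
void toy_queue_wakeup(struct toy_rq *rq)
{
	rq->ttwu_pending = 1;
}

/* Target handles the wakeup: the task is enqueued first, then resched is set. */
void toy_handle_wakeup(struct toy_rq *rq)
{
	rq->nr_running++;
	rq->ttwu_pending = 0;
	rq->need_resched = true;
}

/* Polling idle CPU kicked only to run a queued call function (no wakeup). */
void toy_polling_kick(struct toy_rq *rq)
{
	rq->need_resched = true;
}
```

Starting from a toy_rq with only curr_is_idle set, toy_kick_new() turns
false as soon as toy_queue_wakeup() or toy_handle_wakeup() has run, so a
genuine wakeup is caught without looking at need_resched. After
toy_polling_kick() alone, toy_kick_new() stays true while toy_kick_old()
returns false, which is the case where the old check skipped the idle
load balance.
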
However, even back then, the idle_cpu() check would have been sufficient
to have caught the enqueue of a new task and since commit b2a02fc43a1f
("smp: Optimize send_call_function_single_ipi()") overloads the
interpretation of TIF_NEED_RESCHED for TIF_POLLING_NRFLAG idling, remove
the need_resched() check in nohz_csd_func() to raise SCHED_SOFTIRQ based
on Peter's suggestion.

Link: https://lore.kernel.org/all/1317670590.20367.38.camel@twins/ [1]
Link: https://lore.kernel.org/lkml/20240615014521.GR8774@noisy.programming.kicks-ass.net/
Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Suggested-by: Peter Zijlstra
Signed-off-by: K Prateek Nayak
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0935f9d4bb7b..1e0c77eac65a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1205,7 +1205,7 @@ static void nohz_csd_func(void *info)
 	WARN_ON(!(flags & NOHZ_KICK_MASK));
 
 	rq->idle_balance = idle_cpu(cpu);
-	if (rq->idle_balance && !need_resched()) {
+	if (rq->idle_balance) {
 		rq->nohz_idle_balance = flags;
 		raise_softirq_irqoff(SCHED_SOFTIRQ);
 	}
-- 
2.34.1

From nobody Thu Dec 18 15:32:02 2025
From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, "Valentin Schneider", "Paul E. McKenney", Imran Khan, Leonardo Bras, "Guo Ren", Rik van Riel, Tejun Heo, Cruz Zhao, Lai Jiangshan, Joel Fernandes, Zqiang, Julia Lawall, "Gautham R. Shenoy", K Prateek Nayak
Subject: [PATCH 2/3] sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
Date: Wed, 10 Jul 2024 09:02:09 +0000
Message-ID: <20240710090210.41856-3-kprateek.nayak@amd.com>
In-Reply-To: <20240710090210.41856-1-kprateek.nayak@amd.com>
References: <20240710090210.41856-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Peter Zijlstra

Since commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
an idle CPU in TIF_POLLING_NRFLAG mode can be pulled out of idle by
setting TIF_NEED_RESCHED flag to service an IPI without actually sending
an interrupt. Even in cases where the IPI handler does not queue a task
on the idle CPU, do_idle() will call __schedule() since need_resched()
returns true in these cases.

Introduce and use SM_IDLE to identify call to __schedule() from
schedule_idle() and shorten the idle re-entry time by skipping
pick_next_task() when nr_running is 0 and the previous task is the idle
task.

With the SM_IDLE fast-path, the time taken to complete a fixed set of
IPIs using ipistorm improves significantly.
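
As a rough userspace picture of the fast-path (a sketch under stated
assumptions, not the kernel implementation): the SM_* values are the
ones this patch defines, while struct model_task, struct model_rq,
model_pick_next_task() and model_schedule() are hypothetical names, and
the runqueue is assumed to hold at most one runnable task.

```c
#include <assert.h>
#include <stddef.h>

#define SM_IDLE		(-1)
#define SM_NONE		0
#define SM_PREEMPT	1
#define SM_RTLOCK_WAIT	2

/* Hypothetical task and runqueue, reduced to what the fast-path looks at. */
struct model_task {
	const char *name;
};

struct model_rq {
	struct model_task *curr;	/* task currently on the CPU */
	struct model_task *idle;	/* the per-CPU idle task */
	struct model_task *queued;	/* the single runnable task, if any */
	unsigned int nr_running;	/* models rq->nr_running */
	unsigned int nr_picks;		/* counts trips through the slow path */
};

/* Stand-in for pick_next_task(): the slow path the fast-path avoids. */
struct model_task *model_pick_next_task(struct model_rq *rq)
{
	rq->nr_picks++;
	return rq->nr_running ? rq->queued : rq->idle;
}

/* Stand-in for __schedule(), keeping only the task selection step. */
struct model_task *model_schedule(struct model_rq *rq, int sched_mode)
{
	struct model_task *prev;
	struct model_task *next;

	assert(rq != NULL && rq->curr != NULL);
	prev = rq->curr;

	if (sched_mode == SM_IDLE && !rq->nr_running)
		next = prev;	/* nothing to run: stay on the idle task */
	else
		next = model_pick_next_task(rq);

	rq->curr = next;
	return next;
}
```

Calling model_schedule() with SM_IDLE on an empty runqueue returns the
idle task and leaves nr_picks untouched, whereas SM_NONE, or SM_IDLE
with a task queued, goes through model_pick_next_task().
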
Following are the numbers from a dual socket 3rd Generation EPYC system
(2 x 64C/128T) (boost on, C2 disabled) running ipistorm between CPU8 and
CPU16:

cmdline: insmod ipistorm.ko numipi=100000 single=1 offset=8 cpulist=8 wait=1

   ==================================================================
   Test          : ipistorm (modified)
   Units         : Normalized runtime
   Interpretation: Lower is better
   Statistic     : AMean
   ==================================================================
   kernel:                        time [pct imp]
   tip:sched/core                 1.00 [baseline]
   tip:sched/core + SM_IDLE       0.25 [75.11%]

[ kprateek: Commit log and testing ]

Link: https://lore.kernel.org/lkml/20240615012814.GP8774@noisy.programming.kicks-ass.net/
Not-yet-signed-off-by: Peter Zijlstra
Signed-off-by: K Prateek Nayak
---
 kernel/sched/core.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1e0c77eac65a..417d3ebbdf60 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6343,19 +6343,12 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
  * Constants for the sched_mode argument of __schedule().
  *
  * The mode argument allows RT enabled kernels to differentiate a
- * preemption from blocking on an 'sleeping' spin/rwlock. Note that
- * SM_MASK_PREEMPT for !RT has all bits set, which allows the compiler to
- * optimize the AND operation out and just check for zero.
+ * preemption from blocking on an 'sleeping' spin/rwlock.
  */
-#define SM_NONE			0x0
-#define SM_PREEMPT		0x1
-#define SM_RTLOCK_WAIT		0x2
-
-#ifndef CONFIG_PREEMPT_RT
-# define SM_MASK_PREEMPT	(~0U)
-#else
-# define SM_MASK_PREEMPT	SM_PREEMPT
-#endif
+#define SM_IDLE			(-1)
+#define SM_NONE			0
+#define SM_PREEMPT		1
+#define SM_RTLOCK_WAIT		2
 
 /*
  * __schedule() is the main scheduler function.
@@ -6396,11 +6389,12 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
  *
  * WARNING: must be called with preemption disabled!
  */
-static void __sched notrace __schedule(unsigned int sched_mode)
+static void __sched notrace __schedule(int sched_mode)
 {
 	struct task_struct *prev, *next;
 	unsigned long *switch_count;
 	unsigned long prev_state;
+	bool preempt = sched_mode > 0;
 	struct rq_flags rf;
 	struct rq *rq;
 	int cpu;
@@ -6409,13 +6403,13 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	rq = cpu_rq(cpu);
 	prev = rq->curr;
 
-	schedule_debug(prev, !!sched_mode);
+	schedule_debug(prev, preempt);
 
 	if (sched_feat(HRTICK) || sched_feat(HRTICK_DL))
 		hrtick_clear(rq);
 
 	local_irq_disable();
-	rcu_note_context_switch(!!sched_mode);
+	rcu_note_context_switch(preempt);
 
 	/*
 	 * Make sure that signal_pending_state()->signal_pending() below
@@ -6449,7 +6443,12 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	 * that we form a control dependency vs deactivate_task() below.
 	 */
 	prev_state = READ_ONCE(prev->__state);
-	if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
+	if (sched_mode == SM_IDLE) {
+		if (!rq->nr_running) {
+			next = prev;
+			goto picked;
+		}
+	} else if (!preempt && prev_state) {
 		if (signal_pending_state(prev_state, prev)) {
 			WRITE_ONCE(prev->__state, TASK_RUNNING);
 		} else {
@@ -6483,6 +6482,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 	}
 
 	next = pick_next_task(rq, prev, &rf);
+picked:
 	clear_tsk_need_resched(prev);
 	clear_preempt_need_resched();
 #ifdef CONFIG_SCHED_DEBUG
@@ -6523,7 +6523,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
 		migrate_disable_switch(rq, prev);
 		psi_sched_switch(prev, next, !task_on_rq_queued(prev));
 
-		trace_sched_switch(sched_mode & SM_MASK_PREEMPT, prev, next, prev_state);
+		trace_sched_switch(preempt, prev, next, prev_state);
 
 		/* Also unlocks the rq: */
 		rq = context_switch(rq, prev, next, &rf);
@@ -6599,7 +6599,7 @@ static void sched_update_worker(struct task_struct *tsk)
 	}
 }
 
-static __always_inline void __schedule_loop(unsigned int sched_mode)
+static __always_inline void __schedule_loop(int sched_mode)
 {
 	do {
 		preempt_disable();
@@ -6644,7 +6644,7 @@ void __sched schedule_idle(void)
 	 */
 	WARN_ON_ONCE(current->__state);
 	do {
-		__schedule(SM_NONE);
+		__schedule(SM_IDLE);
 	} while (need_resched());
 }
 
-- 
2.34.1

From nobody Thu Dec 18 15:32:02 2025
From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, "Valentin Schneider", "Paul E. McKenney", Imran Khan, Leonardo Bras, "Guo Ren", Rik van Riel, Tejun Heo, Cruz Zhao, Lai Jiangshan, Joel Fernandes, Zqiang, Julia Lawall, "Gautham R. Shenoy", K Prateek Nayak
Subject: [RFC PATCH 3/3] softirq: Avoid waking up ksoftirqd from flush_smp_call_function_queue()
Date: Wed, 10 Jul 2024 09:02:10 +0000
Message-ID: <20240710090210.41856-4-kprateek.nayak@amd.com>
In-Reply-To: <20240710090210.41856-1-kprateek.nayak@amd.com>
References: <20240710090210.41856-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 10 Jul 2024 09:03:22.7703 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: b8cbc8f5-21dc-4153-5571-08dca0bf26cc X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CY4PEPF0000EE3A.namprd03.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: MW4PR12MB6850 Content-Type: text/plain; charset="utf-8"

Since commit b2a02fc43a1f4 ("smp: Optimize
send_call_function_single_ipi()"), sending an actual interrupt to an
idle CPU in TIF_POLLING_NRFLAG mode can be avoided by queuing the SMP
call function on the call function queue of the CPU and setting the
TIF_NEED_RESCHED bit in the idle task's thread info. The call function
is handled in the idle exit path when do_idle() calls
flush_smp_call_function_queue().

However, since flush_smp_call_function_queue() is executed in the idle
thread's context, the in_interrupt() check within a call function will
return false. raise_softirq() uses this check to decide whether to wake
ksoftirqd, since a softirq raised from an interrupt context will be
handled at irq exit. In all other cases, raise_softirq() wakes up
ksoftirqd to handle the softirq on a !PREEMPT_RT kernel.

Since flush_smp_call_function_queue() calls
do_softirq_post_smp_call_flush(), waking up ksoftirqd is not necessary:
the softirqs raised by the call functions will be handled soon after the
call function queue is flushed. Mark __flush_smp_call_function_queue()
within flush_smp_call_function_queue() with "will_do_softirq_post_flush"
and use "do_softirq_pending()" to notify raise_softirq() of an impending
call to do_softirq() and avoid waking up ksoftirqd.
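The decision described above can be modelled in plain userspace C. This
is only a sketch under stated assumptions, not the kernel code: every
model_* name is hypothetical, the per-CPU flag is reduced to a single
global, in_interrupt() is a plain boolean, and a counter stands in for
the ksoftirqd wakeup.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: simplified stand-ins, not the kernel implementation. */
static bool model_will_do_softirq_post_flush;	/* per-CPU in the patch */
static bool model_in_interrupt;			/* stand-in for in_interrupt() */
static unsigned int model_pending;		/* stand-in for the pending softirq mask */
static int model_ksoftirqd_wakeups;		/* counts modelled ksoftirqd wakeups */

/* Mirrors do_softirq_pending(): is softirq processing about to happen anyway? */
static bool model_do_softirq_pending(void)
{
	return model_will_do_softirq_post_flush;
}

/* Mirrors the patched should_wake_ksoftirqd() for the !PREEMPT_RT case. */
static bool model_should_wake_ksoftirqd(void)
{
	return !model_do_softirq_pending();
}

/* Mark a softirq pending and wake ksoftirqd only when nothing else will run it. */
static void model_raise_softirq(unsigned int nr)
{
	assert(nr < 32);
	model_pending |= 1u << nr;
	if (!model_in_interrupt && model_should_wake_ksoftirqd())
		model_ksoftirqd_wakeups++;
}

/* Hypothetical call function that raises vector 7, as nohz_csd_func() does. */
static void model_call_function(void)
{
	model_raise_softirq(7);
}

/* Mirrors flush_smp_call_function_queue(): flag set only around the flush. */
static void model_flush_queue(void)
{
	model_will_do_softirq_post_flush = true;
	model_call_function();
	model_will_do_softirq_post_flush = false;
	if (model_pending)
		model_pending = 0;	/* models do_softirq_post_smp_call_flush() */
}
```

In this model, a softirq raised while the flush is in progress leaves
the wakeup counter untouched, while the same call function run outside
the flush (and outside interrupt context) still counts a wakeup.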
Adding a trace_printk() in nohz_csd_func() at the spot of raising
SCHED_SOFTIRQ and enabling trace events for sched_switch, sched_wakeup,
and softirq_entry (for SCHED_SOFTIRQ vector alone) helps observe the
current behavior:

    -0 [000] dN.1. nohz_csd_func: Raise SCHED_SOFTIRQ for idle balance
    -0 [000] dN.4. sched_wakeup: comm=ksoftirqd/0 pid=16 prio=120 target_cpu=000
    -0 [000] .Ns1. softirq_entry: vec=7 [action=SCHED]
    -0 [000] d..2. sched_switch: prev_comm=swapper/0 ==> next_comm=ksoftirqd/0
    ksoftirqd/0-16 [000] d..2. sched_switch: prev_comm=ksoftirqd/0 ==> next_comm=swapper/0

ksoftirqd is woken up before the idle thread calls
do_softirq_post_smp_call_flush(), which can make the runqueue appear
busy and prevent the idle load balancer from pulling tasks from an
overloaded runqueue towards itself[1].

Following are the observations with the changes when enabling the same
set of events:

    -0 [000] dN.1. 106.134226: nohz_csd_func: Raise SCHED_SOFTIRQ for idle balance
    -0 [000] .Ns1. 106.134227: softirq_entry: vec=7 [action=SCHED]
    ...

No unnecessary ksoftirqd wakeups are seen from the idle task's context
to service the softirq.

During performance testing, it was noticed that the per-CPU
"will_do_softirq_post_flush" variable needs to be defined as cacheline
aligned to minimize the performance overhead of the writes in
flush_smp_call_function_queue().
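What cacheline alignment does to the layout can be shown with a small
userspace sketch. The 64-byte line size, the two structs and the helper
below are assumptions made for illustration; the patch itself uses
DEFINE_PER_CPU_ALIGNED() and does not say which neighbouring data causes
the overhead in the unaligned case.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CACHELINE 64	/* assumed cache line size in bytes */

/* Flag packed next to other data: both fall in the same cache line. */
struct packed_area {
	long neighbour;
	bool flag;
};

/* Flag pushed onto its own cache line, as an aligned definition would do. */
struct padded_area {
	long neighbour;
	alignas(MODEL_CACHELINE) bool flag;
};

static_assert(offsetof(struct padded_area, flag) % MODEL_CACHELINE == 0,
	      "aligned flag must start a cache line");

/* Index of the cache line holding a byte at @offset from an aligned base. */
static size_t model_cacheline_index(size_t offset)
{
	return offset / MODEL_CACHELINE;
}
```

With the packed layout, a write to the flag touches the same line as
"neighbour"; with the aligned layout the flag sits alone on the next
line, at the cost of padding the structure.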
Following is the IPI throughput measured using a modified version of
ipistorm that performs a fixed set of IPIs between two CPUs on a dual
socket 3rd Generation EPYC system (2 x 64C/128T) (boost on, C2 disabled)
by running ipistorm between CPU8 and CPU16:

cmdline: insmod ipistorm.ko numipi=100000 single=1 offset=8 cpulist=8 wait=1

  ==================================================================
  Test          : ipistorm (modified)
  Units         : Normalized runtime
  Interpretation: Lower is better
  Statistic     : AMean
  ==================================================================
  kernel:                                   time [pct imp]
  tip:sched/core                            1.00 [baseline]
  tip:sched/core + SM_IDLE                  0.25 [75.11%]
  tip:sched/core + SM_IDLE + unaligned var  0.47 [53.74%] *
  tip:sched/core + SM_IDLE + aligned var    0.25 [75.04%] *

  * The version where "will_do_softirq_post_flush" was not cacheline
    aligned takes twice as long as the cacheline aligned version to
    perform a fixed set of IPIs.

Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Reported-by: Julia Lawall
Closes: https://lore.kernel.org/lkml/fcf823f-195e-6c9a-eac3-25f870cb35ac@inria.fr/ [1]
Signed-off-by: K Prateek Nayak
---
 kernel/sched/smp.h |  2 ++
 kernel/smp.c       | 32 ++++++++++++++++++++++++++++++++
 kernel/softirq.c   | 10 +++++++++-
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/smp.h b/kernel/sched/smp.h
index 21ac44428bb0..3731e79fe19b 100644
--- a/kernel/sched/smp.h
+++ b/kernel/sched/smp.h
@@ -9,7 +9,9 @@ extern void sched_ttwu_pending(void *arg);
 extern bool call_function_single_prep_ipi(int cpu);
 
 #ifdef CONFIG_SMP
+extern bool do_softirq_pending(void);
 extern void flush_smp_call_function_queue(void);
 #else
+static inline bool do_softirq_pending(void) { return false; }
 static inline void flush_smp_call_function_queue(void) { }
 #endif
diff --git a/kernel/smp.c b/kernel/smp.c
index f085ebcdf9e7..2eab5e1d5cef 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -559,6 +559,36 @@ static void __flush_smp_call_function_queue(bool warn_cpu_offline)
 	}
 }
 
+/* Indicate an impending call to do_softirq_post_smp_call_flush() */
+static DEFINE_PER_CPU_ALIGNED(bool, will_do_softirq_post_flush);
+
+static __always_inline void __set_will_do_softirq_post_flush(void)
+{
+	this_cpu_write(will_do_softirq_post_flush, true);
+}
+
+static __always_inline void __clr_will_do_softirq_post_flush(void)
+{
+	this_cpu_write(will_do_softirq_post_flush, false);
+}
+
+/**
+ * do_softirq_pending - Check if do_softirq_post_smp_call_flush() will
+ *			be called after the invocation of
+ *			__flush_smp_call_function_queue()
+ *
+ * When flush_smp_call_function_queue() executes in the context of idle,
+ * migration thread, a softirq raised from the smp-call-function ends up
+ * waking ksoftirqd despite an impending softirq processing via
+ * do_softirq_post_smp_call_flush().
+ *
+ * Indicate an impending do_softirq() to should_wake_ksoftirqd() despite
+ * not being in an interrupt context.
+ */
+__always_inline bool do_softirq_pending(void)
+{
+	return this_cpu_read(will_do_softirq_post_flush);
+}
 
 /**
  * flush_smp_call_function_queue - Flush pending smp-call-function callbacks
@@ -583,7 +613,9 @@ void flush_smp_call_function_queue(void)
 	local_irq_save(flags);
 	/* Get the already pending soft interrupts for RT enabled kernels */
 	was_pending = local_softirq_pending();
+	__set_will_do_softirq_post_flush();
 	__flush_smp_call_function_queue(true);
+	__clr_will_do_softirq_post_flush();
 	if (local_softirq_pending())
 		do_softirq_post_smp_call_flush(was_pending);
 
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 02582017759a..b39eeed03042 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -34,6 +34,8 @@
 #define CREATE_TRACE_POINTS
 #include
 
+#include "sched/smp.h"
+
 /*
    - No shared variables, all the data are CPU local.
    - If a softirq needs serialization, let it serialize itself
@@ -413,7 +415,13 @@ static inline void ksoftirqd_run_end(void)
 
 static inline bool should_wake_ksoftirqd(void)
 {
-	return true;
+	/*
+	 * Avoid waking up ksoftirqd when a softirq is raised from a
+	 * call-function executed by flush_smp_call_function_queue()
+	 * in idle, migration thread's context since it'll soon call
+	 * do_softirq_post_smp_call_flush().
+	 */
+	return !do_softirq_pending();
 }
 
 static inline void invoke_softirq(void)
-- 
2.34.1