From nobody Mon Nov 25 02:26:04 2024 Received: from NAM10-MW2-obe.outbound.protection.outlook.com (mail-mw2nam10on2051.outbound.protection.outlook.com [40.107.94.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 97EED1DF732 for ; Wed, 30 Oct 2024 07:24:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.94.51 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730273074; cv=fail; b=bNxSNqcAJShndqU7mBo+53VmawrNq1dUh8tVl9QuiVkyr/ljd2/dGl9YMQrsBTLoxtXHO4h70UPaSPcmQFCpDtOvsK0dnP5G+AYK1GDV89HRg3fIskDT7VDmGhpAVyyqA05WXB9XQoYupqhImNT7cdKl8Q6UmWXfHeKaYkmlwC8= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730273074; c=relaxed/simple; bh=ivAkjxRInKCrJijX6z2QgcNwgOBTlXg4OaSIz++RegA=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=euR7i5PyaoUmXXJ6uci7D7xUYJnEPNpLkt+fAK7VFt0QaTypT+BuarGbB9LIrgEDEuJ1ycwMIZz9leo0vtcnYPlj3byXoXGgyeSHDFj2GCXRfTiycG+ZM4chO6LZ/Kpl2X+FTNYQnr9jJ7ccMqDKRlhjhOk+DagiDCv/EA7lRPs= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=ZsftkYoy; arc=fail smtp.client-ip=40.107.94.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="ZsftkYoy" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; 
b=DMG0xyCMrUflifS8DeZJQe9UVl8f2Pd+Dfyn5ZxVJgo9Fi0wrN/4w0T67WbTlyhwx5ogx8eMRRpL0oFZwKH2Hn2fY01bv8+XpzVbp9sWHe3L5BSiuxW53rF8UHmgF/gntLyKMPgsD6nFCwvvalNnn4/lPjTXb/1ZeqzVwOyYZIuY4HSiyQd+aumiRv6d+6//4d9jbpO07ZM0NrihiUBVHsPSL8ipxeRFB8MLB6Pi9G0ftYo9oBRBXvgKVLgmbR6IbXxsda9HNrLH5HJPwWZKpJXu7qt/KCQNEzugKb6/3o5MSTCPeke3thiQx6NRHmZu6sv3zzFZKIcd939D1VxWog== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=WiPGwGdlnV+mXmiNMMmzO3h6RM4D8jANcceN+D0b71Y=; b=QVg3l4UbFN6xGJ+HkIuIhDz1uV3A6JX6KwDhU9xB6V9BAOsrDQo7SCZZtACCR4In/zsBj9wCGQ+Z9hKz1nVPsvyuXOncoxbymuSyPvwf1Fq6FoTcHhoha8iT3MYm9M1FZ1CtedKHL5MGFm7yaHYOxYcj4gNuCdGLeEayV2mKeYn7KZF2Q1p0i1b2oOrsxlmWnSgx5TyjzXXvl8QrHzALRMQ/afVNEW6UMNBkAMJjUZU94U9DQeQFnmLyBnHeYFHNZ94/luDJCoUDlEiTVMzDxn7PDmABWzFOgAsBe3EKZFYgQrUbzQ6W9GUKmSki/djh8mjyV9+NJ4HvxKed6dUr4g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=redhat.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=WiPGwGdlnV+mXmiNMMmzO3h6RM4D8jANcceN+D0b71Y=; b=ZsftkYoyDP1w9RIFN0Z88AKcnv4kOs9+YjNneC06wkujWY3eejIp0aPHfbEzDrykZm76aAsetPw+2CENDc6GbOAy2puYB2N1sdvb+ZU8dwHo26xIzyKso+/yK2AVh9GbltFnyAVCPLebNJ8IXqjg+WPRWV3ymh0nAKN01fEA+9M= Received: from CY5PR16CA0028.namprd16.prod.outlook.com (2603:10b6:930:10::25) by IA1PR12MB6092.namprd12.prod.outlook.com (2603:10b6:208:3ec::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8093.25; Wed, 30 Oct 2024 07:24:25 +0000 Received: from 
CY4PEPF0000EDD6.namprd03.prod.outlook.com (2603:10b6:930:10:cafe::ed) by CY5PR16CA0028.outlook.office365.com (2603:10b6:930:10::25) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8114.23 via Frontend Transport; Wed, 30 Oct 2024 07:24:25 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by CY4PEPF0000EDD6.mail.protection.outlook.com (10.167.241.202) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.8114.16 via Frontend Transport; Wed, 30 Oct 2024 07:24:24 +0000 Received: from BLRKPRNAYAK.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Wed, 30 Oct 2024 02:18:37 -0500 From: K Prateek Nayak To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Sebastian Andrzej Siewior , Clark Williams , "Steven Rostedt" , , CC: Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Thomas Gleixner , Tejun Heo , Jens Axboe , NeilBrown , Zqiang , Caleb Sander Mateos , "Gautham R . 
Shenoy" , Chen Yu , Julia Lawall , "K Prateek Nayak" Subject: [PATCH v4 1/3] softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel Date: Wed, 30 Oct 2024 07:15:55 +0000 Message-ID: <20241030071557.1422-2-kprateek.nayak@amd.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20241030071557.1422-1-kprateek.nayak@amd.com> References: <20241030071557.1422-1-kprateek.nayak@amd.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CY4PEPF0000EDD6:EE_|IA1PR12MB6092:EE_ X-MS-Office365-Filtering-Correlation-Id: 538c9f4c-e133-4eff-2673-08dcf8b3e191 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|7416014|36860700013|82310400026|376014|1800799024; X-Microsoft-Antispam-Message-Info: =?us-ascii?Q?B/AUKmx2SvYOq5cjzg5ux1KdldXb14KJmhLNr4I2DSJbL7nbpm7L5dmsFuYn?= =?us-ascii?Q?e34SHtO7vxVdXnI1uezF5gnw3Gus+5JOyhRIqsxO5EvdfTsHLe8zn5MN9vkA?= =?us-ascii?Q?xsYYcwdoaWZmOjftKO1RgPCfhtRUfEXN17E64PMkeXRtdw82PKgPi1VBb0Q/?= =?us-ascii?Q?B1iUkjiDZf/+aG9A/b/Jzr1rdBgbodgjNvepyNV9ZwpTyTM5fuK3p2D57q+n?= =?us-ascii?Q?IA9eDLlixiDRx0jPYUTQAsqsFp4P/Z9ZjiNDvAKpcVZ+9Tn08WoR62dxcqfx?= =?us-ascii?Q?GcqpTHJ8t/KBNWW/lqwhMgsfCtVD1oNPOcN0Ddt8SEIkt0exZVFoCf7HFuto?= =?us-ascii?Q?RNY9v/QmGFPP4PeXcQ3LTg3AlRelPDvFHfp/AbalIo+i3tS63G2/VNoHoUbS?= =?us-ascii?Q?8tq2K7Icpa/01BQ93Ut5G6IEsL7d1uMqMDaR0Io5urJfBU/LzG8ahVQNxlH4?= =?us-ascii?Q?TBXe/dLgwsallpM37QJGvzaI1Vv28SUhABsN4L4iLdki9v7epBThV9avKfLC?= =?us-ascii?Q?FINp35+MbShpOvfH0jmVsop47HMM7Zuw4R/WAZOfP3eTWKRFwJzTPhGy8zZT?= =?us-ascii?Q?XUTMpx248/EieumjGAvn6FqBS80wPeJYvl16oJPe+6Us6NXBU6skAIwzLiNT?= =?us-ascii?Q?SvktB0/iAxTXLNilz6DV/J4B7LFZJ5UceRFrkvK3VUbYFEyTxmmgUlCZLiUg?= 
=?us-ascii?Q?nWUiZ+72vHUsF1hdW5YLRqC28/s6pSAsWEQ/hKmwxudwMBzJ5EvheTL3yalQ?= =?us-ascii?Q?YA43Mrx+I8lkiLNmEQ7DZFiFuKrjC8igGXZBCVirJoUuB+/xtM5YIZoIUXVU?= =?us-ascii?Q?c+opzWhHEvtUslVjyL+/fsKISvrrNcigrpgMarWGvZU0GMSx5r8BxCbcYQ4j?= =?us-ascii?Q?QzOZ5Y+cdOsFHV38livmuOBhtxcesB6Y3ltmXfRWYIMkZQwhURFy7dU1beeO?= =?us-ascii?Q?Me+gt4grdkCnusTyo/AxDsY9Sk3Jr+lrzEk/VfKe3PbLbtsfX2cl04WaLPt5?= =?us-ascii?Q?zoKvfen9exRuM6YW7qH+y/zPp1KkWbJIwyVwz4tqxB9SbCMgbqShUA6JSYkd?= =?us-ascii?Q?TT1MLI9rDb0iDBqpldUfVkuRiVP3YgQp6abwGoQM87mDPw7tbvVCBpUB54OZ?= =?us-ascii?Q?wg+VRI9AkLJosEhhU1R0Qj3fcI/PbRykssbfZiaF6aRWDtk4DP+Jh3il7Ybc?= =?us-ascii?Q?x2yUNHdjVzPMoIzhU9FrsyiuYST3G5RtsQWeIehQllKSZcNzs0s3pm3L2nPE?= =?us-ascii?Q?TMMAW+vP2A3doWSc3DHkFwWdfWE+jl0I9nuhmKxSmLjQcRuCaGKg5HZoFPjL?= =?us-ascii?Q?Z1/zweWNCBd2iKsh/rH96eX3Nm4ipsQ56kg/YG8H37SFoCCUW2rKO6BnQu3Q?= =?us-ascii?Q?2QYFM8ZkK8jpIK9HQ8KtljdherWY4YahW1UQS5OvS/xtm3gHzg=3D=3D?= X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(7416014)(36860700013)(82310400026)(376014)(1800799024);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Oct 2024 07:24:24.5269 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 538c9f4c-e133-4eff-2673-08dcf8b3e191 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CY4PEPF0000EDD6.namprd03.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: IA1PR12MB6092 Content-Type: text/plain; charset="utf-8" do_softirq_post_smp_call_flush() on PREEMPT_RT kernels carries a WARN_ON_ONCE() for any SOFTIRQ being raised from an SMP-call-function. 
Since do_softirq_post_smp_call_flush() is called with preemption disabled, raising a SOFTIRQ during flush_smp_call_function_queue() can lead to longer preempt-disabled sections.

Since commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()"), IPIs to an idle CPU in TIF_POLLING_NRFLAG mode can be optimized out by instead setting the TIF_NEED_RESCHED bit in the idle task's thread_info and relying on flush_smp_call_function_queue() in the idle-exit path to run the SMP-call-function.

To trigger an idle load balance, the scheduler queues nohz_csd_func(), which is responsible for triggering an idle load balance on a target nohz idle CPU, and sends an IPI. Only now, this IPI is optimized out and the SMP-call-function is executed from flush_smp_call_function_queue() in do_idle(), which can raise a SCHED_SOFTIRQ to trigger the balancing.

So far, this went undetected since the need_resched() check in nohz_csd_func() would make it bail out of idle load balancing early as the idle thread does not clear TIF_POLLING_NRFLAG before calling flush_smp_call_function_queue(). The need_resched() check was added with the intent to catch a new task wakeup; however, it was recently found to be unnecessary and will be removed soon. As such, nohz_csd_func() will raise a SCHED_SOFTIRQ from flush_smp_call_function_queue() to trigger an idle load balance on an idle target.

nohz_csd_func() bails out early if the idle_cpu() check for the target CPU returns false and should not delay a newly woken up task from running. Account for this and prevent a WARN_ON_ONCE() when SCHED_SOFTIRQ is raised from flush_smp_call_function_queue().

Signed-off-by: K Prateek Nayak
---
v3..v4:

o No changes.
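Editorial illustration (not part of the patch): a minimal userspace sketch of the decision this change makes after the flush, assuming the pending softirqs are compared as plain bitmasks. The enum and the function name post_flush_action() are hypothetical and exist only for this example; the vector numbers follow the kernel's softirq enum, where SCHED_SOFTIRQ is vector 7 and BLOCK_SOFTIRQ is vector 4.

```c
/* Userspace model only; nothing here is kernel code. */
#define BLOCK_SOFTIRQ		4
#define SCHED_SOFTIRQ		7
#define SCHED_SOFTIRQ_MASK	(1U << SCHED_SOFTIRQ)

/* Hypothetical names, used only for this illustration. */
enum flush_action {
	FLUSH_NOTHING,		/* pending mask unchanged by the flush */
	FLUSH_INVOKE,		/* only SCHED_SOFTIRQ was newly raised */
	FLUSH_INVOKE_AND_WARN,	/* some other vector was newly raised */
};

enum flush_action post_flush_action(unsigned int was_pending,
				    unsigned int is_pending)
{
	/* Nothing was raised during the flush: nothing to process. */
	if (was_pending == is_pending)
		return FLUSH_NOTHING;

	/* Same comparison as the WARN_ON_ONCE() condition in the patch. */
	if (was_pending != (is_pending & ~SCHED_SOFTIRQ_MASK))
		return FLUSH_INVOKE_AND_WARN;

	return FLUSH_INVOKE;
}
```

In this model, post_flush_action(0, SCHED_SOFTIRQ_MASK) yields FLUSH_INVOKE, while a newly raised BLOCK_SOFTIRQ bit still yields FLUSH_INVOKE_AND_WARN, which matches the intent of warning only for vectors other than SCHED_SOFTIRQ.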
---
 kernel/softirq.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index b756d6b3fd09..d89be0affe46 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -280,17 +280,24 @@ static inline void invoke_softirq(void)
 		wakeup_softirqd();
 }
 
+#define SCHED_SOFTIRQ_MASK	BIT(SCHED_SOFTIRQ)
+
 /*
  * flush_smp_call_function_queue() can raise a soft interrupt in a function
- * call. On RT kernels this is undesired and the only known functionality
- * in the block layer which does this is disabled on RT. If soft interrupts
- * get raised which haven't been raised before the flush, warn so it can be
+ * call. On RT kernels this is undesired and the only known functionalities
+ * are in the block layer which is disabled on RT, and in the scheduler for
+ * idle load balancing. If soft interrupts get raised which haven't been
+ * raised before the flush, warn if it is not a SCHED_SOFTIRQ so it can be
  * investigated.
  */
 void do_softirq_post_smp_call_flush(unsigned int was_pending)
 {
-	if (WARN_ON_ONCE(was_pending != local_softirq_pending()))
+	unsigned int is_pending = local_softirq_pending();
+
+	if (unlikely(was_pending != is_pending)) {
+		WARN_ON_ONCE(was_pending != (is_pending & ~SCHED_SOFTIRQ_MASK));
 		invoke_softirq();
+	}
 }
 
 #else /* CONFIG_PREEMPT_RT */
-- 
2.34.1

From nobody Mon Nov 25 02:26:04 2024 Received: from NAM12-MW2-obe.outbound.protection.outlook.com (mail-mw2nam12on2072.outbound.protection.outlook.com [40.107.244.72]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4B5691DC1A2 for ; Wed, 30 Oct 2024 07:25:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.244.72 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730273130; cv=fail; 
b=PJXEWIV2QdUGvRLkQOf//JmmdDTVH5oEN07dtPkzYw28KV+N6tJad1VBLVX6VHSt2p6W/5qB0kG77J8ZGK2n1nVHzu27Nypo6cZ5xqh4aNXcSXZNRCbkOe243XJcHZhnYmczOcSvq+2XFdhGj2aYU6w442R2KSEC4XQjePvd37M= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730273130; c=relaxed/simple; bh=l2Mksu3vknG55VmlNaSOZYEQJKgMNOfhnas6AXIPHKc=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=NSs2/YbeivkchiOXIY6nO1q5AtF6IxmhVTDmXoQfBwRi4+FfEJ2N+MqafTCFRWrXtMn93vgQP3Wx6RPEmpG+sD/kX3qbdaL4utoGHTRSarlNsuiTnJAllykFG8coq5njr/Aufn9feU6o/fKvx7PNF2tQI3WQ2U+8Tb03gl5cpUM= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=Erb8VdC+; arc=fail smtp.client-ip=40.107.244.72 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="Erb8VdC+" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=peOfh6ZI6sv4vkqZkrBKypeLbVVOxHYr/bFneT8DSysdB5hTKQARMuupoKmrwId1kmG/ioztobUf7JIbtIrU+lgiDOTNEJK8X8MUdCZ4vIrYQX1Kay7Y2V4Gv9t+03FF5IJlWEO0m2iAgOmYgpYZ9/S1MLGEaEII3DOgKhCEcH/ZTPxGf/CNFf655TW/B0J+ZvVw3t++Jnned0LsN+FHmJnEhNYTzMelLiHScFJJJH1VN1AUBsUp/HlVFDhqxniPlkvDTZqLCOsW+LJxWwSXfZWU1LEWDqmK+cxRWiVHd1mJKCuSvUiuCNaqML0ofSXaGFSDV7d3flGgoJO+3meZ8g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=48CPlsmVUY1DVShT+sl6RhhND5nxlPqQQU8pfYvZg7U=; 
b=wErvA1Ale6rIlsVD6L0Jo0uFnMive8c1GKSy7Y4OVrXkW6aZN+7qJFhRrTwiV2Arj8sfe9wrFwcvesf2oJhQhjVvERdFUButZIHIswgsO1FPEKA4v6wZFa71w4aCksm94TPjergAihsce13ZMPDNFV1Uq9CBjnkC/IwuNtnXiFqCC24RU+o7U9j6Q/rsTZg9KmSVH40dM6nqmOHpAbUs6FalAIVuLz6U5fYN0jup/JgYhdNODYyhKbJsj9DCdTboqnVrU8Y0HOi0K9Fexlht0yWuOqvTxjD7PPnCYaCV/3KOGoLHf8I31jJxsH22oufKoh1VUsuHDPTtLvKjkBet9g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=redhat.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=48CPlsmVUY1DVShT+sl6RhhND5nxlPqQQU8pfYvZg7U=; b=Erb8VdC+SyBUZpEhC3RmHLlYeywEzs7qQlPkIdkGeDqLhGfIr/TuLBYD/aCD2cKLpBKm/fM+QGeCWvDbGBigvFN+uPiEY6bi9wb4CV8Ax1gFeMjnqWx3MGqmIqr5w/NY+JZkqqnee+u1fH/HJie5iNoR44zqivOSUR32hVI9TFU= Received: from CY5PR16CA0003.namprd16.prod.outlook.com (2603:10b6:930:10::7) by DM6PR12MB4220.namprd12.prod.outlook.com (2603:10b6:5:21d::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8093.32; Wed, 30 Oct 2024 07:25:21 +0000 Received: from CY4PEPF0000EDD6.namprd03.prod.outlook.com (2603:10b6:930:10:cafe::50) by CY5PR16CA0003.outlook.office365.com (2603:10b6:930:10::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.8114.17 via Frontend Transport; Wed, 30 Oct 2024 07:25:21 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from 
SATLEXMB04.amd.com (165.204.84.17) by CY4PEPF0000EDD6.mail.protection.outlook.com (10.167.241.202) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.8114.16 via Frontend Transport; Wed, 30 Oct 2024 07:25:21 +0000 Received: from BLRKPRNAYAK.amd.com (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Wed, 30 Oct 2024 02:21:57 -0500 From: K Prateek Nayak To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Sebastian Andrzej Siewior , Clark Williams , "Steven Rostedt" , , CC: Dietmar Eggemann , Ben Segall , Mel Gorman , Valentin Schneider , Thomas Gleixner , Tejun Heo , Jens Axboe , NeilBrown , Zqiang , Caleb Sander Mateos , "Gautham R . Shenoy" , Chen Yu , Julia Lawall , "K Prateek Nayak" Subject: [PATCH v4 2/3] sched/core: Remove the unnecessary need_resched() check in nohz_csd_func() Date: Wed, 30 Oct 2024 07:15:56 +0000 Message-ID: <20241030071557.1422-3-kprateek.nayak@amd.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20241030071557.1422-1-kprateek.nayak@amd.com> References: <20241030071557.1422-1-kprateek.nayak@amd.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CY4PEPF0000EDD6:EE_|DM6PR12MB4220:EE_ X-MS-Office365-Filtering-Correlation-Id: e4fb2b02-a35e-4d23-1ad5-08dcf8b4034f X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|1800799024|36860700013|376014|7416014|82310400026; X-Microsoft-Antispam-Message-Info: =?us-ascii?Q?Pbh4icQ8D2bFfDsmAFuK7GZRYcgEs6tTthHhnv2dVxEEtKOQ5wR6P4+xAk6F?= 
=?us-ascii?Q?9g123Iek5P3NfWbFq/h5oeUAizmwTRMNvgx0AfHEfUN0kzEWDjhL7/B8/xKz?= =?us-ascii?Q?GHd0Cim4BxHAYc5hLrdTeDCaYGQZvWzXGwlFFlUPQ6fWVYGr89D63X8iLmR5?= =?us-ascii?Q?XZIVUhvaa0pYyMDH84hdg1MrHjD2ZSXrGvFCqfX+Di8Ww3tSb6o3d8mDTPLa?= =?us-ascii?Q?qPHtUfKPDKdSOmveG6xyBd9tDWWcnjrtsuAd1lXF0WK9UaGlv1WDmrFH3tuW?= =?us-ascii?Q?sKo2el+xiZ1eGI4p/n9dugOdGPjui8X5y41/cA6rPOusMFBFJFjGVTE7hUUy?= =?us-ascii?Q?0r28Filj8xQJS346+C25HRJtqiHMYPQwGsb1PlW9znLGki7eAy+5jSmAPAVu?= =?us-ascii?Q?J0n79EM5H5AFmjTRKiC3WnoDs8801PC8qypUORin/8PkWPBSdxVGu2AqeYDx?= =?us-ascii?Q?dLGvohHtVuVOIDNo1gM0RPOWVmj++4YquS9srQyITpmAu3BXJ+JlEs63Ieci?= =?us-ascii?Q?XIe6n1RYfeHUDWMQDDuBeiswFM4uS+UCLOox7adkbO9+yphP6KhJtN8wJXCC?= =?us-ascii?Q?n8zirUJHjMhj2ikuVH5LBXADMsRs5AB6lHjxxY7AZ0i/dU697loZGbi6PTGP?= =?us-ascii?Q?twzOL9zXPt1lxzKCT/iu/TuN5LZM7uybchDAAYIkgP9UBdJwzRPpfhtCkl6u?= =?us-ascii?Q?AatoS9uT49Jtm0Tqc04/Pgej966BcZpReNVxB7SxF8vWwUIR+162u1jONZzB?= =?us-ascii?Q?nVpZLDlFrfFm6atot0vBAwqD226JgS0ZFEpBRjE05tkRKgQxytJIuKpQReUP?= =?us-ascii?Q?p26Ljhpt5rmPDEB3d1j9BIWYlOdWSl77naRz/ueE3rcnE2pYAInoL9W2M9Xu?= =?us-ascii?Q?NHFR5j8fWB9JsUa0/dBuUFYPP4CJtTmOo2yGRNh0tOmRKWOdKLigvEpn2iEw?= =?us-ascii?Q?Ge7DtuMTvl8dyNZYHglFmJ0uSFMM3zOuuUuEgfvreMLFRhJbYGCt7ZRHPrxm?= =?us-ascii?Q?NnE0qcLZka8ry5NaK438NIm7kfodDhf7iH6VRH7L42zVcNpfzXQU74Jp6QIl?= =?us-ascii?Q?slbhQyjJ6QJN2z/F/3xCwbyvWe4cgDy0VfoCBZXOCz55ShoDLbJWYI+Ahpin?= =?us-ascii?Q?26VQaxQyj1bkeqoFLvQJ3abYrlec5hg47Z1iJVXfAgrBFz50VPw1dyKWqEaP?= =?us-ascii?Q?g7HTo18OL/C43vC8cq4YJgMuxE/LZg60pHI/3P/D+ch138XzcjgY/OCNarXL?= =?us-ascii?Q?RlpvzWp4gs7ahp/yQLn8ZKZ0RqV83Eotu0VeDfYquA0cOeL/QJei7bwBoDxO?= =?us-ascii?Q?ubGbNmCFL/Go0B/kSBhqS1Ek1HrL+lANhMimMi5gdwynfBl2aadmOFzzo4ZO?= =?us-ascii?Q?bYksA5vRWusJQzQqEh0xcBxWFabZzgp8XS3yGyMlBP9Y4xftohQ9VrdoEH3L?= =?us-ascii?Q?MnItbMH6Iw4=3D?= X-Forefront-Antispam-Report: 
CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(1800799024)(36860700013)(376014)(7416014)(82310400026);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Oct 2024 07:25:21.1679 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: e4fb2b02-a35e-4d23-1ad5-08dcf8b4034f X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CY4PEPF0000EDD6.namprd03.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB4220 Content-Type: text/plain; charset="utf-8"

The need_resched() check currently in nohz_csd_func() can be traced back to scheduler_ipi(), where it was added in 2011 via commit ca38062e57e9 ("sched: Use resched IPI to kick off the nohz idle balance").

Since then, it has travelled quite a bit, but it seems like an idle_cpu() check is currently sufficient to detect the need to bail out from an idle load balance. To justify this removal, consider the following cases where an idle load balance could race with a task wakeup:

o Since commit f3dd3f674555b ("sched: Remove the limitation of WF_ON_CPU on wakelist if wakee cpu is idle"), a target perceived to be idle (target_rq->nr_running == 0) will return true for ttwu_queue_cond(target), which will offload the task wakeup to the idle target via an IPI.

  In all such cases target_rq->ttwu_pending will be set to 1 before queuing the wake function.

  If an idle load balance races here, the following scenarios are possible:

    - The CPU is not in TIF_POLLING_NRFLAG mode, in which case an actual IPI is sent to the CPU to wake it out of idle.
      If nohz_csd_func() is queued before sched_ttwu_pending(), the idle load balance will bail out because idle_cpu(target) returns 0 while target_rq->ttwu_pending is 1. If nohz_csd_func() is queued after sched_ttwu_pending(), it should see rq->nr_running to be non-zero and bail out of idle load balancing.

    - The CPU is in TIF_POLLING_NRFLAG mode and, instead of an actual IPI, the sender will simply set TIF_NEED_RESCHED for the target to put it out of idle, and flush_smp_call_function_queue() in do_idle() will execute the call function. Depending on the ordering of the queuing of nohz_csd_func() and sched_ttwu_pending(), the idle_cpu() check in nohz_csd_func() should either see target_rq->ttwu_pending = 1 or target_rq->nr_running to be non-zero if there is a genuine task wakeup racing with the idle load balance kick.

o The waker CPU perceives the target CPU to be busy (target_rq->nr_running != 0) but the CPU is in fact going idle and, due to a series of unfortunate events, the system reaches a case where the waker CPU decides to perform the wakeup by itself in ttwu_queue() on the target CPU but the target is concurrently selected for idle load balance (XXX: Can this happen? I'm not sure, but we'll consider the mother of all coincidences to estimate the worst case scenario).

  ttwu_do_activate() calls enqueue_task(), which would increment "rq->nr_running", after which it calls wakeup_preempt(), which is responsible for setting TIF_NEED_RESCHED (via a resched IPI or by setting TIF_NEED_RESCHED on a TIF_POLLING_NRFLAG idle CPU). The key thing to note in this case is that rq->nr_running is already non-zero in case of a wakeup before TIF_NEED_RESCHED is set, which would lead to the idle_cpu() check returning false.

In all cases, it seems that the need_resched() check is unnecessary when checking for idle_cpu() first, since an impending wakeup racing with the idle load balancer will either set "rq->ttwu_pending" or indicate a newly woken task via "rq->nr_running".

Chasing the reason why this check might have existed in the first place, I came across Peter's suggestion on the first iteration of Suresh's patch from 2011 [1] where the condition to raise the SCHED_SOFTIRQ was:

	sched_ttwu_do_pending(list);

	if (unlikely((rq->idle == current) &&
	    rq->nohz_balance_kick &&
	    !need_resched()))
		raise_softirq_irqoff(SCHED_SOFTIRQ);

Since the condition to raise the SCHED_SOFTIRQ was preceded by sched_ttwu_do_pending() (which is the equivalent of sched_ttwu_pending() in the current upstream kernel), the need_resched() check was necessary to catch a newly queued task. Peter suggested modifying it to:

	if (idle_cpu() && rq->nohz_balance_kick && !need_resched())
		raise_softirq_irqoff(SCHED_SOFTIRQ);

where idle_cpu() seems to have replaced the "rq->idle == current" check.

Even back then, the idle_cpu() check would have been sufficient to catch a new task being enqueued. Since commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()") overloads the interpretation of TIF_NEED_RESCHED for TIF_POLLING_NRFLAG idling, remove the need_resched() check in nohz_csd_func() to raise SCHED_SOFTIRQ based on Peter's suggestion.

Link: https://lore.kernel.org/all/1317670590.20367.38.camel@twins/ [1]
Link: https://lore.kernel.org/lkml/20240615014521.GR8774@noisy.programming.kicks-ass.net/
Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Suggested-by: Peter Zijlstra
Signed-off-by: K Prateek Nayak
---
v3..v4:

o No changes.
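Editorial illustration (not part of the patch): the race analysis above can be sketched as a small userspace model, assuming idle_cpu() reports a CPU as idle only when the idle task is current, rq->nr_running is zero, and rq->ttwu_pending is clear. struct rq_model and the three function names are hypothetical stand-ins, and need_resched is modelled as a plain field even though the kernel keeps it as a thread flag of the idle task.

```c
#include <stdbool.h>

/* Simplified stand-in for the state consulted by the two conditions. */
struct rq_model {
	bool curr_is_idle;		/* models rq->curr == rq->idle */
	unsigned int nr_running;	/* models rq->nr_running */
	bool ttwu_pending;		/* models rq->ttwu_pending */
	bool need_resched;		/* models TIF_NEED_RESCHED on the idle task */
};

/* Assumed shape of the idle_cpu() check described in the text. */
bool idle_cpu_model(const struct rq_model *rq)
{
	return rq->curr_is_idle && !rq->nr_running && !rq->ttwu_pending;
}

/* Condition before the patch: rq->idle_balance && !need_resched() */
bool kick_before_patch(const struct rq_model *rq)
{
	return idle_cpu_model(rq) && !rq->need_resched;
}

/* Condition after the patch: rq->idle_balance */
bool kick_after_patch(const struct rq_model *rq)
{
	return idle_cpu_model(rq);
}
```

With a polling idle CPU whose TIF_NEED_RESCHED was set only to deliver the call function (no wakeup queued), the old condition skips the balance while the new one performs it. In both wakeup races described above (ttwu_pending set, or nr_running already non-zero), the new condition still bails out.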
---
 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c57a79e34911..aaf99c0bcb49 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1242,7 +1242,7 @@ static void nohz_csd_func(void *info)
 	WARN_ON(!(flags & NOHZ_KICK_MASK));
 
 	rq->idle_balance = idle_cpu(cpu);
-	if (rq->idle_balance && !need_resched()) {
+	if (rq->idle_balance) {
 		rq->nohz_idle_balance = flags;
 		raise_softirq_irqoff(SCHED_SOFTIRQ);
 	}
-- 
2.34.1

From nobody Mon Nov 25 02:26:04 2024 Received: from NAM11-CO1-obe.outbound.protection.outlook.com (mail-co1nam11on2062.outbound.protection.outlook.com [40.107.220.62]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E83BB13CF82 for ; Wed, 30 Oct 2024 07:25:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=40.107.220.62 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730273161; cv=fail; b=vCSvjeZ7PT9y4brvYgsPhnrM4zFr7pX28u5DqOVMccEDa/h+LdBvXoZYOrQlQOSzc4VrxDlPNe06Tsnt+SfZ0TaDKSFzg/NKemZah4rKG1fxF1i2T9HGVxRl9Qo4BxK1iSD/vpVxIeamlgfxSV6qLO8b95GwiCpHqsZ0FsRG8qQ= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730273161; c=relaxed/simple; bh=7cN/piHL+RmLwlBogUSCxkXXlV1zB5xh7nO6DwvH/jc=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=UlHcNfbNI+IyzJfqJPglqWjMCqkHlzobuXV3BKA2qEM8R+AlkFsQ43ZZvMBMP2vucb7Ihp/gdCn8/Wlm60kIQzTChOT/+NxJm8/iGZCOKFm4hWOKdyxa5ABmjx6ybRu92JJeN+5aMyg3OXnMLauJySi4qlUSKd6+80AiVzikhic= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=s30IwE5k; arc=fail smtp.client-ip=40.107.220.62 Authentication-Results: smtp.subspace.kernel.org; 
dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="s30IwE5k" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=N94Bbg2+nQ4It+kR/4ISLm88F7Y7wr259QQEQr3+g2XIgBX+eP/RKPQ3fb71JHV0sCNq19yhqy0vycPj+rN759MY8qmzP4pnQm6zqVi92Yhrhr6yHA9ikMRstuVZ6L3b8nevnMNe4Eb3ay0guCn25otHxOd61K2PGO55M5hoWl5Qc68IstW9QeYu5YbMTIcHUKTb3PX99RWnRG72n5c9Cunx7mwSPFBXQQJDIj3anXob/1vgv5X+rdvDZZ7ldj8R7NCwDax0pQ/YOV+ziZe2L2vnckAKiV0sezT3gksjIi7GVypinD+eKvC+PsGMvxueb1lpXc+WfnZ5sIMsCritFw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Nkzv30Ezz7l8xAfvMAVylfwv90yAkDltxvmkMfkX8M0=; b=Cy8GQMCKkbG0SLlIc4EmyJ9B0DyXn8vqbVCgoI7u7fZ9jnGByEb0Kt+xISWrjgN/o42wcyvm7KYsQtivhthb/QW343MoFuIblO05eQbo4sgjXXvyEA8SVFUNqzhEEAgHI+IK3Vu2cxv09TeEvtS6C53viv3I03O1nfPBPd3aYbJmnARTfip5COv4tmTZSlJ9O0A3XY3t4fAL3qb/+9vzbu2b2m8VQVK7NcEikDNsMWEVhYvvRQpgWEbkUKQrnYzeSBMX+BAOhnNmPWe2kwlULkwzEpRBEL39/sGNjMuXvil9dXYt8PY2iH/tFI9UhAKK3DoCBwottaUzu3z5Jf+VQg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=redhat.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Nkzv30Ezz7l8xAfvMAVylfwv90yAkDltxvmkMfkX8M0=; 
From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt
CC: Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider,
 Thomas Gleixner, Tejun Heo, Jens Axboe, NeilBrown, Zqiang,
 Caleb Sander Mateos, Gautham R . Shenoy, Chen Yu, Julia Lawall,
 K Prateek Nayak, Julia Lawall
Subject: [PATCH v4 3/3] sched/core: Prevent wakeup of ksoftirqd during idle load balance
Date: Wed, 30 Oct 2024 07:15:57 +0000
Message-ID: <20241030071557.1422-4-kprateek.nayak@amd.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241030071557.1422-1-kprateek.nayak@amd.com>
References: <20241030071557.1422-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Scheduler raises a SCHED_SOFTIRQ to trigger a load balancing event from
the IPI handler on the idle CPU.
Since the softirq can be raised from
flush_smp_call_function_queue(), it can end up waking up ksoftirqd,
which can give an illusion of the idle CPU being busy when doing idle
load balancing.

Adding a trace_printk() in nohz_csd_func() at the spot of raising
SCHED_SOFTIRQ and enabling trace events for sched_switch, sched_wakeup,
and softirq_entry (for SCHED_SOFTIRQ vector alone) helps observe the
current behavior:

       <idle>-0   [000] dN.1.: nohz_csd_func: Raising SCHED_SOFTIRQ from nohz_csd_func
       <idle>-0   [000] dN.4.: sched_wakeup: comm=ksoftirqd/0 pid=16 prio=120 target_cpu=000
       <idle>-0   [000] .Ns1.: softirq_entry: vec=7 [action=SCHED]
       <idle>-0   [000] .Ns1.: softirq_exit: vec=7 [action=SCHED]
       <idle>-0   [000] d..2.: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=16 next_prio=120
  ksoftirqd/0-16  [000] d..2.: sched_switch: prev_comm=ksoftirqd/0 prev_pid=16 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
       ...

ksoftirqd is woken up before the idle thread calls
do_softirq_post_smp_call_flush(), which can make the runqueue appear
busy and prevent the idle load balancer from pulling a task from an
overloaded runqueue towards itself[1].

Since the softirq raised is guaranteed to be serviced in irq_exit() or
via do_softirq_post_smp_call_flush(), set SCHED_SOFTIRQ without
checking the need to wake up ksoftirqd for idle load balancing.

Following are the observations with the changes when enabling the same
set of events:

       <idle>-0   [000] dN.1.: nohz_csd_func: Raising SCHED_SOFTIRQ for nohz_idle_balance
       <idle>-0   [000] dN.1.: softirq_raise: vec=7 [action=SCHED]
       <idle>-0   [000] .Ns1.: softirq_entry: vec=7 [action=SCHED]

No unnecessary ksoftirqd wakeups are seen from the idle task's context
to service the softirq.

Fixes: b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()")
Reported-by: Julia Lawall
Closes: https://lore.kernel.org/lkml/fcf823f-195e-6c9a-eac3-25f870cb35ac@inria.fr/ [1]
Suggested-by: Sebastian Andrzej Siewior
Signed-off-by: K Prateek Nayak
Reviewed-by: Sebastian Andrzej Siewior
---
v3..v4:

o New patch based on Sebastian's suggestion.
---
 kernel/sched/core.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index aaf99c0bcb49..2ee3621d6e7e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1244,7 +1244,18 @@ static void nohz_csd_func(void *info)
 	rq->idle_balance = idle_cpu(cpu);
 	if (rq->idle_balance) {
 		rq->nohz_idle_balance = flags;
-		raise_softirq_irqoff(SCHED_SOFTIRQ);
+
+		/*
+		 * Don't wakeup ksoftirqd when raising SCHED_SOFTIRQ
+		 * since the idle load balancer may mistake wakeup of
+		 * ksoftirqd as a genuine task wakeup and bail out from
+		 * load balancing early. Since it is guaranteed that
+		 * pending softirqs will be handled soon, either on
+		 * irq_exit() or via do_softirq_post_smp_call_flush(),
+		 * raise SCHED_SOFTIRQ without checking the need to
+		 * wakeup ksoftirqd.
+		 */
+		__raise_softirq_irqoff(SCHED_SOFTIRQ);
 	}
 }
 
-- 
2.34.1
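
Illustrative note (not part of the patch): the behavioural difference
between the two helpers swapped in the hunk above can be modelled in
userspace C. This is only a rough sketch under stated assumptions. The
struct and every model_* name are hypothetical, the wakeup is reduced to
a counter, and the kernel's real wakeup condition is simplified to a
single "not in interrupt context" flag.

```c
/* Userspace model only; none of this is kernel code. */
#include <stdbool.h>

#define MODEL_SCHED_SOFTIRQ 7u	/* vec=7 in the traces above */

/* Hypothetical per-CPU state, reduced to what the model needs. */
struct model_cpu {
	unsigned int pending;		/* bitmask of raised softirq vectors */
	bool in_interrupt;		/* true in hardirq/softirq context */
	unsigned int ksoftirqd_wakeups;	/* number of ksoftirqd wakeups */
};

/* Models __raise_softirq_irqoff(): only mark the vector pending. */
static void model_raise_pending_only(struct model_cpu *c, unsigned int nr)
{
	c->pending |= 1u << nr;
}

/*
 * Models raise_softirq_irqoff(): mark the vector pending, then wake
 * ksoftirqd when called outside interrupt context (simplified).
 */
static void model_raise_and_maybe_wake(struct model_cpu *c, unsigned int nr)
{
	model_raise_pending_only(c, nr);
	if (!c->in_interrupt)
		c->ksoftirqd_wakeups++;
}

/*
 * Models the flush path servicing whatever is pending before the CPU
 * goes back to idle; returns the mask of vectors that were handled.
 */
static unsigned int model_service_pending(struct model_cpu *c)
{
	unsigned int handled = c->pending;

	c->pending = 0;
	return handled;
}
```

Calling model_raise_and_maybe_wake() on a zero-initialised struct
model_cpu (in_interrupt == false, as in the idle flush path) bumps
ksoftirqd_wakeups, while model_raise_pending_only() leaves it at zero.
In both cases model_service_pending() still reports the SCHED_SOFTIRQ
bit as handled, which mirrors the argument made in the commit message.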