From: "tip-bot2 for Dietmar Eggemann"
Sender: tip-bot2@linutronix.de
Date: Sat, 15 Mar 2025 09:47:37 -0000
To: linux-tip-commits@vger.kernel.org
Cc: Hazem Mohamed Abuelfotoh, Dietmar Eggemann, "Peter Zijlstra (Intel)",
 Ingo Molnar, Vincent Guittot, Hagar Hemdan, Linus Torvalds,
 x86@kernel.org, linux-kernel@vger.kernel.org
Subject: [tip: sched/urgent] Revert "sched/core: Reduce cost of sched_move_task when config autogroup"
In-Reply-To: <20250314151345.275739-1-dietmar.eggemann@arm.com>
References: <20250314151345.275739-1-dietmar.eggemann@arm.com>
Message-ID: <174203205783.14745.12971717119665166640.tip-bot2@tip-bot2>

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     76f970ce51c80f625eb6ddbb24e9cb51b977b598
Gitweb:        https://git.kernel.org/tip/76f970ce51c80f625eb6ddbb24e9cb51b977b598
Author:        Dietmar Eggemann
AuthorDate:    Fri, 14 Mar 2025 16:13:45 +01:00
Committer:     Ingo Molnar
CommitterDate: Sat, 15 Mar 2025 10:34:27 +01:00

Revert "sched/core: Reduce cost of sched_move_task when config autogroup"

This reverts commit eff6c8ce8d4d7faef75f66614dd20bb50595d261.

Hazem reported a 30% drop in the UnixBench spawn test with commit
eff6c8ce8d4d ("sched/core: Reduce cost of sched_move_task when config
autogroup") on an m6g.xlarge AWS EC2 instance with 4 vCPUs and 16 GiB
RAM (aarch64, single-level MC sched domain):

  https://lkml.kernel.org/r/20250205151026.13061-1-hagarhem@amazon.com

There is an early bail from sched_move_task() if p->sched_task_group is
equal to p's 'cpu cgroup' (sched_get_task_group()). E.g. both are
pointing to taskgroup '/user.slice/user-1000.slice/session-1.scope'
(Ubuntu 22.04.5 LTS).

So in:

  do_exit()
    sched_autogroup_exit_task()
      sched_move_task()
        if sched_get_task_group(p) == p->sched_task_group
          return

        /* p is enqueued */
        dequeue_task()              \
        sched_change_group()        |
          task_change_group_fair()  |
            detach_task_cfs_rq()    |  (1)
            set_task_rq()           |
            attach_task_cfs_rq()    |
        enqueue_task()              /

(1) isn't called for p anymore.

Turns out that the regression is related to sgs->group_util in
group_is_overloaded() and group_has_capacity(). If (1) isn't called for
all the 'spawn' tasks, then sgs->group_util is ~900 and
sgs->group_capacity = 1024 (single-CPU sched domain), and this leads to
group_is_overloaded() returning true (2) and group_has_capacity() false
(3) much more often compared to the case when (1) is called. I.e.
there are many more cases of 'group_is_overloaded' and
'group_fully_busy' in the WF_FORK wakeup path
sched_balance_find_dst_cpu(), which then much more often returns a CPU
!= smp_processor_id() (5).

This isn't good for these extremely short-running tasks (FORK + EXIT)
and also involves calling sched_balance_find_dst_group_cpu()
unnecessarily (single-CPU sched domain).

Instead, if (1) is called for 'p->flags & PF_EXITING', then the path
(4),(6) is taken much more often.

  select_task_rq_fair(..., wake_flags = WF_FORK)

    cpu = smp_processor_id()

    new_cpu = sched_balance_find_dst_cpu(..., cpu, ...)

      group = sched_balance_find_dst_group(..., cpu)

        do {

          update_sg_wakeup_stats()

            sgs->group_type = group_classify()

              if group_is_overloaded()                  (2)
                return group_overloaded

              if !group_has_capacity()                  (3)
                return group_fully_busy

              return group_has_spare                    (4)

        } while group

        if local_sgs.group_type > idlest_sgs.group_type
          return idlest                                 (5)

        case group_has_spare:

          if local_sgs.idle_cpus >= idlest_sgs.idle_cpus
            return NULL                                 (6)

UnixBench tests './Run -c 4 spawn' on:

(a) VM AWS instance (m7gd.16xlarge) with v6.13 ('maxcpus=4 nr_cpus=4')
    and Ubuntu 22.04.5 LTS (aarch64).
    Shell & test run in '/user.slice/user-1000.slice/session-1.scope'.

      w/o patch   w/ patch
          21005      27120

(b) i7-13700K with tip/sched/core ('nosmt maxcpus=8 nr_cpus=8') and
    Ubuntu 22.04.5 LTS (x86_64). Shell & test run in '/A'.

      w/o patch   w/ patch
          67675      88806

    CONFIG_SCHED_AUTOGROUP=y & /proc/sys/kernel/sched_autogroup_enabled
    equal 0 or 1.
Reported-by: Hazem Mohamed Abuelfotoh
Signed-off-by: Dietmar Eggemann
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Ingo Molnar
Reviewed-by: Vincent Guittot
Tested-by: Hagar Hemdan
Cc: Linus Torvalds
Link: https://lore.kernel.org/r/20250314151345.275739-1-dietmar.eggemann@arm.com
---
 kernel/sched/core.c | 21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6718990..042351c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9016,7 +9016,7 @@ void sched_release_group(struct task_group *tg)
 	spin_unlock_irqrestore(&task_group_lock, flags);
 }
 
-static struct task_group *sched_get_task_group(struct task_struct *tsk)
+static void sched_change_group(struct task_struct *tsk)
 {
 	struct task_group *tg;
 
@@ -9028,13 +9028,7 @@ static struct task_group *sched_get_task_group(struct task_struct *tsk)
 	tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
 			  struct task_group, css);
 	tg = autogroup_task_group(tsk, tg);
-
-	return tg;
-}
-
-static void sched_change_group(struct task_struct *tsk, struct task_group *group)
-{
-	tsk->sched_task_group = group;
+	tsk->sched_task_group = tg;
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	if (tsk->sched_class->task_change_group)
@@ -9055,20 +9049,11 @@ void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 {
 	int queued, running, queue_flags =
 		DEQUEUE_SAVE | DEQUEUE_MOVE | DEQUEUE_NOCLOCK;
-	struct task_group *group;
 	struct rq *rq;
 
 	CLASS(task_rq_lock, rq_guard)(tsk);
 	rq = rq_guard.rq;
 
-	/*
-	 * Esp. with SCHED_AUTOGROUP enabled it is possible to get superfluous
-	 * group changes.
-	 */
-	group = sched_get_task_group(tsk);
-	if (group == tsk->sched_task_group)
-		return;
-
 	update_rq_clock(rq);
 
 	running = task_current_donor(rq, tsk);
@@ -9079,7 +9064,7 @@ void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 	if (running)
 		put_prev_task(rq, tsk);
 
-	sched_change_group(tsk, group);
+	sched_change_group(tsk);
 	if (!for_autogroup)
 		scx_cgroup_move_task(tsk);