From: Fernand Sieber
Subject: [PATCH] sched/core: Push tasks on force idle
Date: Fri, 28 Nov 2025 15:19:54 +0200
Message-ID: <20251128131954.324423-1-sieberf@amazon.com>
X-Mailer: git-send-email 2.43.0
X-Mailing-List: linux-kernel@vger.kernel.org

When a CPU enters force idle, it will:
  1) try to steal cookie-matching tasks from other CPUs
  2) do the newidle balance

If the stealing fails, we are out of options to get out of force idle
properly. The newidle balance might decide to pull other tasks, but they
won't necessarily be cookie-matching anyway.

Introduce a step in between where we try to push the runnable tasks that
are blocked in force idle to a more suitable CPU.

=== Testing setup ===

Similar setup as in:
https://lore.kernel.org/lkml/20251127202719.963766-1-sieberf@amazon.com

Testing is aimed at measuring perceived guest noise on a hypervisor
system in time-shared scenarios.

The setup is on a system where the load is nearing 100%, which should
allow no steal time. The system has 64 CPUs and 8 VMs, each VM using core
scheduling with 8 vCPUs per VM, time shared.

7 VMs are running stressors (`stress-ng --cpu 0`) while the last VM is
running the hwlat tracer with a width of 100ms, a period of 300ms, and a
threshold of 100us. Each VM runs a cookied non-vCPU VMM process that adds
a light level of noise, which forces some level of load balancing.

The test scenario is run 10x60s and the average noise is measured (we use
breaches scaled up to period/width to estimate noise).
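
As an illustration of the noise estimate mentioned in the setup above, here is a minimal sketch of one possible reading of "breaches scaled up to period/width". It assumes that the breaches are summed as a duration, that the hwlat tracer only observes `width` out of every `period`, and that the observed total is therefore multiplied by period/width before being divided by the run time. The function name, its parameters, and this interpretation are assumptions made for illustration; they are not taken from the patch or from the tracer.

```c
#include <assert.h>

/*
 * Hypothetical helper: estimate the fraction of run time lost to noise.
 *
 * breach_us  - summed duration of samples above the threshold (assumed input)
 * width_us   - hwlat sampling window per period (100ms in the setup)
 * period_us  - hwlat period (300ms in the setup)
 * runtime_us - total duration of the run (60s per run in the setup)
 */
static double toy_noise_fraction(double breach_us, double width_us,
				 double period_us, double runtime_us)
{
	assert(width_us > 0 && period_us >= width_us && runtime_us > 0);

	/* The tracer only samples width/period of the time, so scale up. */
	double scaled_us = breach_us * (period_us / width_us);

	return scaled_us / runtime_us;
}
```

With a width of 100ms and a period of 300ms the scale factor is 3, so under this reading a made-up 240ms of observed breaches over a 60s run would be reported as 720ms / 60s of noise.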
=== Testing results ===

Baseline noise: 1.20%
After patch noise: 0.66% (-45%)

Signed-off-by: Fernand Sieber
---
 kernel/sched/core.c  | 88 +++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/fair.c  | 11 ++++++
 kernel/sched/sched.h |  1 +
 3 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f754a60de848..852863eda8b8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6338,6 +6338,81 @@ static bool steal_cookie_task(int cpu, struct sched_domain *sd)
 	return false;
 }
 
+static bool forceidle_try_push_task(int this, int that)
+{
+	struct rq *dst = cpu_rq(that), *src = cpu_rq(this);
+	struct task_struct *p;
+	int cpu;
+	bool cookie_check = false;
+	bool success = false;
+	const struct sched_class *class;
+
+	if (!available_idle_cpu(that))
+		return false;
+
+	if (sched_core_enabled(dst)) {
+		for_each_cpu(cpu, cpu_smt_mask(that)) {
+			if (cpu == that)
+				continue;
+			if (!available_idle_cpu(cpu)) {
+				cookie_check = true;
+				break;
+			}
+		}
+	}
+
+	guard(irq)();
+	double_rq_lock(dst, src);
+
+	for_each_class(class) {
+		if (!class->select_next_task_push)
+			continue;
+
+		p = class->select_next_task_push(src, NULL);
+		while (p) {
+			if (!is_cpu_allowed(p, that))
+				goto next;
+
+			if (sched_task_is_throttled(p, that))
+				goto next;
+
+			if (cookie_check && dst->core->core_cookie != p->core_cookie)
+				goto next;
+
+			deactivate_task(src, p, 0);
+			set_task_cpu(p, that);
+			activate_task(dst, p, 0);
+			wakeup_preempt(dst, p, 0);
+
+			success = true;
+			break;
+
+next:
+			p = class->select_next_task_push(src, p);
+		}
+	}
+
+	double_rq_unlock(dst, src);
+	return success;
+}
+
+static bool forceidle_push_tasks(int cpu, struct sched_domain *sd)
+{
+	int i;
+
+	for_each_cpu_wrap(i, sched_domain_span(sd), cpu + 1) {
+		if (cpumask_test_cpu(i, cpu_smt_mask(cpu)))
+			continue;
+
+		if (need_resched())
+			break;
+
+		if (forceidle_try_push_task(cpu, i))
+			return true;
+	}
+	return false;
+}
+
 static void sched_core_balance(struct rq *rq)
 {
 	struct sched_domain *sd;
@@ -6349,11 +6424,20 @@ static void sched_core_balance(struct rq *rq)
 	raw_spin_rq_unlock_irq(rq);
 	for_each_domain(cpu, sd) {
 		if (need_resched())
-			break;
+			goto out;
 
 		if (steal_cookie_task(cpu, sd))
-			break;
+			goto out;
+	}
+	for_each_domain(cpu, sd) {
+		if (need_resched())
+			goto out;
+
+		if (forceidle_push_tasks(cpu, sd))
+			goto out;
 	}
+
+out:
 	raw_spin_rq_lock_irq(rq);
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7c86a67762d1..a50cec23458c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13113,6 +13113,16 @@ static int task_is_throttled_fair(struct task_struct *p, int cpu)
 #endif
 	return throttled_hierarchy(cfs_rq);
 }
+
+static struct task_struct *select_next_task_push_fair(struct rq *rq, struct task_struct *p)
+{
+	p = list_prepare_entry(p, &rq->cfs_tasks, se.group_node);
+	list_for_each_entry_continue_reverse(p, &rq->cfs_tasks, se.group_node) {
+		return p;
+	}
+	return NULL;
+}
+
 #else /* !CONFIG_SCHED_CORE: */
 static inline void task_tick_core(struct rq *rq, struct task_struct *curr) {}
 #endif /* !CONFIG_SCHED_CORE */
@@ -13674,6 +13684,7 @@ DEFINE_SCHED_CLASS(fair) = {
 
 #ifdef CONFIG_SCHED_CORE
 	.task_is_throttled	= task_is_throttled_fair,
+	.select_next_task_push	= select_next_task_push_fair,
 #endif
 
 #ifdef CONFIG_UCLAMP_TASK
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fdee101b1a66..bdcea16fca54 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2477,6 +2477,7 @@ struct sched_class {
 
 #ifdef CONFIG_SCHED_CORE
 	int (*task_is_throttled)(struct task_struct *p, int cpu);
+	struct task_struct* (*select_next_task_push)(struct rq *rq, struct task_struct *p);
 #endif
 };
 
-- 
2.43.0

Amazon Development Centre (South Africa) (Proprietary) Limited
29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
Registration Number: 2004 / 034463 / 07
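
To make the destination check in forceidle_try_push_task() easier to follow, here is a minimal userspace sketch of that rule alone: the target CPU must be idle, and if any of its SMT siblings is busy, the core's current cookie must match the task's cookie. The `toy_cpu` struct and `toy_can_push()` are hypothetical stand-ins invented for this illustration, not kernel types or APIs. The sketch leaves out the affinity check (is_cpu_allowed()), the throttling check, and all locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a CPU; not a kernel structure. */
struct toy_cpu {
	bool idle;			/* stands in for available_idle_cpu() */
	unsigned long core_cookie;	/* cookie currently selected on the core */
	const int *siblings;		/* indices of the other SMT threads */
	size_t nr_siblings;
};

/*
 * Return true if a task tagged with task_cookie could be pushed to CPU
 * `that`, following only the idle and cookie rules from the patch.
 */
static bool toy_can_push(const struct toy_cpu *cpus, int that,
			 unsigned long task_cookie)
{
	assert(cpus != NULL && that >= 0);

	const struct toy_cpu *dst = &cpus[that];
	bool cookie_check = false;

	/* The destination itself has to be idle. */
	if (!dst->idle)
		return false;

	/* A busy sibling means the core cookie constrains what may run. */
	for (size_t i = 0; i < dst->nr_siblings; i++) {
		if (!cpus[dst->siblings[i]].idle) {
			cookie_check = true;
			break;
		}
	}

	return !cookie_check || dst->core_cookie == task_cookie;
}
```

In this model a fully idle core accepts a task with any cookie, while a core with one busy thread only accepts a task whose cookie equals the core's cookie. That corresponds to the `cookie_check` flag in the patch, which is only set when a sibling of the destination is not idle.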