From: Ajay Kaher
To: stable@vger.kernel.org, gregkh@linuxfoundation.org
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org, ajay.kaher@broadcom.com,
 alexey.makhalov@broadcom.com, yin.ding@broadcom.com,
 tapas.kundu@broadcom.com, Chris Mason
Subject: [PATCH v6.1 4/4] sched/fair: Proportional newidle balance
Date: Wed, 3 Dec 2025 11:25:52 +0000
Message-Id: <20251203112552.1738424-5-ajay.kaher@broadcom.com>
In-Reply-To: <20251203112552.1738424-1-ajay.kaher@broadcom.com>
References: <20251203112552.1738424-1-ajay.kaher@broadcom.com>

From: Peter Zijlstra

commit 33cf66d88306663d16e4759e9d24766b0aaa2e17 upstream.

Add a randomized algorithm that runs newidle balancing proportional to
its success rate.
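The following is a minimal user-space sketch in C that makes the idea concrete. It mirrors the arithmetic of update_newidle_stats() and the NI_RANDOM dice roll from the fair.c changes in this patch. It is an illustrative model, not kernel code. struct ni_stats, ni_update_stats(), ni_should_run(), ni_simulate() and xorshift32() are hypothetical names, and xorshift32() stands in for prandom_u32_state(). The model also makes two assumptions of its own: a balance that runs pulls a task with a fixed probability, and every attempt has a non-zero domain_cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space mirror of the three new struct sched_domain fields. */
struct ni_stats {
	unsigned int newidle_call;
	unsigned int newidle_success;
	unsigned int newidle_ratio;	/* recent success rate, in 1/1024 units */
};

/* Same arithmetic as update_newidle_stats(): halve both counters every 1024 calls. */
static void ni_update_stats(struct ni_stats *s, unsigned int success)
{
	s->newidle_call++;
	s->newidle_success += success;

	if (s->newidle_call >= 1024) {
		s->newidle_ratio = s->newidle_success;
		s->newidle_call /= 2;
		s->newidle_success /= 2;
	}
}

/*
 * The NI_RANDOM gate: roll a 1024-sided die and run only if the roll does
 * not exceed 1 + newidle_ratio. On a run, *weight is 1024 / (1 + ratio),
 * rounded, so a success is scaled up by about the inverse of the
 * probability of having run at all.
 */
static bool ni_should_run(struct ni_stats *s, uint32_t rnd, unsigned int *weight)
{
	uint32_t d1k = rnd % 1024;
	unsigned int w = 1 + s->newidle_ratio;

	if (d1k > w) {
		ni_update_stats(s, 0);	/* a skipped attempt counts as a failure */
		return false;
	}
	*weight = (1024 + w / 2) / w;
	return true;
}

/* Stand-in for prandom_u32_state(): plain xorshift32; *state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Simulate `attempts` newidle opportunities in one domain where a balance
 * that does run pulls a task with probability p_permille / 1000 (an
 * assumption of this model). Returns the fraction of opportunities that run.
 */
static double ni_simulate(unsigned int p_permille, unsigned int attempts, uint32_t seed)
{
	/* The sd_init() defaults: every domain starts at a 50% success rate. */
	struct ni_stats s = { .newidle_call = 512, .newidle_success = 256,
			      .newidle_ratio = 512 };
	uint32_t rng = seed ? seed : 1;
	unsigned int runs = 0, weight = 0;

	assert(attempts > 0);
	for (unsigned int i = 0; i < attempts; i++) {
		if (!ni_should_run(&s, xorshift32(&rng), &weight))
			continue;
		runs++;
		bool pulled = xorshift32(&rng) % 1000 < p_permille;
		/* As update_newidle_cost(sd, cost, weight * !!pulled_task), cost != 0. */
		ni_update_stats(&s, weight * pulled);
	}
	return (double)runs / attempts;
}
```

For example, ni_simulate(100, 1u << 20, 42) models a domain where 10% of balances pull a task. In this model, the fraction of attempts that actually run rises with p_permille and roughly tracks it. The tracking is only approximate, because weight is rounded to an integer.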
This improves schbench significantly:

 6.18-rc4:                 2.22 Mrps/s
 6.18-rc4+revert:          2.04 Mrps/s
 6.18-rc4+revert+random:   2.18 Mrps/s

Conversely, per Adam Li this affects SpecJBB slightly, reducing it by 1%:

 6.17:                -6%
 6.17+revert:          0%
 6.17+revert+random:  -1%

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Dietmar Eggemann
Tested-by: Dietmar Eggemann
Tested-by: Chris Mason
Link: https://lkml.kernel.org/r/6825c50d-7fa7-45d8-9b81-c6e7e25738e2@meta.com
Link: https://patch.msgid.link/20251107161739.770122091@infradead.org
[ Ajay: Modified to apply on v6.1 ]
Signed-off-by: Ajay Kaher
---
 include/linux/sched/topology.h |  3 +++
 kernel/sched/core.c            |  3 +++
 kernel/sched/fair.c            | 44 ++++++++++++++++++++++++++++++----
 kernel/sched/features.h        |  5 ++++
 kernel/sched/sched.h           |  7 ++++++
 kernel/sched/topology.c        |  6 +++++
 6 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 816df6cc4..caeceec3e 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -106,6 +106,9 @@ struct sched_domain {
 	unsigned int nr_balance_failed; /* initialise to 0 */
 
 	/* idle_balance() stats */
+	unsigned int newidle_call;
+	unsigned int newidle_success;
+	unsigned int newidle_ratio;
 	u64 max_newidle_lb_cost;
 	unsigned long last_decay_max_lb_cost;
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9b01fdceb..09ffe1b96 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -112,6 +112,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_se_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp);
 
 DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+DEFINE_PER_CPU(struct rnd_state, sched_rnd_state);
 
 #ifdef CONFIG_SCHED_DEBUG
 /*
@@ -9632,6 +9633,8 @@ void __init sched_init_smp(void)
 {
 	sched_init_numa(NUMA_NO_NODE);
 
+	prandom_init_once(&sched_rnd_state);
+
 	/*
 	 * There's no userspace yet to cause hotplug operations; hence all the
 	 * CPU masks are stable and all blatant races in the below code cannot
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2f296e2af..9f7c9083e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10935,11 +10935,27 @@ void update_max_interval(void)
 	max_load_balance_interval = HZ*num_online_cpus()/10;
 }
 
-static inline bool update_newidle_cost(struct sched_domain *sd, u64 cost)
+static inline void update_newidle_stats(struct sched_domain *sd, unsigned int success)
+{
+	sd->newidle_call++;
+	sd->newidle_success += success;
+
+	if (sd->newidle_call >= 1024) {
+		sd->newidle_ratio = sd->newidle_success;
+		sd->newidle_call /= 2;
+		sd->newidle_success /= 2;
+	}
+}
+
+static inline bool
+update_newidle_cost(struct sched_domain *sd, u64 cost, unsigned int success)
 {
 	unsigned long next_decay = sd->last_decay_max_lb_cost + HZ;
 	unsigned long now = jiffies;
 
+	if (cost)
+		update_newidle_stats(sd, success);
+
 	if (cost > sd->max_newidle_lb_cost) {
 		/*
 		 * Track max cost of a domain to make sure to not delay the
@@ -10987,7 +11003,7 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 		 * Decay the newidle max times here because this is a regular
 		 * visit to all the domains.
 		 */
-		need_decay = update_newidle_cost(sd, 0);
+		need_decay = update_newidle_cost(sd, 0, 0);
 		max_cost += sd->max_newidle_lb_cost;
 
 		/*
@@ -11621,6 +11637,22 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 			break;
 
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
+			unsigned int weight = 1;
+
+			if (sched_feat(NI_RANDOM)) {
+				/*
+				 * Throw a 1k sided dice; and only run
+				 * newidle_balance according to the success
+				 * rate.
+				 */
+				u32 d1k = sched_rng() % 1024;
+				weight = 1 + sd->newidle_ratio;
+				if (d1k > weight) {
+					update_newidle_stats(sd, 0);
+					continue;
+				}
+				weight = (1024 + weight/2) / weight;
+			}
 
 			pulled_task = load_balance(this_cpu, this_rq,
 						   sd, CPU_NEWLY_IDLE,
@@ -11628,10 +11660,14 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 
 			t1 = sched_clock_cpu(this_cpu);
 			domain_cost = t1 - t0;
-			update_newidle_cost(sd, domain_cost);
-
 			curr_cost += domain_cost;
 			t0 = t1;
+
+			/*
+			 * Track max cost of a domain to make sure to not delay the
+			 * next wakeup on the CPU.
+			 */
+			update_newidle_cost(sd, domain_cost, weight * !!pulled_task);
 		}
 
 		/*
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index ee7f23c76..0115183ee 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -99,5 +99,10 @@ SCHED_FEAT(UTIL_EST_FASTUP, true)
 
 SCHED_FEAT(LATENCY_WARN, false)
 
+/*
+ * Do newidle balancing proportional to its success rate using randomization.
+ */
+SCHED_FEAT(NI_RANDOM, true)
+
 SCHED_FEAT(ALT_PERIOD, true)
 SCHED_FEAT(BASE_SLICE, true)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 95afded0b..6f66a9b1a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -5,6 +5,7 @@
 #ifndef _KERNEL_SCHED_SCHED_H
 #define _KERNEL_SCHED_SCHED_H
 
+#include
 #include
 #include
 #include
@@ -1190,6 +1191,12 @@ static inline bool is_migration_disabled(struct task_struct *p)
 }
 
 DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
+DECLARE_PER_CPU(struct rnd_state, sched_rnd_state);
+
+static inline u32 sched_rng(void)
+{
+	return prandom_u32_state(this_cpu_ptr(&sched_rnd_state));
+}
 
 #define cpu_rq(cpu)		(&per_cpu(runqueues, (cpu)))
 #define this_rq()		this_cpu_ptr(&runqueues)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index d404b5d2d..9d6ec8311 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1584,6 +1584,12 @@ sd_init(struct sched_domain_topology_level *tl,
 
 		.last_balance		= jiffies,
 		.balance_interval	= sd_weight,
+
+		/* 50% success rate */
+		.newidle_call		= 512,
+		.newidle_success	= 256,
+		.newidle_ratio		= 512,
+
 		.max_newidle_lb_cost	= 0,
 		.last_decay_max_lb_cost	= jiffies,
 		.child			= child,
-- 
2.40.4
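The fair.c hunks above can be read as a form of inverse-probability weighting. What follows is a rough expected-value argument. It is our own reading of the code, not part of the patch, and it ignores integer rounding. Notation: $r$ is `newidle_ratio`, $S$ is `newidle_success`, $d$ is the `d1k` roll, $w$ is the value of `weight` after the roll, and $p$ is the assumed probability that a `load_balance()` that does run pulls a task. The argument also assumes `domain_cost` is non-zero, so that every attempt reaches `update_newidle_stats()`.

```latex
\begin{aligned}
\Pr[\text{run}] &= \Pr[d \le 1 + r] = \frac{r+2}{1024},
  \qquad d \sim \mathcal{U}\{0,\dots,1023\},\; r \le 1022 \\
w &= \left\lfloor \frac{1024 + \lfloor (r+1)/2 \rfloor}{r+1} \right\rfloor
  \approx \frac{1024}{r+1} \\
\mathbb{E}[\Delta S] &= \Pr[\text{run}] \cdot p \cdot w
  \approx \frac{r+2}{1024} \cdot p \cdot \frac{1024}{r+1} \approx p
  \quad \text{(per attempt)} \\
r_{k+1} &\approx \frac{r_k}{2} + 512\,p
  \;\Longrightarrow\; r^{\ast} = 1024\,p,
  \qquad \Pr[\text{run}] \approx \frac{r^{\ast}}{1024} = p
\end{aligned}
```

The last line reflects how update_newidle_stats() works: $r$ is sampled whenever `newidle_call` reaches 1024, after which both counters are halved, so each new sample covers the halved old total plus 512 further attempts. Under these assumptions the steady-state `newidle_ratio` is about $1024p$, and a domain is balanced on roughly a fraction $p$ of newidle opportunities. The `sd_init()` defaults (512, 256, 512) start every domain at the $p = 1/2$ point.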