From: Luo Gengkun
Subject: [PATCH RFC] sched/fair: Introduce half-idle retry mechanism in NI_RANDOM
Date: Mon, 18 May 2026 12:43:46 +0000
Message-ID: <20260518124346.4010277-1-luogengkun2@huawei.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-kernel@vger.kernel.org

When NI_RANDOM is enabled, some sched domains may be skipped. If
sched_balance_newidle() then fails to pull any task, a substantial
amount of idle time may be left unused.

This patch introduces a "half-idle" retry mechanism that uses the
remaining idle time to safely re-evaluate the domains that were skipped
stochastically in the first round.

The retry logic is restricted by the following constraints:

1. Half Idle Time Threshold: A retry is triggered only when the
   remaining idle time is at least avg_idle / 2. This ensures that
   there is enough idle time to absorb the retry cost.

2. Dynamic Retry Window: half_idle is initialized with remain_idle
   rather than avg_idle / 2. When remain_idle is abundant, using
   avg_idle / 2 would shrink the window and skip most of the domains.

3. LLC Only: Only domains with SD_SHARE_LLC set are retried. This
   ensures that the newly triggered load balance remains lightweight.

4. Skipped Domains Only: The retry re-evaluates only the domains that
   were skipped by NI_RANDOM during the first pass. 'try_bits' records
   the domains already balanced in the first pass so that the retry can
   skip them.

Additionally, because this is a retry path, the weight is kept at 1 to
ensure long-term balancing accuracy.

Performance Evaluation:

Tested with schbench against a v7.1-rc3 baseline. Each case was run 10
times and the average is reported.

[schbench]
-------------------------------------------------------------------------
              |         baseline         |           patched
loads         | rps(M/s) 99th Wakeup Lat | rps(M/s)       99th Wakeup Lat
-------------------------------------------------------------------------
threads=2     | 1.03       7.14          | 1.05 (+1.94%)    7.08 (-0.84%)
threads=4     | 2.68       7.97          | 2.82 (+5.22%)    7.82 (-1.88%)
threads=8     | 3.04      12.91          | 3.23 (+6.25%)   12.38 (-4.11%)
threads=16    | 3.06      23.27          | 3.30 (+7.84%)   21.90 (-5.89%)
threads=32    | 2.95      46.50          | 3.25 (+10.17%)  43.28 (-6.92%)
threads=64    | 2.81      94.92          | 3.14 (+11.74%)  88.34 (-6.93%)
threads=128   | 2.71     195.77          | 2.97 (+9.59%)  183.07 (-6.49%)
threads=256   | 2.60     403.34          | 2.77 (+6.54%)  385.30 (-4.47%)
-------------------------------------------------------------------------

Signed-off-by: Luo Gengkun
---
 kernel/sched/fair.c | 31 +++++++++++++++++++++++++++----
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ebec186f982..5d1c73a41c37 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13185,6 +13185,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 	int this_cpu = this_rq->cpu;
 	int continue_balancing = 1;
 	u64 t0, t1, curr_cost = 0;
+	u64 idx, half_idle = 0, try_bits = 0;
 	struct sched_domain *sd;
 	int pulled_task = 0;
 
@@ -13240,18 +13241,28 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 	rq_modified_begin(this_rq, &fair_sched_class);
 	raw_spin_rq_unlock(this_rq);
 
+retry:
+	idx = 0;
 	for_each_domain(this_cpu, sd) {
-		u64 domain_cost;
+		u64 domain_cost, next_cost = curr_cost + sd->max_newidle_lb_cost;
 
-		update_next_balance(sd, &next_balance);
+		if (!half_idle)
+			update_next_balance(sd, &next_balance);
 
-		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
+		if (this_rq->avg_idle < next_cost) {
+			continue_balancing = 0;
 			break;
+		}
+
+		if (try_bits & (1UL << ++idx) ||
+		    (half_idle && (!(sd->flags & SD_SHARE_LLC) || next_cost >= half_idle)))
+			continue;
 
 		if (sd->flags & SD_BALANCE_NEWIDLE) {
 			unsigned int weight = 1;
 
-			if (sched_feat(NI_RANDOM) && sd->newidle_ratio < 1024) {
+			if (sched_feat(NI_RANDOM) && sd->newidle_ratio < 1024 &&
+			    !half_idle) {
 				/*
 				 * Throw a 1k sided dice; and only run
 				 * newidle_balance according to the success
@@ -13266,6 +13277,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 				weight = (1024 + weight/2) / weight;
 			}
 
+			try_bits |= (1UL << idx);
 			pulled_task = sched_balance_rq(this_cpu, this_rq,
 						   sd, CPU_NEWLY_IDLE,
 						   &continue_balancing);
@@ -13290,6 +13302,17 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 			break;
 	}
 
+	if (sched_feat(NI_RANDOM) && !half_idle &&
+	    !(pulled_task || !continue_balancing)) {
+		s64 remain_idle = this_rq->avg_idle - curr_cost;
+
+		if (remain_idle > 0 &&
+		    remain_idle >= this_rq->avg_idle / 2) {
+			half_idle = remain_idle;
+			goto retry;
+		}
+	}
+
 	raw_spin_rq_lock(this_rq);
 
 	if (curr_cost > this_rq->max_idle_balance_cost)
-- 
2.34.1
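
For readers who want to check the retry conditions outside the kernel,
the gating arithmetic in this patch can be sketched as a small userspace
model. This is only an illustration under stated assumptions:
half_idle_window() and retry_visits_domain() are hypothetical helper
names that do not exist in kernel/sched/fair.c, the kernel types are
replaced by <stdint.h> types, and the SD_SHARE_LLC flag test is reduced
to a plain boolean. The second helper models only the skip test; in the
patch a visited domain is still balanced only if SD_BALANCE_NEWIDLE is
set.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): mirrors the check made
 * after the first pass. Returns the retry window (the patch's
 * half_idle) or 0 when no retry should be attempted.
 */
static uint64_t half_idle_window(uint64_t avg_idle, uint64_t curr_cost)
{
	int64_t remain_idle = (int64_t)avg_idle - (int64_t)curr_cost;

	/* Retry only if at least half of avg_idle is still unused. */
	if (remain_idle > 0 && (uint64_t)remain_idle >= avg_idle / 2)
		return (uint64_t)remain_idle;

	return 0;
}

/*
 * Hypothetical helper (not part of the patch): mirrors the skip test
 * applied to each domain during the retry pass. idx is the 1-based
 * position of the domain in the walk, as produced by ++idx in the
 * patch; shares_llc stands in for (sd->flags & SD_SHARE_LLC).
 */
static bool retry_visits_domain(uint64_t try_bits, unsigned int idx,
				bool shares_llc, uint64_t next_cost,
				uint64_t half_idle)
{
	if (try_bits & (UINT64_C(1) << idx))
		return false;	/* already balanced in the first pass */
	if (!shares_llc)
		return false;	/* the retry is limited to LLC domains */
	if (next_cost >= half_idle)
		return false;	/* would not fit in the retry window */

	return true;
}
```

With avg_idle = 1000 and curr_cost = 200, the model leaves a window of
800, so a retry is allowed. With curr_cost = 600 only 400 remains, which
is below avg_idle / 2, so no retry happens.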