From: zihan zhou <15645113830zzh@gmail.com>
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com
Cc: linux-kernel@vger.kernel.org, zihan zhou <15645113830zzh@gmail.com>, yaozhenguo , yaowenchao
Subject: [PATCH V2] sched: Forward deadline for early tick
Date: Wed, 25 Dec 2024 13:40:31 +0800
Message-Id: <20241225054030.77765-1-15645113830zzh@gmail.com>

Due to tick error, the EEVDF scheduler exhibits unexpected behavior.

For example, on a machine with sysctl_sched_base_slice=3ms and
CONFIG_HZ=1000, a tick should fire every 1ms. A sched_entity (se) with the
default weight of 1024 should therefore reach its deadline on the third
tick. In practice, however, the tick often arrives slightly earlier than
expected. When that happens, the se has not quite reached its deadline at
the third tick and is only considered to have reached it at the next tick,
so it runs 1ms longer than intended.

    vruntime + sysctl_sched_base_slice = deadline
    |-----------|-----------|-----------|-----------|
         1ms         1ms         1ms         1ms
               ^           ^           ^           ^
             tick1       tick2       tick3       tick4 (nearly 4ms)

Here is a simple example of this scenario, with sysctl_sched_base_slice=3ms
and CONFIG_HZ=1000 on an Intel(R) Xeon(R) Platinum 8338C CPU @ 2.60GHz,
where "while :; do :; done &" is run twice, giving pids 72112 and 72113.
According to the design of EEVDF, each should run for 3ms at a time, but
they often run for 4ms (rows marked "<"):

           time    cpu  task name       wait time  sch delay   run time
                        [tid/pid]          (msec)     (msec)     (msec)
--------------- ------  --------------  ---------  ---------  ---------
   56696.846136 [0001]  perf[72368]         0.000      0.000      0.000
   56696.849378 [0001]  bash[72112]         0.000      0.000      3.241
   56696.852379 [0001]  bash[72113]         0.000      0.000      3.000
   56696.852964 [0001]  sleep[72369]        0.000      6.261      0.584
   56696.856378 [0001]  bash[72112]         3.585      0.000      3.414
   56696.860379 [0001]  bash[72113]         3.999      0.000      4.000 <
   56696.864379 [0001]  bash[72112]         4.000      0.000      4.000 <
   56696.868377 [0001]  bash[72113]         4.000      0.000      3.997 <
   56696.871378 [0001]  bash[72112]         3.997      0.000      3.000
   56696.874377 [0001]  bash[72113]         3.000      0.000      2.999
   56696.877377 [0001]  bash[72112]         2.999      0.000      2.999
   56696.881377 [0001]  bash[72113]         2.999      0.000      3.999 <

There are two sources of tick error: clockevent precision and
CONFIG_IRQ_TIME_ACCOUNTING. With CONFIG_IRQ_TIME_ACCOUNTING, every tick
accounts for less than 1ms of task runtime; even without it, clockevent
precision often makes the interval between ticks slightly shorter than
1ms. The system above does not enable CONFIG_IRQ_TIME_ACCOUNTING, yet the
task still often runs for more than 3ms.

To solve this problem, we add a sched feature, FORWARD_DEADLINE, which
forwards the deadline when appropriate. When vruntime is very close to the
deadline and the task is ineligible, we consider that the task has
consumed its request and should be rescheduled. The tolerance is set to
min(vslice/128, tick/2).

With FORWARD_DEADLINE enabled, each task runs for 3ms at a time, as EEVDF
intends:

           time    cpu  task name       wait time  sch delay   run time
                        [tid/pid]          (msec)     (msec)     (msec)
--------------- ------  --------------  ---------  ---------  ---------
     207.207293 [0001]  bash[1699]          3.998      0.000      3.000
     207.210294 [0001]  bash[1694]          3.000      0.000      3.000
     207.213300 [0001]  bash[1699]          3.000      0.000      3.006
     207.216293 [0001]  bash[1694]          3.006      0.000      2.993
     207.219293 [0001]  bash[1699]          2.993      0.000      3.000
     207.222293 [0001]  bash[1694]          3.000      0.000      2.999
     207.225293 [0001]  bash[1699]          2.999      0.000      3.000
     207.228293 [0001]  bash[1694]          3.000      0.000      3.000
     207.231293 [0001]  bash[1699]          3.000      0.000      3.000
     207.234293 [0001]  bash[1694]          3.000      0.000      2.999
     207.237292 [0001]  bash[1699]          2.999      0.000      2.999
     207.240293 [0001]  bash[1694]          2.999      0.000      3.000
     207.243293 [0001]  bash[1699]          3.000      0.000      3.000

Signed-off-by: zihan zhou <15645113830zzh@gmail.com>
Signed-off-by: yaozhenguo
Signed-off-by: yaowenchao
---
v2:
1. Only forward the deadline for an ineligible task.
2. For an ineligible task, vd_i = vd_i + r_i / w_i; for an eligible task,
   vd_i = ve_i + r_i / w_i, which is the same as before.
3. tolerance = min(vslice>>7, TICK_NSEC/2): capping the tolerance at half
   a tick prevents the scheduling error from growing when vslice is large
   relative to the tick.
---
 kernel/sched/fair.c     | 42 +++++++++++++++++++++++++++++++++++++----
 kernel/sched/features.h |  7 +++++++
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2d16c8545c71..9cc52f632bb1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1006,8 +1006,10 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se);
  */
 static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	if ((s64)(se->vruntime - se->deadline) < 0)
-		return false;
+
+	u64 vslice;
+	u64 tolerance = 0;
+	u64 next_deadline;
 
 	/*
 	 * For EEVDF the virtual time slope is determined by w_i (iow.
@@ -1016,11 +1018,43 @@ static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	 */
 	if (!se->custom_slice)
 		se->slice = sysctl_sched_base_slice;
+	vslice = calc_delta_fair(se->slice, se);
+
+	/*
+	 * vd_i = ve_i + r_i / w_i
+	 */
+	next_deadline = se->vruntime + vslice;
+
+	if (sched_feat(FORWARD_DEADLINE))
+		tolerance = min(vslice>>7, TICK_NSEC/2);
+
+	if ((s64)(se->vruntime + tolerance - se->deadline) < 0)
+		return false;
 
 	/*
-	 * EEVDF: vd_i = ve_i + r_i / w_i
+	 * when se->vruntime + tolerance - se->deadline >= 0
+	 * but se->vruntime - se->deadline < 0,
+	 * there is two case: if entity is eligible?
+	 * if entity is not eligible, we don't need wait deadline, because
+	 * eevdf don't guarantee
+	 * an ineligible entity can exec its request time in one go.
+	 * but when entity eligible, just let it run, which is the
+	 * same processing logic as before.
 	 */
-	se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
+
+	if (sched_feat(FORWARD_DEADLINE) && (s64)(se->vruntime - se->deadline) < 0) {
+
+		if (entity_eligible(cfs_rq, se))
+			return false;
+
+		/*
+		 * vd_i = vd_i + r_i / w_i
+		 */
+		next_deadline = se->deadline + vslice;
+	}
+
+
+	se->deadline = next_deadline;
 
 	/*
 	 * The task has consumed its request, reschedule.
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 290874079f60..5c74deec7209 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -24,6 +24,13 @@ SCHED_FEAT(RUN_TO_PARITY, true)
  */
 SCHED_FEAT(PREEMPT_SHORT, true)
 
+/*
+ * For some cases where the tick is faster than expected,
+ * move the deadline forward
+ */
+SCHED_FEAT(FORWARD_DEADLINE, true)
+
+
 /*
  * Prefer to schedule the task we woke last (assuming it failed
  * wakeup-preemption), since its likely going to consume data we
-- 
2.39.3 (Apple Git-146)