From: zihan zhou <15645113830zzh@gmail.com>
To: 15645113830zzh@gmail.com
Cc: bsegall@google.com, dietmar.eggemann@arm.com, juri.lelli@redhat.com,
	linux-kernel@vger.kernel.org, mgorman@suse.de, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org, vincent.guittot@linaro.org,
	vschneid@redhat.com
Subject: [PATCH V1 1/4] sched: Add Kconfig option for predict load
Date: Fri, 21 Feb 2025 16:47:45 +0800
Message-Id: <20250221084744.31803-1-15645113830zzh@gmail.com>
In-Reply-To: <20250221084437.31284-1-15645113830zzh@gmail.com>
References: <20250221084437.31284-1-15645113830zzh@gmail.com>

Load prediction makes the scheduler logic more complex and consumes
more resources. While it is still unclear whether the feature is
worthwhile, it should be possible to turn it off.
Signed-off-by: zihan zhou <15645113830zzh@gmail.com>
---
 init/Kconfig      | 10 ++++++++++
 lib/Kconfig.debug | 12 ++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index d0d021b3fa3b..83cff5d63ce2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -573,6 +573,16 @@ config HAVE_SCHED_AVG_IRQ
 	depends on IRQ_TIME_ACCOUNTING || PARAVIRT_TIME_ACCOUNTING
 	depends on SMP
 
+config SCHED_PREDICT_LOAD
+	bool "Predict the load of sched entities"
+	depends on SMP
+	help
+	  Select this option to enable load prediction: the load an se
+	  will have at dequeue time is predicted according to its load
+	  at enqueue time.
+
+	  Say N if unsure.
+
 config SCHED_HW_PRESSURE
 	bool
 	default y if ARM && ARM_CPU_TOPOLOGY
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1af972a92d06..01b23677d003 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1310,6 +1310,18 @@ config SCHED_DEBUG
 	  that can help debug the scheduler. The runtime overhead of this
 	  option is minimal.
 
+config SCHED_PREDICT_LOAD_DEBUG
+	bool "Debug info for SCHED_PREDICT_LOAD"
+	depends on SMP && SCHED_PREDICT_LOAD && SCHED_DEBUG
+	default y
+	help
+	  If you say Y here, a /proc/$pid/predict_load file will provide
+	  per-task-se information that can help debug SCHED_PREDICT_LOAD.
+	  The /sys/kernel/debug/sched/debug file also shows information
+	  for group se, though less than for task se.
+
+	  Say N if unsure.
+
 config SCHED_INFO
 	bool
 	default n
-- 
2.33.0

From: zihan zhou <15645113830zzh@gmail.com>
To: 15645113830zzh@gmail.com
Cc: bsegall@google.com, dietmar.eggemann@arm.com, juri.lelli@redhat.com,
	linux-kernel@vger.kernel.org, mgorman@suse.de, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org, vincent.guittot@linaro.org,
	vschneid@redhat.com
Subject: [PATCH V1 2/4] sched: Do predict load
Date: Fri, 21 Feb 2025 16:50:52 +0800
Message-Id: <20250221085051.32468-1-15645113830zzh@gmail.com>
In-Reply-To: <20250221084437.31284-1-15645113830zzh@gmail.com>
References: <20250221084437.31284-1-15645113830zzh@gmail.com>

Patch 2/4 is the core of this series.

The main struct is predict_load_data; every se has one except
init_task, both group se and task se.

Load prediction is implemented mainly by record_predict_load_data()
and se_do_predict_load(), which run at dequeue and enqueue. At
enqueue, se_do_predict_load() records load_normalized_when_enqueue
and tries to obtain predict_load_normalized. At dequeue,
record_predict_load_data() uses record_load_array to record the
correspondence between enqueue load and dequeue load.

The per-bucket update uses the Boyer–Moore majority vote algorithm; a
prediction is considered reliable only when the confidence is greater
than CONFIDENCE_THRESHOLD (4), which is an experimental value.
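The vote/confidence update described above can be sketched in plain C.
This is a userspace illustration mirroring the patch's record_load
fields and the stated CONFIDENCE_THRESHOLD of 4, not the in-tree code:

```c
#include <assert.h>
#include <stdint.h>

#define CONFIDENCE_THRESHOLD 4

/* Per-bucket majority-vote state, as in the patch's record_load. */
struct record_load {
	uint8_t load_after_offset;	/* current majority dequeue bucket */
	uint8_t confidence;		/* vote counter, saturates at 255 */
};

/* Dequeue path: vote for the dequeue bucket actually observed. */
static void record_vote(struct record_load *rl, uint8_t observed)
{
	if (rl->load_after_offset == observed) {
		if (rl->confidence < 255)
			rl->confidence++;	/* majority confirmed */
	} else if (rl->confidence <= 1) {
		rl->load_after_offset = observed;	/* candidate replaced */
		rl->confidence = 1;
	} else {
		rl->confidence--;	/* vote against the current majority */
	}
}

/* Enqueue path: predict only once the majority is firm enough. */
static int predict(const struct record_load *rl, uint8_t *out)
{
	if (rl->confidence < CONFIDENCE_THRESHOLD)
		return 0;	/* no reliable prediction yet */
	*out = rl->load_after_offset;
	return 1;
}
```

Boyer–Moore voting keeps only one candidate and one counter per
bucket, which is why the whole scheme stays O(1) in time and space
and shrugs off occasional outlier dequeues.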
As explained in patch 0/4, the load is normalized to 0~1024. From a
machine learning perspective, normalization works better, and it also
helps handle some special cases.

All operations are O(1), and load prediction did not affect
performance, at least in my tests.

TODO: there are still many shortcomings. I hope to have the
opportunity to establish a mapping from the exec filename to
predict_load_data and add exec statistics, which would help the
kernel distinguish whether the executable is, say, sysbench or
cyclictest.

Signed-off-by: zihan zhou <15645113830zzh@gmail.com>
---
 include/linux/sched.h      |  46 ++++++++++++
 include/linux/sched/task.h |   2 +-
 init/init_task.c           |   3 +
 kernel/fork.c              |   6 +-
 kernel/sched/core.c        |  15 +++-
 kernel/sched/fair.c        | 148 ++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h       |   2 +
 7 files changed, 217 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9632e3318e0d..b8576bca5a5d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -491,6 +491,42 @@ struct sched_avg {
 	unsigned int			util_est;
 } ____cacheline_aligned;
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+
+#define NO_PREDICT_LOAD		ULONG_MAX
+#define CONFIDENCE_THRESHOLD	4
+
+#define PREDICT_LOAD_MAX	1024
+#define LOAD_GRAN_SHIFT		4
+#define LOAD_GRAN		(1 << LOAD_GRAN_SHIFT)
+
+struct record_load {
+	u8 load_after_offset;
+	u8 confidence;
+};
+
+extern struct kmem_cache *predict_load_data_cachep;
+
+struct predict_load_data {
+	/* 1024 is handled specially, so the index does not need +1. */
+	struct record_load record_load_array[PREDICT_LOAD_MAX >> LOAD_GRAN_SHIFT];
+	unsigned long load_normalized_when_enqueue;
+	unsigned long predict_load_normalized;
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+	unsigned long predict_count;
+	unsigned long predict_correct_count;
+	unsigned long no_predict_count;
+#endif
+	bool in_predict_no_preempt;
+};
+
+#endif
+
 /*
  * The UTIL_AVG_UNCHANGED flag is used to synchronize util_est with util_avg
  * updates. When a task is dequeued, its util_est should not be updated if its
@@ -587,9 +623,19 @@ struct sched_entity {
 	 * collide with read-mostly values above.
 	 */
 	struct sched_avg		avg;
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	struct predict_load_data	*pldp;
+#endif
 #endif
 };
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+unsigned long get_predict_load(struct sched_entity *se);
+void set_in_predict_no_preempt(struct sched_entity *se, bool in_predict_no_preempt);
+bool predict_error_should_resched(struct sched_entity *se);
+#endif
+
 struct sched_rt_entity {
 	struct list_head		run_list;
 	unsigned long			timeout;
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 0f2aeb37bbb0..c5d435b9fce9 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -62,7 +62,7 @@ extern int lockdep_tasklist_lock_is_held(void);
 extern asmlinkage void schedule_tail(struct task_struct *prev);
 extern void init_idle(struct task_struct *idle, int cpu);
 
-extern int sched_fork(unsigned long clone_flags, struct task_struct *p);
+extern int sched_fork(unsigned long clone_flags, struct task_struct *p, int node);
 extern int sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs);
 extern void sched_cancel_fork(struct task_struct *p);
 extern void sched_post_fork(struct task_struct *p);
diff --git a/init/init_task.c b/init/init_task.c
index e557f622bd90..c0ea11adfdab 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -89,6 +89,9 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
 	},
 	.se		= {
 		.group_node	= LIST_HEAD_INIT(init_task.se.group_node),
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+		.pldp		= NULL,
+#endif
 	},
 	.rt		= {
 		.run_list	= LIST_HEAD_INIT(init_task.rt.run_list),
diff --git a/kernel/fork.c b/kernel/fork.c
index 735405a9c5f3..b8ba621d2a87 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -182,6 +182,10 @@ static inline struct task_struct *alloc_task_struct_node(int node)
 
 static inline void free_task_struct(struct task_struct *tsk)
 {
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	if (tsk->se.pldp != NULL && predict_load_data_cachep != NULL)
+		kmem_cache_free(predict_load_data_cachep, tsk->se.pldp);
+#endif
 	kmem_cache_free(task_struct_cachep, tsk);
 }
 
@@ -2370,7 +2374,7 @@ __latent_entropy struct task_struct *copy_process(
 #endif
 
 	/* Perform scheduler related setup. Assign this task to a CPU. */
-	retval = sched_fork(clone_flags, p);
+	retval = sched_fork(clone_flags, p, node);
 	if (retval)
 		goto bad_fork_cleanup_policy;
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 165c90ba64ea..905d53503a35 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4712,7 +4712,7 @@ late_initcall(sched_core_sysctl_init);
 /*
  * fork()/clone()-time setup:
  */
-int sched_fork(unsigned long clone_flags, struct task_struct *p)
+int sched_fork(unsigned long clone_flags, struct task_struct *p, int node)
 {
 	__sched_fork(clone_flags, p);
 	/*
@@ -4768,7 +4768,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	}
 
 	init_entity_runnable_average(&p->se);
-
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	init_entity_predict_load_data(&p->se, node);
+#endif
 
 #ifdef CONFIG_SCHED_INFO
 	if (likely(sched_info_on()))
@@ -8472,11 +8474,20 @@ LIST_HEAD(task_groups);
 static struct kmem_cache *task_group_cache __ro_after_init;
 #endif
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+struct kmem_cache *predict_load_data_cachep;
+#endif
+
 void __init sched_init(void)
 {
 	unsigned long ptr = 0;
 	int i;
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	predict_load_data_cachep = kmem_cache_create("predict_load_data",
+		sizeof(struct predict_load_data), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
+#endif
+
 	/* Make sure the linker didn't screw up */
#ifdef CONFIG_SMP
 	BUG_ON(!sched_class_above(&stop_sched_class, &dl_sched_class));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 857808da23d8..d22d47419f79 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1068,6 +1068,23 @@ void init_entity_runnable_average(struct sched_entity *se)
 	/* when this task is enqueued, it will contribute to its cfs_rq's load_avg */
 }
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+void init_entity_predict_load_data(struct sched_entity *se, int node)
+{
+	struct predict_load_data *pldp;
+
+	if (predict_load_data_cachep == NULL) {
+		se->pldp = NULL;
+		return;
+	}
+
+	pldp = kmem_cache_alloc_node(predict_load_data_cachep, GFP_KERNEL, node);
+	if (pldp == NULL) {
+		se->pldp = NULL;
+		return;
+	}
+
+	memset(pldp, 0, sizeof(*pldp));
+	pldp->predict_load_normalized = NO_PREDICT_LOAD;
+	se->pldp = pldp;
+}
+#endif
+
 /*
  * With new tasks being created, their initial util_avgs are extrapolated
  * based on the cfs_rq's current util_avg:
@@ -4701,6 +4718,114 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	trace_pelt_cfs_tp(cfs_rq);
 }
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+
+static unsigned long get_load_after_offset(unsigned long load)
+{
+	if (load >= PREDICT_LOAD_MAX)
+		load = PREDICT_LOAD_MAX - 1;
+	return load >> LOAD_GRAN_SHIFT;
+}
+
+/*
+ * The weight of the se should not affect the load here, because
+ * predict_load_data is designed to record load from 0 to 1024.
+ * The load is therefore normalized; it can be restored as needed
+ * by restore_normalized_load().
+ */
+static unsigned long get_normalized_load(struct sched_entity *se)
+{
+	unsigned long normalized_load, load = se->avg.load_avg;
+
+	/* Prevent arithmetic overflow. */
+	WARN_ON_ONCE(load > 4000000);
+	if (se_weight(se) == PREDICT_LOAD_MAX)
+		return load;
+	normalized_load = div_u64(load * PREDICT_LOAD_MAX, se_weight(se));
+	return min(normalized_load, PREDICT_LOAD_MAX);
+}
+
+static unsigned long restore_normalized_load(unsigned long normalized_load, unsigned long weight)
+{
+	/* Prevent arithmetic overflow. */
+	WARN_ON_ONCE(normalized_load > 4000000);
+	if (weight == PREDICT_LOAD_MAX)
+		return normalized_load;
+	return div_u64(normalized_load * weight, PREDICT_LOAD_MAX);
+}
+
+/* Returns the predicted load of @se, or NO_PREDICT_LOAD if none. */
+unsigned long get_predict_load(struct sched_entity *se)
+{
+	struct predict_load_data *pldp = se->pldp;
+	unsigned long predict_load_normalized;
+
+	if (pldp == NULL)
+		return NO_PREDICT_LOAD;
+
+	predict_load_normalized = pldp->predict_load_normalized;
+	if (predict_load_normalized == NO_PREDICT_LOAD)
+		return NO_PREDICT_LOAD;
+
+	return restore_normalized_load(predict_load_normalized + LOAD_GRAN,
+				       se_weight(se));
+}
+
+static void record_predict_load_data(struct sched_entity *se)
+{
+	struct predict_load_data *pldp = se->pldp;
+	struct record_load *rla;
+	unsigned long load_normalized_when_dequeue, load_normalized_when_enqueue;
+	unsigned long index, val;
+
+	if (pldp == NULL)
+		return;
+
+	rla = pldp->record_load_array;
+	load_normalized_when_dequeue = get_normalized_load(se);
+	load_normalized_when_enqueue = pldp->load_normalized_when_enqueue;
+	index = get_load_after_offset(load_normalized_when_enqueue);
+	val = get_load_after_offset(load_normalized_when_dequeue);
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+	if (pldp->predict_load_normalized != NO_PREDICT_LOAD) {
+		pldp->predict_count++;
+		if (load_normalized_when_dequeue >= pldp->predict_load_normalized &&
+		    load_normalized_when_dequeue <= pldp->predict_load_normalized + LOAD_GRAN)
+			pldp->predict_correct_count++;
+	} else {
+		pldp->no_predict_count++;
+	}
+#endif
+
+	if (rla[index].load_after_offset == val) {
+		if (rla[index].confidence < 255)
+			rla[index].confidence++;
+	} else {
+		if (rla[index].confidence <= 1) {
+			rla[index].load_after_offset = val;
+			rla[index].confidence = 1;
+		} else {
+			rla[index].confidence--;
+		}
+	}
+}
+
+static void se_do_predict_load(struct sched_entity *se)
+{
+	struct predict_load_data *pldp = se->pldp;
+	struct record_load *rla;
+	unsigned long index, predict_load_normalized = NO_PREDICT_LOAD;
+
+	if (pldp == NULL)
+		return;
+
+	rla = pldp->record_load_array;
+	pldp->load_normalized_when_enqueue = get_normalized_load(se);
+	index = get_load_after_offset(pldp->load_normalized_when_enqueue);
+
+	if (rla[index].confidence >= CONFIDENCE_THRESHOLD)
+		predict_load_normalized = rla[index].load_after_offset << LOAD_GRAN_SHIFT;
+	pldp->predict_load_normalized = predict_load_normalized;
+}
+
+#endif
+
 /**
  * detach_entity_load_avg - detach this entity from its cfs_rq load avg
  * @cfs_rq: cfs_rq to detach from
@@ -5336,6 +5461,11 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 */
 	update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH);
 	se_update_runnable(se);
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	se_do_predict_load(se);
+#endif
+
 	/*
 	 * XXX update_load_avg() above will have attached us to the pelt sum;
 	 * but update_cfs_group() here will re-adjust the weight and have to
@@ -5493,6 +5623,11 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	update_load_avg(cfs_rq, se, action);
 	se_update_runnable(se);
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	record_predict_load_data(se);
+	set_in_predict_no_preempt(se, false);
+#endif
+
 	update_stats_dequeue_fair(cfs_rq, se, flags);
 
 	update_entity_lag(cfs_rq, se);
@@ -5628,6 +5763,9 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 	}
 	SCHED_WARN_ON(cfs_rq->curr != prev);
 	cfs_rq->curr = NULL;
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	set_in_predict_no_preempt(prev, false);
+#endif
 }
 
 static void
@@ -13345,8 +13483,13 @@ void free_fair_sched_group(struct task_group *tg)
 	for_each_possible_cpu(i) {
 		if (tg->cfs_rq)
 			kfree(tg->cfs_rq[i]);
-		if (tg->se)
+		if (tg->se) {
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+			if (tg->se[i] && tg->se[i]->pldp && predict_load_data_cachep)
+				kmem_cache_free(predict_load_data_cachep, tg->se[i]->pldp);
+#endif
 			kfree(tg->se[i]);
+		}
 	}
 
 	kfree(tg->cfs_rq);
@@ -13384,6 +13527,9 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 		init_cfs_rq(cfs_rq);
 		init_tg_cfs_entry(tg, cfs_rq, se, i, parent->se[i]);
 		init_entity_runnable_average(se);
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+		init_entity_predict_load_data(se, cpu_to_node(i));
+#endif
 	}
 
 	return 1;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ab16d3d0e51c..cf1e98bf83d3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2733,6 +2733,8 @@ extern void init_dl_entity(struct sched_dl_entity *dl_se);
 extern unsigned long to_ratio(u64 period, u64 runtime);
 
 extern void init_entity_runnable_average(struct sched_entity *se);
+extern void init_entity_predict_load_data(struct sched_entity *se, int node);
+
 extern void post_init_entity_util_avg(struct task_struct *p);
 
 #ifdef CONFIG_NO_HZ_FULL
-- 
2.33.0
From: zihan zhou <15645113830zzh@gmail.com>
To: 15645113830zzh@gmail.com
Cc: bsegall@google.com, dietmar.eggemann@arm.com, juri.lelli@redhat.com,
	linux-kernel@vger.kernel.org, mgorman@suse.de, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org, vincent.guittot@linaro.org,
	vschneid@redhat.com
Subject: [PATCH V1 3/4] sched: Add debug for predict load
Date: Fri, 21 Feb 2025 16:55:08 +0800
Message-Id: <20250221085507.33329-1-15645113830zzh@gmail.com>
In-Reply-To: <20250221084437.31284-1-15645113830zzh@gmail.com>
References: <20250221084437.31284-1-15645113830zzh@gmail.com>

Debugging information about load prediction can be read from
/proc/$pid/predict_load (task se) and /sys/kernel/debug/sched/debug
(group se).

An example:

[root@test sched]# cat /proc/1/predict_load
se.pldp->predict_correct_count : 7699
se.pldp->predict_count         : 7820
se.pldp->no_predict_count      : 263
enqueue_load_normalized: 0, dequeue_load_normalized: 0, confidence:255
enqueue_load_normalized: 16, dequeue_load_normalized: 16, confidence:42
enqueue_load_normalized: 32, dequeue_load_normalized: 32, confidence:14
enqueue_load_normalized: 48, dequeue_load_normalized: 48, confidence:5
enqueue_load_normalized: 64, dequeue_load_normalized: 64, confidence:8
enqueue_load_normalized: 80, dequeue_load_normalized: 80, confidence:9
enqueue_load_normalized: 96, dequeue_load_normalized: 96, confidence:3
enqueue_load_normalized: 112, dequeue_load_normalized: 128, confidence:2

/sys/kernel/debug/sched/debug only has predict_count.
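The bucket boundaries in the output above (0, 16, 32, ...) follow
from the LOAD_GRAN_SHIFT of 4 defined in patch 2/4: normalized load
is clamped to [0, 1024) and grouped into 16-wide buckets. A small
userspace sketch of that mapping, assuming the constants from patch
2/4 (this mirrors get_load_after_offset(), it is not the in-tree
code):

```c
#include <assert.h>

#define PREDICT_LOAD_MAX 1024
#define LOAD_GRAN_SHIFT 4

/* Clamp a normalized load into [0, PREDICT_LOAD_MAX) and bucket it. */
static unsigned long load_to_bucket(unsigned long load)
{
	if (load >= PREDICT_LOAD_MAX)
		load = PREDICT_LOAD_MAX - 1;
	return load >> LOAD_GRAN_SHIFT;
}

/* The debug file prints each bucket scaled back to a load value,
 * which is why the printed columns move in steps of 16. */
static unsigned long bucket_to_load(unsigned long bucket)
{
	return bucket << LOAD_GRAN_SHIFT;
}
```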
Signed-off-by: zihan zhou <15645113830zzh@gmail.com>
---
 fs/proc/base.c              | 39 +++++++++++++++++++++++++++++++++++++
 include/linux/sched/debug.h |  5 +++++
 kernel/sched/debug.c        | 39 +++++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index cd89e956c322..e66173ce941b 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1541,6 +1541,39 @@ static const struct file_operations proc_pid_sched_operations = {
 
 #endif
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+
+static int predict_load_show(struct seq_file *m, void *v)
+{
+	struct inode *inode = m->private;
+	struct pid_namespace *ns = proc_pid_ns(inode->i_sb);
+	struct task_struct *p;
+
+	p = get_proc_task(inode);
+	if (!p)
+		return -ESRCH;
+	proc_predict_load_show_task(p, ns, m);
+
+	put_task_struct(p);
+
+	return 0;
+}
+
+static int predict_load_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, predict_load_show, inode);
+}
+
+static const struct file_operations proc_pid_predict_load_operations = {
+	.open		= predict_load_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+#endif
+
 #ifdef CONFIG_SCHED_AUTOGROUP
 /*
  * Print out autogroup related information:
@@ -3334,6 +3367,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_SCHED_DEBUG
 	REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
 #endif
+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+	REG("predict_load", S_IRUGO, proc_pid_predict_load_operations),
+#endif
 #ifdef CONFIG_SCHED_AUTOGROUP
 	REG("autogroup", S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
 #endif
@@ -3684,6 +3720,9 @@ static const struct pid_entry tid_base_stuff[] = {
 	ONE("limits", S_IRUGO, proc_pid_limits),
 #ifdef CONFIG_SCHED_DEBUG
 	REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
+#endif
+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+	REG("predict_load", S_IRUGO, proc_pid_predict_load_operations),
 #endif
 	NOD("comm",
	    S_IFREG|S_IRUGO|S_IWUSR,
	    &proc_tid_comm_inode_operations,

diff --git a/include/linux/sched/debug.h b/include/linux/sched/debug.h
index b5035afa2396..5b2bab60afae 100644
--- a/include/linux/sched/debug.h
+++ b/include/linux/sched/debug.h
@@ -40,6 +40,11 @@ struct seq_file;
 extern void proc_sched_show_task(struct task_struct *p,
				 struct pid_namespace *ns, struct seq_file *m);
 extern void proc_sched_set_task(struct task_struct *p);
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+extern void proc_predict_load_show_task(struct task_struct *p,
+		struct pid_namespace *ns, struct seq_file *m);
+#endif
 #endif

 /* Attach to any functions which should be ignored in wchan output. */

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index ef047add7f9e..619b96333f6a 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -690,6 +690,12 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 	P(se->avg.runnable_avg);
 #endif

+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+	P(se->pldp->predict_correct_count);
+	P(se->pldp->predict_count);
+	P(se->pldp->no_predict_count);
+#endif
+
 #undef PN_SCHEDSTAT
 #undef PN
 #undef P_SCHEDSTAT
@@ -1160,6 +1166,39 @@ static void sched_show_numa(struct task_struct *p, struct seq_file *m)
 #endif
 }

+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+
+void proc_predict_load_show_task(struct task_struct *p, struct pid_namespace *ns,
+		struct seq_file *m)
+{
+	struct predict_load_data *pldp = p->se.pldp;
+
+	if (pldp == NULL)
+		return;
+	struct record_load *rla = pldp->record_load_array;
+
+	unsigned long index, enqueue_load_normalized, dequeue_load_normalized, confidence;
+
+	P(se.pldp->predict_correct_count);
+	P(se.pldp->predict_count);
+	P(se.pldp->no_predict_count);
+
+	for (index = 0; index < (PREDICT_LOAD_MAX >> LOAD_GRAN_SHIFT); index++) {
+		enqueue_load_normalized = index << LOAD_GRAN_SHIFT;
+		dequeue_load_normalized = rla[index].load_after_offset << LOAD_GRAN_SHIFT;
+		confidence = rla[index].confidence;
+		if (confidence) {
+			SEQ_printf(m, "enqueue_load_normalized: %ld, ", enqueue_load_normalized);
+			SEQ_printf(m, "dequeue_load_normalized: %ld, ", dequeue_load_normalized);
+			SEQ_printf(m, "confidence:%ld\n", confidence);
+		}
+	}
+}
+
+#endif
+
 void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
			   struct seq_file *m)
{
--
2.33.0

From nobody Tue Dec 16 16:08:25 2025
From: zihan zhou <15645113830zzh@gmail.com>
To: 15645113830zzh@gmail.com
Cc: bsegall@google.com, dietmar.eggemann@arm.com, juri.lelli@redhat.com,
    linux-kernel@vger.kernel.org, mgorman@suse.de, mingo@redhat.com,
    peterz@infradead.org, rostedt@goodmis.org, vincent.guittot@linaro.org,
    vschneid@redhat.com
Subject: [PATCH V1 4/4] sched: add feature PREDICT_NO_PREEMPT
Date: Fri, 21 Feb 2025 16:57:26 +0800
Message-Id: <20250221085725.33943-1-15645113830zzh@gmail.com>
In-Reply-To: <20250221084437.31284-1-15645113830zzh@gmail.com>
References: <20250221084437.31284-1-15645113830zzh@gmail.com>

Patch 4/4 is independent; it is an attempt to make use of the predicted
load. I observed that some tasks were almost finished but got preempted,
and so had to spend more time running. Such a task can be identified by
load prediction: its load at enqueue is roughly equal to its load at
dequeue.
If the se about to be preempted is such a task, and pse would otherwise
have preempted it, PREDICT_NO_PREEMPT prevents pse from preempting and
compensates pse via set_next_buddy(). This protects tasks that are about
to finish. If the prediction later turns out to be wrong, we resched the
se. In effect, this is a way of automatically adjusting a task toward
SCHED_BATCH behavior.

The performance of hackbench improves a little:

./hackbench -g 8 -l 10000
orig: 2.063s
with PREDICT_NO_PREEMPT: 1.833s

./hackbench -g 16 -l 10000
orig: 3.658s
with PREDICT_NO_PREEMPT: 3.479s

The average latency of cyclictest (run alongside hackbench) increases,
but the maximum latency is no different:

orig:
I:1000 C: 181852 Min: 4 Act: 59 Avg: 212 Max: 21838
with PREDICT_NO_PREEMPT:
I:1000 C: 181564 Min: 8 Act: 80 Avg: 457 Max: 22989

I think this kind of scheduling protection cannot increase the
scheduling delay by more than 1ms (every tick checks whether the
prediction is correct), and it can improve overall throughput, which
seems acceptable. Of course, this patch is still experimental, and
suggestions are welcome. (Perhaps predicting util would work better?)

In addition, I found that even with a high-load hackbench running in the
background, the terminal remained very smooth.
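For illustration only, the "will end soon" test described above can be
modeled in a few lines of user-space Python. NO_PREDICT_LOAD and the
three load values here are stand-ins for the kernel's pldp fields, not
the actual implementation:

```python
# Hypothetical model of the predict_se_will_end_soon() decision described
# above. The sentinel and load values are stand-ins for kernel state.
NO_PREDICT_LOAD = -1

def se_will_end_soon(predicted, enqueue_load, current_load):
    """The task is treated as near completion when its predicted dequeue
    load is no higher than its enqueue load and its current load is still
    below the prediction (so preempting it would waste its remaining run)."""
    if predicted == NO_PREDICT_LOAD:
        return False            # no prediction available
    if predicted > enqueue_load:
        return False            # load expected to grow: not ending soon
    return current_load < predicted

# A task predicted to finish at the load it started with, still below it:
print(se_will_end_soon(64, 64, 40))   # True
# A task whose predicted load exceeds its enqueue load keeps normal preemption:
print(se_will_end_soon(96, 64, 40))   # False
```

When this returns true, the patch skips preemption and compensates the
waking task with set_next_buddy(); a wrong prediction is caught at the
next tick and triggers a resched.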
Signed-off-by: zihan zhou <15645113830zzh@gmail.com>
---
 kernel/sched/fair.c     | 92 +++++++++++++++++++++++++++++++++++++++--
 kernel/sched/features.h |  4 ++
 2 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d22d47419f79..21bf58a494ba 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1258,6 +1258,9 @@ static void update_curr(struct cfs_rq *cfs_rq)

 	curr->vruntime += calc_delta_fair(delta_exec, curr);
 	resched = update_deadline(cfs_rq, curr);
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	resched |= predict_error_should_resched(curr);
+#endif
 	update_min_vruntime(cfs_rq);

 	if (entity_is_task(curr)) {
@@ -8884,6 +8887,60 @@ static void set_next_buddy(struct sched_entity *se)
 	}
 }

+#ifdef CONFIG_SCHED_PREDICT_LOAD
+static bool predict_se_will_end_soon(struct sched_entity *se)
+{
+	struct predict_load_data *pldp = se->pldp;
+
+	if (pldp == NULL)
+		return false;
+	if (pldp->predict_load_normalized == NO_PREDICT_LOAD)
+		return false;
+	if (pldp->predict_load_normalized > pldp->load_normalized_when_enqueue)
+		return false;
+	if (se->avg.load_avg >= get_predict_load(se))
+		return false;
+	return true;
+}
+
+void set_in_predict_no_preempt(struct sched_entity *se, bool in_predict_no_preempt)
+{
+	struct predict_load_data *pldp = se->pldp;
+
+	if (pldp == NULL)
+		return;
+	pldp->in_predict_no_preempt = in_predict_no_preempt;
+}
+
+static bool get_in_predict_no_preempt(struct sched_entity *se)
+{
+	struct predict_load_data *pldp = se->pldp;
+
+	if (pldp == NULL)
+		return false;
+	return pldp->in_predict_no_preempt;
+}
+
+static bool predict_right(struct sched_entity *se)
+{
+	struct predict_load_data *pldp = se->pldp;
+
+	if (pldp == NULL)
+		return false;
+	if (pldp->predict_load_normalized == NO_PREDICT_LOAD)
+		return false;
+	if (se->avg.load_avg <= get_predict_load(se))
+		return true;
+	return false;
+}
+
+bool predict_error_should_resched(struct sched_entity *se)
+{
+	return get_in_predict_no_preempt(se) && !predict_right(se);
+}
+
+#endif
+
 /*
  * Preempt the current task with a newly woken task if needed:
  */
@@ -8893,6 +8950,10 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	struct sched_entity *se = &donor->se, *pse = &p->se;
 	struct cfs_rq *cfs_rq = task_cfs_rq(donor);
 	int cse_is_idle, pse_is_idle;
+	bool if_best_se;
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	bool predict_no_preempt = false;
+#endif

 	if (unlikely(se == pse))
 		return;
@@ -8954,6 +9015,21 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	if (unlikely(!normal_policy(p->policy)))
 		return;

+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	/*
+	 * If we predict that se will end soon, it's better not to preempt it,
+	 * but wait for it to exit by itself. This is undoubtedly a grievance for
+	 * pse, so if pse should preempt se, we will give it some compensation.
+	 */
+	if (sched_feat(PREDICT_NO_PREEMPT)) {
+		if (predict_error_should_resched(se))
+			goto preempt;
+
+		if (predict_se_will_end_soon(se))
+			predict_no_preempt = true;
+	}
+#endif
+
 	cfs_rq = cfs_rq_of(se);
 	update_curr(cfs_rq);
 	/*
@@ -8966,10 +9042,18 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	if (do_preempt_short(cfs_rq, pse, se))
 		cancel_protect_slice(se);

-	/*
-	 * If @p has become the most eligible task, force preemption.
-	 */
-	if (pick_eevdf(cfs_rq) == pse)
+	if_best_se = (pick_eevdf(cfs_rq) == pse);
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	if (predict_no_preempt) {
+		if (if_best_se && !pse->sched_delayed) {
+			set_next_buddy(pse);
+			set_in_predict_no_preempt(se, true);
+			return;
+		}
+	}
+#endif
+	if (if_best_se)
 		goto preempt;

 	return;

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 3c12d9f93331..8a78108af835 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -121,3 +121,7 @@ SCHED_FEAT(WA_BIAS, true)
 SCHED_FEAT(UTIL_EST, true)

 SCHED_FEAT(LATENCY_WARN, false)
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+SCHED_FEAT(PREDICT_NO_PREEMPT, true)
+#endif
--
2.33.0