From: zihan zhou <15645113830zzh@gmail.com>
To: 15645113830zzh@gmail.com
Cc: bsegall@google.com, dietmar.eggemann@arm.com, juri.lelli@redhat.com,
	linux-kernel@vger.kernel.org, mgorman@suse.de, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org, vincent.guittot@linaro.org,
	vschneid@redhat.com
Subject: [PATCH V1 1/4] sched: Add Kconfig option for predict load
Date: Fri, 21 Feb 2025 16:47:45 +0800
Message-Id: <20250221084744.31803-1-15645113830zzh@gmail.com>
In-Reply-To: <20250221084437.31284-1-15645113830zzh@gmail.com>
References: <20250221084437.31284-1-15645113830zzh@gmail.com>

Load prediction makes the scheduler logic more complex and consumes
more resources. While it is still unclear whether the feature is
worthwhile, it should be possible to turn it off.
Signed-off-by: zihan zhou <15645113830zzh@gmail.com>
---
 init/Kconfig      | 10 ++++++++++
 lib/Kconfig.debug | 12 ++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index d0d021b3fa3b..83cff5d63ce2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -573,6 +573,16 @@ config HAVE_SCHED_AVG_IRQ
 	depends on IRQ_TIME_ACCOUNTING || PARAVIRT_TIME_ACCOUNTING
 	depends on SMP
 
+config SCHED_PREDICT_LOAD
+	bool "Predict the load of sched entities"
+	depends on SMP
+	help
+	  Select this option to enable load prediction: the load an se
+	  will have at dequeue time is predicted according to its load
+	  at enqueue time.
+
+	  Say N if unsure.
+
 config SCHED_HW_PRESSURE
 	bool
 	default y if ARM && ARM_CPU_TOPOLOGY
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 1af972a92d06..01b23677d003 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1310,6 +1310,18 @@ config SCHED_DEBUG
 	  that can help debug the scheduler. The runtime overhead of this
 	  option is minimal.
 
+config SCHED_PREDICT_LOAD_DEBUG
+	bool "Debug info for SCHED_PREDICT_LOAD"
+	depends on SMP && SCHED_PREDICT_LOAD && SCHED_DEBUG
+	default y
+	help
+	  If you say Y here, a /proc/$pid/predict_load file will provide
+	  per-task-se information that can help debug SCHED_PREDICT_LOAD.
+	  The /sys/kernel/debug/sched/debug file also shows information
+	  for group se, though less than for task se.
+
+	  Say N if unsure.
+
 config SCHED_INFO
 	bool
 	default n
-- 
2.33.0

From: zihan zhou <15645113830zzh@gmail.com>
To: 15645113830zzh@gmail.com
Cc: bsegall@google.com, dietmar.eggemann@arm.com, juri.lelli@redhat.com,
	linux-kernel@vger.kernel.org, mgorman@suse.de, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org, vincent.guittot@linaro.org,
	vschneid@redhat.com
Subject: [PATCH V1 2/4] sched: Do predict load
Date: Fri, 21 Feb 2025 16:50:52 +0800
Message-Id: <20250221085051.32468-1-15645113830zzh@gmail.com>
In-Reply-To: <20250221084437.31284-1-15645113830zzh@gmail.com>
References: <20250221084437.31284-1-15645113830zzh@gmail.com>

Patch 2/4 is the core of this series.

The main struct is predict_load_data; every se has one except
init_task, both group se and task se.

Load prediction is implemented mainly by record_predict_load_data()
and se_do_predict_load(), which run at dequeue and enqueue. At
enqueue, se_do_predict_load() records load_normalized_when_enqueue
and tries to obtain predict_load_normalized. At dequeue,
record_predict_load_data() uses record_load_array to record the
correspondence between enqueue load and dequeue load.

The per-bucket update uses the Boyer–Moore majority vote algorithm; a
prediction is considered reliable only when the confidence is greater
than CONFIDENCE_THRESHOLD (4), which is an experimental value.
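The vote/confidence update described above can be sketched in plain C.
This is a userspace illustration mirroring the patch's record_load
fields and the stated CONFIDENCE_THRESHOLD of 4, not the in-tree code:

```c
#include <assert.h>
#include <stdint.h>

#define CONFIDENCE_THRESHOLD 4

/* Per-bucket majority-vote state, as in the patch's record_load. */
struct record_load {
	uint8_t load_after_offset;	/* current majority dequeue bucket */
	uint8_t confidence;		/* vote counter, saturates at 255 */
};

/* Dequeue path: vote for the dequeue bucket actually observed. */
static void record_vote(struct record_load *rl, uint8_t observed)
{
	if (rl->load_after_offset == observed) {
		if (rl->confidence < 255)
			rl->confidence++;	/* majority confirmed */
	} else if (rl->confidence <= 1) {
		rl->load_after_offset = observed;	/* candidate replaced */
		rl->confidence = 1;
	} else {
		rl->confidence--;	/* vote against the current majority */
	}
}

/* Enqueue path: predict only once the majority is firm enough. */
static int predict(const struct record_load *rl, uint8_t *out)
{
	if (rl->confidence < CONFIDENCE_THRESHOLD)
		return 0;	/* no reliable prediction yet */
	*out = rl->load_after_offset;
	return 1;
}
```

Boyer–Moore voting keeps only one candidate and one counter per
bucket, which is why the whole scheme stays O(1) in time and space
and shrugs off occasional outlier dequeues.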
As explained in patch 0/4, the load is normalized to 0~1024. From a
machine learning perspective, normalization works better, and it also
helps handle some special cases.

All operations are O(1), and load prediction did not affect
performance, at least in my tests.

TODO: there are still many shortcomings. I hope to have the
opportunity to establish a mapping from the exec filename to
predict_load_data and add exec statistics, which would help the
kernel distinguish whether the executable is, say, sysbench or
cyclictest.

Signed-off-by: zihan zhou <15645113830zzh@gmail.com>
---
 include/linux/sched.h      |  46 ++++++++++++
 include/linux/sched/task.h |   2 +-
 init/init_task.c           |   3 +
 kernel/fork.c              |   6 +-
 kernel/sched/core.c        |  15 +++-
 kernel/sched/fair.c        | 148 ++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h       |   2 +
 7 files changed, 217 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9632e3318e0d..b8576bca5a5d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -491,6 +491,42 @@ struct sched_avg {
 	unsigned int			util_est;
 } ____cacheline_aligned;
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+
+#define NO_PREDICT_LOAD		ULONG_MAX
+#define CONFIDENCE_THRESHOLD	4
+
+#define PREDICT_LOAD_MAX	1024
+#define LOAD_GRAN_SHIFT		4
+#define LOAD_GRAN		(1 << LOAD_GRAN_SHIFT)
+
+struct record_load {
+	u8 load_after_offset;
+	u8 confidence;
+};
+
+extern struct kmem_cache *predict_load_data_cachep;
+
+struct predict_load_data {
+	/* 1024 is handled specially, so the index does not need +1. */
+	struct record_load record_load_array[PREDICT_LOAD_MAX >> LOAD_GRAN_SHIFT];
+	unsigned long load_normalized_when_enqueue;
+	unsigned long predict_load_normalized;
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+	unsigned long predict_count;
+	unsigned long predict_correct_count;
+	unsigned long no_predict_count;
+#endif
+	bool in_predict_no_preempt;
+};
+
+#endif
+
 /*
  * The UTIL_AVG_UNCHANGED flag is used to synchronize util_est with util_avg
  * updates. When a task is dequeued, its util_est should not be updated if its
@@ -587,9 +623,19 @@ struct sched_entity {
 	 * collide with read-mostly values above.
 	 */
 	struct sched_avg		avg;
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	struct predict_load_data	*pldp;
+#endif
 #endif
 };
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+unsigned long get_predict_load(struct sched_entity *se);
+void set_in_predict_no_preempt(struct sched_entity *se, bool in_predict_no_preempt);
+bool predict_error_should_resched(struct sched_entity *se);
+#endif
+
 struct sched_rt_entity {
 	struct list_head		run_list;
 	unsigned long			timeout;
diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 0f2aeb37bbb0..c5d435b9fce9 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -62,7 +62,7 @@ extern int lockdep_tasklist_lock_is_held(void);
 extern asmlinkage void schedule_tail(struct task_struct *prev);
 extern void init_idle(struct task_struct *idle, int cpu);
 
-extern int sched_fork(unsigned long clone_flags, struct task_struct *p);
+extern int sched_fork(unsigned long clone_flags, struct task_struct *p, int node);
 extern int sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs);
 extern void sched_cancel_fork(struct task_struct *p);
 extern void sched_post_fork(struct task_struct *p);
diff --git a/init/init_task.c b/init/init_task.c
index e557f622bd90..c0ea11adfdab 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -89,6 +89,9 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
 	},
 	.se		= {
 		.group_node	= LIST_HEAD_INIT(init_task.se.group_node),
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+		.pldp		= NULL,
+#endif
 	},
 	.rt		= {
 		.run_list	= LIST_HEAD_INIT(init_task.rt.run_list),
diff --git a/kernel/fork.c b/kernel/fork.c
index 735405a9c5f3..b8ba621d2a87 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -182,6 +182,10 @@ static inline struct task_struct *alloc_task_struct_node(int node)
 
 static inline void free_task_struct(struct task_struct *tsk)
 {
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	if (tsk->se.pldp != NULL && predict_load_data_cachep != NULL)
+		kmem_cache_free(predict_load_data_cachep, tsk->se.pldp);
+#endif
 	kmem_cache_free(task_struct_cachep, tsk);
 }
 
@@ -2370,7 +2374,7 @@ __latent_entropy struct task_struct *copy_process(
 #endif
 
 	/* Perform scheduler related setup. Assign this task to a CPU. */
-	retval = sched_fork(clone_flags, p);
+	retval = sched_fork(clone_flags, p, node);
 	if (retval)
 		goto bad_fork_cleanup_policy;
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 165c90ba64ea..905d53503a35 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4712,7 +4712,7 @@ late_initcall(sched_core_sysctl_init);
 /*
  * fork()/clone()-time setup:
  */
-int sched_fork(unsigned long clone_flags, struct task_struct *p)
+int sched_fork(unsigned long clone_flags, struct task_struct *p, int node)
 {
 	__sched_fork(clone_flags, p);
 	/*
@@ -4768,7 +4768,9 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
 	}
 
 	init_entity_runnable_average(&p->se);
-
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	init_entity_predict_load_data(&p->se, node);
+#endif
 
 #ifdef CONFIG_SCHED_INFO
 	if (likely(sched_info_on()))
@@ -8472,11 +8474,20 @@ LIST_HEAD(task_groups);
 static struct kmem_cache *task_group_cache __ro_after_init;
 #endif
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+struct kmem_cache *predict_load_data_cachep;
+#endif
+
 void __init sched_init(void)
 {
 	unsigned long ptr = 0;
 	int i;
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	predict_load_data_cachep = kmem_cache_create("predict_load_data",
+		sizeof(struct predict_load_data), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
+#endif
+
 	/* Make sure the linker didn't screw up */
#ifdef CONFIG_SMP
 	BUG_ON(!sched_class_above(&stop_sched_class, &dl_sched_class));
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 857808da23d8..d22d47419f79 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1068,6 +1068,23 @@ void init_entity_runnable_average(struct sched_entity *se)
 	/* when this task is enqueued, it will contribute to its cfs_rq's load_avg */
 }
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+void init_entity_predict_load_data(struct sched_entity *se, int node)
+{
+	struct predict_load_data *pldp;
+
+	if (predict_load_data_cachep == NULL) {
+		se->pldp = NULL;
+		return;
+	}
+
+	pldp = kmem_cache_alloc_node(predict_load_data_cachep, GFP_KERNEL, node);
+	if (pldp == NULL) {
+		se->pldp = NULL;
+		return;
+	}
+
+	memset(pldp, 0, sizeof(*pldp));
+	pldp->predict_load_normalized = NO_PREDICT_LOAD;
+	se->pldp = pldp;
+}
+#endif
+
 /*
  * With new tasks being created, their initial util_avgs are extrapolated
  * based on the cfs_rq's current util_avg:
@@ -4701,6 +4718,114 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	trace_pelt_cfs_tp(cfs_rq);
 }
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+
+static unsigned long get_load_after_offset(unsigned long load)
+{
+	if (load >= PREDICT_LOAD_MAX)
+		load = PREDICT_LOAD_MAX - 1;
+	return load >> LOAD_GRAN_SHIFT;
+}
+
+/*
+ * The weight of the se should not affect the load here, because
+ * predict_load_data is designed to record load from 0 to 1024.
+ * The load is therefore normalized; it can be restored as needed
+ * by restore_normalized_load().
+ */
+static unsigned long get_normalized_load(struct sched_entity *se)
+{
+	unsigned long normalized_load, load = se->avg.load_avg;
+
+	/* Prevent arithmetic overflow. */
+	WARN_ON_ONCE(load > 4000000);
+	if (se_weight(se) == PREDICT_LOAD_MAX)
+		return load;
+	normalized_load = div_u64(load * PREDICT_LOAD_MAX, se_weight(se));
+	return min(normalized_load, PREDICT_LOAD_MAX);
+}
+
+static unsigned long restore_normalized_load(unsigned long normalized_load, unsigned long weight)
+{
+	/* Prevent arithmetic overflow. */
+	WARN_ON_ONCE(normalized_load > 4000000);
+	if (weight == PREDICT_LOAD_MAX)
+		return normalized_load;
+	return div_u64(normalized_load * weight, PREDICT_LOAD_MAX);
+}
+
+/* Returns the predicted load of @se, or NO_PREDICT_LOAD if none. */
+unsigned long get_predict_load(struct sched_entity *se)
+{
+	struct predict_load_data *pldp = se->pldp;
+	unsigned long predict_load_normalized;
+
+	if (pldp == NULL)
+		return NO_PREDICT_LOAD;
+
+	predict_load_normalized = pldp->predict_load_normalized;
+	if (predict_load_normalized == NO_PREDICT_LOAD)
+		return NO_PREDICT_LOAD;
+
+	return restore_normalized_load(predict_load_normalized + LOAD_GRAN,
+				       se_weight(se));
+}
+
+static void record_predict_load_data(struct sched_entity *se)
+{
+	struct predict_load_data *pldp = se->pldp;
+	struct record_load *rla;
+	unsigned long load_normalized_when_dequeue, load_normalized_when_enqueue;
+	unsigned long index, val;
+
+	if (pldp == NULL)
+		return;
+
+	rla = pldp->record_load_array;
+	load_normalized_when_dequeue = get_normalized_load(se);
+	load_normalized_when_enqueue = pldp->load_normalized_when_enqueue;
+	index = get_load_after_offset(load_normalized_when_enqueue);
+	val = get_load_after_offset(load_normalized_when_dequeue);
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+	if (pldp->predict_load_normalized != NO_PREDICT_LOAD) {
+		pldp->predict_count++;
+		if (load_normalized_when_dequeue >= pldp->predict_load_normalized &&
+		    load_normalized_when_dequeue <= pldp->predict_load_normalized + LOAD_GRAN)
+			pldp->predict_correct_count++;
+	} else {
+		pldp->no_predict_count++;
+	}
+#endif
+
+	if (rla[index].load_after_offset == val) {
+		if (rla[index].confidence < 255)
+			rla[index].confidence++;
+	} else {
+		if (rla[index].confidence <= 1) {
+			rla[index].load_after_offset = val;
+			rla[index].confidence = 1;
+		} else {
+			rla[index].confidence--;
+		}
+	}
+}
+
+static void se_do_predict_load(struct sched_entity *se)
+{
+	struct predict_load_data *pldp = se->pldp;
+	struct record_load *rla;
+	unsigned long index, predict_load_normalized = NO_PREDICT_LOAD;
+
+	if (pldp == NULL)
+		return;
+
+	rla = pldp->record_load_array;
+	pldp->load_normalized_when_enqueue = get_normalized_load(se);
+	index = get_load_after_offset(pldp->load_normalized_when_enqueue);
+
+	if (rla[index].confidence >= CONFIDENCE_THRESHOLD)
+		predict_load_normalized = rla[index].load_after_offset << LOAD_GRAN_SHIFT;
+	pldp->predict_load_normalized = predict_load_normalized;
+}
+
+#endif
+
 /**
  * detach_entity_load_avg - detach this entity from its cfs_rq load avg
  * @cfs_rq: cfs_rq to detach from
@@ -5336,6 +5461,11 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 */
 	update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH);
 	se_update_runnable(se);
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	se_do_predict_load(se);
+#endif
+
 	/*
 	 * XXX update_load_avg() above will have attached us to the pelt sum;
 	 * but update_cfs_group() here will re-adjust the weight and have to
@@ -5493,6 +5623,11 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	update_load_avg(cfs_rq, se, action);
 	se_update_runnable(se);
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	record_predict_load_data(se);
+	set_in_predict_no_preempt(se, false);
+#endif
+
 	update_stats_dequeue_fair(cfs_rq, se, flags);
 
 	update_entity_lag(cfs_rq, se);
@@ -5628,6 +5763,9 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 	}
 	SCHED_WARN_ON(cfs_rq->curr != prev);
 	cfs_rq->curr = NULL;
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	set_in_predict_no_preempt(prev, false);
+#endif
 }
 
 static void
@@ -13345,8 +13483,13 @@ void free_fair_sched_group(struct task_group *tg)
 	for_each_possible_cpu(i) {
 		if (tg->cfs_rq)
 			kfree(tg->cfs_rq[i]);
-		if (tg->se)
+		if (tg->se) {
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+			if (tg->se[i] && tg->se[i]->pldp && predict_load_data_cachep)
+				kmem_cache_free(predict_load_data_cachep, tg->se[i]->pldp);
+#endif
 			kfree(tg->se[i]);
+		}
 	}
 
 	kfree(tg->cfs_rq);
@@ -13384,6 +13527,9 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
 		init_cfs_rq(cfs_rq);
 		init_tg_cfs_entry(tg, cfs_rq, se, i, parent->se[i]);
 		init_entity_runnable_average(se);
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+		init_entity_predict_load_data(se, cpu_to_node(i));
+#endif
 	}
 
 	return 1;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ab16d3d0e51c..cf1e98bf83d3 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2733,6 +2733,8 @@ extern void init_dl_entity(struct sched_dl_entity *dl_se);
 extern unsigned long to_ratio(u64 period, u64 runtime);
 
 extern void init_entity_runnable_average(struct sched_entity *se);
+extern void init_entity_predict_load_data(struct sched_entity *se, int node);
+
 extern void post_init_entity_util_avg(struct task_struct *p);
 
 #ifdef CONFIG_NO_HZ_FULL
-- 
2.33.0
From: zihan zhou <15645113830zzh@gmail.com>
To: 15645113830zzh@gmail.com
Cc: bsegall@google.com, dietmar.eggemann@arm.com, juri.lelli@redhat.com,
	linux-kernel@vger.kernel.org, mgorman@suse.de, mingo@redhat.com,
	peterz@infradead.org, rostedt@goodmis.org, vincent.guittot@linaro.org,
	vschneid@redhat.com
Subject: [PATCH V1 3/4] sched: Add debug for predict load
Date: Fri, 21 Feb 2025 16:55:08 +0800
Message-Id: <20250221085507.33329-1-15645113830zzh@gmail.com>
In-Reply-To: <20250221084437.31284-1-15645113830zzh@gmail.com>
References: <20250221084437.31284-1-15645113830zzh@gmail.com>

Debugging information about load prediction can be read from
/proc/$pid/predict_load (task se) and /sys/kernel/debug/sched/debug
(group se).

An example:

[root@test sched]# cat /proc/1/predict_load
se.pldp->predict_correct_count : 7699
se.pldp->predict_count         : 7820
se.pldp->no_predict_count      : 263
enqueue_load_normalized: 0, dequeue_load_normalized: 0, confidence:255
enqueue_load_normalized: 16, dequeue_load_normalized: 16, confidence:42
enqueue_load_normalized: 32, dequeue_load_normalized: 32, confidence:14
enqueue_load_normalized: 48, dequeue_load_normalized: 48, confidence:5
enqueue_load_normalized: 64, dequeue_load_normalized: 64, confidence:8
enqueue_load_normalized: 80, dequeue_load_normalized: 80, confidence:9
enqueue_load_normalized: 96, dequeue_load_normalized: 96, confidence:3
enqueue_load_normalized: 112, dequeue_load_normalized: 128, confidence:2

/sys/kernel/debug/sched/debug only has predict_count.
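The bucket boundaries in the output above (0, 16, 32, ...) follow
from the LOAD_GRAN_SHIFT of 4 defined in patch 2/4: normalized load
is clamped to [0, 1024) and grouped into 16-wide buckets. A small
userspace sketch of that mapping, assuming the constants from patch
2/4 (this mirrors get_load_after_offset(), it is not the in-tree
code):

```c
#include <assert.h>

#define PREDICT_LOAD_MAX 1024
#define LOAD_GRAN_SHIFT 4

/* Clamp a normalized load into [0, PREDICT_LOAD_MAX) and bucket it. */
static unsigned long load_to_bucket(unsigned long load)
{
	if (load >= PREDICT_LOAD_MAX)
		load = PREDICT_LOAD_MAX - 1;
	return load >> LOAD_GRAN_SHIFT;
}

/* The debug file prints each bucket scaled back to a load value,
 * which is why the printed columns move in steps of 16. */
static unsigned long bucket_to_load(unsigned long bucket)
{
	return bucket << LOAD_GRAN_SHIFT;
}
```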
Signed-off-by: zihan zhou <15645113830zzh@gmail.com>
---
 fs/proc/base.c              | 39 +++++++++++++++++++++++++++++++++++++
 include/linux/sched/debug.h |  5 +++++
 kernel/sched/debug.c        | 39 +++++++++++++++++++++++++++++++++++++
 3 files changed, 83 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index cd89e956c322..e66173ce941b 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -1541,6 +1541,39 @@ static const struct file_operations proc_pid_sched_operations = {
 
 #endif
 
+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+
+static int predict_load_show(struct seq_file *m, void *v)
+{
+	struct inode *inode = m->private;
+	struct pid_namespace *ns = proc_pid_ns(inode->i_sb);
+	struct task_struct *p;
+
+	p = get_proc_task(inode);
+	if (!p)
+		return -ESRCH;
+	proc_predict_load_show_task(p, ns, m);
+
+	put_task_struct(p);
+
+	return 0;
+}
+
+static int predict_load_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, predict_load_show, inode);
+}
+
+static const struct file_operations proc_pid_predict_load_operations = {
+	.open		= predict_load_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+#endif
+
 #ifdef CONFIG_SCHED_AUTOGROUP
 /*
  * Print out autogroup related information:
@@ -3334,6 +3367,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_SCHED_DEBUG
 	REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
 #endif
+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+	REG("predict_load", S_IRUGO, proc_pid_predict_load_operations),
+#endif
 #ifdef CONFIG_SCHED_AUTOGROUP
 	REG("autogroup", S_IRUGO|S_IWUSR, proc_pid_sched_autogroup_operations),
 #endif
@@ -3684,6 +3720,9 @@ static const struct pid_entry tid_base_stuff[] = {
 	ONE("limits", S_IRUGO, proc_pid_limits),
 #ifdef CONFIG_SCHED_DEBUG
 	REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
+#endif
+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+	REG("predict_load", S_IRUGO, proc_pid_predict_load_operations),
 #endif
 	NOD("comm",
	    S_IFREG|S_IRUGO|S_IWUSR,
	    &proc_tid_comm_inode_operations,

diff --git a/include/linux/sched/debug.h b/include/linux/sched/debug.h
index b5035afa2396..5b2bab60afae 100644
--- a/include/linux/sched/debug.h
+++ b/include/linux/sched/debug.h
@@ -40,6 +40,11 @@ struct seq_file;
 extern void proc_sched_show_task(struct task_struct *p,
				 struct pid_namespace *ns, struct seq_file *m);
 extern void proc_sched_set_task(struct task_struct *p);
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+extern void proc_predict_load_show_task(struct task_struct *p,
+		struct pid_namespace *ns, struct seq_file *m);
+#endif
 #endif

 /* Attach to any functions which should be ignored in wchan output. */

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index ef047add7f9e..619b96333f6a 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -690,6 +690,12 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 	P(se->avg.runnable_avg);
 #endif

+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+	P(se->pldp->predict_correct_count);
+	P(se->pldp->predict_count);
+	P(se->pldp->no_predict_count);
+#endif
+
 #undef PN_SCHEDSTAT
 #undef PN
 #undef P_SCHEDSTAT
@@ -1160,6 +1166,39 @@ static void sched_show_numa(struct task_struct *p, struct seq_file *m)
 #endif
 }

+#ifdef CONFIG_SCHED_PREDICT_LOAD_DEBUG
+
+void proc_predict_load_show_task(struct task_struct *p, struct pid_namespace *ns,
+		struct seq_file *m)
+{
+	struct predict_load_data *pldp = p->se.pldp;
+
+	if (pldp == NULL)
+		return;
+	struct record_load *rla = pldp->record_load_array;
+
+	unsigned long index, enqueue_load_normalized, dequeue_load_normalized, confidence;
+
+	P(se.pldp->predict_correct_count);
+	P(se.pldp->predict_count);
+	P(se.pldp->no_predict_count);
+
+	for (index = 0; index < (PREDICT_LOAD_MAX >> LOAD_GRAN_SHIFT); index++) {
+		enqueue_load_normalized = index << LOAD_GRAN_SHIFT;
+		dequeue_load_normalized = rla[index].load_after_offset << LOAD_GRAN_SHIFT;
+		confidence = rla[index].confidence;
+		if (confidence) {
+			SEQ_printf(m, "enqueue_load_normalized: %ld, ", enqueue_load_normalized);
+			SEQ_printf(m, "dequeue_load_normalized: %ld, ", dequeue_load_normalized);
+			SEQ_printf(m, "confidence:%ld\n", confidence);
+		}
+	}
+}
+
+#endif
+
 void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
			   struct seq_file *m)
{
--
2.33.0

From nobody Tue Dec 16 16:08:25 2025
From: zihan zhou <15645113830zzh@gmail.com>
To: 15645113830zzh@gmail.com
Cc: bsegall@google.com, dietmar.eggemann@arm.com, juri.lelli@redhat.com,
    linux-kernel@vger.kernel.org, mgorman@suse.de, mingo@redhat.com,
    peterz@infradead.org, rostedt@goodmis.org, vincent.guittot@linaro.org,
    vschneid@redhat.com
Subject: [PATCH V1 4/4] sched: add feature PREDICT_NO_PREEMPT
Date: Fri, 21 Feb 2025 16:57:26 +0800
Message-Id: <20250221085725.33943-1-15645113830zzh@gmail.com>
In-Reply-To: <20250221084437.31284-1-15645113830zzh@gmail.com>
References: <20250221084437.31284-1-15645113830zzh@gmail.com>

Patch 4/4 is independent; it is an attempt to make use of the predicted
load. I observed that some tasks were almost finished but got preempted,
and so had to spend more time running. Such a task can be identified by
load prediction: its load at enqueue is roughly equal to its load at
dequeue.
If the se about to be preempted is such a task, and pse would otherwise
have preempted it, PREDICT_NO_PREEMPT prevents pse from preempting and
compensates pse via set_next_buddy(). This protects tasks that are about
to finish. If the prediction later turns out to be wrong, we resched the
se. In effect, this is a way of automatically adjusting a task toward
SCHED_BATCH behavior.

The performance of hackbench improves a little:

./hackbench -g 8 -l 10000
orig: 2.063s
with PREDICT_NO_PREEMPT: 1.833s

./hackbench -g 16 -l 10000
orig: 3.658s
with PREDICT_NO_PREEMPT: 3.479s

The average latency of cyclictest (run alongside hackbench) increases,
but the maximum latency is no different:

orig:
I:1000 C: 181852 Min: 4 Act: 59 Avg: 212 Max: 21838
with PREDICT_NO_PREEMPT:
I:1000 C: 181564 Min: 8 Act: 80 Avg: 457 Max: 22989

I think this kind of scheduling protection cannot increase the
scheduling delay by more than 1ms (every tick checks whether the
prediction is correct), and it can improve overall throughput, which
seems acceptable. Of course, this patch is still experimental, and
suggestions are welcome. (Perhaps predicting util would work better?)

In addition, I found that even with a high-load hackbench running in the
background, the terminal remained very smooth.
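For illustration only, the "will end soon" test described above can be
modeled in a few lines of user-space Python. NO_PREDICT_LOAD and the
three load values here are stand-ins for the kernel's pldp fields, not
the actual implementation:

```python
# Hypothetical model of the predict_se_will_end_soon() decision described
# above. The sentinel and load values are stand-ins for kernel state.
NO_PREDICT_LOAD = -1

def se_will_end_soon(predicted, enqueue_load, current_load):
    """The task is treated as near completion when its predicted dequeue
    load is no higher than its enqueue load and its current load is still
    below the prediction (so preempting it would waste its remaining run)."""
    if predicted == NO_PREDICT_LOAD:
        return False            # no prediction available
    if predicted > enqueue_load:
        return False            # load expected to grow: not ending soon
    return current_load < predicted

# A task predicted to finish at the load it started with, still below it:
print(se_will_end_soon(64, 64, 40))   # True
# A task whose predicted load exceeds its enqueue load keeps normal preemption:
print(se_will_end_soon(96, 64, 40))   # False
```

When this returns true, the patch skips preemption and compensates the
waking task with set_next_buddy(); a wrong prediction is caught at the
next tick and triggers a resched.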
Signed-off-by: zihan zhou <15645113830zzh@gmail.com>
---
 kernel/sched/fair.c     | 92 +++++++++++++++++++++++++++++++++++++++--
 kernel/sched/features.h |  4 ++
 2 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d22d47419f79..21bf58a494ba 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1258,6 +1258,9 @@ static void update_curr(struct cfs_rq *cfs_rq)

 	curr->vruntime += calc_delta_fair(delta_exec, curr);
 	resched = update_deadline(cfs_rq, curr);
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	resched |= predict_error_should_resched(curr);
+#endif
 	update_min_vruntime(cfs_rq);

 	if (entity_is_task(curr)) {
@@ -8884,6 +8887,60 @@ static void set_next_buddy(struct sched_entity *se)
 	}
 }

+#ifdef CONFIG_SCHED_PREDICT_LOAD
+static bool predict_se_will_end_soon(struct sched_entity *se)
+{
+	struct predict_load_data *pldp = se->pldp;
+
+	if (pldp == NULL)
+		return false;
+	if (pldp->predict_load_normalized == NO_PREDICT_LOAD)
+		return false;
+	if (pldp->predict_load_normalized > pldp->load_normalized_when_enqueue)
+		return false;
+	if (se->avg.load_avg >= get_predict_load(se))
+		return false;
+	return true;
+}
+
+void set_in_predict_no_preempt(struct sched_entity *se, bool in_predict_no_preempt)
+{
+	struct predict_load_data *pldp = se->pldp;
+
+	if (pldp == NULL)
+		return;
+	pldp->in_predict_no_preempt = in_predict_no_preempt;
+}
+
+static bool get_in_predict_no_preempt(struct sched_entity *se)
+{
+	struct predict_load_data *pldp = se->pldp;
+
+	if (pldp == NULL)
+		return false;
+	return pldp->in_predict_no_preempt;
+}
+
+static bool predict_right(struct sched_entity *se)
+{
+	struct predict_load_data *pldp = se->pldp;
+
+	if (pldp == NULL)
+		return false;
+	if (pldp->predict_load_normalized == NO_PREDICT_LOAD)
+		return false;
+	if (se->avg.load_avg <= get_predict_load(se))
+		return true;
+	return false;
+}
+
+bool predict_error_should_resched(struct sched_entity *se)
+{
+	return get_in_predict_no_preempt(se) && !predict_right(se);
+}
+
+#endif
+
 /*
  * Preempt the current task with a newly woken task if needed:
  */
@@ -8893,6 +8950,10 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	struct sched_entity *se = &donor->se, *pse = &p->se;
 	struct cfs_rq *cfs_rq = task_cfs_rq(donor);
 	int cse_is_idle, pse_is_idle;
+	bool if_best_se;
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	bool predict_no_preempt = false;
+#endif

 	if (unlikely(se == pse))
 		return;
@@ -8954,6 +9015,21 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	if (unlikely(!normal_policy(p->policy)))
 		return;

+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	/*
+	 * If we predict that se will end soon, it's better not to preempt it,
+	 * but wait for it to exit by itself. This is undoubtedly a grievance for
+	 * pse, so if pse should preempt se, we will give it some compensation.
+	 */
+	if (sched_feat(PREDICT_NO_PREEMPT)) {
+		if (predict_error_should_resched(se))
+			goto preempt;
+
+		if (predict_se_will_end_soon(se))
+			predict_no_preempt = true;
+	}
+#endif
+
 	cfs_rq = cfs_rq_of(se);
 	update_curr(cfs_rq);
 	/*
@@ -8966,10 +9042,18 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	if (do_preempt_short(cfs_rq, pse, se))
 		cancel_protect_slice(se);

-	/*
-	 * If @p has become the most eligible task, force preemption.
-	 */
-	if (pick_eevdf(cfs_rq) == pse)
+	if_best_se = (pick_eevdf(cfs_rq) == pse);
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+	if (predict_no_preempt) {
+		if (if_best_se && !pse->sched_delayed) {
+			set_next_buddy(pse);
+			set_in_predict_no_preempt(se, true);
+			return;
+		}
+	}
+#endif
+	if (if_best_se)
 		goto preempt;

 	return;

diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 3c12d9f93331..8a78108af835 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -121,3 +121,7 @@ SCHED_FEAT(WA_BIAS, true)
 SCHED_FEAT(UTIL_EST, true)

 SCHED_FEAT(LATENCY_WARN, false)
+
+#ifdef CONFIG_SCHED_PREDICT_LOAD
+SCHED_FEAT(PREDICT_NO_PREEMPT, true)
+#endif
--
2.33.0