Kernel threads (PF_KTHREAD) are not subject to user-defined utilization
clamping. They do not represent user workloads and should not participate
in any uclamp logic, including:
- clamp initialization during fork/post-fork
- effective clamp value computation
- runtime aggregation (uclamp_rq_inc/dec)
Allowing kernel threads into these paths may pollute the rq->uclamp[]
statistics, mislead the schedutil governor's frequency selection, and
complicate debugging and trace interpretation.
This patch ensures that:
- uclamp_fork() and uclamp_post_fork() skip kernel threads
- uclamp_eff_value() returns the default value for kernel threads
- uclamp_rq_inc() and uclamp_rq_dec() skip kernel threads
This aligns the semantics of uclamp with its original intent:
user-space task-specific clamping.
dmesg output from debug logging in uclamp_rq_inc_id():
[ 76.373903] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.375905] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.379837] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:0 value:0 kthread:1
[ 76.379839] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.379839] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.379841] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.383897] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.383897] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:0 value:0 kthread:1
[ 76.383900] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.383901] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.387885] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.387885] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:0 value:0 kthread:1
[ 76.387888] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.387889] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.388139] uclamp_rq_inc_id: task:jbd2/sda3-8 pid:316 clamp_id:0 value:0 kthread:1
[ 76.388140] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
[ 76.388142] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:1 value:1024 kthread:1
[ 76.388143] uclamp_rq_inc_id: task:jbd2/sda3-8 pid:316 clamp_id:1 value:1024 kthread:1
[ 76.388169] uclamp_rq_inc_id: task:kworker/u48:6 pid:93 clamp_id:0 value:0 kthread:1
[ 76.388171] uclamp_rq_inc_id: task:kworker/u48:6 pid:93 clamp_id:1 value:1024 kthread:1
[ 76.388891] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.388893] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.392900] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.392902] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.398850] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:0 value:0 kthread:1
[ 76.398852] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.401880] uclamp_rq_inc_id: task:ksoftirqd/8 pid:67 clamp_id:0 value:0 kthread:1
[ 76.401883] uclamp_rq_inc_id: task:ksoftirqd/8 pid:67 clamp_id:1 value:1024 kthread:1
[ 76.409053] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:0 value:0 kthread:1
[ 76.409054] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.410881] uclamp_rq_inc_id: task:kworker/u48:10 pid:97 clamp_id:0 value:0 kthread:1
[ 76.410884] uclamp_rq_inc_id: task:kworker/u48:10 pid:97 clamp_id:1 value:1024 kthread:1
[ 76.419947] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
[ 76.419949] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:1 value:1024 kthread:1
[ 76.419976] uclamp_rq_inc_id: task:kworker/u48:6 pid:93 clamp_id:0 value:0 kthread:1
[ 76.419979] uclamp_rq_inc_id: task:kworker/u48:6 pid:93 clamp_id:1 value:1024 kthread:1
[ 76.420119] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:0 value:0 kthread:1
[ 76.420121] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.420642] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
[ 76.420644] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:1 value:1024 kthread:1
[ 76.434914] uclamp_rq_inc_id: task:kcompactd0 pid:108 clamp_id:0 value:0 kthread:1
[ 76.434916] uclamp_rq_inc_id: task:kcompactd0 pid:108 clamp_id:1 value:1024 kthread:1
[ 76.447689] uclamp_rq_inc_id: task:kworker/3:2 pid:244 clamp_id:0 value:0 kthread:1
[ 76.447691] uclamp_rq_inc_id: task:kworker/3:2 pid:244 clamp_id:1 value:1024 kthread:1
[ 76.447705] uclamp_rq_inc_id: task:ksoftirqd/3 pid:37 clamp_id:0 value:0 kthread:1
[ 76.447707] uclamp_rq_inc_id: task:ksoftirqd/3 pid:37 clamp_id:1 value:1024 kthread:1
[ 76.448809] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.448811] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.451260] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
[ 76.451263] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:1 value:1024 kthread:1
[ 76.452806] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.452808] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.488052] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
[ 76.488054] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:1 value:1024 kthread:1
[ 76.488767] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
[ 76.488770] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:1 value:1024 kthread:1
[ 76.490847] uclamp_rq_inc_id: task:kworker/3:2 pid:244 clamp_id:0 value:0 kthread:1
[ 76.490848] uclamp_rq_inc_id: task:kworker/2:1 pid:143 clamp_id:0 value:0 kthread:1
[ 76.490849] uclamp_rq_inc_id: task:kworker/1:3 pid:462 clamp_id:0 value:0 kthread:1
[ 76.490848] uclamp_rq_inc_id: task:kworker/7:2 pid:687 clamp_id:0 value:0 kthread:1
[ 76.490849] uclamp_rq_inc_id: task:kworker/11:1 pid:146 clamp_id:0 value:0 kthread:1
[ 76.490850] uclamp_rq_inc_id: task:kworker/2:1 pid:143 clamp_id:1 value:1024 kthread:1
[ 76.490851] uclamp_rq_inc_id: task:kworker/3:2 pid:244 clamp_id:1 value:1024 kthread:1
[ 76.490851] uclamp_rq_inc_id: task:kworker/11:1 pid:146 clamp_id:1 value:1024 kthread:1
[ 76.490851] uclamp_rq_inc_id: task:kworker/7:2 pid:687 clamp_id:1 value:1024 kthread:1
[ 76.490853] uclamp_rq_inc_id: task:kworker/1:3 pid:462 clamp_id:1 value:1024 kthread:1
[ 76.490857] uclamp_rq_inc_id: task:kworker/5:1 pid:141 clamp_id:0 value:0 kthread:1
[ 76.490859] uclamp_rq_inc_id: task:kworker/5:1 pid:141 clamp_id:1 value:1024 kthread:1
[ 76.491850] uclamp_rq_inc_id: task:kworker/4:2 pid:534 clamp_id:0 value:0 kthread:1
[ 76.491852] uclamp_rq_inc_id: task:kworker/4:2 pid:534 clamp_id:1 value:1024 kthread:1
[ 76.504848] uclamp_rq_inc_id: task:kworker/10:2 pid:228 clamp_id:0 value:0 kthread:1
[ 76.504852] uclamp_rq_inc_id: task:kworker/10:2 pid:228 clamp_id:1 value:1024 kthread:1
[ 76.508785] uclamp_rq_inc_id: task:kworker/9:1 pid:142 clamp_id:0 value:0 kthread:1
[ 76.508787] uclamp_rq_inc_id: task:kworker/9:1 pid:142 clamp_id:1 value:1024 kthread:1
[ 76.514856] uclamp_rq_inc_id: task:kworker/u48:10 pid:97 clamp_id:0 value:0 kthread:1
[ 76.514859] uclamp_rq_inc_id: task:kworker/u48:10 pid:97 clamp_id:1 value:1024 kthread:1
[ 76.522742] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
---
kernel/sched/core.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8988d38d46a3..a1e6b4157682 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1630,6 +1630,9 @@ unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id)
 {
 	struct uclamp_se uc_eff;
 
+	if (p->flags & PF_KTHREAD)
+		return uclamp_none(clamp_id);
+
 	/* Task currently refcounted: use back-annotated (effective) value */
 	if (p->uclamp[clamp_id].active)
 		return (unsigned long)p->uclamp[clamp_id].value;
@@ -1769,6 +1772,9 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags
 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;
 
+	if (p->flags & PF_KTHREAD)
+		return;
+
 	/* Only inc the delayed task which being woken up. */
 	if (p->se.sched_delayed && !(flags & ENQUEUE_DELAYED))
 		return;
@@ -1797,6 +1803,9 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;
 
+	if (p->flags & PF_KTHREAD)
+		return;
+
 	if (p->se.sched_delayed)
 		return;
 
@@ -1977,6 +1986,8 @@ static void uclamp_fork(struct task_struct *p)
 {
 	enum uclamp_id clamp_id;
 
+	if (p->flags & PF_KTHREAD)
+		return;
 	/*
 	 * We don't need to hold task_rq_lock() when updating p->uclamp_* here
 	 * as the task is still at its early fork stages.
@@ -1995,6 +2006,8 @@ static void uclamp_fork(struct task_struct *p)
 
 static void uclamp_post_fork(struct task_struct *p)
 {
+	if (p->flags & PF_KTHREAD)
+		return;
 	uclamp_update_util_min_rt_default(p);
 }
--
2.25.1
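The dmesg lines above appear to come from ad-hoc debug logging inside
uclamp_rq_inc_id(). A minimal sketch of what such instrumentation could look
like follows; the pr_info() placement and the use of uclamp_req[] are
assumptions inferred from the log format, not part of the posted patch.

/*
 * Hypothetical debug hook matching the log format above; not part of
 * the posted patch. uclamp_rq_inc_id() is the real aggregation helper,
 * the pr_info() is assumed instrumentation.
 */
static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p,
				    enum uclamp_id clamp_id)
{
	pr_info("uclamp_rq_inc_id: task:%s pid:%d clamp_id:%d value:%d kthread:%d\n",
		p->comm, p->pid, (int)clamp_id,
		(int)p->uclamp_req[clamp_id].value,
		!!(p->flags & PF_KTHREAD));

	/* ... existing per-bucket refcounting continues here ... */
}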
Hi zihuan,

On Thu, Jul 3, 2025 at 5:15 PM Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
>
> Kernel threads (PF_KTHREAD) are not subject to user-defined utilization
> clamping. They do not represent user workloads and should not participate
> in any uclamp logic, including:

Indeed, some drivers use sched_setscheduler() to tune some kthreads to
improve performance.
It is not a good idea to exclude them.

Thanks!

> [snip]
Hi Xuewen,

Thanks for your feedback — that makes a lot of sense.

On 2025/7/3 17:42, Xuewen Yan wrote:
> Hi zihuan,
>
> On Thu, Jul 3, 2025 at 5:15 PM Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
>> Kernel threads (PF_KTHREAD) are not subject to user-defined utilization
>> clamping. They do not represent user workloads and should not participate
>> in any uclamp logic, including:
> Indeed, some drivers use sched_setscheduler() to tune some kthreads to
> improve performance.
> It is not a good idea to exclude them.
>
> Thanks!
>
I agree that kernel threads may need explicit scheduling control, so it’s
indeed not a good idea to exclude them unconditionally.

Our main concern was that uclamp_rq_inc() is a performance-sensitive path,
and letting default-initialized kthreads participate in clamp aggregation
could lead to unnecessary overhead and distort frequency decisions.

We’ll rework the patch to be more selective — possibly skipping only those
kernel threads that don’t have user-defined clamp values.

Thanks again for the helpful input!

>> [snip]

Best regards,
Zihuan
On 7/3/25 10:14, Zihuan Zhang wrote:
> Kernel threads (PF_KTHREAD) are not subject to user-defined utilization
> clamping. They do not represent user workloads and should not participate
> in any uclamp logic, including:

Why not?

> [snip]
Hi Christian,

Thanks for the question!

On 2025/7/3 17:22, Christian Loehle wrote:
> On 7/3/25 10:14, Zihuan Zhang wrote:
>> Kernel threads (PF_KTHREAD) are not subject to user-defined utilization
>> clamping. They do not represent user workloads and should not participate
>> in any uclamp logic, including:
> Why not?
>
As Xuewen mentioned, some kernel threads may intentionally set scheduling
attributes for performance. So instead of unconditionally excluding all
kernel threads, I’m now considering a more conservative approach:
skip only those kthreads that haven’t explicitly set any clamp values.

This should help avoid unintended clamp aggregation while still supporting
performance-tuned kthreads.

>> [snip]

Best regards,
Zihuan
On Thu, 3 Jul 2025 18:07:20 +0800
Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:

> As Xuewen mentioned, some kernel threads may intentionally set
> scheduling attributes for performance. So instead of unconditionally
> excluding all kernel threads, I’m now considering a more conservative
> approach:
> skip only those kthreads that haven’t explicitly set any clamp values.

The above comment bothers me. What kernel threads set scheduling attributes?

From my experience working on real-time, no kernel thread gets
scheduling tweaking correct.

-- Steve
Hi Steven,

Thanks for the feedback!

On 2025/7/3 22:51, Steven Rostedt wrote:
> On Thu, 3 Jul 2025 18:07:20 +0800
> Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
>
>> As Xuewen mentioned, some kernel threads may intentionally set
>> scheduling attributes for performance. So instead of unconditionally
>> excluding all kernel threads, I’m now considering a more conservative
>> approach:
>> skip only those kthreads that haven’t explicitly set any clamp values.
> The above comment bothers me. What kernel threads set scheduling attributes?
>
> From my experience working on real-time, no kernel thread gets
> scheduling tweaking correct.
>
> -- Steve

You raised a good point — in most cases, kernel threads don’t explicitly
tweak scheduling attributes, and when they do, it might not always be
correct or effective. I appreciate the insight from your real-time
experience.

The motivation behind this patch is to explore whether it’s worth
optimizing the uclamp hot path a bit further. Since kernel threads
typically don’t benefit from uclamp adjustments and often just inherit
default values (e.g., max=1024), we were wondering if skipping the
aggregation logic for such cases could slightly reduce overhead in some
workloads.

Of course, we want to be conservative and avoid breaking any legitimate
usage. So I’d love to hear your opinion — do you think it’s worthwhile
to pursue this kind of micro-optimization in uclamp, or is the potential
gain too marginal to justify the added logic?

Thanks again for your time and thoughts!

Best regards,
Zihuan Zhang
On Thu, 10 Jul 2025 08:55:28 +0800
Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:

> The motivation behind this patch is to explore whether it’s worth
> optimizing the uclamp hot path a bit further. Since kernel threads
> typically don’t benefit from uclamp adjustments and often just inherit
> default values (e.g., max=1024), we were wondering if skipping the
> aggregation logic for such cases could slightly reduce overhead in some
> workloads.
>
> Of course, we want to be conservative and avoid breaking any legitimate
> usage. So I’d love to hear your opinion — do you think it’s worthwhile
> to pursue this kind of micro-optimization in uclamp, or is the potential
> gain too marginal to justify the added logic?

My honest opinion is that if there's not a huge issue you are trying
to solve, then it's best to leave things as is. Tweaking this for
micro-optimizations usually ends up causing a regression somewhere you
never expected.

-- Steve
On 2025/7/10 22:03, Steven Rostedt wrote:
> On Thu, 10 Jul 2025 08:55:28 +0800
> Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
>
>> [snip]
>
> My honest opinion is that if there's not a huge issue you are trying
> to solve, then it's best to leave things as is. Tweaking this for
> micro-optimizations usually ends up causing a regression somewhere you
> never expected.
>
> -- Steve

Thanks for the advice! I’ll keep that in mind for my next patch.

Best regards,
Zihuan Zhang
On 7/3/25 11:07, Zihuan Zhang wrote:
> Hi Christian,
>
> Thanks for the question!
>
> On 2025/7/3 17:22, Christian Loehle wrote:
>> On 7/3/25 10:14, Zihuan Zhang wrote:
>>> [snip]
>> Why not?
>>
> As Xuewen mentioned, some kernel threads may intentionally set scheduling
> attributes for performance. So instead of unconditionally excluding all
> kernel threads, I’m now considering a more conservative approach:
> skip only those kthreads that haven’t explicitly set any clamp values.
>
> This should help avoid unintended clamp aggregation while still supporting
> performance-tuned kthreads.

I'm skeptical; fundamentally you cannot exclude some fair tasks from uclamp
logic. At least the cpufreq part they will be affected by, so if you
'exclude' some kthread that doesn't have clamps set (i.e. has min=0,
max=1024), its utilization may not contribute to sugov frequency selection
by being clamped by other task(s). Let's say you only have one other task
with max=0; excluding the unclamped kthread now leads to sugov requesting
the lowest OPP. Is that always correct/desired?

Is there a specific issue you're trying to solve?
FYI there has been discussion around reworking the uclamp mechanism to solve
some issues you may have been facing, but so far they haven't led anywhere:
https://lore.kernel.org/lkml/cover.1741091349.git.hongyan.xia2@arm.com/
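A toy model of the scenario Christian describes, under the assumption that
rq-level aggregation reduces to a max over the runnable tasks' effective
clamps (mainline refcounts per-bucket, but the max is what matters here);
this is an illustration, not kernel code:

/* Toy max-aggregation over runnable tasks' UCLAMP_MAX values. */
static unsigned int toy_rq_uclamp_max(const unsigned int *task_max,
				      int nr, bool skip_unclamped)
{
	unsigned int rq_max = 0;
	int i;

	for (i = 0; i < nr; i++) {
		/* Models "exclude kthreads with the default max of 1024". */
		if (skip_unclamped && task_max[i] == 1024)
			continue;
		if (task_max[i] > rq_max)
			rq_max = task_max[i];
	}
	return rq_max;
}

/*
 * Runnable: one user task with max=0, one unclamped kthread (max=1024):
 *   toy_rq_uclamp_max({0, 1024}, 2, false) == 1024  (kthread may still
 *                                                    raise the OPP)
 *   toy_rq_uclamp_max({0, 1024}, 2, true)  == 0     (sugov caps util at 0
 *                                                    and picks the lowest OPP)
 */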
Hi Christian,

Apologies for the late reply, and thanks for raising the concerns.

On 2025/7/3 18:17, Christian Loehle wrote:
> On 7/3/25 11:07, Zihuan Zhang wrote:
>> [snip]
>
> I'm skeptical; fundamentally you cannot exclude some fair tasks from uclamp
> logic. At least the cpufreq part they will be affected by, so if you
> 'exclude' some kthread that doesn't have clamps set (i.e. has min=0,
> max=1024), its utilization may not contribute to sugov frequency selection
> by being clamped by other task(s). Let's say you only have one other task
> with max=0; excluding the unclamped kthread now leads to sugov requesting
> the lowest OPP. Is that always correct/desired?
>
> Is there a specific issue you're trying to solve?
> FYI there has been discussion around reworking the uclamp mechanism to solve
> some issues you may have been facing, but so far they haven't led anywhere:
> https://lore.kernel.org/lkml/cover.1741091349.git.hongyan.xia2@arm.com/

Our original motivation stems from the observation that uclamp is primarily
designed to manage frequency selection based on user-space task behavior.
Kernel threads typically do not represent user workloads and are often not
considered meaningful participants in uclamp-driven decisions.

To be clear, we are not aiming to exclude all kthreads from affecting
frequency, but rather to explore ways to avoid unnecessary uclamp
aggregation overhead from kernel threads that have no explicit clamp values
set (i.e. uclamp.min=0, max=1024).

As you pointed out, fully excluding these tasks might interfere with sugov
behavior in certain edge cases. So a more balanced approach might be:

- For kernel threads that do not set any clamp values, skip the clamp
  aggregation step

- If a kernel thread explicitly sets clamp attributes, it should of course
  remain fully visible to uclamp logic.

This would preserve correctness while reducing unnecessary overhead in the
hot path, especially on systems with many runnable tasks.

That said, we're open to alternative suggestions if there's a better way to
achieve this optimization. We'll also review the earlier rework discussion
you linked, thank you for pointing that out.

Best regards,
Zihuan Zhang
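One hedged reading of the "skip only default-clamped kthreads" idea: struct
uclamp_se carries a user_defined bit that __setscheduler_uclamp() sets when
a clamp is requested explicitly, so a gate along these lines could
distinguish tuned kthreads from default ones. A sketch of the proposal, not
code anyone has posted:

/* Sketch: true for kthreads whose clamps were never set explicitly. */
static inline bool uclamp_kthread_unclamped(struct task_struct *p)
{
	return (p->flags & PF_KTHREAD) &&
	       !p->uclamp_req[UCLAMP_MIN].user_defined &&
	       !p->uclamp_req[UCLAMP_MAX].user_defined;
}

uclamp_rq_inc()/uclamp_rq_dec() would then bail out early for such tasks;
Christian's reply below explains why even this narrower exclusion still
perturbs UCLAMP_MAX aggregation.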
On 7/10/25 01:47, Zihuan Zhang wrote:
> Hi Christian,
> Apologies for the late reply, and thanks for raising the concerns.
>
> [snip]
>
> Our original motivation stems from the observation that uclamp is primarily
> designed to manage frequency selection based on user-space task behavior.
> Kernel threads typically do not represent user workloads and are often not
> considered meaningful participants in uclamp-driven decisions.

Two comments to that:
- It's also used to drive task placement, not just frequency selection.
- There can be cases where a kthread is fundamentally part of a user workload,
  thinking about io_uring here, but others exist too.

> To be clear, we are not aiming to exclude all kthreads from affecting
> frequency, but rather to explore ways to avoid unnecessary uclamp
> aggregation overhead from kernel threads that have no explicit clamp values
> set (i.e. uclamp.min=0, max=1024).
> As you pointed out, fully excluding these tasks might interfere with sugov
> behavior in certain edge cases. So a more balanced approach might be:
>
> - For kernel threads that do not set any clamp values, skip the clamp
>   aggregation step
>
> - If a kernel thread explicitly sets clamp attributes, it should of course
>   remain fully visible to uclamp logic.
>
> This would preserve correctness while reducing unnecessary overhead in the
> hot path, especially on systems with many runnable tasks.

So an unclamped task not being part of uclamp will definitely affect the
UCLAMP_MAX result; as I've mentioned above, you'll apply (other tasks')
UCLAMP_MAX restrictions even if the kthread has UCLAMP_MAX==1024. That is
not always desirable.
Or would you let it take part in uclamp if the user explicitly set
UCLAMP_MAX==1024 instead of relying on the default? That wouldn't be
consistent IMO.

Regarding the optimization part:
Is there a specific workload where the overhead is an issue? It should
be rather small. Some numbers should help.
On 2025/7/10 16:41, Christian Loehle wrote:
> On 7/10/25 01:47, Zihuan Zhang wrote:
>> [snip]
>
> So an unclamped task not being part of uclamp will definitely affect the
> UCLAMP_MAX result; as I've mentioned above, you'll apply (other tasks')
> UCLAMP_MAX restrictions even if the kthread has UCLAMP_MAX==1024. That is
> not always desirable.
> Or would you let it take part in uclamp if the user explicitly set
> UCLAMP_MAX==1024 instead of relying on the default? That wouldn't be
> consistent IMO.
>
> Regarding the optimization part:
> Is there a specific workload where the overhead is an issue? It should
> be rather small. Some numbers should help.

You’re absolutely right — excluding unclamped kernel threads entirely can
unintentionally affect UCLAMP_MAX aggregation, and may lead to undesirable
behavior in edge cases. I agree that this would not be a consistent or
generally correct approach.

At this stage, I think the idea still lacks maturity, and I appreciate your
input in highlighting the possible implications.

I’m currently diving deeper into the schedutil governor code to better
understand how uclamp aggregation interacts with frequency selection and
task placement in real workloads. With that in mind, I’ll take a step back
and revisit the broader problem from a more informed perspective.
Hopefully, in the near future, I’ll come up with a more solid and
well-justified solution.

Thanks again for your time and insights.

Best regards,
Zihuan Zhang
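For orientation, a simplified model of the interaction being studied here:
schedutil derives its frequency request from CPU utilization clamped by the
rq-level aggregates. Mainline spreads this across effective_cpu_util() and
the sugov code, and names and signatures vary by kernel version; this sketch
only captures the shape of the logic:

/* Simplified: how rq-level clamp aggregates gate the frequency request. */
static unsigned long toy_sugov_util(struct rq *rq, unsigned long util_cfs)
{
	unsigned long min = uclamp_rq_get(rq, UCLAMP_MIN);
	unsigned long max = uclamp_rq_get(rq, UCLAMP_MAX);

	/*
	 * The rq aggregates are effectively a max over runnable tasks'
	 * clamps, so one runnable kthread with the default max == 1024
	 * lifts the UCLAMP_MAX cap for the whole rq.
	 */
	return clamp(util_cfs, min, max);
}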
Hello Zihuan,

On 7/10/2025 6:17 AM, Zihuan Zhang wrote:
> - For kernel threads that do not set any clamp values, skip the clamp
>   aggregation step
>
> - If a kernel thread explicitly sets clamp attributes, it should of
>   course remain fully visible to uclamp logic.

There are also sched_util_clamp_{min,max} global controls via sysctl
which can be influencing the kthread scheduling / freq behavior
indirectly and glancing at the implementation, I think these are
still handled by clamping in uclamp_eff_get() and effective_cpu_util()
only looks at uclamp_rq_get() to make freq decisions.

Wouldn't excluding the kthreads from the uclamp aggregation also change
this behavior? I'm assuming these global knobs can be used to limit
frequencies when thermal throttle is detected and be reset again once
the SoC falls below the throttle limits?

--
Thanks and Regards,
Prateek
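The global limits Prateek mentions flow into per-task effective values
through uclamp_default[]; the sketch below is abbreviated from mainline's
uclamp_eff_get(), and details may differ across kernel versions:

static inline struct uclamp_se
uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
{
	/* Task request, already restricted by its cgroup's clamps. */
	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
	/* Global limit from sysctl sched_util_clamp_{min,max}. */
	struct uclamp_se uc_max = uclamp_default[clamp_id];

	/* System default restrictions always apply. */
	if (unlikely(uc_req.value > uc_max.value))
		return uc_max;

	return uc_req;
}

Because a PF_KTHREAD bail-out in uclamp_eff_value() would skip this path
entirely, the sysctl limits would stop applying to kthreads, which is the
behavior change Prateek is asking about.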
On 2025/7/10 11:41, K Prateek Nayak wrote:
> Hello Zihuan,
>
> [snip]
>
> There are also sched_util_clamp_{min,max} global controls via sysctl
> which can be influencing the kthread scheduling / freq behavior
> indirectly and glancing at the implementation, I think these are
> still handled by clamping in uclamp_eff_get() and effective_cpu_util()
> only looks at uclamp_rq_get() to make freq decisions.
>
> Wouldn't excluding the kthreads from the uclamp aggregation also change
> this behavior? I'm assuming these global knobs can be used to limit
> frequencies when thermal throttle is detected and be reset again once
> the SoC falls below the throttle limits?

Hi Prateek,

Thanks a lot for pointing this out — I hadn’t fully considered the impact
of global sched_util_clamp_{min,max} settings on kernel threads that don’t
explicitly set clamp values.

I’ll take a closer look at how this behaves in practice and carefully
rethink the approach.

Best regards,
Zihuan Zhang