Kernel threads (PF_KTHREAD) are not subject to user-defined utilization
clamping. They do not represent user workloads and should not participate
in any uclamp logic, including:
- clamp initialization during fork/post-fork
- effective clamp value computation
- runtime aggregation (uclamp_rq_inc/dec)
Allowing kernel threads into these paths may pollute the rq->uclamp[]
statistics, mislead the schedutil governor's frequency selection, and
complicate debugging and trace interpretation.
This patch ensures that:
- uclamp_fork() and uclamp_post_fork() skip kernel threads
- uclamp_eff_value() returns the default value for kernel threads
- uclamp_rq_inc() and uclamp_rq_dec() skip kernel threads
This aligns the semantics of uclamp with its original intent:
user-space task-specific clamping.
dmesg output from debug logging in uclamp_rq_inc_id():
[ 76.373903] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.375905] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.379837] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:0 value:0 kthread:1
[ 76.379839] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.379839] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.379841] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.383897] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.383897] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:0 value:0 kthread:1
[ 76.383900] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.383901] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.387885] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.387885] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:0 value:0 kthread:1
[ 76.387888] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.387889] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.388139] uclamp_rq_inc_id: task:jbd2/sda3-8 pid:316 clamp_id:0 value:0 kthread:1
[ 76.388140] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
[ 76.388142] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:1 value:1024 kthread:1
[ 76.388143] uclamp_rq_inc_id: task:jbd2/sda3-8 pid:316 clamp_id:1 value:1024 kthread:1
[ 76.388169] uclamp_rq_inc_id: task:kworker/u48:6 pid:93 clamp_id:0 value:0 kthread:1
[ 76.388171] uclamp_rq_inc_id: task:kworker/u48:6 pid:93 clamp_id:1 value:1024 kthread:1
[ 76.388891] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.388893] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.392900] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.392902] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.398850] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:0 value:0 kthread:1
[ 76.398852] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.401880] uclamp_rq_inc_id: task:ksoftirqd/8 pid:67 clamp_id:0 value:0 kthread:1
[ 76.401883] uclamp_rq_inc_id: task:ksoftirqd/8 pid:67 clamp_id:1 value:1024 kthread:1
[ 76.409053] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:0 value:0 kthread:1
[ 76.409054] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.410881] uclamp_rq_inc_id: task:kworker/u48:10 pid:97 clamp_id:0 value:0 kthread:1
[ 76.410884] uclamp_rq_inc_id: task:kworker/u48:10 pid:97 clamp_id:1 value:1024 kthread:1
[ 76.419947] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
[ 76.419949] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:1 value:1024 kthread:1
[ 76.419976] uclamp_rq_inc_id: task:kworker/u48:6 pid:93 clamp_id:0 value:0 kthread:1
[ 76.419979] uclamp_rq_inc_id: task:kworker/u48:6 pid:93 clamp_id:1 value:1024 kthread:1
[ 76.420119] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:0 value:0 kthread:1
[ 76.420121] uclamp_rq_inc_id: task:kworker/2:1H pid:188 clamp_id:1 value:1024 kthread:1
[ 76.420642] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
[ 76.420644] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:1 value:1024 kthread:1
[ 76.434914] uclamp_rq_inc_id: task:kcompactd0 pid:108 clamp_id:0 value:0 kthread:1
[ 76.434916] uclamp_rq_inc_id: task:kcompactd0 pid:108 clamp_id:1 value:1024 kthread:1
[ 76.447689] uclamp_rq_inc_id: task:kworker/3:2 pid:244 clamp_id:0 value:0 kthread:1
[ 76.447691] uclamp_rq_inc_id: task:kworker/3:2 pid:244 clamp_id:1 value:1024 kthread:1
[ 76.447705] uclamp_rq_inc_id: task:ksoftirqd/3 pid:37 clamp_id:0 value:0 kthread:1
[ 76.447707] uclamp_rq_inc_id: task:ksoftirqd/3 pid:37 clamp_id:1 value:1024 kthread:1
[ 76.448809] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.448811] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.451260] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
[ 76.451263] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:1 value:1024 kthread:1
[ 76.452806] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:0 value:0 kthread:1
[ 76.452808] uclamp_rq_inc_id: task:rcu_preempt pid:16 clamp_id:1 value:1024 kthread:1
[ 76.488052] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
[ 76.488054] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:1 value:1024 kthread:1
[ 76.488767] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
[ 76.488770] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:1 value:1024 kthread:1
[ 76.490847] uclamp_rq_inc_id: task:kworker/3:2 pid:244 clamp_id:0 value:0 kthread:1
[ 76.490848] uclamp_rq_inc_id: task:kworker/2:1 pid:143 clamp_id:0 value:0 kthread:1
[ 76.490849] uclamp_rq_inc_id: task:kworker/1:3 pid:462 clamp_id:0 value:0 kthread:1
[ 76.490848] uclamp_rq_inc_id: task:kworker/7:2 pid:687 clamp_id:0 value:0 kthread:1
[ 76.490849] uclamp_rq_inc_id: task:kworker/11:1 pid:146 clamp_id:0 value:0 kthread:1
[ 76.490850] uclamp_rq_inc_id: task:kworker/2:1 pid:143 clamp_id:1 value:1024 kthread:1
[ 76.490851] uclamp_rq_inc_id: task:kworker/3:2 pid:244 clamp_id:1 value:1024 kthread:1
[ 76.490851] uclamp_rq_inc_id: task:kworker/11:1 pid:146 clamp_id:1 value:1024 kthread:1
[ 76.490851] uclamp_rq_inc_id: task:kworker/7:2 pid:687 clamp_id:1 value:1024 kthread:1
[ 76.490853] uclamp_rq_inc_id: task:kworker/1:3 pid:462 clamp_id:1 value:1024 kthread:1
[ 76.490857] uclamp_rq_inc_id: task:kworker/5:1 pid:141 clamp_id:0 value:0 kthread:1
[ 76.490859] uclamp_rq_inc_id: task:kworker/5:1 pid:141 clamp_id:1 value:1024 kthread:1
[ 76.491850] uclamp_rq_inc_id: task:kworker/4:2 pid:534 clamp_id:0 value:0 kthread:1
[ 76.491852] uclamp_rq_inc_id: task:kworker/4:2 pid:534 clamp_id:1 value:1024 kthread:1
[ 76.504848] uclamp_rq_inc_id: task:kworker/10:2 pid:228 clamp_id:0 value:0 kthread:1
[ 76.504852] uclamp_rq_inc_id: task:kworker/10:2 pid:228 clamp_id:1 value:1024 kthread:1
[ 76.508785] uclamp_rq_inc_id: task:kworker/9:1 pid:142 clamp_id:0 value:0 kthread:1
[ 76.508787] uclamp_rq_inc_id: task:kworker/9:1 pid:142 clamp_id:1 value:1024 kthread:1
[ 76.514856] uclamp_rq_inc_id: task:kworker/u48:10 pid:97 clamp_id:0 value:0 kthread:1
[ 76.514859] uclamp_rq_inc_id: task:kworker/u48:10 pid:97 clamp_id:1 value:1024 kthread:1
[ 76.522742] uclamp_rq_inc_id: task:kworker/1:1H pid:153 clamp_id:0 value:0 kthread:1
Signed-off-by: Zihuan Zhang <zhangzihuan@kylinos.cn>
---
kernel/sched/core.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8988d38d46a3..a1e6b4157682 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1630,6 +1630,9 @@ unsigned long uclamp_eff_value(struct task_struct *p, enum uclamp_id clamp_id)
 {
 	struct uclamp_se uc_eff;
 
+	if (p->flags & PF_KTHREAD)
+		return uclamp_none(clamp_id);
+
 	/* Task currently refcounted: use back-annotated (effective) value */
 	if (p->uclamp[clamp_id].active)
 		return (unsigned long)p->uclamp[clamp_id].value;
@@ -1769,6 +1772,9 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p, int flags
 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;
 
+	if (p->flags & PF_KTHREAD)
+		return;
+
 	/* Only inc the delayed task which being woken up. */
 	if (p->se.sched_delayed && !(flags & ENQUEUE_DELAYED))
 		return;
@@ -1797,6 +1803,9 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;
 
+	if (p->flags & PF_KTHREAD)
+		return;
+
 	if (p->se.sched_delayed)
 		return;
 
@@ -1977,6 +1986,8 @@ static void uclamp_fork(struct task_struct *p)
 {
 	enum uclamp_id clamp_id;
 
+	if (p->flags & PF_KTHREAD)
+		return;
 	/*
 	 * We don't need to hold task_rq_lock() when updating p->uclamp_* here
 	 * as the task is still at its early fork stages.
@@ -1995,6 +2006,8 @@ static void uclamp_fork(struct task_struct *p)
 
 static void uclamp_post_fork(struct task_struct *p)
 {
+	if (p->flags & PF_KTHREAD)
+		return;
 	uclamp_update_util_min_rt_default(p);
 }
--
2.25.1
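The dmesg lines above appear to come from ad-hoc debug logging inside
uclamp_rq_inc_id(). A minimal sketch of what such instrumentation could look
like follows; the pr_info() placement and the use of uclamp_req[] are
assumptions inferred from the log format, not part of the posted patch.

/*
 * Hypothetical debug hook matching the log format above; not part of
 * the posted patch. uclamp_rq_inc_id() is the real aggregation helper,
 * the pr_info() is assumed instrumentation.
 */
static inline void uclamp_rq_inc_id(struct rq *rq, struct task_struct *p,
				    enum uclamp_id clamp_id)
{
	pr_info("uclamp_rq_inc_id: task:%s pid:%d clamp_id:%d value:%d kthread:%d\n",
		p->comm, p->pid, (int)clamp_id,
		(int)p->uclamp_req[clamp_id].value,
		!!(p->flags & PF_KTHREAD));

	/* ... existing per-bucket refcounting continues here ... */
}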
Hi zihuan,

On Thu, Jul 3, 2025 at 5:15 PM Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
>
> Kernel threads (PF_KTHREAD) are not subject to user-defined utilization
> clamping. They do not represent user workloads and should not participate
> in any uclamp logic, including:

Indeed, some drivers use sched_setscheduler() to tune some kthreads to
improve performance.
It is not a good idea to exclude them.

Thanks!

> [snip]
Hi Xuewen,

Thanks for your feedback — that makes a lot of sense.

On 2025/7/3 17:42, Xuewen Yan wrote:
> Hi zihuan,
>
> On Thu, Jul 3, 2025 at 5:15 PM Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
>> Kernel threads (PF_KTHREAD) are not subject to user-defined utilization
>> clamping. They do not represent user workloads and should not participate
>> in any uclamp logic, including:
> Indeed, some drivers use sched_setscheduler() to tune some kthreads to
> improve performance.
> It is not a good idea to exclude them.
>
> Thanks!
>
I agree that kernel threads may need explicit scheduling control, so it’s
indeed not a good idea to exclude them unconditionally.

Our main concern was that uclamp_rq_inc() is a performance-sensitive path,
and letting default-initialized kthreads participate in clamp aggregation
could lead to unnecessary overhead and distort frequency decisions.

We’ll rework the patch to be more selective — possibly skipping only those
kernel threads that don’t have user-defined clamp values.

Thanks again for the helpful input!

>> [snip]

Best regards,
Zihuan
On 7/3/25 10:14, Zihuan Zhang wrote:
> Kernel threads (PF_KTHREAD) are not subject to user-defined utilization
> clamping. They do not represent user workloads and should not participate
> in any uclamp logic, including:

Why not?

> [snip]
Hi Christian,

Thanks for the question!

On 2025/7/3 17:22, Christian Loehle wrote:
> On 7/3/25 10:14, Zihuan Zhang wrote:
>> Kernel threads (PF_KTHREAD) are not subject to user-defined utilization
>> clamping. They do not represent user workloads and should not participate
>> in any uclamp logic, including:
> Why not?
>
As Xuewen mentioned, some kernel threads may intentionally set scheduling
attributes for performance. So instead of unconditionally excluding all
kernel threads, I’m now considering a more conservative approach:
skip only those kthreads that haven’t explicitly set any clamp values.

This should help avoid unintended clamp aggregation while still supporting
performance-tuned kthreads.

>> [snip]

Best regards,
Zihuan
On Thu, 3 Jul 2025 18:07:20 +0800
Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:

> As Xuewen mentioned, some kernel threads may intentionally set
> scheduling attributes for performance. So instead of unconditionally
> excluding all kernel threads, I’m now considering a more conservative
> approach:
> skip only those kthreads that haven’t explicitly set any clamp values.

The above comment bothers me. What kernel threads set scheduling attributes?

From my experience working on real-time, no kernel thread gets
scheduling tweaking correct.

-- Steve
Hi Steven,

Thanks for the feedback!

On 2025/7/3 22:51, Steven Rostedt wrote:
> On Thu, 3 Jul 2025 18:07:20 +0800
> Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
>
>> As Xuewen mentioned, some kernel threads may intentionally set
>> scheduling attributes for performance. So instead of unconditionally
>> excluding all kernel threads, I’m now considering a more conservative
>> approach:
>> skip only those kthreads that haven’t explicitly set any clamp values.
> The above comment bothers me. What kernel threads set scheduling attributes?
>
> From my experience working on real-time, no kernel thread gets
> scheduling tweaking correct.
>
> -- Steve

You raised a good point — in most cases, kernel threads don’t explicitly
tweak scheduling attributes, and when they do, it might not always be
correct or effective. I appreciate the insight from your real-time
experience.

The motivation behind this patch is to explore whether it’s worth
optimizing the uclamp hot path a bit further. Since kernel threads
typically don’t benefit from uclamp adjustments and often just inherit
default values (e.g., max=1024), we were wondering if skipping the
aggregation logic for such cases could slightly reduce overhead in some
workloads.

Of course, we want to be conservative and avoid breaking any legitimate
usage. So I’d love to hear your opinion — do you think it’s worthwhile
to pursue this kind of micro-optimization in uclamp, or is the potential
gain too marginal to justify the added logic?

Thanks again for your time and thoughts!

Best regards,
Zihuan Zhang
On Thu, 10 Jul 2025 08:55:28 +0800
Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:

> The motivation behind this patch is to explore whether it’s worth
> optimizing the uclamp hot path a bit further. Since kernel threads
> typically don’t benefit from uclamp adjustments and often just inherit
> default values (e.g., max=1024), we were wondering if skipping the
> aggregation logic for such cases could slightly reduce overhead in some
> workloads.
>
> Of course, we want to be conservative and avoid breaking any legitimate
> usage. So I’d love to hear your opinion — do you think it’s worthwhile
> to pursue this kind of micro-optimization in uclamp, or is the potential
> gain too marginal to justify the added logic?

My honest opinion is that if there's not a huge issue you are trying
to solve, then it's best to leave things as is. Tweaking this for
micro-optimizations usually ends up causing a regression somewhere you
never expected.

-- Steve
On 2025/7/10 22:03, Steven Rostedt wrote:
> On Thu, 10 Jul 2025 08:55:28 +0800
> Zihuan Zhang <zhangzihuan@kylinos.cn> wrote:
>
>> [snip]
>
> My honest opinion is that if there's not a huge issue you are trying
> to solve, then it's best to leave things as is. Tweaking this for
> micro-optimizations usually ends up causing a regression somewhere you
> never expected.
>
> -- Steve

Thanks for the advice! I’ll keep that in mind for my next patch.

Best regards,
Zihuan Zhang
On 7/3/25 11:07, Zihuan Zhang wrote:
> Hi Christian,
>
> Thanks for the question!
>
> On 2025/7/3 17:22, Christian Loehle wrote:
>> On 7/3/25 10:14, Zihuan Zhang wrote:
>>> [snip]
>> Why not?
>>
> As Xuewen mentioned, some kernel threads may intentionally set scheduling
> attributes for performance. So instead of unconditionally excluding all
> kernel threads, I’m now considering a more conservative approach:
> skip only those kthreads that haven’t explicitly set any clamp values.
>
> This should help avoid unintended clamp aggregation while still supporting
> performance-tuned kthreads.

I'm skeptical; fundamentally you cannot exclude some fair tasks from uclamp
logic. At least the cpufreq part they will be affected by, so if you
'exclude' some kthread that doesn't have clamps set (i.e. has min=0,
max=1024), its utilization may not contribute to sugov frequency selection
by being clamped by other task(s). Let's say you only have one other task
with max=0; excluding the unclamped kthread now leads to sugov requesting
the lowest OPP. Is that always correct/desired?

Is there a specific issue you're trying to solve?
FYI there has been discussion around reworking the uclamp mechanism to solve
some issues you may have been facing, but so far they haven't led anywhere:
https://lore.kernel.org/lkml/cover.1741091349.git.hongyan.xia2@arm.com/
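A toy model of the scenario Christian describes, under the assumption that
rq-level aggregation reduces to a max over the runnable tasks' effective
clamps (mainline refcounts per-bucket, but the max is what matters here);
this is an illustration, not kernel code:

/* Toy max-aggregation over runnable tasks' UCLAMP_MAX values. */
static unsigned int toy_rq_uclamp_max(const unsigned int *task_max,
				      int nr, bool skip_unclamped)
{
	unsigned int rq_max = 0;
	int i;

	for (i = 0; i < nr; i++) {
		/* Models "exclude kthreads with the default max of 1024". */
		if (skip_unclamped && task_max[i] == 1024)
			continue;
		if (task_max[i] > rq_max)
			rq_max = task_max[i];
	}
	return rq_max;
}

/*
 * Runnable: one user task with max=0, one unclamped kthread (max=1024):
 *   toy_rq_uclamp_max({0, 1024}, 2, false) == 1024  (kthread may still
 *                                                    raise the OPP)
 *   toy_rq_uclamp_max({0, 1024}, 2, true)  == 0     (sugov caps util at 0
 *                                                    and picks the lowest OPP)
 */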
Hi Christian,

Apologies for the late reply, and thanks for raising the concerns.

On 2025/7/3 18:17, Christian Loehle wrote:
> On 7/3/25 11:07, Zihuan Zhang wrote:
>> [snip]
>
> I'm skeptical; fundamentally you cannot exclude some fair tasks from uclamp
> logic. At least the cpufreq part they will be affected by, so if you
> 'exclude' some kthread that doesn't have clamps set (i.e. has min=0,
> max=1024), its utilization may not contribute to sugov frequency selection
> by being clamped by other task(s). Let's say you only have one other task
> with max=0; excluding the unclamped kthread now leads to sugov requesting
> the lowest OPP. Is that always correct/desired?
>
> Is there a specific issue you're trying to solve?
> FYI there has been discussion around reworking the uclamp mechanism to solve
> some issues you may have been facing, but so far they haven't led anywhere:
> https://lore.kernel.org/lkml/cover.1741091349.git.hongyan.xia2@arm.com/

Our original motivation stems from the observation that uclamp is primarily
designed to manage frequency selection based on user-space task behavior.
Kernel threads typically do not represent user workloads and are often not
considered meaningful participants in uclamp-driven decisions.

To be clear, we are not aiming to exclude all kthreads from affecting
frequency, but rather to explore ways to avoid unnecessary uclamp
aggregation overhead from kernel threads that have no explicit clamp values
set (i.e. uclamp.min=0, max=1024).

As you pointed out, fully excluding these tasks might interfere with sugov
behavior in certain edge cases. So a more balanced approach might be:

- For kernel threads that do not set any clamp values, skip the clamp
  aggregation step

- If a kernel thread explicitly sets clamp attributes, it should of course
  remain fully visible to uclamp logic.

This would preserve correctness while reducing unnecessary overhead in the
hot path, especially on systems with many runnable tasks.

That said, we're open to alternative suggestions if there's a better way to
achieve this optimization. We'll also review the earlier rework discussion
you linked, thank you for pointing that out.

Best regards,
Zihuan Zhang
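One hedged reading of the "skip only default-clamped kthreads" idea: struct
uclamp_se carries a user_defined bit that __setscheduler_uclamp() sets when
a clamp is requested explicitly, so a gate along these lines could
distinguish tuned kthreads from default ones. A sketch of the proposal, not
code anyone has posted:

/* Sketch: true for kthreads whose clamps were never set explicitly. */
static inline bool uclamp_kthread_unclamped(struct task_struct *p)
{
	return (p->flags & PF_KTHREAD) &&
	       !p->uclamp_req[UCLAMP_MIN].user_defined &&
	       !p->uclamp_req[UCLAMP_MAX].user_defined;
}

uclamp_rq_inc()/uclamp_rq_dec() would then bail out early for such tasks;
Christian's reply below explains why even this narrower exclusion still
perturbs UCLAMP_MAX aggregation.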
On 7/10/25 01:47, Zihuan Zhang wrote:
> Hi Christian,
> Apologies for the late reply, and thanks for raising the concerns.
>
> [snip]
>
> Our original motivation stems from the observation that uclamp is primarily
> designed to manage frequency selection based on user-space task behavior.
> Kernel threads typically do not represent user workloads and are often not
> considered meaningful participants in uclamp-driven decisions.

Two comments to that:
- It's also used to drive task placement, not just frequency selection.
- There can be cases where a kthread is fundamentally part of a user workload,
  thinking about io_uring here, but others exist too.

> To be clear, we are not aiming to exclude all kthreads from affecting
> frequency, but rather to explore ways to avoid unnecessary uclamp
> aggregation overhead from kernel threads that have no explicit clamp values
> set (i.e. uclamp.min=0, max=1024).
> As you pointed out, fully excluding these tasks might interfere with sugov
> behavior in certain edge cases. So a more balanced approach might be:
>
> - For kernel threads that do not set any clamp values, skip the clamp
>   aggregation step
>
> - If a kernel thread explicitly sets clamp attributes, it should of course
>   remain fully visible to uclamp logic.
>
> This would preserve correctness while reducing unnecessary overhead in the
> hot path, especially on systems with many runnable tasks.

So an unclamped task not being part of uclamp will definitely affect the
UCLAMP_MAX result; as I've mentioned above, you'll apply (other tasks')
UCLAMP_MAX restrictions even if the kthread has UCLAMP_MAX==1024. That is
not always desirable.
Or would you let it take part in uclamp if the user explicitly set
UCLAMP_MAX==1024 instead of relying on the default? That wouldn't be
consistent IMO.

Regarding the optimization part:
Is there a specific workload where the overhead is an issue? It should
be rather small. Some numbers should help.
On 2025/7/10 16:41, Christian Loehle wrote:
> On 7/10/25 01:47, Zihuan Zhang wrote:
>> [snip]
>
> So an unclamped task not being part of uclamp will definitely affect the
> UCLAMP_MAX result; as I've mentioned above, you'll apply (other tasks')
> UCLAMP_MAX restrictions even if the kthread has UCLAMP_MAX==1024. That is
> not always desirable.
> Or would you let it take part in uclamp if the user explicitly set
> UCLAMP_MAX==1024 instead of relying on the default? That wouldn't be
> consistent IMO.
>
> Regarding the optimization part:
> Is there a specific workload where the overhead is an issue? It should
> be rather small. Some numbers should help.

You’re absolutely right — excluding unclamped kernel threads entirely can
unintentionally affect UCLAMP_MAX aggregation, and may lead to undesirable
behavior in edge cases. I agree that this would not be a consistent or
generally correct approach.

At this stage, I think the idea still lacks maturity, and I appreciate your
input in highlighting the possible implications.

I’m currently diving deeper into the schedutil governor code to better
understand how uclamp aggregation interacts with frequency selection and
task placement in real workloads. With that in mind, I’ll take a step back
and revisit the broader problem from a more informed perspective.
Hopefully, in the near future, I’ll come up with a more solid and
well-justified solution.

Thanks again for your time and insights.

Best regards,
Zihuan Zhang
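For orientation, a simplified model of the interaction being studied here:
schedutil derives its frequency request from CPU utilization clamped by the
rq-level aggregates. Mainline spreads this across effective_cpu_util() and
the sugov code, and names and signatures vary by kernel version; this sketch
only captures the shape of the logic:

/* Simplified: how rq-level clamp aggregates gate the frequency request. */
static unsigned long toy_sugov_util(struct rq *rq, unsigned long util_cfs)
{
	unsigned long min = uclamp_rq_get(rq, UCLAMP_MIN);
	unsigned long max = uclamp_rq_get(rq, UCLAMP_MAX);

	/*
	 * The rq aggregates are effectively a max over runnable tasks'
	 * clamps, so one runnable kthread with the default max == 1024
	 * lifts the UCLAMP_MAX cap for the whole rq.
	 */
	return clamp(util_cfs, min, max);
}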
Hello Zihuan,

On 7/10/2025 6:17 AM, Zihuan Zhang wrote:
> - For kernel threads that do not set any clamp values, skip the clamp
>   aggregation step
>
> - If a kernel thread explicitly sets clamp attributes, it should of
>   course remain fully visible to uclamp logic.

There are also sched_util_clamp_{min,max} global controls via sysctl
which can be influencing the kthread scheduling / freq behavior
indirectly and glancing at the implementation, I think these are
still handled by clamping in uclamp_eff_get() and effective_cpu_util()
only looks at uclamp_rq_get() to make freq decisions.

Wouldn't excluding the kthreads from the uclamp aggregation also change
this behavior? I'm assuming these global knobs can be used to limit
frequencies when thermal throttle is detected and be reset again once
the SoC falls below the throttle limits?

--
Thanks and Regards,
Prateek
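The global limits Prateek mentions flow into per-task effective values
through uclamp_default[]; the sketch below is abbreviated from mainline's
uclamp_eff_get(), and details may differ across kernel versions:

static inline struct uclamp_se
uclamp_eff_get(struct task_struct *p, enum uclamp_id clamp_id)
{
	/* Task request, already restricted by its cgroup's clamps. */
	struct uclamp_se uc_req = uclamp_tg_restrict(p, clamp_id);
	/* Global limit from sysctl sched_util_clamp_{min,max}. */
	struct uclamp_se uc_max = uclamp_default[clamp_id];

	/* System default restrictions always apply. */
	if (unlikely(uc_req.value > uc_max.value))
		return uc_max;

	return uc_req;
}

Because a PF_KTHREAD bail-out in uclamp_eff_value() would skip this path
entirely, the sysctl limits would stop applying to kthreads, which is the
behavior change Prateek is asking about.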
On 2025/7/10 11:41, K Prateek Nayak wrote:
> Hello Zihuan,
>
> [snip]
>
> There are also sched_util_clamp_{min,max} global controls via sysctl
> which can be influencing the kthread scheduling / freq behavior
> indirectly and glancing at the implementation, I think these are
> still handled by clamping in uclamp_eff_get() and effective_cpu_util()
> only looks at uclamp_rq_get() to make freq decisions.
>
> Wouldn't excluding the kthreads from the uclamp aggregation also change
> this behavior? I'm assuming these global knobs can be used to limit
> frequencies when thermal throttle is detected and be reset again once
> the SoC falls below the throttle limits?

Hi Prateek,

Thanks a lot for pointing this out — I hadn’t fully considered the impact
of global sched_util_clamp_{min,max} settings on kernel threads that don’t
explicitly set clamp values.

I’ll take a closer look at how this behaves in practice and carefully
rethink the approach.

Best regards,
Zihuan Zhang