[PATCH V7 09/11] sched: Add nodelay scheduling

Prakash Sangappa posted 11 patches 2 months, 1 week ago
Posted by Prakash Sangappa 2 months, 1 week ago
Latency-sensitive realtime threads can indicate that they must not be delayed
by a thread running on the CPU that has requested a scheduler time slice
extension.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
---
 init/Kconfig           |  2 +-
 kernel/Kconfig.preempt |  3 +++
 kernel/sched/core.c    | 14 ++++++++++++++
 kernel/sched/rt.c      | 10 +++++-----
 kernel/sched/sched.h   |  1 +
 5 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 3005abab77cf..119448f0b9e1 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1132,7 +1132,7 @@ config SCHED_MM_CID
 
 config RSEQ_RESCHED_DELAY
 	def_bool y
-	depends on SMP && RSEQ && SCHED_HRTICK
+	depends on SMP && RSEQ && SCHED_HRTICK && ARCH_HAS_PREEMPT_NODELAY
 	help
 	  This feature enables a thread to request extending its time slice on
 	  the cpu by delaying preemption.
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index 54ea59ff8fbe..96809d8d8bcb 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -14,6 +14,9 @@ config PREEMPT_BUILD
 config ARCH_HAS_PREEMPT_LAZY
 	bool
 
+config ARCH_HAS_PREEMPT_NODELAY
+	bool
+
 choice
 	prompt "Preemption Model"
 	default PREEMPT_NONE
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 035eec8911c2..e9be8a6b8851 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1185,6 +1185,20 @@ void resched_curr_lazy(struct rq *rq)
 	__resched_curr(rq, get_lazy_tif_bit());
 }
 
+#ifdef	CONFIG_RSEQ_RESCHED_DELAY
+void resched_curr_nodelay(struct rq *rq, struct task_struct *p)
+{
+	int tif;
+	tif = p->sched_nodelay ? TIF_NEED_RESCHED_NODELAY : TIF_NEED_RESCHED;
+	__resched_curr(rq, tif);
+}
+#else
+void resched_curr_nodelay(struct rq *rq, struct task_struct *p)
+{
+	__resched_curr(rq, TIF_NEED_RESCHED);
+}
+#endif
+
 void resched_cpu(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e40422c37033..1beae971799e 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1027,7 +1027,7 @@ static void update_curr_rt(struct rq *rq)
 			rt_rq->rt_time += delta_exec;
 			exceeded = sched_rt_runtime_exceeded(rt_rq);
 			if (exceeded)
-				resched_curr(rq);
+				resched_curr_nodelay(rq, rq->curr);
 			raw_spin_unlock(&rt_rq->rt_runtime_lock);
 			if (exceeded)
 				do_start_rt_bandwidth(sched_rt_bandwidth(rt_rq));
@@ -1634,7 +1634,7 @@ static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
 	 * to try and push the current task away:
 	 */
 	requeue_task_rt(rq, p, 1);
-	resched_curr(rq);
+	resched_curr_nodelay(rq, p);
 }
 
 static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
@@ -1663,7 +1663,7 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
 	struct task_struct *donor = rq->donor;
 
 	if (p->prio < donor->prio) {
-		resched_curr(rq);
+		resched_curr_nodelay(rq, p);
 		return;
 	}
 
@@ -1999,7 +1999,7 @@ static int push_rt_task(struct rq *rq, bool pull)
 	 * just reschedule current.
 	 */
 	if (unlikely(next_task->prio < rq->donor->prio)) {
-		resched_curr(rq);
+		resched_curr_nodelay(rq, next_task);
 		return 0;
 	}
 
@@ -2087,7 +2087,7 @@ static int push_rt_task(struct rq *rq, bool pull)
 	}
 
 	move_queued_task_locked(rq, lowest_rq, next_task);
-	resched_curr(lowest_rq);
+	resched_curr_nodelay(lowest_rq, next_task);
 	ret = 1;
 
 	double_unlock_balance(rq, lowest_rq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f213f9e68aa6..b81354dfed3c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2698,6 +2698,7 @@ extern void init_sched_fair_class(void);
 extern void resched_curr(struct rq *rq);
 extern void resched_curr_lazy(struct rq *rq);
 extern void resched_cpu(int cpu);
+extern void resched_curr_nodelay(struct rq *rq, struct task_struct *p);
 
 extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
 extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);
-- 
2.43.5
Re: [PATCH V7 09/11] sched: Add nodelay scheduling
Posted by Thomas Gleixner 1 month, 4 weeks ago
On Thu, Jul 24 2025 at 16:16, Prakash Sangappa wrote:
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c

You forgot dl.c :)

> @@ -1027,7 +1027,7 @@ static void update_curr_rt(struct rq *rq)
>  			rt_rq->rt_time += delta_exec;
>  			exceeded = sched_rt_runtime_exceeded(rt_rq);
>  			if (exceeded)
> -				resched_curr(rq);
> +				resched_curr_nodelay(rq, rq->curr);

How is this possibly correct?

If the current task has nodelay set, then this means it asks not to be
affected by a slice extension of a lower priority task.
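
For illustration only (not part of the posted patch), the concern can be modeled in a few lines of userspace C. The patch's helper derives the reschedule kind from whichever task it is handed, so the runtime-exceeded path, which passes rq->curr, ends up consulting the flag of the task that is about to be preempted. All names prefixed with sk_ are hypothetical; only the sched_nodelay field name is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct task_struct. */
struct sk_task {
	int prio;            /* lower value means higher priority */
	bool sched_nodelay;  /* mirrors p->sched_nodelay in the patch */
};

/* Hypothetical stand-ins for the two reschedule kinds in the patch. */
enum sk_resched {
	SK_RESCHED,          /* models TIF_NEED_RESCHED */
	SK_RESCHED_NODELAY,  /* models TIF_NEED_RESCHED_NODELAY */
};

/*
 * Models the selection in resched_curr_nodelay(): the result depends
 * only on the task passed in, whichever role that task plays.
 */
static enum sk_resched sk_pick_resched(const struct sk_task *p)
{
	return p->sched_nodelay ? SK_RESCHED_NODELAY : SK_RESCHED;
}
```

With a running task that has sched_nodelay set and a waiting task that does not, sk_pick_resched() returns SK_RESCHED_NODELAY when given the running task and SK_RESCHED when given the waiting one, which is the mismatch with the stated semantics.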

But that aside, I agree with Sebastian, that this is overly complex and
yet another TIF RESCHED flag is just horrible. We should avoid it in the
first place unless there is a real use case.

RT uses the LAZY flag for non-RT tasks, which means if the regular
RESCHED is set on RT, then we just go and preempt and decline the
extension.
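
The rule described above can be sketched as a small decision function. This is a minimal userspace model under stated assumptions, not kernel code: the sk_ names and flag values are hypothetical, and the real check would live in the slice-extension path of the series.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the thread-info flags; not the real values. */
enum {
	SK_NEED_RESCHED      = 1u << 0,  /* regular (immediate) reschedule */
	SK_NEED_RESCHED_LAZY = 1u << 1,  /* lazy reschedule */
};

/*
 * Decide whether a pending reschedule may be deferred for a task that
 * requested a time slice extension. A regular reschedule always wins,
 * so the extension is granted only when LAZY alone is pending.
 */
static bool sk_may_grant_extension(unsigned int tif_flags,
				   bool extension_requested)
{
	if (!extension_requested)
		return false;
	if (tif_flags & SK_NEED_RESCHED)
		return false;  /* preempt now and decline the extension */
	return (tif_flags & SK_NEED_RESCHED_LAZY) != 0;
}
```

Under this model no additional TIF flag is needed: a wakeup that sets the regular flag is never delayed by a pending extension request.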

If there is a real use case somewhere down the road, we can revisit the
problem later. Keep it simple for now.

Thanks,

        tglx
Re: [PATCH V7 09/11] sched: Add nodelay scheduling
Posted by Prakash Sangappa 1 month, 4 weeks ago

> On Aug 8, 2025, at 6:26 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Thu, Jul 24 2025 at 16:16, Prakash Sangappa wrote:
>> --- a/kernel/sched/rt.c
>> +++ b/kernel/sched/rt.c
> 
> You forgot dl.c :)
> 
>> @@ -1027,7 +1027,7 @@ static void update_curr_rt(struct rq *rq)
>> rt_rq->rt_time += delta_exec;
>> exceeded = sched_rt_runtime_exceeded(rt_rq);
>> if (exceeded)
>> - resched_curr(rq);
>> + resched_curr_nodelay(rq, rq->curr);
> 
> How is this possibly correct?
> 
> If the current task has nodelay set, then this means it asks not to be
> affected by a slice extension of a lower priority task.
> 
> But that aside, I agree with Sebastian, that this is overly complex and
> yet another TIF RESCHED flag is just horrible. We should avoid it in the
> first place unless there is a real use case.
> 

This was a prototype. It appears it would get complex.

> RT uses the LAZY flag for non-RT tasks, which means if the regular
> RESCHED is set on RT, then we just go and preempt and decline the
> extension.

So we allow extension only if LAZY is set. 

> 
> If there is a real use case somewhere down the road, we can revisit the
> problem later. Keep it simple for now.

OK, I will drop these patches in the next round.
> 

Thanks,
-Prakash.
> Thanks,
> 
>        tglx
>