From: Peter Zijlstra <peterz@infradead.org>

Let's define the scheduling context as all the scheduler state
in task_struct for the task selected to run, and the execution
context as all state required to actually run the task.

Currently both are intertwined in task_struct. We want to
logically split these such that we can use the scheduling
context of the task selected to be scheduled, but use the
execution context of a different task to actually be run.

To this purpose, introduce the rq_selected() macro to point to
the task_struct selected from the runqueue by the scheduler; it
will be used for scheduler state, while rq->curr is preserved to
indicate the execution context of the task that will actually be
run.
Cc: Joel Fernandes <joelaf@google.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Youssef Esmat <youssefesmat@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: kernel-team@android.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20181009092434.26221-5-juri.lelli@redhat.com
[add additional comments and update more sched_class code to use
rq::proxy]
Signed-off-by: Connor O'Brien <connoro@google.com>
[jstultz: Rebased and resolved minor collisions, reworked to use
accessors, tweaked update_curr_common to use rq_proxy fixing rt
scheduling issues]
Signed-off-by: John Stultz <jstultz@google.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: Metin Kaya <metin.kaya@arm.com>
Reviewed-by: Metin Kaya <metin.kaya@arm.com>
---
v2:
* Reworked to use accessors
* Fixed update_curr_common to use proxy instead of curr
v3:
* Tweaked wrapper names
* Swapped proxy for selected for clarity
v4:
* Minor variable name tweaks for readability
* Use a macro instead of an inline function and drop
other helper functions as suggested by Peter.
* Remove verbose comments/questions to avoid review
distractions, as suggested by Dietmar
v5:
* Add CONFIG_PROXY_EXEC option to this patch so the
new logic can be tested with this change
* Minor fix to grab rq_selected when holding the rq lock
v7:
* Minor spelling fix and unused argument fixes suggested by
Metin Kaya
* Switch to curr_selected for consistency, and minor rewording
of commit message for clarity
* Rename variables selected instead of curr when we're using
rq_selected()
* Reduce macros in CONFIG_SCHED_PROXY_EXEC ifdef sections,
as suggested by Metin Kaya
v8:
* Use rq->curr, not rq_selected with task_tick, as suggested by
Valentin
* Minor rework to reorder this with CONFIG_SCHED_PROXY_EXEC patch
v10:
* Use rq_selected in push_rt_task & get_push_task
v11:
* Rework to use selected instead of curr in a few cases we were
previously assigning curr = rq_selected() to minimize lines of
change. Suggested by Metin.
---
kernel/sched/core.c | 46 ++++++++++++++++++++++++---------------
kernel/sched/deadline.c | 39 +++++++++++++++++----------------
kernel/sched/fair.c | 32 +++++++++++++--------------
kernel/sched/rt.c | 48 ++++++++++++++++++++---------------------
kernel/sched/sched.h | 27 ++++++++++++++++++++---
5 files changed, 113 insertions(+), 79 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 029e7ecf5ea9..17036bae4a27 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -794,7 +794,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
rq_lock(rq, &rf);
update_rq_clock(rq);
- rq->curr->sched_class->task_tick(rq, rq->curr, 1);
+ rq_selected(rq)->sched_class->task_tick(rq, rq->curr, 1);
rq_unlock(rq, &rf);
return HRTIMER_NORESTART;
@@ -2236,16 +2236,18 @@ static inline void check_class_changed(struct rq *rq, struct task_struct *p,
void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags)
{
- if (p->sched_class == rq->curr->sched_class)
- rq->curr->sched_class->wakeup_preempt(rq, p, flags);
- else if (sched_class_above(p->sched_class, rq->curr->sched_class))
+ struct task_struct *selected = rq_selected(rq);
+
+ if (p->sched_class == selected->sched_class)
+ selected->sched_class->wakeup_preempt(rq, p, flags);
+ else if (sched_class_above(p->sched_class, selected->sched_class))
resched_curr(rq);
/*
* A queue event has occurred, and we're going to schedule. In
* this case, we can save a useless back to back clock update.
*/
- if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
+ if (task_on_rq_queued(selected) && test_tsk_need_resched(rq->curr))
rq_clock_skip_update(rq);
}
@@ -2772,7 +2774,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx)
lockdep_assert_held(&p->pi_lock);
queued = task_on_rq_queued(p);
- running = task_current(rq, p);
+ running = task_current_selected(rq, p);
if (queued) {
/*
@@ -5593,7 +5595,7 @@ unsigned long long task_sched_runtime(struct task_struct *p)
* project cycles that may never be accounted to this
* thread, breaking clock_gettime().
*/
- if (task_current(rq, p) && task_on_rq_queued(p)) {
+ if (task_current_selected(rq, p) && task_on_rq_queued(p)) {
prefetch_curr_exec_start(p);
update_rq_clock(rq);
p->sched_class->update_curr(rq);
@@ -5661,7 +5663,8 @@ void sched_tick(void)
{
int cpu = smp_processor_id();
struct rq *rq = cpu_rq(cpu);
- struct task_struct *curr = rq->curr;
+ /* accounting goes to the selected task */
+ struct task_struct *selected;
struct rq_flags rf;
unsigned long hw_pressure;
u64 resched_latency;
@@ -5672,16 +5675,17 @@ void sched_tick(void)
sched_clock_tick();
rq_lock(rq, &rf);
+ selected = rq_selected(rq);
update_rq_clock(rq);
hw_pressure = arch_scale_hw_pressure(cpu_of(rq));
update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure);
- curr->sched_class->task_tick(rq, curr, 0);
+ selected->sched_class->task_tick(rq, selected, 0);
if (sched_feat(LATENCY_WARN))
resched_latency = cpu_resched_latency(rq);
calc_global_load_tick(rq);
sched_core_tick(rq);
- task_tick_mm_cid(rq, curr);
+ task_tick_mm_cid(rq, selected);
rq_unlock(rq, &rf);
@@ -5690,8 +5694,8 @@ void sched_tick(void)
perf_event_task_tick();
- if (curr->flags & PF_WQ_WORKER)
- wq_worker_tick(curr);
+ if (selected->flags & PF_WQ_WORKER)
+ wq_worker_tick(selected);
#ifdef CONFIG_SMP
rq->idle_balance = idle_cpu(cpu);
@@ -5756,6 +5760,12 @@ static void sched_tick_remote(struct work_struct *work)
struct task_struct *curr = rq->curr;
if (cpu_online(cpu)) {
+ /*
+ * Since this is a remote tick for full dynticks mode,
+ * we are always sure that there is no proxy (only a
+ * single task is running).
+ */
+ SCHED_WARN_ON(rq->curr != rq_selected(rq));
update_rq_clock(rq);
if (!is_idle_task(curr)) {
@@ -6705,6 +6715,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
}
next = pick_next_task(rq, prev, &rf);
+ rq_set_selected(rq, next);
clear_tsk_need_resched(prev);
clear_preempt_need_resched();
#ifdef CONFIG_SCHED_DEBUG
@@ -7215,7 +7226,7 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)
prev_class = p->sched_class;
queued = task_on_rq_queued(p);
- running = task_current(rq, p);
+ running = task_current_selected(rq, p);
if (queued)
dequeue_task(rq, p, queue_flag);
if (running)
@@ -7305,7 +7316,7 @@ void set_user_nice(struct task_struct *p, long nice)
}
queued = task_on_rq_queued(p);
- running = task_current(rq, p);
+ running = task_current_selected(rq, p);
if (queued)
dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
if (running)
@@ -7884,7 +7895,7 @@ static int __sched_setscheduler(struct task_struct *p,
}
queued = task_on_rq_queued(p);
- running = task_current(rq, p);
+ running = task_current_selected(rq, p);
if (queued)
dequeue_task(rq, p, queue_flags);
if (running)
@@ -9311,6 +9322,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
rcu_read_unlock();
rq->idle = idle;
+ rq_set_selected(rq, idle);
rcu_assign_pointer(rq->curr, idle);
idle->on_rq = TASK_ON_RQ_QUEUED;
#ifdef CONFIG_SMP
@@ -9400,7 +9412,7 @@ void sched_setnuma(struct task_struct *p, int nid)
rq = task_rq_lock(p, &rf);
queued = task_on_rq_queued(p);
- running = task_current(rq, p);
+ running = task_current_selected(rq, p);
if (queued)
dequeue_task(rq, p, DEQUEUE_SAVE);
@@ -10505,7 +10517,7 @@ void sched_move_task(struct task_struct *tsk)
update_rq_clock(rq);
- running = task_current(rq, tsk);
+ running = task_current_selected(rq, tsk);
queued = task_on_rq_queued(tsk);
if (queued)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ef135776e068..dbfa14ff16ed 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1217,7 +1217,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
#endif
enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
- if (dl_task(rq->curr))
+ if (dl_task(rq_selected(rq)))
wakeup_preempt_dl(rq, p, 0);
else
resched_curr(rq);
@@ -1441,11 +1441,11 @@ void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
*/
static void update_curr_dl(struct rq *rq)
{
- struct task_struct *curr = rq->curr;
- struct sched_dl_entity *dl_se = &curr->dl;
+ struct task_struct *selected = rq_selected(rq);
+ struct sched_dl_entity *dl_se = &selected->dl;
s64 delta_exec;
- if (!dl_task(curr) || !on_dl_rq(dl_se))
+ if (!dl_task(selected) || !on_dl_rq(dl_se))
return;
/*
@@ -1898,7 +1898,7 @@ static int find_later_rq(struct task_struct *task);
static int
select_task_rq_dl(struct task_struct *p, int cpu, int flags)
{
- struct task_struct *curr;
+ struct task_struct *curr, *selected;
bool select_rq;
struct rq *rq;
@@ -1909,6 +1909,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
rcu_read_lock();
curr = READ_ONCE(rq->curr); /* unlocked access */
+ selected = READ_ONCE(rq_selected(rq));
/*
* If we are dealing with a -deadline task, we must
@@ -1919,9 +1920,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
* other hand, if it has a shorter deadline, we
* try to make it stay here, it might be important.
*/
- select_rq = unlikely(dl_task(curr)) &&
+ select_rq = unlikely(dl_task(selected)) &&
(curr->nr_cpus_allowed < 2 ||
- !dl_entity_preempt(&p->dl, &curr->dl)) &&
+ !dl_entity_preempt(&p->dl, &selected->dl)) &&
p->nr_cpus_allowed > 1;
/*
@@ -1984,7 +1985,7 @@ static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
* let's hope p can move out.
*/
if (rq->curr->nr_cpus_allowed == 1 ||
- !cpudl_find(&rq->rd->cpudl, rq->curr, NULL))
+ !cpudl_find(&rq->rd->cpudl, rq_selected(rq), NULL))
return;
/*
@@ -2023,7 +2024,7 @@ static int balance_dl(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
int flags)
{
- if (dl_entity_preempt(&p->dl, &rq->curr->dl)) {
+ if (dl_entity_preempt(&p->dl, &rq_selected(rq)->dl)) {
resched_curr(rq);
return;
}
@@ -2033,7 +2034,7 @@ static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
* In the unlikely case current and p have the same deadline
* let us try to decide what's the best thing to do...
*/
- if ((p->dl.deadline == rq->curr->dl.deadline) &&
+ if ((p->dl.deadline == rq_selected(rq)->dl.deadline) &&
!test_tsk_need_resched(rq->curr))
check_preempt_equal_dl(rq, p);
#endif /* CONFIG_SMP */
@@ -2065,7 +2066,7 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
if (!first)
return;
- if (rq->curr->sched_class != &dl_sched_class)
+ if (rq_selected(rq)->sched_class != &dl_sched_class)
update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0);
deadline_queue_push_tasks(rq);
@@ -2390,8 +2391,8 @@ static int push_dl_task(struct rq *rq)
* can move away, it makes sense to just reschedule
* without going further in pushing next_task.
*/
- if (dl_task(rq->curr) &&
- dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) &&
+ if (dl_task(rq_selected(rq)) &&
+ dl_time_before(next_task->dl.deadline, rq_selected(rq)->dl.deadline) &&
rq->curr->nr_cpus_allowed > 1) {
resched_curr(rq);
return 0;
@@ -2514,7 +2515,7 @@ static void pull_dl_task(struct rq *this_rq)
* deadline than the current task of its runqueue.
*/
if (dl_time_before(p->dl.deadline,
- src_rq->curr->dl.deadline))
+ rq_selected(src_rq)->dl.deadline))
goto skip;
if (is_migration_disabled(p)) {
@@ -2553,9 +2554,9 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p)
if (!task_on_cpu(rq, p) &&
!test_tsk_need_resched(rq->curr) &&
p->nr_cpus_allowed > 1 &&
- dl_task(rq->curr) &&
+ dl_task(rq_selected(rq)) &&
(rq->curr->nr_cpus_allowed < 2 ||
- !dl_entity_preempt(&p->dl, &rq->curr->dl))) {
+ !dl_entity_preempt(&p->dl, &rq_selected(rq)->dl))) {
push_dl_tasks(rq);
}
}
@@ -2730,12 +2731,12 @@ static void switched_to_dl(struct rq *rq, struct task_struct *p)
return;
}
- if (rq->curr != p) {
+ if (rq_selected(rq) != p) {
#ifdef CONFIG_SMP
if (p->nr_cpus_allowed > 1 && rq->dl.overloaded)
deadline_queue_push_tasks(rq);
#endif
- if (dl_task(rq->curr))
+ if (dl_task(rq_selected(rq)))
wakeup_preempt_dl(rq, p, 0);
else
resched_curr(rq);
@@ -2764,7 +2765,7 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
if (!rq->dl.overloaded)
deadline_queue_pull_task(rq);
- if (task_current(rq, p)) {
+ if (task_current_selected(rq, p)) {
/*
* If we now have a earlier deadline task than p,
* then reschedule, provided p is still on this
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a5b1ae0aa55..4d0d3b423220 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1140,12 +1140,12 @@ static inline void update_curr_task(struct task_struct *p, s64 delta_exec)
*/
s64 update_curr_common(struct rq *rq)
{
- struct task_struct *curr = rq->curr;
+ struct task_struct *selected = rq_selected(rq);
s64 delta_exec;
- delta_exec = update_curr_se(rq, &curr->se);
+ delta_exec = update_curr_se(rq, &selected->se);
if (likely(delta_exec > 0))
- update_curr_task(curr, delta_exec);
+ update_curr_task(selected, delta_exec);
return delta_exec;
}
@@ -1177,7 +1177,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
static void update_curr_fair(struct rq *rq)
{
- update_curr(cfs_rq_of(&rq->curr->se));
+ update_curr(cfs_rq_of(&rq_selected(rq)->se));
}
static inline void
@@ -6646,7 +6646,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
s64 delta = slice - ran;
if (delta < 0) {
- if (task_current(rq, p))
+ if (task_current_selected(rq, p))
resched_curr(rq);
return;
}
@@ -6661,12 +6661,12 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
*/
static void hrtick_update(struct rq *rq)
{
- struct task_struct *curr = rq->curr;
+ struct task_struct *selected = rq_selected(rq);
- if (!hrtick_enabled_fair(rq) || curr->sched_class != &fair_sched_class)
+ if (!hrtick_enabled_fair(rq) || selected->sched_class != &fair_sched_class)
return;
- hrtick_start_fair(rq, curr);
+ hrtick_start_fair(rq, selected);
}
#else /* !CONFIG_SCHED_HRTICK */
static inline void
@@ -8348,9 +8348,9 @@ static void set_next_buddy(struct sched_entity *se)
*/
static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
{
- struct task_struct *curr = rq->curr;
- struct sched_entity *se = &curr->se, *pse = &p->se;
- struct cfs_rq *cfs_rq = task_cfs_rq(curr);
+ struct task_struct *selected = rq_selected(rq);
+ struct sched_entity *se = &selected->se, *pse = &p->se;
+ struct cfs_rq *cfs_rq = task_cfs_rq(selected);
int cse_is_idle, pse_is_idle;
if (unlikely(se == pse))
@@ -8379,11 +8379,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
* prevents us from potentially nominating it as a false LAST_BUDDY
* below.
*/
- if (test_tsk_need_resched(curr))
+ if (test_tsk_need_resched(rq->curr))
return;
/* Idle tasks are by definition preempted by non-idle tasks. */
- if (unlikely(task_has_idle_policy(curr)) &&
+ if (unlikely(task_has_idle_policy(selected)) &&
likely(!task_has_idle_policy(p)))
goto preempt;
@@ -9361,7 +9361,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
* update_load_avg() can call cpufreq_update_util(). Make sure that RT,
* DL and IRQ signals have been updated before updating CFS.
*/
- curr_class = rq->curr->sched_class;
+ curr_class = rq_selected(rq)->sched_class;
hw_pressure = arch_scale_hw_pressure(cpu_of(rq));
@@ -12738,7 +12738,7 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
* our priority decreased, or if we are not currently running on
* this runqueue and our priority is higher than the current's
*/
- if (task_current(rq, p)) {
+ if (task_current_selected(rq, p)) {
if (p->prio > oldprio)
resched_curr(rq);
} else
@@ -12843,7 +12843,7 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
* kick off the schedule if running, otherwise just see
* if we can still preempt the current task.
*/
- if (task_current(rq, p))
+ if (task_current_selected(rq, p))
resched_curr(rq);
else
wakeup_preempt(rq, p, 0);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 56363e18949a..da4cbd744fe6 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -529,7 +529,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
{
- struct task_struct *curr = rq_of_rt_rq(rt_rq)->curr;
+ struct task_struct *selected = rq_selected(rq_of_rt_rq(rt_rq));
struct rq *rq = rq_of_rt_rq(rt_rq);
struct sched_rt_entity *rt_se;
@@ -543,7 +543,7 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
else if (!on_rt_rq(rt_se))
enqueue_rt_entity(rt_se, 0);
- if (rt_rq->highest_prio.curr < curr->prio)
+ if (rt_rq->highest_prio.curr < selected->prio)
resched_curr(rq);
}
}
@@ -999,11 +999,11 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
*/
static void update_curr_rt(struct rq *rq)
{
- struct task_struct *curr = rq->curr;
- struct sched_rt_entity *rt_se = &curr->rt;
+ struct task_struct *selected = rq_selected(rq);
+ struct sched_rt_entity *rt_se = &selected->rt;
s64 delta_exec;
- if (curr->sched_class != &rt_sched_class)
+ if (selected->sched_class != &rt_sched_class)
return;
delta_exec = update_curr_common(rq);
@@ -1542,7 +1542,7 @@ static int find_lowest_rq(struct task_struct *task);
static int
select_task_rq_rt(struct task_struct *p, int cpu, int flags)
{
- struct task_struct *curr;
+ struct task_struct *curr, *selected;
struct rq *rq;
bool test;
@@ -1554,6 +1554,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
rcu_read_lock();
curr = READ_ONCE(rq->curr); /* unlocked access */
+ selected = READ_ONCE(rq_selected(rq));
/*
* If the current task on @p's runqueue is an RT task, then
@@ -1582,8 +1583,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
* systems like big.LITTLE.
*/
test = curr &&
- unlikely(rt_task(curr)) &&
- (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
+ unlikely(rt_task(selected)) &&
+ (curr->nr_cpus_allowed < 2 || selected->prio <= p->prio);
if (test || !rt_task_fits_capacity(p, cpu)) {
int target = find_lowest_rq(p);
@@ -1613,12 +1614,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
{
- /*
- * Current can't be migrated, useless to reschedule,
- * let's hope p can move out.
- */
if (rq->curr->nr_cpus_allowed == 1 ||
- !cpupri_find(&rq->rd->cpupri, rq->curr, NULL))
+ !cpupri_find(&rq->rd->cpupri, rq_selected(rq), NULL))
return;
/*
@@ -1661,7 +1658,9 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
*/
static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
{
- if (p->prio < rq->curr->prio) {
+ struct task_struct *selected = rq_selected(rq);
+
+ if (p->prio < selected->prio) {
resched_curr(rq);
return;
}
@@ -1679,7 +1678,7 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
* to move current somewhere else, making room for our non-migratable
* task.
*/
- if (p->prio == rq->curr->prio && !test_tsk_need_resched(rq->curr))
+ if (p->prio == selected->prio && !test_tsk_need_resched(rq->curr))
check_preempt_equal_prio(rq, p);
#endif
}
@@ -1704,7 +1703,7 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
* utilization. We only care of the case where we start to schedule a
* rt task
*/
- if (rq->curr->sched_class != &rt_sched_class)
+ if (rq_selected(rq)->sched_class != &rt_sched_class)
update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0);
rt_queue_push_tasks(rq);
@@ -1976,6 +1975,7 @@ static struct task_struct *pick_next_pushable_task(struct rq *rq)
BUG_ON(rq->cpu != task_cpu(p));
BUG_ON(task_current(rq, p));
+ BUG_ON(task_current_selected(rq, p));
BUG_ON(p->nr_cpus_allowed <= 1);
BUG_ON(!task_on_rq_queued(p));
@@ -2008,7 +2008,7 @@ static int push_rt_task(struct rq *rq, bool pull)
* higher priority than current. If that's the case
* just reschedule current.
*/
- if (unlikely(next_task->prio < rq->curr->prio)) {
+ if (unlikely(next_task->prio < rq_selected(rq)->prio)) {
resched_curr(rq);
return 0;
}
@@ -2029,7 +2029,7 @@ static int push_rt_task(struct rq *rq, bool pull)
* Note that the stoppers are masqueraded as SCHED_FIFO
* (cf. sched_set_stop_task()), so we can't rely on rt_task().
*/
- if (rq->curr->sched_class != &rt_sched_class)
+ if (rq_selected(rq)->sched_class != &rt_sched_class)
return 0;
cpu = find_lowest_rq(rq->curr);
@@ -2361,7 +2361,7 @@ static void pull_rt_task(struct rq *this_rq)
* p if it is lower in priority than the
* current task on the run queue
*/
- if (p->prio < src_rq->curr->prio)
+ if (p->prio < rq_selected(src_rq)->prio)
goto skip;
if (is_migration_disabled(p)) {
@@ -2403,9 +2403,9 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
bool need_to_push = !task_on_cpu(rq, p) &&
!test_tsk_need_resched(rq->curr) &&
p->nr_cpus_allowed > 1 &&
- (dl_task(rq->curr) || rt_task(rq->curr)) &&
+ (dl_task(rq_selected(rq)) || rt_task(rq_selected(rq))) &&
(rq->curr->nr_cpus_allowed < 2 ||
- rq->curr->prio <= p->prio);
+ rq_selected(rq)->prio <= p->prio);
if (need_to_push)
push_rt_tasks(rq);
@@ -2489,7 +2489,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
rt_queue_push_tasks(rq);
#endif /* CONFIG_SMP */
- if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq)))
+ if (p->prio < rq_selected(rq)->prio && cpu_online(cpu_of(rq)))
resched_curr(rq);
}
}
@@ -2504,7 +2504,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
if (!task_on_rq_queued(p))
return;
- if (task_current(rq, p)) {
+ if (task_current_selected(rq, p)) {
#ifdef CONFIG_SMP
/*
* If our priority decreases while running, we
@@ -2530,7 +2530,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
* greater than the current running task
* then reschedule.
*/
- if (p->prio < rq->curr->prio)
+ if (p->prio < rq_selected(rq)->prio)
resched_curr(rq);
}
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 493de4cc320a..7ee8c7fa0ae8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1051,7 +1051,7 @@ struct rq {
*/
unsigned int nr_uninterruptible;
- struct task_struct __rcu *curr;
+ struct task_struct __rcu *curr; /* Execution context */
struct task_struct *idle;
struct task_struct *stop;
unsigned long next_balance;
@@ -1246,6 +1246,13 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
#define cpu_curr(cpu) (cpu_rq(cpu)->curr)
#define raw_rq() raw_cpu_ptr(&runqueues)
+/* For now, rq_selected == rq->curr */
+#define rq_selected(rq) ((rq)->curr)
+static inline void rq_set_selected(struct rq *rq, struct task_struct *t)
+{
+ /* Do nothing */
+}
+
struct sched_group;
#ifdef CONFIG_SCHED_CORE
static inline struct cpumask *sched_group_span(struct sched_group *sg);
@@ -2151,11 +2158,25 @@ static inline u64 global_rt_runtime(void)
return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC;
}
+/*
+ * Is p the current execution context?
+ */
static inline int task_current(struct rq *rq, struct task_struct *p)
{
return rq->curr == p;
}
+/*
+ * Is p the current scheduling context?
+ *
+ * Note that it might be the current execution context at the same time if
+ * rq->curr == rq_selected() == p.
+ */
+static inline int task_current_selected(struct rq *rq, struct task_struct *p)
+{
+ return rq_selected(rq) == p;
+}
+
static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
{
#ifdef CONFIG_SMP
@@ -2325,7 +2346,7 @@ struct sched_class {
static inline void put_prev_task(struct rq *rq, struct task_struct *prev)
{
- WARN_ON_ONCE(rq->curr != prev);
+ WARN_ON_ONCE(rq_selected(rq) != prev);
prev->sched_class->put_prev_task(rq, prev);
}
@@ -2406,7 +2427,7 @@ extern void set_cpus_allowed_common(struct task_struct *p, struct affinity_conte
static inline struct task_struct *get_push_task(struct rq *rq)
{
- struct task_struct *p = rq->curr;
+ struct task_struct *p = rq_selected(rq);
lockdep_assert_rq_held(rq);
--
2.45.2.993.g49e7a77208-goog
On Tue, Jul 09, 2024 at 01:31:50PM -0700, John Stultz wrote:
> From: Peter Zijlstra <peterz@infradead.org>
>
> Let's define the scheduling context as all the scheduler state
> in task_struct for the task selected to run, and the execution
> context as all state required to actually run the task.
>
> Currently both are intertwined in task_struct. We want to
> logically split these such that we can use the scheduling
> context of the task selected to be scheduled, but use the
> execution context of a different task to actually be run.
>
> To this purpose, introduce rq_selected() macro to point to the
> task_struct selected from the runqueue by the scheduler, and
> will be used for scheduler state, and preserve rq->curr to
> indicate the execution context of the task that will actually be
> run.
> * Swapped proxy for selected for clarity
I'm not loving this naming... what does selected even mean? What was
wrong with proxy? -- (did we have this conversation before?)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 493de4cc320a..7ee8c7fa0ae8 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1051,7 +1051,7 @@ struct rq {
> */
> unsigned int nr_uninterruptible;
>
> - struct task_struct __rcu *curr;
> + struct task_struct __rcu *curr; /* Execution context */
> struct task_struct *idle;
> struct task_struct *stop;
> unsigned long next_balance;
> @@ -1246,6 +1246,13 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
> #define cpu_curr(cpu) (cpu_rq(cpu)->curr)
> #define raw_rq() raw_cpu_ptr(&runqueues)
>
> +/* For now, rq_selected == rq->curr */
> +#define rq_selected(rq) ((rq)->curr)
> +static inline void rq_set_selected(struct rq *rq, struct task_struct *t)
> +{
> + /* Do nothing */
> +}
> +
> struct sched_group;
> #ifdef CONFIG_SCHED_CORE
> static inline struct cpumask *sched_group_span(struct sched_group *sg);
> @@ -2151,11 +2158,25 @@ static inline u64 global_rt_runtime(void)
> return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC;
> }
>
> +/*
> + * Is p the current execution context?
> + */
> static inline int task_current(struct rq *rq, struct task_struct *p)
> {
> return rq->curr == p;
> }
>
> +/*
> + * Is p the current scheduling context?
> + *
> + * Note that it might be the current execution context at the same time if
> + * rq->curr == rq_selected() == p.
> + */
> +static inline int task_current_selected(struct rq *rq, struct task_struct *p)
> +{
> + return rq_selected(rq) == p;
> +}
And I think I hated on the macros before, and you said you needed them
to allow !PROXY builds.
But what about something like:
#ifdef CONFIG_PROXY_EXEC
struct task_struct __rcu *proxy;
struct task_struct __rcu *curr;
#else
union {
struct task_struct __rcu *proxy;
struct task_struct __rcu *curr;
};
#endif
And then we can use rq->proxy and rq->curr like the good old days?
I realize this is going to be a lot of typing to fix up, and perhaps
there's a reason to not do this. But...
On 12/07/24 17:01, Peter Zijlstra wrote:
> On Tue, Jul 09, 2024 at 01:31:50PM -0700, John Stultz wrote:
> > From: Peter Zijlstra <peterz@infradead.org>
> >
> > Let's define the scheduling context as all the scheduler state
> > in task_struct for the task selected to run, and the execution
> > context as all state required to actually run the task.
> >
> > Currently both are intertwined in task_struct. We want to
> > logically split these such that we can use the scheduling
> > context of the task selected to be scheduled, but use the
> > execution context of a different task to actually be run.
> >
> > To this purpose, introduce rq_selected() macro to point to the
> > task_struct selected from the runqueue by the scheduler, and
> > will be used for scheduler state, and preserve rq->curr to
> > indicate the execution context of the task that will actually be
> > run.
>
> > * Swapped proxy for selected for clarity
>
> I'm not loving this naming... what does selected even mean? What was
> wrong with proxy? -- (did we have this conversation before?)

Or maybe curr and sched_ctx as an alternative (if proxy confuses people
:), so that it's more direct/straightforward (even though it's not
symmetric)?
On Fri, Jul 12, 2024 at 8:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jul 09, 2024 at 01:31:50PM -0700, John Stultz wrote:
> > From: Peter Zijlstra <peterz@infradead.org>
> >
> > Let's define the scheduling context as all the scheduler state
> > in task_struct for the task selected to run, and the execution
> > context as all state required to actually run the task.
> >
> > Currently both are intertwined in task_struct. We want to
> > logically split these such that we can use the scheduling
> > context of the task selected to be scheduled, but use the
> > execution context of a different task to actually be run.
> >
> > To this purpose, introduce rq_selected() macro to point to the
> > task_struct selected from the runqueue by the scheduler, and
> > will be used for scheduler state, and preserve rq->curr to
> > indicate the execution context of the task that will actually be
> > run.
>
> > * Swapped proxy for selected for clarity
>
> I'm not loving this naming... what does selected even mean? What was
> wrong with proxy? -- (did we have this conversation before?)
So yeah, this came up earlier:
https://lore.kernel.org/lkml/CANDhNCr3acrEpBYd2LVkY3At=HCDZxGWqbMMwzVJ-Mn--dv3DA@mail.gmail.com/
My big concern is that the way proxy was used early in the series
seemed to be inverted from how the term is commonly used.
A proxy is one who takes an action on behalf of someone else.
In this case we have a blocked task that was picked to run, but then
we run another task in its place. Intuitively, this makes the proxy
the one that actually runs, not the one that was picked. But the
earliest versions of the patch had this flipped, and caused lots of
conceptual confusion in the discussions I had with folks about what
the patch was doing (as well as my own confusion initially working on
the patch).
So to avoid this, I've minimized the use of the term proxy in the
code, trying to use less confusing terms.
To me, selected is a more straightforward term, as it's the task the
scheduler chose to run (but not necessarily the one that actually
runs).
And curr is fine enough, as it is the currently running task.
But I'm not wedded to any particular name. "selected", "chosen",
(Valentin suggested "picked" earlier, which I'd be ok with, but wanted
some sense of consensus before I go through to rework it all), but I
do think "proxy" for the scheduler context would make the code
unnecessarily difficult for others to understand.
> > +static inline int task_current_selected(struct rq *rq, struct task_struct *p)
> > +{
> > + return rq_selected(rq) == p;
> > +}
>
>
> And I think I hated on the macros before, and you said you needed them
> to allow !PROXY builds.
>
> But what about something like:
>
> #ifdef CONFIG_PROXY_EXEC
> struct task_struct __rcu *proxy;
> struct task_struct __rcu *curr;
> #else
> union {
> struct task_struct __rcu *proxy;
> struct task_struct __rcu *curr;
> };
> #endif
>
>
> And then we can use rq->proxy and rq->curr like the good old days?
Ok, yeah, we can use a union for that. I tend to think of unions as a
bit overly subtle for this sort of trick (the wrapper functions make
it more clear there's indirection going on), but that's a taste issue
and I'm not picky.
> I realize this is going to be a lot of typing to fix up, and perhaps
> there's a reason to not do this. But...
I have no problems reworking this if it helps get this series unstuck!
I might strongly push back against "proxy" as the name for the
scheduler context, but if it really is your favorite, I'll hold my
nose to move this along.
Do let me know your final preference and I'll go rework it all asap.
Thanks again so much for providing some feedback here! I really appreciate it!
-john
Hi John,
On 12/07/24 12:10, John Stultz wrote:
> On Fri, Jul 12, 2024 at 8:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Jul 09, 2024 at 01:31:50PM -0700, John Stultz wrote:
> > > From: Peter Zijlstra <peterz@infradead.org>
> > >
> > > Let's define the scheduling context as all the scheduler state
> > > in task_struct for the task selected to run, and the execution
> > > context as all state required to actually run the task.
> > >
> > > Currently both are intertwined in task_struct. We want to
> > > logically split these such that we can use the scheduling
> > > context of the task selected to be scheduled, but use the
> > > execution context of a different task to actually be run.
> > >
> > > To this purpose, introduce rq_selected() macro to point to the
> > > task_struct selected from the runqueue by the scheduler, and
> > > will be used for scheduler state, and preserve rq->curr to
> > > indicate the execution context of the task that will actually be
> > > run.
> >
> > > * Swapped proxy for selected for clarity
> >
> > I'm not loving this naming... what does selected even mean? What was
> > wrong with proxy? -- (did we have this conversation before?)
>
> So yeah, this came up earlier:
> https://lore.kernel.org/lkml/CANDhNCr3acrEpBYd2LVkY3At=HCDZxGWqbMMwzVJ-Mn--dv3DA@mail.gmail.com/
>
> My big concern is that the way proxy was used early in the series
> seemed to be inverted from how the term is commonly used.
>
> A proxy is one who takes an action on behalf of someone else.
>
> In this case we have a blocked task that was picked to run, but then
> we run another task in its place. Intuitively, this makes the proxy
> the one that actually runs, not the one that was picked. But the
> earliest versions of the patch had this flipped, and caused lots of
> conceptual confusion in the discussions I had with folks about what
> the patch was doing (as well as my own confusion initially working on
> the patch).
I don't think I have strong preferences either way, but I actually
considered the proxy to be the blocked donor (the one picked by the
scheduler to run), as it makes the owner use its properties, acting as a
proxy for the owner.
I think that in this case find_proxy_task() might have a confusing name,
as it doesn't actually try to find a proxy, but rather the owner of a
(or a chain of) mutex. We could rename that find_owner_task() and it
seems it might make sense, as we would have
pick_again:
	next = pick_next_task(rq, rq_proxy(rq), &rf);
	rq_set_proxy(rq, next);
	if (unlikely(task_is_blocked(next))) {
		next = find_owner_task(rq, next, &rf);
where, if next was blocked, we search for the owner it is blocked on.
Anyway, just wanted to tell you the way I understood this so far. Happy
to go with what you and Peter decide to go with. :)
Best,
Juri
Sorry for the delay, I need the earth to stop spinning so goddamn fast
:-) 36 hours days ftw or so... Oh wait, that'd mean other people also
increase the amount of crap they send my way, don't it?

Damn..

On Wed, Jul 31, 2024 at 11:11:45AM +0200, Juri Lelli wrote:
> Hi John,
>
> On 12/07/24 12:10, John Stultz wrote:
> > On Fri, Jul 12, 2024 at 8:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Tue, Jul 09, 2024 at 01:31:50PM -0700, John Stultz wrote:
> > > > From: Peter Zijlstra <peterz@infradead.org>

[... quoted patch description trimmed ...]

> > > I'm not loving this naming... what does selected even mean? What was
> > > wrong with proxy? -- (did we have this conversation before?)
> >
> > So yeah, this came up earlier:
> > https://lore.kernel.org/lkml/CANDhNCr3acrEpBYd2LVkY3At=HCDZxGWqbMMwzVJ-Mn--dv3DA@mail.gmail.com/
> >
> > My big concern is that the way proxy was used early in the series
> > seemed to be inverted from how the term is commonly used.
> >
> > A proxy is one who takes an action on behalf of someone else.

Ah, I see your confusion.

> > In this case we have a blocked task that was picked to run, but then
> > we run another task in its place. Intuitively, this makes the proxy
> > the one that actually runs, not the one that was picked. But the
> > earliest versions of the patch had this flipped, and caused lots of
> > conceptual confusion in the discussions I had with folks about what
> > the patch was doing (as well as my own confusion initially working on
> > the patch).
>
> I don't think I have strong preferences either way, but I actually
> considered the proxy to be the blocked donor (the one picked by the
> scheduler to run), as it makes the owner use its properties, acting as a
> proxy for the owner.

This. But I suspect we both suffer from not being native English
speakers.

Would 'donor' work in this case?

Then the donor gets all the accounting done on it, while we execute
curr.
On Wed, Jul 31, 2024 at 4:37 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Sorry for the delay, I need the earth to stop spinning so goddamn fast
> :-) 36 hours days ftw or so... Oh wait, that'd mean other people also
> increase the amount of crap they send my way, don't it?
>
> Damn..

Yeah. My sympathies! I do really appreciate your taking time to
provide feedback here (amongst all the other eevdf patches and
sched_ext stuff you're reviewing).

> Would 'donor' work in this case?
>
> Then the donor gets all the accounting done on it, while we execute
> curr.

'Donor' is great, I'll switch to that.

Thanks again for getting back to me here!
-john
On Wed, Jul 31, 2024 at 04:32:33PM -0700 John Stultz wrote:
> On Wed, Jul 31, 2024 at 4:37 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Sorry for the delay, I need the earth to stop spinning so goddamn fast
> > :-) 36 hours days ftw or so... Oh wait, that'd mean other people also
> > increase the amount of crap they send my way, don't it?
> >
> > Damn..
>
> Yeah. My sympathies! I do really appreciate your taking time to
> provide feedback here (amongst all the other eevdf patches and
> sched_ext stuff you're reviewing).
>

Not to mention way too many rust patches...

> > Would 'donor' work in this case?
> >
> > Then the donor gets all the accounting done on it, while we execute
> > curr.
>
> 'Donor' is great, I'll switch to that.
>
> Thanks again for getting back to me here!
> -john
>
--
On Wed, Jul 31, 2024 at 01:37:20PM +0200 Peter Zijlstra wrote:
>
> Sorry for the delay, I need the earth to stop spinning so goddamn fast
> :-) 36 hours days ftw or so... Oh wait, that'd mean other people also
> increase the amount of crap they send my way, don't it?
>
> Damn..
>
> On Wed, Jul 31, 2024 at 11:11:45AM +0200, Juri Lelli wrote:
> > Hi John,
> >
> > On 12/07/24 12:10, John Stultz wrote:
> > > On Fri, Jul 12, 2024 at 8:02 AM Peter Zijlstra <peterz@infradead.org> wrote:

[... quoted patch description and earlier discussion trimmed ...]

> > I don't think I have strong preferences either way, but I actually
> > considered the proxy to be the blocked donor (the one picked by the
> > scheduler to run), as it makes the owner use its properties, acting as a
> > proxy for the owner.
>
> This. But I suspect we both suffer from not being native English
> speakers.
>
> Would 'donor' work in this case?
>
> Then the donor gets all the accounting done on it, while we execute
> curr.
>

"donor" works for me and is more expressive (and shorter) than
"selected". Fwiw.

Cheers,
Phil

--
On Wed, 31 Jul 2024 13:37:20 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> > I don't think I have strong preferences either way, but I actually
> > considered the proxy to be the blocked donor (the one picked by the
> > scheduler to run), as it makes the owner use its properties, acting as a
> > proxy for the owner.
>
> This. But I suspect we both suffer from not being native English
> speakers.

Has nothing to do with being a native English speaker or not. "proxy"
is not a commonly used word and can easily be misused.

> Would 'donor' work in this case?
>
> Then the donor gets all the accounting done on it, while we execute
> curr.

I agree that "donor" is a bit easier to understand.

-- Steve
On Fri, Jul 12, 2024 at 12:10 PM John Stultz <jstultz@google.com> wrote:
> On Fri, Jul 12, 2024 at 8:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > I'm not loving this naming... what does selected even mean? What was
> > wrong with proxy? -- (did we have this conversation before?)

[... quoted naming discussion trimmed ...]

> > I realize this is going to be a lot of typing to fix up, and perhaps
> > there's a reason to not do this. But...
>
> I have no problems reworking this if it helps get this series unstuck!
>
> I might strongly push back against "proxy" as the name for the
> scheduler context, but if it really is your favorite, I'll hold my
> nose to move this along.
>
> Do let me know your final preference and I'll go rework it all asap.

Hey Peter,
  Just wanted to check in on this in case the mail slipped by. I'm
eager to rework the naming and get this re-sent, but I just wanted to
confirm your preference here.

thanks
-john