From nobody Wed Dec 31 08:30:07 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EFC9AC4167B for ; Mon, 6 Nov 2023 19:36:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232984AbjKFTgq (ORCPT ); Mon, 6 Nov 2023 14:36:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57320 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233024AbjKFTgI (ORCPT ); Mon, 6 Nov 2023 14:36:08 -0500 Received: from mail-pf1-x44a.google.com (mail-pf1-x44a.google.com [IPv6:2607:f8b0:4864:20::44a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 308AA10D2 for ; Mon, 6 Nov 2023 11:35:59 -0800 (PST) Received: by mail-pf1-x44a.google.com with SMTP id d2e1a72fcca58-6b31cb3cc7eso3225270b3a.0 for ; Mon, 06 Nov 2023 11:35:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699299358; x=1699904158; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ClsCieoc65HaPuChloqB8WZHNaRbpIH283rOAN9AQVQ=; b=zv0CaL8fWlXFmEp9Ha6phT8k1R1y86ZLSsJT98IS1GRaNBeN4+JvwONvBIVSXROSwv /MNyR6QA6ZewhZJ/oToobTZ0HkgejG8DmabpVe+bfxL8cU89sEntbS7vgorbhiy+eDoF POztwXu5lc5gNzgo7845Kz9BzLqSEHrtjx4IY7HuxNyDrCBwksL+zVr1tbL0FzCbFxQO ZudRiGAxReOXxud+IDlg1N89RLybWsIAGvBC6uLAHrqpEHmB55hVKIPfm19Jqv0fSoKg mPgWeTSakpYT5lqbNQLgZ7sHAToulqHRraU3NFmFwXQHcK9zIbEVYu0z9SgrwlmJtJp3 1XhA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699299358; x=1699904158; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ClsCieoc65HaPuChloqB8WZHNaRbpIH283rOAN9AQVQ=; b=My7z/bCyyqfayLLV2sZOEAyxyEQdTXdVjtD0iqW2BrkdnoKW1X3h48yLTfOF9wzeLC 0+mMmkSu3PkTvvaAXJB+swd1kdGjsFD4NXZiwg9x9VFTiQoNatNrfCjJWYxf8sYQqk1L rxMhNfmtH3Xxf4NsW+2+fReuzRDO95TohdMlNeDJQIY+1sascw8gxFgrKtqtNqNIEW9x VOKcqOVJCoor2qB+DkfZOMn1Y/06KLbLm0mMhwxqQun5MQrUz+6P88pVkPx5vDt4MsC7 afCba1WKtf8ISz2PNrYWxFbbo6PHCBMxNq5t3O9+lAjbZP/3U32lE9UA/a6qtVhfDKfv OhPg== X-Gm-Message-State: AOJu0Yw+QV64ABFZMJiXzWJC0YoctoGha11rEazIIGMRIQQDhsTyy7S8 LjO1P88VvgYvb/9r+a9eAguZbYy2HYm5k0Qy3XJ6SekPoVoJzLm/6saCZhxsTtYPWtiiSNPcbto JTOgtZe9haS0k48wr4nTarwifUFUonNyx6m2TerZ5cHRObnHKOI+eNV1Lbqq69h0eGcgCEdA= X-Google-Smtp-Source: AGHT+IHtI/o5pNeGz2cY7lq9IkPnYGSTCv2/jAujz0MnI+G/LRxESQ+2O6giZBrdCidFPtMXcGmuDiYAecPK X-Received: from jstultz-noogler2.c.googlers.com ([fda3:e722:ac3:cc00:24:72f4:c0a8:600]) (user=jstultz job=sendgmr) by 2002:a05:6a00:26ea:b0:692:c216:8830 with SMTP id p42-20020a056a0026ea00b00692c2168830mr16132pfw.0.1699299357192; Mon, 06 Nov 2023 11:35:57 -0800 (PST) Date: Mon, 6 Nov 2023 19:34:54 +0000 In-Reply-To: <20231106193524.866104-1-jstultz@google.com> Mime-Version: 1.0 References: <20231106193524.866104-1-jstultz@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231106193524.866104-12-jstultz@google.com> Subject: [PATCH v6 11/20] sched: Split scheduler execution context From: John Stultz To: LKML Cc: Peter Zijlstra , Joel Fernandes , Qais Yousef , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Valentin Schneider , Steven Rostedt , Ben Segall , Zimuzo Ezeozue , Youssef Esmat , Mel Gorman , Daniel Bristot de Oliveira , Will Deacon , Waiman Long , Boqun Feng , "Paul E . McKenney" , kernel-team@android.com, "Connor O'Brien" , John Stultz Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Peter Zijlstra Lets define the scheduling context as all the scheduler state in task_struct and the execution context as all state required to run the task. Currently both are intertwined in task_struct. We want to logically split these such that we can run the execution context of one task with the scheduling context of another. To this purpose introduce rq_selected() macro to point to the task_struct used for scheduler state and preserve rq->curr to denote the execution context. NOTE: Peter previously mentioned he didn't like the name "rq_selected()", but I've not come up with a better alternative. I'm very open to other name proposals. Question for Peter: Dietmar suggested you'd prefer I drop the conditionalization of the scheduler context pointer on the rq (so rq_selected() would be open coded as rq->curr_sched or whatever we agree on for a name), but I'd think in the !CONFIG_PROXY_EXEC case we'd want to avoid the wasted pointer and its use (since it curr_sched would always be =3D=3D curr)? If I'm wrong I'm fine switching this, but would appreciate clarification. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Youssef Esmat Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E . McKenney" Cc: kernel-team@android.com Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Juri Lelli Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20181009092434.26221-5-juri.lelli@redhat.com [add additional comments and update more sched_class code to use rq::proxy] Signed-off-by: Connor O'Brien [jstultz: Rebased and resolved minor collisions, reworked to use accessors, tweaked update_curr_common to use rq_proxy fixing rt scheduling issues] Signed-off-by: John Stultz --- v2: * Reworked to use accessors * Fixed update_curr_common to use proxy instead of curr v3: * Tweaked wrapper names * Swapped proxy for selected for clarity v4: * Minor variable name tweaks for readability * Use a macro instead of a inline function and drop other helper functions as suggested by Peter. * Remove verbose comments/questions to avoid review distractions, as suggested by Dietmar v5: * Add CONFIG_PROXY_EXEC option to this patch so the new logic can be tested with this change * Minor fix to grab rq_selected when holding the rq lock --- kernel/sched/core.c | 38 +++++++++++++++++++++++++------------- kernel/sched/deadline.c | 35 ++++++++++++++++++----------------- kernel/sched/fair.c | 18 +++++++++--------- kernel/sched/rt.c | 40 ++++++++++++++++++++-------------------- kernel/sched/sched.h | 37 +++++++++++++++++++++++++++++++++++-- 5 files changed, 107 insertions(+), 61 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index a38bf8ef5798..9931940ba474 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -824,7 +824,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *time= r) =20 rq_lock(rq, &rf); update_rq_clock(rq); - rq->curr->sched_class->task_tick(rq, rq->curr, 1); + rq_selected(rq)->sched_class->task_tick(rq, rq_selected(rq), 1); rq_unlock(rq, &rf); =20 return HRTIMER_NORESTART; @@ -2251,16 +2251,18 @@ static inline void check_class_changed(struct rq *r= q, struct task_struct *p, =20 void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) { - if (p->sched_class =3D=3D rq->curr->sched_class) - rq->curr->sched_class->check_preempt_curr(rq, p, flags); - else if (sched_class_above(p->sched_class, rq->curr->sched_class)) + struct task_struct *curr =3D rq_selected(rq); + + if (p->sched_class =3D=3D curr->sched_class) + curr->sched_class->check_preempt_curr(rq, p, flags); + else if (sched_class_above(p->sched_class, curr->sched_class)) resched_curr(rq); =20 /* * A queue event has occurred, and we're going to schedule. In * this case, we can save a useless back to back clock update. */ - if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr)) + if (task_on_rq_queued(curr) && test_tsk_need_resched(rq->curr)) rq_clock_skip_update(rq); } =20 @@ -2799,7 +2801,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct a= ffinity_context *ctx) lockdep_assert_held(&p->pi_lock); =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); =20 if (queued) { /* @@ -5602,7 +5604,7 @@ unsigned long long task_sched_runtime(struct task_str= uct *p) * project cycles that may never be accounted to this * thread, breaking clock_gettime(). */ - if (task_current(rq, p) && task_on_rq_queued(p)) { + if (task_current_selected(rq, p) && task_on_rq_queued(p)) { prefetch_curr_exec_start(p); update_rq_clock(rq); p->sched_class->update_curr(rq); @@ -5670,7 +5672,8 @@ void scheduler_tick(void) { int cpu =3D smp_processor_id(); struct rq *rq =3D cpu_rq(cpu); - struct task_struct *curr =3D rq->curr; + /* accounting goes to the selected task */ + struct task_struct *curr; struct rq_flags rf; unsigned long thermal_pressure; u64 resched_latency; @@ -5681,6 +5684,7 @@ void scheduler_tick(void) sched_clock_tick(); =20 rq_lock(rq, &rf); + curr =3D rq_selected(rq); =20 update_rq_clock(rq); thermal_pressure =3D arch_scale_thermal_pressure(cpu_of(rq)); @@ -5765,6 +5769,12 @@ static void sched_tick_remote(struct work_struct *wo= rk) struct task_struct *curr =3D rq->curr; =20 if (cpu_online(cpu)) { + /* + * Since this is a remote tick for full dynticks mode, + * we are always sure that there is no proxy (only a + * single task is running). + */ + SCHED_WARN_ON(rq->curr !=3D rq_selected(rq)); update_rq_clock(rq); =20 if (!is_idle_task(curr)) { @@ -6688,6 +6698,7 @@ static void __sched notrace __schedule(unsigned int s= ched_mode) } =20 next =3D pick_next_task(rq, prev, &rf); + rq_set_selected(rq, next); clear_tsk_need_resched(prev); clear_preempt_need_resched(); #ifdef CONFIG_SCHED_DEBUG @@ -7154,7 +7165,7 @@ void rt_mutex_setprio(struct task_struct *p, struct t= ask_struct *pi_task) =20 prev_class =3D p->sched_class; queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, queue_flag); if (running) @@ -7242,7 +7253,7 @@ void set_user_nice(struct task_struct *p, long nice) goto out_unlock; } queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK); if (running) @@ -7826,7 +7837,7 @@ static int __sched_setscheduler(struct task_struct *p, } =20 queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); if (queued) dequeue_task(rq, p, queue_flags); if (running) @@ -9327,6 +9338,7 @@ void __init init_idle(struct task_struct *idle, int c= pu) rcu_read_unlock(); =20 rq->idle =3D idle; + rq_set_selected(rq, idle); rcu_assign_pointer(rq->curr, idle); idle->on_rq =3D TASK_ON_RQ_QUEUED; #ifdef CONFIG_SMP @@ -9416,7 +9428,7 @@ void sched_setnuma(struct task_struct *p, int nid) =20 rq =3D task_rq_lock(p, &rf); queued =3D task_on_rq_queued(p); - running =3D task_current(rq, p); + running =3D task_current_selected(rq, p); =20 if (queued) dequeue_task(rq, p, DEQUEUE_SAVE); @@ -10543,7 +10555,7 @@ void sched_move_task(struct task_struct *tsk) =20 update_rq_clock(rq); =20 - running =3D task_current(rq, tsk); + running =3D task_current_selected(rq, tsk); queued =3D task_on_rq_queued(tsk); =20 if (queued) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 9522e6607754..e8bca6b8da6f 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1174,7 +1174,7 @@ static enum hrtimer_restart dl_task_timer(struct hrti= mer *timer) #endif =20 enqueue_task_dl(rq, p, ENQUEUE_REPLENISH); - if (dl_task(rq->curr)) + if (dl_task(rq_selected(rq))) check_preempt_curr_dl(rq, p, 0); else resched_curr(rq); @@ -1297,7 +1297,7 @@ static u64 grub_reclaim(u64 delta, struct rq *rq, str= uct sched_dl_entity *dl_se) */ static void update_curr_dl(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); struct sched_dl_entity *dl_se =3D &curr->dl; s64 delta_exec, scaled_delta_exec; int cpu =3D cpu_of(rq); @@ -1810,7 +1810,7 @@ static int find_later_rq(struct task_struct *task); static int select_task_rq_dl(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *selected; bool select_rq; struct rq *rq; =20 @@ -1821,6 +1821,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int= flags) =20 rcu_read_lock(); curr =3D READ_ONCE(rq->curr); /* unlocked access */ + selected =3D READ_ONCE(rq_selected(rq)); =20 /* * If we are dealing with a -deadline task, we must @@ -1831,9 +1832,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int= flags) * other hand, if it has a shorter deadline, we * try to make it stay here, it might be important. */ - select_rq =3D unlikely(dl_task(curr)) && + select_rq =3D unlikely(dl_task(selected)) && (curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &curr->dl)) && + !dl_entity_preempt(&p->dl, &selected->dl)) && p->nr_cpus_allowed > 1; =20 /* @@ -1896,7 +1897,7 @@ static void check_preempt_equal_dl(struct rq *rq, str= uct task_struct *p) * let's hope p can move out. */ if (rq->curr->nr_cpus_allowed =3D=3D 1 || - !cpudl_find(&rq->rd->cpudl, rq->curr, NULL)) + !cpudl_find(&rq->rd->cpudl, rq_selected(rq), NULL)) return; =20 /* @@ -1935,7 +1936,7 @@ static int balance_dl(struct rq *rq, struct task_stru= ct *p, struct rq_flags *rf) static void check_preempt_curr_dl(struct rq *rq, struct task_struct *p, int flags) { - if (dl_entity_preempt(&p->dl, &rq->curr->dl)) { + if (dl_entity_preempt(&p->dl, &rq_selected(rq)->dl)) { resched_curr(rq); return; } @@ -1945,7 +1946,7 @@ static void check_preempt_curr_dl(struct rq *rq, stru= ct task_struct *p, * In the unlikely case current and p have the same deadline * let us try to decide what's the best thing to do... */ - if ((p->dl.deadline =3D=3D rq->curr->dl.deadline) && + if ((p->dl.deadline =3D=3D rq_selected(rq)->dl.deadline) && !test_tsk_need_resched(rq->curr)) check_preempt_equal_dl(rq, p); #endif /* CONFIG_SMP */ @@ -1980,7 +1981,7 @@ static void set_next_task_dl(struct rq *rq, struct ta= sk_struct *p, bool first) if (hrtick_enabled_dl(rq)) start_hrtick_dl(rq, p); =20 - if (rq->curr->sched_class !=3D &dl_sched_class) + if (rq_selected(rq)->sched_class !=3D &dl_sched_class) update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0); =20 deadline_queue_push_tasks(rq); @@ -2297,8 +2298,8 @@ static int push_dl_task(struct rq *rq) * can move away, it makes sense to just reschedule * without going further in pushing next_task. */ - if (dl_task(rq->curr) && - dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) && + if (dl_task(rq_selected(rq)) && + dl_time_before(next_task->dl.deadline, rq_selected(rq)->dl.deadline) = && rq->curr->nr_cpus_allowed > 1) { resched_curr(rq); return 0; @@ -2423,7 +2424,7 @@ static void pull_dl_task(struct rq *this_rq) * deadline than the current task of its runqueue. */ if (dl_time_before(p->dl.deadline, - src_rq->curr->dl.deadline)) + rq_selected(src_rq)->dl.deadline)) goto skip; =20 if (is_migration_disabled(p)) { @@ -2462,9 +2463,9 @@ static void task_woken_dl(struct rq *rq, struct task_= struct *p) if (!task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - dl_task(rq->curr) && + dl_task(rq_selected(rq)) && (rq->curr->nr_cpus_allowed < 2 || - !dl_entity_preempt(&p->dl, &rq->curr->dl))) { + !dl_entity_preempt(&p->dl, &rq_selected(rq)->dl))) { push_dl_tasks(rq); } } @@ -2639,12 +2640,12 @@ static void switched_to_dl(struct rq *rq, struct ta= sk_struct *p) return; } =20 - if (rq->curr !=3D p) { + if (rq_selected(rq) !=3D p) { #ifdef CONFIG_SMP if (p->nr_cpus_allowed > 1 && rq->dl.overloaded) deadline_queue_push_tasks(rq); #endif - if (dl_task(rq->curr)) + if (dl_task(rq_selected(rq))) check_preempt_curr_dl(rq, p, 0); else resched_curr(rq); @@ -2673,7 +2674,7 @@ static void prio_changed_dl(struct rq *rq, struct tas= k_struct *p, if (!rq->dl.overloaded) deadline_queue_pull_task(rq); =20 - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { /* * If we now have a earlier deadline task than p, * then reschedule, provided p is still on this diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index c919633acd3d..3d5c1ec34bf7 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1172,7 +1172,7 @@ static s64 update_curr_se(struct rq *rq, struct sched= _entity *curr) */ s64 update_curr_common(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); s64 delta_exec; =20 delta_exec =3D update_curr_se(rq, &curr->se); @@ -1218,7 +1218,7 @@ static void update_curr(struct cfs_rq *cfs_rq) =20 static void update_curr_fair(struct rq *rq) { - update_curr(cfs_rq_of(&rq->curr->se)); + update_curr(cfs_rq_of(&rq_selected(rq)->se)); } =20 static inline void @@ -6461,7 +6461,7 @@ static void hrtick_start_fair(struct rq *rq, struct t= ask_struct *p) s64 delta =3D slice - ran; =20 if (delta < 0) { - if (task_current(rq, p)) + if (task_current_selected(rq, p)) resched_curr(rq); return; } @@ -6476,7 +6476,7 @@ static void hrtick_start_fair(struct rq *rq, struct t= ask_struct *p) */ static void hrtick_update(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); =20 if (!hrtick_enabled_fair(rq) || curr->sched_class !=3D &fair_sched_class) return; @@ -8082,7 +8082,7 @@ static void set_next_buddy(struct sched_entity *se) */ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int= wake_flags) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); struct sched_entity *se =3D &curr->se, *pse =3D &p->se; struct cfs_rq *cfs_rq =3D task_cfs_rq(curr); int next_buddy_marked =3D 0; @@ -8115,7 +8115,7 @@ static void check_preempt_wakeup(struct rq *rq, struc= t task_struct *p, int wake_ * prevents us from potentially nominating it as a false LAST_BUDDY * below. */ - if (test_tsk_need_resched(curr)) + if (test_tsk_need_resched(rq->curr)) return; =20 /* Idle tasks are by definition preempted by non-idle tasks. */ @@ -9099,7 +9099,7 @@ static bool __update_blocked_others(struct rq *rq, bo= ol *done) * update_load_avg() can call cpufreq_update_util(). Make sure that RT, * DL and IRQ signals have been updated before updating CFS. */ - curr_class =3D rq->curr->sched_class; + curr_class =3D rq_selected(rq)->sched_class; =20 thermal_pressure =3D arch_scale_thermal_pressure(cpu_of(rq)); =20 @@ -12471,7 +12471,7 @@ prio_changed_fair(struct rq *rq, struct task_struct= *p, int oldprio) * our priority decreased, or if we are not currently running on * this runqueue and our priority is higher than the current's */ - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { if (p->prio > oldprio) resched_curr(rq); } else @@ -12574,7 +12574,7 @@ static void switched_to_fair(struct rq *rq, struct = task_struct *p) * kick off the schedule if running, otherwise just see * if we can still preempt the current task. */ - if (task_current(rq, p)) + if (task_current_selected(rq, p)) resched_curr(rq); else check_preempt_curr(rq, p, 0); diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 327ae4148aec..bc243e70bc0e 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -574,7 +574,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *r= t_se, unsigned int flags) =20 static void sched_rt_rq_enqueue(struct rt_rq *rt_rq) { - struct task_struct *curr =3D rq_of_rt_rq(rt_rq)->curr; + struct task_struct *curr =3D rq_selected(rq_of_rt_rq(rt_rq)); struct rq *rq =3D rq_of_rt_rq(rt_rq); struct sched_rt_entity *rt_se; =20 @@ -1044,7 +1044,7 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt= _rq) */ static void update_curr_rt(struct rq *rq) { - struct task_struct *curr =3D rq->curr; + struct task_struct *curr =3D rq_selected(rq); struct sched_rt_entity *rt_se =3D &curr->rt; s64 delta_exec; =20 @@ -1591,7 +1591,7 @@ static int find_lowest_rq(struct task_struct *task); static int select_task_rq_rt(struct task_struct *p, int cpu, int flags) { - struct task_struct *curr; + struct task_struct *curr, *selected; struct rq *rq; bool test; =20 @@ -1603,6 +1603,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int= flags) =20 rcu_read_lock(); curr =3D READ_ONCE(rq->curr); /* unlocked access */ + selected =3D READ_ONCE(rq_selected(rq)); =20 /* * If the current task on @p's runqueue is an RT task, then @@ -1631,8 +1632,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int= flags) * systems like big.LITTLE. */ test =3D curr && - unlikely(rt_task(curr)) && - (curr->nr_cpus_allowed < 2 || curr->prio <=3D p->prio); + unlikely(rt_task(selected)) && + (curr->nr_cpus_allowed < 2 || selected->prio <=3D p->prio); =20 if (test || !rt_task_fits_capacity(p, cpu)) { int target =3D find_lowest_rq(p); @@ -1662,12 +1663,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, in= t flags) =20 static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p) { - /* - * Current can't be migrated, useless to reschedule, - * let's hope p can move out. - */ if (rq->curr->nr_cpus_allowed =3D=3D 1 || - !cpupri_find(&rq->rd->cpupri, rq->curr, NULL)) + !cpupri_find(&rq->rd->cpupri, rq_selected(rq), NULL)) return; =20 /* @@ -1710,7 +1707,9 @@ static int balance_rt(struct rq *rq, struct task_stru= ct *p, struct rq_flags *rf) */ static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, in= t flags) { - if (p->prio < rq->curr->prio) { + struct task_struct *curr =3D rq_selected(rq); + + if (p->prio < curr->prio) { resched_curr(rq); return; } @@ -1728,7 +1727,7 @@ static void check_preempt_curr_rt(struct rq *rq, stru= ct task_struct *p, int flag * to move current somewhere else, making room for our non-migratable * task. */ - if (p->prio =3D=3D rq->curr->prio && !test_tsk_need_resched(rq->curr)) + if (p->prio =3D=3D curr->prio && !test_tsk_need_resched(rq->curr)) check_preempt_equal_prio(rq, p); #endif } @@ -1753,7 +1752,7 @@ static inline void set_next_task_rt(struct rq *rq, st= ruct task_struct *p, bool f * utilization. We only care of the case where we start to schedule a * rt task */ - if (rq->curr->sched_class !=3D &rt_sched_class) + if (rq_selected(rq)->sched_class !=3D &rt_sched_class) update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0); =20 rt_queue_push_tasks(rq); @@ -2034,6 +2033,7 @@ static struct task_struct *pick_next_pushable_task(st= ruct rq *rq) =20 BUG_ON(rq->cpu !=3D task_cpu(p)); BUG_ON(task_current(rq, p)); + BUG_ON(task_current_selected(rq, p)); BUG_ON(p->nr_cpus_allowed <=3D 1); =20 BUG_ON(!task_on_rq_queued(p)); @@ -2066,7 +2066,7 @@ static int push_rt_task(struct rq *rq, bool pull) * higher priority than current. If that's the case * just reschedule current. */ - if (unlikely(next_task->prio < rq->curr->prio)) { + if (unlikely(next_task->prio < rq_selected(rq)->prio)) { resched_curr(rq); return 0; } @@ -2419,7 +2419,7 @@ static void pull_rt_task(struct rq *this_rq) * p if it is lower in priority than the * current task on the run queue */ - if (p->prio < src_rq->curr->prio) + if (p->prio < rq_selected(src_rq)->prio) goto skip; =20 if (is_migration_disabled(p)) { @@ -2461,9 +2461,9 @@ static void task_woken_rt(struct rq *rq, struct task_= struct *p) bool need_to_push =3D !task_on_cpu(rq, p) && !test_tsk_need_resched(rq->curr) && p->nr_cpus_allowed > 1 && - (dl_task(rq->curr) || rt_task(rq->curr)) && + (dl_task(rq_selected(rq)) || rt_task(rq_selected(rq))) && (rq->curr->nr_cpus_allowed < 2 || - rq->curr->prio <=3D p->prio); + rq_selected(rq)->prio <=3D p->prio); =20 if (need_to_push) push_rt_tasks(rq); @@ -2547,7 +2547,7 @@ static void switched_to_rt(struct rq *rq, struct task= _struct *p) if (p->nr_cpus_allowed > 1 && rq->rt.overloaded) rt_queue_push_tasks(rq); #endif /* CONFIG_SMP */ - if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq))) + if (p->prio < rq_selected(rq)->prio && cpu_online(cpu_of(rq))) resched_curr(rq); } } @@ -2562,7 +2562,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= int oldprio) if (!task_on_rq_queued(p)) return; =20 - if (task_current(rq, p)) { + if (task_current_selected(rq, p)) { #ifdef CONFIG_SMP /* * If our priority decreases while running, we @@ -2588,7 +2588,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= int oldprio) * greater than the current running task * then reschedule. */ - if (p->prio < rq->curr->prio) + if (p->prio < rq_selected(rq)->prio) resched_curr(rq); } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 7c37d478e0f8..130e669cf5da 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1013,7 +1013,10 @@ struct rq { */ unsigned int nr_uninterruptible; =20 - struct task_struct __rcu *curr; + struct task_struct __rcu *curr; /* Execution context */ +#ifdef CONFIG_PROXY_EXEC + struct task_struct __rcu *curr_sched; /* Scheduling context (policy) */ +#endif struct task_struct *idle; struct task_struct *stop; unsigned long next_balance; @@ -1212,6 +1215,22 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); #define cpu_curr(cpu) (cpu_rq(cpu)->curr) #define raw_rq() raw_cpu_ptr(&runqueues) =20 +#ifdef CONFIG_PROXY_EXEC +#define rq_selected(rq) ((rq)->curr_sched) +#define cpu_curr_selected(cpu) (cpu_rq(cpu)->curr_sched) +static inline void rq_set_selected(struct rq *rq, struct task_struct *t) +{ + rcu_assign_pointer(rq->curr_sched, t); +} +#else +#define rq_selected(rq) ((rq)->curr) +#define cpu_curr_selected(cpu) (cpu_rq(cpu)->curr) +static inline void rq_set_selected(struct rq *rq, struct task_struct *t) +{ + /* Do nothing */ +} +#endif + struct sched_group; #ifdef CONFIG_SCHED_CORE static inline struct cpumask *sched_group_span(struct sched_group *sg); @@ -2128,11 +2147,25 @@ static inline u64 global_rt_runtime(void) return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC; } =20 +/* + * Is p the current execution context? + */ static inline int task_current(struct rq *rq, struct task_struct *p) { return rq->curr =3D=3D p; } =20 +/* + * Is p the current scheduling context? + * + * Note that it might be the current execution context at the same time if + * rq->curr =3D=3D rq_selected() =3D=3D p. + */ +static inline int task_current_selected(struct rq *rq, struct task_struct = *p) +{ + return rq_selected(rq) =3D=3D p; +} + #ifdef CONFIG_PROXY_EXEC static inline bool task_is_blocked(struct task_struct *p) { @@ -2308,7 +2341,7 @@ struct sched_class { =20 static inline void put_prev_task(struct rq *rq, struct task_struct *prev) { - WARN_ON_ONCE(rq->curr !=3D prev); + WARN_ON_ONCE(rq_selected(rq) !=3D prev); prev->sched_class->put_prev_task(rq, prev); } =20 --=20 2.42.0.869.gea05f2083d-goog