From: Vincent Guittot <vincent.guittot@linaro.org>
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
	mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org,
	pierre.gondois@arm.com, kprateek.nayak@amd.com
Cc: qyousef@layalina.io, hongyan.xia2@arm.com, christian.loehle@arm.com,
	luis.machado@arm.com, Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH 4/6 v7] sched/fair: Add push task mechanism for fair
Date: Mon, 1 Dec 2025 10:13:06 +0100
Message-ID: <20251201091308.761711-5-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251201091308.761711-1-vincent.guittot@linaro.org>
References: <20251201091308.761711-1-vincent.guittot@linaro.org>

EAS relies on wakeup events to place tasks efficiently on the system,
but there are cases where a task no longer gets wakeup events, or gets
them at far too low a pace. For such situations, we can take advantage
of the task being put back in the enqueued list to check whether it
should be pushed to another CPU. When the task is alone on the CPU, it
is never put back in the enqueued list; in this special case, we use
the tick to run the check.

Add a push task mechanism that enables the fair scheduler to push
runnable tasks. EAS will be one user, but other features like filling
idle CPUs can also take advantage of it.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c  | 211 ++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h |   4 +
 2 files changed, 213 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 80c4131fb35b..4e94a4cb8caa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6989,6 +6989,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	hrtick_update(rq);
 }
 
+static void fair_remove_pushable_task(struct rq *rq, struct task_struct *p);
 /*
  * Basically dequeue_task_fair(), except it can deal with dequeue_entity()
  * failing half-way through and resume the dequeue later.
@@ -7017,6 +7018,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		h_nr_idle = task_has_idle_policy(p);
 		if (task_sleep || task_delayed || !se->sched_delayed)
 			h_nr_runnable = 1;
+
+		fair_remove_pushable_task(rq, p);
 	}
 
 	for_each_sched_entity(se) {
@@ -8504,6 +8507,187 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 	return target;
 }
 
+DEFINE_STATIC_KEY_FALSE(sched_push_task);
+
+static inline bool sched_push_task_enabled(void)
+{
+	return static_branch_unlikely(&sched_push_task);
+}
+
+static bool fair_push_task(struct rq *rq, struct task_struct *p)
+{
+	return false;
+}
+
+static inline int has_pushable_tasks(struct rq *rq)
+{
+	return !plist_head_empty(&rq->cfs.pushable_tasks);
+}
+
+static struct task_struct *pick_next_pushable_fair_task(struct rq *rq)
+{
+	struct task_struct *p;
+
+	if (!has_pushable_tasks(rq))
+		return NULL;
+
+	p = plist_first_entry(&rq->cfs.pushable_tasks,
+			      struct task_struct, pushable_tasks);
+
+	WARN_ON_ONCE(rq->cpu != task_cpu(p));
+	WARN_ON_ONCE(task_current(rq, p));
+	WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
+	WARN_ON_ONCE(!task_on_rq_queued(p));
+
+	/*
+	 * Remove the task from the pushable list as we only try once after
+	 * the task has been put back in the enqueued list.
+	 */
+	plist_del(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+
+	return p;
+}
+
+static int
+select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags);
+
+/*
+ * See if the non-running fair tasks on this rq can be sent to other CPUs
+ * that fit better with their profile.
+ */
+static bool push_fair_task(struct rq *rq)
+{
+	struct task_struct *next_task;
+	int prev_cpu, new_cpu;
+	struct rq *new_rq;
+
+	next_task = pick_next_pushable_fair_task(rq);
+	if (!next_task)
+		return false;
+
+	if (is_migration_disabled(next_task))
+		return true;
+
+	/* We might release the rq lock */
+	get_task_struct(next_task);
+
+	prev_cpu = rq->cpu;
+
+	new_cpu = select_task_rq_fair(next_task, prev_cpu, 0);
+
+	if (new_cpu == prev_cpu)
+		goto out;
+
+	new_rq = cpu_rq(new_cpu);
+
+	if (double_lock_balance(rq, new_rq)) {
+		/* The task has already migrated in between */
+		if (task_cpu(next_task) != rq->cpu) {
+			double_unlock_balance(rq, new_rq);
+			goto out;
+		}
+
+		deactivate_task(rq, next_task, 0);
+		set_task_cpu(next_task, new_cpu);
+		activate_task(new_rq, next_task, 0);
+
+		resched_curr(new_rq);
+
+		double_unlock_balance(rq, new_rq);
+	}
+
+out:
+	put_task_struct(next_task);
+
+	return true;
+}
+
+static void push_fair_tasks(struct rq *rq)
+{
+	/* push_fair_task() will return true if it moved a fair task */
+	while (push_fair_task(rq))
+		;
+}
+
+static DEFINE_PER_CPU(struct balance_callback, fair_push_head);
+
+static inline void fair_queue_pushable_tasks(struct rq *rq)
+{
+	if (!sched_push_task_enabled() || !has_pushable_tasks(rq))
+		return;
+
+	queue_balance_callback(rq, &per_cpu(fair_push_head, rq->cpu), push_fair_tasks);
+}
+
+static void fair_remove_pushable_task(struct rq *rq, struct task_struct *p)
+{
+	if (sched_push_task_enabled())
+		plist_del(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+}
+
+static void fair_add_pushable_task(struct rq *rq, struct task_struct *p)
+{
+	if (sched_push_task_enabled() && fair_push_task(rq, p)) {
+		plist_del(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+		plist_node_init(&p->pushable_tasks, p->prio);
+		plist_add(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+	}
+}
+
+static int active_load_balance_cpu_stop(void *data);
+
+/*
+ * See if the task running alone on the CPU should migrate to a better
+ * CPU than the local one.
+ */
+static inline bool check_pushable_task(struct task_struct *p, struct rq *rq)
+{
+	int new_cpu, cpu = cpu_of(rq);
+
+	if (!sched_push_task_enabled())
+		return false;
+
+	if (WARN_ON(!p))
+		return false;
+
+	if (WARN_ON(!task_current(rq, p)))
+		return false;
+
+	if (is_migration_disabled(p))
+		return false;
+
+	/* If there are several tasks, wait for the task to be put back */
+	if (rq->nr_running > 1)
+		return false;
+
+	if (!fair_push_task(rq, p))
+		return false;
+
+	new_cpu = select_task_rq_fair(p, cpu, 0);
+
+	if (new_cpu == cpu)
+		return false;
+
+	/*
+	 * ->active_balance synchronizes accesses to
+	 * ->active_balance_work. Once set, it's cleared
+	 * only after active load balance is finished.
+	 */
+	if (!rq->active_balance) {
+		rq->active_balance = 1;
+		rq->push_cpu = new_cpu;
+	} else
+		return false;
+
+	raw_spin_rq_unlock(rq);
+	stop_one_cpu_nowait(cpu,
+			    active_load_balance_cpu_stop, rq,
+			    &rq->active_balance_work);
+	raw_spin_rq_lock(rq);
+
+	return true;
+}
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the relevant SD flag set. In practice, this is SD_BALANCE_WAKE,
@@ -8973,6 +9157,12 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 		put_prev_entity(cfs_rq, pse);
 		set_next_entity(cfs_rq, se);
 
+		/*
+		 * The previous task might be eligible for being pushed to
+		 * another CPU if it is still active.
+		 */
+		fair_add_pushable_task(rq, prev);
+
 		__set_next_task_fair(rq, p, true);
 	}
 
@@ -9036,6 +9226,13 @@ static void put_prev_task_fair(struct rq *rq, struct task_struct *prev, struct t
 		cfs_rq = cfs_rq_of(se);
 		put_prev_entity(cfs_rq, se);
 	}
+
+	/*
+	 * The previous task might be eligible for being pushed to another CPU
+	 * if it is still active.
+	 */
+	fair_add_pushable_task(rq, prev);
+
 }
 
 /*
@@ -13390,8 +13587,10 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 	if (static_branch_unlikely(&sched_numa_balancing))
 		task_tick_numa(rq, curr);
 
-	update_misfit_status(curr, rq);
-	check_update_overutilized_status(task_rq(curr));
+	if (!check_pushable_task(curr, rq)) {
+		update_misfit_status(curr, rq);
+		check_update_overutilized_status(task_rq(curr));
+	}
 
 	task_tick_core(rq, curr);
 }
@@ -13552,6 +13751,8 @@ static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool firs
 {
 	struct sched_entity *se = &p->se;
 
+	fair_remove_pushable_task(rq, p);
+
 	if (task_on_rq_queued(p)) {
 		/*
 		 * Move the next running task to the front of the list, so our
@@ -13567,6 +13768,11 @@ static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool firs
 	if (hrtick_enabled_fair(rq))
 		hrtick_start_fair(rq, p);
 
+	/*
+	 * Try to push the prev task before checking misfit for the next task,
+	 * as the migration of prev can make next fit the CPU.
+	 */
+	fair_queue_pushable_tasks(rq);
 	update_misfit_status(p, rq);
 	sched_fair_update_stop_tick(rq, p);
 }
@@ -13596,6 +13802,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	cfs_rq->tasks_timeline = RB_ROOT_CACHED;
 	cfs_rq->zero_vruntime = (u64)(-(1LL << 20));
+	plist_head_init(&cfs_rq->pushable_tasks);
 	raw_spin_lock_init(&cfs_rq->removed.lock);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b419a4d98461..697bd654298a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -711,6 +711,8 @@ struct cfs_rq {
 		unsigned long	runnable_avg;
 	} removed;
 
+	struct plist_head	pushable_tasks;
+
#ifdef CONFIG_FAIR_GROUP_SCHED
 	u64			last_update_tg_load_avg;
 	unsigned long		tg_load_avg_contrib;
@@ -3620,6 +3622,8 @@ static inline bool sched_energy_enabled(void) { return false; }
 
 #endif /* !(CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL) */
 
+DECLARE_STATIC_KEY_FALSE(sched_push_task);
+
 #ifdef CONFIG_MEMBARRIER
 
 /*
-- 
2.43.0
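
[Editorial addendum, not part of the patch] In this patch fair_push_task() is
deliberately a stub returning false and the sched_push_task static key is never
enabled, so the mechanism stays inert until a user hooks into it; the commit
message names EAS as the first expected user. Purely as a hedged sketch of how
such a user might plug in, the snippet below enables the key and fills in the
predicate. enable_fair_push() and the "push when the task no longer fits its
CPU" policy are assumptions for illustration, not code from this series;
sched_push_task, task_fits_cpu(), cpu_of(), sched_energy_enabled() and
static_branch_enable() are existing kernel symbols.

/*
 * Illustrative sketch only (hypothetical follow-up, assumed to live in
 * kernel/sched/fair.c in place of the stub above).
 */
static bool fair_push_task(struct rq *rq, struct task_struct *p)
{
	/* A pinned task cannot be pushed anywhere else. */
	if (p->nr_cpus_allowed <= 1)
		return false;

	/*
	 * Hypothetical policy: only push under EAS, and only when the task's
	 * utilization does not fit the capacity of its current CPU.
	 */
	return sched_energy_enabled() && !task_fits_cpu(p, cpu_of(rq));
}

/* A user would flip the static key once, e.g. when an Energy Model is built. */
static void enable_fair_push(void)
{
	static_branch_enable(&sched_push_task);
}

With something along these lines in place, fair_add_pushable_task() would start
queuing misfit tasks on the per-rq plist and the balance callback added by this
patch would try to push them when they are put back in the enqueued list.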