From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v4 13/28] sched/rt: Update task event callbacks for HCBS scheduling
Date: Mon, 1 Dec 2025 13:41:46 +0100
Message-ID: <20251201124205.11169-14-yurand2000@gmail.com>
In-Reply-To: <20251201124205.11169-1-yurand2000@gmail.com>
References: <20251201124205.11169-1-yurand2000@gmail.com>

Update wakeup_preempt_rt(), switched_{from,to}_rt() and prio_changed_rt()
with the rt-cgroup-specific preemption rules:

- In wakeup_preempt_rt(), whenever a task wakes up, check whether it is
  served by a deadline server or lives on the global runqueue. The
  preemption rules (as documented in the function) depend on the
  runqueues of the current and the woken task:
  - If both tasks are FIFO/RR tasks on the global runqueue, or in the
    same cgroup, apply the standard priority check.
  - If the woken task is inside a cgroup but curr is a FIFO/RR task on
    the global runqueue, always preempt. If curr is a DEADLINE task
    instead, check whether the woken task's deadline server preempts
    curr.
  - If both tasks are FIFO/RR tasks served by different groups, check
    whether the woken task's server preempts the current task's server.
  (An illustrative sketch of this decision table follows the changelog.)

- In switched_from_rt(), perform a pull only on the global runqueue, and
  do nothing if the task is inside a group. This will change when
  migration support is added.

- In switched_to_rt(), queue a push only on the global runqueue, and
  perform a priority check instead when the switching task is inside a
  group. This too will change when migration support is added.

- In prio_changed_rt(), queue a pull only on the global runqueue. If the
  task is queued but not running, run the preemption check only if the
  task whose priority changed and curr are in the same cgroup.

Update sched_rt_can_attach() to check whether a task can be attached to
a given cgroup. For now, the check only verifies that the group has
non-zero bandwidth. Remove the tsk argument from sched_rt_can_attach(),
as it is no longer used.

Change cpu_cgroup_can_attach() to check that the attachee is a FIFO/RR
task before running the attach check.

Update __sched_setscheduler() to perform checks when a task inside a
cgroup tries to switch to FIFO/RR, since the group must have runtime
allocated.

Update task_is_throttled_rt() for SCHED_CORE: return the throttled
status of the task's deadline server if present; global rt-tasks are
never throttled.
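The following self-contained user-space sketch is illustrative only and
is not part of the patch: "struct task", "struct server",
server_preempts() and should_preempt() are simplified stand-ins for
task_struct, sched_dl_entity, dl_entity_preempt() and the
group-scheduling path of wakeup_preempt_rt(). It models the wakeup
decision table described above; for a DEADLINE task, srv points at the
task's own dl entity rather than a group server.

  #include <stdbool.h>
  #include <stdio.h>

  struct server { unsigned long long deadline; };

  struct task {
          int prio;            /* lower value = higher priority */
          struct server *srv;  /* NULL: FIFO/RR task on the global runqueue */
  };

  /* EDF check, standing in for dl_entity_preempt() */
  static bool server_preempts(const struct server *a, const struct server *b)
  {
          return a->deadline < b->deadline;
  }

  /* true when the woken task should preempt curr */
  static bool should_preempt(const struct task *woken, const struct task *curr)
  {
          if (woken->srv && curr->srv) {
                  if (woken->srv == curr->srv)  /* same group: priorities */
                          return woken->prio < curr->prio;
                  return server_preempts(woken->srv, curr->srv);
          }
          if (woken->srv)  /* woken is served, curr is a global FIFO/RR task */
                  return true;
          if (curr->srv)   /* curr is served (or DEADLINE), woken is global */
                  return false;
          return woken->prio < curr->prio;  /* both on the global runqueue */
  }

  int main(void)
  {
          struct server g1 = { .deadline = 100 }, g2 = { .deadline = 50 };
          struct task woken = { .prio = 10, .srv = &g1 };
          struct task curr  = { .prio = 20, .srv = &g2 };

          /* different groups: EDF between servers, g1 does not preempt g2 */
          printf("%d\n", should_preempt(&woken, &curr)); /* prints 0 */
          return 0;
  }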
Co-developed-by: Alessio Balsini
Signed-off-by: Alessio Balsini
Co-developed-by: Andrea Parri
Signed-off-by: Andrea Parri
Co-developed-by: luca abeni
Signed-off-by: luca abeni
Signed-off-by: Yuri Andriaccio
---
 kernel/sched/core.c     |   2 +-
 kernel/sched/rt.c       | 104 +++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h    |   2 +-
 kernel/sched/syscalls.c |  13 +++++
 4 files changed, 108 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 495cbdfdc5..d7fc83cdae 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9318,7 +9318,7 @@ static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 		goto scx_check;
 
 	cgroup_taskset_for_each(task, css, tset) {
-		if (!sched_rt_can_attach(css_tg(css), task))
+		if (rt_task(task) && !sched_rt_can_attach(css_tg(css)))
 			return -EINVAL;
 	}
 scx_check:
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7ec117a18d..2b7c4b7754 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -960,7 +960,58 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct task_struct *donor = rq->donor;
+	struct sched_dl_entity *woken_dl_se = NULL;
+	struct sched_dl_entity *curr_dl_se = NULL;
 
+	if (!rt_group_sched_enabled())
+		goto no_group_sched;
+
+	/*
+	 * Preemption checks are different if the waking task and the current task
+	 * are running on the global runqueue or in a cgroup. The following rules
+	 * apply:
+	 * - dl-tasks (and equally dl_servers) always preempt FIFO/RR tasks.
+	 * - if curr is a FIFO/RR task inside a cgroup (i.e. run by a
+	 *   dl_server), or curr is a DEADLINE task and waking is a FIFO/RR task
+	 *   on the root cgroup, do nothing.
+	 * - if waking is inside a cgroup but curr is a FIFO/RR task in the root
+	 *   cgroup, always reschedule.
+	 * - if they are both on the global runqueue, run the standard code.
+	 * - if they are both in the same cgroup, check for tasks priorities.
+	 * - if they are both in a cgroup, but not the same one, check whether the
+	 *   woken task's dl_server preempts the current's dl_server.
+	 * - if curr is a DEADLINE task and waking is in a cgroup, check whether
+	 *   the woken task's server preempts curr.
+	 */
+	if (is_dl_group(rt_rq_of_se(&p->rt)))
+		woken_dl_se = dl_group_of(rt_rq_of_se(&p->rt));
+	if (is_dl_group(rt_rq_of_se(&rq->curr->rt)))
+		curr_dl_se = dl_group_of(rt_rq_of_se(&rq->curr->rt));
+	else if (task_has_dl_policy(rq->curr))
+		curr_dl_se = &rq->curr->dl;
+
+	if (woken_dl_se != NULL && curr_dl_se != NULL) {
+		if (woken_dl_se == curr_dl_se) {
+			if (p->prio < rq->curr->prio)
+				resched_curr(rq);
+
+			return;
+		}
+
+		if (dl_entity_preempt(woken_dl_se, curr_dl_se))
+			resched_curr(rq);
+
+		return;
+
+	} else if (woken_dl_se != NULL) {
+		resched_curr(rq);
+		return;
+
+	} else if (curr_dl_se != NULL) {
+		return;
+	}
+
+no_group_sched:
 	if (p->prio < donor->prio) {
 		resched_curr(rq);
 		return;
@@ -1710,6 +1761,8 @@ static void rq_offline_rt(struct rq *rq)
  */
 static void switched_from_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	/*
 	 * If there are other RT tasks then we will reschedule
 	 * and the scheduling of the other RT tasks will handle
@@ -1717,10 +1770,11 @@ static void switched_from_rt(struct rq *rq, struct task_struct *p)
 	 * we may need to handle the pulling of RT tasks
 	 * now.
 	 */
-	if (!task_on_rq_queued(p) || rq->rt.rt_nr_running)
+	if (!task_on_rq_queued(p) || rt_rq->rt_nr_running)
 		return;
 
-	rt_queue_pull_task(rt_rq_of_se(&p->rt));
+	if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+		rt_queue_pull_task(rt_rq);
 }
 
 void __init init_sched_rt_class(void)
@@ -1740,6 +1794,8 @@ void __init init_sched_rt_class(void)
  */
 static void switched_to_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	/*
 	 * If we are running, update the avg_rt tracking, as the running time
 	 * will now on be accounted into the latter.
@@ -1755,8 +1811,14 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 	 * then see if we can move to another run queue.
 	 */
 	if (task_on_rq_queued(p)) {
-		if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
-			rt_queue_push_tasks(rt_rq_of_se(&p->rt));
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq)) {
+			if (p->prio < rq->curr->prio)
+				resched_curr(rq);
+		} else {
+			if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
+				rt_queue_push_tasks(rt_rq_of_se(&p->rt));
+		}
+
 		if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq)))
 			resched_curr(rq);
 	}
@@ -1769,6 +1831,8 @@ static void
 prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	if (!task_on_rq_queued(p))
 		return;
 
@@ -1777,16 +1841,25 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
 		 * If our priority decreases while running, we
 		 * may need to pull tasks to this runqueue.
 		 */
-		if (oldprio < p->prio)
-			rt_queue_pull_task(rt_rq_of_se(&p->rt));
+		if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) && oldprio < p->prio)
+			rt_queue_pull_task(rt_rq);
 
 		/*
 		 * If there's a higher priority task waiting to run
 		 * then reschedule.
 		 */
-		if (p->prio > rq->rt.highest_prio.curr)
+		if (p->prio > rt_rq->highest_prio.curr)
 			resched_curr(rq);
 	} else {
+		/*
+		 * This task is not running, thus we check against the currently
+		 * running task for preemption. We can preempt only if both tasks are
+		 * in the same cgroup or on the global runqueue.
+		 */
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) &&
+		    rt_rq_of_se(&p->rt)->tg != rt_rq_of_se(&rq->curr->rt)->tg)
+			return;
+
 		/*
 		 * This task is not running, but if it is
 		 * greater than the current running task
@@ -1881,7 +1954,16 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 
 #ifdef CONFIG_SCHED_CORE
 static int task_is_throttled_rt(struct task_struct *p, int cpu)
 {
+#ifdef CONFIG_RT_GROUP_SCHED
+	struct rt_rq *rt_rq;
+
+	rt_rq = task_group(p)->rt_rq[cpu];
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
+
+	return dl_group_of(rt_rq)->dl_throttled;
+#else
 	return 0;
+#endif
 }
 #endif /* CONFIG_SCHED_CORE */
@@ -2133,16 +2215,16 @@ static int sched_rt_global_constraints(void)
 }
 #endif /* CONFIG_SYSCTL */
 
-int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
+int sched_rt_can_attach(struct task_group *tg)
 {
 	/* Don't accept real-time tasks when there is no way for them to run */
-	if (rt_group_sched_enabled() && rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
+	if (rt_group_sched_enabled() && tg->dl_bandwidth.dl_runtime == 0)
 		return 0;
 
 	return 1;
 }
 
-#else /* !CONFIG_RT_GROUP_SCHED: */
+#else /* !CONFIG_RT_GROUP_SCHED */
 
 #ifdef CONFIG_SYSCTL
 static int sched_rt_global_constraints(void)
@@ -2150,7 +2232,7 @@ static int sched_rt_global_constraints(void)
 	return 0;
 }
 #endif /* CONFIG_SYSCTL */
-#endif /* !CONFIG_RT_GROUP_SCHED */
+#endif /* CONFIG_RT_GROUP_SCHED */
 
 #ifdef CONFIG_SYSCTL
 static int sched_rt_global_validate(void)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fb4dcb4551..bc3ed02e40 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -606,7 +606,7 @@ extern int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us)
 extern int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us);
 extern long sched_group_rt_runtime(struct task_group *tg);
 extern long sched_group_rt_period(struct task_group *tg);
-extern int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk);
+extern int sched_rt_can_attach(struct task_group *tg);
 
 extern struct task_group *sched_create_group(struct task_group *parent);
 extern void sched_online_group(struct task_group *tg,
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 93a9c03b28..0b75126019 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -626,6 +626,19 @@ int __sched_setscheduler(struct task_struct *p,
 change:
 
 	if (user) {
+#ifdef CONFIG_RT_GROUP_SCHED
+		/*
+		 * Do not allow real-time tasks into groups that have no runtime
+		 * assigned.
+		 */
+		if (rt_group_sched_enabled() &&
+		    dl_bandwidth_enabled() && rt_policy(policy) &&
+		    !sched_rt_can_attach(task_group(p)) &&
+		    !task_group_is_autogroup(task_group(p))) {
+			retval = -EPERM;
+			goto unlock;
+		}
+#endif
 		if (dl_bandwidth_enabled() && dl_policy(policy) &&
 		    !(attr->sched_flags & SCHED_FLAG_SUGOV)) {
 			cpumask_t *span = rq->rd->span;
-- 
2.51.0