From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v4 13/28] sched/rt: Update task event callbacks for HCBS scheduling
Date: Mon, 1 Dec 2025 13:41:46 +0100
Message-ID: <20251201124205.11169-14-yurand2000@gmail.com>
In-Reply-To: <20251201124205.11169-1-yurand2000@gmail.com>
References: <20251201124205.11169-1-yurand2000@gmail.com>

Update wakeup_preempt_rt(), switched_{from,to}_rt() and prio_changed_rt()
with the rt-cgroup-specific preemption rules:

- In wakeup_preempt_rt(), whenever a task wakes up, check whether it is
  served by a deadline server or lives on the global runqueue. The
  preemption rules (as documented in the function) depend on the
  runqueues of the current and the woken task:
  - If both tasks are FIFO/RR tasks on the global runqueue, or in the
    same cgroup, apply the standard priority check.
  - If the woken task is inside a cgroup but curr is a FIFO/RR task on
    the global runqueue, always preempt. If curr is a DEADLINE task
    instead, check whether the woken task's deadline server preempts
    curr.
  - If both tasks are FIFO/RR tasks served by different groups, check
    whether the woken task's server preempts the current task's server.
  (An illustrative sketch of this decision table follows the changelog.)

- In switched_from_rt(), perform a pull only on the global runqueue, and
  do nothing if the task is inside a group. This will change when
  migration support is added.

- In switched_to_rt(), queue a push only on the global runqueue, and
  perform a priority check instead when the switching task is inside a
  group. This too will change when migration support is added.

- In prio_changed_rt(), queue a pull only on the global runqueue. If the
  task is queued but not running, run the preemption check only if the
  task whose priority changed and curr are in the same cgroup.

Update sched_rt_can_attach() to check whether a task can be attached to
a given cgroup. For now, the check only verifies that the group has
non-zero bandwidth. Remove the tsk argument from sched_rt_can_attach(),
as it is no longer used.

Change cpu_cgroup_can_attach() to check that the attachee is a FIFO/RR
task before running the attach check.

Update __sched_setscheduler() to perform checks when a task inside a
cgroup tries to switch to FIFO/RR, since the group must have runtime
allocated.

Update task_is_throttled_rt() for SCHED_CORE: return the throttled
status of the task's deadline server if present; global rt-tasks are
never throttled.
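The following self-contained user-space sketch is illustrative only and
is not part of the patch: "struct task", "struct server",
server_preempts() and should_preempt() are simplified stand-ins for
task_struct, sched_dl_entity, dl_entity_preempt() and the
group-scheduling path of wakeup_preempt_rt(). It models the wakeup
decision table described above; for a DEADLINE task, srv points at the
task's own dl entity rather than a group server.

  #include <stdbool.h>
  #include <stdio.h>

  struct server { unsigned long long deadline; };

  struct task {
          int prio;            /* lower value = higher priority */
          struct server *srv;  /* NULL: FIFO/RR task on the global runqueue */
  };

  /* EDF check, standing in for dl_entity_preempt() */
  static bool server_preempts(const struct server *a, const struct server *b)
  {
          return a->deadline < b->deadline;
  }

  /* true when the woken task should preempt curr */
  static bool should_preempt(const struct task *woken, const struct task *curr)
  {
          if (woken->srv && curr->srv) {
                  if (woken->srv == curr->srv)  /* same group: priorities */
                          return woken->prio < curr->prio;
                  return server_preempts(woken->srv, curr->srv);
          }
          if (woken->srv)  /* woken is served, curr is a global FIFO/RR task */
                  return true;
          if (curr->srv)   /* curr is served (or DEADLINE), woken is global */
                  return false;
          return woken->prio < curr->prio;  /* both on the global runqueue */
  }

  int main(void)
  {
          struct server g1 = { .deadline = 100 }, g2 = { .deadline = 50 };
          struct task woken = { .prio = 10, .srv = &g1 };
          struct task curr  = { .prio = 20, .srv = &g2 };

          /* different groups: EDF between servers, g1 does not preempt g2 */
          printf("%d\n", should_preempt(&woken, &curr)); /* prints 0 */
          return 0;
  }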
Co-developed-by: Alessio Balsini
Signed-off-by: Alessio Balsini
Co-developed-by: Andrea Parri
Signed-off-by: Andrea Parri
Co-developed-by: luca abeni
Signed-off-by: luca abeni
Signed-off-by: Yuri Andriaccio
---
 kernel/sched/core.c     |   2 +-
 kernel/sched/rt.c       | 104 +++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h    |   2 +-
 kernel/sched/syscalls.c |  13 +++++
 4 files changed, 108 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 495cbdfdc5..d7fc83cdae 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9318,7 +9318,7 @@ static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 		goto scx_check;
 
 	cgroup_taskset_for_each(task, css, tset) {
-		if (!sched_rt_can_attach(css_tg(css), task))
+		if (rt_task(task) && !sched_rt_can_attach(css_tg(css)))
 			return -EINVAL;
 	}
 scx_check:
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 7ec117a18d..2b7c4b7754 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -960,7 +960,58 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct task_struct *donor = rq->donor;
+	struct sched_dl_entity *woken_dl_se = NULL;
+	struct sched_dl_entity *curr_dl_se = NULL;
 
+	if (!rt_group_sched_enabled())
+		goto no_group_sched;
+
+	/*
+	 * Preemption checks are different if the waking task and the current task
+	 * are running on the global runqueue or in a cgroup. The following rules
+	 * apply:
+	 * - dl-tasks (and equally dl_servers) always preempt FIFO/RR tasks.
+	 * - if curr is a FIFO/RR task inside a cgroup (i.e. run by a
+	 *   dl_server), or curr is a DEADLINE task and waking is a FIFO/RR task
+	 *   on the root cgroup, do nothing.
+	 * - if waking is inside a cgroup but curr is a FIFO/RR task in the root
+	 *   cgroup, always reschedule.
+	 * - if they are both on the global runqueue, run the standard code.
+	 * - if they are both in the same cgroup, check for tasks priorities.
+	 * - if they are both in a cgroup, but not the same one, check whether the
+	 *   woken task's dl_server preempts the current's dl_server.
+	 * - if curr is a DEADLINE task and waking is in a cgroup, check whether
+	 *   the woken task's server preempts curr.
+	 */
+	if (is_dl_group(rt_rq_of_se(&p->rt)))
+		woken_dl_se = dl_group_of(rt_rq_of_se(&p->rt));
+	if (is_dl_group(rt_rq_of_se(&rq->curr->rt)))
+		curr_dl_se = dl_group_of(rt_rq_of_se(&rq->curr->rt));
+	else if (task_has_dl_policy(rq->curr))
+		curr_dl_se = &rq->curr->dl;
+
+	if (woken_dl_se != NULL && curr_dl_se != NULL) {
+		if (woken_dl_se == curr_dl_se) {
+			if (p->prio < rq->curr->prio)
+				resched_curr(rq);
+
+			return;
+		}
+
+		if (dl_entity_preempt(woken_dl_se, curr_dl_se))
+			resched_curr(rq);
+
+		return;
+
+	} else if (woken_dl_se != NULL) {
+		resched_curr(rq);
+		return;
+
+	} else if (curr_dl_se != NULL) {
+		return;
+	}
+
+no_group_sched:
 	if (p->prio < donor->prio) {
 		resched_curr(rq);
 		return;
@@ -1710,6 +1761,8 @@ static void rq_offline_rt(struct rq *rq)
  */
 static void switched_from_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	/*
 	 * If there are other RT tasks then we will reschedule
 	 * and the scheduling of the other RT tasks will handle
@@ -1717,10 +1770,11 @@ static void switched_from_rt(struct rq *rq, struct task_struct *p)
 	 * we may need to handle the pulling of RT tasks
 	 * now.
 	 */
-	if (!task_on_rq_queued(p) || rq->rt.rt_nr_running)
+	if (!task_on_rq_queued(p) || rt_rq->rt_nr_running)
 		return;
 
-	rt_queue_pull_task(rt_rq_of_se(&p->rt));
+	if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+		rt_queue_pull_task(rt_rq);
 }
 
 void __init init_sched_rt_class(void)
@@ -1740,6 +1794,8 @@ void __init init_sched_rt_class(void)
  */
 static void switched_to_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	/*
 	 * If we are running, update the avg_rt tracking, as the running time
 	 * will now on be accounted into the latter.
@@ -1755,8 +1811,14 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 	 * then see if we can move to another run queue.
 	 */
 	if (task_on_rq_queued(p)) {
-		if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
-			rt_queue_push_tasks(rt_rq_of_se(&p->rt));
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq)) {
+			if (p->prio < rq->curr->prio)
+				resched_curr(rq);
+		} else {
+			if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
+				rt_queue_push_tasks(rt_rq_of_se(&p->rt));
+		}
+
 		if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq)))
 			resched_curr(rq);
 	}
@@ -1769,6 +1831,8 @@ static void
 prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	if (!task_on_rq_queued(p))
 		return;
 
@@ -1777,16 +1841,25 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
 		 * If our priority decreases while running, we
 		 * may need to pull tasks to this runqueue.
 		 */
-		if (oldprio < p->prio)
-			rt_queue_pull_task(rt_rq_of_se(&p->rt));
+		if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) && oldprio < p->prio)
+			rt_queue_pull_task(rt_rq);
 
 		/*
 		 * If there's a higher priority task waiting to run
 		 * then reschedule.
 		 */
-		if (p->prio > rq->rt.highest_prio.curr)
+		if (p->prio > rt_rq->highest_prio.curr)
 			resched_curr(rq);
 	} else {
+		/*
+		 * This task is not running, thus we check against the currently
+		 * running task for preemption. We can preempt only if both tasks are
+		 * in the same cgroup or on the global runqueue.
+		 */
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) &&
+		    rt_rq_of_se(&p->rt)->tg != rt_rq_of_se(&rq->curr->rt)->tg)
+			return;
+
 		/*
 		 * This task is not running, but if it is
 		 * greater than the current running task
@@ -1881,7 +1954,16 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 
 #ifdef CONFIG_SCHED_CORE
 static int task_is_throttled_rt(struct task_struct *p, int cpu)
 {
+#ifdef CONFIG_RT_GROUP_SCHED
+	struct rt_rq *rt_rq;
+
+	rt_rq = task_group(p)->rt_rq[cpu];
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
+
+	return dl_group_of(rt_rq)->dl_throttled;
+#else
 	return 0;
+#endif
 }
 #endif /* CONFIG_SCHED_CORE */
@@ -2133,16 +2215,16 @@ static int sched_rt_global_constraints(void)
 }
 #endif /* CONFIG_SYSCTL */
 
-int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
+int sched_rt_can_attach(struct task_group *tg)
 {
 	/* Don't accept real-time tasks when there is no way for them to run */
-	if (rt_group_sched_enabled() && rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
+	if (rt_group_sched_enabled() && tg->dl_bandwidth.dl_runtime == 0)
 		return 0;
 
 	return 1;
 }
 
-#else /* !CONFIG_RT_GROUP_SCHED: */
+#else /* !CONFIG_RT_GROUP_SCHED */
 
 #ifdef CONFIG_SYSCTL
 static int sched_rt_global_constraints(void)
@@ -2150,7 +2232,7 @@ static int sched_rt_global_constraints(void)
 	return 0;
 }
 #endif /* CONFIG_SYSCTL */
-#endif /* !CONFIG_RT_GROUP_SCHED */
+#endif /* CONFIG_RT_GROUP_SCHED */
 
 #ifdef CONFIG_SYSCTL
 static int sched_rt_global_validate(void)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fb4dcb4551..bc3ed02e40 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -606,7 +606,7 @@ extern int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us)
 extern int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us);
 extern long sched_group_rt_runtime(struct task_group *tg);
 extern long sched_group_rt_period(struct task_group *tg);
-extern int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk);
+extern int sched_rt_can_attach(struct task_group *tg);
 
 extern struct task_group *sched_create_group(struct task_group *parent);
 extern void sched_online_group(struct task_group *tg,
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 93a9c03b28..0b75126019 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -626,6 +626,19 @@ int __sched_setscheduler(struct task_struct *p,
 change:
 
 	if (user) {
+#ifdef CONFIG_RT_GROUP_SCHED
+		/*
+		 * Do not allow real-time tasks into groups that have no runtime
+		 * assigned.
+		 */
+		if (rt_group_sched_enabled() &&
+		    dl_bandwidth_enabled() && rt_policy(policy) &&
+		    !sched_rt_can_attach(task_group(p)) &&
+		    !task_group_is_autogroup(task_group(p))) {
+			retval = -EPERM;
+			goto unlock;
+		}
+#endif
 		if (dl_bandwidth_enabled() && dl_policy(policy) &&
 		    !(attr->sched_flags & SCHED_FLAG_SUGOV)) {
 			cpumask_t *span = rq->rd->span;
-- 
2.51.0