From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH 5/9] sched/deadline: Hierarchical scheduling with DL on top of RT
Date: Thu, 5 Jun 2025 09:14:08 +0200
Message-ID: <20250605071412.139240-6-yurand2000@gmail.com>
In-Reply-To: <20250605071412.139240-1-yurand2000@gmail.com>
References: <20250605071412.139240-1-yurand2000@gmail.com>

From: luca abeni

Implement the hierarchical scheduling mechanism:
- Enforce the runtime of RT tasks controlled by cgroups through the
  dl_server mechanism, based on the group's runtime and period
  parameters (the deadline is set equal to the period).
- Make sched_dl_entity store the cgroup's local RT runqueue, and
  provide an rt_rq for this runqueue.
- Allow zeroing the runtime of an RT cgroup.

Update the dl_server code:
- Check the return value of dl_server_apply_params() when initializing
  the fair server.
- In dl_server_start(), initialize only the fair server; do not
  initialize other kinds of servers, regardless of their period value.
- Make inc_dl_tasks()/dec_dl_tasks() adjust the number of runnable
  tasks by one when a fair server starts/stops; rt-cgroup dl_servers
  are not themselves accounted, their queued RT tasks are counted
  instead.

Co-developed-by: Alessio Balsini
Signed-off-by: Alessio Balsini
Co-developed-by: Andrea Parri
Signed-off-by: Andrea Parri
Co-developed-by: Yuri Andriaccio
Signed-off-by: Yuri Andriaccio
Signed-off-by: luca abeni
---
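As a side note on the parameter mapping used throughout this patch: a
group's (runtime, period) pair is turned into a fixed-point utilization
the way the kernel's to_ratio()/BW_SHIFT convention does it, with the
deadline set equal to the period. The snippet below is a standalone
userspace sketch of that arithmetic, not kernel code; the constants and
the scaffolding are illustrative only.

#include <stdio.h>
#include <stdint.h>

#define BW_SHIFT 20	/* fixed-point fraction bits, as in sched.h */

static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	/* runtime / period, scaled by 2^BW_SHIFT (0 if period is 0) */
	return period ? (runtime << BW_SHIFT) / period : 0;
}

int main(void)
{
	/* e.g. 10 ms runtime every 100 ms period, in nanoseconds */
	uint64_t runtime = 10ULL * 1000 * 1000;
	uint64_t period = 100ULL * 1000 * 1000;
	uint64_t bw = to_ratio(period, runtime);

	/* the group dl_server uses deadline == period for this reservation */
	printf("dl_bw = %llu (~%.2f%% of a CPU)\n",
	       (unsigned long long)bw, 100.0 * bw / (1ULL << BW_SHIFT));
	return 0;
}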
 kernel/sched/autogroup.c |   4 +-
 kernel/sched/core.c      |  15 +-
 kernel/sched/deadline.c  | 145 ++++++++++++--
 kernel/sched/rt.c        | 415 ++++++++++++++++++++++++++-------------
 kernel/sched/sched.h     |  59 +++++-
 kernel/sched/syscalls.c  |   4 +-
 6 files changed, 471 insertions(+), 171 deletions(-)

diff --git a/kernel/sched/autogroup.c b/kernel/sched/autogroup.c
index 2b331822c..a647c9265 100644
--- a/kernel/sched/autogroup.c
+++ b/kernel/sched/autogroup.c
@@ -49,7 +49,7 @@ static inline void autogroup_destroy(struct kref *kref)
 
 #ifdef CONFIG_RT_GROUP_SCHED
 	/* We've redirected RT tasks to the root task group... */
-	ag->tg->rt_se = NULL;
+	ag->tg->dl_se = NULL;
 	ag->tg->rt_rq = NULL;
 #endif
 	sched_release_group(ag->tg);
@@ -106,7 +106,7 @@ static inline struct autogroup *autogroup_create(void)
 	 * the policy change to proceed.
 	 */
 	free_rt_sched_group(tg);
-	tg->rt_se = root_task_group.rt_se;
+	tg->dl_se = root_task_group.dl_se;
 	tg->rt_rq = root_task_group.rt_rq;
 #endif
 	tg->autogroup = ag;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dce50fa57..c07fddbf2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2196,6 +2196,9 @@ void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct task_struct *donor = rq->donor;
 
+	if (is_dl_group(rt_rq_of_se(&p->rt)) && task_has_rt_policy(p))
+		resched_curr(rq);
+
 	if (p->sched_class == donor->sched_class)
 		donor->sched_class->wakeup_preempt(rq, p, flags);
 	else if (sched_class_above(p->sched_class, donor->sched_class))
@@ -8548,7 +8551,7 @@ void __init sched_init(void)
 		root_task_group.scx_weight = CGROUP_WEIGHT_DFL;
 #endif /* CONFIG_EXT_GROUP_SCHED */
 #ifdef CONFIG_RT_GROUP_SCHED
-		root_task_group.rt_se = (struct sched_rt_entity **)ptr;
+		root_task_group.dl_se = (struct sched_dl_entity **)ptr;
 		ptr += nr_cpu_ids * sizeof(void **);
 
 		root_task_group.rt_rq = (struct rt_rq **)ptr;
@@ -8562,7 +8565,7 @@ void __init sched_init(void)
 #endif
 
 #ifdef CONFIG_RT_GROUP_SCHED
-	init_rt_bandwidth(&root_task_group.rt_bandwidth,
+	init_dl_bandwidth(&root_task_group.dl_bandwidth,
 			global_rt_period(), global_rt_runtime());
 #endif /* CONFIG_RT_GROUP_SCHED */
 
@@ -8618,7 +8621,7 @@ void __init sched_init(void)
 		 * yet.
 		 */
 		rq->rt.rt_runtime = global_rt_runtime();
-		init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL);
+		init_tg_rt_entry(&root_task_group, rq, NULL, i, NULL);
 #endif
 #ifdef CONFIG_SMP
 		rq->sd = NULL;
@@ -9125,6 +9128,12 @@ cpu_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		return &root_task_group.css;
 	}
 
+	/* Do not allow cpu_cgroup hierarchies with depth greater than 2. */
+#ifdef CONFIG_RT_GROUP_SCHED
+	if (parent != &root_task_group)
+		return ERR_PTR(-EINVAL);
+#endif
+
 	tg = sched_create_group(parent);
 	if (IS_ERR(tg))
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 7736a625f..6589077c0 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -239,8 +239,15 @@ void __dl_add(struct dl_bw *dl_b, u64 tsk_bw, int cpus)
 static inline
 bool __dl_overflow(struct dl_bw *dl_b, unsigned long cap, u64 old_bw, u64 new_bw)
 {
+	u64 dl_groups_root = 0;
+
+#ifdef CONFIG_RT_GROUP_SCHED
+	dl_groups_root = to_ratio(root_task_group.dl_bandwidth.dl_period,
+				  root_task_group.dl_bandwidth.dl_runtime);
+#endif
 	return dl_b->bw != -1 &&
-	       cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw;
+	       cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw +
+	       cap_scale(dl_groups_root, cap);
 }
 
 static inline
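The hunk above reserves the root RT group's bandwidth on top of the
already-admitted DL load, so plain DL tasks can no longer claim it. A
standalone sketch of that inequality (ignoring the capacity scaling and
the "bw == -1 means unlimited" special case; all values are made up):

#include <stdio.h>
#include <stdint.h>

#define BW_SHIFT 20
#define BW_UNIT (1ULL << BW_SHIFT)

static int dl_overflow(uint64_t cap_bw, uint64_t total_bw,
		       uint64_t old_bw, uint64_t new_bw,
		       uint64_t groups_root)
{
	/* overflow if admitted + requested + reserved exceeds the cap */
	return cap_bw < total_bw - old_bw + new_bw + groups_root;
}

int main(void)
{
	uint64_t cap = (95 * BW_UNIT) / 100;	/* 95% DL cap */
	uint64_t total = (50 * BW_UNIT) / 100;	/* already admitted */
	uint64_t root = (40 * BW_UNIT) / 100;	/* rt-cgroup reservation */
	uint64_t req = (10 * BW_UNIT) / 100;	/* new DL task */

	/* 50% + 40% + 10% > 95%: the new task must be rejected */
	printf("admit? %s\n",
	       dl_overflow(cap, total, 0, req, root) ? "no" : "yes");
	return 0;
}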
@@ -366,6 +373,93 @@ void cancel_inactive_timer(struct sched_dl_entity *dl_se)
 	cancel_dl_timer(dl_se, &dl_se->inactive_timer);
 }
 
+/*
+ * Used for dl_bw check and update, used under sched_rt_handler()::mutex and
+ * sched_domains_mutex.
+ */
+u64 dl_cookie;
+
+#ifdef CONFIG_RT_GROUP_SCHED
+int dl_check_tg(unsigned long total)
+{
+	unsigned long flags;
+	int which_cpu;
+	int cpus;
+	struct dl_bw *dl_b;
+	u64 gen = ++dl_cookie;
+
+	for_each_possible_cpu(which_cpu) {
+		rcu_read_lock_sched();
+
+		if (!dl_bw_visited(which_cpu, gen)) {
+			cpus = dl_bw_cpus(which_cpu);
+			dl_b = dl_bw_of(which_cpu);
+
+			raw_spin_lock_irqsave(&dl_b->lock, flags);
+
+			if (dl_b->bw != -1 &&
+			    dl_b->bw * cpus < dl_b->total_bw + total * cpus) {
+				raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+				rcu_read_unlock_sched();
+
+				return 0;
+			}
+
+			raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+		}
+
+		rcu_read_unlock_sched();
+	}
+
+	return 1;
+}
+
+int dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 rt_period)
+{
+	struct rq *rq = container_of(dl_se->dl_rq, struct rq, dl);
+	int is_active;
+	u64 old_runtime;
+
+	/*
+	 * Since we truncate DL_SCALE bits, make sure we're at least
+	 * that big.
+	 */
+	if (rt_runtime != 0 && rt_runtime < (1ULL << DL_SCALE))
+		return 0;
+
+	/*
+	 * Since we use the MSB for wrap-around and sign issues, make
+	 * sure it's not set (mind that period can be equal to zero).
+	 */
+	if (rt_period & (1ULL << 63))
+		return 0;
+
+	raw_spin_rq_lock_irq(rq);
+	is_active = dl_se->my_q->rt.rt_nr_running > 0;
+	old_runtime = dl_se->dl_runtime;
+	dl_se->dl_runtime = rt_runtime;
+	dl_se->dl_period = rt_period;
+	dl_se->dl_deadline = dl_se->dl_period;
+	if (is_active) {
+		sub_running_bw(dl_se, dl_se->dl_rq);
+	} else if (dl_se->dl_non_contending) {
+		sub_running_bw(dl_se, dl_se->dl_rq);
+		dl_se->dl_non_contending = 0;
+		hrtimer_try_to_cancel(&dl_se->inactive_timer);
+	}
+	__sub_rq_bw(dl_se->dl_bw, dl_se->dl_rq);
+	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
+	__add_rq_bw(dl_se->dl_bw, dl_se->dl_rq);
+
+	if (is_active)
+		add_running_bw(dl_se, dl_se->dl_rq);
+
+	raw_spin_rq_unlock_irq(rq);
+
+	return 1;
+}
+#endif
+
 static void dl_change_utilization(struct task_struct *p, u64 new_bw)
 {
 	WARN_ON_ONCE(p->dl.flags & SCHED_FLAG_SUGOV);
@@ -539,6 +633,14 @@ static inline int is_leftmost(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq
 
 static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq);
 
+void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime)
+{
+	raw_spin_lock_init(&dl_b->dl_runtime_lock);
+	dl_b->dl_period = period;
+	dl_b->dl_runtime = runtime;
+}
+
+
 void init_dl_bw(struct dl_bw *dl_b)
 {
 	raw_spin_lock_init(&dl_b->lock);
@@ -1493,6 +1595,9 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 {
 	s64 scaled_delta_exec;
 
+	if (dl_server(dl_se) && !on_dl_rq(dl_se))
+		return;
+
 	if (unlikely(delta_exec <= 0)) {
 		if (unlikely(dl_se->dl_yielded))
 			goto throttle;
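dl_init_tg() above rejects two classes of parameters before installing
new (runtime, period) values. A standalone sketch of just those two
checks, with DL_SCALE = 10 as in the kernel and everything else
illustrative:

#include <stdio.h>
#include <stdint.h>

#define DL_SCALE 10

static int params_valid(uint64_t runtime, uint64_t period)
{
	/* runtimes below 2^DL_SCALE ns would be truncated away */
	if (runtime != 0 && runtime < (1ULL << DL_SCALE))
		return 0;
	/* the MSB is reserved for wrap-around/sign handling */
	if (period & (1ULL << 63))
		return 0;
	return 1;
}

int main(void)
{
	printf("%d %d %d\n",
	       params_valid(0, 1000000),		/* 1: zeroing is allowed */
	       params_valid(512, 1000000),		/* 0: runtime too small */
	       params_valid(1000000, 1ULL << 63));	/* 0: MSB set in period */
	return 0;
}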
@@ -1654,13 +1759,15 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 	 * this before getting generic.
 	 */
 	if (!dl_server(dl_se)) {
-		u64 runtime =  50 * NSEC_PER_MSEC;
-		u64 period = 1000 * NSEC_PER_MSEC;
-
-		dl_se->dl_server = 1;
-		dl_server_apply_params(dl_se, runtime, period, 1);
+		if (dl_se == &rq_of_dl_se(dl_se)->fair_server) {
+			u64 runtime =  50 * NSEC_PER_MSEC;
+			u64 period = 1000 * NSEC_PER_MSEC;
+
+			BUG_ON(dl_server_apply_params(dl_se, runtime, period, 1));
 
-		dl_se->dl_defer = 1;
+			dl_se->dl_defer = 1;
+		}
 		setup_new_dl_entity(dl_se);
 	}
 
@@ -1669,13 +1776,14 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 
 	dl_se->dl_server_active = 1;
 	enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
-	rq = rq_of_dl_se(dl_se);
+	rq = rq_of_dl_se(dl_se);
 	if (!dl_task(rq->curr) || dl_entity_preempt(dl_se, &rq->curr->dl))
 		resched_curr(rq);
 }
 
 void dl_server_stop(struct sched_dl_entity *dl_se)
 {
+//	if (!dl_server(dl_se)) return; TODO: Check if the following is equivalent to this!!!
 	if (!dl_se->dl_runtime)
 		return;
 
@@ -1898,7 +2006,13 @@ void inc_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 	u64 deadline = dl_se->deadline;
 
 	dl_rq->dl_nr_running++;
-	add_nr_running(rq_of_dl_rq(dl_rq), 1);
+	if (!dl_server(dl_se) || dl_se == &rq_of_dl_rq(dl_rq)->fair_server) {
+		add_nr_running(rq_of_dl_rq(dl_rq), 1);
+	} else {
+		struct rt_rq *rt_rq = &dl_se->my_q->rt;
+
+		add_nr_running(rq_of_dl_rq(dl_rq), rt_rq->rt_nr_running);
+	}
 
 	inc_dl_deadline(dl_rq, deadline);
 }
@@ -1908,7 +2022,13 @@ void dec_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 {
 	WARN_ON(!dl_rq->dl_nr_running);
 	dl_rq->dl_nr_running--;
-	sub_nr_running(rq_of_dl_rq(dl_rq), 1);
+	if ((!dl_server(dl_se)) || dl_se == &rq_of_dl_rq(dl_rq)->fair_server) {
+		sub_nr_running(rq_of_dl_rq(dl_rq), 1);
+	} else {
+		struct rt_rq *rt_rq = &dl_se->my_q->rt;
+
+		sub_nr_running(rq_of_dl_rq(dl_rq), rt_rq->rt_nr_running);
+	}
 
 	dec_dl_deadline(dl_rq, dl_se->deadline);
 }
@@ -2445,6 +2565,7 @@ static struct task_struct *__pick_task_dl(struct rq *rq)
 			}
 			goto again;
 		}
+		BUG_ON(!p);
 		rq->dl_server = dl_se;
 	} else {
 		p = dl_task_of(dl_se);
@@ -3177,12 +3298,6 @@ DEFINE_SCHED_CLASS(dl) = {
 #endif
 };
 
-/*
- * Used for dl_bw check and update, used under sched_rt_handler()::mutex and
- * sched_domains_mutex.
- */
-u64 dl_cookie;
-
 int sched_dl_global_validate(void)
 {
 	u64 runtime = global_rt_runtime();
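The inc_dl_tasks()/dec_dl_tasks() hunks above encode one accounting
rule: a plain DL task or the fair server contributes one to
rq->nr_running, while a group dl_server contributes the number of RT
tasks queued in the group's local runqueue. A standalone sketch of that
decision with made-up types (the field names in comments map to the
kernel expressions):

#include <stdio.h>

struct fake_se {
	int is_server;		/* dl_server(dl_se) */
	int is_fair_server;	/* dl_se == &rq->fair_server */
	unsigned rt_nr_running;	/* dl_se->my_q->rt.rt_nr_running */
};

static unsigned nr_running_contribution(const struct fake_se *se)
{
	if (!se->is_server || se->is_fair_server)
		return 1;
	return se->rt_nr_running;	/* count the served tasks instead */
}

int main(void)
{
	struct fake_se dl_task = { 0, 0, 0 };
	struct fake_se group_server = { 1, 0, 3 };

	printf("task: %u, group server: %u\n",
	       nr_running_contribution(&dl_task),
	       nr_running_contribution(&group_server));
	return 0;
}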
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 382126274..e348b8aba 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1,3 +1,4 @@
+#pragma GCC diagnostic ignored "-Wunused-function"
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Real-Time Scheduling Class (mapped to the SCHED_FIFO and SCHED_RR
@@ -184,81 +185,122 @@ void free_rt_sched_group(struct task_group *tg)
 		return;
 
 	for_each_possible_cpu(i) {
-		if (tg->rt_rq)
-			kfree(tg->rt_rq[i]);
-		if (tg->rt_se)
-			kfree(tg->rt_se[i]);
+		if (tg->dl_se) {
+			unsigned long flags;
+
+			/*
+			 * Since the dl timer is going to be cancelled,
+			 * we risk to never decrease the running bw...
+			 * Fix this issue by changing the group runtime
+			 * to 0 immediately before freeing it.
+			 */
+			BUG_ON(!dl_init_tg(tg->dl_se[i], 0, tg->dl_se[i]->dl_period));
+			raw_spin_rq_lock_irqsave(cpu_rq(i), flags);
+			BUG_ON(tg->rt_rq[i]->rt_nr_running);
+			raw_spin_rq_unlock_irqrestore(cpu_rq(i), flags);
+
+			hrtimer_cancel(&tg->dl_se[i]->dl_timer);
+			kfree(tg->dl_se[i]);
+		}
+		if (tg->rt_rq) {
+			struct rq *served_rq;
+
+			served_rq = container_of(tg->rt_rq[i], struct rq, rt);
+			kfree(served_rq);
+		}
 	}
 
 	kfree(tg->rt_rq);
-	kfree(tg->rt_se);
+	kfree(tg->dl_se);
 }
 
-void init_tg_rt_entry(struct task_group *tg, struct rt_rq *rt_rq,
-		struct sched_rt_entity *rt_se, int cpu,
-		struct sched_rt_entity *parent)
+void init_tg_rt_entry(struct task_group *tg, struct rq *served_rq,
+		struct sched_dl_entity *dl_se, int cpu,
+		struct sched_dl_entity *parent)
 {
 	struct rq *rq = cpu_rq(cpu);
 
-	rt_rq->highest_prio.curr = MAX_RT_PRIO-1;
-	rt_rq->rt_nr_boosted = 0;
-	rt_rq->rq = rq;
-	rt_rq->tg = tg;
+	served_rq->rt.highest_prio.curr = MAX_RT_PRIO-1;
+	served_rq->rt.rq = rq;
+	served_rq->rt.tg = tg;
 
-	tg->rt_rq[cpu] = rt_rq;
-	tg->rt_se[cpu] = rt_se;
+	tg->rt_rq[cpu] = &served_rq->rt;
+	tg->dl_se[cpu] = dl_se;
 
-	if (!rt_se)
+	if (!dl_se)
 		return;
 
-	if (!parent)
-		rt_se->rt_rq = &rq->rt;
-	else
-		rt_se->rt_rq = parent->my_q;
+	dl_se->dl_rq = &rq->dl;
+	dl_se->my_q = served_rq;
+}
 
-	rt_se->my_q = rt_rq;
-	rt_se->parent = parent;
-	INIT_LIST_HEAD(&rt_se->run_list);
+static bool rt_server_has_tasks(struct sched_dl_entity *dl_se)
+{
+	return !!dl_se->my_q->rt.rt_nr_running;
+}
+
+static struct task_struct *_pick_next_task_rt(struct rt_rq *rt_rq);
+static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool first);
+static struct task_struct *rt_server_pick(struct sched_dl_entity *dl_se)
+{
+	struct rt_rq *rt_rq = &dl_se->my_q->rt;
+	struct rq *rq = rq_of_rt_rq(rt_rq);
+	struct task_struct *p;
+
+	if (dl_se->my_q->rt.rt_nr_running == 0)
+		return NULL;
+
+	p = _pick_next_task_rt(rt_rq);
+	set_next_task_rt(rq, p, true);
+
+	return p;
 }
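rt_server_has_tasks()/rt_server_pick() above are the two callbacks a
dl_server needs: "is there anything to serve?" and "hand me the next
task". A standalone toy version of that contract, with a trivial LIFO
stack standing in for the RT runqueue (the kernel of course picks the
highest-priority FIFO task instead):

#include <stdio.h>
#include <stdbool.h>

struct toy_queue { int tasks[8]; unsigned nr; };

static bool server_has_tasks(const struct toy_queue *q)
{
	return q->nr != 0;	/* !!rt_nr_running */
}

static int server_pick(struct toy_queue *q)
{
	if (q->nr == 0)
		return -1;	/* NULL in the kernel version */
	return q->tasks[--q->nr];
}

int main(void)
{
	struct toy_queue q = { { 1, 2, 3 }, 3 };

	while (server_has_tasks(&q))
		printf("picked task %d\n", server_pick(&q));
	return 0;
}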
 
 int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
 {
-	struct rt_rq *rt_rq;
-	struct sched_rt_entity *rt_se;
+	struct rq *s_rq;
+	struct sched_dl_entity *dl_se;
 	int i;
 
 	if (!rt_group_sched_enabled())
 		return 1;
 
-	tg->rt_rq = kcalloc(nr_cpu_ids, sizeof(rt_rq), GFP_KERNEL);
+	tg->rt_rq = kcalloc(nr_cpu_ids, sizeof(struct rt_rq *), GFP_KERNEL);
 	if (!tg->rt_rq)
 		goto err;
-	tg->rt_se = kcalloc(nr_cpu_ids, sizeof(rt_se), GFP_KERNEL);
-	if (!tg->rt_se)
+	tg->dl_se = kcalloc(nr_cpu_ids, sizeof(dl_se), GFP_KERNEL);
+	if (!tg->dl_se)
 		goto err;
 
-	init_rt_bandwidth(&tg->rt_bandwidth, ktime_to_ns(global_rt_period()), 0);
+	init_dl_bandwidth(&tg->dl_bandwidth, 0, 0);
 
 	for_each_possible_cpu(i) {
-		rt_rq = kzalloc_node(sizeof(struct rt_rq),
+		s_rq = kzalloc_node(sizeof(struct rq),
 				     GFP_KERNEL, cpu_to_node(i));
-		if (!rt_rq)
+		if (!s_rq)
 			goto err;
 
-		rt_se = kzalloc_node(sizeof(struct sched_rt_entity),
+		dl_se = kzalloc_node(sizeof(struct sched_dl_entity),
 				     GFP_KERNEL, cpu_to_node(i));
-		if (!rt_se)
+		if (!dl_se)
 			goto err_free_rq;
 
-		init_rt_rq(rt_rq);
-		rt_rq->rt_runtime = tg->rt_bandwidth.rt_runtime;
-		init_tg_rt_entry(tg, rt_rq, rt_se, i, parent->rt_se[i]);
+		init_rt_rq(&s_rq->rt);
+		init_dl_entity(dl_se);
+		dl_se->dl_runtime = tg->dl_bandwidth.dl_runtime;
+		dl_se->dl_period = tg->dl_bandwidth.dl_period;
+		dl_se->dl_deadline = dl_se->dl_period;
+		dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
+
+		dl_server_init(dl_se, &cpu_rq(i)->dl, s_rq,
+			       rt_server_has_tasks, rt_server_pick);
+
+		init_tg_rt_entry(tg, s_rq, dl_se, i, parent->dl_se[i]);
 	}
 
 	return 1;
 
 err_free_rq:
-	kfree(rt_rq);
+	kfree(s_rq);
 err:
 	return 0;
 }
@@ -391,6 +433,10 @@ static inline void dequeue_pushable_task(struct rt_rq *rt_rq, struct task_struct
 static inline void rt_queue_push_tasks(struct rq *rq)
 {
 }
+
+static inline void rt_queue_pull_task(struct rq *rq)
+{
+}
 #endif /* CONFIG_SMP */
 
 static void enqueue_top_rt_rq(struct rt_rq *rt_rq);
@@ -449,7 +495,7 @@ static inline u64 sched_rt_runtime(struct rt_rq *rt_rq)
 
 static inline u64 sched_rt_period(struct rt_rq *rt_rq)
 {
-	return ktime_to_ns(rt_rq->tg->rt_bandwidth.rt_period);
+	return ktime_to_ns(rt_rq->tg->dl_bandwidth.dl_period);
 }
 
 typedef struct task_group *rt_rq_iter_t;
@@ -952,6 +998,9 @@ static void update_curr_rt(struct rq *rq)
 {
 	struct task_struct *donor = rq->donor;
 	s64 delta_exec;
+#ifdef CONFIG_RT_GROUP_SCHED
+	struct rt_rq *rt_rq;
+#endif
 
 	if (donor->sched_class != &rt_sched_class)
 		return;
@@ -961,25 +1010,17 @@ static void update_curr_rt(struct rq *rq)
 		return;
 
 #ifdef CONFIG_RT_GROUP_SCHED
-	struct sched_rt_entity *rt_se = &donor->rt;
+	if (!rt_group_sched_enabled())
+		return;
 
-	if (!rt_bandwidth_enabled())
+	if (!dl_bandwidth_enabled())
 		return;
 
-	for_each_sched_rt_entity(rt_se) {
-		struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
-		int exceeded;
+	rt_rq = rt_rq_of_se(&donor->rt);
+	if (is_dl_group(rt_rq)) {
+		struct sched_dl_entity *dl_se = dl_group_of(rt_rq);
 
-		if (sched_rt_runtime(rt_rq) != RUNTIME_INF) {
-			raw_spin_lock(&rt_rq->rt_runtime_lock);
-			rt_rq->rt_time += delta_exec;
-			exceeded = sched_rt_runtime_exceeded(rt_rq);
-			if (exceeded)
-				resched_curr(rq);
-			raw_spin_unlock(&rt_rq->rt_runtime_lock);
-			if (exceeded)
-				do_start_rt_bandwidth(sched_rt_bandwidth(rt_rq));
-		}
+		dl_server_update(dl_se, delta_exec);
 	}
 #endif
 }
@@ -1033,7 +1074,7 @@ inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
 	/*
 	 * Change rq's cpupri only if rt_rq is the top queue.
 	 */
-	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && &rq->rt != rt_rq)
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq))
 		return;
 
 	if (rq->online && prio < prev_prio)
@@ -1048,7 +1089,7 @@ dec_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio)
 	/*
	 * Change rq's cpupri only if rt_rq is the top queue.
 	 */
-	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && &rq->rt != rt_rq)
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq))
 		return;
 
 	if (rq->online && rt_rq->highest_prio.curr != prev_prio)
@@ -1177,19 +1218,34 @@ void inc_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 	rt_rq->rr_nr_running += rt_se_rr_nr_running(rt_se);
 
 	inc_rt_prio(rt_rq, prio);
-	inc_rt_group(rt_se, rt_rq);
+
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq)) {
+		struct sched_dl_entity *dl_se = dl_group_of(rt_rq);
+
+		if (!dl_se->dl_throttled)
+			add_nr_running(rq_of_rt_rq(rt_rq), 1);
+	} else {
+		add_nr_running(rq_of_rt_rq(rt_rq), 1);
+	}
 }
 
 static inline
 void dec_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 {
 	WARN_ON(!rt_prio(rt_se_prio(rt_se)));
-	WARN_ON(!rt_rq->rt_nr_running);
 	rt_rq->rt_nr_running -= rt_se_nr_running(rt_se);
 	rt_rq->rr_nr_running -= rt_se_rr_nr_running(rt_se);
 
 	dec_rt_prio(rt_rq, rt_se_prio(rt_se));
-	dec_rt_group(rt_se, rt_rq);
+
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq)) {
+		struct sched_dl_entity *dl_se = dl_group_of(rt_rq);
+
+		if (!dl_se->dl_throttled)
+			sub_nr_running(rq_of_rt_rq(rt_rq), 1);
+	} else {
+		sub_nr_running(rq_of_rt_rq(rt_rq), 1);
+	}
 }
 
 /*
@@ -1323,21 +1379,8 @@ static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flag
 {
 	struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
 	struct rt_prio_array *array = &rt_rq->active;
-	struct rt_rq *group_rq = group_rt_rq(rt_se);
 	struct list_head *queue = array->queue + rt_se_prio(rt_se);
 
-	/*
-	 * Don't enqueue the group if its throttled, or when empty.
-	 * The latter is a consequence of the former when a child group
-	 * get throttled and the current group doesn't have any other
-	 * active members.
-	 */
-	if (group_rq && (rt_rq_throttled(group_rq) || !group_rq->rt_nr_running)) {
-		if (rt_se->on_list)
-			__delist_rt_entity(rt_se, array);
-		return;
-	}
-
 	if (move_entity(flags)) {
 		WARN_ON_ONCE(rt_se->on_list);
 		if (flags & ENQUEUE_HEAD)
@@ -1393,31 +1436,16 @@ static void dequeue_rt_stack(struct sched_rt_entity *rt_se, unsigned int flags)
 
 static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
-	struct rq *rq = rq_of_rt_se(rt_se);
-
 	update_stats_enqueue_rt(rt_rq_of_se(rt_se), rt_se, flags);
 
-	dequeue_rt_stack(rt_se, flags);
-	for_each_sched_rt_entity(rt_se)
-		__enqueue_rt_entity(rt_se, flags);
-	enqueue_top_rt_rq(&rq->rt);
+	__enqueue_rt_entity(rt_se, flags);
 }
 
 static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
-	struct rq *rq = rq_of_rt_se(rt_se);
-
 	update_stats_dequeue_rt(rt_rq_of_se(rt_se), rt_se, flags);
 
-	dequeue_rt_stack(rt_se, flags);
-
-	for_each_sched_rt_entity(rt_se) {
-		struct rt_rq *rt_rq = group_rt_rq(rt_se);
-
-		if (rt_rq && rt_rq->rt_nr_running)
-			__enqueue_rt_entity(rt_se, flags);
-	}
-	enqueue_top_rt_rq(&rq->rt);
+	__enqueue_rt_entity(rt_se, flags);
 }
 
 /*
@@ -1435,6 +1463,15 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 	check_schedstat_required();
 	update_stats_wait_start_rt(rt_rq_of_se(rt_se), rt_se);
 
+#ifdef CONFIG_RT_GROUP_SCHED
+	/* Task arriving in an idle group of tasks. */
+	if (is_dl_group(rt_rq) && (rt_rq->rt_nr_running == 0)) {
+		struct sched_dl_entity *dl_se = dl_group_of(rt_rq);
+
+		dl_server_start(dl_se);
+	}
+#endif
+
 	enqueue_rt_entity(rt_se, flags);
 
 	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
@@ -1451,6 +1488,15 @@ static bool dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 
 	dequeue_pushable_task(rt_rq, p);
 
+#ifdef CONFIG_RT_GROUP_SCHED
+	/* Last task of the task group. */
+	if (is_dl_group(rt_rq) && !rt_rq->rt_nr_running) {
+		struct sched_dl_entity *dl_se = dl_group_of(rt_rq);
+
+		dl_server_stop(dl_se);
+	}
+#endif
+
 	return true;
 }
 
@@ -1477,10 +1523,8 @@ static void requeue_task_rt(struct rq *rq, struct task_struct *p, int head)
 	struct sched_rt_entity *rt_se = &p->rt;
 	struct rt_rq *rt_rq;
 
-	for_each_sched_rt_entity(rt_se) {
-		rt_rq = rt_rq_of_se(rt_se);
-		requeue_rt_entity(rt_rq, rt_se, head);
-	}
+	rt_rq = rt_rq_of_se(rt_se);
+	requeue_rt_entity(rt_rq, rt_se, head);
 }
 
 static void yield_task_rt(struct rq *rq)
@@ -1612,6 +1656,36 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct task_struct *donor = rq->donor;
 
+#ifdef CONFIG_RT_GROUP_SCHED
+	if (!rt_group_sched_enabled())
+		goto no_group_sched;
+
+	if (is_dl_group(rt_rq_of_se(&p->rt)) &&
+	    is_dl_group(rt_rq_of_se(&rq->curr->rt))) {
+		struct sched_dl_entity *dl_se, *curr_dl_se;
+
+		dl_se = dl_group_of(rt_rq_of_se(&p->rt));
+		curr_dl_se = dl_group_of(rt_rq_of_se(&rq->curr->rt));
+
+		if (dl_entity_preempt(dl_se, curr_dl_se)) {
+			resched_curr(rq);
+			return;
+		} else if (!dl_entity_preempt(curr_dl_se, dl_se)) {
+			if (p->prio < rq->curr->prio) {
+				resched_curr(rq);
+				return;
+			}
+		}
+		return;
+	} else if (is_dl_group(rt_rq_of_se(&p->rt))) {
+		resched_curr(rq);
+		return;
+	} else if (is_dl_group(rt_rq_of_se(&rq->curr->rt))) {
+		return;
+	}
+#endif
+
+no_group_sched:
 	if (p->prio < donor->prio) {
 		resched_curr(rq);
 		return;
@@ -1679,17 +1753,12 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rt_rq *rt_rq)
 	return next;
 }
 
-static struct task_struct *_pick_next_task_rt(struct rq *rq)
+static struct task_struct *_pick_next_task_rt(struct rt_rq *rt_rq)
 {
 	struct sched_rt_entity *rt_se;
-	struct rt_rq *rt_rq = &rq->rt;
 
-	do {
-		rt_se = pick_next_rt_entity(rt_rq);
-		if (unlikely(!rt_se))
-			return NULL;
-		rt_rq = group_rt_rq(rt_se);
-	} while (rt_rq);
+	rt_se = pick_next_rt_entity(rt_rq);
+	BUG_ON(!rt_se);
 
 	return rt_task_of(rt_se);
 }
@@ -1701,7 +1770,7 @@ static struct task_struct *pick_task_rt(struct rq *rq)
 	if (!sched_rt_runnable(rq))
 		return NULL;
 
-	p = _pick_next_task_rt(rq);
+	p = _pick_next_task_rt(&rq->rt);
 
 	return p;
 }
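The wakeup_preempt_rt() hunk above implements a two-level preemption
rule: between two grouped tasks, the group servers' deadlines decide
first and fixed RT priority only breaks the tie; a grouped task always
preempts an ungrouped one. A standalone sketch of that decision, with
dl_entity_preempt() simplified to a plain deadline comparison and all
names illustrative:

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

struct waker { bool grouped; uint64_t deadline; int prio; };

static bool should_resched(const struct waker *p, const struct waker *curr)
{
	if (p->grouped && curr->grouped) {
		if (p->deadline < curr->deadline)	/* earlier deadline wins */
			return true;
		if (p->deadline == curr->deadline)	/* tie: lower prio value wins */
			return p->prio < curr->prio;
		return false;
	}
	if (p->grouped)		/* grouped waker vs global task */
		return true;
	if (curr->grouped)	/* global waker vs grouped task */
		return false;
	return p->prio < curr->prio;	/* plain RT rule */
}

int main(void)
{
	struct waker p = { true, 100, 10 }, curr = { true, 200, 20 };

	printf("preempt: %s\n", should_resched(&p, &curr) ? "yes" : "no");
	return 0;
}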
@@ -2337,12 +2406,36 @@ static void pull_rt_task(struct rq *this_rq)
 		resched_curr(this_rq);
 }
 
+#ifdef CONFIG_RT_GROUP_SCHED
+static int group_push_rt_task(struct rt_rq *rt_rq)
+{
+	struct rq *rq = rq_of_rt_rq(rt_rq);
+
+	if (is_dl_group(rt_rq))
+		return 0;
+
+	return push_rt_task(rq, false);
+}
+
+static void group_push_rt_tasks(struct rt_rq *rt_rq)
+{
+	while (group_push_rt_task(rt_rq))
+		;
+}
+#else
+static void group_push_rt_tasks(struct rt_rq *rt_rq)
+{
+	push_rt_tasks(rq_of_rt_rq(rt_rq));
+}
+#endif
+
 /*
  * If we are not running and we are not going to reschedule soon, we should
  * try to push tasks away now
  */
 static void task_woken_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
 	bool need_to_push = !task_on_cpu(rq, p) &&
 			    !test_tsk_need_resched(rq->curr) &&
 			    p->nr_cpus_allowed > 1 &&
@@ -2351,7 +2444,7 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
 			     rq->donor->prio <= p->prio);
 
 	if (need_to_push)
-		push_rt_tasks(rq);
+		group_push_rt_tasks(rt_rq);
 }
 
 /* Assumes rq->lock is held */
@@ -2360,8 +2453,6 @@ static void rq_online_rt(struct rq *rq)
 	if (rq->rt.overloaded)
 		rt_set_overload(rq);
 
-	__enable_runtime(rq);
-
 	cpupri_set(&rq->rd->cpupri, rq->cpu, rq->rt.highest_prio.curr);
 }
 
@@ -2371,8 +2462,6 @@ static void rq_offline_rt(struct rq *rq)
 	if (rq->rt.overloaded)
 		rt_clear_overload(rq);
 
-	__disable_runtime(rq);
-
 	cpupri_set(&rq->rd->cpupri, rq->cpu, CPUPRI_INVALID);
 }
 
@@ -2382,6 +2471,8 @@ static void rq_offline_rt(struct rq *rq)
 */
 static void switched_from_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	/*
 	 * If there are other RT tasks then we will reschedule
 	 * and the scheduling of the other RT tasks will handle
@@ -2389,10 +2480,12 @@ static void switched_from_rt(struct rq *rq, struct task_struct *p)
 	 * we may need to handle the pulling of RT tasks
 	 * now.
 	 */
-	if (!task_on_rq_queued(p) || rq->rt.rt_nr_running)
+	if (!task_on_rq_queued(p) || rt_rq->rt_nr_running)
 		return;
 
+#ifndef CONFIG_RT_GROUP_SCHED
 	rt_queue_pull_task(rq);
+#endif
 }
 
 void __init init_sched_rt_class(void)
@@ -2429,8 +2522,16 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 	 */
 	if (task_on_rq_queued(p)) {
 #ifdef CONFIG_SMP
+#ifndef CONFIG_RT_GROUP_SCHED
 		if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
 			rt_queue_push_tasks(rq);
+#else
+		if (rt_rq_of_se(&p->rt)->overloaded) {
+		} else {
+			if (p->prio < rq->curr->prio)
+				resched_curr(rq);
+		}
+#endif
 #endif /* CONFIG_SMP */
 		if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq)))
 			resched_curr(rq);
@@ -2444,6 +2545,10 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 static void
 prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
 {
+#ifdef CONFIG_SMP
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+#endif
+
 	if (!task_on_rq_queued(p))
 		return;
 
@@ -2453,14 +2558,16 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
 	 * If our priority decreases while running, we
 	 * may need to pull tasks to this runqueue.
 	 */
+#ifndef CONFIG_RT_GROUP_SCHED
 	if (oldprio < p->prio)
 		rt_queue_pull_task(rq);
+#endif
 
 	/*
 	 * If there's a higher priority task waiting to run
 	 * then reschedule.
 	 */
-	if (p->prio > rq->rt.highest_prio.curr)
+	if (p->prio > rt_rq->highest_prio.curr)
 		resched_curr(rq);
 #else
 	/* For UP simply resched on drop of prio */
@@ -2468,6 +2575,15 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
 		resched_curr(rq);
 #endif /* CONFIG_SMP */
 	} else {
+		/*
+		 * This task is not running, thus we check against the currently
+		 * running task for preemption. We can preempt only if both tasks are
+		 * in the same cgroup or on the global runqueue.
+		 */
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) &&
+		    rt_rq_of_se(&p->rt)->tg != rt_rq_of_se(&rq->curr->rt)->tg)
+			return;
+
 		/*
 		 * This task is not running, but if it is
 		 * greater than the current running task
@@ -2539,12 +2655,12 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 	 * Requeue to the end of queue if we (and all of our ancestors) are not
 	 * the only element on the queue
 	 */
-	for_each_sched_rt_entity(rt_se) {
-		if (rt_se->run_list.prev != rt_se->run_list.next) {
-			requeue_task_rt(rq, p, 0);
-			resched_curr(rq);
-			return;
-		}
+	if (rt_se->run_list.prev != rt_se->run_list.next) {
+		requeue_task_rt(rq, p, 0);
+		resched_curr(rq);
+		// set_tsk_need_resched(p);
+
+		return;
 	}
 }
 
@@ -2562,16 +2678,16 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 #ifdef CONFIG_SCHED_CORE
 static int task_is_throttled_rt(struct task_struct *p, int cpu)
 {
-	struct rt_rq *rt_rq;
-
 #ifdef CONFIG_RT_GROUP_SCHED // XXX maybe add task_rt_rq(), see also sched_rt_period_rt_rq
+	struct rt_rq *rt_rq;
+
 	rt_rq = task_group(p)->rt_rq[cpu];
 	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
+
+	return dl_group_of(rt_rq)->dl_throttled;
 #else
-	rt_rq = &cpu_rq(cpu)->rt;
+	return 0;
 #endif
-
-	return rt_rq_throttled(rt_rq);
 }
 #endif
 
@@ -2655,8 +2771,8 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
 	unsigned long total, sum = 0;
 	u64 period, runtime;
 
-	period = ktime_to_ns(tg->rt_bandwidth.rt_period);
-	runtime = tg->rt_bandwidth.rt_runtime;
+	period = tg->dl_bandwidth.dl_period;
+	runtime = tg->dl_bandwidth.dl_runtime;
 
 	if (tg == d->tg) {
 		period = d->rt_period;
@@ -2672,8 +2788,7 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
 	/*
 	 * Ensure we don't starve existing RT tasks if runtime turns zero.
 	 */
-	if (rt_bandwidth_enabled() && !runtime &&
-	    tg->rt_bandwidth.rt_runtime && tg_has_rt_tasks(tg))
+	if (dl_bandwidth_enabled() && !runtime && tg_has_rt_tasks(tg))
 		return -EBUSY;
 
 	if (WARN_ON(!rt_group_sched_enabled() && tg != &root_task_group))
@@ -2687,12 +2802,17 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
 	if (total > to_ratio(global_rt_period(), global_rt_runtime()))
 		return -EINVAL;
 
+	if (tg == &root_task_group) {
+		if (!dl_check_tg(total))
+			return -EBUSY;
+	}
+
 	/*
 	 * The sum of our children's runtime should not exceed our own.
 	 */
 	list_for_each_entry_rcu(child, &tg->children, siblings) {
-		period = ktime_to_ns(child->rt_bandwidth.rt_period);
-		runtime = child->rt_bandwidth.rt_runtime;
+		period = child->dl_bandwidth.dl_period;
+		runtime = child->dl_bandwidth.dl_runtime;
 
 		if (child == d->tg) {
 			period = d->rt_period;
@@ -2718,6 +2838,20 @@ static int __rt_schedulable(struct task_group *tg, u64 period, u64 runtime)
 		.rt_runtime = runtime,
 	};
 
+	/*
+	 * Since we truncate DL_SCALE bits, make sure we're at least
+	 * that big.
+	 */
+	if (runtime != 0 && runtime < (1ULL << DL_SCALE))
+		return -EINVAL;
+
+	/*
+	 * Since we use the MSB for wrap-around and sign issues, make
+	 * sure it's not set (mind that period can be equal to zero).
+	 */
+	if (period & (1ULL << 63))
+		return -EINVAL;
+
 	rcu_read_lock();
 	ret = walk_tg_tree(tg_rt_schedulable, tg_nop, &data);
 	rcu_read_unlock();
@@ -2752,18 +2886,21 @@ static int tg_set_rt_bandwidth(struct task_group *tg,
 	if (err)
 		goto unlock;
 
-	raw_spin_lock_irq(&tg->rt_bandwidth.rt_runtime_lock);
-	tg->rt_bandwidth.rt_period = ns_to_ktime(rt_period);
-	tg->rt_bandwidth.rt_runtime = rt_runtime;
+	raw_spin_lock_irq(&tg->dl_bandwidth.dl_runtime_lock);
+	tg->dl_bandwidth.dl_period = rt_period;
+	tg->dl_bandwidth.dl_runtime = rt_runtime;
 
-	for_each_possible_cpu(i) {
-		struct rt_rq *rt_rq = tg->rt_rq[i];
+	if (tg == &root_task_group)
+		goto unlock_bandwidth;
 
-		raw_spin_lock(&rt_rq->rt_runtime_lock);
-		rt_rq->rt_runtime = rt_runtime;
-		raw_spin_unlock(&rt_rq->rt_runtime_lock);
+	for_each_possible_cpu(i) {
+		if (!dl_init_tg(tg->dl_se[i], rt_runtime, rt_period)) {
+			err = -EINVAL;
+			break;
+		}
 	}
-	raw_spin_unlock_irq(&tg->rt_bandwidth.rt_runtime_lock);
+unlock_bandwidth:
+	raw_spin_unlock_irq(&tg->dl_bandwidth.dl_runtime_lock);
 unlock:
 	mutex_unlock(&rt_constraints_mutex);
 
@@ -2774,7 +2911,7 @@ int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us)
 {
 	u64 rt_runtime, rt_period;
 
-	rt_period = ktime_to_ns(tg->rt_bandwidth.rt_period);
+	rt_period = tg->dl_bandwidth.dl_period;
 	rt_runtime = (u64)rt_runtime_us * NSEC_PER_USEC;
 	if (rt_runtime_us < 0)
 		rt_runtime = RUNTIME_INF;
@@ -2788,10 +2925,10 @@ long sched_group_rt_runtime(struct task_group *tg)
 {
 	u64 rt_runtime_us;
 
-	if (tg->rt_bandwidth.rt_runtime == RUNTIME_INF)
+	if (tg->dl_bandwidth.dl_runtime == RUNTIME_INF)
 		return -1;
 
-	rt_runtime_us = tg->rt_bandwidth.rt_runtime;
+	rt_runtime_us = tg->dl_bandwidth.dl_runtime;
 	do_div(rt_runtime_us, NSEC_PER_USEC);
 	return rt_runtime_us;
 }
@@ -2804,7 +2941,7 @@ int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us)
 		return -EINVAL;
 
 	rt_period = rt_period_us * NSEC_PER_USEC;
-	rt_runtime = tg->rt_bandwidth.rt_runtime;
+	rt_runtime = tg->dl_bandwidth.dl_runtime;
 
 	return tg_set_rt_bandwidth(tg, rt_period, rt_runtime);
 }
@@ -2813,7 +2950,7 @@ long sched_group_rt_period(struct task_group *tg)
 {
 	u64 rt_period_us;
 
-	rt_period_us = ktime_to_ns(tg->rt_bandwidth.rt_period);
+	rt_period_us = tg->dl_bandwidth.dl_period;
 	do_div(rt_period_us, NSEC_PER_USEC);
 	return rt_period_us;
 }
@@ -2834,7 +2971,7 @@ static int sched_rt_global_constraints(void)
 int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
 {
 	/* Don't accept real-time tasks when there is no way for them to run */
-	if (rt_group_sched_enabled() && rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
+	if (rt_group_sched_enabled() && rt_task(tsk) && tg->dl_bandwidth.dl_runtime == 0)
 		return 0;
 
 	return 1;
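The cgroup interface above keeps the existing unit convention:
cpu.rt_runtime_us values arrive in microseconds, are stored in
nanoseconds, and a negative runtime maps to RUNTIME_INF. A standalone
sketch of that conversion (constants copied from the kernel's
convention, scaffolding illustrative):

#include <stdio.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
#define RUNTIME_INF ((uint64_t)~0ULL)

static uint64_t runtime_us_to_ns(long long rt_runtime_us)
{
	if (rt_runtime_us < 0)
		return RUNTIME_INF;	/* unlimited */
	return (uint64_t)rt_runtime_us * NSEC_PER_USEC;
}

int main(void)
{
	printf("10000us -> %lluns\n",
	       (unsigned long long)runtime_us_to_ns(10000));
	printf("-1 -> %s\n",
	       runtime_us_to_ns(-1) == RUNTIME_INF ? "RUNTIME_INF" : "?");
	return 0;
}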
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 439a95239..c7227a510 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -318,6 +318,13 @@ struct rt_bandwidth {
 	unsigned int		rt_period_active;
 };
 
+struct dl_bandwidth {
+	raw_spinlock_t		dl_runtime_lock;
+	u64			dl_runtime;
+	u64			dl_period;
+};
+
+
 static inline int dl_bandwidth_enabled(void)
 {
 	return sysctl_sched_rt_runtime >= 0;
@@ -385,6 +392,8 @@ extern void dl_server_init(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq,
 		    struct rq *served_rq,
 		    dl_server_has_tasks_f has_tasks,
 		    dl_server_pick_f pick_task);
+int dl_check_tg(unsigned long total);
+int dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 rt_period);
 
 extern void dl_server_update_idle_time(struct rq *rq,
 		    struct task_struct *p);
@@ -455,9 +464,15 @@ struct task_group {
 
 #ifdef CONFIG_RT_GROUP_SCHED
 	struct sched_rt_entity	**rt_se;
+	/*
+	 * The scheduling entities for the task group are managed as a single
+	 * sched_dl_entity, each of them sharing the same dl_bandwidth.
+	 */
+	struct sched_dl_entity	**dl_se;
 	struct rt_rq		**rt_rq;
 
 	struct rt_bandwidth	rt_bandwidth;
+	struct dl_bandwidth	dl_bandwidth;
 #endif
 
 #ifdef CONFIG_EXT_GROUP_SCHED
@@ -552,9 +567,9 @@ extern void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b);
 extern void unthrottle_cfs_rq(struct cfs_rq *cfs_rq);
 extern bool cfs_task_bw_constrained(struct task_struct *p);
 
-extern void init_tg_rt_entry(struct task_group *tg, struct rt_rq *rt_rq,
-		struct sched_rt_entity *rt_se, int cpu,
-		struct sched_rt_entity *parent);
+extern void init_tg_rt_entry(struct task_group *tg, struct rq *s_rq,
+		struct sched_dl_entity *rt_se, int cpu,
+		struct sched_dl_entity *parent);
 extern int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us);
 extern int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us);
 extern long sched_group_rt_runtime(struct task_group *tg);
@@ -784,7 +799,7 @@ struct scx_rq {
 
 static inline int rt_bandwidth_enabled(void)
 {
-	return sysctl_sched_rt_runtime >= 0;
+	return 0;
 }
 
 /* RT IPI pull logic requires IRQ_WORK */
@@ -820,12 +835,12 @@ struct rt_rq {
 	raw_spinlock_t		rt_runtime_lock;
 
 	unsigned int		rt_nr_boosted;
-
-	struct rq		*rq;	/* this is always top-level rq, cache? */
 #endif
 #ifdef CONFIG_CGROUP_SCHED
 	struct task_group	*tg;	/* this tg has "this" rt_rq on given CPU for runnable entities */
 #endif
+
+	struct rq		*rq;	/* this is always top-level rq, cache? */
 };
 
 static inline bool rt_rq_is_runnable(struct rt_rq *rt_rq)
@@ -2174,7 +2189,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 	if (!rt_group_sched_enabled())
 		tg = &root_task_group;
 	p->rt.rt_rq  = tg->rt_rq[cpu];
-	p->rt.parent = tg->rt_se[cpu];
+	p->dl.dl_rq  = &cpu_rq(cpu)->dl;
 #endif
 }
 
@@ -2702,6 +2717,7 @@ extern void resched_cpu(int cpu);
 extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
 extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);
 
+void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime);
 extern void init_dl_entity(struct sched_dl_entity *dl_se);
 
 #define BW_SHIFT		20
@@ -2760,6 +2776,7 @@ static inline void add_nr_running(struct rq *rq, unsigned count)
 
 static inline void sub_nr_running(struct rq *rq, unsigned count)
 {
+	BUG_ON(rq->nr_running < count);
 	rq->nr_running -= count;
 	if (trace_sched_update_nr_running_tp_enabled()) {
 		call_trace_sched_update_nr_running(rq, -count);
@@ -3131,9 +3148,6 @@ static inline void double_rq_unlock(struct rq *rq1, struct rq *rq2)
 #ifdef CONFIG_RT_GROUP_SCHED
 static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 {
-#ifdef CONFIG_SCHED_DEBUG
-	WARN_ON_ONCE(rt_se->my_q);
-#endif
 	return container_of(rt_se, struct task_struct, rt);
 }
 
@@ -3153,6 +3167,21 @@ static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se)
 
 	return rt_rq->rq;
 }
+
+static inline int is_dl_group(struct rt_rq *rt_rq)
+{
+	return rt_rq->tg != &root_task_group;
+}
+
+/*
+ * Return the scheduling entity of this group of tasks.
+ */
+static inline struct sched_dl_entity *dl_group_of(struct rt_rq *rt_rq)
+{
+	BUG_ON(!is_dl_group(rt_rq));
+
+	return rt_rq->tg->dl_se[cpu_of(rt_rq->rq)];
+}
 #else
 static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 {
@@ -3177,6 +3206,16 @@ static inline struct rt_rq *rt_rq_of_se(struct sched_rt_entity *rt_se)
 
 	return &rq->rt;
 }
+
+static inline int is_dl_group(struct rt_rq *rt_rq)
+{
+	return 0;
+}
+
+static inline struct sched_dl_entity *dl_group_of(struct rt_rq *rt_rq)
+{
+	return NULL;
+}
 #endif
 
 DEFINE_LOCK_GUARD_2(double_rq_lock, struct rq,
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 547c1f05b..6c6666b39 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -635,8 +635,8 @@ int __sched_setscheduler(struct task_struct *p,
 	 * assigned.
 	 */
 	if (rt_group_sched_enabled() &&
-	    rt_bandwidth_enabled() && rt_policy(policy) &&
-	    task_group(p)->rt_bandwidth.rt_runtime == 0 &&
+	    dl_bandwidth_enabled() && rt_policy(policy) &&
+	    task_group(p)->dl_bandwidth.dl_runtime == 0 &&
 	    !task_group_is_autogroup(task_group(p))) {
 		retval = -EPERM;
 		goto unlock;
-- 
2.49.0