From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v4 26/28] Documentation: Update documentation for real-time cgroups
Date: Mon, 1 Dec 2025 13:41:59 +0100
Message-ID: <20251201124205.11169-27-yurand2000@gmail.com>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251201124205.11169-1-yurand2000@gmail.com>
References: <20251201124205.11169-1-yurand2000@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Update the RT_GROUP_SCHED specific documentation. Give a brief theoretical
background for the Hierarchical Constant Bandwidth Server (HCBS). Document how
HCBS is implemented in the kernel and how RT_GROUP_SCHED behaves now compared
to the version which this patchset replaces.

Signed-off-by: Yuri Andriaccio
---
 Documentation/scheduler/sched-rt-group.rst | 500 ++++++++++++++++++---
 1 file changed, 426 insertions(+), 74 deletions(-)

diff --git a/Documentation/scheduler/sched-rt-group.rst b/Documentation/scheduler/sched-rt-group.rst
index ab464335d3..a5a9203355 100644
--- a/Documentation/scheduler/sched-rt-group.rst
+++ b/Documentation/scheduler/sched-rt-group.rst
@@ -53,9 +53,12 @@ CPU time is divided by means of specifying how much time can be spent running
 in a given period. We allocate this "run time" for each real-time group which
 the other real-time groups will not be permitted to use.
 
-Any time not allocated to a real-time group will be used to run normal priority
-tasks (SCHED_OTHER). Any allocated run time not used will also be picked up by
-SCHED_OTHER.
+Each real-time group runs at the same priority as SCHED_DEADLINE, thus they
+share and contend for the bandwidth allowed to SCHED_DEADLINE. Any time not
+allocated to a real-time group (and SCHED_DEADLINE tasks) will be used to run
+SCHED_FIFO/SCHED_RR, normal priority (SCHED_OTHER) and SCHED_EXT tasks,
+following the usual priorities. Any allocated run time not used will also be
+picked up by the other scheduling classes, in the same order as before.
 
 Let's consider an example: a frame fixed real-time renderer must deliver 25
 frames a second, which yields a period of 0.04s per frame. Now say it will also
@@ -73,10 +76,6 @@ The remaining CPU time will be used for user input and other tasks. Because
 real-time tasks have explicitly allocated the CPU time they need to perform
 their tasks, buffer underruns in the graphics or audio can be eliminated.
 
-NOTE: the above example is not fully implemented yet. We still
-lack an EDF scheduler to make non-uniform periods usable.
-
-
 2. The Interface
 ================
 
@@ -86,40 +85,92 @@ lack an EDF scheduler to make non-uniform periods usable.
 
 The system wide settings are configured under the /proc virtual file system:
 
-/proc/sys/kernel/sched_rt_period_us:
+``/proc/sys/kernel/sched_rt_period_us``:
   The scheduling period that is equivalent to 100% CPU bandwidth.
 
-/proc/sys/kernel/sched_rt_runtime_us:
-  A global limit on how much time real-time scheduling may use. This is always
-  less or equal to the period_us, as it denotes the time allocated from the
-  period_us for the real-time tasks. Without CONFIG_RT_GROUP_SCHED enabled,
-  this only serves for admission control of deadline tasks. With
-  CONFIG_RT_GROUP_SCHED=y it also signifies the total bandwidth available to
-  all real-time groups.
+``/proc/sys/kernel/sched_rt_runtime_us``:
+  A global limit on how much time real-time scheduling may use (SCHED_DEADLINE
+  tasks + real-time groups). This is always less than or equal to the
+  period_us, as it denotes the time allocated from the period_us for the
+  real-time tasks. Without **CONFIG_RT_GROUP_SCHED** enabled, this only serves
+  for admission control of deadline tasks. With **CONFIG_RT_GROUP_SCHED=y** it
+  also signifies the total bandwidth available to both real-time groups and
+  deadline tasks.
 
  * Time is specified in us because the interface is s32. This gives an
    operating range from 1us to about 35 minutes.
- * sched_rt_period_us takes values from 1 to INT_MAX.
- * sched_rt_runtime_us takes values from -1 to sched_rt_period_us.
+ * ``sched_rt_period_us`` takes values from 1 to INT_MAX.
+ * ``sched_rt_runtime_us`` takes values from -1 to ``sched_rt_period_us``.
  * A run time of -1 specifies runtime == period, i.e. no limit.
- * sched_rt_runtime_us/sched_rt_period_us > 0.05 inorder to preserve
+ * ``sched_rt_runtime_us/sched_rt_period_us`` > 0.05 in order to preserve
    bandwidth for fair dl_server. For accurate value check average of
-   runtime/period in /sys/kernel/debug/sched/fair_server/cpuX/
-
-
-2.2 Default behaviour
----------------------
-
-The default values for sched_rt_period_us (1000000 or 1s) and
-sched_rt_runtime_us (950000 or 0.95s). This gives 0.05s to be used by
-SCHED_OTHER (non-RT tasks). These defaults were chosen so that a run-away
-real-time tasks will not lock up the machine but leave a little time to recover
-it. By setting runtime to -1 you'd get the old behaviour back.
-
-By default all bandwidth is assigned to the root group and new groups get the
-period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
-want to assign bandwidth to another group, reduce the root group's bandwidth
-and assign some or all of the difference to another group.
+   runtime/period in ``/sys/kernel/debug/sched/fair_server/cpuX/``
+
+The default value for ``sched_rt_period_us`` is 1000000 (or 1s) and for
+``sched_rt_runtime_us`` it is 950000 (or 0.95s). This gives a minimum of 0.05s
+to be used by SCHED_FIFO/SCHED_RR and non-RT tasks (SCHED_OTHER, SCHED_EXT),
+while 0.95s is the maximum to be used by SCHED_DEADLINE, and rt-cgroups if
+enabled.
+
+2.2 Cgroup settings
+-------------------
+
+Enabling **CONFIG_RT_GROUP_SCHED** lets you explicitly allocate real CPU
+bandwidth to task groups.
+
+This uses the cgroup virtual file system and the CPU controller for cgroups.
+Enabling the controller for the hierarchy creates two files:
+
+* ``<cgroup>/cpu.rt_period_us``, the scheduling period of the group.
+* ``<cgroup>/cpu.rt_runtime_us``, the maximum runtime each CPU will provide
+  every period.
+
+  .. tip::
+    For more information on working with control groups, you should read
+    *Documentation/admin-guide/cgroup-v1/cgroups.rst* as well.
+  ..
+
+By default the root cgroup has the same period as
+``/proc/sys/kernel/sched_rt_period_us``, which is 1s, and a runtime of zero, so
+that rt-cgroup is *soft-disabled* by default, and all the runtime is available
+for SCHED_DEADLINE tasks only. New groups instead get both a period and a
+runtime of zero.
+
+2.3 Cgroup Hierarchy and Behaviours
+-----------------------------------
+
+With HCBS, cgroups may act either as task runners or as bandwidth reservations:
+
+* A bandwidth reservation cgroup (such as the root control group) has the
+  purpose of reserving a portion of the total real-time bandwidth for its
+  sub-tree of groups. A group in this state cannot run SCHED_FIFO/SCHED_RR
+  tasks.
+
+  .. important::
+    The *root control group* behaviour is different from that of the other
+    cgroups, as its job is to reserve bandwidth for the whole group hierarchy,
+    but it can also run rt tasks. This is an exception: FIFO/RR tasks running
+    in the root cgroup follow the same rules as FIFO/RR tasks in a kernel which
+    has **CONFIG_RT_GROUP_SCHED=n**, while the bandwidth reservation is instead
+    a feature connected to HCBS, which acts on the cgroup tree.
+  ..
+
+* A *live* group instead can be used to run FIFO/RR tasks, with the given
+  bandwidth parameters: each CPU is served a *potentially continuous* runtime of
+  ``<cgroup>/cpu.rt_runtime_us`` every period ``<cgroup>/cpu.rt_period_us``. It
+  is important to note that increasing the period while leaving the bandwidth
+  constant changes the behaviour of the cgroup's servers: the overall bandwidth
+  given is the same, but it is given in longer bursts (with longer windows of no
+  bandwidth).
+
+More specifically, on *live* and non-*live* groups:
+
+* A group is deemed *live* if it is a leaf of the groups' hierarchy or if all of
+  its children have runtime 0.
+* *Live* groups are the only groups allowed to run real-time tasks. A SCHED_FIFO
+  task cannot be migrated into a non-*live* group, nor can a task inside such a
+  group change its scheduling policy to SCHED_FIFO/SCHED_RR if the group is not
+  *live*.
+* Non-*live* groups are only used for bandwidth reservation.
+* Group bandwidths follow this invariant: the sum of the bandwidths of a
+  group's children is always less than or equal to the group's bandwidth.
 
 Real-time group scheduling means you have to assign a portion of total CPU
 bandwidth to the group before it will accept real-time tasks. Therefore you will
@@ -128,63 +179,364 @@ done that, even if the user has the rights to run processes with real-time
 priority!
 
-2.3 Basis for grouping tasks
-----------------------------
+3. Theoretical Background
+=========================
+
+  .. BIG FAT WARNING ******************************************************
+
+  .. warning::
+
+    This section contains a (non-thorough) summary of deadline/hierarchical
+    scheduling theory, and how it applies to real-time control groups.
+    The reader can "safely" skip to Section 4 if only interested in seeing
+    how the scheduling policy can be used. Anyway, we strongly recommend
+    coming back here and continuing to read (once the urge for testing is
+    satisfied :P) to be sure of fully understanding all technical details.
+
+  .. ************************************************************************
+
+The real-time cgroup scheduler is based upon the **Hierarchical Constant
+Bandwidth Server** (HCBS) [1], a *Compositional Scheduling Framework* (CSF). A
+**CSF** is a framework where global (system-level) timing properties can be
+established by composing independently (specified and) analyzed local
+(component-level) timing properties [5].
+
+For HCBS (as related to the Linux kernel), the compositional framework consists
+of two parts:
+
+* The *scheduling components*, which are the basic units of scheduling. In the
+  kernel these are the single cgroups along with the tasks that must be run
+  inside them.
+
+* The *scheduling resources*, which are the CPUs of the machine.
+
+HCBS is a *hierarchical scheduling framework*, where the scheduling components
+form a hierarchy and resources are allocated from parent components to their
+child components in the hierarchy.
+
+The chapter is organized as follows: **Section 3.1** gives basic real-time
+theory definitions that are used throughout the whole section. **Section 3.2**
+talks about the HCBS framework, giving a general idea of how it is structured.
+**Section 3.3** introduces the MPR model, one of the many models which may be
+used for the analysis of the scheduling components and the computation of the
+minimum required scheduling resources for a given component. **Section 3.4**
+shows the schedulability test for MPR on the HCBS framework. **Section 3.5**
+shows how to convert an MPR interface to an HCBS-compatible resource reservation
+for a component. Finally, **Section 3.6** lists other interesting models which
+could be used for the component analysis in HCBS.
+
+3.1 Basic Definitions
+---------------------
+
+*We borrow the definitions given in the* ``sched_deadline`` *document, which
+are very briefly summarized here; new ones, needed by the following content,
+are added.*
+
+A typical real-time task is composed of a repetition of computation phases (task
+instances, or jobs) which are activated in a periodic or sporadic fashion. For
+our purposes, real-time tasks are characterized by three parameters:
+
+* Worst Case Execution Time (WCET): the maximum execution time among all jobs.
+* Relative Deadline (D): the time by which each job must be completed, relative
+  to the release time of the job.
+* Inter-Arrival Period (P): the exact/minimum (for periodic/sporadic tasks) time
+  between consecutive jobs.
+
+3.2 Hierarchical Constant Bandwidth Server (HCBS) [1]
+-----------------------------------------------------
+
+As mentioned, HCBS is a *hierarchical scheduling framework*:
+
+* The framework hierarchy follows the same hierarchy as cgroups. Cgroups may
+  have two roles: either bandwidth reservation for children cgroups, or they may
+  be *live*, i.e. run tasks (but not both). The root cgroup, for the kernel's
+  implementation of HCBS, acts only as bandwidth reservation (but as written in
+  this document it also has different uses outside of the hierarchical
+  framework).
+* The cgroup tree is internally flattened, for ease of scheduling, to a
+  two-level hierarchy, since only the *live* groups are of interest and all the
+  necessary information for their scheduling lies in their interface (there is
+  no need for the reservation components).
+* The hierarchical framework, now on two levels, then consists of a first level
+  of cgroups, and a second level of tasks that are run inside these groups.
+* The scheduling of components is performed using global Earliest Deadline First
+  (gEDF), SCHED_DEADLINE in the kernel, following the bandwidth reservation of
+  each group.
+* Whenever a component is scheduled, a local scheduler picks which of the tasks
+  of the cgroup to run. The scheduling policy is global Fixed Priority (gFP),
+  SCHED_FIFO/SCHED_RR in the kernel.
+
+3.3 Multiprocessor Periodic Resource (MPR) model
+------------------------------------------------
+
+A Multiprocessor Periodic Resource (MPR) model [2] **u = <Pi, Theta, m'>**
+specifies that an identical, unit-capacity multiprocessor platform collectively
+provides **Theta** units of resource every **Pi** time units, where the
+**Theta** time units are supplied with concurrency at most **m'**.
+
+This theoretical model is one of the many models that can abstract the
+interface of our real-time cgroups: let **m'** be the number of CPUs of the
+machine, let **Theta** be **m' * <cgroup>/cpu.rt_runtime_us** and **Pi** be
+**<cgroup>/cpu.rt_period_us**.
 
-Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
-CPU bandwidth to task groups.
+Let's introduce the concept of Supply Bound Function (SBF). An SBF is a function
+which outputs a lower bound for the processor supply provided in a given time
+interval, given a resource supply model. For a completely dedicated CPU, the SBF
+is simply the identity function, as it will always provide **t** units of
+computation for an interval of length **t**. The situation gets slightly more
+complicated for the MPR model or any of the other models listed in Section 3.6.
 
-This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
-to control the CPU time reserved for each control group.
+The **SBF(t)** for an MPR model **u = <Pi, Theta, m'>** is::
 
-For more information on working with control groups, you should read
-Documentation/admin-guide/cgroup-v1/cgroups.rst as well.
+             | 0                                       if t' < 0
+             |
+  SBF_u(t) = | floor(t' / Pi) * Theta
+             |   + max(0, m' * x - (m' * Pi - Theta))  if t' >= 0 and 1 <= x <= y
+             |
+             | floor(t' / Pi) * Theta
+             |   + max(0, m' * x - (m' * Pi - Theta))  otherwise
+             |   - (m' - beta)
 
-Group settings are checked against the following limits in order to keep the
-configuration schedulable:
+where::
 
-  \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
+  alpha = floor(Theta / m')
+  beta  = Theta - m' * alpha
+  t'    = t - (Pi - ceil(Theta / m'))
+  x     = t' - (Pi * floor(t' / Pi))
+  y     = Pi - floor(Theta / m')
 
-For now, this can be simplified to just the following (but see Future plans):
+Briefly, this function models that the server's bandwidth is given as late as
+possible, thus describing the worst possible case for the supplied bandwidth.
 
-  \Sum_{i} runtime_{i} <= global_runtime
+3.4 Schedulability for MPR on global Fixed-Priority
+---------------------------------------------------
 
+Let's introduce the concept of Demand Bound Function (DBF). A DBF is a function
+that, given a taskset, a scheduling algorithm and an interval of time, outputs
+the worst-case resource demand for that interval of time.
 
-3. Future plans
-===============
+It is easy to see that, given a DBF and an SBF, we can deem a component/taskset
+schedulable if, for every time interval t >= 0, it is possible to demonstrate
+that:
 
-There is work in progress to make the scheduling period for each group
-("<cgroup>/cpu.rt_period_us") configurable as well.
+  DBF(t) <= SBF(t)
 
-The constraint on the period is that a subgroup must have a smaller or
-equal period to its parent. But realistically its not very useful _yet_
-as its prone to starvation without deadline scheduling.
+We have the Supply Bound Function for our given MPR model, so we are missing the
+Demand Bound Function for a given taskset that is being scheduled using global
+Fixed Priority.
 
-Consider two sibling groups A and B; both have 50% bandwidth, but A's
-period is twice the length of B's.
+3.4.1 Schedulability Analysis for global Fixed Priority
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-* group A: period=100000us, runtime=50000us
+Bertogna, Cirinei and Lipari [6] have derived a schedulability test for global
+Fixed Priority (gFP) on multi-processor platforms. In this test (called the
+*BCL_gFP* test) we can consider all the CPUs to be dedicated to the scheduling.
 
-  - this runs for 0.05s once every 0.1s
+  A taskset **Tau** is schedulable with gFP on a multiprocessor platform
+  composed of **m'** identical processors if for each task **tau_k in Tau**:
 
-* group B: period= 50000us, runtime=25000us
+    Sum(for i < k)( min(W_i(D_k), D_k - C_k + 1) ) < m' * (D_k - C_k + 1)
 
-  - this runs for 0.025s twice every 0.1s (or once every 0.05 sec).
+  where **W_i(t)** is the workload of task **tau_i** over a time interval **t**:
 
-This means that currently a while (1) loop in A will run for the full period of
-B and can starve B's tasks (assuming they are of lower priority) for a whole
-period.
+    W_i(t) = N_i(t) * C_i + min(C_i, t + D_i - C_i - N_i(t) * P_i)
 
-The next project will be SCHED_EDF (Earliest Deadline First scheduling) to bring
-full deadline scheduling to the linux kernel. Deadline scheduling the above
-groups and treating end of the period as a deadline will ensure that they both
-get their allocated time.
+  and **N_i(t)** is the number of activations of task **tau_i** that complete
+  in a time interval **t**:
 
-Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
-the biggest challenge as the current linux PI infrastructure is geared towards
-the limited static priority levels 0-99. With deadline scheduling you need to
-do deadline inheritance (since priority is inversely proportional to the
-deadline delta (deadline - now)).
+    N_i(t) = floor( (t + D_i - C_i) / P_i )
+
+  while the **min** term is the contribution of the carried-out job in the
+  interval **t**, i.e. the job that does not completely fit in the interval
+  **t**, but starts inside the interval after all the jobs that complete.
+
+3.4.2 From BCL_gFP to the Demand Bound Function
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We can then derive the DBF from this test:
+
+  DBF_gFP(tau_k) = Sum(for i < k)( min(W_i(D_k), D_k - C_k + 1) ) + m' * (C_k - 1)
+
+Briefly, the first sum component, the same as in the BCL_gFP test, describes the
+maximum interference that higher-priority tasks give to the analysed task. The
+workload is upper-bounded by ``(D_k - C_k + 1)`` because we are only interested
+in the interference in the slack time, while for the ``C_k`` time we are
+requiring that all the CPUs are fully available, as the single job needs ``C_k``
+(non-overlapping) time units to run.
+
+The demand bound function from Bertogna et al. is only defined at a single time
+(i.e. the deadline of the task under analysis) instead of at all possible times,
+as this is the minimum needed to demonstrate schedulability on global Fixed
+Priority.
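The workload and demand computations above can be sketched in a few lines of
Python. This is an illustrative model only, not kernel code; the task tuples
``(C, D, P)``, the function names and the example taskset are hypothetical:

```python
from math import floor

def workload(task, t):
    # W_i(t): upper bound on the work a higher-priority task tau_i = (C, D, P)
    # can demand in a window of length t (BCL_gFP).
    C, D, P = task
    n = floor((t + D - C) / P)                # N_i(t): jobs completing in the window
    return n * C + min(C, t + D - C - n * P)  # plus the carried-out job

def dbf_gfp(tasks, k, m):
    # DBF_gFP(tau_k) on m processors; tasks[0..k-1] have higher priority.
    C_k, D_k, _ = tasks[k]
    interference = sum(min(workload(tasks[i], D_k), D_k - C_k + 1)
                       for i in range(k))
    return interference + m * (C_k - 1)

# hypothetical taskset, sorted by decreasing priority
taskset = [(2, 10, 10), (3, 15, 15), (5, 20, 20)]
print(dbf_gfp(taskset, 2, m=2))
```

Comparing this value against ``SBF_u(D_k)`` for the chosen MPR interface, as in
Section 3.4.3, gives the per-task schedulability check.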
+
+3.4.3 Putting it all together
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A component **C**, on **m'** processors, running a taskset **Tau = { tau_1 =
+(C_1, D_1, P_1), ..., tau_n = (C_n, D_n, P_n) }** of **n** sporadic tasks, is
+schedulable under gFP using an MPR model **u = <Pi, Theta, m'>**, if for all
+tasks **tau_k in Tau**:
+
+  DBF_gFP(tau_k) <= SBF_u(D_k)
+
+3.5 From MPR to deadline servers
+--------------------------------
+
+Since there exists no algorithm to schedule MPR interfaces, a technique was
+developed to transform MPR interfaces into periodic tasks, so that a number of
+periodic servers which respect the tasks' requirements can be used for the
+scheduling of the MPR interface and associated tasks.
+
+Let **u = <Pi, Theta, m'>** be an MPR interface, let **a = Theta - m' *
+floor(Theta / m')**, and let **k = floor(a)**. Define a transformation from
+**u** to a periodic taskset **Tau_u = { tau_1 = (C_1, D_1, P_1), ..., tau_m' =
+(C_m', D_m', P_m') }**, where:
+
+  **tau_1 = ... = tau_k = (floor(Theta / m') + 1, Pi, Pi)**
+
+  **tau_k+1 = (floor(Theta / m') + a - k * floor(a/k), Pi, Pi)**
+
+  **tau_k+2 = ... = tau_m' = (floor(Theta / m'), Pi, Pi)**
+
+This periodic taskset of servers **Tau_u** can be scheduled on any number of
+processors with concurrency at most **m'**.
+
+For real-time control groups, it is possible to just consider a slightly more
+demanding taskset **Tau_u'**, where each task **tau_i** is defined as follows:
+
+  **tau_i = (ceil(Theta / m'), Pi, Pi)**
+
+3.6 Other models
+----------------
+
+There exist many other theoretical models in the literature which are used to
+describe a hierarchical scheduling framework on multi-core architectures.
+Notable examples are the Multi Supply Function (MSF) abstraction [3], the
+Parallel Supply Function (PSF) abstraction [4] and the Bounded Delay
+Multipartition (BDM) [7].
+
+3.7 References
+--------------
+
+  1 - L. Abeni, A. Balsini, and T. Cucinotta, “Container-based real-time
+      scheduling in the Linux kernel,” SIGBED Rev., vol. 16, no. 3, pp. 33-38,
+      Nov. 2019, doi: 10.1145/3373400.3373405.
+  2 - A. Easwaran, I. Shin, and I. Lee, “Optimal virtual cluster-based
+      multiprocessor scheduling,” Real-Time Syst, vol. 43, no. 1, pp. 25-59,
+      Sept. 2009, doi: 10.1007/s11241-009-9073-x.
+  3 - E. Bini, G. Buttazzo, and M. Bertogna, “The Multi Supply Function
+      Abstraction for Multiprocessors,” in 2009 15th IEEE International
+      Conference on Embedded and Real-Time Computing Systems and Applications,
+      Aug. 2009, pp. 294-302. doi: 10.1109/RTCSA.2009.39.
+  4 - E. Bini, B. Marko, and S. K. Baruah, “The Parallel Supply Function
+      Abstraction for a Virtual Multiprocessor,” in Scheduling, S. Albers, S. K.
+      Baruah, R. H. Möhring, and K. Pruhs, Eds., in Dagstuhl Seminar Proceedings
+      (DagSemProc), vol. 10071. Dagstuhl, Germany: Schloss Dagstuhl -
+      Leibniz-Zentrum für Informatik, 2010, pp. 1-14. doi:
+      10.4230/DagSemProc.10071.14.
+  5 - I. Shin and I. Lee, “Compositional real-time scheduling framework,” in
+      25th IEEE International Real-Time Systems Symposium, Dec. 2004, pp. 57-67.
+      doi: 10.1109/REAL.2004.15.
+  6 - M. Bertogna, M. Cirinei, and G. Lipari, “Schedulability Analysis of Global
+      Scheduling Algorithms on Multiprocessor Platforms,” IEEE Transactions on
+      Parallel and Distributed Systems, vol. 20, no. 4, pp. 553-566, Apr. 2009,
+      doi: 10.1109/TPDS.2008.129.
+  7 - G. Lipari and E. Bini, “A Framework for Hierarchical Scheduling on
+      Multiprocessors: From Application Requirements to Run-Time Allocation,” in
+      2010 31st IEEE Real-Time Systems Symposium, Nov. 2010, pp. 249-258. doi:
+      10.1109/RTSS.2010.12.
+
+
+4. Using Real-Time cgroups
+==========================
+
+4.1 CGroup Setup
+----------------
 
-This means the whole PI machinery will have to be reworked - and that is one of
-the most complex pieces of code we have.
+The following is a brief guide to the use of Real-Time Control Groups.
+
+Of course, real-time control groups require mounting of the cgroup file system.
+We have decided to only support cgroups v2, so make sure you mount the v2
+controller for the cgroup hierarchy.
+
+Additionally, real-time cgroups require the CPU controller for the cgroups to
+be enabled::
+
+  # Assume the cgroup file system is mounted at /sys/fs/cgroup
+  > echo "+cpu" > /sys/fs/cgroup/cgroup.subtree_control
+
+The CPU controller can only be mounted if there is no SCHED_FIFO/SCHED_RR task
+scheduled in any cgroup other than the root control group.
+
+The root control group has no bandwidth allocated by default, so make sure to
+allocate some bandwidth so that it can be used by the other cgroups. More on
+that in the following section...
+
+4.2 Bandwidth Allocation for groups
+-----------------------------------
+
+Allocating bandwidth to a cgroup is a fundamental step to run a real-time
+workload. The cgroup filesystem exposes two files:
+
+* ``<cgroup>/cpu.rt_runtime_us``: which specifies the cgroup's runtime in
+  microseconds.
+* ``<cgroup>/cpu.rt_period_us``: which specifies the cgroup's period in
+  microseconds.
+
+Both files are readable and writable, and their default value is zero. By
+definition, the specified runtime must always be less than or equal to the
+period. Additionally, an admission test checks that the bandwidth invariant is
+respected (i.e. sum of children's bandwidth <= parent's bandwidth).
+
+The root control group files instead control and reserve the SCHED_DEADLINE
+bandwidth allocated to real-time cgroups, since real-time groups compete for
+and share the same bandwidth allocated to SCHED_DEADLINE tasks.
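The two checks above can be sketched as a small stand-alone Python model. This
is only an illustration of the admission test, not the kernel's code, and the
function names are made up:

```python
def bandwidth(runtime_us, period_us):
    # Utilization of a group; a group with no period has no bandwidth.
    return runtime_us / period_us if period_us > 0 else 0.0

def admissible(parent, children):
    # Admission test sketch: the parent's runtime must not exceed its period,
    # and the bandwidth invariant must hold (sum of the children's bandwidths
    # less than or equal to the parent's bandwidth).
    runtime_us, period_us = parent
    if runtime_us > period_us:
        return False
    return sum(bandwidth(r, p) for r, p in children) <= bandwidth(runtime_us,
                                                                  period_us)

# e.g. a 0.95s/1s reservation with two children using 40% and 50% of a CPU
print(admissible((950_000, 1_000_000), [(40_000, 100_000), (50_000, 100_000)]))
```

A configuration whose children together exceed the parent's bandwidth (say 60%
plus 50% under a 95% parent) would be rejected by the same check.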
+
+4.3 Running real-time tasks in groups
+-------------------------------------
+
+To run tasks in real-time groups it is just necessary to change a task's
+scheduling policy to SCHED_FIFO/SCHED_RR and migrate it into the group. If the
+group is not allowed to run real-time tasks because of an incorrect
+configuration, either migrating a SCHED_FIFO/SCHED_RR task into the group or
+changing the scheduling policy of a task already inside the group will fail::
+
+  # assume there is a task with PID 42 running
+  # change its scheduling policy to SCHED_FIFO, priority 99
+  > chrt -f -p 99 42
+
+  # migrate the task to a cgroup
+  > echo 42 > /sys/fs/cgroup/<cgroup>/cgroup.procs
+
+4.4 Special case: the root control group
+----------------------------------------
+
+The root cgroup is special, compared to the other cgroups, as its tasks are not
+managed by the HCBS algorithm; rather, they just use the original
+SCHED_FIFO/SCHED_RR policies (as if CONFIG_RT_GROUP_SCHED were disabled). As
+mentioned, its bandwidth files are just used to control how much of the
+SCHED_DEADLINE bandwidth is allocated to cgroups.
+
+4.5 Guarantees and Special Behaviours
+-------------------------------------
+
+Real-time cgroups are run at the same priority level as SCHED_DEADLINE tasks.
+Since this is the highest-priority scheduling policy, and since the Constant
+Bandwidth Server (CBS) enforces that the specified bandwidth requirements for
+both groups and tasks cannot be overrun, real-time groups have the same
+guarantees that SCHED_DEADLINE tasks have, i.e. they will necessarily be
+supplied with the amount of bandwidth requested (whenever the admission tests
+pass).
+
+This means that, since SCHED_FIFO/SCHED_RR tasks (scheduled in the root control
+group) are not subject to bandwidth controls, they are run at a lower priority
+than their cgroup counterparts. Nonetheless, a minimum amount of bandwidth, if
+reserved, will always be available to run SCHED_FIFO/SCHED_RR workloads in the
+root cgroup, while they will be able to use more runtime if any of the
+SCHED_DEADLINE tasks or servers use less than their specified amount of
+bandwidth. SCHED_OTHER tasks are instead scheduled as normal, at a lower
+priority than real-time workloads.
+
+The aforementioned behaviour differs from the preceding RT_GROUP_SCHED
+implementation, but this is necessary to give actual guarantees about the
+amount of bandwidth given to rt-cgroups.
\ No newline at end of file
-- 
2.51.0