From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v4 26/28] Documentation: Update documentation for real-time cgroups
Date: Mon, 1 Dec 2025 13:41:59 +0100
Message-ID: <20251201124205.11169-27-yurand2000@gmail.com>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20251201124205.11169-1-yurand2000@gmail.com>
References: <20251201124205.11169-1-yurand2000@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Update the RT_GROUP_SCHED specific documentation. Give a brief theoretical
background for the Hierarchical Constant Bandwidth Server (HCBS). Document how
HCBS is implemented in the kernel and how RT_GROUP_SCHED behaves now compared
to the version which this patchset replaces.

Signed-off-by: Yuri Andriaccio
---
 Documentation/scheduler/sched-rt-group.rst | 500 ++++++++++++++++++---
 1 file changed, 426 insertions(+), 74 deletions(-)

diff --git a/Documentation/scheduler/sched-rt-group.rst b/Documentation/scheduler/sched-rt-group.rst
index ab464335d3..a5a9203355 100644
--- a/Documentation/scheduler/sched-rt-group.rst
+++ b/Documentation/scheduler/sched-rt-group.rst
@@ -53,9 +53,12 @@ CPU time is divided by means of specifying how much time can be spent running
 in a given period. We allocate this "run time" for each real-time group which
 the other real-time groups will not be permitted to use.
 
-Any time not allocated to a real-time group will be used to run normal priority
-tasks (SCHED_OTHER). Any allocated run time not used will also be picked up by
-SCHED_OTHER.
+Each real-time group runs at the same priority as SCHED_DEADLINE, thus they
+share and contend for the bandwidth allowed to SCHED_DEADLINE. Any time not
+allocated to a real-time group (and SCHED_DEADLINE tasks) will be used to run
+SCHED_FIFO/SCHED_RR, normal priority (SCHED_OTHER) and SCHED_EXT tasks,
+following the usual priorities. Any allocated run time not used will also be
+picked up by the other scheduling classes, in the same order as before.
 
 Let's consider an example: a frame fixed real-time renderer must deliver 25
 frames a second, which yields a period of 0.04s per frame. Now say it will also
@@ -73,10 +76,6 @@ The remaining CPU time will be used for user input and other tasks. Because
 real-time tasks have explicitly allocated the CPU time they need to perform
 their tasks, buffer underruns in the graphics or audio can be eliminated.
 
-NOTE: the above example is not fully implemented yet. We still
-lack an EDF scheduler to make non-uniform periods usable.
-
-
 2. The Interface
 ================
 
@@ -86,40 +85,92 @@ lack an EDF scheduler to make non-uniform periods usable.
 
 The system wide settings are configured under the /proc virtual file system:
 
-/proc/sys/kernel/sched_rt_period_us:
+``/proc/sys/kernel/sched_rt_period_us``:
   The scheduling period that is equivalent to 100% CPU bandwidth.
 
-/proc/sys/kernel/sched_rt_runtime_us:
-  A global limit on how much time real-time scheduling may use. This is always
-  less or equal to the period_us, as it denotes the time allocated from the
-  period_us for the real-time tasks. Without CONFIG_RT_GROUP_SCHED enabled,
-  this only serves for admission control of deadline tasks. With
-  CONFIG_RT_GROUP_SCHED=y it also signifies the total bandwidth available to
-  all real-time groups.
+``/proc/sys/kernel/sched_rt_runtime_us``:
+  A global limit on how much time real-time scheduling may use (SCHED_DEADLINE
+  tasks + real-time groups). This is always less than or equal to the
+  period_us, as it denotes the time allocated from the period_us for the
+  real-time tasks. Without **CONFIG_RT_GROUP_SCHED** enabled, this only serves
+  for admission control of deadline tasks. With **CONFIG_RT_GROUP_SCHED=y** it
+  also signifies the total bandwidth available to both real-time groups and
+  deadline tasks.
 
  * Time is specified in us because the interface is s32. This gives an
    operating range from 1us to about 35 minutes.
- * sched_rt_period_us takes values from 1 to INT_MAX.
- * sched_rt_runtime_us takes values from -1 to sched_rt_period_us.
+ * ``sched_rt_period_us`` takes values from 1 to INT_MAX.
+ * ``sched_rt_runtime_us`` takes values from -1 to ``sched_rt_period_us``.
  * A run time of -1 specifies runtime == period, i.e. no limit.
- * sched_rt_runtime_us/sched_rt_period_us > 0.05 inorder to preserve
+ * ``sched_rt_runtime_us/sched_rt_period_us`` > 0.05 in order to preserve
    bandwidth for fair dl_server. For accurate value check average of
-   runtime/period in /sys/kernel/debug/sched/fair_server/cpuX/
-
-
-2.2 Default behaviour
----------------------
-
-The default values for sched_rt_period_us (1000000 or 1s) and
-sched_rt_runtime_us (950000 or 0.95s). This gives 0.05s to be used by
-SCHED_OTHER (non-RT tasks). These defaults were chosen so that a run-away
-real-time tasks will not lock up the machine but leave a little time to recover
-it. By setting runtime to -1 you'd get the old behaviour back.
-
-By default all bandwidth is assigned to the root group and new groups get the
-period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
-want to assign bandwidth to another group, reduce the root group's bandwidth
-and assign some or all of the difference to another group.
+   runtime/period in ``/sys/kernel/debug/sched/fair_server/cpuX/``
+
+The default value for ``sched_rt_period_us`` is 1000000 (or 1s) and for
+``sched_rt_runtime_us`` it is 950000 (or 0.95s). This gives a minimum of 0.05s
+to be used by SCHED_FIFO/SCHED_RR and non-RT tasks (SCHED_OTHER, SCHED_EXT),
+while 0.95s is the maximum to be used by SCHED_DEADLINE, and rt-cgroups if
+enabled.
+
+2.2 Cgroup settings
+-------------------
+
+Enabling **CONFIG_RT_GROUP_SCHED** lets you explicitly allocate real CPU
+bandwidth to task groups.
+
+This uses the cgroup virtual file system and the CPU controller for cgroups.
+Enabling the controller for the hierarchy creates two files:
+
+* ``<cgroup>/cpu.rt_period_us``, the scheduling period of the group.
+* ``<cgroup>/cpu.rt_runtime_us``, the maximum runtime each CPU will provide
+  every period.
+
+  .. tip::
+    For more information on working with control groups, you should read
+    *Documentation/admin-guide/cgroup-v1/cgroups.rst* as well.
+  ..
+
+By default the root cgroup has the same period as
+``/proc/sys/kernel/sched_rt_period_us``, which is 1s, and a runtime of zero, so
+that rt-cgroup is *soft-disabled* by default, and all the runtime is available
+for SCHED_DEADLINE tasks only. New groups instead get both a period and a
+runtime of zero.
+
+2.3 Cgroup Hierarchy and Behaviours
+-----------------------------------
+
+With HCBS, cgroups may act either as task runners or as bandwidth reservations:
+
+* A bandwidth reservation cgroup (such as the root control group) has the
+  purpose of reserving a portion of the total real-time bandwidth for its
+  sub-tree of groups. A group in this state cannot run SCHED_FIFO/SCHED_RR
+  tasks.
+
+  .. important::
+    The *root control group* behaviour is different from that of the other
+    cgroups, as its job is to reserve bandwidth for the whole group hierarchy,
+    but it can also run rt tasks. This is an exception: FIFO/RR tasks running
+    in the root cgroup follow the same rules as FIFO/RR tasks in a kernel which
+    has **CONFIG_RT_GROUP_SCHED=n**, while the bandwidth reservation is instead
+    a feature connected to HCBS, which acts on the cgroup tree.
+  ..
+
+* A *live* group instead can be used to run FIFO/RR tasks, with the given
+  bandwidth parameters: each CPU is served a *potentially continuous* runtime of
+  ``<cgroup>/cpu.rt_runtime_us`` every period ``<cgroup>/cpu.rt_period_us``. It
+  is important to note that increasing the period while leaving the bandwidth
+  constant changes the behaviour of the cgroup's servers: the overall bandwidth
+  given is the same, but it is given in longer bursts (with longer windows of no
+  bandwidth).
+
+More specifically, on *live* and non-*live* groups:
+
+* A group is deemed *live* if it is a leaf of the groups' hierarchy or if all of
+  its children have runtime 0.
+* *Live* groups are the only groups allowed to run real-time tasks. A SCHED_FIFO
+  task cannot be migrated into a non-*live* group, nor can a task inside such a
+  group change its scheduling policy to SCHED_FIFO/SCHED_RR if the group is not
+  *live*.
+* Non-*live* groups are only used for bandwidth reservation.
+* Group bandwidths follow this invariant: the sum of the bandwidths of a
+  group's children is always less than or equal to the group's bandwidth.
 
 Real-time group scheduling means you have to assign a portion of total CPU
 bandwidth to the group before it will accept real-time tasks. Therefore you will
@@ -128,63 +179,364 @@ done that, even if the user has the rights to run processes with real-time
 priority!
 
-2.3 Basis for grouping tasks
-----------------------------
+3. Theoretical Background
+=========================
+
+  .. BIG FAT WARNING ******************************************************
+
+  .. warning::
+
+    This section contains a (non-thorough) summary of deadline/hierarchical
+    scheduling theory, and how it applies to real-time control groups.
+    The reader can "safely" skip to Section 4 if only interested in seeing
+    how the scheduling policy can be used. Anyway, we strongly recommend
+    coming back here and continuing to read (once the urge for testing is
+    satisfied :P) to be sure of fully understanding all technical details.
+
+  .. ************************************************************************
+
+The real-time cgroup scheduler is based upon the **Hierarchical Constant
+Bandwidth Server** (HCBS) [1], a *Compositional Scheduling Framework* (CSF). A
+**CSF** is a framework where global (system-level) timing properties can be
+established by composing independently (specified and) analyzed local
+(component-level) timing properties [5].
+
+For HCBS (as related to the Linux kernel), the compositional framework consists
+of two parts:
+
+* The *scheduling components*, which are the basic units of scheduling. In the
+  kernel these are the single cgroups along with the tasks that must be run
+  inside them.
+
+* The *scheduling resources*, which are the CPUs of the machine.
+
+HCBS is a *hierarchical scheduling framework*, where the scheduling components
+form a hierarchy and resources are allocated from parent components to their
+child components in the hierarchy.
+
+The chapter is organized as follows: **Section 3.1** gives basic real-time
+theory definitions that are used throughout the whole section. **Section 3.2**
+talks about the HCBS framework, giving a general idea of how it is structured.
+**Section 3.3** introduces the MPR model, one of the many models which may be
+used for the analysis of the scheduling components and the computation of the
+minimum required scheduling resources for a given component. **Section 3.4**
+shows the schedulability test for MPR on the HCBS framework. **Section 3.5**
+shows how to convert an MPR interface to an HCBS-compatible resource reservation
+for a component. Finally, **Section 3.6** lists other interesting models which
+could be used for the component analysis in HCBS.
+
+3.1 Basic Definitions
+---------------------
+
+*We borrow the definitions given in the* ``sched_deadline`` *document, which
+are very briefly summarized here; new ones, needed by the following content,
+are added.*
+
+A typical real-time task is composed of a repetition of computation phases (task
+instances, or jobs) which are activated in a periodic or sporadic fashion. For
+our purposes, real-time tasks are characterized by three parameters:
+
+* Worst Case Execution Time (WCET): the maximum execution time among all jobs.
+* Relative Deadline (D): the time by which each job must be completed, relative
+  to the release time of the job.
+* Inter-Arrival Period (P): the exact/minimum (for periodic/sporadic tasks) time
+  between consecutive jobs.
+
+3.2 Hierarchical Constant Bandwidth Server (HCBS) [1]
+-----------------------------------------------------
+
+As mentioned, HCBS is a *hierarchical scheduling framework*:
+
+* The framework hierarchy follows the same hierarchy as cgroups. Cgroups may
+  have two roles: either bandwidth reservation for children cgroups, or they may
+  be *live*, i.e. run tasks (but not both). The root cgroup, for the kernel's
+  implementation of HCBS, acts only as bandwidth reservation (but as written in
+  this document it also has different uses outside of the hierarchical
+  framework).
+* The cgroup tree is internally flattened, for ease of scheduling, to a
+  two-level hierarchy, since only the *live* groups are of interest and all the
+  necessary information for their scheduling lies in their interface (there is
+  no need for the reservation components).
+* The hierarchical framework, now on two levels, then consists of a first level
+  of cgroups, and a second level of tasks that are run inside these groups.
+* The scheduling of components is performed using global Earliest Deadline First
+  (gEDF), SCHED_DEADLINE in the kernel, following the bandwidth reservation of
+  each group.
+* Whenever a component is scheduled, a local scheduler picks which of the tasks
+  of the cgroup to run. The scheduling policy is global Fixed Priority (gFP),
+  SCHED_FIFO/SCHED_RR in the kernel.
+
+3.3 Multiprocessor Periodic Resource (MPR) model
+------------------------------------------------
+
+A Multiprocessor Periodic Resource (MPR) model [2] **u = <Pi, Theta, m'>**
+specifies that an identical, unit-capacity multiprocessor platform collectively
+provides **Theta** units of resource every **Pi** time units, where the
+**Theta** time units are supplied with concurrency at most **m'**.
+
+This theoretical model is one of the many models that can abstract the
+interface of our real-time cgroups: let **m'** be the number of CPUs of the
+machine, let **Theta** be **m' * <cgroup>/cpu.rt_runtime_us** and **Pi** be
+**<cgroup>/cpu.rt_period_us**.
 
-Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
-CPU bandwidth to task groups.
+Let's introduce the concept of Supply Bound Function (SBF). An SBF is a function
+which outputs a lower bound for the processor supply provided in a given time
+interval, given a resource supply model. For a completely dedicated CPU, the SBF
+is simply the identity function, as it will always provide **t** units of
+computation for an interval of length **t**. The situation gets slightly more
+complicated for the MPR model or any of the other models listed in Section 3.6.
 
-This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
-to control the CPU time reserved for each control group.
+The **SBF(t)** for an MPR model **u = <Pi, Theta, m'>** is::
 
-For more information on working with control groups, you should read
-Documentation/admin-guide/cgroup-v1/cgroups.rst as well.
+             | 0                                       if t' < 0
+             |
+  SBF_u(t) = | floor(t' / Pi) * Theta
+             |   + max(0, m' * x - (m' * Pi - Theta))  if t' >= 0 and 1 <= x <= y
+             |
+             | floor(t' / Pi) * Theta
+             |   + max(0, m' * x - (m' * Pi - Theta))  otherwise
+             |   - (m' - beta)
 
-Group settings are checked against the following limits in order to keep the
-configuration schedulable:
+where::
 
-  \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
+  alpha = floor(Theta / m')
+  beta  = Theta - m' * alpha
+  t'    = t - (Pi - ceil(Theta / m'))
+  x     = t' - (Pi * floor(t' / Pi))
+  y     = Pi - floor(Theta / m')
 
-For now, this can be simplified to just the following (but see Future plans):
+Briefly, this function models that the server's bandwidth is given as late as
+possible, thus describing the worst possible case for the supplied bandwidth.
 
-  \Sum_{i} runtime_{i} <= global_runtime
+3.4 Schedulability for MPR on global Fixed-Priority
+---------------------------------------------------
 
+Let's introduce the concept of Demand Bound Function (DBF). A DBF is a function
+that, given a taskset, a scheduling algorithm and an interval of time, outputs
+the worst-case resource demand for that interval of time.
 
-3. Future plans
-===============
+It is easy to see that, given a DBF and an SBF, we can deem a component/taskset
+schedulable if, for every time interval t >= 0, it is possible to demonstrate
+that:
 
-There is work in progress to make the scheduling period for each group
-("<cgroup>/cpu.rt_period_us") configurable as well.
+  DBF(t) <= SBF(t)
 
-The constraint on the period is that a subgroup must have a smaller or
-equal period to its parent. But realistically its not very useful _yet_
-as its prone to starvation without deadline scheduling.
+We have the Supply Bound Function for our given MPR model, so we are missing the
+Demand Bound Function for a given taskset that is being scheduled using global
+Fixed Priority.
 
-Consider two sibling groups A and B; both have 50% bandwidth, but A's
-period is twice the length of B's.
+3.4.1 Schedulability Analysis for global Fixed Priority
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-* group A: period=100000us, runtime=50000us
+Bertogna, Cirinei and Lipari [6] have derived a schedulability test for global
+Fixed Priority (gFP) on multi-processor platforms. In this test (called the
+*BCL_gFP* test) we can consider all the CPUs to be dedicated to the scheduling.
 
-  - this runs for 0.05s once every 0.1s
+  A taskset **Tau** is schedulable with gFP on a multiprocessor platform
+  composed of **m'** identical processors if for each task **tau_k in Tau**:
 
-* group B: period= 50000us, runtime=25000us
+    Sum(for i < k)( min(W_i(D_k), D_k - C_k + 1) ) < m' * (D_k - C_k + 1)
 
-  - this runs for 0.025s twice every 0.1s (or once every 0.05 sec).
+  where **W_i(t)** is the workload of task **tau_i** over a time interval **t**:
 
-This means that currently a while (1) loop in A will run for the full period of
-B and can starve B's tasks (assuming they are of lower priority) for a whole
-period.
+    W_i(t) = N_i(t) * C_i + min(C_i, t + D_i - C_i - N_i(t) * P_i)
 
-The next project will be SCHED_EDF (Earliest Deadline First scheduling) to bring
-full deadline scheduling to the linux kernel. Deadline scheduling the above
-groups and treating end of the period as a deadline will ensure that they both
-get their allocated time.
+  and **N_i(t)** is the number of activations of task **tau_i** that complete
+  in a time interval **t**:
 
-Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
-the biggest challenge as the current linux PI infrastructure is geared towards
-the limited static priority levels 0-99. With deadline scheduling you need to
-do deadline inheritance (since priority is inversely proportional to the
-deadline delta (deadline - now)).
+    N_i(t) = floor( (t + D_i - C_i) / P_i )
+
+  while the **min** term is the contribution of the carried-out job in the
+  interval **t**, i.e. the job that does not completely fit in the interval
+  **t**, but starts inside the interval after all the jobs that complete.
+
+3.4.2 From BCL_gFP to the Demand Bound Function
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We can then derive the DBF from this test:
+
+  DBF_gFP(tau_k) = Sum(for i < k)( min(W_i(D_k), D_k - C_k + 1) ) + m' * (C_k - 1)
+
+Briefly, the first sum component, the same as in the BCL_gFP test, describes the
+maximum interference that higher-priority tasks give to the analysed task. The
+workload is upper-bounded by ``(D_k - C_k + 1)`` because we are only interested
+in the interference in the slack time, while for the ``C_k`` time we are
+requiring that all the CPUs are fully available, as the single job needs ``C_k``
+(non-overlapping) time units to run.
+
+The demand bound function from Bertogna et al. is only defined at a single time
+(i.e. the deadline of the task under analysis) instead of at all possible times,
+as this is the minimum needed to demonstrate schedulability on global Fixed
+Priority.
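The workload and demand computations above can be sketched in a few lines of
Python. This is an illustrative model only, not kernel code; the task tuples
``(C, D, P)``, the function names and the example taskset are hypothetical:

```python
from math import floor

def workload(task, t):
    # W_i(t): upper bound on the work a higher-priority task tau_i = (C, D, P)
    # can demand in a window of length t (BCL_gFP).
    C, D, P = task
    n = floor((t + D - C) / P)                # N_i(t): jobs completing in the window
    return n * C + min(C, t + D - C - n * P)  # plus the carried-out job

def dbf_gfp(tasks, k, m):
    # DBF_gFP(tau_k) on m processors; tasks[0..k-1] have higher priority.
    C_k, D_k, _ = tasks[k]
    interference = sum(min(workload(tasks[i], D_k), D_k - C_k + 1)
                       for i in range(k))
    return interference + m * (C_k - 1)

# hypothetical taskset, sorted by decreasing priority
taskset = [(2, 10, 10), (3, 15, 15), (5, 20, 20)]
print(dbf_gfp(taskset, 2, m=2))
```

Comparing this value against ``SBF_u(D_k)`` for the chosen MPR interface, as in
Section 3.4.3, gives the per-task schedulability check.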
+
+3.4.3 Putting it all together
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A component **C**, on **m'** processors, running a taskset **Tau = { tau_1 =
+(C_1, D_1, P_1), ..., tau_n = (C_n, D_n, P_n) }** of **n** sporadic tasks, is
+schedulable under gFP using an MPR model **u = <Pi, Theta, m'>**, if for all
+tasks **tau_k in Tau**:
+
+  DBF_gFP(tau_k) <= SBF_u(D_k)
+
+3.5 From MPR to deadline servers
+--------------------------------
+
+Since there exists no algorithm to schedule MPR interfaces, a technique was
+developed to transform MPR interfaces into periodic tasks, so that a number of
+periodic servers which respect the tasks' requirements can be used for the
+scheduling of the MPR interface and associated tasks.
+
+Let **u = <Pi, Theta, m'>** be an MPR interface, let **a = Theta - m' *
+floor(Theta / m')**, and let **k = floor(a)**. Define a transformation from
+**u** to a periodic taskset **Tau_u = { tau_1 = (C_1, D_1, P_1), ..., tau_m' =
+(C_m', D_m', P_m') }**, where:
+
+  **tau_1 = ... = tau_k = (floor(Theta / m') + 1, Pi, Pi)**
+
+  **tau_k+1 = (floor(Theta / m') + a - k * floor(a/k), Pi, Pi)**
+
+  **tau_k+2 = ... = tau_m' = (floor(Theta / m'), Pi, Pi)**
+
+This periodic taskset of servers **Tau_u** can be scheduled on any number of
+processors with concurrency at most **m'**.
+
+For real-time control groups, it is possible to just consider a slightly more
+demanding taskset **Tau_u'**, where each task **tau_i** is defined as follows:
+
+  **tau_i = (ceil(Theta / m'), Pi, Pi)**
+
+3.6 Other models
+----------------
+
+There exist many other theoretical models in the literature which are used to
+describe a hierarchical scheduling framework on multi-core architectures.
+Notable examples are the Multi Supply Function (MSF) abstraction [3], the
+Parallel Supply Function (PSF) abstraction [4] and the Bounded Delay
+Multipartition (BDM) [7].
+
+3.7 References
+--------------
+
+  1 - L. Abeni, A. Balsini, and T. Cucinotta, “Container-based real-time
+      scheduling in the Linux kernel,” SIGBED Rev., vol. 16, no. 3, pp. 33-38,
+      Nov. 2019, doi: 10.1145/3373400.3373405.
+  2 - A. Easwaran, I. Shin, and I. Lee, “Optimal virtual cluster-based
+      multiprocessor scheduling,” Real-Time Syst, vol. 43, no. 1, pp. 25-59,
+      Sept. 2009, doi: 10.1007/s11241-009-9073-x.
+  3 - E. Bini, G. Buttazzo, and M. Bertogna, “The Multi Supply Function
+      Abstraction for Multiprocessors,” in 2009 15th IEEE International
+      Conference on Embedded and Real-Time Computing Systems and Applications,
+      Aug. 2009, pp. 294-302. doi: 10.1109/RTCSA.2009.39.
+  4 - E. Bini, B. Marko, and S. K. Baruah, “The Parallel Supply Function
+      Abstraction for a Virtual Multiprocessor,” in Scheduling, S. Albers, S. K.
+      Baruah, R. H. Möhring, and K. Pruhs, Eds., in Dagstuhl Seminar Proceedings
+      (DagSemProc), vol. 10071. Dagstuhl, Germany: Schloss Dagstuhl -
+      Leibniz-Zentrum für Informatik, 2010, pp. 1-14. doi:
+      10.4230/DagSemProc.10071.14.
+  5 - I. Shin and I. Lee, “Compositional real-time scheduling framework,” in
+      25th IEEE International Real-Time Systems Symposium, Dec. 2004, pp. 57-67.
+      doi: 10.1109/REAL.2004.15.
+  6 - M. Bertogna, M. Cirinei, and G. Lipari, “Schedulability Analysis of Global
+      Scheduling Algorithms on Multiprocessor Platforms,” IEEE Transactions on
+      Parallel and Distributed Systems, vol. 20, no. 4, pp. 553-566, Apr. 2009,
+      doi: 10.1109/TPDS.2008.129.
+  7 - G. Lipari and E. Bini, “A Framework for Hierarchical Scheduling on
+      Multiprocessors: From Application Requirements to Run-Time Allocation,” in
+      2010 31st IEEE Real-Time Systems Symposium, Nov. 2010, pp. 249-258. doi:
+      10.1109/RTSS.2010.12.
+
+
+4. Using Real-Time cgroups
+==========================
+
+4.1 CGroup Setup
+----------------
 
-This means the whole PI machinery will have to be reworked - and that is one of
-the most complex pieces of code we have.
+The following is a brief guide to the use of Real-Time Control Groups.
+
+Of course, real-time control groups require mounting of the cgroup file system.
+We have decided to only support cgroups v2, so make sure you mount the v2
+controller for the cgroup hierarchy.
+
+Additionally, real-time cgroups require the CPU controller for the cgroups to
+be enabled::
+
+  # Assume the cgroup file system is mounted at /sys/fs/cgroup
+  > echo "+cpu" > /sys/fs/cgroup/cgroup.subtree_control
+
+The CPU controller can only be mounted if there is no SCHED_FIFO/SCHED_RR task
+scheduled in any cgroup other than the root control group.
+
+The root control group has no bandwidth allocated by default, so make sure to
+allocate some bandwidth so that it can be used by the other cgroups. More on
+that in the following section...
+
+4.2 Bandwidth Allocation for groups
+-----------------------------------
+
+Allocating bandwidth to a cgroup is a fundamental step to run a real-time
+workload. The cgroup filesystem exposes two files:
+
+* ``<cgroup>/cpu.rt_runtime_us``: which specifies the cgroup's runtime in
+  microseconds.
+* ``<cgroup>/cpu.rt_period_us``: which specifies the cgroup's period in
+  microseconds.
+
+Both files are readable and writable, and their default value is zero. By
+definition, the specified runtime must always be less than or equal to the
+period. Additionally, an admission test checks that the bandwidth invariant is
+respected (i.e. sum of children's bandwidth <= parent's bandwidth).
+
+The root control group files instead control and reserve the SCHED_DEADLINE
+bandwidth allocated to real-time cgroups, since real-time groups compete for
+and share the same bandwidth allocated to SCHED_DEADLINE tasks.
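The two checks above can be sketched as a small stand-alone Python model. This
is only an illustration of the admission test, not the kernel's code, and the
function names are made up:

```python
def bandwidth(runtime_us, period_us):
    # Utilization of a group; a group with no period has no bandwidth.
    return runtime_us / period_us if period_us > 0 else 0.0

def admissible(parent, children):
    # Admission test sketch: the parent's runtime must not exceed its period,
    # and the bandwidth invariant must hold (sum of the children's bandwidths
    # less than or equal to the parent's bandwidth).
    runtime_us, period_us = parent
    if runtime_us > period_us:
        return False
    return sum(bandwidth(r, p) for r, p in children) <= bandwidth(runtime_us,
                                                                  period_us)

# e.g. a 0.95s/1s reservation with two children using 40% and 50% of a CPU
print(admissible((950_000, 1_000_000), [(40_000, 100_000), (50_000, 100_000)]))
```

A configuration whose children together exceed the parent's bandwidth (say 60%
plus 50% under a 95% parent) would be rejected by the same check.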
+
+4.3 Running real-time tasks in groups
+-------------------------------------
+
+To run tasks in real-time groups it is just necessary to change a task's
+scheduling policy to SCHED_FIFO/SCHED_RR and migrate it into the group. If the
+group is not allowed to run real-time tasks because of an incorrect
+configuration, either migrating a SCHED_FIFO/SCHED_RR task into the group or
+changing the scheduling policy of a task already inside the group will fail::
+
+  # assume there is a task with PID 42 running
+  # change its scheduling policy to SCHED_FIFO, priority 99
+  > chrt -f -p 99 42
+
+  # migrate the task to a cgroup
+  > echo 42 > /sys/fs/cgroup/<cgroup>/cgroup.procs
+
+4.4 Special case: the root control group
+----------------------------------------
+
+The root cgroup is special, compared to the other cgroups, as its tasks are not
+managed by the HCBS algorithm; rather, they just use the original
+SCHED_FIFO/SCHED_RR policies (as if CONFIG_RT_GROUP_SCHED were disabled). As
+mentioned, its bandwidth files are just used to control how much of the
+SCHED_DEADLINE bandwidth is allocated to cgroups.
+
+4.5 Guarantees and Special Behaviours
+-------------------------------------
+
+Real-time cgroups are run at the same priority level as SCHED_DEADLINE tasks.
+Since this is the highest-priority scheduling policy, and since the Constant
+Bandwidth Server (CBS) enforces that the specified bandwidth requirements for
+both groups and tasks cannot be overrun, real-time groups have the same
+guarantees that SCHED_DEADLINE tasks have, i.e. they will necessarily be
+supplied with the amount of bandwidth requested (whenever the admission tests
+pass).
+
+This means that, since SCHED_FIFO/SCHED_RR tasks (scheduled in the root control
+group) are not subject to bandwidth controls, they are run at a lower priority
+than their cgroup counterparts. Nonetheless, a minimum amount of bandwidth, if
+reserved, will always be available to run SCHED_FIFO/SCHED_RR workloads in the
+root cgroup, while they will be able to use more runtime if any of the
+SCHED_DEADLINE tasks or servers use less than their specified amount of
+bandwidth. SCHED_OTHER tasks are instead scheduled as normal, at a lower
+priority than real-time workloads.
+
+The aforementioned behaviour differs from the preceding RT_GROUP_SCHED
+implementation, but this is necessary to give actual guarantees about the
+amount of bandwidth given to rt-cgroups.
\ No newline at end of file
-- 
2.51.0