From nobody Tue Jun 16 15:55:05 2026 Received: from mail-wm1-f46.google.com (mail-wm1-f46.google.com [209.85.128.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3239839FCA8 for ; Thu, 30 Apr 2026 21:38:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585123; cv=none; b=ceGTSKloc7xeKsqVzT8Qed/aSZhlz9D9nT8o7/YNNMWJNw4Km1FwonfXo2h9q+jHsckRrE0TuGjzxbE3m2HRoX5QVGWv6EVjZPwAbJ3OiVMOFizIUdSRq5UXCAFQoPFN/hWZWRyDdcQ6qkmk+t9/uCCsGoONL4b3tFxvtvNrSYo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585123; c=relaxed/simple; bh=aZ5bBhyAg3AZlid7MBzECjuaU+/+bfoKEc4XuLV3Kms=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=iyvK96aLSqkh4d7zCBBjZOvJU5IL/SrBg4qoK/zyd0wNujjfwdHffJewfJBvWCXzgKwavfLa5hPCzuOwTBW7E/ECJh4BOwffwbFHQNOkWoE5GrlEPfCszvwwR94waxep64pn4OxmqU4XVIg497YzcKQPWoN784j7bZh3DcjnCqA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=ESMzsIAm; arc=none smtp.client-ip=209.85.128.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="ESMzsIAm" Received: by mail-wm1-f46.google.com with SMTP id 5b1f17b1804b1-488ff90d6c7so13371975e9.2 for ; Thu, 30 Apr 2026 14:38:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1777585121; x=1778189921; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=sH++P2tLZq5G595iXGgpeG/BQxtOuVtGKVYBQUvNVpc=; b=ESMzsIAmL8AtoLyeEdqpo29Q1MKUho18/Ekf/LUuLdU+d2uZhYJyLuSisKaPeF+OAK htrPZg1zNUtUxJi4TbyCTvs1UTCSzGTYu2cCFvgr1gUDpFPMY6odhRnML2OHO+6vR9il JCWVzooWpX+kQDvVmX+7hQoD1Yosut9N8Del1COhkuyP6oRRSJgK1gSwR1IsQgU8TV44 XMIKuZWWRBToMcRA+oP9MwhECp8LGTI5P80HsKIpJCTUUDXZKmLI6go0QBtN7GPS94Hy 49VyBqtoDf2XVjF/S4oYwxIo76c7OyvjUL+eMXF/rae5P00wSoH5aGFmx8C87F+LlsFm ZsnQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777585121; x=1778189921; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=sH++P2tLZq5G595iXGgpeG/BQxtOuVtGKVYBQUvNVpc=; b=jEkkazc/1gC/hyrAl0jxAyTjfs9BSW6304qXX3lRDbC9aZd2JCZ0fX5IvUDV1rXfu8 lnaNwuXIV9poQ3Rh50y2NdZ49kKztcvVNuW5xbdjBQkbBW1ePy6Qp/zMGsNbhDxWR6Nk e0dWu2CmXQQqMBZcVhAdkMcoYDqUdZIOk9Xqz6jgXORNFjjx2768Rx+jPq7nMa5zV7Af FIF6+xNru6bP4/6Lq3Lsi7Lgq55fc1gmq0qo6rYB1pGgqiUezogEqbh1Yt+4qewt0KAP N1pDS6YumQDR8Z9uUfrhvMxZDyEIqrE1PMR2ftb7dQvJU/a8MoyswyB5+TH4dC+18T1X 9m3w== X-Gm-Message-State: AOJu0YyTjvqm2LVMk9QjS85kUF7Tg8WUY/FQzKX6ez/qF4U0amP0+vjj jHHRv0wkiGDxlKgZYerDemorWaYiApd+f7OoAFuaRXTC3LJm7tNACP+g X-Gm-Gg: AeBDiet7/UeWpUgllLDjVdNcbEyt53SsKz8IPVJ5mfu3/mLWHwKxCuwNQg1jmFEd9T9 BNgPWhPB159MOMDNEtyvBLlVTRWYHoz4QOekeZ5gtcPg2RWo5FzoTq+Mos9b4RYcnRwvly37hXg Qa0JQRiLiVebmNFbRGbASdPM/gqKyZX0uA2Uat003LEXwQx/SnzwLepxPYfHklQ/RtlCCB3JcWr RQat+fgt3i1a6O5fVtvIxeZwByqF9I/L1V1IFL6igm+5jYAXdSF3XPZzVPS/RIT3Kmy1bRHILEp 6EYvBq84pMYJTImbH6tOXRLaEtJbVwPfqZSuD9cuJUDO4bnxijEHchVHaIbwe8dwpzvTyQNPNh/ Byo1DqVocCLropjJv9zSWCEVqpKxHOE/A/GcOoYE+68Io9g30XVuo6kMttcnwp77VMk0PwQm3Qf lwp2o4qWWz+YyMuE000Ny5m8rBpfVqlD+77i0VwKGW X-Received: by 2002:a05:600c:888e:b0:489:1c1f:35e6 with SMTP id 5b1f17b1804b1-48a83d6a8dcmr58347205e9.6.1777585120673; Thu, 30 
Apr 2026 14:38:40 -0700 (PDT) Received: from yuri-framework13 ([78.211.51.156]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-44a9879ef89sm418510f8f.30.2026.04.30.14.38.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Apr 2026 14:38:40 -0700 (PDT) From: Yuri Andriaccio To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider Cc: linux-kernel@vger.kernel.org, Luca Abeni , Yuri Andriaccio Subject: [RFC PATCH v5 01/29] sched/deadline: Fix replenishment logic for non-deferred servers Date: Thu, 30 Apr 2026 23:38:05 +0200 Message-ID: <20260430213835.62217-2-yurand2000@gmail.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com> References: <20260430213835.62217-1-yurand2000@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Enqueue and replenish non-deferred deadline servers when their runtime is exhausted and the replenishment timer could not be started because it is too close to the wake-up instant. 
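A minimal user-space sketch of the decision described in the commit message above. It assumes simplified stand-in types: `struct model_dl_se`, `enum exhausted_action` and `on_runtime_exhausted()` are hypothetical names introduced only for illustration and are not kernel APIs. The kernel code calls start_dl_timer() itself; the sketch takes the outcome of that call as a parameter, and it merges the task and server enqueue paths into one action.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a deadline entity; not the kernel's sched_dl_entity. */
struct model_dl_se {
	bool is_server;	/* models dl_server(dl_se) */
	bool dl_defer;	/* models dl_se->dl_defer */
	bool boosted;	/* models is_dl_boosted(dl_se) */
};

/* What should happen once the entity has run out of runtime. */
enum exhausted_action {
	ACT_WAIT_FOR_TIMER,		/* timer armed: replenish when it fires */
	ACT_NEW_PERIOD_AND_REARM,	/* deferred server: new period, re-arm timer */
	ACT_ENQUEUE_REPLENISH,		/* replenish and enqueue right away */
};

static enum exhausted_action
on_runtime_exhausted(const struct model_dl_se *se, bool timer_started)
{
	assert(se != NULL);

	/* Common case: not boosted and the replenishment timer was armed. */
	if (!se->boosted && timer_started)
		return ACT_WAIT_FOR_TIMER;

	/* Deferred servers keep the pre-patch behaviour. */
	if (se->is_server && se->dl_defer)
		return ACT_NEW_PERIOD_AND_REARM;

	/* Tasks and, with this patch, non-deferred servers. */
	return ACT_ENQUEUE_REPLENISH;
}
```

In this model, a non-deferred server whose timer could not be started is treated like a task: it is replenished and enqueued immediately rather than having its timer re-armed.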
Signed-off-by: Yuri Andriaccio --- kernel/sched/deadline.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 674de6a48551..fb7b62e8190e 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1523,8 +1523,12 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64 if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(dl_se))) { if (dl_server(dl_se)) { - replenish_dl_new_period(dl_se, rq); - start_dl_timer(dl_se); + if (dl_se->dl_defer) { + replenish_dl_new_period(dl_se, rq); + start_dl_timer(dl_se); + } else { + enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH); + } } else { enqueue_task_dl(rq, dl_task_of(dl_se), ENQUEUE_REPLENISH); } -- 2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 02/29] sched/deadline: Do not access dl_se->rq directly
Date: Thu, 30 Apr 2026 23:38:06 +0200
Message-ID: <20260430213835.62217-3-yurand2000@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: luca abeni

Make deadline.c code access the runqueue of a scheduling entity saved in the sched_dl_entity data structure.
This allows future patches to save runqueues other than the global ones in sched_dl_entity. Move the dl_server_apply_params call in sched_init_dl_servers after the dl_server flag is set, as the rq_of_dl_se function returns the correct runqueue only if that flag is set. Add a WARN_ON on the return value of dl_server_apply_params in sched_init_dl_servers, as this function may fail if the kernel is not configured correctly. Signed-off-by: luca abeni Signed-off-by: Yuri Andriaccio --- kernel/sched/deadline.c | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index fb7b62e8190e..ce80d9c08e31 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -863,7 +863,7 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se) * and arm the defer timer. */ if (dl_se->dl_defer && !dl_se->dl_defer_running && - dl_time_before(rq_clock(dl_se->rq), dl_se->deadline - dl_se->runtime)) { + dl_time_before(rq_clock(rq), dl_se->deadline - dl_se->runtime)) { if (!is_dl_boosted(dl_se)) { /* @@ -1180,11 +1180,11 @@ static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_ * of time. The dl_server_min_res serves as a limit to avoid * forwarding the timer for a too small amount of time.
*/ - if (dl_time_before(rq_clock(dl_se->rq), + if (dl_time_before(rq_clock(rq), (dl_se->deadline - dl_se->runtime - dl_server_min_res))) { /* reset the defer timer */ - fw = dl_se->deadline - rq_clock(dl_se->rq) - dl_se->runtime; + fw = dl_se->deadline - rq_clock(rq) - dl_se->runtime; hrtimer_forward_now(timer, ns_to_ktime(fw)); return HRTIMER_RESTART; @@ -1195,7 +1195,7 @@ static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_ enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH); - if (!dl_task(dl_se->rq->curr) || dl_entity_preempt(dl_se, &dl_se->rq->curr->dl)) + if (!dl_task(rq->curr) || dl_entity_preempt(dl_se, &rq->curr->dl)) resched_curr(rq); __push_dl_task(rq, rf); @@ -1490,7 +1490,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64 hrtimer_try_to_cancel(&dl_se->dl_timer); - replenish_dl_new_period(dl_se, dl_se->rq); + replenish_dl_new_period(dl_se, rq); if (idle) dl_se->dl_defer_idle = 1; @@ -1584,14 +1584,14 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64 void dl_server_update_idle(struct sched_dl_entity *dl_se, s64 delta_exec) { if (dl_se->dl_server_active && dl_se->dl_runtime && dl_se->dl_defer) - update_curr_dl_se(dl_se->rq, dl_se, delta_exec); + update_curr_dl_se(rq_of_dl_se(dl_se), dl_se, delta_exec); } void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec) { /* 0 runtime = fair server disabled */ if (dl_se->dl_server_active && dl_se->dl_runtime) - update_curr_dl_se(dl_se->rq, dl_se, delta_exec); + update_curr_dl_se(rq_of_dl_se(dl_se), dl_se, delta_exec); } /* @@ -1800,7 +1800,7 @@ void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec) */ void dl_server_start(struct sched_dl_entity *dl_se) { - struct rq *rq = dl_se->rq; + struct rq *rq; dl_se->dl_defer_idle = 0; if (!dl_server(dl_se) || dl_se->dl_server_active || !dl_se->dl_runtime) @@ -1809,15 +1809,15 @@ void
dl_server_start(struct sched_dl_entity *dl_se) /* * Update the current task to 'now'. */ + rq = rq_of_dl_se(dl_se); rq->donor->sched_class->update_curr(rq); - if (WARN_ON_ONCE(!cpu_online(cpu_of(rq)))) return; dl_se->dl_server_active = 1; enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP); - if (!dl_task(dl_se->rq->curr) || dl_entity_preempt(dl_se, &rq->curr->dl)) - resched_curr(dl_se->rq); + if (!dl_task(rq->curr) || dl_entity_preempt(dl_se, &rq->curr->dl)) + resched_curr(rq); } void dl_server_stop(struct sched_dl_entity *dl_se) @@ -1859,9 +1859,9 @@ void sched_init_dl_servers(void) WARN_ON(dl_server(dl_se)); - dl_server_apply_params(dl_se, runtime, period, 1); - dl_se->dl_server = 1; + WARN_ON(dl_server_apply_params(dl_se, runtime, period, 1)); + dl_se->dl_defer = 1; setup_new_dl_entity(dl_se); @@ -1870,9 +1870,9 @@ void sched_init_dl_servers(void) WARN_ON(dl_server(dl_se)); - dl_server_apply_params(dl_se, runtime, period, 1); - dl_se->dl_server = 1; + WARN_ON(dl_server_apply_params(dl_se, runtime, period, 1)); + dl_se->dl_defer = 1; setup_new_dl_entity(dl_se); #endif @@ -1898,7 +1898,7 @@ int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 perio { u64 old_bw = init ?
0 : to_ratio(dl_se->dl_period, dl_se->dl_runtime); u64 new_bw = to_ratio(period, runtime); - struct rq *rq = dl_se->rq; + struct rq *rq = rq_of_dl_se(dl_se); int cpu = cpu_of(rq); struct dl_bw *dl_b; unsigned long cap; @@ -1974,7 +1974,7 @@ static enum hrtimer_restart inactive_task_timer(struct hrtimer *timer) p = dl_task_of(dl_se); rq = task_rq_lock(p, &rf); } else { - rq = dl_se->rq; + rq = rq_of_dl_se(dl_se); rq_lock(rq, &rf); } -- 2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 03/29] sched/deadline: Distinguish between dl_rq and my_q
Date: Thu, 30 Apr 2026 23:38:07 +0200
Message-ID: <20260430213835.62217-4-yurand2000@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: luca abeni

Split the single runqueue pointer in sched_dl_entity into two separate pointers, following the existing pattern used by sched_rt_entity:
- dl_rq: Points to the deadline runqueue where this entity is queued (global runqueue).
- my_q: Points to the runqueue that this entity serves (for servers).
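The two pointers listed above can be modeled in user space as follows. This is only a sketch under stated assumptions: every `model_*` name is hypothetical, and `model_container_of()` is a plain offsetof-based stand-in for the kernel's container_of_const(), without its const handling or type checks.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of_const(). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily reduced runqueue types. */
struct model_dl_rq {
	unsigned int dl_nr_running;
};

struct model_rq {
	int cpu;
	struct model_dl_rq dl;	/* embedded, like rq->dl */
};

struct model_dl_se {
	struct model_dl_rq *dl_rq;	/* where the entity is queued */
	struct model_rq *my_q;		/* runqueue served by the entity */
};

/* Mirrors the shape of the new dl_server_init() arguments. */
static void model_server_init(struct model_dl_se *se,
			      struct model_dl_rq *dl_rq,
			      struct model_rq *served_rq)
{
	se->dl_rq = dl_rq;
	se->my_q = served_rq;
}

/* Recover the enclosing runqueue from the embedded dl_rq pointer. */
static struct model_rq *model_rq_of_server(const struct model_dl_se *se)
{
	assert(se->dl_rq != NULL);
	return model_container_of(se->dl_rq, struct model_rq, dl);
}
```

When a server is initialized with `&rq->dl` and `rq` of the same runqueue, both lookups give the same runqueue. If `my_q` pointed at a separate (for example, cgroup-local) runqueue, `model_rq_of_server()` would still return the runqueue that embeds `dl_rq`.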
This distinction is currently redundant for the fair_server and ext_servers (both point to the same CPU's structures), but is essential for future RT cgroup support where deadline servers will be queued on the global dl_rq while serving tasks from cgroup-specific runqueues. Update rq_of_dl_se() to use container_of_const() to recover the global rq from dl_rq, and update fair.c and ext.c to explicitly use my_q (local rq) when accessing the served runqueue. Update dl_server_init() to take a dl_rq pointer (used to retrieve the global runqueue where the dl_server is scheduled) and an rq pointer (for the local runqueue served by the server). Signed-off-by: luca abeni Signed-off-by: Yuri Andriaccio --- include/linux/sched.h | 6 ++++-- kernel/sched/deadline.c | 10 +++++++--- kernel/sched/ext.c | 4 ++-- kernel/sched/fair.c | 4 ++-- kernel/sched/sched.h | 3 ++- 5 files changed, 17 insertions(+), 10 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 5a5d3dbc9cdf..eb8b57f689b5 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -733,9 +733,11 @@ struct sched_dl_entity { * Bits for DL-server functionality. Also see the comment near * dl_server_update().
* - * @rq the runqueue this server is for + * @dl_rq the runqueue on which this entity is (to be) queued + * @my_q the runqueue "owned" by this entity */ - struct rq *rq; + struct dl_rq *dl_rq; + struct rq *my_q; dl_server_pick_f server_pick_task; #ifdef CONFIG_RT_MUTEXES diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index ce80d9c08e31..219fe2fd697d 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -75,10 +75,12 @@ static inline struct rq *rq_of_dl_rq(struct dl_rq *dl_rq) static inline struct rq *rq_of_dl_se(struct sched_dl_entity *dl_se) { - struct rq *rq = dl_se->rq; + struct rq *rq; if (!dl_server(dl_se)) rq = task_rq(dl_task_of(dl_se)); + else + rq = container_of_const(dl_se->dl_rq, struct rq, dl); return rq; } @@ -1833,10 +1835,12 @@ void dl_server_stop(struct sched_dl_entity *dl_se) dl_se->dl_server_active = 0; } -void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq, +void dl_server_init(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq, + struct rq *served_rq, dl_server_pick_f pick_task) { - dl_se->rq = rq; + dl_se->dl_rq = dl_rq; + dl_se->my_q = served_rq; dl_se->server_pick_task = pick_task; } diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index 064eaa76be4b..382152f8895f 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -2606,7 +2606,7 @@ ext_server_pick_task(struct sched_dl_entity *dl_se, struct rq_flags *rf) if (!scx_enabled()) return NULL; - return do_pick_task_scx(dl_se->rq, rf, true); + return do_pick_task_scx(dl_se->my_q, rf, true); } /* @@ -2618,7 +2618,7 @@ void ext_server_init(struct rq *rq) init_dl_entity(dl_se); - dl_server_init(dl_se, rq, ext_server_pick_task); + dl_server_init(dl_se, &rq->dl, rq, ext_server_pick_task); } #ifdef CONFIG_SCHED_CORE diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index ab4114712be7..8c951186d5e5 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9059,7 +9059,7 @@ pick_next_task_fair(struct rq *rq, struct
task_struct *prev, struct rq_flags *rf static struct task_struct * fair_server_pick_task(struct sched_dl_entity *dl_se, struct rq_flags *rf) { - return pick_task_fair(dl_se->rq, rf); + return pick_task_fair(dl_se->my_q, rf); } void fair_server_init(struct rq *rq) @@ -9068,7 +9068,7 @@ void fair_server_init(struct rq *rq) init_dl_entity(dl_se); - dl_server_init(dl_se, rq, fair_server_pick_task); + dl_server_init(dl_se, &rq->dl, rq, fair_server_pick_task); } /* diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 1ef9ba480f51..8572bd12d0a2 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -412,7 +412,8 @@ extern void dl_server_update_idle(struct sched_dl_entity *dl_se, s64 delta_exec) extern void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec); extern void dl_server_start(struct sched_dl_entity *dl_se); extern void dl_server_stop(struct sched_dl_entity *dl_se); -extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq, +extern void dl_server_init(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq, + struct rq *served_rq, dl_server_pick_f pick_task); extern void sched_init_dl_servers(void); -- 2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 04/29] sched/rt: Pass an rt_rq instead of an rq where needed
Date: Thu, 30 Apr 2026 23:38:08 +0200
Message-ID: <20260430213835.62217-5-yurand2000@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: luca abeni

Make rt.c code access the runqueue through the rt_rq data structure rather than passing an rq pointer directly. This allows future patches to define rt_rq data structures which do not refer only to the global runqueue, but also to local cgroup runqueues (as rt_rq will not always be equal to &rq->rt). Add checks in rt_queue_{push/pull}_tasks to make sure that the given rt_rq object refers to a global runqueue and not any local one. Signed-off-by: luca abeni Signed-off-by: Yuri Andriaccio --- kernel/sched/rt.c | 99 ++++++++++++++++++++++++++--------------------- 1 file changed, 54 insertions(+), 45 deletions(-) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index f69e1f16d923..597eaba00a20 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -370,9 +370,9 @@ static inline void rt_clear_overload(struct rq *rq) cpumask_clear_cpu(rq->cpu, rq->rd->rto_mask); } -static inline int has_pushable_tasks(struct rq *rq) +static inline int has_pushable_tasks(struct rt_rq *rt_rq) { - return !plist_head_empty(&rq->rt.pushable_tasks); + return !plist_head_empty(&rt_rq->pushable_tasks); } static DEFINE_PER_CPU(struct balance_callback, rt_push_head); @@ -381,50 +381,54 @@ static DEFINE_PER_CPU(struct balance_callback, rt_pull_head); static void push_rt_tasks(struct rq *); static void pull_rt_task(struct rq *); -static inline void rt_queue_push_tasks(struct rq *rq) +static inline void rt_queue_push_tasks(struct rt_rq *rt_rq) { - if (!has_pushable_tasks(rq)) + struct rq *rq = container_of_const(rt_rq, struct rq, rt); + + if (!has_pushable_tasks(rt_rq)) return; queue_balance_callback(rq, &per_cpu(rt_push_head, rq->cpu), push_rt_tasks); } -static inline void rt_queue_pull_task(struct rq *rq) +static inline void rt_queue_pull_task(struct rt_rq *rt_rq) {
+ struct rq *rq =3D container_of_const(rt_rq, struct rq, rt); + queue_balance_callback(rq, &per_cpu(rt_pull_head, rq->cpu), pull_rt_task); } -static void enqueue_pushable_task(struct rq *rq, struct task_struct *p) +static void enqueue_pushable_task(struct rt_rq *rt_rq, struct task_struct = *p) { - plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks); + plist_del(&p->pushable_tasks, &rt_rq->pushable_tasks); plist_node_init(&p->pushable_tasks, p->prio); - plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks); + plist_add(&p->pushable_tasks, &rt_rq->pushable_tasks); /* Update the highest prio pushable task */ - if (p->prio < rq->rt.highest_prio.next) - rq->rt.highest_prio.next =3D p->prio; + if (p->prio < rt_rq->highest_prio.next) + rt_rq->highest_prio.next =3D p->prio; - if (!rq->rt.overloaded) { - rt_set_overload(rq); - rq->rt.overloaded =3D 1; + if (!rt_rq->overloaded) { + rt_set_overload(rq_of_rt_rq(rt_rq)); + rt_rq->overloaded =3D 1; } } -static void dequeue_pushable_task(struct rq *rq, struct task_struct *p) +static void dequeue_pushable_task(struct rt_rq *rt_rq, struct task_struct = *p) { - plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks); + plist_del(&p->pushable_tasks, &rt_rq->pushable_tasks); /* Update the new highest prio pushable task */ - if (has_pushable_tasks(rq)) { - p =3D plist_first_entry(&rq->rt.pushable_tasks, + if (has_pushable_tasks(rt_rq)) { + p =3D plist_first_entry(&rt_rq->pushable_tasks, struct task_struct, pushable_tasks); - rq->rt.highest_prio.next =3D p->prio; + rt_rq->highest_prio.next =3D p->prio; } else { - rq->rt.highest_prio.next =3D MAX_RT_PRIO-1; + rt_rq->highest_prio.next =3D MAX_RT_PRIO-1; - if (rq->rt.overloaded) { - rt_clear_overload(rq); - rq->rt.overloaded =3D 0; + if (rt_rq->overloaded) { + rt_clear_overload(rq_of_rt_rq(rt_rq)); + rt_rq->overloaded =3D 0; } } } @@ -1431,6 +1435,7 @@ static void enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags) { struct sched_rt_entity *rt_se =3D &p->rt; + struct rt_rq 
*rt_rq =3D rt_rq_of_se(rt_se); if (flags & ENQUEUE_WAKEUP) rt_se->timeout =3D 0; @@ -1444,17 +1449,18 @@ enqueue_task_rt(struct rq *rq, struct task_struct *= p, int flags) return; if (!task_current(rq, p) && p->nr_cpus_allowed > 1) - enqueue_pushable_task(rq, p); + enqueue_pushable_task(rt_rq, p); } static bool dequeue_task_rt(struct rq *rq, struct task_struct *p, int flag= s) { struct sched_rt_entity *rt_se =3D &p->rt; + struct rt_rq *rt_rq =3D rt_rq_of_se(rt_se); update_curr_rt(rq); dequeue_rt_entity(rt_se, flags); - dequeue_pushable_task(rq, p); + dequeue_pushable_task(rt_rq, p); return true; } @@ -1645,14 +1651,14 @@ static void wakeup_preempt_rt(struct rq *rq, struct= task_struct *p, int flags) static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, = bool first) { struct sched_rt_entity *rt_se =3D &p->rt; - struct rt_rq *rt_rq =3D &rq->rt; + struct rt_rq *rt_rq =3D rt_rq_of_se(&p->rt); p->se.exec_start =3D rq_clock_task(rq); if (on_rt_rq(&p->rt)) update_stats_wait_end_rt(rt_rq, rt_se); /* The running task is never eligible for pushing */ - dequeue_pushable_task(rq, p); + dequeue_pushable_task(rt_rq, p); if (!first) return; @@ -1665,7 +1671,7 @@ static inline void set_next_task_rt(struct rq *rq, st= ruct task_struct *p, bool f if (rq->donor->sched_class !=3D &rt_sched_class) update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0); - rt_queue_push_tasks(rq); + rt_queue_push_tasks(rt_rq); } static struct sched_rt_entity *pick_next_rt_entity(struct rt_rq *rt_rq) @@ -1716,7 +1722,7 @@ static struct task_struct *pick_task_rt(struct rq *rq= , struct rq_flags *rf) static void put_prev_task_rt(struct rq *rq, struct task_struct *p, struct = task_struct *next) { struct sched_rt_entity *rt_se =3D &p->rt; - struct rt_rq *rt_rq =3D &rq->rt; + struct rt_rq *rt_rq =3D rt_rq_of_se(&p->rt); if (on_rt_rq(&p->rt)) update_stats_wait_start_rt(rt_rq, rt_se); @@ -1732,7 +1738,7 @@ static void put_prev_task_rt(struct rq *rq, struct ta= sk_struct *p, struct task_s * if it is 
still active */ if (on_rt_rq(&p->rt) && p->nr_cpus_allowed > 1) - enqueue_pushable_task(rq, p); + enqueue_pushable_task(rt_rq, p); } /* Only try algorithms three times */ @@ -1742,16 +1748,16 @@ static void put_prev_task_rt(struct rq *rq, struct = task_struct *p, struct task_s * Return the highest pushable rq's task, which is suitable to be executed * on the CPU, NULL otherwise */ -static struct task_struct *pick_highest_pushable_task(struct rq *rq, int c= pu) +static struct task_struct *pick_highest_pushable_task(struct rt_rq *rt_rq,= int cpu) { - struct plist_head *head =3D &rq->rt.pushable_tasks; + struct plist_head *head =3D &rt_rq->pushable_tasks; struct task_struct *p; - if (!has_pushable_tasks(rq)) + if (!has_pushable_tasks(rt_rq)) return NULL; plist_for_each_entry(p, head, pushable_tasks) { - if (task_is_pushable(rq, p, cpu)) + if (task_is_pushable(rq_of_rt_rq(rt_rq), p, cpu)) return p; } @@ -1851,14 +1857,15 @@ static int find_lowest_rq(struct task_struct *task) return -1; } -static struct task_struct *pick_next_pushable_task(struct rq *rq) +static struct task_struct *pick_next_pushable_task(struct rt_rq *rt_rq) { + struct rq *rq =3D rq_of_rt_rq(rt_rq); struct task_struct *p; - if (!has_pushable_tasks(rq)) + if (!has_pushable_tasks(rt_rq)) return NULL; - p =3D plist_first_entry(&rq->rt.pushable_tasks, + p =3D plist_first_entry(&rt_rq->pushable_tasks, struct task_struct, pushable_tasks); BUG_ON(rq->cpu !=3D task_cpu(p)); @@ -1911,7 +1918,7 @@ static struct rq *find_lock_lowest_rq(struct task_str= uct *task, struct rq *rq) */ if (unlikely(is_migration_disabled(task) || !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) || - task !=3D pick_next_pushable_task(rq))) { + task !=3D pick_next_pushable_task(&rq->rt))) { double_unlock_balance(rq, lowest_rq); lowest_rq =3D NULL; @@ -1945,7 +1952,7 @@ static int push_rt_task(struct rq *rq, bool pull) if (!rq->rt.overloaded) return 0; - next_task =3D pick_next_pushable_task(rq); + next_task =3D 
pick_next_pushable_task(&rq->rt); if (!next_task) return 0; @@ -2020,7 +2027,7 @@ static int push_rt_task(struct rq *rq, bool pull) * run-queue and is also still the next task eligible for * pushing. */ - task =3D pick_next_pushable_task(rq); + task =3D pick_next_pushable_task(&rq->rt); if (task =3D=3D next_task) { /* * The task hasn't migrated, and is still the next @@ -2213,7 +2220,7 @@ void rto_push_irq_work_func(struct irq_work *work) * We do not need to grab the lock to check for has_pushable_tasks. * When it gets updated, a check is made if a push is possible. */ - if (has_pushable_tasks(rq)) { + if (has_pushable_tasks(&rq->rt)) { raw_spin_rq_lock(rq); while (push_rt_task(rq, true)) ; @@ -2242,6 +2249,7 @@ static void pull_rt_task(struct rq *this_rq) int this_cpu =3D this_rq->cpu, cpu; bool resched =3D false; struct task_struct *p, *push_task; + struct rt_rq *src_rt_rq; struct rq *src_rq; int rt_overload_count =3D rt_overloaded(this_rq); @@ -2271,6 +2279,7 @@ static void pull_rt_task(struct rq *this_rq) continue; src_rq =3D cpu_rq(cpu); + src_rt_rq =3D &src_rq->rt; /* * Don't bother taking the src_rq->lock if the next highest @@ -2279,7 +2288,7 @@ static void pull_rt_task(struct rq *this_rq) * logically higher, the src_rq will push this task away. * And if its going logically lower, we do not care */ - if (src_rq->rt.highest_prio.next >=3D + if (src_rt_rq->highest_prio.next >=3D this_rq->rt.highest_prio.curr) continue; @@ -2295,7 +2304,7 @@ static void pull_rt_task(struct rq *this_rq) * We can pull only a task, which is pushable * on its rq, and no others. 
*/ - p =3D pick_highest_pushable_task(src_rq, this_cpu); + p =3D pick_highest_pushable_task(src_rt_rq, this_cpu); /* * Do we have an RT task that preempts @@ -2401,7 +2410,7 @@ static void switched_from_rt(struct rq *rq, struct ta= sk_struct *p) if (!task_on_rq_queued(p) || rq->rt.rt_nr_running) return; - rt_queue_pull_task(rq); + rt_queue_pull_task(rt_rq_of_se(&p->rt)); } void __init init_sched_rt_class(void) @@ -2437,7 +2446,7 @@ static void switched_to_rt(struct rq *rq, struct task= _struct *p) */ if (task_on_rq_queued(p)) { if (p->nr_cpus_allowed > 1 && rq->rt.overloaded) - rt_queue_push_tasks(rq); + rt_queue_push_tasks(rt_rq_of_se(&p->rt)); if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq))) resched_curr(rq); } @@ -2462,7 +2471,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p,= u64 oldprio) * may need to pull tasks to this runqueue. */ if (oldprio < p->prio) - rt_queue_pull_task(rq); + rt_queue_pull_task(rt_rq_of_se(&p->rt)); /* * If there's a higher priority task waiting to run -- 2.53.0 From nobody Tue Jun 16 15:55:05 2026 Received: from mail-wm1-f42.google.com (mail-wm1-f42.google.com [209.85.128.42]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AE26F3B2FD5 for ; Thu, 30 Apr 2026 21:38:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.42 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585135; cv=none; b=izGSmZ47e8IMAZNCQmh6jDQpLVniZSzpN+nY+tbbMSJKbDH1YX3UHptjnFasLeWy4CZQaj3ROngpePEdPlP1JbMO1gfnvlJL4CrekdMOuMabb5kuI/AliccdFazwW/UdSaXQV3maHTfHOVPg4pZgEjulB+nIVDXBvGnSz4Lm9uM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585135; c=relaxed/simple; bh=TxS1jjzRrjS7edTSUKOro7wxkDlJRj9WFFvZDVBIPxk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=KKhEoQA40XG/GXL1HuqExej/ZvFFo39EtrHSF2WG40WSQ2b1TcGxRnhHffglas09xL5dDsnvQT1lJEp1fz4kJm7Qrtw29c1B2ytj1UYvRECIADMoiI5yZ7jq8alVMC6IKvjc/Gkm/OE5fEdBoBZYyfKZ6I7KA4/fP+m7RgqK53I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=EVubwjY7; arc=none smtp.client-ip=209.85.128.42 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="EVubwjY7" Received: by mail-wm1-f42.google.com with SMTP id 5b1f17b1804b1-4891f625344so15392865e9.0 for ; Thu, 30 Apr 2026 14:38:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1777585128; x=1778189928; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=QfxiE9y2IhgGw3IJI8mz/YCwy+mZEwuFD+yzPSDreW0=; b=EVubwjY7Q1CbAsr+UykZwF25MYEmTgcQE+fRWS+yRHWpQcewCH4I1KONDt49SXjGBG vioFGebzUxB6azWOYtxeu3gMhRQtqUxc8hl3oVJ30PkYayQAdRVGa+hL0J4PzU+CIz3j ooqkyum6UQSfLDn5mieP1oXXAv99AhWQ33BLcUP0L5Vk4qJ4koj5Mlwe67uiSKrkqW8+ AZoum39AdRODD0PPro4cRSMN/9CmxaRYSg3YGZWq7kkQE8TgG+iKuETDR1/kxEWXzWCp scpnfYOWJu5MEiWDodEKI2NkhAK6z8vQ/2z6EmtpshlPEfKUV+daDGhBmnVmFUM81YQS g9xQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777585128; x=1778189928; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=QfxiE9y2IhgGw3IJI8mz/YCwy+mZEwuFD+yzPSDreW0=; b=DRm4+JSRGpp3bGLxyWCxOGf0CUxD0S9sXKP1syOV43kTrisKtuOD81H81WsJHH2+09 
DJmHlEYVusQUyWAtHq258fTTWn9rx81OM9anp7MsQEg997rbPfnzACiN9Yh4tKk/KqLz 2R2Qr7zv341dG61DrLJo1HRMplq3nSgIxSuXedGgPd5vwZyg4+nAUy92iJ27Wlxh/g4W jjUUxM19D96a7t1md1+VlDrDWLYx/3MCQpK1bRE3T9FByBjn/jAEuZtjLJozCXNBsL4X jZ0fndYo5GFcA1ynRnGXwBWlTlzZgac4AYDAOV18muZBWmt1gPG+VpgrFhzA0AoeakjH 778w== X-Gm-Message-State: AOJu0YxHgYqUhiICOghqTBhFl/B9onHuW7CiAGSdNTER7CaQVdEn48Oy gC66ev2NqMN+t0KDedb+cbm+jXRFk8LL6tTXDgpvUMPeMINFwe4PDAc1CT0y7Q== X-Gm-Gg: AeBDies7MXEXvL2lhXijtqk00GBgHOzgk+ippH1dEuwR5rk/E0Emq+NmUk3ZOP1U+zJ i8q67KNV88VQX8xalSdunC5dFcXuQeTqQkZ0qVofuBIAOFQeBxnbjC1y6TT+79chcVmURV6Dvf1 2VzeziLiOpjqo4VJCvKEBUFlLD2IoeRPSmsUfh/vwOn7kGWJCktuPIPykfEqYQkmAMMzPVGO7Yi Twmdo6HQMnq2p5lB/IcqaT2vV0whsjvSV0ctNaHCtESys1cifWuaQ/bnJEVAgY8lR0X4MKNSYA3 Wml0GfoiHnHngm50Pt0HzvH7gurWwuX/LbYOY4XgF3ZLgo8zP391e7p6ZWwH3bextfBvWvyPWe8 qq3eVyJPsj8r1/jXVbQIK0QIHfkm6Dr5ld8Mg/3o7xB5UDjEAm/FwEmgnT1kGPG3h86DzJvnLqD y5ojbNNbr8+Vw6BH9bM+4XQCgM0xJyKkd3bxLHmFdJ X-Received: by 2002:a05:600c:5286:b0:486:faa8:9e4 with SMTP id 5b1f17b1804b1-48a86085131mr68142435e9.12.1777585128056; Thu, 30 Apr 2026 14:38:48 -0700 (PDT) Received: from yuri-framework13 ([78.211.51.156]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-44a9879ef89sm418510f8f.30.2026.04.30.14.38.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Apr 2026 14:38:47 -0700 (PDT) From: Yuri Andriaccio To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider Cc: linux-kernel@vger.kernel.org, Luca Abeni , Yuri Andriaccio Subject: [RFC PATCH v5 05/29] sched/rt: Move functions from rt.c to sched.h Date: Thu, 30 Apr 2026 23:38:09 +0200 Message-ID: <20260430213835.62217-6-yurand2000@gmail.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com> References: <20260430213835.62217-1-yurand2000@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 
1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: luca abeni Move the following functions/macros from rt.c to sched.h, so that they can also be used in other source files: - rt_entity_is_task() - rt_task_of() - rq_of_rt_rq() - rt_rq_of_se() - rq_of_rt_se() There are no functional changes, apart from the use of container_of_const() instead of container_of() where applicable. This is needed by future patches. Signed-off-by: luca abeni Signed-off-by: Yuri Andriaccio --- kernel/sched/rt.c | 56 ------------------------------------------ kernel/sched/sched.h | 58 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 58 insertions(+), 56 deletions(-) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 597eaba00a20..5f89c080a3ef 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -166,36 +166,6 @@ static void destroy_rt_bandwidth(struct rt_bandwidth *= rt_b) hrtimer_cancel(&rt_b->rt_period_timer); } =20 -#define rt_entity_is_task(rt_se) (!(rt_se)->my_q) - -static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se) -{ - WARN_ON_ONCE(!rt_entity_is_task(rt_se)); - - return container_of(rt_se, struct task_struct, rt); -} - -static inline struct rq *rq_of_rt_rq(struct rt_rq *rt_rq) -{ - /* Cannot fold with non-CONFIG_RT_GROUP_SCHED version, layout */ - WARN_ON(!rt_group_sched_enabled() && rt_rq->tg !=3D &root_task_group); - return rt_rq->rq; -} - -static inline struct rt_rq *rt_rq_of_se(struct sched_rt_entity *rt_se) -{ - WARN_ON(!rt_group_sched_enabled() && rt_se->rt_rq->tg !=3D &root_task_gro= up); - return rt_se->rt_rq; -} - -static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se) -{ - struct rt_rq *rt_rq =3D rt_se->rt_rq; - - WARN_ON(!rt_group_sched_enabled() && rt_rq->tg !=3D &root_task_group); - return rt_rq->rq; -} - void unregister_rt_sched_group(struct task_group *tg) { if (!rt_group_sched_enabled()) @@ -294,32 +264,6 @@ int alloc_rt_sched_group(struct
task_group *tg, struct= task_group *parent) =20 #else /* !CONFIG_RT_GROUP_SCHED: */ =20 -#define rt_entity_is_task(rt_se) (1) - -static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se) -{ - return container_of(rt_se, struct task_struct, rt); -} - -static inline struct rq *rq_of_rt_rq(struct rt_rq *rt_rq) -{ - return container_of(rt_rq, struct rq, rt); -} - -static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se) -{ - struct task_struct *p =3D rt_task_of(rt_se); - - return task_rq(p); -} - -static inline struct rt_rq *rt_rq_of_se(struct sched_rt_entity *rt_se) -{ - struct rq *rq =3D rq_of_rt_se(rt_se); - - return &rq->rt; -} - void unregister_rt_sched_group(struct task_group *tg) { } =20 void free_rt_sched_group(struct task_group *tg) { } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 8572bd12d0a2..2b8630ed1353 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -3305,6 +3305,64 @@ extern void set_rq_offline(struct rq *rq); =20 extern bool sched_smp_initialized; =20 +#ifdef CONFIG_RT_GROUP_SCHED +#define rt_entity_is_task(rt_se) (!(rt_se)->my_q) + +static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se) +{ + WARN_ON_ONCE(!rt_entity_is_task(rt_se)); + + return container_of_const(rt_se, struct task_struct, rt); +} + +static inline struct rq *rq_of_rt_rq(struct rt_rq *rt_rq) +{ + /* Cannot fold with non-CONFIG_RT_GROUP_SCHED version, layout */ + WARN_ON(!rt_group_sched_enabled() && rt_rq->tg !=3D &root_task_group); + return rt_rq->rq; +} + +static inline struct rt_rq *rt_rq_of_se(struct sched_rt_entity *rt_se) +{ + WARN_ON(!rt_group_sched_enabled() && rt_se->rt_rq->tg !=3D &root_task_gro= up); + return rt_se->rt_rq; +} + +static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se) +{ + struct rt_rq *rt_rq =3D rt_se->rt_rq; + + WARN_ON(!rt_group_sched_enabled() && rt_rq->tg !=3D &root_task_group); + return rt_rq->rq; +} +#else +#define rt_entity_is_task(rt_se) (1) + +static inline 
struct task_struct *rt_task_of(struct sched_rt_entity *rt_se) +{ + return container_of_const(rt_se, struct task_struct, rt); +} + +static inline struct rq *rq_of_rt_rq(struct rt_rq *rt_rq) +{ + return container_of_const(rt_rq, struct rq, rt); +} + +static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se) +{ + struct task_struct *p =3D rt_task_of(rt_se); + + return task_rq(p); +} + +static inline struct rt_rq *rt_rq_of_se(struct sched_rt_entity *rt_se) +{ + struct rq *rq =3D rq_of_rt_se(rt_se); + + return &rq->rt; +} +#endif + DEFINE_LOCK_GUARD_2(double_rq_lock, struct rq, double_rq_lock(_T->lock, _T->lock2), double_rq_unlock(_T->lock, _T->lock2)) --=20 2.53.0 From nobody Tue Jun 16 15:55:05 2026 Received: from mail-wm1-f46.google.com (mail-wm1-f46.google.com [209.85.128.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D4B6C3AEF22 for ; Thu, 30 Apr 2026 21:38:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585134; cv=none; b=PMdBVibhiMUg6SDUjeqhO9AlV3h0wJ7x7rCq+A278mu6Q9f0ac/vePjQogqokr3JtTOYt3ZurUDFnCrXxjHM06yP0nZsK9RtNG+mMwflZ/kthjXUO0z6Ikf4lk4T5Tw80lIk8QLZqtZ4IJS/U+DWsLJK6/IQVHSFedZEZp1wosQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585134; c=relaxed/simple; bh=kd58wLISfFdzZjR9gzmHYZGDTJ7aXQsEponyai+uW80=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uUFDpeJY5l77z2umiuOAjW3lieuz5xLrJcXv5475rgJiUKgefQ3jixg5x7C0y+/mLuduNn9S6vqjlWxkmHgtzdDuYLKupck/tdHHcuXxQflTPoxpKf5RSVABKialk7xVRc+HaKcqNUWvhzqMnFc5K5usQV8sA2OQkrI0s/vnr5w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com 
header.b=T96GNaqk; arc=none smtp.client-ip=209.85.128.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="T96GNaqk" Received: by mail-wm1-f46.google.com with SMTP id 5b1f17b1804b1-488b0046078so12005835e9.1 for ; Thu, 30 Apr 2026 14:38:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1777585130; x=1778189930; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ly1UQ8L3ztc7V5eK8qpJw2cvlmazUGXtQ+QLYaDybr4=; b=T96GNaqkQzQYO6lLNRFtGe4Me/VXQtxeIT2H47rzCj4Qn7irNwiY7ok8xTEbAIPnrl iT+S5vbWGDVvzVPOcjt4W1eMxIYK63DaFhyAStsZgs3ML+eHDTrbNZ+WjJ5GSJmFZ5Ne R1HEfHo4CMY69IAJg1IUlF1WguFLucjfJWTPkldoyAiQaeyev8WNWdyfzOdmRKITRTMV VHhOYDTqJiS5pWraoL9mFjUD2qAZi8WIuy48VZxMPoCrqC0e6YbeAAgt3aIWYscpY/qB BzN0dCElSoktyj/b1ORMEUypB47VNyjSK0tDowfD1Ue+toSoN7tFK0q4BFUkGOR5LGUQ GAww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777585130; x=1778189930; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=ly1UQ8L3ztc7V5eK8qpJw2cvlmazUGXtQ+QLYaDybr4=; b=kId6P3VGtC8ORa7I2fFQFTV03fQhW4gN2N8P12IxoHdTI5RFsPodnakrCcOb/LDnMc XwQpCnQOe4f3oVim5ZF6Itmjo0A/YZyiGSFv5JrWIsI4T9pRp4FrbtUG0HPdqUKd8+GZ Ni2s8ROywvpqkLRX829nKi3M/1QwGW6TJX9ue/rYfOD3wiX+QpnmU07F2+AveE0JsG7y 8prZWaKdk1cWMXXWsDZqcCNCZuZ1kLQeb3zpwN8v/b4b0JNi6oTosgmtjYE0p64kAotM p2G5TPTjXvQN5hSLRpwRqgNWpZYpCowzNhf8WSGABNHgpBr4KklmghaX43SxBcZmwyhW Uhiw== X-Gm-Message-State: AOJu0Yyg9Opn+Q7YJyhvb0z1i4b+AT9XkR6txypWUXT/7PNZjJd9APm5 
jmwNYpHzMQdnHmN+hju1/IxCcbeyB1gw4UPIjBqNtGh7noBGrdsyj2y2 X-Gm-Gg: AeBDiesuLkfjzWTMl8pKmw8Eb0LjEk+YNHjIcQNuDWteH618Fp2wsktwRuHNseeKWJq pKNULZHtB7BVrIs8ZZGpJgjJipM9jJQIQIvTtzRIUYa6JBAEMdMBUBa91/cDO68k1XXAQEXeJsj Q9E8DC8k+KJVwbuwadRdi6gdw1jvCaruQRTQcSyeYK31SBSPzwIuwrvfSA3UYtJ9gH4jtFyI3zD dXTkxoEFWyPtgsdlU3BYU7boRty+wQtxhNerdTqXvTie76FjACgG1K2dMQ0xxdvbaCugGkSr1nX C5/dI0NT843gZDQ4L+PaDZm7FSjEfVwEvqKIBUn4NRVmYEqvWzyhsIPJZllUq1xqeG+PzbhpARa +w6uKHbxw3YOZY2+GvZPDIk4L1aBtp8eCRl/2ewK2dcjC4T4jCjnNkSz9heddc0H26pSE7zEL5e mogpBqePe2N3F6b/YLdve9RXZsAnf/q8onUN8CJ1Ry X-Received: by 2002:a5d:64c9:0:b0:43f:e41d:8ba8 with SMTP id ffacd0b85a97d-44a8626e3bbmr586639f8f.2.1777585130099; Thu, 30 Apr 2026 14:38:50 -0700 (PDT) Received: from yuri-framework13 ([78.211.51.156]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-44a9879ef89sm418510f8f.30.2026.04.30.14.38.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Apr 2026 14:38:49 -0700 (PDT) From: Yuri Andriaccio To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider Cc: linux-kernel@vger.kernel.org, Luca Abeni , Yuri Andriaccio Subject: [RFC PATCH v5 06/29] sched/rt: Disable RT_GROUP_SCHED Date: Thu, 30 Apr 2026 23:38:10 +0200 Message-ID: <20260430213835.62217-7-yurand2000@gmail.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com> References: <20260430213835.62217-1-yurand2000@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Disable the old RT_GROUP_SCHED scheduler. Note that this does not completely remove all the RT_GROUP_SCHED functionality, just unhooks it and removes most of the relevant functions. Some of the RT_GROUP_SCHED functions are kept because they will be adapted for the HCBS scheduling. 
Most notably: - Disable the initialization of the rt_bandwidth for group scheduling. - Unhook any functionality for RT_GROUP_SCHED in normal rt.c code, leaving only non-group functionality. - Remove group related field initialization in init_rt_rq(). - Remove all the unhooked (and so unused) functions from RT_GROUP_SCHED. - Remove all allocation/deallocation code for rt-groups, always returning failure on allocation. - Update inc/dec_rt_tasks active tasks' counters, as rt scheduling entities now only represent a single task, and not a group of tasks anymore. Signed-off-by: Yuri Andriaccio --- kernel/sched/core.c | 6 - kernel/sched/deadline.c | 34 -- kernel/sched/debug.c | 6 - kernel/sched/rt.c | 861 ++-------------------------------------- kernel/sched/sched.h | 15 +- kernel/sched/syscalls.c | 13 - 6 files changed, 26 insertions(+), 909 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 496dff740dca..a203a27fb16d 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -8647,11 +8647,6 @@ void __init sched_init(void) =20 init_defrootdomain(); =20 -#ifdef CONFIG_RT_GROUP_SCHED - init_rt_bandwidth(&root_task_group.rt_bandwidth, - global_rt_period(), global_rt_runtime()); -#endif /* CONFIG_RT_GROUP_SCHED */ - #ifdef CONFIG_CGROUP_SCHED task_group_cache =3D KMEM_CACHE(task_group, 0); =20 @@ -8703,7 +8698,6 @@ void __init sched_init(void) * starts working after scheduler_running, which is not the case * yet. 
*/ - rq->rt.rt_runtime =3D global_rt_runtime(); init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL); #endif rq->next_class =3D &idle_sched_class; diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 219fe2fd697d..67615a0539fe 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1539,40 +1539,6 @@ static void update_curr_dl_se(struct rq *rq, struct = sched_dl_entity *dl_se, s64 if (!is_leftmost(dl_se, &rq->dl)) resched_curr(rq); } - - /* - * The dl_server does not account for real-time workload because it - * is running fair work. - */ - if (dl_se->dl_server) - return; - -#ifdef CONFIG_RT_GROUP_SCHED - /* - * Because -- for now -- we share the rt bandwidth, we need to - * account our runtime there too, otherwise actual rt tasks - * would be able to exceed the shared quota. - * - * Account to the root rt group for now. - * - * The solution we're working towards is having the RT groups scheduled - * using deadline servers -- however there's a few nasties to figure - * out before that can happen. - */ - if (rt_bandwidth_enabled()) { - struct rt_rq *rt_rq =3D &rq->rt; - - raw_spin_lock(&rt_rq->rt_runtime_lock); - /* - * We'll let actual RT tasks worry about the overflow here, we - * have our own CBS to keep us inline; only account when RT - * bandwidth is relevant. 
- */ - if (sched_rt_bandwidth_account(rt_rq)) - rt_rq->rt_time +=3D delta_exec; - raw_spin_unlock(&rt_rq->rt_runtime_lock); - } -#endif /* CONFIG_RT_GROUP_SCHED */ } =20 /* diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 15bf45b6f912..e50e5115d4fd 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -997,12 +997,6 @@ void print_rt_rq(struct seq_file *m, int cpu, struct r= t_rq *rt_rq) =20 PU(rt_nr_running); =20 -#ifdef CONFIG_RT_GROUP_SCHED - P(rt_throttled); - PN(rt_time); - PN(rt_runtime); -#endif - #undef PN #undef PU #undef P diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 5f89c080a3ef..392212ac90d8 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -82,115 +82,19 @@ void init_rt_rq(struct rt_rq *rt_rq) rt_rq->highest_prio.next =3D MAX_RT_PRIO-1; rt_rq->overloaded =3D 0; plist_head_init(&rt_rq->pushable_tasks); - /* We start is dequeued state, because no RT tasks are queued */ - rt_rq->rt_queued =3D 0; - -#ifdef CONFIG_RT_GROUP_SCHED - rt_rq->rt_time =3D 0; - rt_rq->rt_throttled =3D 0; - rt_rq->rt_runtime =3D 0; - raw_spin_lock_init(&rt_rq->rt_runtime_lock); - rt_rq->tg =3D &root_task_group; -#endif } =20 #ifdef CONFIG_RT_GROUP_SCHED =20 -static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun= ); - -static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer) -{ - struct rt_bandwidth *rt_b =3D - container_of(timer, struct rt_bandwidth, rt_period_timer); - int idle =3D 0; - int overrun; - - raw_spin_lock(&rt_b->rt_runtime_lock); - for (;;) { - overrun =3D hrtimer_forward_now(timer, rt_b->rt_period); - if (!overrun) - break; - - raw_spin_unlock(&rt_b->rt_runtime_lock); - idle =3D do_sched_rt_period_timer(rt_b, overrun); - raw_spin_lock(&rt_b->rt_runtime_lock); - } - if (idle) - rt_b->rt_period_active =3D 0; - raw_spin_unlock(&rt_b->rt_runtime_lock); - - return idle ? 
HRTIMER_NORESTART : HRTIMER_RESTART;
-}
-
-void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime)
-{
-	rt_b->rt_period = ns_to_ktime(period);
-	rt_b->rt_runtime = runtime;
-
-	raw_spin_lock_init(&rt_b->rt_runtime_lock);
-
-	hrtimer_setup(&rt_b->rt_period_timer, sched_rt_period_timer, CLOCK_MONOTONIC,
-		      HRTIMER_MODE_REL_HARD);
-}
-
-static inline void do_start_rt_bandwidth(struct rt_bandwidth *rt_b)
-{
-	raw_spin_lock(&rt_b->rt_runtime_lock);
-	if (!rt_b->rt_period_active) {
-		rt_b->rt_period_active = 1;
-		/*
-		 * SCHED_DEADLINE updates the bandwidth, as a run away
-		 * RT task with a DL task could hog a CPU. But DL does
-		 * not reset the period. If a deadline task was running
-		 * without an RT task running, it can cause RT tasks to
-		 * throttle when they start up. Kick the timer right away
-		 * to update the period.
-		 */
-		hrtimer_forward_now(&rt_b->rt_period_timer, ns_to_ktime(0));
-		hrtimer_start_expires(&rt_b->rt_period_timer,
-				      HRTIMER_MODE_ABS_PINNED_HARD);
-	}
-	raw_spin_unlock(&rt_b->rt_runtime_lock);
-}
-
-static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
-{
-	if (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF)
-		return;
-
-	do_start_rt_bandwidth(rt_b);
-}
-
-static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
-{
-	hrtimer_cancel(&rt_b->rt_period_timer);
-}
-
 void unregister_rt_sched_group(struct task_group *tg)
 {
-	if (!rt_group_sched_enabled())
-		return;
 
-	if (tg->rt_se)
-		destroy_rt_bandwidth(&tg->rt_bandwidth);
 }
 
 void free_rt_sched_group(struct task_group *tg)
 {
-	int i;
-
 	if (!rt_group_sched_enabled())
 		return;
-
-	for_each_possible_cpu(i) {
-		if (tg->rt_rq)
-			kfree(tg->rt_rq[i]);
-		if (tg->rt_se)
-			kfree(tg->rt_se[i]);
-	}
-
-	kfree(tg->rt_rq);
-	kfree(tg->rt_se);
 }
 
 void init_tg_rt_entry(struct task_group *tg, struct rt_rq *rt_rq,
@@ -200,66 +104,19 @@ void init_tg_rt_entry(struct task_group *tg, struct rt_rq *rt_rq,
 	struct rq *rq = cpu_rq(cpu);
 
 	rt_rq->highest_prio.curr = MAX_RT_PRIO-1;
-	rt_rq->rt_nr_boosted = 0;
 	rt_rq->rq = rq;
 	rt_rq->tg = tg;
 
 	tg->rt_rq[cpu] = rt_rq;
 	tg->rt_se[cpu] = rt_se;
-
-	if (!rt_se)
-		return;
-
-	if (!parent)
-		rt_se->rt_rq = &rq->rt;
-	else
-		rt_se->rt_rq = parent->my_q;
-
-	rt_se->my_q = rt_rq;
-	rt_se->parent = parent;
-	INIT_LIST_HEAD(&rt_se->run_list);
 }
 
 int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
 {
-	struct rt_rq *rt_rq;
-	struct sched_rt_entity *rt_se;
-	int i;
-
 	if (!rt_group_sched_enabled())
 		return 1;
 
-	tg->rt_rq = kzalloc_objs(rt_rq, nr_cpu_ids);
-	if (!tg->rt_rq)
-		goto err;
-	tg->rt_se = kzalloc_objs(rt_se, nr_cpu_ids);
-	if (!tg->rt_se)
-		goto err;
-
-	init_rt_bandwidth(&tg->rt_bandwidth, ktime_to_ns(global_rt_period()), 0);
-
-	for_each_possible_cpu(i) {
-		rt_rq = kzalloc_node(sizeof(struct rt_rq),
-				     GFP_KERNEL, cpu_to_node(i));
-		if (!rt_rq)
-			goto err;
-
-		rt_se = kzalloc_node(sizeof(struct sched_rt_entity),
-				     GFP_KERNEL, cpu_to_node(i));
-		if (!rt_se)
-			goto err_free_rq;
-
-		init_rt_rq(rt_rq);
-		rt_rq->rt_runtime = tg->rt_bandwidth.rt_runtime;
-		init_tg_rt_entry(tg, rt_rq, rt_se, i, parent->rt_se[i]);
-	}
-
 	return 1;
-
-err_free_rq:
-	kfree(rt_rq);
-err:
-	return 0;
 }
 
 #else /* !CONFIG_RT_GROUP_SCHED: */
@@ -377,9 +234,6 @@ static void dequeue_pushable_task(struct rt_rq *rt_rq, struct task_struct *p)
 	}
 }
 
-static void enqueue_top_rt_rq(struct rt_rq *rt_rq);
-static void dequeue_top_rt_rq(struct rt_rq *rt_rq, unsigned int count);
-
 static inline int on_rt_rq(struct sched_rt_entity *rt_se)
 {
 	return rt_se->on_rq;
@@ -426,16 +280,6 @@ static inline bool rt_task_fits_capacity(struct task_struct *p, int cpu)
 
 #ifdef CONFIG_RT_GROUP_SCHED
 
-static inline u64 sched_rt_runtime(struct rt_rq *rt_rq)
-{
-	return rt_rq->rt_runtime;
-}
-
-static inline u64 sched_rt_period(struct rt_rq *rt_rq)
-{
-	return ktime_to_ns(rt_rq->tg->rt_bandwidth.rt_period);
-}
-
 typedef struct task_group *rt_rq_iter_t;
 
 static inline struct task_group *next_task_group(struct task_group *tg)
@@ -461,457 +305,20 @@ static inline struct task_group *next_task_group(struct task_group *tg)
 		iter && (rt_rq = iter->rt_rq[cpu_of(rq)]); \
 		iter = next_task_group(iter))
 
-#define for_each_sched_rt_entity(rt_se) \
-	for (; rt_se; rt_se = rt_se->parent)
-
-static inline struct rt_rq *group_rt_rq(struct sched_rt_entity *rt_se)
-{
-	return rt_se->my_q;
-}
-
 static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags);
 static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags);
 
-static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
-{
-	struct task_struct *donor = rq_of_rt_rq(rt_rq)->donor;
-	struct rq *rq = rq_of_rt_rq(rt_rq);
-	struct sched_rt_entity *rt_se;
-
-	int cpu = cpu_of(rq);
-
-	rt_se = rt_rq->tg->rt_se[cpu];
-
-	if (rt_rq->rt_nr_running) {
-		if (!rt_se)
-			enqueue_top_rt_rq(rt_rq);
-		else if (!on_rt_rq(rt_se))
-			enqueue_rt_entity(rt_se, 0);
-
-		if (rt_rq->highest_prio.curr < donor->prio)
-			resched_curr(rq);
-	}
-}
-
-static void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
-{
-	struct sched_rt_entity *rt_se;
-	int cpu = cpu_of(rq_of_rt_rq(rt_rq));
-
-	rt_se = rt_rq->tg->rt_se[cpu];
-
-	if (!rt_se) {
-		dequeue_top_rt_rq(rt_rq, rt_rq->rt_nr_running);
-		/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
-		cpufreq_update_util(rq_of_rt_rq(rt_rq), 0);
-	}
-	else if (on_rt_rq(rt_se))
-		dequeue_rt_entity(rt_se, 0);
-}
-
-static inline int rt_rq_throttled(struct rt_rq *rt_rq)
-{
-	return rt_rq->rt_throttled && !rt_rq->rt_nr_boosted;
-}
-
-static int rt_se_boosted(struct sched_rt_entity *rt_se)
-{
-	struct rt_rq *rt_rq = group_rt_rq(rt_se);
-	struct task_struct *p;
-
-	if (rt_rq)
-		return !!rt_rq->rt_nr_boosted;
-
-	p = rt_task_of(rt_se);
-	return p->prio != p->normal_prio;
-}
-
-static inline const struct cpumask *sched_rt_period_mask(void)
-{
-	return this_rq()->rd->span;
-}
-
-static inline
-struct rt_rq *sched_rt_period_rt_rq(struct rt_bandwidth *rt_b, int cpu)
-{
-	return container_of(rt_b, struct task_group, rt_bandwidth)->rt_rq[cpu];
-}
-
-static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
-{
-	return &rt_rq->tg->rt_bandwidth;
-}
-
-bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
-{
-	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
-
-	return (hrtimer_active(&rt_b->rt_period_timer) ||
-		rt_rq->rt_time < rt_b->rt_runtime);
-}
-
-/*
- * We ran out of runtime, see if we can borrow some from our neighbours.
- */
-static void do_balance_runtime(struct rt_rq *rt_rq)
-{
-	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
-	struct root_domain *rd = rq_of_rt_rq(rt_rq)->rd;
-	int i, weight;
-	u64 rt_period;
-
-	weight = cpumask_weight(rd->span);
-
-	raw_spin_lock(&rt_b->rt_runtime_lock);
-	rt_period = ktime_to_ns(rt_b->rt_period);
-	for_each_cpu(i, rd->span) {
-		struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
-		s64 diff;
-
-		if (iter == rt_rq)
-			continue;
-
-		raw_spin_lock(&iter->rt_runtime_lock);
-		/*
-		 * Either all rqs have inf runtime and there's nothing to steal
-		 * or __disable_runtime() below sets a specific rq to inf to
-		 * indicate its been disabled and disallow stealing.
-		 */
-		if (iter->rt_runtime == RUNTIME_INF)
-			goto next;
-
-		/*
-		 * From runqueues with spare time, take 1/n part of their
-		 * spare time, but no more than our period.
-		 */
-		diff = iter->rt_runtime - iter->rt_time;
-		if (diff > 0) {
-			diff = div_u64((u64)diff, weight);
-			if (rt_rq->rt_runtime + diff > rt_period)
-				diff = rt_period - rt_rq->rt_runtime;
-			iter->rt_runtime -= diff;
-			rt_rq->rt_runtime += diff;
-			if (rt_rq->rt_runtime == rt_period) {
-				raw_spin_unlock(&iter->rt_runtime_lock);
-				break;
-			}
-		}
-next:
-		raw_spin_unlock(&iter->rt_runtime_lock);
-	}
-	raw_spin_unlock(&rt_b->rt_runtime_lock);
-}
-
-/*
- * Ensure this RQ takes back all the runtime it lend to its neighbours.
- */
-static void __disable_runtime(struct rq *rq)
-{
-	struct root_domain *rd = rq->rd;
-	rt_rq_iter_t iter;
-	struct rt_rq *rt_rq;
-
-	if (unlikely(!scheduler_running))
-		return;
-
-	for_each_rt_rq(rt_rq, iter, rq) {
-		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
-		s64 want;
-		int i;
-
-		raw_spin_lock(&rt_b->rt_runtime_lock);
-		raw_spin_lock(&rt_rq->rt_runtime_lock);
-		/*
-		 * Either we're all inf and nobody needs to borrow, or we're
-		 * already disabled and thus have nothing to do, or we have
-		 * exactly the right amount of runtime to take out.
-		 */
-		if (rt_rq->rt_runtime == RUNTIME_INF ||
-				rt_rq->rt_runtime == rt_b->rt_runtime)
-			goto balanced;
-		raw_spin_unlock(&rt_rq->rt_runtime_lock);
-
-		/*
-		 * Calculate the difference between what we started out with
-		 * and what we current have, that's the amount of runtime
-		 * we lend and now have to reclaim.
-		 */
-		want = rt_b->rt_runtime - rt_rq->rt_runtime;
-
-		/*
-		 * Greedy reclaim, take back as much as we can.
-		 */
-		for_each_cpu(i, rd->span) {
-			struct rt_rq *iter = sched_rt_period_rt_rq(rt_b, i);
-			s64 diff;
-
-			/*
-			 * Can't reclaim from ourselves or disabled runqueues.
-			 */
-			if (iter == rt_rq || iter->rt_runtime == RUNTIME_INF)
-				continue;
-
-			raw_spin_lock(&iter->rt_runtime_lock);
-			if (want > 0) {
-				diff = min_t(s64, iter->rt_runtime, want);
-				iter->rt_runtime -= diff;
-				want -= diff;
-			} else {
-				iter->rt_runtime -= want;
-				want -= want;
-			}
-			raw_spin_unlock(&iter->rt_runtime_lock);
-
-			if (!want)
-				break;
-		}
-
-		raw_spin_lock(&rt_rq->rt_runtime_lock);
-		/*
-		 * We cannot be left wanting - that would mean some runtime
-		 * leaked out of the system.
-		 */
-		WARN_ON_ONCE(want);
-balanced:
-		/*
-		 * Disable all the borrow logic by pretending we have inf
-		 * runtime - in which case borrowing doesn't make sense.
-		 */
-		rt_rq->rt_runtime = RUNTIME_INF;
-		rt_rq->rt_throttled = 0;
-		raw_spin_unlock(&rt_rq->rt_runtime_lock);
-		raw_spin_unlock(&rt_b->rt_runtime_lock);
-
-		/* Make rt_rq available for pick_next_task() */
-		sched_rt_rq_enqueue(rt_rq);
-	}
-}
-
-static void __enable_runtime(struct rq *rq)
-{
-	rt_rq_iter_t iter;
-	struct rt_rq *rt_rq;
-
-	if (unlikely(!scheduler_running))
-		return;
-
-	/*
-	 * Reset each runqueue's bandwidth settings
-	 */
-	for_each_rt_rq(rt_rq, iter, rq) {
-		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
-
-		raw_spin_lock(&rt_b->rt_runtime_lock);
-		raw_spin_lock(&rt_rq->rt_runtime_lock);
-		rt_rq->rt_runtime = rt_b->rt_runtime;
-		rt_rq->rt_time = 0;
-		rt_rq->rt_throttled = 0;
-		raw_spin_unlock(&rt_rq->rt_runtime_lock);
-		raw_spin_unlock(&rt_b->rt_runtime_lock);
-	}
-}
-
-static void balance_runtime(struct rt_rq *rt_rq)
-{
-	if (!sched_feat(RT_RUNTIME_SHARE))
-		return;
-
-	if (rt_rq->rt_time > rt_rq->rt_runtime) {
-		raw_spin_unlock(&rt_rq->rt_runtime_lock);
-		do_balance_runtime(rt_rq);
-		raw_spin_lock(&rt_rq->rt_runtime_lock);
-	}
-}
-
-static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
-{
-	int i, idle = 1, throttled = 0;
-	const struct cpumask *span;
-
-	span = sched_rt_period_mask();
-
-	/*
-	 * FIXME: isolated CPUs should really leave the root task group,
-	 * whether they are isolcpus or were isolated via cpusets, lest
-	 * the timer run on a CPU which does not service all runqueues,
-	 * potentially leaving other CPUs indefinitely throttled. If
-	 * isolation is really required, the user will turn the throttle
-	 * off to kill the perturbations it causes anyway. Meanwhile,
-	 * this maintains functionality for boot and/or troubleshooting.
-	 */
-	if (rt_b == &root_task_group.rt_bandwidth)
-		span = cpu_online_mask;
-
-	for_each_cpu(i, span) {
-		int enqueue = 0;
-		struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
-		struct rq *rq = rq_of_rt_rq(rt_rq);
-		struct rq_flags rf;
-		int skip;
-
-		/*
-		 * When span == cpu_online_mask, taking each rq->lock
-		 * can be time-consuming. Try to avoid it when possible.
-		 */
-		raw_spin_lock(&rt_rq->rt_runtime_lock);
-		if (!sched_feat(RT_RUNTIME_SHARE) && rt_rq->rt_runtime != RUNTIME_INF)
-			rt_rq->rt_runtime = rt_b->rt_runtime;
-		skip = !rt_rq->rt_time && !rt_rq->rt_nr_running;
-		raw_spin_unlock(&rt_rq->rt_runtime_lock);
-		if (skip)
-			continue;
-
-		rq_lock(rq, &rf);
-		update_rq_clock(rq);
-
-		if (rt_rq->rt_time) {
-			u64 runtime;
-
-			raw_spin_lock(&rt_rq->rt_runtime_lock);
-			if (rt_rq->rt_throttled)
-				balance_runtime(rt_rq);
-			runtime = rt_rq->rt_runtime;
-			rt_rq->rt_time -= min(rt_rq->rt_time, overrun*runtime);
-			if (rt_rq->rt_throttled && rt_rq->rt_time < runtime) {
-				rt_rq->rt_throttled = 0;
-				enqueue = 1;
-
-				/*
-				 * When we're idle and a woken (rt) task is
-				 * throttled wakeup_preempt() will set
-				 * skip_update and the time between the wakeup
-				 * and this unthrottle will get accounted as
-				 * 'runtime'.
-				 */
-				if (rt_rq->rt_nr_running && rq->curr == rq->idle)
-					rq_clock_cancel_skipupdate(rq);
-			}
-			if (rt_rq->rt_time || rt_rq->rt_nr_running)
-				idle = 0;
-			raw_spin_unlock(&rt_rq->rt_runtime_lock);
-		} else if (rt_rq->rt_nr_running) {
-			idle = 0;
-			if (!rt_rq_throttled(rt_rq))
-				enqueue = 1;
-		}
-		if (rt_rq->rt_throttled)
-			throttled = 1;
-
-		if (enqueue)
-			sched_rt_rq_enqueue(rt_rq);
-		rq_unlock(rq, &rf);
-	}
-
-	if (!throttled && (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF))
-		return 1;
-
-	return idle;
-}
-
-static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
-{
-	u64 runtime = sched_rt_runtime(rt_rq);
-
-	if (rt_rq->rt_throttled)
-		return rt_rq_throttled(rt_rq);
-
-	if (runtime >= sched_rt_period(rt_rq))
-		return 0;
-
-	balance_runtime(rt_rq);
-	runtime = sched_rt_runtime(rt_rq);
-	if (runtime == RUNTIME_INF)
-		return 0;
-
-	if (rt_rq->rt_time > runtime) {
-		struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
-
-		/*
-		 * Don't actually throttle groups that have no runtime assigned
-		 * but accrue some time due to boosting.
-		 */
-		if (likely(rt_b->rt_runtime)) {
-			rt_rq->rt_throttled = 1;
-			printk_deferred_once("sched: RT throttling activated\n");
-		} else {
-			/*
-			 * In case we did anyway, make it go away,
-			 * replenishment is a joke, since it will replenish us
-			 * with exactly 0 ns.
-			 */
-			rt_rq->rt_time = 0;
-		}
-
-		if (rt_rq_throttled(rt_rq)) {
-			sched_rt_rq_dequeue(rt_rq);
-			return 1;
-		}
-	}
-
-	return 0;
-}
-
-#else /* !CONFIG_RT_GROUP_SCHED: */
+#else /* !CONFIG_RT_GROUP_SCHED */
 
 typedef struct rt_rq *rt_rq_iter_t;
 
 #define for_each_rt_rq(rt_rq, iter, rq) \
 	for ((void) iter, rt_rq = &rq->rt; rt_rq; rt_rq = NULL)
 
-#define for_each_sched_rt_entity(rt_se) \
-	for (; rt_se; rt_se = NULL)
-
-static inline struct rt_rq *group_rt_rq(struct sched_rt_entity *rt_se)
-{
-	return NULL;
-}
-
-static inline void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
-{
-	struct rq *rq = rq_of_rt_rq(rt_rq);
-
-	if (!rt_rq->rt_nr_running)
-		return;
-
-	enqueue_top_rt_rq(rt_rq);
-	resched_curr(rq);
-}
-
-static inline void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
-{
-	dequeue_top_rt_rq(rt_rq, rt_rq->rt_nr_running);
-}
-
-static inline int rt_rq_throttled(struct rt_rq *rt_rq)
-{
-	return false;
-}
-
-static inline const struct cpumask *sched_rt_period_mask(void)
-{
-	return cpu_online_mask;
-}
-
-static inline
-struct rt_rq *sched_rt_period_rt_rq(struct rt_bandwidth *rt_b, int cpu)
-{
-	return &cpu_rq(cpu)->rt;
-}
-
-static void __enable_runtime(struct rq *rq) { }
-static void __disable_runtime(struct rq *rq) { }
-
-#endif /* !CONFIG_RT_GROUP_SCHED */
+#endif /* CONFIG_RT_GROUP_SCHED */
 
 static inline int rt_se_prio(struct sched_rt_entity *rt_se)
 {
-#ifdef CONFIG_RT_GROUP_SCHED
-	struct rt_rq *rt_rq = group_rt_rq(rt_se);
-
-	if (rt_rq)
-		return rt_rq->highest_prio.curr;
-#endif
-
 	return rt_task_of(rt_se)->prio;
 }
 
@@ -931,67 +338,8 @@ static void update_curr_rt(struct rq *rq)
 	if (unlikely(delta_exec <= 0))
 		return;
 
-#ifdef CONFIG_RT_GROUP_SCHED
-	struct sched_rt_entity *rt_se = &donor->rt;
-
 	if (!rt_bandwidth_enabled())
 		return;
-
-	for_each_sched_rt_entity(rt_se) {
-		struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
-		int exceeded;
-
-		if (sched_rt_runtime(rt_rq) != RUNTIME_INF) {
-			raw_spin_lock(&rt_rq->rt_runtime_lock);
-			rt_rq->rt_time += delta_exec;
-			exceeded = sched_rt_runtime_exceeded(rt_rq);
-			if (exceeded)
-				resched_curr(rq);
-			raw_spin_unlock(&rt_rq->rt_runtime_lock);
-			if (exceeded)
-				do_start_rt_bandwidth(sched_rt_bandwidth(rt_rq));
-		}
-	}
-#endif /* CONFIG_RT_GROUP_SCHED */
-}
-
-static void
-dequeue_top_rt_rq(struct rt_rq *rt_rq, unsigned int count)
-{
-	struct rq *rq = rq_of_rt_rq(rt_rq);
-
-	BUG_ON(&rq->rt != rt_rq);
-
-	if (!rt_rq->rt_queued)
-		return;
-
-	BUG_ON(!rq->nr_running);
-
-	sub_nr_running(rq, count);
-	rt_rq->rt_queued = 0;
-
-}
-
-static void
-enqueue_top_rt_rq(struct rt_rq *rt_rq)
-{
-	struct rq *rq = rq_of_rt_rq(rt_rq);
-
-	BUG_ON(&rq->rt != rt_rq);
-
-	if (rt_rq->rt_queued)
-		return;
-
-	if (rt_rq_throttled(rt_rq))
-		return;
-
-	if (rt_rq->rt_nr_running) {
-		add_nr_running(rq, rt_rq->rt_nr_running);
-		rt_rq->rt_queued = 1;
-	}
-
-	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
-	cpufreq_update_util(rq, 0);
 }
 
 static void
@@ -1062,58 +410,11 @@ dec_rt_prio(struct rt_rq *rt_rq, int prio)
 	dec_rt_prio_smp(rt_rq, prio, prev_prio);
 }
 
-#ifdef CONFIG_RT_GROUP_SCHED
-
-static void
-inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
-{
-	if (rt_se_boosted(rt_se))
-		rt_rq->rt_nr_boosted++;
-
-	start_rt_bandwidth(&rt_rq->tg->rt_bandwidth);
-}
-
-static void
-dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
-{
-	if (rt_se_boosted(rt_se))
-		rt_rq->rt_nr_boosted--;
-
-	WARN_ON(!rt_rq->rt_nr_running && rt_rq->rt_nr_boosted);
-}
-
-#else /* !CONFIG_RT_GROUP_SCHED: */
-
-static void
-inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
-{
-}
-
-static inline
-void dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq) {}
-
-#endif /* !CONFIG_RT_GROUP_SCHED */
-
 static inline
-unsigned int rt_se_nr_running(struct sched_rt_entity *rt_se)
+unsigned int is_rr_task(struct sched_rt_entity *rt_se)
 {
-	struct rt_rq *group_rq = group_rt_rq(rt_se);
-
-	if (group_rq)
-		return group_rq->rt_nr_running;
-	else
-		return 1;
-}
-
-static inline
-unsigned int rt_se_rr_nr_running(struct sched_rt_entity *rt_se)
-{
-	struct rt_rq *group_rq = group_rt_rq(rt_se);
 	struct task_struct *tsk;
 
-	if (group_rq)
-		return group_rq->rr_nr_running;
-
 	tsk = rt_task_of(rt_se);
 
 	return (tsk->policy == SCHED_RR) ? 1 : 0;
@@ -1122,26 +423,21 @@ unsigned int rt_se_rr_nr_running(struct sched_rt_entity *rt_se)
 static inline
 void inc_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 {
-	int prio = rt_se_prio(rt_se);
-
-	WARN_ON(!rt_prio(prio));
-	rt_rq->rt_nr_running += rt_se_nr_running(rt_se);
-	rt_rq->rr_nr_running += rt_se_rr_nr_running(rt_se);
+	WARN_ON(!rt_prio(rt_se_prio(rt_se)));
+	rt_rq->rt_nr_running += 1;
+	rt_rq->rr_nr_running += is_rr_task(rt_se);
 
-	inc_rt_prio(rt_rq, prio);
-	inc_rt_group(rt_se, rt_rq);
+	inc_rt_prio(rt_rq, rt_se_prio(rt_se));
 }
 
 static inline
 void dec_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 {
 	WARN_ON(!rt_prio(rt_se_prio(rt_se)));
-	WARN_ON(!rt_rq->rt_nr_running);
-	rt_rq->rt_nr_running -= rt_se_nr_running(rt_se);
-	rt_rq->rr_nr_running -= rt_se_rr_nr_running(rt_se);
+	rt_rq->rt_nr_running -= 1;
+	rt_rq->rr_nr_running -= is_rr_task(rt_se);
 
 	dec_rt_prio(rt_rq, rt_se_prio(rt_se));
-	dec_rt_group(rt_se, rt_rq);
 }
 
 /*
@@ -1170,10 +466,6 @@ static void __delist_rt_entity(struct sched_rt_entity *rt_se, struct rt_prio_arr
 static inline struct sched_statistics *
 __schedstats_from_rt_se(struct sched_rt_entity *rt_se)
 {
-	/* schedstats is not supported for rt group. */
-	if (!rt_entity_is_task(rt_se))
-		return NULL;
-
 	return &rt_task_of(rt_se)->stats;
 }
 
@@ -1186,9 +478,7 @@ update_stats_wait_start_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
 	if (!schedstat_enabled())
 		return;
 
-	if (rt_entity_is_task(rt_se))
-		p = rt_task_of(rt_se);
-
+	p = rt_task_of(rt_se);
 	stats = __schedstats_from_rt_se(rt_se);
 	if (!stats)
 		return;
@@ -1205,9 +495,7 @@ update_stats_enqueue_sleeper_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_
 	if (!schedstat_enabled())
 		return;
 
-	if (rt_entity_is_task(rt_se))
-		p = rt_task_of(rt_se);
-
+	p = rt_task_of(rt_se);
 	stats = __schedstats_from_rt_se(rt_se);
 	if (!stats)
 		return;
@@ -1235,9 +523,7 @@ update_stats_wait_end_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
 	if (!schedstat_enabled())
 		return;
 
-	if (rt_entity_is_task(rt_se))
-		p = rt_task_of(rt_se);
-
+	p = rt_task_of(rt_se);
 	stats = __schedstats_from_rt_se(rt_se);
 	if (!stats)
 		return;
@@ -1254,9 +540,7 @@ update_stats_dequeue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
 	if (!schedstat_enabled())
 		return;
 
-	if (rt_entity_is_task(rt_se))
-		p = rt_task_of(rt_se);
-
+	p = rt_task_of(rt_se);
 	if ((flags & DEQUEUE_SLEEP) && p) {
 		unsigned int state;
 
@@ -1275,21 +559,8 @@ static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flag
 {
 	struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
 	struct rt_prio_array *array = &rt_rq->active;
-	struct rt_rq *group_rq = group_rt_rq(rt_se);
 	struct list_head *queue = array->queue + rt_se_prio(rt_se);
 
-	/*
-	 * Don't enqueue the group if its throttled, or when empty.
-	 * The latter is a consequence of the former when a child group
-	 * get throttled and the current group doesn't have any other
-	 * active members.
-	 */
-	if (group_rq && (rt_rq_throttled(group_rq) || !group_rq->rt_nr_running)) {
-		if (rt_se->on_list)
-			__delist_rt_entity(rt_se, array);
-		return;
-	}
-
 	if (move_entity(flags)) {
 		WARN_ON_ONCE(rt_se->on_list);
 		if (flags & ENQUEUE_HEAD)
@@ -1319,57 +590,18 @@ static void __dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flag
 	dec_rt_tasks(rt_se, rt_rq);
 }
 
-/*
- * Because the prio of an upper entry depends on the lower
- * entries, we must remove entries top - down.
- */
-static void dequeue_rt_stack(struct sched_rt_entity *rt_se, unsigned int flags)
-{
-	struct sched_rt_entity *back = NULL;
-	unsigned int rt_nr_running;
-
-	for_each_sched_rt_entity(rt_se) {
-		rt_se->back = back;
-		back = rt_se;
-	}
-
-	rt_nr_running = rt_rq_of_se(back)->rt_nr_running;
-
-	for (rt_se = back; rt_se; rt_se = rt_se->back) {
-		if (on_rt_rq(rt_se))
-			__dequeue_rt_entity(rt_se, flags);
-	}
-
-	dequeue_top_rt_rq(rt_rq_of_se(back), rt_nr_running);
-}
-
 static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
-	struct rq *rq = rq_of_rt_se(rt_se);
-
 	update_stats_enqueue_rt(rt_rq_of_se(rt_se), rt_se, flags);
 
-	dequeue_rt_stack(rt_se, flags);
-	for_each_sched_rt_entity(rt_se)
-		__enqueue_rt_entity(rt_se, flags);
-	enqueue_top_rt_rq(&rq->rt);
+	__enqueue_rt_entity(rt_se, flags);
 }
 
 static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
-	struct rq *rq = rq_of_rt_se(rt_se);
-
 	update_stats_dequeue_rt(rt_rq_of_se(rt_se), rt_se, flags);
 
-	dequeue_rt_stack(rt_se, flags);
-
-	for_each_sched_rt_entity(rt_se) {
-		struct rt_rq *rt_rq = group_rt_rq(rt_se);
-
-		if (rt_rq && rt_rq->rt_nr_running)
-			__enqueue_rt_entity(rt_se, flags);
-	}
-	enqueue_top_rt_rq(&rq->rt);
+	__dequeue_rt_entity(rt_se, flags);
 }
 
 /*
@@ -1429,13 +661,7 @@ requeue_rt_entity(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se, int head)
 
 static void requeue_task_rt(struct rq *rq, struct task_struct *p, int head)
 {
-	struct sched_rt_entity *rt_se = &p->rt;
-	struct rt_rq *rt_rq;
-
-	for_each_sched_rt_entity(rt_se) {
-		rt_rq = rt_rq_of_se(rt_se);
-		requeue_rt_entity(rt_rq, rt_se, head);
-	}
+	requeue_rt_entity(rt_rq_of_se(&p->rt), &p->rt, head);
 }
 
 static void yield_task_rt(struct rq *rq)
@@ -1636,21 +862,6 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rt_rq *rt_rq)
 	return next;
 }
 
-static struct task_struct *_pick_next_task_rt(struct rq *rq)
-{
-	struct sched_rt_entity *rt_se;
-	struct rt_rq *rt_rq = &rq->rt;
-
-	do {
-		rt_se = pick_next_rt_entity(rt_rq);
-		if (unlikely(!rt_se))
-			return NULL;
-		rt_rq = group_rt_rq(rt_se);
-	} while (rt_rq);
-
-	return rt_task_of(rt_se);
-}
-
 static struct task_struct *pick_task_rt(struct rq *rq, struct rq_flags *rf)
 {
 	struct task_struct *p;
@@ -1658,7 +869,7 @@ static struct task_struct *pick_task_rt(struct rq *rq, struct rq_flags *rf)
 	if (!sched_rt_runnable(rq))
 		return NULL;
 
-	p = _pick_next_task_rt(rq);
+	p = rt_task_of(pick_next_rt_entity(&rq->rt));
 
 	return p;
 }
@@ -2322,8 +1533,6 @@ static void rq_online_rt(struct rq *rq)
 	if (rq->rt.overloaded)
 		rt_set_overload(rq);
 
-	__enable_runtime(rq);
-
 	cpupri_set(&rq->rd->cpupri, rq->cpu, rq->rt.highest_prio.curr);
 }
 
@@ -2333,8 +1542,6 @@ static void rq_offline_rt(struct rq *rq)
 	if (rq->rt.overloaded)
 		rt_clear_overload(rq);
 
-	__disable_runtime(rq);
-
 	cpupri_set(&rq->rd->cpupri, rq->cpu, CPUPRI_INVALID);
 }
 
@@ -2495,12 +1702,10 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 	 * Requeue to the end of queue if we (and all of our ancestors) are not
 	 * the only element on the queue
 	 */
-	for_each_sched_rt_entity(rt_se) {
-		if (rt_se->run_list.prev != rt_se->run_list.next) {
-			requeue_task_rt(rq, p, 0);
-			resched_curr(rq);
-			return;
-		}
+	if (rt_se->run_list.prev != rt_se->run_list.next) {
+		requeue_task_rt(rq, p, 0);
+		resched_curr(rq);
+		return;
 	}
 }
 
@@ -2518,16 +1723,7 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 #ifdef CONFIG_SCHED_CORE
 static int task_is_throttled_rt(struct task_struct *p, int cpu)
 {
-	struct rt_rq *rt_rq;
-
-#ifdef CONFIG_RT_GROUP_SCHED // XXX maybe add task_rt_rq(), see also sched_rt_period_rt_rq
-	rt_rq = task_group(p)->rt_rq[cpu];
-	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
-#else
-	rt_rq = &cpu_rq(cpu)->rt;
-#endif
-
-	return rt_rq_throttled(rt_rq);
+	return 0;
 }
 #endif /* CONFIG_SCHED_CORE */
 
@@ -2774,13 +1970,7 @@ long sched_group_rt_period(struct task_group *tg)
 #ifdef CONFIG_SYSCTL
 static int sched_rt_global_constraints(void)
 {
-	int ret = 0;
-
-	mutex_lock(&rt_constraints_mutex);
-	ret = __rt_schedulable(NULL, 0, 0);
-	mutex_unlock(&rt_constraints_mutex);
-
-	return ret;
+	return 0;
 }
 #endif /* CONFIG_SYSCTL */
 
@@ -2815,10 +2005,6 @@ static int sched_rt_global_validate(void)
 	return 0;
 }
 
-static void sched_rt_do_global(void)
-{
-}
-
 static int sched_rt_handler(const struct ctl_table *table, int write, void *buffer,
 		size_t *lenp, loff_t *ppos)
 {
@@ -2846,7 +2032,6 @@ static int sched_rt_handler(const struct ctl_table *table, int write, void *buff
 		if (ret)
 			goto undo;
 
-		sched_rt_do_global();
 		sched_dl_do_global();
 	}
 	if (0) {
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2b8630ed1353..5833905d8eaa 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -820,7 +820,7 @@ struct scx_rq {
 
 static inline int rt_bandwidth_enabled(void)
 {
-	return sysctl_sched_rt_runtime >= 0;
+	return 0;
 }
 
 /* RT IPI pull logic requires IRQ_WORK */
@@ -860,7 +860,7 @@ struct rt_rq {
 
 static inline bool rt_rq_is_runnable(struct rt_rq *rt_rq)
 {
-	return rt_rq->rt_queued && rt_rq->rt_nr_running;
+	return rt_rq->rt_nr_running;
 }
 
 /* Deadline class' related fields in a runqueue */
@@ -2775,7 +2775,7 @@ static inline bool sched_dl_runnable(struct rq *rq)
 
 static inline bool sched_rt_runnable(struct rq *rq)
 {
-	return rq->rt.rt_queued > 0;
+	return rq->rt.rt_nr_running > 0;
 }
 
 static inline bool sched_fair_runnable(struct rq *rq)
@@ -2887,9 +2887,6 @@ extern void resched_curr(struct rq *rq);
 extern void resched_curr_lazy(struct rq *rq);
 extern void resched_cpu(int cpu);
 
-extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
-extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);
-
 extern void init_dl_entity(struct sched_dl_entity *dl_se);
 
 extern void init_cfs_throttle_work(struct task_struct *p);
@@ -3306,12 +3303,8 @@ extern void set_rq_offline(struct rq *rq);
 extern bool sched_smp_initialized;
 
 #ifdef CONFIG_RT_GROUP_SCHED
-#define rt_entity_is_task(rt_se) (!(rt_se)->my_q)
-
 static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 {
-	WARN_ON_ONCE(!rt_entity_is_task(rt_se));
-
 	return container_of_const(rt_se, struct task_struct, rt);
 }
 
@@ -3336,8 +3329,6 @@ static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se)
 	return rt_rq->rq;
 }
 #else
-#define rt_entity_is_task(rt_se) (1)
-
 static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 {
 	return container_of_const(rt_se, struct task_struct, rt);
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index cadb0e9fe19b..806bc88d21ee 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -606,19 +606,6 @@ int __sched_setscheduler(struct task_struct *p,
 change:
 
 	if (user) {
-#ifdef CONFIG_RT_GROUP_SCHED
-		/*
-		 * Do not allow real-time tasks into groups that have no runtime
-		 * assigned.
-		 */
-		if (rt_group_sched_enabled() &&
-				rt_bandwidth_enabled() && rt_policy(policy) &&
-				task_group(p)->rt_bandwidth.rt_runtime == 0 &&
-				!task_group_is_autogroup(task_group(p))) {
-			retval = -EPERM;
-			goto unlock;
-		}
-#endif /* CONFIG_RT_GROUP_SCHED */
 		if (dl_bandwidth_enabled() && dl_policy(policy) &&
 				!(attr->sched_flags & SCHED_FLAG_SUGOV)) {
 			cpumask_t *span = rq->rd->span;
-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 07/29] sched/rt: Remove unnecessary runqueue pointer in struct rt_rq
Date: Thu, 30 Apr 2026 23:38:11 +0200
Message-ID: <20260430213835.62217-8-yurand2000@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>

Remove the rq field from struct rt_rq. The field now only caches the
pointer to the global runqueue of the given rt_rq, so it is unnecessary:
the global runqueue can be retrieved in other ways.

Introduce served_rq_of_rt_rq() to retrieve the runqueue that the given
rt_rq is serving.

Signed-off-by: Yuri Andriaccio
---
 kernel/sched/rt.c    |  7 ++-----
 kernel/sched/sched.h | 21 +++++++++++++--------
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 392212ac90d8..dd4aee5570aa 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -101,10 +101,7 @@ void init_tg_rt_entry(struct task_group *tg, struct rt_rq *rt_rq,
 		      struct sched_rt_entity *rt_se, int cpu,
 		      struct sched_rt_entity *parent)
 {
-	struct rq *rq = cpu_rq(cpu);
-
 	rt_rq->highest_prio.curr = MAX_RT_PRIO-1;
-	rt_rq->rq = rq;
 	rt_rq->tg = tg;
 
 	tg->rt_rq[cpu] = rt_rq;
@@ -184,7 +181,7 @@ static void pull_rt_task(struct rq *);
 
 static inline void rt_queue_push_tasks(struct rt_rq *rt_rq)
 {
-	struct rq *rq = container_of_const(rt_rq, struct rq, rt);
+	struct rq *rq = served_rq_of_rt_rq(rt_rq);
 
 	if (!has_pushable_tasks(rt_rq))
 		return;
@@ -194,7 +191,7 @@ static inline void rt_queue_push_tasks(struct rt_rq *rt_rq)
 
 static inline void rt_queue_pull_task(struct rt_rq *rt_rq)
 {
-	struct rq *rq = container_of_const(rt_rq, struct rq, rt);
+	struct rq *rq = served_rq_of_rt_rq(rt_rq);
 
 	queue_balance_callback(rq, &per_cpu(rt_pull_head, rq->cpu), pull_rt_task);
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5833905d8eaa..770de5afd3a9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -850,8 +850,6 @@ struct rt_rq {
 	raw_spinlock_t		rt_runtime_lock;
 
 	unsigned int		rt_nr_boosted;
-
-	struct rq		*rq; /* this is always top-level rq, cache? */
 #endif
 #ifdef CONFIG_CGROUP_SCHED
 	struct task_group	*tg; /* this tg has "this" rt_rq on given CPU for runnable entities */
@@ -3308,11 +3306,16 @@ static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 	return container_of_const(rt_se, struct task_struct, rt);
 }
 
+static inline struct rq *served_rq_of_rt_rq(struct rt_rq *rt_rq)
+{
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
+	return container_of_const(rt_rq, struct rq, rt);
+}
+
 static inline struct rq *rq_of_rt_rq(struct rt_rq *rt_rq)
 {
 	/* Cannot fold with non-CONFIG_RT_GROUP_SCHED version, layout */
-	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
-	return rt_rq->rq;
+	return cpu_rq(served_rq_of_rt_rq(rt_rq)->cpu);
 }
 
 static inline struct rt_rq *rt_rq_of_se(struct sched_rt_entity *rt_se)
@@ -3323,10 +3326,7 @@ static inline struct rt_rq *rt_rq_of_se(struct sched_rt_entity *rt_se)
 
 static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se)
 {
-	struct rt_rq *rt_rq = rt_se->rt_rq;
-
-	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
-	return rt_rq->rq;
+	return rq_of_rt_rq(rt_se->rt_rq);
 }
 #else
 static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
@@ -3334,6 +3334,11 @@ static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 	return container_of_const(rt_se, struct task_struct, rt);
 }
 
+static inline struct rq *served_rq_of_rt_rq(struct rt_rq *rt_rq)
+{
+	return container_of_const(rt_rq, struct rq, rt);
+}
+
 static inline struct rq *rq_of_rt_rq(struct rt_rq *rt_rq)
 {
 	return container_of_const(rt_rq, struct rq, rt);
-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 08/29] sched/rt: Introduce HCBS specific structs in task_group
Date: Thu, 30 Apr 2026 23:38:12 +0200
Message-ID: <20260430213835.62217-9-yurand2000@gmail.com>
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>

From: luca abeni

Add an array of sched_dl_entity objects in task_group.
Create the dl_bandwidth struct and add a field for it in task_group.
Add a rq pointer field in struct rt_rq.

---

For each CPU on the host system, the task_group manages a sched_dl_entity
and a rt_rq object, which in turn keeps a pointer to its locally managed
runqueue.

The sched_dl_entity object manages the deadline server which will be
scheduled for execution on the CPU, while the rt_rq object is instead used
to reference the local runqueue's specific data and entities and it is
used when an actual task must be scheduled when the CPU is given to the
dl_server.

The dl_bandwidth object keeps track of the currently allocated bandwidth
for the cgroup.

Co-developed-by: Alessio Balsini
Signed-off-by: Alessio Balsini
Co-developed-by: Andrea Parri
Signed-off-by: Andrea Parri
Co-developed-by: Yuri Andriaccio
Signed-off-by: Yuri Andriaccio
Signed-off-by: luca abeni
---
 kernel/sched/sched.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 770de5afd3a9..1c614e54eba4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -322,6 +322,13 @@ struct rt_bandwidth {
 	unsigned int rt_period_active;
 };

+struct dl_bandwidth {
+	raw_spinlock_t dl_runtime_lock;
+	u64 dl_runtime;
+	u64 dl_period;
+};
+
+
 static inline int dl_bandwidth_enabled(void)
 {
 	return sysctl_sched_rt_runtime >= 0;
@@ -495,10 +502,17 @@ struct task_group {
 #endif /* CONFIG_FAIR_GROUP_SCHED */

 #ifdef CONFIG_RT_GROUP_SCHED
+	/*
+	 * Each task group manages a different scheduling entity per CPU, i.e. a
+	 * different deadline server, and a runqueue per CPU. All the dl-servers
+	 * share the same dl_bandwidth object.
+	 */
 	struct sched_rt_entity **rt_se;
+	struct sched_dl_entity **dl_se;
 	struct rt_rq **rt_rq;

 	struct rt_bandwidth rt_bandwidth;
+	struct dl_bandwidth dl_bandwidth;
 #endif

 	struct scx_task_group scx;
@@ -854,6 +868,12 @@ struct rt_rq {
 #ifdef CONFIG_CGROUP_SCHED
 	struct task_group *tg; /* this tg has "this" rt_rq on given CPU for runnable entities */
 #endif
+
+	/*
+	 * The cgroup's served runqueue if the rt_rq entity belongs to a cgroup,
+	 * otherwise the top-level global runqueue.
+	 */
+	struct rq *rq;
 };

 static inline bool rt_rq_is_runnable(struct rt_rq *rt_rq)
--
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 09/29] sched/core: Initialize HCBS specific structures
Date: Thu, 30 Apr 2026 23:38:13 +0200
Message-ID: <20260430213835.62217-10-yurand2000@gmail.com>
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>

From: luca abeni

Update autogroups' creation/destruction to use the new data structures.
Initialize the default bandwidth for rt-cgroups (sched_init).
Initialize the rt-scheduler's specific data structures for the root
control group (sched_init).
Remove init_tg_rt_entry in favour of manual setup of the necessary data
structures in sched_init.
Add utility functions to check whether an rt_rq entity is connected to an
rt-cgroup (is_dl_group) and to get the group's scheduling entity
(dl_group_of).

Co-developed-by: Alessio Balsini
Signed-off-by: Alessio Balsini
Co-developed-by: Andrea Parri
Signed-off-by: Andrea Parri
Co-developed-by: Yuri Andriaccio
Signed-off-by: Yuri Andriaccio
Signed-off-by: luca abeni
---
 kernel/sched/autogroup.c |  4 ++--
 kernel/sched/core.c      | 11 +++++++++--
 kernel/sched/deadline.c  |  8 ++++++++
 kernel/sched/rt.c        | 11 -----------
 kernel/sched/sched.h     | 30 +++++++++++++++++++++++++++---
 5 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/autogroup.c b/kernel/sched/autogroup.c
index e380cf9372bb..2122a0740a19 100644
--- a/kernel/sched/autogroup.c
+++ b/kernel/sched/autogroup.c
@@ -52,7 +52,7 @@ static inline void autogroup_destroy(struct kref *kref)

 #ifdef CONFIG_RT_GROUP_SCHED
 	/* We've redirected RT tasks to the root task group... */
-	ag->tg->rt_se = NULL;
+	ag->tg->dl_se = NULL;
 	ag->tg->rt_rq = NULL;
 #endif
 	sched_release_group(ag->tg);
@@ -109,7 +109,7 @@ static inline struct autogroup *autogroup_create(void)
 	 * the policy change to proceed.
 	 */
 	free_rt_sched_group(tg);
-	tg->rt_se = root_task_group.rt_se;
+	tg->dl_se = root_task_group.dl_se;
 	tg->rt_rq = root_task_group.rt_rq;
 #endif /* CONFIG_RT_GROUP_SCHED */
 	tg->autogroup = ag;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a203a27fb16d..4e58b4f165ed 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8636,7 +8636,7 @@ void __init sched_init(void)
 		scx_tg_init(&root_task_group);
 #endif /* CONFIG_EXT_GROUP_SCHED */
 #ifdef CONFIG_RT_GROUP_SCHED
-		root_task_group.rt_se = (struct sched_rt_entity **)ptr;
+		root_task_group.dl_se = (struct sched_dl_entity **)ptr;
 		ptr += nr_cpu_ids * sizeof(void **);

 		root_task_group.rt_rq = (struct rt_rq **)ptr;
@@ -8647,6 +8647,11 @@ void __init sched_init(void)

 	init_defrootdomain();

+#ifdef CONFIG_RT_GROUP_SCHED
+	init_dl_bandwidth(&root_task_group.dl_bandwidth,
+			global_rt_period(), global_rt_runtime());
+#endif /* CONFIG_RT_GROUP_SCHED */
+
 #ifdef CONFIG_CGROUP_SCHED
 	task_group_cache = KMEM_CACHE(task_group, 0);

@@ -8698,7 +8703,9 @@ void __init sched_init(void)
 		 * starts working after scheduler_running, which is not the case
 		 * yet.
 		 */
-		init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL);
+		rq->rt.tg = &root_task_group;
+		root_task_group.rt_rq[i] = &rq->rt;
+		root_task_group.dl_se[i] = NULL;
 #endif

 		rq->next_class = &idle_sched_class;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 67615a0539fe..7c039d5f3c5d 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -505,6 +505,14 @@ static inline int is_leftmost(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq

 static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq);

+void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime)
+{
+	raw_spin_lock_init(&dl_b->dl_runtime_lock);
+	dl_b->dl_period = period;
+	dl_b->dl_runtime = runtime;
+}
+
+
 void init_dl_bw(struct dl_bw *dl_b)
 {
 	raw_spin_lock_init(&dl_b->lock);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index dd4aee5570aa..741fac9f57ac 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -97,17 +97,6 @@ void free_rt_sched_group(struct task_group *tg)
 		return;
 }

-void init_tg_rt_entry(struct task_group *tg, struct rt_rq *rt_rq,
-		struct sched_rt_entity *rt_se, int cpu,
-		struct sched_rt_entity *parent)
-{
-	rt_rq->highest_prio.curr = MAX_RT_PRIO-1;
-	rt_rq->tg = tg;
-
-	tg->rt_rq[cpu] = rt_rq;
-	tg->rt_se[cpu] = rt_se;
-}
-
 int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
 {
 	if (!rt_group_sched_enabled())
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1c614e54eba4..e7e263d3cddb 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -604,9 +604,6 @@ extern void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b);
 extern void unthrottle_cfs_rq(struct cfs_rq *cfs_rq);
 extern bool cfs_task_bw_constrained(struct task_struct *p);

-extern void init_tg_rt_entry(struct task_group *tg, struct rt_rq *rt_rq,
-		struct sched_rt_entity *rt_se, int cpu,
-		struct sched_rt_entity *parent);
 extern int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us);
 extern int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us);
 extern long sched_group_rt_runtime(struct task_group *tg);
@@ -2905,6 +2902,7 @@ extern void resched_curr(struct rq *rq);
 extern void resched_curr_lazy(struct rq *rq);
 extern void resched_cpu(int cpu);

+void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime);
 extern void init_dl_entity(struct sched_dl_entity *dl_se);

 extern void init_cfs_throttle_work(struct task_struct *p);
@@ -3348,6 +3346,22 @@ static inline struct rq *rq_of_rt_se(struct sched_rt_entity *rt_se)
 {
 	return rq_of_rt_rq(rt_se->rt_rq);
 }
+
+static inline int is_dl_group(struct rt_rq *rt_rq)
+{
+	return rt_rq->tg != &root_task_group;
+}
+
+/*
+ * Return the scheduling entity of this group of tasks.
+ */
+static inline struct sched_dl_entity *dl_group_of(struct rt_rq *rt_rq)
+{
+	if (WARN_ON_ONCE(!is_dl_group(rt_rq)))
+		return NULL;
+
+	return rt_rq->tg->dl_se[served_rq_of_rt_rq(rt_rq)->cpu];
+}
 #else
 static inline struct task_struct *rt_task_of(struct sched_rt_entity *rt_se)
 {
@@ -3377,6 +3391,16 @@ static inline struct rt_rq *rt_rq_of_se(struct sched_rt_entity *rt_se)

 	return &rq->rt;
 }
+
+static inline int is_dl_group(struct rt_rq *rt_rq)
+{
+	return 0;
+}
+
+static inline struct sched_dl_entity *dl_group_of(struct rt_rq *rt_rq)
+{
+	return NULL;
+}
 #endif

 DEFINE_LOCK_GUARD_2(double_rq_lock, struct rq,
--
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 10/29] sched/deadline: Add dl_init_tg
Date: Thu, 30 Apr 2026 23:38:14 +0200
Message-ID: <20260430213835.62217-11-yurand2000@gmail.com>
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>

From: luca abeni

Add dl_init_tg to initialize and/or update a rt-cgroup dl_server and to
also account for the allocated bandwidth. This function is currently
unhooked and will later be used to allocate bandwidth to rt-cgroups.

Add a lock guard for raw_spin_rq_lock_irq for cleaner code.

Co-developed-by: Alessio Balsini
Signed-off-by: Alessio Balsini
Co-developed-by: Andrea Parri
Signed-off-by: Andrea Parri
Co-developed-by: Yuri Andriaccio
Signed-off-by: Yuri Andriaccio
Signed-off-by: luca abeni
---
 kernel/sched/deadline.c | 31 +++++++++++++++++++++++++++++++
 kernel/sched/sched.h    |  5 +++++
 2 files changed, 36 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 7c039d5f3c5d..5532ca4ad969 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -332,6 +332,37 @@ void cancel_inactive_timer(struct sched_dl_entity *dl_se)
 	cancel_dl_timer(dl_se, &dl_se->inactive_timer);
 }
 
+#ifdef CONFIG_RT_GROUP_SCHED
+void dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 rt_period)
+{
+	struct rq *rq = container_of_const(dl_se->dl_rq, struct rq, dl);
+	int is_active;
+	u64 new_bw;
+
+	guard(raw_spin_rq_lock_irq)(rq);
+	is_active = dl_se->my_q->rt.rt_nr_running > 0;
+
+	update_rq_clock(rq);
+	dl_server_stop(dl_se);
+
+	new_bw = to_ratio(rt_period, rt_runtime);
+	dl_rq_change_utilization(rq, dl_se, new_bw);
+
+	dl_se->dl_runtime = rt_runtime;
+	dl_se->dl_deadline = rt_period;
+	dl_se->dl_period = rt_period;
+
+	dl_se->runtime = 0;
+	dl_se->deadline = 0;
+
+	dl_se->dl_bw = new_bw;
+	dl_se->dl_density = new_bw;
+
+	if (is_active)
+		dl_server_start(dl_se);
+}
+#endif
+
 static void dl_change_utilization(struct task_struct *p, u64 new_bw)
 {
 	WARN_ON_ONCE(p->dl.flags & SCHED_FLAG_SUGOV);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e7e263d3cddb..ca69d2132061 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -423,6 +423,7 @@ extern void dl_server_init(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq,
 		    struct rq *served_rq,
 		    dl_server_pick_f pick_task);
 extern void sched_init_dl_servers(void);
+extern void dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 rt_period);
 
 extern void fair_server_init(struct rq *rq);
 extern void ext_server_init(struct rq *rq);
@@ -2023,6 +2024,10 @@ static inline struct rq *_this_rq_lock_irq(struct rq_flags *rf) __acquires_ret
 	return rq;
 }
 
+DEFINE_LOCK_GUARD_1(raw_spin_rq_lock_irq, struct rq,
+		    raw_spin_rq_lock_irq(_T->lock),
+		    raw_spin_rq_unlock_irq(_T->lock))
+
 #ifdef CONFIG_NUMA
 
 enum numa_topology_type {
-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=a9JpD3/Ck3xlooU/7+vKeMT3jrCha1uneDByLjGhOEDQta0cb3+cNH5R9GX83kTFwh/TU+VJnN8zDjNKl76de8XPSCufO3QbsvBJ0E8PM5oQMHVzLPBcGF9qdYudoC8h4qzDUNmTfYETDfDoLUEDxmhOLPxtsl+zT+IVGCuaU0Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=SDK8uaGN; arc=none smtp.client-ip=209.85.221.47 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="SDK8uaGN" Received: by mail-wr1-f47.google.com with SMTP id ffacd0b85a97d-43d75312379so1474621f8f.1 for ; Thu, 30 Apr 2026 14:38:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1777585138; x=1778189938; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=NPu5z0uxmwyiDPZKfRv0f/fAi+NkPj42GoDnwTDz+xc=; b=SDK8uaGNy8ls9sFaP/XB8Ahw60LXTa6kn8uhQj4VvKz5GtDT2/EyRnNILt9hmFcfAE 4wcaUncm0udr7BPE0lYwtGZPKtJTK4LWQm4ZIPp0hoeycc4bkX/gKLdLoNbJRRBx6IN3 WDn05ZGq08TEo3o52E6A6JOk9TbtU9pzcWLepw87BVCNEV993rSuge416eK1V/AqLJwR 0+tfj1maPQZqkBAsyrHNm7U5/xHlngVo66YxYJR7JSTXA9k+z60EUNRC7OhynGETtp2p LVC07/AFw63cfUdCt/bBW3o26bSZ6coTHhr/Iz6ma3jnsIE02LBbVdgBIJ03yzgC3qig iQyA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777585138; x=1778189938; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=NPu5z0uxmwyiDPZKfRv0f/fAi+NkPj42GoDnwTDz+xc=; 
b=lwvSdBrAxVZZW2CGvTVIozXF0U5IV5sSv32km7zcWfbUVDe10panQI8w/6wIPjBGzn 1qfdDdZKzm39T3WK31+4ofMSFBwrwAMqgoJxDVb5wfVyJgk1oaAhPoNWExKjmudU8rqX xbatlXTIhPokDFQvgaO3hatScQessqM5gzP2pSyT8uC0M+q8aTGDkfwz7S2A/I7tHNjB lRO8eJul6NxJuCJ4LokX944lZKWXxxh66Vk35WStZ4S4xK7mgV3OasWVNsLjYbWSvIKp E7YDi8NYDfm4iucsb4UWZtYGfOtjG2GGR2lEfhGxmHCIHIkYihP0aGSHIzHY7ObaQ2fG P9LQ== X-Gm-Message-State: AOJu0Yxk+sj34IZVfeagEGC/ZSh1IUyJf2oMknrZvAlWy03JZ4Yqls20 pEFxQ6jlNlIj3RectHDW4HlyqpoGqVODZ6iPokqmPEIl7dNaruPzBS/iAfH3Qw== X-Gm-Gg: AeBDietbAM7vcnIB+Gr6dgcsq+vq83Z7ysRBZq/4Agv/thkXBgzbGaK+c3TzQQ93Y7G B/F6RuxtIk0EZbvUzN2cxzUdPTxS+mYpM9EmrYjsCwadHyWLKT2mdDBeV9F5PA+cmxbYjvzmpUY +hm/+zKPNl9PLlsZnp0VdnbUEzZpCgpsJ46OICY83gQx3VfqLPl/b3wXYJ7U9CsdNO6bU9dgI6z gLM5801nyiT5HEnjpFhvKdA//r0Dh3bDM7ypZB4iw9Fro1wPubc3JRJPKw1iLH29EILGuMPd7o7 YhntuQjq9UO+mWVQgcpmg0Uo+mKzhTPaDare6xz6zCE6C+/Jgs3Rmsq39SksCNCggUUxzJDWLW3 cp8qTyYLRBzCEzOfLmZMbC/noVFDHaEC895pFA+ODVZv61HbmHw1QpoTS9S39BAdmYn2L2M32Cy 1jGMdoCuq8j+kIvnmHRSWR2QM9KSYXtLhQyzKzEqyVaDmAT+0BVuM= X-Received: by 2002:a05:6000:1786:b0:43d:7874:5d3b with SMTP id ffacd0b85a97d-4494e8d73bcmr7678434f8f.9.1777585138154; Thu, 30 Apr 2026 14:38:58 -0700 (PDT) Received: from yuri-framework13 ([78.211.51.156]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-44a9879ef89sm418510f8f.30.2026.04.30.14.38.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Apr 2026 14:38:57 -0700 (PDT) From: Yuri Andriaccio To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider Cc: linux-kernel@vger.kernel.org, Luca Abeni , Yuri Andriaccio Subject: [RFC PATCH v5 11/29] sched/rt: Add {alloc/unregister/free}_rt_sched_group Date: Thu, 30 Apr 2026 23:38:15 +0200 Message-ID: <20260430213835.62217-12-yurand2000@gmail.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com> References: <20260430213835.62217-1-yurand2000@gmail.com> Precedence: bulk X-Mailing-List: 
linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: luca abeni Add allocation and deallocation code for rt-cgroups. Declare dl_server specific functions (only skeleton, but no implementation yet), needed by the deadline servers to be called when trying to schedule. Co-developed-by: Alessio Balsini Signed-off-by: Alessio Balsini Co-developed-by: Andrea Parri Signed-off-by: Andrea Parri Co-developed-by: Yuri Andriaccio Signed-off-by: Yuri Andriaccio Signed-off-by: luca abeni --- kernel/sched/rt.c | 151 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 149 insertions(+), 2 deletions(-) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 741fac9f57ac..3d7f2b2ebe60 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -88,24 +88,171 @@ void init_rt_rq(struct rt_rq *rt_rq) void unregister_rt_sched_group(struct task_group *tg) { + int i; + + if (!rt_group_sched_enabled()) + return; + + if (!tg->dl_se || !tg->rt_rq) + return; + for_each_possible_cpu(i) { + if (!tg->dl_se[i] || !tg->rt_rq[i]) + continue; + + if (tg->dl_se[i]->dl_runtime) + dl_init_tg(tg->dl_se[i], 0, tg->dl_se[i]->dl_period); + } } void free_rt_sched_group(struct task_group *tg) { + int i; + unsigned long flags; + if (!rt_group_sched_enabled()) return; + + if (!tg->dl_se || !tg->rt_rq) + return; + + for_each_possible_cpu(i) { + if (!tg->dl_se[i] || !tg->rt_rq[i]) + continue; + + /* + * Shutdown the dl_server and free it + * + * Since the dl timer is going to be cancelled, + * we risk to never decrease the running bw... + * Fix this issue by changing the group runtime + * to 0 immediately before freeing it. 
+ */ + if (tg->dl_se[i]->dl_runtime) + dl_init_tg(tg->dl_se[i], 0, tg->dl_se[i]->dl_period); + + raw_spin_rq_lock_irqsave(cpu_rq(i), flags); + hrtimer_cancel(&tg->dl_se[i]->dl_timer); + raw_spin_rq_unlock_irqrestore(cpu_rq(i), flags); + kfree(tg->dl_se[i]); + + /* Free the local per-cpu runqueue */ + kfree(served_rq_of_rt_rq(tg->rt_rq[i])); + } + + kfree(tg->rt_rq); + kfree(tg->dl_se); +} + +static struct task_struct *rt_server_pick(struct sched_dl_entity *dl_se, struct rq_flags *rf) +{ + return NULL; +} + +static inline void __rt_rq_free(struct rt_rq **rt_rq) +{ + int i; + + for_each_possible_cpu(i) { + kfree(served_rq_of_rt_rq(rt_rq[i])); + } + + kfree(rt_rq); +} + +DEFINE_FREE(rt_rq_free, struct rt_rq **, if (_T) __rt_rq_free(_T)) + +static inline void __dl_se_free(struct sched_dl_entity **dl_se) +{ + int i; + + for_each_possible_cpu(i) { + kfree(dl_se[i]); + } + + kfree(dl_se); +} + +DEFINE_FREE(dl_se_free, struct sched_dl_entity **, if (_T) __dl_se_free(_T)) + +static int __alloc_rt_sched_group_data(struct task_group *tg) { + /* Instantiate automatic cleanup in event of kalloc fail */ + struct rt_rq **tg_rt_rq __free(rt_rq_free) = NULL; + struct sched_dl_entity **tg_dl_se __free(dl_se_free) = NULL; + struct sched_dl_entity *dl_se __free(kfree) = NULL; + struct rq *s_rq __free(kfree) = NULL; + int i; + + tg_rt_rq = kcalloc(nr_cpu_ids, sizeof(struct rt_rq *), GFP_KERNEL); + if (!tg_rt_rq) + return 0; + + tg_dl_se = kcalloc(nr_cpu_ids, + sizeof(struct sched_dl_entity *), GFP_KERNEL); + if (!tg_dl_se) + return 0; + + for_each_possible_cpu(i) { + s_rq = kzalloc_node(sizeof(struct rq), + GFP_KERNEL, cpu_to_node(i)); + if (!s_rq) + return 0; + + dl_se = kzalloc_node(sizeof(struct sched_dl_entity), + GFP_KERNEL, cpu_to_node(i)); + if (!dl_se) + return 0; + + tg_rt_rq[i] = &no_free_ptr(s_rq)->rt; + tg_dl_se[i] = no_free_ptr(dl_se); + } + + tg->rt_rq = no_free_ptr(tg_rt_rq); + tg->dl_se = no_free_ptr(tg_dl_se); + + return 1; } int
alloc_rt_sched_group(struct task_group *tg, struct task_group *parent) { + struct sched_dl_entity *dl_se; + struct rq *s_rq; + int i; + if (!rt_group_sched_enabled()) return 1; + /* Allocate all necessary resources beforehand */ + if (!__alloc_rt_sched_group_data(tg)) + return 0; + + /* Initialize the allocated resources now. */ + init_dl_bandwidth(&tg->dl_bandwidth, 0, 0); + + for_each_possible_cpu(i) { + s_rq = served_rq_of_rt_rq(tg->rt_rq[i]); + dl_se = tg->dl_se[i]; + + init_rt_rq(&s_rq->rt); + s_rq->cpu = i; + s_rq->rt.tg = tg; + + init_dl_entity(dl_se); + dl_se->dl_runtime = tg->dl_bandwidth.dl_runtime; + dl_se->dl_deadline = tg->dl_bandwidth.dl_period; + dl_se->dl_period = tg->dl_bandwidth.dl_period; + dl_se->runtime = 0; + dl_se->deadline = 0; + dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime); + dl_se->dl_density = to_ratio(dl_se->dl_deadline, dl_se->dl_runtime); + dl_se->dl_server = 1; + dl_server_init(dl_se, &cpu_rq(i)->dl, s_rq, rt_server_pick); + } + return 1; } -#else /* !CONFIG_RT_GROUP_SCHED: */ +#else /* !CONFIG_RT_GROUP_SCHED */ void unregister_rt_sched_group(struct task_group *tg) { } @@ -115,7 +262,7 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent) { return 1; } -#endif /* !CONFIG_RT_GROUP_SCHED */ +#endif /* CONFIG_RT_GROUP_SCHED */ static inline bool need_pull_rt_task(struct rq *rq, struct task_struct *prev) { -- 2.53.0 From nobody Tue Jun 16 15:55:05 2026 Received: from mail-wr1-f44.google.com (mail-wr1-f44.google.com [209.85.221.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 896753C3BF7 for ; Thu, 30 Apr 2026 21:39:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.44 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585143; cv=none;
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777585140; x=1778189940; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=Bx1RZ3koYOKbALeOWWHhUJW2PB0mUJKZsqlzc6GuGWc=; b=GkBRH5PDdpc0+q4wUk09wfpOPqYbc+gHRFSLrc4J7wuNV/Xf/Ir32ZR02t32Fm26Qz 53qnMnYnd59e3IKSDZ8oRK3ZHRSp6X99gtSpUAl5XCP3TMOprPQA1K6T6qrstGVWH1Zz jUp4jTN1yXxmQ0irAyiWGETzecVH1YKr6MFBNLa6fcDLSLdhgomesggFjN3hIKTqmJ9G epZC4LwzZNZ0jexFi7reHnK/tSPBiwT6qWxmhF88J5qnM9pmewhmys75Ll72TC7IdbbJ Nguwq1pEHCPU4jIvIwvNxRFtJ3lGruRIsehvgXPJOZ9RCyMHmM+NVxSrMwTEaVc+LSFB 47sQ== X-Gm-Message-State: AOJu0Yzl+hlks2DgVv4jhUQSMRDvVHWL68T6ZDy0urB9uGIxXuheL3hY m6Ii5fSLLBpx2zsHQPFyiANNZytvCrhIr+rbKPUZlQ4HH3VZrNFXXteE9/4pxw== X-Gm-Gg: AeBDiesr/ku+NygXejg0R3nStCltOW1TBvbkhFvb04tzvW2V4+4+JDb12SzgDKWtij0 E5PvKxvzPZamG8Iym6EUyhqkKjAh4t89cvmi5UAIDCX6fNY1nFJAr7wJv6A4vy9+uIGXsXzJqyS 0KLU9POXQ9R6Vbho2S1N0/GwbEdsXpSBE0g41OnOKL5rl1HraIyCxtBqYQx6fwvpEn5pMyBN2rl lCcU6UAx1mlsS9kXqMalUC3G862rf8+QHYCT63//Yr+0OgkKHR3OGMuD+jBcqO0u3zm+uKeH/vF ImEc7HUMUBu3zF3rdSgVCS3IW06ug8BoI7ehOiFxqvSPTlMkt5GAxRojaEQLO1fpK9hxYzxIrTb +WkAJjdVVpFQxbhCBabaP8nfNBA8csBJQwt0WiZ6hVVyMz62tLDrIpbJPNV1vqWO0Gq3cxs6VPG SDn/lAjtRhjQTrSzQbJBrV1y7HwXS3CPqS6UI+uYJg X-Received: by 2002:a5d:5f91:0:b0:449:fb9e:4b4e with SMTP id ffacd0b85a97d-44a85d97773mr579833f8f.15.1777585139824; Thu, 30 Apr 2026 14:38:59 -0700 (PDT) Received: from yuri-framework13 ([78.211.51.156]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-44a9879ef89sm418510f8f.30.2026.04.30.14.38.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Apr 2026 14:38:59 -0700 (PDT) From: Yuri Andriaccio To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider Cc: linux-kernel@vger.kernel.org, Luca Abeni , Yuri Andriaccio Subject: [RFC PATCH v5 12/29] 
sched/deadline: Account rt-cgroups bandwidth in deadline tasks schedulability tests Date: Thu, 30 Apr 2026 23:38:16 +0200 Message-ID: <20260430213835.62217-13-yurand2000@gmail.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com> References: <20260430213835.62217-1-yurand2000@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: luca abeni Account for the rt-cgroups hierarchy's reserved bandwidth in the schedulability test of deadline entities. This makes it possible to fully reserve a portion of the rt-bandwidth for rt-cgroups even if they do not use all of it. Also account for the rt-cgroups' reserved bandwidth when changing the total bandwidth dedicated to real-time tasks. Co-developed-by: Alessio Balsini Signed-off-by: Alessio Balsini Co-developed-by: Andrea Parri Signed-off-by: Andrea Parri Co-developed-by: Yuri Andriaccio Signed-off-by: Yuri Andriaccio Signed-off-by: luca abeni --- kernel/sched/deadline.c | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 5532ca4ad969..084af1d375b5 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -202,11 +202,22 @@ void __dl_add(struct dl_bw *dl_b, u64 tsk_bw, int cpus) __dl_update(dl_b, -((s32)tsk_bw / cpus)); } +static inline u64 get_dl_groups_bw(void) +{ +#ifdef CONFIG_RT_GROUP_SCHED + return to_ratio(root_task_group.dl_bandwidth.dl_period, + root_task_group.dl_bandwidth.dl_runtime); +#else + return 0; +#endif +} + static inline bool __dl_overflow(struct dl_bw *dl_b, unsigned long cap, u64 old_bw, u64 new_bw) { return dl_b->bw != -1 && - cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw; + cap_scale(dl_b->bw, cap) < dl_b->total_bw - old_bw + new_bw + + cap_scale(get_dl_groups_bw(), cap);
} static inline @@ -3462,8 +3473,9 @@ int sched_dl_global_validate(void) u64 period = global_rt_period(); u64 new_bw = to_ratio(period, runtime); u64 cookie = ++dl_cookie; + u64 dl_groups_root = get_dl_groups_bw(); struct dl_bw *dl_b; - int cpu, cpus, ret = 0; + int cpu, cap, cpus, ret = 0; unsigned long flags; /* @@ -3478,10 +3490,12 @@ int sched_dl_global_validate(void) goto next; dl_b = dl_bw_of(cpu); + cap = dl_bw_capacity(cpu); cpus = dl_bw_cpus(cpu); raw_spin_lock_irqsave(&dl_b->lock, flags); - if (new_bw * cpus < dl_b->total_bw) + if (new_bw * cpus < dl_b->total_bw + + cap_scale(dl_groups_root, cap)) ret = -EBUSY; raw_spin_unlock_irqrestore(&dl_b->lock, flags); -- 2.53.0 From nobody Tue Jun 16 15:55:05 2026 Received: from mail-wr1-f54.google.com (mail-wr1-f54.google.com [209.85.221.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DBECE3C7DF1 for ; Thu, 30 Apr 2026 21:39:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585144; cv=none; b=BEu8jLLXvO4ol1oDKTtvam5SkupFnd+qkzfsEBOR4/njbPWpQ0Qob7LfB5n+tOwNmZ4d6SCqhgWsmZETtG9wzC/MYopTuwbey24i0VjUvIZ3AyIxNB6UukbMglOJywHAqRUjSGJPqnemLG9X0VPAkZAN1njMEmjj4GJp9VtWj8s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585144; c=relaxed/simple; bh=V6N8Z3HqxF7Vs03gx7LpjNmoBsHpjyvtqOwnt1Hq4yI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DyoRutznwz/u74nJJM9eIxmh/2YeIOssLt5m2OdsFoTjhhBQghk9GORWZOcrvnY5FqD8msxWcYVyi1NMuH8JcMhXxO/QAayYvaKHVzGH5zLIWggYlehsx2XHmkfv/kRlIL6RgWUM6RRrXtAanDTmYLCAJ8i5S/SAQzkb1iG8Mts= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com
ZEKj+SqE7WxXj36aRIkRp0I7H1lOUQ5ACqo+q9T7vsAeES4FhaRF2vKH X-Gm-Gg: AeBDievQRpopiNd06fHc7cTiSg1e7wvRHM1TrZxeyaXoSOT/i7vd7NzItGR5GArDroD aKPo9oZrx0LhobD1D+wBRGoBqXWI9eIEW5p9M0KRUv+LscU35nUbLol9ivjCQst8n/EBOHV8j8v IQ+jVMuXuqHk7BHp5yNxhKVZOvCQ4bU8XNOsm3e/J14JgX6rWh7h/zBsxZe/ym0krbd3XiX22XV qsOVvziuXepelQPqAgGDXxFCYbOoCkWxsetfV1rzYX9N64zuTxE683SzSus8ffAmT2Ep60kPgle pi8FQxEXmEU7E5yf0ylCf7aqgWrQR9roGp1+N+wuFyIqTTK4duKj3iDADWL2DP9Vp5DZLARFoBo HO6RA1Wm4dE2PQTBBltHdLix+4HwU5qbCbgVA+Vg8D9Ym5sLg5g0ANgxSkSj7N3KrMreJ88TzKV KturggxTuOm2GacvU+kTeEJrKD7OgrACwiXVYJxZrd X-Received: by 2002:a05:6000:230c:b0:43d:1c4a:37c with SMTP id ffacd0b85a97d-44a861795fbmr629668f8f.4.1777585141297; Thu, 30 Apr 2026 14:39:01 -0700 (PDT) Received: from yuri-framework13 ([78.211.51.156]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-44a9879ef89sm418510f8f.30.2026.04.30.14.39.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Apr 2026 14:39:01 -0700 (PDT) From: Yuri Andriaccio To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider Cc: linux-kernel@vger.kernel.org, Luca Abeni , Yuri Andriaccio Subject: [RFC PATCH v5 13/29] sched/rt: Implement dl-server operations for rt-cgroups Date: Thu, 30 Apr 2026 23:38:17 +0200 Message-ID: <20260430213835.62217-14-yurand2000@gmail.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com> References: <20260430213835.62217-1-yurand2000@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement rt_server_pick, the callback that deadline servers use to pick a task to schedule. rt_server_pick(): pick the next runnable rt task and tell the scheduler that it is going to be scheduled next. 
Let enqueue_task_rt start the attached deadline server when the first task is enqueued on a specific rq/server. The server is not symmetrically stopped in dequeue_task_rt, because it is stopped when server_pick_task returns NULL (see deadline.c).

Change update_curr_rt to perform a deadline server update if the updated task is served by a non-root group.

Update inc/dec_dl_tasks to account for the number of active tasks in the local runqueue of rt-cgroup servers. Their local runqueue is different from the global runqueue, so when an rt-group server is activated/deactivated, the number of served tasks must be added/removed. This uses nr_running to be compatible with future dl-server interfaces. Also account for the deadline server itself, so that it is picked for shutdown when its runqueue is empty (future patches will try to pull tasks before stopping).

Update inc/dec_rt_prio_smp to change a rq's cpupri only if the rt_rq is the global runqueue, since cgroups are scheduled via their dl-server priority.

Update inc/dec_rt_tasks to account for waking/sleeping tasks on the global runqueue when the task runs in the root cgroup or its local dl server is active. The accounting is not done while a server is throttled, as the server adds/subtracts the number of running tasks when it is enqueued/dequeued. For rt cgroups, account for the number of active tasks in the nr_running field of the local runqueue (add/sub_nr_running), as this number is used when a dl server is enqueued/dequeued.

Update set_task_rq to record the dl_rq, tracking which deadline server manages a task.

Update set_task_rq to no longer use the parent field, as it is unused by this patchset's code. Remove the unused parent field from sched_rt_entity.
Co-developed-by: Alessio Balsini Signed-off-by: Alessio Balsini Co-developed-by: Andrea Parri Signed-off-by: Andrea Parri Co-developed-by: luca abeni Signed-off-by: luca abeni Signed-off-by: Yuri Andriaccio --- include/linux/sched.h | 1 - kernel/sched/deadline.c | 8 ++++++ kernel/sched/rt.c | 60 ++++++++++++++++++++++++++++++++++++++--- kernel/sched/sched.h | 8 +++++- 4 files changed, 71 insertions(+), 6 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index eb8b57f689b5..ea2e74598b93 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -630,7 +630,6 @@ struct sched_rt_entity { struct sched_rt_entity *back; #ifdef CONFIG_RT_GROUP_SCHED - struct sched_rt_entity *parent; /* rq on which this entity is (to be) queued: */ struct rt_rq *rt_rq; /* rq "owned" by this entity/group: */ diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 084af1d375b5..c82810732106 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2093,6 +2093,10 @@ void inc_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq) if (!dl_server(dl_se)) add_nr_running(rq_of_dl_rq(dl_rq), 1); + else if (rq_of_dl_se(dl_se) != dl_se->my_q) { + WARN_ON(dl_se->my_q->rt.rt_nr_running != dl_se->my_q->nr_running); + add_nr_running(rq_of_dl_rq(dl_rq), dl_se->my_q->nr_running + 1); + } inc_dl_deadline(dl_rq, deadline); } @@ -2105,6 +2109,10 @@ void dec_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq) if (!dl_server(dl_se)) sub_nr_running(rq_of_dl_rq(dl_rq), 1); + else if (rq_of_dl_se(dl_se) != dl_se->my_q) { + WARN_ON(dl_se->my_q->rt.rt_nr_running != dl_se->my_q->nr_running); + sub_nr_running(rq_of_dl_rq(dl_rq), dl_se->my_q->nr_running - 1); + } dec_dl_deadline(dl_rq, dl_se->deadline); } diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 3d7f2b2ebe60..defb812b0e48 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -144,9 +144,22 @@ void free_rt_sched_group(struct task_group *tg) kfree(tg->dl_se); } +static
struct sched_rt_entity *pick_next_rt_entity(struct rt_rq *rt_rq); +static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool first); + static struct task_struct *rt_server_pick(struct sched_dl_entity *dl_se, struct rq_flags *rf) { - return NULL; + struct rt_rq *rt_rq = &dl_se->my_q->rt; + struct rq *rq = rq_of_rt_rq(rt_rq); + struct task_struct *p; + + if (!sched_rt_runnable(dl_se->my_q)) + return NULL; + + p = rt_task_of(pick_next_rt_entity(rt_rq)); + set_next_task_rt(rq, p, true); + + return p; } static inline void __rt_rq_free(struct rt_rq **rt_rq) @@ -462,6 +475,7 @@ static inline int rt_se_prio(struct sched_rt_entity *rt_se) static void update_curr_rt(struct rq *rq) { struct task_struct *donor = rq->donor; + struct rt_rq *rt_rq; s64 delta_exec; if (donor->sched_class != &rt_sched_class) @@ -471,8 +485,18 @@ static void update_curr_rt(struct rq *rq) if (unlikely(delta_exec <= 0)) return; - if (!rt_bandwidth_enabled()) + if (!rt_group_sched_enabled()) return; + + if (!dl_bandwidth_enabled()) + return; + + rt_rq = rt_rq_of_se(&donor->rt); + if (is_dl_group(rt_rq)) { + struct sched_dl_entity *dl_se = dl_group_of(rt_rq); + + dl_server_update(dl_se, delta_exec); + } } static void @@ -483,7 +507,7 @@ inc_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio) /* * Change rq's cpupri only if rt_rq is the top queue. */ - if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && &rq->rt != rt_rq) + if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq)) return; if (rq->online && prio < prev_prio) @@ -498,7 +522,7 @@ dec_rt_prio_smp(struct rt_rq *rt_rq, int prio, int prev_prio) /* * Change rq's cpupri only if rt_rq is the top queue.
*/ - if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && &rq->rt != rt_rq) + if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq)) return; if (rq->online && rt_rq->highest_prio.curr != prev_prio) @@ -561,6 +585,16 @@ void inc_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq) rt_rq->rr_nr_running += is_rr_task(rt_se); inc_rt_prio(rt_rq, rt_se_prio(rt_se)); + + if (rt_group_sched_enabled() && is_dl_group(rt_rq)) { + struct sched_dl_entity *dl_se = dl_group_of(rt_rq); + + if (!dl_se->dl_throttled) + add_nr_running(rq_of_rt_rq(rt_rq), 1); + add_nr_running(served_rq_of_rt_rq(rt_rq), 1); + } else { + add_nr_running(rq_of_rt_rq(rt_rq), 1); + } } static inline @@ -571,6 +605,16 @@ void dec_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq) rt_rq->rr_nr_running -= is_rr_task(rt_se); dec_rt_prio(rt_rq, rt_se_prio(rt_se)); + + if (rt_group_sched_enabled() && is_dl_group(rt_rq)) { + struct sched_dl_entity *dl_se = dl_group_of(rt_rq); + + if (!dl_se->dl_throttled) + sub_nr_running(rq_of_rt_rq(rt_rq), 1); + sub_nr_running(served_rq_of_rt_rq(rt_rq), 1); + } else { + sub_nr_running(rq_of_rt_rq(rt_rq), 1); + } } /* @@ -752,6 +796,14 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags) check_schedstat_required(); update_stats_wait_start_rt(rt_rq_of_se(rt_se), rt_se); + /* Task arriving in an idle group of tasks.
*/ + if (rt_group_sched_enabled() && + is_dl_group(rt_rq) && rt_rq->rt_nr_running == 0) { + struct sched_dl_entity *dl_se = dl_group_of(rt_rq); + + dl_server_start(dl_se); + } + enqueue_rt_entity(rt_se, flags); if (task_is_blocked(p)) diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index ca69d2132061..d949babfe16a 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2292,7 +2292,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu) if (!rt_group_sched_enabled()) tg = &root_task_group; p->rt.rt_rq = tg->rt_rq[cpu]; - p->rt.parent = tg->rt_se[cpu]; + p->dl.dl_rq = &cpu_rq(cpu)->dl; #endif /* CONFIG_RT_GROUP_SCHED */ } @@ -2954,6 +2954,9 @@ static inline void add_nr_running(struct rq *rq, unsigned count) unsigned prev_nr = rq->nr_running; rq->nr_running = prev_nr + count; + if (rq != cpu_rq(rq->cpu)) + return; + if (trace_sched_update_nr_running_tp_enabled()) { call_trace_sched_update_nr_running(rq, count); } @@ -2967,6 +2970,9 @@ static inline void add_nr_running(struct rq *rq, unsigned count) static inline void sub_nr_running(struct rq *rq, unsigned count) { rq->nr_running -= count; + if (rq != cpu_rq(rq->cpu)) + return; + if (trace_sched_update_nr_running_tp_enabled()) { call_trace_sched_update_nr_running(rq, -count); } -- 2.53.0 From nobody Tue Jun 16 15:55:05 2026 Received: from mail-wm1-f54.google.com (mail-wm1-f54.google.com [209.85.128.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9664C3CA49D for ; Thu, 30 Apr 2026 21:39:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585146; cv=none;
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777585143; x=1778189943; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=/pmc3SPqNOuM897JTx8gK8hiE6u8sztch7kdFDO7fKQ=; b=Lt+9gjc4IFr1+/VlMn6or7apWAYkS0einSGFpqWvltCSyHhNCxoMa2AFWQGA/KFnIg lRbiSI3xXHGAnCR4QW6QK3HfX2WHI5Lpq4qrwi1xIM5NRdgm44y7yO0a7PJG7H6cZTHS 1mT3LpcUq/43KzxVxic0H85PC5UQ/sLtPTKT6/blXxRtuABjIy52MEr5lHZOjzbal9x0 tgS1H14HVrOFsgpnHwzoNQ0Hl+kEJY0Q7b1m+tsOiI26Y9SZ+hWigPo1ORLLSZYAfL36 Sw1eCn5k/apHwe3elGdC3HuBkRXPIcrakylXceW1Ean42/6zEmNxq26DK+6ja0TnKTkw CkmQ== X-Gm-Message-State: AOJu0YwztINhnKY94f1v1t2Y7F6+gyabKXDYQLwE5ggfRujXk9Npzqzx WIheXvElnyN3iTaAbHRni4CCd7D5YWeDkBeDgXJ1RKzFUYw+vCkyaCFo X-Gm-Gg: AeBDiesJ7L7vzJHAnLtS46Hmdu06BeYgkqAcTacBHmvqoD+jFgKsc1tPcyspeusmN1z 3rsXPxOuz7+fCH1NwsQoShTxrjd1cbCATijYOAWNIF/jdMVakrgpnIVzCyGI7V5q6/PLkc9mK9a 6n1v831V1qPfi0QHO/pjlkBm2UW7WeLPAag/ZtUoAUZHadF05nH1a9Ak8Rbsor3/V2XbIQat5pC i8kdPssIuNb0zOsLQ89sNA1yckIe62YM2Fe9sbv9J3ZLppUWI3tUrLiodh2uNDr7c4mg2t6Edbw BJb7/3lLb0UMhkMIVKw/DE2SiONvCFbT6P1446AsTjA4PEyxd2jN75E09njRql6zAaIw5GW9/Vj s5fUycHRJuAj7Aq63bsVGQIX0BVXzPKAOiLai5SLDqMY0bVURjQoBFEo/asW7Z8bt6hmjbGt7yZ HxsqcXZSWg8tBe3uerUPWa/3LYyoZGNSFDltCEHazm X-Received: by 2002:a05:600c:5254:b0:471:700:f281 with SMTP id 5b1f17b1804b1-48a84460b1amr78128015e9.25.1777585142883; Thu, 30 Apr 2026 14:39:02 -0700 (PDT) Received: from yuri-framework13 ([78.211.51.156]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-44a9879ef89sm418510f8f.30.2026.04.30.14.39.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Apr 2026 14:39:02 -0700 (PDT) From: Yuri Andriaccio To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider Cc: linux-kernel@vger.kernel.org, Luca Abeni , Yuri Andriaccio Subject: [RFC PATCH v5 14/29] sched/rt: 
Update task event callbacks for HCBS scheduling Date: Thu, 30 Apr 2026 23:38:18 +0200 Message-ID: <20260430213835.62217-15-yurand2000@gmail.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com> References: <20260430213835.62217-1-yurand2000@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

Update wakeup_preempt_rt, switched_{from/to}_rt and prio_changed_rt with rt-cgroup-specific preemption rules:
- In wakeup_preempt_rt(), whenever a task wakes up, check whether it is served by a deadline server or lives on the global runqueue. The preemption rules (as documented in the function) depend on the runqueues of the current task and of the woken task:
  - If both tasks are FIFO/RR tasks on the global runqueue, or in the same cgroup, run as normal.
  - If woken is inside a cgroup but donor is a FIFO/RR task on the global runqueue, always preempt. If donor is a DEADLINE task, check whether the woken task's dl server preempts donor.
  - If both tasks are FIFO/RR tasks in served but different groups, check whether the woken server preempts the donor server.
- In switched_from_rt(), perform a pull only on the global runqueue, and do nothing if the task is inside a group. This will change when migrations are added.
- In switched_to_rt(), queue a push only on the global runqueue, and perform a priority check when the switching task is inside a group. This will also change when migrations are added.
- In prio_changed_rt(), queue a pull only on the global runqueue, if the task is not queued. If the task is queued, run preemption checks only if both the task whose priority changed and curr are in the same cgroup.

Update sched_rt_can_attach() to check whether a task can be attached to a given cgroup. For now, the check only verifies that the group has non-zero bandwidth.
Remove the tsk argument from sched_rt_can_attach, as it is unused.

Change cpu_cgroup_can_attach() to check if the attachee is a FIFO/RR task
before attaching it to a cgroup.

Update __sched_setscheduler() to perform checks when trying to switch to
FIFO/RR for a task inside a cgroup, as the group needs to have runtime
allocated.

Update task_is_throttled_rt() for SCHED_CORE, returning the is_throttled
value of the server if present, while global rt-tasks are never throttled.

Co-developed-by: Alessio Balsini
Signed-off-by: Alessio Balsini
Co-developed-by: Andrea Parri
Signed-off-by: Andrea Parri
Co-developed-by: luca abeni
Signed-off-by: luca abeni
Signed-off-by: Yuri Andriaccio
---
 kernel/sched/core.c     |   2 +-
 kernel/sched/rt.c       | 106 +++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h    |   2 +-
 kernel/sched/syscalls.c |  12 +++++
 4 files changed, 109 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4e58b4f165ed..98a53b60e21f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9270,7 +9270,7 @@ static int cpu_cgroup_can_attach(struct cgroup_taskset *tset)
 		goto scx_check;
 
 	cgroup_taskset_for_each(task, css, tset) {
-		if (!sched_rt_can_attach(css_tg(css), task))
+		if (rt_task(task) && !sched_rt_can_attach(css_tg(css)))
 			return -EINVAL;
 	}
 scx_check:
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index defb812b0e48..67fbf4bbe461 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -975,7 +975,58 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct task_struct *donor = rq->donor;
+	struct sched_dl_entity *woken_dl_se = NULL;
+	struct sched_dl_entity *donor_dl_se = NULL;
+
+	if (!rt_group_sched_enabled())
+		goto no_group_sched;
 
+	/*
+	 * Preemption checks are different if the waking task and the current task
+	 * are running on the global runqueue or in a cgroup. The following rules
+	 * apply:
+	 *   - dl-tasks (and equally dl_servers) always preempt FIFO/RR tasks.
+	 *     - if curr is a FIFO/RR task inside a cgroup (i.e. run by a
+	 *       dl_server), or curr is a DEADLINE task and waking is a FIFO/RR task
+	 *       on the root cgroup, do nothing.
+	 *     - if waking is inside a cgroup but curr is a FIFO/RR task in the root
+	 *       cgroup, always reschedule.
+	 *   - if they are both on the global runqueue, run the standard code.
+	 *   - if they are both in the same cgroup, check for tasks priorities.
+	 *   - if they are both in a cgroup, but not the same one, check whether the
+	 *     woken task's dl_server preempts the current's dl_server.
+	 *   - if curr is a DEADLINE task and waking is in a cgroup, check whether
+	 *     the woken task's server preempts curr.
+	 */
+	if (is_dl_group(rt_rq_of_se(&p->rt)))
+		woken_dl_se = dl_group_of(rt_rq_of_se(&p->rt));
+	if (is_dl_group(rt_rq_of_se(&donor->rt)))
+		donor_dl_se = dl_group_of(rt_rq_of_se(&donor->rt));
+	else if (task_has_dl_policy(donor))
+		donor_dl_se = &donor->dl;
+
+	if (woken_dl_se != NULL && donor_dl_se != NULL) {
+		if (woken_dl_se == donor_dl_se) {
+			if (p->prio < donor->prio)
+				resched_curr(rq);
+
+			return;
+		}
+
+		if (dl_entity_preempt(woken_dl_se, donor_dl_se))
+			resched_curr(rq);
+
+		return;
+
+	} else if (woken_dl_se != NULL) {
+		resched_curr(rq);
+		return;
+
+	} else if (donor_dl_se != NULL) {
+		return;
+	}
+
+no_group_sched:
 	/*
 	 * XXX If we're preempted by DL, queue a push?
 	 */
@@ -1026,7 +1077,8 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
 	if (rq->donor->sched_class != &rt_sched_class)
 		update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0);
 
-	rt_queue_push_tasks(rt_rq);
+	if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+		rt_queue_push_tasks(rt_rq);
 }
 
 static struct sched_rt_entity *pick_next_rt_entity(struct rt_rq *rt_rq)
@@ -1736,6 +1788,8 @@ static void rq_offline_rt(struct rq *rq)
  */
 static void switched_from_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	/*
 	 * If there are other RT tasks then we will reschedule
 	 * and the scheduling of the other RT tasks will handle
@@ -1743,10 +1797,11 @@ static void switched_from_rt(struct rq *rq, struct task_struct *p)
 	 * we may need to handle the pulling of RT tasks
 	 * now.
 	 */
-	if (!task_on_rq_queued(p) || rq->rt.rt_nr_running)
+	if (!task_on_rq_queued(p) || rt_rq->rt_nr_running)
 		return;
 
-	rt_queue_pull_task(rt_rq_of_se(&p->rt));
+	if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+		rt_queue_pull_task(rt_rq);
 }
 
 void __init init_sched_rt_class(void)
@@ -1766,6 +1821,8 @@ void __init init_sched_rt_class(void)
  */
 static void switched_to_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	/*
 	 * If we are running, update the avg_rt tracking, as the running time
 	 * will now on be accounted into the latter.
@@ -1781,8 +1838,14 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 	 * then see if we can move to another run queue.
 	 */
 	if (task_on_rq_queued(p)) {
-		if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
-			rt_queue_push_tasks(rt_rq_of_se(&p->rt));
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq)) {
+			if (p->prio < rq->donor->prio)
+				resched_curr(rq);
+		} else {
+			if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
+				rt_queue_push_tasks(rt_rq_of_se(&p->rt));
+		}
+
 		if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq)))
 			resched_curr(rq);
 	}
@@ -1795,6 +1858,8 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 static void
 prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
+
 	if (!task_on_rq_queued(p))
 		return;
 
@@ -1807,15 +1872,25 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio)
 		 * may need to pull tasks to this runqueue.
 		 */
 		if (oldprio < p->prio)
-			rt_queue_pull_task(rt_rq_of_se(&p->rt));
+			if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+				rt_queue_pull_task(rt_rq);
 
 		/*
 		 * If there's a higher priority task waiting to run
 		 * then reschedule.
 		 */
-		if (p->prio > rq->rt.highest_prio.curr)
+		if (p->prio > rt_rq->highest_prio.curr)
 			resched_curr(rq);
 	} else {
+		/*
+		 * This task is not running, thus we check against the currently
+		 * running task for preemption. We can preempt only if both tasks are
+		 * in the same cgroup or on the global runqueue.
+		 */
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) &&
+		    rt_rq_of_se(&p->rt)->tg != rt_rq_of_se(&rq->curr->rt)->tg)
+			return;
+
 		/*
 		 * This task is not running, but if it is
 		 * greater than the current running task
@@ -1908,7 +1983,16 @@ static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task)
 #ifdef CONFIG_SCHED_CORE
 static int task_is_throttled_rt(struct task_struct *p, int cpu)
 {
+#ifdef CONFIG_RT_GROUP_SCHED
+	struct rt_rq *rt_rq;
+
+	rt_rq = task_group(p)->rt_rq[cpu];
+	WARN_ON(!rt_group_sched_enabled() && rt_rq->tg != &root_task_group);
+
+	return dl_group_of(rt_rq)->dl_throttled;
+#else
 	return 0;
+#endif
 }
 #endif /* CONFIG_SCHED_CORE */
 
@@ -2159,16 +2243,16 @@ static int sched_rt_global_constraints(void)
 }
 #endif /* CONFIG_SYSCTL */
 
-int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
+int sched_rt_can_attach(struct task_group *tg)
 {
 	/* Don't accept real-time tasks when there is no way for them to run */
-	if (rt_group_sched_enabled() && rt_task(tsk) && tg->rt_bandwidth.rt_runtime == 0)
+	if (rt_group_sched_enabled() && tg->dl_bandwidth.dl_runtime == 0)
 		return 0;
 
 	return 1;
 }
 
-#else /* !CONFIG_RT_GROUP_SCHED: */
+#else /* !CONFIG_RT_GROUP_SCHED */
 
 #ifdef CONFIG_SYSCTL
 static int sched_rt_global_constraints(void)
@@ -2176,7 +2260,7 @@ static int sched_rt_global_constraints(void)
 	return 0;
 }
 #endif /* CONFIG_SYSCTL */
-#endif /* !CONFIG_RT_GROUP_SCHED */
+#endif /* CONFIG_RT_GROUP_SCHED */
 
 #ifdef CONFIG_SYSCTL
 static int sched_rt_global_validate(void)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d949babfe16a..fceb02a04858 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -609,7 +609,7 @@ extern int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us)
 extern int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us);
 extern long sched_group_rt_runtime(struct task_group *tg);
 extern long sched_group_rt_period(struct task_group *tg);
-extern int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk);
+extern int sched_rt_can_attach(struct task_group *tg);
 
 extern struct task_group *sched_create_group(struct task_group *parent);
 extern void sched_online_group(struct task_group *tg,
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 806bc88d21ee..15653840c812 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -606,6 +606,18 @@ int __sched_setscheduler(struct task_struct *p,
 change:
 
 	if (user) {
+		/*
+		 * Do not allow real-time tasks into groups that have no runtime
+		 * assigned.
+		 */
+		if (rt_group_sched_enabled() &&
+		    dl_bandwidth_enabled() && rt_policy(policy) &&
+		    !sched_rt_can_attach(task_group(p)) &&
+		    !task_group_is_autogroup(task_group(p))) {
+			retval = -EPERM;
+			goto unlock;
+		}
+
 		if (dl_bandwidth_enabled() && dl_policy(policy) &&
 		    !(attr->sched_flags & SCHED_FLAG_SUGOV)) {
 			cpumask_t *span = rq->rd->span;
-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 15/29] sched/rt: Update rt-cgroup schedulability checks
Date: Thu, 30 Apr 2026 23:38:19 +0200
Message-ID: <20260430213835.62217-16-yurand2000@gmail.com>
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>

From: luca abeni

Update sched_group_rt_runtime/period and sched_group_set_rt_runtime/period
to use the newly defined data structures and perform the checks needed to
update both the runtime and the period of a given group.

The 'set' functions call tg_set_rt_bandwidth(), which is also updated:
- Use the newly added HCBS dl_bandwidth structure instead of rt_bandwidth.
- Update __rt_schedulable() to check for numerical issues:
  - Reuse __checkparam_dl().
  - Add an allow_zero_runtime param to __checkparam_dl(), as cgroups may
    zero their runtime, while DEADLINE tasks are not allowed to do so.
  - Use the RCU lock guard instead of rcu_read_lock/unlock.
- Update tg_rt_schedulable(), used when walking the cgroup tree to check if
  all invariants are met:
  - Update most of the instructions to obtain data from the newly added
    data structures (dl_bandwidth).
  - If the task group is the root group, run a total bandwidth check with
    the newly added dl_check_tg() function.
- After all checks are successful, if the changed group is not the root
  cgroup, propagate the assigned runtime and period to all the local
  deadline servers.
- Additionally use mutex guards instead of manually locking/unlocking.

Add dl_check_tg(), which performs an admission control test similar to
__dl_overflow, but this time we are updating the cgroup's total bandwidth
rather than scheduling a new DEADLINE task or updating a non-cgroup
deadline server.

Add the rcu_sched lock guard for rcu_read_lock/unlock_sched.

Finally, prevent creation of a cgroup hierarchy with depth greater than
two, as this will be addressed in a future patch. A depth-two hierarchy is
sufficient for testing the patchset for now.
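
As a reading aid, the admission test that dl_check_tg() applies to each
root domain can be modelled in user space. This is a minimal sketch, not
the patch's code: struct model_dl_bw and the model_*() helpers are
hypothetical names, locking, RCU and the dl_bw_visited() cookie walk are
left out, and the BW_SHIFT (20) and SCHED_CAPACITY_SHIFT (10) values are
assumed to match the kernel's fixed-point conventions.

```c
#include <stdint.h>

#define BW_SHIFT		20	/* assumed fixed-point shift for bandwidth ratios */
#define SCHED_CAPACITY_SHIFT	10	/* assumed: a full-capacity CPU counts as 1 << 10 */

/* Hypothetical stand-in for the per-root-domain struct dl_bw. */
struct model_dl_bw {
	int64_t  bw;		/* per-CPU bandwidth limit, -1 means "no limit" */
	uint64_t total_bw;	/* bandwidth that has already been admitted */
};

/* runtime/period as a fixed-point ratio (the RUNTIME_INF case is omitted). */
static uint64_t model_to_ratio(uint64_t period, uint64_t runtime)
{
	if (period == 0)
		return 0;
	return (runtime << BW_SHIFT) / period;
}

/* Scale a bandwidth value by the summed CPU capacity of a root domain. */
static uint64_t model_cap_scale(uint64_t v, uint64_t cap)
{
	return (v * cap) >> SCHED_CAPACITY_SHIFT;
}

/*
 * Same comparison as in dl_check_tg(): returns 1 if 'total' fits next to
 * the bandwidth that is already admitted, 0 if it would overflow the limit.
 */
static int model_dl_check_tg(const struct model_dl_bw *dl_b, uint64_t cap,
			     uint64_t total)
{
	if (dl_b->bw != -1 &&
	    model_cap_scale((uint64_t)dl_b->bw, cap) <
	    dl_b->total_bw + model_cap_scale(total, cap))
		return 0;

	return 1;
}
```

For example, with a limit of 95% per CPU, four full-capacity CPUs
(cap = 4096) and one 50% DEADLINE reservation already admitted, a cgroup
total of 50% passes the test while a total of 90% is rejected.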

Co-developed-by: Alessio Balsini
Signed-off-by: Alessio Balsini
Co-developed-by: Andrea Parri
Signed-off-by: Andrea Parri
Co-developed-by: Yuri Andriaccio
Signed-off-by: Yuri Andriaccio
Signed-off-by: luca abeni
---
 include/linux/rcupdate.h |  1 +
 kernel/sched/core.c      |  6 ++++
 kernel/sched/deadline.c  | 43 +++++++++++++++++-----
 kernel/sched/rt.c        | 77 +++++++++++++++++++---------------------
 kernel/sched/sched.h     |  3 +-
 kernel/sched/syscalls.c  |  2 +-
 6 files changed, 82 insertions(+), 50 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 04f3f86a4145..032cfa763047 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -1191,6 +1191,7 @@ extern int rcu_expedited;
 extern int rcu_normal;
 
 DEFINE_LOCK_GUARD_0(rcu, rcu_read_lock(), rcu_read_unlock())
+DEFINE_LOCK_GUARD_0(rcu_sched, rcu_read_lock_sched(), rcu_read_unlock_sched())
 DECLARE_LOCK_GUARD_0_ATTRS(rcu, __acquires_shared(RCU), __releases_shared(RCU))
 
 #endif /* __LINUX_RCUPDATE_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 98a53b60e21f..0c7032d254ba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9205,6 +9205,12 @@ cpu_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		return &root_task_group.css;
 	}
 
+	/* Do not allow cpu_cgroup hierarchies with depth greater than 2. */
+#ifdef CONFIG_RT_GROUP_SCHED
+	if (parent != &root_task_group)
+		return ERR_PTR(-EINVAL);
+#endif
+
 	tg = sched_create_group(parent);
 	if (IS_ERR(tg))
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index c82810732106..74bff7fb7b92 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -343,7 +343,39 @@ void cancel_inactive_timer(struct sched_dl_entity *dl_se)
 	cancel_dl_timer(dl_se, &dl_se->inactive_timer);
 }
 
+/*
+ * Used for dl_bw check and update, used under sched_rt_handler()::mutex and
+ * sched_domains_mutex.
+ */
+u64 dl_cookie;
+
 #ifdef CONFIG_RT_GROUP_SCHED
+int dl_check_tg(unsigned long total)
+{
+	int which_cpu;
+	int cap;
+	struct dl_bw *dl_b;
+	u64 gen = ++dl_cookie;
+
+	for_each_possible_cpu(which_cpu) {
+		guard(rcu_sched)();
+
+		if (!dl_bw_visited(which_cpu, gen)) {
+			cap = dl_bw_capacity(which_cpu);
+			dl_b = dl_bw_of(which_cpu);
+
+			guard(raw_spinlock_irqsave)(&dl_b->lock);
+
+			if (dl_b->bw != -1 &&
+			    cap_scale(dl_b->bw, cap) < dl_b->total_bw + cap_scale(total, cap))
+				return 0;
+		}
+
+	}
+
+	return 1;
+}
+
 void dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 rt_period)
 {
 	struct rq *rq = container_of_const(dl_se->dl_rq, struct rq, dl);
@@ -3469,12 +3501,6 @@ DEFINE_SCHED_CLASS(dl) = {
 #endif
 };
 
-/*
- * Used for dl_bw check and update, used under sched_rt_handler()::mutex and
- * sched_domains_mutex.
- */
-u64 dl_cookie;
-
 int sched_dl_global_validate(void)
 {
 	u64 runtime = global_rt_runtime();
@@ -3670,7 +3696,7 @@ void __getparam_dl(struct task_struct *p, struct sched_attr *attr)
  * below 2^63 ns (we have to check both sched_deadline and
  * sched_period, as the latter can be zero).
  */
-bool __checkparam_dl(const struct sched_attr *attr)
+bool __checkparam_dl(const struct sched_attr *attr, bool allow_zero_runtime)
 {
 	u64 period, max, min;
 
@@ -3686,7 +3712,8 @@ bool __checkparam_dl(const struct sched_attr *attr)
 	 * Since we truncate DL_SCALE bits, make sure we're at least
 	 * that big.
 	 */
-	if (attr->sched_runtime < (1ULL << DL_SCALE))
+	if ((!allow_zero_runtime || attr->sched_runtime != 0) &&
+	    attr->sched_runtime < (1ULL << DL_SCALE))
 		return false;
 
 	/*
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 67fbf4bbe461..c994447f5b1c 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2035,11 +2035,6 @@ DEFINE_SCHED_CLASS(rt) = {
 };
 
 #ifdef CONFIG_RT_GROUP_SCHED
-/*
- * Ensure that the real time constraints are schedulable.
- */
-static DEFINE_MUTEX(rt_constraints_mutex);
-
 static inline int tg_has_rt_tasks(struct task_group *tg)
 {
 	struct task_struct *task;
@@ -2073,8 +2068,8 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
 	unsigned long total, sum = 0;
 	u64 period, runtime;
 
-	period = ktime_to_ns(tg->rt_bandwidth.rt_period);
-	runtime = tg->rt_bandwidth.rt_runtime;
+	period = tg->dl_bandwidth.dl_period;
+	runtime = tg->dl_bandwidth.dl_runtime;
 
 	if (tg == d->tg) {
 		period = d->rt_period;
@@ -2090,8 +2085,7 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
 	/*
 	 * Ensure we don't starve existing RT tasks if runtime turns zero.
 	 */
-	if (rt_bandwidth_enabled() && !runtime &&
-	    tg->rt_bandwidth.rt_runtime && tg_has_rt_tasks(tg))
+	if (dl_bandwidth_enabled() && !runtime && tg_has_rt_tasks(tg))
 		return -EBUSY;
 
 	if (WARN_ON(!rt_group_sched_enabled() && tg != &root_task_group))
@@ -2105,12 +2099,17 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
 	if (total > to_ratio(global_rt_period(), global_rt_runtime()))
 		return -EINVAL;
 
+	if (tg == &root_task_group) {
+		if (!dl_check_tg(total))
+			return -EBUSY;
+	}
+
 	/*
 	 * The sum of our children's runtime should not exceed our own.
 	 */
 	list_for_each_entry_rcu(child, &tg->children, siblings) {
-		period = ktime_to_ns(child->rt_bandwidth.rt_period);
-		runtime = child->rt_bandwidth.rt_runtime;
+		period = child->dl_bandwidth.dl_period;
+		runtime = child->dl_bandwidth.dl_runtime;
 
 		if (child == d->tg) {
 			period = d->rt_period;
@@ -2128,24 +2127,30 @@ static int tg_rt_schedulable(struct task_group *tg, void *data)
 
 static int __rt_schedulable(struct task_group *tg, u64 period, u64 runtime)
 {
-	int ret;
-
 	struct rt_schedulable_data data = {
 		.tg = tg,
 		.rt_period = period,
 		.rt_runtime = runtime,
 	};
 
-	rcu_read_lock();
-	ret = walk_tg_tree(tg_rt_schedulable, tg_nop, &data);
-	rcu_read_unlock();
+	struct sched_attr attr = {
+		.sched_flags = 0,
+		.sched_runtime = runtime,
+		.sched_deadline = period,
+		.sched_period = period,
+	};
 
-	return ret;
+	if (!__checkparam_dl(&attr, true))
+		return -EINVAL;
+
+	guard(rcu)();
+	return walk_tg_tree(tg_rt_schedulable, tg_nop, &data);
 }
 
 static int tg_set_rt_bandwidth(struct task_group *tg,
 		u64 rt_period, u64 rt_runtime)
 {
+	static DEFINE_MUTEX(rt_constraints_mutex);
 	int i, err = 0;
 
 	/*
@@ -2155,44 +2160,36 @@ static int tg_set_rt_bandwidth(struct task_group *tg,
 	if (tg == &root_task_group && rt_runtime == 0)
 		return -EINVAL;
 
-	/* No period doesn't make any sense. */
-	if (rt_period == 0)
-		return -EINVAL;
-
 	/*
 	 * Bound quota to defend quota against overflow during bandwidth shift.
 	 */
 	if (rt_runtime != RUNTIME_INF && rt_runtime > max_rt_runtime)
 		return -EINVAL;
 
-	mutex_lock(&rt_constraints_mutex);
+	guard(mutex)(&rt_constraints_mutex);
 	err = __rt_schedulable(tg, rt_period, rt_runtime);
 	if (err)
-		goto unlock;
+		return err;
 
-	raw_spin_lock_irq(&tg->rt_bandwidth.rt_runtime_lock);
-	tg->rt_bandwidth.rt_period = ns_to_ktime(rt_period);
-	tg->rt_bandwidth.rt_runtime = rt_runtime;
+	guard(raw_spinlock_irq)(&tg->dl_bandwidth.dl_runtime_lock);
+	tg->dl_bandwidth.dl_period = rt_period;
+	tg->dl_bandwidth.dl_runtime = rt_runtime;
 
-	for_each_possible_cpu(i) {
-		struct rt_rq *rt_rq = tg->rt_rq[i];
+	if (tg == &root_task_group)
+		return 0;
 
-		raw_spin_lock(&rt_rq->rt_runtime_lock);
-		rt_rq->rt_runtime = rt_runtime;
-		raw_spin_unlock(&rt_rq->rt_runtime_lock);
+	for_each_possible_cpu(i) {
+		dl_init_tg(tg->dl_se[i], rt_runtime, rt_period);
 	}
-	raw_spin_unlock_irq(&tg->rt_bandwidth.rt_runtime_lock);
-unlock:
-	mutex_unlock(&rt_constraints_mutex);
 
-	return err;
+	return 0;
 }
 
 int sched_group_set_rt_runtime(struct task_group *tg, long rt_runtime_us)
 {
 	u64 rt_runtime, rt_period;
 
-	rt_period = ktime_to_ns(tg->rt_bandwidth.rt_period);
+	rt_period = tg->dl_bandwidth.dl_period;
 	rt_runtime = (u64)rt_runtime_us * NSEC_PER_USEC;
 	if (rt_runtime_us < 0)
 		rt_runtime = RUNTIME_INF;
@@ -2206,10 +2203,10 @@ long sched_group_rt_runtime(struct task_group *tg)
 {
 	u64 rt_runtime_us;
 
-	if (tg->rt_bandwidth.rt_runtime == RUNTIME_INF)
+	if (tg->dl_bandwidth.dl_runtime == RUNTIME_INF)
 		return -1;
 
-	rt_runtime_us = tg->rt_bandwidth.rt_runtime;
+	rt_runtime_us = tg->dl_bandwidth.dl_runtime;
 	do_div(rt_runtime_us, NSEC_PER_USEC);
 	return rt_runtime_us;
 }
@@ -2222,7 +2219,7 @@ int sched_group_set_rt_period(struct task_group *tg, u64 rt_period_us)
 		return -EINVAL;
 
 	rt_period = rt_period_us * NSEC_PER_USEC;
-	rt_runtime = tg->rt_bandwidth.rt_runtime;
+	rt_runtime = tg->dl_bandwidth.dl_runtime;
 
 	return tg_set_rt_bandwidth(tg, rt_period, rt_runtime);
 }
@@ -2231,7 +2228,7 @@ long sched_group_rt_period(struct task_group *tg)
 {
 	u64 rt_period_us;
 
-	rt_period_us = ktime_to_ns(tg->rt_bandwidth.rt_period);
+	rt_period_us = tg->dl_bandwidth.dl_period;
 	do_div(rt_period_us, NSEC_PER_USEC);
 	return rt_period_us;
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fceb02a04858..78f080275bf0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -364,7 +364,7 @@ extern void sched_dl_do_global(void);
 extern int sched_dl_overflow(struct task_struct *p, int policy, const struct sched_attr *attr);
 extern void __setparam_dl(struct task_struct *p, const struct sched_attr *attr);
 extern void __getparam_dl(struct task_struct *p, struct sched_attr *attr);
-extern bool __checkparam_dl(const struct sched_attr *attr);
+extern bool __checkparam_dl(const struct sched_attr *attr, bool allow_zero_runtime);
 extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr);
 extern int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
 extern int dl_bw_deactivate(int cpu);
@@ -423,6 +423,7 @@ extern void dl_server_init(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq,
 		    struct rq *served_rq,
 		    dl_server_pick_f pick_task);
 extern void sched_init_dl_servers(void);
+extern int dl_check_tg(unsigned long total);
 extern void dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 rt_period);
 
 extern void fair_server_init(struct rq *rq);
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 15653840c812..d30aee2e90c4 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -528,7 +528,7 @@ int __sched_setscheduler(struct task_struct *p,
 	 */
 	if (attr->sched_priority > MAX_RT_PRIO-1)
 		return -EINVAL;
-	if ((dl_policy(policy) && !__checkparam_dl(attr)) ||
+	if ((dl_policy(policy) && !__checkparam_dl(attr, false)) ||
 	    (rt_policy(policy) != (attr->sched_priority != 0)))
 		return -EINVAL;
 
-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 16/29] sched/rt: Allow zeroing the runtime of the root control group
Date: Thu, 30 Apr 2026 23:38:20 +0200
Message-ID: <20260430213835.62217-17-yurand2000@gmail.com>
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>

Allow execution of FIFO/RR tasks in the root cgroup regardless of reserved
bandwidth. Tasks in the root cgroup use the standard FIFO/RR scheduler.

Allow creation of cgroups with runtime or period zero. Disallow execution
of tasks in zero bandwidth cgroups.

---

In HCBS, the root control group follows the already existing rules for
rt-task scheduling. As such, it does not make use of the deadline servers
to account for runtime, or any other HCBS specific code and features.
While the runtime of SCHED_DEADLINE tasks depends on the global bandwidth reserved for rt_tasks, the runtime of SCHED_FIFO/SCHED_RR tasks is limited by the activation of fair-servers (as the RT_THROTTLING mechanism has been removed in favour of them), thus their maximum bandwidth depends solely on the fair-server settings (which are thightly related to the global bandwidth reserved for rt-tasks) and the amount of SCHED_OTHER workload to run (recall that if no SCHED_OTHER tasks are running, the FIFO/RR tasks may fully utilize the CPU). The values of runtime and period in the root cgroup's cpu controller do not affect, by design of HCBS, the fair-server settings and similar (consequently they do not affect the scheduling of FIFO/RR tasks in the root cgroup), but they are just used to reserve a portion of the SCHED_DEADLINE bandwidth to the scheduling of rt-cgroups. These values only affect child cgroups, their deadline servers and their assigned FIFO/RR tasks. Signed-off-by: Yuri Andriaccio --- kernel/sched/rt.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index c994447f5b1c..5caddc5c2876 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2085,7 +2085,8 @@ static int tg_rt_schedulable(struct task_group *tg, v= oid *data) /* * Ensure we don't starve existing RT tasks if runtime turns zero. */ - if (dl_bandwidth_enabled() && !runtime && tg_has_rt_tasks(tg)) + if (dl_bandwidth_enabled() && tg !=3D &root_task_group && + !runtime && tg_has_rt_tasks(tg)) return -EBUSY; if (WARN_ON(!rt_group_sched_enabled() && tg !=3D &root_task_group)) @@ -2153,13 +2154,6 @@ static int tg_set_rt_bandwidth(struct task_group *tg, static DEFINE_MUTEX(rt_constraints_mutex); int i, err =3D 0; - /* - * Disallowing the root group RT runtime is BAD, it would disallow the - * kernel creating (and or operating) RT threads. 
- */ - if (tg =3D=3D &root_task_group && rt_runtime =3D=3D 0) - return -EINVAL; - /* * Bound quota to defend quota against overflow during bandwidth shift. */ @@ -2242,6 +2236,10 @@ static int sched_rt_global_constraints(void) int sched_rt_can_attach(struct task_group *tg) { + /* Allow executing in the root cgroup regardless of allowed bandwidth */ + if (tg =3D=3D &root_task_group) + return 1; + /* Don't accept real-time tasks when there is no way for them to run */ if (rt_group_sched_enabled() && tg->dl_bandwidth.dl_runtime =3D=3D 0) return 0; -- 2.53.0 From nobody Tue Jun 16 15:55:05 2026 Received: from mail-wm1-f46.google.com (mail-wm1-f46.google.com [209.85.128.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 925F73B27DE for ; Thu, 30 Apr 2026 21:39:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585151; cv=none; b=Kq86q8T5Yl4Lnq6OFgPfh6UVKVc+fjhjZ0GB2tFQCiSB2vI/3Na/fuU9z8ckHCD7ZpbEOtGfR5MUwlHJ12q/JpiAv+U5XpjLJdFUbfN51HTO0oImHMXPpe2Jqqu6/m2SzSzsMkfFkGjm6LmzRjXYFgYeEclD4sA23wziRNmwEkI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585151; c=relaxed/simple; bh=vojUATYKeMem1v29qT/lhnaJyJsj2kwzocGZb7jO8Pk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=IpI/T04W3wx7XQbVe1u/CodvlFRTatVZee7oCS5h4NjPxhDHX0lnR3Q1OpQUFyoFqIGpc4gmN24lpoLF3VwRtILwR7Q0uWW+JdJjaLfACq2wbja1SsTQv6eqswaD6SffRyEO8QgIamyG07d8v5Wl42ufTqsnX2grRCooaQk8VKw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=eLbjgwmK; arc=none smtp.client-ip=209.85.128.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none 
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 17/29] sched/rt: Remove old RT_GROUP_SCHED data structures
Date: Thu, 30 Apr 2026 23:38:21 +0200
Message-ID: <20260430213835.62217-18-yurand2000@gmail.com>
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>

From: luca abeni

Completely remove the old RT_GROUP_SCHED functions and data structures:
- Remove the fields back and my_q from sched_rt_entity.
- Remove the rt_bandwidth data structure.
- Remove the field rt_bandwidth from task_group.
- Remove the rt_bandwidth_enabled function.
- Remove the fields rt_queued, rt_throttled, rt_time, rt_runtime,
  rt_runtime_lock and rt_nr_boosted from rt_rq.

All of the removed fields and data have equivalents in the previously
added fields of rq, rt_rq and dl_bandwidth, and in the dl servers
themselves.

Co-developed-by: Yuri Andriaccio
Signed-off-by: Yuri Andriaccio
Signed-off-by: luca abeni
---
 include/linux/sched.h |  3 ---
 kernel/sched/sched.h  | 32 --------------------------------
 2 files changed, 35 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ea2e74598b93..2740f043e534 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -628,12 +628,9 @@ struct sched_rt_entity {
 	unsigned short			on_rq;
 	unsigned short			on_list;

-	struct sched_rt_entity		*back;
 #ifdef CONFIG_RT_GROUP_SCHED
 	/* rq on which this entity is (to be) queued: */
 	struct rt_rq			*rt_rq;
-	/* rq "owned" by this entity/group: */
-	struct rt_rq			*my_q;
 #endif
 } __randomize_layout;

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 78f080275bf0..a4435f107cfe 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -313,15 +313,6 @@ struct rt_prio_array {
 	struct list_head queue[MAX_RT_PRIO];
 };

-struct rt_bandwidth {
-	/* nests inside the rq lock: */
-	raw_spinlock_t		rt_runtime_lock;
-	ktime_t			rt_period;
-	u64			rt_runtime;
-	struct hrtimer		rt_period_timer;
-	unsigned int		rt_period_active;
-};
-
 struct dl_bandwidth {
 	raw_spinlock_t		dl_runtime_lock;
 	u64			dl_runtime;
@@ -341,12 +332,6 @@ static inline int dl_bandwidth_enabled(void)
  *  - cache the fraction of bandwidth that is currently allocated in
  *    each root domain;
  *
- * This is all done in the data structure below. It is similar to the
- * one used for RT-throttling (rt_bandwidth), with the main difference
- * that, since here we are only interested in admission control, we
- * do not decrease any runtime while the group "executes", neither we
- * need a timer to replenish it.
- *
  * With respect to SMP, bandwidth is given on a per root domain basis,
  * meaning that:
  *  - bw (< 100%) is the deadline bandwidth of each CPU;
@@ -513,7 +498,6 @@ struct task_group {
 	struct sched_dl_entity	**dl_se;
 	struct rt_rq		**rt_rq;

-	struct rt_bandwidth	rt_bandwidth;
 	struct dl_bandwidth	dl_bandwidth;
 #endif

@@ -831,11 +815,6 @@ struct scx_rq {
 };
 #endif /* CONFIG_SCHED_CLASS_EXT */

-static inline int rt_bandwidth_enabled(void)
-{
-	return 0;
-}
-
 /* RT IPI pull logic requires IRQ_WORK */
 #if defined(CONFIG_IRQ_WORK) && defined(CONFIG_SMP)
 # define HAVE_RT_PUSH_IPI
@@ -853,17 +832,6 @@ struct rt_rq {
 	bool			overloaded;
 	struct plist_head	pushable_tasks;

-	int			rt_queued;
-
-#ifdef CONFIG_RT_GROUP_SCHED
-	int			rt_throttled;
-	u64			rt_time; /* consumed RT time, goes up in update_curr_rt */
-	u64			rt_runtime; /* allotted RT time, "slice" from rt_bandwidth, RT sharing/balancing */
-	/* Nests inside the rq lock: */
-	raw_spinlock_t		rt_runtime_lock;
-
-	unsigned int		rt_nr_boosted;
-#endif
 #ifdef CONFIG_CGROUP_SCHED
 	struct task_group	*tg; /* this tg has "this" rt_rq on given CPU for runnable entities */
 #endif
-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 18/29] sched/core: Cgroup v2 support
Date: Thu, 30 Apr 2026 23:38:22 +0200
Message-ID: <20260430213835.62217-19-yurand2000@gmail.com>
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>

From: luca abeni

Make the rt_runtime_us and rt_period_us virtual files accessible to the
cgroup v2 controller as well, effectively enabling the RT_GROUP_SCHED
mechanism for cgroups v2.

Signed-off-by: luca abeni
Signed-off-by: Yuri Andriaccio
---
 kernel/sched/core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0c7032d254ba..3ffe3ac5071d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10245,6 +10245,18 @@ static struct cftype cpu_files[] = {
 		.write = cpu_uclamp_max_write,
 	},
 #endif /* CONFIG_UCLAMP_TASK_GROUP */
+#ifdef CONFIG_RT_GROUP_SCHED
+	{
+		.name = "rt_runtime_us",
+		.read_s64 = cpu_rt_runtime_read,
+		.write_s64 = cpu_rt_runtime_write,
+	},
+	{
+		.name = "rt_period_us",
+		.read_u64 = cpu_rt_period_read_uint,
+		.write_u64 = cpu_rt_period_write_uint,
+	},
+#endif /* CONFIG_RT_GROUP_SCHED */
 	{ }	/* terminate */
 };

-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 19/29] sched/rt: Remove support for cgroups-v1
Date: Thu, 30 Apr 2026 23:38:23 +0200
Message-ID: <20260430213835.62217-20-yurand2000@gmail.com>
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>

Disable control files for cgroups-v1, and allow only cgroups-v2. This
should simplify maintaining the code, since cgroups-v1 are deprecated.

Set the default rt-cgroups runtime to zero. This is needed for cgroup-v1
kernels, which would otherwise be unable to start SCHED_DEADLINE tasks.
The bandwidth for rt-cgroups must then be manually assigned after the
kernel boots.

Remove the cpu_rt_group_init function.

Signed-off-by: Yuri Andriaccio
---
 kernel/sched/core.c | 26 +-------------------------
 1 file changed, 1 insertion(+), 25 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3ffe3ac5071d..41758824b460 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8649,7 +8649,7 @@ void __init sched_init(void)

 #ifdef CONFIG_RT_GROUP_SCHED
 	init_dl_bandwidth(&root_task_group.dl_bandwidth,
-			global_rt_period(), global_rt_runtime());
+			global_rt_period(), 0);
 #endif /* CONFIG_RT_GROUP_SCHED */

 #ifdef CONFIG_CGROUP_SCHED
@@ -9984,20 +9984,6 @@ static struct cftype cpu_legacy_files[] = {
 };

 #ifdef CONFIG_RT_GROUP_SCHED
-static struct cftype rt_group_files[] = {
-	{
-		.name = "rt_runtime_us",
-		.read_s64 = cpu_rt_runtime_read,
-		.write_s64 = cpu_rt_runtime_write,
-	},
-	{
-		.name = "rt_period_us",
-		.read_u64 = cpu_rt_period_read_uint,
-		.write_u64 = cpu_rt_period_write_uint,
-	},
-	{ } /* Terminate */
-};
-
 # ifdef CONFIG_RT_GROUP_SCHED_DEFAULT_DISABLED
 DEFINE_STATIC_KEY_FALSE(rt_group_sched);
 # else
@@ -10020,16 +10006,6 @@ static int __init setup_rt_group_sched(char *str)
 	return 1;
 }
 __setup("rt_group_sched=", setup_rt_group_sched);
-
-static int __init cpu_rt_group_init(void)
-{
-	if (!rt_group_sched_enabled())
-		return 0;
-
-	WARN_ON(cgroup_add_legacy_cftypes(&cpu_cgrp_subsys, rt_group_files));
-	return 0;
-}
-subsys_initcall(cpu_rt_group_init);
 #endif /* CONFIG_RT_GROUP_SCHED */

 static int cpu_extra_stat_show(struct seq_file *sf,
-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 20/29] sched/deadline: Allow deeper hierarchies of RT cgroups
Date: Thu, 30 Apr 2026 23:38:24 +0200
Message-ID: <20260430213835.62217-21-yurand2000@gmail.com>
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>

From: luca abeni

Allow for cgroup hierarchies with more than two levels.

Introduce the concept of live and active groups:
- A group is live if it is a leaf group or if all its children have zero
  runtime.
- A live group with non-zero runtime can be used to schedule tasks.
- An active cgroup is a live group with running tasks.
- A non-live group cannot be used to run tasks; it is only used for
  bandwidth accounting, i.e. the sum of its children's bandwidth must be
  less than or equal to the bandwidth of the parent. This allows cgroups
  to be used to manage bandwidth for different users.
- While the root cgroup specifies the total allocatable bandwidth of rt
  cgroups, further accounting is performed to keep track of the live
  bandwidth, i.e. the sum of the bandwidth of live groups. The hierarchy
  invariant states that the live bandwidth must always be less than or
  equal to the total allocatable bandwidth.
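
The live-group rules and the hierarchy invariant listed above can be
illustrated with a small user-space model. This is only a sketch, not
kernel code: struct group, group_is_live(), live_bw_below() and
hierarchy_ok() are hypothetical names, the fixed-size children array is a
simplification, and to_bw() merely imitates a fixed-point runtime/period
ratio with an assumed 20-bit shift.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BW_SHIFT	20	/* assumed fixed-point precision for ratios */
#define MAX_CHILDREN	4	/* simplification: fixed fan-out for the model */

/* Hypothetical user-space stand-in for a task group; not the kernel type. */
struct group {
	uint64_t runtime_us;
	uint64_t period_us;
	struct group *children[MAX_CHILDREN];
	size_t nr_children;
};

/* runtime/period as a fixed-point fraction; a zero period means no bandwidth. */
static uint64_t to_bw(uint64_t runtime_us, uint64_t period_us)
{
	return period_us ? (runtime_us << BW_SHIFT) / period_us : 0;
}

/* A group is live if it is a leaf or if all its children have zero runtime. */
static bool group_is_live(const struct group *g)
{
	assert(g->nr_children <= MAX_CHILDREN);
	for (size_t i = 0; i < g->nr_children; i++) {
		if (g->children[i]->runtime_us > 0)
			return false;
	}
	return true;
}

/*
 * Sum the bandwidth of the live groups below g, descending only through
 * non-live groups (which take part in accounting but run no tasks).
 */
static uint64_t live_bw_below(const struct group *g)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < g->nr_children; i++) {
		const struct group *c = g->children[i];

		if (group_is_live(c))
			sum += to_bw(c->runtime_us, c->period_us);
		else
			sum += live_bw_below(c);
	}
	return sum;
}

/* Hierarchy invariant: live bandwidth <= total allocatable (root) bandwidth. */
static bool hierarchy_ok(const struct group *root)
{
	return live_bw_below(root) <= to_bw(root->runtime_us, root->period_us);
}
```

In this model, giving a child a non-zero runtime turns its parent into a
non-live group, so only the children are counted; zeroing every child
makes the parent live again and its own bandwidth is counted instead.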

Add is_live_sched_group() and sched_group_has_live_siblings() in
deadline.c. These utility functions are used by dl_init_tg() to perform
updates only when necessary:
- Only live groups may update the active dl bandwidth of dl entities
  (call to dl_rq_change_utilization), while non-live groups must not use
  servers, and thus must not change the active dl bandwidth.
- The total bandwidth accounting must be changed to follow the
  live/non-live rules:
  - When disabling (runtime zero) the last child of a group, the parent
    becomes a live group, and so the parent's bandwidth must be accounted
    back.
  - When enabling (runtime non-zero) the first child, the parent becomes a
    non-live group, and so the parent's bandwidth must be removed.

Update tg_set_rt_bandwidth() to change the runtime of a group to a
non-zero value only if its parent is inactive, thus forcing the parent to
become non-live if it was previously live (it would already have been
non-live if a sibling cgroup was live). An exception is made for groups
which have the root cgroup as parent.

Update sched_rt_can_attach() to allow attaching only to live groups.

Update dl_init_tg() to take a task_group pointer and a cpu id rather than
a direct pointer to the cpu's deadline server. The task_group pointer is
necessary to check and update the live bandwidth accounting.

Co-developed-by: Yuri Andriaccio
Signed-off-by: Yuri Andriaccio
Signed-off-by: luca abeni
---
 kernel/sched/core.c     |  6 ----
 kernel/sched/deadline.c | 63 ++++++++++++++++++++++++++++++++++++++---
 kernel/sched/rt.c       | 17 ++++++++---
 kernel/sched/sched.h    |  3 +-
 4 files changed, 74 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 41758824b460..fd532bb46995 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9205,12 +9205,6 @@ cpu_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 		return &root_task_group.css;
 	}

-	/* Do not allow cpu_cgroup hierachies with depth greater than 2. */
-#ifdef CONFIG_RT_GROUP_SCHED
-	if (parent != &root_task_group)
-		return ERR_PTR(-EINVAL);
-#endif
-
 	tg = sched_create_group(parent);
 	if (IS_ERR(tg))
 		return ERR_PTR(-ENOMEM);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 74bff7fb7b92..5967b5350166 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -376,11 +376,46 @@ int dl_check_tg(unsigned long total)
 	return 1;
 }

-void dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 rt_period)
+/*
+ * A cgroup is deemed live if:
+ * - It is a leaf cgroup.
+ * - All its children have zero runtime.
+ */
+bool is_live_sched_group(struct task_group *tg)
+{
+	struct task_group *child;
+	bool is_active = 1;
+
+	/* if there are no children, this is a leaf group, thus it is live */
+	guard(rcu)();
+	list_for_each_entry_rcu(child, &tg->children, siblings) {
+		if (child->dl_bandwidth.dl_runtime > 0)
+			is_active = 0;
+	}
+	return is_active;
+}
+
+static inline bool sched_group_has_live_siblings(struct task_group *tg)
+{
+	struct task_group *child;
+	bool has_active_siblings = 0;
+
+	guard(rcu)();
+	list_for_each_entry_rcu(child, &tg->parent->children, siblings) {
+		if (child != tg && child->dl_bandwidth.dl_runtime > 0)
+			has_active_siblings = 1;
+	}
+	return has_active_siblings;
+}
+
+void dl_init_tg(struct task_group *tg, int cpu, u64 rt_runtime, u64 rt_period)
 {
+	struct sched_dl_entity *dl_se = tg->dl_se[cpu];
 	struct rq *rq = container_of_const(dl_se->dl_rq, struct rq, dl);
-	int is_active;
-	u64 new_bw;
+	int is_active, is_live_group;
+	u64 old_runtime, new_bw;
+
+	is_live_group = (int)is_live_sched_group(tg);

 	guard(raw_spin_rq_lock_irq)(rq);
 	is_active = dl_se->my_q->rt.rt_nr_running > 0;
@@ -388,8 +423,10 @@ void dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 rt_period)
 	update_rq_clock(rq);
 	dl_server_stop(dl_se);

+	old_runtime = dl_se->dl_runtime;
 	new_bw = to_ratio(rt_period, rt_runtime);
-	dl_rq_change_utilization(rq, dl_se, new_bw);
+	if (is_live_group)
+		dl_rq_change_utilization(rq, dl_se, new_bw);

 	dl_se->dl_runtime = rt_runtime;
 	dl_se->dl_deadline = rt_period;
@@ -401,6 +438,24 @@ void dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 rt_period)
 	dl_se->dl_bw = new_bw;
 	dl_se->dl_density = new_bw;

+	/*
+	 * Handle parent bandwidth accounting when child runtime changes:
+	 * - When disabling the last child, the parent becomes a leaf group,
+	 *   and so the parent's bandwidth must be accounted back.
+	 * - When enabling the first child, the parent becomes a non-leaf group,
+	 *   and so the parent's bandwidth must be removed.
+	 * Only leaf groups (those without active children) have non-zero bandwidth.
+	 */
+	if (tg->parent && tg->parent != &root_task_group) {
+		if (rt_runtime == 0 && old_runtime != 0 &&
+		    !sched_group_has_live_siblings(tg)) {
+			__add_rq_bw(tg->parent->dl_se[cpu]->dl_bw, dl_se->dl_rq);
+		} else if (rt_runtime != 0 && old_runtime == 0 &&
+			   !sched_group_has_live_siblings(tg)) {
+			__sub_rq_bw(tg->parent->dl_se[cpu]->dl_bw, dl_se->dl_rq);
+		}
+	}
+
 	if (is_active)
 		dl_server_start(dl_se);
 }
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 5caddc5c2876..2be22024e66d 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -101,7 +101,7 @@ void unregister_rt_sched_group(struct task_group *tg)
 			continue;

 		if (tg->dl_se[i]->dl_runtime)
-			dl_init_tg(tg->dl_se[i], 0, tg->dl_se[i]->dl_period);
+			dl_init_tg(tg, i, 0, tg->dl_se[i]->dl_period);
 	}
 }

@@ -129,7 +129,7 @@ void free_rt_sched_group(struct task_group *tg)
 		 * to 0 immediately before freeing it.
 		 */
 		if (tg->dl_se[i]->dl_runtime)
-			dl_init_tg(tg->dl_se[i], 0, tg->dl_se[i]->dl_period);
+			dl_init_tg(tg, i, 0, tg->dl_se[i]->dl_period);

 		raw_spin_rq_lock_irqsave(cpu_rq(i), flags);
 		hrtimer_cancel(&tg->dl_se[i]->dl_timer);
@@ -2154,6 +2154,14 @@ static int tg_set_rt_bandwidth(struct task_group *tg,
 	static DEFINE_MUTEX(rt_constraints_mutex);
 	int i, err = 0;

+	/*
+	 * Do not allow to set a RT runtime > 0 if the parent has RT tasks
+	 * (and is not the root group)
+	 */
+	if (rt_runtime && tg != &root_task_group &&
+	    tg->parent != &root_task_group && tg_has_rt_tasks(tg->parent))
+		return -EINVAL;
+
 	/*
 	 * Bound quota to defend quota against overflow during bandwidth shift.
 	 */
@@ -2173,7 +2181,7 @@ static int tg_set_rt_bandwidth(struct task_group *tg,
 		return 0;

 	for_each_possible_cpu(i) {
-		dl_init_tg(tg->dl_se[i], rt_runtime, rt_period);
+		dl_init_tg(tg, i, rt_runtime, rt_period);
 	}

 	return 0;
@@ -2244,7 +2252,8 @@ int sched_rt_can_attach(struct task_group *tg)
 	if (rt_group_sched_enabled() && tg->dl_bandwidth.dl_runtime == 0)
 		return 0;

-	return 1;
+	/* tasks can be attached only if the taskgroup has no live children.
 */ + return (int)is_live_sched_group(tg); } #else /* !CONFIG_RT_GROUP_SCHED */ diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index a4435f107cfe..9814be8348cd 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -409,7 +409,8 @@ extern void dl_server_init(struct sched_dl_entity *dl_s= e, struct dl_rq *dl_rq, dl_server_pick_f pick_task); extern void sched_init_dl_servers(void); extern int dl_check_tg(unsigned long total); -extern void dl_init_tg(struct sched_dl_entity *dl_se, u64 rt_runtime, u64 = rt_period); +extern void dl_init_tg(struct task_group *tg, int cpu, u64 rt_runtime, u64= rt_period); +extern bool is_live_sched_group(struct task_group *tg); extern void fair_server_init(struct rq *rq); extern void ext_server_init(struct rq *rq); -- 2.53.0
From nobody Tue Jun 16 15:55:05 2026 From: Yuri Andriaccio To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider Cc: linux-kernel@vger.kernel.org, Luca Abeni , Yuri Andriaccio Subject: [RFC PATCH v5 21/29] sched/rt: Update default bandwidth for real-time tasks to ONE Date: Thu, 30 Apr 2026 23:38:25 +0200 Message-ID: <20260430213835.62217-22-yurand2000@gmail.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com> References: <20260430213835.62217-1-yurand2000@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"
Set the default total bandwidth for SCHED_DEADLINE tasks and servers to ONE.
FIFO/RR tasks are already throttled by fair-servers and ext-servers, and the sysctl_sched_rt_runtime parameter now only defines the total bw that is allowed to deadline entities. Signed-off-by: Yuri Andriaccio --- kernel/sched/rt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 2be22024e66d..db88792787a8 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -19,9 +19,9 @@ int sysctl_sched_rt_period =3D 1000000; /* * part of the period that we allow rt tasks to run in us. - * default: 0.95s + * default: 1s */ -int sysctl_sched_rt_runtime =3D 950000; +int sysctl_sched_rt_runtime =3D 1000000; #ifdef CONFIG_SYSCTL static int sysctl_sched_rr_timeslice =3D (MSEC_PER_SEC * RR_TIMESLICE) / H= Z; -- 2.53.0
From nobody Tue Jun 16 15:55:05 2026 From: Yuri Andriaccio To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider Cc: linux-kernel@vger.kernel.org, Luca Abeni , Yuri Andriaccio Subject: [RFC PATCH v5 22/29] sched/rt: Add rt-cgroup migration functions Date: Thu, 30 Apr 2026 23:38:26 +0200 Message-ID: <20260430213835.62217-23-yurand2000@gmail.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com> References: <20260430213835.62217-1-yurand2000@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"
From: luca abeni Add migration related functions: - group_find_lowest_rt_rq - group_find_lock_lowest_rt_rq Find (and lock) the lowest priority non-root runqueue where to migrate a given task.
- group_pull_rt_task Try to pull a task onto the given non-root runqueue. - group_push_rt_task - group_push_rt_tasks Try to push tasks from the given non-root runqueue. - group_pull_rt_task_callback - group_push_rt_tasks_callback - rt_queue_push_from_group - rt_queue_pull_to_group Deferred execution of push and pull functions at balancing points. Update struct rq to include fields for deferred balancing of cgroup runqueu= es. --- The functions are only implemented here, to be hooked up later in the patch= set. Co-developed-by: Alessio Balsini Signed-off-by: Alessio Balsini Co-developed-by: Andrea Parri Signed-off-by: Andrea Parri Co-developed-by: Yuri Andriaccio Signed-off-by: Yuri Andriaccio Signed-off-by: luca abeni --- kernel/sched/rt.c | 461 +++++++++++++++++++++++++++++++++++++++++++ kernel/sched/sched.h | 10 + 2 files changed, 471 insertions(+) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index db88792787a8..e1731e01757b 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -1,3 +1,4 @@ +#pragma GCC diagnostic ignored "-Wunused-function" // SPDX-License-Identifier: GPL-2.0 /* * Real-Time Scheduling Class (mapped to the SCHED_FIFO and SCHED_RR @@ -84,6 +85,8 @@ void init_rt_rq(struct rt_rq *rt_rq) plist_head_init(&rt_rq->pushable_tasks); } =20 +static void group_pull_rt_task(struct rt_rq *this_rt_rq); + #ifdef CONFIG_RT_GROUP_SCHED =20 void unregister_rt_sched_group(struct task_group *tg) @@ -345,6 +348,46 @@ static inline void rt_queue_pull_task(struct rt_rq *rt= _rq) queue_balance_callback(rq, &per_cpu(rt_pull_head, rq->cpu), pull_rt_task); } =20 +#ifdef CONFIG_RT_GROUP_SCHED +static DEFINE_PER_CPU(struct balance_callback, rt_group_push_head); +static DEFINE_PER_CPU(struct balance_callback, rt_group_pull_head); +static void group_push_rt_tasks_callback(struct rq *); +static void group_pull_rt_task_callback(struct rq *); + +static void rt_queue_push_from_group(struct rt_rq *rt_rq) +{ + struct rq *rq =3D served_rq_of_rt_rq(rt_rq); + struct rq *global_rq 
=3D cpu_rq(rq->cpu); + + if (global_rq->rq_to_push_from) + return; + + if (!has_pushable_tasks(rt_rq)) + return; + + global_rq->rq_to_push_from =3D rq; + queue_balance_callback(global_rq, &per_cpu(rt_group_push_head, global_rq-= >cpu), + group_push_rt_tasks_callback); +} + +static void rt_queue_pull_to_group(struct rt_rq *rt_rq) +{ + struct rq *rq =3D served_rq_of_rt_rq(rt_rq); + struct rq *global_rq =3D cpu_rq(rq->cpu); + struct sched_dl_entity *dl_se =3D dl_group_of(rt_rq); + + if (dl_se->dl_throttled || global_rq->rq_to_pull_to) + return; + + global_rq->rq_to_pull_to =3D rq; + queue_balance_callback(global_rq, &per_cpu(rt_group_pull_head, global_rq-= >cpu), + group_pull_rt_task_callback); +} +#else /* !CONFIG_RT_GROUP_SCHED */ +static inline void rt_queue_push_from_group(struct rt_rq *rt_rq) {}; +static inline void rt_queue_pull_to_group(struct rt_rq *rt_rq) {}; +#endif /* CONFIG_RT_GROUP_SCHED */ + static void enqueue_pushable_task(struct rt_rq *rt_rq, struct task_struct = *p) { plist_del(&p->pushable_tasks, &rt_rq->pushable_tasks); @@ -1747,6 +1790,424 @@ static void pull_rt_task(struct rq *this_rq) resched_curr(this_rq); } =20 +#ifdef CONFIG_RT_GROUP_SCHED +/* + * Find the lowest priority runqueue among the runqueues of the same + * task group. Unlike find_lowest_rt(), this does not mean that the + * lowest priority cpu is running tasks from this runqueue. 
+ */ +static int group_find_lowest_rt_rq(struct task_struct *task, struct rt_rq = *task_rt_rq) +{ + struct sched_domain *sd; + struct cpumask lowest_mask; + struct sched_dl_entity *dl_se; + struct rt_rq *rt_rq; + int prio, lowest_prio; + int cpu, this_cpu =3D smp_processor_id(); + + if (task->nr_cpus_allowed =3D=3D 1) + return -1; /* No other targets possible */ + + lowest_prio =3D task->prio - 1; + cpumask_clear(&lowest_mask); + for_each_cpu_and(cpu, cpu_online_mask, task->cpus_ptr) { + dl_se =3D task_rt_rq->tg->dl_se[cpu]; + rt_rq =3D &dl_se->my_q->rt; + prio =3D rt_rq->highest_prio.curr; + + /* + * If we're on asym system ensure we consider the different capacities + * of the CPUs when searching for the lowest_mask. + */ + if (dl_se->dl_throttled || !rt_task_fits_capacity(task, cpu)) + continue; + + if (prio >=3D lowest_prio) { + if (prio > lowest_prio) { + cpumask_clear(&lowest_mask); + lowest_prio =3D prio; + } + + cpumask_set_cpu(cpu, &lowest_mask); + } + } + + if (cpumask_empty(&lowest_mask)) + return -1; + + /* + * At this point we have built a mask of CPUs representing the + * lowest priority tasks in the system. Now we want to elect + * the best one based on our affinity and topology. + * + * We prioritize the last CPU that the task executed on since + * it is most likely cache-hot in that location. + */ + cpu =3D task_cpu(task); + if (cpumask_test_cpu(cpu, &lowest_mask)) + return cpu; + + /* + * Otherwise, we consult the sched_domains span maps to figure + * out which CPU is logically closest to our hot cache data. + */ + if (!cpumask_test_cpu(this_cpu, &lowest_mask)) + this_cpu =3D -1; /* Skip this_cpu opt if not among lowest */ + + scoped_guard(rcu) { + for_each_domain(cpu, sd) { + if (sd->flags & SD_WAKE_AFFINE) { + int best_cpu; + + /* + * "this_cpu" is cheaper to preempt than a + * remote processor. 
+ */ + if (this_cpu !=3D -1 && + cpumask_test_cpu(this_cpu, sched_domain_span(sd))) + return this_cpu; + + best_cpu =3D cpumask_any_and_distribute(&lowest_mask, + sched_domain_span(sd)); + if (best_cpu < nr_cpu_ids) + return best_cpu; + } + } + } + + /* + * And finally, if there were no matches within the domains + * just give the caller *something* to work with from the compatible + * locations. + */ + if (this_cpu !=3D -1) + return this_cpu; + + cpu =3D cpumask_any_distribute(&lowest_mask); + if (cpu < nr_cpu_ids) + return cpu; + + return -1; +} + +/* + * Find and lock the lowest priority runqueue among the runqueues + * of the same task group. Unlike find_lock_lowest_rt(), this does not + * mean that the lowest priority cpu is running tasks from this runqueue. + */ +static struct rt_rq *group_find_lock_lowest_rt_rq(struct task_struct *task= , struct rt_rq *rt_rq) +{ + struct rq *rq =3D rq_of_rt_rq(rt_rq); + struct rq *lowest_rq; + struct rt_rq *lowest_rt_rq; + struct sched_dl_entity *lowest_dl_se; + int tries, cpu; + + for (tries =3D 0; tries < RT_MAX_TRIES; tries++) { + cpu =3D group_find_lowest_rt_rq(task, rt_rq); + + if ((cpu =3D=3D -1) || (cpu =3D=3D rq->cpu)) + return NULL; + + lowest_dl_se =3D rt_rq->tg->dl_se[cpu]; + lowest_rt_rq =3D &lowest_dl_se->my_q->rt; + lowest_rq =3D cpu_rq(cpu); + + if (lowest_rt_rq->highest_prio.curr <=3D task->prio) { + /* + * Target rq has tasks of equal or higher priority, + * retrying does not release any lock and is unlikely + * to yield a different result. + */ + return NULL; + } + + /* if the prio of this runqueue changed, try again */ + if (double_lock_balance(rq, lowest_rq)) { + /* + * We had to unlock the run queue. In + * the mean time, task could have + * migrated already or had its affinity changed. + * Also make sure that it wasn't scheduled on its rq. + * It is possible the task was scheduled, set + * "migrate_disabled" and then got preempted, so we must + * check the task migration disable flag here too. 
+ */ + if (unlikely(is_migration_disabled(task) || + lowest_dl_se->dl_throttled || + !cpumask_test_cpu(lowest_rq->cpu, &task->cpus_mask) || + task !=3D pick_next_pushable_task(rt_rq))) { + + double_unlock_balance(rq, lowest_rq); + return NULL; + } + } + + /* If this rq is still suitable use it. */ + if (lowest_rt_rq->highest_prio.curr > task->prio) + return lowest_rt_rq; + + /* try again */ + double_unlock_balance(rq, lowest_rq); + } + + return NULL; +} + +static int group_push_rt_task(struct rt_rq *rt_rq, bool pull) +{ + struct rq *rq =3D rq_of_rt_rq(rt_rq); + struct task_struct *next_task; + struct rq *lowest_rq; + struct rt_rq *lowest_rt_rq; + int ret =3D 0; + + if (!rt_rq->overloaded) + return 0; + + next_task =3D pick_next_pushable_task(rt_rq); + if (!next_task) + return 0; + +retry: + if (is_migration_disabled(next_task)) { + struct task_struct *push_task =3D NULL; + int cpu; + + if (!pull || rq->push_busy) + return 0; + + /* + * If the current task does not belong to the same task group + * we cannot push it away. + */ + if (rq->donor->sched_task_group !=3D rt_rq->tg) + return 0; + + /* + * Invoking group_find_lowest_rt_rq() on anything but an RT task doesn't + * make sense. Per the above priority check, curr has to + * be of higher priority than next_task, so no need to + * reschedule when bailing out. + * + * Note that the stoppers are masqueraded as SCHED_FIFO + * (cf. sched_set_stop_task()), so we can't rely on rt_task(). + */ + if (rq->donor->sched_class !=3D &rt_sched_class) + return 0; + + cpu =3D group_find_lowest_rt_rq(rq->curr, rt_rq); + if (cpu =3D=3D -1 || cpu =3D=3D rq->cpu) + return 0; + + /* + * Given we found a CPU with lower priority than @next_task, + * therefore it should be running. However we cannot migrate it + * to this other CPU, instead attempt to push the current + * running task on this CPU away. 
+ */ + push_task =3D get_push_task(rq); + if (push_task) { + preempt_disable(); + raw_spin_rq_unlock(rq); + stop_one_cpu_nowait(rq->cpu, push_cpu_stop, + push_task, &rq->push_work); + preempt_enable(); + raw_spin_rq_lock(rq); + } + + return 0; + } + + if (WARN_ON(next_task =3D=3D rq->curr)) + return 0; + + /* We might release rq lock */ + get_task_struct(next_task); + + /* group_find_lock_lowest_rq locks the rq if found */ + lowest_rt_rq =3D group_find_lock_lowest_rt_rq(next_task, rt_rq); + if (!lowest_rt_rq) { + struct task_struct *task; + /* + * group_find_lock_lowest_rt_rq releases rq->lock + * so it is possible that next_task has migrated. + * + * We need to make sure that the task is still on the same + * run-queue and is also still the next task eligible for + * pushing. + */ + task =3D pick_next_pushable_task(rt_rq); + if (task =3D=3D next_task) { + /* + * The task hasn't migrated, and is still the next + * eligible task, but we failed to find a run-queue + * to push it to. Do not retry in this case, since + * other CPUs will pull from us when ready. + */ + goto out; + } + + if (!task) + /* No more tasks, just exit */ + goto out; + + /* + * Something has shifted, try again. 
+ */ + put_task_struct(next_task); + next_task =3D task; + goto retry; + } + + lowest_rq =3D rq_of_rt_rq(lowest_rt_rq); + + move_queued_task_locked(rq, lowest_rq, next_task); + resched_curr(lowest_rq); + ret =3D 1; + + double_unlock_balance(rq, lowest_rq); +out: + put_task_struct(next_task); + + return ret; +} + +static void group_pull_rt_task(struct rt_rq *this_rt_rq) +{ + struct rq *this_rq =3D rq_of_rt_rq(this_rt_rq); + int this_cpu =3D this_rq->cpu, cpu; + bool resched =3D false; + struct task_struct *p, *push_task =3D NULL; + struct rt_rq *src_rt_rq; + struct rq *src_rq; + struct sched_dl_entity *src_dl_se; + + for_each_online_cpu(cpu) { + if (this_cpu =3D=3D cpu) + continue; + + src_dl_se =3D this_rt_rq->tg->dl_se[cpu]; + src_rt_rq =3D &src_dl_se->my_q->rt; + + if (src_rt_rq->rt_nr_running <=3D 1 && !src_dl_se->dl_throttled) + continue; + + src_rq =3D rq_of_rt_rq(src_rt_rq); + + /* + * Don't bother taking the src_rq->lock if the next highest + * task is known to be lower-priority than our current task. + * This may look racy, but if this value is about to go + * logically higher, the src_rq will push this task away. + * And if its going logically lower, we do not care + */ + if (src_rt_rq->highest_prio.next >=3D + this_rt_rq->highest_prio.curr) + continue; + + /* + * We can potentially drop this_rq's lock in + * double_lock_balance, and another CPU could + * alter this_rq + */ + push_task =3D NULL; + double_lock_balance(this_rq, src_rq); + + /* + * We can pull only a task, which is pushable + * on its rq, and no others. + */ + p =3D pick_highest_pushable_task(src_rt_rq, this_cpu); + + /* + * Do we have an RT task that preempts + * the to-be-scheduled task? + */ + if (p && (p->prio < this_rt_rq->highest_prio.curr)) { + WARN_ON(p =3D=3D src_rq->curr); + WARN_ON(!task_on_rq_queued(p)); + + /* + * There's a chance that p is higher in priority + * than what's currently running on its CPU. 
+ * This is just that p is waking up and hasn't + * had a chance to schedule. We only pull + * p if it is lower in priority than the + * current task on the run queue + */ + if (src_rq->donor->sched_task_group =3D=3D this_rt_rq->tg && + p->prio < src_rq->donor->prio) + goto skip; + + if (is_migration_disabled(p)) { + /* + * If the current task does not belong to the same task group + * we cannot push it away. + */ + if (src_rq->donor->sched_task_group !=3D this_rt_rq->tg) + goto skip; + + push_task =3D get_push_task(src_rq); + } else { + move_queued_task_locked(src_rq, this_rq, p); + resched =3D true; + } + /* + * We continue with the search, just in + * case there's an even higher prio task + * in another runqueue. (low likelihood + * but possible) + */ + } +skip: + double_unlock_balance(this_rq, src_rq); + + if (push_task) { + preempt_disable(); + raw_spin_rq_unlock(this_rq); + stop_one_cpu_nowait(src_rq->cpu, push_cpu_stop, + push_task, &src_rq->push_work); + preempt_enable(); + raw_spin_rq_lock(this_rq); + } + } + + if (resched) + resched_curr(this_rq); +} + +static void group_push_rt_tasks(struct rt_rq *rt_rq) +{ + while (group_push_rt_task(rt_rq, false)) + ; +} + +static void group_push_rt_tasks_callback(struct rq *global_rq) +{ + struct rt_rq *rt_rq =3D &global_rq->rq_to_push_from->rt; + + if ((rt_rq->rt_nr_running > 1) || + (dl_group_of(rt_rq)->dl_throttled =3D=3D 1)) { + + group_push_rt_tasks(rt_rq); + } + + global_rq->rq_to_push_from =3D NULL; +} + +static void group_pull_rt_task_callback(struct rq *global_rq) +{ + struct rt_rq *rt_rq =3D &global_rq->rq_to_pull_to->rt; + + group_pull_rt_task(rt_rq); + global_rq->rq_to_pull_to =3D NULL; +} +#else /* !CONFIG_RT_GROUP_SCHED */ +static void group_pull_rt_task(struct rt_rq *this_rt_rq) { } +static void group_push_rt_tasks(struct rt_rq *rt_rq) { } +#endif /* CONFIG_RT_GROUP_SCHED */ + /* * If we are not running and we are not going to reschedule soon, we should * try to push tasks away now diff --git 
a/kernel/sched/sched.h b/kernel/sched/sched.h index 9814be8348cd..6b5bd6270d9a 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1330,6 +1330,16 @@ struct rq { struct list_head cfsb_csd_list; #endif =20 +#ifdef CONFIG_RT_GROUP_SCHED + /* + * Balance callbacks operate only on global runqueues. + * These pointers allow referencing cgroup specific runqueues + * for balancing operations. + */ + struct rq *rq_to_push_from; + struct rq *rq_to_pull_to; +#endif + atomic_t nr_iowait; } __no_randomize_layout; =20 --=20 2.53.0 From nobody Tue Jun 16 15:55:05 2026 Received: from mail-wr1-f46.google.com (mail-wr1-f46.google.com [209.85.221.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3EEF33BE636 for ; Thu, 30 Apr 2026 21:39:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585160; cv=none; b=pWNtSV8b0xeRwyd7jr9whyY3K95vn8v5KW9D+oOyIq2X8Rju36xSeYSY2jx2mO2hOCS+YgFnsQvzQSlWYyVkt4Rb6/RTyog3aCxtCgz5it7BJu0mwvZJm+kU6cBla2XgJvHi4YPIk9Fjjr5KcmZh9ItbXor+JopUhyacQqykKZE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1777585160; c=relaxed/simple; bh=TMYXBI+wJ8DjP2y7fWPxhN6qk1cpUbOyCL5BDkow07k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UEO4PwrwuphuAD16OM66CJ8Mf/pPUxWQM4bXfGh69kDPQg/UflMhvYJlHrPG1W25XTIhPIcNBbGfCrVjCNAMmDVN6000OSfyLkkLAoFQDruWx3mStKEDT6figVAdAwkaHaK+N+ZKGoKY6EOiyi8reNM6cW8BEb/AIY2vSMcf0lQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=YTMjmUZ7; arc=none smtp.client-ip=209.85.221.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) 
header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="YTMjmUZ7" Received: by mail-wr1-f46.google.com with SMTP id ffacd0b85a97d-43cfd832155so898532f8f.1 for ; Thu, 30 Apr 2026 14:39:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1777585158; x=1778189958; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=waxlLQPjoqAb9Ik7V3fvlsTTX/AxVfLeB+KRuoD9o70=; b=YTMjmUZ7Lf12LlPOY2BksKH8v4cs2pbgY/rILLNKz6nLTRcuetR59YpBh+Z8FHNvHx Rfi+RcD7CGuKJP59HP7RyBa/1nhpshQpD8n2H2BoZodXtWQGEQjyshCZQKPQcUIRa6p5 1m7iP2K3YlTA5iPPtr40eHzVlF+ebrOCiKASYBNWR4T9Cj8UhVxwU3E+/ZHlJzgDjiW8 B+9+oESnS+w4gMCbOB4mt+D4iTmvfpwxN4qHORpbMNhDFyWvHdR+ErLQ+sSEQj+wbny+ VIECUdlac4xerBk7X+PQBweHVL1jKR1YUEVk0XG68fo4AdrfIzWDqb/Yu3uH3Gzu9tPY SaGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777585158; x=1778189958; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=waxlLQPjoqAb9Ik7V3fvlsTTX/AxVfLeB+KRuoD9o70=; b=UWhsHE6l9mfLNPRPT92TH08BXC9KUWBdNprv93i3cBAx5x78TejbvfbJBIUKcFeAPB QOfMgfXwKFBtksz1ImcW7x7ku1fJXLTp9/LbEEe832vfCf7TLwEEE+WPrqEZ9cgKUgxT rxkAqe/fv3dFVgZPpRt9DCP0IgBNcL7vQnqv67PLbC3kMXFQGYYIVcsmAXfSsIzzANvC 0IO7TMd618+cI3f9Artq7e+6cN1hJexgDjcPCyGrmCKUXXnUmUqGHAgD+lt2bafjBh6D AZotEoYovvgrUiS/MhKSVfUhn0PksZSzEWHV+gt0m0MvHc078E8n4wCbuiSv3cGBXQzP esnA== X-Gm-Message-State: AOJu0YyJt+r316Mx1HhxfP/MKakfiYD+QwiNtc7Kwx2OsjbvRGlzzF5n EEU+ogCKiQck5eeECMVjMHQZbs+/wr2pc4u1wp02IQ+us10wxT8tWnww X-Gm-Gg: AeBDievwlMY//dcU/om2dld/ISlHd6aA8Csyrw/8r5lHE5ZrTb0KqqJbPhvnSmJhRTY 
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 23/29] sched/rt: Hook HCBS migration functions
Date: Thu, 30 Apr 2026 23:38:27 +0200
Message-ID: <20260430213835.62217-24-yurand2000@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: luca abeni

Hook rt-cgroup migration functions:
- balance_rt
- set_next_task_rt
- task_woken_rt
- switched_from_rt
- switched_to_rt
- prio_changed_rt
    Follow the same patterns as for the standard FIFO/RR scheduling, but for
    HCBS cgroups.
- put_prev_task_rt
    If a server is throttled, put_prev_task_rt is invoked and a push is
    necessary so that the task can keep running on another server if possible.

Update select_task_rq_rt to always return the cpu where the task is scheduled.

Update switched_to_rt to keep track of the deadline server that is assigned to
the task switching to FIFO/RR priority.

Co-developed-by: Alessio Balsini
Signed-off-by: Alessio Balsini
Co-developed-by: Andrea Parri
Signed-off-by: Andrea Parri
Co-developed-by: Yuri Andriaccio
Signed-off-by: Yuri Andriaccio
Signed-off-by: luca abeni
---
 kernel/sched/rt.c | 59 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 45 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e1731e01757b..e6b3efa358d3 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1,4 +1,3 @@
-#pragma GCC diagnostic ignored "-Wunused-function"
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Real-Time Scheduling Class (mapped to the SCHED_FIFO and SCHED_RR
@@ -906,6 +905,11 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
 	struct rq *rq;
 	bool test;
 
+	/* Just return the task_cpu for processes inside task groups */
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) &&
+	    is_dl_group(rt_rq_of_se(&p->rt)))
+		goto out;
+
 	/* For anything but wake ups, just return the task_cpu */
 	if (!(flags & (WF_TTWU | WF_FORK)))
 		goto out;
@@ -1005,7 +1009,10 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
 		 * not yet started the picking loop.
 		 */
 		rq_unpin_lock(rq, rf);
-		pull_rt_task(rq);
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq_of_se(&p->rt)))
+			group_pull_rt_task(rt_rq_of_se(&p->rt));
+		else
+			pull_rt_task(rq);
 		rq_repin_lock(rq, rf);
 	}
 
@@ -1120,7 +1127,9 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
 	if (rq->donor->sched_class != &rt_sched_class)
 		update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0);
 
-	if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq))
+		rt_queue_push_from_group(rt_rq);
+	else
 		rt_queue_push_tasks(rt_rq);
 }
 
@@ -1174,6 +1183,13 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p, struct task_s
 	 */
 	if (on_rt_rq(&p->rt) && p->nr_cpus_allowed > 1)
 		enqueue_pushable_task(rt_rq, p);
+
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq)) {
+		struct sched_dl_entity *dl_se = dl_group_of(rt_rq);
+
+		if (dl_se->dl_throttled)
+			rt_queue_push_from_group(rt_rq);
+	}
 }
 
 /* Only try algorithms three times */
@@ -2214,6 +2230,7 @@ static void group_push_rt_tasks(struct rt_rq *rt_rq) { }
  */
 static void task_woken_rt(struct rq *rq, struct task_struct *p)
 {
+	struct rt_rq *rt_rq = rt_rq_of_se(&p->rt);
 	bool need_to_push = !task_on_cpu(rq, p) &&
 			    !test_tsk_need_resched(rq->curr) &&
 			    p->nr_cpus_allowed > 1 &&
@@ -2221,7 +2238,12 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
 			    (rq->curr->nr_cpus_allowed < 2 ||
 			     rq->donor->prio <= p->prio);
 
-	if (need_to_push)
+	if (!need_to_push)
+		return;
+
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq))
+		group_push_rt_tasks(rt_rq);
+	else
 		push_rt_tasks(rq);
 }
 
@@ -2261,7 +2283,9 @@ static void switched_from_rt(struct rq *rq, struct task_struct *p)
 	if (!task_on_rq_queued(p) || rt_rq->rt_nr_running)
 		return;
 
-	if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+	if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq))
+		rt_queue_pull_to_group(rt_rq);
+	else
 		rt_queue_pull_task(rt_rq);
 }
 
@@ -2290,6 +2314,13 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 	 */
 	if (task_current(rq, p)) {
 		update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0);
+
+		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq_of_se(&p->rt))) {
+			struct sched_dl_entity *dl_se = dl_group_of(rt_rq_of_se(&p->rt));
+
+			p->dl_server = dl_se;
+		}
+
 		return;
 	}
 
@@ -2299,13 +2330,10 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 	 * then see if we can move to another run queue.
 	 */
 	if (task_on_rq_queued(p)) {
-		if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq)) {
-			if (p->prio < rq->donor->prio)
-				resched_curr(rq);
-		} else {
-			if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
-				rt_queue_push_tasks(rt_rq_of_se(&p->rt));
-		}
+		if (!is_dl_group(rt_rq) && p->nr_cpus_allowed > 1 && rq->rt.overloaded)
+			rt_queue_push_tasks(rt_rq);
+		else if (is_dl_group(rt_rq) && rt_rq->overloaded)
+			rt_queue_push_from_group(rt_rq);
 
 		if (p->prio < rq->donor->prio && cpu_online(cpu_of(rq)))
 			resched_curr(rq);
@@ -2332,9 +2360,12 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio)
 		 * If our priority decreases while running, we
 		 * may need to pull tasks to this runqueue.
 		 */
-		if (oldprio < p->prio)
-			if (!IS_ENABLED(CONFIG_RT_GROUP_SCHED) || !is_dl_group(rt_rq))
+		if (oldprio < p->prio) {
+			if (IS_ENABLED(CONFIG_RT_GROUP_SCHED) && is_dl_group(rt_rq))
+				rt_queue_pull_to_group(rt_rq);
+			else
 				rt_queue_pull_task(rt_rq);
+		}
 
 		/*
 		 * If there's a higher priority task waiting to run
-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 24/29] sched/core: Execute enqueued balance callbacks when changing allowed CPUs
Date: Thu, 30 Apr 2026 23:38:28 +0200
Message-ID: <20260430213835.62217-25-yurand2000@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: luca abeni

Execute balancing callbacks when setting the affinity of a task, since the HCBS
scheduler may request balancing of throttled dl_servers to fully utilize the
server's bandwidth.
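
As a rough illustration of the ordering this change relies on (detach the
pending callbacks while the lock is held, drop the lock, then run what was
detached), here is a simplified user-space sketch. It is not the kernel
implementation: toy_rq, toy_callback and every toy_* helper are hypothetical
names, the runqueue lock is reduced to a boolean flag, and the model does not
try to reproduce whatever locking the real callback runner does internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued balance callback. */
struct toy_callback {
	struct toy_callback *next;
	void (*func)(void *arg);
	void *arg;
};

/* Hypothetical stand-in for a runqueue: a lock flag and a pending list. */
struct toy_rq {
	bool locked;
	struct toy_callback *pending;
};

/* Queue a callback; the caller must hold the toy lock. */
static void toy_queue_callback(struct toy_rq *rq, struct toy_callback *cb,
			       void (*func)(void *), void *arg)
{
	assert(rq->locked);
	cb->func = func;
	cb->arg = arg;
	cb->next = rq->pending;
	rq->pending = cb;
}

/* Detach the whole pending list while the toy lock is still held. */
static struct toy_callback *toy_splice_callbacks(struct toy_rq *rq)
{
	struct toy_callback *head = rq->pending;

	assert(rq->locked);
	rq->pending = NULL;
	return head;
}

/* Run the detached callbacks after the caller dropped the toy lock. */
static int toy_run_callbacks(struct toy_rq *rq, struct toy_callback *head)
{
	int ran = 0;

	assert(!rq->locked);
	while (head) {
		struct toy_callback *next = head->next;

		head->next = NULL;
		head->func(head->arg);
		head = next;
		ran++;
	}
	return ran;
}

/* Example callback: counts how many times it was invoked. */
static void toy_count(void *arg)
{
	(*(int *)arg)++;
}
```

A caller would queue callbacks with the flag set, call toy_splice_callbacks(),
clear the flag, and only then call toy_run_callbacks() on the returned list.
Anything queued after the splice stays on the pending list for a later pass.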
Signed-off-by: luca abeni
Signed-off-by: Yuri Andriaccio
---
 kernel/sched/core.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fd532bb46995..24ffe933527c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2875,6 +2875,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask) ||
 	    (task_current_donor(rq, p) && !task_current(rq, p))) {
 		struct task_struct *push_task = NULL;
+		struct balance_callback *head;

 		if ((flags & SCA_MIGRATE_ENABLE) &&
 		    (p->migration_flags & MDF_PUSH) && !rq->push_busy) {
@@ -2893,11 +2894,13 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		}

 		preempt_disable();
+		head = splice_balance_callbacks(rq);
 		task_rq_unlock(rq, p, rf);
 		if (push_task) {
 			stop_one_cpu_nowait(rq->cpu, push_cpu_stop,
 					    p, &rq->push_work);
 		}
+		balance_callbacks(rq, head);
 		preempt_enable();

 		if (complete)
@@ -2952,6 +2955,8 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 	}

 	if (task_on_cpu(rq, p) || READ_ONCE(p->__state) == TASK_WAKING) {
+		struct balance_callback *head;
+
 		/*
 		 * MIGRATE_ENABLE gets here because 'p == current', but for
 		 * anything else we cannot do is_migration_disabled(), punt
@@ -2965,16 +2970,19 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			p->migration_flags &= ~MDF_PUSH;

 		preempt_disable();
+		head = splice_balance_callbacks(rq);
 		task_rq_unlock(rq, p, rf);
 		if (!stop_pending) {
 			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
 					    &pending->arg, &pending->stop_work);
 		}
+		balance_callbacks(rq, head);
 		preempt_enable();

 		if (flags & SCA_MIGRATE_ENABLE)
 			return 0;
 	} else {
+		struct balance_callback *head;

 		if (!is_migration_disabled(p)) {
 			if (task_on_rq_queued(p))
@@ -2985,7 +2993,12 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 				complete = true;
 			}
 		}
+
+		preempt_disable();
+		head = splice_balance_callbacks(rq);
 		task_rq_unlock(rq, p, rf);
+		balance_callbacks(rq, head);
+		preempt_enable();

 		if (complete)
 			complete_all(&pending->done);
-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 25/29] sched/rt: Try pull task on empty server pick
Date: Thu, 30 Apr 2026 23:38:29 +0200
Message-ID: <20260430213835.62217-26-yurand2000@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Try to pull task on a server with an empty runqueue before returning NULL (and
thus shutting down).
---
When all the servers of a cgroup are throttled, work is pending, and any one of
the servers is replenished, it may happen that the runqueue is empty and thus
the replenished server is immediately shut down.

The server may try to pull a task so that the cgroup could consume its
allocated runtime as soon as it is replenished.
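
The "pull before giving up" idea described above can be sketched in user space
as follows. This is only a toy model under stated assumptions: toy_queue,
toy_pull and toy_pick are hypothetical names, tasks are plain integers, and
the pull takes any task from any sibling queue, whereas the real pull path has
its own eligibility rules (priorities, affinity, locking) that are not modeled
here.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_CAP 8

/* Hypothetical per-CPU queue of one cgroup: a small stack of task ids. */
struct toy_queue {
	int tasks[TOY_QUEUE_CAP];
	size_t nr;
};

/* Try to move one task from any sibling queue into queue 'dst'.
 * Returns 1 if a task was moved, 0 if every sibling was empty. */
static int toy_pull(struct toy_queue *queues, size_t nr_queues, size_t dst)
{
	for (size_t i = 0; i < nr_queues; i++) {
		if (i == dst || queues[i].nr == 0)
			continue;
		assert(queues[dst].nr < TOY_QUEUE_CAP);
		queues[dst].tasks[queues[dst].nr++] =
			queues[i].tasks[--queues[i].nr];
		return 1;
	}
	return 0;
}

/* Pick a task id for queue 'cpu'. If the local queue is empty, try one
 * pull first and re-check; return -1 only if it is still empty. */
static int toy_pick(struct toy_queue *queues, size_t nr_queues, size_t cpu)
{
	struct toy_queue *q = &queues[cpu];

	if (q->nr == 0) {
		toy_pull(queues, nr_queues, cpu);
		if (q->nr == 0)
			return -1;
	}
	return q->tasks[--q->nr];
}
```

The second emptiness check mirrors the shape of the change: the pick only
reports "nothing to run" after the pull attempt has also come back empty.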
Signed-off-by: Yuri Andriaccio
---
 kernel/sched/rt.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e6b3efa358d3..4553a139398f 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -155,8 +155,14 @@ static struct task_struct *rt_server_pick(struct sched_dl_entity *dl_se, struct
 	struct rq *rq = rq_of_rt_rq(rt_rq);
 	struct task_struct *p;

-	if (!sched_rt_runnable(dl_se->my_q))
-		return NULL;
+	if (!sched_rt_runnable(dl_se->my_q)) {
+		rq_unpin_lock(rq, rf);
+		group_pull_rt_task(rt_rq);
+		rq_repin_lock(rq, rf);
+
+		if (!sched_rt_runnable(dl_se->my_q))
+			return NULL;
+	}

 	p = rt_task_of(pick_next_rt_entity(rt_rq));
 	set_next_task_rt(rq, p, true);
-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 26/29] sched/core: Execute enqueued balance callbacks after migrate_disable_switch
Date: Thu, 30 Apr 2026 23:38:30 +0200
Message-ID: <20260430213835.62217-27-yurand2000@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Execute balance callbacks after migrate_disable_switch.

Balancing may be requested on the __schedule path, in migrate_disable_switch,
when the running task is throttled and then pushed away from its runqueue.

Signed-off-by: Yuri Andriaccio
---
 kernel/sched/core.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 24ffe933527c..03bd86cc8d4f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2352,6 +2352,9 @@ do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx);

 static void migrate_disable_switch(struct rq *rq, struct task_struct *p)
 {
+	struct rq_flags rf;
+	struct balance_callback *head;
+
 	struct affinity_context ac = {
 		.new_mask = cpumask_of(rq->cpu),
 		.flags = SCA_MIGRATE_DISABLE,
@@ -2363,8 +2366,13 @@ static void migrate_disable_switch(struct rq *rq, struct task_struct *p)
 	if (p->cpus_ptr != &p->cpus_mask)
 		return;

-	scoped_guard (task_rq_lock, p)
-		do_set_cpus_allowed(p, &ac);
+	rq = task_rq_lock(p, &rf);
+
+	do_set_cpus_allowed(p, &ac);
+
+	head = splice_balance_callbacks(rq);
+	task_rq_unlock(rq, p, &rf);
+	balance_callbacks(rq, head);
 }

 void ___migrate_enable(void)
-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
bh=pyNLbChDoP4yqIanENhX/rdxWxugqOoLoiLheFKLBbo=; b=S4i92wjMpcm05yYOj2/jSg27AryfZ0yUQs4yw8Yew5BFwuJR4zMVAHwFLpEHZZ6JhS Fxln2v1K9Y0vMWUD3CZuPVRPNN/j5qRt5Nhhfh57o1PP+QM3pW44CeBKSxUpoNIW4wDq 9M9Vxl8tklk0lE9XnUS/H/H/nI+6RzYluamw8WaOXRove+vQ6oNq4wq6jyM8QvYFMI/L e1gaRllzW44FkpNQCci/KCu3TEdaIwe6U/DfJZ/0hFl+aH84PQylftO4pwB5IDCUaQ77 0PZH9nglvHmSqy4wTpwG2DcAd0Qm57IqmaIfJRXpCD9LYYsnEjdzQazJoou1sqsL+0C5 6uMg== X-Gm-Message-State: AOJu0YzdZVGz3jgps2LAuza9W8IXx7uyHXxSrJvE9tdLNWmiPNvrDlkV 0+q91V0fV4uR1zaKkYOXPkM6hIBQIgruekL6/wxFGcVzSRu88iLPeJgS X-Gm-Gg: AeBDievtdq+LCuR7zyol4ydN0vsf8zf5YyjN4iN2IdUHXaRxT0FdvB8HrCSQUOIUIQn U2Y1LEMFpzZOMONMQHZifazz2SvLsOQqKsSuLZP4pgVNVoLc/N3OZrjNJ/PWDUHyb7gZs0PLKrP /p/wIGv4DvEXSMVrdyK5XC08RMl5lWS2GGuAqR7G37HSws7zHNPjMdDD+JD6LzA95a42XPQB90Z rfyDcjJM35f1ypCM1Dy9W88quuRfgkNLwrogpc9rNsOC4gH9yYkvv1eaK+NgPkfoHZ/N7fVaNcS u48NaUuTfTbl31yowA4EHFTm/q/MamqRHX4PgCF4SJxpQ2G04KT5S7lbtexHZBbYfUu2bvNFk0l hbASbwN+JARnvZ2alDh6bpjqINtpNRDYncygXlpCJfaS3KH13wYM8OZeJU7SMOYO65nQmMh77kK 3oybv8212vKbf9xnUTudBp7EDDb3Ab/s+YLuI8VUV3 X-Received: by 2002:a05:600c:26d2:b0:48a:5236:7f38 with SMTP id 5b1f17b1804b1-48a8607970fmr45429585e9.14.1777585164093; Thu, 30 Apr 2026 14:39:24 -0700 (PDT) Received: from yuri-framework13 ([78.211.51.156]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-44a9879ef89sm418510f8f.30.2026.04.30.14.39.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Apr 2026 14:39:23 -0700 (PDT) From: Yuri Andriaccio To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider Cc: linux-kernel@vger.kernel.org, Luca Abeni , Yuri Andriaccio Subject: [RFC PATCH v5 27/29] Documentation: Update documentation for real-time cgroups Date: Thu, 30 Apr 2026 23:38:31 +0200 Message-ID: <20260430213835.62217-28-yurand2000@gmail.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com> References: 
<20260430213835.62217-1-yurand2000@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Update the RT_GROUP_SCHED specific documentation. Give a brief theoretical background for Hierarchical Constant Bandwidth Server (HCBS). Document how the HCBS is implemented in the kernel and how the RT_GROUP_SCHED behaves now compared to the version which this patchset replaces. Signed-off-by: Yuri Andriaccio --- Documentation/scheduler/sched-rt-group.rst | 504 +++++++++++++++++---- 1 file changed, 428 insertions(+), 76 deletions(-) diff --git a/Documentation/scheduler/sched-rt-group.rst b/Documentation/sch= eduler/sched-rt-group.rst index ab464335d320..eb2a9235fb00 100644 --- a/Documentation/scheduler/sched-rt-group.rst +++ b/Documentation/scheduler/sched-rt-group.rst @@ -53,9 +53,12 @@ CPU time is divided by means of specifying how much time= can be spent running in a given period. We allocate this "run time" for each real-time group wh= ich the other real-time groups will not be permitted to use. -Any time not allocated to a real-time group will be used to run normal pri= ority -tasks (SCHED_OTHER). Any allocated run time not used will also be picked u= p by -SCHED_OTHER. +Each real-time group runs at the same priority as SCHED_DEADLINE, thus they +share and contend the SCHED_DEADLINE allowed bandwidth. Any time not alloc= ated +to a real-time group (and SCHED_DEADLINE tasks) will be used to run both +SCHED_FIFO/SCHED_RR, normal priority tasks (SCHED_OTHER), and SCHED_EXT ta= sks, +following the usual priorities. Any allocated run time not used will also = be +picked up by the other scheduling classes, in the same order as before. Let's consider an example: a frame fixed real-time renderer must deliver 25 frames a second, which yields a period of 0.04s per frame. 
Now say it will also
@@ -73,10 +76,6 @@ The remaining CPU time will be used for user input and other tasks. Because
 real-time tasks have explicitly allocated the CPU time they need to perform
 their tasks, buffer underruns in the graphics or audio can be eliminated.

-NOTE: the above example is not fully implemented yet. We still
-lack an EDF scheduler to make non-uniform periods usable.
-
-
 2. The Interface
 ================

@@ -86,40 +85,92 @@ lack an EDF scheduler to make non-uniform periods usable.

 The system wide settings are configured under the /proc virtual file system:

-/proc/sys/kernel/sched_rt_period_us:
+``/proc/sys/kernel/sched_rt_period_us``:
   The scheduling period that is equivalent to 100% CPU bandwidth.

-/proc/sys/kernel/sched_rt_runtime_us:
-  A global limit on how much time real-time scheduling may use. This is always
-  less or equal to the period_us, as it denotes the time allocated from the
-  period_us for the real-time tasks. Without CONFIG_RT_GROUP_SCHED enabled,
-  this only serves for admission control of deadline tasks. With
-  CONFIG_RT_GROUP_SCHED=y it also signifies the total bandwidth available to
-  all real-time groups.
+``/proc/sys/kernel/sched_rt_runtime_us``:
+  A global limit on how much time real-time scheduling may use (SCHED_DEADLINE
+  tasks + real-time groups). This is always less or equal to the period_us, as
+  it denotes the time allocated from the period_us for the real-time tasks.
+  Without **CONFIG_RT_GROUP_SCHED** enabled, this only serves for admission
+  control of deadline tasks. With **CONFIG_RT_GROUP_SCHED=y** it also signifies
+  the total bandwidth available to both real-time groups and deadline tasks.

   * Time is specified in us because the interface is s32. This gives an
     operating range from 1us to about 35 minutes.
-  * sched_rt_period_us takes values from 1 to INT_MAX.
-  * sched_rt_runtime_us takes values from -1 to sched_rt_period_us.
-  * A run time of -1 specifies runtime == period, ie. no limit.
-  * sched_rt_runtime_us/sched_rt_period_us > 0.05 inorder to preserve
-    bandwidth for fair dl_server. For accurate value check average of
-    runtime/period in /sys/kernel/debug/sched/fair_server/cpuX/
-
-
-2.2 Default behaviour
----------------------
-
-The default values for sched_rt_period_us (1000000 or 1s) and
-sched_rt_runtime_us (950000 or 0.95s). This gives 0.05s to be used by
-SCHED_OTHER (non-RT tasks). These defaults were chosen so that a run-away
-real-time tasks will not lock up the machine but leave a little time to recover
-it. By setting runtime to -1 you'd get the old behaviour back.
-
-By default all bandwidth is assigned to the root group and new groups get the
-period from /proc/sys/kernel/sched_rt_period_us and a run time of 0. If you
-want to assign bandwidth to another group, reduce the root group's bandwidth
-and assign some or all of the difference to another group.
+  * ``sched_rt_period_us`` takes values from 1 to INT_MAX.
+  * ``sched_rt_runtime_us`` takes values from -1 to ``sched_rt_period_us``.
+  * A run time of -1 specifies runtime == period, i.e., no limit, but also
+    disables admission tests for SCHED_DEADLINE.
+
+The default value for both ``sched_rt_period_us`` and ``sched_rt_runtime_us``
+is 1000000 (1s). Fair-servers and ext-servers have a default runtime of 50ms
+and a default period of 1s. This leaves a minimum of 0.05s to be used by
+SCHED_FIFO/SCHED_RR and non-RT tasks (SCHED_OTHER, SCHED_EXT), and a maximum
+of 0.95s to be used by SCHED_DEADLINE tasks and, if enabled, by
+rt-cgroups.
+
+2.2 Cgroup settings
+-------------------
+
+Enabling **CONFIG_RT_GROUP_SCHED** lets you explicitly allocate real CPU
+bandwidth to task groups.
+
+This uses the cgroup virtual file system and the CPU controller for cgroups.
+Enabling the controller for the hierarchy creates two files:
+
+* ``<cgroup>/cpu.rt_period_us``, the scheduling period of the group.
+* ``<cgroup>/cpu.rt_runtime_us``, the maximum runtime each CPU will provide
+  every period.
+
+  .. tip::
+     For more information on working with control groups, you should read
+     *Documentation/admin-guide/cgroup-v1/cgroups.rst* as well.
+  ..
+
+By default the root cgroup has the same period as
+``/proc/sys/kernel/sched_rt_period_us``, which is 1s, and a runtime of zero, so
+that rt-cgroup is *soft-disabled* by default, and all the runtime is available
+for SCHED_DEADLINE tasks only. New groups instead get both a period and a
+runtime of zero.
+
+2.3 Cgroup Hierarchy and Behaviours
+-----------------------------------
+
+With HCBS, cgroups may act either as task runners or as bandwidth reservations:
+
+* A bandwidth reservation cgroup (such as the root control group) reserves a
+  portion of the total real-time bandwidth for its sub-tree of groups. A group
+  in this state cannot run SCHED_FIFO/SCHED_RR tasks.
+
+  .. important::
+    The *root control group* behaviour is different from the other cgroups, as
+    its job is to reserve bandwidth for the whole group hierarchy, but it can
+    also run rt tasks. This is an exception: FIFO/RR tasks running in the
+    root cgroup follow the same rules as FIFO/RR tasks in a kernel which has
+    **CONFIG_RT_GROUP_SCHED=n**, and the bandwidth reservation is instead a
+    feature connected to HCBS, that acts on the cgroup tree.
+  ..
+
+* A *live* group instead can be used to run FIFO/RR tasks, with the given
+  bandwidth parameters: each CPU is served a *potentially continuous* runtime of
+  ``<cgroup>/cpu.rt_runtime_us`` every period ``<cgroup>/cpu.rt_period_us``.
It
+  is important to note that increasing the period but leaving the bandwidth
+  constant changes the behaviour of the cgroup's servers, as the bandwidth given
+  overall is the same, but it is given in longer bursts (and longer slices of no
+  bandwidth).
+
+More specifically on *live* and non-*live*:
+
+* A group is deemed *live* if it is a leaf of the groups' hierarchy or all of
+  its children have runtime 0.
+* *Live* groups are the only groups allowed to run real-time tasks. A
+  SCHED_FIFO/SCHED_RR task cannot be migrated into a non-*live* group, nor can
+  a task inside such a group change its scheduling policy to
+  SCHED_FIFO/SCHED_RR.
+* Non-*live* groups are only used for bandwidth reservation.
+* Groups' bandwidths follow this invariant: the sum of the bandwidths of a
+  group's children is always less than or equal to the group's bandwidth.

 Real-time group scheduling means you have to assign a portion of total CPU
 bandwidth to the group before it will accept real-time tasks. Therefore you will
@@ -128,63 +179,364 @@ done that, even if the user has the rights to run processes with real-time
 priority!


-2.3 Basis for grouping tasks
-----------------------------
+3. Theoretical Background
+=========================
+
+
+ .. BIG FAT WARNING ******************************************************
+
+ .. warning::
+
+   This section contains a (not-thorough) summary on deadline/hierarchical
+   scheduling theory, and how it applies to real-time control groups.
+   The reader can "safely" skip to Section 4 if only interested in seeing
+   how the scheduling policy can be used. Anyway, we strongly recommend
+   to come back here and continue reading (once the urge for testing is
+   satisfied :P) to be sure of fully understanding all technical details.
+
+ ..
************************************************************************
+
+The real-time cgroup scheduler is based upon the **Hierarchical Constant
+Bandwidth Server** (HCBS) [1] *Compositional Scheduling Framework* (CSF). A
+**CSF** is a framework where global (system-level) timing properties can be
+established by composing independently (specified and) analyzed local
+(component-level) timing properties [5].
+
+For HCBS (related to the Linux kernel), the compositional framework consists of
+two parts:
+
+* The *scheduling components*, which are the basic units of the scheduling. In
+  the kernel these are the single cgroups along with the tasks that must be run
+  inside.
+
+* The *scheduling resources*, which are the CPUs of the machine.
+
+HCBS is a *hierarchical scheduling framework*, where the scheduling components
+form a hierarchy and resources are allocated from parent components to their
+child components in the hierarchy.
+
+This chapter is organized as follows: **Section 3.1** gives basic real-time
+theory definitions that are used throughout the whole section. **Section 3.2**
+talks about the HCBS framework, giving a general idea on how this is structured.
+**Section 3.3** introduces the MPR model, one of the many models which may be
+used for the analysis of the scheduling components and the computation of the
+minimum required scheduling resources for a given component. **Section 3.4**
+shows the schedulability test for MPR on the HCBS framework. **Section 3.5**
+shows how to convert an MPR interface to an HCBS-compatible resource reservation
+for a component. Finally, **Section 3.6** lists other interesting models which
+could be used for the component analysis in HCBS.
+
+3.1 Basic Definitions
+---------------------
+
+*We borrow the same definitions given in the* ``sched_deadline`` *document, which
+are very briefly summarized here, and new ones, needed by the following content,
+are added.*
+
+A typical real-time task is composed of a repetition of computation phases (task
+instances, or jobs) which are activated in a periodic or sporadic fashion. For
+our purposes, real-time tasks are characterized by three parameters:
+
+* Worst Case Execution Time (WCET, C): the maximum execution time among all jobs.
+* Relative Deadline (D): the time by which each job must be completed, relative
+  to the release time of the job.
+* Inter-Arrival Period (P): the exact/minimum (for periodic/sporadic tasks) time
+  between each consecutive job.
+
+3.2 Hierarchical Constant Bandwidth Server (HCBS) [1]
+-----------------------------------------------------
+
+As mentioned, HCBS is a *hierarchical scheduling framework*:
+
+* The framework hierarchy follows the same hierarchy of cgroups. Cgroups may
+  have two roles, either bandwidth reservation for children cgroups, or they may
+  be *live*, i.e. run tasks (but not both). The root cgroup, for the kernel's
+  implementation of HCBS, acts only as bandwidth reservation (but as written in
+  this document it also has different uses outside of the hierarchical
+  framework).
+* The cgroup tree is internally flattened, for ease of scheduling, to a
+  two-level hierarchy, since only the *live* groups are of interest and all the
+  necessary information for their scheduling lies in their interface (there is
+  no need for the reservation components).
+* The hierarchical framework, now on two levels, consists then of a first level
+  of cgroups, and a second level of tasks that are run inside these groups.
+* The scheduling of components is performed using global Earliest Deadline First
+  (gEDF), SCHED_DEADLINE in the kernel, following the bandwidth reservation of
+  each group.
+* Whenever a component is scheduled, a local scheduler picks which of the tasks
+  of the cgroup to run. The scheduling policy is global Fixed Priority (gFP),
+  SCHED_FIFO/SCHED_RR in the kernel.
+
+3.3 Multiprocessor Periodic Resource (MPR) model
+------------------------------------------------
+
+A Multiprocessor Periodic Resource (MPR) model [2] **u = <Pi, Theta, m'>**
+specifies that an identical, unit-capacity multiprocessor platform collectively
+provides **Theta** units of resource every **Pi** time units, where the
+**Theta** time units are supplied with concurrency at most **m'**.
+
+This theoretical model is one of the many models that can abstract the
+interface of our real-time cgroups: let **m'** be the number of CPUs of the
+machine, let **Theta** be **m' * <cgroup>/cpu.rt_runtime_us** and **Pi** be
+**<cgroup>/cpu.rt_period_us**.

-Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real
-CPU bandwidth to task groups.
+Let's introduce the concept of Supply Bound Function (SBF). An SBF is a function
+which outputs a lower bound for the processor supply provided in a given time
+interval, given a resource supply model. For a completely dedicated CPU, the SBF
+function is simply the identity function, as it will always provide **t** units
+of computation for an interval of length **t**. The situation gets slightly more
+complicated for the MPR model or any of the other models listed in Section 3.6.

-This uses the cgroup virtual file system and "<cgroup>/cpu.rt_runtime_us"
-to control the CPU time reserved for each control group.
+The **SBF(t)** for an MPR model **u = <Pi, Theta, m'>** is::

-For more information on working with control groups, you should read
-Documentation/admin-guide/cgroup-v1/cgroups.rst as well.
+               | 0                                          if t' < 0
+               |
+   SBF_u(t) =  | floor(t' / Pi) * Theta
+               |   + max(0, m' * x - (m' * Pi - Theta))     if t' >= 0 and 1 <= x <= y
+               |
+               | floor(t' / Pi) * Theta
+               |   + max(0, m' * x - (m' * Pi - Theta))     else
+               |   - (m' - beta)

-Group settings are checked against the following limits in order to keep the
-configuration schedulable:
+where::

-   \Sum_{i} runtime_{i} / global_period <= global_runtime / global_period
+   alpha = floor(Theta / m')
+   beta  = Theta - m' * alpha
+   t'    = t - (Pi - ceil(Theta / m'))
+   x     = t' - (Pi * floor(t' / Pi))
+   y     = Pi - floor(Theta / m')

-For now, this can be simplified to just the following (but see Future plans):
+Briefly, this function models that the server's bandwidth is given as late as
+possible, so describing the worst case possible for the supplied bandwidth.

-   \Sum_{i} runtime_{i} <= global_runtime
+3.4 Schedulability for MPR on global Fixed-Priority
+---------------------------------------------------

+Let's introduce the concept of Demand Bound Function (DBF). A DBF is a function
+that, given a taskset, a scheduling algorithm and an interval of time, outputs
+the worst resource demand for that interval of time.

+It is easy to see that, given a DBF and a SBF, we can deem a component/taskset
+schedulable if, for every time interval t >= 0, it is possible to demonstrate
+that:
-3. Future plans
-===============

+   DBF(t) <= SBF(t)
-There is work in progress to make the scheduling period for each group
-("<cgroup>/cpu.rt_period_us") configurable as well.

-The constraint on the period is that a subgroup must have a smaller or
-equal period to its parent. But realistically its not very useful _yet_
-as its prone to starvation without deadline scheduling.
+We have the Supply Bound Function for our given MPR model, so we are missing the
+Demand Bound Function for a given taskset that is being scheduled using global
+Fixed Priority.
-Consider two sibling groups A and B; both have 50% bandwidth, but A's
-period is twice the length of B's.
+3.4.1 Schedulability Analysis for global Fixed Priority
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-* group A: period=100000us, runtime=50000us
+Bertogna, Cirinei and Lipari [6] have derived a schedulability test for global
+Fixed Priority (gFP) on multi-processor platforms. In this test (called
+*BCL_gFP* test) we can consider all the CPUs to be dedicated to the scheduling.

-	- this runs for 0.05s once every 0.1s
+  A taskset **Tau** is schedulable with gFP on a multiprocessor platform
+  composed of **m'** identical processors if for each task **tau_k in Tau**:

-* group B: period= 50000us, runtime=25000us
+    Sum(for i < k)( min(W_i(D_k), D_k - C_k + 1) ) < m' * (D_k - C_k + 1)

-	- this runs for 0.025s twice every 0.1s (or once every 0.05 sec).
+  where **W_i(t)** is the workload of task **tau_i** over a time interval **t**:

-This means that currently a while (1) loop in A will run for the full period of
-B and can starve B's tasks (assuming they are of lower priority) for a whole
-period.
+    W_i(t) = N_i(t) * C_i + min(C_i, t + D_i - C_i - N_i(t) * P_i)

-The next project will be SCHED_EDF (Earliest Deadline First scheduling) to bring
-full deadline scheduling to the linux kernel. Deadline scheduling the above
-groups and treating end of the period as a deadline will ensure that they both
-get their allocated time.
+  and **N_i(t)** is the number of activations of task **tau_i** that complete in
+  a time interval **t**:

-Implementing SCHED_EDF might take a while to complete. Priority Inheritance is
-the biggest challenge as the current linux PI infrastructure is geared towards
-the limited static priority levels 0-99. With deadline scheduling you need to
-do deadline inheritance (since priority is inversely proportional to the
-deadline delta (deadline - now)).
+    N_i(t) = floor( (t + D_i - C_i) / P_i )
+
+  while the **min** term is the contribution of the carried-out job in the
+  interval **t**, i.e. that job that does not completely fit in the interval
+  **t**, but starts inside the interval after all the jobs that complete.
+
+3.4.2 From BCL_gFP to the Demand Bound Function
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+We can then derive the DBF from this test:
+
+  DBF_gFP(tau_k) = Sum(for i < k)( min(W_i(D_k), D_k - C_k + 1) ) + m' * (C_k - 1)
+
+Briefly, the first sum component, the same in the BCL_gFP test, describes the
+maximum interference that higher priority tasks give to the analysed task. The
+workload is upper-bounded by ``(D_k - C_k + 1)`` because we are only interested
+in the interference in the slack time, while for the ``C_k`` time we are
+requiring that all the CPUs are fully available, as the single job needs ``C_k``
+(non overlapping) time units to run.
+
+The demand bound function from Bertogna et al. is only defined on a single time
+(i.e. the deadline of the task in analysis) instead of all possible times as
+this is the minimum argument to demonstrate schedulability on global Fixed
+Priority.
+
+3.4.3 Putting it all together
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A component **C**, on **m'** processors, running a taskset **Tau = { tau_1 =
+(C_1, D_1, P_1), ..., tau_n = (C_n, D_n, P_n) }** of **n** sporadic tasks, is
+schedulable under gFP using an MPR model **u = <Pi, Theta, m'>**, if for all
+tasks **tau_k in Tau**:
+
+  DBF_gFP(tau_k) <= SBF_u(D_k)
+
+3.5 From MPR to deadline servers
+--------------------------------
+
+Since there exists no algorithm to schedule MPR interfaces, a technique was
+developed to transform MPR interfaces into periodic tasks, so that a
+number of periodic servers which respect the tasks' requirements can be used for
+the scheduling of the MPR interface and associated tasks.
+
+Let **u = <Pi, Theta, m'>** be an MPR interface, let **a = Theta - m' *
+floor(Theta / m')**, let **k = floor(a)**. Define a transformation from **u** to
+a periodic taskset **Tau_u = { tau_1 = (C_1, D_1, P_1), ..., tau_m' = (C_m',
+D_m', P_m') }**, where:
+
+  **tau_1 = ... = tau_k = (floor(Theta / m') + 1, Pi, Pi)**
+
+  **tau_k+1 = (floor(Theta / m') + a - k * floor(a/k), Pi, Pi)**
+
+  **tau_k+2 = ... = tau_m' = (floor(Theta / m'), Pi, Pi)**
+
+This periodic taskset of servers **Tau_u** can be scheduled on any number of
+processors with concurrency at most **m'**.
+
+For real-time control groups, it is possible to just consider a slightly more
+demanding taskset **Tau_u'**, where each task **tau_i** is defined as follows:
+
+  **tau_i = (ceil(Theta / m'), Pi, Pi)**
+
+3.6 Other models
+----------------
+
+There exist many other theoretical models in literature which are used to
+describe a hierarchical scheduling framework on multi-core architectures.
+Notable examples are the Multi Supply Function (MSF) abstraction [3], the
+Parallel Supply Function (PSF) abstraction [4] and the Bounded Delay
+Multipartition (BDM) [7].
+
+3.7 References
+--------------
+  1 - L. Abeni, A. Balsini, and T. Cucinotta, “Container-based real-time
+      scheduling in the Linux kernel,” SIGBED Rev., vol. 16, no. 3, pp. 33-38,
+      Nov. 2019, doi: 10.1145/3373400.3373405.
+  2 - A. Easwaran, I. Shin, and I. Lee, “Optimal virtual cluster-based
+      multiprocessor scheduling,” Real-Time Syst, vol. 43, no. 1, pp. 25-59,
+      Sept. 2009, doi: 10.1007/s11241-009-9073-x.
+  3 - E. Bini, G. Buttazzo, and M. Bertogna, “The Multi Supply Function
+      Abstraction for Multiprocessors,” in 2009 15th IEEE International
+      Conference on Embedded and Real-Time Computing Systems and Applications,
+      Aug. 2009, pp. 294-302. doi: 10.1109/RTCSA.2009.39.
+  4 - E. Bini, M. Bertogna, and S. K.
Baruah, “The Parallel Supply Function
+      Abstraction for a Virtual Multiprocessor,” in Scheduling, S. Albers, S. K.
+      Baruah, R. H. Möhring, and K. Pruhs, Eds., in Dagstuhl Seminar Proceedings
+      (DagSemProc), vol. 10071. Dagstuhl, Germany: Schloss Dagstuhl -
+      Leibniz-Zentrum für Informatik, 2010, pp. 1-14. doi:
+      10.4230/DagSemProc.10071.14.
+  5 - I. Shin and I. Lee, “Compositional real-time scheduling framework,” in
+      25th IEEE International Real-Time Systems Symposium, Dec. 2004, pp. 57-67.
+      doi: 10.1109/REAL.2004.15.
+  6 - M. Bertogna, M. Cirinei, and G. Lipari, “Schedulability Analysis of Global
+      Scheduling Algorithms on Multiprocessor Platforms,” IEEE Transactions on
+      Parallel and Distributed Systems, vol. 20, no. 4, pp. 553-566, Apr. 2009,
+      doi: 10.1109/TPDS.2008.129.
+  7 - G. Lipari and E. Bini, “A Framework for Hierarchical Scheduling on
+      Multiprocessors: From Application Requirements to Run-Time Allocation,” in
+      2010 31st IEEE Real-Time Systems Symposium, Nov. 2010, pp. 249-258. doi:
+      10.1109/RTSS.2010.12.
+
+
+4. Using Real-Time cgroups
+==========================
+
+4.1 CGroup Setup
+----------------

-This means the whole PI machinery will have to be reworked - and that is one of
-the most complex pieces of code we have.
+The following is a brief guide to the use of Real-Time Control Groups.
+
+Of course, real-time control groups require mounting of the cgroup file system.
+We have decided to only support cgroups v2, so make sure you mount the v2
+controller for the cgroup hierarchy.
+
+Additionally the real-time cgroups require the CPU controller for the cgroups to
+be enabled::
+
+  # Assume the cgroup file system is mounted at /sys/fs/cgroup
+  > echo "+cpu" > /sys/fs/cgroup/cgroup.subtree_control
+
+The CPU controller can only be enabled if there is no SCHED_FIFO/SCHED_RR task
+scheduled in any cgroup other than the root control group.
+
+The root control group has no bandwidth allocated by default, so make sure to
+allocate some bandwidth so that it can be used by the other cgroups. More on
+that in the following section...
+
+4.2 Bandwidth Allocation for groups
+-----------------------------------
+
+Allocating bandwidth to a cgroup is a fundamental step to run real-time
+workloads. The cgroup filesystem exposes two files:
+
+* ``<cgroup>/cpu.rt_runtime_us``: which specifies the cgroup's runtime in
+  microseconds.
+* ``<cgroup>/cpu.rt_period_us``: which specifies the cgroup's period in
+  microseconds.
+
+Both files are readable and writable, and their default value is zero. By
+definition, the specified runtime must always be less than or equal to the
+period. Additionally, an admission test checks if the bandwidth invariant is
+respected (i.e. sum of children's bandwidth <= parent's bandwidth).
+
+The root control group files instead control and reserve the SCHED_DEADLINE
+bandwidth allocated to real-time cgroups, since real-time groups compete and
+share the same bandwidth allocated to SCHED_DEADLINE tasks.
+
+4.3 Running real-time tasks in groups
+-------------------------------------
+
+To run tasks in real-time groups it is just necessary to change a task's
+scheduling policy to SCHED_FIFO/SCHED_RR and migrate it into the group.
If the
+group is not allowed to run real-time tasks because of incorrect configuration,
+either migrating a SCHED_FIFO/SCHED_RR task into the group or changing the
+scheduling policy of a task already inside the group will fail::
+
+  # assume there is a task of PID 42 running
+  # change its scheduling policy to SCHED_FIFO, priority 99
+  > chrt -f -p 99 42
+
+  # migrate the task to a cgroup
+  > echo 42 > /sys/fs/cgroup/<cgroup>/cgroup.procs
+
+4.4 Special case: the root control group
+----------------------------------------
+
+The root cgroup is special, compared to the other cgroups, as its tasks are not
+managed by the HCBS algorithm, rather they just use the original
+SCHED_FIFO/SCHED_RR policies (as if CONFIG_RT_GROUP_SCHED was disabled). As
+mentioned, its bandwidth files are just used to control how much of the
+SCHED_DEADLINE bandwidth is allocated to cgroups.
+
+4.5 Guarantees and Special Behaviours
+-------------------------------------
+
+Real-time cgroups are run at the same priority level as SCHED_DEADLINE tasks.
+Since this is the highest priority scheduling policy, and since the Constant
+Bandwidth Server (CBS) enforces that the specified bandwidth requirements for
+both groups and tasks cannot be overrun, real-time groups have the same
+guarantees that SCHED_DEADLINE tasks have, i.e. they are guaranteed to be
+supplied with the amount of bandwidth requested (whenever the admission tests
+pass).
+
+This means that, since SCHED_FIFO/SCHED_RR tasks (scheduled in the root control
+group) are not subject to bandwidth controls, they are run at a lower priority
+than the cgroups' counterparts. Nonetheless, a minimum amount of bandwidth, if
+reserved, will always be available to run SCHED_FIFO/SCHED_RR workloads in the
+root cgroup, while they will be able to use more runtime if any of the
+SCHED_DEADLINE tasks or servers use less than their specified amount of
+bandwidth.
SCHED_OTHER tasks are instead scheduled as normal, at lower priority
+than real-time workloads.
+
+The aforementioned behaviour differs from the preceding RT_GROUP_SCHED
+implementation, but this is necessary to give actual guarantees to the amount of
+bandwidth given to rt-cgroups.
\ No newline at end of file
-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 28/29] sched/rt: Add debug BUG_ONs for pre-migration code
Date: Thu, 30 Apr 2026 23:38:32 +0200
Message-ID: <20260430213835.62217-29-yurand2000@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>

Add debug BUG_ONs in rt_queue_push/pull_task(s). Can be safely added after all
the pre-migration patches. These are extra asserts which are only useful to
debug the kernel code and are not meant to be part of the final patchset.

Signed-off-by: Yuri Andriaccio
---
 kernel/sched/rt.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4553a139398f..6cecda2ce812 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -340,6 +340,9 @@ static inline void rt_queue_push_tasks(struct rt_rq *rt_rq)
 {
 	struct rq *rq = served_rq_of_rt_rq(rt_rq);

+	BUG_ON(rt_rq == NULL);
+	BUG_ON(rq != cpu_rq(rq->cpu));
+
 	if (!has_pushable_tasks(rt_rq))
 		return;

@@ -350,6 +353,9 @@ static inline void rt_queue_pull_task(struct rt_rq *rt_rq)
 {
 	struct rq *rq = served_rq_of_rt_rq(rt_rq);

+	BUG_ON(rt_rq == NULL);
+	BUG_ON(rq != cpu_rq(rq->cpu));
+
 	queue_balance_callback(rq, &per_cpu(rt_pull_head, rq->cpu), pull_rt_task);
 }

-- 
2.53.0

From nobody Tue Jun 16 15:55:05 2026
 dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=DA0bL13t; arc=none smtp.client-ip=209.85.221.43
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="DA0bL13t"
Received: by mail-wr1-f43.google.com with SMTP id ffacd0b85a97d-43eb012ac4fso838518f8f.0 for ; Thu, 30 Apr 2026 14:39:28 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1777585167; x=1778189967; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=xQO/KziHXiXlUh9U2VuvhVYN2sshdGCBqM9ngH/VryU=; b=DA0bL13tdqxELbXITc5iLZcLbesMfAKxqt8Kwi0isYtnVegE0zX8fNRMwlOJMsKQ5t MvjFZ5zfDqh0ALtcaD7EgwMMKX7zfgQrciRh6/GI5QwhThswPnRXUu0aBYsCGIXLSSXg VEoFmXy8IyknNygjgvamROzBVfqSuWZMBi28+fSuL2Tf0LkP16ta9ivNChTr4PxAQUnm 4EQNMy/hI8XN0b+t5SMNkrYfEQZHdDbVZ0ClVcKsi6n0lGrWqu0rtRuo1Nl2BCDjJI7v KpTs0s5JiAltfJnKO4aOeVIUHupQ5PXzLsoOv7bUFm0OiYde6YZe3JnPPiz9eA8bs8Fg ndUg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1777585167; x=1778189967; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=xQO/KziHXiXlUh9U2VuvhVYN2sshdGCBqM9ngH/VryU=; b=QGSaQ4AwtHfrlBXdKAZaVW1s4RfbXImcj5dnLer7Eny4kABkoXfvft4yL4GHD8uuAI If5N+XIe+Sd80IUa+Mi5WLJ/+mp7UbCgfwacrHlTiE0/jZ4w8ZCS8GSt1fSDfvcQFwga Fbya9742Z9QqrtLNOBDmu+VmFpdNWbr4uQGvokBCOoU1qccDuPA20uxrU0fiR6Ej+zIl 2oPfPSW6RTapNhHZZ8ONZ+FQ10mh4F40D+8dW79wHUYtrq3GGQueJKNC4vGLrD+zmGnU q/e0gbj6dcid2kru++b+S+cTny9vGseebIojM4gjH0muFW1YPXSNJE5tNfnKMzdGFllS o46w==
X-Gm-Message-State: AOJu0YxqItcBH80fgekURz+MixmwhnVK8WqXmLnpOeZqNh6zyMczRy7M 28Rdo9jw9Uent77wkgSOtfM+bKU1tFQ+RuPQ0KnSIIQlceWe74QB33uL
X-Gm-Gg: AeBDievDekN1SPVO5f0VBO8ifd16vDpCYV+mTbUkmaIW9eCeVVpMwboJOFSc6/VzOdr 1T6AhI9Ks9tCKzKL2NB8mRnnXfVe0I0J0kbD/deBcW81HoalN4EnngeEg0tzSCAdS4HnMKgvDKg gQ1/FFTCv63TysecIyrJqkh+jEUdMXdyefbx/QFtOsHv+yljWLqUZxphsSmpajvd8bTNPHaLhMg 36qNMCh7IMldLnhO1bZ2KXoScue3WQb3IFS6eVLlZuv/h37CKkfPa0iMk0lA/48HouKhKO0KlVG 6M/dPw66G2AyGdwHwrFtRWzPiHKs4bHtIuJaYgeCXYMCpkI2XvJ+Rznwh5AypHvqXfRTd1eKijQ aHqoweXkoehUubQvUfN34hWX0HdHJ3nLv1nCMQUj9eTtEFZZ7d8HxNpoHIAJvrE3jxVOKIO2F5Y 1JQnl6fDMXjC72boWxct5ixbl73QZnVUQ+3tuLh/Wx
X-Received: by 2002:a05:6000:18a9:b0:43d:73d4:b34 with SMTP id ffacd0b85a97d-4493d4122f2mr8443333f8f.16.1777585167270; Thu, 30 Apr 2026 14:39:27 -0700 (PDT)
Received: from yuri-framework13 ([78.211.51.156]) by smtp.gmail.com with ESMTPSA id ffacd0b85a97d-44a9879ef89sm418510f8f.30.2026.04.30.14.39.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Apr 2026 14:39:26 -0700 (PDT)
From: Yuri Andriaccio
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [RFC PATCH v5 29/29] sched/rt: Add debug BUG_ONs in migration code
Date: Thu, 30 Apr 2026 23:38:33 +0200
Message-ID: <20260430213835.62217-30-yurand2000@gmail.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260430213835.62217-1-yurand2000@gmail.com>
References: <20260430213835.62217-1-yurand2000@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add debug BUG_ONs in group specific migration functions. Can be safely added
after all the migration patches.
These are extra asserts which are only useful to debug the kernel code and are
not meant to be part of the final patchset.

Signed-off-by: Yuri Andriaccio
---
 kernel/sched/rt.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6cecda2ce812..9f938ce84485 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -370,6 +370,9 @@ static void rt_queue_push_from_group(struct rt_rq *rt_rq)
 	struct rq *rq = served_rq_of_rt_rq(rt_rq);
 	struct rq *global_rq = cpu_rq(rq->cpu);
 
+	BUG_ON(rt_rq == NULL);
+	BUG_ON(rq == global_rq);
+
 	if (global_rq->rq_to_push_from)
 		return;
 
@@ -387,6 +390,10 @@ static void rt_queue_pull_to_group(struct rt_rq *rt_rq)
 	struct rq *global_rq = cpu_rq(rq->cpu);
 	struct sched_dl_entity *dl_se = dl_group_of(rt_rq);
 
+	BUG_ON(rt_rq == NULL);
+	BUG_ON(!is_dl_group(rt_rq));
+	BUG_ON(rq == global_rq);
+
 	if (dl_se->dl_throttled || global_rq->rq_to_pull_to)
 		return;
 
@@ -1408,6 +1415,8 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
  */
 static int push_rt_task(struct rq *rq, bool pull)
 {
+	BUG_ON(is_dl_group(&rq->rt));
+
 	struct task_struct *next_task;
 	struct rq *lowest_rq;
 	int ret = 0;
@@ -1709,6 +1718,8 @@ void rto_push_irq_work_func(struct irq_work *work)
 
 static void pull_rt_task(struct rq *this_rq)
 {
+	BUG_ON(is_dl_group(&this_rq->rt));
+
 	int this_cpu = this_rq->cpu, cpu;
 	bool resched = false;
 	struct task_struct *p, *push_task;
@@ -1833,6 +1844,8 @@ static int group_find_lowest_rt_rq(struct task_struct *task, struct rt_rq *task_
 	int prio, lowest_prio;
 	int cpu, this_cpu = smp_processor_id();
 
+	BUG_ON(task->sched_task_group != task_rt_rq->tg);
+
 	if (task->nr_cpus_allowed == 1)
 		return -1; /* No other targets possible */
 
@@ -1931,6 +1944,8 @@ static struct rt_rq *group_find_lock_lowest_rt_rq(struct task_struct *task, stru
 	struct sched_dl_entity *lowest_dl_se;
 	int tries, cpu;
 
+	BUG_ON(task->sched_task_group != rt_rq->tg);
+
 	for (tries = 0; tries < RT_MAX_TRIES; tries++) {
 		cpu = group_find_lowest_rt_rq(task, rt_rq);
 
@@ -1984,6 +1999,8 @@ static struct rt_rq *group_find_lock_lowest_rt_rq(struct task_struct *task, stru
 
 static int group_push_rt_task(struct rt_rq *rt_rq, bool pull)
 {
+	BUG_ON(!is_dl_group(rt_rq));
+
 	struct rq *rq = rq_of_rt_rq(rt_rq);
 	struct task_struct *next_task;
 	struct rq *lowest_rq;
@@ -2103,6 +2120,8 @@ static int group_push_rt_task(struct rt_rq *rt_rq, bool pull)
 
 static void group_pull_rt_task(struct rt_rq *this_rt_rq)
 {
+	BUG_ON(!is_dl_group(this_rt_rq));
+
 	struct rq *this_rq = rq_of_rt_rq(this_rt_rq);
 	int this_cpu = this_rq->cpu, cpu;
 	bool resched = false;
@@ -2215,6 +2234,9 @@ static void group_push_rt_tasks_callback(struct rq *global_rq)
 {
 	struct rt_rq *rt_rq = &global_rq->rq_to_push_from->rt;
 
+	BUG_ON(global_rq->rq_to_push_from == NULL);
+	BUG_ON(served_rq_of_rt_rq(rt_rq) == global_rq);
+
 	if ((rt_rq->rt_nr_running > 1) ||
 	    (dl_group_of(rt_rq)->dl_throttled == 1)) {
 
@@ -2228,6 +2250,9 @@ static void group_pull_rt_task_callback(struct rq *global_rq)
 {
 	struct rt_rq *rt_rq = &global_rq->rq_to_pull_to->rt;
 
+	BUG_ON(global_rq->rq_to_pull_to == NULL);
+	BUG_ON(served_rq_of_rt_rq(rt_rq) == global_rq);
+
 	group_pull_rt_task(rt_rq);
 	global_rq->rq_to_pull_to = NULL;
 }
-- 
2.53.0