From: Yuri Andriaccio <yurand2000@gmail.com>
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider
Cc: linux-kernel@vger.kernel.org, Luca Abeni, Yuri Andriaccio
Subject: [PATCH v4] sched/deadline: Remove fair-servers from real-time task's bandwidth accounting
Date: Wed, 3 Sep 2025 13:44:48 +0200
Message-ID: <20250903114448.664452-1-yurand2000@gmail.com>
X-Mailer: git-send-email 2.51.0

Fair-servers are currently used in place of the old RT_THROTTLING mechanism
to prevent the starvation of SCHED_OTHER (and other lower-priority) tasks
when real-time FIFO/RR processes try to fully utilize the CPU. To support
the RT_THROTTLING mechanism, the maximum allocatable bandwidth for real-time
tasks has been limited to 95% of the CPU time.

The RT_THROTTLING mechanism is now removed in favor of fair-servers, which
are currently set to use, as expected, 5% of the CPU time. Still, they are
accounted against the same bandwidth that allows running real-time tasks,
which is still set to 95% of the total CPU time. This means that, with
RT_THROTTLING removed, the remaining bandwidth for real-time SCHED_DEADLINE
tasks and other dl-servers (FIFO/RR tasks are not affected) is only 90%.

This patch reclaims the lost 5% of CPU time, which remains reserved for
SCHED_OTHER tasks but should not be accounted together with the bandwidth of
real-time tasks. More generally, the fair-servers' bandwidth must not be
accounted together with that of other real-time tasks.

Updates:
- Do not account the fair-servers' bandwidth in the total allocated
  bandwidth for real-time tasks.
- Remove the admission-control test when allocating a fair-server.
- Do not account for fair-servers in GRUB's bandwidth-reclaiming mechanism.
- Limit the maximum bandwidth to (BW_UNIT - max_rt_bw) when changing the
  parameters of a fair-server, preventing overcommitment.
- Update the admission tests (in sched_dl_global_validate) when changing the
  maximum allocatable bandwidth for real-time tasks, preventing
  overcommitment.
- Update the admission tests (in dl_bw_manage) when offlining a CPU.

Since a fair-server's bandwidth can be changed through debugfs, it is not
enforced that it always equals (BW_UNIT - max_rt_bw); it only has to be less
than or equal to that value. This allows fair-server settings changed
through debugfs to be retained when max_rt_bw is changed. A sketch of the
resulting per-CPU condition follows below.
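For illustration only, here is a minimal, self-contained userspace sketch
(not kernel code and not part of this patch) of the per-CPU condition
described above. It assumes the usual sysctl defaults
(sched_rt_runtime_us=950000, sched_rt_period_us=1000000) and a fair-server
configured with 50 ms of runtime per 1 s period; to_ratio() is modeled on
the kernel's fixed-point helper and BW_SHIFT/BW_UNIT mirror the kernel's
scale.

#include <stdint.h>
#include <stdio.h>

#define BW_SHIFT	20
#define BW_UNIT		(1 << BW_SHIFT)

/* Fixed-point runtime/period ratio scaled by BW_UNIT, modeled on the kernel's to_ratio(). */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	if (period == 0)
		return 0;
	return (runtime << BW_SHIFT) / period;
}

int main(void)
{
	/* Assumed defaults: sched_rt_runtime_us = 950000, sched_rt_period_us = 1000000. */
	uint64_t max_rt_bw = to_ratio(1000000ULL, 950000ULL);		/* ~0.95 * BW_UNIT */
	/* Assumed fair-server defaults: 50 ms runtime every 1 s period. */
	uint64_t fair_bw = to_ratio(1000000000ULL, 50000000ULL);	/* ~0.05 * BW_UNIT */

	/* Per-CPU admission condition: rt bandwidth plus fair-server bandwidth must fit in one CPU. */
	int ok = (max_rt_bw + fair_bw <= (uint64_t)BW_UNIT);

	printf("rt_bw=%llu fair_bw=%llu BW_UNIT=%d admissible=%d\n",
	       (unsigned long long)max_rt_bw, (unsigned long long)fair_bw,
	       BW_UNIT, ok);
	return 0;
}

At these assumed defaults the condition holds with essentially no slack,
which is why raising either the rt limit or a fair-server's bandwidth
requires lowering the other side first.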
As a consequence, in order to increase the maximum bandwidth for real-time
tasks, the bandwidth of the fair-servers must first be decreased through
debugfs, otherwise the admission tests will fail; vice versa, to increase
the bandwidth of the fair-servers, the bandwidth of real-time tasks must be
reduced beforehand.

This v4 version removes dl_bw_fair, as it is no longer needed now that each
fair-server's bandwidth is checked individually rather than globally. This
is necessary because different fair-servers can have different runtimes.
The bandwidth assignment is sound only if, on each CPU, rt-bw +
fair-server-bw is less than or equal to 1, rather than computing the total
and checking that it is less than or equal to the number of CPUs. The check
on deadline tasks can instead be done globally (on a root-domain basis), as
dl tasks are allowed to migrate between cores.

This new version fixes the error reported here:
https://lore.kernel.org/all/aLa3zdmyKuRMy3bm@jlelli-thinkpadt14gen4.remote.csb/

v1: https://lore.kernel.org/all/20250721111131.309388-1-yurand2000@gmail.com/
v2: https://lore.kernel.org/all/20250725164412.35912-1-yurand2000@gmail.com/
v3: https://lore.kernel.org/all/20250901113103.601085-1-yurand2000@gmail.com/

Signed-off-by: Yuri Andriaccio <yurand2000@gmail.com>
---
 kernel/sched/deadline.c | 81 +++++++++++++----------------------------
 kernel/sched/sched.h    |  1 -
 kernel/sched/topology.c |  8 ----
 3 files changed, 26 insertions(+), 64 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f25301267e4..35bcd360329 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1659,48 +1659,22 @@ void sched_init_dl_servers(void)
 	}
 }
 
-void __dl_server_attach_root(struct sched_dl_entity *dl_se, struct rq *rq)
-{
-	u64 new_bw = dl_se->dl_bw;
-	int cpu = cpu_of(rq);
-	struct dl_bw *dl_b;
-
-	dl_b = dl_bw_of(cpu_of(rq));
-	guard(raw_spinlock)(&dl_b->lock);
-
-	if (!dl_bw_cpus(cpu))
-		return;
-
-	__dl_add(dl_b, new_bw, dl_bw_cpus(cpu));
-}
-
 int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 period, bool init)
 {
-	u64 old_bw = init ? 0 : to_ratio(dl_se->dl_period, dl_se->dl_runtime);
 	u64 new_bw = to_ratio(period, runtime);
 	struct rq *rq = dl_se->rq;
 	int cpu = cpu_of(rq);
 	struct dl_bw *dl_b;
-	unsigned long cap;
-	int retval = 0;
-	int cpus;
 
 	dl_b = dl_bw_of(cpu);
 	guard(raw_spinlock)(&dl_b->lock);
 
-	cpus = dl_bw_cpus(cpu);
-	cap = dl_bw_capacity(cpu);
-
-	if (__dl_overflow(dl_b, cap, old_bw, new_bw))
+	if (new_bw > BW_UNIT - dl_b->bw)
 		return -EBUSY;
 
 	if (init) {
 		__add_rq_bw(new_bw, &rq->dl);
-		__dl_add(dl_b, new_bw, cpus);
 	} else {
-		__dl_sub(dl_b, dl_se->dl_bw, cpus);
-		__dl_add(dl_b, new_bw, cpus);
-
 		dl_rq_change_utilization(rq, dl_se, new_bw);
 	}
 
@@ -1714,7 +1688,7 @@ int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 perio
 	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
 	dl_se->dl_density = to_ratio(dl_se->dl_deadline, dl_se->dl_runtime);
 
-	return retval;
+	return 0;
 }
 
 /*
@@ -2945,17 +2919,6 @@ void dl_clear_root_domain(struct root_domain *rd)
 	rd->dl_bw.total_bw = 0;
 	for_each_cpu(i, rd->span)
 		cpu_rq(i)->dl.extra_bw = cpu_rq(i)->dl.max_bw;
-
-	/*
-	 * dl_servers are not tasks. Since dl_add_task_root_domain ignores
-	 * them, we need to account for them here explicitly.
-	 */
-	for_each_cpu(i, rd->span) {
-		struct sched_dl_entity *dl_se = &cpu_rq(i)->fair_server;
-
-		if (dl_server(dl_se) && cpu_active(i))
-			__dl_add(&rd->dl_bw, dl_se->dl_bw, dl_bw_cpus(i));
-	}
 }
 
 void dl_clear_root_domain_cpu(int cpu)
@@ -3139,9 +3102,10 @@ int sched_dl_global_validate(void)
 	u64 period = global_rt_period();
 	u64 new_bw = to_ratio(period, runtime);
 	u64 cookie = ++dl_cookie;
+	u64 fair_bw;
 	struct dl_bw *dl_b;
-	int cpu, cpus, ret = 0;
-	unsigned long flags;
+	int i, cpu, ret = 0;
+	unsigned long cap, flags;
 
 	/*
 	 * Here we want to check the bandwidth not being set to some
@@ -3155,11 +3119,28 @@ int sched_dl_global_validate(void)
 			goto next;
 
 		dl_b = dl_bw_of(cpu);
-		cpus = dl_bw_cpus(cpu);
+		cap = dl_bw_capacity(cpu);
 
 		raw_spin_lock_irqsave(&dl_b->lock, flags);
-		if (new_bw * cpus < dl_b->total_bw)
+		/* Check if the whole root domain can support the active dl tasks */
+		if (cap_scale(new_bw, cap) < dl_b->total_bw) {
 			ret = -EBUSY;
+			goto unlock;
+		}
+
+		/*
+		 * For each cpu in the root domain, check if there is enough
+		 * bandwidth for the fair server.
+		 */
+		for_each_cpu_and(i, cpu_rq(cpu)->rd->span, cpu_active_mask) {
+			fair_bw = cpu_rq(i)->fair_server.dl_bw;
+
+			if (new_bw + fair_bw > BW_UNIT) {
+				ret = -EBUSY;
+				goto unlock;
+			}
+		}
+unlock:
 		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
 
 next:
@@ -3444,7 +3425,6 @@ static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
 	unsigned long flags, cap;
 	struct dl_bw *dl_b;
 	bool overflow = 0;
-	u64 fair_server_bw = 0;
 
 	rcu_read_lock_sched();
 	dl_b = dl_bw_of(cpu);
@@ -3476,28 +3456,19 @@ static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
 		 */
 		cap -= arch_scale_cpu_capacity(cpu);
 
-		/*
-		 * cpu is going offline and NORMAL tasks will be moved away
-		 * from it. We can thus discount dl_server bandwidth
-		 * contribution as it won't need to be servicing tasks after
-		 * the cpu is off.
-		 */
-		if (cpu_rq(cpu)->fair_server.dl_server)
-			fair_server_bw = cpu_rq(cpu)->fair_server.dl_bw;
-
 		/*
 		 * Not much to check if no DEADLINE bandwidth is present.
 		 * dl_servers we can discount, as tasks will be moved out the
 		 * offlined CPUs anyway.
 		 */
-		if (dl_b->total_bw - fair_server_bw > 0) {
+		if (dl_b->total_bw > 0) {
 			/*
 			 * Leaving at least one CPU for DEADLINE tasks seems a
 			 * wise thing to do. As said above, cpu is not offline
 			 * yet, so account for that.
 			 */
 			if (dl_bw_cpus(cpu) - 1)
-				overflow = __dl_overflow(dl_b, cap, fair_server_bw, 0);
+				overflow = __dl_overflow(dl_b, cap, 0, 0);
 			else
 				overflow = 1;
 		}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index be9745d104f..01afa7424f8 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -390,7 +390,6 @@ extern void sched_init_dl_servers(void);
 extern void dl_server_update_idle_time(struct rq *rq,
		    struct task_struct *p);
 extern void fair_server_init(struct rq *rq);
-extern void __dl_server_attach_root(struct sched_dl_entity *dl_se, struct rq *rq);
 extern int dl_server_apply_params(struct sched_dl_entity *dl_se,
		    u64 runtime, u64 period, bool init);
 
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 977e133bb8a..4ea3365984a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -500,14 +500,6 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
 	if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
 		set_rq_online(rq);
 
-	/*
-	 * Because the rq is not a task, dl_add_task_root_domain() did not
-	 * move the fair server bw to the rd if it already started.
-	 * Add it now.
-	 */
-	if (rq->fair_server.dl_server)
-		__dl_server_attach_root(&rq->fair_server, rq);
-
 	rq_unlock_irqrestore(rq, &rf);
 
 	if (old_rd)

base-commit: 5c3b3264e5858813632031ba58bcd6e1eeb3b214
-- 
2.51.0