From: Vineeth Pillai
To: luca.abeni@santannapisa.it, Juri Lelli, Daniel Bristot de Oliveira,
    Peter Zijlstra, Ingo Molnar, Vincent Guittot, Steven Rostedt,
    Joel Fernandes, Dietmar Eggemann, Ben Segall, Mel Gorman,
    Valentin Schneider
Cc: Vineeth Pillai, Jonathan Corbet, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org
Subject: [PATCH v3 1/5] sched/deadline: Fix bandwidth reclaim equation in GRUB
Date: Sun, 14 May 2023 22:57:12 -0400
Message-Id: <20230515025716.316888-2-vineeth@bitbyteword.org>
In-Reply-To: <20230515025716.316888-1-vineeth@bitbyteword.org>

According to the GRUB[1] rule, the runtime is depleted as:
    "dq = -max{u, (1 - Uinact - Uextra)} dt"                         (1)

To guarantee that deadline tasks do not starve lower-class tasks, we do
not allocate the full bandwidth of the cpu to deadline tasks. The maximum
bandwidth usable by deadline tasks is denoted by "Umax". Considering Umax,
equation (1) becomes:
    "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt"             (2)

The current implementation has a minor bug in equation (2). This patch
fixes the bug and also fixes the precision issue by using div64_u64.

The reclamation logic was verified with a sample program that creates
multiple deadline threads and observes their utilization. The tests were
run on an isolated cpu (isolcpus=3) of a 4 cpu system.

Tests on 6.3.0
==============

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.33
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.35
TID[693]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 93.29

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.69
TID[708]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 16.70

RUN 3: 2 tasks
  Task 1: runtime=1ms, deadline=period=10ms
  Task 2: runtime=1ms, deadline=period=100ms
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.67
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.37
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.38
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.19
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.60
TID[632]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 6.23
TID[631]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 62.43

As seen above, the reclamation does not reclaim the maximum allowed
bandwidth, and as the bandwidth of the tasks gets smaller, the reclaimed
bandwidth also goes down.
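Before the post-patch numbers, here is a minimal user-space sketch of what
equation (2) charges for "delta" ns of execution. This is only an
illustration, not the kernel code: it assumes the kernel's BW_SHIFT = 20
fixed-point convention, uses plain 64-bit division in place of div64_u64(),
and picks Uinact/Uextra values corresponding to a lone reclaiming task that
owns all of the spare bandwidth.

#include <stdint.h>
#include <stdio.h>

#define BW_SHIFT 20
#define BW_UNIT  (1ULL << BW_SHIFT)	/* 1.0 in fixed point */

/* Equation (2): dq = -(max{u, Umax - Uinact - Uextra} / Umax) dt */
static uint64_t grub_charge(uint64_t delta, uint64_t u, uint64_t u_inact,
			    uint64_t u_extra, uint64_t u_max)
{
	uint64_t u_act;

	/* Compare this way so that the subtraction cannot go negative. */
	if (u_inact + u_extra > u_max - u)
		u_act = u;
	else
		u_act = u_max - u_inact - u_extra;

	return (delta * u_act) / u_max;	/* div64_u64() in the kernel */
}

int main(void)
{
	/* Lone reclaiming task: u = 1ms/100ms = 0.01, Umax = 0.95. */
	uint64_t u      = BW_UNIT / 100;
	uint64_t u_max  = BW_UNIT * 95 / 100;
	uint64_t charge = grub_charge(1000000, u, 0, u_max - u, u_max);

	/* About 10525 ns charged per 1 ms executed: 1 ms of runtime lasts ~95 ms. */
	printf("%llu ns charged per 1 ms executed\n", (unsigned long long)charge);
	return 0;
}

This is the behaviour the post-patch RUN 2 numbers below converge to
(roughly 95% utilization) once the equation is applied correctly.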
Tests with this patch applied
=============================

RUN 1: runtime=7ms, deadline=period=10ms, RT capacity = 95%
TID[667]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.01
TID[667]: RECLAIM=1, (r=7ms, d=10ms, p=10ms), Util: 95.00

RUN 2: runtime=1ms, deadline=period=100ms, RT capacity = 95%
TID[641]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 94.86
TID[641]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.06

RUN 3: 2 tasks
  Task 1: runtime=1ms, deadline=period=10ms
  Task 2: runtime=1ms, deadline=period=100ms
TID[636]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.44
TID[637]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.67
TID[636]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 86.34
TID[637]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 8.61

Running tasks on all cpus (allowing migration) also showed that the
utilization is reclaimed up to the maximum. With 10 SCHED_FLAG_RECLAIM
tasks running on 3 cpus, top shows:
%Cpu0  : 94.6 us,  0.0 sy,  0.0 ni,  5.4 id,  0.0 wa
%Cpu1  : 95.2 us,  0.0 sy,  0.0 ni,  4.8 id,  0.0 wa
%Cpu2  : 95.8 us,  0.0 sy,  0.0 ni,  4.2 id,  0.0 wa

[1]: Abeni, Luca; Lipari, Giuseppe; Parri, Andrea; Sun, Youcheng (2015).
     "Parallel and sequential reclaiming in multicore real-time global
     scheduling."

Signed-off-by: Vineeth Pillai (Google)
---
 kernel/sched/deadline.c | 72 ++++++++++++++++++++---------------------
 kernel/sched/sched.h    |  6 ++--
 2 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 71b24371a6f7..91451c1c7e52 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -487,7 +487,7 @@ static inline int is_leftmost(struct task_struct *p, struct dl_rq *dl_rq)
 	return rb_first_cached(&dl_rq->root) == &dl_se->rb_node;
 }
 
-static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq);
+static void init_dl_rq_bw(struct dl_rq *dl_rq);
 
 void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime)
 {
@@ -523,7 +523,7 @@ void init_dl_rq(struct dl_rq *dl_rq)
 
 	dl_rq->running_bw = 0;
 	dl_rq->this_bw = 0;
-	init_dl_rq_bw_ratio(dl_rq);
+	init_dl_rq_bw(dl_rq);
 }
 
 #ifdef CONFIG_SMP
@@ -1261,43 +1261,47 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
 
 /*
  * This function implements the GRUB accounting rule:
- * according to the GRUB reclaiming algorithm, the runtime is
- * not decreased as "dq = -dt", but as
- * "dq = -max{u / Umax, (1 - Uinact - Uextra)} dt",
- * where u is the utilization of the task, Umax is the maximum reclaimable
- * utilization, Uinact is the (per-runqueue) inactive utilization, computed
- * as the difference between the "total runqueue utilization" and the
- * runqueue active utilization, and Uextra is the (per runqueue) extra
- * reclaimable utilization.
- * Since rq->dl.running_bw and rq->dl.this_bw contain utilizations
- * multiplied by 2^BW_SHIFT, the result has to be shifted right by
- * BW_SHIFT.
- * Since rq->dl.bw_ratio contains 1 / Umax multiplied by 2^RATIO_SHIFT,
- * dl_bw is multiped by rq->dl.bw_ratio and shifted right by RATIO_SHIFT.
+ * As per the GRUB rule, the runtime is not decreased as "dq = -dt", but as
+ * "dq = -max{u, (1 - Uinact - Uextra)} dt",
+ * where:
+ *	u:		Bandwidth of the task.
+ *	running_bw:	Total bandwidth of tasks in active state for this rq.
+ *	this_bw:	Reserved bandwidth for this rq. Includes active and
+ *			inactive bandwidth for this rq.
+ *	Uinact:		Inactive utilization (this_bw - running_bw)
+ *	Umax:		Max usable bandwidth for DL. Currently
+ *			= sched_rt_runtime_us / sched_rt_period_us
+ *	Uextra:		Extra bandwidth not reserved:
+ *			= Umax - \Sum(u_i / #cpus in the root domain)
+ *	u_i:		Bandwidth of an admitted dl task in the
+ *			root domain.
+ *
+ * Deadline tasks are not allowed to use the whole bandwidth of the cpu,
+ * but only a portion of it denoted by "Umax". So the equation becomes:
+ * "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
+ *
  * Since delta is a 64 bit variable, to have an overflow its value
  * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
  * So, overflow is not an issue here.
  */
 static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
 {
-	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
 	u64 u_act;
-	u64 u_act_min = (dl_se->dl_bw * rq->dl.bw_ratio) >> RATIO_SHIFT;
+	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
 
 	/*
-	 * Instead of computing max{u * bw_ratio, (1 - u_inact - u_extra)},
+	 * Instead of computing max{u, (rq->dl.max_bw - u_inact - u_extra)},
 	 * we compare u_inact + rq->dl.extra_bw with
-	 * 1 - (u * rq->dl.bw_ratio >> RATIO_SHIFT), because
-	 * u_inact + rq->dl.extra_bw can be larger than
-	 * 1 * (so, 1 - u_inact - rq->dl.extra_bw would be negative
-	 * leading to wrong results)
+	 * rq->dl.max_bw - u, because u_inact + rq->dl.extra_bw can be larger
+	 * than rq->dl.max_bw (so, rq->dl.max_bw - u_inact - rq->dl.extra_bw
+	 * would be negative leading to wrong results)
 	 */
-	if (u_inact + rq->dl.extra_bw > BW_UNIT - u_act_min)
-		u_act = u_act_min;
+	if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
+		u_act = dl_se->dl_bw;
 	else
-		u_act = BW_UNIT - u_inact - rq->dl.extra_bw;
+		u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
 
-	return (delta * u_act) >> BW_SHIFT;
+	return div64_u64(delta * u_act, rq->dl.max_bw);
 }
 
 /*
@@ -2780,17 +2784,13 @@ int sched_dl_global_validate(void)
 	return ret;
 }
 
-static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
+static void init_dl_rq_bw(struct dl_rq *dl_rq)
 {
-	if (global_rt_runtime() == RUNTIME_INF) {
-		dl_rq->bw_ratio = 1 << RATIO_SHIFT;
-		dl_rq->extra_bw = 1 << BW_SHIFT;
-	} else {
-		dl_rq->bw_ratio = to_ratio(global_rt_runtime(),
-			  global_rt_period()) >> (BW_SHIFT - RATIO_SHIFT);
-		dl_rq->extra_bw = to_ratio(global_rt_period(),
+	if (global_rt_runtime() == RUNTIME_INF)
+		dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
+	else
+		dl_rq->max_bw = dl_rq->extra_bw = to_ratio(global_rt_period(),
 			  global_rt_runtime());
-	}
 }
 
 void sched_dl_do_global(void)
@@ -2819,7 +2819,7 @@ void sched_dl_do_global(void)
 		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
 
 		rcu_read_unlock_sched();
-		init_dl_rq_bw_ratio(&cpu_rq(cpu)->dl);
+		init_dl_rq_bw(&cpu_rq(cpu)->dl);
 	}
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3e8df6d31c1e..1bc7ae9ad349 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -754,10 +754,10 @@ struct dl_rq {
 	u64			extra_bw;
 
 	/*
-	 * Inverse of the fraction of CPU utilization that can be reclaimed
-	 * by the GRUB algorithm.
+	 * Maximum available bandwidth for deadline tasks of this rq. This is
+	 * used in calculation of reclaimable bandwidth (GRUB).
 	 */
-	u64			bw_ratio;
+	u64			max_bw;
 };
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-- 
2.40.1
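The sample program used for the measurements in the changelog above is not
part of the patch. A minimal sketch of how such a test thread can request a
reclaiming deadline reservation follows; the helper name is made up, the
struct is declared locally because glibc provides no sched_setattr()
wrapper, and error handling is omitted.

#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE		6
#endif
#ifndef SCHED_FLAG_RECLAIM
#define SCHED_FLAG_RECLAIM	0x02
#endif

struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;		/* ns */
	uint64_t sched_deadline;	/* ns */
	uint64_t sched_period;		/* ns */
};

/* Make the calling thread a reclaiming deadline task, e.g. (1ms, 100ms). */
int set_dl_reclaim(uint64_t runtime_ns, uint64_t period_ns)
{
	struct sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy = SCHED_DEADLINE;
	attr.sched_flags = SCHED_FLAG_RECLAIM;
	attr.sched_runtime = runtime_ns;
	attr.sched_deadline = period_ns;
	attr.sched_period = period_ns;

	return syscall(SYS_sched_setattr, 0, &attr, 0);
}

A test thread would then spin and measure how much cpu time it actually
gets, which is what the "TID[...]: ... Util:" lines in the changelogs
report.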
From: Vineeth Pillai
To: luca.abeni@santannapisa.it, Juri Lelli, Daniel Bristot de Oliveira,
    Peter Zijlstra, Ingo Molnar, Vincent Guittot, Steven Rostedt,
    Joel Fernandes, Dietmar Eggemann, Ben Segall, Mel Gorman,
    Valentin Schneider
Cc: Vineeth Pillai, Jonathan Corbet, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org
Subject: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP
Date: Sun, 14 May 2023 22:57:13 -0400
Message-Id: <20230515025716.316888-3-vineeth@bitbyteword.org>
In-Reply-To: <20230515025716.316888-1-vineeth@bitbyteword.org>

In a multi-processor system, bandwidth usage is divided equally among all
cpus. This causes issues with reclaiming free bandwidth on a cpu: "Uextra"
is the same for all cpus in a root domain, while running_bw differs based
on the reserved bandwidth of the tasks running on each cpu. This causes
disproportionate reclaiming - a task with a smaller bandwidth reclaims
less even if it is the only task running on that cpu.

Following is a small test with three tasks with reservations (8,10),
(1,10) and (1,100), running on different cpus. Because the reclamation
logic calculates available bandwidth as a fraction of the globally
available bandwidth, tasks with a smaller bandwidth reclaim only a little
compared to tasks with a larger bandwidth, even though their cpu has free
bandwidth available to be reclaimed:

TID[730]: RECLAIM=1, (r=8ms, d=10ms, p=10ms), Util: 95.05
TID[731]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 31.34
TID[732]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 3.16

Fix: use the available bandwidth on each cpu to calculate the reclaimable
bandwidth. Admission control takes care of the total bandwidth, so using
the available bandwidth on a specific cpu does not break the deadline
guarantees.

With this fix, the above test behaves as follows:

TID[586]: RECLAIM=1, (r=1ms, d=100ms, p=100ms), Util: 95.24
TID[585]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 95.01
TID[584]: RECLAIM=1, (r=8ms, d=10ms, p=10ms), Util: 95.01

Signed-off-by: Vineeth Pillai (Google)
---
 kernel/sched/deadline.c | 22 +++++++---------------
 1 file changed, 7 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 91451c1c7e52..85902c4c484b 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1272,7 +1272,7 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
  *	Umax:		Max usable bandwidth for DL. Currently
  *			= sched_rt_runtime_us / sched_rt_period_us
  *	Uextra:		Extra bandwidth not reserved:
- *			= Umax - \Sum(u_i / #cpus in the root domain)
+ *			= Umax - this_bw
  *	u_i:		Bandwidth of an admitted dl task in the
  *			root domain.
 *
@@ -1286,22 +1286,14 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
  */
 static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
 {
-	u64 u_act;
-	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
-
 	/*
-	 * Instead of computing max{u, (rq->dl.max_bw - u_inact - u_extra)},
-	 * we compare u_inact + rq->dl.extra_bw with
-	 * rq->dl.max_bw - u, because u_inact + rq->dl.extra_bw can be larger
-	 * than rq->dl.max_bw (so, rq->dl.max_bw - u_inact - rq->dl.extra_bw
-	 * would be negative leading to wrong results)
+	 * max{u, Umax - Uinact - Uextra}
+	 * = max{u, max_bw - (this_bw - running_bw) - (max_bw - this_bw)}
+	 * = max{u, running_bw} = running_bw
+	 * So dq = -(max{u, Umax - Uinact - Uextra} / Umax) dt
+	 *       = -(running_bw / max_bw) dt
 	 */
-	if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
-		u_act = dl_se->dl_bw;
-	else
-		u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
-
-	return div64_u64(delta * u_act, rq->dl.max_bw);
+	return div64_u64(delta * rq->dl.running_bw, rq->dl.max_bw);
 }
 
 /*
-- 
2.40.1
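To see the effect of the per-runqueue scaling introduced by the patch
above in numbers: with dq = -(running_bw / max_bw) dt, a lone (1ms, 100ms)
reclaiming task on a cpu with Umax = 0.95 stretches its 1 ms budget over
roughly 95 ms of wall time. A rough user-space check of that arithmetic
(illustrative only; BW_SHIFT = 20 as in the kernel) is:

#include <stdint.h>
#include <stdio.h>

#define BW_SHIFT 20
#define BW_UNIT  (1ULL << BW_SHIFT)

int main(void)
{
	uint64_t max_bw     = BW_UNIT * 95 / 100;	/* Umax = 0.95           */
	uint64_t running_bw = BW_UNIT / 100;		/* lone task, u = 0.01   */
	uint64_t period_ns  = 100000000ULL;		/* 100 ms period         */
	uint64_t runtime_ns = 1000000ULL;		/* 1 ms reserved runtime */

	/* dq = -(running_bw / max_bw) dt => budget lasts dt = dq * max_bw / running_bw */
	uint64_t wall_ns = runtime_ns * max_bw / running_bw;

	printf("1 ms of budget lasts ~%llu ms; utilization ~%llu%% of the period\n",
	       (unsigned long long)(wall_ns / 1000000),
	       (unsigned long long)(wall_ns * 100 / period_ns));
	return 0;
}

This matches the roughly 95% utilization reported for each task in the
changelog above.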
From: Vineeth Pillai
To: luca.abeni@santannapisa.it, Juri Lelli, Daniel Bristot de Oliveira,
    Peter Zijlstra, Ingo Molnar, Vincent Guittot, Steven Rostedt,
    Joel Fernandes, Dietmar Eggemann, Ben Segall, Mel Gorman,
    Valentin Schneider
Cc: Vineeth Pillai, Jonathan Corbet, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org
Subject: [PATCH v3 3/5] sched/deadline: Remove unused variable extra_bw
Date: Sun, 14 May 2023 22:57:14 -0400
Message-Id: <20230515025716.316888-4-vineeth@bitbyteword.org>
In-Reply-To: <20230515025716.316888-1-vineeth@bitbyteword.org>

Since extra_bw is no longer used by the GRUB reclaiming code, remove it
along with the code that maintained it.

Signed-off-by: Vineeth Pillai (Google)
---
 kernel/sched/deadline.c | 53 ++++++++++++-----------------------------
 kernel/sched/sched.h    |  1 -
 2 files changed, 15 insertions(+), 39 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 85902c4c484b..67c1138df43a 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -163,20 +163,6 @@ static inline bool dl_bw_visited(int cpu, u64 gen)
 	return false;
 }
 
-static inline
-void __dl_update(struct dl_bw *dl_b, s64 bw)
-{
-	struct root_domain *rd = container_of(dl_b, struct root_domain, dl_bw);
-	int i;
-
-	RCU_LOCKDEP_WARN(!rcu_read_lock_sched_held(),
-			 "sched RCU must be held");
-	for_each_cpu_and(i, rd->span, cpu_active_mask) {
-		struct rq *rq = cpu_rq(i);
-
-		rq->dl.extra_bw += bw;
-	}
-}
 #else
 static inline struct dl_bw *dl_bw_of(int i)
 {
@@ -198,27 +184,18 @@ static inline bool dl_bw_visited(int cpu, u64 gen)
 	return false;
 }
 
-static inline
-void __dl_update(struct dl_bw *dl_b, s64 bw)
-{
-	struct dl_rq *dl = container_of(dl_b, struct dl_rq, dl_bw);
-
-	dl->extra_bw += bw;
-}
 #endif
 
 static inline
-void __dl_sub(struct dl_bw *dl_b, u64 tsk_bw, int cpus)
+void __dl_sub(struct dl_bw *dl_b, u64 tsk_bw)
 {
 	dl_b->total_bw -= tsk_bw;
-	__dl_update(dl_b, (s32)tsk_bw / cpus);
 }
 
 static inline
-void __dl_add(struct dl_bw *dl_b, u64 tsk_bw, int cpus)
+void __dl_add(struct dl_bw *dl_b, u64 tsk_bw)
 {
 	dl_b->total_bw += tsk_bw;
-	__dl_update(dl_b, -((s32)tsk_bw / cpus));
 }
 
 static inline bool
@@ -430,7 +407,7 @@ static void task_non_contending(struct task_struct *p)
 		if (READ_ONCE(p->__state) == TASK_DEAD)
 			sub_rq_bw(&p->dl, &rq->dl);
 		raw_spin_lock(&dl_b->lock);
-		__dl_sub(dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
+		__dl_sub(dl_b, p->dl.dl_bw);
 		raw_spin_unlock(&dl_b->lock);
 		__dl_clear_params(p);
 	}
@@ -721,12 +698,12 @@ static struct rq *dl_task_offline_migration(struct rq *rq, struct task_struct *p)
 	 */
 	dl_b = &rq->rd->dl_bw;
 	raw_spin_lock(&dl_b->lock);
-	__dl_sub(dl_b, p->dl.dl_bw, cpumask_weight(rq->rd->span));
+	__dl_sub(dl_b, p->dl.dl_bw);
 	raw_spin_unlock(&dl_b->lock);
 
 	dl_b = &later_rq->rd->dl_bw;
 	raw_spin_lock(&dl_b->lock);
-	__dl_add(dl_b, p->dl.dl_bw, cpumask_weight(later_rq->rd->span));
+	__dl_add(dl_b, p->dl.dl_bw);
 	raw_spin_unlock(&dl_b->lock);
 
 	set_task_cpu(p, later_rq->cpu);
@@ -1425,7 +1402,7 @@ static enum hrtimer_restart inactive_task_timer(struct hrtimer *timer)
 	}
 
 	raw_spin_lock(&dl_b->lock);
-	__dl_sub(dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
+	__dl_sub(dl_b, p->dl.dl_bw);
 	raw_spin_unlock(&dl_b->lock);
 	__dl_clear_params(p);
 
@@ -2506,7 +2483,7 @@ static void set_cpus_allowed_dl(struct task_struct *p,
 		 * until we complete the update.
 		 */
 		raw_spin_lock(&src_dl_b->lock);
-		__dl_sub(src_dl_b, p->dl.dl_bw, dl_bw_cpus(task_cpu(p)));
+		__dl_sub(src_dl_b, p->dl.dl_bw);
 		raw_spin_unlock(&src_dl_b->lock);
 	}
 
@@ -2560,7 +2537,7 @@ void dl_add_task_root_domain(struct task_struct *p)
 	dl_b = &rq->rd->dl_bw;
 	raw_spin_lock(&dl_b->lock);
 
-	__dl_add(dl_b, p->dl.dl_bw, cpumask_weight(rq->rd->span));
+	__dl_add(dl_b, p->dl.dl_bw);
 
 	raw_spin_unlock(&dl_b->lock);
 
@@ -2779,9 +2756,9 @@ int sched_dl_global_validate(void)
 static void init_dl_rq_bw(struct dl_rq *dl_rq)
 {
 	if (global_rt_runtime() == RUNTIME_INF)
-		dl_rq->max_bw = dl_rq->extra_bw = 1 << BW_SHIFT;
+		dl_rq->max_bw = 1 << BW_SHIFT;
 	else
-		dl_rq->max_bw = dl_rq->extra_bw = to_ratio(global_rt_period(),
+		dl_rq->max_bw = to_ratio(global_rt_period(),
 			  global_rt_runtime());
 }
 
@@ -2852,8 +2829,8 @@ int sched_dl_overflow(struct task_struct *p, int policy,
 	if (dl_policy(policy) && !task_has_dl_policy(p) &&
 	    !__dl_overflow(dl_b, cap, 0, new_bw)) {
 		if (hrtimer_active(&p->dl.inactive_timer))
-			__dl_sub(dl_b, p->dl.dl_bw, cpus);
-		__dl_add(dl_b, new_bw, cpus);
+			__dl_sub(dl_b, p->dl.dl_bw);
+		__dl_add(dl_b, new_bw);
 		err = 0;
 	} else if (dl_policy(policy) && task_has_dl_policy(p) &&
 		   !__dl_overflow(dl_b, cap, p->dl.dl_bw, new_bw)) {
@@ -2864,8 +2841,8 @@ int sched_dl_overflow(struct task_struct *p, int policy,
 		 * But this would require to set the task's "inactive
 		 * timer" when the task is not inactive.
 		 */
-		__dl_sub(dl_b, p->dl.dl_bw, cpus);
-		__dl_add(dl_b, new_bw, cpus);
+		__dl_sub(dl_b, p->dl.dl_bw);
+		__dl_add(dl_b, new_bw);
 		dl_change_utilization(p, new_bw);
 		err = 0;
 	} else if (!dl_policy(policy) && task_has_dl_policy(p)) {
@@ -3044,7 +3021,7 @@ int dl_cpu_busy(int cpu, struct task_struct *p)
 		 * We will free resources in the source root_domain
 		 * later on (see set_cpus_allowed_dl()).
 		 */
-		__dl_add(dl_b, p->dl.dl_bw, dl_bw_cpus(cpu));
+		__dl_add(dl_b, p->dl.dl_bw);
 	}
 
 	raw_spin_unlock_irqrestore(&dl_b->lock, flags);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1bc7ae9ad349..33db99756624 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -751,7 +751,6 @@ struct dl_rq {
 	 * runqueue (inactive utilization = this_bw - running_bw).
 	 */
 	u64			this_bw;
-	u64			extra_bw;
 
 	/*
 	 * Maximum available bandwidth for deadline tasks of this rq. This is
-- 
2.40.1
From: Vineeth Pillai
To: luca.abeni@santannapisa.it, Juri Lelli, Daniel Bristot de Oliveira,
    Peter Zijlstra, Ingo Molnar, Vincent Guittot, Steven Rostedt,
    Joel Fernandes, Dietmar Eggemann, Ben Segall, Mel Gorman,
    Valentin Schneider
Cc: Vineeth Pillai, Jonathan Corbet, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org
Subject: [PATCH v3 4/5] sched/deadline: Account for normal deadline tasks in GRUB
Date: Sun, 14 May 2023 22:57:15 -0400
Message-Id: <20230515025716.316888-5-vineeth@bitbyteword.org>
In-Reply-To: <20230515025716.316888-1-vineeth@bitbyteword.org>

The GRUB algorithm assumes that all deadline tasks participate in the
reclaim. So when there is a mix of normal deadline and SCHED_FLAG_RECLAIM
tasks, reclaiming the unused bandwidth is not accurate. Running two
deadline tasks on a cpu, where one is SCHED_FLAG_RECLAIM and the other is
a normal deadline task, the utilization looks as follows:

Task 1 (normal DL): (5, 10), Task 2 (SCHED_FLAG_RECLAIM): (1, 10)
TID[673]: RECLAIM=0, (r=5ms, d=10ms, p=10ms), Util: 50.11
TID[672]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 15.93
TID[673]: RECLAIM=0, (r=5ms, d=10ms, p=10ms), Util: 50.01
TID[672]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 15.83

The GRUB rule depletes the runtime as:
    "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt"
where:
    Umax   is the maximum allowed bandwidth for DL tasks;
    Uinact is the inactive utilization of the runqueue;
    Uextra is the free bandwidth available for reclaim.

To account for a mix of normal deadline and SCHED_FLAG_RECLAIM tasks
running together, we do not consider the bandwidth of normal tasks in the
equation. The equation then becomes:
    "dq = -(max{u, (Umax_reclaim - Uinact - Uextra)} / Umax_reclaim) dt"

"Umax_reclaim" is the maximum allowed bandwidth for SCHED_FLAG_RECLAIM
tasks. When only SCHED_FLAG_RECLAIM tasks are running,
"Umax_reclaim = Umax". Otherwise:
    "Umax_reclaim = Umax - running_bw + Ureclaim"
where Ureclaim is the total bandwidth of SCHED_FLAG_RECLAIM tasks in
active state for this runqueue.
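A minimal stand-alone sketch of the accounting described above (it mirrors
the grub_reclaim() change in the diff further below, using the kernel's
BW_SHIFT = 20 fixed point and plain division instead of div64_u64()):

#include <stdint.h>

/*
 * Sketch: charge for "delta" ns of execution when reclaiming and
 * non-reclaiming deadline tasks are mixed on one runqueue. It is only
 * meaningful for SCHED_FLAG_RECLAIM tasks, so reclaim_bw is non-zero here.
 */
static uint64_t grub_reclaim_sketch(uint64_t delta, uint64_t max_bw,
				    uint64_t running_bw, uint64_t reclaim_bw)
{
	uint64_t u_max_reclaim;

	/* On SMP, max_bw may be below running_bw; do not reclaim then. */
	if (max_bw < running_bw)
		return delta;

	/* Umax_reclaim = Umax - running_bw + Ureclaim */
	u_max_reclaim = max_bw - running_bw + reclaim_bw;

	/* dq = -(reclaim_bw / Umax_reclaim) dt */
	return delta * reclaim_bw / u_max_reclaim;	/* div64_u64() in the kernel */
}

Back-of-the-envelope, for the test above (Umax = 0.95, a 0.5 non-reclaiming
task plus a 0.1 reclaiming task): Umax_reclaim = 0.95 - 0.6 + 0.1 = 0.45,
so the reclaiming task is charged at about 0.1/0.45 = 0.22 of wall time and
its 1 ms budget stretches to roughly 4.5 ms per 10 ms period, close to the
45% shown in the post-fix numbers that follow.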
With this fix, the results of the above test are as follows:

Task 1 (normal DL): (5, 10), Task 2 (SCHED_FLAG_RECLAIM): (1, 10)
TID[591]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 45.11
TID[592]: RECLAIM=0, (r=5ms, d=10ms, p=10ms), Util: 50.18
TID[591]: RECLAIM=1, (r=1ms, d=10ms, p=10ms), Util: 44.99
TID[592]: RECLAIM=0, (r=5ms, d=10ms, p=10ms), Util: 49.88

Signed-off-by: Vineeth Pillai (Google)
---
 kernel/sched/deadline.c | 53 ++++++++++++++++++++++++++++++++---------
 kernel/sched/sched.h    | 11 +++++++++
 2 files changed, 53 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 67c1138df43a..66a1b9365429 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -206,11 +206,13 @@ __dl_overflow(struct dl_bw *dl_b, unsigned long cap, u64 old_bw, u64 new_bw)
 }
 
 static inline
-void __add_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
+void __add_running_bw(u64 dl_bw, struct dl_rq *dl_rq, bool reclaim_bw_se)
 {
 	u64 old = dl_rq->running_bw;
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
+	if (reclaim_bw_se)
+		dl_rq->reclaim_bw += dl_bw;
 	dl_rq->running_bw += dl_bw;
 	SCHED_WARN_ON(dl_rq->running_bw < old); /* overflow */
 	SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
@@ -219,15 +221,19 @@ void __add_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
 }
 
 static inline
-void __sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)
+void __sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq, bool reclaim_bw_se)
 {
 	u64 old = dl_rq->running_bw;
 
 	lockdep_assert_rq_held(rq_of_dl_rq(dl_rq));
+	if (reclaim_bw_se)
+		dl_rq->reclaim_bw -= dl_bw;
 	dl_rq->running_bw -= dl_bw;
 	SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
-	if (dl_rq->running_bw > old)
+	if (dl_rq->running_bw > old) {
+		dl_rq->reclaim_bw = 0;
 		dl_rq->running_bw = 0;
+	}
 	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
 	cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
 }
@@ -273,14 +279,14 @@ static inline
 void add_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 {
 	if (!dl_entity_is_special(dl_se))
-		__add_running_bw(dl_se->dl_bw, dl_rq);
+		__add_running_bw(dl_se->dl_bw, dl_rq, dl_entity_is_reclaim(dl_se));
 }
 
 static inline
 void sub_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 {
 	if (!dl_entity_is_special(dl_se))
-		__sub_running_bw(dl_se->dl_bw, dl_rq);
+		__sub_running_bw(dl_se->dl_bw, dl_rq, dl_entity_is_reclaim(dl_se));
 }
 
 static void dl_change_utilization(struct task_struct *p, u64 new_bw)
@@ -499,6 +505,7 @@ void init_dl_rq(struct dl_rq *dl_rq)
 #endif
 
 	dl_rq->running_bw = 0;
+	dl_rq->reclaim_bw = 0;
 	dl_rq->this_bw = 0;
 	init_dl_rq_bw(dl_rq);
 }
@@ -1257,20 +1264,44 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
  * but only a portion of it denoted by "Umax". So the equation becomes:
  * "dq = -(max{u, (Umax - Uinact - Uextra)} / Umax) dt",
  *
+ * To account for the fact that we have a mix of normal deadline tasks and
+ * SCHED_FLAG_RECLAIM tasks running together, we do not consider the bandwidth
+ * of normal tasks in the equation. So the equation becomes:
+ * "dq = -(max{u, (Umax_reclaim - Uinact - Uextra)} / Umax_reclaim) dt",
+ * where
+ *	Umax_reclaim:	Maximum reclaimable bandwidth for this rq.
+ *
+ * We can calculate Umax_reclaim as:
+ * "Umax_reclaim = Uextra + Uinact + Ureclaim"
+ * where:
+ *	Ureclaim:	Total bandwidth of SCHED_FLAG_RECLAIM tasks in active
+ *			state for this rq.
+ *
  * Since delta is a 64 bit variable, to have an overflow its value
  * should be larger than 2^(64 - 20 - 8), which is more than 64 seconds.
  * So, overflow is not an issue here.
  */
 static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
 {
+	u64 u_max_reclaim;
+
+	/*
+	 * In SMP, max_bw can be less than running_bw without violating the
+	 * global bandwidth limits. If that's the case, we should not reclaim.
+	 */
+	if (rq->dl.max_bw < rq->dl.running_bw)
+		return delta;
+
+	u_max_reclaim = rq->dl.max_bw - rq->dl.running_bw + rq->dl.reclaim_bw;
+
 	/*
-	 * max{u, Umax - Uinact - Uextra}
-	 * = max{u, max_bw - (this_bw - running_bw) - (max_bw - this_bw)}
-	 * = max{u, running_bw} = running_bw
-	 * So dq = -(max{u, Umax - Uinact - Uextra} / Umax) dt
-	 *       = -(running_bw / max_bw) dt
+	 * max{u, Umax_reclaim - Uinact - Uextra}
+	 * = max{u, Uextra + Uinact + Ureclaim - Uinact - Uextra}
+	 * = max{u, Ureclaim} = Ureclaim = reclaim_bw
+	 * So dq = -(max{u, Umax_reclaim - Uinact - Uextra} / Umax_reclaim) dt
+	 *       = -(reclaim_bw / Umax_reclaim) dt
 	 */
-	return div64_u64(delta * rq->dl.running_bw, rq->dl.max_bw);
+	return div64_u64(delta * rq->dl.reclaim_bw, u_max_reclaim);
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 33db99756624..a6cb891835da 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -257,6 +257,11 @@ static inline bool dl_entity_is_special(const struct sched_dl_entity *dl_se)
 #endif
 }
 
+static inline bool dl_entity_is_reclaim(const struct sched_dl_entity *dl_se)
+{
+	return dl_se->flags & SCHED_FLAG_RECLAIM;
+}
+
 /*
  * Tells if entity @a should preempt entity @b.
  */
@@ -741,6 +746,12 @@ struct dl_rq {
 	 */
 	u64			running_bw;
 
+	/*
+	 * Active bandwidth of SCHED_FLAG_RECLAIM tasks on this rq.
+	 * This will be a subset of running_bw.
+	 */
+	u64			reclaim_bw;
+
 	/*
 	 * Utilization of the tasks "assigned" to this runqueue (including
 	 * the tasks that are in runqueue and the tasks that executed on this
-- 
2.40.1
From: Vineeth Pillai
To: luca.abeni@santannapisa.it, Juri Lelli, Daniel Bristot de Oliveira,
    Peter Zijlstra, Ingo Molnar, Vincent Guittot, Steven Rostedt,
    Joel Fernandes, Dietmar Eggemann, Ben Segall, Mel Gorman,
    Valentin Schneider
Cc: Vineeth Pillai, Jonathan Corbet, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org
Subject: [PATCH v3 5/5] Documentation: sched/deadline: Update GRUB description
Date: Sun, 14 May 2023 22:57:16 -0400
Message-Id: <20230515025716.316888-6-vineeth@bitbyteword.org>
In-Reply-To: <20230515025716.316888-1-vineeth@bitbyteword.org>

Update the GRUB description in the deadline scheduler documentation to
reflect the updated reclaim logic.

Signed-off-by: Vineeth Pillai (Google)
---
 Documentation/scheduler/sched-deadline.rst | 28 ++++++++++++++--------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/Documentation/scheduler/sched-deadline.rst b/Documentation/scheduler/sched-deadline.rst
index 9d9be52f221a..b45c9dde6671 100644
--- a/Documentation/scheduler/sched-deadline.rst
+++ b/Documentation/scheduler/sched-deadline.rst
@@ -195,11 +195,15 @@ Deadline Task Scheduling
    its utilization is added to the active utilization of the runqueue where
    it has been enqueued.
 
- For each runqueue, the algorithm GRUB keeps track of two different bandwidths:
+ For each runqueue, the algorithm GRUB keeps track of three different bandwidths:
 
   - Active bandwidth (running_bw): this is the sum of the bandwidths of all
     tasks in active state (i.e., ActiveContending or ActiveNonContending);
 
+  - Active bandwidth of SCHED_FLAG_RECLAIM tasks (reclaim_bw): this is the sum
+    of the bandwidths of all tasks in active state which participate in GRUB.
+    This is a subset of running_bw and is needed for reclaimable bandwidth calculation.
+
   - Total bandwidth (this_bw): this is the sum of all tasks "belonging" to the
     runqueue, including the tasks in Inactive state.
 
@@ -208,21 +212,25 @@ Deadline Task Scheduling
  It does so by decrementing the runtime of the executing task Ti at a pace equal
  to
 
-           dq = -max{ Ui / Umax, (1 - Uinact - Uextra) } dt
+           dq = -(max{Ui, (Umax_reclaim - Uinact - Uextra)} / Umax_reclaim) dt
 
  where:
 
-  - Ui is the bandwidth of task Ti;
   - Umax is the maximum reclaimable utilization (subjected to RT throttling
     limits);
+  - Umax_reclaim is the maximum allowable bandwidth for all reclaimable tasks
+    in the runqueue. If there are only SCHED_FLAG_RECLAIM tasks, then
+    Umax_reclaim = Umax;
+    Otherwise Umax_reclaim = (Umax - running_bw + reclaim_bw);
   - Uinact is the (per runqueue) inactive utilization, computed as
-    (this_bq - running_bw);
+    (this_bw - running_bw);
   - Uextra is the (per runqueue) extra reclaimable utilization
-    (subjected to RT throttling limits).
+    (subjected to RT throttling limits);
 
 
- Let's now see a trivial example of two deadline tasks with runtime equal
- to 4 and period equal to 8 (i.e., bandwidth equal to 0.5)::
+ Let's now see a trivial example of two SCHED_FLAG_RECLAIM tasks with runtime
+ equal to 4 and period equal to 8 (i.e., bandwidth equal to 0.5). Tasks are
+ allowed to use the whole cpu (Umax = Umax_reclaim = 1)::
 
  A            Task T1
  |
@@ -244,7 +252,7 @@ Deadline Task Scheduling
      0   1   2   3   4   5   6   7   8
 
 
- A            running_bw
+ A            reclaim_bw
  |
  1 -----------------       ------
  |                 |       |
@@ -272,7 +280,7 @@ Deadline Task Scheduling
 
    This is the 0-lag time for Task T1. Since it didn't woken up in the
    meantime, it enters the Inactive state. Its bandwidth is removed from
-   running_bw.
+   running_bw and reclaim_bw.
    Task T2 continues its execution. However, its runtime is now decreased as
    dq = - 0.5 dt because Uinact = 0.5.
    Task T2 therefore reclaims the bandwidth unused by Task T1.
@@ -280,7 +288,7 @@ Deadline Task Scheduling
 - Time t = 8:
 
    Task T1 wakes up. It enters the ActiveContending state again, and the
-   running_bw is incremented.
+   running_bw and reclaim_bw are incremented.
 
 
 2.3 Energy-aware scheduling
-- 
2.40.1
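As a quick numeric check of the updated documentation example (not part of
the patch): at t in [4, 8) only T2 is active, so this_bw = 1.0,
running_bw = reclaim_bw = 0.5 and, with the whole cpu usable,
Umax = Umax_reclaim = 1. A tiny C program evaluating the documented
equation reproduces the dq = -0.5 dt rate mentioned in the text:

#include <stdio.h>

int main(void)
{
	double u2 = 0.5;		/* bandwidth of task T2                    */
	double this_bw = 1.0;		/* T1 + T2 reservations                    */
	double running_bw = 0.5;	/* only T2 is active after T1's 0-lag time */
	double reclaim_bw = 0.5;	/* both tasks use SCHED_FLAG_RECLAIM       */
	double u_max = 1.0;		/* whole cpu usable in the example         */

	double u_inact = this_bw - running_bw;			/* 0.5 */
	double u_extra = 0.0;
	double u_max_reclaim = u_max - running_bw + reclaim_bw;	/* 1.0 */

	/* dq = -(max{Ui, Umax_reclaim - Uinact - Uextra} / Umax_reclaim) dt */
	double num = u_max_reclaim - u_inact - u_extra;		/* 0.5 */
	double rate = (u2 > num ? u2 : num) / u_max_reclaim;

	printf("T2 runtime depletion rate: %.2f (i.e. dq = -%.2f dt)\n", rate, rate);
	return 0;
}

So T2 is charged at half speed and reclaims exactly the bandwidth left
idle by T1, as the example describes.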