From: Vineeth Pillai
To: luca.abeni@santannapisa.it, Juri Lelli, Daniel Bristot de Oliveira, Peter Zijlstra, Ingo Molnar, Vincent Guittot, Steven Rostedt, Joel Fernandes, Dietmar Eggemann, Ben Segall, Mel Gorman, Valentin Schneider
Cc: Vineeth Pillai, Jonathan Corbet, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Subject: [PATCH v3 2/5] sched/deadline: Fix reclaim inaccuracy with SMP
Date: Sun, 14 May 2023 22:57:13 -0400
Message-Id: <20230515025716.316888-3-vineeth@bitbyteword.org>
In-Reply-To: <20230515025716.316888-1-vineeth@bitbyteword.org>
References: <20230515025716.316888-1-vineeth@bitbyteword.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

In a multi-processor system, bandwidth usage is divided equally among all
CPUs. This causes issues with reclaiming free bandwidth on a CPU.
"Uextra" is same on all cpus in a root domain and running_bw would be different based on the reserved bandwidth of tasks running on the cpu. This causes disproportionate reclaiming - task with lesser bandwidth reclaims less even if its the only task running on that cpu. Following is a small test with three tasks with reservations (8,10) (1,10) and (1, 100). These three tasks run on different cpus. But since the reclamation logic calculates available bandwidth as a factor of globally available bandwidth, tasks with lesser bandwidth reclaims only little compared to higher bandwidth even if cpu has free and available bandwidth to be reclaimed. TID[730]: RECLAIM=3D1, (r=3D8ms, d=3D10ms, p=3D10ms), Util: 95.05 TID[731]: RECLAIM=3D1, (r=3D1ms, d=3D10ms, p=3D10ms), Util: 31.34 TID[732]: RECLAIM=3D1, (r=3D1ms, d=3D100ms, p=3D100ms), Util: 3.16 Fix: use the available bandwidth on each cpu to calculate reclaimable bandwidth. Admission control takes care of total bandwidth and hence using the available bandwidth on a specific cpu would not break the deadline guarentees. With this fix, the above test behaves as follows: TID[586]: RECLAIM=3D1, (r=3D1ms, d=3D100ms, p=3D100ms), Util: 95.24 TID[585]: RECLAIM=3D1, (r=3D1ms, d=3D10ms, p=3D10ms), Util: 95.01 TID[584]: RECLAIM=3D1, (r=3D8ms, d=3D10ms, p=3D10ms), Util: 95.01 Signed-off-by: Vineeth Pillai (Google) --- kernel/sched/deadline.c | 22 +++++++--------------- 1 file changed, 7 insertions(+), 15 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index 91451c1c7e52..85902c4c484b 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1272,7 +1272,7 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se) * Umax: Max usable bandwidth for DL. Currently * =3D sched_rt_runtime_us / sched_rt_period_us * Uextra: Extra bandwidth not reserved: - * =3D Umax - \Sum(u_i / #cpus in the root domain) + * =3D Umax - this_bw * u_i: Bandwidth of an admitted dl task in the * root domain. 
 *
@@ -1286,22 +1286,14 @@ int dl_runtime_exceeded(struct sched_dl_entity *dl_se)
  */
 static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
 {
-	u64 u_act;
-	u64 u_inact = rq->dl.this_bw - rq->dl.running_bw; /* Utot - Uact */
-
 	/*
-	 * Instead of computing max{u, (rq->dl.max_bw - u_inact - u_extra)},
-	 * we compare u_inact + rq->dl.extra_bw with
-	 * rq->dl.max_bw - u, because u_inact + rq->dl.extra_bw can be larger
-	 * than rq->dl.max_bw (so, rq->dl.max_bw - u_inact - rq->dl.extra_bw
-	 * would be negative leading to wrong results)
+	 * max{u, Umax - Uinact - Uextra}
+	 * = max{u, max_bw - (this_bw - running_bw) + (this_bw - max_bw)}
+	 * = max{u, running_bw} = running_bw
+	 * So dq = -(max{u, Umax - Uinact - Uextra} / Umax) dt
+	 *       = -(running_bw / max_bw) dt
 	 */
-	if (u_inact + rq->dl.extra_bw > rq->dl.max_bw - dl_se->dl_bw)
-		u_act = dl_se->dl_bw;
-	else
-		u_act = rq->dl.max_bw - u_inact - rq->dl.extra_bw;
-
-	return div64_u64(delta * u_act, rq->dl.max_bw);
+	return div64_u64(delta * rq->dl.running_bw, rq->dl.max_bw);
 }
 
 /*
-- 
2.40.1
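
[Editorial addendum, not part of the patch] The utilization numbers in the
commit message can be reproduced with a small standalone arithmetic sketch.
It assumes Umax = 0.95 (the default sched_rt_runtime_us / sched_rt_period_us
ratio), each task pinned alone to one CPU of a 3-CPU root domain (so
u_inact = this_bw - running_bw = 0 on each CPU), and a task that tries to run
flat out; the achieved utilization under GRUB is then u * Umax / u_act:

```python
# Sketch of the reclaim arithmetic before and after the patch.
# Assumptions (not from kernel code): Umax = 0.95, one reclaiming task
# per CPU in a 3-CPU root domain, u_inact = 0 on every CPU.

U_MAX = 0.95
tasks = [8 / 10, 1 / 10, 1 / 100]   # u_i = runtime / period per task

# Old code: a single global "extra" bandwidth for the root domain,
# Uextra = Umax - \Sum(u_i) / #cpus, shared by every CPU.
extra_bw = U_MAX - sum(tasks) / len(tasks)

def u_act_old(u):
    """Old reclaimable bandwidth on a CPU whose only task has util u."""
    u_inact = 0.0  # this_bw == running_bw when the lone task is running
    if u_inact + extra_bw > U_MAX - u:
        return u
    return U_MAX - u_inact - extra_bw

for u in tasks:
    # GRUB depletes runtime at rate u_act / Umax, so a task running
    # flat out achieves a CPU utilization of u * Umax / u_act.
    old = u * U_MAX / u_act_old(u)
    new = u * U_MAX / u  # patched: dq = -(running_bw / max_bw) dt
    print(f"u={u:.2f}: old ~{old:.1%}, patched ~{new:.1%}")
```

The sketch yields roughly 95%, 31% and 3% for the old code and 95% for all
three tasks with the patch, in line with the measured Util columns above
(95.05 / 31.34 / 3.16 before, ~95 across the board after).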