From: Daniel Bristot de Oliveira
Subject: [PATCH V7 1/9] sched/deadline: Comment sched_dl_entity::dl_server variable
Date: Mon, 27 May 2024 14:06:47 +0200
Message-ID: <147f7aa8cb8fd925f36aa8059af6a35aad08b45a.1716811044.git.bristot@kernel.org>

Add an explanation for the newly added variable.
Cc: stable@vger.kernel.org
Fixes: 63ba8422f876 ("sched/deadline: Introduce deadline servers")
Signed-off-by: Daniel Bristot de Oliveira
Reviewed-by: Suleiman Souhlal
Tested-by: Juri Lelli
Tested-by: Vineeth Pillai
---
 include/linux/sched.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 61591ac6eab6..abce356932cc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -637,6 +637,8 @@ struct sched_dl_entity {
 	 *
 	 * @dl_overrun tells if the task asked to be informed about runtime
	 * overruns.
+	 *
+	 * @dl_server tells if this is a server entity.
 	 */
 	unsigned int			dl_throttled      : 1;
 	unsigned int			dl_yielded        : 1;
-- 
2.45.1

From: Daniel Bristot de Oliveira
Subject: [PATCH V7 2/9] sched/core: Add clearing of ->dl_server in put_prev_task_balance()
Date: Mon, 27 May 2024 14:06:48 +0200
From: "Joel Fernandes (Google)"

Paths using put_prev_task_balance() need to do a pick shortly
after. Make sure they also clear the ->dl_server on prev as a
part of that.

Cc: stable@vger.kernel.org
Fixes: 63ba8422f876 ("sched/deadline: Introduce deadline servers")
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Daniel Bristot de Oliveira
Reviewed-by: Suleiman Souhlal
Tested-by: Juri Lelli
Tested-by: Vineeth Pillai
---
 kernel/sched/core.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bcf2c4cc0522..08c409457152 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6006,6 +6006,14 @@ static void put_prev_task_balance(struct rq *rq, struct task_struct *prev,
 #endif
 
 	put_prev_task(rq, prev);
+
+	/*
+	 * We've updated @prev and no longer need the server link, clear it.
+	 * Must be done before ->pick_next_task() because that can (re)set
+	 * ->dl_server.
+	 */
+	if (prev->dl_server)
+		prev->dl_server = NULL;
 }
 
 /*
@@ -6049,14 +6057,6 @@ __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 restart:
 	put_prev_task_balance(rq, prev, rf);
 
-	/*
-	 * We've updated @prev and no longer need the server link, clear it.
-	 * Must be done before ->pick_next_task() because that can (re)set
-	 * ->dl_server.
-	 */
-	if (prev->dl_server)
-		prev->dl_server = NULL;
-
 	for_each_class(class) {
 		p = class->pick_next_task(rq);
 		if (p)
-- 
2.45.1

From: Daniel Bristot de Oliveira
Subject: [PATCH V7 3/9] sched/core: Clear prev->dl_server in CFS pick fast path
Date: Mon, 27 May 2024 14:06:49 +0200
Message-ID: <7f7381ccba09efcb4a1c1ff808ed58385eccc222.1716811044.git.bristot@kernel.org>

From: Youssef Esmat

In case the previous pick was a DL server pick, ->dl_server might be
set. Clear it in the fast path as well.

Cc: stable@vger.kernel.org
Fixes: 63ba8422f876 ("sched/deadline: Introduce deadline servers")
Signed-off-by: Youssef Esmat
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Daniel Bristot de Oliveira
Reviewed-by: Suleiman Souhlal
Tested-by: Juri Lelli
Tested-by: Vineeth Pillai
---
 kernel/sched/core.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 08c409457152..6d01863f93ca 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6044,6 +6044,13 @@ __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 			p = pick_next_task_idle(rq);
 		}
 
+		/*
+		 * This is a normal CFS pick, but the previous could be a DL pick.
+		 * Clear it as previous is no longer picked.
+		 */
+		if (prev->dl_server)
+			prev->dl_server = NULL;
+
 		/*
 		 * This is the fast path; it cannot be a DL server pick;
 		 * therefore even if @p == @prev, ->dl_server must be NULL.
 		 */
-- 
2.45.1

From: Daniel Bristot de Oliveira
Subject: [PATCH V7 4/9] sched/fair: Add trivial fair server
Date: Mon, 27 May 2024 14:06:50 +0200

From: Peter Zijlstra

Use deadline servers to service fair tasks.

This patch adds a fair_server deadline entity which acts as a container
for fair entities and can be used to fix starvation when higher priority
(wrt fair) tasks are monopolizing CPU(s).
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Daniel Bristot de Oliveira
Reviewed-by: Suleiman Souhlal
Tested-by: Juri Lelli
Tested-by: Vineeth Pillai
---
 kernel/sched/core.c     |  1 +
 kernel/sched/deadline.c | 23 +++++++++++++++++++++++
 kernel/sched/fair.c     | 34 ++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h    |  4 ++++
 4 files changed, 62 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6d01863f93ca..53f0470a1d0a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10057,6 +10057,7 @@ void __init sched_init(void)
 #endif /* CONFIG_SMP */
 		hrtick_rq_init(rq);
 		atomic_set(&rq->nr_iowait, 0);
+		fair_server_init(rq);
 
 #ifdef CONFIG_SCHED_CORE
 		rq->core = rq;
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index c75d1307d86d..b69d6c3e1587 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1381,6 +1381,13 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 		resched_curr(rq);
 	}
 
+	/*
+	 * The fair server (sole dl_server) does not account for real-time
+	 * workload because it is running fair work.
+	 */
+	if (dl_se == &rq->fair_server)
+		return;
+
 	/*
 	 * Because -- for now -- we share the rt bandwidth, we need to
 	 * account our runtime there too, otherwise actual rt tasks
@@ -1414,15 +1421,31 @@ void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
 
 void dl_server_start(struct sched_dl_entity *dl_se)
 {
+	struct rq *rq = dl_se->rq;
+
 	if (!dl_server(dl_se)) {
+		/* Disabled */
+		dl_se->dl_runtime = 0;
+		dl_se->dl_deadline = 1000 * NSEC_PER_MSEC;
+		dl_se->dl_period = 1000 * NSEC_PER_MSEC;
+
+		dl_se->dl_server = 1;
 		setup_new_dl_entity(dl_se);
 	}
+
+	if (!dl_se->dl_runtime)
+		return;
+
 	enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
+	if (!dl_task(dl_se->rq->curr) || dl_entity_preempt(dl_se, &rq->curr->dl))
+		resched_curr(dl_se->rq);
 }
 
 void dl_server_stop(struct sched_dl_entity *dl_se)
 {
+	if (!dl_se->dl_runtime)
+		return;
+
 	dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a5b1ae0aa55..2d5d3e6c1e72 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5766,6 +5766,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
 	struct sched_entity *se;
 	long task_delta, idle_task_delta, dequeue = 1;
+	long rq_h_nr_running = rq->cfs.h_nr_running;
 
 	raw_spin_lock(&cfs_b->lock);
 	/* This will start the period timer if necessary */
@@ -5838,6 +5839,9 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	sub_nr_running(rq, task_delta);
 
 done:
+	/* Stop the fair server if throttling resulted in no runnable tasks */
+	if (rq_h_nr_running && !rq->cfs.h_nr_running)
+		dl_server_stop(&rq->fair_server);
 	/*
 	 * Note: distribution will already see us throttled via the
 	 * throttled-list.  rq->lock protects completion.
@@ -5855,6 +5859,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
 	struct sched_entity *se;
 	long task_delta, idle_task_delta;
+	long rq_h_nr_running = rq->cfs.h_nr_running;
 
 	se = cfs_rq->tg->se[cpu_of(rq)];
 
@@ -5930,6 +5935,10 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 unthrottle_throttle:
 	assert_list_leaf_cfs_rq(rq);
 
+	/* Start the fair server if un-throttling resulted in new runnable tasks */
+	if (!rq_h_nr_running && rq->cfs.h_nr_running)
+		dl_server_start(&rq->fair_server);
+
 	/* Determine whether we need to wake up potentially idle CPU: */
 	if (rq->curr == rq->idle && rq->cfs.nr_running)
 		resched_curr(rq);
@@ -6760,6 +6769,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	 */
 	util_est_enqueue(&rq->cfs, p);
 
+	if (!throttled_hierarchy(task_cfs_rq(p)) && !rq->cfs.h_nr_running)
+		dl_server_start(&rq->fair_server);
+
 	/*
 	 * If in_iowait is set, the code below may not trigger any cpufreq
 	 * utilization updates, so do it here explicitly with the IOWAIT flag
@@ -6904,6 +6916,9 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		rq->next_balance = jiffies;
 
 dequeue_throttle:
+	if (!throttled_hierarchy(task_cfs_rq(p)) && !rq->cfs.h_nr_running)
+		dl_server_stop(&rq->fair_server);
+
 	util_est_update(&rq->cfs, p, task_sleep);
 	hrtick_update(rq);
 }
@@ -8607,6 +8622,25 @@ static struct task_struct *__pick_next_task_fair(struct rq *rq)
 	return pick_next_task_fair(rq, NULL, NULL);
 }
 
+static bool fair_server_has_tasks(struct sched_dl_entity *dl_se)
+{
+	return !!dl_se->rq->cfs.nr_running;
+}
+
+static struct task_struct *fair_server_pick(struct sched_dl_entity *dl_se)
+{
+	return pick_next_task_fair(dl_se->rq, NULL, NULL);
+}
+
+void fair_server_init(struct rq *rq)
+{
+	struct sched_dl_entity *dl_se = &rq->fair_server;
+
+	init_dl_entity(dl_se);
+
+	dl_server_init(dl_se, rq, fair_server_has_tasks, fair_server_pick);
+}
+
 /*
  * Account for a descheduled task:
  */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a831af102070..39c9669b23a7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -356,6 +356,8 @@ extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 		    dl_server_has_tasks_f has_tasks,
 		    dl_server_pick_f pick);
 
+extern void fair_server_init(struct rq *rq);
+
 #ifdef CONFIG_CGROUP_SCHED
 
 struct cfs_rq;
@@ -1037,6 +1039,8 @@ struct rq {
 	struct rt_rq		rt;
 	struct dl_rq		dl;
 
+	struct sched_dl_entity	fair_server;
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/* list of leaf cfs_rq on this CPU: */
 	struct list_head	leaf_cfs_rq_list;
-- 
2.45.1
From: Daniel Bristot de Oliveira
Subject: [PATCH V7 5/9] sched/deadline: Deferrable dl server
Date: Mon, 27 May 2024 14:06:51 +0200

Among the motivations for the DL servers is the real-time throttling
mechanism. This mechanism works by throttling the rt_rq after it runs
for a long period without leaving space for fair tasks.

The base dl server avoids this problem by boosting fair tasks instead
of throttling the rt_rq. The point is that it boosts without waiting
for potential starvation, causing some non-intuitive cases. For
example, an IRQ dispatches two tasks on an idle system, a fair one and
an RT one. The DL server will be activated, running the fair task
before the RT one. This problem can be avoided by deferring the dl
server activation.

By setting the defer option, the dl_server will dispatch a
SCHED_DEADLINE reservation with replenished runtime, but throttled.
The dl_timer will be set for the defer time, at (period - runtime) ns
from start time, thus boosting the fair rq at the defer time.

If the fair scheduler has the opportunity to run while waiting for
the defer time, the dl server runtime will be consumed. If the runtime
is completely consumed before the defer time, the server will be
replenished while still in a throttled state. Then, the dl_timer will
be reset to the new defer time.

If the fair server reaches the defer time without consuming its
runtime, the server will start running, following CBS rules (thus
without breaking SCHED_DEADLINE). The server then stays in the running
state (without deferring) until its fair tasks are able to execute
under the regular fair scheduler again (end of the starvation).
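To make the deferral arithmetic concrete, here is a minimal sketch
(illustrative only, not part of the patch; the helper name is made up):
with the relative deadline equal to the period, the defer timer fires at
(deadline - runtime), the latest instant at which the full runtime still
fits before the deadline.

/*
 * Illustrative sketch, not part of the patch: a deferred reservation
 * replenished at `now` (with relative deadline == period) arms its
 * timer at the latest point where the whole runtime still fits
 * before the absolute deadline.
 */
static inline u64 dl_defer_activation(u64 now, u64 runtime, u64 period)
{
	u64 deadline = now + period;	/* absolute deadline */

	return deadline - runtime;	/* == now + (period - runtime) */
}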
Signed-off-by: Daniel Bristot de Oliveira
Reviewed-by: Suleiman Souhlal
Tested-by: Juri Lelli
Tested-by: Vineeth Pillai
---
 include/linux/sched.h   |  12 ++
 kernel/sched/deadline.c | 301 ++++++++++++++++++++++++++++++++++------
 kernel/sched/fair.c     |  24 +++-
 kernel/sched/idle.c     |   2 +
 kernel/sched/sched.h    |   4 +-
 5 files changed, 298 insertions(+), 45 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index abce356932cc..611771fec4df 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -639,12 +639,24 @@ struct sched_dl_entity {
 	 * overruns.
 	 *
 	 * @dl_server tells if this is a server entity.
+	 *
+	 * @dl_defer tells if this is a deferred or regular server. For
+	 * now only defer server exists.
+	 *
+	 * @dl_defer_armed tells if the deferrable server is waiting
+	 * for the replenishment timer to activate it.
+	 *
+	 * @dl_defer_running tells if the deferrable server is actually
+	 * running, skipping the defer phase.
 	 */
 	unsigned int			dl_throttled      : 1;
 	unsigned int			dl_yielded        : 1;
 	unsigned int			dl_non_contending : 1;
 	unsigned int			dl_overrun        : 1;
 	unsigned int			dl_server         : 1;
+	unsigned int			dl_defer          : 1;
+	unsigned int			dl_defer_armed    : 1;
+	unsigned int			dl_defer_running  : 1;
 
 	/*
 	 * Bandwidth enforcement timer. Each -deadline task has its
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b69d6c3e1587..eddfe18d9762 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -771,6 +771,15 @@ static inline void replenish_dl_new_period(struct sched_dl_entity *dl_se,
 	/* for non-boosted task, pi_of(dl_se) == dl_se */
 	dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline;
 	dl_se->runtime = pi_of(dl_se)->dl_runtime;
+
+	/*
+	 * If it is a deferred reservation, and the server
+	 * is not handling a starvation case, defer it.
+	 */
+	if (dl_se->dl_defer && !dl_se->dl_defer_running) {
+		dl_se->dl_throttled = 1;
+		dl_se->dl_defer_armed = 1;
+	}
 }
 
 /*
@@ -809,6 +818,9 @@ static inline void setup_new_dl_entity(struct sched_dl_entity *dl_se)
 	replenish_dl_new_period(dl_se, rq);
 }
 
+static int start_dl_timer(struct sched_dl_entity *dl_se);
+static bool dl_entity_overflow(struct sched_dl_entity *dl_se, u64 t);
+
 /*
  * Pure Earliest Deadline First (EDF) scheduling does not deal with the
  * possibility of a entity lasting more than what it declared, and thus
@@ -837,9 +849,18 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
 	/*
 	 * This could be the case for a !-dl task that is boosted.
 	 * Just go with full inherited parameters.
+	 *
+	 * Or, it could be the case of a deferred reservation that
+	 * was not able to consume its runtime in background and
+	 * reached this point with current u > U.
+	 *
+	 * In both cases, set a new period.
 	 */
-	if (dl_se->dl_deadline == 0)
-		replenish_dl_new_period(dl_se, rq);
+	if (dl_se->dl_deadline == 0 ||
+	    (dl_se->dl_defer_armed && dl_entity_overflow(dl_se, rq_clock(rq)))) {
+		dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline;
+		dl_se->runtime = pi_of(dl_se)->dl_runtime;
+	}
 
 	if (dl_se->dl_yielded && dl_se->runtime > 0)
 		dl_se->runtime = 0;
@@ -873,6 +894,44 @@ static void replenish_dl_entity(struct sched_dl_entity *dl_se)
 		dl_se->dl_yielded = 0;
 	if (dl_se->dl_throttled)
 		dl_se->dl_throttled = 0;
+
+	/*
+	 * If this is the replenishment of a deferred reservation,
+	 * clear the flag and return.
+	 */
+	if (dl_se->dl_defer_armed) {
+		dl_se->dl_defer_armed = 0;
+		return;
+	}
+
+	/*
+	 * At this point, if the deferred server is not armed and the deadline
+	 * is in the future, and if it is not running already, throttle the
+	 * server and arm the defer timer.
+	 */
+	if (dl_se->dl_defer && !dl_se->dl_defer_running &&
+	    dl_time_before(rq_clock(dl_se->rq), dl_se->deadline - dl_se->runtime)) {
+		if (!is_dl_boosted(dl_se) && dl_se->server_has_tasks(dl_se)) {
+
+			/*
+			 * Set dl_se->dl_defer_armed and dl_throttled variables to
+			 * inform the start_dl_timer() that this is a deferred
+			 * activation.
+			 */
+			dl_se->dl_defer_armed = 1;
+			dl_se->dl_throttled = 1;
+			if (!start_dl_timer(dl_se)) {
+				/*
+				 * If for whatever reason (delays), a previous timer was
+				 * queued but not serviced, cancel it and clean the
+				 * deferrable server variables intended for start_dl_timer().
+				 */
+				hrtimer_try_to_cancel(&dl_se->dl_timer);
+				dl_se->dl_defer_armed = 0;
+				dl_se->dl_throttled = 0;
+			}
+		}
+	}
 }
 
 /*
@@ -1023,6 +1082,15 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
 		}
 
 		replenish_dl_new_period(dl_se, rq);
+	} else if (dl_server(dl_se) && dl_se->dl_defer) {
+		/*
+		 * The server can still use its previous deadline, so check if
+		 * it left the dl_defer_running state.
+		 */
+		if (!dl_se->dl_defer_running) {
+			dl_se->dl_defer_armed = 1;
+			dl_se->dl_throttled = 1;
+		}
 	}
 }
 
@@ -1055,8 +1123,21 @@ static int start_dl_timer(struct sched_dl_entity *dl_se)
 	 * We want the timer to fire at the deadline, but considering
 	 * that it is actually coming from rq->clock and not from
 	 * hrtimer's time base reading.
+	 *
+	 * The deferred reservation will have its timer set to
+	 * (deadline - runtime). At that point, the CBS rule will decide
+	 * if the current deadline can be used, or if a replenishment is
+	 * required to avoid adding too much pressure on the system
+	 * (current u > U).
 	 */
-	act = ns_to_ktime(dl_next_period(dl_se));
+	if (dl_se->dl_defer_armed) {
+		WARN_ON_ONCE(!dl_se->dl_throttled);
+		act = ns_to_ktime(dl_se->deadline - dl_se->runtime);
+	} else {
+		/* act = deadline - rel-deadline + period */
+		act = ns_to_ktime(dl_next_period(dl_se));
+	}
+
 	now = hrtimer_cb_get_time(timer);
 	delta = ktime_to_ns(now) - rq_clock(rq);
 	act = ktime_add_ns(act, delta);
@@ -1106,6 +1187,62 @@ static void __push_dl_task(struct rq *rq, struct rq_flags *rf)
 #endif
 }
 
+/* a defer timer will not be reset if the runtime consumed was < dl_server_min_res */
+static const u64 dl_server_min_res = 1 * NSEC_PER_MSEC;
+
+static enum hrtimer_restart dl_server_timer(struct hrtimer *timer, struct sched_dl_entity *dl_se)
+{
+	struct rq *rq = rq_of_dl_se(dl_se);
+	u64 fw;
+
+	scoped_guard (rq_lock, rq) {
+		struct rq_flags *rf = &scope.rf;
+
+		if (!dl_se->dl_throttled || !dl_se->dl_runtime)
+			return HRTIMER_NORESTART;
+
+		sched_clock_tick();
+		update_rq_clock(rq);
+
+		if (!dl_se->dl_runtime)
+			return HRTIMER_NORESTART;
+
+		if (!dl_se->server_has_tasks(dl_se)) {
+			replenish_dl_entity(dl_se);
+			return HRTIMER_NORESTART;
+		}
+
+		if (dl_se->dl_defer_armed) {
+			/*
+			 * First check if the server could consume runtime in background.
+			 * If so, it is possible to push the defer timer for this amount
+			 * of time. The dl_server_min_res serves as a limit to avoid
+			 * forwarding the timer for a too small amount of time.
+			 */
+			if (dl_time_before(rq_clock(dl_se->rq),
+					   (dl_se->deadline - dl_se->runtime - dl_server_min_res))) {
+
+				/* reset the defer timer */
+				fw = dl_se->deadline - rq_clock(dl_se->rq) - dl_se->runtime;
+
+				hrtimer_forward_now(timer, ns_to_ktime(fw));
+				return HRTIMER_RESTART;
+			}
+
+			dl_se->dl_defer_running = 1;
+		}
+
+		enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH);
+
+		if (!dl_task(dl_se->rq->curr) || dl_entity_preempt(dl_se, &dl_se->rq->curr->dl))
+			resched_curr(rq);
+
+		__push_dl_task(rq, rf);
+	}
+
+	return HRTIMER_NORESTART;
+}
+
 /*
  * This is the bandwidth enforcement timer callback. If here, we know
  * a task is not on its dl_rq, since the fact that the timer was running
@@ -1128,28 +1265,8 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
 	struct rq_flags rf;
 	struct rq *rq;
 
-	if (dl_server(dl_se)) {
-		struct rq *rq = rq_of_dl_se(dl_se);
-		struct rq_flags rf;
-
-		rq_lock(rq, &rf);
-		if (dl_se->dl_throttled) {
-			sched_clock_tick();
-			update_rq_clock(rq);
-
-			if (dl_se->server_has_tasks(dl_se)) {
-				enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH);
-				resched_curr(rq);
-				__push_dl_task(rq, &rf);
-			} else {
-				replenish_dl_entity(dl_se);
-			}
-
-		}
-		rq_unlock(rq, &rf);
-
-		return HRTIMER_NORESTART;
-	}
+	if (dl_server(dl_se))
+		return dl_server_timer(timer, dl_se);
 
 	p = dl_task_of(dl_se);
 	rq = task_rq_lock(p, &rf);
@@ -1319,22 +1436,10 @@ static u64 grub_reclaim(u64 delta, struct rq *rq, struct sched_dl_entity *dl_se)
 	return (delta * u_act) >> BW_SHIFT;
 }
 
-static inline void
-update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
-			int flags);
-static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64 delta_exec)
+s64 dl_scaled_delta_exec(struct rq *rq, struct sched_dl_entity *dl_se, s64 delta_exec)
 {
 	s64 scaled_delta_exec;
 
-	if (unlikely(delta_exec <= 0)) {
-		if (unlikely(dl_se->dl_yielded))
-			goto throttle;
-		return;
-	}
-
-	if (dl_entity_is_special(dl_se))
-		return;
-
 	/*
 	 * For tasks that participate in GRUB, we implement GRUB-PA: the
 	 * spare reclaimed bandwidth is used to clock down frequency.
@@ -1353,8 +1458,64 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 		scaled_delta_exec = cap_scale(scaled_delta_exec, scale_cpu);
 	}
 
+	return scaled_delta_exec;
+}
+
+static inline void
+update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
+			int flags);
+static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64 delta_exec)
+{
+	s64 scaled_delta_exec;
+
+	if (unlikely(delta_exec <= 0)) {
+		if (unlikely(dl_se->dl_yielded))
+			goto throttle;
+		return;
+	}
+
+	if (dl_server(dl_se) && dl_se->dl_throttled && !dl_se->dl_defer)
+		return;
+
+	if (dl_entity_is_special(dl_se))
+		return;
+
+	scaled_delta_exec = dl_scaled_delta_exec(rq, dl_se, delta_exec);
+
 	dl_se->runtime -= scaled_delta_exec;
 
+	/*
+	 * The fair server can consume its runtime while throttled (not queued/
+	 * running as regular CFS).
+	 *
+	 * If the server consumes its entire runtime in this state, the server
+	 * is not required for the current period. Thus, reset the server by
+	 * starting a new period, pushing the activation.
+	 */
+	if (dl_se->dl_defer && dl_se->dl_throttled && dl_runtime_exceeded(dl_se)) {
+		/*
+		 * If the server was previously activated - the starving condition
+		 * took place - at this point it went away because the fair scheduler
+		 * was able to get runtime in background. So return to the initial
+		 * state.
+		 */
+		dl_se->dl_defer_running = 0;
+
+		hrtimer_try_to_cancel(&dl_se->dl_timer);
+
+		replenish_dl_new_period(dl_se, dl_se->rq);
+
+		/*
+		 * Not being able to start the timer seems problematic. If it could not
+		 * be started for whatever reason, we need to "unthrottle" the DL server
+		 * and queue right away. Otherwise nothing might queue it. That's similar
+		 * to what enqueue_dl_entity() does on start_dl_timer==0. For now, just warn.
+		 */
+		WARN_ON_ONCE(!start_dl_timer(dl_se));
+
+		return;
+	}
+
 throttle:
 	if (dl_runtime_exceeded(dl_se) || dl_se->dl_yielded) {
 		dl_se->dl_throttled = 1;
@@ -1414,9 +1575,46 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 	}
 }
 
+/*
+ * In the non-defer mode, the idle time is not accounted, as the
+ * server provides a guarantee.
+ *
+ * If the dl_server is in defer mode, the idle time is also considered
+ * as time available for the fair server, avoiding a penalty for the
+ * rt scheduler that did not consume that time.
+ */
+void dl_server_update_idle_time(struct rq *rq, struct task_struct *p)
+{
+	s64 delta_exec, scaled_delta_exec;
+
+	if (!rq->fair_server.dl_defer)
+		return;
+
+	/* no need to discount more */
+	if (rq->fair_server.runtime < 0)
+		return;
+
+	delta_exec = rq_clock_task(rq) - p->se.exec_start;
+	if (delta_exec < 0)
+		return;
+
+	scaled_delta_exec = dl_scaled_delta_exec(rq, &rq->fair_server, delta_exec);
+
+	rq->fair_server.runtime -= scaled_delta_exec;
+
+	if (rq->fair_server.runtime < 0) {
+		rq->fair_server.dl_defer_running = 0;
+		rq->fair_server.runtime = 0;
+	}
+
+	p->se.exec_start = rq_clock_task(rq);
+}
+
 void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec)
 {
-	update_curr_dl_se(dl_se->rq, dl_se, delta_exec);
+	/* 0 runtime = fair server disabled */
+	if (dl_se->dl_runtime)
+		update_curr_dl_se(dl_se->rq, dl_se, delta_exec);
 }
 
 void dl_server_start(struct sched_dl_entity *dl_se)
@@ -1430,6 +1628,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 		dl_se->dl_period = 1000 * NSEC_PER_MSEC;
 
 		dl_se->dl_server = 1;
+		dl_se->dl_defer = 1;
 		setup_new_dl_entity(dl_se);
 	}
 
@@ -1447,6 +1646,9 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
 		return;
 
 	dequeue_dl_entity(dl_se, DEQUEUE_SLEEP);
+	hrtimer_try_to_cancel(&dl_se->dl_timer);
+	dl_se->dl_defer_armed = 0;
+	dl_se->dl_throttled = 0;
 }
 
 void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
@@ -1758,7 +1960,7 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
 	 * be counted in the active utilization; hence, we need to call
 	 * add_running_bw().
 	 */
-	if (dl_se->dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
+	if (!dl_se->dl_defer && dl_se->dl_throttled && !(flags & ENQUEUE_REPLENISH)) {
 		if (flags & ENQUEUE_WAKEUP)
 			task_contending(dl_se, flags);
 
@@ -1780,6 +1982,25 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags)
 		setup_new_dl_entity(dl_se);
 	}
 
+	/*
+	 * If the reservation is still throttled, e.g., it got replenished but is a
+	 * deferred task and still got to wait, don't enqueue.
+	 */
+	if (dl_se->dl_throttled && start_dl_timer(dl_se))
+		return;
+
+	/*
+	 * We're about to enqueue, make sure we're not ->dl_throttled!
+	 * In case the timer was not started, say because the defer time
+	 * has passed, mark as not throttled and mark unarmed.
+	 * Also cancel earlier timers, since letting those run is pointless.
+	 */
+	if (dl_se->dl_throttled) {
+		hrtimer_try_to_cancel(&dl_se->dl_timer);
+		dl_se->dl_defer_armed = 0;
+		dl_se->dl_throttled = 0;
+	}
+
 	__enqueue_dl_entity(dl_se);
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2d5d3e6c1e72..20e8b02c5cb3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1156,12 +1156,13 @@ s64 update_curr_common(struct rq *rq)
 static void update_curr(struct cfs_rq *cfs_rq)
 {
 	struct sched_entity *curr = cfs_rq->curr;
+	struct rq *rq = rq_of(cfs_rq);
 	s64 delta_exec;
 
 	if (unlikely(!curr))
 		return;
 
-	delta_exec = update_curr_se(rq_of(cfs_rq), curr);
+	delta_exec = update_curr_se(rq, curr);
 	if (unlikely(delta_exec <= 0))
 		return;
 
@@ -1169,8 +1170,19 @@ static void update_curr(struct cfs_rq *cfs_rq)
 	update_deadline(cfs_rq, curr);
 	update_min_vruntime(cfs_rq);
 
-	if (entity_is_task(curr))
-		update_curr_task(task_of(curr), delta_exec);
+	if (entity_is_task(curr)) {
+		struct task_struct *p = task_of(curr);
+
+		update_curr_task(p, delta_exec);
+
+		/*
+		 * Any fair task that runs outside of fair_server should
+		 * account against fair_server such that it can account for
+		 * this time and possibly avoid running this period.
+		 */
+		if (p->dl_server != &rq->fair_server)
+			dl_server_update(&rq->fair_server, delta_exec);
+	}
 
 	account_cfs_rq_runtime(cfs_rq, delta_exec);
 }
@@ -6769,8 +6781,12 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	 */
 	util_est_enqueue(&rq->cfs, p);
 
-	if (!throttled_hierarchy(task_cfs_rq(p)) && !rq->cfs.h_nr_running)
+	if (!throttled_hierarchy(task_cfs_rq(p)) && !rq->cfs.h_nr_running) {
+		/* Account for idle runtime */
+		if (!rq->nr_running)
+			dl_server_update_idle_time(rq, rq->curr);
 		dl_server_start(&rq->fair_server);
+	}
 
 	/*
 	 * If in_iowait is set, the code below may not trigger any cpufreq
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 6135fbe83d68..5f8806bc6924 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -458,12 +458,14 @@ static void wakeup_preempt_idle(struct rq *rq, struct task_struct *p, int flags)
 
 static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
 {
+	dl_server_update_idle_time(rq, prev);
 }
 
 static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
 {
 	update_idle_core(rq);
 	schedstat_inc(rq->sched_goidle);
+	next->se.exec_start = rq_clock_task(rq);
 }
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 39c9669b23a7..76751b945474 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -328,7 +328,7 @@ extern bool __checkparam_dl(const struct sched_attr *attr);
 extern bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr);
 extern int  dl_cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
 extern int  dl_bw_check_overflow(int cpu);
-
+extern s64 dl_scaled_delta_exec(struct rq *rq, struct sched_dl_entity *dl_se, s64 delta_exec);
 /*
  * SCHED_DEADLINE supports servers (nested scheduling) with the following
  * interface:
@@ -356,6 +356,8 @@ extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 		    dl_server_has_tasks_f has_tasks,
 		    dl_server_pick_f pick);
 
+extern void dl_server_update_idle_time(struct rq *rq,
+		    struct task_struct *p);
 extern void fair_server_init(struct rq *rq);
 
 #ifdef CONFIG_CGROUP_SCHED
-- 
2.45.1
From: Daniel Bristot de Oliveira
Subject: [PATCH V7 6/9] sched/fair: Fair server interface
Date: Mon, 27 May 2024 14:06:52 +0200

Add an interface for fair server setup on debugfs.

Each CPU has two files under /debug/sched/fair_server/cpu{ID}:

 - runtime: set runtime in ns
 - period:  set period in ns

This then leaves /proc/sys/kernel/sched_rt_{period,runtime}_us to set
bounds on admission control.

The interface also adds the server to the dl bandwidth accounting.
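As a usage illustration only (not part of the patch): assuming debugfs is
mounted at the conventional /sys/kernel/debug, a small userspace helper
could set the CPU 0 fair server to, say, 20 ms of runtime per 50 ms
period. The mount point, paths, and values below are assumptions of this
sketch; the files expect nanoseconds, as described above.

/*
 * Illustrative sketch: configure the CPU 0 fair server via debugfs.
 * The debugfs mount point and the chosen values are assumptions of
 * this example, not mandated by the patch.
 */
#include <stdio.h>

static int write_u64(const char *path, unsigned long long val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%llu\n", val);
	return fclose(f);
}

int main(void)
{
	const char *dir = "/sys/kernel/debug/sched/fair_server/cpu0";
	char path[128];

	/* set the period before the runtime; the kernel rejects runtime > period */
	snprintf(path, sizeof(path), "%s/period", dir);
	if (write_u64(path, 50000000ULL))	/* 50 ms, in ns */
		return 1;

	snprintf(path, sizeof(path), "%s/runtime", dir);
	return write_u64(path, 20000000ULL) ? 1 : 0;	/* 20 ms, in ns */
}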
Signed-off-by: Daniel Bristot de Oliveira
Reviewed-by: Suleiman Souhlal
Tested-by: Juri Lelli
Tested-by: Vineeth Pillai
---
 kernel/sched/deadline.c | 103 +++++++++++++++++++++-----
 kernel/sched/debug.c    | 159 ++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h    |   3 +
 kernel/sched/topology.c |   8 ++
 4 files changed, 256 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index eddfe18d9762..f8afe0a69c1e 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -320,19 +320,12 @@ void sub_running_bw(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 	__sub_running_bw(dl_se->dl_bw, dl_rq);
 }
 
-static void dl_change_utilization(struct task_struct *p, u64 new_bw)
+static void dl_rq_change_utilization(struct rq *rq, struct sched_dl_entity *dl_se, u64 new_bw)
 {
-	struct rq *rq;
-
-	WARN_ON_ONCE(p->dl.flags & SCHED_FLAG_SUGOV);
-
-	if (task_on_rq_queued(p))
-		return;
+	if (dl_se->dl_non_contending) {
+		sub_running_bw(dl_se, &rq->dl);
+		dl_se->dl_non_contending = 0;
 
-	rq = task_rq(p);
-	if (p->dl.dl_non_contending) {
-		sub_running_bw(&p->dl, &rq->dl);
-		p->dl.dl_non_contending = 0;
 		/*
 		 * If the timer handler is currently running and the
 		 * timer cannot be canceled, inactive_task_timer()
@@ -340,13 +333,25 @@ static void dl_change_utilization(struct task_struct *p, u64 new_bw)
 		 * will not touch the rq's active utilization,
 		 * so we are still safe.
 		 */
-		if (hrtimer_try_to_cancel(&p->dl.inactive_timer) == 1)
-			put_task_struct(p);
+		if (hrtimer_try_to_cancel(&dl_se->inactive_timer) == 1) {
+			if (!dl_server(dl_se))
+				put_task_struct(dl_task_of(dl_se));
+		}
 	}
-	__sub_rq_bw(p->dl.dl_bw, &rq->dl);
+	__sub_rq_bw(dl_se->dl_bw, &rq->dl);
 	__add_rq_bw(new_bw, &rq->dl);
 }
 
+static void dl_change_utilization(struct task_struct *p, u64 new_bw)
+{
+	WARN_ON_ONCE(p->dl.flags & SCHED_FLAG_SUGOV);
+
+	if (task_on_rq_queued(p))
+		return;
+
+	dl_rq_change_utilization(task_rq(p), &p->dl, new_bw);
+}
+
 static void __dl_clear_params(struct sched_dl_entity *dl_se);
 
 /*
@@ -1621,11 +1626,17 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 {
 	struct rq *rq = dl_se->rq;
 
+	/*
+	 * XXX: the apply does not work fine at the init phase for the
+	 * fair server because things are not yet set. We need to improve
+	 * this before getting generic.
+	 */
 	if (!dl_server(dl_se)) {
 		/* Disabled */
-		dl_se->dl_runtime = 0;
-		dl_se->dl_deadline = 1000 * NSEC_PER_MSEC;
-		dl_se->dl_period = 1000 * NSEC_PER_MSEC;
+		u64 runtime = 0;
+		u64 period = 1000 * NSEC_PER_MSEC;
+
+		dl_server_apply_params(dl_se, runtime, period, 1);
 
 		dl_se->dl_server = 1;
 		dl_se->dl_defer = 1;
@@ -1660,6 +1671,64 @@ void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 	dl_se->server_pick = pick;
 }
 
+void __dl_server_attach_root(struct sched_dl_entity *dl_se, struct rq *rq)
+{
+	u64 new_bw = dl_se->dl_bw;
+	int cpu = cpu_of(rq);
+	struct dl_bw *dl_b;
+
+	dl_b = dl_bw_of(cpu_of(rq));
+	guard(raw_spinlock)(&dl_b->lock);
+
+	if (!dl_bw_cpus(cpu))
+		return;
+
+	__dl_add(dl_b, new_bw, dl_bw_cpus(cpu));
+}
+
+int dl_server_apply_params(struct sched_dl_entity *dl_se, u64 runtime, u64 period, bool init)
+{
+	u64 old_bw = init ? 0 : to_ratio(dl_se->dl_period, dl_se->dl_runtime);
+	u64 new_bw = to_ratio(period, runtime);
+	struct rq *rq = dl_se->rq;
+	int cpu = cpu_of(rq);
+	struct dl_bw *dl_b;
+	unsigned long cap;
+	int retval = 0;
+	int cpus;
+
+	dl_b = dl_bw_of(cpu);
+	guard(raw_spinlock)(&dl_b->lock);
+
+	cpus = dl_bw_cpus(cpu);
+	cap = dl_bw_capacity(cpu);
+
+	if (__dl_overflow(dl_b, cap, old_bw, new_bw))
+		return -EBUSY;
+
+	if (init) {
+		__add_rq_bw(new_bw, &rq->dl);
+		__dl_add(dl_b, new_bw, cpus);
+	} else {
+		__dl_sub(dl_b, dl_se->dl_bw, cpus);
+		__dl_add(dl_b, new_bw, cpus);
+
+		dl_rq_change_utilization(rq, dl_se, new_bw);
+	}
+
+	dl_se->dl_runtime = runtime;
+	dl_se->dl_deadline = period;
+	dl_se->dl_period = period;
+
+	dl_se->runtime = 0;
+	dl_se->deadline = 0;
+
+	dl_se->dl_bw = to_ratio(dl_se->dl_period, dl_se->dl_runtime);
+	dl_se->dl_density = to_ratio(dl_se->dl_deadline, dl_se->dl_runtime);
+
+	return retval;
+}
+
 /*
  * Update the current task's runtime statistics (provided it is still
  * a -deadline task and has not been removed from the dl_rq).
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index c1eb9a1afd13..b14ffb100867 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -333,8 +333,165 @@ static const struct file_operations sched_debug_fops = {
 	.release	= seq_release,
 };
 
+enum dl_param {
+	DL_RUNTIME = 0,
+	DL_PERIOD,
+};
+
+static unsigned long fair_server_period_max = (1 << 22) * NSEC_PER_USEC; /* ~4 seconds */
+static unsigned long fair_server_period_min = (100) * NSEC_PER_USEC;     /* 100 us */
+
+static ssize_t sched_fair_server_write(struct file *filp, const char __user *ubuf,
+				       size_t cnt, loff_t *ppos, enum dl_param param)
+{
+	long cpu = (long) ((struct seq_file *) filp->private_data)->private;
+	struct rq *rq = cpu_rq(cpu);
+	u64 runtime, period;
+	size_t err;
+	int retval;
+	u64 value;
+
+	err = kstrtoull_from_user(ubuf, cnt, 10, &value);
+	if (err)
+		return err;
+
+	scoped_guard (rq_lock_irqsave, rq) {
+		runtime = rq->fair_server.dl_runtime;
+		period = rq->fair_server.dl_period;
+
+		switch (param) {
+		case DL_RUNTIME:
+			if (runtime == value)
+				break;
+			runtime = value;
+			break;
+		case DL_PERIOD:
+			if (value == period)
+				break;
+			period = value;
+			break;
+		}
+
+		if (runtime > period ||
+		    period > fair_server_period_max ||
+		    period < fair_server_period_min) {
+			return -EINVAL;
+		}
+
+		if (rq->cfs.h_nr_running) {
+			update_rq_clock(rq);
+			dl_server_stop(&rq->fair_server);
+		}
+
+		retval = dl_server_apply_params(&rq->fair_server, runtime, period, 0);
+		if (retval)
+			cnt = retval;
+
+		if (!runtime)
+			printk_deferred("Fair server disabled in CPU %d, system may crash due to starvation.\n",
+					cpu_of(rq));
+
+		if (rq->cfs.h_nr_running)
+			dl_server_start(&rq->fair_server);
+	}
+
+	*ppos += cnt;
+	return cnt;
+}
+
+static size_t sched_fair_server_show(struct seq_file *m, void *v, enum dl_param param)
+{
+	unsigned long cpu = (unsigned long) m->private;
+	struct rq *rq = cpu_rq(cpu);
+	u64 value;
+
+	switch (param) {
+	case DL_RUNTIME:
+		value = rq->fair_server.dl_runtime;
+		break;
+	case DL_PERIOD:
+		value = rq->fair_server.dl_period;
+		break;
+	}
+
+	seq_printf(m, "%llu\n", value);
+	return 0;
+}
+
+static ssize_t
+sched_fair_server_runtime_write(struct file *filp, const char __user *ubuf,
+				size_t cnt, loff_t *ppos)
+{
+	return sched_fair_server_write(filp, ubuf, cnt, ppos, DL_RUNTIME);
+}
+
+static int sched_fair_server_runtime_show(struct seq_file *m, void *v)
+{
+	return sched_fair_server_show(m, v, DL_RUNTIME);
+}
+
+static int sched_fair_server_runtime_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, sched_fair_server_runtime_show, inode->i_private);
+}
+
+static const struct file_operations fair_server_runtime_fops = {
+	.open		= sched_fair_server_runtime_open,
+	.write		= sched_fair_server_runtime_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static ssize_t
+sched_fair_server_period_write(struct file *filp, const char __user *ubuf,
+			       size_t cnt, loff_t *ppos)
+{
+	return sched_fair_server_write(filp, ubuf, cnt, ppos, DL_PERIOD);
+}
+
+static int sched_fair_server_period_show(struct seq_file *m, void *v)
+{
+	return sched_fair_server_show(m, v, DL_PERIOD);
+}
+
+static int sched_fair_server_period_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, sched_fair_server_period_show, inode->i_private);
+}
+
+static const struct file_operations fair_server_period_fops = {
+	.open		= sched_fair_server_period_open,
+	.write		= sched_fair_server_period_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static struct dentry *debugfs_sched;
 
+static void debugfs_fair_server_init(void)
+{
+	struct dentry *d_fair;
+	unsigned long cpu;
+
+	d_fair = debugfs_create_dir("fair_server", debugfs_sched);
+	if (!d_fair)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		struct dentry *d_cpu;
+		char buf[32];
+
+		snprintf(buf, sizeof(buf), "cpu%lu", cpu);
+		d_cpu = debugfs_create_dir(buf, d_fair);
+
+		debugfs_create_file("runtime", 0644, d_cpu, (void *) cpu, &fair_server_runtime_fops);
+		debugfs_create_file("period", 0644, d_cpu, (void *) cpu, &fair_server_period_fops);
+	}
+}
+
 static __init int sched_init_debug(void)
 {
 	struct dentry __maybe_unused *numa;
@@ -374,6 +531,8 @@ static __init int sched_init_debug(void)
 
 	debugfs_create_file("debug", 0444, debugfs_sched, NULL, &sched_debug_fops);
 
+	debugfs_fair_server_init();
+
 	return 0;
 }
 late_initcall(sched_init_debug);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 76751b945474..ed7be7b085af 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -359,6 +359,9 @@ extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 extern void dl_server_update_idle_time(struct rq *rq,
 		    struct task_struct *p);
 extern void fair_server_init(struct rq *rq);
+extern void __dl_server_attach_root(struct sched_dl_entity *dl_se, struct rq *rq);
+extern int dl_server_apply_params(struct sched_dl_entity *dl_se,
+		    u64 runtime, u64 period, bool init);
 
 #ifdef CONFIG_CGROUP_SCHED
 
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index a6994a1fcc90..a172ed4d95af 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -516,6 +516,14 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
 	if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
 		set_rq_online(rq);
 
+	/*
+	 * Because the rq is not a task, dl_add_task_root_domain() did not
+	 * move the fair server bw to the rd if it already started.
+	 * Add it now.
+	 */
+	if (rq->fair_server.dl_server)
+		__dl_server_attach_root(&rq->fair_server, rq);
+
 	rq_unlock_irqrestore(rq, &rf);
 
 	if (old_rd)
-- 
2.45.1

From: Daniel Bristot de Oliveira
Subject: [PATCH V7 7/9] sched/core: Fix priority checking for DL server picks
Date: Mon, 27 May 2024 14:06:53 +0200
Message-ID: <48b78521d86f3b33c24994d843c1aad6b987dda9.1716811044.git.bristot@kernel.org>

From: "Joel Fernandes (Google)"

In core scheduling, a DL server pick (which is a CFS task) should be
given higher priority than tasks in other classes. Not doing so causes
CFS starvation. A kselftest is added later to demonstrate this. Without
this fix, a CFS task competing with RT tasks can be completely starved,
and the DL server's boosting is completely ignored.

Fix these problems.
Reviewed-by: Vineeth Pillai
Reported-by: Suleiman Souhlal
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Daniel Bristot de Oliveira
Reviewed-by: Suleiman Souhlal
Tested-by: Juri Lelli
Tested-by: Vineeth Pillai
---
 kernel/sched/core.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 53f0470a1d0a..01336277eac9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -162,6 +162,9 @@ static inline int __task_prio(const struct task_struct *p)
 	if (p->sched_class == &stop_sched_class) /* trumps deadline */
 		return -2;

+	if (p->dl_server)
+		return -1; /* deadline */
+
 	if (rt_prio(p->prio)) /* includes deadline */
 		return p->prio; /* [-1, 99] */

@@ -191,8 +194,24 @@ static inline bool prio_less(const struct task_struct *a,
 	if (-pb < -pa)
 		return false;

-	if (pa == -1) /* dl_prio() doesn't work because of stop_class above */
-		return !dl_time_before(a->dl.deadline, b->dl.deadline);
+	if (pa == -1) { /* dl_prio() doesn't work because of stop_class above */
+		const struct sched_dl_entity *a_dl, *b_dl;
+
+		a_dl = &a->dl;
+		/*
+		 * Since 'a' and 'b' can be CFS tasks served by the DL server,
+		 * __task_prio() can return -1 (for DL) even for those. In that
+		 * case, get to the dl_server's DL entity.
+		 */
+		if (a->dl_server)
+			a_dl = a->dl_server;
+
+		b_dl = &b->dl;
+		if (b->dl_server)
+			b_dl = b->dl_server;
+
+		return !dl_time_before(a_dl->deadline, b_dl->deadline);
+	}

 	if (pa == MAX_RT_PRIO + MAX_NICE)	/* fair */
 		return cfs_prio_less(a, b, in_fi);
-- 
2.45.1
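The comparison above relies on dl_time_before(), which orders u64 deadlines in a wraparound-safe way. A standalone re-implementation of the idiom for illustration (the kernel ships its own helper; this version exists only to show why the subtraction trick works):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Wraparound-safe "a is earlier than b" check on u64 clock values. */
static bool dl_time_before_sketch(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

int main(void)
{
	/* Works across u64 overflow: UINT64_MAX - 2 is "just before" 5. */
	printf("%d\n", dl_time_before_sketch(UINT64_MAX - 2, 5)); /* 1 */
	printf("%d\n", dl_time_before_sketch(10, 5));             /* 0 */
	return 0;
}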
From: Daniel Bristot de Oliveira
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel@vger.kernel.org, Luca Abeni, Tommaso Cucinotta, Thomas Gleixner, Joel Fernandes, Vineeth Pillai, Shuah Khan, bristot@kernel.org, Phil Auld, Suleiman Souhlal, Youssef Esmat
Subject: [PATCH V7 8/9] sched/core: Fix picking of tasks for core scheduling with DL server
Date: Mon, 27 May 2024 14:06:54 +0200

From: "Joel Fernandes (Google)"

* Use the simple CFS pick_task for the DL server's pick_task

  The DL server's pick_task calls CFS's pick_next_task_fair(). This is
  wrong because core scheduling's pick_task only calls CFS's pick_task()
  to evaluate the CFS task (comparing it across CPUs), not to
  affirmatively pick the next task. This caused RB-tree corruption
  issues in CFS that were found by syzbot.

* Make pick_task_fair() clear ->dl_server

  A DL server pick might set ->dl_server, but it is possible that the
  task will never run (say, the other HT has a stop task). If the CFS
  task is later picked directly (say, without the DL server),
  ->dl_server would still be set. So clear it in pick_task_fair(). This
  fixes the KASAN issue reported by syzbot in set_next_entity().

(DL refactoring suggestions by Vineeth Pillai.)

Reviewed-by: Vineeth Pillai
Reported-by: Suleiman Souhlal
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Daniel Bristot de Oliveira
Reviewed-by: Suleiman Souhlal
Tested-by: Juri Lelli
Tested-by: Vineeth Pillai
---
 include/linux/sched.h   |  3 ++-
 kernel/sched/deadline.c | 27 ++++++++++++++++++++++-----
 kernel/sched/fair.c     | 23 +++++++++++++++++++++--
 kernel/sched/sched.h    |  3 ++-
 4 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 611771fec4df..eb8f8b7929c8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -684,7 +684,8 @@ struct sched_dl_entity {
 	 */
 	struct rq			*rq;
 	dl_server_has_tasks_f		server_has_tasks;
-	dl_server_pick_f		server_pick;
+	dl_server_pick_f		server_pick_next;
+	dl_server_pick_f		server_pick_task;

 #ifdef CONFIG_RT_MUTEXES
 	/*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f8afe0a69c1e..0dbb42cf7fe6 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1664,11 +1664,13 @@ void dl_server_stop(struct sched_dl_entity *dl_se)

 void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 		    dl_server_has_tasks_f has_tasks,
-		    dl_server_pick_f pick)
+		    dl_server_pick_f pick_next,
+		    dl_server_pick_f pick_task)
 {
 	dl_se->rq = rq;
 	dl_se->server_has_tasks = has_tasks;
-	dl_se->server_pick = pick;
+	dl_se->server_pick_next = pick_next;
+	dl_se->server_pick_task = pick_task;
 }

 void __dl_server_attach_root(struct sched_dl_entity *dl_se, struct rq *rq)
@@ -2394,7 +2396,12 @@ static struct sched_dl_entity *pick_next_dl_entity(struct dl_rq *dl_rq)
 	return __node_2_dle(left);
 }

-static struct task_struct *pick_task_dl(struct rq *rq)
+/*
+ * __pick_next_task_dl - Helper to pick the next -deadline task to run.
+ * @rq: The runqueue to pick the next task from.
+ * @peek: If true, just peek at the next task. Only relevant for the DL server.
+ */
+static struct task_struct *__pick_next_task_dl(struct rq *rq, bool peek)
 {
 	struct sched_dl_entity *dl_se;
 	struct dl_rq *dl_rq = &rq->dl;
@@ -2408,7 +2415,10 @@ static struct task_struct *pick_task_dl(struct rq *rq)
 	WARN_ON_ONCE(!dl_se);

 	if (dl_server(dl_se)) {
-		p = dl_se->server_pick(dl_se);
+		if (IS_ENABLED(CONFIG_SMP) && peek)
+			p = dl_se->server_pick_task(dl_se);
+		else
+			p = dl_se->server_pick_next(dl_se);
 		if (!p) {
 			WARN_ON_ONCE(1);
 			dl_se->dl_yielded = 1;
@@ -2423,11 +2433,18 @@ static struct task_struct *pick_task_dl(struct rq *rq)
 	return p;
 }

+#ifdef CONFIG_SMP
+static struct task_struct *pick_task_dl(struct rq *rq)
+{
+	return __pick_next_task_dl(rq, true);
+}
+#endif
+
 static struct task_struct *pick_next_task_dl(struct rq *rq)
 {
 	struct task_struct *p;

-	p = pick_task_dl(rq);
+	p = __pick_next_task_dl(rq, false);
 	if (!p)
 		return p;

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 20e8b02c5cb3..14ec002bb4f9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8484,6 +8484,14 @@ static struct task_struct *pick_task_fair(struct rq *rq)
 		cfs_rq = group_cfs_rq(se);
 	} while (cfs_rq);

+	/*
+	 * This can be called directly from CFS's ->pick_task(), or indirectly
+	 * from DL's ->pick_task() when the fair server is enabled. In the
+	 * indirect case, DL will set ->dl_server just after this function is
+	 * called, so it is OK to clear. In the direct case, we are picking
+	 * directly, so we must clear it.
+	 */
+	task_of(se)->dl_server = NULL;
+
 	return task_of(se);
 }
 #endif
@@ -8643,7 +8651,16 @@ static bool fair_server_has_tasks(struct sched_dl_entity *dl_se)
 	return !!dl_se->rq->cfs.nr_running;
 }

-static struct task_struct *fair_server_pick(struct sched_dl_entity *dl_se)
+static struct task_struct *fair_server_pick_task(struct sched_dl_entity *dl_se)
+{
+#ifdef CONFIG_SMP
+	return pick_task_fair(dl_se->rq);
+#else
+	return NULL;
+#endif
+}
+
+static struct task_struct *fair_server_pick_next(struct sched_dl_entity *dl_se)
 {
 	return pick_next_task_fair(dl_se->rq, NULL, NULL);
 }
@@ -8654,7 +8671,9 @@ void fair_server_init(struct rq *rq)

 	init_dl_entity(dl_se);

-	dl_server_init(dl_se, rq, fair_server_has_tasks, fair_server_pick);
+	dl_server_init(dl_se, rq, fair_server_has_tasks, fair_server_pick_next,
+		       fair_server_pick_task);
 }

 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ed7be7b085af..3b8684b5ec8e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -354,7 +354,8 @@ extern void dl_server_start(struct sched_dl_entity *dl_se);
 extern void dl_server_stop(struct sched_dl_entity *dl_se);
 extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
 		    dl_server_has_tasks_f has_tasks,
-		    dl_server_pick_f pick);
+		    dl_server_pick_f pick_next,
+		    dl_server_pick_f pick_task);

 extern void dl_server_update_idle_time(struct rq *rq, struct task_struct *p);
-- 
2.45.1
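An editorial toy model of the peek/commit contract this patch establishes (all names invented, nothing here is kernel code): pick_task() must be free of side effects because core scheduling may call it several times to compare candidates across SMT siblings, while pick_next_task() commits the choice exactly once.

#include <stdio.h>

struct toy_rq {
	int next;	/* candidate task id */
	int running;	/* committed task id, -1 if none */
};

/* Peek: evaluate the candidate only, no state change. */
static int toy_pick_task(const struct toy_rq *rq)
{
	return rq->next;
}

/* Commit: the candidate actually starts running. */
static int toy_pick_next_task(struct toy_rq *rq)
{
	rq->running = rq->next;
	return rq->running;
}

int main(void)
{
	struct toy_rq rq = { .next = 42, .running = -1 };

	/* Core scheduling may peek many times without side effects... */
	toy_pick_task(&rq);
	toy_pick_task(&rq);
	/* ...and commits only once the core-wide decision is made. */
	printf("running: %d\n", toy_pick_next_task(&rq));
	return 0;
}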
From: Daniel Bristot de Oliveira
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Valentin Schneider, linux-kernel@vger.kernel.org, Luca Abeni, Tommaso Cucinotta, Thomas Gleixner, Joel Fernandes, Vineeth Pillai, Shuah Khan, bristot@kernel.org, Phil Auld, Suleiman Souhlal, Youssef Esmat
Subject: [PATCH V7 9/9] sched/rt: Remove default bandwidth control
Date: Mon, 27 May 2024 14:06:55 +0200
Message-ID: <14d562db55df5c3c780d91940743acb166895ef7.1716811044.git.bristot@kernel.org>

From: Peter Zijlstra

Now that the fair server exists, RT bandwidth control is no longer
needed unless RT_GROUP_SCHED is enabled. Enable the fair server with
parameters equivalent to the RT throttling defaults.
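A quick, editorial sanity check of that equivalence: the diff below gives the fair server 50 ms every 1000 ms, i.e. the same 5% of CPU that the old RT throttling default (950 ms of RT runtime per 1000 ms period) left for non-RT tasks. A standalone sketch mirroring the kernel's fixed-point bandwidth idiom; BW_SHIFT = 20 is an assumption taken from the "More than 4 hours if BW_SHIFT equals 20" comment visible in the diff.

#include <stdint.h>
#include <stdio.h>

#define BW_SHIFT	20		/* assumed fixed-point shift */
#define NSEC_PER_MSEC	1000000ULL

/* Fixed-point utilization: (runtime << BW_SHIFT) / period. */
static uint64_t to_ratio_sketch(uint64_t period, uint64_t runtime)
{
	if (period == 0)
		return 0;
	return (runtime << BW_SHIFT) / period;
}

int main(void)
{
	/* Fair server defaults from this patch: 50ms every 1s. */
	uint64_t fair = to_ratio_sketch(1000 * NSEC_PER_MSEC, 50 * NSEC_PER_MSEC);
	/* Old RT throttling default: RT capped at 950ms every 1s. */
	uint64_t rt = to_ratio_sketch(1000 * NSEC_PER_MSEC, 950 * NSEC_PER_MSEC);

	printf("fair server bw: ~%.1f%%\n", 100.0 * fair / (1 << BW_SHIFT));
	printf("old RT cap:     ~%.1f%%\n", 100.0 * rt / (1 << BW_SHIFT));
	return 0;
}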
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Daniel Bristot de Oliveira
Acked-by: Juri Lelli
Reviewed-by: Suleiman Souhlal
Tested-by: Juri Lelli
Tested-by: Vineeth Pillai
---
 kernel/sched/core.c     |   9 +-
 kernel/sched/deadline.c |   5 +-
 kernel/sched/debug.c    |   3 +
 kernel/sched/rt.c       | 242 ++++++++++++++++++----------------------
 kernel/sched/sched.h    |   3 +-
 5 files changed, 120 insertions(+), 142 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 01336277eac9..8439c2f992db 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9987,8 +9987,6 @@ void __init sched_init(void)
 #endif /* CONFIG_RT_GROUP_SCHED */
 	}

-	init_rt_bandwidth(&def_rt_bandwidth, global_rt_period(), global_rt_runtime());
-
 #ifdef CONFIG_SMP
 	init_defrootdomain();
 #endif
@@ -10043,8 +10041,13 @@ void __init sched_init(void)
 		init_tg_cfs_entry(&root_task_group, &rq->cfs, NULL, i, NULL);
 #endif /* CONFIG_FAIR_GROUP_SCHED */

-		rq->rt.rt_runtime = def_rt_bandwidth.rt_runtime;
 #ifdef CONFIG_RT_GROUP_SCHED
+		/*
+		 * This is required for the init CPU because
+		 * rt.c:__enable_runtime() only starts working after
+		 * scheduler_running is set, which is not the case yet.
+		 */
+		rq->rt.rt_runtime = global_rt_runtime();
 		init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL);
 #endif
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0dbb42cf7fe6..7df8179bfa08 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1554,6 +1554,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 	if (dl_se == &rq->fair_server)
 		return;

+#ifdef CONFIG_RT_GROUP_SCHED
 	/*
 	 * Because -- for now -- we share the rt bandwidth, we need to
 	 * account our runtime there too, otherwise actual rt tasks
@@ -1578,6 +1579,7 @@ static void update_curr_dl_se(struct rq *rq, struct sched_dl_entity *dl_se, s64
 		rt_rq->rt_time += delta_exec;
 		raw_spin_unlock(&rt_rq->rt_runtime_lock);
 	}
+#endif
 }

 /*
@@ -1632,8 +1634,7 @@ void dl_server_start(struct sched_dl_entity *dl_se)
 	 * this before getting generic.
 	 */
 	if (!dl_server(dl_se)) {
-		/* Disabled */
-		u64 runtime = 0;
+		u64 runtime =  50 * NSEC_PER_MSEC;
 		u64 period = 1000 * NSEC_PER_MSEC;

 		dl_server_apply_params(dl_se, runtime, period, 1);
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index b14ffb100867..2d5851d65c67 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -889,9 +889,12 @@ void print_rt_rq(struct seq_file *m, int cpu, struct rt_rq *rt_rq)
 		SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", #x, SPLIT_NS(rt_rq->x))

 	PU(rt_nr_running);
+
+#ifdef CONFIG_RT_GROUP_SCHED
 	P(rt_throttled);
 	PN(rt_time);
 	PN(rt_runtime);
+#endif

 #undef PN
 #undef PU
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index aa4c1c874fa4..fb591b98d71f 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -8,10 +8,6 @@ int sched_rr_timeslice = RR_TIMESLICE;
 /* More than 4 hours if BW_SHIFT equals 20. */
 static const u64 max_rt_runtime = MAX_BW;

-static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
-
-struct rt_bandwidth def_rt_bandwidth;
-
 /*
  * period over which we measure -rt task CPU usage in us.
 * default: 1s
 */
@@ -66,6 +62,40 @@ static int __init sched_rt_sysctl_init(void)
 late_initcall(sched_rt_sysctl_init);
 #endif

+void init_rt_rq(struct rt_rq *rt_rq)
+{
+	struct rt_prio_array *array;
+	int i;
+
+	array = &rt_rq->active;
+	for (i = 0; i < MAX_RT_PRIO; i++) {
+		INIT_LIST_HEAD(array->queue + i);
+		__clear_bit(i, array->bitmap);
+	}
+	/* delimiter for bitsearch: */
+	__set_bit(MAX_RT_PRIO, array->bitmap);
+
+#if defined CONFIG_SMP
+	rt_rq->highest_prio.curr = MAX_RT_PRIO-1;
+	rt_rq->highest_prio.next = MAX_RT_PRIO-1;
+	rt_rq->overloaded = 0;
+	plist_head_init(&rt_rq->pushable_tasks);
+#endif /* CONFIG_SMP */
+	/* We start in dequeued state, because no RT tasks are queued */
+	rt_rq->rt_queued = 0;
+
+#ifdef CONFIG_RT_GROUP_SCHED
+	rt_rq->rt_time = 0;
+	rt_rq->rt_throttled = 0;
+	rt_rq->rt_runtime = 0;
+	raw_spin_lock_init(&rt_rq->rt_runtime_lock);
+#endif
+}
+
+#ifdef CONFIG_RT_GROUP_SCHED
+
+static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
+
 static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer)
 {
 	struct rt_bandwidth *rt_b =
@@ -130,35 +160,6 @@ static void start_rt_bandwidth(struct rt_bandwidth *rt_b)
 	do_start_rt_bandwidth(rt_b);
 }

-void init_rt_rq(struct rt_rq *rt_rq)
-{
-	struct rt_prio_array *array;
-	int i;
-
-	array = &rt_rq->active;
-	for (i = 0; i < MAX_RT_PRIO; i++) {
-		INIT_LIST_HEAD(array->queue + i);
-		__clear_bit(i, array->bitmap);
-	}
-	/* delimiter for bitsearch: */
-	__set_bit(MAX_RT_PRIO, array->bitmap);
-
-#if defined CONFIG_SMP
-	rt_rq->highest_prio.curr = MAX_RT_PRIO-1;
-	rt_rq->highest_prio.next = MAX_RT_PRIO-1;
-	rt_rq->overloaded = 0;
-	plist_head_init(&rt_rq->pushable_tasks);
-#endif /* CONFIG_SMP */
-	/* We start is dequeued state, because no RT tasks are queued */
-	rt_rq->rt_queued = 0;
-
-	rt_rq->rt_time = 0;
-	rt_rq->rt_throttled = 0;
-	rt_rq->rt_runtime = 0;
-	raw_spin_lock_init(&rt_rq->rt_runtime_lock);
-}
-
-#ifdef CONFIG_RT_GROUP_SCHED
 static void destroy_rt_bandwidth(struct rt_bandwidth *rt_b)
 {
 	hrtimer_cancel(&rt_b->rt_period_timer);
@@ -195,7 +196,6 @@ void unregister_rt_sched_group(struct task_group *tg)
 {
 	if (tg->rt_se)
 		destroy_rt_bandwidth(&tg->rt_bandwidth);
-
 }

 void free_rt_sched_group(struct task_group *tg)
@@ -253,8 +253,7 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
 	if (!tg->rt_se)
 		goto err;

-	init_rt_bandwidth(&tg->rt_bandwidth,
-			ktime_to_ns(def_rt_bandwidth.rt_period), 0);
+	init_rt_bandwidth(&tg->rt_bandwidth, ktime_to_ns(global_rt_period()), 0);

 	for_each_possible_cpu(i) {
 		rt_rq = kzalloc_node(sizeof(struct rt_rq),
@@ -604,70 +603,6 @@ static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
 	return &rt_rq->tg->rt_bandwidth;
 }

-#else /* !CONFIG_RT_GROUP_SCHED */
-
-static inline u64 sched_rt_runtime(struct rt_rq *rt_rq)
-{
-	return rt_rq->rt_runtime;
-}
-
-static inline u64 sched_rt_period(struct rt_rq *rt_rq)
-{
-	return ktime_to_ns(def_rt_bandwidth.rt_period);
-}
-
-typedef struct rt_rq *rt_rq_iter_t;
-
-#define for_each_rt_rq(rt_rq, iter, rq) \
-	for ((void) iter, rt_rq = &rq->rt; rt_rq; rt_rq = NULL)
-
-#define for_each_sched_rt_entity(rt_se) \
-	for (; rt_se; rt_se = NULL)
-
-static inline struct rt_rq *group_rt_rq(struct sched_rt_entity *rt_se)
-{
-	return NULL;
-}
-
-static inline void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
-{
-	struct rq *rq = rq_of_rt_rq(rt_rq);
-
-	if (!rt_rq->rt_nr_running)
-		return;
-
-	enqueue_top_rt_rq(rt_rq);
-	resched_curr(rq);
-}
-
-static inline void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
-{
-	dequeue_top_rt_rq(rt_rq, rt_rq->rt_nr_running);
-}
-
-static inline int rt_rq_throttled(struct rt_rq *rt_rq)
-{
-	return rt_rq->rt_throttled;
-}
-
-static inline const struct cpumask *sched_rt_period_mask(void)
-{
-	return cpu_online_mask;
-}
-
-static inline
-struct rt_rq *sched_rt_period_rt_rq(struct rt_bandwidth *rt_b, int cpu)
-{
-	return &cpu_rq(cpu)->rt;
-}
-
-static inline struct rt_bandwidth *sched_rt_bandwidth(struct rt_rq *rt_rq)
-{
-	return &def_rt_bandwidth;
-}
-
-#endif /* CONFIG_RT_GROUP_SCHED */
-
 bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
 {
 	struct rt_bandwidth *rt_b = sched_rt_bandwidth(rt_rq);
@@ -859,7 +794,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 	const struct cpumask *span;

 	span = sched_rt_period_mask();
-#ifdef CONFIG_RT_GROUP_SCHED
+
 	/*
 	 * FIXME: isolated CPUs should really leave the root task group,
 	 * whether they are isolcpus or were isolated via cpusets, lest
@@ -871,7 +806,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 	 */
 	if (rt_b == &root_task_group.rt_bandwidth)
 		span = cpu_online_mask;
-#endif
+
 	for_each_cpu(i, span) {
 		int enqueue = 0;
 		struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
@@ -938,18 +873,6 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 	return idle;
 }

-static inline int rt_se_prio(struct sched_rt_entity *rt_se)
-{
-#ifdef CONFIG_RT_GROUP_SCHED
-	struct rt_rq *rt_rq = group_rt_rq(rt_se);
-
-	if (rt_rq)
-		return rt_rq->highest_prio.curr;
-#endif
-
-	return rt_task_of(rt_se)->prio;
-}
-
 static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
 {
 	u64 runtime = sched_rt_runtime(rt_rq);
@@ -993,6 +916,72 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
 	return 0;
 }

+#else /* !CONFIG_RT_GROUP_SCHED */
+
+typedef struct rt_rq *rt_rq_iter_t;
+
+#define for_each_rt_rq(rt_rq, iter, rq) \
+	for ((void) iter, rt_rq = &rq->rt; rt_rq; rt_rq = NULL)
+
+#define for_each_sched_rt_entity(rt_se) \
+	for (; rt_se; rt_se = NULL)
+
+static inline struct rt_rq *group_rt_rq(struct sched_rt_entity *rt_se)
+{
+	return NULL;
+}
+
+static inline void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
+{
+	struct rq *rq = rq_of_rt_rq(rt_rq);
+
+	if (!rt_rq->rt_nr_running)
+		return;
+
+	enqueue_top_rt_rq(rt_rq);
+	resched_curr(rq);
+}
+
+static inline void sched_rt_rq_dequeue(struct rt_rq *rt_rq)
+{
+	dequeue_top_rt_rq(rt_rq, rt_rq->rt_nr_running);
+}
+
+static inline int rt_rq_throttled(struct rt_rq *rt_rq)
+{
+	return false;
+}
+
+static inline const struct cpumask *sched_rt_period_mask(void)
+{
+	return cpu_online_mask;
+}
+
+static inline
+struct rt_rq *sched_rt_period_rt_rq(struct rt_bandwidth *rt_b, int cpu)
+{
+	return &cpu_rq(cpu)->rt;
+}
+
+#ifdef CONFIG_SMP
+static void __enable_runtime(struct rq *rq) { }
+static void __disable_runtime(struct rq *rq) { }
+#endif
+
+#endif /* CONFIG_RT_GROUP_SCHED */
+
+static inline int rt_se_prio(struct sched_rt_entity *rt_se)
+{
+#ifdef CONFIG_RT_GROUP_SCHED
+	struct rt_rq *rt_rq = group_rt_rq(rt_se);
+
+	if (rt_rq)
+		return rt_rq->highest_prio.curr;
+#endif
+
+	return rt_task_of(rt_se)->prio;
+}
+
 /*
  * Update the current task's runtime statistics. Skip current tasks that
  * are not in our scheduling class.
  */
@@ -1000,7 +989,6 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
 static void update_curr_rt(struct rq *rq)
 {
 	struct task_struct *curr = rq->curr;
-	struct sched_rt_entity *rt_se = &curr->rt;
 	s64 delta_exec;

 	if (curr->sched_class != &rt_sched_class)
@@ -1010,6 +998,9 @@ static void update_curr_rt(struct rq *rq)
 	if (unlikely(delta_exec <= 0))
 		return;

+#ifdef CONFIG_RT_GROUP_SCHED
+	struct sched_rt_entity *rt_se = &curr->rt;
+
 	if (!rt_bandwidth_enabled())
 		return;

@@ -1028,6 +1019,7 @@ static void update_curr_rt(struct rq *rq)
 			do_start_rt_bandwidth(sched_rt_bandwidth(rt_rq));
 		}
 	}
+#endif
 }

 static void
@@ -1184,7 +1176,6 @@ dec_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 static void
 inc_rt_group(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 {
-	start_rt_bandwidth(&def_rt_bandwidth);
 }

 static inline
@@ -2912,19 +2903,6 @@ int sched_rt_can_attach(struct task_group *tg, struct task_struct *tsk)
 #ifdef CONFIG_SYSCTL
 static int sched_rt_global_constraints(void)
 {
-	unsigned long flags;
-	int i;
-
-	raw_spin_lock_irqsave(&def_rt_bandwidth.rt_runtime_lock, flags);
-	for_each_possible_cpu(i) {
-		struct rt_rq *rt_rq = &cpu_rq(i)->rt;
-
-		raw_spin_lock(&rt_rq->rt_runtime_lock);
-		rt_rq->rt_runtime = global_rt_runtime();
-		raw_spin_unlock(&rt_rq->rt_runtime_lock);
-	}
-	raw_spin_unlock_irqrestore(&def_rt_bandwidth.rt_runtime_lock, flags);
-
 	return 0;
 }
 #endif /* CONFIG_SYSCTL */
@@ -2944,12 +2922,6 @@ static int sched_rt_global_validate(void)

 static void sched_rt_do_global(void)
 {
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&def_rt_bandwidth.rt_runtime_lock, flags);
-	def_rt_bandwidth.rt_runtime = global_rt_runtime();
-	def_rt_bandwidth.rt_period = ns_to_ktime(global_rt_period());
-	raw_spin_unlock_irqrestore(&def_rt_bandwidth.rt_runtime_lock, flags);
 }

 static int sched_rt_handler(struct ctl_table *table, int write, void *buffer,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3b8684b5ec8e..c93de331171b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -729,13 +729,13 @@ struct rt_rq {
 #endif /* CONFIG_SMP */
 	int			rt_queued;

+#ifdef CONFIG_RT_GROUP_SCHED
 	int			rt_throttled;
 	u64			rt_time;
 	u64			rt_runtime;
 	/* Nests inside the rq lock: */
 	raw_spinlock_t		rt_runtime_lock;

-#ifdef CONFIG_RT_GROUP_SCHED
 	unsigned int		rt_nr_boosted;

 	struct rq		*rq;
@@ -2478,7 +2478,6 @@ extern void reweight_task(struct task_struct *p, int prio);
 extern void resched_curr(struct rq *rq);
 extern void resched_cpu(int cpu);

-extern struct rt_bandwidth def_rt_bandwidth;
 extern void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime);
 extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);

-- 
2.45.1