From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105028.287790895@infradead.org>
User-Agent: quilt/0.65
Date: Sat, 27 Jul 2024 12:27:33 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com,
 youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 01/24] sched/eevdf: Add feature comments
References: <20240727102732.960974693@infradead.org>

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/features.h | 7 +++++++
 1 file changed, 7 insertions(+)

--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -5,7 +5,14 @@
  * sleep+wake cycles. EEVDF placement strategy #1, #2 if disabled.
  */
 SCHED_FEAT(PLACE_LAG, true)
+/*
+ * Give new tasks half a slice to ease into the competition.
+ */
 SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
+/*
+ * Inhibit (wakeup) preemption until the current task has either matched the
+ * 0-lag point or until it has exhausted its slice.
+ */
 SCHED_FEAT(RUN_TO_PARITY, true)

 /*
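The SCHED_FEAT() entries being documented above come from an X-macro table that is expanded more than once. As a rough illustration of that pattern — a minimal user-space sketch, not the kernel's actual implementation; the feature list and macro names here are made up for the demo:

```c
#include <assert.h>
#include <stdbool.h>

/* One list of (name, default) pairs, expanded twice below. */
#define SCHED_FEATURES(F)               \
	F(PLACE_LAG, true)              \
	F(PLACE_DEADLINE_INITIAL, true) \
	F(RUN_TO_PARITY, true)

/* First expansion: one enum index per feature, plus a count. */
#define F_ENUM(name, enabled) __SCHED_FEAT_##name,
enum { SCHED_FEATURES(F_ENUM) __SCHED_FEAT_NR };

/* Second expansion: OR together a bit for every default-enabled feature. */
#define F_MASK(name, enabled) ((enabled) ? 1u << __SCHED_FEAT_##name : 0u) |
static const unsigned int sched_feat_default = SCHED_FEATURES(F_MASK) 0u;

/* Query helper: is a feature enabled in the default mask? */
static bool sched_feat(int feat)
{
	return sched_feat_default & (1u << feat);
}
```

The point of the single table is that adding one line (and, after this patch, one comment) keeps the enum, the mask, and any debugfs name table in sync automatically.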
From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105028.395297941@infradead.org>
User-Agent: quilt/0.65
Date: Sat, 27 Jul 2024 12:27:34 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com,
 youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 02/24] sched/eevdf: Remove min_vruntime_copy
References: <20240727102732.960974693@infradead.org>

Since commit e8f331bcc270 ("sched/smp: Use lag to simplify cross-runqueue
placement") the min_vruntime_copy is no longer used.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c  | 5 ++---
 kernel/sched/sched.h | 4 ----
 2 files changed, 2 insertions(+), 7 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -780,8 +780,7 @@ static void update_min_vruntime(struct c
 	}

 	/* ensure we never gain time by being placed backwards. */
-	u64_u32_store(cfs_rq->min_vruntime,
-		      __update_min_vruntime(cfs_rq, vruntime));
+	cfs_rq->min_vruntime = __update_min_vruntime(cfs_rq, vruntime);
 }

 static inline bool __entity_less(struct rb_node *a, const struct rb_node *b)
@@ -12876,7 +12875,7 @@ static void set_next_task_fair(struct rq
 void init_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	cfs_rq->tasks_timeline = RB_ROOT_CACHED;
-	u64_u32_store(cfs_rq->min_vruntime, (u64)(-(1LL << 20)));
+	cfs_rq->min_vruntime = (u64)(-(1LL << 20));
 #ifdef CONFIG_SMP
 	raw_spin_lock_init(&cfs_rq->removed.lock);
 #endif
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -599,10 +599,6 @@ struct cfs_rq {
 	u64			min_vruntime_fi;
 #endif

-#ifndef CONFIG_64BIT
-	u64			min_vruntime_copy;
-#endif
-
 	struct rb_root_cached	tasks_timeline;

 	/*
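For context on the line that survives the patch: `__update_min_vruntime()` only ever moves min_vruntime forward, comparing with a signed subtraction so that u64 wraparound is handled. A standalone sketch of that invariant — illustrative, not the kernel's code:

```c
#include <stdint.h>

/*
 * Monotonic-advance sketch: min_vruntime may only move forward, so an
 * entity can never gain time by being placed behind it. The signed cast
 * of the difference makes the comparison robust across u64 wraparound,
 * the same trick the kernel's max_vruntime()-style helpers use.
 */
static uint64_t update_min_vruntime_sketch(uint64_t min_vruntime,
					   uint64_t vruntime)
{
	/* "vruntime is ahead of min_vruntime" in wraparound-safe terms. */
	if ((int64_t)(vruntime - min_vruntime) > 0)
		min_vruntime = vruntime;
	return min_vruntime;
}
```

With the copy removed, a plain 64-bit store of this result suffices; the `min_vruntime_copy` field existed only so 32-bit kernels could read the value without tearing.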
From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105028.501679876@infradead.org>
User-Agent: quilt/0.65
Date: Sat, 27 Jul 2024 12:27:35 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com,
 youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 03/24] sched/fair: Cleanup pick_task_fair() vs throttle
References: <20240727102732.960974693@infradead.org>

Per 54d27365cae8 ("sched/fair: Prevent throttling in early
pick_next_task_fair()") the reason check_cfs_rq_runtime() is under the
'if (curr)' check is to ensure the (downward) traversal does not result
in an empty cfs_rq.

But then the pick_task_fair() 'copy' of all this made it restart the
traversal anyway, so that seems to solve the issue too.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Ben Segall
---
 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8435,11 +8435,11 @@ static struct task_struct *pick_task_fai
 				update_curr(cfs_rq);
 			else
 				curr = NULL;
-
-			if (unlikely(check_cfs_rq_runtime(cfs_rq)))
-				goto again;
 		}

+		if (unlikely(check_cfs_rq_runtime(cfs_rq)))
+			goto again;
+
 		se = pick_next_entity(cfs_rq);
 		cfs_rq = group_cfs_rq(se);
 	} while (cfs_rq);
From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105028.614707623@infradead.org>
User-Agent: quilt/0.65
Date: Sat, 27 Jul 2024 12:27:36 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com,
 youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 04/24] sched/fair: Cleanup pick_task_fair()s curr
References: <20240727102732.960974693@infradead.org>

With 4c456c9ad334 ("sched/fair: Remove unused 'curr' argument from
pick_next_entity()") curr is no longer being used, so no point in
clearing it.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8427,15 +8427,9 @@ static struct task_struct *pick_task_fai
 		return NULL;

 	do {
-		struct sched_entity *curr = cfs_rq->curr;
-
 		/* When we pick for a remote RQ, we'll not have done put_prev_entity() */
-		if (curr) {
-			if (curr->on_rq)
-				update_curr(cfs_rq);
-			else
-				curr = NULL;
-		}
+		if (cfs_rq->curr && cfs_rq->curr->on_rq)
+			update_curr(cfs_rq);

 		if (unlikely(check_cfs_rq_runtime(cfs_rq)))
 			goto again;
From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105028.725062368@infradead.org>
User-Agent: quilt/0.65
Date: Sat, 27 Jul 2024 12:27:37 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com,
 youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 05/24] sched/fair: Unify pick_{,next_}_task_fair()
References: <20240727102732.960974693@infradead.org>

Implement pick_next_task_fair() in terms of pick_task_fair() to
de-duplicate the pick loop. More importantly, this makes all the pick
loops use the state-invariant form, which is useful to introduce
further re-try conditions in later patches.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c | 60 ++++++----------------------------------------------
 1 file changed, 8 insertions(+), 52 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8415,7 +8415,6 @@ static void check_preempt_wakeup_fair(st
 		resched_curr(rq);
 }

-#ifdef CONFIG_SMP
 static struct task_struct *pick_task_fair(struct rq *rq)
 {
 	struct sched_entity *se;
@@ -8427,7 +8426,7 @@ static struct task_struct *pick_task_fai
 		return NULL;

 	do {
-		/* When we pick for a remote RQ, we'll not have done put_prev_entity() */
+		/* Might not have done put_prev_entity() */
 		if (cfs_rq->curr && cfs_rq->curr->on_rq)
 			update_curr(cfs_rq);
@@ -8440,19 +8439,19 @@ static struct task_struct *pick_task_fai

 	return task_of(se);
 }
-#endif

 struct task_struct *
 pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 {
-	struct cfs_rq *cfs_rq = &rq->cfs;
 	struct sched_entity *se;
 	struct task_struct *p;
 	int new_tasks;

again:
-	if (!sched_fair_runnable(rq))
+	p = pick_task_fair(rq);
+	if (!p)
 		goto idle;
+	se = &p->se;

 #ifdef CONFIG_FAIR_GROUP_SCHED
 	if (!prev || prev->sched_class != &fair_sched_class)
@@ -8464,52 +8463,14 @@ pick_next_task_fair(struct rq *rq, struc
 	 *
 	 * Therefore attempt to avoid putting and setting the entire cgroup
 	 * hierarchy, only change the part that actually changes.
-	 */
-
-	do {
-		struct sched_entity *curr = cfs_rq->curr;
-
-		/*
-		 * Since we got here without doing put_prev_entity() we also
-		 * have to consider cfs_rq->curr. If it is still a runnable
-		 * entity, update_curr() will update its vruntime, otherwise
-		 * forget we've ever seen it.
-		 */
-		if (curr) {
-			if (curr->on_rq)
-				update_curr(cfs_rq);
-			else
-				curr = NULL;
-
-			/*
-			 * This call to check_cfs_rq_runtime() will do the
-			 * throttle and dequeue its entity in the parent(s).
-			 * Therefore the nr_running test will indeed
-			 * be correct.
-			 */
-			if (unlikely(check_cfs_rq_runtime(cfs_rq))) {
-				cfs_rq = &rq->cfs;
-
-				if (!cfs_rq->nr_running)
-					goto idle;
-
-				goto simple;
-			}
-		}
-
-		se = pick_next_entity(cfs_rq);
-		cfs_rq = group_cfs_rq(se);
-	} while (cfs_rq);
-
-	p = task_of(se);
-
-	/*
+	 *
 	 * Since we haven't yet done put_prev_entity and if the selected task
 	 * is a different task than we started out with, try and touch the
 	 * least amount of cfs_rqs.
 	 */
 	if (prev != p) {
 		struct sched_entity *pse = &prev->se;
+		struct cfs_rq *cfs_rq;

 		while (!(cfs_rq = is_same_group(se, pse))) {
 			int se_depth = se->depth;
@@ -8535,13 +8496,8 @@ pick_next_task_fair(struct rq *rq, struc
 	if (prev)
 		put_prev_task(rq, prev);

-	do {
-		se = pick_next_entity(cfs_rq);
-		set_next_entity(cfs_rq, se);
-		cfs_rq = group_cfs_rq(se);
-	} while (cfs_rq);
-
-	p = task_of(se);
+	for_each_sched_entity(se)
+		set_next_entity(cfs_rq_of(se), se);

 done: __maybe_unused;
 #ifdef CONFIG_SMP
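The "state-invariant form" the patch converges on is a single descend-the-hierarchy loop that can be restarted from the top at any time. A toy sketch of that shape — hypothetical types, not kernel code; the rbtree pick is reduced to a single `best` pointer per runqueue:

```c
#include <stddef.h>

struct toy_rq;

/* An entity is a leaf (task) when my_q is NULL, a group otherwise. */
struct toy_entity {
	int id;
	struct toy_rq *my_q;
};

/* Stand-in for a cfs_rq: "best" plays the role of the leftmost pick. */
struct toy_rq {
	struct toy_entity *best;
};

/* One state-invariant pick loop: descend until we reach a leaf entity. */
static struct toy_entity *pick_task_toy(struct toy_rq *rq)
{
	struct toy_entity *se = NULL;

	while (rq) {
		se = rq->best;
		if (!se)
			return NULL;
		rq = se->my_q;
	}
	return se;
}

/* Build root -> group -> leaf and pick from the root. */
static int demo_pick(void)
{
	static struct toy_entity leaf = { .id = 7 };
	static struct toy_rq child;
	static struct toy_entity group = { .id = 1 };
	static struct toy_rq root;
	struct toy_entity *p;

	child.best = &leaf;
	group.my_q = &child;
	root.best = &group;

	p = pick_task_toy(&root);
	return p ? p->id : -1;
}
```

Because the loop carries no partial state between iterations, a caller can simply `goto again` and rerun it after a throttle, which is exactly what lets pick_next_task_fair() reuse pick_task_fair() here.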
From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105028.864630153@infradead.org>
User-Agent: quilt/0.65
Date: Sat, 27 Jul 2024 12:27:38 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com,
 youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 06/24] sched: Allow sched_class::dequeue_task() to fail
References: <20240727102732.960974693@infradead.org>

Change the function signature of sched_class::dequeue_task() to return
a boolean, allowing future patches to 'fail' dequeue.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/core.c      | 7 +++++--
 kernel/sched/deadline.c  | 4 +++-
 kernel/sched/fair.c      | 4 +++-
 kernel/sched/idle.c      | 3 ++-
 kernel/sched/rt.c        | 4 +++-
 kernel/sched/sched.h     | 4 ++--
 kernel/sched/stop_task.c | 3 ++-
 7 files changed, 20 insertions(+), 9 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2001,7 +2001,10 @@ void enqueue_task(struct rq *rq, struct
 		sched_core_enqueue(rq, p);
 }

-void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
+/*
+ * Must only return false when DEQUEUE_SLEEP.
+ */
+inline bool dequeue_task(struct rq *rq, struct task_struct *p, int flags)
 {
 	if (sched_core_enabled(rq))
 		sched_core_dequeue(rq, p, flags);
@@ -2015,7 +2018,7 @@ void dequeue_task(struct rq *rq, struct
 	}

 	uclamp_rq_dec(rq, p);
-	p->sched_class->dequeue_task(rq, p, flags);
+	return p->sched_class->dequeue_task(rq, p, flags);
 }

 void activate_task(struct rq *rq, struct task_struct *p, int flags)
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2162,7 +2162,7 @@ static void enqueue_task_dl(struct rq *r
 		enqueue_pushable_dl_task(rq, p);
 }

-static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
+static bool dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
 {
 	update_curr_dl(rq);

@@ -2172,6 +2172,8 @@ static void dequeue_task_dl(struct rq *r
 	dequeue_dl_entity(&p->dl, flags);
 	if (!p->dl.dl_throttled && !dl_server(&p->dl))
 		dequeue_pushable_dl_task(rq, p);
+
+	return true;
 }

 /*
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6865,7 +6865,7 @@ static void set_next_buddy(struct sched_
  * decreased. We remove the task from the rbtree and
  * update the fair scheduling stats:
  */
-static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
+static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
@@ -6937,6 +6937,8 @@ static void dequeue_task_fair(struct rq
 dequeue_throttle:
 	util_est_update(&rq->cfs, p, task_sleep);
 	hrtick_update(rq);
+
+	return true;
 }

 #ifdef CONFIG_SMP
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -482,13 +482,14 @@ struct task_struct *pick_next_task_idle(
  * It is not legal to sleep in the idle task - print a warning
  * message if some code attempts to do it:
  */
-static void
+static bool
 dequeue_task_idle(struct rq *rq, struct task_struct *p, int flags)
 {
 	raw_spin_rq_unlock_irq(rq);
 	printk(KERN_ERR "bad: scheduling from the idle thread!\n");
 	dump_stack();
 	raw_spin_rq_lock_irq(rq);
+	return true;
 }

 /*
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1483,7 +1483,7 @@ enqueue_task_rt(struct rq *rq, struct ta
 		enqueue_pushable_task(rq, p);
 }

-static void dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
+static bool dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct sched_rt_entity *rt_se = &p->rt;

@@ -1491,6 +1491,8 @@ static void dequeue_task_rt(struct rq *r
 	dequeue_rt_entity(rt_se, flags);

 	dequeue_pushable_task(rq, p);
+
+	return true;
 }

 /*
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2285,7 +2285,7 @@ struct sched_class {
 #endif

 	void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags);
-	void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags);
+	bool (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags);
 	void (*yield_task)   (struct rq *rq);
 	bool (*yield_to_task)(struct rq *rq, struct task_struct *p);

@@ -3606,7 +3606,7 @@ extern int __sched_setaffinity(struct ta
 extern void __setscheduler_prio(struct task_struct *p, int prio);
 extern void set_load_weight(struct task_struct *p, bool update_load);
 extern void enqueue_task(struct rq *rq, struct task_struct *p, int flags);
-extern void dequeue_task(struct rq *rq, struct task_struct *p, int flags);
+extern bool dequeue_task(struct rq *rq, struct task_struct *p, int flags);

 extern void check_class_changed(struct rq *rq, struct task_struct *p,
 				const struct sched_class *prev_class,
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -57,10 +57,11 @@ enqueue_task_stop(struct rq *rq, struct
 	add_nr_running(rq, 1);
 }

-static void
+static bool
 dequeue_task_stop(struct rq *rq, struct task_struct *p, int flags)
 {
 	sub_nr_running(rq, 1);
+	return true;
 }

 static void yield_task_stop(struct rq *rq)
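The shape of the signature change can be seen in isolation: the per-class hook now reports whether the dequeue completed, and the core wrapper propagates that verdict. A user-space sketch — illustrative types and flag value only, not the kernel's definitions:

```c
#include <stdbool.h>
#include <stddef.h>

#define DEQUEUE_SLEEP 0x01	/* illustrative flag value for the demo */

struct toy_task;

/* Reduced sched_class: only the hook this patch changes. */
struct toy_class {
	bool (*dequeue_task)(struct toy_task *t, int flags);
};

struct toy_task {
	const struct toy_class *sched_class;
};

/* A class that defers ("fails") the dequeue only for a sleeping task. */
static bool dequeue_task_toy(struct toy_task *t, int flags)
{
	(void)t;
	return !(flags & DEQUEUE_SLEEP);
}

static const struct toy_class toy_fair = {
	.dequeue_task = dequeue_task_toy,
};

/* Core wrapper: now propagates the class's verdict instead of dropping it. */
static bool dequeue_task(struct toy_task *t, int flags)
{
	return t->sched_class->dequeue_task(t, flags);
}

static bool demo_dequeue(int flags)
{
	struct toy_task t = { .sched_class = &toy_fair };

	return dequeue_task(&t, flags);
}
```

The "must only return false when DEQUEUE_SLEEP" comment in the patch is the contract this models: non-sleep dequeues must always succeed, so only the sleep path has to cope with a deferred dequeue.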
smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=bx1+BIwg; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="bx1+BIwg" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-Id:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=cTU7K6RXzDNlE+8/w9VLYgVsxCh5cOYcd4m1ztxn/yU=; b=bx1+BIwg8de2o0I8O6tJK+8uS9 n2Svc3LlyDrkQSNs4pYF9t2XoJXR2E8++gWKSHBmL7YU7eJvhKXW2oPHiPiVEz9cM6Ic07h4qfv7w BdqSdjJJKF5gtT/7y6qMCx2vIJ6C1w+v6+9QFZHxpD4fxswNKhzhmB1DkEom+722xObTxohy2vna2 BgBAny77BrG9w8ZpDEDd9vd+lRrrE7AVRldfKy6p+w2Ql78cuR0KvcF2Ch3GyK3aC7KQDn6R9uUjP 4y2kReSZZW+71xsr4Ef51B2HaNrfO+pCLZKDGLRIOLpn+qrJ26SzR39AxdZqGQrjTSwd5II/CKvyK RbEKWBkg==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.97.1 #2 (Red Hat Linux)) id 1sXfBg-00000004QMf-0iSj; Sat, 27 Jul 2024 11:02:08 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id A96EF302182; Sat, 27 Jul 2024 13:02:06 +0200 (CEST) Message-Id: <20240727105028.977256873@infradead.org> User-Agent: quilt/0.65 Date: Sat, 27 Jul 2024 12:27:39 +0200 From: Peter Zijlstra To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com, youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de Subject: 
[PATCH 07/24] sched/fair: Re-organize dequeue_task_fair()
References: <20240727102732.960974693@infradead.org>

Working towards delaying dequeue, notably also inside the hierarchy,
rework dequeue_task_fair() such that it can 'resume' an interrupted
hierarchy walk.

Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c |   61 ++++++++++++++++++++++++++++++++++--------------
 1 file changed, 40 insertions(+), 21 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6861,34 +6861,43 @@ enqueue_task_fair(struct rq *rq, struct
 static void set_next_buddy(struct sched_entity *se);
 
 /*
- * The dequeue_task method is called before nr_running is
- * decreased. We remove the task from the rbtree and
- * update the fair scheduling stats:
+ * Basically dequeue_task_fair(), except it can deal with dequeue_entity()
+ * failing half-way through and resume the dequeue later.
+ *
+ * Returns:
+ * -1 - dequeue delayed
+ *  0 - dequeue throttled
+ *  1 - dequeue complete
 */
-static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
+static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 {
-	struct cfs_rq *cfs_rq;
-	struct sched_entity *se = &p->se;
-	int task_sleep = flags & DEQUEUE_SLEEP;
-	int idle_h_nr_running = task_has_idle_policy(p);
 	bool was_sched_idle = sched_idle_rq(rq);
 	int rq_h_nr_running = rq->cfs.h_nr_running;
+	bool task_sleep = flags & DEQUEUE_SLEEP;
+	struct task_struct *p = NULL;
+	int idle_h_nr_running = 0;
+	int h_nr_running = 0;
+	struct cfs_rq *cfs_rq;
 
-	util_est_dequeue(&rq->cfs, p);
+	if (entity_is_task(se)) {
+		p = task_of(se);
+		h_nr_running = 1;
+		idle_h_nr_running = task_has_idle_policy(p);
+	}
 
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
 		dequeue_entity(cfs_rq, se, flags);
 
-		cfs_rq->h_nr_running--;
+		cfs_rq->h_nr_running -= h_nr_running;
 		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
 
 		if (cfs_rq_is_idle(cfs_rq))
-			idle_h_nr_running = 1;
+			idle_h_nr_running = h_nr_running;
 
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
-			goto dequeue_throttle;
+			return 0;
 
 		/* Don't dequeue parent if it has other entities besides us */
 		if (cfs_rq->load.weight) {
@@ -6912,20 +6921,18 @@ static bool dequeue_task_fair(struct rq
 		se_update_runnable(se);
 		update_cfs_group(se);
 
-		cfs_rq->h_nr_running--;
+		cfs_rq->h_nr_running -= h_nr_running;
 		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
 
 		if (cfs_rq_is_idle(cfs_rq))
-			idle_h_nr_running = 1;
+			idle_h_nr_running = h_nr_running;
 
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
-			goto dequeue_throttle;
-
+			return 0;
 	}
 
-	/* At this point se is NULL and we are at root level*/
-	sub_nr_running(rq, 1);
+	sub_nr_running(rq, h_nr_running);
 
 	if (rq_h_nr_running && !rq->cfs.h_nr_running)
 		dl_server_stop(&rq->fair_server);
@@ -6934,10 +6941,22 @@ static bool dequeue_task_fair(struct rq
 	if (unlikely(!was_sched_idle && sched_idle_rq(rq)))
 		rq->next_balance = jiffies;
 
-dequeue_throttle:
-	util_est_update(&rq->cfs, p, task_sleep);
-	hrtick_update(rq);
+	return 1;
+}
+/*
+ * The dequeue_task method is called before nr_running is
+ * decreased. We remove the task from the rbtree and
+ * update the fair scheduling stats:
+ */
+static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
+{
+	util_est_dequeue(&rq->cfs, p);
 
+	if (dequeue_entities(rq, &p->se, flags) < 0)
+		return false;
+
+	util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
+	hrtick_update(rq);
 	return true;
 }

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105029.086192709@infradead.org>
Date: Sat, 27 Jul 2024 12:27:40 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com,
 youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 08/24] sched: Split DEQUEUE_SLEEP from
 deactivate_task()
References: <20240727102732.960974693@infradead.org>

As a preparation for dequeue_task() failing, and a second code-path
needing to take care of the 'success' path, split out the
DEQUEUE_SLEEP path from deactivate_task().

Much thanks to Libo for spotting and fixing a TASK_ON_RQ_MIGRATING
ordering fail.

Fixed-by: Libo Chen
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/core.c  |   23 +++++++++++++----------
 kernel/sched/sched.h |   14 ++++++++++++++
 2 files changed, 27 insertions(+), 10 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2036,12 +2036,23 @@ void activate_task(struct rq *rq, struct
 
 void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 {
-	WRITE_ONCE(p->on_rq, (flags & DEQUEUE_SLEEP) ? 0 : TASK_ON_RQ_MIGRATING);
+	WRITE_ONCE(p->on_rq, TASK_ON_RQ_MIGRATING);
 	ASSERT_EXCLUSIVE_WRITER(p->on_rq);
 
+	/*
+	 * Code explicitly relies on TASK_ON_RQ_MIGRATING being set *before*
+	 * dequeue_task() and cleared *after* enqueue_task().
+	 */
 	dequeue_task(rq, p, flags);
 }
 
+static void block_task(struct rq *rq, struct task_struct *p, int flags)
+{
+	if (dequeue_task(rq, p, DEQUEUE_SLEEP | flags))
+		__block_task(rq, p);
+}
+
 /**
  * task_curr - is this task currently executing on a CPU?
  * @p: the task in question.
@@ -6486,9 +6497,6 @@ static void __sched notrace __schedule(u
 				!(prev_state & TASK_NOLOAD) &&
 				!(prev_state & TASK_FROZEN);
 
-			if (prev->sched_contributes_to_load)
-				rq->nr_uninterruptible++;
-
 			/*
 			 * __schedule()			ttwu()
 			 *   prev_state = prev->state;    if (p->on_rq && ...)
@@ -6500,12 +6508,7 @@ static void __sched notrace __schedule(u
 			 *
 			 * After this, schedule() must not care about p->state any more.
 			 */
-			deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
-
-			if (prev->in_iowait) {
-				atomic_inc(&rq->nr_iowait);
-				delayacct_blkio_start();
-			}
+			block_task(rq, prev, DEQUEUE_NOCLOCK);
 		}
 		switch_count = &prev->nvcsw;
 	}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -68,6 +68,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -2591,6 +2592,19 @@ static inline void sub_nr_running(struct
 	sched_update_tick_dependency(rq);
 }
 
+static inline void __block_task(struct rq *rq, struct task_struct *p)
+{
+	WRITE_ONCE(p->on_rq, 0);
+	ASSERT_EXCLUSIVE_WRITER(p->on_rq);
+	if (p->sched_contributes_to_load)
+		rq->nr_uninterruptible++;
+
+	if (p->in_iowait) {
+		atomic_inc(&rq->nr_iowait);
+		delayacct_blkio_start();
+	}
+}
+
 extern void activate_task(struct rq *rq, struct task_struct *p, int flags);
 extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags);

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105029.200000445@infradead.org>
Date: Sat, 27 Jul 2024 12:27:41 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com,
 youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 09/24] sched: Prepare generic code for delayed dequeue
References: <20240727102732.960974693@infradead.org>

While most of the delayed dequeue code can be done inside the
sched_class itself, there is one location where we do not have an
appropriate hook, namely ttwu_runnable().

Add an ENQUEUE_DELAYED call to the on_rq path to deal with waking
delayed dequeue tasks.

Signed-off-by: Peter Zijlstra (Intel)
---
 include/linux/sched.h |    1 +
 kernel/sched/core.c   |   17 ++++++++++++++++-
 kernel/sched/sched.h  |    2 ++
 3 files changed, 19 insertions(+), 1 deletion(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -542,6 +542,7 @@ struct sched_entity {
 
 	struct list_head		group_node;
 	unsigned int			on_rq;
+	unsigned int			sched_delayed;
 
 	u64				exec_start;
 	u64				sum_exec_runtime;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2036,6 +2036,8 @@ void activate_task(struct rq *rq, struct
 
 void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
 {
+	SCHED_WARN_ON(flags & DEQUEUE_SLEEP);
+
 	WRITE_ONCE(p->on_rq, TASK_ON_RQ_MIGRATING);
 	ASSERT_EXCLUSIVE_WRITER(p->on_rq);
 
@@ -3677,12 +3679,14 @@ static int ttwu_runnable(struct task_str
 
 	rq = __task_rq_lock(p, &rf);
 	if (task_on_rq_queued(p)) {
+		update_rq_clock(rq);
+		if (p->se.sched_delayed)
+			enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
 		if (!task_on_cpu(rq, p)) {
 			/*
 			 * When on_rq && !on_cpu the task is preempted, see if
 			 * it should preempt the task that is current now.
 			 */
-			update_rq_clock(rq);
 			wakeup_preempt(rq, p, wake_flags);
 		}
 		ttwu_do_wakeup(p);
@@ -4062,11 +4069,16 @@ int try_to_wake_up(struct task_struct *p
 	 *  case the whole 'p->on_rq && ttwu_runnable()' case below
 	 *  without taking any locks.
 	 *
+	 * Specifically, given current runs ttwu() we must be before
+	 * schedule()'s block_task(), as such this must not observe
+	 * sched_delayed.
+	 *
 	 * In particular:
 	 *  - we rely on Program-Order guarantees for all the ordering,
 	 *  - we're serialized against set_special_state() by virtue of
 	 *    it disabling IRQs (this allows not taking ->pi_lock).
 	 */
+	SCHED_WARN_ON(p->se.sched_delayed);
 	if (!ttwu_state_match(p, state, &success))
 		goto out;
 
@@ -4358,6 +4370,9 @@ static void __sched_fork(unsigned long c
 	p->se.slice			= sysctl_sched_base_slice;
 	INIT_LIST_HEAD(&p->se.group_node);
 
+	/* A delayed task cannot be in clone(). */
+	SCHED_WARN_ON(p->se.sched_delayed);
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq			= NULL;
 #endif
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2258,6 +2258,7 @@ extern const u32		sched_prio_to_wmult[40
 #define DEQUEUE_MOVE		0x04 /* Matches ENQUEUE_MOVE */
 #define DEQUEUE_NOCLOCK		0x08 /* Matches ENQUEUE_NOCLOCK */
 #define DEQUEUE_MIGRATING	0x100 /* Matches ENQUEUE_MIGRATING */
+#define DEQUEUE_DELAYED		0x200 /* Matches ENQUEUE_DELAYED */
 
 #define ENQUEUE_WAKEUP		0x01
 #define ENQUEUE_RESTORE		0x02
@@ -2273,6 +2274,7 @@ extern const u32		sched_prio_to_wmult[40
 #endif
 #define ENQUEUE_INITIAL		0x80
 #define ENQUEUE_MIGRATING	0x100
+#define ENQUEUE_DELAYED		0x200
 
 #define RETRY_TASK		((void *)-1UL)

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105029.315205425@infradead.org>
Date: Sat, 27 Jul 2024 12:27:42 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com,
 youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de,
 Luis Machado, Hongyan Xia
Subject: [PATCH 10/24] sched/uclamp: Handle delayed dequeue
References: <20240727102732.960974693@infradead.org>

Delayed dequeue has tasks sit around on the runqueue that are not
actually runnable -- specifically, they will be dequeued the moment
they get picked.

One side-effect is that such a task can get migrated, which leads to a
'nested' dequeue_task() scenario that messes up uclamp if we don't
take care.

Notably, dequeue_task(DEQUEUE_SLEEP) can 'fail' and keep the task on
the runqueue. This however will have removed the task from uclamp --
per uclamp_rq_dec() in dequeue_task(). So far so good.

However, if at that point the task gets migrated -- or nice adjusted
or any of a myriad of operations that does a dequeue-enqueue cycle --
we'll pass through dequeue_task()/enqueue_task() again. Without
modification this will lead to a double decrement for uclamp, which is
wrong.
Reported-by: Luis Machado
Reported-by: Hongyan Xia
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: Dietmar Eggemann
---
 kernel/sched/core.c |   16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1676,6 +1676,9 @@ static inline void uclamp_rq_inc(struct
 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;
 
+	if (p->se.sched_delayed)
+		return;
+
 	for_each_clamp_id(clamp_id)
 		uclamp_rq_inc_id(rq, p, clamp_id);
 
@@ -1700,6 +1703,9 @@ static inline void uclamp_rq_dec(struct
 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;
 
+	if (p->se.sched_delayed)
+		return;
+
 	for_each_clamp_id(clamp_id)
 		uclamp_rq_dec_id(rq, p, clamp_id);
 }
@@ -1979,8 +1985,12 @@ void enqueue_task(struct rq *rq, struct
 		psi_enqueue(p, (flags & ENQUEUE_WAKEUP) && !(flags & ENQUEUE_MIGRATED));
 	}
 
-	uclamp_rq_inc(rq, p);
 	p->sched_class->enqueue_task(rq, p, flags);
+	/*
+	 * Must be after ->enqueue_task() because ENQUEUE_DELAYED can clear
+	 * ->sched_delayed.
+	 */
+	uclamp_rq_inc(rq, p);
 
 	if (sched_core_enabled(rq))
 		sched_core_enqueue(rq, p);
@@ -2002,6 +2012,10 @@ inline bool dequeue_task(struct rq *rq,
 		psi_dequeue(p, flags & DEQUEUE_SLEEP);
 	}
 
+	/*
+	 * Must be before ->dequeue_task() because ->dequeue_task() can 'fail'
+	 * and mark the task ->sched_delayed.
+	 */
 	uclamp_rq_dec(rq, p);
 	return p->sched_class->dequeue_task(rq, p, flags);
 }

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105029.486423066@infradead.org>
Date: Sat, 27 Jul 2024 12:27:43 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com,
 youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 11/24] sched/fair: Assert {set_next,put_prev}_entity() are
 properly balanced
References: <20240727102732.960974693@infradead.org>

Just a little sanity test..
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5432,6 +5432,7 @@ set_next_entity(struct cfs_rq *cfs_rq, s
 	}
 
 	update_stats_curr_start(cfs_rq, se);
+	SCHED_WARN_ON(cfs_rq->curr);
 	cfs_rq->curr = se;
 
 	/*
@@ -5493,6 +5494,7 @@ static void put_prev_entity(struct cfs_r
 		/* in !on_rq case, update occurred at dequeue */
 		update_load_avg(cfs_rq, prev, 0);
 	}
+	SCHED_WARN_ON(cfs_rq->curr != prev);
 	cfs_rq->curr = NULL;
 }

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105029.631948434@infradead.org>
Date: Sat, 27 Jul 2024 12:27:44 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com,
 youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 12/24] sched/fair: Prepare exit/cleanup paths for delayed_dequeue
References: <20240727102732.960974693@infradead.org>
When dequeue_task() is delayed it becomes possible to exit a task (or
cgroup) that is still enqueued. Ensure things are dequeued before
freeing.

NOTE: switched_from_fair() causes spurious wakeups due to clearing
sched_delayed after enqueueing a task in another class that should've
been dequeued. This *should* be harmless.

Signed-off-by: Peter Zijlstra (Intel)
Reported-by: kernel test robot

---
 kernel/sched/fair.c | 61 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 48 insertions(+), 13 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8318,7 +8318,20 @@ static void migrate_task_rq_fair(struct
 
 static void task_dead_fair(struct task_struct *p)
 {
-	remove_entity_load_avg(&p->se);
+	struct sched_entity *se = &p->se;
+
+	if (se->sched_delayed) {
+		struct rq_flags rf;
+		struct rq *rq;
+
+		rq = task_rq_lock(p, &rf);
+		update_rq_clock(rq);
+		if (se->sched_delayed)
+			dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
+		task_rq_unlock(rq, p, &rf);
+	}
+
+	remove_entity_load_avg(se);
 }
 
 /*
@@ -12817,10 +12830,26 @@ static void attach_task_cfs_rq(struct ta
 static void switched_from_fair(struct rq *rq, struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
+	/*
+	 * Since this is called after changing class, this isn't quite right.
+	 * Specifically, this causes the task to get queued in the target class
+	 * and experience a 'spurious' wakeup.
+	 *
+	 * However, since 'spurious' wakeups are harmless, this shouldn't be a
+	 * problem.
+	 */
+	p->se.sched_delayed = 0;
+	/*
+	 * While here, also clear the vlag, it makes little sense to carry that
+	 * over the excursion into the new class.
+	 */
+	p->se.vlag = 0;
 }
 
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
 {
+	SCHED_WARN_ON(p->se.sched_delayed);
+
 	attach_task_cfs_rq(p);
 
 	set_task_max_allowed_capacity(p);
@@ -12971,28 +13000,33 @@ void online_fair_sched_group(struct task
 
 void unregister_fair_sched_group(struct task_group *tg)
 {
-	unsigned long flags;
-	struct rq *rq;
 	int cpu;
 
 	destroy_cfs_bandwidth(tg_cfs_bandwidth(tg));
 
	for_each_possible_cpu(cpu) {
-		if (tg->se[cpu])
-			remove_entity_load_avg(tg->se[cpu]);
+		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu];
+		struct sched_entity *se = tg->se[cpu];
+		struct rq *rq = cpu_rq(cpu);
+
+		if (se) {
+			if (se->sched_delayed) {
+				guard(rq_lock_irqsave)(rq);
+				if (se->sched_delayed)
+					dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
+				list_del_leaf_cfs_rq(cfs_rq);
+			}
+			remove_entity_load_avg(se);
+		}
 
 		/*
 		 * Only empty task groups can be destroyed; so we can speculatively
 		 * check on_list without danger of it being re-added.
 		 */
-		if (!tg->cfs_rq[cpu]->on_list)
-			continue;
-
-		rq = cpu_rq(cpu);
-
-		raw_spin_rq_lock_irqsave(rq, flags);
-		list_del_leaf_cfs_rq(tg->cfs_rq[cpu]);
-		raw_spin_rq_unlock_irqrestore(rq, flags);
+		if (cfs_rq->on_list) {
+			guard(rq_lock_irqsave)(rq);
+			list_del_leaf_cfs_rq(cfs_rq);
+		}
 	}
 }

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105029.747330118@infradead.org>
Date: Sat, 27 Jul 2024 12:27:45 +0200
From: Peter Zijlstra
Subject: [PATCH 13/24] sched/fair: Prepare pick_next_task() for delayed dequeue
References: <20240727102732.960974693@infradead.org>

Delayed dequeue's natural end is when it gets picked again. Ensure
pick_next_task() knows what to do with delayed tasks.

Note, this relies on the earlier patch that made pick_next_task()
state invariant -- it will restart the pick on dequeue, because
obviously the just dequeued task is no longer eligible.

Signed-off-by: Peter Zijlstra (Intel)

---
 kernel/sched/fair.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5453,6 +5453,8 @@ set_next_entity(struct cfs_rq *cfs_rq, s
 	se->prev_sum_exec_runtime = se->sum_exec_runtime;
 }
 
+static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags);
+
 /*
  * Pick the next process, keeping these things in mind, in this order:
  * 1) keep things fair between processes/task groups
@@ -5461,16 +5463,27 @@ set_next_entity(struct cfs_rq *cfs_rq, s
  * 4) do not run the "skip" process, if something else is available
  */
 static struct sched_entity *
-pick_next_entity(struct cfs_rq *cfs_rq)
+pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
 {
 	/*
 	 * Enabling NEXT_BUDDY will affect latency but not fairness.
 	 */
 	if (sched_feat(NEXT_BUDDY) &&
-	    cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next))
+	    cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next)) {
+		/* ->next will never be delayed */
+		SCHED_WARN_ON(cfs_rq->next->sched_delayed);
 		return cfs_rq->next;
+	}
+
+	struct sched_entity *se = pick_eevdf(cfs_rq);
+	if (se->sched_delayed) {
+		dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
+		SCHED_WARN_ON(se->sched_delayed);
+		SCHED_WARN_ON(se->on_rq);
 
-	return pick_eevdf(cfs_rq);
+		return NULL;
+	}
+	return se;
 }
 
 static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq);
@@ -8478,7 +8491,9 @@ static struct task_struct *pick_task_fai
 		if (unlikely(check_cfs_rq_runtime(cfs_rq)))
 			goto again;
 
-		se = pick_next_entity(cfs_rq);
+		se = pick_next_entity(rq, cfs_rq);
+		if (!se)
+			goto again;
 		cfs_rq = group_cfs_rq(se);
 	} while (cfs_rq);

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105029.888107381@infradead.org>
Date: Sat, 27 Jul 2024 12:27:46 +0200
From: Peter Zijlstra
Subject: [PATCH 14/24] sched/fair: Implement ENQUEUE_DELAYED
References: <20240727102732.960974693@infradead.org>

Doing a wakeup on a delayed dequeue task is about as simple as it
sounds -- remove the delayed mark and enjoy the fact it was actually
still on the runqueue.

Signed-off-by: Peter Zijlstra (Intel)

---
 kernel/sched/fair.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5290,6 +5290,9 @@ static inline int cfs_rq_throttled(struc
 static inline bool cfs_bandwidth_used(void);
 
 static void
+requeue_delayed_entity(struct sched_entity *se);
+
+static void
 enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
 	bool curr = cfs_rq->curr == se;
@@ -5922,8 +5925,10 @@ void unthrottle_cfs_rq(struct cfs_rq *cf
 	for_each_sched_entity(se) {
 		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
 
-		if (se->on_rq)
+		if (se->on_rq) {
+			SCHED_WARN_ON(se->sched_delayed);
 			break;
+		}
 		enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);
 
 		if (cfs_rq_is_idle(group_cfs_rq(se)))
@@ -6773,6 +6778,22 @@ static int sched_idle_cpu(int cpu)
 }
 #endif
 
+static void
+requeue_delayed_entity(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+	/*
+	 * se->sched_delayed should imply: se->on_rq == 1.
+	 * Because a delayed entity is one that is still on
+	 * the runqueue competing until eligibility.
+	 */
+	SCHED_WARN_ON(!se->sched_delayed);
+	SCHED_WARN_ON(!se->on_rq);
+
+	se->sched_delayed = 0;
+}
+
 /*
  * The enqueue_task method is called before nr_running is
  * increased. Here we update the fair scheduling stats and
@@ -6787,6 +6812,11 @@ enqueue_task_fair(struct rq *rq, struct
 	int task_new = !(flags & ENQUEUE_WAKEUP);
 	int rq_h_nr_running = rq->cfs.h_nr_running;
 
+	if (flags & ENQUEUE_DELAYED) {
+		requeue_delayed_entity(se);
+		return;
+	}
+
 	/*
 	 * The code below (indirectly) updates schedutil which looks at
 	 * the cfs_rq utilization to select a frequency.
@@ -6804,8 +6834,11 @@ enqueue_task_fair(struct rq *rq, struct
 		cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
 
 	for_each_sched_entity(se) {
-		if (se->on_rq)
+		if (se->on_rq) {
+			if (se->sched_delayed)
+				requeue_delayed_entity(se);
 			break;
+		}
 		cfs_rq = cfs_rq_of(se);
 		enqueue_entity(cfs_rq, se, flags);

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105029.998329901@infradead.org>
Date: Sat, 27 Jul 2024 12:27:47 +0200
From: Peter Zijlstra
Subject: [PATCH 15/24] sched,freezer: Mark TASK_FROZEN special
References: <20240727102732.960974693@infradead.org>

The special task states are those that do not suffer spurious
wakeups; TASK_FROZEN is very much one of those. Mark it as such.

Signed-off-by: Peter Zijlstra (Intel)

---
 include/linux/sched.h | 5 +++--
 kernel/freezer.c      | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -147,8 +147,9 @@ struct user_event_mm;
  * Special states are those that do not use the normal wait-loop pattern. See
  * the comment with set_special_state().
  */
-#define is_special_task_state(state)				\
-	((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | TASK_DEAD))
+#define is_special_task_state(state)				\
+	((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED |	\
+		    TASK_DEAD | TASK_FROZEN))
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 # define debug_normal_state_change(state_value)	\
--- a/kernel/freezer.c
+++ b/kernel/freezer.c
@@ -72,7 +72,7 @@ bool __refrigerator(bool check_kthr_stop
 	bool freeze;
 
 	raw_spin_lock_irq(&current->pi_lock);
-	set_current_state(TASK_FROZEN);
+	WRITE_ONCE(current->__state, TASK_FROZEN);
 	/* unstale saved_state so that __thaw_task() will wake us up */
 	current->saved_state = TASK_RUNNING;
 	raw_spin_unlock_irq(&current->pi_lock);

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105030.110439521@infradead.org>
Date: Sat, 27 Jul 2024 12:27:48 +0200
From: Peter Zijlstra
Subject: [PATCH 16/24] sched: Teach dequeue_task() about special task states
References: <20240727102732.960974693@infradead.org>

Since special task states must not suffer spurious wakeups, and the
proposed delayed dequeue can cause exactly these (under some boundary
conditions), propagate this knowledge into dequeue_task() such that it
can do the right thing.

Signed-off-by: Peter Zijlstra (Intel)

---
 kernel/sched/core.c  | 7 ++++++-
 kernel/sched/sched.h | 3 ++-
 2 files changed, 8 insertions(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6521,11 +6521,16 @@ static void __sched notrace __schedule(u
 		if (signal_pending_state(prev_state, prev)) {
 			WRITE_ONCE(prev->__state, TASK_RUNNING);
 		} else {
+			int flags = DEQUEUE_NOCLOCK;
+
 			prev->sched_contributes_to_load =
 				(prev_state & TASK_UNINTERRUPTIBLE) &&
 				!(prev_state & TASK_NOLOAD) &&
 				!(prev_state & TASK_FROZEN);
 
+			if (unlikely(is_special_task_state(prev_state)))
+				flags |= DEQUEUE_SPECIAL;
+
 			/*
 			 * __schedule()			ttwu()
 			 *   prev_state = prev->state;    if (p->on_rq && ...)
@@ -6537,7 +6542,7 @@ static void __sched notrace __schedule(u
 			 *
 			 * After this, schedule() must not care about p->state any more.
 			 */
-			block_task(rq, prev, DEQUEUE_NOCLOCK);
+			block_task(rq, prev, flags);
 		}
 		switch_count = &prev->nvcsw;
 	}
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2254,10 +2254,11 @@ extern const u32 sched_prio_to_wmult[40
  *
  */
 
-#define DEQUEUE_SLEEP		0x01
+#define DEQUEUE_SLEEP		0x01 /* Matches ENQUEUE_WAKEUP */
 #define DEQUEUE_SAVE		0x02 /* Matches ENQUEUE_RESTORE */
 #define DEQUEUE_MOVE		0x04 /* Matches ENQUEUE_MOVE */
 #define DEQUEUE_NOCLOCK		0x08 /* Matches ENQUEUE_NOCLOCK */
+#define DEQUEUE_SPECIAL		0x10
 
 #define DEQUEUE_MIGRATING	0x100 /* Matches ENQUEUE_MIGRATING */
 #define DEQUEUE_DELAYED		0x200 /* Matches ENQUEUE_DELAYED */

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105030.226163742@infradead.org>
Date: Sat, 27 Jul 2024 12:27:49 +0200
From: Peter Zijlstra
Subject:
[PATCH 17/24] sched/fair: Implement delayed dequeue
References: <20240727102732.960974693@infradead.org>

Extend / fix 86bfbb7ce4f6 ("sched/fair: Add lag based placement") by
noting that lag is fundamentally a temporal measure. It should not be
carried around indefinitely.

OTOH it should also not be instantly discarded; doing so will allow a
task to game the system by purposefully (micro) sleeping at the end of
its time quantum.

Since lag is intimately tied to the virtual time base, a wall-time
based decay is also insufficient; notably, competition is required for
any of this to make sense.

Instead, delay the dequeue and keep the 'tasks' on the runqueue,
competing until they are eligible.

Strictly speaking, we only care about keeping them until the 0-lag
point, but that is a difficult proposition; instead carry them around
until they get picked again, and dequeue them at that point.
Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/deadline.c | 1=20 kernel/sched/fair.c | 82 ++++++++++++++++++++++++++++++++++++++++++-= ----- kernel/sched/features.h | 9 +++++ 3 files changed, 81 insertions(+), 11 deletions(-) --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -2428,7 +2428,6 @@ static struct task_struct *__pick_next_t else p =3D dl_se->server_pick_next(dl_se); if (!p) { - WARN_ON_ONCE(1); dl_se->dl_yielded =3D 1; update_curr_dl_se(rq, dl_se, 0); goto again; --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5379,20 +5379,44 @@ static void clear_buddies(struct cfs_rq =20 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq); =20 -static void +static bool dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags) { - int action =3D UPDATE_TG; + if (flags & DEQUEUE_DELAYED) { + /* + * DEQUEUE_DELAYED is typically called from pick_next_entity() + * at which point we've already done update_curr() and do not + * want to do so again. + */ + SCHED_WARN_ON(!se->sched_delayed); + se->sched_delayed =3D 0; + } else { + bool sleep =3D flags & DEQUEUE_SLEEP; + + /* + * DELAY_DEQUEUE relies on spurious wakeups, special task + * states must not suffer spurious wakeups, excempt them. + */ + if (flags & DEQUEUE_SPECIAL) + sleep =3D false; + + SCHED_WARN_ON(sleep && se->sched_delayed); + update_curr(cfs_rq); =20 + if (sched_feat(DELAY_DEQUEUE) && sleep && + !entity_eligible(cfs_rq, se)) { + if (cfs_rq->next =3D=3D se) + cfs_rq->next =3D NULL; + se->sched_delayed =3D 1; + return false; + } + } + + int action =3D UPDATE_TG; if (entity_is_task(se) && task_on_rq_migrating(task_of(se))) action |=3D DO_DETACH; =20 /* - * Update run-time statistics of the 'current'. - */ - update_curr(cfs_rq); - - /* * When dequeuing a sched_entity, we must: * - Update loads to have both entity and cfs_rq synced with now. 
* - For group_entity, update its runnable_weight to reflect the new @@ -5430,6 +5454,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, st =20 if (cfs_rq->nr_running =3D=3D 0) update_idle_cfs_rq_clock_pelt(cfs_rq); + + return true; } =20 static void @@ -5828,11 +5854,21 @@ static bool throttle_cfs_rq(struct cfs_r idle_task_delta =3D cfs_rq->idle_h_nr_running; for_each_sched_entity(se) { struct cfs_rq *qcfs_rq =3D cfs_rq_of(se); + int flags; + /* throttled entity or throttle-on-deactivate */ if (!se->on_rq) goto done; =20 - dequeue_entity(qcfs_rq, se, DEQUEUE_SLEEP); + /* + * Abuse SPECIAL to avoid delayed dequeue in this instance. + * This avoids teaching dequeue_entities() about throttled + * entities and keeps things relatively simple. + */ + flags =3D DEQUEUE_SLEEP | DEQUEUE_SPECIAL; + if (se->sched_delayed) + flags |=3D DEQUEUE_DELAYED; + dequeue_entity(qcfs_rq, se, flags); =20 if (cfs_rq_is_idle(group_cfs_rq(se))) idle_task_delta =3D cfs_rq->h_nr_running; @@ -6918,6 +6954,7 @@ static int dequeue_entities(struct rq *r bool was_sched_idle =3D sched_idle_rq(rq); int rq_h_nr_running =3D rq->cfs.h_nr_running; bool task_sleep =3D flags & DEQUEUE_SLEEP; + bool task_delayed =3D flags & DEQUEUE_DELAYED; struct task_struct *p =3D NULL; int idle_h_nr_running =3D 0; int h_nr_running =3D 0; @@ -6931,7 +6968,13 @@ static int dequeue_entities(struct rq *r =20 for_each_sched_entity(se) { cfs_rq =3D cfs_rq_of(se); - dequeue_entity(cfs_rq, se, flags); + + if (!dequeue_entity(cfs_rq, se, flags)) { + if (p && &p->se =3D=3D se) + return -1; + + break; + } =20 cfs_rq->h_nr_running -=3D h_nr_running; cfs_rq->idle_h_nr_running -=3D idle_h_nr_running; @@ -6956,6 +6999,7 @@ static int dequeue_entities(struct rq *r break; } flags |=3D DEQUEUE_SLEEP; + flags &=3D ~(DEQUEUE_DELAYED | DEQUEUE_SPECIAL); } =20 for_each_sched_entity(se) { @@ -6985,6 +7029,17 @@ static int dequeue_entities(struct rq *r if (unlikely(!was_sched_idle && sched_idle_rq(rq))) rq->next_balance =3D jiffies; =20 + if (p && 
task_delayed) { + SCHED_WARN_ON(!task_sleep); + SCHED_WARN_ON(p->on_rq !=3D 1); + + /* Fix-up what dequeue_task_fair() skipped */ + hrtick_update(rq); + + /* Fix-up what block_task() skipped. */ + __block_task(rq, p); + } + return 1; } /* @@ -6996,8 +7051,10 @@ static bool dequeue_task_fair(struct rq { util_est_dequeue(&rq->cfs, p); =20 - if (dequeue_entities(rq, &p->se, flags) < 0) + if (dequeue_entities(rq, &p->se, flags) < 0) { + util_est_update(&rq->cfs, p, DEQUEUE_SLEEP); return false; + } =20 util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP); hrtick_update(rq); @@ -12973,6 +13030,11 @@ static void set_next_task_fair(struct rq /* ensure bandwidth has been allocated on our new cfs_rq */ account_cfs_rq_runtime(cfs_rq, 0); } + + if (!first) + return; + + SCHED_WARN_ON(se->sched_delayed); } =20 void init_cfs_rq(struct cfs_rq *cfs_rq) --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -29,6 +29,15 @@ SCHED_FEAT(NEXT_BUDDY, false) SCHED_FEAT(CACHE_HOT_BUDDY, true) =20 /* + * Delay dequeueing tasks until they get selected or woken. + * + * By delaying the dequeue for non-eligible tasks, they remain in the + * competition and can burn off their negative lag. When they get selected + * they'll have positive lag by definition. 
+ */
+SCHED_FEAT(DELAY_DEQUEUE, true)
+
+/*
  * Allow wakeup-time preemption of the current task:
  */
 SCHED_FEAT(WAKEUP_PREEMPTION, true)

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105030.403750550@infradead.org>
Date: Sat, 27 Jul 2024 12:27:50 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com, youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 18/24] sched/fair: Implement DELAY_ZERO
References: <20240727102732.960974693@infradead.org>

'Extend' DELAY_DEQUEUE by noting that since we wanted to dequeue these
tasks at the 0-lag point, we should truncate their lag (i.e. don't let
them earn positive lag).
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c     | 16 ++++++++++++++++
 kernel/sched/features.h |  3 +++
 2 files changed, 19 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5529,6 +5529,8 @@ pick_next_entity(struct rq *rq, struct c
 		dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
 		SCHED_WARN_ON(se->sched_delayed);
 		SCHED_WARN_ON(se->on_rq);
+		if (sched_feat(DELAY_ZERO) && se->vlag > 0)
+			se->vlag = 0;
 
 		return NULL;
 	}
@@ -6827,6 +6829,20 @@ requeue_delayed_entity(struct sched_enti
 	SCHED_WARN_ON(!se->sched_delayed);
 	SCHED_WARN_ON(!se->on_rq);
 
+	if (sched_feat(DELAY_ZERO)) {
+		update_entity_lag(cfs_rq, se);
+		if (se->vlag > 0) {
+			cfs_rq->nr_running--;
+			if (se != cfs_rq->curr)
+				__dequeue_entity(cfs_rq, se);
+			se->vlag = 0;
+			place_entity(cfs_rq, se, 0);
+			if (se != cfs_rq->curr)
+				__enqueue_entity(cfs_rq, se);
+			cfs_rq->nr_running++;
+		}
+	}
+
 	se->sched_delayed = 0;
 }
 
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -34,8 +34,11 @@ SCHED_FEAT(CACHE_HOT_BUDDY, true)
  * By delaying the dequeue for non-eligible tasks, they remain in the
  * competition and can burn off their negative lag. When they get selected
  * they'll have positive lag by definition.
+ *
+ * DELAY_ZERO clips the lag on dequeue (or wakeup) to 0.
  */
 SCHED_FEAT(DELAY_DEQUEUE, true)
+SCHED_FEAT(DELAY_ZERO, true)
 
 /*
  * Allow wakeup-time preemption of the current task:

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105030.514088302@infradead.org>
Date: Sat, 27 Jul 2024 12:27:51 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com, youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 19/24] sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE
References: <20240727102732.960974693@infradead.org>

Note that tasks that are kept on the runqueue to burn off negative lag
are not in fact runnable anymore; they'll get dequeued the moment they
get picked. As such, don't count this time towards runnable.
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c  | 2 ++
 kernel/sched/sched.h | 6 ++++++
 2 files changed, 8 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5388,6 +5388,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 			if (cfs_rq->next == se)
 				cfs_rq->next = NULL;
 			se->sched_delayed = 1;
+			update_load_avg(cfs_rq, se, 0);
 			return false;
 		}
 	}
@@ -6814,6 +6815,7 @@ requeue_delayed_entity(struct sched_enti
 	}
 
 	se->sched_delayed = 0;
+	update_load_avg(cfs_rq, se, 0);
 }
 
 /*
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -816,6 +816,9 @@ static inline void se_update_runnable(st
 
 static inline long se_runnable(struct sched_entity *se)
 {
+	if (se->sched_delayed)
+		return false;
+
 	if (entity_is_task(se))
 		return !!se->on_rq;
 	else
@@ -830,6 +833,9 @@ static inline void se_update_runnable(st
 
 static inline long se_runnable(struct sched_entity *se)
 {
+	if (se->sched_delayed)
+		return false;
+
 	return !!se->on_rq;
 }

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105030.625119246@infradead.org>
Date: Sat, 27 Jul 2024 12:27:52 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com, youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 20/24] sched/fair: Avoid re-setting virtual deadline on migrations
References: <20240727102732.960974693@infradead.org>

During OSPM24 Youssef noted that migrations are re-setting the virtual
deadline. Notably, everything that does a dequeue-enqueue cycle (setting
nice, changing the preferred NUMA node, and a myriad of other
operations) will cause this to happen. This shouldn't be.

Preserve the relative virtual deadline across such dequeue/enqueue
cycles.

Signed-off-by: Peter Zijlstra (Intel)
---
 include/linux/sched.h   |  6 ++++--
 kernel/sched/fair.c     | 23 ++++++++++++++++++-----
 kernel/sched/features.h |  4 ++++
 3 files changed, 26 insertions(+), 7 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -542,8 +542,10 @@ struct sched_entity {
 	u64			min_vruntime;
 
 	struct list_head	group_node;
-	unsigned int		on_rq;
-	unsigned int		sched_delayed;
+	unsigned char		on_rq;
+	unsigned char		sched_delayed;
+	unsigned char		rel_deadline;
+				/* hole */
 
 	u64			exec_start;
 	u64			sum_exec_runtime;
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5270,6 +5270,12 @@ place_entity(struct cfs_rq *cfs_rq, stru
 
 	se->vruntime = vruntime - lag;
 
+	if (sched_feat(PLACE_REL_DEADLINE) && se->rel_deadline) {
+		se->deadline += se->vruntime;
+		se->rel_deadline = 0;
+		return;
+	}
+
 	/*
 	 * When joining the competition; the existing tasks will be,
 	 * on average, halfway through their slice, as such start tasks
@@ -5382,6 +5388,8 @@ static __always_inline void
return_cfs_r
 static bool
 dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
+	bool sleep = flags & DEQUEUE_SLEEP;
+
 	if (flags & DEQUEUE_DELAYED) {
 		/*
 		 * DEQUEUE_DELAYED is typically called from pick_next_entity()
@@ -5391,19 +5399,18 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 		SCHED_WARN_ON(!se->sched_delayed);
 		se->sched_delayed = 0;
 	} else {
-		bool sleep = flags & DEQUEUE_SLEEP;
-
+		bool delay = sleep;
 		/*
 		 * DELAY_DEQUEUE relies on spurious wakeups, special task
 		 * states must not suffer spurious wakeups, exempt them.
 		 */
 		if (flags & DEQUEUE_SPECIAL)
-			sleep = false;
+			delay = false;
 
-		SCHED_WARN_ON(sleep && se->sched_delayed);
+		SCHED_WARN_ON(delay && se->sched_delayed);
 		update_curr(cfs_rq);
 
-		if (sched_feat(DELAY_DEQUEUE) && sleep &&
+		if (sched_feat(DELAY_DEQUEUE) && delay &&
 		    !entity_eligible(cfs_rq, se)) {
 			if (cfs_rq->next == se)
 				cfs_rq->next = NULL;
@@ -5434,6 +5441,11 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 	clear_buddies(cfs_rq, se);
 
 	update_entity_lag(cfs_rq, se);
+	if (sched_feat(PLACE_REL_DEADLINE) && !sleep) {
+		se->deadline -= se->vruntime;
+		se->rel_deadline = 1;
+	}
+
 	if (se != cfs_rq->curr)
 		__dequeue_entity(cfs_rq, se);
 	se->on_rq = 0;
@@ -13024,6 +13036,7 @@ static void switched_from_fair(struct rq
 	 * over the excursion into the new class.
 	 */
 	p->se.vlag = 0;
+	p->se.rel_deadline = 0;
 }
 
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -10,6 +10,10 @@ SCHED_FEAT(PLACE_LAG, true)
  */
 SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
 /*
+ * Preserve relative virtual deadline on 'migration'.
+ */
+SCHED_FEAT(PLACE_REL_DEADLINE, true)
+/*
  * Inhibit (wakeup) preemption until the current task has either matched the
  * 0-lag point or until it has exhausted its slice.

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105030.735459544@infradead.org>
Date: Sat, 27 Jul 2024 12:27:53 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com, youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de, Mike Galbraith
Subject: [PATCH 21/24] sched/eevdf: Allow shorter slices to wakeup-preempt
References: <20240727102732.960974693@infradead.org>

Part of the reason to have shorter slices is to improve
responsiveness. Allow shorter slices to preempt longer slices on
wakeup.
  Task                  | Runtime ms   | Switches | Avg delay ms  | Max delay ms  | Sum delay ms      |

  100ms massive_intr 500us cyclictest NO_PREEMPT_SHORT

  1 massive_intr:(5)    | 846018.956 ms | 779188 | avg: 0.273 ms | max: 58.337 ms | sum:212545.245 ms |
  2 massive_intr:(5)    | 853450.693 ms | 792269 | avg: 0.275 ms | max: 71.193 ms | sum:218263.588 ms |
  3 massive_intr:(5)    | 843888.920 ms | 771456 | avg: 0.277 ms | max: 92.405 ms | sum:213353.221 ms |
  1 chromium-browse:(8) |  53015.889 ms | 131766 | avg: 0.463 ms | max: 36.341 ms | sum: 60959.230 ms |
  2 chromium-browse:(8) |  53864.088 ms | 136962 | avg: 0.480 ms | max: 27.091 ms | sum: 65687.681 ms |
  3 chromium-browse:(9) |  53637.904 ms | 132637 | avg: 0.481 ms | max: 24.756 ms | sum: 63781.673 ms |
  1 cyclictest:(5)      |  12615.604 ms | 639689 | avg: 0.471 ms | max: 32.272 ms | sum:301351.094 ms |
  2 cyclictest:(5)      |  12511.583 ms | 642578 | avg: 0.448 ms | max: 44.243 ms | sum:287632.830 ms |
  3 cyclictest:(5)      |  12545.867 ms | 635953 | avg: 0.475 ms | max: 25.530 ms | sum:302374.658 ms |

  100ms massive_intr 500us cyclictest PREEMPT_SHORT

  1 massive_intr:(5)     | 839843.919 ms | 837384 | avg: 0.264 ms | max: 74.366 ms | sum:221476.885 ms |
  2 massive_intr:(5)     | 852449.913 ms | 845086 | avg: 0.252 ms | max: 68.162 ms | sum:212595.968 ms |
  3 massive_intr:(5)     | 839180.725 ms | 836883 | avg: 0.266 ms | max: 69.742 ms | sum:222812.038 ms |
  1 chromium-browse:(11) |  54591.481 ms | 138388 | avg: 0.458 ms | max: 35.427 ms | sum: 63401.508 ms |
  2 chromium-browse:(8)  |  52034.541 ms | 132276 | avg: 0.436 ms | max: 31.826 ms | sum: 57732.958 ms |
  3 chromium-browse:(8)  |  55231.771 ms | 141892 | avg: 0.469 ms | max: 27.607 ms | sum: 66538.697 ms |
  1 cyclictest:(5)       |  13156.391 ms | 667412 | avg: 0.373 ms | max: 38.247 ms | sum:249174.502 ms |
  2 cyclictest:(5)       |  12688.939 ms | 665144 | avg: 0.374 ms | max: 33.548 ms | sum:248509.392 ms |
  3 cyclictest:(5)       |  13475.623 ms | 669110 | avg: 0.370 ms | max: 37.819 ms | sum:247673.390 ms |

As per the numbers, this makes cyclictest's (short slice) max-delay more
consistent, and that consistency drops the sum-delay. The trade-off is
that massive_intr (long slice) gets more context switches and a slight
increase in sum-delay.

[mike: numbers]
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: Mike Galbraith
---
 kernel/sched/fair.c     | 64 ++++++++++++++++++++++++++++++++++++++++++-----
 kernel/sched/features.h |  5 +++
 2 files changed, 61 insertions(+), 8 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -973,10 +973,10 @@ static void clear_buddies(struct cfs_rq
  * XXX: strictly: vd_i += N*r_i/w_i such that: vd_i > ve_i
  * this is probably good enough.
  */
-static void update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
+static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	if ((s64)(se->vruntime - se->deadline) < 0)
-		return;
+		return false;
 
 	/*
 	 * For EEVDF the virtual time slope is determined by w_i (iow.
@@ -993,10 +993,7 @@ static void update_deadline(struct cfs_r
 	/*
 	 * The task has consumed its request, reschedule.
 	 */
-	if (cfs_rq->nr_running > 1) {
-		resched_curr(rq_of(cfs_rq));
-		clear_buddies(cfs_rq, se);
-	}
+	return true;
 }
 
 #include "pelt.h"
@@ -1134,6 +1131,38 @@ static inline void update_curr_task(stru
 		dl_server_update(p->dl_server, delta_exec);
 }
 
+static inline bool did_preempt_short(struct cfs_rq *cfs_rq, struct sched_entity *curr)
+{
+	if (!sched_feat(PREEMPT_SHORT))
+		return false;
+
+	if (curr->vlag == curr->deadline)
+		return false;
+
+	return !entity_eligible(cfs_rq, curr);
+}
+
+static inline bool do_preempt_short(struct cfs_rq *cfs_rq,
+				    struct sched_entity *pse, struct sched_entity *se)
+{
+	if (!sched_feat(PREEMPT_SHORT))
+		return false;
+
+	if (pse->slice >= se->slice)
+		return false;
+
+	if (!entity_eligible(cfs_rq, pse))
+		return false;
+
+	if (entity_before(pse, se))
+		return true;
+
+	if (!entity_eligible(cfs_rq, se))
+		return true;
+
+	return false;
+}
+
 /*
  * Used by other classes to account runtime.
  */
@@ -1157,6 +1186,7 @@ static void update_curr(struct cfs_rq *c
 	struct sched_entity *curr = cfs_rq->curr;
 	struct rq *rq = rq_of(cfs_rq);
 	s64 delta_exec;
+	bool resched;
 
 	if (unlikely(!curr))
 		return;
@@ -1166,7 +1196,7 @@ static void update_curr(struct cfs_rq *c
 		return;
 
 	curr->vruntime += calc_delta_fair(delta_exec, curr);
-	update_deadline(cfs_rq, curr);
+	resched = update_deadline(cfs_rq, curr);
 	update_min_vruntime(cfs_rq);
 
 	if (entity_is_task(curr)) {
@@ -1184,6 +1214,14 @@ static void update_curr(struct cfs_rq *c
 	}
 
 	account_cfs_rq_runtime(cfs_rq, delta_exec);
+
+	if (rq->nr_running == 1)
+		return;
+
+	if (resched || did_preempt_short(cfs_rq, curr)) {
+		resched_curr(rq);
+		clear_buddies(cfs_rq, curr);
+	}
 }
 
 static void update_curr_fair(struct rq *rq)
@@ -8611,7 +8649,17 @@ static void check_preempt_wakeup_fair(st
 	cfs_rq = cfs_rq_of(se);
 	update_curr(cfs_rq);
 	/*
-	 * XXX pick_eevdf(cfs_rq) != se ?
+	 * If @p has a shorter slice than current and @p is eligible, override
+	 * current's slice protection in order to allow preemption.
+	 *
+	 * Note that even if @p does not turn out to be the most eligible
+	 * task at this moment, current's slice protection will be lost.
+	 */
+	if (do_preempt_short(cfs_rq, pse, se) && se->vlag == se->deadline)
+		se->vlag = se->deadline + 1;
+
+	/*
+	 * If @p has become the most eligible task, force preemption.
 	 */
 	if (pick_eevdf(cfs_rq) == pse)
 		goto preempt;
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -18,6 +18,11 @@ SCHED_FEAT(PLACE_REL_DEADLINE, true)
  * 0-lag point or until it has exhausted its slice.
  */
 SCHED_FEAT(RUN_TO_PARITY, true)
+/*
+ * Allow wakeup of tasks with a shorter slice to cancel RESPECT_SLICE for
+ * current.
+ */
+SCHED_FEAT(PREEMPT_SHORT, true)
 
 /*
  * Prefer to schedule the task we woke last (assuming it failed

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105030.842834421@infradead.org>
Date: Sat, 27 Jul 2024 12:27:54 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org
Cc:
kprateek.nayak@amd.com, wuyun.abel@bytedance.com, youssefesmat@chromium.org, tglx@linutronix.de, efault@gmx.de Subject: [PATCH 22/24] sched/eevdf: Use sched_attr::sched_runtime to set request/slice suggestion References: <20240727102732.960974693@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Allow applications to directly set a suggested request/slice length using sched_attr::sched_runtime. The implementation clamps the value to: 0.1[ms] <=3D slice <=3D 100[ms] which is 1/10 the size of HZ=3D1000 and 10 times the size of HZ=3D100. Applications should strive to use their periodic runtime at a high confidence interval (95%+) as the target slice. Using a smaller slice will introduce undue preemptions, while using a larger value will increase latency. For all the following examples assume a scheduling quantum of 8, and for consistency all examples have W=3D4: {A,B,C,D}(w=3D1,r=3D8): ABCD... +---+---+---+--- t=3D0, V=3D1.5 t=3D1, V=3D3.5 A |------< A |------< B |------< B |------< C |------< C |------< D |------< D |------< ---+*------+-------+--- ---+--*----+-------+--- t=3D2, V=3D5.5 t=3D3, V=3D7.5 A |------< A |------< B |------< B |------< C |------< C |------< D |------< D |------< ---+----*--+-------+--- ---+------*+-------+--- Note: 4 identical tasks in FIFO order ~~~ {A,B}(w=3D1,r=3D16) C(w=3D2,r=3D16) AACCBBCC... +---+---+---+--- t=3D0, V=3D1.25 t=3D2, V=3D5.25 A |--------------< A |--------------< B |--------------< B |--------------< C |------< C |------< ---+*------+-------+--- ---+----*--+-------+--- t=3D4, V=3D8.25 t=3D6, V=3D12.25 A |--------------< A |--------------< B |--------------< B |--------------< C |------< C |------< ---+-------*-------+--- ---+-------+---*---+--- Note: 1 heavy task -- because q=3D8, double r such that the deadline of the= w=3D2 task doesn't go below q. 
Note: observe the full schedule becomes: W*max(r_i/w_i) =3D 4*2q =3D 8q in = length. Note: the period of the heavy task is half the full period at: W*(r_i/w_i) =3D 4*(2q/2) =3D 4q ~~~ {A,C,D}(w=3D1,r=3D16) B(w=3D1,r=3D8): BAACCBDD... +---+---+---+--- t=3D0, V=3D1.5 t=3D1, V=3D3.5 A |--------------< A |---------------< B |------< B |------< C |--------------< C |--------------< D |--------------< D |--------------< ---+*------+-------+--- ---+--*----+-------+--- t=3D3, V=3D7.5 t=3D5, V=3D11.5 A |---------------< A |---------------< B |------< B |------< C |--------------< C |-------------= -< D |--------------< D |--------------< ---+------*+-------+--- ---+-------+--*----+--- t=3D6, V=3D13.5 A |---------------< B |------< C |--------------< D |--------------< ---+-------+----*--+--- Note: 1 short task -- again double r so that the deadline of the short task won't be below q. Made B short because its not the leftmost task, but= is eligible with the 0,1,2,3 spread. Note: like with the heavy task, the period of the short task observes: W*(r_i/w_i) =3D 4*(1q/1) =3D 4q ~~~ A(w=3D1,r=3D16) B(w=3D1,r=3D8) C(w=3D2,r=3D16) BCCAABCC... +---+---+---+--- t=3D0, V=3D1.25 t=3D1, V=3D3.25 A |--------------< A |--------------< B |------< B |------< C |------< C |------< ---+*------+-------+--- ---+--*----+-------+--- t=3D3, V=3D7.25 t=3D5, V=3D11.25 A |--------------< A |--------------< B |------< B |------< C |------< C |------< ---+------*+-------+--- ---+-------+--*----+--- t=3D6, V=3D13.25 A |--------------< B |------< C |------< ---+-------+----*--+--- Note: 1 heavy and 1 short task -- combine them all. Note: both the short and heavy task end up with a period of 4q ~~~ A(w=3D1,r=3D16) B(w=3D2,r=3D16) C(w=3D1,r=3D8) BBCAABBC... 
	+---+---+---+---

	t=0, V=1
	A |--------------<
	B |------<
	C |------<
	---+*------+-------+---

	t=2, V=5
	A |--------------<
	B |------<
	C |------<
	---+----*--+-------+---

	t=3, V=7
	A |--------------<
	B |------<
	C |------<
	---+------*+-------+---

	t=5, V=11
	A |--------------<
	B |------<
	C |------<
	---+-------+--*----+---

	t=7, V=15
	A |--------------<
	B |------<
	C |------<
	---+-------+------*+---

	Note: as before but permuted

  ~~~

From all this it can be deduced that, for the steady state:

 - the total period (P) of a schedule is:	W*max(r_i/w_i)
 - the average period of a task is:		W*(r_i/w_i)
 - each task obtains the fair share:		w_i/W of each full period P

Signed-off-by: Peter Zijlstra (Intel)
---
 include/linux/sched.h   |    1 +
 kernel/sched/core.c     |    4 +++-
 kernel/sched/debug.c    |    3 ++-
 kernel/sched/fair.c     |    6 ++++--
 kernel/sched/syscalls.c |   29 +++++++++++++++++++++++------
 5 files changed, 33 insertions(+), 10 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -544,6 +544,7 @@ struct sched_entity {
 	unsigned char			on_rq;
 	unsigned char			sched_delayed;
 	unsigned char			rel_deadline;
+	unsigned char			custom_slice;
 					/* hole */
 
 	u64				exec_start;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4347,7 +4347,6 @@ static void __sched_fork(unsigned long c
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
 	p->se.vlag			= 0;
-	p->se.slice			= sysctl_sched_base_slice;
 	INIT_LIST_HEAD(&p->se.group_node);
 
 	/* A delayed task cannot be in clone(). */
@@ -4600,6 +4599,8 @@ int sched_fork(unsigned long clone_flags
 
 	p->prio = p->normal_prio = p->static_prio;
 	set_load_weight(p, false);
+	p->se.custom_slice = 0;
+	p->se.slice = sysctl_sched_base_slice;
 
 	/*
	 * We don't need the reset flag anymore after the fork. It has
@@ -8328,6 +8329,7 @@ void __init sched_init(void)
 	}
 
 	set_load_weight(&init_task, false);
+	init_task.se.slice = sysctl_sched_base_slice;
 
 	/*
	 * The boot idle thread does lazy MMU switching as well:
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -580,11 +580,12 @@ print_task(struct seq_file *m, struct rq
 	else
 		SEQ_printf(m, " %c", task_state_to_char(p));
 
-	SEQ_printf(m, "%15s %5d %9Ld.%06ld %c %9Ld.%06ld %9Ld.%06ld %9Ld.%06ld %9Ld %5d ",
+	SEQ_printf(m, "%15s %5d %9Ld.%06ld %c %9Ld.%06ld %c %9Ld.%06ld %9Ld.%06ld %9Ld %5d ",
 		p->comm, task_pid_nr(p),
 		SPLIT_NS(p->se.vruntime),
 		entity_eligible(cfs_rq_of(&p->se), &p->se) ? 'E' : 'N',
 		SPLIT_NS(p->se.deadline),
+		p->se.custom_slice ? 'S' : ' ',
 		SPLIT_NS(p->se.slice),
 		SPLIT_NS(p->se.sum_exec_runtime),
 		(long long)(p->nvcsw + p->nivcsw),
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -995,7 +995,8 @@ static void update_deadline(struct cfs_r
 	 * nice) while the request time r_i is determined by
 	 * sysctl_sched_base_slice.
 	 */
-	se->slice = sysctl_sched_base_slice;
+	if (!se->custom_slice)
+		se->slice = sysctl_sched_base_slice;
 
 	/*
 	 * EEVDF: vd_i = ve_i + r_i / w_i
@@ -5190,7 +5191,8 @@ place_entity(struct cfs_rq *cfs_rq, stru
 	u64 vslice, vruntime = avg_vruntime(cfs_rq);
 	s64 lag = 0;
 
-	se->slice = sysctl_sched_base_slice;
+	if (!se->custom_slice)
+		se->slice = sysctl_sched_base_slice;
 	vslice = calc_delta_fair(se->slice, se);
 
 	/*
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -401,10 +401,20 @@ static void __setscheduler_params(struct
 
 	p->policy = policy;
 
-	if (dl_policy(policy))
+	if (dl_policy(policy)) {
 		__setparam_dl(p, attr);
-	else if (fair_policy(policy))
+	} else if (fair_policy(policy)) {
 		p->static_prio = NICE_TO_PRIO(attr->sched_nice);
+		if (attr->sched_runtime) {
+			p->se.custom_slice = 1;
+			p->se.slice = clamp_t(u64, attr->sched_runtime,
+					      NSEC_PER_MSEC/10,   /* HZ=1000 * 10 */
+					      NSEC_PER_MSEC*100); /* HZ=100  / 10 */
+		} else {
+			p->se.custom_slice = 0;
+			p->se.slice = sysctl_sched_base_slice;
+		}
+	}
 
 	/*
 	 * __sched_setscheduler() ensures attr->sched_priority == 0 when
@@ -700,7 +710,9 @@ int __sched_setscheduler(struct task_str
 	 * but store a possible modification of reset_on_fork.
 	 */
 	if (unlikely(policy == p->policy)) {
-		if (fair_policy(policy) && attr->sched_nice != task_nice(p))
+		if (fair_policy(policy) &&
+		    (attr->sched_nice != task_nice(p) ||
+		     (attr->sched_runtime != p->se.slice)))
 			goto change;
 		if (rt_policy(policy) && attr->sched_priority != p->rt_priority)
 			goto change;
@@ -846,6 +858,9 @@ static int _sched_setscheduler(struct ta
 		.sched_nice	= PRIO_TO_NICE(p->static_prio),
 	};
 
+	if (p->se.custom_slice)
+		attr.sched_runtime = p->se.slice;
+
 	/* Fixup the legacy SCHED_RESET_ON_FORK hack. */
 	if ((policy != SETPARAM_POLICY) && (policy & SCHED_RESET_ON_FORK)) {
 		attr.sched_flags |= SCHED_FLAG_RESET_ON_FORK;
@@ -1012,12 +1027,14 @@ static int sched_copy_attr(struct sched_
 
 static void get_params(struct task_struct *p, struct sched_attr *attr)
 {
-	if (task_has_dl_policy(p))
+	if (task_has_dl_policy(p)) {
 		__getparam_dl(p, attr);
-	else if (task_has_rt_policy(p))
+	} else if (task_has_rt_policy(p)) {
 		attr->sched_priority = p->rt_priority;
-	else
+	} else {
 		attr->sched_nice = task_nice(p);
+		attr->sched_runtime = p->se.slice;
+	}
 }
 
 /**

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105030.948188417@infradead.org>
Date: Sat, 27 Jul 2024 12:27:55 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com, youssefesmat@chromium.org,
 tglx@linutronix.de, efault@gmx.de
Subject: [PATCH 23/24] sched/eevdf: Propagate min_slice up the cgroup hierarchy
References: <20240727102732.960974693@infradead.org>

In the absence of an explicit cgroup slice configuration, make mixed
slice lengths work with cgroups by propagating the min_slice up the
hierarchy.

This ensures a cgroup entity receives timely service, so that it can in
turn service those of its entities that have this timing constraint set
on them.

Signed-off-by: Peter Zijlstra (Intel)
---
 include/linux/sched.h |    1 
 kernel/sched/fair.c   |   57 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 57 insertions(+), 1 deletion(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -542,6 +542,7 @@ struct sched_entity {
 	struct rb_node			run_node;
 	u64				deadline;
 	u64				min_vruntime;
+	u64				min_slice;
 
 	struct list_head		group_node;
 	unsigned char			on_rq;
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -782,6 +782,21 @@ static void update_min_vruntime(struct c
 	cfs_rq->min_vruntime = __update_min_vruntime(cfs_rq, vruntime);
 }
 
+static inline u64 cfs_rq_min_slice(struct cfs_rq *cfs_rq)
+{
+	struct sched_entity *root = __pick_root_entity(cfs_rq);
+	struct sched_entity *curr = cfs_rq->curr;
+	u64 min_slice = ~0ULL;
+
+	if (curr && curr->on_rq)
+		min_slice = curr->slice;
+
+	if (root)
+		min_slice = min(min_slice, root->min_slice);
+
+	return min_slice;
+}
+
 static inline bool __entity_less(struct rb_node *a, const struct rb_node *b)
 {
 	return entity_before(__node_2_se(a), __node_2_se(b));
@@ -798,19 +813,34 @@ static inline void __min_vruntime_update
 	}
 }
 
+static inline void __min_slice_update(struct sched_entity *se, struct rb_node *node)
+{
+	if (node) {
+		struct sched_entity *rse = __node_2_se(node);
+		if (rse->min_slice < se->min_slice)
+			se->min_slice = rse->min_slice;
+	}
+}
+
 /*
  * se->min_vruntime = min(se->vruntime, {left,right}->min_vruntime)
  */
 static inline bool min_vruntime_update(struct sched_entity *se, bool exit)
 {
 	u64 old_min_vruntime = se->min_vruntime;
+	u64 old_min_slice = se->min_slice;
 	struct rb_node *node = &se->run_node;
 
 	se->min_vruntime = se->vruntime;
 	__min_vruntime_update(se, node->rb_right);
 	__min_vruntime_update(se, node->rb_left);
 
-	return se->min_vruntime == old_min_vruntime;
+	se->min_slice = se->slice;
+	__min_slice_update(se, node->rb_right);
+	__min_slice_update(se, node->rb_left);
+
+	return se->min_vruntime == old_min_vruntime &&
+	       se->min_slice == old_min_slice;
 }
 
 RB_DECLARE_CALLBACKS(static, min_vruntime_cb, struct sched_entity,
@@ -823,6 +853,7 @@ static void __enqueue_entity(struct cfs_
 {
 	avg_vruntime_add(cfs_rq, se);
 	se->min_vruntime = se->vruntime;
+	se->min_slice = se->slice;
 	rb_add_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline,
 				__entity_less, &min_vruntime_cb);
 }
@@ -6917,6 +6948,7 @@ enqueue_task_fair(struct rq *rq, struct
 	int idle_h_nr_running = task_has_idle_policy(p);
 	int task_new = !(flags & ENQUEUE_WAKEUP);
 	int rq_h_nr_running = rq->cfs.h_nr_running;
+	u64 slice = 0;
 
 	if (flags & ENQUEUE_DELAYED) {
 		requeue_delayed_entity(se);
@@ -6946,7 +6978,18 @@ enqueue_task_fair(struct rq *rq, struct
 			break;
 		}
 		cfs_rq = cfs_rq_of(se);
+
+		/*
+		 * Basically set the slice of group entries to the min_slice of
+		 * their respective cfs_rq. This ensures the group can service
+		 * its entities in the desired time-frame.
+		 */
+		if (slice) {
+			se->slice = slice;
+			se->custom_slice = 1;
+		}
 		enqueue_entity(cfs_rq, se, flags);
+		slice = cfs_rq_min_slice(cfs_rq);
 
 		cfs_rq->h_nr_running++;
 		cfs_rq->idle_h_nr_running += idle_h_nr_running;
@@ -6968,6 +7011,9 @@ enqueue_task_fair(struct rq *rq, struct
 		se_update_runnable(se);
 		update_cfs_group(se);
 
+		se->slice = slice;
+		slice = cfs_rq_min_slice(cfs_rq);
+
 		cfs_rq->h_nr_running++;
 		cfs_rq->idle_h_nr_running += idle_h_nr_running;
 
@@ -7033,11 +7079,15 @@ static int dequeue_entities(struct rq *r
 	int idle_h_nr_running = 0;
 	int h_nr_running = 0;
 	struct cfs_rq *cfs_rq;
+	u64 slice = 0;
 
 	if (entity_is_task(se)) {
 		p = task_of(se);
 		h_nr_running = 1;
 		idle_h_nr_running = task_has_idle_policy(p);
+	} else {
+		cfs_rq = group_cfs_rq(se);
+		slice = cfs_rq_min_slice(cfs_rq);
 	}
 
 	for_each_sched_entity(se) {
@@ -7062,6 +7112,8 @@ static int dequeue_entities(struct rq *r
 
 		/* Don't dequeue parent if it has other entities besides us */
 		if (cfs_rq->load.weight) {
+			slice = cfs_rq_min_slice(cfs_rq);
+
 			/* Avoid re-evaluating load for this entity: */
 			se = parent_entity(se);
 			/*
@@ -7083,6 +7135,9 @@ static int dequeue_entities(struct rq *r
 		se_update_runnable(se);
 		update_cfs_group(se);
 
+		se->slice = slice;
+		slice = cfs_rq_min_slice(cfs_rq);
+
 		cfs_rq->h_nr_running -= h_nr_running;
 		cfs_rq->idle_h_nr_running -= idle_h_nr_running;

From nobody Sun Dec 14 06:15:47 2025
Message-Id: <20240727105031.053611186@infradead.org>
Date: Sat, 27 Jul 2024 12:27:56 +0200
From: Peter Zijlstra
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
 linux-kernel@vger.kernel.org
Cc: kprateek.nayak@amd.com, wuyun.abel@bytedance.com, youssefesmat@chromium.org,
 tglx@linutronix.de, efault@gmx.de
Subject: [RFC PATCH 24/24] sched/time: Introduce CLOCK_THREAD_DVFS_ID
References: <20240727102732.960974693@infradead.org>

In order to measure thread time in a DVFS world, introduce
CLOCK_THREAD_DVFS_ID -- a copy of CLOCK_THREAD_CPUTIME_ID that slows
down with both DVFS scaling and CPU capacity. The clock does *NOT*
support setting timers.

Useful for both SCHED_DEADLINE and the newly introduced
sched_attr::sched_runtime usage for SCHED_NORMAL.

Signed-off-by: Peter Zijlstra (Intel)
---
 include/linux/posix-timers_types.h |    5 ++--
 include/linux/sched.h              |    1 
 include/linux/sched/cputime.h      |    3 ++
 include/uapi/linux/time.h          |    1 
 kernel/sched/core.c                |   40 +++++++++++++++++++++++++++++++++++
 kernel/sched/fair.c                |    8 +++++--
 kernel/time/posix-cpu-timers.c     |   16 +++++++++++++-
 kernel/time/posix-timers.c         |    1 
 kernel/time/posix-timers.h         |    1 
 9 files changed, 71 insertions(+), 5 deletions(-)

--- a/include/linux/posix-timers_types.h
+++ b/include/linux/posix-timers_types.h
@@ -13,9 +13,9 @@
  *
  * Bit 2 indicates whether a cpu clock refers to a thread or a process.
  *
- * Bits 1 and 0 give the type: PROF=0, VIRT=1, SCHED=2, or FD=3.
+ * Bits 1 and 0 give the type: PROF=0, VIRT=1, SCHED=2, or DVFS=3
  *
- * A clockid is invalid if bits 2, 1, and 0 are all set.
+ * (DVFS is PERTHREAD only)
  */
 #define CPUCLOCK_PID(clock)	((pid_t) ~((clock) >> 3))
 #define CPUCLOCK_PERTHREAD(clock) \
@@ -27,6 +27,7 @@
 #define CPUCLOCK_PROF		0
 #define CPUCLOCK_VIRT		1
 #define CPUCLOCK_SCHED		2
+#define CPUCLOCK_DVFS		3
 #define CPUCLOCK_MAX		3
 #define CLOCKFD			CPUCLOCK_MAX
 #define CLOCKFD_MASK		(CPUCLOCK_PERTHREAD_MASK|CPUCLOCK_CLOCK_MASK)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -550,6 +550,7 @@ struct sched_entity {
 	u64				exec_start;
 	u64				sum_exec_runtime;
 	u64				prev_sum_exec_runtime;
+	u64				sum_dvfs_runtime;
 	u64				vruntime;
 	s64				vlag;
 	u64				slice;
--- a/include/linux/sched/cputime.h
+++ b/include/linux/sched/cputime.h
@@ -180,4 +180,7 @@ static inline void prev_cputime_init(str
 
 extern unsigned long long task_sched_runtime(struct task_struct *task);
 
+extern unsigned long long
+task_sched_dvfs_runtime(struct task_struct *task);
+
 #endif /* _LINUX_SCHED_CPUTIME_H */
--- a/include/uapi/linux/time.h
+++ b/include/uapi/linux/time.h
@@ -62,6 +62,7 @@ struct timezone {
  */
 #define CLOCK_SGI_CYCLE			10
 #define CLOCK_TAI			11
+#define CLOCK_THREAD_DVFS_ID		12
 
 #define MAX_CLOCKS			16
 #define CLOCKS_MASK			(CLOCK_REALTIME | CLOCK_MONOTONIC)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4551,6 +4551,7 @@ static void __sched_fork(unsigned long c
 	p->se.exec_start		= 0;
 	p->se.sum_exec_runtime		= 0;
 	p->se.prev_sum_exec_runtime	= 0;
+	p->se.sum_dvfs_runtime		= 0;
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
 	p->se.vlag			= 0;
@@ -5632,6 +5633,45 @@ unsigned long long task_sched_runtime(st
 	task_rq_unlock(rq, p, &rf);
 
 	return ns;
+}
+
+unsigned long long task_sched_dvfs_runtime(struct task_struct *p)
+{
+	struct rq_flags rf;
+	struct rq *rq;
+	u64 ns;
+
+#if defined(CONFIG_64BIT) && defined(CONFIG_SMP)
+	/*
+	 * 64-bit doesn't need locks to atomically read a 64-bit value.
+	 * So we have an optimization chance when the task's delta_exec is 0.
+	 * Reading ->on_cpu is racy, but this is ok.
+	 *
+	 * If we race with it leaving CPU, we'll take a lock. So we're correct.
+	 * If we race with it entering CPU, unaccounted time is 0. This is
+	 * indistinguishable from the read occurring a few cycles earlier.
+	 * If we see ->on_cpu without ->on_rq, the task is leaving, and has
+	 * been accounted, so we're correct here as well.
+	 */
+	if (!p->on_cpu || !task_on_rq_queued(p))
+		return p->se.sum_dvfs_runtime;
+#endif
+
+	rq = task_rq_lock(p, &rf);
+	/*
+	 * Must be ->curr _and_ ->on_rq. If dequeued, we would
+	 * project cycles that may never be accounted to this
+	 * thread, breaking clock_gettime().
+	 */
+	if (task_current(rq, p) && task_on_rq_queued(p)) {
+		prefetch_curr_exec_start(p);
+		update_rq_clock(rq);
+		p->sched_class->update_curr(rq);
+	}
+	ns = p->se.sum_dvfs_runtime;
+	task_rq_unlock(rq, p, &rf);
+
+	return ns;
 }
 
 #ifdef CONFIG_SCHED_DEBUG
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1118,15 +1118,19 @@ static void update_tg_load_avg(struct cf
 static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
 {
 	u64 now = rq_clock_task(rq);
-	s64 delta_exec;
+	s64 delta_exec, delta_dvfs;
 
-	delta_exec = now - curr->exec_start;
+	delta_dvfs = delta_exec = now - curr->exec_start;
 	if (unlikely(delta_exec <= 0))
 		return delta_exec;
 
 	curr->exec_start = now;
 	curr->sum_exec_runtime += delta_exec;
 
+	delta_dvfs = cap_scale(delta_dvfs, arch_scale_freq_capacity(cpu_of(rq)));
+	delta_dvfs = cap_scale(delta_dvfs, arch_scale_cpu_capacity(cpu_of(rq)));
+	curr->sum_dvfs_runtime += delta_dvfs;
+
 	if (schedstat_enabled()) {
 		struct sched_statistics *stats;
 
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -164,7 +164,7 @@ posix_cpu_clock_getres(const clockid_t w
 	if (!error) {
 		tp->tv_sec = 0;
 		tp->tv_nsec = ((NSEC_PER_SEC + HZ - 1) / HZ);
-		if (CPUCLOCK_WHICH(which_clock) == CPUCLOCK_SCHED) {
+		if (CPUCLOCK_WHICH(which_clock) >= CPUCLOCK_SCHED) {
 			/*
 			 * If sched_clock is using a cycle counter, we
 			 * don't have any idea of its true resolution
@@ -198,6 +198,9 @@ static u64 cpu_clock_sample(const clocki
 	if (clkid == CPUCLOCK_SCHED)
 		return task_sched_runtime(p);
 
+	if (clkid == CPUCLOCK_DVFS)
+		return task_sched_dvfs_runtime(p);
+
 	task_cputime(p, &utime, &stime);
 
 	switch (clkid) {
@@ -1628,6 +1631,7 @@ static long posix_cpu_nsleep_restart(str
 
 #define PROCESS_CLOCK	make_process_cpuclock(0, CPUCLOCK_SCHED)
 #define THREAD_CLOCK	make_thread_cpuclock(0, CPUCLOCK_SCHED)
+#define THREAD_DVFS_CLOCK	make_thread_cpuclock(0, CPUCLOCK_DVFS)
 
 static int process_cpu_clock_getres(const clockid_t which_clock,
 				    struct timespec64 *tp)
@@ -1664,6 +1668,11 @@ static int thread_cpu_timer_create(struc
 	timer->it_clock = THREAD_CLOCK;
 	return posix_cpu_timer_create(timer);
 }
+static int thread_dvfs_cpu_clock_get(const clockid_t which_clock,
+				     struct timespec64 *tp)
+{
+	return posix_cpu_clock_get(THREAD_DVFS_CLOCK, tp);
+}
 
 const struct k_clock clock_posix_cpu = {
 	.clock_getres		= posix_cpu_clock_getres,
@@ -1690,3 +1699,8 @@ const struct k_clock clock_thread = {
 	.clock_get_timespec	= thread_cpu_clock_get,
 	.timer_create		= thread_cpu_timer_create,
 };
+
+const struct k_clock clock_thread_dvfs = {
+	.clock_getres		= thread_cpu_clock_getres,
+	.clock_get_timespec	= thread_dvfs_cpu_clock_get,
+};
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -1516,6 +1516,7 @@ static const struct k_clock * const posi
 	[CLOCK_MONOTONIC]		= &clock_monotonic,
 	[CLOCK_PROCESS_CPUTIME_ID]	= &clock_process,
 	[CLOCK_THREAD_CPUTIME_ID]	= &clock_thread,
+	[CLOCK_THREAD_DVFS_ID]		= &clock_thread_dvfs,
 	[CLOCK_MONOTONIC_RAW]		= &clock_monotonic_raw,
 	[CLOCK_REALTIME_COARSE]		= &clock_realtime_coarse,
 	[CLOCK_MONOTONIC_COARSE]	= &clock_monotonic_coarse,
--- a/kernel/time/posix-timers.h
+++ b/kernel/time/posix-timers.h
@@ -34,6 +34,7 @@ extern const struct k_clock clock_posix_
 extern const struct k_clock clock_posix_dynamic;
 extern const struct k_clock clock_process;
 extern const struct k_clock clock_thread;
+extern const struct k_clock clock_thread_dvfs;
 extern const struct k_clock alarm_clock;
 
 int posix_timer_event(struct k_itimer *timr, int si_private);