From nobody Wed Dec 17 14:01:16 2025
From: Aaron Lu
Date: Thu, 13 Mar 2025 02:21:10 -0500
Subject: [RFC PATCH 1/7] sched/fair: Add related data structure for task based throttle
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou
References: <20250313072030.1032893-1-ziqianlu@bytedance.com>
In-Reply-To: <20250313072030.1032893-1-ziqianlu@bytedance.com>

From: Valentin Schneider

Add related data structures for this new throttle functionality.

[aaronlu: extracted from Valentin's original patches]
Signed-off-by: Valentin Schneider
Signed-off-by: Aaron Lu
---
 include/linux/sched.h |  4 ++++
 kernel/sched/core.c   |  3 +++
 kernel/sched/fair.c   | 12 ++++++++++++
 kernel/sched/sched.h  |  2 ++
 4 files changed, 21 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9632e3318e0d6..eec9087232660 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -858,6 +858,10 @@ struct task_struct {
 #ifdef CONFIG_CGROUP_SCHED
 	struct task_group		*sched_task_group;
+#ifdef CONFIG_CFS_BANDWIDTH
+	struct callback_head		sched_throttle_work;
+	struct list_head		throttle_node;
+#endif
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 621cfc731c5be..56e2ea14ac3b4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4493,6 +4493,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq = NULL;
+#ifdef CONFIG_CFS_BANDWIDTH
+	init_cfs_throttle_work(p);
+#endif
 #endif
 
 #ifdef CONFIG_SCHEDSTATS
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9dafb374d76d9..60eb5329bf526 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5832,6 +5832,18 @@ static inline int throttled_lb_pair(struct task_group *tg,
 	       throttled_hierarchy(dest_cfs_rq);
 }
 
+static void throttle_cfs_rq_work(struct callback_head *work)
+{
+}
+
+void init_cfs_throttle_work(struct task_struct *p)
+{
+	init_task_work(&p->sched_throttle_work, throttle_cfs_rq_work);
+	/* Protect against double add, see throttle_cfs_rq() and throttle_cfs_rq_work() */
+	p->sched_throttle_work.next = &p->sched_throttle_work;
+	INIT_LIST_HEAD(&p->throttle_node);
+}
+
 static int tg_unthrottle_up(struct task_group *tg, void *data)
 {
 	struct rq *rq = data;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 023b844159c94..c8bfa3d708081 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2724,6 +2724,8 @@ extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);
 
 extern void init_dl_entity(struct sched_dl_entity *dl_se);
 
+extern void init_cfs_throttle_work(struct task_struct *p);
+
 #define BW_SHIFT		20
 #define BW_UNIT			(1 << BW_SHIFT)
 #define RATIO_SHIFT		8
-- 
2.39.5
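A side note on the p->sched_throttle_work.next = &p->sched_throttle_work assignment in this patch: a callback_head whose next pointer points back at itself is how the later patches distinguish "throttle work not queued" from "already queued" without an extra flag. Below is a minimal user-space sketch of that sentinel idea; all names are invented for illustration and this is not kernel code.

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

struct task {
	struct callback_head throttle_work;
};

static void throttle_fn(struct callback_head *head) { (void)head; }

/* "not queued" is encoded as ->next pointing back at the work itself */
static void throttle_work_init(struct task *t)
{
	t->throttle_work.func = throttle_fn;
	t->throttle_work.next = &t->throttle_work;
}

static bool throttle_work_queued(const struct task *t)
{
	return t->throttle_work.next != &t->throttle_work;
}

int main(void)
{
	struct task t;

	throttle_work_init(&t);
	assert(!throttle_work_queued(&t));

	/* queueing overwrites ->next, e.g. links it into a pending list */
	t.throttle_work.next = NULL;
	assert(throttle_work_queued(&t));
	return 0;
}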
From nobody Wed Dec 17 14:01:16 2025
From: Aaron Lu
Date: Thu, 13 Mar 2025 07:21:21 +0000
Subject: [RFC PATCH 2/7] sched/fair: Handle throttle path for task based throttle
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou
References: <20250313072030.1032893-1-ziqianlu@bytedance.com>
In-Reply-To: <20250313072030.1032893-1-ziqianlu@bytedance.com>

From: Valentin Schneider

Once a cfs_rq gets throttled, add task work to every task belonging to this
cfs_rq so that the actual throttle/dequeue happens when those tasks return
to user space.

Since the throttle/dequeue now always happens per task on its return to
user space, there is no longer any need for check_cfs_rq_runtime() to
return a value and for pick_task_fair() to act on that return value, so
check_cfs_rq_runtime() is changed to return void.

[aaronlu: extracted from Valentin's original patches. Fixed a problem where
curr is not in the timeline tree and has to be dealt with explicitly; made
check_cfs_rq_runtime() void.]
Signed-off-by: Valentin Schneider
Signed-off-by: Aaron Lu
---
 kernel/sched/fair.c  | 201 ++++++++++++++++++++++++------------------
 kernel/sched/sched.h |   1 +
 2 files changed, 112 insertions(+), 90 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 60eb5329bf526..ab403ff7d53c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5607,7 +5607,7 @@ pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
 	return se;
 }
 
-static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq);
+static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq);
 
 static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 {
@@ -5832,8 +5832,49 @@ static inline int throttled_lb_pair(struct task_group *tg,
 	       throttled_hierarchy(dest_cfs_rq);
 }
 
+static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static void throttle_cfs_rq_work(struct callback_head *work)
 {
+	struct task_struct *p = container_of(work, struct task_struct, sched_throttle_work);
+	struct sched_entity *se;
+	struct cfs_rq *cfs_rq;
+	struct rq *rq;
+	struct rq_flags rf;
+
+	WARN_ON_ONCE(p != current);
+	p->sched_throttle_work.next = &p->sched_throttle_work;
+
+	/*
+	 * If task is exiting, then there won't be a return to userspace, so we
+	 * don't have to bother with any of this.
+	 */
+	if ((p->flags & PF_EXITING))
+		return;
+
+	rq = task_rq_lock(p, &rf);
+
+	se = &p->se;
+	cfs_rq = cfs_rq_of(se);
+
+	/* Raced, forget */
+	if (p->sched_class != &fair_sched_class)
+		goto out_unlock;
+
+	/*
+	 * If not in limbo, then either replenish has happened or this task got
+	 * migrated out of the throttled cfs_rq, move along
+	 */
+	if (!cfs_rq->throttle_count)
+		goto out_unlock;
+
+	update_rq_clock(rq);
+	WARN_ON_ONCE(!list_empty(&p->throttle_node));
+	list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
+	dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
+	resched_curr(rq);
+
+out_unlock:
+	task_rq_unlock(rq, p, &rf);
 }
 
 void init_cfs_throttle_work(struct task_struct *p)
@@ -5873,32 +5914,81 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 	return 0;
 }
 
+static inline bool task_has_throttle_work(struct task_struct *p)
+{
+	return p->sched_throttle_work.next != &p->sched_throttle_work;
+}
+
+static inline void task_throttle_setup_work(struct task_struct *p)
+{
+	/*
+	 * Kthreads and exiting tasks don't return to userspace, so adding the
+	 * work is pointless
+	 */
+	if ((p->flags & (PF_EXITING | PF_KTHREAD)))
+		return;
+
+	if (task_has_throttle_work(p))
+		return;
+
+	task_work_add(p, &p->sched_throttle_work, TWA_RESUME);
+}
+
 static int tg_throttle_down(struct task_group *tg, void *data)
 {
 	struct rq *rq = data;
 	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
+	struct task_struct *p;
+	struct rb_node *node;
+
+	cfs_rq->throttle_count++;
+	if (cfs_rq->throttle_count > 1)
+		return 0;
 
 	/* group is entering throttled state, stop time */
-	if (!cfs_rq->throttle_count) {
-		cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
-		list_del_leaf_cfs_rq(cfs_rq);
+	cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
+	list_del_leaf_cfs_rq(cfs_rq);
 
-		SCHED_WARN_ON(cfs_rq->throttled_clock_self);
-		if (cfs_rq->nr_queued)
-			cfs_rq->throttled_clock_self = rq_clock(rq);
+	SCHED_WARN_ON(cfs_rq->throttled_clock_self);
+	if (cfs_rq->nr_queued)
+		cfs_rq->throttled_clock_self = rq_clock(rq);
+
+	WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
+	/*
+	 * rq_lock is held, current is (obviously) executing this in kernelspace.
+	 *
+	 * All other tasks enqueued on this rq have their saved PC at the
+	 * context switch, so they will go through the kernel before returning
+	 * to userspace. Thus, there are no tasks-in-userspace to handle, just
+	 * install the task_work on all of them.
+	 */
+	node = rb_first(&cfs_rq->tasks_timeline.rb_root);
+	while (node) {
+		struct sched_entity *se = __node_2_se(node);
+
+		if (!entity_is_task(se))
+			goto next;
+
+		p = task_of(se);
+		task_throttle_setup_work(p);
+next:
+		node = rb_next(node);
+	}
+
+	/* curr is not in the timeline tree */
+	if (cfs_rq->curr && entity_is_task(cfs_rq->curr)) {
+		p = task_of(cfs_rq->curr);
+		task_throttle_setup_work(p);
 	}
-	cfs_rq->throttle_count++;
 
 	return 0;
 }
 
-static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
+static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	struct sched_entity *se;
-	long queued_delta, runnable_delta, idle_delta, dequeue = 1;
-	long rq_h_nr_queued = rq->cfs.h_nr_queued;
+	int dequeue = 1;
 
 	raw_spin_lock(&cfs_b->lock);
 	/* This will start the period timer if necessary */
@@ -5919,74 +6009,13 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	raw_spin_unlock(&cfs_b->lock);
 
 	if (!dequeue)
-		return false;  /* Throttle no longer required. */
-
-	se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
+		return;  /* Throttle no longer required. */
 
 	/* freeze hierarchy runnable averages while throttled */
 	rcu_read_lock();
 	walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
 	rcu_read_unlock();
 
-	queued_delta = cfs_rq->h_nr_queued;
-	runnable_delta = cfs_rq->h_nr_runnable;
-	idle_delta = cfs_rq->h_nr_idle;
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-		int flags;
-
-		/* throttled entity or throttle-on-deactivate */
-		if (!se->on_rq)
-			goto done;
-
-		/*
-		 * Abuse SPECIAL to avoid delayed dequeue in this instance.
-		 * This avoids teaching dequeue_entities() about throttled
-		 * entities and keeps things relatively simple.
-		 */
-		flags = DEQUEUE_SLEEP | DEQUEUE_SPECIAL;
-		if (se->sched_delayed)
-			flags |= DEQUEUE_DELAYED;
-		dequeue_entity(qcfs_rq, se, flags);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued -= queued_delta;
-		qcfs_rq->h_nr_runnable -= runnable_delta;
-		qcfs_rq->h_nr_idle -= idle_delta;
-
-		if (qcfs_rq->load.weight) {
-			/* Avoid re-evaluating load for this entity: */
-			se = parent_entity(se);
-			break;
-		}
-	}
-
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-		/* throttled entity or throttle-on-deactivate */
-		if (!se->on_rq)
-			goto done;
-
-		update_load_avg(qcfs_rq, se, 0);
-		se_update_runnable(se);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued -= queued_delta;
-		qcfs_rq->h_nr_runnable -= runnable_delta;
-		qcfs_rq->h_nr_idle -= idle_delta;
-	}
-
-	/* At this point se is NULL and we are at root level*/
-	sub_nr_running(rq, queued_delta);
-
-	/* Stop the fair server if throttling resulted in no runnable tasks */
-	if (rq_h_nr_queued && !rq->cfs.h_nr_queued)
-		dl_server_stop(&rq->fair_server);
-done:
 	/*
 	 * Note: distribution will already see us throttled via the
 	 * throttled-list.  rq->lock protects completion.
@@ -5995,7 +6024,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	SCHED_WARN_ON(cfs_rq->throttled_clock);
 	if (cfs_rq->nr_queued)
 		cfs_rq->throttled_clock = rq_clock(rq);
-	return true;
+	return;
 }
 
 void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
@@ -6471,22 +6500,22 @@ static void sync_throttle(struct task_group *tg, int cpu)
 }
 
 /* conditionally throttle active cfs_rq's from put_prev_entity() */
-static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq)
+static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 {
 	if (!cfs_bandwidth_used())
-		return false;
+		return;
 
 	if (likely(!cfs_rq->runtime_enabled || cfs_rq->runtime_remaining > 0))
-		return false;
+		return;
 
 	/*
	 * it's possible for a throttled entity to be forced into a running
	 * state (e.g. set_curr_task), in this case we're finished.
	 */
 	if (cfs_rq_throttled(cfs_rq))
-		return true;
+		return;
 
-	return throttle_cfs_rq(cfs_rq);
+	throttle_cfs_rq(cfs_rq);
 }
 
 static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
@@ -6582,6 +6611,7 @@ static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 	cfs_rq->runtime_enabled = 0;
 	INIT_LIST_HEAD(&cfs_rq->throttled_list);
 	INIT_LIST_HEAD(&cfs_rq->throttled_csd_list);
+	INIT_LIST_HEAD(&cfs_rq->throttled_limbo_list);
 }
 
 void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
@@ -7117,10 +7147,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = h_nr_queued;
 
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			return 0;
-
 		/* Don't dequeue parent if it has other entities besides us */
 		if (cfs_rq->load.weight) {
 			slice = cfs_rq_min_slice(cfs_rq);
@@ -7157,10 +7183,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = h_nr_queued;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			return 0;
 	}
 
 	sub_nr_running(rq, h_nr_queued);
@@ -8869,8 +8891,7 @@ static struct task_struct *pick_task_fair(struct rq *rq)
 		if (cfs_rq->curr && cfs_rq->curr->on_rq)
 			update_curr(cfs_rq);
 
-		if (unlikely(check_cfs_rq_runtime(cfs_rq)))
-			goto again;
+		check_cfs_rq_runtime(cfs_rq);
 
 		se = pick_next_entity(rq, cfs_rq);
 		if (!se)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c8bfa3d708081..5c2af5a70163c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -742,6 +742,7 @@ struct cfs_rq {
 	int			throttle_count;
 	struct list_head	throttled_list;
 	struct list_head	throttled_csd_list;
+	struct list_head	throttled_limbo_list;
 #endif /* CONFIG_CFS_BANDWIDTH */
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 };
-- 
2.39.5
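As a reading aid for the ordering in this patch: throttling no longer dequeues anything by itself, it only queues per-task work, and each task removes itself from the runqueue the next time it is about to return to user space. Below is a rough user-space model of that deferral; every name is made up for illustration and the authoritative code is the diff above.

#include <stdbool.h>
#include <stdio.h>

struct task {
	int id;
	bool queued;        /* still enqueued on the runqueue     */
	bool work_pending;  /* throttle task_work has been queued */
	bool in_limbo;      /* parked on the throttled_limbo_list */
};

/* throttle path: only mark the tasks, nothing gets dequeued here */
static void throttle_cfs_rq_model(struct task *tasks, int nr)
{
	for (int i = 0; i < nr; i++)
		if (tasks[i].queued)
			tasks[i].work_pending = true;
}

/* runs when a marked task is about to return to user space */
static void throttle_work_model(struct task *t)
{
	t->work_pending = false;
	t->queued = false;
	t->in_limbo = true;
}

int main(void)
{
	struct task tasks[3] = { { .id = 0, .queued = true },
				 { .id = 1, .queued = true },
				 { .id = 2, .queued = true } };

	throttle_cfs_rq_model(tasks, 3);

	/* each task keeps running in kernel space until this point */
	for (int i = 0; i < 3; i++)
		if (tasks[i].work_pending)
			throttle_work_model(&tasks[i]);

	for (int i = 0; i < 3; i++)
		printf("task %d: queued=%d limbo=%d\n",
		       tasks[i].id, tasks[i].queued, tasks[i].in_limbo);
	return 0;
}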
From nobody Wed Dec 17 14:01:16 2025
From: Aaron Lu
Date: Thu, 13 Mar 2025 00:21:32 -0700
Subject: [RFC PATCH 3/7] sched/fair: Handle unthrottle path for task based throttle
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou
References: <20250313072030.1032893-1-ziqianlu@bytedance.com>
In-Reply-To: <20250313072030.1032893-1-ziqianlu@bytedance.com>

From: Valentin Schneider

On unthrottle, enqueue throttled tasks back so they can continue to run.

Note that with task based throttling, the only place a task gets throttled
is on its return to user space, so as long as a task is enqueued, it is
allowed to run until it reaches that throttle point, whether its cfs_rq is
throttled or not.

The leaf_cfs_rq list is handled differently now: as long as a task is
enqueued on a cfs_rq, throttled or not, that cfs_rq is added to the list;
when a cfs_rq is throttled and all its tasks have been dequeued, it is
removed from the list. I chose this because it is easy to reason about.

[aaronlu: extracted from Valentin's original patches. I also changed the
implementation to use enqueue_task_fair() for queuing tasks back to the
unthrottled cfs_rq]
Signed-off-by: Valentin Schneider
Signed-off-by: Aaron Lu
---
 kernel/sched/fair.c | 132 +++++++++++++++-----------------------
 1 file changed, 45 insertions(+), 87 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ab403ff7d53c8..4a95fe3785e43 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5366,18 +5366,17 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if (cfs_rq->nr_queued == 1) {
 		check_enqueue_throttle(cfs_rq);
-		if (!throttled_hierarchy(cfs_rq)) {
-			list_add_leaf_cfs_rq(cfs_rq);
-		} else {
+		list_add_leaf_cfs_rq(cfs_rq);
 #ifdef CONFIG_CFS_BANDWIDTH
+		if (throttled_hierarchy(cfs_rq)) {
 			struct rq *rq = rq_of(cfs_rq);
 
 			if (cfs_rq_throttled(cfs_rq) && !cfs_rq->throttled_clock)
 				cfs_rq->throttled_clock = rq_clock(rq);
 			if (!cfs_rq->throttled_clock_self)
 				cfs_rq->throttled_clock_self = rq_clock(rq);
-#endif
 		}
+#endif
 	}
 }
 
@@ -5525,8 +5524,11 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if (flags & DEQUEUE_DELAYED)
 		finish_delayed_dequeue_entity(se);
 
-	if (cfs_rq->nr_queued == 0)
+	if (cfs_rq->nr_queued == 0) {
 		update_idle_cfs_rq_clock_pelt(cfs_rq);
+		if (throttled_hierarchy(cfs_rq))
+			list_del_leaf_cfs_rq(cfs_rq);
+	}
 
 	return true;
 }
@@ -5832,6 +5834,11 @@ static inline int throttled_lb_pair(struct task_group *tg,
 	       throttled_hierarchy(dest_cfs_rq);
 }
 
+static inline bool task_is_throttled(struct task_struct *p)
+{
+	return !list_empty(&p->throttle_node);
+}
+
 static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static void throttle_cfs_rq_work(struct callback_head *work)
 {
@@ -5885,32 +5892,45 @@ void init_cfs_throttle_work(struct task_struct *p)
 	INIT_LIST_HEAD(&p->throttle_node);
 }
 
+static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static int tg_unthrottle_up(struct task_group *tg, void *data)
 {
 	struct rq *rq = data;
 	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
+	struct task_struct *p, *tmp;
 
 	cfs_rq->throttle_count--;
-	if (!cfs_rq->throttle_count) {
-		cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
-					     cfs_rq->throttled_clock_pelt;
+	if (cfs_rq->throttle_count)
+		return 0;
 
-		/* Add cfs_rq with load or one or more already running entities to the list */
-		if (!cfs_rq_is_decayed(cfs_rq))
-			list_add_leaf_cfs_rq(cfs_rq);
+	cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
+				     cfs_rq->throttled_clock_pelt;
 
-		if (cfs_rq->throttled_clock_self) {
-			u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
+	if (cfs_rq->throttled_clock_self) {
+		u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
 
-			cfs_rq->throttled_clock_self = 0;
+		cfs_rq->throttled_clock_self = 0;
 
-			if (SCHED_WARN_ON((s64)delta < 0))
-				delta = 0;
+		if (SCHED_WARN_ON((s64)delta < 0))
+			delta = 0;
 
-			cfs_rq->throttled_clock_self_time += delta;
-		}
+		cfs_rq->throttled_clock_self_time += delta;
 	}
 
+	/* Re-enqueue the tasks that have been throttled at this level. */
+	list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
+		list_del_init(&p->throttle_node);
+		/*
+		 * FIXME: p may not be allowed to run on this rq anymore
+		 * due to affinity change while p is throttled.
+		 */
+		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
+	}
+
+	/* Add cfs_rq with load or one or more already running entities to the list */
+	if (!cfs_rq_is_decayed(cfs_rq))
+		list_add_leaf_cfs_rq(cfs_rq);
+
 	return 0;
 }
 
@@ -5947,12 +5967,16 @@ static int tg_throttle_down(struct task_group *tg, void *data)
 
 	/* group is entering throttled state, stop time */
 	cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
-	list_del_leaf_cfs_rq(cfs_rq);
 
 	SCHED_WARN_ON(cfs_rq->throttled_clock_self);
 	if (cfs_rq->nr_queued)
 		cfs_rq->throttled_clock_self = rq_clock(rq);
 
+	if (!cfs_rq->nr_queued) {
+		list_del_leaf_cfs_rq(cfs_rq);
+		return 0;
+	}
+
 	WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
 	/*
 	 * rq_lock is held, current is (obviously) executing this in kernelspace.
@@ -6031,11 +6055,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	struct sched_entity *se;
-	long queued_delta, runnable_delta, idle_delta;
-	long rq_h_nr_queued = rq->cfs.h_nr_queued;
-
-	se = cfs_rq->tg->se[cpu_of(rq)];
+	struct sched_entity *se = cfs_rq->tg->se[cpu_of(rq)];
 
 	cfs_rq->throttled = 0;
 
@@ -6063,62 +6083,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 			if (list_add_leaf_cfs_rq(cfs_rq_of(se)))
 				break;
 		}
-		goto unthrottle_throttle;
 	}
 
-	queued_delta = cfs_rq->h_nr_queued;
-	runnable_delta = cfs_rq->h_nr_runnable;
-	idle_delta = cfs_rq->h_nr_idle;
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-
-		/* Handle any unfinished DELAY_DEQUEUE business first. */
-		if (se->sched_delayed) {
-			int flags = DEQUEUE_SLEEP | DEQUEUE_DELAYED;
-
-			dequeue_entity(qcfs_rq, se, flags);
-		} else if (se->on_rq)
-			break;
-		enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued += queued_delta;
-		qcfs_rq->h_nr_runnable += runnable_delta;
-		qcfs_rq->h_nr_idle += idle_delta;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(qcfs_rq))
-			goto unthrottle_throttle;
-	}
-
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-
-		update_load_avg(qcfs_rq, se, UPDATE_TG);
-		se_update_runnable(se);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued += queued_delta;
-		qcfs_rq->h_nr_runnable += runnable_delta;
-		qcfs_rq->h_nr_idle += idle_delta;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(qcfs_rq))
-			goto unthrottle_throttle;
-	}
-
-	/* Start the fair server if un-throttling resulted in new runnable tasks */
-	if (!rq_h_nr_queued && rq->cfs.h_nr_queued)
-		dl_server_start(&rq->fair_server);
-
-	/* At this point se is NULL and we are at root level*/
-	add_nr_running(rq, queued_delta);
-
-unthrottle_throttle:
 	assert_list_leaf_cfs_rq(rq);
 
 	/* Determine whether we need to wake up potentially idle CPU: */
@@ -6989,6 +6955,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	util_est_enqueue(&rq->cfs, p);
 
 	if (flags & ENQUEUE_DELAYED) {
+		SCHED_WARN_ON(task_is_throttled(p));
 		requeue_delayed_entity(se);
 		return;
 	}
@@ -7031,10 +6998,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = 1;
 
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			goto enqueue_throttle;
-
 		flags = ENQUEUE_WAKEUP;
 	}
 
@@ -7056,10 +7019,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = 1;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			goto enqueue_throttle;
 	}
 
 	if (!rq_h_nr_queued && rq->cfs.h_nr_queued) {
@@ -7089,7 +7048,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	if (!task_new)
 		check_update_overutilized_status(rq);
 
-enqueue_throttle:
 	assert_list_leaf_cfs_rq(rq);
 
 	hrtick_update(rq);
-- 
2.39.5
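The core of the unthrottle path above is draining throttled_limbo_list: each parked task is unlinked first (list_del_init()) and then fed back through the normal enqueue path. Here is a small user-space model of that drain loop, with invented names and a plain singly linked list standing in for the kernel list API.

#include <stdio.h>

struct task {
	int id;
	struct task *next;  /* link on the limbo list */
};

/* unthrottle: pop every task off the limbo list and enqueue it again */
static void unthrottle_limbo_model(struct task **limbo)
{
	while (*limbo) {
		struct task *t = *limbo;

		*limbo = t->next;
		t->next = NULL;                        /* like list_del_init()      */
		printf("re-enqueue task %d\n", t->id); /* enqueue_task_fair() here */
	}
}

int main(void)
{
	struct task c = { .id = 3 }, b = { .id = 2, .next = &c }, a = { .id = 1, .next = &b };
	struct task *limbo = &a;

	unthrottle_limbo_model(&limbo);
	return 0;
}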
From nobody Wed Dec 17 14:01:16 2025
From: Aaron Lu
Date: Thu, 13 Mar 2025 02:21:41 -0500
Subject: [RFC PATCH 4/7] sched/fair: Take care of migrated task for task based throttle
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou
References: <20250313072030.1032893-1-ziqianlu@bytedance.com>
In-Reply-To: <20250313072030.1032893-1-ziqianlu@bytedance.com>

If a task is migrated to a new CPU, it is possible the task is not
throttled but the new cfs_rq is throttled, or vice versa. Take care of
these situations in the enqueue path.

Note that we can't handle this in migrate_task_rq_fair() because the dst
CPU's rq lock is not held there, so checks like whether the new cfs_rq
needs throttling could be racy.

Signed-off-by: Aaron Lu
---
 kernel/sched/fair.c | 17 +++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4a95fe3785e43..9e036f18d73e6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7051,6 +7051,23 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	assert_list_leaf_cfs_rq(rq);
 
 	hrtick_update(rq);
+
+	if (!cfs_bandwidth_used())
+		return;
+
+	/*
+	 * This is for migrate_task_rq_fair(): the new_cpu's rq lock is not held
+	 * in migrate_task_rq_fair() so we have to do these things in enqueue
+	 * time when the dst cpu's rq lock is held. Doing this check in enqueue
+	 * time also takes care of newly woken up tasks, e.g. a task wakes up
+	 * into a throttled cfs_rq.
+	 *
+	 * It's possible the task has a throttle work added but this new cfs_rq
+	 * is not in throttled hierarchy but that's OK, throttle_cfs_rq_work()
+	 * will take care of it.
+	 */
+	if (throttled_hierarchy(cfs_rq_of(&p->se)))
+		task_throttle_setup_work(p);
 }
 
 static void set_next_buddy(struct sched_entity *se);
-- 
2.39.5
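The gist of this patch in isolation: the throttle check for a migrating (or newly woken) task is done at enqueue time, because that is where the destination CPU's rq lock is already held. A tiny user-space sketch of that decision point follows; all names are invented for illustration.

#include <stdbool.h>
#include <stdio.h>

struct task_model   { bool throttle_work_pending; };
struct cfs_rq_model { bool throttled_hierarchy; };

/*
 * Modelled after the tail of enqueue_task_fair() in this patch: the check
 * runs on the destination CPU with its rq lock held, not in the migration
 * path where it would be racy.
 */
static void enqueue_task_model(struct cfs_rq_model *dst, struct task_model *t)
{
	/* ... the actual enqueue would happen here ... */
	if (dst->throttled_hierarchy)
		t->throttle_work_pending = true;  /* task_throttle_setup_work() */
}

int main(void)
{
	struct cfs_rq_model dst = { .throttled_hierarchy = true };
	struct task_model t = { 0 };

	enqueue_task_model(&dst, &t);
	printf("throttle work pending after enqueue: %d\n", t.throttle_work_pending);
	return 0;
}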
From nobody Wed Dec 17 14:01:16 2025
From: Aaron Lu
Date: Thu, 13 Mar 2025 02:21:53 -0500
Subject: [RFC PATCH 5/7] sched/fair: Take care of group/affinity/sched_class change for throttled task
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou
References: <20250313072030.1032893-1-ziqianlu@bytedance.com>
In-Reply-To: <20250313072030.1032893-1-ziqianlu@bytedance.com>

On task group change, core will dequeue a queued task and then requeue it.
A throttled task is still considered queued by core because p->on_rq is
still set, so core will dequeue it too; but since the task was already
dequeued on throttle, handle this case properly in the fair class code.

Affinity and sched class changes are similar.

Signed-off-by: Aaron Lu
---
 kernel/sched/fair.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9e036f18d73e6..f26d53ac143fe 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5876,8 +5876,8 @@ static void throttle_cfs_rq_work(struct callback_head *work)
 
 	update_rq_clock(rq);
 	WARN_ON_ONCE(!list_empty(&p->throttle_node));
-	list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
 	dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
+	list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
 	resched_curr(rq);
 
 out_unlock:
@@ -5920,10 +5920,6 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 	/* Re-enqueue the tasks that have been throttled at this level. */
 	list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
 		list_del_init(&p->throttle_node);
-		/*
-		 * FIXME: p may not be allowed to run on this rq anymore
-		 * due to affinity change while p is throttled.
-		 */
 		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
 	}
 
@@ -7194,6 +7190,16 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
  */
 static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
+	if (task_is_throttled(p)) {
+		/* sched/core wants to dequeue this throttled task. */
+		SCHED_WARN_ON(p->se.on_rq);
+		SCHED_WARN_ON(flags & DEQUEUE_SLEEP);
+
+		list_del_init(&p->throttle_node);
+
+		return true;
+	}
+
 	if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags & DEQUEUE_SAVE))))
 		util_est_dequeue(&rq->cfs, p);
-- 
2.39.5
From nobody Wed Dec 17 14:01:16 2025
From: Aaron Lu
Date: Thu, 13 Mar 2025 00:22:03 -0700
Subject: [RFC PATCH 6/7] sched/fair: fix tasks_rcu with task based throttle
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou
References: <20250313072030.1032893-1-ziqianlu@bytedance.com>
In-Reply-To: <20250313072030.1032893-1-ziqianlu@bytedance.com>

Tasks throttled on the exit-to-user path are scheduled out by
cond_resched() in task_work_run(), but that is a preemption schedule and
does not mark a task RCU quiescent state. Fix this by directly calling
schedule() in throttle_cfs_rq_work().

Signed-off-by: Aaron Lu
---
 kernel/sched/fair.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f26d53ac143fe..be96f7d32998c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5847,6 +5847,7 @@ static void throttle_cfs_rq_work(struct callback_head *work)
 	struct cfs_rq *cfs_rq;
 	struct rq *rq;
 	struct rq_flags rf;
+	bool sched = false;
 
 	WARN_ON_ONCE(p != current);
 	p->sched_throttle_work.next = &p->sched_throttle_work;
@@ -5879,9 +5880,13 @@ static void throttle_cfs_rq_work(struct callback_head *work)
 	dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
 	list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
 	resched_curr(rq);
+	sched = true;
 
 out_unlock:
 	task_rq_unlock(rq, p, &rf);
+
+	if (sched)
+		schedule();
 }
 
 void init_cfs_throttle_work(struct task_struct *p)
-- 
2.39.5
From nobody Wed Dec 17 14:01:16 2025
From: Aaron Lu
Date: Thu, 13 Mar 2025 00:22:13 -0700
Subject: [RFC PATCH 7/7] sched/fair: Make sure cfs_rq has enough runtime_remaining on unthrottle path
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou
References: <20250313072030.1032893-1-ziqianlu@bytedance.com>
In-Reply-To: <20250313072030.1032893-1-ziqianlu@bytedance.com>

It's possible unthrottle_cfs_rq() is called with !runtime_remaining, e.g.
because the user changed the quota setting (see tg_set_cfs_bandwidth()),
or because an async unthrottle left us with positive runtime_remaining but
other still-running entities consumed that runtime before we got here.

Either way, we can't unthrottle this cfs_rq without any runtime remaining,
because a task enqueue during unthrottle could immediately trigger a
throttle through check_enqueue_throttle(), which should never happen on
the unthrottle path.

Signed-off-by: Aaron Lu
---
 kernel/sched/fair.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index be96f7d32998c..d646451d617c1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6058,6 +6058,19 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
 	struct sched_entity *se = cfs_rq->tg->se[cpu_of(rq)];
 
+	/*
+	 * It's possible we are called with !runtime_remaining due to things
+	 * like user changed quota setting(see tg_set_cfs_bandwidth()) or async
+	 * unthrottled us with a positive runtime_remaining but other still
+	 * running entities consumed those runtime before we reach here.
+	 *
+	 * Anyway, we can't unthrottle this cfs_rq without any runtime remaining
+	 * because any enqueue below will immediately trigger a throttle, which
+	 * is not supposed to happen on unthrottle path.
+	 */
+	if (cfs_rq->runtime_enabled && !cfs_rq->runtime_remaining)
+		return;
+
 	cfs_rq->throttled = 0;
 
 	update_rq_clock(rq);
-- 
2.39.5
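To see why the guard added in this last patch matters, here is a user-space model of the check: with bandwidth enabled and no runtime left, unthrottling is refused, because the enqueues it triggers would immediately re-throttle via check_enqueue_throttle(). All names are invented for illustration; the real check is the one in the diff above.

#include <stdbool.h>
#include <stdio.h>

struct cfs_rq_model {
	bool      runtime_enabled;
	long long runtime_remaining;  /* ns left in the current period */
	bool      throttled;
};

/* refuse to unthrottle when an enqueue would immediately re-throttle */
static bool try_unthrottle_model(struct cfs_rq_model *cfs_rq)
{
	if (cfs_rq->runtime_enabled && !cfs_rq->runtime_remaining)
		return false;  /* wait for a later replenish/distribution */

	cfs_rq->throttled = false;
	return true;
}

int main(void)
{
	struct cfs_rq_model rq = { .runtime_enabled = true, .throttled = true };

	printf("no runtime left:  %s\n", try_unthrottle_model(&rq) ? "unthrottled" : "skipped");
	rq.runtime_remaining = 2000000;
	printf("runtime refilled: %s\n", try_unthrottle_model(&rq) ? "unthrottled" : "skipped");
	return 0;
}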