From nobody Thu Oct 2 21:53:56 2025
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Chengming Zhou, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chuyi Zhou, Jan Kiszka, Florian Bezdeka, Songtang Liu, Chen Yu, Matteo Martelli, Michal Koutný, Sebastian Andrzej Siewior
Subject: [PATCH 1/4] sched/fair: Propagate load for throttled cfs_rq
Date: Wed, 10 Sep 2025 17:50:41 +0800
Message-Id: <20250910095044.278-2-ziqianlu@bytedance.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250910095044.278-1-ziqianlu@bytedance.com>
References: <20250910095044.278-1-ziqianlu@bytedance.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"

Before the task based throttle model, load propagation stopped at a
throttled cfs_rq and was deferred until unthrottle time, when
update_load_avg() performed it.

Now that update_load_avg() is no longer called on unthrottle for a
throttled cfs_rq and all load tracking is done by task related
operations, let the propagation happen immediately.

While at it, add a comment to explain why cfs_rqs that are not affected
by throttle have to be added to the leaf cfs_rq list in
propagate_entity_cfs_rq(), per my understanding of commit 0258bdfaff5b
("sched/fair: Fix unfairness caused by missing load decay").

Signed-off-by: Aaron Lu
Reviewed-by: Ben Segall
Reviewed-by: Chengming Zhou
Reviewed-by: Valentin Schneider
---
 kernel/sched/fair.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df8dc389af8e1..f993de30e1466 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5729,6 +5729,11 @@ static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 	return cfs_bandwidth_used() && cfs_rq->throttled;
 }
 
+static inline bool cfs_rq_pelt_clock_throttled(struct cfs_rq *cfs_rq)
+{
+	return cfs_bandwidth_used() && cfs_rq->pelt_clock_throttled;
+}
+
 /* check whether cfs_rq, or any parent, is throttled */
 static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
 {
@@ -6721,6 +6726,11 @@ static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 	return 0;
 }
 
+static inline bool cfs_rq_pelt_clock_throttled(struct cfs_rq *cfs_rq)
+{
+	return false;
+}
+
 static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
 {
 	return 0;
@@ -13151,10 +13161,13 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
 {
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
-	if (cfs_rq_throttled(cfs_rq))
-		return;
-
-	if (!throttled_hierarchy(cfs_rq))
+	/*
+	 * If a task gets attached to this cfs_rq and before being queued,
+	 * it gets migrated to another CPU due to reasons like affinity
+	 * change, make sure this cfs_rq stays on leaf cfs_rq list to have
+	 * that removed load decayed or it can cause fairness problem.
+	 */
+	if (!cfs_rq_pelt_clock_throttled(cfs_rq))
 		list_add_leaf_cfs_rq(cfs_rq);
 
 	/* Start to propagate at parent */
@@ -13165,10 +13178,7 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
 
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 
-		if (cfs_rq_throttled(cfs_rq))
-			break;
-
-		if (!throttled_hierarchy(cfs_rq))
+		if (!cfs_rq_pelt_clock_throttled(cfs_rq))
 			list_add_leaf_cfs_rq(cfs_rq);
 	}
 }
-- 
2.39.5

From nobody Thu Oct 2 21:53:56 2025
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Chengming Zhou, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chuyi Zhou, Jan Kiszka, Florian Bezdeka, Songtang Liu, Chen Yu, Matteo Martelli, Michal Koutný, Sebastian Andrzej Siewior
Subject: [PATCH 2/4] sched/fair: update_cfs_group() for throttled cfs_rqs
Date: Wed, 10 Sep 2025 17:50:42 +0800
Message-Id: <20250910095044.278-3-ziqianlu@bytedance.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250910095044.278-1-ziqianlu@bytedance.com>
References: <20250910095044.278-1-ziqianlu@bytedance.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"

With the task based throttle model, tasks in a throttled hierarchy are
allowed to continue to run if they are running in kernel mode.
For this reason, the PELT clock is not stopped for cfs_rqs in a
throttled hierarchy while they still have tasks running or queued.

Since the PELT clock is not stopped, the question is whether
update_cfs_group() should keep doing its job for cfs_rqs that are in a
throttled hierarchy but still have tasks running or queued.

The upside is that continuing to run update_cfs_group() gives these
cfs_rq entities an up-to-date weight, which helps derive an accurate
load for the CPU and ensures fairness when tasks of different cgroups
are running on the same CPU. On the other hand, as Benjamin Segall
pointed out: when unthrottle comes around the most likely correct
distribution is the distribution we had at the time of throttle.

In reality, either way may not matter that much if tasks in a throttled
hierarchy don't run in kernel mode for too long. But in case that
happens, letting these cfs_rq entities have an up-to-date weight seems a
good thing to do.

Signed-off-by: Aaron Lu
Reviewed-by: Ben Segall
Reviewed-by: Valentin Schneider
---
 kernel/sched/fair.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f993de30e1466..58f5349d37256 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3957,9 +3957,6 @@ static void update_cfs_group(struct sched_entity *se)
 	if (!gcfs_rq || !gcfs_rq->load.weight)
 		return;
 
-	if (throttled_hierarchy(gcfs_rq))
-		return;
-
 	shares = calc_group_shares(gcfs_rq);
 	if (unlikely(se->load.weight != shares))
 		reweight_entity(cfs_rq_of(se), se, shares);
-- 
2.39.5

From nobody Thu Oct 2 21:53:56 2025
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Chengming Zhou, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chuyi Zhou, Jan Kiszka, Florian Bezdeka, Songtang Liu, Chen Yu, Matteo Martelli, Michal Koutný, Sebastian Andrzej Siewior
Subject: [PATCH 3/4] sched/fair: Do not special case tasks in throttled hierarchy
Date: Wed, 10 Sep 2025 17:50:43 +0800
Message-Id: <20250910095044.278-4-ziqianlu@bytedance.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250910095044.278-1-ziqianlu@bytedance.com>
References: <20250910095044.278-1-ziqianlu@bytedance.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"

With the introduction of the task based throttle model, a task in a
throttled hierarchy is allowed to continue to run until it gets
throttled on its ret2user path.

For this reason, remove the throttled_hierarchy() checks in the
following functions so that those tasks can get their turn as normal
tasks: dequeue_entities(), check_preempt_wakeup_fair() and
yield_to_task_fair().

The benefit of doing it this way is: if those tasks get the chance to
run earlier and they hold any kernel resources, they can release those
resources earlier. The downside is: if they don't hold any kernel
resources, all they can do is throttle themselves on their way back to
user space, so the favor of letting them run is not that useful, and for
check_preempt_wakeup_fair() that favor may be bad for curr.

K Prateek Nayak pointed out that prio_changed_fair() can send a
throttled task to check_preempt_wakeup_fair(). Further tests showed that
the affinity change path from move_queued_task() can also send a
throttled task to check_preempt_wakeup_fair(). That is why that function
checks task_is_throttled().

Signed-off-by: Aaron Lu
Reviewed-by: Ben Segall
Reviewed-by: Valentin Schneider
---
 kernel/sched/fair.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 58f5349d37256..3dbdfaa697477 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7081,7 +7081,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 			 * Bias pick_next to pick a task from this cfs_rq, as
 			 * p is sleeping when it is within its sched_slice.
 			 */
-			if (task_sleep && se && !throttled_hierarchy(cfs_rq))
+			if (task_sleep && se)
 				set_next_buddy(se);
 			break;
 		}
@@ -8735,7 +8735,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	 * lead to a throttle). This both saves work and prevents false
 	 * next-buddy nomination below.
 	 */
-	if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
+	if (task_is_throttled(p))
 		return;
 
 	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && !pse->sched_delayed) {
@@ -9009,8 +9009,8 @@ static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
 {
 	struct sched_entity *se = &p->se;
 
-	/* throttled hierarchies are not runnable */
-	if (!se->on_rq || throttled_hierarchy(cfs_rq_of(se)))
+	/* !se->on_rq also covers throttled task */
+	if (!se->on_rq)
 		return false;
 
 	/* Tell the scheduler that we'd really like se to run next. */
-- 
2.39.5

From nobody Thu Oct 2 21:53:56 2025
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Chengming Zhou, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chuyi Zhou, Jan Kiszka, Florian Bezdeka, Songtang Liu, Chen Yu, Matteo Martelli, Michal Koutný, Sebastian Andrzej Siewior
Subject: [PATCH 4/4] sched/fair: Do not balance task to a throttled cfs_rq
Date: Wed, 10 Sep 2025 17:50:44 +0800
Message-Id: <20250910095044.278-5-ziqianlu@bytedance.com>
X-Mailer: git-send-email 2.39.5
In-Reply-To: <20250910095044.278-1-ziqianlu@bytedance.com>
References: <20250910095044.278-1-ziqianlu@bytedance.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"

When doing load balance, whether to allow balancing a task to a target
cfs_rq that is in a throttled hierarchy is a question.

The upside of allowing it is: if the target CPU is idle or less loaded
and the task being balanced is holding some kernel resources, then it
seems a good idea to balance the task there so that it gets the CPU
earlier and releases kernel resources sooner. The downside is: if the
task is not holding any kernel resources, then the balance is not that
useful.

While it is debatable in theory, a performance test[0] involving 200
cgroups, each running hackbench (20 senders, 20 receivers) in pipe mode,
showed a performance degradation on AMD Genoa when load balancing to a
throttled cfs_rq was allowed. Analysis[1] showed that hackbench doesn't
like task migration across the LLC boundary. For this reason, add a
check in can_migrate_task() to forbid balancing to a cfs_rq that is in a
throttled hierarchy. This reduced task migration a lot and restored
performance.

[0]: https://lore.kernel.org/lkml/20250822110701.GB289@bytedance/
[1]: https://lore.kernel.org/lkml/20250903101102.GB42@bytedance/

Signed-off-by: Aaron Lu
Reviewed-by: Ben Segall
Reviewed-by: Valentin Schneider
---
 kernel/sched/fair.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3dbdfaa697477..00ee59993b6a3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9369,14 +9369,19 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	/*
 	 * We do not migrate tasks that are:
 	 * 1) delayed dequeued unless we migrate load, or
-	 * 2) cannot be migrated to this CPU due to cpus_ptr, or
-	 * 3) running (obviously), or
-	 * 4) are cache-hot on their current CPU, or
-	 * 5) are blocked on mutexes (if SCHED_PROXY_EXEC is enabled)
+	 * 2) target cfs_rq is in throttled hierarchy, or
+	 * 3) cannot be migrated to this CPU due to cpus_ptr, or
+	 * 4) running (obviously), or
+	 * 5) are cache-hot on their current CPU, or
+	 * 6) are blocked on mutexes (if SCHED_PROXY_EXEC is enabled)
 	 */
 	if ((p->se.sched_delayed) && (env->migration_type != migrate_load))
 		return 0;
 
+	if (task_group(p) &&
+	    throttled_hierarchy(task_group(p)->cfs_rq[env->dst_cpu]))
+		return 0;
+
 	/*
 	 * We want to prioritize the migration of eligible tasks.
 	 * For ineligible tasks we soft-limit them and only allow
-- 
2.39.5