From nobody Sun Jun 28 00:10:22 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC7CAC433EF for ; Thu, 17 Feb 2022 15:44:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242680AbiBQPog (ORCPT ); Thu, 17 Feb 2022 10:44:36 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:46382 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242667AbiBQPoa (ORCPT ); Thu, 17 Feb 2022 10:44:30 -0500 Received: from mail-pg1-x530.google.com (mail-pg1-x530.google.com [IPv6:2607:f8b0:4864:20::530]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8FFCCB23AD for ; Thu, 17 Feb 2022 07:44:15 -0800 (PST) Received: by mail-pg1-x530.google.com with SMTP id 195so5356472pgc.6 for ; Thu, 17 Feb 2022 07:44:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=n+yvIhku4QXYZHCkM9HOGBC9iJm2wXLIAwkbGeymkbE=; b=x2TiCPsdfRAzb4WvqNWpOV2WZEKxBwHKf9FVM14db4rWZ2wR+FLr9xtVNg946N1Npu 2PSrMbpQ8RuGuYWZuMIKFCgFrhIrjmsbwGB3XvQTWO0PTt8VrADoAkXE3qhf99C7IkOP i0fxoySK/P1swY8R0nBZQNFl26Z5g8rIe8bW2NmC4FRwHoN5o+EpHs3who07tvuTEdHP YAF/wKRZMl4nvF28QmKbPN8/u6HjN9PpLtU9vAuMP/A1DU/AUyUSDl8Izf2FJN9AejRF hBk3QTb8nGjPZkRCIf7/G59JvOq2hQj+pzVQ9Dr729maoz6ZYNVUEchmhQwF1thWmCaB TG2Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=n+yvIhku4QXYZHCkM9HOGBC9iJm2wXLIAwkbGeymkbE=; b=4f564W4uAvWrVDU87fSl3Qk4qLWnn6aHprQO3JPVck6k3pzU78uLIbX++He9npQwW1 skum1tKJSKXx6vVD+aJ0mzaNhaXSlqAy+nHrFqSH2JI6gB9Ts3HcQwosaQp9GH+jXVHd 
qLhsix3Fdg0zl4QfThMDvH0qbMaQ+SRVrpGXK4YbRLMxYVt6Zbl+Hg/jBBHzkwOUAAGQ OxeCpB2DfXjq7rwteF2a0mzZm9fjD3zUU0X6KoX4OGkntaLUBZEcKIUOWQxAxaMl9rdZ kegYTkAxP9qhxR+SFdxWR8gr8pIOdaQXNAJHp2O2x2OgoZBPVhMxUl+EacbHINp2UAv+ gXtA==
X-Gm-Message-State: AOAM530Rz0zbvLCEspSnZ2p2WHUCjVmnkzuwD9Xqpq0DNo+/ama6utlE MxcbMjOsngBnsarFFvnM0rIgJw==
X-Google-Smtp-Source: ABdhPJzUTv/+o1tgFeWYGOhx9y50utJ2NQ3B5/sPu3pXedj38tVjwYbgPpNXi3L9lFwzu4XKTOtZqg==
X-Received: by 2002:a63:4f4d:0:b0:372:8724:8397 with SMTP id p13-20020a634f4d000000b0037287248397mr2865541pgl.598.1645112655058; Thu, 17 Feb 2022 07:44:15 -0800 (PST)
Received: from localhost.localdomain ([139.177.225.254]) by smtp.gmail.com with ESMTPSA id s6sm24108pfk.86.2022.02.17.07.44.11 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 17 Feb 2022 07:44:14 -0800 (PST)
From: Abel Wu
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira
Cc: linux-kernel@vger.kernel.org
Subject: [RFC PATCH 1/5] sched/fair: record overloaded cpus
Date: Thu, 17 Feb 2022 23:43:57 +0800
Message-Id: <20220217154403.6497-2-wuyun.abel@bytedance.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220217154403.6497-1-wuyun.abel@bytedance.com>
References: <20220217154403.6497-1-wuyun.abel@bytedance.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

A CFS runqueue is considered overloaded when there is more than one
pullable non-idle task on it (since sched-idle cpus are treated as
idle cpus). Idle tasks, that is, tasks that are either assigned the
SCHED_IDLE policy or placed under idle cgroups, are counted towards
rq->cfs.idle_h_nr_running.

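As a rough illustration only (this is not part of the patch), the
overload test described above can be modelled in userspace C. The names
struct cfs_counts and counts_overloaded() are made up for this sketch;
they merely stand in for the two rq->cfs counters the message mentions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two counters kept per CFS runqueue. */
struct cfs_counts {
	unsigned int h_nr_running;      /* all runnable CFS tasks in the hierarchy */
	unsigned int idle_h_nr_running; /* the subset that is hierarchically idle */
};

/* Overloaded: more than one non-idle task is queued. */
static bool counts_overloaded(const struct cfs_counts *c)
{
	return c->h_nr_running - c->idle_h_nr_running > 1;
}
```

With three runnable tasks of which one is idle, two non-idle tasks
remain and the runqueue counts as overloaded; with two of the three
idle, it does not.
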
Overloaded cfs rqs can cause performance issues for both task types:

  - for latency-critical tasks such as SCHED_NORMAL, the time spent
    waiting in the rq increases, resulting in higher pct99 latency, and

  - batch tasks may not be able to make full use of cpu capacity if a
    sched-idle rq exists, resulting in poorer throughput.

The mask of overloaded cpus is updated in the periodic tick and in the
idle path, on a per-LLC-domain basis. This cpumask will also be used in
SIS as a filter to improve idle cpu searching.

Signed-off-by: Abel Wu
---
 include/linux/sched/topology.h | 10 ++++++++++
 kernel/sched/core.c            |  1 +
 kernel/sched/fair.c            | 43 ++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h           |  6 ++++++
 kernel/sched/topology.c        |  4 +++-
 5 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 56cffe42abbc..03c9c81dc886 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -81,6 +81,16 @@ struct sched_domain_shared {
 	atomic_t	ref;
 	atomic_t	nr_busy_cpus;
 	int		has_idle_cores;
+
+	/*
+	 * The above variables are used in idle path and
+	 * select_task_rq, and the following two are
+	 * mainly updated in tick. They are all hot but
+	 * for different usage, so start a new cacheline
+	 * to avoid false sharing.
+	 */
+	atomic_t	nr_overloaded ____cacheline_aligned;
+	unsigned long	overloaded[];	/* Must be last */
 };
 
 struct sched_domain {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1d863d7f6ad7..a6da2998ec49 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9423,6 +9423,7 @@ void __init sched_init(void)
 		rq->wake_stamp = jiffies;
 		rq->wake_avg_idle = rq->avg_idle;
 		rq->max_idle_balance_cost = sysctl_sched_migration_cost;
+		rq->overloaded = 0;
 
 		INIT_LIST_HEAD(&rq->cfs_tasks);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5c4bfffe8c2c..0a0438c3319b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6968,6 +6968,46 @@ balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
 
 	return newidle_balance(rq, rf) != 0;
 }
+
+static inline int cfs_rq_overloaded(struct rq *rq)
+{
+	return rq->cfs.h_nr_running - rq->cfs.idle_h_nr_running > 1;
+}
+
+/* Must be called with rq locked */
+static void update_overload_status(struct rq *rq)
+{
+	struct sched_domain_shared *sds;
+	int overloaded = cfs_rq_overloaded(rq);
+	int cpu = cpu_of(rq);
+
+	lockdep_assert_rq_held(rq);
+
+	if (rq->overloaded == overloaded)
+		return;
+
+	rcu_read_lock();
+	sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
+	if (unlikely(!sds))
+		goto unlock;
+
+	if (overloaded) {
+		cpumask_set_cpu(cpu, sdo_mask(sds));
+		atomic_inc(&sds->nr_overloaded);
+	} else {
+		cpumask_clear_cpu(cpu, sdo_mask(sds));
+		atomic_dec(&sds->nr_overloaded);
+	}
+
+	rq->overloaded = overloaded;
+unlock:
+	rcu_read_unlock();
+}
+
+#else
+
+static inline void update_overload_status(struct rq *rq) { }
+
 #endif /* CONFIG_SMP */
 
 static unsigned long wakeup_gran(struct sched_entity *se)
@@ -7315,6 +7355,8 @@ done: __maybe_unused;
 	if (new_tasks > 0)
 		goto again;
 
+	update_overload_status(rq);
+
 	/*
 	 * rq is about to be idle, check if we need to update the
 	 * lost_idle_time of clock_pelt
@@ -11131,6 +11173,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 	if (static_branch_unlikely(&sched_numa_balancing))
 		task_tick_numa(rq, curr);
 
+	update_overload_status(rq);
 	update_misfit_status(curr, rq);
 	update_overutilized_status(task_rq(curr));
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9b33ba9c3c42..c81a87082b8b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1012,6 +1012,7 @@ struct rq {
 
 	unsigned char		nohz_idle_balance;
 	unsigned char		idle_balance;
+	unsigned char		overloaded;
 
 	unsigned long		misfit_task_load;
 
@@ -1762,6 +1763,11 @@ static inline struct sched_domain *lowest_flag_domain(int cpu, int flag)
 	return sd;
 }
 
+static inline struct cpumask *sdo_mask(struct sched_domain_shared *sds)
+{
+	return to_cpumask(sds->overloaded);
+}
+
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DECLARE_PER_CPU(int, sd_llc_size);
 DECLARE_PER_CPU(int, sd_llc_id);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index e6cd55951304..641f11415819 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1623,6 +1623,8 @@ sd_init(struct sched_domain_topology_level *tl,
 		sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
 		atomic_inc(&sd->shared->ref);
 		atomic_set(&sd->shared->nr_busy_cpus, sd_weight);
+		atomic_set(&sd->shared->nr_overloaded, 0);
+		cpumask_clear(sdo_mask(sd->shared));
 	}
 
 	sd->private = sdd;
@@ -2050,7 +2052,7 @@ static int __sdt_alloc(const struct cpumask *cpu_map)
 
 			*per_cpu_ptr(sdd->sd, j) = sd;
 
-			sds = kzalloc_node(sizeof(struct sched_domain_shared),
+			sds = kzalloc_node(sizeof(struct sched_domain_shared) + cpumask_size(),
 					GFP_KERNEL, cpu_to_node(j));
 			if (!sds)
 				return -ENOMEM;
-- 
2.11.0

From nobody Sun Jun 28 00:10:22 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id
F3DD0C433F5 for ; Thu, 17 Feb 2022 15:44:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242701AbiBQPor (ORCPT ); Thu, 17 Feb 2022 10:44:47 -0500 Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:46644 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242675AbiBQPoe (ORCPT ); Thu, 17 Feb 2022 10:44:34 -0500 Received: from mail-pl1-x62c.google.com (mail-pl1-x62c.google.com [IPv6:2607:f8b0:4864:20::62c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 13EDDB23AD for ; Thu, 17 Feb 2022 07:44:20 -0800 (PST) Received: by mail-pl1-x62c.google.com with SMTP id z17so4893641plb.9 for ; Thu, 17 Feb 2022 07:44:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=8c8qUSDML4pELaqmow9If9ubJ2YNKV4cb1f0Kg7vIhc=; b=DZJEGwcKPRuKMTlN8wElSnrFIVNjERV1GQdLA1u6bMTbcrxKI/CFuusUoSIFs3Hyyh uUWYI+tHr6HEiklU3LfcjNGoViUmJY/CY4UVxo2RSQdnNVDw4H8KXTnaluw1WlgiWMkS kTYsBxvFFtIPNBqSUdpDSgKC/yP4vtTgZkPghn7sLGAWg8J5QZuVxwbXIOG3w2aXC/wA 74DBk7OxKZiQg+Kk2QTFERT0sy/fdZpSUJ4E8B/gSiTv74q+rjwjOW2zXQqnfBUXiV2N OuYdnHFFZbNoP0BqhQQztpUrHxfDkQICXvcryWOjFX63fHJntzggwTlbQ17bJRoFz98y siKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=8c8qUSDML4pELaqmow9If9ubJ2YNKV4cb1f0Kg7vIhc=; b=MDV3+Uufr451l3xFOJHu5hx06ZpHekemziqw7lI77YRyn6RCsfE5KFQjJPC5ZaqHON KcnC6XwZIs4OqPXYZuFo7OokcSmRGAlMfht0OL+FP2Bf6LLfQ8XwMg3HOKE6eQIp0s5g Xa1ozNX0wfFeB9m6AzRHJQJ8/+rVqRWROE+cDgn1+mqBpEVAIfo7BNezHBGKgmYQkadu ONjQeA2KBBoqA71TiJsefvh210Q0rJhbDlzV3vvib5Y5ThCLqlhy5gRC+TOjk6uOFo2h y97AiQM9q/teO+nJLU9yht+wGSEYr6ZriYd17bfsPKe8jNLXeeIpU6OeWMin5Jorw6I8 3bCQ== X-Gm-Message-State: 
AOAM533UjzKHhOHRMthIuz80Nnfo10Ym5WQoTldDmAP6bv0da41C4xZ7 w5biy9VQP9/VrxA2XPNz2LO1gS+QoR70Pw==
X-Google-Smtp-Source: ABdhPJyJWf0xIPg3G7CLAb8e9ZxCOdulcgJWLUbeDnOUXht7QJ44V3fwvfk0SdNcoBkVWlyX72f16g==
X-Received: by 2002:a17:90b:3755:b0:1b9:bc4a:b9dc with SMTP id ne21-20020a17090b375500b001b9bc4ab9dcmr3576201pjb.159.1645112659553; Thu, 17 Feb 2022 07:44:19 -0800 (PST)
Received: from localhost.localdomain ([139.177.225.254]) by smtp.gmail.com with ESMTPSA id s6sm24108pfk.86.2022.02.17.07.44.15 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 17 Feb 2022 07:44:19 -0800 (PST)
From: Abel Wu
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira
Cc: linux-kernel@vger.kernel.org
Subject: [RFC PATCH 2/5] sched/fair: introduce sched-idle balance
Date: Thu, 17 Feb 2022 23:43:58 +0800
Message-Id: <20220217154403.6497-3-wuyun.abel@bytedance.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220217154403.6497-1-wuyun.abel@bytedance.com>
References: <20220217154403.6497-1-wuyun.abel@bytedance.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

The goal of sched-idle balancing is to let non-idle tasks make full
use of cpu resources. To achieve that, we mainly do two things:

  - pull non-idle tasks from overloaded rqs to sched-idle or idle
    rqs, and

  - prevent pulling the last non-idle task in an rq.

Sched-idle balancing is done during normal load balancing and, if
necessary, when a cpu becomes newly idle. The idle balancing path is
skipped due to its high wakeup latency.

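The two rules above can be sketched as a pair of userspace predicates.
This is only a simplified model, not the kernel code: struct rq_model,
model_need_pull() and model_can_pull() are hypothetical names, and the
real checks live in need_pull_cfs_task() and can_migrate_task() in the
diff that follows.

```c
#include <stdbool.h>

/* Hypothetical userspace model of one runqueue's CFS counters. */
struct rq_model {
	unsigned int h_nr_running;      /* all runnable CFS tasks */
	unsigned int idle_h_nr_running; /* hierarchically idle ones */
};

/* Destination side: pull only when no non-idle task is queued. */
static bool model_need_pull(const struct rq_model *dst)
{
	return dst->h_nr_running == dst->idle_h_nr_running;
}

/* Source side: may this task leave src during sched-idle balance? */
static bool model_can_pull(const struct rq_model *src, bool task_is_idle)
{
	unsigned int non_idle = src->h_nr_running - src->idle_h_nr_running;

	if (task_is_idle)
		return false;	/* only non-idle tasks are pulled */
	return non_idle > 1;	/* never take the last non-idle task */
}
```

A destination with only idle tasks (or none) wants to pull; a source
gives up a non-idle task only while it would still keep at least one.
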
Signed-off-by: Abel Wu
---
 include/linux/sched/idle.h |   1 +
 kernel/sched/fair.c        | 128 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 129 insertions(+)

diff --git a/include/linux/sched/idle.h b/include/linux/sched/idle.h
index d73d314d59c6..50ec5c770f85 100644
--- a/include/linux/sched/idle.h
+++ b/include/linux/sched/idle.h
@@ -8,6 +8,7 @@ enum cpu_idle_type {
 	CPU_IDLE,
 	CPU_NOT_IDLE,
 	CPU_NEWLY_IDLE,
+	CPU_SCHED_IDLE,
 	CPU_MAX_IDLE_TYPES
 };
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0a0438c3319b..070a6fb1d2bf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -456,6 +456,21 @@ static int se_is_idle(struct sched_entity *se)
 	return cfs_rq_is_idle(group_cfs_rq(se));
 }
 
+/* Is task idle from the top hierarchy POV */
+static int task_h_idle(struct task_struct *p)
+{
+	struct sched_entity *se = &p->se;
+
+	if (task_has_idle_policy(p))
+		return 1;
+
+	for_each_sched_entity(se)
+		if (cfs_rq_is_idle(cfs_rq_of(se)))
+			return 1;
+
+	return 0;
+}
+
 #else /* !CONFIG_FAIR_GROUP_SCHED */
 
 #define for_each_sched_entity(se) \
@@ -508,6 +523,11 @@ static int se_is_idle(struct sched_entity *se)
 	return 0;
 }
 
+static inline int task_h_idle(struct task_struct *p)
+{
+	return task_has_idle_policy(p);
+}
+
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 
 static __always_inline
@@ -6974,6 +6994,11 @@ static inline int cfs_rq_overloaded(struct rq *rq)
 	return rq->cfs.h_nr_running - rq->cfs.idle_h_nr_running > 1;
 }
 
+static inline bool need_pull_cfs_task(struct rq *rq)
+{
+	return rq->cfs.h_nr_running == rq->cfs.idle_h_nr_running;
+}
+
 /* Must be called with rq locked */
 static void update_overload_status(struct rq *rq)
 {
@@ -7767,6 +7792,22 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	if (kthread_is_per_cpu(p))
 		return 0;
 
+	/*
+	 * Disregard hierarchically idle tasks during sched-idle
+	 * load balancing.
+	 */
+	if (env->idle == CPU_SCHED_IDLE && task_h_idle(p))
+		return 0;
+
+	/*
+	 * Skip p if it is the last non-idle task in src_rq. This
+	 * protects latency and throughput for non-idle tasks at
+	 * the cost of temporary load imbalance (which will probably
+	 * be fixed soon).
+	 */
+	if (!cfs_rq_overloaded(env->src_rq) && !task_h_idle(p))
+		return 0;
+
 	if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
 		int cpu;
 
@@ -10265,6 +10306,83 @@ static inline bool update_newidle_cost(struct sched_domain *sd, u64 cost)
 }
 
 /*
+ * The sched-idle balancing tries to eliminate overloaded cfs rqs
+ * by spreading out non-idle tasks prior to normal load balancing.
+ */
+static void sched_idle_balance(struct rq *dst_rq)
+{
+	struct sched_domain *sd;
+	struct task_struct *p;
+	int dst_cpu = cpu_of(dst_rq), cpu;
+
+	sd = rcu_dereference(per_cpu(sd_llc, dst_cpu));
+	if (unlikely(!sd))
+		return;
+
+	if (!atomic_read(&sd->shared->nr_overloaded))
+		return;
+
+	for_each_cpu_wrap(cpu, sdo_mask(sd->shared), dst_cpu + 1) {
+		struct rq *rq = cpu_rq(cpu);
+		struct rq_flags rf;
+		struct lb_env env;
+
+		if (cpu == dst_cpu)
+			continue;
+
+		if (!cfs_rq_overloaded(rq))
+			continue;
+
+		rq_lock_irqsave(rq, &rf);
+
+		/*
+		 * Check again to ensure there are pullable tasks.
+		 * This is necessary because multiple rqs can pull
+		 * tasks at the same time. IOW contention on this
+		 * rq is heavy, so it would be better to clear this
+		 * cpu from the overloaded mask.
+		 */
+		if (unlikely(!cfs_rq_overloaded(rq))) {
+			update_overload_status(rq);
+			rq_unlock_irqrestore(rq, &rf);
+			continue;
+		}
+
+		env = (struct lb_env) {
+			.sd		= sd,
+			.dst_cpu	= dst_cpu,
+			.dst_rq		= dst_rq,
+			.src_cpu	= cpu,
+			.src_rq		= rq,
+			.idle		= CPU_SCHED_IDLE, /* non-idle only */
+			.flags		= LBF_DST_PINNED, /* pin dst_cpu */
+		};
+
+		update_rq_clock(rq);
+		p = detach_one_task(&env);
+
+		/*
+		 * Lazy updating overloaded mask here. If the rq is
+		 * still overloaded then we are just wasting cycles.
+		 * And it's OK even if the rq becomes un-overloaded
+		 * since the cost of peeking rq's data without lock
+		 * won't be much in next loops (during which the rq
+		 * can even be overloaded again).
+		 */
+
+		rq_unlock(rq, &rf);
+
+		if (p) {
+			attach_one_task(dst_rq, p);
+			local_irq_restore(rf.flags);
+			return;
+		}
+
+		local_irq_restore(rf.flags);
+	}
+}
+
+/*
  * It checks each scheduling domain to see if it is due to be balanced,
  * and initiates a balancing operation if so.
  *
@@ -10284,6 +10402,10 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 	u64 max_cost = 0;
 
 	rcu_read_lock();
+
+	if (need_pull_cfs_task(rq))
+		sched_idle_balance(rq);
+
 	for_each_domain(cpu, sd) {
 		/*
 		 * Decay the newidle max times here because this is a regular
@@ -10913,6 +11035,12 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 	update_blocked_averages(this_cpu);
 
 	rcu_read_lock();
+
+	sched_idle_balance(this_rq);
+	t1 = sched_clock_cpu(this_cpu);
+	curr_cost += t1 - t0;
+	t0 = t1;
+
 	for_each_domain(this_cpu, sd) {
 		int continue_balancing = 1;
 		u64 domain_cost;
-- 
2.11.0

From nobody Sun Jun 28 00:10:22 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 08CD1C433EF for ; Thu, 17 Feb 2022 15:44:39 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242706AbiBQPow (ORCPT ); Thu, 17 Feb 2022 10:44:52 -0500
Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:47132 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229721AbiBQPok (ORCPT ); Thu, 17 Feb 2022 10:44:40 -0500
Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A3FDE2B2FCC for ; Thu, 17 Feb 2022 07:44:26 -0800 (PST)
Received: by mail-pl1-x62f.google.com with SMTP id i10so4918971plr.2 for ; Thu, 17 Feb 2022 07:44:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=pcumJNf5Ae0cpSONOSikL4F3HHoR7gzQ3j0ko1H1EYA=; b=xLqXVTBfwLXvvTwxShJClcJY4H8/n4Dqn/yUpBEFJAPWMFjImM+6x+J+4mkmQeqt+7 h7mp1wbiJl7SpeCswHgbeBkTtiao16nAFFdBKb5sYqs04D27N0wnSvqCG0lEnHW7qjMj qIkPaIxZS3cTaL+U8TlPKtfoz2N2mi0lc9u2iwfebOcTbK4n3nm+BiS3MwbkzQBEDPq/ AKTVSSXHd9HBZ1UXqLy4f1qs4y81wC/05kXptRdBgQuiorpMSunM6yBfu1TX1ue7uNJ2 MdwGaQ4ULsA4EciMsYHnUOYT6GlSXiELVdPGMaQQ7jgJJBXQe67mxgYdVscdCWnZ6JIq Zf+A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=pcumJNf5Ae0cpSONOSikL4F3HHoR7gzQ3j0ko1H1EYA=; b=m6nSVJS26FCvOUS1eO+/XnYGW+8Qs0fJd2UnQGMpq6OZYcI3Cz+cK3tYIljVy57h59 Rkpkz5LPIyASfZ0HkZrP0s/YeDm6iO+n7dRNHPgsAutyQtK7AJaT5TMELdhKdUEkwbOI DtUFBgKGO4Qn65GRHhZqwc+7Gl630+4ikl4SO6BCZoACubO/SqvN7QwRIKIPdlYW+Jgs n3qa9xU2zbDc+Ff6t3XtObhgvTEamBSesuWtkR7chRkv5YfUe31LTH76rOVgC6leaJFi jq13ukpMSEi3bAhW0YgIxkV1K8pbVKFSz41HIc97jY6nM3BN11sCUJdNpK31eQkGsJR9 QOSg== X-Gm-Message-State: AOAM533o4xHFPBeO+yFVGZH7O34uVQ+q6p96b3fDa3jurm8EAGMNLAN6 83g9Jqhq7X0k8DuHUPNG3hpWAA== X-Google-Smtp-Source: ABdhPJwn0A6Ux0vWlggtaXcNdCJX/IVUFtMdq4N4cAnTHmRnk7Job/RoN/HtukkC3LnXh5iniu1cBA== X-Received: by 2002:a17:903:110d:b0:14d:85b2:4b36 with SMTP id n13-20020a170903110d00b0014d85b24b36mr3327997plh.75.1645112666141; Thu, 17 Feb 2022 07:44:26 -0800 (PST) Received: from localhost.localdomain ([139.177.225.254]) by smtp.gmail.com with ESMTPSA id s6sm24108pfk.86.2022.02.17.07.44.20 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 17 Feb 2022 07:44:23 -0800 (PST) From: Abel Wu To: Ingo Molnar , Peter Zijlstra , Juri Lelli 
, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira
Cc: linux-kernel@vger.kernel.org
Subject: [RFC PATCH 3/5] sched/fair: add stats for sched-idle balancing
Date: Thu, 17 Feb 2022 23:43:59 +0800
Message-Id: <20220217154403.6497-4-wuyun.abel@bytedance.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220217154403.6497-1-wuyun.abel@bytedance.com>
References: <20220217154403.6497-1-wuyun.abel@bytedance.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

To better understand the behavior of sched-idle balancing, add some
statistics, as the other load balancing mechanisms do.

Signed-off-by: Abel Wu
---
 include/linux/sched/topology.h | 5 +++++
 kernel/sched/fair.c            | 6 +++++-
 kernel/sched/stats.c           | 5 +++--
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 03c9c81dc886..4259963d3e5e 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -150,6 +150,11 @@ struct sched_domain {
 	unsigned int ttwu_wake_remote;
 	unsigned int ttwu_move_affine;
 	unsigned int ttwu_move_balance;
+
+	/* sched-idle balancing */
+	unsigned int sib_peeked;
+	unsigned int sib_pulled;
+	unsigned int sib_failed;
 #endif
 #ifdef CONFIG_SCHED_DEBUG
 	char *name;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 070a6fb1d2bf..c83c0864e429 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10330,8 +10330,10 @@ static void sched_idle_balance(struct rq *dst_rq)
 		if (cpu == dst_cpu)
 			continue;
 
-		if (!cfs_rq_overloaded(rq))
+		if (!cfs_rq_overloaded(rq)) {
+			schedstat_inc(sd->sib_peeked);
 			continue;
+		}
 
 		rq_lock_irqsave(rq, &rf);
 
@@ -10375,10 +10377,12 @@ static void sched_idle_balance(struct rq *dst_rq)
 		if (p) {
 			attach_one_task(dst_rq, p);
 			local_irq_restore(rf.flags);
+			schedstat_inc(sd->sib_pulled);
 			return;
 		}
 
 		local_irq_restore(rf.flags);
+		schedstat_inc(sd->sib_failed);
 	}
 }
 
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 07dde2928c79..3ee476c72806 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -164,12 +164,13 @@ static int show_schedstat(struct seq_file *seq, void *v)
 					    sd->lb_nobusyg[itype]);
 			}
 			seq_printf(seq,
-				   " %u %u %u %u %u %u %u %u %u %u %u %u\n",
+				   " %u %u %u %u %u %u %u %u %u %u %u %u %u %u %u\n",
 			    sd->alb_count, sd->alb_failed, sd->alb_pushed,
 			    sd->sbe_count, sd->sbe_balanced, sd->sbe_pushed,
 			    sd->sbf_count, sd->sbf_balanced, sd->sbf_pushed,
 			    sd->ttwu_wake_remote, sd->ttwu_move_affine,
-			    sd->ttwu_move_balance);
+			    sd->ttwu_move_balance, sd->sib_peeked,
+			    sd->sib_pulled, sd->sib_failed);
 		}
 		rcu_read_unlock();
 #endif
-- 
2.11.0

From nobody Sun Jun 28 00:10:22 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 44D0FC433EF for ; Thu, 17 Feb 2022 15:44:47 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242714AbiBQPpA (ORCPT ); Thu, 17 Feb 2022 10:45:00 -0500
Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:47528 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242696AbiBQPop (ORCPT ); Thu, 17 Feb 2022 10:44:45 -0500
Received: from mail-pf1-x433.google.com (mail-pf1-x433.google.com [IPv6:2607:f8b0:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 824742B1025 for ; Thu, 17 Feb 2022 07:44:31 -0800 (PST)
Received: by mail-pf1-x433.google.com with SMTP id d187so5254753pfa.10 for ; Thu, 17 Feb 2022 07:44:31 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references
:mime-version:content-transfer-encoding; bh=RKkQi9dMA8sF35J+wiDdn1p2PDGkj85iaPf+444HF4E=; b=E0gO/iytRoHMUaWWM1mxd8l8f0NtDzy9w0nEZLnf7yorta0WAiPMdLmM7Wv9YOfoo/ k0oC/D0/DjZY4dStf+gorxnkSjtbyiPi4vOEdCzVEjGxlEwDbS5NIq+rP7EiGnPFATuK fNCyEaulD7GXYGC9LC6obH6sCG6wUZXsUR6JspdSH0uke1guQiM3brqeYOWDkbF76zWW ECO7+Cs4N+cbHSEjW8BtRRtD+juobpxS3/zcQ7nH9atKRl40ax49KMMX+xhSvLmkuZ4H Gv4q46pVtm2ZL0opphURbIpB6bzs1l561oNKFtf1H7M7Scdn68MWNRAwZZYctatcUe1J Ubyw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=RKkQi9dMA8sF35J+wiDdn1p2PDGkj85iaPf+444HF4E=; b=HzChMkCiVPEhx0rSsQdRamIMwqUpEMa+Ud7aC0t6vM8PPO2fIetQaxoVFpf7T0WpXv C8NhTjYZxXVPoztytkv7eXJbEewxms71pZUeSa5GA2/p0NO9rgOEhhHQybbt+ztOMaMa ORozOoou+q+q593AWq7oFa8AkHCjJszbvWlU9ntAw5jxMwEZoWMhiwtJHJ+cpJ1oQMJs 2P4bYGtXiQH4Wx7UcykydoJJ5hTio5LV8Vx5/+8RZHUbB91BFLFh+32CQ63aLL65EU0U l1vTzNizwmsRGQNnb43U6UTAxuv1iR+ClEguO7kCmsXk6iOisQYkemMj4DaCDOht7zpP 5Y6w== X-Gm-Message-State: AOAM53257O6YEWLa5gp6mW11EI9OdXC26EH6TWkd0eIZ8ktxgeqpv654 i+6fTU44ZHafOJShwVQUyL+8Hg== X-Google-Smtp-Source: ABdhPJzUYvCnnJhJgaMqTONin63sSutNin9KcezQzLFNRjL19K1ZsgzeT6dAFZv8BWgHEpmCx1rwxg== X-Received: by 2002:a05:6a02:184:b0:373:a24e:5ab with SMTP id bj4-20020a056a02018400b00373a24e05abmr2895106pgb.400.1645112671092; Thu, 17 Feb 2022 07:44:31 -0800 (PST) Received: from localhost.localdomain ([139.177.225.254]) by smtp.gmail.com with ESMTPSA id s6sm24108pfk.86.2022.02.17.07.44.26 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 17 Feb 2022 07:44:30 -0800 (PST) From: Abel Wu To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira Cc: linux-kernel@vger.kernel.org Subject: [RFC PATCH 4/5] sched/fair: filter out overloaded cpus in sis Date: Thu, 17 Feb 2022 23:44:00 +0800 Message-Id: 
<20220217154403.6497-5-wuyun.abel@bytedance.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220217154403.6497-1-wuyun.abel@bytedance.com>
References: <20220217154403.6497-1-wuyun.abel@bytedance.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Skip overloaded cpus in SIS, if there are any. This improves idle cpu
searching, especially under the SIS_PROP constraint that limits the
search depth.

The mask of overloaded cpus might not be quite accurate since it is
generally updated at tick granularity, but overloaded cpus are unlikely
to become idle shortly.

Signed-off-by: Abel Wu
---
 kernel/sched/fair.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c83c0864e429..1d8f396e6f41 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6273,6 +6273,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 
 	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
+	if (atomic_read(&sd->shared->nr_overloaded))
+		cpumask_andnot(cpus, cpus, sdo_mask(sd->shared));
+
 	if (sched_feat(SIS_PROP) && !has_idle_core) {
 		u64 avg_cost, avg_idle, span_avg;
 		unsigned long now = jiffies;
-- 
2.11.0

From nobody Sun Jun 28 00:10:22 2026
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91560C433EF for ; Thu, 17 Feb 2022 15:44:43 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242707AbiBQPo4 (ORCPT ); Thu, 17 Feb 2022 10:44:56 -0500
Received: from mxb-00190b01.gslb.pphosted.com ([23.128.96.19]:47658 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240712AbiBQPoz (ORCPT ); Thu, 17 Feb 2022 10:44:55 -0500
Received: from
mail-pg1-x533.google.com (mail-pg1-x533.google.com [IPv6:2607:f8b0:4864:20::533]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E1BA814091 for ; Thu, 17 Feb 2022 07:44:35 -0800 (PST) Received: by mail-pg1-x533.google.com with SMTP id 132so5366186pga.5 for ; Thu, 17 Feb 2022 07:44:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=j0E9JlSv55Udz7d31hv5lpiU6VWwDB8VKAhZNevzXCs=; b=yRKtUo7EWwFSzty1/lI/wG7khNU9AhaEy8TYjtR22apwxHHnz2GSuAmhVM3aV945VV mmOf2MlnXQhrREhiFPfUEdqRzkhelCfh/hXphaspizEXniYntuu9dvkSm92kSBPyPKik PrH8Jkqc8WaFkQN3Iq/+7N7yw3RUvArygQEe4CyDPd1jylnrtHthsFkiV7c2KH+ET844 CspHyjYCg86bVQ4CTkINn+hUTodlmdU7XMnT/4leFUNO7d9inxT6Wsl7BrhyUUV5GXuF ZXVpKcFWr3W5oILJMqtgUE3feFo/5Dab6Bt7suNT7IScYEirBbKGhuJ7FWoOsjuf5UIx KRWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=j0E9JlSv55Udz7d31hv5lpiU6VWwDB8VKAhZNevzXCs=; b=zvCoSH0pQ0l6UP9H78M+Rl7nRcmksTNEVgevMCbCdGQDJ0s/HIqNnrwWryzzWBUOA9 yDGgLOmo3aDoUDxygjk7iXWZdwW2uldYNYjlwHCUVtpddlizWe14WOhVYdzPK3r/jABs 9rV80XNUMVQRkWrBIwd6olPKQmZmxRsLmLeiGur5mEHA67BVNmweFg4k9a2uWSjmUbj/ fdBgnqm5B9/vHvzAwhJCtrcpGJK7CJeN3nQrdsAS42cyohdA/VSqAnRaJ72upQMl+DBL YlvNLVVvW3ExgtFBwe1AG5FR/htRJIqsJmpGjBu5rOfeCMdo28OxX2ihe+O8Hju5dhuk UStA== X-Gm-Message-State: AOAM532CvjvYaDmTkGipw0G0LUPOgDgPsprz/h3H3OFg+Qd3++saj0Dh vt4Sxy6gqU+hzE0+IiwoxyuAxmGGDdnsUA== X-Google-Smtp-Source: ABdhPJxzUUurSFtJCNWKs4as8Ze+x95ZUmyDEQ4C51KUy8A2qI+sXwvR8P0mTJJYRMxUT/f44U8H7A== X-Received: by 2002:a63:1f02:0:b0:364:6f52:7ab2 with SMTP id f2-20020a631f02000000b003646f527ab2mr2817581pgf.396.1645112675447; Thu, 17 Feb 2022 07:44:35 -0800 (PST) Received: from localhost.localdomain ([139.177.225.254]) by smtp.gmail.com with ESMTPSA id 
s6sm24108pfk.86.2022.02.17.07.44.31 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 17 Feb 2022 07:44:35 -0800 (PST)
From: Abel Wu
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira
Cc: linux-kernel@vger.kernel.org
Subject: [RFC PATCH 5/5] sched/fair: favor cpu capacity for idle tasks
Date: Thu, 17 Feb 2022 23:44:01 +0800
Message-Id: <20220217154403.6497-6-wuyun.abel@bytedance.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20220217154403.6497-1-wuyun.abel@bytedance.com>
References: <20220217154403.6497-1-wuyun.abel@bytedance.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Unlike select_idle_sibling(), in which we need to find a not-so-bad
candidate ASAP, the slowpath gives us more tolerance: ignore sched-idle
cpus for idle tasks, since they prefer cpu capacity over latency.
Besides, spreading out idle tasks is also good for the latency of
normal tasks.

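A much simplified userspace sketch of that selection rule follows. It
is an assumption-laden model, not find_idlest_group_cpu() itself: enum
cpu_model_state and pick_cpu_model() are hypothetical names, and the
real function additionally weighs idle-state exit latency and load.

```c
#include <stdbool.h>

/* Hypothetical per-cpu state for this sketch only. */
enum cpu_model_state { ST_BUSY, ST_SCHED_IDLE, ST_IDLE };

/*
 * Return the index of the chosen cpu, or -1 if none qualifies.
 * A non-idle task takes the first sched-idle cpu right away; an idle
 * task skips sched-idle cpus and settles for the first fully idle one.
 */
static int pick_cpu_model(const enum cpu_model_state *cpus, int n,
			  bool task_is_idle)
{
	int first_idle = -1;

	for (int i = 0; i < n; i++) {
		if (!task_is_idle && cpus[i] == ST_SCHED_IDLE)
			return i;
		if (cpus[i] == ST_IDLE && first_idle < 0)
			first_idle = i;
	}
	return first_idle;
}
```

For the cpu states { busy, sched-idle, idle }, a normal task lands on
cpu 1 while an idle task is steered to cpu 2.
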
Signed-off-by: Abel Wu
---
 kernel/sched/fair.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1d8f396e6f41..57f1d8c43228 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6007,6 +6007,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, int this_cpu);
 static int
 find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 {
+	bool ignore_si = task_h_idle(p);
 	unsigned long load, min_load = ULONG_MAX;
 	unsigned int min_exit_latency = UINT_MAX;
 	u64 latest_idle_timestamp = 0;
@@ -6025,7 +6026,13 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 		if (!sched_core_cookie_match(rq, p))
 			continue;
 
-		if (sched_idle_cpu(i))
+		/*
+		 * The idle tasks prefer cpu capacity rather than
+		 * latency. Spreading out idle tasks is also good for
+		 * latency of normal tasks since they won't suffer
+		 * high cpu wakeup delay.
+		 */
+		if (!ignore_si && sched_idle_cpu(i))
 			return i;
 
 		if (available_idle_cpu(i)) {
-- 
2.11.0