From: Chengming Zhou
To: tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com, surenb@google.com
Cc: gregkh@linuxfoundation.org, corbet@lwn.net, mingo@redhat.com,
 peterz@infradead.org, songmuchun@bytedance.com, cgroups@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Chengming Zhou
Subject: [PATCH v3 06/10] sched/psi: remove NR_ONCPU task accounting
Date: Wed, 24 Aug 2022 16:18:25 +0800
Message-Id: <20220824081829.33748-7-zhouchengming@bytedance.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220824081829.33748-1-zhouchengming@bytedance.com>
References: <20220824081829.33748-1-zhouchengming@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Johannes Weiner

We put all fields updated by the scheduler in the first cacheline of
struct psi_group_cpu for performance.

Since we want to add another PSI_IRQ_FULL to track IRQ/SOFTIRQ pressure,
we need to reclaim space first.
This patch removes NR_ONCPU task accounting from struct psi_group_cpu and
uses one bit in state_mask to track it instead.

Signed-off-by: Johannes Weiner
Signed-off-by: Chengming Zhou
Reviewed-by: Chengming Zhou
Tested-by: Chengming Zhou
---
 include/linux/psi_types.h | 16 +++++++--------
 kernel/sched/psi.c        | 41 ++++++++++++++++++++++++++++-----------
 2 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index c7fe7c089718..54cb74946db4 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -15,13 +15,6 @@ enum psi_task_count {
 	NR_IOWAIT,
 	NR_MEMSTALL,
 	NR_RUNNING,
-	/*
-	 * This can't have values other than 0 or 1 and could be
-	 * implemented as a bit flag. But for now we still have room
-	 * in the first cacheline of psi_group_cpu, and this way we
-	 * don't have to special case any state tracking for it.
-	 */
-	NR_ONCPU,
 	/*
 	 * For IO and CPU stalls the presence of running/oncpu tasks
 	 * in the domain means a partial rather than a full stall.
@@ -32,16 +25,18 @@ enum psi_task_count {
 	 * threads and memstall ones.
 	 */
 	NR_MEMSTALL_RUNNING,
-	NR_PSI_TASK_COUNTS = 5,
+	NR_PSI_TASK_COUNTS = 4,
 };
 
 /* Task state bitmasks */
 #define TSK_IOWAIT	(1 << NR_IOWAIT)
 #define TSK_MEMSTALL	(1 << NR_MEMSTALL)
 #define TSK_RUNNING	(1 << NR_RUNNING)
-#define TSK_ONCPU	(1 << NR_ONCPU)
 #define TSK_MEMSTALL_RUNNING	(1 << NR_MEMSTALL_RUNNING)
 
+/* Only one task can be scheduled, no corresponding task count */
+#define TSK_ONCPU	(1 << NR_PSI_TASK_COUNTS)
+
 /* Resources that workloads could be stalled on */
 enum psi_res {
 	PSI_IO,
@@ -68,6 +63,9 @@ enum psi_states {
 	NR_PSI_STATES = 7,
 };
 
+/* Use one bit in the state mask to track TSK_ONCPU */
+#define PSI_ONCPU	(1 << NR_PSI_STATES)
+
 enum psi_aggregators {
 	PSI_AVGS = 0,
 	PSI_POLL,
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 26c03bd56b9c..af83531162fc 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -212,7 +212,7 @@ void __init psi_init(void)
 	group_init(&psi_system);
 }
 
-static bool test_state(unsigned int *tasks, enum psi_states state)
+static bool test_state(unsigned int *tasks, enum psi_states state, bool oncpu)
 {
 	switch (state) {
 	case PSI_IO_SOME:
@@ -225,9 +225,9 @@ static bool test_state(unsigned int *tasks, enum psi_states state)
 		return unlikely(tasks[NR_MEMSTALL] &&
 			tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING]);
 	case PSI_CPU_SOME:
-		return unlikely(tasks[NR_RUNNING] > tasks[NR_ONCPU]);
+		return unlikely(tasks[NR_RUNNING] > oncpu);
 	case PSI_CPU_FULL:
-		return unlikely(tasks[NR_RUNNING] && !tasks[NR_ONCPU]);
+		return unlikely(tasks[NR_RUNNING] && !oncpu);
 	case PSI_NONIDLE:
 		return tasks[NR_IOWAIT] || tasks[NR_MEMSTALL] ||
 			tasks[NR_RUNNING];
@@ -689,9 +689,9 @@ static void psi_group_change(struct psi_group *group, int cpu,
 			     bool wake_clock)
 {
 	struct psi_group_cpu *groupc;
-	u32 state_mask = 0;
 	unsigned int t, m;
 	enum psi_states s;
+	u32 state_mask;
 
 	groupc = per_cpu_ptr(group->pcpu, cpu);
 
@@ -707,17 +707,36 @@ static void psi_group_change(struct psi_group *group, int cpu,
 
 	record_times(groupc, now);
 
+	/*
+	 * Start with TSK_ONCPU, which doesn't have a corresponding
+	 * task count - it's just a boolean flag directly encoded in
+	 * the state mask. Clear, set, or carry the current state if
+	 * no changes are requested.
+	 */
+	if (unlikely(clear & TSK_ONCPU)) {
+		state_mask = 0;
+		clear &= ~TSK_ONCPU;
+	} else if (unlikely(set & TSK_ONCPU)) {
+		state_mask = PSI_ONCPU;
+		set &= ~TSK_ONCPU;
+	} else {
+		state_mask = groupc->state_mask & PSI_ONCPU;
+	}
+
+	/*
+	 * The rest of the state mask is calculated based on the task
+	 * counts. Update those first, then construct the mask.
+	 */
 	for (t = 0, m = clear; m; m &= ~(1 << t), t++) {
 		if (!(m & (1 << t)))
 			continue;
 		if (groupc->tasks[t]) {
 			groupc->tasks[t]--;
 		} else if (!psi_bug) {
-			printk_deferred(KERN_ERR "psi: task underflow! cpu=%d t=%d tasks=[%u %u %u %u %u] clear=%x set=%x\n",
+			printk_deferred(KERN_ERR "psi: task underflow! cpu=%d t=%d tasks=[%u %u %u %u] clear=%x set=%x\n",
 					cpu, t, groupc->tasks[0],
 					groupc->tasks[1], groupc->tasks[2],
-					groupc->tasks[3], groupc->tasks[4],
-					clear, set);
+					groupc->tasks[3], clear, set);
 			psi_bug = 1;
 		}
 	}
@@ -726,9 +745,8 @@ static void psi_group_change(struct psi_group *group, int cpu,
 		if (set & (1 << t))
 			groupc->tasks[t]++;
 
-	/* Calculate state mask representing active states */
 	for (s = 0; s < NR_PSI_STATES; s++) {
-		if (test_state(groupc->tasks, s))
+		if (test_state(groupc->tasks, s, state_mask & PSI_ONCPU))
 			state_mask |= (1 << s);
 	}
 
@@ -740,7 +758,7 @@ static void psi_group_change(struct psi_group *group, int cpu,
 	 * task in a cgroup is in_memstall, the corresponding groupc
 	 * on that cpu is in PSI_MEM_FULL state.
 	 */
-	if (unlikely(groupc->tasks[NR_ONCPU] && cpu_curr(cpu)->in_memstall))
+	if (unlikely((state_mask & PSI_ONCPU) && cpu_curr(cpu)->in_memstall))
 		state_mask |= (1 << PSI_MEM_FULL);
 
 	groupc->state_mask = state_mask;
@@ -829,7 +847,8 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		 */
 		iter = NULL;
 		while ((group = iterate_groups(next, &iter))) {
-			if (per_cpu_ptr(group->pcpu, cpu)->tasks[NR_ONCPU]) {
+			if (per_cpu_ptr(group->pcpu, cpu)->state_mask &
+			    PSI_ONCPU) {
 				common = group;
 				break;
 			}
-- 
2.37.2
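
Illustration only, not part of the patch: the bookkeeping change can be
sketched in standalone C roughly as below. All names here (struct
group_cpu, group_change(), ST_CPU_SOME, ST_CPU_FULL, ST_ONCPU, NR_COUNTS,
NR_STATES) are simplified, hypothetical stand-ins for the kernel
definitions, and only the two CPU states are modelled. The point is that
the on-CPU flag lives as one extra bit in the state mask and is cleared,
set, or carried over before the counter-backed states are recomputed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel enums. */
enum { NR_IOWAIT, NR_MEMSTALL, NR_RUNNING, NR_MEMSTALL_RUNNING, NR_COUNTS };
enum { ST_CPU_SOME, ST_CPU_FULL, NR_STATES };

#define TSK_RUNNING (1u << NR_RUNNING)
#define TSK_ONCPU   (1u << NR_COUNTS)  /* task flag with no counter behind it */
#define ST_ONCPU    (1u << NR_STATES)  /* extra bit stored in the state mask */

struct group_cpu {
	unsigned int tasks[NR_COUNTS];
	uint32_t state_mask;
};

/* Apply clear/set task flags and rebuild the state mask (sketch only). */
static void group_change(struct group_cpu *g, unsigned int clear,
			 unsigned int set)
{
	uint32_t mask;
	unsigned int t;
	bool oncpu;

	/* The on-CPU bit is cleared, set, or carried from the old mask. */
	if (clear & TSK_ONCPU)
		mask = 0;
	else if (set & TSK_ONCPU)
		mask = ST_ONCPU;
	else
		mask = g->state_mask & ST_ONCPU;

	/* TSK_ONCPU has no counter, so keep it out of the counter loop. */
	clear &= ~TSK_ONCPU;
	set &= ~TSK_ONCPU;

	for (t = 0; t < NR_COUNTS; t++) {
		if ((clear & (1u << t)) && g->tasks[t])
			g->tasks[t]--;
		if (set & (1u << t))
			g->tasks[t]++;
	}

	/* Counter-backed states are derived with the flag as an input. */
	oncpu = (mask & ST_ONCPU) != 0;
	if (g->tasks[NR_RUNNING] > (unsigned int)oncpu)
		mask |= 1u << ST_CPU_SOME;
	if (g->tasks[NR_RUNNING] && !oncpu)
		mask |= 1u << ST_CPU_FULL;

	g->state_mask = mask;
}
```

With a zero-initialised struct group_cpu, calling group_change(&g, 0,
TSK_RUNNING) and then group_change(&g, 0, TSK_ONCPU) leaves only the
ST_ONCPU bit set in this sketch: one runnable task that is also on the
CPU is neither a "some" nor a "full" CPU stall.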