From nobody Sun Dec 14 13:57:03 2025
From: Yafang Shao
To: mingo@redhat.com, peterz@infradead.org, mkoutny@suse.com, hannes@cmpxchg.org
Cc: juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, surenb@google.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, lkp@intel.com, Yafang Shao
Subject: [PATCH v9 1/2] sched: Fix cgroup irq time for CONFIG_IRQ_TIME_ACCOUNTING
Date: Sun, 11 May 2025 11:07:59 +0800
Message-Id: <20250511030800.1900-2-laoar.shao@gmail.com>
In-Reply-To: <20250511030800.1900-1-laoar.shao@gmail.com>
References: <20250511030800.1900-1-laoar.shao@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

After enabling CONFIG_IRQ_TIME_ACCOUNTING to monitor IRQ pressure in our
container environment, we observed several noticeable behavioral changes.

Our IRQ-heavy services, such as Redis, reported significantly lower CPU
usage after upgrading to a kernel with CONFIG_IRQ_TIME_ACCOUNTING enabled.
However, adding more threads to handle an increased workload did not raise
the reported CPU usage. In other words, although the container’s CPU usage
appeared low, it could not take on more work or use additional CPU
resources, which caused issues.

This behavior can be demonstrated using netperf:

  function start_server() {
      for j in `seq 1 3`; do
          netserver -p $[12345+j] > /dev/null &
      done
  }

  server_ip=$1
  function start_client() {
      # That applies to cgroup2 as well.
      mkdir -p /sys/fs/cgroup/cpuacct/test
      echo $$ > /sys/fs/cgroup/cpuacct/test/cgroup.procs
      for j in `seq 1 3`; do
          port=$[12345+j]
          taskset -c 0 netperf -H ${server_ip} -l ${run_time:-30000} \
              -t TCP_STREAM -p $port -- -D -m 1k -M 1K -s 8k -S 8k \
              > /dev/null &
      done
  }

  start_server
  start_client

We can verify the CPU usage of the test cgroup using cpuacct.stat. The
output shows:

  system: 53
  user: 2

The CPU usage of the cgroup is relatively low at around 55%, but this usage
doesn't increase, even with more netperf tasks. The reason is that CPU0 is
at 100% utilization, as confirmed by mpstat:

  02:56:22 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
  02:56:23 PM    0   0.99   0.00  55.45    0.00   0.99  42.57   0.00   0.00   0.00   0.00

  02:56:23 PM  CPU   %usr  %nice   %sys %iowait   %irq  %soft %steal %guest %gnice  %idle
  02:56:24 PM    0   2.00   0.00  55.00    0.00   0.00  43.00   0.00   0.00   0.00   0.00

Clearly, the %soft time is excluded from the cgroup of the interrupted
task. This behavior is unexpected. We should include IRQ time in the cgroup
to reflect the pressure the group is under.

After a thorough analysis, I discovered that this change in behavior is due
to commit 305e6835e055 ("sched: Do not account irq time to current task"),
which stopped charging IRQ time to the interrupted task. While I agree that
a task should not be penalized by random interrupts, the task itself cannot
make progress while it is interrupted. Therefore, the interrupted time
should be reported to the user.

The system metric in cpuacct.stat is crucial in indicating whether a
container is under heavy system pressure, including IRQ/softirq activity.
Hence, IRQ/softirq time should be included in the cpuacct system usage,
which also applies to cgroup2’s rstat.

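As a rough illustration of the numbers above, the following userspace C
sketch replays the second mpstat sample (2% user, 55% system, 43% softirq).
It is not kernel code: struct model_rq, struct model_cgroup,
model_account_irqtime() and model_replay_sample() are hypothetical names
invented for this sketch, and it assumes one-second samples expressed as a
percentage of one CPU. It only shows that charging the IRQ time accumulated
since the previous snapshot brings the group's reported usage in line with
the CPU's real utilization.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_rq {
	uint64_t irq_time_snapshot;	/* cumulative IRQ time seen at the last update */
};

struct model_cgroup {
	uint64_t user;		/* user time reported for the group */
	uint64_t system;	/* system time reported for the group */
	uint64_t irq;		/* part of system time spent in hardirq/softirq */
};

/*
 * Charge the IRQ time accumulated on the CPU since the last snapshot to the
 * group of the interrupted task. Returns the amount charged, or 0 if the
 * cumulative counter appears to have gone backwards.
 */
static int64_t model_account_irqtime(struct model_rq *rq,
				     struct model_cgroup *cg,
				     uint64_t cpu_irq_total)
{
	int64_t delta = (int64_t)(cpu_irq_total - rq->irq_time_snapshot);

	if (delta < 0)
		return 0;

	rq->irq_time_snapshot = cpu_irq_total;
	cg->system += (uint64_t)delta;
	cg->irq += (uint64_t)delta;
	return delta;
}

/* Replay one sample: 2% user, 55% system, then 43% softirq on the same CPU. */
static uint64_t model_replay_sample(void)
{
	struct model_rq rq = { .irq_time_snapshot = 0 };
	struct model_cgroup cg = { .user = 2, .system = 55, .irq = 0 };

	model_account_irqtime(&rq, &cg, 43);
	return cg.user + cg.system;	/* usage the group would now report */
}
```

Without the charge, the modelled group reports 57% while its CPU is fully
busy. With it, the group reports the full 100%, which is the signal a user
needs to see that the container cannot absorb more work on that CPU.
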
The cgroup_account_*() calls are not simply added to irqtime_account_irq()
because holding the rq lock in that critical path might hurt performance.
Taking inspiration from commit ddae0ca2a8fe ("sched: Move
psi_account_irqtime() out of update_rq_clock_task() hotpath"), the
accounting is instead done in a non-critical path, which reduces the
performance impact.

Signed-off-by: Yafang Shao
Cc: Johannes Weiner
Cc: Michal Koutný
---
 kernel/sched/core.c  | 33 +++++++++++++++++++++++++++++++--
 kernel/sched/psi.c   | 13 +++----------
 kernel/sched/sched.h |  2 +-
 kernel/sched/stats.h |  7 ++++---
 4 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c81cf642dba0..2a1ce5a88076 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5627,6 +5627,35 @@ static int __init setup_resched_latency_warn_ms(char *str)
 }
 __setup("resched_latency_warn_ms=", setup_resched_latency_warn_ms);
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+static void account_irqtime(struct rq *rq, struct task_struct *curr,
+			    struct task_struct *prev)
+{
+	int cpu = smp_processor_id();
+	s64 delta;
+	u64 irq;
+
+	if (!irqtime_enabled())
+		return;
+
+	irq = irq_time_read(cpu);
+	delta = (s64)(irq - rq->irq_time);
+	if (delta < 0)
+		return;
+
+	rq->irq_time = irq;
+	psi_account_irqtime(rq, curr, prev, delta);
+	cgroup_account_cputime(curr, delta);
+	/* We account both softirq and irq into CPUTIME_IRQ */
+	cgroup_account_cputime_field(curr, CPUTIME_IRQ, delta);
+}
+#else
+static inline void account_irqtime(struct rq *rq, struct task_struct *curr,
+				   struct task_struct *prev)
+{
+}
+#endif
+
 /*
  * This function gets called by the timer code, with HZ frequency.
  * We call it with interrupts disabled.
@@ -5649,7 +5678,7 @@ void sched_tick(void)
 	rq_lock(rq, &rf);
 	donor = rq->donor;
 
-	psi_account_irqtime(rq, donor, NULL);
+	account_irqtime(rq, donor, NULL);
 
 	update_rq_clock(rq);
 	hw_pressure = arch_scale_hw_pressure(cpu_of(rq));
@@ -6757,7 +6786,7 @@ static void __sched notrace __schedule(int sched_mode)
 		++*switch_count;
 
 		migrate_disable_switch(rq, prev);
-		psi_account_irqtime(rq, prev, next);
+		account_irqtime(rq, prev, next);
 		psi_sched_switch(prev, next, !task_on_rq_queued(prev) ||
 					     prev->se.sched_delayed);
 
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 1396674fa722..4affc1f747bd 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -990,15 +990,14 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 }
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-void psi_account_irqtime(struct rq *rq, struct task_struct *curr, struct task_struct *prev)
+void psi_account_irqtime(struct rq *rq, struct task_struct *curr, struct task_struct *prev,
+			 s64 delta)
 {
 	int cpu = task_cpu(curr);
 	struct psi_group *group;
 	struct psi_group_cpu *groupc;
-	s64 delta;
-	u64 irq;
 
-	if (static_branch_likely(&psi_disabled) || !irqtime_enabled())
+	if (static_branch_likely(&psi_disabled))
 		return;
 
 	if (!curr->pid)
@@ -1009,12 +1008,6 @@ void psi_account_irqtime(struct rq *rq, struct task_struct *curr, struct task_st
 	if (prev && task_psi_group(prev) == group)
 		return;
 
-	irq = irq_time_read(cpu);
-	delta = (s64)(irq - rq->psi_irq_time);
-	if (delta < 0)
-		return;
-	rq->psi_irq_time = irq;
-
 	do {
 		u64 now;
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 47972f34ea70..b5e71dfad66c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1226,7 +1226,7 @@ struct rq {
 
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
 	u64			prev_irq_time;
-	u64			psi_irq_time;
+	u64			irq_time;
 #endif
 #ifdef CONFIG_PARAVIRT
 	u64			prev_steal_time;
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 452826df6ae1..b5b626cb1b83 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -111,10 +111,11 @@ void psi_task_change(struct task_struct *task, int clear, int set);
 void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		     bool sleep);
 #ifdef CONFIG_IRQ_TIME_ACCOUNTING
-void psi_account_irqtime(struct rq *rq, struct task_struct *curr, struct task_struct *prev);
+void psi_account_irqtime(struct rq *rq, struct task_struct *curr,
+			 struct task_struct *prev, s64 delta);
 #else
 static inline void psi_account_irqtime(struct rq *rq, struct task_struct *curr,
-				       struct task_struct *prev) {}
+				       struct task_struct *prev, s64 delta) {}
 #endif /*CONFIG_IRQ_TIME_ACCOUNTING */
 /*
  * PSI tracks state that persists across sleeps, such as iowaits and
@@ -228,7 +229,7 @@ static inline void psi_sched_switch(struct task_struct *prev,
 					    struct task_struct *next,
 					    bool sleep) {}
 static inline void psi_account_irqtime(struct rq *rq, struct task_struct *curr,
-				       struct task_struct *prev) {}
+				       struct task_struct *prev, s64 delta) {}
 #endif /* CONFIG_PSI */
 
 #ifdef CONFIG_SCHED_INFO
-- 
2.43.5

From nobody Sun Dec 14 13:57:03 2025
From: Yafang Shao
To: mingo@redhat.com, peterz@infradead.org, mkoutny@suse.com, hannes@cmpxchg.org
Cc: juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, surenb@google.com, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, lkp@intel.com, Yafang Shao, Eric Dumazet
Subject: [PATCH v9 2/2] sched: Annotate sched_clock_irqtime with __read_mostly
Date: Sun, 11 May 2025 11:08:00 +0800
Message-Id: <20250511030800.1900-3-laoar.shao@gmail.com>
In-Reply-To: <20250511030800.1900-1-laoar.shao@gmail.com>
References: <20250511030800.1900-1-laoar.shao@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Eric reported an issue [0] as follows,
: rebalance_domains() can attempt to change sched_balance_running
: more than 350,000 times per second on our servers.

: If sched_clock_irqtime and sched_balance_running share the
: same cache line, we see a very high cost on hosts with 480 threads
: dealing with many interrupts.

While the rebalance_domains() issue has been resolved [1], we should
proactively annotate sched_clock_irqtime with __read_mostly to prevent
potential cacheline false sharing. This optimization is particularly
justified since sched_clock_irqtime is only modified during TSC instability
events.

Link: https://lore.kernel.org/all/20250423174634.3009657-1-edumazet@google.com/ [0]
Link: https://lore.kernel.org/all/20250416035823.1846307-1-tim.c.chen@linux.intel.com/ [1]
Reported-by: Eric Dumazet
Debugged-by: Eric Dumazet
Signed-off-by: Yafang Shao
Cc: Eric Dumazet
Reviewed-by: Michal Koutný
---
 kernel/sched/cputime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 6dab4854c6c0..c499a42ceda4 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -22,7 +22,7 @@
  */
 DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
 
-int sched_clock_irqtime;
+int __read_mostly sched_clock_irqtime;
 
 void enable_sched_clock_irqtime(void)
 {
-- 
2.43.5
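
The false-sharing concern behind the __read_mostly annotation can be
sketched in userspace C. This is only an analogy, under stated assumptions:
in the kernel, __read_mostly groups variables into a dedicated data section,
whereas the sketch below uses C11 alignas() as a stand-in to force a
rarely written flag onto its own cache line. The 64-byte line size, the
struct names, and the helper functions are all hypothetical and exist only
for this illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define MODEL_CACHELINE 64	/* assumed cache line size, in bytes */

/* Hot, frequently written field next to a read-mostly flag: same line. */
struct shared_layout {
	int balance_running;	/* stands in for a heavily written variable */
	int clock_irqtime;	/* stands in for a read-mostly flag */
};

/* Same fields, with the read-mostly flag pushed onto its own line. */
struct separated_layout {
	int balance_running;
	alignas(MODEL_CACHELINE) int clock_irqtime;
};

/* Index of the cache line that a given struct offset falls into. */
static size_t model_line_of(size_t offset)
{
	return offset / MODEL_CACHELINE;
}

/* Returns 1 if both fields of the packed layout share a cache line. */
static int shared_same_line(void)
{
	return model_line_of(offsetof(struct shared_layout, balance_running)) ==
	       model_line_of(offsetof(struct shared_layout, clock_irqtime));
}

/* Returns 1 if both fields of the aligned layout share a cache line. */
static int separated_same_line(void)
{
	return model_line_of(offsetof(struct separated_layout, balance_running)) ==
	       model_line_of(offsetof(struct separated_layout, clock_irqtime));
}
```

In the packed layout, every write to the hot field invalidates the line
that readers of the flag also need. In the aligned layout the two fields
sit on different lines, so readers of the flag are not disturbed by those
writes.
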