[PATCH] sched/cputime: Make IRQ time accounting configurable at boot time

Bart Van Assche posted 1 patch 2 years, 7 months ago
There is a newer version of this series
Documentation/admin-guide/kernel-parameters.txt |  4 ++++
kernel/sched/cputime.c                          | 13 +++++++++++++
2 files changed, 17 insertions(+)
[PATCH] sched/cputime: Make IRQ time accounting configurable at boot time
Posted by Bart Van Assche 2 years, 7 months ago
Some producers of Android devices want IRQ time accounting enabled while
others want IRQ time accounting disabled. Hence, make IRQ time accounting
configurable at boot time.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  4 ++++
 kernel/sched/cputime.c                          | 13 +++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9e5bab29685f..67a2ad3af833 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5611,6 +5611,10 @@
 			non-zero "wait" parameter.  See weight_single
 			and weight_many.
 
+	sched_clock_irqtime= [KNL]
+			Can be used to disable IRQ time accounting if
+			CONFIG_IRQ_TIME_ACCOUNTING=y.
+
 	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate
 			xtime_lock contention on larger systems, and/or RCU lock
 			contention on all systems with CONFIG_MAXSMP set.
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index af7952f12e6c..d9c65017024d 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -24,6 +24,19 @@ DEFINE_PER_CPU(struct irqtime, cpu_irqtime);
 
 static int sched_clock_irqtime;
 
+static int __init sched_clock_irqtime_setup(char *arg)
+{
+	bool enabled;
+
+	if (kstrtobool(arg, &enabled) < 0)
+		pr_err("Invalid sched_clock_irqtime value\n");
+	else
+		sched_clock_irqtime = enabled;
+	return 1;
+}
+
+__setup("sched_clock_irqtime=", sched_clock_irqtime_setup);
+
 void enable_sched_clock_irqtime(void)
 {
 	sched_clock_irqtime = 1;
Re: [PATCH] sched/cputime: Make IRQ time accounting configurable at boot time
Posted by Peter Zijlstra 2 years, 7 months ago
On Thu, Jun 15, 2023 at 01:37:26PM -0700, Bart Van Assche wrote:
> Some producers of Android devices want IRQ time accounting enabled while
> others want IRQ time accounting disabled. Hence, make IRQ time accounting
> configurable at boot time.

Why would they want this disabled? IRQ time accounting avoids a number
of issues under high irq/softirq pressure.

Disabling this makes no sense.
Re: [PATCH] sched/cputime: Make IRQ time accounting configurable at boot time
Posted by Qais Yousef 2 years, 7 months ago
On 06/16/23 09:45, Peter Zijlstra wrote:
> On Thu, Jun 15, 2023 at 01:37:26PM -0700, Bart Van Assche wrote:
> > Some producers of Android devices want IRQ time accounting enabled while
> > others want IRQ time accounting disabled. Hence, make IRQ time accounting
> > configurable at boot time.
> 
> Why would they want this disabled? IRQ time accounting avoids a number
> of issues under high irq/softirq pressure.
> 
> Disabling this makes no sense.

I think it is assumed that IRQ time accounting is only used for stat
collection (which is what I thought too), but based on this response I can see
it is used to ensure we account for stolen time in update_rq_clock_task() so it
helps to make it account more accurately for the time a task actually spent
running and doing useful work.

Bart, could you profile the cause of the high overhead? The config message says
a small perf impact, but 40% mentioned in your v2 is high. Could you try to
break down the problem further as we might be overlooking the true source of
overhead or miss an opportunity to improve the accounting logic instead?


Thanks

--
Qais Yousef
Re: [PATCH] sched/cputime: Make IRQ time accounting configurable at boot time
Posted by Bart Van Assche 2 years, 7 months ago
On 6/16/23 00:45, Peter Zijlstra wrote:
> On Thu, Jun 15, 2023 at 01:37:26PM -0700, Bart Van Assche wrote:
>> Some producers of Android devices want IRQ time accounting enabled while
>> others want IRQ time accounting disabled. Hence, make IRQ time accounting
>> configurable at boot time.
> 
> Why would they want this disabled? IRQ time accounting avoids a number
> of issues under high irq/softirq pressure.
> 
> Disabling this makes no sense.

This is why disabling IRQ time accounting makes a ton of sense to me:
* If disabling IRQ time accounting would not be useful, there wouldn't
   be a kernel configuration option that controls whether it is enabled
   or disabled - it would be enabled all the time.
* If enabling IRQ time accounting would be essential, all Linux
   distributors would enable it. In the x86 kernels I checked, IRQ time
   accounting is disabled (Debian and openSUSE).
* For x86 there is already a kernel parameter for disabling IRQ time
   accounting (tsc=noirqtime).
* Modern hardware, e.g. UFSHCI 4.0 controllers, supports sending the
   completion interrupt to the CPU core that submitted the I/O. With such
   hardware IRQ overload (100% spent in IRQ handlers and 0% outside IRQ
   handlers) is impossible because the submitter is slowed down by the
   completion interrupts.
* The performance overhead of CONFIG_IRQ_TIME_ACCOUNTING is
   unacceptable. A quick test in an x86 VM shows that enabling
   CONFIG_IRQ_TIME_ACCOUNTING reduces IOPS by 10% (220K -> 200K). On an
   Android setup I measured an IOPS reduction of 40% (100K -> 60K) due
   to CONFIG_IRQ_TIME_ACCOUNTING. This is not acceptable.

Thanks,

Bart.