The statistical time in /proc/stat can go backward: the value read first is 319, while the value read again is 318. For example:
first:
cat /proc/stat | grep cpu1
cpu1 319 0 496 41665 0 0 0 0 0 0
then:
cat /proc/stat | grep cpu1
cpu1 318 0 497 41674 0 0 0 0 0 0
Time going backward is counterintuitive.
After debugging, the problem turns out to be caused by the implementation of kcpustat_cpu_fetch_vtime(). As follows:
show_stat() runs on CPU0 and reads CPU1's statistics:

First (CPU1's rq->curr is running in user mode):
show_stat():
->kcpustat_cpu_fetch()
  ->kcpustat_cpu_fetch_vtime()
    ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu) + vtime->utime + delta;
      ---> rq->curr is in user mode, so utime and delta are added on top
      ---> rq->curr's vtime->utime is less than one tick

Then (CPU1's rq->curr is now running in kernel mode):
show_stat():
->kcpustat_cpu_fetch()
  ->kcpustat_cpu_fetch_vtime()
    ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu);
      ---> rq->curr is in kernel mode, so only kcpustat is returned
Fixes: 74722bb223d0 ("sched/vtime: Bring up complete kcpustat accessor")
Signed-off-by: Li Hua <hucool.lihua@huawei.com>
---
kernel/sched/core.c | 1 +
kernel/sched/cputime.c | 33 ++++++++++++++++++++++++++++++++-
kernel/sched/sched.h | 6 ++++++
3 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 189999007f32..c542b61cab54 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9753,6 +9753,7 @@ void __init sched_init(void)
rq->core_cookie = 0UL;
#endif
+ cputime_cpu_init(i);
}
set_load_weight(&init_task, false);
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 95fc77853743..ba3bcb40795e 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -1060,6 +1060,19 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
return 0;
}
+/*
+ * Stores the time of the last acquisition, which is used to handle the case of
+ * time backwards.
+ */
+static DEFINE_PER_CPU(struct kernel_cpustat, cpustat_prev);
+static DEFINE_PER_CPU(raw_spinlock_t, cpustat_prev_lock);
+
+void cputime_cpu_init(int cpu)
+{
+ raw_spin_lock_init(per_cpu_ptr(&cpustat_prev_lock, cpu));
+}
+
+
void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu)
{
const struct kernel_cpustat *src = &kcpustat_cpu(cpu);
@@ -1087,8 +1100,26 @@ void kcpustat_cpu_fetch(struct kernel_cpustat *dst, int cpu)
err = kcpustat_cpu_fetch_vtime(dst, src, curr, cpu);
rcu_read_unlock();
- if (!err)
+ if (!err) {
+ int i;
+ int map[5] = {CPUTIME_USER, CPUTIME_SYSTEM, CPUTIME_NICE,
+ CPUTIME_GUEST, CPUTIME_GUEST_NICE};
+ struct kernel_cpustat *prev = &per_cpu(cpustat_prev, cpu);
+ raw_spinlock_t *cpustat_lock = &per_cpu(cpustat_prev_lock, cpu);
+ u64 *dst_stat = dst->cpustat;
+ u64 *prev_stat = prev->cpustat;
+
+ raw_spin_lock(cpustat_lock);
+ for (i = 0; i < 5; i++) {
+ int idx = map[i];
+
+ if (dst_stat[idx] < prev_stat[idx])
+ dst_stat[idx] = prev_stat[idx];
+ }
+ *prev = *dst;
+ raw_spin_unlock(cpustat_lock);
return;
+ }
cpu_relax();
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index a6f071b2acac..cbe09795a394 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3156,4 +3156,10 @@ extern int sched_dynamic_mode(const char *str);
extern void sched_dynamic_update(int mode);
#endif
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
+extern void cputime_cpu_init(int cpu);
+#else
+static inline void cputime_cpu_init(int cpu) {}
+#endif
+
#endif /* _KERNEL_SCHED_SCHED_H */
--
2.17.1
On Sat, Aug 13, 2022 at 08:01:02AM +0800, Li Hua wrote:
> The problem that the statistical time goes backward, the value read first is 319, and the value read again is 318. As follows:
> first:
> cat /proc/stat | grep cpu1
> cpu1 319 0 496 41665 0 0 0 0 0 0
> then:
> cat /proc/stat | grep cpu1
> cpu1 318 0 497 41674 0 0 0 0 0 0
[...]

This is unreadable, what?!?
Assume that a CPU time "A" is read from /proc/stat, and after a while, a CPU time "B" is read. If T = B - A < 0, T wraps around to a huge value when treated as an unsigned integer. As a result, the CPU usage calculated this way becomes abnormally high. It seems to be a problem worth fixing.

Original link: https://lore.kernel.org/lkml/20220813000102.42051-1-hucool.lihua@huawei.com/

On 2022/8/15 16:15, Peter Zijlstra wrote:
> On Sat, Aug 13, 2022 at 08:01:02AM +0800, Li Hua wrote:
>> The problem that the statistical time goes backward, the value read first is 319, and the value read again is 318. As follows:
[...]
> This is unreadable, what?!?
It's unreadable, I'm sorry about that.

The CPU statistics read from /proc/stat should only ever increase, but the bug I found is that a value read later can be smaller than the one read before. The root cause is that "vtime->utime" and "delta" are temporarily added to the on-stack snapshot that is shown to the user, so whether 'vtime->utime + delta' is included depends on which task the CPU is executing. As below:

show_stat()
  -> kcpustat_cpu_fetch()
    -> kcpustat_cpu_fetch_vtime()
      -> cpustat[CPUTIME_USER] += vtime->utime + delta
ping... Thank you for your reply.