From: Joshua Hahn <joshua.hahn6@gmail.com>

v2 -> v3: Signed-off-by & renamed subject for clarity.
v1 -> v2: Edited commit messages for clarity.

Niced CPU usage is a metric reported in host-level /proc/stat, but is
not reported in cgroup-level statistics in cpu.stat. However, when a
host contains multiple tasks across different workloads, it becomes
difficult to gauge how much CPU time is being spent on niced processes
based on /proc/stat alone, since host-level metrics do not provide this
cgroup-level granularity.

Exposing this metric will allow users to accurately probe the niced CPU
metric for each workload, and make more informed decisions when
directing higher priority tasks.

Joshua Hahn (2):
  Tracking cgroup-level niced CPU time
  Selftests for niced CPU statistics

 include/linux/cgroup-defs.h               |  1 +
 kernel/cgroup/rstat.c                     | 16 ++++-
 tools/testing/selftests/cgroup/test_cpu.c | 72 +++++++++++++++++++++++
 3 files changed, 86 insertions(+), 3 deletions(-)

--
2.43.5
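For readers unfamiliar with the host-level metric the cover letter refers to, here is a minimal sketch of how the niced share of CPU time could be derived from two samples of /proc/stat. The field order follows proc_stat(5); the function names and the sample values are hypothetical and only serve the illustration. Note that this yields a single host-wide number, which is exactly the granularity limitation described above.

```python
# Sketch only: derive the host-wide niced CPU share from /proc/stat text.
# Field order per proc_stat(5); values are in USER_HZ ticks.
# guest and guest_nice are left out because they are already included
# in user and nice respectively.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat_cpu(text):
    """Return the aggregate 'cpu' line of /proc/stat as a dict of tick counts."""
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = (int(v) for v in parts[1:1 + len(FIELDS)])
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def nice_share(before, after):
    """Fraction of CPU time spent on niced tasks between two samples."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    return delta["nice"] / total if total else 0.0


# Hypothetical samples; on a real host, read /proc/stat twice instead.
SAMPLE_T0 = "cpu  1000 200 300 8000 50 0 10 0 0 0\ncpu0 500 100 150 4000 25 0 5 0 0 0\n"
SAMPLE_T1 = "cpu  1100 500 350 8500 50 0 10 0 0 0\n"

if __name__ == "__main__":
    t0 = parse_proc_stat_cpu(SAMPLE_T0)
    t1 = parse_proc_stat_cpu(SAMPLE_T1)
    print(f"niced share: {nice_share(t0, t1):.3f}")
```

On a live system the two samples would come from reading /proc/stat at two points in time; the result says nothing about which workload the niced time belongs to.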
On Mon, Sep 23, 2024 at 07:20:04AM GMT, Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> From: Joshua Hahn <joshua.hahn6@gmail.com>
>
> v2 -> v3: Signed-off-by & renamed subject for clarity.
> v1 -> v2: Edited commit messages for clarity.

Thanks for the version changelog, appreciated!

...
> Exposing this metric will allow users to accurately probe the niced CPU
> metric for each workload, and make more informed decisions when
> directing higher priority tasks.

Possibly an example of how this value (combined with some other?) is
used for decisions could shed some light on this and justify adding this
attribute.

Thanks,
Michal

(I'll respond here to Tejun's message from v2 thread.)

On Tue, Sep 10, 2024 at 11:01:07AM GMT, Tejun Heo <tj@kernel.org> wrote:
> I think it's as useful as system-wide nice metric is.

Exactly -- and I don't understand how that system-wide value (without
any cgroups) is useful.
If I don't know how many niced and non-niced tasks there are and what
their runnable patterns are, the aggregated nice time can have ambiguous
interpretations.

> I think there are benefits to mirroring system wide metrics, at least
> ones as widely spread as nice.

I agree with benefits of mirroring of some system wide metrics when they
are useful <del>but not all of them because it's difficult/impossible to take
them away once they're exposed</del>. Actually, readers _should_ handle
missing keys gracefully, so this may be just fine.

(Is this nice time widely spread? (I remember the field from `top`, still
not sure how to use it.) Are other proc_stat(5) fields different?
I see how this can be the global analog on leaf cgroups but
interpreting middle cgroups with children of different cpu.weights?)
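The remark that readers should handle missing keys gracefully can be sketched as follows. cpu.stat is a flat-keyed file with one "key value" pair per line. The key name `nice_usec` is an assumption here, since this thread does not spell out the name the series exposes, and the helper names are hypothetical.

```python
# Sketch only: a cpu.stat reader that tolerates absent or unknown keys.
# "nice_usec" is an ASSUMED key name, not confirmed by this thread.
def parse_flat_keyed(text):
    """Parse 'key value' lines into a dict, skipping malformed lines."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # not a flat-keyed entry
        key, value = parts
        try:
            stats[key] = int(value)
        except ValueError:
            continue  # non-numeric value, ignore rather than fail
    return stats


def nice_usec_or_none(text):
    """Return the niced CPU time in microseconds, or None if not reported."""
    return parse_flat_keyed(text).get("nice_usec")


# Hypothetical file contents from a kernel with and without the key.
WITH_KEY = "usage_usec 5000\nuser_usec 3000\nsystem_usec 2000\nnice_usec 1200\n"
WITHOUT_KEY = "usage_usec 5000\nuser_usec 3000\nsystem_usec 2000\n"

if __name__ == "__main__":
    print(nice_usec_or_none(WITH_KEY))
    print(nice_usec_or_none(WITHOUT_KEY))
```

A reader written this way keeps working whether the key is added, absent on an older kernel, or later removed, which is the property the reply relies on.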
Hello, Michal.

On Thu, Sep 26, 2024 at 08:10:35PM +0200, Michal Koutný wrote:
...
> On Tue, Sep 10, 2024 at 11:01:07AM GMT, Tejun Heo <tj@kernel.org> wrote:
> > I think it's as useful as system-wide nice metric is.
>
> Exactly -- and I don't understand how that system-wide value (without
> any cgroups) is useful.
> If I don't know how many niced and non-niced tasks there are and what
> their runnable patterns are, the aggregated nice time can have ambiguous
> interpretations.
>
> > I think there are benefits to mirroring system wide metrics, at least
> > ones as widely spread as nice.
>
> I agree with benefits of mirroring of some system wide metrics when they
> are useful <del>but not all of them because it's difficult/impossible to take
> them away once they're exposed</del>. Actually, readers _should_ handle
> missing keys gracefully, so this may be just fine.
>
> (Is this nice time widely spread? (I remember the field from `top`, still
> not sure how to use it.) Are other proc_stat(5) fields different?

A personal anecdote: I usually run compile jobs with nice and look at the
nice utilization to see what the system is doing. I think it'd be similar
for most folks. Because the number has always been there and is ubiquitous
across many monitoring tools, people end up using it for something. It's
not a great metric but a long-standing and widely available one, so it
ends up with usages.

BTW, there are numbers which are actively silly - e.g. iowait, especially
due to how it gets aggregated across multiple CPUs. That, we want to
actively drop, especially as the pressure metrics are the better
substitute. I don't think nice is in that category. It's not the best
metric there is but not useless or misleading.

> I see how this can be the global analog on leaf cgroups but
> interpreting middle cgroups with children of different cpu.weights?)

I think aggregating per-thread numbers is the right thing to do. It's just
the sum of CPU cycles spent by threads which got niced.

Thanks.

-- tejun
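
As an illustration of the aggregation described in the reply above, the following sketch sums the user CPU time of niced threads (nice value greater than 0) over a cgroup and all of its descendants. The `Thread` and `Cgroup` types and all numbers are hypothetical; this models the arithmetic only and is not how kernel/cgroup/rstat.c is implemented.

```python
# Sketch only: a toy model of hierarchical aggregation, not the rstat code.
from dataclasses import dataclass, field


@dataclass
class Thread:
    nice: int        # nice value, -20..19
    user_usec: int   # user CPU time consumed, in microseconds


@dataclass
class Cgroup:
    threads: list = field(default_factory=list)
    children: list = field(default_factory=list)


def nice_usec(cg):
    """CPU time of niced threads in cg and every descendant cgroup."""
    own = sum(t.user_usec for t in cg.threads if t.nice > 0)
    return own + sum(nice_usec(child) for child in cg.children)


# Hypothetical hierarchy: one middle cgroup with two leaves.
leaf_a = Cgroup(threads=[Thread(nice=10, user_usec=400), Thread(nice=0, user_usec=900)])
leaf_b = Cgroup(threads=[Thread(nice=19, user_usec=100)])
middle = Cgroup(children=[leaf_a, leaf_b])

if __name__ == "__main__":
    print(nice_usec(leaf_a), nice_usec(leaf_b), nice_usec(middle))
```

In this model the value for a middle cgroup is a plain sum of its subtree and does not depend on the children's cpu.weight settings; how useful that sum is for decisions remains the open question raised earlier in the thread.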