[PATCH v3 bpf-next 0/4] sched: Fix irq accounting for CONFIG_IRQ_TIME_ACCOUNTING

Yafang Shao posted 4 patches 1 month, 1 week ago
There is a newer version of this series
Posted by Yafang Shao 1 month, 1 week ago
After enabling CONFIG_IRQ_TIME_ACCOUNTING to track IRQ pressure in our
container environment, we encountered several user-visible behavioral
changes:

- Time spent in IRQ/softirq while a task is interrupted is not accounted
  for in the cpuacct cgroup

  This breaks userspace applications that rely on CPU usage data from
  cgroups to monitor CPU pressure. This patchset resolves the issue by
  ensuring that IRQ/softirq time is accounted for in the cgroup of the
  interrupted tasks.

- getrusage(2) does not include time interrupted by IRQ/softirq

  Some services use getrusage(2) to check if workloads are experiencing CPU
  pressure. Since IRQ/softirq time is no longer charged to task runtime,
  getrusage(2) can no longer reflect the CPU pressure caused by heavy
  interrupts.
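
  For illustration only, a check of this kind might look roughly like the
  sketch below. The helper names (tv_to_sec, cpu_seconds, cpu_fraction) are
  made up for this example and are not part of this series; only
  getrusage(2) itself is existing API. The sketch assumes the service
  samples its CPU time at two points and measures the elapsed wall time
  between them separately.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds (hypothetical helper). */
double tv_to_sec(struct timeval tv)
{
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * User + system CPU time charged to the calling process so far, as
 * reported by getrusage(2). Returns a negative value on failure.
 */
double cpu_seconds(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1.0;

	return tv_to_sec(ru.ru_utime) + tv_to_sec(ru.ru_stime);
}

/*
 * Fraction of a wall-clock interval that was charged to the process as
 * CPU time (hypothetical helper). Both deltas are in seconds.
 */
double cpu_fraction(double cpu_delta, double wall_delta)
{
	if (wall_delta <= 0.0)
		return 0.0;

	return cpu_delta / wall_delta;
}
```

  A monitor would call cpu_seconds() at the start and end of an interval
  and pass the difference to cpu_fraction(). With
  CONFIG_IRQ_TIME_ACCOUNTING enabled, time taken by IRQ/softirq is not
  part of ru_utime or ru_stime, so the fraction drops under heavy
  interrupt load even though the task was not idle by choice.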

This patchset addresses the first issue, which is relatively
straightforward. However, the second issue remains unresolved, as there
might be debate over whether interrupted time should be considered part of
a task’s usage. Nonetheless, it is important to report interrupted time to
the user via some metric, though that is a separate discussion.

Changes:
v2->v3:
- Add a helper account_irqtime() to avoid redundant code (Johannes)

v1->v2: https://lore.kernel.org/cgroups/20241008061951.3980-1-laoar.shao@gmail.com/
- Fix lockdep issues reported by kernel test robot <oliver.sang@intel.com>

v1: https://lore.kernel.org/all/20240923090028.16368-1-laoar.shao@gmail.com/

Yafang Shao (4):
  sched: Define sched_clock_irqtime as static key
  sched: Don't account irq time if sched_clock_irqtime is disabled
  sched, psi: Don't account irq time if sched_clock_irqtime is disabled
  sched: Fix cgroup irq accounting for CONFIG_IRQ_TIME_ACCOUNTING

 kernel/sched/core.c    | 78 +++++++++++++++++++++++++++++-------------
 kernel/sched/cputime.c | 16 ++++-----
 kernel/sched/psi.c     | 12 ++-----
 kernel/sched/sched.h   |  1 +
 kernel/sched/stats.h   |  7 ++--
 5 files changed, 69 insertions(+), 45 deletions(-)

-- 
2.43.5

Re: [PATCH v3 bpf-next 0/4] sched: Fix irq accounting for CONFIG_IRQ_TIME_ACCOUNTING
Posted by Yafang Shao 1 month ago
On Mon, Oct 14, 2024 at 11:11 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> After enabling CONFIG_IRQ_TIME_ACCOUNTING to track IRQ pressure in our
> container environment, we encountered several user-visible behavioral
> changes:
>
> - Interrupted IRQ/softirq time is not accounted for in the cpuacct cgroup
>
>   This breaks userspace applications that rely on CPU usage data from
>   cgroups to monitor CPU pressure. This patchset resolves the issue by
>   ensuring that IRQ/softirq time is accounted for in the cgroup of the
>   interrupted tasks.
>
> - getrusage(2) does not include time interrupted by IRQ/softirq
>
>   Some services use getrusage(2) to check if workloads are experiencing CPU
>   pressure. Since IRQ/softirq time is no longer charged to task runtime,
>   getrusage(2) can no longer reflect the CPU pressure caused by heavy
>   interrupts.
>
> This patchset addresses the first issue, which is relatively
> straightforward. However, the second issue remains unresolved, as there
> might be debate over whether interrupted time should be considered part of
> a task’s usage. Nonetheless, it is important to report interrupted time to
> the user via some metric, though that is a separate discussion.
>
> Changes:
> v2->v3:
> - Add a helper account_irqtime() to avoid redundant code (Johannes)
>
> v1->v2: https://lore.kernel.org/cgroups/20241008061951.3980-1-laoar.shao@gmail.com/
> - Fix lockdep issues reported by kernel test robot <oliver.sang@intel.com>
>
> v1: https://lore.kernel.org/all/20240923090028.16368-1-laoar.shao@gmail.com/
>
> Yafang Shao (4):
>   sched: Define sched_clock_irqtime as static key
>   sched: Don't account irq time if sched_clock_irqtime is disabled
>   sched, psi: Don't account irq time if sched_clock_irqtime is disabled
>   sched: Fix cgroup irq accounting for CONFIG_IRQ_TIME_ACCOUNTING
>
>  kernel/sched/core.c    | 78 +++++++++++++++++++++++++++++-------------
>  kernel/sched/cputime.c | 16 ++++-----
>  kernel/sched/psi.c     | 12 ++-----
>  kernel/sched/sched.h   |  1 +
>  kernel/sched/stats.h   |  7 ++--
>  5 files changed, 69 insertions(+), 45 deletions(-)
>
> --
> 2.43.5
>

Hello Peter, Hello Ingo,

Do you have any comments or suggestions on this fix?

-- 
Regards
Yafang