[PATCH v2 0/2] Two semi-related perf throttling fixes

Calvin Owens posted 2 patches 1 month, 2 weeks ago
arch/x86/events/amd/ibs.c |  6 +++---
arch/x86/events/core.c    |  3 ++-
kernel/events/core.c      | 16 ++++++++++++++++
3 files changed, 21 insertions(+), 4 deletions(-)
Hi all,

This is a refresh of this little series:
https://lore.kernel.org/lkml/cover.1774969692.git.calvin@wbinvd.org/

Patch [1/2] is the same, patch [2/2] has two minor fixes described in
the patch itself. The remainder of this cover is the same as in v1.

Cheers,
Calvin

---

In the course of investigating [1], I set out to understand why this
sequence of messages is printed every boot, even when nobody is using
perf at all:

    perf: interrupt took too long (2516 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
    perf: interrupt took too long (3156 > 3145), lowering kernel.perf_event_max_sample_rate to 63000
    perf: interrupt took too long (4014 > 3945), lowering kernel.perf_event_max_sample_rate to 49000
    perf: interrupt took too long (5035 > 5017), lowering kernel.perf_event_max_sample_rate to 39000
    perf: interrupt took too long (6302 > 6293), lowering kernel.perf_event_max_sample_rate to 31000
    perf: interrupt took too long (7879 > 7877), lowering kernel.perf_event_max_sample_rate to 25000
    perf: interrupt took too long (9852 > 9848), lowering kernel.perf_event_max_sample_rate to 20000

It turns out this happens because of how the dynamic sample rate
throttling interacts with the perf hardware watchdog. Patch [2/2] is my
attempt to prevent the dynamic throttling logic from acting solely based
on the latency of the watchdog NMI.
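
For readers who want to see where the numbers in the log come from, here is
a minimal userspace sketch of the throttling arithmetic as I read it in
perf_sample_event_took(). It is not the kernel code itself: the struct and
function names are made up for illustration, and HZ=1000 and a 25% CPU time
budget are assumptions inferred from the logged values, not stated anywhere
above.

```c
#include <stdint.h>

/* Assumed configuration, inferred from the log rather than stated in it. */
#define ASSUMED_HZ              1000u   /* assumes CONFIG_HZ=1000 */
#define ASSUMED_TICK_NSEC       (1000000000u / ASSUMED_HZ)
#define ASSUMED_CPU_MAX_PERCENT 25u     /* assumes the default CPU time budget */

/* Hypothetical container for the result of one throttling step. */
struct throttle_result {
    uint64_t allowed_ns;   /* threshold the next average is compared against */
    uint32_t sample_rate;  /* value reported as perf_event_max_sample_rate */
};

/*
 * Given an average NMI latency that exceeded the current threshold,
 * compute the new threshold and the lowered maximum sample rate.
 */
static struct throttle_result throttle_step(uint64_t avg_len_ns)
{
    struct throttle_result res;
    uint32_t max_per_tick;

    /* New threshold: the observed average plus 25% headroom. */
    avg_len_ns += avg_len_ns / 4;

    /* Nanoseconds per tick that sampling is allowed to consume. */
    max_per_tick = (ASSUMED_TICK_NSEC / 100u) * ASSUMED_CPU_MAX_PERCENT;

    /* How many samples of this length fit in that budget (at least one). */
    if (avg_len_ns < max_per_tick)
        max_per_tick /= (uint32_t)avg_len_ns;
    else
        max_per_tick = 1;

    res.allowed_ns = avg_len_ns;
    res.sample_rate = max_per_tick * ASSUMED_HZ;
    return res;
}
```

Under those assumptions, throttle_step(2516) gives a sample rate of 79000
and a new threshold of 3145, which is the threshold shown in the second log
line. Each step raises the threshold to just 25% above the last average, so
a slowly growing average trips it again and again.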

Intel CPUs were happy with that. But AMD CPUs still printed the messages!

That happens because AMD CPUs have a second PMU facility with its own
NMI handler, and both NMI handlers average in their latency, even when
they don't actually handle the NMI.
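
A toy model of that double accounting, in case it helps. This is only a
sketch, not the patch and not kernel code: every name below is
hypothetical, and the shared accumulator is a simplified stand-in for the
real latency averaging.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the shared latency accounting; not kernel code. */
struct latency_acct {
    uint64_t total_ns;  /* sum of all reported latencies */
    uint64_t samples;   /* number of latencies reported */
};

static void account_latency(struct latency_acct *acct, uint64_t elapsed_ns)
{
    acct->total_ns += elapsed_ns;
    acct->samples++;
}

/* Hypothetical handler that reports its latency whether or not it handled the NMI. */
static bool handler_unconditional(struct latency_acct *acct, bool handled,
                                  uint64_t elapsed_ns)
{
    account_latency(acct, elapsed_ns);
    return handled;
}

/* Hypothetical handler that reports its latency only when it handled the NMI. */
static bool handler_guarded(struct latency_acct *acct, bool handled,
                            uint64_t elapsed_ns)
{
    if (handled)
        account_latency(acct, elapsed_ns);
    return handled;
}
```

If one NMI is dispatched to two unconditional handlers and only one of them
owns it, the accumulator records two samples for a single interrupt. With
the guarded variant it records one.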

Patch [1/2] fixes that, which is a correctness issue entirely
independent of patch [2/2]. But it also happens to be required for patch
[2/2] to achieve its goal on AMD CPUs, so I sent them together.

Thanks,
Calvin

[1] https://lore.kernel.org/all/acMe-QZUel-bBYUh@mozart.vkv.me/

Calvin Owens (2):
  perf/x86: Avoid double accounting of PMU NMI latencies
  perf: Don't throttle based on NMI watchdog events

 arch/x86/events/amd/ibs.c |  6 +++---
 arch/x86/events/core.c    |  3 ++-
 kernel/events/core.c      | 16 ++++++++++++++++
 3 files changed, 21 insertions(+), 4 deletions(-)

-- 
2.47.3