From: Calvin Owens
To: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org, x86@kernel.org, Peter Zijlstra,
    Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
    Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
    Thomas Gleixner, Borislav Petkov, Dave Hansen, "H. Peter Anvin"
Subject: [PATCH 2/2] perf: Don't throttle based on NMI watchdog events
Date: Tue, 31 Mar 2026 08:25:50 -0700
Message-ID: <685061daed5b505ef604835b602e0202531f2315.1774969692.git.calvin@wbinvd.org>
X-Mailer: git-send-email 2.47.3

The throttling logic in perf_sample_event_took() assumes the NMI is
running at the maximum allowed sample rate.
While this makes sense most of the time, it wildly overestimates the
runtime of the NMI for the perf hardware watchdog:

    # bpftrace -e 'kprobe:perf_sample_event_took { \
        printf("%s: cpu=%02d time_taken=%dns\n", \
        strftime("%H:%M:%S.%f", nsecs), cpu(), arg0); }'
    03:12:13.087003: cpu=00 time_taken=3190ns
    03:12:13.486789: cpu=01 time_taken=2918ns
    03:12:18.075288: cpu=03 time_taken=3308ns
    03:12:19.797207: cpu=02 time_taken=2581ns
    03:12:23.110317: cpu=00 time_taken=2823ns
    03:12:23.510308: cpu=01 time_taken=2943ns
    03:12:29.229348: cpu=03 time_taken=3669ns
    03:12:31.656306: cpu=02 time_taken=3262ns

The NMI for the watchdog runs for 2-4us every ten seconds, but the math
done in perf_sample_event_took() concludes it is running for 200-400ms
every second!

When it is the only PMU event running, it can take minutes to hours of
samples from the watchdog for the moving average to accumulate to
something near the real mean, which causes the same little "litany" of
sample rate throttles to happen every time Linux boots with the perf
hardware watchdog enabled:

    perf: interrupt took too long (2526 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
    perf: interrupt took too long (3177 > 3157), lowering kernel.perf_event_max_sample_rate to 62000
    perf: interrupt took too long (3979 > 3971), lowering kernel.perf_event_max_sample_rate to 50000
    perf: interrupt took too long (4983 > 4973), lowering kernel.perf_event_max_sample_rate to 40000

This serves no purpose: it doesn't actually affect the runtime of the
watchdog NMI at all. It confuses users, because it suggests their
machine is spinning its wheels in interrupts when it isn't.

Because the watchdog NMI is so infrequent, we can avoid throttling it by
making the throttling a two-step process: load and update a timestamp
whenever we think we need to throttle, and only actually proceed to
throttle if the last time that happened was less than one second ago.
This is inelegant, but it avoids touching the hot path and preserves
current throttling behavior for real PMU use, at the cost of delaying
the throttling by a single NMI.

Signed-off-by: Calvin Owens
---
 kernel/events/core.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 89b40e439717..0f7a7e912f55 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -623,6 +623,7 @@ core_initcall(init_events_core_sysctls);
  */
 #define NR_ACCUMULATED_SAMPLES 128
 static DEFINE_PER_CPU(u64, running_sample_length);
+static DEFINE_PER_CPU(u64, last_throttle_clock);
 
 static u64 __report_avg;
 static u64 __report_allowed;
@@ -643,6 +644,8 @@ void perf_sample_event_took(u64 sample_len_ns)
 	u64 max_len = READ_ONCE(perf_sample_allowed_ns);
 	u64 running_len;
 	u64 avg_len;
+	u64 delta;
+	u64 now;
 	u32 max;
 
 	if (max_len == 0)
@@ -663,6 +666,17 @@ void perf_sample_event_took(u64 sample_len_ns)
 	if (avg_len <= max_len)
 		return;
 
+	/*
+	 * Very infrequent events like the perf counter hard watchdog
+	 * can trigger spurious throttling: skip throttling if the prior
+	 * NMI got here more than one second before this NMI began.
+	 */
+	now = local_clock();
+	delta = now - __this_cpu_read(last_throttle_clock);
+	__this_cpu_write(last_throttle_clock, now);
+	if (delta - sample_len_ns > NSEC_PER_SEC)
+		return;
+
 	__report_avg = avg_len;
 	__report_allowed = max_len;
 
-- 
2.47.3
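
For illustration only, and not part of the patch: a minimal userspace
sketch of the arithmetic in the commit message and of the two-step
check the diff adds. It assumes the maximum sample rate is 100000
samples per second, which is consistent with the first throttle
message above but is an assumption here. The names
ASSUMED_MAX_SAMPLE_RATE, assumed_ns_per_sec(), actual_ns_per_sec(),
struct throttle_gate and gate_allows_throttle() are hypothetical, and a
plain struct field stands in for the per-CPU last_throttle_clock and
for local_clock().

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Assumption: maximum allowed sample rate, in samples per second. */
#define ASSUMED_MAX_SAMPLE_RATE 100000ULL

/*
 * CPU time per second implied when every sample period is assumed to
 * contain one NMI that runs for sample_len_ns.
 */
uint64_t assumed_ns_per_sec(uint64_t sample_len_ns)
{
	return sample_len_ns * ASSUMED_MAX_SAMPLE_RATE;
}

/* CPU time per second actually used by an NMI firing once per period_ns. */
uint64_t actual_ns_per_sec(uint64_t sample_len_ns, uint64_t period_ns)
{
	return sample_len_ns * NSEC_PER_SEC / period_ns;
}

/* Hypothetical stand-in for the per-CPU last_throttle_clock variable. */
struct throttle_gate {
	uint64_t last_clock_ns;
};

/*
 * Models the check added by the diff: always record the current time,
 * and allow throttling only if the previous over-threshold NMI reached
 * this point no more than one second before the current NMI began.
 */
bool gate_allows_throttle(struct throttle_gate *g, uint64_t now_ns,
			  uint64_t sample_len_ns)
{
	uint64_t delta = now_ns - g->last_clock_ns;

	g->last_clock_ns = now_ns;
	return delta - sample_len_ns <= NSEC_PER_SEC;
}
```

With a 3000ns NMI, assumed_ns_per_sec() returns 300000000 (300ms per
second), while actual_ns_per_sec() with a ten-second period returns 300.
Calling gate_allows_throttle() for NMIs ten seconds apart returns false
every time, whereas a second over-threshold NMI arriving 10us after the
previous one returns true, which is the "delayed by a single NMI"
behavior described above.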