Message-ID: <20260210155055.180192150@kernel.org>
User-Agent: quilt/0.68
Date: Tue, 10 Feb 2026 10:50:29 -0500
From: Steven Rostedt
To: linux-kernel@vger.kernel.org
Cc: Masami Hiramatsu, Mark Rutland, Mathieu Desnoyers, Andrew Morton, Colin Lord
Subject: [for-next][PATCH 2/2] tracing: Fix false sharing in hwlat get_sample()
References: <20260210155027.468969167@kernel.org>

From: Colin Lord

The get_sample() function in the hwlat tracer assumes the caller holds
hwlat_data.lock, but the lock is not actually held. The result is
unprotected access to hwlat_data, which in per-cpu mode can cause false
sharing that may show up as false-positive latency events.

The false sharing observed was primarily between hwlat_data.sample_width
and hwlat_data.count. These fields are separated by just 8 bytes and are
therefore likely to share a cache line. When one thread modifies count,
the cache line is left in the modified state, so when other threads read
sample_width in the main latency detection loop, they must fetch the
modified cache line. On some systems the fetch itself may be slow enough
to count as a latency event, which can set up a self-reinforcing cycle:
each event increments count, which causes more slow fetches and
therefore more latency events.

The other result of the unprotected access is that hwlat_data.count can
end up with duplicate or missed values, which was observed on some
systems in testing.
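The "separated by just 8 bytes, therefore likely to share a cache line"
reasoning can be checked with a little address arithmetic. The sketch
below is a hypothetical user-space mirror of the three fields involved
(struct hwlat_layout is not a kernel type, and the leading mutex is
omitted); it assumes 64-byte cache lines and an 8-byte-aligned base
address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on most current x86-64 CPUs. */
#define CACHE_LINE_SIZE 64u

/*
 * Hypothetical user-space mirror of the hwlat_data fields involved.
 * The real structure starts with a struct mutex, which is omitted here.
 */
struct hwlat_layout {
	uint64_t count;		/* written once per latency event */
	uint64_t sample_window;	/* the 8 bytes between the two hot fields */
	uint64_t sample_width;	/* read on every pass of the sampling loop */
};

/* Bytes between the end of count and the start of sample_width. */
size_t count_width_gap(void)
{
	return offsetof(struct hwlat_layout, sample_width) -
	       (offsetof(struct hwlat_layout, count) + sizeof(uint64_t));
}

/*
 * Returns true when count and sample_width fall in the same cache line
 * for a structure placed at address base.
 */
bool count_width_share_line(uintptr_t base)
{
	uintptr_t count_addr = base + offsetof(struct hwlat_layout, count);
	uintptr_t width_addr = base + offsetof(struct hwlat_layout, sample_width);

	/*
	 * With an 8-byte-aligned base, neither 8-byte field straddles a
	 * line, so comparing the line index of each first byte is enough.
	 */
	assert(base % sizeof(uint64_t) == 0);
	return count_addr / CACHE_LINE_SIZE == width_addr / CACHE_LINE_SIZE;
}
```

Sweeping base over the eight 8-byte-aligned offsets within a line, the
two fields share a line whenever the offset is below 48, i.e. in six of
eight placements, which is one way to read "likely" above.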
Convert hwlat_data.count to atomic64_t so it can be safely modified
without locking, and prevent false sharing by copying sample_width into
a local variable.

One system this was tested on was a dual-socket server with 32 CPUs on
each NUMA node. With settings of 1us threshold, 1000us width, and 2000us
window, this change reduced the number of latency events from 500 per
second to approximately 1 per minute. Some machines tested did not
exhibit measurable latency from the false sharing.

Cc: Masami Hiramatsu
Cc: Mathieu Desnoyers
Link: https://patch.msgid.link/20260210074810.6328-1-clord@mykolab.com
Signed-off-by: Colin Lord
Signed-off-by: Steven Rostedt (Google)
---
 kernel/trace/trace_hwlat.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index 2f7b94e98317..3fe274b84f1c 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -102,9 +102,9 @@ struct hwlat_sample {
 /* keep the global state somewhere. */
 static struct hwlat_data {
 
-	struct mutex lock;		/* protect changes */
+	struct mutex	lock;		/* protect changes */
 
-	u64	count;			/* total since reset */
+	atomic64_t	count;		/* total since reset */
 
 	u64	sample_window;		/* total sampling window (on+off) */
 	u64	sample_width;		/* active sampling portion of window */
@@ -193,8 +193,7 @@ void trace_hwlat_callback(bool enter)
  * get_sample - sample the CPU TSC and look for likely hardware latencies
  *
  * Used to repeatedly capture the CPU TSC (or similar), looking for potential
- * hardware-induced latency. Called with interrupts disabled and with
- * hwlat_data.lock held.
+ * hardware-induced latency. Called with interrupts disabled.
  */
 static int get_sample(void)
 {
@@ -204,6 +203,7 @@ static int get_sample(void)
 	time_type start, t1, t2, last_t2;
 	s64 diff, outer_diff, total, last_total = 0;
 	u64 sample = 0;
+	u64 sample_width = READ_ONCE(hwlat_data.sample_width);
 	u64 thresh = tracing_thresh;
 	u64 outer_sample = 0;
 	int ret = -1;
@@ -267,7 +267,7 @@ static int get_sample(void)
 		if (diff > sample)
 			sample = diff; /* only want highest value */
 
-	} while (total <= hwlat_data.sample_width);
+	} while (total <= sample_width);
 
 	barrier(); /* finish the above in the view for NMIs */
 	trace_hwlat_callback_enabled = false;
@@ -285,8 +285,7 @@ static int get_sample(void)
 	if (kdata->nmi_total_ts)
 		do_div(kdata->nmi_total_ts, NSEC_PER_USEC);
 
-	hwlat_data.count++;
-	s.seqnum = hwlat_data.count;
+	s.seqnum = atomic64_inc_return(&hwlat_data.count);
 	s.duration = sample;
 	s.outer_duration = outer_sample;
 	s.nmi_total_ts = kdata->nmi_total_ts;
@@ -832,7 +831,7 @@ static int hwlat_tracer_init(struct trace_array *tr)
 
 	hwlat_trace = tr;
 
-	hwlat_data.count = 0;
+	atomic64_set(&hwlat_data.count, 0);
 	tr->max_latency = 0;
 	save_tracing_thresh = tracing_thresh;
 
-- 
2.51.0
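As a rough user-space analog of the counter change in this patch (not
kernel code), the C11 sketch below uses atomic_fetch_add(), which returns
the previous value, plus one to get the same inc-and-return result as
atomic64_inc_return(). The names seq_count, seq_inc_return() and
seqnums_unique() are hypothetical, and POSIX threads stand in for the
per-CPU sampling threads. With a plain count++ followed by a separate
read, two threads could read the same value or lose an increment, which
is the duplicate/missed seqnum symptom described in the commit message;
the atomic version hands out each value exactly once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_WORKERS 16

/* Hypothetical shared counter, standing in for hwlat_data.count. */
static _Atomic uint64_t seq_count;

/* fetch_add returns the old value; +1 gives inc-and-return semantics. */
uint64_t seq_inc_return(void)
{
	return atomic_fetch_add(&seq_count, 1) + 1;
}

struct worker {
	uint64_t *out;	/* this worker's slice of the results array */
	size_t n;	/* how many sequence numbers to draw */
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	for (size_t i = 0; i < w->n; i++)
		w->out[i] = seq_inc_return();
	return NULL;
}

/*
 * Runs nworkers threads, each drawing per_worker sequence numbers, and
 * returns true if every value handed out lies in 1..total and none
 * repeats (which, by counting, also means none is missing).
 */
bool seqnums_unique(size_t nworkers, size_t per_worker)
{
	pthread_t tid[MAX_WORKERS];
	struct worker w[MAX_WORKERS];
	size_t total = nworkers * per_worker;
	uint64_t *all;
	bool *seen;
	bool ok = true;

	assert(nworkers >= 1 && nworkers <= MAX_WORKERS && per_worker >= 1);
	all = malloc(total * sizeof(*all));
	seen = calloc(total + 1, sizeof(*seen));
	assert(all != NULL && seen != NULL);

	atomic_store(&seq_count, 0);	/* like atomic64_set(&count, 0) */

	for (size_t i = 0; i < nworkers; i++) {
		w[i].out = all + i * per_worker;
		w[i].n = per_worker;
		if (pthread_create(&tid[i], NULL, worker_fn, &w[i]) != 0)
			abort();
	}
	for (size_t i = 0; i < nworkers; i++)
		pthread_join(tid[i], NULL);

	for (size_t i = 0; i < total; i++) {
		uint64_t s = all[i];

		if (s == 0 || s > total || seen[s]) {	/* out of range or repeated */
			ok = false;
			break;
		}
		seen[s] = true;
	}
	free(all);
	free(seen);
	return ok;
}
```

An atomic counter is still a shared, frequently written cache line; it
is the other half of the patch, reading sample_width once into a local
before the sampling loop, that keeps the hot loop from touching that
line on every iteration.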