From nobody Tue Jun 23 00:43:35 2026
From: Waiman Long
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, "H. Peter Anvin", Feng Tang, Bill Gray, Jirka Hladky, Waiman Long
Subject: [PATCH 1/2] x86/tsc: Reduce external interference on max_warp detection
Date: Mon, 14 Mar 2022 15:46:29 -0400
Message-Id: <20220314194630.1726542-2-longman@redhat.com>
In-Reply-To: <20220314194630.1726542-1-longman@redhat.com>
References: <20220314194630.1726542-1-longman@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The TSC max_warp detection code in check_tsc_warp() is very timing
sensitive. Due to the possibility of false cacheline sharing, activities
done in other CPUs may have an impact on the max_warp detection process.
Put the max_warp detection data variables on their own cacheline to
reduce that kind of external interference.
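
For illustration only, and not part of this patch: a minimal userspace
sketch of the same idea, grouping the shared variables into one structure
that starts on a cache line boundary. It assumes a 64-byte cache line and
uses C11 alignas() as a stand-in for ____cacheline_aligned_in_smp. The
names warp_sync, sync_data and starts_on_cacheline() are hypothetical, and
the int lock member is only a placeholder for arch_spinlock_t.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on most current x86 CPUs. */
#define CACHELINE_SIZE 64

/*
 * Hypothetical userspace analogue of the patch's "sync" structure.
 * Aligning the first member raises the alignment of the whole struct,
 * and the compiler pads its size to a multiple of that alignment.
 */
struct warp_sync {
	alignas(CACHELINE_SIZE) int lock;	/* placeholder for arch_spinlock_t */
	int nr_warps;
	int random_warps;
	uint64_t last_tsc;
	uint64_t max_warp;
};

/* Compile-time check that the struct occupies whole cache lines. */
static_assert(sizeof(struct warp_sync) % CACHELINE_SIZE == 0,
	      "warp_sync must fill whole cache lines");

static struct warp_sync sync_data;

/* Returns 1 if the object at p starts on a cache line boundary. */
static int starts_on_cacheline(const void *p)
{
	return ((uintptr_t)p % CACHELINE_SIZE) == 0;
}
```

Because the structure is both aligned and padded to the cache line size,
unrelated variables cannot land on the line that holds the lock and the
warp counters, which is the kind of interference described above.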

Signed-off-by: Waiman Long
---
 arch/x86/kernel/tsc_sync.c | 57 ++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 9452dc9664b5..70aeb254b62b 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -253,12 +253,15 @@ static atomic_t test_runs;
  * we want to have the fastest, inlined, non-debug version
  * of a critical section, to be able to prove TSC time-warps:
  */
-static arch_spinlock_t sync_lock = __ARCH_SPIN_LOCK_UNLOCKED;
-
-static cycles_t last_tsc;
-static cycles_t max_warp;
-static int nr_warps;
-static int random_warps;
+static struct {
+	arch_spinlock_t lock;
+	int nr_warps;
+	int random_warps;
+	cycles_t last_tsc;
+	cycles_t max_warp;
+} sync ____cacheline_aligned_in_smp = {
+	.lock = __ARCH_SPIN_LOCK_UNLOCKED,
+};
 
 /*
  * TSC-warp measurement loop running on both CPUs. This is not called
@@ -281,11 +284,11 @@ static cycles_t check_tsc_warp(unsigned int timeout)
 		 * previous TSC that was measured (possibly on
 		 * another CPU) and update the previous TSC timestamp.
 		 */
-		arch_spin_lock(&sync_lock);
-		prev = last_tsc;
+		arch_spin_lock(&sync.lock);
+		prev = sync.last_tsc;
 		now = rdtsc_ordered();
-		last_tsc = now;
-		arch_spin_unlock(&sync_lock);
+		sync.last_tsc = now;
+		arch_spin_unlock(&sync.lock);
 
 		/*
 		 * Be nice every now and then (and also check whether
@@ -304,18 +307,18 @@ static cycles_t check_tsc_warp(unsigned int timeout)
 		 * we saw a time-warp of the TSC going backwards:
 		 */
 		if (unlikely(prev > now)) {
-			arch_spin_lock(&sync_lock);
-			max_warp = max(max_warp, prev - now);
-			cur_max_warp = max_warp;
+			arch_spin_lock(&sync.lock);
+			sync.max_warp = max(sync.max_warp, prev - now);
+			cur_max_warp = sync.max_warp;
 			/*
 			 * Check whether this bounces back and forth. Only
 			 * one CPU should observe time going backwards.
 			 */
-			if (cur_warps != nr_warps)
-				random_warps++;
-			nr_warps++;
-			cur_warps = nr_warps;
-			arch_spin_unlock(&sync_lock);
+			if (cur_warps != sync.nr_warps)
+				sync.random_warps++;
+			sync.nr_warps++;
+			cur_warps = sync.nr_warps;
+			arch_spin_unlock(&sync.lock);
 		}
 	}
 	WARN(!(now-start),
@@ -394,21 +397,21 @@ void check_tsc_sync_source(int cpu)
 	 * stop. If not, decrement the number of runs an check if we can
 	 * retry. In case of random warps no retry is attempted.
 	 */
-	if (!nr_warps) {
+	if (!sync.nr_warps) {
 		atomic_set(&test_runs, 0);
 
 		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
 			smp_processor_id(), cpu);
 
-	} else if (atomic_dec_and_test(&test_runs) || random_warps) {
+	} else if (atomic_dec_and_test(&test_runs) || sync.random_warps) {
 		/* Force it to 0 if random warps brought us here */
 		atomic_set(&test_runs, 0);
 
 		pr_warn("TSC synchronization [CPU#%d -> CPU#%d]:\n",
 			smp_processor_id(), cpu);
 		pr_warn("Measured %Ld cycles TSC warp between CPUs, "
-			"turning off TSC clock.\n", max_warp);
-		if (random_warps)
+			"turning off TSC clock.\n", sync.max_warp);
+		if (sync.random_warps)
 			pr_warn("TSC warped randomly between CPUs\n");
 		mark_tsc_unstable("check_tsc_sync_source failed");
 	}
@@ -417,10 +420,10 @@ void check_tsc_sync_source(int cpu)
 	 * Reset it - just in case we boot another CPU later:
 	 */
 	atomic_set(&start_count, 0);
-	random_warps = 0;
-	nr_warps = 0;
-	max_warp = 0;
-	last_tsc = 0;
+	sync.random_warps = 0;
+	sync.nr_warps = 0;
+	sync.max_warp = 0;
+	sync.last_tsc = 0;
 
 	/*
 	 * Let the target continue with the bootup:
@@ -476,7 +479,7 @@ void check_tsc_sync_target(void)
 	/*
 	 * Store the maximum observed warp value for a potential retry:
 	 */
-	gbl_max_warp = max_warp;
+	gbl_max_warp = sync.max_warp;
 
 	/*
 	 * Ok, we are done:
-- 
2.27.0

From nobody Tue Jun 23 00:43:35 2026
From: Waiman Long
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, "H. Peter Anvin", Feng Tang, Bill Gray, Jirka Hladky, Waiman Long
Subject: [PATCH 2/2] x86/tsc_sync: Add synchronization overhead to tsc adjustment
Date: Mon, 14 Mar 2022 15:46:30 -0400
Message-Id: <20220314194630.1726542-3-longman@redhat.com>
In-Reply-To: <20220314194630.1726542-1-longman@redhat.com>
References: <20220314194630.1726542-1-longman@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

As stated in the comment of check_tsc_sync_target():

  The adjustment value is slightly off by the overhead of the
  sync mechanism (observed values are ~200 TSC cycles), but this
  really depends on CPU, node distance and frequency. So
  compensating for this is hard to get right.

That overhead, however, can cause the tsc adjustment to fail after 3
test runs as can be seen when booting up a hot 4-socket Intel CooperLake
system:

  [ 0.034090] TSC deadline timer available
  [ 0.008807] TSC ADJUST compensate: CPU36 observed 95626 warp. Adjust: 95626
  [ 0.008807] TSC ADJUST compensate: CPU36 observed 74 warp. Adjust: 95700
  [ 0.974281] TSC synchronization [CPU#0 -> CPU#36]:
  [ 0.974281] Measured 4 cycles TSC warp between CPUs, turning off TSC clock.
  [ 0.974281] tsc: Marking TSC unstable due to check_tsc_sync_source failed

To prevent this tsc adjustment failure, we need to estimate the sync
overhead which will be at least an unlock operation in one cpu followed
by a lock operation in another cpu.

The measurement is done in check_tsc_sync_target() after stop_count
reached 2 which is set by the source cpu after it re-initializes the sync
variables causing the lock cacheline to be remote from the target cpu.
The subsequent time measurement will then be similar to the latency
between successive iterations of the 2-cpu sync loop in check_tsc_warp().

Interrupts should not yet have been enabled when check_tsc_sync_target()
is called. However, some interference may still cause the overhead
estimate to vary a bit. With this patch applied, the measured overhead
on the same CooperLake system on different reboot runs varies from 104
to 326.

Signed-off-by: Waiman Long
---
 arch/x86/kernel/tsc_sync.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 70aeb254b62b..e2c43ba4e7b8 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -445,6 +445,7 @@ void check_tsc_sync_target(void)
 	struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust);
 	unsigned int cpu = smp_processor_id();
 	cycles_t cur_max_warp, gbl_max_warp;
+	cycles_t start, sync_overhead;
 	int cpus = 2;
 
 	/* Also aborts if there is no TSC. */
@@ -505,29 +506,37 @@ void check_tsc_sync_target(void)
 	if (!atomic_read(&test_runs))
 		return;
 
+	/*
+	 * Estimate the synchronization overhead by measuring the time for
+	 * a lock/unlock operation.
+	 */
+	start = rdtsc_ordered();
+	arch_spin_lock(&sync.lock);
+	arch_spin_unlock(&sync.lock);
+	sync_overhead = rdtsc_ordered() - start;
+
 	/*
 	 * If the warp value of this CPU is 0, then the other CPU
 	 * observed time going backwards so this TSC was ahead and
 	 * needs to move backwards.
 	 */
-	if (!cur_max_warp)
+	if (!cur_max_warp) {
 		cur_max_warp = -gbl_max_warp;
+		sync_overhead = -sync_overhead;
+	}
 
 	/*
 	 * Add the result to the previous adjustment value.
 	 *
 	 * The adjustment value is slightly off by the overhead of the
 	 * sync mechanism (observed values are ~200 TSC cycles), but this
-	 * really depends on CPU, node distance and frequency. So
-	 * compensating for this is hard to get right. Experiments show
-	 * that the warp is not longer detectable when the observed warp
-	 * value is used. In the worst case the adjustment needs to go
-	 * through a 3rd run for fine tuning.
+	 * really depends on CPU, node distance and frequency. Add the
+	 * estimated sync overhead to the adjustment value.
 	 */
-	cur->adjusted += cur_max_warp;
+	cur->adjusted += cur_max_warp + sync_overhead;
 
-	pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp. Adjust: %lld\n",
-		cpu, cur_max_warp, cur->adjusted);
+	pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp (overhead %lld). Adjust: %lld\n",
+		cpu, cur_max_warp, sync_overhead, cur->adjusted);
 
 	wrmsrl(MSR_IA32_TSC_ADJUST, cur->adjusted);
 	goto retry;
-- 
2.27.0
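
For illustration only, and not part of this patch: the sign handling of
the compensation can be sketched as a small pure function. The helper
name tsc_adjust_with_overhead() is hypothetical, and signed 64-bit
integers are assumed here in place of the kernel's cycles_t arithmetic.
The overhead follows the sign of the warp, so that it always enlarges the
correction in the direction the TSC is being moved.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the arithmetic in the patch. All values
 * are in TSC cycles:
 *   adjusted      - TSC_ADJUST value accumulated by earlier runs
 *   cur_max_warp  - largest warp observed by this (target) CPU
 *   gbl_max_warp  - largest warp observed by either CPU
 *   sync_overhead - measured lock/unlock latency, never negative
 */
static int64_t tsc_adjust_with_overhead(int64_t adjusted,
					int64_t cur_max_warp,
					int64_t gbl_max_warp,
					int64_t sync_overhead)
{
	int64_t warp = cur_max_warp;

	assert(sync_overhead >= 0);

	/*
	 * No warp seen locally means the other CPU saw time going
	 * backwards: this TSC is ahead and has to move backwards, so
	 * both the warp and the overhead are applied negatively.
	 */
	if (!cur_max_warp) {
		warp = -gbl_max_warp;
		sync_overhead = -sync_overhead;
	}

	return adjusted + warp + sync_overhead;
}
```

With a zero overhead the helper reduces to the old behaviour. For example,
starting from an adjustment of 95626 and observing a warp of 74 gives
95700, the value seen in the second log line quoted in the changelog.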