Reply-To: Sean Christopherson
Date: Fri, 25 Feb 2022 01:39:29 +0000
Message-Id: <20220225013929.3577699-1-seanjc@google.com>
Subject: [PATCH] KVM: x86: Don't snapshot "max" TSC if host TSC is constant
From: Sean Christopherson
To: Paolo Bonzini
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
 Joerg Roedel, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 Suleiman Souhlal, Anton Romanov
X-Mailing-List: linux-kernel@vger.kernel.org

Don't snapshot tsc_khz into max_tsc_khz during KVM initialization if the
host TSC is constant, in which case the actual TSC frequency will never
change and thus capturing the "max" TSC during initialization is
unnecessary; KVM can simply use tsc_khz during VM creation.

On CPUs with constant TSC, but not a hardware-specified TSC frequency,
snapshotting max_tsc_khz and using that to set a VM's default TSC
frequency can lead to KVM thinking it needs to manually scale the guest's
TSC if refining the TSC completes after KVM snapshots tsc_khz.  The
actual frequency never changes; only the kernel's calculation of that
frequency changes.
On systems without hardware TSC scaling, this either puts KVM into
"always catchup" mode (extremely inefficient), or prevents creating VMs
altogether.

Ideally, KVM would not be able to race with TSC refinement, or would have
a hook into tsc_refine_calibration_work() to get an alert when refinement
is complete.  Avoiding the race altogether isn't practical as refinement
takes a relative eternity; it's deliberately put on a work queue outside
of the normal boot sequence to avoid unnecessarily delaying boot.

Adding a hook is doable, but somewhat gross due to KVM's ability to be
built as a module.  And if the TSC is constant, which is likely the case
for every VMX/SVM-capable CPU produced in the last decade, the race can
be hit if and only if userspace is able to create a VM before TSC
refinement completes; refinement is slow, but not that slow.

For now, punt on a proper fix, as not taking a snapshot can help some
use cases and not taking a snapshot is arguably correct irrespective of
the race with refinement.

Cc: Suleiman Souhlal
Cc: Anton Romanov
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6552360d8888..81d9d84dc59f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8727,13 +8727,15 @@ static int kvmclock_cpu_online(unsigned int cpu)
 
 static void kvm_timer_init(void)
 {
-	max_tsc_khz = tsc_khz;
-
 	if (!boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
 #ifdef CONFIG_CPU_FREQ
 		struct cpufreq_policy *policy;
 		int cpu;
+#endif
 
+		max_tsc_khz = tsc_khz;
+
+#ifdef CONFIG_CPU_FREQ
 		cpu = get_cpu();
 		policy = cpufreq_cpu_get(cpu);
 		if (policy) {
@@ -11160,7 +11162,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
 	kvm_vcpu_mtrr_init(vcpu);
 	vcpu_load(vcpu);
-	kvm_set_tsc_khz(vcpu, max_tsc_khz);
+	kvm_set_tsc_khz(vcpu, max_tsc_khz ? : tsc_khz);
 	kvm_vcpu_reset(vcpu, false);
 	kvm_init_mmu(vcpu);
 	vcpu_put(vcpu);

base-commit: f4bc051fc91ab9f1d5225d94e52d369ef58bec58
-- 
2.35.1.574.g5d30c73bfb-goog