From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
 "H. Peter Anvin", Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
 David Woodhouse, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
 Marc Zyngier, Joey Gouly, Jack Allister, Dongli Zhang, joe.jin@oracle.com,
 kvm@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
 linux-kselftest@vger.kernel.org
Subject: [PATCH v5 01/34] KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()
Date: Mon, 8 Jun 2026 15:47:42 +0100
Message-ID: <20260608145455.89187-2-dwmw2@infradead.org>
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>

From: David Woodhouse

The KVM clock is an interesting thing. It is defined as "nanoseconds
since the guest was created", but in practice it runs at two *different*
rates, or three different rates if you count implementation bugs.

Definition A is that it runs synchronously with the CLOCK_MONOTONIC_RAW
of the host, with a delta of kvm->arch.kvmclock_offset. But that version
doesn't actually get used in the common case, where the host has a
reliable TSC and the guest TSCs are all running at the same rate and in
sync with each other, and kvm->arch.use_master_clock is set.

In that common case, definition B is used: There is a reference point in
time at kvm->arch.master_kernel_ns (again a CLOCK_MONOTONIC_RAW time),
and a corresponding host TSC value kvm->arch.master_cycle_now. This
fixed point in time is converted to guest units (the time offset by
kvmclock_offset and the TSC value scaled and offset to be a guest TSC
value) and advertised to the guest in the pvclock structure. While in
this 'use_master_clock' mode, the fixed point in time never needs to be
changed, and the clock runs precisely in time with the guest TSC, at the
rate advertised in the pvclock structure.

The third definition C is implemented in kvm_get_wall_clock_epoch() and
__get_kvmclock(), using the master_cycle_now and master_kernel_ns fields
but converting the *host* TSC cycles directly to a value in nanoseconds
instead of scaling via the guest TSC.

One might naïvely think that all three definitions are identical, since
CLOCK_MONOTONIC_RAW is not skewed by NTP frequency corrections; all
three are just the result of counting the host TSC at a known frequency,
or the scaled guest TSC at a known precise fraction of the host's
frequency.
The problem is with arithmetic precision, and the way that frequency
scaling is done in a division-free way by multiplying by a scale factor,
then shifting right. In practice, all three ways of calculating the KVM
clock will suffer a systematic drift from each other.

Eventually, definition C should just be eliminated. Commit 451a707813ae
("KVM: x86/xen: improve accuracy of Xen timers") worked around it for
the specific case of Xen timers, which are defined in terms of the KVM
clock and suffered from a continually increasing error in timer expiry
times. That commit notes that get_kvmclock_ns() is non-trivial to fix
and says "I'll come back to that", which remains true.

Definitions A and B do need to coexist, the former to handle the case
where the host or guest TSC is suboptimally configured. But KVM should
be more careful about switching between them, and the discontinuity in
guest time which could result. In particular, KVM_REQ_MASTERCLOCK_UPDATE
will take a new snapshot of time as the reference in master_kernel_ns
and master_cycle_now, yanking the guest's clock back to match definition
A at that moment.

When invoked while in 'use_master_clock' mode, kvm_update_masterclock()
should probably *adjust* kvm->arch.kvmclock_offset to account for the
drift, instead of yanking the clock back to definition A. But in the
meantime there are a bunch of places where it just doesn't need to be
invoked at all.

To start with: there is no need to do such an update when a Xen guest
populates the shared_info page. This seems to have been a hangover from
the very first implementation of shared_info which automatically
populated the vcpu_info structures at their default locations, but even
then it should just have raised KVM_REQ_CLOCK_UPDATE on each vCPU
instead of using KVM_REQ_MASTERCLOCK_UPDATE. And now that userspace is
expected to explicitly set the vcpu_info even in its default locations,
there's not even any need for that either.
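The drift described above can be illustrated with a minimal,
self-contained sketch. It is not KVM code: mul_shift_ns(), exact_ns(),
ns_per_cycle_mul() and drift_ns() are hypothetical helper names, and the
32-bit fixed-point multiplier is an assumption chosen for illustration.
It also relies on the GCC/Clang `unsigned __int128` extension. The
sketch compares a division-free multiply-and-shift conversion against
exact division.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: cycles -> ns by multiply-and-shift (no division). */
static uint64_t mul_shift_ns(uint64_t cycles, uint32_t mul)
{
	/* 32.32 fixed point: ns = cycles * mul / 2^32, via a 128-bit product. */
	return (uint64_t)(((unsigned __int128)cycles * mul) >> 32);
}

/* Hypothetical helper: exact conversion using a real division. */
static uint64_t exact_ns(uint64_t cycles, uint64_t hz)
{
	assert(hz != 0);
	return (uint64_t)(((unsigned __int128)cycles * 1000000000ULL) / hz);
}

/* Hypothetical helper: ns-per-cycle multiplier, truncated to 32 fractional bits. */
static uint32_t ns_per_cycle_mul(uint64_t hz)
{
	assert(hz > 1000000000ULL);	/* keeps the result below 2^32 */
	return (uint32_t)((1000000000ULL << 32) / hz);
}

/* Drift in ns between the two conversions after 'seconds' at 'hz'. */
static uint64_t drift_ns(uint64_t hz, uint64_t seconds)
{
	uint64_t cycles = hz * seconds;

	return exact_ns(cycles, hz) - mul_shift_ns(cycles, ns_per_cycle_mul(hz));
}
```

With an assumed 3 GHz clock, drift_ns(3000000000ULL, 86400) comes out at
roughly 20 microseconds per day. Two conversions that round their
multipliers differently diverge from each other in the same way.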

Fixes: 629b5348841a ("KVM: x86/xen: update wallclock region")
Reviewed-by: Paul Durrant
Signed-off-by: David Woodhouse
---
 arch/x86/kvm/xen.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 91fd3673c09a..82e34edbfdbd 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -98,8 +98,6 @@ static int kvm_xen_shared_info_init(struct kvm *kvm)
 	wc->version = wc_version + 1;
 	read_unlock_irq(&gpc->lock);
 
-	kvm_make_all_cpus_request(kvm, KVM_REQ_MASTERCLOCK_UPDATE);
-
 out:
 	srcu_read_unlock(&kvm->srcu, idx);
 	return ret;
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
 "H. Peter Anvin", Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
 David Woodhouse, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
 Marc Zyngier, Joey Gouly, Jack Allister, Dongli Zhang, joe.jin@oracle.com,
 kvm@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
 linux-kselftest@vger.kernel.org
Subject: [PATCH v5 02/34] KVM: x86: Improve accuracy of KVM clock when TSC scaling is in force
Date: Mon, 8 Jun 2026 15:47:43 +0100
Message-ID: <20260608145455.89187-3-dwmw2@infradead.org>
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>

From: David Woodhouse

The kvm_guest_time_update() function scales the host TSC frequency to
the guest's using kvm_scale_tsc() and the v->arch.l1_tsc_scaling_ratio
scaling ratio previously calculated for that vCPU. It then calculates
the scaling factors for the KVM clock itself based on that guest TSC
frequency.

However, it uses kHz as the unit when scaling, and then multiplies by
1000 only at the end.

With a host TSC frequency of 3000MHz and a guest set to 2500MHz, the
result of kvm_scale_tsc() will actually come out at 2,499,999kHz. So the
KVM clock advertised to the guest is based on a frequency of
2,499,999,000 Hz.

By using Hz as the unit from the beginning, the KVM clock would be based
on a more accurate frequency of 2,499,999,999 Hz in this example.

Use u64 for the hw_tsc_hz field since an unsigned int would overflow for
TSC frequencies above 4GHz.
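The rounding in the example above can be reproduced with a short sketch.
It is an illustration under stated assumptions, not the kernel's
implementation: scaling_ratio(), scale_tsc(), guest_hz_via_khz() and
guest_hz_via_hz() are hypothetical helpers, the 48-bit fractional width
of the ratio is assumed (the commit message does not state it), and
`unsigned __int128` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 48	/* assumed width of the fractional part of the ratio */

/* Hypothetical helper: guest/host frequency ratio as a fixed-point value. */
static uint64_t scaling_ratio(uint64_t guest_khz, uint64_t host_khz)
{
	assert(host_khz != 0);
	return (uint64_t)(((unsigned __int128)guest_khz << FRAC_BITS) / host_khz);
}

/* Hypothetical helper: scale a frequency by the fixed-point ratio. */
static uint64_t scale_tsc(uint64_t value, uint64_t ratio)
{
	return (uint64_t)(((unsigned __int128)value * ratio) >> FRAC_BITS);
}

/* Old approach: scale in kHz, multiply by 1000 only at the end. */
static uint64_t guest_hz_via_khz(uint64_t host_khz, uint64_t ratio)
{
	return scale_tsc(host_khz, ratio) * 1000;
}

/* New approach: convert to Hz first, then scale. */
static uint64_t guest_hz_via_hz(uint64_t host_khz, uint64_t ratio)
{
	return scale_tsc(host_khz * 1000, ratio);
}
```

For a 3000000 kHz host and a ratio computed for a 2500000 kHz guest, the
first function yields 2,499,999,000 Hz and the second 2,499,999,999 Hz,
matching the figures quoted above.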

Use div_u64() for the Xen CPUID leaf to play nice with 32-bit kernels.

Fixes: 78db6a503796 ("KVM: x86: rewrite handling of scaled TSC for kvmclock")
Reviewed-by: Paul Durrant
Signed-off-by: David Woodhouse
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/cpuid.c            |  2 +-
 arch/x86/kvm/x86.c              | 17 +++++++++--------
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c470e40a00aa..37264212c7df 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -950,7 +950,7 @@ struct kvm_vcpu_arch {
 	gpa_t time;
 	s8 pvclock_tsc_shift;
 	u32 pvclock_tsc_mul;
-	unsigned int hw_tsc_khz;
+	u64 hw_tsc_hz;
 	struct gfn_to_pfn_cache pv_time;
 	/* set guest stopped flag in pvclock flags field */
 	bool pvclock_set_guest_stopped_request;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index e69156b54cff..621d950ec692 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -2131,7 +2131,7 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 				*ecx = vcpu->arch.pvclock_tsc_mul;
 				*edx = vcpu->arch.pvclock_tsc_shift;
 			} else if (index == 2) {
-				*eax = vcpu->arch.hw_tsc_khz;
+				*eax = div_u64(vcpu->arch.hw_tsc_hz, 1000);
 			}
 		}
 	} else {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a1b63c63d1a..d9ef165df6a1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3314,7 +3314,8 @@ static void kvm_setup_guest_pvclock(struct pvclock_vcpu_time_info *ref_hv_clock,
 int kvm_guest_time_update(struct kvm_vcpu *v)
 {
 	struct pvclock_vcpu_time_info hv_clock = {};
-	unsigned long flags, tgt_tsc_khz;
+	unsigned long flags;
+	u64 tgt_tsc_hz;
 	unsigned seq;
 	struct kvm_vcpu_arch *vcpu = &v->arch;
 	struct kvm_arch *ka = &v->kvm->arch;
@@ -3340,8 +3341,8 @@ int kvm_guest_time_update(struct kvm_vcpu *v)
 
 	/* Keep irq disabled to prevent changes to the clock */
 	local_irq_save(flags);
-	tgt_tsc_khz = get_cpu_tsc_khz();
-	if (unlikely(tgt_tsc_khz == 0)) {
+	tgt_tsc_hz = (u64)get_cpu_tsc_khz() * 1000;
+	if (unlikely(tgt_tsc_hz == 0)) {
 		local_irq_restore(flags);
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
 		return 1;
@@ -3376,16 +3377,16 @@ int kvm_guest_time_update(struct kvm_vcpu *v)
 	/* With all the info we got, fill in the values */
 
 	if (kvm_caps.has_tsc_control) {
-		tgt_tsc_khz = kvm_scale_tsc(tgt_tsc_khz,
+		tgt_tsc_hz = kvm_scale_tsc(tgt_tsc_hz,
 					    v->arch.l1_tsc_scaling_ratio);
-		tgt_tsc_khz = tgt_tsc_khz ? : 1;
+		tgt_tsc_hz = tgt_tsc_hz ? : 1;
 	}
 
-	if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) {
-		kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL,
+	if (unlikely(vcpu->hw_tsc_hz != tgt_tsc_hz)) {
+		kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_hz,
 				   &vcpu->pvclock_tsc_shift,
 				   &vcpu->pvclock_tsc_mul);
-		vcpu->hw_tsc_khz = tgt_tsc_khz;
+		vcpu->hw_tsc_hz = tgt_tsc_hz;
 	}
 
 	hv_clock.tsc_shift = vcpu->pvclock_tsc_shift;
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
 "H. Peter Anvin", Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
 David Woodhouse, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
 Marc Zyngier, Joey Gouly, Jack Allister, Dongli Zhang, joe.jin@oracle.com,
 kvm@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
 linux-kselftest@vger.kernel.org
Subject: [PATCH v5 03/34] UAPI: x86: Move pvclock-abi to UAPI for x86 platforms
Date: Mon, 8 Jun 2026 15:47:44 +0100
Message-ID: <20260608145455.89187-4-dwmw2@infradead.org>
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>

From: Jack Allister

A subsequent commit will provide a new KVM interface for performing a
fixup/correction of the KVM clock against the reference TSC. The
KVM_[GS]ET_CLOCK_GUEST API takes a pvclock_vcpu_time_info, so the caller
must know about its definition.

Move the definition to the UAPI folder so that it is exported to
usermode, and change the type definitions to use the standard types for
UAPI exports.
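As a rough illustration of what a userspace caller depends on, the
sketch below mirrors the structure locally using <stdint.h> types and
checks the 32-byte layout noted in the header comment. The mirror type
and pvclock_layout_ok() are hypothetical names introduced only so the
example builds without the relocated header; a real caller would include
the exported UAPI header instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of struct pvclock_vcpu_time_info, for illustration only. */
struct pvclock_vcpu_time_info_mirror {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
	uint8_t  flags;
	uint8_t  pad[2];
} __attribute__((__packed__));

/* The ABI comment says 32 bytes; verify the mirror at compile time. */
static_assert(sizeof(struct pvclock_vcpu_time_info_mirror) == 32,
	      "pvclock_vcpu_time_info must be 32 bytes");

/* Hypothetical helper: runtime check of the field offsets. */
static int pvclock_layout_ok(void)
{
	return offsetof(struct pvclock_vcpu_time_info_mirror, tsc_timestamp) == 8 &&
	       offsetof(struct pvclock_vcpu_time_info_mirror, system_time) == 16 &&
	       offsetof(struct pvclock_vcpu_time_info_mirror, tsc_to_system_mul) == 24 &&
	       offsetof(struct pvclock_vcpu_time_info_mirror, tsc_shift) == 28 &&
	       offsetof(struct pvclock_vcpu_time_info_mirror, flags) == 29;
}
```

A check like this is one way for a VMM to catch an accidental layout
mismatch before passing the structure to the kernel.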
Signed-off-by: Jack Allister Signed-off-by: David Woodhouse Reviewed-by: Paul Durrant --- MAINTAINERS | 4 +-- arch/x86/include/{ =3D> uapi}/asm/pvclock-abi.h | 27 ++++++++++--------- scripts/xen-hypercalls.sh | 2 +- 3 files changed, 18 insertions(+), 15 deletions(-) rename arch/x86/include/{ =3D> uapi}/asm/pvclock-abi.h (82%) diff --git a/MAINTAINERS b/MAINTAINERS index 882214b0e7db..dc0f6516beb4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -14402,7 +14402,7 @@ S: Supported T: git git://git.kernel.org/pub/scm/virt/kvm/kvm.git F: arch/um/include/asm/kvm_para.h F: arch/x86/include/asm/kvm_para.h -F: arch/x86/include/asm/pvclock-abi.h +F: arch/x86/include/uapi/asm/pvclock-abi.h F: arch/x86/include/uapi/asm/kvm_para.h F: arch/x86/kernel/kvm.c F: arch/x86/kernel/kvmclock.c @@ -29081,7 +29081,7 @@ R: Boris Ostrovsky L: xen-devel@lists.xenproject.org (moderated for non-subscribers) S: Supported F: arch/x86/configs/xen.config -F: arch/x86/include/asm/pvclock-abi.h +F: arch/x86/include/uapi/asm/pvclock-abi.h F: arch/x86/include/asm/xen/ F: arch/x86/platform/pvh/ F: arch/x86/xen/ diff --git a/arch/x86/include/asm/pvclock-abi.h b/arch/x86/include/uapi/asm= /pvclock-abi.h similarity index 82% rename from arch/x86/include/asm/pvclock-abi.h rename to arch/x86/include/uapi/asm/pvclock-abi.h index b9fece5fc96d..6d70cf640362 100644 --- a/arch/x86/include/asm/pvclock-abi.h +++ b/arch/x86/include/uapi/asm/pvclock-abi.h @@ -1,6 +1,9 @@ -/* SPDX-License-Identifier: GPL-2.0 */ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ #ifndef _ASM_X86_PVCLOCK_ABI_H #define _ASM_X86_PVCLOCK_ABI_H + +#include + #ifndef __ASSEMBLER__ =20 /* @@ -24,20 +27,20 @@ */ =20 struct pvclock_vcpu_time_info { - u32 version; - u32 pad0; - u64 tsc_timestamp; - u64 system_time; - u32 tsc_to_system_mul; - s8 tsc_shift; - u8 flags; - u8 pad[2]; + __u32 version; + __u32 pad0; + __u64 tsc_timestamp; + __u64 system_time; + __u32 tsc_to_system_mul; + __s8 tsc_shift; + __u8 flags; + __u8 pad[2]; } 
__attribute__((__packed__)); /* 32 bytes */ =20 struct pvclock_wall_clock { - u32 version; - u32 sec; - u32 nsec; + __u32 version; + __u32 sec; + __u32 nsec; } __attribute__((__packed__)); =20 #define PVCLOCK_TSC_STABLE_BIT (1 << 0) diff --git a/scripts/xen-hypercalls.sh b/scripts/xen-hypercalls.sh index f18b00843df3..51a722198997 100755 --- a/scripts/xen-hypercalls.sh +++ b/scripts/xen-hypercalls.sh @@ -5,7 +5,7 @@ shift in=3D"$@" =20 for i in $in; do - eval $CPP $LINUXINCLUDE -dD -imacros "$i" -x c /dev/null + eval $CPP -D__KERNEL__ $LINUXINCLUDE -dD -imacros "$i" -x c /dev/null done | \ awk '$1 =3D=3D "#define" && $2 ~ /__HYPERVISOR_[a-z][a-z_0-9]*/ { v[$3] = =3D $2 } END { print "/* auto-generated by scripts/xen-hypercall.sh */" --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930576; cv=none; d=zohomail.com; s=zohoarc; b=gUXNGmNS/RLncTGdZdIKm1J601C/rxq/ZA1FTLeRKuEriXfSi0itgF5pbFVgxhe3LU2ZRMkUeyO6Y8d+NlQFFMoec6VXsIUN4TNkF98QsOW5ChjpZiAj1Fyzo6yqJsqoImvTCJRnkbQ6+FGSkKepqLupXLZtFASluyDSjT/rkFo= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930576; h=Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=2Oxi1JQyjawFeozluuHPF/o1BI3urjdsB6sGEgqw30E=; 
b=RNHd2bsiB5UOdS+/HITcBCovHasYl8R2/KsfcGtZeeZXdV/JJTJpL7/DKfm7qcfigD9xTjZM4xsmsbZoCfP/LBxQV99YZoKvifqvfqY25S/79dl6RpD1OFWBdImwyV+8eLURcdPiLLbJgW3YVXvFbe8+//JmA+I0bDbPiENmbOk= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1780930576146878.9504904243287; Mon, 8 Jun 2026 07:56:16 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.1331744.1594479 (Exim 4.92) (envelope-from ) id 1wWbOL-0001SW-Hg; Mon, 08 Jun 2026 14:55:53 +0000 Received: by outflank-mailman (output) from mailman id 1331744.1594479; Mon, 08 Jun 2026 14:55:53 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbOJ-0001Ji-Pv; Mon, 08 Jun 2026 14:55:51 +0000 Received: by outflank-mailman (input) for mailman id 1331744; Mon, 08 Jun 2026 14:55:37 +0000 Received: from mx.expurgate.net ([195.190.135.10]) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbO3-0006FU-Oe for xen-devel@lists.xenproject.org; Mon, 08 Jun 2026 14:55:35 +0000 Received: from mx.expurgate.net (helo=localhost) by mx.expurgate.net with esmtp id 1wWbO2-002m34-WA; Mon, 08 Jun 2026 16:55:35 +0200 Received: from [10.42.69.9] (helo=localhost) by localhost with ESMTP (eXpurgate MTA 0.9.1) (envelope-from ) id 6a26d7d7-bab6-0a2a0a5309dd-0a2a4509a2de-28 for ; Mon, 08 Jun 2026 16:55:34 +0200 Received: from [90.155.50.34] (helo=casper.infradead.org) by tlsNG-bad1c0.mxtls.expurgate.net with ESMTPS (eXpurgate 4.56.1) (envelope-from ) id 6a26d7e6-2497-0a2a45090019-5a9b3222d7b0-3 for ; Mon, 08 Jun 2026 16:55:34 +0200 Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by 
casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wWbNR-0000000Dtx5-2ug3; Mon, 08 Jun 2026 14:54:58 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNR-00000000NEh-27Vz; Mon, 08 Jun 2026 15:54:57 +0100 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass header.s=casper.20170209 header.d=infradead.org header.i="@infradead.org" header.h="Sender:Content-Transfer-Encoding:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=2Oxi1JQyjawFeozluuHPF/o1BI3urjdsB6sGEgqw30E=; b=N/Noax8ZTZo3+v8S2RV+79V2c+ ut8YiQEhddA432Gob+zYKbBnP4DPqiMXim4qlPmp9YrYBq8jW84y5XpBfoz4+2MnIpu8x52wzHFqj ji1mRtWgsujcD0UwuK7DCy+gAgDnYjYsFAJT7Rd2Qekp7EOoC8/MHIhOaHDyVSdZm/1+GCZqxef6E PmA8z7xCw1h7nhVFcQZ7TZygzKED832R9G+ecPPeWvhzFda8ho6rYCRC3ivmd4d/5XR80etslgilY o7DN4wQjCK1jB3OlXaZT4+VEN8tD9HGHYaT/Iq/0SQduBO7eQZG02RBWwMQiX1J5hH+CKZJIk5MhO cyjIdsPg==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 04/34] KVM: x86: Add KVM_[GS]ET_CLOCK_GUEST for accurate KVM clock migration Date: Mon, 8 Jun 2026 15:47:45 +0100 Message-ID: <20260608145455.89187-5-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html X-purgate-ID: tlsNG-bad1c0/1780930534-4236CA53-662336E5/0/0 X-purgate-type: clean X-purgate-size: 10710 X-ZohoMail-DKIM: pass (identity @infradead.org) X-ZM-MESSAGEID: 1780930577386158500 Content-Type: text/plain; charset="utf-8" From: Jack Allister In the common case (where kvm->arch.use_master_clock is true), the KVM clock is defined as a simple arithmetic function of the guest TSC, based on a reference point stored in kvm->arch.master_kernel_ns and kvm->arch.master_cycle_now. The existing KVM_[GS]ET_CLOCK functionality does not allow for this relationship to be precisely saved and restored by userspace. All it can currently do is set the KVM clock at a given UTC reference time, which is necessarily imprecise. So on live update, the guest TSC can remain cycle accurate at precisely the same offset from the host TSC, but there is no way for userspace to restore the KVM clock accurately. 
Even on live migration to a new host, where the accuracy of the guest time-keeping is fundamentally limited by the accuracy of wallclock synchronization between the source and destination hosts, the clock jump experienced by the guest's TSC and its KVM clock should at least be *consistent*. Even when the guest TSC suffers a discontinuity, its KVM clock should still remain the *same* arithmetic function of the guest TSC, and not suffer an *additional* discontinuity. To allow for accurate migration of the KVM clock, add per-vCPU ioctls which save and restore the actual PV clock info in pvclock_vcpu_time_info. The restoration in KVM_SET_CLOCK_GUEST works by creating a new reference point in time just as kvm_update_masterclock() does, and calculating the corresponding guest TSC value. This guest TSC value is then passed through the user-provided pvclock structure to generate the *intended* KVM clock value at that point in time, and through the *actual* KVM clock calculation. Then kvm->arch.kvmclock_offset is adjusted to eliminate the difference. Where kvm->arch.use_master_clock is false (because the host TSC is unreliable, or the guest TSCs are configured strangely), the KVM clock is *not* defined as a function of the guest TSC so KVM_GET_CLOCK_GUEST returns an error. In this case, as documented, userspace shall use the legacy KVM_GET_CLOCK ioctl. The loss of precision is acceptable in this case since the clocks are imprecise in this mode anyway. On *restoration*, if kvm->arch.use_master_clock is false, an error is returned for similar reasons and userspace shall fall back to using KVM_SET_CLOCK. This does mean that, as documented, userspace needs to use *both* KVM_GET_CLOCK_GUEST and KVM_GET_CLOCK and send both results with the migration data (unless the intent is to refuse to resume on a host with bad TSC). 
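The offset adjustment described above for KVM_SET_CLOCK_GUEST can be sketched outside the kernel like this. It is a minimal model under stated assumptions, not the kernel code: struct saved_pvclock, intended_clock_ns() and rebased_clock_offset() are hypothetical names, ref_guest_tsc and ref_host_ns stand in for the guest TSC and master_kernel_ns at the new reference point, and the 128-bit multiply relies on the GCC/Clang __int128 extension.

```c
#include <stdint.h>

/* Hypothetical subset of the saved pvclock fields; not the kernel struct. */
struct saved_pvclock {
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
};

/* KVM clock (ns) that the saved parameters define for a given guest TSC. */
static uint64_t intended_clock_ns(const struct saved_pvclock *s, uint64_t tsc)
{
	uint64_t delta = tsc - s->tsc_timestamp;

	delta = s->tsc_shift < 0 ? delta >> -s->tsc_shift
				 : delta << s->tsc_shift;
	return s->system_time +
	       (uint64_t)(((unsigned __int128)delta * s->tsc_to_system_mul) >> 32);
}

/*
 * Offset to add to the host reference time so that the live clock agrees
 * with the saved clock at the new reference point. The result is signed
 * because the saved clock may be behind the host reference time.
 */
static int64_t rebased_clock_offset(const struct saved_pvclock *s,
				    uint64_t ref_guest_tsc, uint64_t ref_host_ns)
{
	return (int64_t)(intended_clock_ns(s, ref_guest_tsc) - ref_host_ns);
}
```

For example, with a saved clock of 5000 ns at TSC 1000 ticking at 1 ns per cycle, a new reference point at guest TSC 4000 and host time 100000 ns gives an intended clock of 8000 ns, so the offset becomes -92000 ns and host time plus offset again yields 8000 ns.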
Co-developed-by: David Woodhouse Signed-off-by: David Woodhouse Signed-off-by: Jack Allister Reviewed-by: Paul Durrant Cc: Dongli Zhang --- Documentation/virt/kvm/api.rst | 37 ++++++++ arch/x86/kvm/x86.c | 164 +++++++++++++++++++++++++++++++++ include/uapi/linux/kvm.h | 3 + 3 files changed, 204 insertions(+) diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 52bbbb553ce1..2268b4442df6 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6553,6 +6553,43 @@ KVM_S390_KEYOP_SSKE Sets the storage key for the guest address ``guest_addr`` to the key specified in ``key``, returning the previous value in ``key``. =20 +4.145 KVM_GET_CLOCK_GUEST +---------------------------- + +:Capability: none +:Architectures: x86_64 +:Type: vcpu ioctl +:Parameters: struct pvclock_vcpu_time_info (out) +:Returns: 0 on success, <0 on error + +Retrieves the current time information structure used for KVM/PV clocks, +in precisely the form advertised to the guest vCPU, which gives parameters +for a direct conversion from a guest TSC value to nanoseconds. + +When the KVM clock is not in "master clock" mode, for example because the +host TSC is unreliable or the guest TSCs are oddly configured, the KVM clo= ck +is actually defined by the host CLOCK_MONOTONIC_RAW instead of the guest T= SC. +In this case, the KVM_GET_CLOCK_GUEST ioctl returns -EINVAL. + +4.146 KVM_SET_CLOCK_GUEST +---------------------------- + +:Capability: none +:Architectures: x86_64 +:Type: vcpu ioctl +:Parameters: struct pvclock_vcpu_time_info (in) +:Returns: 0 on success, <0 on error + +Sets the KVM clock (for the whole VM) in terms of the vCPU TSC, using the +pvclock structure as returned by KVM_GET_CLOCK_GUEST. This allows the prec= ise +arithmetic relationship between guest TSC and KVM clock to be preserved by +userspace across migration. 
+ +When the KVM clock is not in "master clock" mode, and the KVM clock is act= ually +defined by the host CLOCK_MONOTONIC_RAW, this ioctl returns -EINVAL. Users= pace +may choose to set the clock using the less precise KVM_SET_CLOCK ioctl, or= may +choose to fail, denying migration to a host whose TSC is misbehaving. + .. _kvm_run: =20 5. The kvm_run structure diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d9ef165df6a1..b7e5f6e3dc6c 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6205,6 +6205,162 @@ static int kvm_get_reg_list(struct kvm_vcpu *vcpu, return 0; } =20 +#ifdef CONFIG_X86_64 +static int kvm_vcpu_ioctl_get_clock_guest(struct kvm_vcpu *v, void __user = *argp) +{ + struct pvclock_vcpu_time_info hv_clock =3D {}; + struct kvm_vcpu_arch *vcpu =3D &v->arch; + struct kvm_arch *ka =3D &v->kvm->arch; + unsigned int seq; + + /* + * If KVM_REQ_CLOCK_UPDATE is already pending, or if the pvclock + * has never been generated at all, call kvm_guest_time_update(). + */ + if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, v) || !vcpu->hw_tsc_hz) { + int idx =3D srcu_read_lock(&v->kvm->srcu); + int ret =3D kvm_guest_time_update(v); + + srcu_read_unlock(&v->kvm->srcu, idx); + if (ret) + return -EINVAL; + } + + /* + * Reconstruct the pvclock from the master clock state, matching + * exactly what kvm_guest_time_update() writes to the guest. + */ + do { + seq =3D read_seqcount_begin(&ka->pvclock_sc); + + if (!ka->use_master_clock) + return -EINVAL; + + hv_clock.tsc_timestamp =3D kvm_read_l1_tsc(v, ka->master_cycle_now); + hv_clock.system_time =3D ka->master_kernel_ns + ka->kvmclock_offset; + } while (read_seqcount_retry(&ka->pvclock_sc, seq)); + + hv_clock.tsc_shift =3D vcpu->pvclock_tsc_shift; + hv_clock.tsc_to_system_mul =3D vcpu->pvclock_tsc_mul; + hv_clock.flags =3D PVCLOCK_TSC_STABLE_BIT; + + if (copy_to_user(argp, &hv_clock, sizeof(hv_clock))) + return -EFAULT; + + return 0; +} + +/* + * Reverse the calculation in the hv_clock definition. 
+ * + * time_ns =3D ( (cycles << shift) * mul ) >> 32; + * (although shift can be negative, so that's bad C) + * + * So for a single second, + * NSEC_PER_SEC =3D ( ( FREQ_HZ << shift) * mul ) >> 32 + * NSEC_PER_SEC << 32 =3D ( FREQ_HZ << shift ) * mul + * ( NSEC_PER_SEC << 32 ) / mul =3D FREQ_HZ << shift + * ( ( NSEC_PER_SEC << 32 ) / mul ) >> shift =3D FREQ_HZ + */ +static u64 hvclock_to_hz(u32 mul, s8 shift) +{ + u64 tm =3D NSEC_PER_SEC << 32; + + /* Maximise precision. Shift left by two so that the top bit is set */ + tm <<=3D 2; + shift +=3D 2; + + /* While 'mul' is even, increase the shift *after* the division */ + while (!(mul & 1)) { + shift++; + mul >>=3D 1; + } + + tm /=3D mul; + + if (shift > 0) + return tm >> shift; + else + return tm << -shift; +} + +static int kvm_vcpu_ioctl_set_clock_guest(struct kvm_vcpu *v, void __user = *argp) +{ + struct pvclock_vcpu_time_info user_hv_clock; + struct kvm *kvm =3D v->kvm; + struct kvm_arch *ka =3D &kvm->arch; + u64 curr_tsc_hz, user_tsc_hz; + u64 user_clk_ns; + u64 guest_tsc; + int rc =3D 0; + + if (copy_from_user(&user_hv_clock, argp, sizeof(user_hv_clock))) + return -EFAULT; + + if (user_hv_clock.pad0 || user_hv_clock.pad[0] || user_hv_clock.pad[1]) + return -EINVAL; + + if (!user_hv_clock.tsc_to_system_mul) + return -EINVAL; + + if (user_hv_clock.tsc_shift < -32 || user_hv_clock.tsc_shift > 32) + return -EINVAL; + + user_tsc_hz =3D hvclock_to_hz(user_hv_clock.tsc_to_system_mul, + user_hv_clock.tsc_shift); + + kvm_hv_request_tsc_page_update(kvm); + + /* + * kvm_start_pvclock_update() takes tsc_write_lock and opens + * the pvclock seqcount; kvm_end_pvclock_update() closes both. + * All clock state modifications between them are atomic with + * respect to readers in kvm_guest_time_update().
+ */ + kvm_start_pvclock_update(kvm); + pvclock_update_vm_gtod_copy(kvm); + + if (!ka->use_master_clock) { + rc =3D -EINVAL; + goto out; + } + + curr_tsc_hz =3D (u64)get_cpu_tsc_khz() * 1000; + if (unlikely(curr_tsc_hz =3D=3D 0)) { + rc =3D -EINVAL; + goto out; + } + + if (kvm_caps.has_tsc_control) + curr_tsc_hz =3D kvm_scale_tsc(curr_tsc_hz, + v->arch.l1_tsc_scaling_ratio); + + /* + * Allow for a discrepancy of 1 kHz either way between the TSC + * frequency used to generate the user's pvclock and the current + * host's measured frequency, since they may not precisely match. + */ + if (user_tsc_hz < curr_tsc_hz - 1000 || + user_tsc_hz > curr_tsc_hz + 1000) { + rc =3D -ERANGE; + goto out; + } + + /* + * Calculate the guest TSC at the new reference point, and the + * corresponding KVM clock value according to user_hv_clock. + * Adjust kvmclock_offset so both definitions agree. + */ + guest_tsc =3D kvm_read_l1_tsc(v, ka->master_cycle_now); + user_clk_ns =3D __pvclock_read_cycles(&user_hv_clock, guest_tsc); + ka->kvmclock_offset =3D user_clk_ns - ka->master_kernel_ns; + +out: + kvm_end_pvclock_update(kvm); + return rc; +} +#endif + long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { @@ -6605,6 +6761,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp, srcu_read_unlock(&vcpu->kvm->srcu, idx); break; } +#ifdef CONFIG_X86_64 + case KVM_SET_CLOCK_GUEST: + r =3D kvm_vcpu_ioctl_set_clock_guest(vcpu, argp); + break; + case KVM_GET_CLOCK_GUEST: + r =3D kvm_vcpu_ioctl_get_clock_guest(vcpu, argp); + break; +#endif #ifdef CONFIG_KVM_HYPERV case KVM_GET_SUPPORTED_HV_CPUID: r =3D kvm_ioctl_get_supported_hv_cpuid(vcpu, argp); diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 6c8afa2047bf..9b50191b859c 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -1669,4 +1669,7 @@ struct kvm_pre_fault_memory { __u64 padding[5]; }; =20 +#define KVM_SET_CLOCK_GUEST _IOW(KVMIO, 0xd6, struct pvclock_vcpu_time_inf= o) +#define 
KVM_GET_CLOCK_GUEST _IOR(KVMIO, 0xd7, struct pvclock_vcpu_time_inf= o) + #endif /* __LINUX_KVM_H */ --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930551; cv=none; d=zohomail.com; s=zohoarc; b=YnNF0xSt9z93dcqGTLRgF/w8xHFM+tU2p1M2wuJh8jOo4jownDdYU8DCnPfc51TOn492Txv7+7di8TqddYQhBfDXDwIu57Es4iNdrh0YOML9FgX1J6SkD07Y757Ox4rjFn1Zv6M2GNHGLimyxd9OgurHsLGgkmTJz6IPZgdN5LQ= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930551; h=Content-Type:Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=6TRJ2V15CGAD4IK40NX80aKZiFhmGyTcFBfAlg5QdUA=; b=QxvvY0ByULTHL2DLQl7Qwm8rrL8WFg3vQ4k3CgPy9qg9pR2gRhMGjcJS1WvwrKKQ0FpG6KzS5z9PeieHaqGxckFpjBxdD51bYwTHFg9CDmHmI/JD+mRfvo9g9cCO4DcXyx3gW7mI6amgICwS2xPAJGyUew1DmX9/WcKZHTP6Nwg= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1780930551837640.7119563782245; Mon, 8 Jun 2026 07:55:51 -0700 (PDT) Received: from list by lists.xenproject.org with 
outflank-mailman.1331716.1594368 (Exim 4.92) (envelope-from ) id 1wWbNs-00048H-PT; Mon, 08 Jun 2026 14:55:24 +0000 Received: by outflank-mailman (output) from mailman id 1331716.1594368; Mon, 08 Jun 2026 14:55:24 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNs-00043r-0E; Mon, 08 Jun 2026 14:55:24 +0000 Received: by outflank-mailman (input) for mailman id 1331716; Mon, 08 Jun 2026 14:55:17 +0000 Received: from mx.expurgate.net ([194.145.224.10]) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNl-0002Ns-5G for xen-devel@lists.xenproject.org; Mon, 08 Jun 2026 14:55:17 +0000 Received: from mx.expurgate.net (helo=localhost) by mx.expurgate.net with esmtp id 1wWbNk-00EcHE-HR; Mon, 08 Jun 2026 16:55:16 +0200 Received: from [10.42.69.1] (helo=localhost) by localhost with ESMTP (eXpurgate MTA 0.9.1) (envelope-from ) id 6a26d7d0-5cb7-0a2a0a5109dd-0a2a45018424-6 for ; Mon, 08 Jun 2026 16:55:16 +0200 Received: from [90.155.92.199] (helo=desiato.infradead.org) by tlsNG-d62444.mxtls.expurgate.net with ESMTPS (eXpurgate 4.56.1) (envelope-from ) id 6a26d7d3-c1f2-0a2a45010019-5a9b5cc7e19c-3 for ; Mon, 08 Jun 2026 16:55:16 +0200 Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by desiato.infradead.org with esmtpsa (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNU-00000001Afv-2Kws; Mon, 08 Jun 2026 14:55:00 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNR-00000000NEk-2IDj; Mon, 08 Jun 2026 15:54:57 +0100 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass header.s=desiato.20200630 header.d=infradead.org 
header.i="@infradead.org" header.h="Sender:Content-Transfer-Encoding:Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding: Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To: From:Reply-To:Cc:Content-ID:Content-Description; bh=6TRJ2V15CGAD4IK40NX80aKZiFhmGyTcFBfAlg5QdUA=; b=M4uOYeD6r7mluxNx0PQ0VPxKLD bRZ6ybXWOtawZ107tdN6ss1YWpR+PNV5H36UUsXsmsYjupy7/21JjI2LTBAC88/H40Ern9a1yA4hk shvf8wWkdDQ9KipqprAG6mNeKZ+TN8Elp8obsPY7oyXIRH3CPnLfLbxLx7I8WlDmFJZKsBNs45jyW 2kzqfqs/tRCOlfKPR6mASqEmn2wFwrOHG8mRQDTYY7xM5kLsQ4nuEWaOU25xbCvykEPBgFMG2rdLK RFqmrhZZ6llgZB7RnXqNsDoC+S+fkyasPetFfzN080Rzt8MtCuf8vHNY2Ik6ymOp3GZsZifFGpAfL yjvlb2jA==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 05/34] KVM: selftests: Add KVM/PV clock selftest to prove timer correction Date: Mon, 8 Jun 2026 15:47:46 +0100 Message-ID: <20260608145455.89187-6-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. 
See http://www.infradead.org/rpr.html X-purgate-ID: tlsNG-d62444/1780930516-B5B46FF4-BB22682A/0/0 X-purgate-type: clean X-purgate-size: 15950 X-ZohoMail-DKIM: pass (identity @infradead.org) X-ZM-MESSAGEID: 1780930553270158500 From: Jack Allister A VM's KVM/PV clock has an inherent relationship to its TSC. When either the host system live-updates or the VM is live-migrated, this pairing of the two clock sources should stay the same. In reality this is not the case without some correction taking place. The KVM_GET_CLOCK_GUEST/KVM_SET_CLOCK_GUEST ioctls can be used to perform a correction on the PVTI (PV time information) structure held by KVM to effectively fix up the kvmclock_offset prior to the guest VM resuming in either a live-update or a live-migration scenario. This test proves that without the necessary fixup there is a perceived change in the guest TSC and KVM/PV clock relationship before and after a simulated LU/LM takes place, and that the correction eliminates it. The test: 1. Snapshots the PVTI at boot (PVTI0). 2. Induces a change in PVTI data (KVM_REQ_MASTERCLOCK_UPDATE). 3. Snapshots the PVTI after the change (PVTI1). 4. Requests correction via KVM_SET_CLOCK_GUEST using PVTI0. 5. Snapshots the PVTI after correction (PVTI2). It then samples the TSC at a single point in time and calculates the KVM clock using each PVTI snapshot. The corrected clock should match the boot clock to within =C2=B12ns. The test enumerates multiple TSC frequencies from 1GHz to 5GHz at 500MHz steps, crossing the 32-bit boundary, to exercise the scaling path at various ratios. The sleep duration between snapshots is configurable via the -s/--sleep command line option.
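The comparison at the heart of the steps above can be sketched as follows. The names struct pvti_snap, snap_clock_ns() and snap_delta_ns() are hypothetical and the struct is only a subset of the real PVTI; the 128-bit multiply relies on the GCC/Clang __int128 extension. One plausible reason for allowing a nanosecond or two of slack, which is an assumption here rather than something the patch states, is that each snapshot truncates its own fixed-point product, so two snapshots lying on the same line but with different reference points can still disagree by a rounding step.

```c
#include <stdint.h>

/* Hypothetical subset of a PVTI snapshot taken by the guest. */
struct pvti_snap {
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
};

/* KVM clock (ns) that one snapshot yields for a given TSC sample. */
static uint64_t snap_clock_ns(const struct pvti_snap *p, uint64_t tsc)
{
	uint64_t delta = tsc - p->tsc_timestamp;

	delta = p->tsc_shift < 0 ? delta >> -p->tsc_shift
				 : delta << p->tsc_shift;
	return p->system_time +
	       (uint64_t)(((unsigned __int128)delta * p->tsc_to_system_mul) >> 32);
}

/* Signed difference between two snapshots evaluated at the same TSC. */
static int64_t snap_delta_ns(const struct pvti_snap *a,
			     const struct pvti_snap *b, uint64_t tsc)
{
	return (int64_t)(snap_clock_ns(a, tsc) - snap_clock_ns(b, tsc));
}
```

Evaluating both snapshots at one shared TSC sample, rather than reading the clock twice, removes the time elapsed between reads from the comparison, so any remaining delta comes from the snapshot parameters themselves.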
Co-developed-by: David Woodhouse Signed-off-by: David Woodhouse Signed-off-by: Jack Allister Reviewed-by: Paul Durrant Cc: Dongli Zhang --- tools/testing/selftests/kvm/Makefile.kvm | 1 + .../testing/selftests/kvm/x86/pvclock_test.c | 440 ++++++++++++++++++ 2 files changed, 441 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86/pvclock_test.c diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selft= ests/kvm/Makefile.kvm index 9118a5a51b89..fb935ae3bf38 100644 --- a/tools/testing/selftests/kvm/Makefile.kvm +++ b/tools/testing/selftests/kvm/Makefile.kvm @@ -105,6 +105,7 @@ TEST_GEN_PROGS_x86 +=3D x86/pmu_counters_test TEST_GEN_PROGS_x86 +=3D x86/pmu_event_filter_test TEST_GEN_PROGS_x86 +=3D x86/private_mem_conversions_test TEST_GEN_PROGS_x86 +=3D x86/private_mem_kvm_exits_test +TEST_GEN_PROGS_x86 +=3D x86/pvclock_test TEST_GEN_PROGS_x86 +=3D x86/set_boot_cpu_id TEST_GEN_PROGS_x86 +=3D x86/set_sregs_test TEST_GEN_PROGS_x86 +=3D x86/smaller_maxphyaddr_emulation_test diff --git a/tools/testing/selftests/kvm/x86/pvclock_test.c b/tools/testing= /selftests/kvm/x86/pvclock_test.c new file mode 100644 index 000000000000..aecd62fc8a93 --- /dev/null +++ b/tools/testing/selftests/kvm/x86/pvclock_test.c @@ -0,0 +1,440 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright =C2=A9 Amazon.com, Inc. or its affiliates. + * + * Tests for pvclock API + * KVM_SET_CLOCK_GUEST/KVM_GET_CLOCK_GUEST + */ +#include +#include +#include +#include +#include + +#include "test_util.h" +#include "kvm_util.h" +#include "processor.h" +#include "apic.h" + +#include + +/* + * Reproduce the pvclock calculation the guest uses to convert TSC to + * nanoseconds. This must match the kernel's __pvclock_read_cycles(). 
+ */ +static inline uint64_t pvclock_scale_delta(uint64_t delta, uint32_t mul, + int8_t shift) +{ + if (shift < 0) + delta >>=3D -shift; + else + delta <<=3D shift; + return ((__uint128_t)delta * mul) >> 32; +} + +static inline uint64_t pvclock_read_cycles(struct pvclock_vcpu_time_info *= src, + uint64_t tsc) +{ + uint64_t delta =3D tsc - src->tsc_timestamp; + + return src->system_time + pvclock_scale_delta(delta, + src->tsc_to_system_mul, + src->tsc_shift); +} + +static inline void pvti_snapshot(struct pvclock_vcpu_time_info *dst, + volatile struct pvclock_vcpu_time_info *src) +{ + uint32_t version; + + do { + version =3D src->version; + __asm__ __volatile__("" ::: "memory"); + *dst =3D *src; + __asm__ __volatile__("" ::: "memory"); + } while ((src->version & 1) || src->version !=3D version); +} + +enum { + STAGE_FIRST_BOOT, + STAGE_UNCORRECTED, + STAGE_CORRECTED +}; + +#define KVMCLOCK_GPA 0xc0000000ull +#define KVMCLOCK_SIZE sizeof(struct pvclock_vcpu_time_info) + +static void trigger_pvti_update(void) +{ + /* + * Toggle between KVM's old and new system time methods to coerce KVM + * into updating the fields in the PV time info struct. + */ + wrmsr(MSR_KVM_SYSTEM_TIME, KVMCLOCK_GPA | KVM_MSR_ENABLED); + wrmsr(MSR_KVM_SYSTEM_TIME_NEW, KVMCLOCK_GPA | KVM_MSR_ENABLED); +} + +static void guest_code(void) +{ + struct pvclock_vcpu_time_info *pvti =3D + (void *)(unsigned long)KVMCLOCK_GPA; + struct pvclock_vcpu_time_info pvti_boot; + struct pvclock_vcpu_time_info pvti_uncorrected; + struct pvclock_vcpu_time_info pvti_corrected; + uint64_t tsc_guest; + uint64_t clk_boot, clk_uncorrected, clk_corrected; + int64_t delta_corrected; + + /* Set up kvmclock and snapshot the initial pvclock parameters. */ + wrmsr(MSR_KVM_SYSTEM_TIME_NEW, KVMCLOCK_GPA | KVM_MSR_ENABLED); + pvti_snapshot(&pvti_boot, pvti); + GUEST_SYNC(STAGE_FIRST_BOOT); + + /* + * Trigger an update of the PVTI. Calculating the KVM clock using this + * updated structure will show a delta from the original. 
+ */ + trigger_pvti_update(); + pvti_snapshot(&pvti_uncorrected, pvti); + GUEST_SYNC(STAGE_UNCORRECTED); + + /* + * Snapshot the corrected time (the host does KVM_SET_CLOCK_GUEST when + * handling STAGE_UNCORRECTED). + */ + pvti_snapshot(&pvti_corrected, pvti); + + /* + * Sample the TSC at a single point in time, then calculate the + * effective KVM clock using the PVTI from each stage. Verify that the + * corrected clock matches the boot clock to within =C2=B12ns. + */ + tsc_guest =3D rdtsc(); + + clk_boot =3D pvclock_read_cycles(&pvti_boot, tsc_guest); + clk_uncorrected =3D pvclock_read_cycles(&pvti_uncorrected, tsc_guest); + clk_corrected =3D pvclock_read_cycles(&pvti_corrected, tsc_guest); + + delta_corrected =3D clk_boot - clk_corrected; + + __GUEST_ASSERT(delta_corrected >=3D -2 && delta_corrected <=3D 2, + "corrected delta %ld out of range (boot=3D%lu uncorrected=3D%lu c= orrected=3D%lu)", + delta_corrected, clk_boot, clk_uncorrected, clk_corrected); + + GUEST_SYNC(STAGE_CORRECTED); +} + +static void run_test(struct kvm_vm *vm, struct kvm_vcpu *vcpu, + unsigned int sleep_sec) +{ + struct pvclock_vcpu_time_info pvti_before; + struct ucall uc; + + for (;;) { + vcpu_run(vcpu); + TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO); + + switch (get_ucall(vcpu, &uc)) { + case UCALL_ABORT: + REPORT_GUEST_ASSERT(uc); + break; + case UCALL_SYNC: + break; + default: + TEST_FAIL("Unexpected ucall"); + } + + switch (uc.args[1]) { + case STAGE_FIRST_BOOT: + /* Save the pvclock parameters before the update. */ + vcpu_ioctl(vcpu, KVM_GET_CLOCK_GUEST, &pvti_before); + + /* Sleep to let the clocks diverge. */ + sleep(sleep_sec); + break; + + case STAGE_UNCORRECTED: + /* Restore the original pvclock parameters. */ + vcpu_ioctl(vcpu, KVM_SET_CLOCK_GUEST, &pvti_before); + break; + + case STAGE_CORRECTED: + /* Guest verified the delta in-guest. 
*/ + return; + + default: + TEST_FAIL("Unknown stage %lu", uc.args[1]); + } + } +} + +static void configure_pvclock(struct kvm_vm *vm) +{ + unsigned int nr_pages; + + nr_pages =3D vm_calc_num_guest_pages(VM_MODE_DEFAULT, getpagesize()); + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, + KVMCLOCK_GPA, 1, nr_pages, 0); + virt_map(vm, KVMCLOCK_GPA, KVMCLOCK_GPA, nr_pages); +} + +static void run_at_frequency(uint64_t tsc_khz, unsigned int sleep_sec) +{ + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + + pr_info("Testing at TSC frequency %lu kHz\n", tsc_khz); + vm =3D vm_create_with_one_vcpu(&vcpu, guest_code); + configure_pvclock(vm); + vcpu_ioctl(vcpu, KVM_SET_TSC_KHZ, (void *)tsc_khz); + run_test(vm, vcpu, sleep_sec); + kvm_vm_free(vm); +} + +static void test_tsc_stable_bit(void); +static void test_clock_guest_with_offsets(void); + +static void usage(const char *name) +{ + printf("Usage: %s [options]\n" + " -s, --sleep SEC sleep duration between snapshots (default: = 2)\n" + " -h, --help show this help\n", name); +} + +int main(int argc, char *argv[]) +{ + static const struct option long_opts[] =3D { + { "sleep", required_argument, NULL, 's' }, + { "help", no_argument, NULL, 'h' }, + { NULL, 0, NULL, 0 }, + }; + unsigned int sleep_sec =3D 2; + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + uint64_t host_khz; + uint64_t freq; + int opt; + + while ((opt =3D getopt_long(argc, argv, "s:h", long_opts, NULL)) !=3D -1)= { + switch (opt) { + case 's': + sleep_sec =3D atoi(optarg); + break; + case 'h': + default: + usage(argv[0]); + return opt =3D=3D 'h' ? 
0 : 1; + } + } + + TEST_REQUIRE(sys_clocksource_is_based_on_tsc()); + TEST_REQUIRE(kvm_has_cap(KVM_CAP_TSC_CONTROL)); + + vm =3D vm_create_with_one_vcpu(&vcpu, guest_code); + configure_pvclock(vm); + + /* Check KVM_GET_CLOCK_GUEST is supported */ + { + struct pvclock_vcpu_time_info tmp; + int ret =3D __vcpu_ioctl(vcpu, KVM_GET_CLOCK_GUEST, &tmp); + TEST_REQUIRE(ret !=3D -1 || errno !=3D ENOTTY); + } + + /* First run at native frequency (no scaling). */ + run_test(vm, vcpu, sleep_sec); + + /* + * Then enumerate a range of TSC frequencies crossing the 32-bit + * boundary, to exercise the scaling path at various ratios. + */ + host_khz =3D __vcpu_ioctl(vcpu, KVM_GET_TSC_KHZ, NULL); + kvm_vm_free(vm); + + for (freq =3D 1000000; freq <=3D 5000000; freq +=3D 500000) { + if (freq =3D=3D host_khz) + continue; + run_at_frequency(freq, sleep_sec); + } + + test_tsc_stable_bit(); + test_clock_guest_with_offsets(); + + return 0; +} + +static void guest_code_stable_bit(void) +{ + uint32_t apic_id =3D GET_APIC_ID_FIELD(xapic_read_reg(APIC_ID)); + uint64_t gpa =3D KVMCLOCK_GPA + apic_id * sizeof(struct pvclock_vcpu_time= _info); + + wrmsr(MSR_KVM_SYSTEM_TIME_NEW, gpa | KVM_MSR_ENABLED); + GUEST_SYNC(0); + GUEST_SYNC(0); + GUEST_SYNC(0); +} + +static void set_tsc_offset(struct kvm_vcpu *vcpu, uint64_t offset) +{ + struct kvm_device_attr attr =3D { + .group =3D KVM_VCPU_TSC_CTRL, + .attr =3D KVM_VCPU_TSC_OFFSET, + .addr =3D (__u64)(uintptr_t)&offset, + }; + vcpu_ioctl(vcpu, KVM_SET_DEVICE_ATTR, &attr); +} + +static void run_vcpu_once(struct kvm_vcpu *vcpu) +{ + struct ucall uc; + + vcpu_run(vcpu); + TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO); + switch (get_ucall(vcpu, &uc)) { + case UCALL_ABORT: + REPORT_GUEST_ASSERT(uc); + break; + case UCALL_SYNC: + break; + default: + TEST_FAIL("Unexpected ucall"); + } +} + +static void test_tsc_stable_bit(void) +{ + struct pvclock_vcpu_time_info pvti; + struct kvm_vcpu *vcpus[2]; + struct kvm_vm *vm; + int ret; + + pr_info("Testing 
PVCLOCK_TSC_STABLE_BIT with matched/unmatched TSCs\n"); + + vm =3D vm_create_with_vcpus(2, guest_code_stable_bit, vcpus); + configure_pvclock(vm); + + /* + * Case 1: All TSCs matched (same frequency and offset). + * Master clock should be active, PVCLOCK_TSC_STABLE_BIT set. + */ + run_vcpu_once(vcpus[0]); + + ret =3D __vcpu_ioctl(vcpus[0], KVM_GET_CLOCK_GUEST, &pvti); + TEST_ASSERT(!ret, "GET_CLOCK_GUEST should succeed with matched TSCs"); + TEST_ASSERT(pvti.flags & PVCLOCK_TSC_STABLE_BIT, + "PVCLOCK_TSC_STABLE_BIT should be set with matched TSCs"); + + /* + * Case 2: Different TSC offset, same frequency. + * Master clock should still be active (frequency matches), but + * PVCLOCK_TSC_STABLE_BIT should be cleared (offsets differ). + */ + set_tsc_offset(vcpus[1], 12345678); + run_vcpu_once(vcpus[1]); + run_vcpu_once(vcpus[0]); + + ret =3D __vcpu_ioctl(vcpus[0], KVM_GET_CLOCK_GUEST, &pvti); + if (ret) { + /* Master clock disabled by offset mismatch =E2=80=94 old kernel */ + pr_info(" Skipping offset tests (master clock requires matched offsets)= \n"); + goto out_stable; + } + TEST_ASSERT(!(pvti.flags & PVCLOCK_TSC_STABLE_BIT), + "PVCLOCK_TSC_STABLE_BIT should be clear with offset-mismatched TSCs"= ); + + /* + * Case 3: Different TSC frequency. + * Master clock should be disabled entirely. 
+ */ + vcpu_ioctl(vcpus[1], KVM_SET_TSC_KHZ, + (void *)(unsigned long)(__vcpu_ioctl(vcpus[1], KVM_GET_TSC_KHZ, NULL)= / 2)); + /* Write TSC to trigger kvm_synchronize_tsc / kvm_track_tsc_matching */ + vcpu_set_msr(vcpus[1], MSR_IA32_TSC, 0); + run_vcpu_once(vcpus[1]); + + ret =3D __vcpu_ioctl(vcpus[0], KVM_GET_CLOCK_GUEST, &pvti); + TEST_ASSERT(ret && errno =3D=3D EINVAL, + "GET_CLOCK_GUEST should fail with frequency-mismatched TSCs, got %d = (errno %d)", + ret, errno); + +out_stable: + kvm_vm_free(vm); +} + +static void test_clock_guest_with_offsets(void) +{ + struct pvclock_vcpu_time_info pvti0, pvti1, pvti1_after; + struct kvm_vcpu *vcpus[2]; + struct kvm_vm *vm; + int64_t delta; + int ret; + + pr_info("Testing KVM_[GS]ET_CLOCK_GUEST with different TSC offsets\n"); + + vm =3D vm_create_with_vcpus(2, guest_code_stable_bit, vcpus); + configure_pvclock(vm); + + /* Set different TSC offsets on the two vCPUs */ + set_tsc_offset(vcpus[0], 0); + set_tsc_offset(vcpus[1], 1000000000ull); + + /* Run both to establish kvmclock */ + run_vcpu_once(vcpus[0]); + run_vcpu_once(vcpus[1]); + + /* GET_CLOCK_GUEST on both =E2=80=94 should succeed (master clock active)= */ + ret =3D __vcpu_ioctl(vcpus[0], KVM_GET_CLOCK_GUEST, &pvti0); + if (ret) { + pr_info(" Skipping (master clock requires matched offsets on this kerne= l)\n"); + kvm_vm_free(vm); + return; + } + ret =3D __vcpu_ioctl(vcpus[1], KVM_GET_CLOCK_GUEST, &pvti1); + TEST_ASSERT(!ret, "GET_CLOCK_GUEST on vcpu1 failed"); + + /* The tsc_timestamps should differ (different offsets) */ + TEST_ASSERT(pvti0.tsc_timestamp !=3D pvti1.tsc_timestamp, + "tsc_timestamps should differ with different offsets"); + + /* Sleep to let time elapse, then restore vcpu0's clock */ + sleep(1); + vcpu_ioctl(vcpus[0], KVM_SET_CLOCK_GUEST, &pvti0); + + /* Run vcpu0 to process the clock update */ + run_vcpu_once(vcpus[0]); + + /* GET_CLOCK_GUEST on vcpu1 =E2=80=94 should reflect the correction */ + ret =3D __vcpu_ioctl(vcpus[1], KVM_GET_CLOCK_GUEST, 
&pvti1_after); + TEST_ASSERT(!ret, "GET_CLOCK_GUEST on vcpu1 after SET failed"); + + /* + * After SET on vcpu0, verify the correction worked by getting + * the clock on vcpu0 again. The mul/shift should be the same, + * and computing kvmclock at the same TSC should give the same + * result as the original (within =C2=B12ns). + */ + { + struct pvclock_vcpu_time_info pvti0_after; + uint64_t tsc_now, clk_from_old, clk_from_new; + + ret =3D __vcpu_ioctl(vcpus[0], KVM_GET_CLOCK_GUEST, &pvti0_after); + TEST_ASSERT(!ret, "GET_CLOCK_GUEST on vcpu0 after SET failed"); + + tsc_now =3D pvti0_after.tsc_timestamp; + clk_from_old =3D pvclock_read_cycles(&pvti0, tsc_now); + clk_from_new =3D pvclock_read_cycles(&pvti0_after, tsc_now); + + delta =3D (int64_t)clk_from_new - (int64_t)clk_from_old; + TEST_ASSERT(delta >=3D -2 && delta <=3D 2, + "clock correction delta should be <=3D2ns, got %ld ns", + delta); + } + + /* + * Also verify that vcpu1's clock is still accessible (master + * clock still active with different offsets). 
+ */ + ret =3D __vcpu_ioctl(vcpus[1], KVM_GET_CLOCK_GUEST, &pvti1_after); + TEST_ASSERT(!ret, "GET_CLOCK_GUEST on vcpu1 after SET failed"); + + kvm_vm_free(vm); +} --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930539; cv=none; d=zohomail.com; s=zohoarc; b=I3+smRzLNKrma+gtMb8KoOEJ/dwbw6OfrRVvxxEEXHp1IIJagOfQGeyM8VrNkJtYMe6o2LTdRKjXk5CC+jXHVWCbrGjXWq1vK5Y35uuoy2mWmy9GTDDntA17Uo7byZr1x+Yt8xx6kTfUscmsQBHDFPNZX7/YvB0GYRjZAj51wiY= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930539; h=Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=kmXiitnSRUwEV6tPjvXxCrsAzGUuV+8SMjnRuSsAfbI=; b=myKh6H2ZTvSd5Is47GCTjPUYh7IdxNmHeX74xi5mS+zltEWIGaJzPJLpX/uG85tnq4v/DHhMLuAGDhP6giGvBv8oSaJhw79ca6MzCidZoUc7PZW5CEbcPBIhVBXXDA1mW8ZYIOiktsJun5gS/tPh9nw10Ir26aX/TveNZk1wI50= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1780930539467847.6293558531382; Mon, 8 Jun 2026 07:55:39 -0700 (PDT) Received: from list by 
From: David Woodhouse
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, "H. Peter Anvin", Vitaly Kuznetsov, Juergen Gross,
 Boris Ostrovsky, David Woodhouse, Paul Durrant, Jonathan Cameron,
 Sascha Bischoff, Marc Zyngier, Joey Gouly, Jack Allister, Dongli Zhang,
 joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
 linux-kselftest@vger.kernel.org
Subject: [PATCH v5 06/34] KVM: x86: Explicitly disable TSC scaling without CONSTANT_TSC
Date: Mon, 8 Jun 2026 15:47:47 +0100
Message-ID: <20260608145455.89187-7-dwmw2@infradead.org>
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>

From: David Woodhouse

KVM does make an attempt to cope with non-constant TSC, and has
notifiers to handle host TSC frequency changes. However, it *only*
adjusts the KVM clock, and doesn't adjust TSC frequency scaling when
the host changes.

This is presumably because non-constant TSCs were fixed in hardware
long before TSC scaling was implemented, so there should never be real
CPUs which have TSC scaling but *not* CONSTANT_TSC.

Such a combination could potentially happen in some odd L1 nesting
environment, but it isn't worth trying to support it. Just make the
dependency explicit.

Signed-off-by: David Woodhouse
Reviewed-by: Paul Durrant
---
 arch/x86/kvm/svm/svm.c | 3 ++-
 arch/x86/kvm/vmx/vmx.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e7fdd7a9c280..7817752533fe 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5546,7 +5546,8 @@ static __init int svm_hardware_setup(void)
 				     XFEATURE_MASK_BNDCSR);
 
 	if (tsc_scaling) {
-		if (!boot_cpu_has(X86_FEATURE_TSCRATEMSR)) {
+		if (!boot_cpu_has(X86_FEATURE_TSCRATEMSR) ||
+		    !boot_cpu_has(X86_FEATURE_CONSTANT_TSC)) {
 			tsc_scaling = false;
 		} else {
 			pr_info("TSC scaling supported\n");
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a29896a9ef14..ed207cc7692d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8672,7 +8672,7 @@ __init int vmx_hardware_setup(void)
 	if (!enable_apicv || !cpu_has_vmx_ipiv())
 		enable_ipiv = false;
 
-	if (cpu_has_vmx_tsc_scaling())
+	if (cpu_has_vmx_tsc_scaling() && boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
 		kvm_caps.has_tsc_control = true;
 
 	kvm_caps.max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, "H. Peter Anvin", Vitaly Kuznetsov, Juergen Gross,
 Boris Ostrovsky, David Woodhouse, Paul Durrant, Jonathan Cameron,
 Sascha Bischoff, Marc Zyngier, Joey Gouly, Jack Allister, Dongli Zhang,
 joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
 linux-kselftest@vger.kernel.org
Subject: [PATCH v5 07/34] KVM: x86: Activate master clock immediately on vCPU creation
Date: Mon, 8 Jun 2026 15:47:48 +0100
Message-ID: <20260608145455.89187-8-dwmw2@infradead.org>
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>

From: David Woodhouse

Previously, the master clock was only activated when the first vCPU
processed KVM_REQ_MASTERCLOCK_UPDATE during KVM_RUN. This meant that
KVM_GET_CLOCK could not return the host_tsc field until after the first
KVM_RUN, making it impossible for userspace to follow the documented
TSC migration procedure without a dummy vCPU run.

Fix this by calling kvm_update_masterclock() directly from
kvm_arch_vcpu_postcreate(), after kvm_synchronize_tsc() has already set
all_vcpus_matched_freq. This ensures the master clock is active
immediately, and KVM_GET_CLOCK returns a valid {host_tsc, realtime}
pair as soon as a vCPU exists.

Signed-off-by: David Woodhouse
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b7e5f6e3dc6c..c1897d939da9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13098,6 +13098,8 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
 		return;
 	vcpu_load(vcpu);
 	kvm_synchronize_tsc(vcpu, NULL);
+	if (!vcpu->kvm->arch.use_master_clock)
+		kvm_update_masterclock(vcpu->kvm);
 	vcpu_put(vcpu);
 
 	/* poll control enabled by default */
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, "H. Peter Anvin", Vitaly Kuznetsov, Juergen Gross,
 Boris Ostrovsky, David Woodhouse, Paul Durrant, Jonathan Cameron,
 Sascha Bischoff, Marc Zyngier, Joey Gouly, Jack Allister, Dongli Zhang,
 joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
 linux-kselftest@vger.kernel.org
Subject: [PATCH v5 08/34] KVM: x86: Add KVM_VCPU_TSC_SCALE and fix the documentation on TSC migration
Date: Mon, 8 Jun 2026 15:47:49 +0100
Message-ID: <20260608145455.89187-9-dwmw2@infradead.org>
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>

From: David Woodhouse

The documentation on TSC migration using KVM_VCPU_TSC_OFFSET is woefully
inadequate. It ignores TSC scaling, and ignores the fact that the host
TSC may differ from one host to the next (and in fact because of the way
the kernel calibrates it, it generally differs from one boot to the next
even on the same hardware).

Add KVM_VCPU_TSC_SCALE to extract the actual scale ratio and frac_bits,
and attempt to document the process that userspace needs to follow to
preserve the TSC across migration. Add a self test to function as an
exemplar.

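
For illustration only (this is not part of the patch), the arithmetic
that the migration procedure relies on can be sketched in C roughly as
follows. The helper names scale_tsc(), guest_tsc_at() and
dest_tsc_offset() are hypothetical. Treating frac_bits == 0 as "no
scaling" is an assumption that mirrors the fallback used in the self
test, and unsigned __int128 is a GCC/Clang extension. The sketch also
assumes the elapsed time is given in nanoseconds and the guest TSC
frequency in kHz, as passed to KVM_SET_TSC_KHZ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: apply a fixed-point ratio to a host TSC value. */
static uint64_t scale_tsc(uint64_t host_tsc, uint64_t ratio, uint64_t frac_bits)
{
	assert(frac_bits < 64);
	/* Assumption: frac_bits == 0 means no scaling information. */
	if (!frac_bits)
		return host_tsc;
	/* 128-bit intermediate avoids overflow of host_tsc * ratio. */
	return (uint64_t)(((unsigned __int128)host_tsc * ratio) >> frac_bits);
}

/* guest_tsc = ((host_tsc * ratio) >> frac_bits) + offset, modulo 2^64. */
static uint64_t guest_tsc_at(uint64_t host_tsc, uint64_t ratio,
			     uint64_t frac_bits, uint64_t offset)
{
	return scale_tsc(host_tsc, ratio, frac_bits) + offset;
}

/*
 * Destination offset: the intended guest TSC at Tdst minus the raw
 * scaled host TSC at Tdst. Elapsed cycles = ns * kHz / 10^6.
 */
static uint64_t dest_tsc_offset(uint64_t tsc_src, int64_t delta_t_ns,
				uint64_t freq_khz, uint64_t host_tsc_dst,
				uint64_t ratio_dst, uint64_t frac_bits_dst)
{
	int64_t elapsed = (int64_t)((__int128)delta_t_ns *
				    (int64_t)freq_khz / 1000000);
	uint64_t tsc_dst = tsc_src + (uint64_t)elapsed;

	return tsc_dst - scale_tsc(host_tsc_dst, ratio_dst, frac_bits_dst);
}
```

With a ratio of 1.5 expressed in 48 fractional bits (3ULL << 47),
scale_tsc(1000, 3ULL << 47, 48) evaluates to 1500.
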
Only enumerate KVM_VCPU_TSC_SCALE when kvm_caps.has_tsc_control is true,
since the scaling ratio is only meaningful when hardware TSC scaling is
supported.

Signed-off-by: David Woodhouse
Reviewed-by: Paul Durrant
---
 Documentation/virt/kvm/devices/vcpu.rst       | 119 ++++--
 arch/x86/include/uapi/asm/kvm.h               |   6 +
 arch/x86/kvm/x86.c                            |  22 +
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../kvm/x86/pvclock_migration_test.c          | 382 ++++++++++++++++++
 5 files changed, 500 insertions(+), 30 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86/pvclock_migration_test.c

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 5e3805820010..167aa4140d30 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -243,7 +243,10 @@ Returns:
 Specifies the guest's TSC offset relative to the host's TSC. The guest's
 TSC is then derived by the following equation:
 
-  guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET
+  guest_tsc = ((host_tsc * tsc_scale_ratio) >> tsc_scale_bits) + KVM_VCPU_TSC_OFFSET
+
+The values of tsc_scale_ratio and tsc_scale_bits can be obtained using
+the KVM_VCPU_TSC_SCALE attribute.
 
 This attribute is useful to adjust the guest's TSC on live migration,
 so that the TSC counts the time during which the VM was paused. The
@@ -251,44 +254,100 @@ following describes a possible algorithm to use for this purpose.
 
 From the source VMM process:
 
-1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_src),
+1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (host_tsc_src),
    kvmclock nanoseconds (guest_src), and host CLOCK_REALTIME nanoseconds
-   (host_src).
+   (time_src) at a given moment (Tsrc).
+
+2. For each vCPU[i]:
+
+   a. Read the KVM_VCPU_TSC_OFFSET attribute to record the guest TSC offset
+      (ofs_src[i]).
 
-2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
-   guest TSC offset (ofs_src[i]).
+   b. Read the KVM_VCPU_TSC_SCALE attribute to record the guest TSC scaling
+      ratio (ratio_src[i], frac_bits_src[i]).
 
-3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
-   guest's TSC (freq).
+   c. Use host_tsc_src and the scaling/offset factors to calculate this
+      vCPU's TSC at time Tsrc:
+
+        tsc_src[i] = ((host_tsc_src * ratio_src[i]) >> frac_bits_src[i]) + ofs_src[i]
+
+3. Invoke the KVM_GET_CLOCK_GUEST ioctl on the boot vCPU to return the KVM
+   clock as a function of the guest TSC (pvti_src). (This ioctl may not
+   succeed if the host and guest TSCs are not consistent and well-behaved.)
 
 From the destination VMM process:
 
-4. Invoke the KVM_SET_CLOCK ioctl, providing the source nanoseconds from
-   kvmclock (guest_src) and CLOCK_REALTIME (host_src) in their respective
-   fields. Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
-   structure.
+4. Before creating the vCPUs, invoke the KVM_SET_TSC_KHZ ioctl on the VM, to
+   set the scaled frequency of the guest's TSC (freq).
+
+5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (host_tsc_dst) and
+   host CLOCK_REALTIME nanoseconds (time_dst) at a given moment (Tdst).
+
+6. Calculate the number of nanoseconds elapsed between Tsrc and Tdst:
+
+     ΔT = time_dst - time_src
+
+7. As each vCPU[i] is created:
+
+   a. Read the KVM_VCPU_TSC_SCALE attribute to record the guest TSC scaling
+      ratio (ratio_dst[i], frac_bits_dst[i]).
+
+   b. Calculate the intended guest TSC value at time Tdst:
+
+        tsc_dst[i] = tsc_src[i] + (ΔT * freq[i])
 
-   KVM will advance the VM's kvmclock to account for elapsed time since
-   recording the clock values. Note that this will cause problems in
-   the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized
-   between the source and destination, and a reasonably short time passes
-   between the source pausing the VMs and the destination executing
-   steps 4-7.
+   c. Use host_tsc_dst and the scaling factors to calculate this vCPU's
+      raw scaled TSC at time Tdst without offsetting:
+
+        raw_dst[i] = ((host_tsc_dst * ratio_dst[i]) >> frac_bits_dst[i])
+
+   d. Calculate ofs_dst[i] = tsc_dst[i] - raw_dst[i] and set the resulting
+      offset using the KVM_VCPU_TSC_OFFSET attribute.
+
+8. If pvti_src was provided, invoke the KVM_SET_CLOCK_GUEST ioctl on the boot
+   vCPU to restore the KVM clock as a precise function of the guest TSC.
+
+9. If KVM_SET_CLOCK_GUEST was not available or failed (e.g. because the
+   master clock is not active), fall back to the KVM_SET_CLOCK ioctl,
+   providing the source nanoseconds from kvmclock (guest_src) and
+   CLOCK_REALTIME (time_src) in their respective fields. Ensure that the
+   KVM_CLOCK_REALTIME flag is set in the provided structure.
+
+   KVM will restore the VM's kvmclock, accounting for elapsed time since
+   the clock values were recorded. Note that this will cause problems in
+   the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized between
+   the source and destination, and a reasonably short time passes between
+   the source pausing the VMs and the destination resuming them.
+   Due to the KVM_[SG]ET_CLOCK API using CLOCK_REALTIME instead of
+   CLOCK_TAI, leap seconds during the migration may also introduce errors.
+
+4.2 ATTRIBUTE: KVM_VCPU_TSC_SCALE
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:Parameters: struct kvm_vcpu_tsc_scale
+
+Returns:
+
+	 ======= ======================================
+	 -EFAULT Error reading the provided parameter
+		 address.
+	 -ENXIO  Attribute not supported (no TSC scaling)
+	 -EINVAL Invalid request to write the attribute
+	 ======= ======================================
 
-5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_dest) and
-   kvmclock nanoseconds (guest_dest).
+This read-only attribute reports the guest's TSC scaling factor, in the form
+of a fixed-point number represented by the following structure::
 
-6. Adjust the guest TSC offsets for every vCPU to account for (1) time
-   elapsed since recording state and (2) difference in TSCs between the
-   source and destination machine:
+    struct kvm_vcpu_tsc_scale {
+	__u64 tsc_ratio;
+	__u64 tsc_frac_bits;
+    };
 
-   ofs_dst[i] = ofs_src[i] -
-     (guest_src - guest_dest) * freq +
-     (tsc_src - tsc_dest)
+The tsc_frac_bits field indicates the location of the fixed point, such that
+host TSC values are converted to guest TSC using the formula:
 
-   ("ofs[i] + tsc - guest * freq" is the guest TSC value corresponding to
-   a time of 0 in kvmclock. The above formula ensures that it is the
-   same on the destination as it was on the source).
+  guest_tsc = ((host_tsc * tsc_ratio) >> tsc_frac_bits) + offset
 
-7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
-   respective value derived in the previous step.
+Userspace can use this to precisely calculate the guest TSC from the host
+TSC at any given moment. This is needed for accurate migration of guests,
+as described in the documentation for the KVM_VCPU_TSC_OFFSET attribute.
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 5f2b30d0405c..384be9a53395 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -961,6 +961,12 @@ struct kvm_hyperv_eventfd {
 /* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
 #define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
 #define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
+#define   KVM_VCPU_TSC_SCALE  1 /* attribute for TSC scaling factor */
+
+struct kvm_vcpu_tsc_scale {
+	__u64 tsc_ratio;
+	__u64 tsc_frac_bits;
+};
 
 /* x86-specific KVM_EXIT_HYPERCALL flags. */
 #define KVM_EXIT_HYPERCALL_LONG_MODE	_BITULL(0)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c1897d939da9..6337f9b9d7ac 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5930,6 +5930,9 @@ static int kvm_arch_tsc_has_attr(struct kvm_vcpu *vcpu,
 	case KVM_VCPU_TSC_OFFSET:
 		r = 0;
 		break;
+	case KVM_VCPU_TSC_SCALE:
+		r = kvm_caps.has_tsc_control ? 0 : -ENXIO;
+		break;
 	default:
 		r = -ENXIO;
 	}
@@ -5950,6 +5953,22 @@ static int kvm_arch_tsc_get_attr(struct kvm_vcpu *vcpu,
 			break;
 		r = 0;
 		break;
+	case KVM_VCPU_TSC_SCALE: {
+		struct kvm_vcpu_tsc_scale scale;
+
+		if (!kvm_caps.has_tsc_control) {
+			r = -ENXIO;
+			break;
+		}
+
+		scale.tsc_ratio = vcpu->arch.l1_tsc_scaling_ratio;
+		scale.tsc_frac_bits = kvm_caps.tsc_scaling_ratio_frac_bits;
+		r = -EFAULT;
+		if (copy_to_user(uaddr, &scale, sizeof(scale)))
+			break;
+		r = 0;
+		break;
+	}
 	default:
 		r = -ENXIO;
 	}
@@ -5989,6 +6008,9 @@ static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu,
 		r = 0;
 		break;
 	}
+	case KVM_VCPU_TSC_SCALE:
+		r = -EINVAL; /* Read only */
+		break;
 	default:
 		r = -ENXIO;
 	}
diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index fb935ae3bf38..90568ab631d7 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -106,6 +106,7 @@ TEST_GEN_PROGS_x86 += x86/pmu_event_filter_test
 TEST_GEN_PROGS_x86 += x86/private_mem_conversions_test
 TEST_GEN_PROGS_x86 += x86/private_mem_kvm_exits_test
 TEST_GEN_PROGS_x86 += x86/pvclock_test
+TEST_GEN_PROGS_x86 += x86/pvclock_migration_test
 TEST_GEN_PROGS_x86 += x86/set_boot_cpu_id
 TEST_GEN_PROGS_x86 += x86/set_sregs_test
 TEST_GEN_PROGS_x86 += x86/smaller_maxphyaddr_emulation_test
diff --git a/tools/testing/selftests/kvm/x86/pvclock_migration_test.c b/tools/testing/selftests/kvm/x86/pvclock_migration_test.c
new file mode 100644
index 000000000000..6a7eaf627d1a
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/pvclock_migration_test.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Test KVM clock precision across simulated live migration.
+ *
+ * Verifies that the documented TSC migration procedure (using
+ * KVM_VCPU_TSC_OFFSET, KVM_VCPU_TSC_SCALE, KVM_GET_CLOCK, and
+ * KVM_SET_CLOCK_GUEST) preserves the kvmclock's relationship to
+ * CLOCK_MONOTONIC_RAW.
+ *
+ * The test:
+ *  1. Creates a VM, runs the guest to enable kvmclock
+ *  2. Does a PTP-like ABA measurement of kvmclock vs CLOCK_MONOTONIC_RAW
+ *  3. Follows the documented migration procedure (same host, 1s pause)
+ *  4. Does the same ABA measurement on the destination VM
+ *  5. Verifies the kvmclock-vs-monotonic delta is preserved
+ */
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#include
+
+#define KVMCLOCK_GPA 0xc0000000ULL
+
+static void guest_code(void)
+{
+	wrmsr(MSR_KVM_SYSTEM_TIME_NEW, KVMCLOCK_GPA | 1);
+	GUEST_SYNC(0);
+	GUEST_SYNC(1);
+}
+
+static uint64_t read_kvmclock_ns(struct kvm_vm *vm)
+{
+	struct kvm_clock_data data = {};
+
+	vm_ioctl(vm, KVM_GET_CLOCK, &data);
+	return data.clock;
+}
+
+static uint64_t pvclock_read_cycles(struct pvclock_vcpu_time_info *src,
+				    uint64_t tsc)
+{
+	uint64_t delta = tsc - src->tsc_timestamp;
+	uint64_t ns;
+
+	if (src->tsc_shift >= 0)
+		delta <<= src->tsc_shift;
+	else
+		delta >>= -(int32_t)src->tsc_shift;
+
+	ns = (unsigned __int128)delta * src->tsc_to_system_mul >> 32;
+	return src->system_time + ns;
+}
+
+/*
+ * ABA measurement: read CLOCK_MONOTONIC_RAW, kvmclock, CLOCK_MONOTONIC_RAW.
+ * Repeat 3 times, keep the reading with the smallest spread.
+ */
+static void aba_reading(struct kvm_vm *vm, uint64_t *lo, uint64_t *kvm_ns,
+			uint64_t *hi)
+{
+	uint64_t best_spread = UINT64_MAX;
+	int i;
+
+	for (i = 0; i < 3; i++) {
+		struct timespec ts1, ts2;
+		uint64_t m1, m2, clk;
+
+		clock_gettime(CLOCK_MONOTONIC_RAW, &ts1);
+		clk = read_kvmclock_ns(vm);
+		clock_gettime(CLOCK_MONOTONIC_RAW, &ts2);
+
+		m1 = ts1.tv_sec * 1000000000ULL + ts1.tv_nsec;
+		m2 = ts2.tv_sec * 1000000000ULL + ts2.tv_nsec;
+
+		if (m2 - m1 < best_spread) {
+			best_spread = m2 - m1;
+			*lo = m1;
+			*kvm_ns = clk;
+			*hi = m2;
+		}
+	}
+}
+
+static struct kvm_vm *create_vm(struct kvm_vcpu **vcpu)
+{
+	struct kvm_vm *vm = vm_create_with_one_vcpu(vcpu, guest_code);
+
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+				    KVMCLOCK_GPA, 1, 1, 0);
+	virt_map(vm, KVMCLOCK_GPA, KVMCLOCK_GPA, 1);
+	return vm;
+}
+
+int main(void)
+{
+	struct pvclock_vcpu_time_info pvti_src;
+	struct kvm_clock_data clock_src, clock_dst;
+	struct kvm_vcpu_tsc_scale scale_src, scale_dst;
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	struct ucall uc;
+	uint64_t mono_before, kvm_before, kvm_after;
+	int64_t delta_before;
+	uint64_t ofs_src, tsc_src, tsc_dst, raw_dst, ofs_dst;
+	uint64_t host_tsc_src, host_tsc_dst;
+	uint64_t time_src, time_dst;
+	int64_t delta_t;
+	uint32_t freq_khz = 1500000; /* 1.5 GHz - forces TSC scaling */
+	int ret;
+
+	TEST_REQUIRE(sys_clocksource_is_based_on_tsc());
+
+	/* === SOURCE SIDE === */
+	pr_info("=== Source VM ===\n");
+	vm = create_vm(&vcpu);
+
+	/* Set guest TSC frequency (may trigger scaling) */
+	vcpu_ioctl(vcpu, KVM_SET_TSC_KHZ, (void *)(unsigned long)freq_khz);
+
+	/* Run guest to enable kvmclock */
+	vcpu_run(vcpu);
+	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+	TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_SYNC);
+
+	/* ABA measurement: kvmclock vs CLOCK_MONOTONIC_RAW */
+	uint64_t src_mono_lo, src_mono_hi;
+	aba_reading(vm, &src_mono_lo, &kvm_before, &src_mono_hi);
+	mono_before = (src_mono_lo + src_mono_hi) / 2;
+	delta_before = (int64_t)(kvm_before - mono_before);
+	pr_info(" kvmclock - MONOTONIC_RAW = %" PRId64 " ns (±%" PRIu64 " ns)\n",
+		delta_before, (src_mono_hi - src_mono_lo) / 2);
+
+	/* Step 1: KVM_GET_CLOCK for atomic {host_tsc, realtime} */
+	memset(&clock_src, 0, sizeof(clock_src));
+	clock_src.flags = KVM_CLOCK_REALTIME;
+	vm_ioctl(vm, KVM_GET_CLOCK, &clock_src);
+	host_tsc_src = clock_src.host_tsc;
+	time_src = clock_src.realtime;
+
+	/* Step 2: Save TSC offset and scale */
+	{
+		struct kvm_device_attr attr = {
+			.group = KVM_VCPU_TSC_CTRL,
+			.attr = KVM_VCPU_TSC_OFFSET,
+			.addr = (uint64_t)(uintptr_t)&ofs_src,
+		};
+		vcpu_ioctl(vcpu, KVM_GET_DEVICE_ATTR, &attr);
+	}
+	{
+		struct kvm_device_attr attr = {
+			.group = KVM_VCPU_TSC_CTRL,
+			.attr = KVM_VCPU_TSC_SCALE,
+			.addr = (uint64_t)(uintptr_t)&scale_src,
+		};
+		memset(&scale_src, 0, sizeof(scale_src));
+		__vcpu_ioctl(vcpu, KVM_GET_DEVICE_ATTR, &attr);
+	}
+
+	/* Compute guest TSC at Tsrc */
+	if (scale_src.tsc_frac_bits)
+		tsc_src = ((unsigned __int128)host_tsc_src * scale_src.tsc_ratio
+			   >> scale_src.tsc_frac_bits) + ofs_src;
+	else
+		tsc_src = host_tsc_src + ofs_src;
+
+	/* Step 3: KVM_GET_CLOCK_GUEST */
+	ret = __vcpu_ioctl(vcpu, KVM_GET_CLOCK_GUEST, &pvti_src);
+	TEST_ASSERT(!ret, "KVM_GET_CLOCK_GUEST failed");
+
+	pr_info(" TSC freq=%u kHz, offset=%" PRId64 "\n", freq_khz, (int64_t)ofs_src);
+
+	kvm_vm_release(vm);
+
+	/* === PAUSE (simulate migration) === */
+	pr_info("=== Pausing 1 second ===\n");
+	sleep(1);
+
+	/* === DESTINATION SIDE === */
+	pr_info("=== Destination VM ===\n");
+	vm = create_vm(&vcpu);
+
+	/* Step 4: KVM_SET_TSC_KHZ */
+	vcpu_ioctl(vcpu, KVM_SET_TSC_KHZ, (void *)(unsigned long)freq_khz);
+
+	/* Step 5: KVM_GET_CLOCK for atomic {host_tsc, realtime} pair.
+	 * Master clock is active from vCPU creation.
+	 */
+	memset(&clock_dst, 0, sizeof(clock_dst));
+	vm_ioctl(vm, KVM_GET_CLOCK, &clock_dst);
+	host_tsc_dst = clock_dst.host_tsc;
+	time_dst = clock_dst.realtime;
+
+	/* Step 6: ΔT */
+	delta_t = (int64_t)(time_dst - time_src);
+
+	/* Step 7: Compute destination offset */
+	{
+		struct kvm_device_attr attr = {
+			.group = KVM_VCPU_TSC_CTRL,
+			.attr = KVM_VCPU_TSC_SCALE,
+			.addr = (uint64_t)(uintptr_t)&scale_dst,
+		};
+		memset(&scale_dst, 0, sizeof(scale_dst));
+		__vcpu_ioctl(vcpu, KVM_GET_DEVICE_ATTR, &attr);
+	}
+
+	tsc_dst = tsc_src + (uint64_t)((int64_t)freq_khz * 1000 * delta_t / 1000000000LL);
+
+	if (scale_dst.tsc_frac_bits)
+		raw_dst = (unsigned __int128)host_tsc_dst * scale_dst.tsc_ratio
+			  >> scale_dst.tsc_frac_bits;
+	else
+		raw_dst = host_tsc_dst;
+
+	ofs_dst = tsc_dst - raw_dst;
+
+	/*
+	 * The TSC offset delta introduced by using CLOCK_REALTIME to
+	 * estimate elapsed time. On same host, the correct offset is
+	 * ofs_src; the difference is the CLOCK_REALTIME-vs-TSC error.
+	 */
+	int64_t tsc_ofs_delta = (int64_t)(ofs_dst - ofs_src);
+	int64_t tsc_ofs_delta_ns = tsc_ofs_delta * 1000000000LL / ((int64_t)freq_khz * 1000);
+	pr_info(" Destination TSC offset=%" PRId64
+		", imprecision from CLOCK_REALTIME: %" PRId64 " cycles = %"
+		PRId64 " ns\n", (int64_t)ofs_dst, tsc_ofs_delta, tsc_ofs_delta_ns);
+
+	/* Set TSC offset */
+	{
+		struct kvm_device_attr attr = {
+			.group = KVM_VCPU_TSC_CTRL,
+			.attr = KVM_VCPU_TSC_OFFSET,
+			.addr = (uint64_t)(uintptr_t)&ofs_dst,
+		};
+		vcpu_ioctl(vcpu, KVM_SET_DEVICE_ATTR, &attr);
+	}
+
+	/* Step 8: KVM_SET_CLOCK_GUEST */
+	ret = __vcpu_ioctl(vcpu, KVM_SET_CLOCK_GUEST, &pvti_src);
+	TEST_ASSERT(!ret, "KVM_SET_CLOCK_GUEST failed: errno %d", errno);
+
+	/* Run guest to update pvclock page on destination */
+	vcpu_run(vcpu);
+	TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO);
+	TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_SYNC);
+
+	/* ABA measurement on destination */
+	uint64_t mono_lo, mono_hi;
+	aba_reading(vm, &mono_lo, &kvm_after, &mono_hi);
+
+	/*
+	 * The kvmclock is tied to the guest TSC via SET_CLOCK_GUEST.
+	 * The guest TSC is offset from the correct value by tsc_ofs_delta_ns
+	 * (due to CLOCK_REALTIME imprecision). So the kvmclock should be
+	 * offset from CLOCK_MONOTONIC_RAW by exactly:
+	 *   (original delta) + tsc_ofs_delta_ns
+	 *
+	 * The "original delta" has uncertainty from the source ABA spread,
+	 * and the measurement has uncertainty from the destination ABA spread.
+	 * Verify the expected value falls within the combined bounds.
+ */ + int64_t delta_before_lo =3D (int64_t)(kvm_before - src_mono_hi); + int64_t delta_before_hi =3D (int64_t)(kvm_before - src_mono_lo); + int64_t expected_lo =3D delta_before_lo + tsc_ofs_delta_ns; + int64_t expected_hi =3D delta_before_hi + tsc_ofs_delta_ns; + int64_t actual_lo =3D (int64_t)(kvm_after - mono_hi); + int64_t actual_hi =3D (int64_t)(kvm_after - mono_lo); + + /* Show the shift relative to the source measurement */ + int64_t expected_mid =3D tsc_ofs_delta_ns; + int64_t expected_err =3D (int64_t)(src_mono_hi - src_mono_lo) / 2; + int64_t actual_mid =3D ((actual_lo + actual_hi) / 2) - delta_before; + int64_t actual_err =3D (int64_t)(mono_hi - mono_lo) / 2; + pr_info(" kvmclock-mono shift: expected %" PRId64 " ns (=C2=B1%" PRId64 + "), measured %" PRId64 " ns (=C2=B1%" PRId64 ")\n", + expected_mid, expected_err, actual_mid, actual_err); + + /* The ranges must overlap */ + TEST_ASSERT(expected_hi >=3D actual_lo && expected_lo <=3D actual_hi, + "Ranges don't overlap: expected [%" PRId64 ", %" PRId64 + "] measured [%" PRId64 ", %" PRId64 "]", + expected_lo, expected_hi, actual_lo, actual_hi); + + /* + * Direct pvclock verification: read the destination pvclock page + * and verify that computing kvmclock from pvti_src and pvti_dst + * at the same guest TSC gives the same result. + * + * Get an atomic {host_tsc, kvmclock} pair, scale host_tsc to + * guest TSC using KVM_VCPU_TSC_SCALE, then compute kvmclock + * from both pvclock structs. 
+ */ + struct kvm_clock_data clock_now =3D {}; + vm_ioctl(vm, KVM_GET_CLOCK, &clock_now); + + struct pvclock_vcpu_time_info *pvti_dst =3D addr_gpa2hva(vm, KVMCLOCK_GPA= ); + uint64_t host_tsc_now =3D clock_now.host_tsc; + uint64_t guest_tsc_now; + + if (scale_dst.tsc_frac_bits) + guest_tsc_now =3D ((unsigned __int128)host_tsc_now * + scale_dst.tsc_ratio >> scale_dst.tsc_frac_bits) + + ofs_dst; + else + guest_tsc_now =3D host_tsc_now + ofs_dst; + + uint64_t clk_from_src =3D pvclock_read_cycles(&pvti_src, guest_tsc_now); + uint64_t clk_from_dst =3D pvclock_read_cycles(pvti_dst, guest_tsc_now); + int64_t pvclock_delta =3D (int64_t)(clk_from_src - clk_from_dst); + + pr_info(" Pvclock direct: src=3D%" PRIu64 " dst=3D%" PRIu64 + " delta=3D%" PRId64 " ns\n", clk_from_src, clk_from_dst, pvclock_delta); + pr_info(" KVM_GET_CLOCK: %" PRIu64 " ns\n", (uint64_t)clock_now.clock); + + TEST_ASSERT(pvclock_delta >=3D -1 && pvclock_delta <=3D 1, + "pvclock src vs dst disagree by %" PRId64 " ns", pvclock_delta); + + /* + * Tight ABA: compare pvclock_read() directly (no ioctl) against + * CLOCK_MONOTONIC_RAW. The spread should be much smaller since + * there's no syscall between the two clock_gettime calls =E2=80=94 just + * rdtsc + userspace mul/shift. 
+ */ + uint64_t tight_mono_lo =3D 0, tight_mono_hi =3D 0, tight_kvm =3D 0; + uint64_t tight_best_spread =3D UINT64_MAX; + for (int i =3D 0; i < 3; i++) { + struct timespec ts1, ts2; + uint64_t m1, m2, tsc, clk; + + clock_gettime(CLOCK_MONOTONIC_RAW, &ts1); + tsc =3D rdtsc(); + clock_gettime(CLOCK_MONOTONIC_RAW, &ts2); + + m1 =3D ts1.tv_sec * 1000000000ULL + ts1.tv_nsec; + m2 =3D ts2.tv_sec * 1000000000ULL + ts2.tv_nsec; + + /* Scale host TSC to guest TSC */ + if (scale_dst.tsc_frac_bits) + tsc =3D ((unsigned __int128)tsc * scale_dst.tsc_ratio + >> scale_dst.tsc_frac_bits) + ofs_dst; + else + tsc +=3D ofs_dst; + + clk =3D pvclock_read_cycles(pvti_dst, tsc); + + if (m2 - m1 < tight_best_spread) { + tight_best_spread =3D m2 - m1; + tight_mono_lo =3D m1; + tight_mono_hi =3D m2; + tight_kvm =3D clk; + } + } + pr_info(" Tight ABA spread: %" PRIu64 " ns (best of 3)\n", tight_best_sp= read); + + int64_t tight_expected_lo =3D delta_before_lo + tsc_ofs_delta_ns; + int64_t tight_expected_hi =3D delta_before_hi + tsc_ofs_delta_ns; + int64_t tight_actual_lo =3D (int64_t)(tight_kvm - tight_mono_hi); + int64_t tight_actual_hi =3D (int64_t)(tight_kvm - tight_mono_lo); + int64_t tight_actual_mid =3D ((tight_actual_lo + tight_actual_hi) / 2) - = delta_before; + int64_t tight_actual_err =3D (int64_t)(tight_mono_hi - tight_mono_lo) / 2; + + pr_info(" Tight kvmclock-mono shift: expected %" PRId64 + " ns (=C2=B1%" PRId64 "), measured %" PRId64 " ns (=C2=B1%" PRId64 ")\n", + expected_mid, expected_err, tight_actual_mid, tight_actual_err); + + TEST_ASSERT(tight_expected_hi >=3D tight_actual_lo && + tight_expected_lo <=3D tight_actual_hi, + "Tight ABA ranges don't overlap"); + + kvm_vm_release(vm); + pr_info("PASS: kvmclock offset matches TSC delta from CLOCK_REALTIME" + " (%" PRId64 " ns) within ABA bounds\n", tsc_ofs_delta_ns); + return 0; +} --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of 
lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930547; cv=none; d=zohomail.com; s=zohoarc; b=djFfkST1yH7Cniyu2h2ACYjSSWjjjCcH1VuJHb6Z6I+T7vaFngnkK4gPmcDdoo429famHMutG64DYhptcc0vMWDfUkUwDQwOesCUpR1G1qlGm+YGDoRmh0HZTEAf3iEXvEw2by8q/9Va9PFRLWi+KL/D4ZvLsec5QbSiV8StV6o= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930547; h=Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=xzlpkIa4DyVp/6DXKWYEMYtET/4mCymMWHGnH0+qq44=; b=HnbNYQfAzG7ml7Db49dqczSD+rEE5l5c58ZoEkKLInhazGzqeetk5KcfN1x4+BScreGwFgQXsRrsvYbAFQChxK5yIvLpKZ+ze5xj047gaKqbMGjzw1ZKZ4Mx89MXGPahbJmEcWgdZhMOc6ReO5mZocwaxjdB8ysbss6mgBEnxzQ= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1780930543994406.2436213226746; Mon, 8 Jun 2026 07:55:43 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.1331712.1594351 (Exim 4.92) (envelope-from ) id 1wWbNq-0003hH-Cz; Mon, 08 Jun 2026 14:55:22 +0000 Received: by outflank-mailman (output) from mailman id 1331712.1594351; Mon, 08 Jun 2026 14:55:22 +0000 Received: from localhost ([127.0.0.1] 
helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNp-0003dM-WE; Mon, 08 Jun 2026 14:55:22 +0000 Received: by outflank-mailman (input) for mailman id 1331712; Mon, 08 Jun 2026 14:55:17 +0000 Received: from mx.expurgate.net ([194.145.224.10]) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNk-00027r-BT for xen-devel@lists.xenproject.org; Mon, 08 Jun 2026 14:55:17 +0000 Received: from mx.expurgate.net (helo=localhost) by mx.expurgate.net with esmtp id 1wWbNj-00EcHH-O0; Mon, 08 Jun 2026 16:55:15 +0200 Received: from [10.42.69.11] (helo=localhost) by localhost with ESMTP (eXpurgate MTA 0.9.1) (envelope-from ) id 6a26d7bb-bab6-0a2a0a5309dd-0a2a450bae1a-46 for ; Mon, 08 Jun 2026 16:55:15 +0200 Received: from [90.155.92.199] (helo=desiato.infradead.org) by tlsNG-42698a.mxtls.expurgate.net with ESMTPS (eXpurgate 4.56.1) (envelope-from ) id 6a26d7d3-212f-0a2a450b0019-5a9b5cc7b4f6-3 for ; Mon, 08 Jun 2026 16:55:15 +0200 Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by desiato.infradead.org with esmtpsa (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNU-00000001Afx-2Mwq; Mon, 08 Jun 2026 14:55:00 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNR-00000000NEw-3Lzb; Mon, 08 Jun 2026 15:54:57 +0100 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass header.s=desiato.20200630 header.d=infradead.org header.i="@infradead.org" header.h="Sender:Content-Transfer-Encoding:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; 
h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=xzlpkIa4DyVp/6DXKWYEMYtET/4mCymMWHGnH0+qq44=; b=kLSsiyZSY0uGzPuLZzMZQccrVR oTMwbu3mZjvmNhy96ViattvnzwQ+QV18hQW7ybAGW1CGjiQhSeUgL2J4Jzj//FNmuQjhiM0CxEQS1 CDwWtDn9r+BmPjdxWsklCo4lGIEq6WfFOT2E6hnWdy0iLskgVQrHnnU/3eWGPG2+gm8q2W01vA/uB sH8MdtdYARjHCHwjGxL1mHLSn3+DXcpzJcHVbVjJ6FZjMQAnyAfpkiSIGG9+XMGd7geUCTibXf2bO l/0m5k9Z077Xju2PFw+r28+BrkRx24ypwLqcU9YBjtrc0aDa81AjhM3lirvqUrHRDgaDoCuX/t+cQ 7JLgP+GQ==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 09/34] KVM: x86: Avoid NTP frequency skew for KVM clock on 32-bit host Date: Mon, 8 Jun 2026 15:47:50 +0100 Message-ID: <20260608145455.89187-10-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. 
 See http://www.infradead.org/rpr.html
Content-Type: text/plain; charset="utf-8"

From: David Woodhouse

Commit 53fafdbb8b21 ("KVM: x86: switch KVMCLOCK base to monotonic raw
clock") did so only for 64-bit hosts, by capturing the boot offset from
within the existing clocksource notifier update_pvclock_gtod().

That notifier was added in commit 16e8d74d2da9 ("KVM: x86: notifier for
clocksource changes") but only on x86_64, because its original purpose
was just to disable the "master clock" mode which is only supported on
x86_64.

Now that the notifier is used for more than disabling master clock mode,
enable it for the 32-bit build too so that get_kvmclock_base_ns() can be
unaffected by NTP sync on 32-bit too.

Signed-off-by: David Woodhouse
Reviewed-by: Paul Durrant
---
 arch/x86/kvm/x86.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6337f9b9d7ac..50bd2871b051 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2342,7 +2342,6 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 	return kvm_set_msr_ignored_check(vcpu, index, *data, true);
 }
 
-#ifdef CONFIG_X86_64
 struct pvclock_clock {
 	int vclock_mode;
 	u64 cycle_last;
@@ -2400,13 +2399,6 @@ static s64 get_kvmclock_base_ns(void)
 	/* Count up from boot time, but with the frequency of the raw clock. */
 	return ktime_to_ns(ktime_add(ktime_get_raw(), pvclock_gtod_data.offs_boot));
 }
-#else
-static s64 get_kvmclock_base_ns(void)
-{
-	/* Master clock not used, so we can just use CLOCK_BOOTTIME. */
-	return ktime_get_boottime_ns();
-}
-#endif
 
 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
 {
@@ -10173,6 +10165,7 @@ static void pvclock_irq_work_fn(struct irq_work *w)
 }
 
 static DEFINE_IRQ_WORK(pvclock_irq_work, pvclock_irq_work_fn);
+#endif
 
 /*
  * Notification about pvclock gtod data update.
@@ -10180,26 +10173,26 @@ static DEFINE_IRQ_WORK(pvclock_irq_work, pvclock_irq_work_fn);
 static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
 			       void *priv)
 {
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
 	struct timekeeper *tk = priv;
 
 	update_pvclock_gtod(tk);
 
+#ifdef CONFIG_X86_64
 	/*
 	 * Disable master clock if host does not trust, or does not use,
 	 * TSC based clocksource. Delegate queue_work() to irq_work as
 	 * this is invoked with tk_core.seq write held.
 	 */
-	if (!gtod_is_based_on_tsc(gtod->clock.vclock_mode) &&
+	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode) &&
 	    atomic_read(&kvm_guest_has_master_clock) != 0)
 		irq_work_queue(&pvclock_irq_work);
+#endif
 	return 0;
 }
 
 static struct notifier_block pvclock_gtod_notifier = {
 	.notifier_call = pvclock_gtod_notify,
 };
-#endif
 
 void kvm_setup_xss_caps(void)
 {
@@ -10388,9 +10381,9 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	if (pi_inject_timer == -1)
 		pi_inject_timer = housekeeping_enabled(HK_TYPE_TIMER);
-#ifdef CONFIG_X86_64
 	pvclock_gtod_register_notifier(&pvclock_gtod_notifier);
 
+#ifdef CONFIG_X86_64
 	if (hypervisor_is_type(X86_HYPER_MS_HYPERV))
 		set_hv_tscchange_cb(kvm_hyperv_tsc_notifier);
 #endif
@@ -10447,8 +10440,8 @@ void kvm_x86_vendor_exit(void)
 					    CPUFREQ_TRANSITION_NOTIFIER);
 		cpuhp_remove_state_nocalls(CPUHP_AP_X86_KVM_CLK_ONLINE);
 	}
-#ifdef CONFIG_X86_64
 	pvclock_gtod_unregister_notifier(&pvclock_gtod_notifier);
+#ifdef CONFIG_X86_64
 	irq_work_sync(&pvclock_irq_work);
 	cancel_work_sync(&pvclock_gtod_work);
 #endif
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
Received: from
casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9DFFB43E9D2; Mon, 8 Jun 2026 14:55:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780930540; cv=none; b=NLgYieoNNHXy2EErJiA43vsWti6vBOf1LQHZJ8mg8xkp0B9/ISgK38sxBUUG6sfHs/yZJBr+H/Lzx4VVT9Zlhpk+m9LHdBdVA9FMBP+x1kuYn7gAkZoI1p2siGtHbgWlBK6igAlhVAUnASKHRUhQr9w3YGEpQ/JuLXG3r6aIT40= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780930540; c=relaxed/simple; bh=W0liJgLUP+4F6PDsC7cXp3OtJreSUh0fBECzJHMveas=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=u8RdlKCUS+4kLgT+N4WW5D5PCAyfCTwc4CoC25k8dZ29yB8+JdqaUzB6fHB0Z0KnmxqP+mfdxMKNwUyOXfSXuoqaZ4BnQPNzAv8z/L7Of20YerkMzcqvWT2oO7+W4JA6iFTUHcrIh4s5RckEJkLpTUvsmLQzWvqA8OZYFp2DN5c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=casper.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=szg0DuJa; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=casper.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="szg0DuJa" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=mK9MivH4Sa09pKns/iaOBplM1ylYPC1nLBsprSjeCoQ=; 
b=szg0DuJaBDaAO6gJ+6si2oIAis T91YLENzCyD9xsWd+xbTgQw7GqijATO+tnbaTBL01eGHLbKCN5Cc3aCoQxj6HIAMG90eZjZs0Ydug jPIbOu8aN+jme6AwIGKke4AjYZcyV9ipCdmVuwkICFvP8Ht0pf7uBv0dj7EY00YzdrWDTlNiKh6N7 pTg/dfcczCehCy18HLpKxo3Dhch1c3qYACouIglNEkfV4ijxAmKYsV6yx/YJ8ZfFPAQYuCyDeRHVv gqbeTfHqe9xZAshQmAkH2NTO9GqdT8NLIKMzSAguCJgjcAqyyI+RQVUTleE5TBh2TgG8xYP3fhq19 TmvcWxLA==; Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wWbNR-0000000DtxA-32IJ; Mon, 08 Jun 2026 14:54:58 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNR-00000000NEz-3a70; Mon, 08 Jun 2026 15:54:57 +0100 From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 10/34] KVM: x86: Fold __get_kvmclock() into get_kvmclock() Date: Mon, 8 Jun 2026 15:47:51 +0100 Message-ID: <20260608145455.89187-11-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. 
 See http://www.infradead.org/rpr.html
Content-Type: text/plain; charset="utf-8"

From: David Woodhouse

There is no need for the separate __get_kvmclock() helper; just inline
its body into get_kvmclock() within the seqcount retry loop.

No functional change.

Signed-off-by: David Woodhouse
---
 arch/x86/kvm/x86.c | 63 +++++++++++++++++++++-------------------------
 1 file changed, 28 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 50bd2871b051..fce898811fe7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3200,50 +3200,43 @@ static unsigned long get_cpu_tsc_khz(void)
 	return __this_cpu_read(cpu_tsc_khz);
 }
 
-/* Called within read_seqcount_begin/retry for kvm->pvclock_sc. */
-static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
+static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
 {
 	struct kvm_arch *ka = &kvm->arch;
 	struct pvclock_vcpu_time_info hv_clock;
+	unsigned int seq;
 
-	/* both __this_cpu_read() and rdtsc() should be on the same cpu */
-	get_cpu();
+	do {
+		seq = read_seqcount_begin(&ka->pvclock_sc);
 
-	data->flags = 0;
-	if (ka->use_master_clock &&
-	    (static_cpu_has(X86_FEATURE_CONSTANT_TSC) || __this_cpu_read(cpu_tsc_khz))) {
+		/* both __this_cpu_read() and rdtsc() should be on the same cpu */
+		get_cpu();
+
+		data->flags = 0;
+		if (ka->use_master_clock &&
+		    (static_cpu_has(X86_FEATURE_CONSTANT_TSC) || __this_cpu_read(cpu_tsc_khz))) {
 #ifdef CONFIG_X86_64
-		struct timespec64 ts;
+			struct timespec64 ts;
 
-		if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
-			data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
-			data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC;
-		} else
+			if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
+				data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
+				data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC;
+			} else
 #endif
-			data->host_tsc = rdtsc();
-
-		data->flags |= KVM_CLOCK_TSC_STABLE;
-		hv_clock.tsc_timestamp = ka->master_cycle_now;
-		hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
-		kvm_get_time_scale(NSEC_PER_SEC, get_cpu_tsc_khz() * 1000LL,
-				   &hv_clock.tsc_shift,
-				   &hv_clock.tsc_to_system_mul);
-		data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
-	} else {
-		data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
-	}
-
-	put_cpu();
-}
-
-static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
-{
-	struct kvm_arch *ka = &kvm->arch;
-	unsigned seq;
+				data->host_tsc = rdtsc();
+
+			data->flags |= KVM_CLOCK_TSC_STABLE;
+			hv_clock.tsc_timestamp = ka->master_cycle_now;
+			hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
+			kvm_get_time_scale(NSEC_PER_SEC, get_cpu_tsc_khz() * 1000LL,
+					   &hv_clock.tsc_shift,
+					   &hv_clock.tsc_to_system_mul);
+			data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
+		} else {
+			data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
+		}
 
-	do {
-		seq = read_seqcount_begin(&ka->pvclock_sc);
-		__get_kvmclock(kvm, data);
+		put_cpu();
 	} while (read_seqcount_retry(&ka->pvclock_sc, seq));
 }
 
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F14693F0767; Mon, 8 Jun 2026 14:55:14 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780930516; cv=none; b=HYryWlqIV1DTTcRP+NIWiDed10BYJNjnd5pCp8K66gEioyMQqmKNdEyNnsP5g0zw6P0tDejBNNSQJflXlSWJLivpkQrQItlcaC478VR0m8yLjnjaE7ovUcKTTzv22r1xgGj8HVUKXIYQl+H4H8rzXoFPCUjM18Pu8QzDKDb2ICU=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780930516; c=relaxed/simple;
bh=3HGCHUA2PyJ2aulk/W2TeWQ178nvRCGBukPfIltJmzE=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=X96IUIxYQfy3XREdfBNqqF8Y9sLl8YcrCy0P2M4H7I7MTm2XgehI+LmeGRzaB+A/2oOY/L1mq7F0x5HXVhoDCfXucl2qQDGKU+TxAHmGgCFP7LcraC/q1rWy9VxGh7itvdZufy5Jczp2rd06X/pn5UgIqeGQo7QjQL/jMdGC6TU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=casper.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=Tp+yi1Wz; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=casper.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="Tp+yi1Wz" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=/6X9NkkXJGBiIJZgksWE6pM8oGLkNoBdJss2eJyo+jo=; b=Tp+yi1Wz3jymBuLeW82uMZjkkH mfqyw7ZDqC/8GrxtdY31F4JRU7e64yb4mB12cJAMtCttaRAuGjqefJGL28TwWLMkXP5k0NW0xrRRq s80I4Gz0M1yRY6f45AXMEBasj8k2qWYOYhg8TKDw0ImF6oIV131Xba564m6wHw6Sl8NQvlVSZwcU0 nloZja3Puek11Uc2jm7xj3EXGyokyIKzEwLvYm7whYJ0X4fk46wZlxuZ2r4CX5IL+X2BGHTRsjDzm c0zz+zTVRjCTvhiC74DHbmZp7dEt5jz39JMAgwVzsoeuHYOWFvURgNKcrrCBOUhL+g3O1pMTyRuqO mxY/IyLw==; Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wWbNR-0000000DtxB-31cT; Mon, 08 Jun 2026 14:54:58 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNR-00000000NF2-3o2K; Mon, 08 Jun 2026 15:54:57 +0100 From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , 
 Shuah Khan, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin",
 Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky, David Woodhouse,
 Paul Durrant, Jonathan Cameron, Sascha Bischoff, Marc Zyngier,
 Joey Gouly, Jack Allister, Dongli Zhang, joe.jin@oracle.com,
 kvm@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
 linux-kselftest@vger.kernel.org
Subject: [PATCH v5 11/34] KVM: x86: Restructure get_kvmclock()
Date: Mon, 8 Jun 2026 15:47:52 +0100
Message-ID: <20260608145455.89187-12-dwmw2@infradead.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Sender: David Woodhouse
Content-Type: text/plain; charset="utf-8"

From: David Woodhouse

Move get/put_cpu inside the use_master_clock branch since they are only
needed there (for RDTSC and get_cpu_tsc_khz() to be on the same CPU).

Simplify the use_master_clock condition: the open-coded CONSTANT_TSC ||
cpu_tsc_khz check is unnecessary since use_master_clock can only be true
when the TSC is usable.

Wrap the entire use_master_clock block in #ifdef CONFIG_X86_64, since
use_master_clock is never true on 32-bit (host_tsc_clocksource is only
set under CONFIG_X86_64).

When the clock read fails (e.g. clocksource transitioning away from
TSC), fall back to the non-master-clock path (get_kvmclock_base_ns)
rather than proceeding with uninitialised data or spinning in the
seqcount loop.

Signed-off-by: David Woodhouse
---
 arch/x86/kvm/x86.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fce898811fe7..6983a7494fcd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3209,21 +3209,30 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
 	do {
 		seq = read_seqcount_begin(&ka->pvclock_sc);
 
-		/* both __this_cpu_read() and rdtsc() should be on the same cpu */
-		get_cpu();
-
 		data->flags = 0;
-		if (ka->use_master_clock &&
-		    (static_cpu_has(X86_FEATURE_CONSTANT_TSC) || __this_cpu_read(cpu_tsc_khz))) {
 #ifdef CONFIG_X86_64
+		if (ka->use_master_clock) {
 			struct timespec64 ts;
 
+			/*
+			 * The RDTSC and get_cpu_tsc_khz() must happen on
+			 * the same CPU.
+			 */
+			get_cpu();
+
 			if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) {
 				data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec;
 				data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC;
-			} else
-#endif
-				data->host_tsc = rdtsc();
+			} else {
+				/*
+				 * Clock read failed (e.g. clocksource is
+				 * transitioning away from TSC). Fall back to
+				 * the non-master-clock path rather than
+				 * spinning.
+				 */
+				put_cpu();
+				goto fallback;
+			}
 
 			data->flags |= KVM_CLOCK_TSC_STABLE;
 			hv_clock.tsc_timestamp = ka->master_cycle_now;
@@ -3232,11 +3241,14 @@ static void get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data)
 					   &hv_clock.tsc_shift,
 					   &hv_clock.tsc_to_system_mul);
 			data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc);
-		} else {
+
+			put_cpu();
+		} else
+#endif
+		{
+fallback:
 			data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset;
 		}
-
-		put_cpu();
 	} while (read_seqcount_retry(&ka->pvclock_sc, seq));
 }
 
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
Delivered-To: importer@patchew.org
Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org;
Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org
ARC-Seal: i=1; a=rsa-sha256; t=1780930536; cv=none; d=zohomail.com; s=zohoarc; b=LfGUG7OO5l73vOBjW5IHCrO3qICb4qO0VbSrkDgIvcxJK5dJZZtTFHcPgA8Rb84oFV7TaVWt7fF44qIjZM+Y71RPDAh6LRLggRwAZYJzsOnlaHACDatwnMz7PIzAyZceluECF4pQ+fIlzAeAc5Xto1BVl00VeAEYkEDM7Zrj2LA=
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930536; h=Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=GYIPkApXjqdmWDUs+uq5FjcmUtBEQ1pISKw6Z2UaRCo=; b=WF40Qgd63pBI1VNfvXFMGqBKbFwZN2bbliwAjwFO6YUUnpPGRsFd9DyVfgfaj/LXpqQMwz+3ARAbz9XjnejSB4gp13tk28R99z/5wPBTmoHELTjCJ/RELpfzNL6KXd+9ROx5MLlwZzDt2ZX12OoyOiFGaYGCZcw7gb8odQr4gIU=
ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of
lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1780930536826504.07150077773565; Mon, 8 Jun 2026 07:55:36 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.1331703.1594270 (Exim 4.92) (envelope-from ) id 1wWbNi-0001ak-Pa; Mon, 08 Jun 2026 14:55:14 +0000 Received: by outflank-mailman (output) from mailman id 1331703.1594270; Mon, 08 Jun 2026 14:55:14 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNi-0001Xr-HO; Mon, 08 Jun 2026 14:55:14 +0000 Received: by outflank-mailman (input) for mailman id 1331703; Mon, 08 Jun 2026 14:55:13 +0000 Received: from mx.expurgate.net ([195.190.135.10]) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNh-0001Od-0o for xen-devel@lists.xenproject.org; Mon, 08 Jun 2026 14:55:13 +0000 Received: from mx.expurgate.net (helo=localhost) by mx.expurgate.net with esmtp id 1wWbNg-00AKPU-D0; Mon, 08 Jun 2026 16:55:12 +0200 Received: from [10.42.69.3] (helo=localhost) by localhost with ESMTP (eXpurgate MTA 0.9.1) (envelope-from ) id 6a26d7c9-bab6-0a2a0a5309dd-0a2a4503be90-8 for ; Mon, 08 Jun 2026 16:55:11 +0200 Received: from [90.155.50.34] (helo=casper.infradead.org) by tlsNG-33051d.mxtls.expurgate.net with ESMTPS (eXpurgate 4.56.1) (envelope-from ) id 6a26d7cf-672d-0a2a45030019-5a9b32229a98-3 for ; Mon, 08 Jun 2026 16:55:11 +0200 Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wWbNR-0000000DtxD-3A0L; Mon, 08 Jun 2026 14:54:58 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNR-00000000NFC-3xeb; Mon, 08 Jun 2026 15:54:57 +0100 
X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass header.s=casper.20170209 header.d=infradead.org header.i="@infradead.org" header.h="Sender:Content-Transfer-Encoding:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=GYIPkApXjqdmWDUs+uq5FjcmUtBEQ1pISKw6Z2UaRCo=; b=gV/ATQV+Z//gzR/mu9S2cXp5NR ySggUKBa6gNNOlXNF45eMUlAhuW2gom1CLZgCwjQxHRcP/vcTVxwTR0jL7OihAU5/M4aAzRdbkkQ7 mB/1csf6TB/7Fb759BnTDKJ5XmQz2m0zdVXzCHbqxtWnVTBFUlP0s73BFu2wF/n/7D/mlhSDi+Oj3 ASAZ7bO1d5bEwDPF7buZyBilAd1UvwIIt7dhGmCg4484OJ4N3r5I0OIrVAoIRVllZaDOCSEmr0dIw I1A+KIdfueRdB1XrOslA8Bsj5phPoKiL5LZCdXg9hLxLcvl5vbJpdPZ2PyixH5YCU5bC1Q/WJbaeV /ogJWSFw==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 12/34] KVM: x86: Fix KVM clock precision in get_kvmclock() with TSC scaling Date: Mon, 8 Jun 2026 15:47:53 +0100 Message-ID: <20260608145455.89187-13-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html X-purgate-ID: tlsNG-33051d/1780930511-4046D938-2BFFCB9A/0/0 X-purgate-type: clean X-purgate-size: 4618 X-ZohoMail-DKIM: pass (identity @infradead.org) X-ZM-MESSAGEID: 1780930538061154100 Content-Type: text/plain; charset="utf-8" From: David Woodhouse When in master clock mode, the KVM clock is defined in terms of the guest TSC. But get_kvmclock() was computing it from the host TSC without applying TSC scaling, leading to a systemic drift from the values the guest computes from its own TSC. Store the VM's TSC scaling ratio in kvm_arch and precompute the guest-TSC-based mul/shift in pvclock_update_vm_gtod_copy(). Use these in get_kvmclock() to scale the host TSC delta to guest TSC before converting to nanoseconds. This avoids "definition C" of the KVM clock described in the earlier commit "KVM: x86/xen: Do not corrupt KVM clock in kvm_xen_shared_info_init()". 
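For illustration only, and not part of the patch: the two-step conversion described above (scale the host TSC delta to guest TSC, then convert guest ticks to nanoseconds) can be modelled in userspace roughly as follows. The struct and function names are hypothetical, and the ratio is assumed to carry 32 fractional bits purely to keep the arithmetic simple; the real ratio format is vendor specific.

```c
#include <assert.h>
#include <stdint.h>

/* (a * mul) >> 32, split into halves so no 128-bit type is needed. */
static uint64_t mul_u64_u32_shr32(uint64_t a, uint32_t mul)
{
	return (a >> 32) * mul + (((a & 0xffffffffULL) * mul) >> 32);
}

/* Hypothetical snapshot of the master clock reference point. */
struct master_clock {
	uint64_t system_time_ns;  /* kvmclock value at the reference point */
	uint64_t host_tsc_ref;    /* host TSC at the reference point */
	uint32_t ratio_frac32;    /* guest/host TSC ratio, 32 fractional bits (assumed) */
	uint32_t guest_mul;       /* guest ticks -> ns multiplier */
	int      guest_shift;     /* guest ticks -> ns pre-shift */
};

/* Shift first, then fixed-point multiply: the pvclock-style conversion. */
static uint64_t ticks_to_ns(uint64_t ticks, uint32_t mul, int shift)
{
	assert(shift > -64 && shift < 64);
	if (shift < 0)
		ticks >>= -shift;
	else
		ticks <<= shift;
	return mul_u64_u32_shr32(ticks, mul);
}

/* kvmclock at a given host TSC reading, computed via the guest TSC. */
static uint64_t kvmclock_at(const struct master_clock *mc, uint64_t host_tsc)
{
	uint64_t host_delta = host_tsc - mc->host_tsc_ref;
	uint64_t guest_delta = mul_u64_u32_shr32(host_delta, mc->ratio_frac32);

	return mc->system_time_ns +
	       ticks_to_ns(guest_delta, mc->guest_mul, mc->guest_shift);
}
```

Because the nanosecond conversion is applied to guest ticks with the guest's own multiplier and shift, the host follows the same rounding path as the guest does when it reads its own TSC.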
Signed-off-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 4 +++ arch/x86/kvm/x86.c | 52 +++++++++++++++++++++++++++++---- 2 files changed, 51 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 37264212c7df..5348fd5ea3f3 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1490,6 +1490,7 @@ struct kvm_arch { u64 last_tsc_write; u32 last_tsc_khz; u64 last_tsc_offset; + u64 last_tsc_scaling_ratio; u64 cur_tsc_nsec; u64 cur_tsc_write; u64 cur_tsc_offset; @@ -1504,6 +1505,9 @@ struct kvm_arch { bool use_master_clock; u64 master_kernel_ns; u64 master_cycle_now; + u64 master_tsc_scaling_ratio; + s8 master_tsc_shift; + u32 master_tsc_mul; =20 #ifdef CONFIG_KVM_HYPERV struct kvm_hv hyperv; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6983a7494fcd..7ae6a7705353 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2781,6 +2781,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vc= pu, u64 offset, u64 tsc, kvm->arch.last_tsc_write =3D tsc; kvm->arch.last_tsc_khz =3D vcpu->arch.virtual_tsc_khz; kvm->arch.last_tsc_offset =3D offset; + kvm->arch.last_tsc_scaling_ratio =3D vcpu->arch.l1_tsc_scaling_ratio; =20 vcpu->arch.last_guest_tsc =3D tsc; =20 @@ -3109,6 +3110,8 @@ static bool kvm_get_walltime_and_clockread(struct tim= espec64 *ts, * */ =20 +static unsigned long get_cpu_tsc_khz(void); + static void pvclock_update_vm_gtod_copy(struct kvm *kvm) { #ifdef CONFIG_X86_64 @@ -3132,9 +3135,30 @@ static void pvclock_update_vm_gtod_copy(struct kvm *= kvm) && !ka->backwards_tsc_observed && !ka->boot_vcpu_runs_old_kvmclock; =20 - if (ka->use_master_clock) + if (ka->use_master_clock) { + u64 tsc_hz; + atomic_set(&kvm_guest_has_master_clock, 1); =20 + /* + * Copy the scaling ratio and precompute the mul/shift for + * converting guest TSC to nanoseconds. 
These are used by + * get_kvmclock() to compute kvmclock from the host TSC + * without needing a vCPU reference. + */ + ka->master_tsc_scaling_ratio =3D ka->last_tsc_scaling_ratio; + tsc_hz =3D (u64)get_cpu_tsc_khz() * 1000; + if (tsc_hz && kvm_caps.has_tsc_control) + tsc_hz =3D kvm_scale_tsc(tsc_hz, + ka->master_tsc_scaling_ratio); + if (tsc_hz) + kvm_get_time_scale(NSEC_PER_SEC, tsc_hz, + &ka->master_tsc_shift, + &ka->master_tsc_mul); + else + ka->use_master_clock =3D false; + } + vclock_mode =3D pvclock_gtod_data.clock.vclock_mode; trace_kvm_update_master_clock(ka->use_master_clock, vclock_mode, vcpus_matched); @@ -3237,10 +3261,28 @@ static void get_kvmclock(struct kvm *kvm, struct kv= m_clock_data *data) data->flags |=3D KVM_CLOCK_TSC_STABLE; hv_clock.tsc_timestamp =3D ka->master_cycle_now; hv_clock.system_time =3D ka->master_kernel_ns + ka->kvmclock_offset; - kvm_get_time_scale(NSEC_PER_SEC, get_cpu_tsc_khz() * 1000LL, - &hv_clock.tsc_shift, - &hv_clock.tsc_to_system_mul); - data->clock =3D __pvclock_read_cycles(&hv_clock, data->host_tsc); + + /* + * Use the precomputed guest-TSC-based mul/shift + * so that the kvmclock value matches what the + * guest computes from its own TSC. 
+ */ + hv_clock.tsc_shift =3D ka->master_tsc_shift; + hv_clock.tsc_to_system_mul =3D ka->master_tsc_mul; + + if (kvm_caps.has_tsc_control) { + u64 tsc_delta =3D data->host_tsc - ka->master_cycle_now; + + tsc_delta =3D kvm_scale_tsc(tsc_delta, + ka->master_tsc_scaling_ratio); + data->clock =3D hv_clock.system_time + + pvclock_scale_delta(tsc_delta, + hv_clock.tsc_to_system_mul, + hv_clock.tsc_shift); + } else { + data->clock =3D __pvclock_read_cycles(&hv_clock, + data->host_tsc); + } =20 put_cpu(); } else --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 13/34] KVM: x86: Use get_kvmclock() in kvm_get_wall_clock_epoch() Date: Mon, 8 Jun 2026 15:47:54 +0100 Message-ID: <20260608145455.89187-14-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Content-Type: text/plain; charset="utf-8" From: David Woodhouse Now that get_kvmclock() correctly handles TSC scaling and captures both wallclock and kvmclock from the same TSC reading, kvm_get_wall_clock_epoch() can simply call it instead of duplicating the pvclock computation. This eliminates the last instance of the "definition C" kvmclock calculation that computed nanoseconds directly from the host TSC without accounting for guest TSC scaling. Signed-off-by: David Woodhouse --- arch/x86/kvm/x86.c | 59 +++++++--------------------------------------- 1 file changed, 9 insertions(+), 50 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7ae6a7705353..fc9366b83912 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3494,63 +3494,22 @@ int kvm_guest_time_update(struct kvm_vcpu *v) * wallclock and kvmclock times, and subtracting one from the other. 
* * Fall back to using their values at slightly different moments by - * calling ktime_get_real_ns() and get_kvmclock_ns() separately. + * calling ktime_get_real_ns() and get_kvmclock() separately. */ uint64_t kvm_get_wall_clock_epoch(struct kvm *kvm) { -#ifdef CONFIG_X86_64 - struct pvclock_vcpu_time_info hv_clock; - struct kvm_arch *ka =3D &kvm->arch; - unsigned long seq, local_tsc_khz; - struct timespec64 ts; - uint64_t host_tsc; - - do { - seq =3D read_seqcount_begin(&ka->pvclock_sc); - - local_tsc_khz =3D 0; - if (!ka->use_master_clock) - break; - - /* - * The TSC read and the call to get_cpu_tsc_khz() must happen - * on the same CPU. - */ - get_cpu(); - - local_tsc_khz =3D get_cpu_tsc_khz(); - - if (local_tsc_khz && - !kvm_get_walltime_and_clockread(&ts, &host_tsc)) - local_tsc_khz =3D 0; /* Fall back to old method */ - - put_cpu(); - - /* - * These values must be snapshotted within the seqcount loop. - * After that, it's just mathematics which can happen on any - * CPU at any time. - */ - hv_clock.tsc_timestamp =3D ka->master_cycle_now; - hv_clock.system_time =3D ka->master_kernel_ns + ka->kvmclock_offset; + struct kvm_clock_data data; =20 - } while (read_seqcount_retry(&ka->pvclock_sc, seq)); + get_kvmclock(kvm, &data); =20 /* - * If the conditions were right, and obtaining the wallclock+TSC was - * successful, calculate the KVM clock at the corresponding time and - * subtract one from the other to get the guest's epoch in nanoseconds - * since 1970-01-01. + * If get_kvmclock() captured both wallclock and kvmclock from the + * same TSC reading, use them for a precise epoch calculation. 
*/ - if (local_tsc_khz) { - kvm_get_time_scale(NSEC_PER_SEC, local_tsc_khz * NSEC_PER_USEC, - &hv_clock.tsc_shift, - &hv_clock.tsc_to_system_mul); - return ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec - - __pvclock_read_cycles(&hv_clock, host_tsc); - } -#endif - return ktime_get_real_ns() - get_kvmclock_ns(kvm); + if (data.flags & KVM_CLOCK_REALTIME) + return data.realtime - data.clock; + + return ktime_get_real_ns() - data.clock; } =20 /* --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 14/34] KVM: x86: Fix compute_guest_tsc() to handle negative time deltas Date: Mon, 8 Jun 2026 15:47:55 +0100 Message-ID: <20260608145455.89187-15-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html X-purgate-ID: tlsNG-d25034/1780930515-E2368CF5-90131852/0/0 X-purgate-type: clean X-purgate-size: 1721 X-ZohoMail-DKIM: pass (identity @infradead.org) X-ZM-MESSAGEID: 1780930562265154100 Content-Type: text/plain; charset="utf-8" From: David Woodhouse The compute_guest_tsc() function computes the guest TSC at a given kernel_ns timestamp. When the master clock reference point (master_kernel_ns) is earlier than vcpu->arch.this_tsc_nsec, the delta is negative. Since pvclock_scale_delta() takes a u64, the negative value wraps to a huge positive number, producing a wildly wrong result. Handle negative deltas explicitly by negating the delta, scaling it, and subtracting from this_tsc_write. 
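As a sketch of the failure mode described above (not part of the patch, with hypothetical function names and a userspace stand-in for the pvclock scaling helper), compare passing a negative delta straight into an unsigned scale against scaling its magnitude and subtracting:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a pvclock-style scale: shift, then multiply by a 0.32 fraction. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	assert(shift > -64 && shift < 64);
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	/* (delta * mul_frac) >> 32, split so no 128-bit type is needed. */
	return (delta >> 32) * mul_frac +
	       (((delta & 0xffffffffULL) * mul_frac) >> 32);
}

/* Buggy shape: a negative delta is reinterpreted as a huge unsigned value. */
static uint64_t guest_tsc_naive(int64_t now_ns, int64_t ref_ns,
				uint64_t ref_tsc, uint32_t mul, int shift)
{
	return ref_tsc + scale_delta((uint64_t)(now_ns - ref_ns), mul, shift);
}

/* Fixed shape: scale the magnitude, then add or subtract as appropriate. */
static uint64_t guest_tsc_signed(int64_t now_ns, int64_t ref_ns,
				 uint64_t ref_tsc, uint32_t mul, int shift)
{
	int64_t delta_ns = now_ns - ref_ns;

	if (delta_ns < 0)
		return ref_tsc - scale_delta(0 - (uint64_t)delta_ns, mul, shift);

	return ref_tsc + scale_delta((uint64_t)delta_ns, mul, shift);
}
```

With a multiplier of 0x80000000 and a shift of 2 (two ticks per nanosecond), a timestamp 100 ns before the reference yields ref_tsc - 200 from the signed version, while the naive version lands nowhere near it. The two agree whenever the delta is non-negative.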
Signed-off-by: David Woodhouse --- arch/x86/kvm/x86.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index fc9366b83912..8aae22401046 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2588,11 +2588,21 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u= 32 user_tsc_khz) =20 static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns) { - u64 tsc =3D pvclock_scale_delta(kernel_ns-vcpu->arch.this_tsc_nsec, - vcpu->arch.virtual_tsc_mult, - vcpu->arch.virtual_tsc_shift); - tsc +=3D vcpu->arch.this_tsc_write; - return tsc; + s64 delta_ns =3D kernel_ns - vcpu->arch.this_tsc_nsec; + u64 tsc; + + /* Handle negative deltas gracefully (master clock ref may be earlier) */ + if (delta_ns < 0) { + tsc =3D pvclock_scale_delta(-delta_ns, + vcpu->arch.virtual_tsc_mult, + vcpu->arch.virtual_tsc_shift); + return vcpu->arch.this_tsc_write - tsc; + } + + tsc =3D pvclock_scale_delta(delta_ns, + vcpu->arch.virtual_tsc_mult, + vcpu->arch.virtual_tsc_shift); + return vcpu->arch.this_tsc_write + tsc; } =20 #ifdef CONFIG_X86_64 --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 15/34] KVM: x86: Restructure kvm_guest_time_update() for TSC upscaling Date: Mon, 8 Jun 2026 15:47:56 +0100 Message-ID: <20260608145455.89187-16-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html X-purgate-ID: tlsNG-42698a/1780930516-1A573F3B-D32E948A/0/0 X-purgate-type: clean X-purgate-size: 4471 X-ZohoMail-DKIM: pass (identity @infradead.org) X-ZM-MESSAGEID: 1780930552115154100 Content-Type: text/plain; charset="utf-8" From: David Woodhouse Restructure kvm_guest_time_update() so that kernel_ns/host_tsc are always "now" when doing TSC catchup, then swap in the master clock reference values afterward for the hv_clock. This makes the TSC upscaling code considerably simpler: the catchup adjustment is computed as the delta between what the guest TSC *should* be at "now" and what it actually is, rather than mixing "now" and "master clock reference" timestamps. The seqcount loop now also contains the kvm_get_time_and_clockread() call (matching get_kvmclock's pattern), with the same WARN for unexpected failure. Based on a suggestion by Sean Christopherson. 
Signed-off-by: David Woodhouse --- arch/x86/kvm/x86.c | 74 +++++++++++++++++++++++++++++++++------------- 1 file changed, 53 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8aae22401046..92e32d720523 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3363,46 +3363,63 @@ static void kvm_setup_guest_pvclock(struct pvclock_= vcpu_time_info *ref_hv_clock, int kvm_guest_time_update(struct kvm_vcpu *v) { struct pvclock_vcpu_time_info hv_clock =3D {}; - unsigned long flags; u64 tgt_tsc_hz; unsigned seq; struct kvm_vcpu_arch *vcpu =3D &v->arch; struct kvm_arch *ka =3D &v->kvm->arch; s64 kernel_ns; u64 tsc_timestamp, host_tsc; + u64 master_host_tsc =3D 0; + s64 master_kernel_ns =3D 0; bool use_master_clock; =20 - kernel_ns =3D 0; - host_tsc =3D 0; - /* * If the host uses TSC clock, then passthrough TSC as stable * to the guest. */ do { seq =3D read_seqcount_begin(&ka->pvclock_sc); + use_master_clock =3D ka->use_master_clock; - if (use_master_clock) { - host_tsc =3D ka->master_cycle_now; - kernel_ns =3D ka->master_kernel_ns; - } + + /* + * The TSC read and the call to get_cpu_tsc_khz() must happen + * on the same CPU. 
+ */ + get_cpu(); + + tgt_tsc_hz =3D (u64)get_cpu_tsc_khz() * 1000; + +#ifdef CONFIG_X86_64 + if (use_master_clock && + !kvm_get_time_and_clockread(&kernel_ns, &host_tsc) && + !read_seqcount_retry(&ka->pvclock_sc, seq)) + use_master_clock =3D false; +#endif + + put_cpu(); + + if (!use_master_clock) + break; + + master_host_tsc =3D ka->master_cycle_now; + master_kernel_ns =3D ka->master_kernel_ns; } while (read_seqcount_retry(&ka->pvclock_sc, seq)); =20 - /* Keep irq disabled to prevent changes to the clock */ - local_irq_save(flags); - tgt_tsc_hz =3D (u64)get_cpu_tsc_khz() * 1000; if (unlikely(tgt_tsc_hz =3D=3D 0)) { - local_irq_restore(flags); kvm_make_request(KVM_REQ_CLOCK_UPDATE, v); return 1; } + if (!use_master_clock) { + unsigned long flags; + + local_irq_save(flags); host_tsc =3D rdtsc(); kernel_ns =3D get_kvmclock_base_ns(); + local_irq_restore(flags); } =20 - tsc_timestamp =3D kvm_read_l1_tsc(v, host_tsc); - /* * We may have to catch up the TSC to match elapsed wall clock * time for two reasons, even if kvmclock is used. @@ -3411,17 +3428,32 @@ int kvm_guest_time_update(struct kvm_vcpu *v) * entry to avoid unknown leaps of TSC even when running * again on the same CPU. This may cause apparent elapsed * time to disappear, and the guest to stand still or run - * very slowly. + * very slowly. */ if (vcpu->tsc_catchup) { - u64 tsc =3D compute_guest_tsc(v, kernel_ns); - if (tsc > tsc_timestamp) { - adjust_tsc_offset_guest(v, tsc - tsc_timestamp); - tsc_timestamp =3D tsc; - } + s64 adjustment; + + /* + * Calculate the delta between what the guest TSC *should* be + * and what it actually is according to kvm_read_l1_tsc(). + */ + adjustment =3D compute_guest_tsc(v, kernel_ns) - + kvm_read_l1_tsc(v, host_tsc); + if (adjustment > 0) + adjust_tsc_offset_guest(v, adjustment); } =20 - local_irq_restore(flags); + /* + * Now that TSC upscaling is out of the way, the remaining calculations + * are all relative to the reference time that's placed in hv_clock. 
+ * If the master clock is NOT in use, the reference time is "now". If + * master clock is in use, the reference time comes from there. + */ + if (use_master_clock) { + host_tsc =3D master_host_tsc; + kernel_ns =3D master_kernel_ns; + } + tsc_timestamp =3D kvm_read_l1_tsc(v, host_tsc); =20 /* With all the info we got, fill in the values */ =20 --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 16/34] KVM: x86: Simplify and comment kvm_get_time_scale() Date: Mon, 8 Jun 2026 15:47:57 +0100 Message-ID: <20260608145455.89187-17-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html From: David Woodhouse The kvm_get_time_scale() function was entirely opaque. Add comments explaining what it does: compute a fixed-point multiplier and shift for converting TSC ticks to nanoseconds via pvclock_scale_delta(). Rename the local variables from the cryptic tps64/tps32/scaled64 to base_hz_u64/base32/scaled_hz_u64 to make the code self-documenting. The "tps32" name stood for "Ticks Per Second" but was misleading since it held the shifted base frequency, not a tick count. No functional change. 
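For illustration only (not part of the patch): a minimal userspace sketch of the multiplier/shift computation described above, together with a scaling helper modelled on what pvclock_scale_delta() is described as doing. The names get_time_scale() and scale_delta() are hypothetical stand-ins, the C99 fixed-width types replace the kernel's u64/u32/s8, and base_hz is assumed to be non-zero.

```c
#include <stdint.h>

/* (dividend << 32) / divisor; the caller guarantees dividend < divisor. */
static uint32_t div_frac(uint32_t dividend, uint32_t divisor)
{
	return (uint32_t)(((uint64_t)dividend << 32) / divisor);
}

/* Hypothetical stand-in for kvm_get_time_scale(); base_hz must be non-zero. */
static void get_time_scale(uint64_t scaled_hz, uint64_t base_hz,
			   int8_t *pshift, uint32_t *pmultiplier)
{
	int32_t shift = 0;
	uint32_t base32;

	/* Shift base_hz right until it fits in 32 bits and is <= 2 * scaled_hz. */
	while (base_hz > scaled_hz * 2 || base_hz >> 32) {
		base_hz >>= 1;
		shift--;
	}
	base32 = (uint32_t)base_hz;

	/* Make scaled_hz fit in 32 bits and stay strictly below base32. */
	while (base32 <= scaled_hz || scaled_hz >> 32) {
		if (scaled_hz >> 32 || base32 & (1u << 31))
			scaled_hz >>= 1;
		else
			base32 <<= 1;
		shift++;
	}

	*pshift = (int8_t)shift;
	*pmultiplier = div_frac((uint32_t)scaled_hz, base32);
}

/* Apply the shift, then compute (delta * mul_frac) >> 32 from 32-bit halves. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	return (delta >> 32) * mul_frac +
	       (((delta & 0xffffffffu) * mul_frac) >> 32);
}
```

With base_hz = 3 GHz and scaled_hz = NSEC_PER_SEC this yields shift = -1 and multiplier = 2863311530, so 3,000,000,000 cycles scale to 999,999,999 ns. The one-nanosecond shortfall comes from the multiplier being rounded down.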
Signed-off-by: David Woodhouse
Reviewed-by: Paul Durrant
---
 arch/x86/kvm/x86.c | 55 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 92e32d720523..a6c31a0d9955 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2472,32 +2472,57 @@ static uint32_t div_frac(uint32_t dividend, uint32_t divisor)
 	return dividend;
 }
 
-static void kvm_get_time_scale(uint64_t scaled_hz, uint64_t base_hz,
+static void kvm_get_time_scale(u64 scaled_hz, u64 base_hz,
 			       s8 *pshift, u32 *pmultiplier)
 {
-	uint64_t scaled64;
-	int32_t shift = 0;
-	uint64_t tps64;
-	uint32_t tps32;
+	u64 scaled_hz_u64 = scaled_hz;
+	s32 shift = 0;
+	u64 base_hz_u64;
+	u32 base32;
 
-	tps64 = base_hz;
-	scaled64 = scaled_hz;
-	while (tps64 > scaled64*2 || tps64 & 0xffffffff00000000ULL) {
-		tps64 >>= 1;
+	/*
+	 * This function calculates a fixed-point multiplier and shift such
+	 * that:
+	 *   time_ns = (tsc_cycles << shift) * multiplier >> 32
+	 *
+	 * Where tsc_cycles tick at base_hz, and time_ns should count at
+	 * scaled_hz (typically NSEC_PER_SEC for a TSC→nanoseconds conversion).
+	 *
+	 * The multiplier is: (scaled_hz << 32) / base_hz, adjusted by shift
+	 * to keep everything in range.
+	 */
+
+	base_hz_u64 = base_hz;
+
+	/*
+	 * Start by shifting base_hz right until it fits in 32 bits, and
+	 * is lower than double the target rate. This introduces a negative
+	 * shift value which would result in pvclock_scale_delta() shifting
+	 * the actual tick count right before performing the multiplication.
+	 */
+	while (base_hz_u64 > scaled_hz_u64 * 2 || base_hz_u64 >> 32) {
+		base_hz_u64 >>= 1;
 		shift--;
 	}
 
-	tps32 = (uint32_t)tps64;
-	while (tps32 <= scaled64 || scaled64 & 0xffffffff00000000ULL) {
-		if (scaled64 & 0xffffffff00000000ULL || tps32 & 0x80000000)
-			scaled64 >>= 1;
+	/* Now the shifted base_hz fits in 32 bits.
*/ + base32 =3D (u32)base_hz_u64; + + /* + * Next, shift scaled_hz right until it fits in 32 bits, and ensure + * that the shifted base_hz is strictly larger (so that the result of the + * final division also fits in 32 bits). + */ + while (base32 <=3D scaled_hz_u64 || scaled_hz_u64 >> 32) { + if (scaled_hz_u64 >> 32 || base32 & BIT(31)) + scaled_hz_u64 >>=3D 1; else - tps32 <<=3D 1; + base32 <<=3D 1; shift++; } =20 *pshift =3D shift; - *pmultiplier =3D div_frac(scaled64, tps32); + *pmultiplier =3D div_frac(scaled_hz_u64, base32); } =20 #ifdef CONFIG_X86_64 --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930546; cv=none; d=zohomail.com; s=zohoarc; b=RxXv/QD6mTIdxZL3DWaHZgJZN27L+C38ZSLKaltr4Gh6M4orb6J+lyttTe4Ix2ePR0tbxB83SoFckLjimrM7EHutpqEzAVGGsaavi80KydJPjeR1XK6YI3I1FDyArevbGTZmAaV0hmtWSYmSaZ9+mkZ6k0HGwADcw9OxAcOlSKc= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930546; h=Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=mXbhskdA++vhUDI1oas++LQ6MKrW1eT7EmECWvGk1MU=; b=dJTBRg4ZS+IfzFthoc9fvTG1TI7qy5Lckb3QhFjd5bePmjZwJmQLfaHruTnueJTskChYMbz9iyVC1qZ0hOPYTHJRtD140fUnx0X3Bfix4R6GEGocrrK3MDShDg2DK19wwl1z63J/+ivFGBcOqSpAduda9vLNzPIgjrmdXfrX4Ow= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass 
2026 15:54:58 +0100 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass header.s=casper.20170209 header.d=infradead.org header.i="@infradead.org" header.h="Sender:Content-Transfer-Encoding:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=mXbhskdA++vhUDI1oas++LQ6MKrW1eT7EmECWvGk1MU=; b=kld3KJEKyVhgHwNB9MXsgtFGHM NRk8kDLq4sjWFPSiEYCazB3mLYrvTTJPUeVyZ8Adpmnt0NNFfaZoiBsBTHBZiMcMB/VwcWUxwxy7g lNILHXZ/Blw5vKOImjaz2dxZqvHuPxZKmZfM2mAtOIsTRmAupSROxT8IOZPNFf5YWr8Aqcjni3HSu Tm5dD4wjccOy5vGXBPeqCifw70AHNUqkba5MbwYlq1/G6tBhhz1sBC19DaWllKo9KdUR/kDGv0bZD UP7MppcYr9VAmJQO2gCb91fWtWdSBSqhI7QLwJi8L+i/vREiicZBMMi+ceG/pcLoDUSrhXoBZt8ha 9XciMkyw==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 17/34] KVM: x86: Remove implicit rdtsc() from kvm_compute_l1_tsc_offset() Date: Mon, 8 Jun 2026 15:47:58 +0100 Message-ID: <20260608145455.89187-18-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html X-purgate-ID: tlsNG-c1860d/1780930512-BF171DB1-E00BEBBE/0/0 X-purgate-type: clean X-purgate-size: 2890 X-ZohoMail-DKIM: pass (identity @infradead.org) X-ZM-MESSAGEID: 1780930547122158500 Content-Type: text/plain; charset="utf-8" From: David Woodhouse Let the callers pass the host TSC value in as an explicit parameter. This leaves some fairly obviously stupid code, which is using this function to compare the guest TSC at some *other* time, with the newly-minted TSC value from rdtsc(). Unless it's being used to measure *elapsed* time, that isn't very sensible. In this case, "obviously stupid" is an improvement over being non-obviously so. No functional change intended. 
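For illustration only (not part of the patch): a minimal userspace sketch of why the explicit parameter helps. Sampling the host TSC once and passing it down lets every calculation agree on the same "now". The names scale_tsc() and compute_l1_tsc_offset(), and the choice of 32 fractional bits for the scaling ratio, are assumptions made for this sketch; the real fixed-point format depends on the hardware.

```c
#include <stdint.h>

/* A 1:1 ratio in the 32.32 fixed-point format assumed by this sketch. */
#define RATIO_ONE (1ULL << 32)

/* (tsc * ratio) >> 32, truncated to 64 bits, built from 32-bit halves. */
static uint64_t scale_tsc(uint64_t tsc, uint64_t ratio)
{
	uint64_t th = tsc >> 32, tl = tsc & 0xffffffffu;
	uint64_t rh = ratio >> 32, rl = ratio & 0xffffffffu;

	return ((th * rh) << 32) + th * rl + tl * rh + ((tl * rl) >> 32);
}

/*
 * Hypothetical stand-in for kvm_compute_l1_tsc_offset(): the offset that
 * makes the guest read target_tsc at the moment the host reads host_tsc.
 * The caller supplies host_tsc, so related calculations can share one sample.
 */
static uint64_t compute_l1_tsc_offset(uint64_t host_tsc, uint64_t target_tsc,
				      uint64_t ratio)
{
	return target_tsc - scale_tsc(host_tsc, ratio);
}
```

A caller that needs both an offset and a timestamp for the same instant would read the counter once, then pass that single value to each helper instead of letting each helper take its own, slightly later, reading.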
Signed-off-by: David Woodhouse
Reviewed-by: Paul Durrant
---
 arch/x86/kvm/x86.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a6c31a0d9955..bce4c7a6a6fe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2693,11 +2693,12 @@ u64 kvm_scale_tsc(u64 tsc, u64 ratio)
 	return _tsc;
 }
 
-static u64 kvm_compute_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+static u64 kvm_compute_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 host_tsc,
+				     u64 target_tsc)
 {
 	u64 tsc;
 
-	tsc = kvm_scale_tsc(rdtsc(), vcpu->arch.l1_tsc_scaling_ratio);
+	tsc = kvm_scale_tsc(host_tsc, vcpu->arch.l1_tsc_scaling_ratio);
 
 	return target_tsc - tsc;
 }
@@ -2859,7 +2860,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
 	bool synchronizing = false;
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
-	offset = kvm_compute_l1_tsc_offset(vcpu, data);
+	offset = kvm_compute_l1_tsc_offset(vcpu, rdtsc(), data);
 	ns = get_kvmclock_base_ns();
 	elapsed = ns - kvm->arch.last_tsc_nsec;
 
@@ -2908,7 +2909,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
 		} else {
 			u64 delta = nsec_to_cycles(vcpu, elapsed);
 			data += delta;
-			offset = kvm_compute_l1_tsc_offset(vcpu, data);
+			offset = kvm_compute_l1_tsc_offset(vcpu, rdtsc(), data);
 		}
 		matched = true;
 	}
@@ -4155,7 +4156,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (msr_info->host_initiated) {
 			kvm_synchronize_tsc(vcpu, &data);
 		} else if (!vcpu->arch.guest_tsc_protected) {
-			u64 adj = kvm_compute_l1_tsc_offset(vcpu, data) - vcpu->arch.l1_tsc_offset;
+			u64 adj = kvm_compute_l1_tsc_offset(vcpu, rdtsc(), data) -
+				  vcpu->arch.l1_tsc_offset;
 			adjust_tsc_offset_guest(vcpu, adj);
 			vcpu->arch.ia32_tsc_adjust_msr += adj;
 		}
@@ -5279,7 +5281,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 			mark_tsc_unstable("KVM discovered backwards TSC");
 
if (kvm_check_tsc_unstable()) { - u64 offset =3D kvm_compute_l1_tsc_offset(vcpu, + u64 offset =3D kvm_compute_l1_tsc_offset(vcpu, rdtsc(), vcpu->arch.last_guest_tsc); kvm_vcpu_write_tsc_offset(vcpu, offset); if (!vcpu->arch.guest_tsc_protected) --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A9C03466B7D; Mon, 8 Jun 2026 14:55:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780930548; cv=none; b=Aaxlmi1sjgqB9wFQywAmMQmMI4KzukHJGaTUrgxiqPXXQxxsxLrx6nEerE4oN9eNk/H4VmjOQtgyaKUkjyICKWZCjjCNOzVOqywOL5P69J/ia/Ifs28IjiMaz2154RKSd0zJgqijPiMgfeMOC49GIE/L+bzBINe/HQxFt1f+LjY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780930548; c=relaxed/simple; bh=2JEOL6buJEIOXwDzWM5x8pciNT/Do1qDZMdU+g6FQYQ=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=A1OzqYPrYXLj9qJNag+lPjYoz6u7sVxHN9/EuQRRskuqDeKI8s0il1264XmtDYLzwS4gQP2yD7tIVV+GIme8I6O23yw7NC2R/Ogcqg6f8iz2Gw7VeM0vSZKqDf450LwDJNvG9BxT1pzY0l8gsIHBFedxx5MVUBoDDeNiU6kTETI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=desiato.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=A6BQQsRa; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=desiato.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="A6BQQsRa" DKIM-Signature: 
v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=/71kIHPPPlp90x0zTmAXvyqLn1mltlKAEfLjrxV8XdM=; b=A6BQQsRa64I6mTEqUV89khFF7n tGY0CGJIbQdDxEhp6s01y+W2JVFv3edCFqDd6pQrU6pqn2KfQoYouBk3g82mPQY/yzGoJ5e3rtRFb 94N4kWbKMp3agWFeAdr4IpEqaefogXhrFSJUbguDSd4CQMBtCa15QLwJr1rpHTHTWzylwy+114xpZ RyRDV/1gp9kWCuXu2gp0JhsyyAvcwemutFv1iI/A2/ip8x+LLB2WTxZTZs1UZuyqeb7coE7xrECjO mzUlkWj+bMJHXU4lXnKI/OE6OHrT3MKSVEylX05mloZTH7x5NKf71NrA+tISRvrfqi2blHTHzkZNb 7aEVC/1A==; Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by desiato.infradead.org with esmtpsa (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNU-00000001Ag3-2Mbn; Mon, 08 Jun 2026 14:55:00 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNS-00000000NFe-13O0; Mon, 08 Jun 2026 15:54:58 +0100 From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 18/34] KVM: x86: Improve synchronization in kvm_synchronize_tsc() Date: Mon, 8 Jun 2026 15:47:59 +0100 Message-ID: <20260608145455.89187-19-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html Content-Type: text/plain; charset="utf-8" From: David Woodhouse When synchronizing to an existing TSC (either by explicitly writing zero, or the legacy hack where the TSC is written within one second's worth of the previously written TSC), the last_tsc_write and last_tsc_nsec values were being misrecorded by __kvm_synchronize_tsc(). The *unsynchronized* value of the TSC (perhaps even zero) was being recorded, along with the current time at which kvm_synchronize_tsc() was called. This could cause *subsequent* writes to fail to synchronize correctly. Fix that by resetting {data, ns} to the previous values before passing them to __kvm_synchronize_tsc() when synchronization is detected. Except in the case where the TSC is unstable and *has* to be synthesised from the host clock, in which case attempt to create a nsec/tsc pair which is on the correct line. Furthermore, there were *three* different TSC reads used for calculating the "current" time, all slightly different from each other. 
Fix that by using kvm_get_time_and_clockread() where possible and using the same host_tsc value in all cases.

Signed-off-by: David Woodhouse
Reviewed-by: Paul Durrant
---
 arch/x86/kvm/x86.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bce4c7a6a6fe..c8c0633263fb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -203,6 +203,9 @@ module_param(mitigate_smt_rsb, bool, 0444);
  * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
  * returns to userspace, i.e. the kernel can run with the guest's value.
  */
+#ifdef CONFIG_X86_64
+static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp);
+#endif
 #define KVM_MAX_NR_USER_RETURN_MSRS 16
 
 struct kvm_user_return_msrs {
@@ -2854,14 +2857,23 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
 {
 	u64 data = user_value ? *user_value : 0;
 	struct kvm *kvm = vcpu->kvm;
-	u64 offset, ns, elapsed;
+	u64 offset, host_tsc, elapsed;
+	s64 ns;
 	unsigned long flags;
 	bool matched = false;
 	bool synchronizing = false;
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
-	offset = kvm_compute_l1_tsc_offset(vcpu, rdtsc(), data);
-	ns = get_kvmclock_base_ns();
+
+#ifdef CONFIG_X86_64
+	if (!kvm_get_time_and_clockread(&ns, &host_tsc))
+#endif
+	{
+		host_tsc = rdtsc();
+		ns = get_kvmclock_base_ns();
+	}
+
+	offset = kvm_compute_l1_tsc_offset(vcpu, host_tsc, data);
 	elapsed = ns - kvm->arch.last_tsc_nsec;
 
 	if (vcpu->arch.virtual_tsc_khz) {
@@ -2904,12 +2916,25 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value)
 	 */
 	if (synchronizing &&
 	    vcpu->arch.virtual_tsc_khz == kvm->arch.last_tsc_khz) {
+		/*
+		 * If synchronizing, the "last written" TSC value/time
+		 * recorded by __kvm_synchronize_tsc() should not change
+		 * (i.e. should be precisely the same as the existing
+		 * generation).
+ */ + data =3D kvm->arch.last_tsc_write; + if (!kvm_check_tsc_unstable()) { offset =3D kvm->arch.cur_tsc_offset; + ns =3D kvm->arch.cur_tsc_nsec; } else { + /* + * ...unless the TSC is unstable and has to be + * synthesised from the host clock in nanoseconds. + */ u64 delta =3D nsec_to_cycles(vcpu, elapsed); data +=3D delta; - offset =3D kvm_compute_l1_tsc_offset(vcpu, rdtsc(), data); + offset =3D kvm_compute_l1_tsc_offset(vcpu, host_tsc, data); } matched =3D true; } --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930546; cv=none; d=zohomail.com; s=zohoarc; b=n1gTejpbgKpfmabvapAgvPLJbjIUYYw+fLoRPBuJNtNvyzuGjKJP4y5lMA0ZiiYRNcz7x3DC0Sxrw4Y0l+sI21RBoNfxi3uqId4zS4SgyIUPS463L+AaTFgnEd5XWTHBOvaW/++8WN8zHu9TA8lrqeW/Dr1GbO7/ZPoSV+Om/Qo= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930546; h=Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=Z11TCooN3o79fGtJQIRZglBTEce9tdBf389vknMfIOQ=; b=aMvLRpawvkFuTXd5pUTTCQHiQUBzICm63jRtGYBVLWxPzeo9VTHC+8snteMOLypU0am97cXMcZOkKN59KtuwKr8NouiDcdjeXbZ8cYagWpgImTmAI132ft0NAC14YhOYz12M7t+HVNFi+8gsyrSUoBftpbMrpg8p4DsQCV82fW8= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) 
version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass header.s=casper.20170209 header.d=infradead.org header.i="@infradead.org" header.h="Sender:Content-Transfer-Encoding:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=Z11TCooN3o79fGtJQIRZglBTEce9tdBf389vknMfIOQ=; b=cynKPFl8GgekA9SKrwFXVB9r9J WAbFVCHdiWGH0CeOokQFuPCsXJ5o3iDL2cPV9hIMKQxr1N7MXHK6yEk9iJjsb9GAWXZOHgTi0DsXj JzQ+Nkq02WPUfyO9KNXEhSze7KlpaKZOXEG2oJjzfevRTF18SPRZ1yBRI8hEftHxaDX+zmpufV/QS pzA0W4c7cG6cx+ReYdtKYBmjJnHYRBJBqQeLcoh/2sc7BaFdzsuH2Xs71Xgo66vPsfPFQqbgY1cjp 2Ssmi5qG4wR/HkVXxhTRcfvPypwbpq2lU6dJxpPrNxctwK779aMrXdCeYKsaCKu079gHZ/Lr2+gJO ADa0uoTA==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 19/34] KVM: x86: Kill last_tsc_{nsec,write,offset} fields Date: Mon, 8 Jun 2026 15:48:00 +0100 Message-ID: <20260608145455.89187-20-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html X-purgate-ID: tlsNG-42698a/1780930514-2007BF3B-9E42E192/0/0 X-purgate-type: clean X-purgate-size: 5887 X-ZohoMail-DKIM: pass (identity @infradead.org) X-ZM-MESSAGEID: 1780930548067154100 Content-Type: text/plain; charset="utf-8" From: David Woodhouse These pointlessly duplicate the cur_tsc_{nsec,write,offset} values. The only place they were used was where the TSC is stable and a new vCPU is being synchronized to the previous setting, in which case the cur_tsc_* value is definitely identical. Rename last_tsc_khz and last_tsc_scaling_ratio to cur_tsc_khz and cur_tsc_scaling_ratio respectively, since they are properties of the current TSC generation. 
Signed-off-by: David Woodhouse Reviewed-by: Paul Durrant --- arch/x86/include/asm/kvm_host.h | 7 ++---- arch/x86/kvm/x86.c | 42 ++++++++++++++++----------------- 2 files changed, 22 insertions(+), 27 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 5348fd5ea3f3..59298a8f78eb 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1486,11 +1486,8 @@ struct kvm_arch { * preemption-disabled region, so it must be a raw spinlock. */ raw_spinlock_t tsc_write_lock; - u64 last_tsc_nsec; - u64 last_tsc_write; - u32 last_tsc_khz; - u64 last_tsc_offset; - u64 last_tsc_scaling_ratio; + u32 cur_tsc_khz; + u64 cur_tsc_scaling_ratio; u64 cur_tsc_nsec; u64 cur_tsc_write; u64 cur_tsc_offset; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c8c0633263fb..bbd642e0dc54 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2813,14 +2813,12 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *= vcpu, u64 offset, u64 tsc, vcpu->kvm->arch.user_set_tsc =3D true; =20 /* - * We also track th most recent recorded KHZ, write and time to - * allow the matching interval to be extended at each write. + * Track the TSC frequency, scaling ratio, and offset for the current + * generation. These are used to detect matching TSC writes and to + * compute the guest TSC from the host clock. 
*/ - kvm->arch.last_tsc_nsec =3D ns; - kvm->arch.last_tsc_write =3D tsc; - kvm->arch.last_tsc_khz =3D vcpu->arch.virtual_tsc_khz; - kvm->arch.last_tsc_offset =3D offset; - kvm->arch.last_tsc_scaling_ratio =3D vcpu->arch.l1_tsc_scaling_ratio; + kvm->arch.cur_tsc_khz =3D vcpu->arch.virtual_tsc_khz; + kvm->arch.cur_tsc_scaling_ratio =3D vcpu->arch.l1_tsc_scaling_ratio; =20 vcpu->arch.last_guest_tsc =3D tsc; =20 @@ -2833,8 +2831,6 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vc= pu, u64 offset, u64 tsc, * nanosecond time, offset, and write, so if TSCs are in * sync, we can match exact offset, and if not, we can match * exact software computation in compute_guest_tsc() - * - * These values are tracked in kvm->arch.cur_xxx variables. */ kvm->arch.cur_tsc_generation++; kvm->arch.cur_tsc_nsec =3D ns; @@ -2874,7 +2870,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu= , u64 *user_value) } =20 offset =3D kvm_compute_l1_tsc_offset(vcpu, host_tsc, data); - elapsed =3D ns - kvm->arch.last_tsc_nsec; + elapsed =3D ns - kvm->arch.cur_tsc_nsec; =20 if (vcpu->arch.virtual_tsc_khz) { if (data =3D=3D 0) { @@ -2884,7 +2880,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu= , u64 *user_value) */ synchronizing =3D true; } else if (kvm->arch.user_set_tsc) { - u64 tsc_exp =3D kvm->arch.last_tsc_write + + u64 tsc_exp =3D kvm->arch.cur_tsc_write + nsec_to_cycles(vcpu, elapsed); u64 tsc_hz =3D vcpu->arch.virtual_tsc_khz * 1000LL; /* @@ -2915,14 +2911,14 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vc= pu, u64 *user_value) * it's better to try to match offsets from the beginning. */ if (synchronizing && - vcpu->arch.virtual_tsc_khz =3D=3D kvm->arch.last_tsc_khz) { + vcpu->arch.virtual_tsc_khz =3D=3D kvm->arch.cur_tsc_khz) { /* * If synchronizing, the "last written" TSC value/time * recorded by __kvm_synchronize_tsc() should not change * (i.e. should be precisely the same as the existing * generation). 
*/ - data =3D kvm->arch.last_tsc_write; + data =3D kvm->arch.cur_tsc_write; =20 if (!kvm_check_tsc_unstable()) { offset =3D kvm->arch.cur_tsc_offset; @@ -3207,7 +3203,7 @@ static void pvclock_update_vm_gtod_copy(struct kvm *k= vm) * get_kvmclock() to compute kvmclock from the host TSC * without needing a vCPU reference. */ - ka->master_tsc_scaling_ratio =3D ka->last_tsc_scaling_ratio; + ka->master_tsc_scaling_ratio =3D ka->cur_tsc_scaling_ratio; tsc_hz =3D (u64)get_cpu_tsc_khz() * 1000; if (tsc_hz && kvm_caps.has_tsc_control) tsc_hz =3D kvm_scale_tsc(tsc_hz, @@ -6088,8 +6084,8 @@ static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcp= u, raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); =20 matched =3D (vcpu->arch.virtual_tsc_khz && - kvm->arch.last_tsc_khz =3D=3D vcpu->arch.virtual_tsc_khz && - kvm->arch.last_tsc_offset =3D=3D offset); + kvm->arch.cur_tsc_khz =3D=3D vcpu->arch.virtual_tsc_khz && + kvm->arch.cur_tsc_offset =3D=3D offset); =20 tsc =3D kvm_scale_tsc(rdtsc(), vcpu->arch.l1_tsc_scaling_ratio) + offset; ns =3D get_kvmclock_base_ns(); @@ -13543,13 +13539,15 @@ int kvm_arch_enable_virtualization_cpu(void) } =20 /* - * We have to disable TSC offset matching.. if you were - * booting a VM while issuing an S4 host suspend.... - * you may have some problem. Solving this issue is - * left as an exercise to the reader. + * Adjust the TSC matching reference by the same + * delta applied to each vCPU's offset, so that + * future KVM_SET_TSC / vCPU creation still matches + * correctly against the adjusted TSC timeline. + * Scale from host to guest TSC rate. 
*/ - kvm->arch.last_tsc_nsec =3D 0; - kvm->arch.last_tsc_write =3D 0; + kvm->arch.cur_tsc_write -=3D + kvm_scale_tsc(delta_cyc, + kvm->arch.cur_tsc_scaling_ratio); } =20 } --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9B12E43E9CB; Mon, 8 Jun 2026 14:55:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780930538; cv=none; b=nlTMgHjLQG30s/SGlDaOpqV2++QZPOyk2iiRAL/ZICBMi4koHrq7XvDZLa9vHy2g2cDxtyOEwpbRh3i8A7cRG0nbnUx+0gnc7JKrGX1KaWZ9ddeu+x9DCu7EW6LwBXN5EjbGhe+yh6a19EzlOGGiLHThvUZOgeKDg8CVwB0FT9Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780930538; c=relaxed/simple; bh=7Wz8+VAwcDMp5FiEB9Evg7YVWlEd1Zm2fNwaWkDZSnc=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Z65fVnMpGMBj9/VIW6vpmNErZEt6aTJgA3REnqJfJQoe9D2t00Ik2NqYhqcdW+fUlQHl5zXLUIyd92L9A/mnc64NXlq5mMjaN5M45+3z4HlSgdkFXe+22mJuRDZaMHifFcw3itHVLmFGD75E1akbZmQYJWTIju+PV9aaP5RdrHI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=casper.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=t9va9E+4; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=casper.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="t9va9E+4" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; 
s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=W6KRWl5EKJVhKKJ+V1j8dnQD5A6yc6eAk2IU9lJn4Ac=; b=t9va9E+4WmvNE0SA7fFgnaHxAD 7AhhRvH+95yZbHhxU70s20pgqCglU/2MRBf1gWeBjW9qHK7AGbJzeCPBsEqwAGYcYGBfsGKyUr0K8 I+Kucvi+HCbxb6DqCmYG9QGa1BzBi/4nL3594Qqo4Fe9p0Bmocw0V0aKBMjgDXLqyVEvPUG5alwe3 Gwm1fDNDHbc9FsARs4mHF2I3Hq5XuSDM8w1ZLNSPU2uby115X+7zeGiFO7d1bAmfZIfNETMCOZs3e wGUEret33AxsKpui8L7/TSjGwrWVDEn5pApD3sHs6Wl2fYTNj0aa5VWYwneoqHr3ah5Gg13o/A9mJ awCwVtvA==; Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wWbNS-0000000DtxQ-0eWu; Mon, 08 Jun 2026 14:54:58 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNS-00000000NFm-1WIb; Mon, 08 Jun 2026 15:54:58 +0100 From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 20/34] KVM: x86: Replace nr_vcpus_matched_tsc count with all_vcpus_matched_tsc bool Date: Mon, 8 Jun 2026 15:48:01 +0100 Message-ID: <20260608145455.89187-21-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html Content-Type: text/plain; charset="utf-8" From: David Woodhouse Using a count and comparing with kvm->online_vcpus was always racy because a new vCPU could be created while kvm_track_tsc_matching() was running and comparing with kvm->online_vcpus. That variable is only atomic with respect to itself; kvm_arch_vcpu_create() runs before kvm->online_vcpus is incremented for the new vCPU. Replace the count with a boolean that is set in kvm_track_tsc_matching() after comparing the count, and cleared when a new TSC generation starts. The boolean is consumed by pvclock_update_vm_gtod_copy() under the tsc_write_lock, which serializes against __kvm_synchronize_tsc(). Keep the count for now as it's still used in the trace event. 
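The change described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the struct and the model_* helpers are hypothetical, there is no locking, and model_matched_write() assumes the caller only invokes it for a vCPU that has not yet been counted in the current generation. The point is that the consumer reads a result that was decided once, instead of re-comparing the count against a value that may have moved in the meantime.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; field names mirror the patch, but this is not kernel code. */
struct tsc_match_state {
	int  online_vcpus;          /* stands in for kvm->online_vcpus */
	int  nr_vcpus_matched_tsc;  /* matches, excluding the reference vCPU */
	bool all_vcpus_matched_tsc; /* result latched by the tracking step */
};

/* A TSC write that starts a new generation: reset the count, clear the latch. */
static void model_new_generation(struct tsc_match_state *s)
{
	s->nr_vcpus_matched_tsc = 0;
	s->all_vcpus_matched_tsc = false;
}

/* A TSC write from a not-yet-counted vCPU that matches the current generation. */
static void model_matched_write(struct tsc_match_state *s)
{
	s->nr_vcpus_matched_tsc++;
}

/* Models the comparison in kvm_track_tsc_matching(): compare once, store the result. */
static void model_track_matching(struct tsc_match_state *s)
{
	s->all_vcpus_matched_tsc =
		(s->nr_vcpus_matched_tsc + 1 == s->online_vcpus);
}

/* Walk through one generation with two vCPUs, then a third coming online. */
void model_demo(void)
{
	struct tsc_match_state s = { .online_vcpus = 2 };

	model_new_generation(&s);          /* first vCPU writes its TSC */
	model_track_matching(&s);
	assert(!s.all_vcpus_matched_tsc);  /* 0 + 1 != 2 */

	model_matched_write(&s);           /* second vCPU matches */
	model_track_matching(&s);
	assert(s.all_vcpus_matched_tsc);   /* 1 + 1 == 2 */

	s.online_vcpus = 3;                /* online count moves afterwards */
	assert(s.all_vcpus_matched_tsc);   /* consumer still sees the latched value */

	model_new_generation(&s);          /* next generation clears the latch */
	assert(!s.all_vcpus_matched_tsc);
}
```

Calling model_demo() from a test program exercises the sequence; the assertions document the expected state at each step of the model, not the behaviour of the kernel itself.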
Signed-off-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/x86.c | 10 ++++++---- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 59298a8f78eb..eb81f90284ba 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1492,6 +1492,7 @@ struct kvm_arch { u64 cur_tsc_write; u64 cur_tsc_offset; u64 cur_tsc_generation; + bool all_vcpus_matched_tsc; int nr_vcpus_matched_tsc; =20 u32 default_tsc_khz; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index bbd642e0dc54..ac66f8e7116f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2651,8 +2651,10 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *= vcpu, bool new_generation) * and all vCPUs must have matching TSCs. Note, the count for matching * vCPUs doesn't include the reference vCPU, hence "+1". */ - bool use_master_clock =3D (ka->nr_vcpus_matched_tsc + 1 =3D=3D - atomic_read(&vcpu->kvm->online_vcpus)) && + ka->all_vcpus_matched_tsc =3D (ka->nr_vcpus_matched_tsc + 1 =3D=3D + atomic_read(&vcpu->kvm->online_vcpus)); + + bool use_master_clock =3D ka->all_vcpus_matched_tsc && gtod_is_based_on_tsc(gtod->clock.vclock_mode); =20 /* @@ -2837,6 +2839,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vc= pu, u64 offset, u64 tsc, kvm->arch.cur_tsc_write =3D tsc; kvm->arch.cur_tsc_offset =3D offset; kvm->arch.nr_vcpus_matched_tsc =3D 0; + kvm->arch.all_vcpus_matched_tsc =3D false; } else if (vcpu->arch.this_tsc_generation !=3D kvm->arch.cur_tsc_generati= on) { kvm->arch.nr_vcpus_matched_tsc++; } @@ -3177,8 +3180,7 @@ static void pvclock_update_vm_gtod_copy(struct kvm *k= vm) bool host_tsc_clocksource, vcpus_matched; =20 lockdep_assert_held(&kvm->arch.tsc_write_lock); - vcpus_matched =3D (ka->nr_vcpus_matched_tsc + 1 =3D=3D - atomic_read(&kvm->online_vcpus)); + vcpus_matched =3D ka->all_vcpus_matched_tsc; =20 /* * If the host uses TSC clock, then passthrough TSC as 
stable --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930539; cv=none; d=zohomail.com; s=zohoarc; b=dWb0MZN9Ru0u0Rrg+0msC5fVE7eJFO1qUrcXf1l9u4PKZSrZVAy1V4ZqG5Fhb6LzDAa8dJtFeX/l8/xBn/29CR/qQ3nasqEx2Qglw16P+qOqAYrLFu/fOd3MkcPOIqF6wks33LAlmRGBw24U5+qXbViECJE+qN90/gf/wMwfogU= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930539; h=Content-Type:Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=cmBqqtGJwbuFFzQ3xL4KAIBfESZHairUslQqBI77Ze0=; b=S+lv08+naNVW2PzeTG0l6mTVcxQz+OriovyXNbzFZww2v0D+I2ylgaT2h+SAxPlE1Ggj4J3q7XBIlDxthRfZje5FO2aQP35nRdPenraNNmcATVsb7yluxjEcgK5SlOVWzb/u+kzTxK3iGJrvLWtaEfSdNoAPWgYFdF2gnbTUmnU= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1780930539467186.92095962333894; Mon, 8 Jun 2026 07:55:39 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.1331707.1594293 (Exim 4.92) (envelope-from ) id 1wWbNk-00027x-AB; Mon, 08 Jun 2026 14:55:16 +0000 
Received: by outflank-mailman (output) from mailman id 1331707.1594293; Mon, 08 Jun 2026 14:55:16 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNk-00025i-1o; Mon, 08 Jun 2026 14:55:16 +0000 Received: by outflank-mailman (input) for mailman id 1331707; Mon, 08 Jun 2026 14:55:14 +0000 Received: from mx.expurgate.net ([194.145.224.10]) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNh-0001P6-TP for xen-devel@lists.xenproject.org; Mon, 08 Jun 2026 14:55:14 +0000 Received: from mx.expurgate.net (helo=localhost) by mx.expurgate.net with esmtp id 1wWbNh-00EcHH-95; Mon, 08 Jun 2026 16:55:13 +0200 Received: from [10.42.69.3] (helo=localhost) by localhost with ESMTP (eXpurgate MTA 0.9.1) (envelope-from ) id 6a26d7c9-bab6-0a2a0a5309dd-0a2a4503be90-16 for ; Mon, 08 Jun 2026 16:55:13 +0200 Received: from [90.155.50.34] (helo=casper.infradead.org) by tlsNG-33051d.mxtls.expurgate.net with ESMTPS (eXpurgate 4.56.1) (envelope-from ) id 6a26d7d0-672d-0a2a45030019-5a9b3222d954-3 for ; Mon, 08 Jun 2026 16:55:13 +0200 Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wWbNS-0000000DtxR-0or9; Mon, 08 Jun 2026 14:54:58 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNS-00000000NFq-1gTE; Mon, 08 Jun 2026 15:54:58 +0100 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass header.s=casper.20170209 header.d=infradead.org header.i="@infradead.org" 
header.h="Sender:Content-Transfer-Encoding:Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To: From:Reply-To:Cc:Content-ID:Content-Description; bh=cmBqqtGJwbuFFzQ3xL4KAIBfESZHairUslQqBI77Ze0=; b=dB1mnmM67CSlPJ6qjMZQQjgdYG Wvrfg4w3dEFSg2McKUMLj7rJRsW2heHp4o5GCCWuu7n8rYuEJbnowfNdy8nwZZEECFDzCuY1rFpeI tTHDpGf+o6xNA2jP69ahEanmXVrPb/JeZd1tefKD11xaBN6l9FLgluxY4zEqEFAhJ2EFpIRBQPGig R6VZDpVufIAjpY1rbFjXr+woncXFvfV6Jrc1PGg/gZoh4b8QZkmL4RJQZgSuPO6Y4WpUSRmKeGUpa GQgB0qyhndvUYvNOeBQs0vTz/R805XKvXREXqgPNbSllBj2QKpXgaOSasLbcdLxFhYlbqVHB01qy9 xmXunmwg==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 21/34] KVM: x86: Allow KVM master clock mode when TSCs are offset from each other Date: Mon, 8 Jun 2026 15:48:02 +0100 Message-ID: <20260608145455.89187-22-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. 
See http://www.infradead.org/rpr.html X-purgate-ID: tlsNG-33051d/1780930513-37340938-D226680D/0/0 X-purgate-type: clean X-purgate-size: 8693 X-ZohoMail-DKIM: pass (identity @infradead.org) X-ZM-MESSAGEID: 1780930542052154100 From: David Woodhouse Previously, a guest writing different TSC values on different vCPUs could force KVM out of master clock mode. With this change, only a frequency mismatch disables master clock. The only ways for non-master-clock mode to happen now are archaic hardware without a TSC-based clocksource, or a VMM that sets different TSC frequencies across vCPUs. Running at a different frequency would lead to a systemic skew between the clock(s) as observed by different vCPUs due to arithmetic precision in the scaling. So that should indeed force the clock to be based on the host's CLOCK_MONOTONIC_RAW instead of being in masterclock mode where it is defined by the guest TSC. But when the vCPUs merely have a different TSC *offset*, that's not a problem. The offset is applied to that vCPU's kvmclock->tsc_timestamp field, and it all comes out in the wash. Track frequency matching separately from offset matching using a dedicated freq generation counter (cur_tsc_freq_generation) that only bumps on actual frequency changes. Each vCPU is counted exactly once per freq generation via a per-vCPU this_tsc_freq_generation field, preventing repeated syncs of the same vCPU from falsely re-enabling master clock. Note that the generation-based counting has a known limitation: if all vCPUs are in sync and one changes away and then back again, the other vCPUs are still at the old generation and won't be counted until they sync again (which may never happen). This was always the case for the offset tracking and isn't expected VMM behaviour =E2=80=94 although it is t= he scenario that the VM-wide KVM_SET_TSC_KHZ ioctl was introduced to handle cleanly. 
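The generation-based counting and its limitation can be illustrated with a userspace model. This is a sketch under assumptions: struct model_vm, struct model_vcpu and the model_* functions are hypothetical names, locking is omitted, and the shortcut that re-enables the flag when all offsets match is deliberately left out so that only the frequency counting is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-vCPU and per-VM state; not kernel code. */
struct model_vcpu {
	uint32_t tsc_khz;
	uint64_t this_freq_gen;   /* last freq generation this vCPU was counted in */
};

struct model_vm {
	uint32_t cur_tsc_khz;
	uint64_t cur_freq_gen;
	int      nr_matched_freq;
	int      online_vcpus;
	bool     all_matched_freq;
};

static void model_vm_init(struct model_vm *vm, int online)
{
	/* The flag starts true, as the commit message describes. */
	*vm = (struct model_vm){ .online_vcpus = online, .all_matched_freq = true };
}

/* Models one TSC sync by a vCPU running at the given frequency. */
static void model_sync(struct model_vm *vm, struct model_vcpu *v, uint32_t khz)
{
	v->tsc_khz = khz;

	/* A frequency change starts a new freq generation. */
	if (v->tsc_khz != vm->cur_tsc_khz) {
		vm->cur_freq_gen++;
		vm->all_matched_freq = false;
		vm->nr_matched_freq = 0;
	}

	/* Count each vCPU at most once per freq generation. */
	if (v->this_freq_gen != vm->cur_freq_gen) {
		v->this_freq_gen = vm->cur_freq_gen;
		vm->nr_matched_freq++;
	}

	vm->cur_tsc_khz = khz;

	if (vm->nr_matched_freq >= vm->online_vcpus)
		vm->all_matched_freq = true;
}

/* One vCPU changes frequency away and back; the other never syncs again. */
bool model_away_and_back(void)
{
	struct model_vm vm;
	struct model_vcpu v0 = { 0 }, v1 = { 0 };

	model_vm_init(&vm, 2);
	model_sync(&vm, &v0, 2000000);
	model_sync(&vm, &v1, 2000000);
	assert(vm.all_matched_freq);

	model_sync(&vm, &v0, 3000000);   /* away */
	model_sync(&vm, &v0, 3000000);   /* a repeated sync is not counted twice */
	assert(!vm.all_matched_freq && vm.nr_matched_freq == 1);

	model_sync(&vm, &v0, 2000000);   /* back again */
	return vm.all_matched_freq;      /* v1 is still at an old generation */
}
```

In this model the flag stays clear after the away-and-back sequence until the second vCPU syncs once more, which is the limitation described above.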
While at it, restructure the existing TSC offset generation tracking to use the same pattern: reset counter to zero on new generation, then unconditionally count vCPUs that haven't been seen in this generation. Both counters now use a consistent >=3D online_vcpus threshold (1-based counting where the reference vCPU is included in the count). Use frequency match for master clock eligibility, and full TSC match (including offset) only for PVCLOCK_TSC_STABLE_BIT, which tells the guest it is safe to skip cross-vCPU monotonicity enforcement. Signed-off-by: David Woodhouse --- arch/x86/include/asm/kvm_host.h | 4 ++ arch/x86/kvm/x86.c | 68 ++++++++++++++++++++++++++------- 2 files changed, 58 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index eb81f90284ba..699a1a197194 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -970,6 +970,7 @@ struct kvm_vcpu_arch { u64 this_tsc_nsec; u64 this_tsc_write; u64 this_tsc_generation; + u64 this_tsc_freq_generation; bool tsc_catchup; bool tsc_always_catchup; s8 virtual_tsc_shift; @@ -1493,6 +1494,9 @@ struct kvm_arch { u64 cur_tsc_offset; u64 cur_tsc_generation; bool all_vcpus_matched_tsc; + bool all_vcpus_matched_freq; + int nr_vcpus_matched_freq; + u64 cur_tsc_freq_generation; int nr_vcpus_matched_tsc; =20 u32 default_tsc_khz; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index ac66f8e7116f..86c30be4c5d2 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2647,14 +2647,37 @@ static void kvm_track_tsc_matching(struct kvm_vcpu = *vcpu, bool new_generation) struct pvclock_gtod_data *gtod =3D &pvclock_gtod_data; =20 /* - * To use the masterclock, the host clocksource must be based on TSC - * and all vCPUs must have matching TSCs. Note, the count for matching - * vCPUs doesn't include the reference vCPU, hence "+1". 
+ * Track whether all vCPUs have matching TSC offsets (for + * PVCLOCK_TSC_STABLE_BIT) and matching frequencies (for + * master clock eligibility). + */ + + /* + * A new vCPU might already have incremented ->online_vcpus + * and cause a temporary false negative here. But will then + * call kvm_synchronize_tsc() from kvm_arch_vcpu_postcreate() + * and finish the job. */ - ka->all_vcpus_matched_tsc =3D (ka->nr_vcpus_matched_tsc + 1 =3D=3D - atomic_read(&vcpu->kvm->online_vcpus)); + int online =3D atomic_read(&vcpu->kvm->online_vcpus); =20 - bool use_master_clock =3D ka->all_vcpus_matched_tsc && + ka->all_vcpus_matched_tsc =3D (ka->nr_vcpus_matched_tsc >=3D online); + /* + * all_vcpus_matched_freq starts true and is cleared when + * __kvm_synchronize_tsc() detects a frequency mismatch. + * Re-enable when all vCPUs have synced with matching frequency. + * If all offsets also match, that implies frequencies match too. + */ + if (ka->all_vcpus_matched_tsc || + ka->nr_vcpus_matched_freq >=3D online) + ka->all_vcpus_matched_freq =3D true; + + /* + * To use the masterclock, the host clocksource must be based on TSC + * and all vCPUs must have matching TSC *frequency*. Different offsets + * are fine =E2=80=94 each vCPU's pvclock has its own tsc_timestamp that + * accounts for its offset. + */ + bool use_master_clock =3D ka->all_vcpus_matched_freq && gtod_is_based_on_tsc(gtod->clock.vclock_mode); =20 /* @@ -2818,7 +2841,22 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *v= cpu, u64 offset, u64 tsc, * Track the TSC frequency, scaling ratio, and offset for the current * generation. These are used to detect matching TSC writes and to * compute the guest TSC from the host clock. + * + * If the frequency changed, master clock mode can no longer be used + * since the kvmclock scaling factors differ between vCPUs. 
*/ + if (vcpu->arch.virtual_tsc_khz !=3D kvm->arch.cur_tsc_khz) { + kvm->arch.cur_tsc_freq_generation++; + kvm->arch.all_vcpus_matched_freq =3D false; + kvm->arch.nr_vcpus_matched_freq =3D 0; + } + + /* Count each vCPU once per freq generation */ + if (vcpu->arch.this_tsc_freq_generation !=3D kvm->arch.cur_tsc_freq_gener= ation) { + vcpu->arch.this_tsc_freq_generation =3D kvm->arch.cur_tsc_freq_generatio= n; + kvm->arch.nr_vcpus_matched_freq++; + } + kvm->arch.cur_tsc_khz =3D vcpu->arch.virtual_tsc_khz; kvm->arch.cur_tsc_scaling_ratio =3D vcpu->arch.l1_tsc_scaling_ratio; =20 @@ -2835,17 +2873,18 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *= vcpu, u64 offset, u64 tsc, * exact software computation in compute_guest_tsc() */ kvm->arch.cur_tsc_generation++; + kvm->arch.all_vcpus_matched_tsc =3D false; + kvm->arch.nr_vcpus_matched_tsc =3D 0; kvm->arch.cur_tsc_nsec =3D ns; kvm->arch.cur_tsc_write =3D tsc; kvm->arch.cur_tsc_offset =3D offset; - kvm->arch.nr_vcpus_matched_tsc =3D 0; - kvm->arch.all_vcpus_matched_tsc =3D false; - } else if (vcpu->arch.this_tsc_generation !=3D kvm->arch.cur_tsc_generati= on) { + } + + if (vcpu->arch.this_tsc_generation !=3D kvm->arch.cur_tsc_generation) { + vcpu->arch.this_tsc_generation =3D kvm->arch.cur_tsc_generation; kvm->arch.nr_vcpus_matched_tsc++; } =20 - /* Keep track of which generation this VCPU has synchronized to */ - vcpu->arch.this_tsc_generation =3D kvm->arch.cur_tsc_generation; vcpu->arch.this_tsc_nsec =3D kvm->arch.cur_tsc_nsec; vcpu->arch.this_tsc_write =3D kvm->arch.cur_tsc_write; =20 @@ -3180,7 +3219,7 @@ static void pvclock_update_vm_gtod_copy(struct kvm *k= vm) bool host_tsc_clocksource, vcpus_matched; =20 lockdep_assert_held(&kvm->arch.tsc_write_lock); - vcpus_matched =3D ka->all_vcpus_matched_tsc; + vcpus_matched =3D ka->all_vcpus_matched_freq; =20 /* * If the host uses TSC clock, then passthrough TSC as stable @@ -3527,7 +3566,7 @@ int kvm_guest_time_update(struct kvm_vcpu *v) =20 /* If the host uses TSC 
clocksource, then it is stable */ hv_clock.flags =3D 0; - if (use_master_clock) + if (use_master_clock && ka->all_vcpus_matched_tsc) hv_clock.flags |=3D PVCLOCK_TSC_STABLE_BIT; =20 if (vcpu->pv_time.active) { @@ -6354,7 +6393,7 @@ static int kvm_vcpu_ioctl_get_clock_guest(struct kvm_= vcpu *v, void __user *argp) =20 hv_clock.tsc_shift =3D vcpu->pvclock_tsc_shift; hv_clock.tsc_to_system_mul =3D vcpu->pvclock_tsc_mul; - hv_clock.flags =3D PVCLOCK_TSC_STABLE_BIT; + hv_clock.flags =3D ka->all_vcpus_matched_tsc ? PVCLOCK_TSC_STABLE_BIT : 0; =20 if (copy_to_user(argp, &hv_clock, sizeof(hv_clock))) return -EFAULT; @@ -13649,6 +13688,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long= type) mutex_init(&kvm->arch.apic_map_lock); seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lo= ck); kvm->arch.kvmclock_offset =3D -get_kvmclock_base_ns(); + kvm->arch.all_vcpus_matched_freq =3D true; =20 raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); pvclock_update_vm_gtod_copy(kvm); --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930557; cv=none; d=zohomail.com; s=zohoarc; b=ESppG1PzKj6VB+KDZ8hD7vrT27uNfvTb3z8htWpez+QdPlMAGx99Q4O+X2rjSgDh622E51g3Zl0/PlUy1BZVtp4H6cLNdqboVQ8x2U4WPbyrrsGYPhrafnB1vrd2e+hgcYvej5r62cn+LoMts/hNKNo8vpSgmwhVnzVZAcgLKmA= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930557; 
h=Content-Type:Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=ONJsZsuBBdVs8sdkr0bY486TNXSRV1vDyDb41XHmd34=; b=g262v6KkwkyY4imCkbQSUQVvFum3AJFKFkLjLeQ3pbTd2R9SN9SBbpTk1BZF1G8U8G7ZYIAJabvUKlXQwTDQjS+PCEC1/2p0npkRBJL3ML4V/2DauE1fmxemt0wNF3nXNYgCOXxnTL4JB/7YZWsVx/DRfzKmDkLbfT9aSCMJF4U= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1780930557869679.0088936150568; Mon, 8 Jun 2026 07:55:57 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.1331720.1594406 (Exim 4.92) (envelope-from ) id 1wWbNz-0005Px-Fj; Mon, 08 Jun 2026 14:55:31 +0000 Received: by outflank-mailman (output) from mailman id 1331720.1594406; Mon, 08 Jun 2026 14:55:31 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNy-0005Lt-IZ; Mon, 08 Jun 2026 14:55:30 +0000 Received: by outflank-mailman (input) for mailman id 1331720; Mon, 08 Jun 2026 14:55:18 +0000 Received: from mx.expurgate.net ([194.145.224.10]) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNl-0002Y3-QE for xen-devel@lists.xenproject.org; Mon, 08 Jun 2026 14:55:17 +0000 Received: from mx.expurgate.net (helo=localhost) by mx.expurgate.net with esmtp id 1wWbNl-00EcHE-6C; Mon, 08 Jun 2026 16:55:17 +0200 Received: from [10.42.69.9] (helo=localhost) by localhost with ESMTP (eXpurgate MTA 0.9.1) (envelope-from ) id 6a26d7bc-5cb7-0a2a0a5109dd-0a2a4509e8d2-42 for ; Mon, 08 Jun 2026 16:55:17 +0200 Received: from [90.155.92.199] 
(helo=desiato.infradead.org) by tlsNG-bad1c0.mxtls.expurgate.net with ESMTPS (eXpurgate 4.56.1) (envelope-from ) id 6a26d7d4-2497-0a2a45090019-5a9b5cc785ce-3 for ; Mon, 08 Jun 2026 16:55:17 +0200 Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by desiato.infradead.org with esmtpsa (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNU-00000001AgH-2Mvm; Mon, 08 Jun 2026 14:55:00 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNS-00000000NFu-1qUd; Mon, 08 Jun 2026 15:54:58 +0100 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass header.s=desiato.20200630 header.d=infradead.org header.i="@infradead.org" header.h="Sender:Content-Transfer-Encoding:Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding: Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To: From:Reply-To:Cc:Content-ID:Content-Description; bh=ONJsZsuBBdVs8sdkr0bY486TNXSRV1vDyDb41XHmd34=; b=i2W5qdjt3Ffu2Ly9INCzHjLLH6 RkzTt1z+9AsIrp0HUAIl9AGZPPTK09k3yNWdrUA/VeJa981Xw8Ug3rTfQwWuLJjb1gjJmph3lQkQJ +lo+godeqk4nMD1OC4bM595+9edWadmNDfCGT6rYGVTOTIoOZ5qTE02zBBvciBfF9+2MU6+ydassJ lJ20Tgl6c4JIXS0G8KdGzZ9qgXD3UeUy1KZCziPm5oMpvEDb+Gew4saE9xRT0JrdjLuLNbamLGgZ0 jtO5zNHyxnIC8Z6fNFOeqn/n7WYT7Akwywu/K/AHgbVD2CVdIIwzOD+94WtIULLip2HGqbVy4+qee YJj4F0fw==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 22/34] KVM: selftests: Add master clock offset test Date: Mon, 8 Jun 2026 15:48:03 +0100 Message-ID: <20260608145455.89187-23-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html X-purgate-ID: tlsNG-bad1c0/1780930517-89377A53-DE61EEF4/0/0 X-purgate-type: clean X-purgate-size: 7995 X-ZohoMail-DKIM: pass (identity @infradead.org) X-ZM-MESSAGEID: 1780930560330154100 From: David Woodhouse Verify that KVM master clock mode remains active when vCPUs have different TSC offsets but the same frequency. 
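Why a differing offset does not disturb the clock can be shown with a little arithmetic. The sketch below reuses the pvclock formula from the test in this patch on a simplified, hypothetical struct model_pvti; the numeric values (reference TSC, offsets, a multiplier of 0x80000000 for 0.5 ns per cycle) are made up for illustration. It assumes that each vCPU's tsc_timestamp is the same host reference point plus that vCPU's own offset, and it relies on the GCC/Clang unsigned __int128 extension.

```c
#include <stdint.h>

/* Simplified, hypothetical stand-in for struct pvclock_vcpu_time_info. */
struct model_pvti {
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
};

/* Same computation as pvclock_calc() in the selftest. */
static uint64_t model_pvclock_calc(const struct model_pvti *p, uint64_t guest_tsc)
{
	uint64_t delta = guest_tsc - p->tsc_timestamp;

	if (p->tsc_shift >= 0)
		delta <<= p->tsc_shift;
	else
		delta >>= -(int)p->tsc_shift;

	return p->system_time +
	       (uint64_t)(((unsigned __int128)delta * p->tsc_to_system_mul) >> 32);
}

/*
 * Two vCPUs share system_time, mul and shift; only the offset differs.
 * The offset appears in both the guest TSC and tsc_timestamp, so it
 * cancels in the subtraction. Returns 1 if both clocks agree.
 */
int model_offsets_cancel(uint64_t host_tsc)
{
	const uint64_t host_ref = 123456789;          /* made-up snapshot point */
	const uint64_t off0 = 5000;                   /* made-up offsets */
	const uint64_t off2 = off0 + 1000000000ULL;
	struct model_pvti p0 = {
		.tsc_timestamp = host_ref + off0, .system_time = 42000,
		.tsc_to_system_mul = 0x80000000u, .tsc_shift = 0,
	};
	struct model_pvti p2 = {
		.tsc_timestamp = host_ref + off2, .system_time = 42000,
		.tsc_to_system_mul = 0x80000000u, .tsc_shift = 0,
	};

	return model_pvclock_calc(&p0, host_tsc + off0) ==
	       model_pvclock_calc(&p2, host_tsc + off2);
}
```

If the frequencies differed instead, mul and shift would differ between the two structures and the results would drift apart over time, which is why only a frequency mismatch needs to disable master clock mode.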
Creates three vCPUs, sets one to a different TSC value, and confirms: - KVM_CLOCK_HOST_TSC and KVM_CLOCK_TSC_STABLE are set (master clock active) - PVCLOCK_TSC_STABLE_BIT is NOT set (offsets differ) Signed-off-by: David Woodhouse Assisted-by: Kiro (claude-opus-4.6-1m) --- tools/testing/selftests/kvm/Makefile.kvm | 1 + .../kvm/x86/masterclock_offset_test.c | 180 ++++++++++++++++++ 2 files changed, 181 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86/masterclock_offset_test= .c diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selft= ests/kvm/Makefile.kvm index 90568ab631d7..7ecaaf82056e 100644 --- a/tools/testing/selftests/kvm/Makefile.kvm +++ b/tools/testing/selftests/kvm/Makefile.kvm @@ -106,6 +106,7 @@ TEST_GEN_PROGS_x86 +=3D x86/pmu_event_filter_test TEST_GEN_PROGS_x86 +=3D x86/private_mem_conversions_test TEST_GEN_PROGS_x86 +=3D x86/private_mem_kvm_exits_test TEST_GEN_PROGS_x86 +=3D x86/pvclock_test +TEST_GEN_PROGS_x86 +=3D x86/masterclock_offset_test TEST_GEN_PROGS_x86 +=3D x86/pvclock_migration_test TEST_GEN_PROGS_x86 +=3D x86/set_boot_cpu_id TEST_GEN_PROGS_x86 +=3D x86/set_sregs_test diff --git a/tools/testing/selftests/kvm/x86/masterclock_offset_test.c b/to= ols/testing/selftests/kvm/x86/masterclock_offset_test.c new file mode 100644 index 000000000000..88e2bd2edab5 --- /dev/null +++ b/tools/testing/selftests/kvm/x86/masterclock_offset_test.c @@ -0,0 +1,180 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Test that KVM master clock mode works with different TSC offsets + * as long as all vCPUs have the same TSC frequency. 
+ */
+#include
+#include
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#include
+
+#define KVMCLOCK_GPA 0xc0000000ull
+#define TSC_OFFSET (1000000000ULL)
+
+static uint64_t pvclock_calc(struct pvclock_vcpu_time_info *pvti, uint64_t guest_tsc)
+{
+	uint64_t delta = guest_tsc - pvti->tsc_timestamp;
+
+	if (pvti->tsc_shift >= 0)
+		delta <<= pvti->tsc_shift;
+	else
+		delta >>= -(int)pvti->tsc_shift;
+
+	return pvti->system_time + ((__uint128_t)delta * pvti->tsc_to_system_mul >> 32);
+}
+
+static void guest_code(void)
+{
+	wrmsr(MSR_KVM_SYSTEM_TIME_NEW, KVMCLOCK_GPA | KVM_MSR_ENABLED);
+	for (;;)
+		GUEST_SYNC(0);
+}
+
+int main(void)
+{
+	struct kvm_vcpu *vcpus[3];
+	struct kvm_clock_data clock;
+	struct pvclock_vcpu_time_info pvti[3];
+	struct kvm_vm *vm;
+	uint64_t offset0, host_tsc, clk0, clk2;
+	int i;
+
+	TEST_REQUIRE(sys_clocksource_is_based_on_tsc());
+
+	vm = vm_create_with_vcpus(3, guest_code, vcpus);
+
+	TEST_REQUIRE(!__vcpu_has_device_attr(vcpus[0], KVM_VCPU_TSC_CTRL,
+					     KVM_VCPU_TSC_OFFSET));
+
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+				    KVMCLOCK_GPA, 1,
+				    vm_calc_num_guest_pages(VM_MODE_DEFAULT,
+							    getpagesize()), 0);
+	virt_map(vm, KVMCLOCK_GPA, KVMCLOCK_GPA,
+		 vm_calc_num_guest_pages(VM_MODE_DEFAULT, getpagesize()));
+
+	/* Get vCPU 0's default offset and set vCPU 2's offset higher */
+	vcpu_device_attr_get(vcpus[0], KVM_VCPU_TSC_CTRL,
+			     KVM_VCPU_TSC_OFFSET, &offset0);
+	uint64_t offset2 = offset0 + TSC_OFFSET;
+	vcpu_device_attr_set(vcpus[2], KVM_VCPU_TSC_CTRL,
+			     KVM_VCPU_TSC_OFFSET, &offset2);
+
+	/* Run each vCPU to enable kvmclock (with offset already set) */
+	for (i = 0; i < 3; i++) {
+		vcpu_run(vcpus[i]);
+		TEST_ASSERT_KVM_EXIT_REASON(vcpus[i], KVM_EXIT_IO);
+	}
+
+	/* Check master clock is active */
+	memset(&clock, 0, sizeof(clock));
+	vm_ioctl(vm, KVM_GET_CLOCK, &clock);
+	pr_info("KVM_GET_CLOCK flags: 0x%x\n", clock.flags);
+	TEST_ASSERT(clock.flags & KVM_CLOCK_HOST_TSC,
+		    "Master clock should be active, flags=0x%x", clock.flags);
+	TEST_ASSERT(clock.flags & KVM_CLOCK_TSC_STABLE,
+		    "KVM_CLOCK_TSC_STABLE should be set, flags=0x%x", clock.flags);
+
+	/* Get per-vCPU pvclock in order 0, 2, 1 */
+	int order[] = {0, 2, 1};
+	for (i = 0; i < 3; i++) {
+		int idx = order[i];
+		__vcpu_ioctl(vcpus[idx], KVM_GET_CLOCK_GUEST, &pvti[idx]);
+		pr_info("vCPU %d: tsc_timestamp=%lu system_time=%lu "
+			"mul=%u shift=%d flags=0x%x\n",
+			idx, (unsigned long)pvti[idx].tsc_timestamp,
+			(unsigned long)pvti[idx].system_time,
+			pvti[idx].tsc_to_system_mul, pvti[idx].tsc_shift,
+			pvti[idx].flags);
+	}
+
+	/* Read guest TSCs: should see (0+OFF) < 2 < (1+OFF) */
+	uint64_t gtsc0 = vcpu_get_msr(vcpus[0], MSR_IA32_TSC);
+	uint64_t gtsc2 = vcpu_get_msr(vcpus[2], MSR_IA32_TSC);
+	uint64_t gtsc1 = vcpu_get_msr(vcpus[1], MSR_IA32_TSC);
+	pr_info("Guest TSCs: vcpu0=%lu vcpu2=%lu vcpu1=%lu\n",
+		(unsigned long)gtsc0, (unsigned long)gtsc2, (unsigned long)gtsc1);
+	pr_info("vcpu0+OFF=%lu vcpu1+OFF=%lu\n",
+		(unsigned long)(gtsc0 + TSC_OFFSET),
+		(unsigned long)(gtsc1 + TSC_OFFSET));
+	TEST_ASSERT(gtsc0 + TSC_OFFSET < gtsc2 && gtsc2 < gtsc1 + TSC_OFFSET,
+		    "Expected (vcpu0+OFF) < vcpu2 < (vcpu1+OFF)");
+
+	/* PVCLOCK_TSC_STABLE_BIT should NOT be set (offsets differ) */
+	TEST_ASSERT(!(pvti[2].flags & PVCLOCK_TSC_STABLE_BIT),
+		    "PVCLOCK_TSC_STABLE_BIT should NOT be set, flags=0x%x",
+		    pvti[2].flags);
+
+	/* Same mul/shift */
+	TEST_ASSERT(pvti[0].tsc_to_system_mul == pvti[2].tsc_to_system_mul &&
+		    pvti[0].tsc_shift == pvti[2].tsc_shift,
+		    "All vCPUs should have same mul/shift");
+
+	/*
+	 * Read host TSC once. At this instant:
+	 *   vCPU 0 guest TSC = host_tsc + offset0
+	 *   vCPU 2 guest TSC = host_tsc + offset0 + TSC_OFFSET
+	 * Feed each through its pvclock. Expect the same kvmclock.
+	 */
+	host_tsc = rdtsc();
+	clk0 = pvclock_calc(&pvti[0], host_tsc + offset0);
+	clk2 = pvclock_calc(&pvti[2], host_tsc + offset0 + TSC_OFFSET);
+
+	pr_info("kvmclock via vCPU 0: %lu ns\n", (unsigned long)clk0);
+	pr_info("kvmclock via vCPU 2: %lu ns\n", (unsigned long)clk2);
+	TEST_ASSERT(clk0 == clk2,
+		    "kvmclock from offset vCPUs should match exactly, "
+		    "diff=%ld ns", (long)(clk2 - clk0));
+
+	pr_info("PASSED: pvclock consistent across offset vCPUs\n");
+
+	/*
+	 * Now add an hour to the VM kvmclock via KVM_SET_CLOCK, run each
+	 * vCPU to pick up the update, and check they're still in sync.
+	 */
+	{
+#define ONE_HOUR_NS (3600ULL * NSEC_PER_SEC)
+		struct kvm_clock_data setclk = { .clock = clock.clock + ONE_HOUR_NS };
+
+		vm_ioctl(vm, KVM_SET_CLOCK, &setclk);
+	}
+
+	/* Guest code does GUEST_SYNC then exits - run each to see update */
+	for (i = 0; i < 3; i++) {
+		vcpu_run(vcpus[order[i]]);
+		TEST_ASSERT_KVM_EXIT_REASON(vcpus[order[i]], KVM_EXIT_IO);
+	}
+
+	/* Re-read pvclocks */
+	for (i = 0; i < 3; i++)
+		__vcpu_ioctl(vcpus[order[i]], KVM_GET_CLOCK_GUEST, &pvti[order[i]]);
+
+	pr_info("After +1h: vCPU 0 system_time=%lu, vCPU 2 system_time=%lu\n",
+		(unsigned long)pvti[0].system_time,
+		(unsigned long)pvti[2].system_time);
+	TEST_ASSERT(pvti[0].system_time == pvti[2].system_time,
+		    "system_time should still match after KVM_SET_CLOCK");
+
+	host_tsc = rdtsc();
+	clk0 = pvclock_calc(&pvti[0], host_tsc + offset0);
+	clk2 = pvclock_calc(&pvti[2], host_tsc + offset0 + TSC_OFFSET);
+
+	pr_info("After +1h: kvmclock via vCPU 0: %lu ns\n", (unsigned long)clk0);
+	pr_info("After +1h: kvmclock via vCPU 2: %lu ns\n", (unsigned long)clk2);
+	TEST_ASSERT(clk0 == clk2,
+		    "After +1h: kvmclock should still match, diff=%ld ns",
+		    (long)(clk2 - clk0));
+
+	/* Verify the clock actually moved by ~1 hour */
+	TEST_ASSERT(clk0 > ONE_HOUR_NS,
+		    "Clock should be > 1 hour after set, got %lu ns",
+		    (unsigned long)clk0);
+
+	pr_info("PASSED: pvclock still consistent after KVM_SET_CLOCK +1h\n");
+	kvm_vm_free(vm);
+	return 0;
+}
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, "H. Peter Anvin", Vitaly Kuznetsov, Juergen Gross,
 Boris Ostrovsky, David Woodhouse, Paul Durrant, Jonathan Cameron,
 Sascha Bischoff, Marc Zyngier, Joey Gouly, Jack Allister, Dongli Zhang,
 joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org,
 linux-kselftest@vger.kernel.org
Subject: [PATCH v5 23/34] KVM: x86: Factor out kvm_use_master_clock()
Date: Mon, 8 Jun 2026 15:48:04 +0100
Message-ID: <20260608145455.89187-24-dwmw2@infradead.org>
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>

Both kvm_track_tsc_matching() and pvclock_update_vm_gtod_copy() make a
decision about whether the KVM clock should be in master clock mode.
They used *different* criteria for the decision though.

This isn't really a problem; it only has the potential to cause
unnecessary invocations of KVM_REQ_MASTERCLOCK_UPDATE if the masterclock
was disabled due to TSC going backwards, or the guest using the old MSR.
But it isn't pretty.

Factor the decision out to a single function. And document the
historical reason why it's disabled for guests that use the old
MSR_KVM_SYSTEM_TIME.
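
To illustrate the shape of this refactoring, here is a minimal user-space
sketch. It is not the kernel code: `struct clock_state` and the two caller
functions are hypothetical stand-ins, and only the three flag names mirror
the fields the patch tests. The point is that both callers share one
predicate and add only their own clocksource condition on top.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct kvm_arch. */
struct clock_state {
	bool all_vcpus_matched_freq;
	bool backwards_tsc_observed;
	bool boot_vcpu_runs_old_kvmclock;
};

/* Single shared predicate: the VM-wide part of the master clock decision. */
static bool use_master_clock(const struct clock_state *cs)
{
	return cs->all_vcpus_matched_freq &&
	       !cs->backwards_tsc_observed &&
	       !cs->boot_vcpu_runs_old_kvmclock;
}

/* Caller A (models the TSC tracking path): adds its own clocksource check. */
static bool tracking_wants_master_clock(const struct clock_state *cs,
					bool gtod_is_tsc_based)
{
	return use_master_clock(cs) && gtod_is_tsc_based;
}

/* Caller B (models the gtod copy update): adds the host clocksource check. */
static bool update_wants_master_clock(const struct clock_state *cs,
				      bool host_tsc_clocksource)
{
	return host_tsc_clocksource && use_master_clock(cs);
}
```

With one predicate, a state such as backwards_tsc_observed being set makes
both callers agree that master clock mode is off, which is the consistency
the commit message asks for.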

Signed-off-by: David Woodhouse
Reviewed-by: Paul Durrant
---
 arch/x86/kvm/x86.c | 40 ++++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 86c30be4c5d2..72fb4620a5ba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2638,11 +2638,30 @@ static inline bool gtod_is_based_on_tsc(int mode)
 {
 	return mode == VDSO_CLOCKMODE_TSC || mode == VDSO_CLOCKMODE_HVCLOCK;
 }
-#endif
+
+static bool kvm_use_master_clock(struct kvm *kvm)
+{
+	struct kvm_arch *ka = &kvm->arch;
+
+	/*
+	 * The 'old kvmclock' check is a workaround (from 2015) for a
+	 * SUSE 2.6.16 kernel that didn't boot if the system_time in
+	 * its kvmclock was too far behind the current time. So the
+	 * mode of just setting the reference point and allowing time
+	 * to proceed linearly from there makes it fail to boot.
+	 * Despite that being kind of the *point* of the way the clock
+	 * is exposed to the guest. By coincidence, the offending
+	 * kernels used the old MSR_KVM_SYSTEM_TIME, which was moved
+	 * only because it resided in the wrong number range. So the
+	 * workaround is activated for *all* guests using the old MSR.
+	 */
+	return ka->all_vcpus_matched_freq &&
+	       !ka->backwards_tsc_observed &&
+	       !ka->boot_vcpu_runs_old_kvmclock;
+}
 
 static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu, bool new_generation)
 {
-#ifdef CONFIG_X86_64
 	struct kvm_arch *ka = &vcpu->kvm->arch;
 	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
 
@@ -2677,7 +2696,7 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu, bool new_generation)
 	 * are fine - each vCPU's pvclock has its own tsc_timestamp that
 	 * accounts for its offset.
 	 */
-	bool use_master_clock = ka->all_vcpus_matched_freq &&
+	bool use_master_clock = kvm_use_master_clock(vcpu->kvm) &&
 		gtod_is_based_on_tsc(gtod->clock.vclock_mode);
 
 	/*
@@ -2693,8 +2712,11 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu, bool new_generation)
 	trace_kvm_track_tsc(vcpu->vcpu_id, ka->nr_vcpus_matched_tsc,
 			    atomic_read(&vcpu->kvm->online_vcpus),
 			    ka->use_master_clock, gtod->clock.vclock_mode);
-#endif
 }
+#else
+static inline void kvm_track_tsc_matching(struct kvm_vcpu *vcpu,
+					  bool new_generation) {}
+#endif
 
 /*
  * Multiply tsc by a fixed point number represented by ratio.
@@ -3216,10 +3238,9 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
 #ifdef CONFIG_X86_64
 	struct kvm_arch *ka = &kvm->arch;
 	int vclock_mode;
-	bool host_tsc_clocksource, vcpus_matched;
+	bool host_tsc_clocksource;
 
 	lockdep_assert_held(&kvm->arch.tsc_write_lock);
-	vcpus_matched = ka->all_vcpus_matched_freq;
 
 	/*
 	 * If the host uses TSC clock, then passthrough TSC as stable
@@ -3229,9 +3250,8 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
 					&ka->master_kernel_ns,
 					&ka->master_cycle_now);
 
-	ka->use_master_clock = host_tsc_clocksource && vcpus_matched
-				&& !ka->backwards_tsc_observed
-				&& !ka->boot_vcpu_runs_old_kvmclock;
+	ka->use_master_clock = host_tsc_clocksource &&
+			       kvm_use_master_clock(kvm);
 
 	if (ka->use_master_clock) {
 		u64 tsc_hz;
@@ -3259,7 +3279,7 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
 
 	vclock_mode = pvclock_gtod_data.clock.vclock_mode;
 	trace_kvm_update_master_clock(ka->use_master_clock, vclock_mode,
-				      vcpus_matched);
+				      ka->all_vcpus_matched_freq);
 #endif
 }
 
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse
Subject: [PATCH v5 24/34] KVM: x86: Avoid gratuitous global clock updates
Date: Mon, 8 Jun 2026 15:48:05 +0100
Message-ID: <20260608145455.89187-25-dwmw2@infradead.org>
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>

Eliminate two sources of unnecessary KVM_REQ_GLOBAL_CLOCK_UPDATE:

1. kvm_write_system_time(): The global clock update was a workaround
   for ever-drifting clocks based on the host's CLOCK_MONOTONIC
   subject to NTP skew.
   Now that the KVM clock uses CLOCK_MONOTONIC_RAW, the clock does not
   drift with NTP corrections and there is no need to synchronize all
   vCPUs on boot or resume. Use KVM_REQ_CLOCK_UPDATE on the vCPU itself,
   and only when the clock is being enabled, not disabled.

2. kvm_arch_vcpu_load(): In master clock mode, migration between pCPUs
   does not require any clock update since the master clock reference
   is shared. Only request a local KVM_REQ_CLOCK_UPDATE for the vCPU's
   first-ever load (vcpu->cpu == -1) to generate initial pvclock params.
   In non-master-clock mode, keep the global update to synchronize all
   vCPUs.

Signed-off-by: David Woodhouse
---
 arch/x86/kvm/x86.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 72fb4620a5ba..4fc21d701588 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2457,13 +2457,13 @@ static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time,
 	}
 
 	vcpu->arch.time = system_time;
-	kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
 
 	/* we verify if the enable bit is set... */
-	if (system_time & 1)
+	if (system_time & 1) {
 		kvm_gpc_activate(&vcpu->arch.pv_time, system_time & ~1ULL,
 				 sizeof(struct pvclock_vcpu_time_info));
-	else
+		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+	} else
 		kvm_gpc_deactivate(&vcpu->arch.pv_time);
 
 	return;
@@ -5377,8 +5377,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		 * On a host with synchronized TSC, there is no need to update
 		 * kvmclock on vcpu->cpu migration
 		 */
-		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
+		if (!vcpu->kvm->arch.use_master_clock)
 			kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
+		else if (vcpu->cpu == -1)
+			kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
 		if (vcpu->cpu != cpu)
 			kvm_make_request(KVM_REQ_MIGRATE_TIMER, vcpu);
 		vcpu->cpu = cpu;
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse
Subject: [PATCH v5 25/34] KVM: x86/xen: Prevent runstate times from becoming negative
Date: Mon, 8 Jun 2026 15:48:06 +0100
Message-ID: <20260608145455.89187-26-dwmw2@infradead.org>
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>

When kvm_xen_update_runstate() is invoked to set a vCPU's runstate, the
time spent in the previous runstate is accounted. This is based on the
delta between the current KVM clock time, and the previous value stored
in vcpu->arch.xen.runstate_entry_time.

If the KVM clock goes backwards, that delta will be negative. Or, since
it's an unsigned 64-bit integer, very *large*. Linux guests deal with
that particularly badly, reporting 100% steal time for ever more (well,
for *centuries* at least, until the delta has been consumed).

So when a negative delta is detected, just refrain from updating the
runstate times until the KVM clock catches up with runstate_entry_time
again.
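
The wraparound described above can be reproduced in a few lines of
user-space C. This is a simplified sketch under stated assumptions, not the
kernel function: `struct runstate` and `account_time()` are hypothetical
names, only the signed-delta guard is modelled, and the cast relies on the
usual two's-complement conversion from unsigned to signed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down model of the per-vCPU runstate bookkeeping. */
struct runstate {
	uint64_t entry_time;	/* clock value when the current state began */
	uint64_t accounted;	/* total time accounted to the current state */
};

/*
 * Account the time elapsed since entry_time. Returns false, leaving the
 * bookkeeping untouched, if the clock has gone backwards.
 */
static bool account_time(struct runstate *rs, uint64_t now)
{
	/* Viewing the difference as signed exposes a backwards step. */
	int64_t delta = (int64_t)(now - rs->entry_time);

	if (delta < 0)
		return false;	/* wait for the clock to catch up */

	rs->accounted += (uint64_t)delta;
	rs->entry_time = now;
	return true;
}
```

With entry_time = 1000 and now = 900, the unsigned difference is 2^64 - 100,
which is what an unguarded update would add to the accounted time. The
signed view is -100, so the update is skipped and entry_time stays at 1000;
a later call with now = 1500 then accounts the full 500 ns.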

Also clamp steal_ns to delta_ns to prevent steal time from exceeding the
total elapsed time, and handle negative steal_ns (which can happen if
run_delay goes backwards across a scheduler update).

The userspace APIs for setting the runstate times do not allow them to
be set past the current KVM clock, but userspace can still adjust the
KVM clock *after* setting the runstate times, which would cause this
situation to occur.

Signed-off-by: David Woodhouse
Reviewed-by: Paul Durrant
---
 arch/x86/kvm/xen.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 82e34edbfdbd..b1d67ece5db3 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -586,29 +586,45 @@ void kvm_xen_update_runstate(struct kvm_vcpu *v, int state)
 {
 	struct kvm_vcpu_xen *vx = &v->arch.xen;
 	u64 now = get_kvmclock_ns(v->kvm);
-	u64 delta_ns = now - vx->runstate_entry_time;
 	u64 run_delay = current->sched_info.run_delay;
+	s64 delta_ns = now - vx->runstate_entry_time;
+	s64 steal_ns = run_delay - vx->last_steal;
 
+	/*
+	 * If the vCPU was never run before, its prior state should
+	 * be considered RUNSTATE_offline.
+	 */
 	if (unlikely(!vx->runstate_entry_time))
 		vx->current_runstate = RUNSTATE_offline;
 
+	/*
+	 * If KVM clock went backwards, just update the current runstate
+	 * but don't account any time. Leave entry_time unchanged so the
+	 * next positive delta covers the full period once the clock
+	 * catches up. Update last_steal every time so stolen time only
+	 * reflects the interval since the most recent call.
+	 */
+	if (delta_ns < 0)
+		goto update_guest;
+
 	/*
 	 * Time waiting for the scheduler isn't "stolen" if the
 	 * vCPU wasn't running anyway.
 	 */
-	if (vx->current_runstate == RUNSTATE_running) {
-		u64 steal_ns = run_delay - vx->last_steal;
+	if (vx->current_runstate == RUNSTATE_running && steal_ns > 0) {
+		if (steal_ns > delta_ns)
+			steal_ns = delta_ns;
 
 		delta_ns -= steal_ns;
-
 		vx->runstate_times[RUNSTATE_runnable] += steal_ns;
 	}
-	vx->last_steal = run_delay;
 
 	vx->runstate_times[vx->current_runstate] += delta_ns;
-	vx->current_runstate = state;
 	vx->runstate_entry_time = now;
 
+ update_guest:
+	vx->current_runstate = state;
+	vx->last_steal = run_delay;
 	if (vx->runstate_cache.active)
 		kvm_xen_update_runstate_guest(v, state == RUNSTATE_runnable);
 }
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse
Subject: [PATCH v5 26/34] KVM: x86: Avoid redundant masterclock updates from multiple vCPUs
Date: Mon, 8 Jun 2026 15:48:07 +0100
Message-ID: <20260608145455.89187-27-dwmw2@infradead.org>
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>

When a masterclock update is triggered (e.g. by the clocksource change
notifier), KVM_REQ_MASTERCLOCK_UPDATE is set on all vCPUs. Without this
fix, each vCPU independently processes the request and redundantly
re-executes the entire pvclock_update_vm_gtod_copy() sequence,
serialized only by tsc_write_lock. Each redundant re-snapshot of the
master clock reference point introduces potential clock drift.

Fix this by having __kvm_start_pvclock_update() check, after acquiring
the lock, whether the requesting vCPU's KVM_REQ_MASTERCLOCK_UPDATE is
still set. If another vCPU already did the update and cleared it, bail
out. Otherwise, clear the request on all other vCPUs before proceeding.

The caller in vcpu_enter_guest() now uses kvm_test_request()
(non-clearing) since the clearing is done inside
__kvm_start_pvclock_update() under the lock.
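
The deduplication logic can be sketched in user space as follows. This is
a single-threaded model under stated assumptions, not the kernel
implementation: `struct vm_model`, `start_update()` and `process_request()`
are hypothetical names, the spinlock is omitted, and a counter stands in
for the master clock re-snapshot.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_VCPUS 4	/* arbitrary size for the sketch */

/* Hypothetical model: one pending-update flag per vCPU, plus a counter. */
struct vm_model {
	bool update_pending[NR_VCPUS];
	int updates_done;
};

/*
 * Models the check made after taking the lock (locking itself is omitted
 * here). requester < 0 models a caller that is not acting on behalf of a
 * vCPU and therefore always proceeds.
 */
static bool start_update(struct vm_model *vm, int requester)
{
	size_t i;

	/* Another vCPU already did the VM-wide update: nothing to do. */
	if (requester >= 0 && !vm->update_pending[requester])
		return false;

	/* The update is VM-wide, so clear every vCPU's pending flag. */
	for (i = 0; i < NR_VCPUS; i++)
		vm->update_pending[i] = false;

	return true;
}

/* Each vCPU calls this when it notices its pending flag. */
static void process_request(struct vm_model *vm, int vcpu)
{
	if (start_update(vm, vcpu))
		vm->updates_done++;	/* stands in for the re-snapshot */
}
```

If all four flags start out set and each vCPU calls process_request() in
turn, only the first call performs the update; the other three find their
flag already cleared and return without re-snapshotting.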
Suggested-by: Dongli Zhang Signed-off-by: David Woodhouse --- arch/x86/kvm/x86.c | 60 +++++++++++++++++++++++++++++++++++----------- 1 file changed, 46 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4fc21d701588..54d4b1b3cfe4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3288,10 +3288,39 @@ static void kvm_make_mclock_inprogress_request(stru= ct kvm *kvm) kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS); } =20 -static void __kvm_start_pvclock_update(struct kvm *kvm) +static void kvm_clear_mclock_inprogress_request(struct kvm *kvm) { + struct kvm_vcpu *vcpu; + unsigned long i; + + kvm_for_each_vcpu(i, vcpu, kvm) + kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu); +} + +static bool __kvm_start_pvclock_update(struct kvm *kvm, struct kvm_vcpu *r= equesting_vcpu) +{ + struct kvm_vcpu *vcpu; + unsigned long i; + raw_spin_lock_irq(&kvm->arch.tsc_write_lock); + + /* + * If another vCPU already did the update while we were waiting + * for the lock, our request will have been cleared. Bail out. + */ + if (requesting_vcpu && + !kvm_test_request(KVM_REQ_MASTERCLOCK_UPDATE, requesting_vcpu)) { + kvm_clear_mclock_inprogress_request(kvm); + raw_spin_unlock_irq(&kvm->arch.tsc_write_lock); + return false; + } + + /* The update is VM-wide; prevent other vCPUs from redoing it. 
*/ + kvm_for_each_vcpu(i, vcpu, kvm) + kvm_clear_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu); + write_seqcount_begin(&kvm->arch.pvclock_sc); + return true; } =20 static void kvm_start_pvclock_update(struct kvm *kvm) @@ -3299,7 +3328,7 @@ static void kvm_start_pvclock_update(struct kvm *kvm) kvm_make_mclock_inprogress_request(kvm); =20 /* no guest entries from this point */ - __kvm_start_pvclock_update(kvm); + __kvm_start_pvclock_update(kvm, NULL); } =20 static void kvm_end_pvclock_update(struct kvm *kvm) @@ -3308,22 +3337,25 @@ static void kvm_end_pvclock_update(struct kvm *kvm) struct kvm_vcpu *vcpu; unsigned long i; =20 - write_seqcount_end(&ka->pvclock_sc); - raw_spin_unlock_irq(&ka->tsc_write_lock); kvm_for_each_vcpu(i, vcpu, kvm) kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); =20 /* guest entries allowed */ - kvm_for_each_vcpu(i, vcpu, kvm) - kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu); + kvm_clear_mclock_inprogress_request(kvm); + + write_seqcount_end(&ka->pvclock_sc); + raw_spin_unlock_irq(&ka->tsc_write_lock); } =20 -static void kvm_update_masterclock(struct kvm *kvm) +static void kvm_update_masterclock(struct kvm *kvm, struct kvm_vcpu *vcpu) { kvm_hv_request_tsc_page_update(kvm); - kvm_start_pvclock_update(kvm); - pvclock_update_vm_gtod_copy(kvm); - kvm_end_pvclock_update(kvm); + kvm_make_mclock_inprogress_request(kvm); + + if (__kvm_start_pvclock_update(kvm, vcpu)) { + pvclock_update_vm_gtod_copy(kvm); + kvm_end_pvclock_update(kvm); + } } =20 /* @@ -10157,7 +10189,7 @@ static void kvm_hyperv_tsc_notifier(void) kvm_caps.max_guest_tsc_khz =3D tsc_khz; =20 list_for_each_entry(kvm, &vm_list, vm_list) { - __kvm_start_pvclock_update(kvm); + __kvm_start_pvclock_update(kvm, NULL); pvclock_update_vm_gtod_copy(kvm); kvm_end_pvclock_update(kvm); } @@ -11535,8 +11567,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) kvm_mmu_free_obsolete_roots(vcpu); if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu)) __kvm_migrate_timers(vcpu); - if 
(kvm_check_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu)) - kvm_update_masterclock(vcpu->kvm); + if (kvm_test_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu)) + kvm_update_masterclock(vcpu->kvm, vcpu); if (kvm_check_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu)) kvm_gen_kvmclock_update(vcpu); if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) { @@ -13273,7 +13305,7 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu) vcpu_load(vcpu); kvm_synchronize_tsc(vcpu, NULL); if (!vcpu->kvm->arch.use_master_clock) - kvm_update_masterclock(vcpu->kvm); + kvm_update_masterclock(vcpu->kvm, NULL); vcpu_put(vcpu); =20 /* poll control enabled by default */ --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930546; cv=none; d=zohomail.com; s=zohoarc; b=UnvkDu2/BZO5AGKzW7Kt8cXuGdv3Yd5mIRIyukw9/4NyjMlq6cgGxW2heqsjPI8Y4h6YMTDN8W+LV1hZTfGy1zHml1NgxuI83w+SWvukn1nbVAbEvDVBiy3LtJuR5X/CvqD+kNi0/pSMo68pVZMNWqMl/Ut770CjNi+o/U4lipk= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930546; h=Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=mAoCrW41i2oubtWoRcSbcZKOTJ7hLP/U/r8vkThOrIU=; b=bR2uE1NWqXMCIyqp0i5mBoNbM88/2gSn+eAduRzHxb+MQ9vgGVFSEXs1gMeocAVGLGt9Iybv0q+5UW8vHV/jwCIItKgWNv0ZKhntfPVY/eFq38ltMbaE0/kTjH7ZcNA0GoDHoz3OjEx8TZsrfp8Jf6VQU2gJ1pwSSHLw00nnsi4= 
ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1780930543987521.0699342747051; Mon, 8 Jun 2026 07:55:43 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.1331715.1594385 (Exim 4.92) (envelope-from ) id 1wWbNu-0004Sd-VL; Mon, 08 Jun 2026 14:55:26 +0000 Received: by outflank-mailman (output) from mailman id 1331715.1594385; Mon, 08 Jun 2026 14:55:26 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNt-0004PC-Md; Mon, 08 Jun 2026 14:55:25 +0000 Received: by outflank-mailman (input) for mailman id 1331715; Mon, 08 Jun 2026 14:55:17 +0000 Received: from mx.expurgate.net ([195.190.135.10]) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNk-0002Hj-Rv for xen-devel@lists.xenproject.org; Mon, 08 Jun 2026 14:55:17 +0000 Received: from mx.expurgate.net (helo=localhost) by mx.expurgate.net with esmtp id 1wWbNk-00AKVN-8I; Mon, 08 Jun 2026 16:55:16 +0200 Received: from [10.42.69.5] (helo=localhost) by localhost with ESMTP (eXpurgate MTA 0.9.1) (envelope-from ) id 6a26d7c6-2eae-0a2a0a5409dd-0a2a450596fa-20 for ; Mon, 08 Jun 2026 16:55:16 +0200 Received: from [90.155.92.199] (helo=desiato.infradead.org) by tlsNG-c201ff.mxtls.expurgate.net with ESMTPS (eXpurgate 4.56.1) (envelope-from ) id 6a26d7d3-aaa8-0a2a45050019-5a9b5cc7b614-3 for ; Mon, 08 Jun 2026 16:55:16 +0200 Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by desiato.infradead.org with esmtpsa (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNU-00000001AgM-2M92; Mon, 08 Jun 2026 14:55:00 +0000 Received: from dwoodhou by i7.infradead.org with local 
(Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNS-00000000NGE-2kHD; Mon, 08 Jun 2026 15:54:58 +0100 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass header.s=desiato.20200630 header.d=infradead.org header.i="@infradead.org" header.h="Sender:Content-Transfer-Encoding:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=mAoCrW41i2oubtWoRcSbcZKOTJ7hLP/U/r8vkThOrIU=; b=hJmTbo0VQUYJIJ5SiadsI+2Oi6 MKKp3iSxBITfY/0JfC1ulBP5KmALb63BQVFZ5VBIZ8zHsl+fv1IbRJswv4zVZ5Gd2YEyx68z/xCIY y8evaf+XUEww9YTdUGtalJxvTdrTL+sGNRUiaP31Pf1Y9FnGQ9ltnFdiv8mJE23LklblzcqdllpIf VZv83tdmjr0SAKriU5YRvhiHcTjfa8VG0A48nvBVSyi0PPOrE1P6WtBDCvpdneNT6I/ngEMGo+ueG He7tGsQdtSYrAz3HrHHEnKW9d9JmO462eHtQeKqgzNHIngCwn1xabEDqaGgl5+HSPMg6o+1/bWfPY lQ71nmTQ==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org
Subject: [PATCH v5 27/34] KVM: x86: Remove runtime Xen TSC frequency CPUID update
Date: Mon, 8 Jun 2026 15:48:08 +0100
Message-ID: <20260608145455.89187-28-dwmw2@infradead.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Sender: David Woodhouse
X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html
X-purgate-ID: tlsNG-c201ff/1780930516-DA76A443-E1A010EA/0/0
X-purgate-type: clean
X-purgate-size: 3099
X-ZohoMail-DKIM: pass (identity @infradead.org)
X-ZM-MESSAGEID: 1780930549275158500
Content-Type: text/plain; charset="utf-8"

From: David Woodhouse

Remove the code in kvm_cpuid() that dynamically updates the Xen TSC info
CPUID leaf at runtime. This code was updating the wrong sub-leaf anyway
(0x40000x03/2 EAX is the *host* TSC frequency per the Xen ABI, not the
guest frequency which belongs in 0x40000x03/0 ECX).

Userspace now has all the information it needs to populate the Xen TSC
info leaves (and the generic 0x40000010 timing leaf) at vCPU setup time:

 - KVM_GET_CLOCK_GUEST returns the pvclock_vcpu_time_info structure
   containing tsc_to_system_mul and tsc_shift (Xen leaf index 1)
 - KVM_VCPU_TSC_SCALE returns the effective TSC and bus frequencies
   in kHz (Xen leaf index 2, and 0x40000010)
 - KVM_VCPU_TSC_SCALE returns the raw hardware scaling ratio for
   precise arithmetic (VMClock)

This eliminates the last instance of KVM modifying guest CPUID entries
at runtime for timing information.
Signed-off-by: David Woodhouse --- arch/x86/kvm/cpuid.c | 16 ---------------- arch/x86/kvm/xen.h | 13 ------------- 2 files changed, 29 deletions(-) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 621d950ec692..826637a0b72d 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -2117,22 +2117,6 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 = *ebx, } else if (function =3D=3D 0x80000007) { if (kvm_hv_invtsc_suppressed(vcpu)) *edx &=3D ~feature_bit(CONSTANT_TSC); - } else if (IS_ENABLED(CONFIG_KVM_XEN) && - kvm_xen_is_tsc_leaf(vcpu, function)) { - /* - * Update guest TSC frequency information if necessary. - * Ignore failures, there is no sane value that can be - * provided if KVM can't get the TSC frequency. - */ - if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) - kvm_guest_time_update(vcpu); - - if (index =3D=3D 1) { - *ecx =3D vcpu->arch.pvclock_tsc_mul; - *edx =3D vcpu->arch.pvclock_tsc_shift; - } else if (index =3D=3D 2) { - *eax =3D div_u64(vcpu->arch.hw_tsc_hz, 1000); - } } } else { *eax =3D *ebx =3D *ecx =3D *edx =3D 0; diff --git a/arch/x86/kvm/xen.h b/arch/x86/kvm/xen.h index 59e6128a7bd3..f372855857a8 100644 --- a/arch/x86/kvm/xen.h +++ b/arch/x86/kvm/xen.h @@ -50,14 +50,6 @@ static inline void kvm_xen_sw_enable_lapic(struct kvm_vc= pu *vcpu) kvm_xen_inject_vcpu_vector(vcpu); } =20 -static inline bool kvm_xen_is_tsc_leaf(struct kvm_vcpu *vcpu, u32 function) -{ - return static_branch_unlikely(&kvm_xen_enabled.key) && - vcpu->arch.xen.cpuid.base && - function <=3D vcpu->arch.xen.cpuid.limit && - function =3D=3D (vcpu->arch.xen.cpuid.base | XEN_CPUID_LEAF(3)); -} - static inline bool kvm_xen_msr_enabled(struct kvm *kvm) { return static_branch_unlikely(&kvm_xen_enabled.key) && @@ -177,11 +169,6 @@ static inline bool kvm_xen_timer_enabled(struct kvm_vc= pu *vcpu) { return false; } - -static inline bool kvm_xen_is_tsc_leaf(struct kvm_vcpu *vcpu, u32 function) -{ - return false; -} #endif =20 int kvm_xen_hypercall(struct 
kvm_vcpu *vcpu); --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930545; cv=none; d=zohomail.com; s=zohoarc; b=HTkxJ/QMUjWZ4QiRcpLiajEObnokf1hFiNfv/72+EJBMQV/Fp4VjFwDLMUiL59kdWELORP6hSMWe2KlWFPMarTtVjU9gQ7ckxFIdpl/tueql4XBC/lWxM6qd/2eZaT/shi/0/EfeYhzIrx3Cq6dImX8tWhKSmsudNN9/1kVT9NQ= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930545; h=Content-Type:Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=c78r7FHx129wTlE8duPzxp93qTvHafKXfP1ElOAPp+s=; b=MqofpVDU+Mt3CdhySBx4Ci0GEhT1xQMC/SkuNEQAdR7zdnGmA7GMdkUFt+Iunl1OveZRV38PGv8YD/B2rUWgohsjAMBImV82I7+mfUHqYHNZAlXA/3bDJHi4jWPgWPTW8DqmrJEON7jhaj66RQ8HLVELQK1YHIglul0/gKwQXpE= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1780930545009122.17022032037471; Mon, 8 Jun 2026 07:55:45 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.1331710.1594325 (Exim 4.92) (envelope-from ) id 1wWbNn-0002qb-B1; Mon, 08 Jun 2026 14:55:19 
+0000 Received: by outflank-mailman (output) from mailman id 1331710.1594325; Mon, 08 Jun 2026 14:55:19 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNm-0002o8-T6; Mon, 08 Jun 2026 14:55:18 +0000 Received: by outflank-mailman (input) for mailman id 1331710; Mon, 08 Jun 2026 14:55:15 +0000 Received: from mx.expurgate.net ([194.145.224.10]) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNi-0001VG-LX for xen-devel@lists.xenproject.org; Mon, 08 Jun 2026 14:55:14 +0000 Received: from mx.expurgate.net (helo=localhost) by mx.expurgate.net with esmtp id 1wWbNi-00EcHE-1W; Mon, 08 Jun 2026 16:55:14 +0200 Received: from [10.42.69.9] (helo=localhost) by localhost with ESMTP (eXpurgate MTA 0.9.1) (envelope-from ) id 6a26d7bc-5cb7-0a2a0a5109dd-0a2a4509e8d2-30 for ; Mon, 08 Jun 2026 16:55:14 +0200 Received: from [90.155.50.34] (helo=casper.infradead.org) by tlsNG-bad1c0.mxtls.expurgate.net with ESMTPS (eXpurgate 4.56.1) (envelope-from ) id 6a26d7d1-2497-0a2a45090019-5a9b3222d95e-3 for ; Mon, 08 Jun 2026 16:55:13 +0200 Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wWbNS-0000000Dtxe-2Jxv; Mon, 08 Jun 2026 14:54:59 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNS-00000000NGJ-2vCG; Mon, 08 Jun 2026 15:54:58 +0100 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass header.s=casper.20170209 header.d=infradead.org header.i="@infradead.org" 
header.h="Sender:Content-Transfer-Encoding:Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To: From:Reply-To:Cc:Content-ID:Content-Description; bh=c78r7FHx129wTlE8duPzxp93qTvHafKXfP1ElOAPp+s=; b=Hu0bnb2gTJzqaczociH1Gd9yU6 V9F3FDAt6H9lJWrBbnepBHzF5cFtAwyfJufteJNkcYd7UmwJbVGSeJ+PiYe0Vw0mWwyt2PtQNq3Ys G7eVdSyuNjH+ehZf6PZmwQMYuoeaP7zSEecwrM4Ociadp0YNirqwZJ21WOR1Qak1GJNdNr7fs16YS 0UexfmoAAfxWHoBgrDH7yMnjXk1n9VGZnCC8pnXSrYtUfv6zhMG5gN9jiUO5B3LQN/lsVl/G1vDSS WrhQctlyxYmEHa/EArSepdj0dRkq3whg/6n5LQFQPzAAI1+yWZmjnueJ59D3BOJBY2M+2nv8FXsSl p4XpDtjQ==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 28/34] KVM: selftests: Add Xen/generic CPUID timing leaf test Date: Mon, 8 Jun 2026 15:48:09 +0100 Message-ID: <20260608145455.89187-29-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. 
See http://www.infradead.org/rpr.html
X-purgate-ID: tlsNG-bad1c0/1780930513-8A18BA53-D5E908E8/0/0
X-purgate-type: clean
X-purgate-size: 8839
X-ZohoMail-DKIM: pass (identity @infradead.org)
X-ZM-MESSAGEID: 1780930546267154100

From: David Woodhouse

Verify that userspace can correctly populate Xen and generic CPUID
timing leaves using the KVM_GET_TSC_KHZ ioctl and the KVM_VCPU_TSC_SCALE
attribute.

This validates that the removal of KVM's runtime Xen CPUID modification
doesn't break guests: userspace queries the effective TSC and bus
frequencies, computes the pvclock mul/shift, populates the CPUID leaves,
and the guest verifies the values match.

The test exercises:

 - KVM_GET_TSC_KHZ at native and scaled frequencies
 - KVM_VCPU_TSC_SCALE ratio verification against effective frequency
 - Generic timing leaf 0x40000010 (EAX=tsc_khz, EBX=bus_khz)
 - Xen leaf 3 sub-leaf 0 (ECX=guest TSC kHz)
 - Xen leaf 3 sub-leaf 1 (ECX=mul, EDX=shift)

The TSC scaling tests are skipped gracefully on hardware without support.
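
As a rough standalone illustration of the "computes the pvclock mul/shift"
step, the sketch below reproduces the compute_tsc_mul_shift() helper from the
test added by this patch and pairs it with a conversion routine. The
tsc_delta_to_ns() helper is hypothetical (it is not part of the patch) and
only shows how a consumer would typically apply the pair, assuming the usual
pvclock convention of shifting the delta first and then taking the upper
bits of a 32.32 fixed-point multiply. unsigned __int128 is a GCC/Clang
extension.

```c
#include <stdint.h>

/*
 * Derive a 32-bit multiplier and a shift such that
 *   ns = ((tsc_delta << shift) * mul) >> 32
 * for a counter running at tsc_hz (a negative shift means shift right).
 * Same loop structure as compute_tsc_mul_shift() in the selftest.
 */
static void compute_tsc_mul_shift(uint64_t tsc_hz, uint32_t *mul, int8_t *shift)
{
	uint64_t scaled = 1000000000ULL; /* target rate: nanoseconds per second */
	uint64_t base = tsc_hz;
	int32_t s = 0;
	uint32_t base32;

	/* Bring the source frequency below 2x the target and into 32 bits. */
	while (base > scaled * 2 || base >> 32) {
		base >>= 1;
		s--;
	}
	base32 = (uint32_t)base;

	/* Scale up until the source exceeds the target, so mul fits in 32 bits. */
	while (base32 <= scaled || scaled >> 32) {
		if (scaled >> 32 || base32 & (1U << 31))
			scaled >>= 1;
		else
			base32 <<= 1;
		s++;
	}
	*mul = (uint32_t)((scaled << 32) / base32);
	*shift = (int8_t)s;
}

/* Hypothetical consumer: apply mul/shift to a TSC delta to obtain nanoseconds. */
static uint64_t tsc_delta_to_ns(uint64_t delta, uint32_t mul, int8_t shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	return (uint64_t)(((unsigned __int128)delta * mul) >> 32);
}
```

For a 1 GHz counter this yields mul = 2^31 and shift = 1, so one tick maps
to one nanosecond; at 2 GHz the shift drops to 0 and at 4 GHz to -1, with
the multiplier unchanged.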
Signed-off-by: David Woodhouse --- tools/testing/selftests/kvm/Makefile.kvm | 1 + .../selftests/kvm/x86/xen_cpuid_timing_test.c | 230 ++++++++++++++++++ 2 files changed, 231 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86/xen_cpuid_timing_test.c diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selft= ests/kvm/Makefile.kvm index 7ecaaf82056e..58aac2980cdf 100644 --- a/tools/testing/selftests/kvm/Makefile.kvm +++ b/tools/testing/selftests/kvm/Makefile.kvm @@ -141,6 +141,7 @@ TEST_GEN_PROGS_x86 +=3D x86/xss_msr_test TEST_GEN_PROGS_x86 +=3D x86/debug_regs TEST_GEN_PROGS_x86 +=3D x86/tsc_msrs_test TEST_GEN_PROGS_x86 +=3D x86/vmx_pmu_caps_test +TEST_GEN_PROGS_x86 +=3D x86/xen_cpuid_timing_test TEST_GEN_PROGS_x86 +=3D x86/xen_shinfo_test TEST_GEN_PROGS_x86 +=3D x86/xen_vmcall_test TEST_GEN_PROGS_x86 +=3D x86/sev_init2_tests diff --git a/tools/testing/selftests/kvm/x86/xen_cpuid_timing_test.c b/tool= s/testing/selftests/kvm/x86/xen_cpuid_timing_test.c new file mode 100644 index 000000000000..a0c262b8db89 --- /dev/null +++ b/tools/testing/selftests/kvm/x86/xen_cpuid_timing_test.c @@ -0,0 +1,230 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Test that userspace can correctly populate Xen and generic CPUID + * timing leaves using KVM_GET_TSC_KHZ and KVM_VCPU_TSC_SCALE. + * + * This validates that the removal of KVM's runtime Xen CPUID modification + * doesn't break guests, because userspace has all the information needed. 
+ */ +#include +#include +#include + +#include "test_util.h" +#include "kvm_util.h" +#include "processor.h" + +#include + +#define XEN_CPUID_BASE 0x40000100 +#define XEN_CPUID_LEAF(n) (XEN_CPUID_BASE + (n)) +#define GENERIC_TIMING_LEAF 0x40000010 + +/* Values set by host, verified by guest */ +static uint32_t expected_tsc_khz; +static uint32_t expected_bus_khz; +static uint32_t expected_tsc_mul; +static int8_t expected_tsc_shift; +static uint64_t host_khz; + +static void guest_code(void) +{ + uint32_t eax, ebx, ecx, edx; + + /* Check generic timing leaf 0x40000010 */ + __cpuid(GENERIC_TIMING_LEAF, 0, &eax, &ebx, &ecx, &edx); + GUEST_ASSERT_EQ(eax, expected_tsc_khz); + GUEST_ASSERT_EQ(ebx, expected_bus_khz); + + /* Check Xen leaf 3, sub-leaf 0: ECX =3D guest TSC frequency */ + __cpuid(XEN_CPUID_LEAF(3), 0, &eax, &ebx, &ecx, &edx); + GUEST_ASSERT_EQ(ecx, expected_tsc_khz); + + /* Check Xen leaf 3, sub-leaf 1: ECX =3D mul, EDX =3D shift */ + __cpuid(XEN_CPUID_LEAF(3), 1, &eax, &ebx, &ecx, &edx); + GUEST_ASSERT_EQ(ecx, expected_tsc_mul); + GUEST_ASSERT_EQ((int8_t)edx, expected_tsc_shift); + + GUEST_SYNC(0); +} + +static void add_cpuid_entry(struct kvm_vcpu *vcpu, uint32_t function, + uint32_t index, uint32_t eax, uint32_t ebx, + uint32_t ecx, uint32_t edx) +{ + struct kvm_cpuid2 *cpuid =3D vcpu->cpuid; + struct kvm_cpuid_entry2 *entry; + int n =3D cpuid->nent; + + vcpu->cpuid =3D realloc(vcpu->cpuid, + sizeof(*cpuid) + (n + 1) * sizeof(*entry)); + cpuid =3D vcpu->cpuid; + cpuid->nent =3D n + 1; + + entry =3D &cpuid->entries[n]; + memset(entry, 0, sizeof(*entry)); + entry->function =3D function; + entry->index =3D index; + entry->flags =3D KVM_CPUID_FLAG_SIGNIFCANT_INDEX; + entry->eax =3D eax; + entry->ebx =3D ebx; + entry->ecx =3D ecx; + entry->edx =3D edx; +} + +/* + * Compute pvclock mul/shift from frequency, matching kvm_get_time_scale(). 
+ */ +static void compute_tsc_mul_shift(uint64_t tsc_hz, uint32_t *mul, int8_t *= shift) +{ + uint64_t scaled =3D 1000000000ULL; + uint64_t base =3D tsc_hz; + int32_t s =3D 0; + uint32_t base32; + + while (base > scaled * 2 || base >> 32) { + base >>=3D 1; + s--; + } + base32 =3D (uint32_t)base; + while (base32 <=3D scaled || scaled >> 32) { + if (scaled >> 32 || base32 & (1U << 31)) + scaled >>=3D 1; + else + base32 <<=3D 1; + s++; + } + *mul =3D (uint32_t)((scaled << 32) / base32); + *shift =3D (int8_t)s; +} + +static void run_test(uint64_t tsc_khz) +{ + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + struct ucall uc; + uint32_t effective_tsc_khz, effective_bus_khz; + int bus_cycle_ns; + + vm =3D vm_create_with_one_vcpu(&vcpu, guest_code); + + if (tsc_khz) { + pr_info("Testing at TSC frequency %lu kHz\n", tsc_khz); + vcpu_ioctl(vcpu, KVM_SET_TSC_KHZ, (void *)(unsigned long)tsc_khz); + } else { + pr_info("Testing at native TSC frequency\n"); + } + + effective_tsc_khz =3D __vcpu_ioctl(vcpu, KVM_GET_TSC_KHZ, NULL); + bus_cycle_ns =3D vm_check_cap(vm, KVM_CAP_X86_APIC_BUS_CYCLES_NS); + effective_bus_khz =3D bus_cycle_ns > 0 ? 
1000000 / bus_cycle_ns : 1000000; + + /* If scaling wasn't applied, skip this frequency */ + if (tsc_khz && effective_tsc_khz =3D=3D host_khz) { + pr_info(" TSC scaling not available, skipping\n"); + kvm_vm_release(vm); + return; + } + + pr_info(" Effective TSC: %u kHz, Bus: %u kHz\n", effective_tsc_khz, effe= ctive_bus_khz); + + /* Also exercise KVM_VCPU_TSC_SCALE if available */ + { + struct { uint64_t ratio; uint64_t frac_bits; } scale; + struct kvm_device_attr scale_attr =3D { + .group =3D KVM_VCPU_TSC_CTRL, + .attr =3D 1, /* KVM_VCPU_TSC_SCALE */ + .addr =3D (uint64_t)(uintptr_t)&scale, + }; + + if (!__vcpu_ioctl(vcpu, KVM_HAS_DEVICE_ATTR, &scale_attr)) { + vcpu_ioctl(vcpu, KVM_GET_DEVICE_ATTR, &scale_attr); + pr_info(" TSC scale: ratio=3D%lu frac_bits=3D%lu\n", + scale.ratio, scale.frac_bits); + + /* + * Verify: applying the ratio to the host TSC frequency + * should give approximately the effective frequency. + */ + if (tsc_khz) { + uint64_t computed =3D ((__uint128_t)host_khz * scale.ratio) >> scale.f= rac_bits; + int64_t diff =3D (int64_t)computed - (int64_t)effective_tsc_khz; + + TEST_ASSERT(diff >=3D -1 && diff <=3D 1, + "TSC_SCALE ratio mismatch: computed %lu vs effective %u (diff %ld= )", + computed, effective_tsc_khz, diff); + } + } + } + + compute_tsc_mul_shift((uint64_t)effective_tsc_khz * 1000, + &expected_tsc_mul, &expected_tsc_shift); + + expected_tsc_khz =3D effective_tsc_khz; + expected_bus_khz =3D effective_bus_khz; + + sync_global_to_guest(vm, expected_tsc_khz); + sync_global_to_guest(vm, expected_bus_khz); + sync_global_to_guest(vm, expected_tsc_mul); + sync_global_to_guest(vm, expected_tsc_shift); + + /* Populate CPUID leaves as a VMM would */ + add_cpuid_entry(vcpu, GENERIC_TIMING_LEAF, 0, + effective_tsc_khz, effective_bus_khz, 0, 0); + add_cpuid_entry(vcpu, XEN_CPUID_LEAF(3), 0, + 0, 0, effective_tsc_khz, 0); + add_cpuid_entry(vcpu, XEN_CPUID_LEAF(3), 1, + 0, 0, expected_tsc_mul, + (uint32_t)(uint8_t)expected_tsc_shift); + + 
vcpu_set_cpuid(vcpu); + + pr_info(" pvclock mul=3D%u shift=3D%d\n", expected_tsc_mul, expected_tsc= _shift); + + vcpu_run(vcpu); + TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO); + + switch (get_ucall(vcpu, &uc)) { + case UCALL_ABORT: + REPORT_GUEST_ASSERT(uc); + break; + case UCALL_SYNC: + break; + default: + TEST_FAIL("Unexpected ucall"); + } + + kvm_vm_release(vm); +} + +int main(void) +{ + uint64_t freq; + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + struct kvm_device_attr attr =3D { + .group =3D KVM_VCPU_TSC_CTRL, + .attr =3D KVM_VCPU_TSC_SCALE, + }; + + TEST_REQUIRE(sys_clocksource_is_based_on_tsc()); + + /* Check KVM_VCPU_TSC_SCALE is supported (implies TSC scaling) */ + vm =3D vm_create_with_one_vcpu(&vcpu, guest_code); + TEST_REQUIRE(!__vcpu_ioctl(vcpu, KVM_HAS_DEVICE_ATTR, &attr)); + host_khz =3D __vcpu_ioctl(vcpu, KVM_GET_TSC_KHZ, NULL); + kvm_vm_release(vm); + + /* Native frequency */ + run_test(0); + + /* Scaled frequencies =E2=80=94 skip if TSC scaling not available */ + for (freq =3D 1000000; freq <=3D 4000000; freq +=3D 1000000) { + if (freq =3D=3D host_khz) + continue; + run_test(freq); + } + + pr_info("PASS: All CPUID timing leaf tests passed\n"); + return 0; +} --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930569; cv=none; d=zohomail.com; s=zohoarc; 
b=dUXW+eEnQuD8eXhfYf84uX5WAoJOcfMBCpW6NBsVYn/j7CSIqv33bx5kM0F8HiCvHaAbFxh4bEEWYJqsmPdjbti5mG/UCw/kumYv+Wo4Xl77UEjIKmv5U+K3ts9QbWSRe6wYrJJcZnnaau3UXk4Dfpi+0CIOoQpjY3Cb9QKYFHk= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930569; h=Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=lqNX+TXIvKbp1pjqN/ugyzGv3kWdP4RiL9hc1KXGPRI=; b=hzcroRj8k5IWav2vqQsEdexKVzcse3SjU4mOllnCO9ck6kDOGb28cK/WQesbiP5c6iFZefgb7VKBbxBmNy93ksL55F8xRP5k3qPtnsM9f0vXTa01YuUiVRveZPvFEV8ZGzgkgot1MGgqNYTEoUA7P822Bd9tlF/CBeMjB9uuVJs= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1780930569717369.6121137973663; Mon, 8 Jun 2026 07:56:09 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.1331741.1594449 (Exim 4.92) (envelope-from ) id 1wWbOD-0008JV-Hj; Mon, 08 Jun 2026 14:55:45 +0000 Received: by outflank-mailman (output) from mailman id 1331741.1594449; Mon, 08 Jun 2026 14:55:45 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbOC-0008Dj-HP; Mon, 08 Jun 2026 14:55:44 +0000 Received: by outflank-mailman (input) for mailman id 1331741; Mon, 08 Jun 2026 14:55:36 +0000 Received: from mx.expurgate.net ([195.190.135.10]) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbO3-0006FN-PJ for xen-devel@lists.xenproject.org; Mon, 08 Jun 2026 14:55:35 +0000 Received: from mx.expurgate.net (helo=localhost) by mx.expurgate.net with esmtp id 
1wWbO2-00AKad-Vy; Mon, 08 Jun 2026 16:55:35 +0200 Received: from [10.42.69.12] (helo=localhost) by localhost with ESMTP (eXpurgate MTA 0.9.1) (envelope-from ) id 6a26d7e1-2eae-0a2a0a5409dd-0a2a450cc378-24 for ; Mon, 08 Jun 2026 16:55:34 +0200 Received: from [90.155.50.34] (helo=casper.infradead.org) by tlsNG-d25034.mxtls.expurgate.net with ESMTPS (eXpurgate 4.56.1) (envelope-from ) id 6a26d7e6-62f1-0a2a450c0019-5a9b3222d8f8-3 for ; Mon, 08 Jun 2026 16:55:34 +0200 Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wWbNS-0000000Dtxf-2ZjW; Mon, 08 Jun 2026 14:54:59 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNS-00000000NGR-3GbY; Mon, 08 Jun 2026 15:54:58 +0100 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass header.s=casper.20170209 header.d=infradead.org header.i="@infradead.org" header.h="Sender:Content-Transfer-Encoding:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=lqNX+TXIvKbp1pjqN/ugyzGv3kWdP4RiL9hc1KXGPRI=; b=wRwgQ5ZGbWhFRFS/ILi+ryrW8z RCJJgbqOfBEBjgKSKCcCoDW2seCAXJOVkJsIUjpsZTFf3ZwwW/r/LBoRg3GcTDegI9Om9buMxnXSm ol4oIqjGlb4+UjznpTA/wrrDb9iKy8EBM/hG8vEG7tlKR+JP/4yLfX9N3NkyF9T370tZRJwZjGk+d HgwTljMZ5/hOowc/g8+thNhKsZ2enCaTJi9CEaWb230kQ4ycqvj9pOEWkLhZEyZr0PfWYT9G+NitC bu20HsuDZZ1x7G+uKKJk8yolRz05c/6kBD7WH2e58bwup4UlWMqcQw5MeXhriW5XlTXpV6fs8IKUT 
sbgyowbQ==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 29/34] KVM: x86: Re-synchronize TSC after KVM_SET_TSC_KHZ Date: Mon, 8 Jun 2026 15:48:10 +0100 Message-ID: <20260608145455.89187-30-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html X-purgate-ID: tlsNG-d25034/1780930534-E0159CF5-60DDAD60/0/0 X-purgate-type: clean X-purgate-size: 2045 X-ZohoMail-DKIM: pass (identity @infradead.org) X-ZM-MESSAGEID: 1780930571253158502 Content-Type: text/plain; charset="utf-8" From: David Woodhouse KVM_SET_TSC_KHZ changes the vCPU's TSC scaling ratio but does not update the VM-wide cur_tsc_scaling_ratio used by get_kvmclock(). This causes get_kvmclock() to use a stale (default 1:1) ratio when computing the KVM clock, leading to drift between the host-side kvmclock and what the guest observes. Fix this by calling kvm_synchronize_tsc() after changing the TSC frequency. 
This: - Updates cur_tsc_scaling_ratio (consumed by pvclock_update_vm_gtod_copy) - Ensures the TSC value is continuous across the frequency change - Triggers kvm_track_tsc_matching() for proper masterclock handling - Allows subsequent vCPUs to synchronize via the 1-second slop hack Signed-off-by: David Woodhouse --- arch/x86/kvm/x86.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 54d4b1b3cfe4..96250264d403 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -206,6 +206,7 @@ module_param(mitigate_smt_rsb, bool, 0444); #ifdef CONFIG_X86_64 static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp); #endif +static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 *user_value); #define KVM_MAX_NR_USER_RETURN_MSRS 16 =20 struct kvm_user_return_msrs { @@ -2611,7 +2612,20 @@ static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u3= 2 user_tsc_khz) user_tsc_khz, thresh_lo, thresh_hi); use_scaling =3D 1; } - return set_tsc_khz(vcpu, user_tsc_khz, use_scaling); + if (set_tsc_khz(vcpu, user_tsc_khz, use_scaling)) + return -1; + + /* + * Re-synchronize the TSC after changing frequency. This ensures + * cur_tsc_scaling_ratio is updated (used by get_kvmclock) and + * the TSC value is continuous across the frequency change. 
+ */ + { + u64 tsc =3D kvm_read_l1_tsc(vcpu, rdtsc()); + + kvm_synchronize_tsc(vcpu, &tsc); + } + return 0; } =20 static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns) --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930565; cv=none; d=zohomail.com; s=zohoarc; b=bVHL9IGkSnCjCWaeiYzVjDF7+K8jp/x6+YF9AvisQ3wClVkF8NhgrJpRf8TiS/mpwVJZ5qG/rv+eKYixboaSrq+9/PMDKUXMkADfMeFdBUVQ5XUbRFW1ONIdhKquL4LT68BNcQd5inIBCMxhrV31IOKpkD3jgZKBRdoz2RSb6Nc= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930565; h=Content-Type:Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=aHV9lck9iJ7HfeFHLwY/x8/L76vGmBWhEjbD8RUPDnI=; b=JW2HddPXjfO026klRK37q25j/EpvIUqaKwCGeFnmv9xB7uUTkjWiWj9HKCGrODUPdUNWu1CN9S9z4tkpiWCN8enY/znT2rwa3ttkPUFMnEC/UjkB+Puu109ndB1xkl6NS2b/ctZH3MEdPz3huCBnR9vx0ae1C/c3johQrKSzizQ= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1780930565277236.42472934210832; Mon, 8 Jun 2026 07:56:05 -0700 
(PDT) Received: from list by lists.xenproject.org with outflank-mailman.1331736.1594440 (Exim 4.92) (envelope-from ) id 1wWbOB-0007nR-Kp; Mon, 08 Jun 2026 14:55:43 +0000 Received: by outflank-mailman (output) from mailman id 1331736.1594440; Mon, 08 Jun 2026 14:55:43 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbOA-0007bY-3t; Mon, 08 Jun 2026 14:55:42 +0000 Received: by outflank-mailman (input) for mailman id 1331736; Mon, 08 Jun 2026 14:55:33 +0000 Received: from mx.expurgate.net ([195.190.135.10]) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbO0-0005bW-8v for xen-devel@lists.xenproject.org; Mon, 08 Jun 2026 14:55:32 +0000 Received: from mx.expurgate.net (helo=localhost) by mx.expurgate.net with esmtp id 1wWbNz-002m34-E0; Mon, 08 Jun 2026 16:55:31 +0200 Received: from [10.42.69.9] (helo=localhost) by localhost with ESMTP (eXpurgate MTA 0.9.1) (envelope-from ) id 6a26d7d7-bab6-0a2a0a5309dd-0a2a4509a2de-22 for ; Mon, 08 Jun 2026 16:55:31 +0200 Received: from [90.155.50.34] (helo=casper.infradead.org) by tlsNG-bad1c0.mxtls.expurgate.net with ESMTPS (eXpurgate 4.56.1) (envelope-from ) id 6a26d7e2-2497-0a2a45090019-5a9b3222d714-3 for ; Mon, 08 Jun 2026 16:55:31 +0200 Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wWbNS-0000000Dtxj-2vXt; Mon, 08 Jun 2026 14:54:59 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNS-00000000NGX-3X6a; Mon, 08 Jun 2026 15:54:58 +0100 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass 
header.s=casper.20170209 header.d=infradead.org header.i="@infradead.org" header.h="Sender:Content-Transfer-Encoding:Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To: From:Reply-To:Cc:Content-ID:Content-Description; bh=aHV9lck9iJ7HfeFHLwY/x8/L76vGmBWhEjbD8RUPDnI=; b=N9+adQEbHGyBhmT81pe1mmYYck LjW/bk4JiQDQVfbGbb3rZrQkZSu2DFaWqWwtmJ4aYta+6mUnOrqpsfIou57Twpu7e3mpdtOR6IaS/ PR30FjJPheq3HC07+oHZIvW0QagIsGBCO4FGmfBYHGTO6L32PDV0Gau8jWk4WqHWk2LxHqr7BBkeg InjZCCPr2gDsciyXM/r+vac1yWl6d0xWHTwRNSxvzqdCOnylWOjVvDEBHh/ul2i4PBO8o+YQL6ivG 9Y+C1B3FJ8v/gGK94o+RhO5HmgPf9ZaXmgM9mMzuTK+aM+zaoXaQmTWHqMhV9AhnL2ZQbnsjqcGaA wWHDSRaQ==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 30/34] KVM: selftests: Add Xen runstate migration test Date: Mon, 8 Jun 2026 15:48:11 +0100 Message-ID: <20260608145455.89187-31-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. 
See http://www.infradead.org/rpr.html X-purgate-ID: tlsNG-bad1c0/1780930531-42B70A53-B48BA8E3/0/0 X-purgate-type: clean X-purgate-size: 7849 X-ZohoMail-DKIM: pass (identity @infradead.org) X-ZM-MESSAGEID: 1780930566404154100 From: David Woodhouse Test that Xen runstate (steal time) is correctly accounted across a simulated live migration using KVM_XEN_VCPU_ATTR and KVM_[GS]ET_CLOCK_GUEST. The test simulates what a real VMM does during migration: 1. Creates a VM with Xen HVM config and runstate tracking 2. Runs the guest to accumulate some kvmclock time 3. Saves clock (KVM_GET_CLOCK_GUEST), TSC offset, and runstate 4. Marks the saved state as RUNSTATE_runnable (vCPU not running) 5. Destroys the source VM 6. Sleeps 10ms (simulating migration network transfer time) 7. Creates a new VM and restores all state precisely as saved 8. Runs the guest and verifies the migration gap appears as steal The kernel accounts the gap because: on vcpu_load, it transitions from RUNSTATE_runnable to RUNSTATE_running, computing delta =3D kvmclock_now - state_entry_time. Since kvmclock has advanced past the saved entry time (real time elapsed during migration), the delta is added to time_runnable. Signed-off-by: David Woodhouse --- .../selftests/kvm/x86/xen_migration_test.c | 194 ++++++++++++++++++ 1 file changed, 194 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86/xen_migration_test.c diff --git a/tools/testing/selftests/kvm/x86/xen_migration_test.c b/tools/t= esting/selftests/kvm/x86/xen_migration_test.c new file mode 100644 index 000000000000..37e8ace00611 --- /dev/null +++ b/tools/testing/selftests/kvm/x86/xen_migration_test.c @@ -0,0 +1,194 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Test Xen runstate (steal time) preservation across simulated migration. + * + * Verifies that the kernel correctly accounts the migration gap as + * steal time (runnable) when runstate data is saved and restored + * precisely, but real time elapses during the migration. 
+ * + * The key insight: userspace saves the runstate with state=3DRUNSTATE_run= nable + * (the vCPU is not running during migration). On restore, the kernel sees + * that kvmclock has advanced past state_entry_time, and accounts the + * difference as time spent in the runnable state. + */ +#include +#include +#include +#include +#include + +#include "test_util.h" +#include "kvm_util.h" +#include "processor.h" + +#include + +#define SHINFO_GPA 0xc0000000ULL +#define RUNSTATE_GPA (SHINFO_GPA + 0x1000) + +#define RUNSTATE_running 0 +#define RUNSTATE_runnable 1 +#define RUNSTATE_blocked 2 +#define RUNSTATE_offline 3 + +struct vcpu_runstate_info { + uint32_t state; + uint64_t state_entry_time; + uint64_t time[4]; +} __attribute__((packed)); + +static void guest_code(void) +{ + volatile struct vcpu_runstate_info *rs =3D + (void *)(unsigned long)RUNSTATE_GPA; + + /* Report runstate times =E2=80=94 no need to enable kvmclock MSR, + * the kernel writes runstate using its internal kvmclock. */ + GUEST_SYNC_ARGS(0, rs->time[RUNSTATE_runnable], + rs->time[RUNSTATE_running], 0, 0); +} + +static struct kvm_vm *create_xen_vm(struct kvm_vcpu **vcpu) +{ + struct kvm_vm *vm; + int xen_caps; + + vm =3D vm_create_with_one_vcpu(vcpu, guest_code); + + xen_caps =3D kvm_check_cap(KVM_CAP_XEN_HVM); + TEST_REQUIRE(xen_caps & KVM_XEN_HVM_CONFIG_SHARED_INFO); + TEST_REQUIRE(xen_caps & KVM_XEN_HVM_CONFIG_RUNSTATE); + + /* Map pages */ + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, + SHINFO_GPA, 1, 2, 0); + virt_map(vm, SHINFO_GPA, SHINFO_GPA, 2); + + /* Enable Xen HVM with MSR interception (enables runstate tracking) */ + struct kvm_xen_hvm_config cfg =3D { + .flags =3D KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL, + .msr =3D 0x40000000, + }; + vm_ioctl(vm, KVM_XEN_HVM_CONFIG, &cfg); + + /* Set shared_info */ + struct kvm_xen_hvm_attr ha =3D { + .type =3D KVM_XEN_ATTR_TYPE_SHARED_INFO, + .u.shared_info.gfn =3D SHINFO_GPA >> 12, + }; + vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &ha); + + /* Set 
runstate address */ + struct kvm_xen_vcpu_attr rs_addr =3D { + .type =3D KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR, + .u.gpa =3D RUNSTATE_GPA, + }; + vcpu_ioctl(*vcpu, KVM_XEN_VCPU_SET_ATTR, &rs_addr); + + return vm; +} + +int main(void) +{ + struct pvclock_vcpu_time_info pvti; + struct kvm_xen_vcpu_attr runstate_save; + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + struct ucall uc; + uint64_t tsc_offset; + int ret; + + /* =3D=3D=3D SOURCE SIDE =3D=3D=3D */ + pr_info("=3D=3D=3D Source: create VM and run guest =3D=3D=3D\n"); + vm =3D create_xen_vm(&vcpu); + + /* Run guest once to accumulate some runstate time */ + vcpu_run(vcpu); + TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO); + TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_SYNC); + + pr_info(" Guest sees: runnable=3D%" PRIu64 " running=3D%" PRIu64 "\n", + uc.args[2], uc.args[3]); + + /* Save clock state */ + ret =3D __vcpu_ioctl(vcpu, KVM_GET_CLOCK_GUEST, &pvti); + TEST_ASSERT(!ret, "KVM_GET_CLOCK_GUEST failed"); + + /* Save TSC offset */ + tsc_offset =3D vcpu_get_msr(vcpu, MSR_IA32_TSC_ADJUST); + + /* Save runstate =E2=80=94 the vCPU is now "runnable" (not running) */ + runstate_save.type =3D KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA; + vcpu_ioctl(vcpu, KVM_XEN_VCPU_GET_ATTR, &runstate_save); + + /* + * Transition to runnable state before saving =E2=80=94 the vCPU is + * not running during migration. 
+ */ + runstate_save.u.runstate.state =3D RUNSTATE_runnable; + + pr_info(" Saved runstate: running=3D%" PRIu64 " runnable=3D%" PRIu64 + " entry=3D%" PRIu64 "\n", + (uint64_t)runstate_save.u.runstate.time_running, + (uint64_t)runstate_save.u.runstate.time_runnable, + (uint64_t)runstate_save.u.runstate.state_entry_time); + + uint64_t saved_runnable =3D runstate_save.u.runstate.time_runnable; + + kvm_vm_release(vm); + + /* =3D=3D=3D MIGRATION GAP =3D=3D=3D */ + pr_info("=3D=3D=3D Simulating migration (sleeping 10ms) =3D=3D=3D\n"); + usleep(10000); + + /* =3D=3D=3D DESTINATION SIDE =3D=3D=3D */ + pr_info("=3D=3D=3D Destination: create new VM and restore =3D=3D=3D\n"); + vm =3D create_xen_vm(&vcpu); + + /* Restore TSC offset */ + vcpu_set_msr(vcpu, MSR_IA32_TSC_ADJUST, tsc_offset); + + /* Restore clock =E2=80=94 kvmclock will now be ~10ms ahead of the snapsh= ot */ + vcpu_ioctl(vcpu, KVM_SET_CLOCK_GUEST, &pvti); + + /* Restore runstate exactly as saved (state=3Drunnable) */ + runstate_save.type =3D KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA; + ret =3D __vcpu_ioctl(vcpu, KVM_XEN_VCPU_SET_ATTR, &runstate_save); + TEST_ASSERT(!ret, "Restore runstate failed: errno %d", errno); + + /* + * Run the guest. When the vCPU enters vcpu_run, the kernel + * transitions from RUNSTATE_runnable to RUNSTATE_running. + * It computes: delta =3D kvmclock_now - state_entry_time + * This delta (which includes the migration gap) is added to + * time_runnable (steal time). 
+ */ + vcpu_run(vcpu); + TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_IO); + TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_SYNC); + + uint64_t guest_runnable =3D uc.args[2]; + uint64_t guest_running =3D uc.args[3]; + + pr_info(" Guest sees: runnable=3D%" PRIu64 " running=3D%" PRIu64 "\n", + guest_runnable, guest_running); + + uint64_t steal_increase =3D guest_runnable - saved_runnable; + pr_info(" Steal time increase: %" PRIu64 " ns (migration gap)\n", + steal_increase); + + /* + * The steal time increase should be at least 10ms (the sleep) + * but not more than 5s (allowing for VM creation overhead). + * The actual gap is from the source's state_entry_time to the + * destination's kvmclock "now" at vcpu_load time. + */ + TEST_ASSERT(steal_increase >=3D 10000000ULL && + steal_increase < 5000000000ULL, + "Steal time increase %" PRIu64 " ns not in expected range " + "[10ms, 5s]", steal_increase); + + kvm_vm_release(vm); + pr_info("PASS: Migration gap correctly accounted as steal time\n"); + return 0; +} --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass(p=none dis=none) header.from=infradead.org ARC-Seal: i=1; a=rsa-sha256; t=1780930542; cv=none; d=zohomail.com; s=zohoarc; b=G+B3Tp+ptwgL1Avm2WBB7WYSbfrBFmOQS0AnmFipTmKX2WKrjRjZroOrVRdfe1/KaFcj0QAFGJj30jG/ObS3LOHhTPL1MY3pfjPaMca3j+5R/nh65vsCK1ZEfAaMFzP6rQU3C/qBHPLzKt6+x423OiqCCPVXZkhOivPmGocr2aQ= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1780930542; 
h=Content-Transfer-Encoding:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To:Cc; bh=s86e1eDk7JwzlpKJ6Et1TYCYwI2LU3hCrdoVI9vRngc=; b=dKBjPTNelVoxh2s3aX0a36BEoDbFtDeP2Tny9G8tSFGJlpcdBHn1ocKgVypEZIPxjfWnWLIjXBlBjwawJVCCWlOpTlppAbanhVxKIpJYLK2IbaAu6sCXep9qykPdgwB58MCwwK1m6pJH2AovvGx+jj3UO7LoymAa8wztOlW51f4= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1780930542156433.4383492897439; Mon, 8 Jun 2026 07:55:42 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.1331701.1594260 (Exim 4.92) (envelope-from ) id 1wWbNi-0001QM-7A; Mon, 08 Jun 2026 14:55:14 +0000 Received: by outflank-mailman (output) from mailman id 1331701.1594260; Mon, 08 Jun 2026 14:55:14 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNh-0001Pv-UI; Mon, 08 Jun 2026 14:55:13 +0000 Received: by outflank-mailman (input) for mailman id 1331701; Mon, 08 Jun 2026 14:55:13 +0000 Received: from mx.expurgate.net ([194.145.224.10]) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1wWbNh-0001Of-D7 for xen-devel@lists.xenproject.org; Mon, 08 Jun 2026 14:55:13 +0000 Received: from mx.expurgate.net (helo=localhost) by mx.expurgate.net with esmtp id 1wWbNg-00EcHE-Q0; Mon, 08 Jun 2026 16:55:12 +0200 Received: from [10.42.69.4] (helo=localhost) by localhost with ESMTP (eXpurgate MTA 0.9.1) (envelope-from ) id 6a26d7c5-5cb7-0a2a0a5109dd-0a2a4504aa90-36 for ; Mon, 08 Jun 2026 16:55:12 +0200 Received: from [90.155.50.34] 
(helo=casper.infradead.org) by tlsNG-ebf023.mxtls.expurgate.net with ESMTPS (eXpurgate 4.56.1) (envelope-from ) id 6a26d7d0-1dec-0a2a45040019-5a9b3222a71a-3 for ; Mon, 08 Jun 2026 16:55:12 +0200 Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by casper.infradead.org with esmtpsa (Exim 4.99.1 #2 (Red Hat Linux)) id 1wWbNS-0000000Dtxo-3N7M; Mon, 08 Jun 2026 14:54:59 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNS-00000000NGd-3xVq; Mon, 08 Jun 2026 15:54:58 +0100 X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Authentication-Results: eu.smtp.expurgate.cloud; dkim=pass header.s=casper.20170209 header.d=infradead.org header.i="@infradead.org" header.h="Sender:Content-Transfer-Encoding:MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=s86e1eDk7JwzlpKJ6Et1TYCYwI2LU3hCrdoVI9vRngc=; b=rELP0pPshxo9UwZ5syelAnSKO6 4cfW8H6uBn/Zf/P0TiJzkdhGO+CwQJuVJ1hnrK2PiO+YODU7eRwQQ9U0yAKSEfyEvh/5xs4UNG2kn +v9PB4fL/k9dKigWNx3dlMJnggVrVGZf0qAsQb8kfP3eJ8BEJA+ahjTIyHp2dRMjxWRNJSY2ptm2B SEp+3poR4nnttJ92rVtigJ2l024kUzYMQSO61fQrwDR86js32ZvlkN2dNVL/BcDq4LQ51TBunnw8Y ZCtUTxz76ATt+jri2NqK539eYVH1wZMP0lXqVNOlqeX2U5+2ij2e9cZSDy+MJzWgPGsBMFESKwxnq +P/Ylngg==; From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 31/34] KVM: x86: Use ktime_get_snapshot_id() for master clock Date: Mon, 8 Jun 2026 15:48:12 +0100 Message-ID: <20260608145455.89187-32-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by casper.infradead.org. See http://www.infradead.org/rpr.html X-purgate-ID: tlsNG-ebf023/1780930512-433673FF-713DFB6D/0/0 X-purgate-type: clean X-purgate-size: 3751 X-ZohoMail-DKIM: pass (identity @infradead.org) X-ZM-MESSAGEID: 1780930543157158500 Content-Type: text/plain; charset="utf-8" From: David Woodhouse Replace the KVM-private vgettsc()/do_kvmclock_base()/do_monotonic()/ do_realtime() timekeeping reimplementation with calls to the generic ktime_get_snapshot_id() interface. The snapshot provides both the system time and the raw_cycles (TSC) atomically paired. When raw_cycles is zero, the clocksource could not provide a raw hardware counter value, which is equivalent to the previous vgettsc() returning VDSO_CLOCKMODE_NONE. For kvm_get_time_and_clockread(), the kvmclock base time is CLOCK_MONOTONIC_RAW + offs_boot. The snapshot provides the raw time atomically paired with the TSC; offs_boot is added separately as it only changes at suspend/resume boundaries. This is a step towards eliminating the pvclock_gtod_data private copy of timekeeping state and the associated notifier callback. 
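Illustration only, not part of the patch: the "CLOCK_MONOTONIC_RAW + offs_boot" arithmetic described above can be approximated from userspace. This is a minimal sketch under stated assumptions, not the kernel implementation. It assumes a Linux host where CLOCK_BOOTTIME minus CLOCK_MONOTONIC approximates offs_boot (outside of time namespaces). The helper names approx_offs_boot_ns() and approx_kvmclock_base_ns() are hypothetical and exist only for this example.

```c
/* Userspace illustration only; not part of the patch and not kernel code. */
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to nanoseconds. */
int64_t ts_to_ns(const struct timespec *ts)
{
	assert(ts != NULL);
	return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* kvmclock base as the commit message describes it: raw time plus offs_boot. */
int64_t kvmclock_base_ns(int64_t raw_ns, int64_t offs_boot_ns)
{
	return raw_ns + offs_boot_ns;
}

/*
 * Hypothetical helper: approximate offs_boot from userspace as
 * CLOCK_BOOTTIME - CLOCK_MONOTONIC. The two reads are not atomic, so the
 * result carries a small error. Returns 0 on success, -1 on failure.
 */
int approx_offs_boot_ns(int64_t *offs_boot_ns)
{
	struct timespec mono, boot;

	if (clock_gettime(CLOCK_MONOTONIC, &mono) ||
	    clock_gettime(CLOCK_BOOTTIME, &boot))
		return -1;

	*offs_boot_ns = ts_to_ns(&boot) - ts_to_ns(&mono);
	return 0;
}

/*
 * Hypothetical helper: read CLOCK_MONOTONIC_RAW, then add the separately
 * obtained offs_boot approximation. Returns 0 on success, -1 on failure.
 */
int approx_kvmclock_base_ns(int64_t *base_ns)
{
	struct timespec raw;
	int64_t offs_boot;

	if (clock_gettime(CLOCK_MONOTONIC_RAW, &raw) ||
	    approx_offs_boot_ns(&offs_boot))
		return -1;

	*base_ns = kvmclock_base_ns(ts_to_ns(&raw), offs_boot);
	return 0;
}
```

The offset is read separately from the raw clock here, mirroring the reasoning above that offs_boot only changes at suspend/resume boundaries. Unlike the snapshot used in the patch, this sketch does not pair the time with a TSC reading.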
Signed-off-by: David Woodhouse Assisted-by: Kiro:claude-opus-4.6-1m --- arch/x86/kvm/x86.c | 46 +++++++++++++++++++++++++++++++++++----------- 1 file changed, 35 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 96250264d403..2713aebb96ae 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -35,6 +35,7 @@ #include "smm.h" =20 #include +#include #include #include #include @@ -3162,14 +3163,32 @@ static int do_realtime(struct timespec64 *ts, u64 *= tsc_timestamp) * reports the TSC value from which it do so. Returns true if host is * using TSC based clocksource. */ +static bool kvm_snapshot_has_tsc(struct system_time_snapshot *snap, + u64 *tsc_timestamp) +{ + if (snap->cs_id =3D=3D CSID_X86_TSC) { + *tsc_timestamp =3D snap->cycles; + return true; + } + + if (snap->hw_csid =3D=3D CSID_X86_TSC && snap->hw_cycles) { + *tsc_timestamp =3D snap->hw_cycles; + return true; + } + + return false; +} + static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp) { - /* checked again under seqlock below */ - if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode)) + struct system_time_snapshot snap; + + ktime_get_snapshot_id(CLOCK_MONOTONIC_RAW, &snap); + if (!kvm_snapshot_has_tsc(&snap, tsc_timestamp)) return false; =20 - return gtod_is_based_on_tsc(do_kvmclock_base(kernel_ns, - tsc_timestamp)); + *kernel_ns =3D ktime_to_ns(ktime_mono_to_any(snap.systime, TK_OFFS_BOOT)); + return true; } =20 /* @@ -3178,12 +3197,14 @@ static bool kvm_get_time_and_clockread(s64 *kernel_= ns, u64 *tsc_timestamp) */ bool kvm_get_monotonic_and_clockread(s64 *kernel_ns, u64 *tsc_timestamp) { - /* checked again under seqlock below */ - if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode)) + struct system_time_snapshot snap; + + ktime_get_snapshot_id(CLOCK_MONOTONIC, &snap); + if (!kvm_snapshot_has_tsc(&snap, tsc_timestamp)) return false; =20 - return gtod_is_based_on_tsc(do_monotonic(kernel_ns, - tsc_timestamp)); + 
*kernel_ns =3D ktime_to_ns(snap.systime); + return true; } =20 /* @@ -3196,11 +3217,14 @@ bool kvm_get_monotonic_and_clockread(s64 *kernel_ns= , u64 *tsc_timestamp) static bool kvm_get_walltime_and_clockread(struct timespec64 *ts, u64 *tsc_timestamp) { - /* checked again under seqlock below */ - if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode)) + struct system_time_snapshot snap; + + ktime_get_snapshot_id(CLOCK_REALTIME, &snap); + if (!kvm_snapshot_has_tsc(&snap, tsc_timestamp)) return false; =20 - return gtod_is_based_on_tsc(do_realtime(ts, tsc_timestamp)); + *ts =3D ktime_to_timespec64(snap.systime); + return true; } #endif =20 --=20 2.54.0 From nobody Sat Jun 13 07:34:16 2026 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1C1FD46AF38; Mon, 8 Jun 2026 14:55:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780930546; cv=none; b=kna08LVeFNOHtqhzk9u2afjrsYOAtC5ZCC5HAZ7k5bO1KNFarPrRn9ijBMoqbkeE5u2GjsNpnxHPZeqqoVHOoL8j3O4b9rxEmXkTa8m26V/ViKVrvoI+g9DuD80TfIMY27j1flY2f+aMuyCEe1dvIql4B7AmvI5zWFFwhdmtRUI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780930546; c=relaxed/simple; bh=YKhhhht+zbPfyPeyUv12YljkVrKSOCIavpgHT4Fk81o=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BtBgaFAML09a+0OTAOFjAL1aKGoeICEhsSJE8yPuxXdAguQwJBJNbjMvlGZzJRMMtA8N36EB0AU3AtRx7dWbUfdNXZ2jDQRZtmISBbalL1+onfS5w34+wbb70wWCCm9YZk/pjehp8mZ2GSoSGLFHKopR+W+4RSYU10Sp+eHOtvw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=desiato.srs.infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org 
header.b=VFXvohgL; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=desiato.srs.infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="VFXvohgL" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-ID:Date:Subject:To:From:Reply-To: Cc:Content-Type:Content-ID:Content-Description; bh=ssc2OYJnjVRJZfXuehVvSyJIF5UAgdCRis7RcurFUak=; b=VFXvohgLy6g/BbZF+sYaQENPUm yEPrtvWhoUEP7ysZ1jL03TlZvP6/mte7iIouk1LvPyagfBknLxDiPAtS0pMgyLk73UBWYMZsBhBJk QSxMDhNuZUb74U7gfaFlkjyOPkmD2kDZCIqJ0BGr/KqQ3cXnaY23jHOrMeNLQwiwvDnQlxEAGVPC1 XfEeXu2OiCeBVXeGu3N7Ay+yzO3K1yXv0OlyXvOaSVBDHHfbEDttdh9ymYTwSs2+6HWSijid365+Q peOdhFh+s8x1IvBj/iAOV4v/3paFNbtKp+GZDFVrHvfSKdYU0WaIDXKGQiD1buYYfynh8dUtGADr8 iUbu1u3w==; Received: from [2001:8b0:10b:1::425] (helo=i7.infradead.org) by desiato.infradead.org with esmtpsa (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNV-00000001AgR-3WQp; Mon, 08 Jun 2026 14:55:02 +0000 Received: from dwoodhou by i7.infradead.org with local (Exim 4.99.2 #2 (Red Hat Linux)) id 1wWbNT-00000000NGi-06LB; Mon, 08 Jun 2026 15:54:59 +0100 From: David Woodhouse To: Paolo Bonzini , Jonathan Corbet , Shuah Khan , Sean Christopherson , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H. 
Peter Anvin" , Vitaly Kuznetsov , Juergen Gross , Boris Ostrovsky , David Woodhouse , Paul Durrant , Jonathan Cameron , Sascha Bischoff , Marc Zyngier , Joey Gouly , Jack Allister , Dongli Zhang , joe.jin@oracle.com, kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org Subject: [PATCH v5 32/34] KVM: x86: Compute kvmclock base without pvclock_gtod_data Date: Mon, 8 Jun 2026 15:48:13 +0100 Message-ID: <20260608145455.89187-33-dwmw2@infradead.org> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org> References: <20260608145455.89187-1-dwmw2@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: David Woodhouse X-SRS-Rewrite: SMTP reverse-path rewritten from by desiato.infradead.org. See http://www.infradead.org/rpr.html Content-Type: text/plain; charset="utf-8" From: David Woodhouse get_kvmclock_base_ns() needs CLOCK_MONOTONIC_RAW + offs_boot. Compute this directly rather than reading offs_boot from the pvclock_gtod_data private copy. offs_boot only changes at suspend/resume so does not need to be atomically paired with the raw clock read. Signed-off-by: David Woodhouse Assisted-by: Kiro:claude-opus-4.6-1m --- arch/x86/kvm/x86.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2713aebb96ae..c18947c5b63f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2402,7 +2402,7 @@ static void update_pvclock_gtod(struct timekeeper *tk) static s64 get_kvmclock_base_ns(void) { /* Count up from boot time, but with the frequency of the raw clock. 
 	 */
-	return ktime_to_ns(ktime_add(ktime_get_raw(), pvclock_gtod_data.offs_boot));
+	return ktime_to_ns(ktime_mono_to_any(ktime_get_raw(), TK_OFFS_BOOT));
 }
 
 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
 "H. Peter Anvin", Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
 David Woodhouse, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
 Marc Zyngier, Joey Gouly, Jack Allister, Dongli Zhang, joe.jin@oracle.com,
 kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org
Subject: [PATCH v5 33/34] KVM: x86: Replace pvclock_gtod_data vclock_mode with boolean
Date: Mon, 8 Jun 2026 15:48:14 +0100
Message-ID: <20260608145455.89187-34-dwmw2@infradead.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: David Woodhouse

The remaining users of pvclock_gtod_data only need to know whether the
host clocksource is TSC-based. Replace all vclock_mode checks with a
simple kvm_host_has_tsc_clocksource boolean, updated by the
pvclock_gtod_notify callback. This is inherently racy (as it always
was -- kvm_track_tsc_matching never held the gtod seqcount), relying on
eventual consistency: the notifier fires on every timekeeping update
and will correct any transient inconsistency within one tick.

Signed-off-by: David Woodhouse
Assisted-by: Kiro:claude-opus-4.6-1m
---
 arch/x86/kvm/x86.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c18947c5b63f..93a428c37847 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2649,6 +2649,8 @@ static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
 }
 
 #ifdef CONFIG_X86_64
+static bool kvm_host_has_tsc_clocksource;
+
 static inline bool gtod_is_based_on_tsc(int mode)
 {
 	return mode == VDSO_CLOCKMODE_TSC || mode == VDSO_CLOCKMODE_HVCLOCK;
@@ -2678,7 +2680,6 @@ static bool kvm_use_master_clock(struct kvm *kvm)
 static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu, bool new_generation)
 {
 	struct kvm_arch *ka = &vcpu->kvm->arch;
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
 
 	/*
 	 * Track whether all vCPUs have matching TSC offsets (for
@@ -2712,7 +2713,7 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu, bool new_generation)
 	 * accounts for its offset.
 	 */
 	bool use_master_clock = kvm_use_master_clock(vcpu->kvm) &&
-				gtod_is_based_on_tsc(gtod->clock.vclock_mode);
+				kvm_host_has_tsc_clocksource;
 
 	/*
 	 * Request a masterclock update if the masterclock needs to be toggled
@@ -2726,7 +2727,7 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu, bool new_generation)
 
 	trace_kvm_track_tsc(vcpu->vcpu_id, ka->nr_vcpus_matched_tsc,
 			    atomic_read(&vcpu->kvm->online_vcpus),
-			    ka->use_master_clock, gtod->clock.vclock_mode);
+			    ka->use_master_clock, kvm_host_has_tsc_clocksource);
 }
 #else
 static inline void kvm_track_tsc_matching(struct kvm_vcpu *vcpu,
@@ -2850,7 +2851,7 @@ static inline bool kvm_check_tsc_unstable(void)
 	 * TSC is marked unstable when we're running on Hyper-V,
 	 * 'TSC page' clocksource is good.
 	 */
-	if (pvclock_gtod_data.clock.vclock_mode == VDSO_CLOCKMODE_HVCLOCK)
+	if (kvm_host_has_tsc_clocksource)
 		return false;
 #endif
 	return check_tsc_unstable();
@@ -3315,7 +3316,7 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
 		ka->use_master_clock = false;
 	}
 
-	vclock_mode = pvclock_gtod_data.clock.vclock_mode;
+	vclock_mode = kvm_host_has_tsc_clocksource;
 	trace_kvm_update_master_clock(ka->use_master_clock, vclock_mode,
 					ka->all_vcpus_matched_freq);
 #endif
@@ -10407,12 +10408,15 @@ static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
 	update_pvclock_gtod(tk);
 
 #ifdef CONFIG_X86_64
+	kvm_host_has_tsc_clocksource =
+		gtod_is_based_on_tsc(tk->tkr_mono.clock->vdso_clock_mode);
+
 	/*
 	 * Disable master clock if host does not trust, or does not use,
 	 * TSC based clocksource. Delegate queue_work() to irq_work as
 	 * this is invoked with tk_core.seq write held.
 	 */
-	if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode) &&
+	if (!kvm_host_has_tsc_clocksource &&
 	    atomic_read(&kvm_guest_has_master_clock) != 0)
 		irq_work_queue(&pvclock_irq_work);
 #endif
-- 
2.54.0

From nobody Sat Jun 13 07:34:16 2026
From: David Woodhouse
To: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Sean Christopherson,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
 "H. Peter Anvin", Vitaly Kuznetsov, Juergen Gross, Boris Ostrovsky,
 David Woodhouse, Paul Durrant, Jonathan Cameron, Sascha Bischoff,
 Marc Zyngier, Joey Gouly, Jack Allister, Dongli Zhang, joe.jin@oracle.com,
 kvm@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 xen-devel@lists.xenproject.org, linux-kselftest@vger.kernel.org
Subject: [PATCH v5 34/34] KVM: x86: Remove pvclock_gtod_data and private timekeeping code
Date: Mon, 8 Jun 2026 15:48:15 +0100
Message-ID: <20260608145455.89187-35-dwmw2@infradead.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260608145455.89187-1-dwmw2@infradead.org>
References: <20260608145455.89187-1-dwmw2@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: David Woodhouse

Remove the now-unused KVM-private timekeeping infrastructure:

 - struct pvclock_clock and struct pvclock_gtod_data
 - update_pvclock_gtod() and its seqcount-protected state copy
 - read_tsc() (KVM's private TSC reader with cycle_last clamping)
 - vgettsc() (KVM's private clocksource interpolation)
 - do_kvmclock_base(), do_monotonic(), do_realtime()

Signed-off-by: David Woodhouse
Assisted-by: Kiro:claude-opus-4.6-1m
---
 Documentation/virt/kvm/devices/vcpu.rst       |   4 +-
 arch/x86/kvm/vmx/vmx.c                        |   2 +
 arch/x86/kvm/x86.c                            | 177 +-----------------
 .../testing/selftests/kvm/x86/pvclock_test.c  |   7 +-
 4 files changed, 9 insertions(+), 181 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vcpu.rst b/Documentation/virt/kvm/devices/vcpu.rst
index 167aa4140d30..3d1a89c2b4f7 100644
--- a/Documentation/virt/kvm/devices/vcpu.rst
+++ b/Documentation/virt/kvm/devices/vcpu.rst
@@ -243,9 +243,9 @@ Returns:
 Specifies the guest's TSC offset relative to the host's TSC. The guest's
 TSC is then derived by the following equation:
 
-  guest_tsc = ((host_tsc * tsc_scale_ratio) >> tsc_scale_bits) + KVM_VCPU_TSC_OFFSET
+  guest_tsc = ((host_tsc * tsc_ratio) >> tsc_frac_bits) + KVM_VCPU_TSC_OFFSET
 
-The values of tsc_scale_ratio and tsc_scale_bits can be obtained using
+The values of tsc_ratio and tsc_frac_bits can be obtained using
 the KVM_VCPU_TSC_SCALE attribute.
 
 This attribute is useful to adjust the guest's TSC on live migration,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ed207cc7692d..1aaf3924a799 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8674,6 +8674,8 @@ __init int vmx_hardware_setup(void)
 
 	if (cpu_has_vmx_tsc_scaling() && boot_cpu_has(X86_FEATURE_CONSTANT_TSC))
 		kvm_caps.has_tsc_control = true;
+	else
+		vmcs_config.cpu_based_2nd_exec_ctrl &= ~SECONDARY_EXEC_TSC_SCALING;
 
 	kvm_caps.max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
 	kvm_caps.tsc_scaling_ratio_frac_bits = 48;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 93a428c37847..966057913366 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2347,58 +2347,6 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 	return kvm_set_msr_ignored_check(vcpu, index, *data, true);
 }
 
-struct pvclock_clock {
-	int vclock_mode;
-	u64 cycle_last;
-	u64 mask;
-	u32 mult;
-	u32 shift;
-	u64 base_cycles;
-	u64 offset;
-};
-
-struct pvclock_gtod_data {
-	seqcount_t seq;
-
-	struct pvclock_clock clock; /* extract of a clocksource struct */
-	struct pvclock_clock raw_clock; /* extract of a clocksource struct */
-
-	ktime_t offs_boot;
-	u64 wall_time_sec;
-};
-
-static struct pvclock_gtod_data pvclock_gtod_data;
-
-static void update_pvclock_gtod(struct timekeeper *tk)
-{
-	struct pvclock_gtod_data *vdata = &pvclock_gtod_data;
-
-	write_seqcount_begin(&vdata->seq);
-
-	/* copy pvclock gtod data */
-	vdata->clock.vclock_mode = tk->tkr_mono.clock->vdso_clock_mode;
-	vdata->clock.cycle_last = tk->tkr_mono.cycle_last;
-	vdata->clock.mask = tk->tkr_mono.mask;
-	vdata->clock.mult = tk->tkr_mono.mult;
-	vdata->clock.shift = tk->tkr_mono.shift;
-	vdata->clock.base_cycles = tk->tkr_mono.xtime_nsec;
-	vdata->clock.offset = tk->tkr_mono.base;
-
-	vdata->raw_clock.vclock_mode = tk->tkr_raw.clock->vdso_clock_mode;
-	vdata->raw_clock.cycle_last = tk->tkr_raw.cycle_last;
-	vdata->raw_clock.mask = tk->tkr_raw.mask;
-	vdata->raw_clock.mult = tk->tkr_raw.mult;
-	vdata->raw_clock.shift = tk->tkr_raw.shift;
-	vdata->raw_clock.base_cycles = tk->tkr_raw.xtime_nsec;
-	vdata->raw_clock.offset = tk->tkr_raw.base;
-
-	vdata->wall_time_sec = tk->xtime_sec;
-
-	vdata->offs_boot = tk->offs_boot;
-
-	write_seqcount_end(&vdata->seq);
-}
-
 static s64 get_kvmclock_base_ns(void)
 {
 	/* Count up from boot time, but with the frequency of the raw clock. */
@@ -3037,128 +2985,6 @@ static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 
 #ifdef CONFIG_X86_64
 
-static u64 read_tsc(void)
-{
-	u64 ret = (u64)rdtsc_ordered();
-	u64 last = pvclock_gtod_data.clock.cycle_last;
-
-	if (likely(ret >= last))
-		return ret;
-
-	/*
-	 * GCC likes to generate cmov here, but this branch is extremely
-	 * predictable (it's just a function of time and the likely is
-	 * very likely) and there's a data dependence, so force GCC
-	 * to generate a branch instead. I don't barrier() because
-	 * we don't actually need a barrier, and if this function
-	 * ever gets inlined it will generate worse code.
-	 */
-	asm volatile ("");
-	return last;
-}
-
-static inline u64 vgettsc(struct pvclock_clock *clock, u64 *tsc_timestamp,
-			  int *mode)
-{
-	u64 tsc_pg_val;
-	long v;
-
-	switch (clock->vclock_mode) {
-	case VDSO_CLOCKMODE_HVCLOCK:
-		if (hv_read_tsc_page_tsc(hv_get_tsc_page(),
-					 tsc_timestamp, &tsc_pg_val)) {
-			/* TSC page valid */
-			*mode = VDSO_CLOCKMODE_HVCLOCK;
-			v = (tsc_pg_val - clock->cycle_last) &
-				clock->mask;
-		} else {
-			/* TSC page invalid */
-			*mode = VDSO_CLOCKMODE_NONE;
-		}
-		break;
-	case VDSO_CLOCKMODE_TSC:
-		*mode = VDSO_CLOCKMODE_TSC;
-		*tsc_timestamp = read_tsc();
-		v = (*tsc_timestamp - clock->cycle_last) &
-			clock->mask;
-		break;
-	default:
-		*mode = VDSO_CLOCKMODE_NONE;
-	}
-
-	if (*mode == VDSO_CLOCKMODE_NONE)
-		*tsc_timestamp = v = 0;
-
-	return v * clock->mult;
-}
-
-/*
- * As with get_kvmclock_base_ns(), this counts from boot time, at the
- * frequency of CLOCK_MONOTONIC_RAW (hence adding gtos->offs_boot).
- */
-static int do_kvmclock_base(s64 *t, u64 *tsc_timestamp)
-{
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-	unsigned long seq;
-	int mode;
-	u64 ns;
-
-	do {
-		seq = read_seqcount_begin(&gtod->seq);
-		ns = gtod->raw_clock.base_cycles;
-		ns += vgettsc(&gtod->raw_clock, tsc_timestamp, &mode);
-		ns >>= gtod->raw_clock.shift;
-		ns += ktime_to_ns(ktime_add(gtod->raw_clock.offset, gtod->offs_boot));
-	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-	*t = ns;
-
-	return mode;
-}
-
-/*
- * This calculates CLOCK_MONOTONIC at the time of the TSC snapshot, with
- * no boot time offset.
- */
-static int do_monotonic(s64 *t, u64 *tsc_timestamp)
-{
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-	unsigned long seq;
-	int mode;
-	u64 ns;
-
-	do {
-		seq = read_seqcount_begin(&gtod->seq);
-		ns = gtod->clock.base_cycles;
-		ns += vgettsc(&gtod->clock, tsc_timestamp, &mode);
-		ns >>= gtod->clock.shift;
-		ns += ktime_to_ns(gtod->clock.offset);
-	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-	*t = ns;
-
-	return mode;
-}
-
-static int do_realtime(struct timespec64 *ts, u64 *tsc_timestamp)
-{
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-	unsigned long seq;
-	int mode;
-	u64 ns;
-
-	do {
-		seq = read_seqcount_begin(&gtod->seq);
-		ts->tv_sec = gtod->wall_time_sec;
-		ns = gtod->clock.base_cycles;
-		ns += vgettsc(&gtod->clock, tsc_timestamp, &mode);
-		ns >>= gtod->clock.shift;
-	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-
-	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
-	ts->tv_nsec = ns;
-
-	return mode;
-}
-
 /*
  * Calculates the kvmclock_base_ns (CLOCK_MONOTONIC_RAW + boot time) and
  * reports the TSC value from which it do so. Returns true if host is
@@ -6231,7 +6057,7 @@ static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu,
 		break;
 	}
 	case KVM_VCPU_TSC_SCALE:
-		r = -EINVAL; /* Read only */
+		r = kvm_caps.has_tsc_control ? -EINVAL : -ENXIO;
 		break;
 	default:
 		r = -ENXIO;
@@ -10405,7 +10231,6 @@ static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
 {
 	struct timekeeper *tk = priv;
 
-	update_pvclock_gtod(tk);
 
 #ifdef CONFIG_X86_64
 	kvm_host_has_tsc_clocksource =
diff --git a/tools/testing/selftests/kvm/x86/pvclock_test.c b/tools/testing/selftests/kvm/x86/pvclock_test.c
index aecd62fc8a93..4c1869fa482e 100644
--- a/tools/testing/selftests/kvm/x86/pvclock_test.c
+++ b/tools/testing/selftests/kvm/x86/pvclock_test.c
@@ -14,7 +14,6 @@
 #include "test_util.h"
 #include "kvm_util.h"
 #include "processor.h"
-#include "apic.h"
 
 #include
 
@@ -262,10 +261,12 @@ int main(int argc, char *argv[])
 	return 0;
 }
 
+static volatile uint32_t vcpu_counter;
+
 static void guest_code_stable_bit(void)
 {
-	uint32_t apic_id = GET_APIC_ID_FIELD(xapic_read_reg(APIC_ID));
-	uint64_t gpa = KVMCLOCK_GPA + apic_id * sizeof(struct pvclock_vcpu_time_info);
+	uint32_t idx = __atomic_fetch_add(&vcpu_counter, 1, __ATOMIC_SEQ_CST);
+	uint64_t gpa = KVMCLOCK_GPA + idx * sizeof(struct pvclock_vcpu_time_info);
 
 	wrmsr(MSR_KVM_SYSTEM_TIME_NEW, gpa | KVM_MSR_ENABLED);
 	GUEST_SYNC(0);
-- 
2.54.0