From nobody Mon Oct 6 11:55:46 2025
Date: Tue, 22 Jul 2025 14:50:28 +0900
In-Reply-To: <20250722055030.3126772-1-suleiman@google.com>
Message-Id: <20250722055030.3126772-2-suleiman@google.com>
References: <20250722055030.3126772-1-suleiman@google.com>
Subject: [PATCH v8 1/3] KVM: x86: Advance guest TSC after deep suspend.
From: Suleiman Souhlal
To: Paolo Bonzini, Sean Christopherson
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, "H. Peter Anvin", Chao Gao, David Woodhouse,
 Sergey Senozhatsky, Konrad Rzeszutek Wilk, Tzung-Bi Shih, John Stultz,
 kvm@vger.kernel.org, linux-kernel@vger.kernel.org, ssouhlal@freebsd.org,
 Suleiman Souhlal

Try to advance the guest TSC to the current time after a suspend during
which the host TSCs went backwards.

This makes the behavior consistent between suspends where the host TSC
resets and suspends where it doesn't, such as suspend-to-idle.
Previously, when the host TSC reset, the guests' TSCs would be "frozen"
due to KVM's backwards TSC prevention, whereas when it did not reset
they would advance.

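As a rough illustration of the three cases distinguished here, the
following is a minimal userspace sketch, not the kernel code. The struct
and every model_* name are hypothetical. It assumes no TSC scaling, takes
the host TSC reading as a parameter instead of reading it from the CPU,
and omits tsc_write_lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-VM TSC state. */
struct vm_tsc_model {
	uint64_t cur_tsc_offset;    /* offset shared by all vCPUs */
	int64_t kvmclock_offset;    /* added to host boot time to get guest time */
	uint32_t tsc_khz;           /* guest TSC frequency, no scaling modelled */
	bool host_was_suspended;    /* set on resume, cleared by the first vCPU */
};

/* Convert nanoseconds to TSC cycles without overflowing 64 bits. */
static uint64_t model_ns_to_cycles(uint32_t tsc_khz, uint64_t ns)
{
	return (ns / 1000000ULL) * tsc_khz +
	       (ns % 1000000ULL) * tsc_khz / 1000000ULL;
}

/*
 * Return the TSC offset a vCPU ends up with after the host TSC went
 * backwards. clock_ok models whether the host clock and TSC could be
 * read together; adj is the legacy "do not go backwards" adjustment.
 */
static uint64_t model_adjust_tsc_offset_host(struct vm_tsc_model *vm,
					     uint64_t vcpu_offset,
					     bool clock_ok,
					     uint64_t kernel_ns,
					     uint64_t host_tsc_now,
					     uint64_t adj)
{
	uint64_t offset;

	if (vm->host_was_suspended && clock_ok) {
		/* First vCPU after resume: recompute the offset from time. */
		uint64_t guest_tsc = model_ns_to_cycles(vm->tsc_khz,
				(uint64_t)vm->kvmclock_offset + kernel_ns);

		offset = guest_tsc - host_tsc_now;
		vm->cur_tsc_offset = offset;
	} else if (clock_ok) {
		/* Later vCPUs: reuse the offset so all vCPUs stay matched. */
		offset = vm->cur_tsc_offset;
	} else {
		/* Fallback: only keep the guest TSC from going backwards. */
		offset = vcpu_offset + adj;
	}
	vm->host_was_suspended = false;
	return offset;
}
```

In this model, with a 1 GHz guest TSC, a first vCPU that sees 5000 ns of
guest time against a host TSC of 100 gets an offset of 4900, and every
later vCPU reuses that same value.
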
Suggested-by: Sean Christopherson
Signed-off-by: Suleiman Souhlal
---
 arch/x86/include/asm/kvm_host.h |  3 ++
 arch/x86/kvm/x86.c              | 49 ++++++++++++++++++++++++++++++++-
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fb01e456b624..e57d51e9f2be 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1415,6 +1415,9 @@ struct kvm_arch {
 	u64 cur_tsc_offset;
 	u64 cur_tsc_generation;
 	int nr_vcpus_matched_tsc;
+#ifdef CONFIG_X86_64
+	bool host_was_suspended;
+#endif
 
 	u32 default_tsc_khz;
 	bool user_set_tsc;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a9d992d5652f..422c7fcc5d83 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2779,7 +2779,7 @@ static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
 	kvm_vcpu_write_tsc_offset(vcpu, tsc_offset + adjustment);
 }
 
-static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
+static inline void __adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 {
 	if (vcpu->arch.l1_tsc_scaling_ratio != kvm_caps.default_tsc_scaling_ratio)
 		WARN_ON(adjustment < 0);
@@ -4995,6 +4995,52 @@ static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)
 
 static DEFINE_PER_CPU(struct kvm_vcpu *, last_vcpu);
 
+#ifdef CONFIG_X86_64
+static void kvm_set_host_was_suspended(struct kvm *kvm)
+{
+	kvm->arch.host_was_suspended = true;
+}
+
+static void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, u64 adj)
+{
+	unsigned long flags;
+	struct kvm *kvm;
+	bool advance;
+	u64 kernel_ns, l1_tsc, offset, tsc_now;
+
+	kvm = vcpu->kvm;
+	advance = kvm_get_time_and_clockread(&kernel_ns, &tsc_now);
+	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
+	/*
+	 * Advance the guest's TSC to current time instead of only preventing
+	 * it from going backwards, while making sure all the vCPUs use the
+	 * same offset.
+	 */
+	if (kvm->arch.host_was_suspended && advance) {
+		l1_tsc = nsec_to_cycles(vcpu,
+					kvm->arch.kvmclock_offset + kernel_ns);
+		offset = kvm_compute_l1_tsc_offset(vcpu, l1_tsc);
+		kvm->arch.cur_tsc_offset = offset;
+		kvm_vcpu_write_tsc_offset(vcpu, offset);
+	} else if (advance) {
+		kvm_vcpu_write_tsc_offset(vcpu, kvm->arch.cur_tsc_offset);
+	} else {
+		__adjust_tsc_offset_host(vcpu, adj);
+	}
+	kvm->arch.host_was_suspended = false;
+	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
+}
+#else
+static void kvm_set_host_was_suspended(struct kvm *kvm)
+{
+}
+
+static void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, u64 adj)
+{
+	__adjust_tsc_offset_host(vcpu, adj);
+}
+#endif /* CONFIG_X86_64 */
+
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -12729,6 +12775,7 @@ int kvm_arch_enable_virtualization_cpu(void)
 				kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
 			}
 
+			kvm_set_host_was_suspended(kvm);
 			/*
 			 * We have to disable TSC offset matching.. if you were
 			 * booting a VM while issuing an S4 host suspend....
-- 
2.50.0.727.gbf7dc18ff4-goog

From nobody Mon Oct 6 11:55:46 2025
Date: Tue, 22 Jul 2025 14:50:29 +0900
In-Reply-To: <20250722055030.3126772-1-suleiman@google.com>
Message-Id: <20250722055030.3126772-3-suleiman@google.com>
References: <20250722055030.3126772-1-suleiman@google.com>
Subject: [PATCH v8 2/3] KVM: x86: Include host suspended duration in steal time
From: Suleiman Souhlal
To: Paolo Bonzini, Sean Christopherson
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, "H. Peter Anvin", Chao Gao, David Woodhouse,
 Sergey Senozhatsky, Konrad Rzeszutek Wilk, Tzung-Bi Shih, John Stultz,
 kvm@vger.kernel.org, linux-kernel@vger.kernel.org, ssouhlal@freebsd.org,
 Suleiman Souhlal

Introduce MSR_KVM_SUSPEND_STEAL, which controls whether a guest wants
the duration of host suspend to be included in steal time.

This lets guests subtract the time the host spent suspended from the
runtime of tasks that were running across the suspend. Without it, a
host suspend makes guest tasks appear to have run for a long time even
though their effective runtime was much shorter.

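The accounting flow can be sketched in userspace as follows. This is a
simplified model under stated assumptions, not the kernel code: the
struct and every model_* name are hypothetical, C11 atomics stand in for
atomic64_t, and timestamps are passed in rather than read from the boot
clock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU state, loosely mirroring the fields added here. */
struct vcpu_steal_model {
	bool suspend_steal_enabled;   /* bit 0 of MSR_KVM_SUSPEND_STEAL */
	uint64_t suspend_ts;          /* boot-time ns at suspend, 0 otherwise */
	_Atomic uint64_t suspend_ns;  /* suspend time not yet reported */
	uint64_t last_run_delay;      /* last sampled scheduler run delay */
	uint64_t steal;               /* value the guest would read */
};

/* Suspend notifier: remember when the host went down, if the guest opted in. */
static void model_suspend(struct vcpu_steal_model *v, uint64_t boottime_ns)
{
	if (v->suspend_steal_enabled)
		v->suspend_ts = boottime_ns;
}

/* Resume notifier: always clear suspend_ts, accumulate only if enabled. */
static void model_resume(struct vcpu_steal_model *v, uint64_t boottime_ns)
{
	uint64_t suspended = boottime_ns - v->suspend_ts;

	v->suspend_ts = 0;
	if (v->suspend_steal_enabled)
		atomic_fetch_add(&v->suspend_ns, suspended);
}

/* A vCPU is held back while a suspend timestamp is pending. */
static bool model_vcpu_may_run(const struct vcpu_steal_model *v)
{
	return v->suspend_ts == 0;
}

/* Steal update: scheduler run delay plus any pending suspend time. */
static void model_record_steal_time(struct vcpu_steal_model *v,
				    uint64_t run_delay)
{
	v->steal += run_delay - v->last_run_delay;
	v->last_run_delay = run_delay;
	/* Take the pending suspend time exactly once. */
	v->steal += atomic_exchange(&v->suspend_ns, 0);
}
```

The exchange-with-zero in the last function is what keeps a given
suspend from being reported twice: a second steal update with no new run
delay leaves the reported value unchanged.
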
Signed-off-by: Suleiman Souhlal
---
 Documentation/virt/kvm/x86/cpuid.rst |  4 ++
 Documentation/virt/kvm/x86/msr.rst   | 14 +++++
 arch/x86/include/asm/kvm_host.h      |  3 ++
 arch/x86/include/uapi/asm/kvm_para.h |  2 +
 arch/x86/kvm/cpuid.c                 |  4 +-
 arch/x86/kvm/x86.c                   | 80 ++++++++++++++++++++++++++--
 6 files changed, 101 insertions(+), 6 deletions(-)

diff --git a/Documentation/virt/kvm/x86/cpuid.rst b/Documentation/virt/kvm/x86/cpuid.rst
index bda3e3e737d7..71b42b649973 100644
--- a/Documentation/virt/kvm/x86/cpuid.rst
+++ b/Documentation/virt/kvm/x86/cpuid.rst
@@ -103,6 +103,10 @@ KVM_FEATURE_HC_MAP_GPA_RANGE       16          guest checks this feature bit bef
 KVM_FEATURE_MIGRATION_CONTROL      17          guest checks this feature bit before
                                                using MSR_KVM_MIGRATION_CONTROL
 
+KVM_FEATURE_SUSPEND_STEAL          18          guest checks this feature bit
+                                               before using
+                                               MSR_KVM_SUSPEND_STEAL.
+
 KVM_FEATURE_CLOCKSOURCE_STABLE_BIT 24          host will warn if no guest-side
                                                per-cpu warps are expected in
                                                kvmclock
diff --git a/Documentation/virt/kvm/x86/msr.rst b/Documentation/virt/kvm/x86/msr.rst
index 3aecf2a70e7b..7c33f9ee11f5 100644
--- a/Documentation/virt/kvm/x86/msr.rst
+++ b/Documentation/virt/kvm/x86/msr.rst
@@ -296,6 +296,12 @@ data:
 		the amount of time in which this vCPU did not run, in
 		nanoseconds. Time during which the vcpu is idle, will not be
 		reported as steal time.
+		If the guest set the enable bit in MSR_KVM_SUSPEND_STEAL,
+		steal time includes the duration during which the host is
+		suspended. The case where the host suspends during a VM
+		migration might not be accounted if VCPUs aren't entered
+		post-resume. A workaround would be for the VMM to ensure that
+		the guest is entered with KVM_RUN after resuming from suspend.
 
 	preempted:
 		indicate the vCPU who owns this struct is running or
@@ -388,3 +394,11 @@ data:
 	guest is communicating page encryption status to the host using the
 	``KVM_HC_MAP_GPA_RANGE`` hypercall, it can set bit 0 in this MSR to
 	allow live migration of the guest.
+
+MSR_KVM_SUSPEND_STEAL:
+	0x4b564d09
+
+data:
+	This MSR is available if KVM_FEATURE_SUSPEND_STEAL is present in
+	CPUID. Bit 0 controls whether the host should include the duration it
+	has been suspended in steal time (1), or not (0).
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e57d51e9f2be..eeb63852f062 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -933,6 +933,8 @@ struct kvm_vcpu_arch {
 		u8 preempted;
 		u64 msr_val;
 		u64 last_steal;
+		u64 suspend_ts;
+		atomic64_t suspend_ns;
 		struct gfn_to_hva_cache cache;
 	} st;
 
@@ -1029,6 +1031,7 @@ struct kvm_vcpu_arch {
 	} pv_eoi;
 
 	u64 msr_kvm_poll_control;
+	u64 msr_kvm_suspend_steal;
 
 	/* pv related host specific info */
 	struct {
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index a1efa7907a0b..678ebc3d7eeb 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -36,6 +36,7 @@
 #define KVM_FEATURE_MSI_EXT_DEST_ID	15
 #define KVM_FEATURE_HC_MAP_GPA_RANGE	16
 #define KVM_FEATURE_MIGRATION_CONTROL	17
+#define KVM_FEATURE_SUSPEND_STEAL	18
 
 #define KVM_HINTS_REALTIME      0
 
@@ -58,6 +59,7 @@
 #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
 #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
 #define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
+#define MSR_KVM_SUSPEND_STEAL	0x4b564d09
 
 struct kvm_steal_time {
 	__u64 steal;
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index f84bc0569c9c..2ba85f208f87 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1621,8 +1621,10 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
 			     (1 << KVM_FEATURE_ASYNC_PF_INT);
 
-		if (sched_info_on())
+		if (sched_info_on()) {
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
+			entry->eax |= (1 << KVM_FEATURE_SUSPEND_STEAL);
+		}
 
 		entry->ebx = 0;
 		entry->ecx = 0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 422c7fcc5d83..655fc85ab942 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3753,6 +3753,8 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 	steal += current->sched_info.run_delay -
 		vcpu->arch.st.last_steal;
 	vcpu->arch.st.last_steal = current->sched_info.run_delay;
+	if (unlikely(atomic64_read(&vcpu->arch.st.suspend_ns)))
+		steal += atomic64_xchg(&vcpu->arch.st.suspend_ns, 0);
 	unsafe_put_user(steal, &st->steal, out);
 
 	version += 1;
@@ -4058,6 +4060,17 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		vcpu->arch.msr_kvm_poll_control = data;
 		break;
 
+	case MSR_KVM_SUSPEND_STEAL:
+		if (!guest_pv_has(vcpu, KVM_FEATURE_SUSPEND_STEAL) ||
+		    !guest_pv_has(vcpu, KVM_FEATURE_STEAL_TIME))
+			return 1;
+
+		if (!(data & KVM_MSR_ENABLED))
+			return 1;
+
+		vcpu->arch.msr_kvm_suspend_steal = data;
+		break;
+
 	case MSR_IA32_MCG_CTL:
 	case MSR_IA32_MCG_STATUS:
 	case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
@@ -4404,6 +4417,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		msr_info->data = vcpu->arch.msr_kvm_poll_control;
 		break;
+	case MSR_KVM_SUSPEND_STEAL:
+		if (!guest_pv_has(vcpu, KVM_FEATURE_SUSPEND_STEAL))
+			return 1;
+		msr_info->data = vcpu->arch.msr_kvm_suspend_steal;
+		break;
 	case MSR_IA32_P5_MC_ADDR:
 	case MSR_IA32_P5_MC_TYPE:
 	case MSR_IA32_MCG_CAP:
@@ -7027,13 +7045,52 @@ static int kvm_arch_suspend_notifier(struct kvm *kvm)
 {
 	struct kvm_vcpu *vcpu;
 	unsigned long i;
+	bool kick_vcpus = false;
 
-	/*
-	 * Ignore the return, marking the guest paused only "fails" if the vCPU
-	 * isn't using kvmclock; continuing on is correct and desirable.
-	 */
-	kvm_for_each_vcpu(i, vcpu, kvm)
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (vcpu->arch.msr_kvm_suspend_steal & KVM_MSR_ENABLED) {
+			kick_vcpus = true;
+			WRITE_ONCE(vcpu->arch.st.suspend_ts,
+				   ktime_get_boottime_ns());
+		}
+		/*
+		 * Ignore the return, marking the guest paused only "fails" if
+		 * the vCPU isn't using kvmclock; continuing on is correct and
+		 * desirable.
+		 */
 		(void)kvm_set_guest_paused(vcpu);
+	}
+
+	if (kick_vcpus)
+		kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
+
+	return NOTIFY_DONE;
+}
+
+static int
+kvm_arch_resume_notifier(struct kvm *kvm)
+{
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		u64 suspend_ns = ktime_get_boottime_ns() -
+				 vcpu->arch.st.suspend_ts;
+
+		WRITE_ONCE(vcpu->arch.st.suspend_ts, 0);
+
+		/*
+		 * Only accumulate the suspend time if suspend steal-time is
+		 * enabled, but always clear suspend_ts and kick the vCPU as
+		 * the vCPU could have disabled suspend steal-time after the
+		 * suspend notifier grabbed suspend_ts.
+		 */
+		if (vcpu->arch.msr_kvm_suspend_steal & KVM_MSR_ENABLED)
+			atomic64_add(suspend_ns, &vcpu->arch.st.suspend_ns);
+
+		kvm_make_request(KVM_REQ_STEAL_UPDATE, vcpu);
+		kvm_vcpu_kick(vcpu);
+	}
 
 	return NOTIFY_DONE;
 }
@@ -7044,6 +7101,9 @@ int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state)
 	case PM_HIBERNATION_PREPARE:
 	case PM_SUSPEND_PREPARE:
 		return kvm_arch_suspend_notifier(kvm);
+	case PM_POST_HIBERNATION:
+	case PM_POST_SUSPEND:
+		return kvm_arch_resume_notifier(kvm);
 	}
 
 	return NOTIFY_DONE;
@@ -11233,6 +11293,16 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 static bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * During host SUSPEND/RESUME tasks get frozen after SUSPEND notifiers
+	 * run, and thawed before RESUME notifiers, i.e. vCPUs can be actively
+	 * running when KVM sees the system as suspended.  Block the vCPU if
+	 * KVM sees the vCPU as suspended to ensure the suspend steal time is
+	 * accounted before the guest can run, and to the correct guest task.
+	 */
+	if (READ_ONCE(vcpu->arch.st.suspend_ts))
+		return false;
+
 	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
 		!vcpu->arch.apf.halted);
 }
-- 
2.50.0.727.gbf7dc18ff4-goog

From nobody Mon Oct 6 11:55:46 2025
Date: Tue, 22 Jul 2025 14:50:30 +0900
In-Reply-To: <20250722055030.3126772-1-suleiman@google.com>
Message-Id: <20250722055030.3126772-4-suleiman@google.com>
References: <20250722055030.3126772-1-suleiman@google.com>
Subject: [PATCH v8 3/3] KVM: x86: Add "suspendsteal" cmdline to request host to add suspend duration in steal time
From: Suleiman Souhlal
To: Paolo Bonzini, Sean Christopherson
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, "H. Peter Anvin", Chao Gao, David Woodhouse,
 Sergey Senozhatsky, Konrad Rzeszutek Wilk, Tzung-Bi Shih, John Stultz,
 kvm@vger.kernel.org, linux-kernel@vger.kernel.org, ssouhlal@freebsd.org,
 Suleiman Souhlal

Introduce a new command line parameter, "suspendsteal", enabling the
guest to use MSR_KVM_SUSPEND_STEAL, which tells the host that it would
like host suspend duration to be included in steal time.

Signed-off-by: Suleiman Souhlal
---
 Documentation/admin-guide/kernel-parameters.txt |  5 +++++
 arch/x86/kernel/kvm.c                           | 15 +++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 07e22ba5bfe3..9a5490539bb2 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -7074,6 +7074,11 @@
 			improve throughput, but will also increase the
 			amount of memory reserved for use by the client.
 
+	suspendsteal
+			[X86,PV_OPS]
+			Enable requesting the host to include the duration the
+			host was suspended in steal time. Disabled by default.
+
 	suspend.pm_test_delay=
 			[SUSPEND]
 			Sets the number of seconds to remain in a suspend test
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 921c1c783bc1..35d1bb2283c2 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -320,6 +320,18 @@ static void __init paravirt_ops_setup(void)
 #endif
 }
 
+static bool suspend_steal;
+
+static int __init suspendsteal_setup(char *s)
+{
+	if (kvm_para_has_feature(KVM_FEATURE_SUSPEND_STEAL))
+		suspend_steal = true;
+
+	return 0;
+}
+
+early_param("suspendsteal", suspendsteal_setup);
+
 static void kvm_register_steal_time(void)
 {
 	int cpu = smp_processor_id();
@@ -331,6 +343,9 @@ static void kvm_register_steal_time(void)
 	wrmsrq(MSR_KVM_STEAL_TIME, (slow_virt_to_phys(st) | KVM_MSR_ENABLED));
 	pr_debug("stealtime: cpu %d, msr %llx\n", cpu,
 		(unsigned long long) slow_virt_to_phys(st));
+
+	if (suspend_steal)
+		wrmsrl(MSR_KVM_SUSPEND_STEAL, KVM_MSR_ENABLED);
 }
 
 static DEFINE_PER_CPU_DECRYPTED(unsigned long, kvm_apic_eoi) = KVM_PV_EOI_DISABLED;
-- 
2.50.0.727.gbf7dc18ff4-goog