From nobody Sun Feb  8 16:26:02 2026
Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com
 [209.85.216.73])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id DC687236457
	for <linux-kernel@vger.kernel.org>; Fri, 15 Aug 2025 00:26:21 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.216.73
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1755217583; cv=none;
 b=Em7XFBgT+LSe4nQP3mW73krDt73oX5VOZfJnXRAzAH5BJ/cUjS1JHarbt6SnkNXBFXFkCowwuo+plWUOesbMzT0Dv6RWlTVY03P6RP5/ZsHbxGxOrBnEXlxgQ9SRa1eo9wDXZOYlsjSydR4/ne4Kxc8uVwtlTKNkt8Sg+l/1j6E=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1755217583; c=relaxed/simple;
	bh=tSxnN4InUIVD7D/q15JOgDmCPiIY9B6B5sarM+K8O58=;
	h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From:
	 To:Cc:Content-Type;
 b=GHlurCFAmo+PfL9v+LuID977yGD7tYHmqB0djQiFnicYEj3xyffDdvMeH7Edb9YqyKTul6HKptmo/49C3RZPFpRgg4HvO2ctrwLG9b7qr0ZD6LrlSb75AoXmcEeEXZ95FQzPG9jhVaMua5SJrh/XuL2g1983w/75NFrXsPTG5Zo=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com;
 spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com;
 dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b=wNxUYaPE; arc=none smtp.client-ip=209.85.216.73
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com
 header.b="wNxUYaPE"
Received: by mail-pj1-f73.google.com with SMTP id
 98e67ed59e1d1-323267c0292so1438706a91.1
        for <linux-kernel@vger.kernel.org>;
 Thu, 14 Aug 2025 17:26:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1755217581; x=1755822381;
 darn=vger.kernel.org;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:reply-to:from:to:cc:subject:date:message-id:reply-to;
        bh=RmdRRSXfWYLt7Y3oCZwldZc/EeDUMdt6fP/F4SZL2Q4=;
        b=wNxUYaPEF+rc58WhsT02GfuR1sjAndO6qC5hTfxpeAze8hye5mwObXtR2IvK9n6D2T
         RrJctjUYa4SgZB2kJCK/6Z0bS93Gh8Wd3PnrXHTJ1PksPWm7SXFNp5JAm02EU5vbB2Nj
         0PqdfFbsykbuC4mjuW1x7gH6+g3gNpGfKRGYTB21h+bB+KWTJXtCXrDzb+A7eLyFnDcO
         ioRFjNr67DjnmyoKRlyFm/jcx3nt1g9pQL/eQKn/FNX7BpazkYByjNW0aW2LDoY5oJEr
         T2ymQ+dUz0Nl2wcyrjJQZ4QsxJpkYEFBoWQNW0j4nFyM69bJjH3i1hBl5VI2OTyEqb2C
         27MQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1755217581; x=1755822381;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=RmdRRSXfWYLt7Y3oCZwldZc/EeDUMdt6fP/F4SZL2Q4=;
        b=Pq9j6c/Dr0sOqvA3nrPYJLa69L4c9fq630UcbtRxVeAFc6cbslkevJjjVRLdsUVlkN
         bzwAzXncM4SjtqjGcqLkjYMCmvz7S65WvJ1gmz0owZ/AYNx3TwffKYa/UyYWERngsWb5
         +8uB2kTCG904L3Mwoh7T5xBA5K2cRqfEbUFoAnMRK/yR9Q4cil3r7gxh3WXtb9Lu/NOZ
         wUTTGL+hi3YNJr2WM4/iOpoeUDHTSwGtsMuzDwGZ8UCsyrPdP/i8lI6dP30+RzLmpC5f
         IsoHswxGt973Ykc8gbUAO+jWq74/ehrdfBM7MWnVRnJ39kcfaGIGLQ496WZlpbzBCmgH
         M8uA==
X-Forwarded-Encrypted: i=1;
 AJvYcCVzt1W/IIXlIf6L/1vLmE2RPSZluG1OcJQ3ou7SuT2xd1/7ct2hDV2PAlzhH8IKr/f37MYib3xC2fAcUjg=@vger.kernel.org
X-Gm-Message-State: AOJu0YxoCuRnmxMo1ahDsda9ydXEzaibP8vMY/mDB0QpMfs0ljCosFrd
	6pfeB5Ak4eHwagxiHzQ914HRIuso996nVB99GuVKnrPxZKNkL7YFkld5UbUM0v1hktBhxbL3cHQ
	6mTEKCg==
X-Google-Smtp-Source: 
 AGHT+IEpakHRIzRHWBrSdnkOTo2ICcrcknc/nqDyv63mqLeYibZ2NJIczCRHKLZvXXoQ5sAiPqejhAFCClw=
X-Received: from pjv12.prod.google.com ([2002:a17:90b:564c:b0:312:1900:72e2])
 (user=seanjc job=prod-delivery.src-stubby-dispatcher) by
 2002:a17:90b:4f81:b0:31f:36da:3f85
 with SMTP id 98e67ed59e1d1-32342227b18mr295391a91.17.1755217581357; Thu, 14
 Aug 2025 17:26:21 -0700 (PDT)
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Thu, 14 Aug 2025 17:25:40 -0700
In-Reply-To: <20250815002540.2375664-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20250815002540.2375664-1-seanjc@google.com>
X-Mailer: git-send-email 2.51.0.rc1.163.g2494970778-goog
Message-ID: <20250815002540.2375664-21-seanjc@google.com>
Subject: [PATCH 6.6.y 20/20] KVM: VMX: Preserve host's
 DEBUGCTLMSR_FREEZE_IN_SMM
 while running the guest
From: Sean Christopherson <seanjc@google.com>
To: stable@vger.kernel.org, Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Sasha Levin <sashal@kernel.org>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Paolo Bonzini <pbonzini@redhat.com>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Maxim Levitsky <mlevitsk@redhat.com>

[ Upstream commit 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a ]

Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
while running the guest.  When running with the "default treatment of SMIs"
in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
is visible to host (non-SMM) software, and instead transition the CPU
directly from VMX non-root to SMM.  And critically, DEBUGCTL isn't context
switched by hardware on SMI or RSM, i.e. SMM will run with whatever value
was resident in hardware at the time of the SMI.

Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
events while the CPU is executing in SMM, which can pollute profiling and
potentially leak information into the guest.

Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
function call), by way of /sys/devices/cpu/freeze_on_smi.

Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
i.e. explicitly checking FREEZE_IN_SMM in common x86 code is at best weird,
and at worst could lead to undesirable behavior in the future if AMD CPUs
ever assign a different meaning to the same bit.

Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
owns and controls GUEST_IA32_DEBUGCTL.

WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).

Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
load_vmcs12_host_state().  For the VM-Fail case, even though KVM won't
have actually entered the guest, vcpu_enter_guest() will have run with
vmcs02 active and thus could result in vmcs01 being run with a stale value.

Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
[sean: move vmx/main.c change to vmx/vmx.c]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  7 +++++++
 arch/x86/kvm/vmx/nested.c       |  3 +++
 arch/x86/kvm/vmx/vmx.c          |  5 +++++
 arch/x86/kvm/vmx/vmx.h          | 15 ++++++++++++++-
 arch/x86/kvm/x86.c              | 14 ++++++++++++--
 5 files changed, 41 insertions(+), 3 deletions(-)
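
The bit handling added to vmx.h can be modeled in isolation.  The block
below is a minimal user-space sketch, not kernel code: the helper names
merge_debugctl(), guest_view() and needs_reload() are hypothetical, and
FREEZE_IN_SMM is assumed to be bit 14 of IA32_DEBUGCTL, matching the
kernel's DEBUGCTLMSR_FREEZE_IN_SMM definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 14 of IA32_DEBUGCTL, as in DEBUGCTLMSR_FREEZE_IN_SMM. */
#define FREEZE_IN_SMM (1ULL << 14)

/*
 * Value to store in GUEST_IA32_DEBUGCTL: the guest's bits, plus the
 * host's current FREEZE_IN_SMM setting (hypothetical helper).
 */
static uint64_t merge_debugctl(uint64_t guest, uint64_t host)
{
	return (guest & ~FREEZE_IN_SMM) | (host & FREEZE_IN_SMM);
}

/* Value exposed to the guest: the host-owned bit is hidden. */
static uint64_t guest_view(uint64_t vmcs)
{
	return vmcs & ~FREEZE_IN_SMM;
}

/* True if the host's bit differs from what the VMCS field holds. */
static bool needs_reload(uint64_t vmcs, uint64_t host)
{
	return (vmcs ^ host) & FREEZE_IN_SMM;
}
```

In this model, with the host bit set and a guest value of 0x1,
merge_debugctl() yields 0x4001, guest_view() of that value returns 0x1,
and needs_reload() reports true only once the host's bit no longer
matches the stored value.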

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos=
t.h
index 7373b22c02a7..813887324d52 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1553,6 +1553,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_m=
ode_logical)
 enum kvm_x86_run_flags {
 	KVM_RUN_FORCE_IMMEDIATE_EXIT	=3D BIT(0),
 	KVM_RUN_LOAD_GUEST_DR6		=3D BIT(1),
+	KVM_RUN_LOAD_DEBUGCTL		=3D BIT(2),
 };
=20
 struct kvm_x86_ops {
@@ -1580,6 +1581,12 @@ struct kvm_x86_ops {
 	void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
 	void (*vcpu_put)(struct kvm_vcpu *vcpu);
=20
+	/*
+	 * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to
+	 * match the host's value even while the guest is active.
+	 */
+	const u64 HOST_OWNED_DEBUGCTL;
+
 	void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
 	int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
 	int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2ce39ffbcefb..d2fa192d7ce7 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4688,6 +4688,9 @@ static void nested_vmx_restore_host_state(struct kvm_=
vcpu *vcpu)
 			WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7)));
 	}
=20
+	/* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */
+	vmx_reload_guest_debugctl(vcpu);
+
 	/*
 	 * Note that calling vmx_set_{efer,cr0,cr4} is important as they
 	 * handle a variety of side effects to KVM's software model.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d0973bd7853c..9b1f22bcb716 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7399,6 +7399,9 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu,=
 u64 run_flags)
 	if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
 		set_debugreg(vcpu->arch.dr6, 6);
=20
+	if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
+		vmx_reload_guest_debugctl(vcpu);
+
 	/*
 	 * Refresh vmcs.HOST_CR3 if necessary.  This must be done immediately
 	 * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
@@ -8326,6 +8329,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata =3D {
 	.vcpu_load =3D vmx_vcpu_load,
 	.vcpu_put =3D vmx_vcpu_put,
=20
+	.HOST_OWNED_DEBUGCTL =3D DEBUGCTLMSR_FREEZE_IN_SMM,
+
 	.update_exception_bitmap =3D vmx_update_exception_bitmap,
 	.get_msr_feature =3D vmx_get_msr_feature,
 	.get_msr =3D vmx_get_msr,
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 769e70fd142c..5d73d3e570d7 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -434,12 +434,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64=
 data, bool host_initiated)
=20
 static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
 {
+	WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM);
+
+	val |=3D vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
 	vmcs_write64(GUEST_IA32_DEBUGCTL, val);
 }
=20
 static inline u64 vmx_guest_debugctl_read(void)
 {
-	return vmcs_read64(GUEST_IA32_DEBUGCTL);
+	return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM;
+}
+
+static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu)
+{
+	u64 val =3D vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+	if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM))
+		return;
+
+	vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM);
 }
=20
 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 07207d8126b4..af0b2b3bc991 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10518,7 +10518,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		dm_request_for_irq_injection(vcpu) &&
 		kvm_cpu_accept_dm_intr(vcpu);
 	fastpath_t exit_fastpath;
-	u64 run_flags;
+	u64 run_flags, debug_ctl;
=20
 	bool req_immediate_exit =3D false;
=20
@@ -10777,7 +10777,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		set_debugreg(0, 7);
 	}
=20
-	vcpu->arch.host_debugctl =3D get_debugctlmsr();
+	/*
+	 * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL
+	 * can be modified in IRQ context, e.g. via SMP function calls.  Inform
+	 * vendor code if any host-owned bits were changed, e.g. so that the
+	 * value loaded into hardware while running the guest can be updated.
+	 */
+	debug_ctl =3D get_debugctlmsr();
+	if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUG=
CTL &&
+	    !vcpu->arch.guest_state_protected)
+		run_flags |=3D KVM_RUN_LOAD_DEBUGCTL;
+	vcpu->arch.host_debugctl =3D debug_ctl;
=20
 	guest_timing_enter_irqoff();
=20
--=20
2.51.0.rc1.163.g2494970778-goog