From nobody Sat Oct 4 16:20:46 2025
Reply-To: Sean Christopherson
Date: Thu, 14 Aug 2025 17:57:25 -0700
In-Reply-To:
 <20250815005725.2386187-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20250815005725.2386187-1-seanjc@google.com>
X-Mailer: git-send-email 2.51.0.rc1.163.g2494970778-goog
Message-ID: <20250815005725.2386187-8-seanjc@google.com>
Subject: [PATCH 6.12.y 7/7] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest
From: Sean Christopherson
To: stable@vger.kernel.org, Greg Kroah-Hartman, Sasha Levin
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Paolo Bonzini
Content-Type: text/plain; charset="utf-8"

From: Maxim Levitsky

[ Upstream commit 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a ]

Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
while running the guest. When running with the "default treatment of SMIs"
in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
is visible to host (non-SMM) software, and instead transitions directly
from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched
by hardware on SMI or RSM, i.e. SMM will run with whatever value was
resident in hardware at the time of the SMI.

Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
events while the CPU is executing in SMM, which can pollute profiling and
potentially leak information into the guest.

Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
function call), by way of /sys/devices/cpu/freeze_on_smi.

Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and
at worst could lead to undesirable behavior in the future if AMD CPUs ever
happened to pick up a collision with the bit.

Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
owns and controls GUEST_IA32_DEBUGCTL.

WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).

Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
load_vmcs12_host_state().

For the VM-Fail case, even though KVM won't have actually entered the
guest, vcpu_enter_guest() will have run with vmcs02 active and thus could
result in vmcs01 being run with a stale value.

Cc: stable@vger.kernel.org
Signed-off-by: Maxim Levitsky
Co-developed-by: Sean Christopherson
Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com
Signed-off-by: Sean Christopherson
[sean: resolve syntactic conflict in vt_x86_ops definition]
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  7 +++++++
 arch/x86/kvm/vmx/main.c         |  2 ++
 arch/x86/kvm/vmx/nested.c       |  3 +++
 arch/x86/kvm/vmx/vmx.c          |  3 +++
 arch/x86/kvm/vmx/vmx.h          | 15 ++++++++++++++-
 arch/x86/kvm/x86.c              | 14 ++++++++++++--
 6 files changed, 41 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 2ed05925d9d5..d27df86aa62c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1630,6 +1630,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
 enum kvm_x86_run_flags {
 	KVM_RUN_FORCE_IMMEDIATE_EXIT	= BIT(0),
 	KVM_RUN_LOAD_GUEST_DR6		= BIT(1),
+	KVM_RUN_LOAD_DEBUGCTL		= BIT(2),
 };
 
 struct kvm_x86_ops {
@@ -1659,6 +1660,12 @@ struct kvm_x86_ops {
 	void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
 	void (*vcpu_put)(struct kvm_vcpu *vcpu);
 
+	/*
+	 * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to
+	 * match the host's value even while the guest is active.
+	 */
+	const u64 HOST_OWNED_DEBUGCTL;
+
 	void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
 	int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
 	int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 7668e2fb8043..3f83e36a657b 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -42,6 +42,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.vcpu_load = vmx_vcpu_load,
 	.vcpu_put = vmx_vcpu_put,
 
+	.HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM,
+
 	.update_exception_bitmap = vmx_update_exception_bitmap,
 	.get_feature_msr = vmx_get_feature_msr,
 	.get_msr = vmx_get_msr,
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 9a336f661fc6..60bd2791d933 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4829,6 +4829,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
 			WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7)));
 	}
 
+	/* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */
+	vmx_reload_guest_debugctl(vcpu);
+
 	/*
 	 * Note that calling vmx_set_{efer,cr0,cr4} is important as they
 	 * handle a variety of side effects to KVM's software model.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4bb25519e7ce..6c185a260c5b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7407,6 +7407,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 	if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
 		set_debugreg(vcpu->arch.dr6, 6);
 
+	if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
+		vmx_reload_guest_debugctl(vcpu);
+
 	/*
 	 * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
 	 * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 5b2c5cb5e32e..a7e2de50d27f 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -440,12 +440,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
 
 static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
 {
+	WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM);
+
+	val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
 	vmcs_write64(GUEST_IA32_DEBUGCTL, val);
 }
 
 static inline u64 vmx_guest_debugctl_read(void)
 {
-	return vmcs_read64(GUEST_IA32_DEBUGCTL);
+	return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM;
+}
+
+static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu)
+{
+	u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+	if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM))
+		return;
+
+	vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM);
 }
 
 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7beea8fb6ea6..dbd295ef3eba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10711,7 +10711,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		dm_request_for_irq_injection(vcpu) &&
 		kvm_cpu_accept_dm_intr(vcpu);
 	fastpath_t exit_fastpath;
-	u64 run_flags;
+	u64 run_flags, debug_ctl;
 
 	bool req_immediate_exit = false;
 
@@ -10982,7 +10982,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		set_debugreg(DR7_FIXED_1, 7);
 	}
 
-	vcpu->arch.host_debugctl = get_debugctlmsr();
+	/*
+	 * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL
+	 * can be modified in IRQ context, e.g. via SMP function calls. Inform
+	 * vendor code if any host-owned bits were changed, e.g. so that the
+	 * value loaded into hardware while running the guest can be updated.
+	 */
+	debug_ctl = get_debugctlmsr();
+	if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL &&
+	    !vcpu->arch.guest_state_protected)
+		run_flags |= KVM_RUN_LOAD_DEBUGCTL;
+	vcpu->arch.host_debugctl = debug_ctl;
 
 	guest_timing_enter_irqoff();
 
-- 
2.51.0.rc1.163.g2494970778-goog
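
For readers who want to reason about the bit manipulation outside the kernel, the following is a minimal userspace sketch of the three operations the patch relies on: detecting a change in host-owned bits via XOR and mask, merging the host's FREEZE_IN_SMM bit into the value written to the VMCS, and hiding that bit on read. It is not the kernel code. The helper names (host_owned_bits_changed, merge_guest_debugctl, guest_visible_debugctl, reload_guest_debugctl) are hypothetical, plain integers stand in for the VMCS field and the host snapshot, and the FREEZE_IN_SMM macro assumes the bit sits at position 14 of IA32_DEBUGCTL. Unlike vmx_guest_debugctl_write(), which WARNs if the caller passes the bit, the sketch simply masks it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the FREEZE_IN_SMM bit in IA32_DEBUGCTL (bit 14). */
#define FREEZE_IN_SMM (UINT64_C(1) << 14)

/*
 * Hypothetical helper: true if any bit selected by host_owned_mask differs
 * between the previous and the current host DEBUGCTL snapshot.
 */
static inline bool host_owned_bits_changed(uint64_t old_host, uint64_t new_host,
					   uint64_t host_owned_mask)
{
	return ((old_host ^ new_host) & host_owned_mask) != 0;
}

/*
 * Hypothetical helper: build the value to store in the guest DEBUGCTL field.
 * The guest's own copy of the bit is dropped and the host's copy is used.
 */
static inline uint64_t merge_guest_debugctl(uint64_t guest_val, uint64_t host_val)
{
	return (guest_val & ~FREEZE_IN_SMM) | (host_val & FREEZE_IN_SMM);
}

/* Hypothetical helper: the guest-visible value never exposes the host's bit. */
static inline uint64_t guest_visible_debugctl(uint64_t stored_val)
{
	return stored_val & ~FREEZE_IN_SMM;
}

/*
 * Hypothetical helper: return the stored value unchanged if its FREEZE_IN_SMM
 * bit already matches the host snapshot, otherwise return the re-merged value.
 */
static inline uint64_t reload_guest_debugctl(uint64_t stored_val, uint64_t host_val)
{
	if (!((stored_val ^ host_val) & FREEZE_IN_SMM))
		return stored_val;

	return merge_guest_debugctl(stored_val, host_val);
}
```

The XOR-then-mask form ignores changes to bits outside the mask, which is why a single mask field is enough to tell common code which DEBUGCTL bits matter to the vendor module.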