From nobody Wed Nov 27 12:31:37 2024
Date: Wed, 9 Oct 2024 10:49:59 -0700
Message-ID: <20241009175002.1118178-2-seanjc@google.com>
In-Reply-To: <20241009175002.1118178-1-seanjc@google.com>
Subject: [PATCH v4 1/4] KVM: x86: Bypass register cache when querying CPL from kvm_sched_out()
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Maxim Levitsky

When querying guest CPL to determine if a vCPU was preempted while in
kernel mode, bypass the register cache, i.e. always read SS.AR_BYTES from
the VMCS on Intel CPUs.  If the kernel is running with full preemption
enabled, using the register cache in the preemption path can result in
stale and/or uninitialized data being cached in the segment cache.

In particular the following scenario is currently possible:

  - vCPU is just created, and the vCPU thread is preempted before
    SS.AR_BYTES is written in vmx_vcpu_reset().

  - When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() =>
    vmx_get_cpl() reads and caches '0' for SS.AR_BYTES.

  - vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't
    invoke vmx_segment_cache_clear() to invalidate the cache.

As a result, KVM retains a stale value in the cache, which can be read,
e.g. via KVM_GET_SREGS.  Usually this is not a problem because the VMX
segment cache is reset on each VM-Exit, but if the userspace VMM (e.g.
KVM selftests) reads and writes system registers just after the vCPU was
created, _without_ modifying SS.AR_BYTES, userspace will write back the
stale '0' value and ultimately will trigger a VM-Entry failure due to
incorrect SS segment type.

Note, the VM-Enter failure can also be avoided by moving the call to
vmx_segment_cache_clear() until after vmx_vcpu_reset() initializes all
segments.
However, while that change is correct and desirable (and will come along
shortly), it does not address the underlying problem that accessing KVM's
register caches from !task context is generally unsafe.

In addition to fixing the immediate bug, bypassing the cache for this
particular case will allow hardening KVM's register caching logic to
assert that the caches are accessed only when KVM _knows_ it is safe to
do so.

Fixes: de63ad4cf497 ("KVM: X86: implement the logic for spinlock optimization")
Reported-by: Maxim Levitsky
Closes: https://lore.kernel.org/all/20240716022014.240960-3-mlevitsk@redhat.com
Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/svm/svm.c             |  1 +
 arch/x86/kvm/vmx/main.c            |  1 +
 arch/x86/kvm/vmx/vmx.c             | 23 ++++++++++++++++++-----
 arch/x86/kvm/vmx/vmx.h             |  1 +
 arch/x86/kvm/x86.c                 |  8 +++++++-
 7 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 861d080ed4c6..5aff7222e40f 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -34,6 +34,7 @@ KVM_X86_OP(set_msr)
 KVM_X86_OP(get_segment_base)
 KVM_X86_OP(get_segment)
 KVM_X86_OP(get_cpl)
+KVM_X86_OP(get_cpl_no_cache)
 KVM_X86_OP(set_segment)
 KVM_X86_OP(get_cs_db_l_bits)
 KVM_X86_OP(is_valid_cr0)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6d9f763a7bb9..3ae90df0a177 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1656,6 +1656,7 @@ struct kvm_x86_ops {
 	void (*get_segment)(struct kvm_vcpu *vcpu, struct kvm_segment *var,
 			    int seg);
 	int (*get_cpl)(struct kvm_vcpu *vcpu);
+	int (*get_cpl_no_cache)(struct kvm_vcpu *vcpu);
 	void (*set_segment)(struct kvm_vcpu *vcpu, struct kvm_segment *var,
 			    int seg);
 	void (*get_cs_db_l_bits)(struct kvm_vcpu *vcpu, int *db, int *l);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9df3e1e5ae81..50f6b0e03d04 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5031,6 +5031,7 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.get_segment = svm_get_segment,
 	.set_segment = svm_set_segment,
 	.get_cpl = svm_get_cpl,
+	.get_cpl_no_cache = svm_get_cpl,
 	.get_cs_db_l_bits = svm_get_cs_db_l_bits,
 	.is_valid_cr0 = svm_is_valid_cr0,
 	.set_cr0 = svm_set_cr0,
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 7668e2fb8043..92d35cc6cd15 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -50,6 +50,7 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
 	.get_segment = vmx_get_segment,
 	.set_segment = vmx_set_segment,
 	.get_cpl = vmx_get_cpl,
+	.get_cpl_no_cache = vmx_get_cpl_no_cache,
 	.get_cs_db_l_bits = vmx_get_cs_db_l_bits,
 	.is_valid_cr0 = vmx_is_valid_cr0,
 	.set_cr0 = vmx_set_cr0,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a4438358c5e..12dd7009efbe 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3568,16 +3568,29 @@ u64 vmx_get_segment_base(struct kvm_vcpu *vcpu, int seg)
 	return vmx_read_guest_seg_base(to_vmx(vcpu), seg);
 }
 
-int vmx_get_cpl(struct kvm_vcpu *vcpu)
+static int __vmx_get_cpl(struct kvm_vcpu *vcpu, bool no_cache)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int ar;
 
 	if (unlikely(vmx->rmode.vm86_active))
 		return 0;
-	else {
-		int ar = vmx_read_guest_seg_ar(vmx, VCPU_SREG_SS);
-		return VMX_AR_DPL(ar);
-	}
+
+	if (no_cache)
+		ar = vmcs_read32(GUEST_SS_AR_BYTES);
+	else
+		ar = vmx_read_guest_seg_ar(vmx, VCPU_SREG_SS);
+	return VMX_AR_DPL(ar);
+}
+
+int vmx_get_cpl(struct kvm_vcpu *vcpu)
+{
+	return __vmx_get_cpl(vcpu, false);
+}
+
+int vmx_get_cpl_no_cache(struct kvm_vcpu *vcpu)
+{
+	return __vmx_get_cpl(vcpu, true);
 }
 
 static u32 vmx_segment_access_rights(struct kvm_segment *var)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 2325f773a20b..bcf40c7f3a38 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -385,6 +385,7 @@ void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
 			unsigned long fs_base, unsigned long gs_base);
 int vmx_get_cpl(struct kvm_vcpu *vcpu);
+int vmx_get_cpl_no_cache(struct kvm_vcpu *vcpu);
 bool vmx_emulation_required(struct kvm_vcpu *vcpu);
 unsigned long vmx_get_rflags(struct kvm_vcpu *vcpu);
 void vmx_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83fe0a78146f..830073294640 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5094,7 +5094,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	int idx;
 
 	if (vcpu->preempted) {
-		vcpu->arch.preempted_in_kernel = kvm_arch_vcpu_in_kernel(vcpu);
+		/*
+		 * Assume protected guests are in-kernel.  Inefficient yielding
+		 * due to false positives is preferable to never yielding due
+		 * to false negatives.
+		 */
+		vcpu->arch.preempted_in_kernel = vcpu->arch.guest_state_protected ||
+						 !kvm_x86_call(get_cpl_no_cache)(vcpu);
 
 		/*
 		 * Take the srcu lock as memslots will be accessed to check the gfn
-- 
2.47.0.rc1.288.g06298d1525-goog
Date: Wed, 9 Oct 2024 10:50:00 -0700
Message-ID: <20241009175002.1118178-3-seanjc@google.com>
In-Reply-To: <20241009175002.1118178-1-seanjc@google.com>
Subject: [PATCH v4 2/4] KVM: VMX: Reset the segment cache after segment init in vmx_vcpu_reset()
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Maxim Levitsky

From: Maxim Levitsky

Reset the segment cache after segment initialization in vmx_vcpu_reset()
to harden KVM against caching stale/uninitialized data.  Without the
recent fix to bypass the cache in kvm_arch_vcpu_put(), the following
scenario is possible:

  - vCPU is just created, and the vCPU thread is preempted before
    SS.AR_BYTES is written in vmx_vcpu_reset().

  - When scheduling out the vCPU task, kvm_arch_vcpu_in_kernel() =>
    vmx_get_cpl() reads and caches '0' for SS.AR_BYTES.

  - vmx_vcpu_reset() => seg_setup() configures SS.AR_BYTES, but doesn't
    invoke vmx_segment_cache_clear() to invalidate the cache.

As a result, KVM retains a stale value in the cache, which can be read,
e.g. via KVM_GET_SREGS.  Usually this is not a problem because the VMX
segment cache is reset on each VM-Exit, but if the userspace VMM (e.g.
KVM selftests) reads and writes system registers just after the vCPU was
created, _without_ modifying SS.AR_BYTES, userspace will write back the
stale '0' value and ultimately will trigger a VM-Entry failure due to
incorrect SS segment type.

Invalidating the cache after writing the VMCS doesn't address the general
issue of cache accesses from IRQ context being unsafe, but it does prevent
KVM from clobbering the VMCS, i.e. mitigates the harm done _if_ KVM has a
bug that results in an unsafe cache access.
Signed-off-by: Maxim Levitsky
Fixes: 2fb92db1ec08 ("KVM: VMX: Cache vmcs segment fields")
[sean: rework changelog to account for previous patch]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/vmx/vmx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 12dd7009efbe..a11faab67b4a 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4901,9 +4901,6 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vmx->hv_deadline_tsc = -1;
 	kvm_set_cr8(vcpu, 0);
 
-	vmx_segment_cache_clear(vmx);
-	kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS);
-
 	seg_setup(VCPU_SREG_CS);
 	vmcs_write16(GUEST_CS_SELECTOR, 0xf000);
 	vmcs_writel(GUEST_CS_BASE, 0xffff0000ul);
@@ -4930,6 +4927,9 @@ void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vmcs_writel(GUEST_IDTR_BASE, 0);
 	vmcs_write32(GUEST_IDTR_LIMIT, 0xffff);
 
+	vmx_segment_cache_clear(vmx);
+	kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS);
+
 	vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE);
 	vmcs_write32(GUEST_INTERRUPTIBILITY_INFO, 0);
 	vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, 0);
-- 
2.47.0.rc1.288.g06298d1525-goog

Date: Wed, 9 Oct 2024 10:50:01 -0700
Message-ID: <20241009175002.1118178-4-seanjc@google.com>
In-Reply-To: <20241009175002.1118178-1-seanjc@google.com>
Subject: [PATCH v4 3/4] KVM: x86: Add lockdep-guarded asserts on register cache usage
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Maxim Levitsky

When lockdep is enabled, assert that KVM accesses the register caches if
and only if cache fills are guaranteed to consume fresh data, i.e. when
KVM is in control of the code sequence.  Concretely, the caches can only
be used from task context (synchronous) or when handling a PMI VM-Exit
(asynchronous, but only in specific windows where the caches are in a
known, stable state).

Generally speaking, there are very few flows where reading register state
from an asynchronous context is correct or even necessary.  So, rather
than trying to figure out a generic solution, simply disallow using the
caches outside of task context by default, and deal with any future
exceptions on a case-by-case basis _if_ they arise.

Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/kvm_cache_regs.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index b1eb46e26b2e..36a8786db291 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -43,6 +43,18 @@ BUILD_KVM_GPR_ACCESSORS(r14, R14)
 BUILD_KVM_GPR_ACCESSORS(r15, R15)
 #endif
 
+/*
+ * Using the register cache from interrupt context is generally not allowed, as
+ * caching a register and marking it available/dirty can't be done atomically,
+ * i.e. accesses from interrupt context may clobber state or read stale data if
+ * the vCPU task is in the process of updating the cache.  The exception is if
+ * KVM is handling a PMI IRQ/NMI VM-Exit, as that bound code sequence doesn't
+ * touch the cache, it runs after the cache is reset (post VM-Exit), and PMIs
+ * need to access several registers that are cacheable.
+ */
+#define kvm_assert_register_caching_allowed(vcpu)		\
+	lockdep_assert_once(in_task() || kvm_arch_pmi_in_guest(vcpu))
+
 /*
  * avail  dirty
  * 0	  0	  register in VMCS/VMCB
@@ -53,24 +65,28 @@ BUILD_KVM_GPR_ACCESSORS(r15, R15)
 static inline bool kvm_register_is_available(struct kvm_vcpu *vcpu,
					     enum kvm_reg reg)
 {
+	kvm_assert_register_caching_allowed(vcpu);
 	return test_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
 }
 
 static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu,
					 enum kvm_reg reg)
 {
+	kvm_assert_register_caching_allowed(vcpu);
 	return test_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
 }
 
 static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu,
					       enum kvm_reg reg)
 {
+	kvm_assert_register_caching_allowed(vcpu);
 	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
 }
 
 static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
					   enum kvm_reg reg)
 {
+	kvm_assert_register_caching_allowed(vcpu);
 	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
 	__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
 }
@@ -84,6 +100,7 @@ static inline void kvm_register_mark_dirty(struct kvm_vcpu *vcpu,
 static __always_inline bool kvm_register_test_and_mark_available(struct kvm_vcpu *vcpu,
								 enum kvm_reg reg)
 {
+	kvm_assert_register_caching_allowed(vcpu);
 	return arch___test_and_set_bit(reg, (unsigned long *)&vcpu->arch.regs_avail);
 }
 
-- 
2.47.0.rc1.288.g06298d1525-goog
Date: Wed, 9 Oct 2024 10:50:02 -0700
Message-ID: <20241009175002.1118178-5-seanjc@google.com>
In-Reply-To: <20241009175002.1118178-1-seanjc@google.com>
Subject: [PATCH v4 4/4] KVM: x86: Use '0' for guest RIP if PMI encounters protected guest state
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Maxim Levitsky

Explicitly return '0' for guest RIP when handling a PMI VM-Exit for a
vCPU with protected guest state, i.e. when KVM can't read the real RIP.
While there is no "right" value, and profiling a protected guest is
rather futile, returning the last known RIP is worse than returning
obviously "bad" data.  E.g. for SEV-ES+, the last known RIP will often
point somewhere in the guest's boot flow.

Opportunistically add WARNs to effectively assert that the in_kernel()
and get_ip() callbacks are restricted to the common PMI handler, as the
return values for the protected guest state case are largely arbitrary,
i.e. only make any sense whatsoever for PMIs, where the returned values
have no functional impact and thus don't truly matter.

Signed-off-by: Sean Christopherson
Reviewed-by: Maxim Levitsky
---
 arch/x86/kvm/x86.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 830073294640..516cf6c71567 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13213,6 +13213,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 {
+	WARN_ON_ONCE(!kvm_arch_pmi_in_guest(vcpu));
+
 	if (vcpu->arch.guest_state_protected)
 		return true;
 
@@ -13221,6 +13223,11 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 
 unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
 {
+	WARN_ON_ONCE(!kvm_arch_pmi_in_guest(vcpu));
+
+	if (vcpu->arch.guest_state_protected)
+		return 0;
+
 	return kvm_rip_read(vcpu);
 }
 
-- 
2.47.0.rc1.288.g06298d1525-goog