From: Amit Shah
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, x86@kernel.org,
    linux-doc@vger.kernel.org
Cc: amit.shah@amd.com, thomas.lendacky@amd.com, bp@alien8.de,
    tglx@linutronix.de, peterz@infradead.org, jpoimboe@kernel.org,
    pawan.kumar.gupta@linux.intel.com, corbet@lwn.net, mingo@redhat.com,
    dave.hansen@linux.intel.com, hpa@zytor.com, seanjc@google.com,
    pbonzini@redhat.com, daniel.sneddon@linux.intel.com, kai.huang@intel.com,
    sandipan.das@amd.com, boris.ostrovsky@oracle.com, Babu.Moger@amd.com,
    david.kaplan@amd.com, dwmw@amazon.co.uk, andrew.cooper3@citrix.com
Subject: [RFC PATCH v2 1/3] x86: cpu/bugs: update SpectreRSB comments for AMD
Date: Mon, 11 Nov 2024 17:39:11 +0100
Message-ID: <20241111163913.36139-2-amit@kernel.org>
In-Reply-To: <20241111163913.36139-1-amit@kernel.org>
References: <20241111163913.36139-1-amit@kernel.org>

AMD CPUs do not fall back to the BTB for RET address speculation when the
RSB underflows, so they have never needed to stuff the RSB for the
underflow case. The RSB poisoning case is already addressed by RSB
filling. Clean up the stale comment and the FIXME about it.
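
[Note: for context, a hypothetical, simplified C sketch of what software
"RSB filling" (the X86_FEATURE_RSB_CTXSW behaviour referenced below) does
on x86-64. The kernel's real implementation is the FILL_RETURN_BUFFER asm
macro in arch/x86/include/asm/nospec-branch.h; rsb_fill_sketch() is an
illustrative name, not a kernel function:

static void rsb_fill_sketch(void)
{
	int i;

	/*
	 * Each CALL pushes a benign return address and thereby overwrites
	 * one RSB entry; 32 iterations cover a 32-entry RSB. The INT3 is a
	 * speculation trap that is never reached architecturally; the ADD
	 * discards the return address that CALL pushed onto the stack.
	 */
	for (i = 0; i < 32; i++)
		asm volatile("call 1f\n\t"
			     "int3\n"
			     "1: add $8, %%rsp"
			     ::: "memory");

	/* Keep speculation from running past the fill loop. */
	asm volatile("lfence" ::: "memory");
}
]
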
Signed-off-by: Amit Shah
---
 arch/x86/kernel/cpu/bugs.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 47a01d4028f6..0aa629b5537d 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1828,9 +1828,6 @@ static void __init spectre_v2_select_mitigation(void)
 	 * speculated return targets may come from the branch predictor,
 	 * which could have a user-poisoned BTB or BHB entry.
 	 *
-	 * AMD has it even worse: *all* returns are speculated from the BTB,
-	 * regardless of the state of the RSB.
-	 *
 	 * When IBRS or eIBRS is enabled, the "user -> kernel" attack
 	 * scenario is mitigated by the IBRS branch prediction isolation
 	 * properties, so the RSB buffer filling wouldn't be necessary to
@@ -1852,8 +1849,6 @@ static void __init spectre_v2_select_mitigation(void)
 	 *
 	 * So to mitigate all cases, unconditionally fill RSB on context
 	 * switches.
-	 *
-	 * FIXME: Is this pointless for retbleed-affected AMD?
 	 */
 	setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
 	pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
-- 
2.47.0

From: Amit Shah
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, x86@kernel.org,
    linux-doc@vger.kernel.org
Cc: amit.shah@amd.com, thomas.lendacky@amd.com, bp@alien8.de,
    tglx@linutronix.de, peterz@infradead.org, jpoimboe@kernel.org,
    pawan.kumar.gupta@linux.intel.com, corbet@lwn.net, mingo@redhat.com,
    dave.hansen@linux.intel.com,
    hpa@zytor.com, seanjc@google.com, pbonzini@redhat.com,
    daniel.sneddon@linux.intel.com, kai.huang@intel.com,
    sandipan.das@amd.com, boris.ostrovsky@oracle.com, Babu.Moger@amd.com,
    david.kaplan@amd.com, dwmw@amazon.co.uk, andrew.cooper3@citrix.com
Subject: [RFC PATCH v2 2/3] x86: cpu/bugs: add support for AMD ERAPS feature
Date: Mon, 11 Nov 2024 17:39:12 +0100
Message-ID: <20241111163913.36139-3-amit@kernel.org>
In-Reply-To: <20241111163913.36139-1-amit@kernel.org>
References: <20241111163913.36139-1-amit@kernel.org>

Remove explicit RET stuffing / RSB filling on VMEXITs and context switches
on AMD CPUs with the ERAPS feature (Zen5+).

With the Enhanced Return Address Prediction Security feature, any of the
following conditions triggers a flush of the RSB (aka RAP in AMD manuals)
in hardware:

  * context switch (e.g., MOV to CR3)
  * TLB flush
  * some writes to CR4

The feature also explicitly tags host and guest addresses in the RSB -
eliminating the need for explicit flushing of the RSB on VMEXIT.

[RFC note: We'll wait for the APM to be updated with the real wording, but
assuming it's going to say the ERAPS feature works the way described
above, let's continue the discussion re: when the kernel currently calls
FILL_RETURN_BUFFER, and what dropping it entirely means. Dave Hansen
pointed out that __switch_to_asm fills the RSB each time it's called, so
let's address the cases there:

1. user->kernel switch: switching from userspace to kernelspace and then
   using user-stuffed RSB entries in the kernel is not possible thanks to
   SMEP. We can safely drop the FILL_RETURN_BUFFER call for this case. In
   fact, this is how the original code was when dwmw2 added it in
   c995efd5a. So while this case currently triggers an RSB flush (and will
   not after this ERAPS patch), the current flush isn't necessary for AMD
   systems with SMEP anyway.

2. user->user or kernel->kernel: if a user->user switch does not result in
   a CR3 change, it's a different thread in the same process context. The
   same holds for a kernel->kernel switch. In this case, the RSB entries
   are still valid in that context, just not the correct ones in the new
   thread's context. It's difficult to imagine this being a security risk.
   The current code clearing the RSB, and this patch not doing so for
   AMD-with-ERAPS, isn't a concern as far as I can see.]

Feature mentioned in AMD PPR 57238. Will be resubmitted once the APM is
public - which I'm told is imminent.
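
[Note: as a quick, hypothetical user-space sketch (not part of this
series) of what ERAPS enumeration looks like, based on the bit positions
this series uses - X86_FEATURE_ERAPS is (20*32+24), i.e. CPUID 0x80000021
EAX[24], with the default RSB size in EBX[23:16]:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 0x80000021 does not exist on older CPUs. */
	if (!__get_cpuid_count(0x80000021, 0, &eax, &ebx, &ecx, &edx)) {
		puts("CPUID leaf 0x80000021 not supported");
		return 1;
	}

	if (eax & (1u << 24))		/* ERAPS */
		printf("ERAPS present; default RSB size: %u entries\n",
		       (ebx >> 16) & 0xff);
	else
		puts("ERAPS not present");

	return 0;
}
]
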
Signed-off-by: Amit Shah
---
 Documentation/admin-guide/hw-vuln/spectre.rst |  5 ++--
 arch/x86/include/asm/cpufeatures.h            |  1 +
 arch/x86/include/asm/nospec-branch.h          | 12 ++++++++
 arch/x86/kernel/cpu/bugs.c                    | 29 ++++++++++++++-----
 4 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/spectre.rst b/Documentation/admin-guide/hw-vuln/spectre.rst
index 132e0bc6007e..647c10c0307a 100644
--- a/Documentation/admin-guide/hw-vuln/spectre.rst
+++ b/Documentation/admin-guide/hw-vuln/spectre.rst
@@ -417,9 +417,10 @@ The possible values in this file are:
 
   - Return stack buffer (RSB) protection status:
 
-  =============  ===========================================
+  =============  =========================================================
   'RSB filling'  Protection of RSB on context switch enabled
-  =============  ===========================================
+  'ERAPS'        Hardware RSB flush on context switches + guest/host tags
+  =============  =========================================================
 
 - EIBRS Post-barrier Return Stack Buffer (PBRSB) protection status:
 
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 913fd3a7bac6..665032b12871 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -458,6 +458,7 @@
 #define X86_FEATURE_AUTOIBRS		(20*32+ 8) /* Automatic IBRS */
 #define X86_FEATURE_NO_SMM_CTL_MSR	(20*32+ 9) /* SMM_CTL MSR is not present */
 
+#define X86_FEATURE_ERAPS		(20*32+24) /* Enhanced RAP / RSB / RAS Security */
 #define X86_FEATURE_SBPB		(20*32+27) /* Selective Branch Prediction Barrier */
 #define X86_FEATURE_IBPB_BRTYPE		(20*32+28) /* MSR_PRED_CMD[IBPB] flushes all branch type predictions */
 #define X86_FEATURE_SRSO_NO		(20*32+29) /* CPU is not affected by SRSO */
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 96b410b1d4e8..f5ee7fc71db5 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -117,6 +117,18 @@
  * We define a CPP macro such that it can be used from both .S files and
  * inline assembly. It's possible to do a .macro and then include that
  * from C via asm(".include <asm/nospec-branch.h>") but let's not go there.
+ *
+ * AMD CPUs with the ERAPS feature may have a larger default RSB. These CPUs
+ * use the default number of entries on a host, and can optionally (based on
+ * hypervisor setup) use 32 (old) or the new default in a guest. The number
+ * of default entries is reflected in CPUID 8000_0021:EBX[23:16].
+ *
+ * With the ERAPS feature, RSB filling is not necessary anymore: the RSB is
+ * auto-cleared by hardware on context switches, TLB flushes, or some CR4
+ * writes. Adapting the value of RSB_CLEAR_LOOPS below for ERAPS would change
+ * it to a runtime variable instead of the current compile-time constant, so
+ * leave it as-is, as this works for both older CPUs as well as newer ones
+ * with ERAPS.
  */
 
 #define RETPOLINE_THUNK_SIZE	32
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 0aa629b5537d..02446815b0de 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1818,9 +1818,12 @@ static void __init spectre_v2_select_mitigation(void)
 	pr_info("%s\n", spectre_v2_strings[mode]);
 
 	/*
-	 * If Spectre v2 protection has been enabled, fill the RSB during a
-	 * context switch. In general there are two types of RSB attacks
-	 * across context switches, for which the CALLs/RETs may be unbalanced.
+	 * If Spectre v2 protection has been enabled, the RSB needs to be
+	 * cleared during a context switch. Either do it in software by
+	 * filling the RSB, or in hardware via ERAPS.
+	 *
+	 * In general there are two types of RSB attacks across context
+	 * switches, for which the CALLs/RETs may be unbalanced.
 	 *
 	 * 1) RSB underflow
 	 *
@@ -1848,12 +1851,21 @@ static void __init spectre_v2_select_mitigation(void)
 	 * RSB clearing.
 	 *
 	 * So to mitigate all cases, unconditionally fill RSB on context
-	 * switches.
+	 * switches when ERAPS is not present.
 	 */
-	setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
-	pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
+	if (!boot_cpu_has(X86_FEATURE_ERAPS)) {
+		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
+		pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
 
-	spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
+		/*
+		 * For guest -> host (or vice versa) RSB poisoning scenarios,
+		 * determine the mitigation mode here. With ERAPS, RSB
+		 * entries are tagged as host or guest - ensuring that neither
+		 * the host nor the guest have to clear or fill RSB entries to
+		 * avoid poisoning: skip RSB filling at VMEXIT in that case.
+		 */
+		spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
+	}
 
 	/*
 	 * Retpoline protects the kernel, but doesn't protect firmware. IBRS
@@ -2866,7 +2878,7 @@ static ssize_t spectre_v2_show_state(char *buf)
 	    spectre_v2_enabled == SPECTRE_V2_EIBRS_LFENCE)
 		return sysfs_emit(buf, "Vulnerable: eIBRS+LFENCE with unprivileged eBPF and SMT\n");
 
-	return sysfs_emit(buf, "%s%s%s%s%s%s%s%s\n",
+	return sysfs_emit(buf, "%s%s%s%s%s%s%s%s%s\n",
 			  spectre_v2_strings[spectre_v2_enabled],
 			  ibpb_state(),
 			  boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? "; IBRS_FW" : "",
@@ -2874,6 +2886,7 @@ static ssize_t spectre_v2_show_state(char *buf)
 			  boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? "; RSB filling" : "",
 			  pbrsb_eibrs_state(),
 			  spectre_bhi_state(),
+			  boot_cpu_has(X86_FEATURE_ERAPS) ? "; ERAPS hardware RSB flush" : "",
 			  /* this should always be at the end */
 			  spectre_v2_module_string());
 }
-- 
2.47.0

From: Amit Shah
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, x86@kernel.org,
    linux-doc@vger.kernel.org
Cc: amit.shah@amd.com, thomas.lendacky@amd.com, bp@alien8.de,
    tglx@linutronix.de, peterz@infradead.org, jpoimboe@kernel.org,
    pawan.kumar.gupta@linux.intel.com, corbet@lwn.net, mingo@redhat.com,
    dave.hansen@linux.intel.com, hpa@zytor.com, seanjc@google.com,
    pbonzini@redhat.com, daniel.sneddon@linux.intel.com, kai.huang@intel.com,
    sandipan.das@amd.com, boris.ostrovsky@oracle.com, Babu.Moger@amd.com,
    david.kaplan@amd.com, dwmw@amazon.co.uk, andrew.cooper3@citrix.com
Subject: [RFC PATCH v2 3/3] x86: kvm: svm: add support for ERAPS and FLUSH_RAP_ON_VMRUN
Date: Mon, 11 Nov 2024 17:39:13 +0100
Message-ID: <20241111163913.36139-4-amit@kernel.org>
In-Reply-To: <20241111163913.36139-1-amit@kernel.org>
References: <20241111163913.36139-1-amit@kernel.org>

AMD CPUs with the ERAPS feature (Zen5+) have a larger RSB (aka RAP).
While the new default RSB size is used on the host without any software
modification necessary, RSB usage for guests is limited to the older value
(32 entries) for backwards compatibility. With this patch, KVM lets guest
mode also use the default number of entries by setting the new
ALLOW_LARGER_RAP bit in the VMCB.

Two cases need special handling for backwards compatibility: nested
guests, and guests using shadow paging (i.e., when NPT is disabled).

For nested guests: the ERAPS feature adds host/guest tagging to entries in
the RSB, but does not distinguish between ASIDs. On a nested exit, the L0
hypervisor instructs the microcode (via another new VMCB bit,
FLUSH_RAP_ON_VMRUN) to flush the RSB on the next VMRUN, preventing RSB
poisoning attacks from an L2 guest against an L1 guest. With that in
place, this feature can be exposed to guests.

For shadow-paging guests: do not expose this feature to guests; expose it
only when nested paging is enabled, which ensures that context switches
within the guest trigger TLB flushes on the CPU, and thereby that guest
context switches flush guest RSB entries. With shadow paging, the CPU's
CR3 does not track guest processes, so such guests cannot benefit from
this feature.

Signed-off-by: Amit Shah
---
 arch/x86/include/asm/svm.h |  6 +++++-
 arch/x86/kvm/cpuid.c       | 18 ++++++++++++++--
 arch/x86/kvm/svm/svm.c     | 44 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h     | 15 +++++++++++++
 4 files changed, 80 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 2b59b9951c90..f8584a63c859 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -129,7 +129,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u64 tsc_offset;
 	u32 asid;
 	u8 tlb_ctl;
-	u8 reserved_2[3];
+	u8 erap_ctl;
+	u8 reserved_2[2];
 	u32 int_ctl;
 	u32 int_vector;
 	u32 int_state;
@@ -175,6 +176,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define TLB_CONTROL_FLUSH_ASID 3
 #define TLB_CONTROL_FLUSH_ASID_LOCAL 7
 
+#define ERAP_CONTROL_ALLOW_LARGER_RAP 0
+#define ERAP_CONTROL_FLUSH_RAP 1
+
 #define V_TPR_MASK 0x0f
 
 #define V_IRQ_SHIFT 8
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 41786b834b16..b432fe3a9f49 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -797,6 +797,8 @@ void kvm_set_cpu_caps(void)
 		F(WRMSR_XX_BASE_NS)
 	);
 
+	if (tdp_enabled)
+		kvm_cpu_cap_check_and_set(X86_FEATURE_ERAPS);
 	kvm_cpu_cap_check_and_set(X86_FEATURE_SBPB);
 	kvm_cpu_cap_check_and_set(X86_FEATURE_IBPB_BRTYPE);
 	kvm_cpu_cap_check_and_set(X86_FEATURE_SRSO_NO);
@@ -1356,10 +1358,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 	case 0x80000020:
 		entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
 		break;
-	case 0x80000021:
-		entry->ebx = entry->ecx = entry->edx = 0;
+	case 0x80000021: {
+		unsigned int ebx_mask = 0;
+
+		entry->ecx = entry->edx = 0;
 		cpuid_entry_override(entry, CPUID_8000_0021_EAX);
+
+		/*
+		 * Bits 23:16 in EBX indicate the size of the RSB.
+		 * Expose the value in the hardware to the guest.
+		 */
+		if (kvm_cpu_cap_has(X86_FEATURE_ERAPS))
+			ebx_mask |= GENMASK(23, 16);
+
+		entry->ebx &= ebx_mask;
 		break;
+	}
 	/* AMD Extended Performance Monitoring and Debug */
 	case 0x80000022: {
 		union cpuid_0x80000022_ebx ebx;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9df3e1e5ae81..c98ae5ee3646 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1360,6 +1360,28 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	if (boot_cpu_has(X86_FEATURE_V_SPEC_CTRL))
 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SPEC_CTRL, 1, 1);
 
+	/*
+	 * If the hardware has a larger RSB, use it in the guest context as
+	 * well.
+	 *
+	 * When running nested guests: the hardware tags host and guest RSB
+	 * entries, but the entries are ASID agnostic. Differentiating L1 and
+	 * L2 guests isn't possible in hardware. To prevent L2->L1 RSB
+	 * poisoning attacks in this case, the L0 hypervisor must set
+	 * FLUSH_RAP_ON_VMRUN in the L1's VMCB on a nested #VMEXIT to ensure
+	 * the next VMRUN flushes the RSB.
+	 *
+	 * For the shadow paging / NPT disabled case: the CPU's CR3 does not
+	 * contain the CR3 of the running guest process, and hence intra-guest
+	 * context switches will not cause a hardware TLB flush, which in turn
+	 * does not result in the guest RSB flush that the ERAPS feature
+	 * provides. Do not expose ERAPS or the larger RSB to the guest in
+	 * this case, so the guest continues implementing software mitigations
+	 * as well as only sees 32 entries for the RSB.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_ERAPS) && npt_enabled)
+		vmcb_set_larger_rap(svm->vmcb);
+
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
 
@@ -3393,6 +3415,7 @@ static void dump_vmcb(struct kvm_vcpu *vcpu)
 	pr_err("%-20s%016llx\n", "tsc_offset:", control->tsc_offset);
 	pr_err("%-20s%d\n", "asid:", control->asid);
 	pr_err("%-20s%d\n", "tlb_ctl:", control->tlb_ctl);
+	pr_err("%-20s%d\n", "erap_ctl:", control->erap_ctl);
 	pr_err("%-20s%08x\n", "int_ctl:", control->int_ctl);
 	pr_err("%-20s%08x\n", "int_vector:", control->int_vector);
 	pr_err("%-20s%08x\n", "int_state:", control->int_state);
@@ -3559,6 +3582,27 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 
 	trace_kvm_nested_vmexit(vcpu, KVM_ISA_SVM);
 
+	if (boot_cpu_has(X86_FEATURE_ERAPS) &&
+	    vmcb_is_larger_rap(svm->vmcb01.ptr)) {
+		/*
+		 * XXX a few further optimizations can be made:
+		 *
+		 * 1. In pre_svm_run() we can reset this bit when a hw
+		 * TLB flush has happened - any context switch on a
+		 * CPU (which causes a TLB flush) auto-flushes the RSB
+		 * - eg when this vCPU is scheduled on a different
+		 * pCPU.
+		 *
+		 * 2. This is also not needed in the case where the
+		 * vCPU is being scheduled on the same pCPU, but there
+		 * was a context switch between the #VMEXIT and VMRUN.
+		 *
+		 * 3. If the guest returns to L2 again after this
+		 * #VMEXIT, there's no need to flush the RSB.
+		 */
+		vmcb_set_flush_rap(svm->vmcb01.ptr);
+	}
+
 	vmexit = nested_svm_exit_special(svm);
 
 	if (vmexit == NESTED_EXIT_CONTINUE)
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 43fa6a16eb19..8a7877f46dc5 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -500,6 +500,21 @@ static inline bool svm_is_intercept(struct vcpu_svm *svm, int bit)
 	return vmcb_is_intercept(&svm->vmcb->control, bit);
 }
 
+static inline void vmcb_set_flush_rap(struct vmcb *vmcb)
+{
+	__set_bit(ERAP_CONTROL_FLUSH_RAP, (unsigned long *)&vmcb->control.erap_ctl);
+}
+
+static inline void vmcb_set_larger_rap(struct vmcb *vmcb)
+{
+	__set_bit(ERAP_CONTROL_ALLOW_LARGER_RAP, (unsigned long *)&vmcb->control.erap_ctl);
+}
+
+static inline bool vmcb_is_larger_rap(struct vmcb *vmcb)
+{
+	return test_bit(ERAP_CONTROL_ALLOW_LARGER_RAP, (unsigned long *)&vmcb->control.erap_ctl);
+}
+
 static inline bool nested_vgif_enabled(struct vcpu_svm *svm)
 {
 	return guest_can_use(&svm->vcpu, X86_FEATURE_VGIF) &&
-- 
2.47.0
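
[Note: a hypothetical, self-contained sketch of how the two erap_ctl bits
added above compose; struct vmcb_control_area_stub and the bit helpers are
stand-ins for the real KVM types and are not part of this series:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define ERAP_CONTROL_ALLOW_LARGER_RAP	0	/* guest may use the full RSB */
#define ERAP_CONTROL_FLUSH_RAP		1	/* flush RSB on the next VMRUN */

struct vmcb_control_area_stub {
	uint8_t erap_ctl;
};

static void set_bit8(uint8_t *val, int bit)
{
	*val |= (uint8_t)(1u << bit);
}

static bool test_bit8(uint8_t val, int bit)
{
	return val & (1u << bit);
}

int main(void)
{
	struct vmcb_control_area_stub c = { 0 };

	/* init_vmcb() path: ERAPS present and NPT enabled. */
	set_bit8(&c.erap_ctl, ERAP_CONTROL_ALLOW_LARGER_RAP);

	/*
	 * Nested #VMEXIT path: if the guest was allowed the larger RSB,
	 * request a hardware RSB flush on the next VMRUN so L2 entries
	 * cannot poison L1.
	 */
	if (test_bit8(c.erap_ctl, ERAP_CONTROL_ALLOW_LARGER_RAP))
		set_bit8(&c.erap_ctl, ERAP_CONTROL_FLUSH_RAP);

	printf("erap_ctl = %#04x\n", c.erap_ctl);	/* prints 0x03 */
	return 0;
}
]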