Xen Security Advisory 433 v1 - x86/AMD: Zenbleed

Xen.org security team posted 1 patch 9 months, 1 week ago
Failed in applying to current master (apply log)
Xen Security Advisory 433 v1 - x86/AMD: Zenbleed
Posted by Xen.org security team 9 months, 1 week ago
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

                    Xen Security Advisory XSA-433

                          x86/AMD: Zenbleed

ISSUE DESCRIPTION
=================

Researchers at Google have discovered Zenbleed, a hardware bug causing
corruption of the vector registers.

When a VZEROUPPER instruction is discarded as part of a bad transient
execution path, its effect on internal tracking are not unwound
correctly.  This manifests as the wrong micro-architectural state
becoming architectural, and corrupting the vector registers.

Note: While this malfunction is related to speculative execution, this
      is not a speculative sidechannel vulnerability.

The corruption is not random.  It happens to be stale values from the
physical vector register file, a structure competitively shared between
sibling threads.  Therefore, an attacker can directly access data from
the sibling thread, or from a more privileged context.

For more details, see:
  https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7008.html
  https://github.com/google/security-research/security/advisories/GHSA-v6wh-rxpg-cmm8

IMPACT
======

With very low probability, corruption of the vector registers can occur.
This data corruption causes mis-calculations in subsequent logic.

An attacker can exploit this bug to read data from different contexts on
the same core.  Examples of such data includes key material, cypher and
plaintext from the AES-NI instructions, or the contents of REP-MOVS
instructions, commonly used to implement memcpy().

VULNERABLE SYSTEMS
==================

Systems running all versions of Xen are affected.

This bug is specific to the AMD Zen2 microarchitecture.  AMD do not
believe that other microarchitectures are affected.

MITIGATION
==========

This issue can be mitigated by disabling AVX, either by booting Xen with
`cpuid=no-avx` on the command line, or by specifying `cpuid="host:avx=0"` in
the vm.cfg file of all untrusted VMs.  However, this will come with a
significant impact on the system and is not recommended for anyone able to
deploy the microcode or patch described below.

RESOLUTION
==========

AMD are producing microcode updates to address the bug.  Consult your
dom0 OS vendor.  This microcode is effective when late-loaded, which can
be performed on a live system without reboot.

In cases where microcode is not available, the appropriate attached
patch updates Xen to use a control register to avoid the issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa433.patch           xen-unstable
xsa433-4.17.patch      Xen 4.17.x
xsa433-4.16.patch      Xen 4.16.x
xsa433-4.15.patch      Xen 4.15.x
xsa433-4.14.patch      Xen 4.14.x

$ sha256sum xsa433*
a9331733b63e3e566f1436a48e9bd9e8b86eb48da6a8ced72ff4affb7859e027  xsa433.patch
6f1db2a2078b0152631f819f8ddee21720dabe185ec49dc9806d4a9d3478adfd  xsa433-4.14.patch
ca3a92605195307ae9b6ff87240beb52a097c125a760c919d7b9a0aff6e557c0  xsa433-4.15.patch
e5e94b3de68842a1c8d222802fb204d64acd118e3293c8e909dfaf3ada23d912  xsa433-4.16.patch
41d12104869b7e8307cd93af1af12b4fd75a669aeff15d31b234dc72981ae407  xsa433-4.17.patch
$

NOTE CONCERNING TIMELINE
========================

This issue is subject to coordinated disclosure on August 8th.  The
discoverer chose to publish details ahead of this timeline.
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmS+oDEMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZ4JkIAMOW9i78luUOEgggrQDp97T1CMAhew+3v+r2ZPMl
z7a6ATRU3oW7yeepYEP/1mrRFi2E09zrj0rDLvLVrYrhqeDGVIL+ZfI480508/5Y
ubRYZC13rA3jDMDu9r+oBIzObumecRAVj54j5BQmuKyXDqkDMGfbVShpMMvARvhE
wqlBXNFB1Z+ARlDrDZZo6sKhfUqHS4Fo8iilWthKxY9Eb0cxxA1PazMJz5OOaqe6
6Y3hHrSN4dq3DseAhYGgtw+BOTa/XlgAzkdlJM0DvooS22HFuHqwB7dckrtpCMlC
6I3P3p0GfsnG8U99lxYWzuEbtAKwSsFf/da2S8A4rel0aOE=
=xmQd
-----END PGP SIGNATURE-----
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/amd: Mitigations for Zenbleed

Zenbleed is a malfunction on AMD Zen2 uarch parts which results in corruption
of the vector registers.  An attacker can trigger this bug deliberately in
order to access stale data in the physical vector register file.  This can
include data from sibling threads, or a higher-privilege context.

Microcode is the preferred mitigation but in the case that's not available use
the chickenbit as instructed by AMD.  Re-evaluate the mitigation on late
microcode load too.

This is XSA-433 / CVE-2023-20593.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 0eaef82e5145..3ed06f670491 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -13,6 +13,7 @@
 #include <asm/spec_ctrl.h>
 #include <asm/acpi.h>
 #include <asm/apic.h>
+#include <asm/microcode.h>
 
 #include "cpu.h"
 
@@ -905,6 +906,72 @@ void __init detect_zen2_null_seg_behaviour(void)
 
 }
 
+void amd_check_zenbleed(void)
+{
+	const struct cpu_signature *sig = &this_cpu(cpu_sig);
+	unsigned int good_rev, chickenbit = (1 << 9);
+	uint64_t val, old_val;
+
+	/*
+	 * If we're virtualised, we can't do family/model checks safely, and
+	 * we likely wouldn't have access to DE_CFG even if we could see a
+	 * microcode revision.
+	 *
+	 * A hypervisor may hide AVX as a stopgap mitigation.  We're not in a
+	 * position to care either way.  An admin doesn't want to be disabling
+	 * AVX as a mitigation on any build of Xen with this logic present.
+	 */
+	if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17)
+		return;
+
+	switch (boot_cpu_data.x86_model) {
+	case 0x30 ... 0x3f: good_rev = 0x0830107a; break;
+	case 0x60 ... 0x67: good_rev = 0x0860010b; break;
+	case 0x68 ... 0x6f: good_rev = 0x08608105; break;
+	case 0x70 ... 0x7f: good_rev = 0x08701032; break;
+	case 0xa0 ... 0xaf: good_rev = 0x08a00008; break;
+	default:
+		/*
+		 * With the Fam17h check above, parts getting here are Zen1.
+		 * They're not affected.
+		 */
+		return;
+	}
+
+	rdmsrl(MSR_AMD64_DE_CFG, val);
+	old_val = val;
+
+	/*
+	 * Microcode is the preferred mitigation, in terms of performance.
+	 * However, without microcode, this chickenbit (specific to the Zen2
+	 * uarch) disables Floating Point Mov-Elimination to mitigate the
+	 * issue.
+	 */
+	val &= ~chickenbit;
+	if (sig->rev < good_rev)
+		val |= chickenbit;
+
+	if (val == old_val)
+		/* Nothing to change. */
+		return;
+
+	/*
+	 * DE_CFG is a Core-scoped MSR, and this write is racy during late
+	 * microcode load.  However, both threads calculate the new value from
+	 * state which is shared, and unrelated to the old value, so the
+	 * result should be consistent.
+	 */
+	wrmsrl(MSR_AMD64_DE_CFG, val);
+
+	/*
+	 * Inform the admin that we changed something, but don't spam,
+	 * especially during a late microcode load.
+	 */
+	if (smp_processor_id() == 0)
+		printk(XENLOG_INFO "Zenbleed mitigation - using %s\n",
+		       val & chickenbit ? "chickenbit" : "microcode");
+}
+
 static void cf_check init_amd(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
@@ -1171,6 +1238,8 @@ static void cf_check init_amd(struct cpuinfo_x86 *c)
 	if ((smp_processor_id() == 1) && !cpu_has(c, X86_FEATURE_ITSC))
 		disable_c1_ramping();
 
+	amd_check_zenbleed();
+
 	check_syscfg_dram_mod_en();
 
 	amd_log_freq(c);
diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index a9a5557835e4..75fc84e445ce 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -262,6 +262,8 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
            "microcode: CPU%u updated from revision %#x to %#x, date = %04x-%02x-%02x\n",
            cpu, old_rev, rev, patch->year, patch->month, patch->day);
 
+    amd_check_zenbleed();
+
     return 0;
 }
 
diff --git a/xen/arch/x86/include/asm/processor.h b/xen/arch/x86/include/asm/processor.h
index 3b3cf51814f8..c0529cc3d984 100644
--- a/xen/arch/x86/include/asm/processor.h
+++ b/xen/arch/x86/include/asm/processor.h
@@ -547,6 +547,8 @@ enum ap_boot_method {
 };
 extern enum ap_boot_method ap_boot_method;
 
+void amd_check_zenbleed(void);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_X86_PROCESSOR_H */

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/amd: Mitigations for Zenbleed

Zenbleed is a malfunction on AMD Zen2 uarch parts which results in corruption
of the vector registers.  An attacker can trigger this bug deliberately in
order to access stale data in the physical vector register file.  This can
include data from sibling threads, or a higher-privilege context.

Microcode is the preferred mitigation but in the case that's not available use
the chickenbit as instructed by AMD.  Re-evaluate the mitigation on late
microcode load too.

This is XSA-433 / CVE-2023-20593.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index b670ab6cd1b4..9db79f409a5f 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -13,6 +13,7 @@
 #include <asm/spec_ctrl.h>
 #include <asm/acpi.h>
 #include <asm/apic.h>
+#include <asm/microcode.h>
 
 #include "cpu.h"
 
@@ -756,6 +757,72 @@ void amd_init_spectral_chicken(void)
 		wrmsr_safe(MSR_AMD64_DE_CFG2, val | chickenbit);
 }
 
+void amd_check_zenbleed(void)
+{
+	const struct cpu_signature *sig = &this_cpu(cpu_sig);
+	unsigned int good_rev, chickenbit = (1 << 9);
+	uint64_t val, old_val;
+
+	/*
+	 * If we're virtualised, we can't do family/model checks safely, and
+	 * we likely wouldn't have access to DE_CFG even if we could see a
+	 * microcode revision.
+	 *
+	 * A hypervisor may hide AVX as a stopgap mitigation.  We're not in a
+	 * position to care either way.  An admin doesn't want to be disabling
+	 * AVX as a mitigation on any build of Xen with this logic present.
+	 */
+	if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17)
+		return;
+
+	switch (boot_cpu_data.x86_model) {
+	case 0x30 ... 0x3f: good_rev = 0x0830107a; break;
+	case 0x60 ... 0x67: good_rev = 0x0860010b; break;
+	case 0x68 ... 0x6f: good_rev = 0x08608105; break;
+	case 0x70 ... 0x7f: good_rev = 0x08701032; break;
+	case 0xa0 ... 0xaf: good_rev = 0x08a00008; break;
+	default:
+		/*
+		 * With the Fam17h check above, parts getting here are Zen1.
+		 * They're not affected.
+		 */
+		return;
+	}
+
+	rdmsrl(MSR_AMD64_DE_CFG, val);
+	old_val = val;
+
+	/*
+	 * Microcode is the preferred mitigation, in terms of performance.
+	 * However, without microcode, this chickenbit (specific to the Zen2
+	 * uarch) disables Floating Point Mov-Elimination to mitigate the
+	 * issue.
+	 */
+	val &= ~chickenbit;
+	if (sig->rev < good_rev)
+		val |= chickenbit;
+
+	if (val == old_val)
+		/* Nothing to change. */
+		return;
+
+	/*
+	 * DE_CFG is a Core-scoped MSR, and this write is racy during late
+	 * microcode load.  However, both threads calculate the new value from
+	 * state which is shared, and unrelated to the old value, so the
+	 * result should be consistent.
+	 */
+	wrmsrl(MSR_AMD64_DE_CFG, val);
+
+	/*
+	 * Inform the admin that we changed something, but don't spam,
+	 * especially during a late microcode load.
+	 */
+	if (smp_processor_id() == 0)
+		printk(XENLOG_INFO "Zenbleed mitigation - using %s\n",
+		       val & chickenbit ? "chickenbit" : "microcode");
+}
+
 static void init_amd(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
@@ -1016,6 +1083,8 @@ static void init_amd(struct cpuinfo_x86 *c)
 	if ((smp_processor_id() == 1) && !cpu_has(c, X86_FEATURE_ITSC))
 		disable_c1_ramping();
 
+	amd_check_zenbleed();
+
 	check_syscfg_dram_mod_en();
 
 	amd_log_freq(c);
diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index 5eb93195c3a3..9101f93e4227 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -251,6 +251,8 @@ static int apply_microcode(const struct microcode_patch *patch)
     printk(XENLOG_WARNING "microcode: CPU%u updated from revision %#x to %#x\n",
            cpu, old_rev, rev);
 
+    amd_check_zenbleed();
+
     return 0;
 }
 
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 3ff7cc5807e7..71b454d984ac 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -635,6 +635,8 @@ void tsx_init(void);
 void update_mcu_opt_ctrl(void);
 void set_in_mcu_opt_ctrl(uint32_t mask, uint32_t val);
 
+void amd_check_zenbleed(void);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_X86_PROCESSOR_H */

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/amd: Mitigations for Zenbleed

Zenbleed is a malfunction on AMD Zen2 uarch parts which results in corruption
of the vector registers.  An attacker can trigger this bug deliberately in
order to access stale data in the physical vector register file.  This can
include data from sibling threads, or a higher-privilege context.

Microcode is the preferred mitigation but in the case that's not available use
the chickenbit as instructed by AMD.  Re-evaluate the mitigation on late
microcode load too.

This is XSA-433 / CVE-2023-20593.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index a8d2fb8a1590..dd4dc3157c26 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -13,6 +13,7 @@
 #include <asm/spec_ctrl.h>
 #include <asm/acpi.h>
 #include <asm/apic.h>
+#include <asm/microcode.h>
 
 #include "cpu.h"
 
@@ -756,6 +757,72 @@ void amd_init_spectral_chicken(void)
 		wrmsr_safe(MSR_AMD64_DE_CFG2, val | chickenbit);
 }
 
+void amd_check_zenbleed(void)
+{
+	const struct cpu_signature *sig = &this_cpu(cpu_sig);
+	unsigned int good_rev, chickenbit = (1 << 9);
+	uint64_t val, old_val;
+
+	/*
+	 * If we're virtualised, we can't do family/model checks safely, and
+	 * we likely wouldn't have access to DE_CFG even if we could see a
+	 * microcode revision.
+	 *
+	 * A hypervisor may hide AVX as a stopgap mitigation.  We're not in a
+	 * position to care either way.  An admin doesn't want to be disabling
+	 * AVX as a mitigation on any build of Xen with this logic present.
+	 */
+	if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17)
+		return;
+
+	switch (boot_cpu_data.x86_model) {
+	case 0x30 ... 0x3f: good_rev = 0x0830107a; break;
+	case 0x60 ... 0x67: good_rev = 0x0860010b; break;
+	case 0x68 ... 0x6f: good_rev = 0x08608105; break;
+	case 0x70 ... 0x7f: good_rev = 0x08701032; break;
+	case 0xa0 ... 0xaf: good_rev = 0x08a00008; break;
+	default:
+		/*
+		 * With the Fam17h check above, parts getting here are Zen1.
+		 * They're not affected.
+		 */
+		return;
+	}
+
+	rdmsrl(MSR_AMD64_DE_CFG, val);
+	old_val = val;
+
+	/*
+	 * Microcode is the preferred mitigation, in terms of performance.
+	 * However, without microcode, this chickenbit (specific to the Zen2
+	 * uarch) disables Floating Point Mov-Elimination to mitigate the
+	 * issue.
+	 */
+	val &= ~chickenbit;
+	if (sig->rev < good_rev)
+		val |= chickenbit;
+
+	if (val == old_val)
+		/* Nothing to change. */
+		return;
+
+	/*
+	 * DE_CFG is a Core-scoped MSR, and this write is racy during late
+	 * microcode load.  However, both threads calculate the new value from
+	 * state which is shared, and unrelated to the old value, so the
+	 * result should be consistent.
+	 */
+	wrmsrl(MSR_AMD64_DE_CFG, val);
+
+	/*
+	 * Inform the admin that we changed something, but don't spam,
+	 * especially during a late microcode load.
+	 */
+	if (smp_processor_id() == 0)
+		printk(XENLOG_INFO "Zenbleed mitigation - using %s\n",
+		       val & chickenbit ? "chickenbit" : "microcode");
+}
+
 static void init_amd(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
@@ -1016,6 +1083,8 @@ static void init_amd(struct cpuinfo_x86 *c)
 	if ((smp_processor_id() == 1) && !cpu_has(c, X86_FEATURE_ITSC))
 		disable_c1_ramping();
 
+	amd_check_zenbleed();
+
 	check_syscfg_dram_mod_en();
 
 	amd_log_freq(c);
diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index 809ba4967c70..90629bee0dab 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -254,6 +254,8 @@ static int apply_microcode(const struct microcode_patch *patch)
     printk(XENLOG_WARNING "microcode: CPU%u updated from revision %#x to %#x\n",
            cpu, old_rev, rev);
 
+    amd_check_zenbleed();
+
     return 0;
 }
 
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index c8745e1f31aa..7b5ba41a40a0 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -641,6 +641,8 @@ enum ap_boot_method {
 };
 extern enum ap_boot_method ap_boot_method;
 
+void amd_check_zenbleed(void);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_X86_PROCESSOR_H */

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/amd: Mitigations for Zenbleed

Zenbleed is a malfunction on AMD Zen2 uarch parts which results in corruption
of the vector registers.  An attacker can trigger this bug deliberately in
order to access stale data in the physical vector register file.  This can
include data from sibling threads, or a higher-privilege context.

Microcode is the preferred mitigation but in the case that's not available use
the chickenbit as instructed by AMD.  Re-evaluate the mitigation on late
microcode load too.

This is XSA-433 / CVE-2023-20593.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 08e3e1e8a2d8..4ed08df4a8ce 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -13,6 +13,7 @@
 #include <asm/spec_ctrl.h>
 #include <asm/acpi.h>
 #include <asm/apic.h>
+#include <asm/microcode.h>
 
 #include "cpu.h"
 
@@ -769,6 +770,72 @@ void __init detect_zen2_null_seg_behaviour(void)
 
 }
 
+void amd_check_zenbleed(void)
+{
+	const struct cpu_signature *sig = &this_cpu(cpu_sig);
+	unsigned int good_rev, chickenbit = (1 << 9);
+	uint64_t val, old_val;
+
+	/*
+	 * If we're virtualised, we can't do family/model checks safely, and
+	 * we likely wouldn't have access to DE_CFG even if we could see a
+	 * microcode revision.
+	 *
+	 * A hypervisor may hide AVX as a stopgap mitigation.  We're not in a
+	 * position to care either way.  An admin doesn't want to be disabling
+	 * AVX as a mitigation on any build of Xen with this logic present.
+	 */
+	if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17)
+		return;
+
+	switch (boot_cpu_data.x86_model) {
+	case 0x30 ... 0x3f: good_rev = 0x0830107a; break;
+	case 0x60 ... 0x67: good_rev = 0x0860010b; break;
+	case 0x68 ... 0x6f: good_rev = 0x08608105; break;
+	case 0x70 ... 0x7f: good_rev = 0x08701032; break;
+	case 0xa0 ... 0xaf: good_rev = 0x08a00008; break;
+	default:
+		/*
+		 * With the Fam17h check above, parts getting here are Zen1.
+		 * They're not affected.
+		 */
+		return;
+	}
+
+	rdmsrl(MSR_AMD64_DE_CFG, val);
+	old_val = val;
+
+	/*
+	 * Microcode is the preferred mitigation, in terms of performance.
+	 * However, without microcode, this chickenbit (specific to the Zen2
+	 * uarch) disables Floating Point Mov-Elimination to mitigate the
+	 * issue.
+	 */
+	val &= ~chickenbit;
+	if (sig->rev < good_rev)
+		val |= chickenbit;
+
+	if (val == old_val)
+		/* Nothing to change. */
+		return;
+
+	/*
+	 * DE_CFG is a Core-scoped MSR, and this write is racy during late
+	 * microcode load.  However, both threads calculate the new value from
+	 * state which is shared, and unrelated to the old value, so the
+	 * result should be consistent.
+	 */
+	wrmsrl(MSR_AMD64_DE_CFG, val);
+
+	/*
+	 * Inform the admin that we changed something, but don't spam,
+	 * especially during a late microcode load.
+	 */
+	if (smp_processor_id() == 0)
+		printk(XENLOG_INFO "Zenbleed mitigation - using %s\n",
+		       val & chickenbit ? "chickenbit" : "microcode");
+}
+
 static void init_amd(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
@@ -1041,6 +1108,8 @@ static void init_amd(struct cpuinfo_x86 *c)
 	if ((smp_processor_id() == 1) && !cpu_has(c, X86_FEATURE_ITSC))
 		disable_c1_ramping();
 
+	amd_check_zenbleed();
+
 	check_syscfg_dram_mod_en();
 
 	amd_log_freq(c);
diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index 52182c1a2383..483a9b547f80 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -262,6 +262,8 @@ static int apply_microcode(const struct microcode_patch *patch)
            "microcode: CPU%u updated from revision %#x to %#x, date = %04x-%02x-%02x\n",
            cpu, old_rev, rev, patch->year, patch->month, patch->day);
 
+    amd_check_zenbleed();
+
     return 0;
 }
 
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 3d8aacd3aab2..96621ec39f8b 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -639,6 +639,8 @@ enum ap_boot_method {
 };
 extern enum ap_boot_method ap_boot_method;
 
+void amd_check_zenbleed(void);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_X86_PROCESSOR_H */

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/amd: Mitigations for Zenbleed

Zenbleed is a malfunction on AMD Zen2 uarch parts which results in corruption
of the vector registers.  An attacker can trigger this bug deliberately in
order to access stale data in the physical vector register file.  This can
include data from sibling threads, or a higher-privilege context.

Microcode is the preferred mitigation but in the case that's not available use
the chickenbit as instructed by AMD.  Re-evaluate the mitigation on late
microcode load too.

This is XSA-433 / CVE-2023-20593.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index b6a20d375ad1..8d23a5be0c5f 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -13,6 +13,7 @@
 #include <asm/spec_ctrl.h>
 #include <asm/acpi.h>
 #include <asm/apic.h>
+#include <asm/microcode.h>
 
 #include "cpu.h"
 
@@ -878,6 +879,72 @@ void __init detect_zen2_null_seg_behaviour(void)
 
 }
 
+void amd_check_zenbleed(void)
+{
+	const struct cpu_signature *sig = &this_cpu(cpu_sig);
+	unsigned int good_rev, chickenbit = (1 << 9);
+	uint64_t val, old_val;
+
+	/*
+	 * If we're virtualised, we can't do family/model checks safely, and
+	 * we likely wouldn't have access to DE_CFG even if we could see a
+	 * microcode revision.
+	 *
+	 * A hypervisor may hide AVX as a stopgap mitigation.  We're not in a
+	 * position to care either way.  An admin doesn't want to be disabling
+	 * AVX as a mitigation on any build of Xen with this logic present.
+	 */
+	if (cpu_has_hypervisor || boot_cpu_data.x86 != 0x17)
+		return;
+
+	switch (boot_cpu_data.x86_model) {
+	case 0x30 ... 0x3f: good_rev = 0x0830107a; break;
+	case 0x60 ... 0x67: good_rev = 0x0860010b; break;
+	case 0x68 ... 0x6f: good_rev = 0x08608105; break;
+	case 0x70 ... 0x7f: good_rev = 0x08701032; break;
+	case 0xa0 ... 0xaf: good_rev = 0x08a00008; break;
+	default:
+		/*
+		 * With the Fam17h check above, parts getting here are Zen1.
+		 * They're not affected.
+		 */
+		return;
+	}
+
+	rdmsrl(MSR_AMD64_DE_CFG, val);
+	old_val = val;
+
+	/*
+	 * Microcode is the preferred mitigation, in terms of performance.
+	 * However, without microcode, this chickenbit (specific to the Zen2
+	 * uarch) disables Floating Point Mov-Elimination to mitigate the
+	 * issue.
+	 */
+	val &= ~chickenbit;
+	if (sig->rev < good_rev)
+		val |= chickenbit;
+
+	if (val == old_val)
+		/* Nothing to change. */
+		return;
+
+	/*
+	 * DE_CFG is a Core-scoped MSR, and this write is racy during late
+	 * microcode load.  However, both threads calculate the new value from
+	 * state which is shared, and unrelated to the old value, so the
+	 * result should be consistent.
+	 */
+	wrmsrl(MSR_AMD64_DE_CFG, val);
+
+	/*
+	 * Inform the admin that we changed something, but don't spam,
+	 * especially during a late microcode load.
+	 */
+	if (smp_processor_id() == 0)
+		printk(XENLOG_INFO "Zenbleed mitigation - using %s\n",
+		       val & chickenbit ? "chickenbit" : "microcode");
+}
+
 static void cf_check init_amd(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
@@ -1150,6 +1217,8 @@ static void cf_check init_amd(struct cpuinfo_x86 *c)
 	if ((smp_processor_id() == 1) && !cpu_has(c, X86_FEATURE_ITSC))
 		disable_c1_ramping();
 
+	amd_check_zenbleed();
+
 	check_syscfg_dram_mod_en();
 
 	amd_log_freq(c);
diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index ded8fe90e650..c6d13f3fb35f 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -262,6 +262,8 @@ static int cf_check apply_microcode(const struct microcode_patch *patch)
            "microcode: CPU%u updated from revision %#x to %#x, date = %04x-%02x-%02x\n",
            cpu, old_rev, rev, patch->year, patch->month, patch->day);
 
+    amd_check_zenbleed();
+
     return 0;
 }
 
diff --git a/xen/arch/x86/include/asm/processor.h b/xen/arch/x86/include/asm/processor.h
index 8e2816fae9b9..66611df6efc1 100644
--- a/xen/arch/x86/include/asm/processor.h
+++ b/xen/arch/x86/include/asm/processor.h
@@ -637,6 +637,8 @@ enum ap_boot_method {
 };
 extern enum ap_boot_method ap_boot_method;
 
+void amd_check_zenbleed(void);
+
 #endif /* !__ASSEMBLY__ */
 
 #endif /* __ASM_X86_PROCESSOR_H */