Xen Security Advisory 407 v1 (CVE-2022-23816,CVE-2022-23825,CVE-2022-29900) - Retbleed - arbitrary speculative code execution with return instructions

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

 Xen Security Advisory CVE-2022-23816,CVE-2022-23825,CVE-2022-29900 / XSA-407

   Retbleed - arbitrary speculative code execution with return instructions

ISSUE DESCRIPTION
=================

Researchers at ETH Zurich have discovered Retbleed, allowing for
arbitrary speculative execution in a victim context.

For more details, see:
  https://comsec.ethz.ch/retbleed

ETH Zurich have allocated CVE-2022-29900 for AMD and CVE-2022-29901 for
Intel.

Despite the similar preconditions, the microarchitectural behaviours
involved are very different between the two vendors.

On AMD CPUs, Retbleed is one specific instance of a more general
microarchitectural behaviour called Branch Type Confusion.  AMD have
assigned CVE-2022-23816 (Retbleed) and CVE-2022-23825 (Branch Type
Confusion).

For more details, see:
  https://www.amd.com/en/corporate/product-security/bulletin/amd-sb-1037

On Intel CPUs, Retbleed is not a new vulnerability; it is only
applicable to software which did not follow Intel's original Spectre-v2
guidance.  Intel are using the ETH Zurich allocated CVE-2022-29901.

For more details, see:
  https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00702.html
  https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/return-stack-buffer-underflow.html

ARM have indicated existing guidance on Spectre-v2 is sufficient.

IMPACT
======

An attacker might be able to infer the contents of arbitrary host
memory, including memory assigned to other guests.

VULNERABLE SYSTEMS
==================

Systems running all versions of Xen are affected.

Whether a CPU is potentially vulnerable depends on its
microarchitecture.  Consult your hardware vendor.

For ARM and Intel CPUs, Xen implemented the vendor-recommended defaults
in XSA-254 and follow-on fixes.  Therefore, the Xen Security Team
believes there are no further changes necessary on these CPUs.
Administrators who deviated from the default mitigations are potentially
affected and should re-evaluate their threat model.

For AMD, CPUs from the Zen2 microarchitecture and earlier are
potentially vulnerable.  Zen3 and later CPUs are not believed to be
vulnerable.

The patches for Xen implement the IBPB-at-entry mitigation.  This
depends on the IBPB microcode distributed by AMD in 2018 as part of the
original Spectre/Meltdown work.  Consult your dom0 OS vendor.
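As an illustrative check only (the line format follows the print_details()
output visible in the patches below), the availability of IBPB can be
confirmed in the hypervisor log after boot:

$ xl dmesg | grep -e 'Hardware features' -e 'Hardware hints'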

In addition to IBPB, "cross thread" safety is necessary.  On Zen2 CPUs,
Xen uses STIBP by default.  On Zen1 CPUs, SMT needs disabling either in
the firmware, or by passing `smt=0` on Xen's command line.  On Fam15h
CPUs, Cluster Multi-Threading needs disabling in firmware.
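As a sketch only (assuming a GRUB2 multiboot2 entry; file names and paths
vary by system), SMT can be disabled by appending `smt=0` to the Xen line
in grub.cfg:

  multiboot2 /boot/xen.gz smt=0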

Due to performance concerns, dom0 is excluded from IBPB-on-entry
protections by default.  This is because PV dom0 is trusted in most
deployments.  If your threat model doesn't allow for dom0 to be
treated specially, boot with `spec-ctrl=ibpb-entry` which will cause
IBPB-on-entry protections to be applied to dom0 too.
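For example, a hypothetical Debian-style GRUB2 configuration (the variable
name below is distro-specific and shown for illustration only):

  # /etc/default/grub
  GRUB_CMDLINE_XEN_DEFAULT="spec-ctrl=ibpb-entry"

$ update-grub    # regenerate grub.cfg, then reboot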

MITIGATION
==========

There are no mitigations.

RESOLUTION
==========

Applying the appropriate set of attached patches resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

For the 4.15 and 4.16 branches in particular, these patches depend on:

 - x86/spec-ctrl: Only adjust MSR_SPEC_CTRL for idle with legacy IBRS
 - x86/spec-ctrl: Knobs for STIBP and PSFD, and follow hardware STIBP hint
 - xen/cmdline: Extend parse_boolean() to signal a name match
 - x86/spec-ctrl: Add fine-grained cmdline suboptions for primitives

which have been recently backported.
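For example, a sketch of preparing a patched 4.16 tree (repository URL and
branch name follow the xenbits conventions; this assumes the xsa407/
directory sits alongside the tree, and that the patches apply in lexical
order):

$ git clone https://xenbits.xen.org/git-http/xen.git
$ cd xen
$ git checkout stable-4.16
$ for p in ../xsa407/xsa407-4.16-*.patch; do patch -p1 < "$p"; done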

xsa407/xsa407-?.patch           xen-unstable
xsa407/xsa407-4.16-*.patch      Xen 4.16.x
xsa407/xsa407-4.15-*.patch      Xen 4.15.x
xsa407/xsa407-4.14-*.patch      Xen 4.14.x
xsa407/xsa407-4.13-*.patch      Xen 4.13.x

$ sha256sum xsa407* xsa407*/*
0a6dea915dd760afc73c3f50f432422d2e853eecaf99e3cdeb2d6e0fb3ee71b1  xsa407.meta
8894b0dc8e8c0900560366bde766826bf357c8aec3233ed6147f2094633a3cbc  xsa407/xsa407-1.patch
5955614d73b34ebeab45386dbc9dbe5d96c54f7945d22d5de6c645a5b7796a2f  xsa407/xsa407-2.patch
1f901df3a382a547dda6ef4ef088a5cf60a2d2b0382be451b148bf166cf43013  xsa407/xsa407-3.patch
8f75a9eee8ee2a563bc90e493ba1e4ac29335f677a3d2049ec27e138b7e3021e  xsa407/xsa407-4.13-01.patch
fbe3dca8f170dabf61c620ad1dde12898d52521ede59822971f461549323f946  xsa407/xsa407-4.13-02.patch
9a392bca751a6d6b9489b536ffd7e14722f22e44d266631898ec024b1b258e27  xsa407/xsa407-4.13-03.patch
1eae8ece87fc06ca883fc7510b80ced195c3ec44a589fcf464ead076de4d1afd  xsa407/xsa407-4.13-04.patch
016b1a682aa292a380a4fd9c49c65ade0fa7d19ef2f636611d7883d6adb38008  xsa407/xsa407-4.13-05.patch
cc3435e7bf2331a61c6e6731d8c0c4edd10ac49c85c9702e3d790309e1bb494a  xsa407/xsa407-4.13-06.patch
46f7b3d8a4ae39fa325dda2d77091b0768367a3d2cf6a341996042e511e46b93  xsa407/xsa407-4.13-07.patch
cb31c3104890c83fad719c8a2c7b0ae242625132a1e9d6afbf6310af10a8c14a  xsa407/xsa407-4.13-08.patch
55241ce45fb11825f7867ae188d73007c38c63d4a8489d990a8c869e000669dc  xsa407/xsa407-4.13-09.patch
f2d8a64f8446890a055e084f195a9f7c8982915556cefb48dd12b0e798b30a0f  xsa407/xsa407-4.13-10.patch
aa0ed6a1126c4d9d5fc94c00a51ddf27f4357c4a1cb258f72f0c17ac4ce0d191  xsa407/xsa407-4.13-11.patch
e91f244180bd92c111e1c653c22644b8144f3610717dd00347b7f21df75830bf  xsa407/xsa407-4.13-12.patch
59c604f50e0cedac2d5011fdb580aab4d719dbe73d9c50096faae70324864927  xsa407/xsa407-4.13-13.patch
ca1f04eaacf86ac21a4656b8f1ad9ff0b06d5f295bba5ee21e0bbb4698b165c0  xsa407/xsa407-4.13-14.patch
d3ee52d4144b5bb375c1fb7e484b68190632da22e654ab480b73745fe2f23af1  xsa407/xsa407-4.13-15.patch
af4ec1eed3d10ce6795e96216676db581e19e4e65d19ee48679a1230a6c37a2f  xsa407/xsa407-4.13-16.patch
8ee57395139261e09e387775d7f5c36a1fa53f75caff89302167727230250501  xsa407/xsa407-4.13-17.patch
1495ffd28238737bd9ad346e5667065f5acaa82adc86aeceb358be3d3b1469f9  xsa407/xsa407-4.13-18.patch
a02fd749eb761b93fe7b2e5977a9aa493af13044165b71de9e7625c0237c2fde  xsa407/xsa407-4.13-19.patch
cf127677913b8127c9a71b1c9b3badf9f2c2064d1ed2602d236ba610f7335c8a  xsa407/xsa407-4.13-20.patch
0289eb4a9098ab806f5b847e5f55652817b9bb8c9ebc98e28fe8ac626c77f77c  xsa407/xsa407-4.13-21.patch
fde8cafcd3207329a7582a18a333f95e82e5edc54e93e4d7603c62dc262942a4  xsa407/xsa407-4.14-01.patch
e2e7aaf633c2638f4a81eca9e627110b4ad087760b9f4880965093f874b138a2  xsa407/xsa407-4.14-02.patch
69938e6c1293040aff921f2cd6bf2ad850caa682745f0d6be8bc2aabb3802edf  xsa407/xsa407-4.14-03.patch
47d67a565d3077688a43937d7cb6cc79d43a8d5e8563711a1476924c696a9759  xsa407/xsa407-4.14-04.patch
14d15b20e053c7dec2e9dd9cbd108284b0ac2069dac2e5c4e76ab4c78637fbe8  xsa407/xsa407-4.14-05.patch
cd38bb072a8e99760a80464482d645aba1531fdb4f4d04eabd7c48e2db00c8e1  xsa407/xsa407-4.14-06.patch
85b79e26fce7b649ae860f9860925060867004ea1940c1110f5e22354891b66a  xsa407/xsa407-4.14-07.patch
0c292123259319cf110df43ccad50ac5f1396de234d457ed1fbd60462da40d82  xsa407/xsa407-4.14-08.patch
1161d0378a63d79461bfdfdc082bb6e49418f6b356a85048a33f268031a11abd  xsa407/xsa407-4.14-09.patch
4b4c652481abaf49d1531ed5c6b6f91b17ab8ae71fcbb085f4557b661fa74d5d  xsa407/xsa407-4.14-10.patch
074c16f104563ca665ee3af5144b9d3ec5131eea6eb9c5859ba5e2a33051bd55  xsa407/xsa407-4.14-11.patch
e816ea5ea372e4e1429e7191721df9203ee8759a337c91e57d176f2d6a636949  xsa407/xsa407-4.14-12.patch
61382cd7985ac5b3d265a08188cbebdd6916fd150413bb77e5ad452fa98e254a  xsa407/xsa407-4.15-1.patch
cd38bb072a8e99760a80464482d645aba1531fdb4f4d04eabd7c48e2db00c8e1  xsa407/xsa407-4.15-2.patch
03d8a0e18b4e1ffbac268cfc159341b4d641d0322ea77efd22c43e4a4318d511  xsa407/xsa407-4.15-3.patch
ae8e8f220a708401a68535e88a3092b35c3db0a20bd3e3a27cdcc7e88d1ff600  xsa407/xsa407-4.15-4.patch
aba615483add2199ad2912557e0b9024d6efd6573fa8009590502d483a78e63a  xsa407/xsa407-4.15-5.patch
55e58ce88ff7126c314c7e24f75a700a3263388137ecd725d2e459e21c018f64  xsa407/xsa407-4.15-6.patch
a3b146ba37e183d9aec813e66e00a6647835246270d2a9a649724f2570c96c17  xsa407/xsa407-4.15-7.patch
ac33b676c2fc5fdee565701baadddd627e492e85f9ca481d12a510c5fc3ff7ab  xsa407/xsa407-4.15-8.patch
3a3cec31ebb8f0fb41e3804f03318becc2a978d71831cf086f77c7eff89de9fe  xsa407/xsa407-4.16-1.patch
b936a9a36c336d1dcf05923f9a07728522f6a6d1474006ec179981a4787a4522  xsa407/xsa407-4.16-2.patch
825a683f37964186ab669468c517c342dc55b1e86898a75c86f8ff0de47e1b76  xsa407/xsa407-4.16-3.patch
402795d0cb418503c3e90b65f3bf546493a7411d14208ac718ba8639f67d1860  xsa407/xsa407-4.16-4.patch
ec6009b2ddaa74099725844bf4343efb8510015fb851d3ccc26913f877db0bdf  xsa407/xsa407-4.16-5.patch
4fa9c65ee0bdf8650b0cd483c205a305352d918408b91d4adf83c84d1b269b2e  xsa407/xsa407-4.16-6.patch
1c01c1508103de49cc1895a60babe9d33feaa27da8d2bd89c6895c0173e280d7  xsa407/xsa407-4.16-7.patch
d2a4e06959dff5a9772b13d921332804fbaf81f012c0b9cf85f8b9dd008c61de  xsa407/xsa407-4.16-8.patch
c178e43d3f569086aee66ffaf28f22156bdb22144bdff7ffe4f7c20242abe73c  xsa407/xsa407-4.patch
eb9985afa38b1d2bffd6a48772a429fc0f88375cd3fc0b977f9a8a0981ab87b4  xsa407/xsa407-5.patch
aea9fb436a1f3dc38a874b8b3e4d0f1a82fb14c5e50c579a978aee1a83bfdb72  xsa407/xsa407-6.patch
cf1e9796dbaaedf1e3ba7efb830fe99dea8f09125d7ec7bd2a16b11cfc131aa6  xsa407/xsa407-7.patch
cc46f1da318dfa72b87bbc069bf448eed3d1b264281e3a7d9a6bad8f6519e8c3  xsa407/xsa407-8.patch
$
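The list above is in the format expected by `sha256sum --check`, so a saved
copy can be verified directly (hashes.txt being a hypothetical file holding
the lines above):

$ sha256sum --check hashes.txt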

NOTE CONCERNING LACK OF EMBARGO
===============================

The disclosers did not authorise us to predisclose.
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmLNotoMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZeiUH/jZsXrd1X9mzVrBaoQQckCtYtrM+9rYS1JbupDZx
Ca6P0zwKaX1uaDi/De/UCAbt4fCpE/xqqy9X5wMX0XUFJEhr74GKXDh/evzH7C/i
WxwNmoTio0Un5jw+aLlKGza7oSNYVKPgYjDim7iTMmWdzWauS6Ock3HQn2jkG0JL
nTarKFX2JjC2INiu6YssDS81nI6cPJAz+AB4FzzU6u/2loPZv5hxpYnrUsWlRaH1
87pAiGhi7gc9yhv9FTi3C/paBG/kioqQi/ahV5S/l2nlIR1xo97ewfStcdAsT5sl
XgFq0sKLamMti0Ens3tydrXVNeyfHq9ABlN2eOnufZNT8Kc=
=CEa6
-----END PGP SIGNATURE-----
{
  "XSA": 407,
  "SupportedVersions": [
    "master",
    "4.16",
    "4.15",
    "4.14",
    "4.13"
  ],
  "Trees": [
    "xen"
  ],
  "Recipes": {
    "4.13": {
      "Recipes": {
        "xen": {
          "StableRef": "87ff11354f0dc0d6e77e1695e6c1e14aa1382cdc",
          "Prereqs": [],
          "Patches": [
            "xsa407/xsa407-4.13-*.patch"
          ]
        }
      }
    },
    "4.14": {
      "Recipes": {
        "xen": {
          "StableRef": "c5f774eaeeca195ef85b47713f0b21220c4b41e6",
          "Prereqs": [],
          "Patches": [
            "xsa407/xsa407-4.14-*.patch"
          ]
        }
      }
    },
    "4.15": {
      "Recipes": {
        "xen": {
          "StableRef": "505771bb1dffdf6f763fad18ee49a913b98abfea",
          "Prereqs": [],
          "Patches": [
            "xsa407/xsa407-4.15-*.patch"
          ]
        }
      }
    },
    "4.16": {
      "Recipes": {
        "xen": {
          "StableRef": "744accad1b73223b3261e3e678e16e030d83b179",
          "Prereqs": [],
          "Patches": [
            "xsa407/xsa407-4.16-*.patch"
          ]
        }
      }
    },
    "master": {
      "Recipes": {
        "xen": {
          "StableRef": "dc7da0874ba4e8fab4c5783055755938ef19fc37",
          "Prereqs": [],
          "Patches": [
            "xsa407/xsa407-?.patch"
          ]
        }
      }
    }
  }
}

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rework spec_ctrl_flags context switching

We are shortly going to need to context switch new bits in both the vcpu and
S3 paths.  Introduce SCF_IST_MASK and SCF_DOM_MASK, and rework d->arch.verw
into d->arch.spec_ctrl_flags to accommodate.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index c4e7e8698920..1bb4d78392c4 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -248,8 +248,8 @@ static int enter_state(u32 state)
         error = 0;
 
     ci = get_cpu_info();
-    /* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */
-    ci->spec_ctrl_flags &= ~SCF_ist_wrmsr;
+    /* Avoid NMI/#MC using unsafe MSRs until we've reloaded microcode. */
+    ci->spec_ctrl_flags &= ~SCF_IST_MASK;
 
     ACPI_FLUSH_CPU_CACHE();
 
@@ -292,8 +292,8 @@ static int enter_state(u32 state)
     if ( !recheck_cpu_features(0) )
         panic("Missing previously available feature(s)\n");
 
-    /* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
-    ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr);
+    /* Re-enabled default NMI/#MC use of MSRs now microcode is loaded. */
+    ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_IST_MASK);
 
     if ( boot_cpu_has(X86_FEATURE_IBRSB) || boot_cpu_has(X86_FEATURE_IBRS) )
     {
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 408ee284ed3f..21dbf7b822fc 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2123,10 +2123,10 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
             }
         }
 
-        /* Update the top-of-stack block with the VERW disposition. */
-        info->spec_ctrl_flags &= ~SCF_verw;
-        if ( nextd->arch.verw )
-            info->spec_ctrl_flags |= SCF_verw;
+        /* Update the top-of-stack block with the new spec_ctrl settings. */
+        info->spec_ctrl_flags =
+            (info->spec_ctrl_flags       & ~SCF_DOM_MASK) |
+            (nextd->arch.spec_ctrl_flags &  SCF_DOM_MASK);
     }
 
     sched_context_switched(prev, next);
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index ad01ee68e12c..4e59ca8c4e14 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -324,8 +324,7 @@ struct arch_domain
     uint32_t pci_cf8;
     uint8_t cmos_idx;
 
-    /* Use VERW on return-to-guest for its flushing side effect. */
-    bool verw;
+    uint8_t spec_ctrl_flags; /* See SCF_DOM_MASK */
 
     union {
         struct pv_domain pv;
diff --git a/xen/arch/x86/include/asm/spec_ctrl.h b/xen/arch/x86/include/asm/spec_ctrl.h
index 7e83e0179fb9..3cd72e40305f 100644
--- a/xen/arch/x86/include/asm/spec_ctrl.h
+++ b/xen/arch/x86/include/asm/spec_ctrl.h
@@ -20,12 +20,40 @@
 #ifndef __X86_SPEC_CTRL_H__
 #define __X86_SPEC_CTRL_H__
 
-/* Encoding of cpuinfo.spec_ctrl_flags */
+/*
+ * Encoding of:
+ *   cpuinfo.spec_ctrl_flags
+ *   default_spec_ctrl_flags
+ *   domain.spec_ctrl_flags
+ *
+ * Live settings are in the top-of-stack block, because they need to be
+ * accessable when XPTI is active.  Some settings are fixed from boot, some
+ * context switched per domain, and some inhibited in the S3 path.
+ */
 #define SCF_use_shadow (1 << 0)
 #define SCF_ist_wrmsr  (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
 
+/*
+ * The IST paths (NMI/#MC) can interrupt any arbitrary context.  Some
+ * functionality requires updated microcode to work.
+ *
+ * On boot, this is easy; we load microcode before figuring out which
+ * speculative protections to apply.  However, on the S3 resume path, we must
+ * be able to disable the configured mitigations until microcode is reloaded.
+ *
+ * These are the controls to inhibit on the S3 resume path until microcode has
+ * been reloaded.
+ */
+#define SCF_IST_MASK (SCF_ist_wrmsr)
+
+/*
+ * Some speculative protections are per-domain.  These settings are merged
+ * into the top-of-stack block in the context switch path.
+ */
+#define SCF_DOM_MASK (SCF_verw)
+
 #ifndef __ASSEMBLY__
 
 #include <asm/alternative.h>
diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index 5a590bac44aa..66b00d511fc6 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -248,9 +248,6 @@
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.
- * Fine grain control of SCF_ist_wrmsr is needed for safety in the S3 resume
- * path to avoid using MSR_SPEC_CTRL before the microcode introducing it has
- * been reloaded.
  */
 .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
 /*
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 328862bdf549..97b0272eccf7 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -1011,9 +1011,12 @@ void spec_ctrl_init_domain(struct domain *d)
 {
     bool pv = is_pv_domain(d);
 
-    d->arch.verw =
-        (pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
-        (opt_fb_clear_mmio && is_iommu_enabled(d));
+    bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
+                 (opt_fb_clear_mmio && is_iommu_enabled(d)));
+
+    d->arch.spec_ctrl_flags =
+        (verw   ? SCF_verw         : 0) |
+        0;
 }
 
 void __init init_speculation_mitigations(void)
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rename SCF_ist_wrmsr to SCF_ist_sc_msr

We are about to introduce SCF_ist_ibpb, at which point SCF_ist_wrmsr becomes
ambiguous.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/include/asm/spec_ctrl.h b/xen/arch/x86/include/asm/spec_ctrl.h
index 3cd72e40305f..f8f0ac47e759 100644
--- a/xen/arch/x86/include/asm/spec_ctrl.h
+++ b/xen/arch/x86/include/asm/spec_ctrl.h
@@ -31,7 +31,7 @@
  * context switched per domain, and some inhibited in the S3 path.
  */
 #define SCF_use_shadow (1 << 0)
-#define SCF_ist_wrmsr  (1 << 1)
+#define SCF_ist_sc_msr (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
 
@@ -46,7 +46,7 @@
  * These are the controls to inhibit on the S3 resume path until microcode has
  * been reloaded.
  */
-#define SCF_IST_MASK (SCF_ist_wrmsr)
+#define SCF_IST_MASK (SCF_ist_sc_msr)
 
 /*
  * Some speculative protections are per-domain.  These settings are merged
diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index 66b00d511fc6..0ff1b118f882 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -266,8 +266,8 @@
 
 .L\@_skip_rsb:
 
-    test $SCF_ist_wrmsr, %al
-    jz .L\@_skip_wrmsr
+    test $SCF_ist_sc_msr, %al
+    jz .L\@_skip_msr_spec_ctrl
 
     xor %edx, %edx
     testb $3, UREGS_cs(%rsp)
@@ -290,7 +290,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
      * to speculate around the WRMSR.  As a result, we need a dispatch
      * serialising instruction in the else clause.
      */
-.L\@_skip_wrmsr:
+.L\@_skip_msr_spec_ctrl:
     lfence
     UNLIKELY_END(\@_serialise)
 .endm
@@ -301,7 +301,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
  * Requires %rbx=stack_end
  * Clobbers %rax, %rcx, %rdx
  */
-    testb $SCF_ist_wrmsr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
+    testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
     jz .L\@_skip
 
     DO_SPEC_CTRL_EXIT_TO_XEN
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 97b0272eccf7..0ef46ee17533 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -1116,7 +1116,7 @@ void __init init_speculation_mitigations(void)
     {
         if ( opt_msr_sc_pv )
         {
-            default_spec_ctrl_flags |= SCF_ist_wrmsr;
+            default_spec_ctrl_flags |= SCF_ist_sc_msr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_PV);
         }
 
@@ -1127,7 +1127,7 @@ void __init init_speculation_mitigations(void)
              * Xen's value is not restored atomically.  An early NMI hitting
              * the VMExit path needs to restore Xen's value for safety.
              */
-            default_spec_ctrl_flags |= SCF_ist_wrmsr;
+            default_spec_ctrl_flags |= SCF_ist_sc_msr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
         }
     }
@@ -1140,7 +1140,7 @@ void __init init_speculation_mitigations(void)
          * on real hardware matches the availability of MSR_SPEC_CTRL in the
          * first place.
          *
-         * No need for SCF_ist_wrmsr because Xen's value is restored
+         * No need for SCF_ist_sc_msr because Xen's value is restored
          * atomically WRT NMIs in the VMExit path.
          *
          * TODO: Adjust cpu_has_svm_spec_ctrl to be usable earlier on boot.
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rename opt_ibpb to opt_ibpb_ctxt_switch

We are about to introduce the use of IBPB at different points in Xen, making
opt_ibpb ambiguous.  Rename it to opt_ibpb_ctxt_switch.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 21dbf7b822fc..532b87e8af56 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2095,7 +2095,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
         ctxt_switch_levelling(next);
 
-        if ( opt_ibpb && !is_idle_domain(nextd) )
+        if ( opt_ibpb_ctxt_switch && !is_idle_domain(nextd) )
         {
             static DEFINE_PER_CPU(unsigned int, last);
             unsigned int *last_id = &this_cpu(last);
diff --git a/xen/arch/x86/include/asm/spec_ctrl.h b/xen/arch/x86/include/asm/spec_ctrl.h
index f8f0ac47e759..fb4365575620 100644
--- a/xen/arch/x86/include/asm/spec_ctrl.h
+++ b/xen/arch/x86/include/asm/spec_ctrl.h
@@ -63,7 +63,7 @@
 void init_speculation_mitigations(void);
 void spec_ctrl_init_domain(struct domain *d);
 
-extern bool opt_ibpb;
+extern bool opt_ibpb_ctxt_switch;
 extern bool opt_ssbd;
 extern int8_t opt_eager_fpu;
 extern int8_t opt_l1d_flush;
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 0ef46ee17533..ede0a27c0fea 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -54,7 +54,7 @@ int8_t __initdata opt_stibp = -1;
 bool __ro_after_init opt_ssbd;
 int8_t __initdata opt_psfd = -1;
 
-bool __read_mostly opt_ibpb = true;
+bool __read_mostly opt_ibpb_ctxt_switch = true;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 static bool __initdata opt_branch_harden = true;
@@ -117,7 +117,7 @@ static int __init cf_check parse_spec_ctrl(const char *s)
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
-            opt_ibpb = false;
+            opt_ibpb_ctxt_switch = false;
             opt_ssbd = false;
             opt_l1d_flush = 0;
             opt_branch_harden = false;
@@ -238,7 +238,7 @@ static int __init cf_check parse_spec_ctrl(const char *s)
 
         /* Misc settings. */
         else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
-            opt_ibpb = val;
+            opt_ibpb_ctxt_switch = val;
         else if ( (val = parse_boolean("eager-fpu", s, ss)) >= 0 )
             opt_eager_fpu = val;
         else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
@@ -458,7 +458,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (opt_tsx & 1)                             ? " TSX+" : " TSX-",
            !cpu_has_srbds_ctrl                       ? "" :
            opt_srb_lock                              ? " SRB_LOCK+" : " SRB_LOCK-",
-           opt_ibpb                                  ? " IBPB"  : "",
+           opt_ibpb_ctxt_switch                      ? " IBPB-ctxt" : "",
            opt_l1d_flush                             ? " L1D_FLUSH" : "",
            opt_md_clear_pv || opt_md_clear_hvm ||
            opt_fb_clear_mmio                         ? " VERW"  : "",
@@ -1241,7 +1241,7 @@ void __init init_speculation_mitigations(void)
 
     /* Check we have hardware IBPB support before using it... */
     if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-        opt_ibpb = false;
+        opt_ibpb_ctxt_switch = false;
 
     /* Check whether Eager FPU should be enabled by default. */
     if ( opt_eager_fpu == -1 )
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM

These were written before Spectre/Meltdown went public, and there was large
uncertainty in how the protections would evolve.  As it turns out, they're
very specific to Intel hardware, and not very suitable for AMD.

Drop the macros, opencoding the relevant subset of functionality, and leaving
grep-fodder to locate the logic.  No change at all for VT-x.

For AMD, the only relevant piece of functionality is DO_OVERWRITE_RSB,
although we will soon be adding (different) logic to handle MSR_SPEC_CTRL.

This marginally improves the vmentry/exit path by removing an unconditional
pile of long-nops.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 95b13fa43e0753b7514bef13abe28253e8614f62)
[Forward port over XSA-404]

diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index e954d8e02110..0684d29050a5 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -63,7 +63,7 @@ __UNLIKELY_END(nsvm_hap)
         mov VCPUMSR_spec_ctrl_raw(%rax), %eax
 
         /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-        SPEC_CTRL_EXIT_TO_HVM   /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
+        /* SPEC_CTRL_EXIT_TO_SVM   (nothing currently) */
 
         pop  %r15
         pop  %r14
@@ -90,7 +90,8 @@ __UNLIKELY_END(nsvm_hap)
 
         GET_CURRENT(bx)
 
-        SPEC_CTRL_ENTRY_FROM_HVM    /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
+        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: b=curr %rsp=regs/cpuinfo, Clob: ac  */
+        ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         STGI
diff --git a/xen/arch/x86/hvm/vmx/entry.S b/xen/arch/x86/hvm/vmx/entry.S
index 62ed0d854df1..db3af571c913 100644
--- a/xen/arch/x86/hvm/vmx/entry.S
+++ b/xen/arch/x86/hvm/vmx/entry.S
@@ -33,7 +33,9 @@ ENTRY(vmx_asm_vmexit_handler)
         movb $1,VCPU_vmx_launched(%rbx)
         mov  %rax,VCPU_hvm_guest_cr2(%rbx)
 
-        SPEC_CTRL_ENTRY_FROM_HVM    /* Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
+        /* SPEC_CTRL_ENTRY_FROM_VMX    Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
+        ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
+        ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM, X86_FEATURE_SC_MSR_HVM
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         /* Hardware clears MSR_DEBUGCTL on VMExit.  Reinstate it if debugging Xen. */
@@ -80,7 +82,8 @@ UNLIKELY_END(realmode)
         mov VCPUMSR_spec_ctrl_raw(%rax), %eax
 
         /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
-        SPEC_CTRL_EXIT_TO_HVM   /* Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
+        /* SPEC_CTRL_EXIT_TO_VMX   Req: a=spec_ctrl %rsp=regs/cpuinfo, Clob: cd */
+        ALTERNATIVE "", DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM
         DO_SPEC_CTRL_COND_VERW
 
         mov  VCPU_hvm_guest_cr2(%rbx),%rax
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 4a3777cc5227..fe90c80ac39c 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -68,14 +68,16 @@
  *
  * The following ASM fragments implement this algorithm.  See their local
  * comments for further details.
- *  - SPEC_CTRL_ENTRY_FROM_HVM
  *  - SPEC_CTRL_ENTRY_FROM_PV
  *  - SPEC_CTRL_ENTRY_FROM_INTR
  *  - SPEC_CTRL_ENTRY_FROM_INTR_IST
  *  - SPEC_CTRL_EXIT_TO_XEN_IST
  *  - SPEC_CTRL_EXIT_TO_XEN
  *  - SPEC_CTRL_EXIT_TO_PV
- *  - SPEC_CTRL_EXIT_TO_HVM
+ *
+ * Additionally, the following grep-fodder exists to find the HVM logic.
+ *  - SPEC_CTRL_ENTRY_FROM_{SVM,VMX}
+ *  - SPEC_CTRL_EXIT_TO_{SVM,VMX}
  */
 
 .macro DO_OVERWRITE_RSB tmp=rax
@@ -228,12 +230,6 @@
     wrmsr
 .endm
 
-/* Use after a VMEXIT from an HVM guest. */
-#define SPEC_CTRL_ENTRY_FROM_HVM                                        \
-    ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM;           \
-    ALTERNATIVE "", DO_SPEC_CTRL_ENTRY_FROM_HVM,                        \
-        X86_FEATURE_SC_MSR_HVM
-
 /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
 #define SPEC_CTRL_ENTRY_FROM_PV                                         \
     ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;            \
@@ -257,11 +253,6 @@
         DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_PV;              \
     DO_SPEC_CTRL_COND_VERW
 
-/* Use when exiting to HVM guest context. */
-#define SPEC_CTRL_EXIT_TO_HVM                                           \
-    ALTERNATIVE "",                                                     \
-        DO_SPEC_CTRL_EXIT_TO_GUEST, X86_FEATURE_SC_MSR_HVM;             \
-
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.
  * Fine grain control of SCF_ist_wrmsr is needed for safety in the S3 resume
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Split the "Hardware features" diagnostic line

Separate the read-only hints from the features requiring active actions on
Xen's behalf.

Also take the opportunity to split the IBRS/IBPB and IBPB mess.  More
features with overlapping enumeration are on the way, and it is not useful
to bundle them like this.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 565ebcda976c05b0c6191510d5e32b621a2b1867)
[Forward ported over XSA-404]

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 3b34ca705cc6..fec145a7b9e1 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -322,28 +322,35 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
 
     printk("Speculative mitigation facilities:\n");
 
-    /* Hardware features which pertain to speculative mitigations. */
-    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
-           (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB)) ? " IBRS/IBPB" : "",
-           (_7d0 & cpufeat_mask(X86_FEATURE_STIBP)) ? " STIBP"     : "",
-           (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH)) ? " L1D_FLUSH" : "",
-           (_7d0 & cpufeat_mask(X86_FEATURE_SSBD))  ? " SSBD"      : "",
-           (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR)) ? " MD_CLEAR" : "",
-           (_7d0 & cpufeat_mask(X86_FEATURE_SRBDS_CTRL)) ? " SRBDS_CTRL" : "",
-           (e8b  & cpufeat_mask(X86_FEATURE_IBPB))  ? " IBPB"      : "",
-           (caps & ARCH_CAPS_IBRS_ALL)              ? " IBRS_ALL"  : "",
-           (caps & ARCH_CAPS_RDCL_NO)               ? " RDCL_NO"   : "",
-           (caps & ARCH_CAPS_RSBA)                  ? " RSBA"      : "",
-           (caps & ARCH_CAPS_SKIP_L1DFL)            ? " SKIP_L1DFL": "",
-           (caps & ARCH_CAPS_SSB_NO)                ? " SSB_NO"    : "",
-           (caps & ARCH_CAPS_MDS_NO)                ? " MDS_NO"    : "",
-           (caps & ARCH_CAPS_TSX_CTRL)              ? " TSX_CTRL"  : "",
-           (caps & ARCH_CAPS_TAA_NO)                ? " TAA_NO"    : "",
-           (caps & ARCH_CAPS_SBDR_SSDP_NO)          ? " SBDR_SSDP_NO" : "",
-           (caps & ARCH_CAPS_FBSDP_NO)              ? " FBSDP_NO"  : "",
-           (caps & ARCH_CAPS_PSDP_NO)               ? " PSDP_NO"   : "",
-           (caps & ARCH_CAPS_FB_CLEAR)              ? " FB_CLEAR"  : "",
-           (caps & ARCH_CAPS_FB_CLEAR_CTRL)         ? " FB_CLEAR_CTRL" : "");
+    /*
+     * Hardware read-only information, stating immunity to certain issues, or
+     * suggestions of which mitigation to use.
+     */
+    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s\n",
+           (caps & ARCH_CAPS_RDCL_NO)                        ? " RDCL_NO"        : "",
+           (caps & ARCH_CAPS_IBRS_ALL)                       ? " IBRS_ALL"       : "",
+           (caps & ARCH_CAPS_RSBA)                           ? " RSBA"           : "",
+           (caps & ARCH_CAPS_SKIP_L1DFL)                     ? " SKIP_L1DFL"     : "",
+           (caps & ARCH_CAPS_SSB_NO)                         ? " SSB_NO"         : "",
+           (caps & ARCH_CAPS_MDS_NO)                         ? " MDS_NO"         : "",
+           (caps & ARCH_CAPS_TAA_NO)                         ? " TAA_NO"         : "",
+           (caps & ARCH_CAPS_SBDR_SSDP_NO)                   ? " SBDR_SSDP_NO"   : "",
+           (caps & ARCH_CAPS_FBSDP_NO)                       ? " FBSDP_NO"       : "",
+           (caps & ARCH_CAPS_PSDP_NO)                        ? " PSDP_NO"        : "");
+
+    /* Hardware features which need driving to mitigate issues. */
+    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s\n",
+           (e8b  & cpufeat_mask(X86_FEATURE_IBPB)) ||
+           (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB))          ? " IBPB"           : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB))          ? " IBRS"           : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_STIBP))          ? " STIBP"          : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_SSBD))           ? " SSBD"           : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH))      ? " L1D_FLUSH"      : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR))       ? " MD_CLEAR"       : "",
+           (_7d0 & cpufeat_mask(X86_FEATURE_SRBDS_CTRL))     ? " SRBDS_CTRL"     : "",
+           (caps & ARCH_CAPS_TSX_CTRL)                       ? " TSX_CTRL"       : "",
+           (caps & ARCH_CAPS_FB_CLEAR)                       ? " FB_CLEAR"       : "",
+           (caps & ARCH_CAPS_FB_CLEAR_CTRL)                  ? " FB_CLEAR_CTRL"  : "");
 
     /* Compiled-in support which pertains to mitigations. */
     if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/amd: Enumeration for speculative features/hints

There is a step change in speculation protections between the Zen1 and Zen2
microarchitectures.

Zen1 and older have no special support.  Control bits in non-architectural
MSRs are used to make lfence be dispatch-serialising (Spectre v1), and to
disable Memory Disambiguation (Speculative Store Bypass).  IBPB was
retrofitted in a microcode update, and software methods are required for
Spectre v2 protections.

Because the bit controlling Memory Disambiguation is model specific,
hypervisors are expected to expose a MSR_VIRT_SPEC_CTRL interface which
abstracts the model specific details.

Zen2 and later implement the MSR_SPEC_CTRL interface in hardware, and
virtualise the interface for HVM guests to use.  A number of hint bits are
also specified to help guide OS software to the most efficient mitigation
strategy.

Zen3 introduced a new feature, Predictive Store Forwarding, along with a
control to disable it in sensitive code.

Add CPUID and VMCB details for all the new functionality.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 747424c664bb164a04e7a9f2ffbf02d4a1630d7d)

diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index 083869dcf4a7..977a4377bf1b 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -264,6 +264,16 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
         {"rstr-fp-err-ptrs", 0x80000008, NA, CPUID_REG_EBX, 2, 1},
         {"wbnoinvd",     0x80000008, NA, CPUID_REG_EBX,  9,  1},
         {"ibpb",         0x80000008, NA, CPUID_REG_EBX, 12,  1},
+        {"ibrs",         0x80000008, NA, CPUID_REG_EBX, 14,  1},
+        {"amd-stibp",    0x80000008, NA, CPUID_REG_EBX, 15,  1},
+        {"ibrs-always",  0x80000008, NA, CPUID_REG_EBX, 16,  1},
+        {"stibp-always", 0x80000008, NA, CPUID_REG_EBX, 17,  1},
+        {"ibrs-fast",    0x80000008, NA, CPUID_REG_EBX, 18,  1},
+        {"ibrs-same-mode", 0x80000008, NA, CPUID_REG_EBX, 19,  1},
+        {"amd-ssbd",     0x80000008, NA, CPUID_REG_EBX, 24,  1},
+        {"virt-ssbd",    0x80000008, NA, CPUID_REG_EBX, 25,  1},
+        {"ssb-no",       0x80000008, NA, CPUID_REG_EBX, 26,  1},
+        {"psfd",         0x80000008, NA, CPUID_REG_EBX, 28,  1},
 
         {"nc",           0x80000008, NA, CPUID_REG_ECX,  0,  8},
         {"apicidsize",   0x80000008, NA, CPUID_REG_ECX, 12,  4},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index a09440813b9d..ff167779df31 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -147,9 +147,16 @@ static const char *const str_e8b[32] =
     [ 0] = "clzero",
     [ 2] = "rstr-fp-err-ptrs",
 
-    /* [ 8] */            [ 9] = "wbnoinvd",
+    /* [ 8] */                 [ 9] = "wbnoinvd",
 
     [12] = "ibpb",
+    [14] = "ibrs",             [15] = "amd-stibp",
+    [16] = "ibrs-always",      [17] = "stibp-always",
+    [18] = "ibrs-fast",        [19] = "ibrs-same-mode",
+
+    [24] = "amd-ssbd",         [25] = "virt-ssbd",
+    [26] = "ssb-no",
+    [28] = "psfd",
 };
 
 static const char *const str_7d0[32] =
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index a6de9ccb8fdc..c973411019c4 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1657,6 +1657,7 @@ const struct hvm_function_table * __init start_svm(void)
     P(cpu_has_pause_filter, "Pause-Intercept Filter");
     P(cpu_has_pause_thresh, "Pause-Intercept Filter Threshold");
     P(cpu_has_tsc_ratio, "TSC Rate MSR");
+    P(cpu_has_svm_spec_ctrl, "MSR_SPEC_CTRL virtualisation");
 #undef P
 
     if ( !printed )
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 00d22caac74c..6419b4d2c5e4 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -124,6 +124,11 @@
 /* CPUID level 0x80000007.edx */
 #define cpu_has_itsc            boot_cpu_has(X86_FEATURE_ITSC)
 
+/* CPUID level 0x80000008.ebx */
+#define cpu_has_amd_ssbd        boot_cpu_has(X86_FEATURE_AMD_SSBD)
+#define cpu_has_virt_ssbd       boot_cpu_has(X86_FEATURE_VIRT_SSBD)
+#define cpu_has_ssb_no          boot_cpu_has(X86_FEATURE_SSB_NO)
+
 /* CPUID level 0x00000007:0.edx */
 #define cpu_has_avx512_4vnniw   boot_cpu_has(X86_FEATURE_AVX512_4VNNIW)
 #define cpu_has_avx512_4fmaps   boot_cpu_has(X86_FEATURE_AVX512_4FMAPS)
diff --git a/xen/include/asm-x86/hvm/svm/svm.h b/xen/include/asm-x86/hvm/svm/svm.h
index 52752fe5ab9c..2b522ee79b17 100644
--- a/xen/include/asm-x86/hvm/svm/svm.h
+++ b/xen/include/asm-x86/hvm/svm/svm.h
@@ -74,6 +74,7 @@ extern u32 svm_feature_flags;
 #define SVM_FEATURE_PAUSETHRESH   12 /* Pause intercept filter support */
 #define SVM_FEATURE_VLOADSAVE     15 /* virtual vmload/vmsave */
 #define SVM_FEATURE_VGIF          16 /* Virtual GIF */
+#define SVM_FEATURE_SPEC_CTRL     20 /* MSR_SPEC_CTRL virtualisation */
 
 #define cpu_has_svm_feature(f) (svm_feature_flags & (1u << (f)))
 #define cpu_has_svm_npt       cpu_has_svm_feature(SVM_FEATURE_NPT)
@@ -87,6 +88,7 @@ extern u32 svm_feature_flags;
 #define cpu_has_pause_thresh  cpu_has_svm_feature(SVM_FEATURE_PAUSETHRESH)
 #define cpu_has_tsc_ratio     cpu_has_svm_feature(SVM_FEATURE_TSCRATEMSR)
 #define cpu_has_svm_vloadsave cpu_has_svm_feature(SVM_FEATURE_VLOADSAVE)
+#define cpu_has_svm_spec_ctrl cpu_has_svm_feature(SVM_FEATURE_SPEC_CTRL)
 
 #define SVM_PAUSEFILTER_INIT    4000
 #define SVM_PAUSETHRESH_INIT    1000
diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h b/xen/include/asm-x86/hvm/svm/vmcb.h
index 10b51c64bb3b..560dbb28736f 100644
--- a/xen/include/asm-x86/hvm/svm/vmcb.h
+++ b/xen/include/asm-x86/hvm/svm/vmcb.h
@@ -494,7 +494,9 @@ struct vmcb_struct {
     u64 _lastbranchtoip;        /* cleanbit 10 */
     u64 _lastintfromip;         /* cleanbit 10 */
     u64 _lastinttoip;           /* cleanbit 10 */
-    u64 res17[301];
+    u64 res17[9];
+    u64 spec_ctrl;
+    u64 res18[291];
 };
 
 struct svm_domain {
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 2a80660d849d..20653bd6601e 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -41,6 +41,7 @@
 #define SPEC_CTRL_IBRS			(_AC(1, ULL) << 0)
 #define SPEC_CTRL_STIBP			(_AC(1, ULL) << 1)
 #define SPEC_CTRL_SSBD			(_AC(1, ULL) << 2)
+#define SPEC_CTRL_PSFD			(_AC(1, ULL) << 7)
 
 #define MSR_PRED_CMD			0x00000049
 #define PRED_CMD_IBPB			(_AC(1, ULL) << 0)
@@ -272,6 +273,7 @@
 #define MSR_K8_ENABLE_C1E		0xc0010055
 #define MSR_K8_VM_CR			0xc0010114
 #define MSR_K8_VM_HSAVE_PA		0xc0010117
+#define MSR_VIRT_SPEC_CTRL		0xc001011f /* Layout matches MSR_SPEC_CTRL */
 
 #define MSR_F15H_CU_POWER		0xc001007a
 #define MSR_F15H_CU_MAX_POWER		0xc001007b
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index a1d1619643e3..6084e0569de4 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -248,6 +248,16 @@ XEN_CPUFEATURE(CLZERO,        8*32+ 0) /*A  CLZERO instruction */
 XEN_CPUFEATURE(RSTR_FP_ERR_PTRS, 8*32+ 2) /*A  (F)X{SAVE,RSTOR} always saves/restores FPU Error pointers */
 XEN_CPUFEATURE(WBNOINVD,      8*32+ 9) /*   WBNOINVD instruction */
 XEN_CPUFEATURE(IBPB,          8*32+12) /*A  IBPB support only (no IBRS, used by AMD) */
+XEN_CPUFEATURE(IBRS,          8*32+14) /*   MSR_SPEC_CTRL.IBRS */
+XEN_CPUFEATURE(AMD_STIBP,     8*32+15) /*   MSR_SPEC_CTRL.STIBP */
+XEN_CPUFEATURE(IBRS_ALWAYS,   8*32+16) /*   IBRS preferred always on */
+XEN_CPUFEATURE(STIBP_ALWAYS,  8*32+17) /*   STIBP preferred always on */
+XEN_CPUFEATURE(IBRS_FAST,     8*32+18) /*   IBRS preferred over software options */
+XEN_CPUFEATURE(IBRS_SAME_MODE, 8*32+19) /*   IBRS provides same-mode protection */
+XEN_CPUFEATURE(AMD_SSBD,      8*32+24) /*   MSR_SPEC_CTRL.SSBD available */
+XEN_CPUFEATURE(VIRT_SSBD,     8*32+25) /*   MSR_VIRT_SPEC_CTRL.SSBD */
+XEN_CPUFEATURE(SSB_NO,        8*32+26) /*   Hardware not vulnerable to SSB */
+XEN_CPUFEATURE(PSFD,          8*32+28) /*   MSR_SPEC_CTRL.PSFD */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/amd: Use newer SSBD mechanisms if they exist

The opencoded legacy Memory Disambiguation logic in init_amd() neglected
Fam19h for the Zen3 microarchitecture.  Furthermore, all Zen2-based systems
have the architectural MSR_SPEC_CTRL and the SSBD bit within it, so shouldn't
be using MSR_AMD64_LS_CFG.

Implement the algorithm given in AMD's SSBD whitepaper, and leave a
printk_once() behind in the case that no controls can be found.

This now means that a user explicitly choosing `spec-ctrl=ssbd` will properly
turn off Memory Disambiguation on Fam19h/Zen3 systems.

This still remains a single system-wide setting (for now), and is not context
switched between vCPUs.  As such, it doesn't interact with Intel's use of
MSR_SPEC_CTRL and default_xen_spec_ctrl (yet).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 2a4e6c4e4bea2e0bb720418c331ee28ff9c7632e)

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index aa1b9d0dda6b..de0389810b25 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -540,6 +540,56 @@ void early_init_amd(struct cpuinfo_x86 *c)
 	ctxt_switch_levelling(NULL);
 }
 
+/*
+ * Refer to the AMD Speculative Store Bypass whitepaper:
+ * https://developer.amd.com/wp-content/resources/124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf
+ */
+void amd_init_ssbd(const struct cpuinfo_x86 *c)
+{
+	int bit = -1;
+
+	if (cpu_has_ssb_no)
+		return;
+
+	if (cpu_has_amd_ssbd) {
+		wrmsrl(MSR_SPEC_CTRL, opt_ssbd ? SPEC_CTRL_SSBD : 0);
+		return;
+	}
+
+	if (cpu_has_virt_ssbd) {
+		wrmsrl(MSR_VIRT_SPEC_CTRL, opt_ssbd ? SPEC_CTRL_SSBD : 0);
+		return;
+	}
+
+	switch (c->x86) {
+	case 0x15: bit = 54; break;
+	case 0x16: bit = 33; break;
+	case 0x17:
+	case 0x18: bit = 10; break;
+	}
+
+	if (bit >= 0) {
+		uint64_t val, mask = 1ull << bit;
+
+		if (rdmsr_safe(MSR_AMD64_LS_CFG, val) ||
+		    ({
+			    val &= ~mask;
+			    if (opt_ssbd)
+				    val |= mask;
+			    false;
+		    }) ||
+		    wrmsr_safe(MSR_AMD64_LS_CFG, val) ||
+		    ({
+			    rdmsrl(MSR_AMD64_LS_CFG, val);
+			    (val & mask) != (opt_ssbd * mask);
+		    }))
+			bit = -1;
+	}
+
+	if (bit < 0)
+		printk_once(XENLOG_ERR "No SSBD controls available\n");
+}
+
 static void init_amd(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
@@ -616,24 +666,7 @@ static void init_amd(struct cpuinfo_x86 *c)
 				  c->x86_capability);
 	}
 
-	/*
-	 * If the user has explicitly chosen to disable Memory Disambiguation
-	 * to mitigiate Speculative Store Bypass, poke the appropriate MSR.
-	 */
-	if (opt_ssbd) {
-		int bit = -1;
-
-		switch (c->x86) {
-		case 0x15: bit = 54; break;
-		case 0x16: bit = 33; break;
-		case 0x17: bit = 10; break;
-		}
-
-		if (bit >= 0 && !rdmsr_safe(MSR_AMD64_LS_CFG, value)) {
-			value |= 1ull << bit;
-			wrmsr_safe(MSR_AMD64_LS_CFG, value);
-		}
-	}
+	amd_init_ssbd(c);
 
 	/* MFENCE stops RDTSC speculation */
 	if (!cpu_has_lfence_dispatch)
diff --git a/xen/arch/x86/cpu/cpu.h b/xen/arch/x86/cpu/cpu.h
index c2f4d9a06ae5..23ccd6611565 100644
--- a/xen/arch/x86/cpu/cpu.h
+++ b/xen/arch/x86/cpu/cpu.h
@@ -19,3 +19,4 @@ extern void detect_ht(struct cpuinfo_x86 *c);
 extern bool detect_extended_topology(struct cpuinfo_x86 *c);
 
 void early_init_amd(struct cpuinfo_x86 *c);
+void amd_init_ssbd(const struct cpuinfo_x86 *c);
diff --git a/xen/arch/x86/cpu/hygon.c b/xen/arch/x86/cpu/hygon.c
index 9ab7aa8622e7..b769e8ad9062 100644
--- a/xen/arch/x86/cpu/hygon.c
+++ b/xen/arch/x86/cpu/hygon.c
@@ -59,14 +59,7 @@ static void init_hygon(struct cpuinfo_x86 *c)
 		__set_bit(X86_FEATURE_LFENCE_DISPATCH,
 			  c->x86_capability);
 
-	/*
-	 * If the user has explicitly chosen to disable Memory Disambiguation
-	 * to mitigiate Speculative Store Bypass, poke the appropriate MSR.
-	 */
-	if (opt_ssbd && !rdmsr_safe(MSR_AMD64_LS_CFG, value)) {
-		value |= 1ull << 10;
-		wrmsr_safe(MSR_AMD64_LS_CFG, value);
-	}
+	amd_init_ssbd(c);
 
 	/* MFENCE stops RDTSC speculation */
 	if (!cpu_has_lfence_dispatch)
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index fec145a7b9e1..be3528c28ebb 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -331,6 +331,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (caps & ARCH_CAPS_IBRS_ALL)                       ? " IBRS_ALL"       : "",
            (caps & ARCH_CAPS_RSBA)                           ? " RSBA"           : "",
            (caps & ARCH_CAPS_SKIP_L1DFL)                     ? " SKIP_L1DFL"     : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_SSB_NO)) ||
            (caps & ARCH_CAPS_SSB_NO)                         ? " SSB_NO"         : "",
            (caps & ARCH_CAPS_MDS_NO)                         ? " MDS_NO"         : "",
            (caps & ARCH_CAPS_TAA_NO)                         ? " TAA_NO"         : "",
@@ -339,15 +340,17 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (caps & ARCH_CAPS_PSDP_NO)                        ? " PSDP_NO"        : "");
 
     /* Hardware features which need driving to mitigate issues. */
-    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s\n",
            (e8b  & cpufeat_mask(X86_FEATURE_IBPB)) ||
            (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB))          ? " IBPB"           : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB))          ? " IBRS"           : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_STIBP))          ? " STIBP"          : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_AMD_SSBD)) ||
            (_7d0 & cpufeat_mask(X86_FEATURE_SSBD))           ? " SSBD"           : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH))      ? " L1D_FLUSH"      : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR))       ? " MD_CLEAR"       : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_SRBDS_CTRL))     ? " SRBDS_CTRL"     : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_VIRT_SSBD))      ? " VIRT_SSBD"      : "",
            (caps & ARCH_CAPS_TSX_CTRL)                       ? " TSX_CTRL"       : "",
            (caps & ARCH_CAPS_FB_CLEAR)                       ? " FB_CLEAR"       : "",
            (caps & ARCH_CAPS_FB_CLEAR_CTRL)                  ? " FB_CLEAR_CTRL"  : "");
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Print all AMD speculative hints/features

We already print Intel features that aren't yet implemented/used, so be
consistent on AMD too.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 3d189f16a11d5a209fb47fa3635847608d43745c)
[Forward port over XSA-398]

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index be3528c28ebb..0cc9b3fed52c 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -326,7 +326,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * Hardware read-only information, stating immunity to certain issues, or
      * suggestions of which mitigation to use.
      */
-    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (caps & ARCH_CAPS_RDCL_NO)                        ? " RDCL_NO"        : "",
            (caps & ARCH_CAPS_IBRS_ALL)                       ? " IBRS_ALL"       : "",
            (caps & ARCH_CAPS_RSBA)                           ? " RSBA"           : "",
@@ -337,16 +337,23 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (caps & ARCH_CAPS_TAA_NO)                         ? " TAA_NO"         : "",
            (caps & ARCH_CAPS_SBDR_SSDP_NO)                   ? " SBDR_SSDP_NO"   : "",
            (caps & ARCH_CAPS_FBSDP_NO)                       ? " FBSDP_NO"       : "",
-           (caps & ARCH_CAPS_PSDP_NO)                        ? " PSDP_NO"        : "");
+           (caps & ARCH_CAPS_PSDP_NO)                        ? " PSDP_NO"        : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_ALWAYS))    ? " IBRS_ALWAYS"    : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS))   ? " STIBP_ALWAYS"   : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_FAST))      ? " IBRS_FAST"      : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "");
 
     /* Hardware features which need driving to mitigate issues. */
-    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (e8b  & cpufeat_mask(X86_FEATURE_IBPB)) ||
            (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB))          ? " IBPB"           : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_IBRS)) ||
            (_7d0 & cpufeat_mask(X86_FEATURE_IBRSB))          ? " IBRS"           : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_AMD_STIBP)) ||
            (_7d0 & cpufeat_mask(X86_FEATURE_STIBP))          ? " STIBP"          : "",
            (e8b  & cpufeat_mask(X86_FEATURE_AMD_SSBD)) ||
            (_7d0 & cpufeat_mask(X86_FEATURE_SSBD))           ? " SSBD"           : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_PSFD))           ? " PSFD"           : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_L1D_FLUSH))      ? " L1D_FLUSH"      : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_MD_CLEAR))       ? " MD_CLEAR"       : "",
            (_7d0 & cpufeat_mask(X86_FEATURE_SRBDS_CTRL))     ? " SRBDS_CTRL"     : "",
@@ -367,14 +374,19 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s, Other:%s%s%s%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s, Other:%s%s%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
            thunk == THUNK_JMP       ? "JMP" : "?",
-           !boot_cpu_has(X86_FEATURE_IBRSB)          ? "No" :
+           (!boot_cpu_has(X86_FEATURE_IBRSB) &&
+            !boot_cpu_has(X86_FEATURE_IBRS))         ? "No" :
            (default_xen_spec_ctrl & SPEC_CTRL_IBRS)  ? "IBRS+" :  "IBRS-",
-           !boot_cpu_has(X86_FEATURE_SSBD)           ? "" :
+           (!boot_cpu_has(X86_FEATURE_STIBP) &&
+            !boot_cpu_has(X86_FEATURE_AMD_STIBP))    ? "" :
+           (default_xen_spec_ctrl & SPEC_CTRL_STIBP) ? " STIBP+" : " STIBP-",
+           (!boot_cpu_has(X86_FEATURE_SSBD) &&
+            !boot_cpu_has(X86_FEATURE_AMD_SSBD))     ? "" :
            (default_xen_spec_ctrl & SPEC_CTRL_SSBD)  ? " SSBD+" : " SSBD-",
            !(caps & ARCH_CAPS_TSX_CTRL)              ? "" :
            (opt_tsx & 1)                             ? " TSX+" : " TSX-",
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Drop use_spec_ctrl boolean

Several bugfixes have reduced the utility of this variable from its original
purpose, and now all it does is aid in the setup of SCF_ist_wrmsr.

Simplify the logic by dropping the variable, and doubling up the setting of
SCF_ist_wrmsr for the PV and HVM blocks, which will make the AMD SPEC_CTRL
support easier to follow.  Leave a comment explaining why SCF_ist_wrmsr is
still necessary for the VMExit case.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit ec083bf552c35e10347449e21809f4780f8155d2)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 0cc9b3fed52c..a072ec987793 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -923,7 +923,7 @@ void spec_ctrl_init_domain(struct domain *d)
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
-    bool use_spec_ctrl = false, ibrs = false, hw_smt_enabled;
+    bool ibrs = false, hw_smt_enabled;
     bool cpu_has_bug_taa;
     uint64_t caps = 0;
 
@@ -998,19 +998,21 @@ void __init init_speculation_mitigations(void)
     {
         if ( opt_msr_sc_pv )
         {
-            use_spec_ctrl = true;
+            default_spec_ctrl_flags |= SCF_ist_wrmsr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_PV);
         }
 
         if ( opt_msr_sc_hvm )
         {
-            use_spec_ctrl = true;
+            /*
+             * While the guest MSR_SPEC_CTRL value is loaded/saved atomically,
+             * Xen's value is not restored atomically.  An early NMI hitting
+             * the VMExit path needs to restore Xen's value for safety.
+             */
+            default_spec_ctrl_flags |= SCF_ist_wrmsr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
         }
 
-        if ( use_spec_ctrl )
-            default_spec_ctrl_flags |= SCF_ist_wrmsr;
-
         if ( ibrs )
             default_xen_spec_ctrl |= SPEC_CTRL_IBRS;
     }
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Introduce new has_spec_ctrl boolean

Most MSR_SPEC_CTRL setup will be common between Intel and AMD.  Instead of
opencoding an OR of two features everywhere, introduce has_spec_ctrl.

Reword the comment above the Intel specific alternatives block to highlight
that it is Intel specific, and pull the setting of default_xen_spec_ctrl.IBRS
out because it will want to be common.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 5d9eff3a312763d889cfbf3c8468b6dfb3ab490c)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index a072ec987793..422714f3a211 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -923,7 +923,7 @@ void spec_ctrl_init_domain(struct domain *d)
 void __init init_speculation_mitigations(void)
 {
     enum ind_thunk thunk = THUNK_DEFAULT;
-    bool ibrs = false, hw_smt_enabled;
+    bool has_spec_ctrl, ibrs = false, hw_smt_enabled;
     bool cpu_has_bug_taa;
     uint64_t caps = 0;
 
@@ -932,6 +932,8 @@ void __init init_speculation_mitigations(void)
 
     hw_smt_enabled = check_smt_enabled();
 
+    has_spec_ctrl = boot_cpu_has(X86_FEATURE_IBRSB);
+
     /*
      * Has the user specified any custom BTI mitigations?  If so, follow their
      * instructions exactly and disable all heuristics.
@@ -955,11 +957,11 @@ void __init init_speculation_mitigations(void)
              */
             if ( retpoline_safe(caps) )
                 thunk = THUNK_RETPOLINE;
-            else if ( boot_cpu_has(X86_FEATURE_IBRSB) )
+            else if ( has_spec_ctrl )
                 ibrs = true;
         }
         /* Without compiler thunk support, use IBRS if available. */
-        else if ( boot_cpu_has(X86_FEATURE_IBRSB) )
+        else if ( has_spec_ctrl )
             ibrs = true;
     }
 
@@ -990,10 +992,7 @@ void __init init_speculation_mitigations(void)
     else if ( thunk == THUNK_JMP )
         setup_force_cpu_cap(X86_FEATURE_IND_THUNK_JMP);
 
-    /*
-     * If we are on hardware supporting MSR_SPEC_CTRL, see about setting up
-     * the alternatives blocks so we can virtualise support for guests.
-     */
+    /* Intel hardware: MSR_SPEC_CTRL alternatives setup. */
     if ( boot_cpu_has(X86_FEATURE_IBRSB) )
     {
         if ( opt_msr_sc_pv )
@@ -1012,11 +1011,12 @@ void __init init_speculation_mitigations(void)
             default_spec_ctrl_flags |= SCF_ist_wrmsr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
         }
-
-        if ( ibrs )
-            default_xen_spec_ctrl |= SPEC_CTRL_IBRS;
     }
 
+    /* If we have IBRS available, see whether we should use it. */
+    if ( has_spec_ctrl && ibrs )
+        default_xen_spec_ctrl |= SPEC_CTRL_IBRS;
+
     /* If we have SSBD available, see whether we should use it. */
     if ( boot_cpu_has(X86_FEATURE_SSBD) && opt_ssbd )
         default_xen_spec_ctrl |= SPEC_CTRL_SSBD;
@@ -1250,7 +1250,7 @@ void __init init_speculation_mitigations(void)
      * boot won't have any other code running in a position to mount an
      * attack.
      */
-    if ( boot_cpu_has(X86_FEATURE_IBRSB) )
+    if ( has_spec_ctrl )
     {
         bsp_delay_spec_ctrl = !cpu_has_hypervisor && default_xen_spec_ctrl;
 
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Don't use spec_ctrl_{enter,exit}_idle() for S3

'idle' here refers to hlt/mwait.  The S3 path isn't an idle path - it is a
platform reset.

We need to load default_xen_spec_ctrl unilaterally on the way back up.
Currently it happens as a side effect of X86_FEATURE_SC_MSR_IDLE or the next
return-to-guest, but that's fragile behaviour.

Conversely, there is no need to clear IBRS and flush the store buffers on the
way down; we're microseconds away from cutting power.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 71fac402e05ade7b0af2c34f77517449f6f7e2c1)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 19b04aed2208..6424a9021293 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -245,7 +245,6 @@ static int enter_state(u32 state)
         error = 0;
 
     ci = get_cpu_info();
-    spec_ctrl_enter_idle(ci);
     /* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */
     ci->spec_ctrl_flags &= ~SCF_ist_wrmsr;
 
@@ -295,7 +294,9 @@ static int enter_state(u32 state)
 
     /* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
     ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr);
-    spec_ctrl_exit_idle(ci);
+
+    if ( boot_cpu_has(X86_FEATURE_IBRSB) )
+        wrmsrl(MSR_SPEC_CTRL, default_xen_spec_ctrl);
 
     if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) )
         wrmsrl(MSR_MCU_OPT_CTRL, default_xen_mcu_opt_ctrl);
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Use common MSR_SPEC_CTRL logic for AMD

Currently, amd_init_ssbd() works by being the only write to MSR_SPEC_CTRL in
the system.  This ceases to be true when using the common logic.

Include AMD MSR_SPEC_CTRL in has_spec_ctrl to activate the common paths, and
introduce an AMD-specific block to control alternatives.  Also update the
boot/resume paths to configure default_xen_spec_ctrl.

svm.h needs an adjustment to remove a dependency on include order.

For now, only activate alternatives for HVM; PV will require more work.  No
functional change, as no alternatives are defined for HVM yet.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 378f2e6df31442396f0afda19794c5c6091d96f9)
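
For reference, the AMD-side enumeration which this patch keys off lives in
CPUID leaf 0x80000008 EBX (word 8, "e8b", in Xen's cpufeatureset.h).  Below
is a minimal userspace sketch of the probe, assuming the usual bit positions
(IBPB=12, IBRS=14, STIBP=15, AMD_SSBD=24) and GCC's <cpuid.h> helper; it is
an illustration of the CPUID interface, not an extract from the patch:

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* __get_cpuid() checks the maximum extended leaf for us. */
        if ( !__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx) )
            return 1;

        printf("IBPB:  %u\n", !!(ebx & (1u << 12)));
        printf("IBRS:  %u\n", !!(ebx & (1u << 14)));
        printf("STIBP: %u\n", !!(ebx & (1u << 15)));
        printf("SSBD:  %u\n", !!(ebx & (1u << 24)));
        return 0;
    }

Xen itself consults its cached boot_cpu_has() view of the same bits, as in
the hunks below.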

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 6424a9021293..df77b2ca5cfb 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -295,7 +295,7 @@ static int enter_state(u32 state)
     /* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
     ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr);
 
-    if ( boot_cpu_has(X86_FEATURE_IBRSB) )
+    if ( boot_cpu_has(X86_FEATURE_IBRSB) || boot_cpu_has(X86_FEATURE_IBRS) )
         wrmsrl(MSR_SPEC_CTRL, default_xen_spec_ctrl);
 
     if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) )
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index de0389810b25..d5dece37cdeb 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -552,7 +552,7 @@ void amd_init_ssbd(const struct cpuinfo_x86 *c)
 		return;
 
 	if (cpu_has_amd_ssbd) {
-		wrmsrl(MSR_SPEC_CTRL, opt_ssbd ? SPEC_CTRL_SSBD : 0);
+		/* Handled by common MSR_SPEC_CTRL logic */
 		return;
 	}
 
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 2d525c8b2c9c..91e2ae626300 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -365,7 +365,7 @@ void start_secondary(void *unused)
      * settings.  Note: These MSRs may only become available after loading
      * microcode.
      */
-    if ( boot_cpu_has(X86_FEATURE_IBRSB) )
+    if ( boot_cpu_has(X86_FEATURE_IBRSB) || boot_cpu_has(X86_FEATURE_IBRS) )
         wrmsrl(MSR_SPEC_CTRL, default_xen_spec_ctrl);
     if ( boot_cpu_has(X86_FEATURE_SRBDS_CTRL) )
         wrmsrl(MSR_MCU_OPT_CTRL, default_xen_mcu_opt_ctrl);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 422714f3a211..a768c57b45e0 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -21,6 +21,7 @@
 #include <xen/lib.h>
 #include <xen/warning.h>
 
+#include <asm/hvm/svm/svm.h>
 #include <asm/microcode.h>
 #include <asm/msr.h>
 #include <asm/processor.h>
@@ -932,7 +933,8 @@ void __init init_speculation_mitigations(void)
 
     hw_smt_enabled = check_smt_enabled();
 
-    has_spec_ctrl = boot_cpu_has(X86_FEATURE_IBRSB);
+    has_spec_ctrl = (boot_cpu_has(X86_FEATURE_IBRSB) ||
+                     boot_cpu_has(X86_FEATURE_IBRS));
 
     /*
      * Has the user specified any custom BTI mitigations?  If so, follow their
@@ -1013,12 +1015,32 @@ void __init init_speculation_mitigations(void)
         }
     }
 
+    /* AMD hardware: MSR_SPEC_CTRL alternatives setup. */
+    if ( boot_cpu_has(X86_FEATURE_IBRS) )
+    {
+        /*
+         * Virtualising MSR_SPEC_CTRL for guests depends on SVM support, which
+         * on real hardware matches the availability of MSR_SPEC_CTRL in the
+         * first place.
+         *
+         * No need for SCF_ist_wrmsr because Xen's value is restored
+         * atomically WRT NMIs in the VMExit path.
+         *
+         * TODO: Adjust cpu_has_svm_spec_ctrl to be usable earlier on boot.
+         */
+        if ( opt_msr_sc_hvm &&
+             (boot_cpu_data.extended_cpuid_level >= 0x8000000a) &&
+             (cpuid_edx(0x8000000a) & (1u << SVM_FEATURE_SPEC_CTRL)) )
+            setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
+    }
+
     /* If we have IBRS available, see whether we should use it. */
     if ( has_spec_ctrl && ibrs )
         default_xen_spec_ctrl |= SPEC_CTRL_IBRS;
 
     /* If we have SSBD available, see whether we should use it. */
-    if ( boot_cpu_has(X86_FEATURE_SSBD) && opt_ssbd )
+    if ( opt_ssbd && (boot_cpu_has(X86_FEATURE_SSBD) ||
+                      boot_cpu_has(X86_FEATURE_AMD_SSBD)) )
         default_xen_spec_ctrl |= SPEC_CTRL_SSBD;
 
     /*
diff --git a/xen/include/asm-x86/hvm/svm/svm.h b/xen/include/asm-x86/hvm/svm/svm.h
index 2b522ee79b17..f28b40996ca0 100644
--- a/xen/include/asm-x86/hvm/svm/svm.h
+++ b/xen/include/asm-x86/hvm/svm/svm.h
@@ -45,6 +45,9 @@ static inline void svm_invlpga(unsigned long linear, uint32_t asid)
         "a" (linear), "c" (asid));
 }
 
+struct cpu_user_regs;
+struct vcpu;
+
 unsigned long *svm_msrbit(unsigned long *msr_bitmap, uint32_t msr);
 void __update_guest_eip(struct cpu_user_regs *regs, unsigned int inst_len);
 void svm_update_guest_cr(struct vcpu *, unsigned int cr, unsigned int flags);
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Only adjust MSR_SPEC_CTRL for idle with legacy IBRS

Back at the time of the original Spectre-v2 fixes, it was recommended to clear
MSR_SPEC_CTRL when going idle.  This is because of the side effects on the
sibling thread caused by the microcode IBRS and STIBP implementations which
were retrofitted to existing CPUs.

However, there are no relevant cross-thread impacts for the hardware
IBRS/STIBP implementations, so this logic should not be used on Intel CPUs
supporting eIBRS, or any AMD CPUs; doing so only adds unnecessary latency to
the idle path.

Furthermore, there's no point playing with MSR_SPEC_CTRL in the idle paths if
SMT is disabled for other reasons.

Fixes: 8d03080d2a33 ("x86/spec-ctrl: Cease using thunk=lfence on AMD")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit ffc7694e0c99eea158c32aa164b7d1e1bb1dc46b)
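
Restated in C for clarity (this mirrors the hunk below; only the
clause-by-clause commentary is new):

    /* Only microcoded Intel IBRS warrants idle-path MSR writes ... */
    if ( boot_cpu_has(X86_FEATURE_IBRSB) &&   /* IBRS exists,              */
         !(caps & ARCH_CAPS_IBRS_ALL) &&      /* isn't the eIBRS flavour,  */
         hw_smt_enabled &&                    /* siblings are active, and  */
         default_xen_spec_ctrl )              /* there is a value to clear */
        setup_force_cpu_cap(X86_FEATURE_SC_MSR_IDLE);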

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index a768c57b45e0..e037e9b52ea2 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -1080,8 +1080,14 @@ void __init init_speculation_mitigations(void)
     /* (Re)init BSP state now that default_spec_ctrl_flags has been calculated. */
     init_shadow_spec_ctrl_state();
 
-    /* If Xen is using any MSR_SPEC_CTRL settings, adjust the idle path. */
-    if ( default_xen_spec_ctrl )
+    /*
+     * For microcoded IBRS only (i.e. Intel, pre eIBRS), it is recommended to
+     * clear MSR_SPEC_CTRL before going idle, to avoid impacting sibling
+     * threads.  Activate this if SMT is enabled, and Xen is using a non-zero
+     * MSR_SPEC_CTRL setting.
+     */
+    if ( boot_cpu_has(X86_FEATURE_IBRSB) && !(caps & ARCH_CAPS_IBRS_ALL) &&
+         hw_smt_enabled && default_xen_spec_ctrl )
         setup_force_cpu_cap(X86_FEATURE_SC_MSR_IDLE);
 
     xpti_init_default(caps);
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index bcba926bda41..fc54c24a34ef 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -33,7 +33,7 @@ XEN_CPUFEATURE(SC_MSR_HVM,        X86_SYNTH(17)) /* MSR_SPEC_CTRL used by Xen fo
 XEN_CPUFEATURE(SC_RSB_PV,         X86_SYNTH(18)) /* RSB overwrite needed for PV */
 XEN_CPUFEATURE(SC_RSB_HVM,        X86_SYNTH(19)) /* RSB overwrite needed for HVM */
 XEN_CPUFEATURE(XEN_SELFSNOOP,     X86_SYNTH(20)) /* SELFSNOOP gets used by Xen itself */
-XEN_CPUFEATURE(SC_MSR_IDLE,       X86_SYNTH(21)) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
+XEN_CPUFEATURE(SC_MSR_IDLE,       X86_SYNTH(21)) /* Clear MSR_SPEC_CTRL on idle */
 XEN_CPUFEATURE(XEN_LBR,           X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
 /* Bits 23,24 unused. */
 XEN_CPUFEATURE(SC_VERW_IDLE,      X86_SYNTH(25)) /* VERW used by Xen for idle */
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 157a2c67d89c..ffc054975ee3 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -80,7 +80,8 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     uint32_t val = 0;
 
     /*
-     * Branch Target Injection:
+     * It is recommended in some cases to clear MSR_SPEC_CTRL when going idle,
+     * to avoid impacting sibling threads.
      *
      * Latch the new shadow value, then enable shadowing, then update the MSR.
      * There are no SMP issues here; only local processor ordering concerns.
@@ -116,7 +117,7 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     uint32_t val = info->xen_spec_ctrl;
 
     /*
-     * Branch Target Injection:
+     * Restore MSR_SPEC_CTRL on exit from idle.
      *
      * Disable shadowing before updating the MSR.  There are no SMP issues
      * here; only local processor ordering concerns.
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Knobs for STIBP and PSFD, and follow hardware STIBP
 hint

STIBP and PSFD are slightly weird bits, because they're both implied by other
bits in MSR_SPEC_CTRL.  Add fine grain controls for them, and take the
implications into account when setting IBRS/SSBD.

Rearrange the IBPB text/variables/logic to keep all the MSR_SPEC_CTRL bits
together, for consistency.

However, AMD have a hardware hint CPUID bit recommending that STIBP be set
unilaterally.  This is advertised on Zen3, so follow the recommendation.
Furthermore, in such cases, set STIBP behind the guest's back for now.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit fef244b179c06fcdfa581f7d57fa6e578c49ff50)
[Adjust for absence of INTEL_PSFD]
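
The new `spec-ctrl=stibp=<bool>` and `spec-ctrl=psfd=<bool>` knobs are
tristates resolved at boot.  A standalone sketch of the resolution order for
STIBP (resolve_stibp() is a hypothetical name, not a Xen function):

    #include <stdbool.h>

    /* -1 = no explicit user choice on the command line. */
    static int resolve_stibp(int opt_stibp, bool using_ibrs,
                             bool stibp_always_hint)
    {
        /* 1) IBRS implies STIBP, so an IBRS user gets STIBP too. */
        if ( opt_stibp == -1 && using_ibrs )
            opt_stibp = 1;

        /* 2) Otherwise, follow the hardware hint (STIBP_ALWAYS on Zen3). */
        if ( opt_stibp == -1 )
            opt_stibp = stibp_always_hint;

        return opt_stibp;
    }

PSFD follows the same shape, with SSBD as the implying bit and a fixed
default of off.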

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 022cb01da762..a998418585bd 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2034,8 +2034,9 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 
 ### spec-ctrl (x86)
 > `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
->              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
->              l1d-flush,branch-harden,srb-lock,unpriv-mmio}=<bool> ]`
+>              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
+>              eager-fpu,l1d-flush,branch-harden,srb-lock,
+>              unpriv-mmio}=<bool> ]`
 
 Controls for speculative execution sidechannel mitigations.  By default, Xen
 will pick the most appropriate mitigations based on compiled in support,
@@ -2085,9 +2086,10 @@ On hardware supporting IBRS (Indirect Branch Restricted Speculation), the
 If Xen is not using IBRS itself, functionality is still set up so IBRS can be
 virtualised for guests.
 
-On hardware supporting IBPB (Indirect Branch Prediction Barrier), the `ibpb=`
-option can be used to force (the default) or prevent Xen from issuing branch
-prediction barriers on vcpu context switches.
+On hardware supporting STIBP (Single Thread Indirect Branch Predictors), the
+`stibp=` option can be used to force or prevent Xen using the feature itself.
+By default, Xen will use STIBP when IBRS is in use (IBRS implies STIBP), and
+when hardware hints recommend using it as a blanket setting.
 
 On hardware supporting SSBD (Speculative Store Bypass Disable), the `ssbd=`
 option can be used to force or prevent Xen using the feature itself.  On AMD
@@ -2095,6 +2097,15 @@ hardware, this is a global option applied at boot, and not virtualised for
 guest use.  On Intel hardware, the feature is virtualised for guests,
 independently of Xen's choice of setting.
 
+On hardware supporting PSFD (Predictive Store Forwarding Disable), the `psfd=`
+option can be used to force or prevent Xen using the feature itself.  By
+default, Xen will not use PSFD.  PSFD is implied by SSBD, and SSBD is off by
+default.
+
+On hardware supporting IBPB (Indirect Branch Prediction Barrier), the `ibpb=`
+option can be used to force (the default) or prevent Xen from issuing branch
+prediction barriers on vcpu context switches.
+
 On all hardware, the `eager-fpu=` option can be used to force or prevent Xen
 from using fully eager FPU context switches.  This is currently implemented as
 a global control.  By default, Xen will choose to use fully eager context
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index e037e9b52ea2..a4121e25cc63 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -48,9 +48,13 @@ static enum ind_thunk {
     THUNK_LFENCE,
     THUNK_JMP,
 } opt_thunk __initdata = THUNK_DEFAULT;
+
 static int8_t __initdata opt_ibrs = -1;
+int8_t __initdata opt_stibp = -1;
+bool __read_mostly opt_ssbd;
+int8_t __initdata opt_psfd = -1;
+
 bool __read_mostly opt_ibpb = true;
-bool __read_mostly opt_ssbd = false;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 bool __read_mostly opt_branch_harden = true;
@@ -174,12 +178,20 @@ static int __init parse_spec_ctrl(const char *s)
             else
                 rc = -EINVAL;
         }
+
+        /* Bits in MSR_SPEC_CTRL. */
         else if ( (val = parse_boolean("ibrs", s, ss)) >= 0 )
             opt_ibrs = val;
-        else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
-            opt_ibpb = val;
+        else if ( (val = parse_boolean("stibp", s, ss)) >= 0 )
+            opt_stibp = val;
         else if ( (val = parse_boolean("ssbd", s, ss)) >= 0 )
             opt_ssbd = val;
+        else if ( (val = parse_boolean("psfd", s, ss)) >= 0 )
+            opt_psfd = val;
+
+        /* Misc settings. */
+        else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
+            opt_ibpb = val;
         else if ( (val = parse_boolean("eager-fpu", s, ss)) >= 0 )
             opt_eager_fpu = val;
         else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
@@ -375,7 +387,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s, Other:%s%s%s%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
@@ -389,6 +401,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (!boot_cpu_has(X86_FEATURE_SSBD) &&
             !boot_cpu_has(X86_FEATURE_AMD_SSBD))     ? "" :
            (default_xen_spec_ctrl & SPEC_CTRL_SSBD)  ? " SSBD+" : " SSBD-",
+           !boot_cpu_has(X86_FEATURE_PSFD)           ? "" :
+           (default_xen_spec_ctrl & SPEC_CTRL_PSFD)  ? " PSFD+" : " PSFD-",
            !(caps & ARCH_CAPS_TSX_CTRL)              ? "" :
            (opt_tsx & 1)                             ? " TSX+" : " TSX-",
            !boot_cpu_has(X86_FEATURE_SRBDS_CTRL)     ? "" :
@@ -1034,14 +1048,48 @@ void __init init_speculation_mitigations(void)
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
     }
 
-    /* If we have IBRS available, see whether we should use it. */
+    /* Figure out default_xen_spec_ctrl. */
     if ( has_spec_ctrl && ibrs )
+    {
+        /* IBRS implies STIBP.  */
+        if ( opt_stibp == -1 )
+            opt_stibp = 1;
+
         default_xen_spec_ctrl |= SPEC_CTRL_IBRS;
+    }
+
+    /*
+     * Use STIBP by default if the hardware hint is set.  Otherwise, leave it
+     * off, as it carries a severe performance penalty on pre-eIBRS Intel
+     * hardware where it was retrofitted in microcode.
+     */
+    if ( opt_stibp == -1 )
+        opt_stibp = !!boot_cpu_has(X86_FEATURE_STIBP_ALWAYS);
+
+    if ( opt_stibp && (boot_cpu_has(X86_FEATURE_STIBP) ||
+                       boot_cpu_has(X86_FEATURE_AMD_STIBP)) )
+        default_xen_spec_ctrl |= SPEC_CTRL_STIBP;
 
-    /* If we have SSBD available, see whether we should use it. */
     if ( opt_ssbd && (boot_cpu_has(X86_FEATURE_SSBD) ||
                       boot_cpu_has(X86_FEATURE_AMD_SSBD)) )
+    {
+        /* SSBD implies PSFD */
+        if ( opt_psfd == -1 )
+            opt_psfd = 1;
+
         default_xen_spec_ctrl |= SPEC_CTRL_SSBD;
+    }
+
+    /*
+     * Don't use PSFD by default.  AMD designed the predictor to
+     * auto-clear on privilege change.  PSFD is implied by SSBD, which is
+     * off by default.
+     */
+    if ( opt_psfd == -1 )
+        opt_psfd = 0;
+
+    if ( opt_psfd && boot_cpu_has(X86_FEATURE_PSFD) )
+        default_xen_spec_ctrl |= SPEC_CTRL_PSFD;
 
     /*
      * PV guests can poison the RSB to any virtual address from which
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: xen/cmdline: Extend parse_boolean() to signal a name match

This will help when parsing a sub-option which has both boolean and
non-boolean forms available.

First, rework 'int val' into 'bool has_neg_prefix'.  This inverts its value,
but the resulting logic is far easier to follow.

Second, reject anything of the form 'no-$FOO=', which excludes ambiguous
constructs such as 'no-$foo=yes' that have never been valid.

This just leaves the case where everything is otherwise fine, but parse_bool()
can't interpret the provided string.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 382326cac528dd1eb0d04efd5c05363c453e29f4)
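
To make the new contract concrete, here is a standalone model with the
expected results.  model_parse_boolean() is a simplified stand-in, not the
Xen function: it ignores the end pointer and inlines a trivial parse_bool(),
but the return values follow the reworked semantics:

    #include <assert.h>
    #include <string.h>

    static int model_parse_boolean(const char *name, const char *s)
    {
        size_t nlen = strlen(name), slen;
        int has_neg_prefix = !strncmp(s, "no-", 3);

        if ( has_neg_prefix )
            s += 3;
        slen = strlen(s);

        if ( slen < nlen || strncmp(s, name, nlen) )
            return -1;                    /* different name entirely */

        if ( slen == nlen )               /* exact, unadorned name */
            return !has_neg_prefix;

        if ( has_neg_prefix )             /* 'no-$FOO=...' is never valid */
            return -1;

        if ( s[nlen] == '=' )             /* inlined, trivial parse_bool() */
        {
            const char *v = &s[nlen + 1];

            if ( !strcmp(v, "1") || !strcmp(v, "yes") )
                return 1;
            if ( !strcmp(v, "0") || !strcmp(v, "no") )
                return 0;
            return -2;                    /* name matched, not a boolean */
        }

        return -1;
    }

    int main(void)
    {
        assert(model_parse_boolean("rsb", "rsb")        ==  1);
        assert(model_parse_boolean("rsb", "no-rsb")     ==  0);
        assert(model_parse_boolean("rsb", "rsb=no")     ==  0);
        assert(model_parse_boolean("rsb", "rsb=pv")     == -2); /* new */
        assert(model_parse_boolean("rsb", "no-rsb=yes") == -1);
        return 0;
    }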

diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index de2fd1e334d2..7ba07eaaa6d1 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -275,9 +275,9 @@ int parse_bool(const char *s, const char *e)
 int parse_boolean(const char *name, const char *s, const char *e)
 {
     size_t slen, nlen;
-    int val = !!strncmp(s, "no-", 3);
+    bool has_neg_prefix = !strncmp(s, "no-", 3);
 
-    if ( !val )
+    if ( has_neg_prefix )
         s += 3;
 
     slen = e ? ({ ASSERT(e >= s); e - s; }) : strlen(s);
@@ -289,11 +289,23 @@ int parse_boolean(const char *name, const char *s, const char *e)
 
     /* Exact, unadorned name?  Result depends on the 'no-' prefix. */
     if ( slen == nlen )
-        return val;
+        return !has_neg_prefix;
+
+    /* Inexact match with a 'no-' prefix?  Not valid. */
+    if ( has_neg_prefix )
+        return -1;
 
     /* =$SOMETHING?  Defer to the regular boolean parsing. */
     if ( s[nlen] == '=' )
-        return parse_bool(&s[nlen + 1], e);
+    {
+        int b = parse_bool(&s[nlen + 1], e);
+
+        if ( b >= 0 )
+            return b;
+
+        /* Not a boolean, but the name matched.  Signal specially. */
+        return -2;
+    }
 
     /* Unrecognised.  Give up. */
     return -1;
diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h
index 3c27ba902de0..e0a234300a1e 100644
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -79,7 +79,8 @@ int parse_bool(const char *s, const char *e);
 /**
  * Given a specific name, parses a string of the form:
  *   [no-]$NAME[=...]
- * returning 0 or 1 for a recognised boolean, or -1 for an error.
+ * returning 0 or 1 for a recognised boolean.  Returns -1 for general errors,
+ * and -2 for "not a boolean, but $NAME= matches".
  */
 int parse_boolean(const char *name, const char *s, const char *e);
 
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Add fine-grained cmdline suboptions for primitives

Support controlling the PV/HVM suboption of msr-sc/rsb/md-clear, which
previously wasn't possible.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 27357c394ba6e1571a89105b840ce1c6f026485c)
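
By way of example, the new syntax maps onto the opt_* variables as follows
(derived from the parsing logic below):

    spec-ctrl=rsb           ->  opt_rsb_pv = opt_rsb_hvm = 1
    spec-ctrl=no-rsb        ->  opt_rsb_pv = opt_rsb_hvm = 0
    spec-ctrl=rsb=no-hvm    ->  opt_rsb_hvm = 0, opt_rsb_pv unchanged
    spec-ctrl=msr-sc=pv    ->  opt_msr_sc_pv = 1, opt_msr_sc_hvm unchanged

The -2 "name matched, but not a boolean" signal from the previous patch is
what lets e.g. `rsb=no-hvm` fall through to the nested pv/hvm parse.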

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index a998418585bd..359859b51808 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2033,7 +2033,8 @@ not be able to control the state of the mitigation.
 By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 
 ### spec-ctrl (x86)
-> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
+> `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
+>              {msr-sc,rsb,md-clear}=<bool>|{pv,hvm}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >              eager-fpu,l1d-flush,branch-harden,srb-lock,
 >              unpriv-mmio}=<bool> ]`
@@ -2058,12 +2059,17 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
+The `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` options offer fine
 grained control over the primitives by Xen.  These impact Xen's ability to
-protect itself, and Xen's ability to virtualise support for guests to use.
+protect itself, and/or Xen's ability to virtualise support for guests to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
+* Each other option can be used either as a plain boolean
+  (e.g. `spec-ctrl=rsb` to control both the PV and HVM sub-options), or with
+  `pv=` or `hvm=` subsuboptions (e.g. `spec-ctrl=rsb=no-hvm` to disable HVM
+  RSB only).
+
 * `msr-sc=` offers control over Xen's support for manipulating `MSR_SPEC_CTRL`
   on entry and exit.  These blocks are necessary to virtualise support for
   guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index a4121e25cc63..58af8a28386a 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -148,20 +148,68 @@ static int __init parse_spec_ctrl(const char *s)
             opt_rsb_hvm = val;
             opt_md_clear_hvm = val;
         }
-        else if ( (val = parse_boolean("msr-sc", s, ss)) >= 0 )
+        else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
         {
-            opt_msr_sc_pv = val;
-            opt_msr_sc_hvm = val;
+            switch ( val )
+            {
+            case 0:
+            case 1:
+                opt_msr_sc_pv = opt_msr_sc_hvm = val;
+                break;
+
+            case -2:
+                s += strlen("msr-sc=");
+                if ( (val = parse_boolean("pv", s, ss)) >= 0 )
+                    opt_msr_sc_pv = val;
+                else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+                    opt_msr_sc_hvm = val;
+                else
+            default:
+                    rc = -EINVAL;
+                break;
+            }
         }
-        else if ( (val = parse_boolean("rsb", s, ss)) >= 0 )
+        else if ( (val = parse_boolean("rsb", s, ss)) != -1 )
         {
-            opt_rsb_pv = val;
-            opt_rsb_hvm = val;
+            switch ( val )
+            {
+            case 0:
+            case 1:
+                opt_rsb_pv = opt_rsb_hvm = val;
+                break;
+
+            case -2:
+                s += strlen("rsb=");
+                if ( (val = parse_boolean("pv", s, ss)) >= 0 )
+                    opt_rsb_pv = val;
+                else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+                    opt_rsb_hvm = val;
+                else
+            default:
+                    rc = -EINVAL;
+                break;
+            }
         }
-        else if ( (val = parse_boolean("md-clear", s, ss)) >= 0 )
+        else if ( (val = parse_boolean("md-clear", s, ss)) != -1 )
         {
-            opt_md_clear_pv = val;
-            opt_md_clear_hvm = val;
+            switch ( val )
+            {
+            case 0:
+            case 1:
+                opt_md_clear_pv = opt_md_clear_hvm = val;
+                break;
+
+            case -2:
+                s += strlen("md-clear=");
+                if ( (val = parse_boolean("pv", s, ss)) >= 0 )
+                    opt_md_clear_pv = val;
+                else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+                    opt_md_clear_hvm = val;
+                else
+            default:
+                    rc = -EINVAL;
+                break;
+            }
         }
 
         /* Xen's speculative sidechannel mitigation settings. */
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rework spec_ctrl_flags context switching

We are shortly going to need to context switch new bits in both the vcpu and
S3 paths.  Introduce SCF_IST_MASK and SCF_DOM_MASK, and rework d->arch.verw
into d->arch.spec_ctrl_flags to accommodate.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
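
The merge itself is a two-mask blend.  A standalone illustration, with flag
values copied from the header change below:

    #include <assert.h>
    #include <stdint.h>

    #define SCF_ist_wrmsr (1u << 1)   /* fixed from boot             */
    #define SCF_verw      (1u << 3)   /* context switched per domain */
    #define SCF_DOM_MASK  SCF_verw

    int main(void)
    {
        uint8_t cpu = SCF_ist_wrmsr | SCF_verw; /* outgoing state       */
        uint8_t dom = 0;                        /* next domain: no VERW */

        /* Fixed bits survive; per-domain bits are replaced wholesale. */
        cpu = (cpu & ~SCF_DOM_MASK) | (dom & SCF_DOM_MASK);

        assert(cpu == SCF_ist_wrmsr);
        return 0;
    }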

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index df77b2ca5cfb..f1dbbf7fc920 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -245,8 +245,8 @@ static int enter_state(u32 state)
         error = 0;
 
     ci = get_cpu_info();
-    /* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */
-    ci->spec_ctrl_flags &= ~SCF_ist_wrmsr;
+    /* Avoid NMI/#MC using unsafe MSRs until we've reloaded microcode. */
+    ci->spec_ctrl_flags &= ~SCF_IST_MASK;
 
     ACPI_FLUSH_CPU_CACHE();
 
@@ -292,8 +292,8 @@ static int enter_state(u32 state)
     if ( !recheck_cpu_features(0) )
         panic("Missing previously available feature(s)\n");
 
-    /* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
-    ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr);
+    /* Re-enable default NMI/#MC use of MSRs now microcode is loaded. */
+    ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_IST_MASK);
 
     if ( boot_cpu_has(X86_FEATURE_IBRSB) || boot_cpu_has(X86_FEATURE_IBRS) )
         wrmsrl(MSR_SPEC_CTRL, default_xen_spec_ctrl);
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index fe95b25a034e..0f94270254a6 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1820,10 +1820,10 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
             }
         }
 
-        /* Update the top-of-stack block with the VERW disposition. */
-        info->spec_ctrl_flags &= ~SCF_verw;
-        if ( nextd->arch.verw )
-            info->spec_ctrl_flags |= SCF_verw;
+        /* Update the top-of-stack block with the new spec_ctrl settings. */
+        info->spec_ctrl_flags =
+            (info->spec_ctrl_flags       & ~SCF_DOM_MASK) |
+            (nextd->arch.spec_ctrl_flags &  SCF_DOM_MASK);
     }
 
     sched_context_switched(prev, next);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 58af8a28386a..3030400a72e3 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -978,9 +978,12 @@ void spec_ctrl_init_domain(struct domain *d)
 {
     bool pv = is_pv_domain(d);
 
-    d->arch.verw =
-        (pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
-        (opt_fb_clear_mmio && is_iommu_enabled(d));
+    bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
+                 (opt_fb_clear_mmio && is_iommu_enabled(d)));
+
+    d->arch.spec_ctrl_flags =
+        (verw   ? SCF_verw         : 0) |
+        0;
 }
 
 void __init init_speculation_mitigations(void)
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 71d1ca243b32..0527626a266f 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -295,8 +295,7 @@ struct arch_domain
     uint32_t pci_cf8;
     uint8_t cmos_idx;
 
-    /* Use VERW on return-to-guest for its flushing side effect. */
-    bool verw;
+    uint8_t spec_ctrl_flags; /* See SCF_DOM_MASK */
 
     union {
         struct pv_domain pv;
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index ffc054975ee3..a7b1b8f59084 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -20,12 +20,40 @@
 #ifndef __X86_SPEC_CTRL_H__
 #define __X86_SPEC_CTRL_H__
 
-/* Encoding of cpuinfo.spec_ctrl_flags */
+/*
+ * Encoding of:
+ *   cpuinfo.spec_ctrl_flags
+ *   default_spec_ctrl_flags
+ *   domain.spec_ctrl_flags
+ *
+ * Live settings are in the top-of-stack block, because they need to be
+ * accessible when XPTI is active.  Some settings are fixed from boot, some
+ * context switched per domain, and some inhibited in the S3 path.
+ */
 #define SCF_use_shadow (1 << 0)
 #define SCF_ist_wrmsr  (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
 
+/*
+ * The IST paths (NMI/#MC) can interrupt any arbitrary context.  Some
+ * functionality requires updated microcode to work.
+ *
+ * On boot, this is easy; we load microcode before figuring out which
+ * speculative protections to apply.  However, on the S3 resume path, we must
+ * be able to disable the configured mitigations until microcode is reloaded.
+ *
+ * These are the controls to inhibit on the S3 resume path until microcode has
+ * been reloaded.
+ */
+#define SCF_IST_MASK (SCF_ist_wrmsr)
+
+/*
+ * Some speculative protections are per-domain.  These settings are merged
+ * into the top-of-stack block in the context switch path.
+ */
+#define SCF_DOM_MASK (SCF_verw)
+
 #ifndef __ASSEMBLY__
 
 #include <asm/alternative.h>
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index fe90c80ac39c..12b3111ebdc0 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -255,9 +255,6 @@
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.
- * Fine grain control of SCF_ist_wrmsr is needed for safety in the S3 resume
- * path to avoid using MSR_SPEC_CTRL before the microcode introducing it has
- * been reloaded.
  */
 .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
 /*
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rename SCF_ist_wrmsr to SCF_ist_sc_msr

We are about to introduce SCF_ist_ibpb, at which point SCF_ist_wrmsr becomes
ambiguous.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 3030400a72e3..deab5d990c9f 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -1064,7 +1064,7 @@ void __init init_speculation_mitigations(void)
     {
         if ( opt_msr_sc_pv )
         {
-            default_spec_ctrl_flags |= SCF_ist_wrmsr;
+            default_spec_ctrl_flags |= SCF_ist_sc_msr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_PV);
         }
 
@@ -1075,7 +1075,7 @@ void __init init_speculation_mitigations(void)
              * Xen's value is not restored atomically.  An early NMI hitting
              * the VMExit path needs to restore Xen's value for safety.
              */
-            default_spec_ctrl_flags |= SCF_ist_wrmsr;
+            default_spec_ctrl_flags |= SCF_ist_sc_msr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
         }
     }
@@ -1088,7 +1088,7 @@ void __init init_speculation_mitigations(void)
          * on real hardware matches the availability of MSR_SPEC_CTRL in the
          * first place.
          *
-         * No need for SCF_ist_wrmsr because Xen's value is restored
+         * No need for SCF_ist_sc_msr because Xen's value is restored
          * atomically WRT NMIs in the VMExit path.
          *
          * TODO: Adjust cpu_has_svm_spec_ctrl to be usable earlier on boot.
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index a7b1b8f59084..3b2cd424132e 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -31,7 +31,7 @@
  * context switched per domain, and some inhibited in the S3 path.
  */
 #define SCF_use_shadow (1 << 0)
-#define SCF_ist_wrmsr  (1 << 1)
+#define SCF_ist_sc_msr (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
 
@@ -46,7 +46,7 @@
  * These are the controls to inhibit on the S3 resume path until microcode has
  * been reloaded.
  */
-#define SCF_IST_MASK (SCF_ist_wrmsr)
+#define SCF_IST_MASK (SCF_ist_sc_msr)
 
 /*
  * Some speculative protections are per-domain.  These settings are merged
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 12b3111ebdc0..3e9d6bd11420 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -273,8 +273,8 @@
 
 .L\@_skip_rsb:
 
-    test $SCF_ist_wrmsr, %al
-    jz .L\@_skip_wrmsr
+    test $SCF_ist_sc_msr, %al
+    jz .L\@_skip_msr_spec_ctrl
 
     xor %edx, %edx
     testb $3, UREGS_cs(%rsp)
@@ -297,7 +297,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
      * to speculate around the WRMSR.  As a result, we need a dispatch
      * serialising instruction in the else clause.
      */
-.L\@_skip_wrmsr:
+.L\@_skip_msr_spec_ctrl:
     lfence
     UNLIKELY_END(\@_serialise)
 .endm
@@ -308,7 +308,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
  * Requires %rbx=stack_end
  * Clobbers %rax, %rcx, %rdx
  */
-    testb $SCF_ist_wrmsr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
+    testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
     jz .L\@_skip
 
     DO_SPEC_CTRL_EXIT_TO_XEN
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rename opt_ibpb to opt_ibpb_ctxt_switch

We are about to introduce the use of IBPB at different points in Xen, making
opt_ibpb ambiguous.  Rename it to opt_ibpb_ctxt_switch.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 0f94270254a6..6199f3651457 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1792,7 +1792,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
         ctxt_switch_levelling(next);
 
-        if ( opt_ibpb && !is_idle_domain(nextd) )
+        if ( opt_ibpb_ctxt_switch && !is_idle_domain(nextd) )
         {
             static DEFINE_PER_CPU(unsigned int, last);
             unsigned int *last_id = &this_cpu(last);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index deab5d990c9f..4517339ea669 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -54,7 +54,7 @@ int8_t __initdata opt_stibp = -1;
 bool __read_mostly opt_ssbd;
 int8_t __initdata opt_psfd = -1;
 
-bool __read_mostly opt_ibpb = true;
+bool __read_mostly opt_ibpb_ctxt_switch = true;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 bool __read_mostly opt_branch_harden = true;
@@ -118,7 +118,7 @@ static int __init parse_spec_ctrl(const char *s)
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
-            opt_ibpb = false;
+            opt_ibpb_ctxt_switch = false;
             opt_ssbd = false;
             opt_l1d_flush = 0;
             opt_branch_harden = false;
@@ -239,7 +239,7 @@ static int __init parse_spec_ctrl(const char *s)
 
         /* Misc settings. */
         else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
-            opt_ibpb = val;
+            opt_ibpb_ctxt_switch = val;
         else if ( (val = parse_boolean("eager-fpu", s, ss)) >= 0 )
             opt_eager_fpu = val;
         else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
@@ -455,7 +455,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (opt_tsx & 1)                             ? " TSX+" : " TSX-",
            !boot_cpu_has(X86_FEATURE_SRBDS_CTRL)     ? "" :
            opt_srb_lock                              ? " SRB_LOCK+" : " SRB_LOCK-",
-           opt_ibpb                                  ? " IBPB"  : "",
+           opt_ibpb_ctxt_switch                      ? " IBPB-ctxt" : "",
            opt_l1d_flush                             ? " L1D_FLUSH" : "",
            opt_md_clear_pv || opt_md_clear_hvm ||
            opt_fb_clear_mmio                         ? " VERW"  : "",
@@ -1170,7 +1170,7 @@ void __init init_speculation_mitigations(void)
 
     /* Check we have hardware IBPB support before using it... */
     if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-        opt_ibpb = false;
+        opt_ibpb_ctxt_switch = false;
 
     /* Check whether Eager FPU should be enabled by default. */
     if ( opt_eager_fpu == -1 )
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 3b2cd424132e..495260d6b6fc 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -63,7 +63,7 @@
 void init_speculation_mitigations(void);
 void spec_ctrl_init_domain(struct domain *d);
 
-extern bool opt_ibpb;
+extern bool opt_ibpb_ctxt_switch;
 extern bool opt_ssbd;
 extern int8_t opt_eager_fpu;
 extern int8_t opt_l1d_flush;
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rework SPEC_CTRL_ENTRY_FROM_INTR_IST

We are shortly going to add a conditional IBPB in this path.

Therefore, we can no longer hold spec_ctrl_flags in %eax, relying on
clobbering it only after we're done with its contents.  %rbx is available for
use, and is the more usual register in which to hold preserved information.

With %rax freed up, use it instead of %rdx for the RSB tmp register, and for
the adjustment to spec_ctrl_flags.

This leaves no use of %rdx, except as 0 for the upper half of WRMSR.  In
practice, %rdx is 0 from SAVE_ALL on all paths and isn't likely to change in
the foreseeable future, so update the macro entry requirements to state this
dependency.  This marginal optimisation can be revisited if circumstances
change.

No practical change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 6a5f8aaec3d9..481110fb9360 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -805,7 +805,7 @@ ENTRY(double_fault)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rbx
@@ -838,7 +838,7 @@ handle_ist_exception:
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 3e9d6bd11420..3ec680b5870b 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -258,34 +258,33 @@
  */
 .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
 /*
- * Requires %rsp=regs, %r14=stack_end
- * Clobbers %rax, %rcx, %rdx
+ * Requires %rsp=regs, %r14=stack_end, %rdx=0
+ * Clobbers %rax, %rbx, %rcx, %rdx
  *
  * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
  * maybexen=1, but with conditionals rather than alternatives.
  */
-    movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %eax
+    movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
 
-    test $SCF_ist_rsb, %al
+    test $SCF_ist_rsb, %bl
     jz .L\@_skip_rsb
 
-    DO_OVERWRITE_RSB tmp=rdx /* Clobbers %rcx/%rdx */
+    DO_OVERWRITE_RSB         /* Clobbers %rax/%rcx */
 
 .L\@_skip_rsb:
 
-    test $SCF_ist_sc_msr, %al
+    test $SCF_ist_sc_msr, %bl
     jz .L\@_skip_msr_spec_ctrl
 
-    xor %edx, %edx
+    xor %eax, %eax
     testb $3, UREGS_cs(%rsp)
-    setnz %dl
-    not %edx
-    and %dl, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+    setnz %al
+    not %eax
+    and %al, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
 
     /* Load Xen's intended value. */
     mov $MSR_SPEC_CTRL, %ecx
     movzbl STACK_CPUINFO_FIELD(xen_spec_ctrl)(%r14), %eax
-    xor %edx, %edx
     wrmsr
 
     /* Opencoded UNLIKELY_START() with no condition. */
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Support IBPB-on-entry

We are going to need this to mitigate Branch Type Confusion on AMD/Hygon CPUs,
but as we've talked about using it in other cases too, arrange to support it
generally.  However, this is also very expensive in some cases, so we're going
to want per-domain controls.

Introduce SCF_ist_ibpb and SCF_entry_ibpb controls, adding them to the IST and
DOM masks as appropriate.  Also introduce X86_FEATURE_IBPB_ENTRY_{PV,HVM} to
patch the code blocks.

For SVM, the STGI is serialising enough to protect against Spectre-v1 attacks,
so no "else lfence" is necessary.  VT-x will use use the MSR host load list,
so doesn't need any code in the VMExit path.

For the IST path, we can't safely check CPL==0 to skip a flush, as we might
have hit an entry path before its IBPB.  As IST hitting Xen is rare, flush
irrespective of CPL.  A later path, SCF_ist_sc_msr, provides Spectre-v1
safety.

For the PV paths, we know we're interrupting CPL>0, while for the INTR paths,
we can safely check CPL==0.  Only flush when interrupting guest context.

An "else lfence" is needed for safety, but we want to be able to skip it on
unaffected CPUs, so the block wants to be an alternative, which means the
lfence has to be inline rather than UNLIKELY() (the replacement block doesn't
have displacements fixed up for anything other than the first instruction).

As with SPEC_CTRL_ENTRY_FROM_INTR_IST, %rdx is 0 on entry so rely on this to
shrink the logic marginally.  Update the comments to specify this new
dependency.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
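
In C terms, each new entry block boils down to the following sketch (not a
verbatim extract; MSR_PRED_CMD is 0x49 and PRED_CMD_IBPB is bit 0, per the
Intel SDM and AMD APM):

    if ( spec_ctrl_flags & SCF_entry_ibpb )
        wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB);  /* flush predictor state */

with the CPL check (where safe) and the "else lfence" layered on top in the
assembly below.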

diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index 0684d29050a5..d7c85fdfc61b 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -90,10 +90,26 @@ __UNLIKELY_END(nsvm_hap)
 
         GET_CURRENT(bx)
 
-        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: b=curr %rsp=regs/cpuinfo, Clob: ac  */
+        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: %rsp=regs/cpuinfo, %rdx=0 Clob: ac  */
+
+        .macro svm_vmexit_cond_ibpb
+            testb  $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
+            jz     .L_skip_ibpb
+
+            mov    $MSR_PRED_CMD, %ecx
+            mov    $PRED_CMD_IBPB, %eax
+            wrmsr
+.L_skip_ibpb:
+	.endm
+        ALTERNATIVE "", svm_vmexit_cond_ibpb, X86_FEATURE_IBPB_ENTRY_HVM
+
         ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
+        /*
+         * STGI is executed unconditionally, and is sufficiently serialising
+         * to safely resolve any Spectre-v1 concerns in the above logic.
+         */
         STGI
 GLOBAL(svm_stgi_label)
         mov  %rsp,%rdi
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index f3bd7f9bee94..447f583e07c4 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1312,6 +1312,10 @@ static int construct_vmcs(struct vcpu *v)
         rc = vmx_add_msr(v, MSR_FLUSH_CMD, FLUSH_CMD_L1D,
                          VMX_MSR_GUEST_LOADONLY);
 
+    if ( !rc && (d->arch.spec_ctrl_flags & SCF_entry_ibpb) )
+        rc = vmx_add_msr(v, MSR_PRED_CMD, PRED_CMD_IBPB,
+                         VMX_MSR_HOST);
+
  out:
     vmx_vmcs_exit(v);
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 9059c2ef6e43..6d2e0426771e 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -17,7 +17,7 @@ ENTRY(entry_int82)
         movl  $HYPERCALL_VECTOR, 4(%rsp)
         SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         CR4_PV32_RESTORE
@@ -208,7 +208,7 @@ ENTRY(cstar_enter)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 481110fb9360..9fd4cef30f73 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -235,7 +235,7 @@ ENTRY(lstar_enter)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -270,7 +270,7 @@ GLOBAL(sysenter_eflags_saved)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -321,7 +321,7 @@ ENTRY(int80_direct_trap)
         movl  $0x80, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -581,7 +581,7 @@ ENTRY(common_interrupt)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
@@ -613,7 +613,7 @@ GLOBAL(handle_exception)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index fc54c24a34ef..c6136ca4a031 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -37,6 +37,8 @@ XEN_CPUFEATURE(SC_MSR_IDLE,       X86_SYNTH(21)) /* Clear MSR_SPEC_CTRL on idle
 XEN_CPUFEATURE(XEN_LBR,           X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
 /* Bits 23,24 unused. */
 XEN_CPUFEATURE(SC_VERW_IDLE,      X86_SYNTH(25)) /* VERW used by Xen for idle */
+XEN_CPUFEATURE(IBPB_ENTRY_PV,     X86_SYNTH(26)) /* MSR_PRED_CMD used by Xen for PV */
+XEN_CPUFEATURE(IBPB_ENTRY_HVM,    X86_SYNTH(27)) /* MSR_PRED_CMD used by Xen for HVM */
 
 /* Bug words follow the synthetic words. */
 #define X86_NR_BUG 1
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 495260d6b6fc..8b69a819175a 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -34,6 +34,8 @@
 #define SCF_ist_sc_msr (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
+#define SCF_ist_ibpb   (1 << 4)
+#define SCF_entry_ibpb (1 << 5)
 
 /*
  * The IST paths (NMI/#MC) can interrupt any arbitrary context.  Some
@@ -46,13 +48,13 @@
  * These are the controls to inhibit on the S3 resume path until microcode has
  * been reloaded.
  */
-#define SCF_IST_MASK (SCF_ist_sc_msr)
+#define SCF_IST_MASK (SCF_ist_sc_msr | SCF_ist_ibpb)
 
 /*
  * Some speculative protections are per-domain.  These settings are merged
  * into the top-of-stack block in the context switch path.
  */
-#define SCF_DOM_MASK (SCF_verw)
+#define SCF_DOM_MASK (SCF_verw | SCF_entry_ibpb)
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 3ec680b5870b..5756f3f1a83d 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -80,6 +80,35 @@
  *  - SPEC_CTRL_EXIT_TO_{SVM,VMX}
  */
 
+.macro DO_SPEC_CTRL_COND_IBPB maybexen:req
+/*
+ * Requires %rsp=regs (also cpuinfo if !maybexen)
+ * Requires %r14=stack_end (if maybexen), %rdx=0
+ * Clobbers %rax, %rcx, %rdx
+ *
+ * Conditionally issue IBPB if SCF_entry_ibpb is active.  In the maybexen
+ * case, we can safely look at UREGS_cs to skip taking the hit when
+ * interrupting Xen.
+ */
+    .if \maybexen
+        testb  $SCF_entry_ibpb, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+        jz     .L\@_skip
+        testb  $3, UREGS_cs(%rsp)
+    .else
+        testb  $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
+    .endif
+    jz     .L\@_skip
+
+    mov     $MSR_PRED_CMD, %ecx
+    mov     $PRED_CMD_IBPB, %eax
+    wrmsr
+    jmp     .L\@_done
+
+.L\@_skip:
+    lfence
+.L\@_done:
+.endm
+
 .macro DO_OVERWRITE_RSB tmp=rax
 /*
  * Requires nothing
@@ -232,12 +261,16 @@
 
 /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
 #define SPEC_CTRL_ENTRY_FROM_PV                                         \
+    ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=0),     \
+        X86_FEATURE_IBPB_ENTRY_PV;                                      \
     ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;            \
     ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=0),         \
         X86_FEATURE_SC_MSR_PV
 
 /* Use in interrupt/exception context.  May interrupt Xen or PV context. */
 #define SPEC_CTRL_ENTRY_FROM_INTR                                       \
+    ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=1),     \
+        X86_FEATURE_IBPB_ENTRY_PV;                                      \
     ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;            \
     ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1),         \
         X86_FEATURE_SC_MSR_PV
@@ -261,11 +294,23 @@
  * Requires %rsp=regs, %r14=stack_end, %rdx=0
  * Clobbers %rax, %rbx, %rcx, %rdx
  *
- * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
- * maybexen=1, but with conditionals rather than alternatives.
+ * This is logical merge of:
+ *    DO_SPEC_CTRL_COND_IBPB maybexen=0
+ *    DO_OVERWRITE_RSB
+ *    DO_SPEC_CTRL_ENTRY maybexen=1
+ * but with conditionals rather than alternatives.
  */
     movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
 
+    test    $SCF_ist_ibpb, %bl
+    jz      .L\@_skip_ibpb
+
+    mov     $MSR_PRED_CMD, %ecx
+    mov     $PRED_CMD_IBPB, %eax
+    wrmsr
+
+.L\@_skip_ibpb:
+
     test $SCF_ist_rsb, %bl
     jz .L\@_skip_rsb
 
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/cpuid: Enumeration for BTC_NO

BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.

Zen3 CPUs don't suffer BTC.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
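
For completeness, the bit can be read from any context with CPUID access.
An illustrative userspace check (leaf 0x80000008 EBX bit 29, matching the
enumeration added below; a guest only sees the bit if its toolstack/policy
exposes it):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if ( !__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx) )
            return 1;

        printf("BTC_NO: %s\n", (ebx & (1u << 29)) ? "yes" : "no");
        return 0;
    }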

diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index 977a4377bf1b..11b43807e965 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -274,6 +274,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
         {"virt-ssbd",    0x80000008, NA, CPUID_REG_EBX, 25,  1},
         {"ssb-no",       0x80000008, NA, CPUID_REG_EBX, 26,  1},
         {"psfd",         0x80000008, NA, CPUID_REG_EBX, 28,  1},
+        {"btc-no",       0x80000008, NA, CPUID_REG_EBX, 29,  1},
 
         {"nc",           0x80000008, NA, CPUID_REG_ECX,  0,  8},
         {"apicidsize",   0x80000008, NA, CPUID_REG_ECX, 12,  4},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index ff167779df31..52f5059d8f7c 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -156,7 +156,7 @@ static const char *const str_e8b[32] =
 
     [24] = "amd-ssbd",         [25] = "virt-ssbd",
     [26] = "ssb-no",
-    [28] = "psfd",
+    [28] = "psfd",             [29] = "btc-no",
 };
 
 static const char *const str_7d0[32] =
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index d5dece37cdeb..a2041e5138a2 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -707,6 +707,16 @@ static void init_amd(struct cpuinfo_x86 *c)
 			warning_add(text);
 		}
 		break;
+
+	case 0x19:
+		/*
+		 * Zen3 (Fam19h model < 0x10) parts are not susceptible to
+		 * Branch Type Confusion, but predate the allocation of the
+		 * BTC_NO bit.  Fill it back in if we're not virtualised.
+		 */
+		if (!cpu_has_hypervisor && !cpu_has(c, X86_FEATURE_BTC_NO))
+			__set_bit(X86_FEATURE_BTC_NO, c->x86_capability);
+		break;
 	}
 
 	display_cacheinfo(c);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 4517339ea669..4fdc6e93add4 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -387,7 +387,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * Hardware read-only information, stating immunity to certain issues, or
      * suggestions of which mitigation to use.
      */
-    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (caps & ARCH_CAPS_RDCL_NO)                        ? " RDCL_NO"        : "",
            (caps & ARCH_CAPS_IBRS_ALL)                       ? " IBRS_ALL"       : "",
            (caps & ARCH_CAPS_RSBA)                           ? " RSBA"           : "",
@@ -402,7 +402,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (e8b  & cpufeat_mask(X86_FEATURE_IBRS_ALWAYS))    ? " IBRS_ALWAYS"    : "",
            (e8b  & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS))   ? " STIBP_ALWAYS"   : "",
            (e8b  & cpufeat_mask(X86_FEATURE_IBRS_FAST))      ? " IBRS_FAST"      : "",
-           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "");
+           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_BTC_NO))         ? " BTC_NO"         : "");
 
     /* Hardware features which need driving to mitigate issues. */
     printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 6084e0569de4..44b3ba331fb7 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -258,6 +258,7 @@ XEN_CPUFEATURE(AMD_SSBD,      8*32+24) /*   MSR_SPEC_CTRL.SSBD available */
 XEN_CPUFEATURE(VIRT_SSBD,     8*32+25) /*   MSR_VIRT_SPEC_CTRL.SSBD */
 XEN_CPUFEATURE(SSB_NO,        8*32+26) /*   Hardware not vulnerable to SSB */
 XEN_CPUFEATURE(PSFD,          8*32+28) /*   MSR_SPEC_CTRL.PSFD */
+XEN_CPUFEATURE(BTC_NO,        8*32+29) /*A  Hardware not vulnerable to Branch Type Confusion */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Enable Zen2 chickenbit

... as instructed in the Branch Type Confusion whitepaper.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index a2041e5138a2..e9530d581b11 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -590,6 +590,31 @@ void amd_init_ssbd(const struct cpuinfo_x86 *c)
 		printk_once(XENLOG_ERR "No SSBD controls available\n");
 }
 
+/*
+ * On Zen2 we offer this chicken (bit) on the altar of Speculation.
+ *
+ * Refer to the AMD Branch Type Confusion whitepaper:
+ * https://XXX
+ *
+ * Setting this unnamed bit supposedly causes prediction information on
+ * non-branch instructions to be ignored.  It is to be set unilaterally in
+ * newer microcode.
+ *
+ * This chickenbit is something unrelated on Zen1, and Zen1 vs Zen2 isn't a
+ * simple model number comparison, so use STIBP as a heuristic to separate the
+ * two uarches in Fam17h(AMD)/18h(Hygon).
+ */
+void amd_init_spectral_chicken(void)
+{
+	uint64_t val, chickenbit = 1 << 1;
+
+	if (cpu_has_hypervisor || !boot_cpu_has(X86_FEATURE_AMD_STIBP))
+		return;
+
+	if (rdmsr_safe(MSR_AMD64_DE_CFG2, val) == 0 && !(val & chickenbit))
+		wrmsr_safe(MSR_AMD64_DE_CFG2, val | chickenbit);
+}
+
 static void init_amd(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
@@ -668,6 +693,9 @@ static void init_amd(struct cpuinfo_x86 *c)
 
 	amd_init_ssbd(c);
 
+	if (c->x86 == 0x17)
+		amd_init_spectral_chicken();
+
 	/* MFENCE stops RDTSC speculation */
 	if (!cpu_has_lfence_dispatch)
 		__set_bit(X86_FEATURE_MFENCE_RDTSC, c->x86_capability);
diff --git a/xen/arch/x86/cpu/cpu.h b/xen/arch/x86/cpu/cpu.h
index 23ccd6611565..680b515b3c92 100644
--- a/xen/arch/x86/cpu/cpu.h
+++ b/xen/arch/x86/cpu/cpu.h
@@ -20,3 +20,4 @@ extern bool detect_extended_topology(struct cpuinfo_x86 *c);
 
 void early_init_amd(struct cpuinfo_x86 *c);
 void amd_init_ssbd(const struct cpuinfo_x86 *c);
+void amd_init_spectral_chicken(void);
diff --git a/xen/arch/x86/cpu/hygon.c b/xen/arch/x86/cpu/hygon.c
index b769e8ad9062..c500485dde3f 100644
--- a/xen/arch/x86/cpu/hygon.c
+++ b/xen/arch/x86/cpu/hygon.c
@@ -61,6 +61,12 @@ static void init_hygon(struct cpuinfo_x86 *c)
 
 	amd_init_ssbd(c);
 
+	/*
+	 * TODO: Check heuristic safety with Hygon first
+	if (c->x86 == 0x18)
+		amd_init_spectral_chicken();
+	 */
+
 	/* MFENCE stops RDTSC speculation */
 	if (!cpu_has_lfence_dispatch)
 		__set_bit(X86_FEATURE_MFENCE_RDTSC, c->x86_capability);
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 20653bd6601e..7b8a9c2951aa 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -311,6 +311,7 @@
 #define MSR_AMD64_DC_CFG		0xc0011022
 #define MSR_AMD64_DE_CFG		0xc0011029
 #define AMD64_DE_CFG_LFENCE_SERIALISE	(_AC(1, ULL) << 1)
+#define MSR_AMD64_DE_CFG2		0xc00110e3
 
 #define MSR_AMD64_DR0_ADDRESS_MASK	0xc0011027
 #define MSR_AMD64_DR1_ADDRESS_MASK	0xc0011019
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Mitigate Branch Type Confusion when possible

Branch Type Confusion affects AMD/Hygon CPUs on Zen2 and earlier.  To
mitigate, we require SMT safety (STIBP on Zen2, no-SMT on Zen1), and to issue
an IBPB on each entry to Xen, to flush the BTB.

Due to performance concerns, dom0 (which is trusted in most configurations) is
excluded from protections by default.

Therefore:
 * Use STIBP by default on Zen2 too, which now means we want it on by default
   on all hardware supporting STIBP.
 * Break the current IBPB logic out into a new function, extending it with
   IBPB-at-entry logic.
 * Change the existing IBPB-at-ctxt-switch boolean to be tristate, and disable
   it by default when IBPB-at-entry is providing sufficient safety.

If all PV guests on the system are trusted, then it is recommended to boot
with `spec-ctrl=ibpb-entry=no-pv`, as this will provide an additional marginal
perf improvement.

This is part of XSA-407 / CVE-2022-23825.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
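
For illustration, the resulting boot choices look as follows (command line
fragments only; the image path and elided options are placeholders):

    # Threat model treats dom0 as untrusted: apply IBPB-on-entry to dom0 too.
    multiboot2 /boot/xen.gz ... spec-ctrl=ibpb-entry

    # All PV guests trusted: skip the PV entry IBPB for the marginal perf win.
    multiboot2 /boot/xen.gz ... spec-ctrl=ibpb-entry=no-pv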

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 359859b51808..956ecd5a3591 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2034,7 +2034,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 
 ### spec-ctrl (x86)
 > `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
->              {msr-sc,rsb,md-clear}=<bool>|{pv,hvm}=<bool>,
+>              {msr-sc,rsb,md-clear,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >              eager-fpu,l1d-flush,branch-harden,srb-lock,
 >              unpriv-mmio}=<bool> ]`
@@ -2059,9 +2059,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` options offer fine
-grained control over the primitives by Xen.  These impact Xen's ability to
-protect itself, and/or Xen's ability to virtualise support for guests to use.
+The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `md-clear=` and `ibpb-entry=` options
+offer fine grained control over the primitives by Xen.  These impact Xen's
+ability to protect itself, and/or Xen's ability to virtualise support for
+guests to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -2080,6 +2081,11 @@ protect itself, and/or Xen's ability to virtualise support for guests to use.
   compatibility with development versions of this fix, `mds=` is also accepted
   on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
   preference to here.*
+* `ibpb-entry=` offers control over whether IBPB (Indirect Branch Prediction
+  Barrier) is used on entry to Xen.  This is used by default on hardware
+  vulnerable to Branch Type Confusion, but for performance reasons, dom0 is
+  unprotected by default.  If it is necessary to protect dom0 too, boot with
+  `spec-ctrl=ibpb-entry`.
 
 If Xen was compiled with INDIRECT_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 4fdc6e93add4..bfa5d27e00f5 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -39,6 +39,10 @@ static bool __initdata opt_rsb_hvm = true;
 static int8_t __read_mostly opt_md_clear_pv = -1;
 static int8_t __read_mostly opt_md_clear_hvm = -1;
 
+static int8_t __read_mostly opt_ibpb_entry_pv = -1;
+static int8_t __read_mostly opt_ibpb_entry_hvm = -1;
+static bool __read_mostly opt_ibpb_entry_dom0;
+
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
     THUNK_DEFAULT, /* Decide which thunk to use at boot time. */
@@ -54,7 +58,7 @@ int8_t __initdata opt_stibp = -1;
 bool __read_mostly opt_ssbd;
 int8_t __initdata opt_psfd = -1;
 
-bool __read_mostly opt_ibpb_ctxt_switch = true;
+int8_t __read_mostly opt_ibpb_ctxt_switch = -1;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 bool __read_mostly opt_branch_harden = true;
@@ -115,6 +119,9 @@ static int __init parse_spec_ctrl(const char *s)
             opt_rsb_hvm = false;
             opt_md_clear_pv = 0;
             opt_md_clear_hvm = 0;
+            opt_ibpb_entry_pv = 0;
+            opt_ibpb_entry_hvm = 0;
+            opt_ibpb_entry_dom0 = false;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -141,12 +148,14 @@ static int __init parse_spec_ctrl(const char *s)
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
             opt_md_clear_pv = val;
+            opt_ibpb_entry_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
             opt_md_clear_hvm = val;
+            opt_ibpb_entry_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
         {
@@ -211,6 +220,28 @@ static int __init parse_spec_ctrl(const char *s)
                 break;
             }
         }
+        else if ( (val = parse_boolean("ibpb-entry", s, ss)) != -1 )
+        {
+            switch ( val )
+            {
+            case 0:
+            case 1:
+                opt_ibpb_entry_pv = opt_ibpb_entry_hvm =
+                    opt_ibpb_entry_dom0 = val;
+                break;
+
+            case -2:
+                s += strlen("ibpb-entry=");
+                if ( (val = parse_boolean("pv", s, ss)) >= 0 )
+                    opt_ibpb_entry_pv = val;
+                else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+                    opt_ibpb_entry_hvm = val;
+                else
+            default:
+                    rc = -EINVAL;
+                break;
+            }
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -474,27 +505,31 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * mitigation support for guests.
      */
 #ifdef CONFIG_HVM
-    printk("  Support for HVM VMs:%s%s%s%s%s\n",
+    printk("  Support for HVM VMs:%s%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             boot_cpu_has(X86_FEATURE_MD_CLEAR)   ||
+            boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
-           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
+           boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM)  ? " IBPB-entry"    : "");
 
 #endif
 #ifdef CONFIG_PV
-    printk("  Support for PV VMs:%s%s%s%s%s\n",
+    printk("  Support for PV VMs:%s%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
             boot_cpu_has(X86_FEATURE_MD_CLEAR)  ||
+            boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_PV)       ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_PV)       ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
-           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
+           boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV)   ? " IBPB-entry"    : "");
 
     printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
            opt_xpti_hwdom ? "enabled" : "disabled",
@@ -727,6 +762,55 @@ static bool __init should_use_eager_fpu(void)
     }
 }
 
+static void __init ibpb_calculations(void)
+{
+    /* Check we have hardware IBPB support before using it... */
+    if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
+    {
+        opt_ibpb_entry_hvm = opt_ibpb_entry_pv = opt_ibpb_ctxt_switch = 0;
+        opt_ibpb_entry_dom0 = false;
+        return;
+    }
+
+    /*
+     * IBPB-on-entry mitigations for Branch Type Confusion.
+     *
+     * IBPB && !BTC_NO selects all AMD/Hygon hardware, not known to be safe,
+     * that we can provide some form of mitigation on.
+     */
+    if ( opt_ibpb_entry_pv == -1 )
+        opt_ibpb_entry_pv = (IS_ENABLED(CONFIG_PV) &&
+                             boot_cpu_has(X86_FEATURE_IBPB) &&
+                             !boot_cpu_has(X86_FEATURE_BTC_NO));
+    if ( opt_ibpb_entry_hvm == -1 )
+        opt_ibpb_entry_hvm = (IS_ENABLED(CONFIG_HVM) &&
+                              boot_cpu_has(X86_FEATURE_IBPB) &&
+                              !boot_cpu_has(X86_FEATURE_BTC_NO));
+
+    if ( opt_ibpb_entry_pv )
+    {
+        setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_PV);
+
+        /*
+         * We only need to flush in IST context if we're protecting against PV
+         * guests.  HVM IBPB-on-entry protections are both atomic with
+         * NMI/#MC, so can't interrupt Xen ahead of having already flushed the
+         * BTB.
+         */
+        default_spec_ctrl_flags |= SCF_ist_ibpb;
+    }
+    if ( opt_ibpb_entry_hvm )
+        setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_HVM);
+
+    /*
+     * If we're using IBPB-on-entry to protect against PV and HVM guests
+     * (ignoring dom0 if trusted), then there's no need to also issue IBPB on
+     * context switch too.
+     */
+    if ( opt_ibpb_ctxt_switch == -1 )
+        opt_ibpb_ctxt_switch = !(opt_ibpb_entry_hvm && opt_ibpb_entry_pv);
+}
+
 /* Calculate whether this CPU is vulnerable to L1TF. */
 static __init void l1tf_calculations(uint64_t caps)
 {
@@ -982,8 +1066,12 @@ void spec_ctrl_init_domain(struct domain *d)
     bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
                  (opt_fb_clear_mmio && is_iommu_enabled(d)));
 
+    bool ibpb = ((pv ? opt_ibpb_entry_pv : opt_ibpb_entry_hvm) &&
+                 (d->domain_id != 0 || opt_ibpb_entry_dom0));
+
     d->arch.spec_ctrl_flags =
         (verw   ? SCF_verw         : 0) |
+        (ibpb   ? SCF_entry_ibpb   : 0) |
         0;
 }
 
@@ -1111,12 +1199,15 @@ void __init init_speculation_mitigations(void)
     }
 
     /*
-     * Use STIBP by default if the hardware hint is set.  Otherwise, leave it
-     * off as it incurs a severe performance penalty on pre-eIBRS Intel hardware
-     * where it was retrofitted in microcode.
+     * Use STIBP by default on all AMD systems.  Zen3 and later enumerate
+     * STIBP_ALWAYS, but STIBP is needed on Zen2 as part of the mitigations
+     * for Branch Type Confusion.
+     *
+     * Leave STIBP off by default on Intel.  Pre-eIBRS systems suffer a
+     * substantial perf hit where it was implemented in microcode.
      */
     if ( opt_stibp == -1 )
-        opt_stibp = !!boot_cpu_has(X86_FEATURE_STIBP_ALWAYS);
+        opt_stibp = !!boot_cpu_has(X86_FEATURE_AMD_STIBP);
 
     if ( opt_stibp && (boot_cpu_has(X86_FEATURE_STIBP) ||
                        boot_cpu_has(X86_FEATURE_AMD_STIBP)) )
@@ -1169,9 +1260,7 @@ void __init init_speculation_mitigations(void)
     if ( opt_rsb_hvm )
         setup_force_cpu_cap(X86_FEATURE_SC_RSB_HVM);
 
-    /* Check we have hardware IBPB support before using it... */
-    if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-        opt_ibpb_ctxt_switch = false;
+    ibpb_calculations();
 
     /* Check whether Eager FPU should be enabled by default. */
     if ( opt_eager_fpu == -1 )
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 8b69a819175a..2f15ae981495 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -65,7 +65,7 @@
 void init_speculation_mitigations(void);
 void spec_ctrl_init_domain(struct domain *d);
 
-extern bool opt_ibpb_ctxt_switch;
+extern int8_t opt_ibpb_ctxt_switch;
 extern bool opt_ssbd;
 extern int8_t opt_eager_fpu;
 extern int8_t opt_l1d_flush;
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Only adjust MSR_SPEC_CTRL for idle with legacy IBRS

Back at the time of the original Spectre-v2 fixes, it was recommended to clear
MSR_SPEC_CTRL when going idle.  This is because of the side effects on the
sibling thread caused by the microcode IBRS and STIBP implementations which
were retrofitted to existing CPUs.

However, there are no relevant cross-thread impacts for the hardware
IBRS/STIBP implementations, so this logic should not be used on Intel CPUs
supporting eIBRS, or any AMD CPUs; doing so only adds unnecessary latency to
the idle path.

Furthermore, there's no point playing with MSR_SPEC_CTRL in the idle paths if
SMT is disabled for other reasons.

Fixes: 8d03080d2a33 ("x86/spec-ctrl: Cease using thunk=lfence on AMD")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit ffc7694e0c99eea158c32aa164b7d1e1bb1dc46b)

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 58a2797dfea1..d7f767b0739c 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -1104,8 +1104,14 @@ void __init init_speculation_mitigations(void)
     /* (Re)init BSP state now that default_spec_ctrl_flags has been calculated. */
     init_shadow_spec_ctrl_state();
 
-    /* If Xen is using any MSR_SPEC_CTRL settings, adjust the idle path. */
-    if ( default_xen_spec_ctrl )
+    /*
+     * For microcoded IBRS only (i.e. Intel, pre eIBRS), it is recommended to
+     * clear MSR_SPEC_CTRL before going idle, to avoid impacting sibling
+     * threads.  Activate this if SMT is enabled, and Xen is using a non-zero
+     * MSR_SPEC_CTRL setting.
+     */
+    if ( boot_cpu_has(X86_FEATURE_IBRSB) && !(caps & ARCH_CAPS_IBRS_ALL) &&
+         hw_smt_enabled && default_xen_spec_ctrl )
         setup_force_cpu_cap(X86_FEATURE_SC_MSR_IDLE);
 
     xpti_init_default(caps);
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index 9eaab7a2a1fa..f7488d3ccbfa 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -33,7 +33,7 @@ XEN_CPUFEATURE(SC_MSR_HVM,        X86_SYNTH(17)) /* MSR_SPEC_CTRL used by Xen fo
 XEN_CPUFEATURE(SC_RSB_PV,         X86_SYNTH(18)) /* RSB overwrite needed for PV */
 XEN_CPUFEATURE(SC_RSB_HVM,        X86_SYNTH(19)) /* RSB overwrite needed for HVM */
 XEN_CPUFEATURE(XEN_SELFSNOOP,     X86_SYNTH(20)) /* SELFSNOOP gets used by Xen itself */
-XEN_CPUFEATURE(SC_MSR_IDLE,       X86_SYNTH(21)) /* (SC_MSR_PV || SC_MSR_HVM) && default_xen_spec_ctrl */
+XEN_CPUFEATURE(SC_MSR_IDLE,       X86_SYNTH(21)) /* Clear MSR_SPEC_CTRL on idle */
 XEN_CPUFEATURE(XEN_LBR,           X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
 /* Bits 23,24 unused. */
 XEN_CPUFEATURE(SC_VERW_IDLE,      X86_SYNTH(25)) /* VERW used by Xen for idle */
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 68f6c46c470c..12283573cdd5 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -78,7 +78,8 @@ static always_inline void spec_ctrl_enter_idle(struct cpu_info *info)
     uint32_t val = 0;
 
     /*
-     * Branch Target Injection:
+     * It is recommended in some cases to clear MSR_SPEC_CTRL when going idle,
+     * to avoid impacting sibling threads.
      *
      * Latch the new shadow value, then enable shadowing, then update the MSR.
      * There are no SMP issues here; only local processor ordering concerns.
@@ -114,7 +115,7 @@ static always_inline void spec_ctrl_exit_idle(struct cpu_info *info)
     uint32_t val = info->xen_spec_ctrl;
 
     /*
-     * Branch Target Injection:
+     * Restore MSR_SPEC_CTRL on exit from idle.
      *
      * Disable shadowing before updating the MSR.  There are no SMP issues
      * here; only local processor ordering concerns.
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Knobs for STIBP and PSFD, and follow hardware STIBP
 hint

STIBP and PSFD are slightly weird bits, because they're both implied by other
bits in MSR_SPEC_CTRL.  Add fine grain controls for them, and take the
implications into account when setting IBRS/SSBD.

Rearrange the IBPB text/variables/logic to keep all the MSR_SPEC_CTRL bits
together, for consistency.

However, AMD have a hardware hint CPUID bit recommending that STIBP be set
unilaterally.  This is advertised on Zen3, so follow the recommendation.
Furthermore, in such cases, set STIBP behind the guest's back for now.  This
has negligible overhead for the guest, but saves a WRMSR on vmentry.  This is
the only default change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit fef244b179c06fcdfa581f7d57fa6e578c49ff50)
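
For reference, the MSR_SPEC_CTRL bits involved (architectural bit positions;
a sketch in the style of msr-index.h, not a quote of Xen's headers):

    #define SPEC_CTRL_IBRS   (_AC(1, ULL) << 0)  /* Implies STIBP when set */
    #define SPEC_CTRL_STIBP  (_AC(1, ULL) << 1)
    #define SPEC_CTRL_SSBD   (_AC(1, ULL) << 2)  /* Implies PSFD when set */
    #define SPEC_CTRL_PSFD   (_AC(1, ULL) << 7)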

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index d1d5852cdd84..2302cec91fea 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2105,8 +2105,9 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 
 ### spec-ctrl (x86)
 > `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
->              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,eager-fpu,
->              l1d-flush,branch-harden,srb-lock,unpriv-mmio}=<bool> ]`
+>              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
+>              eager-fpu,l1d-flush,branch-harden,srb-lock,
+>              unpriv-mmio}=<bool> ]`
 
 Controls for speculative execution sidechannel mitigations.  By default, Xen
 will pick the most appropriate mitigations based on compiled in support,
@@ -2156,9 +2157,10 @@ On hardware supporting IBRS (Indirect Branch Restricted Speculation), the
 If Xen is not using IBRS itself, functionality is still set up so IBRS can be
 virtualised for guests.
 
-On hardware supporting IBPB (Indirect Branch Prediction Barrier), the `ibpb=`
-option can be used to force (the default) or prevent Xen from issuing branch
-prediction barriers on vcpu context switches.
+On hardware supporting STIBP (Single Thread Indirect Branch Predictors), the
+`stibp=` option can be used to force or prevent Xen using the feature itself.
+By default, Xen will use STIBP when IBRS is in use (IBRS implies STIBP), and
+when hardware hints recommend using it as a blanket setting.
 
 On hardware supporting SSBD (Speculative Store Bypass Disable), the `ssbd=`
 option can be used to force or prevent Xen using the feature itself.  On AMD
@@ -2166,6 +2168,15 @@ hardware, this is a global option applied at boot, and not virtualised for
 guest use.  On Intel hardware, the feature is virtualised for guests,
 independently of Xen's choice of setting.
 
+On hardware supporting PSFD (Predictive Store Forwarding Disable), the `psfd=`
+option can be used to force or prevent Xen using the feature itself.  By
+default, Xen will not use PSFD.  PSFD is implied by SSBD, and SSBD is off by
+default.
+
+On hardware supporting IBPB (Indirect Branch Prediction Barrier), the `ibpb=`
+option can be used to force (the default) or prevent Xen from issuing branch
+prediction barriers on vcpu context switches.
+
 On all hardware, the `eager-fpu=` option can be used to force or prevent Xen
 from using fully eager FPU context switches.  This is currently implemented as
 a global control.  By default, Xen will choose to use fully eager context
diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 55da9302e5d7..a0bf9f4e056a 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -29,6 +29,7 @@
 #include <asm/hvm/support.h>
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/svmdebug.h>
+#include <asm/spec_ctrl.h>
 
 struct vmcb_struct *alloc_vmcb(void)
 {
@@ -175,6 +176,14 @@ static int construct_vmcb(struct vcpu *v)
             vmcb->_pause_filter_thresh = SVM_PAUSETHRESH_INIT;
     }
 
+    /*
+     * When default_xen_spec_ctrl is simply SPEC_CTRL_STIBP, default this
+     * behind the back of the VM too.  Our SMT topology isn't accurate, the
+     * overhead is negligible, and doing this saves a WRMSR on the vmentry path.
+     */
+    if ( default_xen_spec_ctrl == SPEC_CTRL_STIBP )
+        v->arch.msrs->spec_ctrl.raw = SPEC_CTRL_STIBP;
+
     return 0;
 }
 
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index d7f767b0739c..06790897e496 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -48,9 +48,13 @@ static enum ind_thunk {
     THUNK_LFENCE,
     THUNK_JMP,
 } opt_thunk __initdata = THUNK_DEFAULT;
+
 static int8_t __initdata opt_ibrs = -1;
+int8_t __initdata opt_stibp = -1;
+bool __read_mostly opt_ssbd;
+int8_t __initdata opt_psfd = -1;
+
 bool __read_mostly opt_ibpb = true;
-bool __read_mostly opt_ssbd = false;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 bool __read_mostly opt_branch_harden = true;
@@ -173,12 +177,20 @@ static int __init parse_spec_ctrl(const char *s)
             else
                 rc = -EINVAL;
         }
+
+        /* Bits in MSR_SPEC_CTRL. */
         else if ( (val = parse_boolean("ibrs", s, ss)) >= 0 )
             opt_ibrs = val;
-        else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
-            opt_ibpb = val;
+        else if ( (val = parse_boolean("stibp", s, ss)) >= 0 )
+            opt_stibp = val;
         else if ( (val = parse_boolean("ssbd", s, ss)) >= 0 )
             opt_ssbd = val;
+        else if ( (val = parse_boolean("psfd", s, ss)) >= 0 )
+            opt_psfd = val;
+
+        /* Misc settings. */
+        else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
+            opt_ibpb = val;
         else if ( (val = parse_boolean("eager-fpu", s, ss)) >= 0 )
             opt_eager_fpu = val;
         else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
@@ -377,7 +389,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s, Other:%s%s%s%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
@@ -391,6 +403,9 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (!boot_cpu_has(X86_FEATURE_SSBD) &&
             !boot_cpu_has(X86_FEATURE_AMD_SSBD))     ? "" :
            (default_xen_spec_ctrl & SPEC_CTRL_SSBD)  ? " SSBD+" : " SSBD-",
+           (!boot_cpu_has(X86_FEATURE_PSFD) &&
+            !boot_cpu_has(X86_FEATURE_INTEL_PSFD))   ? "" :
+           (default_xen_spec_ctrl & SPEC_CTRL_PSFD)  ? " PSFD+" : " PSFD-",
            !(caps & ARCH_CAPS_TSX_CTRL)              ? "" :
            (opt_tsx & 1)                             ? " TSX+" : " TSX-",
            !cpu_has_srbds_ctrl                       ? "" :
@@ -951,10 +966,7 @@ void __init init_speculation_mitigations(void)
         if ( !has_spec_ctrl )
             printk(XENLOG_WARNING "?!? CET active, but no MSR_SPEC_CTRL?\n");
         else if ( opt_ibrs == -1 )
-        {
             opt_ibrs = ibrs = true;
-            default_xen_spec_ctrl |= SPEC_CTRL_IBRS | SPEC_CTRL_STIBP;
-        }
 
         if ( opt_thunk == THUNK_DEFAULT || opt_thunk == THUNK_RETPOLINE )
             thunk = THUNK_JMP;
@@ -1058,14 +1070,49 @@ void __init init_speculation_mitigations(void)
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
     }
 
-    /* If we have IBRS available, see whether we should use it. */
+    /* Figure out default_xen_spec_ctrl. */
     if ( has_spec_ctrl && ibrs )
+    {
+        /* IBRS implies STIBP.  */
+        if ( opt_stibp == -1 )
+            opt_stibp = 1;
+
         default_xen_spec_ctrl |= SPEC_CTRL_IBRS;
+    }
+
+    /*
+     * Use STIBP by default if the hardware hint is set.  Otherwise, leave it
+     * off as it incurs a severe performance penalty on pre-eIBRS Intel hardware
+     * where it was retrofitted in microcode.
+     */
+    if ( opt_stibp == -1 )
+        opt_stibp = !!boot_cpu_has(X86_FEATURE_STIBP_ALWAYS);
+
+    if ( opt_stibp && (boot_cpu_has(X86_FEATURE_STIBP) ||
+                       boot_cpu_has(X86_FEATURE_AMD_STIBP)) )
+        default_xen_spec_ctrl |= SPEC_CTRL_STIBP;
 
-    /* If we have SSBD available, see whether we should use it. */
     if ( opt_ssbd && (boot_cpu_has(X86_FEATURE_SSBD) ||
                       boot_cpu_has(X86_FEATURE_AMD_SSBD)) )
+    {
+        /* SSBD implies PSFD */
+        if ( opt_psfd == -1 )
+            opt_psfd = 1;
+
         default_xen_spec_ctrl |= SPEC_CTRL_SSBD;
+    }
+
+    /*
+     * Don't use PSFD by default.  AMD designed the predictor to
+     * auto-clear on privilege change.  PSFD is implied by SSBD, which is
+     * off by default.
+     */
+    if ( opt_psfd == -1 )
+        opt_psfd = 0;
+
+    if ( opt_psfd && (boot_cpu_has(X86_FEATURE_PSFD) ||
+                      boot_cpu_has(X86_FEATURE_INTEL_PSFD)) )
+        default_xen_spec_ctrl |= SPEC_CTRL_PSFD;
 
     /*
      * PV guests can poison the RSB to any virtual address from which
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: xen/cmdline: Extend parse_boolean() to signal a name match

This will help parsing a sub-option which has boolean and non-boolean options
available.

First, rework 'int val' into 'bool has_neg_prefix'.  This inverts its value,
but the resulting logic is far easier to follow.

Second, reject anything of the form 'no-$FOO=' which excludes ambiguous
constructs such as 'no-$foo=yes' which have never been valid.

This just leaves the case where everything is otherwise fine, but parse_bool()
can't interpret the provided string.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 382326cac528dd1eb0d04efd5c05363c453e29f4)
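
To illustrate the extended contract (a hypothetical caller with placeholder
names; the real users arrive in the following patch):

    int val = parse_boolean("foo", s, ss);

    switch ( val )
    {
    case 0:                      /* "no-foo" or "foo=<false>" */
    case 1:                      /* "foo" or "foo=<true>" */
        opt_foo = val;
        break;

    case -2:                     /* "foo=" matched, but value not a boolean */
        s += strlen("foo=");
        /* ... hand the remaining payload to a sub-parser ... */
        break;

    default:                     /* -1: no match, or malformed "no-foo=..." */
        rc = -EINVAL;
        break;
    }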

diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index c3a943f07765..f07ff41d881e 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -272,9 +272,9 @@ int parse_bool(const char *s, const char *e)
 int parse_boolean(const char *name, const char *s, const char *e)
 {
     size_t slen, nlen;
-    int val = !!strncmp(s, "no-", 3);
+    bool has_neg_prefix = !strncmp(s, "no-", 3);
 
-    if ( !val )
+    if ( has_neg_prefix )
         s += 3;
 
     slen = e ? ({ ASSERT(e >= s); e - s; }) : strlen(s);
@@ -286,11 +286,23 @@ int parse_boolean(const char *name, const char *s, const char *e)
 
     /* Exact, unadorned name?  Result depends on the 'no-' prefix. */
     if ( slen == nlen )
-        return val;
+        return !has_neg_prefix;
+
+    /* Inexact match with a 'no-' prefix?  Not valid. */
+    if ( has_neg_prefix )
+        return -1;
 
     /* =$SOMETHING?  Defer to the regular boolean parsing. */
     if ( s[nlen] == '=' )
-        return parse_bool(&s[nlen + 1], e);
+    {
+        int b = parse_bool(&s[nlen + 1], e);
+
+        if ( b >= 0 )
+            return b;
+
+        /* Not a boolean, but the name matched.  Signal specially. */
+        return -2;
+    }
 
     /* Unrecognised.  Give up. */
     return -1;
diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h
index 076bcfb67dbb..900c0ce3e466 100644
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -82,7 +82,8 @@ int parse_bool(const char *s, const char *e);
 /**
  * Given a specific name, parses a string of the form:
  *   [no-]$NAME[=...]
- * returning 0 or 1 for a recognised boolean, or -1 for an error.
+ * returning 0 or 1 for a recognised boolean.  Returns -1 for general errors,
+ * and -2 for "not a boolean, but $NAME= matches".
  */
 int parse_boolean(const char *name, const char *s, const char *e);
 
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Add fine-grained cmdline suboptions for primitives

Support controlling the PV/HVM suboptions of msr-sc/rsb/md-clear, which
previously wasn't possible.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 27357c394ba6e1571a89105b840ce1c6f026485c)
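
For example, combinations this makes possible (illustrative fragments):

    spec-ctrl=rsb=no-hvm         # Disable RSB stuffing for HVM entries only
    spec-ctrl=md-clear=no-pv     # Disable VERW flushing for PV entries only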

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 2302cec91fea..a84f5c19218d 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2104,7 +2104,8 @@ not be able to control the state of the mitigation.
 By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 
 ### spec-ctrl (x86)
-> `= List of [ <bool>, xen=<bool>, {pv,hvm,msr-sc,rsb,md-clear}=<bool>,
+> `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
+>              {msr-sc,rsb,md-clear}=<bool>|{pv,hvm}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >              eager-fpu,l1d-flush,branch-harden,srb-lock,
 >              unpriv-mmio}=<bool> ]`
@@ -2129,12 +2130,17 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The booleans `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` offer fine
+The `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` options offer fine
 grained control over the primitives by Xen.  These impact Xen's ability to
-protect itself, and Xen's ability to virtualise support for guests to use.
+protect itself, and/or Xen's ability to virtualise support for guests to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
+* Each other option can be used either as a plain boolean
+  (e.g. `spec-ctrl=rsb` to control both the PV and HVM sub-options), or with
+  `pv=` or `hvm=` subsuboptions (e.g. `spec-ctrl=rsb=no-hvm` to disable HVM
+  RSB only).
+
 * `msr-sc=` offers control over Xen's support for manipulating `MSR_SPEC_CTRL`
   on entry and exit.  These blocks are necessary to virtualise support for
   guests and if disabled, guests will be unable to use IBRS/STIBP/SSBD/etc.
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 06790897e496..225fe08259b3 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -147,20 +147,68 @@ static int __init parse_spec_ctrl(const char *s)
             opt_rsb_hvm = val;
             opt_md_clear_hvm = val;
         }
-        else if ( (val = parse_boolean("msr-sc", s, ss)) >= 0 )
+        else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
         {
-            opt_msr_sc_pv = val;
-            opt_msr_sc_hvm = val;
+            switch ( val )
+            {
+            case 0:
+            case 1:
+                opt_msr_sc_pv = opt_msr_sc_hvm = val;
+                break;
+
+            case -2:
+                s += strlen("msr-sc=");
+                if ( (val = parse_boolean("pv", s, ss)) >= 0 )
+                    opt_msr_sc_pv = val;
+                else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+                    opt_msr_sc_hvm = val;
+                else
+            default:
+                    rc = -EINVAL;
+                break;
+            }
         }
-        else if ( (val = parse_boolean("rsb", s, ss)) >= 0 )
+        else if ( (val = parse_boolean("rsb", s, ss)) != -1 )
         {
-            opt_rsb_pv = val;
-            opt_rsb_hvm = val;
+            switch ( val )
+            {
+            case 0:
+            case 1:
+                opt_rsb_pv = opt_rsb_hvm = val;
+                break;
+
+            case -2:
+                s += strlen("rsb=");
+                if ( (val = parse_boolean("pv", s, ss)) >= 0 )
+                    opt_rsb_pv = val;
+                else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+                    opt_rsb_hvm = val;
+                else
+            default:
+                    rc = -EINVAL;
+                break;
+            }
         }
-        else if ( (val = parse_boolean("md-clear", s, ss)) >= 0 )
+        else if ( (val = parse_boolean("md-clear", s, ss)) != -1 )
         {
-            opt_md_clear_pv = val;
-            opt_md_clear_hvm = val;
+            switch ( val )
+            {
+            case 0:
+            case 1:
+                opt_md_clear_pv = opt_md_clear_hvm = val;
+                break;
+
+            case -2:
+                s += strlen("md-clear=");
+                if ( (val = parse_boolean("pv", s, ss)) >= 0 )
+                    opt_md_clear_pv = val;
+                else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+                    opt_md_clear_hvm = val;
+                else
+            default:
+                    rc = -EINVAL;
+                break;
+            }
         }
 
         /* Xen's speculative sidechannel mitigation settings. */
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rework spec_ctrl_flags context switching

We are shortly going to need to context switch new bits in both the vcpu and
S3 paths.  Introduce SCF_IST_MASK and SCF_DOM_MASK, and rework d->arch.verw
into d->arch.spec_ctrl_flags to accommodate.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
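
A worked example of the two masks, using the bit positions from the hunks
below (SCF_ist_wrmsr is bit 1, SCF_verw bit 3), starting from
spec_ctrl_flags == 0b1010:

    /* S3 suspend: inhibit only the IST-relevant controls. */
    ci->spec_ctrl_flags &= ~SCF_IST_MASK;            /* 0b1010 -> 0b1000 */

    /* Context switch: replace only the per-domain controls. */
    info->spec_ctrl_flags =
        (info->spec_ctrl_flags       & ~SCF_DOM_MASK) |
        (nextd->arch.spec_ctrl_flags &  SCF_DOM_MASK); /* bit 3 swapped in */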

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 774e0fcd35d7..06f3e0e9f3e0 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -246,8 +246,8 @@ static int enter_state(u32 state)
         error = 0;
 
     ci = get_cpu_info();
-    /* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */
-    ci->spec_ctrl_flags &= ~SCF_ist_wrmsr;
+    /* Avoid NMI/#MC using unsafe MSRs until we've reloaded microcode. */
+    ci->spec_ctrl_flags &= ~SCF_IST_MASK;
 
     ACPI_FLUSH_CPU_CACHE();
 
@@ -290,8 +290,8 @@ static int enter_state(u32 state)
     if ( !recheck_cpu_features(0) )
         panic("Missing previously available feature(s)\n");
 
-    /* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
-    ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr);
+    /* Re-enable default NMI/#MC use of MSRs now that microcode is loaded. */
+    ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_IST_MASK);
 
     if ( boot_cpu_has(X86_FEATURE_IBRSB) || boot_cpu_has(X86_FEATURE_IBRS) )
     {
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 5ea5ef6ba037..305a63b67e2d 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1838,10 +1838,10 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
             }
         }
 
-        /* Update the top-of-stack block with the VERW disposition. */
-        info->spec_ctrl_flags &= ~SCF_verw;
-        if ( nextd->arch.verw )
-            info->spec_ctrl_flags |= SCF_verw;
+        /* Update the top-of-stack block with the new spec_ctrl settings. */
+        info->spec_ctrl_flags =
+            (info->spec_ctrl_flags       & ~SCF_DOM_MASK) |
+            (nextd->arch.spec_ctrl_flags &  SCF_DOM_MASK);
     }
 
     sched_context_switched(prev, next);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 225fe08259b3..0fabfbe2a9f4 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -981,9 +981,12 @@ void spec_ctrl_init_domain(struct domain *d)
 {
     bool pv = is_pv_domain(d);
 
-    d->arch.verw =
-        (pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
-        (opt_fb_clear_mmio && is_iommu_enabled(d));
+    bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
+                 (opt_fb_clear_mmio && is_iommu_enabled(d)));
+
+    d->arch.spec_ctrl_flags =
+        (verw   ? SCF_verw         : 0) |
+        0;
 }
 
 void __init init_speculation_mitigations(void)
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 4ee76bba45da..53d5a43ec0ce 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -308,8 +308,7 @@ struct arch_domain
     uint32_t pci_cf8;
     uint8_t cmos_idx;
 
-    /* Use VERW on return-to-guest for its flushing side effect. */
-    bool verw;
+    uint8_t spec_ctrl_flags; /* See SCF_DOM_MASK */
 
     union {
         struct pv_domain pv;
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 12283573cdd5..60d6d2dc9407 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -20,12 +20,40 @@
 #ifndef __X86_SPEC_CTRL_H__
 #define __X86_SPEC_CTRL_H__
 
-/* Encoding of cpuinfo.spec_ctrl_flags */
+/*
+ * Encoding of:
+ *   cpuinfo.spec_ctrl_flags
+ *   default_spec_ctrl_flags
+ *   domain.spec_ctrl_flags
+ *
+ * Live settings are in the top-of-stack block, because they need to be
+ * accessible when XPTI is active.  Some settings are fixed from boot, some
+ * context switched per domain, and some inhibited in the S3 path.
+ */
 #define SCF_use_shadow (1 << 0)
 #define SCF_ist_wrmsr  (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
 
+/*
+ * The IST paths (NMI/#MC) can interrupt any arbitrary context.  Some
+ * functionality requires updated microcode to work.
+ *
+ * On boot, this is easy; we load microcode before figuring out which
+ * speculative protections to apply.  However, on the S3 resume path, we must
+ * be able to disable the configured mitigations until microcode is reloaded.
+ *
+ * These are the controls to inhibit on the S3 resume path until microcode has
+ * been reloaded.
+ */
+#define SCF_IST_MASK (SCF_ist_wrmsr)
+
+/*
+ * Some speculative protections are per-domain.  These settings are merged
+ * into the top-of-stack block in the context switch path.
+ */
+#define SCF_DOM_MASK (SCF_verw)
+
 #ifndef __ASSEMBLY__
 
 #include <asm/alternative.h>
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 5a590bac44aa..66b00d511fc6 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -248,9 +248,6 @@
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.
- * Fine grain control of SCF_ist_wrmsr is needed for safety in the S3 resume
- * path to avoid using MSR_SPEC_CTRL before the microcode introducing it has
- * been reloaded.
  */
 .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
 /*
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rename SCF_ist_wrmsr to SCF_ist_sc_msr

We are about to introduce SCF_ist_ibpb, at which point SCF_ist_wrmsr becomes
ambiguous.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 0fabfbe2a9f4..a6def47061e8 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -1086,7 +1086,7 @@ void __init init_speculation_mitigations(void)
     {
         if ( opt_msr_sc_pv )
         {
-            default_spec_ctrl_flags |= SCF_ist_wrmsr;
+            default_spec_ctrl_flags |= SCF_ist_sc_msr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_PV);
         }
 
@@ -1097,7 +1097,7 @@ void __init init_speculation_mitigations(void)
              * Xen's value is not restored atomically.  An early NMI hitting
              * the VMExit path needs to restore Xen's value for safety.
              */
-            default_spec_ctrl_flags |= SCF_ist_wrmsr;
+            default_spec_ctrl_flags |= SCF_ist_sc_msr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
         }
     }
@@ -1110,7 +1110,7 @@ void __init init_speculation_mitigations(void)
          * on real hardware matches the availability of MSR_SPEC_CTRL in the
          * first place.
          *
-         * No need for SCF_ist_wrmsr because Xen's value is restored
+         * No need for SCF_ist_sc_msr because Xen's value is restored
          * atomically WRT NMIs in the VMExit path.
          *
          * TODO: Adjust cpu_has_svm_spec_ctrl to be usable earlier on boot.
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 60d6d2dc9407..6f8b0e09348e 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -31,7 +31,7 @@
  * context switched per domain, and some inhibited in the S3 path.
  */
 #define SCF_use_shadow (1 << 0)
-#define SCF_ist_wrmsr  (1 << 1)
+#define SCF_ist_sc_msr (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
 
@@ -46,7 +46,7 @@
  * These are the controls to inhibit on the S3 resume path until microcode has
  * been reloaded.
  */
-#define SCF_IST_MASK (SCF_ist_wrmsr)
+#define SCF_IST_MASK (SCF_ist_sc_msr)
 
 /*
  * Some speculative protections are per-domain.  These settings are merged
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 66b00d511fc6..0ff1b118f882 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -266,8 +266,8 @@
 
 .L\@_skip_rsb:
 
-    test $SCF_ist_wrmsr, %al
-    jz .L\@_skip_wrmsr
+    test $SCF_ist_sc_msr, %al
+    jz .L\@_skip_msr_spec_ctrl
 
     xor %edx, %edx
     testb $3, UREGS_cs(%rsp)
@@ -290,7 +290,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
      * to speculate around the WRMSR.  As a result, we need a dispatch
      * serialising instruction in the else clause.
      */
-.L\@_skip_wrmsr:
+.L\@_skip_msr_spec_ctrl:
     lfence
     UNLIKELY_END(\@_serialise)
 .endm
@@ -301,7 +301,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
  * Requires %rbx=stack_end
  * Clobbers %rax, %rcx, %rdx
  */
-    testb $SCF_ist_wrmsr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
+    testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
     jz .L\@_skip
 
     DO_SPEC_CTRL_EXIT_TO_XEN
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rename opt_ibpb to opt_ibpb_ctxt_switch

We are about to introduce the use of IBPB at different points in Xen, making
opt_ibpb ambiguous.  Rename it to opt_ibpb_ctxt_switch.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 305a63b67e2d..3658e50d56c7 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -1810,7 +1810,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
         ctxt_switch_levelling(next);
 
-        if ( opt_ibpb && !is_idle_domain(nextd) )
+        if ( opt_ibpb_ctxt_switch && !is_idle_domain(nextd) )
         {
             static DEFINE_PER_CPU(unsigned int, last);
             unsigned int *last_id = &this_cpu(last);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index a6def47061e8..ced0f8c2aea6 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -54,7 +54,7 @@ int8_t __initdata opt_stibp = -1;
 bool __read_mostly opt_ssbd;
 int8_t __initdata opt_psfd = -1;
 
-bool __read_mostly opt_ibpb = true;
+bool __read_mostly opt_ibpb_ctxt_switch = true;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 bool __read_mostly opt_branch_harden = true;
@@ -117,7 +117,7 @@ static int __init parse_spec_ctrl(const char *s)
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
-            opt_ibpb = false;
+            opt_ibpb_ctxt_switch = false;
             opt_ssbd = false;
             opt_l1d_flush = 0;
             opt_branch_harden = false;
@@ -238,7 +238,7 @@ static int __init parse_spec_ctrl(const char *s)
 
         /* Misc settings. */
         else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
-            opt_ibpb = val;
+            opt_ibpb_ctxt_switch = val;
         else if ( (val = parse_boolean("eager-fpu", s, ss)) >= 0 )
             opt_eager_fpu = val;
         else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
@@ -458,7 +458,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (opt_tsx & 1)                             ? " TSX+" : " TSX-",
            !cpu_has_srbds_ctrl                       ? "" :
            opt_srb_lock                              ? " SRB_LOCK+" : " SRB_LOCK-",
-           opt_ibpb                                  ? " IBPB"  : "",
+           opt_ibpb_ctxt_switch                      ? " IBPB-ctxt" : "",
            opt_l1d_flush                             ? " L1D_FLUSH" : "",
            opt_md_clear_pv || opt_md_clear_hvm ||
            opt_fb_clear_mmio                         ? " VERW"  : "",
@@ -1193,7 +1193,7 @@ void __init init_speculation_mitigations(void)
 
     /* Check we have hardware IBPB support before using it... */
     if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-        opt_ibpb = false;
+        opt_ibpb_ctxt_switch = false;
 
     /* Check whether Eager FPU should be enabled by default. */
     if ( opt_eager_fpu == -1 )
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 6f8b0e09348e..fd8162ca9ab9 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -63,7 +63,7 @@
 void init_speculation_mitigations(void);
 void spec_ctrl_init_domain(struct domain *d);
 
-extern bool opt_ibpb;
+extern bool opt_ibpb_ctxt_switch;
 extern bool opt_ssbd;
 extern int8_t opt_eager_fpu;
 extern int8_t opt_l1d_flush;
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rework SPEC_CTRL_ENTRY_FROM_INTR_IST

We are shortly going to add a conditional IBPB in this path.

Therefore, we can no longer hold spec_ctrl_flags in %eax, relying on
clobbering it only after we're done with its contents.  %rbx is available for
use, and is the more natural register for holding preserved information.

With %rax freed up, use it instead of %rdx for the RSB tmp register, and for
the adjustment to spec_ctrl_flags.

This leaves no use of %rdx, except as 0 for the upper half of WRMSR.  In
practice, %rdx is 0 from SAVE_ALL on all paths and isn't likely to change in
the foreseeable future, so update the macro entry requirements to state this
dependency.  This marginal optimisation can be revisited if circumstances
change.

No practical change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index cbf332e752a8..87bf9cb6942b 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -854,7 +854,7 @@ ENTRY(double_fault)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rbx
@@ -889,7 +889,7 @@ handle_ist_exception:
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 0ff1b118f882..15e24cde00d1 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -251,34 +251,33 @@
  */
 .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
 /*
- * Requires %rsp=regs, %r14=stack_end
- * Clobbers %rax, %rcx, %rdx
+ * Requires %rsp=regs, %r14=stack_end, %rdx=0
+ * Clobbers %rax, %rbx, %rcx, %rdx
  *
  * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
  * maybexen=1, but with conditionals rather than alternatives.
  */
-    movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %eax
+    movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
 
-    test $SCF_ist_rsb, %al
+    test $SCF_ist_rsb, %bl
     jz .L\@_skip_rsb
 
-    DO_OVERWRITE_RSB tmp=rdx /* Clobbers %rcx/%rdx */
+    DO_OVERWRITE_RSB         /* Clobbers %rax/%rcx */
 
 .L\@_skip_rsb:
 
-    test $SCF_ist_sc_msr, %al
+    test $SCF_ist_sc_msr, %bl
     jz .L\@_skip_msr_spec_ctrl
 
-    xor %edx, %edx
+    xor %eax, %eax
     testb $3, UREGS_cs(%rsp)
-    setnz %dl
-    not %edx
-    and %dl, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+    setnz %al
+    not %eax
+    and %al, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
 
     /* Load Xen's intended value. */
     mov $MSR_SPEC_CTRL, %ecx
     movzbl STACK_CPUINFO_FIELD(xen_spec_ctrl)(%r14), %eax
-    xor %edx, %edx
     wrmsr
 
     /* Opencoded UNLIKELY_START() with no condition. */
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Support IBPB-on-entry

We are going to need this to mitigate Branch Type Confusion on AMD/Hygon CPUs,
but as we've talked about using it in other cases too, arrange to support it
generally.  However, this is also very expensive in some cases, so we're going
to want per-domain controls.

Introduce SCF_ist_ibpb and SCF_entry_ibpb controls, adding them to the IST and
DOM masks as appropriate.  Also introduce X86_FEATURE_IBPB_ENTRY_{PV,HVM} to
patch the code blocks.

For SVM, the STGI is serialising enough to protect against Spectre-v1 attacks,
so no "else lfence" is necessary.  VT-x will use use the MSR host load list,
so doesn't need any code in the VMExit path.

For the IST path, we can't safely check CPL==0 to skip a flush, as we might
have hit an entry path before its IBPB.  As IST hitting Xen is rare, flush
irrespective of CPL.  A later path, SCF_ist_sc_msr, provides Spectre-v1
safety.

For the PV paths, we know we're interrupting CPL>0, while for the INTR paths,
we can safely check CPL==0.  Only flush when interrupting guest context.

An "else lfence" is needed for safety, but we want to be able to skip it on
unaffected CPUs, so the block wants to be an alternative, which means the
lfence has to be inline rather than UNLIKELY() (the replacement block doesn't
have displacements fixed up for anything other than the first instruction).

As with SPEC_CTRL_ENTRY_FROM_INTR_IST, %rdx is 0 on entry so rely on this to
shrink the logic marginally.  Update the comments to specify this new
dependency.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
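
In C terms, the entry logic amounts to the following (an illustrative
rendering only; the real logic is the asm in the hunks below, and ist_path
and interrupted_guest() are stand-ins for the CPL/context checks):

    if ( scf & SCF_entry_ibpb )
    {
        if ( ist_path || interrupted_guest(regs) )
            wrmsrl(MSR_PRED_CMD, PRED_CMD_IBPB); /* flush branch predictions */
        else
            asm volatile ( "lfence" ); /* fence the skipped path (Spectre-v1) */
    }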

diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index 055e6f4564c6..7aab67899bea 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -101,7 +101,19 @@ __UNLIKELY_END(nsvm_hap)
 
         GET_CURRENT(bx)
 
-        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: %rsp=regs/cpuinfo         Clob: acd */
+        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: %rsp=regs/cpuinfo, %rdx=0 Clob: acd */
+
+        .macro svm_vmexit_cond_ibpb
+            testb  $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
+            jz     .L_skip_ibpb
+
+            mov    $MSR_PRED_CMD, %ecx
+            mov    $PRED_CMD_IBPB, %eax
+            wrmsr
+.L_skip_ibpb:
+        .endm
+        ALTERNATIVE "", svm_vmexit_cond_ibpb, X86_FEATURE_IBPB_ENTRY_HVM
+
         ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
 
         .macro svm_vmexit_spec_ctrl
@@ -118,6 +130,10 @@ __UNLIKELY_END(nsvm_hap)
         ALTERNATIVE "", svm_vmexit_spec_ctrl, X86_FEATURE_SC_MSR_HVM
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
+        /*
+         * STGI is executed unconditionally, and is sufficiently serialising
+         * to safely resolve any Spectre-v1 concerns in the above logic.
+         */
         STGI
 GLOBAL(svm_stgi_label)
         mov  %rsp,%rdi
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 1466064d0cc9..3d271178e8bb 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1332,6 +1332,10 @@ static int construct_vmcs(struct vcpu *v)
         rc = vmx_add_msr(v, MSR_FLUSH_CMD, FLUSH_CMD_L1D,
                          VMX_MSR_GUEST_LOADONLY);
 
+    if ( !rc && (d->arch.spec_ctrl_flags & SCF_entry_ibpb) )
+        rc = vmx_add_msr(v, MSR_PRED_CMD, PRED_CMD_IBPB,
+                         VMX_MSR_HOST);
+
  out:
     vmx_vmcs_exit(v);
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index b67468f7c934..302530e65e0b 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -18,7 +18,7 @@ ENTRY(entry_int82)
         movl  $HYPERCALL_VECTOR, 4(%rsp)
         SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         CR4_PV32_RESTORE
@@ -212,7 +212,7 @@ ENTRY(cstar_enter)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 87bf9cb6942b..153e89e24694 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -248,7 +248,7 @@ ENTRY(lstar_enter)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -287,7 +287,7 @@ GLOBAL(sysenter_eflags_saved)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -339,7 +339,7 @@ ENTRY(int80_direct_trap)
         movl  $0x80, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -600,7 +600,7 @@ ENTRY(common_interrupt)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
@@ -633,7 +633,7 @@ GLOBAL(handle_exception)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index f7488d3ccbfa..b233e5835fb5 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -39,6 +39,8 @@ XEN_CPUFEATURE(XEN_LBR,           X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
 XEN_CPUFEATURE(SC_VERW_IDLE,      X86_SYNTH(25)) /* VERW used by Xen for idle */
 XEN_CPUFEATURE(XEN_SHSTK,         X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
 XEN_CPUFEATURE(XEN_IBT,           X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
+XEN_CPUFEATURE(IBPB_ENTRY_PV,     X86_SYNTH(28)) /* MSR_PRED_CMD used by Xen for PV */
+XEN_CPUFEATURE(IBPB_ENTRY_HVM,    X86_SYNTH(29)) /* MSR_PRED_CMD used by Xen for HVM */
 
 /* Bug words follow the synthetic words. */
 #define X86_NR_BUG 1
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index fd8162ca9ab9..10cd0cd2518f 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -34,6 +34,8 @@
 #define SCF_ist_sc_msr (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
+#define SCF_ist_ibpb   (1 << 4)
+#define SCF_entry_ibpb (1 << 5)
 
 /*
  * The IST paths (NMI/#MC) can interrupt any arbitrary context.  Some
@@ -46,13 +48,13 @@
  * These are the controls to inhibit on the S3 resume path until microcode has
  * been reloaded.
  */
-#define SCF_IST_MASK (SCF_ist_sc_msr)
+#define SCF_IST_MASK (SCF_ist_sc_msr | SCF_ist_ibpb)
 
 /*
  * Some speculative protections are per-domain.  These settings are merged
  * into the top-of-stack block in the context switch path.
  */
-#define SCF_DOM_MASK (SCF_verw)
+#define SCF_DOM_MASK (SCF_verw | SCF_entry_ibpb)
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 15e24cde00d1..9eb4ad9ab71d 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -88,6 +88,35 @@
  *  - SPEC_CTRL_EXIT_TO_{SVM,VMX}
  */
 
+.macro DO_SPEC_CTRL_COND_IBPB maybexen:req
+/*
+ * Requires %rsp=regs (also cpuinfo if !maybexen)
+ * Requires %r14=stack_end (if maybexen), %rdx=0
+ * Clobbers %rax, %rcx, %rdx
+ *
+ * Conditionally issue IBPB if SCF_entry_ibpb is active.  In the maybexen
+ * case, we can safely look at UREGS_cs to skip taking the hit when
+ * interrupting Xen.
+ */
+    .if \maybexen
+        testb  $SCF_entry_ibpb, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+        jz     .L\@_skip
+        testb  $3, UREGS_cs(%rsp)
+    .else
+        testb  $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
+    .endif
+    jz     .L\@_skip
+
+    mov     $MSR_PRED_CMD, %ecx
+    mov     $PRED_CMD_IBPB, %eax
+    wrmsr
+    jmp     .L\@_done
+
+.L\@_skip:
+    lfence
+.L\@_done:
+.endm
+
 .macro DO_OVERWRITE_RSB tmp=rax
 /*
  * Requires nothing
@@ -225,12 +254,16 @@
 
 /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
 #define SPEC_CTRL_ENTRY_FROM_PV                                         \
+    ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=0),     \
+        X86_FEATURE_IBPB_ENTRY_PV;                                      \
     ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;            \
     ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=0),         \
         X86_FEATURE_SC_MSR_PV
 
 /* Use in interrupt/exception context.  May interrupt Xen or PV context. */
 #define SPEC_CTRL_ENTRY_FROM_INTR                                       \
+    ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=1),     \
+        X86_FEATURE_IBPB_ENTRY_PV;                                      \
     ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;            \
     ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1),         \
         X86_FEATURE_SC_MSR_PV
@@ -254,11 +287,23 @@
  * Requires %rsp=regs, %r14=stack_end, %rdx=0
  * Clobbers %rax, %rbx, %rcx, %rdx
  *
- * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
- * maybexen=1, but with conditionals rather than alternatives.
+ * This is logical merge of:
+ *    DO_SPEC_CTRL_COND_IBPB maybexen=0
+ *    DO_OVERWRITE_RSB
+ *    DO_SPEC_CTRL_ENTRY maybexen=1
+ * but with conditionals rather than alternatives.
  */
     movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
 
+    test    $SCF_ist_ibpb, %bl
+    jz      .L\@_skip_ibpb
+
+    mov     $MSR_PRED_CMD, %ecx
+    mov     $PRED_CMD_IBPB, %eax
+    wrmsr
+
+.L\@_skip_ibpb:
+
     test $SCF_ist_rsb, %bl
     jz .L\@_skip_rsb
 
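As an illustrative aside (not part of the patch), the asm added above is
equivalent to the following C-style sketch.  SCF_entry_ibpb, MSR_PRED_CMD
and PRED_CMD_IBPB are from the patch; `wrmsr64` is the hypothetical helper
sketched earlier, and the boolean parameters stand in for the flag test
and the `testb $3, UREGS_cs(%rsp)` CPL check:

    #include <stdbool.h>

    /* Sketch of DO_SPEC_CTRL_COND_IBPB's logic; not real Xen code. */
    static void cond_ibpb(bool entry_ibpb, bool maybexen, bool from_guest)
    {
        if ( entry_ibpb && (!maybexen || from_guest) )
            wrmsr64(MSR_PRED_CMD, PRED_CMD_IBPB); /* WRMSR serialises */
        else
            asm volatile ("lfence" ::: "memory"); /* no Spectre-v1 bypass */
    }

Note the serialising instruction on both paths: WRMSR when the barrier is
issued, LFENCE otherwise, so the SCF_entry_ibpb check cannot be speculated
around.
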
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/cpuid: Enumeration for BTC_NO

BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.

Zen3 CPUs don't suffer BTC.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c
index 86c8d21555ba..25576b4d992d 100644
--- a/tools/libxl/libxl_cpuid.c
+++ b/tools/libxl/libxl_cpuid.c
@@ -280,6 +280,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
         {"virt-ssbd",    0x80000008, NA, CPUID_REG_EBX, 25,  1},
         {"ssb-no",       0x80000008, NA, CPUID_REG_EBX, 26,  1},
         {"psfd",         0x80000008, NA, CPUID_REG_EBX, 28,  1},
+        {"btc-no",       0x80000008, NA, CPUID_REG_EBX, 29,  1},
 
         {"nc",           0x80000008, NA, CPUID_REG_ECX,  0,  8},
         {"apicidsize",   0x80000008, NA, CPUID_REG_ECX, 12,  4},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 7ebf520a7171..e5208cfa4538 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -157,7 +157,7 @@ static const char *const str_e8b[32] =
     /* [22] */                 [23] = "ppin",
     [24] = "amd-ssbd",         [25] = "virt-ssbd",
     [26] = "ssb-no",
-    [28] = "psfd",
+    [28] = "psfd",             [29] = "btc-no",
 };
 
 static const char *const str_7d0[32] =
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 142f34af5f70..7409af98f633 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -822,6 +822,16 @@ static void init_amd(struct cpuinfo_x86 *c)
 			warning_add(text);
 		}
 		break;
+
+	case 0x19:
+		/*
+		 * Zen3 (Fam19h model < 0x10) parts are not susceptible to
+		 * Branch Type Confusion, but predate the allocation of the
+		 * BTC_NO bit.  Fill it back in if we're not virtualised.
+		 */
+		if (!cpu_has_hypervisor && !cpu_has(c, X86_FEATURE_BTC_NO))
+			__set_bit(X86_FEATURE_BTC_NO, c->x86_capability);
+		break;
 	}
 
 	display_cacheinfo(c);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index ced0f8c2aea6..9f66c715516c 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -388,7 +388,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * Hardware read-only information, stating immunity to certain issues, or
      * suggestions of which mitigation to use.
      */
-    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (caps & ARCH_CAPS_RDCL_NO)                        ? " RDCL_NO"        : "",
            (caps & ARCH_CAPS_IBRS_ALL)                       ? " IBRS_ALL"       : "",
            (caps & ARCH_CAPS_RSBA)                           ? " RSBA"           : "",
@@ -403,7 +403,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (e8b  & cpufeat_mask(X86_FEATURE_IBRS_ALWAYS))    ? " IBRS_ALWAYS"    : "",
            (e8b  & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS))   ? " STIBP_ALWAYS"   : "",
            (e8b  & cpufeat_mask(X86_FEATURE_IBRS_FAST))      ? " IBRS_FAST"      : "",
-           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "");
+           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_BTC_NO))         ? " BTC_NO"         : "");
 
     /* Hardware features which need driving to mitigate issues. */
     printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index c5af6f03cff6..746a75200ab8 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -264,6 +264,7 @@ XEN_CPUFEATURE(AMD_SSBD,      8*32+24) /*S  MSR_SPEC_CTRL.SSBD available */
 XEN_CPUFEATURE(VIRT_SSBD,     8*32+25) /*   MSR_VIRT_SPEC_CTRL.SSBD */
 XEN_CPUFEATURE(SSB_NO,        8*32+26) /*A  Hardware not vulnerable to SSB */
 XEN_CPUFEATURE(PSFD,          8*32+28) /*S  MSR_SPEC_CTRL.PSFD */
+XEN_CPUFEATURE(BTC_NO,        8*32+29) /*A  Hardware not vulnerable to Branch Type Confusion */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
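
For illustration (not part of the patch), the new bit can be probed from
user space with GCC's <cpuid.h> helper; the leaf and register match the
libxl/xen-cpuid tables above (leaf 0x80000008, EBX bit 29):

    #include <cpuid.h>
    #include <stdbool.h>

    static bool cpu_has_btc_no(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* Leaf 0x80000008, EBX bit 29 == BTC_NO. */
        if ( !__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx) )
            return false; /* extended leaf not implemented */

        return ebx & (1u << 29);
    }

Note that, per the init_amd() hunk, raw Zen3 hardware may predate the
bit's allocation, so a clear bit on Fam19h does not by itself imply
vulnerability; Xen synthesises the bit in that case.
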
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Enable Zen2 chickenbit

... as instructed in the Branch Type Confusion whitepaper.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 7409af98f633..f50f91f81eb9 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -731,6 +731,31 @@ void amd_init_ssbd(const struct cpuinfo_x86 *c)
 		printk_once(XENLOG_ERR "No SSBD controls available\n");
 }
 
+/*
+ * On Zen2 we offer this chicken (bit) on the altar of Speculation.
+ *
+ * Refer to the AMD Branch Type Confusion whitepaper:
+ * https://XXX
+ *
+ * Setting this unnamed bit supposedly causes prediction information on
+ * non-branch instructions to be ignored.  It is to be set unilaterally in
+ * newer microcode.
+ *
+ * This chickenbit is something unrelated on Zen1, and Zen1 vs Zen2 isn't a
+ * simple model number comparison, so use STIBP as a heuristic to separate the
+ * two uarches in Fam17h(AMD)/18h(Hygon).
+ */
+void amd_init_spectral_chicken(void)
+{
+	uint64_t val, chickenbit = 1 << 1;
+
+	if (cpu_has_hypervisor || !boot_cpu_has(X86_FEATURE_AMD_STIBP))
+		return;
+
+	if (rdmsr_safe(MSR_AMD64_DE_CFG2, val) == 0 && !(val & chickenbit))
+		wrmsr_safe(MSR_AMD64_DE_CFG2, val | chickenbit);
+}
+
 static void init_amd(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
@@ -783,6 +808,9 @@ static void init_amd(struct cpuinfo_x86 *c)
 
 	amd_init_ssbd(c);
 
+	if (c->x86 == 0x17)
+		amd_init_spectral_chicken();
+
 	/* MFENCE stops RDTSC speculation */
 	if (!cpu_has_lfence_dispatch)
 		__set_bit(X86_FEATURE_MFENCE_RDTSC, c->x86_capability);
diff --git a/xen/arch/x86/cpu/cpu.h b/xen/arch/x86/cpu/cpu.h
index 1a5b3918b37e..e76ab5ce1ae2 100644
--- a/xen/arch/x86/cpu/cpu.h
+++ b/xen/arch/x86/cpu/cpu.h
@@ -22,3 +22,4 @@ void early_init_amd(struct cpuinfo_x86 *c);
 void amd_log_freq(const struct cpuinfo_x86 *c);
 void amd_init_lfence(struct cpuinfo_x86 *c);
 void amd_init_ssbd(const struct cpuinfo_x86 *c);
+void amd_init_spectral_chicken(void);
diff --git a/xen/arch/x86/cpu/hygon.c b/xen/arch/x86/cpu/hygon.c
index 3845e0cf0e89..0cb0e7d55e61 100644
--- a/xen/arch/x86/cpu/hygon.c
+++ b/xen/arch/x86/cpu/hygon.c
@@ -36,6 +36,12 @@ static void init_hygon(struct cpuinfo_x86 *c)
 
 	amd_init_ssbd(c);
 
+	/*
+	 * TODO: Check heuristic safety with Hygon first
+	if (c->x86 == 0x18)
+		amd_init_spectral_chicken();
+	 */
+
 	/* MFENCE stops RDTSC speculation */
 	if (!cpu_has_lfence_dispatch)
 		__set_bit(X86_FEATURE_MFENCE_RDTSC, c->x86_capability);
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index c8670eab8ef5..4c1cba589d08 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -359,6 +359,7 @@
 #define MSR_AMD64_DC_CFG		0xc0011022
 #define MSR_AMD64_DE_CFG		0xc0011029
 #define AMD64_DE_CFG_LFENCE_SERIALISE	(_AC(1, ULL) << 1)
+#define MSR_AMD64_DE_CFG2		0xc00110e3
 
 #define MSR_AMD64_DR0_ADDRESS_MASK	0xc0011027
 #define MSR_AMD64_DR1_ADDRESS_MASK	0xc0011019
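
A side note on the rdmsr_safe()/wrmsr_safe() pattern used above:
MSR_AMD64_DE_CFG2 is not architectural, and a plain RDMSR/WRMSR raises
#GP if the MSR is absent, whereas the safe variants catch the fault and
return an error.  For contrast, a raw (faulting) read looks like the
following sketch, a hypothetical ring-0 helper rather than Xen's own:

    #include <stdint.h>

    /* Faults with #GP if the MSR doesn't exist -- hence rdmsr_safe(). */
    static inline uint64_t rdmsr64(uint32_t msr)
    {
        uint32_t lo, hi;

        asm volatile ("rdmsr" : "=a" (lo), "=d" (hi) : "c" (msr));
        return ((uint64_t)hi << 32) | lo;
    }
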
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Mitigate Branch Type Confusion when possible

Branch Type Confusion affects AMD/Hygon CPUs on Zen2 and earlier.  To
mitigate, we require SMT safety (STIBP on Zen2, no-SMT on Zen1), and to issue
an IBPB on each entry to Xen, to flush the BTB.

Due to performance concerns, dom0 (which is trusted in most configurations) is
excluded from protections by default.

Therefore:
 * Use STIBP by default on Zen2 too, which now means we want it on by default
   on all hardware supporting STIBP.
 * Break the current IBPB logic out into a new function, extending it with
   IBPB-at-entry logic.
 * Change the existing IBPB-at-ctxt-switch boolean to be tristate, and disable
   it by default when IBPB-at-entry is providing sufficient safety.

If all PV guests on the system are trusted, then it is recommended to boot
with `spec-ctrl=ibpb-entry=no-pv`, as this will provide an additional marginal
perf improvement.

This is part of XSA-407 / CVE-2022-23825.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index a84f5c19218d..f13304ef4eb1 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2105,7 +2105,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 
 ### spec-ctrl (x86)
 > `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
->              {msr-sc,rsb,md-clear}=<bool>|{pv,hvm}=<bool>,
+>              {msr-sc,rsb,md-clear,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >              eager-fpu,l1d-flush,branch-harden,srb-lock,
 >              unpriv-mmio}=<bool> ]`
@@ -2130,9 +2130,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` options offer fine
-grained control over the primitives by Xen.  These impact Xen's ability to
-protect itself, and/or Xen's ability to virtualise support for guests to use.
+The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `md-clear=` and `ibpb-entry=` options
+offer fine grained control over the primitives by Xen.  These impact Xen's
+ability to protect itself, and/or Xen's ability to virtualise support for
+guests to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -2151,6 +2152,11 @@ protect itself, and/or Xen's ability to virtualise support for guests to use.
   compatibility with development versions of this fix, `mds=` is also accepted
   on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
   preference to here.*
+* `ibpb-entry=` offers control over whether IBPB (Indirect Branch Prediction
+  Barrier) is used on entry to Xen.  This is used by default on hardware
+  vulnerable to Branch Type Confusion, but for performance reasons, dom0 is
+  unprotected by default.  If it is necessary to protect dom0 too, boot with
+  `spec-ctrl=ibpb-entry`.
 
 If Xen was compiled with INDIRECT_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 9f66c715516c..563519ce0e31 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -39,6 +39,10 @@ static bool __initdata opt_rsb_hvm = true;
 static int8_t __read_mostly opt_md_clear_pv = -1;
 static int8_t __read_mostly opt_md_clear_hvm = -1;
 
+static int8_t __read_mostly opt_ibpb_entry_pv = -1;
+static int8_t __read_mostly opt_ibpb_entry_hvm = -1;
+static bool __read_mostly opt_ibpb_entry_dom0;
+
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
     THUNK_DEFAULT, /* Decide which thunk to use at boot time. */
@@ -54,7 +58,7 @@ int8_t __initdata opt_stibp = -1;
 bool __read_mostly opt_ssbd;
 int8_t __initdata opt_psfd = -1;
 
-bool __read_mostly opt_ibpb_ctxt_switch = true;
+int8_t __read_mostly opt_ibpb_ctxt_switch = -1;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 bool __read_mostly opt_branch_harden = true;
@@ -114,6 +118,9 @@ static int __init parse_spec_ctrl(const char *s)
             opt_rsb_hvm = false;
             opt_md_clear_pv = 0;
             opt_md_clear_hvm = 0;
+            opt_ibpb_entry_pv = 0;
+            opt_ibpb_entry_hvm = 0;
+            opt_ibpb_entry_dom0 = false;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -140,12 +147,14 @@ static int __init parse_spec_ctrl(const char *s)
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
             opt_md_clear_pv = val;
+            opt_ibpb_entry_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
             opt_md_clear_hvm = val;
+            opt_ibpb_entry_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
         {
@@ -210,6 +219,28 @@ static int __init parse_spec_ctrl(const char *s)
                 break;
             }
         }
+        else if ( (val = parse_boolean("ibpb-entry", s, ss)) != -1 )
+        {
+            switch ( val )
+            {
+            case 0:
+            case 1:
+                opt_ibpb_entry_pv = opt_ibpb_entry_hvm =
+                    opt_ibpb_entry_dom0 = val;
+                break;
+
+            case -2:
+                s += strlen("ibpb-entry=");
+                if ( (val = parse_boolean("pv", s, ss)) >= 0 )
+                    opt_ibpb_entry_pv = val;
+                else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+                    opt_ibpb_entry_hvm = val;
+                else
+            default:
+                    rc = -EINVAL;
+                break;
+            }
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -477,27 +508,31 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * mitigation support for guests.
      */
 #ifdef CONFIG_HVM
-    printk("  Support for HVM VMs:%s%s%s%s%s\n",
+    printk("  Support for HVM VMs:%s%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             boot_cpu_has(X86_FEATURE_MD_CLEAR)   ||
+            boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
-           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
+           boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM)  ? " IBPB-entry"    : "");
 
 #endif
 #ifdef CONFIG_PV
-    printk("  Support for PV VMs:%s%s%s%s%s\n",
+    printk("  Support for PV VMs:%s%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
             boot_cpu_has(X86_FEATURE_MD_CLEAR)  ||
+            boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_PV)       ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_PV)       ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
-           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
+           boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV)   ? " IBPB-entry"    : "");
 
     printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
            opt_xpti_hwdom ? "enabled" : "disabled",
@@ -730,6 +765,55 @@ static bool __init should_use_eager_fpu(void)
     }
 }
 
+static void __init ibpb_calculations(void)
+{
+    /* Check we have hardware IBPB support before using it... */
+    if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
+    {
+        opt_ibpb_entry_hvm = opt_ibpb_entry_pv = opt_ibpb_ctxt_switch = 0;
+        opt_ibpb_entry_dom0 = false;
+        return;
+    }
+
+    /*
+     * IBPB-on-entry mitigations for Branch Type Confusion.
+     *
+     * IBPB && !BTC_NO selects all AMD/Hygon hardware, not known to be safe,
+     * that we can provide some form of mitigation on.
+     */
+    if ( opt_ibpb_entry_pv == -1 )
+        opt_ibpb_entry_pv = (IS_ENABLED(CONFIG_PV) &&
+                             boot_cpu_has(X86_FEATURE_IBPB) &&
+                             !boot_cpu_has(X86_FEATURE_BTC_NO));
+    if ( opt_ibpb_entry_hvm == -1 )
+        opt_ibpb_entry_hvm = (IS_ENABLED(CONFIG_HVM) &&
+                              boot_cpu_has(X86_FEATURE_IBPB) &&
+                              !boot_cpu_has(X86_FEATURE_BTC_NO));
+
+    if ( opt_ibpb_entry_pv )
+    {
+        setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_PV);
+
+        /*
+         * We only need to flush in IST context if we're protecting against PV
+         * guests.  HVM IBPB-on-entry protections are both atomic with
+         * NMI/#MC, so can't interrupt Xen ahead of having already flushed the
+         * BTB.
+         */
+        default_spec_ctrl_flags |= SCF_ist_ibpb;
+    }
+    if ( opt_ibpb_entry_hvm )
+        setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_HVM);
+
+    /*
+     * If we're using IBPB-on-entry to protect against PV and HVM guests
+     * (ignoring dom0 if trusted), then there's no need to also issue IBPB on
+     * context switch too.
+     */
+    if ( opt_ibpb_ctxt_switch == -1 )
+        opt_ibpb_ctxt_switch = !(opt_ibpb_entry_hvm && opt_ibpb_entry_pv);
+}
+
 /* Calculate whether this CPU is vulnerable to L1TF. */
 static __init void l1tf_calculations(uint64_t caps)
 {
@@ -985,8 +1069,12 @@ void spec_ctrl_init_domain(struct domain *d)
     bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
                  (opt_fb_clear_mmio && is_iommu_enabled(d)));
 
+    bool ibpb = ((pv ? opt_ibpb_entry_pv : opt_ibpb_entry_hvm) &&
+                 (d->domain_id != 0 || opt_ibpb_entry_dom0));
+
     d->arch.spec_ctrl_flags =
         (verw   ? SCF_verw         : 0) |
+        (ibpb   ? SCF_entry_ibpb   : 0) |
         0;
 }
 
@@ -1133,12 +1221,15 @@ void __init init_speculation_mitigations(void)
     }
 
     /*
-     * Use STIBP by default if the hardware hint is set.  Otherwise, leave it
-     * off as it a severe performance pentalty on pre-eIBRS Intel hardware
-     * where it was retrofitted in microcode.
+     * Use STIBP by default on all AMD systems.  Zen3 and later enumerate
+     * STIBP_ALWAYS, but STIBP is needed on Zen2 as part of the mitigations
+     * for Branch Type Confusion.
+     *
+     * Leave STIBP off by default on Intel.  Pre-eIBRS systems suffer a
+     * substantial perf hit when it was implemented in microcode.
      */
     if ( opt_stibp == -1 )
-        opt_stibp = !!boot_cpu_has(X86_FEATURE_STIBP_ALWAYS);
+        opt_stibp = !!boot_cpu_has(X86_FEATURE_AMD_STIBP);
 
     if ( opt_stibp && (boot_cpu_has(X86_FEATURE_STIBP) ||
                        boot_cpu_has(X86_FEATURE_AMD_STIBP)) )
@@ -1192,9 +1283,7 @@ void __init init_speculation_mitigations(void)
     if ( opt_rsb_hvm )
         setup_force_cpu_cap(X86_FEATURE_SC_RSB_HVM);
 
-    /* Check we have hardware IBPB support before using it... */
-    if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-        opt_ibpb_ctxt_switch = false;
+    ibpb_calculations();
 
     /* Check whether Eager FPU should be enabled by default. */
     if ( opt_eager_fpu == -1 )
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 10cd0cd2518f..33e845991b0a 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -65,7 +65,7 @@
 void init_speculation_mitigations(void);
 void spec_ctrl_init_domain(struct domain *d);
 
-extern bool opt_ibpb_ctxt_switch;
+extern int8_t opt_ibpb_ctxt_switch;
 extern bool opt_ssbd;
 extern int8_t opt_eager_fpu;
 extern int8_t opt_l1d_flush;
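
To make the interactions above concrete, here is a worked example
(illustrative only, derived from ibpb_calculations() and
spec_ctrl_init_domain() in the patch) for a Zen2 host which enumerates
IBPB but not BTC_NO, booted with `spec-ctrl=ibpb-entry=no-pv`:

    /*
     * opt_ibpb_entry_pv    = 0;  explicit "no-pv"
     * opt_ibpb_entry_hvm   = 1;  default: IBPB && !BTC_NO
     * opt_ibpb_ctxt_switch = 1;  !(hvm && pv) - entry protection doesn't
     *                            cover both guest types
     *
     * Dom0 (PV) gets no SCF_entry_ibpb: opt_ibpb_entry_pv is 0, and
     * opt_ibpb_entry_dom0 defaults to false anyway.
     */
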
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rework spec_ctrl_flags context switching

We are shortly going to need to context switch new bits in both the vcpu and
S3 paths.  Introduce SCF_IST_MASK and SCF_DOM_MASK, and rework d->arch.verw
into d->arch.spec_ctrl_flags to accommodate.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 5eaa77f66a28..dd397f713067 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -248,8 +248,8 @@ static int enter_state(u32 state)
         error = 0;
 
     ci = get_cpu_info();
-    /* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */
-    ci->spec_ctrl_flags &= ~SCF_ist_wrmsr;
+    /* Avoid NMI/#MC using unsafe MSRs until we've reloaded microcode. */
+    ci->spec_ctrl_flags &= ~SCF_IST_MASK;
 
     ACPI_FLUSH_CPU_CACHE();
 
@@ -292,8 +292,8 @@ static int enter_state(u32 state)
     if ( !recheck_cpu_features(0) )
         panic("Missing previously available feature(s)\n");
 
-    /* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
-    ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr);
+    /* Re-enable default NMI/#MC use of MSRs now that microcode is loaded. */
+    ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_IST_MASK);
 
     if ( boot_cpu_has(X86_FEATURE_IBRSB) || boot_cpu_has(X86_FEATURE_IBRS) )
     {
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 4a61e951facf..79f2c6ab19b8 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2069,10 +2069,10 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
             }
         }
 
-        /* Update the top-of-stack block with the VERW disposition. */
-        info->spec_ctrl_flags &= ~SCF_verw;
-        if ( nextd->arch.verw )
-            info->spec_ctrl_flags |= SCF_verw;
+        /* Update the top-of-stack block with the new spec_ctrl settings. */
+        info->spec_ctrl_flags =
+            (info->spec_ctrl_flags       & ~SCF_DOM_MASK) |
+            (nextd->arch.spec_ctrl_flags &  SCF_DOM_MASK);
     }
 
     sched_context_switched(prev, next);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 225fe08259b3..0fabfbe2a9f4 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -981,9 +981,12 @@ void spec_ctrl_init_domain(struct domain *d)
 {
     bool pv = is_pv_domain(d);
 
-    d->arch.verw =
-        (pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
-        (opt_fb_clear_mmio && is_iommu_enabled(d));
+    bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
+                 (opt_fb_clear_mmio && is_iommu_enabled(d)));
+
+    d->arch.spec_ctrl_flags =
+        (verw   ? SCF_verw         : 0) |
+        0;
 }
 
 void __init init_speculation_mitigations(void)
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index d0df7f83aa0c..7d6483f21bb1 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -319,8 +319,7 @@ struct arch_domain
     uint32_t pci_cf8;
     uint8_t cmos_idx;
 
-    /* Use VERW on return-to-guest for its flushing side effect. */
-    bool verw;
+    uint8_t spec_ctrl_flags; /* See SCF_DOM_MASK */
 
     union {
         struct pv_domain pv;
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 12283573cdd5..60d6d2dc9407 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -20,12 +20,40 @@
 #ifndef __X86_SPEC_CTRL_H__
 #define __X86_SPEC_CTRL_H__
 
-/* Encoding of cpuinfo.spec_ctrl_flags */
+/*
+ * Encoding of:
+ *   cpuinfo.spec_ctrl_flags
+ *   default_spec_ctrl_flags
+ *   domain.spec_ctrl_flags
+ *
+ * Live settings are in the top-of-stack block, because they need to be
+ * accessible when XPTI is active.  Some settings are fixed from boot, some
+ * context switched per domain, and some inhibited in the S3 path.
+ */
 #define SCF_use_shadow (1 << 0)
 #define SCF_ist_wrmsr  (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
 
+/*
+ * The IST paths (NMI/#MC) can interrupt any arbitrary context.  Some
+ * functionality requires updated microcode to work.
+ *
+ * On boot, this is easy; we load microcode before figuring out which
+ * speculative protections to apply.  However, on the S3 resume path, we must
+ * be able to disable the configured mitigations until microcode is reloaded.
+ *
+ * These are the controls to inhibit on the S3 resume path until microcode has
+ * been reloaded.
+ */
+#define SCF_IST_MASK (SCF_ist_wrmsr)
+
+/*
+ * Some speculative protections are per-domain.  These settings are merged
+ * into the top-of-stack block in the context switch path.
+ */
+#define SCF_DOM_MASK (SCF_verw)
+
 #ifndef __ASSEMBLY__
 
 #include <asm/alternative.h>
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 5a590bac44aa..66b00d511fc6 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -248,9 +248,6 @@
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.
- * Fine grain control of SCF_ist_wrmsr is needed for safety in the S3 resume
- * path to avoid using MSR_SPEC_CTRL before the microcode introducing it has
- * been reloaded.
  */
 .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
 /*
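
As a concrete illustration (not part of the patch) of the SCF_DOM_MASK
merge performed in context_switch() above, consider switching from a
domain which wants VERW to one which doesn't:

    /*
     * prev top-of-stack: SCF_ist_wrmsr | SCF_verw     = 0x0a
     * next domain:       spec_ctrl_flags              = 0x00
     *
     * new = (0x0a & ~SCF_DOM_MASK) | (0x00 & SCF_DOM_MASK)
     *     = (0x0a & ~0x08) | 0x00
     *     = SCF_ist_wrmsr                             = 0x02
     *
     * The boot-fixed IST bit survives; only the per-domain VERW bit is
     * replaced.
     */
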
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rename SCF_ist_wrmsr to SCF_ist_sc_msr

We are about to introduce SCF_ist_ibpb, at which point SCF_ist_wrmsr becomes
ambiguous.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 0fabfbe2a9f4..a6def47061e8 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -1086,7 +1086,7 @@ void __init init_speculation_mitigations(void)
     {
         if ( opt_msr_sc_pv )
         {
-            default_spec_ctrl_flags |= SCF_ist_wrmsr;
+            default_spec_ctrl_flags |= SCF_ist_sc_msr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_PV);
         }
 
@@ -1097,7 +1097,7 @@ void __init init_speculation_mitigations(void)
              * Xen's value is not restored atomically.  An early NMI hitting
              * the VMExit path needs to restore Xen's value for safety.
              */
-            default_spec_ctrl_flags |= SCF_ist_wrmsr;
+            default_spec_ctrl_flags |= SCF_ist_sc_msr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
         }
     }
@@ -1110,7 +1110,7 @@ void __init init_speculation_mitigations(void)
          * on real hardware matches the availability of MSR_SPEC_CTRL in the
          * first place.
          *
-         * No need for SCF_ist_wrmsr because Xen's value is restored
+         * No need for SCF_ist_sc_msr because Xen's value is restored
          * atomically WRT NMIs in the VMExit path.
          *
          * TODO: Adjust cpu_has_svm_spec_ctrl to be usable earlier on boot.
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 60d6d2dc9407..6f8b0e09348e 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -31,7 +31,7 @@
  * context switched per domain, and some inhibited in the S3 path.
  */
 #define SCF_use_shadow (1 << 0)
-#define SCF_ist_wrmsr  (1 << 1)
+#define SCF_ist_sc_msr (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
 
@@ -46,7 +46,7 @@
  * These are the controls to inhibit on the S3 resume path until microcode has
  * been reloaded.
  */
-#define SCF_IST_MASK (SCF_ist_wrmsr)
+#define SCF_IST_MASK (SCF_ist_sc_msr)
 
 /*
  * Some speculative protections are per-domain.  These settings are merged
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 66b00d511fc6..0ff1b118f882 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -266,8 +266,8 @@
 
 .L\@_skip_rsb:
 
-    test $SCF_ist_wrmsr, %al
-    jz .L\@_skip_wrmsr
+    test $SCF_ist_sc_msr, %al
+    jz .L\@_skip_msr_spec_ctrl
 
     xor %edx, %edx
     testb $3, UREGS_cs(%rsp)
@@ -290,7 +290,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
      * to speculate around the WRMSR.  As a result, we need a dispatch
      * serialising instruction in the else clause.
      */
-.L\@_skip_wrmsr:
+.L\@_skip_msr_spec_ctrl:
     lfence
     UNLIKELY_END(\@_serialise)
 .endm
@@ -301,7 +301,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
  * Requires %rbx=stack_end
  * Clobbers %rax, %rcx, %rdx
  */
-    testb $SCF_ist_wrmsr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
+    testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
     jz .L\@_skip
 
     DO_SPEC_CTRL_EXIT_TO_XEN
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rename opt_ibpb to opt_ibpb_ctxt_switch

We are about to introduce the use of IBPB at different points in Xen, making
opt_ibpb ambiguous.  Rename it to opt_ibpb_ctxt_switch.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 79f2c6ab19b8..2838f976d729 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2041,7 +2041,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
         ctxt_switch_levelling(next);
 
-        if ( opt_ibpb && !is_idle_domain(nextd) )
+        if ( opt_ibpb_ctxt_switch && !is_idle_domain(nextd) )
         {
             static DEFINE_PER_CPU(unsigned int, last);
             unsigned int *last_id = &this_cpu(last);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index a6def47061e8..ced0f8c2aea6 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -54,7 +54,7 @@ int8_t __initdata opt_stibp = -1;
 bool __read_mostly opt_ssbd;
 int8_t __initdata opt_psfd = -1;
 
-bool __read_mostly opt_ibpb = true;
+bool __read_mostly opt_ibpb_ctxt_switch = true;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 bool __read_mostly opt_branch_harden = true;
@@ -117,7 +117,7 @@ static int __init parse_spec_ctrl(const char *s)
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
-            opt_ibpb = false;
+            opt_ibpb_ctxt_switch = false;
             opt_ssbd = false;
             opt_l1d_flush = 0;
             opt_branch_harden = false;
@@ -238,7 +238,7 @@ static int __init parse_spec_ctrl(const char *s)
 
         /* Misc settings. */
         else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
-            opt_ibpb = val;
+            opt_ibpb_ctxt_switch = val;
         else if ( (val = parse_boolean("eager-fpu", s, ss)) >= 0 )
             opt_eager_fpu = val;
         else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
@@ -458,7 +458,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (opt_tsx & 1)                             ? " TSX+" : " TSX-",
            !cpu_has_srbds_ctrl                       ? "" :
            opt_srb_lock                              ? " SRB_LOCK+" : " SRB_LOCK-",
-           opt_ibpb                                  ? " IBPB"  : "",
+           opt_ibpb_ctxt_switch                      ? " IBPB-ctxt" : "",
            opt_l1d_flush                             ? " L1D_FLUSH" : "",
            opt_md_clear_pv || opt_md_clear_hvm ||
            opt_fb_clear_mmio                         ? " VERW"  : "",
@@ -1193,7 +1193,7 @@ void __init init_speculation_mitigations(void)
 
     /* Check we have hardware IBPB support before using it... */
     if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-        opt_ibpb = false;
+        opt_ibpb_ctxt_switch = false;
 
     /* Check whether Eager FPU should be enabled by default. */
     if ( opt_eager_fpu == -1 )
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 6f8b0e09348e..fd8162ca9ab9 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -63,7 +63,7 @@
 void init_speculation_mitigations(void);
 void spec_ctrl_init_domain(struct domain *d);
 
-extern bool opt_ibpb;
+extern bool opt_ibpb_ctxt_switch;
 extern bool opt_ssbd;
 extern int8_t opt_eager_fpu;
 extern int8_t opt_l1d_flush;
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rework SPEC_CTRL_ENTRY_FROM_INTR_IST

We are shortly going to add a conditional IBPB in this path.

Therefore, we can no longer hold spec_ctrl_flags in %eax, relying on it not
being clobbered until we're done with its contents.  %rbx is available for
use, and is the more usual register in which to hold preserved information.

With %rax freed up, use it instead of %rdx for the RSB tmp register, and for
the adjustment to spec_ctrl_flags.

This leaves no use of %rdx, except as 0 for the upper half of WRMSR.  In
practice, %rdx is 0 from SAVE_ALL on all paths and isn't likely to change in
the foreseeable future, so update the macro entry requirements to state this
dependency.  This marginal optimisation can be revisited if circumstances
change.

No practical change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 2f3f48ff27c3..9bfc5964a911 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -874,7 +874,7 @@ ENTRY(double_fault)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rbx
@@ -910,7 +910,7 @@ handle_ist_exception:
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 0ff1b118f882..15e24cde00d1 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -251,34 +251,33 @@
  */
 .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
 /*
- * Requires %rsp=regs, %r14=stack_end
- * Clobbers %rax, %rcx, %rdx
+ * Requires %rsp=regs, %r14=stack_end, %rdx=0
+ * Clobbers %rax, %rbx, %rcx, %rdx
  *
  * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
  * maybexen=1, but with conditionals rather than alternatives.
  */
-    movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %eax
+    movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
 
-    test $SCF_ist_rsb, %al
+    test $SCF_ist_rsb, %bl
     jz .L\@_skip_rsb
 
-    DO_OVERWRITE_RSB tmp=rdx /* Clobbers %rcx/%rdx */
+    DO_OVERWRITE_RSB         /* Clobbers %rax/%rcx */
 
 .L\@_skip_rsb:
 
-    test $SCF_ist_sc_msr, %al
+    test $SCF_ist_sc_msr, %bl
     jz .L\@_skip_msr_spec_ctrl
 
-    xor %edx, %edx
+    xor %eax, %eax
     testb $3, UREGS_cs(%rsp)
-    setnz %dl
-    not %edx
-    and %dl, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+    setnz %al
+    not %eax
+    and %al, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
 
     /* Load Xen's intended value. */
     mov $MSR_SPEC_CTRL, %ecx
     movzbl STACK_CPUINFO_FIELD(xen_spec_ctrl)(%r14), %eax
-    xor %edx, %edx
     wrmsr
 
     /* Opencoded UNLIKELY_START() with no condition. */
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Support IBPB-on-entry

We are going to need this to mitigate Branch Type Confusion on AMD/Hygon CPUs,
but as we've talked about using it in other cases too, arrange to support it
generally.  However, this is also very expensive in some cases, so we're going
to want per-domain controls.

Introduce SCF_ist_ibpb and SCF_entry_ibpb controls, adding them to the IST and
DOM masks as appropriate.  Also introduce X86_FEATURE_IBPB_ENTRY_{PV,HVM} to
patch the code blocks.

For SVM, the STGI is serialising enough to protect against Spectre-v1 attacks,
so no "else lfence" is necessary.  VT-x will use the MSR host load list,
so doesn't need any code in the VMExit path.

For the IST path, we can't safely check CPL==0 to skip a flush, as we might
have hit an entry path before its IBPB.  As IST hitting Xen is rare, flush
irrespective of CPL.  A later path, SCF_ist_sc_msr, provides Spectre-v1
safety.

For the PV paths, we know we're interrupting CPL>0, while for the INTR paths,
we can safely check CPL==0.  Only flush when interrupting guest context.

An "else lfence" is needed for safety, but we want to be able to skip it on
unaffected CPUs, so the block wants to be an alternative, which means the
lfence has to be inline rather than UNLIKELY() (the replacement block doesn't
have displacements fixed up for anything other than the first instruction).

As with SPEC_CTRL_ENTRY_FROM_INTR_IST, %rdx is 0 on entry so rely on this to
shrink the logic marginally.  Update the comments to specify this new
dependency.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index 4ae55a2ef605..0ff4008060fa 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -97,7 +97,19 @@ __UNLIKELY_END(nsvm_hap)
 
         GET_CURRENT(bx)
 
-        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: %rsp=regs/cpuinfo         Clob: acd */
+        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: %rsp=regs/cpuinfo, %rdx=0 Clob: acd */
+
+        .macro svm_vmexit_cond_ibpb
+            testb  $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
+            jz     .L_skip_ibpb
+
+            mov    $MSR_PRED_CMD, %ecx
+            mov    $PRED_CMD_IBPB, %eax
+            wrmsr
+.L_skip_ibpb:
+        .endm
+        ALTERNATIVE "", svm_vmexit_cond_ibpb, X86_FEATURE_IBPB_ENTRY_HVM
+
         ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
 
         .macro svm_vmexit_spec_ctrl
@@ -114,6 +126,10 @@ __UNLIKELY_END(nsvm_hap)
         ALTERNATIVE "", svm_vmexit_spec_ctrl, X86_FEATURE_SC_MSR_HVM
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
+        /*
+         * STGI is executed unconditionally, and is sufficiently serialising
+         * to safely resolve any Spectre-v1 concerns in the above logic.
+         */
         stgi
 GLOBAL(svm_stgi_label)
         mov  %rsp,%rdi
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index f9f9bc18cdbc..dd817cee4e69 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1345,6 +1345,10 @@ static int construct_vmcs(struct vcpu *v)
         rc = vmx_add_msr(v, MSR_FLUSH_CMD, FLUSH_CMD_L1D,
                          VMX_MSR_GUEST_LOADONLY);
 
+    if ( !rc && (d->arch.spec_ctrl_flags & SCF_entry_ibpb) )
+        rc = vmx_add_msr(v, MSR_PRED_CMD, PRED_CMD_IBPB,
+                         VMX_MSR_HOST);
+
  out:
     vmx_vmcs_exit(v);
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 0cfe95314249..5c999271e617 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -20,7 +20,7 @@ ENTRY(entry_int82)
         movl  $HYPERCALL_VECTOR, 4(%rsp)
         SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         CR4_PV32_RESTORE
@@ -216,7 +216,7 @@ ENTRY(cstar_enter)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 9bfc5964a911..3c8593325606 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -260,7 +260,7 @@ ENTRY(lstar_enter)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -299,7 +299,7 @@ GLOBAL(sysenter_eflags_saved)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -351,7 +351,7 @@ ENTRY(int80_direct_trap)
         movl  $0x80, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -618,7 +618,7 @@ ENTRY(common_interrupt)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
@@ -652,7 +652,7 @@ GLOBAL(handle_exception)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index f7488d3ccbfa..b233e5835fb5 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -39,6 +39,8 @@ XEN_CPUFEATURE(XEN_LBR,           X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
 XEN_CPUFEATURE(SC_VERW_IDLE,      X86_SYNTH(25)) /* VERW used by Xen for idle */
 XEN_CPUFEATURE(XEN_SHSTK,         X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
 XEN_CPUFEATURE(XEN_IBT,           X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
+XEN_CPUFEATURE(IBPB_ENTRY_PV,     X86_SYNTH(28)) /* MSR_PRED_CMD used by Xen for PV */
+XEN_CPUFEATURE(IBPB_ENTRY_HVM,    X86_SYNTH(29)) /* MSR_PRED_CMD used by Xen for HVM */
 
 /* Bug words follow the synthetic words. */
 #define X86_NR_BUG 1
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index fd8162ca9ab9..10cd0cd2518f 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -34,6 +34,8 @@
 #define SCF_ist_sc_msr (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
+#define SCF_ist_ibpb   (1 << 4)
+#define SCF_entry_ibpb (1 << 5)
 
 /*
  * The IST paths (NMI/#MC) can interrupt any arbitrary context.  Some
@@ -46,13 +48,13 @@
  * These are the controls to inhibit on the S3 resume path until microcode has
  * been reloaded.
  */
-#define SCF_IST_MASK (SCF_ist_sc_msr)
+#define SCF_IST_MASK (SCF_ist_sc_msr | SCF_ist_ibpb)
 
 /*
  * Some speculative protections are per-domain.  These settings are merged
  * into the top-of-stack block in the context switch path.
  */
-#define SCF_DOM_MASK (SCF_verw)
+#define SCF_DOM_MASK (SCF_verw | SCF_entry_ibpb)
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 15e24cde00d1..9eb4ad9ab71d 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -88,6 +88,35 @@
  *  - SPEC_CTRL_EXIT_TO_{SVM,VMX}
  */
 
+.macro DO_SPEC_CTRL_COND_IBPB maybexen:req
+/*
+ * Requires %rsp=regs (also cpuinfo if !maybexen)
+ * Requires %r14=stack_end (if maybexen), %rdx=0
+ * Clobbers %rax, %rcx, %rdx
+ *
+ * Conditionally issue IBPB if SCF_entry_ibpb is active.  In the maybexen
+ * case, we can safely look at UREGS_cs to skip taking the hit when
+ * interrupting Xen.
+ */
+    .if \maybexen
+        testb  $SCF_entry_ibpb, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+        jz     .L\@_skip
+        testb  $3, UREGS_cs(%rsp)
+    .else
+        testb  $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
+    .endif
+    jz     .L\@_skip
+
+    mov     $MSR_PRED_CMD, %ecx
+    mov     $PRED_CMD_IBPB, %eax
+    wrmsr
+    jmp     .L\@_done
+
+.L\@_skip:
+    lfence
+.L\@_done:
+.endm
+
 .macro DO_OVERWRITE_RSB tmp=rax
 /*
  * Requires nothing
@@ -225,12 +254,16 @@
 
 /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
 #define SPEC_CTRL_ENTRY_FROM_PV                                         \
+    ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=0),     \
+        X86_FEATURE_IBPB_ENTRY_PV;                                      \
     ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;            \
     ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=0),         \
         X86_FEATURE_SC_MSR_PV
 
 /* Use in interrupt/exception context.  May interrupt Xen or PV context. */
 #define SPEC_CTRL_ENTRY_FROM_INTR                                       \
+    ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=1),     \
+        X86_FEATURE_IBPB_ENTRY_PV;                                      \
     ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;            \
     ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1),         \
         X86_FEATURE_SC_MSR_PV
@@ -254,11 +287,23 @@
  * Requires %rsp=regs, %r14=stack_end, %rdx=0
  * Clobbers %rax, %rbx, %rcx, %rdx
  *
- * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
- * maybexen=1, but with conditionals rather than alternatives.
+ * This is logical merge of:
+ *    DO_SPEC_CTRL_COND_IBPB maybexen=0
+ *    DO_OVERWRITE_RSB
+ *    DO_SPEC_CTRL_ENTRY maybexen=1
+ * but with conditionals rather than alternatives.
  */
     movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
 
+    test    $SCF_ist_ibpb, %bl
+    jz      .L\@_skip_ibpb
+
+    mov     $MSR_PRED_CMD, %ecx
+    mov     $PRED_CMD_IBPB, %eax
+    wrmsr
+
+.L\@_skip_ibpb:
+
     test $SCF_ist_rsb, %bl
     jz .L\@_skip_rsb
 
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/cpuid: Enumeration for BTC_NO

BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.

Zen3 CPUs don't suffer BTC.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
index 9a4eb8015a43..2632efc6adb0 100644
--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -283,6 +283,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
         {"virt-ssbd",    0x80000008, NA, CPUID_REG_EBX, 25,  1},
         {"ssb-no",       0x80000008, NA, CPUID_REG_EBX, 26,  1},
         {"psfd",         0x80000008, NA, CPUID_REG_EBX, 28,  1},
+        {"btc-no",       0x80000008, NA, CPUID_REG_EBX, 29,  1},
 
         {"nc",           0x80000008, NA, CPUID_REG_ECX,  0,  8},
         {"apicidsize",   0x80000008, NA, CPUID_REG_ECX, 12,  4},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index 12111fe12d16..e83bc4793d6e 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -157,7 +157,7 @@ static const char *const str_e8b[32] =
     /* [22] */                 [23] = "ppin",
     [24] = "amd-ssbd",         [25] = "virt-ssbd",
     [26] = "ssb-no",
-    [28] = "psfd",
+    [28] = "psfd",             [29] = "btc-no",
 };
 
 static const char *const str_7d0[32] =
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 986672a072b7..675b877f193c 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -822,6 +822,16 @@ static void init_amd(struct cpuinfo_x86 *c)
 			warning_add(text);
 		}
 		break;
+
+	case 0x19:
+		/*
+		 * Zen3 (Fam19h model < 0x10) parts are not susceptible to
+		 * Branch Type Confusion, but predate the allocation of the
+		 * BTC_NO bit.  Fill it back in if we're not virtualised.
+		 */
+		if (!cpu_has_hypervisor && !cpu_has(c, X86_FEATURE_BTC_NO))
+			__set_bit(X86_FEATURE_BTC_NO, c->x86_capability);
+		break;
 	}
 
 	display_cacheinfo(c);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index ced0f8c2aea6..9f66c715516c 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -388,7 +388,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * Hardware read-only information, stating immunity to certain issues, or
      * suggestions of which mitigation to use.
      */
-    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (caps & ARCH_CAPS_RDCL_NO)                        ? " RDCL_NO"        : "",
            (caps & ARCH_CAPS_IBRS_ALL)                       ? " IBRS_ALL"       : "",
            (caps & ARCH_CAPS_RSBA)                           ? " RSBA"           : "",
@@ -403,7 +403,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (e8b  & cpufeat_mask(X86_FEATURE_IBRS_ALWAYS))    ? " IBRS_ALWAYS"    : "",
            (e8b  & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS))   ? " STIBP_ALWAYS"   : "",
            (e8b  & cpufeat_mask(X86_FEATURE_IBRS_FAST))      ? " IBRS_FAST"      : "",
-           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "");
+           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_BTC_NO))         ? " BTC_NO"         : "");
 
     /* Hardware features which need driving to mitigate issues. */
     printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 9686c82ed75c..1bbc7da4b53c 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -265,6 +265,7 @@ XEN_CPUFEATURE(AMD_SSBD,      8*32+24) /*S  MSR_SPEC_CTRL.SSBD available */
 XEN_CPUFEATURE(VIRT_SSBD,     8*32+25) /*   MSR_VIRT_SPEC_CTRL.SSBD */
 XEN_CPUFEATURE(SSB_NO,        8*32+26) /*A  Hardware not vulnerable to SSB */
 XEN_CPUFEATURE(PSFD,          8*32+28) /*S  MSR_SPEC_CTRL.PSFD */
+XEN_CPUFEATURE(BTC_NO,        8*32+29) /*A  Hardware not vulnerable to Branch Type Confusion */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Enable Zen2 chickenbit

... as instructed in the Branch Type Confusion whitepaper.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 675b877f193c..60dbe61a61ca 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -731,6 +731,31 @@ void amd_init_ssbd(const struct cpuinfo_x86 *c)
 		printk_once(XENLOG_ERR "No SSBD controls available\n");
 }
 
+/*
+ * On Zen2 we offer this chicken (bit) on the altar of Speculation.
+ *
+ * Refer to the AMD Branch Type Confusion whitepaper:
+ * https://XXX
+ *
+ * Setting this unnamed bit supposedly causes prediction information on
+ * non-branch instructions to be ignored.  It is to be set unilaterally in
+ * newer microcode.
+ *
+ * This chickenbit is something unrelated on Zen1, and Zen1 vs Zen2 isn't a
+ * simple model number comparison, so use STIBP as a heuristic to separate the
+ * two uarches in Fam17h(AMD)/18h(Hygon).
+ */
+void amd_init_spectral_chicken(void)
+{
+	uint64_t val, chickenbit = 1 << 1;
+
+	if (cpu_has_hypervisor || !boot_cpu_has(X86_FEATURE_AMD_STIBP))
+		return;
+
+	if (rdmsr_safe(MSR_AMD64_DE_CFG2, val) == 0 && !(val & chickenbit))
+		wrmsr_safe(MSR_AMD64_DE_CFG2, val | chickenbit);
+}
+
 static void init_amd(struct cpuinfo_x86 *c)
 {
 	u32 l, h;
@@ -783,6 +808,9 @@ static void init_amd(struct cpuinfo_x86 *c)
 
 	amd_init_ssbd(c);
 
+	if (c->x86 == 0x17)
+		amd_init_spectral_chicken();
+
 	/* MFENCE stops RDTSC speculation */
 	if (!cpu_has_lfence_dispatch)
 		__set_bit(X86_FEATURE_MFENCE_RDTSC, c->x86_capability);
diff --git a/xen/arch/x86/cpu/cpu.h b/xen/arch/x86/cpu/cpu.h
index 1a5b3918b37e..e76ab5ce1ae2 100644
--- a/xen/arch/x86/cpu/cpu.h
+++ b/xen/arch/x86/cpu/cpu.h
@@ -22,3 +22,4 @@ void early_init_amd(struct cpuinfo_x86 *c);
 void amd_log_freq(const struct cpuinfo_x86 *c);
 void amd_init_lfence(struct cpuinfo_x86 *c);
 void amd_init_ssbd(const struct cpuinfo_x86 *c);
+void amd_init_spectral_chicken(void);
diff --git a/xen/arch/x86/cpu/hygon.c b/xen/arch/x86/cpu/hygon.c
index 3845e0cf0e89..0cb0e7d55e61 100644
--- a/xen/arch/x86/cpu/hygon.c
+++ b/xen/arch/x86/cpu/hygon.c
@@ -36,6 +36,12 @@ static void init_hygon(struct cpuinfo_x86 *c)
 
 	amd_init_ssbd(c);
 
+	/*
+	 * TODO: Check heuristic safety with Hygon first
+	if (c->x86 == 0x18)
+		amd_init_spectral_chicken();
+	 */
+
 	/* MFENCE stops RDTSC speculation */
 	if (!cpu_has_lfence_dispatch)
 		__set_bit(X86_FEATURE_MFENCE_RDTSC, c->x86_capability);
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 1e743461e91d..b4a360723b14 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -359,6 +359,7 @@
 #define MSR_AMD64_DE_CFG		0xc0011029
 #define AMD64_DE_CFG_LFENCE_SERIALISE	(_AC(1, ULL) << 1)
 #define MSR_AMD64_EX_CFG		0xc001102c
+#define MSR_AMD64_DE_CFG2		0xc00110e3
 
 #define MSR_AMD64_DR0_ADDRESS_MASK	0xc0011027
 #define MSR_AMD64_DR1_ADDRESS_MASK	0xc0011019
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Mitigate Branch Type Confusion when possible

Branch Type Confusion affects AMD/Hygon CPUs on Zen2 and earlier.  To
mitigate, we require SMT safety (STIBP on Zen2, no-SMT on Zen1), and must
issue an IBPB on each entry to Xen to flush the BTB.

Due to performance concerns, dom0 (which is trusted in most configurations) is
excluded from protections by default.

Therefore:
 * Use STIBP by default on Zen2 too, which now means we want it on by default
   on all hardware supporting STIBP.
 * Break the current IBPB logic out into a new function, extending it with
   IBPB-at-entry logic.
 * Change the existing IBPB-at-ctxt-switch boolean to be tristate, and disable
   it by default when IBPB-at-entry is providing sufficient safety.

If all PV guests on the system are trusted, then it is recommended to boot
with `spec-ctrl=ibpb-entry=no-pv`, as this will provide an additional marginal
perf improvement.

This is part of XSA-407 / CVE-2022-23825.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
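
As a hedged sketch of the tristate pattern described above (the names and
values are illustrative stand-ins, not the real option set): -1 records "no
explicit choice" and is only resolved against the other settings at boot:

    #include <stdbool.h>
    #include <stdio.h>

    static int ibpb_ctxt_switch = -1;    /* -1 = default, 0 = off, 1 = on */

    /* Resolve the default: skip ctxt-switch IBPB when IBPB-at-entry
     * already covers both PV and HVM guests. */
    static void resolve_defaults(bool entry_pv, bool entry_hvm)
    {
        if ( ibpb_ctxt_switch == -1 )
            ibpb_ctxt_switch = !(entry_pv && entry_hvm);
    }

    int main(void)
    {
        resolve_defaults(true, true);
        printf("IBPB at ctxt switch: %d\n", ibpb_ctxt_switch);
        return 0;
    }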

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index b06db5f654e5..b73c4a605011 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2170,7 +2170,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 
 ### spec-ctrl (x86)
 > `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
->              {msr-sc,rsb,md-clear}=<bool>|{pv,hvm}=<bool>,
+>              {msr-sc,rsb,md-clear,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >              eager-fpu,l1d-flush,branch-harden,srb-lock,
 >              unpriv-mmio}=<bool> ]`
@@ -2195,9 +2195,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` options offer fine
-grained control over the primitives by Xen.  These impact Xen's ability to
-protect itself, and/or Xen's ability to virtualise support for guests to use.
+The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `md-clear=` and `ibpb-entry=` options
+offer fine grained control over the primitives by Xen.  These impact Xen's
+ability to protect itself, and/or Xen's ability to virtualise support for
+guests to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -2216,6 +2217,11 @@ protect itself, and/or Xen's ability to virtualise support for guests to use.
   compatibility with development versions of this fix, `mds=` is also accepted
   on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
   preference to here.*
+* `ibpb-entry=` offers control over whether IBPB (Indirect Branch Prediction
+  Barrier) is used on entry to Xen.  This is used by default on hardware
+  vulnerable to Branch Type Confusion, but for performance reasons, dom0 is
+  unprotected by default.  If it necessary to protect dom0 too, boot with
+  `spec-ctrl=ibpb-entry`.
 
 If Xen was compiled with INDIRECT_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 9f66c715516c..563519ce0e31 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -39,6 +39,10 @@ static bool __initdata opt_rsb_hvm = true;
 static int8_t __read_mostly opt_md_clear_pv = -1;
 static int8_t __read_mostly opt_md_clear_hvm = -1;
 
+static int8_t __read_mostly opt_ibpb_entry_pv = -1;
+static int8_t __read_mostly opt_ibpb_entry_hvm = -1;
+static bool __read_mostly opt_ibpb_entry_dom0;
+
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
     THUNK_DEFAULT, /* Decide which thunk to use at boot time. */
@@ -54,7 +58,7 @@ int8_t __initdata opt_stibp = -1;
 bool __read_mostly opt_ssbd;
 int8_t __initdata opt_psfd = -1;
 
-bool __read_mostly opt_ibpb_ctxt_switch = true;
+int8_t __read_mostly opt_ibpb_ctxt_switch = -1;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 bool __read_mostly opt_branch_harden = true;
@@ -114,6 +118,9 @@ static int __init parse_spec_ctrl(const char *s)
             opt_rsb_hvm = false;
             opt_md_clear_pv = 0;
             opt_md_clear_hvm = 0;
+            opt_ibpb_entry_pv = 0;
+            opt_ibpb_entry_hvm = 0;
+            opt_ibpb_entry_dom0 = false;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -140,12 +147,14 @@ static int __init parse_spec_ctrl(const char *s)
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
             opt_md_clear_pv = val;
+            opt_ibpb_entry_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
             opt_md_clear_hvm = val;
+            opt_ibpb_entry_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
         {
@@ -210,6 +219,28 @@ static int __init parse_spec_ctrl(const char *s)
                 break;
             }
         }
+        else if ( (val = parse_boolean("ibpb-entry", s, ss)) != -1 )
+        {
+            switch ( val )
+            {
+            case 0:
+            case 1:
+                opt_ibpb_entry_pv = opt_ibpb_entry_hvm =
+                    opt_ibpb_entry_dom0 = val;
+                break;
+
+            case -2:
+                s += strlen("ibpb-entry=");
+                if ( (val = parse_boolean("pv", s, ss)) >= 0 )
+                    opt_ibpb_entry_pv = val;
+                else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+                    opt_ibpb_entry_hvm = val;
+                else
+            default:
+                    rc = -EINVAL;
+                break;
+            }
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -477,27 +508,31 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * mitigation support for guests.
      */
 #ifdef CONFIG_HVM
-    printk("  Support for HVM VMs:%s%s%s%s%s\n",
+    printk("  Support for HVM VMs:%s%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             boot_cpu_has(X86_FEATURE_MD_CLEAR)   ||
+            boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
-           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
+           boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM)  ? " IBPB-entry"    : "");
 
 #endif
 #ifdef CONFIG_PV
-    printk("  Support for PV VMs:%s%s%s%s%s\n",
+    printk("  Support for PV VMs:%s%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
             boot_cpu_has(X86_FEATURE_MD_CLEAR)  ||
+            boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_PV)       ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_PV)       ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
-           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
+           boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV)   ? " IBPB-entry"    : "");
 
     printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
            opt_xpti_hwdom ? "enabled" : "disabled",
@@ -730,6 +765,55 @@ static bool __init should_use_eager_fpu(void)
     }
 }
 
+static void __init ibpb_calculations(void)
+{
+    /* Check we have hardware IBPB support before using it... */
+    if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
+    {
+        opt_ibpb_entry_hvm = opt_ibpb_entry_pv = opt_ibpb_ctxt_switch = 0;
+        opt_ibpb_entry_dom0 = false;
+        return;
+    }
+
+    /*
+     * IBPB-on-entry mitigations for Branch Type Confusion.
+     *
+     * IBPB && !BTC_NO selects all AMD/Hygon hardware, not known to be safe,
+     * that we can provide some form of mitigation on.
+     */
+    if ( opt_ibpb_entry_pv == -1 )
+        opt_ibpb_entry_pv = (IS_ENABLED(CONFIG_PV) &&
+                             boot_cpu_has(X86_FEATURE_IBPB) &&
+                             !boot_cpu_has(X86_FEATURE_BTC_NO));
+    if ( opt_ibpb_entry_hvm == -1 )
+        opt_ibpb_entry_hvm = (IS_ENABLED(CONFIG_HVM) &&
+                              boot_cpu_has(X86_FEATURE_IBPB) &&
+                              !boot_cpu_has(X86_FEATURE_BTC_NO));
+
+    if ( opt_ibpb_entry_pv )
+    {
+        setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_PV);
+
+        /*
+         * We only need to flush in IST context if we're protecting against PV
+         * guests.  HVM IBPB-on-entry protections are both atomic with
+         * NMI/#MC, so can't interrupt Xen ahead of having already flushed the
+         * BTB.
+         */
+        default_spec_ctrl_flags |= SCF_ist_ibpb;
+    }
+    if ( opt_ibpb_entry_hvm )
+        setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_HVM);
+
+    /*
+     * If we're using IBPB-on-entry to protect against PV and HVM guests
+     * (ignoring dom0 if trusted), then there's no need to also issue IBPB on
+     * context switch too.
+     */
+    if ( opt_ibpb_ctxt_switch == -1 )
+        opt_ibpb_ctxt_switch = !(opt_ibpb_entry_hvm && opt_ibpb_entry_pv);
+}
+
 /* Calculate whether this CPU is vulnerable to L1TF. */
 static __init void l1tf_calculations(uint64_t caps)
 {
@@ -985,8 +1069,12 @@ void spec_ctrl_init_domain(struct domain *d)
     bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
                  (opt_fb_clear_mmio && is_iommu_enabled(d)));
 
+    bool ibpb = ((pv ? opt_ibpb_entry_pv : opt_ibpb_entry_hvm) &&
+                 (d->domain_id != 0 || opt_ibpb_entry_dom0));
+
     d->arch.spec_ctrl_flags =
         (verw   ? SCF_verw         : 0) |
+        (ibpb   ? SCF_entry_ibpb   : 0) |
         0;
 }
 
@@ -1133,12 +1221,15 @@ void __init init_speculation_mitigations(void)
     }
 
     /*
-     * Use STIBP by default if the hardware hint is set.  Otherwise, leave it
-     * off as it a severe performance pentalty on pre-eIBRS Intel hardware
-     * where it was retrofitted in microcode.
+     * Use STIBP by default on all AMD systems.  Zen3 and later enumerate
+     * STIBP_ALWAYS, but STIBP is needed on Zen2 as part of the mitigations
+     * for Branch Type Confusion.
+     *
+     * Leave STIBP off by default on Intel.  Pre-eIBRS systems suffer a
+     * substantial perf hit when it was implemented in microcode.
      */
     if ( opt_stibp == -1 )
-        opt_stibp = !!boot_cpu_has(X86_FEATURE_STIBP_ALWAYS);
+        opt_stibp = !!boot_cpu_has(X86_FEATURE_AMD_STIBP);
 
     if ( opt_stibp && (boot_cpu_has(X86_FEATURE_STIBP) ||
                        boot_cpu_has(X86_FEATURE_AMD_STIBP)) )
@@ -1192,9 +1283,7 @@ void __init init_speculation_mitigations(void)
     if ( opt_rsb_hvm )
         setup_force_cpu_cap(X86_FEATURE_SC_RSB_HVM);
 
-    /* Check we have hardware IBPB support before using it... */
-    if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-        opt_ibpb_ctxt_switch = false;
+    ibpb_calculations();
 
     /* Check whether Eager FPU should be enabled by default. */
     if ( opt_eager_fpu == -1 )
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 10cd0cd2518f..33e845991b0a 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -65,7 +65,7 @@
 void init_speculation_mitigations(void);
 void spec_ctrl_init_domain(struct domain *d);
 
-extern bool opt_ibpb_ctxt_switch;
+extern int8_t opt_ibpb_ctxt_switch;
 extern bool opt_ssbd;
 extern int8_t opt_eager_fpu;
 extern int8_t opt_l1d_flush;
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rework spec_ctrl_flags context switching

We are shortly going to need to context switch new bits in both the vcpu and
S3 paths.  Introduce SCF_IST_MASK and SCF_DOM_MASK, and rework d->arch.verw
into d->arch.spec_ctrl_flags to accommodate.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
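
A hedged standalone rendering of the merge performed in context_switch()
(the flag values are copied from the patch; the starting values are
illustrative):

    #include <stdint.h>
    #include <stdio.h>

    #define SCF_verw     (1u << 3)
    #define SCF_DOM_MASK (SCF_verw)

    int main(void)
    {
        uint8_t cpu_flags = 0x07;       /* current top-of-stack flags */
        uint8_t dom_flags = SCF_verw;   /* next domain's settings */

        /* Preserve the per-CPU bits, swap in the per-domain bits. */
        cpu_flags = (cpu_flags & ~SCF_DOM_MASK) | (dom_flags & SCF_DOM_MASK);

        printf("flags: %#x\n", cpu_flags);
        return 0;
    }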

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 5eaa77f66a28..dd397f713067 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -248,8 +248,8 @@ static int enter_state(u32 state)
         error = 0;
 
     ci = get_cpu_info();
-    /* Avoid NMI/#MC using MSR_SPEC_CTRL until we've reloaded microcode. */
-    ci->spec_ctrl_flags &= ~SCF_ist_wrmsr;
+    /* Avoid NMI/#MC using unsafe MSRs until we've reloaded microcode. */
+    ci->spec_ctrl_flags &= ~SCF_IST_MASK;
 
     ACPI_FLUSH_CPU_CACHE();
 
@@ -292,8 +292,8 @@ static int enter_state(u32 state)
     if ( !recheck_cpu_features(0) )
         panic("Missing previously available feature(s)\n");
 
-    /* Re-enabled default NMI/#MC use of MSR_SPEC_CTRL. */
-    ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_ist_wrmsr);
+    /* Re-enabled default NMI/#MC use of MSRs now microcode is loaded. */
+    ci->spec_ctrl_flags |= (default_spec_ctrl_flags & SCF_IST_MASK);
 
     if ( boot_cpu_has(X86_FEATURE_IBRSB) || boot_cpu_has(X86_FEATURE_IBRS) )
     {
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 1fe6644a71ae..82a0b73cf6ef 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2092,10 +2092,10 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
             }
         }
 
-        /* Update the top-of-stack block with the VERW disposition. */
-        info->spec_ctrl_flags &= ~SCF_verw;
-        if ( nextd->arch.verw )
-            info->spec_ctrl_flags |= SCF_verw;
+        /* Update the top-of-stack block with the new spec_ctrl settings. */
+        info->spec_ctrl_flags =
+            (info->spec_ctrl_flags       & ~SCF_DOM_MASK) |
+            (nextd->arch.spec_ctrl_flags &  SCF_DOM_MASK);
     }
 
     sched_context_switched(prev, next);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 9507e5da60a9..7e646680f1c7 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -1010,9 +1010,12 @@ void spec_ctrl_init_domain(struct domain *d)
 {
     bool pv = is_pv_domain(d);
 
-    d->arch.verw =
-        (pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
-        (opt_fb_clear_mmio && is_iommu_enabled(d));
+    bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
+                 (opt_fb_clear_mmio && is_iommu_enabled(d)));
+
+    d->arch.spec_ctrl_flags =
+        (verw   ? SCF_verw         : 0) |
+        0;
 }
 
 void __init init_speculation_mitigations(void)
diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
index 2398a1d99da9..e4c099262cb7 100644
--- a/xen/include/asm-x86/domain.h
+++ b/xen/include/asm-x86/domain.h
@@ -319,8 +319,7 @@ struct arch_domain
     uint32_t pci_cf8;
     uint8_t cmos_idx;
 
-    /* Use VERW on return-to-guest for its flushing side effect. */
-    bool verw;
+    uint8_t spec_ctrl_flags; /* See SCF_DOM_MASK */
 
     union {
         struct pv_domain pv;
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 7e83e0179fb9..3cd72e40305f 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -20,12 +20,40 @@
 #ifndef __X86_SPEC_CTRL_H__
 #define __X86_SPEC_CTRL_H__
 
-/* Encoding of cpuinfo.spec_ctrl_flags */
+/*
+ * Encoding of:
+ *   cpuinfo.spec_ctrl_flags
+ *   default_spec_ctrl_flags
+ *   domain.spec_ctrl_flags
+ *
+ * Live settings are in the top-of-stack block, because they need to be
+ * accessible when XPTI is active.  Some settings are fixed from boot, some
+ * context switched per domain, and some inhibited in the S3 path.
+ */
 #define SCF_use_shadow (1 << 0)
 #define SCF_ist_wrmsr  (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
 
+/*
+ * The IST paths (NMI/#MC) can interrupt any arbitrary context.  Some
+ * functionality requires updated microcode to work.
+ *
+ * On boot, this is easy; we load microcode before figuring out which
+ * speculative protections to apply.  However, on the S3 resume path, we must
+ * be able to disable the configured mitigations until microcode is reloaded.
+ *
+ * These are the controls to inhibit on the S3 resume path until microcode has
+ * been reloaded.
+ */
+#define SCF_IST_MASK (SCF_ist_wrmsr)
+
+/*
+ * Some speculative protections are per-domain.  These settings are merged
+ * into the top-of-stack block in the context switch path.
+ */
+#define SCF_DOM_MASK (SCF_verw)
+
 #ifndef __ASSEMBLY__
 
 #include <asm/alternative.h>
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 5a590bac44aa..66b00d511fc6 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -248,9 +248,6 @@
 
 /*
  * Use in IST interrupt/exception context.  May interrupt Xen or PV context.
- * Fine grain control of SCF_ist_wrmsr is needed for safety in the S3 resume
- * path to avoid using MSR_SPEC_CTRL before the microcode introducing it has
- * been reloaded.
  */
 .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
 /*
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rename SCF_ist_wrmsr to SCF_ist_sc_msr

We are about to introduce SCF_ist_ibpb, at which point SCF_ist_wrmsr becomes
ambiguous.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 7e646680f1c7..89f95c083e1b 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -1115,7 +1115,7 @@ void __init init_speculation_mitigations(void)
     {
         if ( opt_msr_sc_pv )
         {
-            default_spec_ctrl_flags |= SCF_ist_wrmsr;
+            default_spec_ctrl_flags |= SCF_ist_sc_msr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_PV);
         }
 
@@ -1126,7 +1126,7 @@ void __init init_speculation_mitigations(void)
              * Xen's value is not restored atomically.  An early NMI hitting
              * the VMExit path needs to restore Xen's value for safety.
              */
-            default_spec_ctrl_flags |= SCF_ist_wrmsr;
+            default_spec_ctrl_flags |= SCF_ist_sc_msr;
             setup_force_cpu_cap(X86_FEATURE_SC_MSR_HVM);
         }
     }
@@ -1139,7 +1139,7 @@ void __init init_speculation_mitigations(void)
          * on real hardware matches the availability of MSR_SPEC_CTRL in the
          * first place.
          *
-         * No need for SCF_ist_wrmsr because Xen's value is restored
+         * No need for SCF_ist_sc_msr because Xen's value is restored
          * atomically WRT NMIs in the VMExit path.
          *
          * TODO: Adjust cpu_has_svm_spec_ctrl to be usable earlier on boot.
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 3cd72e40305f..f8f0ac47e759 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -31,7 +31,7 @@
  * context switched per domain, and some inhibited in the S3 path.
  */
 #define SCF_use_shadow (1 << 0)
-#define SCF_ist_wrmsr  (1 << 1)
+#define SCF_ist_sc_msr (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
 
@@ -46,7 +46,7 @@
  * These are the controls to inhibit on the S3 resume path until microcode has
  * been reloaded.
  */
-#define SCF_IST_MASK (SCF_ist_wrmsr)
+#define SCF_IST_MASK (SCF_ist_sc_msr)
 
 /*
  * Some speculative protections are per-domain.  These settings are merged
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 66b00d511fc6..0ff1b118f882 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -266,8 +266,8 @@
 
 .L\@_skip_rsb:
 
-    test $SCF_ist_wrmsr, %al
-    jz .L\@_skip_wrmsr
+    test $SCF_ist_sc_msr, %al
+    jz .L\@_skip_msr_spec_ctrl
 
     xor %edx, %edx
     testb $3, UREGS_cs(%rsp)
@@ -290,7 +290,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
      * to speculate around the WRMSR.  As a result, we need a dispatch
      * serialising instruction in the else clause.
      */
-.L\@_skip_wrmsr:
+.L\@_skip_msr_spec_ctrl:
     lfence
     UNLIKELY_END(\@_serialise)
 .endm
@@ -301,7 +301,7 @@ UNLIKELY_DISPATCH_LABEL(\@_serialise):
  * Requires %rbx=stack_end
  * Clobbers %rax, %rcx, %rdx
  */
-    testb $SCF_ist_wrmsr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
+    testb $SCF_ist_sc_msr, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%rbx)
     jz .L\@_skip
 
     DO_SPEC_CTRL_EXIT_TO_XEN
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rename opt_ibpb to opt_ibpb_ctxt_switch

We are about to introduce the use of IBPB at different points in Xen, making
opt_ibpb ambiguous.  Rename it to opt_ibpb_ctxt_switch.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 82a0b73cf6ef..0d39981550ca 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2064,7 +2064,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
 
         ctxt_switch_levelling(next);
 
-        if ( opt_ibpb && !is_idle_domain(nextd) )
+        if ( opt_ibpb_ctxt_switch && !is_idle_domain(nextd) )
         {
             static DEFINE_PER_CPU(unsigned int, last);
             unsigned int *last_id = &this_cpu(last);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 89f95c083e1b..f4ae36eae2d0 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -54,7 +54,7 @@ int8_t __initdata opt_stibp = -1;
 bool __read_mostly opt_ssbd;
 int8_t __initdata opt_psfd = -1;
 
-bool __read_mostly opt_ibpb = true;
+bool __read_mostly opt_ibpb_ctxt_switch = true;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 static bool __initdata opt_branch_harden = true;
@@ -117,7 +117,7 @@ static int __init parse_spec_ctrl(const char *s)
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
-            opt_ibpb = false;
+            opt_ibpb_ctxt_switch = false;
             opt_ssbd = false;
             opt_l1d_flush = 0;
             opt_branch_harden = false;
@@ -238,7 +238,7 @@ static int __init parse_spec_ctrl(const char *s)
 
         /* Misc settings. */
         else if ( (val = parse_boolean("ibpb", s, ss)) >= 0 )
-            opt_ibpb = val;
+            opt_ibpb_ctxt_switch = val;
         else if ( (val = parse_boolean("eager-fpu", s, ss)) >= 0 )
             opt_eager_fpu = val;
         else if ( (val = parse_boolean("l1d-flush", s, ss)) >= 0 )
@@ -458,7 +458,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (opt_tsx & 1)                             ? " TSX+" : " TSX-",
            !cpu_has_srbds_ctrl                       ? "" :
            opt_srb_lock                              ? " SRB_LOCK+" : " SRB_LOCK-",
-           opt_ibpb                                  ? " IBPB"  : "",
+           opt_ibpb_ctxt_switch                      ? " IBPB-ctxt" : "",
            opt_l1d_flush                             ? " L1D_FLUSH" : "",
            opt_md_clear_pv || opt_md_clear_hvm ||
            opt_fb_clear_mmio                         ? " VERW"  : "",
@@ -1240,7 +1240,7 @@ void __init init_speculation_mitigations(void)
 
     /* Check we have hardware IBPB support before using it... */
     if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-        opt_ibpb = false;
+        opt_ibpb_ctxt_switch = false;
 
     /* Check whether Eager FPU should be enabled by default. */
     if ( opt_eager_fpu == -1 )
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index f8f0ac47e759..fb4365575620 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -63,7 +63,7 @@
 void init_speculation_mitigations(void);
 void spec_ctrl_init_domain(struct domain *d);
 
-extern bool opt_ibpb;
+extern bool opt_ibpb_ctxt_switch;
 extern bool opt_ssbd;
 extern int8_t opt_eager_fpu;
 extern int8_t opt_l1d_flush;
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rework SPEC_CTRL_ENTRY_FROM_INTR_IST

We are shortly going to add a conditional IBPB in this path.

Therefore, we cannot hold spec_ctrl_flags in %eax, and rely on only clobbering
it after we're done with its contents.  %rbx is available for use, and is the
more natural register in which to hold preserved information.

With %rax freed up, use it instead of %rdx for the RSB tmp register, and for
the adjustment to spec_ctrl_flags.

This leaves no use of %rdx, except as 0 for the upper half of WRMSR.  In
practice, %rdx is 0 from SAVE_ALL on all paths and isn't likely to change in
the foreseeable future, so update the macro entry requirements to state this
dependency.  This marginal optimisation can be revisited if circumstances
change.

No practical change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
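
The %edx-to-%eax switch covers the CPL-based flag adjustment; as a hedged C
rendering of what that setnz/not/and sequence computes (standalone, for
illustration only):

    #include <stdint.h>
    #include <stdio.h>

    #define SCF_use_shadow (1u << 0)

    /* Clear SCF_use_shadow when interrupting guest context (CPL != 0), so
     * the real MSR_SPEC_CTRL value is loaded; leave the flags untouched
     * when interrupting Xen. */
    static uint8_t adjust(uint8_t flags, uint16_t cs)
    {
        return flags & ~(uint8_t)((cs & 3) ? SCF_use_shadow : 0);
    }

    int main(void)
    {
        printf("%#x\n", adjust(0x0f, 0x2b)); /* from guest: 0xe */
        printf("%#x\n", adjust(0x0f, 0x08)); /* from Xen:   0xf */
        return 0;
    }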

diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 2a86938f1f32..a1810bf4d311 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -932,7 +932,7 @@ ENTRY(double_fault)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rbx
@@ -968,7 +968,7 @@ handle_ist_exception:
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 0ff1b118f882..15e24cde00d1 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -251,34 +251,33 @@
  */
 .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
 /*
- * Requires %rsp=regs, %r14=stack_end
- * Clobbers %rax, %rcx, %rdx
+ * Requires %rsp=regs, %r14=stack_end, %rdx=0
+ * Clobbers %rax, %rbx, %rcx, %rdx
  *
  * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
  * maybexen=1, but with conditionals rather than alternatives.
  */
-    movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %eax
+    movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
 
-    test $SCF_ist_rsb, %al
+    test $SCF_ist_rsb, %bl
     jz .L\@_skip_rsb
 
-    DO_OVERWRITE_RSB tmp=rdx /* Clobbers %rcx/%rdx */
+    DO_OVERWRITE_RSB         /* Clobbers %rax/%rcx */
 
 .L\@_skip_rsb:
 
-    test $SCF_ist_sc_msr, %al
+    test $SCF_ist_sc_msr, %bl
     jz .L\@_skip_msr_spec_ctrl
 
-    xor %edx, %edx
+    xor %eax, %eax
     testb $3, UREGS_cs(%rsp)
-    setnz %dl
-    not %edx
-    and %dl, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+    setnz %al
+    not %eax
+    and %al, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
 
     /* Load Xen's intended value. */
     mov $MSR_SPEC_CTRL, %ecx
     movzbl STACK_CPUINFO_FIELD(xen_spec_ctrl)(%r14), %eax
-    xor %edx, %edx
     wrmsr
 
     /* Opencoded UNLIKELY_START() with no condition. */
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Support IBPB-on-entry

We are going to need this to mitigate Branch Type Confusion on AMD/Hygon CPUs,
but as we've talked about using it in other cases too, arrange to support it
generally.  However, this is also very expensive in some cases, so we're going
to want per-domain controls.

Introduce SCF_ist_ibpb and SCF_entry_ibpb controls, adding them to the IST and
DOM masks as appropriate.  Also introduce X86_FEATURE_IBPB_ENTRY_{PV,HVM} to
to patch the code blocks.

For SVM, the STGI is serialising enough to protect against Spectre-v1 attacks,
so no "else lfence" is necessary.  VT-x will use use the MSR host load list,
so doesn't need any code in the VMExit path.

For the IST path, we can't safely check CPL==0 to skip a flush, as we might
have hit an entry path before its IBPB.  As IST hitting Xen is rare, flush
irrespective of CPL.  A later path, SCF_ist_sc_msr, provides Spectre-v1
safety.

For the PV paths, we know we're interrupting CPL>0, while for the INTR paths,
we can safely check CPL==0.  Only flush when interrupting guest context.

An "else lfence" is needed for safety, but we want to be able to skip it on
unaffected CPUs, so the block wants to be an alternative, which means the
lfence has to be inline rather than UNLIKELY() (the replacement block doesn't
have displacements fixed up for anything other than the first instruction).

As with SPEC_CTRL_ENTRY_FROM_INTR_IST, %rdx is 0 on entry so rely on this to
shrink the logic marginally.  Update the comments to specify this new
dependency.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
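
As a hedged sketch (CPL0-only, so it compiles but cannot run from
userspace) of the barrier each new entry block issues; the MSR index and
command bit are the architecturally defined values:

    #include <stdint.h>

    #define MSR_PRED_CMD  0x00000049
    #define PRED_CMD_IBPB (1u << 0)

    static inline void wrmsr(uint32_t msr, uint32_t lo, uint32_t hi)
    {
        asm volatile ( "wrmsr" :: "c" (msr), "a" (lo), "d" (hi) );
    }

    /* Issue an Indirect Branch Prediction Barrier: flushes prior branch
     * prediction state, including the BTB, at the cost of a slow MSR
     * write. */
    static inline void issue_ibpb(void)
    {
        wrmsr(MSR_PRED_CMD, PRED_CMD_IBPB, 0);
    }

The assembly versions pair this with an lfence on the skip path because, as
noted above, the conditional branch around the WRMSR can itself be
speculated.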

diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index 4ae55a2ef605..0ff4008060fa 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -97,7 +97,19 @@ __UNLIKELY_END(nsvm_hap)
 
         GET_CURRENT(bx)
 
-        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: %rsp=regs/cpuinfo         Clob: acd */
+        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: %rsp=regs/cpuinfo, %rdx=0 Clob: acd */
+
+        .macro svm_vmexit_cond_ibpb
+            testb  $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
+            jz     .L_skip_ibpb
+
+            mov    $MSR_PRED_CMD, %ecx
+            mov    $PRED_CMD_IBPB, %eax
+            wrmsr
+.L_skip_ibpb:
+	.endm
+        ALTERNATIVE "", svm_vmexit_cond_ibpb, X86_FEATURE_IBPB_ENTRY_HVM
+
         ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
 
         .macro svm_vmexit_spec_ctrl
@@ -114,6 +126,10 @@ __UNLIKELY_END(nsvm_hap)
         ALTERNATIVE "", svm_vmexit_spec_ctrl, X86_FEATURE_SC_MSR_HVM
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
+        /*
+         * STGI is executed unconditionally, and is sufficiently serialising
+         * to safely resolve any Spectre-v1 concerns in the above logic.
+         */
         stgi
 GLOBAL(svm_stgi_label)
         mov  %rsp,%rdi
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index f9f9bc18cdbc..dd817cee4e69 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1345,6 +1345,10 @@ static int construct_vmcs(struct vcpu *v)
         rc = vmx_add_msr(v, MSR_FLUSH_CMD, FLUSH_CMD_L1D,
                          VMX_MSR_GUEST_LOADONLY);
 
+    if ( !rc && (d->arch.spec_ctrl_flags & SCF_entry_ibpb) )
+        rc = vmx_add_msr(v, MSR_PRED_CMD, PRED_CMD_IBPB,
+                         VMX_MSR_HOST);
+
  out:
     vmx_vmcs_exit(v);
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 5fd6dbbd4513..b86d38d1c50d 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -18,7 +18,7 @@ ENTRY(entry_int82)
         movl  $HYPERCALL_VECTOR, 4(%rsp)
         SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         CR4_PV32_RESTORE
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index a1810bf4d311..fba8ae498f74 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -260,7 +260,7 @@ ENTRY(lstar_enter)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -298,7 +298,7 @@ ENTRY(cstar_enter)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -338,7 +338,7 @@ GLOBAL(sysenter_eflags_saved)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -392,7 +392,7 @@ ENTRY(int80_direct_trap)
         movl  $0x80, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -674,7 +674,7 @@ ENTRY(common_interrupt)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
@@ -708,7 +708,7 @@ GLOBAL(handle_exception)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index 493d338a085e..672c9ee22ba2 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -39,6 +39,8 @@ XEN_CPUFEATURE(XEN_LBR,           X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
 XEN_CPUFEATURE(SC_VERW_IDLE,      X86_SYNTH(25)) /* VERW used by Xen for idle */
 XEN_CPUFEATURE(XEN_SHSTK,         X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
 XEN_CPUFEATURE(XEN_IBT,           X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
+XEN_CPUFEATURE(IBPB_ENTRY_PV,     X86_SYNTH(28)) /* MSR_PRED_CMD used by Xen for PV */
+XEN_CPUFEATURE(IBPB_ENTRY_HVM,    X86_SYNTH(29)) /* MSR_PRED_CMD used by Xen for HVM */
 
 /* Bug words follow the synthetic words. */
 #define X86_NR_BUG 1
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index fb4365575620..3fc599a817c4 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -34,6 +34,8 @@
 #define SCF_ist_sc_msr (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
+#define SCF_ist_ibpb   (1 << 4)
+#define SCF_entry_ibpb (1 << 5)
 
 /*
  * The IST paths (NMI/#MC) can interrupt any arbitrary context.  Some
@@ -46,13 +48,13 @@
  * These are the controls to inhibit on the S3 resume path until microcode has
  * been reloaded.
  */
-#define SCF_IST_MASK (SCF_ist_sc_msr)
+#define SCF_IST_MASK (SCF_ist_sc_msr | SCF_ist_ibpb)
 
 /*
  * Some speculative protections are per-domain.  These settings are merged
  * into the top-of-stack block in the context switch path.
  */
-#define SCF_DOM_MASK (SCF_verw)
+#define SCF_DOM_MASK (SCF_verw | SCF_entry_ibpb)
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/include/asm-x86/spec_ctrl_asm.h b/xen/include/asm-x86/spec_ctrl_asm.h
index 15e24cde00d1..9eb4ad9ab71d 100644
--- a/xen/include/asm-x86/spec_ctrl_asm.h
+++ b/xen/include/asm-x86/spec_ctrl_asm.h
@@ -88,6 +88,35 @@
  *  - SPEC_CTRL_EXIT_TO_{SVM,VMX}
  */
 
+.macro DO_SPEC_CTRL_COND_IBPB maybexen:req
+/*
+ * Requires %rsp=regs (also cpuinfo if !maybexen)
+ * Requires %r14=stack_end (if maybexen), %rdx=0
+ * Clobbers %rax, %rcx, %rdx
+ *
+ * Conditionally issue IBPB if SCF_entry_ibpb is active.  In the maybexen
+ * case, we can safely look at UREGS_cs to skip taking the hit when
+ * interrupting Xen.
+ */
+    .if \maybexen
+        testb  $SCF_entry_ibpb, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+        jz     .L\@_skip
+        testb  $3, UREGS_cs(%rsp)
+    .else
+        testb  $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
+    .endif
+    jz     .L\@_skip
+
+    mov     $MSR_PRED_CMD, %ecx
+    mov     $PRED_CMD_IBPB, %eax
+    wrmsr
+    jmp     .L\@_done
+
+.L\@_skip:
+    lfence
+.L\@_done:
+.endm
+
 .macro DO_OVERWRITE_RSB tmp=rax
 /*
  * Requires nothing
@@ -225,12 +254,16 @@
 
 /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
 #define SPEC_CTRL_ENTRY_FROM_PV                                         \
+    ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=0),     \
+        X86_FEATURE_IBPB_ENTRY_PV;                                      \
     ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;            \
     ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=0),         \
         X86_FEATURE_SC_MSR_PV
 
 /* Use in interrupt/exception context.  May interrupt Xen or PV context. */
 #define SPEC_CTRL_ENTRY_FROM_INTR                                       \
+    ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=1),     \
+        X86_FEATURE_IBPB_ENTRY_PV;                                      \
     ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;            \
     ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1),         \
         X86_FEATURE_SC_MSR_PV
@@ -254,11 +287,23 @@
  * Requires %rsp=regs, %r14=stack_end, %rdx=0
  * Clobbers %rax, %rbx, %rcx, %rdx
  *
- * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
- * maybexen=1, but with conditionals rather than alternatives.
+ * This is logical merge of:
+ *    DO_SPEC_CTRL_COND_IBPB maybexen=0
+ *    DO_OVERWRITE_RSB
+ *    DO_SPEC_CTRL_ENTRY maybexen=1
+ * but with conditionals rather than alternatives.
  */
     movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
 
+    test    $SCF_ist_ibpb, %bl
+    jz      .L\@_skip_ibpb
+
+    mov     $MSR_PRED_CMD, %ecx
+    mov     $PRED_CMD_IBPB, %eax
+    wrmsr
+
+.L\@_skip_ibpb:
+
     test $SCF_ist_rsb, %bl
     jz .L\@_skip_rsb
 
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/cpuid: Enumeration for BTC_NO

BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.

Zen3 CPUs don't suffer BTC.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
index d462f9e421ed..bf6fdee360a9 100644
--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -288,6 +288,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
         {"virt-ssbd",    0x80000008, NA, CPUID_REG_EBX, 25,  1},
         {"ssb-no",       0x80000008, NA, CPUID_REG_EBX, 26,  1},
         {"psfd",         0x80000008, NA, CPUID_REG_EBX, 28,  1},
+        {"btc-no",       0x80000008, NA, CPUID_REG_EBX, 29,  1},
 
         {"nc",           0x80000008, NA, CPUID_REG_ECX,  0,  8},
         {"apicidsize",   0x80000008, NA, CPUID_REG_ECX, 12,  4},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index bc7dcf55757a..fe22f5f5b68b 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -158,7 +158,7 @@ static const char *const str_e8b[32] =
     /* [22] */                 [23] = "ppin",
     [24] = "amd-ssbd",         [25] = "virt-ssbd",
     [26] = "ssb-no",
-    [28] = "psfd",
+    [28] = "psfd",             [29] = "btc-no",
 };
 
 static const char *const str_7d0[32] =
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index b3b9a0df5fed..b158e3acb5c7 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -847,6 +847,16 @@ static void init_amd(struct cpuinfo_x86 *c)
 			warning_add(text);
 		}
 		break;
+
+	case 0x19:
+		/*
+		 * Zen3 (Fam19h model < 0x10) parts are not susceptible to
+		 * Branch Type Confusion, but predate the allocation of the
+		 * BTC_NO bit.  Fill it back in if we're not virtualised.
+		 */
+		if (!cpu_has_hypervisor && !cpu_has(c, X86_FEATURE_BTC_NO))
+			__set_bit(X86_FEATURE_BTC_NO, c->x86_capability);
+		break;
 	}
 
 	display_cacheinfo(c);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index f4ae36eae2d0..0f101c057f3e 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -388,7 +388,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * Hardware read-only information, stating immunity to certain issues, or
      * suggestions of which mitigation to use.
      */
-    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (caps & ARCH_CAPS_RDCL_NO)                        ? " RDCL_NO"        : "",
            (caps & ARCH_CAPS_IBRS_ALL)                       ? " IBRS_ALL"       : "",
            (caps & ARCH_CAPS_RSBA)                           ? " RSBA"           : "",
@@ -403,7 +403,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (e8b  & cpufeat_mask(X86_FEATURE_IBRS_ALWAYS))    ? " IBRS_ALWAYS"    : "",
            (e8b  & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS))   ? " STIBP_ALWAYS"   : "",
            (e8b  & cpufeat_mask(X86_FEATURE_IBRS_FAST))      ? " IBRS_FAST"      : "",
-           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "");
+           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_BTC_NO))         ? " BTC_NO"         : "");
 
     /* Hardware features which need driving to mitigate issues. */
     printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 743b857dcd5c..e7b8167800a2 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -266,6 +266,7 @@ XEN_CPUFEATURE(AMD_SSBD,      8*32+24) /*S  MSR_SPEC_CTRL.SSBD available */
 XEN_CPUFEATURE(VIRT_SSBD,     8*32+25) /*   MSR_VIRT_SPEC_CTRL.SSBD */
 XEN_CPUFEATURE(SSB_NO,        8*32+26) /*A  Hardware not vulnerable to SSB */
 XEN_CPUFEATURE(PSFD,          8*32+28) /*S  MSR_SPEC_CTRL.PSFD */
+XEN_CPUFEATURE(BTC_NO,        8*32+29) /*A  Hardware not vulnerable to Branch Type Confusion */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Enable Zen2 chickenbit

... as instructed in the Branch Type Confusion whitepaper.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index b158e3acb5c7..37ac84ddd74d 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -731,6 +731,31 @@ void amd_init_ssbd(const struct cpuinfo_x86 *c)
 		printk_once(XENLOG_ERR "No SSBD controls available\n");
 }
 
+/*
+ * On Zen2 we offer this chicken (bit) on the altar of Speculation.
+ *
+ * Refer to the AMD Branch Type Confusion whitepaper:
+ * https://XXX
+ *
+ * Setting this unnamed bit supposedly causes prediction information on
+ * non-branch instructions to be ignored.  It is to be set unilaterally in
+ * newer microcode.
+ *
+ * This chickenbit is something unrelated on Zen1, and Zen1 vs Zen2 isn't a
+ * simple model number comparison, so use STIBP as a heuristic to separate the
+ * two uarches in Fam17h(AMD)/18h(Hygon).
+ */
+void amd_init_spectral_chicken(void)
+{
+	uint64_t val, chickenbit = 1 << 1;
+
+	if (cpu_has_hypervisor || !boot_cpu_has(X86_FEATURE_AMD_STIBP))
+		return;
+
+	if (rdmsr_safe(MSR_AMD64_DE_CFG2, val) == 0 && !(val & chickenbit))
+		wrmsr_safe(MSR_AMD64_DE_CFG2, val | chickenbit);
+}
+
 void __init detect_zen2_null_seg_behaviour(void)
 {
 	uint64_t base;
@@ -796,6 +821,9 @@ static void init_amd(struct cpuinfo_x86 *c)
 
 	amd_init_ssbd(c);
 
+	if (c->x86 == 0x17)
+		amd_init_spectral_chicken();
+
 	/* Probe for NSCB on Zen2 CPUs when not virtualised */
 	if (!cpu_has_hypervisor && !cpu_has_nscb && c == &boot_cpu_data &&
 	    c->x86 == 0x17)
diff --git a/xen/arch/x86/cpu/cpu.h b/xen/arch/x86/cpu/cpu.h
index b593bd85f04f..145bc5156a86 100644
--- a/xen/arch/x86/cpu/cpu.h
+++ b/xen/arch/x86/cpu/cpu.h
@@ -22,4 +22,5 @@ void early_init_amd(struct cpuinfo_x86 *c);
 void amd_log_freq(const struct cpuinfo_x86 *c);
 void amd_init_lfence(struct cpuinfo_x86 *c);
 void amd_init_ssbd(const struct cpuinfo_x86 *c);
+void amd_init_spectral_chicken(void);
 void detect_zen2_null_seg_behaviour(void);
diff --git a/xen/arch/x86/cpu/hygon.c b/xen/arch/x86/cpu/hygon.c
index cdc94130dd2e..6f8d491297e8 100644
--- a/xen/arch/x86/cpu/hygon.c
+++ b/xen/arch/x86/cpu/hygon.c
@@ -41,6 +41,12 @@ static void init_hygon(struct cpuinfo_x86 *c)
 		detect_zen2_null_seg_behaviour();
 
 	/*
+	 * TODO: Check heuristic safety with Hygon first
+	if (c->x86 == 0x18)
+		amd_init_spectral_chicken();
+	 */
+
+	/*
 	 * Hygon CPUs before Zen2 don't clear segment bases/limits when
 	 * loading a NULL selector.
 	 */
diff --git a/xen/include/asm-x86/msr-index.h b/xen/include/asm-x86/msr-index.h
index 72bc32ba04ff..d3735e499e0f 100644
--- a/xen/include/asm-x86/msr-index.h
+++ b/xen/include/asm-x86/msr-index.h
@@ -361,6 +361,7 @@
 #define MSR_AMD64_DE_CFG		0xc0011029
 #define AMD64_DE_CFG_LFENCE_SERIALISE	(_AC(1, ULL) << 1)
 #define MSR_AMD64_EX_CFG		0xc001102c
+#define MSR_AMD64_DE_CFG2		0xc00110e3
 
 #define MSR_AMD64_DR0_ADDRESS_MASK	0xc0011027
 #define MSR_AMD64_DR1_ADDRESS_MASK	0xc0011019
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Mitigate Branch Type Confusion when possible

Branch Type Confusion affects AMD/Hygon CPUs on Zen2 and earlier.  To
mitigate, we require SMT safety (STIBP on Zen2, no-SMT on Zen1), and must
issue an IBPB on each entry to Xen to flush the BTB.

Due to performance concerns, dom0 (which is trusted in most configurations) is
excluded from protections by default.

Therefore:
 * Use STIBP by default on Zen2 too, which now means we want it on by default
   on all hardware supporting STIBP.
 * Break the current IBPB logic out into a new function, extending it with
   IBPB-at-entry logic.
 * Change the existing IBPB-at-ctxt-switch boolean to be tristate, and disable
   it by default when IBPB-at-entry is providing sufficient safety.

If all PV guests on the system are trusted, then it is recommended to boot
with `spec-ctrl=ibpb-entry=no-pv`, as this will provide an additional marginal
perf improvement.

This is part of XSA-407 / CVE-2022-23825.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 1bbdb55129cc..bd6826d0ae05 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2234,7 +2234,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 
 ### spec-ctrl (x86)
 > `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
->              {msr-sc,rsb,md-clear}=<bool>|{pv,hvm}=<bool>,
+>              {msr-sc,rsb,md-clear,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >              eager-fpu,l1d-flush,branch-harden,srb-lock,
 >              unpriv-mmio}=<bool> ]`
@@ -2259,9 +2259,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` options offer fine
-grained control over the primitives by Xen.  These impact Xen's ability to
-protect itself, and/or Xen's ability to virtualise support for guests to use.
+The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `md-clear=` and `ibpb-entry=` options
+offer fine grained control over the primitives by Xen.  These impact Xen's
+ability to protect itself, and/or Xen's ability to virtualise support for
+guests to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -2280,6 +2281,11 @@ protect itself, and/or Xen's ability to virtualise support for guests to use.
   compatibility with development versions of this fix, `mds=` is also accepted
   on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
   preference to here.*
+* `ibpb-entry=` offers control over whether IBPB (Indirect Branch Prediction
+  Barrier) is used on entry to Xen.  This is used by default on hardware
+  vulnerable to Branch Type Confusion, but for performance reasons, dom0 is
+  unprotected by default.  If it is necessary to protect dom0 too, boot with
+  `spec-ctrl=ibpb-entry`.
 
 If Xen was compiled with INDIRECT_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 0f101c057f3e..1d9796c34d71 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -39,6 +39,10 @@ static bool __initdata opt_rsb_hvm = true;
 static int8_t __read_mostly opt_md_clear_pv = -1;
 static int8_t __read_mostly opt_md_clear_hvm = -1;
 
+static int8_t __read_mostly opt_ibpb_entry_pv = -1;
+static int8_t __read_mostly opt_ibpb_entry_hvm = -1;
+static bool __read_mostly opt_ibpb_entry_dom0;
+
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
     THUNK_DEFAULT, /* Decide which thunk to use at boot time. */
@@ -54,7 +58,7 @@ int8_t __initdata opt_stibp = -1;
 bool __read_mostly opt_ssbd;
 int8_t __initdata opt_psfd = -1;
 
-bool __read_mostly opt_ibpb_ctxt_switch = true;
+int8_t __read_mostly opt_ibpb_ctxt_switch = -1;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 static bool __initdata opt_branch_harden = true;
@@ -114,6 +118,9 @@ static int __init parse_spec_ctrl(const char *s)
             opt_rsb_hvm = false;
             opt_md_clear_pv = 0;
             opt_md_clear_hvm = 0;
+            opt_ibpb_entry_pv = 0;
+            opt_ibpb_entry_hvm = 0;
+            opt_ibpb_entry_dom0 = false;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -140,12 +147,14 @@ static int __init parse_spec_ctrl(const char *s)
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
             opt_md_clear_pv = val;
+            opt_ibpb_entry_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
             opt_md_clear_hvm = val;
+            opt_ibpb_entry_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
         {
@@ -210,6 +219,28 @@ static int __init parse_spec_ctrl(const char *s)
                 break;
             }
         }
+        else if ( (val = parse_boolean("ibpb-entry", s, ss)) != -1 )
+        {
+            switch ( val )
+            {
+            case 0:
+            case 1:
+                opt_ibpb_entry_pv = opt_ibpb_entry_hvm =
+                    opt_ibpb_entry_dom0 = val;
+                break;
+
+            case -2:
+                s += strlen("ibpb-entry=");
+                if ( (val = parse_boolean("pv", s, ss)) >= 0 )
+                    opt_ibpb_entry_pv = val;
+                else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+                    opt_ibpb_entry_hvm = val;
+                else
+            default:
+                    rc = -EINVAL;
+                break;
+            }
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -477,27 +508,31 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * mitigation support for guests.
      */
 #ifdef CONFIG_HVM
-    printk("  Support for HVM VMs:%s%s%s%s%s\n",
+    printk("  Support for HVM VMs:%s%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             boot_cpu_has(X86_FEATURE_MD_CLEAR)   ||
+            boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
-           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
+           boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM)  ? " IBPB-entry"    : "");
 
 #endif
 #ifdef CONFIG_PV
-    printk("  Support for PV VMs:%s%s%s%s%s\n",
+    printk("  Support for PV VMs:%s%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
             boot_cpu_has(X86_FEATURE_MD_CLEAR)  ||
+            boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_PV)       ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_PV)       ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
-           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
+           boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV)   ? " IBPB-entry"    : "");
 
     printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
            opt_xpti_hwdom ? "enabled" : "disabled",
@@ -759,6 +794,55 @@ static bool __init should_use_eager_fpu(void)
     }
 }
 
+static void __init ibpb_calculations(void)
+{
+    /* Check we have hardware IBPB support before using it... */
+    if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
+    {
+        opt_ibpb_entry_hvm = opt_ibpb_entry_pv = opt_ibpb_ctxt_switch = 0;
+        opt_ibpb_entry_dom0 = false;
+        return;
+    }
+
+    /*
+     * IBPB-on-entry mitigations for Branch Type Confusion.
+     *
+     * IBPB && !BTC_NO selects all AMD/Hygon hardware, not known to be safe,
+     * that we can provide some form of mitigation on.
+     */
+    if ( opt_ibpb_entry_pv == -1 )
+        opt_ibpb_entry_pv = (IS_ENABLED(CONFIG_PV) &&
+                             boot_cpu_has(X86_FEATURE_IBPB) &&
+                             !boot_cpu_has(X86_FEATURE_BTC_NO));
+    if ( opt_ibpb_entry_hvm == -1 )
+        opt_ibpb_entry_hvm = (IS_ENABLED(CONFIG_HVM) &&
+                              boot_cpu_has(X86_FEATURE_IBPB) &&
+                              !boot_cpu_has(X86_FEATURE_BTC_NO));
+
+    if ( opt_ibpb_entry_pv )
+    {
+        setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_PV);
+
+        /*
+         * We only need to flush in IST context if we're protecting against PV
+         * guests.  HVM IBPB-on-entry protections are both atomic with
+         * NMI/#MC, so can't interrupt Xen ahead of having already flushed the
+         * BTB.
+         */
+        default_spec_ctrl_flags |= SCF_ist_ibpb;
+    }
+    if ( opt_ibpb_entry_hvm )
+        setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_HVM);
+
+    /*
+     * If we're using IBPB-on-entry to protect against PV and HVM guests
+     * (ignoring dom0 if trusted), then there's no need to also issue IBPB on
+     * context switch too.
+     */
+    if ( opt_ibpb_ctxt_switch == -1 )
+        opt_ibpb_ctxt_switch = !(opt_ibpb_entry_hvm && opt_ibpb_entry_pv);
+}
+
 /* Calculate whether this CPU is vulnerable to L1TF. */
 static __init void l1tf_calculations(uint64_t caps)
 {
@@ -1014,8 +1098,12 @@ void spec_ctrl_init_domain(struct domain *d)
     bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
                  (opt_fb_clear_mmio && is_iommu_enabled(d)));
 
+    bool ibpb = ((pv ? opt_ibpb_entry_pv : opt_ibpb_entry_hvm) &&
+                 (d->domain_id != 0 || opt_ibpb_entry_dom0));
+
     d->arch.spec_ctrl_flags =
         (verw   ? SCF_verw         : 0) |
+        (ibpb   ? SCF_entry_ibpb   : 0) |
         0;
 }
 
@@ -1162,12 +1250,15 @@ void __init init_speculation_mitigations(void)
     }
 
     /*
-     * Use STIBP by default if the hardware hint is set.  Otherwise, leave it
-     * off as it a severe performance pentalty on pre-eIBRS Intel hardware
-     * where it was retrofitted in microcode.
+     * Use STIBP by default on all AMD systems.  Zen3 and later enumerate
+     * STIBP_ALWAYS, but STIBP is needed on Zen2 as part of the mitigations
+     * for Branch Type Confusion.
+     *
+     * Leave STIBP off by default on Intel.  Pre-eIBRS systems suffer a
+     * substantial perf hit when it was implemented in microcode.
      */
     if ( opt_stibp == -1 )
-        opt_stibp = !!boot_cpu_has(X86_FEATURE_STIBP_ALWAYS);
+        opt_stibp = !!boot_cpu_has(X86_FEATURE_AMD_STIBP);
 
     if ( opt_stibp && (boot_cpu_has(X86_FEATURE_STIBP) ||
                        boot_cpu_has(X86_FEATURE_AMD_STIBP)) )
@@ -1239,9 +1330,7 @@ void __init init_speculation_mitigations(void)
     if ( opt_rsb_hvm )
         setup_force_cpu_cap(X86_FEATURE_SC_RSB_HVM);
 
-    /* Check we have hardware IBPB support before using it... */
-    if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-        opt_ibpb_ctxt_switch = false;
+    ibpb_calculations();
 
     /* Check whether Eager FPU should be enabled by default. */
     if ( opt_eager_fpu == -1 )
diff --git a/xen/include/asm-x86/spec_ctrl.h b/xen/include/asm-x86/spec_ctrl.h
index 3fc599a817c4..9403b81dc7af 100644
--- a/xen/include/asm-x86/spec_ctrl.h
+++ b/xen/include/asm-x86/spec_ctrl.h
@@ -65,7 +65,7 @@
 void init_speculation_mitigations(void);
 void spec_ctrl_init_domain(struct domain *d);
 
-extern bool opt_ibpb_ctxt_switch;
+extern int8_t opt_ibpb_ctxt_switch;
 extern bool opt_ssbd;
 extern int8_t opt_eager_fpu;
 extern int8_t opt_l1d_flush;
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Rework SPEC_CTRL_ENTRY_FROM_INTR_IST

We are shortly going to add a conditional IBPB in this path.

Therefore, we cannot hold spec_ctrl_flags in %eax, and rely on only clobbering
it after we're done with its contents.  %rbx is available for use, and the
more normal register to hold preserved information in.

With %rax freed up, use it instead of %rdx for the RSB tmp register, and for
the adjustment to spec_ctrl_flags.

This leaves no use of %rdx, except as 0 for the upper half of WRMSR.  In
practice, %rdx is 0 from SAVE_ALL on all paths and isn't likely to change in
the foreseeable future, so update the macro entry requirements to state this
dependency.  This marginal optimisation can be revisited if circumstances
change.

No practical change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index 0ff1b118f882..15e24cde00d1 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -251,34 +251,33 @@
  */
 .macro SPEC_CTRL_ENTRY_FROM_INTR_IST
 /*
- * Requires %rsp=regs, %r14=stack_end
- * Clobbers %rax, %rcx, %rdx
+ * Requires %rsp=regs, %r14=stack_end, %rdx=0
+ * Clobbers %rax, %rbx, %rcx, %rdx
  *
  * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
  * maybexen=1, but with conditionals rather than alternatives.
  */
-    movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %eax
+    movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
 
-    test $SCF_ist_rsb, %al
+    test $SCF_ist_rsb, %bl
     jz .L\@_skip_rsb
 
-    DO_OVERWRITE_RSB tmp=rdx /* Clobbers %rcx/%rdx */
+    DO_OVERWRITE_RSB         /* Clobbers %rax/%rcx */
 
 .L\@_skip_rsb:
 
-    test $SCF_ist_sc_msr, %al
+    test $SCF_ist_sc_msr, %bl
     jz .L\@_skip_msr_spec_ctrl
 
-    xor %edx, %edx
+    xor %eax, %eax
     testb $3, UREGS_cs(%rsp)
-    setnz %dl
-    not %edx
-    and %dl, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+    setnz %al
+    not %eax
+    and %al, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
 
     /* Load Xen's intended value. */
     mov $MSR_SPEC_CTRL, %ecx
     movzbl STACK_CPUINFO_FIELD(xen_spec_ctrl)(%r14), %eax
-    xor %edx, %edx
     wrmsr
 
     /* Opencoded UNLIKELY_START() with no condition. */
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index ea6f0afbc2b4..5ad5c36128aa 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -966,7 +966,7 @@ ENTRY(double_fault)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rbx
@@ -1002,7 +1002,7 @@ handle_ist_exception:
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR_IST /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: abcd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Support IBPB-on-entry

We are going to need this to mitigate Branch Type Confusion on AMD/Hygon CPUs,
but as we've talked about using it in other cases too, arrange to support it
generally.  However, this is also very expensive in some cases, so we're going
to want per-domain controls.

Introduce SCF_ist_ibpb and SCF_entry_ibpb controls, adding them to the IST and
DOM masks as appropriate.  Also introduce X86_FEATURE_IBPB_ENTRY_{PV,HVM} to
patch the code blocks.

For SVM, the STGI is serialising enough to protect against Spectre-v1 attacks,
so no "else lfence" is necessary.  VT-x will use use the MSR host load list,
so doesn't need any code in the VMExit path.

For the IST path, we can't safely check CPL==0 to skip a flush, as we might
have hit an entry path before its IBPB.  As IST hitting Xen is rare, flush
irrespective of CPL.  A later path, SCF_ist_sc_msr, provides Spectre-v1
safety.

For the PV paths, we know we're interrupting CPL>0, while for the INTR paths,
we can safely check CPL==0.  Only flush when interrupting guest context.

An "else lfence" is needed for safety, but we want to be able to skip it on
unaffected CPUs, so the block wants to be an alternative, which means the
lfence has to be inline rather than UNLIKELY() (the replacement block doesn't
have displacements fixed up for anything other than the first instruction).
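
In C-like pseudocode the guarded barrier is roughly the following (a sketch
only; wrmsr() and lfence() stand in for the instruction sequences the macro
actually emits):

    /* Sketch of the entry-time decision, not the literal asm. */
    if ( scf & SCF_entry_ibpb )
        wrmsr(MSR_PRED_CMD, PRED_CMD_IBPB, 0); /* flush branch predictions */
    else
        lfence();       /* block speculation down the not-taken arm */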

As with SPEC_CTRL_ENTRY_FROM_INTR_IST, %rdx is 0 on entry so rely on this to
shrink the logic marginally.  Update the comments to specify this new
dependency.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index 4ae55a2ef605..0ff4008060fa 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -97,7 +97,19 @@ __UNLIKELY_END(nsvm_hap)
 
         GET_CURRENT(bx)
 
-        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: %rsp=regs/cpuinfo         Clob: acd */
+        /* SPEC_CTRL_ENTRY_FROM_SVM    Req: %rsp=regs/cpuinfo, %rdx=0 Clob: acd */
+
+        .macro svm_vmexit_cond_ibpb
+            testb  $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
+            jz     .L_skip_ibpb
+
+            mov    $MSR_PRED_CMD, %ecx
+            mov    $PRED_CMD_IBPB, %eax
+            wrmsr
+.L_skip_ibpb:
+        .endm
+        ALTERNATIVE "", svm_vmexit_cond_ibpb, X86_FEATURE_IBPB_ENTRY_HVM
+
         ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_HVM
 
         .macro svm_vmexit_spec_ctrl
@@ -114,6 +126,10 @@ __UNLIKELY_END(nsvm_hap)
         ALTERNATIVE "", svm_vmexit_spec_ctrl, X86_FEATURE_SC_MSR_HVM
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
+        /*
+         * STGI is executed unconditionally, and is sufficiently serialising
+         * to safely resolve any Spectre-v1 concerns in the above logic.
+         */
         stgi
 GLOBAL(svm_stgi_label)
         mov  %rsp,%rdi
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 683c650d77b1..4f12fa06ac3e 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -1335,6 +1335,10 @@ static int construct_vmcs(struct vcpu *v)
         rc = vmx_add_msr(v, MSR_FLUSH_CMD, FLUSH_CMD_L1D,
                          VMX_MSR_GUEST_LOADONLY);
 
+    if ( !rc && (d->arch.spec_ctrl_flags & SCF_entry_ibpb) )
+        rc = vmx_add_msr(v, MSR_PRED_CMD, PRED_CMD_IBPB,
+                         VMX_MSR_HOST);
+
  out:
     vmx_vmcs_exit(v);
 
diff --git a/xen/arch/x86/include/asm/cpufeatures.h b/xen/arch/x86/include/asm/cpufeatures.h
index 493d338a085e..672c9ee22ba2 100644
--- a/xen/arch/x86/include/asm/cpufeatures.h
+++ b/xen/arch/x86/include/asm/cpufeatures.h
@@ -39,6 +39,8 @@ XEN_CPUFEATURE(XEN_LBR,           X86_SYNTH(22)) /* Xen uses MSR_DEBUGCTL.LBR */
 XEN_CPUFEATURE(SC_VERW_IDLE,      X86_SYNTH(25)) /* VERW used by Xen for idle */
 XEN_CPUFEATURE(XEN_SHSTK,         X86_SYNTH(26)) /* Xen uses CET Shadow Stacks */
 XEN_CPUFEATURE(XEN_IBT,           X86_SYNTH(27)) /* Xen uses CET Indirect Branch Tracking */
+XEN_CPUFEATURE(IBPB_ENTRY_PV,     X86_SYNTH(28)) /* MSR_PRED_CMD used by Xen for PV */
+XEN_CPUFEATURE(IBPB_ENTRY_HVM,    X86_SYNTH(29)) /* MSR_PRED_CMD used by Xen for HVM */
 
 /* Bug words follow the synthetic words. */
 #define X86_NR_BUG 1
diff --git a/xen/arch/x86/include/asm/spec_ctrl.h b/xen/arch/x86/include/asm/spec_ctrl.h
index fb4365575620..3fc599a817c4 100644
--- a/xen/arch/x86/include/asm/spec_ctrl.h
+++ b/xen/arch/x86/include/asm/spec_ctrl.h
@@ -34,6 +34,8 @@
 #define SCF_ist_sc_msr (1 << 1)
 #define SCF_ist_rsb    (1 << 2)
 #define SCF_verw       (1 << 3)
+#define SCF_ist_ibpb   (1 << 4)
+#define SCF_entry_ibpb (1 << 5)
 
 /*
  * The IST paths (NMI/#MC) can interrupt any arbitrary context.  Some
@@ -46,13 +48,13 @@
  * These are the controls to inhibit on the S3 resume path until microcode has
  * been reloaded.
  */
-#define SCF_IST_MASK (SCF_ist_sc_msr)
+#define SCF_IST_MASK (SCF_ist_sc_msr | SCF_ist_ibpb)
 
 /*
  * Some speculative protections are per-domain.  These settings are merged
  * into the top-of-stack block in the context switch path.
  */
-#define SCF_DOM_MASK (SCF_verw)
+#define SCF_DOM_MASK (SCF_verw | SCF_entry_ibpb)
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/arch/x86/include/asm/spec_ctrl_asm.h b/xen/arch/x86/include/asm/spec_ctrl_asm.h
index 15e24cde00d1..9eb4ad9ab71d 100644
--- a/xen/arch/x86/include/asm/spec_ctrl_asm.h
+++ b/xen/arch/x86/include/asm/spec_ctrl_asm.h
@@ -88,6 +88,35 @@
  *  - SPEC_CTRL_EXIT_TO_{SVM,VMX}
  */
 
+.macro DO_SPEC_CTRL_COND_IBPB maybexen:req
+/*
+ * Requires %rsp=regs (also cpuinfo if !maybexen)
+ * Requires %r14=stack_end (if maybexen), %rdx=0
+ * Clobbers %rax, %rcx, %rdx
+ *
+ * Conditionally issue IBPB if SCF_entry_ibpb is active.  In the maybexen
+ * case, we can safely look at UREGS_cs to skip taking the hit when
+ * interrupting Xen.
+ */
+    .if \maybexen
+        testb  $SCF_entry_ibpb, STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14)
+        jz     .L\@_skip
+        testb  $3, UREGS_cs(%rsp)
+    .else
+        testb  $SCF_entry_ibpb, CPUINFO_xen_spec_ctrl(%rsp)
+    .endif
+    jz     .L\@_skip
+
+    mov     $MSR_PRED_CMD, %ecx
+    mov     $PRED_CMD_IBPB, %eax
+    wrmsr
+    jmp     .L\@_done
+
+.L\@_skip:
+    lfence
+.L\@_done:
+.endm
+
 .macro DO_OVERWRITE_RSB tmp=rax
 /*
  * Requires nothing
@@ -225,12 +254,16 @@
 
 /* Use after an entry from PV context (syscall/sysenter/int80/int82/etc). */
 #define SPEC_CTRL_ENTRY_FROM_PV                                         \
+    ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=0),     \
+        X86_FEATURE_IBPB_ENTRY_PV;                                      \
     ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;            \
     ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=0),         \
         X86_FEATURE_SC_MSR_PV
 
 /* Use in interrupt/exception context.  May interrupt Xen or PV context. */
 #define SPEC_CTRL_ENTRY_FROM_INTR                                       \
+    ALTERNATIVE "", __stringify(DO_SPEC_CTRL_COND_IBPB maybexen=1),     \
+        X86_FEATURE_IBPB_ENTRY_PV;                                      \
     ALTERNATIVE "", DO_OVERWRITE_RSB, X86_FEATURE_SC_RSB_PV;            \
     ALTERNATIVE "", __stringify(DO_SPEC_CTRL_ENTRY maybexen=1),         \
         X86_FEATURE_SC_MSR_PV
@@ -254,11 +287,23 @@
  * Requires %rsp=regs, %r14=stack_end, %rdx=0
  * Clobbers %rax, %rbx, %rcx, %rdx
  *
- * This is logical merge of DO_OVERWRITE_RSB and DO_SPEC_CTRL_ENTRY
- * maybexen=1, but with conditionals rather than alternatives.
+ * This is logical merge of:
+ *    DO_SPEC_CTRL_COND_IBPB maybexen=0
+ *    DO_OVERWRITE_RSB
+ *    DO_SPEC_CTRL_ENTRY maybexen=1
+ * but with conditionals rather than alternatives.
  */
     movzbl STACK_CPUINFO_FIELD(spec_ctrl_flags)(%r14), %ebx
 
+    test    $SCF_ist_ibpb, %bl
+    jz      .L\@_skip_ibpb
+
+    mov     $MSR_PRED_CMD, %ecx
+    mov     $PRED_CMD_IBPB, %eax
+    wrmsr
+
+.L\@_skip_ibpb:
+
     test $SCF_ist_rsb, %bl
     jz .L\@_skip_rsb
 
diff --git a/xen/arch/x86/x86_64/compat/entry.S b/xen/arch/x86/x86_64/compat/entry.S
index 5fd6dbbd4513..b86d38d1c50d 100644
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -18,7 +18,7 @@ ENTRY(entry_int82)
         movl  $HYPERCALL_VECTOR, 4(%rsp)
         SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         CR4_PV32_RESTORE
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index 5ad5c36128aa..26bf2f1941f8 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -260,7 +260,7 @@ ENTRY(lstar_enter)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -298,7 +298,7 @@ ENTRY(cstar_enter)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -338,7 +338,7 @@ GLOBAL(sysenter_eflags_saved)
         movl  $TRAP_syscall, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -392,7 +392,7 @@ ENTRY(int80_direct_trap)
         movl  $0x80, 4(%rsp)
         SAVE_ALL
 
-        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_PV /* Req: %rsp=regs/cpuinfo, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         GET_STACK_END(bx)
@@ -674,7 +674,7 @@ ENTRY(common_interrupt)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
@@ -708,7 +708,7 @@ GLOBAL(handle_exception)
 
         GET_STACK_END(14)
 
-        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, Clob: acd */
+        SPEC_CTRL_ENTRY_FROM_INTR /* Req: %rsp=regs, %r14=end, %rdx=0, Clob: acd */
         /* WARNING! `ret`, `call *`, `jmp *` not safe before this point. */
 
         mov   STACK_CPUINFO_FIELD(xen_cr3)(%r14), %rcx
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/cpuid: Enumeration for BTC_NO

BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.

Zen3 CPUs don't suffer BTC.
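
BTC_NO is enumerated in CPUID leaf 0x80000008, EBX bit 29, as the libxl and
xen-cpuid hunks below wire up.  A minimal user-space probe, assuming GCC's
<cpuid.h> (illustrative, not part of the patch):

    #include <cpuid.h>
    #include <stdbool.h>

    /* Report CPUID.0x80000008:EBX[29] (BTC_NO); sketch only. */
    static bool cpu_has_btc_no(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if ( !__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx) )
            return false;

        return ebx & (1u << 29);
    }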

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
index 67ba72dd8475..f4735b1c13ab 100644
--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -289,6 +289,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
         {"virt-ssbd",    0x80000008, NA, CPUID_REG_EBX, 25,  1},
         {"ssb-no",       0x80000008, NA, CPUID_REG_EBX, 26,  1},
         {"psfd",         0x80000008, NA, CPUID_REG_EBX, 28,  1},
+        {"btc-no",       0x80000008, NA, CPUID_REG_EBX, 29,  1},
 
         {"nc",           0x80000008, NA, CPUID_REG_ECX,  0,  8},
         {"apicidsize",   0x80000008, NA, CPUID_REG_ECX, 12,  4},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index a9e0fb5d17e5..1e6b077ba4da 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -160,7 +160,7 @@ static const char *const str_e8b[32] =
     /* [22] */                 [23] = "ppin",
     [24] = "amd-ssbd",         [25] = "virt-ssbd",
     [26] = "ssb-no",
-    [28] = "psfd",
+    [28] = "psfd",             [29] = "btc-no",
 };
 
 static const char *const str_7d0[32] =
diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index f1d11a1cb734..618c7d5b2a46 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -847,6 +847,16 @@ static void cf_check init_amd(struct cpuinfo_x86 *c)
 			warning_add(text);
 		}
 		break;
+
+	case 0x19:
+		/*
+		 * Zen3 (Fam19h model < 0x10) parts are not susceptible to
+		 * Branch Type Confusion, but predate the allocation of the
+		 * BTC_NO bit.  Fill it back in if we're not virtualised.
+		 */
+		if (!cpu_has_hypervisor && !cpu_has(c, X86_FEATURE_BTC_NO))
+			__set_bit(X86_FEATURE_BTC_NO, c->x86_capability);
+		break;
 	}
 
 	display_cacheinfo(c);
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index ede0a27c0fea..e5ea2c5c6c93 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -388,7 +388,7 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * Hardware read-only information, stating immunity to certain issues, or
      * suggestions of which mitigation to use.
      */
-    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
+    printk("  Hardware hints:%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n",
            (caps & ARCH_CAPS_RDCL_NO)                        ? " RDCL_NO"        : "",
            (caps & ARCH_CAPS_IBRS_ALL)                       ? " IBRS_ALL"       : "",
            (caps & ARCH_CAPS_RSBA)                           ? " RSBA"           : "",
@@ -403,7 +403,8 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
            (e8b  & cpufeat_mask(X86_FEATURE_IBRS_ALWAYS))    ? " IBRS_ALWAYS"    : "",
            (e8b  & cpufeat_mask(X86_FEATURE_STIBP_ALWAYS))   ? " STIBP_ALWAYS"   : "",
            (e8b  & cpufeat_mask(X86_FEATURE_IBRS_FAST))      ? " IBRS_FAST"      : "",
-           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "");
+           (e8b  & cpufeat_mask(X86_FEATURE_IBRS_SAME_MODE)) ? " IBRS_SAME_MODE" : "",
+           (e8b  & cpufeat_mask(X86_FEATURE_BTC_NO))         ? " BTC_NO"         : "");
 
     /* Hardware features which need driving to mitigate issues. */
     printk("  Hardware features:%s%s%s%s%s%s%s%s%s%s%s%s\n",
diff --git a/xen/include/public/arch-x86/cpufeatureset.h b/xen/include/public/arch-x86/cpufeatureset.h
index 101698941074..c9c4683557ff 100644
--- a/xen/include/public/arch-x86/cpufeatureset.h
+++ b/xen/include/public/arch-x86/cpufeatureset.h
@@ -268,6 +268,7 @@ XEN_CPUFEATURE(AMD_SSBD,      8*32+24) /*S  MSR_SPEC_CTRL.SSBD available */
 XEN_CPUFEATURE(VIRT_SSBD,     8*32+25) /*   MSR_VIRT_SPEC_CTRL.SSBD */
 XEN_CPUFEATURE(SSB_NO,        8*32+26) /*A  Hardware not vulnerable to SSB */
 XEN_CPUFEATURE(PSFD,          8*32+28) /*S  MSR_SPEC_CTRL.PSFD */
+XEN_CPUFEATURE(BTC_NO,        8*32+29) /*A  Hardware not vulnerable to Branch Type Confusion */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0.edx, word 9 */
 XEN_CPUFEATURE(AVX512_4VNNIW, 9*32+ 2) /*A  AVX512 Neural Network Instructions */
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Enable Zen2 chickenbit

... as instructed in the Branch Type Confusion whitepaper.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff --git a/xen/arch/x86/cpu/amd.c b/xen/arch/x86/cpu/amd.c
index 618c7d5b2a46..29c59bcba409 100644
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -731,6 +731,31 @@ void amd_init_ssbd(const struct cpuinfo_x86 *c)
 		printk_once(XENLOG_ERR "No SSBD controls available\n");
 }
 
+/*
+ * On Zen2 we offer this chicken (bit) on the altar of Speculation.
+ *
+ * Refer to the AMD Branch Type Confusion whitepaper:
+ * https://XXX
+ *
+ * Setting this unnamed bit supposedly causes prediction information on
+ * non-branch instructions to be ignored.  It is to be set unilaterally in
+ * newer microcode.
+ *
+ * This chickenbit is something unrelated on Zen1, and Zen1 vs Zen2 isn't a
+ * simple model number comparison, so use STIBP as a heuristic to separate the
+ * two uarches in Fam17h(AMD)/18h(Hygon).
+ */
+void amd_init_spectral_chicken(void)
+{
+	uint64_t val, chickenbit = 1 << 1;
+
+	if (cpu_has_hypervisor || !boot_cpu_has(X86_FEATURE_AMD_STIBP))
+		return;
+
+	if (rdmsr_safe(MSR_AMD64_DE_CFG2, val) == 0 && !(val & chickenbit))
+		wrmsr_safe(MSR_AMD64_DE_CFG2, val | chickenbit);
+}
+
 void __init detect_zen2_null_seg_behaviour(void)
 {
 	uint64_t base;
@@ -796,6 +821,9 @@ static void cf_check init_amd(struct cpuinfo_x86 *c)
 
 	amd_init_ssbd(c);
 
+	if (c->x86 == 0x17)
+		amd_init_spectral_chicken();
+
 	/* Probe for NSCB on Zen2 CPUs when not virtualised */
 	if (!cpu_has_hypervisor && !cpu_has_nscb && c == &boot_cpu_data &&
 	    c->x86 == 0x17)
diff --git a/xen/arch/x86/cpu/cpu.h b/xen/arch/x86/cpu/cpu.h
index a228087f9157..85a67771f7b7 100644
--- a/xen/arch/x86/cpu/cpu.h
+++ b/xen/arch/x86/cpu/cpu.h
@@ -22,4 +22,5 @@ void cf_check early_init_amd(struct cpuinfo_x86 *c);
 void amd_log_freq(const struct cpuinfo_x86 *c);
 void amd_init_lfence(struct cpuinfo_x86 *c);
 void amd_init_ssbd(const struct cpuinfo_x86 *c);
+void amd_init_spectral_chicken(void);
 void detect_zen2_null_seg_behaviour(void);
diff --git a/xen/arch/x86/cpu/hygon.c b/xen/arch/x86/cpu/hygon.c
index 3c8516e014c3..361eb6fd411b 100644
--- a/xen/arch/x86/cpu/hygon.c
+++ b/xen/arch/x86/cpu/hygon.c
@@ -41,6 +41,12 @@ static void cf_check init_hygon(struct cpuinfo_x86 *c)
 		detect_zen2_null_seg_behaviour();
 
 	/*
+	 * TODO: Check heuristic safety with Hygon first
+	if (c->x86 == 0x18)
+		amd_init_spectral_chicken();
+	 */
+
+	/*
 	 * Hygon CPUs before Zen2 don't clear segment bases/limits when
 	 * loading a NULL selector.
 	 */
diff --git a/xen/arch/x86/include/asm/msr-index.h b/xen/arch/x86/include/asm/msr-index.h
index bcb424a32026..8cab8736d8a5 100644
--- a/xen/arch/x86/include/asm/msr-index.h
+++ b/xen/arch/x86/include/asm/msr-index.h
@@ -377,6 +377,7 @@
 #define MSR_AMD64_DE_CFG		0xc0011029
 #define AMD64_DE_CFG_LFENCE_SERIALISE	(_AC(1, ULL) << 1)
 #define MSR_AMD64_EX_CFG		0xc001102c
+#define MSR_AMD64_DE_CFG2		0xc00110e3
 
 #define MSR_AMD64_DR0_ADDRESS_MASK	0xc0011027
 #define MSR_AMD64_DR1_ADDRESS_MASK	0xc0011019
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/spec-ctrl: Mitigate Branch Type Confusion when possible

Branch Type Confusion affects AMD/Hygon CPUs on Zen2 and earlier.  To
mitigate, we require SMT safety (STIBP on Zen2, no-SMT on Zen1), and to issue
an IBPB on each entry to Xen, to flush the BTB.

Due to performance concerns, dom0 (which is trusted in most configurations) is
excluded from protections by default.

Therefore:
 * Use STIBP by default on Zen2 too, which now means we want it on by default
   on all hardware supporting STIBP.
 * Break the current IBPB logic out into a new function, extending it with
   IBPB-at-entry logic.
 * Change the existing IBPB-at-ctxt-switch boolean to be tristate, and disable
   it by default when IBPB-at-entry is providing sufficient safety.

If all PV guests on the system are trusted, then it is recommended to boot
with `spec-ctrl=ibpb-entry=no-pv`, as this will provide an additional marginal
perf improvement.

This is part of XSA-407 / CVE-2022-23825.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index de33ccc005fc..971f47a77e94 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2258,7 +2258,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 
 ### spec-ctrl (x86)
 > `= List of [ <bool>, xen=<bool>, {pv,hvm}=<bool>,
->              {msr-sc,rsb,md-clear}=<bool>|{pv,hvm}=<bool>,
+>              {msr-sc,rsb,md-clear,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >              eager-fpu,l1d-flush,branch-harden,srb-lock,
 >              unpriv-mmio}=<bool> ]`
@@ -2283,9 +2283,10 @@ in place for guests to use.
 
 Use of a positive boolean value for either of these options is invalid.
 
-The `pv=`, `hvm=`, `msr-sc=`, `rsb=` and `md-clear=` options offer fine
-grained control over the primitives by Xen.  These impact Xen's ability to
-protect itself, and/or Xen's ability to virtualise support for guests to use.
+The `pv=`, `hvm=`, `msr-sc=`, `rsb=`, `md-clear=` and `ibpb-entry=` options
+offer fine grained control over the primitives by Xen.  These impact Xen's
+ability to protect itself, and/or Xen's ability to virtualise support for
+guests to use.
 
 * `pv=` and `hvm=` offer control over all suboptions for PV and HVM guests
   respectively.
@@ -2304,6 +2305,11 @@ protect itself, and/or Xen's ability to virtualise support for guests to use.
   compatibility with development versions of this fix, `mds=` is also accepted
   on Xen 4.12 and earlier as an alias.  Consult vendor documentation in
   preference to here.*
+* `ibpb-entry=` offers control over whether IBPB (Indirect Branch Prediction
+  Barrier) is used on entry to Xen.  This is used by default on hardware
+  vulnerable to Branch Type Confusion, but for performance reasons, dom0 is
+  unprotected by default.  If it is necessary to protect dom0 too, boot with
+  `spec-ctrl=ibpb-entry`.
 
 If Xen was compiled with INDIRECT_THUNK support, `bti-thunk=` can be used to
 select which of the thunks gets patched into the `__x86_indirect_thunk_%reg`
diff --git a/xen/arch/x86/include/asm/spec_ctrl.h b/xen/arch/x86/include/asm/spec_ctrl.h
index 3fc599a817c4..9403b81dc7af 100644
--- a/xen/arch/x86/include/asm/spec_ctrl.h
+++ b/xen/arch/x86/include/asm/spec_ctrl.h
@@ -65,7 +65,7 @@
 void init_speculation_mitigations(void);
 void spec_ctrl_init_domain(struct domain *d);
 
-extern bool opt_ibpb_ctxt_switch;
+extern int8_t opt_ibpb_ctxt_switch;
 extern bool opt_ssbd;
 extern int8_t opt_eager_fpu;
 extern int8_t opt_l1d_flush;
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index e5ea2c5c6c93..9dd4d846f555 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -39,6 +39,10 @@ static bool __initdata opt_rsb_hvm = true;
 static int8_t __ro_after_init opt_md_clear_pv = -1;
 static int8_t __ro_after_init opt_md_clear_hvm = -1;
 
+static int8_t __ro_after_init opt_ibpb_entry_pv = -1;
+static int8_t __ro_after_init opt_ibpb_entry_hvm = -1;
+static bool __ro_after_init opt_ibpb_entry_dom0;
+
 /* Cmdline controls for Xen's speculative settings. */
 static enum ind_thunk {
     THUNK_DEFAULT, /* Decide which thunk to use at boot time. */
@@ -54,7 +58,7 @@ int8_t __initdata opt_stibp = -1;
 bool __ro_after_init opt_ssbd;
 int8_t __initdata opt_psfd = -1;
 
-bool __read_mostly opt_ibpb_ctxt_switch = true;
+int8_t __ro_after_init opt_ibpb_ctxt_switch = -1;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 static bool __initdata opt_branch_harden = true;
@@ -114,6 +118,9 @@ static int __init cf_check parse_spec_ctrl(const char *s)
             opt_rsb_hvm = false;
             opt_md_clear_pv = 0;
             opt_md_clear_hvm = 0;
+            opt_ibpb_entry_pv = 0;
+            opt_ibpb_entry_hvm = 0;
+            opt_ibpb_entry_dom0 = false;
 
             opt_thunk = THUNK_JMP;
             opt_ibrs = 0;
@@ -140,12 +147,14 @@ static int __init cf_check parse_spec_ctrl(const char *s)
             opt_msr_sc_pv = val;
             opt_rsb_pv = val;
             opt_md_clear_pv = val;
+            opt_ibpb_entry_pv = val;
         }
         else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
         {
             opt_msr_sc_hvm = val;
             opt_rsb_hvm = val;
             opt_md_clear_hvm = val;
+            opt_ibpb_entry_hvm = val;
         }
         else if ( (val = parse_boolean("msr-sc", s, ss)) != -1 )
         {
@@ -210,6 +219,28 @@ static int __init cf_check parse_spec_ctrl(const char *s)
                 break;
             }
         }
+        else if ( (val = parse_boolean("ibpb-entry", s, ss)) != -1 )
+        {
+            switch ( val )
+            {
+            case 0:
+            case 1:
+                opt_ibpb_entry_pv = opt_ibpb_entry_hvm =
+                    opt_ibpb_entry_dom0 = val;
+                break;
+
+            case -2:
+                s += strlen("ibpb-entry=");
+                if ( (val = parse_boolean("pv", s, ss)) >= 0 )
+                    opt_ibpb_entry_pv = val;
+                else if ( (val = parse_boolean("hvm", s, ss)) >= 0 )
+                    opt_ibpb_entry_hvm = val;
+                else
+            default:
+                    rc = -EINVAL;
+                break;
+            }
+        }
 
         /* Xen's speculative sidechannel mitigation settings. */
         else if ( !strncmp(s, "bti-thunk=", 10) )
@@ -477,27 +508,31 @@ static void __init print_details(enum ind_thunk thunk, uint64_t caps)
      * mitigation support for guests.
      */
 #ifdef CONFIG_HVM
-    printk("  Support for HVM VMs:%s%s%s%s%s\n",
+    printk("  Support for HVM VMs:%s%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_HVM) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_HVM) ||
             boot_cpu_has(X86_FEATURE_MD_CLEAR)   ||
+            boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_HVM)      ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_HVM)      ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
-           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
+           boot_cpu_has(X86_FEATURE_IBPB_ENTRY_HVM)  ? " IBPB-entry"    : "");
 
 #endif
 #ifdef CONFIG_PV
-    printk("  Support for PV VMs:%s%s%s%s%s\n",
+    printk("  Support for PV VMs:%s%s%s%s%s%s\n",
            (boot_cpu_has(X86_FEATURE_SC_MSR_PV) ||
             boot_cpu_has(X86_FEATURE_SC_RSB_PV) ||
             boot_cpu_has(X86_FEATURE_MD_CLEAR)  ||
+            boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV) ||
             opt_eager_fpu)                           ? ""               : " None",
            boot_cpu_has(X86_FEATURE_SC_MSR_PV)       ? " MSR_SPEC_CTRL" : "",
            boot_cpu_has(X86_FEATURE_SC_RSB_PV)       ? " RSB"           : "",
            opt_eager_fpu                             ? " EAGER_FPU"     : "",
-           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "");
+           boot_cpu_has(X86_FEATURE_MD_CLEAR)        ? " MD_CLEAR"      : "",
+           boot_cpu_has(X86_FEATURE_IBPB_ENTRY_PV)   ? " IBPB-entry"    : "");
 
     printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
            opt_xpti_hwdom ? "enabled" : "disabled",
@@ -760,6 +795,55 @@ static bool __init should_use_eager_fpu(void)
     }
 }
 
+static void __init ibpb_calculations(void)
+{
+    /* Check we have hardware IBPB support before using it... */
+    if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
+    {
+        opt_ibpb_entry_hvm = opt_ibpb_entry_pv = opt_ibpb_ctxt_switch = 0;
+        opt_ibpb_entry_dom0 = false;
+        return;
+    }
+
+    /*
+     * IBPB-on-entry mitigations for Branch Type Confusion.
+     *
+     * IBPB && !BTC_NO selects all AMD/Hygon hardware, not known to be safe,
+     * that we can provide some form of mitigation on.
+     */
+    if ( opt_ibpb_entry_pv == -1 )
+        opt_ibpb_entry_pv = (IS_ENABLED(CONFIG_PV) &&
+                             boot_cpu_has(X86_FEATURE_IBPB) &&
+                             !boot_cpu_has(X86_FEATURE_BTC_NO));
+    if ( opt_ibpb_entry_hvm == -1 )
+        opt_ibpb_entry_hvm = (IS_ENABLED(CONFIG_HVM) &&
+                              boot_cpu_has(X86_FEATURE_IBPB) &&
+                              !boot_cpu_has(X86_FEATURE_BTC_NO));
+
+    if ( opt_ibpb_entry_pv )
+    {
+        setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_PV);
+
+        /*
+         * We only need to flush in IST context if we're protecting against PV
+         * guests.  HVM IBPB-on-entry protections are both atomic with
+         * NMI/#MC, so can't interrupt Xen ahead of having already flushed the
+         * BTB.
+         */
+        default_spec_ctrl_flags |= SCF_ist_ibpb;
+    }
+    if ( opt_ibpb_entry_hvm )
+        setup_force_cpu_cap(X86_FEATURE_IBPB_ENTRY_HVM);
+
+    /*
+     * If we're using IBPB-on-entry to protect against PV and HVM guests
+     * (ignoring dom0 if trusted), then there's no need to also issue IBPB on
+     * context switch too.
+     */
+    if ( opt_ibpb_ctxt_switch == -1 )
+        opt_ibpb_ctxt_switch = !(opt_ibpb_entry_hvm && opt_ibpb_entry_pv);
+}
+
 /* Calculate whether this CPU is vulnerable to L1TF. */
 static __init void l1tf_calculations(uint64_t caps)
 {
@@ -1015,8 +1099,12 @@ void spec_ctrl_init_domain(struct domain *d)
     bool verw = ((pv ? opt_md_clear_pv : opt_md_clear_hvm) ||
                  (opt_fb_clear_mmio && is_iommu_enabled(d)));
 
+    bool ibpb = ((pv ? opt_ibpb_entry_pv : opt_ibpb_entry_hvm) &&
+                 (d->domain_id != 0 || opt_ibpb_entry_dom0));
+
     d->arch.spec_ctrl_flags =
         (verw   ? SCF_verw         : 0) |
+        (ibpb   ? SCF_entry_ibpb   : 0) |
         0;
 }
 
@@ -1163,12 +1251,15 @@ void __init init_speculation_mitigations(void)
     }
 
     /*
-     * Use STIBP by default if the hardware hint is set.  Otherwise, leave it
-     * off as it a severe performance pentalty on pre-eIBRS Intel hardware
-     * where it was retrofitted in microcode.
+     * Use STIBP by default on all AMD systems.  Zen3 and later enumerate
+     * STIBP_ALWAYS, but STIBP is needed on Zen2 as part of the mitigations
+     * for Branch Type Confusion.
+     *
+     * Leave STIBP off by default on Intel.  Pre-eIBRS systems suffer a
+     * substantial perf hit when it was implemented in microcode.
      */
     if ( opt_stibp == -1 )
-        opt_stibp = !!boot_cpu_has(X86_FEATURE_STIBP_ALWAYS);
+        opt_stibp = !!boot_cpu_has(X86_FEATURE_AMD_STIBP);
 
     if ( opt_stibp && (boot_cpu_has(X86_FEATURE_STIBP) ||
                        boot_cpu_has(X86_FEATURE_AMD_STIBP)) )
@@ -1240,9 +1331,7 @@ void __init init_speculation_mitigations(void)
     if ( opt_rsb_hvm )
         setup_force_cpu_cap(X86_FEATURE_SC_RSB_HVM);
 
-    /* Check we have hardware IBPB support before using it... */
-    if ( !boot_cpu_has(X86_FEATURE_IBRSB) && !boot_cpu_has(X86_FEATURE_IBPB) )
-        opt_ibpb_ctxt_switch = false;
+    ibpb_calculations();
 
     /* Check whether Eager FPU should be enabled by default. */
     if ( opt_eager_fpu == -1 )
Re: [oss-security] Xen Security Advisory 407 v1 (CVE-2022-23816,CVE-2022-23825,CVE-2022-29900) - Retbleed - arbitrary speculative code execution with return instructions
Posted by Salvatore Bonaccorso 1 year, 9 months ago
Hi,

On Tue, Jul 12, 2022 at 04:36:10PM +0000, Xen.org security team wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
> 
>  Xen Security Advisory CVE-2022-23816,CVE-2022-23825,CVE-2022-29900 / XSA-407
> 
>    Retbleed - arbitrary speculative code execution with return instructions
> 
> ISSUE DESCRIPTION
> =================
> 
> Researchers at ETH Zurich have discovered Retbleed, allowing for
> arbitrary speculative execution in a victim context.
> 
> For more details, see:
>   https://comsec.ethz.ch/retbleed
> 
> ETH Zurich have allocated CVE-2022-29900 for AMD and CVE-2022-29901 for
> Intel.
> 
> Despite the similar preconditions, these are very different
> microarchitectural behaviours between vendors.
> 
> On AMD CPUs, Retbleed is one specific instance of a more general
> microarchitectural behaviour called Branch Type Confusion.  AMD have
> assigned CVE-2022-23816 (Retbleed) and CVE-2022-23825 (Branch Type
> Confusion).
> 
> For more details, see:
>   https://www.amd.com/en/corporate/product-security/bulletin/amd-sb-1037

Is it confirmed that AMD is not using CVE-2022-29900? The above
amd-sb-1037 also references both CVE-2022-23825 (Branch Type
Confusion) and CVE-2022-29900 (RETbleed), so I assume they agreed to
use CVE-2022-29900 for Retbleed?

So should the Xen advisory also use CVE-2022-23825,CVE-2022-29900
and CVE-2022-29901?

Regards,
Salvatore
Re: [oss-security] Xen Security Advisory 407 v1 (CVE-2022-23816,CVE-2022-23825,CVE-2022-29900) - Retbleed - arbitrary speculative code execution with return instructions
Posted by Salvatore Bonaccorso 1 year, 9 months ago
Hi,

On Tue, Jul 12, 2022 at 09:27:07PM +0200, Salvatore Bonaccorso wrote:
> Hi,
> 
> On Tue, Jul 12, 2022 at 04:36:10PM +0000, Xen.org security team wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA256
> > 
> >  Xen Security Advisory CVE-2022-23816,CVE-2022-23825,CVE-2022-29900 / XSA-407
> > 
> >    Retbleed - arbitrary speculative code execution with return instructions
> > 
> > ISSUE DESCRIPTION
> > =================
> > 
> > Researchers at ETH Zurich have discovered Retbleed, allowing for
> > arbitrary speculative execution in a victim context.
> > 
> > For more details, see:
> >   https://comsec.ethz.ch/retbleed
> > 
> > ETH Zurich have allocated CVE-2022-29900 for AMD and CVE-2022-29901 for
> > Intel.
> > 
> > Despite the similar preconditions, these are very different
> > microarchitectural behaviours between vendors.
> > 
> > On AMD CPUs, Retbleed is one specific instance of a more general
> > microarchitectural behaviour called Branch Type Confusion.  AMD have
> > assigned CVE-2022-23816 (Retbleed) and CVE-2022-23825 (Branch Type
> > Confusion).
> > 
> > For more details, see:
> >   https://www.amd.com/en/corporate/product-security/bulletin/amd-sb-1037
> 
> Is it confirmed that AMD is not using CVE-2022-29900? The above
> amd-sb-1037 also references both CVE-2022-23825 (Branch Type
> Confusion) and CVE-2022-29900 (RETbleed), so I assume they agreed to
> use CVE-2022-29900 for Retbleed?
> 
> So should the Xen advisory also use CVE-2022-23825,CVE-2022-29900
> and CVE-2022-29901?

Nevermind, I misunderstood the wording and made a thinko; the advisory
just mentions all the related CVEs correctly. It might turn out
that CVE-2022-23816 will not be used, but then the title would read
only as 

Xen Security Advisory CVE-2022-23825,CVE-2022-29900 / XSA-407

So please disregard the question above.

Salvatore
Re: [oss-security] Xen Security Advisory 407 v1 (CVE-2022-23816,CVE-2022-23825,CVE-2022-29900) - Retbleed - arbitrary speculative code execution with return instructions
Posted by Andrew Cooper 1 year, 9 months ago
On 12/07/2022 20:34, Salvatore Bonaccorso wrote:
> Hi,
>
> On Tue, Jul 12, 2022 at 09:27:07PM +0200, Salvatore Bonaccorso wrote:
>> Hi,
>>
>> On Tue, Jul 12, 2022 at 04:36:10PM +0000, Xen.org security team wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA256
>>>
>>>  Xen Security Advisory CVE-2022-23816,CVE-2022-23825,CVE-2022-29900 / XSA-407
>>>
>>>    Retbleed - arbitrary speculative code execution with return instructions
>>>
>>> ISSUE DESCRIPTION
>>> =================
>>>
>>> Researchers at ETH Zurich have discovered Retbleed, allowing for
>>> arbitrary speculative execution in a victim context.
>>>
>>> For more details, see:
>>>   https://comsec.ethz.ch/retbleed
>>>
>>> ETH Zurich have allocated CVE-2022-29900 for AMD and CVE-2022-29901 for
>>> Intel.
>>>
>>> Despite the similar preconditions, these are very different
>>> microarchitectural behaviours between vendors.
>>>
>>> On AMD CPUs, Retbleed is one specific instance of a more general
>>> microarchitectural behaviour called Branch Type Confusion.  AMD have
>>> assigned CVE-2022-23816 (Retbleed) and CVE-2022-23825 (Branch Type
>>> Confusion).
>>>
>>> For more details, see:
>>>   https://www.amd.com/en/corporate/product-security/bulletin/amd-sb-1037
>> Is it confirmed that AMD is not using CVE-2022-29900? The above
>> amd-sb-1037 also references both CVE-2022-23825 (Branch Type
>> Confusion) and CVE-2022-29900 (RETbleed), so I assume they agreed to
>> use CVE-2022-29900 for Retbleed?
>>
>> So should the Xen advisory also use CVE-2022-23825,CVE-2022-29900
>> and CVE-2022-29901?
> Nevermind, I misunderstood the wording and made a thinko; the advisory
> just mentions all the related CVEs correctly. It might turn out
> that CVE-2022-23816 will not be used, but then the title would read
> only as 
>
> Xen Security Advisory CVE-2022-23825,CVE-2022-29900 / XSA-407
>
> So please disregard the question above.

/sigh

AMD changed the CVE in the bulletin between the final draft, and what
went public.

CVE-2022-23816 has been referenced by multiple other vendors too, so is
definitely out in the world.  Hopefully MITRE will close out one of
CVE-2022-23816 and CVE-2022-29900 as a dup of the other.

For now, I think the least confusing option is to keep both referenced.

~Andrew