Xen Security Advisory 453 v1 (CVE-2024-2193) - GhostRace: Speculative Race Conditions

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

            Xen Security Advisory CVE-2024-2193 / XSA-453

                GhostRace: Speculative Race Conditions

ISSUE DESCRIPTION
=================

Researchers at VU Amsterdam and IBM Research have discovered GhostRace,
an analysis of the behaviour of synchronisation primitives under
speculative execution.

Synchronisation primitives are typically formed as an unbounded loop
which waits until a resource is available to be accessed.  This means
there is a conditional branch which can be microarchitecturally bypassed
using Spectre-v1 techniques, allowing an attacker to speculatively
execute critical regions.

Therefore, while a critical region might be safe architecturally, it can
still suffer from data races under speculation with unsafe consequences.
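
As a purely illustrative sketch (simplified C using a GCC atomic
builtin, not code taken from Xen or from the paper), a basic
test-and-set acquire loop has exactly this shape:

    /* Illustrative only: minimal test-and-set lock acquisition. */
    static void example_lock(volatile int *lock)
    {
        while ( __sync_lock_test_and_set(lock, 1) ) /* conditional branch */
            while ( *lock )
                ;                                   /* spin until free */

        /*
         * Architecturally, the code following the loop only runs with the
         * lock held.  If the branch above is mispredicted, the critical
         * region can be entered speculatively without the lock, racing
         * the real lock holder.
         */
    }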

The GhostRace paper focuses on Speculative Concurrent Use-After-Free
issues, but notes that there are many other types of speculative data
hazard to be explored.

For more details, see:
  https://vusec.net/projects/ghostrace

IMPACT
======

An attacker might be able to infer the contents of arbitrary host
memory, including memory assigned to other guests.

VULNERABLE SYSTEMS
==================

Systems running all versions of Xen are affected.

GhostRace is a variation of Spectre-v1, and Spectre-v1 is known to
affect a wide range of CPU architectures and designs.  Consult your
hardware vendor.

However, Xen does not have any known gadgets vulnerable to GhostRace at
the time of writing.

Furthermore, even with the vulnerable instance found in Linux, the
researchers had to insert an artificial syscall to make the instance
more accessible to a userspace attacker.

Therefore, the Xen Security Team does not believe that immediate action
is required.

MITIGATION
==========

There are no mitigations.

RESOLUTION
==========

Out of caution, the Xen Security Team have provided hardening patches
including the addition of a new LOCK_HARDEN mechanism on x86 similar to
the existing BRANCH_HARDEN.

LOCK_HARDEN is off by default, owing to the uncertainty of there being a
vulnerability under Xen, and uncertainty over the performance impact.

However, we expect more research to happen in this area, and feel it is
prudent to have a mitigation in place.
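
For example, assuming a hypervisor built from these patches with the
default CONFIG_SPECULATIVE_HARDEN_LOCK=y, the mitigation can be enabled
by adding the option to Xen's command line in the bootloader
configuration:

    spec-ctrl=lock-harden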

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa453/xsa453-?.patch           xen-unstable
xsa453/xsa453-4.18-?.patch      Xen 4.18.x
xsa453/xsa453-4.17-?.patch      Xen 4.17.x
xsa453/xsa453-4.16-?.patch      Xen 4.16.x
xsa453/xsa453-4.15-?.patch      Xen 4.15.x
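
As an illustration (a sketch only; branch names follow the usual Xen
conventions and may need adjusting for your tree), applying the 4.18
patches might look like:

    $ git clone https://xenbits.xen.org/git-http/xen.git
    $ cd xen
    $ git checkout stable-4.18
    $ for p in ../xsa453/xsa453-4.18-?.patch; do patch -p1 < "$p"; done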

$ sha256sum xsa453*/*
5487c6595b114187191e09bc5d7510d228a018ca98bc43ef58f8225fbd843636  xsa453/xsa453-1.patch
1d4ae5ce07f6869dbc20342289d8a00937868014b1c8a69054815cce7a836761  xsa453/xsa453-2.patch
a873074149a74ce1f6252cdaa20e5432930f77caf59cf328ce6c2e0b000e1f3b  xsa453/xsa453-3.patch
12b3e60005f50df1b7050984f0b7545eadc5e99425dae3b4d186c67a4caaeee4  xsa453/xsa453-4.15-1.patch
0b242be1d3fa0c4bcbb3e7755c0267ec6d307c75eec2e7348d8405017af0ab06  xsa453/xsa453-4.15-2.patch
2d17d1586e4b20a5c7677c3ab4553971251c123570c05d1adfede671f5e1d501  xsa453/xsa453-4.15-3.patch
8d209d1c9d3585bd190f9c97d866ff30ef18514ebf874869e5881b5856d3b81e  xsa453/xsa453-4.15-4.patch
350dbcb1f22874f5545936c307a69ae8acd8eef5f24dfccfe2ba2d1e8997c14d  xsa453/xsa453-4.15-5.patch
334fe9512a90c84210a010d9aff82b96eac00d9beb8291a243339e5ca9fb69c2  xsa453/xsa453-4.15-6.patch
bc3781df298eba4b306b742a8b06869eb83c5619a4dd3ae0ddd746a96708e3ea  xsa453/xsa453-4.15-7.patch
b8f0798863f70c65b20809f6749ef17e098f74e944386a7c8199396a7aab7295  xsa453/xsa453-4.15-8.patch
85c66b0f6fad0df2a705a48f75506142cacdf39bab1b68bb22ce4924d3ddae1c  xsa453/xsa453-4.16-1.patch
35416e86df8b55e0d165edef33557d3232c6c7b56ea36fb12278242134279fae  xsa453/xsa453-4.16-2.patch
1f6f09b860d7dc4add0356dd544d85faab6750a5dc72d15438e77322498c0d39  xsa453/xsa453-4.16-3.patch
8d209d1c9d3585bd190f9c97d866ff30ef18514ebf874869e5881b5856d3b81e  xsa453/xsa453-4.16-4.patch
350dbcb1f22874f5545936c307a69ae8acd8eef5f24dfccfe2ba2d1e8997c14d  xsa453/xsa453-4.16-5.patch
f03fba4192ec375220557c6488986c4bb0acb130fcdc61c0a3fe7bb48ffeaf98  xsa453/xsa453-4.16-6.patch
702330fe49015e174fac88cc290cc4ba78af97cc27ca6ac6d612a7f3de264ca1  xsa453/xsa453-4.16-7.patch
cc25536abac03b92a3486df8db4a89aecb8447aa1d31870def4ebf90782017df  xsa453/xsa453-4.16-8.patch
9b0e67756cb0f98721f748f76b767da88cad22969bf32052f9171e0260c8c596  xsa453/xsa453-4.17-1.patch
1cde6cae3738a380d35b769d44344d8e92585d9f4f8bccff1cae933b3d7dd5c8  xsa453/xsa453-4.17-2.patch
dbd117b3482ff24b146ee4936a691ed796ae073abd1c66db5cb5b5ede04c82ea  xsa453/xsa453-4.17-3.patch
00f78778eb392aeda13803bb321d255335fea27abd3beb8fcc70a49ce81fcb3c  xsa453/xsa453-4.17-4.patch
9bad3d96b74ceb9ce6232d4b4e434f7a023ad6ed31f6ff074869e037f6b296c6  xsa453/xsa453-4.17-5.patch
d62b1014347fcb7b6575fe0a1145b358719154655afd007a36739f6fe10cb4d6  xsa453/xsa453-4.17-6.patch
ba6597f3bf859ae38eef675e3540fc8f79dd2a672486c0fbe31a5740cafeffcd  xsa453/xsa453-4.17-7.patch
eb92c317c367689e401d20ce9ff2e5e5b5c551bc8f36424012ccc71c3df240e3  xsa453/xsa453-4.17-8.patch
70334588834939d8e06f0ec3edec2f0e10c1fc5af11aac01a71e6c78075f7352  xsa453/xsa453-4.18-1.patch
7960863a4917ae994a20c5dcd93f080b328749ef24108a5ec436b4a32ff12f07  xsa453/xsa453-4.18-2.patch
57306cbd89f4dc6c65ad89f3a7fedf3b84ebd28f423b54de8a18d8bc247bfbc5  xsa453/xsa453-4.18-3.patch
6280c40626e8d190e4c7216d7574be2bcf5a8143509640a6241706c21fdc3336  xsa453/xsa453-4.18-4.patch
cc9206b7bde3748b3ac58c338f1b233aae25be91fa1a56442e54030037188509  xsa453/xsa453-4.18-5.patch
12ddaedad54794bf7f64b4954e167dca92bfa53a658f3eeec9bd93ce282eee65  xsa453/xsa453-4.18-6.patch
86d1972ca5a01167d4f8da28256e2183227e7d1d0e5245dc85521b260299c64e  xsa453/xsa453-4.18-7.patch
0feec9819a74ab61664e31fff1a0df4b1fe4145fd62fcd5ca7dfc6566f9f938c  xsa453/xsa453-4.patch
9c22f02fe450fc5a05121040f8137b2755c2d196b0a777643587a166ab29a5e6  xsa453/xsa453-5.patch
ef4312c837f6e295796c1bc9a70f5ae27ac846e7149694c9c1f13b10e2b92945  xsa453/xsa453-6.patch
e7b8750f00c9d2018b4c43cceaf931837ea84ee2a8bf40aaf694e1f2f13c7ef1  xsa453/xsa453-7.patch
$

NOTE ABOUT IPI LIVELOCK
=======================

An observation from the GhostRace paper, unrelated to speculation, is the
ability of userspace to livelock the kernel with IPIs.  While the
GhostRace paper is specific to Linux, similar primitives exist for guest
kernels.

However, after analysis and experimentation, the Xen Security Team are
not aware of a way for a guest kernel to mount a similar attack against
Xen.
-----BEGIN PGP SIGNATURE-----

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmXwhb8MHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZWrEH/jb7eEkcdFGvVFvuBbU4dNrEx61eql7LdHjbvLg+
8PkdhjRafl3h766tqilbZiF+ZhM/HmV3i+5t7x6+HhsO59eMuWLghVC1woy0H6VI
QSVAio918183Z7HogcSBw1Z1dFup7rTX3aX9hi/TLARN0VY1mxH3hmxJ7iNYsBHw
mLjgcRXj+aM7iRmIMveWAJD39UU9KVV4F2jDaJl+ay2vH5dwrtlKMdI7Yv9lY45P
USAZxWQJ35ifpZtVTN6C38LzkHPJRvpZib7K+DnfIAaZIwWr10ZSjS+LxK+UMaYJ
fejYte+ki40uS0E7AhlesBSQb7C6qDM8GJbMtwj6en5LN14=
=V/0y
-----END PGP SIGNATURE-----
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: xen: Swap order of actions in the FREE*() macros

Wherever possible, it is a good idea to NULL out the visible reference to an
object prior to freeing it.  The FREE*() macros already collect together both
parts, making it easy to adjust.

This has a marginal code generation improvement, as some of the calls to the
free() function can be tailcall optimised.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index bb29b352ec06..3e84960a365f 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -93,8 +93,9 @@ bool scrub_free_pages(void);
 
 /* Free an allocation, and zero the pointer to it. */
 #define FREE_XENHEAP_PAGES(p, o) do { \
-    free_xenheap_pages(p, o);         \
+    void *_ptr_ = (p);                \
     (p) = NULL;                       \
+    free_xenheap_pages(_ptr_, o);     \
 } while ( false )
 #define FREE_XENHEAP_PAGE(p) FREE_XENHEAP_PAGES(p, 0)
 
diff --git a/xen/include/xen/xmalloc.h b/xen/include/xen/xmalloc.h
index 9ecddbff5e00..1b88a83be879 100644
--- a/xen/include/xen/xmalloc.h
+++ b/xen/include/xen/xmalloc.h
@@ -66,9 +66,10 @@
 extern void xfree(void *p);
 
 /* Free an allocation, and zero the pointer to it. */
-#define XFREE(p) do { \
-    xfree(p);         \
-    (p) = NULL;       \
+#define XFREE(p) do {                       \
+    void *_ptr_ = (p);                      \
+    (p) = NULL;                             \
+    xfree(_ptr_);                           \
 } while ( false )
 
 /* Underlying functions */

From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: x86/spinlock: introduce support for blocking speculation into
 critical regions

Introduce a new Kconfig option to block speculation into lock protected
critical regions.  The Kconfig option is enabled by default, but the mitigation
won't be engaged unless it's explicitly enabled in the command line using
`spec-ctrl=lock-harden`.

Convert the spinlock acquire macros into always-inline functions, and introduce
a speculation barrier after the lock has been taken.  Note the speculation
barrier is not placed inside the implementation of the spin lock functions, so
as to prevent speculation from falling through the call to the lock functions,
resulting in the barrier also being skipped.

trylock variants are protected using a construct akin to the existing
evaluate_nospec().

This patch only implements the speculation barrier for x86.

Note spin locks are the only locking primitive taken care of in this change;
further locking primitives will be adjusted by separate changes.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index a54d77528ae9..54edbc0fbc1a 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2366,7 +2366,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 >              {msr-sc,rsb,verw,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >              eager-fpu,l1d-flush,branch-harden,srb-lock,
->              unpriv-mmio,gds-mit,div-scrub}=<bool> ]`
+>              unpriv-mmio,gds-mit,div-scrub,lock-harden}=<bool> ]`
 
 Controls for speculative execution sidechannel mitigations.  By default, Xen
 will pick the most appropriate mitigations based on compiled in support,
@@ -2493,6 +2493,11 @@ On all hardware, the `div-scrub=` option can be used to force or prevent Xen
 from mitigating the DIV-leakage vulnerability.  By default, Xen will mitigate
 DIV-leakage on hardware believed to be vulnerable.
 
+If Xen is compiled with `CONFIG_SPECULATIVE_HARDEN_LOCK`, the `lock-harden=`
+boolean can be used to force or prevent Xen from using speculation barriers to
+protect lock critical regions.  This mitigation won't be engaged by default,
+and needs to be explicitly enabled on the command line.
+
 ### sync_console
 > `= <boolean>`
 
diff --git a/xen/arch/x86/include/asm/cpufeatures.h b/xen/arch/x86/include/asm/cpufeatures.h
index c3aad21c3b43..7e8221fd85dd 100644
--- a/xen/arch/x86/include/asm/cpufeatures.h
+++ b/xen/arch/x86/include/asm/cpufeatures.h
@@ -24,7 +24,7 @@ XEN_CPUFEATURE(APERFMPERF,        X86_SYNTH( 8)) /* APERFMPERF */
 XEN_CPUFEATURE(MFENCE_RDTSC,      X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */
 XEN_CPUFEATURE(XEN_SMEP,          X86_SYNTH(10)) /* SMEP gets used by Xen itself */
 XEN_CPUFEATURE(XEN_SMAP,          X86_SYNTH(11)) /* SMAP gets used by Xen itself */
-/* Bit 12 unused. */
+XEN_CPUFEATURE(SC_NO_LOCK_HARDEN, X86_SYNTH(12)) /* (Disable) Lock critical region hardening */
 XEN_CPUFEATURE(IND_THUNK_LFENCE,  X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */
 XEN_CPUFEATURE(IND_THUNK_JMP,     X86_SYNTH(14)) /* Use IND_THUNK_JMP */
 XEN_CPUFEATURE(SC_NO_BRANCH_HARDEN, X86_SYNTH(15)) /* (Disable) Conditional branch hardening */
diff --git a/xen/arch/x86/include/asm/nospec.h b/xen/arch/x86/include/asm/nospec.h
index 07606834c4c9..e058a3bb0e61 100644
--- a/xen/arch/x86/include/asm/nospec.h
+++ b/xen/arch/x86/include/asm/nospec.h
@@ -62,6 +62,32 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 /* Override default implementation in nospec.h. */
 #define array_index_mask_nospec array_index_mask_nospec
 
+static always_inline void arch_block_lock_speculation(void)
+{
+    alternative("lfence", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+}
+
+/* Allow to insert a read memory barrier into conditionals */
+static always_inline bool barrier_lock_true(void)
+{
+    alternative("lfence #nospec-true", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+    return true;
+}
+
+static always_inline bool barrier_lock_false(void)
+{
+    alternative("lfence #nospec-false", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+    return false;
+}
+
+static always_inline bool arch_lock_evaluate_nospec(bool condition)
+{
+    if ( condition )
+        return barrier_lock_true();
+    else
+        return barrier_lock_false();
+}
+
 #endif /* _ASM_X86_NOSPEC_H */
 
 /*
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 0638d9980a61..0b670c3ca753 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -53,6 +53,7 @@ int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 static bool __initdata opt_branch_harden =
     IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH);
+static bool __initdata opt_lock_harden;
 
 bool __initdata bsp_delay_spec_ctrl;
 uint8_t __read_mostly default_xen_spec_ctrl;
@@ -121,6 +122,7 @@ static int __init cf_check parse_spec_ctrl(const char *s)
             opt_ssbd = false;
             opt_l1d_flush = 0;
             opt_branch_harden = false;
+            opt_lock_harden = false;
             opt_srb_lock = 0;
             opt_unpriv_mmio = false;
             opt_gds_mit = 0;
@@ -286,6 +288,16 @@ static int __init cf_check parse_spec_ctrl(const char *s)
                 rc = -EINVAL;
             }
         }
+        else if ( (val = parse_boolean("lock-harden", s, ss)) >= 0 )
+        {
+            if ( IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
+                opt_lock_harden = val;
+            else
+            {
+                no_config_param("SPECULATIVE_HARDEN_LOCK", "spec-ctrl", s, ss);
+                rc = -EINVAL;
+            }
+        }
         else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
             opt_srb_lock = val;
         else if ( (val = parse_boolean("unpriv-mmio", s, ss)) >= 0 )
@@ -488,7 +500,8 @@ static void __init print_details(enum ind_thunk thunk)
     if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) ||
          IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_ARRAY) ||
          IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) ||
-         IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) )
+         IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) ||
+         IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
         printk("  Compiled-in support:"
 #ifdef CONFIG_INDIRECT_THUNK
                " INDIRECT_THUNK"
@@ -504,11 +517,14 @@ static void __init print_details(enum ind_thunk thunk)
 #endif
 #ifdef CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS
                " HARDEN_GUEST_ACCESS"
+#endif
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+               " HARDEN_LOCK"
 #endif
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
+    printk("  Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s%s\n",
            thunk != THUNK_NONE      ? "BTI-Thunk: " : "",
            thunk == THUNK_NONE      ? "" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE, " :
@@ -535,7 +551,8 @@ static void __init print_details(enum ind_thunk thunk)
            opt_verw_pv || opt_verw_hvm ||
            opt_verw_mmio                             ? " VERW"  : "",
            opt_div_scrub                             ? " DIV" : "",
-           opt_branch_harden                         ? " BRANCH_HARDEN" : "");
+           opt_branch_harden                         ? " BRANCH_HARDEN" : "",
+           opt_lock_harden                           ? " LOCK_HARDEN" : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
     if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
@@ -1918,6 +1935,9 @@ void __init init_speculation_mitigations(void)
     if ( !opt_branch_harden )
         setup_force_cpu_cap(X86_FEATURE_SC_NO_BRANCH_HARDEN);
 
+    if ( !opt_lock_harden )
+        setup_force_cpu_cap(X86_FEATURE_SC_NO_LOCK_HARDEN);
+
     /*
      * We do not disable HT by default on affected hardware.
      *
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 310ad4229cdf..a5c3d5a6bf2f 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -188,6 +188,23 @@ config SPECULATIVE_HARDEN_GUEST_ACCESS
 
 	  If unsure, say Y.
 
+config SPECULATIVE_HARDEN_LOCK
+	bool "Speculative lock context hardening"
+	default y
+	depends on X86
+	help
+	  Contemporary processors may use speculative execution as a
+	  performance optimisation, but this can potentially be abused by an
+	  attacker to leak data via speculative sidechannels.
+
+	  One source of data leakage is via speculative accesses to lock
+	  critical regions.
+
+	  This option is disabled by default at run time, and needs to be
+	  enabled on the command line.
+
+	  If unsure, say Y.
+
 endmenu
 
 config DIT_DEFAULT
diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
index 4c250ebbd663..e8d73f9538e5 100644
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -69,6 +69,21 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 #define array_access_nospec(array, index)                               \
     (array)[array_index_nospec(index, ARRAY_SIZE(array))]
 
+static always_inline void block_lock_speculation(void)
+{
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+    arch_block_lock_speculation();
+#endif
+}
+
+static always_inline bool lock_evaluate_nospec(bool condition)
+{
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+    return arch_lock_evaluate_nospec(condition);
+#endif
+    return condition;
+}
+
 #endif /* XEN_NOSPEC_H */
 
 /*
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index 0e6a083dfb9e..8430a888a8ca 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -1,6 +1,7 @@
 #ifndef __SPINLOCK_H__
 #define __SPINLOCK_H__
 
+#include <xen/nospec.h>
 #include <xen/time.h>
 #include <xen/types.h>
 
@@ -202,13 +203,30 @@ int _spin_trylock_recursive(spinlock_t *lock);
 void _spin_lock_recursive(spinlock_t *lock);
 void _spin_unlock_recursive(spinlock_t *lock);
 
-#define spin_lock(l)                  _spin_lock(l)
-#define spin_lock_cb(l, c, d)         _spin_lock_cb(l, c, d)
-#define spin_lock_irq(l)              _spin_lock_irq(l)
+static always_inline void spin_lock(spinlock_t *l)
+{
+    _spin_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void spin_lock_cb(spinlock_t *l, void (*c)(void *data),
+                                       void *d)
+{
+    _spin_lock_cb(l, c, d);
+    block_lock_speculation();
+}
+
+static always_inline void spin_lock_irq(spinlock_t *l)
+{
+    _spin_lock_irq(l);
+    block_lock_speculation();
+}
+
 #define spin_lock_irqsave(l, f)                                 \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _spin_lock_irqsave(l));                          \
+        block_lock_speculation();                               \
     })
 
 #define spin_unlock(l)                _spin_unlock(l)
@@ -216,7 +234,7 @@ void _spin_unlock_recursive(spinlock_t *lock);
 #define spin_unlock_irqrestore(l, f)  _spin_unlock_irqrestore(l, f)
 
 #define spin_is_locked(l)             _spin_is_locked(l)
-#define spin_trylock(l)               _spin_trylock(l)
+#define spin_trylock(l)               lock_evaluate_nospec(_spin_trylock(l))
 
 #define spin_trylock_irqsave(lock, flags)       \
 ({                                              \
@@ -237,8 +255,15 @@ void _spin_unlock_recursive(spinlock_t *lock);
  * are any critical regions that cannot form part of such a set, they can use
  * standard spin_[un]lock().
  */
-#define spin_trylock_recursive(l)     _spin_trylock_recursive(l)
-#define spin_lock_recursive(l)        _spin_lock_recursive(l)
+#define spin_trylock_recursive(l) \
+    lock_evaluate_nospec(_spin_trylock_recursive(l))
+
+static always_inline void spin_lock_recursive(spinlock_t *l)
+{
+    _spin_lock_recursive(l);
+    block_lock_speculation();
+}
+
 #define spin_unlock_recursive(l)      _spin_unlock_recursive(l)
 
 #endif /* __SPINLOCK_H__ */
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: rwlock: introduce support for blocking speculation into critical
 regions

Introduce inline wrappers as required and add direct calls to
block_lock_speculation() in order to prevent speculation into the rwlock
protected critical regions.

Note the rwlock primitives are adjusted to use the non-speculation-safe variants
of the spinlock handlers, as a speculation barrier is added in the rwlock
calling wrappers.

trylock variants are protected by using lock_evaluate_nospec().

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index 18224a4bb5d6..290602936df6 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -34,8 +34,11 @@ void queue_read_lock_slowpath(rwlock_t *lock)
 
     /*
      * Put the reader into the wait queue.
+     *
+     * Use the speculation unsafe helper, as it's the caller responsibility to
+     * issue a speculation barrier if required.
      */
-    spin_lock(&lock->lock);
+    _spin_lock(&lock->lock);
 
     /*
      * At the head of the wait queue now, wait until the writer state
@@ -66,8 +69,13 @@ void queue_write_lock_slowpath(rwlock_t *lock)
 {
     u32 cnts;
 
-    /* Put the writer into the wait queue. */
-    spin_lock(&lock->lock);
+    /*
+     * Put the writer into the wait queue.
+     *
+     * Use the speculation unsafe helper, as it's the caller responsibility to
+     * issue a speculation barrier if required.
+     */
+    _spin_lock(&lock->lock);
 
     /* Try to acquire the lock directly if no reader is present. */
     if ( !atomic_read(&lock->cnts) &&
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index 08ba46de1552..ffff0fad45a7 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -259,27 +259,49 @@ static inline int _rw_is_write_locked(const rwlock_t *lock)
     return (atomic_read(&lock->cnts) & _QW_WMASK) == _QW_LOCKED;
 }
 
-#define read_lock(l)                  _read_lock(l)
-#define read_lock_irq(l)              _read_lock_irq(l)
+static always_inline void read_lock(rwlock_t *l)
+{
+    _read_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void read_lock_irq(rwlock_t *l)
+{
+    _read_lock_irq(l);
+    block_lock_speculation();
+}
+
 #define read_lock_irqsave(l, f)                                 \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _read_lock_irqsave(l));                          \
+        block_lock_speculation();                               \
     })
 
 #define read_unlock(l)                _read_unlock(l)
 #define read_unlock_irq(l)            _read_unlock_irq(l)
 #define read_unlock_irqrestore(l, f)  _read_unlock_irqrestore(l, f)
-#define read_trylock(l)               _read_trylock(l)
+#define read_trylock(l)               lock_evaluate_nospec(_read_trylock(l))
+
+static always_inline void write_lock(rwlock_t *l)
+{
+    _write_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void write_lock_irq(rwlock_t *l)
+{
+    _write_lock_irq(l);
+    block_lock_speculation();
+}
 
-#define write_lock(l)                 _write_lock(l)
-#define write_lock_irq(l)             _write_lock_irq(l)
 #define write_lock_irqsave(l, f)                                \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _write_lock_irqsave(l));                         \
+        block_lock_speculation();                               \
     })
-#define write_trylock(l)              _write_trylock(l)
+#define write_trylock(l)              lock_evaluate_nospec(_write_trylock(l))
 
 #define write_unlock(l)               _write_unlock(l)
 #define write_unlock_irq(l)           _write_unlock_irq(l)
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/paging: Delete update_cr3()'s do_locking parameter

Nicola reports that the XSA-438 fix introduced new MISRA violations because of
some incidental tidying it tried to do.  The parameter is useless, so resolve
the MISRA regression by removing it.

hap_update_cr3() discards the parameter entirely, while sh_update_cr3() uses
it to distinguish internal and external callers and therefore whether the
paging lock should be taken.

However, we have paging_lock_recursive() for this purpose, which also avoids
the ability for the shadow internal callers to accidentally not hold the lock.

Fixes: fb0ff49fe9f7 ("x86/shadow: defer releasing of PV's top-level shadow reference")
Reported-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit e71157d1ac2a7fbf413130663cf0a93ff9fbcf7e)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index fa479d3d97b3..63c29da696dd 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -728,7 +728,7 @@ static bool_t hap_invlpg(struct vcpu *v, unsigned long linear)
     return 1;
 }
 
-static pagetable_t hap_update_cr3(struct vcpu *v, bool do_locking, bool noflush)
+static pagetable_t hap_update_cr3(struct vcpu *v, bool noflush)
 {
     v->arch.hvm.hw_cr[3] = v->arch.hvm.guest_cr[3];
     hvm_update_guest_cr3(v, noflush);
@@ -818,7 +818,7 @@ static void hap_update_paging_modes(struct vcpu *v)
     }
 
     /* CR3 is effectively updated by a mode change. Flush ASIDs, etc. */
-    hap_update_cr3(v, 0, false);
+    hap_update_cr3(v, false);
 
  unlock:
     paging_unlock(d);
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 04ceca4a52b0..364d74f595cc 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -2630,7 +2630,7 @@ static void sh_update_paging_modes(struct vcpu *v)
     }
 #endif /* OOS */
 
-    v->arch.paging.mode->update_cr3(v, 0, false);
+    v->arch.paging.mode->update_cr3(v, false);
 }
 
 void shadow_update_paging_modes(struct vcpu *v)
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 35d47d6fbb3d..5f73bba41bfb 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2884,7 +2884,7 @@ static int sh_page_fault(struct vcpu *v,
          * In any case, in the PAE case, the ASSERT is not true; it can
          * happen because of actions the guest is taking. */
 #if GUEST_PAGING_LEVELS == 3
-        v->arch.paging.mode->update_cr3(v, 0, false);
+        v->arch.paging.mode->update_cr3(v, false);
 #else
         ASSERT(d->is_shutting_down);
 #endif
@@ -3604,17 +3604,13 @@ sh_detach_old_tables(struct vcpu *v)
     }
 }
 
-static pagetable_t
-sh_update_cr3(struct vcpu *v, bool do_locking, bool noflush)
+static pagetable_t sh_update_cr3(struct vcpu *v, bool noflush)
 /* Updates vcpu->arch.cr3 after the guest has changed CR3.
  * Paravirtual guests should set v->arch.guest_table (and guest_table_user,
  * if appropriate).
  * HVM guests should also make sure hvm_get_guest_cntl_reg(v, 3) works;
  * this function will call hvm_update_guest_cr(v, 3) to tell them where the
  * shadow tables are.
- * If do_locking != 0, assume we are being called from outside the
- * shadow code, and must take and release the paging lock; otherwise
- * that is the caller's responsibility.
  */
 {
     struct domain *d = v->domain;
@@ -3632,7 +3628,11 @@ sh_update_cr3(struct vcpu *v, bool do_locking, bool noflush)
         return old_entry;
     }
 
-    if ( do_locking ) paging_lock(v->domain);
+    /*
+     * This is used externally (with the paging lock not taken) and internally
+     * by the shadow code (with the lock already taken).
+     */
+    paging_lock_recursive(v->domain);
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
     /* Need to resync all the shadow entries on a TLB flush.  Resync
@@ -3870,8 +3870,7 @@ sh_update_cr3(struct vcpu *v, bool do_locking, bool noflush)
     shadow_sync_other_vcpus(v);
 #endif
 
-    /* Release the lock, if we took it (otherwise it's the caller's problem) */
-    if ( do_locking ) paging_unlock(v->domain);
+    paging_unlock(v->domain);
 
     return old_entry;
 }
diff --git a/xen/arch/x86/mm/shadow/none.c b/xen/arch/x86/mm/shadow/none.c
index 9b9c03ce7e1c..f80a2ba0f3e0 100644
--- a/xen/arch/x86/mm/shadow/none.c
+++ b/xen/arch/x86/mm/shadow/none.c
@@ -50,7 +50,7 @@ static unsigned long _gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m,
     return gfn_x(INVALID_GFN);
 }
 
-static pagetable_t _update_cr3(struct vcpu *v, bool do_locking, bool noflush)
+static pagetable_t _update_cr3(struct vcpu *v, bool noflush)
 {
     ASSERT_UNREACHABLE();
     return pagetable_null();
diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
index 5bcdbf93a770..3ad8aab47135 100644
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -136,8 +136,7 @@ struct paging_mode {
                                             unsigned long cr3,
                                             paddr_t ga, uint32_t *pfec,
                                             unsigned int *page_order);
-    pagetable_t   (*update_cr3            )(struct vcpu *v, bool do_locking,
-                                            bool noflush);
+    pagetable_t   (*update_cr3            )(struct vcpu *v, bool noflush);
     void          (*update_paging_modes   )(struct vcpu *v);
     bool          (*flush_tlb             )(bool (*flush_vcpu)(void *ctxt,
                                                                struct vcpu *v),
@@ -311,7 +310,7 @@ static inline unsigned long paging_ga_to_gfn_cr3(struct vcpu *v,
  * as the value to load into the host CR3 to schedule this vcpu */
 static inline pagetable_t paging_update_cr3(struct vcpu *v, bool noflush)
 {
-    return paging_get_hostmode(v)->update_cr3(v, 1, noflush);
+    return paging_get_hostmode(v)->update_cr3(v, noflush);
 }
 
 /* Update all the things that are derived from the guest's CR0/CR3/CR4.

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: xen: Swap order of actions in the FREE*() macros

Wherever possible, it is a good idea to NULL out the visible reference to an
object prior to freeing it.  The FREE*() macros already collect together both
parts, making it easy to adjust.

This has a marginal code generation improvement, as some of the calls to the
free() function can be tailcall optimised.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c4f427ec879e7c0df6d44d02561e8bee838a293e)

diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 667f9dac83a4..7fc8a8898afb 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -80,8 +80,9 @@ bool scrub_free_pages(void);
 
 /* Free an allocation, and zero the pointer to it. */
 #define FREE_XENHEAP_PAGES(p, o) do { \
-    free_xenheap_pages(p, o);         \
+    void *_ptr_ = (p);                \
     (p) = NULL;                       \
+    free_xenheap_pages(_ptr_, o);     \
 } while ( false )
 #define FREE_XENHEAP_PAGE(p) FREE_XENHEAP_PAGES(p, 0)
 
diff --git a/xen/include/xen/xmalloc.h b/xen/include/xen/xmalloc.h
index 16979a117c6a..d857298011c1 100644
--- a/xen/include/xen/xmalloc.h
+++ b/xen/include/xen/xmalloc.h
@@ -66,9 +66,10 @@
 extern void xfree(void *);
 
 /* Free an allocation, and zero the pointer to it. */
-#define XFREE(p) do { \
-    xfree(p);         \
-    (p) = NULL;       \
+#define XFREE(p) do {                       \
+    void *_ptr_ = (p);                      \
+    (p) = NULL;                             \
+    xfree(_ptr_);                           \
 } while ( false )
 
 /* Underlying functions */
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: x86/spinlock: introduce support for blocking speculation into
 critical regions

Introduce a new Kconfig option to block speculation into lock protected
critical regions.  The Kconfig option is enabled by default, but the mitigation
won't be engaged unless it's explicitly enabled in the command line using
`spec-ctrl=lock-harden`.

Convert the spinlock acquire macros into always-inline functions, and introduce
a speculation barrier after the lock has been taken.  Note the speculation
barrier is not placed inside the implementation of the spin lock functions, so
as to prevent speculation from falling through the call to the lock functions,
resulting in the barrier also being skipped.

trylock variants are protected using a construct akin to the existing
evaluate_nospec().

This patch only implements the speculation barrier for x86.

Note spin locks are the only locking primitive taken care of in this change;
further locking primitives will be adjusted by separate changes.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 7ef0084418e188d05f338c3e028fbbe8b6924afa)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index d020d79dde25..5e63e3d8224e 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2189,7 +2189,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 >              {msr-sc,rsb,verw,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >              eager-fpu,l1d-flush,branch-harden,srb-lock,
->              unpriv-mmio,gds-mit,div-scrub}=<bool> ]`
+>              unpriv-mmio,gds-mit,div-scrub,lock-harden}=<bool> ]`
 
 Controls for speculative execution sidechannel mitigations.  By default, Xen
 will pick the most appropriate mitigations based on compiled in support,
@@ -2314,6 +2314,11 @@ On all hardware, the `div-scrub=` option can be used to force or prevent Xen
 from mitigating the DIV-leakage vulnerability.  By default, Xen will mitigate
 DIV-leakage on hardware believed to be vulnerable.
 
+If Xen is compiled with `CONFIG_SPECULATIVE_HARDEN_LOCK`, the `lock-harden=`
+boolean can be used to force or prevent Xen from using speculation barriers to
+protect lock critical regions.  This mitigation won't be engaged by default,
+and needs to be explicitly enabled on the command line.
+
 ### sync_console
 > `= <boolean>`
 
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index dd86b89bb153..b24c36c99e6f 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -63,6 +63,7 @@ int8_t __read_mostly opt_ibpb_ctxt_switch = -1;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 bool __read_mostly opt_branch_harden = true;
+static bool __initdata opt_lock_harden;
 
 bool __initdata bsp_delay_spec_ctrl;
 uint8_t __read_mostly default_xen_spec_ctrl;
@@ -131,6 +132,7 @@ static int __init parse_spec_ctrl(const char *s)
             opt_ssbd = false;
             opt_l1d_flush = 0;
             opt_branch_harden = false;
+            opt_lock_harden = false;
             opt_srb_lock = 0;
             opt_unpriv_mmio = false;
             opt_gds_mit = 0;
@@ -282,6 +284,16 @@ static int __init parse_spec_ctrl(const char *s)
             opt_l1d_flush = val;
         else if ( (val = parse_boolean("branch-harden", s, ss)) >= 0 )
             opt_branch_harden = val;
+        else if ( (val = parse_boolean("lock-harden", s, ss)) >= 0 )
+        {
+            if ( IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
+                opt_lock_harden = val;
+            else
+            {
+                no_config_param("SPECULATIVE_HARDEN_LOCK", "spec-ctrl", s, ss);
+                rc = -EINVAL;
+            }
+        }
         else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
             opt_srb_lock = val;
         else if ( (val = parse_boolean("unpriv-mmio", s, ss)) >= 0 )
@@ -481,18 +493,22 @@ static void __init print_details(enum ind_thunk thunk)
            (e21a & cpufeat_mask(X86_FEATURE_SBPB))           ? " SBPB"           : "");
 
     /* Compiled-in support which pertains to mitigations. */
-    if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
+    if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) ||
+         IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
         printk("  Compiled-in support:"
 #ifdef CONFIG_INDIRECT_THUNK
                " INDIRECT_THUNK"
 #endif
 #ifdef CONFIG_SHADOW_PAGING
                " SHADOW_PAGING"
+#endif
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+               " HARDEN_LOCK"
 #endif
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
@@ -518,7 +534,8 @@ static void __init print_details(enum ind_thunk thunk)
            opt_verw_pv || opt_verw_hvm ||
            opt_verw_mmio                             ? " VERW"  : "",
            opt_div_scrub                             ? " DIV" : "",
-           opt_branch_harden                         ? " BRANCH_HARDEN" : "");
+           opt_branch_harden                         ? " BRANCH_HARDEN" : "",
+           opt_lock_harden                           ? " LOCK_HARDEN" : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
     if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
@@ -1816,6 +1833,9 @@ void __init init_speculation_mitigations(void)
     if ( opt_branch_harden )
         setup_force_cpu_cap(X86_FEATURE_SC_BRANCH_HARDEN);
 
+    if ( !opt_lock_harden )
+        setup_force_cpu_cap(X86_FEATURE_SC_NO_LOCK_HARDEN);
+
     /*
      * We do not disable HT by default on affected hardware.
      *
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index eb953d171eb2..aa18d423b23c 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -129,6 +129,23 @@ config SPECULATIVE_HARDEN_GUEST_ACCESS
 
 	  If unsure, say Y.
 
+config SPECULATIVE_HARDEN_LOCK
+	bool "Speculative lock context hardening"
+	default y
+	depends on X86
+	help
+	  Contemporary processors may use speculative execution as a
+	  performance optimisation, but this can potentially be abused by an
+	  attacker to leak data via speculative sidechannels.
+
+	  One source of data leakage is via speculative accesses to lock
+	  critical regions.
+
+	  This option is disabled by default at run time, and needs to be
+	  enabled on the command line.
+
+	  If unsure, say Y.
+
 endmenu
 
 config HYPFS
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index d993e06e4ce8..24a182d37f21 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -24,7 +24,7 @@ XEN_CPUFEATURE(APERFMPERF,        X86_SYNTH( 8)) /* APERFMPERF */
 XEN_CPUFEATURE(MFENCE_RDTSC,      X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */
 XEN_CPUFEATURE(XEN_SMEP,          X86_SYNTH(10)) /* SMEP gets used by Xen itself */
 XEN_CPUFEATURE(XEN_SMAP,          X86_SYNTH(11)) /* SMAP gets used by Xen itself */
-/* Bit 12 - unused. */
+XEN_CPUFEATURE(SC_NO_LOCK_HARDEN, X86_SYNTH(12)) /* (Disable) Lock critical region hardening */
 XEN_CPUFEATURE(IND_THUNK_LFENCE,  X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */
 XEN_CPUFEATURE(IND_THUNK_JMP,     X86_SYNTH(14)) /* Use IND_THUNK_JMP */
 XEN_CPUFEATURE(SC_BRANCH_HARDEN,  X86_SYNTH(15)) /* Conditional Branch Hardening */
diff --git a/xen/include/asm-x86/nospec.h b/xen/include/asm-x86/nospec.h
index f6eb84eee554..e38f56cbe8f4 100644
--- a/xen/include/asm-x86/nospec.h
+++ b/xen/include/asm-x86/nospec.h
@@ -27,6 +27,32 @@ static always_inline void block_speculation(void)
     barrier_nospec_true();
 }
 
+static always_inline void arch_block_lock_speculation(void)
+{
+    alternative("lfence", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+}
+
+/* Allow to insert a read memory barrier into conditionals */
+static always_inline bool barrier_lock_true(void)
+{
+    alternative("lfence #nospec-true", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+    return true;
+}
+
+static always_inline bool barrier_lock_false(void)
+{
+    alternative("lfence #nospec-false", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+    return false;
+}
+
+static always_inline bool arch_lock_evaluate_nospec(bool condition)
+{
+    if ( condition )
+        return barrier_lock_true();
+    else
+        return barrier_lock_false();
+}
+
 #endif /* _ASM_X86_NOSPEC_H */
 
 /*
diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
index 76255bc46efe..455284640396 100644
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -70,6 +70,21 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 #define array_access_nospec(array, index)                               \
     (array)[array_index_nospec(index, ARRAY_SIZE(array))]
 
+static always_inline void block_lock_speculation(void)
+{
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+    arch_block_lock_speculation();
+#endif
+}
+
+static always_inline bool lock_evaluate_nospec(bool condition)
+{
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+    return arch_lock_evaluate_nospec(condition);
+#endif
+    return condition;
+}
+
 #endif /* XEN_NOSPEC_H */
 
 /*
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index 9fa4e600c1f7..efdb21ea9072 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -1,6 +1,7 @@
 #ifndef __SPINLOCK_H__
 #define __SPINLOCK_H__
 
+#include <xen/nospec.h>
 #include <xen/time.h>
 #include <asm/system.h>
 #include <asm/spinlock.h>
@@ -189,13 +190,30 @@ int _spin_trylock_recursive(spinlock_t *lock);
 void _spin_lock_recursive(spinlock_t *lock);
 void _spin_unlock_recursive(spinlock_t *lock);
 
-#define spin_lock(l)                  _spin_lock(l)
-#define spin_lock_cb(l, c, d)         _spin_lock_cb(l, c, d)
-#define spin_lock_irq(l)              _spin_lock_irq(l)
+static always_inline void spin_lock(spinlock_t *l)
+{
+    _spin_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void spin_lock_cb(spinlock_t *l, void (*c)(void *data),
+                                       void *d)
+{
+    _spin_lock_cb(l, c, d);
+    block_lock_speculation();
+}
+
+static always_inline void spin_lock_irq(spinlock_t *l)
+{
+    _spin_lock_irq(l);
+    block_lock_speculation();
+}
+
 #define spin_lock_irqsave(l, f)                                 \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _spin_lock_irqsave(l));                          \
+        block_lock_speculation();                               \
     })
 
 #define spin_unlock(l)                _spin_unlock(l)
@@ -203,7 +221,7 @@ void _spin_unlock_recursive(spinlock_t *lock);
 #define spin_unlock_irqrestore(l, f)  _spin_unlock_irqrestore(l, f)
 
 #define spin_is_locked(l)             _spin_is_locked(l)
-#define spin_trylock(l)               _spin_trylock(l)
+#define spin_trylock(l)               lock_evaluate_nospec(_spin_trylock(l))
 
 #define spin_trylock_irqsave(lock, flags)       \
 ({                                              \
@@ -224,8 +242,15 @@ void _spin_unlock_recursive(spinlock_t *lock);
  * are any critical regions that cannot form part of such a set, they can use
  * standard spin_[un]lock().
  */
-#define spin_trylock_recursive(l)     _spin_trylock_recursive(l)
-#define spin_lock_recursive(l)        _spin_lock_recursive(l)
+#define spin_trylock_recursive(l) \
+    lock_evaluate_nospec(_spin_trylock_recursive(l))
+
+static always_inline void spin_lock_recursive(spinlock_t *l)
+{
+    _spin_lock_recursive(l);
+    block_lock_speculation();
+}
+
 #define spin_unlock_recursive(l)      _spin_unlock_recursive(l)
 
 #endif /* __SPINLOCK_H__ */
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: rwlock: introduce support for blocking speculation into critical
 regions

Introduce inline wrappers as required and add direct calls to
block_lock_speculation() in order to prevent speculation into the rwlock
protected critical regions.

Note the rwlock primitives are adjusted to use the non-speculation-safe variants
of the spinlock handlers, as a speculation barrier is added in the rwlock
calling wrappers.

trylock variants are protected by using lock_evaluate_nospec().

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit a1fb15f61692b1fa9945fc51f55471ace49cdd59)

diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index dadab372b5e1..2464f745485d 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -34,8 +34,11 @@ void queue_read_lock_slowpath(rwlock_t *lock)
 
     /*
      * Put the reader into the wait queue.
+     *
+     * Use the speculation unsafe helper, as it's the caller responsibility to
+     * issue a speculation barrier if required.
      */
-    spin_lock(&lock->lock);
+    _spin_lock(&lock->lock);
 
     /*
      * At the head of the wait queue now, wait until the writer state
@@ -64,8 +67,13 @@ void queue_write_lock_slowpath(rwlock_t *lock)
 {
     u32 cnts;
 
-    /* Put the writer into the wait queue. */
-    spin_lock(&lock->lock);
+    /*
+     * Put the writer into the wait queue.
+     *
+     * Use the speculation unsafe helper, as it's the caller responsibility to
+     * issue a speculation barrier if required.
+     */
+    _spin_lock(&lock->lock);
 
     /* Try to acquire the lock directly if no reader is present. */
     if ( !atomic_read(&lock->cnts) &&
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index 0cc9167715b3..fd0458be94ae 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -247,27 +247,49 @@ static inline int _rw_is_write_locked(rwlock_t *lock)
     return (atomic_read(&lock->cnts) & _QW_WMASK) == _QW_LOCKED;
 }
 
-#define read_lock(l)                  _read_lock(l)
-#define read_lock_irq(l)              _read_lock_irq(l)
+static always_inline void read_lock(rwlock_t *l)
+{
+    _read_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void read_lock_irq(rwlock_t *l)
+{
+    _read_lock_irq(l);
+    block_lock_speculation();
+}
+
 #define read_lock_irqsave(l, f)                                 \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _read_lock_irqsave(l));                          \
+        block_lock_speculation();                               \
     })
 
 #define read_unlock(l)                _read_unlock(l)
 #define read_unlock_irq(l)            _read_unlock_irq(l)
 #define read_unlock_irqrestore(l, f)  _read_unlock_irqrestore(l, f)
-#define read_trylock(l)               _read_trylock(l)
+#define read_trylock(l)               lock_evaluate_nospec(_read_trylock(l))
+
+static always_inline void write_lock(rwlock_t *l)
+{
+    _write_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void write_lock_irq(rwlock_t *l)
+{
+    _write_lock_irq(l);
+    block_lock_speculation();
+}
 
-#define write_lock(l)                 _write_lock(l)
-#define write_lock_irq(l)             _write_lock_irq(l)
 #define write_lock_irqsave(l, f)                                \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _write_lock_irqsave(l));                         \
+        block_lock_speculation();                               \
     })
-#define write_trylock(l)              _write_trylock(l)
+#define write_trylock(l)              lock_evaluate_nospec(_write_trylock(l))
 
 #define write_unlock(l)               _write_unlock(l)
 #define write_unlock_irq(l)           _write_unlock_irq(l)
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: percpu-rwlock: introduce support for blocking speculation into
 critical regions

Add direct calls to block_lock_speculation() where required in order to prevent
speculation into the lock protected critical regions.  Also convert
_percpu_read_lock() from inline to always_inline.

Note that _percpu_write_lock() has been modified to use the non-speculation-safe
variant of the locking primitives, as a speculation barrier is added
unconditionally by the calling wrapper.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit f218daf6d3a3b847736d37c6a6b76031a0d08441)

diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index 2464f745485d..703276f4aa63 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -125,8 +125,12 @@ void _percpu_write_lock(percpu_rwlock_t **per_cpudata,
     /*
      * First take the write lock to protect against other writers or slow
      * path readers.
+     *
+     * Note we use the speculation unsafe variant of write_lock(), as the
+     * calling wrapper already adds a speculation barrier after the lock has
+     * been taken.
      */
-    write_lock(&percpu_rwlock->rwlock);
+    _write_lock(&percpu_rwlock->rwlock);
 
     /* Now set the global variable so that readers start using read_lock. */
     percpu_rwlock->writer_activating = 1;
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index fd0458be94ae..abe0804bf7d5 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -326,8 +326,8 @@ static inline void _percpu_rwlock_owner_check(percpu_rwlock_t **per_cpudata,
 #define percpu_rwlock_resource_init(l, owner) \
     (*(l) = (percpu_rwlock_t)PERCPU_RW_LOCK_UNLOCKED(&get_per_cpu_var(owner)))
 
-static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
-                                         percpu_rwlock_t *percpu_rwlock)
+static always_inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
+                                            percpu_rwlock_t *percpu_rwlock)
 {
     /* Validate the correct per_cpudata variable has been provided. */
     _percpu_rwlock_owner_check(per_cpudata, percpu_rwlock);
@@ -362,6 +362,8 @@ static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
     }
     else
     {
+        /* Other branch already has a speculation barrier in read_lock(). */
+        block_lock_speculation();
         /* All other paths have implicit check_lock() calls via read_lock(). */
         check_lock(&percpu_rwlock->rwlock.lock.debug, false);
     }
@@ -410,8 +412,12 @@ static inline void _percpu_write_unlock(percpu_rwlock_t **per_cpudata,
     _percpu_read_lock(&get_per_cpu_var(percpu), lock)
 #define percpu_read_unlock(percpu, lock) \
     _percpu_read_unlock(&get_per_cpu_var(percpu), lock)
-#define percpu_write_lock(percpu, lock) \
-    _percpu_write_lock(&get_per_cpu_var(percpu), lock)
+
+#define percpu_write_lock(percpu, lock)                 \
+({                                                      \
+    _percpu_write_lock(&get_per_cpu_var(percpu), lock); \
+    block_lock_speculation();                           \
+})
 #define percpu_write_unlock(percpu, lock) \
     _percpu_write_unlock(&get_per_cpu_var(percpu), lock)
 
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: locking: attempt to ensure lock wrappers are always inline

In order to prevent the locking speculation barriers from being inside of
`call`ed functions that could be speculatively bypassed.

While there, also add an extra locking barrier to _mm_write_lock() in the branch
taken when the lock is already held.

Note some functions are switched to use the unsafe variants (without speculation
barrier) of the locking primitives, but a speculation barrier is always added
to the exposed public lock wrapping helper.  That's the case with
sched_spin_lock_double() or pcidevs_lock() for example.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 197ecd838a2aaf959a469df3696d4559c4f8b762)

diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 4d4d22b0e72b..6c0923a42e21 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -161,7 +161,7 @@ static int pt_irq_masked(struct periodic_time *pt)
  * pt->vcpu field, because another thread holding the pt_migrate lock
  * may already be spinning waiting for your vcpu lock.
  */
-static void pt_vcpu_lock(struct vcpu *v)
+static always_inline void pt_vcpu_lock(struct vcpu *v)
 {
     spin_lock(&v->arch.hvm.tm_lock);
 }
@@ -180,9 +180,13 @@ static void pt_vcpu_unlock(struct vcpu *v)
  * need to take an additional lock that protects against pt->vcpu
  * changing.
  */
-static void pt_lock(struct periodic_time *pt)
+static always_inline void pt_lock(struct periodic_time *pt)
 {
-    read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
+    /*
+     * Use the speculation unsafe variant for the first lock, as the following
+     * lock taking helper already includes a speculation barrier.
+     */
+    _read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
     spin_lock(&pt->vcpu->arch.hvm.tm_lock);
 }
 
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index d6c073dc5cf5..cc635a440571 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -88,8 +88,8 @@ static inline void _set_lock_level(int l)
     this_cpu(mm_lock_level) = l;
 }
 
-static inline void _mm_lock(const struct domain *d, mm_lock_t *l,
-                            const char *func, int level, int rec)
+static always_inline void _mm_lock(const struct domain *d, mm_lock_t *l,
+                                   const char *func, int level, int rec)
 {
     if ( !((mm_locked_by_me(l)) && rec) )
         _check_lock_level(d, level);
@@ -139,8 +139,8 @@ static inline int mm_write_locked_by_me(mm_rwlock_t *l)
     return (l->locker == get_processor_id());
 }
 
-static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
-                                  const char *func, int level)
+static always_inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
+                                         const char *func, int level)
 {
     if ( !mm_write_locked_by_me(l) )
     {
@@ -151,6 +151,8 @@ static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
         l->unlock_level = _get_lock_level();
         _set_lock_level(_lock_level(d, level));
     }
+    else
+        block_speculation();
     l->recurse_count++;
 }
 
@@ -164,8 +166,8 @@ static inline void mm_write_unlock(mm_rwlock_t *l)
     percpu_write_unlock(p2m_percpu_rwlock, &l->lock);
 }
 
-static inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
-                                 int level)
+static always_inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
+                                        int level)
 {
     _check_lock_level(d, level);
     percpu_read_lock(p2m_percpu_rwlock, &l->lock);
@@ -180,15 +182,15 @@ static inline void mm_read_unlock(mm_rwlock_t *l)
 
 /* This wrapper uses the line number to express the locking order below */
 #define declare_mm_lock(name)                                                 \
-    static inline void mm_lock_##name(const struct domain *d, mm_lock_t *l,   \
-                                      const char *func, int rec)              \
+    static always_inline void mm_lock_##name(                                 \
+        const struct domain *d, mm_lock_t *l, const char *func, int rec)      \
     { _mm_lock(d, l, func, MM_LOCK_ORDER_##name, rec); }
 #define declare_mm_rwlock(name)                                               \
-    static inline void mm_write_lock_##name(const struct domain *d,           \
-                                            mm_rwlock_t *l, const char *func) \
+    static always_inline void mm_write_lock_##name(                           \
+        const struct domain *d, mm_rwlock_t *l, const char *func)             \
     { _mm_write_lock(d, l, func, MM_LOCK_ORDER_##name); }                     \
-    static inline void mm_read_lock_##name(const struct domain *d,            \
-                                           mm_rwlock_t *l)                    \
+    static always_inline void mm_read_lock_##name(const struct domain *d,     \
+                                                  mm_rwlock_t *l)             \
     { _mm_read_lock(d, l, MM_LOCK_ORDER_##name); }
 /* These capture the name of the calling function */
 #define mm_lock(name, d, l) mm_lock_##name(d, l, __func__, 0)
@@ -321,7 +323,7 @@ declare_mm_lock(altp2mlist)
 #define MM_LOCK_ORDER_altp2m                 40
 declare_mm_rwlock(altp2m);
 
-static inline void p2m_lock(struct p2m_domain *p)
+static always_inline void p2m_lock(struct p2m_domain *p)
 {
     if ( p2m_is_altp2m(p) )
         mm_write_lock(altp2m, p->domain, &p->lock);
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index c1d26936831c..f9a830fb20b3 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -34,7 +34,7 @@
 #define superpage_aligned(_x)  (((_x)&(SUPERPAGE_PAGES-1))==0)
 
 /* Enforce lock ordering when grabbing the "external" page_alloc lock */
-static inline void lock_page_alloc(struct p2m_domain *p2m)
+static always_inline void lock_page_alloc(struct p2m_domain *p2m)
 {
     page_alloc_mm_pre_lock(p2m->domain);
     spin_lock(&(p2m->domain->page_alloc_lock));
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index c94ea74b121e..29e1e41eba19 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -57,7 +57,7 @@
  * just assume the event channel is free or unbound at the moment when the
  * evtchn_read_trylock() returns false.
  */
-static inline void evtchn_write_lock(struct evtchn *evtchn)
+static always_inline void evtchn_write_lock(struct evtchn *evtchn)
 {
     write_lock(&evtchn->lock);
 
@@ -323,8 +323,8 @@ static long evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
     return rc;
 }
 
-
-static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn)
+static always_inline void double_evtchn_lock(struct evtchn *lchn,
+                                             struct evtchn *rchn)
 {
     ASSERT(lchn != rchn);
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 01e426c67fb6..60d8ca3ddff4 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -397,7 +397,7 @@ static inline void act_set_gfn(struct active_grant_entry *act, gfn_t gfn)
 
 static DEFINE_PERCPU_RWLOCK_GLOBAL(grant_rwlock);
 
-static inline void grant_read_lock(struct grant_table *gt)
+static always_inline void grant_read_lock(struct grant_table *gt)
 {
     percpu_read_lock(grant_rwlock, &gt->lock);
 }
@@ -407,7 +407,7 @@ static inline void grant_read_unlock(struct grant_table *gt)
     percpu_read_unlock(grant_rwlock, &gt->lock);
 }
 
-static inline void grant_write_lock(struct grant_table *gt)
+static always_inline void grant_write_lock(struct grant_table *gt)
 {
     percpu_write_lock(grant_rwlock, &gt->lock);
 }
@@ -444,7 +444,7 @@ nr_active_grant_frames(struct grant_table *gt)
     return num_act_frames_from_sha_frames(nr_grant_frames(gt));
 }
 
-static inline struct active_grant_entry *
+static always_inline struct active_grant_entry *
 active_entry_acquire(struct grant_table *t, grant_ref_t e)
 {
     struct active_grant_entry *act;
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 03ace41540d6..9e80ad4c7463 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -348,23 +348,28 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
  * This avoids dead- or live-locks when this code is running on both
  * cpus at the same time.
  */
-static void sched_spin_lock_double(spinlock_t *lock1, spinlock_t *lock2,
-                                   unsigned long *flags)
+static always_inline void sched_spin_lock_double(
+    spinlock_t *lock1, spinlock_t *lock2, unsigned long *flags)
 {
+    /*
+     * In order to avoid extra overhead, use the locking primitives without the
+     * speculation barrier, and introduce a single barrier here.
+     */
     if ( lock1 == lock2 )
     {
-        spin_lock_irqsave(lock1, *flags);
+        *flags = _spin_lock_irqsave(lock1);
     }
     else if ( lock1 < lock2 )
     {
-        spin_lock_irqsave(lock1, *flags);
-        spin_lock(lock2);
+        *flags = _spin_lock_irqsave(lock1);
+        _spin_lock(lock2);
     }
     else
     {
-        spin_lock_irqsave(lock2, *flags);
-        spin_lock(lock1);
+        *flags = _spin_lock_irqsave(lock2);
+        _spin_lock(lock1);
     }
+    block_lock_speculation();
 }
 
 static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index 8c07f033d3b0..3ac3eac54a41 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -207,8 +207,24 @@ DECLARE_PER_CPU(cpumask_t, cpumask_scratch);
 #define cpumask_scratch        (&this_cpu(cpumask_scratch))
 #define cpumask_scratch_cpu(c) (&per_cpu(cpumask_scratch, c))
 
+/*
+ * Deal with _spin_lock_irqsave() returning the flags value instead of storing
+ * it in a passed parameter.
+ */
+#define _sched_spinlock0(lock, irq) _spin_lock##irq(lock)
+#define _sched_spinlock1(lock, irq, arg) ({ \
+    BUILD_BUG_ON(sizeof(arg) != sizeof(unsigned long)); \
+    (arg) = _spin_lock##irq(lock); \
+})
+
+#define _sched_spinlock__(nr) _sched_spinlock ## nr
+#define _sched_spinlock_(nr)  _sched_spinlock__(nr)
+#define _sched_spinlock(lock, irq, args...) \
+    _sched_spinlock_(count_args(args))(lock, irq, ## args)
+
 #define sched_lock(kind, param, cpu, irq, arg...) \
-static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
+static always_inline spinlock_t \
+*kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
 { \
     for ( ; ; ) \
     { \
@@ -220,10 +236,16 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
          * \
          * It may also be the case that v->processor may change but the \
          * lock may be the same; this will succeed in that case. \
+         * \
+         * Use the speculation unsafe locking helper, there's a speculation \
+         * barrier before returning to the caller. \
          */ \
-        spin_lock##irq(lock, ## arg); \
+        _sched_spinlock(lock, irq, ## arg); \
         if ( likely(lock == get_sched_res(cpu)->schedule_lock) ) \
+        { \
+            block_lock_speculation(); \
             return lock; \
+        } \
         spin_unlock##irq(lock, ## arg); \
     } \
 }
diff --git a/xen/common/timer.c b/xen/common/timer.c
index 1bb265ceea0e..dc831efc79e5 100644
--- a/xen/common/timer.c
+++ b/xen/common/timer.c
@@ -240,7 +240,7 @@ static inline void deactivate_timer(struct timer *timer)
     list_add(&timer->inactive, &per_cpu(timers, timer->cpu).inactive);
 }
 
-static inline bool_t timer_lock(struct timer *timer)
+static inline bool_t timer_lock_unsafe(struct timer *timer)
 {
     unsigned int cpu;
 
@@ -254,7 +254,8 @@ static inline bool_t timer_lock(struct timer *timer)
             rcu_read_unlock(&timer_cpu_read_lock);
             return 0;
         }
-        spin_lock(&per_cpu(timers, cpu).lock);
+        /* Use the speculation unsafe variant, the wrapper has the barrier. */
+        _spin_lock(&per_cpu(timers, cpu).lock);
         if ( likely(timer->cpu == cpu) )
             break;
         spin_unlock(&per_cpu(timers, cpu).lock);
@@ -267,8 +268,9 @@ static inline bool_t timer_lock(struct timer *timer)
 #define timer_lock_irqsave(t, flags) ({         \
     bool_t __x;                                 \
     local_irq_save(flags);                      \
-    if ( !(__x = timer_lock(t)) )               \
+    if ( !(__x = timer_lock_unsafe(t)) )        \
         local_irq_restore(flags);               \
+    block_lock_speculation();                   \
     __x;                                        \
 })
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 84d0d2fbab94..034b36fc232b 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -52,9 +52,10 @@ struct pci_seg {
 
 static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
 
-void pcidevs_lock(void)
+/* Do not use, as it has no speculation barrier, use pcidevs_lock() instead. */
+void pcidevs_lock_unsafe(void)
 {
-    spin_lock_recursive(&_pcidevs_lock);
+    _spin_lock_recursive(&_pcidevs_lock);
 }
 
 void pcidevs_unlock(void)
diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
index 7c825e9d9c0a..d4b2beda798d 100644
--- a/xen/include/asm-x86/irq.h
+++ b/xen/include/asm-x86/irq.h
@@ -177,6 +177,7 @@ extern void irq_complete_move(struct irq_desc *);
 
 extern struct irq_desc *irq_desc;
 
+/* Not speculation safe, only used for AP bringup. */
 void lock_vector_lock(void);
 void unlock_vector_lock(void);
 
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index a0a85cdda899..cdfa3df6f9fb 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -105,12 +105,12 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
 #define bucket_from_port(d, p) \
     ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
 
-static inline void evtchn_read_lock(struct evtchn *evtchn)
+static always_inline void evtchn_read_lock(struct evtchn *evtchn)
 {
     read_lock(&evtchn->lock);
 }
 
-static inline bool evtchn_read_trylock(struct evtchn *evtchn)
+static always_inline bool evtchn_read_trylock(struct evtchn *evtchn)
 {
     return read_trylock(&evtchn->lock);
 }
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index cd238ae852b0..5eb8287a31f9 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -144,8 +144,12 @@ struct pci_dev {
  * devices, it also sync the access to the msi capability that is not
  * interrupt handling related (the mask bit register).
  */
-
-void pcidevs_lock(void);
+void pcidevs_lock_unsafe(void);
+static always_inline void pcidevs_lock(void)
+{
+    pcidevs_lock_unsafe();
+    block_lock_speculation();
+}
 void pcidevs_unlock(void);
 bool_t __must_check pcidevs_locked(void);
 
From: Roger Pau Monné <roger.pau@citrix.com>
Subject: x86/mm: add speculation barriers to open coded locks

Add a speculation barrier to the clearly identified open-coded lock-taking
functions.

Note that the memory sharing page_lock() replacement (_page_lock()) is left
as-is, as the code is experimental and not security supported.
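
For illustration only (a hypothetical caller, not taken from the patch), the
effect of wrapping page_lock() in lock_evaluate_nospec() is that the critical
region cannot be entered under a mis-speculated "lock acquired" result:

static int example_pte_update(struct page_info *page)
{
    /* Expands to lock_evaluate_nospec(page_lock_unsafe(page)). */
    if ( !page_lock(page) )
        return -EBUSY;

    /* Critical region: only reached behind the barrier on the taken path. */

    page_unlock(page);

    return 0;
}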

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 42a572a38e22a97d86a4b648a22597628d5b42e4)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 44ac8cae76a1..ad22543d1b88 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2004,7 +2004,7 @@ static inline bool current_locked_page_ne_check(struct page_info *page) {
 #define current_locked_page_ne_check(x) true
 #endif
 
-int page_lock(struct page_info *page)
+int page_lock_unsafe(struct page_info *page)
 {
     unsigned long x, nx;
 
@@ -2065,7 +2065,7 @@ void page_unlock(struct page_info *page)
  * l3t_lock(), so to avoid deadlock we must avoid grabbing them in
  * reverse order.
  */
-static void l3t_lock(struct page_info *page)
+static always_inline void l3t_lock(struct page_info *page)
 {
     unsigned long x, nx;
 
@@ -2074,6 +2074,8 @@ static void l3t_lock(struct page_info *page)
             cpu_relax();
         nx = x | PGT_locked;
     } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
+
+    block_lock_speculation();
 }
 
 static void l3t_unlock(struct page_info *page)
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index cffd0d642534..917fbe29bb96 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -393,7 +393,9 @@ const struct platform_bad_page *get_platform_badpages(unsigned int *array_size);
  * The use of PGT_locked in mem_sharing does not collide, since mem_sharing is
  * only supported for hvm guests, which do not have PV PTEs updated.
  */
-int page_lock(struct page_info *page);
+int page_lock_unsafe(struct page_info *page);
+#define page_lock(pg)   lock_evaluate_nospec(page_lock_unsafe(pg))
+
 void page_unlock(struct page_info *page);
 
 void put_page_type(struct page_info *page);
From: Roger Pau Monné <roger.pau@citrix.com>
Subject: x86: protect conditional lock taking from speculative execution

Conditionally taken locks that use the pattern:

if ( lock )
    spin_lock(...);

need an else branch in order to issue a speculation barrier in the else case,
just as one is issued when the lock is taken.

eval_nospec() could be used on the condition itself, but that would result in a
double barrier on the branch where the lock is taken.

Introduce a new pair of helpers, {gfn,spin}_lock_if(), that can be used to
conditionally take a lock in a speculation-safe way.
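
A caller-side sketch (modelled on the map_pgdir_lock usage in the hunks below,
with the page table manipulation elided) showing that both paths pass through
exactly one speculation barrier:

spin_lock_if(locking, &map_pgdir_lock);

/* ... page table manipulation, safe from speculative entry ... */

if ( locking )
    spin_unlock(&map_pgdir_lock);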

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 03cf7ca23e0e876075954c558485b267b7d02406)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index ad22543d1b88..ffee1c6bd191 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4978,8 +4978,7 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
         if ( !l3t )
             return NULL;
         clear_page(l3t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
         {
             l4_pgentry_t l4e = l4e_from_paddr(__pa(l3t), __PAGE_HYPERVISOR);
@@ -5013,8 +5012,7 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
         if ( !l2t )
             return NULL;
         clear_page(l2t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
             l3e_write(pl3e, l3e_from_paddr(__pa(l2t), __PAGE_HYPERVISOR));
@@ -5046,8 +5044,7 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
         if ( !l1t )
             return NULL;
         clear_page(l1t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
             l2e_write(pl2e, l2e_from_paddr(__pa(l1t), __PAGE_HYPERVISOR));
@@ -5076,6 +5073,8 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
     do {                      \
         if ( locking )        \
             l3t_lock(page);   \
+        else                            \
+            block_lock_speculation();   \
     } while ( false )
 
 #define L3T_UNLOCK(page)                           \
@@ -5284,8 +5283,7 @@ int map_pages_to_xen(
             if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
                 flush_flags |= FLUSH_TLB_GLOBAL;
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
@@ -5384,8 +5382,7 @@ int map_pages_to_xen(
                 if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL )
                     flush_flags |= FLUSH_TLB_GLOBAL;
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
@@ -5424,8 +5421,7 @@ int map_pages_to_xen(
                 unsigned long base_mfn;
                 const l1_pgentry_t *l1t;
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
 
                 ol2e = *pl2e;
                 /*
@@ -5478,8 +5474,7 @@ int map_pages_to_xen(
             unsigned long base_mfn;
             const l2_pgentry_t *l2t;
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
 
             ol3e = *pl3e;
             /*
@@ -5614,8 +5609,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                           l2e_from_pfn(l3e_get_pfn(*pl3e) +
                                        (i << PAGETABLE_ORDER),
                                        l3e_get_flags(*pl3e)));
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+
+            spin_lock_if(locking, &map_pgdir_lock);
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
@@ -5671,8 +5666,8 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                     l1e_write(&l1t[i],
                               l1e_from_pfn(l2e_get_pfn(*pl2e) + i,
                                            l2e_get_flags(*pl2e) & ~_PAGE_PSE));
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+
+                spin_lock_if(locking, &map_pgdir_lock);
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
@@ -5714,8 +5709,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
              */
             if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
                 continue;
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
 
             /*
              * L2E may be already cleared, or set to a superpage, by
@@ -5760,8 +5754,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
         if ( (nf & _PAGE_PRESENT) ||
              ((v != e) && (l2_table_offset(v) + l1_table_offset(v) != 0)) )
             continue;
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
 
         /*
          * L3E may be already cleared, or set to a superpage, by
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index cc635a440571..7eee233b4cef 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -347,6 +347,15 @@ static inline void p2m_unlock(struct p2m_domain *p)
 #define p2m_locked_by_me(p)   mm_write_locked_by_me(&(p)->lock)
 #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
 
+static always_inline void gfn_lock_if(bool condition, struct p2m_domain *p2m,
+                                      gfn_t gfn, unsigned int order)
+{
+    if ( condition )
+        gfn_lock(p2m, gfn, order);
+    else
+        block_lock_speculation();
+}
+
 /* PoD lock (per-p2m-table)
  *
  * Protects private PoD data structs: entry and cache
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 107f6778a6e1..596928e083fb 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -509,9 +509,8 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
         return _mfn(gfn_l);
     }
 
-    if ( locked )
-        /* Grab the lock here, don't release until put_gfn */
-        gfn_lock(p2m, gfn, 0);
+    /* Grab the lock here, don't release until put_gfn */
+    gfn_lock_if(locked, p2m, gfn, 0);
 
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index efdb21ea9072..8bffb3f4b610 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -216,6 +216,14 @@ static always_inline void spin_lock_irq(spinlock_t *l)
         block_lock_speculation();                               \
     })
 
+/* Conditionally take a spinlock in a speculation safe way. */
+static always_inline void spin_lock_if(bool condition, spinlock_t *l)
+{
+    if ( condition )
+        _spin_lock(l);
+    block_lock_speculation();
+}
+
 #define spin_unlock(l)                _spin_unlock(l)
 #define spin_unlock_irq(l)            _spin_unlock_irq(l)
 #define spin_unlock_irqrestore(l, f)  _spin_unlock_irqrestore(l, f)
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/paging: Delete update_cr3()'s do_locking parameter

Nicola reports that the XSA-438 fix introduced new MISRA violations because of
some incidental tidying it tried to do.  The parameter is useless, so resolve
the MISRA regression by removing it.

hap_update_cr3() discards the parameter entirely, while sh_update_cr3() uses
it to distinguish internal and external callers and therefore whether the
paging lock should be taken.

However, we have paging_lock_recursive() for this purpose, which also avoids
the possibility of the shadow-internal callers accidentally not holding the
lock.
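
As a sketch of the resulting idiom (hypothetical function, assuming the
recursive paging lock behaves as described above), the same code can now be
reached both with and without the paging lock held:

static void example_update(struct vcpu *v)
{
    /* Nests harmlessly if an internal caller already holds the paging lock. */
    paging_lock_recursive(v->domain);

    /* ... update shadow state ... */

    paging_unlock(v->domain);
}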

Fixes: fb0ff49fe9f7 ("x86/shadow: defer releasing of PV's top-level shadow reference")
Reported-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit e71157d1ac2a7fbf413130663cf0a93ff9fbcf7e)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index fa479d3d97b3..63c29da696dd 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -728,7 +728,7 @@ static bool_t hap_invlpg(struct vcpu *v, unsigned long linear)
     return 1;
 }
 
-static pagetable_t hap_update_cr3(struct vcpu *v, bool do_locking, bool noflush)
+static pagetable_t hap_update_cr3(struct vcpu *v, bool noflush)
 {
     v->arch.hvm.hw_cr[3] = v->arch.hvm.guest_cr[3];
     hvm_update_guest_cr3(v, noflush);
@@ -818,7 +818,7 @@ static void hap_update_paging_modes(struct vcpu *v)
     }
 
     /* CR3 is effectively updated by a mode change. Flush ASIDs, etc. */
-    hap_update_cr3(v, 0, false);
+    hap_update_cr3(v, false);
 
  unlock:
     paging_unlock(d);
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 242b93537f9a..a8869a3fb7eb 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -2563,7 +2563,7 @@ static void sh_update_paging_modes(struct vcpu *v)
     }
 #endif /* OOS */
 
-    v->arch.paging.mode->update_cr3(v, 0, false);
+    v->arch.paging.mode->update_cr3(v, false);
 }
 
 void shadow_update_paging_modes(struct vcpu *v)
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index cf3ded70e75e..78bb89f1ee04 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2499,7 +2499,7 @@ static int sh_page_fault(struct vcpu *v,
          * In any case, in the PAE case, the ASSERT is not true; it can
          * happen because of actions the guest is taking. */
 #if GUEST_PAGING_LEVELS == 3
-        v->arch.paging.mode->update_cr3(v, 0, false);
+        v->arch.paging.mode->update_cr3(v, false);
 #else
         ASSERT(d->is_shutting_down);
 #endif
@@ -3219,17 +3219,13 @@ sh_detach_old_tables(struct vcpu *v)
     }
 }
 
-static pagetable_t
-sh_update_cr3(struct vcpu *v, bool do_locking, bool noflush)
+static pagetable_t sh_update_cr3(struct vcpu *v, bool noflush)
 /* Updates vcpu->arch.cr3 after the guest has changed CR3.
  * Paravirtual guests should set v->arch.guest_table (and guest_table_user,
  * if appropriate).
  * HVM guests should also make sure hvm_get_guest_cntl_reg(v, 3) works;
  * this function will call hvm_update_guest_cr(v, 3) to tell them where the
  * shadow tables are.
- * If do_locking != 0, assume we are being called from outside the
- * shadow code, and must take and release the paging lock; otherwise
- * that is the caller's responsibility.
  */
 {
     struct domain *d = v->domain;
@@ -3247,7 +3243,11 @@ sh_update_cr3(struct vcpu *v, bool do_locking, bool noflush)
         return old_entry;
     }
 
-    if ( do_locking ) paging_lock(v->domain);
+    /*
+     * This is used externally (with the paging lock not taken) and internally
+     * by the shadow code (with the lock already taken).
+     */
+    paging_lock_recursive(v->domain);
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
     /* Need to resync all the shadow entries on a TLB flush.  Resync
@@ -3483,8 +3483,7 @@ sh_update_cr3(struct vcpu *v, bool do_locking, bool noflush)
     shadow_sync_other_vcpus(v);
 #endif
 
-    /* Release the lock, if we took it (otherwise it's the caller's problem) */
-    if ( do_locking ) paging_unlock(v->domain);
+    paging_unlock(v->domain);
 
     return old_entry;
 }
diff --git a/xen/arch/x86/mm/shadow/none.c b/xen/arch/x86/mm/shadow/none.c
index 2a5fd409b2d8..003536980803 100644
--- a/xen/arch/x86/mm/shadow/none.c
+++ b/xen/arch/x86/mm/shadow/none.c
@@ -52,7 +52,7 @@ static unsigned long _gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m,
 }
 #endif
 
-static pagetable_t _update_cr3(struct vcpu *v, bool do_locking, bool noflush)
+static pagetable_t _update_cr3(struct vcpu *v, bool noflush)
 {
     ASSERT_UNREACHABLE();
     return pagetable_null();
diff --git a/xen/include/asm-x86/paging.h b/xen/include/asm-x86/paging.h
index fceb208d3671..bd7c7008ae79 100644
--- a/xen/include/asm-x86/paging.h
+++ b/xen/include/asm-x86/paging.h
@@ -138,8 +138,7 @@ struct paging_mode {
                                             paddr_t ga, uint32_t *pfec,
                                             unsigned int *page_order);
 #endif
-    pagetable_t   (*update_cr3            )(struct vcpu *v, bool do_locking,
-                                            bool noflush);
+    pagetable_t   (*update_cr3            )(struct vcpu *v, bool noflush);
     void          (*update_paging_modes   )(struct vcpu *v);
     bool          (*flush_tlb             )(bool (*flush_vcpu)(void *ctxt,
                                                                struct vcpu *v),
@@ -317,7 +316,7 @@ static inline unsigned long paging_ga_to_gfn_cr3(struct vcpu *v,
  * as the value to load into the host CR3 to schedule this vcpu */
 static inline pagetable_t paging_update_cr3(struct vcpu *v, bool noflush)
 {
-    return paging_get_hostmode(v)->update_cr3(v, 1, noflush);
+    return paging_get_hostmode(v)->update_cr3(v, noflush);
 }
 
 /* Update all the things that are derived from the guest's CR0/CR3/CR4.

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: xen: Swap order of actions in the FREE*() macros

Wherever possible, it is a good idea to NULL out the visible reference to an
object prior to freeing it.  The FREE*() macros already collect together both
parts, making it easy to adjust.

This has a marginal code generation improvement, as some of the calls to the
free() function can be tailcall optimised.

No functional change.
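
A usage sketch (hypothetical structure name) of the adjusted macros; the
visible pointer is already NULL by the time the memory is handed back:

struct example *e = xzalloc(struct example);

if ( e )
{
    /* ... use e ... */
    XFREE(e);           /* e is set to NULL, then the stashed value is freed */
    ASSERT(e == NULL);
}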

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c4f427ec879e7c0df6d44d02561e8bee838a293e)

diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 3f5c296138cf..c0b77d563d80 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -80,8 +80,9 @@ bool scrub_free_pages(void);
 
 /* Free an allocation, and zero the pointer to it. */
 #define FREE_XENHEAP_PAGES(p, o) do { \
-    free_xenheap_pages(p, o);         \
+    void *_ptr_ = (p);                \
     (p) = NULL;                       \
+    free_xenheap_pages(_ptr_, o);     \
 } while ( false )
 #define FREE_XENHEAP_PAGE(p) FREE_XENHEAP_PAGES(p, 0)
 
diff --git a/xen/include/xen/xmalloc.h b/xen/include/xen/xmalloc.h
index 16979a117c6a..d857298011c1 100644
--- a/xen/include/xen/xmalloc.h
+++ b/xen/include/xen/xmalloc.h
@@ -66,9 +66,10 @@
 extern void xfree(void *);
 
 /* Free an allocation, and zero the pointer to it. */
-#define XFREE(p) do { \
-    xfree(p);         \
-    (p) = NULL;       \
+#define XFREE(p) do {                       \
+    void *_ptr_ = (p);                      \
+    (p) = NULL;                             \
+    xfree(_ptr_);                           \
 } while ( false )
 
 /* Underlying functions */
From: Roger Pau Monné <roger.pau@citrix.com>
Subject: x86/spinlock: introduce support for blocking speculation into
 critical regions

Introduce a new Kconfig option to block speculation into lock-protected
critical regions.  The Kconfig option is enabled by default, but the mitigation
won't be engaged unless it's explicitly enabled on the command line using
`spec-ctrl=lock-harden`.

Convert the spinlock acquire macros into always-inline functions, and introduce
a speculation barrier after the lock has been taken.  Note that the speculation
barrier is not placed inside the implementation of the spin lock functions, so
as to prevent speculation from falling through the call to the lock functions
and thereby skipping the barrier as well.

trylock variants are protected using a construct akin to the existing
evaluate_nospec().

This patch only implements the speculation barrier for x86.

Note that spin locks are the only locking primitive taken care of in this
change; further locking primitives will be adjusted by separate changes.
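
Caller-side sketch (hypothetical lock) of the trylock protection: the taken
branch is only reachable, architecturally and speculatively, once the result
of _spin_trylock() has been fenced by lock_evaluate_nospec():

static void example_trylock_user(spinlock_t *lock)
{
    if ( spin_trylock(lock) )   /* lock_evaluate_nospec(_spin_trylock(lock)) */
    {
        /* Critical region, only reachable behind the barrier. */
        spin_unlock(lock);
    }
}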

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 7ef0084418e188d05f338c3e028fbbe8b6924afa)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index 029002fa82d6..33c32cfc1cbc 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2263,7 +2263,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 >              {msr-sc,rsb,verw,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >              eager-fpu,l1d-flush,branch-harden,srb-lock,
->              unpriv-mmio,gds-mit,div-scrub}=<bool> ]`
+>              unpriv-mmio,gds-mit,div-scrub,lock-harden}=<bool> ]`
 
 Controls for speculative execution sidechannel mitigations.  By default, Xen
 will pick the most appropriate mitigations based on compiled in support,
@@ -2388,6 +2388,11 @@ On all hardware, the `div-scrub=` option can be used to force or prevent Xen
 from mitigating the DIV-leakage vulnerability.  By default, Xen will mitigate
 DIV-leakage on hardware believed to be vulnerable.
 
+If Xen is compiled with `CONFIG_SPECULATIVE_HARDEN_LOCK`, the `lock-harden=`
+boolean can be used to force or prevent Xen from using speculation barriers to
+protect lock critical regions.  This mitigation won't be engaged by default,
+and needs to be explicitly enabled on the command line.
+
 ### sync_console
 > `= <boolean>`
 
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 24bf98a018a0..0a7af22a9b3c 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -63,6 +63,7 @@ int8_t __read_mostly opt_ibpb_ctxt_switch = -1;
 int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 static bool __initdata opt_branch_harden = true;
+static bool __initdata opt_lock_harden;
 
 bool __initdata bsp_delay_spec_ctrl;
 uint8_t __read_mostly default_xen_spec_ctrl;
@@ -131,6 +132,7 @@ static int __init parse_spec_ctrl(const char *s)
             opt_ssbd = false;
             opt_l1d_flush = 0;
             opt_branch_harden = false;
+            opt_lock_harden = false;
             opt_srb_lock = 0;
             opt_unpriv_mmio = false;
             opt_gds_mit = 0;
@@ -282,6 +284,16 @@ static int __init parse_spec_ctrl(const char *s)
             opt_l1d_flush = val;
         else if ( (val = parse_boolean("branch-harden", s, ss)) >= 0 )
             opt_branch_harden = val;
+        else if ( (val = parse_boolean("lock-harden", s, ss)) >= 0 )
+        {
+            if ( IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
+                opt_lock_harden = val;
+            else
+            {
+                no_config_param("SPECULATIVE_HARDEN_LOCK", "spec-ctrl", s, ss);
+                rc = -EINVAL;
+            }
+        }
         else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
             opt_srb_lock = val;
         else if ( (val = parse_boolean("unpriv-mmio", s, ss)) >= 0 )
@@ -481,18 +493,22 @@ static void __init print_details(enum ind_thunk thunk)
            (e21a & cpufeat_mask(X86_FEATURE_SBPB))           ? " SBPB"           : "");
 
     /* Compiled-in support which pertains to mitigations. */
-    if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) )
+    if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) ||
+         IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
         printk("  Compiled-in support:"
 #ifdef CONFIG_INDIRECT_THUNK
                " INDIRECT_THUNK"
 #endif
 #ifdef CONFIG_SHADOW_PAGING
                " SHADOW_PAGING"
+#endif
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+               " HARDEN_LOCK"
 #endif
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
+    printk("  Xen settings: BTI-Thunk %s, SPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s%s\n",
            thunk == THUNK_NONE      ? "N/A" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE" :
            thunk == THUNK_LFENCE    ? "LFENCE" :
@@ -518,7 +534,8 @@ static void __init print_details(enum ind_thunk thunk)
            opt_verw_pv || opt_verw_hvm ||
            opt_verw_mmio                             ? " VERW"  : "",
            opt_div_scrub                             ? " DIV" : "",
-           opt_branch_harden                         ? " BRANCH_HARDEN" : "");
+           opt_branch_harden                         ? " BRANCH_HARDEN" : "",
+           opt_lock_harden                           ? " LOCK_HARDEN" : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
     if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
@@ -1889,6 +1906,9 @@ void __init init_speculation_mitigations(void)
     if ( !opt_branch_harden )
         setup_force_cpu_cap(X86_FEATURE_SC_NO_BRANCH_HARDEN);
 
+    if ( !opt_lock_harden )
+        setup_force_cpu_cap(X86_FEATURE_SC_NO_LOCK_HARDEN);
+
     /*
      * We do not disable HT by default on affected hardware.
      *
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index c9f4b7f49240..01c70109f539 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -161,6 +161,23 @@ config SPECULATIVE_HARDEN_GUEST_ACCESS
 
 	  If unsure, say Y.
 
+config SPECULATIVE_HARDEN_LOCK
+	bool "Speculative lock context hardening"
+	default y
+	depends on X86
+	help
+	  Contemporary processors may use speculative execution as a
+	  performance optimisation, but this can potentially be abused by an
+	  attacker to leak data via speculative sidechannels.
+
+	  One source of data leakage is via speculative accesses to lock
+	  critical regions.
+
+	  This option is disabled by default at run time, and needs to be
+	  enabled on the command line.
+
+	  If unsure, say Y.
+
 endmenu
 
 config HYPFS
diff --git a/xen/include/asm-x86/cpufeatures.h b/xen/include/asm-x86/cpufeatures.h
index 70b93b6b443f..7e8221fd85dd 100644
--- a/xen/include/asm-x86/cpufeatures.h
+++ b/xen/include/asm-x86/cpufeatures.h
@@ -24,7 +24,7 @@ XEN_CPUFEATURE(APERFMPERF,        X86_SYNTH( 8)) /* APERFMPERF */
 XEN_CPUFEATURE(MFENCE_RDTSC,      X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */
 XEN_CPUFEATURE(XEN_SMEP,          X86_SYNTH(10)) /* SMEP gets used by Xen itself */
 XEN_CPUFEATURE(XEN_SMAP,          X86_SYNTH(11)) /* SMAP gets used by Xen itself */
-/* Bit 12 - unused. */
+XEN_CPUFEATURE(SC_NO_LOCK_HARDEN, X86_SYNTH(12)) /* (Disable) Lock critical region hardening */
 XEN_CPUFEATURE(IND_THUNK_LFENCE,  X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */
 XEN_CPUFEATURE(IND_THUNK_JMP,     X86_SYNTH(14)) /* Use IND_THUNK_JMP */
 XEN_CPUFEATURE(SC_NO_BRANCH_HARDEN, X86_SYNTH(15)) /* (Disable) Conditional branch hardening */
diff --git a/xen/include/asm-x86/nospec.h b/xen/include/asm-x86/nospec.h
index 7150e76b87fb..0725839e1982 100644
--- a/xen/include/asm-x86/nospec.h
+++ b/xen/include/asm-x86/nospec.h
@@ -38,6 +38,32 @@ static always_inline void block_speculation(void)
     barrier_nospec_true();
 }
 
+static always_inline void arch_block_lock_speculation(void)
+{
+    alternative("lfence", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+}
+
+/* Allow to insert a read memory barrier into conditionals */
+static always_inline bool barrier_lock_true(void)
+{
+    alternative("lfence #nospec-true", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+    return true;
+}
+
+static always_inline bool barrier_lock_false(void)
+{
+    alternative("lfence #nospec-false", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+    return false;
+}
+
+static always_inline bool arch_lock_evaluate_nospec(bool condition)
+{
+    if ( condition )
+        return barrier_lock_true();
+    else
+        return barrier_lock_false();
+}
+
 #endif /* _ASM_X86_NOSPEC_H */
 
 /*
diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
index 76255bc46efe..455284640396 100644
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -70,6 +70,21 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 #define array_access_nospec(array, index)                               \
     (array)[array_index_nospec(index, ARRAY_SIZE(array))]
 
+static always_inline void block_lock_speculation(void)
+{
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+    arch_block_lock_speculation();
+#endif
+}
+
+static always_inline bool lock_evaluate_nospec(bool condition)
+{
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+    return arch_lock_evaluate_nospec(condition);
+#endif
+    return condition;
+}
+
 #endif /* XEN_NOSPEC_H */
 
 /*
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index 9fa4e600c1f7..efdb21ea9072 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -1,6 +1,7 @@
 #ifndef __SPINLOCK_H__
 #define __SPINLOCK_H__
 
+#include <xen/nospec.h>
 #include <xen/time.h>
 #include <asm/system.h>
 #include <asm/spinlock.h>
@@ -189,13 +190,30 @@ int _spin_trylock_recursive(spinlock_t *lock);
 void _spin_lock_recursive(spinlock_t *lock);
 void _spin_unlock_recursive(spinlock_t *lock);
 
-#define spin_lock(l)                  _spin_lock(l)
-#define spin_lock_cb(l, c, d)         _spin_lock_cb(l, c, d)
-#define spin_lock_irq(l)              _spin_lock_irq(l)
+static always_inline void spin_lock(spinlock_t *l)
+{
+    _spin_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void spin_lock_cb(spinlock_t *l, void (*c)(void *data),
+                                       void *d)
+{
+    _spin_lock_cb(l, c, d);
+    block_lock_speculation();
+}
+
+static always_inline void spin_lock_irq(spinlock_t *l)
+{
+    _spin_lock_irq(l);
+    block_lock_speculation();
+}
+
 #define spin_lock_irqsave(l, f)                                 \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _spin_lock_irqsave(l));                          \
+        block_lock_speculation();                               \
     })
 
 #define spin_unlock(l)                _spin_unlock(l)
@@ -203,7 +221,7 @@ void _spin_unlock_recursive(spinlock_t *lock);
 #define spin_unlock_irqrestore(l, f)  _spin_unlock_irqrestore(l, f)
 
 #define spin_is_locked(l)             _spin_is_locked(l)
-#define spin_trylock(l)               _spin_trylock(l)
+#define spin_trylock(l)               lock_evaluate_nospec(_spin_trylock(l))
 
 #define spin_trylock_irqsave(lock, flags)       \
 ({                                              \
@@ -224,8 +242,15 @@ void _spin_unlock_recursive(spinlock_t *lock);
  * are any critical regions that cannot form part of such a set, they can use
  * standard spin_[un]lock().
  */
-#define spin_trylock_recursive(l)     _spin_trylock_recursive(l)
-#define spin_lock_recursive(l)        _spin_lock_recursive(l)
+#define spin_trylock_recursive(l) \
+    lock_evaluate_nospec(_spin_trylock_recursive(l))
+
+static always_inline void spin_lock_recursive(spinlock_t *l)
+{
+    _spin_lock_recursive(l);
+    block_lock_speculation();
+}
+
 #define spin_unlock_recursive(l)      _spin_unlock_recursive(l)
 
 #endif /* __SPINLOCK_H__ */
From: Roger Pau Monné <roger.pau@citrix.com>
Subject: rwlock: introduce support for blocking speculation into critical
 regions

Introduce inline wrappers as required and add direct calls to
block_lock_speculation() in order to prevent speculation into the
rwlock-protected critical regions.

Note that the rwlock primitives are adjusted to use the non-speculation-safe
variants of the spinlock handlers, as a speculation barrier is added in the
rwlock calling wrappers.

trylock variants are protected by using lock_evaluate_nospec().
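
Reader-side sketch (using the evtchn lock from the event channel code touched
elsewhere in this series): callers remain unchanged, the barrier now comes
from the wrapper:

static void example_reader(struct evtchn *evtchn)
{
    read_lock(&evtchn->lock);   /* _read_lock() + block_lock_speculation() */

    /* ... inspect the channel under the lock ... */

    read_unlock(&evtchn->lock);
}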

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit a1fb15f61692b1fa9945fc51f55471ace49cdd59)

diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index dadab372b5e1..2464f745485d 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -34,8 +34,11 @@ void queue_read_lock_slowpath(rwlock_t *lock)
 
     /*
      * Put the reader into the wait queue.
+     *
+     * Use the speculation unsafe helper, as it's the caller responsibility to
+     * issue a speculation barrier if required.
      */
-    spin_lock(&lock->lock);
+    _spin_lock(&lock->lock);
 
     /*
      * At the head of the wait queue now, wait until the writer state
@@ -64,8 +67,13 @@ void queue_write_lock_slowpath(rwlock_t *lock)
 {
     u32 cnts;
 
-    /* Put the writer into the wait queue. */
-    spin_lock(&lock->lock);
+    /*
+     * Put the writer into the wait queue.
+     *
+     * Use the speculation unsafe helper, as it's the caller responsibility to
+     * issue a speculation barrier if required.
+     */
+    _spin_lock(&lock->lock);
 
     /* Try to acquire the lock directly if no reader is present. */
     if ( !atomic_read(&lock->cnts) &&
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index 0cc9167715b3..fd0458be94ae 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -247,27 +247,49 @@ static inline int _rw_is_write_locked(rwlock_t *lock)
     return (atomic_read(&lock->cnts) & _QW_WMASK) == _QW_LOCKED;
 }
 
-#define read_lock(l)                  _read_lock(l)
-#define read_lock_irq(l)              _read_lock_irq(l)
+static always_inline void read_lock(rwlock_t *l)
+{
+    _read_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void read_lock_irq(rwlock_t *l)
+{
+    _read_lock_irq(l);
+    block_lock_speculation();
+}
+
 #define read_lock_irqsave(l, f)                                 \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _read_lock_irqsave(l));                          \
+        block_lock_speculation();                               \
     })
 
 #define read_unlock(l)                _read_unlock(l)
 #define read_unlock_irq(l)            _read_unlock_irq(l)
 #define read_unlock_irqrestore(l, f)  _read_unlock_irqrestore(l, f)
-#define read_trylock(l)               _read_trylock(l)
+#define read_trylock(l)               lock_evaluate_nospec(_read_trylock(l))
+
+static always_inline void write_lock(rwlock_t *l)
+{
+    _write_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void write_lock_irq(rwlock_t *l)
+{
+    _write_lock_irq(l);
+    block_lock_speculation();
+}
 
-#define write_lock(l)                 _write_lock(l)
-#define write_lock_irq(l)             _write_lock_irq(l)
 #define write_lock_irqsave(l, f)                                \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _write_lock_irqsave(l));                         \
+        block_lock_speculation();                               \
     })
-#define write_trylock(l)              _write_trylock(l)
+#define write_trylock(l)              lock_evaluate_nospec(_write_trylock(l))
 
 #define write_unlock(l)               _write_unlock(l)
 #define write_unlock_irq(l)           _write_unlock_irq(l)
From: Roger Pau Monné <roger.pau@citrix.com>
Subject: percpu-rwlock: introduce support for blocking speculation into
 critical regions

Add direct calls to block_lock_speculation() where required in order to prevent
speculation into the lock-protected critical regions.  Also convert
_percpu_read_lock() from inline to always_inline.

Note that _percpu_write_lock() has been modified to use the
non-speculation-safe variant of the locking primitives, as a speculation
barrier is added unconditionally by the calling wrapper.
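
Writer-side sketch (using the grant table's per-CPU rwlock shown earlier in
this series): the macro form adds the barrier unconditionally after
_percpu_write_lock() has returned:

static void example_writer(struct grant_table *gt)
{
    percpu_write_lock(grant_rwlock, &gt->lock);

    /* ... modify grant table state under the write lock ... */

    percpu_write_unlock(grant_rwlock, &gt->lock);
}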

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit f218daf6d3a3b847736d37c6a6b76031a0d08441)

diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index 2464f745485d..703276f4aa63 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -125,8 +125,12 @@ void _percpu_write_lock(percpu_rwlock_t **per_cpudata,
     /*
      * First take the write lock to protect against other writers or slow
      * path readers.
+     *
+     * Note we use the speculation unsafe variant of write_lock(), as the
+     * calling wrapper already adds a speculation barrier after the lock has
+     * been taken.
      */
-    write_lock(&percpu_rwlock->rwlock);
+    _write_lock(&percpu_rwlock->rwlock);
 
     /* Now set the global variable so that readers start using read_lock. */
     percpu_rwlock->writer_activating = 1;
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index fd0458be94ae..abe0804bf7d5 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -326,8 +326,8 @@ static inline void _percpu_rwlock_owner_check(percpu_rwlock_t **per_cpudata,
 #define percpu_rwlock_resource_init(l, owner) \
     (*(l) = (percpu_rwlock_t)PERCPU_RW_LOCK_UNLOCKED(&get_per_cpu_var(owner)))
 
-static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
-                                         percpu_rwlock_t *percpu_rwlock)
+static always_inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
+                                            percpu_rwlock_t *percpu_rwlock)
 {
     /* Validate the correct per_cpudata variable has been provided. */
     _percpu_rwlock_owner_check(per_cpudata, percpu_rwlock);
@@ -362,6 +362,8 @@ static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
     }
     else
     {
+        /* Other branch already has a speculation barrier in read_lock(). */
+        block_lock_speculation();
         /* All other paths have implicit check_lock() calls via read_lock(). */
         check_lock(&percpu_rwlock->rwlock.lock.debug, false);
     }
@@ -410,8 +412,12 @@ static inline void _percpu_write_unlock(percpu_rwlock_t **per_cpudata,
     _percpu_read_lock(&get_per_cpu_var(percpu), lock)
 #define percpu_read_unlock(percpu, lock) \
     _percpu_read_unlock(&get_per_cpu_var(percpu), lock)
-#define percpu_write_lock(percpu, lock) \
-    _percpu_write_lock(&get_per_cpu_var(percpu), lock)
+
+#define percpu_write_lock(percpu, lock)                 \
+({                                                      \
+    _percpu_write_lock(&get_per_cpu_var(percpu), lock); \
+    block_lock_speculation();                           \
+})
 #define percpu_write_unlock(percpu, lock) \
     _percpu_write_unlock(&get_per_cpu_var(percpu), lock)
 
From: Roger Pau Monné <roger.pau@citrix.com>
Subject: locking: attempt to ensure lock wrappers are always inline

This is done in order to prevent the locking speculation barriers from ending
up inside `call`ed functions that could be speculatively bypassed.

While there also add an extra locking barrier to _mm_write_lock() in the branch
taken when the lock is already held.

Note that some functions are switched to use the unsafe variants (without a
speculation barrier) of the locking primitives, while a speculation barrier is
always added to the exposed public lock wrapping helper.  That is the case
with sched_spin_lock_double() and pcidevs_lock(), for example.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 197ecd838a2aaf959a469df3696d4559c4f8b762)

diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 6fdc3e19fe8c..dd2de574cf18 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -161,7 +161,7 @@ static int pt_irq_masked(struct periodic_time *pt)
  * pt->vcpu field, because another thread holding the pt_migrate lock
  * may already be spinning waiting for your vcpu lock.
  */
-static void pt_vcpu_lock(struct vcpu *v)
+static always_inline void pt_vcpu_lock(struct vcpu *v)
 {
     spin_lock(&v->arch.hvm.tm_lock);
 }
@@ -180,9 +180,13 @@ static void pt_vcpu_unlock(struct vcpu *v)
  * need to take an additional lock that protects against pt->vcpu
  * changing.
  */
-static void pt_lock(struct periodic_time *pt)
+static always_inline void pt_lock(struct periodic_time *pt)
 {
-    read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
+    /*
+     * Use the speculation unsafe variant for the first lock, as the following
+     * lock taking helper already includes a speculation barrier.
+     */
+    _read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
     spin_lock(&pt->vcpu->arch.hvm.tm_lock);
 }
 
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index d6c073dc5cf5..cc635a440571 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -88,8 +88,8 @@ static inline void _set_lock_level(int l)
     this_cpu(mm_lock_level) = l;
 }
 
-static inline void _mm_lock(const struct domain *d, mm_lock_t *l,
-                            const char *func, int level, int rec)
+static always_inline void _mm_lock(const struct domain *d, mm_lock_t *l,
+                                   const char *func, int level, int rec)
 {
     if ( !((mm_locked_by_me(l)) && rec) )
         _check_lock_level(d, level);
@@ -139,8 +139,8 @@ static inline int mm_write_locked_by_me(mm_rwlock_t *l)
     return (l->locker == get_processor_id());
 }
 
-static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
-                                  const char *func, int level)
+static always_inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
+                                         const char *func, int level)
 {
     if ( !mm_write_locked_by_me(l) )
     {
@@ -151,6 +151,8 @@ static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
         l->unlock_level = _get_lock_level();
         _set_lock_level(_lock_level(d, level));
     }
+    else
+        block_speculation();
     l->recurse_count++;
 }
 
@@ -164,8 +166,8 @@ static inline void mm_write_unlock(mm_rwlock_t *l)
     percpu_write_unlock(p2m_percpu_rwlock, &l->lock);
 }
 
-static inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
-                                 int level)
+static always_inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
+                                        int level)
 {
     _check_lock_level(d, level);
     percpu_read_lock(p2m_percpu_rwlock, &l->lock);
@@ -180,15 +182,15 @@ static inline void mm_read_unlock(mm_rwlock_t *l)
 
 /* This wrapper uses the line number to express the locking order below */
 #define declare_mm_lock(name)                                                 \
-    static inline void mm_lock_##name(const struct domain *d, mm_lock_t *l,   \
-                                      const char *func, int rec)              \
+    static always_inline void mm_lock_##name(                                 \
+        const struct domain *d, mm_lock_t *l, const char *func, int rec)      \
     { _mm_lock(d, l, func, MM_LOCK_ORDER_##name, rec); }
 #define declare_mm_rwlock(name)                                               \
-    static inline void mm_write_lock_##name(const struct domain *d,           \
-                                            mm_rwlock_t *l, const char *func) \
+    static always_inline void mm_write_lock_##name(                           \
+        const struct domain *d, mm_rwlock_t *l, const char *func)             \
     { _mm_write_lock(d, l, func, MM_LOCK_ORDER_##name); }                     \
-    static inline void mm_read_lock_##name(const struct domain *d,            \
-                                           mm_rwlock_t *l)                    \
+    static always_inline void mm_read_lock_##name(const struct domain *d,     \
+                                                  mm_rwlock_t *l)             \
     { _mm_read_lock(d, l, MM_LOCK_ORDER_##name); }
 /* These capture the name of the calling function */
 #define mm_lock(name, d, l) mm_lock_##name(d, l, __func__, 0)
@@ -321,7 +323,7 @@ declare_mm_lock(altp2mlist)
 #define MM_LOCK_ORDER_altp2m                 40
 declare_mm_rwlock(altp2m);
 
-static inline void p2m_lock(struct p2m_domain *p)
+static always_inline void p2m_lock(struct p2m_domain *p)
 {
     if ( p2m_is_altp2m(p) )
         mm_write_lock(altp2m, p->domain, &p->lock);
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index a3c9d8a97423..c82628840864 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -35,7 +35,7 @@
 #define superpage_aligned(_x)  (((_x)&(SUPERPAGE_PAGES-1))==0)
 
 /* Enforce lock ordering when grabbing the "external" page_alloc lock */
-static inline void lock_page_alloc(struct p2m_domain *p2m)
+static always_inline void lock_page_alloc(struct p2m_domain *p2m)
 {
     page_alloc_mm_pre_lock(p2m->domain);
     spin_lock(&(p2m->domain->page_alloc_lock));
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index da88ad141a69..e5f4e68b8819 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -57,7 +57,7 @@
  * just assume the event channel is free or unbound at the moment when the
  * evtchn_read_trylock() returns false.
  */
-static inline void evtchn_write_lock(struct evtchn *evtchn)
+static always_inline void evtchn_write_lock(struct evtchn *evtchn)
 {
     write_lock(&evtchn->lock);
 
@@ -324,7 +324,8 @@ static int evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc)
     return rc;
 }
 
-static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn)
+static always_inline void double_evtchn_lock(struct evtchn *lchn,
+                                             struct evtchn *rchn)
 {
     ASSERT(lchn != rchn);
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 76272b3c8add..9464cebdd6e4 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -398,7 +398,7 @@ static inline void act_set_gfn(struct active_grant_entry *act, gfn_t gfn)
 
 static DEFINE_PERCPU_RWLOCK_GLOBAL(grant_rwlock);
 
-static inline void grant_read_lock(struct grant_table *gt)
+static always_inline void grant_read_lock(struct grant_table *gt)
 {
     percpu_read_lock(grant_rwlock, &gt->lock);
 }
@@ -408,7 +408,7 @@ static inline void grant_read_unlock(struct grant_table *gt)
     percpu_read_unlock(grant_rwlock, &gt->lock);
 }
 
-static inline void grant_write_lock(struct grant_table *gt)
+static always_inline void grant_write_lock(struct grant_table *gt)
 {
     percpu_write_lock(grant_rwlock, &gt->lock);
 }
@@ -445,7 +445,7 @@ nr_active_grant_frames(struct grant_table *gt)
     return num_act_frames_from_sha_frames(nr_grant_frames(gt));
 }
 
-static inline struct active_grant_entry *
+static always_inline struct active_grant_entry *
 active_entry_acquire(struct grant_table *t, grant_ref_t e)
 {
     struct active_grant_entry *act;
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 03ace41540d6..9e80ad4c7463 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -348,23 +348,28 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
  * This avoids dead- or live-locks when this code is running on both
  * cpus at the same time.
  */
-static void sched_spin_lock_double(spinlock_t *lock1, spinlock_t *lock2,
-                                   unsigned long *flags)
+static always_inline void sched_spin_lock_double(
+    spinlock_t *lock1, spinlock_t *lock2, unsigned long *flags)
 {
+    /*
+     * In order to avoid extra overhead, use the locking primitives without the
+     * speculation barrier, and introduce a single barrier here.
+     */
     if ( lock1 == lock2 )
     {
-        spin_lock_irqsave(lock1, *flags);
+        *flags = _spin_lock_irqsave(lock1);
     }
     else if ( lock1 < lock2 )
     {
-        spin_lock_irqsave(lock1, *flags);
-        spin_lock(lock2);
+        *flags = _spin_lock_irqsave(lock1);
+        _spin_lock(lock2);
     }
     else
     {
-        spin_lock_irqsave(lock2, *flags);
-        spin_lock(lock1);
+        *flags = _spin_lock_irqsave(lock2);
+        _spin_lock(lock1);
     }
+    block_lock_speculation();
 }
 
 static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index 0527a8c70d1c..24a93dd0c123 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -207,8 +207,24 @@ DECLARE_PER_CPU(cpumask_t, cpumask_scratch);
 #define cpumask_scratch        (&this_cpu(cpumask_scratch))
 #define cpumask_scratch_cpu(c) (&per_cpu(cpumask_scratch, c))
 
+/*
+ * Deal with _spin_lock_irqsave() returning the flags value instead of storing
+ * it in a passed parameter.
+ */
+#define _sched_spinlock0(lock, irq) _spin_lock##irq(lock)
+#define _sched_spinlock1(lock, irq, arg) ({ \
+    BUILD_BUG_ON(sizeof(arg) != sizeof(unsigned long)); \
+    (arg) = _spin_lock##irq(lock); \
+})
+
+#define _sched_spinlock__(nr) _sched_spinlock ## nr
+#define _sched_spinlock_(nr)  _sched_spinlock__(nr)
+#define _sched_spinlock(lock, irq, args...) \
+    _sched_spinlock_(count_args(args))(lock, irq, ## args)
+
 #define sched_lock(kind, param, cpu, irq, arg...) \
-static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
+static always_inline spinlock_t \
+*kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
 { \
     for ( ; ; ) \
     { \
@@ -220,10 +236,16 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
          * \
          * It may also be the case that v->processor may change but the \
          * lock may be the same; this will succeed in that case. \
+         * \
+         * Use the speculation unsafe locking helper, there's a speculation \
+         * barrier before returning to the caller. \
          */ \
-        spin_lock##irq(lock, ## arg); \
+        _sched_spinlock(lock, irq, ## arg); \
         if ( likely(lock == get_sched_res(cpu)->schedule_lock) ) \
+        { \
+            block_lock_speculation(); \
             return lock; \
+        } \
         spin_unlock##irq(lock, ## arg); \
     } \
 }
diff --git a/xen/common/timer.c b/xen/common/timer.c
index 1bb265ceea0e..dc831efc79e5 100644
--- a/xen/common/timer.c
+++ b/xen/common/timer.c
@@ -240,7 +240,7 @@ static inline void deactivate_timer(struct timer *timer)
     list_add(&timer->inactive, &per_cpu(timers, timer->cpu).inactive);
 }
 
-static inline bool_t timer_lock(struct timer *timer)
+static inline bool_t timer_lock_unsafe(struct timer *timer)
 {
     unsigned int cpu;
 
@@ -254,7 +254,8 @@ static inline bool_t timer_lock(struct timer *timer)
             rcu_read_unlock(&timer_cpu_read_lock);
             return 0;
         }
-        spin_lock(&per_cpu(timers, cpu).lock);
+        /* Use the speculation unsafe variant, the wrapper has the barrier. */
+        _spin_lock(&per_cpu(timers, cpu).lock);
         if ( likely(timer->cpu == cpu) )
             break;
         spin_unlock(&per_cpu(timers, cpu).lock);
@@ -267,8 +268,9 @@ static inline bool_t timer_lock(struct timer *timer)
 #define timer_lock_irqsave(t, flags) ({         \
     bool_t __x;                                 \
     local_irq_save(flags);                      \
-    if ( !(__x = timer_lock(t)) )               \
+    if ( !(__x = timer_lock_unsafe(t)) )        \
         local_irq_restore(flags);               \
+    block_lock_speculation();                   \
     __x;                                        \
 })
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 6fc27e7ede40..2fd663062ad5 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -52,9 +52,10 @@ struct pci_seg {
 
 static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
 
-void pcidevs_lock(void)
+/* Do not use, as it has no speculation barrier, use pcidevs_lock() instead. */
+void pcidevs_lock_unsafe(void)
 {
-    spin_lock_recursive(&_pcidevs_lock);
+    _spin_lock_recursive(&_pcidevs_lock);
 }
 
 void pcidevs_unlock(void)
diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
index 7c825e9d9c0a..d4b2beda798d 100644
--- a/xen/include/asm-x86/irq.h
+++ b/xen/include/asm-x86/irq.h
@@ -177,6 +177,7 @@ extern void irq_complete_move(struct irq_desc *);
 
 extern struct irq_desc *irq_desc;
 
+/* Not speculation safe, only used for AP bringup. */
 void lock_vector_lock(void);
 void unlock_vector_lock(void);
 
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 21c95e14fd6a..18924e69e7d0 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -105,12 +105,12 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
 #define bucket_from_port(d, p) \
     ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
 
-static inline void evtchn_read_lock(struct evtchn *evtchn)
+static always_inline void evtchn_read_lock(struct evtchn *evtchn)
 {
     read_lock(&evtchn->lock);
 }
 
-static inline bool evtchn_read_trylock(struct evtchn *evtchn)
+static always_inline bool evtchn_read_trylock(struct evtchn *evtchn)
 {
     return read_trylock(&evtchn->lock);
 }
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index ac3880e686f8..3f1324e5de92 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -147,8 +147,12 @@ struct pci_dev {
  * devices, it also sync the access to the msi capability that is not
  * interrupt handling related (the mask bit register).
  */
-
-void pcidevs_lock(void);
+void pcidevs_lock_unsafe(void);
+static always_inline void pcidevs_lock(void)
+{
+    pcidevs_lock_unsafe();
+    block_lock_speculation();
+}
 void pcidevs_unlock(void);
 bool_t __must_check pcidevs_locked(void);
 
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: x86/mm: add speculation barriers to open coded locks

Add a speculation barrier to the clearly identified open-coded lock taking
functions.

Note that the memory sharing page_lock() replacement (_page_lock()) is left
as-is, as the code is experimental and not security supported.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 42a572a38e22a97d86a4b648a22597628d5b42e4)
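
As a rough sketch of the shape of the change (the demo_* names below are
hypothetical stand-ins for l3t_lock() and page_lock(), not part of the patch):
a void-returning open-coded lock gains a trailing barrier once the lock is
held, while a lock that can fail is wrapped so that evaluating its result
carries the barrier.

    /* Open-coded bit lock: barrier once the lock is architecturally held. */
    static always_inline void demo_bitlock(struct page_info *page)
    {
        unsigned long x, nx;

        do {
            while ( (x = page->u.inuse.type_info) & PGT_locked )
                cpu_relax();
            nx = x | PGT_locked;
        } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );

        block_lock_speculation();
    }

    /* A lock that can fail keeps its body unchanged and gains a wrapping macro. */
    int demo_lock_unsafe(struct page_info *page);
    #define demo_lock(pg) lock_evaluate_nospec(demo_lock_unsafe(pg))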

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index ea024c145034..2bf1b709851a 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2005,7 +2005,7 @@ static inline bool current_locked_page_ne_check(struct page_info *page) {
 #define current_locked_page_ne_check(x) true
 #endif
 
-int page_lock(struct page_info *page)
+int page_lock_unsafe(struct page_info *page)
 {
     unsigned long x, nx;
 
@@ -2066,7 +2066,7 @@ void page_unlock(struct page_info *page)
  * l3t_lock(), so to avoid deadlock we must avoid grabbing them in
  * reverse order.
  */
-static void l3t_lock(struct page_info *page)
+static always_inline void l3t_lock(struct page_info *page)
 {
     unsigned long x, nx;
 
@@ -2075,6 +2075,8 @@ static void l3t_lock(struct page_info *page)
             cpu_relax();
         nx = x | PGT_locked;
     } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
+
+    block_lock_speculation();
 }
 
 static void l3t_unlock(struct page_info *page)
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index cccef852b4de..73d5a98bec7e 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -393,7 +393,9 @@ const struct platform_bad_page *get_platform_badpages(unsigned int *array_size);
  * The use of PGT_locked in mem_sharing does not collide, since mem_sharing is
  * only supported for hvm guests, which do not have PV PTEs updated.
  */
-int page_lock(struct page_info *page);
+int page_lock_unsafe(struct page_info *page);
+#define page_lock(pg)   lock_evaluate_nospec(page_lock_unsafe(pg))
+
 void page_unlock(struct page_info *page);
 
 void put_page_type(struct page_info *page);
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: x86: protect conditional lock taking from speculative execution

Conditionally taken locks that use the pattern:

if ( lock )
    spin_lock(...);

need an else branch in order to issue a speculation barrier in the else case,
just as is done on the path where the lock is acquired.

evaluate_nospec() could be used on the condition itself, but that would result in a
double barrier on the branch where the lock is taken.

Introduce a new pair of helpers, {gfn,spin}_lock_if() that can be used to
conditionally take a lock in a speculation safe way.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 03cf7ca23e0e876075954c558485b267b7d02406)
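
Condensed sketch of the transformation at a call site (the demo_* wrappers are
hypothetical and map_pgdir_lock stands in for any conditionally taken lock;
the real changes are in the hunks below):

    /* The new helper, as added to spinlock.h below. */
    static always_inline void spin_lock_if(bool condition, spinlock_t *l)
    {
        if ( condition )
            _spin_lock(l);
        block_lock_speculation();
    }

    /* Before: the "locking == false" path reaches no speculation barrier. */
    static void demo_before(bool locking)
    {
        if ( locking )
            spin_lock(&map_pgdir_lock);
    }

    /* After: both outcomes pass exactly one barrier. */
    static void demo_after(bool locking)
    {
        spin_lock_if(locking, &map_pgdir_lock);
    }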

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 2bf1b709851a..16287e62af23 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5000,8 +5000,7 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
         if ( !l3t )
             return NULL;
         UNMAP_DOMAIN_PAGE(l3t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
         {
             l4_pgentry_t l4e = l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
@@ -5038,8 +5037,7 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
             return NULL;
         }
         UNMAP_DOMAIN_PAGE(l2t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
             l3e_write(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
@@ -5077,8 +5075,7 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
             return NULL;
         }
         UNMAP_DOMAIN_PAGE(l1t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
             l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
@@ -5109,6 +5106,8 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
     do {                      \
         if ( locking )        \
             l3t_lock(page);   \
+        else                            \
+            block_lock_speculation();   \
     } while ( false )
 
 #define L3T_UNLOCK(page)                           \
@@ -5324,8 +5323,7 @@ int map_pages_to_xen(
             if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
                 flush_flags |= FLUSH_TLB_GLOBAL;
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
@@ -5429,8 +5427,7 @@ int map_pages_to_xen(
                 if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL )
                     flush_flags |= FLUSH_TLB_GLOBAL;
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
@@ -5471,8 +5468,7 @@ int map_pages_to_xen(
                 unsigned long base_mfn;
                 const l1_pgentry_t *l1t;
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
 
                 ol2e = *pl2e;
                 /*
@@ -5526,8 +5522,7 @@ int map_pages_to_xen(
             unsigned long base_mfn;
             const l2_pgentry_t *l2t;
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
 
             ol3e = *pl3e;
             /*
@@ -5671,8 +5666,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                                        l3e_get_flags(*pl3e)));
             UNMAP_DOMAIN_PAGE(l2t);
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
@@ -5731,8 +5725,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                                            l2e_get_flags(*pl2e) & ~_PAGE_PSE));
                 UNMAP_DOMAIN_PAGE(l1t);
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
@@ -5776,8 +5769,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
              */
             if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
                 continue;
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
 
             /*
              * L2E may be already cleared, or set to a superpage, by
@@ -5824,8 +5816,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
         if ( (nf & _PAGE_PRESENT) ||
              ((v != e) && (l2_table_offset(v) + l1_table_offset(v) != 0)) )
             continue;
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
 
         /*
          * L3E may be already cleared, or set to a superpage, by
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index cc635a440571..7eee233b4cef 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -347,6 +347,15 @@ static inline void p2m_unlock(struct p2m_domain *p)
 #define p2m_locked_by_me(p)   mm_write_locked_by_me(&(p)->lock)
 #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
 
+static always_inline void gfn_lock_if(bool condition, struct p2m_domain *p2m,
+                                      gfn_t gfn, unsigned int order)
+{
+    if ( condition )
+        gfn_lock(p2m, gfn, order);
+    else
+        block_lock_speculation();
+}
+
 /* PoD lock (per-p2m-table)
  *
  * Protects private PoD data structs: entry and cache
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 2d41446a6902..ddd2f861c3c7 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -514,9 +514,8 @@ mfn_t __get_gfn_type_access(struct p2m_domain *p2m, unsigned long gfn_l,
     if ( q & P2M_UNSHARE )
         q |= P2M_ALLOC;
 
-    if ( locked )
-        /* Grab the lock here, don't release until put_gfn */
-        gfn_lock(p2m, gfn, 0);
+    /* Grab the lock here, don't release until put_gfn */
+    gfn_lock_if(locked, p2m, gfn, 0);
 
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index efdb21ea9072..8bffb3f4b610 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -216,6 +216,14 @@ static always_inline void spin_lock_irq(spinlock_t *l)
         block_lock_speculation();                               \
     })
 
+/* Conditionally take a spinlock in a speculation safe way. */
+static always_inline void spin_lock_if(bool condition, spinlock_t *l)
+{
+    if ( condition )
+        _spin_lock(l);
+    block_lock_speculation();
+}
+
 #define spin_unlock(l)                _spin_unlock(l)
 #define spin_unlock_irq(l)            _spin_unlock_irq(l)
 #define spin_unlock_irqrestore(l, f)  _spin_unlock_irqrestore(l, f)
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: x86/paging: Delete update_cr3()'s do_locking parameter

Nicola reports that the XSA-438 fix introduced new MISRA violations because of
some incidental tidying it tried to do.  The parameter is useless, so resolve
the MISRA regression by removing it.

hap_update_cr3() discards the parameter entirely, while sh_update_cr3() uses
it to distinguish internal and external callers and therefore whether the
paging lock should be taken.

However, we have paging_lock_recursive() for this purpose, which also removes
the possibility of the shadow-internal callers accidentally not holding the lock.

Fixes: fb0ff49fe9f7 ("x86/shadow: defer releasing of PV's top-level shadow reference")
Reported-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Henry Wang <Henry.Wang@arm.com>
(cherry picked from commit e71157d1ac2a7fbf413130663cf0a93ff9fbcf7e)
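
In effect (simplified sketch, not a literal hunk), sh_update_cr3() now always
takes the paging lock recursively, so external callers (lock not held) and
shadow-internal callers (lock already held) are both handled without a flag:

    static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool noflush)
    {
        pagetable_t old_entry = pagetable_null();

        /* Safe whether or not the caller already holds the paging lock. */
        paging_lock_recursive(v->domain);

        /* ... refresh the top-level shadow and v->arch.cr3 ... */

        paging_unlock(v->domain);

        return old_entry;
    }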

diff --git a/xen/arch/x86/include/asm/paging.h b/xen/arch/x86/include/asm/paging.h
index 94c590f31aa8..809ff35d9a0d 100644
--- a/xen/arch/x86/include/asm/paging.h
+++ b/xen/arch/x86/include/asm/paging.h
@@ -138,8 +138,7 @@ struct paging_mode {
                                             paddr_t ga, uint32_t *pfec,
                                             unsigned int *page_order);
 #endif
-    pagetable_t   (*update_cr3            )(struct vcpu *v, bool do_locking,
-                                            bool noflush);
+    pagetable_t   (*update_cr3            )(struct vcpu *v, bool noflush);
     void          (*update_paging_modes   )(struct vcpu *v);
     bool          (*flush_tlb             )(const unsigned long *vcpu_bitmap);
 
@@ -312,7 +311,7 @@ static inline unsigned long paging_ga_to_gfn_cr3(struct vcpu *v,
  * as the value to load into the host CR3 to schedule this vcpu */
 static inline pagetable_t paging_update_cr3(struct vcpu *v, bool noflush)
 {
-    return paging_get_hostmode(v)->update_cr3(v, 1, noflush);
+    return paging_get_hostmode(v)->update_cr3(v, noflush);
 }
 
 /* Update all the things that are derived from the guest's CR0/CR3/CR4.
diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 57a19c3d59d1..3ad39a7dd781 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -739,8 +739,7 @@ static bool cf_check hap_invlpg(struct vcpu *v, unsigned long linear)
     return 1;
 }
 
-static pagetable_t cf_check hap_update_cr3(
-    struct vcpu *v, bool do_locking, bool noflush)
+static pagetable_t cf_check hap_update_cr3(struct vcpu *v, bool noflush)
 {
     v->arch.hvm.hw_cr[3] = v->arch.hvm.guest_cr[3];
     hvm_update_guest_cr3(v, noflush);
@@ -826,7 +825,7 @@ static void cf_check hap_update_paging_modes(struct vcpu *v)
     }
 
     /* CR3 is effectively updated by a mode change. Flush ASIDs, etc. */
-    hap_update_cr3(v, 0, false);
+    hap_update_cr3(v, false);
 
  unlock:
     paging_unlock(d);
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index c0940f939ef0..18714dbd02ab 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -2579,7 +2579,7 @@ static void sh_update_paging_modes(struct vcpu *v)
     }
 #endif /* OOS */
 
-    v->arch.paging.mode->update_cr3(v, 0, false);
+    v->arch.paging.mode->update_cr3(v, false);
 }
 
 void cf_check shadow_update_paging_modes(struct vcpu *v)
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index c92b354a7815..e54a507b54f6 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -2506,7 +2506,7 @@ static int cf_check sh_page_fault(
          * In any case, in the PAE case, the ASSERT is not true; it can
          * happen because of actions the guest is taking. */
 #if GUEST_PAGING_LEVELS == 3
-        v->arch.paging.mode->update_cr3(v, 0, false);
+        v->arch.paging.mode->update_cr3(v, false);
 #else
         ASSERT(d->is_shutting_down);
 #endif
@@ -3224,17 +3224,13 @@ static void cf_check sh_detach_old_tables(struct vcpu *v)
     }
 }
 
-static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool do_locking,
-                                          bool noflush)
+static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool noflush)
 /* Updates vcpu->arch.cr3 after the guest has changed CR3.
  * Paravirtual guests should set v->arch.guest_table (and guest_table_user,
  * if appropriate).
  * HVM guests should also make sure hvm_get_guest_cntl_reg(v, 3) works;
  * this function will call hvm_update_guest_cr(v, 3) to tell them where the
  * shadow tables are.
- * If do_locking != 0, assume we are being called from outside the
- * shadow code, and must take and release the paging lock; otherwise
- * that is the caller's responsibility.
  */
 {
     struct domain *d = v->domain;
@@ -3252,7 +3248,11 @@ static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool do_locking,
         return old_entry;
     }
 
-    if ( do_locking ) paging_lock(v->domain);
+    /*
+     * This is used externally (with the paging lock not taken) and internally
+     * by the shadow code (with the lock already taken).
+     */
+    paging_lock_recursive(v->domain);
 
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
     /* Need to resync all the shadow entries on a TLB flush.  Resync
@@ -3480,8 +3480,7 @@ static pagetable_t cf_check sh_update_cr3(struct vcpu *v, bool do_locking,
     shadow_sync_other_vcpus(v);
 #endif
 
-    /* Release the lock, if we took it (otherwise it's the caller's problem) */
-    if ( do_locking ) paging_unlock(v->domain);
+    paging_unlock(v->domain);
 
     return old_entry;
 }
diff --git a/xen/arch/x86/mm/shadow/none.c b/xen/arch/x86/mm/shadow/none.c
index 743c0ffb8514..7e4e386cd030 100644
--- a/xen/arch/x86/mm/shadow/none.c
+++ b/xen/arch/x86/mm/shadow/none.c
@@ -52,8 +52,7 @@ static unsigned long cf_check _gva_to_gfn(
 }
 #endif
 
-static pagetable_t cf_check _update_cr3(struct vcpu *v, bool do_locking,
-                                        bool noflush)
+static pagetable_t cf_check _update_cr3(struct vcpu *v, bool noflush)
 {
     ASSERT_UNREACHABLE();
     return pagetable_null();

From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: xen: Swap order of actions in the FREE*() macros

Wherever possible, it is a good idea to NULL out the visible reference to an
object prior to freeing it.  The FREE*() macros already collect together both
parts, making it easy to adjust.

This has a marginal code generation improvement, as some of the calls to the
free() function can be tailcall optimised.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c4f427ec879e7c0df6d44d02561e8bee838a293e)
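
For illustration (the example_state type is hypothetical, not part of the
patch), after the swap the expansion of XFREE() clears the visible field
before the memory is returned, and the trailing xfree() can become a tail
call:

    struct example_state {
        void *cfg;
    };

    static void example_teardown(struct example_state *s)
    {
        /* Expands to: void *_ptr_ = s->cfg; s->cfg = NULL; xfree(_ptr_); */
        XFREE(s->cfg);
    }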

diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 3dc61bcc3c07..211685a5d29c 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -80,8 +80,9 @@ bool scrub_free_pages(void);
 
 /* Free an allocation, and zero the pointer to it. */
 #define FREE_XENHEAP_PAGES(p, o) do { \
-    free_xenheap_pages(p, o);         \
+    void *_ptr_ = (p);                \
     (p) = NULL;                       \
+    free_xenheap_pages(_ptr_, o);     \
 } while ( false )
 #define FREE_XENHEAP_PAGE(p) FREE_XENHEAP_PAGES(p, 0)
 
diff --git a/xen/include/xen/xmalloc.h b/xen/include/xen/xmalloc.h
index 16979a117c6a..d857298011c1 100644
--- a/xen/include/xen/xmalloc.h
+++ b/xen/include/xen/xmalloc.h
@@ -66,9 +66,10 @@
 extern void xfree(void *);
 
 /* Free an allocation, and zero the pointer to it. */
-#define XFREE(p) do { \
-    xfree(p);         \
-    (p) = NULL;       \
+#define XFREE(p) do {                       \
+    void *_ptr_ = (p);                      \
+    (p) = NULL;                             \
+    xfree(_ptr_);                           \
 } while ( false )
 
 /* Underlying functions */
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: x86/spinlock: introduce support for blocking speculation into
 critical regions

Introduce a new Kconfig option to block speculation into lock protected
critical regions.  The Kconfig option is enabled by default, but the mitigation
won't be engaged unless it's explicitly enabled on the command line using
`spec-ctrl=lock-harden`.

Convert the spinlock acquire macros into always-inline functions, and introduce
a speculation barrier after the lock has been taken.  Note the speculation
barrier is not placed inside the implementation of the spin lock functions, so
as to prevent speculation from falling through the call to the lock functions,
which would result in the barrier also being skipped.

trylock variants are protected using a construct akin to the existing
evaluate_nospec().

This patch only implements the speculation barrier for x86.

Note that spin locks are the only locking primitive taken care of in this
change; further locking primitives will be adjusted by separate changes.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 7ef0084418e188d05f338c3e028fbbe8b6924afa)
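
Condensed, the pattern introduced by this patch is:

    /* The barrier is emitted at the (inlined) call site, after the lock call. */
    static always_inline void spin_lock(spinlock_t *l)
    {
        _spin_lock(l);              /* out-of-line lock implementation */
        block_lock_speculation();   /* LFENCE with lock-harden active, NOP otherwise */
    }

    /* trylock results gate the barrier on the outcome instead. */
    #define spin_trylock(l) lock_evaluate_nospec(_spin_trylock(l))

At run time the barriers are only active when booted with
`spec-ctrl=lock-harden` (and with CONFIG_SPECULATIVE_HARDEN_LOCK compiled in);
otherwise they are patched out via the X86_FEATURE_SC_NO_LOCK_HARDEN
alternative.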

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index d909ec94fe7c..e1d56407dd88 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2327,7 +2327,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 >              {msr-sc,rsb,verw,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >              eager-fpu,l1d-flush,branch-harden,srb-lock,
->              unpriv-mmio,gds-mit,div-scrub}=<bool> ]`
+>              unpriv-mmio,gds-mit,div-scrub,lock-harden}=<bool> ]`
 
 Controls for speculative execution sidechannel mitigations.  By default, Xen
 will pick the most appropriate mitigations based on compiled in support,
@@ -2454,6 +2454,11 @@ On all hardware, the `div-scrub=` option can be used to force or prevent Xen
 from mitigating the DIV-leakage vulnerability.  By default, Xen will mitigate
 DIV-leakage on hardware believed to be vulnerable.
 
+If Xen is compiled with `CONFIG_SPECULATIVE_HARDEN_LOCK`, the `lock-harden=`
+boolean can be used to force or prevent Xen from using speculation barriers to
+protect lock critical regions.  This mitigation won't be engaged by default,
+and needs to be explicitly enabled on the command line.
+
 ### sync_console
 > `= <boolean>`
 
diff --git a/xen/arch/x86/include/asm/cpufeatures.h b/xen/arch/x86/include/asm/cpufeatures.h
index c3aad21c3b43..7e8221fd85dd 100644
--- a/xen/arch/x86/include/asm/cpufeatures.h
+++ b/xen/arch/x86/include/asm/cpufeatures.h
@@ -24,7 +24,7 @@ XEN_CPUFEATURE(APERFMPERF,        X86_SYNTH( 8)) /* APERFMPERF */
 XEN_CPUFEATURE(MFENCE_RDTSC,      X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */
 XEN_CPUFEATURE(XEN_SMEP,          X86_SYNTH(10)) /* SMEP gets used by Xen itself */
 XEN_CPUFEATURE(XEN_SMAP,          X86_SYNTH(11)) /* SMAP gets used by Xen itself */
-/* Bit 12 unused. */
+XEN_CPUFEATURE(SC_NO_LOCK_HARDEN, X86_SYNTH(12)) /* (Disable) Lock critical region hardening */
 XEN_CPUFEATURE(IND_THUNK_LFENCE,  X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */
 XEN_CPUFEATURE(IND_THUNK_JMP,     X86_SYNTH(14)) /* Use IND_THUNK_JMP */
 XEN_CPUFEATURE(SC_NO_BRANCH_HARDEN, X86_SYNTH(15)) /* (Disable) Conditional branch hardening */
diff --git a/xen/arch/x86/include/asm/nospec.h b/xen/arch/x86/include/asm/nospec.h
index 7150e76b87fb..0725839e1982 100644
--- a/xen/arch/x86/include/asm/nospec.h
+++ b/xen/arch/x86/include/asm/nospec.h
@@ -38,6 +38,32 @@ static always_inline void block_speculation(void)
     barrier_nospec_true();
 }
 
+static always_inline void arch_block_lock_speculation(void)
+{
+    alternative("lfence", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+}
+
+/* Allow to insert a read memory barrier into conditionals */
+static always_inline bool barrier_lock_true(void)
+{
+    alternative("lfence #nospec-true", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+    return true;
+}
+
+static always_inline bool barrier_lock_false(void)
+{
+    alternative("lfence #nospec-false", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+    return false;
+}
+
+static always_inline bool arch_lock_evaluate_nospec(bool condition)
+{
+    if ( condition )
+        return barrier_lock_true();
+    else
+        return barrier_lock_false();
+}
+
 #endif /* _ASM_X86_NOSPEC_H */
 
 /*
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 1ee81e2dfe79..ac21af2c5c0f 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -65,6 +65,7 @@ int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 static bool __initdata opt_branch_harden =
     IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH);
+static bool __initdata opt_lock_harden;
 
 bool __initdata bsp_delay_spec_ctrl;
 uint8_t __read_mostly default_xen_spec_ctrl;
@@ -133,6 +134,7 @@ static int __init cf_check parse_spec_ctrl(const char *s)
             opt_ssbd = false;
             opt_l1d_flush = 0;
             opt_branch_harden = false;
+            opt_lock_harden = false;
             opt_srb_lock = 0;
             opt_unpriv_mmio = false;
             opt_gds_mit = 0;
@@ -298,6 +300,16 @@ static int __init cf_check parse_spec_ctrl(const char *s)
                 rc = -EINVAL;
             }
         }
+        else if ( (val = parse_boolean("lock-harden", s, ss)) >= 0 )
+        {
+            if ( IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
+                opt_lock_harden = val;
+            else
+            {
+                no_config_param("SPECULATIVE_HARDEN_LOCK", "spec-ctrl", s, ss);
+                rc = -EINVAL;
+            }
+        }
         else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
             opt_srb_lock = val;
         else if ( (val = parse_boolean("unpriv-mmio", s, ss)) >= 0 )
@@ -500,7 +512,8 @@ static void __init print_details(enum ind_thunk thunk)
     if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) ||
          IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_ARRAY) ||
          IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) ||
-         IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) )
+         IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) ||
+         IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
         printk("  Compiled-in support:"
 #ifdef CONFIG_INDIRECT_THUNK
                " INDIRECT_THUNK"
@@ -516,11 +529,14 @@ static void __init print_details(enum ind_thunk thunk)
 #endif
 #ifdef CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS
                " HARDEN_GUEST_ACCESS"
+#endif
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+               " HARDEN_LOCK"
 #endif
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
+    printk("  Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s%s\n",
            thunk != THUNK_NONE      ? "BTI-Thunk: " : "",
            thunk == THUNK_NONE      ? "" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE, " :
@@ -547,7 +563,8 @@ static void __init print_details(enum ind_thunk thunk)
            opt_verw_pv || opt_verw_hvm ||
            opt_verw_mmio                             ? " VERW"  : "",
            opt_div_scrub                             ? " DIV" : "",
-           opt_branch_harden                         ? " BRANCH_HARDEN" : "");
+           opt_branch_harden                         ? " BRANCH_HARDEN" : "",
+           opt_lock_harden                           ? " LOCK_HARDEN" : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
     if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
@@ -1930,6 +1947,9 @@ void __init init_speculation_mitigations(void)
     if ( !opt_branch_harden )
         setup_force_cpu_cap(X86_FEATURE_SC_NO_BRANCH_HARDEN);
 
+    if ( !opt_lock_harden )
+        setup_force_cpu_cap(X86_FEATURE_SC_NO_LOCK_HARDEN);
+
     /*
      * We do not disable HT by default on affected hardware.
      *
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index e7794cb7f681..cd7385153823 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -173,6 +173,23 @@ config SPECULATIVE_HARDEN_GUEST_ACCESS
 
 	  If unsure, say Y.
 
+config SPECULATIVE_HARDEN_LOCK
+	bool "Speculative lock context hardening"
+	default y
+	depends on X86
+	help
+	  Contemporary processors may use speculative execution as a
+	  performance optimisation, but this can potentially be abused by an
+	  attacker to leak data via speculative sidechannels.
+
+	  One source of data leakage is via speculative accesses to lock
+	  critical regions.
+
+	  This option is disabled by default at run time, and needs to be
+	  enabled on the command line.
+
+	  If unsure, say Y.
+
 endmenu
 
 config DIT_DEFAULT
diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
index 76255bc46efe..455284640396 100644
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -70,6 +70,21 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 #define array_access_nospec(array, index)                               \
     (array)[array_index_nospec(index, ARRAY_SIZE(array))]
 
+static always_inline void block_lock_speculation(void)
+{
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+    arch_block_lock_speculation();
+#endif
+}
+
+static always_inline bool lock_evaluate_nospec(bool condition)
+{
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+    return arch_lock_evaluate_nospec(condition);
+#endif
+    return condition;
+}
+
 #endif /* XEN_NOSPEC_H */
 
 /*
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index 961891bea4d5..daf48fdea709 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -1,6 +1,7 @@
 #ifndef __SPINLOCK_H__
 #define __SPINLOCK_H__
 
+#include <xen/nospec.h>
 #include <xen/time.h>
 #include <asm/system.h>
 #include <asm/spinlock.h>
@@ -189,13 +190,30 @@ int _spin_trylock_recursive(spinlock_t *lock);
 void _spin_lock_recursive(spinlock_t *lock);
 void _spin_unlock_recursive(spinlock_t *lock);
 
-#define spin_lock(l)                  _spin_lock(l)
-#define spin_lock_cb(l, c, d)         _spin_lock_cb(l, c, d)
-#define spin_lock_irq(l)              _spin_lock_irq(l)
+static always_inline void spin_lock(spinlock_t *l)
+{
+    _spin_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void spin_lock_cb(spinlock_t *l, void (*c)(void *data),
+                                       void *d)
+{
+    _spin_lock_cb(l, c, d);
+    block_lock_speculation();
+}
+
+static always_inline void spin_lock_irq(spinlock_t *l)
+{
+    _spin_lock_irq(l);
+    block_lock_speculation();
+}
+
 #define spin_lock_irqsave(l, f)                                 \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _spin_lock_irqsave(l));                          \
+        block_lock_speculation();                               \
     })
 
 #define spin_unlock(l)                _spin_unlock(l)
@@ -203,7 +221,7 @@ void _spin_unlock_recursive(spinlock_t *lock);
 #define spin_unlock_irqrestore(l, f)  _spin_unlock_irqrestore(l, f)
 
 #define spin_is_locked(l)             _spin_is_locked(l)
-#define spin_trylock(l)               _spin_trylock(l)
+#define spin_trylock(l)               lock_evaluate_nospec(_spin_trylock(l))
 
 #define spin_trylock_irqsave(lock, flags)       \
 ({                                              \
@@ -224,8 +242,15 @@ void _spin_unlock_recursive(spinlock_t *lock);
  * are any critical regions that cannot form part of such a set, they can use
  * standard spin_[un]lock().
  */
-#define spin_trylock_recursive(l)     _spin_trylock_recursive(l)
-#define spin_lock_recursive(l)        _spin_lock_recursive(l)
+#define spin_trylock_recursive(l) \
+    lock_evaluate_nospec(_spin_trylock_recursive(l))
+
+static always_inline void spin_lock_recursive(spinlock_t *l)
+{
+    _spin_lock_recursive(l);
+    block_lock_speculation();
+}
+
 #define spin_unlock_recursive(l)      _spin_unlock_recursive(l)
 
 #endif /* __SPINLOCK_H__ */
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: rwlock: introduce support for blocking speculation into critical
 regions

Introduce inline wrappers as required and add direct calls to
block_lock_speculation() in order to prevent speculation into the rwlock
protected critical regions.

Note the rwlock primitives are adjusted to use the non-speculation-safe
variants of the spinlock handlers, as a speculation barrier is added in the
rwlock calling wrappers.

trylock variants are protected by using lock_evaluate_nospec().

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit a1fb15f61692b1fa9945fc51f55471ace49cdd59)

diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index aa15529bbe8c..cda06b9d6ece 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -34,8 +34,11 @@ void queue_read_lock_slowpath(rwlock_t *lock)
 
     /*
      * Put the reader into the wait queue.
+     *
+     * Use the speculation unsafe helper, as it's the caller responsibility to
+     * issue a speculation barrier if required.
      */
-    spin_lock(&lock->lock);
+    _spin_lock(&lock->lock);
 
     /*
      * At the head of the wait queue now, wait until the writer state
@@ -64,8 +67,13 @@ void queue_write_lock_slowpath(rwlock_t *lock)
 {
     u32 cnts;
 
-    /* Put the writer into the wait queue. */
-    spin_lock(&lock->lock);
+    /*
+     * Put the writer into the wait queue.
+     *
+     * Use the speculation unsafe helper, as it's the caller responsibility to
+     * issue a speculation barrier if required.
+     */
+    _spin_lock(&lock->lock);
 
     /* Try to acquire the lock directly if no reader is present. */
     if ( !atomic_read(&lock->cnts) &&
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index 0cc9167715b3..fd0458be94ae 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -247,27 +247,49 @@ static inline int _rw_is_write_locked(rwlock_t *lock)
     return (atomic_read(&lock->cnts) & _QW_WMASK) == _QW_LOCKED;
 }
 
-#define read_lock(l)                  _read_lock(l)
-#define read_lock_irq(l)              _read_lock_irq(l)
+static always_inline void read_lock(rwlock_t *l)
+{
+    _read_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void read_lock_irq(rwlock_t *l)
+{
+    _read_lock_irq(l);
+    block_lock_speculation();
+}
+
 #define read_lock_irqsave(l, f)                                 \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _read_lock_irqsave(l));                          \
+        block_lock_speculation();                               \
     })
 
 #define read_unlock(l)                _read_unlock(l)
 #define read_unlock_irq(l)            _read_unlock_irq(l)
 #define read_unlock_irqrestore(l, f)  _read_unlock_irqrestore(l, f)
-#define read_trylock(l)               _read_trylock(l)
+#define read_trylock(l)               lock_evaluate_nospec(_read_trylock(l))
+
+static always_inline void write_lock(rwlock_t *l)
+{
+    _write_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void write_lock_irq(rwlock_t *l)
+{
+    _write_lock_irq(l);
+    block_lock_speculation();
+}
 
-#define write_lock(l)                 _write_lock(l)
-#define write_lock_irq(l)             _write_lock_irq(l)
 #define write_lock_irqsave(l, f)                                \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _write_lock_irqsave(l));                         \
+        block_lock_speculation();                               \
     })
-#define write_trylock(l)              _write_trylock(l)
+#define write_trylock(l)              lock_evaluate_nospec(_write_trylock(l))
 
 #define write_unlock(l)               _write_unlock(l)
 #define write_unlock_irq(l)           _write_unlock_irq(l)
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: percpu-rwlock: introduce support for blocking speculation into
 critical regions

Add direct calls to block_lock_speculation() where required in order to prevent
speculation into the lock protected critical regions.  Also convert
_percpu_read_lock() from inline to always_inline.

Note that _percpu_write_lock() has been modified to use the non-speculation-safe
variant of the locking primitives, as a speculation barrier is added
unconditionally by the calling wrapper.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit f218daf6d3a3b847736d37c6a6b76031a0d08441)

diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index cda06b9d6ece..4da0ed8fadb0 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -125,8 +125,12 @@ void _percpu_write_lock(percpu_rwlock_t **per_cpudata,
     /*
      * First take the write lock to protect against other writers or slow
      * path readers.
+     *
+     * Note we use the speculation unsafe variant of write_lock(), as the
+     * calling wrapper already adds a speculation barrier after the lock has
+     * been taken.
      */
-    write_lock(&percpu_rwlock->rwlock);
+    _write_lock(&percpu_rwlock->rwlock);
 
     /* Now set the global variable so that readers start using read_lock. */
     percpu_rwlock->writer_activating = 1;
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index fd0458be94ae..abe0804bf7d5 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -326,8 +326,8 @@ static inline void _percpu_rwlock_owner_check(percpu_rwlock_t **per_cpudata,
 #define percpu_rwlock_resource_init(l, owner) \
     (*(l) = (percpu_rwlock_t)PERCPU_RW_LOCK_UNLOCKED(&get_per_cpu_var(owner)))
 
-static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
-                                         percpu_rwlock_t *percpu_rwlock)
+static always_inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
+                                            percpu_rwlock_t *percpu_rwlock)
 {
     /* Validate the correct per_cpudata variable has been provided. */
     _percpu_rwlock_owner_check(per_cpudata, percpu_rwlock);
@@ -362,6 +362,8 @@ static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
     }
     else
     {
+        /* Other branch already has a speculation barrier in read_lock(). */
+        block_lock_speculation();
         /* All other paths have implicit check_lock() calls via read_lock(). */
         check_lock(&percpu_rwlock->rwlock.lock.debug, false);
     }
@@ -410,8 +412,12 @@ static inline void _percpu_write_unlock(percpu_rwlock_t **per_cpudata,
     _percpu_read_lock(&get_per_cpu_var(percpu), lock)
 #define percpu_read_unlock(percpu, lock) \
     _percpu_read_unlock(&get_per_cpu_var(percpu), lock)
-#define percpu_write_lock(percpu, lock) \
-    _percpu_write_lock(&get_per_cpu_var(percpu), lock)
+
+#define percpu_write_lock(percpu, lock)                 \
+({                                                      \
+    _percpu_write_lock(&get_per_cpu_var(percpu), lock); \
+    block_lock_speculation();                           \
+})
 #define percpu_write_unlock(percpu, lock) \
     _percpu_write_unlock(&get_per_cpu_var(percpu), lock)
 
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: locking: attempt to ensure lock wrappers are always inline

This is done in order to prevent the locking speculation barriers from ending
up inside `call`ed functions that could be speculatively bypassed.

While there, also add an extra locking barrier to _mm_write_lock() in the branch
taken when the lock is already held.

Note some functions are switched to use the unsafe variants (without speculation
barrier) of the locking primitives, but a speculation barrier is always added
to the exposed public lock wrapping helper.  That's the case with
sched_spin_lock_double() and pcidevs_lock(), for example.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 197ecd838a2aaf959a469df3696d4559c4f8b762)
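
A sketch of the hazard being avoided (the bad_/good_ helpers are
hypothetical):

    /*
     * If the wrapper is an ordinary function it may be emitted out of line;
     * mis-speculating past the `call` then skips the barrier together with
     * the lock acquisition.
     */
    static void bad_lock_wrapper(spinlock_t *l)
    {
        _spin_lock(l);
        block_lock_speculation();   /* skipped if the call itself is bypassed */
    }

    /* Forcing inlining keeps the barrier at the caller's lock site. */
    static always_inline void good_lock_wrapper(spinlock_t *l)
    {
        _spin_lock(l);
        block_lock_speculation();
    }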

diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index cb1d81bf9e82..66f10952456b 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -161,7 +161,7 @@ static int pt_irq_masked(struct periodic_time *pt)
  * pt->vcpu field, because another thread holding the pt_migrate lock
  * may already be spinning waiting for your vcpu lock.
  */
-static void pt_vcpu_lock(struct vcpu *v)
+static always_inline void pt_vcpu_lock(struct vcpu *v)
 {
     spin_lock(&v->arch.hvm.tm_lock);
 }
@@ -180,9 +180,13 @@ static void pt_vcpu_unlock(struct vcpu *v)
  * need to take an additional lock that protects against pt->vcpu
  * changing.
  */
-static void pt_lock(struct periodic_time *pt)
+static always_inline void pt_lock(struct periodic_time *pt)
 {
-    read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
+    /*
+     * Use the speculation unsafe variant for the first lock, as the following
+     * lock taking helper already includes a speculation barrier.
+     */
+    _read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
     spin_lock(&pt->vcpu->arch.hvm.tm_lock);
 }
 
diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index f6a0207a8087..823d627fd001 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -178,6 +178,7 @@ void cf_check irq_complete_move(struct irq_desc *);
 
 extern struct irq_desc *irq_desc;
 
+/* Not speculation safe, only used for AP bringup. */
 void lock_vector_lock(void);
 void unlock_vector_lock(void);
 
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index c1523aeccf99..265239c49f39 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -86,8 +86,8 @@ static inline void _set_lock_level(int l)
     this_cpu(mm_lock_level) = l;
 }
 
-static inline void _mm_lock(const struct domain *d, mm_lock_t *l,
-                            const char *func, int level, int rec)
+static always_inline void _mm_lock(const struct domain *d, mm_lock_t *l,
+                                   const char *func, int level, int rec)
 {
     if ( !((mm_locked_by_me(l)) && rec) )
         _check_lock_level(d, level);
@@ -137,8 +137,8 @@ static inline int mm_write_locked_by_me(mm_rwlock_t *l)
     return (l->locker == get_processor_id());
 }
 
-static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
-                                  const char *func, int level)
+static always_inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
+                                         const char *func, int level)
 {
     if ( !mm_write_locked_by_me(l) )
     {
@@ -149,6 +149,8 @@ static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
         l->unlock_level = _get_lock_level();
         _set_lock_level(_lock_level(d, level));
     }
+    else
+        block_speculation();
     l->recurse_count++;
 }
 
@@ -162,8 +164,8 @@ static inline void mm_write_unlock(mm_rwlock_t *l)
     percpu_write_unlock(p2m_percpu_rwlock, &l->lock);
 }
 
-static inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
-                                 int level)
+static always_inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
+                                        int level)
 {
     _check_lock_level(d, level);
     percpu_read_lock(p2m_percpu_rwlock, &l->lock);
@@ -178,15 +180,15 @@ static inline void mm_read_unlock(mm_rwlock_t *l)
 
 /* This wrapper uses the line number to express the locking order below */
 #define declare_mm_lock(name)                                                 \
-    static inline void mm_lock_##name(const struct domain *d, mm_lock_t *l,   \
-                                      const char *func, int rec)              \
+    static always_inline void mm_lock_##name(                                 \
+        const struct domain *d, mm_lock_t *l, const char *func, int rec)      \
     { _mm_lock(d, l, func, MM_LOCK_ORDER_##name, rec); }
 #define declare_mm_rwlock(name)                                               \
-    static inline void mm_write_lock_##name(const struct domain *d,           \
-                                            mm_rwlock_t *l, const char *func) \
+    static always_inline void mm_write_lock_##name(                           \
+        const struct domain *d, mm_rwlock_t *l, const char *func)             \
     { _mm_write_lock(d, l, func, MM_LOCK_ORDER_##name); }                     \
-    static inline void mm_read_lock_##name(const struct domain *d,            \
-                                           mm_rwlock_t *l)                    \
+    static always_inline void mm_read_lock_##name(const struct domain *d,     \
+                                                  mm_rwlock_t *l)             \
     { _mm_read_lock(d, l, MM_LOCK_ORDER_##name); }
 /* These capture the name of the calling function */
 #define mm_lock(name, d, l) mm_lock_##name(d, l, __func__, 0)
@@ -321,7 +323,7 @@ declare_mm_lock(altp2mlist)
 #define MM_LOCK_ORDER_altp2m                 40
 declare_mm_rwlock(altp2m);
 
-static inline void p2m_lock(struct p2m_domain *p)
+static always_inline void p2m_lock(struct p2m_domain *p)
 {
     if ( p2m_is_altp2m(p) )
         mm_write_lock(altp2m, p->domain, &p->lock);
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index fc110506dce2..99dbcb3101e2 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -36,7 +36,7 @@
 #define superpage_aligned(_x)  (((_x)&(SUPERPAGE_PAGES-1))==0)
 
 /* Enforce lock ordering when grabbing the "external" page_alloc lock */
-static inline void lock_page_alloc(struct p2m_domain *p2m)
+static always_inline void lock_page_alloc(struct p2m_domain *p2m)
 {
     page_alloc_mm_pre_lock(p2m->domain);
     spin_lock(&(p2m->domain->page_alloc_lock));
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index f5e0b12d1520..dada9f15f574 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -62,7 +62,7 @@
  * just assume the event channel is free or unbound at the moment when the
  * evtchn_read_trylock() returns false.
  */
-static inline void evtchn_write_lock(struct evtchn *evtchn)
+static always_inline void evtchn_write_lock(struct evtchn *evtchn)
 {
     write_lock(&evtchn->lock);
 
@@ -364,7 +364,8 @@ int evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc, evtchn_port_t port)
     return rc;
 }
 
-static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn)
+static always_inline void double_evtchn_lock(struct evtchn *lchn,
+                                             struct evtchn *rchn)
 {
     ASSERT(lchn != rchn);
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index ee7cc496b8cb..62a8685cd514 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -410,7 +410,7 @@ static inline void act_set_gfn(struct active_grant_entry *act, gfn_t gfn)
 
 static DEFINE_PERCPU_RWLOCK_GLOBAL(grant_rwlock);
 
-static inline void grant_read_lock(struct grant_table *gt)
+static always_inline void grant_read_lock(struct grant_table *gt)
 {
     percpu_read_lock(grant_rwlock, &gt->lock);
 }
@@ -420,7 +420,7 @@ static inline void grant_read_unlock(struct grant_table *gt)
     percpu_read_unlock(grant_rwlock, &gt->lock);
 }
 
-static inline void grant_write_lock(struct grant_table *gt)
+static always_inline void grant_write_lock(struct grant_table *gt)
 {
     percpu_write_lock(grant_rwlock, &gt->lock);
 }
@@ -457,7 +457,7 @@ nr_active_grant_frames(struct grant_table *gt)
     return num_act_frames_from_sha_frames(nr_grant_frames(gt));
 }
 
-static inline struct active_grant_entry *
+static always_inline struct active_grant_entry *
 active_entry_acquire(struct grant_table *t, grant_ref_t e)
 {
     struct active_grant_entry *act;
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 078beb1adbbd..29bbab5ac6fd 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -348,23 +348,28 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
  * This avoids dead- or live-locks when this code is running on both
  * cpus at the same time.
  */
-static void sched_spin_lock_double(spinlock_t *lock1, spinlock_t *lock2,
-                                   unsigned long *flags)
+static always_inline void sched_spin_lock_double(
+    spinlock_t *lock1, spinlock_t *lock2, unsigned long *flags)
 {
+    /*
+     * In order to avoid extra overhead, use the locking primitives without the
+     * speculation barrier, and introduce a single barrier here.
+     */
     if ( lock1 == lock2 )
     {
-        spin_lock_irqsave(lock1, *flags);
+        *flags = _spin_lock_irqsave(lock1);
     }
     else if ( lock1 < lock2 )
     {
-        spin_lock_irqsave(lock1, *flags);
-        spin_lock(lock2);
+        *flags = _spin_lock_irqsave(lock1);
+        _spin_lock(lock2);
     }
     else
     {
-        spin_lock_irqsave(lock2, *flags);
-        spin_lock(lock1);
+        *flags = _spin_lock_irqsave(lock2);
+        _spin_lock(lock1);
     }
+    block_lock_speculation();
 }
 
 static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index 0527a8c70d1c..24a93dd0c123 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -207,8 +207,24 @@ DECLARE_PER_CPU(cpumask_t, cpumask_scratch);
 #define cpumask_scratch        (&this_cpu(cpumask_scratch))
 #define cpumask_scratch_cpu(c) (&per_cpu(cpumask_scratch, c))
 
+/*
+ * Deal with _spin_lock_irqsave() returning the flags value instead of storing
+ * it in a passed parameter.
+ */
+#define _sched_spinlock0(lock, irq) _spin_lock##irq(lock)
+#define _sched_spinlock1(lock, irq, arg) ({ \
+    BUILD_BUG_ON(sizeof(arg) != sizeof(unsigned long)); \
+    (arg) = _spin_lock##irq(lock); \
+})
+
+#define _sched_spinlock__(nr) _sched_spinlock ## nr
+#define _sched_spinlock_(nr)  _sched_spinlock__(nr)
+#define _sched_spinlock(lock, irq, args...) \
+    _sched_spinlock_(count_args(args))(lock, irq, ## args)
+
 #define sched_lock(kind, param, cpu, irq, arg...) \
-static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
+static always_inline spinlock_t \
+*kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
 { \
     for ( ; ; ) \
     { \
@@ -220,10 +236,16 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
          * \
          * It may also be the case that v->processor may change but the \
          * lock may be the same; this will succeed in that case. \
+         * \
+         * Use the speculation unsafe locking helper, there's a speculation \
+         * barrier before returning to the caller. \
          */ \
-        spin_lock##irq(lock, ## arg); \
+        _sched_spinlock(lock, irq, ## arg); \
         if ( likely(lock == get_sched_res(cpu)->schedule_lock) ) \
+        { \
+            block_lock_speculation(); \
             return lock; \
+        } \
         spin_unlock##irq(lock, ## arg); \
     } \
 }
diff --git a/xen/common/timer.c b/xen/common/timer.c
index 9b5016d5ed82..459668d417f4 100644
--- a/xen/common/timer.c
+++ b/xen/common/timer.c
@@ -240,7 +240,7 @@ static inline void deactivate_timer(struct timer *timer)
     list_add(&timer->inactive, &per_cpu(timers, timer->cpu).inactive);
 }
 
-static inline bool_t timer_lock(struct timer *timer)
+static inline bool_t timer_lock_unsafe(struct timer *timer)
 {
     unsigned int cpu;
 
@@ -254,7 +254,8 @@ static inline bool_t timer_lock(struct timer *timer)
             rcu_read_unlock(&timer_cpu_read_lock);
             return 0;
         }
-        spin_lock(&per_cpu(timers, cpu).lock);
+        /* Use the speculation unsafe variant, the wrapper has the barrier. */
+        _spin_lock(&per_cpu(timers, cpu).lock);
         if ( likely(timer->cpu == cpu) )
             break;
         spin_unlock(&per_cpu(timers, cpu).lock);
@@ -267,8 +268,9 @@ static inline bool_t timer_lock(struct timer *timer)
 #define timer_lock_irqsave(t, flags) ({         \
     bool_t __x;                                 \
     local_irq_save(flags);                      \
-    if ( !(__x = timer_lock(t)) )               \
+    if ( !(__x = timer_lock_unsafe(t)) )        \
         local_irq_restore(flags);               \
+    block_lock_speculation();                   \
     __x;                                        \
 })
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 8c62b14d19c1..1b3d28516643 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -52,9 +52,10 @@ struct pci_seg {
 
 static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
 
-void pcidevs_lock(void)
+/* Do not use, as it has no speculation barrier, use pcidevs_lock() instead. */
+void pcidevs_lock_unsafe(void)
 {
-    spin_lock_recursive(&_pcidevs_lock);
+    _spin_lock_recursive(&_pcidevs_lock);
 }
 
 void pcidevs_unlock(void)
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 8eae9984a9f1..dd96e84c6956 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -114,12 +114,12 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
 #define bucket_from_port(d, p) \
     ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
 
-static inline void evtchn_read_lock(struct evtchn *evtchn)
+static always_inline void evtchn_read_lock(struct evtchn *evtchn)
 {
     read_lock(&evtchn->lock);
 }
 
-static inline bool evtchn_read_trylock(struct evtchn *evtchn)
+static always_inline bool evtchn_read_trylock(struct evtchn *evtchn)
 {
     return read_trylock(&evtchn->lock);
 }
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 5975ca2f3032..b373f139d136 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -155,8 +155,12 @@ struct pci_dev {
  * devices, it also sync the access to the msi capability that is not
  * interrupt handling related (the mask bit register).
  */
-
-void pcidevs_lock(void);
+void pcidevs_lock_unsafe(void);
+static always_inline void pcidevs_lock(void)
+{
+    pcidevs_lock_unsafe();
+    block_lock_speculation();
+}
 void pcidevs_unlock(void);
 bool_t __must_check pcidevs_locked(void);
 
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: x86/mm: add speculation barriers to open coded locks

Add a speculation barrier to the clearly identified open-coded lock taking
functions.
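
Two shapes are handled, condensed below as a sketch (no code beyond what the
hunks that follow introduce): a void open-coded lock such as l3t_lock() simply
gains a trailing barrier, while page_lock(), which reports success or failure,
is renamed and wrapped so the barrier is tied to evaluating its result.

    /* Open-coded lock with no return value: barrier after the cmpxchg loop. */
    static always_inline void l3t_lock(struct page_info *page)
    {
        unsigned long x, nx;

        do {
            while ( (x = page->u.inuse.type_info) & PGT_locked )
                cpu_relax();
            nx = x | PGT_locked;
        } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );

        block_lock_speculation();
    }

    /* Lock reporting success/failure: barrier folded into the result check. */
    #define page_lock(pg)   lock_evaluate_nospec(page_lock_unsafe(pg))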

Note that the memory sharing page_lock() replacement (_page_lock()) is left
as-is, as the code is experimental and not security supported.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 42a572a38e22a97d86a4b648a22597628d5b42e4)

diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
index a5d7fdd32ea7..5845b729c3f7 100644
--- a/xen/arch/x86/include/asm/mm.h
+++ b/xen/arch/x86/include/asm/mm.h
@@ -393,7 +393,9 @@ const struct platform_bad_page *get_platform_badpages(unsigned int *array_size);
  * The use of PGT_locked in mem_sharing does not collide, since mem_sharing is
  * only supported for hvm guests, which do not have PV PTEs updated.
  */
-int page_lock(struct page_info *page);
+int page_lock_unsafe(struct page_info *page);
+#define page_lock(pg)   lock_evaluate_nospec(page_lock_unsafe(pg))
+
 void page_unlock(struct page_info *page);
 
 void put_page_type(struct page_info *page);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 330c4abcd10e..8d19d719bd16 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2033,7 +2033,7 @@ static inline bool current_locked_page_ne_check(struct page_info *page) {
 #define current_locked_page_ne_check(x) true
 #endif
 
-int page_lock(struct page_info *page)
+int page_lock_unsafe(struct page_info *page)
 {
     unsigned long x, nx;
 
@@ -2094,7 +2094,7 @@ void page_unlock(struct page_info *page)
  * l3t_lock(), so to avoid deadlock we must avoid grabbing them in
  * reverse order.
  */
-static void l3t_lock(struct page_info *page)
+static always_inline void l3t_lock(struct page_info *page)
 {
     unsigned long x, nx;
 
@@ -2103,6 +2103,8 @@ static void l3t_lock(struct page_info *page)
             cpu_relax();
         nx = x | PGT_locked;
     } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
+
+    block_lock_speculation();
 }
 
 static void l3t_unlock(struct page_info *page)
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: x86: protect conditional lock taking from speculative execution

Conditionally taken locks that use the pattern:

if ( lock )
    spin_lock(...);

need an else branch in order to issue a speculation barrier in the else case,
just like it is done when the lock needs to be acquired.

evaluate_nospec() could be used on the condition itself, but that would result
in a double barrier on the branch where the lock is taken.

Introduce a new pair of helpers, {gfn,spin}_lock_if() that can be used to
conditionally take a lock in a speculation safe way.
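
The call-site transformation then has the following shape (a sketch of the
pattern the hunks below apply):

    /* Before: the "lock not taken" path carries no speculation barrier. */
    if ( locking )
        spin_lock(&map_pgdir_lock);

    /* After: both outcomes are serialised, and the taken-lock path still only
     * pays for a single barrier (issued inside spin_lock_if()). */
    spin_lock_if(locking, &map_pgdir_lock);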

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 03cf7ca23e0e876075954c558485b267b7d02406)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 8d19d719bd16..d31b8d56ffbc 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5023,8 +5023,7 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
         if ( !l3t )
             return NULL;
         UNMAP_DOMAIN_PAGE(l3t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
         {
             l4_pgentry_t l4e = l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
@@ -5061,8 +5060,7 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
             return NULL;
         }
         UNMAP_DOMAIN_PAGE(l2t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
             l3e_write(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
@@ -5100,8 +5098,7 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
             return NULL;
         }
         UNMAP_DOMAIN_PAGE(l1t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
             l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
@@ -5132,6 +5129,8 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
     do {                      \
         if ( locking )        \
             l3t_lock(page);   \
+        else                            \
+            block_lock_speculation();   \
     } while ( false )
 
 #define L3T_UNLOCK(page)                           \
@@ -5347,8 +5346,7 @@ int map_pages_to_xen(
             if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
                 flush_flags |= FLUSH_TLB_GLOBAL;
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
@@ -5452,8 +5450,7 @@ int map_pages_to_xen(
                 if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL )
                     flush_flags |= FLUSH_TLB_GLOBAL;
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
@@ -5494,8 +5491,7 @@ int map_pages_to_xen(
                 unsigned long base_mfn;
                 const l1_pgentry_t *l1t;
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
 
                 ol2e = *pl2e;
                 /*
@@ -5549,8 +5545,7 @@ int map_pages_to_xen(
             unsigned long base_mfn;
             const l2_pgentry_t *l2t;
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
 
             ol3e = *pl3e;
             /*
@@ -5694,8 +5689,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                                        l3e_get_flags(*pl3e)));
             UNMAP_DOMAIN_PAGE(l2t);
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
@@ -5754,8 +5748,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                                            l2e_get_flags(*pl2e) & ~_PAGE_PSE));
                 UNMAP_DOMAIN_PAGE(l1t);
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
@@ -5799,8 +5792,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
              */
             if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
                 continue;
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
 
             /*
              * L2E may be already cleared, or set to a superpage, by
@@ -5847,8 +5839,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
         if ( (nf & _PAGE_PRESENT) ||
              ((v != e) && (l2_table_offset(v) + l1_table_offset(v) != 0)) )
             continue;
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
 
         /*
          * L3E may be already cleared, or set to a superpage, by
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 265239c49f39..3ea2d8eb032c 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -347,6 +347,15 @@ static inline void p2m_unlock(struct p2m_domain *p)
 #define p2m_locked_by_me(p)   mm_write_locked_by_me(&(p)->lock)
 #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
 
+static always_inline void gfn_lock_if(bool condition, struct p2m_domain *p2m,
+                                      gfn_t gfn, unsigned int order)
+{
+    if ( condition )
+        gfn_lock(p2m, gfn, order);
+    else
+        block_lock_speculation();
+}
+
 /* PoD lock (per-p2m-table)
  *
  * Protects private PoD data structs: entry and cache
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index b28c899b5ea7..1fa9e01012a2 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -292,9 +292,8 @@ mfn_t p2m_get_gfn_type_access(struct p2m_domain *p2m, gfn_t gfn,
     if ( q & P2M_UNSHARE )
         q |= P2M_ALLOC;
 
-    if ( locked )
-        /* Grab the lock here, don't release until put_gfn */
-        gfn_lock(p2m, gfn, 0);
+    /* Grab the lock here, don't release until put_gfn */
+    gfn_lock_if(locked, p2m, gfn, 0);
 
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index daf48fdea709..7e75d0e2e7fb 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -216,6 +216,14 @@ static always_inline void spin_lock_irq(spinlock_t *l)
         block_lock_speculation();                               \
     })
 
+/* Conditionally take a spinlock in a speculation safe way. */
+static always_inline void spin_lock_if(bool condition, spinlock_t *l)
+{
+    if ( condition )
+        _spin_lock(l);
+    block_lock_speculation();
+}
+
 #define spin_unlock(l)                _spin_unlock(l)
 #define spin_unlock_irq(l)            _spin_unlock_irq(l)
 #define spin_unlock_irqrestore(l, f)  _spin_unlock_irqrestore(l, f)
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: xen: Swap order of actions in the FREE*() macros

Wherever possible, it is a good idea to NULL out the visible reference to an
object prior to freeing it.  The FREE*() macros already collect together both
parts, making it easy to adjust.

This has a marginal code generation improvement, as some of the calls to the
free() function can be tailcall optimised.
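
For example (a hypothetical structure and helper, shown purely for
illustration): with the visible pointer cleared before the call, xfree() is the
last action in the expansion, so a helper ending in XFREE() can have that call
emitted as a tail call, and no stale pointer is visible in between.

    /* Hypothetical structure and helper, for illustration only. */
    struct foo {
        void *buf;
    };

    static void foo_teardown(struct foo *f)
    {
        XFREE(f->buf);   /* f->buf is NULLed first; xfree() may become a tail call */
    }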

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c4f427ec879e7c0df6d44d02561e8bee838a293e)

diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 8b9618609f77..8bc5f4249d1b 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -91,8 +91,9 @@ bool scrub_free_pages(void);
 
 /* Free an allocation, and zero the pointer to it. */
 #define FREE_XENHEAP_PAGES(p, o) do { \
-    free_xenheap_pages(p, o);         \
+    void *_ptr_ = (p);                \
     (p) = NULL;                       \
+    free_xenheap_pages(_ptr_, o);     \
 } while ( false )
 #define FREE_XENHEAP_PAGE(p) FREE_XENHEAP_PAGES(p, 0)
 
diff --git a/xen/include/xen/xmalloc.h b/xen/include/xen/xmalloc.h
index 16979a117c6a..d857298011c1 100644
--- a/xen/include/xen/xmalloc.h
+++ b/xen/include/xen/xmalloc.h
@@ -66,9 +66,10 @@
 extern void xfree(void *);
 
 /* Free an allocation, and zero the pointer to it. */
-#define XFREE(p) do { \
-    xfree(p);         \
-    (p) = NULL;       \
+#define XFREE(p) do {                       \
+    void *_ptr_ = (p);                      \
+    (p) = NULL;                             \
+    xfree(_ptr_);                           \
 } while ( false )
 
 /* Underlying functions */

From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: x86/spinlock: introduce support for blocking speculation into
 critical regions

Introduce a new Kconfig option to block speculation into lock protected
critical regions.  The Kconfig option is enabled by default, but the mitigation
won't be engaged unless it's explicitly enabled on the command line using
`spec-ctrl=lock-harden`.
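
For example (an illustrative GRUB-style entry fragment; the binary path and the
elided options are placeholders), the mitigation is engaged by appending the
option to Xen's command line:

    multiboot2 /boot/xen.gz ... spec-ctrl=lock-harden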

Convert the spinlock acquire macros into always-inline functions, and introduce
a speculation barrier after the lock has been taken.  Note the speculation
barrier is not placed inside the implementation of the spin lock functions, so
as to prevent speculation from falling through the call to the lock functions
and thereby skipping the barrier as well.

trylock variants are protected using a construct akin to the existing
evaluate_nospec().
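
Put together, the conversion has the shape below (a condensed sketch of the
spinlock.h hunk further down).  Because the wrapper is always-inline, the
barrier is emitted at every call site after the out-of-line _spin_lock()
returns, so speculation that bypasses the call/ret cannot also bypass the
barrier; trylock results are instead evaluated through lock_evaluate_nospec().

    static always_inline void spin_lock(spinlock_t *l)
    {
        _spin_lock(l);              /* out-of-line acquisition, no barrier */
        block_lock_speculation();   /* lfence when lock-harden is enabled,
                                       nothing otherwise */
    }

    #define spin_trylock(l)  lock_evaluate_nospec(_spin_trylock(l))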

This patch only implements the speculation barrier for x86.

Note spin locks are the only locking primitive taken care of in this change;
further locking primitives will be adjusted by separate changes.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 7ef0084418e188d05f338c3e028fbbe8b6924afa)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index fbf16839249a..3f9f9167182f 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -2373,7 +2373,7 @@ By default SSBD will be mitigated at runtime (i.e `ssbd=runtime`).
 >              {msr-sc,rsb,verw,ibpb-entry}=<bool>|{pv,hvm}=<bool>,
 >              bti-thunk=retpoline|lfence|jmp, {ibrs,ibpb,ssbd,psfd,
 >              eager-fpu,l1d-flush,branch-harden,srb-lock,
->              unpriv-mmio,gds-mit,div-scrub}=<bool> ]`
+>              unpriv-mmio,gds-mit,div-scrub,lock-harden}=<bool> ]`
 
 Controls for speculative execution sidechannel mitigations.  By default, Xen
 will pick the most appropriate mitigations based on compiled in support,
@@ -2500,6 +2500,11 @@ On all hardware, the `div-scrub=` option can be used to force or prevent Xen
 from mitigating the DIV-leakage vulnerability.  By default, Xen will mitigate
 DIV-leakage on hardware believed to be vulnerable.
 
+If Xen is compiled with `CONFIG_SPECULATIVE_HARDEN_LOCK`, the `lock-harden=`
+boolean can be used to force or prevent Xen from using speculation barriers to
+protect lock critical regions.  This mitigation won't be engaged by default,
+and needs to be explicitly enabled on the command line.
+
 ### sync_console
 > `= <boolean>`
 
diff --git a/xen/arch/x86/include/asm/cpufeatures.h b/xen/arch/x86/include/asm/cpufeatures.h
index c3aad21c3b43..7e8221fd85dd 100644
--- a/xen/arch/x86/include/asm/cpufeatures.h
+++ b/xen/arch/x86/include/asm/cpufeatures.h
@@ -24,7 +24,7 @@ XEN_CPUFEATURE(APERFMPERF,        X86_SYNTH( 8)) /* APERFMPERF */
 XEN_CPUFEATURE(MFENCE_RDTSC,      X86_SYNTH( 9)) /* MFENCE synchronizes RDTSC */
 XEN_CPUFEATURE(XEN_SMEP,          X86_SYNTH(10)) /* SMEP gets used by Xen itself */
 XEN_CPUFEATURE(XEN_SMAP,          X86_SYNTH(11)) /* SMAP gets used by Xen itself */
-/* Bit 12 unused. */
+XEN_CPUFEATURE(SC_NO_LOCK_HARDEN, X86_SYNTH(12)) /* (Disable) Lock critical region hardening */
 XEN_CPUFEATURE(IND_THUNK_LFENCE,  X86_SYNTH(13)) /* Use IND_THUNK_LFENCE */
 XEN_CPUFEATURE(IND_THUNK_JMP,     X86_SYNTH(14)) /* Use IND_THUNK_JMP */
 XEN_CPUFEATURE(SC_NO_BRANCH_HARDEN, X86_SYNTH(15)) /* (Disable) Conditional branch hardening */
diff --git a/xen/arch/x86/include/asm/nospec.h b/xen/arch/x86/include/asm/nospec.h
index 7150e76b87fb..0725839e1982 100644
--- a/xen/arch/x86/include/asm/nospec.h
+++ b/xen/arch/x86/include/asm/nospec.h
@@ -38,6 +38,32 @@ static always_inline void block_speculation(void)
     barrier_nospec_true();
 }
 
+static always_inline void arch_block_lock_speculation(void)
+{
+    alternative("lfence", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+}
+
+/* Allow to insert a read memory barrier into conditionals */
+static always_inline bool barrier_lock_true(void)
+{
+    alternative("lfence #nospec-true", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+    return true;
+}
+
+static always_inline bool barrier_lock_false(void)
+{
+    alternative("lfence #nospec-false", "", X86_FEATURE_SC_NO_LOCK_HARDEN);
+    return false;
+}
+
+static always_inline bool arch_lock_evaluate_nospec(bool condition)
+{
+    if ( condition )
+        return barrier_lock_true();
+    else
+        return barrier_lock_false();
+}
+
 #endif /* _ASM_X86_NOSPEC_H */
 
 /*
diff --git a/xen/arch/x86/spec_ctrl.c b/xen/arch/x86/spec_ctrl.c
index 8165379fed94..5dfc4ed69ec5 100644
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -53,6 +53,7 @@ int8_t __read_mostly opt_eager_fpu = -1;
 int8_t __read_mostly opt_l1d_flush = -1;
 static bool __initdata opt_branch_harden =
     IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH);
+static bool __initdata opt_lock_harden;
 
 bool __initdata bsp_delay_spec_ctrl;
 uint8_t __read_mostly default_xen_spec_ctrl;
@@ -121,6 +122,7 @@ static int __init cf_check parse_spec_ctrl(const char *s)
             opt_ssbd = false;
             opt_l1d_flush = 0;
             opt_branch_harden = false;
+            opt_lock_harden = false;
             opt_srb_lock = 0;
             opt_unpriv_mmio = false;
             opt_gds_mit = 0;
@@ -286,6 +288,16 @@ static int __init cf_check parse_spec_ctrl(const char *s)
                 rc = -EINVAL;
             }
         }
+        else if ( (val = parse_boolean("lock-harden", s, ss)) >= 0 )
+        {
+            if ( IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
+                opt_lock_harden = val;
+            else
+            {
+                no_config_param("SPECULATIVE_HARDEN_LOCK", "spec-ctrl", s, ss);
+                rc = -EINVAL;
+            }
+        }
         else if ( (val = parse_boolean("srb-lock", s, ss)) >= 0 )
             opt_srb_lock = val;
         else if ( (val = parse_boolean("unpriv-mmio", s, ss)) >= 0 )
@@ -488,7 +500,8 @@ static void __init print_details(enum ind_thunk thunk)
     if ( IS_ENABLED(CONFIG_INDIRECT_THUNK) || IS_ENABLED(CONFIG_SHADOW_PAGING) ||
          IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_ARRAY) ||
          IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_BRANCH) ||
-         IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) )
+         IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS) ||
+         IS_ENABLED(CONFIG_SPECULATIVE_HARDEN_LOCK) )
         printk("  Compiled-in support:"
 #ifdef CONFIG_INDIRECT_THUNK
                " INDIRECT_THUNK"
@@ -504,11 +517,14 @@ static void __init print_details(enum ind_thunk thunk)
 #endif
 #ifdef CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS
                " HARDEN_GUEST_ACCESS"
+#endif
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+               " HARDEN_LOCK"
 #endif
                "\n");
 
     /* Settings for Xen's protection, irrespective of guests. */
-    printk("  Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s\n",
+    printk("  Xen settings: %s%sSPEC_CTRL: %s%s%s%s%s, Other:%s%s%s%s%s%s%s\n",
            thunk != THUNK_NONE      ? "BTI-Thunk: " : "",
            thunk == THUNK_NONE      ? "" :
            thunk == THUNK_RETPOLINE ? "RETPOLINE, " :
@@ -535,7 +551,8 @@ static void __init print_details(enum ind_thunk thunk)
            opt_verw_pv || opt_verw_hvm ||
            opt_verw_mmio                             ? " VERW"  : "",
            opt_div_scrub                             ? " DIV" : "",
-           opt_branch_harden                         ? " BRANCH_HARDEN" : "");
+           opt_branch_harden                         ? " BRANCH_HARDEN" : "",
+           opt_lock_harden                           ? " LOCK_HARDEN" : "");
 
     /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
     if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
@@ -1918,6 +1935,9 @@ void __init init_speculation_mitigations(void)
     if ( !opt_branch_harden )
         setup_force_cpu_cap(X86_FEATURE_SC_NO_BRANCH_HARDEN);
 
+    if ( !opt_lock_harden )
+        setup_force_cpu_cap(X86_FEATURE_SC_NO_LOCK_HARDEN);
+
     /*
      * We do not disable HT by default on affected hardware.
      *
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 4d6fe051641d..3361a6d89257 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -188,6 +188,23 @@ config SPECULATIVE_HARDEN_GUEST_ACCESS
 
 	  If unsure, say Y.
 
+config SPECULATIVE_HARDEN_LOCK
+	bool "Speculative lock context hardening"
+	default y
+	depends on X86
+	help
+	  Contemporary processors may use speculative execution as a
+	  performance optimisation, but this can potentially be abused by an
+	  attacker to leak data via speculative sidechannels.
+
+	  One source of data leakage is via speculative accesses to lock
+	  critical regions.
+
+	  This option is disabled by default at run time, and needs to be
+	  enabled on the command line.
+
+	  If unsure, say Y.
+
 endmenu
 
 config DIT_DEFAULT
diff --git a/xen/include/xen/nospec.h b/xen/include/xen/nospec.h
index 76255bc46efe..455284640396 100644
--- a/xen/include/xen/nospec.h
+++ b/xen/include/xen/nospec.h
@@ -70,6 +70,21 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 #define array_access_nospec(array, index)                               \
     (array)[array_index_nospec(index, ARRAY_SIZE(array))]
 
+static always_inline void block_lock_speculation(void)
+{
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+    arch_block_lock_speculation();
+#endif
+}
+
+static always_inline bool lock_evaluate_nospec(bool condition)
+{
+#ifdef CONFIG_SPECULATIVE_HARDEN_LOCK
+    return arch_lock_evaluate_nospec(condition);
+#endif
+    return condition;
+}
+
 #endif /* XEN_NOSPEC_H */
 
 /*
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index e7a1c1aa8988..28fce5615e5c 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -1,6 +1,7 @@
 #ifndef __SPINLOCK_H__
 #define __SPINLOCK_H__
 
+#include <xen/nospec.h>
 #include <xen/time.h>
 #include <xen/types.h>
 
@@ -195,13 +196,30 @@ int _spin_trylock_recursive(spinlock_t *lock);
 void _spin_lock_recursive(spinlock_t *lock);
 void _spin_unlock_recursive(spinlock_t *lock);
 
-#define spin_lock(l)                  _spin_lock(l)
-#define spin_lock_cb(l, c, d)         _spin_lock_cb(l, c, d)
-#define spin_lock_irq(l)              _spin_lock_irq(l)
+static always_inline void spin_lock(spinlock_t *l)
+{
+    _spin_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void spin_lock_cb(spinlock_t *l, void (*c)(void *data),
+                                       void *d)
+{
+    _spin_lock_cb(l, c, d);
+    block_lock_speculation();
+}
+
+static always_inline void spin_lock_irq(spinlock_t *l)
+{
+    _spin_lock_irq(l);
+    block_lock_speculation();
+}
+
 #define spin_lock_irqsave(l, f)                                 \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _spin_lock_irqsave(l));                          \
+        block_lock_speculation();                               \
     })
 
 #define spin_unlock(l)                _spin_unlock(l)
@@ -209,7 +227,7 @@ void _spin_unlock_recursive(spinlock_t *lock);
 #define spin_unlock_irqrestore(l, f)  _spin_unlock_irqrestore(l, f)
 
 #define spin_is_locked(l)             _spin_is_locked(l)
-#define spin_trylock(l)               _spin_trylock(l)
+#define spin_trylock(l)               lock_evaluate_nospec(_spin_trylock(l))
 
 #define spin_trylock_irqsave(lock, flags)       \
 ({                                              \
@@ -230,8 +248,15 @@ void _spin_unlock_recursive(spinlock_t *lock);
  * are any critical regions that cannot form part of such a set, they can use
  * standard spin_[un]lock().
  */
-#define spin_trylock_recursive(l)     _spin_trylock_recursive(l)
-#define spin_lock_recursive(l)        _spin_lock_recursive(l)
+#define spin_trylock_recursive(l) \
+    lock_evaluate_nospec(_spin_trylock_recursive(l))
+
+static always_inline void spin_lock_recursive(spinlock_t *l)
+{
+    _spin_lock_recursive(l);
+    block_lock_speculation();
+}
+
 #define spin_unlock_recursive(l)      _spin_unlock_recursive(l)
 
 #endif /* __SPINLOCK_H__ */
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: rwlock: introduce support for blocking speculation into critical
 regions

Introduce inline wrappers as required and add direct calls to
block_lock_speculation() in order to prevent speculation into the rwlock
protected critical regions.

Note the rwlock primitives are adjusted to use the non speculation safe variants
of the spinlock handlers, as a speculation barrier is added in the rwlock
calling wrappers.

trylock variants are protected by using lock_evaluate_nospec().
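
As a usage sketch (the data structure below is hypothetical, shown only to
illustrate how the trylock wrapper behaves): a successful read_trylock() is
serialised before the critical region is entered, and a failed attempt is
serialised on the other arm of the conditional.

    /* Hypothetical structure, for illustration only. */
    struct demo {
        rwlock_t lock;
        unsigned int counter;
    };

    static unsigned int demo_read(struct demo *d)
    {
        unsigned int val = 0;

        if ( read_trylock(&d->lock) )   /* lock_evaluate_nospec(_read_trylock(...)) */
        {
            /* Not reachable under speculation when the trylock has failed. */
            val = d->counter;
            read_unlock(&d->lock);
        }

        return val;
    }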

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit a1fb15f61692b1fa9945fc51f55471ace49cdd59)

diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index 18224a4bb5d6..290602936df6 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -34,8 +34,11 @@ void queue_read_lock_slowpath(rwlock_t *lock)
 
     /*
      * Put the reader into the wait queue.
+     *
+     * Use the speculation unsafe helper, as it's the caller responsibility to
+     * issue a speculation barrier if required.
      */
-    spin_lock(&lock->lock);
+    _spin_lock(&lock->lock);
 
     /*
      * At the head of the wait queue now, wait until the writer state
@@ -66,8 +69,13 @@ void queue_write_lock_slowpath(rwlock_t *lock)
 {
     u32 cnts;
 
-    /* Put the writer into the wait queue. */
-    spin_lock(&lock->lock);
+    /*
+     * Put the writer into the wait queue.
+     *
+     * Use the speculation unsafe helper, as it's the caller responsibility to
+     * issue a speculation barrier if required.
+     */
+    _spin_lock(&lock->lock);
 
     /* Try to acquire the lock directly if no reader is present. */
     if ( !atomic_read(&lock->cnts) &&
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index e0d2b41c5c7e..9a0d3ec23847 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -259,27 +259,49 @@ static inline int _rw_is_write_locked(const rwlock_t *lock)
     return (atomic_read(&lock->cnts) & _QW_WMASK) == _QW_LOCKED;
 }
 
-#define read_lock(l)                  _read_lock(l)
-#define read_lock_irq(l)              _read_lock_irq(l)
+static always_inline void read_lock(rwlock_t *l)
+{
+    _read_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void read_lock_irq(rwlock_t *l)
+{
+    _read_lock_irq(l);
+    block_lock_speculation();
+}
+
 #define read_lock_irqsave(l, f)                                 \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _read_lock_irqsave(l));                          \
+        block_lock_speculation();                               \
     })
 
 #define read_unlock(l)                _read_unlock(l)
 #define read_unlock_irq(l)            _read_unlock_irq(l)
 #define read_unlock_irqrestore(l, f)  _read_unlock_irqrestore(l, f)
-#define read_trylock(l)               _read_trylock(l)
+#define read_trylock(l)               lock_evaluate_nospec(_read_trylock(l))
+
+static always_inline void write_lock(rwlock_t *l)
+{
+    _write_lock(l);
+    block_lock_speculation();
+}
+
+static always_inline void write_lock_irq(rwlock_t *l)
+{
+    _write_lock_irq(l);
+    block_lock_speculation();
+}
 
-#define write_lock(l)                 _write_lock(l)
-#define write_lock_irq(l)             _write_lock_irq(l)
 #define write_lock_irqsave(l, f)                                \
     ({                                                          \
         BUILD_BUG_ON(sizeof(f) != sizeof(unsigned long));       \
         ((f) = _write_lock_irqsave(l));                         \
+        block_lock_speculation();                               \
     })
-#define write_trylock(l)              _write_trylock(l)
+#define write_trylock(l)              lock_evaluate_nospec(_write_trylock(l))
 
 #define write_unlock(l)               _write_unlock(l)
 #define write_unlock_irq(l)           _write_unlock_irq(l)
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: percpu-rwlock: introduce support for blocking speculation into
 critical regions

Add direct calls to block_lock_speculation() where required in order to prevent
speculation into the lock protected critical regions.  Also convert
_percpu_read_lock() from inline to always_inline.

Note that _percpu_write_lock() has been modified to use the non speculation
safe variant of the locking primitives, as a speculation barrier is added
unconditionally by the calling wrapper.
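
The resulting call chain for a per-cpu write lock user is sketched below
(condensed from the rwlock.h hunk that follows; grant_write_lock() elsewhere in
this series is one such caller): the wrapper macro issues the single barrier,
while _percpu_write_lock() internally switches to the barrier-free
_write_lock().

    #define percpu_write_lock(percpu, lock)                 \
    ({                                                      \
        _percpu_write_lock(&get_per_cpu_var(percpu), lock); \
        block_lock_speculation();                           \
    })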

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit f218daf6d3a3b847736d37c6a6b76031a0d08441)

diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index 290602936df6..f5a249bcc240 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -129,8 +129,12 @@ void _percpu_write_lock(percpu_rwlock_t **per_cpudata,
     /*
      * First take the write lock to protect against other writers or slow
      * path readers.
+     *
+     * Note we use the speculation unsafe variant of write_lock(), as the
+     * calling wrapper already adds a speculation barrier after the lock has
+     * been taken.
      */
-    write_lock(&percpu_rwlock->rwlock);
+    _write_lock(&percpu_rwlock->rwlock);
 
     /* Now set the global variable so that readers start using read_lock. */
     percpu_rwlock->writer_activating = 1;
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index 9a0d3ec23847..9e35ee2edf8f 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -338,8 +338,8 @@ static inline void _percpu_rwlock_owner_check(percpu_rwlock_t **per_cpudata,
 #define percpu_rwlock_resource_init(l, owner) \
     (*(l) = (percpu_rwlock_t)PERCPU_RW_LOCK_UNLOCKED(&get_per_cpu_var(owner)))
 
-static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
-                                         percpu_rwlock_t *percpu_rwlock)
+static always_inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
+                                            percpu_rwlock_t *percpu_rwlock)
 {
     /* Validate the correct per_cpudata variable has been provided. */
     _percpu_rwlock_owner_check(per_cpudata, percpu_rwlock);
@@ -374,6 +374,8 @@ static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
     }
     else
     {
+        /* Other branch already has a speculation barrier in read_lock(). */
+        block_lock_speculation();
         /* All other paths have implicit check_lock() calls via read_lock(). */
         check_lock(&percpu_rwlock->rwlock.lock.debug, false);
     }
@@ -430,8 +432,12 @@ static inline void _percpu_write_unlock(percpu_rwlock_t **per_cpudata,
     _percpu_read_lock(&get_per_cpu_var(percpu), lock)
 #define percpu_read_unlock(percpu, lock) \
     _percpu_read_unlock(&get_per_cpu_var(percpu), lock)
-#define percpu_write_lock(percpu, lock) \
-    _percpu_write_lock(&get_per_cpu_var(percpu), lock)
+
+#define percpu_write_lock(percpu, lock)                 \
+({                                                      \
+    _percpu_write_lock(&get_per_cpu_var(percpu), lock); \
+    block_lock_speculation();                           \
+})
 #define percpu_write_unlock(percpu, lock) \
     _percpu_write_unlock(&get_per_cpu_var(percpu), lock)
 
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: locking: attempt to ensure lock wrappers are always inline

Ensure the lock wrappers are inlined, in order to prevent the locking
speculation barriers from ending up inside `call`ed functions that could be
speculatively bypassed.

While there, also add an extra locking barrier to _mm_write_lock() in the
branch taken when the lock is already held.

Note some functions are switched to use the unsafe variants (without a
speculation barrier) of the locking primitives, but a speculation barrier is
always added to the exposed public lock wrapping helper.  That's the case with
sched_spin_lock_double() and pcidevs_lock(), for example.
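
The pcidevs_lock() conversion illustrates that pattern (a condensed sketch of
the pci.c/pci.h hunks below): the out-of-line helper is renamed with an
_unsafe suffix and drops the barrier, while the public wrapper becomes
always-inline so the barrier is emitted at every call site.

    /* pci.c: out of line, no barrier. */
    void pcidevs_lock_unsafe(void)
    {
        _spin_lock_recursive(&_pcidevs_lock);
    }

    /* pci.h: inline wrapper, barrier at the call site. */
    static always_inline void pcidevs_lock(void)
    {
        pcidevs_lock_unsafe();
        block_lock_speculation();
    }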

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 197ecd838a2aaf959a469df3696d4559c4f8b762)

diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 8f53e88d6706..e1d6845a2844 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -150,7 +150,7 @@ static int pt_irq_masked(struct periodic_time *pt)
  * pt->vcpu field, because another thread holding the pt_migrate lock
  * may already be spinning waiting for your vcpu lock.
  */
-static void pt_vcpu_lock(struct vcpu *v)
+static always_inline void pt_vcpu_lock(struct vcpu *v)
 {
     spin_lock(&v->arch.hvm.tm_lock);
 }
@@ -169,9 +169,13 @@ static void pt_vcpu_unlock(struct vcpu *v)
  * need to take an additional lock that protects against pt->vcpu
  * changing.
  */
-static void pt_lock(struct periodic_time *pt)
+static always_inline void pt_lock(struct periodic_time *pt)
 {
-    read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
+    /*
+     * Use the speculation unsafe variant for the first lock, as the following
+     * lock taking helper already includes a speculation barrier.
+     */
+    _read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
     spin_lock(&pt->vcpu->arch.hvm.tm_lock);
 }
 
diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index a87af47ece22..465ab39bb041 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -174,6 +174,7 @@ void cf_check irq_complete_move(struct irq_desc *desc);
 
 extern struct irq_desc *irq_desc;
 
+/* Not speculation safe, only used for AP bringup. */
 void lock_vector_lock(void);
 void unlock_vector_lock(void);
 
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 5a3f96fbaadd..5ec080c02fd8 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -74,8 +74,8 @@ static inline void _set_lock_level(int l)
     this_cpu(mm_lock_level) = l;
 }
 
-static inline void _mm_lock(const struct domain *d, mm_lock_t *l,
-                            const char *func, int level, int rec)
+static always_inline void _mm_lock(const struct domain *d, mm_lock_t *l,
+                                   const char *func, int level, int rec)
 {
     if ( !((mm_locked_by_me(l)) && rec) )
         _check_lock_level(d, level);
@@ -125,8 +125,8 @@ static inline int mm_write_locked_by_me(mm_rwlock_t *l)
     return (l->locker == get_processor_id());
 }
 
-static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
-                                  const char *func, int level)
+static always_inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
+                                         const char *func, int level)
 {
     if ( !mm_write_locked_by_me(l) )
     {
@@ -137,6 +137,8 @@ static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
         l->unlock_level = _get_lock_level();
         _set_lock_level(_lock_level(d, level));
     }
+    else
+        block_speculation();
     l->recurse_count++;
 }
 
@@ -150,8 +152,8 @@ static inline void mm_write_unlock(mm_rwlock_t *l)
     percpu_write_unlock(p2m_percpu_rwlock, &l->lock);
 }
 
-static inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
-                                 int level)
+static always_inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
+                                        int level)
 {
     _check_lock_level(d, level);
     percpu_read_lock(p2m_percpu_rwlock, &l->lock);
@@ -166,15 +168,15 @@ static inline void mm_read_unlock(mm_rwlock_t *l)
 
 /* This wrapper uses the line number to express the locking order below */
 #define declare_mm_lock(name)                                                 \
-    static inline void mm_lock_##name(const struct domain *d, mm_lock_t *l,   \
-                                      const char *func, int rec)              \
+    static always_inline void mm_lock_##name(                                 \
+        const struct domain *d, mm_lock_t *l, const char *func, int rec)      \
     { _mm_lock(d, l, func, MM_LOCK_ORDER_##name, rec); }
 #define declare_mm_rwlock(name)                                               \
-    static inline void mm_write_lock_##name(const struct domain *d,           \
-                                            mm_rwlock_t *l, const char *func) \
+    static always_inline void mm_write_lock_##name(                           \
+        const struct domain *d, mm_rwlock_t *l, const char *func)             \
     { _mm_write_lock(d, l, func, MM_LOCK_ORDER_##name); }                     \
-    static inline void mm_read_lock_##name(const struct domain *d,            \
-                                           mm_rwlock_t *l)                    \
+    static always_inline void mm_read_lock_##name(const struct domain *d,     \
+                                                  mm_rwlock_t *l)             \
     { _mm_read_lock(d, l, MM_LOCK_ORDER_##name); }
 /* These capture the name of the calling function */
 #define mm_lock(name, d, l) mm_lock_##name(d, l, __func__, 0)
@@ -309,7 +311,7 @@ declare_mm_lock(altp2mlist)
 #define MM_LOCK_ORDER_altp2m                 40
 declare_mm_rwlock(altp2m);
 
-static inline void p2m_lock(struct p2m_domain *p)
+static always_inline void p2m_lock(struct p2m_domain *p)
 {
     if ( p2m_is_altp2m(p) )
         mm_write_lock(altp2m, p->domain, &p->lock);
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index 9969eb45fa8c..9be67b63ce3e 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -24,7 +24,7 @@
 #define superpage_aligned(_x)  (((_x)&(SUPERPAGE_PAGES-1))==0)
 
 /* Enforce lock ordering when grabbing the "external" page_alloc lock */
-static inline void lock_page_alloc(struct p2m_domain *p2m)
+static always_inline void lock_page_alloc(struct p2m_domain *p2m)
 {
     page_alloc_mm_pre_lock(p2m->domain);
     spin_lock(&(p2m->domain->page_alloc_lock));
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index a7a004a08429..66f924a7b091 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -45,7 +45,7 @@
  * just assume the event channel is free or unbound at the moment when the
  * evtchn_read_trylock() returns false.
  */
-static inline void evtchn_write_lock(struct evtchn *evtchn)
+static always_inline void evtchn_write_lock(struct evtchn *evtchn)
 {
     write_lock(&evtchn->lock);
 
@@ -351,7 +351,8 @@ int evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc, evtchn_port_t port)
     return rc;
 }
 
-static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn)
+static always_inline void double_evtchn_lock(struct evtchn *lchn,
+                                             struct evtchn *rchn)
 {
     ASSERT(lchn != rchn);
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 89b7811c51c3..934924cbda66 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -403,7 +403,7 @@ static inline void act_set_gfn(struct active_grant_entry *act, gfn_t gfn)
 
 static DEFINE_PERCPU_RWLOCK_GLOBAL(grant_rwlock);
 
-static inline void grant_read_lock(struct grant_table *gt)
+static always_inline void grant_read_lock(struct grant_table *gt)
 {
     percpu_read_lock(grant_rwlock, &gt->lock);
 }
@@ -413,7 +413,7 @@ static inline void grant_read_unlock(struct grant_table *gt)
     percpu_read_unlock(grant_rwlock, &gt->lock);
 }
 
-static inline void grant_write_lock(struct grant_table *gt)
+static always_inline void grant_write_lock(struct grant_table *gt)
 {
     percpu_write_lock(grant_rwlock, &gt->lock);
 }
@@ -450,7 +450,7 @@ nr_active_grant_frames(struct grant_table *gt)
     return num_act_frames_from_sha_frames(nr_grant_frames(gt));
 }
 
-static inline struct active_grant_entry *
+static always_inline struct active_grant_entry *
 active_entry_acquire(struct grant_table *t, grant_ref_t e)
 {
     struct active_grant_entry *act;
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 901782bbb416..34ad39b9ad0b 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -348,23 +348,28 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
  * This avoids dead- or live-locks when this code is running on both
  * cpus at the same time.
  */
-static void sched_spin_lock_double(spinlock_t *lock1, spinlock_t *lock2,
-                                   unsigned long *flags)
+static always_inline void sched_spin_lock_double(
+    spinlock_t *lock1, spinlock_t *lock2, unsigned long *flags)
 {
+    /*
+     * In order to avoid extra overhead, use the locking primitives without the
+     * speculation barrier, and introduce a single barrier here.
+     */
     if ( lock1 == lock2 )
     {
-        spin_lock_irqsave(lock1, *flags);
+        *flags = _spin_lock_irqsave(lock1);
     }
     else if ( lock1 < lock2 )
     {
-        spin_lock_irqsave(lock1, *flags);
-        spin_lock(lock2);
+        *flags = _spin_lock_irqsave(lock1);
+        _spin_lock(lock2);
     }
     else
     {
-        spin_lock_irqsave(lock2, *flags);
-        spin_lock(lock1);
+        *flags = _spin_lock_irqsave(lock2);
+        _spin_lock(lock1);
     }
+    block_lock_speculation();
 }
 
 static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index c516976c3740..3b97f1576782 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -207,8 +207,24 @@ DECLARE_PER_CPU(cpumask_t, cpumask_scratch);
 #define cpumask_scratch        (&this_cpu(cpumask_scratch))
 #define cpumask_scratch_cpu(c) (&per_cpu(cpumask_scratch, c))
 
+/*
+ * Deal with _spin_lock_irqsave() returning the flags value instead of storing
+ * it in a passed parameter.
+ */
+#define _sched_spinlock0(lock, irq) _spin_lock##irq(lock)
+#define _sched_spinlock1(lock, irq, arg) ({ \
+    BUILD_BUG_ON(sizeof(arg) != sizeof(unsigned long)); \
+    (arg) = _spin_lock##irq(lock); \
+})
+
+#define _sched_spinlock__(nr) _sched_spinlock ## nr
+#define _sched_spinlock_(nr)  _sched_spinlock__(nr)
+#define _sched_spinlock(lock, irq, args...) \
+    _sched_spinlock_(count_args(args))(lock, irq, ## args)
+
 #define sched_lock(kind, param, cpu, irq, arg...) \
-static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
+static always_inline spinlock_t \
+*kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
 { \
     for ( ; ; ) \
     { \
@@ -220,10 +236,16 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
          * \
          * It may also be the case that v->processor may change but the \
          * lock may be the same; this will succeed in that case. \
+         * \
+         * Use the speculation unsafe locking helper, there's a speculation \
+         * barrier before returning to the caller. \
          */ \
-        spin_lock##irq(lock, ## arg); \
+        _sched_spinlock(lock, irq, ## arg); \
         if ( likely(lock == get_sched_res(cpu)->schedule_lock) ) \
+        { \
+            block_lock_speculation(); \
             return lock; \
+        } \
         spin_unlock##irq(lock, ## arg); \
     } \
 }
diff --git a/xen/common/timer.c b/xen/common/timer.c
index 0fddfa74879e..38eb5fd20d36 100644
--- a/xen/common/timer.c
+++ b/xen/common/timer.c
@@ -239,7 +239,7 @@ static inline void deactivate_timer(struct timer *timer)
     list_add(&timer->inactive, &per_cpu(timers, timer->cpu).inactive);
 }
 
-static inline bool_t timer_lock(struct timer *timer)
+static inline bool_t timer_lock_unsafe(struct timer *timer)
 {
     unsigned int cpu;
 
@@ -253,7 +253,8 @@ static inline bool_t timer_lock(struct timer *timer)
             rcu_read_unlock(&timer_cpu_read_lock);
             return 0;
         }
-        spin_lock(&per_cpu(timers, cpu).lock);
+        /* Use the speculation unsafe variant, the wrapper has the barrier. */
+        _spin_lock(&per_cpu(timers, cpu).lock);
         if ( likely(timer->cpu == cpu) )
             break;
         spin_unlock(&per_cpu(timers, cpu).lock);
@@ -266,8 +267,9 @@ static inline bool_t timer_lock(struct timer *timer)
 #define timer_lock_irqsave(t, flags) ({         \
     bool_t __x;                                 \
     local_irq_save(flags);                      \
-    if ( !(__x = timer_lock(t)) )               \
+    if ( !(__x = timer_lock_unsafe(t)) )        \
         local_irq_restore(flags);               \
+    block_lock_speculation();                   \
     __x;                                        \
 })
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index e99837b6e141..2a1e7ee89a5d 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -52,9 +52,10 @@ struct pci_seg {
 
 static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
 
-void pcidevs_lock(void)
+/* Do not use, as it has no speculation barrier, use pcidevs_lock() instead. */
+void pcidevs_lock_unsafe(void)
 {
-    spin_lock_recursive(&_pcidevs_lock);
+    _spin_lock_recursive(&_pcidevs_lock);
 }
 
 void pcidevs_unlock(void)
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 8e509e078475..f1472ea1ebe5 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -114,12 +114,12 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
 #define bucket_from_port(d, p) \
     ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
 
-static inline void evtchn_read_lock(struct evtchn *evtchn)
+static always_inline void evtchn_read_lock(struct evtchn *evtchn)
 {
     read_lock(&evtchn->lock);
 }
 
-static inline bool evtchn_read_trylock(struct evtchn *evtchn)
+static always_inline bool evtchn_read_trylock(struct evtchn *evtchn)
 {
     return read_trylock(&evtchn->lock);
 }
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 251b8761a8e9..a71bed36be29 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -155,8 +155,12 @@ struct pci_dev {
  * devices, it also sync the access to the msi capability that is not
  * interrupt handling related (the mask bit register).
  */
-
-void pcidevs_lock(void);
+void pcidevs_lock_unsafe(void);
+static always_inline void pcidevs_lock(void)
+{
+    pcidevs_lock_unsafe();
+    block_lock_speculation();
+}
 void pcidevs_unlock(void);
 bool __must_check pcidevs_locked(void);
 
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: x86/mm: add speculation barriers to open coded locks

Add a speculation barrier to the clearly identified open-coded lock taking
functions.

Note that the memory sharing page_lock() replacement (_page_lock()) is left
as-is, as the code is experimental and not security supported.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 42a572a38e22a97d86a4b648a22597628d5b42e4)

diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
index 05dfe35502c8..d1b1fee99b7d 100644
--- a/xen/arch/x86/include/asm/mm.h
+++ b/xen/arch/x86/include/asm/mm.h
@@ -399,7 +399,9 @@ const struct platform_bad_page *get_platform_badpages(unsigned int *array_size);
  * The use of PGT_locked in mem_sharing does not collide, since mem_sharing is
  * only supported for hvm guests, which do not have PV PTEs updated.
  */
-int page_lock(struct page_info *page);
+int page_lock_unsafe(struct page_info *page);
+#define page_lock(pg)   lock_evaluate_nospec(page_lock_unsafe(pg))
+
 void page_unlock(struct page_info *page);
 
 void put_page_type(struct page_info *page);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index ab0acbfea6e5..000fd0fb558b 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2017,7 +2017,7 @@ static inline bool current_locked_page_ne_check(struct page_info *page) {
 #define current_locked_page_ne_check(x) true
 #endif
 
-int page_lock(struct page_info *page)
+int page_lock_unsafe(struct page_info *page)
 {
     unsigned long x, nx;
 
@@ -2078,7 +2078,7 @@ void page_unlock(struct page_info *page)
  * l3t_lock(), so to avoid deadlock we must avoid grabbing them in
  * reverse order.
  */
-static void l3t_lock(struct page_info *page)
+static always_inline void l3t_lock(struct page_info *page)
 {
     unsigned long x, nx;
 
@@ -2087,6 +2087,8 @@ static void l3t_lock(struct page_info *page)
             cpu_relax();
         nx = x | PGT_locked;
     } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
+
+    block_lock_speculation();
 }
 
 static void l3t_unlock(struct page_info *page)
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: x86: protect conditional lock taking from speculative execution

Conditionally taken locks that use the pattern:

if ( lock )
    spin_lock(...);

need an else branch in order to issue a speculation barrier in the else case,
just like it is done when the lock needs to be acquired.

evaluate_nospec() could be used on the condition itself, but that would result
in a double barrier on the branch where the lock is taken.

Introduce a new pair of helpers, {gfn,spin}_lock_if() that can be used to
conditionally take a lock in a speculation safe way.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 03cf7ca23e0e876075954c558485b267b7d02406)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 000fd0fb558b..45bfbc2522f7 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5007,8 +5007,7 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
         if ( !l3t )
             return NULL;
         UNMAP_DOMAIN_PAGE(l3t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
         {
             l4_pgentry_t l4e = l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
@@ -5045,8 +5044,7 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
             return NULL;
         }
         UNMAP_DOMAIN_PAGE(l2t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
             l3e_write(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
@@ -5084,8 +5082,7 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
             return NULL;
         }
         UNMAP_DOMAIN_PAGE(l1t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
             l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
@@ -5116,6 +5113,8 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
     do {                      \
         if ( locking )        \
             l3t_lock(page);   \
+        else                            \
+            block_lock_speculation();   \
     } while ( false )
 
 #define L3T_UNLOCK(page)                           \
@@ -5331,8 +5330,7 @@ int map_pages_to_xen(
             if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
                 flush_flags |= FLUSH_TLB_GLOBAL;
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
@@ -5436,8 +5434,7 @@ int map_pages_to_xen(
                 if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL )
                     flush_flags |= FLUSH_TLB_GLOBAL;
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
@@ -5478,8 +5475,7 @@ int map_pages_to_xen(
                 unsigned long base_mfn;
                 const l1_pgentry_t *l1t;
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
 
                 ol2e = *pl2e;
                 /*
@@ -5533,8 +5529,7 @@ int map_pages_to_xen(
             unsigned long base_mfn;
             const l2_pgentry_t *l2t;
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
 
             ol3e = *pl3e;
             /*
@@ -5678,8 +5673,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                                        l3e_get_flags(*pl3e)));
             UNMAP_DOMAIN_PAGE(l2t);
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
@@ -5738,8 +5732,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                                            l2e_get_flags(*pl2e) & ~_PAGE_PSE));
                 UNMAP_DOMAIN_PAGE(l1t);
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
@@ -5783,8 +5776,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
              */
             if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
                 continue;
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
 
             /*
              * L2E may be already cleared, or set to a superpage, by
@@ -5831,8 +5823,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
         if ( (nf & _PAGE_PRESENT) ||
              ((v != e) && (l2_table_offset(v) + l1_table_offset(v) != 0)) )
             continue;
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
 
         /*
          * L3E may be already cleared, or set to a superpage, by
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 5ec080c02fd8..b4960fb90eff 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -335,6 +335,15 @@ static inline void p2m_unlock(struct p2m_domain *p)
 #define p2m_locked_by_me(p)   mm_write_locked_by_me(&(p)->lock)
 #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
 
+static always_inline void gfn_lock_if(bool condition, struct p2m_domain *p2m,
+                                      gfn_t gfn, unsigned int order)
+{
+    if ( condition )
+        gfn_lock(p2m, gfn, order);
+    else
+        block_lock_speculation();
+}
+
 /* PoD lock (per-p2m-table)
  *
  * Protects private PoD data structs: entry and cache
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 0983bd71d9a9..22ab1d606e8a 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -280,9 +280,8 @@ mfn_t p2m_get_gfn_type_access(struct p2m_domain *p2m, gfn_t gfn,
     if ( q & P2M_UNSHARE )
         q |= P2M_ALLOC;
 
-    if ( locked )
-        /* Grab the lock here, don't release until put_gfn */
-        gfn_lock(p2m, gfn, 0);
+    /* Grab the lock here, don't release until put_gfn */
+    gfn_lock_if(locked, p2m, gfn, 0);
 
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index 28fce5615e5c..c830df3430a3 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -222,6 +222,14 @@ static always_inline void spin_lock_irq(spinlock_t *l)
         block_lock_speculation();                               \
     })
 
+/* Conditionally take a spinlock in a speculation safe way. */
+static always_inline void spin_lock_if(bool condition, spinlock_t *l)
+{
+    if ( condition )
+        _spin_lock(l);
+    block_lock_speculation();
+}
+
 #define spin_unlock(l)                _spin_unlock(l)
 #define spin_unlock_irq(l)            _spin_unlock_irq(l)
 #define spin_unlock_irqrestore(l, f)  _spin_unlock_irqrestore(l, f)
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: percpu-rwlock: introduce support for blocking speculation into
 critical regions

Add direct calls to block_lock_speculation() where required in order to prevent
speculation into the lock-protected critical regions.  Also convert
_percpu_read_lock() from inline to always_inline.

Note that _percpu_write_lock() has been modified to use the non-speculation-safe
variant of the locking primitives, as a speculation barrier is added
unconditionally by the calling wrapper.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
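
For illustration only, a minimal standalone sketch of the shape of this
change, using pthread locks and an lfence intrinsic as stand-ins for Xen's
primitives (the demo_* names are hypothetical, not Xen code):

#include <pthread.h>
#include <immintrin.h>

/* Hypothetical stand-in for Xen's block_lock_speculation(). */
static inline void demo_lock_barrier(void) { _mm_lfence(); }

static pthread_rwlock_t demo_rwlock = PTHREAD_RWLOCK_INITIALIZER;

/* Internal helper: takes the lock but issues no speculation barrier,
 * mirroring _percpu_write_lock() after this patch. */
static inline void demo_write_lock_unsafe(void)
{
    pthread_rwlock_wrlock(&demo_rwlock);
}

/* Public wrapper: the barrier is issued unconditionally once the lock is
 * held, mirroring the reworked percpu_write_lock() wrapper. */
#define demo_write_lock()           \
    do {                            \
        demo_write_lock_unsafe();   \
        demo_lock_barrier();        \
    } while ( 0 )

The point is that no code path reaches the critical region without passing a
speculation barrier after the lock has been acquired.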

diff --git a/xen/common/rwlock.c b/xen/common/rwlock.c
index 290602936df6..f5a249bcc240 100644
--- a/xen/common/rwlock.c
+++ b/xen/common/rwlock.c
@@ -129,8 +129,12 @@ void _percpu_write_lock(percpu_rwlock_t **per_cpudata,
     /*
      * First take the write lock to protect against other writers or slow
      * path readers.
+     *
+     * Note we use the speculation unsafe variant of write_lock(), as the
+     * calling wrapper already adds a speculation barrier after the lock has
+     * been taken.
      */
-    write_lock(&percpu_rwlock->rwlock);
+    _write_lock(&percpu_rwlock->rwlock);
 
     /* Now set the global variable so that readers start using read_lock. */
     percpu_rwlock->writer_activating = 1;
diff --git a/xen/include/xen/rwlock.h b/xen/include/xen/rwlock.h
index ffff0fad45a7..65d88b0ef45d 100644
--- a/xen/include/xen/rwlock.h
+++ b/xen/include/xen/rwlock.h
@@ -338,8 +338,8 @@ static inline void _percpu_rwlock_owner_check(percpu_rwlock_t **per_cpudata,
 #define percpu_rwlock_resource_init(l, owner) \
     (*(l) = (percpu_rwlock_t)PERCPU_RW_LOCK_UNLOCKED(&get_per_cpu_var(owner)))
 
-static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
-                                         percpu_rwlock_t *percpu_rwlock)
+static always_inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
+                                            percpu_rwlock_t *percpu_rwlock)
 {
     /* Validate the correct per_cpudata variable has been provided. */
     _percpu_rwlock_owner_check(per_cpudata, percpu_rwlock);
@@ -374,6 +374,8 @@ static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
     }
     else
     {
+        /* Other branch already has a speculation barrier in read_lock(). */
+        block_lock_speculation();
         /* All other paths have implicit check_lock() calls via read_lock(). */
         check_lock(&percpu_rwlock->rwlock.lock.debug, false);
     }
@@ -430,8 +432,12 @@ static inline void _percpu_write_unlock(percpu_rwlock_t **per_cpudata,
     _percpu_read_lock(&get_per_cpu_var(percpu), lock)
 #define percpu_read_unlock(percpu, lock) \
     _percpu_read_unlock(&get_per_cpu_var(percpu), lock)
-#define percpu_write_lock(percpu, lock) \
-    _percpu_write_lock(&get_per_cpu_var(percpu), lock)
+
+#define percpu_write_lock(percpu, lock)                 \
+({                                                      \
+    _percpu_write_lock(&get_per_cpu_var(percpu), lock); \
+    block_lock_speculation();                           \
+})
 #define percpu_write_unlock(percpu, lock) \
     _percpu_write_unlock(&get_per_cpu_var(percpu), lock)
 
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: locking: attempt to ensure lock wrappers are always inline

Force the lock wrappers inline in order to prevent the locking speculation
barriers from ending up inside `call`ed functions that could be speculatively
bypassed.

While there, also add an extra speculation barrier to _mm_write_lock() in the
branch taken when the lock is already held.

Note that some functions are switched to use the unsafe variants (without a
speculation barrier) of the locking primitives, but a speculation barrier is
always added to the exposed public lock wrapping helper.  That's the case with
sched_spin_lock_double() and pcidevs_lock(), for example.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
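
As a rough standalone illustration of why the inlining matters (pthread and
lfence used as stand-ins; the demo_* names are hypothetical, not Xen code): if
the speculation barrier sits behind a `call`, speculation can bypass that call
and run into the critical region, so the wrapper is forced inline to place the
barrier directly in the caller's instruction stream.

#include <pthread.h>
#include <immintrin.h>

#define demo_always_inline __inline__ __attribute__((__always_inline__))

/* Hypothetical stand-in for Xen's block_lock_speculation(). */
static inline void demo_lock_barrier(void) { _mm_lfence(); }

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Forced inline so both the lock acquisition and the speculation barrier
 * end up in the caller's instruction stream rather than behind a call. */
static demo_always_inline void demo_take_lock(void)
{
    pthread_mutex_lock(&demo_lock);
    demo_lock_barrier();
}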

diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 8f53e88d6706..e1d6845a2844 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -150,7 +150,7 @@ static int pt_irq_masked(struct periodic_time *pt)
  * pt->vcpu field, because another thread holding the pt_migrate lock
  * may already be spinning waiting for your vcpu lock.
  */
-static void pt_vcpu_lock(struct vcpu *v)
+static always_inline void pt_vcpu_lock(struct vcpu *v)
 {
     spin_lock(&v->arch.hvm.tm_lock);
 }
@@ -169,9 +169,13 @@ static void pt_vcpu_unlock(struct vcpu *v)
  * need to take an additional lock that protects against pt->vcpu
  * changing.
  */
-static void pt_lock(struct periodic_time *pt)
+static always_inline void pt_lock(struct periodic_time *pt)
 {
-    read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
+    /*
+     * Use the speculation unsafe variant for the first lock, as the following
+     * lock taking helper already includes a speculation barrier.
+     */
+    _read_lock(&pt->vcpu->domain->arch.hvm.pl_time->pt_migrate);
     spin_lock(&pt->vcpu->arch.hvm.tm_lock);
 }
 
diff --git a/xen/arch/x86/include/asm/irq.h b/xen/arch/x86/include/asm/irq.h
index 7d49f3c1904a..413994d2133b 100644
--- a/xen/arch/x86/include/asm/irq.h
+++ b/xen/arch/x86/include/asm/irq.h
@@ -151,6 +151,7 @@ void cf_check irq_complete_move(struct irq_desc *desc);
 
 extern struct irq_desc *irq_desc;
 
+/* Not speculation safe, only used for AP bringup. */
 void lock_vector_lock(void);
 void unlock_vector_lock(void);
 
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 00b1bc402d6d..273fff86baa4 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -74,8 +74,8 @@ static inline void _set_lock_level(int l)
     this_cpu(mm_lock_level) = l;
 }
 
-static inline void _mm_lock(const struct domain *d, mm_lock_t *l,
-                            const char *func, int level, int rec)
+static always_inline void _mm_lock(const struct domain *d, mm_lock_t *l,
+                                   const char *func, int level, int rec)
 {
     if ( !((mm_locked_by_me(l)) && rec) )
         _check_lock_level(d, level);
@@ -125,8 +125,8 @@ static inline int mm_write_locked_by_me(mm_rwlock_t *l)
     return (l->locker == smp_processor_id());
 }
 
-static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
-                                  const char *func, int level)
+static always_inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
+                                         const char *func, int level)
 {
     if ( !mm_write_locked_by_me(l) )
     {
@@ -137,6 +137,8 @@ static inline void _mm_write_lock(const struct domain *d, mm_rwlock_t *l,
         l->unlock_level = _get_lock_level();
         _set_lock_level(_lock_level(d, level));
     }
+    else
+        block_speculation();
     l->recurse_count++;
 }
 
@@ -150,8 +152,8 @@ static inline void mm_write_unlock(mm_rwlock_t *l)
     percpu_write_unlock(p2m_percpu_rwlock, &l->lock);
 }
 
-static inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
-                                 int level)
+static always_inline void _mm_read_lock(const struct domain *d, mm_rwlock_t *l,
+                                        int level)
 {
     _check_lock_level(d, level);
     percpu_read_lock(p2m_percpu_rwlock, &l->lock);
@@ -166,15 +168,15 @@ static inline void mm_read_unlock(mm_rwlock_t *l)
 
 /* This wrapper uses the line number to express the locking order below */
 #define declare_mm_lock(name)                                                 \
-    static inline void mm_lock_##name(const struct domain *d, mm_lock_t *l,   \
-                                      const char *func, int rec)              \
+    static always_inline void mm_lock_##name(                                 \
+        const struct domain *d, mm_lock_t *l, const char *func, int rec)      \
     { _mm_lock(d, l, func, MM_LOCK_ORDER_##name, rec); }
 #define declare_mm_rwlock(name)                                               \
-    static inline void mm_write_lock_##name(const struct domain *d,           \
-                                            mm_rwlock_t *l, const char *func) \
+    static always_inline void mm_write_lock_##name(                           \
+        const struct domain *d, mm_rwlock_t *l, const char *func)             \
     { _mm_write_lock(d, l, func, MM_LOCK_ORDER_##name); }                     \
-    static inline void mm_read_lock_##name(const struct domain *d,            \
-                                           mm_rwlock_t *l)                    \
+    static always_inline void mm_read_lock_##name(const struct domain *d,     \
+                                                  mm_rwlock_t *l)             \
     { _mm_read_lock(d, l, MM_LOCK_ORDER_##name); }
 /* These capture the name of the calling function */
 #define mm_lock(name, d, l) mm_lock_##name(d, l, __func__, 0)
@@ -309,7 +311,7 @@ declare_mm_lock(altp2mlist)
 #define MM_LOCK_ORDER_altp2m                 40
 declare_mm_rwlock(altp2m);
 
-static inline void p2m_lock(struct p2m_domain *p)
+static always_inline void p2m_lock(struct p2m_domain *p)
 {
     if ( p2m_is_altp2m(p) )
         mm_write_lock(altp2m, p->domain, &p->lock);
diff --git a/xen/arch/x86/mm/p2m-pod.c b/xen/arch/x86/mm/p2m-pod.c
index bc78c61062b4..65d31e552305 100644
--- a/xen/arch/x86/mm/p2m-pod.c
+++ b/xen/arch/x86/mm/p2m-pod.c
@@ -24,7 +24,7 @@
 #define superpage_aligned(_x)  (((_x)&(SUPERPAGE_PAGES-1))==0)
 
 /* Enforce lock ordering when grabbing the "external" page_alloc lock */
-static inline void lock_page_alloc(struct p2m_domain *p2m)
+static always_inline void lock_page_alloc(struct p2m_domain *p2m)
 {
     page_alloc_mm_pre_lock(p2m->domain);
     spin_lock(&(p2m->domain->page_alloc_lock));
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index 15aec5dcbbda..bbfda2538e8a 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -45,7 +45,7 @@
  * just assume the event channel is free or unbound at the moment when the
  * evtchn_read_trylock() returns false.
  */
-static inline void evtchn_write_lock(struct evtchn *evtchn)
+static always_inline void evtchn_write_lock(struct evtchn *evtchn)
 {
     write_lock(&evtchn->lock);
 
@@ -349,7 +349,8 @@ int evtchn_alloc_unbound(evtchn_alloc_unbound_t *alloc, evtchn_port_t port)
     return rc;
 }
 
-static void double_evtchn_lock(struct evtchn *lchn, struct evtchn *rchn)
+static always_inline void double_evtchn_lock(struct evtchn *lchn,
+                                             struct evtchn *rchn)
 {
     ASSERT(lchn != rchn);
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 37b178a67bf7..77089308829b 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -403,7 +403,7 @@ static inline void act_set_gfn(struct active_grant_entry *act, gfn_t gfn)
 
 static DEFINE_PERCPU_RWLOCK_GLOBAL(grant_rwlock);
 
-static inline void grant_read_lock(struct grant_table *gt)
+static always_inline void grant_read_lock(struct grant_table *gt)
 {
     percpu_read_lock(grant_rwlock, &gt->lock);
 }
@@ -413,7 +413,7 @@ static inline void grant_read_unlock(struct grant_table *gt)
     percpu_read_unlock(grant_rwlock, &gt->lock);
 }
 
-static inline void grant_write_lock(struct grant_table *gt)
+static always_inline void grant_write_lock(struct grant_table *gt)
 {
     percpu_write_lock(grant_rwlock, &gt->lock);
 }
@@ -450,7 +450,7 @@ nr_active_grant_frames(struct grant_table *gt)
     return num_act_frames_from_sha_frames(nr_grant_frames(gt));
 }
 
-static inline struct active_grant_entry *
+static always_inline struct active_grant_entry *
 active_entry_acquire(struct grant_table *t, grant_ref_t e)
 {
     struct active_grant_entry *act;
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index c5db3739727d..3e48da1cdd4e 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -348,23 +348,28 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
  * This avoids dead- or live-locks when this code is running on both
  * cpus at the same time.
  */
-static void sched_spin_lock_double(spinlock_t *lock1, spinlock_t *lock2,
-                                   unsigned long *flags)
+static always_inline void sched_spin_lock_double(
+    spinlock_t *lock1, spinlock_t *lock2, unsigned long *flags)
 {
+    /*
+     * In order to avoid extra overhead, use the locking primitives without the
+     * speculation barrier, and introduce a single barrier here.
+     */
     if ( lock1 == lock2 )
     {
-        spin_lock_irqsave(lock1, *flags);
+        *flags = _spin_lock_irqsave(lock1);
     }
     else if ( lock1 < lock2 )
     {
-        spin_lock_irqsave(lock1, *flags);
-        spin_lock(lock2);
+        *flags = _spin_lock_irqsave(lock1);
+        _spin_lock(lock2);
     }
     else
     {
-        spin_lock_irqsave(lock2, *flags);
-        spin_lock(lock1);
+        *flags = _spin_lock_irqsave(lock2);
+        _spin_lock(lock1);
     }
+    block_lock_speculation();
 }
 
 static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2,
diff --git a/xen/common/sched/private.h b/xen/common/sched/private.h
index 26a196f4283b..459d1dfb11a5 100644
--- a/xen/common/sched/private.h
+++ b/xen/common/sched/private.h
@@ -208,8 +208,24 @@ DECLARE_PER_CPU(cpumask_t, cpumask_scratch);
 #define cpumask_scratch        (&this_cpu(cpumask_scratch))
 #define cpumask_scratch_cpu(c) (&per_cpu(cpumask_scratch, c))
 
+/*
+ * Deal with _spin_lock_irqsave() returning the flags value instead of storing
+ * it in a passed parameter.
+ */
+#define _sched_spinlock0(lock, irq) _spin_lock##irq(lock)
+#define _sched_spinlock1(lock, irq, arg) ({ \
+    BUILD_BUG_ON(sizeof(arg) != sizeof(unsigned long)); \
+    (arg) = _spin_lock##irq(lock); \
+})
+
+#define _sched_spinlock__(nr) _sched_spinlock ## nr
+#define _sched_spinlock_(nr)  _sched_spinlock__(nr)
+#define _sched_spinlock(lock, irq, args...) \
+    _sched_spinlock_(count_args(args))(lock, irq, ## args)
+
 #define sched_lock(kind, param, cpu, irq, arg...) \
-static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
+static always_inline spinlock_t \
+*kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
 { \
     for ( ; ; ) \
     { \
@@ -221,10 +237,16 @@ static inline spinlock_t *kind##_schedule_lock##irq(param EXTRA_TYPE(arg)) \
          * \
          * It may also be the case that v->processor may change but the \
          * lock may be the same; this will succeed in that case. \
+         * \
+         * Use the speculation unsafe locking helper, there's a speculation \
+         * barrier before returning to the caller. \
          */ \
-        spin_lock##irq(lock, ## arg); \
+        _sched_spinlock(lock, irq, ## arg); \
         if ( likely(lock == get_sched_res(cpu)->schedule_lock) ) \
+        { \
+            block_lock_speculation(); \
             return lock; \
+        } \
         spin_unlock##irq(lock, ## arg); \
     } \
 }
diff --git a/xen/common/timer.c b/xen/common/timer.c
index 785177e7fa47..a21798b76f38 100644
--- a/xen/common/timer.c
+++ b/xen/common/timer.c
@@ -239,7 +239,7 @@ static inline void deactivate_timer(struct timer *timer)
     list_add(&timer->inactive, &per_cpu(timers, timer->cpu).inactive);
 }
 
-static inline bool timer_lock(struct timer *timer)
+static inline bool timer_lock_unsafe(struct timer *timer)
 {
     unsigned int cpu;
 
@@ -253,7 +253,8 @@ static inline bool timer_lock(struct timer *timer)
             rcu_read_unlock(&timer_cpu_read_lock);
             return 0;
         }
-        spin_lock(&per_cpu(timers, cpu).lock);
+        /* Use the speculation unsafe variant, the wrapper has the barrier. */
+        _spin_lock(&per_cpu(timers, cpu).lock);
         if ( likely(timer->cpu == cpu) )
             break;
         spin_unlock(&per_cpu(timers, cpu).lock);
@@ -266,8 +267,9 @@ static inline bool timer_lock(struct timer *timer)
 #define timer_lock_irqsave(t, flags) ({         \
     bool __x;                                   \
     local_irq_save(flags);                      \
-    if ( !(__x = timer_lock(t)) )               \
+    if ( !(__x = timer_lock_unsafe(t)) )        \
         local_irq_restore(flags);               \
+    block_lock_speculation();                   \
     __x;                                        \
 })
 
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 194701c9137d..6a1eda675dcc 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -52,9 +52,10 @@ struct pci_seg {
 
 static spinlock_t _pcidevs_lock = SPIN_LOCK_UNLOCKED;
 
-void pcidevs_lock(void)
+/* Do not use, as it has no speculation barrier, use pcidevs_lock() instead. */
+void pcidevs_lock_unsafe(void)
 {
-    spin_lock_recursive(&_pcidevs_lock);
+    _spin_lock_recursive(&_pcidevs_lock);
 }
 
 void pcidevs_unlock(void)
diff --git a/xen/include/xen/event.h b/xen/include/xen/event.h
index 8e509e078475..f1472ea1ebe5 100644
--- a/xen/include/xen/event.h
+++ b/xen/include/xen/event.h
@@ -114,12 +114,12 @@ void notify_via_xen_event_channel(struct domain *ld, int lport);
 #define bucket_from_port(d, p) \
     ((group_from_port(d, p))[((p) % EVTCHNS_PER_GROUP) / EVTCHNS_PER_BUCKET])
 
-static inline void evtchn_read_lock(struct evtchn *evtchn)
+static always_inline void evtchn_read_lock(struct evtchn *evtchn)
 {
     read_lock(&evtchn->lock);
 }
 
-static inline bool evtchn_read_trylock(struct evtchn *evtchn)
+static always_inline bool evtchn_read_trylock(struct evtchn *evtchn)
 {
     return read_trylock(&evtchn->lock);
 }
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 1df1863b1331..63e49f0117e9 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -166,8 +166,12 @@ struct pci_dev {
  * devices, it also sync the access to the msi capability that is not
  * interrupt handling related (the mask bit register).
  */
-
-void pcidevs_lock(void);
+void pcidevs_lock_unsafe(void);
+static always_inline void pcidevs_lock(void)
+{
+    pcidevs_lock_unsafe();
+    block_lock_speculation();
+}
 void pcidevs_unlock(void);
 bool __must_check pcidevs_locked(void);
 
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: x86/mm: add speculation barriers to open coded locks

Add a speculation barrier to the clearly identified open-coded lock taking
functions.

Note that the memory sharing page_lock() replacement (_page_lock()) is left
as-is, as the code is experimental and not security supported.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
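
A rough standalone sketch of the open-coded pattern being hardened, using C11
atomics and an lfence intrinsic as stand-ins (the demo_* names are
hypothetical, not Xen code):

#include <stdatomic.h>
#include <immintrin.h>

#define DEMO_LOCKED 1UL

/* Hypothetical stand-in for Xen's block_lock_speculation(). */
static inline void demo_lock_barrier(void) { _mm_lfence(); }

/* Open-coded bit lock in the style of l3t_lock(): spin until the lock bit
 * can be set with a compare-and-exchange, then block speculation from
 * running ahead into the critical region. */
static void demo_bit_lock(_Atomic unsigned long *type_info)
{
    unsigned long x, nx;

    do {
        while ( (x = atomic_load(type_info)) & DEMO_LOCKED )
            ;                           /* spin while the lock bit is set */
        nx = x | DEMO_LOCKED;
    } while ( !atomic_compare_exchange_weak(type_info, &x, nx) );

    demo_lock_barrier();                /* the barrier this patch adds */
}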

diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
index 7d26d9cd2fc0..65d209d5fff4 100644
--- a/xen/arch/x86/include/asm/mm.h
+++ b/xen/arch/x86/include/asm/mm.h
@@ -399,7 +399,9 @@ const struct platform_bad_page *get_platform_badpages(unsigned int *array_size);
  * The use of PGT_locked in mem_sharing does not collide, since mem_sharing is
  * only supported for hvm guests, which do not have PV PTEs updated.
  */
-int page_lock(struct page_info *page);
+int page_lock_unsafe(struct page_info *page);
+#define page_lock(pg)   lock_evaluate_nospec(page_lock_unsafe(pg))
+
 void page_unlock(struct page_info *page);
 
 void put_page_type(struct page_info *page);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 0c6658298de2..2ba4c2640149 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2034,7 +2034,7 @@ static inline bool current_locked_page_ne_check(struct page_info *page) {
 #define current_locked_page_ne_check(x) true
 #endif
 
-int page_lock(struct page_info *page)
+int page_lock_unsafe(struct page_info *page)
 {
     unsigned long x, nx;
 
@@ -2095,7 +2095,7 @@ void page_unlock(struct page_info *page)
  * l3t_lock(), so to avoid deadlock we must avoid grabbing them in
  * reverse order.
  */
-static void l3t_lock(struct page_info *page)
+static always_inline void l3t_lock(struct page_info *page)
 {
     unsigned long x, nx;
 
@@ -2104,6 +2104,8 @@ static void l3t_lock(struct page_info *page)
             cpu_relax();
         nx = x | PGT_locked;
     } while ( cmpxchg(&page->u.inuse.type_info, x, nx) != x );
+
+    block_lock_speculation();
 }
 
 static void l3t_unlock(struct page_info *page)
From: =?UTF-8?q?Roger=20Pau=20Monn=C3=A9?= <roger.pau@citrix.com>
Subject: x86: protect conditional lock taking from speculative execution

Conditionally taken locks that use the pattern:

if ( lock )
    spin_lock(...);

need an else branch in order to issue a speculation barrier in the else case,
just like the barrier issued when the lock is acquired.

eval_nospec() could be used on the condition itself, but that would result in a
double barrier on the branch where the lock is taken.

Introduce a new pair of helpers, {gfn,spin}_lock_if(), that can be used to
conditionally take a lock in a speculation-safe way.

This is part of XSA-453 / CVE-2024-2193

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
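
For illustration, a standalone sketch of the conditional pattern and its
speculation-safe replacement (pthread and lfence used as stand-ins; the demo_*
names are hypothetical, not Xen code):

#include <stdbool.h>
#include <pthread.h>
#include <immintrin.h>

/* Hypothetical stand-in for Xen's block_lock_speculation(). */
static inline void demo_lock_barrier(void) { _mm_lfence(); }

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Speculation-safe conditional locking in the style of spin_lock_if(): the
 * lock is taken with a barrier-less primitive, and a single barrier is then
 * issued on both the locked and the unlocked path, avoiding the double
 * barrier that eval_nospec() on the condition would cause. */
static inline void demo_lock_if(bool locking)
{
    if ( locking )
        pthread_mutex_lock(&demo_lock);
    demo_lock_barrier();
}

Callers then replace the open-coded "if ( locking ) ..." pattern with a single
call, as the hunks below do with spin_lock_if() and gfn_lock_if().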

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 2ba4c2640149..62f5b811bbe8 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5018,8 +5018,7 @@ static l3_pgentry_t *virt_to_xen_l3e(unsigned long v)
         if ( !l3t )
             return NULL;
         UNMAP_DOMAIN_PAGE(l3t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
         {
             l4_pgentry_t l4e = l4e_from_mfn(l3mfn, __PAGE_HYPERVISOR);
@@ -5056,8 +5055,7 @@ static l2_pgentry_t *virt_to_xen_l2e(unsigned long v)
             return NULL;
         }
         UNMAP_DOMAIN_PAGE(l2t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l3e_get_flags(*pl3e) & _PAGE_PRESENT) )
         {
             l3e_write(pl3e, l3e_from_mfn(l2mfn, __PAGE_HYPERVISOR));
@@ -5095,8 +5093,7 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
             return NULL;
         }
         UNMAP_DOMAIN_PAGE(l1t);
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
         if ( !(l2e_get_flags(*pl2e) & _PAGE_PRESENT) )
         {
             l2e_write(pl2e, l2e_from_mfn(l1mfn, __PAGE_HYPERVISOR));
@@ -5127,6 +5124,8 @@ l1_pgentry_t *virt_to_xen_l1e(unsigned long v)
     do {                      \
         if ( locking )        \
             l3t_lock(page);   \
+        else                            \
+            block_lock_speculation();   \
     } while ( false )
 
 #define L3T_UNLOCK(page)                           \
@@ -5342,8 +5341,7 @@ int map_pages_to_xen(
             if ( l3e_get_flags(ol3e) & _PAGE_GLOBAL )
                 flush_flags |= FLUSH_TLB_GLOBAL;
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
@@ -5447,8 +5445,7 @@ int map_pages_to_xen(
                 if ( l2e_get_flags(*pl2e) & _PAGE_GLOBAL )
                     flush_flags |= FLUSH_TLB_GLOBAL;
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
@@ -5489,8 +5486,7 @@ int map_pages_to_xen(
                 unsigned long base_mfn;
                 const l1_pgentry_t *l1t;
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
 
                 ol2e = *pl2e;
                 /*
@@ -5544,8 +5540,7 @@ int map_pages_to_xen(
             unsigned long base_mfn;
             const l2_pgentry_t *l2t;
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
 
             ol3e = *pl3e;
             /*
@@ -5689,8 +5684,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                                        l3e_get_flags(*pl3e)));
             UNMAP_DOMAIN_PAGE(l2t);
 
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
             if ( (l3e_get_flags(*pl3e) & _PAGE_PRESENT) &&
                  (l3e_get_flags(*pl3e) & _PAGE_PSE) )
             {
@@ -5749,8 +5743,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
                                            l2e_get_flags(*pl2e) & ~_PAGE_PSE));
                 UNMAP_DOMAIN_PAGE(l1t);
 
-                if ( locking )
-                    spin_lock(&map_pgdir_lock);
+                spin_lock_if(locking, &map_pgdir_lock);
                 if ( (l2e_get_flags(*pl2e) & _PAGE_PRESENT) &&
                      (l2e_get_flags(*pl2e) & _PAGE_PSE) )
                 {
@@ -5794,8 +5787,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
              */
             if ( (nf & _PAGE_PRESENT) || ((v != e) && (l1_table_offset(v) != 0)) )
                 continue;
-            if ( locking )
-                spin_lock(&map_pgdir_lock);
+            spin_lock_if(locking, &map_pgdir_lock);
 
             /*
              * L2E may be already cleared, or set to a superpage, by
@@ -5842,8 +5834,7 @@ int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int nf)
         if ( (nf & _PAGE_PRESENT) ||
              ((v != e) && (l2_table_offset(v) + l1_table_offset(v) != 0)) )
             continue;
-        if ( locking )
-            spin_lock(&map_pgdir_lock);
+        spin_lock_if(locking, &map_pgdir_lock);
 
         /*
          * L3E may be already cleared, or set to a superpage, by
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 273fff86baa4..2eae73ac68d1 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -335,6 +335,15 @@ static inline void p2m_unlock(struct p2m_domain *p)
 #define p2m_locked_by_me(p)   mm_write_locked_by_me(&(p)->lock)
 #define gfn_locked_by_me(p,g) p2m_locked_by_me(p)
 
+static always_inline void gfn_lock_if(bool condition, struct p2m_domain *p2m,
+                                      gfn_t gfn, unsigned int order)
+{
+    if ( condition )
+        gfn_lock(p2m, gfn, order);
+    else
+        block_lock_speculation();
+}
+
 /* PoD lock (per-p2m-table)
  *
  * Protects private PoD data structs: entry and cache
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 4dd41193f538..ca24cd4a67da 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -281,9 +281,8 @@ mfn_t p2m_get_gfn_type_access(struct p2m_domain *p2m, gfn_t gfn,
     if ( q & P2M_UNSHARE )
         q |= P2M_ALLOC;
 
-    if ( locked )
-        /* Grab the lock here, don't release until put_gfn */
-        gfn_lock(p2m, gfn, 0);
+    /* Grab the lock here, don't release until put_gfn */
+    gfn_lock_if(locked, p2m, gfn, 0);
 
     mfn = p2m->get_entry(p2m, gfn, t, a, q, page_order, NULL);
 
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index 8430a888a8ca..e351fc9995f6 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -229,6 +229,14 @@ static always_inline void spin_lock_irq(spinlock_t *l)
         block_lock_speculation();                               \
     })
 
+/* Conditionally take a spinlock in a speculation safe way. */
+static always_inline void spin_lock_if(bool condition, spinlock_t *l)
+{
+    if ( condition )
+        _spin_lock(l);
+    block_lock_speculation();
+}
+
 #define spin_unlock(l)                _spin_unlock(l)
 #define spin_unlock_irq(l)            _spin_unlock_irq(l)
 #define spin_unlock_irqrestore(l, f)  _spin_unlock_irqrestore(l, f)