[PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall()

Posted by Sohil Mehta
Problem
-------
In order to map the EFI runtime services, set_virtual_address_map()
needs to be called. Since set_virtual_address_map() resides in the
lower half of the address space, LASS needs to be temporarily
disabled around this call.

Wrapping efi_enter_virtual_mode() with lass_stac()/clac() is not enough,
because the AC flag only gates data accesses, not instruction fetches.
Clearing the CR4.LASS bit is required to make this work.

However, pinned CR4 bits are not expected to be modified after boot
CPU init, so clearing CR4.LASS triggers a kernel warning.
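
As a rough illustration of the problem (not part of this patch), the
call site needs to look something like the sketch below. The wrapper
name is made up here; X86_FEATURE_LASS and X86_CR4_LASS come from the
rest of this series, and cr4_set_bits()/cr4_clear_bits() are the
existing CR4 accessors:

    /* Illustrative sketch only -- not part of this patch. */
    static void __init efi_map_runtime_services(void)
    {
            /*
             * The AC flag does not cover instruction fetches, so LASS
             * itself has to be turned off around the low-address call.
             */
            if (cpu_feature_enabled(X86_FEATURE_LASS))
                    cr4_clear_bits(X86_CR4_LASS);   /* warns if already pinned */

            efi_enter_virtual_mode();       /* calls set_virtual_address_map() */

            if (cpu_feature_enabled(X86_FEATURE_LASS))
                    cr4_set_bits(X86_CR4_LASS);
    }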

Solution
--------
One option is to move the CR pinning setup immediately after the runtime
services have been mapped. However, that is a narrow fix that would
require revisiting if something else needs to modify a pinned CR bit.

CR pinning mainly prevents exploits from trivially modifying
security-sensitive CR bits. There is limited benefit to enabling CR
pinning before userspace comes up. Defer CR pinning enforcement until
late_initcall() to allow EFI and future users to modify the CR bits
without any concern for CR pinning.
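
For reference, enforcement lives entirely behind the cr_pinning static
key in the CR4 write path, so flipping the key later simply delays the
check (paraphrased from native_write_cr4(); details trimmed):

    void native_write_cr4(unsigned long val)
    {
            unsigned long bits_changed = 0;

    set_register:
            asm volatile("mov %0,%%cr4" : "+r" (val) : : "memory");

            if (static_branch_likely(&cr_pinning)) {
                    if (unlikely((val & cr4_pinned_mask) != cr4_pinned_bits)) {
                            bits_changed = (val & cr4_pinned_mask) ^ cr4_pinned_bits;
                            val = (val & ~cr4_pinned_mask) | cr4_pinned_bits;
                            goto set_register;
                    }
                    /* Warn only after the pinned bits have been restored. */
                    WARN_ONCE(bits_changed, "pinned CR4 bits changed: 0x%lx!?\n",
                              bits_changed);
            }
    }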

Save the pinned bits while initializing the boot CPU because they are
needed later to program the value on APs when they come up.

Note
----
This introduces a small window between the boot CPU being initialized
and CR pinning being enforced, where any in-kernel clearing of the
pinned bits could go unnoticed. Later, when enforcement begins, a
warning is triggered as soon as any CR4 bit is modified, such as
X86_CR4_PGE during a TLB flush.
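
For example, a global TLB flush toggles CR4.PGE with two back-to-back
CR4 writes, which is roughly where a stale pinned bit would first be
noticed (sketch only; the real code is in native_flush_tlb_global()):

    /* Sketch of the CR4.PGE toggle done by a global TLB flush. */
    static void flush_tlb_global_sketch(void)
    {
            unsigned long cr4 = this_cpu_read(cpu_tlbstate.cr4);

            native_write_cr4(cr4 ^ X86_CR4_PGE);    /* stale pinned bit warns here */
            native_write_cr4(cr4);
    }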

Currently, this is a purely theoretical concern. There are multiple ways
to resolve it [1] if it becomes a problem in practice.

Link: https://lore.kernel.org/lkml/c59aa7ac-62a6-45ec-b626-de518b25f7d9@intel.com/ [1]
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v10:
 - Split recording pinned bits and enabling pinning into two functions.
 - Defer pinning until userspace comes up.

This patch does not include any changes to harden the CR pinning
implementation, as that is beyond the scope of this series.
---
 arch/x86/kernel/cpu/common.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 61ab332eaf73..57d5824465b0 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -476,8 +476,8 @@ void cr4_init(void)
 
 	if (boot_cpu_has(X86_FEATURE_PCID))
 		cr4 |= X86_CR4_PCIDE;
-	if (static_branch_likely(&cr_pinning))
-		cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
+
+	cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
 
 	__write_cr4(cr4);
 
@@ -487,14 +487,21 @@ void cr4_init(void)
 
 /*
  * Once CPU feature detection is finished (and boot params have been
- * parsed), record any of the sensitive CR bits that are set, and
- * enable CR pinning.
+ * parsed), record any of the sensitive CR bits that are set.
  */
-static void __init setup_cr_pinning(void)
+static void __init record_cr_pinned_bits(void)
 {
 	cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & cr4_pinned_mask;
+}
+
+/* Enables enforcement of the CR pinned bits */
+static int __init enable_cr_pinning(void)
+{
 	static_key_enable(&cr_pinning.key);
+
+	return 0;
 }
+late_initcall(enable_cr_pinning);
 
 static __init int x86_nofsgsbase_setup(char *arg)
 {
@@ -2119,7 +2126,7 @@ static __init void identify_boot_cpu(void)
 	enable_sep_cpu();
 #endif
 	cpu_detect_tlb(&boot_cpu_data);
-	setup_cr_pinning();
+	record_cr_pinned_bits();
 
 	tsx_init();
 	tdx_init();
-- 
2.43.0
Re: [PATCH v10 05/15] x86/cpu: Defer CR pinning enforcement until late_initcall()
Posted by Edgecombe, Rick P
On Mon, 2025-10-06 at 23:51 -0700, Sohil Mehta wrote:
> Problem
> -------
> In order to map the EFI runtime services, set_virtual_address_map()
> needs to be called. Since set_virtual_address_map() resides in the
> lower half of the address space, LASS needs to be temporarily
> disabled around this call.

Possibly naive EFI question...

efi_runtime_services_32_t has: 
   typedef struct {
   	efi_table_hdr_t hdr;
   	u32 get_time;
   	u32 set_time;
   	u32 get_wakeup_time;
   	u32 set_wakeup_time;
   	u32 set_virtual_address_map;
   	u32 convert_pointer;
   	u32 get_variable;
   	u32 get_next_variable;
   	u32 set_variable;
   	u32 get_next_high_mono_count;
   	u32 reset_system;
   	u32 update_capsule;
   	u32 query_capsule_caps;
   	u32 query_variable_info;
   } efi_runtime_services_32_t;
   

Why is only set_virtual_address_map problematic? Some of the other ones get
called after boot by a bunch of modules by the looks of it.

> 
> Wrapping efi_enter_virtual_mode() with lass_stac()/clac() is not enough,
> because the AC flag only gates data accesses, not instruction fetches.
> Clearing the CR4.LASS bit is required to make this work.
> 
> However, pinned CR4 bits are not expected to be modified after boot
> CPU init, so clearing CR4.LASS triggers a kernel warning.
> 
> Solution
> --------
> One option is to move the CR pinning setup immediately after the runtime
> services have been mapped. However, that is a narrow fix that would
> require revisiting if something else needs to modify a pinned CR bit.
> 
> CR pinning mainly prevents exploits from trivially modifying
> security-sensitive CR bits. There is limited benefit to enabling CR
> pinning before userspace comes up. Defer CR pinning enforcement until
> late_initcall() to allow EFI and future users to modify the CR bits
> without any concern for CR pinning.
> 
> Save the pinned bits while initializing the boot CPU because they are
> needed later to program the value on APs when they come up.
> 
> Note
> ----
> This introduces a small window between the boot CPU being initialized
> and CR pinning being enforced, where any in-kernel clearing of the
> pinned bits could go unnoticed. Later, when enforcement begins, a
> warning is triggered as soon as any CR4 bit is modified, such as
> X86_CR4_PGE during a TLB flush.
> 
> Currently, this is a purely theoretical concern. There are multiple ways
> to resolve it [1] if it becomes a problem in practice.
> 
> Link: https://lore.kernel.org/lkml/c59aa7ac-62a6-45ec-b626-de518b25f7d9@intel.com/ [1]
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> ---
> v10:
>  - Split recording pinned bits and enabling pinning into two functions.
>  - Defer pinning until userspace comes up.
> 
> This patch does not include any changes to harden the CR pinning
> implementation, as that is beyond the scope of this series.
> ---
>  arch/x86/kernel/cpu/common.c | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 61ab332eaf73..57d5824465b0 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -476,8 +476,8 @@ void cr4_init(void)
>  
>  	if (boot_cpu_has(X86_FEATURE_PCID))
>  		cr4 |= X86_CR4_PCIDE;
> -	if (static_branch_likely(&cr_pinning))
> -		cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
> +
> +	cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;


Can you explain why this change is needed? It relies on cr4_pinned_bits to be
already set, and kind of is "enforcement", but no longer depends on
enable_cr_pinning() being called.


>  
>  	__write_cr4(cr4);
>  
> @@ -487,14 +487,21 @@ void cr4_init(void)
>  
>  /*
>   * Once CPU feature detection is finished (and boot params have been
> - * parsed), record any of the sensitive CR bits that are set, and
> - * enable CR pinning.
> + * parsed), record any of the sensitive CR bits that are set.
>   */
> -static void __init setup_cr_pinning(void)
> +static void __init record_cr_pinned_bits(void)
>  {
>  	cr4_pinned_bits = this_cpu_read(cpu_tlbstate.cr4) & cr4_pinned_mask;
> +}
> +
> +/* Enables enforcement of the CR pinned bits */
> +static int __init enable_cr_pinning(void)
> +{
>  	static_key_enable(&cr_pinning.key);
> +
> +	return 0;
>  }
> +late_initcall(enable_cr_pinning);
>  
>  static __init int x86_nofsgsbase_setup(char *arg)
>  {
> @@ -2119,7 +2126,7 @@ static __init void identify_boot_cpu(void)
>  	enable_sep_cpu();
>  #endif
>  	cpu_detect_tlb(&boot_cpu_data);
> -	setup_cr_pinning();
> +	record_cr_pinned_bits();
>  
>  	tsx_init();
>  	tdx_init();