From: Dave Hansen <dave.hansen@linux.intel.com>
== CR Pinning Background ==
Modern CPU hardening features like SMAP/SMEP are enabled by flipping
control register (CR) bits. Attackers find these features inconvenient and
often try to disable them.
CR-pinning is a kernel hardening feature that detects when
security-sensitive control bits are flipped off, complains about it, then
turns them back on. The CR-pinning checks are performed in the CR
manipulation helpers.
X86_CR4_FRED controls FRED enabling and is pinned. There is a single,
system-wide static key that controls CR-pinning behavior. The static key is
enabled by the boot CPU after it has established its CR configuration.
The end result is that CR-pinning is not active while initializing the boot
CPU but it is active while bringing up secondary CPUs.
== FRED Background ==
FRED is a new hardware entry/exit feature for the kernel. It is not on by
default and started out as Intel-only. AMD is just adding support now.
FRED has MSRs for configuration and is enabled by the pinned X86_CR4_FRED
bit. It should not be enabled until after MSRs are properly initialized.
== SEV Background ==
AMD SEV-ES and SEV-SNP use #VC (VMM Communication) exceptions to
handle operations that require hypervisor assistance. These exceptions
occur during various operations including MMIO access, CPUID instructions,
and certain memory accesses.
Writes to the console can generate #VC.
== Problem ==
CR-pinning implicitly enables FRED on secondary CPUs at a different point
than the boot CPU. This point is *before* the CPU has done an explicit
cr4_set_bits(X86_CR4_FRED) and before the MSRs are initialized. This means
that there is a window where no exceptions can be handled.
For SEV-ES/SNP and TDX guests, any console output during this window
triggers #VC or #VE exceptions that result in triple faults because the
exception handlers rely on FRED MSRs that aren't yet configured.
== Fix ==
Defer CR-pinning enforcement during secondary CPU bringup. This avoids any
implicit CR changes during CPU bringup, ensuring that FRED is not enabled
before it is configured and able to handle a #VC or #VE.
This also aligns boot and secondary CPU bringup.
CR-pinning is now enforced only when the CPU is online. cr4_init() is
called during secondary CPU bringup, while the CPU is still offline, so the
pinning logic in cr4_init() is redundant. Remove it and add WARN_ON_ONCE()
to catch any future break of this assumption.
Note: FRED is not on by default anywhere, so this is unlikely to be
causing many problems. It was only noticed because AMD has started
enabling FRED.
Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Reported-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
[ Nikunj: Updated SEV background section wording ]
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Cc: stable@vger.kernel.org # 6.9+
---
arch/x86/kernel/cpu/common.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 1c3261cae40c..3ccc6416a11d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -434,6 +434,21 @@ static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_C
static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
static unsigned long cr4_pinned_bits __ro_after_init;
+static bool cr_pinning_enabled(void)
+{
+ if (!static_branch_likely(&cr_pinning))
+ return false;
+
+ /*
+ * Do not enforce pinning during CPU bringup. It might
+ * turn on features that are not set up yet, like FRED.
+ */
+ if (!cpu_online(smp_processor_id()))
+ return false;
+
+ return true;
+}
+
void native_write_cr0(unsigned long val)
{
unsigned long bits_missing = 0;
@@ -441,7 +456,7 @@ void native_write_cr0(unsigned long val)
set_register:
asm volatile("mov %0,%%cr0": "+r" (val) : : "memory");
- if (static_branch_likely(&cr_pinning)) {
+ if (cr_pinning_enabled()) {
if (unlikely((val & X86_CR0_WP) != X86_CR0_WP)) {
bits_missing = X86_CR0_WP;
val |= bits_missing;
@@ -460,7 +475,7 @@ void __no_profile native_write_cr4(unsigned long val)
set_register:
asm volatile("mov %0,%%cr4": "+r" (val) : : "memory");
- if (static_branch_likely(&cr_pinning)) {
+ if (cr_pinning_enabled()) {
if (unlikely((val & cr4_pinned_mask) != cr4_pinned_bits)) {
bits_changed = (val & cr4_pinned_mask) ^ cr4_pinned_bits;
val = (val & ~cr4_pinned_mask) | cr4_pinned_bits;
@@ -502,8 +517,8 @@ void cr4_init(void)
if (boot_cpu_has(X86_FEATURE_PCID))
cr4 |= X86_CR4_PCIDE;
- if (static_branch_likely(&cr_pinning))
- cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
+
+ WARN_ON_ONCE(cr_pinning_enabled());
__write_cr4(cr4);
--
2.48.1
On Thu, Feb 26, 2026 at 09:23:48AM +0000, Nikunj A Dadhania wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
> ---
> arch/x86/kernel/cpu/common.c | 23 +++++++++++++++++++----
> 1 file changed, 19 insertions(+), 4 deletions(-)
My SNP guest stops booting with this, right here:
...
[ 3.134372] Memory Encryption Features active: AMD SEV SEV-ES SEV-SNP
[ 3.138211] SEV: Status: SEV SEV-ES SEV-SNP
[ 3.142211] pid_max: default: 32768 minimum: 301
[ 3.145350] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[ 3.146222] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[ 3.150613] VFS: Finished mounting rootfs on nullfs
[ 3.154971] Running RCU synchronous self tests
[ 3.158212] Running RCU synchronous self tests
[ 3.269518] smpboot: CPU0: AMD EPYC-v4 Processor (family: 0x17, model: 0x1, stepping: 0x2)
[ 3.270237] SEV: APIC: wakeup_secondary_cpu() replaced with wakeup_cpu_via_vmgexit()
[ 3.274786] Performance Events: Fam17h+ core perfctr, AMD PMU driver.
[ 3.278228] ... version: 0
[ 3.282211] ... bit width: 48
[ 3.284781] ... generic counters: 6
[ 3.286212] ... generic bitmap: 000000000000003f
[ 3.290211] ... fixed-purpose counters: 0
[ 3.294211] ... fixed-purpose bitmap: 0000000000000000
[ 3.297549] ... value mask: 0000ffffffffffff
[ 3.298211] ... max period: 00007fffffffffff
[ 3.302211] ... global_ctrl mask: 000000000000003f
[ 3.306490] signal: max sigframe size: 1776
[ 3.310343] rcu: Hierarchical SRCU implementation.
[ 3.313285] rcu: Max phase no-delay instances is 1000.
[ 3.314355] Timer migration: 2 hierarchy levels; 8 children per group; 2 crossnode level
[ 3.326860] smp: Bringing up secondary CPUs ...
[ 3.329547] smpboot: Parallel CPU startup disabled by the platform
[ 3.330627] smpboot: x86: Booting SMP configuration:
<--- HERE.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 3/9/26 06:46, Borislav Petkov wrote:
> My SNP guest stops booting with this right:
Could you dump out CR4 at wakeup_cpu_via_vmgexit() before and after this
patch? Right here:
/* CR4 should maintain the MCE value */
cr4 = native_read_cr4() & X86_CR4_MCE;
It's got to be some delta there.
The other possibility is that some CR4 bit becomes no longer pinned when
the CPU comes up, and the *pinning* was what caused the secondary CPU's
CR4 bit to get set, not its actual initialization.
Basically, the secondary boot code didn't explicitly set a bit and
counted on the pinning code to do it instead.
It's probably exacerbated by the "novel" way that SEV-SNP CPUs get
brought up and all the assembly that *only* runs there.
On Mon, Mar 09, 2026 at 08:38:10AM -0700, Dave Hansen wrote:
> On 3/9/26 06:46, Borislav Petkov wrote:
> > My SNP guest stops booting with this right:
>
> Could you dump out CR4 at wakeup_cpu_via_vmgexit() before and after this
> patch? Right here:
>
> /* CR4 should maintain the MCE value */
> cr4 = native_read_cr4() & X86_CR4_MCE;
>
> It's got to be some delta there.
Looks the same to me:
before: 31 SEV: wakeup_cpu_via_vmgexit: CR4: 0x3506f0
That's 31 CPUs - no BSP with the CR4 value above.
after: [ 3.354326] SEV: wakeup_cpu_via_vmgexit: CR4: 0x3506f0
That stops after CPU1, i.e., the first AP. But the CR4 value is the same.
> The other possibility is that some CR4 bit becomes no longer pinned when
> the CPU comes up, and the *pinning* was what caused the secondary CPU's
> CR4 bit to get set, not its actual initialization.
>
> Basically, the secondary boot code didn't explicitly set a bit and
> counted on the pinning code to do it instead.
>
> It's probably exacerbated by the "novel" way that SEV-SNP CPUs get
> brought up and all the assembly that *only* runs there.
I guess I can start commenting out things to see what happens. Hmmm...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 3/9/26 09:15, Borislav Petkov wrote:
> Looks the same to me:
>
> before: 31 SEV: wakeup_cpu_via_vmgexit: CR4: 0x3506f0
>
> That's 31 CPUs - no BSP with the CR4 value above.
>
> after: [ 3.354326] SEV: wakeup_cpu_via_vmgexit: CR4: 0x3506f0
>
> That stops after CPU1, i.e., the first AP. But the CR4 value is the same.
The only pinned bits in there are: SMAP, SMEP and FSGSBASE.
SMAP and SMEP are unlikely to be biting us here.
FSGSBASE is _possible_ but I don't see any of the {RD,WR}{F,G}SBASE
instructions in early boot where it would bite us.
Can you boot this thing without FSGSBASE support?
The other option would be to boot a working system, normally and see
what is getting flipped by pinning at cr4_init(). The attached patch
does that. It also uses trace_printk() so it hopefully won't trip over
#VC's during early boot with the console.
For me, it's flipping on 0x310800, which is:
#define X86_CR4_UMIP (1ul << 11)
#define X86_CR4_FSGSBASE (1ul << 16)
#define X86_CR4_SMEP (1ul << 20)
#define X86_CR4_SMAP (1ul << 21)
*Maybe* the paranoid entry code is getting called from the #VC handler
in early boot? It has ALTERNATIVEs on X86_FEATURE_FSGSBASE and might be
using the FSGSBASE instructions in there.
On 3/9/26 13:03, Dave Hansen wrote:
> The other option would be to boot a working system, normally and see
> what is getting flipped by pinning at cr4_init(). The attached patch
> does that. It also uses trace_printk() so it hopefully won't trip over
> #VC's during early boot with the console.
>
> For me, it's flipping on 0x310800, which is:
>
> #define X86_CR4_UMIP (1ul << 11)
> #define X86_CR4_FSGSBASE (1ul << 16)
> #define X86_CR4_SMEP (1ul << 20)
> #define X86_CR4_SMAP (1ul << 21)
>
> *Maybe* the paranoid entry code is getting called from the #VC handler
> in early boot? It has ALTERNATIVEs on X86_FEATURE_FSGSBASE and might be
> using the FSGSBASE instructions in there.
Could be... before the patch the AP CR4 value is:
[ 0.020362] *** DEBUG: cr4_init - cr4=0x3100f0
after the patch it is:
[ 0.020284] *** DEBUG: cr4_init - cr4=0xf0
The SNP guest is dying in __x2apic_enable() when trying to read
MSR_IA32_APICBASE, which will trigger a #VC.
If I set CR4[16] in cr4_init() then the SNP guest boots fine.
Thanks,
Tom
On 3/9/26 11:40, Tom Lendacky wrote:
> The SNP guest is dying in __x2apic_enable() when trying to read
> MSR_IA32_APICBASE, which will trigger a #VC.
>
> If I set CR4[16] in cr4_init() then the SNP guest boots fine.
That sounds pretty definitive.
How does this work on the boot CPU? How does it manage to get FSGSBASE
set up before __x2apic_enable()? Or is it in the early exception code,
which might not use FSGSBASE instructions?
Either way, I do think this needs to get fixed up. It was not acceptable
for cr4_init() to implicitly set pinned features and then have the CPU
boot code come along and do:
	cr4_set_bits(X86_CR4_FSGSBASE);
It all basically worked by accident before.
On 3/10/2026 12:57 AM, Dave Hansen wrote:
> On 3/9/26 11:40, Tom Lendacky wrote:
>> The SNP guest is dying in __x2apic_enable() when trying to read
>> MSR_IA32_APICBASE, which will trigger a #VC.
>>
>> If I set CR4[16] in cr4_init() then the SNP guest boots fine.
>
> That sounds pretty definitive.
Thanks Dave and Tom for the help in uncovering the implicit CR4.FSGSBASE dependency.
> How does this work on the boot CPU? How does it manage to get FSGSBASE
> set up before __x2apic_enable()?
The boot CPU doesn't have FSGSBASE enabled before __x2apic_enable() either.
> Or is it on the early exception code, which might not use FSGSBASE instructions?
Correct. The key difference is timing relative to ALTERNATIVE patching. I added instrumentation to verify this behavior:
Boot CPU sequence (without CR pinning disable patch):
traps: trap_init() ENTRY: CR4.FSGSBASE=0, cpu=0
traps: trap_init() AFTER cpu_init: CR4.FSGSBASE=0, cpu=0
arch_cpu_finalize_init() ENTRY: CR4.FSGSBASE=0, cpu=0
identify_boot_cpu() ENTRY: CR4.FSGSBASE=0, cpu=0
identify_boot_cpu() EXIT: CR4.FSGSBASE=1, cpu=0
SMP alternatives: BEFORE apply_alternatives: CR4.FSGSBASE=1, boot_cpu_has(FSGSBASE)=1, cpu=0
SMP alternatives: Starting ALTERNATIVE patching
The boot CPU enables CR4.FSGSBASE in identify_boot_cpu() *before* ALTERNATIVE
patching occurs in alternative_instructions(). This means
cpu_init()
-> x2apic_setup()
-> __x2apic_enable()
executes unpatched paranoid_entry() code that uses RDMSR/SWAPGS instead of
RDGSBASE/WRGSBASE. Due to this, the boot CPU does not have this problem.
Secondary CPU sequence (without CR pinning disable patch):
smpboot: start_secondary() BEFORE cr4_init: CR4.FSGSBASE=0, cpu=1
cr4_init() ENTRY: CR4=0x10f0, FSGSBASE=0, cpu=1
cr4_init() EXIT: CR4=0x10f0->0x3318f0, FSGSBASE=0->1, cpu=1, pinning=1
Secondary CPUs boot after alternatives have been applied globally. They
execute already-patched paranoid_entry() code that uses RDGSBASE/WRGSBASE
instructions, which require CR4.FSGSBASE=1.
Currently, secondary CPUs get CR4.FSGSBASE set implicitly through CR pinning.
The CR pinning disable patch removes this implicit setting, exposing the
hidden dependency. Without CR4.FSGSBASE enabled, RDGSBASE/WRGSBASE
trigger #UD.
Note: I also verified the root cause by disabling the FSGSBASE ALTERNATIVE
patching, which forced the code to always use RDMSR/SWAPGS. With this change,
SNP guests boot successfully even without CR4.FSGSBASE set early, confirming
the issue is the timing between ALTERNATIVE patching (global) and CR4.FSGSBASE
enablement (per-CPU).
> Either way, I do think this needs to get fixed up. It was not acceptable
> for cr4_init() implicitly to set pinned features and then have the CPU
> boot code come along and do:
>
> cr4_set_bits(X86_CR4_FSGSBASE);
>
> It all basically worked by accident before.
Agreed. The attached pre-patch makes the dependency explicit by directly
enabling CR4.FSGSBASE in cr4_init() when the feature is available. This
ensures secondary CPUs have FSGSBASE enabled before any exceptions can
occur, regardless of CR pinning state:
Secondary CPU sequence (with this patch):
smpboot: start_secondary() BEFORE cr4_init: CR4.FSGSBASE=0, cpu=1
cr4_init() ENTRY: CR4=0x10f0, FSGSBASE=0, cpu=1
cr4_init() EXIT: CR4=0x10f0->0x310f0, FSGSBASE=0->1, cpu=1, pinning=0
-------------------------------------------------------------------------
From: Nikunj A Dadhania <nikunj@amd.com>
Subject: [PATCH] x86/cpu: Enable FSGSBASE early in cr4_init()
== Background ==
Exception entry code (paranoid_entry()) uses ALTERNATIVE patching based on
X86_FEATURE_FSGSBASE to decide whether to use RDGSBASE/WRGSBASE
instructions or the slower RDMSR/SWAPGS sequence for saving/restoring
GSBASE.
For the boot CPU, ALTERNATIVE patching happens after enabling FSGSBASE in CR4.
When the feature is available, the code is permanently patched to use
RDGSBASE/WRGSBASE, which require CR4.FSGSBASE=1 to execute without
triggering #UD.
== Boot Sequence ==
Boot CPU (with CR pinning enabled):
trap_init()
cpu_init() <- Uses unpatched code (RDMSR/SWAPGS)
x2apic_setup()
...
arch_cpu_finalize_init()
identify_boot_cpu()
identify_cpu()
cr4_set_bits(X86_CR4_FSGSBASE) # Enables the feature
# This becomes part of cr4_pinned_bits
...
alternative_instructions() <- Patches code to use RDGSBASE/WRGSBASE
Secondary CPUs (with CR pinning enabled):
start_secondary()
cr4_init() <- Code already patched, CR4.FSGSBASE=1
set implicitly via cr4_pinned_bits
cpu_init() <- exceptions work because FSGSBASE is
already enabled
Secondary CPU (with CR pinning disabled):
start_secondary()
cr4_init() <- Code already patched, CR4.FSGSBASE=0
cpu_init()
x2apic_setup()
rdmsrq(MSR_IA32_APICBASE) <- Triggers #VC in SNP guests
exc_vmm_communication()
paranoid_entry() <- Uses RDGSBASE with CR4.FSGSBASE=0
(patched code)
...
ap_starting()
identify_secondary_cpu()
identify_cpu()
cr4_set_bits(X86_CR4_FSGSBASE) <- Enables the feature, which is
too late
== CR Pinning ==
Currently, CR4.FSGSBASE is set implicitly through CR pinning: the boot CPU
sets it during identify_cpu(), it becomes part of cr4_pinned_bits, and
cr4_init() applies those pinned bits to secondary CPUs. This works but
creates an undocumented dependency between cr4_init() and the pinning
mechanism.
== Problem ==
Secondary CPUs boot after alternatives have been applied globally. They
execute already-patched paranoid_entry() code that uses RDGSBASE/WRGSBASE
instructions, which require CR4.FSGSBASE=1. Upcoming changes to CR pinning
behavior will break the implicit dependency, causing secondary CPUs to
generate #UD.
This issue manifests on AMD SEV-SNP guests, where the rdmsrq() in
x2apic_setup() triggers a #VC exception early during cpu_init(). The #VC
handler (exc_vmm_communication()) executes the patched paranoid_entry()
path. Without CR4.FSGSBASE enabled, RDGSBASE instructions trigger #UD.
== Fix ==
Make the dependency explicit by directly enabling CR4.FSGSBASE in
cr4_init() when the feature is available. This ensures secondary CPUs have
FSGSBASE enabled before any exceptions can occur, matching the boot CPU's
final state.
Fixes: c82965f9e530 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
Cc: stable@vger.kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Sohil Mehta <sohil.mehta@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
---
arch/x86/kernel/cpu/common.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 1c3261cae40c..f5f9b242a983 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -502,6 +502,16 @@ void cr4_init(void)
if (boot_cpu_has(X86_FEATURE_PCID))
cr4 |= X86_CR4_PCIDE;
+
+ /*
+ * Enable FSGSBASE if available. Exception entry code (paranoid_entry)
+ * is patched to use RDGSBASE/WRGSBASE when this feature is present,
+ * and those instructions require CR4.FSGSBASE=1. Secondary CPUs must
+ * enable this before any exceptions occur.
+ */
+ if (boot_cpu_has(X86_FEATURE_FSGSBASE))
+ cr4 |= X86_CR4_FSGSBASE;
+
if (static_branch_likely(&cr_pinning))
cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
--
2.48.1
Nikunj, thanks for tracking this down and filling in the last piece of
the puzzle about ALTERNATIVEs patching.
On 3/11/26 03:41, Nikunj A. Dadhania wrote:
> + /*
> + * Enable FSGSBASE if available. Exception entry code (paranoid_entry)
> + * is patched to use RDGSBASE/WRGSBASE when this feature is present,
> + * and those instructions require CR4.FSGSBASE=1. Secondary CPUs must
> + * enable this before any exceptions occur.
> + */
> + if (boot_cpu_has(X86_FEATURE_FSGSBASE))
> + cr4 |= X86_CR4_FSGSBASE;
But this still double-enables X86_CR4_FSGSBASE. Could we initialize
X86_CR4_FSGSBASE in *one* place, please?
Also, please avoid passive voice in stuff like this. It's just more
efficient to say:
	CPUs that support FSGSBASE may use RDGSBASE/WRGSBASE in
	paranoid_entry(). Enable the feature before any exceptions
	occur.
On 3/11/2026 7:37 PM, Dave Hansen wrote:
> Nikunj, thanks for tracking this down and filling in the last piece of
> the puzzle about ALTERNATIVEs patching.
>
> On 3/11/26 03:41, Nikunj A. Dadhania wrote:
>> + /*
>> + * Enable FSGSBASE if available. Exception entry code (paranoid_entry)
>> + * is patched to use RDGSBASE/WRGSBASE when this feature is present,
>> + * and those instructions require CR4.FSGSBASE=1. Secondary CPUs must
>> + * enable this before any exceptions occur.
>> + */
>> + if (boot_cpu_has(X86_FEATURE_FSGSBASE))
>> + cr4 |= X86_CR4_FSGSBASE;
>
> But this still double-enables X86_CR4_FSGSBASE. Could we initialize
> X86_CR4_FSGSBASE in *one* place, please?
Makes sense. Moving X86_CR4_FSGSBASE enablement to cr4_init() and calling cr4_init()
early in the boot CPU path eliminates the double-enable. I've tested this lightly -
let me know if this is the right approach.
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index a24c7805acdb..98efbd13a5a6 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -595,7 +595,7 @@ extern void load_fixmap_gdt(int);
extern void cpu_init(void);
extern void cpu_init_exception_handling(bool boot_cpu);
extern void cpu_init_replace_early_idt(void);
-extern void cr4_init(void);
+extern void cr4_init(bool boot_cpu);
extern void set_task_blockstep(struct task_struct *task, bool on);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 1c3261cae40c..aa07e2eef228 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -496,14 +496,26 @@ unsigned long cr4_read_shadow(void)
}
EXPORT_SYMBOL_FOR_KVM(cr4_read_shadow);
-void cr4_init(void)
+void cr4_init(bool boot_cpu)
{
unsigned long cr4 = __read_cr4();
- if (boot_cpu_has(X86_FEATURE_PCID))
- cr4 |= X86_CR4_PCIDE;
- if (static_branch_likely(&cr_pinning))
- cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
+ /*
+ * CPUs that support FSGSBASE may use RDGSBASE/WRGSBASE in
+ * paranoid_entry(). Enable the feature before any exceptions
+ * occur.
+ */
+ if (boot_cpu_has(X86_FEATURE_FSGSBASE)) {
+ cr4 |= X86_CR4_FSGSBASE;
+ elf_hwcap2 |= HWCAP2_FSGSBASE;
+ }
+
+ if (!boot_cpu) {
+ if (boot_cpu_has(X86_FEATURE_PCID))
+ cr4 |= X86_CR4_PCIDE;
+ if (static_branch_likely(&cr_pinning))
+ cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
+ }
__write_cr4(cr4);
@@ -2047,12 +2059,6 @@ static void identify_cpu(struct cpuinfo_x86 *c)
setup_umip(c);
setup_lass(c);
- /* Enable FSGSBASE instructions if available. */
- if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
- cr4_set_bits(X86_CR4_FSGSBASE);
- elf_hwcap2 |= HWCAP2_FSGSBASE;
- }
-
/*
* The vendor-specific functions might have changed features.
* Now we do "generic changes."
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5cd6950ab672..52642908c2bf 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -233,7 +233,7 @@ static void notrace __noendbr start_secondary(void *unused)
* before cpu_init(), SMP booting is too fragile that we want to
* limit the things done here to the most necessary things.
*/
- cr4_init();
+ cr4_init(false);
/*
* 32-bit specific. 64-bit reaches this code with the correct page
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4dbff8ef9b1c..0f7400257ce5 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1685,6 +1685,13 @@ void __init trap_init(void)
/* Init GHCB memory pages when running as an SEV-ES guest */
sev_es_init_vc_handling();
+ /*
+ * Initialize CR4 early, before cpu_init(). This ensures features like
+ * FSGSBASE are enabled before exception handlers run, avoiding double
+ * initialization later.
+ */
+ cr4_init(true);
+
/* Initialize TSS before setting up traps so ISTs work */
cpu_init_exception_handling(true);
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index db9b8e222b38..2ab0e20288e9 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -58,7 +58,7 @@ static void cpu_bringup(void)
{
int cpu;
- cr4_init();
+ cr4_init(false);
cpuhp_ap_sync_alive();
cpu_init();
fpu__init_cpu();
>
> Also, please avoid passive voice in stuff like this. It's just more
> efficient to say:
>
> CPUs that support FSGSBASE may use RDGSBASE/WRGSBASE in
> paranoid_entry(). Enable the feature before any exceptions
> occur.
Ack
Regards,
Nikunj
On 3/11/26 08:42, Nikunj A. Dadhania wrote:
> -void cr4_init(void)
> +void cr4_init(bool boot_cpu)
I think the "pass a bool to a function" pattern is really not great.
The problem isn't actually setting CR4 twice. The problem is having two
different code paths to set it. Doing the 'bool' doesn't eliminate the
code path, it just makes the whole thing more complicated to reason
about and further bifurcates the boot and secondary CPU bringup paths.
We want to unify those, not bifurcate them.
Why don't we just universally set X86_CR4_FSGSBASE in cr4_init()?
BTW, commit c7ad5ad297e tells some of the tales of woe around CR4 and
boot vs. secondary CPUs:
cpu_init() is weird: it's called rather late (after early
identification and after most MMU state is initialized) on the
boot CPU but is called extremely early (before identification)
on secondary CPUs.
This weirdness is still biting us today. CR4 pinning just made things
worse (or at least harder to understand).
I have the feeling we need to bite the bullet here and actually start
thinking about this holistically. I _think_ 'mmu_cr4_features' is
conceptually pretty close to what we need. It's just named wrong.
/*
* Current system-wide configuration information for CR4 register.
* All of the bits in these feature masks are supported by the current
* running CPU.
*/
struct cr4_config {
/*
* Features that are needed in early assembly like the
* trampoline. These should not have side effects or interact
* with other features.
*
* Only written by the boot CPU. Establishes bringup value on
* secondary CPUs.
*/
unsigned long early_features;
/*
* Features that get enabled from C code: cr4_late_init().
* Anything with early side-effects that can't be in the
* early set.
*/
unsigned long late_features;
/*
* Features that do not work in real-mode.
*/
unsigned long realmode_incompat_features;
/*
* Points into the real mode trampoline header. Write to this
* every time 'early_features' is updated. (maybe??)
*/
unsigned long *trampoline_features_target;
/*
* Features that are always on when userspace is running.
* CR4 manipulation functions will notice if these bits get
* cleared, restore them and WARN().
*/
unsigned long pinned_features;
};
Then instead of sprinkling code around like:
/*
 * This function is called before exiting to real-mode and that will
 * fail with CR4.PCIDE still set.
 */
if (boot_cpu_has(X86_FEATURE_PCID))
cr4_clear_bits(X86_CR4_PCIDE);
We can do *ALL* the PCID setup in one place:
void boot_cpu_setup_pcid()
{
if (!boot_cpu_has(X86_FEATURE_PCID))
return;
/* PCID is not needed in early bringup, only enable it late: */
cr4_config.late_features |= X86_CR4_PCIDE;
/*
* PCIDE needs CR0.PG=1, which is not true in real mode.
* Ensure it is disabled in real mode:
*/
cr4_config.realmode_incompat_features |= X86_CR4_PCIDE;
}
and the real-mode entry becomes:
cr4_clear_bits(cr4_config.realmode_incompat_features);
What do folks think? Can we expand the 'mmu_cr4_features' to more than
MMU features?
On 3/11/2026 10:58 PM, Dave Hansen wrote:
> On 3/11/26 08:42, Nikunj A. Dadhania wrote:
> BTW, commit c7ad5ad297e tells some of the tales of woe around CR4 and
> boot vs. secondary CPUs:
>
> cpu_init() is weird: it's called rather late (after early
> identification and after most MMU state is initialized) on the
> boot CPU but is called extremely early (before identification)
> on secondary CPUs.
>
> This weirdness is still biting us today. CR4 pinning just made things
> worse (or at least harder to understand).
>
> I have the feeling we need to bite the bullet here and actually start
> thinking about this holistically. I _think_ 'mmu_cr4_features' is
> conceptually pretty close to what we need. It's just named wrong.
>
> /*
> * Current system-wide configuration information for CR4 register.
> * All of the bits in these feature masks are supported by the current
> * running CPU.
> */
> struct cr4_config {
...
>
> What do folks think? Can we expand the 'mmu_cr4_features' to more than
> MMU features?
I'll let you and the other x86 maintainers decide on the cr4_config
approach. However, I have two concerns for the immediate fix:
1) Back-porting complexity: The current issue affects kernels (6.9+)
where SEV-SNP guests fail to boot with FRED enabled. A simpler fix would
be easier to backport and verify across stable branches.
2) Scope and risk: The cr4_config refactoring touches core x86 boot paths
and would need careful analysis of all CR4 feature interactions
(PCID, FSGSBASE, SMEP, SMAP, etc.) across different boot scenarios
(boot CPU, secondary CPUs, real-mode transitions, kexec, etc.).
Would it make sense to take a two-phase approach:
Phase 1 (suitable for stable as well):
1) Universally set X86_CR4_FSGSBASE in cr4_init() and call cr4_init()
from trap_init() on the boot CPU
2) Disable CR pinning during secondary CPU bringup
3) Add #VC handler for FRED and use boot_ghcb during early boot
Phase 2:
- Build consensus among x86 maintainers on the cr4_config approach
- Implement the refactoring once the design is agreed upon
I'm happy to work on Phase 2 with guidance from the maintainers, but would
prefer to decouple it from the urgent boot failure fix.
Thanks,
Nikunj
On 3/12/2026 7:08 AM, Nikunj A. Dadhania wrote:
> 1) Universally set X86_CR4_FSGSBASE in cr4_init() and call cr4_init()
> from trap_init() on the boot CPU
cr4_init() seems like the wrong place to do this. I don't think this is
primarily a CR4 issue. Deferring CR4 pinning may have uncovered the
FSGSBASE issue.
But, essentially the difference lies in when we enable exception handling
related features on the BSP vs APs. It involves setting a few other things
than CR4 programming.
See:
/*
 * Setup everything needed to handle exceptions from the IDT, including
 * the IST exceptions which use paranoid_entry().
 */
void cpu_init_exception_handling(bool boot_cpu)
IIUC, anything that is needed to handle exceptions should be initialized
here. As FSGSBASE is used in the paranoid_entry() code, should its
enabling be moved to cpu_init_exception_handling()?
Note, we already have other things there for #VC handling:
/* GHCB needs to be setup to handle #VC. */
setup_ghcb();
> 2) Disable CR pinning during secondary CPU bringup
> 3) Add #VC handler for FRED and use boot_ghcb during early boot
On 3/12/2026 11:39 PM, Sohil Mehta wrote:
> On 3/12/2026 7:08 AM, Nikunj A. Dadhania wrote:
>
>> 1) Universally set X86_CR4_FSGSBASE in cr4_init() and call cr4_init()
>> from trap_init() on the boot CPU
>
>
> cr4_init() seems like the wrong place to do this. I don't think this is
> primarily a CR4 issue. Deferring CR4 pinning may have uncovered the
> FSGSBASE issue.
>
> But, essentially the difference lies in when we enable exception
> handling related features on the BSP vs APs. It involves setting a few
> other things than CR4 programming.
>
> See:
>
> /*
> * Setup everything needed to handle exceptions from the IDT, including
> the IST
> * exceptions which use paranoid_entry().
> */
> void cpu_init_exception_handling(bool boot_cpu)
>
> IIUC, anything that is needed to handle exceptions should be initialized
> here. As FSGSBASE is used in the paranoid_entry() code, should its
> enabling be moved to cpu_init_exception_handling()?
Good idea, thanks!
For the boot CPU, FRED is enabled via cpu_init_replace_early_idt(), and
cpu_init_exception_handling() is called later (before alternative patching).
FSGSBASE can be safely enabled in cpu_init_exception_handling():
  start_kernel()
    setup_arch()
      cpu_init_replace_early_idt()
        cpu_init_fred_exceptions()        <-- FRED enabled here
    ...
    trap_init()
      cpu_init_exception_handling(true)
        cr4_set_bits(X86_CR4_FSGSBASE);   <-- Enable FSGSBASE here
    ...
    arch_cpu_finalize_init()
      ...
      alternative_instructions()          <-- Patches code to use RDGSBASE/WRGSBASE
For secondary CPUs, FSGSBASE can be safely enabled before any exceptions
occur, and FRED is enabled immediately after:
  start_secondary()
    cr4_init()                            <-- Code already patched, CR4.FSGSBASE=0
    cpu_init_exception_handling(false)
      cr4_set_bits(X86_CR4_FSGSBASE);     <-- Enable FSGSBASE here
      if (!boot_cpu)
        cpu_init_fred_exceptions();       <-- FRED enabled here for secondary CPU
    ...
    cpu_init()
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 1c3261cae40c..bd35e98d648d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2047,12 +2047,6 @@ static void identify_cpu(struct cpuinfo_x86 *c)
setup_umip(c);
setup_lass(c);
- /* Enable FSGSBASE instructions if available. */
- if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
- cr4_set_bits(X86_CR4_FSGSBASE);
- elf_hwcap2 |= HWCAP2_FSGSBASE;
- }
-
/*
* The vendor-specific functions might have changed features.
* Now we do "generic changes."
@@ -2413,6 +2407,16 @@ void cpu_init_exception_handling(bool boot_cpu)
/* GHCB needs to be setup to handle #VC. */
setup_ghcb();
+ /*
+ * CPUs that support FSGSBASE may use RDGSBASE/WRGSBASE in
+ * paranoid_entry(). Enable the feature before any exceptions
+ * occur.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_FSGSBASE)) {
+ cr4_set_bits(X86_CR4_FSGSBASE);
+ elf_hwcap2 |= HWCAP2_FSGSBASE;
+ }
+
if (cpu_feature_enabled(X86_FEATURE_FRED)) {
/* The boot CPU has enabled FRED during early boot */
if (!boot_cpu)
Regards,
Nikunj
On 3/13/2026 1:35 AM, Nikunj A. Dadhania wrote:
>
> + /*
> + * CPUs that support FSGSBASE may use RDGSBASE/WRGSBASE in
> + * paranoid_entry(). Enable the feature before any exceptions
> + * occur.
> + */
To ensure this doesn't bite back again, it may be better to document the
details of how the entry code ends up shaped here (or somewhere else):
For APs, the exception entry code has already been patched by the time
they come up, which means those instructions will execute unconditionally
there. So, APs must set CR4.FSGSBASE as early as possible.
> + if (cpu_feature_enabled(X86_FEATURE_FSGSBASE)) {
> + cr4_set_bits(X86_CR4_FSGSBASE);
> + elf_hwcap2 |= HWCAP2_FSGSBASE;
> + }
> +
On 3/13/2026 1:35 AM, Nikunj A. Dadhania wrote:
> @@ -2413,6 +2407,16 @@ void cpu_init_exception_handling(bool boot_cpu)
> /* GHCB needs to be setup to handle #VC. */
> setup_ghcb();
>
> + /*
> + * CPUs that support FSGSBASE may use RDGSBASE/WRGSBASE in
> + * paranoid_entry(). Enable the feature before any exceptions
> + * occur.
> + */
> + if (cpu_feature_enabled(X86_FEATURE_FSGSBASE)) {
> + cr4_set_bits(X86_CR4_FSGSBASE);
> + elf_hwcap2 |= HWCAP2_FSGSBASE;
> + }
> +
There is already a lot happening in this function. If we end up going
this route, I would suggest moving this to a small wrapper function such
as setup_fsgsbase() or enable_fsgsbase().
cpu_init_exception_handling() doesn't seem to be in a hot-path and
cpu_feature_enabled(X86_FEATURE_FSGSBASE) is generally expected to be
true on modern CPUs.
> if (cpu_feature_enabled(X86_FEATURE_FRED)) {
> /* The boot CPU has enabled FRED during early boot */
> if (!boot_cpu)
>
> Regards,
> Nikunj
>
>
On Fri, Mar 13, 2026 at 11:05:10AM -0700, Sohil Mehta wrote:
> There is already a lot happening in this function. If we end up going
> this route, I would suggest moving this to a small wrapper function such
> as setup_fsgsbase() or enable_fsgsbase().
For what? For a three-liner, well-separated conditional?
Not really.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 3/12/26 07:08, Nikunj A. Dadhania wrote:
> 1) Back-porting complexity: The current issue affects kernels (6.9+)
> where SEV-SNP guests fail to boot with FRED enabled. A simpler fix would
> be easier to backport and verify across stable branches.
The simplest fix is to disable FRED on those kernels, fwiw.
On 3/12/2026 7:20 AM, Dave Hansen wrote:
> On 3/12/26 07:08, Nikunj A. Dadhania wrote:
>> 1) Back-porting complexity: The current issue affects kernels (6.9+)
>> where SEV-SNP guests fail to boot with FRED enabled. A simpler fix would
>> be easier to backport and verify across stable branches.
>
> The simplest fix is to disable FRED on those kernels, fwiw.
In addition to this,
On SEV systems, early exceptions appear to be expected in practice while
CR4.FSGSBASE=0. So, at the moment, it also looks safe and simple to
disable the feature until those entry paths are adjusted to tolerate that
case.
Currently, those entry paths are patched to use FSGSBASE instructions
regardless of the CR4 setting. That inflexibility appears to be what makes
it broken in the first place. I'll take a look and come back with
something reviewable.
On 3/16/26 13:27, Chang S. Bae wrote:
> On 3/12/2026 7:20 AM, Dave Hansen wrote:
>> On 3/12/26 07:08, Nikunj A. Dadhania wrote:
>>> 1) Back-porting complexity: The current issue affects kernels (6.9+)
>>> where SEV-SNP guests fail to boot with FRED enabled. A simpler fix
>>> would be easier to backport and verify across stable branches.
>>
>> The simplest fix is to disable FRED on those kernels, fwiw.
>
> In addition to this,
>
> On SEV systems, early exceptions appear to be expected in practice while
> CR4.FSGSBASE=0. So, at the moment, it also looks safe and simple to
> disable the feature until those entry paths are adjusted to tolerate
> that case.
Sure. FSGSBASE at entry is _purely_ a performance optimization. It seems
reasonable to say for simplicity that the early exception code should not
use FSGSBASE instructions.
> Currently, those entry paths are patched to use FSGSBASE instructions
> regardless of the CR4 setting. That inflexibility appears to make it
> broken in the first place. I'd take a look and come back with something
> reviewable.
Yup. But, just to be clear, the patching is done by the boot CPU before
the secondaries even come up. So the "late" exception handlers are
incompatible with the secondary CPU from the moment it comes up until the
moment it enables CR4.FSGSBASE.
Either we change how alternatives patching works, we use some other
exception code, or we try and get CR4.FSGSBASE established as early as
possible on the secondaries.
On 3/16/2026 2:43 PM, Dave Hansen wrote:
>
> Either we change how alternatives patching works, we use some other
> exception code, or we try and get CR4.FSGSBASE established as early as
> possible on the secondaries.
It looks like the last option was sorted out already. Having played out
the other options a little bit, deferring patching looked a bit funky,
and either macrofying the entry code or referencing XCR0 on the fly
appeared a bit overkill. So, yes, nothing stands out over that choice.
On 3/17/2026 10:04 AM, Chang S. Bae wrote:
>
> referencing XCR0 on
Oops, CR4 apparently.
On 3/17/2026 3:13 AM, Dave Hansen wrote:
> Either we change how alternatives patching works, we use some other
> exception code, or we try and get CR4.FSGSBASE established as early as
> possible on the secondaries.
Right, I moved FSGSBASE enablement to cpu_init_exception_handling() to
establish it early:
https://lore.kernel.org/all/1439b4f4-ff0c-4b6a-ac86-5c0da2d26cf5@amd.com
This:
1) Enables FSGSBASE early for secondary CPUs before any exceptions occur
2) Consolidates initialization for both boot and secondary CPUs in one place
3) Eliminates the double-enable issue as in cr4_init()/identify_cpu()
Does this approach look good to you?
Regards,
Nikunj
On 3/16/26 21:12, Nikunj A. Dadhania wrote:
> On 3/17/2026 3:13 AM, Dave Hansen wrote:
>> Either we change how alternatives patching works, we use some other
>> exception code, or we try and get CR4.FSGSBASE established as early as
>> possible on the secondaries.
> Right, I moved FSGSBASE enablement to cpu_init_exception_handling() to
> establish it early:
>
> https://lore.kernel.org/all/1439b4f4-ff0c-4b6a-ac86-5c0da2d26cf5@amd.com
>
> This:
> 1) Enables FSGSBASE early for secondary CPUs before any exceptions occur
> 2) Consolidates initialization for both boot and secondary CPUs in one place
> 3) Eliminates the double-enable issue as in cr4_init()/identify_cpu()
>
> Does this approach look good to you?
After sleeping on it, yes, this looks good to me. Please resend this with
a full changelog and so forth, and I think we can move forward with it.
On Tue, Mar 17, 2026 at 08:31:06AM -0700, Dave Hansen wrote:
> Please resend this with a full changelog and so forth, and I think we can
> move forward with it.
Patch works on 3 machines here, should be good. I'll queue the two once I have
them and we can hammer on them some more.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 3/17/2026 10:24 PM, Borislav Petkov wrote:
> On Tue, Mar 17, 2026 at 08:31:06AM -0700, Dave Hansen wrote:
>> Please resend this with a full changelog and so forth, and I think we can
>> move forward with it.
>
> Patch works on 3 machines here, should be good. I'll queue the two once I have
> them and we can hammer on them some more.
Thank you Dave and Boris for the review and the quick turnaround. I've
posted the v3 here:
https://lore.kernel.org/all/20260318075654.1792916-1-nikunj@amd.com/
Regards,
Nikunj
On Tue, Mar 17, 2026 at 09:42:58AM +0530, Nikunj A. Dadhania wrote:
>
>
> On 3/17/2026 3:13 AM, Dave Hansen wrote:
> > Either we change how alternatives patching works, we use some other
> > exception code, or we try and get CR4.FSGSBASE established as early as
> > possible on the secondaries.
>
> Right, I moved FSGSBASE enablement to cpu_init_exception_handling() to
> establish it early:
>
> https://lore.kernel.org/all/1439b4f4-ff0c-4b6a-ac86-5c0da2d26cf5@amd.com
So this thing does the following here.
CPU0 goes first, the APs follow.
I think that should be good as a stable fix, unless I'm missing something.
We can then do our rework on top, unifying the CR4 init and pinning, but
that will be future work which doesn't need stable anyway...
[ 0.132839] cpu_init_exception_handling: CPU0: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU2, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU2: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU4, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU4: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU6, CR4: 0x3100f0
[ 0.240204] cr4_init: CPU8, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU8: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU10, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU10: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU12, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU12: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU14, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU14: Set X86_CR4_FSGSBASE
[ 0.240204] cpu_init_exception_handling: CPU6: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU1, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU1: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU3, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU3: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU5, CR4: 0x3100f0
[ 0.240204] cr4_init: CPU7, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU7: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU9, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU9: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU11, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU11: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU13, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU13: Set X86_CR4_FSGSBASE
[ 0.240204] cr4_init: CPU15, CR4: 0x3100f0
[ 0.240204] cpu_init_exception_handling: CPU5: Set X86_CR4_FSGSBASE
[ 0.240204] cpu_init_exception_handling: CPU15: Set X86_CR4_FSGSBASE
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On 3/12/2026 7:50 PM, Dave Hansen wrote:
> On 3/12/26 07:08, Nikunj A. Dadhania wrote:
>> 1) Back-porting complexity: The current issue affects kernels (6.9+)
>> where SEV-SNP guests fail to boot with FRED enabled. A simpler fix would
>> be easier to backport and verify across stable branches.
>
> The simplest fix is to disable FRED on those kernels, fwiw.
That would work, but disabling FRED means LTS users will not be able to
use FRED with confidential computing; that's not really a fix.
This isn't just SEV-SNP. Xin Li confirmed Intel TDX has the same issue:
FRED is enabled before exception handling is ready, and #VC/#VE can't be
handled on secondary CPUs.
The initialization order is wrong for both SEV-SNP and TDX. The fixes are
small and targeted: just ensuring FRED state is set up before it's needed.
Regards,
Nikunj
On 3/12/26 07:53, Nikunj A. Dadhania wrote:
>> The simplest fix is to disable FRED on those kernels, fwiw.
> That would work, but disabling FRED means LTS users will not be able to
> use FRED with confidential computing; that's not really a fix.
Why not?
Is there something out there that *NEEDS* FRED to function?
> This isn't just SEV-SNP. Xin Li confirmed Intel TDX has the same issue:
> FRED is enabled before exception handling is ready, and #VC/#VE can't be
> handled on secondary CPUs.
>
> The initialization order is wrong for both SEV-SNP and TDX. The fixes
> are small and targeted: just ensuring FRED state is set up before it's
> needed.
Sure, it's a theoretical problem for TDX and a practical, demonstrable one
for SEV-SNP.
On Thu, 12 Mar 2026 08:02:53 -0700 Dave Hansen <dave.hansen@intel.com> wrote:
> On 3/12/26 07:53, Nikunj A. Dadhania wrote:
>>> The simplest fix is to disable FRED on those kernels, fwiw.
>> That would work, but disabling FRED means LTS users will not be able to
>> use FRED with confidential computing; that's not really a fix.
>
> Why not?
>
> Is there something out there that *NEEDS* FRED to function?
I didn't think there was any real hardware that actually supports FRED.
By the time that exists there might be a sane fix - or a newer LTS kernel.
David
On 3/11/2026 10:58 PM, Dave Hansen wrote:
> On 3/11/26 08:42, Nikunj A. Dadhania wrote:
>> -void cr4_init(void)
>> +void cr4_init(bool boot_cpu)
>
> I think the "pass a bool to a function" pattern is really not great.
The reason I introduced it was to avoid double-enabling PCIDE on the boot
CPU. The boot CPU already enables PCIDE via setup_pcid().
> The problem isn't actually setting CR4 twice. The problem is having two
> different code paths to set it. Doing the 'bool' doesn't eliminate the
> code path, it just makes the whole thing more complicated to reason
> about and further bifurcates the boot and secondary CPU bringup paths.
>
> We want to unify those, not bifurcate them.
>
> Why don't we just universally set X86_CR4_FSGSBASE in cr4_init()?
Right, this approach becomes much simpler without the boot_cpu flag:
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 1c3261cae40c..6c0493eaf813 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -502,6 +502,17 @@ void cr4_init(void)
if (boot_cpu_has(X86_FEATURE_PCID))
cr4 |= X86_CR4_PCIDE;
+
+ /*
+ * CPUs that support FSGSBASE may use RDGSBASE/WRGSBASE in
+ * paranoid_entry(). Enable the feature before any exceptions
+ * occur.
+ */
+ if (boot_cpu_has(X86_FEATURE_FSGSBASE)) {
+ cr4 |= X86_CR4_FSGSBASE;
+ elf_hwcap2 |= HWCAP2_FSGSBASE;
+ }
+
if (static_branch_likely(&cr_pinning))
cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
@@ -2047,12 +2058,6 @@ static void identify_cpu(struct cpuinfo_x86 *c)
setup_umip(c);
setup_lass(c);
- /* Enable FSGSBASE instructions if available. */
- if (cpu_has(c, X86_FEATURE_FSGSBASE)) {
- cr4_set_bits(X86_CR4_FSGSBASE);
- elf_hwcap2 |= HWCAP2_FSGSBASE;
- }
-
/*
* The vendor-specific functions might have changed features.
* Now we do "generic changes."
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4dbff8ef9b1c..0a2b129bda03 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -1685,6 +1685,13 @@ void __init trap_init(void)
/* Init GHCB memory pages when running as an SEV-ES guest */
sev_es_init_vc_handling();
+ /*
+ * Initialize CR4 early, before cpu_init(). This ensures features like
+ * FSGSBASE are enabled before exception handlers run, avoiding double
+ * initialization later.
+ */
+ cr4_init();
+
/* Initialize TSS before setting up traps so ISTs work */
cpu_init_exception_handling(true);