From: Ard Biesheuvel <ardb@kernel.org>
cpu_feature_enabled() uses a ternary alternative, where the late variant
is based on code patching and the early variant accesses the capability
field in boot_cpu_data directly.
This allows cpu_feature_enabled() to be called quite early, but it still
requires that the CPU feature detection code runs before being able to
rely on the return value of cpu_feature_enabled().
This is a problem for the implementation of pgtable_l5_enabled(), which
is based on cpu_feature_enabled(X86_FEATURE_5LEVEL_PAGING), and may be
called extremely early. Currently, there is a hacky workaround where
some source files that may execute before (but also after) CPU feature
detection have a different version of pgtable_l5_enabled(), based on the
USE_EARLY_PGTABLE_L5 preprocessor macro.
Instead, let's make it possible to set CPU features arbitrarily early, so
that the X86_FEATURE_5LEVEL_PAGING capability can be set before even
entering C code.
This involves relying on static initialization of boot_cpu_data and the
cpu_caps_set/cpu_caps_cleared arrays, so they all need to reside in
.data. This ensures that they won't be cleared along with the rest of
BSS.
Note that forcing a capability involves setting it in both
boot_cpu_data.x86_capability[] and cpu_caps_set[].
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/kernel/cpu/common.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 6f7827015834..f6f206743d6a 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -704,8 +704,8 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c)
}
/* Aligned to unsigned long to avoid split lock in atomic bitmap ops */
-__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
-__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
+__u32 __read_mostly cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
+__u32 __read_mostly cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
#ifdef CONFIG_X86_32
/* The 32-bit entry code needs to find cpu_entry_area. */
@@ -1708,9 +1708,6 @@ static void __init cpu_parse_early_param(void)
*/
static void __init early_identify_cpu(struct cpuinfo_x86 *c)
{
- memset(&c->x86_capability, 0, sizeof(c->x86_capability));
- c->extended_cpuid_level = 0;
-
if (!have_cpuid_p())
identify_cpu_without_cpuid(c);
@@ -1922,7 +1919,6 @@ static void identify_cpu(struct cpuinfo_x86 *c)
c->x86_virt_bits = 32;
#endif
c->x86_cache_alignment = c->x86_clflush_size;
- memset(&c->x86_capability, 0, sizeof(c->x86_capability));
#ifdef CONFIG_X86_VMX_FEATURE_NAMES
memset(&c->vmx_capability, 0, sizeof(c->vmx_capability));
#endif
@@ -2084,6 +2080,7 @@ void identify_secondary_cpu(unsigned int cpu)
*c = boot_cpu_data;
c->cpu_index = cpu;
+ memset(&c->x86_capability, 0, sizeof(c->x86_capability));
identify_cpu(c);
#ifdef CONFIG_X86_32
enable_sep_cpu();
--
2.49.0.1101.gccaa498523-goog
* Ard Biesheuvel <ardb+git@google.com> wrote:

> From: Ard Biesheuvel <ardb@kernel.org>
>
> [...]
>
> /* Aligned to unsigned long to avoid split lock in atomic bitmap ops */
> -__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> -__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> +__u32 __read_mostly cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> +__u32 __read_mostly cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));

This change is not mentioned in the changelog AFAICS, but it should be
in a separate patch anyway.

Thanks,

	Ingo
On Thu, May 15, 2025 at 08:56:59AM +0200, Ingo Molnar wrote:
>
> * Ard Biesheuvel <ardb+git@google.com> wrote:
>
> > [...]
> >
> > -__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > -__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > +__u32 __read_mostly cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > +__u32 __read_mostly cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
>
> This change is not mentioned in the changelog AFAICS, but it should be
> in a separate patch anyway.

And why not __ro_after_init?

-- 
Kiryl Shutsemau / Kirill A. Shutemov
* Kirill A. Shutemov <kirill@shutemov.name> wrote:

> On Thu, May 15, 2025 at 08:56:59AM +0200, Ingo Molnar wrote:
>
> > [...]
> >
> > This change is not mentioned in the changelog AFAICS, but it should be
> > in a separate patch anyway.
>
> And why not __ro_after_init?

That's patch #7 :-)

I got confused about that too.

Patch #2 should not touch this line, and patch #7 should simply
introduce __ro_after_init, and we are good I think.

Thanks,

	Ingo
On Thu, 15 May 2025 at 09:18, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> > [...]
> >
> > And why not __ro_after_init?
>
> That's patch #7 :-)
>
> I got confused about that too.
>
> Patch #2 should not touch this line, and patch #7 should simply
> introduce __ro_after_init, and we are good I think.

This change is needed because it prevents these arrays from being
cleared along with the rest of BSS, which occurs after the startup
code executes.

So conceptually, moving these out of BSS is similar to dropping the
memset()s, and therefore this belongs in the same patch.

However, you are correct that moving these into __ro_after_init
achieves the same thing, so I will just reorder that patch with this
one, and clarify in the commit log that we are relying on the fact
that __ro_after_init is not cleared at boot time.
* Ard Biesheuvel <ardb@kernel.org> wrote:

> [...]
>
> This change is needed because it prevents these arrays from being
> cleared along with the rest of BSS, which occurs after the startup
> code executes.

I see, it's a correctness & bisectability aspect, even if it's
obsoleted later on. Good point, objection withdrawn.

> So conceptually, moving these out of BSS is similar to dropping the
> memset()s, and therefore this belongs in the same patch.
>
> However, you are correct that moving these into __ro_after_init
> achieves the same thing, so I will just reorder that patch with this
> one, and clarify in the commit log that we are relying on the fact
> that __ro_after_init is not cleared at boot time.

That works fine for me too - whichever order you prefer.

Thanks,

	Ingo
* Ingo Molnar <mingo@kernel.org> wrote:

> > [...]
> >
> > -__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > -__u32 cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > +__u32 __read_mostly cpu_caps_cleared[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
> > +__u32 __read_mostly cpu_caps_set[NCAPINTS + NBUGINTS] __aligned(sizeof(unsigned long));
>
> This change is not mentioned in the changelog AFAICS, but it should be
> in a separate patch anyway.

So patch #7 makes this __ro_after_init. I think we should not
introduce the __read_mostly attribute in patch #2, and introduce
__ro_after_init in patch #7 - with the same end result.

Thanks,

	Ingo