arch/x86/include/asm/cpufeature.h | 1 + arch/x86/kernel/cpu/common.c | 4 ++++ arch/x86/kernel/cpu/cpuid-deps.c | 31 +++++++++++++++++++++++++++++++ 3 files changed, 36 insertions(+)
Currently, the cpuid_deps[] table is only exercised when a particular
feature gets explicitly disabled and clear_cpu_cap() is called. However,
some of these listed dependencies might already be missing during boot.
These types of errors shouldn't generally happen in production
environments but they could sometimes sneak through, especially when VMs
and Kconfigs are in the mix. Also, the kernel might introduce artificial
dependencies between unrelated features such as making LAM depend on
LASS.
Unexpected failures can occur when the kernel tries to use such a
feature. Rather than debug such scenarios, it would be better to disable
the feature upfront.
Add a simple boot time scan of the cpuid_deps[] table and disable any
feature with unmet dependencies. do_clear_cpu_cap() makes sure that the
dependency tree reaches a stable state in the end.
Also, add a print to the kernel log which might be useful to users if a
feature gets unexpectedly disabled. Features are typically represented
through unsigned integers within the kernel though some of them have
user friendly names if they are exposed via cpuinfo. Show the
friendlier name if available otherwise display the X86_FEATURE_*
numerals to make it easier to identify the feature.
Suggested-by: Tony Luck <tony.luck@intel.com> # Kernel log print
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v4: Update the log level to pr_info() (Ingo)
Update the char buffer size to 16 (0day warning)
Dropped Dave's review tag since the code has changed a bit more
than usual.
v3: https://lore.kernel.org/lkml/20241207004126.2054658-1-sohil.mehta@intel.com/
Picked up the review tag.
Reworded the commit message.
v2: Use cpu_has() instead of boot_cpu_has() (Sean)
RFC-v1: New patch.
---
arch/x86/include/asm/cpufeature.h | 1 +
arch/x86/kernel/cpu/common.c | 4 ++++
arch/x86/kernel/cpu/cpuid-deps.c | 31 +++++++++++++++++++++++++++++++
3 files changed, 36 insertions(+)
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 0b9611da6c53..8821336a6c73 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -148,6 +148,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
extern void setup_clear_cpu_cap(unsigned int bit);
extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
+void filter_feature_dependencies(struct cpuinfo_x86 *c);
#define setup_force_cpu_cap(bit) do { \
\
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a5c28975c608..bd27cf5974d4 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1608,6 +1608,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
c->cpu_index = 0;
filter_cpuid_features(c, false);
+ filter_feature_dependencies(c);
if (this_cpu->c_bsp_init)
this_cpu->c_bsp_init(c);
@@ -1862,6 +1863,9 @@ static void identify_cpu(struct cpuinfo_x86 *c)
/* Filter out anything that depends on CPUID levels we don't have */
filter_cpuid_features(c, true);
+ /* Filter out features that don't have their dependencies met */
+ filter_feature_dependencies(c);
+
/* If the model name is still unset, do table lookup. */
if (!c->x86_model_id[0]) {
const char *p;
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 8bd84114c2d9..dccb547cbf0d 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -146,3 +146,34 @@ void setup_clear_cpu_cap(unsigned int feature)
{
do_clear_cpu_cap(NULL, feature);
}
+
+/*
+ * Return the feature "name" if available otherwise return
+ * the X86_FEATURE_* numerals to make it easier to identify
+ * the feature.
+ */
+static const char *x86_feature_name(unsigned int feature, char *buf)
+{
+ if (x86_cap_flags[feature])
+ return x86_cap_flags[feature];
+
+ snprintf(buf, 16, "%d*32+%2d", feature / 32, feature % 32);
+
+ return buf;
+}
+
+void filter_feature_dependencies(struct cpuinfo_x86 *c)
+{
+ char feature_buf[16], depends_buf[16];
+ const struct cpuid_dep *d;
+
+ for (d = cpuid_deps; d->feature; d++) {
+ if (cpu_has(c, d->feature) && !cpu_has(c, d->depends)) {
+ pr_info("CPU%d: Disabling feature %s due to missing feature %s\n",
+ smp_processor_id(),
+ x86_feature_name(d->feature, feature_buf),
+ x86_feature_name(d->depends, depends_buf));
+ do_clear_cpu_cap(c, d->feature);
+ }
+ }
+}
--
2.43.0
* Sohil Mehta <sohil.mehta@intel.com> wrote:
> +
> +/*
> + * Return the feature "name" if available otherwise return
> + * the X86_FEATURE_* numerals to make it easier to identify
> + * the feature.
> + */
> +static const char *x86_feature_name(unsigned int feature, char *buf)
> +{
> + if (x86_cap_flags[feature])
> + return x86_cap_flags[feature];
> +
> + snprintf(buf, 16, "%d*32+%2d", feature / 32, feature % 32);
> +
> + return buf;
> +}
> +
> +void filter_feature_dependencies(struct cpuinfo_x86 *c)
> +{
> + char feature_buf[16], depends_buf[16];
> + const struct cpuid_dep *d;
> +
> + for (d = cpuid_deps; d->feature; d++) {
> + if (cpu_has(c, d->feature) && !cpu_has(c, d->depends)) {
> + pr_info("CPU%d: Disabling feature %s due to missing feature %s\n",
> + smp_processor_id(),
> + x86_feature_name(d->feature, feature_buf),
> + x86_feature_name(d->depends, depends_buf));
> + do_clear_cpu_cap(c, d->feature);
> + }
> + }
So let's not disable any CPU features actively for the time being, how
about issuing a pr_warn() only about the dependency violation?
I think the main problem is when these problems slip through 100%
unnoticed.
Thanks,
Ingo
On 2/27/2025 10:46 AM, Ingo Molnar wrote:
>> +void filter_feature_dependencies(struct cpuinfo_x86 *c)
>> +{
>> + char feature_buf[16], depends_buf[16];
>> + const struct cpuid_dep *d;
>> +
>> + for (d = cpuid_deps; d->feature; d++) {
>> + if (cpu_has(c, d->feature) && !cpu_has(c, d->depends)) {
>> + pr_info("CPU%d: Disabling feature %s due to missing feature %s\n",
>> + smp_processor_id(),
>> + x86_feature_name(d->feature, feature_buf),
>> + x86_feature_name(d->depends, depends_buf));
>> + do_clear_cpu_cap(c, d->feature);
>> + }
>> + }
>
> So let's not disable any CPU features actively for the time being, how
> about issuing a pr_warn() only about the dependency violation?
>
> I think the main problem is when these problems slip through 100%
> unnoticed.
>
I guess you are right. Highlighting the issue is the main part. Beyond
that we can leave the system behavior as-is for now.
Most of the listed dependencies seem to be spec-driven, though the
kernel might create arbitrary dependencies for security reasons such as
making LAM depend on LASS[1]. I think those can probably be handled on a
case by case basis during specific feature enabling.
For the new pr_warn(), I am considering printing it only once per
feature instead of printing it on every CPU (which could be 100s).
But that would mean tracking it in a global feature_warn bitmap.
DECLARE_BITMAP(feature_warn, MAX_FEATURE_BITS);
Another option would be run the scan only on the BSP. But that could
cause some issues to be missed[2].
I am wondering if there is a better way to do this?
-Sohil
[1]:
https://lore.kernel.org/all/20241028160917.1380714-15-alexander.shishkin@linux.intel.com/
[2]:
https://lore.kernel.org/lkml/ZsfJUT0AWFhoONWf@google.com/#:~:text=divergent%20features%20from%20the%20boot%20CPU
* Sohil Mehta <sohil.mehta@intel.com> wrote:
> On 2/27/2025 10:46 AM, Ingo Molnar wrote:
>
> >> +void filter_feature_dependencies(struct cpuinfo_x86 *c)
> >> +{
> >> + char feature_buf[16], depends_buf[16];
> >> + const struct cpuid_dep *d;
> >> +
> >> + for (d = cpuid_deps; d->feature; d++) {
> >> + if (cpu_has(c, d->feature) && !cpu_has(c, d->depends)) {
> >> + pr_info("CPU%d: Disabling feature %s due to missing feature %s\n",
> >> + smp_processor_id(),
> >> + x86_feature_name(d->feature, feature_buf),
> >> + x86_feature_name(d->depends, depends_buf));
> >> + do_clear_cpu_cap(c, d->feature);
> >> + }
> >> + }
> >
> > So let's not disable any CPU features actively for the time being, how
> > about issuing a pr_warn() only about the dependency violation?
> >
> > I think the main problem is when these problems slip through 100%
> > unnoticed.
> >
>
> I guess you are right. Highlighting the issue is the main part. Beyond
> that we can leave the system behavior as-is for now.
>
> Most of the listed dependencies seem to be spec-driven, though the
> kernel might create arbitrary dependencies for security reasons such as
> making LAM depend on LASS[1]. I think those can probably be handled on a
> case by case basis during specific feature enabling.
>
> For the new pr_warn(), I am considering printing it only once per
> feature instead of printing it on every CPU (which could be 100s).
Yeah.
> But that would mean tracking it in a global feature_warn bitmap.
>
> DECLARE_BITMAP(feature_warn, MAX_FEATURE_BITS);
>
> Another option would be run the scan only on the BSP. But that could
> cause some issues to be missed[2].
Just use pr_warn_once().
Yes, this might cause subsequent CPU feature dependency problems to
stay unreported, but the hope here is that these are rare and get
fixed, right?
Thanks,
Ingo
© 2016 - 2025 Red Hat, Inc.