[PATCH v2] x86/cpufeature: Add feature dependency checks

Sohil Mehta posted 1 patch 3 weeks, 3 days ago
arch/x86/include/asm/cpufeature.h |  1 +
arch/x86/kernel/cpu/common.c      |  4 ++++
arch/x86/kernel/cpu/cpuid-deps.c  | 10 ++++++++++
3 files changed, 15 insertions(+)
[PATCH v2] x86/cpufeature: Add feature dependency checks
Posted by Sohil Mehta 3 weeks, 3 days ago
Currently, the cpuid-deps[] table is only exercised when a particular
feature gets explicitly disabled and clear_cpu_cap() is called. However,
some of these listed dependencies might already be missing during boot.
These types of errors shouldn't generally happen in production
environments but they could sometimes sneak through, especially when
VMs and Kconfigs are in the mix.

Unexpected failures can occur when the kernel tries to use such a
feature. Rather than debug such scenarios, it would be better to
disable the feature upfront.

Add a simple boot time check for missing feature dependencies and
disable any feature whose dependencies are not met.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
I considered merging filter_feature_dependencies() and filter_cpuid_features()
but they operate on slightly different structures namely, struct
cpuid_dependent_feature and struct cpuid_dep. Operating on them in the same
function didn't seem worthwhile. Please let me know if someone feels otherwise.

v2: Use cpu_has() instead of boot_cpu_has() (Sean)

RFC-v1: https://lore.kernel.org/lkml/20240822202226.862398-1-sohil.mehta@intel.com/
---

 arch/x86/include/asm/cpufeature.h |  1 +
 arch/x86/kernel/cpu/common.c      |  4 ++++
 arch/x86/kernel/cpu/cpuid-deps.c  | 10 ++++++++++
 3 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 0b9611da6c53..8821336a6c73 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -148,6 +148,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
 
 extern void setup_clear_cpu_cap(unsigned int bit);
 extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
+void filter_feature_dependencies(struct cpuinfo_x86 *c);
 
 #define setup_force_cpu_cap(bit) do {			\
 							\
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f1040cb64841..b7491e57a74e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1610,6 +1610,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 		c->cpu_index = 0;
 		filter_cpuid_features(c, false);
+		filter_feature_dependencies(c);
 
 		if (this_cpu->c_bsp_init)
 			this_cpu->c_bsp_init(c);
@@ -1862,6 +1863,9 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Filter out anything that depends on CPUID levels we don't have */
 	filter_cpuid_features(c, true);
 
+	/* Filter out features that don't have their dependencies met */
+	filter_feature_dependencies(c);
+
 	/* If the model name is still unset, do table lookup. */
 	if (!c->x86_model_id[0]) {
 		const char *p;
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 8bd84114c2d9..8bea5c5e4fd2 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -146,3 +146,13 @@ void setup_clear_cpu_cap(unsigned int feature)
 {
 	do_clear_cpu_cap(NULL, feature);
 }
+
+void filter_feature_dependencies(struct cpuinfo_x86 *c)
+{
+	const struct cpuid_dep *d;
+
+	for (d = cpuid_deps; d->feature; d++) {
+		if (cpu_has(c, d->feature) && !cpu_has(c, d->depends))
+			do_clear_cpu_cap(c, d->feature);
+	}
+}
-- 
2.34.1
Re: [PATCH v2] x86/cpufeature: Add feature dependency checks
Posted by Dave Hansen 3 weeks, 3 days ago
On 10/30/24 16:31, Sohil Mehta wrote:
> Unexpected failures can occur when the kernel tries to use such a
> feature. Rather than debug such scenarios, it would be better to
> disable the feature upfront.

It Looks OK and sane to me.  It's nice that it inherits the "Loop until
we get a stable state" from do_clear_cpu_cap().

Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
RE: [PATCH v2] x86/cpufeature: Add feature dependency checks
Posted by Luck, Tony 3 weeks, 3 days ago
> +void filter_feature_dependencies(struct cpuinfo_x86 *c)
> +{
> +     const struct cpuid_dep *d;
> +
> +     for (d = cpuid_deps; d->feature; d++) {
> +             if (cpu_has(c, d->feature) && !cpu_has(c, d->depends))
> +                     do_clear_cpu_cap(c, d->feature);
> +     }
> +}

The dependency check found something very wrong. Should there be
a pr_warn() to give some clue that Linux papered over this problem?

-Tony
RE: [PATCH v2] x86/cpufeature: Add feature dependency checks
Posted by H. Peter Anvin 3 weeks, 3 days ago
On October 30, 2024 4:44:59 PM PDT, "Luck, Tony" <tony.luck@intel.com> wrote:
>> +void filter_feature_dependencies(struct cpuinfo_x86 *c)
>> +{
>> +     const struct cpuid_dep *d;
>> +
>> +     for (d = cpuid_deps; d->feature; d++) {
>> +             if (cpu_has(c, d->feature) && !cpu_has(c, d->depends))
>> +                     do_clear_cpu_cap(c, d->feature);
>> +     }
>> +}
>
>The dependency check found something very wrong. Should there be
>a pr_warn() to give some clue that Linux papered over this problem?
>
>-Tony
>

Not necessarily. Linux is free to impose restrictions that don't necessarily match the hardware thermometers. For example, letting LAM depend on LASS.
Re: [PATCH v2] x86/cpufeature: Add feature dependency checks
Posted by Sohil Mehta 3 weeks, 3 days ago
On 10/30/2024 4:44 PM, Luck, Tony wrote:
>> +void filter_feature_dependencies(struct cpuinfo_x86 *c)
>> +{
>> +     const struct cpuid_dep *d;
>> +
>> +     for (d = cpuid_deps; d->feature; d++) {
>> +             if (cpu_has(c, d->feature) && !cpu_has(c, d->depends))
>> +                     do_clear_cpu_cap(c, d->feature);
>> +     }
>> +}
> 
> The dependency check found something very wrong. Should there be
> a pr_warn() to give some clue that Linux papered over this problem?
> 

Not sure if this necessitates a warning. A missing feature may not be a
problem for the user. I am treating a disabled feature due to the
original CPUID enumeration not being present vs due to a dependency not
being met as the same.

System and kernel developers would definitely care if an expected
feature is missing. Maybe a pr_debug()? Want to avoid unnecessary
escalations and bug reports.

Sohil