[PATCH v5] x86/cpufeature: Warn about unmet feature dependencies

Sohil Mehta posted 1 patch 11 months, 1 week ago
There is a newer version of this series
arch/x86/include/asm/cpufeature.h |  1 +
arch/x86/kernel/cpu/common.c      |  4 ++++
arch/x86/kernel/cpu/cpuid-deps.c  | 35 +++++++++++++++++++++++++++++++
3 files changed, 40 insertions(+)
[PATCH v5] x86/cpufeature: Warn about unmet feature dependencies
Posted by Sohil Mehta 11 months, 1 week ago
Currently, the cpuid_deps[] table is only exercised when a particular
feature gets explicitly disabled and clear_cpu_cap() is called. However,
some of these listed dependencies might already be missing during boot.

These types of errors shouldn't generally happen in production
environments but they could sometimes sneak through, especially when VMs
and Kconfigs are in the mix. Also, the kernel might introduce artificial
dependencies between unrelated features such as making LAM depend on
LASS.

Unexpected failures can occur when the kernel tries to use such a
feature. Add a simple boot time scan of the cpuid_deps[] table to detect
the missing dependencies. One option is to disable all of such features
during boot but that may cause regressions in existing systems. For now,
just warn about the missing dependencies to create awareness.

As a trade-off between spamming the kernel log and keeping track of all
the features that have been warned about, only warn about the first
missing dependency. Any subsequent unmet dependency will only be logged
after the first one has been resolved.

Features are typically represented through unsigned integers within the
kernel though some of them have user friendly names if they are exposed
via cpuinfo.  Show the friendlier name if available otherwise display
the X86_FEATURE_* numerals to make it easier to identify the feature.

Suggested-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
v5: Don't disable features with unmet dependencies (Ingo)
    Use pr_warn_once() for the warning (Ingo)

v4: https://lore.kernel.org/lkml/20241210224037.3052555-1-sohil.mehta@intel.com/
    Update the log level to pr_info() (Ingo)
    Update the char buffer size to 16 (0day warning)
    Dropped Dave's review tag since the code has changed a bit more
    than usual.

v3: Picked up the review tag.
    Reworded the commit message.

v2: Use cpu_has() instead of boot_cpu_has() (Sean)
---
 arch/x86/include/asm/cpufeature.h |  1 +
 arch/x86/kernel/cpu/common.c      |  4 ++++
 arch/x86/kernel/cpu/cpuid-deps.c  | 35 +++++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index de1ad09fe8d7..1bb30e82ea77 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -149,6 +149,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
 
 extern void setup_clear_cpu_cap(unsigned int bit);
 extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
+void scan_feature_dependencies(struct cpuinfo_x86 *c);
 
 #define setup_force_cpu_cap(bit) do {			\
 							\
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 7cce91b19fb2..52111d97c8ae 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1610,6 +1610,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 		c->cpu_index = 0;
 		filter_cpuid_features(c, false);
+		scan_feature_dependencies(c);
 
 		if (this_cpu->c_bsp_init)
 			this_cpu->c_bsp_init(c);
@@ -1870,6 +1871,9 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Filter out anything that depends on CPUID levels we don't have */
 	filter_cpuid_features(c, true);
 
+	/* Scan for unmet dependencies based on the CPUID dependency table */
+	scan_feature_dependencies(c);
+
 	/* If the model name is still unset, do table lookup. */
 	if (!c->x86_model_id[0]) {
 		const char *p;
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index df838e3bdbe0..1b02dbd43cf4 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -147,3 +147,38 @@ void setup_clear_cpu_cap(unsigned int feature)
 {
 	do_clear_cpu_cap(NULL, feature);
 }
+
+/*
+ * Return the feature "name" if available otherwise return
+ * the X86_FEATURE_* numerals to make it easier to identify
+ * the feature.
+ */
+static const char *x86_feature_name(unsigned int feature, char *buf)
+{
+	if (x86_cap_flags[feature])
+		return x86_cap_flags[feature];
+
+	snprintf(buf, 16, "%d*32+%2d", feature / 32, feature % 32);
+
+	return buf;
+}
+
+void scan_feature_dependencies(struct cpuinfo_x86 *c)
+{
+	char feature_buf[16], depends_buf[16];
+	const struct cpuid_dep *d;
+
+	for (d = cpuid_deps; d->feature; d++) {
+		if (cpu_has(c, d->feature) && !cpu_has(c, d->depends)) {
+			/*
+			 * Only warn about the first unmet dependency on the
+			 * first CPU where it is encountered to avoid spamming
+			 * the kernel log.
+			 */
+			pr_warn_once("CPU%d: feature:%s may not work properly without feature:%s\n",
+				     smp_processor_id(),
+				     x86_feature_name(d->feature, feature_buf),
+				     x86_feature_name(d->depends, depends_buf));
+		}
+	}
+}
-- 
2.43.0
Re: [PATCH v5] x86/cpufeature: Warn about unmet feature dependencies
Posted by Ingo Molnar 11 months, 1 week ago
* Sohil Mehta <sohil.mehta@intel.com> wrote:

> Currently, the cpuid_deps[] table is only exercised when a particular
> feature gets explicitly disabled and clear_cpu_cap() is called. However,
> some of these listed dependencies might already be missing during boot.
> 
> These types of errors shouldn't generally happen in production
> environments but they could sometimes sneak through, especially when VMs
> and Kconfigs are in the mix. Also, the kernel might introduce artificial
> dependencies between unrelated features such as making LAM depend on
> LASS.
> 
> Unexpected failures can occur when the kernel tries to use such a
> feature. Add a simple boot time scan of the cpuid_deps[] table to detect
> the missing dependencies. One option is to disable all of such features
> during boot but that may cause regressions in existing systems. For now,
> just warn about the missing dependencies to create awareness.
> 
> As a trade-off between spamming the kernel log and keeping track of all
> the features that have been warned about, only warn about the first
> missing dependency. Any subsequent unmet dependency will only be logged
> after the first one has been resolved.
> 
> Features are typically represented through unsigned integers within the
> kernel though some of them have user friendly names if they are exposed
> via cpuinfo.  Show the friendlier name if available otherwise display
> the X86_FEATURE_* numerals to make it easier to identify the feature.
> 
> Suggested-by: Tony Luck <tony.luck@intel.com>
> Suggested-by: Ingo Molnar <mingo@redhat.com>
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> ---
> v5: Don't disable features with unmet dependencies (Ingo)
>     Use pr_warn_once() for the warning (Ingo)
> 
> v4: https://lore.kernel.org/lkml/20241210224037.3052555-1-sohil.mehta@intel.com/
>     Update the log level to pr_info() (Ingo)
>     Update the char buffer size to 16 (0day warning)
>     Dropped Dave's review tag since the code has changed a bit more
>     than usual.
> 
> v3: Picked up the review tag.
>     Reworded the commit message.
> 
> v2: Use cpu_has() instead of boot_cpu_has() (Sean)
> ---
>  arch/x86/include/asm/cpufeature.h |  1 +
>  arch/x86/kernel/cpu/common.c      |  4 ++++
>  arch/x86/kernel/cpu/cpuid-deps.c  | 35 +++++++++++++++++++++++++++++++
>  3 files changed, 40 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index de1ad09fe8d7..1bb30e82ea77 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -149,6 +149,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
>  
>  extern void setup_clear_cpu_cap(unsigned int bit);
>  extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
> +void scan_feature_dependencies(struct cpuinfo_x86 *c);
>  
>  #define setup_force_cpu_cap(bit) do {			\
>  							\
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 7cce91b19fb2..52111d97c8ae 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1610,6 +1610,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>  
>  		c->cpu_index = 0;
>  		filter_cpuid_features(c, false);
> +		scan_feature_dependencies(c);
>  
>  		if (this_cpu->c_bsp_init)
>  			this_cpu->c_bsp_init(c);
> @@ -1870,6 +1871,9 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>  	/* Filter out anything that depends on CPUID levels we don't have */
>  	filter_cpuid_features(c, true);
>  
> +	/* Scan for unmet dependencies based on the CPUID dependency table */
> +	scan_feature_dependencies(c);

s/scane_feature_dependencies
 /x86_check_cpufeature_deps

or so.

> +/*
> + * Return the feature "name" if available otherwise return
> + * the X86_FEATURE_* numerals to make it easier to identify
> + * the feature.

s/available otherwise
 /available, otherwise

> + */
> +static const char *x86_feature_name(unsigned int feature, char *buf)
> +{
> +	if (x86_cap_flags[feature])
> +		return x86_cap_flags[feature];
> +
> +	snprintf(buf, 16, "%d*32+%2d", feature / 32, feature % 32);
> +
> +	return buf;
> +}
> +
> +void scan_feature_dependencies(struct cpuinfo_x86 *c)
> +{
> +	char feature_buf[16], depends_buf[16];
> +	const struct cpuid_dep *d;
> +
> +	for (d = cpuid_deps; d->feature; d++) {
> +		if (cpu_has(c, d->feature) && !cpu_has(c, d->depends)) {
> +			/*
> +			 * Only warn about the first unmet dependency on the
> +			 * first CPU where it is encountered to avoid spamming
> +			 * the kernel log.
> +			 */
> +			pr_warn_once("CPU%d: feature:%s may not work properly without feature:%s\n",
> +				     smp_processor_id(),
> +				     x86_feature_name(d->feature, feature_buf),
> +				     x86_feature_name(d->depends, depends_buf));

I'd make this a bit less passive-aggressive, something like:

     x86 CPU feature dependency check failure: CPU%d has '%s' enabled but '%s' disabled. Kernel might be fine, but no guarantees.

Because initially most of these warnings will be about quirks and 
oddities that happen to work fine in the current code.

Thanks,

	Ingo
Re: [PATCH v5] x86/cpufeature: Warn about unmet feature dependencies
Posted by Sohil Mehta 11 months ago
On 3/7/2025 3:55 AM, Ingo Molnar wrote:

>>  
>> +	/* Scan for unmet dependencies based on the CPUID dependency table */
>> +	scan_feature_dependencies(c);
> 
> s/scane_feature_dependencies
>  /x86_check_cpufeature_deps
> 

How about check_cpufeature_deps() without the "x86" prefix? It would
blend in with the other function calls in early_identify_cpu() and
identify_cpu().

> 
>> + */
>> +static const char *x86_feature_name(unsigned int feature, char *buf)
>> +{
>> +	if (x86_cap_flags[feature])
>> +		return x86_cap_flags[feature];
>> +
>> +	snprintf(buf, 16, "%d*32+%2d", feature / 32, feature % 32);
>> +
>> +	return buf;
>> +}
>> +

I was wondering if it would be better to build the feature name using a
macro and reusing it elsewhere? This is all I could come up with:

/*
 * Use with a %s format specifier to print the feature name.
 *
 * Return the feature "name" if set, otherwise return the X86_FEATURE_*
 * numerals to make it easier to identify the feature.
 */
#define x86_feature_name(feature) \
	(x86_cap_flags[feature] ? x86_cap_flags[feature] : \
	({ \
		char buf[16]; \
		snprintf(buf, 16, "%d*32+%2d", feature >> 5, feature & 31); \
		buf; \
	}) \
	)


This would remove the need for callers to explicitly define a buffer.
Also, it would help reduce a few lines in the newly merged
parse_set_clear_cpuid(). But overall, it doesn't seem worth it. Let me
know if you think otherwise or have a better idea.

>> +void scan_feature_dependencies(struct cpuinfo_x86 *c)
>> +{
>> +	char feature_buf[16], depends_buf[16];
>> +	const struct cpuid_dep *d;
>> +
>> +	for (d = cpuid_deps; d->feature; d++) {
>> +		if (cpu_has(c, d->feature) && !cpu_has(c, d->depends)) {
>> +			/*
>> +			 * Only warn about the first unmet dependency on the
>> +			 * first CPU where it is encountered to avoid spamming
>> +			 * the kernel log.
>> +			 */
>> +			pr_warn_once("CPU%d: feature:%s may not work properly without feature:%s\n",
>> +				     smp_processor_id(),
>> +				     x86_feature_name(d->feature, feature_buf),
>> +				     x86_feature_name(d->depends, depends_buf));
> 
> I'd make this a bit less passive-aggressive, something like:
> 
>      x86 CPU feature dependency check failure: CPU%d has '%s' enabled but '%s' disabled. Kernel might be fine, but no guarantees.
> 

Sure! How about making it slightly shorter?

"x86 CPU feature check: CPU%d has '%s' enabled but '%s' disabled. Kernel
might be fine, but no guarantees."

> Because initially most of these warnings will be about quirks and 
> oddities that happen to work fine in the current code.
> 

Yeah, we can probably modify the check later based on how it goes.
Re: [PATCH v5] x86/cpufeature: Warn about unmet feature dependencies
Posted by Ingo Molnar 11 months ago
* Sohil Mehta <sohil.mehta@intel.com> wrote:

> On 3/7/2025 3:55 AM, Ingo Molnar wrote:
> 
> >>  
> >> +	/* Scan for unmet dependencies based on the CPUID dependency table */
> >> +	scan_feature_dependencies(c);
> > 
> > s/scane_feature_dependencies
> >  /x86_check_cpufeature_deps
> > 
> 
> How about check_cpufeature_deps() without the "x86" prefix? It would
> blend in with the other function calls in early_identify_cpu() and
> identify_cpu().

Yeah, I suppose that would work too. There's no discernible rhyme and 
reason to the naming choices within the interfaces used by 
arch/x86/kernel/cpu/common.c that I can see, so I suppose the shorter 
one that is still unambiguous wins.

> >> + */
> >> +static const char *x86_feature_name(unsigned int feature, char *buf)
> >> +{
> >> +	if (x86_cap_flags[feature])
> >> +		return x86_cap_flags[feature];
> >> +
> >> +	snprintf(buf, 16, "%d*32+%2d", feature / 32, feature % 32);
> >> +
> >> +	return buf;
> >> +}
> >> +
> 
> I was wondering if it would be better to build the feature name using 
> a macro and reusing it elsewhere? This is all I could come up with:
> 
> /*
>  * Use with a %s format specifier to print the feature name.
>  *
>  * Return the feature "name" if set, otherwise return the X86_FEATURE_*
>  * numerals to make it easier to identify the feature.
>  */
> #define x86_feature_name(feature) \
> 	(x86_cap_flags[feature] ? x86_cap_flags[feature] : \
> 	({ \
> 		char buf[16]; \
> 		snprintf(buf, 16, "%d*32+%2d", feature >> 5, feature & 31); \
> 		buf; \
> 	}) \
> 	)

I'm not sure this is an improvement.

> This would remove the need for callers to explicitly define a buffer. 
> Also, it would help reduce a few lines in the newly merged 
> parse_set_clear_cpuid(). But overall, it doesn't seem worth it. Let 
> me know if you think otherwise or have a better idea.

No good ideas right now.

> > I'd make this a bit less passive-aggressive, something like:
> > 
> >      x86 CPU feature dependency check failure: CPU%d has '%s' enabled but '%s' disabled. Kernel might be fine, but no guarantees.
> > 
> 
> Sure! How about making it slightly shorter?
> 
> "x86 CPU feature check: CPU%d has '%s' enabled but '%s' disabled. Kernel
> might be fine, but no guarantees."

Yeah, so I really wanted to sneak in the 'dependency' part - because 
it's not necessarily obvious from the text, and most syslog readers 
will have no idea what it's all about.

I don't think line length should be an issue for a message we don't 
expect to trigger normally. Clarity is more important.

Thanks,

	Ingo
Re: [PATCH v5] x86/cpufeature: Warn about unmet feature dependencies
Posted by Sohil Mehta 11 months ago
On 3/13/2025 3:14 AM, Ingo Molnar wrote:

>>> I'd make this a bit less passive-aggressive, something like:
>>>
>>>      x86 CPU feature dependency check failure: CPU%d has '%s' enabled but '%s' disabled. Kernel might be fine, but no guarantees.
>>>
> 

...

> Yeah, so I really wanted to sneak in the 'dependency' part - because 
> it's not necessarily obvious from the text, and most syslog readers 
> will have no idea what it's all about.
> 
> I don't think line length should be an issue for a message we don't 
> expect to trigger normally. Clarity is more important.
> 

Sounds good, I'll use the one you proposed as-is. Will send a new
(hopefully final) revision soon with the changes.

> Thanks,
> 
> 	Ingo