[RFC PATCH] x86/cpufeature: Add feature dependency checks

Sohil Mehta posted 1 patch 1 year, 5 months ago
There is a newer version of this series
[RFC PATCH] x86/cpufeature: Add feature dependency checks
Posted by Sohil Mehta 1 year, 5 months ago
Currently, the cpuid_deps[] table is only exercised when a particular
feature gets explicitly disabled and clear_cpu_cap() is called. However,
some of these listed dependencies might already be missing during boot.
Unexpected failures can occur when the kernel tries to use such a
feature.

Therefore, add boot time checks for missing feature dependencies and
disable any feature whose dependencies are not met.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
---
Arguably, this situation should only happen on broken hardware and it may not
make sense to add such a check to the kernel. OTOH, this can be viewed as a
safety mechanism to make failures more graceful on such configurations in real
or virtual environments.

I feel that since we already have the cpuid_deps[] table and the incremental
changes are small, this patch might be a useful addition.

Also, if this check seems worthwhile, would it be useful to combine and rewrite
it with filter_cpuid_features() since it tries to do something similar?
---

 arch/x86/include/asm/cpufeature.h |  1 +
 arch/x86/kernel/cpu/common.c      |  4 ++++
 arch/x86/kernel/cpu/cpuid-deps.c  | 10 ++++++++++
 3 files changed, 15 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 0b9611da6c53..347ef04f65ef 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -148,6 +148,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
 
 extern void setup_clear_cpu_cap(unsigned int bit);
 extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
+extern void filter_feature_dependencies(struct cpuinfo_x86 *c);
 
 #define setup_force_cpu_cap(bit) do {			\
 							\
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d4e539d4e158..6b725dbd8db7 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1602,6 +1602,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
 
 		c->cpu_index = 0;
 		filter_cpuid_features(c, false);
+		filter_feature_dependencies(c);
 
 		if (this_cpu->c_bsp_init)
 			this_cpu->c_bsp_init(c);
@@ -1854,6 +1855,9 @@ static void identify_cpu(struct cpuinfo_x86 *c)
 	/* Filter out anything that depends on CPUID levels we don't have */
 	filter_cpuid_features(c, true);
 
+	/* Filter out features that don't have their dependencies met */
+	filter_feature_dependencies(c);
+
 	/* If the model name is still unset, do table lookup. */
 	if (!c->x86_model_id[0]) {
 		const char *p;
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index b7d9f530ae16..88b34a97278a 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -147,3 +147,13 @@ void setup_clear_cpu_cap(unsigned int feature)
 {
 	do_clear_cpu_cap(NULL, feature);
 }
+
+void filter_feature_dependencies(struct cpuinfo_x86 *c)
+{
+	const struct cpuid_dep *d;
+
+	for (d = cpuid_deps; d->feature; d++) {
+		if (boot_cpu_has(d->feature) && !boot_cpu_has(d->depends))
+			do_clear_cpu_cap(c, d->feature);
+	}
+}
-- 
2.34.1
Re: [RFC PATCH] x86/cpufeature: Add feature dependency checks
Posted by Sean Christopherson 1 year, 5 months ago
On Thu, Aug 22, 2024, Sohil Mehta wrote:
> Currently, the cpuid_deps[] table is only exercised when a particular
> feature gets explicitly disabled and clear_cpu_cap() is called. However,
> some of these listed dependencies might already be missing during boot.
> Unexpected failures can occur when the kernel tries to use such a
> feature.
> 
> Therefore, add boot time checks for missing feature dependencies and
> disable any feature whose dependencies are not met.
> 
> Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
> ---
> Arguably, this situation should only happen on broken hardware and it may not
> make sense to add such a check to the kernel. OTOH, this can be viewed as a
> safety mechanism to make failures more graceful on such configurations in real
> or virtual environments.

And goofy Kconfigs. But yeah, lack of any meaningful fallout is why my version
didn't go anywhere.

https://lore.kernel.org/all/20221203003745.1475584-2-seanjc@google.com

> I feel that since we already have the cpuid_deps[] table and the incremental
> changes are small, this patch might be a useful addition.
> 
> Also, if this check seems worthwhile, would it be useful to combine and rewrite
> it with filter_cpuid_features() since it tries to do something similar?
> ---
> 
>  arch/x86/include/asm/cpufeature.h |  1 +
>  arch/x86/kernel/cpu/common.c      |  4 ++++
>  arch/x86/kernel/cpu/cpuid-deps.c  | 10 ++++++++++
>  3 files changed, 15 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 0b9611da6c53..347ef04f65ef 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -148,6 +148,7 @@ extern const char * const x86_bug_flags[NBUGINTS*32];
>  
>  extern void setup_clear_cpu_cap(unsigned int bit);
>  extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
> +extern void filter_feature_dependencies(struct cpuinfo_x86 *c);
>  
>  #define setup_force_cpu_cap(bit) do {			\
>  							\
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index d4e539d4e158..6b725dbd8db7 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1602,6 +1602,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c)
>  
>  		c->cpu_index = 0;
>  		filter_cpuid_features(c, false);
> +		filter_feature_dependencies(c);
>  
>  		if (this_cpu->c_bsp_init)
>  			this_cpu->c_bsp_init(c);
> @@ -1854,6 +1855,9 @@ static void identify_cpu(struct cpuinfo_x86 *c)
>  	/* Filter out anything that depends on CPUID levels we don't have */
>  	filter_cpuid_features(c, true);
>  
> +	/* Filter out features that don't have their dependencies met */
> +	filter_feature_dependencies(c);
> +
>  	/* If the model name is still unset, do table lookup. */
>  	if (!c->x86_model_id[0]) {
>  		const char *p;
> diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
> index b7d9f530ae16..88b34a97278a 100644
> --- a/arch/x86/kernel/cpu/cpuid-deps.c
> +++ b/arch/x86/kernel/cpu/cpuid-deps.c
> @@ -147,3 +147,13 @@ void setup_clear_cpu_cap(unsigned int feature)
>  {
>  	do_clear_cpu_cap(NULL, feature);
>  }
> +
> +void filter_feature_dependencies(struct cpuinfo_x86 *c)
> +{
> +	const struct cpuid_dep *d;
> +
> +	for (d = cpuid_deps; d->feature; d++) {
> +		if (boot_cpu_has(d->feature) && !boot_cpu_has(d->depends))

I don't think checking boot_cpu_has() is correct, it's entirely possible for a CPU
to have divergent features from the boot CPU, e.g. if a feature is dependent on
BIOS enabling (or disabling) and BIOS messed up.

> +			do_clear_cpu_cap(c, d->feature);
> +	}
> +}
> -- 
> 2.34.1
>
Re: [RFC PATCH] x86/cpufeature: Add feature dependency checks
Posted by Sohil Mehta 1 year, 5 months ago
On 8/22/2024 4:27 PM, Sean Christopherson wrote:
> On Thu, Aug 22, 2024, Sohil Mehta wrote:
>> Arguably, this situation should only happen on broken hardware and it may not
>> make sense to add such a check to the kernel. OTOH, this can be viewed as a
>> safety mechanism to make failures more graceful on such configurations in real
>> or virtual environments.
> 
> And goofy Kconfigs.   But yeah, lack of any meaningful fallout is why my version
> didn't go anywhere.
> 

By fallout, do you mean the observed behavior when the kernel runs
into such a misconfiguration or just the general lack of such
misconfigured hardware/guest?

I tried experimenting with the behavior for the last entry in the
cpuid_deps[] table:
{ X86_FEATURE_FRED,                     X86_FEATURE_WRMSRNS   },

In this case, even if WRMSRNS is not present, the kernel would go ahead
and enable FRED, which would cause a panic when wrmsrns() is exercised
in update_task_stack().

I agree with the second part, that such conditions are more likely to
happen in pre-production environments. But I still feel that for the
rare case when something like this seeps through, it would be better to
disable the feature upfront than run into a kernel panic or some other
unexpected behavior.
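
To make the "disable upfront" idea concrete, here is a minimal userspace
sketch, not kernel code. The bit numbers, struct dep, has() and
filter_deps() are all hypothetical names for this model, and FEAT_CHILD is
an invented feature that exists only to show a dependent being cleared along
with FRED. Only the FRED -> WRMSRNS entry comes from the table quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit numbers for a userspace model, not X86_FEATURE_* values. */
enum {
	FEAT_WRMSRNS = 1,
	FEAT_FRED    = 2,
	FEAT_CHILD   = 3,	/* invented feature that depends on FRED */
};

struct dep {
	unsigned int feature;
	unsigned int depends;
};

/* Zero-terminated, in the style of cpuid_deps[]. */
static const struct dep deps[] = {
	{ FEAT_CHILD, FEAT_FRED    },	/* assumption, for illustration only */
	{ FEAT_FRED,  FEAT_WRMSRNS },	/* entry discussed in this thread */
	{ 0, 0 }
};

static bool has(uint64_t caps, unsigned int f)
{
	return (caps >> f) & 1;
}

/*
 * Clear every feature whose dependency is missing. The walk repeats until
 * nothing changes, so a cleared feature also takes its dependents with it.
 */
static uint64_t filter_deps(uint64_t caps)
{
	const struct dep *d;
	bool changed;

	do {
		changed = false;
		for (d = deps; d->feature; d++) {
			if (has(caps, d->feature) && !has(caps, d->depends)) {
				caps &= ~(UINT64_C(1) << d->feature);
				changed = true;
			}
		}
	} while (changed);

	return caps;
}
```

In this model, a capability word with FRED and FEAT_CHILD set but WRMSRNS
clear comes back with both cleared, while a word with all three set is
returned unchanged. The real patch relies on do_clear_cpu_cap() for the
actual clearing, so the loop above is only an approximation of the intent.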

> https://lore.kernel.org/all/20221203003745.1475584-2-seanjc@google.com
> 

The code is very similar to the one I proposed. If we do take this
forward, would it be fine if I add an Originally-by tag from you?


>> +void filter_feature_dependencies(struct cpuinfo_x86 *c)
>> +{
>> +	const struct cpuid_dep *d;
>> +
>> +	for (d = cpuid_deps; d->feature; d++) {
>> +		if (boot_cpu_has(d->feature) && !boot_cpu_has(d->depends))
> 
> I don't think checking boot_cpu_has() is correct, it's entirely possible for a CPU
> to have divergent features from the boot CPU, e.g. if a feature is dependent on
> BIOS enabling (or disabling) and BIOS messed up.
> 

Yeah, makes sense. cpu_has() would be better suited, as you did in
your original patch.
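
The difference can be sketched in a small userspace model, again not kernel
code. struct cpu_model, boot_cpu, model_cpu_has() and the two filter
functions are hypothetical stand-ins for struct cpuinfo_x86, boot_cpu_data,
cpu_has() and the posted loop; the clearing is reduced to a plain bit
operation on the CPU passed in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a userspace model; not the kernel's types. */
enum { FEAT_WRMSRNS = 1, FEAT_FRED = 2 };

struct cpu_model {
	uint64_t caps;		/* models c->x86_capability */
};

struct dep {
	unsigned int feature;
	unsigned int depends;
};

static const struct dep deps[] = {
	{ FEAT_FRED, FEAT_WRMSRNS },
	{ 0, 0 }
};

static struct cpu_model boot_cpu;	/* models boot_cpu_data */

static bool model_cpu_has(const struct cpu_model *c, unsigned int f)
{
	return (c->caps >> f) & 1;
}

/* Shape of the posted loop: tests the boot CPU, clears on c. */
static void filter_by_boot_cpu(struct cpu_model *c)
{
	const struct dep *d;

	for (d = deps; d->feature; d++) {
		if (model_cpu_has(&boot_cpu, d->feature) &&
		    !model_cpu_has(&boot_cpu, d->depends))
			c->caps &= ~(UINT64_C(1) << d->feature);
	}
}

/* Shape of the suggested fix: tests the CPU that is being identified. */
static void filter_by_this_cpu(struct cpu_model *c)
{
	const struct dep *d;

	for (d = deps; d->feature; d++) {
		if (model_cpu_has(c, d->feature) &&
		    !model_cpu_has(c, d->depends))
			c->caps &= ~(UINT64_C(1) << d->feature);
	}
}
```

If the boot CPU has both FRED and WRMSRNS but a secondary CPU reports FRED
without WRMSRNS, filter_by_boot_cpu() leaves FRED set on the secondary CPU,
whereas filter_by_this_cpu() clears it.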

>> +			do_clear_cpu_cap(c, d->feature);
>> +	}
>> +}
>> -- 
>> 2.34.1
>>
Re: [RFC PATCH] x86/cpufeature: Add feature dependency checks
Posted by Sean Christopherson 1 year, 5 months ago
On Fri, Aug 23, 2024, Sohil Mehta wrote:
> On 8/22/2024 4:27 PM, Sean Christopherson wrote:
> > On Thu, Aug 22, 2024, Sohil Mehta wrote:
> >> Arguably, this situation should only happen on broken hardware and it may not
> >> make sense to add such a check to the kernel. OTOH, this can be viewed as a
> >> safety mechanism to make failures more graceful on such configurations in real
> >> or virtual environments.
> > 
> > And goofy Kconfigs.   But yeah, lack of any meaningful fallout is why my version
> > didn't go anywhere.
> > 
> 
> By fallout, do you mean the observed behavior when the kernel runs
> into such a misconfiguration

This.

> or just the general lack of such
> misconfigured hardware/guest?
> 
> I tried experimenting with the behavior for the last entry in the
> cpuid_deps[] table:
> { X86_FEATURE_FRED,                     X86_FEATURE_WRMSRNS   },
> 
> In this case, even if WRMSRNS is not present, the kernel would go ahead
> and enable FRED, which would cause a panic when wrmsrns() is exercised
> in update_task_stack().
> 
> I agree with the second part, that such conditions are more likely to
> happen in pre-production environments.

And in VMs, e.g. unless the SDM explicitly says FRED implies WRMSRNS, it will be
architecturally legal, if unusual, to advertise FRED without WRMSRNS to a guest.

> But I still feel that for the rare case when something like this seeps
> through it would be better to disable the feature upfront than run into a
> kernel panic or some other unexpected behavior.

Agreed.

> > https://lore.kernel.org/all/20221203003745.1475584-2-seanjc@google.com
> > 
> 
> The code is very similar to the one I proposed. If we do take this
> forward, would it be fine if I add an Originally-by tag from you?

No need, you came up with the code independently.
Re: [RFC PATCH] x86/cpufeature: Add feature dependency checks
Posted by Sohil Mehta 1 year, 5 months ago
On 8/26/2024 1:05 PM, Sean Christopherson wrote:
> On Fri, Aug 23, 2024, Sohil Mehta wrote:
>> But I still feel that for the rare case when something like this seeps
>> through it would be better to disable the feature upfront than run into a
>> kernel panic or some other unexpected behavior.
> 
> Agreed.
> 

Great, I'll wait for a few more days to see if someone says otherwise.