Declare safe late loadable microcode

[PATCH v1 Part2 3/5] x86/microcode: Add a generic mechanism to declare support for minrev

Posted by Ashok Raj 2 years, 8 months ago

Intel microcode adds some meta-data to report a minimum required revision
before this new microcode can be safely late loaded. There is no generic
mechanism to declare support for all vendors.

Add generic support to the microcode core to declare such support. This
allows late loading to be permitted on those architectures that report
support for safe late loading.

Late loading has added support for

- New images declaring a required minimum base version before a late-load
  is performed.

Tainting only happens on architectures that don't support minimum required
version reporting.

Add a new variable in microcode_ops to allow an architecture to declare
support for safe microcode late loading.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: x86 <x86@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner (Intel) <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Stefan Talpalaru <stefantalpalaru@yahoo.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>
---
 arch/x86/include/asm/microcode.h      |  2 ++
 arch/x86/kernel/cpu/microcode/core.c  | 25 ++++++++++++++++++++-----
 arch/x86/kernel/cpu/microcode/intel.c |  1 +
 arch/x86/Kconfig                      |  7 ++++---
 4 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/microcode.h b/arch/x86/include/asm/microcode.h
index d5a58bde091c..3d48143e84a9 100644
--- a/arch/x86/include/asm/microcode.h
+++ b/arch/x86/include/asm/microcode.h
@@ -33,6 +33,8 @@ enum ucode_state {
 };
 
 struct microcode_ops {
+	bool safe_late_load;
+
 	enum ucode_state (*request_microcode_fw) (int cpu, struct device *);
 
 	void (*microcode_fini_cpu) (int cpu);
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index c361882baf63..446ddf3fcc29 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -472,6 +472,7 @@ static ssize_t reload_store(struct device *dev,
 	enum ucode_state tmp_ret = UCODE_OK;
 	int bsp = boot_cpu_data.cpu_index;
 	unsigned long val;
+	bool safe_late_load = false;
 	ssize_t ret = 0;
 
 	ret = kstrtoul(buf, 0, &val);
@@ -487,13 +488,22 @@ static ssize_t reload_store(struct device *dev,
 	if (ret)
 		goto put;
 
+	safe_late_load = microcode_ops->safe_late_load;
+
+	/*
+	 * If safe loading indication isn't present, bail out.
+	 */
+	if (!safe_late_load) {
+		pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
+		pr_err("You should switch to early loading, if possible.\n");
+		ret = -EINVAL;
+		goto put;
+	}
+
 	tmp_ret = microcode_ops->request_microcode_fw(bsp, &microcode_pdev->dev);
 	if (tmp_ret != UCODE_NEW)
 		goto put;
 
-	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
-	pr_err("You should switch to early loading, if possible.\n");
-
 	mutex_lock(&microcode_mutex);
 	ret = microcode_reload_late();
 	mutex_unlock(&microcode_mutex);
@@ -501,11 +511,16 @@ static ssize_t reload_store(struct device *dev,
 put:
 	cpus_read_unlock();
 
+	/*
+	 * Only taint if a successful load and vendor doesn't support
+	 * safe_late_load
+	 */
+	if (!(ret && safe_late_load))
+		add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
+
 	if (ret == 0)
 		ret = size;
 
-	add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
-
 	return ret;
 }
 
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 6046f90a47b2..eba4f463ef1c 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -806,6 +806,7 @@ static enum ucode_state request_microcode_fw(int cpu, struct device *device)
 }
 
 static struct microcode_ops microcode_intel_ops = {
+	.safe_late_load			  = true,
 	.request_microcode_fw             = request_microcode_fw,
 	.collect_cpu_info                 = collect_cpu_info,
 	.apply_microcode                  = apply_microcode_intel,
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3604074a878b..ddc4130e6f8c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1352,15 +1352,16 @@ config MICROCODE_AMD
 	  processors will be enabled.
 
 config MICROCODE_LATE_LOADING
-	bool "Late microcode loading (DANGEROUS)"
-	default n
+	bool "Late microcode loading"
+	default y
 	depends on MICROCODE
 	help
 	  Loading microcode late, when the system is up and executing instructions
 	  is a tricky business and should be avoided if possible. Just the sequence
 	  of synchronizing all cores and SMT threads is one fragile dance which does
 	  not guarantee that cores might not softlock after the loading. Therefore,
-	  use this at your own risk. Late loading taints the kernel too.
+	  use this at your own risk. Late loading taints the kernel, if it
+	  doesn't support a minimum required base version before an update.
 
 config X86_MSR
 	tristate "/dev/cpu/*/msr - Model-specific register support"
-- 
2.34.1

Re: [PATCH v1 Part2 3/5] x86/microcode: Add a generic mechanism to declare support for minrev

Posted by Thomas Gleixner 2 years, 7 months ago

Ashok!

On Fri, Jan 13 2023 at 09:29, Ashok Raj wrote:
> Intel microcode adds some meta-data to report a minimum required revision
> before this new microcode can be safely late loaded. There is no generic

s/this new microcode/a new microcode revision/

Changelogs are not restricted by twitter posting rules.

> mechanism to declare support for all vendors.
>
> Add generic support to the microcode core to declare such support. This
> allows late loading to be permitted on those architectures that report
> support for safe late loading.
>
> Late loading has added support for
>
> - New images declaring a required minimum base version before a late-load
>   is performed.
>
> Tainting only happens on architectures that don't support minimum required
> version reporting.
>
> Add a new variable in microcode_ops to allow an architecture to declare
> support for safe microcode late loading.
> @@ -487,13 +488,22 @@ static ssize_t reload_store(struct device *dev,
>  	if (ret)
>  		goto put;
>  
> +	safe_late_load = microcode_ops->safe_late_load;
> +
> +	/*
> +	 * If safe loading indication isn't present, bail out.
> +	 */
> +	if (!safe_late_load) {
> +		pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
> +		pr_err("You should switch to early loading, if possible.\n");
> +		ret = -EINVAL;
> +		goto put;
> +	}
> +
>  	tmp_ret = microcode_ops->request_microcode_fw(bsp, &microcode_pdev->dev);
>  	if (tmp_ret != UCODE_NEW)
>  		goto put;
>  
> -	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
> -	pr_err("You should switch to early loading, if possible.\n");
> -

Why are you not moving the pr_err()s right away (in 1/5) to the place
where you move it now?

>  	mutex_lock(&microcode_mutex);
>  	ret = microcode_reload_late();
>  	mutex_unlock(&microcode_mutex);
> @@ -501,11 +511,16 @@ static ssize_t reload_store(struct device *dev,
>  put:
>  	cpus_read_unlock();
>  
> +	/*
> +	 * Only taint if a successful load and vendor doesn't support
> +	 * safe_late_load
> +	 */
> +	if (!(ret && safe_late_load))
> +		add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);

The resulting code is undecodable garbage. What's worse is that the
existing logic in this code is broken already.

#1
	ssize_t ret = 0;

This 'ret = 0' assignment is pointless as ret is immediately overwritten
by the next line:

	ret = kstrtoul(buf, 0, &val);
	if (ret)
		return ret;

	if (val != 1)
		return size;

Now this is really useful. If the value is invalid, i.e. it causes the
function to abort immediately, it returns 'size', which means the write
was successful. Oh well.
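
For illustration only, a userspace sketch of stricter input handling,
assuming the intended behaviour is to reject anything other than "1"
with -EINVAL. strtoul() stands in for the kernel's kstrtoul(), and
parse_reload_request() is a hypothetical helper name, not something in
the tree:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace model only: strtoul() stands in for kstrtoul(), and
 * parse_reload_request() is a hypothetical name.
 * Returns 0 when the write asks for a reload ("1"), -EINVAL otherwise.
 */
static int parse_reload_request(const char *buf)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno || end == buf)
		return -EINVAL;

	/* Tolerate the single trailing newline that "echo 1" appends. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	/* Any value other than 1 is an error, not a silent success. */
	return val == 1 ? 0 : -EINVAL;
}
```

With this shape a script writing a bogus value sees a failed write
instead of a successful one that did nothing.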

Now let's look at a few lines further down:

#2

	ssize_t ret = 0;
	...
	ret = check_online_cpus();
	if (ret)
		goto put;
	...
put:
	...
	add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
	...
	return ret;

Why are we tainting the kernel when there was absolutely ZERO action
done here? All that check_online_cpus() figured out was that not enough
CPUs were online, right? That justifies an error return, but the taint is
bogus, no?

The next bogosity is:

	ssize_t ret = 0;
	...
	tmp_ret = microcode_ops->request_microcode_fw(bsp, &microcode_pdev->dev);
	if (tmp_ret != UCODE_NEW)
		goto put;
	...
put:
	...
	add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);

	if (ret == 0)
		ret = size;

	return ret;

IOW, the microcode request can fail for whatever reason and the return
value is unconditionally 'size' which means the write to the sysfs file
is successful.
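
To make the two problems above concrete, here is a rough userspace
model of the control flow, not a proposed patch. Everything in it is
hypothetical: struct reload_model, the MODEL_* states and
reload_store_model() are illustration names, the error codes are
arbitrary picks, and the taint policy (taint once a load was actually
attempted and the vendor does not declare safe_late_load) is an
assumption about what is wanted:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Simplified stand-in for enum ucode_state; not the kernel's definition. */
enum model_state { MODEL_OK, MODEL_NEW, MODEL_ERROR };

struct reload_model {
	int online_check;		/* CPU online check: 0 or -errno */
	enum model_state fw_state;	/* outcome of the firmware request */
	int load_result;		/* outcome of the late load: 0 or -errno */
	bool safe_late_load;		/* vendor declares minrev support */
	bool tainted;			/* output: was a taint recorded? */
};

static ssize_t reload_store_model(struct reload_model *m, size_t size)
{
	m->tainted = false;

	/* Nothing was touched yet: report the error, do not taint. */
	if (m->online_check)
		return m->online_check;

	switch (m->fw_state) {
	case MODEL_NEW:
		break;
	case MODEL_OK:
		/* Assumption: "nothing newer to load" counts as success. */
		return (ssize_t)size;
	default:
		/* A failed request is an error, not a successful write. */
		return -EINVAL;
	}

	/* From here on a load is attempted, so the taint decision applies. */
	if (!m->safe_late_load)
		m->tainted = true;

	return m->load_result ? m->load_result : (ssize_t)size;
}
```

The point is only that each early exit carries its own return value and
that the taint is decided at the one place where a load actually
happens, rather than at a shared label every path runs through.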

#3

Not to talk about the completely broken error handling in the actual
microcode loading case in __reload_late()::wait_for_siblings code path.

Maybe more #...

How does any of this make sense and allow sensible scripting of this
interface?

Surely you spent several orders of magnitude more time to stare at this
code than I did during this review, no?

Now instead of noticing and fixing any of this nonsense you are duct
taping this whole safe_late_load handling into that mess to make it even
more incomprehensible.

If you expected an alternative patch here, then I have to disappoint
you.

I'm not presenting you the proper solution this time on a silver platter
because I'm in the process of taming my 'let me fix this for you' reflex
to prepare for my retirement some years down the road.

But you should have enough hints to fix all of this for real, right?

Thanks,

        tglx
