[Patch v3 Part2 4/9] x86/microcode: Do not call apply_microcode() on sibling threads

Ashok Raj posted 9 patches 2 years, 10 months ago
[Patch v3 Part2 4/9] x86/microcode: Do not call apply_microcode() on sibling threads
Posted by Ashok Raj 2 years, 10 months ago
Microcode updates are applied at the core, so an update to one HT sibling
is effective on all HT siblings of the same core.

During late-load, after the primary has updated the microcode, it also
reflects that in the per-cpu structure (cpuinfo_x86) holding the current
revision.

Current code calls apply_microcode() to update the SW per-cpu revision.

But in the odd case when primary returned with an error, and as a result
the secondary didn't get the revision updated, will attempt to perform
a patch load and the primary has already been released to the system.
This could be problematic, because the whole rendezvous dance is to
prevent updates when one of the siblings could be executing arbitrary code.

Replace apply_microcode() with a call to collect_cpu_info() and let that
call also update the per-cpu structure instead of returning the previously
cached values.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: x86 <x86@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner (Intel) <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Stefan Talpalaru <stefantalpalaru@yahoo.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Peter Zilstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Cooper <Andrew.Cooper3@citrix.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Martin Pohlack <mpohlack@amazon.de>
---
 arch/x86/kernel/cpu/microcode/core.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index e4b4dfcf2d18..8452fad89bf6 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -386,6 +386,7 @@ static int __wait_for_cpus(atomic_t *t, long long timeout)
 static int __reload_late(void *info)
 {
 	int cpu = smp_processor_id();
+	struct ucode_cpu_info *uci;
 	enum ucode_state err;
 	int ret = 0;
 
@@ -421,12 +422,13 @@ static int __reload_late(void *info)
 
 	/*
 	 * At least one thread has completed update on each core.
-	 * For others, simply call the update to make sure the
-	 * per-cpu cpuinfo can be updated with right microcode
-	 * revision.
+	 * For siblings, collect the cpuinfo and update the
+	 * per-cpu cpuinfo with the current microcode revision.
 	 */
-	if (cpumask_first(topology_sibling_cpumask(cpu)) != cpu)
-		err = microcode_ops->apply_microcode(cpu);
+	if (cpumask_first(topology_sibling_cpumask(cpu)) != cpu) {
+		uci = ucode_cpu_info + cpu;
+		microcode_ops->collect_cpu_info(cpu, &uci->cpu_sig);
+	}
 
 	return ret;
 }
-- 
2.37.2
Re: [Patch v3 Part2 4/9] x86/microcode: Do not call apply_microcode() on sibling threads
Posted by Dave Hansen 2 years, 10 months ago
On 1/30/23 13:39, Ashok Raj wrote:
> Microcode updates are applied at the core, so an update to one HT sibling
> is effective on all HT siblings of the same core.
> 
> During late-load, after the primary has updated the microcode, it also
> reflects that in the per-cpu structure (cpuinfo_x86) holding the current
> revision.
> 
> Current code calls apply_microcode() to update the SW per-cpu revision.

I'm having a hard time following this.  I can't even suggest a better
message because I can't grok this one.

> But in the odd case when primary returned with an error, and as a result
> the secondary didn't get the revision updated, will attempt to perform
> a patch load and the primary has already been released to the system.
> This could be problematic, because the whole rendezvous dance is to
> prevent updates when one of the siblings could be executing arbitrary code.

OK, let me see if I understand:

Today, ->apply_microcode() is called for both CPU threads.  Typically,
T0 comes in and will actually successfully update the microcode.  T1
will come in later, notice that T0 updated the microcode already and
return without even trying to do the update WRMSR.  One thing T1 _does_
do before returning is to update the per-cpu data.

That works great, unless T0 experiences an error.  In that case, T0 will
jump out of __reload_late() after failing to do the update.  T1 will
come bumbling along after it and will enter ->apply_microcode(),
blissfully unaware of T0's failure.  T1 will assume that it is supposed
to do T0's job, noting "rev < mc->hdr.rev".  T1 will write the MSR while
T0 is off doing god knows what.

T1 should not even be attempting to do ->apply_microcode() because T0 is
not quiescent.

> Replace apply_microcode() with a call to collect_cpu_info() and let that
> call also update the per-cpu structure instead of returning the previously
> cached values.

	To fix this, remove the path where T1 calls ->apply_microcode().
	However, this alone would leave the per-cpu metadata for T1 out
	of date.  Call collect_cpu_info() to ensure it is updated.

Right?

FWIW, this seems a bit hacky and inconsistent to me.  It would be nice
if the common T0/T1 work (updating the per-cpu metadata) was done with
common code.

Could we zap the uci->cpu_sig.rev work entirely from ->apply_microcode()
and do it in __reload_late() instead?
Re: [Patch v3 Part2 4/9] x86/microcode: Do not call apply_microcode() on sibling threads
Posted by Borislav Petkov 2 years, 10 months ago
On Wed, Feb 01, 2023 at 02:21:18PM -0800, Dave Hansen wrote:
> That works great, unless T0 experiences an error.  In that case, T0 will
> jump out of __reload_late() after failing to do the update.  T1 will
> come bumbling along after it and will enter ->apply_microcode(),
> blissfully unaware of T0's failure.  T1 will assume that it is supposed
> to do T0's job, noting "rev < mc->hdr.rev".  T1 will write the MSR while
> T0 is off doing god knows what.
> 
> T1 should not even be attempting to do ->apply_microcode() because T0 is
> not quiescent.

Yah, thanks for explaining properly.

So, if T0 fails, then we will say that it failed. The ->apply_microcode()
call on T1 was never meant to apply any microcode - just to update the
cached data.

Now, if T0 fails, then it doesn't matter what T1 does - you have a
bigger problem:

A subset of the cores is running with new microcode while other subset
with the old one. Now this is a shit situation I don't want to be in.

And I don't have a good way out of it.

Revert to the old patch? Maybe...

Retry to application on all again with the hope that it works this time?

What if some core touches a MSR being added with the new microcode
patch?

Late loading is a big PITA. As we've been preaching for a while now.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [Patch v3 Part2 4/9] x86/microcode: Do not call apply_microcode() on sibling threads
Posted by Ashok Raj 2 years, 10 months ago
On Wed, Feb 01, 2023 at 11:40:58PM +0100, Borislav Petkov wrote:
> On Wed, Feb 01, 2023 at 02:21:18PM -0800, Dave Hansen wrote:
> > That works great, unless T0 experiences an error.  In that case, T0 will
> > jump out of __reload_late() after failing to do the update.  T1 will
> > come bumbling along after it and will enter ->apply_microcode(),
> > blissfully unaware of T0's failure.  T1 will assume that it is supposed
> > to do T0's job, noting "rev < mc->hdr.rev".  T1 will write the MSR while
> > T0 is off doing god knows what.
> > 
> > T1 should not even be attempting to do ->apply_microcode() because T0 is
> > not quiescent.
> 
> Yah, thanks for explaining properly.
> 
> So, if T0 fails, then we will say that it failed. The ->apply_microcode()
> call on T1 was never meant to apply any microcode - just to update the
> cached data.

The commit log "attempted" to convey that, replace primary with T0, and
secondary with T1.

> 
> Now, if T0 fails, then it doesn't matter what T1 does - you have a
> bigger problem:
> 
> A subset of the cores is running with new microcode while other subset
> with the old one. Now this is a shit situation I don't want to be in.
> 
> And I don't have a good way out of it.

T1 shouldn't be sent down the apply_microcode() path, but instead
just gather and update the per-cpu info reflecting the current revision.

Patch 3 and 4 attempted to that. 

In addition....to ensure cores being out of sync within themselves

At wait_for_siblings: Each thread can check their rev against the BSP (yes
BSP can be removed, but we can elect a core leader) and if they are
different we can either warn/taint or pull the plug and panic.


> 
> Revert to the old patch? Maybe...
> 
> Retry to application on all again with the hope that it works this time?
> 
> What if some core touches a MSR being added with the new microcode
> patch?
> 
> Late loading is a big PITA. As we've been preaching for a while now.
> 

Cheers,
Ashok
Re: [Patch v3 Part2 4/9] x86/microcode: Do not call apply_microcode() on sibling threads
Posted by Borislav Petkov 2 years, 10 months ago
On Wed, Feb 01, 2023 at 06:51:44PM -0800, Ashok Raj wrote:
> T1 shouldn't be sent down the apply_microcode() path, but instead
> just gather and update the per-cpu info reflecting the current revision.

As I said previously, it doesn't apply the microcode but simply updates
the cached info. The assumption was that T0 would not fail.

> At wait_for_siblings: Each thread can check their rev against the BSP (yes
> BSP can be removed, but we can elect a core leader) and if they are

A core leader?

The one who succeeded in doing the update?

> different we can either warn/taint or pull the plug and panic.

What about SMT=off? How are you gonna handle that? Who's the core leader
there?

What about the other vendor? That hackery of yours needs to be verified
there too.

So this whole deal needs to be analyzed properly. What would happen in
any potential outcome, how should the kernel behave in each case, etc,
etc.

In thinking about the minrev, I can just as well imagine that such
microcode patches which could cause such a failure should be marked and
not loaded late either. But I'm sure you don't want that.

In any case, without a proper analysis and writeup how that new
algorithm is going to look like and behave, this is not going anywhere.
No ad-hoc patches, no let's fix that aspect first.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [Patch v3 Part2 4/9] x86/microcode: Do not call apply_microcode() on sibling threads
Posted by Ashok Raj 2 years, 10 months ago
Thanks Boris,

On Thu, Feb 02, 2023 at 10:40:31AM +0100, Borislav Petkov wrote:
> On Wed, Feb 01, 2023 at 06:51:44PM -0800, Ashok Raj wrote:
> > T1 shouldn't be sent down the apply_microcode() path, but instead
> > just gather and update the per-cpu info reflecting the current revision.
> 
> As I said previously, it doesn't apply the microcode but simply updates
> the cached info. The assumption was that T0 would not fail.

That is super clear. But the conclusion was, in the case T0 fails there are
some side effect that are hard to manage with the current algorighm.  

That was the motivation for the  change. Sending T1 to just collect and
report the revision, but not via apply_microcode() but via some alternate
function that doesn't have any bad interaction in the possible case if T0
failed.

> 
> > At wait_for_siblings: Each thread can check their rev against the BSP (yes
> > BSP can be removed, but we can elect a core leader) and if they are
> 
> A core leader?
> 
> The one who succeeded in doing the update?

My bad, core leader is the wrong term,  I meant something like how MCE
selects a rendezvous leader today. IRC, what we call as the monarch in
MCE land when it handles broadcast MCE's.

> 
> > different we can either warn/taint or pull the plug and panic.
> 
> What about SMT=off? How are you gonna handle that? Who's the core leader
> there?

This is broken today, IRC, we started to check cpu_present_mask and
cpu_online_mask counts are equal. This change to allow was a late change.

07d981ad4cf1 ("x86/microcode: Allow late microcode loading with SMT disabled")

Since those threads are in mwait(), I'm thinking for correctness we need to
revert that change.

Reverting the patch and borking late-load if ht=off was used, seems the
mitigation to prevent that.

Most BIOS's should have a similar ht_off, and that is the best way to turn
off HT. Its probably better that way to get all execution core resources
reserved for that one active thread.

> 
> What about the other vendor? That hackery of yours needs to be verified
> there too.

You mean updating the revision only outside of the path to
apply_microcode()? 

Its only AMD/Intel today for microcode updates. Yes we should test and
validate if this assumption is correct. But that's the *assumption* today
correct?

> 
> So this whole deal needs to be analyzed properly. What would happen in
> any potential outcome, how should the kernel behave in each case, etc,
> etc.

Complete aggreement. The one case we overlooked in the current algorithm
was "what if any core is in disaggreement with each other"

If there are other cases that come to mind, I can invest in looking into
that.

> 
> In thinking about the minrev, I can just as well imagine that such
> microcode patches which could cause such a failure should be marked and
> not loaded late either. But I'm sure you don't want that.

Once there is minrev enforcement,  default action, this is automatic,
since we refuse to load anything with minrev=0. 

They automatically become eligible for early load only. 

> 
> In any case, without a proper analysis and writeup how that new
> algorithm is going to look like and behave, this is not going anywhere.
> No ad-hoc patches, no let's fix that aspect first.

I'm happy to add more into Documentation/x86/microcode.rst. I can work with
Dave Hansen on closing or enumerating the algorithm and why we made the
choices. 

Cheers,
Ashok