Making microcode late-load robust

[PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode

Posted by Ashok Raj 3 years, 7 months ago

When save_microcode_patch() is looking to replace an existing microcode in
the cache, current code is *only* checks the CPU sig/pf in the main
header. Microcode can carry additional sig/pf combinations in the extended
signature table, which is completely missed today.

For e.g. Current patch is a multi-stepping patch and new incoming patch is
a specific patch just for this CPUs stepping.

patch1:
fms3 <--- header FMS
...
ext_sig:
fms1
fms2

patch2: new
fms2 <--- header FMS

Current code takes only fms3 and checks with patch2 fms2.

saved_patch.header.fms3 != new_patch.header.fms2, so save_microcode_patch
saves it to the end of list instead of replacing patch1 with patch2.

There is no functional user observable issue since find_patch() skips
patch versions that are <= current_patch and will land on patch2 properly.

Nevertheless this will just end up storing every patch that isn't required.
Kernel just needs to store the latest patch. Otherwise its a memory leak
that sits in kernel and never used.

Cc: stable@vger.kernel.org
Fixes: fe055896c040 ("x86/microcode: Merge the early microcode loader")
Tested-by: William Xie <william.xie@intel.com>
Reported-by: William Xie <william.xie@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 025c8f0cd948..c4b11e2fbe33 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -114,10 +114,18 @@ static void save_microcode_patch(struct ucode_cpu_info *uci, void *data, unsigne
 
 	list_for_each_entry_safe(iter, tmp, &microcode_cache, plist) {
 		mc_saved_hdr = (struct microcode_header_intel *)iter->data;
-		sig	     = mc_saved_hdr->sig;
-		pf	     = mc_saved_hdr->pf;
 
-		if (find_matching_signature(data, sig, pf)) {
+		sig = uci->cpu_sig.sig;
+		pf  = uci->cpu_sig.pf;
+
+		/*
+		 * Compare the current CPUs signature with the ones in the
+		 * cache to identify the right candidate to replace. At any
+		 * given time, we should have no more than one valid patch
+		 * file for a given CPU fms+pf in the cache list.
+		 */
+
+		if (find_matching_signature(iter->data, sig, pf)) {
 			prev_found = true;
 
 			if (mc_hdr->rev <= mc_saved_hdr->rev)
-- 
2.32.0

Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode

Posted by Borislav Petkov 3 years, 7 months ago

On Wed, Aug 17, 2022 at 05:11:23AM +0000, Ashok Raj wrote:
> When save_microcode_patch() is looking to replace an existing microcode in
> the cache, current code is *only* checks the CPU sig/pf in the main

Write those "sig/pf" things out once so that it is clear what that is.

> header. Microcode can carry additional sig/pf combinations in the extended
> signature table, which is completely missed today.
> 
> For e.g. Current patch is a multi-stepping patch and new incoming patch is
> a specific patch just for this CPUs stepping.
> 
> patch1:
> fms3 <--- header FMS
> ...
> ext_sig:
> fms1
> fms2
> 
> patch2: new
> fms2 <--- header FMS
> 
> Current code takes only fms3 and checks with patch2 fms2.

So, find_matching_signature() does all the signatures matching and
scanning already. If anything, that function should tell its callers
whether the patch it is looking at - the fms2 one - should replace the
current one or not.

I.e., all the logic to say how strong a patch match is, should be
concentrated there. And then the caller will do the according action.

> saved_patch.header.fms3 != new_patch.header.fms2, so save_microcode_patch
> saves it to the end of list instead of replacing patch1 with patch2.
> 
> There is no functional user observable issue since find_patch() skips
> patch versions that are <= current_patch and will land on patch2 properly.
> 
> Nevertheless this will just end up storing every patch that isn't required.
> Kernel just needs to store the latest patch. Otherwise its a memory leak
> that sits in kernel and never used.

Oh well.

> Cc: stable@vger.kernel.org

Why?

This looks like a small correction to me which doesn't need to go to
stable...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode

Posted by Ashok Raj 3 years, 7 months ago

On Fri, Aug 19, 2022 at 12:24:41PM +0200, Borislav Petkov wrote:
> On Wed, Aug 17, 2022 at 05:11:23AM +0000, Ashok Raj wrote:
> > When save_microcode_patch() is looking to replace an existing microcode in
> > the cache, current code is *only* checks the CPU sig/pf in the main
> 
> Write those "sig/pf" things out once so that it is clear what that is.

Thanks, will do.
> 
> > header. Microcode can carry additional sig/pf combinations in the extended
> > signature table, which is completely missed today.
> > 
> > For e.g. Current patch is a multi-stepping patch and new incoming patch is
> > a specific patch just for this CPUs stepping.
> > 
> > patch1:
> > fms3 <--- header FMS
> > ...
> > ext_sig:
> > fms1
> > fms2
> > 
> > patch2: new
> > fms2 <--- header FMS
> > 
> > Current code takes only fms3 and checks with patch2 fms2.
> 
> So, find_matching_signature() does all the signatures matching and
> scanning already. If anything, that function should tell its callers
> whether the patch it is looking at - the fms2 one - should replace the
> current one or not.
> 
> I.e., all the logic to say how strong a patch match is, should be
> concentrated there. And then the caller will do the according action.

I updated the commit log accordingly. Basically find_matching_signature()
is only intended to find a CPU's sig/pf against a microcode image and not
intended to compare between two different images. 
> 
> > saved_patch.header.fms3 != new_patch.header.fms2, so save_microcode_patch
> > saves it to the end of list instead of replacing patch1 with patch2.
> > 
> > There is no functional user observable issue since find_patch() skips
> > patch versions that are <= current_patch and will land on patch2 properly.
> > 
> > Nevertheless this will just end up storing every patch that isn't required.
> > Kernel just needs to store the latest patch. Otherwise its a memory leak
> > that sits in kernel and never used.
> 
> Oh well.
> 
> > Cc: stable@vger.kernel.org
> 
> Why?

We have some code to support model specific microcode rollback support.
This code is just internal. That codebase triggered the bug. 

I'll drop the Cc next time.

Cheers,
Ashok

Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode

Posted by Borislav Petkov 3 years, 7 months ago

On Tue, Aug 23, 2022 at 11:13:13AM +0000, Ashok Raj wrote:
> > > patch1:
> > > fms3 <--- header FMS
> > > ...
> > > ext_sig:
> > > fms1
> > > fms2
> > > 
> > > patch2: new
> > > fms2 <--- header FMS
> > > 
> > > Current code takes only fms3 and checks with patch2 fms2.
> > 
> > So, find_matching_signature() does all the signatures matching and
> > scanning already. If anything, that function should tell its callers
> > whether the patch it is looking at - the fms2 one - should replace the
> > current one or not.
> > 
> > I.e., all the logic to say how strong a patch match is, should be
> > concentrated there. And then the caller will do the according action.
> 
> I updated the commit log accordingly. Basically find_matching_signature()
> is only intended to find a CPU's sig/pf against a microcode image and not
> intended to compare between two different images. 

Err, what?

find_matching_signature() looks at fmt3 - your example above - and then
goes and looks at ext_sig. Also your example above.

So you can teach that function to say with a *separate* return value
"replace current patch with this new patch because this new patch is a
better fit."

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode

Posted by Ashok Raj 3 years, 7 months ago

On Wed, Aug 24, 2022 at 09:27:55PM +0200, Borislav Petkov wrote:
> On Tue, Aug 23, 2022 at 11:13:13AM +0000, Ashok Raj wrote:
> > > > patch1:
> > > > fms3 <--- header FMS
> > > > ...
> > > > ext_sig:
> > > > fms1
> > > > fms2
> > > > 
> > > > patch2: new
> > > > fms2 <--- header FMS
> > > > 
> > > > Current code takes only fms3 and checks with patch2 fms2.
> > > 
> > > So, find_matching_signature() does all the signatures matching and
> > > scanning already. If anything, that function should tell its callers
> > > whether the patch it is looking at - the fms2 one - should replace the
> > > current one or not.
> > > 
> > > I.e., all the logic to say how strong a patch match is, should be
> > > concentrated there. And then the caller will do the according action.
> > 
> > I updated the commit log accordingly. Basically find_matching_signature()
> > is only intended to find a CPU's sig/pf against a microcode image and not
> > intended to compare between two different images. 
> 
> Err, what?
> 
> find_matching_signature() looks at fmt3 - your example above - and then
> goes and looks at ext_sig. Also your example above.

-               sig          = mc_saved_hdr->sig;
-               pf           = mc_saved_hdr->pf;

-               if (find_matching_signature(data, sig, pf)) {

In the above example, 

Old patch header rev is fms3
New patch header is fms2
Your CPU sig is fms2. 

The code was taking mc_saved_hdr->sig (which is fms3) and looking in the
new patch (which only has fms2) now you will never find and end up in the
situation described, which is adding the new patch to the tail instead of
replacing patch1. 

Also look at all the existing usages of the function and we *always* use
the fms of the CPU.

Bottom line, you never need to have more than 1 patch for a given CPU. The
list only exists since it could technically support multi-stepping in the
system.

> 
> So you can teach that function to say with a *separate* return value
> "replace current patch with this new patch because this new patch is a
> better fit."

I didn't quite understand what needs to be taught. 

FWIW, I rewrote the commit  log with *lots* help from Dave. Hopefully the
full rewrite of the log can help jog memory about the purpose and why this
has survived all these years. Simply because the code still is capable of
picking the right microcode except in our internal testing it showed its
warts. 

The new commit log below: 

== Background ==

Microcode needs to be reapplied upon a resume from suspend flow. For this
purpose kernel saves a copy of the microcode after its loaded. Its possible
to build systems with two or more CPUs that have incompatible microcodes,
like if the CPUs have different stepping and each have separate microcode
files.

save_microcode_patch() is called when it comes time to load a new microcode
image into kernel. Its job is to either ADD this to the kernel cache list
if one didn't exist, or REPLACE if one was already saved in the list.
save_microcode_patch() is expected to search the saved list for the
specific CPU in the system, and replace it with the new incoming patch
being updated.

At any given time, there *must* be exactly one entry in the cache for a
given CPU in the system. If one isn't found, save_microcode_patch() just
adds this image to the list.

Lastly, microcode images have a CPU signature for compatible processors
that this image applies. The signature can be found in two places.

- The microcode header
- In the extended signature table in the image. This is located at the end
  of the image itself.

for e.g. The image would consist of

microcode header
    { rev, date, checksum, sigx }
image
    { microcode image }

extended_signature_table.
    sig1
    sig2
    sig3

When Intel releases microcode images, typically the most recent stepping is
listed in the main microcode header, and the additional (older) ones are
listed in the extended signature table.

== Problem ==
The kernel is supposed to match the CPUs signature (struct ucode_cpu_info)
with the existing patches in the microcode_cache list. But, instead it
tries to match the signature in the old microcode, against the one in the
saved patch list in microcode_cache.

This would seem to make perfect sense if both old and new had the same
signature listed in the microcode header. But if the current CPU is listed
in the extended signature (say like sig1, above), and the new patch had a
signature in the main header as sig2, and only sig2 in the new patch.
Current code would take sigx from the current image and compare with sig2.
this is a mismatch and yield no search results, and save_microcode_patch,
will keep both the old and new images. 

== Impact ==

When find_patch() is called since the code only checks for entries in the
cache > what's loaded in the CPU, it will skip the one at the head (which
is the older revision) and find the one at the tail, which is the newer
revision.

-- Problem 1--
So what exactly is the problem? find_patch() will always find the correct
patch to update on resume. But we end up storing an extra image that would
never be required. 

-- Problem 2 --
There was some internal version with support for microcode rollback, which
permits picking revisions that are older than currently loaded.
microcode_cache now has both the old and the new, in some situations it
selected the older patch instead of picking the newer revision at the end
of the list.

[Note, the code that surfaced the bug was purely for internal evaluation
purposes.]

Nevertheless this will just end up storing every patch that isn't required.
Kernel just needs to store the latest patch. Otherwise its a memory leak
that sits in kernel and never used. This is mostly harmless.

== Solution ==
Always search for an entry in the microcode_cache, with the interested CPU
signature instead of using the signature found in another microcode entry.

In fact every other usage of find_matching_microcode() is always called
with the CPU signature. This is the *only* instance that used the wrong
values to compare.

Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode

Posted by Borislav Petkov 3 years, 7 months ago

On Thu, Aug 25, 2022 at 03:27:57AM +0000, Ashok Raj wrote:
> Old patch header rev is fms3
> New patch header is fms2
> Your CPU sig is fms2.

Can you pls give me exact reproduction instructions how you come to this
situation? 

dhansen and I are talking about this whole deal on IRC and I'm having
more questions than answers so I'd like to reproduce this locally.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode

Posted by Ashok Raj 3 years, 7 months ago

On Fri, Aug 26, 2022 at 06:24:02PM +0200, Borislav Petkov wrote:
> On Thu, Aug 25, 2022 at 03:27:57AM +0000, Ashok Raj wrote:
> > Old patch header rev is fms3
> > New patch header is fms2
> > Your CPU sig is fms2.
> 
> Can you pls give me exact reproduction instructions how you come to this
> situation? 

Sure, the problem is that code base is something I *never* wanted to post,
since its not architectural.

The hard part is to split some of the internal code out and have a plain
version on public tree that can reproduce. 

You may need additional hackery since you won't have the additional
revisions for testing. I'll try to hack something up

Cheers,
Ashok

Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode

Posted by Borislav Petkov 3 years, 7 months ago

On Fri, Aug 26, 2022 at 05:18:32PM +0000, Ashok Raj wrote:
> Sure, the problem is that code base is something I *never* wanted to post,
> since its not architectural.

I don't need the code base - I need the microcode patches you're using
to trigger this.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode

Posted by Ingo Molnar 3 years, 7 months ago

* Ashok Raj <ashok.raj@intel.com> wrote:

> Cc: stable@vger.kernel.org

Not sure the series justifies stable - there's like a ton of things that 
could go wrong with a series like this & we want to have some serious 
testing first.

>  	list_for_each_entry_safe(iter, tmp, &microcode_cache, plist) {
>  		mc_saved_hdr = (struct microcode_header_intel *)iter->data;
> -		sig	     = mc_saved_hdr->sig;
> -		pf	     = mc_saved_hdr->pf;
>  
> -		if (find_matching_signature(data, sig, pf)) {
> +		sig = uci->cpu_sig.sig;
> +		pf  = uci->cpu_sig.pf;
> +
> +		/*
> +		 * Compare the current CPUs signature with the ones in the

s/CPUs
 /CPU's

Thanks,

	Ingo

Re: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode

Posted by Ashok Raj 3 years, 7 months ago

Hi Ingo

On Wed, Aug 17, 2022 at 09:43:07AM +0200, Ingo Molnar wrote:
> 
> * Ashok Raj <ashok.raj@intel.com> wrote:
> 
> > Cc: stable@vger.kernel.org
> 
> Not sure the series justifies stable - there's like a ton of things that 
> could go wrong with a series like this & we want to have some serious 
> testing first.

Its my bad I got lazy and put this patch in the series. This is a *bug* in
existing code, and completely unrelated.  I must have submitted it
just by itself. 

Boris, let me know if you want me to resubmit and I'll do this by itself.

> 
> >  	list_for_each_entry_safe(iter, tmp, &microcode_cache, plist) {
> >  		mc_saved_hdr = (struct microcode_header_intel *)iter->data;
> > -		sig	     = mc_saved_hdr->sig;
> > -		pf	     = mc_saved_hdr->pf;
> >  
> > -		if (find_matching_signature(data, sig, pf)) {
> > +		sig = uci->cpu_sig.sig;
> > +		pf  = uci->cpu_sig.pf;
> > +
> > +		/*
> > +		 * Compare the current CPUs signature with the ones in the
> 
> s/CPUs
>  /CPU's

Thanks, will update.