[PATCH for-4.15] x86/ucode/amd: Handle length sanity check failures more gracefully

Andrew Cooper posted 1 patch 3 years, 2 months ago
Test gitlab-ci failed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20210209214911.18461-1-andrew.cooper3@citrix.com
xen/arch/x86/cpu/microcode/amd.c | 24 ++++++++++++++++++++++--
1 file changed, 22 insertions(+), 2 deletions(-)
[PATCH for-4.15] x86/ucode/amd: Handle length sanity check failures more gracefully
Posted by Andrew Cooper 3 years, 2 months ago
Currently, a failure of verify_patch_size() causes an early abort of the
microcode blob loop, which in turn causes a second go around the main
container loop, ultimately failing the UCODE_MAGIC check.

First, check for errors after the blob loop.  An error here is unrecoverable,
so avoid going around the container loop again and printing an
unhelpful-at-best error concerning bad UCODE_MAGIC.

Second, split the verify_patch_size() check out of the microcode blob header
check.  In the case that the sanity check fails, we can still use the
known-to-be-plausible header length to continue walking the container to
potentially find other applicable microcode blobs.

Before:
  (XEN) microcode: Bad microcode data
  (XEN) microcode: Wrong microcode patch file magic
  (XEN) Parsing microcode blob error -22

After:
  (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa000
  (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa010
  (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa011
  (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa200
  (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa210
  (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa500
  (XEN) microcode: couldn't find any matching ucode in the provided blob!

Fixes: 4de936a38a ("x86/ucode/amd: Rework parsing logic in cpu_request_microcode()")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Ian Jackson <iwj@xenproject.org>

For 4.15.  Found when putting a test together to prove the correctness of the
"x86/ucode: Fix microcode payload size for Fam19 processors" fix.

This allows microcode loading to still function even if the length magic
numbers aren't correct for a subset of blobs within the container(s).
---
 xen/arch/x86/cpu/microcode/amd.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index c4ab395799..fe7b79bd0a 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -348,8 +348,7 @@ static struct microcode_patch *cpu_request_microcode(const void *buf, size_t siz
 
             if ( size < sizeof(*mc) ||
                  (mc = buf)->type != UCODE_UCODE_TYPE ||
-                 size - sizeof(*mc) < mc->len ||
-                 (!skip_ucode && !verify_patch_size(mc->len)) )
+                 size - sizeof(*mc) < mc->len )
             {
                 printk(XENLOG_ERR "microcode: Bad microcode data\n");
                 error = -EINVAL;
@@ -359,6 +358,19 @@ static struct microcode_patch *cpu_request_microcode(const void *buf, size_t siz
             if ( skip_ucode )
                 goto skip;
 
+            if ( !verify_patch_size(mc->len) )
+            {
+                printk(XENLOG_WARNING
+                       "microcode: Bad microcode length 0x%08x for cpu 0x%04x\n",
+                       mc->len, mc->patch->processor_rev_id);
+                /*
+                 * If the blob size sanity check fails, trust the container
+                 * length which has already been checked to be at least
+                 * plausible at this point.
+                 */
+                goto skip;
+            }
+
             /*
              * If the new ucode covers current CPU, compare ucodes and store the
              * one with higher revision.
@@ -382,6 +394,14 @@ static struct microcode_patch *cpu_request_microcode(const void *buf, size_t siz
             if ( size >= 4 && *(const uint32_t *)buf == UCODE_MAGIC )
                 break;
         }
+
+        /*
+         * Any error means we didn't get cleanly to the end of the microcode
+         * container.  There isn't an overall length field, so we've got no
+         * way of skipping to the next container in the stream.
+         */
+        if ( error )
+            break;
     }
 
     if ( saved )
-- 
2.11.0


Re: [PATCH for-4.15] x86/ucode/amd: Handle length sanity check failures more gracefully
Posted by Jan Beulich 3 years, 2 months ago
On 09.02.2021 22:49, Andrew Cooper wrote:
> Currently, a failure of verify_patch_size() causes an early abort of the
> microcode blob loop, which in turn causes a second go around the main
> container loop, ultimately failing the UCODE_MAGIC check.
> 
> First, check for errors after the blob loop.  An error here is unrecoverable,
> so avoid going around the container loop again and printing an
> unhelpful-at-best error concerning bad UCODE_MAGIC.
> 
> Second, split the verify_patch_size() check out of the microcode blob header
> check.  In the case that the sanity check fails, we can still use the
> known-to-be-plausible header length to continue walking the container to
> potentially find other applicable microcode blobs.

Since the code comment you add further clarifies this, if my
understanding here is correct that you don't think we should
mistrust the entire container in such a case ...

> Before:
>   (XEN) microcode: Bad microcode data
>   (XEN) microcode: Wrong microcode patch file magic
>   (XEN) Parsing microcode blob error -22
> 
> After:
>   (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa000
>   (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa010
>   (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa011
>   (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa200
>   (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa210
>   (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa500
>   (XEN) microcode: couldn't find any matching ucode in the provided blob!
> 
> Fixes: 4de936a38a ("x86/ucode/amd: Rework parsing logic in cpu_request_microcode()")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

After all we're trying to balance between detecting broken
containers and having wrong constants ourselves. Personally
I'd be more inclined to err on the safe side and avoid
further loading attempts, but I can see the alternative
perspective also being a reasonable one.

Jan

Re: [PATCH for-4.15] x86/ucode/amd: Handle length sanity check failures more gracefully
Posted by Andrew Cooper 3 years, 2 months ago
On 10/02/2021 10:55, Jan Beulich wrote:
> On 09.02.2021 22:49, Andrew Cooper wrote:
>> Currently, a failure of verify_patch_size() causes an early abort of the
>> microcode blob loop, which in turn causes a second go around the main
>> container loop, ultimately failing the UCODE_MAGIC check.
>>
>> First, check for errors after the blob loop.  An error here is unrecoverable,
>> so avoid going around the container loop again and printing an
>> unhelpful-at-best error concerning bad UCODE_MAGIC.
>>
>> Second, split the verify_patch_size() check out of the microcode blob header
>> check.  In the case that the sanity check fails, we can still use the
>> known-to-be-plausible header length to continue walking the container to
>> potentially find other applicable microcode blobs.
> Since the code comment you add further clarifies this, if my
> understanding here is correct that you don't think we should
> mistrust the entire container in such a case ...
>
>> Before:
>>   (XEN) microcode: Bad microcode data
>>   (XEN) microcode: Wrong microcode patch file magic
>>   (XEN) Parsing microcode blob error -22
>>
>> After:
>>   (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa000
>>   (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa010
>>   (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa011
>>   (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa200
>>   (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa210
>>   (XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa500
>>   (XEN) microcode: couldn't find any matching ucode in the provided blob!
>>
>> Fixes: 4de936a38a ("x86/ucode/amd: Rework parsing logic in cpu_request_microcode()")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

>
> After all we're trying to balance between detecting broken
> containers and having wrong constants ourselves. Personally
> I'd be more inclined to err on the safe side and avoid
> further loading attempts, but I can see the alternative
> perspective also being a reasonable one.

The more I learn, the more I'm starting to mistrust the containers.

But as we don't know whether it is us or the container at fault - and
have an example where we are at fault - I don't think blocking loading
is an appropriate thing to do.  (Amongst other things, it totally kills
any ability to test this interface.)

~Andrew