[v1] x86/ucode: Support loading latest ucode from linux-firmware

[PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware

Posted by Andrew Cooper 1 week, 2 days ago

When Entrysign has been mitigated in firmware, it is believed to be safe to
pass blobs to the CPU again.  This avoids the need to update the digest
table for new microcodes.

Relax the digest check when firmware looks to be up to date, and leave behind
a clear message when not.

This is best-effort only.  If a malicious microcode has been loaded prior to
Xen running, then all bets are off.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

I need to double check the revision table.  I think I need to submit a
correction to Linux first.
---
 xen/arch/x86/cpu/microcode/amd.c     | 81 +++++++++++++++++++++++++++-
 xen/arch/x86/cpu/microcode/core.c    |  2 +
 xen/arch/x86/cpu/microcode/private.h |  2 +
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/cpu/microcode/amd.c b/xen/arch/x86/cpu/microcode/amd.c
index 30bddc89da0a..b5b55b7a00cd 100644
--- a/xen/arch/x86/cpu/microcode/amd.c
+++ b/xen/arch/x86/cpu/microcode/amd.c
@@ -101,6 +101,7 @@ static const struct patch_digest {
 } patch_digests[] = {
 #include "amd-patch-digests.c"
 };
+static bool __ro_after_init entrysign_mitigiated_in_firmware;
 
 static int cf_check cmp_patch_id(const void *key, const void *elem)
 {
@@ -125,7 +126,7 @@ static bool check_digest(const struct container_microcode *mc)
      * microcode updates.  Mitigate by checking the digest of the patch
      * against a list of known provenance.
      */
-    if ( boot_cpu_data.family < 0x17 ||
+    if ( boot_cpu_data.family < 0x17 || entrysign_mitigiated_in_firmware ||
          !opt_digest_check )
         return true;
 
@@ -597,3 +598,81 @@ static void __init __constructor test_digests_sorted(void)
     }
 }
 #endif /* CONFIG_SELF_TESTS */
+
+/*
+ * The Entrysign vulnerability affects all Zen1 thru Zen5 CPUs.  Firmware
+ * fixes were produced in Nov/Dec 2024.  Zen3 thru Zen5 can continue to take
+ * OS-loadable microcode updates using a new signature scheme, as long as
+ * firmware has been updated first.
+ */
+void __init amd_check_entrysign(void)
+{
+    unsigned int curr_rev;
+    uint8_t fixed_rev;
+
+    if ( boot_cpu_data.vendor != X86_VENDOR_AMD ||
+         boot_cpu_data.family < 0x17 ||
+         boot_cpu_data.family > 0x1a )
+        return;
+
+    /*
+     * Table taken from Linux, which is the only known source of information
+     * about client revisions.
+     */
+    curr_rev = this_cpu(cpu_sig).rev;
+    switch ( curr_rev >> 8 )
+    {
+    case 0x080012: fixed_rev = 0x6f; break;
+    case 0x080082: fixed_rev = 0x0f; break;
+    case 0x083010: fixed_rev = 0x7c; break;
+    case 0x086001: fixed_rev = 0x0e; break;
+    case 0x086081: fixed_rev = 0x08; break;
+    case 0x087010: fixed_rev = 0x34; break;
+    case 0x08a000: fixed_rev = 0x0a; break;
+    case 0x0a0010: fixed_rev = 0x7a; break;
+    case 0x0a0011: fixed_rev = 0xda; break;
+    case 0x0a0012: fixed_rev = 0x43; break;
+    case 0x0a0082: fixed_rev = 0x0e; break;
+    case 0x0a1011: fixed_rev = 0x53; break;
+    case 0x0a1012: fixed_rev = 0x4e; break;
+    case 0x0a1081: fixed_rev = 0x09; break;
+    case 0x0a2010: fixed_rev = 0x2f; break;
+    case 0x0a2012: fixed_rev = 0x12; break;
+    case 0x0a4041: fixed_rev = 0x09; break;
+    case 0x0a5000: fixed_rev = 0x13; break;
+    case 0x0a6012: fixed_rev = 0x0a; break;
+    case 0x0a7041: fixed_rev = 0x09; break;
+    case 0x0a7052: fixed_rev = 0x08; break;
+    case 0x0a7080: fixed_rev = 0x09; break;
+    case 0x0a70c0: fixed_rev = 0x09; break;
+    case 0x0aa001: fixed_rev = 0x16; break;
+    case 0x0aa002: fixed_rev = 0x18; break;
+    case 0x0b0021: fixed_rev = 0x46; break;
+    case 0x0b1010: fixed_rev = 0x46; break;
+    case 0x0b2040: fixed_rev = 0x31; break;
+    case 0x0b4040: fixed_rev = 0x31; break;
+    case 0x0b6000: fixed_rev = 0x31; break;
+    case 0x0b7000: fixed_rev = 0x31; break;
+    default:
+        printk(XENLOG_WARNING
+               "Unrecognised CPU %02x-%02x-%02x ucode 0x%08x, assuming vulnerable to Entrysign\n",
+               boot_cpu_data.family, boot_cpu_data.model,
+               boot_cpu_data.stepping, curr_rev);
+        return;
+    }
+
+    /*
+     * This check is best-effort.  If the platform looks to be out of date, it
+     * probably is.  If the platform looks to be fixed, it either genuinely
+     * is, or malware has gotten in before Xen booted and all bets are off.
+     */
+    if ( (uint8_t)curr_rev >= fixed_rev )
+    {
+        entrysign_mitigiated_in_firmware = true;
+        return;
+    }
+
+    printk(XENLOG_ERR
+           "Platform vulnerable to Entrysign (SB-7033, CVE-2024-36347) - firmware update required\n");
+    add_taint(TAINT_CPU_OUT_OF_SPEC);
+}
diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
index 2705bb43c97f..1d1a5aa4b097 100644
--- a/xen/arch/x86/cpu/microcode/core.c
+++ b/xen/arch/x86/cpu/microcode/core.c
@@ -750,6 +750,8 @@ static int __init early_microcode_load(struct boot_info *bi)
     int idx = opt_mod_idx;
     int rc;
 
+    amd_check_entrysign();
+
     /*
      * Cmdline parsing ensures this invariant holds, so that we don't end up
      * trying to mix multiple ways of finding the microcode.
diff --git a/xen/arch/x86/cpu/microcode/private.h b/xen/arch/x86/cpu/microcode/private.h
index f5e2bfee00d9..e6c965dc99dd 100644
--- a/xen/arch/x86/cpu/microcode/private.h
+++ b/xen/arch/x86/cpu/microcode/private.h
@@ -81,8 +81,10 @@ extern bool opt_digest_check;
  */
 #ifdef CONFIG_AMD
 void ucode_probe_amd(struct microcode_ops *ops);
+void amd_check_entrysign(void);
 #else
 static inline void ucode_probe_amd(struct microcode_ops *ops) {}
+static inline void amd_check_entrysign(void) {}
 #endif
 
 #ifdef CONFIG_INTEL
-- 
2.39.5
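
The net effect of the one-line change to check_digest() can be modelled as a pure predicate. This is a stand-alone sketch rather than Xen code: digest_lookup_bypassed() is a hypothetical name, and its three parameters stand in for boot_cpu_data.family, entrysign_mitigiated_in_firmware and opt_digest_check.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the early return at the top of check_digest()
 * once this patch is applied.  Returns true when the blob is accepted
 * without consulting the table of known digests.
 */
static bool digest_lookup_bypassed(unsigned int family, bool fw_mitigated,
                                   bool opt_digest_check)
{
    return family < 0x17 ||    /* Pre-Zen1: no digest table applies.         */
           fw_mitigated ||     /* New: firmware looks to have fixed Entrysign. */
           !opt_digest_check;  /* Check explicitly disabled by the admin.     */
}
```

With the patch applied, an up-to-date Fam19h system skips the digest lookup, while an out-of-date one still consults the table unless the check has been disabled.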

Re: [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware

Posted by Jan Beulich 1 week, 1 day ago

On 20.10.2025 15:19, Andrew Cooper wrote:
> @@ -597,3 +598,81 @@ static void __init __constructor test_digests_sorted(void)
>      }
>  }
>  #endif /* CONFIG_SELF_TESTS */
> +
> +/*
> + * The Entrysign vulnerability affects all Zen1 thru Zen5 CPUs.

And older ones are fine, or merely have no fixes produced?

>  Firmware
> + * fixes were produced in Nov/Dec 2024.  Zen3 thru Zen5 can continue to take
> + * OS-loadable microcode updates using a new signature scheme, as long as
> + * firmware has been updated first.
> + */

Yet what about Zen1/2?

> +void __init amd_check_entrysign(void)
> +{
> +    unsigned int curr_rev;
> +    uint8_t fixed_rev;
> +
> +    if ( boot_cpu_data.vendor != X86_VENDOR_AMD ||
> +         boot_cpu_data.family < 0x17 ||
> +         boot_cpu_data.family > 0x1a )
> +        return;
> +
> +    /*
> +     * Table taken from Linux, which is the only known source of information
> +     * about client revisions.
> +     */
> +    curr_rev = this_cpu(cpu_sig).rev;
> +    switch ( curr_rev >> 8 )
> +    {
> +    case 0x080012: fixed_rev = 0x6f; break;
> +    case 0x080082: fixed_rev = 0x0f; break;

In your reply you mentioned a "general off-by-1" when comparing with Linux,
but I'm in trouble understanding how both can be correct. Leaving aside the
1st line (for which you sent a Linux patch anyway), how can our
"(uint8_t)curr_rev >= fixed_rev" (i.e. "(uint8_t)curr_rev >= 0x0f") further
below be correct at the same time as Linux'es "return cur_rev <= 0x800820f"
(indicating to the caller whether a SHA check is needed) is also correct?
We say 0x0f is okay, while they demand a SHA check for that revision.

In any event, whatever (legitimate) off-by-1 it is that I'm failing to spot,
I think this would want explaining in the comment above.

> +    case 0x083010: fixed_rev = 0x7c; break;
> +    case 0x086001: fixed_rev = 0x0e; break;
> +    case 0x086081: fixed_rev = 0x08; break;
> +    case 0x087010: fixed_rev = 0x34; break;
> +    case 0x08a000: fixed_rev = 0x0a; break;
> +    case 0x0a0010: fixed_rev = 0x7a; break;
> +    case 0x0a0011: fixed_rev = 0xda; break;
> +    case 0x0a0012: fixed_rev = 0x43; break;
> +    case 0x0a0082: fixed_rev = 0x0e; break;
> +    case 0x0a1011: fixed_rev = 0x53; break;
> +    case 0x0a1012: fixed_rev = 0x4e; break;
> +    case 0x0a1081: fixed_rev = 0x09; break;
> +    case 0x0a2010: fixed_rev = 0x2f; break;
> +    case 0x0a2012: fixed_rev = 0x12; break;
> +    case 0x0a4041: fixed_rev = 0x09; break;
> +    case 0x0a5000: fixed_rev = 0x13; break;
> +    case 0x0a6012: fixed_rev = 0x0a; break;
> +    case 0x0a7041: fixed_rev = 0x09; break;
> +    case 0x0a7052: fixed_rev = 0x08; break;
> +    case 0x0a7080: fixed_rev = 0x09; break;
> +    case 0x0a70c0: fixed_rev = 0x09; break;
> +    case 0x0aa001: fixed_rev = 0x16; break;
> +    case 0x0aa002: fixed_rev = 0x18; break;
> +    case 0x0b0021: fixed_rev = 0x46; break;
> +    case 0x0b1010: fixed_rev = 0x46; break;
> +    case 0x0b2040: fixed_rev = 0x31; break;
> +    case 0x0b4040: fixed_rev = 0x31; break;
> +    case 0x0b6000: fixed_rev = 0x31; break;
> +    case 0x0b7000: fixed_rev = 0x31; break;

Without at least brief model related comments this looks extremely opaque.
Linux, as a minimal reference, at least has cpuid_to_ucode_rev() and the
accompanying union zen_patch_rev. Background of my remark is that I would
have expected there to be more models per Zen<N>, seeing in particular how
many different BKDGs / PPRs and RGs there are. Many RGs in particular say
they apply to a range of models, yet no similar ranges are covered here
(unless my deciphering attempts went wrong).

Jan

Re: [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware

Posted by Andrew Cooper 1 week ago

On 21/10/2025 10:47 am, Jan Beulich wrote:
> On 20.10.2025 15:19, Andrew Cooper wrote:
>> @@ -597,3 +598,81 @@ static void __init __constructor test_digests_sorted(void)
>>      }
>>  }
>>  #endif /* CONFIG_SELF_TESTS */
>> +
>> +/*
>> + * The Entrysign vulnerability affects all Zen1 thru Zen5 CPUs.
> And older ones are fine, or merely have no fixes produced?

Unknown.  Everything prior to Zen1 is fully out of support from AMD.

>
>>  Firmware
>> + * fixes were produced in Nov/Dec 2024.  Zen3 thru Zen5 can continue to take
>> + * OS-loadable microcode updates using a new signature scheme, as long as
>> + * firmware has been updated first.
>> + */
> Yet what about Zen1/2?

There's no signature fix for Zen1/2.  The "fix" disables microcode
loading, and firmware updates are the only way to get new fixes.

This is footnote 1 in the bulletin.  "Microcode cannot be hot-loaded
after updating to this PI version."
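
The split described here can be summarised per CPUID family, since Zen1/Zen2 are Fam17h, Zen3/Zen4 are Fam19h and Zen5 is Fam1Ah. The enum and policy_after_fix() below are hypothetical and not part of the patch; they only restate the behaviour described in this thread once fixed firmware is installed.

```c
#include <assert.h>

/* Hypothetical summary of microcode loading after the Entrysign firmware fix. */
enum fixed_fw_ucode_policy {
    POLICY_OUT_OF_SCOPE,           /* Outside the Zen1-Zen5 range.            */
    POLICY_FIRMWARE_ONLY,          /* Hot-loading disabled; firmware updates. */
    POLICY_HOT_LOAD_NEW_SIGNATURE, /* OS loading continues, new signatures.   */
};

static enum fixed_fw_ucode_policy policy_after_fix(unsigned int family)
{
    switch ( family )
    {
    case 0x17:                     /* Zen1 / Zen2 */
        return POLICY_FIRMWARE_ONLY;

    case 0x19:                     /* Zen3 / Zen4 */
    case 0x1a:                     /* Zen5 */
        return POLICY_HOT_LOAD_NEW_SIGNATURE;

    default:
        return POLICY_OUT_OF_SCOPE;
    }
}
```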

>
>> +void __init amd_check_entrysign(void)
>> +{
>> +    unsigned int curr_rev;
>> +    uint8_t fixed_rev;
>> +
>> +    if ( boot_cpu_data.vendor != X86_VENDOR_AMD ||
>> +         boot_cpu_data.family < 0x17 ||
>> +         boot_cpu_data.family > 0x1a )
>> +        return;
>> +
>> +    /*
>> +     * Table taken from Linux, which is the only known source of information
>> +     * about client revisions.
>> +     */
>> +    curr_rev = this_cpu(cpu_sig).rev;
>> +    switch ( curr_rev >> 8 )
>> +    {
>> +    case 0x080012: fixed_rev = 0x6f; break;
>> +    case 0x080082: fixed_rev = 0x0f; break;
> In your reply you mentioned a "general off-by-1" when comparing with Linux,
> but I'm in trouble understanding how both can be correct. Leaving aside the
> 1st line (for which you sent a Linux patch anyway), how can our
> "(uint8_t)curr_rev >= fixed_rev" (i.e. "(uint8_t)curr_rev >= 0x0f") further
> below be correct at the same time as Linux'es "return cur_rev <= 0x800820f"
> (indicating to the caller whether a SHA check is needed) is also correct?
> We say 0x0f is okay, while they demand a SHA check for that revision.
>
> In any event, whatever (legitimate) off-by-1 it is that I'm failing to spot,
> I think this would want explaining in the comment above.

What you've spotted is the off-by-one error.

Linux is written as "curr <= last-vuln-rev" in order to do the digest check.

Xen wants "cur >= first-fixed-rev"; I renamed the variable and forgot to
adjust the table to compensate.  I've already fixed it in v2, so this
line now reads fixed_rev = 0x0a.

>
>> +    case 0x083010: fixed_rev = 0x7c; break;
>> +    case 0x086001: fixed_rev = 0x0e; break;
>> +    case 0x086081: fixed_rev = 0x08; break;
>> +    case 0x087010: fixed_rev = 0x34; break;
>> +    case 0x08a000: fixed_rev = 0x0a; break;
>> +    case 0x0a0010: fixed_rev = 0x7a; break;
>> +    case 0x0a0011: fixed_rev = 0xda; break;
>> +    case 0x0a0012: fixed_rev = 0x43; break;
>> +    case 0x0a0082: fixed_rev = 0x0e; break;
>> +    case 0x0a1011: fixed_rev = 0x53; break;
>> +    case 0x0a1012: fixed_rev = 0x4e; break;
>> +    case 0x0a1081: fixed_rev = 0x09; break;
>> +    case 0x0a2010: fixed_rev = 0x2f; break;
>> +    case 0x0a2012: fixed_rev = 0x12; break;
>> +    case 0x0a4041: fixed_rev = 0x09; break;
>> +    case 0x0a5000: fixed_rev = 0x13; break;
>> +    case 0x0a6012: fixed_rev = 0x0a; break;
>> +    case 0x0a7041: fixed_rev = 0x09; break;
>> +    case 0x0a7052: fixed_rev = 0x08; break;
>> +    case 0x0a7080: fixed_rev = 0x09; break;
>> +    case 0x0a70c0: fixed_rev = 0x09; break;
>> +    case 0x0aa001: fixed_rev = 0x16; break;
>> +    case 0x0aa002: fixed_rev = 0x18; break;
>> +    case 0x0b0021: fixed_rev = 0x46; break;
>> +    case 0x0b1010: fixed_rev = 0x46; break;
>> +    case 0x0b2040: fixed_rev = 0x31; break;
>> +    case 0x0b4040: fixed_rev = 0x31; break;
>> +    case 0x0b6000: fixed_rev = 0x31; break;
>> +    case 0x0b7000: fixed_rev = 0x31; break;
> Without at least brief model related comments this looks extremely opaque.
> Linux, as a minimal reference, at least has cpuid_to_ucode_rev() and the
> accompanying union zen_patch_rev.

We have other tables like this in Xen.  Linux has even more.

These case labels are family/model/steppings, but not in the same format
as CPUID.1.EAX, and also not in the same format as patch->processor_id.

This is true even on CPUs prior to Zen1, which makes union zen_patch_rev
misleading, and is why I have intentionally not ported it across.  Fam11h
seems to be where this started being true in practice.

Fam10h has cases where the same ucode applies to different steppings of CPUs.
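
As a sketch of how the case labels can be read, the decoder below assumes the bit layout of Linux's union zen_patch_rev applies to (curr_rev >> 8). This is an observed AMD convention rather than a documented format, so decode_rev_id() and struct fms are hypothetical illustration, not something the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout of (curr_rev >> 8), following Linux's zen_patch_rev:
 *   bits 23:16  extended family (family = 0xf + ext_fam)
 *   bits 15:12  extended model  (model[7:4])
 *   bits 11:8   reserved
 *   bits  7:4   base model      (model[3:0])
 *   bits  3:0   stepping
 */
struct fms {
    unsigned int family, model, stepping;
};

static struct fms decode_rev_id(uint32_t id)
{
    struct fms f = {
        .family   = 0xf + ((id >> 16) & 0xff),
        .model    = (((id >> 12) & 0xf) << 4) | ((id >> 4) & 0xf),
        .stepping = id & 0xf,
    };

    return f;
}
```

For example, decode_rev_id(0x0a1011) yields family 0x19, model 0x11, stepping 1, and decode_rev_id(0x080082) yields family 0x17, model 0x08, stepping 2.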

>  Background of my remark is that I would
> have expected there to be more models per Zen<N>, seeing in particular how
> many different BKDGs / PPRs and RGs there are. Many RGs in particular say
> they apply to a range of models, yet no similar ranges are covered here
> (unless my deciphering attempts went wrong).

PPRs/RGs are generally per block of 0x10 models and all steppings
therein.  This is quite often one production CPU and a handful of
preproduction steppings, but e.g. Milan and MilanX are two production
CPUs that share the same PPR/RG, as they differ only by stepping.

Preproduction CPUs probably won't have a fix (other than the final two
rows, which are the A0 stepping of something presumably trying to get out
of the door when Entrysign was found).  The list does look to be the right
order of magnitude for the production CPUs.

The AMD bulletin only gives microcode versions for server.  Clients only
state AgesaPI versions, so I'm entirely reliant on Linux for the
microcode versions.

~Andrew

Re: [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware

Posted by Jan Beulich 6 days, 14 hours ago

On 22.10.2025 23:19, Andrew Cooper wrote:
> On 21/10/2025 10:47 am, Jan Beulich wrote:
>> On 20.10.2025 15:19, Andrew Cooper wrote:
>>> +void __init amd_check_entrysign(void)
>>> +{
>>> +    unsigned int curr_rev;
>>> +    uint8_t fixed_rev;
>>> +
>>> +    if ( boot_cpu_data.vendor != X86_VENDOR_AMD ||
>>> +         boot_cpu_data.family < 0x17 ||
>>> +         boot_cpu_data.family > 0x1a )
>>> +        return;
>>> +
>>> +    /*
>>> +     * Table taken from Linux, which is the only known source of information
>>> +     * about client revisions.
>>> +     */
>>> +    curr_rev = this_cpu(cpu_sig).rev;
>>> +    switch ( curr_rev >> 8 )
>>> +    {
>>> +    case 0x080012: fixed_rev = 0x6f; break;
>>> +    case 0x080082: fixed_rev = 0x0f; break;
>> In your reply you mentioned a "general off-by-1" when comparing with Linux,
>> but I'm in trouble understanding how both can be correct. Leaving aside the
>> 1st line (for which you sent a Linux patch anyway), how can our
>> "(uint8_t)curr_rev >= fixed_rev" (i.e. "(uint8_t)curr_rev >= 0x0f") further
>> below be correct at the same time as Linux'es "return cur_rev <= 0x800820f"
>> (indicating to the caller whether a SHA check is needed) is also correct?
>> We say 0x0f is okay, while they demand a SHA check for that revision.
>>
>> In any event, whatever (legitimate) off-by-1 it is that I'm failing to spot,
>> I think this would want explaining in the comment above.
> 
> What you've spotted is the off-by-one error.
> 
> Linux is written as "curr <= last-vuln-rev" in order to do the digest check.
> 
> Xen wants "cur >= first-fixed-rev"; I renamed the variable and forgot to
> adjust the table to compensate.  I've already fixed it in v2, so this
> line now reads fixed_rev = 0x0a.

Now I'm even more confused. If Linux uses 0x0f for last-vuln-rev, how would
0x0a be first-fixed-rev?

>>> +    case 0x083010: fixed_rev = 0x7c; break;
>>> +    case 0x086001: fixed_rev = 0x0e; break;
>>> +    case 0x086081: fixed_rev = 0x08; break;
>>> +    case 0x087010: fixed_rev = 0x34; break;
>>> +    case 0x08a000: fixed_rev = 0x0a; break;
>>> +    case 0x0a0010: fixed_rev = 0x7a; break;
>>> +    case 0x0a0011: fixed_rev = 0xda; break;
>>> +    case 0x0a0012: fixed_rev = 0x43; break;
>>> +    case 0x0a0082: fixed_rev = 0x0e; break;
>>> +    case 0x0a1011: fixed_rev = 0x53; break;
>>> +    case 0x0a1012: fixed_rev = 0x4e; break;
>>> +    case 0x0a1081: fixed_rev = 0x09; break;
>>> +    case 0x0a2010: fixed_rev = 0x2f; break;
>>> +    case 0x0a2012: fixed_rev = 0x12; break;
>>> +    case 0x0a4041: fixed_rev = 0x09; break;
>>> +    case 0x0a5000: fixed_rev = 0x13; break;
>>> +    case 0x0a6012: fixed_rev = 0x0a; break;
>>> +    case 0x0a7041: fixed_rev = 0x09; break;
>>> +    case 0x0a7052: fixed_rev = 0x08; break;
>>> +    case 0x0a7080: fixed_rev = 0x09; break;
>>> +    case 0x0a70c0: fixed_rev = 0x09; break;
>>> +    case 0x0aa001: fixed_rev = 0x16; break;
>>> +    case 0x0aa002: fixed_rev = 0x18; break;
>>> +    case 0x0b0021: fixed_rev = 0x46; break;
>>> +    case 0x0b1010: fixed_rev = 0x46; break;
>>> +    case 0x0b2040: fixed_rev = 0x31; break;
>>> +    case 0x0b4040: fixed_rev = 0x31; break;
>>> +    case 0x0b6000: fixed_rev = 0x31; break;
>>> +    case 0x0b7000: fixed_rev = 0x31; break;
>> Without at least brief model related comments this looks extremely opaque.
>> Linux, as a minimal reference, at least has cpuid_to_ucode_rev() and the
>> accompanying union zen_patch_rev.
> 
> We have other tables like this in Xen.  Linux has even more.

The one in amd-patch-digests.c I'm aware of. Oh, and tsa_calculations().
But ...

> These case labels are family/model/steppings, but not in the same format
> as CPUID.1.EAX, and also not in the same format as patch->processor_id.

... none of them explaining what these numbers really mean isn't helpful.
I didn't question them earlier because I assumed them to be all "magic".
Now that I learned how they're encoded, I thought it might be (have been)
nice if they weren't left as "entirely magic".
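
One way to make the labels less magic, assuming the bit layout of Linux's union zen_patch_rev (an observed AMD convention, not a documented format), is a constructor macro so that each case states its family, model and stepping directly. ZEN_REV_ID() and is_fam19_m11_s1() are hypothetical names; the macro expands to an integer constant expression, so it is valid as a case label.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: build a (curr_rev >> 8) label from family/model/stepping. */
#define ZEN_REV_ID(fam, model, step)        \
    ((uint32_t)(((fam) - 0xf) << 16) |      \
     (uint32_t)(((model) >> 4) << 12) |     \
     (uint32_t)(((model) & 0xf) << 4) |     \
     (uint32_t)(step))

/* Illustrative switch: the label documents itself instead of "0x0a1011". */
static int is_fam19_m11_s1(uint32_t id)
{
    switch ( id )
    {
    case ZEN_REV_ID(0x19, 0x11, 1):
        return 1;

    default:
        return 0;
    }
}
```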

>>  Background of my remark is that I would
>> have expected there to be more models per Zen<N>, seeing in particular how
>> many different BKDGs / PPRs and RGs there are. Many RGs in particular say
>> they apply to a range of models, yet no similar ranges are covered here
>> (unless my deciphering attempts went wrong).
> 
> PPRs/RGs are generally per block of 0x10 models and all steppings
> therein.  This is quite often one production CPU and a handful of
> preproduction steppings, but e.g. Milan and MilanX are two production
> CPUs that share the same PPR/RG, as they differ only by stepping.
>
> Preproduction CPUs probably won't have a fix (other than the final two
> rows, which are the A0 stepping of something presumably trying to get out
> of the door when Entrysign was found).  The list does look to be the right
> order of magnitude for the production CPUs.

Sure, and my question wasn't towards steppings of individual models. My
question was towards models of individual families, where the docs
suggest far more exist than this table would cover. I guess that while
talking mainly of steppings, you really (also) meant to say that most of
the model numbers weren't used in practice (for production CPUs) either?

> The AMD bulletin only gives microcode versions for server.  Clients only
> state AgesaPI versions, so I'm entirely reliant on Linux for the
> microcode versions.

I did understand that, yes, as you have a code comment saying so.

Jan

Re: [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware

Posted by Andrew Cooper 2 days ago

On 23/10/2025 8:05 am, Jan Beulich wrote:
> On 22.10.2025 23:19, Andrew Cooper wrote:
>> On 21/10/2025 10:47 am, Jan Beulich wrote:
>>> On 20.10.2025 15:19, Andrew Cooper wrote:
>>>> +void __init amd_check_entrysign(void)
>>>> +{
>>>> +    unsigned int curr_rev;
>>>> +    uint8_t fixed_rev;
>>>> +
>>>> +    if ( boot_cpu_data.vendor != X86_VENDOR_AMD ||
>>>> +         boot_cpu_data.family < 0x17 ||
>>>> +         boot_cpu_data.family > 0x1a )
>>>> +        return;
>>>> +
>>>> +    /*
>>>> +     * Table taken from Linux, which is the only known source of information
>>>> +     * about client revisions.
>>>> +     */
>>>> +    curr_rev = this_cpu(cpu_sig).rev;
>>>> +    switch ( curr_rev >> 8 )
>>>> +    {
>>>> +    case 0x080012: fixed_rev = 0x6f; break;
>>>> +    case 0x080082: fixed_rev = 0x0f; break;
>>> In your reply you mentioned a "general off-by-1" when comparing with Linux,
>>> but I'm in trouble understanding how both can be correct. Leaving aside the
>>> 1st line (for which you sent a Linux patch anyway), how can our
>>> "(uint8_t)curr_rev >= fixed_rev" (i.e. "(uint8_t)curr_rev >= 0x0f") further
>>> below be correct at the same time as Linux'es "return cur_rev <= 0x800820f"
>>> (indicating to the caller whether a SHA check is needed) is also correct?
>>> We say 0x0f is okay, while they demand a SHA check for that revision.
>>>
>>> In any event, whatever (legitimate) off-by-1 it is that I'm failing to spot,
>>> I think this would want explaining in the comment above.
>> What you've spotted is the off-by-one error.
>>
>> Linux is written as "curr <= last-vuln-rev" in order to do the digest check.
>>
>> Xen wants "cur >= first-fixed-rev"; I renamed the variable and forgot to
>> adjust the table to compensate.  I've already fixed it in v2, so this
>> line now reads fixed_rev = 0x0a.
> Now I'm even more confused. If Linux uses 0x0f for last-vuln-rev, how would
> 0x0a be first-fixed-rev?

Sorry, that was a typo in my email.  I've got 0x10 locally.
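
The boundary mismatch can be checked mechanically. Within one F/M/S the upper 24 bits of the revision are identical, so only the low byte matters, as with the patch's (uint8_t)curr_rev. Linux's "cur_rev <= last vulnerable" and Xen's "rev >= first fixed" are exact complements when first fixed = last vulnerable + 1, which is why the 0x080082 row becomes 0x10 rather than 0x0f. Both function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Linux's question: does this revision still need a SHA check? */
static bool linux_needs_sha_check(uint8_t rev, uint8_t last_vuln_rev)
{
    return rev <= last_vuln_rev;
}

/* Xen's question: does firmware look to have fixed Entrysign? */
static bool xen_fw_fixed(uint8_t rev, uint8_t first_fixed_rev)
{
    return rev >= first_fixed_rev;
}
```

The conversion assumes the last vulnerable revision byte is below 0xff; a last-vulnerable value of 0xff would have no representable first-fixed value.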

>
>>>> +    case 0x083010: fixed_rev = 0x7c; break;
>>>> +    case 0x086001: fixed_rev = 0x0e; break;
>>>> +    case 0x086081: fixed_rev = 0x08; break;
>>>> +    case 0x087010: fixed_rev = 0x34; break;
>>>> +    case 0x08a000: fixed_rev = 0x0a; break;
>>>> +    case 0x0a0010: fixed_rev = 0x7a; break;
>>>> +    case 0x0a0011: fixed_rev = 0xda; break;
>>>> +    case 0x0a0012: fixed_rev = 0x43; break;
>>>> +    case 0x0a0082: fixed_rev = 0x0e; break;
>>>> +    case 0x0a1011: fixed_rev = 0x53; break;
>>>> +    case 0x0a1012: fixed_rev = 0x4e; break;
>>>> +    case 0x0a1081: fixed_rev = 0x09; break;
>>>> +    case 0x0a2010: fixed_rev = 0x2f; break;
>>>> +    case 0x0a2012: fixed_rev = 0x12; break;
>>>> +    case 0x0a4041: fixed_rev = 0x09; break;
>>>> +    case 0x0a5000: fixed_rev = 0x13; break;
>>>> +    case 0x0a6012: fixed_rev = 0x0a; break;
>>>> +    case 0x0a7041: fixed_rev = 0x09; break;
>>>> +    case 0x0a7052: fixed_rev = 0x08; break;
>>>> +    case 0x0a7080: fixed_rev = 0x09; break;
>>>> +    case 0x0a70c0: fixed_rev = 0x09; break;
>>>> +    case 0x0aa001: fixed_rev = 0x16; break;
>>>> +    case 0x0aa002: fixed_rev = 0x18; break;
>>>> +    case 0x0b0021: fixed_rev = 0x46; break;
>>>> +    case 0x0b1010: fixed_rev = 0x46; break;
>>>> +    case 0x0b2040: fixed_rev = 0x31; break;
>>>> +    case 0x0b4040: fixed_rev = 0x31; break;
>>>> +    case 0x0b6000: fixed_rev = 0x31; break;
>>>> +    case 0x0b7000: fixed_rev = 0x31; break;
>>> Without at least brief model related comments this looks extremely opaque.
>>> Linux, as a minimal reference, at least has cpuid_to_ucode_rev() and the
>>> accompanying union zen_patch_rev.
>> We have other tables like this in Xen.  Linux has even more.
> The one in amd-patch-digests.c I'm aware of. Oh, and tsa_calculations().
> But ...
>
>> These case labels are family/model/steppings, but not in the same format
>> as CPUID.1.EAX, and also not in the same format as patch->processor_id.
> ... none of them explaining what these numbers really mean isn't helpful.
> I didn't question them earlier because I assumed them to be all "magic".
> Now that I learned how they're encoded, I thought it might be (have been)
> nice if they weren't left as "entirely magic".

Well - they are about as magic as numbers get.

It's just a convention that AMD uses when choosing the (otherwise
arbitrary) patch_id, and I'm not aware of it being written down
anywhere.  Using the entrysign vulnerability, AIUI you can choose an
arbitrary 32bit value here.

Linux says it's from Fam17h onwards, but the pattern works from Fam12h,
and Fam10h was definitely different.

I've got no idea how long it will continue.  For one, the 8-bit ucode
revision is proving to be a limiting factor on some CPUs, and e.g. one
of the 3 F/M/S encodings (patch->processor_id) will run out when we hit
Zen15 CPUs at the current rate that AMD are using Family numbers.

>>>  Background of my remark is that I would
>>> have expected there to be more models per Zen<N>, seeing in particular how
>>> many different BKDGs / PPRs and RGs there are. Many RGs in particular say
>>> they apply to a range of models, yet no similar ranges are covered here
>>> (unless my deciphering attempts went wrong).
>> PPRs/RGs are generally per block of 0x10 models and all steppings
>> therein.  This is quite often one production CPU and a handful of
>> preproduction steppings, but e.g. Milan and MilanX are two production
>> CPUs that share the same PPR/RG, as they differ only by stepping.
>>
>> Preproduction CPUs probably won't have a fix (other than the final two
>> rows, which are the A0 stepping of something presumably trying to get out
>> of the door when Entrysign was found).  The list does look to be the right
>> order of magnitude for the production CPUs.
> Sure, and my question wasn't towards steppings of individual models. My
> question was towards models of individual families, where the docs
> suggest far more exist than this table would cover. I guess that while
> talking mainly of steppings, you really (also) meant to say that most of
> the model numbers weren't used in practice (for production CPUs) either?

AMD's numbering space is very sparse.  From a block of 0x10 (or in some
cases 8) model numbers, it's uncommon to see anything other than 0 or 1.

~Andrew

Re: [PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware

Posted by Andrew Cooper 1 week, 2 days ago

On 20/10/2025 2:19 pm, Andrew Cooper wrote:
> When Entrysign has been mitigated in firmware, it is believed to be safe to
> pass blobs to the CPU again.  This avoids the need to update the digest
> table for new microcodes.
>
> Relax the digest check when firmware looks to be up to date, and leave behind
> a clear message when not.
>
> This is best-effort only.  If a malicious microcode has been loaded prior to
> Xen running, then all bets are off.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
>
> I need to double check the revision table.  I think I need to submit a
> correction to Linux first.

Yes. 
https://lore.kernel.org/lkml/20251020144124.2930784-1-andrew.cooper3@citrix.com/T/#u

Also there's a general off-by-one error in the revisions, owing to a
difference in how Linux and Xen are using the boundaries.

Both fixed locally for v2.

~Andrew

[PATCH 1/5] x86/ucode: Fix missing printk() newline in ucode_probe_amd()
[PATCH 2/5] x86/ucode: Abort parallel load early on any control thread error
[PATCH 3/5] x86/ucode: Refine TLB flush fix for AMD Fam17h CPUs
[PATCH 4/5] x86/ucode: Cross check the minimum revision
[PATCH 5/5] x86/ucode: Relax digest check when Entrysign is fixed in firmware