[PATCH] x86/ucode: Fix stability of the Raw CPU Policy rescan

Andrew Cooper posted 1 patch 3 months ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20240123205948.1782556-1-andrew.cooper3@citrix.com
xen/arch/x86/cpu-policy.c         |  7 +++++++
xen/arch/x86/cpu/microcode/core.c | 20 +++++++++++++++++---
2 files changed, 24 insertions(+), 3 deletions(-)
[PATCH] x86/ucode: Fix stability of the Raw CPU Policy rescan
Posted by Andrew Cooper 3 months ago
Always run microcode_update_helper() on the BSP, so the the updated Raw CPU
policy doesn't get non-BSP topology details included.

Have calculate_raw_cpu_policy() clear the instantanious XSTATE sizes.  The
value XCR0 | MSR_XSS had when we scanned the policy isn't terribly interesting
to report.

When CPUID Masking is active, it affects CPUID instructions issued by Xen
too.  Transiently disable masking to get a clean scan.

Fixes: 694d79ed5aac ("x86/ucode: Refresh raw CPU policy after microcode load")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

I debased adding named fields for the instantious xstate sizes, but decided
not to.  There's no (other) case where I can see them reasonably being used.
---
 xen/arch/x86/cpu-policy.c         |  7 +++++++
 xen/arch/x86/cpu/microcode/core.c | 20 +++++++++++++++++---
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 76efb050edf7..82b10de03efd 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -353,6 +353,13 @@ void calculate_raw_cpu_policy(void)
     /* Nothing good will come from Xen and libx86 disagreeing on vendor. */
     ASSERT(p->x86_vendor == boot_cpu_data.x86_vendor);
 
+    /*
+     * Clear the truly dynamic fields.  These vary with the in-context XCR0
+     * and MSR_XSS, and aren't interesting fields in the raw policy.
+     */
+    p->xstate.raw[0].ebx = 0;
+    p->xstate.raw[1].ebx = 0;
+
     /* 0x000000ce  MSR_INTEL_PLATFORM_INFO */
     /* Was already added by probe_cpuid_faulting() */
 }
diff --git a/xen/arch/x86/cpu/microcode/core.c b/xen/arch/x86/cpu/microcode/core.c
index 120a11d5036d..6f95f7bbe223 100644
--- a/xen/arch/x86/cpu/microcode/core.c
+++ b/xen/arch/x86/cpu/microcode/core.c
@@ -680,8 +680,18 @@ static long cf_check microcode_update_helper(void *data)
         microcode_update_cache(patch);
         spin_unlock(&microcode_mutex);
 
-        /* Refresh the raw CPU policy, in case the features have changed. */
+        /*
+         * Refresh the raw CPU policy, in case the features have changed.
+         * Disable CPUID masking if in use, to avoid having current's
+         * cpu_policy affect the rescan.
+         */
+	if ( ctxt_switch_masking )
+            alternative_vcall(ctxt_switch_masking, NULL);
+
         calculate_raw_cpu_policy();
+
+	if ( ctxt_switch_masking )
+            alternative_vcall(ctxt_switch_masking, current);
     }
     else
         microcode_free_patch(patch);
@@ -721,8 +731,12 @@ int microcode_update(XEN_GUEST_HANDLE(const_void) buf, unsigned long len)
     }
     buffer->len = len;
 
-    return continue_hypercall_on_cpu(smp_processor_id(),
-                                     microcode_update_helper, buffer);
+    /*
+     * Always queue microcode_update_helper() on CPU0.  Most of the logic
+     * won't care, but the update of the Raw CPU policy wants to (re)run on
+     * the BSP.
+     */
+    return continue_hypercall_on_cpu(0, microcode_update_helper, buffer);
 }
 
 static int __init cf_check microcode_init(void)
-- 
2.30.2


Re: [PATCH] x86/ucode: Fix stability of the Raw CPU Policy rescan
Posted by Jan Beulich 3 months ago
On 23.01.2024 21:59, Andrew Cooper wrote:
> Always run microcode_update_helper() on the BSP, so the the updated Raw CPU
> policy doesn't get non-BSP topology details included.

Wouldn't it be better (and consistent with ...

> Have calculate_raw_cpu_policy() clear the instantanious XSTATE sizes.  The
> value XCR0 | MSR_XSS had when we scanned the policy isn't terribly interesting
> to report.

... this) to purge these details from the raw policy as well then?

> When CPUID Masking is active, it affects CPUID instructions issued by Xen
> too.  Transiently disable masking to get a clean scan.
> 
> Fixes: 694d79ed5aac ("x86/ucode: Refresh raw CPU policy after microcode load")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Irrespective of the question above, I'm also okay with the change as is:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
Re: [PATCH] x86/ucode: Fix stability of the Raw CPU Policy rescan
Posted by Andrew Cooper 3 months ago
On 24/01/2024 3:37 pm, Jan Beulich wrote:
> On 23.01.2024 21:59, Andrew Cooper wrote:
>> Always run microcode_update_helper() on the BSP, so the the updated Raw CPU
>> policy doesn't get non-BSP topology details included.
> Wouldn't it be better (and consistent with ...
>
>> Have calculate_raw_cpu_policy() clear the instantanious XSTATE sizes.  The
>> value XCR0 | MSR_XSS had when we scanned the policy isn't terribly interesting
>> to report.
> ... this) to purge these details from the raw policy as well then?

I did spend some time considering this.

Rerunning on the same CPU is more resilient to new topology leaves, so
we'd want to be doing that irrespective.

The XCR0/XSS state really is transient, and the useful information is
everywhere else in leaf 0xd.

>
>> When CPUID Masking is active, it affects CPUID instructions issued by Xen
>> too.  Transiently disable masking to get a clean scan.
>>
>> Fixes: 694d79ed5aac ("x86/ucode: Refresh raw CPU policy after microcode load")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Irrespective of the question above, I'm also okay with the change as is:
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks.

~Andrew
Re: [PATCH] x86/ucode: Fix stability of the Raw CPU Policy rescan
Posted by Jan Beulich 3 months ago
On 24.01.2024 18:23, Andrew Cooper wrote:
> On 24/01/2024 3:37 pm, Jan Beulich wrote:
>> On 23.01.2024 21:59, Andrew Cooper wrote:
>>> Always run microcode_update_helper() on the BSP, so the the updated Raw CPU
>>> policy doesn't get non-BSP topology details included.
>> Wouldn't it be better (and consistent with ...
>>
>>> Have calculate_raw_cpu_policy() clear the instantanious XSTATE sizes.  The
>>> value XCR0 | MSR_XSS had when we scanned the policy isn't terribly interesting
>>> to report.
>> ... this) to purge these details from the raw policy as well then?
> 
> I did spend some time considering this.
> 
> Rerunning on the same CPU is more resilient to new topology leaves, so
> we'd want to be doing that irrespective.

I'm afraid I don't understand this: If a ucode update surfaced new leaves,
they surely would appear on all CPUs? IOW my question still stands: Wouldn't
we better zap topology data from the raw policy (thus also not propagating
it into other policies)? At which point retrieval becomes independent of
what CPU it is run on (if there were any other CPU-specific pieces of data,
similar zapping should happen for them).

Surely using CPU0 here isn't much of a problem, as this is a pretty
infrequent event. But generally I'd like to avoid "preferring" CPU0 as much
as possible. Hence I'd prefer if even in cases like this one we could avoid
it.

> The XCR0/XSS state really is transient, and the useful information is
> everywhere else in leaf 0xd.

Sure, but this is still independent on what CPU the retrieval is run on.

Jan