[PATCH] x86/SMP: guard socket_cpumask[] access in cpu_smpboot_free()

Jan Beulich posted 1 patch 4 months ago
Failed in applying to current master (apply log)
[PATCH] x86/SMP: guard socket_cpumask[] access in cpu_smpboot_free()
Posted by Jan Beulich 4 months ago
Before CPU identification has run (and it may not have run at all e.g.
when AP bringup failed altogether), cpu_data[].phys_proc_id (which is
what cpu_to_socket() resolves to) can't really be used. The use of
cpu_to_socket()'s result as an array index cpu_smpboot_free() therefore
needs guarding, as the function will also be invoked upon AP bringup
failure, in which case CPU identification may not have run.

Without "x86/CPU: re-work populating of cpu_data[]" [1] the issue is
less pronounced: The field starts out as zero, then has the BSP value
(likely again zero) copied into it, and it is properly invalidated only
in cpu_smpboot_free(). Still it is clearly wrong to use the BSP's socket
number here.

Making the guard work with and without the above patch applied turns out
interesting: Prior to that patch, the sole invalidation done is that in
cpu_smpboot_free(). Upon a later bringup attempt, the fields invalidated
are overwritten by the BSP values again, though. Hence compare APIC IDs,
as they cannot validly be the same once CPU identification has run.

[1] https://lists.xen.org/archives/html/xen-devel/2024-02/msg00727.html

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Sadly there was no feedback at all yet for the referenced patch.

--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -958,7 +958,13 @@ static void cpu_smpboot_free(unsigned in
     unsigned int socket = cpu_to_socket(cpu);
     struct cpuinfo_x86 *c = cpu_data;
 
-    if ( cpumask_empty(socket_cpumask[socket]) )
+    /*
+     * We may come here without the CPU having run through CPU identification.
+     * In that case the socket number cannot be relied upon, but the respective
+     * socket_cpumask[] slot also wouldn't have been set.
+     */
+    if ( c[cpu].apicid != boot_cpu_data.apicid &&
+         cpumask_empty(socket_cpumask[socket]) )
     {
         xfree(socket_cpumask[socket]);
         socket_cpumask[socket] = NULL;
Re: [PATCH] x86/SMP: guard socket_cpumask[] access in cpu_smpboot_free()
Posted by Roger Pau Monné 3 months, 2 weeks ago
On Mon, Jun 30, 2025 at 01:42:31PM +0200, Jan Beulich wrote:
> Before CPU identification has run (and it may not have run at all e.g.
> when AP bringup failed altogether), cpu_data[].phys_proc_id (which is
> what cpu_to_socket() resolves to) can't really be used. The use of
> cpu_to_socket()'s result as an array index cpu_smpboot_free() therefore
> needs guarding, as the function will also be invoked upon AP bringup
> failure, in which case CPU identification may not have run.
> 
> Without "x86/CPU: re-work populating of cpu_data[]" [1] the issue is
> less pronounced: The field starts out as zero, then has the BSP value
> (likely again zero) copied into it, and it is properly invalidated only
> in cpu_smpboot_free(). Still it is clearly wrong to use the BSP's socket
> number here.
> 
> Making the guard work with and without the above patch applied turns out
> interesting: Prior to that patch, the sole invalidation done is that in
> cpu_smpboot_free(). Upon a later bringup attempt, the fields invalidated
> are overwritten by the BSP values again, though. Hence compare APIC IDs,
> as they cannot validly be the same once CPU identification has run.
> 
> [1] https://lists.xen.org/archives/html/xen-devel/2024-02/msg00727.html
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

Re: [PATCH] x86/SMP: guard socket_cpumask[] access in cpu_smpboot_free()
Posted by Jan Beulich 4 months ago
On 30.06.2025 13:42, Jan Beulich wrote:
> Before CPU identification has run (and it may not have run at all e.g.
> when AP bringup failed altogether), cpu_data[].phys_proc_id (which is
> what cpu_to_socket() resolves to) can't really be used. The use of
> cpu_to_socket()'s result as an array index cpu_smpboot_free() therefore
> needs guarding, as the function will also be invoked upon AP bringup
> failure, in which case CPU identification may not have run.
> 
> Without "x86/CPU: re-work populating of cpu_data[]" [1] the issue is
> less pronounced: The field starts out as zero, then has the BSP value
> (likely again zero) copied into it, and it is properly invalidated only
> in cpu_smpboot_free(). Still it is clearly wrong to use the BSP's socket
> number here.
> 
> Making the guard work with and without the above patch applied turns out
> interesting: Prior to that patch, the sole invalidation done is that in
> cpu_smpboot_free(). Upon a later bringup attempt, the fields invalidated
> are overwritten by the BSP values again, though. Hence compare APIC IDs,
> as they cannot validly be the same once CPU identification has run.
> 
> [1] https://lists.xen.org/archives/html/xen-devel/2024-02/msg00727.html

Fixes: 4824bdfdabeb ("x86: add socket_cpumask")

> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Sadly there was no feedback at all yet for the referenced patch.
> 
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -958,7 +958,13 @@ static void cpu_smpboot_free(unsigned in
>      unsigned int socket = cpu_to_socket(cpu);
>      struct cpuinfo_x86 *c = cpu_data;
>  
> -    if ( cpumask_empty(socket_cpumask[socket]) )
> +    /*
> +     * We may come here without the CPU having run through CPU identification.
> +     * In that case the socket number cannot be relied upon, but the respective
> +     * socket_cpumask[] slot also wouldn't have been set.
> +     */
> +    if ( c[cpu].apicid != boot_cpu_data.apicid &&
> +         cpumask_empty(socket_cpumask[socket]) )
>      {
>          xfree(socket_cpumask[socket]);
>          socket_cpumask[socket] = NULL;