Before CPU identification has run (and it may not have run at all, e.g.
when AP bringup failed altogether), cpu_data[].phys_proc_id (which is
what cpu_to_socket() resolves to) can't really be used. The use of
cpu_to_socket()'s result as an array index in cpu_smpboot_free()
therefore needs guarding, as the function will also be invoked upon AP
bringup failure, in which case CPU identification may not have run.

Without "x86/CPU: re-work populating of cpu_data[]" [1] the issue is
less pronounced: The field starts out as zero, then has the BSP value
(likely again zero) copied into it, and it is properly invalidated only
in cpu_smpboot_free(). Still it is clearly wrong to use the BSP's socket
number here.

Making the guard work with and without the above patch applied turns out
to be interesting: Prior to that patch, the sole invalidation done is
that in cpu_smpboot_free(). Upon a later bringup attempt, the fields
invalidated are overwritten by the BSP values again, though. Hence
compare APIC IDs, as they cannot validly be the same once CPU
identification has run.

[1] https://lists.xen.org/archives/html/xen-devel/2024-02/msg00727.html

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Sadly there was no feedback at all yet for the referenced patch.
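For illustration only, the reasoning behind the APIC ID comparison can be
sketched as a small stand-alone C model. This is not Xen code: struct
cpu_model, model_cpu[] and all model_*() helpers are hypothetical names
invented for this sketch, and it models only the behaviour described above
for a tree without the referenced patch (an AP's slot starting out as a copy
of the BSP's values until CPU identification has run).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct cpuinfo_x86. */
struct cpu_model {
    unsigned int apicid;
    unsigned int socket;
};

#define MODEL_NR_CPUS 4

/* Slot 0 plays the role of the BSP (boot_cpu_data in the real code). */
static struct cpu_model model_cpu[MODEL_NR_CPUS];

/* Record the BSP's own, already identified values. */
static void model_boot_bsp(unsigned int apicid, unsigned int socket)
{
    model_cpu[0].apicid = apicid;
    model_cpu[0].socket = socket;
}

/* Assumed pre-rework behaviour: the AP's slot is seeded from the BSP's. */
static void model_prepare_ap(unsigned int cpu)
{
    assert(cpu > 0 && cpu < MODEL_NR_CPUS);
    model_cpu[cpu] = model_cpu[0];
}

/* Stand-in for CPU identification storing the AP's own values. */
static void model_identify_ap(unsigned int cpu, unsigned int apicid,
                              unsigned int socket)
{
    assert(cpu > 0 && cpu < MODEL_NR_CPUS);
    model_cpu[cpu].apicid = apicid;
    model_cpu[cpu].socket = socket;
}

/*
 * The guard: an AP can only validly share its APIC ID with the BSP while
 * it has not been identified, so only a differing APIC ID makes the
 * socket number usable as an array index.
 */
static bool model_socket_usable(unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    return model_cpu[cpu].apicid != model_cpu[0].apicid;
}
```

In this model, an AP whose bringup fails right after model_prepare_ap() still
carries the BSP's APIC ID, so model_socket_usable() returns false and a
caller would skip the socket-indexed cleanup. Once model_identify_ap() has
stored a distinct APIC ID, the check passes.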
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -958,7 +958,13 @@ static void cpu_smpboot_free(unsigned in
     unsigned int socket = cpu_to_socket(cpu);
     struct cpuinfo_x86 *c = cpu_data;
 
-    if ( cpumask_empty(socket_cpumask[socket]) )
+    /*
+     * We may come here without the CPU having run through CPU identification.
+     * In that case the socket number cannot be relied upon, but the respective
+     * socket_cpumask[] slot also wouldn't have been set.
+     */
+    if ( c[cpu].apicid != boot_cpu_data.apicid &&
+         cpumask_empty(socket_cpumask[socket]) )
     {
         xfree(socket_cpumask[socket]);
         socket_cpumask[socket] = NULL;
On Mon, Jun 30, 2025 at 01:42:31PM +0200, Jan Beulich wrote:
> Before CPU identification has run (and it may not have run at all e.g.
> when AP bringup failed altogether), cpu_data[].phys_proc_id (which is
> what cpu_to_socket() resolves to) can't really be used. The use of
> cpu_to_socket()'s result as an array index cpu_smpboot_free() therefore
> needs guarding, as the function will also be invoked upon AP bringup
> failure, in which case CPU identification may not have run.
>
> Without "x86/CPU: re-work populating of cpu_data[]" [1] the issue is
> less pronounced: The field starts out as zero, then has the BSP value
> (likely again zero) copied into it, and it is properly invalidated only
> in cpu_smpboot_free(). Still it is clearly wrong to use the BSP's socket
> number here.
>
> Making the guard work with and without the above patch applied turns out
> interesting: Prior to that patch, the sole invalidation done is that in
> cpu_smpboot_free(). Upon a later bringup attempt, the fields invalidated
> are overwritten by the BSP values again, though. Hence compare APIC IDs,
> as they cannot validly be the same once CPU identification has run.
>
> [1] https://lists.xen.org/archives/html/xen-devel/2024-02/msg00727.html
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
On 30.06.2025 13:42, Jan Beulich wrote:
> Before CPU identification has run (and it may not have run at all e.g.
> when AP bringup failed altogether), cpu_data[].phys_proc_id (which is
> what cpu_to_socket() resolves to) can't really be used. The use of
> cpu_to_socket()'s result as an array index cpu_smpboot_free() therefore
> needs guarding, as the function will also be invoked upon AP bringup
> failure, in which case CPU identification may not have run.
>
> Without "x86/CPU: re-work populating of cpu_data[]" [1] the issue is
> less pronounced: The field starts out as zero, then has the BSP value
> (likely again zero) copied into it, and it is properly invalidated only
> in cpu_smpboot_free(). Still it is clearly wrong to use the BSP's socket
> number here.
>
> Making the guard work with and without the above patch applied turns out
> interesting: Prior to that patch, the sole invalidation done is that in
> cpu_smpboot_free(). Upon a later bringup attempt, the fields invalidated
> are overwritten by the BSP values again, though. Hence compare APIC IDs,
> as they cannot validly be the same once CPU identification has run.
>
> [1] https://lists.xen.org/archives/html/xen-devel/2024-02/msg00727.html
Fixes: 4824bdfdabeb ("x86: add socket_cpumask")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> Sadly there was no feedback at all yet for the referenced patch.
>
> --- a/xen/arch/x86/smpboot.c
> +++ b/xen/arch/x86/smpboot.c
> @@ -958,7 +958,13 @@ static void cpu_smpboot_free(unsigned in
>      unsigned int socket = cpu_to_socket(cpu);
>      struct cpuinfo_x86 *c = cpu_data;
>
> -    if ( cpumask_empty(socket_cpumask[socket]) )
> +    /*
> +     * We may come here without the CPU having run through CPU identification.
> +     * In that case the socket number cannot be relied upon, but the respective
> +     * socket_cpumask[] slot also wouldn't have been set.
> +     */
> +    if ( c[cpu].apicid != boot_cpu_data.apicid &&
> +         cpumask_empty(socket_cpumask[socket]) )
>      {
>          xfree(socket_cpumask[socket]);
>          socket_cpumask[socket] = NULL;