xen/common/percpu.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
Invocation of the CPU_UP_PREPARE notification
on ARM64 during resume causes a crash:
(XEN) [  315.807606] Error bringing CPU1 up: -16
(XEN) [  315.811926] Xen BUG at common/cpu.c:258
[...]
(XEN) [  316.142765] Xen call trace:
(XEN) [  316.146048]    [<00000a0000202264>] enable_nonboot_cpus+0x128/0x1ac (PC)
(XEN) [  316.153219]    [<00000a000020225c>] enable_nonboot_cpus+0x120/0x1ac (LR)
(XEN) [  316.160391]    [<00000a0000278180>] suspend.c#system_suspend+0x4c/0x1a0
(XEN) [  316.167476]    [<00000a0000206b70>] domain.c#continue_hypercall_tasklet_handler+0x54/0xd0
(XEN) [  316.176117]    [<00000a0000226538>] tasklet.c#do_tasklet_work+0xb8/0x100
(XEN) [  316.183288]    [<00000a0000226920>] do_tasklet+0x68/0xb0
(XEN) [  316.189077]    [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194
(XEN) [  316.195644]    [<00000a0000277638>] shutdown.c#halt_this_cpu+0/0x14
(XEN) [  316.202383]    [<0000000000000008>] 0000000000000008
Freeing per-CPU areas and setting __per_cpu_offset to INVALID_PERCPU_AREA
only occur when !park_offline_cpus and system_state is not SYS_STATE_suspend.
On ARM64, park_offline_cpus is always false, so setting __per_cpu_offset to
INVALID_PERCPU_AREA depends solely on the system state.
If the system is suspended, this area is not freed, and during resume, an error
occurs in init_percpu_area, causing a crash because INVALID_PERCPU_AREA is not
set and park_offline_cpus remains 0:
    if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
        return park_offline_cpus ? 0 : -EBUSY;
It appears that the same crash can occur on x86 if park_offline_cpus is set
to 0 during Xen suspend.
Signed-off-by: Mykyta Poturai <mykyta_poturai@epam.com>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Discussion related to these changes can be found here:
https://marc.info/?l=xen-devel&m=174116563705295&w=2
---
 xen/common/percpu.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/xen/common/percpu.c b/xen/common/percpu.c
index e4e8b7bcab..607557436d 100644
--- a/xen/common/percpu.c
+++ b/xen/common/percpu.c
@@ -30,7 +30,12 @@ static int init_percpu_area(unsigned int cpu)
     char *p;
 
     if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
-        return park_offline_cpus ? 0 : -EBUSY;
+    {
+        if ( system_state == SYS_STATE_resume )
+            return 0;
+        else
+            return park_offline_cpus ? 0 : -EBUSY;
+    }
 
     if ( (p = alloc_xenheap_pages(PERCPU_ORDER, 0)) == NULL )
         return -ENOMEM;
-- 
2.43.0On 27.03.2025 06:22, Mykola Kvach wrote: > Invocation of the CPU_UP_PREPARE notification > on ARM64 during resume causes a crash: > > (XEN) [ 315.807606] Error bringing CPU1 up: -16 > (XEN) [ 315.811926] Xen BUG at common/cpu.c:258 > [...] > (XEN) [ 316.142765] Xen call trace: > (XEN) [ 316.146048] [<00000a0000202264>] enable_nonboot_cpus+0x128/0x1ac (PC) > (XEN) [ 316.153219] [<00000a000020225c>] enable_nonboot_cpus+0x120/0x1ac (LR) > (XEN) [ 316.160391] [<00000a0000278180>] suspend.c#system_suspend+0x4c/0x1a0 > (XEN) [ 316.167476] [<00000a0000206b70>] domain.c#continue_hypercall_tasklet_handler+0x54/0xd0 > (XEN) [ 316.176117] [<00000a0000226538>] tasklet.c#do_tasklet_work+0xb8/0x100 > (XEN) [ 316.183288] [<00000a0000226920>] do_tasklet+0x68/0xb0 > (XEN) [ 316.189077] [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194 > (XEN) [ 316.195644] [<00000a0000277638>] shutdown.c#halt_this_cpu+0/0x14 > (XEN) [ 316.202383] [<0000000000000008>] 0000000000000008 > > Freeing per-CPU areas and setting __per_cpu_offset to INVALID_PERCPU_AREA > only occur when !park_offline_cpus and system_state is not SYS_STATE_suspend. > On ARM64, park_offline_cpus is always false, so setting __per_cpu_offset to > INVALID_PERCPU_AREA depends solely on the system state. > > If the system is suspended, this area is not freed, and during resume, an error > occurs in init_percpu_area, causing a crash because INVALID_PERCPU_AREA is not > set and park_offline_cpus remains 0: > > if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA ) > return park_offline_cpus ? 0 : -EBUSY; > > It appears that the same crash can occur on x86 if park_offline_cpus is set > to 0 during Xen suspend. Oh, also - it's not "appears"; iirc Marek meanwhile confirmed the misbehavior on x86 AMD hardware. Jan
On Thu, Mar 27, 2025 at 10:27 AM Jan Beulich <jbeulich@suse.com> wrote: > > On 27.03.2025 06:22, Mykola Kvach wrote: > > Invocation of the CPU_UP_PREPARE notification > > on ARM64 during resume causes a crash: > > > > (XEN) [ 315.807606] Error bringing CPU1 up: -16 > > (XEN) [ 315.811926] Xen BUG at common/cpu.c:258 > > [...] > > (XEN) [ 316.142765] Xen call trace: > > (XEN) [ 316.146048] [<00000a0000202264>] enable_nonboot_cpus+0x128/0x1ac (PC) > > (XEN) [ 316.153219] [<00000a000020225c>] enable_nonboot_cpus+0x120/0x1ac (LR) > > (XEN) [ 316.160391] [<00000a0000278180>] suspend.c#system_suspend+0x4c/0x1a0 > > (XEN) [ 316.167476] [<00000a0000206b70>] domain.c#continue_hypercall_tasklet_handler+0x54/0xd0 > > (XEN) [ 316.176117] [<00000a0000226538>] tasklet.c#do_tasklet_work+0xb8/0x100 > > (XEN) [ 316.183288] [<00000a0000226920>] do_tasklet+0x68/0xb0 > > (XEN) [ 316.189077] [<00000a000026e120>] domain.c#idle_loop+0x7c/0x194 > > (XEN) [ 316.195644] [<00000a0000277638>] shutdown.c#halt_this_cpu+0/0x14 > > (XEN) [ 316.202383] [<0000000000000008>] 0000000000000008 > > > > Freeing per-CPU areas and setting __per_cpu_offset to INVALID_PERCPU_AREA > > only occur when !park_offline_cpus and system_state is not SYS_STATE_suspend. > > On ARM64, park_offline_cpus is always false, so setting __per_cpu_offset to > > INVALID_PERCPU_AREA depends solely on the system state. > > > > If the system is suspended, this area is not freed, and during resume, an error > > occurs in init_percpu_area, causing a crash because INVALID_PERCPU_AREA is not > > set and park_offline_cpus remains 0: > > > > if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA ) > > return park_offline_cpus ? 0 : -EBUSY; > > > > It appears that the same crash can occur on x86 if park_offline_cpus is set > > to 0 during Xen suspend. > > Oh, also - it's not "appears"; iirc Marek meanwhile confirmed the misbehavior on > x86 AMD hardware. I've updated this line in the commit message. > > Jan ~Mykola
On 27.03.2025 06:22, Mykola Kvach wrote:
> --- a/xen/common/percpu.c
> +++ b/xen/common/percpu.c
> @@ -30,7 +30,12 @@ static int init_percpu_area(unsigned int cpu)
>      char *p;
>  
>      if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
> -        return park_offline_cpus ? 0 : -EBUSY;
> +    {
> +        if ( system_state == SYS_STATE_resume )
> +            return 0;
> +        else
> +            return park_offline_cpus ? 0 : -EBUSY;
> +    }
Hmm, why not the much simpler
        return park_offline_cpus || system_state == SYS_STATE_resume
               ? 0
               : -EBUSY;
Even if not for whatever reason, I'd really like to ask to omit such an
unnecessary "else".
Preferably with the adjustment (which I'd be happy to make while
committing) and with the (iirc) previously suggested Fixes: tag:
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan
                
            Hi,
On Thu, Mar 27, 2025 at 10:25 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 27.03.2025 06:22, Mykola Kvach wrote:
> > --- a/xen/common/percpu.c
> > +++ b/xen/common/percpu.c
> > @@ -30,7 +30,12 @@ static int init_percpu_area(unsigned int cpu)
> >      char *p;
> >
> >      if ( __per_cpu_offset[cpu] != INVALID_PERCPU_AREA )
> > -        return park_offline_cpus ? 0 : -EBUSY;
> > +    {
> > +        if ( system_state == SYS_STATE_resume )
> > +            return 0;
> > +        else
> > +            return park_offline_cpus ? 0 : -EBUSY;
> > +    }
>
> Hmm, why not the much simpler
>
>         return park_offline_cpus || system_state == SYS_STATE_resume
>                ? 0
>                : -EBUSY;
>
> Even if not for whatever reason, I'd really like to ask to omit such an
> unnecessary "else".
Done
>
> Preferably with the adjustment (which I'd be happy to make while
> committing) and with the (iirc) previously suggested Fixes: tag:
Done
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Added
>
> Jan
Best regards,
Mykola
                
            © 2016 - 2025 Red Hat, Inc.