[PATCH] x86/x2APIC: correct cluster tracking upon CPUs going down for S3

Jan Beulich posted 1 patch 3 months, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/0a082833-3d52-4c9d-8ea4-5be01986cd2c@suse.com
[PATCH] x86/x2APIC: correct cluster tracking upon CPUs going down for S3
Posted by Jan Beulich 3 months, 2 weeks ago
Downing CPUs for S3 is somewhat special: Since we can expect the system
to come back up in exactly the same hardware configuration, per-CPU data
for the secondary CPUs isn't de-allocated (and then cleared upon re-
allocation when the CPUs are being brought back up). Therefore the
cluster_cpus per-CPU pointer will retain its value for all CPUs other
than the final one in a cluster (i.e. in particular for all CPUs in the
same cluster as CPU0). That, however, is in conflict with the assertion
early in init_apic_ldr_x2apic_cluster().

Note that the issue is avoided on Intel hardware, where we park CPUs
instead of bringing them down.

Extend the bypassing of the freeing to the suspend case, thus making
suspend/resume also a tiny bit faster.

Fixes: 2e6c8f182c9c ("x86: distinguish CPU offlining from CPU removal")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

--- a/xen/arch/x86/genapic/x2apic.c
+++ b/xen/arch/x86/genapic/x2apic.c
@@ -228,7 +228,8 @@ static int cf_check update_clusterinfo(
     case CPU_UP_CANCELED:
     case CPU_DEAD:
     case CPU_REMOVE:
-        if ( park_offline_cpus == (action != CPU_REMOVE) )
+        if ( park_offline_cpus == (action != CPU_REMOVE) ||
+             system_state == SYS_STATE_suspend )
             break;
         if ( per_cpu(cluster_cpus, cpu) )
         {

Re: [PATCH] x86/x2APIC: correct cluster tracking upon CPUs going down for S3
Posted by Andrew Cooper 3 months ago
On 08/08/2024 2:30 pm, Jan Beulich wrote:
> Downing CPUs for S3 is somewhat special: Since we can expect the system
> to come back up in exactly the same hardware configuration, per-CPU data
> for the secondary CPUs isn't de-allocated (and then cleared upon re-
> allocation when the CPUs are being brought back up). Therefore the
> cluster_cpus per-CPU pointer will retain its value for all CPUs other
> than the final one in a cluster (i.e. in particular for all CPUs in the
> same cluster as CPU0). That, however, is in conflict with the assertion
> early in init_apic_ldr_x2apic_cluster().
>
> Note that the issue is avoided on Intel hardware, where we park CPUs
> instead of bringing them down.
>
> Extend the bypassing of the freeing to the suspend case, thus making
> suspend/resume also a tiny bit faster.
>
> Fixes: 2e6c8f182c9c ("x86: distinguish CPU offlining from CPU removal")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>