[Xen-devel] [PATCH v4] x86/cpu: Sync any remaining RCU callbacks before CPU up/down

Igor Druzhinin posted 1 patch 4 years, 1 month ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/1583863578-18063-1-git-send-email-igor.druzhinin@citrix.com
xen/arch/x86/acpi/power.c |  1 -
xen/arch/x86/sysctl.c     | 10 ++--------
xen/common/cpu.c          |  2 ++
3 files changed, 4 insertions(+), 9 deletions(-)
[Xen-devel] [PATCH v4] x86/cpu: Sync any remaining RCU callbacks before CPU up/down
Posted by Igor Druzhinin 4 years, 1 month ago
During CPU down operation RCU callbacks are scheduled to finish
off some actions later as soon as CPU is fully dead (the same applies
to CPU up operation in case error path is taken). If in the same grace
period another CPU up operation is performed on the same CPU, RCU callback
will be called later on a CPU in a potentially wrong (already up again
instead of still being down) state leading to eventual state inconsistency
and/or crash.

In order to avoid it - flush RCU callbacks explicitly before starting the
next CPU up/down operation.

Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
This got discovered trying to resume PV shim with multiple vCPUs on AMD
machine (where park_offline_cpus == 0). RCU callback responsible for
freeing percpu area on CPU offline got finally called after CPU went
online again as the guest performed regular vCPU offline/online operations
on resume.

Note: this patch requires RCU series v4 from Juergen to be applied -
https://lists.xenproject.org/archives/html/xen-devel/2020-03/msg00668.html

v2: changed rcu_barrier() position, updated description
v3: moved rcu_barrier() to common cpu_up/cpu_down code to cover more cases
v4: kept existing comments in modified form
---
 xen/arch/x86/acpi/power.c |  1 -
 xen/arch/x86/sysctl.c     | 10 ++--------
 xen/common/cpu.c          |  2 ++
 3 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index b5df00b..847c273 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -305,7 +305,6 @@ static int enter_state(u32 state)
     cpufreq_add_cpu(0);
 
  enable_cpu:
-    rcu_barrier();
     mtrr_aps_sync_begin();
     enable_nonboot_cpus();
     mtrr_aps_sync_end();
diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index a95923e..b0cb1b5 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -84,12 +84,9 @@ long cpu_up_helper(void *data)
     unsigned int cpu = (unsigned long)data;
     int ret = cpu_up(cpu);
 
+    /* Have one more go on EBUSY. */
     if ( ret == -EBUSY )
-    {
-        /* On EBUSY, flush RCU work and have one more go. */
-        rcu_barrier();
         ret = cpu_up(cpu);
-    }
 
     if ( !ret && !opt_smt &&
          cpu_data[cpu].compute_unit_id == INVALID_CUID &&
@@ -109,12 +106,9 @@ long cpu_down_helper(void *data)
 {
     int cpu = (unsigned long)data;
     int ret = cpu_down(cpu);
+    /* Have one more go on EBUSY. */
     if ( ret == -EBUSY )
-    {
-        /* On EBUSY, flush RCU work and have one more go. */
-        rcu_barrier();
         ret = cpu_down(cpu);
-    }
     return ret;
 }
 
diff --git a/xen/common/cpu.c b/xen/common/cpu.c
index 31953f3..1f976db 100644
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -4,6 +4,7 @@
 #include <xen/init.h>
 #include <xen/sched.h>
 #include <xen/stop_machine.h>
+#include <xen/rcupdate.h>
 
 unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
 #ifndef nr_cpumask_bits
@@ -53,6 +54,7 @@ void put_cpu_maps(void)
 
 void cpu_hotplug_begin(void)
 {
+    rcu_barrier();
     write_lock(&cpu_add_remove_lock);
 }
 
-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4] x86/cpu: Sync any remaining RCU callbacks before CPU up/down
Posted by Jan Beulich 4 years, 1 month ago
On 10.03.2020 19:06, Igor Druzhinin wrote:
> During CPU down operation RCU callbacks are scheduled to finish
> off some actions later as soon as CPU is fully dead (the same applies
> to CPU up operation in case error path is taken). If in the same grace
> period another CPU up operation is performed on the same CPU, RCU callback
> will be called later on a CPU in a potentially wrong (already up again
> instead of still being down) state leading to eventual state inconsistency
> and/or crash.
> 
> In order to avoid it - flush RCU callbacks explicitly before starting the
> next CPU up/down operation.
> 
> Reviewed-by: Juergen Gross <jgross@suse.com>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

(Nit: These preferably would come in inverse order.)

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4] x86/cpu: Sync any remaining RCU callbacks before CPU up/down
Posted by Jan Beulich 4 years, 1 month ago
On 10.03.2020 19:06, Igor Druzhinin wrote:
> During CPU down operation RCU callbacks are scheduled to finish
> off some actions later as soon as CPU is fully dead (the same applies
> to CPU up operation in case error path is taken). If in the same grace
> period another CPU up operation is performed on the same CPU, RCU callback
> will be called later on a CPU in a potentially wrong (already up again
> instead of still being down) state leading to eventual state inconsistency
> and/or crash.
> 
> In order to avoid it - flush RCU callbacks explicitly before starting the
> next CPU up/down operation.
> 
> Reviewed-by: Juergen Gross <jgross@suse.com>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---
> This got discovered trying to resume PV shim with multiple vCPUs on AMD
> machine (where park_offline_cpus == 0). RCU callback responsible for
> freeing percpu area on CPU offline got finally called after CPU went
> online again as the guest performed regular vCPU offline/online operations
> on resume.
> 
> Note: this patch requires RCU series v4 from Juergen to be applied -
> https://lists.xenproject.org/archives/html/xen-devel/2020-03/msg00668.html

I was about to apply the patch yesterday (I think) when I stumbled
across this note. Is this actually still true? If so, would you
mind helping me see the dependency?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v4] x86/cpu: Sync any remaining RCU callbacks before CPU up/down
Posted by Igor Druzhinin 4 years, 1 month ago
On 13/03/2020 11:23, Jan Beulich wrote:
> On 10.03.2020 19:06, Igor Druzhinin wrote:
>> During CPU down operation RCU callbacks are scheduled to finish
>> off some actions later as soon as CPU is fully dead (the same applies
>> to CPU up operation in case error path is taken). If in the same grace
>> period another CPU up operation is performed on the same CPU, RCU callback
>> will be called later on a CPU in a potentially wrong (already up again
>> instead of still being down) state leading to eventual state inconsistency
>> and/or crash.
>>
>> In order to avoid it - flush RCU callbacks explicitly before starting the
>> next CPU up/down operation.
>>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
>> ---
>> This got discovered trying to resume PV shim with multiple vCPUs on AMD
>> machine (where park_offline_cpus == 0). RCU callback responsible for
>> freeing percpu area on CPU offline got finally called after CPU went
>> online again as the guest performed regular vCPU offline/online operations
>> on resume.
>>
>> Note: this patch requires RCU series v4 from Juergen to be applied -
>> https://lists.xenproject.org/archives/html/xen-devel/2020-03/msg00668.html
> 
> I was about to apply the patch yesterday (I think) when I stumbled
> across this note. Is this actually still true? If so, would you
> mind helping me see the dependency?

Yes, that's the case otherwise you're risking to crash near 100% of
installations as rcu_barrirer without Juergen's fixes is simply broken.

Igor

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel