In core-scheduling mode, Xen might crash when entering the ACPI S5 state.
This happens in sched_slave() during the is_idle_unit(next) check because
next->vcpu_list is stale and points to already freed memory.

This situation arises shortly after scheduler_disable() is called if
some CPU is still inside the sched_slave() softirq. The current logic
simply returns prev->next_task from sched_wait_rendezvous_in(), which
causes the described crash because next_task->vcpu_list has become
invalid.

Fix the crash by returning NULL from sched_wait_rendezvous_in() when
scheduler_disable() has been called.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
CC: Juergen Gross <jgross@suse.com>
CC: Dario Faggioli <dfaggioli@suse.com>
CC: George Dunlap <george.dunlap@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
---
xen/common/sched/core.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 626861a3fe..d4a6489929 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2484,19 +2484,15 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
*lock = pcpu_schedule_lock_irq(cpu);
- if ( unlikely(!scheduler_active) )
- {
- ASSERT(is_idle_unit(prev));
- atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
- prev->rendezvous_in_cnt = 0;
- }
-
/*
* Check for scheduling resource switched. This happens when we are
* moved away from our cpupool and cpus are subject of the idle
* scheduler now.
+ *
+ * This is also a bail out case when scheduler_disable() has been
+ * called.
*/
- if ( unlikely(sr != get_sched_res(cpu)) )
+ if ( unlikely(sr != get_sched_res(cpu) || !scheduler_active) )
{
ASSERT(is_idle_unit(prev));
atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
--
2.17.1
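The control flow this patch establishes can be illustrated outside Xen. The sketch below is a hypothetical userspace model, not Xen code: all model_* names are invented for illustration, locking, RCU and the rendezvous wait loop are omitted, and plain ints stand in for atomic_t. It assumes, as the commit message implies, that the sched_slave() caller bails out when sched_wait_rendezvous_in() returns NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scheduling resource (struct sched_resource). */
struct model_res { int id; };

/* Hypothetical, heavily simplified stand-in for struct sched_unit. */
struct model_unit {
    bool idle;                      /* plays the role of is_idle_unit() */
    struct model_unit *next_task;
    int rendezvous_in_cnt;
    int rendezvous_out_cnt;         /* plain int instead of atomic_t */
};

/* Cleared by the (modelled) scheduler_disable(). */
static bool model_scheduler_active = true;

/*
 * Models the merged bail-out condition: if the scheduling resource was
 * switched or the scheduler was disabled, reset the rendezvous counters
 * and return NULL instead of the possibly stale prev->next_task.
 */
static struct model_unit *model_wait_rendezvous_in(struct model_unit *prev,
                                                   const struct model_res *sr,
                                                   const struct model_res *cur_sr)
{
    if ( sr != cur_sr || !model_scheduler_active )
    {
        assert(prev->idle);
        prev->next_task->rendezvous_out_cnt = 0;
        prev->rendezvous_in_cnt = 0;
        return NULL;
    }

    return prev->next_task;
}

/*
 * Models the caller side (assumed sched_slave() behaviour): returns -1 on
 * bail-out, otherwise 1 or 0 depending on whether next is idle.
 */
static int model_slave(struct model_unit *prev,
                       const struct model_res *sr,
                       const struct model_res *cur_sr)
{
    struct model_unit *next = model_wait_rendezvous_in(prev, sr, cur_sr);

    if ( !next )
        return -1;                  /* never dereference a stale next */

    /* This is where Xen would evaluate is_idle_unit(next). */
    return next->idle ? 1 : 0;
}
```

With model_scheduler_active cleared, the model returns NULL and resets both rendezvous counters, so the caller never reaches the equivalent of is_idle_unit(next) on a unit whose state may already be freed.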
On 09.04.20 11:41, Sergey Dyasli wrote:
> In core-scheduling mode, Xen might crash when entering ACPI S5 state.
> This happens in sched_slave() during is_idle_unit(next) check because
> next->vcpu_list is stale and points to an already freed memory.
>
> This situation happens shortly after scheduler_disable() is called if
> some CPU is still inside sched_slave() softirq. Current logic simply
> returns prev->next_task from sched_wait_rendezvous_in() which causes
> the described crash because next_task->vcpu_list has become invalid.
>
> Fix the crash by returning NULL from sched_wait_rendezvous_in() in
> the case when scheduler_disable() has been called.
>
> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>

Good catch!

Have you seen any further problems (e.g. with cpu on/offlining) with
this patch applied?

Reviewed-by: Juergen Gross <jgross@suse.com>

Juergen
(CC Igor)

On 09/04/2020 13:50, Jürgen Groß wrote:
> On 09.04.20 11:41, Sergey Dyasli wrote:
>> In core-scheduling mode, Xen might crash when entering ACPI S5 state.
>> This happens in sched_slave() during is_idle_unit(next) check because
>> next->vcpu_list is stale and points to an already freed memory.
>>
>> This situation happens shortly after scheduler_disable() is called if
>> some CPU is still inside sched_slave() softirq. Current logic simply
>> returns prev->next_task from sched_wait_rendezvous_in() which causes
>> the described crash because next_task->vcpu_list has become invalid.
>>
>> Fix the crash by returning NULL from sched_wait_rendezvous_in() in
>> the case when scheduler_disable() has been called.
>>
>> Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
>
> Good catch!
>
> Have you seen any further problems (e.g. with cpu on/offlining) with
> this patch applied?

This patch shouldn't affect cpu on/offlining AFAICS. Igor was the one
testing cpu on/offlining, and I think he came to the conclusion that it's
broken even without core-scheduling enabled.

> Reviewed-by: Juergen Gross <jgross@suse.com>

Thanks!

--
Sergey
On Thu, 2020-04-09 at 14:50 +0200, Jürgen Groß wrote:
> On 09.04.20 11:41, Sergey Dyasli wrote:
> > In core-scheduling mode, Xen might crash when entering ACPI S5 state.
> > This happens in sched_slave() during is_idle_unit(next) check because
> > next->vcpu_list is stale and points to an already freed memory.
> >
> > This situation happens shortly after scheduler_disable() is called if
> > some CPU is still inside sched_slave() softirq. Current logic simply
> > returns prev->next_task from sched_wait_rendezvous_in() which causes
> > the described crash because next_task->vcpu_list has become invalid.
> >
> > Fix the crash by returning NULL from sched_wait_rendezvous_in() in
> > the case when scheduler_disable() has been called.
> >
> > Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
>
> Reviewed-by: Juergen Gross <jgross@suse.com>
>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

Thanks and Regards
--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)