xen/common/sched/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
With smt=0 during a suspend/resume cycle of the machine the threads
which have been parked before will briefly come up again. This can
result in problems e.g. with cpufreq driver being active as this will
call into get_cpu_idle_time() for a cpu without initialized scheduler
data.
Fix that by letting get_cpu_idle_time() deal with this case.
Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
An alternative way to fix the issue would be to keep the sched_resource
of offline cpus allocated like we already do with idle vcpus and units.
This fix would be more intrusive, but it would avoid similar other bugs
like this one.
---
xen/common/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 6d34764d38..9ac1b01ca8 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -337,7 +337,7 @@ uint64_t get_cpu_idle_time(unsigned int cpu)
struct vcpu_runstate_info state = { 0 };
const struct vcpu *v = idle_vcpu[cpu];
- if ( cpu_online(cpu) && v )
+ if ( cpu_online(cpu) && v && get_sched_res(cpu) )
vcpu_runstate_get(v, &state);
return state.time[RUNSTATE_running];
--
2.26.2
On Wed, 2021-08-18 at 12:21 +0200, Juergen Gross wrote: > Fix that by letting get_cpu_idle_time() deal with this case. > > Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core > scheduling") > Reported-by: Marek Marczykowski-Górecki > <marmarek@invisiblethingslab.com> > Signed-off-by: Juergen Gross <jgross@suse.com> > Tested-by: Marek Marczykowski-Górecki > <marmarek@invisiblethingslab.com> > Mmm... This is an interesting one. In fact, this fix is not only correct, it's also simple, effective and (I guess) easy enough to backport. Considering all these things together with the fact that we have an open issue, I'm going to provide my: Acked-by: Dario Faggioli <dfaggioli@suse.com> (and this remains valid with Jan's proposed change done upon committing.) That said, in the long run... > --- > An alternative way to fix the issue would be to keep the > sched_resource > of offline cpus allocated like we already do with idle vcpus and > units. > This fix would be more intrusive, but it would avoid similar other > bugs > like this one. > ... it would be probably interesting to go this route, as it looks both more consistent and future proof (I mean implement it proactively, independently of issues... when/if we have time, of course!) Thanks and Regards -- Dario Faggioli, Ph.D http://about.me/dario.faggioli Virtualization Software Engineer SUSE Labs, SUSE https://www.suse.com/ ------------------------------------------------------------------- <<This happens because _I_ choose it to happen!>> (Raistlin Majere)
On 18.08.2021 12:21, Juergen Gross wrote: > With smt=0 during a suspend/resume cycle of the machine the threads > which have been parked before will briefly come up again. This can > result in problems e.g. with cpufreq driver being active as this will > call into get_cpu_idle_time() for a cpu without initialized scheduler > data. > > Fix that by letting get_cpu_idle_time() deal with this case. > > Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling") > Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> > Signed-off-by: Juergen Gross <jgross@suse.com> > Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> > --- a/xen/common/sched/core.c > +++ b/xen/common/sched/core.c > @@ -337,7 +337,7 @@ uint64_t get_cpu_idle_time(unsigned int cpu) > struct vcpu_runstate_info state = { 0 }; > const struct vcpu *v = idle_vcpu[cpu]; > > - if ( cpu_online(cpu) && v ) > + if ( cpu_online(cpu) && v && get_sched_res(cpu) ) > vcpu_runstate_get(v, &state); My earlier question was aiming at getting rid of the (now) middle part of the condition; I thought this may be okay to do as a secondary change here. But perhaps you intentionally left it there, so I'm unsure whether to suggest to make the adjustment while committing (awaiting a maintainer ack first anyway). Jan
On 18.08.21 12:35, Jan Beulich wrote: > On 18.08.2021 12:21, Juergen Gross wrote: >> With smt=0 during a suspend/resume cycle of the machine the threads >> which have been parked before will briefly come up again. This can >> result in problems e.g. with cpufreq driver being active as this will >> call into get_cpu_idle_time() for a cpu without initialized scheduler >> data. >> >> Fix that by letting get_cpu_idle_time() deal with this case. >> >> Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling") >> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> >> Signed-off-by: Juergen Gross <jgross@suse.com> >> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> > > Reviewed-by: Jan Beulich <jbeulich@suse.com> > >> --- a/xen/common/sched/core.c >> +++ b/xen/common/sched/core.c >> @@ -337,7 +337,7 @@ uint64_t get_cpu_idle_time(unsigned int cpu) >> struct vcpu_runstate_info state = { 0 }; >> const struct vcpu *v = idle_vcpu[cpu]; >> >> - if ( cpu_online(cpu) && v ) >> + if ( cpu_online(cpu) && v && get_sched_res(cpu) ) >> vcpu_runstate_get(v, &state); > > My earlier question was aiming at getting rid of the (now) middle part > of the condition; I thought this may be okay to do as a secondary change > here. But perhaps you intentionally left it there, so I'm unsure whether > to suggest to make the adjustment while committing (awaiting a > maintainer ack first anyway). Ah, okay. Yes, I think the test of v being non-NULL can be removed. Juergen
© 2016 - 2024 Red Hat, Inc.