Currently, the throttle_thread_scheduled flag is reset back to 0 before
sleeping (as part of the throttling logic). Given that throttle_timer
(well, any timer) may tick with a slight delay, it so happens that under
heavy throttling (i.e. close to or at CPU_THROTTLE_PCT_MAX) the tick may
schedule a further cpu_throttle_thread() work item after the flag reset,
but before the previous sleep has completed. This results in the vCPU
thread sleeping continuously for potentially several seconds in a row.

The chances of that happening can be drastically minimised by resetting
the flag after the sleep.

Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Signed-off-by: Malcolm Crossley <malcolm@nutanix.com>
---
cpus.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/cpus.c b/cpus.c
index 516e5cb..f42eebd 100644
--- a/cpus.c
+++ b/cpus.c
@@ -677,9 +677,9 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
     sleeptime_ns = (long)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS);
 
     qemu_mutex_unlock_iothread();
-    atomic_set(&cpu->throttle_thread_scheduled, 0);
     g_usleep(sleeptime_ns / 1000); /* Convert ns to us for usleep call */
     qemu_mutex_lock_iothread();
+    atomic_set(&cpu->throttle_thread_scheduled, 0);
 }
 
 static void cpu_throttle_timer_tick(void *opaque)
--
1.9.5
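
For context on where the race comes from: the flag moved by this patch is
set and tested on the timer side of cpus.c. The sketch below is a
paraphrase of that function as it looked around this time, reconstructed
from memory rather than quoted verbatim, so treat the details as
approximate.

static void cpu_throttle_timer_tick(void *opaque)
{
    CPUState *cpu;
    double pct;

    /* Stop the timer if the throttle has been switched off. */
    if (!cpu_throttle_get_percentage()) {
        return;
    }

    CPU_FOREACH(cpu) {
        /* Only queue new throttle work if the previous item finished. */
        if (!atomic_xchg(&cpu->throttle_thread_scheduled, 1)) {
            async_run_on_cpu(cpu, cpu_throttle_thread, RUN_ON_CPU_NULL);
        }
    }

    /* Re-arm the timer for the next timeslice. */
    pct = (double)cpu_throttle_get_percentage() / 100;
    timer_mod(throttle_timer,
              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
              CPU_THROTTLE_TIMESLICE_NS / (1 - pct));
}

Because the flag was cleared before the g_usleep(), a tick that fired
slightly late could pass the atomic_xchg() test and queue a second work
item while the first sleep was still in progress, so the two sleeps ran
back to back. With the patch, a tick that fires during the sleep finds
the flag still set and skips queueing instead, which is the trade-off
Paolo raises further down the thread (a skipped sleep rather than a
doubled one).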
On 05/19/2017 05:29 PM, Felipe Franciosi wrote:
[...]

This seems to make sense to me.

Acked-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>

I'm CC'ing Juan, Amit and David as they are all active in the migration
area and may have opinions on this. Juan and David were also reviewers
for the original series.

--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
[...]
> This seems to make sense to me.
>
> Acked-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
>
> I'm CC'ing Juan, Amit and David as they are all active in the migration
> area and may have opinions on this. Juan and David were also reviewers
> for the original series.

The description is interesting and sounds reasonable; it'll be
interesting to see what difference it makes to the autoconverge
behaviour for those workloads that need this level of throttle.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> On 1 Jun 2017, at 15:36, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
[...]
> The description is interesting and sounds reasonable; it'll be
> interesting to see what difference it makes to the autoconverge
> behaviour for those workloads that need this level of throttle.

To get some hard data, we wrote a little application that:
1) spawns multiple threads (one per vCPU)
2) each thread mmap()s+mlock()s a certain workset (eg. 30GB/#threads for a 32GB VM)
3) each thread writes a word to the beginning of every page in a tight loop
4) the parent thread periodically reports the number of dirtied pages

Even on a dedicated 10G link, that is pretty much guaranteed to require
99% throttle to converge.

Before the patch, Qemu migrates the VM (depicted above) fairly quickly
(~40s) after reaching 99% throttle. The application reported lockups of a
few seconds at a time, which we initially thought were just that thread
not running between Qemu-induced vCPU sleeps (and later attributed to the
reported bug).

Then we used a 1G link. This time, the migration had to run for a lot
longer even at 99%. That made the bug more likely to happen and we
observed soft lockups (reported by the guest's kernel on the console) of
70+ seconds.

Using the patch, and back on a 10G link, the migration completes after a
few more iterations than before (took just under 2mins after reaching
99%).

If you want further validation of the bug, instrumenting
cpus-common.c:process_queued_cpu_work() could be done to show that
cpu_throttle_thread() is running back-to-back under these cases.

In summary we believe this patch is immediately required to prevent the
lockups. A more elaborate throttling solution should be considered as
future work. Perhaps a per-vCPU timer which throttles more precisely or a
new convergence design altogether.

Thanks,
Felipe
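
A minimal sketch of the kind of dirtying workload described above might
look like the following (thread count, per-thread workset size and the
one-second reporting interval are made-up illustrative values, and error
handling is elided):

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define NTHREADS   4                  /* one per vCPU (assumed) */
#define PER_THREAD (1UL << 30)        /* 1 GiB workset per thread */
#define PAGE_SZ    4096UL

static atomic_ulong dirtied[NTHREADS];

static void *dirty_loop(void *arg)
{
    unsigned long id = (unsigned long)(uintptr_t)arg;
    char *buf = mmap(NULL, PER_THREAD, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* Keep the workset resident (may need a raised RLIMIT_MEMLOCK). */
    mlock(buf, PER_THREAD);

    for (;;) {
        /* Write one word at the start of every page, in a tight loop. */
        for (unsigned long off = 0; off < PER_THREAD; off += PAGE_SZ) {
            *(volatile long *)(buf + off) = (long)off;
            atomic_fetch_add(&dirtied[id], 1);
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    for (unsigned long i = 0; i < NTHREADS; i++) {
        pthread_create(&tid[i], NULL, dirty_loop, (void *)(uintptr_t)i);
    }

    /* Parent periodically reports how many pages were dirtied. */
    for (;;) {
        unsigned long total = 0;
        sleep(1);
        for (int i = 0; i < NTHREADS; i++) {
            total += atomic_exchange(&dirtied[i], 0);
        }
        printf("dirtied %lu pages/s\n", total);
    }
    return 0;
}

Every pass re-dirties each resident page, so the dirty-page rate stays
well above what the migration stream can transfer, which is what forces
auto-converge all the way up to 99% throttle.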
* Felipe Franciosi (felipe@nutanix.com) wrote:
[...]
> If you want further validation of the bug, instrumenting
> cpus-common.c:process_queued_cpu_work() could be done to show that
> cpu_throttle_thread() is running back-to-back under these cases.

OK, that's reasonable.

> In summary we believe this patch is immediately required to prevent the
> lockups.

Yes, agreed.

> A more elaborate throttling solution should be considered as future
> work. Perhaps a per-vCPU timer which throttles more precisely or a new
> convergence design altogether.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 19/05/2017 23:29, Felipe Franciosi wrote:
[...]
> The chances of that happening can be drastically minimised by resetting
> the flag after the sleep.

True, on the other hand this may also increase the chance of not
sleeping at all.

How overcommitted was the host system?

Paolo
> On 25 May 2017, at 16:52, Paolo Bonzini <pbonzini@redhat.com> wrote:
[...]
> True, on the other hand this may also increase the chance of not
> sleeping at all.

The perfect solution (for this throttling strategy) would probably be a
per-cpu timer. In the meantime, I think avoiding massive sleeps is a win.
We observed stalls in excess of 70 secs at 99% throttle.

> How overcommitted was the host system?

Not overcommitted at all. And it's quite easy to reproduce. All you need
is a workload heavy enough to prevent the migration from converging (or a
slow network which you can emulate with a qdisc).

With a Linux guest, you should quickly see soft lockups being reported.
With Windows, probably BSODs.

Thanks,
Felipe
On 25/05/2017 18:25, Felipe Franciosi wrote:
> The perfect solution (for this throttling strategy) would probably be
> a per-cpu timer. In the meantime, I think avoiding massive sleeps is
> a win. We observed stalls in excess of 70 secs at 99% throttle.

Ah, so the issue is not overcommit, it's too high throttling. Then it
makes sense.

Thanks,

Paolo

[...]
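
To put rough numbers on "too high throttling" (assuming the 10ms
CPU_THROTTLE_TIMESLICE_NS used by cpus.c at the time): at 99% the code
computes throttle_ratio = 0.99 / 0.01 = 99, so a single work item sleeps
roughly 99 * 10ms = 990ms, while the throttle timer only re-arms every
10ms / (1 - 0.99) = 1s. A tick that fires just slightly late can
therefore queue the next ~1s sleep before the current one has returned,
and repeated occurrences chain into the multi-second stalls reported
earlier in the thread.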
Felipe Franciosi <felipe@nutanix.com> wrote:
[...]
> Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
> Signed-off-by: Malcolm Crossley <malcolm@nutanix.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Paolo, I think that the analysis makes sense.

Should you pull this patch, or do you want me to pull it?

Thanks, Juan.
On 07/06/2017 18:26, Juan Quintela wrote:
[...]
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> Paolo, I think that the analysis makes sense.
>
> Should you pull this patch, or do you want me to pull it?

I've already included it in my jinxed (now at v6) pull request.

Paolo