Currently, the throttle_thread_scheduled flag is reset back to 0 before
sleeping (as part of the throttling logic). Given that throttle_timer
(well, any timer) may tick with a slight delay, it so happens that under
heavy throttling (i.e. close to or at CPU_THROTTLE_PCT_MAX) the tick may
schedule a further cpu_throttle_thread() work item after the flag reset,
but before the previous sleep has completed. This results in the vCPU
thread sleeping continuously for potentially several seconds in a row.

The chances of that happening can be drastically minimised by resetting
the flag after the sleep.

Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
Signed-off-by: Malcolm Crossley <malcolm@nutanix.com>
---
cpus.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/cpus.c b/cpus.c
index 516e5cb..f42eebd 100644
--- a/cpus.c
+++ b/cpus.c
@@ -677,9 +677,9 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
     sleeptime_ns = (long)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS);
 
     qemu_mutex_unlock_iothread();
-    atomic_set(&cpu->throttle_thread_scheduled, 0);
     g_usleep(sleeptime_ns / 1000); /* Convert ns to us for usleep call */
     qemu_mutex_lock_iothread();
+    atomic_set(&cpu->throttle_thread_scheduled, 0);
 }
 
 static void cpu_throttle_timer_tick(void *opaque)
--
1.9.5
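
For context on where the race comes from: the flag moved by this patch is
set and tested on the timer side of cpus.c. The sketch below is a
paraphrase of that function as it looked around this time, reconstructed
from memory rather than quoted verbatim, so treat the details as
approximate.

static void cpu_throttle_timer_tick(void *opaque)
{
    CPUState *cpu;
    double pct;

    /* Stop the timer if the throttle has been switched off. */
    if (!cpu_throttle_get_percentage()) {
        return;
    }

    CPU_FOREACH(cpu) {
        /* Only queue new throttle work if the previous item finished. */
        if (!atomic_xchg(&cpu->throttle_thread_scheduled, 1)) {
            async_run_on_cpu(cpu, cpu_throttle_thread, RUN_ON_CPU_NULL);
        }
    }

    /* Re-arm the timer for the next timeslice. */
    pct = (double)cpu_throttle_get_percentage() / 100;
    timer_mod(throttle_timer,
              qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
              CPU_THROTTLE_TIMESLICE_NS / (1 - pct));
}

Because the flag was cleared before the g_usleep(), a tick that fired
slightly late could pass the atomic_xchg() test and queue a second work
item while the first sleep was still in progress, so the two sleeps ran
back to back. With the patch, a tick that fires during the sleep finds
the flag still set and skips queueing instead, which is the trade-off
Paolo raises further down the thread (a skipped sleep rather than a
doubled one).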
On 05/19/2017 05:29 PM, Felipe Franciosi wrote:
[...]

This seems to make sense to me.

Acked-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>

I'm CC'ing Juan, Amit and David as they are all active in the migration
area and may have opinions on this. Juan and David were also reviewers
for the original series.

--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)
* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
[...]
> This seems to make sense to me.
>
> Acked-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
>
> I'm CC'ing Juan, Amit and David as they are all active in the migration
> area and may have opinions on this. Juan and David were also reviewers
> for the original series.

The description is interesting and sounds reasonable; it'll be
interesting to see what difference it makes to the autoconverge
behaviour for those workloads that need this level of throttle.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> On 1 Jun 2017, at 15:36, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
[...]
> The description is interesting and sounds reasonable; it'll be
> interesting to see what difference it makes to the autoconverge
> behaviour for those workloads that need this level of throttle.

To get some hard data, we wrote a little application that:
1) spawns multiple threads (one per vCPU)
2) each thread mmap()s+mlock()s a certain workset (eg. 30GB/#threads for a 32GB VM)
3) each thread writes a word to the beginning of every page in a tight loop
4) the parent thread periodically reports the number of dirtied pages

Even on a dedicated 10G link, that is pretty much guaranteed to require
99% throttle to converge.

Before the patch, Qemu migrates the VM (depicted above) fairly quickly
(~40s) after reaching 99% throttle. The application reported lockups of a
few seconds at a time, which we initially thought were just that thread
not running between Qemu-induced vCPU sleeps (and later attributed to the
reported bug).

Then we used a 1G link. This time, the migration had to run for a lot
longer even at 99%. That made the bug more likely to happen and we
observed soft lockups (reported by the guest's kernel on the console) of
70+ seconds.

Using the patch, and back on a 10G link, the migration completes after a
few more iterations than before (took just under 2mins after reaching
99%).

If you want further validation of the bug, instrumenting
cpus-common.c:process_queued_cpu_work() could be done to show that
cpu_throttle_thread() is running back-to-back under these cases.

In summary we believe this patch is immediately required to prevent the
lockups. A more elaborate throttling solution should be considered as
future work. Perhaps a per-vCPU timer which throttles more precisely or a
new convergence design altogether.

Thanks,
Felipe
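
A minimal sketch of the kind of dirtying workload described above might
look like the following (thread count, per-thread workset size and the
one-second reporting interval are made-up illustrative values, and error
handling is elided):

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define NTHREADS   4                  /* one per vCPU (assumed) */
#define PER_THREAD (1UL << 30)        /* 1 GiB workset per thread */
#define PAGE_SZ    4096UL

static atomic_ulong dirtied[NTHREADS];

static void *dirty_loop(void *arg)
{
    unsigned long id = (unsigned long)(uintptr_t)arg;
    char *buf = mmap(NULL, PER_THREAD, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* Keep the workset resident (may need a raised RLIMIT_MEMLOCK). */
    mlock(buf, PER_THREAD);

    for (;;) {
        /* Write one word at the start of every page, in a tight loop. */
        for (unsigned long off = 0; off < PER_THREAD; off += PAGE_SZ) {
            *(volatile long *)(buf + off) = (long)off;
            atomic_fetch_add(&dirtied[id], 1);
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    for (unsigned long i = 0; i < NTHREADS; i++) {
        pthread_create(&tid[i], NULL, dirty_loop, (void *)(uintptr_t)i);
    }

    /* Parent periodically reports how many pages were dirtied. */
    for (;;) {
        unsigned long total = 0;
        sleep(1);
        for (int i = 0; i < NTHREADS; i++) {
            total += atomic_exchange(&dirtied[i], 0);
        }
        printf("dirtied %lu pages/s\n", total);
    }
    return 0;
}

Every pass re-dirties each resident page, so the dirty-page rate stays
well above what the migration stream can transfer, which is what forces
auto-converge all the way up to 99% throttle.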
* Felipe Franciosi (felipe@nutanix.com) wrote:
[...]
> If you want further validation of the bug, instrumenting
> cpus-common.c:process_queued_cpu_work() could be done to show that
> cpu_throttle_thread() is running back-to-back under these cases.

OK, that's reasonable.

> In summary we believe this patch is immediately required to prevent the
> lockups.

Yes, agreed.

> A more elaborate throttling solution should be considered as future
> work. Perhaps a per-vCPU timer which throttles more precisely or a new
> convergence design altogether.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 19/05/2017 23:29, Felipe Franciosi wrote:
[...]
> The chances of that happening can be drastically minimised by resetting
> the flag after the sleep.

True, on the other hand this may also increase the chance of not
sleeping at all.

How overcommitted was the host system?

Paolo
> On 25 May 2017, at 16:52, Paolo Bonzini <pbonzini@redhat.com> wrote:
[...]
> True, on the other hand this may also increase the chance of not
> sleeping at all.

The perfect solution (for this throttling strategy) would probably be a
per-cpu timer. In the meantime, I think avoiding massive sleeps is a win.
We observed stalls in excess of 70 secs at 99% throttle.

> How overcommitted was the host system?

Not overcommitted at all. And it's quite easy to reproduce. All you need
is a workload heavy enough to prevent the migration from converging (or a
slow network which you can emulate with a qdisc).

With a Linux guest, you should quickly see soft lockups being reported.
With Windows, probably BSODs.

Thanks,
Felipe
On 25/05/2017 18:25, Felipe Franciosi wrote:
> The perfect solution (for this throttling strategy) would probably be
> a per-cpu timer. In the meantime, I think avoiding massive sleeps is
> a win. We observed stalls in excess of 70 secs at 99% throttle.

Ah, so the issue is not overcommit, it's too high throttling. Then it
makes sense.

Thanks,

Paolo

[...]
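
To put rough numbers on "too high throttling" (assuming the 10ms
CPU_THROTTLE_TIMESLICE_NS used by cpus.c at the time): at 99% the code
computes throttle_ratio = 0.99 / 0.01 = 99, so a single work item sleeps
roughly 99 * 10ms = 990ms, while the throttle timer only re-arms every
10ms / (1 - 0.99) = 1s. A tick that fires just slightly late can
therefore queue the next ~1s sleep before the current one has returned,
and repeated occurrences chain into the multi-second stalls reported
earlier in the thread.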
Felipe Franciosi <felipe@nutanix.com> wrote:
[...]
> Signed-off-by: Felipe Franciosi <felipe@nutanix.com>
> Signed-off-by: Malcolm Crossley <malcolm@nutanix.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

Paolo, I think that the analysis makes sense.

Should you pull this patch, or do you want me to pull it?

Thanks, Juan.
On 07/06/2017 18:26, Juan Quintela wrote:
[...]
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> Paolo, I think that the analysis makes sense.
>
> Should you pull this patch, or do you want me to pull it?

I've already included it in my jinxed (now at v6) pull request.

Paolo