[PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount

Jamie Iles posted 2 patches 1 year ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20230427020925.51003-1-quic._5Fjiles@quicinc.com
Maintainers: Richard Henderson <richard.henderson@linaro.org>, Paolo Bonzini <pbonzini@redhat.com>, Laurent Vivier <laurent@vivier.eu>, Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras <leobras@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>
[PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
Posted by Jamie Iles 1 year ago
From: Jamie Iles <jiles@qti.qualcomm.com>

The round-robin scheduler will iterate over the CPU list with an
assigned budget until the next timer expiry and may exit early because
of a TB exit.  This is fine under normal operation, but with icount
enabled and SMP it is possible for a CPU to be starved of run time and
for the system to live-lock.

For example, booting a riscv64 platform with '-icount
shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
has timers enabled and starts performing TLB shootdowns.  In this case
we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
1.  As we enter the TCG loop, we assign the icount budget up to the
next timer interrupt to CPU 0 and begin executing; the guest sits in a
busy loop, exhausting all of the budget before we try to execute CPU 1,
the target of the IPI.  CPU 1 is left with no budget with which to
execute, and the process repeats.

We add some fairness here by splitting the budget evenly across all of
the CPUs on the thread before entering each one.  The CPU count is
cached, keyed on the CPU list generation ID, to avoid iterating the
list on every loop iteration.  With this change it is possible to boot
an SMP rv64 guest with icount enabled and no hangs.

New in v3 (address feedback from Richard Henderson):
 - Additional patch to use QEMU_LOCK_GUARD with qemu_cpu_list_lock where 
   appropriate
 - Move rr_cpu_count() call to be conditional on icount_enabled()
 - Initialize cpu_budget to 0

Jamie Iles (2):
  cpu: expose qemu_cpu_list_lock for lock-guard use
  accel/tcg/tcg-accel-ops-rr: ensure fairness with icount

 accel/tcg/tcg-accel-ops-icount.c | 17 +++++++++++++--
 accel/tcg/tcg-accel-ops-icount.h |  3 ++-
 accel/tcg/tcg-accel-ops-rr.c     | 37 +++++++++++++++++++++++++++++++-
 cpus-common.c                    |  2 +-
 include/exec/cpu-common.h        |  1 +
 linux-user/elfload.c             | 12 +++++------
 migration/dirtyrate.c            | 26 +++++++++++-----------
 trace/control-target.c           |  9 ++++----
 8 files changed, 78 insertions(+), 29 deletions(-)

-- 
2.25.1
Re: [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
Posted by Richard Henderson 1 year ago
On 4/27/23 03:09, Jamie Iles wrote:
> From: Jamie Iles <jiles@qti.qualcomm.com>
> 
> The round-robin scheduler will iterate over the CPU list with an
> assigned budget until the next timer expiry and may exit early because
> of a TB exit.  This is fine under normal operation but with icount
> enabled and SMP it is possible for a CPU to be starved of run time and
> the system live-locks.
> 
> For example, booting a riscv64 platform with '-icount
> shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
> has timers enabled and starts performing TLB shootdowns.  In this case
> we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
> 1.  As we enter the TCG loop, we assign the icount budget to next timer
> interrupt to CPU 0 and begin executing where the guest is sat in a busy
> loop exhausting all of the budget before we try to execute CPU 1 which
> is the target of the IPI but CPU 1 is left with no budget with which to
> execute and the process repeats.
> 
> We try here to add some fairness by splitting the budget across all of
> the CPUs on the thread fairly before entering each one.  The CPU count
> is cached on CPU list generation ID to avoid iterating the list on each
> loop iteration.  With this change it is possible to boot an SMP rv64
> guest with icount enabled and no hangs.
> 
> New in v3 (address feedback from Richard Henderson):
>   - Additional patch to use QEMU_LOCK_GUARD with qemu_cpu_list_lock where
>     appropriate
>   - Move rr_cpu_count() call to be conditional on icount_enabled()
>   - Initialize cpu_budget to 0
> 
> Jamie Iles (2):
>    cpu: expose qemu_cpu_list_lock for lock-guard use
>    accel/tcg/tcg-accel-ops-rr: ensure fairness with icount

It appears as if one of these two patches causes a failure in replay, e.g.

   https://gitlab.com/rth7680/qemu/-/jobs/4200609234#L4162

Would you have a look, please?


r~
Re: [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
Posted by Jamie Iles 12 months ago
Hi Richard,

On Sat, Apr 29, 2023 at 10:28:03AM +0100, Richard Henderson wrote:
> On 4/27/23 03:09, Jamie Iles wrote:
> > From: Jamie Iles <jiles@qti.qualcomm.com>
> > 
> > The round-robin scheduler will iterate over the CPU list with an
> > assigned budget until the next timer expiry and may exit early because
> > of a TB exit.  This is fine under normal operation but with icount
> > enabled and SMP it is possible for a CPU to be starved of run time and
> > the system live-locks.
> > 
> > For example, booting a riscv64 platform with '-icount
> > shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
> > has timers enabled and starts performing TLB shootdowns.  In this case
> > we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
> > 1.  As we enter the TCG loop, we assign the icount budget to next timer
> > interrupt to CPU 0 and begin executing where the guest is sat in a busy
> > loop exhausting all of the budget before we try to execute CPU 1 which
> > is the target of the IPI but CPU 1 is left with no budget with which to
> > execute and the process repeats.
> > 
> > We try here to add some fairness by splitting the budget across all of
> > the CPUs on the thread fairly before entering each one.  The CPU count
> > is cached on CPU list generation ID to avoid iterating the list on each
> > loop iteration.  With this change it is possible to boot an SMP rv64
> > guest with icount enabled and no hangs.
> > 
> > New in v3 (address feedback from Richard Henderson):
> >   - Additional patch to use QEMU_LOCK_GUARD with qemu_cpu_list_lock where
> >     appropriate
> >   - Move rr_cpu_count() call to be conditional on icount_enabled()
> >   - Initialize cpu_budget to 0
> > 
> > Jamie Iles (2):
> >    cpu: expose qemu_cpu_list_lock for lock-guard use
> >    accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
> 
> It appears as if one of these two patches causes a failure in replay, e.g.
> 
>   https://gitlab.com/rth7680/qemu/-/jobs/4200609234#L4162
> 
> Would you have a look, please?

I was out on vacation and it looks like this job got cleaned up, but was 
this a mutex check failing for the replay mutex and/or the iothread 
mutex?  If so, the following patch fixes it for me, and I can include 
it in a v4.

Thanks,

Jamie


diff --git i/accel/tcg/tcg-accel-ops-icount.c w/accel/tcg/tcg-accel-ops-icount.c
index e1e8afaf2f99..3d2cfbbc9778 100644
--- i/accel/tcg/tcg-accel-ops-icount.c
+++ w/accel/tcg/tcg-accel-ops-icount.c
@@ -114,13 +114,13 @@ void icount_prepare_for_run(CPUState *cpu, int64_t cpu_budget)
     g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0);
     g_assert(cpu->icount_extra == 0);
 
+    replay_mutex_lock();
+
     cpu->icount_budget = MIN(icount_get_limit(), cpu_budget);
     insns_left = MIN(0xffff, cpu->icount_budget);
     cpu_neg(cpu)->icount_decr.u16.low = insns_left;
     cpu->icount_extra = cpu->icount_budget - insns_left;
 
-    replay_mutex_lock();
-
     if (cpu->icount_budget == 0) {
         /*
          * We're called without the iothread lock, so must take it while
diff --git i/replay/replay.c w/replay/replay.c
index c39156c52221..0f7d766efe81 100644
--- i/replay/replay.c
+++ w/replay/replay.c
@@ -74,7 +74,7 @@ uint64_t replay_get_current_icount(void)
 int replay_get_instructions(void)
 {
     int res = 0;
-    replay_mutex_lock();
+    g_assert(replay_mutex_locked());
     if (replay_next_event_is(EVENT_INSTRUCTION)) {
         res = replay_state.instruction_count;
         if (replay_break_icount != -1LL) {
@@ -85,7 +85,6 @@ int replay_get_instructions(void)
             }
         }
     }
-    replay_mutex_unlock();
     return res;
 }
Re: [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
Posted by Richard Henderson 11 months, 3 weeks ago
On 5/3/23 10:44, Jamie Iles wrote:
> Hi Richard,
> 
> On Sat, Apr 29, 2023 at 10:28:03AM +0100, Richard Henderson wrote:
>> On 4/27/23 03:09, Jamie Iles wrote:
>>> From: Jamie Iles <jiles@qti.qualcomm.com>
>>>
>>> The round-robin scheduler will iterate over the CPU list with an
>>> assigned budget until the next timer expiry and may exit early because
>>> of a TB exit.  This is fine under normal operation but with icount
>>> enabled and SMP it is possible for a CPU to be starved of run time and
>>> the system live-locks.
>>>
>>> For example, booting a riscv64 platform with '-icount
>>> shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
>>> has timers enabled and starts performing TLB shootdowns.  In this case
>>> we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
>>> 1.  As we enter the TCG loop, we assign the icount budget to next timer
>>> interrupt to CPU 0 and begin executing where the guest is sat in a busy
>>> loop exhausting all of the budget before we try to execute CPU 1 which
>>> is the target of the IPI but CPU 1 is left with no budget with which to
>>> execute and the process repeats.
>>>
>>> We try here to add some fairness by splitting the budget across all of
>>> the CPUs on the thread fairly before entering each one.  The CPU count
>>> is cached on CPU list generation ID to avoid iterating the list on each
>>> loop iteration.  With this change it is possible to boot an SMP rv64
>>> guest with icount enabled and no hangs.
>>>
>>> New in v3 (address feedback from Richard Henderson):
>>>    - Additional patch to use QEMU_LOCK_GUARD with qemu_cpu_list_lock where
>>>      appropriate
>>>    - Move rr_cpu_count() call to be conditional on icount_enabled()
>>>    - Initialize cpu_budget to 0
>>>
>>> Jamie Iles (2):
>>>     cpu: expose qemu_cpu_list_lock for lock-guard use
>>>     accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
>>
>> It appears as if one of these two patches causes a failure in replay, e.g.
>>
>>    https://gitlab.com/rth7680/qemu/-/jobs/4200609234#L4162
>>
>> Would you have a look, please?
> 
> I was out on vacation and it looks like this job got cleaned up, but was
> this a mutex check failing for the replay mutex and/or iothread mutex?
> If so then the following patch fixes it for me which I can include in a
> v4

That was it.  I'll squash this fix and re-queue.


r~

> 
> Thanks,
> 
> Jamie
> 
> 
> diff --git i/accel/tcg/tcg-accel-ops-icount.c w/accel/tcg/tcg-accel-ops-icount.c
> index e1e8afaf2f99..3d2cfbbc9778 100644
> --- i/accel/tcg/tcg-accel-ops-icount.c
> +++ w/accel/tcg/tcg-accel-ops-icount.c
> @@ -114,13 +114,13 @@ void icount_prepare_for_run(CPUState *cpu, int64_t cpu_budget)
>       g_assert(cpu_neg(cpu)->icount_decr.u16.low == 0);
>       g_assert(cpu->icount_extra == 0);
>   
> +    replay_mutex_lock();
> +
>       cpu->icount_budget = MIN(icount_get_limit(), cpu_budget);
>       insns_left = MIN(0xffff, cpu->icount_budget);
>       cpu_neg(cpu)->icount_decr.u16.low = insns_left;
>       cpu->icount_extra = cpu->icount_budget - insns_left;
>   
> -    replay_mutex_lock();
> -
>       if (cpu->icount_budget == 0) {
>           /*
>            * We're called without the iothread lock, so must take it while
> diff --git i/replay/replay.c w/replay/replay.c
> index c39156c52221..0f7d766efe81 100644
> --- i/replay/replay.c
> +++ w/replay/replay.c
> @@ -74,7 +74,7 @@ uint64_t replay_get_current_icount(void)
>   int replay_get_instructions(void)
>   {
>       int res = 0;
> -    replay_mutex_lock();
> +    g_assert(replay_mutex_locked());
>       if (replay_next_event_is(EVENT_INSTRUCTION)) {
>           res = replay_state.instruction_count;
>           if (replay_break_icount != -1LL) {
> @@ -85,7 +85,6 @@ int replay_get_instructions(void)
>               }
>           }
>       }
> -    replay_mutex_unlock();
>       return res;
>   }
>
Re: [PATCH v3 0/2] accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
Posted by Philippe Mathieu-Daudé 1 year ago
On 29/4/23 11:28, Richard Henderson wrote:
> On 4/27/23 03:09, Jamie Iles wrote:
>> From: Jamie Iles <jiles@qti.qualcomm.com>
>>
>> The round-robin scheduler will iterate over the CPU list with an
>> assigned budget until the next timer expiry and may exit early because
>> of a TB exit.  This is fine under normal operation but with icount
>> enabled and SMP it is possible for a CPU to be starved of run time and
>> the system live-locks.
>>
>> For example, booting a riscv64 platform with '-icount
>> shift=0,align=off,sleep=on -smp 2' we observe a livelock once the kernel
>> has timers enabled and starts performing TLB shootdowns.  In this case
>> we have CPU 0 in M-mode with interrupts disabled sending an IPI to CPU
>> 1.  As we enter the TCG loop, we assign the icount budget to next timer
>> interrupt to CPU 0 and begin executing where the guest is sat in a busy
>> loop exhausting all of the budget before we try to execute CPU 1 which
>> is the target of the IPI but CPU 1 is left with no budget with which to
>> execute and the process repeats.
>>
>> We try here to add some fairness by splitting the budget across all of
>> the CPUs on the thread fairly before entering each one.  The CPU count
>> is cached on CPU list generation ID to avoid iterating the list on each
>> loop iteration.  With this change it is possible to boot an SMP rv64
>> guest with icount enabled and no hangs.
>>
>> New in v3 (address feedback from Richard Henderson):
>>   - Additional patch to use QEMU_LOCK_GUARD with qemu_cpu_list_lock where
>>     appropriate
>>   - Move rr_cpu_count() call to be conditional on icount_enabled()
>>   - Initialize cpu_budget to 0
>>
>> Jamie Iles (2):
>>    cpu: expose qemu_cpu_list_lock for lock-guard use
>>    accel/tcg/tcg-accel-ops-rr: ensure fairness with icount
> 
> It appears as if one of these two patches causes a failure in replay, e.g.
> 
>    https://gitlab.com/rth7680/qemu/-/jobs/4200609234#L4162
> 
> Would you have a look, please?

Is the CPU count only ever going forward (never decreasing) in rr_cpu_count()?