[PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode

Peter Maydell posted 2 patches 2 years, 9 months ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20210725174405.24568-1-peter.maydell@linaro.org
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>
accel/tcg/cpu-exec.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
[PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode
Posted by Peter Maydell 2 years, 9 months ago
This patchset fixes the intermittent hang seen when running a guest in
icount mode, as reported in
  https://gitlab.com/qemu-project/qemu/-/issues/499 .

The underlying cause of the hang is that code in cpu_loop_exec_tb()
was using CF_COUNT_MASK as the maximum possible number of instructions
it would try to execute from a TB when it set the icount_decr.u16.low
field. This is wrong, because (a) that field can validly be set to any
unsigned 16-bit integer and (b) now that CF_COUNT_MASK has been
reduced to 511 in commit 78ff82bb1b67c0d7, it might be less than the
number of insns in the TB.

Patch one fixes cpu_loop_exec_tb() to use the actual maximum valid
value for icount_decr.u16.low, which is 0xffff.  Patch two adjusts the
"should we ask for a TB with exactly this many insns in it?" condition
so that instead of testing "cpu->icount_extra == 0", which should be
always true if (insns_left > 0 && insns_left < tb->icount), we assert
it instead.  This assertion would have caught the bug fixed in patch
one.

Tested using the same iterating loop test described in the bug report;
without the fix QEMU hangs within a handful of iterations. With the
fix it managed 175 successful iterations before I got bored and hit ^C.

thanks
-- PMM

Peter Maydell (2):
  accel/tcg: Don't use CF_COUNT_MASK as the max value of
    icount_decr.u16.low
  accel/tcg: Remove unnecessary check on icount_extra in
    cpu_loop_exec_tb()

 accel/tcg/cpu-exec.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

-- 
2.20.1


Re: [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode
Posted by Richard Henderson 2 years, 9 months ago
On 7/25/21 7:44 AM, Peter Maydell wrote:
> This patchset fixes the intermittent hang seen when running a guest in
> icount mode, as reported in
>    https://gitlab.com/qemu-project/qemu/-/issues/499 .
> 
> The underlying cause of the hang is that code in cpu_loop_exec_tb()
> was using CF_COUNT_MASK as the maximum possible number of instructions
> it would try to execute from a TB when it set the icount_decr.u16.low
> field. This is wrong, because (a) that field can validly be set to any
> unsigned 16-bit integer and (b) now that CF_COUNT_MASK has been
> reduced to 511 in commit 78ff82bb1b67c0d7, it might be less than the
> number of insns in the TB.
> 
> Patch one fixes cpu_loop_exec_tb() to use the actual maximum valid
> value for icount_decr.u16.low, which is 0xffff.  Patch two adjusts the
> "should we ask for a TB with exactly this many insns in it?" condition
> so that instead of testing "cpu->icount_extra == 0", which should be
> always true if (insns_left > 0 && insns_left < tb->icount), we assert
> it instead.  This assertion would have caught the bug fixed in patch
> one.
> 
> Tested using the same iterating loop test described in the bug report;
> without the fix QEMU hangs within a handful of iterations. With the
> fix it managed 175 successful iterations before I got bored and hit ^C.
> 
> thanks
> -- PMM
> 
> Peter Maydell (2):
>    accel/tcg: Don't use CF_COUNT_MASK as the max value of
>      icount_decr.u16.low
>    accel/tcg: Remove unnecessary check on icount_extra in
>      cpu_loop_exec_tb()

Nice one.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~

Re: [PATCH for-6.1 0/2] accel/tcg: Fix hang when running in icount mode
Posted by Richard Henderson 2 years, 9 months ago
On 7/25/21 8:11 AM, Richard Henderson wrote:
> On 7/25/21 7:44 AM, Peter Maydell wrote:
>> This patchset fixes the intermittent hang seen when running a guest in
>> icount mode, as reported in
>>    https://gitlab.com/qemu-project/qemu/-/issues/499 .
>>
>> The underlying cause of the hang is that code in cpu_loop_exec_tb()
>> was using CF_COUNT_MASK as the maximum possible number of instructions
>> it would try to execute from a TB when it set the icount_decr.u16.low
>> field. This is wrong, because (a) that field can validly be set to any
>> unsigned 16-bit integer and (b) now that CF_COUNT_MASK has been
>> reduced to 511 in commit 78ff82bb1b67c0d7, it might be less than the
>> number of insns in the TB.
>>
>> Patch one fixes cpu_loop_exec_tb() to use the actual maximum valid
>> value for icount_decr.u16.low, which is 0xffff.  Patch two adjusts the
>> "should we ask for a TB with exactly this many insns in it?" condition
>> so that instead of testing "cpu->icount_extra == 0", which should be
>> always true if (insns_left > 0 && insns_left < tb->icount), we assert
>> it instead.  This assertion would have caught the bug fixed in patch
>> one.
>>
>> Tested using the same iterating loop test described in the bug report;
>> without the fix QEMU hangs within a handful of iterations. With the
>> fix it managed 175 successful iterations before I got bored and hit ^C.
>>
>> thanks
>> -- PMM
>>
>> Peter Maydell (2):
>>    accel/tcg: Don't use CF_COUNT_MASK as the max value of
>>      icount_decr.u16.low
>>    accel/tcg: Remove unnecessary check on icount_extra in
>>      cpu_loop_exec_tb()
> 
> Nice one.
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Queued for 6.1.

r~