accel/tcg/cpu-exec.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
This patchset fixes the intermittent hang seen when running a guest in icount mode, as reported in https://gitlab.com/qemu-project/qemu/-/issues/499 . The underlying cause of the hang is that code in cpu_loop_exec_tb() was using CF_COUNT_MASK as the maximum possible number of instructions it would try to execute from a TB when it set the icount_decr.u16.low field. This is wrong, because (a) that field can validly be set to any unsigned 16-bit integer and (b) now that CF_COUNT_MASK has been reduced to 511 in commit 78ff82bb1b67c0d7, it might be less than the number of insns in the TB. Patch one fixes cpu_loop_exec_tb() to use the actual maximum valid value for icount_decr.u16.low, which is 0xffff. Patch two adjusts the "should we ask for a TB with exactly this many insns in it?" condition so that instead of testing "cpu->icount_extra == 0", which should be always true if (insns_left > 0 && insns_left < tb->icount), we assert it instead. This assertion would have caught the bug fixed in patch one. Tested using the same iterating loop test described in the bug report; without the fix QEMU hangs within a handful of iterations. With the fix it managed 175 successful iterations before I got bored and hit ^C. thanks -- PMM Peter Maydell (2): accel/tcg: Don't use CF_COUNT_MASK as the max value of icount_decr.u16.low accel/tcg: Remove unnecessary check on icount_extra in cpu_loop_exec_tb() accel/tcg/cpu-exec.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) -- 2.20.1
On 7/25/21 7:44 AM, Peter Maydell wrote: > This patchset fixes the intermittent hang seen when running a guest in > icount mode, as reported in > https://gitlab.com/qemu-project/qemu/-/issues/499 . > > The underlying cause of the hang is that code in cpu_loop_exec_tb() > was using CF_COUNT_MASK as the maximum possible number of instructions > it would try to execute from a TB when it set the icount_decr.u16.low > field. This is wrong, because (a) that field can validly be set to any > unsigned 16-bit integer and (b) now that CF_COUNT_MASK has been > reduced to 511 in commit 78ff82bb1b67c0d7, it might be less than the > number of insns in the TB. > > Patch one fixes cpu_loop_exec_tb() to use the actual maximum valid > value for icount_decr.u16.low, which is 0xffff. Patch two adjusts the > "should we ask for a TB with exactly this many insns in it?" condition > so that instead of testing "cpu->icount_extra == 0", which should be > always true if (insns_left > 0 && insns_left < tb->icount), we assert > it instead. This assertion would have caught the bug fixed in patch > one. > > Tested using the same iterating loop test described in the bug report; > without the fix QEMU hangs within a handful of iterations. With the > fix it managed 175 successful iterations before I got bored and hit ^C. > > thanks > -- PMM > > Peter Maydell (2): > accel/tcg: Don't use CF_COUNT_MASK as the max value of > icount_decr.u16.low > accel/tcg: Remove unnecessary check on icount_extra in > cpu_loop_exec_tb() Nice one. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> r~
On 7/25/21 8:11 AM, Richard Henderson wrote: > On 7/25/21 7:44 AM, Peter Maydell wrote: >> This patchset fixes the intermittent hang seen when running a guest in >> icount mode, as reported in >> https://gitlab.com/qemu-project/qemu/-/issues/499 . >> >> The underlying cause of the hang is that code in cpu_loop_exec_tb() >> was using CF_COUNT_MASK as the maximum possible number of instructions >> it would try to execute from a TB when it set the icount_decr.u16.low >> field. This is wrong, because (a) that field can validly be set to any >> unsigned 16-bit integer and (b) now that CF_COUNT_MASK has been >> reduced to 511 in commit 78ff82bb1b67c0d7, it might be less than the >> number of insns in the TB. >> >> Patch one fixes cpu_loop_exec_tb() to use the actual maximum valid >> value for icount_decr.u16.low, which is 0xffff. Patch two adjusts the >> "should we ask for a TB with exactly this many insns in it?" condition >> so that instead of testing "cpu->icount_extra == 0", which should be >> always true if (insns_left > 0 && insns_left < tb->icount), we assert >> it instead. This assertion would have caught the bug fixed in patch >> one. >> >> Tested using the same iterating loop test described in the bug report; >> without the fix QEMU hangs within a handful of iterations. With the >> fix it managed 175 successful iterations before I got bored and hit ^C. >> >> thanks >> -- PMM >> >> Peter Maydell (2): >> accel/tcg: Don't use CF_COUNT_MASK as the max value of >> icount_decr.u16.low >> accel/tcg: Remove unnecessary check on icount_extra in >> cpu_loop_exec_tb() > > Nice one. > Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Queued for 6.1. r~
© 2016 - 2024 Red Hat, Inc.