v2: https://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00656.html Changes since v2: - rebase onto master, fixing conflicts - add R-b's - add a missing page_lock to page_collection_lock - add a couple of missing assert_page_locked assertions - add page_lock_pair, as suggested by Alex and Richard - use a per-thread GHashTable to keep track of locked pages - get rid of page_collection assertions, and just export assert_no_pages_locked() [Alex: I removed your R-b.] Thanks, Emilio
On 05/21/2018 04:39 PM, Emilio G. Cota wrote: > v2: https://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00656.html > > Changes since v2: > > - rebase onto master, fixing conflicts > > - add R-b's > > - add a missing page_lock to page_collection_lock > > - add a couple of missing assert_page_locked assertions > > - add page_lock_pair, as suggested by Alex and Richard > > - use a per-thread GHashTable to keep track of locked pages > > - get rid of page_collection assertions, and just export > assert_no_pages_locked() [Alex: I removed your R-b.] Thanks. Queued to tcg-next. r~
On 05/30/2018 03:46 PM, Richard Henderson wrote: > Thanks. Queued to tcg-next. Hmph. Unqueued, at least for now. ERROR:/home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615:page_unlock__debug: assertion failed: (page_is_locked(pd)) #3 0x00007ffff4b6915e in g_assertion_message_expr () at /lib64/libglib-2.0.so.0 #4 0x000055555583c088 in page_unlock__debug (pd=0x7fffa423aa80) at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615 #5 0x000055555583c1be in page_unlock (pd=0x7fffa423aa80) at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:661 #6 0x000055555583c2ef in page_entry_destroy (p=0x7fffa8024460) at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:694 #7 0x00007ffff4b6f448 in () at /lib64/libglib-2.0.so.0 #8 0x00007ffff4b6fea2 in g_tree_destroy () at /lib64/libglib-2.0.so.0 #9 0x000055555583c791 in page_collection_unlock (set=0x7fffa802eba0) at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:842 #10 0x00005555557b301a in memory_notdirty_write_complete (ndi=0x7fffd9cf6050) at /home/rth/work/qemu/qemu/exec.c:2495 #11 0x00005555557b317f in notdirty_mem_write (opaque=0x0, ram_addr=12334096, val=18446739675675374544, size=8) at /home/rth/work/qemu/qemu/exec.c:2535 #12 0x000055555580f14b in memory_region_write_accessor (mr=0x5555562a38a0 <io_mem_notdirty>, addr=12334096, value=0x7fffd9cf6178, size=8, shift=0, mask=18446744073709551615, attrs=...) at /home/rth/work/qemu/qemu/memory.c:530 #13 0x000055555580f360 in access_with_adjusted_size (addr=12334096, value=0x7fffd9cf6178, size=8, access_size_min=1, access_size_max=8, access_fn= 0x55555580f061 <memory_region_write_accessor>, mr=0x5555562a38a0 <io_mem_notdirty>, attrs=...) at /home/rth/work/qemu/qemu/memory.c:597 #14 0x0000555555811cef in memory_region_dispatch_write (mr=0x5555562a38a0 <io_mem_notdirty>, addr=12334096, data=18446739675675374544, size=8, attrs=...) at /home/rth/work/qemu/qemu/memory.c:1474 #15 0x0000555555825d73 in io_writex (env=0x555556869090, iotlbentry=0x555556870520, mmu_idx=0, val=18446739675675374544, addr=18446739675675374608, retaddr=140736231479305, size=8) at /home/rth/work/qemu/qemu/accel/tcg/cputlb.c:813 #16 0x0000555555828b6d in io_writeq (env=0x555556869090, mmu_idx=0, index=225, val=18446739675675374544, addr=18446739675675374608, retaddr=140736231479305) at /home/rth/work/qemu/qemu/accel/tcg/softmmu_template.h:265 #17 0x0000555555828d2c in helper_le_stq_mmu (env=0x555556869090, addr=18446739675675374608, val=18446739675675374544, oi=48, retaddr=140736231479305) at /home/rth/work/qemu/qemu/accel/tcg/softmmu_template.h:301 #18 0x00007fffb5159809 in code_gen_buffer () I can invoke similar crashes with just about every image I try. r~
Richard Henderson <richard.henderson@linaro.org> writes: > On 05/30/2018 03:46 PM, Richard Henderson wrote: >> Thanks. Queued to tcg-next. > Hmph. Unqueued, at least for now. > > ERROR:/home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615:page_unlock__debug: > assertion failed: (page_is_locked(pd)) > > #3 0x00007ffff4b6915e in g_assertion_message_expr () > at /lib64/libglib-2.0.so.0 > #4 0x000055555583c088 in page_unlock__debug (pd=0x7fffa423aa80) > at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615 > #5 0x000055555583c1be in page_unlock (pd=0x7fffa423aa80) > at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:661 > #6 0x000055555583c2ef in page_entry_destroy (p=0x7fffa8024460) > at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:694 > #7 0x00007ffff4b6f448 in () at /lib64/libglib-2.0.so.0 > #8 0x00007ffff4b6fea2 in g_tree_destroy () at /lib64/libglib-2.0.so.0 > #9 0x000055555583c791 in page_collection_unlock (set=0x7fffa802eba0) > at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:842 > #10 0x00005555557b301a in memory_notdirty_write_complete (ndi=0x7fffd9cf6050) > at /home/rth/work/qemu/qemu/exec.c:2495 > #11 0x00005555557b317f in notdirty_mem_write (opaque=0x0, ram_addr=12334096, > val=18446739675675374544, size=8) at /home/rth/work/qemu/qemu/exec.c:2535 > #12 0x000055555580f14b in memory_region_write_accessor (mr=0x5555562a38a0 > <io_mem_notdirty>, addr=12334096, value=0x7fffd9cf6178, size=8, shift=0, > mask=18446744073709551615, attrs=...) at /home/rth/work/qemu/qemu/memory.c:530 > #13 0x000055555580f360 in access_with_adjusted_size (addr=12334096, > value=0x7fffd9cf6178, size=8, access_size_min=1, access_size_max=8, access_fn= > 0x55555580f061 <memory_region_write_accessor>, mr=0x5555562a38a0 > <io_mem_notdirty>, attrs=...) at /home/rth/work/qemu/qemu/memory.c:597 > #14 0x0000555555811cef in memory_region_dispatch_write (mr=0x5555562a38a0 > <io_mem_notdirty>, addr=12334096, data=18446739675675374544, size=8, attrs=...) > at /home/rth/work/qemu/qemu/memory.c:1474 > #15 0x0000555555825d73 in io_writex (env=0x555556869090, > iotlbentry=0x555556870520, mmu_idx=0, val=18446739675675374544, > addr=18446739675675374608, retaddr=140736231479305, size=8) at > /home/rth/work/qemu/qemu/accel/tcg/cputlb.c:813 > #16 0x0000555555828b6d in io_writeq (env=0x555556869090, mmu_idx=0, index=225, > val=18446739675675374544, addr=18446739675675374608, retaddr=140736231479305) > at /home/rth/work/qemu/qemu/accel/tcg/softmmu_template.h:265 > #17 0x0000555555828d2c in helper_le_stq_mmu (env=0x555556869090, > addr=18446739675675374608, val=18446739675675374544, oi=48, > retaddr=140736231479305) > at /home/rth/work/qemu/qemu/accel/tcg/softmmu_template.h:301 > #18 0x00007fffb5159809 in code_gen_buffer () > > I can invoke similar crashes with just about every image I try. Just booting up? I've been hammering builds in my system image with debug-tcg enabled and haven't triggered it yet. Using: ./aarch64-softmmu/qemu-system-aarch64 -machine virt,graphics=on,gic-version=3,virtualization=on -cpu cortex-a53 --serial mon:stdio -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22 -device virtio-blk-device,drive=myblock -drive file=/home/alex/lsrc/qemu/images/debian-stable-arm64.qcow2,id=myblock,index=0,if=none -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -append "console=ttyAMA0 root=/dev/vda1" -display none -m 4096 -name debug-threads=on -smp 8 -- Alex Bennée
On 06/01/2018 02:32 AM, Alex Bennée wrote: >> I can invoke similar crashes with just about every image I try. > > Just booting up? Not even very far into the boot, and given that I can see it with coldfire, even without -smp. r~
On Wed, May 30, 2018 at 16:05:14 -0700, Richard Henderson wrote: > On 05/30/2018 03:46 PM, Richard Henderson wrote: > > Thanks. Queued to tcg-next. > Hmph. Unqueued, at least for now. > > ERROR:/home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615:page_unlock__debug: > assertion failed: (page_is_locked(pd)) Gaah, sorry. In v3 forgot to call the lock__debug function from a successful trylock. I tested v3 on aarch64, which explains why I didn't catch the bug. Fixed now: --- a/accel/tcg/translate-all.c +++ b/accel/tcg/translate-all.c @@ -704,6 +704,7 @@ static bool page_entry_trylock(struct page_entry *pe) if (!busy) { g_assert(!pe->locked); pe->locked = true; + page_lock__debug(pe->pd); } return busy; } I also added the following, which cannot hurt: diff --git a/exec.c b/exec.c index afc37e0..e874d67 100644 --- a/exec.c +++ b/exec.c @@ -2493,6 +2493,7 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi) { if (ndi->pages) { page_collection_unlock(ndi->pages); + ndi->pages = NULL; } (Note that calling page_collection_unlock twice on that pointer would blow up.) The above two one-liners are the only code changes between v3 and v4. I also added Alex's R-b tag for the qht patch. I've boot-tested v4 on aarch64, arm, x86_64-softmmu, riscv64, sh4, sparc, s390x, ppc64 and or1k, with and without TCG debug. You can fetch v4 from: https://github.com/cota/qemu/tree/tb-lock-removal-redux-v4 Thanks, Emilio
Emilio G. Cota <cota@braap.org> writes: > On Wed, May 30, 2018 at 16:05:14 -0700, Richard Henderson wrote: >> On 05/30/2018 03:46 PM, Richard Henderson wrote: >> > Thanks. Queued to tcg-next. >> Hmph. Unqueued, at least for now. >> >> ERROR:/home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615:page_unlock__debug: >> assertion failed: (page_is_locked(pd)) > > Gaah, sorry. In v3 forgot to call the lock__debug function from > a successful trylock. I tested v3 on aarch64, which explains why > I didn't catch the bug. Fixed now: > > --- a/accel/tcg/translate-all.c > +++ b/accel/tcg/translate-all.c > @@ -704,6 +704,7 @@ static bool page_entry_trylock(struct page_entry *pe) > if (!busy) { > g_assert(!pe->locked); > pe->locked = true; > + page_lock__debug(pe->pd); > } > return busy; > } > > I also added the following, which cannot hurt: > > diff --git a/exec.c b/exec.c > index afc37e0..e874d67 100644 > --- a/exec.c > +++ b/exec.c > @@ -2493,6 +2493,7 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi) > { > if (ndi->pages) { > page_collection_unlock(ndi->pages); > + ndi->pages = NULL; > } > > (Note that calling page_collection_unlock twice on that pointer > would blow up.) > > The above two one-liners are the only code changes between v3 and v4. > I also added Alex's R-b tag for the qht patch. > > I've boot-tested v4 on aarch64, arm, x86_64-softmmu, riscv64, sh4, > sparc, s390x, ppc64 and or1k, with and without TCG debug. > > You can fetch v4 from: > https://github.com/cota/qemu/tree/tb-lock-removal-redux-v4 I've just spun up v4 and it still works on aarch64 and now also passes the coldfire test rich passed me. So have a: Tested-by: Alex Bennée <alex.bennee@linaro.org> For those two. > > Thanks, > > Emilio -- Alex Bennée
Alex Bennée <alex.bennee@linaro.org> writes: > Emilio G. Cota <cota@braap.org> writes: > >> On Wed, May 30, 2018 at 16:05:14 -0700, Richard Henderson wrote: >>> On 05/30/2018 03:46 PM, Richard Henderson wrote: >>> > Thanks. Queued to tcg-next. >>> Hmph. Unqueued, at least for now. >>> >>> ERROR:/home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615:page_unlock__debug: >>> assertion failed: (page_is_locked(pd)) >> >> Gaah, sorry. In v3 forgot to call the lock__debug function from >> a successful trylock. I tested v3 on aarch64, which explains why >> I didn't catch the bug. Fixed now: >> >> --- a/accel/tcg/translate-all.c >> +++ b/accel/tcg/translate-all.c >> @@ -704,6 +704,7 @@ static bool page_entry_trylock(struct page_entry *pe) >> if (!busy) { >> g_assert(!pe->locked); >> pe->locked = true; >> + page_lock__debug(pe->pd); >> } >> return busy; >> } >> >> I also added the following, which cannot hurt: >> >> diff --git a/exec.c b/exec.c >> index afc37e0..e874d67 100644 >> --- a/exec.c >> +++ b/exec.c >> @@ -2493,6 +2493,7 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi) >> { >> if (ndi->pages) { >> page_collection_unlock(ndi->pages); >> + ndi->pages = NULL; >> } >> >> (Note that calling page_collection_unlock twice on that pointer >> would blow up.) >> >> The above two one-liners are the only code changes between v3 and v4. >> I also added Alex's R-b tag for the qht patch. >> >> I've boot-tested v4 on aarch64, arm, x86_64-softmmu, riscv64, sh4, >> sparc, s390x, ppc64 and or1k, with and without TCG debug. >> >> You can fetch v4 from: >> https://github.com/cota/qemu/tree/tb-lock-removal-redux-v4 > > I've just spun up v4 and it still works on aarch64 and now also passes > the coldfire test rich passed me. So have a: > > Tested-by: Alex Bennée <alex.bennee@linaro.org> > > For those two. Ping, Richard are you happy with the fix? > >> >> Thanks, >> >> Emilio -- Alex Bennée
On 06/14/2018 08:34 AM, Alex Bennée wrote: >>> You can fetch v4 from: >>> https://github.com/cota/qemu/tree/tb-lock-removal-redux-v4 >> >> I've just spun up v4 and it still works on aarch64 and now also passes >> the coldfire test rich passed me. So have a: >> >> Tested-by: Alex Bennée <alex.bennee@linaro.org> >> >> For those two. > > Ping, Richard are you happy with the fix? Yes; just sent a PR including v4. It survived a couple of overnight stress-tests. I did miss your T-B from this thread though; whoops. r~
Emilio G. Cota <cota@braap.org> writes: > v2: https://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00656.html > <snip> I'm sure you've seen something similar but currently my profile looks like this: 5.91% qemu-system-aar qemu-system-aarch64 [.] cpu_get_tb_cpu_state 5.88% qemu-system-aar qemu-system-aarch64 [.] helper_lookup_tb_ptr 5.42% qemu-system-aar qemu-system-aarch64 [.] get_phys_addr_lpae 3.27% qemu-system-aar qemu-system-aarch64 [.] qht_lookup 2.80% qemu-system-aar qemu-system-aarch64 [.] tlb_set_page_with_attrs 2.06% qemu-system-aar qemu-system-aarch64 [.] address_space_ldq_le 1.90% qemu-system-aar qemu-system-aarch64 [.] victim_tlb_hit 1.89% qemu-system-aar qemu-system-aarch64 [.] address_space_translate_internal 1.62% qemu-system-aar qemu-system-aarch64 [.] object_dynamic_cast_assert 1.40% qemu-system-aar qemu-system-aarch64 [.] get_phys_addr 1.35% qemu-system-aar qemu-system-aarch64 [.] tb_htable_lookup 1.31% qemu-system-aar qemu-system-aarch64 [.] flatview_do_translate 1.21% qemu-system-aar libc-2.27.so [.] 0x000000000018ef2d 1.12% qemu-system-aar qemu-system-aarch64 [.] get_page_addr_code 1.06% qemu-system-aar libpthread-2.27.so [.] __pthread_mutex_lock 1.06% qemu-system-aar qemu-system-aarch64 [.] tlb_flush_nocheck 1.01% qemu-system-aar [unknown] [k] 0xffffffff88f95990 0.98% qemu-system-aar qemu-system-aarch64 [.] tcg_gen_code -- Alex Bennée
© 2016 - 2024 Red Hat, Inc.