[Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3

Emilio G. Cota posted 17 patches 5 years, 10 months ago
Failed in applying to current master (apply log)
Test checkpatch passed
Test docker-mingw@fedora passed
Test docker-quick@centos7 passed
Test s390x passed
[Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3
Posted by Emilio G. Cota 5 years, 10 months ago
v2: https://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00656.html

Changes since v2:

- rebase onto master, fixing conflicts

- add R-b's

- add a missing page_lock to page_collection_lock

- add a couple of missing assert_page_locked assertions

- add page_lock_pair, as suggested by Alex and Richard

- use a per-thread GHashTable to keep track of locked pages

- get rid of page_collection assertions, and just export
  assert_no_pages_locked() [Alex: I removed your R-b.]

Thanks,

		Emilio

Re: [Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3
Posted by Richard Henderson 5 years, 10 months ago
On 05/21/2018 04:39 PM, Emilio G. Cota wrote:
> v2: https://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00656.html
> 
> Changes since v2:
> 
> - rebase onto master, fixing conflicts
> 
> - add R-b's
> 
> - add a missing page_lock to page_collection_lock
> 
> - add a couple of missing assert_page_locked assertions
> 
> - add page_lock_pair, as suggested by Alex and Richard
> 
> - use a per-thread GHashTable to keep track of locked pages
> 
> - get rid of page_collection assertions, and just export
>   assert_no_pages_locked() [Alex: I removed your R-b.]

Thanks.  Queued to tcg-next.


r~

Re: [Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3
Posted by Richard Henderson 5 years, 10 months ago
On 05/30/2018 03:46 PM, Richard Henderson wrote:
> Thanks.  Queued to tcg-next.
Hmph.  Unqueued, at least for now.

ERROR:/home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615:page_unlock__debug:
assertion failed: (page_is_locked(pd))

#3  0x00007ffff4b6915e in g_assertion_message_expr ()
    at /lib64/libglib-2.0.so.0
#4  0x000055555583c088 in page_unlock__debug (pd=0x7fffa423aa80)
    at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615
#5  0x000055555583c1be in page_unlock (pd=0x7fffa423aa80)
    at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:661
#6  0x000055555583c2ef in page_entry_destroy (p=0x7fffa8024460)
    at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:694
#7  0x00007ffff4b6f448 in  () at /lib64/libglib-2.0.so.0
#8  0x00007ffff4b6fea2 in g_tree_destroy () at /lib64/libglib-2.0.so.0
#9  0x000055555583c791 in page_collection_unlock (set=0x7fffa802eba0)
    at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:842
#10 0x00005555557b301a in memory_notdirty_write_complete (ndi=0x7fffd9cf6050)
    at /home/rth/work/qemu/qemu/exec.c:2495
#11 0x00005555557b317f in notdirty_mem_write (opaque=0x0, ram_addr=12334096,
val=18446739675675374544, size=8) at /home/rth/work/qemu/qemu/exec.c:2535
#12 0x000055555580f14b in memory_region_write_accessor (mr=0x5555562a38a0
<io_mem_notdirty>, addr=12334096, value=0x7fffd9cf6178, size=8, shift=0,
mask=18446744073709551615, attrs=...) at /home/rth/work/qemu/qemu/memory.c:530
#13 0x000055555580f360 in access_with_adjusted_size (addr=12334096,
value=0x7fffd9cf6178, size=8, access_size_min=1, access_size_max=8, access_fn=
    0x55555580f061 <memory_region_write_accessor>, mr=0x5555562a38a0
<io_mem_notdirty>, attrs=...) at /home/rth/work/qemu/qemu/memory.c:597
#14 0x0000555555811cef in memory_region_dispatch_write (mr=0x5555562a38a0
<io_mem_notdirty>, addr=12334096, data=18446739675675374544, size=8, attrs=...)
    at /home/rth/work/qemu/qemu/memory.c:1474
#15 0x0000555555825d73 in io_writex (env=0x555556869090,
iotlbentry=0x555556870520, mmu_idx=0, val=18446739675675374544,
addr=18446739675675374608, retaddr=140736231479305, size=8) at
/home/rth/work/qemu/qemu/accel/tcg/cputlb.c:813
#16 0x0000555555828b6d in io_writeq (env=0x555556869090, mmu_idx=0, index=225,
val=18446739675675374544, addr=18446739675675374608, retaddr=140736231479305)
    at /home/rth/work/qemu/qemu/accel/tcg/softmmu_template.h:265
#17 0x0000555555828d2c in helper_le_stq_mmu (env=0x555556869090,
addr=18446739675675374608, val=18446739675675374544, oi=48,
retaddr=140736231479305)
    at /home/rth/work/qemu/qemu/accel/tcg/softmmu_template.h:301
#18 0x00007fffb5159809 in code_gen_buffer ()

I can invoke similar crashes with just about every image I try.


r~

Re: [Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3
Posted by Alex Bennée 5 years, 10 months ago
Richard Henderson <richard.henderson@linaro.org> writes:

> On 05/30/2018 03:46 PM, Richard Henderson wrote:
>> Thanks.  Queued to tcg-next.
> Hmph.  Unqueued, at least for now.
>
> ERROR:/home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615:page_unlock__debug:
> assertion failed: (page_is_locked(pd))
>
> #3  0x00007ffff4b6915e in g_assertion_message_expr ()
>     at /lib64/libglib-2.0.so.0
> #4  0x000055555583c088 in page_unlock__debug (pd=0x7fffa423aa80)
>     at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615
> #5  0x000055555583c1be in page_unlock (pd=0x7fffa423aa80)
>     at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:661
> #6  0x000055555583c2ef in page_entry_destroy (p=0x7fffa8024460)
>     at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:694
> #7  0x00007ffff4b6f448 in  () at /lib64/libglib-2.0.so.0
> #8  0x00007ffff4b6fea2 in g_tree_destroy () at /lib64/libglib-2.0.so.0
> #9  0x000055555583c791 in page_collection_unlock (set=0x7fffa802eba0)
>     at /home/rth/work/qemu/qemu/accel/tcg/translate-all.c:842
> #10 0x00005555557b301a in memory_notdirty_write_complete (ndi=0x7fffd9cf6050)
>     at /home/rth/work/qemu/qemu/exec.c:2495
> #11 0x00005555557b317f in notdirty_mem_write (opaque=0x0, ram_addr=12334096,
> val=18446739675675374544, size=8) at /home/rth/work/qemu/qemu/exec.c:2535
> #12 0x000055555580f14b in memory_region_write_accessor (mr=0x5555562a38a0
> <io_mem_notdirty>, addr=12334096, value=0x7fffd9cf6178, size=8, shift=0,
> mask=18446744073709551615, attrs=...) at /home/rth/work/qemu/qemu/memory.c:530
> #13 0x000055555580f360 in access_with_adjusted_size (addr=12334096,
> value=0x7fffd9cf6178, size=8, access_size_min=1, access_size_max=8, access_fn=
>     0x55555580f061 <memory_region_write_accessor>, mr=0x5555562a38a0
> <io_mem_notdirty>, attrs=...) at /home/rth/work/qemu/qemu/memory.c:597
> #14 0x0000555555811cef in memory_region_dispatch_write (mr=0x5555562a38a0
> <io_mem_notdirty>, addr=12334096, data=18446739675675374544, size=8, attrs=...)
>     at /home/rth/work/qemu/qemu/memory.c:1474
> #15 0x0000555555825d73 in io_writex (env=0x555556869090,
> iotlbentry=0x555556870520, mmu_idx=0, val=18446739675675374544,
> addr=18446739675675374608, retaddr=140736231479305, size=8) at
> /home/rth/work/qemu/qemu/accel/tcg/cputlb.c:813
> #16 0x0000555555828b6d in io_writeq (env=0x555556869090, mmu_idx=0, index=225,
> val=18446739675675374544, addr=18446739675675374608, retaddr=140736231479305)
>     at /home/rth/work/qemu/qemu/accel/tcg/softmmu_template.h:265
> #17 0x0000555555828d2c in helper_le_stq_mmu (env=0x555556869090,
> addr=18446739675675374608, val=18446739675675374544, oi=48,
> retaddr=140736231479305)
>     at /home/rth/work/qemu/qemu/accel/tcg/softmmu_template.h:301
> #18 0x00007fffb5159809 in code_gen_buffer ()
>
> I can invoke similar crashes with just about every image I try.

Just booting up? I've been hammering builds in my system image with
debug-tcg enabled and haven't triggered it yet.

Using:

./aarch64-softmmu/qemu-system-aarch64 -machine virt,graphics=on,gic-version=3,virtualization=on -cpu cortex-a53 --serial mon:stdio -nic user,model=virtio-net-pci,hostfwd=tcp::2222-:22 -device virtio-blk-device,drive=myblock -drive file=/home/alex/lsrc/qemu/images/debian-stable-arm64.qcow2,id=myblock,index=0,if=none -kernel /home/alex/lsrc/qemu/images/aarch64-current-linux-kernel-only.img -append "console=ttyAMA0 root=/dev/vda1" -display none -m 4096 -name debug-threads=on -smp 8
--
Alex Bennée

Re: [Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3
Posted by Richard Henderson 5 years, 9 months ago
On 06/01/2018 02:32 AM, Alex Bennée wrote:
>> I can invoke similar crashes with just about every image I try.
> 
> Just booting up?

Not even very far into the boot, and given that I can see it with coldfire,
even without -smp.


r~

Re: [Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3
Posted by Emilio G. Cota 5 years, 9 months ago
On Wed, May 30, 2018 at 16:05:14 -0700, Richard Henderson wrote:
> On 05/30/2018 03:46 PM, Richard Henderson wrote:
> > Thanks.  Queued to tcg-next.
> Hmph.  Unqueued, at least for now.
> 
> ERROR:/home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615:page_unlock__debug:
> assertion failed: (page_is_locked(pd))

Gaah, sorry. In v3 forgot to call the lock__debug function from
a successful trylock. I tested v3 on aarch64, which explains why
I didn't catch the bug. Fixed now:

--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -704,6 +704,7 @@ static bool page_entry_trylock(struct page_entry *pe)
     if (!busy) {
         g_assert(!pe->locked);
         pe->locked = true;
+        page_lock__debug(pe->pd);
     }
     return busy;
 }

I also added the following, which cannot hurt:

diff --git a/exec.c b/exec.c
index afc37e0..e874d67 100644
--- a/exec.c
+++ b/exec.c
@@ -2493,6 +2493,7 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
 {
     if (ndi->pages) {
         page_collection_unlock(ndi->pages);
+        ndi->pages = NULL;
     }

(Note that calling page_collection_unlock twice on that pointer
would blow up.)

The above two one-liners are the only code changes between v3 and v4.
I also added Alex's R-b tag for the qht patch.

I've boot-tested v4 on aarch64, arm, x86_64-softmmu, riscv64, sh4,
sparc, s390x, ppc64 and or1k, with and without TCG debug.

You can fetch v4 from:
  https://github.com/cota/qemu/tree/tb-lock-removal-redux-v4

Thanks,

		Emilio


Re: [Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3
Posted by Alex Bennée 5 years, 9 months ago
Emilio G. Cota <cota@braap.org> writes:

> On Wed, May 30, 2018 at 16:05:14 -0700, Richard Henderson wrote:
>> On 05/30/2018 03:46 PM, Richard Henderson wrote:
>> > Thanks.  Queued to tcg-next.
>> Hmph.  Unqueued, at least for now.
>>
>> ERROR:/home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615:page_unlock__debug:
>> assertion failed: (page_is_locked(pd))
>
> Gaah, sorry. In v3 forgot to call the lock__debug function from
> a successful trylock. I tested v3 on aarch64, which explains why
> I didn't catch the bug. Fixed now:
>
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -704,6 +704,7 @@ static bool page_entry_trylock(struct page_entry *pe)
>      if (!busy) {
>          g_assert(!pe->locked);
>          pe->locked = true;
> +        page_lock__debug(pe->pd);
>      }
>      return busy;
>  }
>
> I also added the following, which cannot hurt:
>
> diff --git a/exec.c b/exec.c
> index afc37e0..e874d67 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2493,6 +2493,7 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
>  {
>      if (ndi->pages) {
>          page_collection_unlock(ndi->pages);
> +        ndi->pages = NULL;
>      }
>
> (Note that calling page_collection_unlock twice on that pointer
> would blow up.)
>
> The above two one-liners are the only code changes between v3 and v4.
> I also added Alex's R-b tag for the qht patch.
>
> I've boot-tested v4 on aarch64, arm, x86_64-softmmu, riscv64, sh4,
> sparc, s390x, ppc64 and or1k, with and without TCG debug.
>
> You can fetch v4 from:
>   https://github.com/cota/qemu/tree/tb-lock-removal-redux-v4

I've just spun up v4 and it still works on aarch64 and now also passes
the coldfire test rich passed me. So have a:

Tested-by: Alex Bennée <alex.bennee@linaro.org>

For those two.

>
> Thanks,
>
> 		Emilio


--
Alex Bennée

Re: [Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3
Posted by Alex Bennée 5 years, 9 months ago
Alex Bennée <alex.bennee@linaro.org> writes:

> Emilio G. Cota <cota@braap.org> writes:
>
>> On Wed, May 30, 2018 at 16:05:14 -0700, Richard Henderson wrote:
>>> On 05/30/2018 03:46 PM, Richard Henderson wrote:
>>> > Thanks.  Queued to tcg-next.
>>> Hmph.  Unqueued, at least for now.
>>>
>>> ERROR:/home/rth/work/qemu/qemu/accel/tcg/translate-all.c:615:page_unlock__debug:
>>> assertion failed: (page_is_locked(pd))
>>
>> Gaah, sorry. In v3 forgot to call the lock__debug function from
>> a successful trylock. I tested v3 on aarch64, which explains why
>> I didn't catch the bug. Fixed now:
>>
>> --- a/accel/tcg/translate-all.c
>> +++ b/accel/tcg/translate-all.c
>> @@ -704,6 +704,7 @@ static bool page_entry_trylock(struct page_entry *pe)
>>      if (!busy) {
>>          g_assert(!pe->locked);
>>          pe->locked = true;
>> +        page_lock__debug(pe->pd);
>>      }
>>      return busy;
>>  }
>>
>> I also added the following, which cannot hurt:
>>
>> diff --git a/exec.c b/exec.c
>> index afc37e0..e874d67 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -2493,6 +2493,7 @@ void memory_notdirty_write_complete(NotDirtyInfo *ndi)
>>  {
>>      if (ndi->pages) {
>>          page_collection_unlock(ndi->pages);
>> +        ndi->pages = NULL;
>>      }
>>
>> (Note that calling page_collection_unlock twice on that pointer
>> would blow up.)
>>
>> The above two one-liners are the only code changes between v3 and v4.
>> I also added Alex's R-b tag for the qht patch.
>>
>> I've boot-tested v4 on aarch64, arm, x86_64-softmmu, riscv64, sh4,
>> sparc, s390x, ppc64 and or1k, with and without TCG debug.
>>
>> You can fetch v4 from:
>>   https://github.com/cota/qemu/tree/tb-lock-removal-redux-v4
>
> I've just spun up v4 and it still works on aarch64 and now also passes
> the coldfire test rich passed me. So have a:
>
> Tested-by: Alex Bennée <alex.bennee@linaro.org>
>
> For those two.


Ping, Richard are you happy with the fix?

>
>>
>> Thanks,
>>
>> 		Emilio


--
Alex Bennée

Re: [Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3
Posted by Richard Henderson 5 years, 9 months ago
On 06/14/2018 08:34 AM, Alex Bennée wrote:
>>> You can fetch v4 from:
>>>   https://github.com/cota/qemu/tree/tb-lock-removal-redux-v4
>>
>> I've just spun up v4 and it still works on aarch64 and now also passes
>> the coldfire test rich passed me. So have a:
>>
>> Tested-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> For those two.
> 
> Ping, Richard are you happy with the fix?

Yes; just sent a PR including v4.  It survived a couple of overnight
stress-tests.  I did miss your T-B from this thread though; whoops.


r~

Re: [Qemu-devel] [PATCH v3 00/17] tcg: tb_lock removal redux v3
Posted by Alex Bennée 5 years, 9 months ago
Emilio G. Cota <cota@braap.org> writes:

> v2: https://lists.nongnu.org/archive/html/qemu-devel/2018-04/msg00656.html
>
<snip>
I'm sure you've seen something similar but currently my profile looks
like this:

   5.91%  qemu-system-aar  qemu-system-aarch64      [.] cpu_get_tb_cpu_state
   5.88%  qemu-system-aar  qemu-system-aarch64      [.] helper_lookup_tb_ptr
   5.42%  qemu-system-aar  qemu-system-aarch64      [.] get_phys_addr_lpae
   3.27%  qemu-system-aar  qemu-system-aarch64      [.] qht_lookup
   2.80%  qemu-system-aar  qemu-system-aarch64      [.] tlb_set_page_with_attrs
   2.06%  qemu-system-aar  qemu-system-aarch64      [.] address_space_ldq_le
   1.90%  qemu-system-aar  qemu-system-aarch64      [.] victim_tlb_hit
   1.89%  qemu-system-aar  qemu-system-aarch64      [.] address_space_translate_internal
   1.62%  qemu-system-aar  qemu-system-aarch64      [.] object_dynamic_cast_assert
   1.40%  qemu-system-aar  qemu-system-aarch64      [.] get_phys_addr
   1.35%  qemu-system-aar  qemu-system-aarch64      [.] tb_htable_lookup
   1.31%  qemu-system-aar  qemu-system-aarch64      [.] flatview_do_translate
   1.21%  qemu-system-aar  libc-2.27.so             [.] 0x000000000018ef2d
   1.12%  qemu-system-aar  qemu-system-aarch64      [.] get_page_addr_code
   1.06%  qemu-system-aar  libpthread-2.27.so       [.] __pthread_mutex_lock
   1.06%  qemu-system-aar  qemu-system-aarch64      [.] tlb_flush_nocheck
   1.01%  qemu-system-aar  [unknown]                [k] 0xffffffff88f95990
   0.98%  qemu-system-aar  qemu-system-aarch64      [.] tcg_gen_code


--
Alex Bennée