[Qemu-devel] [PULL 5/6] console: remove do_safe_dpy_refresh

Gerd Hoffmann posted 6 patches 8 years, 7 months ago
[Qemu-devel] [PULL 5/6] console: remove do_safe_dpy_refresh
Posted by Gerd Hoffmann 8 years, 7 months ago
Drop the temporary workaround for the broken display updates.
All display adapters are updated, so this should be safe without
causing regressions.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20170614084538.32480-1-kraxel@redhat.com
---
 ui/console.c | 25 +------------------------
 1 file changed, 1 insertion(+), 24 deletions(-)

diff --git a/ui/console.c b/ui/console.c
index d914cced53..af0c56c600 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1579,36 +1579,13 @@ bool dpy_gfx_check_format(QemuConsole *con,
     return true;
 }
 
-/*
- * Safe DPY refresh for TCG guests. We use the exclusive mechanism to
- * ensure the TCG vCPUs are quiescent so we can avoid races between
- * dirty page tracking for direct frame-buffer access by the guest.
- *
- * This is a temporary stopgap until we've fixed the dirty tracking
- * races in display adapters.
- */
-static void do_safe_dpy_refresh(DisplayChangeListener *dcl)
-{
-    qemu_mutex_unlock_iothread();
-    start_exclusive();
-    qemu_mutex_lock_iothread();
-    dcl->ops->dpy_refresh(dcl);
-    qemu_mutex_unlock_iothread();
-    end_exclusive();
-    qemu_mutex_lock_iothread();
-}
-
 static void dpy_refresh(DisplayState *s)
 {
     DisplayChangeListener *dcl;
 
     QLIST_FOREACH(dcl, &s->listeners, next) {
         if (dcl->ops->dpy_refresh) {
-            if (tcg_enabled()) {
-                do_safe_dpy_refresh(dcl);
-            } else {
-                dcl->ops->dpy_refresh(dcl);
-            }
+            dcl->ops->dpy_refresh(dcl);
         }
     }
 }
-- 
2.9.3


Re: [Qemu-devel] [PULL 5/6] console: remove do_safe_dpy_refresh
Posted by Laurent Vivier 8 years, 6 months ago
On 21/06/2017 15:23, Gerd Hoffmann wrote:
> Drop the temporary workaround for the broken display updates.
> All display adapters are updated, so this should be safe without
> causing regressions.

It seems it breaks QMP command 'migrate "exec:cat>mig"'.

The command hangs and doesn't create the file.

It happens with qemu-system-ppc64 on x86 (so TCG mode).

my command:

   ./ppc64-softmmu/qemu-system-ppc64 -serial mon:stdio

I wait SLOF fails to find an OS, and:

    Ctrl-a c
    (qemu) migrate -d "exec:cat>mig"

The file is not created and the command hangs:

#0  in __lll_lock_wait
#1  in pthread_mutex_lock
#2  in qemu_mutex_lock
#3  in rcu_init_lock
#4  in fork
#5  in qemu_fork
#6  in qio_channel_command_new_spawn
#7  in exec_start_outgoing_migration
#8  in qmp_migrate
...

It looks like a deadlock.

Laurent

> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> Acked-by: Alex Bennée <alex.bennee@linaro.org>
> Message-id: 20170614084538.32480-1-kraxel@redhat.com
> ---
>  ui/console.c | 25 +------------------------
>  1 file changed, 1 insertion(+), 24 deletions(-)
> 
> diff --git a/ui/console.c b/ui/console.c
> index d914cced53..af0c56c600 100644
> --- a/ui/console.c
> +++ b/ui/console.c
> @@ -1579,36 +1579,13 @@ bool dpy_gfx_check_format(QemuConsole *con,
>      return true;
>  }
>  
> -/*
> - * Safe DPY refresh for TCG guests. We use the exclusive mechanism to
> - * ensure the TCG vCPUs are quiescent so we can avoid races between
> - * dirty page tracking for direct frame-buffer access by the guest.
> - *
> - * This is a temporary stopgap until we've fixed the dirty tracking
> - * races in display adapters.
> - */
> -static void do_safe_dpy_refresh(DisplayChangeListener *dcl)
> -{
> -    qemu_mutex_unlock_iothread();
> -    start_exclusive();
> -    qemu_mutex_lock_iothread();
> -    dcl->ops->dpy_refresh(dcl);
> -    qemu_mutex_unlock_iothread();
> -    end_exclusive();
> -    qemu_mutex_lock_iothread();
> -}
> -
>  static void dpy_refresh(DisplayState *s)
>  {
>      DisplayChangeListener *dcl;
>  
>      QLIST_FOREACH(dcl, &s->listeners, next) {
>          if (dcl->ops->dpy_refresh) {
> -            if (tcg_enabled()) {
> -                do_safe_dpy_refresh(dcl);
> -            } else {
> -                dcl->ops->dpy_refresh(dcl);
> -            }
> +            dcl->ops->dpy_refresh(dcl);
>          }
>      }
>  }
> 


Re: [Qemu-devel] [PULL 5/6] console: remove do_safe_dpy_refresh
Posted by Laurent Vivier 8 years, 6 months ago
On 18/07/2017 15:07, Laurent Vivier wrote:
> On 21/06/2017 15:23, Gerd Hoffmann wrote:
>> Drop the temporary workaround for the broken display updates.
>> All display adapters are updated, so this should be safe without
>> causing regressions.
> 
> It seems it breaks QMP command 'migrate "exec:cat>mig"'.
> 
> The command hangs and doesn't create the file.
> 
> It happens with qemu-system-ppc64 on x86 (so TCG mode).
> 
> my command:
> 
>    ./ppc64-softmmu/qemu-system-ppc64 -serial mon:stdio
> 
> I wait SLOF fails to find an OS, and:
> 
>     Ctrl-a c
>     (qemu) migrate -d "exec:cat>mig"
> 
> The file is not created and the command hangs:
> 
> #0  in __lll_lock_wait
> #1  in pthread_mutex_lock
> #2  in qemu_mutex_lock
> #3  in rcu_init_lock
> #4  in fork
> #5  in qemu_fork
> #6  in qio_channel_command_new_spawn
> #7  in exec_start_outgoing_migration
> #8  in qmp_migrate
> ...
> 
> It looks like a deadlock.

I think this patch is not the cause of the problem, the one it removes
just unlocks the deadlock by playing with locks.

We have a rcu_init_lock() on fork() because of:

utils/rcu.c:

static void __attribute__((__constructor__)) rcu_init(void)
{
#ifdef CONFIG_POSIX
    pthread_atfork(rcu_init_lock, rcu_init_unlock, rcu_init_unlock);
#endif
    rcu_init_complete();
}

The QMP thread hangs on:

(gdb) p rcu_sync_lock
$1 = {lock = {__data = {__lock = 2, __count = 0, __owner = 23865,
      __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {
        __prev = 0x0, __next = 0x0}},
    __size = "\002\000\000\000\000\000\000\000\071]\000\000\001", '\000'
<repeats 26 times>, __align = 2}, initialized = true}


The lock is already taken by thread 2:

(gdb) info thread
  Id   Target Id         Frame
  1    Thread 0x7f1cf02fdf00 (LWP 23864) "qemu-system-ppc"
0x00007f1cd914b37d in __lll_lock_wait () from /lib64/libpthread.so.0
* 2    Thread 0x7f1cc9762700 (LWP 23865) "qemu-system-ppc"
0x00007f1cd410daa9 in syscall () from /lib64/libc.so.6
  3    Thread 0x7f1cbf8d5700 (LWP 23866) "qemu-system-ppc"
0x00007f1cd914b37d in __lll_lock_wait () from /lib64/libpthread.so.0

(gdb) bt
#0  0x00007f1cd410daa9 in syscall () at /lib64/libc.so.6
#1  0x000055ab028ddda2 in qemu_futex_wait
#2  0x000055ab028ddda2 in qemu_event_wait
#3  0x000055ab028eda2b in wait_for_readers
#4  0x000055ab028eda2b in synchronize_rcu
#5  0x000055ab028edc5b in call_rcu_thread
#6  0x00007f1cd914273a in start_thread ()
#7  0x00007f1cd4113e0f in clone ()

So it seems we cannot fork() from QMP?
[cc: Paolo]

Any comments?

Laurent

Re: [Qemu-devel] [PULL 5/6] console: remove do_safe_dpy_refresh
Posted by Dr. David Alan Gilbert 8 years, 6 months ago
* Laurent Vivier (lvivier@redhat.com) wrote:
> On 18/07/2017 15:07, Laurent Vivier wrote:
> > On 21/06/2017 15:23, Gerd Hoffmann wrote:
> >> Drop the temporary workaround for the broken display updates.
> >> All display adapters are updated, so this should be safe without
> >> causing regressions.
> > 
> > It seems it breaks QMP command 'migrate "exec:cat>mig"'.
> > 
> > The command hangs and doesn't create the file.
> > 
> > It happens with qemu-system-ppc64 on x86 (so TCG mode).
> > 
> > my command:
> > 
> >    ./ppc64-softmmu/qemu-system-ppc64 -serial mon:stdio
> > 
> > I wait SLOF fails to find an OS, and:
> > 
> >     Ctrl-a c
> >     (qemu) migrate -d "exec:cat>mig"
> > 
> > The file is not created and the command hangs:
> > 
> > #0  in __lll_lock_wait
> > #1  in pthread_mutex_lock
> > #2  in qemu_mutex_lock
> > #3  in rcu_init_lock
> > #4  in fork
> > #5  in qemu_fork
> > #6  in qio_channel_command_new_spawn
> > #7  in exec_start_outgoing_migration
> > #8  in qmp_migrate
> > ...
> > 
> > It looks like a deadlock.
> 
> I think this patch is not the cause of the problem, the one it removes
> just unlocks the deadlock by playing with locks.
> 
> We have a rcu_init_lock() on fork() because of:
> 
> utils/rcu.c:
> 
> static void __attribute__((__constructor__)) rcu_init(void)
> {
> #ifdef CONFIG_POSIX
>     pthread_atfork(rcu_init_lock, rcu_init_unlock, rcu_init_unlock);
> #endif
>     rcu_init_complete();
> }
> 
> The QMP thread hangs on:
> 
> (gdb) p rcu_sync_lock
> $1 = {lock = {__data = {__lock = 2, __count = 0, __owner = 23865,
>       __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {
>         __prev = 0x0, __next = 0x0}},
>     __size = "\002\000\000\000\000\000\000\000\071]\000\000\001", '\000'
> <repeats 26 times>, __align = 2}, initialized = true}
> 
> 
> The lock is already taken by thread 2:
> 
> (gdb) info thread
>   Id   Target Id         Frame
>   1    Thread 0x7f1cf02fdf00 (LWP 23864) "qemu-system-ppc"
> 0x00007f1cd914b37d in __lll_lock_wait () from /lib64/libpthread.so.0
> * 2    Thread 0x7f1cc9762700 (LWP 23865) "qemu-system-ppc"
> 0x00007f1cd410daa9 in syscall () from /lib64/libc.so.6
>   3    Thread 0x7f1cbf8d5700 (LWP 23866) "qemu-system-ppc"
> 0x00007f1cd914b37d in __lll_lock_wait () from /lib64/libpthread.so.0
> 
> (gdb) bt
> #0  0x00007f1cd410daa9 in syscall () at /lib64/libc.so.6
> #1  0x000055ab028ddda2 in qemu_futex_wait
> #2  0x000055ab028ddda2 in qemu_event_wait
> #3  0x000055ab028eda2b in wait_for_readers
> #4  0x000055ab028eda2b in synchronize_rcu
> #5  0x000055ab028edc5b in call_rcu_thread
> #6  0x00007f1cd914273a in start_thread ()
> #7  0x00007f1cd4113e0f in clone ()
> 
> So it seems we cannot fork() from QMP?
> [cc: Paolo]
> 
> Any comments?

I remembered hitting this in the past - but I can only
trigger it rarely for me on x86;  the following script triggers it
after ~100 iterations on my laptop:

#!/bin/bash

while true
do
  OURPIPE=/tmp/delaystop.$$
  mknod $OURPIPE p

  ./try/x86_64-softmmu/qemu-system-x86_64 -nographic -M pc,accel=kvm -smp 8 < $OURPIPE &
  QEMUPID=$!
  exec 10> "$OURPIPE"

  # Flip the mon to hmp
  echo -e '\001c' >&10
  # just a test
  echo 'migrate  -d "exec: cat > /dev/null"' >&10
  sleep $(printf ".%05d" $RANDOM)
  echo "info status" >&10
  echo "q" >&10

  rm $OURPIPE
  wait $QEMUPID || break
done

(From my notes I mentioned that to Paolo about 18months ago after
he nailed a different case)

Dave

> Laurent
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PULL 5/6] console: remove do_safe_dpy_refresh
Posted by Paolo Bonzini 8 years, 6 months ago
On 18/07/2017 15:56, Laurent Vivier wrote:
> On 18/07/2017 15:07, Laurent Vivier wrote:
>> On 21/06/2017 15:23, Gerd Hoffmann wrote:
>>> Drop the temporary workaround for the broken display updates.
>>> All display adapters are updated, so this should be safe without
>>> causing regressions.
>>
>> It seems it breaks QMP command 'migrate "exec:cat>mig"'.
>>
>> The command hangs and doesn't create the file.
>>
>> It happens with qemu-system-ppc64 on x86 (so TCG mode).
>>
>> my command:
>>
>>    ./ppc64-softmmu/qemu-system-ppc64 -serial mon:stdio
>>
>> I wait SLOF fails to find an OS, and:
>>
>>     Ctrl-a c
>>     (qemu) migrate -d "exec:cat>mig"
>>
>> The file is not created and the command hangs:
>>
>> #0  in __lll_lock_wait
>> #1  in pthread_mutex_lock
>> #2  in qemu_mutex_lock
>> #3  in rcu_init_lock
>> #4  in fork
>> #5  in qemu_fork
>> #6  in qio_channel_command_new_spawn
>> #7  in exec_start_outgoing_migration
>> #8  in qmp_migrate
>> ...
>>
>> It looks like a deadlock.
> 
> I think this patch is not the cause of the problem, the one it removes
> just unlocks the deadlock by playing with locks.
> 
> We have a rcu_init_lock() on fork() because of:
> 
> utils/rcu.c:
> 
> static void __attribute__((__constructor__)) rcu_init(void)
> {
> #ifdef CONFIG_POSIX
>     pthread_atfork(rcu_init_lock, rcu_init_unlock, rcu_init_unlock);
> #endif
>     rcu_init_complete();
> }
> 
> The QMP thread hangs on:
> 
> (gdb) p rcu_sync_lock
> $1 = {lock = {__data = {__lock = 2, __count = 0, __owner = 23865,
>       __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {
>         __prev = 0x0, __next = 0x0}},
>     __size = "\002\000\000\000\000\000\000\000\071]\000\000\001", '\000'
> <repeats 26 times>, __align = 2}, initialized = true}
> 
> 
> The lock is already taken by thread 2:
> 
> (gdb) info thread
>   Id   Target Id         Frame
>   1    Thread 0x7f1cf02fdf00 (LWP 23864) "qemu-system-ppc"
> 0x00007f1cd914b37d in __lll_lock_wait () from /lib64/libpthread.so.0
> * 2    Thread 0x7f1cc9762700 (LWP 23865) "qemu-system-ppc"
> 0x00007f1cd410daa9 in syscall () from /lib64/libc.so.6
>   3    Thread 0x7f1cbf8d5700 (LWP 23866) "qemu-system-ppc"
> 0x00007f1cd914b37d in __lll_lock_wait () from /lib64/libpthread.so.0
> 
> (gdb) bt
> #0  0x00007f1cd410daa9 in syscall () at /lib64/libc.so.6
> #1  0x000055ab028ddda2 in qemu_futex_wait
> #2  0x000055ab028ddda2 in qemu_event_wait
> #3  0x000055ab028eda2b in wait_for_readers
> #4  0x000055ab028eda2b in synchronize_rcu
> #5  0x000055ab028edc5b in call_rcu_thread
> #6  0x00007f1cd914273a in start_thread ()
> #7  0x00007f1cd4113e0f in clone ()
> 
> So it seems we cannot fork() from QMP?
> [cc: Paolo]

There have been other similar bugs, as David reported.  The plan was to
disable pthread_atfork soon after daemonize (basically assuming that
after daemonize fork is immediately followed by exec), but I've been
lazy and never finished those patches.  Looks like it's time.

Paolo