[Qemu-devel] [RFC PATCH] char-socket: hold chr_write_lock during tcp_chr_free_connection()
Posted by Alberto Garcia 5 years, 2 months ago
There's a race condition in which the tcp_chr_read() ioc handler can
close a connection that is being written to from another thread.

Running iotest 136 in a loop triggers this problem and crashes QEMU.

 (gdb) bt
 #0  0x00005558b842902d in object_get_class (obj=0x0) at qom/object.c:860
 #1  0x00005558b84f92db in qio_channel_writev_full (ioc=0x0, iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
 #2  0x00005558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
 #3  0x00005558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
 #4  0x00005558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at chardev/char.c:112
 #5  0x00005558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, write_all=false) at chardev/char.c:147
 #6  0x00005558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
 #7  0x00005558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at monitor.c:406
 #8  0x00005558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 "") at monitor.c:449
 #9  0x00005558b8089178 in qmp_send_response (mon=0x5558ba476610, rsp=0x5558bb161600) at monitor.c:498
 #10 0x00005558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:526
 #11 0x00005558b8089307 in monitor_qapi_event_queue_no_reenter (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
 #12 0x00005558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:626
 #13 0x00005558b855f23b in qapi_event_send_shutdown (guest=false, reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
 #14 0x00005558b81911ef in qemu_system_shutdown (cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
 #15 0x00005558b8191308 in main_loop_should_exit () at vl.c:1885
 #16 0x00005558b819140d in main_loop () at vl.c:1924
 #17 0x00005558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, envp=0x7ffc355df490) at vl.c:4665

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 chardev/char-socket.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index eaa8e8b68f..47bef1d2bd 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -148,7 +148,9 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
 
         if (ret < 0 && errno != EAGAIN) {
             if (tcp_chr_read_poll(chr) <= 0) {
+                qemu_mutex_unlock(&chr->chr_write_lock);
                 tcp_chr_disconnect(chr);
+                qemu_mutex_lock(&chr->chr_write_lock);
                 return len;
             } /* else let the read handler finish it properly */
         }
@@ -449,7 +451,9 @@ static void tcp_chr_disconnect(Chardev *chr)
     SocketChardev *s = SOCKET_CHARDEV(chr);
     bool emit_close = s->connected;
 
+    qemu_mutex_lock(&chr->chr_write_lock);
     tcp_chr_free_connection(chr);
+    qemu_mutex_unlock(&chr->chr_write_lock);
 
     if (s->listener) {
         qio_net_listener_set_client_func_full(s->listener, tcp_chr_accept,
-- 
2.11.0


Re: [Qemu-devel] [RFC PATCH] char-socket: hold chr_write_lock during tcp_chr_free_connection()
Posted by Daniel P. Berrangé 5 years, 2 months ago
On Wed, Feb 06, 2019 at 02:43:45PM +0200, Alberto Garcia wrote:
> There's a race condition in which the tcp_chr_read() ioc handler can
> close a connection that is being written to from another thread.

Isn't the chardev only ever supposed to be read/written from a
single thread?

> 
> Running iotest 136 in a loop triggers this problem and crashes QEMU.
> 
>  (gdb) bt
>  #0  0x00005558b842902d in object_get_class (obj=0x0) at qom/object.c:860
>  #1  0x00005558b84f92db in qio_channel_writev_full (ioc=0x0, iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
>  #2  0x00005558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
>  #3  0x00005558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
>  #4  0x00005558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at chardev/char.c:112
>  #5  0x00005558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, write_all=false) at chardev/char.c:147
>  #6  0x00005558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
>  #7  0x00005558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at monitor.c:406
>  #8  0x00005558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 "") at monitor.c:449
>  #9  0x00005558b8089178 in qmp_send_response (mon=0x5558ba476610, rsp=0x5558bb161600) at monitor.c:498
>  #10 0x00005558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:526
>  #11 0x00005558b8089307 in monitor_qapi_event_queue_no_reenter (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
>  #12 0x00005558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:626
>  #13 0x00005558b855f23b in qapi_event_send_shutdown (guest=false, reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
>  #14 0x00005558b81911ef in qemu_system_shutdown (cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
>  #15 0x00005558b8191308 in main_loop_should_exit () at vl.c:1885
>  #16 0x00005558b819140d in main_loop () at vl.c:1924
>  #17 0x00005558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, envp=0x7ffc355df490) at vl.c:4665

This shows the main thread's trace - I'd be interested to see a
trace of the other thread which is calling tcp_chr_free_connection(),
to understand why we have 2 threads racing at all.

> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  chardev/char-socket.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/chardev/char-socket.c b/chardev/char-socket.c
> index eaa8e8b68f..47bef1d2bd 100644
> --- a/chardev/char-socket.c
> +++ b/chardev/char-socket.c
> @@ -148,7 +148,9 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>  
>          if (ret < 0 && errno != EAGAIN) {
>              if (tcp_chr_read_poll(chr) <= 0) {
> +                qemu_mutex_unlock(&chr->chr_write_lock);
>                  tcp_chr_disconnect(chr);
> +                qemu_mutex_lock(&chr->chr_write_lock);
>                  return len;
>              } /* else let the read handler finish it properly */
>          }
> @@ -449,7 +451,9 @@ static void tcp_chr_disconnect(Chardev *chr)
>      SocketChardev *s = SOCKET_CHARDEV(chr);
>      bool emit_close = s->connected;
>  
> +    qemu_mutex_lock(&chr->chr_write_lock);
>      tcp_chr_free_connection(chr);
> +    qemu_mutex_unlock(&chr->chr_write_lock);
>  
>      if (s->listener) {
>          qio_net_listener_set_client_func_full(s->listener, tcp_chr_accept,
> -- 
> 2.11.0
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC PATCH] char-socket: hold chr_write_lock during tcp_chr_free_connection()
Posted by Paolo Bonzini 5 years, 2 months ago
On 06/02/19 13:49, Daniel P. Berrangé wrote:
> On Wed, Feb 06, 2019 at 02:43:45PM +0200, Alberto Garcia wrote:
>> There's a race condition in which the tcp_chr_read() ioc handler can
>> close a connection that is being written to from another thread.
> Isn't the chardev only ever supposed to be read/written from a
> single thread?
> 

No, writes from multiple threads are allowed; of course they have to
have proper locking to avoid mixing messages.

Reads have to happen from a single thread because they are tied to the
chr_read handlers.
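
For reference, the write side already does this: a simplified sketch of
qemu_chr_write_buffer() in chardev/char.c (the frame just above
tcp_chr_write() in the backtrace; the EAGAIN retry and write-log details
are omitted here):

static int qemu_chr_write_buffer(Chardev *s, const uint8_t *buf, int len,
                                 int *offset, bool write_all)
{
    ChardevClass *cc = CHARDEV_GET_CLASS(s);
    int res = 0;

    /* Writers from any thread serialize on chr_write_lock, so their data
     * cannot interleave and the backend sees one write at a time. */
    qemu_mutex_lock(&s->chr_write_lock);
    *offset = 0;
    while (*offset < len) {
        res = cc->chr_write(s, buf + *offset, len - *offset); /* tcp_chr_write() */
        if (res <= 0) {
            break;              /* EAGAIN/retry handling omitted */
        }
        *offset += res;
        if (!write_all) {
            break;
        }
    }
    qemu_mutex_unlock(&s->chr_write_lock);

    return res;
}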

Paolo

Re: [Qemu-devel] [RFC PATCH] char-socket: hold chr_write_lock during tcp_chr_free_connection()
Posted by Daniel P. Berrangé 5 years, 2 months ago
On Wed, Feb 06, 2019 at 01:52:59PM +0100, Paolo Bonzini wrote:
> On 06/02/19 13:49, Daniel P. Berrangé wrote:
> > On Wed, Feb 06, 2019 at 02:43:45PM +0200, Alberto Garcia wrote:
> >> There's a race condition in which the tcp_chr_read() ioc handler can
> >> close a connection that is being written to from another thread.
> > Isn't the chardev only ever supposed to be read/written from a
> > single thread?
> > 
> 
> No, writes from multiple threads are allowed; of course they have to
> have proper locking to avoid mixing messages.
> 
> Reads have to happen from a single thread because they are tied to the
> chr_read handlers.

OK, then we really ought to have full locking around tcp_chr_disconnect().

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [RFC PATCH] char-socket: hold chr_write_lock during tcp_chr_free_connection()
Posted by Alberto Garcia 5 years, 2 months ago
On Wed 06 Feb 2019 01:49:02 PM CET, Daniel P. Berrangé wrote:
>>  (gdb) bt
>>  #0  0x00005558b842902d in object_get_class (obj=0x0) at qom/object.c:860
>>  #1  0x00005558b84f92db in qio_channel_writev_full (ioc=0x0, iov=0x7ffc355decf0, niov=1, fds=0x0, nfds=0, errp=0x0) at io/channel.c:76
>>  #2  0x00005558b84e0e9e in io_channel_send_full (ioc=0x0, buf=0x5558baf5beb0, len=138, fds=0x0, nfds=0) at chardev/char-io.c:123
>>  #3  0x00005558b84e4a69 in tcp_chr_write (chr=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138) at chardev/char-socket.c:135
>>  #4  0x00005558b84dca55 in qemu_chr_write_buffer (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, offset=0x7ffc355dedd0, write_all=false) at chardev/char.c:112
>>  #5  0x00005558b84dcbc2 in qemu_chr_write (s=0x5558ba460380, buf=0x5558baf5beb0 "...", len=138, write_all=false) at chardev/char.c:147
>>  #6  0x00005558b84dfb26 in qemu_chr_fe_write (be=0x5558ba476610, buf=0x5558baf5beb0 "...", len=138) at chardev/char-fe.c:42
>>  #7  0x00005558b8088c86 in monitor_flush_locked (mon=0x5558ba476610) at monitor.c:406
>>  #8  0x00005558b8088e8c in monitor_puts (mon=0x5558ba476610, str=0x5558ba921e49 "") at monitor.c:449
>>  #9  0x00005558b8089178 in qmp_send_response (mon=0x5558ba476610, rsp=0x5558bb161600) at monitor.c:498
>>  #10 0x00005558b808920c in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:526
>>  #11 0x00005558b8089307 in monitor_qapi_event_queue_no_reenter (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:551
>>  #12 0x00005558b80896c0 in qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x5558bb161600) at monitor.c:626
>>  #13 0x00005558b855f23b in qapi_event_send_shutdown (guest=false, reason=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at qapi/qapi-events-run-state.c:43
>>  #14 0x00005558b81911ef in qemu_system_shutdown (cause=SHUTDOWN_CAUSE_HOST_QMP_QUIT) at vl.c:1837
>>  #15 0x00005558b8191308 in main_loop_should_exit () at vl.c:1885
>>  #16 0x00005558b819140d in main_loop () at vl.c:1924
>>  #17 0x00005558b8198c84 in main (argc=18, argv=0x7ffc355df3f8, envp=0x7ffc355df490) at vl.c:4665
>
> This shows the main thread's trace - I'd be interested to see a trace of
> the other thread which is calling tcp_chr_free_connection(), to
> understand why we have 2 threads racing at all.

The other thread is already done by the time this one crashes, but you
can see what's going on if you replace the qemu_mutex_lock() call in
this patch with a trylock() + an assertion:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007fafc05e642a in __GI_abort () at abort.c:89
#2  0x00007fafc05dde67 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55a4dbe78448 "qemu_mutex_trylock(&chr->chr_write_lock) != -EBUSY", 
    file=file@entry=0x55a4dbe78320 "chardev/char-socket.c", line=line@entry=452, 
    function=function@entry=0x55a4dbe788d0 <__PRETTY_FUNCTION__.20007> "tcp_chr_disconnect") at assert.c:92
#3  0x00007fafc05ddf12 in __GI___assert_fail (assertion=0x55a4dbe78448 "qemu_mutex_trylock(&chr->chr_write_lock) != -EBUSY", 
    file=0x55a4dbe78320 "chardev/char-socket.c", line=452, function=0x55a4dbe788d0 <__PRETTY_FUNCTION__.20007> "tcp_chr_disconnect")
    at assert.c:101
#4  0x000055a4dbbbdab1 in tcp_chr_disconnect (chr=0x55a4dc76e380) at chardev/char-socket.c:452
#5  0x000055a4dbbbdca0 in tcp_chr_read (chan=0x55a4dc76e770, cond=(G_IO_IN | G_IO_HUP), opaque=0x55a4dc76e380) at chardev/char-socket.c:486
#6  0x000055a4dbbd7f76 in qio_channel_fd_source_dispatch (source=0x7fafac005000, callback=0x55a4dbbbdb5c <tcp_chr_read>, user_data=0x55a4dc76e380)
    at io/channel-watch.c:84
#7  0x00007fafc3ad16aa in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#8  0x00007fafc3ad1a60 in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#9  0x00007fafc3ad1d82 in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#10 0x000055a4db8601f9 in iothread_run (opaque=0x55a4dc781440) at iothread.c:74
#11 0x000055a4dbc57039 in qemu_thread_start (args=0x55a4dc774c10) at util/qemu-thread-posix.c:502
#12 0x00007fafc0958494 in start_thread (arg=0x7fafb23aa700) at pthread_create.c:333
#13 0x00007fafc069aacf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
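
For completeness, the variant used to get that trace looks roughly like this
(untested debugging sketch, not meant for merging; it only swaps the
qemu_mutex_lock() from this patch for a trylock plus an assertion, so the
racing thread aborts instead of blocking):

static void tcp_chr_disconnect(Chardev *chr)
{
    SocketChardev *s = SOCKET_CHARDEV(chr);
    bool emit_close = s->connected;

    /* If another thread is inside tcp_chr_write() it already holds
     * chr_write_lock, so the trylock returns -EBUSY and the assertion
     * fires, producing the backtrace above. */
    assert(qemu_mutex_trylock(&chr->chr_write_lock) != -EBUSY);
    tcp_chr_free_connection(chr);
    qemu_mutex_unlock(&chr->chr_write_lock);

    /* ... rest of tcp_chr_disconnect() unchanged ... */
}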

Berto

Re: [Qemu-devel] [RFC PATCH] char-socket: hold chr_write_lock during tcp_chr_free_connection()
Posted by Paolo Bonzini 5 years, 2 months ago
On 06/02/19 13:43, Alberto Garcia wrote:
>          }
> @@ -449,7 +451,9 @@ static void tcp_chr_disconnect(Chardev *chr)
>      SocketChardev *s = SOCKET_CHARDEV(chr);
>      bool emit_close = s->connected;
>  
> +    qemu_mutex_lock(&chr->chr_write_lock);
>      tcp_chr_free_connection(chr);
> +    qemu_mutex_unlock(&chr->chr_write_lock);
>  
>      if (s->listener) {
>          qio_net_listener_set_client_func_full(s->listener, tcp_chr_accept,

Should operations on the listener also be protected?  I think that, apart
from emitting the closed event itself, everything in this function should
be protected by the lock.  The closed event should be pushed to the
GMainContext using an idle source (perhaps it's worth writing a wrapper
qemu_idle_add() that takes a GMainContext, as the same idiom is already
present in pty_chr_state and qio_task_thread_worker).
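
Something along these lines, perhaps (untested sketch; the name and
signature are just a proposal):

static guint qemu_idle_add(GSourceFunc func, gpointer opaque, GMainContext *ctx)
{
    GSource *src = g_idle_source_new();
    guint id;

    g_source_set_callback(src, func, opaque, NULL);
    id = g_source_attach(src, ctx);
    /* the context now owns the source, so drop our reference */
    g_source_unref(src);

    return id;
}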

socket_reconnect_timeout should also take the lock when freeing
s->reconnect_timer.

Paolo

Re: [Qemu-devel] [RFC PATCH] char-socket: hold chr_write_lock during tcp_chr_free_connection()
Posted by Alberto Garcia 5 years, 2 months ago
On Wed 06 Feb 2019 02:00:03 PM CET, Paolo Bonzini wrote:
> On 06/02/19 13:43, Alberto Garcia wrote:
>>          }
>> @@ -449,7 +451,9 @@ static void tcp_chr_disconnect(Chardev *chr)
>>      SocketChardev *s = SOCKET_CHARDEV(chr);
>>      bool emit_close = s->connected;
>>  
>> +    qemu_mutex_lock(&chr->chr_write_lock);
>>      tcp_chr_free_connection(chr);
>> +    qemu_mutex_unlock(&chr->chr_write_lock);
>>  
>>      if (s->listener) {
>>          qio_net_listener_set_client_func_full(s->listener, tcp_chr_accept,
>
> Should operations on the listener also be protected?  I think that apart
> from emitting the closed event itself everything in this function should
> be protected by the lock.   The closed event should be pushed to the
> GMainContext using an idle source (perhaps it's worth writing a wrapper
> qemu_idle_add that takes a GMainContext, as the same idiom is already
> present in pty_chr_state and qio_task_thread_worker).

Now that you mention this, it seems that we are leaking the GSources in
those two cases? GLib's g_idle_add_full() implementation does unref the
source after attaching it to the GMainContext:

   https://gitlab.gnome.org/GNOME/glib/blob/master/glib/gmain.c#L5684
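
i.e. roughly (paraphrased from the linked source, priority handling omitted):

guint g_idle_add_full(gint priority, GSourceFunc function,
                      gpointer data, GDestroyNotify notify)
{
    GSource *source = g_idle_source_new();
    guint id;

    g_source_set_callback(source, function, data, notify);
    id = g_source_attach(source, NULL);  /* default GMainContext only */
    g_source_unref(source);              /* <-- the unref in question */

    return id;
}

so anything that attaches an idle source to a non-default context by hand
would need the same g_source_unref() to avoid leaking it.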

Berto