[PATCH] linux-user,bsd-user: re-exec with G_SLICE=always-malloc

Daniel P. Berrangé posted 1 patch 1 year, 6 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20221004120047.857591-1-berrange@redhat.com
Maintainers: Warner Losh <imp@bsdimp.com>, Kyle Evans <kevans@freebsd.org>, Laurent Vivier <laurent@vivier.eu>
bsd-user/main.c   | 20 ++++++++++++++++++++
linux-user/main.c | 20 ++++++++++++++++++++
2 files changed, 40 insertions(+)
[PATCH] linux-user,bsd-user: re-exec with G_SLICE=always-malloc
Posted by Daniel P. Berrangé 1 year, 6 months ago
The g_slice custom allocator is not async signal safe with its
mutexes. When a multithreaded program running in the qemu user
emulator forks, it can end up deadlocking in the g_slice
allocator

  Thread 1:
  #0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  #1 0x00007f54e190c77c in g_mutex_lock_slowpath (mutex=mutex@entry=0x7f54e1dc7600 <allocator+96>) at ../glib/gthread-posix.c:1462
  #2 0x00007f54e190d222 in g_mutex_lock (mutex=mutex@entry=0x7f54e1dc7600 <allocator+96>) at ../glib/gthread-posix.c:1486
  #3 0x00007f54e18e39f2 in magazine_cache_pop_magazine (countp=0x7f54280e6638, ix=2) at ../glib/gslice.c:769
  #4 thread_memory_magazine1_reload (ix=2, tmem=0x7f54280e6600) at ../glib/gslice.c:845
  #5 g_slice_alloc (mem_size=mem_size@entry=40) at ../glib/gslice.c:1058
  #6 0x00007f54e18f06fa in g_tree_node_new (value=0x7f54d4066540 <code_gen_buffer+419091>, key=0x7f54d4066560 <code_gen_buffer+419123>) at ../glib/gtree.c:517
  #7 g_tree_insert_internal (tree=0x555556aed800, key=0x7f54d4066560 <code_gen_buffer+419123>, value=0x7f54d4066540 <code_gen_buffer+419091>, replace=0) at ../glib/gtree.c:517
  #8 0x00007f54e186b755 in tcg_tb_insert (tb=0x7f54d4066540 <code_gen_buffer+419091>) at ../tcg/tcg.c:534
  #9 0x00007f54e1820545 in tb_gen_code (cpu=0x7f54980b4b60, pc=274906407438, cs_base=0, flags=24832, cflags=-16252928) at ../accel/tcg/translate-all.c:2118
  #10 0x00007f54e18034a5 in tb_find (cpu=0x7f54980b4b60, last_tb=0x7f54d4066440 <code_gen_buffer+418835>, tb_exit=0, cf_mask=524288) at ../accel/tcg/cpu-exec.c:462
  #11 0x00007f54e1803bd9 in cpu_exec (cpu=0x7f54980b4b60) at ../accel/tcg/cpu-exec.c:818
  #12 0x00007f54e1735a4c in cpu_loop (env=0x7f54980bce40) at ../linux-user/riscv/cpu_loop.c:37
  #13 0x00007f54e1844b22 in clone_func (arg=0x7f5402f3b080) at ../linux-user/syscall.c:6422
  #14 0x00007f54e191950a in start_thread (arg=<optimized out>) at pthread_create.c:477
  #15 0x00007f54e19a52a3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The only known workaround for this problem is to disable the g_slice
custom allocator, in favor of system malloc which is believed to be
async signal safe on all platforms QEMU officially targets.

g_slice uses a one-time initializer to check the G_SLICE env variable
making it hard for QEMU to set the env before any GLib API call has
triggered the initializer. Even attribute((constructor)) is not
sufficient as QEMU has many constructors and there is no ordering
guarantee between them.

This patch attempts to workaround this by re-exec()ing the QEMU user
emulators if the G_SLICE env variable is not already set. This means
the env variable will be inherited down the process tree spawned
from there onwards. There is a possibility this could have unexpected
consequences, but this has to be balanced against the real known
problem of QEMU user emulators randomly deadlocking.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/285
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---

Can't say I especially like this but I'm out of other ideas for how
to guarantee a solution. Users can't set env vars prior to launching
QEMU user emulators when using binfmt.

NB, I tested the linux-user impl and it stops the hangs in my
testing. I've not even compiled tested the bsd-user impl, just
blindly copied the linux-user code.

 bsd-user/main.c   | 20 ++++++++++++++++++++
 linux-user/main.c | 20 ++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/bsd-user/main.c b/bsd-user/main.c
index 6f09180d65..c816ab80b3 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -284,6 +284,26 @@ int main(int argc, char **argv)
     envlist_t *envlist = NULL;
     char *argv0 = NULL;
 
+    /*
+     * Historically the g_slice custom allocator was not async
+     * signal safe with its mutexes, causing deadlocks when a
+     * threaded program forks. Setting G_SLICE=always-malloc
+     * forces use of system malloc which is generally safe.
+     *
+     * https://gitlab.com/qemu-project/qemu/-/issues/285
+     *
+     * Remove if we ever bump min GLib to a version that
+     * drops g_slice's custom allocator:
+     *
+     * https://gitlab.gnome.org/GNOME/glib/-/issues/1079
+     */
+    if (getenv("G_SLICE") == NULL) {
+        setenv("G_SLICE", "always-malloc", 1);
+        execvp(argv[0], argv);
+        perror("execvep");
+        abort();
+    }
+
     adjust_ssize();
 
     if (argc <= 1) {
diff --git a/linux-user/main.c b/linux-user/main.c
index 88fccfe261..62391b9d32 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -654,6 +654,26 @@ int main(int argc, char **argv, char **envp)
     unsigned long max_reserved_va;
     bool preserve_argv0;
 
+    /*
+     * Historically the g_slice custom allocator was not async
+     * signal safe with its mutexes, causing deadlocks when a
+     * threaded program forks. Setting G_SLICE=always-malloc
+     * forces use of system malloc which is generally safe.
+     *
+     * https://gitlab.com/qemu-project/qemu/-/issues/285
+     *
+     * Remove if we ever bump min GLib to a version that
+     * drops g_slice's custom allocator:
+     *
+     * https://gitlab.gnome.org/GNOME/glib/-/issues/1079
+     */
+    if (getenv("G_SLICE") == NULL) {
+        setenv("G_SLICE", "always-malloc", 1);
+        execvp(argv[0], argv);
+        perror("execvep");
+        abort();
+    }
+
     error_init(argv[0]);
     module_call_init(MODULE_INIT_TRACE);
     qemu_init_cpu_list();
-- 
2.37.3


Re: [PATCH] linux-user,bsd-user: re-exec with G_SLICE=always-malloc
Posted by Emilio Cota 1 year, 4 months ago
On Tue, Oct 04, 2022 at 13:00:47 +0100, Daniel P. Berrangé wrote:
(snip)
> Can't say I especially like this but I'm out of other ideas for how
> to guarantee a solution. Users can't set env vars prior to launching
> QEMU user emulators when using binfmt.

An alternative is to not use GSlice between fork/exec. I'm
not sure if within that region there are other users besides
GTree (GArray perhaps?), but if there aren't, then just using
a different binary tree implementation should do.

Untested patches using ccan's AVL tree: 
  https://github.com/cota/qemu/commits/avl

Would that be more palatable?

Thanks,
		Emilio
Re: [PATCH] linux-user,bsd-user: re-exec with G_SLICE=always-malloc
Posted by Alex Bennée 1 year, 4 months ago
Emilio Cota <cota@braap.org> writes:

> On Tue, Oct 04, 2022 at 13:00:47 +0100, Daniel P. Berrangé wrote:
> (snip)
>> Can't say I especially like this but I'm out of other ideas for how
>> to guarantee a solution. Users can't set env vars prior to launching
>> QEMU user emulators when using binfmt.
>
> An alternative is to not use GSlice between fork/exec. I'm
> not sure if within that region there are other users besides
> GTree (GArray perhaps?), but if there aren't, then just using
> a different binary tree implementation should do.

Hmm my distros version of GArray certainly does and that is used quite
heavily across gdbstub and plugins.

>
> Untested patches using ccan's AVL tree: 
>   https://github.com/cota/qemu/commits/avl
>
> Would that be more palatable?

I think generally we wouldn't want to have multiple implementations
unless there was a definite benefit (c.f. QHT). That said I think
Richard's latest optimisation work:

  Subject: [PATCH v2 0/7] accel/tcg: Rewrite user-only vma tracking
  Date: Thu, 27 Oct 2022 22:12:51 +1100
  Message-Id: <20221027111258.348196-1-richard.henderson@linaro.org>

brings in the kernel's interval tree (with unit tests). I wonder if the
page_collection use of GTree could be converted to that?

I don't know how you would defend against re-introducing it into
linux-user though aside from commentary.

>
> Thanks,
> 		Emilio


-- 
Alex Bennée
Re: [PATCH] linux-user,bsd-user: re-exec with G_SLICE=always-malloc
Posted by Emilio Cota 1 year, 3 months ago
On Thu, Dec 01, 2022 at 10:49:27 +0000, Alex Bennée wrote:
> Emilio Cota <cota@braap.org> writes:
> > On Tue, Oct 04, 2022 at 13:00:47 +0100, Daniel P. Berrangé wrote:
> > (snip)
> >> Can't say I especially like this but I'm out of other ideas for how
> >> to guarantee a solution. Users can't set env vars prior to launching
> >> QEMU user emulators when using binfmt.
> >
> > An alternative is to not use GSlice between fork/exec. I'm
> > not sure if within that region there are other users besides
> > GTree (GArray perhaps?), but if there aren't, then just using
> > a different binary tree implementation should do.
> 
> Hmm my distros version of GArray certainly does and that is used quite
> heavily across gdbstub and plugins.

Then we might have to also import a GSlice-free GArray ("QArray").
Currently we just deadlock on POSIX-compliant code, which is
unacceptable.

> > Untested patches using ccan's AVL tree: 
> >   https://github.com/cota/qemu/commits/avl
> >
> > Would that be more palatable?
> 
> I think generally we wouldn't want to have multiple implementations
> unless there was a definite benefit (c.f. QHT). That said I think
> Richard's latest optimisation work:
> 
>   Subject: [PATCH v2 0/7] accel/tcg: Rewrite user-only vma tracking
>   Date: Thu, 27 Oct 2022 22:12:51 +1100
>   Message-Id: <20221027111258.348196-1-richard.henderson@linaro.org>
> 
> brings in the kernel's interval tree (with unit tests). I wonder if the
> page_collection use of GTree could be converted to that?

Thanks. I looked into reusing this but I don't think it's a drop-in
replacement for GTree.

> I don't know how you would defend against re-introducing it into
> linux-user though aside from commentary.

To close the loop:
I've sent a patch series that imports GTree-sans-GSlice as QTree,
and uses that for TCG:
  https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02080.html

Thanks,
		Emilio
Re: [PATCH] linux-user,bsd-user: re-exec with G_SLICE=always-malloc
Posted by Richard Henderson 1 year, 6 months ago
On 10/4/22 05:00, Daniel P. Berrangé wrote:
> g_slice uses a one-time initializer to check the G_SLICE env variable
> making it hard for QEMU to set the env before any GLib API call has
> triggered the initializer. Even attribute((constructor)) is not
> sufficient as QEMU has many constructors and there is no ordering
> guarantee between them.

There are orderings for constructors, see __attribute__((constructor(priority))).


r~

Re: [PATCH] linux-user,bsd-user: re-exec with G_SLICE=always-malloc
Posted by Daniel P. Berrangé 1 year, 6 months ago
On Tue, Oct 04, 2022 at 07:59:18AM -0700, Richard Henderson wrote:
> On 10/4/22 05:00, Daniel P. Berrangé wrote:
> > g_slice uses a one-time initializer to check the G_SLICE env variable
> > making it hard for QEMU to set the env before any GLib API call has
> > triggered the initializer. Even attribute((constructor)) is not
> > sufficient as QEMU has many constructors and there is no ordering
> > guarantee between them.
> 
> There are orderings for constructors, see __attribute__((constructor(priority))).

Oh, thanks for pointing that out. I tried it, but glib threw 
a bag of rocks at me ;-P

The priority works for ordering within the scope of the binary
containing the constructor.

libglib.so itself has a constructor function registered, and that
calls APIs that trigger GSlice initialization:

#0  g_slice_init_nomessage () at ../glib/gslice.c:443
#1  0x00007ffff7ecbd6e in thread_memory_from_self () at ../glib/gslice.c:560
#2  thread_memory_from_self () at ../glib/gslice.c:550
#3  g_slice_alloc (mem_size=96) at ../glib/gslice.c:1050
#4  0x00007ffff7e9ca02 in g_hash_table_new_full (hash_func=0x7ffff7e91a00 <g_str_hash>, 
    key_equal_func=0x7ffff7e91a70 <g_str_equal>, key_destroy_func=0x0, value_destroy_func=0x0)
    at ../glib/ghash.c:1071
#5  0x00007ffff7ebf313 in g_quark_init () at ../glib/gquark.c:61
#6  0x00007ffff7e77d79 in glib_init () at ../glib/glib-init.c:339
#7  glib_init () at ../glib/glib-init.c:328
#8  glib_init_ctor () at ../glib/glib-init.c:453
#9  0x00007ffff7fcbf7e in call_init (env=0x7fffffffdb80, argv=0x7fffffffdb68, argc=2, l=<optimized out>)
    at dl-init.c:70
#10 call_init (l=<optimized out>, argc=2, argv=0x7fffffffdb68, env=0x7fffffffdb80) at dl-init.c:26
#11 0x00007ffff7fcc06c in _dl_init (main_map=0x7ffff7ffe2a0, argc=2, argv=0x7fffffffdb68, env=0x7fffffffdb80)
    at dl-init.c:117
#12 0x00007ffff7fe3d4a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2


This all takes place when libglib.so is loaded, which happens prior
to any code in QEMU being loaded / executed. So no constructor in
QEMU code can ever pre-empt this in dynamic builds.


The only possible silver linining is that in static linked builds,
it appears that a QEMU constructor with priority 101, will pre-empt
the constructor from any library. This is kind of crazy, as it means
if any library or app code uses priorities, it'll get totally different
execution ordering depending on whether it is dynamic or statically
built. 

I guess we could rely on this hack if we declare that everyone using
binfmt is probably relying on static linked QEMU, and in non-binfmt
cases people can set the env var themselves.  It still feels pretty
dirty.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH] linux-user,bsd-user: re-exec with G_SLICE=always-malloc
Posted by Richard Henderson 1 year, 6 months ago
On 10/6/22 11:12, Daniel P. Berrangé wrote:
> On Tue, Oct 04, 2022 at 07:59:18AM -0700, Richard Henderson wrote:
>> On 10/4/22 05:00, Daniel P. Berrangé wrote:
>>> g_slice uses a one-time initializer to check the G_SLICE env variable
>>> making it hard for QEMU to set the env before any GLib API call has
>>> triggered the initializer. Even attribute((constructor)) is not
>>> sufficient as QEMU has many constructors and there is no ordering
>>> guarantee between them.
>>
>> There are orderings for constructors, see __attribute__((constructor(priority))).
> 
> Oh, thanks for pointing that out. I tried it, but glib threw
> a bag of rocks at me ;-P
> 
> The priority works for ordering within the scope of the binary
> containing the constructor.

Yes.

> 
> libglib.so itself has a constructor function registered, and that
> calls APIs that trigger GSlice initialization:

Ah.  I had been hoping that gslice would be initialized on first use, so as long as we 
could get the setenv done before any other qemu code ran, we'd be fine.

> This all takes place when libglib.so is loaded, which happens prior
> to any code in QEMU being loaded / executed. So no constructor in
> QEMU code can ever pre-empt this in dynamic builds.

Shared libraries have a defined initialization order too, but we'd have to play real 
irritating games to make this happen, installing a shared library of our own (linked later 
in the sequence to qemu, and itself *not* linked to libglib.so).  Not worth it.

> The only possible silver linining is that in static linked builds,
> it appears that a QEMU constructor with priority 101, will pre-empt
> the constructor from any library. This is kind of crazy, as it means
> if any library or app code uses priorities, it'll get totally different
> execution ordering depending on whether it is dynamic or statically
> built.

Plausible...

> I guess we could rely on this hack if we declare that everyone using
> binfmt is probably relying on static linked QEMU, and in non-binfmt
> cases people can set the env var themselves.  It still feels pretty
> dirty.

... but as you say, dirty.

Alternately, report it as a bug to glib, because we can't be the only project impacted by 
this.


r~

Re: [PATCH] linux-user,bsd-user: re-exec with G_SLICE=always-malloc
Posted by Kyle Evans 1 year, 6 months ago
On Thu, Oct 6, 2022 at 11:29 AM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 10/6/22 11:12, Daniel P. Berrangé wrote:
> > The only possible silver linining is that in static linked builds,
> > it appears that a QEMU constructor with priority 101, will pre-empt
> > the constructor from any library. This is kind of crazy, as it means
> > if any library or app code uses priorities, it'll get totally different
> > execution ordering depending on whether it is dynamic or statically
> > built.
>
> Plausible...
>

> > I guess we could rely on this hack if we declare that everyone using
> > binfmt is probably relying on static linked QEMU, and in non-binfmt
> > cases people can set the env var themselves.  It still feels pretty
> > dirty.
>
> ... but as you say, dirty.

FWIW, on FreeBSD at least, we don't support dynamically linked
bsd-user and I'd go as far as to say that we have no desire to change
that in the future, either -- the benefits are simply not there to
outweigh the pain for our use-cases.

Thanks,

Kyle Evans
Re: [PATCH] linux-user,bsd-user: re-exec with G_SLICE=always-malloc
Posted by Peter Maydell 1 year, 6 months ago
On Tue, 4 Oct 2022 at 13:00, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> The g_slice custom allocator is not async signal safe with its
> mutexes. When a multithreaded program running in the qemu user
> emulator forks, it can end up deadlocking in the g_slice
> allocator
>
>   Thread 1:
>   #0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
>   #1 0x00007f54e190c77c in g_mutex_lock_slowpath (mutex=mutex@entry=0x7f54e1dc7600 <allocator+96>) at ../glib/gthread-posix.c:1462
>   #2 0x00007f54e190d222 in g_mutex_lock (mutex=mutex@entry=0x7f54e1dc7600 <allocator+96>) at ../glib/gthread-posix.c:1486
>   #3 0x00007f54e18e39f2 in magazine_cache_pop_magazine (countp=0x7f54280e6638, ix=2) at ../glib/gslice.c:769
>   #4 thread_memory_magazine1_reload (ix=2, tmem=0x7f54280e6600) at ../glib/gslice.c:845
>   #5 g_slice_alloc (mem_size=mem_size@entry=40) at ../glib/gslice.c:1058
>   #6 0x00007f54e18f06fa in g_tree_node_new (value=0x7f54d4066540 <code_gen_buffer+419091>, key=0x7f54d4066560 <code_gen_buffer+419123>) at ../glib/gtree.c:517
>   #7 g_tree_insert_internal (tree=0x555556aed800, key=0x7f54d4066560 <code_gen_buffer+419123>, value=0x7f54d4066540 <code_gen_buffer+419091>, replace=0) at ../glib/gtree.c:517
>   #8 0x00007f54e186b755 in tcg_tb_insert (tb=0x7f54d4066540 <code_gen_buffer+419091>) at ../tcg/tcg.c:534
>   #9 0x00007f54e1820545 in tb_gen_code (cpu=0x7f54980b4b60, pc=274906407438, cs_base=0, flags=24832, cflags=-16252928) at ../accel/tcg/translate-all.c:2118
>   #10 0x00007f54e18034a5 in tb_find (cpu=0x7f54980b4b60, last_tb=0x7f54d4066440 <code_gen_buffer+418835>, tb_exit=0, cf_mask=524288) at ../accel/tcg/cpu-exec.c:462
>   #11 0x00007f54e1803bd9 in cpu_exec (cpu=0x7f54980b4b60) at ../accel/tcg/cpu-exec.c:818
>   #12 0x00007f54e1735a4c in cpu_loop (env=0x7f54980bce40) at ../linux-user/riscv/cpu_loop.c:37
>   #13 0x00007f54e1844b22 in clone_func (arg=0x7f5402f3b080) at ../linux-user/syscall.c:6422
>   #14 0x00007f54e191950a in start_thread (arg=<optimized out>) at pthread_create.c:477
>   #15 0x00007f54e19a52a3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> The only known workaround for this problem is to disable the g_slice
> custom allocator, in favor of system malloc which is believed to be
> async signal safe on all platforms QEMU officially targets.
>
> g_slice uses a one-time initializer to check the G_SLICE env variable
> making it hard for QEMU to set the env before any GLib API call has
> triggered the initializer. Even attribute((constructor)) is not
> sufficient as QEMU has many constructors and there is no ordering
> guarantee between them.
>
> This patch attempts to workaround this by re-exec()ing the QEMU user
> emulators if the G_SLICE env variable is not already set. This means
> the env variable will be inherited down the process tree spawned
> from there onwards. There is a possibility this could have unexpected
> consequences, but this has to be balanced against the real known
> problem of QEMU user emulators randomly deadlocking.
>
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/285
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>
> Can't say I especially like this but I'm out of other ideas for how
> to guarantee a solution. Users can't set env vars prior to launching
> QEMU user emulators when using binfmt.
>
> NB, I tested the linux-user impl and it stops the hangs in my
> testing. I've not even compiled tested the bsd-user impl, just
> blindly copied the linux-user code.

I suspect a simple re-exec won't play nicely with all the possible
ways you can use binfmt-misc...

-- PMM