Commit bb6cf6f0168 ("accel/tcg: Factor tcg_cpu_reset_hold()
out") wanted to restrict tlb_flush() to system emulation,
but inadvertently also restricted tcg_flush_jmp_cache(),
which was before called on user emulation via:
Realize -> Reset -> cpu_common_reset_hold()
Since threads (vCPUs) use a common CPUJumpCache, when many
threads are created / joined, they eventually end re-using
a CPUJumpCache entry, which was cleared when the first vCPU
was allocated (via Realize) but then stayed dirty, leading to:
Thread 1 "qemu-s390x" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007ffff7c41e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 0x00007ffff7bf2fb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007ffff7bdd472 in __GI_abort () at ./stdlib/abort.c:79
#4 0x00007ffff7bdd395 in __assert_fail_base (fmt=0x7ffff7d51a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5555556d71b8 "cpu->accel",
file=file@entry=0x5555556d70e0 "cpu-target.c", line=line@entry=158, function=function@entry=0x5555556d7260 <__PRETTY_FUNCTION__.3> "cpu_exec_realizefn") at ./assert/assert.c:92
#5 0x00007ffff7bebeb2 in __GI___assert_fail (assertion=assertion@entry=0x5555556d71b8 "cpu->accel", file=file@entry=0x5555556d70e0 "cpu-target.c", line=line@entry=158,
function=function@entry=0x5555556d7260 <__PRETTY_FUNCTION__.3> "cpu_exec_realizefn") at ./assert/assert.c:101
#6 0x00005555555d44ca in cpu_exec_realizefn (cpu=cpu@entry=0x5555557c28c0, errp=errp@entry=0x7fffffffe140) at cpu-target.c:158
#7 0x000055555559f50b in s390_cpu_realizefn (dev=0x5555557c28c0, errp=0x7fffffffe1a0) at target/s390x/cpu.c:261
#8 0x000055555563f78b in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7fffffffe2e0) at hw/core/qdev.c:510
#9 0x000055555564365d in property_set_bool (obj=0x5555557c28c0, v=<optimized out>, name=<optimized out>, opaque=0x5555557a9140, errp=0x7fffffffe2e0) at qom/object.c:2362
#10 0x0000555555646bbb in object_property_set (obj=obj@entry=0x5555557c28c0, name=name@entry=0x5555556e8ae2 "realized", v=v@entry=0x5555557c6650, errp=errp@entry=0x7fffffffe2e0)
at qom/object.c:1471
#11 0x000055555564a45f in object_property_set_qobject (obj=obj@entry=0x5555557c28c0, name=name@entry=0x5555556e8ae2 "realized", value=value@entry=0x5555557a7a90, errp=errp@entry=0x7fffffffe2e0)
at qom/qom-qobject.c:28
#12 0x0000555555647224 in object_property_set_bool (obj=0x5555557c28c0, name=name@entry=0x5555556e8ae2 "realized", value=value@entry=true, errp=errp@entry=0x7fffffffe2e0)
at qom/object.c:1541
#13 0x000055555564027c in qdev_realize (dev=<optimized out>, bus=bus@entry=0x0, errp=errp@entry=0x7fffffffe2e0) at hw/core/qdev.c:291
#14 0x000055555559bb54 in cpu_create (typename=<optimized out>) at hw/core/cpu-common.c:57
#15 0x000055555559a467 in main (argc=4, argv=0x7fffffffeaa8, envp=<optimized out>) at linux-user/main.c:811
Have cpu_exec_reset_hold() call the common tcg_exec_reset()
helper on user emulation, eventually calling tcg_flush_jmp_cache().
Fixes: bb6cf6f0168 ("accel/tcg: Factor tcg_cpu_reset_hold() out")
Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
accel/tcg/user-exec-stub.c | 4 ----
accel/tcg/user-exec.c | 5 +++++
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/accel/tcg/user-exec-stub.c b/accel/tcg/user-exec-stub.c
index 4fbe2dbdc88..2dc6fd9c4e8 100644
--- a/accel/tcg/user-exec-stub.c
+++ b/accel/tcg/user-exec-stub.c
@@ -14,10 +14,6 @@ void qemu_init_vcpu(CPUState *cpu)
{
}
-void cpu_exec_reset_hold(CPUState *cpu)
-{
-}
-
/* User mode emulation does not support record/replay yet. */
bool replay_exception(void)
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 0561c4f6dc7..92640f07ed7 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -40,6 +40,11 @@ __thread uintptr_t helper_retaddr;
//#define DEBUG_SIGNAL
+void cpu_exec_reset_hold(CPUState *cpu)
+{
+ tcg_exec_reset(cpu);
+}
+
void cpu_interrupt(CPUState *cpu, int mask)
{
g_assert(bql_locked());
--
2.47.1
On 1/2/25 10:25, Philippe Mathieu-Daudé wrote:
> Commit bb6cf6f0168 ("accel/tcg: Factor tcg_cpu_reset_hold()
> out") wanted to restrict tlb_flush() to system emulation,
> but inadvertently also restricted tcg_flush_jmp_cache(),
> which was before called on user emulation via:
>
> Realize -> Reset -> cpu_common_reset_hold()
>
> Since threads (vCPUs) use a common CPUJumpCache, when many
> threads are created / joined, they eventually end re-using
> a CPUJumpCache entry, which was cleared when the first vCPU
> was allocated (via Realize) but then stayed dirty, leading to:
>
> Thread 1 "qemu-s390x" received signal SIGABRT, Aborted.
> __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> 44 ./nptl/pthread_kill.c: No such file or directory.
> (gdb) bt
> #0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
> #1 0x00007ffff7c41e8f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
> #2 0x00007ffff7bf2fb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> #3 0x00007ffff7bdd472 in __GI_abort () at ./stdlib/abort.c:79
> #4 0x00007ffff7bdd395 in __assert_fail_base (fmt=0x7ffff7d51a90 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5555556d71b8 "cpu->accel",
> file=file@entry=0x5555556d70e0 "cpu-target.c", line=line@entry=158, function=function@entry=0x5555556d7260 <__PRETTY_FUNCTION__.3> "cpu_exec_realizefn") at ./assert/assert.c:92
> #5 0x00007ffff7bebeb2 in __GI___assert_fail (assertion=assertion@entry=0x5555556d71b8 "cpu->accel", file=file@entry=0x5555556d70e0 "cpu-target.c", line=line@entry=158,
> function=function@entry=0x5555556d7260 <__PRETTY_FUNCTION__.3> "cpu_exec_realizefn") at ./assert/assert.c:101
> #6 0x00005555555d44ca in cpu_exec_realizefn (cpu=cpu@entry=0x5555557c28c0, errp=errp@entry=0x7fffffffe140) at cpu-target.c:158
> #7 0x000055555559f50b in s390_cpu_realizefn (dev=0x5555557c28c0, errp=0x7fffffffe1a0) at target/s390x/cpu.c:261
> #8 0x000055555563f78b in device_set_realized (obj=<optimized out>, value=<optimized out>, errp=0x7fffffffe2e0) at hw/core/qdev.c:510
> #9 0x000055555564365d in property_set_bool (obj=0x5555557c28c0, v=<optimized out>, name=<optimized out>, opaque=0x5555557a9140, errp=0x7fffffffe2e0) at qom/object.c:2362
> #10 0x0000555555646bbb in object_property_set (obj=obj@entry=0x5555557c28c0, name=name@entry=0x5555556e8ae2 "realized", v=v@entry=0x5555557c6650, errp=errp@entry=0x7fffffffe2e0)
> at qom/object.c:1471
> #11 0x000055555564a45f in object_property_set_qobject (obj=obj@entry=0x5555557c28c0, name=name@entry=0x5555556e8ae2 "realized", value=value@entry=0x5555557a7a90, errp=errp@entry=0x7fffffffe2e0)
> at qom/qom-qobject.c:28
> #12 0x0000555555647224 in object_property_set_bool (obj=0x5555557c28c0, name=name@entry=0x5555556e8ae2 "realized", value=value@entry=true, errp=errp@entry=0x7fffffffe2e0)
> at qom/object.c:1541
> #13 0x000055555564027c in qdev_realize (dev=<optimized out>, bus=bus@entry=0x0, errp=errp@entry=0x7fffffffe2e0) at hw/core/qdev.c:291
> #14 0x000055555559bb54 in cpu_create (typename=<optimized out>) at hw/core/cpu-common.c:57
> #15 0x000055555559a467 in main (argc=4, argv=0x7fffffffeaa8, envp=<optimized out>) at linux-user/main.c:811
>
> Have cpu_exec_reset_hold() call the common tcg_exec_reset()
> helper on user emulation, eventually calling tcg_flush_jmp_cache().
>
> Fixes: bb6cf6f0168 ("accel/tcg: Factor tcg_cpu_reset_hold() out")
> Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> ---
> accel/tcg/user-exec-stub.c | 4 ----
> accel/tcg/user-exec.c | 5 +++++
> 2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/accel/tcg/user-exec-stub.c b/accel/tcg/user-exec-stub.c
> index 4fbe2dbdc88..2dc6fd9c4e8 100644
> --- a/accel/tcg/user-exec-stub.c
> +++ b/accel/tcg/user-exec-stub.c
> @@ -14,10 +14,6 @@ void qemu_init_vcpu(CPUState *cpu)
> {
> }
>
> -void cpu_exec_reset_hold(CPUState *cpu)
> -{
> -}
> -
> /* User mode emulation does not support record/replay yet. */
>
> bool replay_exception(void)
> diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
> index 0561c4f6dc7..92640f07ed7 100644
> --- a/accel/tcg/user-exec.c
> +++ b/accel/tcg/user-exec.c
> @@ -40,6 +40,11 @@ __thread uintptr_t helper_retaddr;
>
> //#define DEBUG_SIGNAL
>
> +void cpu_exec_reset_hold(CPUState *cpu)
> +{
> + tcg_exec_reset(cpu);
> +}
> +
> void cpu_interrupt(CPUState *cpu, int mask)
> {
> g_assert(bql_locked());
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
On Thu, 2025-01-02 at 19:25 +0100, Philippe Mathieu-Daudé wrote:
> Commit bb6cf6f0168 ("accel/tcg: Factor tcg_cpu_reset_hold()
> out") wanted to restrict tlb_flush() to system emulation,
> but inadvertently also restricted tcg_flush_jmp_cache(),
> which was before called on user emulation via:
>
> Realize -> Reset -> cpu_common_reset_hold()
>
> Since threads (vCPUs) use a common CPUJumpCache, when many
> threads are created / joined, they eventually end re-using
> a CPUJumpCache entry, which was cleared when the first vCPU
> was allocated (via Realize) but then stayed dirty, leading to:
How are jump caches shared between qemu-user vCPUs?
I found the following, but this looks private and zeroed out
during initialization:
bool tcg_exec_realizefn(CPUState *cpu, Error **errp)
[...]
cpu->tb_jmp_cache = g_new0(CPUJumpCache, 1);
I was also wondering whether vCPUs themselves may be recycled, but
it doesn't seem to be the case, since do_fork() -> cpu_copy() ->
cpu_create() -> object_new() -> object_new_with_type() calls
g_malloc().
Btw, I tried to reproduce the original issue, but bumped into something
seemingly unrelated. To make matters worse, debugging seems to be
broken, so it may take some time before I can properly test this
change.
Thread 2 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 37607.37622]
0x000002aa00a6a64c in cs_option (ud=140251083477344,
type=CS_OPT_SYNTAX, value=2) at capstone/cs.c:782
782 return arch_configs[handle->arch].arch_option(handle,
type, value);
(gdb) info threads
Ignoring packet error, continuing...
On Fri, 2025-01-10 at 00:43 +0100, Ilya Leoshkevich wrote:
> On Thu, 2025-01-02 at 19:25 +0100, Philippe Mathieu-Daudé wrote:
> > Commit bb6cf6f0168 ("accel/tcg: Factor tcg_cpu_reset_hold()
> > out") wanted to restrict tlb_flush() to system emulation,
> > but inadvertently also restricted tcg_flush_jmp_cache(),
> > which was before called on user emulation via:
> >
> > Realize -> Reset -> cpu_common_reset_hold()
> >
> > Since threads (vCPUs) use a common CPUJumpCache, when many
> > threads are created / joined, they eventually end re-using
> > a CPUJumpCache entry, which was cleared when the first vCPU
> > was allocated (via Realize) but then stayed dirty, leading to:
>
> How are jump caches shared between qemu-user vCPUs?
> I found the following, but this looks private and zeroed out
> during initialization:
>
> bool tcg_exec_realizefn(CPUState *cpu, Error **errp)
> [...]
> cpu->tb_jmp_cache = g_new0(CPUJumpCache, 1);
>
> I was also wondering whether vCPUs themselves may be recycled, but
> it doesn't seem to be the case, since do_fork() -> cpu_copy() ->
> cpu_create() -> object_new() -> object_new_with_type() calls
> g_malloc().
>
>
>
> Btw, I tried to reproduce the original issue, but bumped into
> something
> seemingly unrelated. To make matters worse, debugging seems to be
> broken, so it may take some time before I can properly test this
> change.
>
> Thread 2 received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 37607.37622]
> 0x000002aa00a6a64c in cs_option (ud=140251083477344,
> type=CS_OPT_SYNTAX, value=2) at capstone/cs.c:782
> 782 return arch_configs[handle->arch].arch_option(handle,
> type, value);
> (gdb) info threads
> Ignoring packet error, continuing...
With a small debugger fix [1] I finally managed to investigate and fix
the crash, which turned out to be not caused by QEMU [2], and with that
the testsuite ran without further issues. So I don't seem to be able to
reproduce the original issue to verify this series.
[1]
https://lore.kernel.org/qemu-devel/20250113134658.68376-1-iii@linux.ibm.com/
[2] https://github.com/capstone-rust/capstone-rs/pull/166
© 2016 - 2026 Red Hat, Inc.