linux-user/main.c | 2 ++ linux-user/syscall.c | 14 ++++++++++++++ linux-user/user-internals.h | 2 ++ 3 files changed, 18 insertions(+)
By the spec, fork() copies only the thread which executes it.
So it may happen, what while one thread is doing a fork,
another thread is holding `clone_lock` mutex
(e.g. doing a `fork()` or `exit()`).
So the child process is born with the mutex being held,
and there are nobody to release it.
As the thread executing do_syscall() is not considered running,
start_exclusive() does not protect us from the case.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3226
Signed-off-by: Aleksandr Sergeev <sergeev0xef@gmail.com>
---
linux-user/main.c | 2 ++
linux-user/syscall.c | 14 ++++++++++++++
linux-user/user-internals.h | 2 ++
3 files changed, 18 insertions(+)
diff --git a/linux-user/main.c b/linux-user/main.c
index db751c07576..c49d1e91d22 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -145,6 +145,7 @@ unsigned long guest_stack_size = TARGET_DEFAULT_STACK_SIZE;
void fork_start(void)
{
start_exclusive();
+ clone_fork_start();
mmap_fork_start();
cpu_list_lock();
qemu_plugin_user_prefork_lock();
@@ -174,6 +175,7 @@ void fork_end(pid_t pid)
cpu_list_unlock();
}
gdbserver_fork_end(thread_cpu, pid);
+ clone_fork_end(child);
/*
* qemu_init_cpu_list() reinitialized the child exclusive state, but we
* also need to keep current_cpu consistent, so call end_exclusive() for
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 3944004568f..e64e5db1139 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6856,6 +6856,20 @@ static void *clone_func(void *arg)
return NULL;
}
+void clone_fork_start(void)
+{
+ pthread_mutex_lock(&clone_lock);
+}
+
+void clone_fork_end(bool child)
+{
+ if (child) {
+ pthread_mutex_init(&clone_lock, NULL);
+ } else {
+ pthread_mutex_unlock(&clone_lock);
+ }
+}
+
/* do_fork() Must return host values and target errnos (unlike most
do_*() functions). */
static int do_fork(CPUArchState *env, unsigned int flags, abi_ulong newsp,
diff --git a/linux-user/user-internals.h b/linux-user/user-internals.h
index 067c02bb93e..24d35998f07 100644
--- a/linux-user/user-internals.h
+++ b/linux-user/user-internals.h
@@ -69,6 +69,8 @@ abi_long get_errno(abi_long ret);
const char *target_strerror(int err);
int get_osversion(void);
void init_qemu_uname_release(void);
+void clone_fork_start(void);
+void clone_fork_end(bool child);
void fork_start(void);
void fork_end(pid_t pid);
--
2.52.0
On 1/27/26 01:16, Aleksandr Sergeev wrote: > By the spec, fork() copies only the thread which executes it. > So it may happen, what while one thread is doing a fork, > another thread is holding `clone_lock` mutex > (e.g. doing a `fork()` or `exit()`). > So the child process is born with the mutex being held, > and there are nobody to release it. > > As the thread executing do_syscall() is not considered running, > start_exclusive() does not protect us from the case. > > Resolves:https://gitlab.com/qemu-project/qemu/-/issues/3226 > Signed-off-by: Aleksandr Sergeev<sergeev0xef@gmail.com> > --- > linux-user/main.c | 2 ++ > linux-user/syscall.c | 14 ++++++++++++++ > linux-user/user-internals.h | 2 ++ > 3 files changed, 18 insertions(+) Reviewed-by: Richard Henderson <richard.henderson@linaro.org> r~
On 2/3/26 12:42, Richard Henderson wrote: > On 1/27/26 01:16, Aleksandr Sergeev wrote: >> By the spec, fork() copies only the thread which executes it. >> So it may happen, what while one thread is doing a fork, >> another thread is holding `clone_lock` mutex >> (e.g. doing a `fork()` or `exit()`). >> So the child process is born with the mutex being held, >> and there are nobody to release it. >> >> As the thread executing do_syscall() is not considered running, >> start_exclusive() does not protect us from the case. >> >> Resolves:https://gitlab.com/qemu-project/qemu/-/issues/3226 >> Signed-off-by: Aleksandr Sergeev<sergeev0xef@gmail.com> >> --- >> linux-user/main.c | 2 ++ >> linux-user/syscall.c | 14 ++++++++++++++ >> linux-user/user-internals.h | 2 ++ >> 3 files changed, 18 insertions(+) > > Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Queued, thanks. r~
Aleksandr Sergeev <sergeev0xef@gmail.com> writes:
(Adding Paolo/Richard to Cc for exclusive discussion)
> By the spec, fork() copies only the thread which executes it.
> So it may happen, what while one thread is doing a fork,
> another thread is holding `clone_lock` mutex
> (e.g. doing a `fork()` or `exit()`).
> So the child process is born with the mutex being held,
> and there are nobody to release it.
>
> As the thread executing do_syscall() is not considered running,
> start_exclusive() does not protect us from the case.
I see the logic of what you say as:
/* Write pending_cpus before reading other_cpu->running. */
smp_mb();
running_cpus = 0;
CPU_FOREACH(other_cpu) {
if (qatomic_read(&other_cpu->running)) {
other_cpu->has_waiter = true;
running_cpus++;
qemu_cpu_kick(other_cpu);
}
}
but...
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3226
> Signed-off-by: Aleksandr Sergeev <sergeev0xef@gmail.com>
> ---
> linux-user/main.c | 2 ++
> linux-user/syscall.c | 14 ++++++++++++++
> linux-user/user-internals.h | 2 ++
> 3 files changed, 18 insertions(+)
>
> diff --git a/linux-user/main.c b/linux-user/main.c
> index db751c07576..c49d1e91d22 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -145,6 +145,7 @@ unsigned long guest_stack_size = TARGET_DEFAULT_STACK_SIZE;
> void fork_start(void)
> {
> start_exclusive();
> + clone_fork_start();
> mmap_fork_start();
> cpu_list_lock();
> qemu_plugin_user_prefork_lock();
> @@ -174,6 +175,7 @@ void fork_end(pid_t pid)
> cpu_list_unlock();
> }
> gdbserver_fork_end(thread_cpu, pid);
> + clone_fork_end(child);
> /*
> * qemu_init_cpu_list() reinitialized the child exclusive state, but we
> * also need to keep current_cpu consistent, so call end_exclusive() for
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 3944004568f..e64e5db1139 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -6856,6 +6856,20 @@ static void *clone_func(void *arg)
> return NULL;
> }
>
> +void clone_fork_start(void)
> +{
> + pthread_mutex_lock(&clone_lock);
> +}
> +
> +void clone_fork_end(bool child)
> +{
> + if (child) {
> + pthread_mutex_init(&clone_lock, NULL);
> + } else {
> + pthread_mutex_unlock(&clone_lock);
> + }
> +}
> +
This looks like it should just be open-coded in the:
if (flags & CLONE_VM) {
}
leg of do_fork()?
Or maybe I'm missing the subtly of the race condition?
> /* do_fork() Must return host values and target errnos (unlike most
> do_*() functions). */
> static int do_fork(CPUArchState *env, unsigned int flags, abi_ulong newsp,
> diff --git a/linux-user/user-internals.h b/linux-user/user-internals.h
> index 067c02bb93e..24d35998f07 100644
> --- a/linux-user/user-internals.h
> +++ b/linux-user/user-internals.h
> @@ -69,6 +69,8 @@ abi_long get_errno(abi_long ret);
> const char *target_strerror(int err);
> int get_osversion(void);
> void init_qemu_uname_release(void);
> +void clone_fork_start(void);
> +void clone_fork_end(bool child);
> void fork_start(void);
> void fork_end(pid_t pid);
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
On Mon, 26 Jan 2026 at 16:02, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Aleksandr Sergeev <sergeev0xef@gmail.com> writes:
>
> (Adding Paolo/Richard to Cc for exclusive discussion)
>
> > By the spec, fork() copies only the thread which executes it.
> > So it may happen, what while one thread is doing a fork,
> > another thread is holding `clone_lock` mutex
> > (e.g. doing a `fork()` or `exit()`).
> > So the child process is born with the mutex being held,
> > and there are nobody to release it.
> >
> > As the thread executing do_syscall() is not considered running,
> > start_exclusive() does not protect us from the case.
> >
> > +void clone_fork_start(void)
> > +{
> > + pthread_mutex_lock(&clone_lock);
> > +}
> > +
> > +void clone_fork_end(bool child)
> > +{
> > + if (child) {
> > + pthread_mutex_init(&clone_lock, NULL);
> > + } else {
> > + pthread_mutex_unlock(&clone_lock);
> > + }
> > +}
> > +
>
> This looks like it should just be open-coded in the:
>
> if (flags & CLONE_VM) {
>
> }
>
> leg of do_fork()?
>
> Or maybe I'm missing the subtly of the race condition?
It's a general thing with fork() -- we need to make sure
we hold all the mutexes in QEMU before forking, so that when
we do the fork we know the state of the data structures they
are protecting. Then after the fork the parent unlocks the
mutexes again and the child can re-init the mutex.
We don't want to open-code this because there are quite a
lot of things to handle and do_fork() isalready pretty long
and complicated. Instead we have the fork_start() and fork_end()
functions which sequence all the things that need to be handled
across the fork, including the mutex lock/unlock stuff.
I think this patch is doing it the right way, by adding this
mutex to the set of things to be handled across fork. Compare
mmap_fork_start() and mmap_fork_end() which do this for the
mmap_mutex.
-- PMM
© 2016 - 2026 Red Hat, Inc.