[PULL 1/3] linux-user: un-parent OBJECT(cpu) when closing thread

Alex Bennée posted 3 patches 3 years, 5 months ago
Maintainers: Laurent Vivier <laurent@vivier.eu>, Cleber Rosa <crosa@redhat.com>, "Philippe Mathieu-Daudé" <f4bug@amsat.org>, Wainer dos Santos Moschetta <wainersm@redhat.com>, Beraldo Leal <bleal@redhat.com>
[PULL 1/3] linux-user: un-parent OBJECT(cpu) when closing thread
Posted by Alex Bennée 3 years, 5 months ago
While forcing the CPU to unrealize by hand does trigger the clean-up
code we never fully free resources because refcount never reaches
zero. This is because QOM automatically added objects without an
explicit parent to /unattached/, incrementing the refcount.

Instead of manually triggering unrealization just unparent the object
and let the device machinery deal with that for us.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/866
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220811151413.3350684-2-alex.bennee@linaro.org>

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index f409121202..bfdd60136b 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8594,7 +8594,13 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
         if (CPU_NEXT(first_cpu)) {
             TaskState *ts = cpu->opaque;
 
-            object_property_set_bool(OBJECT(cpu), "realized", false, NULL);
+            if (ts->child_tidptr) {
+                put_user_u32(0, ts->child_tidptr);
+                do_sys_futex(g2h(cpu, ts->child_tidptr),
+                             FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
+            }
+
+            object_unparent(OBJECT(cpu));
             object_unref(OBJECT(cpu));
             /*
              * At this point the CPU should be unrealized and removed
@@ -8604,11 +8610,6 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 
             pthread_mutex_unlock(&clone_lock);
 
-            if (ts->child_tidptr) {
-                put_user_u32(0, ts->child_tidptr);
-                do_sys_futex(g2h(cpu, ts->child_tidptr),
-                             FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
-            }
             thread_cpu = NULL;
             g_free(ts);
             rcu_unregister_thread();
-- 
2.30.2


Re: [PULL 1/3] linux-user: un-parent OBJECT(cpu) when closing thread
Posted by Richard Henderson 3 years, 5 months ago
On 8/16/22 05:26, Alex Bennée wrote:
> While forcing the CPU to unrealize by hand does trigger the clean-up
> code we never fully free resources because refcount never reaches
> zero. This is because QOM automatically added objects without an
> explicit parent to /unattached/, incrementing the refcount.
> 
> Instead of manually triggering unrealization just unparent the object
> and let the device machinery deal with that for us.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/866
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Laurent Vivier <laurent@vivier.eu>
> Message-Id: <20220811151413.3350684-2-alex.bennee@linaro.org>
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index f409121202..bfdd60136b 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -8594,7 +8594,13 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
>           if (CPU_NEXT(first_cpu)) {
>               TaskState *ts = cpu->opaque;
>   
> -            object_property_set_bool(OBJECT(cpu), "realized", false, NULL);
> +            if (ts->child_tidptr) {
> +                put_user_u32(0, ts->child_tidptr);
> +                do_sys_futex(g2h(cpu, ts->child_tidptr),
> +                             FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
> +            }
> +
> +            object_unparent(OBJECT(cpu));

This has caused a regression in arm/aarch64.

We hard-code ARMCPRegInfo pointers into TranslationBlocks, for calling into 
helper_{get,set}cp_reg{,64}.  So we have a race condition between whichever cpu thread 
translates the code first (encoding the pointer), and that cpu thread exiting, so that the 
next execution of the TB references a freed data structure.

We shouldn't have N copies of these pointers in the first place.  This seems like 
something that ought to be placed on the ARMCPUClass, so that it could be shared by each cpu.

But that's going to be a complex fix, so I'm reverting this for rc4.


r~

Re: [PULL 1/3] linux-user: un-parent OBJECT(cpu) when closing thread
Posted by Alex Bennée 3 years, 5 months ago
Richard Henderson <richard.henderson@linaro.org> writes:

> On 8/16/22 05:26, Alex Bennée wrote:
>> While forcing the CPU to unrealize by hand does trigger the clean-up
>> code we never fully free resources because refcount never reaches
>> zero. This is because QOM automatically added objects without an
>> explicit parent to /unattached/, incrementing the refcount.
>> Instead of manually triggering unrealization just unparent the
>> object
>> and let the device machinery deal with that for us.
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/866
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> Reviewed-by: Laurent Vivier <laurent@vivier.eu>
>> Message-Id: <20220811151413.3350684-2-alex.bennee@linaro.org>
>> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
>> index f409121202..bfdd60136b 100644
>> --- a/linux-user/syscall.c
>> +++ b/linux-user/syscall.c
>> @@ -8594,7 +8594,13 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
>>           if (CPU_NEXT(first_cpu)) {
>>               TaskState *ts = cpu->opaque;
>>   -            object_property_set_bool(OBJECT(cpu), "realized",
>> false, NULL);
>> +            if (ts->child_tidptr) {
>> +                put_user_u32(0, ts->child_tidptr);
>> +                do_sys_futex(g2h(cpu, ts->child_tidptr),
>> +                             FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
>> +            }
>> +
>> +            object_unparent(OBJECT(cpu));
>
> This has caused a regression in arm/aarch64.
>
> We hard-code ARMCPRegInfo pointers into TranslationBlocks, for calling
> into helper_{get,set}cp_reg{,64}.  So we have a race condition between
> whichever cpu thread translates the code first (encoding the pointer),
> and that cpu thread exiting, so that the next execution of the TB
> references a freed data structure.

What is the test case that breaks this? I guess a multi-threaded
sysregs.c would trigger it?

> We shouldn't have N copies of these pointers in the first place.  This
> seems like something that ought to be placed on the ARMCPUClass, so
> that it could be shared by each cpu.
>
> But that's going to be a complex fix, so I'm reverting this for rc4.
>
>
> r~


-- 
Alex Bennée
Re: [PULL 1/3] linux-user: un-parent OBJECT(cpu) when closing thread
Posted by Richard Henderson 3 years, 5 months ago
On 8/19/22 01:37, Alex Bennée wrote:
>> This has caused a regression in arm/aarch64.
>>
>> We hard-code ARMCPRegInfo pointers into TranslationBlocks, for calling
>> into helper_{get,set}cp_reg{,64}.  So we have a race condition between
>> whichever cpu thread translates the code first (encoding the pointer),
>> and that cpu thread exiting, so that the next execution of the TB
>> references a freed data structure.
> 
> What is the test case that breaks this? I guess a multi-threaded
> sysregs.c would trigger it?

E.g. tests/tcg/aarch64-linux-user/signals.


r~