From nobody Sun May 19 12:13:31 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=reject dis=none) header.from=google.com ARC-Seal: i=1; a=rsa-sha256; t=1591926471; cv=none; d=zohomail.com; s=zohoarc; b=aAHp2oFIhkl6G2OuvqGGm/iw7qiT7qHK3A4LY+0F6uP89Sf2ZilS4JE2lz7EEnbBgv5eCzaITBJestS46p7ombHcw2K4Id0F5TgWeg+lKJmZOCUGBDe0Hq4oGv/sppWy6vWfXiICpn4OQQ8ZB6EqsjWbf//xtC21ymg6keWc4tY= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1591926471; h=Content-Type:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=gapAc62Y0td/O+BfFuvMHL0I99L+MBv6AKtq97q67nA=; b=NEckwEDbdJY0fiH4r+ccyel9uNoFm8r74Fo+RN9bm10PFgaRMVzpxPHUMy96IsKOMXNjnvs4z9c5AjHFMxxXlVCwag/6ccKqdGcld1WWH7eqSxOCcu73mgM6Jj1BRqRu11jafkPPA4BVzcR+MUux0Rbv4klCz2EYFLU/2wO+PKE= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=reject dis=none) header.from= Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1591926471340582.6177847525502; Thu, 11 Jun 2020 18:47:51 -0700 (PDT) Received: from localhost ([::1]:53444 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jjYnN-0002YV-Qg for importer@patchew.org; Thu, 11 Jun 2020 21:47:49 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51008) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3Y97iXgMKCqgRShOWWOTM.KWUYMUc-LMdMTVWVOVc.WZO@flex--jkz.bounces.google.com>) id 1jjYly-0000w9-DO for 
qemu-devel@nongnu.org; Thu, 11 Jun 2020 21:46:22 -0400 Received: from mail-yb1-xb4a.google.com ([2607:f8b0:4864:20::b4a]:54976) by eggs.gnu.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3Y97iXgMKCqgRShOWWOTM.KWUYMUc-LMdMTVWVOVc.WZO@flex--jkz.bounces.google.com>) id 1jjYlv-0001Ul-Kd for qemu-devel@nongnu.org; Thu, 11 Jun 2020 21:46:22 -0400 Received: by mail-yb1-xb4a.google.com with SMTP id p22so8734099ybg.21 for ; Thu, 11 Jun 2020 18:46:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=gapAc62Y0td/O+BfFuvMHL0I99L+MBv6AKtq97q67nA=; b=UNZManRrcIOG3hz2BSX5qgzphaRjWDLtbBPKQ+Dz7cDgkGoScO52FB+IEPRLQkPs53 krBkpS0JR92U9mZNSJn3992tk6usaMBxwYr5vSqtom4WBY0pKEQ51vikRjdMhqi/xIbZ hfdYBrmCxCA1HeBzyDmk7nVQv5KVgU12csdfPJ1qKqCHT6pZJeazX4DU5FR8CNh55oyS ChjhTZCur2mMHy4VbhHB2idTuQemYRRUVnHXWUc1a66ASR31DmISGom/AQ4V8IH5VgvB C4PM8mrmV9MwQzEqgKWOuuYMhQOFuVpTbt2vEv1zixw8bRstvD3D/7XEU/3hIH3RQl+i ++5g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=gapAc62Y0td/O+BfFuvMHL0I99L+MBv6AKtq97q67nA=; b=D7bT7nOFAG/MqoNm0eXykyvTSgB8Uvt69Mo/EwoWXxpWMyh5gpR3scQI2by/NWWt0L jERyv9Xf+fN0foz/KYEpvKiZ7N0uXv5gtONkh9nUWa+qO1T2nXyJlp8TV8+4TUlWwLwn 6uskHONXOLaAvb2/jqzyimT/esZTR+ITGKOkHiIegdg6BuzRi9d0ToQyhIYXZoFaTV6I LX3v9EMTJoDyWDIJZUlz7gVnEky5DRrIvhFPhKFGC3wWI6iryi1/bq80brHj3Wjm9QN6 yI7Au+6b+PwRQYCQsiMXZlmdBSRdhpAz1OIb8d7utPc61I9ritvrzMbS7IOzrEDnKHP3 ZLYA== X-Gm-Message-State: AOAM531ZgxFbCOe/fCkPbUeqtfMIDcPXmj4mFQABDvmiXhf3JEwu2XYK ucX6P5hjNcwKA4AsmpgiSF7l4k/EldLVcTQYibt3YbQgMkrNnBgOxYr9pMGvh3E2l21IkfLHoYj xuB3CuIaNp6DFPII1VPwVpC7xT4k/Xi7ME8glUmZUwQCgOsjSq0/y X-Google-Smtp-Source: ABdhPJxNNBIu+raP6oEUgHk2C3aGta4khQdCFs5N8n4PqFrlqFzvKTWB6xf1e77zu07PmR2iYtVOkv4= X-Received: by 2002:a25:bc81:: with SMTP id 
e1mr7623586ybk.375.1591926371420; Thu, 11 Jun 2020 18:46:11 -0700 (PDT) Date: Thu, 11 Jun 2020 18:46:02 -0700 In-Reply-To: <20200612014606.147691-1-jkz@google.com> Message-Id: <20200612014606.147691-2-jkz@google.com> Mime-Version: 1.0 References: <20200612014606.147691-1-jkz@google.com> X-Mailer: git-send-email 2.27.0.290.gba653c62da-goog Subject: [PATCH 1/5] linux-user: Refactor do_fork to use new `qemu_clone` From: Josh Kunz To: qemu-devel@nongnu.org Cc: riku.voipio@iki.fi, laurent@vivier.eu, alex.bennee@linaro.org, Josh Kunz Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::b4a; envelope-from=3Y97iXgMKCqgRShOWWOTM.KWUYMUc-LMdMTVWVOVc.WZO@flex--jkz.bounces.google.com; helo=mail-yb1-xb4a.google.com X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know. X-Spam_score_int: -105 X-Spam_score: -10.6 X-Spam_bar: ---------- X-Spam_report: (-10.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-1, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, USER_IN_DEF_DKIM_WL=-7.5 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: pass (identity @google.com) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This is pre-work for adding full support for the `CLONE_VM` `clone` flag. In a follow-up patch, we'll add support to `clone.c` for `clone_vm`-type clones beyond threads. 
CLONE_VM support is more complicated, so first we're splitting the existing clone mechanisms (pthread_create and fork) into a separate file. Signed-off-by: Josh Kunz --- linux-user/Makefile.objs | 2 +- linux-user/clone.c | 152 ++++++++++++++++ linux-user/clone.h | 27 +++ linux-user/syscall.c | 376 +++++++++++++++++++-------------------- 4 files changed, 365 insertions(+), 192 deletions(-) create mode 100644 linux-user/clone.c create mode 100644 linux-user/clone.h diff --git a/linux-user/Makefile.objs b/linux-user/Makefile.objs index 1940910a73..d6788f012c 100644 --- a/linux-user/Makefile.objs +++ b/linux-user/Makefile.objs @@ -1,7 +1,7 @@ obj-y =3D main.o syscall.o strace.o mmap.o signal.o \ elfload.o linuxload.o uaccess.o uname.o \ safe-syscall.o $(TARGET_ABI_DIR)/signal.o \ - $(TARGET_ABI_DIR)/cpu_loop.o exit.o fd-trans.o + $(TARGET_ABI_DIR)/cpu_loop.o exit.o fd-trans.o clone.o =20 obj-$(TARGET_HAS_BFLT) +=3D flatload.o obj-$(TARGET_I386) +=3D vm86.o diff --git a/linux-user/clone.c b/linux-user/clone.c new file mode 100644 index 0000000000..f02ae8c464 --- /dev/null +++ b/linux-user/clone.c @@ -0,0 +1,152 @@ +#include "qemu/osdep.h" +#include "qemu.h" +#include "clone.h" +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static const unsigned long NEW_STACK_SIZE =3D 0x40000UL; + +/* + * A completion tracks an event that can be completed. It's based on the + * kernel concept with the same name, but implemented with userspace locks. + */ +struct completion { + /* done is set once this completion has been completed. */ + bool done; + /* mu synchronizes access to this completion. */ + pthread_mutex_t mu; + /* cond is used to broadcast completion status to awaiting threads. */ + pthread_cond_t cond; +}; + +static void completion_init(struct completion *c) +{ + c->done =3D false; + pthread_mutex_init(&c->mu, NULL); + pthread_cond_init(&c->cond, NULL); +} + +/* + * Block until the given completion finishes. 
Returns immediately if the + * completion has already finished. + */ +static void completion_await(struct completion *c) +{ + pthread_mutex_lock(&c->mu); + if (c->done) { + pthread_mutex_unlock(&c->mu); + return; + } + pthread_cond_wait(&c->cond, &c->mu); + assert(c->done && "returned from cond wait without being marked as don= e"); + pthread_mutex_unlock(&c->mu); +} + +/* + * Finish the completion. Unblocks all awaiters. + */ +static void completion_finish(struct completion *c) +{ + pthread_mutex_lock(&c->mu); + assert(!c->done && "trying to finish an already finished completion"); + c->done =3D true; + pthread_cond_broadcast(&c->cond); + pthread_mutex_unlock(&c->mu); +} + +struct clone_thread_info { + struct completion running; + int tid; + int (*callback)(void *); + void *payload; +}; + +static void *clone_thread_run(void *raw_info) +{ + struct clone_thread_info *info =3D (struct clone_thread_info *) raw_in= fo; + info->tid =3D syscall(SYS_gettid); + + /* + * Save out callback/payload since lifetime of info is only guaranteed + * until we finish the completion. + */ + int (*callback)(void *) =3D info->callback; + void *payload =3D info->payload; + completion_finish(&info->running); + + _exit(callback(payload)); +} + +static int clone_thread(int flags, int (*callback)(void *), void *payload) +{ + struct clone_thread_info info; + pthread_attr_t attr; + int ret; + pthread_t thread_unused; + + memset(&info, 0, sizeof(info)); + + completion_init(&info.running); + info.callback =3D callback; + info.payload =3D payload; + + (void)pthread_attr_init(&attr); + (void)pthread_attr_setstacksize(&attr, NEW_STACK_SIZE); + (void)pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED); + + ret =3D pthread_create(&thread_unused, &attr, clone_thread_run, (void = *) &info); + /* pthread_create returns errors directly, instead of via errno. 
*/ + if (ret !=3D 0) { + errno =3D ret; + ret =3D -1; + } else { + completion_await(&info.running); + ret =3D info.tid; + } + + pthread_attr_destroy(&attr); + return ret; +} + +int qemu_clone(int flags, int (*callback)(void *), void *payload) +{ + int ret; + + if (clone_flags_are_thread(flags)) { + /* + * The new process uses the same flags as pthread_create, so we can + * use pthread_create directly. This is an optimization. + */ + return clone_thread(flags, callback, payload); + } + + if (clone_flags_are_fork(flags)) { + /* + * Special case a true `fork` clone call. This is so we can take + * advantage of special pthread_atfork handlers in libraries we + * depend on (e.g., glibc). Without this, existing users of `fork` + * in multi-threaded environments will likely get new flaky + * deadlocks. + */ + fork_start(); + ret =3D fork(); + if (ret =3D=3D 0) { + fork_end(1); + _exit(callback(payload)); + } + fork_end(0); + return ret; + } + + /* !fork && !thread */ + errno =3D EINVAL; + return -1; +} diff --git a/linux-user/clone.h b/linux-user/clone.h new file mode 100644 index 0000000000..34ae9b3780 --- /dev/null +++ b/linux-user/clone.h @@ -0,0 +1,27 @@ +#ifndef CLONE_H +#define CLONE_H + +/* + * qemu_clone executes the given `callback`, with the given payload as the + * first argument, in a new process created with the given flags. Some clo= ne + * flags, such as *SETTLS, *CLEARTID are not supported. The child thread I= D is + * returned on success; on failure, -1 is returned and errno is set. + */ +int qemu_clone(int flags, int (*callback)(void *), void *payload); + +/* Returns true if the given clone flags can be emulated with libc fork. */ +static bool clone_flags_are_fork(unsigned int flags) +{ + return flags =3D=3D SIGCHLD; +} + +/* Returns true if the given clone flags can be emulated with pthread_crea= te. 
*/ +static bool clone_flags_are_thread(unsigned int flags) +{ + return flags =3D=3D ( + CLONE_VM | CLONE_FS | CLONE_FILES | + CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM + ); +} + +#endif /* CLONE_H */ diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 97de9fb5c9..7ce021cea2 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -122,6 +122,7 @@ #include "qapi/error.h" #include "fd-trans.h" #include "tcg/tcg.h" +#include "clone.h" =20 #ifndef CLONE_IO #define CLONE_IO 0x80000000 /* Clone io context */ @@ -135,12 +136,6 @@ * * flags we can implement within QEMU itself * * flags we can't support and will return an error for */ -/* For thread creation, all these flags must be present; for - * fork, none must be present. - */ -#define CLONE_THREAD_FLAGS \ - (CLONE_VM | CLONE_FS | CLONE_FILES | \ - CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM) =20 /* These flags are ignored: * CLONE_DETACHED is now ignored by the kernel; @@ -150,30 +145,10 @@ (CLONE_DETACHED | CLONE_IO) =20 /* Flags for fork which we can implement within QEMU itself */ -#define CLONE_OPTIONAL_FORK_FLAGS \ +#define CLONE_EMULATED_FLAGS \ (CLONE_SETTLS | CLONE_PARENT_SETTID | \ CLONE_CHILD_CLEARTID | CLONE_CHILD_SETTID) =20 -/* Flags for thread creation which we can implement within QEMU itself */ -#define CLONE_OPTIONAL_THREAD_FLAGS \ - (CLONE_SETTLS | CLONE_PARENT_SETTID | \ - CLONE_CHILD_CLEARTID | CLONE_CHILD_SETTID | CLONE_PARENT) - -#define CLONE_INVALID_FORK_FLAGS \ - (~(CSIGNAL | CLONE_OPTIONAL_FORK_FLAGS | CLONE_IGNORED_FLAGS)) - -#define CLONE_INVALID_THREAD_FLAGS \ - (~(CSIGNAL | CLONE_THREAD_FLAGS | CLONE_OPTIONAL_THREAD_FLAGS | \ - CLONE_IGNORED_FLAGS)) - -/* CLONE_VFORK is special cased early in do_fork(). The other flag bits - * have almost all been allocated. We cannot support any of - * CLONE_NEWNS, CLONE_NEWCGROUP, CLONE_NEWUTS, CLONE_NEWIPC, - * CLONE_NEWUSER, CLONE_NEWPID, CLONE_NEWNET, CLONE_PTRACE, CLONE_UNTRACED. 
- * The checks against the invalid thread masks above will catch these. - * (The one remaining unallocated bit is 0x1000 which used to be CLONE_PID= .) - */ - /* Define DEBUG_ERESTARTSYS to force every syscall to be restarted * once. This exercises the codepaths for restart. */ @@ -1104,7 +1079,7 @@ static inline rlim_t target_to_host_rlim(abi_ulong ta= rget_rlim) { abi_ulong target_rlim_swap; rlim_t result; - =20 + target_rlim_swap =3D tswapal(target_rlim); if (target_rlim_swap =3D=3D TARGET_RLIM_INFINITY) return RLIM_INFINITY; @@ -1112,7 +1087,7 @@ static inline rlim_t target_to_host_rlim(abi_ulong ta= rget_rlim) result =3D target_rlim_swap; if (target_rlim_swap !=3D (rlim_t)result) return RLIM_INFINITY; - =20 + return result; } #endif @@ -1122,13 +1097,13 @@ static inline abi_ulong host_to_target_rlim(rlim_t = rlim) { abi_ulong target_rlim_swap; abi_ulong result; - =20 + if (rlim =3D=3D RLIM_INFINITY || rlim !=3D (abi_long)rlim) target_rlim_swap =3D TARGET_RLIM_INFINITY; else target_rlim_swap =3D rlim; result =3D tswapal(target_rlim_swap); - =20 + return result; } #endif @@ -1615,10 +1590,11 @@ static inline abi_long target_to_host_cmsg(struct m= sghdr *msgh, abi_ulong target_cmsg_addr; struct target_cmsghdr *target_cmsg, *target_cmsg_start; socklen_t space =3D 0; - =20 + msg_controllen =3D tswapal(target_msgh->msg_controllen); - if (msg_controllen < sizeof (struct target_cmsghdr))=20 + if (msg_controllen < sizeof(struct target_cmsghdr)) { goto the_end; + } target_cmsg_addr =3D tswapal(target_msgh->msg_control); target_cmsg =3D lock_user(VERIFY_READ, target_cmsg_addr, msg_controlle= n, 1); target_cmsg_start =3D target_cmsg; @@ -1703,8 +1679,9 @@ static inline abi_long host_to_target_cmsg(struct tar= get_msghdr *target_msgh, socklen_t space =3D 0; =20 msg_controllen =3D tswapal(target_msgh->msg_controllen); - if (msg_controllen < sizeof (struct target_cmsghdr))=20 + if (msg_controllen < sizeof(struct target_cmsghdr)) { goto the_end; + } target_cmsg_addr =3D 
tswapal(target_msgh->msg_control); target_cmsg =3D lock_user(VERIFY_WRITE, target_cmsg_addr, msg_controll= en, 0); target_cmsg_start =3D target_cmsg; @@ -5750,9 +5727,10 @@ abi_long do_set_thread_area(CPUX86State *env, abi_ul= ong ptr) } unlock_user_struct(target_ldt_info, ptr, 1); =20 - if (ldt_info.entry_number < TARGET_GDT_ENTRY_TLS_MIN ||=20 - ldt_info.entry_number > TARGET_GDT_ENTRY_TLS_MAX) - return -TARGET_EINVAL; + if (ldt_info.entry_number < TARGET_GDT_ENTRY_TLS_MIN || + ldt_info.entry_number > TARGET_GDT_ENTRY_TLS_MAX) { + return -TARGET_EINVAL; + } seg_32bit =3D ldt_info.flags & 1; contents =3D (ldt_info.flags >> 1) & 3; read_exec_only =3D (ldt_info.flags >> 3) & 1; @@ -5828,7 +5806,7 @@ static abi_long do_get_thread_area(CPUX86State *env, = abi_ulong ptr) lp =3D (uint32_t *)(gdt_table + idx); entry_1 =3D tswap32(lp[0]); entry_2 =3D tswap32(lp[1]); - =20 + read_exec_only =3D ((entry_2 >> 9) & 1) ^ 1; contents =3D (entry_2 >> 10) & 3; seg_not_present =3D ((entry_2 >> 15) & 1) ^ 1; @@ -5844,8 +5822,8 @@ static abi_long do_get_thread_area(CPUX86State *env, = abi_ulong ptr) (read_exec_only << 3) | (limit_in_pages << 4) | (seg_not_present << 5) | (useable << 6) | (lm << 7); limit =3D (entry_1 & 0xffff) | (entry_2 & 0xf0000); - base_addr =3D (entry_1 >> 16) |=20 - (entry_2 & 0xff000000) |=20 + base_addr =3D (entry_1 >> 16) | + (entry_2 & 0xff000000) | ((entry_2 & 0xff) << 16); target_ldt_info->base_addr =3D tswapal(base_addr); target_ldt_info->limit =3D tswap32(limit); @@ -5895,53 +5873,71 @@ abi_long do_arch_prctl(CPUX86State *env, int code, = abi_ulong addr) =20 #endif /* defined(TARGET_I386) */ =20 -#define NEW_STACK_SIZE 0x40000 - - static pthread_mutex_t clone_lock =3D PTHREAD_MUTEX_INITIALIZER; typedef struct { - CPUArchState *env; + /* Used to synchronize thread/process creation between parent and chil= d. 
*/ pthread_mutex_t mutex; pthread_cond_t cond; - pthread_t thread; - uint32_t tid; + /* + * Guest pointers for implementing CLONE_PARENT_SETTID + * and CLONE_CHILD_SETTID. + */ abi_ulong child_tidptr; abi_ulong parent_tidptr; - sigset_t sigmask; -} new_thread_info; + struct { + sigset_t sigmask; + CPUArchState *env; + bool register_thread; + bool signal_setup; + } child; +} clone_info; =20 -static void *clone_func(void *arg) +static int clone_run(void *arg) { - new_thread_info *info =3D arg; + clone_info *info =3D (clone_info *) arg; CPUArchState *env; CPUState *cpu; TaskState *ts; + uint32_t tid; =20 - rcu_register_thread(); - tcg_register_thread(); - env =3D info->env; + if (info->child.register_thread) { + rcu_register_thread(); + tcg_register_thread(); + } + + env =3D info->child.env; cpu =3D env_cpu(env); thread_cpu =3D cpu; ts =3D (TaskState *)cpu->opaque; - info->tid =3D sys_gettid(); + tid =3D sys_gettid(); task_settid(ts); - if (info->child_tidptr) - put_user_u32(info->tid, info->child_tidptr); - if (info->parent_tidptr) - put_user_u32(info->tid, info->parent_tidptr); + qemu_guest_random_seed_thread_part2(cpu->random_seed); - /* Enable signals. */ - sigprocmask(SIG_SETMASK, &info->sigmask, NULL); - /* Signal to the parent that we're ready. */ - pthread_mutex_lock(&info->mutex); - pthread_cond_broadcast(&info->cond); - pthread_mutex_unlock(&info->mutex); - /* Wait until the parent has finished initializing the tls state. */ - pthread_mutex_lock(&clone_lock); - pthread_mutex_unlock(&clone_lock); + + if (info->parent_tidptr) { + /* + * Even when memory is not shared, parent_tidptr is set before the + * process copy, so we need to set it in the child. + */ + put_user_u32(tid, info->parent_tidptr); + } + + if (info->child_tidptr) { + put_user_u32(tid, info->child_tidptr); + } + + /* Enable signals. 
*/ + sigprocmask(SIG_SETMASK, &info->child.sigmask, NULL); + + if (info->child.signal_setup) { + pthread_mutex_lock(&info->mutex); + pthread_cond_broadcast(&info->cond); + pthread_mutex_unlock(&info->mutex); + } + cpu_loop(env); /* never exits */ - return NULL; + _exit(1); /* avoid compiler warning. */ } =20 /* do_fork() Must return host values and target errnos (unlike most @@ -5951,139 +5947,131 @@ static int do_fork(CPUArchState *env, unsigned in= t flags, abi_ulong newsp, abi_ulong child_tidptr) { CPUState *cpu =3D env_cpu(env); - int ret; + int proc_flags, host_sig, ret; TaskState *ts; CPUState *new_cpu; - CPUArchState *new_env; - sigset_t sigmask; + sigset_t block_sigmask; + sigset_t orig_sigmask; + clone_info info; + TaskState *parent_ts =3D (TaskState *)cpu->opaque; =20 - flags &=3D ~CLONE_IGNORED_FLAGS; + memset(&info, 0, sizeof(info)); + + /* + * When cloning the actual subprocess, we don't need to worry about any + * flags that can be ignored, or emulated in QEMU. proc_flags holds on= ly + * the flags that need to be passed to `clone` itself. + */ + proc_flags =3D flags & ~(CLONE_EMULATED_FLAGS | CLONE_IGNORED_FLAGS); + + /* + * The exit signal is included in the flags. That signal needs to be m= apped + * to the appropriate host signal, and we need to check if that signal= is + * supported. 
+ */ + host_sig =3D target_to_host_signal(proc_flags & CSIGNAL); + if (host_sig > SIGRTMAX) { + qemu_log_mask(LOG_UNIMP, + "guest signal %d not supported for exit_signal", + proc_flags & CSIGNAL); + return -TARGET_EINVAL; + } + proc_flags =3D (proc_flags & ~CSIGNAL) | host_sig; =20 /* Emulate vfork() with fork() */ - if (flags & CLONE_VFORK) - flags &=3D ~(CLONE_VFORK | CLONE_VM); + if (proc_flags & CLONE_VFORK) { + proc_flags &=3D ~(CLONE_VFORK | CLONE_VM); + } =20 - if (flags & CLONE_VM) { - TaskState *parent_ts =3D (TaskState *)cpu->opaque; - new_thread_info info; - pthread_attr_t attr; + if (!clone_flags_are_fork(proc_flags) && + !clone_flags_are_thread(proc_flags)) { + qemu_log_mask(LOG_UNIMP, "unsupported clone flags"); + return -TARGET_EINVAL; + } =20 - if (((flags & CLONE_THREAD_FLAGS) !=3D CLONE_THREAD_FLAGS) || - (flags & CLONE_INVALID_THREAD_FLAGS)) { - return -TARGET_EINVAL; - } + pthread_mutex_init(&info.mutex, NULL); + pthread_mutex_lock(&info.mutex); + pthread_cond_init(&info.cond, NULL); =20 - ts =3D g_new0(TaskState, 1); - init_task_state(ts); + ts =3D g_new0(TaskState, 1); + init_task_state(ts); =20 - /* Grab a mutex so that thread setup appears atomic. */ - pthread_mutex_lock(&clone_lock); + /* Guard CPU copy. It is not thread-safe. */ + pthread_mutex_lock(&clone_lock); + info.child.env =3D cpu_copy(env); + pthread_mutex_unlock(&clone_lock); + /* Init regs that differ from the parent. */ + cpu_clone_regs_child(info.child.env, newsp, flags); =20 - /* we create a new CPU instance. */ - new_env =3D cpu_copy(env); - /* Init regs that differ from the parent. 
*/ - cpu_clone_regs_child(new_env, newsp, flags); + if (flags & CLONE_SETTLS) { + cpu_set_tls(info.child.env, newtls); + } + + new_cpu =3D env_cpu(info.child.env); + new_cpu->opaque =3D ts; + ts->bprm =3D parent_ts->bprm; + ts->info =3D parent_ts->info; + ts->signal_mask =3D parent_ts->signal_mask; + + if (flags & CLONE_CHILD_CLEARTID) { + ts->child_tidptr =3D child_tidptr; + } + + if (flags & CLONE_CHILD_SETTID) { + info.child_tidptr =3D child_tidptr; + } + if (flags & CLONE_PARENT_SETTID) { + info.parent_tidptr =3D parent_tidptr; + } + + /* + * If the child process is going to share memory, and this is our first + * such child process or thread, we need to ensure we generate code for + * parallel execution and flush old translations. + */ + if (!parallel_cpus && (proc_flags & CLONE_VM)) { + parallel_cpus =3D true; + tb_flush(cpu); + } + + if (proc_flags & CLONE_VM) { + info.child.register_thread =3D true; + info.child.signal_setup =3D true; + } + + /* + * It is not safe to deliver signals until the child has finished + * initializing, so temporarily block all signals. + */ + sigfillset(&block_sigmask); + sigprocmask(SIG_BLOCK, &block_sigmask, &orig_sigmask); + info.child.sigmask =3D orig_sigmask; + + ret =3D get_errno(qemu_clone(proc_flags, clone_run, (void *) &info)); + + if (ret >=3D 0 && (proc_flags & CLONE_VM)) { + /* + * Wait for the child to finish setup if the child is running in t= he + * same VM. + */ + pthread_cond_wait(&info.cond, &info.mutex); + } + + sigprocmask(SIG_SETMASK, &orig_sigmask, NULL); + + pthread_mutex_unlock(&info.mutex); + pthread_cond_destroy(&info.cond); + pthread_mutex_destroy(&info.mutex); + + if (ret >=3D 0 && !(proc_flags & CLONE_VM)) { + /* + * If !CLONE_VM, then we need to set parent_tidptr, since the child + * won't set it for us. Should always be safe to set it here anywa= ys. 
+ */ + put_user_u32(ret, info.parent_tidptr); cpu_clone_regs_parent(env, flags); - new_cpu =3D env_cpu(new_env); - new_cpu->opaque =3D ts; - ts->bprm =3D parent_ts->bprm; - ts->info =3D parent_ts->info; - ts->signal_mask =3D parent_ts->signal_mask; - - if (flags & CLONE_CHILD_CLEARTID) { - ts->child_tidptr =3D child_tidptr; - } - - if (flags & CLONE_SETTLS) { - cpu_set_tls (new_env, newtls); - } - - memset(&info, 0, sizeof(info)); - pthread_mutex_init(&info.mutex, NULL); - pthread_mutex_lock(&info.mutex); - pthread_cond_init(&info.cond, NULL); - info.env =3D new_env; - if (flags & CLONE_CHILD_SETTID) { - info.child_tidptr =3D child_tidptr; - } - if (flags & CLONE_PARENT_SETTID) { - info.parent_tidptr =3D parent_tidptr; - } - - ret =3D pthread_attr_init(&attr); - ret =3D pthread_attr_setstacksize(&attr, NEW_STACK_SIZE); - ret =3D pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED= ); - /* It is not safe to deliver signals until the child has finished - initializing, so temporarily block all signals. */ - sigfillset(&sigmask); - sigprocmask(SIG_BLOCK, &sigmask, &info.sigmask); - cpu->random_seed =3D qemu_guest_random_seed_thread_part1(); - - /* If this is our first additional thread, we need to ensure we - * generate code for parallel execution and flush old translations. - */ - if (!parallel_cpus) { - parallel_cpus =3D true; - tb_flush(cpu); - } - - ret =3D pthread_create(&info.thread, &attr, clone_func, &info); - /* TODO: Free new CPU state if thread creation failed. */ - - sigprocmask(SIG_SETMASK, &info.sigmask, NULL); - pthread_attr_destroy(&attr); - if (ret =3D=3D 0) { - /* Wait for the child to initialize. 
*/ - pthread_cond_wait(&info.cond, &info.mutex); - ret =3D info.tid; - } else { - ret =3D -1; - } - pthread_mutex_unlock(&info.mutex); - pthread_cond_destroy(&info.cond); - pthread_mutex_destroy(&info.mutex); - pthread_mutex_unlock(&clone_lock); - } else { - /* if no CLONE_VM, we consider it is a fork */ - if (flags & CLONE_INVALID_FORK_FLAGS) { - return -TARGET_EINVAL; - } - - /* We can't support custom termination signals */ - if ((flags & CSIGNAL) !=3D TARGET_SIGCHLD) { - return -TARGET_EINVAL; - } - - if (block_signals()) { - return -TARGET_ERESTARTSYS; - } - - fork_start(); - ret =3D fork(); - if (ret =3D=3D 0) { - /* Child Process. */ - cpu_clone_regs_child(env, newsp, flags); - fork_end(1); - /* There is a race condition here. The parent process could - theoretically read the TID in the child process before the = child - tid is set. This would require using either ptrace - (not implemented) or having *_tidptr to point at a shared m= emory - mapping. We can't repeat the spinlock hack used above beca= use - the child process gets its own copy of the lock. */ - if (flags & CLONE_CHILD_SETTID) - put_user_u32(sys_gettid(), child_tidptr); - if (flags & CLONE_PARENT_SETTID) - put_user_u32(sys_gettid(), parent_tidptr); - ts =3D (TaskState *)cpu->opaque; - if (flags & CLONE_SETTLS) - cpu_set_tls (env, newtls); - if (flags & CLONE_CHILD_CLEARTID) - ts->child_tidptr =3D child_tidptr; - } else { - cpu_clone_regs_parent(env, flags); - fork_end(0); - } } + return ret; } =20 @@ -7644,6 +7632,7 @@ static abi_long do_syscall1(void *cpu_env, int num, a= bi_long arg1, =20 switch(num) { case TARGET_NR_exit: + { /* In old applications this may be used to implement _exit(2). However in threaded applictions it is used for thread terminati= on, and _exit_group is used for application termination. 
@@ -7673,6 +7662,7 @@ static abi_long do_syscall1(void *cpu_env, int num, a= bi_long arg1, do_sys_futex(g2h(ts->child_tidptr), FUTEX_WAKE, INT_MAX, NULL, NULL, 0); } + thread_cpu =3D NULL; g_free(ts); rcu_unregister_thread(); @@ -7683,6 +7673,7 @@ static abi_long do_syscall1(void *cpu_env, int num, a= bi_long arg1, preexit_cleanup(cpu_env, arg1); _exit(arg1); return 0; /* avoid warning */ + } case TARGET_NR_read: if (arg2 =3D=3D 0 && arg3 =3D=3D 0) { return get_errno(safe_read(arg1, 0, 0)); @@ -9679,9 +9670,10 @@ static abi_long do_syscall1(void *cpu_env, int num, = abi_long arg1, return ret; #ifdef __NR_exit_group /* new thread calls */ - case TARGET_NR_exit_group: + case TARGET_NR_exit_group: { preexit_cleanup(cpu_env, arg1); return get_errno(exit_group(arg1)); + } #endif case TARGET_NR_setdomainname: if (!(p =3D lock_user_string(arg1))) @@ -10873,8 +10865,10 @@ static abi_long do_syscall1(void *cpu_env, int num= , abi_long arg1, return get_errno(fchown(arg1, low2highuid(arg2), low2highgid(arg3)= )); #if defined(TARGET_NR_fchownat) case TARGET_NR_fchownat: - if (!(p =3D lock_user_string(arg2)))=20 + p =3D lock_user_string(arg2); + if (!p) { return -TARGET_EFAULT; + } ret =3D get_errno(fchownat(arg1, p, low2highuid(arg3), low2highgid(arg4), arg5)); unlock_user(p, arg2, 0); --=20 2.27.0.290.gba653c62da-goog From nobody Sun May 19 12:13:31 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=reject dis=none) header.from=google.com ARC-Seal: i=1; a=rsa-sha256; t=1591926553; cv=none; d=zohomail.com; s=zohoarc; b=Gd6lGqAuRwpErtWa4rb9Cs5spFMXiKCXZv0at/Nw60gQdjGiTs5rFEr1ZT0mq5uLon3dQf129gY0WyQYZcv9q+afuy/lNCWcgEejtLnqQlzamlN7uPfRFPQI1kfAKiqd8B3DmOeZtwE8eCJA2F0kkBA3TTqlD3HrMsjN+Hge2Ns= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; 
s=zohoarc; t=1591926553; h=Content-Type:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=zrAZryzclMmQ2GnW3+4bij4G7St8PIqWiedWlSplzFg=; b=lmWsmF2JtmEpud74LqFduQk2gbYdb/h6qPzjVPh+fZh7c/4ynMtYnCKA7zH/O468+9PAlG1WPg2qnDOJH76p7euxX4k5c8JD+FH2DE0GKZfxYDXq6K2BJporhBgIBDFGzKra/fK8OVkkYjSYOB6Mst+YcnIAwHyKX5t+lvucxs4= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=reject dis=none) header.from= Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 15919265531935.6589627375586815; Thu, 11 Jun 2020 18:49:13 -0700 (PDT) Received: from localhost ([::1]:59644 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jjYoh-0005sz-Sj for importer@patchew.org; Thu, 11 Jun 2020 21:49:11 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51030) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3Zd7iXgMKCqoTUjQYYQVO.MYWaOWe-NOfOVXYXQXe.YbQ@flex--jkz.bounces.google.com>) id 1jjYlz-0000xD-Ds for qemu-devel@nongnu.org; Thu, 11 Jun 2020 21:46:23 -0400 Received: from mail-qk1-x749.google.com ([2607:f8b0:4864:20::749]:44709) by eggs.gnu.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3Zd7iXgMKCqoTUjQYYQVO.MYWaOWe-NOfOVXYXQXe.YbQ@flex--jkz.bounces.google.com>) id 1jjYlx-0001Up-C4 for qemu-devel@nongnu.org; Thu, 11 Jun 2020 21:46:23 -0400 Received: by mail-qk1-x749.google.com with SMTP id j16so6800262qka.11 for ; Thu, 11 Jun 2020 18:46:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; 
bh=zrAZryzclMmQ2GnW3+4bij4G7St8PIqWiedWlSplzFg=; b=sh9rj0Ze7pugCxLZjpbqmMaoKlnzXdDEqx/0wdHHh9PVhw4JU8BVAU9wsvU9wzdJlS ePqKSE3IET/Y0oQuMhPvY+zLcJBGM+VmOFC0L+xzIYUtD91jkMN53kZ1q22nKXZR0X7C ckJMAcHT2f7K46LiAknyOzQnkRg0WrGvwRwWM7FpNOXDfsKFfQq61cB3AxUM/OUntXnz 3HRqfbv9yyC6eNrF6rg1krmH3bJBFbf6AuvXeWAXa487ExLV02Z0j4ZkOr2EQNtw6I8f vO5tFuFVza2sg2qWpJ+LZ2pCWyVv7Fo7dmTCSp25BvsW6H7LCErYb+7rz8pYmSiIna4f IYSA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=zrAZryzclMmQ2GnW3+4bij4G7St8PIqWiedWlSplzFg=; b=UMvV0cRJ3Z79iaQhocj7/K0aLJ7mnfO+ONFriCr3w8BKlKkCuG9906V6MNbO0XpmcX sItLpM5AIndpwIu64RLQy+uM+LwLg0odeDE8YjBbRuFcKJQVFSoEoJg5AQx0g3XQ4ftG 8Se68ngJB1tX9ViUppinEqlIH9a4iuJ6y8sVs9qTWB1uYekeiWAbVaiy3RLikXopUiMN VIssGI9q1BdYB/IFh9ygNz5oqW7T8Zz/7IgSK3mUKuKxTVdafy4Ivkpipz5Db135VfwF WqWKaZ/UA1ovVktZq8/NISPCSUb51tqh6KMfPzxTGnEjkeruwdoa1VSG4fmT3Nm6cb6Y jp2w== X-Gm-Message-State: AOAM530iRx0+dTNVNv/Xbdf/HplfmFjJe7Ig3BlXwX6lNk/J80i8hsRw L6Gw0/UB2mS11erji7FfiIjetdKckQU5+REmcWOcPzJHgIaZFOv/j03jO+KCPIlMtBmABqp38Kr 9HtTo42QDV7ywDsudHWj8LuxqP1q50XRzWQbh4tmMW+LGf2AHJnZl X-Google-Smtp-Source: ABdhPJzeAyJjF03I50ifR/5PtDnbCByuW3dr6mY5DWWXG5piARCTwYxjHYUyEDMxEGJFh1LCFlWtfWc= X-Received: by 2002:a05:6214:846:: with SMTP id dg6mr9942730qvb.210.1591926373252; Thu, 11 Jun 2020 18:46:13 -0700 (PDT) Date: Thu, 11 Jun 2020 18:46:03 -0700 In-Reply-To: <20200612014606.147691-1-jkz@google.com> Message-Id: <20200612014606.147691-3-jkz@google.com> Mime-Version: 1.0 References: <20200612014606.147691-1-jkz@google.com> X-Mailer: git-send-email 2.27.0.290.gba653c62da-goog Subject: [PATCH 2/5] linux-user: Make fd_trans task-specific. 
From: Josh Kunz To: qemu-devel@nongnu.org Cc: riku.voipio@iki.fi, laurent@vivier.eu, alex.bennee@linaro.org, Josh Kunz Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::749; envelope-from=3Zd7iXgMKCqoTUjQYYQVO.MYWaOWe-NOfOVXYXQXe.YbQ@flex--jkz.bounces.google.com; helo=mail-qk1-x749.google.com X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know. X-Spam_score_int: -105 X-Spam_score: -10.6 X-Spam_bar: ---------- X-Spam_report: (-10.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-1, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, USER_IN_DEF_DKIM_WL=-7.5 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: pass (identity @google.com) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

The file-descriptor translation subsystem used by QEMU uses some global variables to track file descriptors and their associated state. In the future (when clone is implemented) it may be possible to have two processes that share memory, but each have their own set of file descriptors.

This change associates the file-descriptor translation table with the per-task TaskState structure. Since many tasks will share file descriptors (e.g., threads), a structure similar to the existing structure is used. Each task has a pointer to a global table. That table can be shared by multiple tasks, or changed if a task needs to use a different FD table.
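As a rough illustration of the share-or-copy idea described above, the following is a minimal standalone sketch, not the code in this patch. The names `struct translator`, `struct fd_table`, `struct task` and the `fd_table_*` / `task_inherit_fds` helpers are hypothetical stand-ins, and plain `calloc`/`realloc` are assumed in place of the GLib allocators QEMU uses. It shows a child task either pointing at its parent's table (the CLONE_FILES case) or receiving a private copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-fd translator; not QEMU's TargetFdTrans. */
struct translator {
    const char *name;
};

/* Hypothetical table: `entries` holds `fd_max` slots, indexed by fd. */
struct fd_table {
    size_t fd_max;
    struct translator **entries;
};

/* Hypothetical task: it only holds a pointer, so tables can be shared. */
struct task {
    struct fd_table *fds;
};

static struct fd_table *fd_table_new(void)
{
    struct fd_table *tbl = calloc(1, sizeof(*tbl));
    if (!tbl) {
        abort();
    }
    return tbl;
}

/* Copy the slots of `src` into a new, independently owned table. */
static struct fd_table *fd_table_clone(const struct fd_table *src)
{
    struct fd_table *dst = fd_table_new();
    dst->fd_max = src->fd_max;
    if (src->fd_max > 0) {
        dst->entries = calloc(src->fd_max, sizeof(*dst->entries));
        if (!dst->entries) {
            abort();
        }
        memcpy(dst->entries, src->entries,
               src->fd_max * sizeof(*dst->entries));
    }
    return dst;
}

/* Register a translator, growing the table in slices of 64 slots. */
static void fd_table_register(struct fd_table *tbl, int fd,
                              struct translator *t)
{
    assert(fd >= 0);
    if ((size_t)fd >= tbl->fd_max) {
        size_t new_max = (((size_t)fd >> 6) + 1) << 6;
        struct translator **grown =
            realloc(tbl->entries, new_max * sizeof(*grown));
        if (!grown) {
            abort();
        }
        /* Zero only the newly added slots. */
        memset(grown + tbl->fd_max, 0,
               (new_max - tbl->fd_max) * sizeof(*grown));
        tbl->entries = grown;
        tbl->fd_max = new_max;
    }
    tbl->entries[fd] = t;
}

static struct translator *fd_table_lookup(const struct fd_table *tbl, int fd)
{
    if (fd >= 0 && (size_t)fd < tbl->fd_max) {
        return tbl->entries[fd];
    }
    return NULL;
}

/* Share the parent's table when `share_files` is set, else copy it. */
static void task_inherit_fds(struct task *child, const struct task *parent,
                             int share_files)
{
    child->fds = share_files ? parent->fds : fd_table_clone(parent->fds);
}
```

With this shape, a registration made through the parent after the split is visible to a task that shares the table and invisible to one that copied it. The sketch leaves out locking, which a shared table would need in practice.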
Signed-off-by: Josh Kunz --- linux-user/Makefile.objs | 2 +- linux-user/fd-trans-tbl.c | 13 +++++++ linux-user/fd-trans-type.h | 17 +++++++++ linux-user/fd-trans.c | 3 -- linux-user/fd-trans.h | 75 ++++++++++++++++++++++++-------------- linux-user/main.c | 1 + linux-user/qemu.h | 24 ++++++++++++ linux-user/syscall.c | 12 ++++++ 8 files changed, 115 insertions(+), 32 deletions(-) create mode 100644 linux-user/fd-trans-tbl.c create mode 100644 linux-user/fd-trans-type.h diff --git a/linux-user/Makefile.objs b/linux-user/Makefile.objs index d6788f012c..d19102e244 100644 --- a/linux-user/Makefile.objs +++ b/linux-user/Makefile.objs @@ -1,7 +1,7 @@ obj-y =3D main.o syscall.o strace.o mmap.o signal.o \ elfload.o linuxload.o uaccess.o uname.o \ safe-syscall.o $(TARGET_ABI_DIR)/signal.o \ - $(TARGET_ABI_DIR)/cpu_loop.o exit.o fd-trans.o clone.o + $(TARGET_ABI_DIR)/cpu_loop.o exit.o fd-trans.o clone.o fd-trans-tb= l.o =20 obj-$(TARGET_HAS_BFLT) +=3D flatload.o obj-$(TARGET_I386) +=3D vm86.o diff --git a/linux-user/fd-trans-tbl.c b/linux-user/fd-trans-tbl.c new file mode 100644 index 0000000000..6afe91096e --- /dev/null +++ b/linux-user/fd-trans-tbl.c @@ -0,0 +1,13 @@ +#include "qemu/osdep.h" +#include "fd-trans.h" + +struct fd_trans_table *fd_trans_table_clone(struct fd_trans_table *tbl) +{ + struct fd_trans_table *new_tbl =3D g_new0(struct fd_trans_table, 1); + new_tbl->fd_max =3D tbl->fd_max; + new_tbl->entries =3D g_new0(TargetFdTrans*, tbl->fd_max); + memcpy(new_tbl->entries, + tbl->entries, + sizeof(*new_tbl->entries) * tbl->fd_max); + return new_tbl; +} diff --git a/linux-user/fd-trans-type.h b/linux-user/fd-trans-type.h new file mode 100644 index 0000000000..06c4427642 --- /dev/null +++ b/linux-user/fd-trans-type.h @@ -0,0 +1,17 @@ +#ifndef FD_TRANS_TYPE_H +#define FD_TRANS_TYPE_H + +/* + * Break out the TargetFdTrans typedefs into a separate file, to break + * the circular dependency between qemu.h and fd-trans.h. 
+ */ + +typedef abi_long (*TargetFdDataFunc)(void *, size_t); +typedef abi_long (*TargetFdAddrFunc)(void *, abi_ulong, socklen_t); +typedef struct TargetFdTrans { + TargetFdDataFunc host_to_target_data; + TargetFdDataFunc target_to_host_data; + TargetFdAddrFunc target_to_host_addr; +} TargetFdTrans; + +#endif /* FD_TRANS_TYPE_H */ diff --git a/linux-user/fd-trans.c b/linux-user/fd-trans.c index c0687c52e6..c552034a5e 100644 --- a/linux-user/fd-trans.c +++ b/linux-user/fd-trans.c @@ -261,9 +261,6 @@ enum { QEMU___RTA_MAX }; =20 -TargetFdTrans **target_fd_trans; -unsigned int target_fd_max; - static void tswap_nlmsghdr(struct nlmsghdr *nlh) { nlh->nlmsg_len =3D tswap32(nlh->nlmsg_len); diff --git a/linux-user/fd-trans.h b/linux-user/fd-trans.h index a3fcdaabc7..07ae04dad7 100644 --- a/linux-user/fd-trans.h +++ b/linux-user/fd-trans.h @@ -16,38 +16,45 @@ #ifndef FD_TRANS_H #define FD_TRANS_H =20 -typedef abi_long (*TargetFdDataFunc)(void *, size_t); -typedef abi_long (*TargetFdAddrFunc)(void *, abi_ulong, socklen_t); -typedef struct TargetFdTrans { - TargetFdDataFunc host_to_target_data; - TargetFdDataFunc target_to_host_data; - TargetFdAddrFunc target_to_host_addr; -} TargetFdTrans; +#include "qemu.h" +#include "fd-trans-type.h" =20 -extern TargetFdTrans **target_fd_trans; - -extern unsigned int target_fd_max; +/* + * Return a duplicate of the given fd_trans_table. This function always + * succeeds. Ownership of the pointed-to table is yielded to the caller. T= he + * caller is responsible for freeing the table when it is no longer in-use. 
+ */ +struct fd_trans_table *fd_trans_table_clone(struct fd_trans_table *tbl); =20 static inline TargetFdDataFunc fd_trans_target_to_host_data(int fd) { - if (fd >=3D 0 && fd < target_fd_max && target_fd_trans[fd]) { - return target_fd_trans[fd]->target_to_host_data; + TaskState *ts =3D (TaskState *)thread_cpu->opaque; + struct fd_trans_table *tbl =3D ts->fd_trans_tbl; + + if (fd >=3D 0 && fd < tbl->fd_max && tbl->entries[fd]) { + return tbl->entries[fd]->target_to_host_data; } return NULL; } =20 static inline TargetFdDataFunc fd_trans_host_to_target_data(int fd) { - if (fd >=3D 0 && fd < target_fd_max && target_fd_trans[fd]) { - return target_fd_trans[fd]->host_to_target_data; + TaskState *ts =3D (TaskState *)thread_cpu->opaque; + struct fd_trans_table *tbl =3D ts->fd_trans_tbl; + + if (fd >=3D 0 && fd < tbl->fd_max && tbl->entries[fd]) { + return tbl->entries[fd]->host_to_target_data; } return NULL; } =20 static inline TargetFdAddrFunc fd_trans_target_to_host_addr(int fd) { - if (fd >=3D 0 && fd < target_fd_max && target_fd_trans[fd]) { - return target_fd_trans[fd]->target_to_host_addr; + TaskState *ts =3D (TaskState *)thread_cpu->opaque; + struct fd_trans_table *tbl =3D ts->fd_trans_tbl; + + if (fd >=3D 0 && fd < tbl->fd_max && tbl->entries[fd]) { + return tbl->entries[fd]->target_to_host_addr; } return NULL; } @@ -56,29 +63,41 @@ static inline void fd_trans_register(int fd, TargetFdTr= ans *trans) { unsigned int oldmax; =20 - if (fd >=3D target_fd_max) { - oldmax =3D target_fd_max; - target_fd_max =3D ((fd >> 6) + 1) << 6; /* by slice of 64 entries = */ - target_fd_trans =3D g_renew(TargetFdTrans *, - target_fd_trans, target_fd_max); - memset((void *)(target_fd_trans + oldmax), 0, - (target_fd_max - oldmax) * sizeof(TargetFdTrans *)); + TaskState *ts =3D (TaskState *)thread_cpu->opaque; + struct fd_trans_table *tbl =3D ts->fd_trans_tbl; + + /* + * TODO: This is racy. Updates to tbl->entries should be guarded by + * a lock. 
+ */ + if (fd >=3D tbl->fd_max) { + oldmax =3D tbl->fd_max; + tbl->fd_max =3D ((fd >> 6) + 1) << 6; /* by slice of 64 entries */ + tbl->entries =3D g_renew(TargetFdTrans *, tbl->entries, tbl->fd_ma= x); + memset((void *)(tbl->entries + oldmax), 0, + (tbl->fd_max - oldmax) * sizeof(TargetFdTrans *)); } - target_fd_trans[fd] =3D trans; + tbl->entries[fd] =3D trans; } =20 static inline void fd_trans_unregister(int fd) { - if (fd >=3D 0 && fd < target_fd_max) { - target_fd_trans[fd] =3D NULL; + TaskState *ts =3D (TaskState *)thread_cpu->opaque; + struct fd_trans_table *tbl =3D ts->fd_trans_tbl; + + if (fd >=3D 0 && fd < tbl->fd_max) { + tbl->entries[fd] =3D NULL; } } =20 static inline void fd_trans_dup(int oldfd, int newfd) { + TaskState *ts =3D (TaskState *)thread_cpu->opaque; + struct fd_trans_table *tbl =3D ts->fd_trans_tbl; + fd_trans_unregister(newfd); - if (oldfd < target_fd_max && target_fd_trans[oldfd]) { - fd_trans_register(newfd, target_fd_trans[oldfd]); + if (oldfd >=3D 0 && oldfd < tbl->fd_max && tbl->entries[oldfd]) { + fd_trans_register(newfd, tbl->entries[oldfd]); } } =20 diff --git a/linux-user/main.c b/linux-user/main.c index 3597e99bb1..d1ed0f6120 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -796,6 +796,7 @@ int main(int argc, char **argv, char **envp) ts->bprm =3D &bprm; cpu->opaque =3D ts; task_settid(ts); + ts->fd_trans_tbl =3D g_new0(struct fd_trans_table, 1); =20 ret =3D loader_exec(execfd, exec_path, target_argv, target_environ, re= gs, info, &bprm); diff --git a/linux-user/qemu.h b/linux-user/qemu.h index ce902f5132..989e01ad8d 100644 --- a/linux-user/qemu.h +++ b/linux-user/qemu.h @@ -5,6 +5,7 @@ #include "cpu.h" #include "exec/exec-all.h" #include "exec/cpu_ldst.h" +#include "fd-trans-type.h" =20 #undef DEBUG_REMAP #ifdef DEBUG_REMAP @@ -96,6 +97,22 @@ struct emulated_sigtable { target_siginfo_t info; }; =20 +/* + * The fd_trans_table is used the FD data translation subsystem to + * find FD data translators (i.e. functions). 
`entries` is an array of poi= nters + * with size `fd_max`, containing pointers to TargetFDTrans structs. A poi= nter + * to a struct of this type is stored TaskState, which allows the struct i= tself + * to be shared by all tasks (e.g., threads) that share a file descriptor + * namespace. Storing a pointer to this table in the TaskState struct is n= eeded + * to support rare cases where tasks share an address space, but do not sh= are + * a set of file descriptors (e.g., after clone(CLONE_VM) when CLONE_FILES= is + * not set). See `fd-trans.h` for more info on the FD translation subsyste= m. + */ +struct fd_trans_table { + uint64_t fd_max; + TargetFdTrans **entries; +}; + /* NOTE: we force a big alignment so that the stack stored after is aligned too */ typedef struct TaskState { @@ -153,6 +170,13 @@ typedef struct TaskState { =20 /* This thread's sigaltstack, if it has one */ struct target_sigaltstack sigaltstack_used; + + /* + * A pointer to the FD trans table to be used by this task. Note that = the + * task doesn't have exclusive control of the fd_trans_table so access= to + * the table itself should be guarded. + */ + struct fd_trans_table *fd_trans_tbl; } __attribute__((aligned(16))) TaskState; =20 extern char *exec_path; diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 7ce021cea2..ff1d07871f 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -6013,6 +6013,18 @@ static int do_fork(CPUArchState *env, unsigned int f= lags, abi_ulong newsp, ts->info =3D parent_ts->info; ts->signal_mask =3D parent_ts->signal_mask; =20 + if (flags & CLONE_FILES) { + ts->fd_trans_tbl =3D parent_ts->fd_trans_tbl; + } else { + /* + * When CLONE_FILES is not set, the parent and child will have + * different file descriptor tables, so we need a new + * fd_trans_tbl. Clone from parent_ts, since child inherits all our + * file descriptors. 
+ */ + ts->fd_trans_tbl =3D fd_trans_table_clone(parent_ts->fd_trans_tbl); + } + if (flags & CLONE_CHILD_CLEARTID) { ts->child_tidptr =3D child_tidptr; } --=20 2.27.0.290.gba653c62da-goog From nobody Sun May 19 12:13:31 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=reject dis=none) header.from=google.com ARC-Seal: i=1; a=rsa-sha256; t=1591926471; cv=none; d=zohomail.com; s=zohoarc; b=J4aLV/HdGELV+NOSAe2w7m/Iym3IHKG/H5aLaCIssTPJEUxZT2V2LJRHUhApWYoio8WLbdupISfD6yNYjKFWCfRd6T/ckAv/CSNpC2zZ5HR9LDfSZ4fMYhNWw765Gof01krXt/q0uMloTlnMXsbUpSeyKrzLoIKybSfwQpyb2lc= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1591926471; h=Content-Type:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=DK7j++nEwmx0IXJsl6/327fR/1xlRgvEHDAk8673aKE=; b=RSgxKMkIzqAFpy6KdSzMZdquMFXEebYzKqPRHEEtSPkCWFbUhgpNeNWOP5brQQmT7x4Dbqe8AT1pbw3k3jz/0TW61qZ5DCHUF9hV+XSXVwlI1N9lPqOGvtyTa/zow4MdN6Am49Z9VUS2CZavrjMlX06MvEwOp6hMfVUvSCwmxgg= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=reject dis=none) header.from= Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1591926471199491.75171009403516; Thu, 11 Jun 2020 18:47:51 -0700 (PDT) Received: from localhost ([::1]:53604 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jjYnN-0002cK-SK for importer@patchew.org; Thu, 11 Jun 2020 21:47:49 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51052) by lists.gnu.org with esmtps 
(TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3Z97iXgMKCqwVWlSaaSXQ.OaYcQYg-PQhQXZaZSZg.adS@flex--jkz.bounces.google.com>) id 1jjYm0-0000yt-FG for qemu-devel@nongnu.org; Thu, 11 Jun 2020 21:46:24 -0400 Received: from mail-yb1-xb49.google.com ([2607:f8b0:4864:20::b49]:49577) by eggs.gnu.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3Z97iXgMKCqwVWlSaaSXQ.OaYcQYg-PQhQXZaZSZg.adS@flex--jkz.bounces.google.com>) id 1jjYly-0001V3-H0 for qemu-devel@nongnu.org; Thu, 11 Jun 2020 21:46:24 -0400 Received: by mail-yb1-xb49.google.com with SMTP id o140so8738216yba.16 for ; Thu, 11 Jun 2020 18:46:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=DK7j++nEwmx0IXJsl6/327fR/1xlRgvEHDAk8673aKE=; b=LF8mekIt78DHARO9lp2gqgy8Xomvq7GbJpzWLfpELWN/OOC8/kvZzUBesI9jwV2CX7 Zbw2i5xlfBEt2pm9rUkZZCfKBnED6jKUOp/aQnK5hzbO5YIsyZjxn7l/s3OfMMFZbsXG eImFm2ZxFIZFN188f3uCNgwUw9doEkyt5wDxRkiLtw1J5f9Lm03qm4bw8QbcCaS5Du+R sc81WFtlVAET/gXNGnYeUe1Fq/2yAmgr6kFV4V96AdayTHLgCDjuEGnoct0Wgwpneas9 Fjh0gbulpxyCCNT2xCayB6SbhlVGOkDGWZG4mkVnIil3H9oT7IGKb0ZrLI7BddbxZfBP dCLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=DK7j++nEwmx0IXJsl6/327fR/1xlRgvEHDAk8673aKE=; b=gGJREPnl7PrTn9TSTY9s/EUtWCeEoi7Ac8G+d8RubMSa4w6Zo8mmRoT9+UVdkpRDIE TW29BznnzoMkOnsQivEJ7ljy4kQ9bXilq3scSVDYsUfTCKbi1xoWSY4Iy2vCEhx/m5uG V4/dCfLLYhmkKeRR9pu52UvOlgtzdc7HG7HP1KoYifsp6ALzEK3XzLpZO2azqMg+fgV1 GpDouUhJMeJGqOYcfvTYVHP3YyjBaQaL/tdRUDzdBt2KP+eTaUie2PFQrVvgpDoOpP+5 0zKRSjAgGZwoKp4Gt8kPV+sS5ZyqttILZiqPT5yJHE7q9GyRIqWVG+DBPhOVsDkWwWrn 9e4Q== X-Gm-Message-State: AOAM5328TwGSobXFQF1PLBwY5NfKigqzJVLfDtytIQnxULmiPcbIcPHm B6RTEO+NbNQG+JMKOh1x6ITu/KqbJDXUkKzwlHJcytr/4WKs3ZKnrvZ5RPDKAJ85HULvbqx4gFJ 
ey/4xrxnrD3CfytL3xKamgsUQ5vlv8bgklaKM84hBAgCdvo7/qVlI X-Google-Smtp-Source: ABdhPJwEkT/xjV3Iu9ncqRphNSpQ+qOLGtX2ke72xAkTOx6Rp/bRLRcr9avVPGd4oQSr8BXgpHtRAks= X-Received: by 2002:a25:ec0d:: with SMTP id j13mr17129494ybh.364.1591926375093; Thu, 11 Jun 2020 18:46:15 -0700 (PDT) Date: Thu, 11 Jun 2020 18:46:04 -0700 In-Reply-To: <20200612014606.147691-1-jkz@google.com> Message-Id: <20200612014606.147691-4-jkz@google.com> Mime-Version: 1.0 References: <20200612014606.147691-1-jkz@google.com> X-Mailer: git-send-email 2.27.0.290.gba653c62da-goog Subject: [PATCH 3/5] linux-user: Make sigact_table part of the task state. From: Josh Kunz To: qemu-devel@nongnu.org Cc: riku.voipio@iki.fi, laurent@vivier.eu, alex.bennee@linaro.org, Josh Kunz Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::b49; envelope-from=3Z97iXgMKCqwVWlSaaSXQ.OaYcQYg-PQhQXZaZSZg.adS@flex--jkz.bounces.google.com; helo=mail-yb1-xb49.google.com X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know. X-Spam_score_int: -105 X-Spam_score: -10.6 X-Spam_bar: ---------- X-Spam_report: (-10.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-1, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, USER_IN_DEF_DKIM_WL=-7.5 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: pass (identity @google.com) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" sigact_table stores the signal handlers for the given process. 
Once we support CLONE_VM, two tasks using the same virtual memory may need different signal handler tables (e.g., if CLONE_SIGHAND is not provided). Here we make sigact_table part of the TaskState, so it can be duplicated as needed when cloning children. Signed-off-by: Josh Kunz --- linux-user/qemu.h | 8 ++++++++ linux-user/signal.c | 35 +++++++++++++++++++++++++++-------- linux-user/syscall.c | 17 +++++++++++++++++ 3 files changed, 52 insertions(+), 8 deletions(-) diff --git a/linux-user/qemu.h b/linux-user/qemu.h index 989e01ad8d..54bf4f47be 100644 --- a/linux-user/qemu.h +++ b/linux-user/qemu.h @@ -177,6 +177,12 @@ typedef struct TaskState { * the table itself should be guarded. */ struct fd_trans_table *fd_trans_tbl; + + /* + * A table containing signal actions for the target. It should have at + * least TARGET_NSIG entries + */ + struct target_sigaction *sigact_tbl; } __attribute__((aligned(16))) TaskState; =20 extern char *exec_path; @@ -419,7 +425,9 @@ void print_syscall_ret(int num, abi_long arg1); */ void print_taken_signal(int target_signum, const target_siginfo_t *tinfo); =20 + /* signal.c */ +struct target_sigaction *sigact_table_clone(struct target_sigaction *orig); void process_pending_signals(CPUArchState *cpu_env); void signal_init(void); int queue_signal(CPUArchState *env, int sig, int si_type, diff --git a/linux-user/signal.c b/linux-user/signal.c index 8cf51ffecd..dc98def6d1 100644 --- a/linux-user/signal.c +++ b/linux-user/signal.c @@ -25,7 +25,13 @@ #include "trace.h" #include "signal-common.h" =20 -static struct target_sigaction sigact_table[TARGET_NSIG]; +struct target_sigaltstack target_sigaltstack_used =3D { + .ss_sp =3D 0, + .ss_size =3D 0, + .ss_flags =3D TARGET_SS_DISABLE, +}; + +typedef struct target_sigaction sigact_table[TARGET_NSIG]; =20 static void host_signal_handler(int host_signum, siginfo_t *info, void *puc); @@ -542,6 +548,11 @@ static void signal_table_init(void) } } =20 +struct target_sigaction *sigact_table_clone(struct 
target_sigaction *orig) +{ + return memcpy(g_new(sigact_table, 1), orig, sizeof(sigact_table)); +} + void signal_init(void) { TaskState *ts =3D (TaskState *)thread_cpu->opaque; @@ -556,6 +567,12 @@ void signal_init(void) /* Set the signal mask from the host mask. */ sigprocmask(0, 0, &ts->signal_mask); =20 + /* + * Set all host signal handlers. ALL signals are blocked during + * the handlers to serialize them. + */ + ts->sigact_tbl =3D (struct target_sigaction *) g_new0(sigact_table, 1); + sigfillset(&act.sa_mask); act.sa_flags =3D SA_SIGINFO; act.sa_sigaction =3D host_signal_handler; @@ -568,9 +585,9 @@ void signal_init(void) host_sig =3D target_to_host_signal(i); sigaction(host_sig, NULL, &oact); if (oact.sa_sigaction =3D=3D (void *)SIG_IGN) { - sigact_table[i - 1]._sa_handler =3D TARGET_SIG_IGN; + ts->sigact_tbl[i - 1]._sa_handler =3D TARGET_SIG_IGN; } else if (oact.sa_sigaction =3D=3D (void *)SIG_DFL) { - sigact_table[i - 1]._sa_handler =3D TARGET_SIG_DFL; + ts->sigact_tbl[i - 1]._sa_handler =3D TARGET_SIG_DFL; } /* If there's already a handler installed then something has gone horribly wrong, so don't even try to handle that case. */ @@ -608,11 +625,12 @@ void force_sig(int sig) #if !defined(TARGET_RISCV) void force_sigsegv(int oldsig) { + TaskState *ts =3D (TaskState *)thread_cpu->opaque; if (oldsig =3D=3D SIGSEGV) { /* Make sure we don't try to deliver the signal again; this will * end up with handle_pending_signal() calling dump_core_and_abort= (). 
*/ - sigact_table[oldsig - 1]._sa_handler =3D TARGET_SIG_DFL; + ts->sigact_tbl[oldsig - 1]._sa_handler =3D TARGET_SIG_DFL; } force_sig(TARGET_SIGSEGV); } @@ -837,6 +855,7 @@ int do_sigaction(int sig, const struct target_sigaction= *act, struct sigaction act1; int host_sig; int ret =3D 0; + TaskState* ts =3D (TaskState *)thread_cpu->opaque; =20 trace_signal_do_sigaction_guest(sig, TARGET_NSIG); =20 @@ -848,7 +867,7 @@ int do_sigaction(int sig, const struct target_sigaction= *act, return -TARGET_ERESTARTSYS; } =20 - k =3D &sigact_table[sig - 1]; + k =3D &ts->sigact_tbl[sig - 1]; if (oact) { __put_user(k->_sa_handler, &oact->_sa_handler); __put_user(k->sa_flags, &oact->sa_flags); @@ -930,7 +949,7 @@ static void handle_pending_signal(CPUArchState *cpu_env= , int sig, sa =3D NULL; handler =3D TARGET_SIG_IGN; } else { - sa =3D &sigact_table[sig - 1]; + sa =3D &ts->sigact_tbl[sig - 1]; handler =3D sa->_sa_handler; } =20 @@ -1022,9 +1041,9 @@ void process_pending_signals(CPUArchState *cpu_env) * looping round and round indefinitely. */ if (sigismember(&ts->signal_mask, target_to_host_signal_table[= sig]) - || sigact_table[sig - 1]._sa_handler =3D=3D TARGET_SIG_IGN= ) { + || ts->sigact_tbl[sig - 1]._sa_handler =3D=3D TARGET_SIG_I= GN) { sigdelset(&ts->signal_mask, target_to_host_signal_table[si= g]); - sigact_table[sig - 1]._sa_handler =3D TARGET_SIG_DFL; + ts->sigact_tbl[sig - 1]._sa_handler =3D TARGET_SIG_DFL; } =20 handle_pending_signal(cpu_env, sig, &ts->sync_signal); diff --git a/linux-user/syscall.c b/linux-user/syscall.c index ff1d07871f..838caf9c98 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -5989,6 +5989,17 @@ static int do_fork(CPUArchState *env, unsigned int f= lags, abi_ulong newsp, return -TARGET_EINVAL; } =20 + if ((flags & CLONE_SIGHAND) && !(flags & CLONE_VM)) { + /* + * Like CLONE_FILES, this flag combination is unsupported. 
If + * CLONE_SIGHAND is specified without CLONE_VM, then we need to ke= ep + * the sigact table in-sync across virtual memory boundaries, whic= h is + * substantially more complicated. + */ + qemu_log_mask(LOG_UNIMP, "CLONE_SIGHAND only supported with CLONE_= VM"); + return -TARGET_EINVAL; + } + pthread_mutex_init(&info.mutex, NULL); pthread_mutex_lock(&info.mutex); pthread_cond_init(&info.cond, NULL); @@ -6025,6 +6036,12 @@ static int do_fork(CPUArchState *env, unsigned int f= lags, abi_ulong newsp, ts->fd_trans_tbl =3D fd_trans_table_clone(parent_ts->fd_trans_tbl); } =20 + if (flags & CLONE_SIGHAND) { + ts->sigact_tbl =3D parent_ts->sigact_tbl; + } else { + ts->sigact_tbl =3D sigact_table_clone(parent_ts->sigact_tbl); + } + if (flags & CLONE_CHILD_CLEARTID) { ts->child_tidptr =3D child_tidptr; } --=20 2.27.0.290.gba653c62da-goog From nobody Sun May 19 12:13:31 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=reject dis=none) header.from=google.com ARC-Seal: i=1; a=rsa-sha256; t=1591926478; cv=none; d=zohomail.com; s=zohoarc; b=Gujp/79kivPFI6SRfxxvSzibs1euLhvGA0eqGBLxMpQg3NdKLFmQqtoilB6ZaNNir7Oqo+zy3a7dlQ5RJkmLjas6Wo3R0jMd//SD10ifbCiPc3GMcKGVYGmxRaP+0JlYVqS12lhmvcTFFFFd4rL/t59RJVa0ffsi94nl6Qcqy8Y= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1591926478; h=Content-Type:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=Vp4OVl46XRZBSe+oi9pU4mOxTCIGyGyo/JRsd2TSg+Q=; b=mssC//qu4u72FuG1ULn6OJV1UlbpjtBZxVOsCW6TwVZzUNcZU5sR9T1+J+Nz9RBF7jZRFGnniq3ssUjZbEzWLR9at5QukZhTtAkvXCzcyretDT1n3zsWy3EW4V4d0uFfsE2sMsroH8K6/IIZHXGL7dUolZ5jfpU5LsTX83hJLCo= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass 
(zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass header.from= (p=reject dis=none) header.from= Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1591926478283989.4605096984161; Thu, 11 Jun 2020 18:47:58 -0700 (PDT) Received: from localhost ([::1]:54210 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jjYnU-0002rd-SQ for importer@patchew.org; Thu, 11 Jun 2020 21:47:56 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51100) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from <3ad7iXgMKCq4XYnUccUZS.QcaeSai-RSjSZbcbUbi.cfU@flex--jkz.bounces.google.com>) id 1jjYm3-00015i-Vt for qemu-devel@nongnu.org; Thu, 11 Jun 2020 21:46:28 -0400 Received: from mail-yb1-xb49.google.com ([2607:f8b0:4864:20::b49]:54976) by eggs.gnu.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from <3ad7iXgMKCq4XYnUccUZS.QcaeSai-RSjSZbcbUbi.cfU@flex--jkz.bounces.google.com>) id 1jjYm0-0001VA-Dd for qemu-devel@nongnu.org; Thu, 11 Jun 2020 21:46:27 -0400 Received: by mail-yb1-xb49.google.com with SMTP id p22so8734277ybg.21 for ; Thu, 11 Jun 2020 18:46:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Vp4OVl46XRZBSe+oi9pU4mOxTCIGyGyo/JRsd2TSg+Q=; b=Adf0hl3VTIFRgGdDeZA2K0o6Fm9yW4o0neP4mUgr1nzYS4LDfOi1wtxiizgedyKp21 3E9NfBAAiyRKLKn6ypv2b6y8v1PvxSkDpPp8EPa9bTx9lmma9dFcfnlFt0MuH58VbX77 SPLNiVUM3aGuBTP6Oerant2kzNUJ+MeTn938wfObJuHDSa849lTjFxPe8IOWvszueqM6 m4jkoYg39wuermUDhKdW6O923i2iahVldV5gmXcrj/4F0lcPSH7HVpgypPLBHpfP/TbH fnACzyoKJfcF/Mt2AlT5rfzJKOJCCWY/cG+IBvhK8+isBsheFF78H7jTOnuamWeI12IK qPDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; 
h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Vp4OVl46XRZBSe+oi9pU4mOxTCIGyGyo/JRsd2TSg+Q=; b=uQDKO4cBn2emmCXfw11yCj5cTlwasqvimufFPOUJwZqJoxpYXHvJNypwTn2pdz7MnB u+jJF0JWxOlELRuJ7Vj33VLeJpfBew+dVD9NE0plBDq995Ze6EVt0XkxUDZYiSd6urEO v9gGs8SwEN0/CFhQO+YExQAT50TL0MQzDaPHsEGiDokAkhuKUXgMhi8MvsvFsJY4CCg/ YvIVYjBidnZFrWSksfcck9LVNS4q3SMljA/XfmHYtx1+SDnnNcT8P/Ie65DLEM9lkM4g L/a8Bcv0Ga6dcnJQGU5/RnwclcHa2FPpywlNAfHziiyI7CUtpzlhT5Vr5GRH/Ooq87Cg /LKQ== X-Gm-Message-State: AOAM532WJ4s0lv4tfxLwKGro/6mfr2hhECAHj6CNsY0ZCVzU6YpmXpKd 87xqCgUHkbr+UWNS+PcIpl2ngPd1t0oyxzNWd75xsX3g9Xo+RjUBxMHAsq1xysn1b8yAQ5EzAs/ pxK8NAa5xOC7t7CLCwXrJgSa5mG2uV0M+HM0wX2ic5C+xhcoXJofe X-Google-Smtp-Source: ABdhPJxIjZoFRRW6HDdtHlpf7w3REoolueKDU90GZiXKTLrad/KVynm57QPQRb3Og5nyD4BbixqOp20= X-Received: by 2002:a25:cfcd:: with SMTP id f196mr17883823ybg.142.1591926377140; Thu, 11 Jun 2020 18:46:17 -0700 (PDT) Date: Thu, 11 Jun 2020 18:46:05 -0700 In-Reply-To: <20200612014606.147691-1-jkz@google.com> Message-Id: <20200612014606.147691-5-jkz@google.com> Mime-Version: 1.0 References: <20200612014606.147691-1-jkz@google.com> X-Mailer: git-send-email 2.27.0.290.gba653c62da-goog Subject: [PATCH 4/5] linux-user: Support CLONE_VM and extended clone options From: Josh Kunz To: qemu-devel@nongnu.org Cc: riku.voipio@iki.fi, laurent@vivier.eu, alex.bennee@linaro.org, Josh Kunz Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::b49; envelope-from=3ad7iXgMKCq4XYnUccUZS.QcaeSai-RSjSZbcbUbi.cfU@flex--jkz.bounces.google.com; helo=mail-yb1-xb49.google.com X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know. 
X-Spam_score_int: -105 X-Spam_score: -10.6 X-Spam_bar: ---------- X-Spam_report: (-10.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-1, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, USER_IN_DEF_DKIM_WL=-7.5 autolearn=_AUTOLEARN X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail-DKIM: pass (identity @google.com) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

The `clone` system call can be used to create new processes that share attributes with their parents, such as virtual memory, file system location, file descriptor tables, etc. These can be useful to a variety of guest programs.

Before this patch, QEMU supported only a limited set of these attributes: essentially the ones needed for threads, and the options used by fork. This change adds support for all flag combinations involving CLONE_VM. In theory, almost all clone options could be supported, but invocations not using CLONE_VM are likely to run afoul of linux-user's inherently multi-threaded design.

To add this support, this patch updates the `qemu_clone` helper. An overview of the mechanism used to support general `clone` options with CLONE_VM is described below.

This patch also enables by default the `clone` unit tests in tests/tcg/multiarch/linux-test.c, and adds an additional test for duplicate exit signals, based on a bug found during development.

!! Overview

Adding support for CLONE_VM is tricky. The parent and child processes will share an address space (similar to threads), so the emulator must coordinate between the parent and the child. Currently, QEMU relies heavily on Thread Local Storage (TLS) as part of this coordination strategy.
For threads, this works fine, because libc manages the thread-local data region used for TLS when we create new threads using `pthread_create`. Ideally we could use the same mechanism for the "process-local storage" needed to allow the parent/child processes to emulate in tandem.

Unfortunately TLS is tightly integrated into libc. The only way to create TLS data regions is via the `pthread_create` API, which also spawns a new thread (rather than a new process, which is what we want). Worse still, TLS itself is a complicated arch-specific feature that is tightly integrated into the rest of libc and the dynamic linker. Re-implementing TLS support for QEMU would likely require a special dynamic linker / libc. Alternatively, the popular libcs could be extended to allow users to create TLS regions without creating threads. Even if major libcs decide to add this support, QEMU will still need a temporary workaround until those libcs are widely deployed. It's also unclear if libcs will be interested in supporting this case, since TLS image creation is generally deeply integrated with thread setup.

In this patch, I've employed an alternative approach: spawning a thread and "stealing" its TLS image for use in the child process. This approach leaves a dangling thread while the TLS image is in use, but by design that thread will not become schedulable until after the TLS data is no longer in use by the child (as described in a moment). Therefore, it should cause relatively minimal overhead. When considered in the larger context, this seems like a reasonable tradeoff.

A major complication of this approach is knowing when it is safe to clean up the stack and TLS image used by a child process. When a child is created with `CLONE_VM`, its stack and TLS data need to remain valid until that child has either exited or successfully called `execve` (on `execve` the kernel gives the child a new address space).
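The "parked thread" idea from the paragraphs above can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the mechanism in this patch: the patch fetches an arch-specific TLS pointer and blocks the thread on a lock, whereas this sketch uses the address of a thread-local variable as a stand-in for that pointer and a condition variable for the parking. `struct tls_donor`, `tls_donor_start`, `tls_donor_release` and `current_tls_marker` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Each thread gets its own copy; its address stands in for a TLS pointer. */
static __thread int tls_marker;

/* Hypothetical bookkeeping for one parked donor thread. */
struct tls_donor {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    void *tls_addr;   /* published by the donor thread */
    int published;    /* set once tls_addr is valid */
    int released;     /* set when the donor may exit */
};

static void *donor_main(void *arg)
{
    struct tls_donor *d = arg;

    pthread_mutex_lock(&d->lock);
    d->tls_addr = &tls_marker;
    d->published = 1;
    pthread_cond_broadcast(&d->cond);
    /* Park here: the TLS block stays allocated while this thread lives. */
    while (!d->released) {
        pthread_cond_wait(&d->cond, &d->lock);
    }
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Start the donor and wait for its TLS address; NULL on failure. */
static void *tls_donor_start(struct tls_donor *d)
{
    void *addr;

    pthread_mutex_init(&d->lock, NULL);
    pthread_cond_init(&d->cond, NULL);
    d->tls_addr = NULL;
    d->published = 0;
    d->released = 0;
    if (pthread_create(&d->thread, NULL, donor_main, d) != 0) {
        return NULL;
    }
    pthread_mutex_lock(&d->lock);
    while (!d->published) {
        pthread_cond_wait(&d->cond, &d->lock);
    }
    addr = d->tls_addr;
    pthread_mutex_unlock(&d->lock);
    return addr;
}

/* Let the donor exit; after this its TLS block must no longer be used. */
static void tls_donor_release(struct tls_donor *d)
{
    pthread_mutex_lock(&d->lock);
    assert(d->published);
    d->released = 1;
    pthread_cond_broadcast(&d->cond);
    pthread_mutex_unlock(&d->lock);
    pthread_join(d->thread, NULL);
    pthread_cond_destroy(&d->cond);
    pthread_mutex_destroy(&d->lock);
}

/* Address of the calling thread's own copy of the marker. */
static void *current_tls_marker(void)
{
    return &tls_marker;
}
```

While the donor is parked, the address it published differs from the caller's own copy of the same variable, which is what makes the donor's region usable as separate storage. Deciding when to call something like `tls_donor_release` is the hard part.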
One approach would be to use `waitid(WNOWAIT)` (`WNOWAIT` leaves the child in a waitable state, so the guest can still reap it). The problem is that the `wait` family of calls only waits for termination. The pattern of `clone() ... execve()` for long running child processes is pretty common. If we waited for child processes to exit, it's likely we would end up using substantially more memory, and keep the suspended TLS thread around much longer than necessary.

Instead, in this patch, I've used a "trampoline" process. The real parent first clones a trampoline; the trampoline then clones the ultimate child using the `CLONE_VFORK` option. `CLONE_VFORK` suspends the trampoline process until the child has exited, or called `execve`. Once the trampoline is re-scheduled, we know it is safe to clean up after the child. This creates one more suspended process, but typically, the trampoline only exists for a short period of time.

!! CLONE_VM setup, step by step

1. First, the suspended thread whose TLS we will use is created using `pthread_create`. The thread fetches and returns its "TLS pointer" (an arch-specific value given to the kernel) to the parent. It then blocks on a lock to prevent its TLS data from being cleaned up. Ultimately the lock will be unlocked by the trampoline once the child exits.

2. Once the TLS thread has fetched the TLS pointer, it notifies the real parent thread, which calls `clone()` to create the trampoline process. For ease of implementation, the TLS image is set for the trampoline process during this step. This allows the trampoline to use functions that require TLS if needed (e.g., printf). TLS location is inherited when a new child is spawned, so this TLS data will automatically be inherited by the child.

3. Once the trampoline has been spawned, it registers itself as a "hidden" process with the signal subsystem. This prevents the exit signal from the trampoline from ever being forwarded to the guest.
This is needed due to the way that Linux sets the exit signal for the ultimate child when `CLONE_PARENT` is set. See the source for details.

4. Once setup is complete, the trampoline spawns the final child with the original clone flags, plus `CLONE_PARENT`, so the child is correctly parented to the kernel task on which the guest invoked `clone`. Without this, kernel features like PDEATHSIG, and subreapers, would not work properly. As previously discussed, the trampoline also supplies `CLONE_VFORK` so that it is suspended until the child can be cleaned up.

5. Once the child is spawned, it signals the original parent thread that it is running. At this point, the trampoline process is suspended (due to CLONE_VFORK).

6. Finally, the call to `qemu_clone` in the parent returns, and the child begins executing the given callback function in the new child process.

!! Cleaning up

Clean up itself is a multi-step process. Once the child exits, or is killed by a signal (cleanup is the same in both cases), the trampoline process becomes schedulable. When the trampoline is scheduled, it frees the child stack, and unblocks the suspended TLS thread. This cleans up the child resources, but not the stack used by the trampoline itself. It is possible for a process to clean up its own stack, but it is tricky, and architecture-specific. Instead we leverage the TLS manager thread to clean up the trampoline stack.

When the trampoline is cloned (in step 2 above), we additionally set the `CLONE_CHILD_SETTID` and `CLONE_CHILD_CLEARTID` flags. The target location for the SET/CLEAR TID is set to a special field known by the TLS manager. Then, when the TLS manager thread is unsuspended, it performs an additional `FUTEX_WAIT` on this location. That blocks the TLS manager thread until the trampoline has fully exited; the TLS manager thread then frees the trampoline process's stack, before exiting itself.

!! Shortcomings of this patch

* It's complicated.
* It doesn't support any clone options when CLONE_VM is omitted.
* It doesn't properly clean up the CPU queue when the child process terminates, or calls execve().
* RCU unregistration is done in the trampoline process (in clone.c), but registration happens in syscall.c. This should be made more explicit.
* The TLS image, and trampoline stack are not cleaned up if the parent calls `execve` or `exit_group` before the child does. This is because those cleanup tasks are handled by the TLS manager thread. The TLS manager thread is in the same thread group as the parent, so it will be terminated if the parent exits or calls `execve`.

!! Alternatives considered

* Non-standard libc extension to allow creating TLS images independent of threads. This would allow us to just `clone` the child directly instead of this complicated maneuver, though we probably would still need the cleanup logic. For libcs, TLS image allocation is tightly connected to thread stack allocation, which is also arch-specific. I do not have enough experience with libc development to know if maintainers of any popular libcs would be open to supporting such an API. Additionally, since it will probably take years before a libc fix would be widely deployed, we need an interim solution anyways.

* Non-standard, Linux-only, libc extension to allow us to specify the CLONE_* flags used by `pthread_create`. The processes we are creating are basically threads in a different thread group. If we could alter the flags used, this whole process could become a `pthread_create`. The problem with this approach is that I don't know what requirements pthreads has on threads to ensure they function properly. I suspect that pthreads relies on CLONE_CHILD_CLEARTID + FUTEX_WAKE to clean up detached thread state. Since we don't control the child exit reason (Linux only handles CLONE_CHILD_CLEARTID on normal, non-signal process termination), we probably can't use this same tracking mechanism.
* Other mechanisms for detecting child exit so cleanup can happen besides CLONE_VFORK: * waitid(WNOWAIT): This can only detect exit, not execve. * file descriptors with close on exec set: This cannot detect children cloned with CLONE_FILES. * System V semaphore adjustments: Cannot detect children cloned with CLONE_SYSVSEM. * CLONE_CHILD_CLEARTID + FUTEX_WAIT: Cannot detect abnormally terminated children. * Doing the child clone directly in the TLS manager thread: This saves the need for the trampoline process, but it causes the child process to be parented to the wrong kernel task (the TLS thread instead of the Main thread) breaking things like PDEATHSIG. Signed-off-by: Josh Kunz --- linux-user/clone.c | 415 ++++++++++++++++++++++++++++++- linux-user/qemu.h | 17 ++ linux-user/signal.c | 49 ++++ linux-user/syscall.c | 69 +++-- tests/tcg/multiarch/linux-test.c | 67 ++++- 5 files changed, 592 insertions(+), 25 deletions(-) diff --git a/linux-user/clone.c b/linux-user/clone.c index f02ae8c464..3f7344cf9e 100644 --- a/linux-user/clone.c +++ b/linux-user/clone.c @@ -12,6 +12,12 @@ #include #include =20 +/* arch-specifc includes needed to fetch the TLS base offset. */ +#if defined(__x86_64__) +#include +#include +#endif + static const unsigned long NEW_STACK_SIZE =3D 0x40000UL; =20 /* @@ -62,6 +68,397 @@ static void completion_finish(struct completion *c) pthread_mutex_unlock(&c->mu); } =20 +struct tls_manager { + void *tls_ptr; + /* fetched is completed once tls_ptr has been set by the thread. */ + struct completion fetched; + /* + * spawned is completed by the user once the managed_tid + * has been spawned. + */ + struct completion spawned; + /* + * TID of the child whose memory is cleaned up upon death. This memory + * location is used as part of a futex op, and is cleared by the kernel + * since we specify CHILD_CLEARTID. 
+ */ + int managed_tid; + /* + * The value to be `free`'d up once the janitor is ready to clean up t= he + * TLS section, and the managed tid has exited. + */ + void *cleanup; +}; + +/* + * tls_ptr fetches the TLS "pointer" for the current thread. This pointer + * should be whatever platform-specific address is used to represent the T= LS + * base address. + */ +static void *tls_ptr() +{ + void *ptr; +#if defined(__x86_64__) + /* + * On x86_64, the TLS base is stored in the `fs` segment register, we = can + * fetch it with `ARCH_GET_FS`: + */ + (void)syscall(SYS_arch_prctl, ARCH_GET_FS, (unsigned long) &ptr); +#else + ptr =3D NULL; +#endif + return ptr; +} + +/* + * clone_vm_supported returns true if clone_vm() is supported on this + * platform. + */ +static bool clone_vm_supported() +{ +#if defined(__x86_64__) + return true; +#else + return false; +#endif +} + +static void *tls_manager_thread(void *arg) +{ + struct tls_manager *mgr =3D (struct tls_manager *) arg; + int child_tid, ret; + + /* + * NOTE: Do not use an TLS in this thread until after the `spawned` + * completion is finished. We need to preserve the pristine state of + * the TLS image for this thread, so it can be re-used in a separate + * process. + */ + mgr->tls_ptr =3D tls_ptr(); + + /* Notify tls_new that we finished fetching the TLS ptr. */ + completion_finish(&mgr->fetched); + + /* + * Wait for the user of our TLS to tell us the child using our TLS has + * been spawned. + */ + completion_await(&mgr->spawned); + + child_tid =3D atomic_fetch_or(&mgr->managed_tid, 0); + /* + * Check if the child has already terminated by this point. If not, wa= it + * for the child to exit. As long as the trampoline is not killed by + * a signal, the kernel guarantees that the memory at &mgr->managed_tid + * will be cleared, and a FUTEX_WAKE at that address will triggered. 
+ */ + if (child_tid !=3D 0) { + ret =3D syscall(SYS_futex, &mgr->managed_tid, FUTEX_WAIT, + child_tid, NULL, NULL, 0); + assert(ret =3D=3D 0 && "clone manager futex should always succeed"= ); + } + + free(mgr->cleanup); + g_free(mgr); + + return NULL; +} + +static struct tls_manager *tls_manager_new() +{ + struct tls_manager *mgr =3D g_new0(struct tls_manager, 1); + sigset_t block, oldmask; + + sigfillset(&block); + if (sigprocmask(SIG_BLOCK, &block, &oldmask) !=3D 0) { + return NULL; + } + + completion_init(&mgr->fetched); + completion_init(&mgr->spawned); + + pthread_attr_t attr; + pthread_attr_init(&attr); + pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED); + + pthread_t unused; + if (pthread_create(&unused, &attr, tls_manager_thread, (void *) mgr)) { + pthread_attr_destroy(&attr); + g_free(mgr); + return NULL; + } + pthread_attr_destroy(&attr); + completion_await(&mgr->fetched); + + if (sigprocmask(SIG_SETMASK, &oldmask, NULL) !=3D 0) { + /* Let the thread exit, and cleanup itself. */ + completion_finish(&mgr->spawned); + return NULL; + } + + /* Once we finish awaiting, the tls_ptr will be usable. */ + return mgr; +} + +struct stack { + /* Buffer is the "base" of the stack buffer. */ + void *buffer; + /* Top is the "start" of the stack (since stack addresses "grow down")= . */ + void *top; +}; + +struct info { + /* Stacks used for the trampoline and child process. */ + struct { + struct stack trampoline; + struct stack process; + } stack; + struct completion child_ready; + /* `clone` flags for the process the user asked us to make. */ + int flags; + sigset_t orig_mask; + /* + * Function to run in the ultimate child process, and payload to pass = as + * the argument. + */ + int (*clone_f)(void *); + void *payload; + /* + * Result of calling `clone` for the child clone. Will be set to + * `-errno` if an error occurs. 
+ */ + int result; +}; + +static bool stack_new(struct stack *stack) +{ + /* + * TODO: put a guard page at the bottom of the stack, so we don't + * accidentally roll off the end. + */ + if (posix_memalign(&stack->buffer, 16, NEW_STACK_SIZE)) { + return false; + } + memset(stack->buffer, 0, NEW_STACK_SIZE); + stack->top =3D stack->buffer + NEW_STACK_SIZE; + return true; +} + +static int clone_child(void *raw_info) +{ + struct info *info =3D (struct info *) raw_info; + int (*clone_f)(void *) =3D info->clone_f; + void *payload =3D info->payload; + if (!(info->flags & CLONE_VFORK)) { + /* + * If CLONE_VFORK is NOT set, then the trampoline has stalled (it + * forces VFORK), but the actual clone should return immediately. = In + * this case, this thread needs to notify the parent that the new + * process is running. If CLONE_VFORK IS set, the trampoline will + * notify the parent once the normal kernel vfork completes. + */ + completion_finish(&info->child_ready); + } + if (sigprocmask(SIG_SETMASK, &info->orig_mask, NULL) !=3D 0) { + perror("failed to restore signal mask in cloned child"); + _exit(1); + } + return clone_f(payload); +} + +static int clone_trampoline(void *raw_info) +{ + struct info *info =3D (struct info *) raw_info; + int flags; + + struct stack process_stack =3D info->stack.process; + int orig_flags =3D info->flags; + + if (orig_flags & CSIGNAL) { + /* + * It should be safe to call here, since we know signals are block= ed + * for this process. + */ + hide_current_process_exit_signal(); + } + + /* + * Force CLONE_PARENT, so that we don't accidentally become a child of= the + * trampoline thread. This kernel task should either be a child of the + * trampoline's parent (if CLONE_PARENT is not in info->flags), or a c= hild + * of the calling process's parent (if CLONE_PARENT IS in info->flags). + * That is to say, our parent should always be the correct parent for = the + * child task. 
+ * + * Force CLONE_VFORK so that we know when the child is no longer holdi= ng + * a reference to this process's virtual memory. CLONE_VFORK just susp= ends + * this task until the child execs or exits, it should not affect how = the + * child process is created in any way. This is the only generic way I= 'm + * aware of to observe *any* exit or exec. Including "abnormal" exits = like + * exits via signals. + * + * Force CLONE_CHILD_SETTID, since we want to track the CHILD TID in t= he + * `info` structure. Capturing the child via `clone` call directly is + * slightly nicer than making a syscall in the child. Since we know we= 're + * doing a CLONE_VM here, we can use CLONE_CHILD_SETTID, to guarantee = that + * the kernel must set the child TID before the child is run. The child + * TID should be visibile to the parent, since both parent and child s= hare + * and address space. If the clone fails, we overwrite `info->result` + * anyways with the error code. + */ + flags =3D orig_flags | CLONE_PARENT | CLONE_VFORK | CLONE_CHILD_SETTID; + if (clone(clone_child, info->stack.process.top, flags, + (void *) info, NULL, NULL, &info->result) < 0) { + info->result =3D -errno; + completion_finish(&info->child_ready); + return 0; + } + + /* + * Clean up the child process stack, since we know the child can no lo= nger + * reference it. + */ + free(process_stack.buffer); + + /* + * We know the process we created was CLONE_VFORK, so it registered wi= th + * the RCU. We share a TLS image with the process, so we can unregister + * it from the RCU. Since the TLS image will be valid for at least our + * lifetime, it should be OK to leave the child processes RCU entry in + * the queue between when the child execve or exits, and the OS returns + * here from our vfork. + */ + rcu_unregister_thread(); + + /* + * If we're doing a real vfork here, we need to notify the parent that= the + * vfork has happened. 
+ */ + if (orig_flags & CLONE_VFORK) { + completion_finish(&info->child_ready); + } + + return 0; +} + +static int clone_vm(int flags, int (*callback)(void *), void *payload) +{ + struct info info; + sigset_t sigmask; + int ret; + + assert(flags & CLONE_VM && "CLONE_VM flag must be set"); + + memset(&info, 0, sizeof(info)); + info.clone_f =3D callback; + info.payload =3D payload; + info.flags =3D flags; + + /* + * Set up the stacks for the child processes needed to execute the clo= ne. + */ + if (!stack_new(&info.stack.trampoline)) { + return -1; + } + if (!stack_new(&info.stack.process)) { + free(info.stack.trampoline.buffer); + return -1; + } + + /* + * tls_manager_new grants us it's ownership of the reference to the + * TLS manager, so we "leak" the data pointer, instead of using _get() + */ + struct tls_manager *mgr =3D tls_manager_new(); + if (mgr =3D=3D NULL) { + free(info.stack.trampoline.buffer); + free(info.stack.process.buffer); + return -1; + } + + /* Manager cleans up the trampoline stack once the trampoline exits. */ + mgr->cleanup =3D info.stack.trampoline.buffer; + + /* + * Flags used by the trampoline in the 2-phase clone setup for children + * cloned with CLONE_VM. We want the trampoline to be essentially iden= tical + * to its parent. This improves the performance of cloning the trampol= ine, + * and guarantees that the real flags are implemented correctly. + * + * CLONE_CHILD_SETTID: Make the kernel set the managed_tid for the TLS + * manager. + * + * CLONE_CHILD_CLEARTID: Make the kernel clear the managed_tid, and + * trigger a FUTEX_WAKE (received by the TLS manager), so the TLS mana= ger + * knows when to cleanup the trampoline stack. + * + * CLONE_SETTLS: To set the trampoline TLS based on the tls manager. 
+ */ + static const int base_trampoline_flags =3D ( + CLONE_FILES | CLONE_FS | CLONE_IO | CLONE_PTRACE | + CLONE_SIGHAND | CLONE_SYSVSEM | CLONE_VM + ) | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID | CLONE_SETTLS; + + int trampoline_flags =3D base_trampoline_flags; + + /* + * To get the process hierarchy right, we set the trampoline + * CLONE_PARENT/CLONE_THREAD flag to match the child + * CLONE_PARENT/CLONE_THREAD. So add those flags if specified by the c= hild. + */ + trampoline_flags |=3D (flags & CLONE_PARENT) ? CLONE_PARENT : 0; + trampoline_flags |=3D (flags & CLONE_THREAD) ? CLONE_THREAD : 0; + + /* + * When using CLONE_PARENT, linux always sets the exit_signal for the = task + * to the exit_signal of the parent process. For our purposes, the + * trampoline process. exit_signal has special significance for calls = like + * `wait`, so it needs to be set correctly. We add the signal part of = the + * user flags here so the ultimate child gets the right signal. + * + * This has the unfortunate side-effect of sending the parent two exit + * signals. One when the true child exits, and one when the trampoline + * exits. To work-around this we have to capture the exit signal from = the + * trampoline and supress it. + */ + trampoline_flags |=3D (flags & CSIGNAL); + + sigfillset(&sigmask); + if (sigprocmask(SIG_BLOCK, &sigmask, &info.orig_mask) !=3D 0) { + free(info.stack.trampoline.buffer); + free(info.stack.process.buffer); + completion_finish(&mgr->spawned); + return -1; + } + + if (clone(clone_trampoline, + info.stack.trampoline.top, trampoline_flags, &info, + NULL, mgr->tls_ptr, &mgr->managed_tid) < 0) { + free(info.stack.trampoline.buffer); + free(info.stack.process.buffer); + completion_finish(&mgr->spawned); + return -1; + } + + completion_await(&info.child_ready); + completion_finish(&mgr->spawned); + + ret =3D sigprocmask(SIG_SETMASK, &info.orig_mask, NULL); + /* + * If our final sigproc mask doesn't work, we're pretty screwed. 
We may + * have started the final child now, and there's no going back. If this + * ever happens, just crash. + */ + assert(!ret && "sigprocmask after clone needs to succeed"); + + /* If we have an error result, then set errno as needed. */ + if (info.result < 0) { + errno =3D -info.result; + return -1; + } + return info.result; +} + struct clone_thread_info { struct completion running; int tid; @@ -120,6 +517,17 @@ int qemu_clone(int flags, int (*callback)(void *), voi= d *payload) { int ret; =20 + /* + * Backwards Compatibility: Remove once all target platforms support + * clone_vm. Previously, we implemented vfork() via a fork() call, + * preserve that behavior instead of failing. + */ + if (!clone_vm_supported()) { + if (flags & CLONE_VFORK) { + flags &=3D ~(CLONE_VFORK | CLONE_VM); + } + } + if (clone_flags_are_thread(flags)) { /* * The new process uses the same flags as pthread_create, so we can @@ -146,7 +554,12 @@ int qemu_clone(int flags, int (*callback)(void *), voi= d *payload) return ret; } =20 - /* !fork && !thread */ + if (clone_vm_supported() && (flags & CLONE_VM)) { + return clone_vm(flags, callback, payload); + } + + /* !fork && !thread && !CLONE_VM. This form is unsupported. */ + errno =3D EINVAL; return -1; } diff --git a/linux-user/qemu.h b/linux-user/qemu.h index 54bf4f47be..e29912466c 100644 --- a/linux-user/qemu.h +++ b/linux-user/qemu.h @@ -94,6 +94,7 @@ struct vm86_saved_state { =20 struct emulated_sigtable { int pending; /* true if signal is pending */ + pid_t exit_pid; /* non-zero host pid, if a process is exiting. */ target_siginfo_t info; }; =20 @@ -183,6 +184,15 @@ typedef struct TaskState { * least TARGET_NSIG entries */ struct target_sigaction *sigact_tbl; + + /* + * Set to true if the process asssociated with this task state was clo= ned. + * This is needed to disambiguate cloned processes from threads. If + * CLONE_VM is used, a pthread_exit(..) 
will free the stack/TLS of the + * trampoline thread, and the trampoline will be unable to conduct its + * cleanup. + */ + bool is_cloned; } __attribute__((aligned(16))) TaskState; =20 extern char *exec_path; @@ -442,6 +452,13 @@ abi_long do_sigaltstack(abi_ulong uss_addr, abi_ulong = uoss_addr, abi_ulong sp); int do_sigprocmask(int how, const sigset_t *set, sigset_t *oldset); abi_long do_swapcontext(CPUArchState *env, abi_ulong uold_ctx, abi_ulong unew_ctx, abi_long ctx_size); + +/* + * Register the current process as a "hidden" process. Exit signals genera= ted + * by this process should not be delivered to the guest. + */ +void hide_current_process_exit_signal(void); + /** * block_signals: block all signals while handling this guest syscall * diff --git a/linux-user/signal.c b/linux-user/signal.c index dc98def6d1..a7f0612b64 100644 --- a/linux-user/signal.c +++ b/linux-user/signal.c @@ -36,6 +36,21 @@ typedef struct target_sigaction sigact_table[TARGET_NSIG= ]; static void host_signal_handler(int host_signum, siginfo_t *info, void *puc); =20 +/* + * This table, initilized in signal_init, is used to track "hidden" proces= ses + * for which exit signals should not be delivered. The PIDs of the process= es + * hidden processes are stored as keys. Values are always set to NULL. + * + * Note: Process IDs stored in this table may "leak" (i.e., never be remov= ed + * from the table) if the guest blocks (SIG_IGN) the exit signal for the c= hild + * it spawned. There is a small risk, that this PID could later be reused + * by an alternate child process, and the child exit would be hidden. This= is + * an unusual case that is unlikely to happen, but it is possible. + */ +static GHashTable *hidden_processes; + +/* this lock guards access to the `hidden_processes` table. 
*/ +static pthread_mutex_t hidden_processes_lock =3D PTHREAD_MUTEX_INITIALIZER; =20 /* * System includes define _NSIG as SIGRTMAX + 1, @@ -564,6 +579,9 @@ void signal_init(void) /* initialize signal conversion tables */ signal_table_init(); =20 + /* initialize the hidden process table. */ + hidden_processes =3D g_hash_table_new(g_direct_hash, g_direct_equal); + /* Set the signal mask from the host mask. */ sigprocmask(0, 0, &ts->signal_mask); =20 @@ -749,6 +767,10 @@ static void host_signal_handler(int host_signum, sigin= fo_t *info, k =3D &ts->sigtab[sig - 1]; k->info =3D tinfo; k->pending =3D sig; + k->exit_pid =3D 0; + if (info->si_code & (CLD_DUMPED | CLD_KILLED | CLD_EXITED)) { + k->exit_pid =3D info->si_pid; + } ts->signal_pending =3D 1; =20 /* Block host signals until target signal handler entered. We @@ -930,6 +952,17 @@ int do_sigaction(int sig, const struct target_sigactio= n *act, return ret; } =20 +void hide_current_process_exit_signal(void) +{ + pid_t pid =3D getpid(); + + pthread_mutex_lock(&hidden_processes_lock); + + (void)g_hash_table_insert(hidden_processes, GINT_TO_POINTER(pid), NULL= ); + + pthread_mutex_unlock(&hidden_processes_lock); +} + static void handle_pending_signal(CPUArchState *cpu_env, int sig, struct emulated_sigtable *k) { @@ -944,6 +977,22 @@ static void handle_pending_signal(CPUArchState *cpu_en= v, int sig, /* dequeue signal */ k->pending =3D 0; =20 + if (k->exit_pid) { + pthread_mutex_lock(&hidden_processes_lock); + /* + * If the exit signal is for a hidden PID, then just drop it, and + * remove the hidden process from the list, since we know it has + * exited. 
+ */ + if (g_hash_table_contains(hidden_processes, + GINT_TO_POINTER(k->exit_pid))) { + g_hash_table_remove(hidden_processes, GINT_TO_POINTER(k->exit_= pid)); + pthread_mutex_unlock(&hidden_processes_lock); + return; + } + pthread_mutex_unlock(&hidden_processes_lock); + } + sig =3D gdb_handlesig(cpu, sig); if (!sig) { sa =3D NULL; diff --git a/linux-user/syscall.c b/linux-user/syscall.c index 838caf9c98..20cf5d5464 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -139,10 +139,9 @@ =20 /* These flags are ignored: * CLONE_DETACHED is now ignored by the kernel; - * CLONE_IO is just an optimisation hint to the I/O scheduler */ #define CLONE_IGNORED_FLAGS \ - (CLONE_DETACHED | CLONE_IO) + (CLONE_DETACHED) =20 /* Flags for fork which we can implement within QEMU itself */ #define CLONE_EMULATED_FLAGS \ @@ -5978,14 +5977,31 @@ static int do_fork(CPUArchState *env, unsigned int = flags, abi_ulong newsp, } proc_flags =3D (proc_flags & ~CSIGNAL) | host_sig; =20 - /* Emulate vfork() with fork() */ - if (proc_flags & CLONE_VFORK) { - proc_flags &=3D ~(CLONE_VFORK | CLONE_VM); + + if (!clone_flags_are_fork(proc_flags) && !(flags & CLONE_VM)) { + /* + * If the user is doing a non-CLONE_VM clone, which cannot be emul= ated + * with fork, we can't guarantee that we can emulate this correctl= y. + * It should work OK as long as there are no threads in parent pro= cess, + * so we hide it behind a flag if the user knows what they're doin= g. + */ + qemu_log_mask(LOG_UNIMP, + "Refusing non-fork/thread clone without CLONE_VM."); + return -TARGET_EINVAL; } =20 - if (!clone_flags_are_fork(proc_flags) && - !clone_flags_are_thread(proc_flags)) { - qemu_log_mask(LOG_UNIMP, "unsupported clone flags"); + if ((flags & CLONE_FILES) && !(flags & CLONE_VM)) { + /* + * This flag combination is currently unsupported. QEMU needs to u= pdate + * the fd_trans_table as new file descriptors are opened. 
This is = easy + * when CLONE_VM is set, because the fd_trans_table is shared betw= een + * the parent and child. Without CLONE_VM the fd_trans_table will = need + * to be share specially using shared memory mappings, or a + * consistentcy protocol between the child and the parent. + * + * For now, just return EINVAL in this case. + */ + qemu_log_mask(LOG_UNIMP, "CLONE_FILES only supported with CLONE_VM= "); return -TARGET_EINVAL; } =20 @@ -6042,6 +6058,10 @@ static int do_fork(CPUArchState *env, unsigned int f= lags, abi_ulong newsp, ts->sigact_tbl =3D sigact_table_clone(parent_ts->sigact_tbl); } =20 + if (!clone_flags_are_thread(proc_flags)) { + ts->is_cloned =3D true; + } + if (flags & CLONE_CHILD_CLEARTID) { ts->child_tidptr =3D child_tidptr; } @@ -6063,10 +6083,8 @@ static int do_fork(CPUArchState *env, unsigned int f= lags, abi_ulong newsp, tb_flush(cpu); } =20 - if (proc_flags & CLONE_VM) { - info.child.register_thread =3D true; - info.child.signal_setup =3D true; - } + info.child.signal_setup =3D (flags & CLONE_VM) && !(flags & CLONE_VFOR= K); + info.child.register_thread =3D !!(flags & CLONE_VM); =20 /* * It is not safe to deliver signals until the child has finished @@ -6078,7 +6096,7 @@ static int do_fork(CPUArchState *env, unsigned int fl= ags, abi_ulong newsp, =20 ret =3D get_errno(qemu_clone(proc_flags, clone_run, (void *) &info)); =20 - if (ret >=3D 0 && (proc_flags & CLONE_VM)) { + if (ret >=3D 0 && (flags & CLONE_VM) && !(flags & CLONE_VFORK)) { /* * Wait for the child to finish setup if the child is running in t= he * same VM. @@ -6092,7 +6110,7 @@ static int do_fork(CPUArchState *env, unsigned int fl= ags, abi_ulong newsp, pthread_cond_destroy(&info.cond); pthread_mutex_destroy(&info.mutex); =20 - if (ret >=3D 0 && !(proc_flags & CLONE_VM)) { + if (ret >=3D 0 && !(flags & CLONE_VM)) { /* * If !CLONE_VM, then we need to set parent_tidptr, since the child * won't set it for us. Should always be safe to set it here anywa= ys. 
@@ -7662,6 +7680,7 @@ static abi_long do_syscall1(void *cpu_env, int num, a= bi_long arg1, switch(num) { case TARGET_NR_exit: { + bool do_pthread_exit =3D false; /* In old applications this may be used to implement _exit(2). However in threaded applictions it is used for thread terminati= on, and _exit_group is used for application termination. @@ -7692,10 +7711,20 @@ static abi_long do_syscall1(void *cpu_env, int num,= abi_long arg1, NULL, NULL, 0); } =20 + /* + * Need this multi-step process so we can free ts before calli= ng + * pthread_exit. + */ + if (!ts->is_cloned) { + do_pthread_exit =3D true; + } + thread_cpu =3D NULL; g_free(ts); - rcu_unregister_thread(); - pthread_exit(NULL); + if (do_pthread_exit) { + rcu_unregister_thread(); + pthread_exit(NULL); + } } =20 pthread_mutex_unlock(&clone_lock); @@ -9700,6 +9729,14 @@ static abi_long do_syscall1(void *cpu_env, int num, = abi_long arg1, #ifdef __NR_exit_group /* new thread calls */ case TARGET_NR_exit_group: { + /* + * TODO: We need to clean up CPUs (like is done for exit(2)) + * for all threads in this process when exit_group is called, at l= east + * for tasks that have been cloned. Could also be done in + * clone_trampoline/tls_mgr. Since this cleanup is non-trival (nee= d to + * coordinate it across threads. Right now it seems to be fine wit= hout + * the cleanup, so just leaving a note. 
+ */ preexit_cleanup(cpu_env, arg1); return get_errno(exit_group(arg1)); } diff --git a/tests/tcg/multiarch/linux-test.c b/tests/tcg/multiarch/linux-t= est.c index 8a7c15cd31..a7723556c2 100644 --- a/tests/tcg/multiarch/linux-test.c +++ b/tests/tcg/multiarch/linux-test.c @@ -407,14 +407,13 @@ static void test_clone(void) =20 stack1 =3D malloc(STACK_SIZE); pid1 =3D chk_error(clone(thread1_func, stack1 + STACK_SIZE, - CLONE_VM | CLONE_FS | CLONE_FILES | - CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM, + CLONE_VM | SIGCHLD, "hello1")); =20 stack2 =3D malloc(STACK_SIZE); pid2 =3D chk_error(clone(thread2_func, stack2 + STACK_SIZE, CLONE_VM | CLONE_FS | CLONE_FILES | - CLONE_SIGHAND | CLONE_THREAD | CLONE_SYSVSEM, + CLONE_SIGHAND | CLONE_SYSVSEM | SIGCHLD, "hello2")); =20 wait_for_child(pid1); @@ -517,6 +516,61 @@ static void test_shm(void) chk_error(shmdt(ptr)); } =20 +static volatile sig_atomic_t test_clone_signal_count_handler_calls; + +static void test_clone_signal_count_handler(int sig) +{ + test_clone_signal_count_handler_calls++; +} + +/* A clone function that does nothing and exits successfully. */ +static int successful_func(void *arg __attribute__((unused))) +{ + return 0; +} + +/* + * With our clone implementation it's possible that we could generate too = many + * child exit signals. Make sure only the single expected child-exit signa= l is + * generated. + */ +static void test_clone_signal_count(void) +{ + uint8_t *child_stack; + struct sigaction prev, test; + int status; + pid_t pid; + + memset(&test, 0, sizeof(test)); + test.sa_handler =3D test_clone_signal_count_handler; + test.sa_flags =3D SA_RESTART; + + /* Use real-time signals, so every signal event gets delivered. 
*/ + chk_error(sigaction(SIGRTMIN, &test, &prev)); + + child_stack =3D malloc(STACK_SIZE); + pid =3D chk_error(clone( + successful_func, + child_stack + STACK_SIZE, + CLONE_VM | SIGRTMIN, + NULL + )); + + /* + * Need to use __WCLONE here because we are not using SIGCHLD as the + * exit_signal. By default linux only waits for children spawned with + * SIGCHLD. + */ + chk_error(waitpid(pid, &status, __WCLONE)); + + chk_error(sigaction(SIGRTMIN, &prev, NULL)); + + if (test_clone_signal_count_handler_calls !=3D 1) { + error("expected to receive exactly 1 signal, received %d signals", + test_clone_signal_count_handler_calls); + } +} + int main(int argc, char **argv) { test_file(); @@ -524,11 +578,8 @@ int main(int argc, char **argv) test_fork(); test_time(); test_socket(); - - if (argc > 1) { - printf("test_clone still considered buggy\n"); - test_clone(); - } + test_clone(); + test_clone_signal_count(); =20 test_signal(); test_shm(); --=20 2.27.0.290.gba653c62da-goog From nobody Sun May 19 12:13:31 2024 Delivered-To: importer@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=pass(p=reject dis=none) header.from=google.com ARC-Seal: i=1; a=rsa-sha256; t=1591926618; cv=none; d=zohomail.com; s=zohoarc; b=Xr7Km8RuMs9K9q7WNT0skWLX9oF1JQzrFQH7QrbuwflYeIUw5taee6S/BabTDd3/ytnWeBq3On+AKPC4JP901iycvUYW5nBauSvuPF8u58ICHQRwkvwitfFAdjQItHFpriGv/y5OHUO7yeZ28l0o5BzL6/5DY+IhE0tf/y5Hm1Q= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1591926618; h=Content-Type:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=E4ik8F3C9Ws3lL4kBNltyKwjw/0cR5DBXCIJAWDvMCQ=; 
Date: Thu, 11 Jun 2020 18:46:06 -0700
In-Reply-To: <20200612014606.147691-1-jkz@google.com>
Message-Id: <20200612014606.147691-6-jkz@google.com>
References: <20200612014606.147691-1-jkz@google.com>
X-Mailer: git-send-email 2.27.0.290.gba653c62da-goog
Subject: [PATCH 5/5] linux-user: Add PDEATHSIG test for clone process hierarchy.
From: Josh Kunz
To: qemu-devel@nongnu.org
Cc: riku.voipio@iki.fi, laurent@vivier.eu, alex.bennee@linaro.org, Josh Kunz

Certain process-level Linux features, such as subreapers and PDEATHSIG,
depend on the guest's process hierarchy being emulated correctly on the
host. This change adds a test that makes sure PDEATHSIG works for a guest
process created with `clone`.

Signed-off-by: Josh Kunz
---
 tests/tcg/multiarch/Makefile.target |   3 +
 tests/tcg/multiarch/linux-test.c    | 160 ++++++++++++++++++++++++++--
 2 files changed, 153 insertions(+), 10 deletions(-)

diff --git a/tests/tcg/multiarch/Makefile.target b/tests/tcg/multiarch/Makefile.target
index cb49cc9ccb..d937b4c59b 100644
--- a/tests/tcg/multiarch/Makefile.target
+++ b/tests/tcg/multiarch/Makefile.target
@@ -60,3 +60,6 @@ endif
 
 # Update TESTS
 TESTS += $(MULTIARCH_TESTS)
+
+# linux-test.c depends on -pthread.
+LDFLAGS += -pthread
diff --git a/tests/tcg/multiarch/linux-test.c b/tests/tcg/multiarch/linux-test.c
index a7723556c2..1824a5a0c2 100644
--- a/tests/tcg/multiarch/linux-test.c
+++ b/tests/tcg/multiarch/linux-test.c
@@ -20,16 +20,19 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -41,6 +44,7 @@
 #include
 #include
 #include
+#include
 
 #define STACK_SIZE 16384
 
@@ -368,14 +372,12 @@ static void test_pipe(void)
     chk_error(close(fds[1]));
 }
 
-static int thread1_res;
-static int thread2_res;
-
 static int thread1_func(void *arg)
 {
+    int *res = (int *) arg;
     int i;
     for(i=0;i<5;i++) {
-        thread1_res++;
+        (*res)++;
         usleep(10 * 1000);
     }
     return 0;
@@ -383,9 +385,10 @@ static int thread1_func(void *arg)
 
 static int thread2_func(void *arg)
 {
+    int *res = (int *) arg;
     int i;
     for(i=0;i<6;i++) {
-        thread2_res++;
+        (*res)++;
         usleep(10 * 1000);
     }
     return 0;
@@ -405,25 +408,27 @@ static void test_clone(void)
     uint8_t *stack1, *stack2;
     pid_t pid1, pid2;
 
+    int t1 = 0, t2 = 0;
+
     stack1 = malloc(STACK_SIZE);
     pid1 = chk_error(clone(thread1_func, stack1 + STACK_SIZE,
                            CLONE_VM | SIGCHLD,
-                           "hello1"));
+                           &t1));
 
     stack2 = malloc(STACK_SIZE);
     pid2 = chk_error(clone(thread2_func, stack2 + STACK_SIZE,
                            CLONE_VM | CLONE_FS | CLONE_FILES |
                            CLONE_SIGHAND | CLONE_SYSVSEM | SIGCHLD,
-                           "hello2"));
+                           &t2));
 
     wait_for_child(pid1);
     free(stack1);
     wait_for_child(pid2);
     free(stack2);
 
-    if (thread1_res != 5 ||
-        thread2_res != 6)
+    if (t1 != 5 || t2 != 6) {
         error("clone");
+    }
 }
 
 /***********************************/
@@ -562,6 +567,7 @@ static void test_clone_signal_count(void)
      * SIGCHLD.
      */
     chk_error(waitpid(pid, &status, __WCLONE));
+    free(child_stack);
 
     chk_error(sigaction(SIGRTMIN, &prev, NULL));
 
@@ -571,6 +577,139 @@ static void test_clone_signal_count(void)
     }
 }
 
+struct test_clone_pdeathsig_info {
+    uint8_t *child_stack;
+    pthread_mutex_t notify_test_mutex;
+    pthread_cond_t notify_test_cond;
+    pthread_mutex_t notify_parent_mutex;
+    pthread_cond_t notify_parent_cond;
+    bool signal_received;
+};
+
+static int test_clone_pdeathsig_child(void *arg)
+{
+    struct test_clone_pdeathsig_info *info =
+        (struct test_clone_pdeathsig_info *) arg;
+    sigset_t wait_on, block_all;
+    siginfo_t sinfo;
+    struct timespec timeout;
+    int ret;
+
+    /* Block all signals, so SIGUSR1 will be pending when we wait on it. */
+    sigfillset(&block_all);
+    chk_error(sigprocmask(SIG_BLOCK, &block_all, NULL));
+
+    chk_error(prctl(PR_SET_PDEATHSIG, SIGUSR1));
+
+    pthread_mutex_lock(&info->notify_parent_mutex);
+    pthread_cond_broadcast(&info->notify_parent_cond);
+    pthread_mutex_unlock(&info->notify_parent_mutex);
+
+    sigemptyset(&wait_on);
+    sigaddset(&wait_on, SIGUSR1);
+    timeout.tv_sec = 0;
+    timeout.tv_nsec = 300 * 1000 * 1000;  /* 300ms */
+
+    ret = sigtimedwait(&wait_on, &sinfo, &timeout);
+
+    if (ret < 0 && errno != EAGAIN) {
+        error("%m (ret=%d, errno=%d/%s)", ret, errno, strerror(errno));
+    }
+    if (ret == SIGUSR1) {
+        info->signal_received = true;
+    }
+    pthread_mutex_lock(&info->notify_test_mutex);
+    pthread_cond_broadcast(&info->notify_test_cond);
+    pthread_mutex_unlock(&info->notify_test_mutex);
+    _exit(0);
+}
+
+static int test_clone_pdeathsig_parent(void *arg)
+{
+    struct test_clone_pdeathsig_info *info =
+        (struct test_clone_pdeathsig_info *) arg;
+
+    pthread_mutex_lock(&info->notify_parent_mutex);
+
+    chk_error(clone(
+        test_clone_pdeathsig_child,
+        info->child_stack + STACK_SIZE,
+        CLONE_VM,
+        info
+    ));
+
+    /* No need to reap the child, it will get reaped by init. */
+
+    /* Wait for the child to signal that it has set up PDEATHSIG. */
+    pthread_cond_wait(&info->notify_parent_cond, &info->notify_parent_mutex);
+    pthread_mutex_unlock(&info->notify_parent_mutex);  /* avoid UB on destroy */
+
+    _exit(0);
+}
+
+/*
+ * This checks that cloned children have the correct parent/child
+ * relationship using PDEATHSIG. PDEATHSIG is based on kernel task hierarchy,
+ * rather than "process" hierarchy, so it should be pretty sensitive to
+ * breakages. PDEATHSIG is also a widely used feature, so it's important
+ * it's correct.
+ *
+ * This test works by spawning a child process (parent) which then spawns its
+ * own child (the child). The child registers a PDEATHSIG handler, and then
+ * notifies the parent which exits. The child then waits for the PDEATHSIG
+ * signal it registered. The child reports whether or not the signal is
+ * received within a small time window, and then notifies the test runner
+ * (this function) that the test is finished.
+ */
+static void test_clone_pdeathsig(void)
+{
+    uint8_t *parent_stack;
+    struct test_clone_pdeathsig_info info;
+    pid_t pid;
+    int status;
+
+    memset(&info, 0, sizeof(info));
+
+    /*
+     * Set up condition variables, so we can be notified once the final child
+     * observes the PDEATHSIG signal from its parent exiting. When the parent
+     * exits, the child will be orphaned, so we can't use `wait*` to wait for
+     * it to finish.
+     */
+    chk_error(pthread_mutex_init(&info.notify_test_mutex, NULL));
+    chk_error(pthread_cond_init(&info.notify_test_cond, NULL));
+    chk_error(pthread_mutex_init(&info.notify_parent_mutex, NULL));
+    chk_error(pthread_cond_init(&info.notify_parent_cond, NULL));
+
+    parent_stack = malloc(STACK_SIZE);
+    info.child_stack = malloc(STACK_SIZE);
+
+    pthread_mutex_lock(&info.notify_test_mutex);
+
+    pid = chk_error(clone(
+        test_clone_pdeathsig_parent,
+        parent_stack + STACK_SIZE,
+        CLONE_VM,
+        &info
+    ));
+
+    pthread_cond_wait(&info.notify_test_cond, &info.notify_test_mutex);
+    pthread_mutex_unlock(&info.notify_test_mutex);
+    chk_error(waitpid(pid, &status, __WCLONE));  /* reap the parent */
+
+    free(parent_stack);
+    free(info.child_stack);
+
+    pthread_cond_destroy(&info.notify_parent_cond);
+    pthread_mutex_destroy(&info.notify_parent_mutex);
+    pthread_cond_destroy(&info.notify_test_cond);
+    pthread_mutex_destroy(&info.notify_test_mutex);
+
+    if (!info.signal_received) {
+        error("child did not receive PDEATHSIG on parent death");
+    }
+}
+
 int main(int argc, char **argv)
 {
     test_file();
@@ -580,8 +719,9 @@ int main(int argc, char **argv)
     test_socket();
     test_clone();
     test_clone_signal_count();
-
+    test_clone_pdeathsig();
     test_signal();
     test_shm();
+
     return 0;
 }
-- 
2.27.0.290.gba653c62da-goog
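
For anyone who wants to poke at the PDEATHSIG mechanism outside the QEMU
test suite, the following is a minimal standalone sketch of the same
three-process idea. It is not part of the patch: it uses fork() and pipes
instead of clone(CLONE_VM) and pthread condition variables, and the helper
name pdeathsig_delivered() and the 2 second timeout are assumptions made
for illustration only.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code). Returns 1 if a grandchild received
 * SIGUSR1 after its parent exited, 0 if the wait timed out, and -1 if the
 * setup failed.
 */
int pdeathsig_delivered(void)
{
    int ready[2], result[2];
    char byte = 0;
    ssize_t n;
    pid_t middle;

    if (pipe(ready) < 0 || pipe(result) < 0) {
        return -1;
    }

    middle = fork();
    if (middle < 0) {
        return -1;
    }
    if (middle == 0) {
        pid_t child = fork();
        if (child < 0) {
            _exit(1);
        }
        if (child == 0) {
            sigset_t set;
            struct timespec timeout = { .tv_sec = 2, .tv_nsec = 0 };
            char got;

            /* Block SIGUSR1 so it stays pending until sigtimedwait(). */
            sigemptyset(&set);
            sigaddset(&set, SIGUSR1);
            sigprocmask(SIG_BLOCK, &set, NULL);

            if (prctl(PR_SET_PDEATHSIG, SIGUSR1) < 0) {
                _exit(1);
            }
            /* Tell the middle process that it may exit now. */
            if (write(ready[1], "r", 1) != 1) {
                _exit(1);
            }
            got = (sigtimedwait(&set, NULL, &timeout) == SIGUSR1);
            /* Report the outcome to the top-level process. */
            if (write(result[1], &got, 1) != 1) {
                _exit(1);
            }
            _exit(0);
        }
        /* Middle process: exit only after the child armed PDEATHSIG. */
        close(ready[1]);
        if (read(ready[0], &byte, 1) != 1) {
            _exit(1);
        }
        _exit(0);
    }

    /* Top-level process: drop unused pipe ends so read() cannot hang. */
    close(ready[0]);
    close(ready[1]);
    close(result[1]);
    waitpid(middle, NULL, 0);
    n = read(result[0], &byte, 1);
    close(result[0]);
    return n == 1 ? byte : -1;
}
```

Calling pdeathsig_delivered() from a small main() on a native Linux host
gives a baseline to compare against the result seen under linux-user
emulation. Pipes are used here only because fork() does not share memory
between the processes; the patch itself shares state through CLONE_VM.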