From: Mateusz Guzik
To: brauner@kernel.org, oleg@redhat.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Mateusz Guzik
Subject: [PATCH v2] exit: perform randomness and pid work without tasklist_lock
Date: Tue, 28 Jan 2025 17:07:43 +0100
Message-ID: <20250128160743.3142544-1-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
X-Mailing-List: linux-kernel@vger.kernel.org

Both add_device_randomness() and attach_pid()/detach_pid() have their
own synchronisation mechanisms.

The clone side calls them *without* the tasklist_lock held, meaning
parallel calls can contend on their locks.

The exit side calls them *with* the tasklist_lock held, which means the
hold time is avoidably extended by waiting on either of those two locks,
in turn exacerbating contention on tasklist_lock itself.

Postponing the work until after the lock is dropped bumps thread
creation/destruction rate by 15% on a 24-core VM.
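To illustrate the idea outside the kernel, here is a minimal userspace
sketch. It assumes pthread mutexes as stand-ins: big_lock plays the role
of tasklist_lock and pool_lock the role of the lock taken inside
add_device_randomness(). Every name in it (big_lock, pool_lock,
struct post_work, mix_into_pool, release_one) is hypothetical and not
kernel code; it only shows "record under the outer lock, do the locked
work after dropping it".

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-ins, not kernel symbols. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;  /* ~ tasklist_lock */
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER; /* ~ inner lock */
static unsigned long long pool;
static int nr_live = 3;

/* Work collected under big_lock and performed after it is dropped. */
struct post_work {
	unsigned long long randomness;
};

/* Takes its own lock, so calling it under big_lock would nest the two. */
static void mix_into_pool(unsigned long long v)
{
	pthread_mutex_lock(&pool_lock);
	pool ^= v;
	pthread_mutex_unlock(&pool_lock);
}

static void release_one(unsigned long long runtime)
{
	struct post_work post;

	memset(&post, 0, sizeof(post));

	pthread_mutex_lock(&big_lock);
	nr_live--;
	post.randomness = runtime;	/* only record the value here */
	pthread_mutex_unlock(&big_lock);

	/* Deferred: pool_lock is never waited on while big_lock is held. */
	mix_into_pool(post.randomness);
}
```

With this shape, time spent waiting on pool_lock no longer adds to the
hold time of big_lock.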
Bench (plop into will-it-scale):
$ cat tests/threadspawn1.c

#include <assert.h>
#include <pthread.h>

char *testcase_description = "Thread creation and teardown";

static void *worker(void *arg)
{
	return (NULL);
}

void testcase(unsigned long long *iterations, unsigned long nr)
{
	pthread_t thread;
	int error;

	while (1) {
		error = pthread_create(&thread, NULL, worker, NULL);
		assert(error == 0);
		error = pthread_join(thread, NULL);
		assert(error == 0);
		(*iterations)++;
	}
}

Run:
$ ./threadspawn1_processes -t 24

Signed-off-by: Mateusz Guzik
---

v2:
- introduce a struct to collect work
- move out pids as well

There is more which can be pulled out.

This may look suspicious:
+	proc_flush_pid(p->thread_pid);

AFAICS this is constant for the duration of the lifetime, so I don't
think there is a problem.

 include/linux/pid.h |  1 +
 kernel/exit.c       | 53 +++++++++++++++++++++++++++++++++------------
 kernel/pid.c        | 23 +++++++++++++++-----
 3 files changed, 58 insertions(+), 19 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index 98837a1ff0f3..6e9fcacd02cd 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -101,6 +101,7 @@ extern struct pid *get_task_pid(struct task_struct *task, enum pid_type type);
  * these helpers must be called with the tasklist_lock write-held.
  */
 extern void attach_pid(struct task_struct *task, enum pid_type);
+extern struct pid *detach_pid_return(struct task_struct *task, enum pid_type);
 extern void detach_pid(struct task_struct *task, enum pid_type);
 extern void change_pid(struct task_struct *task, enum pid_type,
 			struct pid *pid);
diff --git a/kernel/exit.c b/kernel/exit.c
index 1dcddfe537ee..4e452d3e3a89 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -122,14 +122,23 @@ static __init int kernel_exit_sysfs_init(void)
 late_initcall(kernel_exit_sysfs_init);
 #endif
 
-static void __unhash_process(struct task_struct *p, bool group_dead)
+/*
+ * For things release_task would like to do *after* tasklist_lock is released.
+ */
+struct release_task_post {
+	unsigned long long randomness;
+	struct pid *pids[PIDTYPE_MAX];
+};
+
+static void __unhash_process(struct release_task_post *post, struct task_struct *p,
+			     bool group_dead)
 {
 	nr_threads--;
-	detach_pid(p, PIDTYPE_PID);
+	post->pids[PIDTYPE_PID] = detach_pid_return(p, PIDTYPE_PID);
 	if (group_dead) {
-		detach_pid(p, PIDTYPE_TGID);
-		detach_pid(p, PIDTYPE_PGID);
-		detach_pid(p, PIDTYPE_SID);
+		post->pids[PIDTYPE_TGID] = detach_pid_return(p, PIDTYPE_TGID);
+		post->pids[PIDTYPE_PGID] = detach_pid_return(p, PIDTYPE_PGID);
+		post->pids[PIDTYPE_SID] = detach_pid_return(p, PIDTYPE_SID);
 
 		list_del_rcu(&p->tasks);
 		list_del_init(&p->sibling);
@@ -141,7 +150,8 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
 /*
  * This function expects the tasklist_lock write-locked.
  */
-static void __exit_signal(struct task_struct *tsk)
+static void __exit_signal(struct release_task_post *post,
+			  struct task_struct *tsk)
 {
 	struct signal_struct *sig = tsk->signal;
 	bool group_dead = thread_group_leader(tsk);
@@ -174,8 +184,7 @@ static void __exit_signal(struct task_struct *tsk)
 			sig->curr_target = next_thread(tsk);
 	}
 
-	add_device_randomness((const void*) &tsk->se.sum_exec_runtime,
-			      sizeof(unsigned long long));
+	post->randomness = tsk->se.sum_exec_runtime;
 
 	/*
 	 * Accumulate here the counters for all threads as they die. We could
@@ -197,7 +206,7 @@ static void __exit_signal(struct task_struct *tsk)
 	task_io_accounting_add(&sig->ioac, &tsk->ioac);
 	sig->sum_sched_runtime += tsk->se.sum_exec_runtime;
 	sig->nr_threads--;
-	__unhash_process(tsk, group_dead);
+	__unhash_process(post, tsk, group_dead);
 	write_sequnlock(&sig->stats_lock);
 
 	/*
@@ -240,9 +249,13 @@ void __weak release_thread(struct task_struct *dead_task)
 void release_task(struct task_struct *p)
 {
 	struct task_struct *leader;
-	struct pid *thread_pid;
 	int zap_leader;
+	struct release_task_post post;
 repeat:
+	memset(&post, 0, sizeof(post));
+
+	proc_flush_pid(p->thread_pid);
+
 	/* don't need to get the RCU readlock here - the process is dead and
 	 * can't be modifying its own credentials. But shut RCU-lockdep up */
 	rcu_read_lock();
@@ -253,8 +266,7 @@ void release_task(struct task_struct *p)
 
 	write_lock_irq(&tasklist_lock);
 	ptrace_release_task(p);
-	thread_pid = get_pid(p->thread_pid);
-	__exit_signal(p);
+	__exit_signal(&post, p);
 
 	/*
 	 * If we are the last non-leader member of the thread
@@ -276,8 +288,21 @@ void release_task(struct task_struct *p)
 	}
 
 	write_unlock_irq(&tasklist_lock);
-	proc_flush_pid(thread_pid);
-	put_pid(thread_pid);
+
+	/*
+	 * Process clean up deferred to after we drop the tasklist lock.
+	 */
+	add_device_randomness((const void*) &post.randomness,
+			      sizeof(unsigned long long));
+	if (post.pids[PIDTYPE_PID])
+		free_pid(post.pids[PIDTYPE_PID]);
+	if (post.pids[PIDTYPE_TGID])
+		free_pid(post.pids[PIDTYPE_TGID]);
+	if (post.pids[PIDTYPE_PGID])
+		free_pid(post.pids[PIDTYPE_PGID]);
+	if (post.pids[PIDTYPE_SID])
+		free_pid(post.pids[PIDTYPE_SID]);
+
 	release_thread(p);
 	put_task_struct_rcu_user(p);
 
diff --git a/kernel/pid.c b/kernel/pid.c
index 3a10a7b6fcf8..047cdbcef5cf 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -343,7 +343,7 @@ void attach_pid(struct task_struct *task, enum pid_type type)
 	hlist_add_head_rcu(&task->pid_links[type], &pid->tasks[type]);
 }
 
-static void __change_pid(struct task_struct *task, enum pid_type type,
+static struct pid *__change_pid(struct task_struct *task, enum pid_type type,
 			struct pid *new)
 {
 	struct pid **pid_ptr = task_pid_ptr(task, type);
@@ -362,20 +362,33 @@ static void __change_pid(struct task_struct *task, enum pid_type type,
 
 	for (tmp = PIDTYPE_MAX; --tmp >= 0; )
 		if (pid_has_task(pid, tmp))
-			return;
+			return NULL;
 
-	free_pid(pid);
+	return pid;
+}
+
+struct pid *detach_pid_return(struct task_struct *task, enum pid_type type)
+{
+	return __change_pid(task, type, NULL);
 }
 
 void detach_pid(struct task_struct *task, enum pid_type type)
 {
-	__change_pid(task, type, NULL);
+	struct pid *pid;
+
+	pid = detach_pid_return(task, type);
+	if (pid)
+		free_pid(pid);
 }
 
 void change_pid(struct task_struct *task, enum pid_type type,
 		struct pid *pid)
 {
-	__change_pid(task, type, pid);
+	struct pid *opid;
+
+	opid = __change_pid(task, type, pid);
+	if (opid)
+		free_pid(opid);
 	attach_pid(task, type);
 }
 
-- 
2.43.0
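
For readers following the kernel/pid.c part, the refactor can be read as
"return the object to free instead of freeing it, and keep a wrapper with
the old behaviour". Below is a minimal userspace sketch of that shape. It
assumes a plain user count in place of the pid_has_task() scan, and all
names (struct obj, obj_detach_return, obj_detach, obj_free, obj_new) are
hypothetical illustrations, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object standing in for struct pid. */
struct obj {
	int users;
};

static int nr_freed;

static void obj_free(struct obj *o)
{
	nr_freed++;
	free(o);
}

/*
 * Drop one user. Hand the object back only when that was the last user,
 * so the caller decides when (and under which locks) to free it.
 */
static struct obj *obj_detach_return(struct obj *o)
{
	if (--o->users > 0)
		return NULL;
	return o;
}

/* Wrapper preserving the old "detach and free in one step" behaviour. */
static void obj_detach(struct obj *o)
{
	struct obj *last = obj_detach_return(o);

	if (last)
		obj_free(last);
}

static struct obj *obj_new(int users)
{
	struct obj *o = malloc(sizeof(*o));

	assert(o != NULL);
	o->users = users;
	return o;
}
```

Existing callers keep using obj_detach() unchanged, while a caller that
holds a contended lock can use obj_detach_return() and call obj_free()
after unlocking.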