From nobody Sun Feb 8 22:07:57 2026 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 93F05205AB5 for ; Mon, 3 Feb 2025 13:59:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591188; cv=none; b=hKptd2CMWolDjdRFt4uZmQKUvNiF47VBwyuVkPTehTyhJVyipsQRyNQsVUx8oL8y1uH4jg/I6agfStGGWnVoZtcE+JtmuQnSXds8WMm7IN67J3T2GKG+rkHNiuo/N7TNZejg/WauAFKQ01x17aSo274yIOzvNGyu8K8zRhBjMGk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591188; c=relaxed/simple; bh=0XXWylux21wmXCjjhIFOvVe9sS4zaU/MFgX+/8PLhvs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=nLMbTJYkq32GXFYHQvWzO3rMgUY4lS2oGG6YHb03goeHOdOzSJeUNg7hriOPRiW43Sq8w6sKsh1jwq9MR3Dq1uEOdmzw7Mkg+rB1MjYd0a1rYJazAIvUI5M0AMfcVaF6qLv9ui+kW1f7wEmSrHnzxY6+DuBheH1LupmUu7Pg6zA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=E4nICYjn; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=mZ+vOkIr; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="E4nICYjn"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="mZ+vOkIr" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; 
s=2020; t=1738591184; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=A7xCSCJQLcL9ZuyvcbAlBoIzs1uRrQYGWOhagszxr9E=; b=E4nICYjnr19U+nUwGA1oShLP23NPjHkVeP8x8UA2intYCkNs/gGPgNa7+7p6x5tQad/bIp RDQeoIpJCA3fCBsGAtSW8rYTemJMSdjAhsYPCqprYUWawuV+UMT0Wramx8GzYX6asHkpkA X6/ba4P0leV9O/gIulsAasYhzRlehYeaLBPCub8bjbwmyILfbvMKi2MZQ48GeluclV85LD VobrokdlIgkvLrQjU9bJCFvb86GJIuZwEKHSXYXMoXNQt9ktWJFPJP4jXWDrhTTWLkeLK6 yNhjOnUiNVbxrHf0p8E/7RIIdjquUYT1Nye3goXQybjrTy3Q3LYCgvoGedxc8g== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1738591184; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=A7xCSCJQLcL9ZuyvcbAlBoIzs1uRrQYGWOhagszxr9E=; b=mZ+vOkIrHXaQ+FU/c/rN+YBtWmT1eE92CgkPN51vMuUodMeNQQRw3Qg6BsUPmg40ZmKaa1 7oKnVLxrhS/F7iDw== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , kernel test robot , Sebastian Andrzej Siewior Subject: [PATCH v8 01/15] rcuref: Avoid false positive "imbalanced put" report. Date: Mon, 3 Feb 2025 14:59:21 +0100 Message-ID: <20250203135935.440018-2-bigeasy@linutronix.de> In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de> References: <20250203135935.440018-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Thomas Gleixner The kernel test robot reported an "imbalanced put" which turned out to be false positive. 
Consider the following race:

            ref = 0 (via rcuref_init(ref, 1))
 T1                                     T2
 rcuref_put(ref)
 -> atomic_add_negative_release(-1, ref)                                                  # ref -> 0xffffffff
 -> rcuref_put_slowpath(ref)
                                        rcuref_get(ref)
                                        -> atomic_add_negative_relaxed(1, &ref->refcnt)
                                          -> return true;                                 # ref -> 0

                                        rcuref_put(ref)
                                        -> atomic_add_negative_release(-1, ref)           # ref -> 0xffffffff
                                        -> rcuref_put_slowpath()

    -> cnt = atomic_read(&ref->refcnt);                                                   # cnt -> 0xffffffff / RCUREF_NOREF
    -> atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD))                       # ref -> 0xe0000000 / RCUREF_DEAD
       -> return true
                                          -> cnt = atomic_read(&ref->refcnt);             # cnt -> 0xe0000000 / RCUREF_DEAD
                                          -> if (cnt > RCUREF_RELEASED)                   # 0xe0000000 > 0xc0000000
                                            -> WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")

The problem is the additional read in the slow path (after it decremented
to RCUREF_NOREF) which can happen after the counter has been marked
RCUREF_DEAD.

Avoid the false positive by reusing the value returned by the decrement.
Now every "final" put uses RCUREF_NOREF in the slow path and attempts the
final cmpxchg() to RCUREF_DEAD.
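The interleaving above can be replayed deterministically in user space. The following is only a minimal sketch using C11 atomics, not the kernel code: `struct demo_ref`, `demo_dec()`, `demo_slowpath_warns()`, `demo_replay()` and `demo_selfcheck()` are hypothetical names made up for illustration, the slow path is reduced to the two checks that matter for this race, and the zone constants are the values quoted in the race description.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Zone markers, values as quoted in the race description. */
#define RCUREF_NOREF	0xFFFFFFFFU
#define RCUREF_RELEASED	0xC0000000U
#define RCUREF_DEAD	0xE0000000U

/* Hypothetical stand-in for rcuref_t, not the kernel type. */
struct demo_ref {
	atomic_uint refcnt;
};

/* Model of the fast-path decrement: returns the resulting counter value. */
static unsigned int demo_dec(struct demo_ref *ref)
{
	return atomic_fetch_sub_explicit(&ref->refcnt, 1, memory_order_release) - 1;
}

/*
 * Simplified model of the slow path. Returns true if it would report
 * "imbalanced put" for the counter value @cnt it was handed.
 */
static bool demo_slowpath_warns(struct demo_ref *ref, unsigned int cnt)
{
	if (cnt == RCUREF_NOREF) {
		/* Final put: try to mark the counter dead, never warn. */
		atomic_compare_exchange_strong_explicit(&ref->refcnt, &cnt,
							RCUREF_DEAD,
							memory_order_release,
							memory_order_relaxed);
		return false;
	}
	return cnt >= RCUREF_RELEASED;
}

/*
 * Replays the T1/T2 interleaving in a single thread. With @reread set,
 * the slow path re-reads the counter (old scheme); otherwise it reuses
 * the value returned by the decrement (patched scheme).
 * Returns true if T2 would have reported "imbalanced put".
 */
static bool demo_replay(bool reread)
{
	struct demo_ref ref;
	unsigned int t1, t2, c1, c2;

	atomic_init(&ref.refcnt, 0);		/* one reference is stored as 0 */

	t1 = demo_dec(&ref);			/* T1 put: 0 -> 0xffffffff */
	atomic_fetch_add_explicit(&ref.refcnt, 1,
				  memory_order_relaxed);	/* T2 get: -> 0 */
	t2 = demo_dec(&ref);			/* T2 put: 0 -> 0xffffffff */

	/* T1 slow path: sees RCUREF_NOREF and marks the counter dead. */
	c1 = reread ? atomic_load(&ref.refcnt) : t1;
	(void)demo_slowpath_warns(&ref, c1);

	/* T2 slow path: a late re-read observes RCUREF_DEAD. */
	c2 = reread ? atomic_load(&ref.refcnt) : t2;
	return demo_slowpath_warns(&ref, c2);
}

/* Checks both variants; aborts via assert() if the model misbehaves. */
static void demo_selfcheck(void)
{
	assert(demo_replay(true));	/* old scheme: spurious warning */
	assert(!demo_replay(false));	/* patched scheme: no warning */
}
```

In this model the patched variant lets both threads enter the slow path with RCUREF_NOREF; only one cmpxchg succeeds, and the loser simply does not release the object.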
Fixes: ee1ee6db07795 ("atomics: Provide rcuref - scalable reference counting")
Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-lkp/202412311453.9d7636a2-lkp@intel.com
Debugged-by: Sebastian Andrzej Siewior
Reviewed-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Signed-off-by: Sebastian Andrzej Siewior
---
 include/linux/rcuref.h | 9 ++++++---
 lib/rcuref.c           | 5 ++---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/rcuref.h b/include/linux/rcuref.h
index 2c8bfd0f1b6b3..6322d8c1c6b42 100644
--- a/include/linux/rcuref.h
+++ b/include/linux/rcuref.h
@@ -71,27 +71,30 @@ static inline __must_check bool rcuref_get(rcuref_t *ref)
 	return rcuref_get_slowpath(ref);
 }
 
-extern __must_check bool rcuref_put_slowpath(rcuref_t *ref);
+extern __must_check bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt);
 
 /*
  * Internal helper. Do not invoke directly.
  */
 static __always_inline __must_check bool __rcuref_put(rcuref_t *ref)
 {
+	int cnt;
+
 	RCU_LOCKDEP_WARN(!rcu_read_lock_held() && preemptible(),
 			 "suspicious rcuref_put_rcusafe() usage");
 	/*
 	 * Unconditionally decrease the reference count. The saturation and
 	 * dead zones provide enough tolerance for this.
 	 */
-	if (likely(!atomic_add_negative_release(-1, &ref->refcnt)))
+	cnt = atomic_sub_return_release(1, &ref->refcnt);
+	if (likely(cnt >= 0))
 		return false;
 
 	/*
 	 * Handle the last reference drop and cases inside the saturation
 	 * and dead zones.
 	 */
-	return rcuref_put_slowpath(ref);
+	return rcuref_put_slowpath(ref, cnt);
 }
 
 /**
diff --git a/lib/rcuref.c b/lib/rcuref.c
index 97f300eca927c..5bd726b71e393 100644
--- a/lib/rcuref.c
+++ b/lib/rcuref.c
@@ -220,6 +220,7 @@ EXPORT_SYMBOL_GPL(rcuref_get_slowpath);
 /**
  * rcuref_put_slowpath - Slowpath of __rcuref_put()
  * @ref: Pointer to the reference count
+ * @cnt: The resulting value of the fastpath decrement
  *
  * Invoked when the reference count is outside of the valid zone.
* @@ -233,10 +234,8 @@ EXPORT_SYMBOL_GPL(rcuref_get_slowpath); * with a concurrent get()/put() pair. Caller is not allowed to * deconstruct the protected object. */ -bool rcuref_put_slowpath(rcuref_t *ref) +bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt) { - unsigned int cnt =3D atomic_read(&ref->refcnt); - /* Did this drop the last reference? */ if (likely(cnt =3D=3D RCUREF_NOREF)) { /* --=20 2.47.2 From nobody Sun Feb 8 22:07:57 2026 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BEE7F205ADD for ; Mon, 3 Feb 2025 13:59:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591188; cv=none; b=lIq1JQtMFnkuuqoxmI2c9N6ErC1jQafLSLmMQTuomj8YhjpY+e30HXQNZ17dLd395KUSSJ6o6vHavTMlYIBCRMYMyUUMSdBcWTnKxZEtpx/+BIzIXF/9o5aUpbqzM/bj9c5rV7XQ7UtDnVqAcX8vhIXPkCPvp/2U6+i0cl2D90M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591188; c=relaxed/simple; bh=tQ0PK3PxF+Mh6rd67NIZ/6W+nOrnjXuL8xhzSXA7qDk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ikgN8JUloE48B+EVnzIt3K3SFG79x6xGxdt+Q2GAKIob/NNzSRcznd5D7aTE8LVxKFwYS2FXRakpEyGjKO5oLBm5FSS531Mgt/ZHrPGzDd2a7JH968KUVN1YJnGNazD9U1uBhYB/YKq0340IHkvk9KTVQ9l7VDfnHULTLLrEUok= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=IIYlNcfU; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=X2H94ubr; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de 
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="IIYlNcfU"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="X2H94ubr" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1738591185; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lA7nLRMSPeonOj1Bx6zFLlkUTG+IdFmNM3dz0lX9bDA=; b=IIYlNcfUI6QrlaOEvYeV56VgsuxQQtuxGVrnKRXL8gJNBVupG6BKtcik49pQ7U/IMRr3yT jkYMJu6q4myqfSEiN9UpkPnqg5tuhmnyleUw5p+1fhmxa+RtSvapfYOkZBiaXsfVzgJwEC 8yn//Fu7puA+bOugbmEMSTDO2gQZtbzUV2Y0OdfxKwgpfVwz6xYGEB1I9tDPuLnTjouvho rYvbQUW5AcHFSueHYMJMe0eFHzApw3iN2gt5Htt2fKrZqP/ewRQNETkNUkKbsemIX/e3j5 j5UpA5oK8YUp0D7ujvKlvxCpRLtMZxBHfkavdK3gwWRsmu6oYcEi56N0P/JMZQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1738591185; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=lA7nLRMSPeonOj1Bx6zFLlkUTG+IdFmNM3dz0lX9bDA=; b=X2H94ubrQ8HuUlmEJOolniCAcYEDtNAVVIciDNc87m0QUJ58vFzxiOPQgL5HRPf281c1zN FFKFxvdCrdGG9QCQ== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v8 02/15] futex: Create helper function to initialize a hash slot. 
Date: Mon, 3 Feb 2025 14:59:22 +0100
Message-ID: <20250203135935.440018-3-bigeasy@linutronix.de>
In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de>
References: <20250203135935.440018-1-bigeasy@linutronix.de>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Factor out the futex_hash_bucket initialisation into a helper function.
The helper function will be used in a follow-up patch implementing
process-private hash buckets.

Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/futex/core.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index ebdd76b4ecbba..d1d3c7b358b23 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1124,6 +1124,13 @@ void futex_exit_release(struct task_struct *tsk)
 	futex_cleanup_end(tsk, FUTEX_STATE_DEAD);
 }
 
+static void futex_hash_bucket_init(struct futex_hash_bucket *fhb)
+{
+	atomic_set(&fhb->waiters, 0);
+	plist_head_init(&fhb->chain);
+	spin_lock_init(&fhb->lock);
+}
+
 static int __init futex_init(void)
 {
 	unsigned int futex_shift;
@@ -1141,11 +1148,8 @@ static int __init futex_init(void)
 					       futex_hashsize, futex_hashsize);
 	futex_hashsize = 1UL << futex_shift;
 
-	for (i = 0; i < futex_hashsize; i++) {
-		atomic_set(&futex_queues[i].waiters, 0);
-		plist_head_init(&futex_queues[i].chain);
-		spin_lock_init(&futex_queues[i].lock);
-	}
+	for (i = 0; i < futex_hashsize; i++)
+		futex_hash_bucket_init(&futex_queues[i]);
 
 	return 0;
 }
-- 
2.47.2

From nobody Sun Feb 8 22:07:57 2026
Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7314A205ADE for ; Mon, 3 Feb 2025 13:59:47 +0000 (UTC)
Authentication-Results:
smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591189; cv=none; b=DDye5r3PBDF5E40uDyGIOEyaPS7TdyPeTwOMQMUqgFt9Aqvc/EAzFMEmeMEc+MF3F4zv45WQPje4an8kLt1+yapU/QJCPDsQdAW8JfACWMANsrIjipyBqagLkqLeNd8LzmfkQugim2FPNcMPKtV0jHLaBYNMcbpa2mEuy/g1j5g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591189; c=relaxed/simple; bh=Zsxh+2Kx1X8PB8bDy2EbrRymu9UFXPLfnK49Wsfyx30=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TXS5sm6IQem6sdGTmg16bG3RcQqDkmXGqyxjXgK0HBaC3+cZjJmryI23kHkiNopmOTfXjDeJgQZftchTtU6KL2Gs/mhMk2Wo22RHDsIKOzLS6otnqfUGmZeRovDoxrkbgmMq500JPm7dDd8MK0uh9vZQPvXRV7TdRBvey9oAbJY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=X1vxkJTW; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=FY3ZPT+M; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="X1vxkJTW"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="FY3ZPT+M" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1738591185; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TYjRcCIcmYFlAqyivBWzYgmSVXYK4NavtinGURxN1gU=; 
b=X1vxkJTWeaw+l4/9A4p8Fx6HNWAMrHarLJpkRIOQE6H7dTeMcMURzYIklD6/OD11kUQLgW ttTN1SN7X2VBhO8a2k9m+bUhxU3jE2ItEyuQpJkC/xRMVFxFWogdCkej3Uek3egFe5+J6f G6ucDIbEGnb3bFsWTd1GIxvIkMWcKG9VZDpc/H0qfnHpAUwUlJ7rOMpIZDeji4FFk49IHO LnOWfYTOdPv2yG5/KdZa7VqsnFaUOPI+345NMz0dPx2fTZTzG3ZmelbIh85UcX5oj10uzW XCZCGh05LdVhmrshTzoifNNSV65TJjATvJTPnPVNJ7gn0FPjh4YkKEYj3NMNEw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1738591185; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TYjRcCIcmYFlAqyivBWzYgmSVXYK4NavtinGURxN1gU=; b=FY3ZPT+MFS0XgAUKyOzds2gsOrSFCDUJPju5d+r9TQwiTPc7P1Ua0kXVdxFMNRzhyuRoP9 8sKgB8SAPPlgUlAw== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v8 03/15] futex: Add basic infrastructure for local task local hash. Date: Mon, 3 Feb 2025 14:59:23 +0100 Message-ID: <20250203135935.440018-4-bigeasy@linutronix.de> In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de> References: <20250203135935.440018-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The futex hashmap is system wide and shared by random tasks. Each slot is hashed based on its address and VMA. Due to randomized VMAs (and memory allocations) the same logical lock (pointer) can end up in a different hash bucket on each invocation of the application. This in turn means that different applications may share a hash bucket on the first invocation but not on the second an it is not always clear which applications will be involved. 
This can result in high latencies when acquiring the
futex_hash_bucket::lock, especially if the lock owner is limited to a CPU
and cannot be effectively PI boosted.

Introduce a task-local hash map. The hashmap can be allocated via
	prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, 0)

The `0' argument allocates a default number of 16 slots; a higher number
can be specified if desired. The current upper limit is 131072.
The allocated hashmap is used by all threads within a process.
A thread can check if the private map has been allocated via
	prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS);

which returns the current number of slots.

Signed-off-by: Sebastian Andrzej Siewior
---
 include/linux/futex.h      |  20 ++++++++
 include/linux/mm_types.h   |   6 ++-
 include/uapi/linux/prctl.h |   5 ++
 kernel/fork.c              |   2 +
 kernel/futex/core.c        | 101 +++++++++++++++++++++++++++++++++++--
 kernel/sys.c               |   4 ++
 6 files changed, 133 insertions(+), 5 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index b70df27d7e85c..943828db52234 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -77,6 +77,15 @@ void futex_exec_release(struct task_struct *tsk);
 
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 	      u32 __user *uaddr2, u32 val2, u32 val3);
+int futex_hash_prctl(unsigned long arg2, unsigned long arg3);
+int futex_hash_allocate_default(void);
+void futex_hash_free(struct mm_struct *mm);
+
+static inline void futex_mm_init(struct mm_struct *mm)
+{
+	mm->futex_hash_bucket = NULL;
+}
+
 #else
 static inline void futex_init_task(struct task_struct *tsk) { }
 static inline void futex_exit_recursive(struct task_struct *tsk) { }
@@ -88,6 +97,17 @@ static inline long do_futex(u32 __user *uaddr, int op, u32 val,
 {
 	return -EINVAL;
 }
+static inline int futex_hash_prctl(unsigned long arg2, unsigned long arg3)
+{
+	return -EINVAL;
+}
+static inline int futex_hash_allocate_default(void)
+{
+	return 0;
+}
+static inline void futex_hash_free(struct mm_struct *mm) { }
+static
inline void futex_mm_init(struct mm_struct *mm) { } + #endif =20 #endif diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6b27db7f94963..c20f2310d78ca 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -30,6 +30,7 @@ #define INIT_PASID 0 =20 struct address_space; +struct futex_hash_bucket; struct mem_cgroup; =20 /* @@ -936,7 +937,10 @@ struct mm_struct { */ seqcount_t mm_lock_seq; #endif - +#ifdef CONFIG_FUTEX + unsigned int futex_hash_mask; + struct futex_hash_bucket *futex_hash_bucket; +#endif =20 unsigned long hiwater_rss; /* High-watermark of RSS usage */ unsigned long hiwater_vm; /* High-water virtual memory usage */ diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 5c6080680cb27..55b843644c51a 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -353,4 +353,9 @@ struct prctl_mm_map { */ #define PR_LOCK_SHADOW_STACK_STATUS 76 =20 +/* FUTEX hash management */ +#define PR_FUTEX_HASH 77 +# define PR_FUTEX_HASH_SET_SLOTS 1 +# define PR_FUTEX_HASH_GET_SLOTS 2 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/fork.c b/kernel/fork.c index 735405a9c5f32..80ac156adebbf 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1287,6 +1287,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm= , struct task_struct *p, RCU_INIT_POINTER(mm->exe_file, NULL); mmu_notifier_subscriptions_init(mm); init_tlb_flush_pending(mm); + futex_mm_init(mm); #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(CONFIG_SPLIT_PMD_PTLO= CKS) mm->pmd_huge_pte =3D NULL; #endif @@ -1364,6 +1365,7 @@ static inline void __mmput(struct mm_struct *mm) if (mm->binfmt) module_put(mm->binfmt->module); lru_gen_del_mm(mm); + futex_hash_free(mm); mmdrop(mm); } =20 diff --git a/kernel/futex/core.c b/kernel/futex/core.c index d1d3c7b358b23..26328d8072fee 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -39,6 +39,7 @@ #include #include #include +#include =20 #include "futex.h" #include 
"../locking/rtmutex_common.h" @@ -107,18 +108,40 @@ late_initcall(fail_futex_debugfs); =20 #endif /* CONFIG_FAIL_FUTEX */ =20 +static inline bool futex_key_is_private(union futex_key *key) +{ + /* + * Relies on get_futex_key() to set either bit for shared + * futexes -- see comment with union futex_key. + */ + return !(key->both.offset & (FUT_OFF_INODE | FUT_OFF_MMSHARED)); +} + /** - * futex_hash - Return the hash bucket in the global hash + * futex_hash - Return the hash bucket in the global or local hash * @key: Pointer to the futex key for which the hash is calculated * * We hash on the keys returned from get_futex_key (see below) and return = the - * corresponding hash bucket in the global hash. + * corresponding hash bucket in the global hash. If the FUTEX is private a= nd + * a local hash table is privated then this one is used. */ struct futex_hash_bucket *futex_hash(union futex_key *key) { - u32 hash =3D jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4, - key->both.offset); + struct futex_hash_bucket *fhb; + u32 hash; =20 + fhb =3D current->mm->futex_hash_bucket; + if (fhb && futex_key_is_private(key)) { + u32 hash_mask =3D current->mm->futex_hash_mask; + + hash =3D jhash2((u32 *)key, + offsetof(typeof(*key), both.offset) / 4, + key->both.offset); + return &fhb[hash & hash_mask]; + } + hash =3D jhash2((u32 *)key, + offsetof(typeof(*key), both.offset) / 4, + key->both.offset); return &futex_queues[hash & (futex_hashsize - 1)]; } =20 @@ -1131,6 +1154,76 @@ static void futex_hash_bucket_init(struct futex_hash= _bucket *fhb) spin_lock_init(&fhb->lock); } =20 +void futex_hash_free(struct mm_struct *mm) +{ + kvfree(mm->futex_hash_bucket); +} + +static int futex_hash_allocate(unsigned int hash_slots) +{ + struct futex_hash_bucket *fhb; + int i; + + if (current->mm->futex_hash_bucket) + return -EALREADY; + + if (!thread_group_leader(current)) + return -EINVAL; + + if (hash_slots =3D=3D 0) + hash_slots =3D 16; + if (hash_slots < 2) + hash_slots =3D 2; + 
if (hash_slots > 131072) + hash_slots =3D 131072; + if (!is_power_of_2(hash_slots)) + hash_slots =3D rounddown_pow_of_two(hash_slots); + + fhb =3D kvmalloc_array(hash_slots, sizeof(struct futex_hash_bucket), GFP_= KERNEL_ACCOUNT); + if (!fhb) + return -ENOMEM; + + current->mm->futex_hash_mask =3D hash_slots - 1; + + for (i =3D 0; i < hash_slots; i++) + futex_hash_bucket_init(&fhb[i]); + + current->mm->futex_hash_bucket =3D fhb; + return 0; +} + +int futex_hash_allocate_default(void) +{ + return futex_hash_allocate(0); +} + +static int futex_hash_get_slots(void) +{ + if (current->mm->futex_hash_bucket) + return current->mm->futex_hash_mask + 1; + return 0; +} + +int futex_hash_prctl(unsigned long arg2, unsigned long arg3) +{ + int ret; + + switch (arg2) { + case PR_FUTEX_HASH_SET_SLOTS: + ret =3D futex_hash_allocate(arg3); + break; + + case PR_FUTEX_HASH_GET_SLOTS: + ret =3D futex_hash_get_slots(); + break; + + default: + ret =3D -EINVAL; + break; + } + return ret; +} + static int __init futex_init(void) { unsigned int futex_shift; diff --git a/kernel/sys.c b/kernel/sys.c index cb366ff8703af..e509ad9795103 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -52,6 +52,7 @@ #include #include #include +#include =20 #include #include @@ -2811,6 +2812,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, ar= g2, unsigned long, arg3, return -EINVAL; error =3D arch_lock_shadow_stack_status(me, arg2); break; + case PR_FUTEX_HASH: + error =3D futex_hash_prctl(arg2, arg3); + break; default: trace_task_prctl_unknown(option, arg2, arg3, arg4, arg5); error =3D -EINVAL; --=20 2.47.2 From nobody Sun Feb 8 22:07:57 2026 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 95316205E06 for ; Mon, 3 Feb 2025 13:59:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591189; cv=none; b=Fn3CZNtg2QF0oGk+msTmUNVj6VNErfZzYL9eDZ5YBfmUOO4iUE/aduEv3gQjBFFpuJTPNUFiNSalE99PvkAhA9Uo6d2sZgPd73HwYwiQ4MNC0oZ6WYXGcitquaJApODoSZnAG4ZicOPhqBGU2COjAULuaRJoYq3yOgO7N/BELcI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591189; c=relaxed/simple; bh=jx24DITxVlzSBY9t3sScv5dV3Uc64R3fHu3VkFAyLnY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HLvHtgTtHxMrjtLC9P8j9hwaQHyM3+8p0ZK/1bzCLXDjbfsnCrtaAMUtEPtfjuMDX+VFe0xK9eDw3hSa6Qn/X9NQupAqK/kZ9bwq+18+6hnpphqqh3XQ0MC79EcTifOARrxV2XPBXe0n87HjoaNUb7qq4yzm90nhxz7q40y2UIo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=D76qvXYx; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=0SO/MN56; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="D76qvXYx"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="0SO/MN56" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1738591185; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3VkTRA93v/TYQ+iuRG7ZC9MwtD2MiDnMSmSFE4JgP+k=; b=D76qvXYx56p9ojZyje4mApDuUcLwaObOfVX2A3kyauTPNdaQW+7tjReWK9hHcDf3gAoOxQ 
IzrFZW123lB2MJBFv8VMgIavy0+3/9LhReyWaNwlmSgasXJKBEdRWal+8vTHrtNWdt/4lV RGktU0Msph4+D9uMPY4A6aLNSz/j0ygln6rRlDb/P3MvKc7Xl+VgeVbatydVAPojFWNS/F MSrTWTAzY2RWYtxJWKerkc0UM9+9oLhsDcDZrS5fVtRln/eETZYCSIB39n24+lYozeXzxz 3u0XD1sSRmaIXG95GmH1RZkzjJADyljxb3Ah/qDDKHL1f6+lW8HUzrhpX+tICA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1738591185; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3VkTRA93v/TYQ+iuRG7ZC9MwtD2MiDnMSmSFE4JgP+k=; b=0SO/MN56aksC1mFhnQgftu04wGhlKq0G5ad8ARewLh7eDfxmdJvTNh0ELQJBZZtCHVXt8G xnO4j27VXfnouYAw== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v8 04/15] futex: Allow automatic allocation of process wide futex hash. Date: Mon, 3 Feb 2025 14:59:24 +0100 Message-ID: <20250203135935.440018-5-bigeasy@linutronix.de> In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de> References: <20250203135935.440018-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Allocate a default futex hash if a task forks its first thread. 
Signed-off-by: Sebastian Andrzej Siewior --- include/linux/futex.h | 12 ++++++++++++ kernel/fork.c | 24 ++++++++++++++++++++++++ 2 files changed, 36 insertions(+) diff --git a/include/linux/futex.h b/include/linux/futex.h index 943828db52234..bad377c30de5e 100644 --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -86,6 +86,13 @@ static inline void futex_mm_init(struct mm_struct *mm) mm->futex_hash_bucket =3D NULL; } =20 +static inline bool futex_hash_requires_allocation(void) +{ + if (current->mm->futex_hash_bucket) + return false; + return true; +} + #else static inline void futex_init_task(struct task_struct *tsk) { } static inline void futex_exit_recursive(struct task_struct *tsk) { } @@ -108,6 +115,11 @@ static inline int futex_hash_allocate_default(void) static inline void futex_hash_free(struct mm_struct *mm) { } static inline void futex_mm_init(struct mm_struct *mm) { } =20 +static inline bool futex_hash_requires_allocation(void) +{ + return false; +} + #endif =20 #endif diff --git a/kernel/fork.c b/kernel/fork.c index 80ac156adebbf..824cc55d32ece 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2138,6 +2138,15 @@ static void rv_task_fork(struct task_struct *p) #define rv_task_fork(p) do {} while (0) #endif =20 +static bool need_futex_hash_allocate_default(u64 clone_flags) +{ + if ((clone_flags & (CLONE_THREAD | CLONE_VM)) !=3D (CLONE_THREAD | CLONE_= VM)) + return false; + if (!thread_group_empty(current)) + return false; + return futex_hash_requires_allocation(); +} + /* * This creates a new process as a copy of the old one, * but does not actually start it yet. @@ -2515,6 +2524,21 @@ __latent_entropy struct task_struct *copy_process( if (retval) goto bad_fork_cancel_cgroup; =20 + /* + * Allocate a default futex hash for the user process once the first + * thread spawns. 
+ */ + if (need_futex_hash_allocate_default(clone_flags)) { + retval =3D futex_hash_allocate_default(); + if (retval) + goto bad_fork_core_free; + /* + * If we fail beyond this point we don't free the allocated + * futex hash map. We assume that another thread will be created + * and makes use of it. The hash map will be freed once the main + * thread terminates. + */ + } /* * From this point on we must avoid any synchronous user-space * communication until we take the tasklist-lock. In particular, we do --=20 2.47.2 From nobody Sun Feb 8 22:07:57 2026 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B3C9A2063E7 for ; Mon, 3 Feb 2025 13:59:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591192; cv=none; b=OPaf/6y1PmgMoQ5WlVPDUnVPakSr6/P12VSteEd+jNdA32UAHEmlDyrw+FgtP2yAZ9S3WATy50R/9MDMWWJhEbQw2pGTk6oPFQGVTHnvpeH/WbENpenMFbl2RhGAFu+CzRXH5D62F0XEqD1eNoit2+Z+4oLaY0Q/wkIPr9dQHpk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591192; c=relaxed/simple; bh=X5pzFQl3t9Id8Bg4N9Vu+xtYMPukwkaSLyljCfYdojQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=YG84t3UK3xdmfgXmR/BA3RsfdKDW5lr+vaviBRMTfhsujSxSTLg/9v6mAPeqCXhb2Q4bp3KlWIPQyRxLzkiKqd3plku3LZjJpoBgj9gUG5HD703H2eGBlsYT1aBv31p2deJuejClLa2ns08JymziArn09AkVICunChTTf9rvjgE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=qTF8fUgb; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=19w4cvjf; arc=none smtp.client-ip=193.142.43.55 
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="qTF8fUgb"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="19w4cvjf" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1738591186; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2Fblq5lzhRPvME3UqTEQiFKUL0jlzsGIai//ftQqLlw=; b=qTF8fUgbbogNRKtUjm6PiwG8FLHKBa0NMt4mUi66jfERsAGoYbVPibbQyMMWvhUBjZmnCI hjwVRb346ZA5v2kSz8X8M39SluyvekwvOCufRhmGA8q7rXl/MRWm+mA4BeTEaMpp0E/2Cg 0ELtek18NqozhrB+CTBk58T2T+JRTtbwzo8UGmS47NBOnp9aqXSJIJUAzJRTCP95Mvi9Q2 GFrH90yuxGb/eFPbUpXPPqGwcEv8O03Dkh8ycnBF3LPB+lW9xzQQjdRnfFPUrDX1k4I/F7 Yr+j+TThohsl2F3pLlFe68OrYNK0ZLsvPZlNQFh6unL770CKJiVRj6PXT6cF0A== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1738591186; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2Fblq5lzhRPvME3UqTEQiFKUL0jlzsGIai//ftQqLlw=; b=19w4cvjfd6yy0uf7CrSFn47Uu+Iu/R/9ZGUthPVkqVDrbveouATshVn/EMC0kKY4R+4l0u vl/eZ7GJLHxMEqAA== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v8 05/15] futex: Hash only the address for private futexes. 
Date: Mon, 3 Feb 2025 14:59:25 +0100 Message-ID: <20250203135935.440018-6-bigeasy@linutronix.de> In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de> References: <20250203135935.440018-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" futex_hash() passes the whole futex_key to jhash2. The first two members are passed as the first argument and the offset as the "initial value". For private futexes, the mm part is always the same and it is used only within the process. By excluding the mm part from the hash, we reduce the length passed to jhash2 from 4 (16 / 4) to 2 (8 / 4). This avoids the __jhash_mix() part of jhash. The resulting code is smaller and, based on testing, this variant performs as well as the original or slightly better. Signed-off-by: Sebastian Andrzej Siewior --- kernel/futex/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 26328d8072fee..f608cd6ccc032 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -134,8 +134,8 @@ struct futex_hash_bucket *futex_hash(union futex_key *k= ey) if (fhb && futex_key_is_private(key)) { u32 hash_mask =3D current->mm->futex_hash_mask; =20 - hash =3D jhash2((u32 *)key, - offsetof(typeof(*key), both.offset) / 4, + hash =3D jhash2((void *)&key->private.address, + sizeof(key->private.address) / 4, key->both.offset); return &fhb[hash & hash_mask]; } --=20 2.47.2 From nobody Sun Feb 8 22:07:57 2026 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B3C112063E4 for ; Mon, 3 Feb 2025 13:59:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none 
smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591191; cv=none; b=EQy++FE32alsT21iWjUMjWouhLxPS2wwXmoOXSAgW4xoG5KYcK/WaH3HGlbZau6UCBZrU2ZvJwTHKjV1eL2BHQ09k3Gbr/qSpFAW//QlM/HXxUbi7eUh+546DMqMVsWjr2htC3tBenA3tvtVEu8tChtz89O4DvcsGT2zTWHln+Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591191; c=relaxed/simple; bh=kneljZUGZhSn7ACC62SOnRyod0ubWPJeTd+fbse+vOg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DlKtoiQuqcZg/7KcgbScVt+eWRKgpL0yaCnoAC5lnm3JEoDiHLSajq0OdWkyXbZxYr7Ot4BEIOI5dORehE28XMH1fo838d0XMZjBf3QqJREZQlVRJbDNIuDUOOQVlmSVJhZ1MJb4owZfpR49PogVBSv4wSeptAeYwoZiF+oMFfg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=U/GjAUJC; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=sF5jW7kV; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="U/GjAUJC"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="sF5jW7kV" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1738591186; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5VUcnGP0kVAVKvxfl1B22K4KhjbPPRJxpoPJxnoANCs=; b=U/GjAUJC7maOqG180AjoZM9Pg1p45x5RvINI4B2DDujN2wHAuGe5o4sRM657YtWgDYNmpv 
CizAr/n8/Rfj3KdtPzdBXTGpFP6/Qk6rncypm0rKtdw+KDV/OV9DsICzY+3wWhtBDsqfyA ZKZqnRizpbDGTpj+vBFz1rDar0USO2PTNzb0QXPsiqTUxBhCdqqki5nTsWhzlgALGiSai5 sXFCFa+psFSdFUluN9LA2SqAZE7m7oP0zSFKgQiwbfi/8nF3A9c79co4q+cNwmfqVhSh3Y QhVANcgt1PP300CIhl4y4hxVJoGxBJ2tLX4Zq1Bp2NDVvAUigB6lF3SmTNlwSw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1738591186; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5VUcnGP0kVAVKvxfl1B22K4KhjbPPRJxpoPJxnoANCs=; b=sF5jW7kV0BdSYB/gNHknraMz/BPTrgLuwpM5NFmhpy3cKYspZY9AC9WVlj81AKf/QjX7Ee siQHyIRcuPDLTNBA== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v8 06/15] futex: Move private hashing into its own function. Date: Mon, 3 Feb 2025 14:59:26 +0100 Message-ID: <20250203135935.440018-7-bigeasy@linutronix.de> In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de> References: <20250203135935.440018-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The hashing for private futexes is slightly different and will be needed again while moving a futex_q entry to a different hash bucket after the resize. Move the private hashing into its own function. 
Signed-off-by: Sebastian Andrzej Siewior --- kernel/futex/core.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index f608cd6ccc032..fdfc3402278a1 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -117,6 +117,18 @@ static inline bool futex_key_is_private(union futex_ke= y *key) return !(key->both.offset & (FUT_OFF_INODE | FUT_OFF_MMSHARED)); } =20 +static struct futex_hash_bucket *futex_hash_private(union futex_key *key, + struct futex_hash_bucket *fhb, + u32 hash_mask) +{ + u32 hash; + + hash =3D jhash2((void *)&key->private.address, + sizeof(key->private.address) / 4, + key->both.offset); + return &fhb[hash & hash_mask]; +} + /** * futex_hash - Return the hash bucket in the global or local hash * @key: Pointer to the futex key for which the hash is calculated @@ -131,14 +143,9 @@ struct futex_hash_bucket *futex_hash(union futex_key *= key) u32 hash; =20 fhb =3D current->mm->futex_hash_bucket; - if (fhb && futex_key_is_private(key)) { - u32 hash_mask =3D current->mm->futex_hash_mask; + if (fhb && futex_key_is_private(key)) + return futex_hash_private(key, fhb, current->mm->futex_hash_mask); =20 - hash =3D jhash2((void *)&key->private.address, - sizeof(key->private.address) / 4, - key->both.offset); - return &fhb[hash & hash_mask]; - } hash =3D jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4, key->both.offset); --=20 2.47.2 From nobody Sun Feb 8 22:07:57 2026 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B3D4F2063E9 for ; Mon, 3 Feb 2025 13:59:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591191; cv=none; 
b=NvuMzonsXKaSlFzB/uON/v2OXRvixHFR4rTp6lcsq1aAHFq0fafTIeSJVHgZ8BZ+hKv9duyFJVujuyCm3VfIxrPiFRtTZq4HOoZtASajCtq4ygWPBqe5QRnPMKBK+iosUp/NfGM93gq3xcn/yy/3Oa11z8jSNqzTdN/tMz/Uk04= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591191; c=relaxed/simple; bh=pc05xyIdq1ywOatVW1w3Z8hi+B0vbA81OcYBNiDKpGs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=qByX+riXP5OIL5TxmvkjBRe/SbpAHdjAPFtsok/QI69QOGhYJ9VnmfdUEyVxjZ0CkkooyaYn1N1Pyx6uct7NyxYKraaCnEVey6GXQyyfxTLZbQIgz6rQARq5shxfXTHB5guJ80WnCtirbr4ljecf1V/L3JF2piWx5TfTtO9KhrY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=t7SFYKht; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=vDO0yJBW; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="t7SFYKht"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="vDO0yJBW" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1738591186; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ksFDiPENi8cTdcGGbyczRGuxWkcGhFHMgH00kb8pNaA=; b=t7SFYKhtX9f/+40BIJGKkCOIeg2U3kJx9Mo44JSkeE/TbLc7dfOMpCFbLqsYkizYny19vW OL6gKOXjLEZhK6iO/ecBP6RDB198Px7eRGPe/pRkP8cOy5pLz2tcFmIZr2JoebcIFFnMAe Vm5kc7qYxj3cK11m29QxSRs+ehJcxPd5J2UGXIp/KHaALpgcv133FeRsO4iOg8QRH+bfPZ 
3H9JyluaGn6JB11CGQBt/oh6Qi3caeJTVXma5j+OXy0L6ygvuaOsK194F1Rs38jKt0hzpO IkKKFPyk1XPb5tAlQeeh4cwDrQRJNk2lKqHg0ADULJX554DVNdmd4rLaryBv7g== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1738591186; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ksFDiPENi8cTdcGGbyczRGuxWkcGhFHMgH00kb8pNaA=; b=vDO0yJBWyeTNyWWXWd/ltjMcxPjp1VdHXOAEWIUkGoyQ31QnTICPQvBchnvbJS64JqiDtK hVZdUfNWQ3Ssg4Cw== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v8 07/15] futex: Decrease the waiter count before the unlock operation. Date: Mon, 3 Feb 2025 14:59:27 +0100 Message-ID: <20250203135935.440018-8-bigeasy@linutronix.de> In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de> References: <20250203135935.440018-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" To support runtime resizing of the process private hash, it's required to not use the obtained hash bucket once the reference count has been dropped. The reference will be dropped after the unlock of the hash bucket. The number of waiters is decremented after the unlock operation. There is no requirement that this needs to happen after the unlock. The increment happens before acquiring the lock to signal early that there will be a waiter. The waker can avoid blocking on the lock if it is known that there will be no waiter. There is no difference in terms of ordering if the decrement happens before or after the unlock. 
Decrease the waiter count before the unlock operation. Signed-off-by: Sebastian Andrzej Siewior --- kernel/futex/core.c | 2 +- kernel/futex/requeue.c | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index fdfc3402278a1..6d12614dad5e8 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -558,8 +558,8 @@ struct futex_hash_bucket *futex_q_lock(struct futex_q *= q) void futex_q_unlock(struct futex_hash_bucket *hb) __releases(&hb->lock) { - spin_unlock(&hb->lock); futex_hb_waiters_dec(hb); + spin_unlock(&hb->lock); } =20 void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb) diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c index b47bb764b3520..fb69dcdf74da8 100644 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -456,8 +456,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, ret =3D futex_get_value_locked(&curval, uaddr1); =20 if (unlikely(ret)) { - double_unlock_hb(hb1, hb2); futex_hb_waiters_dec(hb2); + double_unlock_hb(hb1, hb2); =20 ret =3D get_user(curval, uaddr1); if (ret) @@ -542,8 +542,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, * waiter::requeue_state is correct. */ case -EFAULT: - double_unlock_hb(hb1, hb2); futex_hb_waiters_dec(hb2); + double_unlock_hb(hb1, hb2); ret =3D fault_in_user_writeable(uaddr2); if (!ret) goto retry; @@ -556,8 +556,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, * exit to complete. * - EAGAIN: The user space value changed. */ - double_unlock_hb(hb1, hb2); futex_hb_waiters_dec(hb2); + double_unlock_hb(hb1, hb2); /* * Handle the case where the owner is in the middle of * exiting. Wait for the exit to complete otherwise @@ -674,9 +674,9 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, put_pi_state(pi_state); =20 out_unlock: + futex_hb_waiters_dec(hb2); double_unlock_hb(hb1, hb2); wake_up_q(&wake_q); - futex_hb_waiters_dec(hb2); return ret ? 
ret : task_count; } =20 --=20 2.47.2 From nobody Sun Feb 8 22:07:57 2026 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B3CF42063E8 for ; Mon, 3 Feb 2025 13:59:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591192; cv=none; b=ayz4DDqL5Mirh259loB6knINfB21pQ0GbeY62DjhJGyZSe6tFh0/YUL1QBy7tfIQlTqewiB8x77YOxG/kzNQTKSOyoz9SINlDUGvFIiTRVdAXYT3Q1689lX7N06aw96d+nmmkjiO5XbtNE69bGP7Tobu9d5M1zUcDlCJkyVTHdU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591192; c=relaxed/simple; bh=hPWB0PH5d6WOsOXdlYzGZPDLiqyfRYlk4+M7EReyrMk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=kb4JMMIINds9tzkyOJnPYtJSxBW0wyM0jFHmcSaaviNIum0J+9Vhan8+LUM7rLImrO/vKKRmfekbJywI7h5uw7/AIKnT0zT5eZL/cdt35YNEVMJ8CBB7KJu7+5rXq39486g8zsJ9+quOHoa5ssT2qTJTcCZtn+L97YpUZL/9Q8w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=qnQjaW8/; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=H9VG0cDm; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="qnQjaW8/"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="H9VG0cDm" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=linutronix.de; s=2020; t=1738591187; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=C4LaPykV7UfafdMEsKb9zzrHTc/iBLj9D5gTxlSJA28=; b=qnQjaW8/NRAwq9oblnsj3x5Iaz/xYydcY9LIrA0TGDCMhwpERXAv8OkHdTpmdEbPIWZHYO balSpn9/h0GmvoC7v85jRI+pAEn/AJOpzwc+mPrv0rLMqlul+Z59C/loDFtYST9Dg1rOKK Wfpi7SEEncRIFhNiOmv+/UCWbesniDZTrzOqIeQGL8fucpZORptENYywemQxT1mjAJ89cW dXemxErATY1MQewEeLi0I0bpt+GUwBHrqgeVLbsrJYg6GDuDfH+6PhI+CwYSAh4clylew5 4TZEW0O9gCfp1NH8dDIJ19lRW6kRNKvA0c6OahQOj8jMxScynoduYEBfRdMwWA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1738591187; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=C4LaPykV7UfafdMEsKb9zzrHTc/iBLj9D5gTxlSJA28=; b=H9VG0cDm6thJHoEiLt2xvWS0qXroRSafP4Kj/tNOeEUwYqM0STL4GP3Qs7HPtlioyA3Q+c wBQSVtHruTfYlbBg== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v8 08/15] futex: Prepare for reference counting of the process private hash end of operation. Date: Mon, 3 Feb 2025 14:59:28 +0100 Message-ID: <20250203135935.440018-9-bigeasy@linutronix.de> In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de> References: <20250203135935.440018-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" To support runtime resizing of the process private hash, it's required to add a reference count to the hash structure. 
The reference count ensures that the hash cannot be resized or freed while a task is operating on it. The reference count will be obtained within futex_hash() and dropped once the hash bucket is unlocked and no longer required for the particular operation (queue, unqueue, wakeup etc.). This is achieved by: - appending _put() to existing functions so it's clear that they also put the hash reference and fixing up the usage sites - providing new helpers, which combine common operations (unlock, put), and using them at the appropriate places - providing a new helper for standalone reference counting and using it in places where the unlock operation needs to be separate. Signed-off-by: Sebastian Andrzej Siewior --- io_uring/futex.c | 2 +- kernel/futex/core.c | 12 ++++++++---- kernel/futex/futex.h | 31 ++++++++++++++++++++----------- kernel/futex/pi.c | 19 ++++++++++--------- kernel/futex/requeue.c | 12 ++++++------ kernel/futex/waitwake.c | 23 ++++++++++++----------- 6 files changed, 57 insertions(+), 42 deletions(-) diff --git a/io_uring/futex.c b/io_uring/futex.c index 3159a2b7eeca1..2141811077b81 100644 --- a/io_uring/futex.c +++ b/io_uring/futex.c @@ -338,7 +338,7 @@ int io_futex_wait(struct io_kiocb *req, unsigned int is= sue_flags) hlist_add_head(&req->hash_node, &ctx->futex_list); io_ring_submit_unlock(ctx, issue_flags); =20 - futex_queue(&ifd->q, hb); + futex_queue_put(&ifd->q, hb); return IOU_ISSUE_SKIP_COMPLETE; } =20 diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 6d12614dad5e8..b54fcb1c6248d 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -152,6 +152,9 @@ struct futex_hash_bucket *futex_hash(union futex_key *k= ey) return &futex_queues[hash & (futex_hashsize - 1)]; } =20 +void futex_hash_put(struct futex_hash_bucket *hb) +{ +} =20 /** * futex_setup_timer - set up the sleeping hrtimer. 
@@ -543,8 +546,8 @@ struct futex_hash_bucket *futex_q_lock(struct futex_q *= q) * Increment the counter before taking the lock so that * a potential waker won't miss a to-be-slept task that is * waiting for the spinlock. This is safe as all futex_q_lock() - * users end up calling futex_queue(). Similarly, for housekeeping, - * decrement the counter at futex_q_unlock() when some error has + * users end up calling futex_queue_put(). Similarly, for housekeeping, + * decrement the counter at futex_q_unlock_put() when some error has * occurred and we don't end up adding the task to the list. */ futex_hb_waiters_inc(hb); /* implies smp_mb(); (A) */ @@ -555,11 +558,12 @@ struct futex_hash_bucket *futex_q_lock(struct futex_q= *q) return hb; } =20 -void futex_q_unlock(struct futex_hash_bucket *hb) +void futex_q_unlock_put(struct futex_hash_bucket *hb) __releases(&hb->lock) { futex_hb_waiters_dec(hb); spin_unlock(&hb->lock); + futex_hash_put(hb); } =20 void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb) @@ -586,7 +590,7 @@ void __futex_queue(struct futex_q *q, struct futex_hash= _bucket *hb) * @q: The futex_q to unqueue * * The q->lock_ptr must not be held by the caller. A call to futex_unqueue= () must - * be paired with exactly one earlier call to futex_queue(). + * be paired with exactly one earlier call to futex_queue_put(). 
* * Return: * - 1 - if the futex_q was still queued (and we removed unqueued it); diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h index 99b32e728c4ad..36627617f7ced 100644 --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -202,6 +202,7 @@ futex_setup_timer(ktime_t *time, struct hrtimer_sleeper= *timeout, int flags, u64 range_ns); =20 extern struct futex_hash_bucket *futex_hash(union futex_key *key); +extern void futex_hash_put(struct futex_hash_bucket *hb); =20 /** * futex_match - Check whether two futex keys are equal @@ -288,23 +289,29 @@ extern void __futex_unqueue(struct futex_q *q); extern void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb); extern int futex_unqueue(struct futex_q *q); =20 +static inline void futex_hb_unlock_put(struct futex_hash_bucket *hb) +{ + spin_unlock(&hb->lock); + futex_hash_put(hb); +} + /** - * futex_queue() - Enqueue the futex_q on the futex_hash_bucket + * futex_queue_put() - Enqueue the futex_q on the futex_hash_bucket * @q: The futex_q to enqueue * @hb: The destination hash bucket * - * The hb->lock must be held by the caller, and is released here. A call to - * futex_queue() is typically paired with exactly one call to futex_unqueu= e(). The - * exceptions involve the PI related operations, which may use futex_unque= ue_pi() - * or nothing if the unqueue is done as part of the wake process and the u= nqueue - * state is implicit in the state of woken task (see futex_wait_requeue_pi= () for - * an example). + * The hb->lock must be held by the caller, and is released here and the r= eference + * on the hb is dropped. A call to futex_queue_put() is typically paired w= ith + * exactly one call to futex_unqueue(). The exceptions involve the PI rela= ted + * operations, which may use futex_unqueue_pi() or nothing if the unqueue = is + * done as part of the wake process and the unqueue state is implicit in t= he + * state of woken task (see futex_wait_requeue_pi() for an example). 
*/ -static inline void futex_queue(struct futex_q *q, struct futex_hash_bucket= *hb) +static inline void futex_queue_put(struct futex_q *q, struct futex_hash_bu= cket *hb) __releases(&hb->lock) { __futex_queue(q, hb); - spin_unlock(&hb->lock); + futex_hb_unlock_put(hb); } =20 extern void futex_unqueue_pi(struct futex_q *q); @@ -350,7 +357,7 @@ static inline int futex_hb_waiters_pending(struct futex= _hash_bucket *hb) } =20 extern struct futex_hash_bucket *futex_q_lock(struct futex_q *q); -extern void futex_q_unlock(struct futex_hash_bucket *hb); +extern void futex_q_unlock_put(struct futex_hash_bucket *hb); =20 =20 extern int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucke= t *hb, @@ -380,11 +387,13 @@ double_lock_hb(struct futex_hash_bucket *hb1, struct = futex_hash_bucket *hb2) } =20 static inline void -double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *= hb2) +double_unlock_hb_put(struct futex_hash_bucket *hb1, struct futex_hash_buck= et *hb2) { spin_unlock(&hb1->lock); if (hb1 !=3D hb2) spin_unlock(&hb2->lock); + futex_hash_put(hb1); + futex_hash_put(hb2); } =20 /* syscalls */ diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c index daea650b16f51..797270228665a 100644 --- a/kernel/futex/pi.c +++ b/kernel/futex/pi.c @@ -217,9 +217,9 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uv= al, /* * We get here with hb->lock held, and having found a * futex_top_waiter(). This means that futex_lock_pi() of said futex_q - * has dropped the hb->lock in between futex_queue() and futex_unqueue_pi= (), - * which in turn means that futex_lock_pi() still has a reference on - * our pi_state. + * has dropped the hb->lock in between futex_queue_put() and + * futex_unqueue_pi(), which in turn means that futex_lock_pi() still + * has a reference on our pi_state. 
* * The waiter holding a reference on @pi_state also protects against * the unlocked put_pi_state() in futex_unlock_pi(), futex_lock_pi() @@ -963,7 +963,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags= , ktime_t *time, int tryl * exit to complete. * - EAGAIN: The user space value changed. */ - futex_q_unlock(hb); + futex_q_unlock_put(hb); /* * Handle the case where the owner is in the middle of * exiting. Wait for the exit to complete otherwise @@ -1083,7 +1083,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int fla= gs, ktime_t *time, int tryl goto out; =20 out_unlock_put_key: - futex_q_unlock(hb); + futex_q_unlock_put(hb); =20 out: if (to) { @@ -1093,7 +1093,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int fla= gs, ktime_t *time, int tryl return ret !=3D -EINTR ? ret : -ERESTARTNOINTR; =20 uaddr_faulted: - futex_q_unlock(hb); + futex_q_unlock_put(hb); =20 ret =3D fault_in_user_writeable(uaddr); if (ret) @@ -1193,7 +1193,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int f= lags) } =20 get_pi_state(pi_state); - spin_unlock(&hb->lock); + futex_hb_unlock_put(hb); =20 /* drops pi_state->pi_mutex.wait_lock */ ret =3D wake_futex_pi(uaddr, uval, pi_state, rt_waiter); @@ -1232,7 +1232,8 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int f= lags) * owner. */ if ((ret =3D futex_cmpxchg_value_locked(&curval, uaddr, uval, 0))) { - spin_unlock(&hb->lock); + futex_hb_unlock_put(hb); + switch (ret) { case -EFAULT: goto pi_faulted; @@ -1252,7 +1253,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int f= lags) ret =3D (curval =3D=3D uval) ? 
0 : -EAGAIN; =20 out_unlock: - spin_unlock(&hb->lock); + futex_hb_unlock_put(hb); return ret; =20 pi_retry: diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c index fb69dcdf74da8..217cec5c8302e 100644 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -58,7 +58,7 @@ enum { }; =20 const struct futex_q futex_q_init =3D { - /* list gets initialized in futex_queue()*/ + /* list gets initialized in futex_queue_put()*/ .wake =3D futex_wake_mark, .key =3D FUTEX_KEY_INIT, .bitset =3D FUTEX_BITSET_MATCH_ANY, @@ -457,7 +457,7 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, =20 if (unlikely(ret)) { futex_hb_waiters_dec(hb2); - double_unlock_hb(hb1, hb2); + double_unlock_hb_put(hb1, hb2); =20 ret =3D get_user(curval, uaddr1); if (ret) @@ -543,7 +543,7 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, */ case -EFAULT: futex_hb_waiters_dec(hb2); - double_unlock_hb(hb1, hb2); + double_unlock_hb_put(hb1, hb2); ret =3D fault_in_user_writeable(uaddr2); if (!ret) goto retry; @@ -557,7 +557,7 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, * - EAGAIN: The user space value changed. */ futex_hb_waiters_dec(hb2); - double_unlock_hb(hb1, hb2); + double_unlock_hb_put(hb1, hb2); /* * Handle the case where the owner is in the middle of * exiting. Wait for the exit to complete otherwise @@ -675,7 +675,7 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, =20 out_unlock: futex_hb_waiters_dec(hb2); - double_unlock_hb(hb1, hb2); + double_unlock_hb_put(hb1, hb2); wake_up_q(&wake_q); return ret ? ret : task_count; } @@ -814,7 +814,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, * shared futexes. 
We need to compare the keys: */ if (futex_match(&q.key, &key2)) { - futex_q_unlock(hb); + futex_q_unlock_put(hb); ret =3D -EINVAL; goto out; } diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c index eb86a7ade06a2..8027195802ca1 100644 --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -195,7 +195,7 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, i= nt nr_wake, u32 bitset) } } =20 - spin_unlock(&hb->lock); + futex_hb_unlock_put(hb); wake_up_q(&wake_q); return ret; } @@ -273,7 +273,7 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flag= s, u32 __user *uaddr2, double_lock_hb(hb1, hb2); op_ret =3D futex_atomic_op_inuser(op, uaddr2); if (unlikely(op_ret < 0)) { - double_unlock_hb(hb1, hb2); + double_unlock_hb_put(hb1, hb2); =20 if (!IS_ENABLED(CONFIG_MMU) || unlikely(op_ret !=3D -EFAULT && op_ret !=3D -EAGAIN)) { @@ -326,7 +326,7 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flag= s, u32 __user *uaddr2, } =20 out_unlock: - double_unlock_hb(hb1, hb2); + double_unlock_hb_put(hb1, hb2); wake_up_q(&wake_q); return ret; } @@ -334,7 +334,7 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flag= s, u32 __user *uaddr2, static long futex_wait_restart(struct restart_block *restart); =20 /** - * futex_wait_queue() - futex_queue() and wait for wakeup, timeout, or sig= nal + * futex_wait_queue() - futex_queue_put() and wait for wakeup, timeout, or= signal * @hb: the futex hash bucket, must be locked by the caller * @q: the futex_q to queue up on * @timeout: the prepared hrtimer_sleeper, or null for no timeout @@ -345,11 +345,11 @@ void futex_wait_queue(struct futex_hash_bucket *hb, s= truct futex_q *q, /* * The task state is guaranteed to be set before another task can * wake it. 
set_current_state() is implemented using smp_store_mb() and - * futex_queue() calls spin_unlock() upon completion, both serializing + * futex_queue_put() calls spin_unlock() upon completion, both serializing * access to the hash list and forcing another memory barrier. */ set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE); - futex_queue(q, hb); + futex_queue_put(q, hb); =20 /* Arm the timer */ if (timeout) @@ -460,11 +460,12 @@ int futex_wait_multiple_setup(struct futex_vector *vs= , int count, int *woken) * next futex. Queue each futex at this moment so hb can * be unlocked. */ - futex_queue(q, hb); + futex_queue_put(q, hb); continue; } =20 - futex_q_unlock(hb); + futex_q_unlock_put(hb); + __set_current_state(TASK_RUNNING); =20 /* @@ -623,7 +624,7 @@ int futex_wait_setup(u32 __user *uaddr, u32 val, unsign= ed int flags, ret =3D futex_get_value_locked(&uval, uaddr); =20 if (ret) { - futex_q_unlock(*hb); + futex_q_unlock_put(*hb); =20 ret =3D get_user(uval, uaddr); if (ret) @@ -636,7 +637,7 @@ int futex_wait_setup(u32 __user *uaddr, u32 val, unsign= ed int flags, } =20 if (uval !=3D val) { - futex_q_unlock(*hb); + futex_q_unlock_put(*hb); ret =3D -EWOULDBLOCK; } =20 @@ -664,7 +665,7 @@ int __futex_wait(u32 __user *uaddr, unsigned int flags,= u32 val, if (ret) return ret; =20 - /* futex_queue and wait for wakeup, timeout, or a signal. */ + /* futex_queue_put() and wait for wakeup, timeout, or a signal. */ futex_wait_queue(hb, &q, to); =20 /* If we were woken (and unqueued), we succeeded, whatever. 
*/ --=20 2.47.2 From nobody Sun Feb 8 22:07:57 2026 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B3B9C2063E3 for ; Mon, 3 Feb 2025 13:59:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591192; cv=none; b=V4UUkbXREdVas1CGyjbXHeLIuRYcEuvfXkBWFvZJexgQhbCqgWsjNfFG3yfFcYrh2n+su5YcJSKuC7T7v94bSLklxPQBioOOiQWDUIFSG1NEcvS+gOhJt8W8gKx4ziHUR/iTSLCU5mhdLwFRPaeHUU/c3VgNL0eVPUWqVIFJVkQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591192; c=relaxed/simple; bh=kahJmMMuAdiqLs2rA484rWrUQ4drU0Xpd4m7TvAUT1o=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fH0KBsafVapXgCuPrsVvbDp5+fPP1unqJaXZ6v9IoX4sL6ZWGWdYgH9lopXSrW8Fwqmiqkm8arM8FcR/c3PXRx+n3FYtNqA6auIhbtOMy2J8qh/3pgnyApKX5615XjlCWWa6bNRkaApDDWu6orCu1ly4k0oUXfRRHwVcOTe58cQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=yuZ8RJ1W; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=rQMLuiHu; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="yuZ8RJ1W"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="rQMLuiHu" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=linutronix.de; s=2020; t=1738591187; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pI2Iy/DyMXZ+IzQy/zbBGtQShR8KXyUna/Rvr4+O/+A=; b=yuZ8RJ1Wu56vHMyEPhds4tgsSYiAxQQZii8LDBbbyPmd6UeVUL8f606F2W/41pRVEKb3hp 8Ppf/jHdBxURXSTp75GTvz/ctM+w9BIZggQCY74fU4L3UYa5n+91OGH5CnWbbnY1lJecU7 5AvP2NSLEedvLtLTOwwG6T5/Ceg1kNVCvDuVpjrbY93mDS9kxmEEZXmt6V91DHT4w5HLeV 4sfkGVnbdrNkO2uT+DyK3R2EB+EXg8GHWOvhKnlStQS40sLPUr9uAG+x7t6E2uqJJOILTH EbruP38D6Y5pObTIfO4+biPozL130BmGtwc+TalmA31PK18GiUFdV1kM3tF6+w== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1738591187; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=pI2Iy/DyMXZ+IzQy/zbBGtQShR8KXyUna/Rvr4+O/+A=; b=rQMLuiHuRC3XIye6WcwOXXrW/X/DADGX6SqdZmkjtcfbvtYewEVchBElM58NJd3O/jL+sg 52pu7yhYqXCkaNDw== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v8 09/15] futex: Re-evaluate the hash bucket after dropping the lock Date: Mon, 3 Feb 2025 14:59:29 +0100 Message-ID: <20250203135935.440018-10-bigeasy@linutronix.de> In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de> References: <20250203135935.440018-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In futex_requeue() and futex_wake_op() the hash bucket lock is dropped in the failure paths for handling page faults and other error scenarios. 
After that the code jumps back to retry_private which relocks the hash bucket[s] under the assumption that the hash bucket pointer which was retrieved via futex_hash() is still valid. With resizable private hash buckets, that assumption is no longer true as the waiters can be moved to a larger hash in the meantime. Move the retry_private label above the hashing function to handle this correctly. Signed-off-by: Sebastian Andrzej Siewior --- kernel/futex/requeue.c | 2 +- kernel/futex/waitwake.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c index 217cec5c8302e..31ec543e7fdb3 100644 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -443,10 +443,10 @@ int futex_requeue(u32 __user *uaddr1, unsigned int fl= ags1, if (requeue_pi && futex_match(&key1, &key2)) return -EINVAL; =20 +retry_private: hb1 =3D futex_hash(&key1); hb2 =3D futex_hash(&key2); =20 -retry_private: futex_hb_waiters_inc(hb2); double_lock_hb(hb1, hb2); =20 diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c index 8027195802ca1..98409ba9605a8 100644 --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -266,10 +266,10 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int fl= ags, u32 __user *uaddr2, if (unlikely(ret !=3D 0)) return ret; =20 +retry_private: hb1 =3D futex_hash(&key1); hb2 =3D futex_hash(&key2); =20 -retry_private: double_lock_hb(hb1, hb2); op_ret =3D futex_atomic_op_inuser(op, uaddr2); if (unlikely(op_ret < 0)) { --=20 2.47.2 From nobody Sun Feb 8 22:07:57 2026 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D7AB22063F3 for ; Mon, 3 Feb 2025 13:59:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org;
s=arc-20240116; t=1738591192; cv=none; b=kt6soQvGuvzX6be8TurJRJ/8lN4IbHAv5nNlw0S1kczHkQFl2GxYnEGK/3x0eCogMnENLKgMcBrAYAm3ZGdBE9TN3fNeXEgfGv7NKMHe/lUeRvwNm0tO6Tys+Dt/q1eT5eFgJ/Bj+xMQO02nfs46Wz83g8XdR62wvKiUjPYudXs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591192; c=relaxed/simple; bh=yubPvaEZK0vxwclCtPELJ5sI0axGk6ORRtP2sesGHLI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=FB4EAg+jMWVrebObLhHSCk52qurjnslPCeFEwxi+oMTUH4gzWostn9SVhHUGxN9b6+XYm/UQFnC4xN8SG9ruVpho7K13vqGOH+QvFo+IbwhDCFWl46l8LvqKrnFeVhElgWlGuPNS1bX4FXqIQ8o63B5F9DLlZBvh0p8O2Me27k0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=EEK0NM6J; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=Z32FA2NU; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="EEK0NM6J"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Z32FA2NU" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1738591187; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=angzKPUQ8WH0gNbUxVDwpfIDs5MjiCkNcpeGZGsQfvQ=; b=EEK0NM6JQStGUhcosxLsW8pbA9q98AzsmDTNOlVT6fclBbldgtjmf6U3HDB9QpEhQduN5t X6sg79D5V92cHyxFcjskViTHABorbmStBkG0Mp+kAJaGzlHAINqHaRMHSdTpmg2QOeXZoB 
VmRfAJz3lvZuFwW09vFdb2bizg17u1/A9gURufTxOXjfnsKg4H4odzWlDZQwfbTWOHtlNM idTABv0F06YKQ/VbFKeNGlj/8GoTh0Gp1cEq05wnps1KuCiSf9gen8pZzQSKMM1ptLYYVW FLO0oTI08kDei5zY8F5vn0IEYOox74VYFPWWSHkoNNUzx7t9afz8IJ+Ou0rLTA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1738591187; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=angzKPUQ8WH0gNbUxVDwpfIDs5MjiCkNcpeGZGsQfvQ=; b=Z32FA2NUAwb9bDZPt5mZUwlrSJbejI+62IWj1T+1OqIfbxXsAFu/w7TJZPL8ghoS0/NG9y 9hh3seNdnm9AIxAg== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v8 10/15] futex: Introduce futex_get_locked_hb(). Date: Mon, 3 Feb 2025 14:59:30 +0100 Message-ID: <20250203135935.440018-11-bigeasy@linutronix.de> In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de> References: <20250203135935.440018-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" futex_lock_pi() and __fixup_pi_state_owner() acquire the futex_q::lock_ptr without holding a reference, assuming the previously obtained hash bucket and the assigned lock_ptr are still valid. This isn't the case once the private hash can be resized and becomes invalid after the reference drop. Introduce futex_get_locked_hb() to lock the hash bucket recorded in futex_q::lock_ptr. The lock pointer is read in an RCU section to ensure that it does not go away if the hash bucket has been replaced and the old pointer has been observed. After locking, the pointer needs to be compared to check whether it changed.
If so, the hash bucket has been replaced, the user has been moved to the new one, and lock_ptr has been updated. The lock operation needs to be redone in this case. Once the lock_ptr is unchanged, we can return the futex_hash_bucket it belongs to as the locked hash bucket for the caller. This is important because we don't own a reference, so the hash bucket is only valid as long as we hold the lock. This means that if the local hash is resized, this (old) hash bucket remains valid as long as we hold the lock, because all users need to be moved to the new hash bucket and have their lock_ptr updated. The task performing the resize will block. A special case is an early return in futex_lock_pi() (due to signal or timeout) and a successful futex_wait_requeue_pi(). In both cases a valid futex_q::lock_ptr is expected (and its matching hash bucket) but since the waiter has been removed from the hash this can no longer be guaranteed. Therefore, before the waiter is removed, a reference is acquired which is later dropped by the waiter to avoid a resize. Add futex_get_locked_hb() and use it. Acquire an additional reference in requeue_pi_wake_futex() and futex_unlock_pi() while the futex_q is removed, denote this extra reference in futex_q::drop_hb_ref and let the waiter drop the reference in this case. Signed-off-by: Sebastian Andrzej Siewior --- kernel/futex/core.c | 41 +++++++++++++++++++++++++++++++++++++++++ kernel/futex/futex.h | 4 +++- kernel/futex/pi.c | 17 +++++++++++++++-- kernel/futex/requeue.c | 16 ++++++++++++---- 4 files changed, 71 insertions(+), 7 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index b54fcb1c6248d..b0fb2b10a387c 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -156,6 +156,17 @@ void futex_hash_put(struct futex_hash_bucket *hb) { } =20 +/** + * futex_hash_get - Get an additional reference for the local hash. + * @hb: ptr to the private local hash.
+ * + * Obtain an additional reference for the already obtained hash bucket. The + * caller must already own a reference. + */ +void futex_hash_get(struct futex_hash_bucket *hb) +{ +} + /** * futex_setup_timer - set up the sleeping hrtimer. * @time: ptr to the given timeout value @@ -639,6 +650,36 @@ int futex_unqueue(struct futex_q *q) return ret; } =20 +struct futex_hash_bucket *futex_get_locked_hb(struct futex_q *q) +{ + struct futex_hash_bucket *hb; + spinlock_t *lock_ptr; + + /* + * See futex_unqueue() why lock_ptr can change. + */ + guard(rcu)(); +retry: + lock_ptr =3D READ_ONCE(q->lock_ptr); + spin_lock(lock_ptr); + + if (unlikely(lock_ptr !=3D q->lock_ptr)) { + spin_unlock(lock_ptr); + goto retry; + } + + hb =3D container_of(lock_ptr, struct futex_hash_bucket, lock); + /* + * The caller needs to either hold a reference on the hash (to ensure + * that the hash is not resized) _or_ be enqueued on the hash. This + * ensures that futex_q::lock_ptr is updated while moved to the new + * hash during resize. + * Once the hash bucket is locked the resize operation, which might be + * in progress, will block on the lock. + */ + return hb; +} + /* * PI futexes can not be requeued and must remove themselves from the hash * bucket. The hash bucket lock (i.e. lock_ptr) is held.
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h index 36627617f7ced..5b33016648ecd 100644 --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -182,6 +182,7 @@ struct futex_q { union futex_key *requeue_pi_key; u32 bitset; atomic_t requeue_state; + bool drop_hb_ref; #ifdef CONFIG_PREEMPT_RT struct rcuwait requeue_wait; #endif @@ -196,12 +197,13 @@ enum futex_access { =20 extern int get_futex_key(u32 __user *uaddr, unsigned int flags, union fute= x_key *key, enum futex_access rw); - +extern struct futex_hash_bucket *futex_get_locked_hb(struct futex_q *q); extern struct hrtimer_sleeper * futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout, int flags, u64 range_ns); =20 extern struct futex_hash_bucket *futex_hash(union futex_key *key); +extern void futex_hash_get(struct futex_hash_bucket *hb); extern void futex_hash_put(struct futex_hash_bucket *hb); =20 /** diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c index 797270228665a..c83fe575f954a 100644 --- a/kernel/futex/pi.c +++ b/kernel/futex/pi.c @@ -806,7 +806,7 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, st= ruct futex_q *q, break; } =20 - spin_lock(q->lock_ptr); + futex_get_locked_hb(q); raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); =20 /* @@ -922,6 +922,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags= , ktime_t *time, int tryl struct rt_mutex_waiter rt_waiter; struct futex_hash_bucket *hb; struct futex_q q =3D futex_q_init; + bool fast_path_ref_put =3D false; DEFINE_WAKE_Q(wake_q); int res, ret; =20 @@ -988,6 +989,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags= , ktime_t *time, int tryl ret =3D rt_mutex_futex_trylock(&q.pi_state->pi_mutex); /* Fixup the trylock return value: */ ret =3D ret ? 
0 : -EWOULDBLOCK; + fast_path_ref_put =3D true; goto no_block; } =20 @@ -1014,6 +1016,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int fla= gs, ktime_t *time, int tryl */ raw_spin_lock_irq(&q.pi_state->pi_mutex.wait_lock); spin_unlock(q.lock_ptr); + futex_hash_put(hb); /* * __rt_mutex_start_proxy_lock() unconditionally enqueues the @rt_waiter * such that futex_unlock_pi() is guaranteed to observe the waiter when @@ -1060,7 +1063,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int fla= gs, ktime_t *time, int tryl * spinlock/rtlock (which might enqueue its own rt_waiter) and fix up * the */ - spin_lock(q.lock_ptr); + hb =3D futex_get_locked_hb(&q); /* * Waiter is unqueued. */ @@ -1080,6 +1083,10 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int fl= ags, ktime_t *time, int tryl =20 futex_unqueue_pi(&q); spin_unlock(q.lock_ptr); + if (q.drop_hb_ref) + futex_hash_put(hb); + if (fast_path_ref_put) + futex_hash_put(hb); goto out; =20 out_unlock_put_key: @@ -1187,6 +1194,12 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int = flags) */ rt_waiter =3D rt_mutex_top_waiter(&pi_state->pi_mutex); if (!rt_waiter) { + /* + * Acquire a reference for the leaving waiter to ensure + * valid futex_q::lock_ptr. + */ + futex_hash_get(hb); + top_waiter->drop_hb_ref =3D true; __futex_unqueue(top_waiter); raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); goto retry_hb; diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c index 31ec543e7fdb3..167204e856fec 100644 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -231,7 +231,12 @@ void requeue_pi_wake_futex(struct futex_q *q, union fu= tex_key *key, =20 WARN_ON(!q->rt_waiter); q->rt_waiter =3D NULL; - + /* + * Acquire a reference for the waiter to ensure valid + * futex_q::lock_ptr. 
+ */ + futex_hash_get(hb); + q->drop_hb_ref =3D true; q->lock_ptr =3D &hb->lock; =20 /* Signal locked state to the waiter */ @@ -825,7 +830,8 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, switch (futex_requeue_pi_wakeup_sync(&q)) { case Q_REQUEUE_PI_IGNORE: /* The waiter is still on uaddr1 */ - spin_lock(&hb->lock); + hb =3D futex_get_locked_hb(&q); + ret =3D handle_early_requeue_pi_wakeup(hb, &q, to); spin_unlock(&hb->lock); break; @@ -833,7 +839,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, case Q_REQUEUE_PI_LOCKED: /* The requeue acquired the lock */ if (q.pi_state && (q.pi_state->owner !=3D current)) { - spin_lock(q.lock_ptr); + futex_get_locked_hb(&q); ret =3D fixup_pi_owner(uaddr2, &q, true); /* * Drop the reference to the pi state which the @@ -860,7 +866,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, if (ret && !rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter)) ret =3D 0; =20 - spin_lock(q.lock_ptr); + futex_get_locked_hb(&q); debug_rt_mutex_free_waiter(&rt_waiter); /* * Fixup the pi_state owner and possibly acquire the lock if we @@ -892,6 +898,8 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, default: BUG(); } + if (q.drop_hb_ref) + futex_hash_put(hb); =20 out: if (to) { --=20 2.47.2 From nobody Sun Feb 8 22:07:57 2026 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D5A9C2066DB for ; Mon, 3 Feb 2025 13:59:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591193; cv=none; b=E15G1jn45/oEFD3+PfJHujOKKRfNA83gj/9MqTw8Mzeqxus3hVn/xlfg2IZIvzThQvUVlrYXy6XrR/XTcDIkwGanOFLfnnBIIAi0PyaHUgi4mibt0NooezRI+jhhzgG6F/XQ4PnyXXpB0Tti/SRNiCk2dWCKIfti2/BZMQEXiJ8= 
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591193; c=relaxed/simple; bh=1uYS9pNiTULyeIwt9yeX0lUYb6w6XJseygnHhzzqUQE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=A3mOcCrmaYnR9dz6Hlu12jb/NgaCwDyGW4Ut9YKvPaePtwyTxfZgl4bQ7lr5OHUZ7TgX0uX2ki6XXK6NXCzdSvhV90vvAg04uIZcWGaNfUQJJ/B/4q1UfCb6B1BXPtz7c3kqA5OWlmJaSnJDLleYyHFcS3ryh4eCp+asWWTFQBI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=VPlObGbP; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=wE4mSvr0; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="VPlObGbP"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="wE4mSvr0" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1738591187; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=J1iTOZ4lQ3H+o3YagXn1wXAwJ9BFbmsamUspj/Lezec=; b=VPlObGbPyKbNT5ckC/Dy/dGcpSteFcTZ0NJ5DtQzMyUZy0eGSzPr71MGF/ds14SVu5b36h YAGVHg0Ap4I2E0cZTaYRyWsFigl9aC0Jfj5axyO6TRjvjs+V2+DeVGOYFxTTMOw4leexUn fQzAL1F44LAFXn696ZNAULTUeeDBugWuFigZlPPS9I1Q3ubLmQ5udqmr6Y3X0sR5ovuWtj cstVetwF+oshF6dzMne0jq/djEc6obD2H876FHA3LnX6Z6plWv5igmm1VIeUKooYUpYiSh cevLWu3pBT/Wh3wudTv7MkUf2OgHF6LQGgYOXtsZR3BUI7Z1rZ6e4hiOxgkiaA== DKIM-Signature: v=1; a=ed25519-sha256; 
c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1738591188; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=J1iTOZ4lQ3H+o3YagXn1wXAwJ9BFbmsamUspj/Lezec=; b=wE4mSvr0IBnKhJdv4DWD07gqIUKvHbo0xkjsMRvWtpP9bTE9JlqNi5Luk8gTnxn/Q2gLs5 mRGFSB/qLMSrS4CA== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v8 11/15] futex: Acquire a hash reference in futex_wait_multiple_setup(). Date: Mon, 3 Feb 2025 14:59:31 +0100 Message-ID: <20250203135935.440018-12-bigeasy@linutronix.de> In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de> References: <20250203135935.440018-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" futex_wait_multiple_setup() changes task_struct::__state to !TASK_RUNNING and then enqueues on multiple futexes. Every futex_q_lock() acquires a reference on the global hash which is dropped later. If a rehash is in progress then the loop will block on mm_struct::futex_hash_bucket for the rehash to complete and this will lose the previously set task_struct::__state. Acquire a reference on the local hash to avoid blocking on mm_struct::futex_hash_bucket.
Signed-off-by: Sebastian Andrzej Siewior --- kernel/futex/core.c | 10 ++++++++++ kernel/futex/futex.h | 2 ++ kernel/futex/waitwake.c | 21 +++++++++++++++++++-- 3 files changed, 31 insertions(+), 2 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index b0fb2b10a387c..7130019aa9ec6 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -129,6 +129,11 @@ static struct futex_hash_bucket *futex_hash_private(un= ion futex_key *key, return &fhb[hash & hash_mask]; } =20 +struct futex_private_hash *futex_get_private_hash(void) +{ + return NULL; +} + /** * futex_hash - Return the hash bucket in the global or local hash * @key: Pointer to the futex key for which the hash is calculated @@ -152,6 +157,11 @@ struct futex_hash_bucket *futex_hash(union futex_key *= key) return &futex_queues[hash & (futex_hashsize - 1)]; } =20 +bool futex_put_private_hash(struct futex_private_hash *hb_p) +{ + return false; +} + void futex_hash_put(struct futex_hash_bucket *hb) { } diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h index 5b33016648ecd..d6fa6f663d9ad 100644 --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -205,6 +205,8 @@ futex_setup_timer(ktime_t *time, struct hrtimer_sleeper= *timeout, extern struct futex_hash_bucket *futex_hash(union futex_key *key); extern void futex_hash_get(struct futex_hash_bucket *hb); extern void futex_hash_put(struct futex_hash_bucket *hb); +extern struct futex_private_hash *futex_get_private_hash(void); +extern bool futex_put_private_hash(struct futex_private_hash *hb_p); =20 /** * futex_match - Check whether two futex keys are equal diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c index 98409ba9605a8..3d57b47692f57 100644 --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -395,7 +395,7 @@ int futex_unqueue_multiple(struct futex_vector *v, int = count) } =20 /** - * futex_wait_multiple_setup - Prepare to wait and enqueue multiple futexes + * __futex_wait_multiple_setup - Prepare to wait and 
enqueue multiple fute= xes * @vs: The futex list to wait on * @count: The size of the list * @woken: Index of the last woken futex, if any. Used to notify the @@ -410,7 +410,7 @@ int futex_unqueue_multiple(struct futex_vector *v, int = count) * - 0 - Success * - <0 - -EFAULT, -EWOULDBLOCK or -EINVAL */ -int futex_wait_multiple_setup(struct futex_vector *vs, int count, int *wok= en) +static int __futex_wait_multiple_setup(struct futex_vector *vs, int count,= int *woken) { struct futex_hash_bucket *hb; bool retry =3D false; @@ -499,6 +499,23 @@ int futex_wait_multiple_setup(struct futex_vector *vs,= int count, int *woken) return 0; } =20 +int futex_wait_multiple_setup(struct futex_vector *vs, int count, int *wok= en) +{ + struct futex_private_hash *hb_p; + int ret; + + /* + * Assume to have a private futex and acquire a reference on the private + * hash to avoid blocking on mm_struct::futex_hash_bucket during rehash + * after changing the task state. + */ + hb_p =3D futex_get_private_hash(); + ret =3D __futex_wait_multiple_setup(vs, count, woken); + if (hb_p) + futex_put_private_hash(hb_p); + return ret; +} + /** * futex_sleep_multiple - Check sleeping conditions and sleep * @vs: List of futexes to wait for --=20 2.47.2 From nobody Sun Feb 8 22:07:57 2026 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6D1CC2066C4 for ; Mon, 3 Feb 2025 13:59:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738591193; cv=none; b=OF7o3xg1GFH3sFyjNh9Z+3c45/sunYF7rVZd9ueR7XldfWTU90XohtiEqtLiOBMcRhpnYG7bR99Q0tHmM6HfiOO8qofqm74aQtbWYH5r70kJ3k6i/jCAM4xnVWBTWezDQnxG7aaxAgOPLuwLBRB/8TlZy70KlNshZJ+N7CNYA/o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1738591193; c=relaxed/simple; bh=VoSmxeN2eLq6/w0zDkByDRft/KLzIzymjynPM3ccWN4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=XEw/fHrNcNrvFRLBQrlxP9c9IyeUJHIvk0DXY+z3Sn8kG8YLD/7JfTwZ/VCK+q/321pLIhnx+5GB+jy6BnSQ8nZUsaz1fa8GS4vLirvSwSgzLJlPSeEckw0XWthi+BGqKXSCNUYKJPW05cqubWi3Tho9Jp3sxxD4bVZ0tHCyXPU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=PXvWuAQY; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=EX7BxbO/; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="PXvWuAQY"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="EX7BxbO/" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1738591188; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yA05YBQY+kTjPGvwPHZb21SXWyVuKH+DzKhEH0ESsSc=; b=PXvWuAQYt5i/Z2bRqyQGGsWtk+tpnWFDHClj6lIjCGtilaH1fkMiTW3W4JmWsXbOr6SGDR LIMiwSV/ORIKrvrjpESvwsohzFOc5//pjmiBBMT+OS5PqUHu4xRqKgNJzTSFX3eHKKnlOI ZcXtuGnBhIlgKCQF8johiJIn43Gu5zKUP/2HbtEJNVqJ29UqQXCB4AwKLexVyZSdtV3HmM MH5tC0el4bCDIbv13s4MjPPhoY55QDJ+0PUX+WmFYbod4Bq2N29B8k6QouKVZW6016by6L QqryE01+X7+DUrDFr2GUd3ucAXt0Z6GZwfuuWAa1mlidu2wKQErVGrTdWNnKKw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1738591188; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yA05YBQY+kTjPGvwPHZb21SXWyVuKH+DzKhEH0ESsSc=; b=EX7BxbO/sw35hNx7JA8nF6wK7dcEl/Muf42EU6Dw1XEMKXHOi8yO61ivGNxpiZUNBmbM9n E3Y9z7vqNCKc9PBw== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v8 12/15] futex: Allow to re-allocate the private local hash. Date: Mon, 3 Feb 2025 14:59:32 +0100 Message-ID: <20250203135935.440018-13-bigeasy@linutronix.de> In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de> References: <20250203135935.440018-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The mm_struct::futex_hash_lock guards the futex_hash_bucket assignment/ replacement. The futex_hash_allocate()/ PR_FUTEX_HASH_SET_SLOTS operation can now be invoked at runtime and resize an already existing internal private futex_hash_bucket to another size. The reallocation is based on an idea by Thomas Gleixner: The initial allocation of struct futex_private_hash sets the reference count to one. Every user acquires a reference on the local hash before using it and drops it after it enqueued itself on the hash bucket. There is no reference held while the task is scheduled out while waiting for the wake up. The resize allocates a new struct futex_private_hash and drops the initial reference under the mm_struct::futex_hash_lock. If the reference drop results in destruction of the object then users currently queued on the local hash will be requeued on the new local hash. 
At the end mm_struct::futex_phash is updated, the old pointer is RCU freed and the mutex is dropped. If the reference drop does not result in destruction of the object then the new pointer is saved as mm_struct::futex_phash_new. In this case replacement is delayed. The user dropping the last reference is not always the best choice to perform the replacement. For instance futex_wait_queue() drops the reference after changing its task state which will also be modified while the futex_hash_lock is acquired. Therefore the replacement is delayed to the task acquiring a reference on the current local hash. This scheme keeps the requirement that all waiters/ wakers of the same addr= ess block always on the same futex_hash_bucket::lock. Signed-off-by: Sebastian Andrzej Siewior --- include/linux/futex.h | 5 +- include/linux/mm_types.h | 7 +- kernel/futex/core.c | 252 +++++++++++++++++++++++++++++++++++---- kernel/futex/futex.h | 1 + kernel/futex/requeue.c | 5 + kernel/futex/waitwake.c | 4 +- 6 files changed, 243 insertions(+), 31 deletions(-) diff --git a/include/linux/futex.h b/include/linux/futex.h index bad377c30de5e..bfb38764bac7a 100644 --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -83,12 +83,13 @@ void futex_hash_free(struct mm_struct *mm); =20 static inline void futex_mm_init(struct mm_struct *mm) { - mm->futex_hash_bucket =3D NULL; + rcu_assign_pointer(mm->futex_phash, NULL); + mutex_init(&mm->futex_hash_lock); } =20 static inline bool futex_hash_requires_allocation(void) { - if (current->mm->futex_hash_bucket) + if (current->mm->futex_phash) return false; return true; } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index c20f2310d78ca..19abbc870e0a9 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -30,7 +30,7 @@ #define INIT_PASID 0 =20 struct address_space; -struct futex_hash_bucket; +struct futex_private_hash; struct mem_cgroup; =20 /* @@ -938,8 +938,9 @@ struct mm_struct { seqcount_t mm_lock_seq; #endif 
#ifdef CONFIG_FUTEX - unsigned int futex_hash_mask; - struct futex_hash_bucket *futex_hash_bucket; + struct mutex futex_hash_lock; + struct futex_private_hash __rcu *futex_phash; + struct futex_private_hash *futex_phash_new; #endif =20 unsigned long hiwater_rss; /* High-watermark of RSS usage */ diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 7130019aa9ec6..e1bf43f7eb277 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -40,6 +40,7 @@ #include #include #include +#include =20 #include "futex.h" #include "../locking/rtmutex_common.h" @@ -56,6 +57,14 @@ static struct { #define futex_queues (__futex_data.queues) #define futex_hashsize (__futex_data.hashsize) =20 +struct futex_private_hash { + rcuref_t users; + unsigned int hash_mask; + struct rcu_head rcu; + bool initial_ref_dropped; + bool released; + struct futex_hash_bucket queues[]; +}; =20 /* * Fault injections for futexes. @@ -129,9 +138,122 @@ static struct futex_hash_bucket *futex_hash_private(u= nion futex_key *key, return &fhb[hash & hash_mask]; } =20 +static void futex_rehash_current_users(struct futex_private_hash *old, + struct futex_private_hash *new) +{ + struct futex_hash_bucket *hb_old, *hb_new; + unsigned int slots =3D old->hash_mask + 1; + u32 hash_mask =3D new->hash_mask; + unsigned int i; + + for (i =3D 0; i < slots; i++) { + struct futex_q *this, *tmp; + + hb_old =3D &old->queues[i]; + + spin_lock(&hb_old->lock); + plist_for_each_entry_safe(this, tmp, &hb_old->chain, list) { + + plist_del(&this->list, &hb_old->chain); + futex_hb_waiters_dec(hb_old); + + WARN_ON_ONCE(this->lock_ptr !=3D &hb_old->lock); + + hb_new =3D futex_hash_private(&this->key, new->queues, hash_mask); + futex_hb_waiters_inc(hb_new); + /* + * The new pointer isn't published yet but an already + * moved user can be unqueued due to timeout or signal. 
+			 */
+			spin_lock_nested(&hb_new->lock, SINGLE_DEPTH_NESTING);
+			plist_add(&this->list, &hb_new->chain);
+			this->lock_ptr = &hb_new->lock;
+			spin_unlock(&hb_new->lock);
+		}
+		spin_unlock(&hb_old->lock);
+	}
+}
+
+static void futex_assign_new_hash(struct futex_private_hash *hb_p_new,
+				  struct mm_struct *mm)
+{
+	bool drop_init_ref = hb_p_new != NULL;
+	struct futex_private_hash *hb_p;
+
+	if (!hb_p_new) {
+		hb_p_new = mm->futex_phash_new;
+		mm->futex_phash_new = NULL;
+	}
+	/* Someone was quicker, the current mask is valid */
+	if (!hb_p_new)
+		return;
+
+	hb_p = rcu_dereference_check(mm->futex_phash,
+				     lockdep_is_held(&mm->futex_hash_lock));
+	if (hb_p) {
+		if (hb_p->hash_mask >= hb_p_new->hash_mask) {
+			/* It was increased again while we were waiting */
+			kvfree(hb_p_new);
+			return;
+		}
+		/*
+		 * If the caller started the resize then the initial reference
+		 * needs to be dropped. If the object can not be deconstructed
+		 * we save hb_p_new for later and ensure the reference counter
+		 * is not dropped again.
+		 */
+		if (drop_init_ref &&
+		    (hb_p->initial_ref_dropped || !futex_put_private_hash(hb_p))) {
+			mm->futex_phash_new = hb_p_new;
+			hb_p->initial_ref_dropped = true;
+			return;
+		}
+		if (!READ_ONCE(hb_p->released)) {
+			mm->futex_phash_new = hb_p_new;
+			return;
+		}
+
+		futex_rehash_current_users(hb_p, hb_p_new);
+	}
+	rcu_assign_pointer(mm->futex_phash, hb_p_new);
+	kvfree_rcu(hb_p, rcu);
+}
+
 struct futex_private_hash *futex_get_private_hash(void)
 {
-	return NULL;
+	struct mm_struct *mm = current->mm;
+	/*
+	 * Ideally we don't loop. If there is a replacement in progress
+	 * then a new private hash is already prepared and a reference can't be
+	 * obtained once the last user dropped its reference.
+	 * In that case we block on mm_struct::futex_hash_lock and either have
+	 * to perform the replacement or wait while someone else is doing the
+	 * job.
+	 * Either way, on the second iteration we acquire a reference on the
+	 * new private hash or loop again because a new replacement has been
+	 * requested.
+	 */
+again:
+	scoped_guard(rcu) {
+		struct futex_private_hash *hb_p;
+
+		hb_p = rcu_dereference(mm->futex_phash);
+		if (!hb_p)
+			return NULL;
+
+		if (rcuref_get(&hb_p->users))
+			return hb_p;
+	}
+	scoped_guard(mutex, &current->mm->futex_hash_lock)
+		futex_assign_new_hash(NULL, mm);
+	goto again;
+}
+
+static struct futex_private_hash *futex_get_private_hb(union futex_key *key)
+{
+	if (!futex_key_is_private(key))
+		return NULL;
+
+	return futex_get_private_hash();
 }
 
 /**
@@ -144,12 +266,12 @@ struct futex_private_hash *futex_get_private_hash(void)
  */
 struct futex_hash_bucket *futex_hash(union futex_key *key)
 {
-	struct futex_hash_bucket *fhb;
+	struct futex_private_hash *hb_p;
 	u32 hash;
 
-	fhb = current->mm->futex_hash_bucket;
-	if (fhb && futex_key_is_private(key))
-		return futex_hash_private(key, fhb, current->mm->futex_hash_mask);
+	hb_p = futex_get_private_hb(key);
+	if (hb_p)
+		return futex_hash_private(key, hb_p->queues, hb_p->hash_mask);
 
 	hash = jhash2((u32 *)key,
 		      offsetof(typeof(*key), both.offset) / 4,
@@ -159,11 +281,25 @@ struct futex_hash_bucket *futex_hash(union futex_key *key)
 
 bool futex_put_private_hash(struct futex_private_hash *hb_p)
 {
-	return false;
+	bool released;
+
+	guard(preempt)();
+	released = rcuref_put_rcusafe(&hb_p->users);
+	if (released)
+		WRITE_ONCE(hb_p->released, true);
+	return released;
 }
 
 void futex_hash_put(struct futex_hash_bucket *hb)
 {
+	struct futex_private_hash *hb_p;
+
+	if (hb->hb_slot == 0)
+		return;
+	hb_p = container_of(hb, struct futex_private_hash,
+			    queues[hb->hb_slot - 1]);
+
+	futex_put_private_hash(hb_p);
 }
 
 /**
@@ -175,6 +311,14 @@ void futex_hash_put(struct futex_hash_bucket *hb)
  */
 void futex_hash_get(struct futex_hash_bucket *hb)
 {
+	struct futex_private_hash *hb_p;
+
+	if (hb->hb_slot == 0)
+		return;
+	hb_p = container_of(hb, struct futex_private_hash,
+			    queues[hb->hb_slot - 1]);
+
+	WARN_ON_ONCE(!rcuref_get(&hb_p->users));
 }
 
 /**
@@ -622,6 +766,8 @@ int futex_unqueue(struct futex_q *q)
 	spinlock_t *lock_ptr;
 	int ret = 0;
 
+	/* RCU so lock_ptr is not going away during locking. */
+	guard(rcu)();
 	/* In the common case we don't take the spinlock, which is nice. */
 retry:
 	/*
@@ -1032,10 +1178,22 @@ static void compat_exit_robust_list(struct task_struct *curr)
 static void exit_pi_state_list(struct task_struct *curr)
 {
 	struct list_head *next, *head = &curr->pi_state_list;
+	struct futex_private_hash *hb_p;
 	struct futex_pi_state *pi_state;
 	struct futex_hash_bucket *hb;
 	union futex_key key = FUTEX_KEY_INIT;
 
+	/*
+	 * The mutex mm_struct::futex_hash_lock might be acquired.
+	 */
+	might_sleep();
+	/*
+	 * Ensure the hash remains stable (no resize) during the while loop
+	 * below. The hb pointer is acquired under the pi_lock so we can't block
+	 * on the mutex.
+	 */
+	WARN_ON(curr != current);
+	hb_p = futex_get_private_hash();
 	/*
 	 * We are a ZOMBIE and nobody can enqueue itself on
 	 * pi_state_list anymore, but we have to be careful
@@ -1061,6 +1219,7 @@ static void exit_pi_state_list(struct task_struct *curr)
 		if (!refcount_inc_not_zero(&pi_state->refcount)) {
 			raw_spin_unlock_irq(&curr->pi_lock);
 			cpu_relax();
+			futex_hash_put(hb);
 			raw_spin_lock_irq(&curr->pi_lock);
 			continue;
 		}
@@ -1076,7 +1235,7 @@ static void exit_pi_state_list(struct task_struct *curr)
 		if (head->next != next) {
 			/* retain curr->pi_lock for the loop invariant */
 			raw_spin_unlock(&pi_state->pi_mutex.wait_lock);
-			spin_unlock(&hb->lock);
+			futex_hb_unlock_put(hb);
 			put_pi_state(pi_state);
 			continue;
 		}
@@ -1088,7 +1247,7 @@ static void exit_pi_state_list(struct task_struct *curr)
 
 		raw_spin_unlock(&curr->pi_lock);
 		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
-		spin_unlock(&hb->lock);
+		futex_hb_unlock_put(hb);
 
 		rt_mutex_futex_unlock(&pi_state->pi_mutex);
 		put_pi_state(pi_state);
@@ -1096,6 +1255,8 @@ static void exit_pi_state_list(struct task_struct *curr)
 		raw_spin_lock_irq(&curr->pi_lock);
 	}
 	raw_spin_unlock_irq(&curr->pi_lock);
+	if (hb_p)
+		futex_put_private_hash(hb_p);
 }
 #else
 static inline void exit_pi_state_list(struct task_struct *curr) { }
@@ -1209,8 +1370,9 @@ void futex_exit_release(struct task_struct *tsk)
 	futex_cleanup_end(tsk, FUTEX_STATE_DEAD);
 }
 
-static void futex_hash_bucket_init(struct futex_hash_bucket *fhb)
+static void futex_hash_bucket_init(struct futex_hash_bucket *fhb, unsigned int slot)
 {
+	fhb->hb_slot = slot;
 	atomic_set(&fhb->waiters, 0);
 	plist_head_init(&fhb->chain);
 	spin_lock_init(&fhb->lock);
@@ -1218,20 +1380,33 @@ static void futex_hash_bucket_init(struct futex_hash_bucket *fhb)
 
 void futex_hash_free(struct mm_struct *mm)
 {
-	kvfree(mm->futex_hash_bucket);
+	struct futex_private_hash *hb_p;
+
+	kvfree(mm->futex_phash_new);
+	/*
+	 * The mm_struct belonging to the task is removed so all threads, that
+	 * ever accessed the private hash, are gone and the pointer can be
+	 * accessed directly (omitting a RCU-read section).
+	 * Since there can not be a thread holding a reference to the private
+	 * hash we free it immediately.
+	 */
+	hb_p = rcu_access_pointer(mm->futex_phash);
+	if (!hb_p)
+		return;
+
+	if (!hb_p->initial_ref_dropped && WARN_ON(!futex_put_private_hash(hb_p)))
+		return;
+
+	kvfree(hb_p);
 }
 
 static int futex_hash_allocate(unsigned int hash_slots)
 {
-	struct futex_hash_bucket *fhb;
+	struct futex_private_hash *hb_p, *hb_tofree = NULL;
+	struct mm_struct *mm = current->mm;
+	size_t alloc_size;
 	int i;
 
-	if (current->mm->futex_hash_bucket)
-		return -EALREADY;
-
-	if (!thread_group_leader(current))
-		return -EINVAL;
-
 	if (hash_slots == 0)
 		hash_slots = 16;
 	if (hash_slots < 2)
@@ -1241,16 +1416,39 @@ static int futex_hash_allocate(unsigned int hash_slots)
 	if (!is_power_of_2(hash_slots))
 		hash_slots = rounddown_pow_of_two(hash_slots);
 
-	fhb = kvmalloc_array(hash_slots, sizeof(struct futex_hash_bucket), GFP_KERNEL_ACCOUNT);
-	if (!fhb)
+	if (unlikely(check_mul_overflow(hash_slots, sizeof(struct futex_hash_bucket),
+					&alloc_size)))
 		return -ENOMEM;
 
-	current->mm->futex_hash_mask = hash_slots - 1;
+	if (unlikely(check_add_overflow(alloc_size, sizeof(struct futex_private_hash),
+					&alloc_size)))
+		return -ENOMEM;
+
+	hb_p = kvmalloc(alloc_size, GFP_KERNEL_ACCOUNT);
+	if (!hb_p)
+		return -ENOMEM;
+
+	rcuref_init(&hb_p->users, 1);
+	hb_p->initial_ref_dropped = false;
+	hb_p->released = false;
+	hb_p->hash_mask = hash_slots - 1;
 
 	for (i = 0; i < hash_slots; i++)
-		futex_hash_bucket_init(&fhb[i]);
+		futex_hash_bucket_init(&hb_p->queues[i], i + 1);
 
-	current->mm->futex_hash_bucket = fhb;
+	scoped_guard(mutex, &mm->futex_hash_lock) {
+		if (mm->futex_phash_new) {
+			if (mm->futex_phash_new->hash_mask <= hb_p->hash_mask) {
+				hb_tofree = mm->futex_phash_new;
+			} else {
+				hb_tofree = hb_p;
+				hb_p = mm->futex_phash_new;
+			}
+			mm->futex_phash_new = NULL;
+		}
+		futex_assign_new_hash(hb_p, mm);
+	}
+	kvfree(hb_tofree);
 	return 0;
 }
 
@@ -1261,8 +1459,12 @@ int futex_hash_allocate_default(void)
 
 static int futex_hash_get_slots(void)
 {
-	if (current->mm->futex_hash_bucket)
-		return current->mm->futex_hash_mask + 1;
+	struct futex_private_hash *hb_p;
+
+	guard(rcu)();
+	hb_p = rcu_dereference(current->mm->futex_phash);
+	if (hb_p)
+		return hb_p->hash_mask + 1;
 	return 0;
 }
 
@@ -1304,7 +1506,7 @@ static int __init futex_init(void)
 	futex_hashsize = 1UL << futex_shift;
 
 	for (i = 0; i < futex_hashsize; i++)
-		futex_hash_bucket_init(&futex_queues[i]);
+		futex_hash_bucket_init(&futex_queues[i], 0);
 
 	return 0;
 }
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index d6fa6f663d9ad..127f333d3b0d5 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -115,6 +115,7 @@ static inline bool should_fail_futex(bool fshared)
  */
 struct futex_hash_bucket {
 	atomic_t waiters;
+	unsigned int hb_slot;
 	spinlock_t lock;
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index 167204e856fec..eb506428c9574 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -87,6 +87,11 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
 		futex_hb_waiters_inc(hb2);
 		plist_add(&q->list, &hb2->chain);
 		q->lock_ptr = &hb2->lock;
+		/*
+		 * hb1 and hb2 belong to the same futex_private_hash
+		 * because if we managed to get a reference on hb1 then it
+		 * can't be replaced. Therefore we avoid put(hb1)+get(hb2) here.
+		 */
 	}
 	q->key = *key2;
 }
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index 3d57b47692f57..7dd75781c9b6b 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -173,8 +173,10 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 	hb = futex_hash(&key);
 
 	/* Make sure we really have tasks to wakeup */
-	if (!futex_hb_waiters_pending(hb))
+	if (!futex_hb_waiters_pending(hb)) {
+		futex_hash_put(hb);
 		return ret;
+	}
 
 	spin_lock(&hb->lock);
 
-- 
2.47.2

From nobody Sun Feb 8 22:07:57 2026
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli,
 Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long,
 Sebastian Andrzej Siewior
Subject: [PATCH v8 13/15] futex: Resize local futex hash table based on number of threads.
Date: Mon, 3 Feb 2025 14:59:33 +0100
Message-ID: <20250203135935.440018-14-bigeasy@linutronix.de>
In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de>
References: <20250203135935.440018-1-bigeasy@linutronix.de>

Automatically size the local hash based on the number of threads. The
logic uses 4 * number-of-threads buckets, rounded up to a power of two and
clamped between 16 and futex_hashsize (the size of the system wide hash).
On CONFIG_BASE_SMALL configs the suggested size is always 2.

Signed-off-by: Sebastian Andrzej Siewior
---
 include/linux/futex.h | 12 ------------
 kernel/fork.c         |  4 +---
 kernel/futex/core.c   | 34 +++++++++++++++++++++++++++++++---
 3 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index bfb38764bac7a..6469aeb76a150 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -87,13 +87,6 @@ static inline void futex_mm_init(struct mm_struct *mm)
 	mutex_init(&mm->futex_hash_lock);
 }
 
-static inline bool futex_hash_requires_allocation(void)
-{
-	if (current->mm->futex_phash)
-		return false;
-	return true;
-}
-
 #else
 static inline void futex_init_task(struct task_struct *tsk) { }
 static inline void futex_exit_recursive(struct task_struct *tsk) { }
@@ -116,11 +109,6 @@ static inline int futex_hash_allocate_default(void)
 static inline void futex_hash_free(struct mm_struct *mm) { }
 static inline void futex_mm_init(struct mm_struct *mm) { }
 
-static inline bool futex_hash_requires_allocation(void)
-{
-	return false;
-}
-
 #endif
 
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index 824cc55d32ece..5e15e5b24f289 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2142,9 +2142,7 @@ static bool need_futex_hash_allocate_default(u64 clone_flags)
 {
 	if ((clone_flags & (CLONE_THREAD | CLONE_VM)) != (CLONE_THREAD | CLONE_VM))
 		return false;
-	if (!thread_group_empty(current))
-		return false;
-	return futex_hash_requires_allocation();
+	return true;
 }
 
 /*
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index e1bf43f7eb277..9a12dccb1c995 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1411,8 +1411,8 @@ static int futex_hash_allocate(unsigned int hash_slots)
 		hash_slots = 16;
 	if (hash_slots < 2)
 		hash_slots = 2;
-	if (hash_slots > 131072)
-		hash_slots = 131072;
+	if (hash_slots > futex_hashsize)
+		hash_slots = futex_hashsize;
 	if (!is_power_of_2(hash_slots))
 		hash_slots = rounddown_pow_of_two(hash_slots);
 
@@ -1454,7 +1454,35 @@ static int futex_hash_allocate(unsigned int hash_slots)
 
 int futex_hash_allocate_default(void)
 {
-	return futex_hash_allocate(0);
+	unsigned int threads, buckets, current_buckets = 0;
+	struct futex_private_hash *hb_p;
+
+	if (!current->mm)
+		return 0;
+
+	scoped_guard(rcu) {
+		threads = get_nr_threads(current);
+		hb_p = rcu_dereference(current->mm->futex_phash);
+		if (hb_p)
+			current_buckets = hb_p->hash_mask + 1;
+	}
+
+	if (IS_ENABLED(CONFIG_BASE_SMALL)) {
+		buckets = 2;
+
+	} else {
+		/*
+		 * The default allocation will remain within
+		 *   16 <= threads * 4 <= global hash size
+		 */
+		buckets = roundup_pow_of_two(4 * threads);
+		buckets = max(buckets, 16);
+		buckets = min(buckets, futex_hashsize);
+	}
+	if (current_buckets >= buckets)
+		return 0;
+
+	return futex_hash_allocate(buckets);
 }
 
 static int futex_hash_get_slots(void)
-- 
2.47.2

From nobody Sun Feb 8 22:07:57 2026
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli,
 Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long,
 Sebastian Andrzej Siewior
Subject: [PATCH v8 14/15] futex: Use a hashmask instead of hashsize.
Date: Mon, 3 Feb 2025 14:59:34 +0100
Message-ID: <20250203135935.440018-15-bigeasy@linutronix.de>
In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de>
References: <20250203135935.440018-1-bigeasy@linutronix.de>

The global hash uses futex_hashsize to save the number of hash buckets
that have been allocated during system boot. On each futex_hash()
invocation one is subtracted from this number to get the mask. This can
be optimized by saving the mask directly, avoiding the subtraction on
each futex_hash() invocation.

Rename futex_hashsize to futex_hashmask and save the mask of the
allocated hash map.
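
For illustration only (not part of the patch), a minimal user-space sketch
of the two lookup variants described above. The helper names is_pow2(),
bucket_by_size(), mask_from_size() and bucket_by_mask() are hypothetical;
the kernel code indexes futex_queues[] with a jhash2() result instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true if n is a non-zero power of two. */
static int is_pow2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Before: the size is stored and the mask is derived on every lookup. */
static unsigned long bucket_by_size(uint32_t hash, unsigned long hashsize)
{
	assert(is_pow2(hashsize));
	return hash & (hashsize - 1);
}

/* After: the mask is computed once, at init time, and stored. */
static unsigned long mask_from_size(unsigned long hashsize)
{
	assert(is_pow2(hashsize));
	return hashsize - 1;
}

/* The lookup then needs a single AND with the stored mask. */
static unsigned long bucket_by_mask(uint32_t hash, unsigned long hashmask)
{
	return hash & hashmask;
}
```

Both variants select the same bucket as long as the size is a power of
two. Code that still needs the size can recover it as mask + 1.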

Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/futex/core.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 9a12dccb1c995..9e32bfa7ba4ab 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -52,10 +52,10 @@
  */
 static struct {
 	struct futex_hash_bucket *queues;
-	unsigned long hashsize;
+	unsigned long hashmask;
 } __futex_data __read_mostly __aligned(2*sizeof(long));
 #define futex_queues (__futex_data.queues)
-#define futex_hashsize (__futex_data.hashsize)
+#define futex_hashmask (__futex_data.hashmask)
 
 struct futex_private_hash {
 	rcuref_t users;
@@ -276,7 +276,7 @@ struct futex_hash_bucket *futex_hash(union futex_key *key)
 	hash = jhash2((u32 *)key,
 		      offsetof(typeof(*key), both.offset) / 4,
 		      key->both.offset);
-	return &futex_queues[hash & (futex_hashsize - 1)];
+	return &futex_queues[hash & futex_hashmask];
 }
 
 bool futex_put_private_hash(struct futex_private_hash *hb_p)
@@ -1411,8 +1411,8 @@ static int futex_hash_allocate(unsigned int hash_slots)
 		hash_slots = 16;
 	if (hash_slots < 2)
 		hash_slots = 2;
-	if (hash_slots > futex_hashsize)
-		hash_slots = futex_hashsize;
+	if (hash_slots > futex_hashmask + 1)
+		hash_slots = futex_hashmask + 1;
 	if (!is_power_of_2(hash_slots))
 		hash_slots = rounddown_pow_of_two(hash_slots);
 
@@ -1477,7 +1477,7 @@ int futex_hash_allocate_default(void)
 		 */
 		buckets = roundup_pow_of_two(4 * threads);
 		buckets = max(buckets, 16);
-		buckets = min(buckets, futex_hashsize);
+		buckets = min(buckets, futex_hashmask + 1);
 	}
 	if (current_buckets >= buckets)
 		return 0;
@@ -1518,24 +1518,25 @@ int futex_hash_prctl(unsigned long arg2, unsigned long arg3)
 
 static int __init futex_init(void)
 {
+	unsigned long i, hashsize;
 	unsigned int futex_shift;
-	unsigned long i;
 
 #ifdef CONFIG_BASE_SMALL
-	futex_hashsize = 16;
+	hashsize = 16;
 #else
-	futex_hashsize = roundup_pow_of_two(256 * num_possible_cpus());
+	hashsize = roundup_pow_of_two(256 * num_possible_cpus());
 #endif
 
 	futex_queues = alloc_large_system_hash("futex", sizeof(*futex_queues),
-					       futex_hashsize, 0, 0,
+					       hashsize, 0, 0,
 					       &futex_shift, NULL,
-					       futex_hashsize, futex_hashsize);
-	futex_hashsize = 1UL << futex_shift;
+					       hashsize, hashsize);
+	hashsize = 1UL << futex_shift;
 
-	for (i = 0; i < futex_hashsize; i++)
+	for (i = 0; i < hashsize; i++)
 		futex_hash_bucket_init(&futex_queues[i], 0);
 
+	futex_hashmask = hashsize - 1;
 	return 0;
 }
 core_initcall(futex_init);
-- 
2.47.2

From nobody Sun Feb 8 22:07:57 2026
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli,
 Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long,
 Sebastian Andrzej Siewior
Subject: [PATCH v8 15/15] futex: Avoid allocating new local hash if there is something pending.
Date: Mon, 3 Feb 2025 14:59:35 +0100
Message-ID: <20250203135935.440018-16-bigeasy@linutronix.de>
In-Reply-To: <20250203135935.440018-1-bigeasy@linutronix.de>
References: <20250203135935.440018-1-bigeasy@linutronix.de>

If a new private hash has been prepared but not yet applied then creating
a new thread will lead to the allocation of another private hash which
will be freed later.

This is an optional optimisation to avoid allocating a new private hash
if the already prepared private hash satisfies the current requirement.

Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/futex/core.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 9e32bfa7ba4ab..7f61f734ca5d9 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1424,6 +1424,13 @@ static int futex_hash_allocate(unsigned int hash_slots)
 					&alloc_size)))
 		return -ENOMEM;
 
+	if (mm->futex_phash_new) {
+		scoped_guard(mutex, &mm->futex_hash_lock) {
+			if (mm->futex_phash_new && mm->futex_phash_new->hash_mask >= hash_slots - 1)
+				return 0;
+		}
+	}
+
 	hb_p = kvmalloc(alloc_size, GFP_KERNEL_ACCOUNT);
 	if (!hb_p)
 		return -ENOMEM;
-- 
2.47.2
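
For illustration only (not part of the series), the check added by the last
patch can be sketched in user space as follows. struct pending_hash and
pending_hash_suffices() are hypothetical stand-ins for
mm_struct::futex_phash_new and the condition in futex_hash_allocate(); the
locking is left out and hash_slots is assumed to be a power of two of at
least 2, as it is after the clamping in the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the prepared, not yet applied private hash. */
struct pending_hash {
	unsigned int hash_mask;	/* number of buckets - 1 */
};

/*
 * True if the prepared hash already provides at least hash_slots buckets,
 * in which case a further allocation would only be freed again later.
 */
static bool pending_hash_suffices(const struct pending_hash *pending,
				  unsigned int hash_slots)
{
	assert(hash_slots >= 2);
	return pending != NULL && pending->hash_mask >= hash_slots - 1;
}
```

A prepared hash with 32 buckets (hash_mask 31) therefore covers requests
for 16 or 32 buckets, while a request for 64 buckets still allocates.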