From nobody Thu Dec 18 09:47:24 2025
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v6 01/15] futex: Create helper function to initialize a hash slot.
Date: Wed, 18 Dec 2024 12:09:39 +0100
Message-ID: <20241218111618.268028-2-bigeasy@linutronix.de>
In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de>
References: <20241218111618.268028-1-bigeasy@linutronix.de>

Factor out the futex_hash_bucket initialisation into a helper function.
The helper function will be used in a follow-up patch implementing
process private hash buckets.

Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/futex/core.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index ebdd76b4ecbba..d1d3c7b358b23 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -1124,6 +1124,13 @@ void futex_exit_release(struct task_struct *tsk)
 	futex_cleanup_end(tsk, FUTEX_STATE_DEAD);
 }
 
+static void futex_hash_bucket_init(struct futex_hash_bucket *fhb)
+{
+	atomic_set(&fhb->waiters, 0);
+	plist_head_init(&fhb->chain);
+	spin_lock_init(&fhb->lock);
+}
+
 static int __init futex_init(void)
 {
 	unsigned int futex_shift;
@@ -1141,11 +1148,8 @@ static int __init futex_init(void)
 					       futex_hashsize, futex_hashsize);
 	futex_hashsize = 1UL << futex_shift;
 
-	for (i = 0; i < futex_hashsize; i++) {
-		atomic_set(&futex_queues[i].waiters, 0);
-		plist_head_init(&futex_queues[i].chain);
-		spin_lock_init(&futex_queues[i].lock);
-	}
+	for (i = 0; i < futex_hashsize; i++)
+		futex_hash_bucket_init(&futex_queues[i]);
 
 	return 0;
 }
-- 
2.45.2

From nobody Thu Dec 18 09:47:24 2025
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v6 02/15] futex: Add basic infrastructure for local task local hash.
Date: Wed, 18 Dec 2024 12:09:40 +0100
Message-ID: <20241218111618.268028-3-bigeasy@linutronix.de>
In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de>
References: <20241218111618.268028-1-bigeasy@linutronix.de>

The futex hashmap is system wide and shared by random tasks. Each slot
is hashed based on its address and VMA. Due to randomized VMAs (and
memory allocations) the same logical lock (pointer) can end up in a
different hash bucket on each invocation of the application. This in
turn means that different applications may share a hash bucket on the
first invocation but not on the second, and it is not always clear which
applications will be involved. This can result in high latencies to
acquire the futex_hash_bucket::lock, especially if the lock owner is
limited to a CPU and cannot be effectively PI boosted.

Introduce a task local hash map. The hashmap can be allocated via

	prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, 0)

The `0' argument allocates a default number of 16 slots; a higher number
can be specified if desired. The current upper limit is 131072.
The allocated hashmap is used by all threads within a process.
A thread can check if the private map has been allocated via

	prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS);

which returns the current number of slots.

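
For illustration only, a user-space sketch of how the interface described
above might be called. The three PR_FUTEX_HASH* values are the ones this
patch adds to include/uapi/linux/prctl.h; futex_hash_effective_slots(),
futex_hash_set_slots() and futex_hash_get_slots() are hypothetical helper
names that are not part of the series. The first helper mirrors the
clamping and rounding that futex_hash_allocate() applies in this version of
the patch and may not match later revisions.

```c
#include <sys/prctl.h>

/* Values as added by this patch; another kernel may assign other numbers. */
#ifndef PR_FUTEX_HASH
# define PR_FUTEX_HASH			77
# define PR_FUTEX_HASH_SET_SLOTS	1
# define PR_FUTEX_HASH_GET_SLOTS	2
#endif

#define FUTEX_HASH_DEFAULT_SLOTS	16u
#define FUTEX_HASH_MIN_SLOTS		2u
#define FUTEX_HASH_MAX_SLOTS		131072u

/*
 * Hypothetical helper: the slot count the kernel side of this patch would
 * end up with for a given request (0 selects the default, the value is
 * clamped to [2, 131072] and rounded down to a power of two).
 */
static inline unsigned int futex_hash_effective_slots(unsigned int requested)
{
	unsigned int slots = requested;
	unsigned int pow2 = 1;

	if (slots == 0)
		slots = FUTEX_HASH_DEFAULT_SLOTS;
	if (slots < FUTEX_HASH_MIN_SLOTS)
		slots = FUTEX_HASH_MIN_SLOTS;
	if (slots > FUTEX_HASH_MAX_SLOTS)
		slots = FUTEX_HASH_MAX_SLOTS;

	/* Round down to the nearest power of two. */
	while (pow2 * 2 <= slots)
		pow2 *= 2;
	return pow2;
}

/* Hypothetical wrapper: request a private hash; returns -1 on failure. */
static inline int futex_hash_set_slots(unsigned long slots)
{
	return prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, slots, 0UL, 0UL);
}

/* Hypothetical wrapper: query the slot count, 0 if no private hash exists. */
static inline int futex_hash_get_slots(void)
{
	return prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS, 0UL, 0UL, 0UL);
}
```

A program would typically call futex_hash_set_slots(0) from its main thread
before creating threads and check the return value. With this patch the
request fails if a private hash already exists or if the caller is not the
thread group leader. On a kernel without this series the request number may
be unknown or mean something else, so the result should not be trusted
blindly.
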
Signed-off-by: Sebastian Andrzej Siewior
---
 include/linux/futex.h      | 20 ++++++++
 include/linux/mm_types.h   |  5 ++
 include/uapi/linux/prctl.h |  5 ++
 kernel/fork.c              |  2 +
 kernel/futex/core.c        | 99 ++++++++++++++++++++++++++++++++++++--
 kernel/sys.c               |  4 ++
 6 files changed, 132 insertions(+), 3 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index b70df27d7e85c..943828db52234 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -77,6 +77,15 @@ void futex_exec_release(struct task_struct *tsk);
 
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 	      u32 __user *uaddr2, u32 val2, u32 val3);
+int futex_hash_prctl(unsigned long arg2, unsigned long arg3);
+int futex_hash_allocate_default(void);
+void futex_hash_free(struct mm_struct *mm);
+
+static inline void futex_mm_init(struct mm_struct *mm)
+{
+	mm->futex_hash_bucket = NULL;
+}
+
 #else
 static inline void futex_init_task(struct task_struct *tsk) { }
 static inline void futex_exit_recursive(struct task_struct *tsk) { }
@@ -88,6 +97,17 @@ static inline long do_futex(u32 __user *uaddr, int op, u32 val,
 {
 	return -EINVAL;
 }
+static inline int futex_hash_prctl(unsigned long arg2, unsigned long arg3)
+{
+	return -EINVAL;
+}
+static inline int futex_hash_allocate_default(void)
+{
+	return 0;
+}
+static inline void futex_hash_free(struct mm_struct *mm) { }
+static inline void futex_mm_init(struct mm_struct *mm) { }
+
 #endif
 
 #endif
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 7361a8f3ab68e..2337a2e481fd0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -30,6 +30,7 @@
 #define INIT_PASID	0
 
 struct address_space;
+struct futex_hash_bucket;
 struct mem_cgroup;
 
 /*
@@ -902,6 +903,10 @@ struct mm_struct {
 		int mm_lock_seq;
 #endif
 
+#ifdef CONFIG_FUTEX
+		unsigned int			futex_hash_mask;
+		struct futex_hash_bucket	*futex_hash_bucket;
+#endif
 
 		unsigned long hiwater_rss; /* High-watermark of RSS usage */
 		unsigned long hiwater_vm;  /* High-water virtual memory usage */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 5c6080680cb27..55b843644c51a 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -353,4 +353,9 @@ struct prctl_mm_map {
  */
 #define PR_LOCK_SHADOW_STACK_STATUS      76
 
+/* FUTEX hash management */
+#define PR_FUTEX_HASH			77
+# define PR_FUTEX_HASH_SET_SLOTS	1
+# define PR_FUTEX_HASH_GET_SLOTS	2
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 1450b461d196a..cda8886f3a1d7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1284,6 +1284,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	RCU_INIT_POINTER(mm->exe_file, NULL);
 	mmu_notifier_subscriptions_init(mm);
 	init_tlb_flush_pending(mm);
+	futex_mm_init(mm);
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(CONFIG_SPLIT_PMD_PTLOCKS)
 	mm->pmd_huge_pte = NULL;
 #endif
@@ -1361,6 +1362,7 @@ static inline void __mmput(struct mm_struct *mm)
 	if (mm->binfmt)
 		module_put(mm->binfmt->module);
 	lru_gen_del_mm(mm);
+	futex_hash_free(mm);
 	mmdrop(mm);
 }
 
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index d1d3c7b358b23..b87bd27b73707 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 
 #include "futex.h"
 #include "../locking/rtmutex_common.h"
@@ -107,18 +108,40 @@ late_initcall(fail_futex_debugfs);
 
 #endif /* CONFIG_FAIL_FUTEX */
 
+static inline bool futex_key_is_private(union futex_key *key)
+{
+	/*
+	 * Relies on get_futex_key() to set either bit for shared
+	 * futexes -- see comment with union futex_key.
+	 */
+	return !(key->both.offset & (FUT_OFF_INODE | FUT_OFF_MMSHARED));
+}
+
 /**
  * futex_hash - Return the hash bucket in the global hash
  * @key:	Pointer to the futex key for which the hash is calculated
  *
  * We hash on the keys returned from get_futex_key (see below) and return the
- * corresponding hash bucket in the global hash.
+ * corresponding hash bucket in the global hash. If the FUTEX is private and
+ * a local hash table is privated then this one is used.
  */
 struct futex_hash_bucket *futex_hash(union futex_key *key)
 {
-	u32 hash = jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4,
-			  key->both.offset);
+	struct futex_hash_bucket *fhb;
+	u32 hash;
 
+	fhb = current->mm->futex_hash_bucket;
+	if (fhb && futex_key_is_private(key)) {
+		u32 hash_mask = current->mm->futex_hash_mask;
+
+		hash = jhash2((u32 *)key,
+			      offsetof(typeof(*key), both.offset) / 4,
+			      key->both.offset);
+		return &fhb[hash & hash_mask];
+	}
+	hash = jhash2((u32 *)key,
+		      offsetof(typeof(*key), both.offset) / 4,
+		      key->both.offset);
 	return &futex_queues[hash & (futex_hashsize - 1)];
 }
 
@@ -1131,6 +1154,76 @@ static void futex_hash_bucket_init(struct futex_hash_bucket *fhb)
 	spin_lock_init(&fhb->lock);
 }
 
+void futex_hash_free(struct mm_struct *mm)
+{
+	kvfree(mm->futex_hash_bucket);
+}
+
+static int futex_hash_allocate(unsigned int hash_slots)
+{
+	struct futex_hash_bucket *fhb;
+	int i;
+
+	if (current->mm->futex_hash_bucket)
+		return -EALREADY;
+
+	if (!thread_group_leader(current))
+		return -EINVAL;
+
+	if (hash_slots == 0)
+		hash_slots = 16;
+	if (hash_slots < 2)
+		hash_slots = 2;
+	if (hash_slots > 131072)
+		hash_slots = 131072;
+	if (!is_power_of_2(hash_slots))
+		hash_slots = rounddown_pow_of_two(hash_slots);
+
+	fhb = kvmalloc_array(hash_slots, sizeof(struct futex_hash_bucket), GFP_KERNEL_ACCOUNT);
+	if (!fhb)
+		return -ENOMEM;
+
+	current->mm->futex_hash_mask = hash_slots - 1;
+
+	for (i = 0; i < hash_slots; i++)
+		futex_hash_bucket_init(&fhb[i]);
+
+	current->mm->futex_hash_bucket = fhb;
+	return 0;
+}
+
+int futex_hash_allocate_default(void)
+{
+	return futex_hash_allocate(0);
+}
+
+static int futex_hash_get_slots(void)
+{
+	if (current->mm->futex_hash_bucket)
+		return current->mm->futex_hash_mask + 1;
+	return 0;
+}
+
+int futex_hash_prctl(unsigned long arg2, unsigned long arg3)
+{
+	int ret;
+
+	switch (arg2) {
+	case PR_FUTEX_HASH_SET_SLOTS:
+		ret = futex_hash_allocate(arg3);
+		break;
+
+	case PR_FUTEX_HASH_GET_SLOTS:
+		ret = futex_hash_get_slots();
+		break;
+
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
 static int __init futex_init(void)
 {
 	unsigned int futex_shift;
diff --git a/kernel/sys.c b/kernel/sys.c
index c4c701c6f0b4d..d8081f1d07d11 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -52,6 +52,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -2809,6 +2810,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 			return -EINVAL;
 		error = arch_lock_shadow_stack_status(me, arg2);
 		break;
+	case PR_FUTEX_HASH:
+		error = futex_hash_prctl(arg2, arg3);
+		break;
 	default:
 		error = -EINVAL;
 		break;
-- 
2.45.2

From nobody Thu Dec 18 09:47:24 2025
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v6 03/15] futex: Allow automatic allocation of process wide futex hash.
Date: Wed, 18 Dec 2024 12:09:41 +0100
Message-ID: <20241218111618.268028-4-bigeasy@linutronix.de>
In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de>
References: <20241218111618.268028-1-bigeasy@linutronix.de>

Allocate a default futex hash if a task forks its first thread.

Signed-off-by: Sebastian Andrzej Siewior
---
 include/linux/futex.h | 12 ++++++++++++
 kernel/fork.c         | 24 ++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 943828db52234..bad377c30de5e 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -86,6 +86,13 @@ static inline void futex_mm_init(struct mm_struct *mm)
 	mm->futex_hash_bucket = NULL;
 }
 
+static inline bool futex_hash_requires_allocation(void)
+{
+	if (current->mm->futex_hash_bucket)
+		return false;
+	return true;
+}
+
 #else
 static inline void futex_init_task(struct task_struct *tsk) { }
 static inline void futex_exit_recursive(struct task_struct *tsk) { }
@@ -108,6 +115,11 @@ static inline int futex_hash_allocate_default(void)
 static inline void futex_hash_free(struct mm_struct *mm) { }
 static inline void futex_mm_init(struct mm_struct *mm) { }
 
+static inline bool futex_hash_requires_allocation(void)
+{
+	return false;
+}
+
 #endif
 
 #endif
diff --git a/kernel/fork.c b/kernel/fork.c
index cda8886f3a1d7..95d38709fde10 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2130,6 +2130,15 @@ static void rv_task_fork(struct task_struct *p)
 #define rv_task_fork(p) do {} while (0)
 #endif
 
+static bool need_futex_hash_allocate_default(u64 clone_flags)
+{
+	if ((clone_flags & (CLONE_THREAD | CLONE_VM)) != (CLONE_THREAD | CLONE_VM))
+		return false;
+	if (!thread_group_empty(current))
+		return false;
+	return futex_hash_requires_allocation();
+}
+
 /*
  * This creates a new process as a copy of the old one,
  * but does not actually start it yet.
@@ -2507,6 +2516,21 @@ __latent_entropy struct task_struct *copy_process(
 	if (retval)
 		goto bad_fork_cancel_cgroup;
 
+	/*
+	 * Allocate a default futex hash for the user process once the first
+	 * thread spawns.
+	 */
+	if (need_futex_hash_allocate_default(clone_flags)) {
+		retval = futex_hash_allocate_default();
+		if (retval)
+			goto bad_fork_core_free;
+		/*
+		 * If we fail beyond this point we don't free the allocated
+		 * futex hash map. We assume that another thread will created
+		 * and makes use of it. The hash map will be freed once the main
+		 * thread terminates.
+		 */
+	}
 	/*
 	 * From this point on we must avoid any synchronous user-space
 	 * communication until we take the tasklist-lock. In particular, we do
-- 
2.45.2

From nobody Thu Dec 18 09:47:24 2025
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v6 04/15] futex: Hash only the address for private futexes.
Date: Wed, 18 Dec 2024 12:09:42 +0100
Message-ID: <20241218111618.268028-5-bigeasy@linutronix.de>
In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de>
References: <20241218111618.268028-1-bigeasy@linutronix.de>

futex_hash() passes the whole futex_key to jhash2. The first two members
are passed as the first argument and the offset as the "initial value".
For private futexes, the mm part is always the same and it is used only
within the process. By excluding the mm part from the hash, we reduce
the length passed to jhash2 from 4 (16 / 4) to 2 (8 / 4). This avoids
the __jhash_mix() part of jhash.

The resulting code is smaller and, based on testing, this variant
performs as well as the original or slightly better.

Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/futex/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index b87bd27b73707..583a0149d62c9 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -134,8 +134,8 @@ struct futex_hash_bucket *futex_hash(union futex_key *key)
 	if (fhb && futex_key_is_private(key)) {
 		u32 hash_mask = current->mm->futex_hash_mask;
 
-		hash = jhash2((u32 *)key,
-			      offsetof(typeof(*key), both.offset) / 4,
+		hash = jhash2((void *)&key->private.address,
+			      sizeof(key->private.address) / 4,
 			      key->both.offset);
 		return &fhb[hash & hash_mask];
 	}
-- 
2.45.2

From nobody Thu Dec 18 09:47:24 2025
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v6 05/15] futex: Move private hashing into its own function.
Date: Wed, 18 Dec 2024 12:09:43 +0100
Message-ID: <20241218111618.268028-6-bigeasy@linutronix.de>
In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de>
References: <20241218111618.268028-1-bigeasy@linutronix.de>

The hashing of private futexes is slightly different and will be needed
again while moving a futex_q entry to a different hash bucket after the
resize.

Move the private hashing into its own function.

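
For illustration only, a user-space sketch of the bucket selection that the
factored-out function performs: hash the address words with the offset as
the seed, then mask with the table's hash mask. mix_words() is a simple
stand-in mixer, NOT the kernel's jhash2(), and struct private_key and
private_bucket_index() are hypothetical names chosen for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for jhash2(): mixes 32-bit words with a seed (FNV-1a style). */
static inline uint32_t mix_words(const uint32_t *words, size_t len, uint32_t seed)
{
	uint32_t h = seed ^ 0x811c9dc5u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= words[i];
		h *= 0x01000193u;
	}
	return h;
}

/* Hypothetical, reduced view of a private futex key. */
struct private_key {
	uint64_t address;
	uint32_t offset;
};

/*
 * Pick a bucket index for a private key: only the address is hashed, the
 * offset serves as the initial value, and hash_mask is "slots - 1".
 */
static inline uint32_t private_bucket_index(const struct private_key *key,
					    uint32_t hash_mask)
{
	uint32_t words[sizeof(key->address) / 4];

	memcpy(words, &key->address, sizeof(words));
	return mix_words(words, sizeof(words) / 4, key->offset) & hash_mask;
}
```

Because the mask is the only table-dependent input, the same key can land
in a different bucket once the table has a different size. This is why the
helper is needed again when entries are moved after a resize.
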
Signed-off-by: Sebastian Andrzej Siewior --- kernel/futex/core.c | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 583a0149d62c9..907b76590df16 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -117,6 +117,18 @@ static inline bool futex_key_is_private(union futex_ke= y *key) return !(key->both.offset & (FUT_OFF_INODE | FUT_OFF_MMSHARED)); } =20 +static struct futex_hash_bucket *futex_hash_private(union futex_key *key, + struct futex_hash_bucket *fhb, + u32 hash_mask) +{ + u32 hash; + + hash =3D jhash2((void *)&key->private.address, + sizeof(key->private.address) / 4, + key->both.offset); + return &fhb[hash & hash_mask]; +} + /** * futex_hash - Return the hash bucket in the global hash * @key: Pointer to the futex key for which the hash is calculated @@ -131,14 +143,9 @@ struct futex_hash_bucket *futex_hash(union futex_key *= key) u32 hash; =20 fhb =3D current->mm->futex_hash_bucket; - if (fhb && futex_key_is_private(key)) { - u32 hash_mask =3D current->mm->futex_hash_mask; + if (fhb && futex_key_is_private(key)) + return futex_hash_private(key, fhb, current->mm->futex_hash_mask); =20 - hash =3D jhash2((void *)&key->private.address, - sizeof(key->private.address) / 4, - key->both.offset); - return &fhb[hash & hash_mask]; - } hash =3D jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4, key->both.offset); --=20 2.45.2 From nobody Thu Dec 18 09:47:24 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ACB571A00FE for ; Wed, 18 Dec 2024 11:16:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520592; cv=none; 
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v6 06/15] futex: Decrease the waiter count before the unlock operation.
Date: Wed, 18 Dec 2024 12:09:44 +0100
Message-ID: <20241218111618.268028-7-bigeasy@linutronix.de>
In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de>
References: <20241218111618.268028-1-bigeasy@linutronix.de>

To support runtime resizing of the process private hash, it's required
to not use the obtained hash bucket once the reference count has been
dropped. The reference will be dropped after the unlock of the hash
bucket.
The number of waiters is decremented after the unlock operation. There
is no requirement that this needs to happen after the unlock. The
increment happens before acquiring the lock to signal early that there
will be a waiter. The waker can avoid blocking on the lock if it is
known that there will be no waiter.
There is no difference in terms of ordering if the decrement happens
before or after the unlock.

Decrease the waiter count before the unlock operation.

Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/futex/core.c    | 2 +-
 kernel/futex/requeue.c | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 907b76590df16..254d0dfac71a9 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -558,8 +558,8 @@ struct futex_hash_bucket *futex_q_lock(struct futex_q *q)
 void futex_q_unlock(struct futex_hash_bucket *hb)
 	__releases(&hb->lock)
 {
-	spin_unlock(&hb->lock);
 	futex_hb_waiters_dec(hb);
+	spin_unlock(&hb->lock);
 }
 
 void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index b47bb764b3520..fb69dcdf74da8 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -456,8 +456,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 		ret = futex_get_value_locked(&curval, uaddr1);
 
 		if (unlikely(ret)) {
-			double_unlock_hb(hb1, hb2);
 			futex_hb_waiters_dec(hb2);
+			double_unlock_hb(hb1, hb2);
 
 			ret = get_user(curval, uaddr1);
 			if (ret)
@@ -542,8 +542,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 			 * waiter::requeue_state is correct.
 			 */
 		case -EFAULT:
-			double_unlock_hb(hb1, hb2);
 			futex_hb_waiters_dec(hb2);
+			double_unlock_hb(hb1, hb2);
 			ret = fault_in_user_writeable(uaddr2);
 			if (!ret)
 				goto retry;
@@ -556,8 +556,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 			 * exit to complete.
 			 * - EAGAIN: The user space value changed.
 			 */
-			double_unlock_hb(hb1, hb2);
 			futex_hb_waiters_dec(hb2);
+			double_unlock_hb(hb1, hb2);
 			/*
 			 * Handle the case where the owner is in the middle of
 			 * exiting. Wait for the exit to complete otherwise
@@ -674,9 +674,9 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 	put_pi_state(pi_state);
 
 out_unlock:
+	futex_hb_waiters_dec(hb2);
 	double_unlock_hb(hb1, hb2);
 	wake_up_q(&wake_q);
-	futex_hb_waiters_dec(hb2);
 	return ret ? ret : task_count;
 }
 
-- 
2.45.2

From nobody Thu Dec 18 09:47:24 2025
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v6 07/15] futex: Prepare for reference counting of the process private hash end of operation.
Date: Wed, 18 Dec 2024 12:09:45 +0100
Message-ID: <20241218111618.268028-8-bigeasy@linutronix.de>
In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de>
References: <20241218111618.268028-1-bigeasy@linutronix.de>

To support runtime resizing of the process private hash, it's required
to add a reference count to the hash structure.
The reference count ensures that the hash cannot be resized or freed
while a task is operating on it.

The reference count will be obtained within futex_hash() and dropped
once the hash bucket is unlocked and no longer required for the
particular operation (queue, unqueue, wakeup etc.).

This is achieved by:

 - appending _put() to existing functions so it's clear that they
   also put the hash reference and fixing up the usage sites

 - providing new helpers, which combine common operations (unlock,
   put), and using them at the appropriate places

 - providing new helpers for standalone reference counting
   functionality and using them at places where the unlock operation
   needs to be separate.

Signed-off-by: Sebastian Andrzej Siewior
---
 io_uring/futex.c        |  2 +-
 kernel/futex/core.c     | 12 ++++++++----
 kernel/futex/futex.h    | 31 ++++++++++++++++++++-----------
 kernel/futex/pi.c       | 19 ++++++++++---------
 kernel/futex/requeue.c  | 12 ++++++------
 kernel/futex/waitwake.c | 23 ++++++++++++-----------
 6 files changed, 57 insertions(+), 42 deletions(-)

diff --git a/io_uring/futex.c b/io_uring/futex.c
index e29662f039e1a..67246438da228 100644
--- a/io_uring/futex.c
+++ b/io_uring/futex.c
@@ -349,7 +349,7 @@ int io_futex_wait(struct io_kiocb *req, unsigned int issue_flags)
 		hlist_add_head(&req->hash_node, &ctx->futex_list);
 		io_ring_submit_unlock(ctx, issue_flags);
 
-		futex_queue(&ifd->q, hb);
+		futex_queue_put(&ifd->q, hb);
 		return IOU_ISSUE_SKIP_COMPLETE;
 	}
 
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index 254d0dfac71a9..1521fbdf22f65 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -152,6 +152,9 @@ struct futex_hash_bucket *futex_hash(union futex_key *key)
 	return &futex_queues[hash & (futex_hashsize - 1)];
 }
 
+void futex_hash_put(struct futex_hash_bucket *hb)
+{
+}
 
 /**
  * futex_setup_timer - set up the sleeping hrtimer.
@@ -543,8 +546,8 @@ struct futex_hash_bucket *futex_q_lock(struct futex_q *q)
 	 * Increment the counter before taking the lock so that
 	 * a potential waker won't miss a to-be-slept task that is
 	 * waiting for the spinlock. This is safe as all futex_q_lock()
-	 * users end up calling futex_queue(). Similarly, for housekeeping,
-	 * decrement the counter at futex_q_unlock() when some error has
+	 * users end up calling futex_queue_put(). Similarly, for housekeeping,
+	 * decrement the counter at futex_q_unlock_put() when some error has
 	 * occurred and we don't end up adding the task to the list.
 	 */
 	futex_hb_waiters_inc(hb); /* implies smp_mb(); (A) */
@@ -555,11 +558,12 @@ struct futex_hash_bucket *futex_q_lock(struct futex_q *q)
 	return hb;
 }
 
-void futex_q_unlock(struct futex_hash_bucket *hb)
+void futex_q_unlock_put(struct futex_hash_bucket *hb)
 	__releases(&hb->lock)
 {
 	futex_hb_waiters_dec(hb);
 	spin_unlock(&hb->lock);
+	futex_hash_put(hb);
 }
 
 void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
@@ -586,7 +590,7 @@ void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
  * @q: The futex_q to unqueue
  *
  * The q->lock_ptr must not be held by the caller. A call to futex_unqueue() must
- * be paired with exactly one earlier call to futex_queue().
+ * be paired with exactly one earlier call to futex_queue_put().
  *
  * Return:
  * - 1 - if the futex_q was still queued (and we removed unqueued it);
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 99b32e728c4ad..36627617f7ced 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -202,6 +202,7 @@ futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout,
 		  int flags, u64 range_ns);
 
 extern struct futex_hash_bucket *futex_hash(union futex_key *key);
+extern void futex_hash_put(struct futex_hash_bucket *hb);
 
 /**
  * futex_match - Check whether two futex keys are equal
@@ -288,23 +289,29 @@ extern void __futex_unqueue(struct futex_q *q);
 extern void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb);
 extern int futex_unqueue(struct futex_q *q);
 
+static inline void futex_hb_unlock_put(struct futex_hash_bucket *hb)
+{
+	spin_unlock(&hb->lock);
+	futex_hash_put(hb);
+}
+
 /**
- * futex_queue() - Enqueue the futex_q on the futex_hash_bucket
+ * futex_queue_put() - Enqueue the futex_q on the futex_hash_bucket
  * @q: The futex_q to enqueue
  * @hb: The destination hash bucket
  *
- * The hb->lock must be held by the caller, and is released here. A call to
- * futex_queue() is typically paired with exactly one call to futex_unqueue(). The
- * exceptions involve the PI related operations, which may use futex_unqueue_pi()
- * or nothing if the unqueue is done as part of the wake process and the unqueue
- * state is implicit in the state of woken task (see futex_wait_requeue_pi() for
- * an example).
+ * The hb->lock must be held by the caller, and is released here and the reference
+ * on the hb is dropped. A call to futex_queue_put() is typically paired with
+ * exactly one call to futex_unqueue(). The exceptions involve the PI related
+ * operations, which may use futex_unqueue_pi() or nothing if the unqueue is
+ * done as part of the wake process and the unqueue state is implicit in the
+ * state of woken task (see futex_wait_requeue_pi() for an example).
  */
-static inline void futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
+static inline void futex_queue_put(struct futex_q *q, struct futex_hash_bucket *hb)
 	__releases(&hb->lock)
 {
 	__futex_queue(q, hb);
-	spin_unlock(&hb->lock);
+	futex_hb_unlock_put(hb);
 }
 
 extern void futex_unqueue_pi(struct futex_q *q);
@@ -350,7 +357,7 @@ static inline int futex_hb_waiters_pending(struct futex_hash_bucket *hb)
 }
 
 extern struct futex_hash_bucket *futex_q_lock(struct futex_q *q);
-extern void futex_q_unlock(struct futex_hash_bucket *hb);
+extern void futex_q_unlock_put(struct futex_hash_bucket *hb);
 
 
 extern int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
@@ -380,11 +387,13 @@ double_lock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
 }
 
 static inline void
-double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
+double_unlock_hb_put(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
 {
 	spin_unlock(&hb1->lock);
 	if (hb1 != hb2)
 		spin_unlock(&hb2->lock);
+	futex_hash_put(hb1);
+	futex_hash_put(hb2);
 }
 
 /* syscalls */
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index d62cca5ed8f4c..8561f94f21ed9 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -217,9 +217,9 @@ static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
 	/*
 	 * We get here with hb->lock held, and having found a
 	 * futex_top_waiter(). This means that futex_lock_pi() of said futex_q
-	 * has dropped the hb->lock in between futex_queue() and futex_unqueue_pi(),
-	 * which in turn means that futex_lock_pi() still has a reference on
-	 * our pi_state.
+	 * has dropped the hb->lock in between futex_queue_put() and
+	 * futex_unqueue_pi(), which in turn means that futex_lock_pi() still
+	 * has a reference on our pi_state.
 	 *
 	 * The waiter holding a reference on @pi_state also protects against
 	 * the unlocked put_pi_state() in futex_unlock_pi(), futex_lock_pi()
@@ -963,7 +963,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 			 * exit to complete.
 			 * - EAGAIN: The user space value changed.
 			 */
-			futex_q_unlock(hb);
+			futex_q_unlock_put(hb);
 			/*
 			 * Handle the case where the owner is in the middle of
 			 * exiting. Wait for the exit to complete otherwise
@@ -1086,7 +1086,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 	goto out;
 
 out_unlock_put_key:
-	futex_q_unlock(hb);
+	futex_q_unlock_put(hb);
 
 out:
 	if (to) {
@@ -1096,7 +1096,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 	return ret != -EINTR ? ret : -ERESTARTNOINTR;
 
 uaddr_faulted:
-	futex_q_unlock(hb);
+	futex_q_unlock_put(hb);
 
 	ret = fault_in_user_writeable(uaddr);
 	if (ret)
@@ -1196,7 +1196,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 		}
 
 		get_pi_state(pi_state);
-		spin_unlock(&hb->lock);
+		futex_hb_unlock_put(hb);
 
 		/* drops pi_state->pi_mutex.wait_lock */
 		ret = wake_futex_pi(uaddr, uval, pi_state, rt_waiter);
@@ -1235,7 +1235,8 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 	 * owner.
 	 */
 	if ((ret = futex_cmpxchg_value_locked(&curval, uaddr, uval, 0))) {
-		spin_unlock(&hb->lock);
+		futex_hb_unlock_put(hb);
+
 		switch (ret) {
 		case -EFAULT:
 			goto pi_faulted;
@@ -1255,7 +1256,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 	ret = (curval == uval) ? 0 : -EAGAIN;
 
 out_unlock:
-	spin_unlock(&hb->lock);
+	futex_hb_unlock_put(hb);
 	return ret;
 
 pi_retry:
diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index fb69dcdf74da8..217cec5c8302e 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -58,7 +58,7 @@ enum {
 };
 
 const struct futex_q futex_q_init = {
-	/* list gets initialized in futex_queue()*/
+	/* list gets initialized in futex_queue_put()*/
 	.wake = futex_wake_mark,
 	.key = FUTEX_KEY_INIT,
 	.bitset = FUTEX_BITSET_MATCH_ANY,
@@ -457,7 +457,7 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 
 		if (unlikely(ret)) {
 			futex_hb_waiters_dec(hb2);
-			double_unlock_hb(hb1, hb2);
+			double_unlock_hb_put(hb1, hb2);
 
 			ret = get_user(curval, uaddr1);
 			if (ret)
@@ -543,7 +543,7 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 			 */
 		case -EFAULT:
 			futex_hb_waiters_dec(hb2);
-			double_unlock_hb(hb1, hb2);
+			double_unlock_hb_put(hb1, hb2);
 			ret = fault_in_user_writeable(uaddr2);
 			if (!ret)
 				goto retry;
@@ -557,7 +557,7 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 			 * - EAGAIN: The user space value changed.
 			 */
 			futex_hb_waiters_dec(hb2);
-			double_unlock_hb(hb1, hb2);
+			double_unlock_hb_put(hb1, hb2);
 			/*
 			 * Handle the case where the owner is in the middle of
 			 * exiting. Wait for the exit to complete otherwise
@@ -675,7 +675,7 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 
 out_unlock:
 	futex_hb_waiters_dec(hb2);
-	double_unlock_hb(hb1, hb2);
+	double_unlock_hb_put(hb1, hb2);
 	wake_up_q(&wake_q);
 	return ret ? ret : task_count;
 }
@@ -814,7 +814,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
 	 * shared futexes. We need to compare the keys:
 	 */
 	if (futex_match(&q.key, &key2)) {
-		futex_q_unlock(hb);
+		futex_q_unlock_put(hb);
 		ret = -EINVAL;
 		goto out;
 	}
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index 3a10375d95218..01d419efbf298 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -195,7 +195,7 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 		}
 	}
 
-	spin_unlock(&hb->lock);
+	futex_hb_unlock_put(hb);
 	wake_up_q(&wake_q);
 	return ret;
 }
@@ -274,7 +274,7 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 	double_lock_hb(hb1, hb2);
 	op_ret = futex_atomic_op_inuser(op, uaddr2);
 	if (unlikely(op_ret < 0)) {
-		double_unlock_hb(hb1, hb2);
+		double_unlock_hb_put(hb1, hb2);
 
 		if (!IS_ENABLED(CONFIG_MMU) ||
 		    unlikely(op_ret != -EFAULT && op_ret != -EAGAIN)) {
@@ -327,7 +327,7 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 	}
 
 out_unlock:
-	double_unlock_hb(hb1, hb2);
+	double_unlock_hb_put(hb1, hb2);
 	wake_up_q(&wake_q);
 	return ret;
 }
@@ -335,7 +335,7 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 static long futex_wait_restart(struct restart_block *restart);
 
 /**
- * futex_wait_queue() - futex_queue() and wait for wakeup, timeout, or signal
+ * futex_wait_queue() - futex_queue_put() and wait for wakeup, timeout, or signal
  * @hb: the futex hash bucket, must be locked by the caller
  * @q: the futex_q to queue up on
  * @timeout: the prepared hrtimer_sleeper, or null for no timeout
@@ -346,11 +346,11 @@ void futex_wait_queue(struct futex_hash_bucket *hb, struct futex_q *q,
 	/*
 	 * The task state is guaranteed to be set before another task can
 	 * wake it. set_current_state() is implemented using smp_store_mb() and
-	 * futex_queue() calls spin_unlock() upon completion, both serializing
+	 * futex_queue_put() calls spin_unlock() upon completion, both serializing
 	 * access to the hash list and forcing another memory barrier.
 	 */
 	set_current_state(TASK_INTERRUPTIBLE|TASK_FREEZABLE);
-	futex_queue(q, hb);
+	futex_queue_put(q, hb);
 
 	/* Arm the timer */
 	if (timeout)
@@ -461,11 +461,12 @@ int futex_wait_multiple_setup(struct futex_vector *vs, int count, int *woken)
 			 * next futex. Queue each futex at this moment so hb can
 			 * be unlocked.
 			 */
-			futex_queue(q, hb);
+			futex_queue_put(q, hb);
 			continue;
 		}
 
-		futex_q_unlock(hb);
+		futex_q_unlock_put(hb);
+
 		__set_current_state(TASK_RUNNING);
 
 		/*
@@ -624,7 +625,7 @@ int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
 	ret = futex_get_value_locked(&uval, uaddr);
 
 	if (ret) {
-		futex_q_unlock(*hb);
+		futex_q_unlock_put(*hb);
 
 		ret = get_user(uval, uaddr);
 		if (ret)
@@ -637,7 +638,7 @@ int futex_wait_setup(u32 __user *uaddr, u32 val, unsigned int flags,
 	}
 
 	if (uval != val) {
-		futex_q_unlock(*hb);
+		futex_q_unlock_put(*hb);
 		ret = -EWOULDBLOCK;
 	}
 
@@ -665,7 +666,7 @@ int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
 	if (ret)
 		return ret;
 
-	/* futex_queue and wait for wakeup, timeout, or a signal. */
+	/* futex_queue_put() and wait for wakeup, timeout, or a signal. */
 	futex_wait_queue(hb, &q, to);
 
 	/* If we were woken (and unqueued), we succeeded, whatever. */
-- 
2.45.2

From nobody Thu Dec 18 09:47:24 2025
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v6 08/15] futex: Re-evaluate the hash bucket after dropping the lock
Date: Wed, 18 Dec 2024 12:09:46 +0100
Message-ID: <20241218111618.268028-9-bigeasy@linutronix.de>
In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de>
References: <20241218111618.268028-1-bigeasy@linutronix.de>

In futex_requeue() and futex_wake_op() the hash bucket lock is dropped in
the failure paths for handling page faults and other error scenarios.
After that the code jumps back to retry_private, which relocks the hash
bucket[s] under the assumption that the hash bucket pointer which was
retrieved via futex_hash() is still valid.

With resizable private hash buckets, that assumption is no longer true as
the waiters can be moved to a larger hash in the meantime.

Move the retry_private label above the hashing function to handle this
correctly.

Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/futex/requeue.c  | 2 +-
 kernel/futex/waitwake.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c
index 217cec5c8302e..31ec543e7fdb3 100644
--- a/kernel/futex/requeue.c
+++ b/kernel/futex/requeue.c
@@ -443,10 +443,10 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flags1,
 	if (requeue_pi && futex_match(&key1, &key2))
 		return -EINVAL;
 
+retry_private:
 	hb1 = futex_hash(&key1);
 	hb2 = futex_hash(&key2);
 
-retry_private:
 	futex_hb_waiters_inc(hb2);
 	double_lock_hb(hb1, hb2);
 
diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c
index 01d419efbf298..03e2f546967d8 100644
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -267,10 +267,10 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flags, u32 __user *uaddr2,
 	if (unlikely(ret != 0))
 		return ret;
 
+retry_private:
 	hb1 = futex_hash(&key1);
 	hb2 = futex_hash(&key2);
 
-retry_private:
 	double_lock_hb(hb1, hb2);
 	op_ret = futex_atomic_op_inuser(op, uaddr2);
 	if (unlikely(op_ret < 0)) {
-- 
2.45.2

From nobody Thu Dec 18 09:47:24 2025
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v6 09/15] futex: Introduce futex_get_locked_hb().
Date: Wed, 18 Dec 2024 12:09:47 +0100
Message-ID: <20241218111618.268028-10-bigeasy@linutronix.de>
In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de>
References: <20241218111618.268028-1-bigeasy@linutronix.de>

futex_lock_pi() and __fixup_pi_state_owner() acquire the
futex_q::lock_ptr without holding a reference, assuming the previously
obtained hash bucket and the assigned lock_ptr are still valid. This
isn't the case once the private hash can be resized and becomes invalid
after the reference drop.

Introduce futex_get_locked_hb() to lock the hash bucket recorded in
futex_q::lock_ptr. The lock pointer is read in an RCU section to ensure
that it does not go away if the hash bucket has been replaced and the
old pointer has been observed. After locking, the pointer needs to be
compared to check whether it changed.
If so then the hash bucket has been replaced, the user has been moved to the new one and lock_ptr has been updated. The lock operation needs to be redone in this case. Once the lock_ptr is the same, we can return the futex_hash_bucket it belongs to as the hash bucket that is now locked for the caller. This is important because we don't own a reference so the hash bucket is valid only as long as we hold the lock. This means if the local hash is resized then this (old) hash bucket remains valid as long as we hold the lock because all users need to be moved to the new hash bucket and have their lock_ptr updated. The task performing the resize will block. Add futex_get_locked_hb() and use it. Signed-off-by: Sebastian Andrzej Siewior --- kernel/futex/core.c | 27 +++++++++++++++++++++++++++ kernel/futex/futex.h | 2 +- kernel/futex/pi.c | 9 +++++++-- kernel/futex/requeue.c | 8 +++++--- 4 files changed, 40 insertions(+), 6 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 1521fbdf22f65..e8214656a66b6 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -639,6 +639,33 @@ int futex_unqueue(struct futex_q *q) return ret; } =20 +struct futex_hash_bucket *futex_get_locked_hb(struct futex_q *q) +{ + struct futex_hash_bucket *hb; + spinlock_t *lock_ptr; + + /* + * See futex_unqueue() why lock_ptr can change. + */ + guard(rcu)(); +retry: + lock_ptr =3D READ_ONCE(q->lock_ptr); + spin_lock(lock_ptr); + + if (unlikely(lock_ptr !=3D q->lock_ptr)) { + spin_unlock(lock_ptr); + goto retry; + } + + hb =3D container_of(lock_ptr, struct futex_hash_bucket, lock); + /* + * We don't acquire a reference on the hb because we don't get it + * if a resize is in progress and we got the old hb->lock before the + * resizing task got it so we can't be moved to the new hb. + */ + return hb; +} + /* * PI futexes can not be requeued and must remove themselves from the hash * bucket. The hash bucket lock (i.e. lock_ptr) is held. 
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h index 36627617f7ced..3c78126d4079e 100644 --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -196,7 +196,7 @@ enum futex_access { =20 extern int get_futex_key(u32 __user *uaddr, unsigned int flags, union fute= x_key *key, enum futex_access rw); - +extern struct futex_hash_bucket *futex_get_locked_hb(struct futex_q *q); extern struct hrtimer_sleeper * futex_setup_timer(ktime_t *time, struct hrtimer_sleeper *timeout, int flags, u64 range_ns); diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c index 8561f94f21ed9..506ba1ad8ff23 100644 --- a/kernel/futex/pi.c +++ b/kernel/futex/pi.c @@ -806,7 +806,7 @@ static int __fixup_pi_state_owner(u32 __user *uaddr, st= ruct futex_q *q, break; } =20 - spin_lock(q->lock_ptr); + futex_get_locked_hb(q); raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); =20 /* @@ -922,6 +922,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags= , ktime_t *time, int tryl struct rt_mutex_waiter rt_waiter; struct futex_hash_bucket *hb; struct futex_q q =3D futex_q_init; + bool no_block_fp =3D false; DEFINE_WAKE_Q(wake_q); int res, ret; =20 @@ -988,6 +989,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags= , ktime_t *time, int tryl ret =3D rt_mutex_futex_trylock(&q.pi_state->pi_mutex); /* Fixup the trylock return value: */ ret =3D ret ? 0 : -EWOULDBLOCK; + no_block_fp =3D true; goto no_block; } =20 @@ -1024,6 +1026,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int fla= gs, ktime_t *time, int tryl raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock); wake_up_q(&wake_q); preempt_enable(); + futex_hash_put(hb); =20 if (ret) { if (ret =3D=3D 1) @@ -1063,7 +1066,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int fla= gs, ktime_t *time, int tryl * spinlock/rtlock (which might enqueue its own rt_waiter) and fix up * the */ - spin_lock(q.lock_ptr); + hb =3D futex_get_locked_hb(&q); /* * Waiter is unqueued. 
*/ @@ -1083,6 +1086,8 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int fla= gs, ktime_t *time, int tryl =20 futex_unqueue_pi(&q); spin_unlock(q.lock_ptr); + if (no_block_fp) + futex_hash_put(hb); goto out; =20 out_unlock_put_key: diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c index 31ec543e7fdb3..db27fbf68521c 100644 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -825,15 +825,17 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned= int flags, switch (futex_requeue_pi_wakeup_sync(&q)) { case Q_REQUEUE_PI_IGNORE: /* The waiter is still on uaddr1 */ - spin_lock(&hb->lock); + hb =3D futex_get_locked_hb(&q); + ret =3D handle_early_requeue_pi_wakeup(hb, &q, to); spin_unlock(&hb->lock); + break; =20 case Q_REQUEUE_PI_LOCKED: /* The requeue acquired the lock */ if (q.pi_state && (q.pi_state->owner !=3D current)) { - spin_lock(q.lock_ptr); + futex_get_locked_hb(&q); ret =3D fixup_pi_owner(uaddr2, &q, true); /* * Drop the reference to the pi state which the @@ -860,7 +862,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, if (ret && !rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter)) ret =3D 0; =20 - spin_lock(q.lock_ptr); + futex_get_locked_hb(&q); debug_rt_mutex_free_waiter(&rt_waiter); /* * Fixup the pi_state owner and possibly acquire the lock if we --=20 2.45.2 From nobody Thu Dec 18 09:47:24 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 75EAF1A2392 for ; Wed, 18 Dec 2024 11:16:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520594; cv=none; 
b=n4coc4+J3CcVU/H9FVwf5VEh9OmkwLXnX2YUcjzdquYjRuxZ+6e2g2krqDNtUrz1VuCHTMyOr+JsvcksqSvLVSAj9/jpF4Ms7JwPt0C7Vz6kDnvWgiLKMLlAkORn8RhswI4PMu/xISFxSPDAN6Jk0XOEjamvjgn8aLYzjddujPA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520594; c=relaxed/simple; bh=WyIL6PQjQbMEPTx5vZ3SfTtmd4qLJoASE/UxWEuTWos=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=dPzKi3cyF9jl3ElJztJg+E/ZevwRvJQ9SIYogdpL1Wc4eaAI0MagPerbUiT3nassbPm3blafWVJTdDca69Fj92bzzj00Scc7Xnb9uWLGldPCH5LAIy74m2zOB4fpPfiWPXodgJjEss5nbh+QU1rxJsb5nLfNCwFI4c8Wepz95BI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=HhHsL6/t; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=yIGtkPFP; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="HhHsL6/t"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="yIGtkPFP" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1734520588; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3HcslQVXhazTWL1I10kAakFcEs2U6WIv9cP4n3ds6s4=; b=HhHsL6/t5zETZGL5Ep4EIqkbw71eR7LxHIpTnOzZz/bN9HrmGOxlq5RW6aUHARjo7qgYNd e8ERrMErYiDTVmY9024XItu3B5hPPtdSwQ9YkpWah8NIkAVROFCd80rgli/1wSN5XLV3JH M6WuH4oHQYFrSrOp6Jv/7h2wXZhplubJBBChv9r5b7u14BHGZfIIeFjgfSSW6d8GyzXVrO 
bmFe1HsU2MgipIUe6Z6FKAjM7s8vCdBLDZ4iaPYRR0oRCVJjyDRyikmX+ygBOw4afq3pDG C2FaB04g49XoCJnCUhrj/YmENb9FIqX8n14hTOMcN5VRl8IoAaXY7Xo8XG3H/A== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1734520588; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3HcslQVXhazTWL1I10kAakFcEs2U6WIv9cP4n3ds6s4=; b=yIGtkPFPggT0ICCpTGqpPrdVKPLRN82Knuvt3s5OQchmj7tzKVlFCSW634OfXWRfXaJpOa F7m82vMW/RGKVaDw== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v6 10/15] futex: Allow to re-allocate the private local hash. Date: Wed, 18 Dec 2024 12:09:48 +0100 Message-ID: <20241218111618.268028-11-bigeasy@linutronix.de> In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de> References: <20241218111618.268028-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The mm_struct::futex_hash_lock guards the futex_hash_bucket assignment/ replacement. The futex_hash_allocate()/ PR_FUTEX_HASH_SET_SLOTS operation can now be invoked at runtime and resize an already existing internal private futex_hash_bucket to another size. The reallocation is based on an idea by Thomas Gleixner: The initial allocation of struct futex_hash_bucket_private sets the reference count to one. Every user acquires a reference on the local hash before using it and drops it after it enqueued itself on the hash bucket. There is no reference held while the task is scheduled out while waiting for the wake up. 
The resize allocates a new struct futex_hash_bucket_private and drops the initial reference under the mm_struct::futex_hash_lock. If the reference drop results in destruction of the object then users currently queued on the local hash will be requeued on the new local hash. At the end mm_struct::futex_hash_bucket is updated, the old pointer is RCU freed and the mutex is dropped. If the reference drop does not result in destruction of the object then the new pointer is saved as mm_struct::futex_hash_new. In this case the replacement is delayed. The user dropping the last reference is not always the best choice to perform the replacement. For instance futex_wait_queue() drops the reference after changing its task state which will also be modified while the futex_hash_lock is acquired. Therefore the replacement is delayed to the task acquiring a reference on the current local hash. This scheme keeps the requirement that during a lock/unlock operation all waiters block on the same futex_hash_bucket::lock. 
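The delayed replacement described above can be sketched as a small, single-threaded userspace model. This is only an illustration under stated assumptions, not the code in this patch: struct table, struct mm_model, table_get(), table_put(), assign_new(), acquire() and demo_resize() are hypothetical names, plain counters stand in for rcuref_t, and the mutex, RCU and the rehashing of queued waiters are left out. It also assumes at most one resize is in flight.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the private hash; no buckets, only bookkeeping. */
struct table {
	long users;               /* starts at 1: the initial reference */
	bool dead;                /* set once the count dropped to zero */
	bool initial_ref_dropped; /* a resize already dropped the initial ref */
	unsigned int slots;
};

/* Hypothetical stand-in for the two pointers kept per mm. */
struct mm_model {
	struct table *cur;     /* plays the role of the current local hash */
	struct table *pending; /* plays the role of the saved new hash */
};

static struct table *table_alloc(unsigned int slots)
{
	struct table *t = calloc(1, sizeof(*t));

	if (!t)
		abort();
	t->users = 1;
	t->slots = slots;
	return t;
}

/* Returns true if a reference was obtained; fails on a dead table. */
static bool table_get(struct table *t)
{
	if (t->dead)
		return false;
	t->users++;
	return true;
}

/* Returns true if this put dropped the last reference. */
static bool table_put(struct table *t)
{
	if (--t->users == 0) {
		t->dead = true;
		return true;
	}
	return false;
}

/* new != NULL: the caller started a resize. new == NULL: install pending. */
static void assign_new(struct mm_model *mm, struct table *new)
{
	bool drop_init_ref = new != NULL;
	struct table *old = mm->cur;

	if (!new) {
		new = mm->pending;
		mm->pending = NULL;
	}
	if (!new)
		return;
	/* Users still hold references: remember the new table for later. */
	if (old && drop_init_ref &&
	    (old->initial_ref_dropped || !table_put(old))) {
		mm->pending = new;
		old->initial_ref_dropped = true;
		return;
	}
	/* A real implementation would move queued waiters over here. */
	free(old);
	mm->cur = new;
}

/* The task acquiring a reference performs the delayed replacement. */
static struct table *acquire(struct mm_model *mm)
{
	for (;;) {
		if (!mm->cur)
			return NULL;
		if (table_get(mm->cur))
			return mm->cur;
		assign_new(mm, NULL);
	}
}

/* Walks through one resize while a user still holds a reference. */
static unsigned int demo_resize(void)
{
	struct mm_model mm = { .cur = table_alloc(16), .pending = NULL };
	struct table *t = acquire(&mm);      /* users on old table: 2 */
	unsigned int during, after;

	assign_new(&mm, table_alloc(64));    /* users: 1, so it is deferred */
	during = mm.cur->slots;              /* old table still in use */
	table_put(t);                        /* users: 0, old table is dead */
	t = acquire(&mm);                    /* get fails, pending is installed */
	after = t->slots;
	table_put(t);
	free(mm.cur);
	return during * 1000 + after;
}
```

In this model the task that drops the last reference does nothing beyond marking the table dead; the swap happens in acquire(), which mirrors the choice above to delay the replacement to the next task that tries to obtain a reference.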
Signed-off-by: Sebastian Andrzej Siewior --- include/linux/futex.h | 3 +- include/linux/mm_types.h | 7 +- kernel/futex/core.c | 230 +++++++++++++++++++++++++++++++++++---- kernel/futex/futex.h | 1 + kernel/futex/requeue.c | 5 + kernel/futex/waitwake.c | 4 +- 6 files changed, 222 insertions(+), 28 deletions(-) diff --git a/include/linux/futex.h b/include/linux/futex.h index bad377c30de5e..3ced01a9c5218 100644 --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -83,7 +83,8 @@ void futex_hash_free(struct mm_struct *mm); =20 static inline void futex_mm_init(struct mm_struct *mm) { - mm->futex_hash_bucket =3D NULL; + rcu_assign_pointer(mm->futex_hash_bucket, NULL); + mutex_init(&mm->futex_hash_lock); } =20 static inline bool futex_hash_requires_allocation(void) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 2337a2e481fd0..62fe872b381f8 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -30,7 +30,7 @@ #define INIT_PASID 0 =20 struct address_space; -struct futex_hash_bucket; +struct futex_hash_bucket_private; struct mem_cgroup; =20 /* @@ -904,8 +904,9 @@ struct mm_struct { #endif =20 #ifdef CONFIG_FUTEX - unsigned int futex_hash_mask; - struct futex_hash_bucket *futex_hash_bucket; + struct mutex futex_hash_lock; + struct futex_hash_bucket_private __rcu *futex_hash_bucket; + struct futex_hash_bucket_private *futex_hash_new; #endif =20 unsigned long hiwater_rss; /* High-watermark of RSS usage */ diff --git a/kernel/futex/core.c b/kernel/futex/core.c index e8214656a66b6..44e16f033a4dd 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -40,6 +40,7 @@ #include #include #include +#include =20 #include "futex.h" #include "../locking/rtmutex_common.h" @@ -56,6 +57,13 @@ static struct { #define futex_queues (__futex_data.queues) #define futex_hashsize (__futex_data.hashsize) =20 +struct futex_hash_bucket_private { + rcuref_t users; + unsigned int hash_mask; + struct rcu_head rcu; + bool initial_ref_dropped; + struct 
futex_hash_bucket queues[]; +}; =20 /* * Fault injections for futexes. @@ -129,6 +137,122 @@ static struct futex_hash_bucket *futex_hash_private(u= nion futex_key *key, return &fhb[hash & hash_mask]; } =20 +static void futex_rehash_current_users(struct futex_hash_bucket_private *o= ld, + struct futex_hash_bucket_private *new) +{ + struct futex_hash_bucket *hb_old, *hb_new; + unsigned int slots =3D old->hash_mask + 1; + u32 hash_mask =3D new->hash_mask; + unsigned int i; + + for (i =3D 0; i < slots; i++) { + struct futex_q *this, *tmp; + + hb_old =3D &old->queues[i]; + + spin_lock(&hb_old->lock); + plist_for_each_entry_safe(this, tmp, &hb_old->chain, list) { + + plist_del(&this->list, &hb_old->chain); + futex_hb_waiters_dec(hb_old); + + WARN_ON_ONCE(this->lock_ptr !=3D &hb_old->lock); + + hb_new =3D futex_hash_private(&this->key, new->queues, hash_mask); + futex_hb_waiters_inc(hb_new); + /* + * The new pointer isn't published yet but an already + * moved user can be unqueued due to timeout or signal. + */ + spin_lock_nested(&hb_new->lock, SINGLE_DEPTH_NESTING); + plist_add(&this->list, &hb_new->chain); + this->lock_ptr =3D &hb_new->lock; + spin_unlock(&hb_new->lock); + } + spin_unlock(&hb_old->lock); + } +} + +static void __futex_assign_new_hb(struct futex_hash_bucket_private *hb_p_n= ew, + struct mm_struct *mm) +{ + struct futex_hash_bucket_private *hb_p; + bool drop_init_ref =3D hb_p_new !=3D NULL; + + if (!hb_p_new) { + hb_p_new =3D mm->futex_hash_new; + mm->futex_hash_new =3D NULL; + } + /* Someone was quicker, the current mask is valid */ + if (!hb_p_new) + return; + + hb_p =3D rcu_dereference_check(mm->futex_hash_bucket, + lockdep_is_held(&mm->futex_hash_lock)); + if (hb_p) { + if (hb_p->hash_mask >=3D hb_p_new->hash_mask) { + /* It was increased again while we were waiting */ + kvfree(hb_p_new); + return; + } + /* + * If the caller started the resize then the initial reference + * needs needs to be dropped. 
If the object can not be + * deconstructed we save hb_p_new for later and ensure the + * reference counter is not dropped again. + */ + if (drop_init_ref && + (hb_p->initial_ref_dropped || !rcuref_put(&hb_p->users))) { + mm->futex_hash_new =3D hb_p_new; + hb_p->initial_ref_dropped =3D true; + return; + } + + futex_rehash_current_users(hb_p, hb_p_new); + } + rcu_assign_pointer(mm->futex_hash_bucket, hb_p_new); + kvfree_rcu(hb_p, rcu); +} + +static void futex_assign_new_hb(struct futex_hash_bucket_private *hb_p_new) +{ + struct mm_struct *mm =3D current->mm; + + scoped_guard(mutex, &mm->futex_hash_lock) + __futex_assign_new_hb(hb_p_new, mm); +} + +static struct futex_hash_bucket_private *futex_get_private_hb(union futex_= key *key) +{ + struct mm_struct *mm =3D current->mm; + + if (!futex_key_is_private(key)) + return NULL; + /* + * Ideally we don't loop. If there is a replacement in progress + * then a new local hash is already prepared. We fail to obtain + * a reference only after the last user returned its referefence. + * In that case futex_assign_new_hb() blocks on futex_hash_bucket + * and we either have to performon the replacement or wait + * while someone else is doing the job. Eitherway, after we + * return we can acquire a reference on the new local hash + * (unless it is replaced again). 
+ */ +again: + scoped_guard(rcu) { + struct futex_hash_bucket_private *hb_p; + + hb_p =3D rcu_dereference(mm->futex_hash_bucket); + if (!hb_p) + return NULL; + + if (rcuref_get(&hb_p->users)) + return hb_p; + } + futex_assign_new_hb(NULL); + goto again; +} + /** * futex_hash - Return the hash bucket in the global hash * @key: Pointer to the futex key for which the hash is calculated @@ -139,12 +263,12 @@ static struct futex_hash_bucket *futex_hash_private(u= nion futex_key *key, */ struct futex_hash_bucket *futex_hash(union futex_key *key) { - struct futex_hash_bucket *fhb; + struct futex_hash_bucket_private *hb_p =3D NULL; u32 hash; =20 - fhb =3D current->mm->futex_hash_bucket; - if (fhb && futex_key_is_private(key)) - return futex_hash_private(key, fhb, current->mm->futex_hash_mask); + hb_p =3D futex_get_private_hb(key); + if (hb_p) + return futex_hash_private(key, hb_p->queues, hb_p->hash_mask); =20 hash =3D jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4, @@ -154,6 +278,16 @@ struct futex_hash_bucket *futex_hash(union futex_key *= key) =20 void futex_hash_put(struct futex_hash_bucket *hb) { + struct futex_hash_bucket_private *hb_p; + + if (hb->hb_slot =3D=3D 0) + return; + hb_p =3D container_of(hb, struct futex_hash_bucket_private, + queues[hb->hb_slot - 1]); + + if (!rcuref_put(&hb_p->users)) + return; + /* If hb_p is for destruction then this is delayed to futex_hash() */ } =20 /** @@ -601,6 +735,8 @@ int futex_unqueue(struct futex_q *q) spinlock_t *lock_ptr; int ret =3D 0; =20 + /* RCU so lock_ptr is not going away during locking. */ + guard(rcu)(); /* In the common case we don't take the spinlock, which is nice. 
*/ retry: /* @@ -1008,10 +1144,23 @@ static void compat_exit_robust_list(struct task_str= uct *curr) static void exit_pi_state_list(struct task_struct *curr) { struct list_head *next, *head =3D &curr->pi_state_list; + struct futex_hash_bucket_private *hb_p; struct futex_pi_state *pi_state; struct futex_hash_bucket *hb; union futex_key key =3D FUTEX_KEY_INIT; =20 + /* + * Lock the futex_hash_bucket to ensure that the hb remains unchanged. + * This is important so we can invoke futex_hash() under the pi_lock. + */ + guard(mutex)(&curr->mm->futex_hash_lock); + hb_p =3D rcu_dereference_check(curr->mm->futex_hash_bucket, + lockdep_is_held(&curr->mm->futex_hash_lock)); + if (hb_p) { + if (rcuref_read(&hb_p->users) =3D=3D 0) + __futex_assign_new_hb(NULL, curr->mm); + } + /* * We are a ZOMBIE and nobody can enqueue itself on * pi_state_list anymore, but we have to be careful @@ -1037,6 +1186,7 @@ static void exit_pi_state_list(struct task_struct *cu= rr) if (!refcount_inc_not_zero(&pi_state->refcount)) { raw_spin_unlock_irq(&curr->pi_lock); cpu_relax(); + futex_hash_put(hb); raw_spin_lock_irq(&curr->pi_lock); continue; } @@ -1052,7 +1202,7 @@ static void exit_pi_state_list(struct task_struct *cu= rr) if (head->next !=3D next) { /* retain curr->pi_lock for the loop invariant */ raw_spin_unlock(&pi_state->pi_mutex.wait_lock); - spin_unlock(&hb->lock); + futex_hb_unlock_put(hb); put_pi_state(pi_state); continue; } @@ -1064,7 +1214,7 @@ static void exit_pi_state_list(struct task_struct *cu= rr) =20 raw_spin_unlock(&curr->pi_lock); raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); - spin_unlock(&hb->lock); + futex_hb_unlock_put(hb); =20 rt_mutex_futex_unlock(&pi_state->pi_mutex); put_pi_state(pi_state); @@ -1185,8 +1335,9 @@ void futex_exit_release(struct task_struct *tsk) futex_cleanup_end(tsk, FUTEX_STATE_DEAD); } =20 -static void futex_hash_bucket_init(struct futex_hash_bucket *fhb) +static void futex_hash_bucket_init(struct futex_hash_bucket *fhb, unsigned= int slot) { + 
fhb->hb_slot =3D slot; atomic_set(&fhb->waiters, 0); plist_head_init(&fhb->chain); spin_lock_init(&fhb->lock); @@ -1194,20 +1345,27 @@ static void futex_hash_bucket_init(struct futex_has= h_bucket *fhb) =20 void futex_hash_free(struct mm_struct *mm) { - kvfree(mm->futex_hash_bucket); + struct futex_hash_bucket_private *hb_p; + + /* We are the last one and we hold the initial reference */ + hb_p =3D rcu_dereference_check(mm->futex_hash_bucket, true); + if (!hb_p) + return; + + kvfree(mm->futex_hash_new); + if (WARN_ON(!rcuref_put(&hb_p->users))) + return; + + kvfree(hb_p); } =20 static int futex_hash_allocate(unsigned int hash_slots) { - struct futex_hash_bucket *fhb; + struct futex_hash_bucket_private *hb_p, *hb_tofree =3D NULL; + struct mm_struct *mm =3D current->mm; + size_t alloc_size; int i; =20 - if (current->mm->futex_hash_bucket) - return -EALREADY; - - if (!thread_group_leader(current)) - return -EINVAL; - if (hash_slots =3D=3D 0) hash_slots =3D 16; if (hash_slots < 2) @@ -1217,16 +1375,38 @@ static int futex_hash_allocate(unsigned int hash_sl= ots) if (!is_power_of_2(hash_slots)) hash_slots =3D rounddown_pow_of_two(hash_slots); =20 - fhb =3D kvmalloc_array(hash_slots, sizeof(struct futex_hash_bucket), GFP_= KERNEL_ACCOUNT); - if (!fhb) + if (unlikely(check_mul_overflow(hash_slots, sizeof(struct futex_hash_buck= et), + &alloc_size))) return -ENOMEM; =20 - current->mm->futex_hash_mask =3D hash_slots - 1; + if (unlikely(check_add_overflow(alloc_size, sizeof(struct futex_hash_buck= et_private), + &alloc_size))) + return -ENOMEM; + + hb_p =3D kvmalloc(alloc_size, GFP_KERNEL_ACCOUNT); + if (!hb_p) + return -ENOMEM; + + rcuref_init(&hb_p->users, 1); + hb_p->initial_ref_dropped =3D false; + hb_p->hash_mask =3D hash_slots - 1; =20 for (i =3D 0; i < hash_slots; i++) - futex_hash_bucket_init(&fhb[i]); + futex_hash_bucket_init(&hb_p->queues[i], i + 1); =20 - current->mm->futex_hash_bucket =3D fhb; + scoped_guard(mutex, &mm->futex_hash_lock) { + if (mm->futex_hash_new) 
{ + if (mm->futex_hash_new->hash_mask <=3D hb_p->hash_mask) { + hb_tofree =3D mm->futex_hash_new; + } else { + hb_tofree =3D hb_p; + hb_p =3D mm->futex_hash_new; + } + mm->futex_hash_new =3D NULL; + } + __futex_assign_new_hb(hb_p, mm); + } + kvfree(hb_tofree); return 0; } =20 @@ -1237,8 +1417,12 @@ int futex_hash_allocate_default(void) =20 static int futex_hash_get_slots(void) { - if (current->mm->futex_hash_bucket) - return current->mm->futex_hash_mask + 1; + struct futex_hash_bucket_private *hb_p; + + guard(rcu)(); + hb_p =3D rcu_dereference(current->mm->futex_hash_bucket); + if (hb_p) + return hb_p->hash_mask + 1; return 0; } =20 @@ -1280,7 +1464,7 @@ static int __init futex_init(void) futex_hashsize =3D 1UL << futex_shift; =20 for (i =3D 0; i < futex_hashsize; i++) - futex_hash_bucket_init(&futex_queues[i]); + futex_hash_bucket_init(&futex_queues[i], 0); =20 return 0; } diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h index 3c78126d4079e..8f6ff83f9a499 100644 --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -115,6 +115,7 @@ static inline bool should_fail_futex(bool fshared) */ struct futex_hash_bucket { atomic_t waiters; + unsigned int hb_slot; spinlock_t lock; struct plist_head chain; } ____cacheline_aligned_in_smp; diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c index db27fbf68521c..1c1c43251120e 100644 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -87,6 +87,11 @@ void requeue_futex(struct futex_q *q, struct futex_hash_= bucket *hb1, futex_hb_waiters_inc(hb2); plist_add(&q->list, &hb2->chain); q->lock_ptr =3D &hb2->lock; + /* + * hb1 and hb2 belong to the same futex_hash_bucket_private + * because if we managed get a reference on hb1 then it can't be + * replaced. Therefore we avoid put(hb1)+get(hb2) here. 
+ */ } q->key =3D *key2; } diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c index 03e2f546967d8..f5ee140886a50 100644 --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -173,8 +173,10 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, = int nr_wake, u32 bitset) hb =3D futex_hash(&key); =20 /* Make sure we really have tasks to wakeup */ - if (!futex_hb_waiters_pending(hb)) + if (!futex_hb_waiters_pending(hb)) { + futex_hash_put(hb); return ret; + } =20 spin_lock(&hb->lock); =20 --=20 2.45.2 From nobody Thu Dec 18 09:47:24 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 760941A2397 for ; Wed, 18 Dec 2024 11:16:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520593; cv=none; b=VA4OGV2c14yZFO7gIlZtjAX1msVh1J3p9nttDHsEEzUih/t44p2GnAimy+wgeD4FmLO3QqjsfbJAP7uL0OOEhV5Ft0qPzXEQv12vDL0hjCvnNLc0FH1pRQGBx8z+0BuKznzBNBXcHMlG1mHjRoO8X3666+N8HEbw8vCh6n1jR/w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520593; c=relaxed/simple; bh=gXkGtxoyoXl/NMK73cExS5UnrS3URYzCFyB2LAX73YY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=n9OfLUoNNSB6CU+2DovjWE4Jc6/40H3QATyrISXiqAUouauhA0Kmc6ZZxybI+MTHpzEdA5CqJxmMhsuLLD52/GqMJ7QWCsAtse+69PPegMnWBsul/kxUXpUKj2VAI9J8UzU7PsjTkTiviFg1zJCsM6QdWdF5oGtf0N3X1x+jmyo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=H80C/v9N; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=d760Iv3X; arc=none smtp.client-ip=193.142.43.55 
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="H80C/v9N"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="d760Iv3X" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1734520588; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=D24xeo6QcAfcCgI6fSDs0xWo834h1otg4FtB/ux1bVA=; b=H80C/v9NSBbHrBUHTSq2KaSlc5HmEsN7Skgq0sw4+MB5WrZ14Yx0E4CHm+gNHIp0KQjdef FNHZjJnz7mq1RNn8OalNgrGQF+FsxzUbRPvtRG6WAa5/qU2yQHUfw9F8ZRFwZ1JngQjoFZ avzJ4bGD5HItdFwJ4WqqlKh3g0NAZDvhTfjD4X9OgjCquGimBkX96qooneSB5jFE7Y5Sja 87hQarsu0iy7JBHG2KlqO5KArx+NMGU5imUnrJRHEs7iqLMlpUwwQyh5pc4IyjLUw1THsj OpVhu8AW/AU/rxRcFez+jKHbIHf/0BZAu/oLNMi8JpHNCr1enUDOS+NhNn9TNw== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1734520588; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=D24xeo6QcAfcCgI6fSDs0xWo834h1otg4FtB/ux1bVA=; b=d760Iv3XFK7Hk2aFNIBeCOijmXsmLqAyfaBt2yAybjJGtEXl2fYV0SIbwquEhUwRz+69f3 22KgJPQByrSt5IDg== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v6 11/15] futex: Resize local futex hash table based on number of threads. 
Date: Wed, 18 Dec 2024 12:09:49 +0100 Message-ID: <20241218111618.268028-12-bigeasy@linutronix.de> In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de> References: <20241218111618.268028-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Automatically size the local hash based on the number of threads. The logic tries to allocate between 16 and futex_hashsize (the default for the system wide hash bucket) and uses 4 * number-of-threads. Signed-off-by: Sebastian Andrzej Siewior --- include/linux/futex.h | 12 ------------ kernel/fork.c | 4 +--- kernel/futex/core.c | 25 ++++++++++++++++++++++--- 3 files changed, 23 insertions(+), 18 deletions(-) diff --git a/include/linux/futex.h b/include/linux/futex.h index 3ced01a9c5218..403b54526a081 100644 --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -87,13 +87,6 @@ static inline void futex_mm_init(struct mm_struct *mm) mutex_init(&mm->futex_hash_lock); } =20 -static inline bool futex_hash_requires_allocation(void) -{ - if (current->mm->futex_hash_bucket) - return false; - return true; -} - #else static inline void futex_init_task(struct task_struct *tsk) { } static inline void futex_exit_recursive(struct task_struct *tsk) { } @@ -116,11 +109,6 @@ static inline int futex_hash_allocate_default(void) static inline void futex_hash_free(struct mm_struct *mm) { } static inline void futex_mm_init(struct mm_struct *mm) { } =20 -static inline bool futex_hash_requires_allocation(void) -{ - return false; -} - #endif =20 #endif diff --git a/kernel/fork.c b/kernel/fork.c index 95d38709fde10..7364fa5f2872a 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2134,9 +2134,7 @@ static bool need_futex_hash_allocate_default(u64 clon= e_flags) { if ((clone_flags & (CLONE_THREAD | CLONE_VM)) !=3D (CLONE_THREAD | CLONE_= VM)) return false; - if 
(!thread_group_empty(current)) - return false; - return futex_hash_requires_allocation(); + return true; } =20 /* diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 44e16f033a4dd..95a177f0c5d68 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -1370,8 +1370,8 @@ static int futex_hash_allocate(unsigned int hash_slot= s) hash_slots =3D 16; if (hash_slots < 2) hash_slots =3D 2; - if (hash_slots > 131072) - hash_slots =3D 131072; + if (hash_slots > futex_hashsize) + hash_slots =3D futex_hashsize; if (!is_power_of_2(hash_slots)) hash_slots =3D rounddown_pow_of_two(hash_slots); =20 @@ -1412,7 +1412,26 @@ static int futex_hash_allocate(unsigned int hash_slo= ts) =20 int futex_hash_allocate_default(void) { - return futex_hash_allocate(0); + unsigned int threads, buckets, current_buckets =3D 0; + struct futex_hash_bucket_private *hb_p; + + if (!current->mm) + return 0; + + scoped_guard(rcu) { + threads =3D get_nr_threads(current); + hb_p =3D rcu_dereference(current->mm->futex_hash_bucket); + if (hb_p) + current_buckets =3D hb_p->hash_mask + 1; + } + + buckets =3D roundup_pow_of_two(4 * threads); + buckets =3D max(buckets, 16); + buckets =3D min(buckets, futex_hashsize); + if (current_buckets >=3D buckets) + return 0; + + return futex_hash_allocate(buckets); } =20 static int futex_hash_get_slots(void) --=20 2.45.2 From nobody Thu Dec 18 09:47:24 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 75FB71A2396 for ; Wed, 18 Dec 2024 11:16:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520593; cv=none; 
b=PJq0Z1oYOANuBHB9oc7d012wU0Ws7u3jr54ETXg1O/kcIrneP3VjWRCy50bi+GeHY1ax0w6IrBA7n4OraK+TvedN8LwqxDj3iLvtnu1k6C8z+B1VOp40rIU0Jtd2zmvfbuVydQeiZGqKJPKyX2Zhos/gRtgaKoSP7WeW85OjruQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520593; c=relaxed/simple; bh=KUtNoG6W1ZSSkFBnPeDDIlHEmrhXr7E6zS1pVpjwhkQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ju/mpSlrPYDU9/0qNHiy7NyWQCeZD9K1LhMiqjNIu65SgkpDrzk7L8WFg9TVj13JYCrZIeeaShLM1G3CVGzmYqdq0Mdzp7JVUYJv4daP5TFAFtiEbU/625xxE1HP27aAKOBOBlfzbZvSxOqhgAPMQVQXGUowVMot6Ks3tMXwihY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=3/DfEiuV; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=JGTGfDcN; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="3/DfEiuV"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="JGTGfDcN" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1734520589; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2NFPMwUTOUxStVZVcnmN4MuCLIQZ4Qw1x9ywgplHsew=; b=3/DfEiuVfvO04rbcbDZDReFNuqFzn704FYTab+I7D/TVDIOkE5ttYEvEfCAt8Dmte8gVy+ sbCb+hEunVqhzZY094IKw4xOE8tiFw5MOSnJGs+uijt5JmO2puh7MQ6kTqfnX/1s70tcIo 6fvBcpYz/RQfuonLnfN7XRRnTjGwwdZIECmYlFvBOZdRsXxvA9NMSeoCoaqimBcxFYes8/ 
Xv1a3pZSdVL9Hpze6r3zJZPgZjroHUz9uyo2ldp7dsjILAEQuQDvT/H4dqI6ChYceuuaGm 3m5AYVp31r0wtAbf8dT9ESO36h4TnLBGTaBpiCiIkGcOXWDHbU8GHNSx/wpUng== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1734520589; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2NFPMwUTOUxStVZVcnmN4MuCLIQZ4Qw1x9ywgplHsew=; b=JGTGfDcNopj5bXPc6F//W1sd9f+xse3foUMZCKkHfBLJx+gmOY4EuvP43Ull1bMnpGNj/D PFOvrsaeOccoTcAQ== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v6 12/15] futex: Use a hashmask instead of hashsize. Date: Wed, 18 Dec 2024 12:09:50 +0100 Message-ID: <20241218111618.268028-13-bigeasy@linutronix.de> In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de> References: <20241218111618.268028-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The global hash uses futex_hashsize to store the number of hash buckets allocated during system boot. On each futex_hash() invocation, one is subtracted from this number to get the mask. This can be optimized by saving the mask directly, avoiding the subtraction on each futex_hash() invocation. Rename futex_hashsize to futex_hashmask and save the mask of the allocated hash map.
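For illustration only, and not part of the patch: a minimal userspace sketch of why storing the mask is equivalent to storing the size, assuming the bucket count is a power of two. The helper names index_from_size(), index_from_mask() and check_equivalence() are hypothetical and do not exist in kernel/futex/core.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bucket index derived from the bucket count (old scheme). */
static inline uint32_t index_from_size(uint32_t hash, unsigned long hashsize)
{
	/* One subtraction on every lookup. */
	return hash & (hashsize - 1);
}

/* Hypothetical helper: bucket index derived from a precomputed mask (new scheme). */
static inline uint32_t index_from_mask(uint32_t hash, unsigned long hashmask)
{
	/* The subtraction was done once, when the mask was stored. */
	return hash & hashmask;
}

/* Both forms select the same bucket as long as hashsize is a power of two. */
static void check_equivalence(void)
{
	const unsigned long hashsize = 1UL << 8;	/* 256 buckets, assumed value */
	const unsigned long hashmask = hashsize - 1;
	uint32_t hash;

	for (hash = 0; hash < 100000; hash++)
		assert(index_from_size(hash, hashsize) ==
		       index_from_mask(hash, hashmask));
}
```

Code that still needs the bucket count can recover it as mask + 1, which is what the futex_hashmask + 1 expressions in the diff do.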
Signed-off-by: Sebastian Andrzej Siewior --- kernel/futex/core.c | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 95a177f0c5d68..0fdbf691ec95b 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -52,10 +52,10 @@ */ static struct { struct futex_hash_bucket *queues; - unsigned long hashsize; + unsigned long hashmask; } __futex_data __read_mostly __aligned(2*sizeof(long)); #define futex_queues (__futex_data.queues) -#define futex_hashsize (__futex_data.hashsize) +#define futex_hashmask (__futex_data.hashmask) =20 struct futex_hash_bucket_private { rcuref_t users; @@ -273,7 +273,7 @@ struct futex_hash_bucket *futex_hash(union futex_key *k= ey) hash =3D jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4, key->both.offset); - return &futex_queues[hash & (futex_hashsize - 1)]; + return &futex_queues[hash & futex_hashmask]; } =20 void futex_hash_put(struct futex_hash_bucket *hb) @@ -1370,8 +1370,8 @@ static int futex_hash_allocate(unsigned int hash_slot= s) hash_slots =3D 16; if (hash_slots < 2) hash_slots =3D 2; - if (hash_slots > futex_hashsize) - hash_slots =3D futex_hashsize; + if (hash_slots > futex_hashmask + 1) + hash_slots =3D futex_hashmask + 1; if (!is_power_of_2(hash_slots)) hash_slots =3D rounddown_pow_of_two(hash_slots); =20 @@ -1427,7 +1427,7 @@ int futex_hash_allocate_default(void) =20 buckets =3D roundup_pow_of_two(4 * threads); buckets =3D max(buckets, 16); - buckets =3D min(buckets, futex_hashsize); + buckets =3D min(buckets, futex_hashmask + 1); if (current_buckets >=3D buckets) return 0; =20 @@ -1467,24 +1467,25 @@ int futex_hash_prctl(unsigned long arg2, unsigned l= ong arg3) =20 static int __init futex_init(void) { + unsigned long i, hashsize; unsigned int futex_shift; - unsigned long i; =20 #ifdef CONFIG_BASE_SMALL - futex_hashsize =3D 16; + hashsize =3D 16; #else - futex_hashsize =3D roundup_pow_of_two(256 * num_possible_cpus()); + 
hashsize =3D roundup_pow_of_two(256 * num_possible_cpus()); #endif =20 futex_queues =3D alloc_large_system_hash("futex", sizeof(*futex_queues), - futex_hashsize, 0, 0, + hashsize, 0, 0, &futex_shift, NULL, - futex_hashsize, futex_hashsize); - futex_hashsize =3D 1UL << futex_shift; + hashsize, hashsize); + hashsize =3D 1UL << futex_shift; =20 - for (i =3D 0; i < futex_hashsize; i++) + for (i =3D 0; i < hashsize; i++) futex_hash_bucket_init(&futex_queues[i], 0); =20 + futex_hashmask =3D hashsize - 1; return 0; } core_initcall(futex_init); --=20 2.45.2 From nobody Thu Dec 18 09:47:24 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5B3241A23BC for ; Wed, 18 Dec 2024 11:16:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520594; cv=none; b=un/L0v7DC5Aw4BXNkIpgC0f6aOOBrHYrFcFfYaEuM85ma5ateajYsLv3+U/M2ZnQgtnW9ycZBegsr9rQDjrKzyoCU46untzVcUaFag2ltxkHnm5WL7+ojRc4y6aaeYOqHRFkpokOpH1eFgm/6fsd8o58lc9HoLneGcexSdKzPA4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520594; c=relaxed/simple; bh=oBCl2xg1hBgwXDdOFVnS6Y5pFTfEYL/Bz1BD8yFVl24=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Lm//ps00EgC7qmobQ4n3IfjxNz14SOxnivTbGHlyV3lnenTRcFLE7clBTZjag6t6cv0Xxavng8hv3criJSX6E0omL6yxUF5BThrcmW7aW7iESiXqi2QN6TMs2VFfvOedQhO0DNw/BLF6OxIq2Cix3Cc4Lnvi3nNmKXXW3SdFDxI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=1LIl3Hf0; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=AUy0hPT0; 
arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="1LIl3Hf0"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="AUy0hPT0" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1734520589; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3L65Sur2ZtMYl6cHzYDBnz6z7qp7UvxDlE/18U+/UkU=; b=1LIl3Hf030VYIBzkEsRy7IGPTTcEUQyyjJab11hUPmv+hcTgKuA908KTnBkRVP/wB3et1C 2CmM50YkpP1qLLjmGejxJp3UIOwmy4MI74qC2/GRGXBrt9Eq35wvuhkIu+FvMVw0hLev7t C8LzbS7OHPRGIjXYy49I0Cln3qtVjo6/sYjOm6alIWUQPfxvHe0QpQ5qHjPD8haP0bTsAz dVbanRfLrKhTk3bbKVfOhPoNcBSVoeAttD0eX6JWUmlozo72dBqg7+mJ1SYauSFPW9YTTg aspJO0NSh80IiHfeYbyORqos4sw8urbvijQQ2B10pRqh1DYENNWrdP9ZRAh0PA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1734520589; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3L65Sur2ZtMYl6cHzYDBnz6z7qp7UvxDlE/18U+/UkU=; b=AUy0hPT0qARsiXoSThyw4KGSXZQvDYMZQwYEWn7h1lqyvscU/zACZAGmXdpHFdPedVJ0WI 1H3J5nNyPhEhqyAA== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v6 13/15] 
=?UTF-8?q?tools/perf:=20Add=20the=20prctl(PR=5FF?= =?UTF-8?q?UTEX=5FHASH,=E2=80=A6)=20to=20futex-hash.?= Date: Wed, 18 Dec 2024 12:09:51 +0100 Message-ID: <20241218111618.268028-14-bigeasy@linutronix.de> In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de> References: <20241218111618.268028-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Wire up PR_FUTEX_HASH to futex-hash. Use the `-b' argument to specify the number of buckets. Read the resulting value back and print it in the run summary. Signed-off-by: Sebastian Andrzej Siewior --- tools/perf/bench/futex-hash.c | 19 +++++++++++++++++-- tools/perf/bench/futex.h | 1 + 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c index b472eded521b1..e24e987ae213e 100644 --- a/tools/perf/bench/futex-hash.c +++ b/tools/perf/bench/futex-hash.c @@ -22,6 +22,7 @@ #include #include #include +#include =20 #include "../util/mutex.h" #include "../util/stat.h" @@ -53,6 +54,7 @@ static struct bench_futex_parameters params =3D { }; =20 static const struct option options[] =3D { + OPT_UINTEGER('b', "buckets", &params.nbuckets, "Task local futex buckets = to allocate"), OPT_UINTEGER('t', "threads", &params.nthreads, "Specify amount of threads= "), OPT_UINTEGER('r', "runtime", &params.runtime, "Specify runtime (in second= s)"), OPT_UINTEGER('f', "futexes", &params.nfutexes, "Specify amount of futexes= per threads"), @@ -120,6 +122,10 @@ static void print_summary(void) (int)bench__runtime.tv_sec); } =20 +#define PR_FUTEX_HASH 77 +# define PR_FUTEX_HASH_SET_SLOTS 1 +# define PR_FUTEX_HASH_GET_SLOTS 2 + int bench_futex_hash(int argc, const char **argv) { int ret =3D 0; @@ -131,6 +137,7 @@ int bench_futex_hash(int argc, const char **argv) struct perf_cpu_map *cpu; int nrcpus; size_t size; + int num_buckets; =20 argc 
=3D parse_options(argc, argv, options, bench_futex_hash_usage, 0); if (argc) { @@ -147,6 +154,14 @@ int bench_futex_hash(int argc, const char **argv) act.sa_sigaction =3D toggle_done; sigaction(SIGINT, &act, NULL); =20 + ret =3D prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, params.nbuckets); + if (ret) { + printf("Allocation of %u hash buckets failed: %d/%m\n", + params.nbuckets, ret); + goto errmem; + } + num_buckets =3D prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS); + if (params.mlockall) { if (mlockall(MCL_CURRENT | MCL_FUTURE)) err(EXIT_FAILURE, "mlockall"); @@ -162,8 +177,8 @@ int bench_futex_hash(int argc, const char **argv) if (!params.fshared) futex_flag =3D FUTEX_PRIVATE_FLAG; =20 - printf("Run summary [PID %d]: %d threads, each operating on %d [%s] futex= es for %d secs.\n\n", - getpid(), params.nthreads, params.nfutexes, params.fshared ? "shar= ed":"private", params.runtime); + printf("Run summary [PID %d]: %d threads, hash slots: %d each operating o= n %d [%s] futexes for %d secs.\n\n", + getpid(), params.nthreads, num_buckets, params.nfutexes, params.fs= hared ? 
"shared":"private", params.runtime); =20 init_stats(&throughput_stats); mutex_init(&thread_lock); diff --git a/tools/perf/bench/futex.h b/tools/perf/bench/futex.h index ebdc2b032afc1..abc353c63a9a4 100644 --- a/tools/perf/bench/futex.h +++ b/tools/perf/bench/futex.h @@ -20,6 +20,7 @@ struct bench_futex_parameters { bool multi; /* lock-pi */ bool pi; /* requeue-pi */ bool broadcast; /* requeue */ + unsigned int nbuckets; unsigned int runtime; /* seconds*/ unsigned int nthreads; unsigned int nfutexes; --=20 2.45.2 From nobody Thu Dec 18 09:47:24 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E9D0D1A23AA for ; Wed, 18 Dec 2024 11:16:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520595; cv=none; b=gssl2XszDCunVd1MqF+iIHvrT1qYw73cOrj0hSzZH3a8adQqKQA7PP0b9NQobqi5vIPu+LHEbloutMmkTiFx6SBp2WaoQM+vpAwgXCpOPmT14HHO6heps+zgJu9Tje2VKkoWwuKPXmz73gqxxl7kvKkTPY3ziOJlQNGrWMEsNDA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520595; c=relaxed/simple; bh=Wza0NVogmEzWlHk3L0MKCrq8xu+lZPkmjetxS0L4yJY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=fSJ01HONoqqPOmKi4Hj6NlZlvhxza4EWHjSri3ZYBbunzk05p+gf+n4KAtRFbFxBLrsIOXaqQa+6hTVKsw8FNFVWdWo6AvVey16mIrHYK3V8fjLjONQKY000utHPXd0Uo24G5VrI6QCpLQnJ2rbwuwcJiuv7ypLd23ME5nomyTU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=KW1c5Bgc; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=hQnZJZ00; arc=none smtp.client-ip=193.142.43.55 
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="KW1c5Bgc"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="hQnZJZ00" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1734520589; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=P0sAWYJTAdfyW4L0KBHcmylrIRN9HFXqGitHstMvgdc=; b=KW1c5BgcmAR5UMKDiZJRcc9u2kZoNjVKgHimqiSUCbnjN2m3GhRHSIwP4/CIAgrltoDsJ3 IwEgQGeG8oSnJNVJvIjw9q0Ygc+4QD4zPeIZiKnKYkzJp8GJNx7GnEahWPEzXrWMbyAwut glXoISyIX2AW+liqfnqJWaXhHII/jdruaEkMsM0AiBVrUNBHtD6AZwkYe/g23OsgUulxc8 9yB813e2KshWaNrG9loV8RogW1e6kXjI4otJWUOMsspS3FwwV5N4HzuBzzs5X2X3BLbx6t gla56PZVMivA3iSLL8rOTcCVLn7V5A3qWavnnjYrlKsTj9DKtrloZ9Z6zHrsmQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1734520589; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=P0sAWYJTAdfyW4L0KBHcmylrIRN9HFXqGitHstMvgdc=; b=hQnZJZ00CJACkDX9WDSsw0lu4R+aIxvqUsE8U/MtbY0WSRNT21zgm9HlYjaa4HZ+ZZeJ34 Viyg37/UdOgw4TCg== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v6 14/15] tools/perf: Use the current affinity for CPU pinning in futex-hash. 
Date: Wed, 18 Dec 2024 12:09:52 +0100 Message-ID: <20241218111618.268028-15-bigeasy@linutronix.de> In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de> References: <20241218111618.268028-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In order to simplify NUMA local testing, let futex-hash use the current affinity mask and pin the individual threads based on that mask. Signed-off-by: Sebastian Andrzej Siewior --- tools/perf/bench/futex-hash.c | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c index e24e987ae213e..216b0d1301ffc 100644 --- a/tools/perf/bench/futex-hash.c +++ b/tools/perf/bench/futex-hash.c @@ -126,10 +126,24 @@ static void print_summary(void) # define PR_FUTEX_HASH_SET_SLOTS 1 # define PR_FUTEX_HASH_GET_SLOTS 2 =20 +static unsigned int get_cpu_bit(cpu_set_t *set, size_t set_size, unsigned = int r_cpu) +{ + unsigned int cpu =3D 0; + + do { + if (CPU_ISSET_S(cpu, set_size, set)) { + if (!r_cpu) + return cpu; + r_cpu--; + } + cpu++; + } while (1); +} + int bench_futex_hash(int argc, const char **argv) { int ret =3D 0; - cpu_set_t *cpuset; + cpu_set_t *cpuset, cpuset_; struct sigaction act; unsigned int i; pthread_attr_t thread_attr; @@ -167,8 +181,12 @@ int bench_futex_hash(int argc, const char **argv) err(EXIT_FAILURE, "mlockall"); } =20 + ret =3D pthread_getaffinity_np(pthread_self(), sizeof(cpuset_), &cpuset_); + BUG_ON(ret); + nrcpus =3D CPU_COUNT(&cpuset_); + if (!params.nthreads) /* default to the number of CPUs */ - params.nthreads =3D perf_cpu_map__nr(cpu); + params.nthreads =3D nrcpus; =20 worker =3D calloc(params.nthreads, sizeof(*worker)); if (!worker) @@ -189,10 +207,9 @@ int bench_futex_hash(int argc, const char **argv) 
pthread_attr_init(&thread_attr); gettimeofday(&bench__start, NULL); =20 - nrcpus =3D cpu__max_cpu().cpu; - cpuset =3D CPU_ALLOC(nrcpus); + cpuset =3D CPU_ALLOC(4096); BUG_ON(!cpuset); - size =3D CPU_ALLOC_SIZE(nrcpus); + size =3D CPU_ALLOC_SIZE(4096); =20 for (i =3D 0; i < params.nthreads; i++) { worker[i].tid =3D i; @@ -202,7 +219,8 @@ int bench_futex_hash(int argc, const char **argv) =20 CPU_ZERO_S(size, cpuset); =20 - CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, c= puset); + CPU_SET_S(get_cpu_bit(&cpuset_, sizeof(cpuset_), i % nrcpus), size, cpus= et); + ret =3D pthread_attr_setaffinity_np(&thread_attr, size, cpuset); if (ret) { CPU_FREE(cpuset); --=20 2.45.2 From nobody Thu Dec 18 09:47:24 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 204C01ACEC4 for ; Wed, 18 Dec 2024 11:16:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520596; cv=none; b=MuvqumK0MGwI1hqGy5JPphc9fO4IgJBR2jaoyqnnnD3ZTB9Gxh8s10uQv0O3BG4VfBpTQAlqoKOoe3prXEde8VhiF9g5LnlL3N6hlig9tc32CZ8AsyZj1fVr0OVrugGLE3WHq4u0hpN7OPF0si/6bbgeSglzAwN0CN2xryG8orQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734520596; c=relaxed/simple; bh=lg8axIUYclb2T6+boQQvvI4x/adQdWN/2MeRAxM1I+E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=CT4K5MAfJ44dI1qNQzud0WDDjvkabmTuzC74+c0fRj8JDldY7V1itbxlrEi0jLg6QXTLLpmHtomHwvB4Yf5YvutoT0vNW2Rd/5wSp3V3W/NMpB0hb01IMtxJGz3v62QnB475bEcVWBlc3hHDCe2kB7EV9umPkunkZ01fbRkF8zs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de 
header.i=@linutronix.de header.b=Nz+vicYF; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=rT2FZcQI; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Nz+vicYF"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="rT2FZcQI" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1734520590; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=E6O0H8PocWBBcPGtHkLgMVhBIfVergybbi7OL8FwAuw=; b=Nz+vicYFoPpydvMoeFX3CcWqvq+tyoxEuqQU2nS6ZKEXVg8zJ2WlVhqzsP1sxkkiFq4P4k hJ99IbGCf9LaVOqcsYVQERW8eZQg7tUpISSW/1ge2YsUzxpVkrZ2Go25SDENDYGhosVfNa C0QWy7H1eLQ6aQIpflsr4Ct8AnBYwz9yWl7ig6kw6au2uHh+WupKGsgXZaK2m8b8ssYwkY qn4KYpHiQjT8JFA2KfekuFV+xkpGYudr8JiupRz8vIPsNDJJ3blibFPu6oOFKClD+jS8My I7fRl9F/nAv3//epocOBiZOT+w/JRMpMKIuxOvWwHYP4h9OzpI1u8AmebNBjuA== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1734520590; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=E6O0H8PocWBBcPGtHkLgMVhBIfVergybbi7OL8FwAuw=; b=rT2FZcQI4wJIv8JCgGtWHplwZJBCkjF+uhKMLvLwyvPDlkKGDOTr+kY1TVMYxdI7k4hC3y icfnKIJpRVNWXDDQ== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian 
Andrzej Siewior Subject: [PATCH v6 15/15] tools/perf: Allocate futex locks on the local CPU-node. Date: Wed, 18 Dec 2024 12:09:53 +0100 Message-ID: <20241218111618.268028-16-bigeasy@linutronix.de> In-Reply-To: <20241218111618.268028-1-bigeasy@linutronix.de> References: <20241218111618.268028-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Signed-off-by: Sebastian Andrzej Siewior --- tools/perf/bench/futex-hash.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c index 216b0d1301ffc..4c7c6677463f8 100644 --- a/tools/perf/bench/futex-hash.c +++ b/tools/perf/bench/futex-hash.c @@ -122,6 +122,8 @@ static void print_summary(void) (int)bench__runtime.tv_sec); } =20 +#include + #define PR_FUTEX_HASH 77 # define PR_FUTEX_HASH_SET_SLOTS 1 # define PR_FUTEX_HASH_GET_SLOTS 2 @@ -212,14 +214,19 @@ int bench_futex_hash(int argc, const char **argv) size =3D CPU_ALLOC_SIZE(4096); =20 for (i =3D 0; i < params.nthreads; i++) { + unsigned int cpu_num; worker[i].tid =3D i; - worker[i].futex =3D calloc(params.nfutexes, sizeof(*worker[i].futex)); - if (!worker[i].futex) - goto errmem; =20 CPU_ZERO_S(size, cpuset); + cpu_num =3D get_cpu_bit(&cpuset_, sizeof(cpuset_), i % nrcpus); + //worker[i].futex =3D calloc(params.nfutexes, sizeof(*worker[i].futex)); =20 - CPU_SET_S(get_cpu_bit(&cpuset_, sizeof(cpuset_), i % nrcpus), size, cpus= et); + worker[i].futex =3D numa_alloc_onnode(params.nfutexes * sizeof(*worker[i= ].futex), + numa_node_of_cpu(cpu_num)); + if (worker[i].futex =3D=3D MAP_FAILED || worker[i].futex =3D=3D NULL) + goto errmem; + + CPU_SET_S(cpu_num, size, cpuset); =20 ret =3D pthread_attr_setaffinity_np(&thread_attr, size, cpuset); if (ret) { @@ -271,7 +278,7 @@ int bench_futex_hash(int argc, const 
char **argv) &worker[i].futex[params.nfutexes-1], t); } =20 - zfree(&worker[i].futex); + numa_free(worker[i].futex, params.nfutexes * sizeof(*worker[i].futex)); } =20 print_summary(); --=20 2.45.2