From nobody Fri Dec 19 04:55:08 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 390621F756E for ; Tue, 3 Dec 2024 16:43:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733244220; cv=none; b=pu64n7QjShaY1eSaXDMAVedopcRc/Ne2diJ/EkGUkSGtQ0MhJWy4+gHD0tLhQa19i9DoZynf6PsfT9ai1Q7YJyAajkCY44SsIlb3mbKFYQ4n5jo9GFoMoXWG7zC+qdrnM1C5ilCSihc9wRE2Mfipppq9ynUF+Hqtjbh+9ZH1fyY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733244220; c=relaxed/simple; bh=mev7mpyqFcGhGcFLm7uZaP+9Wq71MjLIv/zUC+zUGlc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SAsjk3RQiyPwpQwAh4bRpr8sk3xGVOXxSfggapDIlvbbPVcIwzY77gz9N4JmUZRN8IArQjEGYv+yoymOgWbhEf6TFtURqH6V/u4lZTcDx6sqb4yKHB/GYFKl67w0FE5lSh4NzDPEP6CBTcllYaVTK8wnPRvNiP/0A5SnQC+bukE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=li5ktNZV; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=u1mKlZDg; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="li5ktNZV"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="u1mKlZDg" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; 
s=2020; t=1733244217; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Tiz7IpGB0/Q4xmudvzY6v3tsedJV49r7UohyKrwKBEw=; b=li5ktNZVN7SLq0YPxZLS/Sv6kmTMWjxrk5txelzuP/X54MIHCdumDndR+ddp1bjdz1evQz g930/5mh/qO5aIbuaEqs3rAeQAgcn6WgvaNoSyf76DnUSeNrteNGSI6/GNod+Q3caDlpcb lahUg2XIUdCfC8Z86RTKpGmkZ5UbGZBQIplmaEKpEcilcc4uJjAwR6yXx0MIOpViK6oj5s e0oqAorKbBPZpJ4T2rbEvdr3XCxXqG3pL+fi5vVm7qfkDeSV6ubN5FqD8vwYYmXBnLyZwE fFEPMks0B9fZw2fZCU6qzINdElK4TNXGI/U+qak6E/7Aw2hsZM2jpseliLT38Q== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1733244217; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Tiz7IpGB0/Q4xmudvzY6v3tsedJV49r7UohyKrwKBEw=; b=u1mKlZDggOWHwlv+ccFNtXcjILYxjd1KkPE0k9Hgi/uSCO+bn8ns5OigXkqD8+DiJSZ5ur tyYECb6rpUFV48BA== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v4 01/11] futex: Create helper function to initialize a hash slot. Date: Tue, 3 Dec 2024 17:42:09 +0100 Message-ID: <20241203164335.1125381-2-bigeasy@linutronix.de> In-Reply-To: <20241203164335.1125381-1-bigeasy@linutronix.de> References: <20241203164335.1125381-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Factor out the futex_hash_bucket initialisation into a helper function. The helper function will be used in a follow-up patch.
Signed-off-by: Sebastian Andrzej Siewior --- kernel/futex/core.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index ebdd76b4ecbba..d1d3c7b358b23 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -1124,6 +1124,13 @@ void futex_exit_release(struct task_struct *tsk) futex_cleanup_end(tsk, FUTEX_STATE_DEAD); } =20 +static void futex_hash_bucket_init(struct futex_hash_bucket *fhb) +{ + atomic_set(&fhb->waiters, 0); + plist_head_init(&fhb->chain); + spin_lock_init(&fhb->lock); +} + static int __init futex_init(void) { unsigned int futex_shift; @@ -1141,11 +1148,8 @@ static int __init futex_init(void) futex_hashsize, futex_hashsize); futex_hashsize =3D 1UL << futex_shift; =20 - for (i =3D 0; i < futex_hashsize; i++) { - atomic_set(&futex_queues[i].waiters, 0); - plist_head_init(&futex_queues[i].chain); - spin_lock_init(&futex_queues[i].lock); - } + for (i =3D 0; i < futex_hashsize; i++) + futex_hash_bucket_init(&futex_queues[i]); =20 return 0; } --=20 2.45.2 From nobody Fri Dec 19 04:55:08 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AC8FA1F8AD8 for ; Tue, 3 Dec 2024 16:43:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733244221; cv=none; b=tCpSexrPX3RfkLTH35pvweK6xPElKQTSGqCEoXQj111JpwcBWcDfHglvSWZYzuZfvOc4fs5VKrpyrSzEOmT9CERVRnzwYs11IJrAimOBRtHyBAhu/9YMhfEVwNlSfJ/2TdT791kjzynfP+RAGwWzt1lMLPhaVGGTPBGO3GLZ6K0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733244221; c=relaxed/simple; bh=q0xcEg06K+76sMw1kCDqWusX74XTilCEOo6aA0QzIlc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=s1zUegvOAoWPxcy0ON/hItMyVNGmlLlXS4Wq9MC6tInXfzZ6jL1/1rWL9Xx+ob1TjN+dHBcu2lkhPV9EyJsi+ESxS7tF3VnGG1ucdzPTmrJWB24cAAFFGsPPPQrAXgBGIKaeE6AHiSOu6PWQE5DfSVz0ZA/w011scylPMn1xTNI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=hwozFIDP; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=HUPh+I91; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="hwozFIDP"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="HUPh+I91" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1733244217; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=2055qkkMZMT8jjFAZE4+srYSktnnsFEAA3czZoN/1Mw=; b=hwozFIDPtSolqmIqYi9QUN6t8sydj0xB1jGNIu/Nfb2/hyxiZ8+3WwoxCuGvtYOPfg81yz 6ScZNRbZUYoLl0zDmE4YBABBWKj3awAK0QnZvHQhilat9BUjfwyiP67VNKegQzxsQHn2VX LRTj0fm141Cl3aMgyMpEDLh914DjwEwLPqLmo3QYFyljYZ0TUw0ciycxn0uI3TPLnq6eSz 8z+rJSKwM1t7VuxrVHU+/hwSTqyvZwVfvPwmkTi/x0nOZT51OgII/BbpkYiFt63+sp6qZp XCUVfzdisnvIupW/q4RV8A4Hk16xPUnzlSHwdVaP21ieVasVzFNOViSCOtAK9g== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1733244217; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: 
in-reply-to:in-reply-to:references:references; bh=2055qkkMZMT8jjFAZE4+srYSktnnsFEAA3czZoN/1Mw=; b=HUPh+I91HBoKAKEfgdx22bnK65sjSH69+Sq0sDqqPVxPkChXpmLjm7Aljl0lttUXEwnBKn ca8k8ud3b+uVciBQ== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v4 02/11] futex: Add basic infrastructure for local task local hash. Date: Tue, 3 Dec 2024 17:42:10 +0100 Message-ID: <20241203164335.1125381-3-bigeasy@linutronix.de> In-Reply-To: <20241203164335.1125381-1-bigeasy@linutronix.de> References: <20241203164335.1125381-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The futex hashmap is system wide and shared by random tasks. Each slot is hashed based on its address and VMA. Due to randomized VMAs (and memory allocations) the same logical lock (pointer) can end up in a different hash bucket on each invocation of the application. This in turn means that different applications may share a hash bucket on the first invocation but not on the second, and it is not always clear which applications will be involved. This can result in high latencies when acquiring futex_hash_bucket::lock, especially if the lock owner is limited to a CPU and cannot be effectively PI boosted. Introduce a task local hash map. The hashmap can be allocated via prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, 0). The `0' argument allocates a default number of 16 slots; a higher number can be specified if desired. The current upper limit is 131072. The allocated hashmap is used by all threads within a process. A thread can check if the private map has been allocated via prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS), which returns the current number of slots.
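For illustration only, the slot count that results from a PR_FUTEX_HASH_SET_SLOTS request can be modelled in user space. This is a minimal sketch, assuming the clamping and rounding that futex_hash_allocate() performs in this patch (default 16, minimum 2, maximum 131072, rounded down to a power of two). The names futex_effective_slots() and round_down_pow2() are made up for this example and are not part of the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Limits as described in the commit message and futex_hash_allocate(). */
#define FUTEX_SLOTS_DEFAULT     16u
#define FUTEX_SLOTS_MIN          2u
#define FUTEX_SLOTS_MAX     131072u

/* Hypothetical helper: largest power of two that is <= n (n must be > 0). */
static uint32_t round_down_pow2(uint32_t n)
{
	assert(n != 0);
	/* Clear the lowest set bit until only the highest one remains. */
	while (n & (n - 1))
		n &= n - 1;
	return n;
}

/*
 * Hypothetical helper: the number of slots the patch would allocate for
 * a given request, assuming the order of checks in futex_hash_allocate().
 */
uint32_t futex_effective_slots(uint32_t requested)
{
	if (requested == 0)
		requested = FUTEX_SLOTS_DEFAULT;
	if (requested < FUTEX_SLOTS_MIN)
		requested = FUTEX_SLOTS_MIN;
	if (requested > FUTEX_SLOTS_MAX)
		requested = FUTEX_SLOTS_MAX;
	return round_down_pow2(requested);
}
```

Under these assumptions a request for 100 slots would yield 64, and the hash mask stored alongside the table would be the slot count minus one.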
Signed-off-by: Sebastian Andrzej Siewior --- include/linux/futex.h | 22 ++++++++ include/linux/mm_types.h | 3 ++ include/uapi/linux/prctl.h | 5 ++ kernel/fork.c | 2 + kernel/futex/core.c | 100 +++++++++++++++++++++++++++++++++++-- kernel/sys.c | 4 ++ 6 files changed, 133 insertions(+), 3 deletions(-) diff --git a/include/linux/futex.h b/include/linux/futex.h index b70df27d7e85c..61e81b866d34e 100644 --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -77,6 +77,16 @@ void futex_exec_release(struct task_struct *tsk); =20 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout, u32 __user *uaddr2, u32 val2, u32 val3); +int futex_hash_prctl(unsigned long arg2, unsigned long arg3, + unsigned long arg4, unsigned long arg5); +int futex_hash_allocate_default(void); +void futex_hash_free(struct mm_struct *mm); + +static inline void futex_mm_init(struct mm_struct *mm) +{ + mm->futex_hash_bucket =3D NULL; +} + #else static inline void futex_init_task(struct task_struct *tsk) { } static inline void futex_exit_recursive(struct task_struct *tsk) { } @@ -88,6 +98,18 @@ static inline long do_futex(u32 __user *uaddr, int op, u= 32 val, { return -EINVAL; } +static inline int futex_hash_prctl(unsigned long arg2, unsigned long arg3, + unsigned long arg4, unsigned long arg5) +{ + return -EINVAL; +} +static inline int futex_hash_allocate_default(void) +{ + return 0; +} +static inline void futex_hash_free(struct mm_struct *mm) { } +static inline void futex_mm_init(struct mm_struct *mm) { } + #endif =20 #endif diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 7361a8f3ab68e..b16b97ab8fb2a 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -30,6 +30,7 @@ #define INIT_PASID 0 =20 struct address_space; +struct futex_hash_bucket; struct mem_cgroup; =20 /* @@ -902,6 +903,8 @@ struct mm_struct { int mm_lock_seq; #endif =20 + unsigned int futex_hash_mask; + struct futex_hash_bucket *futex_hash_bucket; =20 unsigned long hiwater_rss; /* 
High-watermark of RSS usage */ unsigned long hiwater_vm; /* High-water virtual memory usage */ diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 5c6080680cb27..55b843644c51a 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -353,4 +353,9 @@ struct prctl_mm_map { */ #define PR_LOCK_SHADOW_STACK_STATUS 76 =20 +/* FUTEX hash management */ +#define PR_FUTEX_HASH 77 +# define PR_FUTEX_HASH_SET_SLOTS 1 +# define PR_FUTEX_HASH_GET_SLOTS 2 + #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/fork.c b/kernel/fork.c index 1450b461d196a..cda8886f3a1d7 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1284,6 +1284,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm= , struct task_struct *p, RCU_INIT_POINTER(mm->exe_file, NULL); mmu_notifier_subscriptions_init(mm); init_tlb_flush_pending(mm); + futex_mm_init(mm); #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(CONFIG_SPLIT_PMD_PTLO= CKS) mm->pmd_huge_pte =3D NULL; #endif @@ -1361,6 +1362,7 @@ static inline void __mmput(struct mm_struct *mm) if (mm->binfmt) module_put(mm->binfmt->module); lru_gen_del_mm(mm); + futex_hash_free(mm); mmdrop(mm); } =20 diff --git a/kernel/futex/core.c b/kernel/futex/core.c index d1d3c7b358b23..fbfe1f1e94505 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -39,6 +39,7 @@ #include #include #include +#include =20 #include "futex.h" #include "../locking/rtmutex_common.h" @@ -107,18 +108,40 @@ late_initcall(fail_futex_debugfs); =20 #endif /* CONFIG_FAIL_FUTEX */ =20 +static inline bool futex_key_is_private(union futex_key *key) +{ + /* + * Relies on get_futex_key() to set either bit for shared + * futexes -- see comment with union futex_key. 
+ */ + return !(key->both.offset & (FUT_OFF_INODE | FUT_OFF_MMSHARED)); +} + /** * futex_hash - Return the hash bucket in the global hash * @key: Pointer to the futex key for which the hash is calculated * * We hash on the keys returned from get_futex_key (see below) and return = the - * corresponding hash bucket in the global hash. + * corresponding hash bucket in the global hash. If the FUTEX is private a= nd + * a local hash table is provided then this one is used. */ struct futex_hash_bucket *futex_hash(union futex_key *key) { - u32 hash =3D jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4, - key->both.offset); + struct futex_hash_bucket *fhb; + u32 hash; =20 + fhb =3D current->mm->futex_hash_bucket; + if (fhb && futex_key_is_private(key)) { + u32 hash_mask =3D current->mm->futex_hash_mask; + + hash =3D jhash2((u32 *)key, + offsetof(typeof(*key), both.offset) / 4, + key->both.offset); + return &fhb[hash & hash_mask]; + } + hash =3D jhash2((u32 *)key, + offsetof(typeof(*key), both.offset) / 4, + key->both.offset); return &futex_queues[hash & (futex_hashsize - 1)]; } =20 @@ -1131,6 +1154,77 @@ static void futex_hash_bucket_init(struct futex_hash= _bucket *fhb) spin_lock_init(&fhb->lock); } =20 +void futex_hash_free(struct mm_struct *mm) +{ + kvfree(mm->futex_hash_bucket); +} + +static int futex_hash_allocate(unsigned int hash_slots) +{ + struct futex_hash_bucket *fhb; + int i; + + if (current->mm->futex_hash_bucket) + return -EALREADY; + + if (!thread_group_leader(current)) + return -EINVAL; + + if (hash_slots =3D=3D 0) + hash_slots =3D 16; + if (hash_slots < 2) + hash_slots =3D 2; + if (hash_slots > 131072) + hash_slots =3D 131072; + if (!is_power_of_2(hash_slots)) + hash_slots =3D rounddown_pow_of_two(hash_slots); + + fhb =3D kvmalloc_array(hash_slots, sizeof(struct futex_hash_bucket), GFP_= KERNEL_ACCOUNT); + if (!fhb) + return -ENOMEM; + + current->mm->futex_hash_mask =3D hash_slots - 1; + + for (i =3D 0; i < hash_slots; i++) +
futex_hash_bucket_init(&fhb[i]); + + current->mm->futex_hash_bucket =3D fhb; + return 0; +} + +int futex_hash_allocate_default(void) +{ + return futex_hash_allocate(0); +} + +static int futex_hash_get_slots(void) +{ + if (current->mm->futex_hash_bucket) + return current->mm->futex_hash_mask + 1; + return 0; +} + +int futex_hash_prctl(unsigned long arg2, unsigned long arg3, + unsigned long arg4, unsigned long arg5) +{ + int ret; + + switch (arg2) { + case PR_FUTEX_HASH_SET_SLOTS: + ret =3D futex_hash_allocate(arg3); + break; + + case PR_FUTEX_HASH_GET_SLOTS: + ret =3D futex_hash_get_slots(); + break; + + default: + ret =3D -EINVAL; + break; + } + return ret; +} + static int __init futex_init(void) { unsigned int futex_shift; diff --git a/kernel/sys.c b/kernel/sys.c index c4c701c6f0b4d..dfa8b1b344edb 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -52,6 +52,7 @@ #include #include #include +#include =20 #include #include @@ -2809,6 +2810,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, ar= g2, unsigned long, arg3, return -EINVAL; error =3D arch_lock_shadow_stack_status(me, arg2); break; + case PR_FUTEX_HASH: + error =3D futex_hash_prctl(arg2, arg3, arg4, arg5); + break; default: error =3D -EINVAL; break; --=20 2.45.2 From nobody Fri Dec 19 04:55:08 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AC9521F8ADB for ; Tue, 3 Dec 2024 16:43:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733244221; cv=none; b=M1+UyrQJB4vHVN049ZJhr1v4bFSyG6GCZTQsRnasurF7ptlkNPCimHqnCYOJ3HIId4FDcBuuAiXn7lvwThgjIzoAFaopcisgr3lPRCE6EchKEuv/jsZRITz3HJXjXJTUA7NmfqkknDjLWacob/3zn02ycd4ZMp72PBjMDm/KjK8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; 
s=arc-20240116; t=1733244221; c=relaxed/simple; bh=1jg7C4Hn7tl+DNFyCMZ58uFZgzsrwganhgb1pjnKjCs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Ea6hWGAOdQLOZZxof4+TDFOlHWs5Al4J0GgoZzJWAjq0usi3kumsXwP7U6x2+95ZPY6xBXz7OS2vHugUxQ8ZaGrOkhvTgd6S786do8fbm0eZK5LhhYDmPc6ggCxSedWPu1x5zP1Z5cVwj6/TPu5noWVAXR3qCgjYYygvZYC76HU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=fiLfMQae; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=Fwj48aWp; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="fiLfMQae"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Fwj48aWp" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1733244218; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BZ1YXfG8xZyG1QlATAkQFmGyYAn75aCDQCDdTpLb2R8=; b=fiLfMQaeTXwX5dGNyPra8Hg97O74thVcC2fy8zPFrM8uWth3ZNUPkgzq9dMrG+xiezf6yS a17KkI4QFjkGDuNr1OHQdW+MItu9dYNCnMqxM1Eqtn9V2EO0zDibxt6APOf4nuB/dnAwJ2 P2J+VGsxJ2JyEoIz8bHSruC8cF6XK/PrsmO7fDlv8r25hd7jYJfYwDonk3uRYo2FG4cYo5 mqtPD6nnvnyaw7zWieCyRGbcOW7WkFH2fMOAN41amxT5wzGDHWUk80ZQbLo08Iu66vrHJw rONFgtAne5uereZvnq+r6Daj5ciwruOfHxAOMEWsAHLWK6e9na87qAh0EtIuYQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1733244218; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=BZ1YXfG8xZyG1QlATAkQFmGyYAn75aCDQCDdTpLb2R8=; b=Fwj48aWpDnaqEvQJgYqkNOI5wai4nHTteDqyWcouVSpbmSuce2B9NicOMoJ0fhoxsn1DWJ 8CFUaXkZg07LuCCg== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v4 03/11] futex: Allow automatic allocation of process wide futex hash. Date: Tue, 3 Dec 2024 17:42:11 +0100 Message-ID: <20241203164335.1125381-4-bigeasy@linutronix.de> In-Reply-To: <20241203164335.1125381-1-bigeasy@linutronix.de> References: <20241203164335.1125381-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Allocate a default futex hash if a task forks its first thread. Signed-off-by: Sebastian Andrzej Siewior --- kernel/fork.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/kernel/fork.c b/kernel/fork.c index cda8886f3a1d7..6267d600af991 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2130,6 +2130,17 @@ static void rv_task_fork(struct task_struct *p) #define rv_task_fork(p) do {} while (0) #endif =20 +static bool need_futex_hash_allocate_default(u64 clone_flags) +{ + if ((clone_flags & (CLONE_THREAD | CLONE_VM)) !=3D (CLONE_THREAD | CLONE_= VM)) + return false; + if (!thread_group_empty(current)) + return false; + if (current->mm->futex_hash_bucket) + return false; + return true; +} + /* * This creates a new process as a copy of the old one, * but does not actually start it yet. 
@@ -2507,6 +2518,21 @@ __latent_entropy struct task_struct *copy_process( if (retval) goto bad_fork_cancel_cgroup; =20 + /* + * Allocate a default futex hash for the user process once the first + * thread spawns. + */ + if (need_futex_hash_allocate_default(clone_flags)) { + retval =3D futex_hash_allocate_default(); + if (retval) + goto bad_fork_core_free; + /* + * If we fail beyond this point we don't free the allocated + * futex hash map. We assume that another thread will be created + * and make use of it. The hash map will be freed once the main + * thread terminates. + */ + } /* * From this point on we must avoid any synchronous user-space * communication until we take the tasklist-lock. In particular, we do --=20 2.45.2 From nobody Fri Dec 19 04:55:08 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 82E861F8AE0 for ; Tue, 3 Dec 2024 16:43:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733244222; cv=none; b=Ury+dDQDRogSSj6HglCQgJO/5vxQu30VnrUinxAIpFgrEUSn+lI/822jr/hnZcuV51ku/b6hG7jx5MRs6supqwA2tz9ovjaMYNbGvtJNG/VoBsRXo19VgE6MsFyfka4GA/ZV5dDuatlKim3vtw78pc5q8FllxGIXMB/q7W0Jmjw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733244222; c=relaxed/simple; bh=LTsqdft1dPcv0xHP6p0o7VD6AxpWCX4fuSvJY4j9o+I=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LVxa97Fq9wgoc5ZpdYUJbL9/u00LhBt8HC74iQxhUSPYbIVnZJ/90A/YX8plwiaFuqP7AUb//01x2acEZPRx1ZKxHuLQ7DorAoi8c0Docz6CyG7eK0G0IvtFMIQ0Gc0qbpdWjZP9J6Dd1kA3WohsqCITbNHz0z6vDIfAz++dowc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit
key) header.d=linutronix.de header.i=@linutronix.de header.b=TnH6AdTQ; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=u++tDink; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="TnH6AdTQ"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="u++tDink" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1733244218; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5hRl2vndQY+cEadOVbdMCgzj5162VGAvi0uAXVn0fM8=; b=TnH6AdTQNUk66+aJZ3tCzIMTdAirn/b6inUMFWNvRG7X2CN7eLmmE0jDwQjeoRSGZvy51A GXEc1w9kFkqoNrG/wb6Wmi1qj3ufWOb/SU4m9+m+gHaNx7Xpz5DACPvWY+KDvnVAVVvwOl KR7BMtbEQuFDc2xuut0l6aQg3erFnaSz68DGwq1M73i5KHFyMWELOvr7J6l1Zv4nuq+XtJ xee+LMyVrjMvFNV/fK6qz8+bi4hLBnUsounHRVQGV9xvdQGSN+9DoEHVfTztUluF4Wnw42 Vg4+G3DNArUE2rs+4539DLLwGJlaokKZPKjfNQu8+O3X1qhWBkD+q1BxxUk9iQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1733244218; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5hRl2vndQY+cEadOVbdMCgzj5162VGAvi0uAXVn0fM8=; b=u++tDinkaLpDMb532QuRHwaT5xQsLd5Ldr6axlVz5SZfU4y4sBX3l+c+cv/rldWYS6lP8O Z3eOamhD82+MRxAA== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , 
Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v4 04/11] futex: Hash only the address for private futexes. Date: Tue, 3 Dec 2024 17:42:12 +0100 Message-ID: <20241203164335.1125381-5-bigeasy@linutronix.de> In-Reply-To: <20241203164335.1125381-1-bigeasy@linutronix.de> References: <20241203164335.1125381-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" futex_hash() passes the whole futex_key to jhash2. The first two members are passed as the first argument and the offset as the "initial value". For private futexes, the mm-part is always the same and it is used only within the process. By excluding the mm part from the hash, we reduce the length passed to jhash2 from 4 (16 / 4) to 2 (8 / 4). This avoids the __jhash_mix() part of jhash. The resulting code is smaller and, based on testing, this variant performs as well as the original or slightly better.
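The two length values can be checked with a small user-space sketch. It assumes a 64-bit kernel and uses a simplified stand-in for union futex_key (fixed-width fields instead of the mm pointer and unsigned long); the type and function names below are hypothetical and exist only for this example. The length passed to jhash2 is counted in u32 words, hence the division by 4.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for the 'private' and 'both' views of
 * union futex_key, assuming a 64-bit kernel.
 */
union futex_key_sketch {
	struct {
		uint64_t mm;		/* stands in for struct mm_struct * */
		uint64_t address;	/* stands in for unsigned long */
		uint32_t offset;
	} private_key;
	struct {
		uint64_t ptr;
		uint64_t word;
		uint32_t offset;
	} both;
};

/* u32 words hashed for the global table: everything before 'offset'. */
size_t global_hash_words(void)
{
	return offsetof(union futex_key_sketch, both.offset) / 4;
}

/* u32 words hashed for a private futex: the address only. */
size_t private_hash_words(void)
{
	return sizeof(((union futex_key_sketch *)0)->private_key.address) / 4;
}
```

With this layout the global path hashes 16 / 4 = 4 words and the private path hashes 8 / 4 = 2 words, matching the numbers in the description above.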
Signed-off-by: Sebastian Andrzej Siewior --- kernel/futex/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/futex/core.c b/kernel/futex/core.c index fbfe1f1e94505..14251bbafaffb 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -134,8 +134,8 @@ struct futex_hash_bucket *futex_hash(union futex_key *k= ey) if (fhb && futex_key_is_private(key)) { u32 hash_mask =3D current->mm->futex_hash_mask; =20 - hash =3D jhash2((u32 *)key, - offsetof(typeof(*key), both.offset) / 4, + hash =3D jhash2((void *)&key->private.address, + sizeof(key->private.address) / 4, key->both.offset); return &fhb[hash & hash_mask]; } --=20 2.45.2 From nobody Fri Dec 19 04:55:08 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3D2931FA167 for ; Tue, 3 Dec 2024 16:43:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733244224; cv=none; b=qf6sCrxBI7c3jgDU2qiTr7fQdqRVLjSaskA+F3AY2lO8KmpVqGnk9rHtihqOz4a6nSGV6qC3agp1H0XMZCS9DPxzu8TV965H3KEd2mgSz2zqnc02Ur1l4YGeTx2gxTCdtcDLhpnOAGHLZVNKi3+NBBt2V37OO+7quXHKRcssres= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733244224; c=relaxed/simple; bh=kVN2hfIojMc/AG54+sdOHXIgWCq4aNVEYbrMlU3QDps=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=rq7ZLsGxFmL5/6p59lE7z2J5QKVsNGeVJ254IketwExdhTcWGjTMh2deuSFuvQUZPeja6BB0+UEJ5wwgEK3LijIwJffbHBSoNrmk6LSyuqTohkuudH+QCrC0gxKMWuzQzpWt9Ju6z7h6WqvoF+59uM+pxNTHhiuJ846dzHbix8E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de 
header.b=eaRjOliy; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=MlipyIrT; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="eaRjOliy"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="MlipyIrT" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1733244218; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+vx6odPE5/M/zu43Vy1csw0o+66LVJ83d3Ac8QFvOmM=; b=eaRjOliyVWp05gLuJ6RSUp8qLdPAL15qtoZUcS8UMvbGO4sI8vVkzgMHSoUTHfY6lduEtB i2BI+H4URmoCMmfu8NONki1kI1MxR7xxvrbORqyFWw/NrETZSCr/0dOG/OrPYec5JB3OZN wj5VCmP7o4A/a0XWN1Tdxe/wi7XWouP8FZqjICxg9C56QrTXtLKX4DR548oSzr5fKvEvDL R3qn4ZLmwuMTiBtN1gbmLO3i8A10yGzNianHqxG4d0CcfKnckkpo3Nl/skizwvWtmzzVLR ji7aP/BIUpYSVBl/FEelPx8ebc6qSJtugDSaj/kIWhfRayoLaVkKiu5tm4+87A== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1733244218; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=+vx6odPE5/M/zu43Vy1csw0o+66LVJ83d3Ac8QFvOmM=; b=MlipyIrTh252aBTkvvUF+4wWCubtr4tXSZpYH+Pyt2Xd6/yxGR0TQv+EJjWWC9kRWvANym uuFhAQBXRrFxiIAg== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: 
[PATCH v4 05/11] futex: Track the futex hash bucket. Date: Tue, 3 Dec 2024 17:42:13 +0100 Message-ID: <20241203164335.1125381-6-bigeasy@linutronix.de> In-Reply-To: <20241203164335.1125381-1-bigeasy@linutronix.de> References: <20241203164335.1125381-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add futex_hash_get/put() to keep the assigned hash_bucket around while a futex operation is performed. Provide an RCU lifetime guarantee for futex_hash_bucket_private. This should have the right number of gets/puts so that the private hash bucket is released on exit. This is preparatory work to allow changing the hash bucket at runtime. Signed-off-by: Sebastian Andrzej Siewior --- include/linux/futex.h | 2 +- include/linux/mm_types.h | 5 +- kernel/futex/core.c | 104 +++++++++++++++++++++++++++++++++------ kernel/futex/futex.h | 8 +++ kernel/futex/pi.c | 7 +++ kernel/futex/requeue.c | 16 ++++++ kernel/futex/waitwake.c | 15 +++++- 7 files changed, 136 insertions(+), 21 deletions(-) diff --git a/include/linux/futex.h b/include/linux/futex.h index 61e81b866d34e..359fc24eb37ff 100644 --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -84,7 +84,7 @@ void futex_hash_free(struct mm_struct *mm); =20 static inline void futex_mm_init(struct mm_struct *mm) { - mm->futex_hash_bucket =3D NULL; + rcu_assign_pointer(mm->futex_hash_bucket, NULL); } =20 #else diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index b16b97ab8fb2a..4f39928631042 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -30,7 +30,7 @@ #define INIT_PASID 0 =20 struct address_space; -struct futex_hash_bucket; +struct futex_hash_bucket_private; struct mem_cgroup; =20 /* @@ -903,8 +903,7 @@ struct mm_struct { int mm_lock_seq; #endif =20 - unsigned int futex_hash_mask; - struct futex_hash_bucket
*futex_hash_bucket; + struct futex_hash_bucket_private __rcu *futex_hash_bucket; =20 unsigned long hiwater_rss; /* High-watermark of RSS usage */ unsigned long hiwater_vm; /* High-water virtual memory usage */ diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 14251bbafaffb..464918d85395e 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -40,6 +40,7 @@ #include #include #include +#include =20 #include "futex.h" #include "../locking/rtmutex_common.h" @@ -56,6 +57,12 @@ static struct { #define futex_queues (__futex_data.queues) #define futex_hashsize (__futex_data.hashsize) =20 +struct futex_hash_bucket_private { + rcuref_t users; + unsigned int hash_mask; + struct rcu_head rcu; + struct futex_hash_bucket queues[]; +}; =20 /* * Fault injections for futexes. @@ -127,17 +134,24 @@ static inline bool futex_key_is_private(union futex_k= ey *key) */ struct futex_hash_bucket *futex_hash(union futex_key *key) { - struct futex_hash_bucket *fhb; + struct futex_hash_bucket_private *hb_p =3D NULL; u32 hash; =20 - fhb =3D current->mm->futex_hash_bucket; - if (fhb && futex_key_is_private(key)) { - u32 hash_mask =3D current->mm->futex_hash_mask; + if (futex_key_is_private(key)) { + guard(rcu)(); + + do { + hb_p =3D rcu_dereference(current->mm->futex_hash_bucket); + } while (hb_p && !rcuref_get(&hb_p->users)); + } + + if (hb_p) { + u32 hash_mask =3D hb_p->hash_mask; =20 hash =3D jhash2((void *)&key->private.address, sizeof(key->private.address) / 4, key->both.offset); - return &fhb[hash & hash_mask]; + return &hb_p->queues[hash & hash_mask]; } hash =3D jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4, @@ -145,6 +159,35 @@ struct futex_hash_bucket *futex_hash(union futex_key *= key) return &futex_queues[hash & (futex_hashsize - 1)]; } =20 +static void futex_hash_priv_put(struct futex_hash_bucket_private *hb_p) +{ + if (rcuref_put(&hb_p->users)) + kvfree_rcu(hb_p, rcu); +} + +void futex_hash_put(struct futex_hash_bucket *hb) +{ + struct 
futex_hash_bucket_private *hb_p; + + if (hb->hb_slot =3D=3D 0) + return; + hb_p =3D container_of(hb, struct futex_hash_bucket_private, + queues[hb->hb_slot - 1]); + futex_hash_priv_put(hb_p); +} + +void futex_hash_get(struct futex_hash_bucket *hb) +{ + struct futex_hash_bucket_private *hb_p; + + if (hb->hb_slot =3D=3D 0) + return; + + hb_p =3D container_of(hb, struct futex_hash_bucket_private, + queues[hb->hb_slot - 1]); + /* The ref needs to be owned by the caller so this can't fail */ + WARN_ON_ONCE(!rcuref_get(&hb_p->users)); +} =20 /** * futex_setup_timer - set up the sleeping hrtimer. @@ -599,7 +642,10 @@ int futex_unqueue(struct futex_q *q) */ lock_ptr =3D READ_ONCE(q->lock_ptr); if (lock_ptr !=3D NULL) { + struct futex_hash_bucket *hb; + spin_lock(lock_ptr); + hb =3D futex_hb_from_futex_q(q); /* * q->lock_ptr can change between reading it and * spin_lock(), causing us to take the wrong lock. This @@ -622,6 +668,7 @@ int futex_unqueue(struct futex_q *q) BUG_ON(q->pi_state); =20 spin_unlock(lock_ptr); + futex_hash_put(hb); ret =3D 1; } =20 @@ -999,6 +1046,7 @@ static void exit_pi_state_list(struct task_struct *cur= r) if (!refcount_inc_not_zero(&pi_state->refcount)) { raw_spin_unlock_irq(&curr->pi_lock); cpu_relax(); + futex_hash_put(hb); raw_spin_lock_irq(&curr->pi_lock); continue; } @@ -1015,6 +1063,7 @@ static void exit_pi_state_list(struct task_struct *cu= rr) /* retain curr->pi_lock for the loop invariant */ raw_spin_unlock(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); + futex_hash_put(hb); put_pi_state(pi_state); continue; } @@ -1027,6 +1076,7 @@ static void exit_pi_state_list(struct task_struct *cu= rr) raw_spin_unlock(&curr->pi_lock); raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); + futex_hash_put(hb); =20 rt_mutex_futex_unlock(&pi_state->pi_mutex); put_pi_state(pi_state); @@ -1147,8 +1197,9 @@ void futex_exit_release(struct task_struct *tsk) futex_cleanup_end(tsk, FUTEX_STATE_DEAD); } =20 -static void 
futex_hash_bucket_init(struct futex_hash_bucket *fhb) +static void futex_hash_bucket_init(struct futex_hash_bucket *fhb, unsigned= int slot) { + fhb->hb_slot =3D slot; atomic_set(&fhb->waiters, 0); plist_head_init(&fhb->chain); spin_lock_init(&fhb->lock); @@ -1156,12 +1207,20 @@ static void futex_hash_bucket_init(struct futex_has= h_bucket *fhb) =20 void futex_hash_free(struct mm_struct *mm) { - kvfree(mm->futex_hash_bucket); + struct futex_hash_bucket_private *hb_p; + + /* own a reference */ + hb_p =3D rcu_dereference_check(mm->futex_hash_bucket, true); + if (!hb_p) + return; + WARN_ON(rcuref_read(&hb_p->users) !=3D 1); + futex_hash_priv_put(hb_p); } =20 static int futex_hash_allocate(unsigned int hash_slots) { - struct futex_hash_bucket *fhb; + struct futex_hash_bucket_private *hb_p; + size_t alloc_size; int i; =20 if (current->mm->futex_hash_bucket) @@ -1179,16 +1238,25 @@ static int futex_hash_allocate(unsigned int hash_sl= ots) if (!is_power_of_2(hash_slots)) hash_slots =3D rounddown_pow_of_two(hash_slots); =20 - fhb =3D kvmalloc_array(hash_slots, sizeof(struct futex_hash_bucket), GFP_= KERNEL_ACCOUNT); - if (!fhb) + if (unlikely(check_mul_overflow(hash_slots, sizeof(struct futex_hash_buck= et), + &alloc_size))) return -ENOMEM; =20 - current->mm->futex_hash_mask =3D hash_slots - 1; + if (unlikely(check_add_overflow(alloc_size, sizeof(struct futex_hash_buck= et_private), + &alloc_size))) + return -ENOMEM; + + hb_p =3D kvmalloc(alloc_size, GFP_KERNEL_ACCOUNT); + if (!hb_p) + return -ENOMEM; + + rcuref_init(&hb_p->users, 1); + hb_p->hash_mask =3D hash_slots - 1; =20 for (i =3D 0; i < hash_slots; i++) - futex_hash_bucket_init(&fhb[i]); + futex_hash_bucket_init(&hb_p->queues[i], i + 1); =20 - current->mm->futex_hash_bucket =3D fhb; + rcu_assign_pointer(current->mm->futex_hash_bucket, hb_p); return 0; } =20 @@ -1199,8 +1267,12 @@ int futex_hash_allocate_default(void) =20 static int futex_hash_get_slots(void) { - if (current->mm->futex_hash_bucket) - return 
current->mm->futex_hash_mask + 1; + struct futex_hash_bucket_private *hb_p; + + guard(rcu)(); + hb_p =3D rcu_dereference(current->mm->futex_hash_bucket); + if (hb_p) + return hb_p->hash_mask + 1; return 0; } =20 @@ -1243,7 +1315,7 @@ static int __init futex_init(void) futex_hashsize =3D 1UL << futex_shift; =20 for (i =3D 0; i < futex_hashsize; i++) - futex_hash_bucket_init(&futex_queues[i]); + futex_hash_bucket_init(&futex_queues[i], 0); =20 return 0; } diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h index 618ce1fe870e9..ceea260ad9e80 100644 --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -115,6 +115,7 @@ static inline bool should_fail_futex(bool fshared) */ struct futex_hash_bucket { atomic_t waiters; + unsigned int hb_slot; spinlock_t lock; struct plist_head chain; } ____cacheline_aligned_in_smp; @@ -202,6 +203,13 @@ futex_setup_timer(ktime_t *time, struct hrtimer_sleepe= r *timeout, int flags, u64 range_ns); =20 extern struct futex_hash_bucket *futex_hash(union futex_key *key); +extern void futex_hash_put(struct futex_hash_bucket *hb); +extern void futex_hash_get(struct futex_hash_bucket *hb); + +static inline struct futex_hash_bucket *futex_hb_from_futex_q(struct futex= _q *q) +{ + return container_of(q->lock_ptr, struct futex_hash_bucket, lock); +} =20 /** * futex_match - Check whether two futex keys are equal diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c index d62cca5ed8f4c..60a62ab250b08 100644 --- a/kernel/futex/pi.c +++ b/kernel/futex/pi.c @@ -964,6 +964,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags= , ktime_t *time, int tryl * - EAGAIN: The user space value changed. */ futex_q_unlock(hb); + futex_hash_put(hb); /* * Handle the case where the owner is in the middle of * exiting. 
Wait for the exit to complete otherwise @@ -1083,10 +1084,12 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int f= lags, ktime_t *time, int tryl =20 futex_unqueue_pi(&q); spin_unlock(q.lock_ptr); + futex_hash_put(hb); goto out; =20 out_unlock_put_key: futex_q_unlock(hb); + futex_hash_put(hb); =20 out: if (to) { @@ -1097,6 +1100,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int fla= gs, ktime_t *time, int tryl =20 uaddr_faulted: futex_q_unlock(hb); + futex_hash_put(hb); =20 ret =3D fault_in_user_writeable(uaddr); if (ret) @@ -1197,6 +1201,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int f= lags) =20 get_pi_state(pi_state); spin_unlock(&hb->lock); + futex_hash_put(hb); =20 /* drops pi_state->pi_mutex.wait_lock */ ret =3D wake_futex_pi(uaddr, uval, pi_state, rt_waiter); @@ -1236,6 +1241,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int f= lags) */ if ((ret =3D futex_cmpxchg_value_locked(&curval, uaddr, uval, 0))) { spin_unlock(&hb->lock); + futex_hash_put(hb); switch (ret) { case -EFAULT: goto pi_faulted; @@ -1256,6 +1262,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int f= lags) =20 out_unlock: spin_unlock(&hb->lock); + futex_hash_put(hb); return ret; =20 pi_retry: diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c index b47bb764b3520..39e96f1bef8ce 100644 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -87,6 +87,8 @@ void requeue_futex(struct futex_q *q, struct futex_hash_b= ucket *hb1, futex_hb_waiters_inc(hb2); plist_add(&q->list, &hb2->chain); q->lock_ptr =3D &hb2->lock; + futex_hash_put(hb1); + futex_hash_get(hb2); } q->key =3D *key2; } @@ -231,8 +233,10 @@ void requeue_pi_wake_futex(struct futex_q *q, union fu= tex_key *key, =20 WARN_ON(!q->rt_waiter); q->rt_waiter =3D NULL; + futex_hash_put(futex_hb_from_futex_q(q)); =20 q->lock_ptr =3D &hb->lock; + futex_hash_get(hb); =20 /* Signal locked state to the waiter */ futex_requeue_pi_complete(q, 1); @@ -458,6 +462,8 @@ int futex_requeue(u32 __user *uaddr1, 
unsigned int flag= s1, if (unlikely(ret)) { double_unlock_hb(hb1, hb2); futex_hb_waiters_dec(hb2); + futex_hash_put(hb1); + futex_hash_put(hb2); =20 ret =3D get_user(curval, uaddr1); if (ret) @@ -544,6 +550,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, case -EFAULT: double_unlock_hb(hb1, hb2); futex_hb_waiters_dec(hb2); + futex_hash_put(hb1); + futex_hash_put(hb2); ret =3D fault_in_user_writeable(uaddr2); if (!ret) goto retry; @@ -558,6 +566,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, */ double_unlock_hb(hb1, hb2); futex_hb_waiters_dec(hb2); + futex_hash_put(hb1); + futex_hash_put(hb2); /* * Handle the case where the owner is in the middle of * exiting. Wait for the exit to complete otherwise @@ -677,6 +687,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, double_unlock_hb(hb1, hb2); wake_up_q(&wake_q); futex_hb_waiters_dec(hb2); + futex_hash_put(hb1); + futex_hash_put(hb2); return ret ? ret : task_count; } =20 @@ -815,6 +827,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, */ if (futex_match(&q.key, &key2)) { futex_q_unlock(hb); + futex_hash_put(hb); ret =3D -EINVAL; goto out; } @@ -828,6 +841,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, spin_lock(&hb->lock); ret =3D handle_early_requeue_pi_wakeup(hb, &q, to); spin_unlock(&hb->lock); + futex_hash_put(hb); break; =20 case Q_REQUEUE_PI_LOCKED: @@ -847,6 +861,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, */ ret =3D ret < 0 ? 
ret : 0; } + futex_hash_put(futex_hb_from_futex_q(&q)); break; =20 case Q_REQUEUE_PI_DONE: @@ -876,6 +891,7 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, =20 futex_unqueue_pi(&q); spin_unlock(q.lock_ptr); + futex_hash_put(futex_hb_from_futex_q(&q)); =20 if (ret =3D=3D -EINTR) { /* diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c index 3a10375d95218..1f2d11eb7f89f 100644 --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -113,6 +113,8 @@ bool __futex_wake_mark(struct futex_q *q) return false; =20 __futex_unqueue(q); + /* Waiters reference */ + futex_hash_put(futex_hb_from_futex_q(q)); /* * The waiting task can free the futex_q as soon as q->lock_ptr =3D NULL * is written, without taking any locks. This is possible in the event @@ -173,8 +175,10 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, = int nr_wake, u32 bitset) hb =3D futex_hash(&key); =20 /* Make sure we really have tasks to wakeup */ - if (!futex_hb_waiters_pending(hb)) + if (!futex_hb_waiters_pending(hb)) { + futex_hash_put(hb); return ret; + } =20 spin_lock(&hb->lock); =20 @@ -196,6 +200,7 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, i= nt nr_wake, u32 bitset) } =20 spin_unlock(&hb->lock); + futex_hash_put(hb); wake_up_q(&wake_q); return ret; } @@ -275,6 +280,8 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flag= s, u32 __user *uaddr2, op_ret =3D futex_atomic_op_inuser(op, uaddr2); if (unlikely(op_ret < 0)) { double_unlock_hb(hb1, hb2); + futex_hash_put(hb1); + futex_hash_put(hb2); =20 if (!IS_ENABLED(CONFIG_MMU) || unlikely(op_ret !=3D -EFAULT && op_ret !=3D -EAGAIN)) { @@ -329,6 +336,8 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flag= s, u32 __user *uaddr2, out_unlock: double_unlock_hb(hb1, hb2); wake_up_q(&wake_q); + futex_hash_put(hb1); + futex_hash_put(hb2); return ret; } =20 @@ -466,6 +475,8 @@ int futex_wait_multiple_setup(struct futex_vector *vs, = int count, int *woken) } =20 futex_q_unlock(hb); + 
futex_hash_put(hb); + __set_current_state(TASK_RUNNING); =20 /* @@ -625,6 +636,7 @@ int futex_wait_setup(u32 __user *uaddr, u32 val, unsign= ed int flags, =20 if (ret) { futex_q_unlock(*hb); + futex_hash_put(*hb); =20 ret =3D get_user(uval, uaddr); if (ret) @@ -638,6 +650,7 @@ int futex_wait_setup(u32 __user *uaddr, u32 val, unsign= ed int flags, =20 if (uval !=3D val) { futex_q_unlock(*hb); + futex_hash_put(*hb); ret =3D -EWOULDBLOCK; } =20 --=20 2.45.2 From nobody Fri Dec 19 04:55:08 2025 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4129C1FA16E for ; Tue, 3 Dec 2024 16:43:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733244225; cv=none; b=DRSVFjrDUbSSYq6Kow8hSNjrfSPtxnRVI0JeWhHL9R3t+6MN5q7e+cQ7kKrctYwFoVeaK5UHlsnHftAEN1ChhrIsZUw05Z0VZRUkOsQOt0P9hr5FQ5a3OzEESCw008Srs8fsJYL+VuCJEqRnTFOuPothmfisyGEn62Ncnbq6zfc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733244225; c=relaxed/simple; bh=1P4wggfGo8N1lrg78RRL3VO1TQ2jPBcoWG7btPF78Dc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Hls6kQiCF7+26R/81uaE7otQ/6J+jRv1Kor6v6uL7CHlWowKKnepgjcuBVTkWYjhXwHsoMi2eNaQsdRjvVcvLpUobA2/2dS8McYjYkBdVrYtaemSgfoCrmGMyb6+CRsP0C5msBS5gfSUdkOMqEHZ43TaKkrfg0VvdHNPfpdE8jI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=3zW1BJfp; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=dkSd5yJU; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none 
dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="3zW1BJfp"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="dkSd5yJU" From: Sebastian Andrzej Siewior DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1733244218; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=m8OQB/9UCvB6TFuSmvXUFUtblQYkeKf7yS+DTrnPZFE=; b=3zW1BJfp7u0sm1WWs79AqoX5YZqluSJRB6eLl4OV67zpBT7xOlI4lPdHPtl5btTRNHEtda xRA6+H9QsI9ImbfX30IBPT76pa08YC6dlkMdbce1SCrCWBOxIUKS85bwd8tAscrMbSaaSb y+Aymxhf/vGVFfpVYFtWVDm0lyN/vItwIp5Rf8NUDR8hu8cr9vrMvq7d9wlpzzfYLZjzBl 1OcJvqTN2nNQXT7/CKAKbwjVPHmhHwTynPW4D/QvVD3en1FsecIaX68e/b2dtkVSnib3vO aSLZkVSeRCzaEEnVEGI/uUhvonbAPXhDyIk7HFJiox4AYeEBR4DaIJ8lEoLT0g== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1733244218; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=m8OQB/9UCvB6TFuSmvXUFUtblQYkeKf7yS+DTrnPZFE=; b=dkSd5yJU5+fMVORP0l34vwb7IQ4nB+OhzITG0LtMSo2vFLkCmQE84d2JJcHsGmetaoBnPG w8I8TaI0dNWNZOCw== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v4 06/11] futex: Allow to re-allocate the private hash bucket. 
Date: Tue, 3 Dec 2024 17:42:14 +0100 Message-ID: <20241203164335.1125381-7-bigeasy@linutronix.de> In-Reply-To: <20241203164335.1125381-1-bigeasy@linutronix.de> References: <20241203164335.1125381-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The mm_struct::futex_hash_lock guards the futex_hash_bucket assignment/replacement. The futex_hash_allocate()/PR_FUTEX_HASH_SET_SLOTS operation can now be invoked at runtime to resize the internal private futex_hash_bucket. The idea is to use the recently introduced ref counting to keep a currently used HB around. On resize/replacement a new HB (array) is assigned to the process. All users on the old HB will receive a wakeup so they can dequeue themselves from the old HB and enqueue on the new one. In the WAIT case, after a wakeup the waiter needs to check whether a successful wakeup occurred; if not, and the HB changed, it simply dequeues, enqueues on the new HB and waits again. In the WAKE case, the waker needs to iterate over all waiters. If the HB changed then the waiter cannot disappear. New waiters will use the new HB and will therefore be missed. For this reason the waker tries again if the HB changed, and it may wake more tasks. The same logic applies to REQUEUE. LOCK_PI, UNLOCK_PI and its REQUEUE_PI part are slightly more complicated due to the internal kernel state. If the HB changes then we have the old PI state created by the first waiter and possibly a new PI state created by a waiter on the new HB lock. On LOCK_PI, if the HB changed, the task needs to abandon the PI state: it may have acquired the lock on the old PI state, but everyone else might use the "new" PI state. The old PI state won't be used anymore because every waiter will requeue. The UADDR must be checked to determine whether the lock was passed by UNLOCK_PI prior to the HB change or whether we were woken up due to the HB change. 
If we own the lock based on UADDR, we keep it; otherwise we retry. UNLOCK_PI takes the first waiter and passes the lock. If there is no waiter then it updates the UADDR to 0. Before the update succeeds the HB can change, and a waiter can set up a new PI state for the UNLOCK_PI caller and wait on it. To complicate it further, userland can acquire the lock at this time. This may happen because new waiters no longer block on the HB lock. To avoid this race, futex_hash_lock is acquired for the update to 0 to ensure the HB can't change and all waiters will block. The same logic applies to REQUEUE_PI. WAIT_REQUEUE_PI tries to recover from a HB change in a similar way to LOCK_PI. If the requeue occurred but it already waits on UADDR2, then for the last step it simply invokes futex_lock_pi(). CMP_REQUEUE_PI follows the UNLOCK_PI logic and acquires futex_hash_lock for the whole operation. Signed-off-by: Sebastian Andrzej Siewior --- include/linux/futex.h | 1 + include/linux/mm_types.h | 1 + kernel/futex/core.c | 65 ++++++++++++++++--- kernel/futex/futex.h | 3 + kernel/futex/pi.c | 110 +++++++++++++++++++++++++++++++- kernel/futex/requeue.c | 74 ++++++++++++++++++++- kernel/futex/waitwake.c | 42 ++++++++++-- kernel/locking/rtmutex.c | 26 ++++++++ kernel/locking/rtmutex_common.h | 5 +- 9 files changed, 308 insertions(+), 19 deletions(-) diff --git a/include/linux/futex.h b/include/linux/futex.h index 359fc24eb37ff..ce9e284bbeb09 100644 --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -85,6 +85,7 @@ void futex_hash_free(struct mm_struct *mm); static inline void futex_mm_init(struct mm_struct *mm) { rcu_assign_pointer(mm->futex_hash_bucket, NULL); + init_rwsem(&mm->futex_hash_lock); } =20 #else diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 4f39928631042..07f1567f2b51f 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -903,6 +903,7 @@ struct mm_struct { int mm_lock_seq; #endif =20 + struct rw_semaphore futex_hash_lock; 
struct futex_hash_bucket_private __rcu *futex_hash_bucket; =20 unsigned long hiwater_rss; /* High-watermark of RSS usage */ diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 464918d85395e..0dd7100e36419 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -573,6 +573,7 @@ struct futex_hash_bucket *futex_q_lock(struct futex_q *= q) { struct futex_hash_bucket *hb; =20 +try_again: hb =3D futex_hash(&q->key); =20 /* @@ -588,7 +589,13 @@ struct futex_hash_bucket *futex_q_lock(struct futex_q = *q) q->lock_ptr =3D &hb->lock; =20 spin_lock(&hb->lock); - return hb; + if (futex_check_hb_valid(hb)) + return hb; + + futex_hb_waiters_dec(hb); + spin_unlock(&hb->lock); + futex_hash_put(hb); + goto try_again; } =20 void futex_q_unlock(struct futex_hash_bucket *hb) @@ -1217,18 +1224,50 @@ void futex_hash_free(struct mm_struct *mm) futex_hash_priv_put(hb_p); } =20 +static void futex_put_old_hb_p(struct futex_hash_bucket_private *hb_p) +{ + unsigned int slots =3D hb_p->hash_mask + 1; + struct futex_hash_bucket *hb; + DEFINE_WAKE_Q(wake_q); + unsigned int i; + + for (i =3D 0; i < slots; i++) { + struct futex_q *this; + + hb =3D &hb_p->queues[i]; + + spin_lock(&hb->lock); + plist_for_each_entry(this, &hb->chain, list) + wake_q_add(&wake_q, this->task); + spin_unlock(&hb->lock); + } + futex_hash_priv_put(hb_p); + + wake_up_q(&wake_q); +} + +bool futex_check_hb_valid(struct futex_hash_bucket *hb) +{ + struct futex_hash_bucket_private *hb_p_now; + struct futex_hash_bucket_private *hb_p; + + if (hb->hb_slot =3D=3D 0) + return true; + guard(rcu)(); + hb_p_now =3D rcu_dereference(current->mm->futex_hash_bucket); + hb_p =3D container_of(hb, struct futex_hash_bucket_private, + queues[hb->hb_slot - 1]); + + return hb_p_now =3D=3D hb_p; +} + static int futex_hash_allocate(unsigned int hash_slots) { - struct futex_hash_bucket_private *hb_p; + struct futex_hash_bucket_private *hb_p, *hb_p_old =3D NULL; + struct mm_struct *mm; size_t alloc_size; int i; =20 - if 
(current->mm->futex_hash_bucket) - return -EALREADY; - - if (!thread_group_leader(current)) - return -EINVAL; - if (hash_slots =3D=3D 0) hash_slots =3D 16; if (hash_slots < 2) @@ -1256,7 +1295,15 @@ static int futex_hash_allocate(unsigned int hash_slo= ts) for (i =3D 0; i < hash_slots; i++) futex_hash_bucket_init(&hb_p->queues[i], i + 1); =20 - rcu_assign_pointer(current->mm->futex_hash_bucket, hb_p); + mm =3D current->mm; + scoped_guard(rwsem_write, &mm->futex_hash_lock) { + hb_p_old =3D rcu_dereference_check(mm->futex_hash_bucket, + lockdep_is_held(&mm->futex_hash_lock)); + rcu_assign_pointer(mm->futex_hash_bucket, hb_p); + } + if (hb_p_old) + futex_put_old_hb_p(hb_p_old); + return 0; } =20 diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h index ceea260ad9e80..503f56643a966 100644 --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -205,6 +205,9 @@ futex_setup_timer(ktime_t *time, struct hrtimer_sleeper= *timeout, extern struct futex_hash_bucket *futex_hash(union futex_key *key); extern void futex_hash_put(struct futex_hash_bucket *hb); extern void futex_hash_get(struct futex_hash_bucket *hb); +extern bool futex_check_hb_valid(struct futex_hash_bucket *hb); +extern bool check_pi_lock_owner(u32 __user *uaddr); +extern void reset_pi_state_owner(struct futex_pi_state *pi_state); =20 static inline struct futex_hash_bucket *futex_hb_from_futex_q(struct futex= _q *q) { diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c index 60a62ab250b08..b4156d1cc6608 100644 --- a/kernel/futex/pi.c +++ b/kernel/futex/pi.c @@ -43,8 +43,8 @@ static struct futex_pi_state *alloc_pi_state(void) return pi_state; } =20 -static void pi_state_update_owner(struct futex_pi_state *pi_state, - struct task_struct *new_owner) +void pi_state_update_owner(struct futex_pi_state *pi_state, + struct task_struct *new_owner) { struct task_struct *old_owner =3D pi_state->owner; =20 @@ -854,6 +854,47 @@ static int fixup_pi_state_owner(u32 __user *uaddr, str= uct futex_q *q, return ret; } =20 
+bool check_pi_lock_owner(u32 __user *uaddr) +{ + u32 our_tid, uval; + int ret; + + our_tid =3D task_pid_vnr(current); + do { + ret =3D futex_get_value_locked(&uval, uaddr); + switch (ret) { + case 0: + if ((uval & FUTEX_TID_MASK) =3D=3D our_tid) + return true; + return false; + break; + + case -EFAULT: + ret =3D fault_in_user_writeable(uaddr); + if (ret < 0) + return false; + break; + + case -EAGAIN: + cond_resched(); + break; + + default: + WARN_ON_ONCE(1); + return false; + } + + } while (1); +} + +void reset_pi_state_owner(struct futex_pi_state *pi_state) +{ + raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); + pi_state_update_owner(pi_state, NULL); + pi_state->owner =3D NULL; + raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); +} + /** * fixup_pi_owner() - Post lock pi_state and corner case management * @uaddr: user address of the futex @@ -999,6 +1040,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flag= s, ktime_t *time, int tryl rt_mutex_pre_schedule(); =20 rt_mutex_init_waiter(&rt_waiter); + rt_waiter.hb =3D hb; =20 /* * On PREEMPT_RT, when hb->lock becomes an rt_mutex, we must not @@ -1070,6 +1112,37 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int fl= ags, ktime_t *time, int tryl */ rt_mutex_post_schedule(); no_block: + if (!futex_check_hb_valid(hb)) { + bool uaddr_owner; + /* + * We might got the lock, we might not. We own the outdated internal + * state because the HB changed under us so it might have been all + * for nothing. + * We need to reset the pi_state and its owner because it + * points to current owner of the lock but it is not what new + * lock/ unlock caller will see so it needs a clean up. If we own + * the lock according to uaddr then it has been passed to us by an + * unlock and we got it before the HB changed. Lucky us, we keep + * it. If we were able to steal it or did not get it in the + * first place then we need to try again with the HB in place. 
+ */ + reset_pi_state_owner(q.pi_state); + futex_unqueue_pi(&q); + spin_unlock(q.lock_ptr); + futex_hash_put(hb); + + uaddr_owner =3D check_pi_lock_owner(uaddr); + if (uaddr_owner) { + ret =3D 0; + goto out; + } + + if (refill_pi_state_cache()) { + ret =3D -ENOMEM; + goto out; + } + goto retry_private; + } /* * Fixup the pi_state owner and possibly acquire the lock if we * haven't already. @@ -1121,6 +1194,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int f= lags) { u32 curval, uval, vpid =3D task_pid_vnr(current); union futex_key key =3D FUTEX_KEY_INIT; + struct rw_semaphore *futex_hash_lock =3D NULL; struct futex_hash_bucket *hb; struct futex_q *top_waiter; int ret; @@ -1128,6 +1202,8 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int f= lags) if (!IS_ENABLED(CONFIG_FUTEX_PI)) return -ENOSYS; =20 + if (!(flags & FLAGS_SHARED)) + futex_hash_lock =3D &current->mm->futex_hash_lock; retry: if (get_user(uval, uaddr)) return -EFAULT; @@ -1232,6 +1308,32 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int = flags) return ret; } =20 + /* + * If the hb changed before the following cmpxchg finished then the + * situation gets complicated as we don't own the lock anymore but + * there could be an internal state recorded under our name by the + * waiter under a different hb->lock. Also the PI-lock could be snuck in + * userland so there is no guarantee that we get it back. + * To avoid the mess due to this tiny race, ensure that the HB can not + * be resized while the PI lock with no owner is unlocked. + */ + if (futex_hash_lock) { + spin_unlock(&hb->lock); + down_read(futex_hash_lock); + spin_lock(&hb->lock); + + if (!futex_check_hb_valid(hb)) { + spin_unlock(&hb->lock); + up_read(futex_hash_lock); + futex_hash_put(hb); + goto retry; + } + if (futex_top_waiter(hb, &key)) { + up_read(futex_hash_lock); + goto retry_hb; + } + } + /* * We have no kernel internal state, i.e. no waiters in the * kernel. 
Waiters which are about to queue themselves are stuck @@ -1241,6 +1343,8 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int f= lags) */ if ((ret =3D futex_cmpxchg_value_locked(&curval, uaddr, uval, 0))) { spin_unlock(&hb->lock); + if (futex_hash_lock) + up_read(futex_hash_lock); futex_hash_put(hb); switch (ret) { case -EFAULT: @@ -1254,6 +1358,8 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int f= lags) return ret; } } + if (futex_hash_lock) + up_read(futex_hash_lock); =20 /* * If uval has changed, let user space handle it. diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c index 39e96f1bef8ce..6b3c4413fbf47 100644 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -378,6 +378,7 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, struct futex_hash_bucket *hb1, *hb2; struct futex_q *this, *next; DEFINE_WAKE_Q(wake_q); + struct rw_semaphore *futex_hash_lock =3D NULL; =20 if (nr_wake < 0 || nr_requeue < 0) return -EINVAL; @@ -429,6 +430,9 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, */ if (refill_pi_state_cache()) return -ENOMEM; + + if (!(flags1 & FLAGS_SHARED) || !(flags2 & FLAGS_SHARED)) + futex_hash_lock =3D &current->mm->futex_hash_lock; } =20 retry: @@ -447,10 +451,12 @@ int futex_requeue(u32 __user *uaddr1, unsigned int fl= ags1, if (requeue_pi && futex_match(&key1, &key2)) return -EINVAL; =20 +retry_private: + if (futex_hash_lock) + down_read(futex_hash_lock); hb1 =3D futex_hash(&key1); hb2 =3D futex_hash(&key2); =20 -retry_private: futex_hb_waiters_inc(hb2); double_lock_hb(hb1, hb2); =20 @@ -465,6 +471,9 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, futex_hash_put(hb1); futex_hash_put(hb2); =20 + if (futex_hash_lock) + up_read(futex_hash_lock); + ret =3D get_user(curval, uaddr1); if (ret) return ret; @@ -552,6 +561,9 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, futex_hb_waiters_dec(hb2); futex_hash_put(hb1); futex_hash_put(hb2); + if (futex_hash_lock) + 
up_read(futex_hash_lock); + ret =3D fault_in_user_writeable(uaddr2); if (!ret) goto retry; @@ -568,6 +580,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, futex_hb_waiters_dec(hb2); futex_hash_put(hb1); futex_hash_put(hb2); + if (futex_hash_lock) + up_read(futex_hash_lock); /* * Handle the case where the owner is in the middle of * exiting. Wait for the exit to complete otherwise @@ -687,6 +701,23 @@ int futex_requeue(u32 __user *uaddr1, unsigned int fla= gs1, double_unlock_hb(hb1, hb2); wake_up_q(&wake_q); futex_hb_waiters_dec(hb2); + + /* + * If there was no error in the process so far and we woke less than we + * could have and hb changed then we try again in case we missed + * someone. + */ + if (ret >=3D 0 && + !(task_count - nr_wake >=3D nr_requeue) && + (!futex_check_hb_valid(hb1) || !futex_check_hb_valid(hb2))) { + futex_hash_put(hb1); + futex_hash_put(hb2); + wake_q_init(&wake_q); + goto retry_private; + } + if (futex_hash_lock) + up_read(futex_hash_lock); + futex_hash_put(hb1); futex_hash_put(hb2); return ret ? ret : task_count; @@ -783,8 +814,8 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, struct rt_mutex_waiter rt_waiter; struct futex_hash_bucket *hb; union futex_key key2 =3D FUTEX_KEY_INIT; - struct futex_q q =3D futex_q_init; struct rt_mutex_base *pi_mutex; + struct futex_q q; int res, ret; =20 if (!IS_ENABLED(CONFIG_FUTEX_PI)) @@ -799,6 +830,8 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, to =3D futex_setup_timer(abs_time, &timeout, flags, current->timer_slack_ns); =20 +hb_changed_again: + q =3D futex_q_init; /* * The waiter is allocated on our stack, manipulated by the requeue * code while we sleep on uaddr. 
@@ -841,6 +874,12 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned = int flags, spin_lock(&hb->lock); ret =3D handle_early_requeue_pi_wakeup(hb, &q, to); spin_unlock(&hb->lock); + + if (ret =3D=3D -EWOULDBLOCK && !futex_check_hb_valid(hb)) { + futex_hash_put(hb); + goto hb_changed_again; + } + futex_hash_put(hb); break; =20 @@ -865,6 +904,8 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned i= nt flags, break; =20 case Q_REQUEUE_PI_DONE: + rt_waiter.hb =3D futex_hb_from_futex_q(&q); + /* Requeue completed. Current is 'pi_blocked_on' the rtmutex */ pi_mutex =3D &q.pi_state->pi_mutex; ret =3D rt_mutex_wait_proxy_lock(pi_mutex, to, &rt_waiter); @@ -876,6 +917,35 @@ int futex_wait_requeue_pi(u32 __user *uaddr, unsigned = int flags, ret =3D 0; =20 spin_lock(q.lock_ptr); + if (!futex_check_hb_valid(rt_waiter.hb)) { + bool uaddr_owner; + + debug_rt_mutex_free_waiter(&rt_waiter); + /* + * The HB changed under us after we were requeued on + * uaddr2. We may have acquire the lock on the pi_state + * but this the state that is seen on the current HB. + * However, there could also be an UNLOCK_PI event + * before and we own the lock based on uaddr2. + * Unlock so the next waiter can do the same and + * acquire the PI lock on uaddr2. 
+ */ + reset_pi_state_owner(q.pi_state); + + futex_unqueue_pi(&q); + spin_unlock(q.lock_ptr); + futex_hash_put(futex_hb_from_futex_q(&q)); + + if (to) { + hrtimer_cancel(&to->timer); + destroy_hrtimer_on_stack(&to->timer); + } + uaddr_owner =3D check_pi_lock_owner(uaddr2); + if (uaddr_owner) + return 0; + + return futex_lock_pi(uaddr2, flags, abs_time, 0); + } debug_rt_mutex_free_waiter(&rt_waiter); /* * Fixup the pi_state owner and possibly acquire the lock if we diff --git a/kernel/futex/waitwake.c b/kernel/futex/waitwake.c index 1f2d11eb7f89f..0179b61877529 100644 --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -180,6 +180,7 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, i= nt nr_wake, u32 bitset) return ret; } =20 +again_hb_change: spin_lock(&hb->lock); =20 plist_for_each_entry_safe(this, next, &hb->chain, list) { @@ -200,6 +201,16 @@ int futex_wake(u32 __user *uaddr, unsigned int flags, = int nr_wake, u32 bitset) } =20 spin_unlock(&hb->lock); + /* + * If there was no error, we woke less than we could have and the hb + * changed then we try again. 
+ */ + if (ret > 0 && ret < nr_wake && !futex_check_hb_valid(hb)) { + futex_hash_put(hb); + hb =3D futex_hash(&key); + if (futex_hb_waiters_pending(hb)) + goto again_hb_change; + } futex_hash_put(hb); wake_up_q(&wake_q); return ret; @@ -261,7 +272,7 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flag= s, u32 __user *uaddr2, union futex_key key1 =3D FUTEX_KEY_INIT, key2 =3D FUTEX_KEY_INIT; struct futex_hash_bucket *hb1, *hb2; struct futex_q *this, *next; - int ret, op_ret; + int ret, op_ret, op_woke; DEFINE_WAKE_Q(wake_q); =20 retry: @@ -272,11 +283,19 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int fl= ags, u32 __user *uaddr2, if (unlikely(ret !=3D 0)) return ret; =20 +retry_hash: hb1 =3D futex_hash(&key1); hb2 =3D futex_hash(&key2); =20 retry_private: double_lock_hb(hb1, hb2); + if (!futex_check_hb_valid(hb1) || !futex_check_hb_valid(hb2)) { + double_unlock_hb(hb1, hb2); + futex_hash_put(hb1); + futex_hash_put(hb2); + goto retry_hash; + } + op_ret =3D futex_atomic_op_inuser(op, uaddr2); if (unlikely(op_ret < 0)) { double_unlock_hb(hb1, hb2); @@ -305,6 +324,8 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flag= s, u32 __user *uaddr2, goto retry; } =20 + op_woke =3D 0; +retry_wake: plist_for_each_entry_safe(this, next, &hb1->chain, list) { if (futex_match (&this->key, &key1)) { if (this->pi_state || this->rt_waiter) { @@ -318,7 +339,6 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int flag= s, u32 __user *uaddr2, } =20 if (op_ret > 0) { - op_ret =3D 0; plist_for_each_entry_safe(this, next, &hb2->chain, list) { if (futex_match (&this->key, &key2)) { if (this->pi_state || this->rt_waiter) { @@ -326,19 +346,31 @@ int futex_wake_op(u32 __user *uaddr1, unsigned int fl= ags, u32 __user *uaddr2, goto out_unlock; } this->wake(&wake_q, this); - if (++op_ret >=3D nr_wake2) + if (++op_woke >=3D nr_wake2) break; } } - ret +=3D op_ret; } =20 out_unlock: double_unlock_hb(hb1, hb2); + if (ret >=3D 0 && + (!(ret >=3D nr_wake) || !(op_woke >=3D nr_wake2)) && + 
(!futex_check_hb_valid(hb1) || !futex_check_hb_valid(hb2))) { + + futex_hash_put(hb1); + futex_hash_put(hb2); + hb1 =3D futex_hash(&key1); + hb2 =3D futex_hash(&key2); + double_lock_hb(hb1, hb2); + goto retry_wake; + } + wake_up_q(&wake_q); + futex_hash_put(hb1); futex_hash_put(hb2); - return ret; + return ret < 0 ? ret : ret + op_woke; } =20 static long futex_wait_restart(struct restart_block *restart); diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index ac1365afcc4a5..ce1cf32dc7ed0 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -58,10 +58,29 @@ static inline int __ww_mutex_check_kill(struct rt_mutex= *lock, return 0; } =20 +extern bool futex_check_hb_valid(struct futex_hash_bucket *hb); + +static inline bool __internal_retry_reason(struct rt_mutex_waiter *waiter) +{ + if (!IS_ENABLED(CONFIG_FUTEX)) + return false; + + if (!waiter->hb) + return false; + if (futex_check_hb_valid(waiter->hb)) + return false; + return true; +} + #else # define build_ww_mutex() (true) # define ww_container_of(rtm) container_of(rtm, struct ww_mutex, base) # include "ww_mutex.h" + +static inline bool __internal_retry_reason(struct rt_mutex_waiter *waiter) +{ + return false; +} #endif =20 /* @@ -1633,6 +1652,13 @@ static int __sched rt_mutex_slowlock_block(struct rt= _mutex_base *lock, break; } =20 + if (!build_ww_mutex()) { + if (__internal_retry_reason(waiter)) { + ret =3D -EAGAIN; + break; + } + } + if (waiter =3D=3D rt_mutex_top_waiter(lock)) owner =3D rt_mutex_owner(lock); else diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_commo= n.h index c38a2d2d4a7ee..3bd0925a73a6a 100644 --- a/kernel/locking/rtmutex_common.h +++ b/kernel/locking/rtmutex_common.h @@ -56,6 +56,7 @@ struct rt_mutex_waiter { struct rt_mutex_base *lock; unsigned int wake_state; struct ww_acquire_ctx *ww_ctx; + struct futex_hash_bucket *hb; }; =20 /** @@ -100,7 +101,8 @@ extern int __rt_mutex_futex_trylock(struct rt_mutex_bas= e *l); extern void 
rt_mutex_futex_unlock(struct rt_mutex_base *lock); extern bool __rt_mutex_futex_unlock(struct rt_mutex_base *lock, struct rt_wake_q_head *wqh); - +extern void pi_state_update_owner(struct futex_pi_state *pi_state, + struct task_struct *new_owner); extern void rt_mutex_postunlock(struct rt_wake_q_head *wqh); =20 /* @@ -216,6 +218,7 @@ static inline void rt_mutex_init_waiter(struct rt_mutex= _waiter *waiter) RB_CLEAR_NODE(&waiter->tree.entry); waiter->wake_state =3D TASK_NORMAL; waiter->task =3D NULL; + waiter->hb =3D NULL; } =20 static inline void rt_mutex_init_rtlock_waiter(struct rt_mutex_waiter *wai= ter) --=20 2.45.2
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v4 07/11] futex: Allow to make the number
of slots invariant.
Date: Tue, 3 Dec 2024 17:42:15 +0100
Message-ID: <20241203164335.1125381-8-bigeasy@linutronix.de>
In-Reply-To: <20241203164335.1125381-1-bigeasy@linutronix.de>
References: <20241203164335.1125381-1-bigeasy@linutronix.de>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Add an option to freeze the number of hash buckets. The idea is to fix
the size once it is acceptable, so that acquiring
mm_struct::futex_hash_lock can be avoided on certain operations.

Signed-off-by: Sebastian Andrzej Siewior
--- include/uapi/linux/prctl.h | 2 ++ kernel/futex/core.c | 54 ++++++++++++++++++++++++++++++++++++-- kernel/futex/futex.h | 1 + kernel/futex/pi.c | 2 +- kernel/futex/requeue.c | 3 ++- 5 files changed, 58 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h index 55b843644c51a..d1f4b3dea565c 100644 --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -357,5 +357,7 @@ struct prctl_mm_map { #define PR_FUTEX_HASH 77 # define PR_FUTEX_HASH_SET_SLOTS 1 # define PR_FUTEX_HASH_GET_SLOTS 2 +# define PR_FUTEX_HASH_SET_INVARIANT 3 +# define PR_FUTEX_HASH_GET_INVARIANT 4 =20 #endif /* _LINUX_PRCTL_H */ diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 0dd7100e36419..1abea8f9abd22 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -61,6 +61,7 @@ struct futex_hash_bucket_private { rcuref_t users; unsigned int hash_mask; struct rcu_head rcu; + bool slots_invariant; struct futex_hash_bucket queues[]; }; =20 @@ -1266,6 +1267,7 @@ static int futex_hash_allocate(unsigned int hash_slot= s) struct futex_hash_bucket_private *hb_p, *hb_p_old =3D NULL; struct mm_struct *mm; size_t alloc_size; + int ret =3D 0; int i; =20 if (hash_slots =3D=3D 0) @@ -1291,20 +1293,30 @@ static int futex_hash_allocate(unsigned int hash_sl= ots) =20
rcuref_init(&hb_p->users, 1); hb_p->hash_mask =3D hash_slots - 1; + hb_p->slots_invariant =3D false; =20 for (i =3D 0; i < hash_slots; i++) futex_hash_bucket_init(&hb_p->queues[i], i + 1); =20 mm =3D current->mm; scoped_guard(rwsem_write, &mm->futex_hash_lock) { + hb_p_old =3D rcu_dereference_check(mm->futex_hash_bucket, lockdep_is_held(&mm->futex_hash_lock)); - rcu_assign_pointer(mm->futex_hash_bucket, hb_p); + if (hb_p_old && hb_p_old->slots_invariant) + ret =3D -EINVAL; + else + rcu_assign_pointer(mm->futex_hash_bucket, hb_p); } + if (ret) { + kvfree(hb_p); + return ret; + } + if (hb_p_old) futex_put_old_hb_p(hb_p_old); =20 - return 0; + return ret; } =20 int futex_hash_allocate_default(void) @@ -1323,6 +1335,36 @@ static int futex_hash_get_slots(void) return 0; } =20 +static int futex_hash_set_invariant(void) +{ + struct futex_hash_bucket_private *hb_p; + struct mm_struct *mm; + + mm =3D current->mm; + guard(rwsem_write)(&mm->futex_hash_lock); + hb_p =3D rcu_dereference_check(mm->futex_hash_bucket, + lockdep_is_held(&mm->futex_hash_lock)); + if (!hb_p) + return -EINVAL; + if (hb_p->slots_invariant) + return -EALREADY; + hb_p->slots_invariant =3D true; + return 0; +} + +bool futex_hash_is_invariant(void) +{ + struct futex_hash_bucket_private *hb_p; + struct mm_struct *mm; + + mm =3D current->mm; + guard(rcu)(); + hb_p =3D rcu_dereference(mm->futex_hash_bucket); + if (!hb_p) + return -EINVAL; + return hb_p->slots_invariant; +} + int futex_hash_prctl(unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) { @@ -1337,6 +1379,14 @@ int futex_hash_prctl(unsigned long arg2, unsigned lo= ng arg3, ret =3D futex_hash_get_slots(); break; =20 + case PR_FUTEX_HASH_SET_INVARIANT: + ret =3D futex_hash_set_invariant(); + break; + + case PR_FUTEX_HASH_GET_INVARIANT: + ret =3D futex_hash_is_invariant(); + break; + default: ret =3D -EINVAL; break; diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h index 503f56643a966..e81820a393027 100644 --- 
a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -208,6 +208,7 @@ extern void futex_hash_get(struct futex_hash_bucket *hb= ); extern bool futex_check_hb_valid(struct futex_hash_bucket *hb); extern bool check_pi_lock_owner(u32 __user *uaddr); extern void reset_pi_state_owner(struct futex_pi_state *pi_state); +extern bool futex_hash_is_invariant(void); =20 static inline struct futex_hash_bucket *futex_hb_from_futex_q(struct futex= _q *q) { diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c index b4156d1cc6608..9df320be750c3 100644 --- a/kernel/futex/pi.c +++ b/kernel/futex/pi.c @@ -1202,7 +1202,7 @@ int futex_unlock_pi(u32 __user *uaddr, unsigned int f= lags) if (!IS_ENABLED(CONFIG_FUTEX_PI)) return -ENOSYS; =20 - if (!(flags & FLAGS_SHARED)) + if (!(flags & FLAGS_SHARED) && !futex_hash_is_invariant()) futex_hash_lock =3D &current->mm->futex_hash_lock; retry: if (get_user(uval, uaddr)) diff --git a/kernel/futex/requeue.c b/kernel/futex/requeue.c index 6b3c4413fbf47..904c68abfb8f3 100644 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -431,7 +431,8 @@ int futex_requeue(u32 __user *uaddr1, unsigned int flag= s1, if (refill_pi_state_cache()) return -ENOMEM; =20 - if (!(flags1 & FLAGS_SHARED) || !(flags2 & FLAGS_SHARED)) + if ((!(flags1 & FLAGS_SHARED) || !(flags2 & FLAGS_SHARED)) && + !futex_hash_is_invariant()) futex_hash_lock =3D &current->mm->futex_hash_lock; } =20 --=20 2.45.2
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v4 08/11] futex: Resize futex hash table based on number of threads.
Date: Tue, 3 Dec 2024 17:42:16 +0100
Message-ID: <20241203164335.1125381-9-bigeasy@linutronix.de>
In-Reply-To: <20241203164335.1125381-1-bigeasy@linutronix.de>
References: <20241203164335.1125381-1-bigeasy@linutronix.de>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Automatically size the hash bucket array based on the number of threads.
The logic uses 4 * number-of-threads buckets, rounded up to a power of
two and limited to the range between 16 and futex_hashsize (the size of
the system-wide hash). Once the upper limit is reached, the HB is made
invariant.
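
The sizing rule can be illustrated in plain user-space C. This is only a
sketch of the arithmetic, not the kernel code: round_up_pow2() and
futex_default_buckets() are hypothetical names, and max_buckets stands in
for futex_hashsize and is assumed to be a power of two of at least 16.

```c
/* Round v up to the next power of two; stand-in for roundup_pow_of_two(). */
static unsigned int round_up_pow2(unsigned int v)
{
	unsigned int p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Bucket count for a process with `threads` threads:
 * 4 * threads, rounded up to a power of two, clamped to [16, max_buckets].
 * max_buckets is assumed to be a power of two >= 16.
 */
static unsigned int futex_default_buckets(unsigned int threads,
					  unsigned int max_buckets)
{
	unsigned int buckets;

	/* 4 * threads would reach the limit anyway; also avoids overflow. */
	if (threads >= max_buckets / 4)
		return max_buckets;

	buckets = round_up_pow2(4 * threads);
	if (buckets < 16)
		buckets = 16;
	if (buckets > max_buckets)
		buckets = max_buckets;
	return buckets;
}
```

With a limit of 4096, five threads give 32 buckets and 100 threads give
512. From 1024 threads on, the result is the limit itself, which is the
case where the patch marks the hash invariant.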
Signed-off-by: Sebastian Andrzej Siewior --- kernel/fork.c | 4 ---- kernel/futex/core.c | 39 +++++++++++++++++++++++++++++++++------ 2 files changed, 33 insertions(+), 10 deletions(-) diff --git a/kernel/fork.c b/kernel/fork.c index 6267d600af991..35ec9958707c5 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2134,10 +2134,6 @@ static bool need_futex_hash_allocate_default(u64 clo= ne_flags) { if ((clone_flags & (CLONE_THREAD | CLONE_VM)) !=3D (CLONE_THREAD | CLONE_= VM)) return false; - if (!thread_group_empty(current)) - return false; - if (current->mm->futex_hash_bucket) - return false; return true; } =20 diff --git a/kernel/futex/core.c b/kernel/futex/core.c index 1abea8f9abd22..19515aa5a6430 100644 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -65,6 +65,8 @@ struct futex_hash_bucket_private { struct futex_hash_bucket queues[]; }; =20 +static unsigned int futex_default_max_buckets; + /* * Fault injections for futexes. */ @@ -1262,7 +1264,7 @@ bool futex_check_hb_valid(struct futex_hash_bucket *h= b) return hb_p_now =3D=3D hb_p; } =20 -static int futex_hash_allocate(unsigned int hash_slots) +static int futex_hash_allocate(unsigned int hash_slots, bool slots_invaria= nt) { struct futex_hash_bucket_private *hb_p, *hb_p_old =3D NULL; struct mm_struct *mm; @@ -1274,8 +1276,8 @@ static int futex_hash_allocate(unsigned int hash_slot= s) hash_slots =3D 16; if (hash_slots < 2) hash_slots =3D 2; - if (hash_slots > 131072) - hash_slots =3D 131072; + if (hash_slots > futex_default_max_buckets) + hash_slots =3D futex_default_max_buckets; if (!is_power_of_2(hash_slots)) hash_slots =3D rounddown_pow_of_two(hash_slots); =20 @@ -1293,7 +1295,7 @@ static int futex_hash_allocate(unsigned int hash_slot= s) =20 rcuref_init(&hb_p->users, 1); hb_p->hash_mask =3D hash_slots - 1; - hb_p->slots_invariant =3D false; + hb_p->slots_invariant =3D slots_invariant; =20 for (i =3D 0; i < hash_slots; i++) futex_hash_bucket_init(&hb_p->queues[i], i + 1); @@ -1321,7 +1323,31 @@ static 
int futex_hash_allocate(unsigned int hash_slo= ts) =20 int futex_hash_allocate_default(void) { - return futex_hash_allocate(0); + unsigned int threads; + unsigned int buckets; + unsigned int current_buckets =3D 0; + struct futex_hash_bucket_private *hb_p; + + if (!current->mm) + return 0; + + scoped_guard(rcu) { + threads =3D get_nr_threads(current); + hb_p =3D rcu_dereference(current->mm->futex_hash_bucket); + if (hb_p) { + if (hb_p->slots_invariant) + return 0; + current_buckets =3D hb_p->hash_mask + 1; + } + } + + buckets =3D roundup_pow_of_two(4 * threads); + buckets =3D max(buckets, 16); + buckets =3D min(buckets, futex_default_max_buckets); + if (current_buckets > buckets) + return 0; + + return futex_hash_allocate(buckets, buckets =3D=3D futex_default_max_buck= ets); } =20 static int futex_hash_get_slots(void) @@ -1372,7 +1398,7 @@ int futex_hash_prctl(unsigned long arg2, unsigned lon= g arg3, =20 switch (arg2) { case PR_FUTEX_HASH_SET_SLOTS: - ret =3D futex_hash_allocate(arg3); + ret =3D futex_hash_allocate(arg3, false); break; =20 case PR_FUTEX_HASH_GET_SLOTS: @@ -1404,6 +1430,7 @@ static int __init futex_init(void) #else futex_hashsize =3D roundup_pow_of_two(256 * num_possible_cpus()); #endif + futex_default_max_buckets =3D futex_hashsize; =20 futex_queues =3D alloc_large_system_hash("futex", sizeof(*futex_queues), futex_hashsize, 0, 0, --=20 2.45.2
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v4 09/11] tools/perf: Add the prctl(PR_FUTEX_HASH,…) to futex-hash.
Date: Tue, 3 Dec 2024 17:42:17 +0100
Message-ID: <20241203164335.1125381-10-bigeasy@linutronix.de>
In-Reply-To: <20241203164335.1125381-1-bigeasy@linutronix.de>
References: <20241203164335.1125381-1-bigeasy@linutronix.de>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Wire up prctl(PR_FUTEX_HASH) in the futex-hash benchmark. Use the `-b'
argument to specify the number of buckets. The benchmark reads the value
back and prints it in the run summary.
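
A minimal user-space sketch of the set-then-read-back sequence follows. It
assumes the option values proposed by this series (PR_FUTEX_HASH 77,
SET_SLOTS 1, GET_SLOTS 2) whenever the installed headers do not define
them; a merged kernel may use different numbers. The helper name
futex_hash_request_slots() is hypothetical.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallback values as proposed in this series; an assumption, not a stable ABI. */
#ifndef PR_FUTEX_HASH
# define PR_FUTEX_HASH			77
# define PR_FUTEX_HASH_SET_SLOTS	1
# define PR_FUTEX_HASH_GET_SLOTS	2
#endif

/*
 * Ask for nslots private hash buckets, then read back what the kernel
 * actually uses. Returns the slot count (>= 0) or a negative errno, for
 * example -EINVAL on a kernel without this series.
 */
static int futex_hash_request_slots(unsigned long nslots)
{
	int got;

	if (prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, nslots, 0, 0) < 0)
		return -errno;

	got = prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS, 0, 0, 0);
	return got < 0 ? -errno : got;
}
```

Reading the value back matters because the kernel side of this series
clamps the request and rounds it down to a power of two, so the granted
count can differ from the requested one.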
Signed-off-by: Sebastian Andrzej Siewior --- tools/perf/bench/futex-hash.c | 19 +++++++++++++++++-- tools/perf/bench/futex.h | 1 + 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c index b472eded521b1..e24e987ae213e 100644 --- a/tools/perf/bench/futex-hash.c +++ b/tools/perf/bench/futex-hash.c @@ -22,6 +22,7 @@ #include #include #include +#include =20 #include "../util/mutex.h" #include "../util/stat.h" @@ -53,6 +54,7 @@ static struct bench_futex_parameters params =3D { }; =20 static const struct option options[] =3D { + OPT_UINTEGER('b', "buckets", &params.nbuckets, "Task local futex buckets = to allocate"), OPT_UINTEGER('t', "threads", &params.nthreads, "Specify amount of threads= "), OPT_UINTEGER('r', "runtime", &params.runtime, "Specify runtime (in second= s)"), OPT_UINTEGER('f', "futexes", &params.nfutexes, "Specify amount of futexes= per threads"), @@ -120,6 +122,10 @@ static void print_summary(void) (int)bench__runtime.tv_sec); } =20 +#define PR_FUTEX_HASH 77 +# define PR_FUTEX_HASH_SET_SLOTS 1 +# define PR_FUTEX_HASH_GET_SLOTS 2 + int bench_futex_hash(int argc, const char **argv) { int ret =3D 0; @@ -131,6 +137,7 @@ int bench_futex_hash(int argc, const char **argv) struct perf_cpu_map *cpu; int nrcpus; size_t size; + int num_buckets; =20 argc =3D parse_options(argc, argv, options, bench_futex_hash_usage, 0); if (argc) { @@ -147,6 +154,14 @@ int bench_futex_hash(int argc, const char **argv) act.sa_sigaction =3D toggle_done; sigaction(SIGINT, &act, NULL); =20 + ret =3D prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, params.nbuckets); + if (ret) { + printf("Allocation of %u hash buckets failed: %d/%m\n", + params.nbuckets, ret); + goto errmem; + } + num_buckets =3D prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS); + if (params.mlockall) { if (mlockall(MCL_CURRENT | MCL_FUTURE)) err(EXIT_FAILURE, "mlockall"); @@ -162,8 +177,8 @@ int bench_futex_hash(int argc, const char **argv) if (!params.fshared)
futex_flag =3D FUTEX_PRIVATE_FLAG; =20 - printf("Run summary [PID %d]: %d threads, each operating on %d [%s] futex= es for %d secs.\n\n", - getpid(), params.nthreads, params.nfutexes, params.fshared ? "shar= ed":"private", params.runtime); + printf("Run summary [PID %d]: %d threads, hash slots: %d each operating o= n %d [%s] futexes for %d secs.\n\n", + getpid(), params.nthreads, num_buckets, params.nfutexes, params.fs= hared ? "shared":"private", params.runtime); =20 init_stats(&throughput_stats); mutex_init(&thread_lock); diff --git a/tools/perf/bench/futex.h b/tools/perf/bench/futex.h index ebdc2b032afc1..abc353c63a9a4 100644 --- a/tools/perf/bench/futex.h +++ b/tools/perf/bench/futex.h @@ -20,6 +20,7 @@ struct bench_futex_parameters { bool multi; /* lock-pi */ bool pi; /* requeue-pi */ bool broadcast; /* requeue */ + unsigned int nbuckets; unsigned int runtime; /* seconds*/ unsigned int nthreads; unsigned int nfutexes; --=20 2.45.2
From: Sebastian Andrzej Siewior
in-reply-to:in-reply-to:references:references; bh=P0sAWYJTAdfyW4L0KBHcmylrIRN9HFXqGitHstMvgdc=; b=s5rWl00IhZX5N4IBcJNQs/yw4WQIe1cHYaW4tt1HNnYQcmn7hD3lmSjYr//B96IVYdIxR9 t/YjbP8hupcn46CA== To: linux-kernel@vger.kernel.org Cc: =?UTF-8?q?Andr=C3=A9=20Almeida?= , Darren Hart , Davidlohr Bueso , Ingo Molnar , Juri Lelli , Peter Zijlstra , Thomas Gleixner , Valentin Schneider , Waiman Long , Sebastian Andrzej Siewior Subject: [PATCH v4 10/11] tools/perf: The the current affinity for CPU pinning in futex-hash. Date: Tue, 3 Dec 2024 17:42:18 +0100 Message-ID: <20241203164335.1125381-11-bigeasy@linutronix.de> In-Reply-To: <20241203164335.1125381-1-bigeasy@linutronix.de> References: <20241203164335.1125381-1-bigeasy@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In order to simplify NUMA local testing, let futex-hash use the current affinity mask and pin the individual threads based on that mask. 
Signed-off-by: Sebastian Andrzej Siewior
---
 tools/perf/bench/futex-hash.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index e24e987ae213e..216b0d1301ffc 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -126,10 +126,24 @@ static void print_summary(void)
 # define PR_FUTEX_HASH_SET_SLOTS 1
 # define PR_FUTEX_HASH_GET_SLOTS 2
 
+static unsigned int get_cpu_bit(cpu_set_t *set, size_t set_size, unsigned int r_cpu)
+{
+	unsigned int cpu = 0;
+
+	do {
+		if (CPU_ISSET_S(cpu, set_size, set)) {
+			if (!r_cpu)
+				return cpu;
+			r_cpu--;
+		}
+		cpu++;
+	} while (1);
+}
+
 int bench_futex_hash(int argc, const char **argv)
 {
 	int ret = 0;
-	cpu_set_t *cpuset;
+	cpu_set_t *cpuset, cpuset_;
 	struct sigaction act;
 	unsigned int i;
 	pthread_attr_t thread_attr;
@@ -167,8 +181,12 @@ int bench_futex_hash(int argc, const char **argv)
 		err(EXIT_FAILURE, "mlockall");
 	}
 
+	ret = pthread_getaffinity_np(pthread_self(), sizeof(cpuset_), &cpuset_);
+	BUG_ON(ret);
+	nrcpus = CPU_COUNT(&cpuset_);
+
 	if (!params.nthreads) /* default to the number of CPUs */
-		params.nthreads = perf_cpu_map__nr(cpu);
+		params.nthreads = nrcpus;
 
 	worker = calloc(params.nthreads, sizeof(*worker));
 	if (!worker)
@@ -189,10 +207,9 @@ int bench_futex_hash(int argc, const char **argv)
 	pthread_attr_init(&thread_attr);
 	gettimeofday(&bench__start, NULL);
 
-	nrcpus = cpu__max_cpu().cpu;
-	cpuset = CPU_ALLOC(nrcpus);
+	cpuset = CPU_ALLOC(4096);
 	BUG_ON(!cpuset);
-	size = CPU_ALLOC_SIZE(nrcpus);
+	size = CPU_ALLOC_SIZE(4096);
 
 	for (i = 0; i < params.nthreads; i++) {
 		worker[i].tid = i;
@@ -202,7 +219,8 @@ int bench_futex_hash(int argc, const char **argv)
 
 		CPU_ZERO_S(size, cpuset);
 
-		CPU_SET_S(perf_cpu_map__cpu(cpu, i % perf_cpu_map__nr(cpu)).cpu, size, cpuset);
+		CPU_SET_S(get_cpu_bit(&cpuset_, sizeof(cpuset_), i % nrcpus), size, cpuset);
+
 		ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
 		if (ret) {
 			CPU_FREE(cpuset);
-- 
2.45.2

From nobody Fri Dec 19 04:55:09 2025
From: Sebastian Andrzej Siewior
To: linux-kernel@vger.kernel.org
Cc: André Almeida, Darren Hart, Davidlohr Bueso, Ingo Molnar, Juri Lelli, Peter Zijlstra, Thomas Gleixner, Valentin Schneider, Waiman Long, Sebastian Andrzej Siewior
Subject: [PATCH v4 11/11] tools/perf: Allocate futex locks on the local CPU-node.
Date: Tue, 3 Dec 2024 17:42:19 +0100
Message-ID: <20241203164335.1125381-12-bigeasy@linutronix.de>
In-Reply-To: <20241203164335.1125381-1-bigeasy@linutronix.de>
References: <20241203164335.1125381-1-bigeasy@linutronix.de>

Signed-off-by: Sebastian Andrzej Siewior
---
 tools/perf/bench/futex-hash.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/perf/bench/futex-hash.c b/tools/perf/bench/futex-hash.c
index 216b0d1301ffc..4c7c6677463f8 100644
--- a/tools/perf/bench/futex-hash.c
+++ b/tools/perf/bench/futex-hash.c
@@ -122,6 +122,8 @@ static void print_summary(void)
 	       (int)bench__runtime.tv_sec);
 }
 
+#include
+
 #define PR_FUTEX_HASH 77
 # define PR_FUTEX_HASH_SET_SLOTS 1
 # define PR_FUTEX_HASH_GET_SLOTS 2
@@ -212,14 +214,19 @@ int bench_futex_hash(int argc, const char **argv)
 	size = CPU_ALLOC_SIZE(4096);
 
 	for (i = 0; i < params.nthreads; i++) {
+		unsigned int cpu_num;
 		worker[i].tid = i;
-		worker[i].futex = calloc(params.nfutexes, sizeof(*worker[i].futex));
-		if (!worker[i].futex)
-			goto errmem;
 
 		CPU_ZERO_S(size, cpuset);
+		cpu_num = get_cpu_bit(&cpuset_, sizeof(cpuset_), i % nrcpus);
+		//worker[i].futex = calloc(params.nfutexes, sizeof(*worker[i].futex));
 
-		CPU_SET_S(get_cpu_bit(&cpuset_, sizeof(cpuset_), i % nrcpus), size, cpuset);
+		worker[i].futex = numa_alloc_onnode(params.nfutexes * sizeof(*worker[i].futex),
+						    numa_node_of_cpu(cpu_num));
+		if (worker[i].futex == MAP_FAILED || worker[i].futex == NULL)
+			goto errmem;
+
+		CPU_SET_S(cpu_num, size, cpuset);
 
 		ret = pthread_attr_setaffinity_np(&thread_attr, size, cpuset);
 		if (ret) {
@@ -271,7 +278,7 @@ int bench_futex_hash(int argc, const char **argv)
 				       &worker[i].futex[params.nfutexes-1], t);
 		}
 
-		zfree(&worker[i].futex);
+		numa_free(worker[i].futex, params.nfutexes * sizeof(*worker[i].futex));
 	}
 
 	print_summary();
-- 
2.45.2
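
Illustration only, not part of the series: both patches above pin worker i
to the (i % nrcpus)-th CPU that is set in the inherited affinity mask. The
sketch below shows that lookup in isolation. The names nth_allowed_cpu()
and print_pinning() are hypothetical and do not exist in tools/perf. Unlike
get_cpu_bit(), the loop is bounded by CPU_SETSIZE and returns -1 when the
mask has too few CPUs set, which is an assumption about how one might
harden it rather than something the patch does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/*
 * Hypothetical helper (not in the patch): return the index of the n-th
 * (0-based) CPU set in @set, or -1 if fewer than n + 1 CPUs are set.
 */
static int nth_allowed_cpu(cpu_set_t *set, unsigned int n)
{
	int cpu;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, set))
			continue;
		if (n == 0)
			return cpu;
		n--;
	}
	return -1;
}

/*
 * Hypothetical helper: print the CPU each of @nthreads workers would be
 * pinned to, wrapping around the calling thread's affinity mask.
 * Returns 0 on success, -1 if the mask could not be read.
 */
static int print_pinning(unsigned int nthreads)
{
	cpu_set_t allowed;
	unsigned int i, nrcpus;

	if (sched_getaffinity(0, sizeof(allowed), &allowed))
		return -1;

	nrcpus = CPU_COUNT(&allowed);
	assert(nrcpus > 0);	/* guards the modulo below */

	for (i = 0; i < nthreads; i++)
		printf("thread %u -> CPU %d\n", i,
		       nth_allowed_cpu(&allowed, i % nrcpus));
	return 0;
}
```

Calling print_pinning(8) under "taskset -c 2,5" would be expected to
alternate between CPU 2 and CPU 5, which mirrors how the benchmark spreads
its workers when started with a restricted mask.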