From nobody Thu Sep 11 20:36:47 2025
Message-ID: <20230807123322.814039156@infradead.org>
User-Agent: quilt/0.66
Date: Mon, 07 Aug 2023 14:18:44 +0200
From: Peter Zijlstra
To: tglx@linutronix.de, axboe@kernel.dk
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com,
 dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com,
 Andrew Morton, urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com,
 Arnd Bergmann, linux-api@vger.kernel.org, linux-mm@kvack.org,
 linux-arch@vger.kernel.org, malteskarupke@web.de
Subject: [PATCH v2 01/14] futex: Clarify FUTEX2 flags
References: <20230807121843.710612856@infradead.org>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

sys_futex_waitv() is part of the futex2 series (the first and only so
far) of syscalls and has a flags field per futex (as opposed to flags
being encoded in the futex op).

This new flags field has a new namespace, which unfortunately isn't
super explicit. Notably it currently takes FUTEX_32 and
FUTEX_PRIVATE_FLAG.

Introduce the FUTEX2 namespace to clarify this.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Thomas Gleixner
Reviewed-by: André Almeida
---
 include/uapi/linux/futex.h |   16 +++++++++++++---
 kernel/futex/syscalls.c    |    7 +++----
 2 files changed, 16 insertions(+), 7 deletions(-)

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -44,10 +44,20 @@
 					 FUTEX_PRIVATE_FLAG)
 
 /*
- * Flags to specify the bit length of the futex word for futex2 syscalls.
- * Currently, only 32 is supported.
+ * Flags for futex2 syscalls.
  */
-#define FUTEX_32		2
+			/*	0x00 */
+			/*	0x01 */
+#define FUTEX2_SIZE_U32		0x02
+			/*	0x04 */
+			/*	0x08 */
+			/*	0x10 */
+			/*	0x20 */
+			/*	0x40 */
+#define FUTEX2_PRIVATE		FUTEX_PRIVATE_FLAG
+
+/* do not use */
+#define FUTEX_32		FUTEX2_SIZE_U32 /* historical accident :-( */
 
 /*
  * Max numbers of elements in a futex_waitv array
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -183,8 +183,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 
-/* Mask of available flags for each futex in futex_waitv list */
-#define FUTEXV_WAITER_MASK (FUTEX_32 | FUTEX_PRIVATE_FLAG)
+#define FUTEX2_VALID_MASK (FUTEX2_SIZE_U32 | FUTEX2_PRIVATE)
 
 /**
  * futex_parse_waitv - Parse a waitv array from userspace
@@ -205,10 +204,10 @@ static int futex_parse_waitv(struct fute
 		if (copy_from_user(&aux, &uwaitv[i], sizeof(aux)))
 			return -EFAULT;
 
-		if ((aux.flags & ~FUTEXV_WAITER_MASK) || aux.__reserved)
+		if ((aux.flags & ~FUTEX2_VALID_MASK) || aux.__reserved)
 			return -EINVAL;
 
-		if (!(aux.flags & FUTEX_32))
+		if (!(aux.flags & FUTEX2_SIZE_U32))
 			return -EINVAL;
 
 		futexv[i].w.flags = aux.flags;

From nobody Thu Sep 11 20:36:47 2025
Message-ID: <20230807123322.883413972@infradead.org>
User-Agent: quilt/0.66
Date: Mon, 07 Aug 2023 14:18:45 +0200
From: Peter Zijlstra
To: tglx@linutronix.de, axboe@kernel.dk
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com,
 dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com,
 Andrew Morton, urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com,
 Arnd Bergmann, linux-api@vger.kernel.org, linux-mm@kvack.org,
 linux-arch@vger.kernel.org, malteskarupke@web.de
Subject: [PATCH v2 02/14] futex: Extend the FUTEX2 flags
References: <20230807121843.710612856@infradead.org>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Add the definition for the missing but always intended extra sizes,
and add a NUMA flag for the planned NUMA extension.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: André Almeida
Reviewed-by: Thomas Gleixner
---
 include/uapi/linux/futex.h |   21 ++++++++++++++++++---
 kernel/futex/syscalls.c    |    9 +++++++--
 2 files changed, 25 insertions(+), 5 deletions(-)

--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -45,17 +45,32 @@
 
 /*
  * Flags for futex2 syscalls.
+ *
+ * NOTE: these are not pure flags, they can also be seen as:
+ *
+ *   union {
+ *     u32  flags;
+ *     struct {
+ *       u32 size    : 2,
+ *           numa    : 1,
+ *                   : 4,
+ *           private : 1;
+ *     };
+ *   };
  */
-			/*	0x00 */
-			/*	0x01 */
+#define FUTEX2_SIZE_U8		0x00
+#define FUTEX2_SIZE_U16		0x01
 #define FUTEX2_SIZE_U32		0x02
-			/*	0x04 */
+#define FUTEX2_SIZE_U64		0x03
+#define FUTEX2_NUMA		0x04
 			/*	0x08 */
 			/*	0x10 */
 			/*	0x20 */
 			/*	0x40 */
 #define FUTEX2_PRIVATE		FUTEX_PRIVATE_FLAG
 
+#define FUTEX2_SIZE_MASK	0x03
+
 /* do not use */
 #define FUTEX_32		FUTEX2_SIZE_U32 /* historical accident :-( */
 
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -183,7 +183,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad
 	return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3);
 }
 
-#define FUTEX2_VALID_MASK (FUTEX2_SIZE_U32 | FUTEX2_PRIVATE)
+#define FUTEX2_VALID_MASK (FUTEX2_SIZE_MASK | FUTEX2_PRIVATE)
 
 /**
  * futex_parse_waitv - Parse a waitv array from userspace
@@ -207,7 +207,12 @@ static int futex_parse_waitv(struct fute
 		if ((aux.flags & ~FUTEX2_VALID_MASK) || aux.__reserved)
 			return -EINVAL;
 
-		if (!(aux.flags & FUTEX2_SIZE_U32))
+		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
+			if ((aux.flags & FUTEX2_SIZE_MASK) == FUTEX2_SIZE_U64)
+				return -EINVAL;
+		}
+
+		if ((aux.flags & FUTEX2_SIZE_MASK) != FUTEX2_SIZE_U32)
 			return -EINVAL;
 
 		futexv[i].w.flags = aux.flags;

From nobody Thu Sep 11 20:36:47 2025
Message-ID: <20230807123322.952568452@infradead.org>
User-Agent: quilt/0.66
Date: Mon, 07 Aug 2023 14:18:46 +0200
From: Peter Zijlstra
To: tglx@linutronix.de, axboe@kernel.dk
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com,
 dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com,
 Andrew Morton, urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com,
 Arnd Bergmann, linux-api@vger.kernel.org, linux-mm@kvack.org,
 linux-arch@vger.kernel.org, malteskarupke@web.de
Subject: [PATCH v2 03/14] futex: Flag conversion
References: <20230807121843.710612856@infradead.org>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Futex has 3 sets of flags:

 - legacy futex op bits
 - futex2 flags
 - internal flags

Add a few helpers to convert from the API flags into the internal
flags.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: André Almeida
Reviewed-by: Thomas Gleixner
---
 kernel/futex/futex.h    |   63 +++++++++++++++++++++++++++++++++++++++++++++---
 kernel/futex/syscalls.c |   24 ++++++------------
 kernel/futex/waitwake.c |    4 +--
 3 files changed, 71 insertions(+), 20 deletions(-)

--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_PREEMPT_RT
 #include
@@ -16,8 +17,15 @@
  * Futex flags used to encode options to functions and preserve them across
  * restarts.
  */
+#define FLAGS_SIZE_8		0x00
+#define FLAGS_SIZE_16		0x01
+#define FLAGS_SIZE_32		0x02
+#define FLAGS_SIZE_64		0x03
+
+#define FLAGS_SIZE_MASK		0x03
+
 #ifdef CONFIG_MMU
-# define FLAGS_SHARED		0x01
+# define FLAGS_SHARED		0x10
 #else
 /*
  * NOMMU does not have per process address space. Let the compiler optimize
@@ -25,8 +33,57 @@
  */
 # define FLAGS_SHARED		0x00
 #endif
-#define FLAGS_CLOCKRT		0x02
-#define FLAGS_HAS_TIMEOUT	0x04
+#define FLAGS_CLOCKRT		0x20
+#define FLAGS_HAS_TIMEOUT	0x40
+#define FLAGS_NUMA		0x80
+
+/* FUTEX_ to FLAGS_ */
+static inline unsigned int futex_to_flags(unsigned int op)
+{
+	unsigned int flags = FLAGS_SIZE_32;
+
+	if (!(op & FUTEX_PRIVATE_FLAG))
+		flags |= FLAGS_SHARED;
+
+	if (op & FUTEX_CLOCK_REALTIME)
+		flags |= FLAGS_CLOCKRT;
+
+	return flags;
+}
+
+/* FUTEX2_ to FLAGS_ */
+static inline unsigned int futex2_to_flags(unsigned int flags2)
+{
+	unsigned int flags = flags2 & FUTEX2_SIZE_MASK;
+
+	if (!(flags2 & FUTEX2_PRIVATE))
+		flags |= FLAGS_SHARED;
+
+	if (flags2 & FUTEX2_NUMA)
+		flags |= FLAGS_NUMA;
+
+	return flags;
+}
+
+static inline bool futex_flags_valid(unsigned int flags)
+{
+	/* Only 64bit futexes for 64bit code */
+	if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
+		if ((flags & FLAGS_SIZE_MASK) == FLAGS_SIZE_64)
+			return false;
+	}
+
+	/* Only 32bit futexes are implemented -- for now */
+	if ((flags & FLAGS_SIZE_MASK) != FLAGS_SIZE_32)
+		return false;
+
+	return true;
+}
+
+static inline unsigned int futex_size(unsigned int flags)
+{
+	return 1 << (flags & FLAGS_SIZE_MASK);
+}
 
 #ifdef CONFIG_FAIL_FUTEX
 extern bool should_fail_futex(bool fshared);
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -1,6 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-or-later
 
-#include
 #include
 #include
 
@@ -85,15 +84,12 @@ SYSCALL_DEFINE3(get_robust_list, int, pi
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		u32 __user *uaddr2, u32 val2, u32 val3)
 {
+	unsigned int flags = futex_to_flags(op);
 	int cmd = op & FUTEX_CMD_MASK;
-	unsigned int flags = 0;
 
-	if (!(op & FUTEX_PRIVATE_FLAG))
-		flags |= FLAGS_SHARED;
-
-	if (op & FUTEX_CLOCK_REALTIME) {
-		flags |= FLAGS_CLOCKRT;
-		if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI &&
+	if (flags & FLAGS_CLOCKRT) {
+		if (cmd != FUTEX_WAIT_BITSET &&
+		    cmd != FUTEX_WAIT_REQUEUE_PI &&
 		    cmd != FUTEX_LOCK_PI2)
 			return -ENOSYS;
 	}
@@ -201,21 +197,19 @@ static int futex_parse_waitv(struct fute
 	unsigned int i;
 
 	for (i = 0; i < nr_futexes; i++) {
+		unsigned int flags;
+
 		if (copy_from_user(&aux, &uwaitv[i], sizeof(aux)))
 			return -EFAULT;
 
 		if ((aux.flags & ~FUTEX2_VALID_MASK) || aux.__reserved)
 			return -EINVAL;
 
-		if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()) {
-			if ((aux.flags & FUTEX2_SIZE_MASK) == FUTEX2_SIZE_U64)
-				return -EINVAL;
-		}
-
-		if ((aux.flags & FUTEX2_SIZE_MASK) != FUTEX2_SIZE_U32)
+		flags = futex2_to_flags(aux.flags);
+		if (!futex_flags_valid(flags))
 			return -EINVAL;
 
-		futexv[i].w.flags = aux.flags;
+		futexv[i].w.flags = flags;
 		futexv[i].w.val = aux.val;
 		futexv[i].w.uaddr = aux.uaddr;
 		futexv[i].q = futex_q_init;
--- a/kernel/futex/waitwake.c
+++ b/kernel/futex/waitwake.c
@@ -419,11 +419,11 @@ static int futex_wait_multiple_setup(str
 	 */
 retry:
 	for (i = 0; i < count; i++) {
-		if ((vs[i].w.flags & FUTEX_PRIVATE_FLAG) && retry)
+		if (!(vs[i].w.flags & FLAGS_SHARED) && retry)
 			continue;
 
 		ret = get_futex_key(u64_to_user_ptr(vs[i].w.uaddr),
-				    !(vs[i].w.flags & FUTEX_PRIVATE_FLAG),
+				    vs[i].w.flags & FLAGS_SHARED,
 				    &vs[i].q.key, FUTEX_READ);
 
 		if (unlikely(ret))

From nobody Thu Sep 11 20:36:47 2025
Message-ID: <20230807123323.020870574@infradead.org>
User-Agent: quilt/0.66
Date: Mon, 07 Aug 2023 14:18:47 +0200
From: Peter Zijlstra
To: tglx@linutronix.de, axboe@kernel.dk
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com,
 dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com,
 Andrew Morton, urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com,
 Arnd Bergmann, linux-api@vger.kernel.org, linux-mm@kvack.org,
 linux-arch@vger.kernel.org, malteskarupke@web.de
Subject: [PATCH v2 04/14] futex: Validate futex value against futex size
References: <20230807121843.710612856@infradead.org>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Ensure the futex value fits in the given futex size. Since this adds a
constraint to an existing syscall, it might possibly change behaviour.

Currently the value would be truncated to a u32 and any high bits
would get silently lost.

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Thomas Gleixner
---
 kernel/futex/futex.h    |   10 ++++++++++
 kernel/futex/syscalls.c |    3 +++
 2 files changed, 13 insertions(+)

--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -85,6 +85,16 @@ static inline unsigned int futex_size(un
 	return 1 << (flags & FLAGS_SIZE_MASK);
 }
 
+static inline bool futex_validate_input(unsigned int flags, u64 val)
+{
+	int bits = 8 * futex_size(flags);
+
+	if (bits < 64 && (val >> bits))
+		return false;
+
+	return true;
+}
+
 #ifdef CONFIG_FAIL_FUTEX
 extern bool should_fail_futex(bool fshared);
 #else
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -209,6 +209,9 @@ static int futex_parse_waitv(struct fute
 		if (!futex_flags_valid(flags))
 			return -EINVAL;
 
+		if (!futex_validate_input(flags, aux.val))
+			return -EINVAL;
+
 		futexv[i].w.flags = flags;
 		futexv[i].w.val = aux.val;
 		futexv[i].w.uaddr = aux.uaddr;

From nobody Thu Sep 11 20:36:47 2025
Message-ID: <20230807123323.090897260@infradead.org>
User-Agent: quilt/0.66
Date: Mon, 07 Aug 2023 14:18:48 +0200
From: Peter Zijlstra
To: tglx@linutronix.de, axboe@kernel.dk
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com,
 dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com,
 Andrew Morton, urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com,
 Arnd Bergmann, linux-api@vger.kernel.org, linux-mm@kvack.org,
 linux-arch@vger.kernel.org, malteskarupke@web.de, Geert Uytterhoeven
Subject: [PATCH v2 05/14] futex: Add sys_futex_wake()
References: <20230807121843.710612856@infradead.org>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

To complement sys_futex_waitv() add sys_futex_wake(). This syscall
implements what was previously known as FUTEX_WAKE_BITSET except it
uses 'unsigned long' for the bitmask and takes FUTEX2 flags.

The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms.
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Geert Uytterhoeven
Reviewed-by: Thomas Gleixner
---
 arch/alpha/kernel/syscalls/syscall.tbl      |    1
 arch/arm/tools/syscall.tbl                  |    1
 arch/arm64/include/asm/unistd.h             |    2 -
 arch/arm64/include/asm/unistd32.h           |    2 +
 arch/ia64/kernel/syscalls/syscall.tbl       |    1
 arch/m68k/kernel/syscalls/syscall.tbl       |    1
 arch/microblaze/kernel/syscalls/syscall.tbl |    1
 arch/mips/kernel/syscalls/syscall_n32.tbl   |    1
 arch/mips/kernel/syscalls/syscall_n64.tbl   |    1
 arch/mips/kernel/syscalls/syscall_o32.tbl   |    1
 arch/parisc/kernel/syscalls/syscall.tbl     |    1
 arch/powerpc/kernel/syscalls/syscall.tbl    |    1
 arch/s390/kernel/syscalls/syscall.tbl       |    1
 arch/sh/kernel/syscalls/syscall.tbl         |    1
 arch/sparc/kernel/syscalls/syscall.tbl      |    1
 arch/x86/entry/syscalls/syscall_32.tbl      |    1
 arch/x86/entry/syscalls/syscall_64.tbl      |    1
 arch/xtensa/kernel/syscalls/syscall.tbl     |    1
 include/linux/syscalls.h                    |    3 ++
 include/uapi/asm-generic/unistd.h           |    5 ++--
 kernel/futex/syscalls.c                     |   30 ++++++++++++++++++++++++++++
 kernel/sys_ni.c                             |    1
 22 files changed, 56 insertions(+), 3 deletions(-)

--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -491,3 +491,4 @@
 559	common	futex_waitv			sys_futex_waitv
 560	common	set_mempolicy_home_node		sys_ni_syscall
 561	common	cachestat			sys_cachestat
+562	common	futex_wake			sys_futex_wake
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -465,3 +465,4 @@
 449	common	futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -39,7 +39,7 @@
 #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
 #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
 
-#define __NR_compat_syscalls		452
+#define __NR_compat_syscalls		453
 #endif
 
 #define __ARCH_WANT_SYS_CLONE
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -909,6 +909,8 @@ __SYSCALL(__NR_futex_waitv, sys_futex_wa
 __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
 #define __NR_cachestat 451
 __SYSCALL(__NR_cachestat, sys_cachestat)
+#define __NR_futex_wake 452
+__SYSCALL(__NR_futex_wake, sys_futex_wake)
 
 /*
  * Please add new compat syscalls above this comment and update
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -372,3 +372,4 @@
 449	common	futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -451,3 +451,4 @@
 449	common	futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -457,3 +457,4 @@
 449	common	futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -390,3 +390,4 @@
 449	n32	futex_waitv			sys_futex_waitv
 450	n32	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	n32	cachestat			sys_cachestat
+452	n32	futex_wake			sys_futex_wake
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -366,3 +366,4 @@
 449	n64	futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	n64	cachestat			sys_cachestat
+452	n64	futex_wake			sys_futex_wake
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -439,3 +439,4 @@
 449	o32	futex_waitv			sys_futex_waitv
 450	o32	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	o32	cachestat			sys_cachestat
+452	o32	futex_wake			sys_futex_wake
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -450,3 +450,4 @@
 449	common	futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -538,3 +538,4 @@
 449	common	futex_waitv			sys_futex_waitv
 450	nospu	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -454,3 +454,4 @@
 449	common	futex_waitv		sys_futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node	sys_set_mempolicy_home_node	sys_set_mempolicy_home_node
 451	common	cachestat		sys_cachestat			sys_cachestat
+452	common	futex_wake		sys_futex_wake			sys_futex_wake
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -454,3 +454,4 @@
 449	common	futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -497,3 +497,4 @@
 449	common	futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -456,3 +456,4 @@
 449	i386	futex_waitv		sys_futex_waitv
 450	i386	set_mempolicy_home_node	sys_set_mempolicy_home_node
 451	i386	cachestat		sys_cachestat
+452	i386	futex_wake		sys_futex_wake
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -373,6 +373,7 @@
 449	common	futex_waitv		sys_futex_waitv
 450	common	set_mempolicy_home_node	sys_set_mempolicy_home_node
 451	common	cachestat		sys_cachestat
+452	common	futex_wake		sys_futex_wake
 
 #
 # Due to a historical design error, certain syscalls are numbered differently
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -422,3 +422,4 @@
 449	common	futex_waitv			sys_futex_waitv
 450	common	set_mempolicy_home_node		sys_set_mempolicy_home_node
 451	common	cachestat			sys_cachestat
+452	common	futex_wake			sys_futex_wake
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -563,6 +563,9 @@ asmlinkage long sys_set_robust_list(stru
 asmlinkage long sys_futex_waitv(struct futex_waitv *waiters,
 				unsigned int nr_futexes, unsigned int flags,
 				struct __kernel_timespec __user *timeout, clockid_t clockid);
+
+asmlinkage long sys_futex_wake(void __user *uaddr, unsigned long mask, int nr, unsigned int flags);
+
 asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp,
 			      struct __kernel_timespec __user *rmtp);
 asmlinkage long sys_nanosleep_time32(struct old_timespec32 __user *rqtp,
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -816,12 +816,13 @@ __SYSCALL(__NR_process_mrelease, sys_pro
 __SYSCALL(__NR_futex_waitv, sys_futex_waitv)
 #define __NR_set_mempolicy_home_node 450
 __SYSCALL(__NR_set_mempolicy_home_node, sys_set_mempolicy_home_node)
-
 #define __NR_cachestat 451
 __SYSCALL(__NR_cachestat, sys_cachestat)
+#define __NR_futex_wake 452
+__SYSCALL(__NR_futex_wake, sys_futex_wake)
 
 #undef __NR_syscalls
-#define __NR_syscalls 452
+#define __NR_syscalls 453
 
 /*
  * 32 bit systems traditionally used different
--- a/kernel/futex/syscalls.c
+++ b/kernel/futex/syscalls.c
@@ -306,6 +306,36 @@ SYSCALL_DEFINE5(futex_waitv, struct fute
 	return ret;
 }
 
+/*
+ * sys_futex_wake - Wake a number of futexes
+ * @uaddr:	Address of the futex(es) to wake
+ * @mask:	bitmask
+ * @nr:		Number of the futexes to wake
+ * @flags:	FUTEX2 flags
+ *
+ * Identical to the traditional FUTEX_WAKE_BITSET op, except it is part of the
+ * futex2 family of calls.
+ */
+
+SYSCALL_DEFINE4(futex_wake,
+		void __user *, uaddr,
+		unsigned long, mask,
+		int, nr,
+		unsigned int, flags)
+{
+	if (flags & ~FUTEX2_VALID_MASK)
+		return -EINVAL;
+
+	flags = futex2_to_flags(flags);
+	if (!futex_flags_valid(flags))
+		return -EINVAL;
+
+	if (!futex_validate_input(flags, mask))
+		return -EINVAL;
+
+	return futex_wake(uaddr, flags, nr, mask);
+}
+
 #ifdef CONFIG_COMPAT
 COMPAT_SYSCALL_DEFINE2(set_robust_list,
 		struct compat_robust_list_head __user *, head,
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -87,6 +87,7 @@ COND_SYSCALL_COMPAT(set_robust_list);
 COND_SYSCALL(get_robust_list);
 COND_SYSCALL_COMPAT(get_robust_list);
 COND_SYSCALL(futex_waitv);
+COND_SYSCALL(futex_wake);
 COND_SYSCALL(kexec_load);
 COND_SYSCALL_COMPAT(kexec_load);
 COND_SYSCALL(init_module);

From nobody Thu Sep 11 20:36:47 2025
Message-ID: <20230807123323.159400076@infradead.org>
User-Agent: quilt/0.66
Date: Mon, 07 Aug 2023 14:18:49 +0200
From: Peter Zijlstra
To: tglx@linutronix.de, axboe@kernel.dk
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com,
 dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com,
 Andrew Morton, urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com,
 Arnd Bergmann, linux-api@vger.kernel.org, linux-mm@kvack.org,
 linux-arch@vger.kernel.org, malteskarupke@web.de, Geert Uytterhoeven
Subject: [PATCH v2 06/14] futex: Add sys_futex_wait()
References: <20230807121843.710612856@infradead.org>
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8" To complement sys_futex_waitv()/wake(), add sys_futex_wait(). This syscall implements what was previously known as FUTEX_WAIT_BITSET except it uses 'unsigned long' for the value and bitmask arguments, takes timespec and clockid_t arguments for the absolute timeout and uses FUTEX2 flags. The 'unsigned long' allows FUTEX2_SIZE_U64 on 64bit platforms. Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Thomas Gleixner Acked-by: Geert Uytterhoeven --- arch/alpha/kernel/syscalls/syscall.tbl | 1=20 arch/arm/tools/syscall.tbl | 1=20 arch/arm64/include/asm/unistd.h | 2=20 arch/arm64/include/asm/unistd32.h | 2=20 arch/ia64/kernel/syscalls/syscall.tbl | 1=20 arch/m68k/kernel/syscalls/syscall.tbl | 1=20 arch/microblaze/kernel/syscalls/syscall.tbl | 1=20 arch/mips/kernel/syscalls/syscall_n32.tbl | 1=20 arch/mips/kernel/syscalls/syscall_n64.tbl | 1=20 arch/mips/kernel/syscalls/syscall_o32.tbl | 1=20 arch/parisc/kernel/syscalls/syscall.tbl | 1=20 arch/powerpc/kernel/syscalls/syscall.tbl | 1=20 arch/s390/kernel/syscalls/syscall.tbl | 1=20 arch/sh/kernel/syscalls/syscall.tbl | 1=20 arch/sparc/kernel/syscalls/syscall.tbl | 1=20 arch/x86/entry/syscalls/syscall_32.tbl | 1=20 arch/x86/entry/syscalls/syscall_64.tbl | 1=20 arch/xtensa/kernel/syscalls/syscall.tbl | 1=20 include/linux/syscalls.h | 4=20 include/uapi/asm-generic/unistd.h | 4=20 kernel/futex/futex.h | 3=20 kernel/futex/syscalls.c | 120 +++++++++++++++++++++--= ----- kernel/futex/waitwake.c | 67 +++++++++------ kernel/sys_ni.c | 1=20 24 files changed, 159 insertions(+), 60 deletions(-) --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -492,3 +492,4 @@ 560 common set_mempolicy_home_node sys_ni_syscall 561 common cachestat sys_cachestat 562 common futex_wake sys_futex_wake +563 common futex_wait sys_futex_wait --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -466,3 +466,4 @@ 450 common set_mempolicy_home_node 
sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake +453 common futex_wait sys_futex_wait --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -39,7 +39,7 @@ #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800) =20 -#define __NR_compat_syscalls 453 +#define __NR_compat_syscalls 454 #endif =20 #define __ARCH_WANT_SYS_CLONE --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -911,6 +911,8 @@ __SYSCALL(__NR_set_mempolicy_home_node, __SYSCALL(__NR_cachestat, sys_cachestat) #define __NR_futex_wake 452 __SYSCALL(__NR_futex_wake, sys_futex_wake) +#define __NR_futex_wait 453 +__SYSCALL(__NR_futex_wait, sys_futex_wait) =20 /* * Please add new compat syscalls above this comment and update --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -373,3 +373,4 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake +453 common futex_wait sys_futex_wait --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -452,3 +452,4 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake +453 common futex_wait sys_futex_wait --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -458,3 +458,4 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake +453 common futex_wait sys_futex_wait --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -391,3 +391,4 @@ 450 n32 set_mempolicy_home_node sys_set_mempolicy_home_node 451 n32 cachestat sys_cachestat 452 n32 futex_wake sys_futex_wake +453 n32 futex_wait sys_futex_wait --- 
a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -367,3 +367,4 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 n64 cachestat sys_cachestat 452 n64 futex_wake sys_futex_wake +453 n64 futex_wait sys_futex_wait --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -440,3 +440,4 @@ 450 o32 set_mempolicy_home_node sys_set_mempolicy_home_node 451 o32 cachestat sys_cachestat 452 o32 futex_wake sys_futex_wake +453 o32 futex_wait sys_futex_wait --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -451,3 +451,4 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake +453 common futex_wait sys_futex_wait --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -539,3 +539,4 @@ 450 nospu set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake +453 common futex_wait sys_futex_wait --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -455,3 +455,4 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node sys_set_me= mpolicy_home_node 451 common cachestat sys_cachestat sys_cachestat 452 common futex_wake sys_futex_wake sys_futex_wake +453 common futex_wait sys_futex_wait sys_futex_wait --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -455,3 +455,4 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake +453 common futex_wait sys_futex_wait --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -498,3 +498,4 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common futex_wake 
sys_futex_wake +453 common futex_wait sys_futex_wait --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -457,3 +457,4 @@ 450 i386 set_mempolicy_home_node sys_set_mempolicy_home_node 451 i386 cachestat sys_cachestat 452 i386 futex_wake sys_futex_wake +453 i386 futex_wait sys_futex_wait --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -374,6 +374,7 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake +453 common futex_wait sys_futex_wait =20 # # Due to a historical design error, certain syscalls are numbered differen= tly --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -423,3 +423,4 @@ 450 common set_mempolicy_home_node sys_set_mempolicy_home_node 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake +453 common futex_wait sys_futex_wait --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -566,6 +566,10 @@ asmlinkage long sys_futex_waitv(struct f =20 asmlinkage long sys_futex_wake(void __user *uaddr, unsigned long mask, int= nr, unsigned int flags); =20 +asmlinkage long sys_futex_wait(void __user *uaddr, unsigned long val, unsi= gned long mask, + unsigned int flags, struct __kernel_timespec __user *timespec, + clockid_t clockid); + asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp, struct __kernel_timespec __user *rmtp); asmlinkage long sys_nanosleep_time32(struct old_timespec32 __user *rqtp, --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -820,9 +820,11 @@ __SYSCALL(__NR_set_mempolicy_home_node, __SYSCALL(__NR_cachestat, sys_cachestat) #define __NR_futex_wake 452 __SYSCALL(__NR_futex_wake, sys_futex_wake) +#define __NR_futex_wait 453 +__SYSCALL(__NR_futex_wait, sys_futex_wait) =20 #undef __NR_syscalls -#define __NR_syscalls 453 +#define __NR_syscalls 454 =20 /* * 32 bit 
systems traditionally used different --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -331,6 +331,9 @@ extern int futex_requeue(u32 __user *uad u32 __user *uaddr2, int nr_wake, int nr_requeue, u32 *cmpval, int requeue_pi); =20 +extern int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, + struct hrtimer_sleeper *to, u32 bitset); + extern int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, ktime_t *abs_time, u32 bitset); =20 --- a/kernel/futex/syscalls.c +++ b/kernel/futex/syscalls.c @@ -221,6 +221,46 @@ static int futex_parse_waitv(struct fute return 0; } =20 +static int futex2_setup_timeout(struct __kernel_timespec __user *timeout, + clockid_t clockid, struct hrtimer_sleeper *to) +{ + int flag_clkid =3D 0, flag_init =3D 0; + struct timespec64 ts; + ktime_t time; + int ret; + + if (!timeout) + return 0; + + if (clockid =3D=3D CLOCK_REALTIME) { + flag_clkid =3D FLAGS_CLOCKRT; + flag_init =3D FUTEX_CLOCK_REALTIME; + } + + if (clockid !=3D CLOCK_REALTIME && clockid !=3D CLOCK_MONOTONIC) + return -EINVAL; + + if (get_timespec64(&ts, timeout)) + return -EFAULT; + + /* + * Since there's no opcode for futex_waitv, use + * FUTEX_WAIT_BITSET that uses absolute timeout as well + */ + ret =3D futex_init_timeout(FUTEX_WAIT_BITSET, flag_init, &ts, &time); + if (ret) + return ret; + + futex_setup_timer(&time, to, flag_clkid, 0); + return 0; +} + +static inline void futex2_destroy_timeout(struct hrtimer_sleeper *to) +{ + hrtimer_cancel(&to->timer); + destroy_hrtimer_on_stack(&to->timer); +} + /** * sys_futex_waitv - Wait on a list of futexes * @waiters: List of futexes to wait on @@ -250,8 +290,6 @@ SYSCALL_DEFINE5(futex_waitv, struct fute { struct hrtimer_sleeper to; struct futex_vector *futexv; - struct timespec64 ts; - ktime_t time; int ret; =20 /* This syscall supports no flags for now */ @@ -261,30 +299,8 @@ SYSCALL_DEFINE5(futex_waitv, struct fute if (!nr_futexes || nr_futexes > FUTEX_WAITV_MAX || !waiters) return -EINVAL; =20 - if (timeout) { 
- int flag_clkid =3D 0, flag_init =3D 0; - - if (clockid =3D=3D CLOCK_REALTIME) { - flag_clkid =3D FLAGS_CLOCKRT; - flag_init =3D FUTEX_CLOCK_REALTIME; - } - - if (clockid !=3D CLOCK_REALTIME && clockid !=3D CLOCK_MONOTONIC) - return -EINVAL; - - if (get_timespec64(&ts, timeout)) - return -EFAULT; - - /* - * Since there's no opcode for futex_waitv, use - * FUTEX_WAIT_BITSET that uses absolute timeout as well - */ - ret =3D futex_init_timeout(FUTEX_WAIT_BITSET, flag_init, &ts, &time); - if (ret) - return ret; - - futex_setup_timer(&time, &to, flag_clkid, 0); - } + if (timeout && (ret =3D futex2_setup_timeout(timeout, clockid, &to))) + return ret; =20 futexv =3D kcalloc(nr_futexes, sizeof(*futexv), GFP_KERNEL); if (!futexv) { @@ -299,10 +315,8 @@ SYSCALL_DEFINE5(futex_waitv, struct fute kfree(futexv); =20 destroy_timer: - if (timeout) { - hrtimer_cancel(&to.timer); - destroy_hrtimer_on_stack(&to.timer); - } + if (timeout) + futex2_destroy_timeout(&to); return ret; } =20 @@ -336,6 +350,52 @@ SYSCALL_DEFINE4(futex_wake, return futex_wake(uaddr, flags, nr, mask); } =20 +/* + * sys_futex_wait - Wait on a futex + * @uaddr: Address of the futex to wait on + * @val: Value of @uaddr + * @mask: bitmask + * @flags: FUTEX2 flags + * @timeout: Optional absolute timeout + * @clockid: Clock to be used for the timeout, realtime or monotonic + * + * Identical to the traditional FUTEX_WAIT_BITSET op, except it is part of= the + * futex2 family of calls.
+ */ + +SYSCALL_DEFINE6(futex_wait, + void __user *, uaddr, + unsigned long, val, + unsigned long, mask, + unsigned int, flags, + struct __kernel_timespec __user *, timeout, + clockid_t, clockid) +{ + struct hrtimer_sleeper to; + int ret; + + if (flags & ~FUTEX2_VALID_MASK) + return -EINVAL; + + flags =3D futex2_to_flags(flags); + if (!futex_flags_valid(flags)) + return -EINVAL; + + if (!futex_validate_input(flags, val) || + !futex_validate_input(flags, mask)) + return -EINVAL; + + if (timeout && (ret =3D futex2_setup_timeout(timeout, clockid, &to))) + return ret; + + ret =3D __futex_wait(uaddr, flags, val, timeout ? &to : NULL, mask); + + if (timeout) + futex2_destroy_timeout(&to); + + return ret; +} + #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE2(set_robust_list, struct compat_robust_list_head __user *, head, --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -629,20 +629,18 @@ int futex_wait_setup(u32 __user *uaddr, return ret; } =20 -int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, ktime_t *ab= s_time, u32 bitset) +int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, + struct hrtimer_sleeper *to, u32 bitset) { - struct hrtimer_sleeper timeout, *to; - struct restart_block *restart; - struct futex_hash_bucket *hb; struct futex_q q =3D futex_q_init; + struct futex_hash_bucket *hb; int ret; =20 if (!bitset) return -EINVAL; + q.bitset =3D bitset; =20 - to =3D futex_setup_timer(abs_time, &timeout, flags, - current->timer_slack_ns); retry: /* * Prepare to wait on uaddr. On success, it holds hb->lock and q @@ -650,18 +648,17 @@ int futex_wait(u32 __user *uaddr, unsign */ ret =3D futex_wait_setup(uaddr, val, flags, &q, &hb); if (ret) - goto out; + return ret; =20 /* futex_queue and wait for wakeup, timeout, or a signal. */ futex_wait_queue(hb, &q, to); =20 /* If we were woken (and unqueued), we succeeded, whatever. 
*/ - ret =3D 0; if (!futex_unqueue(&q)) - goto out; - ret =3D -ETIMEDOUT; + return 0; + if (to && !to->task) - goto out; + return -ETIMEDOUT; =20 /* * We expect signal_pending(current), but we might be the @@ -670,24 +667,38 @@ int futex_wait(u32 __user *uaddr, unsign if (!signal_pending(current)) goto retry; =20 - ret =3D -ERESTARTSYS; - if (!abs_time) - goto out; - - restart =3D &current->restart_block; - restart->futex.uaddr =3D uaddr; - restart->futex.val =3D val; - restart->futex.time =3D *abs_time; - restart->futex.bitset =3D bitset; - restart->futex.flags =3D flags | FLAGS_HAS_TIMEOUT; - - ret =3D set_restart_fn(restart, futex_wait_restart); - -out: - if (to) { - hrtimer_cancel(&to->timer); - destroy_hrtimer_on_stack(&to->timer); + return -ERESTARTSYS; +} + +int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, ktime_t *ab= s_time, u32 bitset) +{ + struct hrtimer_sleeper timeout, *to; + struct restart_block *restart; + int ret; + + to =3D futex_setup_timer(abs_time, &timeout, flags, + current->timer_slack_ns); + + ret =3D __futex_wait(uaddr, flags, val, to, bitset); + + /* No timeout, nothing to clean up.
*/ + if (!to) + return ret; + + hrtimer_cancel(&to->timer); + destroy_hrtimer_on_stack(&to->timer); + + if (ret =3D=3D -ERESTARTSYS) { + restart =3D &current->restart_block; + restart->futex.uaddr =3D uaddr; + restart->futex.val =3D val; + restart->futex.time =3D *abs_time; + restart->futex.bitset =3D bitset; + restart->futex.flags =3D flags | FLAGS_HAS_TIMEOUT; + + return set_restart_fn(restart, futex_wait_restart); } + return ret; } =20 --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -88,6 +88,7 @@ COND_SYSCALL(get_robust_list); COND_SYSCALL_COMPAT(get_robust_list); COND_SYSCALL(futex_waitv); COND_SYSCALL(futex_wake); +COND_SYSCALL(futex_wait); COND_SYSCALL(kexec_load); COND_SYSCALL_COMPAT(kexec_load); COND_SYSCALL(init_module); From nobody Thu Sep 11 20:36:47 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24E1AC001B0 for ; Mon, 7 Aug 2023 12:37:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233678AbjHGMho (ORCPT ); Mon, 7 Aug 2023 08:37:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233609AbjHGMhU (ORCPT ); Mon, 7 Aug 2023 08:37:20 -0400 Received: from desiato.infradead.org (desiato.infradead.org [IPv6:2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EFF1910F2; Mon, 7 Aug 2023 05:37:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=kWs9TUI2FqJo9lXva2lEdKKGsIDSB3Llcr9amlgOJS4=; b=n/KciRWLyqqJUOUjojGDRmvbp5
/2vbrJgg6yBBhRRMx0DUh3lH3MdS7eMpBG4BHEkx39W9k7snr+E8i3KhRCe0XBfjiVjVuriKRQjWl dYJQZCGbeQHwjPq78B6ac+59qdpEXbqs1btvtMbncK89wDV0RmIOdtNQGHZJH5goNQ75ncT5a6d3P 4tgHvUz8IVyrmMbzw0ZM4T0O/iPv6rQaOgZS8wPfxfNt2zbOBfdkb5vPri+jTkPq3fuDGJr89atvU rD4v3FkYLKoc2VN6WgG4rl8Fn/mhpO8282N3zMcAq2ex+BOMtPvGsUG1WQIMSGHKoMkgvhhNsdwEc wiFgobmQ==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1qSzTl-003oSk-1V; Mon, 07 Aug 2023 12:36:58 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id 2876D302DD9; Mon, 7 Aug 2023 14:36:56 +0200 (CEST) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 0) id A69902021C3DA; Mon, 7 Aug 2023 14:36:54 +0200 (CEST) Message-ID: <20230807123323.228604931@infradead.org> User-Agent: quilt/0.66 Date: Mon, 07 Aug 2023 14:18:50 +0200 From: Peter Zijlstra To: tglx@linutronix.de, axboe@kernel.dk Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com, Andrew Morton , urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com, Arnd Bergmann , linux-api@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, malteskarupke@web.de Subject: [PATCH v2 07/14] futex: Propagate flags into get_futex_key() References: <20230807121843.710612856@infradead.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Instead of only passing FLAGS_SHARED as a boolean, pass down flags as a whole. No functional change intended. 
Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Thomas Gleixner --- kernel/futex/core.c | 5 ++++- kernel/futex/futex.h | 2 +- kernel/futex/pi.c | 4 ++-- kernel/futex/requeue.c | 6 +++--- kernel/futex/waitwake.c | 14 +++++++------- 5 files changed, 17 insertions(+), 14 deletions(-) --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -217,7 +217,7 @@ static u64 get_inode_sequence_number(str * * lock_page() might sleep, the caller should not hold a spinlock. */ -int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key *key, +int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *= key, enum futex_access rw) { unsigned long address =3D (unsigned long)uaddr; @@ -225,6 +225,9 @@ int get_futex_key(u32 __user *uaddr, boo struct page *page, *tail; struct address_space *mapping; int err, ro =3D 0; + bool fshared; + + fshared =3D flags & FLAGS_SHARED; =20 /* * The futex address must be "naturally" aligned. --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -174,7 +174,7 @@ enum futex_access { FUTEX_WRITE }; =20 -extern int get_futex_key(u32 __user *uaddr, bool fshared, union futex_key = *key, +extern int get_futex_key(u32 __user *uaddr, unsigned int flags, union fute= x_key *key, enum futex_access rw); =20 extern struct hrtimer_sleeper * --- a/kernel/futex/pi.c +++ b/kernel/futex/pi.c @@ -945,7 +945,7 @@ int futex_lock_pi(u32 __user *uaddr, uns to =3D futex_setup_timer(time, &timeout, flags, 0); =20 retry: - ret =3D get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key, FUTEX_WRITE); + ret =3D get_futex_key(uaddr, flags, &q.key, FUTEX_WRITE); if (unlikely(ret !=3D 0)) goto out; =20 @@ -1117,7 +1117,7 @@ int futex_unlock_pi(u32 __user *uaddr, u if ((uval & FUTEX_TID_MASK) !=3D vpid) return -EPERM; =20 - ret =3D get_futex_key(uaddr, flags & FLAGS_SHARED, &key, FUTEX_WRITE); + ret =3D get_futex_key(uaddr, flags, &key, FUTEX_WRITE); if (ret) return ret; =20 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -424,10 +424,10 @@ int 
futex_requeue(u32 __user *uaddr1, un } =20 retry: - ret =3D get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ); + ret =3D get_futex_key(uaddr1, flags, &key1, FUTEX_READ); if (unlikely(ret !=3D 0)) return ret; - ret =3D get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, + ret =3D get_futex_key(uaddr2, flags, &key2, requeue_pi ? FUTEX_WRITE : FUTEX_READ); if (unlikely(ret !=3D 0)) return ret; @@ -789,7 +789,7 @@ int futex_wait_requeue_pi(u32 __user *ua */ rt_mutex_init_waiter(&rt_waiter); =20 - ret =3D get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE); + ret =3D get_futex_key(uaddr2, flags, &key2, FUTEX_WRITE); if (unlikely(ret !=3D 0)) goto out; =20 --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -145,13 +145,13 @@ int futex_wake(u32 __user *uaddr, unsign struct futex_hash_bucket *hb; struct futex_q *this, *next; union futex_key key =3D FUTEX_KEY_INIT; - int ret; DEFINE_WAKE_Q(wake_q); + int ret; =20 if (!bitset) return -EINVAL; =20 - ret =3D get_futex_key(uaddr, flags & FLAGS_SHARED, &key, FUTEX_READ); + ret =3D get_futex_key(uaddr, flags, &key, FUTEX_READ); if (unlikely(ret !=3D 0)) return ret; =20 @@ -245,10 +245,10 @@ int futex_wake_op(u32 __user *uaddr1, un DEFINE_WAKE_Q(wake_q); =20 retry: - ret =3D get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, FUTEX_READ); + ret =3D get_futex_key(uaddr1, flags, &key1, FUTEX_READ); if (unlikely(ret !=3D 0)) return ret; - ret =3D get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2, FUTEX_WRITE); + ret =3D get_futex_key(uaddr2, flags, &key2, FUTEX_WRITE); if (unlikely(ret !=3D 0)) return ret; =20 @@ -423,7 +423,7 @@ static int futex_wait_multiple_setup(str continue; =20 ret =3D get_futex_key(u64_to_user_ptr(vs[i].w.uaddr), - vs[i].w.flags & FLAGS_SHARED, + vs[i].w.flags, &vs[i].q.key, FUTEX_READ); =20 if (unlikely(ret)) @@ -435,7 +435,7 @@ static int futex_wait_multiple_setup(str for (i =3D 0; i < count; i++) { u32 __user *uaddr =3D (u32 __user *)(unsigned long)vs[i].w.uaddr; struct 
futex_q *q =3D &vs[i].q; - u32 val =3D (u32)vs[i].w.val; + u32 val =3D vs[i].w.val; =20 hb =3D futex_q_lock(q); ret =3D futex_get_value_locked(&uval, uaddr); @@ -599,7 +599,7 @@ int futex_wait_setup(u32 __user *uaddr, * while the syscall executes. */ retry: - ret =3D get_futex_key(uaddr, flags & FLAGS_SHARED, &q->key, FUTEX_READ); + ret =3D get_futex_key(uaddr, flags, &q->key, FUTEX_READ); if (unlikely(ret !=3D 0)) return ret; From nobody Thu Sep 11 20:36:47 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 76539C41513 for ; Mon, 7 Aug 2023 12:37:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233648AbjHGMh3 (ORCPT ); Mon, 7 Aug 2023 08:37:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36542 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233592AbjHGMhT (ORCPT ); Mon, 7 Aug 2023 08:37:19 -0400 Received: from desiato.infradead.org (desiato.infradead.org [IPv6:2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 95E5010FC; Mon, 7 Aug 2023 05:37:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=pYLybm9JZIJCZuh77o+xiW69/IF1XmJJdrRIN+H3ncw=; b=pFuSQuVGSBJuutUwmlZpjMWdRU kvMUDLgJkXKo9JPSo4x0sQY9HwX6M+hpbJzOfRDznKFrVHKVWrcBUnYeDmtwOPuHVYstLYl+mrfro cZZpCjRlWeufpcF++64Tvvbk7dVtbbOx8m2nxgp4EKelAXSxwAVvB7sX5tg8Cl+Lr3QQF3oOEcEZw FDF0SkoPTqaoqymn0jO9gOgPdi66KzSZ4r7GlthBVG7UtjsLTArj7CbvEpNC2ab9N64AgmFfVr63v vJ1Fd8ziqqrOHHo214l1verg7oEcGCRSwG1Z0sAByEXYygkgkxN/UGMRogWSyT2ZIV5eMlsQSaDYz V521WANA==; Received: from 
j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1qSzTl-003oSj-1T; Mon, 07 Aug 2023 12:36:57 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id 28739302792; Mon, 7 Aug 2023 14:36:56 +0200 (CEST) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 0) id AD6302021C3D5; Mon, 7 Aug 2023 14:36:54 +0200 (CEST) Message-ID: <20230807123323.297438324@infradead.org> User-Agent: quilt/0.66 Date: Mon, 07 Aug 2023 14:18:51 +0200 From: Peter Zijlstra To: tglx@linutronix.de, axboe@kernel.dk Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com, Andrew Morton , urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com, Arnd Bergmann , linux-api@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, malteskarupke@web.de Subject: [PATCH v2 08/14] futex: Add flags2 argument to futex_requeue() References: <20230807121843.710612856@infradead.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In order to support mixed size requeue, add a second flags argument to the internal futex_requeue() function. No functional change intended. 
Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Thomas Gleixner --- kernel/futex/futex.h | 5 +++-- kernel/futex/requeue.c | 12 +++++++----- kernel/futex/syscalls.c | 6 +++--- 3 files changed, 13 insertions(+), 10 deletions(-) --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -318,8 +318,9 @@ extern int futex_wait_requeue_pi(u32 __u val, ktime_t *abs_time, u32 bitset, u32 __user *uaddr2); =20 -extern int futex_requeue(u32 __user *uaddr1, unsigned int flags, - u32 __user *uaddr2, int nr_wake, int nr_requeue, +extern int futex_requeue(u32 __user *uaddr1, unsigned int flags1, + u32 __user *uaddr2, unsigned int flags2, + int nr_wake, int nr_requeue, u32 *cmpval, int requeue_pi); =20 extern int __futex_wait(u32 __user *uaddr, unsigned int flags, u32 val, --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -346,8 +346,9 @@ futex_proxy_trylock_atomic(u32 __user *p /** * futex_requeue() - Requeue waiters from uaddr1 to uaddr2 * @uaddr1: source futex user address - * @flags: futex flags (FLAGS_SHARED, etc.) + * @flags1: futex flags (FLAGS_SHARED, etc.) * @uaddr2: target futex user address + * @flags2: futex flags (FLAGS_SHARED, etc.) 
* @nr_wake: number of waiters to wake (must be 1 for requeue_pi) * @nr_requeue: number of waiters to requeue (0-INT_MAX) * @cmpval: @uaddr1 expected value (or %NULL) @@ -361,7 +362,8 @@ futex_proxy_trylock_atomic(u32 __user *p * - >=3D0 - on success, the number of tasks requeued or woken; * - <0 - on error */ -int futex_requeue(u32 __user *uaddr1, unsigned int flags, u32 __user *uadd= r2, +int futex_requeue(u32 __user *uaddr1, unsigned int flags1, + u32 __user *uaddr2, unsigned int flags2, int nr_wake, int nr_requeue, u32 *cmpval, int requeue_pi) { union futex_key key1 =3D FUTEX_KEY_INIT, key2 =3D FUTEX_KEY_INIT; @@ -424,10 +426,10 @@ int futex_requeue(u32 __user *uaddr1, un } =20 retry: - ret =3D get_futex_key(uaddr1, flags, &key1, FUTEX_READ); + ret =3D get_futex_key(uaddr1, flags1, &key1, FUTEX_READ); if (unlikely(ret !=3D 0)) return ret; - ret =3D get_futex_key(uaddr2, flags, &key2, + ret =3D get_futex_key(uaddr2, flags2, &key2, requeue_pi ? FUTEX_WRITE : FUTEX_READ); if (unlikely(ret !=3D 0)) return ret; @@ -459,7 +461,7 @@ int futex_requeue(u32 __user *uaddr1, un if (ret) return ret; =20 - if (!(flags & FLAGS_SHARED)) + if (!(flags1 & FLAGS_SHARED)) goto retry_private; =20 goto retry; --- a/kernel/futex/syscalls.c +++ b/kernel/futex/syscalls.c @@ -106,9 +106,9 @@ long do_futex(u32 __user *uaddr, int op, case FUTEX_WAKE_BITSET: return futex_wake(uaddr, flags, val, val3); case FUTEX_REQUEUE: - return futex_requeue(uaddr, flags, uaddr2, val, val2, NULL, 0); + return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, NULL, 0); case FUTEX_CMP_REQUEUE: - return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 0); + return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, &val3, 0); case FUTEX_WAKE_OP: return futex_wake_op(uaddr, flags, uaddr2, val, val2, val3); case FUTEX_LOCK_PI: @@ -125,7 +125,7 @@ long do_futex(u32 __user *uaddr, int op, return futex_wait_requeue_pi(uaddr, flags, val, timeout, val3, uaddr2); case FUTEX_CMP_REQUEUE_PI: - return 
futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1); + return futex_requeue(uaddr, flags, uaddr2, flags, val, val2, &val3, 1); } return -ENOSYS; } From nobody Thu Sep 11 20:36:47 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0B6D0C04A94 for ; Mon, 7 Aug 2023 12:38:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233767AbjHGMiE (ORCPT ); Mon, 7 Aug 2023 08:38:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36532 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233610AbjHGMhU (ORCPT ); Mon, 7 Aug 2023 08:37:20 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 14118170B; Mon, 7 Aug 2023 05:37:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=Xi18wbHB7VNHKYqwwPbXtDuzrsECrr9c5XSVyiI+TAw=; b=KqO2kuHfKLrv5u2OY576yuyqTS MpbOyrVFRvo7ba296aw60KiAk4fzQqtj8P72vWN0T9Hc1gT0+aqIWFhqLtPjy5vx0RzYrpumAIOG/ Aek+ye7WtBQr+4HWa5BCSjSCLXADV6kA8mvRfog3zxySGaAosFLdQH4qVcVj/HMrnrwcm+msHG8EC SbcijN7kTPA+UjQ9oTgsiSnlToZ4gjGygc+kq3ct6wQXAoIbErA5zu4gYUNuHSB7VDf7yhdnBnpOd yBQagzYNZhTMX0CGgF0nOZVyjZajZX9sDU6NaMQtsXEEShakqKKrGTlx5US3CXnxVos3p5Kx7zVd9 0aOFqbDA==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1qSzTl-00AxGN-Ho; Mon, 07 Aug 2023 12:36:57 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 
with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id 287AA30334C; Mon, 7 Aug 2023 14:36:56 +0200 (CEST) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 0) id B074B2021C3DC; Mon, 7 Aug 2023 14:36:54 +0200 (CEST) Message-ID: <20230807123323.366498604@infradead.org> User-Agent: quilt/0.66 Date: Mon, 07 Aug 2023 14:18:52 +0200 From: Peter Zijlstra To: tglx@linutronix.de, axboe@kernel.dk Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com, Andrew Morton , urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com, Arnd Bergmann , linux-api@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, malteskarupke@web.de, Geert Uytterhoeven Subject: [PATCH v2 09/14] futex: Add sys_futex_requeue() References: <20230807121843.710612856@infradead.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Finish off the 'simple' futex2 syscall group by adding sys_futex_requeue(). Unlike sys_futex_{wait,wake}() its arguments are too numerous to fit into a regular syscall. As such, use struct futex_waitv to pass the 'source' and 'destination' futexes to the syscall. This syscall implements what was previously known as FUTEX_CMP_REQUEUE and uses {val, uaddr, flags} for source and {uaddr, flags} for destination. This design explicitly allows requeueing between different types of futex by having a different flags word per uaddr. 
Signed-off-by: Peter Zijlstra (Intel) Acked-by: Geert Uytterhoeven Reviewed-by: Thomas Gleixner --- arch/alpha/kernel/syscalls/syscall.tbl | 1=20 arch/arm/tools/syscall.tbl | 1=20 arch/arm64/include/asm/unistd.h | 2 - arch/arm64/include/asm/unistd32.h | 2 + arch/ia64/kernel/syscalls/syscall.tbl | 1=20 arch/m68k/kernel/syscalls/syscall.tbl | 1=20 arch/microblaze/kernel/syscalls/syscall.tbl | 1=20 arch/mips/kernel/syscalls/syscall_n32.tbl | 1=20 arch/mips/kernel/syscalls/syscall_n64.tbl | 1=20 arch/mips/kernel/syscalls/syscall_o32.tbl | 1=20 arch/parisc/kernel/syscalls/syscall.tbl | 1=20 arch/powerpc/kernel/syscalls/syscall.tbl | 1=20 arch/s390/kernel/syscalls/syscall.tbl | 1=20 arch/sh/kernel/syscalls/syscall.tbl | 1=20 arch/sparc/kernel/syscalls/syscall.tbl | 1=20 arch/x86/entry/syscalls/syscall_32.tbl | 1=20 arch/x86/entry/syscalls/syscall_64.tbl | 1=20 arch/xtensa/kernel/syscalls/syscall.tbl | 1=20 include/linux/syscalls.h | 3 ++ include/uapi/asm-generic/unistd.h | 4 ++ kernel/futex/syscalls.c | 38 +++++++++++++++++++++++= +++++ kernel/sys_ni.c | 1=20 22 files changed, 64 insertions(+), 2 deletions(-) --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -493,3 +493,4 @@ 561 common cachestat sys_cachestat 562 common futex_wake sys_futex_wake 563 common futex_wait sys_futex_wait +564 common futex_requeue sys_futex_requeue --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -467,3 +467,4 @@ 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake 453 common futex_wait sys_futex_wait +454 common futex_requeue sys_futex_requeue --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -39,7 +39,7 @@ #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800) =20 -#define __NR_compat_syscalls 454 +#define __NR_compat_syscalls 455 #endif =20 #define __ARCH_WANT_SYS_CLONE --- a/arch/arm64/include/asm/unistd32.h +++ 
b/arch/arm64/include/asm/unistd32.h @@ -913,6 +913,8 @@ __SYSCALL(__NR_cachestat, sys_cachestat) __SYSCALL(__NR_futex_wake, sys_futex_wake) #define __NR_futex_wait 453 __SYSCALL(__NR_futex_wait, sys_futex_wait) +#define __NR_futex_requeue 454 +__SYSCALL(__NR_futex_requeue, sys_futex_requeue) =20 /* * Please add new compat syscalls above this comment and update --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -374,3 +374,4 @@ 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake 453 common futex_wait sys_futex_wait +454 common futex_requeue sys_futex_requeue --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -453,3 +453,4 @@ 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake 453 common futex_wait sys_futex_wait +454 common futex_requeue sys_futex_requeue --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -459,3 +459,4 @@ 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake 453 common futex_wait sys_futex_wait +454 common futex_requeue sys_futex_requeue --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -392,3 +392,4 @@ 451 n32 cachestat sys_cachestat 452 n32 futex_wake sys_futex_wake 453 n32 futex_wait sys_futex_wait +454 n32 futex_requeue sys_futex_requeue --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -368,3 +368,4 @@ 451 n64 cachestat sys_cachestat 452 n64 futex_wake sys_futex_wake 453 n64 futex_wait sys_futex_wait +454 n64 futex_requeue sys_futex_requeue --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -441,3 +441,4 @@ 451 o32 cachestat sys_cachestat 452 o32 futex_wake sys_futex_wake 453 o32 futex_wait sys_futex_wait +454 o32 futex_requeue sys_futex_requeue --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ 
b/arch/parisc/kernel/syscalls/syscall.tbl @@ -452,3 +452,4 @@ 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake 453 common futex_wait sys_futex_wait +454 common futex_requeue sys_futex_requeue --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -540,3 +540,4 @@ 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake 453 common futex_wait sys_futex_wait +454 common futex_requeue sys_futex_requeue --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -456,3 +456,4 @@ 451 common cachestat sys_cachestat sys_cachestat 452 common futex_wake sys_futex_wake sys_futex_wake 453 common futex_wait sys_futex_wait sys_futex_wait +454 common futex_requeue sys_futex_requeue sys_futex_requeue --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -456,3 +456,4 @@ 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake 453 common futex_wait sys_futex_wait +454 common futex_requeue sys_futex_requeue --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -499,3 +499,4 @@ 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake 453 common futex_wait sys_futex_wait +454 common futex_requeue sys_futex_requeue --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -458,3 +458,4 @@ 451 i386 cachestat sys_cachestat 452 i386 futex_wake sys_futex_wake 453 i386 futex_wait sys_futex_wait +454 i386 futex_requeue sys_futex_requeue --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -375,6 +375,7 @@ 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake 453 common futex_wait sys_futex_wait +454 common futex_requeue sys_futex_requeue =20 # # Due to a historical design error, certain syscalls are numbered differen= tly --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ 
b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -424,3 +424,4 @@ 451 common cachestat sys_cachestat 452 common futex_wake sys_futex_wake 453 common futex_wait sys_futex_wait +454 common futex_requeue sys_futex_requeue --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -570,6 +570,9 @@ asmlinkage long sys_futex_wait(void __us unsigned int flags, struct __kernel_timespec __user *timespec, clockid_t clockid); =20 +asmlinkage long sys_futex_requeue(struct futex_waitv __user *waiters, + unsigned int flags, int nr_wake, int nr_requeue); + asmlinkage long sys_nanosleep(struct __kernel_timespec __user *rqtp, struct __kernel_timespec __user *rmtp); asmlinkage long sys_nanosleep_time32(struct old_timespec32 __user *rqtp, --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -822,9 +822,11 @@ __SYSCALL(__NR_cachestat, sys_cachestat) __SYSCALL(__NR_futex_wake, sys_futex_wake) #define __NR_futex_wait 453 __SYSCALL(__NR_futex_wait, sys_futex_wait) +#define __NR_futex_requeue 454 +__SYSCALL(__NR_futex_requeue, sys_futex_requeue) =20 #undef __NR_syscalls -#define __NR_syscalls 454 +#define __NR_syscalls 455 =20 /* * 32 bit systems traditionally used different --- a/kernel/futex/syscalls.c +++ b/kernel/futex/syscalls.c @@ -396,6 +396,44 @@ SYSCALL_DEFINE6(futex_wait, return ret; } =20 +/* + * sys_futex_requeue - Requeue a waiter from one futex to another + * @waiters: array describing the source and destination futex + * @flags: unused + * @nr_wake: number of futexes to wake + * @nr_requeue: number of futexes to requeue + * + * Identical to the traditional FUTEX_CMP_REQUEUE op, except it is part of= the + * futex2 family of calls. 
+ */ + +SYSCALL_DEFINE4(futex_requeue, + struct futex_waitv __user *, waiters, + unsigned int, flags, + int, nr_wake, + int, nr_requeue) +{ + struct futex_vector futexes[2]; + u32 cmpval; + int ret; + + if (flags) + return -EINVAL; + + if (!waiters) + return -EINVAL; + + ret =3D futex_parse_waitv(futexes, waiters, 2); + if (ret) + return ret; + + cmpval =3D futexes[0].w.val; + + return futex_requeue(u64_to_user_ptr(futexes[0].w.uaddr), futexes[0].w.fl= ags, + u64_to_user_ptr(futexes[1].w.uaddr), futexes[1].w.flags, + nr_wake, nr_requeue, &cmpval, 0); +} + #ifdef CONFIG_COMPAT COMPAT_SYSCALL_DEFINE2(set_robust_list, struct compat_robust_list_head __user *, head, --- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -89,6 +89,7 @@ COND_SYSCALL_COMPAT(get_robust_list); COND_SYSCALL(futex_waitv); COND_SYSCALL(futex_wake); COND_SYSCALL(futex_wait); +COND_SYSCALL(futex_requeue); COND_SYSCALL(kexec_load); COND_SYSCALL_COMPAT(kexec_load); COND_SYSCALL(init_module); From nobody Thu Sep 11 20:36:47 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF387C001DF for ; Mon, 7 Aug 2023 12:37:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232030AbjHGMhg (ORCPT ); Mon, 7 Aug 2023 08:37:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36532 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232651AbjHGMhU (ORCPT ); Mon, 7 Aug 2023 08:37:20 -0400 Received: from desiato.infradead.org (desiato.infradead.org [IPv6:2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F1D29170A; Mon, 7 Aug 2023 05:37:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: 
Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=yhuAOwd3npGJeiWmKJXyciafSAGyeTkzn0KP+zTqoEU=; b=DIfNmxEY7NkuxRZGvTBNvETWLd ddtah769bA3DgEscE0yGOSe5I39fPDiu5Mjvo17Y567v9fB2s7UX2zdEL/95eWCfI3b3Q1PhxBfaY LYDstXbDtVxb/OurZOWFFWwWlAgq3BKqt2A4y2gFBkoO5WtEh/Pn60AQf1xOMd9mA8wonY7weYkEp Bx9EOyLBwrEFqUCS0it9omaKgOhmDSlZW0dv+uiZc5hKDPEZXaJU2140Vc2QBw/ncEUVuJibJQxVw lTJr2P9b8KYQTkBIXzNM11ot2CJ4W9zOw6AmQWhQLaNvpx7A5KkfFbaocB0ejMdjiL0f3fMIPCfTt T0c0m83Q==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1qSzTl-003oSl-1d; Mon, 07 Aug 2023 12:36:58 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id 2DAA73033B1; Mon, 7 Aug 2023 14:36:56 +0200 (CEST) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 0) id B72A22021C3DB; Mon, 7 Aug 2023 14:36:54 +0200 (CEST) Message-ID: <20230807123323.434708155@infradead.org> User-Agent: quilt/0.66 Date: Mon, 07 Aug 2023 14:18:53 +0200 From: Peter Zijlstra To: tglx@linutronix.de, axboe@kernel.dk Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com, Andrew Morton , urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com, Arnd Bergmann , linux-api@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, malteskarupke@web.de, Christoph Hellwig Subject: [PATCH v2 10/14] mm: Add vmalloc_huge_node() References: <20230807121843.710612856@infradead.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable 
Content-Type: text/plain; charset="utf-8" To enable node specific hash-tables. Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Christoph Hellwig --- include/linux/vmalloc.h | 1 + mm/vmalloc.c | 7 +++++++ 2 files changed, 8 insertions(+) --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -152,6 +152,7 @@ extern void *__vmalloc_node_range(unsign void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_ma= sk, int node, const void *caller) __alloc_size(1); void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1); +void *vmalloc_huge_node(unsigned long size, gfp_t gfp_mask, int node) __al= loc_size(1); =20 extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_s= ize(1, 2); extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2); --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3416,6 +3416,13 @@ void *vmalloc(unsigned long size) } EXPORT_SYMBOL(vmalloc); =20 +void *vmalloc_huge_node(unsigned long size, gfp_t gfp_mask, int node) +{ + return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END, + gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP, + node, __builtin_return_address(0)); +} + /** * vmalloc_huge - allocate virtually contiguous memory, allow huge pages * @size: allocation size From nobody Thu Sep 11 20:36:47 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D42A6C001B0 for ; Mon, 7 Aug 2023 12:37:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233683AbjHGMhr (ORCPT ); Mon, 7 Aug 2023 08:37:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36532 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233597AbjHGMhT (ORCPT ); Mon, 7 Aug 2023 08:37:19 -0400 Received: from casper.infradead.org (casper.infradead.org 
[IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6326710F6; Mon, 7 Aug 2023 05:37:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=15dV+fw3TVZwuZ2GLz+s4AB6Ib3B7DldkK+IHYVR8MU=; b=rPipdjkXy2N64QJCm3H49Wb5YA s2TPd1tAd7bFDfW/5nyPT5H2/NrqKO7byVKH7j2gZYeLITPjUUMdOTl4F57jAyJFa3J4fmmpUhMug 8dLUwR/TM/6SD8smp7MtGFdU1hN4l6p2XUy9FA30zOZJmzR+UbAF6KsRMEZbHlTnSPNe3cz9k+Svw V6bFGevXhncv9QbCtNwhbsvoOUQynVl1GzEMcjL6hhEtXF8f78PhIAVAma6BiVNd1o70j/HZ2E8+1 meAATMOisxckPnmmuzz9EcDDZwowdLZGsA5Z691qKWRah5tXHHQZzS2V3VaClSFj1WTC956zmoakZ DXK6aBGg==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1qSzTl-00AxGK-GQ; Mon, 07 Aug 2023 12:36:57 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id 307603033B5; Mon, 7 Aug 2023 14:36:56 +0200 (CEST) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 0) id BC0D12021C3D6; Mon, 7 Aug 2023 14:36:54 +0200 (CEST) Message-ID: <20230807123323.504975124@infradead.org> User-Agent: quilt/0.66 Date: Mon, 07 Aug 2023 14:18:54 +0200 From: Peter Zijlstra To: tglx@linutronix.de, axboe@kernel.dk Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com, Andrew Morton , urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com, Arnd Bergmann , linux-api@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, 
malteskarupke@web.de Subject: [PATCH v2 11/14] futex: Implement FUTEX2_NUMA References: <20230807121843.710612856@infradead.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Extend the futex2 interface to be NUMA aware. When FUTEX2_NUMA is specified for a futex, the user value is extended to two words (of the same size). The first is the user value we all know, the second one will be the node to place this futex on. struct futex_numa_32 { u32 val; u32 node; }; When node is set to ~0, WAIT will set it to the current node_id such that WAKE knows where to find it. If userspace corrupts the node value between WAIT and WAKE, the futex will not be found and no wakeup will happen. When FUTEX2_NUMA is not set, the node is simply an extension of the hash, such that traditional futexes are still interleaved over the nodes. This is done to avoid having to have a separate !numa hash-table. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/futex.h | 3 + kernel/futex/core.c | 129 +++++++++++++++++++++++++++++++++++++++----= ----- kernel/futex/futex.h | 25 +++++++-- kernel/futex/syscalls.c | 2=20 4 files changed, 128 insertions(+), 31 deletions(-) --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -34,6 +34,7 @@ union futex_key { u64 i_seq; unsigned long pgoff; unsigned int offset; + /* unsigned int node; */ } shared; struct { union { @@ -42,11 +43,13 @@ union futex_key { }; unsigned long address; unsigned int offset; + /* unsigned int node; */ } private; struct { u64 ptr; unsigned long word; unsigned int offset; + unsigned int node; /* NOT hashed! */ } both; }; =20 --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -34,7 +34,8 @@ #include #include #include -#include +#include +#include #include #include =20 @@ -47,12 +48,14 @@ * reside in the same cacheline. 
*/ static struct { - struct futex_hash_bucket *queues; unsigned long hashsize; + unsigned int hashshift; + struct futex_hash_bucket *queues[MAX_NUMNODES]; } __futex_data __read_mostly __aligned(2*sizeof(long)); -#define futex_queues (__futex_data.queues) -#define futex_hashsize (__futex_data.hashsize) =20 +#define futex_hashsize (__futex_data.hashsize) +#define futex_hashshift (__futex_data.hashshift) +#define futex_queues (__futex_data.queues) =20 /* * Fault injections for futexes. @@ -105,6 +108,26 @@ late_initcall(fail_futex_debugfs); =20 #endif /* CONFIG_FAIL_FUTEX */ =20 +static int futex_get_value(u32 *val, u32 __user *from, unsigned int flags) +{ + switch (futex_size(flags)) { + case 1: return __get_user(*val, (u8 __user *)from); + case 2: return __get_user(*val, (u16 __user *)from); + case 4: return __get_user(*val, (u32 __user *)from); + default: BUG(); + } +} + +static int futex_put_value(u32 val, u32 __user *to, unsigned int flags) +{ + switch (futex_size(flags)) { + case 1: return __put_user(val, (u8 __user *)to); + case 2: return __put_user(val, (u16 __user *)to); + case 4: return __put_user(val, (u32 __user *)to); + default: BUG(); + } +} + /** * futex_hash - Return the hash bucket in the global hash * @key: Pointer to the futex key for which the hash is calculated @@ -114,10 +137,29 @@ late_initcall(fail_futex_debugfs); */ struct futex_hash_bucket *futex_hash(union futex_key *key) { - u32 hash =3D jhash2((u32 *)key, offsetof(typeof(*key), both.offset) / 4, + u32 hash =3D jhash2((u32 *)key, + offsetof(typeof(*key), both.offset) / sizeof(u32), key->both.offset); + int node =3D key->both.node; + + if (node =3D=3D -1) { + /* + * In case of !FLAGS_NUMA, use some unused hash bits to pick a + * node -- this ensures regular futexes are interleaved across + * the nodes and avoids having to allocate multiple + * hash-tables. + * + * NOTE: this isn't perfectly uniform, but it is fast and + * handles sparse node masks. 
+ */ + node =3D (hash >> futex_hashshift) % nr_node_ids; + if (!node_possible(node)) { + node =3D find_next_bit_wrap(node_possible_map.bits, + nr_node_ids, node); + } + } =20 - return &futex_queues[hash & (futex_hashsize - 1)]; + return &futex_queues[node][hash & (futex_hashsize - 1)]; } =20 =20 @@ -217,32 +259,56 @@ static u64 get_inode_sequence_number(str * * lock_page() might sleep, the caller should not hold a spinlock. */ -int get_futex_key(u32 __user *uaddr, unsigned int flags, union futex_key *= key, +int get_futex_key(void __user *uaddr, unsigned int flags, union futex_key = *key, enum futex_access rw) { unsigned long address =3D (unsigned long)uaddr; struct mm_struct *mm =3D current->mm; struct page *page, *tail; struct address_space *mapping; - int err, ro =3D 0; + int node, err, size, ro =3D 0; bool fshared; =20 fshared =3D flags & FLAGS_SHARED; + size =3D futex_size(flags); + if (flags & FLAGS_NUMA) + size *=3D 2; =20 /* * The futex address must be "naturally" aligned. */ key->both.offset =3D address % PAGE_SIZE; - if (unlikely((address % sizeof(u32)) !=3D 0)) + if (unlikely((address % size) !=3D 0)) return -EINVAL; address -=3D key->both.offset; =20 - if (unlikely(!access_ok(uaddr, sizeof(u32)))) + if (unlikely(!access_ok(uaddr, size))) return -EFAULT; =20 if (unlikely(should_fail_futex(fshared))) return -EFAULT; =20 + if (flags & FLAGS_NUMA) { + void __user *naddr =3D uaddr + size / 2; + + if (futex_get_value(&node, naddr, flags)) + return -EFAULT; + + if (node =3D=3D -1) { + node =3D numa_node_id(); + if (futex_put_value(node, naddr, flags)) + return -EFAULT; + + } else if (node >=3D MAX_NUMNODES || !node_possible(node)) { + return -EINVAL; + } + + key->both.node =3D node; + + } else { + key->both.node =3D -1; + } + /* * PROCESS_PRIVATE futexes are fast. 
* As the mm cannot disappear under us and the 'key' only needs @@ -1125,27 +1191,42 @@ void futex_exit_release(struct task_stru =20 static int __init futex_init(void) { - unsigned int futex_shift; - unsigned long i; + unsigned int order, n; + unsigned long size, i; =20 #if CONFIG_BASE_SMALL futex_hashsize =3D 16; #else - futex_hashsize =3D roundup_pow_of_two(256 * num_possible_cpus()); + futex_hashsize =3D 256 * num_possible_cpus(); + futex_hashsize /=3D num_possible_nodes(); + futex_hashsize =3D roundup_pow_of_two(futex_hashsize); #endif + futex_hashshift =3D ilog2(futex_hashsize); + size =3D sizeof(struct futex_hash_bucket) * futex_hashsize; + order =3D get_order(size); + + for_each_node(n) { + struct futex_hash_bucket *table; + + if (order > MAX_ORDER) + table =3D vmalloc_huge_node(size, GFP_KERNEL, n); + else + table =3D alloc_pages_exact_nid(n, size, GFP_KERNEL); + + BUG_ON(!table); + + for (i =3D 0; i < futex_hashsize; i++) { + atomic_set(&table[i].waiters, 0); + spin_lock_init(&table[i].lock); + plist_head_init(&table[i].chain); + } =20 - futex_queues =3D alloc_large_system_hash("futex", sizeof(*futex_queues), - futex_hashsize, 0, - futex_hashsize < 256 ? 
HASH_SMALL : 0, - &futex_shift, NULL, - futex_hashsize, futex_hashsize); - futex_hashsize =3D 1UL << futex_shift; - - for (i =3D 0; i < futex_hashsize; i++) { - atomic_set(&futex_queues[i].waiters, 0); - plist_head_init(&futex_queues[i].chain); - spin_lock_init(&futex_queues[i].lock); + futex_queues[n] =3D table; } + pr_info("futex hash table, %d nodes, %ld entries (order: %d, %lu bytes)\n= ", + num_possible_nodes(), + futex_hashsize, order, + sizeof(struct futex_hash_bucket) * futex_hashsize); =20 return 0; } --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -65,6 +65,11 @@ static inline unsigned int futex2_to_fla return flags; } =20 +static inline unsigned int futex_size(unsigned int flags) +{ + return 1 << (flags & FLAGS_SIZE_MASK); +} + static inline bool futex_flags_valid(unsigned int flags) { /* Only 64bit futexes for 64bit code */ @@ -77,12 +82,20 @@ static inline bool futex_flags_valid(uns if ((flags & FLAGS_SIZE_MASK) !=3D FLAGS_SIZE_32) return false; =20 - return true; -} + /* + * Must be able to represent both NUMA_NO_NODE and every valid nodeid + * in a futex word. 
+ */ + if (flags & FLAGS_NUMA) { + int bits =3D 8 * futex_size(flags); + u64 max =3D ~0ULL; =20 -static inline unsigned int futex_size(unsigned int flags) -{ - return 1 << (flags & FLAGS_SIZE_MASK); + max >>=3D 64 - bits; + if (nr_node_ids >=3D max) + return false; + } + + return true; } =20 static inline bool futex_validate_input(unsigned int flags, u64 val) @@ -183,7 +196,7 @@ enum futex_access { FUTEX_WRITE }; =20 -extern int get_futex_key(u32 __user *uaddr, unsigned int flags, union fute= x_key *key, +extern int get_futex_key(void __user *uaddr, unsigned int flags, union fut= ex_key *key, enum futex_access rw); =20 extern struct hrtimer_sleeper * --- a/kernel/futex/syscalls.c +++ b/kernel/futex/syscalls.c @@ -179,7 +179,7 @@ SYSCALL_DEFINE6(futex, u32 __user *, uad return do_futex(uaddr, op, val, tp, uaddr2, (unsigned long)utime, val3); } =20 -#define FUTEX2_VALID_MASK (FUTEX2_SIZE_MASK | FUTEX2_PRIVATE) +#define FUTEX2_VALID_MASK (FUTEX2_SIZE_MASK | FUTEX2_NUMA | FUTEX2_PRIVATE) =20 /** * futex_parse_waitv - Parse a waitv array from userspace From nobody Thu Sep 11 20:36:47 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3C71DC07E8C for ; Mon, 7 Aug 2023 12:37:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231700AbjHGMh0 (ORCPT ); Mon, 7 Aug 2023 08:37:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231648AbjHGMhT (ORCPT ); Mon, 7 Aug 2023 08:37:19 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7C32E10F8; Mon, 7 Aug 2023 05:37:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; 
s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=sh7cl6AJW0b0f2uQNaiBLh5jyRqhKF5FPfLTlh7soFE=; b=gkcqeHyPgX7jzxCrnbCbVQRmKU sxoBoTTJB+eaxMv4o9RWqLYYrr00tzh8Y9tp8nLHPktfM1FlIWx2vk2VRzQe8Q7POGb5HzWl2G6DB +UaFAkRrHRqmTrT5H0aEIWsB903bT6N4SHjiTV15ToaPCWPDJVDqaN1jbyrjUE9/kYCon9BGbjwfG bwREGmgtoAiEptkoyJahuC3WQqM8EuiXjVpJxEg9tVZr/XerxfvGQ3Fof5g32eL8vZq92giDChBrA JAavjQYe3tEWK6PnJUoleS9+plUE+y4ttVSq9quCRNjAa1c6d7MBwqkeAzyLAaklJPnpTzmlb5Ffw yMOOu43w==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1qSzTl-00AxGJ-GT; Mon, 07 Aug 2023 12:36:57 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id 336FE3033D4; Mon, 7 Aug 2023 14:36:56 +0200 (CEST) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 0) id C12802021C3DD; Mon, 7 Aug 2023 14:36:54 +0200 (CEST) Message-ID: <20230807123323.573374169@infradead.org> User-Agent: quilt/0.66 Date: Mon, 07 Aug 2023 14:18:55 +0200 From: Peter Zijlstra To: tglx@linutronix.de, axboe@kernel.dk Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com, Andrew Morton , urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com, Arnd Bergmann , linux-api@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, malteskarupke@web.de Subject: [PATCH v2 12/14] futex: Propagate flags into futex_get_value_locked() References: <20230807121843.710612856@infradead.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In order to facilitate variable sized futexes propagate the flags into futex_get_value_locked(). No functional change intended. Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Thomas Gleixner --- kernel/futex/core.c | 4 ++-- kernel/futex/futex.h | 2 +- kernel/futex/pi.c | 8 ++++---- kernel/futex/requeue.c | 4 ++-- kernel/futex/waitwake.c | 4 ++-- 5 files changed, 11 insertions(+), 11 deletions(-) --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -515,12 +515,12 @@ int futex_cmpxchg_value_locked(u32 *curv return ret; } =20 -int futex_get_value_locked(u32 *dest, u32 __user *from) +int futex_get_value_locked(u32 *dest, u32 __user *from, unsigned int flags) { int ret; =20 pagefault_disable(); - ret =3D __get_user(*dest, from); + ret =3D futex_get_value(dest, from, flags); pagefault_enable(); =20 return ret ? -EFAULT : 0; --- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -218,7 +218,7 @@ extern void futex_wake_mark(struct wake_ =20 extern int fault_in_user_writeable(u32 __user *uaddr); extern int futex_cmpxchg_value_locked(u32 *curval, u32 __user *uaddr, u32 = uval, u32 newval); -extern int futex_get_value_locked(u32 *dest, u32 __user *from); +extern int futex_get_value_locked(u32 *dest, u32 __user *from, unsigned in= t flags); extern struct futex_q *futex_top_waiter(struct futex_hash_bucket *hb, unio= n futex_key *key); =20 extern void __futex_unqueue(struct futex_q *q); --- a/kernel/futex/pi.c +++ b/kernel/futex/pi.c @@ -239,7 +239,7 @@ static int attach_to_pi_state(u32 __user * still is what we expect it to be, otherwise retry the entire * operation. */ - if (futex_get_value_locked(&uval2, uaddr)) + if (futex_get_value_locked(&uval2, uaddr, FLAGS_SIZE_32)) goto out_efault; =20 if (uval !=3D uval2) @@ -358,7 +358,7 @@ static int handle_exit_race(u32 __user * * The same logic applies to the case where the exiting task is * already gone. 
*/ - if (futex_get_value_locked(&uval2, uaddr)) + if (futex_get_value_locked(&uval2, uaddr, FLAGS_SIZE_32)) return -EFAULT; =20 /* If the user space value has changed, try again. */ @@ -526,7 +526,7 @@ int futex_lock_pi_atomic(u32 __user *uad * Read the user space value first so we can validate a few * things before proceeding further. */ - if (futex_get_value_locked(&uval, uaddr)) + if (futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32)) return -EFAULT; =20 if (unlikely(should_fail_futex(true))) @@ -762,7 +762,7 @@ static int __fixup_pi_state_owner(u32 __ if (!pi_state->owner) newtid |=3D FUTEX_OWNER_DIED; =20 - err =3D futex_get_value_locked(&uval, uaddr); + err =3D futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32); if (err) goto handle_err; =20 --- a/kernel/futex/requeue.c +++ b/kernel/futex/requeue.c @@ -273,7 +273,7 @@ futex_proxy_trylock_atomic(u32 __user *p u32 curval; int ret; =20 - if (futex_get_value_locked(&curval, pifutex)) + if (futex_get_value_locked(&curval, pifutex, FLAGS_SIZE_32)) return -EFAULT; =20 if (unlikely(should_fail_futex(true))) @@ -451,7 +451,7 @@ int futex_requeue(u32 __user *uaddr1, un if (likely(cmpval !=3D NULL)) { u32 curval; =20 - ret =3D futex_get_value_locked(&curval, uaddr1); + ret =3D futex_get_value_locked(&curval, uaddr1, FLAGS_SIZE_32); =20 if (unlikely(ret)) { double_unlock_hb(hb1, hb2); --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -438,7 +438,7 @@ static int futex_wait_multiple_setup(str u32 val =3D vs[i].w.val; =20 hb =3D futex_q_lock(q); - ret =3D futex_get_value_locked(&uval, uaddr); + ret =3D futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32); =20 if (!ret && uval =3D=3D val) { /* @@ -606,7 +606,7 @@ int futex_wait_setup(u32 __user *uaddr, retry_private: *hb =3D futex_q_lock(q); =20 - ret =3D futex_get_value_locked(&uval, uaddr); + ret =3D futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32); =20 if (ret) { futex_q_unlock(*hb); From nobody Thu Sep 11 20:36:47 2025 Return-Path: 
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5878DC04FE0 for ; Mon, 7 Aug 2023 12:37:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233659AbjHGMhc (ORCPT ); Mon, 7 Aug 2023 08:37:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36538 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233594AbjHGMhT (ORCPT ); Mon, 7 Aug 2023 08:37:19 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 735BD10F7; Mon, 7 Aug 2023 05:37:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=q2JUuQQkf4QJ5HlnBdmZO9ANv9VK51IAZht4p6DhAtU=; b=X2O67bPGappPJwoyYOLJli/2uq GGSmNYvPLgZVt2/X3Hmrqu1CS1Zl85dYxPspA117dnFShAiy3LcGcjLWdXZTycTNRrU1JneiRVLRn tF26dsEBGwIqCOIot1HFOrI/Fk0gDY8+KE/QG3B9WI9TIXDwsOozljsjh74EFJdAZuM4h9/nbE9Vh o24zKa2EFC2UIQkKpr0RLuefzcIEbbl+sipU4Ta6NRRJkYkadodRotAl0POJj4WwsxnQYnjlHSVmi cOmGBvprufkD87jLIhKiDB9kt5z1Z/WQKyHaXTVD1+6Jkq1ILqtyz/ZnZFh/da1aKf1GrYhBTezjK bl271Qeg==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1qSzTl-00AxGL-Gr; Mon, 07 Aug 2023 12:36:57 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with 
ESMTPS id 37457303463; Mon, 7 Aug 2023 14:36:56 +0200 (CEST) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 0) id C5FE42021B5CF; Mon, 7 Aug 2023 14:36:54 +0200 (CEST) Message-ID: <20230807123323.641470179@infradead.org> User-Agent: quilt/0.66 Date: Mon, 07 Aug 2023 14:18:56 +0200 From: Peter Zijlstra To: tglx@linutronix.de, axboe@kernel.dk Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com, Andrew Morton , urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com, Arnd Bergmann , linux-api@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, malteskarupke@web.de Subject: [PATCH v2 13/14] futex: Enable FUTEX2_{8,16} References: <20230807121843.710612856@infradead.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When futexes are no longer u32 aligned, the low-order offset bits are no longer available to hold type info. However, since offset is the offset within a page, there are plenty of bits available at the top end. After that, pass flags into futex_get_value_locked() for WAIT and disallow FUTEX2_SIZE_U64 instead of mandating FUTEX2_SIZE_U32. Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Thomas Gleixner --- include/linux/futex.h | 11 ++++++----- kernel/futex/core.c | 9 +++++++++ kernel/futex/futex.h | 4 ++-- kernel/futex/waitwake.c | 5 +++-- 4 files changed, 20 insertions(+), 9 deletions(-) --- a/include/linux/futex.h +++ b/include/linux/futex.h @@ -16,18 +16,19 @@ struct task_struct; * The key type depends on whether it's a shared or private mapping. * Don't rearrange members without looking at hash_futex(). * - * offset is aligned to a multiple of sizeof(u32) (=3D=3D 4) by definition.
- * We use the two low order bits of offset to tell what is the kind of key= : + * offset is the position within a page and is in the range [0, PAGE_SIZE). + * The high bits of the offset indicate what kind of key this is: * 00 : Private process futex (PTHREAD_PROCESS_PRIVATE) * (no reference on an inode or mm) * 01 : Shared futex (PTHREAD_PROCESS_SHARED) * mapped on a file (reference on the underlying inode) * 10 : Shared futex (PTHREAD_PROCESS_SHARED) * (but private mapping on an mm, and reference taken on it) -*/ + */ =20 -#define FUT_OFF_INODE 1 /* We set bit 0 if key has a reference on inode= */ -#define FUT_OFF_MMSHARED 2 /* We set bit 1 if key has a reference on mm */ +#define FUT_OFF_INODE (PAGE_SIZE << 0) +#define FUT_OFF_MMSHARED (PAGE_SIZE << 1) +#define FUT_OFF_SIZE (PAGE_SIZE << 2) =20 union futex_key { struct { --- a/kernel/futex/core.c +++ b/kernel/futex/core.c @@ -308,6 +308,15 @@ int get_futex_key(void __user *uaddr, un } =20 /* + * Encode the futex size in the offset. This makes cross-size + * wake-wait fail -- see futex_match(). + * + * NOTE that cross-size wake-wait is fundamentally broken wrt + * FLAGS_NUMA but could possibly work for !NUMA. + */ + key->both.offset |=3D FUT_OFF_SIZE * (flags & FLAGS_SIZE_MASK); + + /* * PROCESS_PRIVATE futexes are fast. * As the mm cannot disappear under us and the 'key' only needs * virtual address, we dont even have to find the underlying vma. 
--- a/kernel/futex/futex.h +++ b/kernel/futex/futex.h @@ -79,8 +79,8 @@ static inline bool futex_flags_valid(uns return false; } =20 - /* Only 32bit futexes are implemented -- for now */ - if ((flags & FLAGS_SIZE_MASK) !=3D FLAGS_SIZE_32) + /* 64bit futexes aren't implemented -- yet */ + if ((flags & FLAGS_SIZE_MASK) =3D=3D FLAGS_SIZE_64) return false; =20 /* --- a/kernel/futex/waitwake.c +++ b/kernel/futex/waitwake.c @@ -434,11 +434,12 @@ static int futex_wait_multiple_setup(str =20 for (i =3D 0; i < count; i++) { u32 __user *uaddr =3D (u32 __user *)(unsigned long)vs[i].w.uaddr; + unsigned int flags =3D vs[i].w.flags; struct futex_q *q =3D &vs[i].q; u32 val =3D vs[i].w.val; =20 hb =3D futex_q_lock(q); - ret =3D futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32); + ret =3D futex_get_value_locked(&uval, uaddr, flags); =20 if (!ret && uval =3D=3D val) { /* @@ -606,7 +607,7 @@ int futex_wait_setup(u32 __user *uaddr, retry_private: *hb =3D futex_q_lock(q); =20 - ret =3D futex_get_value_locked(&uval, uaddr, FLAGS_SIZE_32); + ret =3D futex_get_value_locked(&uval, uaddr, flags); =20 if (ret) { futex_q_unlock(*hb); From nobody Thu Sep 11 20:36:47 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CD234C41513 for ; Mon, 7 Aug 2023 12:38:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233726AbjHGMhx (ORCPT ); Mon, 7 Aug 2023 08:37:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36542 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229562AbjHGMhU (ORCPT ); Mon, 7 Aug 2023 08:37:20 -0400 Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E672C10F1; Mon, 7 Aug 2023 05:37:16 -0700 (PDT) DKIM-Signature: v=1; 
a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=SuU6iwWIqVk8hwlYVFfKmnnvnAg5pfU6VmbV05Pxd1Q=; b=m2xRwbPdauLk+2EhPGclMbKjy4 puYrQxvHgAAcp8hr85eED1DkcDdAE1lTN+A9fOqaeHl7t91bfzTj4keU7QiHhvPu6cu+PaHynO18U 5NMwW4Fibt+7Smho22DvCQKeff8yfU9m5InsACn4bTC9Etg6YvAwWARYiEfacxUKtUBUQgVbHB8N/ MUs1C1j3YXJblu64z/VmATgpiUYp0ssn4A/94t/YUT2TKBz9fAwXXGueK2nnZcgropnAYPYkwarEc t5ifoZn181/WXgf7EAM8oXTHzZj9PPrrotk+1oBKlSZcuSPhqw2uJh6MrFb0kuUXqd/kzcdZD8v6y +yk4Q2YA==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1qSzTl-00AxGM-HJ; Mon, 07 Aug 2023 12:36:58 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id 3A1283061FF; Mon, 7 Aug 2023 14:36:56 +0200 (CEST) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 0) id CB7C0201FFA49; Mon, 7 Aug 2023 14:36:54 +0200 (CEST) Message-ID: <20230807123323.710299007@infradead.org> User-Agent: quilt/0.66 Date: Mon, 07 Aug 2023 14:18:57 +0200 From: Peter Zijlstra To: tglx@linutronix.de, axboe@kernel.dk Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@redhat.com, dvhart@infradead.org, dave@stgolabs.net, andrealmeid@igalia.com, Andrew Morton , urezki@gmail.com, hch@infradead.org, lstoakes@gmail.com, Arnd Bergmann , linux-api@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org, malteskarupke@web.de Subject: [PATCH v2 14/14] futex,selftests: Extend the futex selftests References: <20230807121843.710612856@infradead.org> MIME-Version: 1.0 
Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Extend the wait/requeue selftests to also cover the futex2 syscalls. Signed-off-by: Peter Zijlstra (Intel) --- tools/testing/selftests/futex/functional/futex_requeue.c | 100 ++= +++++++- tools/testing/selftests/futex/functional/futex_wait.c | 56 ++= ++- tools/testing/selftests/futex/functional/futex_wait_timeout.c | 14 + tools/testing/selftests/futex/functional/futex_wait_wouldblock.c | 28 ++ tools/testing/selftests/futex/functional/futex_waitv.c | 15 - tools/testing/selftests/futex/functional/run.sh | 6=20 tools/testing/selftests/futex/include/futex2test.h | 39 +++ 7 files changed, 229 insertions(+), 29 deletions(-) --- a/tools/testing/selftests/futex/functional/futex_requeue.c +++ b/tools/testing/selftests/futex/functional/futex_requeue.c @@ -7,8 +7,10 @@ =20 #include #include +#include #include "logging.h" #include "futextest.h" +#include "futex2test.h" =20 #define TEST_NAME "futex-requeue" #define timeout_ns 30000000 @@ -16,24 +18,58 @@ =20 volatile futex_t *f1; =20 +bool futex2 =3D 0; +bool mixed =3D 0; + void usage(char *prog) { printf("Usage: %s\n", prog); printf(" -c Use color\n"); + printf(" -n Use futex2 interface\n"); + printf(" -x Use mixed size futex\n"); printf(" -h Display this help message\n"); printf(" -v L Verbosity level: %d=3DQUIET %d=3DCRITICAL %d=3DINFO\n", VQUIET, VCRITICAL, VINFO); } =20 -void *waiterfn(void *arg) +static void *waiterfn(void *arg) { + unsigned int flags =3D 0; struct timespec to; =20 - to.tv_sec =3D 0; - to.tv_nsec =3D timeout_ns; + if (futex2) { + unsigned long mask; + + if (clock_gettime(CLOCK_MONOTONIC, &to)) { + printf("clock_gettime() failed errno %d", errno); + return NULL; + } + + to.tv_nsec +=3D timeout_ns; + if (to.tv_nsec >=3D 1000000000) { + to.tv_sec++; + to.tv_nsec -=3D 1000000000; + } + + if (mixed) { + flags |=3D FUTEX2_SIZE_U16; + mask =3D (unsigned 
short)(~0U); + } else { + flags |=3D FUTEX2_SIZE_U32; + mask =3D (unsigned int)(~0U); + } + + if (futex2_wait(f1, *f1, mask, flags, + &to, CLOCK_MONOTONIC)) + printf("waiter failed errno %d\n", errno); + } else { + + to.tv_sec =3D 0; + to.tv_nsec =3D timeout_ns; =20 - if (futex_wait(f1, *f1, &to, 0)) - printf("waiter failed errno %d\n", errno); + if (futex_wait(f1, *f1, &to, flags)) + printf("waiter failed errno %d\n", errno); + } =20 return NULL; } @@ -48,7 +84,7 @@ int main(int argc, char *argv[]) =20 f1 =3D &_f1; =20 - while ((c =3D getopt(argc, argv, "cht:v:")) !=3D -1) { + while ((c =3D getopt(argc, argv, "xncht:v:")) !=3D -1) { switch (c) { case 'c': log_color(1); @@ -59,6 +95,12 @@ int main(int argc, char *argv[]) case 'v': log_verbosity(atoi(optarg)); break; + case 'x': + mixed=3D1; + /* fallthrough */ + case 'n': + futex2=3D1; + break; default: usage(basename(argv[0])); exit(1); @@ -79,7 +121,22 @@ int main(int argc, char *argv[]) usleep(WAKE_WAIT_US); =20 info("Requeuing 1 futex from f1 to f2\n"); - res =3D futex_cmp_requeue(f1, 0, &f2, 0, 1, 0); + if (futex2) { + struct futex_waitv futexes[2] =3D { + { + .val =3D 0, + .uaddr =3D (unsigned long)f1, + .flags =3D mixed ? FUTEX2_SIZE_U16 : FUTEX2_SIZE_U32, + }, + { + .uaddr =3D (unsigned long)&f2, + .flags =3D FUTEX2_SIZE_U32, + }, + }; + res =3D futex2_requeue(futexes, 0, 0, 1); + } else { + res =3D futex_cmp_requeue(f1, 0, &f2, 0, 1, 0); + } if (res !=3D 1) { ksft_test_result_fail("futex_requeue simple returned: %d %s\n", res ? errno : res, @@ -89,7 +146,11 @@ int main(int argc, char *argv[]) =20 =20 info("Waking 1 futex at f2\n"); - res =3D futex_wake(&f2, 1, 0); + if (futex2) { + res =3D futex2_wake(&f2, ~0U, 1, FUTEX2_SIZE_U32); + } else { + res =3D futex_wake(&f2, 1, 0); + } if (res !=3D 1) { ksft_test_result_fail("futex_requeue simple returned: %d %s\n", res ? 
errno : res, @@ -112,7 +173,22 @@ int main(int argc, char *argv[]) usleep(WAKE_WAIT_US); =20 info("Waking 3 futexes at f1 and requeuing 7 futexes from f1 to f2\n"); - res =3D futex_cmp_requeue(f1, 0, &f2, 3, 7, 0); + if (futex2) { + struct futex_waitv futexes[2] =3D { + { + .val =3D 0, + .uaddr =3D (unsigned long)f1, + .flags =3D mixed ? FUTEX2_SIZE_U16 : FUTEX2_SIZE_U32, + }, + { + .uaddr =3D (unsigned long)&f2, + .flags =3D FUTEX2_SIZE_U32, + }, + }; + res =3D futex2_requeue(futexes, 0, 3, 7); + } else { + res =3D futex_cmp_requeue(f1, 0, &f2, 3, 7, 0); + } if (res !=3D 10) { ksft_test_result_fail("futex_requeue many returned: %d %s\n", res ? errno : res, @@ -121,7 +197,11 @@ int main(int argc, char *argv[]) } =20 info("Waking INT_MAX futexes at f2\n"); - res =3D futex_wake(&f2, INT_MAX, 0); + if (futex2) { + res =3D futex2_wake(&f2, ~0U, INT_MAX, FUTEX2_SIZE_U32); + } else { + res =3D futex_wake(&f2, INT_MAX, 0); + } if (res !=3D 7) { ksft_test_result_fail("futex_requeue many returned: %d %s\n", res ? 
errno : res, --- a/tools/testing/selftests/futex/functional/futex_wait.c +++ b/tools/testing/selftests/futex/functional/futex_wait.c @@ -9,8 +9,10 @@ #include #include #include +#include #include "logging.h" #include "futextest.h" +#include "futex2test.h" =20 #define TEST_NAME "futex-wait" #define timeout_ns 30000000 @@ -19,10 +21,13 @@ =20 void *futex; =20 +bool futex2 =3D 0; + void usage(char *prog) { printf("Usage: %s\n", prog); printf(" -c Use color\n"); + printf(" -n Use futex2 interface\n"); printf(" -h Display this help message\n"); printf(" -v L Verbosity level: %d=3DQUIET %d=3DCRITICAL %d=3DINFO\n", VQUIET, VCRITICAL, VINFO); @@ -30,17 +35,35 @@ void usage(char *prog) =20 static void *waiterfn(void *arg) { - struct timespec to; unsigned int flags =3D 0; + struct timespec to; =20 if (arg) flags =3D *((unsigned int *) arg); =20 - to.tv_sec =3D 0; - to.tv_nsec =3D timeout_ns; + if (futex2) { + if (clock_gettime(CLOCK_MONOTONIC, &to)) { + printf("clock_gettime() failed errno %d", errno); + return NULL; + } =20 - if (futex_wait(futex, 0, &to, flags)) - printf("waiter failed errno %d\n", errno); + to.tv_nsec +=3D timeout_ns; + if (to.tv_nsec >=3D 1000000000) { + to.tv_sec++; + to.tv_nsec -=3D 1000000000; + } + + if (futex2_wait(futex, 0, ~0U, flags | FUTEX2_SIZE_U32, + &to, CLOCK_MONOTONIC)) + printf("waiter failed errno %d\n", errno); + } else { + + to.tv_sec =3D 0; + to.tv_nsec =3D timeout_ns; + + if (futex_wait(futex, 0, &to, flags)) + printf("waiter failed errno %d\n", errno); + } =20 return NULL; } @@ -55,7 +78,7 @@ int main(int argc, char *argv[]) =20 futex =3D &f_private; =20 - while ((c =3D getopt(argc, argv, "cht:v:")) !=3D -1) { + while ((c =3D getopt(argc, argv, "ncht:v:")) !=3D -1) { switch (c) { case 'c': log_color(1); @@ -66,6 +89,9 @@ int main(int argc, char *argv[]) case 'v': log_verbosity(atoi(optarg)); break; + case 'n': + futex2=3D1; + break; default: usage(basename(argv[0])); exit(1); @@ -84,7 +110,11 @@ int main(int argc, char *argv[]) 
usleep(WAKE_WAIT_US); =20 info("Calling private futex_wake on futex: %p\n", futex); - res =3D futex_wake(futex, 1, FUTEX_PRIVATE_FLAG); + if (futex2) { + res =3D futex2_wake(futex, ~0U, 1, FUTEX2_SIZE_U32 | FUTEX2_PRIVATE); + } else { + res =3D futex_wake(futex, 1, FUTEX_PRIVATE_FLAG); + } if (res !=3D 1) { ksft_test_result_fail("futex_wake private returned: %d %s\n", errno, strerror(errno)); @@ -112,7 +142,11 @@ int main(int argc, char *argv[]) usleep(WAKE_WAIT_US); =20 info("Calling shared (page anon) futex_wake on futex: %p\n", futex); - res =3D futex_wake(futex, 1, 0); + if (futex2) { + res =3D futex2_wake(futex, ~0U, 1, FUTEX2_SIZE_U32); + } else { + res =3D futex_wake(futex, 1, 0); + } if (res !=3D 1) { ksft_test_result_fail("futex_wake shared (page anon) returned: %d %s\n", errno, strerror(errno)); @@ -151,7 +185,11 @@ int main(int argc, char *argv[]) usleep(WAKE_WAIT_US); =20 info("Calling shared (file backed) futex_wake on futex: %p\n", futex); - res =3D futex_wake(shm, 1, 0); + if (futex2) { + res =3D futex2_wake(shm, ~0U, 1, FUTEX2_SIZE_U32); + } else { + res =3D futex_wake(shm, 1, 0); + } if (res !=3D 1) { ksft_test_result_fail("futex_wake shared (file backed) returned: %d %s\n= ", errno, strerror(errno)); --- a/tools/testing/selftests/futex/functional/futex_wait_timeout.c +++ b/tools/testing/selftests/futex/functional/futex_wait_timeout.c @@ -125,7 +125,7 @@ int main(int argc, char *argv[]) } =20 ksft_print_header(); - ksft_set_plan(9); + ksft_set_plan(11); ksft_print_msg("%s: Block on a futex and wait for timeout\n", basename(argv[0])); ksft_print_msg("\tArguments: timeout=3D%ldns\n", timeout_ns); @@ -194,6 +194,18 @@ int main(int argc, char *argv[]) res =3D futex_waitv(&waitv, 1, 0, &to, CLOCK_REALTIME); test_timeout(res, &ret, "futex_waitv realtime", ETIMEDOUT); =20 + /* futex2_wait with CLOCK_MONOTONIC */ + if (futex_get_abs_timeout(CLOCK_MONOTONIC, &to, timeout_ns)) + return RET_FAIL; + res =3D futex2_wait(&f1, f1, 1, FUTEX2_SIZE_U32, &to, 
CLOCK_MONOTONIC); + test_timeout(res, &ret, "futex2_wait monotonic", ETIMEDOUT); + + /* futex2_wait with CLOCK_REALTIME */ + if (futex_get_abs_timeout(CLOCK_REALTIME, &to, timeout_ns)) + return RET_FAIL; + res =3D futex2_wait(&f1, f1, 1, FUTEX2_SIZE_U32, &to, CLOCK_REALTIME); + test_timeout(res, &ret, "futex2_wait realtime", ETIMEDOUT); + ksft_print_cnts(); return ret; } --- a/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c +++ b/tools/testing/selftests/futex/functional/futex_wait_wouldblock.c @@ -46,7 +46,7 @@ int main(int argc, char *argv[]) struct futex_waitv waitv =3D { .uaddr =3D (uintptr_t)&f1, .val =3D f1+1, - .flags =3D FUTEX_32, + .flags =3D FUTEX2_SIZE_U32 | FUTEX2_PRIVATE, .__reserved =3D 0 }; =20 @@ -68,7 +68,7 @@ int main(int argc, char *argv[]) } =20 ksft_print_header(); - ksft_set_plan(2); + ksft_set_plan(3); ksft_print_msg("%s: Test the unexpected futex value in FUTEX_WAIT\n", basename(argv[0])); =20 @@ -106,6 +106,30 @@ int main(int argc, char *argv[]) ksft_test_result_pass("futex_waitv\n"); } =20 + if (clock_gettime(CLOCK_MONOTONIC, &to)) { + error("clock_gettime failed\n", errno); + return errno; + } + + to.tv_nsec +=3D timeout_ns; + + if (to.tv_nsec >=3D 1000000000) { + to.tv_sec++; + to.tv_nsec -=3D 1000000000; + } + + info("Calling futex2_wait on f1: %u @ %p with val=3D%u\n", f1, &f1, f1+1); + res =3D futex2_wait(&f1, f1+1, ~0U, FUTEX2_SIZE_U32 | FUTEX2_PRIVATE, + &to, CLOCK_MONOTONIC); + if (!res || errno !=3D EWOULDBLOCK) { + ksft_test_result_fail("futex2_wait returned: %d %s\n", + res ? errno : res, + res ?
strerror(errno) : ""); + ret =3D RET_FAIL; + } else { + ksft_test_result_pass("futex2_wait\n"); + } + ksft_print_cnts(); return ret; } --- a/tools/testing/selftests/futex/functional/futex_waitv.c +++ b/tools/testing/selftests/futex/functional/futex_waitv.c @@ -88,7 +88,7 @@ int main(int argc, char *argv[]) =20 for (i =3D 0; i < NR_FUTEXES; i++) { waitv[i].uaddr =3D (uintptr_t)&futexes[i]; - waitv[i].flags =3D FUTEX_32 | FUTEX_PRIVATE_FLAG; + waitv[i].flags =3D FUTEX2_SIZE_U32 | FUTEX2_PRIVATE; waitv[i].val =3D 0; waitv[i].__reserved =3D 0; } @@ -99,7 +99,8 @@ int main(int argc, char *argv[]) =20 usleep(WAKE_WAIT_US); =20 - res =3D futex_wake(u64_to_ptr(waitv[NR_FUTEXES - 1].uaddr), 1, FUTEX_PRIV= ATE_FLAG); + res =3D futex2_wake(u64_to_ptr(waitv[NR_FUTEXES - 1].uaddr), ~0U, 1, + FUTEX2_PRIVATE | FUTEX2_SIZE_U32); if (res !=3D 1) { ksft_test_result_fail("futex_wake private returned: %d %s\n", res ? errno : res, @@ -122,7 +123,7 @@ int main(int argc, char *argv[]) =20 *shared_data =3D 0; waitv[i].uaddr =3D (uintptr_t)shared_data; - waitv[i].flags =3D FUTEX_32; + waitv[i].flags =3D FUTEX2_SIZE_U32; waitv[i].val =3D 0; waitv[i].__reserved =3D 0; } @@ -145,8 +146,8 @@ int main(int argc, char *argv[]) for (i =3D 0; i < NR_FUTEXES; i++) shmdt(u64_to_ptr(waitv[i].uaddr)); =20 - /* Testing a waiter without FUTEX_32 flag */ - waitv[0].flags =3D FUTEX_PRIVATE_FLAG; + /* Testing a waiter without FUTEX2_SIZE_U32 flag */ + waitv[0].flags =3D FUTEX2_PRIVATE; =20 if (clock_gettime(CLOCK_MONOTONIC, &to)) error("gettime64 failed\n", errno); @@ -160,11 +161,11 @@ int main(int argc, char *argv[]) res ? 
strerror(errno) : ""); ret =3D RET_FAIL; } else { - ksft_test_result_pass("futex_waitv without FUTEX_32\n"); + ksft_test_result_pass("futex_waitv without FUTEX2_SIZE_U32\n"); } =20 /* Testing a waiter with an unaligned address */ - waitv[0].flags =3D FUTEX_PRIVATE_FLAG | FUTEX_32; + waitv[0].flags =3D FUTEX2_PRIVATE | FUTEX2_SIZE_U32; waitv[0].uaddr =3D 1; =20 if (clock_gettime(CLOCK_MONOTONIC, &to)) --- a/tools/testing/selftests/futex/functional/run.sh +++ b/tools/testing/selftests/futex/functional/run.sh @@ -76,9 +76,15 @@ echo =20 echo ./futex_wait $COLOR +echo +./futex_wait -n $COLOR =20 echo ./futex_requeue $COLOR +echo +./futex_requeue -n $COLOR +echo +./futex_requeue -x $COLOR =20 echo ./futex_waitv $COLOR --- a/tools/testing/selftests/futex/include/futex2test.h +++ b/tools/testing/selftests/futex/include/futex2test.h @@ -8,6 +8,28 @@ =20 #define u64_to_ptr(x) ((void *)(uintptr_t)(x)) =20 +#ifndef __NR_futex_wake +#define __NR_futex_wake 452 +#define __NR_futex_wait 453 +#define __NR_futex_requeue 454 +#endif + +#ifndef FUTEX2_SIZE_U8 +/* + * Flags for futex2 syscalls. 
+ */ +#define FUTEX2_SIZE_U8 0x00 +#define FUTEX2_SIZE_U16 0x01 +#define FUTEX2_SIZE_U32 0x02 +#define FUTEX2_SIZE_U64 0x03 +#define FUTEX2_NUMA 0x04 + /* 0x08 */ + /* 0x10 */ + /* 0x20 */ + /* 0x40 */ +#define FUTEX2_PRIVATE FUTEX_PRIVATE_FLAG +#endif + /** * futex_waitv - Wait at multiple futexes, wake on any * @waiters: Array of waiters @@ -20,3 +42,20 @@ static inline int futex_waitv(volatile s { return syscall(__NR_futex_waitv, waiters, nr_waiters, flags, timo, clocki= d); } + +static inline int futex2_wake(volatile void *uaddr, unsigned long mask, in= t nr, unsigned int flags) +{ + return syscall(__NR_futex_wake, uaddr, mask, nr, flags); +} + +static inline int futex2_wait(volatile void *uaddr, unsigned long val, uns= igned long mask, + unsigned int flags, struct timespec *timo, clockid_t clockid) +{ + return syscall(__NR_futex_wait, uaddr, val, mask, flags, timo, clockid); +} + +static inline int futex2_requeue(struct futex_waitv *futexes, unsigned int= flags, + int nr_wake, int nr_requeue) +{ + return syscall(__NR_futex_requeue, futexes, flags, nr_wake, nr_requeue); +}
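
The FUTEX2_SIZE_* fallback values in futex2test.h above and the masks the selftests pass by hand (for example (unsigned short)(~0U) for FUTEX2_SIZE_U16) appear to follow one rule, which the following minimal sketch tries to spell out. It is an illustration only and is not taken from this series: SKETCH_SIZE_MASK, sketch_futex_size() and sketch_futex_mask() are hypothetical names, and the assumption that the size is carried in the low two flag bits is inferred from the values 0x00..0x03.

```c
#include <assert.h>
#include <stdint.h>

/* Size flags as listed in the futex2test.h fallback definitions. */
#ifndef FUTEX2_SIZE_U8
#define FUTEX2_SIZE_U8		0x00
#define FUTEX2_SIZE_U16		0x01
#define FUTEX2_SIZE_U32		0x02
#define FUTEX2_SIZE_U64		0x03
#endif

/* Assumption: the size is encoded in the low two bits of the flags. */
#define SKETCH_SIZE_MASK	0x03

/* Hypothetical helper: width of the futex word in bytes (1, 2, 4 or 8). */
static inline unsigned int sketch_futex_size(unsigned int flags)
{
	return 1U << (flags & SKETCH_SIZE_MASK);
}

/* Hypothetical helper: all-ones value mask for a futex word of that width. */
static inline uint64_t sketch_futex_mask(unsigned int flags)
{
	unsigned int bits = 8 * sketch_futex_size(flags);

	/* Shifting a 64-bit value by 64 is undefined, so special-case it. */
	return bits == 64 ? ~0ULL : (1ULL << bits) - 1;
}
```

Under these assumptions, sketch_futex_mask(FUTEX2_SIZE_U16) yields 0xffff, the same value the mixed-size requeue test computes with the (unsigned short) cast, and any other flag bits are ignored when deriving the width.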