[[PATCH v3] 1/4] man/man2/prctl.2, man/man2const/PR_FUTEX_HASH.2const: Document PR_FUTEX_HASH

Sebastian Andrzej Siewior posted 1 patch 6 months, 3 weeks ago
Posted by Sebastian Andrzej Siewior 6 months, 3 weeks ago
The prctl(PR_FUTEX_HASH) is queued for the v6.16 merge window.
Add some documentation of the interface.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 man/man2/prctl.2                   |  3 +
 man/man2const/PR_FUTEX_HASH.2const | 92 ++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)
 create mode 100644 man/man2const/PR_FUTEX_HASH.2const

diff --git a/man/man2/prctl.2 b/man/man2/prctl.2
index cb5e75bf79ab2..ddfd1d1f5b940 100644
--- a/man/man2/prctl.2
+++ b/man/man2/prctl.2
@@ -150,6 +150,8 @@ with a significance depending on the first one.
 .B PR_GET_MDWE
 .TQ
 .B PR_RISCV_SET_ICACHE_FLUSH_CTX
+.TQ
+.B PR_FUTEX_HASH
 .SH RETURN VALUE
 On success,
 a nonnegative value is returned.
@@ -262,4 +264,5 @@ so these operations should be used with care.
 .BR PR_SET_MDWE (2const),
 .BR PR_GET_MDWE (2const),
 .BR PR_RISCV_SET_ICACHE_FLUSH_CTX (2const),
+.BR PR_FUTEX_HASH (2const),
 .BR core (5)
diff --git a/man/man2const/PR_FUTEX_HASH.2const b/man/man2const/PR_FUTEX_HASH.2const
new file mode 100644
index 0000000000000..c27adcb73d079
--- /dev/null
+++ b/man/man2const/PR_FUTEX_HASH.2const
@@ -0,0 +1,92 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH PR_FUTEX_HASH 2const (date) "Linux man-pages (unreleased)"
+.SH NAME
+PR_FUTEX_HASH
+\-
+configure the private futex hash
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <linux/prctl.h>" "  /* Definition of " PR_* " constants */"
+.B #include <sys/prctl.h>
+.P
+.BI "int prctl(PR_FUTEX_HASH, unsigned long " op ", ...);"
+.fi
+.SH DESCRIPTION
+Configure the attributes for the underlying hash used by the
+.BR futex (2)
+family of operations.
+The Linux kernel uses a hash to distribute the unrelated
+.BR futex (2)
+requests to different data structures
+in order to reduce the lock contention.
+Unrelated requests are requests which are not related to one another
+because they use a different
+.I uaddr
+value of the syscall or the requests are issued by different processes
+and the
+.B FUTEX_PRIVATE_FLAG
+option is set.
+The data structure holds the in-kernel representation of the operation and
+keeps track of the current users which are enqueued and wait for a wake up.
+It also provides synchronisation of waiters against wakers.
+The size of the global hash is determined at boot time
+and is based on the number of CPUs in the system.
+Due to hash collision two unrelated
+.BR futex (2)
+requests can share the same hash bucket.
+This in turn can lead to delays of the
+.BR futex (2)
+operation due to lock contention while accessing the data structure.
+These delays can be problematic on a real-time system
+since random processes can
+share in-kernel locks
+and it is not deterministic which process will be involved.
+.P
+Linux 6.16 implements a process-wide private hash which is used by all
+.BR futex (2)
+operations that specify the
+.B FUTEX_PRIVATE_FLAG
+option as part of the operation.
+Without any configuration
+the kernel will allocate 16 hash slots
+once the first thread has been created.
+If the process continues to create threads,
+the kernel will try to resize the private hash based on the number of threads
+and available CPUs in the system.
+The kernel will only increase the size and will make sure it does not exceed
+the size of the global hash.
+.P
+The user can configure the size of the private hash which will also disable the
+automatic resize provided by the kernel.
+.P
+The value in
+.I op
+is one of the options below.
+.TP
+.B PR_FUTEX_HASH_GET_IMMUTABLE
+.TQ
+.B PR_FUTEX_HASH_GET_SLOTS
+.TQ
+.B PR_FUTEX_HASH_SET_SLOTS
+.SH RETURN VALUE
+On success,
+these calls return a nonnegative value.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 6.16.
+.SH SEE ALSO
+.BR prctl (2),
+.BR futex (2),
+.BR PR_FUTEX_HASH_GET_IMMUTABLE (2const),
+.BR PR_FUTEX_HASH_GET_SLOTS (2const),
+.BR PR_FUTEX_HASH_SET_SLOTS (2const)
-- 
2.49.0
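
For readers following the thread, here is a minimal C sketch of how the
interface documented in this patch might be called. It is not part of the
patch. The helper names are hypothetical. The numeric fallbacks (78, 1, 2)
are assumed to match the Linux 6.16 <linux/prctl.h> and should be checked
against the installed kernel headers. The layout of the variadic arguments
(slot count third, a zero flags word fourth) is likewise an assumption,
since the page above only shows "..." after op.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Fallbacks for C libraries whose headers predate Linux 6.16.
 * ASSUMPTION: these values match the 6.16 UAPI header; verify locally.
 */
#ifndef PR_FUTEX_HASH
#define PR_FUTEX_HASH 78
#endif
#ifndef PR_FUTEX_HASH_SET_SLOTS
#define PR_FUTEX_HASH_SET_SLOTS 1
#endif
#ifndef PR_FUTEX_HASH_GET_SLOTS
#define PR_FUTEX_HASH_GET_SLOTS 2
#endif

/*
 * Hypothetical helper: ask for the current number of private hash slots.
 * Returns a nonnegative value on success, or -1 with errno set
 * (for example on a kernel that does not know PR_FUTEX_HASH).
 */
int futex_hash_get_slots(void)
{
    return prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_GET_SLOTS, 0UL, 0UL, 0UL);
}

/*
 * Hypothetical helper: request a fixed private hash size.
 * ASSUMPTION: the slot count is the third prctl() argument and the
 * fourth argument is a flags word that is left at zero here.
 * Returns a nonnegative value on success, or -1 with errno set.
 */
int futex_hash_set_slots(unsigned long slots)
{
    return prctl(PR_FUTEX_HASH, PR_FUTEX_HASH_SET_SLOTS, slots, 0UL, 0UL);
}
```

A caller would typically check for -1 and inspect errno before relying on
the result, because older kernels reject the operation.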
Re: [[PATCH v3] 1/4] man/man2/prctl.2, man/man2const/PR_FUTEX_HASH.2const: Document PR_FUTEX_HASH
Posted by Alejandro Colomar 6 months, 3 weeks ago
Hi Sebastian,

On Mon, May 26, 2025 at 05:55:20PM +0200, Sebastian Andrzej Siewior wrote:
> The prctl(PR_FUTEX_HASH) is queued for the v6.16 merge window.
> Add some documentation of the interface.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  man/man2/prctl.2                   |  3 +
>  man/man2const/PR_FUTEX_HASH.2const | 92 ++++++++++++++++++++++++++++++
>  2 files changed, 95 insertions(+)
>  create mode 100644 man/man2const/PR_FUTEX_HASH.2const
> 
> diff --git a/man/man2/prctl.2 b/man/man2/prctl.2
> index cb5e75bf79ab2..ddfd1d1f5b940 100644
> --- a/man/man2/prctl.2
> +++ b/man/man2/prctl.2
> @@ -150,6 +150,8 @@ with a significance depending on the first one.
>  .B PR_GET_MDWE
>  .TQ
>  .B PR_RISCV_SET_ICACHE_FLUSH_CTX
> +.TQ
> +.B PR_FUTEX_HASH
>  .SH RETURN VALUE
>  On success,
>  a nonnegative value is returned.
> @@ -262,4 +264,5 @@ so these operations should be used with care.
>  .BR PR_SET_MDWE (2const),
>  .BR PR_GET_MDWE (2const),
>  .BR PR_RISCV_SET_ICACHE_FLUSH_CTX (2const),
> +.BR PR_FUTEX_HASH (2const),
>  .BR core (5)
> diff --git a/man/man2const/PR_FUTEX_HASH.2const b/man/man2const/PR_FUTEX_HASH.2const
> new file mode 100644
> index 0000000000000..c27adcb73d079
> --- /dev/null
> +++ b/man/man2const/PR_FUTEX_HASH.2const
> @@ -0,0 +1,92 @@
> +.\" Copyright, the authors of the Linux man-pages project
> +.\"
> +.\" SPDX-License-Identifier: Linux-man-pages-copyleft
> +.\"
> +.TH PR_FUTEX_HASH 2const (date) "Linux man-pages (unreleased)"
> +.SH NAME
> +PR_FUTEX_HASH
> +\-
> +configure the private futex hash
> +.SH LIBRARY
> +Standard C library
> +.RI ( libc ,\~ \-lc )
> +.SH SYNOPSIS
> +.nf
> +.BR "#include <linux/prctl.h>" "  /* Definition of " PR_* " constants */"
> +.B #include <sys/prctl.h>
> +.P
> +.BI "int prctl(PR_FUTEX_HASH, unsigned long " op ", ...);"
> +.fi
> +.SH DESCRIPTION
> +Configure the attributes for the underlying hash used by the
> +.BR futex (2)
> +family of operations.
> +The Linux kernel uses a hash to distribute the unrelated
> +.BR futex (2)
> +requests to different data structures
> +in order to reduce the lock contention.
> +Unrelated requests are requests which are not related to one another
> +because they use a different
> +.I uaddr
> +value of the syscall or the requests are issued by different processes

I think 'use a different uaddr value of the syscall' is technically
incorrect, because two processes may have a different address for the
same futex word, as their address space is different, right?

See futex(2):

$ MANWIDTH=72 man futex | grep -B7 -A5 different.v

     A futex is a 32‐bit value——referred to below  as  a  futex  word——
     whose  address  is  supplied to the futex() system call.  (Futexes
     are 32 bits in size on all platforms, including  64‐bit  systems.)
     All  futex  operations  are  governed  by this value.  In order to
     share a futex between processes, the futex is placed in  a  region
     of shared memory, created using (for example) mmap(2) or shmat(2).
     (Thus, the futex word may have different virtual addresses in dif‐
     ferent  processes, but these addresses all refer to the same loca‐
     tion in physical memory.)  In a multithreaded program, it is  suf‐
     ficient to place the futex word in a global variable shared by all
     threads.

Maybe say 'use a different futex word'?

> +and the
> +.B FUTEX_PRIVATE_FLAG
> +option is set.

By referring to a different futex word, this is already implied, so we
can drop it.

> +The data structure holds the in-kernel representation of the operation and
> +keeps track of the current users which are enqueued and wait for a wake up.
> +It also provides synchronisation of waiters against wakers.
> +The size of the global hash is determined at boot time
> +and is based on the number of CPUs in the system.
> +Due to hash collision two unrelated

s/ two/, two/

> +.BR futex (2)
> +requests can share the same hash bucket.
> +This in turn can lead to delays of the
> +.BR futex (2)
> +operation due to lock contention while accessing the data structure.
> +These delays can be problematic on a real-time system
> +since random processes can
> +share in-kernel locks
> +and it is not deterministic which process will be involved.
> +.P
> +Linux 6.16 implements a process-wide private hash which is used by all
> +.BR futex (2)
> +operations that specify the
> +.B FUTEX_PRIVATE_FLAG
> +option as part of the operation.
> +Without any configuration
> +the kernel will allocate 16 hash slots
> +once the first thread has been created.
> +If the process continues to create threads,
> +the kernel will try to resize the private hash based on the number of threads
> +and available CPUs in the system.
> +The kernel will only increase the size and will make sure it does not exceed
> +the size of the global hash.
> +.P
> +The user can configure the size of the private hash which will also disable the

s/hash/\nhash/

> +automatic resize provided by the kernel.
> +.P
> +The value in
> +.I op
> +is one of the options below.
> +.TP
> +.B PR_FUTEX_HASH_GET_IMMUTABLE
> +.TQ
> +.B PR_FUTEX_HASH_GET_SLOTS
> +.TQ
> +.B PR_FUTEX_HASH_SET_SLOTS
> +.SH RETURN VALUE
> +On success,
> +these calls return a nonnegative value.
> +On error, \-1 is returned, and
> +.I errno
> +is set to indicate the error.
> +.SH STANDARDS
> +Linux.
> +.SH HISTORY
> +Linux 6.16.
> +.SH SEE ALSO
> +.BR prctl (2),
> +.BR futex (2),
> +.BR PR_FUTEX_HASH_GET_IMMUTABLE (2const),
> +.BR PR_FUTEX_HASH_GET_SLOTS (2const),
> +.BR PR_FUTEX_HASH_SET_SLOTS (2const)
> -- 
> 2.49.0

Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es/>
Re: [[PATCH v3] 1/4] man/man2/prctl.2, man/man2const/PR_FUTEX_HASH.2const: Document PR_FUTEX_HASH
Posted by Sebastian Andrzej Siewior 6 months, 3 weeks ago
On 2025-05-30 11:51:58 [+0200], Alejandro Colomar wrote:
> Hi Sebastian,
Hi Alejandro,

> > diff --git a/man/man2const/PR_FUTEX_HASH.2const b/man/man2const/PR_FUTEX_HASH.2const
> > new file mode 100644
> > index 0000000000000..c27adcb73d079
> > --- /dev/null
> > +++ b/man/man2const/PR_FUTEX_HASH.2const
…
> > +Unrelated requests are requests which are not related to one another
> > +because they use a different
> > +.I uaddr
> > +value of the syscall or the requests are issued by different processes
> 
> I think 'use a different uaddr value of the syscall' is technically
> incorrect, because two processes may have a different address for the
> same futex word, as their address space is different, right?

A shared futex over shared memory. Yes.
 
> See futex(2):
> 
> $ MANWIDTH=72 man futex | grep -B7 -A5 different.v
> 
>      A futex is a 32‐bit value——referred to below  as  a  futex  word——
>      whose  address  is  supplied to the futex() system call.  (Futexes
>      are 32 bits in size on all platforms, including  64‐bit  systems.)
>      All  futex  operations  are  governed  by this value.  In order to
>      share a futex between processes, the futex is placed in  a  region
>      of shared memory, created using (for example) mmap(2) or shmat(2).
>      (Thus, the futex word may have different virtual addresses in dif‐
>      ferent  processes, but these addresses all refer to the same loca‐
>      tion in physical memory.)  In a multithreaded program, it is  suf‐
>      ficient to place the futex word in a global variable shared by all
>      threads.
> 
> Maybe say 'use a different futex word'?

Oh yes, this would make it simpler to express.

Sebastian
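
The point agreed on above (one futex word, several virtual addresses) can be
illustrated with a small C sketch. It is an illustration only, not part of
the patch: instead of two processes, it maps the same page of a temporary
file twice with MAP_SHARED inside one process, which is an assumed
simplification of the shared-memory case quoted from futex(2). The function
name is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one page of an unnamed temporary file twice.
 * Returns 1 if the two mappings have different virtual addresses but
 * alias the same 32-bit word, 0 if they do not, and -1 if the setup
 * (temporary file, resize, or mmap) fails.
 */
int futex_word_aliases(void)
{
    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;

    int fd = fileno(tmp);
    long page = sysconf(_SC_PAGESIZE);
    if (fd < 0 || page <= 0 || ftruncate(fd, page) != 0) {
        fclose(tmp);
        return -1;
    }

    void *ma = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    void *mb = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
    if (ma == MAP_FAILED || mb == MAP_FAILED) {
        if (ma != MAP_FAILED)
            munmap(ma, (size_t)page);
        if (mb != MAP_FAILED)
            munmap(mb, (size_t)page);
        fclose(tmp);
        return -1;
    }

    /* volatile: force real loads and stores through both mappings */
    volatile uint32_t *a = ma;
    volatile uint32_t *b = mb;

    *a = 0x12345678u;                      /* store through mapping A */
    int result = (ma != mb) && (*b == 0x12345678u); /* load through B */

    munmap(ma, (size_t)page);
    munmap(mb, (size_t)page);
    fclose(tmp);
    return result;
}
```

Comparing uaddr values alone would treat the two mappings as unrelated even
though they name the same word, which is why "a different futex word" is the
more accurate wording.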