[PATCH v2] PR_*ET_THP_DISABLE.2const: document addition of PR_THP_DISABLE_EXCEPT_ADVISED

Usama Arif posted 1 patch 1 month, 1 week ago
man/man2/madvise.2                      |  6 ++-
man/man2const/PR_GET_THP_DISABLE.2const | 20 +++++++---
man/man2const/PR_SET_THP_DISABLE.2const | 52 +++++++++++++++++++++----
3 files changed, 64 insertions(+), 14 deletions(-)
Posted by Usama Arif 1 month, 1 week ago
PR_THP_DISABLE_EXCEPT_ADVISED extends PR_SET_THP_DISABLE so that THPs are
provided only when advised. In other words, it lets an individual process
opt out of THP = "always" into THP = "madvise" without affecting other
workloads on the system. The series has been merged in [1]. Before [1],
only the following two calls were allowed with PR_SET_THP_DISABLE:

prctl(PR_SET_THP_DISABLE, 0, 0, 0, 0); // to reset THP setting.
prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0); // to disable THPs completely.

Now, in addition to the two calls above, you can do:

// to disable THPs except when advised via madvise.
prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, 0, 0);
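
The three calls can be wrapped in a small helper. This is only a minimal
sketch, not part of the patch: the names thp_policy, thp_policy_args() and
set_thp_policy() are hypothetical, and the fallback value (1 << 1) for
PR_THP_DISABLE_EXCEPT_ADVISED is an assumption for headers that predate [1].

```c
#include <errno.h>
#include <sys/prctl.h>

/* Assumed fallback for headers older than the kernel change in [1]. */
#ifndef PR_THP_DISABLE_EXCEPT_ADVISED
#define PR_THP_DISABLE_EXCEPT_ADVISED (1 << 1)
#endif

/* Hypothetical names for the three documented calls. */
enum thp_policy {
	THP_POLICY_RESET,          /* prctl(PR_SET_THP_DISABLE, 0, 0, 0, 0) */
	THP_POLICY_DISABLE_ALL,    /* prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) */
	THP_POLICY_EXCEPT_ADVISED  /* prctl(PR_SET_THP_DISABLE, 1,
	                              PR_THP_DISABLE_EXCEPT_ADVISED, 0, 0) */
};

/* Translate a policy into arg2/arg3.  Returns 0, or -1 if unknown. */
int thp_policy_args(enum thp_policy p, unsigned long *arg2, unsigned long *arg3)
{
	switch (p) {
	case THP_POLICY_RESET:
		*arg2 = 0; *arg3 = 0;
		return 0;
	case THP_POLICY_DISABLE_ALL:
		*arg2 = 1; *arg3 = 0;
		return 0;
	case THP_POLICY_EXCEPT_ADVISED:
		*arg2 = 1; *arg3 = PR_THP_DISABLE_EXCEPT_ADVISED;
		return 0;
	default:
		return -1;
	}
}

/* Apply the policy.  Returns what prctl(2) returns: 0, or -1 with errno set. */
int set_thp_policy(enum thp_policy p)
{
	unsigned long arg2, arg3;

	if (thp_policy_args(p, &arg2, &arg3) == -1) {
		errno = EINVAL;
		return -1;
	}
	return prctl(PR_SET_THP_DISABLE, arg2, arg3, 0L, 0L);
}
```

A caller should check the return value of
set_thp_policy(THP_POLICY_EXCEPT_ADVISED): a kernel without [1] is expected
to reject a nonzero arg3, so the call can fail there.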

This patch documents the changes introduced by the addition of the
PR_THP_DISABLE_EXCEPT_ADVISED flag:
- PR_GET_THP_DISABLE returns a value whose bits indicate how THP-disable
  is configured for the calling thread (with or without
  PR_THP_DISABLE_EXCEPT_ADVISED).
- PR_SET_THP_DISABLE now uses arg3 to specify whether to disable THP
  completely for the process, or to disable it except where advised via
  madvise (PR_THP_DISABLE_EXCEPT_ADVISED).

[1] https://github.com/torvalds/linux/commit/9dc21bbd62edeae6f63e6f25e1edb7167452457b

Signed-off-by: Usama Arif <usamaarif642@gmail.com>
---
v1 -> v2 (Alejandro Colomar):
- Fix double negation on when MADV_HUGEPAGE will succeed
- Turn return values of PR_GET_THP_DISABLE into a table
- Turn madvise calls into full italics
- Use semantic newlines
---
 man/man2/madvise.2                      |  6 ++-
 man/man2const/PR_GET_THP_DISABLE.2const | 20 +++++++---
 man/man2const/PR_SET_THP_DISABLE.2const | 52 +++++++++++++++++++++----
 3 files changed, 64 insertions(+), 14 deletions(-)

diff --git a/man/man2/madvise.2 b/man/man2/madvise.2
index 7a4310c40..55c6f4a6c 100644
--- a/man/man2/madvise.2
+++ b/man/man2/madvise.2
@@ -372,9 +372,11 @@ or
 .BR VM_PFNMAP ,
 nor can it be stack memory or backed by a DAX-enabled device
 (unless the DAX device is hot-plugged as System RAM).
-The process must also not have
+The process can have
 .B PR_SET_THP_DISABLE
-set (see
+set only if
+.B PR_THP_DISABLE_EXCEPT_ADVISED
+flag is set (see
 .BR prctl (2)).
 .IP
 The
diff --git a/man/man2const/PR_GET_THP_DISABLE.2const b/man/man2const/PR_GET_THP_DISABLE.2const
index 38ff3b370..d63cff21c 100644
--- a/man/man2const/PR_GET_THP_DISABLE.2const
+++ b/man/man2const/PR_GET_THP_DISABLE.2const
@@ -6,7 +6,7 @@
 .SH NAME
 PR_GET_THP_DISABLE
 \-
-get the state of the "THP disable" flag for the calling thread
+get the state of the "THP disable" flags for the calling thread
 .SH LIBRARY
 Standard C library
 .RI ( libc ,\~ \-lc )
@@ -18,13 +18,23 @@ Standard C library
 .B int prctl(PR_GET_THP_DISABLE, 0L, 0L, 0L, 0L);
 .fi
 .SH DESCRIPTION
-Return the current setting of
-the "THP disable" flag for the calling thread:
-either 1, if the flag is set, or 0, if it is not.
+Return a value whose bits indicate how THP-disable is configured
+for the calling thread.
+The returned value is interpreted as follows:
+.P
+.TS
+allbox;
+cb cb cb l
+c c c l.
+Bit 1	Bit 0	Value	Description
+0	0	0	No THP-disable behaviour specified.
+0	1	1	THP is entirely disabled for this process.
+1	1	3	THP-except-advised mode is set for this process.
+.TE
 .SH RETURN VALUE
 On success,
 .BR PR_GET_THP_DISABLE ,
-returns the boolean value described above.
+returns the value described above.
 On error, \-1 is returned, and
 .I errno
 is set to indicate the error.
diff --git a/man/man2const/PR_SET_THP_DISABLE.2const b/man/man2const/PR_SET_THP_DISABLE.2const
index 532beac66..75e17fa6a 100644
--- a/man/man2const/PR_SET_THP_DISABLE.2const
+++ b/man/man2const/PR_SET_THP_DISABLE.2const
@@ -6,7 +6,7 @@
 .SH NAME
 PR_SET_THP_DISABLE
 \-
-set the state of the "THP disable" flag for the calling thread
+set the state of the "THP disable" flags for the calling thread
 .SH LIBRARY
 Standard C library
 .RI ( libc ,\~ \-lc )
@@ -15,15 +15,20 @@ Standard C library
 .BR "#include <linux/prctl.h>" "  /* Definition of " PR_* " constants */"
 .B #include <sys/prctl.h>
 .P
-.BI "int prctl(PR_SET_THP_DISABLE, long " flag ", 0L, 0L, 0L);"
+.BI "int prctl(PR_SET_THP_DISABLE, long " thp_disable ", unsigned long " flags ", 0L, 0L);"
 .fi
 .SH DESCRIPTION
-Set the state of the "THP disable" flag for the calling thread.
+Set the state of the "THP disable" flags for the calling thread.
 If
-.I flag
-has a nonzero value, the flag is set, otherwise it is cleared.
+.I thp_disable
+has a nonzero value,
+the THP disable flag is set according to the value of
+.I flags,
+otherwise it is cleared.
 .P
-Setting this flag provides a method
+This
+.BR prctl (2)
+provides a method
 for disabling transparent huge pages
 for jobs where the code cannot be modified,
 and using a
@@ -31,10 +36,43 @@ and using a
 hook with
 .BR madvise (2)
 is not an option (i.e., statically allocated data).
-The setting of the "THP disable" flag is inherited by a child created via
+The setting of the "THP disable" flags is inherited by a child created via
 .BR fork (2)
 and is preserved across
 .BR execve (2).
+.P
+The behavior depends on the value of
+.IR flags:
+.TP
+.B 0
+The
+.BR prctl (2)
+call will disable THPs completely for the process,
+irrespective of global THP controls or
+.BR MADV_COLLAPSE .
+.TP
+.B PR_THP_DISABLE_EXCEPT_ADVISED
+The
+.BR prctl (2)
+call will disable THPs for the process
+except when the usage of THPs is
+advised.
+Consequently, THPs will only be used when:
+.RS
+.IP \[bu] 3
+Global THP controls are set to "always" or "madvise" and
+.I \%madvise(...,\~MADV_HUGEPAGE)
+or
+.I \%madvise(...,\~MADV_COLLAPSE)
+is used.
+.IP \[bu]
+Global THP controls are set to "never" and
+.I \%madvise(...,\~MADV_COLLAPSE)
+is used.
+This is the same behavior
+as if THPs would not be disabled on
+a process level.
+.RE
 .SH RETURN VALUE
 On success,
 0 is returned.
-- 
2.47.3
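
The rules documented for PR_SET_THP_DISABLE above can be summarised as a
small decision function. This is a rough model for illustration only, not
kernel code: all names are hypothetical, the case where no per-process
setting exists reflects my understanding of the global "always", "madvise"
and "never" controls rather than anything stated in the patch, and other
eligibility requirements (mapping type, alignment, and so on) are ignored.

```c
#include <stdbool.h>

/* Hypothetical enums modelling the three inputs described above. */
enum global_thp  { GLOBAL_ALWAYS, GLOBAL_MADVISE, GLOBAL_NEVER };
enum process_thp { PROC_UNSET, PROC_DISABLED, PROC_EXCEPT_ADVISED };
enum advice      { ADVICE_NONE, ADVICE_HUGEPAGE, ADVICE_COLLAPSE };

/* Returns true if, under this model, a THP may be used for the region. */
bool thp_may_be_used(enum global_thp g, enum process_thp p, enum advice a)
{
	/* flags == 0: disabled completely, irrespective of global THP
	   controls or MADV_COLLAPSE. */
	if (p == PROC_DISABLED)
		return false;

	/* MADV_COLLAPSE is honoured for every global setting,
	   including "never". */
	if (a == ADVICE_COLLAPSE)
		return true;

	/* Without MADV_COLLAPSE, global "never" means no THPs. */
	if (g == GLOBAL_NEVER)
		return false;

	/* MADV_HUGEPAGE works with global "always" or "madvise". */
	if (a == ADVICE_HUGEPAGE)
		return true;

	/* No advice: only global "always" without a per-process
	   restriction gives THPs (assumption, see above). */
	return g == GLOBAL_ALWAYS && p == PROC_UNSET;
}
```

For example, thp_may_be_used(GLOBAL_ALWAYS, PROC_EXCEPT_ADVISED, ADVICE_NONE)
is false, which is the opt-out from "always" into "madvise" that the flag
is meant to provide.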
Re: [PATCH v2] PR_*ET_THP_DISABLE.2const: document addition of PR_THP_DISABLE_EXCEPT_ADVISED
Posted by Alejandro Colomar 1 month ago
Hi Usama,

On Wed, Nov 05, 2025 at 01:48:11PM +0000, Usama Arif wrote:
> PR_THP_DISABLE_EXCEPT_ADVISED extended PR_SET_THP_DISABLE to only provide
> THPs when advised. IOW, it allows individual processes to opt-out of THP =
> "always" into THP = "madvise", without affecting other workloads on the
> system. The series has been merged in [1]. Before [1], the following 2
> calls were allowed with PR_SET_THP_DISABLE:
> 
> prctl(PR_SET_THP_DISABLE, 0, 0, 0, 0); // to reset THP setting.
> prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0); // to disable THPs completely.
> 
> Now in addition to the 2 calls above, you can do:
> 
> prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, 0, 0); // to
> disable THPs except madvise.
> 
> This patch documents the changes introduced due to the addition of
> PR_THP_DISABLE_EXCEPT_ADVISED flag:
> - PR_GET_THP_DISABLE returns a value whose bits indicate how THP-disable
>   is configured for the calling thread (with or without
>   PR_THP_DISABLE_EXCEPT_ADVISED).
> - PR_SET_THP_DISABLE now uses arg3 to specify whether to disable THP
>   completely for the process, or disable except madvise
>   (PR_THP_DISABLE_EXCEPT_ADVISED).
> 
> [1] https://github.com/torvalds/linux/commit/9dc21bbd62edeae6f63e6f25e1edb7167452457b
> 
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> ---
> v1 -> v2 (Alejandro Colomar):
> - Fixed double negation on when MADV_HUGEPAGE will succeed
> - Turn return values of PR_GET_THP_DISABLE into a table
> - Turn madvise calls into full italics
> - Use semantic newlines

Thanks!  I've applied the patch.  I've amended a few things (see below).


Have a lovely day!
Alex

> ---
>  man/man2/madvise.2                      |  6 ++-
>  man/man2const/PR_GET_THP_DISABLE.2const | 20 +++++++---
>  man/man2const/PR_SET_THP_DISABLE.2const | 52 +++++++++++++++++++++----
>  3 files changed, 64 insertions(+), 14 deletions(-)
> 
> diff --git a/man/man2/madvise.2 b/man/man2/madvise.2
> index 7a4310c40..55c6f4a6c 100644
> --- a/man/man2/madvise.2
> +++ b/man/man2/madvise.2
> @@ -372,9 +372,11 @@ or
>  .BR VM_PFNMAP ,
>  nor can it be stack memory or backed by a DAX-enabled device
>  (unless the DAX device is hot-plugged as System RAM).
> -The process must also not have
> +The process can have
>  .B PR_SET_THP_DISABLE
> -set (see
> +set only if
> +.B PR_THP_DISABLE_EXCEPT_ADVISED
> +flag is set (see

I've removed 'flag', for consistency.

>  .BR prctl (2)).
>  .IP
>  The
> diff --git a/man/man2const/PR_GET_THP_DISABLE.2const b/man/man2const/PR_GET_THP_DISABLE.2const
> index 38ff3b370..d63cff21c 100644
> --- a/man/man2const/PR_GET_THP_DISABLE.2const
> +++ b/man/man2const/PR_GET_THP_DISABLE.2const
> @@ -6,7 +6,7 @@
>  .SH NAME
>  PR_GET_THP_DISABLE
>  \-
> -get the state of the "THP disable" flag for the calling thread
> +get the state of the "THP disable" flags for the calling thread
>  .SH LIBRARY
>  Standard C library
>  .RI ( libc ,\~ \-lc )
> @@ -18,13 +18,23 @@ Standard C library
>  .B int prctl(PR_GET_THP_DISABLE, 0L, 0L, 0L, 0L);
>  .fi
>  .SH DESCRIPTION
> -Return the current setting of
> -the "THP disable" flag for the calling thread:
> -either 1, if the flag is set, or 0, if it is not.
> +Return a value whose bits indicate how THP-disable is configured
> +for the calling thread.
> +The returned value is interpreted as follows:
> +.P
> +.TS
> +allbox;
> +cb cb cb l
> +c c c l.
> +Bit 1	Bit 0	Value	Description
> +0	0	0	No THP-disable behaviour specified.
> +0	1	1	THP is entirely disabled for this process.
> +1	1	3	THP-except-advised mode is set for this process.
> +.TE

I've replaced this with something simpler:

	.TP
	.B 0b00
	No THP-disable behaviour specified.
	.TP
	.B 0b01
	THP is entirely disabled for this process.
	.TP
	.B 0b11
	THP-except-advised mode is set for this process.

(0b is a binary prefix standardized in ISO C23, and it is now supported
 by printf(3) and strtol(3).)
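
For illustration, a caller could decode the returned value roughly as
follows. This is only a sketch under stated assumptions: the function names
are hypothetical, and the values are written in decimal with the binary
form in comments, so that the code also builds with pre-C23 compilers.

```c
#include <stdio.h>
#include <sys/prctl.h>

/* Map the value returned by PR_GET_THP_DISABLE to the descriptions
   used in the manual page.  Hypothetical helper. */
const char *thp_disable_describe(int value)
{
	switch (value) {
	case 0:  /* 0b00 */
		return "No THP-disable behaviour specified.";
	case 1:  /* 0b01 */
		return "THP is entirely disabled for this process.";
	case 3:  /* 0b11 */
		return "THP-except-advised mode is set for this process.";
	default:
		return "unexpected value";
	}
}

/* Query and print the current state.  Returns the value, or -1 on error. */
int print_thp_disable_state(void)
{
	int value = prctl(PR_GET_THP_DISABLE, 0L, 0L, 0L, 0L);

	if (value == -1) {
		perror("prctl(PR_GET_THP_DISABLE)");
		return -1;
	}
	printf("PR_GET_THP_DISABLE = %d: %s\n",
	       value, thp_disable_describe(value));
	return value;
}
```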

>  .SH RETURN VALUE
>  On success,
>  .BR PR_GET_THP_DISABLE ,
> -returns the boolean value described above.
> +returns the value described above.
>  On error, \-1 is returned, and
>  .I errno
>  is set to indicate the error.
> diff --git a/man/man2const/PR_SET_THP_DISABLE.2const b/man/man2const/PR_SET_THP_DISABLE.2const
> index 532beac66..75e17fa6a 100644
> --- a/man/man2const/PR_SET_THP_DISABLE.2const
> +++ b/man/man2const/PR_SET_THP_DISABLE.2const
> @@ -6,7 +6,7 @@
>  .SH NAME
>  PR_SET_THP_DISABLE
>  \-
> -set the state of the "THP disable" flag for the calling thread
> +set the state of the "THP disable" flags for the calling thread
>  .SH LIBRARY
>  Standard C library
>  .RI ( libc ,\~ \-lc )
> @@ -15,15 +15,20 @@ Standard C library
>  .BR "#include <linux/prctl.h>" "  /* Definition of " PR_* " constants */"
>  .B #include <sys/prctl.h>
>  .P
> -.BI "int prctl(PR_SET_THP_DISABLE, long " flag ", 0L, 0L, 0L);"
> +.BI "int prctl(PR_SET_THP_DISABLE, long " thp_disable ", unsigned long " flags ", 0L, 0L);"
>  .fi
>  .SH DESCRIPTION
> -Set the state of the "THP disable" flag for the calling thread.
> +Set the state of the "THP disable" flags for the calling thread.
>  If
> -.I flag
> -has a nonzero value, the flag is set, otherwise it is cleared.
> +.I thp_disable
> +has a nonzero value,
> +the THP disable flag is set according to the value of
> +.I flags,

This should be

	.IR flags ,

> +otherwise it is cleared.
>  .P
> -Setting this flag provides a method
> +This
> +.BR prctl (2)
> +provides a method
>  for disabling transparent huge pages
>  for jobs where the code cannot be modified,
>  and using a
> @@ -31,10 +36,43 @@ and using a
>  hook with
>  .BR madvise (2)
>  is not an option (i.e., statically allocated data).
> -The setting of the "THP disable" flag is inherited by a child created via
> +The setting of the "THP disable" flags is inherited by a child created via
>  .BR fork (2)
>  and is preserved across
>  .BR execve (2).
> +.P
> +The behavior depends on the value of
> +.IR flags:

This should be:

	.IR flags :

> +.TP
> +.B 0
> +The
> +.BR prctl (2)
> +call will disable THPs completely for the process,
> +irrespective of global THP controls or
> +.BR MADV_COLLAPSE .
> +.TP
> +.B PR_THP_DISABLE_EXCEPT_ADVISED
> +The
> +.BR prctl (2)
> +call will disable THPs for the process
> +except when the usage of THPs is
> +advised.
> +Consequently, THPs will only be used when:
> +.RS
> +.IP \[bu] 3
> +Global THP controls are set to "always" or "madvise" and
> +.I \%madvise(...,\~MADV_HUGEPAGE)
> +or
> +.I \%madvise(...,\~MADV_COLLAPSE)
> +is used.
> +.IP \[bu]
> +Global THP controls are set to "never" and
> +.I \%madvise(...,\~MADV_COLLAPSE)
> +is used.
> +This is the same behavior
> +as if THPs would not be disabled on
> +a process level.
> +.RE
>  .SH RETURN VALUE
>  On success,
>  0 is returned.
> -- 
> 2.47.3
> 

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
Re: [PATCH v2] PR_*ET_THP_DISABLE.2const: document addition of PR_THP_DISABLE_EXCEPT_ADVISED
Posted by Usama Arif 1 month, 1 week ago

On 05/11/2025 16:48, Usama Arif wrote:
> PR_THP_DISABLE_EXCEPT_ADVISED extended PR_SET_THP_DISABLE to only provide
> THPs when advised. IOW, it allows individual processes to opt-out of THP =
> "always" into THP = "madvise", without affecting other workloads on the
> system. The series has been merged in [1]. Before [1], the following 2
> calls were allowed with PR_SET_THP_DISABLE:
> 
> prctl(PR_SET_THP_DISABLE, 0, 0, 0, 0); // to reset THP setting.
> prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0); // to disable THPs completely.
> 
> Now in addition to the 2 calls above, you can do:
> 
> prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, 0, 0); // to
> disable THPs except madvise.
> 
> This patch documents the changes introduced due to the addition of
> PR_THP_DISABLE_EXCEPT_ADVISED flag:
> - PR_GET_THP_DISABLE returns a value whose bits indicate how THP-disable
>   is configured for the calling thread (with or without
>   PR_THP_DISABLE_EXCEPT_ADVISED).
> - PR_SET_THP_DISABLE now uses arg3 to specify whether to disable THP
>   completely for the process, or disable except madvise
>   (PR_THP_DISABLE_EXCEPT_ADVISED).
> 
> [1] https://github.com/torvalds/linux/commit/9dc21bbd62edeae6f63e6f25e1edb7167452457b
> 
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> ---
> v1 -> v2 (Alejandro Colomar):
> - Fixed double negation on when MADV_HUGEPAGE will succeed
> - Turn return values of PR_GET_THP_DISABLE into a table
> - Turn madvise calls into full italics
> - Use semantic newlines
> ---
>  man/man2/madvise.2                      |  6 ++-
>  man/man2const/PR_GET_THP_DISABLE.2const | 20 +++++++---
>  man/man2const/PR_SET_THP_DISABLE.2const | 52 +++++++++++++++++++++----
>  3 files changed, 64 insertions(+), 14 deletions(-)
> 

Resending this for review, as the patch implementing this has been merged [1].

[1] https://github.com/torvalds/linux/commit/9dc21bbd62edeae6f63e6f25e1edb7167452457b