PR_THP_DISABLE_EXCEPT_ADVISED extends PR_SET_THP_DISABLE so that THPs are
provided only when advised. IOW, it allows individual processes to opt out
of THP = "always" into THP = "madvise", without affecting other workloads on
the system. The series has been merged in [1]. Before [1], the following 2
calls were allowed with PR_SET_THP_DISABLE:

    prctl(PR_SET_THP_DISABLE, 0, 0, 0, 0); // to reset THP setting.
    prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0); // to disable THPs completely.

Now, in addition to the 2 calls above, you can do:

    // to disable THPs except when advised via madvise.
    prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, 0, 0);

This patch documents the changes introduced by the addition of the
PR_THP_DISABLE_EXCEPT_ADVISED flag:
- PR_GET_THP_DISABLE returns a value whose bits indicate how THP-disable
  is configured for the calling thread (with or without
  PR_THP_DISABLE_EXCEPT_ADVISED).
- PR_SET_THP_DISABLE now uses arg3 to specify whether to disable THP
  completely for the process, or to disable it except when advised via
  madvise (PR_THP_DISABLE_EXCEPT_ADVISED).
[1] https://github.com/torvalds/linux/commit/9dc21bbd62edeae6f63e6f25e1edb7167452457b
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
---
v1 -> v2 (Alejandro Colomar):
- Fixed double negation on when MADV_HUGEPAGE will succeed
- Turn return values of PR_GET_THP_DISABLE into a table
- Turn madvise calls into full italics
- Use semantic newlines
---
man/man2/madvise.2 | 6 ++-
man/man2const/PR_GET_THP_DISABLE.2const | 20 +++++++---
man/man2const/PR_SET_THP_DISABLE.2const | 52 +++++++++++++++++++++----
3 files changed, 64 insertions(+), 14 deletions(-)
diff --git a/man/man2/madvise.2 b/man/man2/madvise.2
index 7a4310c40..55c6f4a6c 100644
--- a/man/man2/madvise.2
+++ b/man/man2/madvise.2
@@ -372,9 +372,11 @@ or
.BR VM_PFNMAP ,
nor can it be stack memory or backed by a DAX-enabled device
(unless the DAX device is hot-plugged as System RAM).
-The process must also not have
+The process can have
.B PR_SET_THP_DISABLE
-set (see
+set only if
+.B PR_THP_DISABLE_EXCEPT_ADVISED
+flag is set (see
.BR prctl (2)).
.IP
The
diff --git a/man/man2const/PR_GET_THP_DISABLE.2const b/man/man2const/PR_GET_THP_DISABLE.2const
index 38ff3b370..d63cff21c 100644
--- a/man/man2const/PR_GET_THP_DISABLE.2const
+++ b/man/man2const/PR_GET_THP_DISABLE.2const
@@ -6,7 +6,7 @@
.SH NAME
PR_GET_THP_DISABLE
\-
-get the state of the "THP disable" flag for the calling thread
+get the state of the "THP disable" flags for the calling thread
.SH LIBRARY
Standard C library
.RI ( libc ,\~ \-lc )
@@ -18,13 +18,23 @@ Standard C library
.B int prctl(PR_GET_THP_DISABLE, 0L, 0L, 0L, 0L);
.fi
.SH DESCRIPTION
-Return the current setting of
-the "THP disable" flag for the calling thread:
-either 1, if the flag is set, or 0, if it is not.
+Return a value whose bits indicate how THP-disable is configured
+for the calling thread.
+The returned value is interpreted as follows:
+.P
+.TS
+allbox;
+cb cb cb l
+c c c l.
+Bit 1	Bit 0	Value	Description
+0	0	0	No THP-disable behaviour specified.
+0	1	1	THP is entirely disabled for this process.
+1	1	3	THP-except-advised mode is set for this process.
+.TE
.SH RETURN VALUE
On success,
.BR PR_GET_THP_DISABLE ,
-returns the boolean value described above.
+returns the value described above.
On error, \-1 is returned, and
.I errno
is set to indicate the error.
diff --git a/man/man2const/PR_SET_THP_DISABLE.2const b/man/man2const/PR_SET_THP_DISABLE.2const
index 532beac66..75e17fa6a 100644
--- a/man/man2const/PR_SET_THP_DISABLE.2const
+++ b/man/man2const/PR_SET_THP_DISABLE.2const
@@ -6,7 +6,7 @@
.SH NAME
PR_SET_THP_DISABLE
\-
-set the state of the "THP disable" flag for the calling thread
+set the state of the "THP disable" flags for the calling thread
.SH LIBRARY
Standard C library
.RI ( libc ,\~ \-lc )
@@ -15,15 +15,20 @@ Standard C library
.BR "#include <linux/prctl.h>" " /* Definition of " PR_* " constants */"
.B #include <sys/prctl.h>
.P
-.BI "int prctl(PR_SET_THP_DISABLE, long " flag ", 0L, 0L, 0L);"
+.BI "int prctl(PR_SET_THP_DISABLE, long " thp_disable ", unsigned long " flags ", 0L, 0L);"
.fi
.SH DESCRIPTION
-Set the state of the "THP disable" flag for the calling thread.
+Set the state of the "THP disable" flags for the calling thread.
If
-.I flag
-has a nonzero value, the flag is set, otherwise it is cleared.
+.I thp_disable
+has a nonzero value,
+the THP disable flag is set according to the value of
+.I flags,
+otherwise it is cleared.
.P
-Setting this flag provides a method
+This
+.BR prctl (2)
+provides a method
for disabling transparent huge pages
for jobs where the code cannot be modified,
and using a
@@ -31,10 +36,43 @@ and using a
hook with
.BR madvise (2)
is not an option (i.e., statically allocated data).
-The setting of the "THP disable" flag is inherited by a child created via
+The setting of the "THP disable" flags is inherited by a child created via
.BR fork (2)
and is preserved across
.BR execve (2).
+.P
+The behavior depends on the value of
+.IR flags:
+.TP
+.B 0
+The
+.BR prctl (2)
+call will disable THPs completely for the process,
+irrespective of global THP controls or
+.BR MADV_COLLAPSE .
+.TP
+.B PR_THP_DISABLE_EXCEPT_ADVISED
+The
+.BR prctl (2)
+call will disable THPs for the process
+except when the usage of THPs is
+advised.
+Consequently, THPs will only be used when:
+.RS
+.IP \[bu] 3
+Global THP controls are set to "always" or "madvise" and
+.I \%madvise(...,\~MADV_HUGEPAGE)
+or
+.I \%madvise(...,\~MADV_COLLAPSE)
+is used.
+.IP \[bu]
+Global THP controls are set to "never" and
+.I \%madvise(...,\~MADV_COLLAPSE)
+is used.
+This is the same behavior
+as if THPs would not be disabled on
+a process level.
+.RE
.SH RETURN VALUE
On success,
0 is returned.
--
2.47.3
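
For readers who want to try the new mode, here is a minimal sketch of the
calls described in the commit message. The helper names are hypothetical,
and the fallback #defines are assumptions for headers that predate the flag
(the values are believed to match the kernel UAPI header, but check your
own <linux/prctl.h>). On kernels without [1], I'd expect the first call to
fail, presumably with EINVAL, since arg3 previously had to be 0.

```c
#include <sys/prctl.h>

/* Fallback definitions for headers that predate the flag.
 * The numeric values are assumptions; prefer the ones from
 * <linux/prctl.h> when they are available. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif
#ifndef PR_THP_DISABLE_EXCEPT_ADVISED
#define PR_THP_DISABLE_EXCEPT_ADVISED (1 << 1)
#endif

/* Hypothetical helper: disable THPs except where advised.
 * Returns 0 on success, or -1 with errno set on failure. */
static int thp_only_when_advised(void)
{
	/* prctl() is variadic, so pass the flag with the documented type. */
	return prctl(PR_SET_THP_DISABLE, 1L,
	             (unsigned long) PR_THP_DISABLE_EXCEPT_ADVISED, 0L, 0L);
}

/* Hypothetical helper: clear the "THP disable" flags again. */
static int thp_reset(void)
{
	return prctl(PR_SET_THP_DISABLE, 0L, 0L, 0L, 0L);
}
```

Since the setting is inherited across fork(2) and preserved across
execve(2), a small wrapper could call thp_only_when_advised() and then
exec the real workload.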
Hi Usama,

On Wed, Nov 05, 2025 at 01:48:11PM +0000, Usama Arif wrote:
> PR_THP_DISABLE_EXCEPT_ADVISED extended PR_SET_THP_DISABLE to only provide
> THPs when advised.
> [...]
> v1 -> v2 (Alejandro Colomar):
> - Fixed double negation on when MADV_HUGEPAGE will succeed
> - Turn return values of PR_GET_THP_DISABLE into a table
> - Turn madvise calls into full italics
> - Use semantic newlines

Thanks!  I've applied the patch.  I've amended a few things (see below).

Have a lovely day!
Alex

> ---
> man/man2/madvise.2 | 6 ++-
> man/man2const/PR_GET_THP_DISABLE.2const | 20 +++++++---
> man/man2const/PR_SET_THP_DISABLE.2const | 52 +++++++++++++++++++++----
> 3 files changed, 64 insertions(+), 14 deletions(-)
>
> diff --git a/man/man2/madvise.2 b/man/man2/madvise.2
> index 7a4310c40..55c6f4a6c 100644
> --- a/man/man2/madvise.2
> +++ b/man/man2/madvise.2
> @@ -372,9 +372,11 @@ or
> [...]
> -The process must also not have
> +The process can have
> .B PR_SET_THP_DISABLE
> -set (see
> +set only if
> +.B PR_THP_DISABLE_EXCEPT_ADVISED
> +flag is set (see

I've removed 'flag', for consistency.

> .BR prctl (2)).
> [...]
> diff --git a/man/man2const/PR_GET_THP_DISABLE.2const b/man/man2const/PR_GET_THP_DISABLE.2const
> [...]
> +Return a value whose bits indicate how THP-disable is configured
> +for the calling thread.
> +The returned value is interpreted as follows:
> +.P
> +.TS
> +allbox;
> +cb cb cb l
> +c c c l.
> +Bit 1	Bit 0	Value	Description
> +0	0	0	No THP-disable behaviour specified.
> +0	1	1	THP is entirely disabled for this process.
> +1	1	3	THP-except-advised mode is set for this process.
> +.TE

I've replaced this by something simpler:

	.TP
	.B 0b00
	No THP-disable behaviour specified.
	.TP
	.B 0b01
	THP is entirely disabled for this process.
	.TP
	.B 0b11
	THP-except-advised mode is set for this process.

(0b is a binary prefix standardized in ISO C23,
and it is now supported by printf(3) and strtol(3).)

> [...]
> diff --git a/man/man2const/PR_SET_THP_DISABLE.2const b/man/man2const/PR_SET_THP_DISABLE.2const
> [...]
> +.I thp_disable
> +has a nonzero value,
> +the THP disable flag is set according to the value of
> +.I flags,

This should be

	.IR flags ,

> +otherwise it is cleared.
> [...]
> +.P
> +The behavior depends on the value of
> +.IR flags:

This should be:

	.IR flags :

> +.TP
> +.B 0
> [...]
> --
> 2.47.3

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).
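
As an illustration of the three return values listed above, the following
sketch maps the result of PR_GET_THP_DISABLE to a description. The function
names are hypothetical, and the fallback #define is an assumption for older
headers. Hexadecimal constants are used so that the code also builds with
pre-C23 compilers; 0x3 corresponds to 0b11.

```c
#include <stddef.h>
#include <sys/prctl.h>

/* Fallback for older headers; the value is an assumption,
 * prefer the definition from <linux/prctl.h>. */
#ifndef PR_GET_THP_DISABLE
#define PR_GET_THP_DISABLE 42
#endif

/* Hypothetical helper: describe a PR_GET_THP_DISABLE return value. */
static const char *thp_disable_describe(int value)
{
	switch (value) {
	case 0x0:	/* 0b00 */
		return "no THP-disable behaviour specified";
	case 0x1:	/* 0b01 */
		return "THP entirely disabled";
	case 0x3:	/* 0b11 */
		return "THP disabled except when advised";
	default:	/* not a value documented in this patch */
		return "unknown";
	}
}

/* Hypothetical helper: query the calling thread.
 * Returns NULL if prctl() fails (errno is left set). */
static const char *thp_disable_query(void)
{
	int v = prctl(PR_GET_THP_DISABLE, 0L, 0L, 0L, 0L);

	return v == -1 ? NULL : thp_disable_describe(v);
}
```

Callers that only care whether any THP-disable mode is active can test
bit 0 of the returned value, since it is set in both nonzero cases.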
On 05/11/2025 16:48, Usama Arif wrote:
> PR_THP_DISABLE_EXCEPT_ADVISED extended PR_SET_THP_DISABLE to only provide
> THPs when advised.
> [...]
> man/man2/madvise.2 | 6 ++-
> man/man2const/PR_GET_THP_DISABLE.2const | 20 +++++++---
> man/man2const/PR_SET_THP_DISABLE.2const | 52 +++++++++++++++++++++----
> 3 files changed, 64 insertions(+), 14 deletions(-)
>

Resending this for review, as the patch implementing this has been merged [1].

[1] https://github.com/torvalds/linux/commit/9dc21bbd62edeae6f63e6f25e1edb7167452457b