[PATCH v4 12/12] Documentation: mm: update the admin guide for mTHP collapse

Nico Pache posted 12 patches 8 months ago
There is a newer version of this series
Posted by Nico Pache 8 months ago
Now that we can collapse to mTHPs, let's update the admin guide to
reflect these changes and provide proper guidance on how to use it.

Signed-off-by: Nico Pache <npache@redhat.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index dff8d5985f0f..06814e05e1d5 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -63,7 +63,7 @@ often.
 THP can be enabled system wide or restricted to certain tasks or even
 memory ranges inside task's address space. Unless THP is completely
 disabled, there is ``khugepaged`` daemon that scans memory and
-collapses sequences of basic pages into PMD-sized huge pages.
+collapses sequences of basic pages into huge pages.
 
 The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
 interface and using madvise(2) and prctl(2) system calls.
@@ -144,6 +144,14 @@ hugepage sizes have enabled="never". If enabling multiple hugepage
 sizes, the kernel will select the most appropriate enabled size for a
 given allocation.
 
+khugepaged scales max_ptes_none to the order of each enabled mTHP size when
+deciding whether to collapse. When using mTHPs, it is recommended to set
+max_ptes_none low, ideally below HPAGE_PMD_NR / 2 (at most 255 with 4K pages).
+This prevents an undesired "creep" behavior in which a region is repeatedly
+collapsed to the next larger mTHP size. max_ptes_shared and max_ptes_swap have
+no effect when collapsing to an mTHP, and mTHP collapse will fail on shared or
+swapped-out pages.
+
 It's also possible to limit defrag efforts in the VM to generate
 anonymous hugepages in case they're not immediately free to madvise
 regions or to never try to defrag memory and simply fallback to regular
-- 
2.48.1
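To make the threshold arithmetic in the patch concrete, here is a small userspace model (not kernel code). It assumes 4K base pages (HPAGE_PMD_ORDER = 9, HPAGE_PMD_NR = 512), assumes anonymous mTHP orders 2 through 9, and assumes the per-order scaling is a right shift of max_ptes_none by the difference between the PMD order and the mTHP order. `scaled_max_ptes_none()` and `half_empty_collapses()` are hypothetical names. The model checks whether a range whose upper half is empty (an existing mTHP one order smaller sitting in the lower half) would pass the scaled limit.

```python
# Userspace model, not kernel code. Assumptions: 4K base pages, and
# khugepaged scales max_ptes_none by right-shifting it by the order gap.
HPAGE_PMD_ORDER = 9                            # 2 MiB PMD / 4 KiB pages
HPAGE_PMD_NR = 1 << HPAGE_PMD_ORDER            # 512 PTEs per PMD
MTHP_ORDERS = range(2, HPAGE_PMD_ORDER + 1)    # assumed anon mTHP orders 2..9


def scaled_max_ptes_none(max_ptes_none: int, order: int) -> int:
    """Hypothetical helper: the PMD-level limit scaled to an mTHP order."""
    return max_ptes_none >> (HPAGE_PMD_ORDER - order)


def half_empty_collapses(max_ptes_none: int, order: int) -> bool:
    """Would a 2**order range with an empty upper half pass the limit?

    This models an order-(order - 1) mTHP occupying the lower half of
    an order-`order` range.
    """
    none_ptes = 1 << (order - 1)
    return none_ptes <= scaled_max_ptes_none(max_ptes_none, order)


if __name__ == "__main__":
    for max_none in (255, 256, HPAGE_PMD_NR - 1):
        limits = [scaled_max_ptes_none(max_none, o) for o in MTHP_ORDERS]
        creeps = any(half_empty_collapses(max_none, o) for o in MTHP_ORDERS)
        print(f"max_ptes_none={max_none}: limits={limits} creep={creeps}")
```

Under these assumptions, 255 is the largest value for which no half-empty range qualifies at any order, while 256 and the default of HPAGE_PMD_NR - 1 (511) let every such range qualify.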
Re: [PATCH v4 12/12] Documentation: mm: update the admin guide for mTHP collapse
Posted by Usama Arif 7 months, 3 weeks ago

On 17/04/2025 01:02, Nico Pache wrote:
> Now that we can collapse to mTHPs, let's update the admin guide to
> reflect these changes and provide proper guidance on how to use it.
> 
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
>  Documentation/admin-guide/mm/transhuge.rst | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index dff8d5985f0f..06814e05e1d5 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -63,7 +63,7 @@ often.
>  THP can be enabled system wide or restricted to certain tasks or even
>  memory ranges inside task's address space. Unless THP is completely
>  disabled, there is ``khugepaged`` daemon that scans memory and
> -collapses sequences of basic pages into PMD-sized huge pages.
> +collapses sequences of basic pages into huge pages.
>  
>  The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
>  interface and using madvise(2) and prctl(2) system calls.
> @@ -144,6 +144,14 @@ hugepage sizes have enabled="never". If enabling multiple hugepage
>  sizes, the kernel will select the most appropriate enabled size for a
>  given allocation.
>  
> +khugepaged scales max_ptes_none to the order of each enabled mTHP size when
> +deciding whether to collapse. When using mTHPs, it is recommended to set
> +max_ptes_none low, ideally below HPAGE_PMD_NR / 2 (at most 255 with 4K pages).
> +This prevents an undesired "creep" behavior in which a region is repeatedly
> +collapsed to the next larger mTHP size. max_ptes_shared and max_ptes_swap have
> +no effect when collapsing to an mTHP, and mTHP collapse will fail on shared or
> +swapped-out pages.
> +

Hi Nico,

Could you add a bit more explanation of the creep behaviour here in the documentation?
I remember you explained in one of the earlier versions that if more than half of the
collapsed mTHP is zero-filled, it for some reason becomes eligible for collapsing to a
larger order, but if less than half is zero-filled it's not eligible? I can't exactly
remember what the reason was :) It would be good to have it documented in more detail if possible.

Thanks

>  It's also possible to limit defrag efforts in the VM to generate
>  anonymous hugepages in case they're not immediately free to madvise
>  regions or to never try to defrag memory and simply fallback to regular
Re: [PATCH v4 12/12] Documentation: mm: update the admin guide for mTHP collapse
Posted by Nico Pache 7 months, 3 weeks ago
On Thu, Apr 24, 2025 at 9:04 AM Usama Arif <usamaarif642@gmail.com> wrote:
>
>
>
> On 17/04/2025 01:02, Nico Pache wrote:
> > Now that we can collapse to mTHPs, let's update the admin guide to
> > reflect these changes and provide proper guidance on how to use it.
> >
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> >  Documentation/admin-guide/mm/transhuge.rst | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> > index dff8d5985f0f..06814e05e1d5 100644
> > --- a/Documentation/admin-guide/mm/transhuge.rst
> > +++ b/Documentation/admin-guide/mm/transhuge.rst
> > @@ -63,7 +63,7 @@ often.
> >  THP can be enabled system wide or restricted to certain tasks or even
> >  memory ranges inside task's address space. Unless THP is completely
> >  disabled, there is ``khugepaged`` daemon that scans memory and
> > -collapses sequences of basic pages into PMD-sized huge pages.
> > +collapses sequences of basic pages into huge pages.
> >
> >  The THP behaviour is controlled via :ref:`sysfs <thp_sysfs>`
> >  interface and using madvise(2) and prctl(2) system calls.
> > @@ -144,6 +144,14 @@ hugepage sizes have enabled="never". If enabling multiple hugepage
> >  sizes, the kernel will select the most appropriate enabled size for a
> >  given allocation.
> >
> > +khugepaged scales max_ptes_none to the order of each enabled mTHP size when
> > +deciding whether to collapse. When using mTHPs, it is recommended to set
> > +max_ptes_none low, ideally below HPAGE_PMD_NR / 2 (at most 255 with 4K pages).
> > +This prevents an undesired "creep" behavior in which a region is repeatedly
> > +collapsed to the next larger mTHP size. max_ptes_shared and max_ptes_swap have
> > +no effect when collapsing to an mTHP, and mTHP collapse will fail on shared or
> > +swapped-out pages.
> > +
>
> Hi Nico,
>
> Could you add a bit more explanation of the creep behaviour here in the documentation?
> I remember you explained in one of the earlier versions that if more than half of the
> collapsed mTHP is zero-filled, it for some reason becomes eligible for collapsing to a
> larger order, but if less than half is zero-filled it's not eligible? I can't exactly
> remember what the reason was :) It would be good to have it documented in more detail if possible.
Hi Usama,

You can think of the creep as a byproduct of introducing N new
non-zero pages into an N-sized mTHP, essentially doubling its size. On
the next scan, the resulting 2N-sized mTHP meets the same condition
one order up, leading to constant promotion to the next size. If we
allow khugepaged to double the size of an mTHP by introducing non-zero
pages, it will keep doubling.
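The doubling described above can be sketched as a short userspace simulation (not kernel code). It assumes 4K base pages, that max_ptes_none is scaled by right-shifting it by the difference between the PMD order and the mTHP order, and that every mTHP order is enabled. It also simplifies khugepaged to try only the next larger order on each pass, starting from one fully populated mTHP at the beginning of an otherwise empty PMD-sized range. `simulate_creep()` and `scaled_max_ptes_none()` are hypothetical names.

```python
# Userspace model, not kernel code. Assumptions: 4K base pages, right-shift
# scaling of max_ptes_none, every mTHP order enabled, and a simplified
# khugepaged that only tries the next larger order on each pass.
HPAGE_PMD_ORDER = 9


def scaled_max_ptes_none(max_ptes_none: int, order: int) -> int:
    """Hypothetical helper: the PMD-level limit scaled to an mTHP order."""
    return max_ptes_none >> (HPAGE_PMD_ORDER - order)


def simulate_creep(max_ptes_none: int, start_order: int) -> list:
    """Return the mTHP order reached after each successful collapse pass.

    Starts from one fully populated order-`start_order` mTHP at the
    beginning of an otherwise empty PMD-sized range.
    """
    order = start_order
    history = [order]
    while order < HPAGE_PMD_ORDER:
        next_order = order + 1
        # The existing N pages fill the lower half; the upper N PTEs are none.
        none_ptes = (1 << next_order) - (1 << order)
        if none_ptes > scaled_max_ptes_none(max_ptes_none, next_order):
            break  # collapse refused, so growth stops here
        # Collapse populates the empty half: the mTHP doubles to 2N pages,
        # and the next pass sees the same half-empty pattern one order up.
        order = next_order
        history.append(order)
    return history


if __name__ == "__main__":
    for max_none in (255, 256):
        print(max_none, simulate_creep(max_none, 2))
```

Under these assumptions, max_ptes_none=256 walks an order-2 mTHP up to PMD order one doubling per pass, while 255 refuses the first doubling, which is consistent with the 255 recommendation in the patch.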

I'll see how I can incorporate this description into the admin guide.

-- Nico
>
> Thanks
>
> >  It's also possible to limit defrag efforts in the VM to generate
> >  anonymous hugepages in case they're not immediately free to madvise
> >  regions or to never try to defrag memory and simply fallback to regular
>