This patchset extends khugepaged from collapsing only PMD-sized THPs to
also collapsing anonymous mTHPs.

mTHPs were introduced in the kernel to improve memory management by
allocating memory in larger chunks, which reduces the number of page
faults and TLB misses (due to TLB coalescing), shortens the LRU lists,
etc. However, the mTHP property is often lost due to CoW, swap-in/out,
and when the kernel just cannot find enough physically contiguous memory
to allocate on fault. Hence, there is a need to regain mTHPs in the
system asynchronously. This work is an attempt in this direction,
starting with anonymous folios.

In the fault handler, we select the THP order in a greedy manner; the
same approach is used here, along with the same sysfs interface to
control the order of collapse. In contrast to PMD-collapse, we
(hopefully) get rid of the mmap_write_lock().

---------------------------------------------------------
Testing
---------------------------------------------------------

The set has been build-tested on x86_64.
For AArch64:
1. mm-selftests: no regressions.
2. Analyzing with tools/mm/thpmaps on different userspace programs that
   map aligned VMAs of a large size, fault in basepages/mTHPs (according
   to sysfs), and then madvise() the VMA: khugepaged is able to fully
   collapse the VMAs.

This patchset is rebased on mm-unstable
(e7e89af21ffcfd1077ca6d2188de6497db1ad84c).

Some points to be noted:
1. Some stats like pages_collapsed for khugepaged have not been extended
   for mTHP. I'd welcome suggestions on any updates or additions to the
   sysfs interface.
2. Please see patch 9 for lock handling.
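The second test scenario above (map an aligned VMA, fault it in, then
madvise() it) could look roughly like the userspace sketch below. It is
not the program used for the reported results; the helper names
map_aligned(), fault_in() and request_collapse() are hypothetical, and
the sketch only requests collapse, it does not verify that khugepaged
acted on the range (tools/mm/thpmaps would be used for that).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS and MADV_HUGEPAGE */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map len bytes of anonymous memory whose start is aligned to align.
 * align must be a power of two and a multiple of the page size.
 * Over-allocates by align bytes and unmaps the unused head and tail.
 * Returns NULL on failure.
 */
static void *map_aligned(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t total = len + align;
    char *raw = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = ((uintptr_t)raw + align - 1) & ~((uintptr_t)align - 1);
    char *aligned = (char *)start;
    size_t head = (size_t)(aligned - raw);
    size_t tail = total - head - len;

    if (head)
        munmap(raw, head);
    if (tail)
        munmap(aligned + len, tail);
    return aligned;
}

/* Write one byte per base page so every page is faulted in. */
static size_t fault_in(char *buf, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t touched = 0;

    for (size_t off = 0; off < len; off += page) {
        buf[off] = 1;
        touched++;
    }
    return touched;
}

/*
 * Mark the range as a collapse candidate. Returns madvise()'s result;
 * it can fail (for example when THP is not configured), so callers
 * should treat a non-zero return as "not requested", not as fatal.
 */
static int request_collapse(void *buf, size_t len)
{
#ifdef MADV_HUGEPAGE
    return madvise(buf, len, MADV_HUGEPAGE);
#else
    (void)buf;
    (void)len;
    return -1;
#endif
}
```

A caller would typically map a few MiB aligned to 2 MiB, call
fault_in(), then request_collapse(), and inspect the mapping afterwards.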
Dev Jain (12):
  khugepaged: Rename hpage_collapse_scan_pmd() -> ptes()
  khugepaged: Generalize alloc_charge_folio()
  khugepaged: Generalize hugepage_vma_revalidate()
  khugepaged: Generalize __collapse_huge_page_swapin()
  khugepaged: Generalize __collapse_huge_page_isolate()
  khugepaged: Generalize __collapse_huge_page_copy_failed()
  khugepaged: Scan PTEs order-wise
  khugepaged: Abstract PMD-THP collapse
  khugepaged: Introduce vma_collapse_anon_folio()
  khugepaged: Skip PTE range if a larger mTHP is already mapped
  khugepaged: Enable sysfs to control order of collapse
  selftests/mm: khugepaged: Enlighten for mTHP collapse

 include/linux/huge_mm.h                 |   2 +
 mm/huge_memory.c                        |   4 +
 mm/khugepaged.c                         | 445 +++++++++++++++++-------
 tools/testing/selftests/mm/khugepaged.c |   5 +-
 4 files changed, 319 insertions(+), 137 deletions(-)

-- 
2.30.2
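As an illustration of the greedy order selection mentioned in the cover
letter, here is a minimal userspace sketch. It is not code from the
series: pick_collapse_order(), the bitmask representation of the enabled
orders, and the 4 KiB page / order-9 PMD constants are all assumptions
made for the example. The idea shown is only "try the largest enabled
order first and fall back to smaller ones".

```c
#include <assert.h>

#define PAGE_SHIFT_ASSUMED 12   /* assumption: 4 KiB base pages */
#define PMD_ORDER_ASSUMED   9   /* assumption: 2 MiB PMD mappings */

/*
 * Return the largest order whose bit is set in enabled_orders such that
 * addr is naturally aligned to that order and the block fits entirely
 * in [addr, end). Returns -1 if no enabled order qualifies.
 * Order 0 (a single base page) is never returned.
 */
static int pick_collapse_order(unsigned long addr, unsigned long end,
                               unsigned long enabled_orders)
{
    if (end <= addr)
        return -1;

    for (int order = PMD_ORDER_ASSUMED; order > 0; order--) {
        unsigned long size = 1UL << (order + PAGE_SHIFT_ASSUMED);

        if (!(enabled_orders & (1UL << order)))
            continue;           /* order disabled by the (hypothetical) mask */
        if (addr & (size - 1))
            continue;           /* start is not aligned to this order */
        if (end - addr < size)
            continue;           /* block would run past the end of the range */
        return order;
    }
    return -1;
}

/* Worked examples with 64 KiB (order 4) and 2 MiB (order 9) enabled. */
static void pick_collapse_order_examples(void)
{
    unsigned long enabled = (1UL << 4) | (1UL << 9);

    /* 2 MiB aligned start, 2 MiB of room: the PMD order wins. */
    assert(pick_collapse_order(0x200000UL, 0x400000UL, enabled) == 9);
    /* Only 64 KiB aligned: fall back to order 4. */
    assert(pick_collapse_order(0x210000UL, 0x400000UL, enabled) == 4);
    /* Only 4 KiB aligned: no enabled order fits. */
    assert(pick_collapse_order(0x201000UL, 0x400000UL, enabled) == -1);
}
```

Unlike this sketch, real code could align the address down to each
candidate order instead of skipping unaligned starts; the sketch keeps
the stricter rule for brevity.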
+Nico, apologies, forgot to CC you.

On 16/12/24 10:20 pm, Dev Jain wrote:
> This patchset extends khugepaged from collapsing only PMD-sized THPs to
> collapsing anonymous mTHPs.
> [...]
On Mon, Dec 16, 2024 at 10:31 AM Dev Jain <dev.jain@arm.com> wrote:
>
> +Nico, apologies, forgot to CC you.

Hey Dev,

Happy New Year!

Thanks! I'm trying to apply/test your patches, but am failing to apply
them due to mm-unstable which has "unstable" sha values, making
applying them difficult.

Could you share a public git repo to your patches?

Also, have you seen any issues with your patches? My version of
khugepaged mTHP support was mostly done before the holidays but I
haven't posted due to some issues with (BAD PAGE) refcount issues when
trying to reclaim pages that I haven't found the cause of yet.

-- Nico

> On 16/12/24 10:20 pm, Dev Jain wrote:
> > This patchset extends khugepaged from collapsing only PMD-sized THPs to
> > collapsing anonymous mTHPs.
> > [...]
On 03/01/25 3:28 am, Nico Pache wrote:
> On Mon, Dec 16, 2024 at 10:31 AM Dev Jain <dev.jain@arm.com> wrote:
>> +Nico, apologies, forgot to CC you.
> Hey Dev,
>
> Happy New Year!

Happy New Year to you too!

> Thanks! I'm trying to apply/test your patches, but am failing to apply
> them due to mm-unstable which has "unstable" sha values, making
> applying them difficult.

That is strange. This works for me: clone the mm tree from akpm, check
out mm-unstable, hard reset to e7e89af21ffcfd1077ca6d2188de6497db1ad84c,
then apply the patches.

> Could you share a public git repo to your patches?
>
> Also, have you seen any issues with your patches? My version of
> khugepaged mTHP support was mostly done before the holidays but I
> haven't posted due to some issues with (BAD PAGE) refcount issues when
> trying to reclaim pages that I haven't found the cause of yet.
>
> -- Nico

I have not found any obvious issues so far with debug configs on :)

>> On 16/12/24 10:20 pm, Dev Jain wrote:
>>> This patchset extends khugepaged from collapsing only PMD-sized THPs to
>>> collapsing anonymous mTHPs.
>>> [...]