One functional fix, one performance regression fix, and two related
comment fixes.
I cleaned up the prototype I recently shared [1] for the performance fix,
deferring most of its cleanups to a later point. While doing that, I
identified the other issues.
The goal is for this patch set to be backported to stable trees "fairly"
easily, at least patches #1 and #4.
Patch #1 fixes hugetlb_pmd_shared() not detecting any sharing.
Patches #2 and #3 are simple comment fixes that patch #4 interacts with.
Patch #4 is a fix for the reported performance regression due to excessive
IPI broadcasts during fork()+exit().
The last patch is all about TLB flushes, IPIs and mmu_gather.
Read: complicated
I added as many comments and as much description as I possibly could, and
I am hoping for a review from Jann.
There are plenty of future cleanups to be had, plus one reasonable
optimization on x86, but that's all out of scope for this series.
Compile tested on plenty of architectures.
Runtime tested, with a focus on fixing the performance regression using
the original reproducer [2] on x86.
[1] https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/
[2] https://lore.kernel.org/all/8cab934d-4a56-44aa-b641-bfd7e23bd673@kernel.org/
--
v2 -> v3:
* Rebased to 6.19-rc2 and retested on x86
* Changes on last patch:
  * Introduce and use tlb_gather_mmu_vma() for properly setting up mmu_gather
    for hugetlb -- thanks to Harry for pointing me once again at the nasty
    hugetlb integration in mmu_gather
  * Move tlb_remove_huge_tlb_entry() after move_huge_pte()
  * For consistency, always call tlb_gather_mmu_vma() after
    flush_cache_range()
  * Don't pass mmu_gather to hugetlb_change_protection(), simply use
    a local one for now. (avoids messing with tlb_start_vma() /
    tlb_end_vma())
* Dropped Lorenzo's RB due to the changes
v1 -> v2:
* Picked up RBs/ACKs, hopefully I didn't miss any
* Added the initialization of fully_unshared_tables in __tlb_gather_mmu()
  (Thanks Nadav!)
* Refined some comments based on Lorenzo's feedback.
Cc: Will Deacon <will@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: Harry Yoo <harry.yoo@oracle.com>
Cc: "Uschakow, Stanislav" <suschako@amazon.de>
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Prakash Sangappa <prakash.sangappa@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
David Hildenbrand (Red Hat) (4):
  mm/hugetlb: fix hugetlb_pmd_shared()
  mm/hugetlb: fix two comments related to huge_pmd_unshare()
  mm/rmap: fix two comments related to huge_pmd_unshare()
  mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables
    using mmu_gather
 include/asm-generic/tlb.h |  77 +++++++++++++++++++++-
 include/linux/hugetlb.h   |  17 +++--
 include/linux/mm_types.h  |   1 +
 mm/hugetlb.c              | 131 +++++++++++++++++++++-----------------
 mm/mmu_gather.c           |  33 ++++++++++
 mm/rmap.c                 |  45 ++++++-------
 6 files changed, 213 insertions(+), 91 deletions(-)
base-commit: b927546677c876e26eba308550207c2ddf812a43
--
2.52.0
Hi,

Any update on this series? It's a hotfix series and I don't see it queued
up anywhere in either mm-hotfixes-unstable or mm-hotfixes-stable. This
issue is causing ongoing problems for a lot of people; is there any reason
it's being delayed?

It's received extensive approval and testing, so it should be GTG, right?

Andrew, David?

Thanks, Lorenzo

On Tue, Dec 23, 2025 at 10:40:33PM +0100, David Hildenbrand (Red Hat) wrote:
> One functional fix, one performance regression fix, and two related
> comment fixes.
> [...]
On Thu, 15 Jan 2026 17:22:30 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> Any update on this series? It's a hotfix series and I don't see it queued
> up anywhere in either mm-hotfixes-unstable or mm-hotfixes-stable, this
> issue is causing ongoing problems for a lot of people, is there any reason
> it's being delayed?
>
> It's received extensive approval and testing, so should be GTG right?

This has been in mm-unstable for a long time. As the series had a
mixture of cc:stable and not-cc:stable patches, I figured we'd merge
them all into the next merge window and let the -stable maintainers
figure it all out.

As this is more urgent than I believed, we need to figure out what to
do.

a) Pluck out the two cc:stable patches, merge just those into 6.19-rc.

b) Merge all of them into 6.19-rc, let -stable maintainers figure it out.

c) Stick with my original plan.

<checks>

The cc:stable "mm/hugetlb: fix excessive IPI broadcasts when unsharing
PMD tables using mmu_gather" has dependencies on the preceding two
non-cc:stable patches.

So I'm thinking I add cc:stable to everything and send the whole series
into 6.19-rcX. wdyt?

Also, David, where do we stand with

: I'd love to get some generic hugetlb testing on arm64 and powerpc,
: that do hugetlb TLB flushing stuff a bit more special.
:
: I'll try doing some arm64 testing early in the new year myself.

?
On 1/15/26 19:05, Andrew Morton wrote:
> On Thu, 15 Jan 2026 17:22:30 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> [...]
> As this is more urgent than I believed, we need to figure out what to
> do.
>
> a) Pluck out the two cc:stable patches, merge just those into 6.19-rc.
>
> b) Merge all of them into 6.19-rc, let -stable maintainers figure it out

Right. We can just CC: stable on comment fixes #2 and #3 to make
back-porting easier.

>
> c) Stick with my original plan.
>
> <checks>
>
> The cc:stable "mm/hugetlb: fix excessive IPI broadcasts when unsharing
> PMD tables using mmu_gather" has dependencies on the preceding two
> non-cc:stable patches.

At least one of them (doc update), yes.

>
> So I'm thinking I add cc:stable to everything and send the whole series
> into 6.19-rcX. wdyt?

Works for me.

>
> Also, David, where do we stand with
>
> : I'd love to get some generic hugetlb testing on arm64 and powerpc,
> : that do hugetlb TLB flushing stuff a bit more special.
> :
> : I'll try doing some arm64 testing early in the new year myself.
>
> ?

Not done yet, unfortunately. (I don't (yet) have easy access to decent
arm64 hardware ;) )

I still hope that Jann could quickly have a look, but it's been a while
already since I posted v1.

--
Cheers

David
On Thu, 15 Jan 2026 20:40:13 +0100 "David Hildenbrand (Red Hat)" <david@kernel.org> wrote:

> On 1/15/26 19:05, Andrew Morton wrote:
> > [...]
> > b) Merge all of them into 6.19-rc, let -stable maintainers figure it out
>
> Right. We can just CC: stable on comment fixes #2 and #3 to make
> back-porting easier.

Yep. Seems lame to be backporting comment fixes because real fixes were
textually dependent. Also seems lame to retain known-wrong comments in
stable kernels!

> > Also, David, where do we stand with
> >
> > : I'd love to get some generic hugetlb testing on arm64 and powerpc,
> > : that do hugetlb TLB flushing stuff a bit more special.
> > :
> > : I'll try doing some arm64 testing early in the new year myself.
> >
> > ?
>
> Not done yet, unfortunately. (I don't (yet) have easy access to decent
> arm64 hardware ;) )
>
> I still hope that Jann could quickly have a look, but it's been a while
> already since I posted v1.

OK. It may well have had the hoped-for testing in linux-next, only we
didn't hear about it.
On Tue, 2025-12-23 at 22:40 +0100, David Hildenbrand (Red Hat) wrote:
> One functional fix, one performance regression fix, and two related
> comment fixes.
> [...]
> base-commit: b927546677c876e26eba308550207c2ddf812a43

Hello David

For the V3 series, I re-ran the tests and the original reproducer, and
it's clean. I see the same roughly 5.4x improvement for the original
reproducer:

# uname -r
6.19.0-rc2-hugetlbv3+

Un-patched: Result of reproducer Iteration completed in 3436 ms
V3 patched: Result of reproducer Iteration completed in 639 ms

I also ran a test to map every hugepage I could access (460GB of them),
then fill and validate, and had no issues.

Tested-by: Laurence Oberman <loberman@redhat.com>
On 12/24/25 00:23, Laurence Oberman wrote:
> On Tue, 2025-12-23 at 22:40 +0100, David Hildenbrand (Red Hat) wrote:
>> [...]
> Hello David
>
> [...]
>
> Tested-by: Laurence Oberman <loberman@redhat.com>

Thanks a lot for the quick retest, Laurence!

I'd love to get some generic hugetlb testing on arm64 and powerpc,
that do hugetlb TLB flushing stuff a bit more special.

I'll try doing some arm64 testing early in the new year myself.

--
Cheers

David