Despite recent efforts to prevent lazy_mmu sections from nesting, it
remains difficult to ensure that it never occurs - and in fact it
does occur on arm64 in certain situations (CONFIG_DEBUG_PAGEALLOC).
Commit 1ef3095b1405 ("arm64/mm: Permit lazy_mmu_mode to be nested")
made nesting tolerable on arm64, but without truly supporting it:
the inner leave() call clears TIF_LAZY_MMU, disabling the batching
optimisation before the outer section ends.

Now that the lazy_mmu API allows enter() to pass through a state to
the matching leave() call, we can actually support nesting. If
enter() is called inside an active lazy_mmu section, TIF_LAZY_MMU
will already be set, and we can then return LAZY_MMU_NESTED to
instruct the matching leave() call not to clear TIF_LAZY_MMU.

The only effect of this patch is to ensure that TIF_LAZY_MMU (and
therefore the batching optimisation) remains set until the outermost
lazy_mmu section ends. leave() still emits barriers if needed,
regardless of the nesting level, as the caller may expect any
page table changes to become visible when leave() returns.

Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 816197d08165..602feda97dc4 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -85,24 +85,14 @@ typedef int lazy_mmu_state_t;
 
 static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
 {
-	/*
-	 * lazy_mmu_mode is not supposed to permit nesting. But in practice this
-	 * does happen with CONFIG_DEBUG_PAGEALLOC, where a page allocation
-	 * inside a lazy_mmu_mode section (such as zap_pte_range()) will change
-	 * permissions on the linear map with apply_to_page_range(), which
-	 * re-enters lazy_mmu_mode. So we tolerate nesting in our
-	 * implementation. The first call to arch_leave_lazy_mmu_mode() will
-	 * flush and clear the flag such that the remainder of the work in the
-	 * outer nest behaves as if outside of lazy mmu mode. This is safe and
-	 * keeps tracking simple.
-	 */
+	int lazy_mmu_nested;
 
 	if (in_interrupt())
 		return LAZY_MMU_DEFAULT;
 
-	set_thread_flag(TIF_LAZY_MMU);
+	lazy_mmu_nested = test_and_set_thread_flag(TIF_LAZY_MMU);
 
-	return LAZY_MMU_DEFAULT;
+	return lazy_mmu_nested ? LAZY_MMU_NESTED : LAZY_MMU_DEFAULT;
 }
 
 static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
@@ -113,7 +103,8 @@ static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
 	if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING))
 		emit_pte_barriers();
 
-	clear_thread_flag(TIF_LAZY_MMU);
+	if (state != LAZY_MMU_NESTED)
+		clear_thread_flag(TIF_LAZY_MMU);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-- 
2.47.0
Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com>

On Mon, Sep 08, 2025 at 08:39:27AM +0100, Kevin Brodsky wrote:
> [...]

--
Sincerely,
Yeoreum Yun