From nobody Sun Sep 14 06:32:59 2025
From: Kevin Brodsky
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Kevin Brodsky, Alexander Gordeev,
    Andreas Larsson, Andrew Morton, Boris Ostrovsky, Borislav Petkov,
    Catalin Marinas, Christophe Leroy, Dave Hansen, David Hildenbrand,
    "David S. Miller", "H. Peter Anvin", Ingo Molnar, Jann Horn,
    Juergen Gross, "Liam R. Howlett", Lorenzo Stoakes,
    Madhavan Srinivasan, Michael Ellerman, Michal Hocko, Mike Rapoport,
    Nicholas Piggin, Peter Zijlstra, Ryan Roberts, Suren Baghdasaryan,
    Thomas Gleixner, Vlastimil Babka, Will Deacon,
    linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org,
    sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org
Subject: [PATCH 1/7] mm: remove arch_flush_lazy_mmu_mode()
Date: Thu, 4 Sep 2025 13:57:30 +0100
Message-ID: <20250904125736.3918646-2-kevin.brodsky@arm.com>
In-Reply-To: <20250904125736.3918646-1-kevin.brodsky@arm.com>
References: <20250904125736.3918646-1-kevin.brodsky@arm.com>

This function has only ever been used in arch/x86, so there is no need
for other architectures to implement it. Remove it from linux/pgtable.h
and from all architectures besides x86.

The arm64 implementation is not empty, but it is only called from
arch_leave_lazy_mmu_mode(), so we can simply fold it there.
Signed-off-by: Kevin Brodsky
Acked-by: Mike Rapoport (Microsoft)
---
 arch/arm64/include/asm/pgtable.h                   | 9 +--------
 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 2 --
 arch/sparc/include/asm/tlbflush_64.h               | 1 -
 arch/x86/include/asm/pgtable.h                     | 3 ++-
 include/linux/pgtable.h                            | 1 -
 5 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index abd2dee416b3..728d7b6ed20a 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -101,21 +101,14 @@ static inline void arch_enter_lazy_mmu_mode(void)
 	set_thread_flag(TIF_LAZY_MMU);
 }
 
-static inline void arch_flush_lazy_mmu_mode(void)
+static inline void arch_leave_lazy_mmu_mode(void)
 {
 	if (in_interrupt())
 		return;
 
 	if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING))
 		emit_pte_barriers();
-}
-
-static inline void arch_leave_lazy_mmu_mode(void)
-{
-	if (in_interrupt())
-		return;
 
-	arch_flush_lazy_mmu_mode();
 	clear_thread_flag(TIF_LAZY_MMU);
 }
 
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
index 146287d9580f..176d7fd79eeb 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
@@ -55,8 +55,6 @@ static inline void arch_leave_lazy_mmu_mode(void)
 	preempt_enable();
 }
 
-#define arch_flush_lazy_mmu_mode()	do {} while (0)
-
 extern void hash__tlbiel_all(unsigned int action);
 
 extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize,
diff --git a/arch/sparc/include/asm/tlbflush_64.h b/arch/sparc/include/asm/tlbflush_64.h
index 8b8cdaa69272..cd144eb31bdd 100644
--- a/arch/sparc/include/asm/tlbflush_64.h
+++ b/arch/sparc/include/asm/tlbflush_64.h
@@ -44,7 +44,6 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end);
 void flush_tlb_pending(void);
 void arch_enter_lazy_mmu_mode(void);
 void arch_leave_lazy_mmu_mode(void);
-#define arch_flush_lazy_mmu_mode()	do {} while (0)
 
 /* Local cpu only. */
 void __flush_tlb_all(void);
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e33df3da6980..14fd672bc9b2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -117,7 +117,8 @@ extern pmdval_t early_pmd_flags;
 #define pte_val(x)	native_pte_val(x)
 #define __pte(x)	native_make_pte(x)
 
-#define arch_end_context_switch(prev)	do {} while(0)
+#define arch_end_context_switch(prev)	do {} while (0)
+#define arch_flush_lazy_mmu_mode()	do {} while (0)
 #endif	/* CONFIG_PARAVIRT_XXL */
 
 static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 4c035637eeb7..8848e132a6be 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -234,7 +234,6 @@ static inline int pmd_dirty(pmd_t pmd)
 #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
 #define arch_enter_lazy_mmu_mode()	do {} while (0)
 #define arch_leave_lazy_mmu_mode()	do {} while (0)
-#define arch_flush_lazy_mmu_mode()	do {} while (0)
 #endif
 
 #ifndef pte_batch_hint
-- 
2.47.0

From nobody Sun Sep 14 06:32:59 2025
From: Kevin Brodsky
To: linux-mm@kvack.org
Subject: [PATCH 2/7] mm: introduce local state for lazy_mmu sections
Date: Thu, 4 Sep 2025 13:57:31 +0100
Message-ID: <20250904125736.3918646-3-kevin.brodsky@arm.com>
In-Reply-To: <20250904125736.3918646-1-kevin.brodsky@arm.com>
References: <20250904125736.3918646-1-kevin.brodsky@arm.com>

arch_{enter,leave}_lazy_mmu_mode() currently have a stateless API
(taking and returning no value). This is proving problematic in
situations where leave() needs to restore some context back to its
original state (before enter() was called). In particular, this makes
it difficult to support the nesting of lazy_mmu sections: leave() does
not know whether the matching enter() call occurred while lazy_mmu was
already enabled, and therefore whether to disable it or not.

This patch gives all architectures the chance to store local state
while inside a lazy_mmu section by making enter() return some value,
storing it in a local variable, and having leave() take that value.
That value is typed lazy_mmu_state_t; each architecture defining
__HAVE_ARCH_ENTER_LAZY_MMU_MODE is free to define it as it sees fit.
For now we define it as int everywhere, which is sufficient to support
nesting.

The diff is unfortunately rather large, as all the API changes need to
be made atomically. Main parts:

* Changing the prototypes of arch_{enter,leave}_lazy_mmu_mode() in
  generic and arch code, and introducing lazy_mmu_state_t.

* Introducing LAZY_MMU_{DEFAULT,NESTED} for future support of nesting.
  enter() always returns LAZY_MMU_DEFAULT for now. (linux/mm_types.h is
  not the most natural location for defining those constants, but there
  is no other obvious header that is accessible where arch code
  implements the helpers.)

* Changing all lazy_mmu sections to introduce a lazy_mmu_state local
  variable, having enter() set it and leave() take it. Most of these
  changes were generated using the Coccinelle script below.

  @@
  @@
  {
  + lazy_mmu_state_t lazy_mmu_state;
  ...
  - arch_enter_lazy_mmu_mode();
  + lazy_mmu_state = arch_enter_lazy_mmu_mode();
  ...
  - arch_leave_lazy_mmu_mode();
  + arch_leave_lazy_mmu_mode(lazy_mmu_state);
  ...
  }

Note: it is difficult to provide a default definition of
lazy_mmu_state_t for architectures implementing lazy_mmu, because that
definition would need to be available in
arch/x86/include/asm/paravirt_types.h, and adding a new generic
#include there is very tricky due to the existing header soup.
Signed-off-by: Kevin Brodsky
Acked-by: Mike Rapoport (Microsoft)
---
 arch/arm64/include/asm/pgtable.h            | 10 +++++++---
 .../include/asm/book3s/64/tlbflush-hash.h   |  9 ++++++---
 arch/powerpc/mm/book3s64/hash_tlb.c         | 10 ++++++----
 arch/powerpc/mm/book3s64/subpage_prot.c     |  5 +++--
 arch/sparc/include/asm/tlbflush_64.h        |  5 +++--
 arch/sparc/mm/tlb.c                         |  6 ++++--
 arch/x86/include/asm/paravirt.h             |  6 ++++--
 arch/x86/include/asm/paravirt_types.h       |  2 ++
 arch/x86/xen/enlighten_pv.c                 |  2 +-
 arch/x86/xen/mmu_pv.c                       |  2 +-
 fs/proc/task_mmu.c                          |  5 +++--
 include/linux/mm_types.h                    |  3 +++
 include/linux/pgtable.h                     |  6 ++++--
 mm/madvise.c                                | 20 ++++++++++---------
 mm/memory.c                                 | 20 +++++++++++--------
 mm/migrate_device.c                         |  5 +++--
 mm/mprotect.c                               |  5 +++--
 mm/mremap.c                                 |  5 +++--
 mm/vmalloc.c                                | 15 ++++++++------
 mm/vmscan.c                                 | 15 ++++++++------
 20 files changed, 97 insertions(+), 59 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 728d7b6ed20a..816197d08165 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -81,7 +81,9 @@ static inline void queue_pte_barriers(void)
 }
 
 #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
-static inline void arch_enter_lazy_mmu_mode(void)
+typedef int lazy_mmu_state_t;
+
+static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
 {
 	/*
 	 * lazy_mmu_mode is not supposed to permit nesting. But in practice this
@@ -96,12 +98,14 @@ static inline void arch_enter_lazy_mmu_mode(void)
 	 */
 
 	if (in_interrupt())
-		return;
+		return LAZY_MMU_DEFAULT;
 
 	set_thread_flag(TIF_LAZY_MMU);
+
+	return LAZY_MMU_DEFAULT;
 }
 
-static inline void arch_leave_lazy_mmu_mode(void)
+static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
 {
 	if (in_interrupt())
 		return;
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
index 176d7fd79eeb..c9f1e819e567 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
@@ -25,13 +25,14 @@ DECLARE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch);
 extern void __flush_tlb_pending(struct ppc64_tlb_batch *batch);
 
 #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
+typedef int lazy_mmu_state_t;
 
-static inline void arch_enter_lazy_mmu_mode(void)
+static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
 {
 	struct ppc64_tlb_batch *batch;
 
 	if (radix_enabled())
-		return;
+		return LAZY_MMU_DEFAULT;
 	/*
 	 * apply_to_page_range can call us this preempt enabled when
 	 * operating on kernel page tables.
@@ -39,9 +40,11 @@ static inline void arch_enter_lazy_mmu_mode(void)
 	preempt_disable();
 	batch = this_cpu_ptr(&ppc64_tlb_batch);
 	batch->active = 1;
+
+	return LAZY_MMU_DEFAULT;
 }
 
-static inline void arch_leave_lazy_mmu_mode(void)
+static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
 {
 	struct ppc64_tlb_batch *batch;
 
diff --git a/arch/powerpc/mm/book3s64/hash_tlb.c b/arch/powerpc/mm/book3s64/hash_tlb.c
index 21fcad97ae80..ee664f88e679 100644
--- a/arch/powerpc/mm/book3s64/hash_tlb.c
+++ b/arch/powerpc/mm/book3s64/hash_tlb.c
@@ -189,6 +189,7 @@ void hash__tlb_flush(struct mmu_gather *tlb)
  */
 void __flush_hash_table_range(unsigned long start, unsigned long end)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	int hugepage_shift;
 	unsigned long flags;
 
@@ -205,7 +206,7 @@ void __flush_hash_table_range(unsigned long start, unsigned long end)
 	 * way to do things but is fine for our needs here.
 	 */
 	local_irq_save(flags);
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 	for (; start < end; start += PAGE_SIZE) {
 		pte_t *ptep = find_init_mm_pte(start, &hugepage_shift);
 		unsigned long pte;
@@ -217,12 +218,13 @@ void __flush_hash_table_range(unsigned long start, unsigned long end)
 			continue;
 		hpte_need_flush(&init_mm, start, ptep, pte, hugepage_shift);
 	}
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	local_irq_restore(flags);
 }
 
 void flush_hash_table_pmd_range(struct mm_struct *mm, pmd_t *pmd, unsigned long addr)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	pte_t *pte;
 	pte_t *start_pte;
 	unsigned long flags;
@@ -237,7 +239,7 @@ void flush_hash_table_pmd_range(struct mm_struct *mm, pmd_t *pmd, unsigned long
 	 * way to do things but is fine for our needs here.
 	 */
 	local_irq_save(flags);
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 	start_pte = pte_offset_map(pmd, addr);
 	if (!start_pte)
 		goto out;
@@ -249,6 +251,6 @@ void flush_hash_table_pmd_range(struct mm_struct *mm, pmd_t *pmd, unsigned long
 	}
 	pte_unmap(start_pte);
 out:
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	local_irq_restore(flags);
 }
diff --git a/arch/powerpc/mm/book3s64/subpage_prot.c b/arch/powerpc/mm/book3s64/subpage_prot.c
index ec98e526167e..4720f9f321af 100644
--- a/arch/powerpc/mm/book3s64/subpage_prot.c
+++ b/arch/powerpc/mm/book3s64/subpage_prot.c
@@ -53,6 +53,7 @@ void subpage_prot_free(struct mm_struct *mm)
 static void hpte_flush_range(struct mm_struct *mm, unsigned long addr,
 			     int npages)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
@@ -73,13 +74,13 @@ static void hpte_flush_range(struct mm_struct *mm, unsigned long addr,
 	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
 	if (!pte)
 		return;
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 	for (; npages > 0; --npages) {
 		pte_update(mm, addr, pte, 0, 0, 0);
 		addr += PAGE_SIZE;
 		++pte;
 	}
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	pte_unmap_unlock(pte - 1, ptl);
 }
 
diff --git a/arch/sparc/include/asm/tlbflush_64.h b/arch/sparc/include/asm/tlbflush_64.h
index cd144eb31bdd..02c93a4e6af5 100644
--- a/arch/sparc/include/asm/tlbflush_64.h
+++ b/arch/sparc/include/asm/tlbflush_64.h
@@ -40,10 +40,11 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
 void flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
 #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
+typedef int lazy_mmu_state_t;
 
 void flush_tlb_pending(void);
-void arch_enter_lazy_mmu_mode(void);
-void arch_leave_lazy_mmu_mode(void);
+lazy_mmu_state_t arch_enter_lazy_mmu_mode(void);
+void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state);
 
 /* Local cpu only. */
 void __flush_tlb_all(void);
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index a35ddcca5e76..bf5094b770af 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -50,16 +50,18 @@ void flush_tlb_pending(void)
 	put_cpu_var(tlb_batch);
 }
 
-void arch_enter_lazy_mmu_mode(void)
+lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
 {
 	struct tlb_batch *tb;
 
 	preempt_disable();
 	tb = this_cpu_ptr(&tlb_batch);
 	tb->active = 1;
+
+	return LAZY_MMU_DEFAULT;
 }
 
-void arch_leave_lazy_mmu_mode(void)
+void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
 {
 	struct tlb_batch *tb = this_cpu_ptr(&tlb_batch);
 
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index b5e59a7ba0d0..65a0d394fba1 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -527,12 +527,14 @@ static inline void arch_end_context_switch(struct task_struct *next)
 }
 
 #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
-static inline void arch_enter_lazy_mmu_mode(void)
+static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
 {
 	PVOP_VCALL0(mmu.lazy_mode.enter);
+
+	return LAZY_MMU_DEFAULT;
 }
 
-static inline void arch_leave_lazy_mmu_mode(void)
+static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
 {
 	PVOP_VCALL0(mmu.lazy_mode.leave);
 }
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 37a8627d8277..bc1af86868a3 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -41,6 +41,8 @@ struct pv_info {
 };
 
 #ifdef CONFIG_PARAVIRT_XXL
+typedef int lazy_mmu_state_t;
+
 struct pv_lazy_ops {
 	/* Set deferred update mode, used for batching operations. */
 	void (*enter)(void);
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 26bbaf4b7330..a245ba47a631 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -426,7 +426,7 @@ static void xen_start_context_switch(struct task_struct *prev)
 	BUG_ON(preemptible());
 
 	if (this_cpu_read(xen_lazy_mode) == XEN_LAZY_MMU) {
-		arch_leave_lazy_mmu_mode();
+		arch_leave_lazy_mmu_mode(LAZY_MMU_DEFAULT);
 		set_ti_thread_flag(task_thread_info(prev), TIF_LAZY_MMU_UPDATES);
 	}
 	enter_lazy(XEN_LAZY_CPU);
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 2a4a8deaf612..2039d5132ca3 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2140,7 +2140,7 @@ static void xen_flush_lazy_mmu(void)
 	preempt_disable();
 
 	if (xen_get_lazy_mode() == XEN_LAZY_MMU) {
-		arch_leave_lazy_mmu_mode();
+		arch_leave_lazy_mmu_mode(LAZY_MMU_DEFAULT);
 		arch_enter_lazy_mmu_mode();
 	}
 
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 29cca0e6d0ff..c9bf1128a4cd 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -2610,6 +2610,7 @@ static int pagemap_scan_thp_entry(pmd_t *pmd, unsigned long start,
 static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
 				  unsigned long end, struct mm_walk *walk)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	struct pagemap_scan_private *p = walk->private;
 	struct vm_area_struct *vma = walk->vma;
 	unsigned long addr, flush_end = 0;
@@ -2628,7 +2629,7 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
 		return 0;
 	}
 
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 
 	if ((p->arg.flags & PM_SCAN_WP_MATCHING) && !p->vec_out) {
 		/* Fast path for performing exclusive WP */
@@ -2698,7 +2699,7 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
 	if (flush_end)
 		flush_tlb_range(vma, start, addr);
 
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	pte_unmap_unlock(start_pte, ptl);
 
 	cond_resched();
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 08bc2442db93..18745c32f2c0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1441,6 +1441,9 @@ extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm);
 extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm);
 extern void tlb_finish_mmu(struct mmu_gather *tlb);
 
+#define LAZY_MMU_DEFAULT	0
+#define LAZY_MMU_NESTED		1
+
 struct vm_fault;
 
 /**
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 8848e132a6be..6932c8e344ab 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -232,8 +232,10 @@ static inline int pmd_dirty(pmd_t pmd)
  * and the mode cannot be used in interrupt context.
  */
 #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
-#define arch_enter_lazy_mmu_mode()	do {} while (0)
-#define arch_leave_lazy_mmu_mode()	do {} while (0)
+typedef int lazy_mmu_state_t;
+
+#define arch_enter_lazy_mmu_mode()	(LAZY_MMU_DEFAULT)
+#define arch_leave_lazy_mmu_mode(state)	((void)(state))
 #endif
 
 #ifndef pte_batch_hint
diff --git a/mm/madvise.c b/mm/madvise.c
index 35ed4ab0d7c5..72c032f2cf56 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -357,6 +357,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 				unsigned long addr, unsigned long end,
 				struct mm_walk *walk)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	struct madvise_walk_private *private = walk->private;
 	struct mmu_gather *tlb = private->tlb;
 	bool pageout = private->pageout;
@@ -455,7 +456,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	if (!start_pte)
 		return 0;
 	flush_tlb_batched_pending(mm);
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 	for (; addr < end; pte += nr, addr += nr * PAGE_SIZE) {
 		nr = 1;
 		ptent = ptep_get(pte);
@@ -463,7 +464,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		if (++batch_count == SWAP_CLUSTER_MAX) {
 			batch_count = 0;
 			if (need_resched()) {
-				arch_leave_lazy_mmu_mode();
+				arch_leave_lazy_mmu_mode(lazy_mmu_state);
 				pte_unmap_unlock(start_pte, ptl);
 				cond_resched();
 				goto restart;
@@ -499,7 +500,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			if (!folio_trylock(folio))
 				continue;
 			folio_get(folio);
-			arch_leave_lazy_mmu_mode();
+			arch_leave_lazy_mmu_mode(lazy_mmu_state);
 			pte_unmap_unlock(start_pte, ptl);
 			start_pte = NULL;
 			err = split_folio(folio);
@@ -510,7 +511,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			if (!start_pte)
 				break;
 			flush_tlb_batched_pending(mm);
-			arch_enter_lazy_mmu_mode();
+			lazy_mmu_state = arch_enter_lazy_mmu_mode();
 			if (!err)
 				nr = 0;
 			continue;
@@ -558,7 +559,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	}
 
 	if (start_pte) {
-		arch_leave_lazy_mmu_mode();
+		arch_leave_lazy_mmu_mode(lazy_mmu_state);
 		pte_unmap_unlock(start_pte, ptl);
 	}
 	if (pageout)
@@ -657,6 +658,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 
 {
 	const cydp_t cydp_flags = CYDP_CLEAR_YOUNG | CYDP_CLEAR_DIRTY;
+	lazy_mmu_state_t lazy_mmu_state;
 	struct mmu_gather *tlb = walk->private;
 	struct mm_struct *mm = tlb->mm;
 	struct vm_area_struct *vma = walk->vma;
@@ -677,7 +679,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	if (!start_pte)
 		return 0;
 	flush_tlb_batched_pending(mm);
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 	for (; addr != end; pte += nr, addr += PAGE_SIZE * nr) {
 		nr = 1;
 		ptent = ptep_get(pte);
@@ -727,7 +729,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			if (!folio_trylock(folio))
 				continue;
 			folio_get(folio);
-			arch_leave_lazy_mmu_mode();
+			arch_leave_lazy_mmu_mode(lazy_mmu_state);
 			pte_unmap_unlock(start_pte, ptl);
 			start_pte = NULL;
 			err = split_folio(folio);
@@ -738,7 +740,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			if (!start_pte)
 				break;
 			flush_tlb_batched_pending(mm);
-			arch_enter_lazy_mmu_mode();
+			lazy_mmu_state = arch_enter_lazy_mmu_mode();
 			if (!err)
 				nr = 0;
 			continue;
@@ -778,7 +780,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	if (nr_swap)
 		add_mm_counter(mm, MM_SWAPENTS, nr_swap);
 	if (start_pte) {
-		arch_leave_lazy_mmu_mode();
+		arch_leave_lazy_mmu_mode(lazy_mmu_state);
 		pte_unmap_unlock(start_pte, ptl);
 	}
 	cond_resched();
diff --git a/mm/memory.c b/mm/memory.c
index 0ba4f6b71847..ebe0ffddcb77 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1079,6 +1079,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	       pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
 	       unsigned long end)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	struct mm_struct *dst_mm = dst_vma->vm_mm;
 	struct mm_struct *src_mm = src_vma->vm_mm;
 	pte_t *orig_src_pte, *orig_dst_pte;
@@ -1126,7 +1127,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 	orig_src_pte = src_pte;
 	orig_dst_pte = dst_pte;
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 
 	do {
 		nr = 1;
@@ -1195,7 +1196,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 	} while (dst_pte += nr, src_pte += nr, addr += PAGE_SIZE * nr,
 		 addr != end);
 
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	pte_unmap_unlock(orig_src_pte, src_ptl);
 	add_mm_rss_vec(dst_mm, rss);
 	pte_unmap_unlock(orig_dst_pte, dst_ptl);
@@ -1694,6 +1695,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				unsigned long addr, unsigned long end,
 				struct zap_details *details)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	bool force_flush = false, force_break = false;
 	struct mm_struct *mm = tlb->mm;
 	int rss[NR_MM_COUNTERS];
@@ -1714,7 +1716,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		return addr;
 
 	flush_tlb_batched_pending(mm);
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 	do {
 		bool any_skipped = false;
 
@@ -1746,7 +1748,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		direct_reclaim = try_get_and_clear_pmd(mm, pmd, &pmdval);
 
 	add_mm_rss_vec(mm, rss);
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 
 	/* Do the actual TLB flush before dropping ptl */
 	if (force_flush) {
@@ -2683,6 +2685,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 			unsigned long addr, unsigned long end,
 			unsigned long pfn, pgprot_t prot)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	pte_t *pte, *mapped_pte;
 	spinlock_t *ptl;
 	int err = 0;
@@ -2690,7 +2693,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 	mapped_pte = pte = pte_alloc_map_lock(mm, pmd, addr, &ptl);
 	if (!pte)
 		return -ENOMEM;
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 	do {
 		BUG_ON(!pte_none(ptep_get(pte)));
 		if (!pfn_modify_allowed(pfn, prot)) {
@@ -2700,7 +2703,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd,
 		set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot)));
 		pfn++;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	pte_unmap_unlock(mapped_pte, ptl);
 	return err;
 }
@@ -2989,6 +2992,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
 				     pte_fn_t fn, void *data, bool create,
 				     pgtbl_mod_mask *mask)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	pte_t *pte, *mapped_pte;
 	int err = 0;
 	spinlock_t *ptl;
@@ -3007,7 +3011,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
 			return -EINVAL;
 	}
 
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 
 	if (fn) {
 		do {
@@ -3020,7 +3024,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
 	}
 	*mask |= PGTBL_PTE_MODIFIED;
 
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 
 	if (mm != &init_mm)
 		pte_unmap_unlock(mapped_pte, ptl);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index e05e14d6eacd..659285c6ba77 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -59,6 +59,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 				   unsigned long end,
 				   struct mm_walk *walk)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	struct migrate_vma *migrate = walk->private;
 	struct folio *fault_folio = migrate->fault_page ?
 		page_folio(migrate->fault_page) : NULL;
@@ -110,7 +111,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
 	if (!ptep)
 		goto again;
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 
 	for (; addr < end; addr += PAGE_SIZE, ptep++) {
 		struct dev_pagemap *pgmap;
@@ -287,7 +288,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 	if (unmapped)
 		flush_tlb_range(walk->vma, start, end);
 
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	pte_unmap_unlock(ptep - 1, ptl);
 
 	return 0;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 113b48985834..7bba651e5aa3 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -273,6 +273,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 		struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr,
 		unsigned long end, pgprot_t newprot, unsigned long cp_flags)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	pte_t *pte, oldpte;
 	spinlock_t *ptl;
 	long pages = 0;
@@ -293,7 +294,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 		target_node = numa_node_id();
 
 	flush_tlb_batched_pending(vma->vm_mm);
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 	do {
 		nr_ptes = 1;
 		oldpte = ptep_get(pte);
@@ -439,7 +440,7 @@ static long change_pte_range(struct mmu_gather *tlb,
 			}
 		}
 	} while (pte += nr_ptes, addr += nr_ptes * PAGE_SIZE, addr != end);
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	pte_unmap_unlock(pte - 1, ptl);
 
 	return pages;
diff --git a/mm/mremap.c b/mm/mremap.c
index e618a706aff5..dac29a734e16 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -193,6 +193,7 @@ static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr
 static int move_ptes(struct pagetable_move_control *pmc,
 		unsigned long extent, pmd_t *old_pmd, pmd_t *new_pmd)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	struct vm_area_struct *vma = pmc->old;
 	bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
 	struct mm_struct *mm = vma->vm_mm;
@@ -256,7 +257,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
 	if (new_ptl != old_ptl)
 		spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
 	flush_tlb_batched_pending(vma->vm_mm);
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 
 	for (; old_addr < old_end; old_ptep += nr_ptes, old_addr += nr_ptes * PAGE_SIZE,
 	     new_ptep += nr_ptes, new_addr += nr_ptes * PAGE_SIZE) {
@@ -301,7 +302,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
 		}
 	}
 
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	if (force_flush)
 		flush_tlb_range(vma, old_end - len, old_end);
 	if (new_ptl != old_ptl)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6dbcdceecae1..f901675dd060 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -95,6 +95,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			phys_addr_t phys_addr, pgprot_t prot,
 			unsigned int max_page_shift, pgtbl_mod_mask *mask)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	pte_t *pte;
 	u64 pfn;
 	struct page *page;
@@ -105,7 +106,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	if (!pte)
 		return -ENOMEM;
 
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 
 	do {
 		if (unlikely(!pte_none(ptep_get(pte)))) {
@@ -131,7 +132,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		pfn++;
 	} while (pte += PFN_DOWN(size), addr += size, addr != end);
 
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	*mask |= PGTBL_PTE_MODIFIED;
 	return 0;
 }
@@ -354,12 +355,13 @@ int ioremap_page_range(unsigned long addr, unsigned long end,
 static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			     pgtbl_mod_mask *mask)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	pte_t *pte;
 	pte_t ptent;
 	unsigned long size = PAGE_SIZE;
 
 	pte = pte_offset_kernel(pmd, addr);
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 
 	do {
 #ifdef CONFIG_HUGETLB_PAGE
@@ -378,7 +380,7 @@ static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		WARN_ON(!pte_none(ptent) && !pte_present(ptent));
 	} while (pte += (size >> PAGE_SHIFT), addr += size, addr != end);
 
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	*mask |= PGTBL_PTE_MODIFIED;
 }
 
@@ -514,6 +516,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
 		unsigned long end, pgprot_t prot, struct page **pages, int *nr,
 		pgtbl_mod_mask *mask)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	int err = 0;
 	pte_t *pte;
 
@@ -526,7 +529,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	if (!pte)
 		return -ENOMEM;
 
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 
 	do {
 		struct page *page = pages[*nr];
@@ -548,7 +551,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
 		(*nr)++;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	*mask |= PGTBL_PTE_MODIFIED;
 
 	return err;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index a48aec8bfd92..13b6657c8743 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3521,6 +3521,7 @@ static void walk_update_folio(struct lru_gen_mm_walk *walk, struct folio *folio,
 static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 			   struct mm_walk *args)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	int i;
 	bool dirty;
 	pte_t *pte;
@@ -3550,7 +3551,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 		return false;
 	}
 
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 restart:
 	for (i = pte_index(start), addr = start; addr != end; i++, addr += PAGE_SIZE) {
 		unsigned long pfn;
@@ -3591,7 +3592,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 	if (i < PTRS_PER_PTE && get_next_vma(PMD_MASK, PAGE_SIZE, args, &start, &end))
 		goto restart;
 
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	pte_unmap_unlock(pte, ptl);
 
 	return suitable_to_scan(total, young);
@@ -3600,6 +3601,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
 static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area_struct *vma,
 				  struct mm_walk *args, unsigned long *bitmap, unsigned long *first)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	int i;
 	bool dirty;
 	pmd_t *pmd;
@@ -3632,7 +3634,7 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 	if (!spin_trylock(ptl))
 		goto done;
 
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 
 	do {
 		unsigned long pfn;
@@ -3679,7 +3681,7 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area
 
 	walk_update_folio(walk, last, gen, dirty);
 
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 	spin_unlock(ptl);
 done:
 	*first = -1;
@@ -4227,6 +4229,7 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
  */
 bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 {
+	lazy_mmu_state_t lazy_mmu_state;
 	int i;
 	bool dirty;
 	unsigned long start;
@@ -4278,7 +4281,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 		}
 	}
 
-	arch_enter_lazy_mmu_mode();
+	lazy_mmu_state = arch_enter_lazy_mmu_mode();
 
 	pte -= (addr - start) / PAGE_SIZE;
 
@@ -4312,7 +4315,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 
 	walk_update_folio(walk, last, gen, dirty);
 
-	arch_leave_lazy_mmu_mode();
+	arch_leave_lazy_mmu_mode(lazy_mmu_state);
 
 	/* feedback from rmap walkers to page table walkers */
 	if (mm_state && suitable_to_scan(i, young))
-- 
2.47.0

From nobody Sun Sep 14 06:32:59 2025
From: Kevin Brodsky
To: linux-mm@kvack.org
Subject: [PATCH 3/7] arm64: mm: fully support nested lazy_mmu sections
Date: Thu, 4 Sep 2025 13:57:32 +0100
Message-ID: <20250904125736.3918646-4-kevin.brodsky@arm.com>
In-Reply-To: <20250904125736.3918646-1-kevin.brodsky@arm.com>
References: <20250904125736.3918646-1-kevin.brodsky@arm.com>

Despite recent efforts to prevent lazy_mmu sections from nesting, it
remains difficult to ensure that it never occurs - and in fact it does
occur on arm64 in certain situations (CONFIG_DEBUG_PAGEALLOC).

Commit 1ef3095b1405 ("arm64/mm: Permit lazy_mmu_mode to be nested")
made nesting tolerable on arm64, but without truly supporting it: the
inner leave() call clears TIF_LAZY_MMU, disabling the batching
optimisation before the outer section ends.

Now that the lazy_mmu API allows enter() to pass through a state to the
matching leave() call, we can actually support nesting. If enter() is
called inside an active lazy_mmu section, TIF_LAZY_MMU will already be
set, and we can then return LAZY_MMU_NESTED to instruct the matching
leave() call not to clear TIF_LAZY_MMU.

The only effect of this patch is to ensure that TIF_LAZY_MMU (and
therefore the batching optimisation) remains set until the outermost
lazy_mmu section ends. leave() still emits barriers if needed,
regardless of the nesting level, as the caller may expect any page
table changes to become visible when leave() returns.

Signed-off-by: Kevin Brodsky
---
 arch/arm64/include/asm/pgtable.h | 19 +++++--------------
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 816197d08165..602feda97dc4 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -85,24 +85,14 @@ typedef int lazy_mmu_state_t;
 
 static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
 {
-	/*
-	 * lazy_mmu_mode is not supposed to permit nesting. But in practice this
-	 * does happen with CONFIG_DEBUG_PAGEALLOC, where a page allocation
-	 * inside a lazy_mmu_mode section (such as zap_pte_range()) will change
-	 * permissions on the linear map with apply_to_page_range(), which
-	 * re-enters lazy_mmu_mode. So we tolerate nesting in our
-	 * implementation. The first call to arch_leave_lazy_mmu_mode() will
-	 * flush and clear the flag such that the remainder of the work in the
-	 * outer nest behaves as if outside of lazy mmu mode. This is safe and
-	 * keeps tracking simple.
-	 */
+	int lazy_mmu_nested;
 
 	if (in_interrupt())
 		return LAZY_MMU_DEFAULT;
 
-	set_thread_flag(TIF_LAZY_MMU);
+	lazy_mmu_nested = test_and_set_thread_flag(TIF_LAZY_MMU);
 
-	return LAZY_MMU_DEFAULT;
+	return lazy_mmu_nested ? LAZY_MMU_NESTED : LAZY_MMU_DEFAULT;
 }
 
 static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
@@ -113,7 +103,8 @@ static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
 	if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING))
 		emit_pte_barriers();
 
-	clear_thread_flag(TIF_LAZY_MMU);
+	if (state != LAZY_MMU_NESTED)
+		clear_thread_flag(TIF_LAZY_MMU);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-- 
2.47.0

From nobody Sun Sep 14 06:32:59 2025
From: Kevin Brodsky
To: linux-mm@kvack.org
Subject: [PATCH 4/7] x86/xen: support nested lazy_mmu sections (again)
Date: Thu, 4 Sep 2025 13:57:33 +0100
Message-ID: <20250904125736.3918646-5-kevin.brodsky@arm.com>
In-Reply-To: <20250904125736.3918646-1-kevin.brodsky@arm.com>
References: <20250904125736.3918646-1-kevin.brodsky@arm.com>

Commit 49147beb0ccb ("x86/xen: allow nesting of same lazy mode")
originally introduced support for nested lazy sections (LAZY_MMU and
LAZY_CPU). It was later reverted by commit c36549ff8d84, as its
implementation turned out to be intolerant to preemption.

Now that the lazy_mmu API allows enter() to pass through a state to the
matching leave() call, we can support nesting again for the LAZY_MMU
mode in a preemption-safe manner. If xen_enter_lazy_mmu() is called
inside an active lazy_mmu section, xen_lazy_mode will already be set to
XEN_LAZY_MMU and we can then return LAZY_MMU_NESTED to instruct the
matching xen_leave_lazy_mmu() call to leave xen_lazy_mode unchanged.

The only effect of this patch is to ensure that xen_lazy_mode remains
set to XEN_LAZY_MMU until the outermost lazy_mmu section ends.
xen_leave_lazy_mmu() still calls xen_mc_flush() unconditionally.

Signed-off-by: Kevin Brodsky
---
 arch/x86/include/asm/paravirt.h       |  6 ++----
 arch/x86/include/asm/paravirt_types.h |  4 ++--
 arch/x86/xen/mmu_pv.c                 | 11 ++++++++---
 3 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 65a0d394fba1..4ecd3a6b1dea 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -529,14 +529,12 @@ static inline void arch_end_context_switch(struct task_struct *next)
 #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
 static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
 {
-	PVOP_VCALL0(mmu.lazy_mode.enter);
-
-	return LAZY_MMU_DEFAULT;
+	return PVOP_CALL0(lazy_mmu_state_t, mmu.lazy_mode.enter);
 }
 
 static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
 {
-	PVOP_VCALL0(mmu.lazy_mode.leave);
+	PVOP_VCALL1(mmu.lazy_mode.leave, state);
 }
 
 static inline void arch_flush_lazy_mmu_mode(void)
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index bc1af86868a3..b7c567ccbf32 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -45,8 +45,8 @@ typedef int lazy_mmu_state_t;
 
 struct pv_lazy_ops {
 	/* Set deferred update mode, used for batching operations. */
-	void (*enter)(void);
-	void (*leave)(void);
+	lazy_mmu_state_t (*enter)(void);
+	void (*leave)(lazy_mmu_state_t);
 	void (*flush)(void);
 } __no_randomize_layout;
 #endif
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 2039d5132ca3..6e5390ff06a5 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2130,9 +2130,13 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot)
 #endif
 }
 
-static void xen_enter_lazy_mmu(void)
+static lazy_mmu_state_t xen_enter_lazy_mmu(void)
 {
+	if (this_cpu_read(xen_lazy_mode) == XEN_LAZY_MMU)
+		return LAZY_MMU_NESTED;
+
 	enter_lazy(XEN_LAZY_MMU);
+	return LAZY_MMU_DEFAULT;
 }
 
 static void xen_flush_lazy_mmu(void)
@@ -2167,11 +2171,12 @@ static void __init xen_post_allocator_init(void)
 	pv_ops.mmu.write_cr3 = &xen_write_cr3;
 }
 
-static void xen_leave_lazy_mmu(void)
+static void xen_leave_lazy_mmu(lazy_mmu_state_t state)
 {
 	preempt_disable();
 	xen_mc_flush();
-	leave_lazy(XEN_LAZY_MMU);
+	if (state != LAZY_MMU_NESTED)
+		leave_lazy(XEN_LAZY_MMU);
 	preempt_enable();
 }
 
-- 
2.47.0

From nobody Sun Sep 14 06:32:59 2025
From: Kevin Brodsky
To: linux-mm@kvack.org
Howlett" , Lorenzo Stoakes , Madhavan Srinivasan , Michael Ellerman , Michal Hocko , Mike Rapoport , Nicholas Piggin , Peter Zijlstra , Ryan Roberts , Suren Baghdasaryan , Thomas Gleixner , Vlastimil Babka , Will Deacon , linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH 5/7] powerpc/mm: support nested lazy_mmu sections Date: Thu, 4 Sep 2025 13:57:34 +0100 Message-ID: <20250904125736.3918646-6-kevin.brodsky@arm.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250904125736.3918646-1-kevin.brodsky@arm.com> References: <20250904125736.3918646-1-kevin.brodsky@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The lazy_mmu API now allows nested sections to be handled by arch code: enter() can return a flag if called inside another lazy_mmu section, so that the matching call to leave() leaves any optimisation enabled. This patch implements that new logic for powerpc: if there is an active batch, then enter() returns LAZY_MMU_NESTED and the matching leave() leaves batch->active set. The preempt_{enable,disable} calls are left untouched as they already handle nesting themselves. TLB flushing is still done in leave() regardless of the nesting level, as the caller may rely on it whether nesting is occurring or not. Signed-off-by: Kevin Brodsky --- .../powerpc/include/asm/book3s/64/tlbflush-hash.h | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powe= rpc/include/asm/book3s/64/tlbflush-hash.h index c9f1e819e567..001c474da1fe 100644 --- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h @@ -30,6 +30,7 @@ typedef int lazy_mmu_state_t; static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void) { struct ppc64_tlb_batch *batch; + int lazy_mmu_nested; =20 if (radix_enabled()) return LAZY_MMU_DEFAULT; @@ -39,9 +40,14 @@ static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(= void) */ preempt_disable(); batch =3D this_cpu_ptr(&ppc64_tlb_batch); - batch->active =3D 1; + lazy_mmu_nested =3D batch->active; =20 - return LAZY_MMU_DEFAULT; + if (!lazy_mmu_nested) { + batch->active =3D 1; + return LAZY_MMU_DEFAULT; + } else { + return LAZY_MMU_NESTED; + } } =20 static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state) @@ -54,7 +60,10 @@ static inline void arch_leave_lazy_mmu_mode(lazy_mmu_sta= te_t state) =20 if (batch->index) __flush_tlb_pending(batch); - batch->active =3D 0; + + if (state !=3D LAZY_MMU_NESTED) + batch->active =3D 0; + preempt_enable(); } =20 --=20 2.47.0 From nobody Sun Sep 14 06:32:59 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 937F72FFDDD; Thu, 4 Sep 2025 12:58:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756990731; cv=none; b=livHMqbdIacYubIstGoeRcziV7fenpYxKtCC7L+EOTXBukJCNHM2UC+Fx7QJ08sCjHaR1B/54T+4+E4lNpkUwvW16d4KXIidxH9XQG+hG6IPf7799wttuURqEuRpyVaj0I0DTlvXWakSz+7hjVLqs+G12ZHMo0L/ZShjaYstV94= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756990731; c=relaxed/simple; bh=FfFahPfq5+SssMH1K9ICxs3Ni8XAHBJVYKTrFIbUjoE=; 
h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ahDr8P/zqzTDw8w8Dny5lD30H4ymtpH2FzXmKa3286eQlfxcER+3GZXJSnIOC/YpXl5erGohUo6dROgAFrxeSZSxm+2XZmEaUADKvnndgSifGW+Z4wpzusw5WLgiyuNAxpIVZaOj0jk4EqyAjqtUOyn6o8+DKKMR3K4/MeKWUMo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CD01C2F27; Thu, 4 Sep 2025 05:58:41 -0700 (PDT) Received: from e123572-lin.arm.com (e123572-lin.cambridge.arm.com [10.1.194.54]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 9C1D63F6A8; Thu, 4 Sep 2025 05:58:45 -0700 (PDT) From: Kevin Brodsky To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Kevin Brodsky , Alexander Gordeev , Andreas Larsson , Andrew Morton , Boris Ostrovsky , Borislav Petkov , Catalin Marinas , Christophe Leroy , Dave Hansen , David Hildenbrand , "David S. Miller" , "H. Peter Anvin" , Ingo Molnar , Jann Horn , Juergen Gross , "Liam R. Howlett" , Lorenzo Stoakes , Madhavan Srinivasan , Michael Ellerman , Michal Hocko , Mike Rapoport , Nicholas Piggin , Peter Zijlstra , Ryan Roberts , Suren Baghdasaryan , Thomas Gleixner , Vlastimil Babka , Will Deacon , linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH 6/7] sparc/mm: support nested lazy_mmu sections Date: Thu, 4 Sep 2025 13:57:35 +0100 Message-ID: <20250904125736.3918646-7-kevin.brodsky@arm.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250904125736.3918646-1-kevin.brodsky@arm.com> References: <20250904125736.3918646-1-kevin.brodsky@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The lazy_mmu API now allows nested sections to be handled by arch code: enter() can return a flag if called inside another lazy_mmu section, so that the matching call to leave() leaves any optimisation enabled. This patch implements that new logic for sparc: if there is an active batch, then enter() returns LAZY_MMU_NESTED and the matching leave() leaves batch->active set. The preempt_{enable,disable} calls are left untouched as they already handle nesting themselves. TLB flushing is still done in leave() regardless of the nesting level, as the caller may rely on it whether nesting is occurring or not. 
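To make the nesting behaviour concrete, here is how the state values
flow when two sections nest (illustration only, not part of the diff
below):

	lazy_mmu_state_t outer, inner;

	outer = arch_enter_lazy_mmu_mode();	/* batch inactive: activates it, returns LAZY_MMU_DEFAULT */
	/* ... PTE updates ... */
	inner = arch_enter_lazy_mmu_mode();	/* batch already active: returns LAZY_MMU_NESTED */
	/* ... more PTE updates ... */
	arch_leave_lazy_mmu_mode(inner);	/* flushes, but leaves the batch active */
	/* ... */
	arch_leave_lazy_mmu_mode(outer);	/* flushes and deactivates the batch */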
Signed-off-by: Kevin Brodsky
---
 arch/sparc/mm/tlb.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index bf5094b770af..42de93d74d0e 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -53,12 +53,18 @@ void flush_tlb_pending(void)
 lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
 {
 	struct tlb_batch *tb;
+	int lazy_mmu_nested;
 
 	preempt_disable();
 	tb = this_cpu_ptr(&tlb_batch);
-	tb->active = 1;
+	lazy_mmu_nested = tb->active;
 
-	return LAZY_MMU_DEFAULT;
+	if (!lazy_mmu_nested) {
+		tb->active = 1;
+		return LAZY_MMU_DEFAULT;
+	} else {
+		return LAZY_MMU_NESTED;
+	}
 }
 
 void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
@@ -67,7 +73,10 @@ void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
 
 	if (tb->tlb_nr)
 		flush_tlb_pending();
-	tb->active = 0;
+
+	if (state != LAZY_MMU_NESTED)
+		tb->active = 0;
+
 	preempt_enable();
 }
 
-- 
2.47.0

From nobody Sun Sep 14 06:32:59 2025
From: Kevin Brodsky
To: linux-mm@kvack.org
Subject: [PATCH 7/7] mm: update lazy_mmu documentation
Date: Thu, 4 Sep 2025 13:57:36 +0100
Message-ID: <20250904125736.3918646-8-kevin.brodsky@arm.com>
In-Reply-To: <20250904125736.3918646-1-kevin.brodsky@arm.com>
References: <20250904125736.3918646-1-kevin.brodsky@arm.com>

We now support nested lazy_mmu sections on all architectures
implementing the API. Update the API comment accordingly.

Signed-off-by: Kevin Brodsky
Acked-by: Mike Rapoport (Microsoft)
---
 include/linux/pgtable.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 6932c8e344ab..be0f059beb4d 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -228,8 +228,18 @@ static inline int pmd_dirty(pmd_t pmd)
  * of the lazy mode. So the implementation must assume preemption may be enabled
  * and cpu migration is possible; it must take steps to be robust against this.
  * (In practice, for user PTE updates, the appropriate page table lock(s) are
- * held, but for kernel PTE updates, no lock is held). Nesting is not permitted
- * and the mode cannot be used in interrupt context.
+ * held, but for kernel PTE updates, no lock is held). The mode cannot be used
+ * in interrupt context.
+ *
+ * Calls may be nested: an arch_{enter,leave}_lazy_mmu_mode() pair may be called
+ * while the lazy MMU mode has already been enabled. An implementation should
+ * handle this using the state returned by enter() and taken by the matching
+ * leave() call; the LAZY_MMU_{DEFAULT,NESTED} flags can be used to indicate
+ * whether this enter/leave pair is nested inside another or not. (It is up to
+ * the implementation to track whether the lazy MMU mode is enabled at any point
+ * in time.) The expectation is that leave() will flush any batched state
+ * unconditionally, but only leave the lazy MMU mode if the passed state is not
+ * LAZY_MMU_NESTED.
  */
 #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
 typedef int lazy_mmu_state_t;
-- 
2.47.0